Bioinformatics & Computational Biology Group
Claflin University
Research
1) Identifying cancer drug targets utilizing construction of complex interactome of proteins (article & article):
The approach for CaI construction and analysis. (a) The four databases BIOCARTA, PID, REACTOME and KEGG were used to extract pathways, which were then broken down into their protein constituents, followed by identification of any other pathways those proteins are involved in. (b) The meta-database STRING was used to find the interactions of all the proteins pooled above. (c) The large component of the CaI constructed from the pooled interactions above, coloured by the 335 modules found with the Rosvall algorithm, with node size plotted as per degree. (d) The analyses of power law, K-core, and inter- and intra-modular connectivities for the constructed CaI, followed by the drug statuses against centrality measures, in clockwise order.
The three SPINs and the GPIN of A. baumannii, reflecting the degree of connectivity. SPINs are represented as blue spheres connected through blue curved lines for (A) VaCAB, having vaccine candidates; (B) ViFAB, with virulent factors; and (C) KeFAB, with key factors, each with their interactors. (D) GPIN with proteins represented as black spheres connected with black curved lines to form the interactome.
The technical analysis of the constructed CaI. (a) Complementary Cumulative Degree Distribution (CCDF) of the CaI showing power-law behaviour. (b) K-core analysis of the CaI representing the size of each k-shell (the number of proteins appearing in the k-core but not in the (k + 1)-core) from the periphery (k = 1) to the inner core (k-max). (c) Classification of CaI proteins (R) based on their role and region in network space, with the P-Z space divided into 7 categories of hub and non-hub nodes. The non-hub nodes are assigned as ultra-peripheral (R1), peripheral (R2), non-hub connector (R3) and non-hub kinless nodes (R4), and the hub nodes as provincial (R5), connector (R6) and kinless hubs (R7), as described by Guimerà et al.
Kinless hub nodes, which have high connectivity both within their own module and between modules, are thought to be functionally important.
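The P-Z classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoffs (z ≥ 2.5 for hubs and the P boundaries) are the ones usually quoted from Guimerà and Amaral's work and are not given in the text, and the toy graph, module labels and function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def participation_coefficient(node, adjacency, module_of):
    """P = 1 - sum over modules s of (k_is / k_i)^2."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k == 0:
        return 0.0
    links_per_module = Counter(module_of[n] for n in neighbours)
    return 1.0 - sum((c / k) ** 2 for c in links_per_module.values())


def within_module_z(node, adjacency, module_of):
    """z-score of the node's within-module degree among its module's members."""
    module = module_of[node]
    members = [n for n in adjacency if module_of[n] == module]
    internal = {
        m: sum(1 for n in adjacency[m] if module_of[n] == module)
        for m in members
    }
    values = list(internal.values())
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    return (internal[node] - mean(values)) / spread


def role(z, p):
    """Map (z, P) to R1..R7; cutoffs are assumed, not taken from the text."""
    if z < 2.5:  # non-hub nodes
        if p <= 0.05:
            return "R1"  # ultra-peripheral
        if p <= 0.62:
            return "R2"  # peripheral
        if p <= 0.80:
            return "R3"  # non-hub connector
        return "R4"      # non-hub kinless
    if p <= 0.30:
        return "R5"      # provincial hub
    if p <= 0.75:
        return "R6"      # connector hub
    return "R7"          # kinless hub


# Hypothetical toy network: node "a" links to two nodes in each of two modules.
toy_adjacency = {
    "a": {"b", "c", "d", "e"},
    "b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"a"},
}
toy_modules = {"a": "M1", "b": "M1", "c": "M1", "d": "M2", "e": "M2"}

p_a = participation_coefficient("a", toy_adjacency, toy_modules)
z_a = within_module_z("a", toy_adjacency, toy_modules)
role_a = role(z_a, p_a)
```

In a real interactome the module labels would come from the community detection step, and only nodes with both a high z and a high P would land in R7.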
2) Convolutional Neural Networks: Analysis of drug resistance from deep sequencing data (article). Restricted Boltzmann Machines are an effective machine learning tool for classification of genomic and structural data. They can also be used to compare resistance profiles of different protease inhibitors.
3) Prognostic Model Predicts Survival in Cancer Patients (article): An optimal prognostic model combining six mRNAs was established. Kaplan–Meier survival analysis showed that the model stratified patients by risk effectively in the two datasets. The area under the ROC curve (AUC) was > 0.65 for the training and validation datasets, indicating good sensitivity and specificity of this model. Moreover, the model's clear advantage for investigating prognostic biomarkers was demonstrated.
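For readers unfamiliar with Kaplan–Meier analysis, the estimator multiplies, at each event time, the fraction of at-risk patients who survive that time. The sketch below is a minimal illustration, not the code behind the study; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing survival.
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: the patient at t=2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Risk stratification would then amount to computing one such curve per risk group (for example, patients above and below the median model score) and comparing them.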
4) Regression and Tree Based Classification Models: Common cancer biomarkers identified through artificial intelligence (article). Identification of biomarker genes. (a) Heat map showing expression levels of the top 25 cancer biomarker genes in ovarian and breast cancer types, (b) variable importance with gene ranks for all the genes, (c) mean decrease in Gini value for the top 25 biomarker genes.
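Mean decrease in Gini summarises how much splits on a given gene reduce node impurity across the trees of a forest. The sketch below shows only the per-split quantity that gets aggregated; how a particular package sums or averages it is not covered here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical split on one gene's expression that separates the two cancer types.
parent_labels = ["breast", "breast", "ovarian", "ovarian"]
decrease = gini_decrease(parent_labels, ["breast", "breast"], ["ovarian", "ovarian"])
```

A gene whose splits repeatedly give large decreases would rank high in a variable-importance plot of this kind.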
5) Deep Learning & Computer Vision: Bounding box algorithms are useful for localizing image patterns. Recently, applying convolutional neural networks to X-ray images has proven a promising disease prediction technique. However, pattern localization, beyond prediction alone, has always been a challenging task because of inconsistent coordinates, sizes, resolutions and capture positions across images. In this article, we present a unique approach that uses SSD with a VGG-16 network as the backbone for feature detection in a bounding box algorithm to predict the location of an anomaly within a chest X-ray image (article & article).
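Bounding box detectors such as SSD are usually trained and scored with intersection over union (IoU), the overlap between a predicted box and a ground-truth box. The sketch below illustrates that measure only; it is not the group's pipeline, and the box format (x_min, y_min, x_max, y_max) and the example coordinates are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth anomaly boxes on an X-ray, in pixels.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)
```

Because IoU is a ratio of areas, it stays comparable across images of different sizes and resolutions.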
Tutorials
-
Linear Regression: The aim of linear regression is to model a continuous variable Y as a mathematical function of one or more X variable(s), so that we can use this regression model to predict the Y when only the X is known. This mathematical equation can be generalized as follows: Y = β1 + β2X + ϵ, where, β1 is the intercept and β2 is the slope. Collectively, they are called regression coefficients. ϵ is the error term, the part of Y the regression model is unable to explain. The video can be found here: Linear Regression in R & source-code can be found here: Linear_Regression_R_Code
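The linked tutorial uses R; as a language-neutral illustration, the sketch below fits Y = β1 + β2X by ordinary least squares in plain Python. The function name and the toy data are hypothetical, and the closed-form estimates used are slope = Sxy / Sxx and intercept = mean(Y) - slope × mean(X).

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data scattered around the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
beta1, beta2 = fit_simple_ols(xs, ys)

# The residuals are the part of Y the fitted line does not explain (the error term).
residuals = [yi - (beta1 + beta2 * xi) for xi, yi in zip(xs, ys)]
```

With an intercept in the model, the residuals of a least-squares fit sum to zero up to rounding, which is a quick sanity check on any implementation.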
-
Tree Based Models: Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a set of data, while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. Classification and regression trees (as described by Breiman, Friedman, Olshen, and Stone) can be generated through the rpart package. In R, a decision tree uses a complexity parameter (cp), which measures the tradeoff between model complexity and accuracy on the training set. A smaller cp leads to a bigger tree, which might overfit the data. Conversely, a large cp value might underfit the data. Underfitting occurs when the model does not capture the underlying trends properly. The video can be found here: Tree based models in R & source-code can be found here: Tree based models code
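The effect of cp can be illustrated with a cost-complexity criterion of the form R(T) + cp × |T| × R(root), where R is the training risk and |T| the tree size. The sketch below is a simplified illustration of that tradeoff rather than a reimplementation of rpart; the candidate subtrees, their risks and the function names are hypothetical.

```python
def penalised_risk(risk, n_leaves, cp, root_risk):
    """Training risk plus a size penalty scaled by cp and the root-only risk."""
    return risk + cp * n_leaves * root_risk


def best_subtree(candidates, cp):
    """Pick the (n_leaves, risk) pair with the lowest penalised risk.

    Assumes candidates[0] is the root-only tree (no splits).
    """
    root_risk = candidates[0][1]
    return min(
        candidates,
        key=lambda tree: penalised_risk(tree[1], tree[0], cp, root_risk),
    )


# Hypothetical nested subtrees: (number of leaves, training misclassification risk).
subtrees = [(1, 0.40), (2, 0.25), (4, 0.18), (9, 0.15)]

small_cp_choice = best_subtree(subtrees, cp=0.01)  # weak penalty
medium_cp_choice = best_subtree(subtrees, cp=0.1)
large_cp_choice = best_subtree(subtrees, cp=0.5)   # strong penalty
```

With these toy numbers, the weakest penalty keeps the 9-leaf tree and the strongest collapses to the root, which mirrors the overfitting and underfitting behaviour described above.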
-
Machine learning (ML) continues to grow in importance for many organizations across nearly all domains. Some example applications of machine learning in practice include: predicting the likelihood of a patient returning to the hospital (readmission) within 30 days of discharge; segmenting customers based on common attributes or purchasing behavior for targeted marketing; predicting coupon redemption rates for a given marketing campaign; and predicting customer churn so an organization can perform preventative intervention. In essence, these tasks all seek to learn from data. To address each scenario, we can use a given set of features to train an algorithm and extract insights. These algorithms, or learners, can be classified according to the amount and type of supervision needed during training. This tutorial focuses on implementing five of the most popular ML algorithms in R: Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF). The video can be found here: Machine learning in R & source-code can be found here: ML_techniques_in_R
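Of the five learners listed, k-Nearest Neighbors is the simplest to write out by hand. The sketch below is a plain-Python illustration rather than the tutorial's R code; the function name, the feature values and the class labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
features = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["low_risk", "low_risk", "low_risk", "high_risk", "high_risk", "high_risk"]

near_low = knn_predict(features, labels, (1.1, 1.0))
near_high = knn_predict(features, labels, (5.1, 5.0))
```

kNN needs no training step beyond storing the data, so it is a supervised learner whose cost falls entirely at prediction time.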
-
Introduction to Python: Syntax, Variables, Data types, Operators, Conditional testing, Bioinformatics Implementation: Tutorial1 Tutorial2 Tutorial3 Tutorial Code.
Machine Learning News
-
The Department of Defense (DoD), Army Materiel Command (AMC), HBCU/MI Equipment/Instrumentation grant award (2024–2025), Claflin University, Orangeburg, South Carolina, USA. Total Award (Direct and indirect costs): $410,596. Role: PI, Effort=100%. Basic, Applied, and Advanced Research in Science and Engineering: Intel Xeon NVIDIA GPU Server for Enhancing Computational Capabilities at Claflin University.
-
National Science Foundation (NSF) South Carolina Established Program to Stimulate Competitive Research (SC EPSCoR), RII Track-1 Award: AI-enabled Devices for the Advancement of Personalized and Transformative Healthcare in South Carolina (ADAPT, 2023-2027): Total Award (Direct and indirect costs) $20,000,000.00, Claflin Sub-award $414,000.00. Role: Co-PI, NSF Identifier: 000879633, Effort=100%. A multi-institutional grant collaboration between Clemson University and its partners Benedict College, The Citadel, Claflin University, College of Charleston, Francis Marion University, Medical University of South Carolina, South Carolina Research Authority, South Carolina State University, Tri-County Technical College, University of South Carolina, and Winthrop University. [*2024:16;2023:9],[**38, 40, 42, 43], GitHub, Poster. Funding supports Claflin University students **Caliese J. Beckford+π, **Mr. Oluwademiladeayo Ashade+π, and **Mr. Sabb, Dinari+π. Presentation, PPT, PPT. Thrust II: Techniques of computer vision for image analysis. Education and Workforce Development, Project Implementation Team.
-
National Science Foundation South Carolina Established Program to Stimulate Competitive Research (SC EPSCoR), RII Track-1 Award, AI-enabled Devices for the Advancement of Personalized and Transformative Healthcare in South Carolina (ADAPT), GAIN CRP Subaward, Clemson University (Grants for Applications in Industry and Networking Collaborative Research Program): AI-Enabled Construction of Aligned Collagen Using Two-Photon Techniques (2024-2025): Total Award (Direct and indirect costs) $70,000.00 [**43], Poster. Funding supports Claflin University students **Caliese J. Beckford+π and **Mr. Sabb, Dinari+π. Role: Co-PI, NSF Identifier: 000879633, (Direct costs, Effort=100%), [by Steve Ramirez, SC EPSCoR communications specialist.]. [**2024:47], GitHub, Poster 1, Poster 2.
-
Title III and Special Initiative Office SEED Funding, Claflin University, Orangeburg, South Carolina, USA: Biomarker Identification for the Diagnosis of Chronic Lymphocytic Leukemia (CLL) (2024): Total Award (Direct and indirect costs) $8,840.00. Students involved: **Lierra Rivera+π (presently a doctoral student in the Bioengineering Program at Clemson University), **Germari Cull+π, **Mr. Adrian Lockwood+π and **Kalyn Wesby+π. (Direct costs, Effort=100%) Report, GitHub.
-
**Priscilla Fatokun+π and **Oluwamayowa Owolabi+π receive $7,000 (Direct costs) research funding from the 2023 SC Independent Colleges & Universities SCICU Undergraduate Student/Faculty Research Program [**2023:39]! Role: PI (Effort=100%). Poster, Poster
-
ML Research receives support from Google HBCU Career Readiness Capacity Grant (2023): $20,000 (Direct costs). Role: PI (Effort=100%).*
-
******Ms. Smart & Ms. Etha from Sanders Middle School and Bamberg Ehrhardt High School will be working in the lab during summer 2022 through the Institutional Development Award (IDeA) Networks of Biomedical Research Excellence (INBRE) and South Carolina Established Program to Stimulate Competitive Research (EPSCoR) Research Education of Teachers (RET) program. Role: PI (Effort=100%).
-
An effort to set up a SMART-HOME database will be initiated with Dr. Liles at Claflin University, Orangeburg, South Carolina, USA and the Google Data Commons team. This pilot study will consist of a Claflin cohort using smartwatch devices to gather different EHR variables.
-
Congrats to **Joshua+π (presently a software engineer at Atlassian) for receiving $7,169 (Direct costs) research funding from the 2022 SC Independent Colleges & Universities SCICU Undergraduate Student/Faculty Research Program [**2022:30]! Poster. Claflin App Deploy: https://joshuakiplimo-resparse-resumematch-6a8pxp.streamlit.app/ Role: PI (Effort=100%).
-
ChestAi.org gets funds from Rothberg fund at Yale University, New Haven, Connecticut, USA! Poster Link.
-
ChestAi.org gets funds from Connecticut Entrepreneurship Foundation Fund, Connecticut, USA! Link Link
-
ChestAi.org receives Microsoft for Startups Founders Hub Azure Credits. Link Link
-
Oracle for Research Award: https://www.oracle.com/research/ Role: PI (Effort=100%).
-
Importance of understanding novel structural variants and machine learning, an editorial in SelectScience, by Charlotte Carter, Life sciences editor.**
-
We have consistently shown that machine learning can be used effectively to identify and characterize molecular gene signatures in the progression of benign tumors. See our presentation and Association for Computing Machinery publication from the 2nd International Conference on Artificial Intelligence and Information Systems. Presentation, PPT.*
-
Unsupervised techniques like clustering identify common check-point genes in 6 different cancer types. See our presentation and Springer Bioinformatics & Biomedical Engineering publication1 & publication2 from the International Work-Conference on Bioinformatics and Biomedical Engineering. Presentation, PPT.*
-
TechGuide Expert.*
-
A graph database for integrating data from multiple sources: LinkedImm (BioMed Central Bioinformatics).**
-
Our recent publication in Wiley's Chemical Biology and Drug Design shows that the common cancer biomarkers of breast and ovarian types can be effectively identified through artificial intelligence.**
-
Interesting presentation from our collaborator, Dr. Stanam, University of Iowa, Iowa City, USA at American Public Health Association APHA’s Annual Meeting and Expo. Poster Link.***
-
Sustainability is important and here’s how: Sustainable Solutions (Springer Sustainable Intelligent Systems) for Overcoming Transportation and Pollution Problems in Smart Cities. Poster Link.**
-
We should seek the best, and the best should be Scalable, Reliable and Robust! (Institute of Electrical and Electronics Engineers) ***
-
American Association for Cancer Research (AACR) was fun! Important work presented at AACR Atlanta: Predicting the prognosis for cancer patients with interleukins gene expression level. Poster Link.***
-
Machine Learning has been promising in predicting financial series, the direction of the medicine, stock market, macroeconomic variables, accounting balance sheet information and many other applications. Here’s one application in time-point forecasting data. Presentation, PPT, Springer Proceedings of Sixth International Congress on Information and Communication Technology***
-
Solution for an inexpensive and uncomplicated colony management software? We have it and here it is: RodentSQL: a software suite for colony management of animal protocols. (Future Science)***
-
Claflin Student **Moneefa Jones graduates in Spring 2021 with the thesis "CU-Park: an parking app for addressing parking solutions". Student **Emmanuel Ndubuisi graduates in Spring 2022 with the thesis "Protein Classification Using Embeddings from Language Models Trained on Amino Acid Sequences".
-
Faculty presentations at Sustainable Horizons Institute, California. Application of Neural Networks for Aiding Diagnosis of Lung Disorders. Poster Link, Video Product, App.*
-
Presenting a unique approach (BioRxiv) that uses a Single Shot Detector (SSD) with a VGG-16 network as the backbone for feature detection in a bounding box algorithm to predict the location of an anomaly within a chest X-ray image. Video*
-
Our new method (Springer Computational Vision and Bioinspired Computing) generates a significant improvement in validation loss with an increase in the labeled dataset size by around 42 times compared to the traditionally used self-learning semi-supervised techniques. Video, App, Product, Presentation, PPT.*
-
Claflin University, Orangeburg, South Carolina, USA, Summer Seed Research Grant (2023) for studying utilization of Machine Learning Techniques for Aiding Detection of Ischemic Stroke Lesion, Infarct Volumes, and Small-artery Occlusions. $10,000, Role: PI (Direct costs, Effort=100%). Pilot Funding. Springer Intelligent Sustainable Systems Paper.
-
Managing research effectively, 2023 online summer school by Narayanan A, Lahiri C & Pawar S, Center for Research, Innovation & Translation, Department of Biotechnology, Atmiya University, India. Presentation.*
-
AWS Machine Learning Bootcamp for Educators. (1, 2, 3) Certificate Certificate Amazon Web Services (AWS) Machine Learning Professional Development Intensive Bootcamp Part I for Teachers 2023-2024, The Coding School: Data Visualization, Machine Learning Models, Responsible Artificial Intelligence.*
-
Our recent Springer Bioinformatics and Biomedical Engineering work tests the importance of Cyclical Learning Rates (CLRs) for Improving Training Accuracies and Lowering Computational Cost. GitHub, App, Poster, Product (NSF RII Track-1 award funded project Role: Co-PI, NSF Identifier: 000879633, (Direct costs, Effort=100%)), Presentation, PPT, PPT*
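A cyclical learning rate moves the learning rate back and forth between a lower and an upper bound instead of decaying it monotonically. The sketch below shows the widely used triangular schedule as an illustration only; it is not taken from the publication, and the bounds, step size and function name are hypothetical.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate climbs linearly from base_lr to max_lr over `step_size`
    iterations, then falls back to base_lr over the next `step_size`.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


# Hypothetical bounds and half-cycle length, evaluated over one full cycle.
schedule = [
    triangular_clr(it, base_lr=0.001, max_lr=0.006, step_size=4)
    for it in range(9)
]
```

In a training loop the returned value would be assigned to the optimizer's learning rate at every iteration.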
-
ChestAi product gets listed on AI4HLTH from the United Nations Office for Project Services (UNOPS) Stop TB Partnership & the Foundation for Innovative New Diagnostics (FIND). Product. Link Link
-
Why does COVID-19 cause severe illness in some patients but not others? Cell Reports Medicine with Dr. Kleinstein, Yale University, New Haven, Connecticut, USA, Dr. Leying, Yale University, New Haven, Connecticut, USA, Dr. Arce, Dr. Kho, Harvard University, Cambridge, MA, USA.**
-
The Impact of Digital Learning on Minoritized and Poverty Affected College Students, Course Redesign Digital Learning & Courseware Equity in Higher Education, by Patricia O'Sullivan, Every Learner Everywhere.*