Research

1) Identifying cancer drug targets utilizing construction of complex interactome of proteins (article & article): The approach for CaI construction and analysis. (a) The databases BIOCARTA, PID, REACTOME and KEGG were used to extract pathways, which were then broken down into their protein constituents; any other pathways those proteins are involved in were also identified. (b) The meta-database STRING was used to find the interactions among all the proteins pooled above. (c) The large component of the CaI constructed from the pooled interactions, coloured by the 335 modules found with the Rosvall algorithm, with node size proportional to degree. (d) Clockwise: the analyses of power law, K-core, and inter- and intra-modular connectivity for the constructed CaI, followed by drug status plotted against centrality measures.
The three SPINs and the GPIN of A. baumannii, reflecting the degree of connectivity. SPINs are shown as blue spheres connected by blue curved lines for (A) VaCAB, with vaccine candidates; (B) ViFAB, with virulence factors; and (C) KeFAB, with key factors, each with their interactors. (D) The GPIN, with proteins shown as black spheres connected by black curved lines to form the interactome.
The technical analysis of the constructed CaI. (a) Complementary Cumulative Degree Distribution (CCDF) of the CaI, showing power-law behaviour. (b) K-core analysis of the CaI, showing the size of each k-shell (the number of proteins appearing in the k-core but not in the (k + 1)-core) from the periphery (k = 1) to the inner core (k-max). (c) Classification of CaI proteins into roles (R) based on their position in the P-Z space, which is divided into 7 categories of hub and non-hub nodes. Non-hub nodes are assigned as ultra-peripheral (R1), peripheral (R2), non-hub connector (R3) and non-hub kinless nodes (R4); hubs are assigned as provincial (R5), connector (R6) and kinless hubs (R7), as described by Guimera et al.
Kinless hub nodes are thought to be functionally important because they are highly connected both within their own module and across modules.
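The P-Z role assignment described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the article: the function names and the toy graph are hypothetical, the population standard deviation is an assumption, and the thresholds are the values usually quoted from Guimera and Amaral's cartography, so they should be checked against the cited paper before reuse.

```python
from collections import defaultdict
from statistics import mean, pstdev


def participation_coefficient(neighbors, module_of, node):
    """P = 1 - sum over modules s of (k_is / k_i)^2."""
    k = len(neighbors[node])
    if k == 0:
        return 0.0
    links_per_module = defaultdict(int)
    for nb in neighbors[node]:
        links_per_module[module_of[nb]] += 1
    return 1.0 - sum((c / k) ** 2 for c in links_per_module.values())


def within_module_z(neighbors, module_of, node):
    """z-score of the node's within-module degree among its module peers."""
    def within_degree(n):
        return sum(1 for nb in neighbors[n] if module_of[nb] == module_of[n])

    peers = [n for n in neighbors if module_of[n] == module_of[node]]
    degrees = [within_degree(n) for n in peers]
    sd = pstdev(degrees)  # population standard deviation (an assumption)
    if sd == 0:
        return 0.0
    return (within_degree(node) - mean(degrees)) / sd


def role(z, p):
    """Map (z, P) to R1..R7; thresholds assumed from Guimera and Amaral."""
    if z < 2.5:  # non-hub nodes
        if p <= 0.05:
            return "R1"  # ultra-peripheral
        if p <= 0.62:
            return "R2"  # peripheral
        if p <= 0.80:
            return "R3"  # non-hub connector
        return "R4"      # non-hub kinless
    if p <= 0.30:
        return "R5"      # provincial hub
    if p <= 0.75:
        return "R6"      # connector hub
    return "R7"          # kinless hub


# Toy graph (hypothetical): node "a" links to one node in each of two modules.
neighbors = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
module_of = {"a": 0, "b": 0, "c": 1}
p_a = participation_coefficient(neighbors, module_of, "a")
z_a = within_module_z(neighbors, module_of, "a")
print(p_a, z_a, role(z_a, p_a))
```

A kinless hub (R7) in this scheme has both a high within-module z-score and links spread almost evenly over many modules, which is the "high connection within and between modules" property noted above.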

2) Convolutional Neural Networks: Analysis of drug resistance from deep sequencing data (article). Restricted Boltzmann Machines are an effective machine learning tool for classification of genomic and structural data. They can also be used to compare resistance profiles of different protease inhibitors.

3) Prognostic Model Predicts Survival in Cancer Patients (article): An optimal prognostic model combining six mRNAs was established. Kaplan–Meier survival analysis showed that the model stratified patients by risk effectively in both datasets. The area under the ROC curve (AUC) was > 0.65 for the training and validation datasets, indicating reasonable sensitivity and specificity for this model. The model was also shown to be superior for investigating prognostic biomarkers.
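For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it that way for a binary outcome. It is an illustration only: the function name, labels and scores are made up, and survival studies often use time-dependent ROC curves, which this simplified version does not cover.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for an event (e.g. high risk), 0 otherwise.
    scores: model risk scores; ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical data: four patients, two with the event.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(example_labels, example_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.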

4) Regression and Tree Based Classification Models: Common cancer biomarkers identified through artificial intelligence (article). Identification of biomarker genes. (a) Heat map showing expression levels of the top 25 cancer biomarker genes in ovarian and breast cancer types, (b) variable importance with gene ranks for all the genes, (c) mean decrease in Gini for the top 25 biomarker genes.

5) Deep Learning & Computer Vision: Bounding box algorithms are useful for localizing patterns in images. Recently, applying convolutional neural networks to X-ray images has proven a promising technique for disease prediction. However, localizing a pattern, as opposed to only predicting its presence, remains challenging because of inconsistent coordinates, sizes, resolutions and capture positions across images. In this article, we present an approach that uses SSD with a VGG-16 network as the backbone for feature detection, so that a bounding box algorithm can predict the location of an anomaly within a chest X-ray image (article & article).

Home: Class Overview

Home: Tests & Assignments

Tutorials

Linear Regression: The aim of linear regression is to model a continuous variable Y as a mathematical function of one or more X variable(s), so that the regression model can be used to predict Y when only X is known. The equation can be written as Y = β1 + β2X + ϵ, where β1 is the intercept and β2 is the slope. Collectively, they are called regression coefficients. ϵ is the error term, the part of Y that the regression model is unable to explain. The video can be found here: Linear Regression in R, and the source code can be found here: Linear_Regression_R_Code
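As a small worked example of Y = β1 + β2X + ϵ, the sketch below estimates β1 and β2 by ordinary least squares for a single X variable. It is written in plain Python for illustration and is not the linked R code; the function names and the data are made up.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta1, beta2): least-squares intercept and slope for one X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta2 = sxy / sxx                # slope
    beta1 = y_mean - beta2 * x_mean  # intercept
    return beta1, beta2


def predict(beta1, beta2, x_new):
    """Predict Y for a new X using the fitted coefficients."""
    return beta1 + beta2 * x_new


# Hypothetical data lying exactly on Y = 1 + 2X, so the error term is zero.
x_values = [1, 2, 3, 4]
y_values = [3, 5, 7, 9]
beta1, beta2 = fit_simple_linear_regression(x_values, y_values)
print(beta1, beta2, predict(beta1, beta2, 5))
```

With real data the points do not fall exactly on the line, and the residuals (observed Y minus predicted Y) are the estimates of ϵ.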
Tree Based Models: Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a data set while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. Classification and regression trees (as described by Breiman, Friedman, Olshen, and Stone) can be generated through the rpart package. In rpart, a decision tree is controlled by a complexity parameter (cp), which measures the tradeoff between model complexity and accuracy on the training set. A smaller cp leads to a bigger tree, which might overfit. Conversely, a large cp might underfit. Underfitting occurs when the model does not capture the underlying trends properly. The video can be found here: Tree based models in R, and the source code can be found here: Tree based models code
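The effect of cp can be illustrated with a toy regression tree on one predictor. The sketch below keeps a split only if it reduces the sum of squared errors by at least cp times the root node's error, which mirrors the rule rpart documents for anova trees. It is a simplification written for this page: rpart's other controls (such as minsplit and cross-validated pruning) are not reproduced, and all function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(points):
    """points: (x, y) pairs sorted by x. Return (threshold, sse_after) or None."""
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = ((points[i - 1][0] + points[i][0]) / 2, total)
    return best


def grow(points, root_sse, cp):
    """Recursively split and return the number of leaves in the tree."""
    points = sorted(points)
    split = best_split(points)
    if split is None or root_sse == 0:
        return 1
    threshold, sse_after = split
    node_sse = sse([y for _, y in points])
    # Reject the split if its relative improvement is below cp.
    if (node_sse - sse_after) / root_sse < cp:
        return 1
    left = [p for p in points if p[0] < threshold]
    right = [p for p in points if p[0] >= threshold]
    return grow(left, root_sse, cp) + grow(right, root_sse, cp)


def count_leaves(points, cp):
    return grow(points, sse([y for _, y in points]), cp)


# Hypothetical data with one clear jump between x = 3 and x = 4.
data = [(1, 1.0), (2, 1.2), (3, 0.9), (4, 5.0), (5, 5.3), (6, 4.8)]
for cp in (1.0, 0.5, 0.0):
    print(cp, count_leaves(data, cp))
```

On this data a large cp leaves the tree unsplit, a moderate cp keeps only the split at the jump, and cp = 0 grows one leaf per observation, which is the overfitting case described above.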
Machine learning (ML) continues to grow in importance for many organizations across nearly all domains. Example applications of machine learning in practice include predicting the likelihood of a patient returning to the hospital (readmission) within 30 days of discharge; segmenting customers based on common attributes or purchasing behavior for targeted marketing; predicting coupon redemption rates for a given marketing campaign; and predicting customer churn so that an organization can intervene preventatively. In essence, these tasks all seek to learn from data. To address each scenario, we can use a given set of features to train an algorithm and extract insights. These algorithms, or learners, can be classified according to the amount and type of supervision needed during training. This tutorial focuses on implementing five popular ML algorithms in R: Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF). The video can be found here: Machine learning in R, and the source code can be found here: ML_techniques_in_R
Introduction to Python: Syntax, Variables, Data types, Operators, Conditional testing, Bioinformatics Implementation: Tutorial1 Tutorial2 Tutorial3 Tutorial Code.