Faizy Ahsan

I'm a PhD candidate at the School of Computer Science, McGill University, under the supervision of Prof. Mathieu Blanchette and Prof. Doina Precup. My research interests lie in developing advanced machine learning techniques for genomics and sequence analysis. You can find my latest resume here.


Advanced Machine Learning for Biological Sequence Analysis and Gene Regulation: The study of gene regulation is an active research area and holds the key to understanding many biological mechanisms and to preventing diseases. The transcription phase is particularly crucial: proteins called transcription factors bind to specific genomic sequences and thereby initiate gene regulation.

Various experimental procedures have been developed to identify the specific genomic regions that are bound by transcription factors. Because these experiments are costly and time-consuming, computational approaches have been developed to predict these protein-DNA interactions. Recently, machine learning methods, especially deep learning techniques, have been shown to outperform classical computational approaches in various research areas.

However, the sequence data required to train such machine learning models is often insufficient. In such cases, the data can be augmented with extant orthologous and ancestral sequences. Models that combine sequences from different species can take advantage of advanced machine learning techniques and open the door to the evolutionary study of functional genomic regions. The goal of this research project is to summarize the topics related to the computational analysis of transcriptional regulation, with a focus on cutting-edge machine learning approaches.

Cell Type Prediction of Transcription Factor Binding Sites using Machine Learning: For my master's research project, we proposed a machine learning approach to predict the particular cell type in which a given transcription factor binds a DNA sequence. The models are trained on DNA sequences from the publicly available ChIP-seq experiments of the ENCODE project, covering 52 transcription factors across the GM12878, K562, HeLa, H1-hESC and HepG2 cell lines. We used three feature extraction methods, based on k-mer representations, counts of known motifs, and the skip-gram model, which has become very popular in text analysis. For the classification task we used support vector machines (SVM), k-means and logistic regression. We found that predictors based on known motif counts detect cell-type-specific signatures better than a previously published method, with a mean AUC improvement of 0.18, and that they can be used to identify interactions between transcription factors. Remarkably, the skip-gram approach, which requires no prior knowledge about transcription factor binding sites, performs almost as well as the motif-based method. Overall, our family of predictors should be useful both for better predicting cell-type-specific transcription factor occupancy and for understanding the mechanisms underlying this phenomenon.
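As a rough illustration of the first feature type, a k-mer representation maps each DNA sequence to a fixed-length vector counting every possible substring of length k. The sketch below is a minimal, hypothetical Python version: the function name `kmer_counts`, the choice of k, and the decision to skip k-mers containing ambiguous bases such as N are assumptions for illustration, not the code used in the project.

```python
from itertools import product


def kmer_counts(sequence, k=3):
    """Count every possible DNA k-mer in `sequence` (hypothetical helper).

    Returns the k-mer vocabulary (all 4**k k-mers in lexicographic order)
    and a parallel list of counts. Windows containing characters other
    than A, C, G or T (e.g. N) are skipped; this is an assumption.
    """
    # Fixed vocabulary so that every sequence maps to the same vector layout.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocabulary)}

    counts = [0] * len(vocabulary)
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:
            counts[index[kmer]] += 1
    return vocabulary, counts


vocab, features = kmer_counts("ACGTACGA", k=2)
print({kmer: c for kmer, c in zip(vocab, features) if c})
# prints {'AC': 2, 'CG': 2, 'GA': 1, 'GT': 1, 'TA': 1}
```

Vectors of this kind, one per sequence, can then be passed to a standard classifier such as logistic regression.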

The thesis was approved in 2016, and the results were published at the ACM-BCB 2016 conference.

Ahsan, Faizy, Doina Precup, and Mathieu Blanchette. "Prediction of Cell Type Specific Transcription Factor Binding Site Occupancy." Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM, 2016.


Master of Science September 2013 - May 2016

School of Computer Science, McGill University

Bachelor of Technology July 2008 - April 2012

Computer Science and Engineering, Indian Institute of Technology Jodhpur


TandemLaunch Inc., Montreal

Machine Learning Researcher (May - August 2015)

With SensingDynamics, we developed the front end and back end of a software system that recognizes smells, using Microsoft Azure, R Shiny, C and PHP.

Desautels Faculty of Management

Operations Management (Feb - August 2015)

Under the guidance of Prof. Saibal Ray and Prof. Shanling Li, we developed software to forecast the quantity of goods to be stocked in retail stores, using machine learning and data mining tools with Microsoft Access, R and Shell.

National Institute of Informatics, Tokyo

Data Mining Internship (Feb - August 2014)

Under the supervision of Prof. Michael Houle, we developed clustering algorithms for high-dimensional datasets using C, C++, Shell and MATLAB.

Center for Artificial Intelligence and Robotics, Bangalore

Scientist B (July 2012 - July 2013)

We were involved in the development and implementation of Classification and Regression Trees (CART) using the Gini and Twoing splitting criteria in MATLAB and C.
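For readers unfamiliar with the two CART splitting criteria, the following minimal Python sketch scores a candidate binary split with the standard textbook definitions: the Gini impurity is i(t) = 1 - sum_j p(j|t)^2, the Gini criterion is the impurity decrease from parent to children, and the twoing criterion is (p_L * p_R / 4) * (sum_j |p(j|t_L) - p(j|t_R)|)^2. It is an illustrative sketch only, not the MATLAB/C implementation developed at CAIR; all function names are hypothetical, and both child nodes are assumed to be non-empty.

```python
from collections import Counter


def class_proportions(labels):
    """Map each class label to its proportion p(j|t) within a node."""
    total = len(labels)
    return {cls: n / total for cls, n in Counter(labels).items()}


def gini_impurity(labels):
    """Gini impurity i(t) = 1 - sum_j p(j|t)^2 of a single node."""
    return 1.0 - sum(p * p for p in class_proportions(labels).values())


def gini_decrease(left, right):
    """Impurity decrease i(t) - p_L*i(t_L) - p_R*i(t_R) for a binary split."""
    parent = left + right
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return (gini_impurity(parent)
            - p_left * gini_impurity(left)
            - p_right * gini_impurity(right))


def twoing(left, right):
    """Twoing criterion (p_L * p_R / 4) * (sum_j |p(j|t_L) - p(j|t_R)|)^2."""
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    prop_left, prop_right = class_proportions(left), class_proportions(right)
    # Classes absent from one child have proportion 0 in that child.
    classes = set(prop_left) | set(prop_right)
    spread = sum(abs(prop_left.get(c, 0.0) - prop_right.get(c, 0.0))
                 for c in classes)
    return p_left * p_right / 4.0 * spread ** 2


left = ["a", "a", "a", "b"]
right = ["b", "b", "a", "b"]
print(gini_decrease(left, right), twoing(left, right))
# prints 0.125 0.0625
```

A tree-growing routine would evaluate one of these scores for every candidate split of a node and keep the split with the highest value.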

Computational Research Lab, Pune

Summer Intern (May - July 2011)

Development of the front end and back end of Chipmunk (Secure Data Transfer Appliance of CRL) using C, Shell, Apache, MySQL, PHP and HTTP.


At McGill University, I have been a teaching assistant for the following courses:

Comp 250: Introduction to Computer Science

Fall 2014, Winter 2015, Fall 2015, Winter 2016, Winter 2017

Comp 251: Algorithms and Data Structures

Fall 2014

Comp 307: Principles of Web Development

Fall 2016

Comp 421: Database Systems

Winter 2016, Winter 2017


Room 3140, Trottier Building, 3630 University

Montreal, QC, Canada, H3A 0C6