Accelerating Scientific Discovery in Biomedicine Using Artificial Intelligence

Volodymyr Kuleshov - Post-Doctoral Researcher, Computer Science, Stanford University

March 1, 2018, 10 a.m. to 11 a.m.



One of the greatest promises of artificial intelligence (AI) is to accelerate the creation of knowledge by assisting scientists in their work. This talk will explore how AI can achieve this vision, focusing on the field of genomics and its applications in personalized medicine.


In the first part of the talk, I will argue that the cost of scientific experiments can be dramatically reduced by training probabilistic generative models on vast amounts of existing experimental data. Specifically, I will describe a new genome sequencing technology that reduces the cost of genome haplotyping by up to tenfold by augmenting an existing wet-lab protocol with a statistical model of the genome. I will also discuss extensions of this technology to the analysis of the human gut microbiome, and show how it enables scientists, for the first time, to study the fine-grained variation among individual microbial strains.


The second part of the talk will focus on the problem of collecting and synthesizing scientific knowledge from the academic literature. I will present GWASkb, a machine reading system that gives researchers access to the results of thousands of genotype-phenotype association studies in the form of a structured database. Surprisingly, even the largest manual literature curation efforts miss thousands of useful associations, which are then effectively lost for many purposes, including disease risk prediction. GWASkb is the largest automated curation effort in its domain and is made possible by a novel paradigm for constructing machine learning systems called data programming.


The last part of the talk will look at how to transform biomedical knowledge into useful probabilistic predictions. Ideal predictions are typically defined as being precise (low variance) and calibrated (an event forecast with 80% probability occurs 80% of the time). I will describe algorithms for constructing such forecasts when the data are not i.i.d. but are instead chosen by a malicious adversary. These algorithms help clinicians reliably estimate disease risk from genetic data, and also improve time series forecasting and reinforcement learning systems.
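To make the calibration criterion concrete, here is a minimal sketch of how one might check it after the fact: group forecasts into probability bins and compare each bin's average forecast with the observed event frequency. The function names, the equal-width binning, and the expected calibration error summary are illustrative assumptions; this only measures calibration on a fixed batch of forecasts and is not the adversarial forecasting algorithm described in the talk.

```python
from collections import defaultdict


def reliability_table(probs, outcomes, n_bins=5):
    """Hypothetical helper: bin forecasts into equal-width probability bins and
    return (mean forecast, observed frequency, count) for each non-empty bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # A forecast of exactly 1.0 is placed in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        table.append((mean_p, freq, len(pairs)))
    return table


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Count-weighted average gap between forecast and observed frequency.
    A value near 0 means events forecast at p% occur roughly p% of the time."""
    n = len(probs)
    return sum(
        count / n * abs(mean_p - freq)
        for mean_p, freq, count in reliability_table(probs, outcomes, n_bins)
    )


# Ten forecasts of 80%, eight of which come true: calibrated.
calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
# Ten forecasts of 90%, only five of which come true: overconfident.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
print(round(calibrated, 3), round(overconfident, 3))
```

In the first case the 80% forecasts come true 80% of the time, so the gap is essentially zero; in the second the 90% forecasts come true only half the time, giving a gap of about 0.4. Calibration alone says nothing about precision: always forecasting the overall base rate can be calibrated yet uninformative, which is why the two properties are stated together.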


In summary, my work will help scientists study the human genome and turn their discoveries into personalized medical technologies.



Volodymyr Kuleshov is a post-doctoral researcher in the group of Stefano Ermon in the Department of Computer Science at Stanford University. He obtained his Ph.D. in Computer Science from Stanford University, advised by Serafim Batzoglou and Michael Snyder. His research lies at the intersection of machine learning and computational biology and focuses on two high-level goals: building intelligent tools that accelerate scientific discovery in biomedicine, and developing the core machine learning techniques that make such tools possible. Volodymyr has been awarded an NSERC Post-Graduate Fellowship and a Stanford Graduate Fellowship. His research has been covered in Nature Biotechnology, Scientific American, GenomeWeb, and Science Daily. His online lecture notes on probabilistic graphical models have been viewed more than 125,000 times by over 28,000 readers across the world and are used at Stanford. Part of his research on statistical genome phasing has been licensed commercially and now powers the Phased Sequencing product of Illumina, Inc., following Illumina's acquisition of the startup Moleculo.