Research Summary: |
I have worked in machine learning, high-dimensional data analysis, and computational biology for over a decade. I am currently an adjunct instructor in the division of Bioinformatics-Biostatistics at the Cancer Center of the Johns Hopkins University. My primary research focus is the study and development of machine learning techniques for analyzing high-dimensional biological data, from the perspectives of both biological application and theoretical statistics. The advent of high-throughput biological data, combined with improvements in computational power, has opened new opportunities for creating knowledge in biology and medicine through informatics. I aim to address the new challenges that have emerged for pattern recognition algorithms, including inaccuracy caused by a limited number of samples when the data are high-dimensional, computational complexity, and lack of interpretability.
A prominent theme in my research, and one that continues to fascinate me, is the use of a small set of measurements to address these challenges. Such strategies give rise to algorithms for high-dimensional data that are simple, interpretable, robust, and computationally efficient, yet accurate. My research along these lines has proceeded in two distinct directions: (1) modeling cancer phenotypes from genomic measurements, especially biomarker discovery from transcriptome, circulating tumor DNA, and mutational landscape data; and (2) detecting dysregulation in sets of genomic expression data through their variation. Invariably, I have implemented these novel methodological approaches as R packages and made them freely and publicly available (usually through the Bioconductor website), enabling the bioinformatics community to experiment with them in their own research.
|