About me

I am a PhD candidate at the Center for Computational Molecular Biology at Brown University, advised by Dr. Lorin Crawford and Dr. Jeffrey Bailey. My work sits at the intersection of machine learning, Bayesian statistics, and translational biology.

My research focuses on a single through-line: how do we build AI systems that extract reliable, interpretable signal from the messy, heterogeneous biological data that actually exists in the real world? That means genomic surveillance data with sparse sampling, multi-omics cohorts with missing modalities, and clinical datasets with confounded outcomes.

I work closely with domain experts, including biologists, clinicians, and public health scientists, to ensure that models not only perform well in silico but also directly improve how decisions are made in biotechnology, pharmaceutical research, and public health settings.

In summer 2025, I was a Research Intern at Microsoft Research New England. I am currently a Visiting PhD Researcher at Oxford University and Imperial College London, and I will be a Computational Biology Intern at Takeda Pharmaceutical (Jun–Aug 2026). I am a 2025-2026 Frontera Computational Science Fellow (Texas Advanced Computing Center). My work has been cited in the WHO World Malaria Report 2025.

Research Interests

  • Interpretable ML
  • Generative models
  • Biomarker discovery
  • Drug discovery
  • Multi-omics integration
  • Clinical decision-making
  • Translational AI
  • Precision medicine
  • Infectious disease

Mentors

I have over seven years of research training, advised by:


Key research projects

Transformer-Based Generative Model for Population Genetic Inference

A forward-time generative simulator and transformer-based deep learning model for simulation-based inference of population genetic dynamics. Generates large-scale synthetic biological sequence trajectories under realistic evolutionary dynamics, enabling downstream ML inference without explicit likelihoods. A novel framework applicable to genomic and protein sequence modeling.

Multimodal AI for Personalized Blood Biomarker Prediction

A multimodal generative framework that integrates genomic, clinical, and environmental data to predict blood biomarkers, moving beyond genetic determinism to provide quantitative, modality-level interpretability. Enables actionable insights into which data types drive disease-linked predictions, directly supporting personalized medicine pipelines.

Spatiotemporal Bayesian Model for Drug Resistance Surveillance

A Bayesian spatiotemporal ML framework that maps the continuous prevalence of antimalarial drug resistance mutations across Africa from sparse, heterogeneous genomic surveillance data. Provides calibrated uncertainty quantification for early hotspot detection, before clinical efficacy is lost, enabling proactive drug strategy decisions at scale.

Preprint. Code.

Clinical Relevance of Toxicity Metrics in Drug Combination Models

Systematic evaluation of whether ML-driven synergy scores and toxicity metrics in cancer drug combination models reflect clinically observed adverse interactions. Integrated large-scale synergy datasets with curated clinical toxicity databases, revealing a critical gap: optimizing for synergy alone can inadvertently prioritize toxic combinations. Directly informs safer ML-guided drug discovery pipelines.

Paper in Bioinformatics. Code.

Bayesian Mixed-Effects Modeling for Drug-Resistance Forecasting

A Bayesian mixed-effects model that estimates the speed of antimalarial drug-resistance spread across Uganda and Southeast Asia from sparse genomic surveillance data. Forecasts show that combined resistance mutations could reach near fixation within a decade. These findings were cited in the WHO World Malaria Report 2025 and directly inform global drug policy. Demonstrates how calibrated, uncertainty-aware ML can turn incomplete real-world biological data into actionable public health decisions.

Paper in Lancet Microbe. Code.

Interpretable Evolutionary Modeling of Tumor Progression from scRNA-seq

An evolutionary ML framework integrating copy-number variation features from single-cell RNA-seq data to classify tumor evolutionary modes across cancer types. Random Forest ensemble learning captured non-linear interactions underlying clonal expansion, intratumoral heterogeneity, and treatment pressure, providing interpretable insights into cell-state dynamics for drug target identification.