Hello, I’m Michael. Welcome to my portfolio!
I am a highly motivated Ph.D. candidate in Bioinformatics, passionate about applying AI and machine learning to biomedicine. My dissertation applies machine learning to multimodal biological network modeling for drug target discovery. Over more than six years of bioinformatics research, I have built a solid foundation in machine learning, statistics, genomics and molecular biology, multiomics integration, and software development. I also have two years of experience developing AI applications with LLM frameworks such as LangChain and Pydantic AI, especially RAG/GraphRAG and multi-agent workflows.
Personal AI Projects
NVIDIA x Scverse Hackathon: I built GPU-accelerated single-cell genomics analysis tools that achieve 100x speedups over existing single-cell libraries. I also actively contribute to NVIDIA’s RAPIDS-singlecell repository on GitHub. GitHub link
BioRAG-lab: I am building a reinforcement learning coding assistant that helps reproduce published research code and test it on new datasets. GitHub link
PubmedRAG: I developed a RAG-based literature search tool that accelerates research by automatically curating relevant PubMed articles. GitHub link
Research Experience
Biomedical Knowledge GraphRAG: During my internship at RefinedScience, I built a research report agent that uses GraphRAG and SQLRAG over up-to-date biomedical knowledge graphs and drug market intelligence to generate reports that help prioritize drug rescue candidates for acute myeloid leukemia. I built LLM-as-judge evaluation frameworks and developed chatbot UIs in Dash and Streamlit that expose transparent reasoning steps, which I used to solicit and incorporate feedback from key opinion leaders.
Cell Attention is All You Need: I am currently exploring graph attention networks and single-cell foundation models such as Geneformer and CellPLM to learn cell-cell communication from single-cell and spatial omics data.
Machine Learning: I developed a machine learning model for scRNA-seq and spatial transcriptomics data that uses gradient boosting to infer gene regulation in cells. The method can model disease gene mechanisms and shows an AUROC improvement of up to 0.5 in predicting affected genes in gene knockout experiments. This method is published in iScience [link].
Database Management: I have also applied this method to over 1,300 public scRNA-seq datasets and created the first database of cell type gene regulatory networks (GRNs) across human and mouse tissues. I built it using a LAMP stack and a Neo4j graph database. Feel free to explore any gene regulatory network here.
Precision Medicine: I have integrated gene regulatory network modeling with genetic data to identify genetic factors in traumatic brain injury recovery, enabling personalized symptom prediction for patients with brain injury. These findings have been submitted to npj Systems Biology and Applications and are available as a preprint on medRxiv [link].
Drug Target Discovery: I have also applied network modeling to drug screen databases to identify drug-targetable genes implicated in resistance to cancer immunotherapy. I collaborated with cancer biologists to validate these findings in lung cancer cohorts. These findings have been submitted to Nature Communications and are available as a preprint on bioRxiv [link].
Representation Learning: I applied nonnegative matrix factorization to data from 24 cancer types in The Cancer Genome Atlas (TCGA) to identify latent epigenetic factors associated with patient survival. I also trained a neural network to stratify patients into high- and low-survival groups. These findings are published in Communications Biology [link] and were featured by the National Cancer Institute [link].
