Our lab uses advanced deep learning techniques, including large language models (LLMs), to study noncoding regulatory mutations associated with autism and other neurodevelopmental disorders. We’ve developed the CWAS framework (Kim et al. 2024) to analyze whole genome sequencing data from autism families. Currently, we’re integrating single-cell multiomics datasets from developing human brains to further refine our approach. By leveraging LLM-based deep learning models, we aim to systematically interpret noncoding mutations and their regulatory effects on neurodevelopment. Our research focuses on modeling complex interactions between genomic, epigenomic, and transcriptomic features to better predict functional consequences of noncoding variants. Through these efforts, we seek to enhance our understanding of neurodevelopmental risk and develop AI-driven tools for variant prioritization and mechanistic insights into autism and related disorders.
Our research focuses on building a large-scale single-cell RNA sequencing atlas to comprehensively map cellular states and gene regulatory networks across various biological systems. By integrating LLM-based foundation models, we develop an AI-driven virtual cell (AIVC) that enables the discovery of novel risk genes and their functional interactions. Through this approach, we systematically identify core gene regulatory networks and predict disease-associated mechanisms, with applications in neurodevelopmental disorders and autism. The AIVC provides a computational framework for high-fidelity simulations, in silico experimentation, and hypothesis-driven validation, accelerating discoveries in both fundamental biology and precision medicine.
Our lab investigates the genetic architecture of autism with a particular focus on East Asian populations, leveraging long-read whole genome sequencing to uncover previously inaccessible genomic variation. Since I established my research group at Korea University in 2019, we have concentrated on the genetic architecture of autism in Korean families. Our investigations encompass a wide range of genetic factors, including common, rare, and de novo variants, which we are analyzing within one of the largest East Asian cohorts for autism. Notably, our research has recently revealed sex-specific patterns in genetic risk factors among Korean autism families, suggesting that these differences may influence phenotypic severity and familial patterns in autism (Kim et al. 2024, Genome Medicine). We are currently analyzing long-read sequencing data from Korean autism families to further investigate the impact of complex genomic variations, including large insertions, deletions, and repeat expansions. Through this approach, we aim to build a more complete picture of the genetic architecture of autism and its phenotypic heterogeneity.
My research explores the extreme genetic heterogeneity that underlies complex human disorders. I have examined the hypothesis that multiple risk genes converge on a smaller number of crucial biological processes. To this end, I developed a computational prediction model to identify cohesive biological networks in autism (An et al. 2014). Integrating this model with in vitro functional characterization led to the identification and functional validation of key pathways in autism, including axonal guidance and the NRXN complex (Williams et al. 2018). Our lab has integrated large-scale whole-genome sequencing and transcriptomics datasets of human post-mortem cortex from fetal to adult stages to analyze the impact of genetic variation on gene expression in the developing cortex (Werling et al. 2020). Furthermore, we have applied our systems approach to understand core pathology, identify cancer subtypes, and characterize tumor progression (Heo et al. 2021). Multi-omics analysis, particularly through large-scale proteomics, is an emerging area in biomedical science and holds great promise for mechanistic and translational research. To that end, our lab has been developing analytical frameworks for multi-omics analysis of Korean lung cancer patients, including genomics, transcriptomics, proteomics, phospho-proteomics, and acetyl-proteomics, to characterize cancer subtypes and the tumor microenvironment.