From Molecules to Models
Computational biologist and ML scientist specializing in antibody engineering, protein structure prediction, and large-scale sequence analysis. Published in JCTC, JBC, and MICCAI. 6+ years bridging in silico predictions with wet-lab validation for next-generation therapeutics.
About
I'm a computational biologist who works at the intersection of biological sequence analysis, molecular simulation, and machine learning — designing generative models for antibody libraries, developing neural network force fields for atomistic simulation, and engineering the data infrastructure that makes it all scale.
At FairJourney Biologics, I built LLM-hybrid models for immune library design across VHH, scFv, and IgG formats, maintained three proprietary human-donor-sourced libraries with 10–80% hit rates, and ran the SQL/AWS backbone behind 200+ antibody discovery projects spanning 50M+ sequences. I worked directly with wet-lab teams to close design-build-test loops, cutting campaign cycles by ~40%.
My research roots are in the labs of Nobel laureates Michael Levitt and Roger Kornberg, where I co-developed hybrid NN force fields that enable classical molecular dynamics to achieve quantum-level accuracy — work published in JCTC and JPCA.
My path here wasn't linear: agricultural chemistry in Taipei, neuroscience and wet-lab research at Tsinghua, medical imaging AI in Beijing, protein biochemistry at USF — each step added a layer of domain intuition that pure ML engineers don't have. I don't just build models. I understand what they're modeling.
Selected Work
AI-Driven Antibody Library Engineering
FairJourney Biologics (Charles River Laboratories) · 2024–Present
What LLM-hybrid and geometric deep learning models for immune library design (VHH, scFv, IgG) on phage/yeast display platforms.
How Fine-tuned large language models on proprietary antibody sequences; sole maintainer of three human-donor-sourced immune libraries achieving 10–80% hit rates.
Impact Reduced campaign cycles by ~40% across 50+ therapeutic programs for 30+ oncology and immunology clients.
Antibody Discovery Data Infrastructure
FairJourney Biologics (Charles River Laboratories) · 2022–2025
What Production SQL databases and AWS-integrated ML-query interfaces for screening and NGS data.
How Databases spanning 200+ projects and 50M+ sequences; programmatically integrated PDB, IMGT, UniProt, and SRA with internal pipelines.
Impact Cut data retrieval time by 60%. Enabled scalable ML training workflows across the organization.
Hybrid Neural Network Force Fields
Freecurve / InterX Labs (Levitt & Kornberg Labs) · 2021–2025
What NN corrections to classical molecular force fields that capture nuclear quantum effects in biomolecular simulations.
How Geometric deep learning for atomistic simulation; achieved 0.5 kcal/mol MAE on ab initio PES for dipeptide conformations.
Impact Classical MD matches path-integral MD (PIMD) benchmarks, giving accurate protein structural modeling without the expensive PIMD overhead. Two publications: JCTC 2024, JPCA 2024.
Universal Lesion Detection in CT Imaging
Deepwise Healthcare, Beijing · 2019–2020
What Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) for universal CT lesion detection.
How Depthwise separable convolutions + group transform module; novel 3D pre-training using 2D natural image datasets.
Impact Improved sensitivity on DeepLesion by 3.48% (absolute) over the prior state of the art at FPs@0.5. Published at MICCAI 2020 (30 citations). Included clinical translation via on-site hospital visits.
Open-Source Projects
↗ Antibody PLM Benchmark
Systematic comparison of protein language models (ESM-2, AbLang, AntiBERTy) vs. BLOSUM62 for antibody variant fitness prediction. Key finding: after controlling for mutation count, no PLM significantly predicts continuous antibody fitness.
Python Antibody Engineering PLMs Benchmarking
↗ Peak-to-Gene Linkage Benchmark
Benchmarking 5 peak–gene linkage methods on single-cell colon multi-omics using stratified LD-score regression for IBD GWAS heritability. Paired multiome method concentrates IBD heritability 10–21× in 0.2–0.5% of SNPs.
Python Scanpy LDSC Single-Cell
↗ Amazon Multi-Method Occupancy Modeling
Hierarchical Bayesian occupancy modeling across 4 biodiversity survey methods in the Amazon. Multi-species MCMC with JAGS across 50 taxa.
Python R Bayesian JAGS multi-model
Publications
Chen, Y.C., Yang, J. "Two stages of substrate discrimination dictate selectivity in the E. coli MetNI-Q ABC transporter system." Journal of Biological Chemistry (JBC), 2025. · 2 citations
"Neural Network Corrections to Intermolecular Interaction Terms of a Molecular Force Field Capture Nuclear Quantum Effects in Calculations of Liquid Thermodynamic Properties." J. Chem. Theory Comput. (JCTC), 2024, 20(3):1347–1357. · 10 citations
"Combining Force Fields and Neural Networks for an Accurate Representation of Bonded Interactions." J. Phys. Chem. A (JPCA), 2024. · 5 citations
Zhang, S., Xu, J., Chen, Y.C., et al. "Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices." MICCAI, 2020, pp. 542–551. · 30 citations
Skills, Tools & Education
Core Expertise
Antibody Engineering & Drug Discovery: VHH / scFv / IgG design, CDR analysis, germline assignment, CADD, AlphaFold2/3, interface modeling, immune library design, phage & yeast display
Machine Learning & AI: Geometric deep learning, LLMs & foundation models, generative models (VAEs, diffusion), graph neural networks, NN force fields, sequence-structure modeling
Data Infrastructure & Scale: SQL/MySQL, AWS/GPU, 50M+ sequence pipelines, PDB/UniProt/IMGT/SRA integration, NGS & single-cell analysis
Programming & Tools
Python, PyTorch, TensorFlow, R, Perl, C++/CUDA, SQL
Biopython, Scanpy, AlphaFold2/3, PyMOL, ANARCI, IgBLAST, GROMACS
RDKit, Schrödinger
AWS, Git, Linux
Additional Background
Neuroscience (Tsinghua: neural circuits, memory, early drug screening), physical chemistry, gene editing, organic synthesis, battery simulation (ANSYS/Simulink)
Education
M.S. Data Science — MITx MicroMasters, 2023
M.S. Chemistry — University of San Francisco, 2022
M.S. Neuroscience — Tsinghua University, 2019
B.S. Agricultural Chemistry — National Taiwan University, 2016