From Molecules to Models


Computational biologist and ML scientist specializing in antibody engineering, protein structure prediction, and large-scale sequence analysis. Published in JCTC, JBC, and MICCAI. 6+ years bridging in silico predictions with wet-lab validation for next-generation therapeutics.

About

I'm a computational biologist who works at the intersection of biological sequence analysis, molecular simulation, and machine learning — designing generative models for antibody libraries, developing neural network force fields for atomistic simulation, and engineering the data infrastructure that makes it all scale.

At FairJourney Biologics, I built LLM-hybrid models for immune library design across VHH, scFv, and IgG formats, maintained three proprietary human-donor-sourced libraries with 10–80% hit rates, and ran the SQL/AWS backbone behind 200+ antibody discovery projects spanning 50M+ sequences. I worked directly with wet-lab teams to close design-build-test loops, cutting campaign cycles by ~40%.

My research roots are in the labs of Nobel laureates Michael Levitt and Roger Kornberg, where I co-developed hybrid NN force fields that enable classical molecular dynamics to achieve quantum-level accuracy — work published in JCTC and JPCA.

My path here wasn't linear: agricultural chemistry in Taipei, neuroscience and wet-lab research at Tsinghua, medical imaging AI in Beijing, protein biochemistry at USF — each step added a layer of domain intuition that pure ML engineers don't have. I don't just build models. I understand what they're modeling.

Selected Work

AI-Driven Antibody Library Engineering

FairJourney Biologics (Charles River Laboratories) · 2024–Present

What LLM-hybrid and geometric deep learning models for immune library design (VHH, scFv, IgG) on phage/yeast display platforms.

How Fine-tuned large language models on proprietary antibody sequences; sole maintainer of three human-donor-sourced immune libraries achieving 10–80% hit rates.

Impact Reduced campaign cycles by ~40% across 50+ therapeutic programs for 30+ oncology and immunology clients.

Antibody Discovery Data Infrastructure

FairJourney Biologics (Charles River Laboratories) · 2022–2025

What Production SQL databases and AWS-integrated ML-query interfaces for screening and NGS data.

How Databases spanning 200+ projects and 50M+ sequences; programmatically integrated PDB, IMGT, UniProt, and SRA with internal pipelines.

Impact Cut data retrieval time by 60%. Enabled scalable ML training workflows across the organization.

Hybrid Neural Network Force Fields

Freecurve / InterX Labs (Levitt & Kornberg Labs) · 2021–2025

What NN corrections to classical molecular force fields that capture nuclear quantum effects in biomolecular simulations.

How Geometric deep learning for atomistic simulation; achieved an MAE of 0.5 kcal/mol against the ab initio potential energy surface (PES) for dipeptide conformations.

Impact Classical MD matches path-integral MD benchmarks. Accurate protein structural modeling without expensive PIMD overhead. Two publications: JCTC 2024, JPCA 2024.

Universal Lesion Detection in CT Imaging

Deepwise Healthcare, Beijing · 2019–2020

What Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) for universal CT lesion detection.

How Depthwise separable convolutions + group transform module; novel 3D pre-training using 2D natural image datasets.

Impact Improved sensitivity at 0.5 false positives per image by 3.48% absolute over the previous state of the art on DeepLesion. Published at MICCAI 2020 (30 citations). Work included clinical translation through on-site hospital visits.

Open-Source Projects

Antibody PLM Benchmark

Systematic comparison of protein language models (ESM-2, AbLang, AntiBERTy) vs. BLOSUM62 for antibody variant fitness prediction. Key finding: after controlling for mutation count, no PLM significantly predicts continuous antibody fitness.

Python · Antibody Engineering · PLMs · Benchmarking
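One way to read "controlling for mutation count" is as a partial correlation: regress both the model score and the measured fitness on mutation count, then correlate the residuals. The sketch below assumes that reading and uses Pearson correlation on synthetic data; the function names and the data are illustrative and are not taken from the benchmark's code.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of y after a least-squares fit on x with an intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def partial_corr(score, fitness, n_mut):
    """Correlation of score and fitness with mutation count regressed out."""
    return pearson(residuals(score, n_mut), residuals(fitness, n_mut))

# Synthetic data (hypothetical): score and fitness both depend on mutation
# count plus independent noise, so they share no signal beyond that count.
rng = random.Random(0)
n_mut = [rng.randint(1, 9) for _ in range(2000)]
score = [-1.0 * m + rng.gauss(0, 1) for m in n_mut]
fitness = [-0.5 * m + rng.gauss(0, 1) for m in n_mut]

raw = pearson(score, fitness)
partial = partial_corr(score, fitness, n_mut)
print(f"raw r = {raw:.2f}, partial r = {partial:.2f}")
```

In this toy setup the raw correlation is substantial while the partial correlation is close to zero, which is the pattern that mutation count alone can produce.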

Peak-to-Gene Linkage Benchmark

Benchmarks five peak–gene linkage methods on single-cell colon multi-omics data, using stratified LD-score regression to measure IBD GWAS heritability enrichment. The paired multiome method concentrates IBD heritability 10–21× in 0.2–0.5% of SNPs.

Python · Scanpy · LDSC · Single-Cell
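In stratified LD-score regression, the enrichment of an annotation is the share of heritability it explains divided by the share of SNPs it contains. The short sketch below shows that arithmetic. Which fold value goes with which SNP fraction is not stated here, so the pairing in the example is purely illustrative, and the function names are hypothetical.

```python
def enrichment(prop_h2, prop_snps):
    """Fold enrichment: share of heritability / share of SNPs."""
    return prop_h2 / prop_snps

def implied_h2_share(fold, prop_snps):
    """Share of heritability implied by a fold enrichment and a SNP share."""
    return fold * prop_snps

# Illustrative pairing only: a 10x enrichment in 0.5% of SNPs
# would correspond to 5% of total SNP heritability.
share = implied_h2_share(10, 0.005)
print(f"implied heritability share: {share:.1%}")
```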

Amazon Multi-Method Occupancy Modeling

Hierarchical Bayesian occupancy modeling across 4 biodiversity survey methods in the Amazon. Multi-species MCMC with JAGS across 50 taxa.

Python · R · Bayesian · JAGS · multi-model

Publications

  1. Chen, Y.C., Yang, J. "Two stages of substrate discrimination dictate selectivity in the E. coli MetNI-Q ABC transporter system." Journal of Biological Chemistry (JBC), 2025. · 2 citations


  2. "Neural Network Corrections to Intermolecular Interaction Terms of a Molecular Force Field Capture Nuclear Quantum Effects in Calculations of Liquid Thermodynamic Properties." J. Chem. Theory Comput. (JCTC), 2024, 20(3):1347–1357. · 10 citations

  3. "Combining Force Fields and Neural Networks for an Accurate Representation of Bonded Interactions." J. Phys. Chem. A (JPCA), 2024. · 5 citations

  4. Zhang, S., Xu, J., Chen, Y.C., et al. "Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices." MICCAI, 2020, pp. 542–551. · 30 citations

Skills, Tools & Education

Core Expertise

Antibody Engineering & Drug Discovery: VHH / scFv / IgG design, CDR analysis, germline assignment, CADD, AlphaFold2/3, interface modeling, immune library design, phage & yeast display

Machine Learning & AI: Geometric deep learning, LLMs & foundation models, generative models (VAEs, diffusion), graph neural networks, NN force fields, sequence-structure modeling

Data Infrastructure & Scale: SQL/MySQL, AWS/GPU, 50M+ sequence pipelines, PDB/UniProt/IMGT/SRA integration, NGS & single-cell analysis

Programming & Tools

Python, PyTorch, TensorFlow, R, Perl, C++/CUDA, SQL

Biopython, Scanpy, AlphaFold2/3, PyMOL, ANARCI, IgBLAST, GROMACS

RDKit, Schrödinger

AWS, Git, Linux

Additional Background

Neuroscience (Tsinghua — neural circuits, memory, early drug screening)

Physical chemistry, gene editing, organic synthesis, battery simulation (ANSYS/Simulink)

Education

M.S. Data Science — MITx MicroMasters, 2023

M.S. Chemistry — University of San Francisco, 2022

M.S. Neuroscience — Tsinghua University, 2019

B.S. Agricultural Chemistry — National Taiwan University, 2016