I am a first-year Ph.D. student at the VRVC Lab, ShanghaiTech University, under the supervision of Professor Jingyi Yu. My research focuses on representation learning for biomolecules, including small molecules and proteins, as well as autoregressive generative models.
Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by resolving 3D structures of biomolecules at near-atomic resolution. However, revealing the continuous conformational heterogeneity from hundreds of thousands of noisy particle images remains challenging. Recent advances in heterogeneous reconstruction, often conducted in the Fourier domain, suffer from a lack of interpretability and are limited in achieving higher resolution in locally flexible regions. To address this issue, we propose CryoFormer, a novel approach for high-resolution and continuous heterogeneous cryo-EM reconstruction. CryoFormer leverages a feature volume in the real domain to capture fine-grained local changes. We then design a novel query-based transformer architecture that incorporates deformation-aware features and region-wise spatial features using a cross-attention mechanism. Our transformer-based pipeline further supports pose refinement and can automatically highlight flexible regions by visualizing 3D attention maps. Extensive experiments show that our method achieves the best performance on five datasets (two synthetic and three experimental). We also contribute a new synthetic dataset of the PEDV spike protein for more comprehensive evaluations. Both the code and the PEDV dataset will be released for better reproducibility.
In the past decade, deep conditional generative models have revolutionized the generation of realistic images, extending their application from entertainment to scientific domains. Single-particle cryo-electron microscopy (cryo-EM) is crucial in resolving near-atomic resolution 3D structures of proteins, such as the SARS-CoV-2 spike protein. To achieve high-resolution reconstruction, a comprehensive data processing pipeline has been adopted. However, its performance is still limited by the lack of high-quality annotated datasets for training. To address this, we introduce physics-informed generative cryo-electron microscopy (CryoGEM), which for the first time integrates physics-based cryo-EM simulation with generative unpaired noise translation to produce physically correct synthetic cryo-EM datasets with realistic noise. CryoGEM first simulates the cryo-EM imaging process based on a virtual specimen. To generate realistic noise, we leverage unpaired noise translation via contrastive learning with a novel mask-guided sampling scheme. Extensive experiments show that CryoGEM is capable of generating authentic cryo-EM images. The generated dataset can be used as training data for particle picking and pose estimation models, eventually improving the reconstruction resolution.
DRACO: Denoising-Reconstruction Autoencoder for CryO-EM
Yingjun Shen*, Haizhao Dai*, Qihe Chen, and 4 more authors
Advances in Neural Information Processing Systems, 2024
Foundation models in computer vision have demonstrated exceptional performance in zero-shot and few-shot tasks by extracting multi-purpose features from large-scale datasets through self-supervised pre-training methods. However, these models often overlook the severe corruption of cryogenic electron microscopy (cryo-EM) images by high noise levels. We introduce DRACO, a Denoising-Reconstruction Autoencoder for CryO-EM, inspired by the Noise2Noise (N2N) approach. By processing cryo-EM movies into odd and even images and treating them as independent noisy observations, we apply a denoising-reconstruction hybrid training scheme. We mask both images to create denoising and reconstruction tasks. Because dataset quality is essential for DRACO’s pre-training, we build a high-quality, diverse dataset from an uncurated public database, including over 270,000 movies or micrographs. After pre-training, DRACO naturally serves as a generalizable cryo-EM image denoiser and a foundation model for various cryo-EM downstream tasks. DRACO demonstrates the best performance in denoising, micrograph curation, and particle picking tasks compared to state-of-the-art baselines. We will release the code, pre-trained models, and the curated dataset to stimulate further research.
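The odd/even split and masking described above can be illustrated with a minimal NumPy sketch. This is not the DRACO implementation: the function names, patch size, and mask ratio are hypothetical, and the way the loss is divided (visible patches as the Noise2Noise-style denoising term, masked patches as the reconstruction term) is an assumption about how the two tasks might be separated.

```python
import numpy as np


def split_odd_even(movie):
    """Sum alternating frames of a movie with shape (frames, H, W).

    With 0-based indexing, frames 0, 2, 4, ... form the "even" image and
    frames 1, 3, 5, ... form the "odd" image. Both show the same signal
    with (approximately) independent noise.
    """
    even = movie[0::2].sum(axis=0)
    odd = movie[1::2].sum(axis=0)
    return odd, even


def random_patch_mask(height, width, patch, ratio, rng):
    """Return a boolean (H, W) mask; True marks pixels in masked patches."""
    assert height % patch == 0 and width % patch == 0
    gh, gw = height // patch, width // patch
    n_masked = int(round(ratio * gh * gw))
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Expand each patch-level entry to a patch x patch block of pixels.
    return grid.repeat(patch, axis=0).repeat(patch, axis=1)


def hybrid_loss(pred, target, mask):
    """Mean squared error split into two terms (assumed split, see lead-in)."""
    sq = (pred - target) ** 2
    denoise = sq[~mask].mean()  # visible patches: noisy input -> other half
    recon = sq[mask].mean()     # masked patches: must be inferred from context
    return denoise, recon


# Toy usage with random data standing in for a real movie and a real model.
rng = np.random.default_rng(0)
movie = rng.normal(size=(8, 16, 16))
odd, even = split_odd_even(movie)
mask = random_patch_mask(16, 16, patch=4, ratio=0.5, rng=rng)
masked_input = np.where(mask, 0.0, odd)  # hide the masked patches
# A trained network would map masked_input to a prediction; the masked input
# itself is used here only so that the loss can be evaluated.
denoise_term, recon_term = hybrid_loss(masked_input, even, mask)
```

In a symmetric setup, the roles of the odd and even images would simply be swapped to obtain a second training pair from the same movie.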