New protein-folding AI vastly expands on AlphaFold's efforts
The ESM Atlas, led by the Chan Zuckerberg Initiative's Biohub, expanded known protein structures by over 800 million, totaling over 1 billion predictions, using the open-source ESMFold2 AI model. ESMFold2 outperforms AlphaFold3 in predicting protein folding and interactions, particularly for protein complexes, aiding drug development and immunotherapy research.
A revolutionary artificial-intelligence model has expanded the known protein universe by more than 800 million structures, producing the largest open-access atlas of protein predictions to date. Researchers at the Chan Zuckerberg Initiative's Biohub in San Francisco unveiled the ESM Atlas, which includes predicted structures for over one billion proteins and sequences for an additional 6.8 billion. The achievement, led by Biohub science head Alex Rives, marks a significant leap beyond the AlphaFold Database, the widely used reference for protein structures developed by Google DeepMind. ESMFold2, the AI tool behind the atlas, is described in a newly published preprint and is claimed to outperform AlphaFold3 and other competing systems in predicting how proteins fold and interact.
The ESM Atlas is built on ESMFold2, an open-source AI model that leverages a protein language model trained on billions of protein sequences from across the tree of life. Unlike AlphaFold's dataset, which primarily includes well-studied proteins, the new atlas incorporates vast numbers of metagenomic sequences sourced from diverse environments such as soil and ocean ecosystems. This broader inclusion provides researchers with unprecedented access to previously uncharted regions of the protein universe, offering new opportunities to explore biological functions in underrepresented organisms. Biohub researchers report that ESMFold2 excels particularly in predicting the structures of protein complexes, including antibody-antigen interactions, which are critical for advancing immunotherapies and drug development.
The practical impact of this expansion is already evident. In laboratory tests, the team used ESMFold2 to design new antibodies and protein structures targeting proteins associated with cancers and immunological diseases. A high proportion of these computationally designed molecules folded correctly and exhibited the intended binding properties, demonstrating the model's potential to accelerate drug discovery. Rives emphasised that the atlas serves as a "powerful substrate for the discovery of new biology," enabling scientists to probe parts of protein space that were previously inaccessible.
While the Biohub's achievement represents a major milestone, it also underscores the rapidly evolving nature of protein-structure prediction. Competing open-source and proprietary models continue to emerge, with each iteration improving speed, accuracy, and scope. Yet the open-source nature of ESMFold2 distinguishes it, fostering broad collaboration and accelerating global scientific progress. As researchers begin mining this vast new dataset, the scientific community anticipates new insights into protein function, evolution, and therapeutic design, further unlocking the secrets of life's molecular machinery.

