Democratizing science with Boltz-1: the first truly open-source biomolecular structure prediction model
With models like AlphaFold3 limited to academic research, MIT researchers wanted to build an equivalent model that is fully open-source and available for commercial use, to encourage innovation beyond the realm of academia.
This year, the Nobel Committee for Chemistry recognized Demis Hassabis and John Jumper for their work on DeepMind’s AlphaFold2, certifying that AI-powered protein structure prediction achieves the lofty goal of providing “the greatest benefit to humanity.” Not surprisingly, this year’s release of AlphaFold3 was highly anticipated. But unlike its predecessor, AlphaFold3 is not fully open-source, nor is it available for commercial use, which prompted criticism from the scientific community upon its release in May. Now, MIT researchers have released Boltz-1, the first truly open-source biomolecular structure prediction model to achieve AlphaFold3-level accuracy.
Named after the Boltzmann distribution, a probability distribution that describes how likely a molecular system is to adopt each of its possible structures, Boltz-1 was developed over four months by a team of PhD students affiliated with the MIT Abdul Latif Jameel Clinic for Machine Learning in Health (MIT Jameel Clinic) and advised by MIT professors of Electrical Engineering and Computer Science (EECS) Regina Barzilay and Tommi Jaakkola. The three lead developers behind Boltz-1 are PhD students Jeremy Wohlwend and Gabriele Corso, along with MIT Jameel Clinic researcher Saro Passaro.
Left to right: Gabriele Corso, Jeremy Wohlwend, Saro Passaro.
Opening the model to industry was a deliberate choice. “Commercial entities use models in different ways, more directly tied to drug development,” says Wohlwend. “There is really a lot to be learned [about] how to improve those models by having [industry researchers] use [them].”
The researchers hope that Boltz-1 can serve as a foundation for protein design and virtual screening, and that it will help standardize research practices in structural biology for the global scientific community.
“The biggest challenge was the scale,” says Corso. “It’s very large. It’s definitely bigger than anything we’ve ever done in the lab before in terms of data, training and compute.”
Most academic experiments in this field use just a handful of graphics processing units (GPUs) at once, far fewer than the engineering feat of training Boltz-1 required. To secure more GPUs, the researchers sought help from the U.S. Department of Energy and, later, Genesis Therapeutics, which gave them the computational muscle needed to complete Boltz-1.
But the team’s obstacles weren’t limited to processing power. According to Wohlwend, another challenge they faced while developing Boltz-1 was the complexity of the data. “Structure databases like the PDB [Protein Data Bank], while amazing resources, are built for biologists and are not well suited to model training out of the box,” he explains. “There are many edge cases, and it takes a lot of trial and error to develop a good understanding of the data.” Wohlwend adds that it was “exciting to see what we were able to achieve with a small team.”
Mathai Mammen, CEO and President of Parabilis Medicines, calls Boltz-1 a “breakthrough” model. “By open-sourcing this advance, the MIT Jameel Clinic and collaborators are democratizing access to cutting-edge structural biology tools,” he says. “This landmark effort will accelerate the creation of life-changing medicines. Thank you to the Boltz-1 team for driving this profound leap forward!”
“Boltz-1 will be enormously enabling for my lab and the whole community,” says Jonathan Weissman, MIT professor of biology and member of the Whitehead Institute for Biomedical Research. “We will see a whole wave of discoveries made possible by democratizing this powerful tool.” Weissman adds that he expects the open-source nature of Boltz-1 to lead to a vast array of creative new applications.
The research team plans to keep improving Boltz-1 and invites researchers to try it through the team’s GitHub repository and to connect with other Boltz-1 users on the team’s Slack channel.
This work was also supported by the NSF Expeditions grant (award 1918839: Collaborative Research: Understanding the World Through Code); the Abdul Latif Jameel Clinic for Machine Learning in Health; the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) Threats program; and the MATCHMAKERS project supported by the Cancer Grand Challenges partnership financed by CRUK (CGCATF-2023/100001) and the National Cancer Institute.