Open letter to Nature editors complaining about the lack of code availability for AF3 (Publishing code is normally a prerequisite for publishing in Nature)
(Yes, I am fully aware of the irony of using a Google form to do this - not my idea, just sharing.)
Boosts Welcome!
#StructuralBiology #Alphafold #Crystallography #CryoEM #NMR @strucbio
https://docs.google.com/forms/d/e/1FAIpQLSf6ioZPbxiDZy5h4qxo-bHa0XOTOxEYHObht0SX8EgwfPHY_g/viewform
Letter to the Editor: AlphaFold3
We are submitting the following as a Letter to the Editor and will post the text immediately on Zenodo. If you would like to endorse this letter, please fill out the form below.
Authors:
Stephanie A. Wankowicz, UCSF
Pedro Beltrao, ETH
Benjamin Cravatt, Scripps
Roland Dunbrack, FCCC
Anthony Gitter, UW Madison
Kresten Lindorff-Larsen, Copenhagen
Sergey Ovchinnikov, MIT
Nicholas Polizzi, DFCI/HMS
Brian K. Shoichet, UCSF
James S. Fraser, UCSF
The publication of AlphaFold2 was a breakthrough moment for structural biology. Its impact has been far-reaching. Structure predictions for individual proteins opened new avenues for understanding biological systems and small molecule drug discovery. Large-scale prediction studies enabled evolutionary analyses and genetic variant interpretations. The open code was extended and modified for new methods and applications in protein design and protein-protein assembly prediction. These examples, among many, demonstrate how subsequent research and benchmarks have been made possible because the code and models were open and downloadable.
For these reasons, we were disappointed by the lack of code, or even executables, accompanying the publication of AlphaFold3 in Nature. Although AlphaFold3 expands AlphaFold2’s capabilities to include small molecules, nucleic acids, and chemical modifications, it was released without the means to test and use the software in a high-throughput manner. This does not align with the principles of scientific progress, which rely on the ability of the community to evaluate, use, and build upon existing work. The high-profile publication advertises capabilities that remain locked behind the doors of the parent company.
In this publication, several deviations from our community's standards stand out. First, the absence of available code compromises peer review, a cornerstone of scientific publication and a standard typically upheld by journals. Indeed, one of us (RD) was a reviewer and, despite repeated requests, was not given access to the code during the review. Second, the model's limited availability on a hosted web server, capped at ten predictions per day, restricts the scientific community's capacity to verify the broad claims of the findings or apply the predictions on a large scale. Specifically, the inability to make predictions on novel organic molecules akin to chemical probes and drugs, one of the central claims of the paper, makes it impossible to test or use this method. Finally, the released pseudocode will require months of effort to turn into working code that approximates the reported performance, wasting valuable time and resources. Even if such a reimplementation is attempted, restricted access raises questions about whether the results could be fully validated.
Computational costs of machine learning approaches are becoming prohibitive for academic institutions, owing to the high costs of training the models, leaving much computational research and many potential breakthroughs in the hands of for-profit companies. While companies have the right to capitalize on their innovations, using the imprimatur of academic publication without the possibility of reproducing the results, let alone building on them, subverts the enterprise. The amount of disclosure in the AlphaFold3 publication is appropriate for an announcement on a company website (which, indeed, the authors used to preview these developments), but it fails to meet the scientific community’s standards of being usable, scalable, and transparent.
This moment can motivate our community to raise the bar of openness and transparency to accelerate scientific progress. When journals fail to enforce their written policies about making code available to reviewers [1] and alongside publications [2], they demonstrate how these policies are applied inequitably and how editorial decisions do not align with the needs of the scientific community. While the landscape of how science is performed and communicated is ever-changing, journals should uphold their role in the community by ensuring that science is reproducible upon dissemination, regardless of who the authors are.
AI approaches now directly impact biological discovery and human health. Fully realizing their potential will require not only technical breakthroughs but also open and collaborative efforts to build on others’ findings, as is foundational in all scientific research.
[1] https://web.archive.org/web/20240511023627/https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards
[2] https://web.archive.org/web/20240511023855/https://www.nature.com/nature/for-authors/initial-submission