Machine learning methods, in particular convolutional neural networks (CNNs), have been applied to a variety of problems in cryo-EM and macromolecular crystallographic structure solution. However, until recently, they had only limited acceptance by the community, mainly in areas where they replace repetitive work and allow for easy visual checking, such as particle picking, crystal centering or crystal recognition. (You can find an overview here.) Now, it is clear that their scope could be much wider.
For decades, structural biologists have been solving thousands of protein structures by experimental methods. As the underlying rules of protein folding and dynamics are hard to fathom, we understand protein structures by modelling experimental data from crystallography, NMR and, more recently, electron cryo-microscopy (cryo-EM). Very recently, AI-based fold prediction, such as AlphaFold2 or RoseTTAFold, has revolutionized the field and brought us a step closer to the underlying rules.
However, as these methods are not trained on experimental data, but merely on the macromolecular models that have been built to interpret these measurements, they are inherently limited by our understanding of the experiment and the sample. This is readily demonstrated in macromolecular crystallography, where the discrepancy between the measured diffraction data and the corresponding published models is large, typically around 24%. We need better measurements, better models and better tools for interpretation, and ultimately ways to bring different experimental and in-silico methods together to better grasp the nature of life on a molecular basis (and to improve prediction methods). This is the ultimate goal of my team.
And we are not alone in this endeavour: the Thorn lab is a member of Daphne4NFDI and DIG-UM, and our software is distributed with Europe's foremost packages for experimental structural biology, CCP-EM and CCP4. Andrea Thorn is a member of the CUI machine learning task force and the IUCr Computing Commission, and was the inaugural deputy chair for "Big Data Analytics" on the DIG-UM board, which represents 20 000 German researchers using large infrastructure in matters of digitization and AI.