
The huge protein database that spawned AlphaFold and biology’s AI revolution



Crystallographer Helen Berman co-founded the Protein Data Bank in the 1960s.


Credit: Rutgers University

The 2024 Nobels were all about artificial intelligence (AI). Pioneers of computer neural networks underlying AI scooped the physics prize, and chemistry went to two scientists who developed the revolutionary AlphaFold protein-structure prediction tool and one who pioneered protein design, a pursuit that has been supercharged by AI.


It’s easy to marvel at the technical wizardry behind breakthroughs such as AlphaFold. But a lot of that success is thanks to a database of protein structures dreamed up in the 1960s by Helen Berman, a crystallographer at the University of Southern California in Los Angeles, and like-minded scientists.

The Protein Data Bank (PDB) now holds the structures of more than 200,000 proteins, freely available to anyone. These data help AlphaFold to predict the structures of proteins from their sequence, and help other AIs to imagine new proteins at the push of a button.

Berman tells Nature why she’s pleased with the recognition (chemistry Nobel laureates David Baker at the University of Washington in Seattle and John Jumper at Google DeepMind in London both credited the PDB) and how other scientific fields can pave the way for AI breakthroughs with good data.

How did scientists share protein structures before the PDB?

The PDB came into existence when there were only a handful of structures to begin with. They were shared either by punch cards — every atom had its own punch card — or magnetic tape. The individual investigator would have to mail those things across the ocean if it was going from England to America.

What sparked the creation of the PDB?

I was a student in the 1960s in crystallography, and the structures of proteins were just beginning to appear. I was not a protein crystallographer, but I was struck by how important these structures were going to be.

I worked with a few other younger people who were also interested in structure. A small group of us began corresponding with one another about how we could get there to be a protein data bank. I don’t know that we called it that, but that’s what we wanted: some kind of a place where all these structures could be.

Was making these data open a key principle?

At the beginning of the PDB, the whole goal was just to get the protein-structure coordinates, and make sure we didn’t lose them. In the 1980s, there began a movement to say these structures are key for public health. They’re key for good science. They have to be put in the PDB, because at the time there was no requirement. It required some encouragement on the part of the funding agencies. And it took a while for the journals to buy into the idea of requiring the data to be in the PDB. Now you cannot publish a structure without having it in the PDB.

Do you think we would have had AlphaFold without the PDB?

Knowing what I think I know about how AlphaFold works, it would have been extremely difficult. Two things were important about the PDB data. One is that it’s checked and validated by expert curators. The other is that the data are completely machine readable.

What’s it been like to observe this revolution in biological AI, with tools like AlphaFold, RoseTTAFold and protein-design software? They’re all trained on the PDB.

For me, it’s thrilling. The idea that I had back then was that we would be able to understand protein sequence–structure relationships better. I am really, really happy about the results that came out of AlphaFold and all the work that David Baker has done in protein design.

Does it speak to the importance of experimental data for powering AI breakthroughs in science?

Yes, 100%. People will say, ‘Oh, well, the PDB data are really special.’ But we actually know why they’re special. It took a long, long time to figure out how to handle the data, how to represent the data, how to collect the data. We as a community, the PDB community, know how to do this.

I think that other communities can, should and must do this. Because otherwise we’re not going to get the big breakthroughs. The methodologies that allow you to do protein prediction and protein design — the same thing could happen in chemistry. It could happen in geology. It could happen in physics.

This interview has been edited for length and clarity.


