
Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153


Chapters

0:00 Introduction
1:57 Proteins and the building blocks of life
9:00 Spike protein
15:48 Coronavirus biological structure explained
20:45 Virus mutations
27:16 Evolution of proteins
37:02 Self-replicating computer programs
44:38 Origin of life
52:11 Extraterrestrial life in our solar system
54:08 Joshua Lederberg
60:07 Dendral
63:01 Why did expert systems fail?
65:12 AlphaFold 2
86:50 Will AI revolutionize art and music?
93:49 Multi-protein folding
98:16 Will AlphaFold 2 result in a Nobel Prize?
100:47 Will AI be used to engineer deadly viruses?
115:54 Book recommendations
125:37 Family
128:15 A poem in Russian


00:00:00.000 | The following is a conversation with Dmitry Korkin,
00:00:02.880 | his second time in the podcast.
00:00:04.860 | He's a professor of bioinformatics
00:00:06.980 | and computational biology at WPI,
00:00:09.740 | where he specializes in bioinformatics of complex disease,
00:00:13.540 | computational genomics, systems biology,
00:00:16.300 | and biomedical data analytics.
00:00:18.580 | He loves biology, he loves computing,
00:00:22.060 | plus he is Russian and recites a poem in Russian
00:00:26.140 | at the end of the podcast.
00:00:27.780 | What else could you possibly ask for in this world?
00:00:31.100 | Quick mention of our sponsors,
00:00:32.960 | Brave Browser, NetSuite Business Management Software,
00:00:37.760 | Magic Spoon Low Carb Cereal,
00:00:40.300 | and Eight Sleep Self-Cooling Mattress.
00:00:42.920 | So the choice is browsing privacy, business success,
00:00:46.380 | healthy diet, or comfortable sleep.
00:00:49.180 | Choose wisely, my friends.
00:00:50.660 | And if you wish, click the sponsor links below
00:00:53.640 | to get a discount and to support this podcast.
00:00:56.460 | As a side note, let me say that to me,
00:00:58.620 | the scientists that did the best apolitical,
00:01:01.540 | impactful, brilliant work of 2020
00:01:04.020 | are the biologists who study viruses without an agenda,
00:01:09.020 | without much sleep, to be honest,
00:01:11.820 | just a pure passion for scientific discovery
00:01:14.500 | and exploration of the mysteries within viruses.
00:01:18.440 | Viruses are both terrifying and beautiful.
00:01:21.380 | Terrifying because they can threaten
00:01:22.980 | the fabric of human civilization,
00:01:25.180 | both biological and psychological.
00:01:27.860 | Beautiful because they give us insights
00:01:30.500 | into the nature of life on Earth
00:01:32.980 | and perhaps even extraterrestrial life
00:01:35.940 | of the not-so-intelligent variety
00:01:38.020 | that might meet us one day
00:01:39.580 | as we explore the habitable planets
00:01:41.540 | and moons in our universe.
00:01:43.780 | If you enjoy this thing, subscribe on YouTube,
00:01:45.820 | review it on Apple Podcasts, follow on Spotify,
00:01:49.080 | support on Patreon, or connect with me
00:01:50.940 | on Twitter @lexfridman.
00:01:53.220 | And now, here's my conversation with Dmitry Korkin.
00:01:56.960 | It's often said that proteins
00:02:00.720 | and the amino acid residues that make them up
00:02:04.260 | are the building blocks of life.
00:02:06.460 | Do you think of proteins in this way
00:02:08.060 | as the basic building blocks of life?
00:02:11.200 | - Yes and no.
00:02:12.240 | So the protein indeed is the basic unit,
00:02:16.340 | the biological unit, that carries out
00:02:20.500 | important functions of the cell.
00:02:22.860 | However, through studying the proteins
00:02:25.820 | and comparing the proteins across different species,
00:02:29.340 | across different kingdoms,
00:02:31.420 | you realize that proteins are actually
00:02:34.620 | much more complicated.
00:02:36.740 | So they have so-called modular complexity.
00:02:41.740 | And so what I mean by that is an average protein
00:02:47.880 | consists of several structural units.
00:02:52.880 | So we call them protein domains.
00:02:57.480 | And so you can imagine a protein as a string of beads,
00:03:02.480 | where each bead is a protein domain.
00:03:04.860 | And in the past 20 years,
00:03:10.240 | scientists have been studying the nature
00:03:13.600 | of the protein domains.
00:03:15.040 | Because we realized that it's the unit.
00:03:19.480 | Because if you look at the functions, right?
00:03:22.120 | So many proteins have more than one function.
00:03:25.880 | And those protein functions are often carried out
00:03:29.440 | by those protein domains.
00:03:31.560 | So we also see that in the evolution,
00:03:36.560 | those protein domains get shuffled.
00:03:40.160 | So they act actually as a unit.
00:03:43.460 | Also from the structural perspective, right?
00:03:45.280 | So some people think of a protein
00:03:50.280 | as a sort of a globular molecule.
00:03:55.320 | But as a matter of fact,
00:03:56.800 | the globular part of this protein is a protein domain.
00:04:01.800 | So we often have this, again,
00:04:06.000 | the collection of these protein domains
00:04:10.580 | aligned on a string as beads.
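The beads-on-a-string picture described above can be sketched as a simple data structure; the domain and function names below are hypothetical placeholders, not real annotations:

```python
# A multi-domain protein as an ordered "string of beads",
# where each bead is a protein domain with its own function.
# Domain and function names are invented for illustration.
protein = [
    {"domain": "D1", "function": "binding"},
    {"domain": "D2", "function": "catalysis"},
    {"domain": "D3", "function": "regulation"},
]

# A protein with several domains can carry more than one function,
# roughly one per domain, as discussed above.
functions = [bead["function"] for bead in protein]
print(functions)  # ['binding', 'catalysis', 'regulation']
```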
00:04:14.600 | - And the protein domains are made up of amino acid residues.
00:04:17.880 | - Yes.
00:04:18.720 | - So we're talking--
00:04:19.560 | - So it's--
00:04:20.380 | - So this is the basic,
00:04:21.220 | so you're saying the protein domain
00:04:22.560 | is the basic building block of the function
00:04:25.600 | that we think about proteins doing.
00:04:28.300 | So of course, you can always talk about
00:04:30.360 | different building blocks.
00:04:31.480 | It's turtles all the way down.
00:04:32.840 | But there's a point where there is,
00:04:35.320 | at the point of the hierarchy,
00:04:37.640 | where it's the most, the cleanest element block
00:04:42.040 | based on which you can put them together
00:04:46.220 | in different kinds of ways to form complex function.
00:04:49.200 | And you're saying protein domains,
00:04:50.880 | why is that not talked about as often in popular culture?
00:04:55.160 | - Well, there are several perspectives on this.
00:04:58.240 | And one, of course, is the historical perspective, right?
00:05:03.200 | So historically, scientists have been able
00:05:07.760 | to structurally resolve to obtain the 3D coordinates
00:05:12.400 | of a protein for smaller proteins.
00:05:17.400 | And smaller proteins tend to be a single domain protein.
00:05:21.000 | So we have a protein equal to a protein domain.
00:05:24.000 | And so because of that, the initial suspicion
00:05:27.080 | was that the proteins, they have globular shapes,
00:05:31.720 | and the more of smaller proteins you obtain structurally,
00:05:36.720 | the more you became convinced that that's the case.
00:05:41.720 | And only later when we started having
00:05:45.720 | alternative approaches, so the traditional ones
00:05:52.920 | are X-ray crystallography and NMR spectroscopy.
00:05:57.320 | So these are sort of the two main techniques
00:06:02.000 | that give us the 3D coordinates.
00:06:04.440 | But nowadays, there is huge breakthrough
00:06:07.760 | in cryo-electron microscopy.
00:06:10.460 | So the more advanced methods that allow us
00:06:13.600 | to get into the 3D shapes of much larger molecules,
00:06:20.440 | molecular complexes, just to give you
00:06:24.920 | one of the common examples for this year.
00:06:29.200 | Right, so the first experimental structure
00:06:32.760 | of a SARS-CoV-2 protein was the cryo-EM structure
00:06:37.760 | of the S protein, so the spike protein.
00:06:41.960 | And so it was solved very quickly.
00:06:46.320 | And the reason for that is the advancement
00:06:49.480 | of this technology is pretty spectacular.
00:06:53.960 | - How many domains is the, is it more than one domain?
00:06:57.480 | - Oh yes, oh yes, I mean, so it's a very complex structure.
00:07:01.320 | - It's complex.
00:07:03.040 | - We, you know, on top of the complexity
00:07:06.480 | of a single protein, right, so this structure
00:07:11.160 | is actually, is a complex, is a trimer.
00:07:13.720 | So it needs to form a trimer in order to function properly.
00:07:17.640 | - What's a complex?
00:07:18.720 | - So a complex is agglomeration of multiple proteins.
00:07:22.880 | And so we can have the same protein copied in multiple,
00:07:27.880 | you know, made up in multiple copies
00:07:32.120 | and forming something that we called a homo-oligomer.
00:07:36.160 | Homo means the same, right?
00:07:38.120 | So in this case, so the spike protein is an example
00:07:43.120 | of a homotetramer, homotrimer, sorry.
00:07:46.720 | - So it needs three copies of a--
00:07:48.080 | - Three copies, exactly. - In order to.
00:07:50.040 | - Exactly.
00:07:50.880 | - We have these three chains, the three molecular chains
00:07:55.000 | coupled together and performing the function.
00:07:58.480 | That's what, when you look at this protein from the top,
00:08:02.380 | you see a perfect triangle.
00:08:03.920 | - Yeah.
00:08:04.760 | - So, but other, you know, so other complexes
00:08:08.280 | are made up of, you know, different proteins.
00:08:12.160 | Some of them are completely different,
00:08:15.400 | some of them are similar.
00:08:16.920 | The hemoglobin molecule, right, so it's actually,
00:08:20.160 | it's a protein complex.
00:08:21.880 | It's made of four basic subunits.
00:08:25.760 | Two of them are identical to each other,
00:08:29.040 | two other identical to each other,
00:08:30.800 | but they are also similar to each other,
00:08:32.820 | which sort of gives us some ideas
00:08:36.000 | about the evolution of this, you know, of this molecule.
00:08:40.640 | And perhaps, so one of the hypothesis is that, you know,
00:08:43.980 | in the past, it was just a homotetramer, right?
00:08:48.280 | So four identical copies, and then it became, you know,
00:08:53.160 | sort of modified, it became mutated over the time
00:08:58.160 | and became more specialized.
00:09:00.200 | - Can we linger on the spike protein for a little bit?
00:09:02.600 | Is there something interesting
00:09:04.980 | or like beautiful you find about it?
00:09:07.000 | - I mean, first of all,
00:09:07.920 | it's an incredibly challenging protein.
00:09:11.000 | And so we, as a part of our sort of research
00:09:16.160 | to understand the structural basis of this virus
00:09:20.200 | to sort of decode, structurally decode
00:09:22.760 | every single protein in its proteome,
00:09:27.600 | we've been working on the spike protein.
00:09:31.800 | And one of the main challenges was that
00:09:34.480 | the cryo-EM data allows us to reconstruct
00:09:40.680 | or to obtain the 3D coordinates
00:09:44.680 | of roughly 2/3 of the protein.
00:09:48.080 | The rest of the 1/3 of this protein,
00:09:51.960 | it's a part that is buried into the membrane of the virus
00:09:56.960 | and of the viral envelope.
00:10:01.600 | And it also has a lot of unstable structures around it.
00:10:06.600 | - So it's chemically interacting somehow
00:10:08.680 | with whatever the heck it's connecting to.
00:10:10.200 | - Yeah, so people are still trying to understand.
00:10:12.840 | So the nature of and the role of this 1/3,
00:10:17.840 | 'cause the top part,
00:10:20.180 | the primary function is to get attached
00:10:25.240 | to the ACE2 receptor, human receptor.
00:10:29.560 | There is also beautiful mechanics
00:10:33.880 | of how this thing happens, right?
00:10:36.080 | So because there are three different copies of this chains,
00:10:41.960 | there are three different domains, right?
00:10:44.840 | So we're talking about domains.
00:10:46.080 | So this is the receptor binding domains, RBDs,
00:10:49.240 | that gets untangled and get ready
00:10:52.960 | to get attached to the receptor.
00:10:56.840 | And now they are not necessarily going in a sync mode.
00:11:01.840 | As a matter of fact--
00:11:05.360 | - It's asynchronous?
00:11:06.640 | - So yes, and this is where another level
00:11:11.680 | of complexity comes into play,
00:11:13.600 | because right now what we see is,
00:11:17.160 | we typically see just one of the arms going out
00:11:21.840 | and getting ready to be attached to the ACE2 receptors.
00:11:26.840 | However, there was a recent mutation
00:11:30.360 | that people studied in that spike protein.
00:11:35.080 | And very recently, a group from UMass,
00:11:40.080 | a group from UMass Medical School,
00:11:43.560 | we happened to collaborate with groups.
00:11:45.280 | So this is a group of Jeremy Luban
00:11:47.240 | and a number of other faculty.
00:11:50.640 | They actually solved the mutated structure of the spike.
00:11:56.560 | And they showed that actually, because of these mutations,
00:12:03.000 | you have more than one arms opening up.
00:12:08.860 | And so now, so the frequency of two arms going up
00:12:13.860 | increased quite drastically.
00:12:18.120 | - Oh, interesting.
00:12:18.960 | Does that change the dynamics somehow?
00:12:21.120 | - It potentially can change the dynamics of,
00:12:24.320 | because now you have two possible opportunities
00:12:28.420 | to get attached to the ACE2 receptor.
00:12:31.140 | It's a very complex molecular process, mechanistic process.
00:12:35.280 | But the first step of this process
00:12:37.420 | is the attachment of this spike protein,
00:12:41.500 | of the spike trimer, to the human ACE2 receptor.
00:12:46.500 | So this is a molecule that sits
00:12:48.900 | on the surface of the human cell.
00:12:51.940 | And that's essentially what initiates,
00:12:54.720 | what triggers the whole process of encapsulation.
00:12:58.880 | - If this was dating, this would be the first date.
00:13:01.440 | So this is the--
00:13:02.280 | (laughing)
00:13:03.120 | - In a way, yes.
00:13:05.660 | - So is it possible to have the spike protein
00:13:07.940 | just floating about on its own?
00:13:10.620 | Or does it need that interactability with the membrane?
00:13:14.660 | - Yeah, so it needs to be attached,
00:13:16.920 | at least as far as I know.
00:13:19.020 | But when you get this thing attached on the surface,
00:13:23.300 | there is also a lot of dynamics
00:13:25.140 | on how it sits on the surface.
00:13:28.200 | So for example, there was a recent work in,
00:13:32.220 | again, where people used cryo-electron microscopy
00:13:35.780 | to get the first glimpse of the overall structure.
00:13:38.940 | It's a very low res,
00:13:40.180 | but you still get some interesting details
00:13:43.820 | about the surface, about what is happening inside,
00:13:47.040 | because we have literally no clue until recent work
00:13:50.740 | about how the capsid is organized.
00:13:54.540 | - What's a capsid?
00:13:55.380 | - So a capsid is essentially,
00:13:56.740 | it's the inner core of the viral particle
00:14:01.020 | where there is the RNA of the virus,
00:14:05.020 | and it's protected by another protein, N-protein,
00:14:09.200 | that essentially acts as a shield.
00:14:13.460 | But now we are learning more and more,
00:14:16.580 | so it's actually, it's not just this shield.
00:14:18.660 | It potentially is used for the stability
00:14:21.840 | of the outer shell of the virus.
00:14:25.100 | So it's pretty complicated.
00:14:27.860 | - And I mean, understanding all of this
00:14:29.780 | is really useful for trying to figure out
00:14:31.660 | like developing a vaccine or some kind of drug
00:14:34.140 | to attack any aspects of this, right?
00:14:36.080 | - So, I mean, there are many different implications to that.
00:14:39.420 | I mean, first of all,
00:14:41.100 | it's important to understand the virus itself, right?
00:14:44.620 | So in order to understand how it acts,
00:14:49.620 | what is the overall mechanistic process of this virus,
00:14:56.540 | replication of this virus, proliferation to the cell, right?
00:15:00.580 | So that's one aspect.
00:15:03.020 | The other aspect is designing new treatments, right?
00:15:06.500 | So one of the possible treatments
00:15:09.060 | is designing nanoparticles.
00:15:12.500 | And so some nanoparticles that will resemble the viral shape
00:15:17.220 | that would have the spike integrated,
00:15:19.540 | and essentially would act as a competitor to the real virus
00:15:23.700 | by blocking the ACE2 receptors
00:15:26.700 | and thus preventing the real virus entering the cell.
00:15:30.400 | Now, there are also, you know,
00:15:33.940 | there is a very interesting direction
00:15:36.700 | in looking at the membrane,
00:15:39.420 | at the envelope portion of the protein
00:15:42.020 | and attacking its M protein.
00:15:45.980 | So there are, you know, to give you a, you know,
00:15:49.460 | sort of a brief overview,
00:15:51.200 | there are four structural proteins.
00:15:53.500 | These are the proteins that make up
00:15:55.780 | the structure of the virus.
00:15:59.340 | So spike, S protein, that acts as a trimer,
00:16:04.060 | so it needs three copies.
00:16:05.900 | E, envelope protein, that acts as a pentamer,
00:16:10.820 | so it needs five copies to act properly.
00:16:14.420 | M is a membrane protein.
00:16:17.940 | It forms dimers, and actually it forms beautiful lattice.
00:16:21.700 | And this is something that we've been studying
00:16:23.540 | and we are seeing it in simulations.
00:16:25.700 | It actually forms a very nice grid or, you know,
00:16:29.660 | threads, you know, of different dimers
00:16:33.740 | attached next to each other.
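The four structural proteins and their oligomeric states, as just described, can be tabulated in a small sketch; only the copy numbers stated in the conversation are used:

```python
# Functional units of the SARS-CoV-2 structural proteins,
# per the description above: S acts as a trimer, E as a pentamer,
# and M as a dimer that tiles into a lattice on the outer shell.
copies_per_unit = {
    "S (spike)": 3,     # trimer
    "E (envelope)": 5,  # pentamer
    "M (membrane)": 2,  # dimer
}
# N (nucleocapsid) sits inside and shields the viral RNA;
# its oligomeric state is not given in the conversation.

for name, copies in copies_per_unit.items():
    print(f"{name}: functional unit of {copies} chains")
```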
00:16:34.580 | - Just a bunch of copies of each other,
00:16:36.260 | and they naturally, when you have a bunch of copies
00:16:38.300 | of each other, they form an interesting lattice.
00:16:40.220 | - Exactly.
00:16:41.060 | And, you know, if you think about this, right,
00:16:43.540 | so this complex, you know, the viral shape
00:16:49.500 | needs to be organized somehow, self-organized somehow.
00:16:53.060 | Right?
00:16:53.900 | So, you know, if it was a completely random process,
00:16:57.940 | you know, you probably wouldn't have the envelope shell
00:17:02.100 | of the ellipsoid shape.
00:17:03.740 | You know, you would have something, you know,
00:17:05.900 | pretty random, right, shape.
00:17:07.620 | So there is some, you know, regularity
00:17:10.580 | in how this, you know, how these dimers
00:17:16.740 | get attached to each other in a very specific, directed way.
00:17:20.540 | - Is that understood at all?
00:17:21.940 | - It's not understood.
00:17:24.300 | We are now, we've been working in the past six months
00:17:28.420 | since, you know, we met.
00:17:29.860 | Actually, this is where we started working
00:17:32.140 | on trying to understand the overall structure
00:17:35.020 | of the envelope and the key components
00:17:38.180 | that made up this, you know, structure.
00:17:41.100 | - Wait, does the envelope also have
00:17:42.300 | the lattice structure or no?
00:17:43.580 | - So the envelope essentially is the outer shell
00:17:47.380 | of the viral particle.
00:17:48.820 | The N, the nucleocapsid protein,
00:17:51.620 | is something that is inside.
00:17:54.020 | - Got it.
00:17:54.860 | - But get this, the N is likely to interact with M.
00:17:59.500 | - Does it go M and E?
00:18:01.460 | Like, where's the E in--
00:18:02.620 | - So E, those different proteins,
00:18:05.660 | they occur in different copies on the viral particle.
00:18:10.820 | So E, this pentamer complex,
00:18:13.980 | we only have two or three maybe per each particle.
00:18:18.980 | - Mm-hmm. - Okay?
00:18:20.020 | We have thousand or so of M dimers
00:18:24.540 | that essentially made up,
00:18:26.580 | that makes up the entire, you know, outer shell.
00:18:30.940 | - So most of the outer shell is the M--
00:18:33.700 | - M dimer.
00:18:34.540 | - It's the M protein. - And lipids.
00:18:35.640 | - When you say particle, that's the virion,
00:18:38.180 | the individual virus.
00:18:40.100 | - The single, yes.
00:18:41.020 | - Single element of the virus, single virus.
00:18:43.620 | - Single virus, right.
00:18:45.100 | And we have about, you know,
00:18:47.100 | roughly 50 to 90 spike trimers, right?
00:18:51.100 | So when you, you know, when you show a--
00:18:53.460 | - Per virus particle.
00:18:55.020 | - Per virus particle.
00:18:56.540 | - Sorry, what did you say, 50 to 90?
00:18:58.660 | - 50 to 90, right? - Cool.
00:19:00.700 | - So this is how this thing is organized.
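The rough per-particle counts mentioned above (50 to 90 spike trimers, two or three E pentamers, about a thousand M dimers) allow a quick back-of-envelope chain count; this is only arithmetic on the figures from the conversation:

```python
# Back-of-envelope chain counts per single viral particle (virion),
# using the rough figures mentioned above.
spike_trimers = (50, 90)  # each trimer is 3 S chains
e_pentamers = (2, 3)      # each pentamer is 5 E chains
m_dimers = 1000           # each dimer is 2 M chains

min_s_chains = spike_trimers[0] * 3  # 150
max_s_chains = spike_trimers[1] * 3  # 270
m_chains = m_dimers * 2              # ~2000, most of the outer shell
print(min_s_chains, max_s_chains, m_chains)  # 150 270 2000
```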
00:19:03.980 | And so now, typically, right,
00:19:06.380 | so you see these, the antibodies that target,
00:19:11.380 | you know, spike protein, certain parts of the spike protein,
00:19:15.180 | but there could be some, also some treatments, right?
00:19:17.940 | So these are, you know, these are small molecules
00:19:21.980 | that bind strategic parts of these proteins,
00:19:26.980 | disrupting its functioning.
00:19:29.660 | So one of the promising directions,
00:19:34.060 | it's one of the newest directions,
00:19:35.620 | is actually targeting the M-dimer of the protein,
00:19:40.620 | targeting the proteins that make up this outer shell.
00:19:44.260 | Because if you're able to destroy the outer shell,
00:19:47.740 | you're essentially destroying the viral particle itself.
00:19:52.260 | So preventing it from, you know, functioning at all.
00:19:56.820 | - So that's, you think, is,
00:19:59.260 | from a sort of cybersecurity perspective,
00:20:01.540 | virus security perspective,
00:20:03.060 | that's the best attack vector?
00:20:05.260 | Is, or like, that's a promising attack vector?
00:20:08.540 | - I would say, yeah.
00:20:09.380 | So I mean, there's still tons of research needs to be,
00:20:12.700 | you know, to be done.
00:20:14.020 | But yes, I think, you know, so--
00:20:16.580 | - There's more attack surface, I guess.
00:20:18.900 | - More attack surface, but, you know,
00:20:20.460 | from our analysis, from other evolutionary analysis,
00:20:24.180 | this protein is evolutionarily more stable
00:20:27.980 | compared to the, say, to the spike protein.
00:20:31.220 | - Oh, and stable means a more static target?
00:20:35.500 | - Well, yeah.
00:20:36.340 | So it doesn't change, it doesn't evolve
00:20:39.980 | from the evolutionary perspective so drastically
00:20:43.460 | as, for example, the spike protein.
00:20:46.020 | - There's a bunch of stuff in the news
00:20:47.940 | about mutations of the virus in the United Kingdom.
00:20:51.420 | I also saw in South Africa something,
00:20:54.180 | maybe that was yesterday.
00:20:55.460 | You just kind of mentioned about stability and so on.
00:21:00.180 | Which aspects of this are mutatable
00:21:02.780 | and which aspects, if mutated, become more dangerous?
00:21:07.580 | And maybe even zooming out,
00:21:09.260 | what are your thoughts and knowledge and ideas
00:21:12.060 | about the way it's mutated,
00:21:13.660 | all the news that we've been hearing?
00:21:15.340 | Are you worried about it from a biological perspective?
00:21:18.460 | Are you worried about it from a human perspective?
00:21:21.260 | - So I mean, you know, mutations are sort of a general way
00:21:26.260 | for these viruses to evolve, right?
00:21:28.620 | So it's essentially, this is the way they evolve.
00:21:33.620 | This is the way they were able to jump
00:21:38.660 | from one species to another.
00:21:42.060 | We also see some recent jumps.
00:21:46.780 | There were some incidents of this virus
00:21:49.540 | jumping from human to dogs.
00:21:51.860 | So there is some danger in those jumps
00:21:55.860 | because every time it jumps, it also mutates, right?
00:21:59.460 | So when it jumps to the species and jumps back,
00:22:04.460 | so it acquires some mutations that are sort of driven
00:22:09.940 | by the environment of a new host, right?
00:22:16.300 | And it's different from the human environment.
00:22:19.220 | And so we don't know whether the mutations
00:22:21.420 | that are acquired in the new species are neutral
00:22:26.180 | with respect to the human host or maybe damaging.
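Point mutations like the ones discussed here are usually written in a standard positional notation (reference residue, position, new residue); a minimal sketch, using toy sequences rather than real viral data:

```python
# List substitutions between two aligned protein sequences in the
# usual notation. Toy sequences, with one substitution introduced
# purely for illustration.
ref = "MFVFLVLLPLVSSQ"
var = "MFVFLVLLPLVSGQ"

mutations = [
    f"{a}{i + 1}{b}"  # e.g. S13G: Ser at position 13 became Gly
    for i, (a, b) in enumerate(zip(ref, var))
    if a != b
]
print(mutations)  # ['S13G']
```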
00:22:31.180 | - Yeah, change is always scary.
00:22:33.620 | But so are you worried about, I mean, it seems like
00:22:37.460 | because the spread is during winter now
00:22:40.380 | seems to be exceptionally high,
00:22:42.280 | and especially with a vaccine just around the corner
00:22:46.780 | already being actually deployed,
00:22:49.140 | is there some worry that this puts evolutionary pressure,
00:22:53.020 | selective pressure on the virus for it to mutate?
00:22:58.020 | Is that a source of worry? - Yes.
00:23:00.420 | Well, I mean, there is always this thought
00:23:02.660 | in the scientist's mind, what will happen, right?
00:23:07.660 | So I know there've been discussions
00:23:12.500 | about sort of the arms race between the ability
00:23:17.580 | of the humanity to get vaccinated faster
00:23:22.580 | than the virus essentially becomes resistant to the vaccine.
00:23:30.940 | I mean, I don't worry that much
00:23:41.740 | simply because there is not that much evidence to that.
00:23:47.500 | - To aggressive mutation around a vaccine.
00:23:50.020 | - Exactly.
00:23:51.140 | Obviously there are mutations around the vaccine.
00:23:54.860 | So the reason we get vaccinated every year
00:23:59.860 | against the seasonal flu. - Because there's mutations.
00:24:02.740 | - But I think it's important to study it, no doubts.
00:24:08.860 | So I think one of the, to me, and again, I might be biased
00:24:15.220 | because we've been trying to do that as well.
00:24:20.220 | But one of the critical directions
00:24:22.900 | in understanding the virus is to understand its evolution
00:24:26.580 | in order to sort of understand the mechanisms,
00:24:30.220 | the key mechanisms that lead the virus to jump,
00:24:34.140 | the zoonotic viruses to jump from one species to another,
00:24:38.660 | that the mechanisms that lead the virus
00:24:41.700 | to become resistant to vaccines, also to treatments.
00:24:46.700 | Right?
00:24:47.780 | And hopefully that knowledge will enable us
00:24:51.500 | to sort of forecast the evolutionary traces,
00:24:55.500 | the future evolutionary traces of this virus.
00:24:58.100 | - I mean, what, from a biological perspective,
00:25:00.980 | this might be a dumb question,
00:25:02.180 | but is there parts of the virus that if souped up
00:25:07.940 | through mutation could make it more effective
00:25:11.780 | at doing its job?
00:25:12.620 | We're talking about this specific coronavirus.
00:25:15.700 | 'Cause we were talking about the different,
00:25:16.860 | like the membrane, the M protein, the E protein,
00:25:20.900 | the N and the S, the spike.
00:25:25.540 | Is there some-- - And 20 or so more
00:25:28.940 | in addition to that.
00:25:30.260 | - But is that a dumb way to look at it?
00:25:32.300 | Like which of these, if mutated, could,
00:25:38.140 | have the greatest impact, potentially damaging impact
00:25:42.020 | on the effectiveness of the virus?
00:25:43.460 | - So it's actually, it's a very good question.
00:25:46.580 | Because, and the short answer is we don't know yet.
00:25:50.140 | But of course there is capacity of this virus
00:25:53.500 | to become more efficient.
00:25:55.560 | The reason for that is, you know,
00:25:58.700 | so if you look at the virus, I mean, it's a machine, right?
00:26:01.820 | So it's a machine that does a lot of different functions.
00:26:05.500 | And many of these functions are sort of nearly perfect,
00:26:08.500 | but they are not perfect.
00:26:09.820 | And those mutations can make those functions more perfect.
00:26:14.100 | For example, the attachment to ACE2 receptor, right?
00:26:18.220 | Of the spike, right?
00:26:19.380 | So, you know, is it,
00:26:23.540 | has this virus reached the efficiency
00:26:28.340 | in which the attachment is carried out?
00:26:31.540 | Or there are some mutations that,
00:26:34.140 | that still to be discovered, right?
00:26:36.300 | That will make this attachment sort of stronger,
00:26:41.300 | or, you know, something more, in a way more efficient
00:26:46.900 | from the point of view of this virus functioning.
00:26:51.880 | That's sort of the obvious example.
00:26:54.660 | But if you look at each of these proteins,
00:26:57.500 | I mean, it's there for a reason.
00:26:58.820 | It performs certain function.
00:27:00.780 | And it could be that certain mutations
00:27:05.580 | will, you know, enhance this function.
00:27:08.500 | It could be that some mutations
00:27:09.900 | will make this function much less efficient.
00:27:13.700 | So that's also the case.
00:27:15.360 | - Let's, since we're talking about
00:27:17.820 | the evolutionary history of a virus,
00:27:20.420 | let's zoom back out and look at the evolution of proteins.
00:27:25.220 | I glanced at this 2010 Nature paper
00:27:29.980 | on the, quote, "Ongoing expansion of the protein universe."
00:27:34.340 | And then, you know, it kind of implies
00:27:37.380 | and talks about that proteins started
00:27:41.340 | with a common ancestor, which is, you know,
00:27:43.600 | kind of interesting.
00:27:44.700 | It's interesting to think about, like,
00:27:45.940 | even just like the first organic thing
00:27:49.740 | that started life on Earth.
00:27:51.860 | And from that, there's now, you know, what is it?
00:27:55.980 | 3.5 billion years later, there's now millions of proteins.
00:27:59.900 | And they're still evolving.
00:28:01.300 | And that's, you know, in part,
00:28:02.980 | one of the things that you're researching.
00:28:04.980 | Is there something interesting to you
00:28:06.820 | about the evolution of proteins
00:28:09.880 | from this initial ancestor to today?
00:28:14.620 | Is there something beautiful,
00:28:15.660 | insightful about this long story?
00:28:18.120 | - So I think, you know, if I were to pick a single keyword
00:28:23.120 | about protein evolution, I would pick modularity,
00:28:29.340 | something that we talked about in the beginning.
00:28:32.940 | And that's the fact that the proteins
00:28:36.260 | are no longer considered as, you know,
00:28:39.260 | as a sequence of letters.
00:28:41.480 | There are hierarchical complexities
00:28:46.060 | in the way these proteins are organized.
00:28:48.420 | And these complexities are actually going
00:28:51.940 | beyond the protein sequence.
00:28:54.140 | It's actually going all the way back to the gene,
00:28:57.940 | to the nucleotide sequence.
00:29:00.180 | And so, you know, again, these protein domains,
00:29:05.040 | they are not only functional building blocks.
00:29:08.060 | They're also evolutionary building blocks.
00:29:10.140 | And so what we see in the sort of,
00:29:12.780 | in the later stages of evolution,
00:29:15.300 | I mean, once this stable, structurally
00:29:18.940 | and functionally building blocks were discovered,
00:29:22.220 | they essentially, they stay, those domains stay as such.
00:29:28.220 | So that's why if you start comparing different proteins,
00:29:31.760 | you will see that many of them will have similar fragments.
00:29:36.760 | And those fragments will correspond to something
00:29:39.820 | that we call protein domain families.
00:29:42.460 | And so they are still different
00:29:44.220 | because you still have mutations and, you know,
00:29:48.220 | different mutations are attributed to, you know,
00:29:54.460 | diversification of the function of this, you know,
00:29:57.820 | protein domains.
00:29:59.020 | However, you don't, you very rarely see, you know,
00:30:03.700 | the evolutionary events that would split
00:30:07.980 | this domain into fragments because, and it's, you know,
00:30:12.160 | once you have the domain split, you actually,
00:30:18.180 | you know, you can completely cancel out its function
00:30:23.980 | or at the very least you can reduce it.
00:30:26.580 | And that's not, you know, efficient from the point of view
00:30:29.620 | of the, you know, of the cell functioning.
00:30:32.860 | So the protein domain level is a very important one.
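The observation above, that different proteins share fragments corresponding to common domain families, can be sketched as a simple set comparison; the family names are hypothetical placeholders:

```python
# Comparing two proteins by their domain-family annotations.
# Family names are invented, not drawn from a real database.
protein_a_domains = {"FamilyX", "FamilyY", "FamilyZ"}
protein_b_domains = {"FamilyW", "FamilyY", "FamilyZ"}

# Shared domain families suggest shared evolutionary building blocks.
shared = protein_a_domains & protein_b_domains
print(sorted(shared))  # ['FamilyY', 'FamilyZ']
```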
00:30:37.860 | Now, on top of that, right?
00:30:42.020 | So if you look at the proteins, right?
00:30:44.100 | So you have these structural units
00:30:46.340 | and they carry out the function.
00:30:48.200 | But then much less is known about things
00:30:51.880 | that connect these protein domains.
00:30:54.400 | Something that we call linkers.
00:30:56.380 | And those linkers are completely flexible, you know,
00:31:00.780 | parts of the protein that nevertheless
00:31:03.540 | carry out a lot of function.
00:31:06.380 | - It's like little tails, little heads.
00:31:08.060 | - So we do have tails, so they're called termini,
00:31:11.100 | C and N termini.
00:31:12.340 | So these are things right on the,
00:31:14.060 | on one and another ends of the protein sequence.
00:31:20.060 | So they are also very important.
00:31:22.560 | So they contribute to very specific interactions
00:31:26.320 | between the proteins.
00:31:28.560 | - But you're referring to the links between domains.
00:31:30.800 | - That connect the domains.
00:31:32.600 | And, you know, apart from the,
00:31:35.120 | just the simple perspective,
00:31:38.440 | if you have, you know, a very short linker,
00:31:43.720 | you have two domains next to each other.
00:31:45.880 | They are forced to be next to each other.
00:31:47.560 | If you have a very long one,
00:31:49.060 | you have the domains that are extremely flexible
00:31:52.080 | and they carry out a lot of sort of
00:31:54.320 | spatial reorganization, right?
00:31:56.880 | - That's awesome.
00:31:58.120 | - But on top of that, right,
00:31:59.640 | just this linker itself, because it's so flexible,
00:32:03.760 | it actually can adapt to a lot of different shapes.
00:32:07.480 | And therefore it's a very good interactor
00:32:11.080 | when it comes to interaction between
00:32:13.180 | this protein and other protein.
00:32:15.520 | All right, so these things also evolve, you know,
00:32:18.920 | and they in a way have a different sort of
00:32:23.920 | driving laws that underlie their evolution,
00:32:30.500 | because they no longer need to preserve certain structure,
00:32:35.580 | right, unlike protein domains.
00:32:38.860 | And so on top of that,
00:32:41.500 | you have something that is even less studied.
00:32:45.860 | And this is something that relates
00:32:49.360 | to the concept of alternative splicing.
00:32:53.260 | So alternative splicing, so it's a very cool concept.
00:32:56.940 | It's something that we've been fascinated about
00:33:00.400 | for over a decade in my lab
00:33:03.520 | and trying to do research with that.
00:33:05.520 | But so, you know, so typically, you know,
00:33:08.100 | a simplistic perspective is that
00:33:11.600 | one gene equals one protein product.
00:33:15.840 | Right, so you have a gene, you know,
00:33:18.340 | you transcribe it and translate it,
00:33:21.140 | and it becomes a protein.
00:33:22.900 | In reality, when we talk about eukaryotes,
00:33:28.380 | especially sort of more recent eukaryotes
00:33:32.320 | that are very complex,
00:33:33.820 | the gene is no longer equal to one protein.
00:33:41.100 | It actually can produce multiple
00:33:46.100 | functionally, you know, active protein products.
00:33:51.220 | And each of them is, you know,
00:33:53.480 | is called an alternatively spliced product.
00:33:57.920 | The reason it happens is that if you look at the gene,
00:34:01.840 | it actually also has blocks.
00:34:08.300 | And essentially, it goes like this.
00:34:10.680 | So we have a block that will later be translated,
00:34:13.860 | we call it exon.
00:34:15.060 | Then we'll have a block that is not translated, cut out.
00:34:19.220 | We call it intron.
00:34:20.400 | So we have exon, intron, exon, intron,
00:34:22.840 | et cetera, et cetera, et cetera, right?
00:34:24.140 | So sometimes you can have, you know,
00:34:26.900 | dozens of these exons and introns.
00:34:29.860 | So what happens is during the process
00:34:32.660 | when the gene is converted to RNA,
00:34:37.320 | we have things that are cut out,
00:34:41.260 | the introns that are cut out,
00:34:43.220 | and exons that now get assembled together.
00:34:47.180 | And sometimes we will throw out some of the exons.
00:34:51.400 | And the remaining protein product will become--
00:34:54.620 | - Still be the same.
00:34:55.620 | - Different. - Oh, different.
00:34:56.540 | - Right, so now you have fragments of the protein
00:34:59.980 | that no longer there.
00:35:01.360 | They were cut out with the introns.
00:35:03.820 | Sometimes you will essentially take one exon
00:35:07.560 | and replace it with another one, right?
00:35:09.840 | - So there's some flexibility in this process.
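The exon/intron assembly described above can be sketched as a toy model (the gene layout and sequences here are invented purely for illustration): introns are always removed, and keeping different subsets of exons yields different protein products from the same gene.

```python
# Toy model of alternative splicing: a gene is a sequence of blocks,
# introns are always cut out, and different subsets of exons are kept,
# yielding different products from the same gene.

gene = [
    ("exon", "ATGGCC"),   # exon 1
    ("intron", "GTAAGT"),
    ("exon", "AAATTT"),   # exon 2
    ("intron", "GTACGT"),
    ("exon", "CCCGGG"),   # exon 3
]

def splice(gene, kept_exons):
    """Remove all introns; keep only the exons whose index is in kept_exons."""
    exons = [seq for kind, seq in gene if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i in kept_exons)

# Two isoforms from the same gene: one keeps all exons, one skips exon 2.
isoform_a = splice(gene, {0, 1, 2})  # "ATGGCCAAATTTCCCGGG"
isoform_b = splice(gene, {0, 2})     # "ATGGCCCCCGGG"
```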
00:35:12.640 | - So that creates a whole new level of complexity.
00:35:17.200 | 'Cause now-- - Is this random though?
00:35:18.680 | Is it random?
00:35:19.680 | - It's not random.
00:35:20.880 | We, and this is where I think now the appearance
00:35:24.520 | of this modern single cell,
00:35:27.380 | and before that tissue level sequencing,
00:35:31.280 | next generation sequencing techniques such as RNA-Seq,
00:35:34.300 | allows us to see that these are the events
00:35:38.220 | that often happen in response.
00:35:41.300 | It's a dynamic event that happens in response to disease,
00:35:45.680 | or in response to certain developmental stage of a cell.
00:35:51.980 | And this is an incredibly complex layer
00:35:56.820 | that also undergoes, I mean,
00:35:59.820 | because it's at the gene level, right?
00:36:01.560 | So it undergoes certain evolution, right?
00:36:05.420 | And now we have this interplay between
00:36:10.060 | what is happening in the protein world,
00:36:12.740 | and what is happening in the gene and RNA world.
00:36:17.740 | And for example, it's often that we see
00:36:22.720 | that the boundaries of these exons
00:36:27.600 | coincide with the boundaries of the protein domains.
00:36:30.360 | Right, so there is this close interplay to that.
00:36:36.520 | It's not always, I mean,
00:36:37.960 | otherwise it would be too simple, right?
00:36:39.800 | But we do see the connection
00:36:41.880 | between those sort of machineries.
00:36:45.000 | And obviously the evolution will pick up this complexity
00:36:49.760 | and-- - Select for whatever
00:36:52.880 | is successful, whatever is functioning.
00:36:55.040 | - We see that complexity in play.
00:36:57.560 | And makes this question more complex, but more exciting.
00:37:02.280 | - As a small detour, I don't know if you think about this
00:37:05.200 | in the world of computer science,
00:37:07.520 | there's Douglas Hofstadter, I think, who
00:37:11.240 | came up with the name quine,
00:37:14.360 | which are, I don't know if you're familiar with these things,
00:37:16.740 | but it's computer programs that have,
00:37:20.240 | I guess, exon and intron, and they copy,
00:37:23.300 | the whole purpose of the program is to copy itself.
00:37:26.240 | So it prints copies of itself,
00:37:28.480 | but can also carry information inside of it.
00:37:31.000 | That's a very kind of crude, fun exercise of,
00:37:35.440 | can we sort of replicate these ideas from cells,
00:37:40.000 | can we have a computer program that when you run it,
00:37:42.960 | just prints itself, the entirety of itself,
00:37:47.080 | and does it in different programming languages and so on.
00:37:50.040 | I've been playing around and writing them.
00:37:51.960 | It's a kind of fun little exercise.
00:37:53.720 | - You know, when I was a kid,
00:37:54.920 | so it was essentially one of the sort of main stages
00:37:59.920 | in informatics Olympiads that you have to reach
00:38:10.880 | in order to be any good,
00:38:10.880 | is you should be able to write a program
00:38:14.400 | that replicates itself.
00:38:16.680 | And so the task then becomes even sort of more complicated.
00:38:20.920 | So what is the shortest program?
00:38:24.040 | And of course it's a function of a programming language,
00:38:27.480 | but yeah, I remember a long, long, long time ago
00:38:30.920 | when we tried to make it shorter and shorter
00:38:34.840 | and find the shortcuts.
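A minimal quine of the sort discussed here, in Python (one of many possible formulations):

```python
# The two lines below form a quine: run on their own, they print
# their own source code exactly. The string holds a template of the
# program; %r inserts the string's own repr back into itself.
s = 's = %r\nprint(s %% s)'
print(s % s)
```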
00:38:36.560 | - There's actually on Stack Exchange,
00:38:38.640 | there's an entire site called Code Golf, I think,
00:38:43.640 | where the entire thing is just these competitions.
00:38:46.520 | People just come up with whatever task.
00:38:48.680 | I don't know, like write code
00:38:51.680 | that reports the weather today.
00:38:54.640 | And the competition is about,
00:38:57.280 | in whatever programming language,
00:38:58.640 | what is the shortest program?
00:39:00.440 | And it makes you actually, people should check it out
00:39:02.240 | because it makes you realize
00:39:03.600 | there's some weird programming languages out there.
00:39:07.160 | But just to dig on that a little deeper,
00:39:11.720 | do you think, in computer science,
00:39:16.080 | we don't often think about programs.
00:39:19.280 | Just like the machine learning world now,
00:39:21.320 | that's still kind of basic programs.
00:39:26.320 | And then there's humans that replicate themselves, right?
00:39:29.640 | And there's these mutations and so on.
00:39:31.560 | Do you think we'll ever have a world
00:39:34.520 | where there's programs that kind of
00:39:36.360 | have an evolutionary process?
00:39:40.680 | So I'm not talking about evolutionary algorithms,
00:39:42.680 | but I'm talking about programs
00:39:43.760 | that kind of mate with each other and evolve
00:39:46.520 | and on their own replicate themselves.
00:39:49.600 | So this is kind of, the idea here is,
00:39:54.600 | that's how you can have a runaway thing.
00:39:57.120 | So we think about machine learning as a system
00:39:59.240 | that gets smarter and smarter and smarter and smarter.
00:40:01.440 | At least the machine learning systems of today are like,
00:40:05.640 | it's a program that you can turn off,
00:40:09.120 | as opposed to throwing a bunch of little programs out there
00:40:12.800 | and letting them multiply and mate and evolve
00:40:16.400 | and replicate.
00:40:17.480 | Do you ever think about that kind of world
00:40:20.560 | when we jump from the biological systems
00:40:23.440 | that you're looking at to artificial ones?
00:40:27.280 | - I mean, it's almost like you take
00:40:31.000 | the sort of the area of intelligent agents, right?
00:40:34.480 | Which are essentially the independent sort of codes
00:40:38.720 | that run and interact and exchange the information, right?
00:40:42.560 | So I don't see why not.
00:40:45.200 | I mean, it could be sort of a natural evolution
00:40:48.840 | in this area of computer science.
00:40:53.000 | - I think it's kind of an interesting possibility.
00:40:54.720 | It's terrifying too,
00:40:56.000 | but I think it's a really powerful tool.
00:40:58.360 | Like to have agents that,
00:41:00.680 | we have social networks with millions of people
00:41:02.800 | and they interact.
00:41:03.840 | I think it's interesting to inject into that,
00:41:05.720 | there's already injecting into that bots, right?
00:41:08.400 | But those bots are pretty dumb.
00:41:12.520 | They're probably pretty dumb algorithms.
00:41:14.720 | It's interesting to think that there might be bots
00:41:18.640 | that evolve together with humans.
00:41:20.480 | And there's the sea of humans and robots
00:41:23.960 | that are operating first in the digital space.
00:41:26.520 | And then you can also think,
00:41:27.960 | I love the idea, some people worked,
00:41:29.760 | I think at Harvard, at Penn,
00:41:32.600 | there's robotics labs that take as a fundamental task
00:41:38.960 | to build a robot that given extra resources
00:41:42.360 | can build another copy of itself,
00:41:44.840 | like in the physical space,
00:41:46.520 | which is super difficult to do,
00:41:49.360 | but super interesting.
00:41:50.840 | I remember there's like research on robots
00:41:54.000 | that can build a bridge.
00:41:55.200 | So they make a copy of themselves
00:41:56.840 | and they connect themselves.
00:41:57.960 | And so there's like self-building bridge
00:42:00.520 | based on building blocks.
00:42:02.320 | You can imagine like a building that self assembles.
00:42:05.600 | So it's basically self-assembling structures
00:42:07.560 | from robotic parts.
00:42:10.640 | But it's interesting to, within that robot,
00:42:13.880 | add the ability to mutate
00:42:15.480 | and do all the interesting little things
00:42:21.320 | that you're referring to in evolution
00:42:23.200 | to go from a single origin protein building block
00:42:26.320 | to like this weird complex--
00:42:28.920 | - And if you think about this,
00:42:30.320 | I mean, the bits and pieces are there.
00:42:34.600 | So you mentioned the evolutionary algorithm, right?
00:42:37.280 | So this is sort of,
00:42:38.520 | and maybe sort of the goal is in a way different, right?
00:42:43.520 | So the goal is essentially to optimize your search.
00:42:49.840 | So, but sort of the ideas are there.
00:42:53.040 | So people recognize that the recombination events
00:42:58.040 | lead to global changes in the search trajectories,
00:43:03.280 | the mutation events are a more refined step in the search.
00:43:08.280 | Then you have other sort of nature inspired algorithm, right?
00:43:15.040 | So one of the ones that I think is the most fun
00:43:21.480 | is the slime mold-based algorithm, right?
00:43:24.840 | So I think the first was introduced by the Japanese group
00:43:30.200 | where it was able to solve some pretty complex problems.
00:43:35.200 | So that's, and then I think there are still a lot of things
00:43:41.840 | we've yet to borrow from the nature, right?
00:43:48.320 | So there are a lot of sort of ideas
00:43:52.000 | that nature gets to offer us
00:43:56.000 | that it's up to us to grab it
00:43:58.360 | and to get the best use of it.
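The distinction drawn here, recombination as a global jump in the search and mutation as a local refinement, can be sketched with a minimal evolutionary algorithm (the bit-counting fitness function and all parameters are made-up toy choices):

```python
import random

random.seed(0)  # deterministic for illustration

GENOME_LEN = 20

def fitness(genome):
    # Toy objective: count of 1-bits (maximum is GENOME_LEN).
    return sum(genome)

def crossover(a, b):
    # Recombination: a single cut point produces a large, global change.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.05):
    # Mutation: small, local refinements of individual positions.
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children

best = max(population, key=fitness)
```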
00:44:02.160 | - Including neural networks,
00:44:04.160 | that we have a very crude inspiration
00:44:07.040 | from nature on neural networks.
00:44:08.280 | Maybe there's other inspirations to be discovered
00:44:10.920 | in the brain or other aspects of the various systems,
00:44:15.920 | even like the immune system, the way it interplays.
00:44:20.160 | I recently started to understand that the immune system
00:44:23.560 | has something to do with the way the brain operates.
00:44:26.040 | Like there's multiple things going on in there,
00:44:28.360 | which all of which are not modeled
00:44:30.520 | in artificial neural networks.
00:44:32.120 | And maybe if you throw a little bit
00:44:33.960 | of that biological spice in there,
00:44:36.240 | you'll come up with something cool.
00:44:38.980 | I'm not sure if you're familiar with the Drake equation
00:44:43.720 | that estimate, I just did a video on it yesterday
00:44:46.720 | 'cause I wanted to give my own estimate of it.
00:44:49.280 | It's an equation that combines a bunch of factors
00:44:52.360 | to estimate how many alien civilizations there are.
00:44:55.960 | - Oh yeah, I've heard about it, yes.
00:44:58.520 | - So one of the interesting parameters,
00:45:01.160 | you know, it's like how many stars are born every year,
00:45:06.000 | how many planets there are on average per star,
00:45:10.720 | and of those, how many are habitable.
00:45:14.280 | And then the one that starts being really interesting
00:45:17.560 | is the probability that life emerges on a habitable planet.
00:45:24.740 | So like, I don't know if you think about,
00:45:27.920 | you certainly think a lot about evolution,
00:45:29.720 | but do you think about the thing
00:45:31.080 | which evolution doesn't describe,
00:45:32.520 | which is like the beginning of evolution,
00:45:34.920 | the origin of life?
00:45:36.640 | I think I put the probability of life
00:45:38.800 | developing on a habitable planet at 1%.
00:45:41.800 | This is very scientifically rigorous.
00:45:44.440 | Okay, first at a high level for the Drake equation,
00:45:48.760 | what would you put that percent at on earth?
00:45:51.680 | And in general, do you have something,
00:45:55.060 | do you have thoughts about how life might have started?
00:45:58.180 | You know, like the proteins being the first,
00:46:00.660 | kind of one of the early jumping points?
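The Drake equation mentioned here is just a product of factors; a sketch with purely illustrative parameter values (not estimates endorsed in the conversation):

```python
# Drake equation: N = R* · f_p · n_e · f_l · f_i · f_c · L
# All parameter values below are illustrative placeholders.
R_star = 2.0     # rate of star formation in our galaxy (stars/year)
f_p    = 1.0     # fraction of stars with planets
n_e    = 0.2     # habitable planets per star with planets
f_l    = 0.01    # fraction of habitable planets where life emerges (the "1%")
f_i    = 0.1     # fraction of those where intelligence develops
f_c    = 0.1     # fraction releasing detectable signals
L      = 10_000  # years a civilization emits detectable signals

N = R_star * f_p * n_e * f_l * f_i * f_c * L
print(N)  # estimated number of detectable civilizations in the galaxy
```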
00:46:02.900 | - Yeah, so I think back in 2018,
00:46:07.460 | there was a very exciting paper published in Nature
00:46:10.420 | where they found one of the simplest amino acids,
00:46:18.260 | glycine, in comet dust.
00:46:23.260 | So this is, and I apologize if I don't pronounce,
00:46:28.540 | it's a Russian-named comet,
00:46:32.080 | I think Churyumov-Gerasimenko.
00:46:34.760 | This is the comet where, and there was this mission
00:46:39.760 | to get close to this comet
00:46:44.000 | and get the star dust from its tail.
00:46:48.160 | And when scientists analyzed it,
00:46:50.640 | they actually found traces of glycine,
00:46:53.640 | which is one of the 20 basic amino acids
00:47:01.640 | that make up proteins.
00:47:06.020 | So that was kind of very exciting.
00:47:10.960 | But the question is very interesting.
00:47:14.240 | So what, if there is some alien life,
00:47:18.560 | is it gonna be made of proteins, right?
00:47:22.920 | There may be RNAs, right?
00:47:24.360 | So we see that the RNA viruses are certainly
00:47:28.920 | a very well-established sort of group of molecular machines.
00:47:35.400 | Right, so yeah, it's a very interesting question.
00:47:42.120 | - What probability would you put,
00:47:43.580 | like how hard is this,
00:47:45.280 | like how unlikely just on Earth do you think
00:47:48.760 | this whole thing is that we got going?
00:47:51.600 | Like are we really lucky or is it inevitable?
00:47:54.640 | Like what's your sense when you sit back
00:47:56.240 | and think about life on Earth?
00:47:58.840 | Is it higher or lower than 1%?
00:48:01.000 | Well, 'cause 1% is pretty low,
00:48:02.320 | but it's still like, damn, that's a pretty good chance.
00:48:05.060 | - Yes, it's a pretty good chance.
00:48:06.600 | I mean, I would personally, but again,
00:48:10.560 | I'm probably not the best person to do such estimations,
00:48:15.560 | but I would, intuitively, I would probably put it lower.
00:48:21.660 | But still, I mean--
00:48:23.940 | - So we're really lucky here on Earth?
00:48:26.560 | - I mean--
00:48:28.860 | - Or the conditions are really good.
00:48:30.500 | - It's, I think that everything was right in a way, right?
00:48:35.500 | So still, it's not, the conditions were not like ideal
00:48:39.740 | if you try to look at what was several billions years ago
00:48:44.740 | when the life emerged.
00:48:48.340 | - So there is something called the rare Earth hypothesis
00:48:52.060 | which, counter to the Drake equation, says that
00:48:56.180 | the conditions of Earth,
00:49:00.260 | if you actually were to describe Earth,
00:49:03.300 | it's quite a special place, so special it might be unique
00:49:08.060 | in our galaxy and potentially close to unique
00:49:11.780 | in the entire universe.
00:49:12.860 | Like it's very difficult to reconstruct
00:49:14.700 | those same conditions.
00:49:16.340 | And what the rare Earth hypothesis argues
00:49:19.540 | is all those different conditions are essential for life.
00:49:23.060 | And so that's sort of the counter
00:49:26.140 | to all of us
00:49:27.420 | thinking that Earth is pretty average,
00:49:31.700 | I mean, I can't really,
00:49:33.260 | I'm trying to remember to go through all of them,
00:49:35.380 | but just the fact that it is shielded
00:49:38.900 | from a lot of asteroids,
00:49:41.860 | obviously the distance to the sun,
00:49:43.800 | but also the fact that it's like a perfect balance
00:49:48.220 | between the amount of water and land
00:49:52.180 | and all those kinds of things.
00:49:53.660 | I don't know, there's a bunch of different factors
00:49:55.180 | that I don't remember, there's a long list.
00:49:57.540 | But it's fascinating to think about
00:49:59.140 | if in order for something like proteins
00:50:03.620 | and then the DNA and RNA to emerge,
00:50:06.580 | you need, and basic living organisms,
00:50:10.500 | you need to be very close to an Earth-like planet,
00:50:14.960 | which would be sad.
00:50:16.600 | Or exciting, I don't know.
00:50:19.740 | - If you ask me, in a way I put a parallel
00:50:23.220 | between our own research
00:50:28.220 | and from the intuitive perspective.
00:50:34.040 | You have those two extremes,
00:50:36.680 | and the reality is never, very rarely,
00:50:40.400 | falls into the extremes.
00:50:41.920 | It's always, the optimum always lies somewhere in between.
00:50:46.160 | And that's what I tend to think.
00:50:50.040 | I think that we're probably somewhere in between,
00:50:57.000 | so we're not unique-unique,
00:50:57.000 | but again, the chances are reasonably small.
00:51:01.960 | The problem is we don't know the other extremes.
00:51:05.240 | I tend to think that we don't actually understand
00:51:08.040 | the basic mechanisms of what this is all originated from.
00:51:13.040 | It seems like we think of life as this distinct thing,
00:51:16.160 | maybe intelligence as a distinct thing,
00:51:18.520 | maybe the physics from which planets and suns are born
00:51:23.120 | is a distinct thing, but that could be a very,
00:51:25.920 | it's like the Stephen Wolfram thing.
00:51:28.280 | From simple rules emerges greater and greater complexity.
00:51:31.020 | So I tend to believe that just life finds a way.
00:51:34.940 | We don't know the extreme of how common life is,
00:51:39.560 | 'cause it could be life is like everywhere.
00:51:43.660 | Like so everywhere that it's almost like laughable,
00:51:49.440 | like that we're such idiots to think we're,
00:51:52.120 | like it's ridiculous to even think,
00:51:56.280 | it's like ants thinking that their little colony
00:51:59.460 | is the unique thing and everything else doesn't exist.
00:52:03.200 | I mean, it's also very possible that that's the extreme,
00:52:07.520 | and we're just not able to maybe comprehend
00:52:09.880 | the nature of that life.
00:52:12.860 | Just to stick on alien life for just a brief moment more,
00:52:16.560 | there is some signs of life on Venus in gaseous form.
00:52:21.560 | There's hope for life on Mars, probably extinct.
00:52:28.140 | We're not talking about intelligent life.
00:52:30.860 | Although that has been in the news recently.
00:52:33.820 | We're talking about basic like bacteria.
00:52:36.900 | - Plural bacteria.
00:52:37.740 | - Yeah, and then also I guess, there's a couple moons.
00:52:42.180 | - Europa.
00:52:43.020 | - Yeah, Europa, which is Jupiter's moon.
00:52:46.020 | I think there's another one.
00:52:47.460 | Are you, is that exciting or is it terrifying to you
00:52:51.240 | that we might find life?
00:52:52.980 | Do you hope we find life?
00:52:54.460 | - I certainly do hope that we find life.
00:52:57.940 | I mean, it was very exciting to hear about this news
00:53:02.100 | about the possible life on Venus.
00:53:09.260 | - It'd be nice to have hard evidence of something
00:53:11.620 | which is what the hope is for Mars and Europa.
00:53:17.140 | But do you think those organisms
00:53:18.420 | would be similar biologically,
00:53:20.820 | or would they even be sort of carbon-based
00:53:23.980 | if we do find them?
00:53:25.760 | - I would say they would be carbon-based.
00:53:28.940 | How similar, it's a big question, right?
00:53:31.820 | So it's, the moment we discover things outside Earth,
00:53:36.820 | even if it's a tiny little single cell,
00:53:42.400 | I mean, there is so much.
00:53:45.380 | - Just imagine that, that would be so.
00:53:47.620 | - I think that that would be another turning point
00:53:50.460 | for the science.
00:53:52.060 | And if, especially if it's different in some very new way,
00:53:56.200 | that's exciting, 'cause that says,
00:53:58.200 | that's a definitive statement, not a definitive,
00:54:00.480 | but a pretty strong statement
00:54:01.760 | that life is everywhere in the universe.
00:54:05.440 | To me, at least, that's really exciting.
00:54:07.720 | You brought up Joshua Lederberg in an offline conversation.
00:54:13.480 | I think I'd love to talk to you about AlphaFold,
00:54:15.800 | and this might be an interesting way
00:54:17.220 | to enter that conversation, because,
00:54:20.400 | so he won the 1958 Nobel Prize in Physiology and Medicine
00:54:24.520 | for discovering that bacteria can mate and exchange genes,
00:54:29.000 | but he also did a ton of other stuff,
00:54:32.200 | like we mentioned, helping NASA find life on Mars,
00:54:37.200 | and the-- - Dendral.
00:54:41.840 | - Dendral, the chemical expert system,
00:54:45.260 | expert systems, remember those?
00:54:47.920 | Do you, what do you find interesting about this guy
00:54:51.400 | and his ideas about artificial intelligence in general?
00:54:55.000 | - So I have a kind of personal story to share.
00:55:00.000 | So I started my PhD in Canada back in 2000,
00:55:05.160 | and so essentially my PhD was,
00:55:07.760 | so we were developing sort of a new language
00:55:10.120 | for symbolic machine learning,
00:55:12.560 | so it's different from the feature-based machine learning,
00:55:15.120 | and one of the sort of cleanest applications
00:55:19.840 | of this approach, of this formalism,
00:55:24.000 | was to cheminformatics and computer-aided drug design.
00:55:28.600 | Right, so essentially we were,
00:55:31.360 | as a part of my research, I developed a system
00:55:35.680 | that essentially looked at chemical compounds
00:55:40.200 | of, say, the same therapeutic category,
00:55:44.680 | male hormones, right, and tried to figure out
00:55:48.600 | the structural fragments that are,
00:55:53.160 | the structural building blocks that are important,
00:55:56.480 | that define this class,
00:55:58.840 | versus structural building blocks
00:56:00.480 | that are there just because, you know,
00:56:03.400 | to complete the structure,
00:56:04.960 | but they are not essentially the ones
00:56:06.760 | that make up the key chemical properties
00:56:10.760 | of this therapeutic category.
00:56:13.520 | And, you know, for me it was something new.
00:56:17.520 | I was trained as an applied mathematician,
00:56:20.560 | you know, with some machine learning background,
00:56:23.720 | but, you know, computer-aided drug design
00:56:25.800 | was a completely new territory.
00:56:28.400 | So because of that, I often find myself
00:56:32.200 | asking lots of questions on one of the sort of
00:56:35.280 | central forums.
00:56:37.640 | Back then there were, you know,
00:56:39.400 | no Facebooks or stuff like that,
00:56:41.080 | there was-- - What's a forum?
00:56:42.480 | - It's a, you know, it's a forum,
00:56:44.400 | it's essentially it's like a bulletin board.
00:56:46.360 | - Yeah, on the internet. - Yeah, so you essentially,
00:56:49.680 | you have a bunch of people and you post a question
00:56:52.440 | and you get, you know, an answer from,
00:56:54.560 | you know, different people.
00:56:56.000 | And back then, one of the most popular forums was CCL.
00:57:01.000 | I think Computational Chemistry List,
00:57:05.440 | or something like that,
00:57:07.080 | but CCL, that was the forum.
00:57:09.840 | And there I, you know, I--
00:57:12.800 | - Asked a lot of dumb questions.
00:57:14.040 | - Yes, I asked questions, also shared some, you know,
00:57:17.960 | some information about our formalism,
00:57:20.600 | how we do and whether whatever we do makes sense.
00:57:25.080 | And so, you know, and I remember that one of these posts,
00:57:29.160 | I mean, I still remember, you know,
00:57:31.400 | I would call it desperately looking for a chemist's advice,
00:57:39.200 | something like that, right?
00:57:40.720 | And so I post my question, I explained, you know,
00:57:43.960 | how our formalism is, what it does,
00:57:48.960 | and what kind of applications I'm planning to do.
00:57:53.160 | And, you know, and it was, you know,
00:57:55.000 | in the middle of the night and, you know,
00:57:56.880 | I went back to bed and next morning
00:58:01.880 | have a phone call from my advisor
00:58:04.760 | who also looked at this forum.
00:58:06.880 | He's like, "You won't believe who replied to you."
00:58:11.040 | And it's like, "Who?"
00:58:13.880 | He said, "Well, you know, there is a message to you
00:58:16.440 | "from Joshua Lederberg."
00:58:18.040 | And my reaction was like, "Who is Joshua Lederberg?"
00:58:22.800 | (laughing)
00:58:24.840 | - Your advisor hung up.
00:58:26.000 | - So, and essentially, you know,
00:58:29.720 | Joshua wrote me that we had conceptually similar ideas
00:58:34.120 | in the Dendral project.
00:58:36.680 | You may wanna look it up.
00:58:38.240 | And, you know--
00:58:40.160 | - And we should also, sorry, and this is a side comment,
00:58:42.640 | say that even though he won the Nobel Prize
00:58:45.960 | at a really young age, in '58, but so--
00:58:49.320 | - He was, I think, he was, what, 33.
00:58:52.400 | - Yeah, it's just crazy.
00:58:54.000 | So anyway, so that's, so hence in the early 2000s,
00:58:57.640 | responding to young whippersnappers on the CCL forum.
00:59:02.120 | Okay, so--
00:59:02.960 | - And so back then he was already very senior.
00:59:05.840 | I mean, he unfortunately passed away back in 2008.
00:59:09.600 | But, you know, back in 2001, he was, I mean,
00:59:12.560 | he was a professor emeritus at Rockefeller University.
00:59:15.960 | And, you know, that was actually,
00:59:17.720 | believe it or not, one of the reasons
00:59:22.720 | I decided to join, you know, as a postdoc,
00:59:28.160 | the group of Andrej Sali,
00:59:28.160 | who was at Rockefeller University,
00:59:30.800 | with the hope that, you know, that I could actually,
00:59:33.440 | you know, have a chance to meet Joshua in person.
00:59:38.080 | And I met him very briefly, right?
00:59:40.880 | Just because he was walking, you know,
00:59:45.400 | there's a little bridge that connects
00:59:47.600 | the sort of the research campus
00:59:50.040 | with the sort of skyscraper that Rockefeller owns,
00:59:55.040 | where, you know, postdocs and faculty
00:59:58.760 | and graduate students live.
01:00:00.520 | And so I met him, you know,
01:00:02.440 | and had a very short conversation, you know.
01:00:06.320 | But so I started, you know, reading about Dendral,
01:00:10.360 | and I was amazed, you know, it's,
01:00:12.680 | we're talking about 1960, right?
01:00:16.080 | The ideas were so profound.
01:00:19.280 | - Well, what's the fundamental ideas of it?
01:00:21.120 | - The reason he made this is even crazier.
01:00:25.000 | So Lederberg wanted to make a system
01:00:29.840 | that would help him study
01:00:33.440 | the extraterrestrial molecules, right?
01:00:38.440 | So the idea was that, you know,
01:00:40.960 | the way you study the extraterrestrial molecules
01:00:43.400 | is you do the mass spec analysis, right?
01:00:46.560 | And so the mass spec gives you sort of bits,
01:00:49.680 | numbers that essentially
01:00:51.680 | give you ideas about the possible fragments,
01:00:54.760 | or, you know, atoms, you know,
01:00:57.440 | and maybe little fragments,
01:00:59.800 | pieces of this molecule that make up the molecule, right?
01:01:03.600 | So now you need to sort of,
01:01:06.080 | to decompose this information
01:01:09.200 | and to figure out what the whole was
01:01:12.440 | before, you know, it became fragments, bits and pieces, right?
01:01:17.440 | So in order to make this, you know,
01:01:20.840 | to have this tool,
01:01:24.360 | the idea of Lederberg was to connect chemistry,
01:01:28.640 | computer science,
01:01:31.080 | and to design this so-called expert system
01:01:36.080 | that looks, that takes into account,
01:01:38.200 | that takes as an input, the mass spec data,
01:01:42.160 | the possible database of possible molecules,
01:01:47.160 | and essentially try to sort of induce the molecule
01:01:52.640 | that would correspond to this spectra.
01:01:55.600 | Or, you know, essentially,
01:01:57.760 | what this project ended up being
01:02:01.520 | was that, you know, it would provide a list of candidates
01:02:07.080 | that then a chemist would look at
01:02:09.760 | and make final decision, so.
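[Editor's note: the inverse problem Korkin describes, working backwards from mass-spec numbers to candidate molecules, can be sketched as a small search. This is only an illustration of the idea, not Dendral's actual algorithm; the atom set and integer-rounded masses are simplifying assumptions, and real Dendral also used chemical valence rules to prune candidates.]

```python
# Sketch of the Dendral-style inverse problem: given an observed (integer)
# molecular mass, enumerate candidate formulas whose atoms sum to that mass.
ATOM_MASSES = {"C": 12, "H": 1, "N": 14, "O": 16}  # integer-rounded masses

def candidate_formulas(target_mass, atoms=("C", "H", "N", "O")):
    """Enumerate atom counts whose total mass equals target_mass."""
    results = []

    def search(i, remaining, counts):
        if i == len(atoms):
            # Keep exact-mass, non-empty compositions.
            if remaining == 0 and any(counts.values()):
                results.append(dict(counts))
            return
        atom = atoms[i]
        for n in range(remaining // ATOM_MASSES[atom] + 1):
            counts[atom] = n
            search(i + 1, remaining - n * ATOM_MASSES[atom], counts)
        del counts[atom]  # backtrack

    search(0, target_mass, {})
    return results

# Example: formulas with integer mass 46; ethanol (C2H6O) is among them.
cands = candidate_formulas(46)
assert {"C": 2, "H": 6, "N": 0, "O": 1} in cands
```

A chemist would then inspect the surviving candidates and make the final call, which is exactly the human-in-the-loop workflow described above.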
01:02:12.560 | - But the original idea, I suppose,
01:02:13.960 | is to solve the entirety of this problem automatically.
01:02:16.840 | - Yes, yes.
01:02:17.680 | So he, you know,
01:02:20.520 | so he, back then, he--
01:02:24.000 | - '60s. - Approached, yes.
01:02:26.280 | Believe that.
01:02:27.120 | You know, it's amazing.
01:02:28.960 | I mean, it still blows my mind, you know,
01:02:31.080 | that it's, that's,
01:02:32.800 | and this was essentially the origin
01:02:37.400 | of the modern bioinformatics, cheminformatics,
01:02:41.120 | you know, back in '60s.
01:02:42.800 | So that's, you know,
01:02:44.000 | so every time you deal with projects like this,
01:02:48.960 | with the, you know, research like this,
01:02:51.360 | it just, you know,
01:02:52.560 | so the power of the, you know,
01:02:56.360 | intelligence of these people
01:02:58.960 | is just, you know, overwhelming.
01:03:01.720 | - Do you think about expert systems?
01:03:04.200 | Is there, and why they kind of didn't become successful?
01:03:09.200 | Especially in the space of bioinformatics,
01:03:12.520 | where it does seem like there is a lot of expertise
01:03:15.400 | in humans, and, you know,
01:03:18.280 | is it possible to see that a system like this
01:03:21.360 | could be made very useful?
01:03:23.600 | - Right. - And be built up?
01:03:24.440 | - So it's actually, it's a great question,
01:03:26.920 | and this is something, so, you know,
01:03:28.480 | so, you know, at my university,
01:03:31.480 | I teach artificial intelligence,
01:03:33.920 | and, you know, we start,
01:03:36.440 | my first two lectures are on the history of AI.
01:03:40.120 | And there we, you know, we try to, you know,
01:03:44.360 | go through the main stages of AI,
01:03:48.200 | and so, you know, the question of why expert systems
01:03:53.080 | failed or became obsolete,
01:03:57.120 | it's actually a very interesting one.
01:03:58.520 | And there are, you know, if you try to read the,
01:04:01.680 | you know, the historical perspectives,
01:04:03.320 | there are actually two lines of thoughts.
01:04:05.520 | One is that they were essentially
01:04:10.520 | not up to the expectations,
01:04:14.840 | and so therefore they were replaced, you know,
01:04:18.040 | by other things, right?
01:04:21.160 | The other one was that completely opposite one,
01:04:25.320 | that they were too good.
01:04:28.160 | And as a result, they essentially
01:04:31.440 | became sort of a household name,
01:04:33.180 | and then essentially they got transformed.
01:04:37.120 | I mean, in both cases, sort of the outcome was the same.
01:04:40.680 | They evolved into something.
01:04:42.400 | - Yeah. - Right?
01:04:43.760 | And that's what I, you know, if--
01:04:46.040 | - That's interesting. - If I look at this, right?
01:04:47.680 | So the modern machine learning, right?
01:04:50.200 | So-- - So there's echoes
01:04:52.000 | in the modern machine learning to that.
01:04:53.240 | - I think so, I think so, because, you know,
01:04:55.320 | if you think about this, you know,
01:04:57.200 | and how we design, you know,
01:04:59.640 | the most successful algorithms,
01:05:02.480 | including AlphaFold, right?
01:05:04.120 | You built in the knowledge about the domain
01:05:08.040 | that you study, right?
01:05:09.920 | So you built in your expertise.
01:05:12.860 | - So speaking of AlphaFold, so DeepMind's AlphaFold2
01:05:16.520 | recently was announced to have,
01:05:18.920 | quote unquote, "solved protein folding."
01:05:21.400 | How exciting is this to you?
01:05:24.160 | It seems to be one of the exciting things
01:05:28.280 | that have happened in 2020.
01:05:29.640 | It's an incredible accomplishment from the looks of it.
01:05:32.280 | What part of it is amazing to you?
01:05:33.840 | What part would you say is overhyped
01:05:36.280 | or maybe misunderstood?
01:05:39.000 | - It's definitely a very exciting achievement.
01:05:41.880 | To give you a little bit of perspective, right?
01:05:43.760 | So in bioinformatics, we have several competitions.
01:05:48.760 | And so the way, you know, you often hear
01:05:53.000 | how those competitions have been explained
01:05:56.160 | to sort of to non-bioinformaticians is that,
01:05:59.560 | you know, they call it bioinformatics Olympic games.
01:06:01.800 | And there are several disciplines, right?
01:06:03.600 | So the historical one of the first one
01:06:07.040 | was the discipline in predicting the protein structure,
01:06:10.280 | predicting the 3D coordinates of the protein.
01:06:12.560 | But there are some others.
01:06:13.600 | So the predicting protein functions,
01:06:16.760 | predicting effects of mutations on protein functions,
01:06:21.480 | then predicting protein-protein interactions.
01:06:24.920 | So the original one was CASP,
01:06:28.080 | or Critical Assessment of protein Structure Prediction.
01:06:31.520 | And the, you know, typically what happens
01:06:40.000 | during these competitions is, you know, scientists,
01:06:43.960 | experimental scientists solve the structures,
01:06:48.360 | but don't put them into the Protein Data Bank,
01:06:51.680 | which is the centralized database
01:06:53.720 | that contains all the 3D coordinates.
01:06:57.240 | Instead, they hold it and release protein sequences.
01:07:02.240 | And now the challenge of the community
01:07:05.400 | is to predict the 3D structures of these proteins,
01:07:10.160 | and then use the experimentally solved structures
01:07:12.920 | to assess which one is the closest one, right?
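[Editor's note: the assessment step described here, scoring a predicted structure against the withheld experimental one, is done in CASP with measures such as GDT_TS; a simpler, related score is RMSD after optimal superposition. A minimal sketch of that, using the Kabsch algorithm on toy coordinates:]

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal
    superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                   # center both structures
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)        # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation
    diff = P @ R.T - Q                       # residual after superposition
    return np.sqrt((diff ** 2).sum() / len(P))

# Toy check: a "prediction" that is the experimental structure rotated
# 90 degrees about z superposes perfectly, so RMSD is ~0.
P = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1], [2, 1, 0]])
Rz = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
Q = P @ Rz.T
```

The experimentally solved structure plays the role of `Q`; each group's prediction plays the role of `P`, and the lowest-scoring predictions are the closest ones.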
01:07:16.600 | - And this competition, by the way,
01:07:17.760 | just a bunch of different tangents.
01:07:19.520 | And maybe you can also say, what is protein folding?
01:07:22.840 | Then this competition, CASP competition,
01:07:25.000 | has become the gold standard,
01:07:27.440 | and that's what was used to say
01:07:29.500 | that protein folding was solved.
01:07:32.440 | So I just added a little, yeah, just a bunch.
01:07:35.320 | - So if you can, whenever you say stuff,
01:07:37.680 | maybe throw in some of the basics
01:07:39.400 | for the folks that might be outside of the field.
01:07:41.560 | Anyway, sorry.
01:07:42.400 | - So, yeah, so, you know, so the reason it's, you know,
01:07:45.920 | it's relevant to our understanding of protein folding
01:07:50.280 | is because, you know, we've yet to learn
01:07:54.160 | how the folding mechanistically works, right?
01:07:58.140 | So there are different hypotheses
01:08:00.740 | what happens to this fold.
01:08:02.800 | For example, there is a hypothesis
01:08:05.960 | that the folding happens by, you know,
01:08:09.780 | also in the modular fashion, right?
01:08:12.640 | So that, you know, we have protein domains
01:08:16.220 | that get folded independently
01:08:17.920 | because their structure is stable,
01:08:19.720 | and then the whole protein structure gets formed.
01:08:23.360 | But, you know, within those domains,
01:08:25.360 | we also have so-called secondary structure,
01:08:27.460 | the small alpha helices, beta sheets.
01:08:29.800 | So these are, you know, elements that are structurally stable.
01:08:34.320 | And so, and the question is, you know,
01:08:37.800 | when do they get formed?
01:08:40.320 | Because some of the secondary structure elements,
01:08:42.560 | you have to have, you know, a fragment in the beginning
01:08:46.480 | and say the fragment in the middle, right?
01:08:49.400 | So you cannot potentially start
01:08:52.840 | having the full fold from the get-go, right?
01:08:57.120 | So it's still, you know, it's still a big enigma,
01:09:00.320 | what happens.
01:09:01.440 | We know that it's an extremely efficient
01:09:04.260 | and stable process, right?
01:09:05.680 | - So there's this long sequence,
01:09:07.640 | and the fold happens really quickly.
01:09:09.520 | - Exactly.
01:09:10.360 | - So that's really weird, right?
01:09:11.200 | And it happens like the same way almost every time.
01:09:15.120 | - Exactly, exactly, right?
01:09:16.600 | - That's really weird.
01:09:17.880 | That's freaking weird.
01:09:19.080 | - It's, yeah, that's why it's such an amazing thing.
01:09:22.920 | But most importantly, right, so it's, you know,
01:09:24.960 | so when you see the translation process, right,
01:09:29.240 | so when you don't have the whole protein translated, right,
01:09:34.240 | it's still being translated, you know,
01:09:39.200 | getting out from the ribosome,
01:09:41.200 | you already see some structural, you know, fragmentation.
01:09:45.760 | So folding starts happening
01:09:49.280 | before the whole protein gets produced, right?
01:09:53.040 | And so this is obviously, you know,
01:09:55.080 | one of the biggest questions in, you know,
01:09:59.240 | in modern molecular biologies.
01:10:00.960 | - Not like maybe what happens,
01:10:04.160 | like that's not, that's bigger than the question of folding.
01:10:07.880 | That's the question of like,
01:10:09.560 | so like deeper fundamental idea of folding.
01:10:12.440 | - Yes. - Behind folding.
01:10:13.400 | - Exactly, exactly.
01:10:14.640 | So, you know, so obviously if we are able to predict
01:10:21.320 | the end product of protein folding,
01:10:24.040 | we are one step closer to understanding
01:10:27.640 | sort of the mechanistics of the protein folding.
01:10:30.200 | Because we can then potentially look and start probing
01:10:34.680 | what are the critical parts of this process
01:10:38.200 | and what are not so critical parts of this process.
01:10:41.200 | So we can start decomposing this, you know,
01:10:44.440 | so in a way, this protein structure prediction algorithm
01:10:50.120 | can be used as a tool, right?
01:10:53.720 | So you change the, you know, you modify the protein,
01:10:58.720 | you get back to this tool, it predicts,
01:11:02.400 | okay, it's completely unstable.
01:11:05.000 | - Yeah, which aspects of the input
01:11:07.840 | will have a big impact on the output.
01:11:09.880 | - Exactly, exactly.
01:11:11.200 | So what happens is, you know,
01:11:13.360 | we typically have some sort of incremental advancement.
01:11:18.720 | You know, each stage of this CASP competition,
01:11:22.600 | you have groups with incremental advancement.
01:11:25.320 | And, you know, historically, the top performing groups
01:11:29.840 | were, you know, they were not using machine learning.
01:11:34.400 | They were using very advanced biophysics
01:11:37.720 | combined with bioinformatics,
01:11:39.600 | combined with, you know, the data mining.
01:11:43.220 | And that was, you know, that would enable them
01:11:47.360 | to obtain protein structures of those proteins
01:11:52.360 | that don't have any structurally solved relatives.
01:11:57.520 | Because, you know, if we have another protein,
01:12:01.860 | say the same protein, but coming from a different species,
01:12:05.640 | we could potentially derive some ideas,
01:12:10.440 | and that's so-called homology or comparative modeling,
01:12:13.200 | where we'll derive some ideas
01:12:15.280 | from the previously known structures.
01:12:17.360 | And that would help us tremendously in, you know,
01:12:21.440 | in reconstructing the 3D structure overall.
01:12:25.400 | But what happens when we don't have these relatives?
01:12:27.900 | This is when it becomes really, really hard, right?
01:12:31.200 | So that's so-called de novo, you know,
01:12:35.240 | de novo protein structure prediction.
01:12:37.560 | And in this case,
01:12:39.040 | those methods were traditionally very good.
01:12:43.040 | But what happened in the last year,
01:12:46.280 | the original AlphaFold came in,
01:12:49.540 | and all of a sudden, it's much better than everyone else.
01:12:55.640 | - This is 2018.
01:12:57.920 | - Yeah.
01:12:58.760 | - Oh, and the competition is only every two years, I think.
01:13:02.160 | - And then, so, you know, it was sort of
01:13:06.600 | kind of of a shockwave to the bioinformatics community
01:13:10.160 | that we have like a state-of-the-art machine learning system
01:13:15.160 | that does structure prediction.
01:13:18.440 | And essentially what it does, you know,
01:13:20.760 | so if you look at this, it actually predicts the contacts.
01:13:25.760 | So, you know, so the process of reconstructing
01:13:29.480 | the 3D structure starts by predicting the contacts
01:13:34.480 | between the different parts of the protein.
01:13:38.880 | And the contacts are essentially the parts of the proteins
01:13:40.960 | that are in the close proximity to each other.
01:13:43.240 | - Right, so it's actually,
01:13:44.680 | the machine learning part seems to be estimating,
01:13:47.800 | you can correct me if I'm wrong here,
01:13:51.080 | but it seems to be estimating the distance matrix,
01:13:53.200 | which is like the distance between the different parts.
01:13:55.880 | - Yeah, so we call it a contact map.
01:13:58.080 | - Contact map.
01:13:58.920 | - Right, so once you have the contact map,
01:14:00.600 | the reconstruction is becoming more straightforward.
01:14:03.920 | - Yeah.
01:14:04.760 | - Right, but so the contact map is the key.
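[Editor's note: a contact map is easiest to see in the reverse, easy direction, computing it from known coordinates. Conventionally, residues i and j are "in contact" if their representative atoms are within a cutoff, often 8 Å; prediction then works the other way, inferring this map from sequence and reconstructing coordinates from it. A sketch with invented coordinates:]

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Binary contact map from (N, 3) residue coordinates:
    entry [i, j] is True if residues i and j are within `cutoff` angstroms."""
    diff = coords[:, None, :] - coords[None, :, :]  # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # (N, N) distance matrix
    return dist < cutoff

# Toy example: three residues on a line, 5 angstroms apart.
coords = np.array([[0.0, 0, 0], [5.0, 0, 0], [10.0, 0, 0]])
cm = contact_map(coords)
assert cm[0, 1] and cm[1, 2] and not cm[0, 2]  # ends are 10 A apart
```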
01:14:06.800 | And so, you know, so that's what happened.
01:14:11.280 | And now we started seeing in this current stage, right,
01:14:16.000 | where in the most recent one,
01:14:18.480 | we started seeing the emergence of these ideas
01:14:22.040 | in other people works, right?
01:14:25.080 | But yet here's, you know, AlphaFold2,
01:14:29.520 | that again outperforms everyone else.
01:14:33.400 | And also by introducing yet another wave
01:14:35.800 | of the machine learning ideas.
01:14:38.640 | - Yeah, there does seem to be also an incorporation.
01:14:41.280 | First of all, the paper's not out yet,
01:14:43.040 | but there's a bunch of ideas already out.
01:14:44.880 | There does seem to be an incorporation of this other thing.
01:14:48.120 | I don't know if it's something that you could speak to,
01:14:50.160 | which is like the incorporation of like other structures,
01:14:55.160 | like evolutionary similar structures
01:15:01.720 | that are used to kind of give you hints.
01:15:03.840 | - Yes, so evolutionary similarity is something
01:15:08.360 | that we can detect at different levels, right?
01:15:10.760 | So we know, for example, that the structure of proteins
01:15:15.760 | is more conserved than the sequence.
01:15:20.520 | The sequence could be very different,
01:15:22.320 | but the structural shape is actually still very conserved.
01:15:26.320 | So that's sort of the intrinsic property that, you know,
01:15:28.880 | in a way related to protein folds, you know,
01:15:31.720 | to the evolution of the protein,
01:15:35.400 | of proteins and protein domains, et cetera.
01:15:37.800 | But we know that, I mean, there've been multiple studies.
01:15:41.040 | And, you know, ideally if you have structures,
01:15:45.320 | you know, you should use that information.
01:15:48.560 | However, sometimes we don't have this information.
01:15:51.200 | Instead, we have a bunch of sequences.
01:15:53.200 | Sequences we have a lot, right?
01:15:54.880 | So we have, you know, hundreds, thousands of, you know,
01:15:59.880 | different organisms sequenced, right?
01:16:02.680 | And by taking the same protein,
01:16:05.680 | but in different organisms and aligning it,
01:16:09.800 | so making it, you know,
01:16:12.200 | making the corresponding positions aligned,
01:16:15.360 | we can actually say a lot about sort of what is conserved
01:16:20.360 | in this protein and therefore, you know,
01:16:24.040 | structure more stable, what is diverse in this protein.
01:16:27.240 | So on top of that, we could provide sort of the information
01:16:30.880 | about the sort of the secondary structure of this protein,
01:16:34.320 | et cetera, et cetera.
01:16:35.160 | So this information is extremely useful
01:16:37.560 | and it's already there.
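[Editor's note: the idea just described, aligning the same protein across organisms and reading off which positions are conserved and therefore likely structurally important, can be sketched as a per-column score. The toy alignment below is invented purely for illustration; real pipelines use entropy-style scores over large alignments.]

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue, per column."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    scores = []
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in alignment)
        scores.append(counts.most_common(1)[0][1] / n_seqs)
    return scores

# Toy MSA: the same (invented) fragment from four organisms, pre-aligned
# so that corresponding positions line up.
msa = ["MKVLA",
       "MKVIA",
       "MKVLG",
       "MRVLA"]
scores = column_conservation(msa)
assert scores[0] == 1.0   # 'M' fully conserved across organisms
assert scores[1] == 0.75  # 'K' in 3 of 4 sequences
```

Highly conserved columns are the ones a structure predictor can treat as more constrained, which is exactly the signal being discussed here.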
01:16:39.880 | So while it's tempting to, you know,
01:16:42.840 | to do a complete ab initio,
01:16:44.760 | so you just have a protein sequence and nothing else,
01:16:48.200 | the reality is such that we are overwhelmed with this data.
01:16:52.880 | So why not use it?
01:16:55.200 | - Mm-hmm.
01:16:56.520 | - And so, yeah, so I'm looking forward
01:16:59.240 | to reading this paper.
01:17:01.480 | - It does seem to, like they've,
01:17:03.440 | in the previous version of AlphaFold, they didn't,
01:17:05.920 | for this evolutionary similarity thing,
01:17:09.760 | they didn't use machine learning for that.
01:17:12.020 | Or rather, they used it as like the input
01:17:15.600 | to the entirety of the neural net,
01:17:17.880 | like the features derived from the similarity.
01:17:22.000 | It seems like there's some kind of, quote unquote,
01:17:24.640 | iterative thing where it seems to be part of the learning
01:17:29.640 | process is the incorporation of this evolutionary similarity.
01:17:34.240 | - Yeah, I don't think there is a bioRxiv paper, right?
01:17:36.920 | - There's nothing. - No, nothing.
01:17:38.520 | - There's a blog post that's written
01:17:40.680 | by a marketing team, essentially.
01:17:42.320 | - Yeah. - Which, you know,
01:17:43.640 | it has some scientific similarity probably
01:17:48.640 | to the actual methodology used,
01:17:51.800 | but it could be, it's like interpreting scripture.
01:17:55.240 | It could be just poetic interpretations
01:17:58.800 | of the actual work as opposed
01:18:00.040 | to direct connection to the work.
01:18:01.920 | - So now, speaking about protein folding, right?
01:18:04.280 | So, you know, in order to answer the question
01:18:06.800 | whether or not we have solved this, right?
01:18:09.440 | So we need to go back to the beginning of our conversation,
01:18:13.280 | you know, with the realization that, you know,
01:18:15.000 | an average protein is, that typically what CASP
01:18:21.040 | has been focusing on is, you know,
01:18:24.080 | this competition has been focusing on the single,
01:18:27.200 | maybe two domain proteins that are still very compact.
01:18:31.000 | And even those ones are extremely challenging to solve,
01:18:34.860 | right, but now we talk about, you know,
01:18:37.640 | an average protein that has two, three protein domains.
01:18:42.400 | If you look at the proteins that are in charge of the,
01:18:47.480 | you know, of the process with the neural system, right,
01:18:51.500 | perhaps one of the most recently evolved sort of systems
01:18:57.720 | in the organism, right?
01:19:04.120 | All of them, well, the majority of them
01:19:06.320 | are highly multi-domain proteins.
01:19:09.000 | So they are, you know, some of them have five, six, seven,
01:19:13.520 | you know, and more domains, right?
01:19:16.840 | And, you know, we are very far away from understanding
01:19:20.720 | how these proteins are folded.
01:19:22.400 | - So the complexity of the protein matters here,
01:19:24.440 | the complexity of the protein modules
01:19:27.960 | or the protein domains.
01:19:30.240 | So you're saying solved, so the definition of solved here
01:19:36.000 | is particularly the cast competition,
01:19:38.640 | achieving human level, not human level,
01:19:41.760 | achieving experimental level performance
01:19:45.640 | on these particular sets of proteins
01:19:48.520 | that have been used in these competitions.
01:19:50.280 | - Well, I mean, you know, I do think that, you know,
01:19:54.680 | especially with regards to AlphaFold, you know,
01:19:57.760 | it is able to, you know, to solve, you know,
01:20:02.760 | at the near experimental level,
01:20:07.360 | pretty big majority of the more compact proteins,
01:20:15.000 | like, or protein domains, because, again,
01:20:17.480 | in order to understand how the overall protein,
01:20:21.200 | you know, multi-domain protein fold,
01:20:24.640 | we do need to understand the structure
01:20:26.840 | of its individual domains.
01:20:28.760 | - I mean, unlike if you look at AlphaZero
01:20:31.120 | or like MuZero, if you look at that work,
01:20:36.120 | you know, it's nice, reinforcement learning,
01:20:39.520 | self-playing mechanisms are nice
01:20:41.120 | 'cause it's all in simulation,
01:20:42.400 | so you can learn from just huge amounts.
01:20:45.920 | Like, you don't need data.
01:20:47.360 | It was like the problem with proteins,
01:20:49.760 | like the size, I forget how many 3D structures
01:20:54.560 | have been mapped, but the training data is very small,
01:20:56.960 | no matter what, it's like millions,
01:20:59.040 | maybe a one or two million, something like that,
01:21:01.400 | but it's some very small number,
01:21:02.920 | but like, it doesn't seem like that's scalable.
01:21:06.000 | There has to be, I don't know,
01:21:09.360 | it feels like you want to somehow 10X the data
01:21:13.120 | or 100X the data somehow.
01:21:15.720 | - Yes, but we also can take advantage
01:21:18.720 | of homology models, right?
01:21:23.080 | So the models that are of very good quality
01:21:26.760 | because they are essentially obtained
01:21:30.680 | based on the evolutionary information, right?
01:21:33.760 | So you can, there is a potential to enhance this information
01:21:38.560 | and use it again to empower the training set.
01:21:43.560 | And it's, I think, I am actually very optimistic.
01:21:53.600 | I think it's been one of these sort of, you know,
01:21:58.760 | turning-point events where you have a system
01:22:07.160 | that is, you know, a machine learning system
01:22:11.320 | that is truly better than the sort of the more conventional
01:22:15.760 | biophysics-based methods.
01:22:17.960 | - That's a huge leap.
01:22:19.400 | This is one of those fun questions,
01:22:21.320 | but where would you put it in the ranking
01:22:26.320 | of the greatest breakthroughs
01:22:28.560 | in artificial intelligence history?
01:22:30.500 | So like, okay, so let's see who's in the running.
01:22:34.980 | Maybe you can correct me.
01:22:35.880 | So you got like AlphaZero and AlphaGo
01:22:39.960 | beating the world champion at the game of Go.
01:22:44.520 | Thought to be impossible like 20 years ago,
01:22:48.240 | or at least the AI community was highly skeptical.
01:22:51.380 | Then you got like also Deep Blue original Kasparov.
01:22:55.120 | You have deep learning itself, like the,
01:22:56.960 | maybe what would you say, the AlexNet ImageNet moment.
01:23:01.000 | So the first, you know,
01:23:02.120 | network achieving human level performance,
01:23:04.840 | superhuman, no, that's not true.
01:23:07.880 | Achieving like a big leap in performance
01:23:11.040 | on the computer vision problem.
01:23:12.760 | There is OpenAI, the whole like GPT-3,
01:23:19.000 | that whole space of transformers and language models,
01:23:23.060 | just achieving this incredible performance
01:23:27.160 | of application of neural networks to language models.
01:23:30.240 | Boston Dynamics, pretty cool.
01:23:33.560 | Like robotics, even though people are like,
01:23:35.960 | there's no AI, no, no, no.
01:23:38.760 | There's no machine learning currently,
01:23:41.560 | but AI is much bigger than machine learning.
01:23:44.560 | So that just the engineering aspect,
01:23:48.920 | I would say it's one of the greatest accomplishments
01:23:50.840 | in engineering side, engineering meaning
01:23:53.920 | like mechanical engineering of robotics ever.
01:23:58.040 | Then of course, autonomous vehicles,
01:23:59.560 | you can argue for Waymo,
01:24:01.320 | which is like the Google self-driving car,
01:24:03.600 | or you can argue for Tesla,
01:24:05.480 | which is like actually being used
01:24:07.920 | by hundreds of thousands of people
01:24:09.880 | on the road today, machine learning system.
01:24:12.040 | And I don't know if you can, what else is there?
01:24:17.600 | But I think that's it.
01:24:18.440 | So, and then AlphaFold, many people are saying
01:24:20.960 | is up there, potentially number one.
01:24:23.360 | Would you put them at number one?
01:24:24.860 | - Well, in terms of the impact on the science
01:24:30.200 | and on the society beyond, it's definitely,
01:24:34.080 | to me would be one of the--
01:24:35.840 | - Top three?
01:24:38.480 | - I mean, I'm probably not the best person to answer that.
01:24:44.640 | But I do have, I remember my,
01:24:50.480 | back in, I think 1997, when Deep Blue
01:24:56.400 | beat Kasparov, it was, I mean, it was a shock.
01:25:01.400 | I mean, it was, and I think for the,
01:25:04.280 | for the pretty substantial part of the world,
01:25:13.000 | that especially people who have some experience
01:25:18.240 | with chess, right, and realizing how incredibly human
01:25:25.080 | this game, how much of a brain power you need
01:25:30.080 | to reach those levels of grandmasters, right, level.
01:25:35.800 | - Yeah, and it's probably one of the first time,
01:25:37.920 | and how good Kasparov was.
01:25:39.760 | - And again, yeah, so Kasparov's arguably
01:25:42.280 | one of the best ever, right?
01:25:45.560 | And you get a machine that beats him, right?
01:25:48.080 | So it's--
01:25:48.920 | - First time a machine probably beat a human
01:25:50.760 | at that scale of a thing, of anything.
01:25:53.720 | - Yes, yes, so that was, to me, that was like,
01:25:56.480 | you know, one of the groundbreaking events
01:25:59.400 | in the history of AI.
01:26:00.600 | - Yeah, it's probably number one.
01:26:01.920 | - As probably, like, we don't, it's hard to remember.
01:26:05.440 | It's like Muhammad Ali versus, I don't know,
01:26:08.120 | any other, Mike Tyson, something like that.
01:26:09.880 | It's like, nah, you gotta put Muhammad Ali at number one.
01:26:12.760 | Same with Deep Blue, even though
01:26:16.240 | it's not machine learning based.
01:26:17.980 | Still, it uses advanced search,
01:26:21.520 | and search is the integral part of AI, right?
01:26:24.080 | So as you said, it's--
01:26:25.400 | - People don't think of it that way at this moment.
01:26:27.640 | In Vogue, currently, search is not seen
01:26:30.880 | as a fundamental aspect of intelligence,
01:26:34.240 | but it very well, it very likely is.
01:26:37.680 | In fact, I mean, that's what neural networks are,
01:26:39.640 | is they're just performing search
01:26:41.240 | on the space of parameters.
01:26:42.880 | And it's all search. (laughs)
01:26:45.520 | All of intelligence is some form of search,
01:26:47.760 | and you just have to become clever and clever
01:26:49.640 | at that search problem.
01:26:50.920 | - And I also have another one that you didn't mention
01:26:54.000 | that's one of my favorite ones.
01:26:57.440 | So you probably heard of this.
01:27:00.880 | I think it's called Deep Rembrandt.
01:27:03.440 | It's the project where they trained,
01:27:06.600 | I think there was a collaboration between
01:27:08.840 | the experts in Rembrandt painting in Netherlands,
01:27:15.280 | and a group, an artificial intelligence group,
01:27:18.320 | where they train an algorithm
01:27:20.200 | to replicate the style of the Rembrandt,
01:27:23.000 | and they actually printed a portrait
01:27:27.000 | that never existed before in the style of Rembrandt.
01:27:31.160 | I think they printed it on a sort of,
01:27:36.760 | on the canvas that, using pretty much
01:27:40.000 | same types of paints and stuff.
01:27:42.560 | To me, it was mind-blowing.
01:27:44.080 | - Yeah.
01:27:44.920 | - It's--
01:27:45.740 | - In the space of art, that's interesting.
01:27:46.880 | There hasn't been, maybe that's it,
01:27:50.080 | but I think there hasn't been an ImageNet
01:27:53.600 | moment yet in the space of art.
01:27:55.840 | You haven't been able to achieve superhuman level
01:27:59.600 | performance in the space of art,
01:28:01.420 | even though there was this big famous thing
01:28:04.660 | where a piece of art was purchased,
01:28:07.680 | I guess, for a lot of money.
01:28:08.680 | - Yes.
01:28:09.520 | - Yeah, but it's still, people are like,
01:28:12.360 | in the space of music at least,
01:28:15.600 | that's, it's clear that human-created pieces
01:28:19.720 | are much more popular.
01:28:21.680 | So there hasn't been a moment where it's like,
01:28:24.400 | oh, this is, we're now, I would say in the space of music,
01:28:28.760 | what makes a lot of money, we're talking about serious money,
01:28:32.080 | it's music and movies, or like shows and so on,
01:28:35.280 | and entertainment.
01:28:36.600 | There hasn't been a moment where AI created,
01:28:40.000 | AI was able to create a piece of music
01:28:44.440 | or a piece of cinema, like Netflix show,
01:28:49.440 | that is, that's sufficiently popular to make a ton of money.
01:28:54.840 | - Yeah.
01:28:56.120 | - And that moment would be very, very powerful,
01:28:58.960 | 'cause that's like, that's an AI system
01:29:01.560 | being used to make a lot of money.
01:29:03.040 | And like direct, of course, AI tools,
01:29:05.480 | like even Premiere, audio editing, all the editing,
01:29:07.920 | everything I do, to edit this podcast,
01:29:09.760 | there's a lot of AI involved.
01:29:11.640 | Actually, there's this program,
01:29:13.280 | I wanna talk to those folks just 'cause I wanna nerd out,
01:29:15.560 | it's called iZotope, I don't know if you're familiar with it.
01:29:18.080 | They have a bunch of tools of audio processing,
01:29:20.160 | and they have, I think they're Boston-based.
01:29:23.080 | Just, it's so exciting to me to use it,
01:29:26.360 | like on the audio here, 'cause it's all machine learning.
01:29:30.360 | It's not, 'cause most audio production stuff
01:29:35.360 | is like any kind of processing you do,
01:29:37.520 | it's very basic signal processing.
01:29:39.520 | And you're tuning knobs and so on.
01:29:41.960 | They have all of that, of course,
01:29:43.560 | but they also have all of this machine learning stuff,
01:29:46.000 | like where you actually give it training data.
01:29:48.520 | You select parts of the audio you train on,
01:29:51.440 | you train on it, and it figures stuff out.
01:29:56.360 | It's great, it's able to detect,
01:29:58.080 | like the ability of it to be able to separate voice
01:30:03.320 | and music, for example, or voice in anything, is incredible.
01:30:07.240 | Like it just, it's clearly exceptionally good
01:30:11.160 | at applying these different neural networks models
01:30:14.920 | to separate the different kinds of signals from the audio.
01:30:18.920 | Okay, so that's really exciting.
01:30:22.280 | Photoshop, Adobe people also use it.
01:30:24.600 | But to generate a piece of music
01:30:28.280 | that will sell millions, a piece of art, yeah.
01:30:31.960 | - No, I agree, and that's,
01:30:38.640 | as I mentioned, I offer my AI class,
01:30:41.160 | and an integral part of this is the project, right?
01:30:44.640 | So it's my favorite, ultimate favorite part,
01:30:47.320 | because typically we have these project presentations
01:30:51.360 | the last two weeks of the class,
01:30:53.480 | it's right before the Christmas break,
01:30:56.160 | and it's sort of, it adds this cool excitement.
01:31:00.320 | And every time, I'm amazed with some projects
01:31:05.520 | that people come up with.
01:31:08.720 | And so, and quite a few of them are actually,
01:31:13.120 | they have some link to arts.
01:31:17.760 | I mean, I think last year, we had a group
01:31:22.160 | who designed an AI producing haikus, Japanese poems.
01:31:27.160 | - Oh, wow.
01:31:29.400 | - And some of them, so it got trained
01:31:34.120 | on the English base. - Haikus?
01:31:35.680 | - Haikus, haikus, right?
01:31:37.280 | So, and some of them, they get to present
01:31:42.280 | like the top selection.
01:31:44.240 | They were pretty good.
01:31:45.160 | I mean, of course, I'm not a specialist,
01:31:47.840 | but you read them, and you see--
01:31:50.280 | - It seems profound.
01:31:51.440 | - Yes, yeah, it seems reasonable, so it's kind of cool.
01:31:54.920 | We also had a couple of projects
01:31:57.840 | where people tried to teach AI how to play like rock music,
01:32:03.520 | classical music, I think, and popular music.
01:32:08.400 | Interestingly enough, classical music
01:32:14.240 | was among the most difficult ones.
01:32:16.600 | Of course, if you look at the grandmasters of music,
01:32:25.160 | like Bach, right?
01:32:31.080 | So, there is a lot of almost math.
01:32:34.840 | - Yeah, well, he's very mathematical.
01:32:36.600 | - Exactly, so this is, I would imagine
01:32:39.040 | that at least some style of this music could be picked up,
01:32:43.800 | but then you have this completely different spectrum
01:32:46.960 | of classical composers, and so it's almost like,
01:32:51.960 | you don't have to sort of look at the data.
01:32:56.800 | You just listen to it, and say, "Nah, that's not it,
01:33:00.560 | "not yet." - That's not it.
01:33:01.600 | Yeah, that's how I feel, too.
01:33:03.360 | OpenAI has, I think, MuseNet
01:33:05.840 | or something like that, the system.
01:33:07.560 | It's cool, but it's like, eh,
01:33:09.760 | it's not compelling for some reason.
01:33:12.080 | It could be a psychological reason, too.
01:33:14.200 | Maybe we need to have a human being,
01:33:17.560 | a tortured soul behind the music, I don't know.
01:33:20.680 | - Yeah, no, absolutely, I completely agree.
01:33:23.940 | But yeah, whether or not we'll have,
01:33:26.560 | one day we'll have a song written by an AI engine
01:33:31.560 | to be in top charts, musical charts,
01:33:37.960 | I wouldn't be surprised.
01:33:39.300 | I wouldn't be surprised.
01:33:41.720 | - I wonder if we already have one,
01:33:44.720 | and it just hasn't been announced.
01:33:46.400 | (both laugh)
01:33:48.000 | We wouldn't know.
01:33:49.960 | How hard is the multi-protein folding problem?
01:33:53.920 | Is that something you've already mentioned,
01:33:57.080 | which is baked into this idea
01:33:58.720 | of greater and greater complexity of proteins?
01:34:01.160 | Like multi-domain proteins,
01:34:03.280 | is that basically become multi-protein complexes?
01:34:08.280 | - Yes, you got it right.
01:34:10.640 | So it has the components of both,
01:34:15.640 | of protein folding and protein-protein interactions.
01:34:22.560 | Because in order for these domains,
01:34:24.480 | I mean, many of these proteins,
01:34:26.520 | actually, they never form a stable structure.
01:34:30.140 | One of my favorite proteins,
01:34:33.080 | and pretty much everyone who works in the,
01:34:37.760 | I know, whom I know who works with proteins,
01:34:41.800 | they always have their favorite proteins.
01:34:44.720 | Right, so one of my favorite proteins,
01:34:47.720 | probably my favorite protein,
01:34:49.200 | the one that I worked when I was a postdoc,
01:34:51.500 | is so-called post-synaptic density 95, PSD95 protein.
01:34:56.240 | So it's one of the key actors
01:35:00.520 | in the majority of neurological processes
01:35:03.820 | at the molecular level.
01:35:04.880 | And essentially, it's a key player
01:35:10.000 | in the post-synaptic density.
01:35:13.520 | So this is the crucial part of the synapse,
01:35:17.200 | where a lot of these chemical processes
01:35:21.400 | are happening.
01:35:22.480 | So it has five domains, right?
01:35:26.280 | So five protein domains.
01:35:27.480 | It's a pretty large protein,
01:35:30.920 | I think 600 something, I mean,
01:35:33.920 | but the way it's organized itself, it's flexible, right?
01:35:40.740 | So it acts as a scaffold.
01:35:43.900 | So it is used to bring in other proteins.
01:35:49.340 | So they start acting in the orchestrated manner, right?
01:35:54.340 | So, and the type of the shape of this protein,
01:35:58.860 | it's in a way, there are some stable parts of this protein,
01:36:02.580 | but there are some flexible.
01:36:04.540 | And this flexibility is built in,
01:36:07.740 | into the protein in order to become
01:36:09.580 | sort of this multifunctional machine.
01:36:13.180 | - So do you think that kind of thing is also learnable
01:36:16.540 | through the AlphaFold2 kind of approach?
01:36:19.400 | - I mean, the time will tell.
01:36:22.420 | - Is it another level of complexity?
01:36:24.500 | Is it, like how big of a jump in complexity
01:36:27.340 | is that whole thing?
01:36:28.180 | - To me, it's yet another level of complexity,
01:36:31.380 | because when we talk about protein-protein interactions,
01:36:35.180 | and there is actually a different challenge for this,
01:36:38.860 | called CAPRI.
01:36:40.020 | And so this, that is focused specifically
01:36:43.420 | on macromolecular interactions.
01:36:45.700 | Protein-protein, protein-DNA, et cetera.
01:36:48.560 | So, but it's, you know, there are different mechanisms
01:36:53.560 | that govern molecular interactions,
01:36:58.760 | and that need to be picked up,
01:37:00.740 | say by a machine learning algorithm.
01:37:03.660 | Interestingly enough, we actually,
01:37:06.540 | we participated for a few years in this competition.
01:37:11.540 | We typically don't participate in competitions,
01:37:14.900 | I don't know, don't have enough time,
01:37:19.220 | you know, 'cause it's very intensive.
01:37:21.100 | It's a very intensive process.
01:37:23.700 | But we participated back in, you know,
01:37:28.180 | about 10 years ago or so.
01:37:30.580 | And the way we entered this competition,
01:37:32.660 | so we design a scoring function, right?
01:37:35.420 | So the function that evaluates whether or not
01:37:38.100 | your protein-protein interaction is supposed to look like
01:37:41.900 | experimentally solved, right?
01:37:43.340 | So the scoring function is very critical part
01:37:45.900 | of the model prediction.
01:37:49.820 | So we design it to be a machine learning one.
01:37:52.700 | And so it was one of the first
01:37:55.820 | machine learning-based scoring function used in CAPRI.
01:37:59.980 | And, you know, we essentially, you know,
01:38:03.900 | learned what should contribute,
01:38:06.580 | what are the critical components contributing
01:38:08.860 | into the protein-protein interaction.
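A minimal sketch of a machine-learning scoring function in the spirit described here: label docked poses as near-native or decoy, featurize them, and fit a classifier whose output becomes the score. The two features and the synthetic data below are invented for illustration; they are not the features actually used in the CAPRI entry.

```python
import math
import random

def sigmoid(z):
    # Clamp to avoid math.exp overflow on extreme inputs
    if z > 60:
        return 1.0
    if z < -60:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.01, epochs=200):
    """Plain SGD logistic regression; returns weights and bias."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def score(w, b, x):
    """Higher score = pose judged more near-native."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic poses: near-native ones (label 1) tend to have more
# favorable interface contacts and fewer steric clashes than decoys.
random.seed(0)
X, y = [], []
for _ in range(200):
    native = random.random() < 0.5
    contacts = random.gauss(12 if native else 6, 2)  # invented feature
    clashes = random.gauss(1 if native else 5, 1)    # invented feature
    X.append([contacts, clashes])
    y.append(1 if native else 0)

w, b = train_logreg(X, y)
```

The learned weights play the role of "what are the critical components contributing into the protein-protein interaction": the classifier assigns positive weight to favorable contacts and negative weight to clashes.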
01:38:10.540 | - So this could be converted into a learning problem
01:38:13.340 | and thereby it could be learned.
01:38:15.580 | - I believe so, yes.
01:38:17.020 | - Do you think AlphaFold2 or something similar to it
01:38:20.460 | from DeepMind or somebody else will be,
01:38:24.300 | will result in a Nobel Prize or multiple Nobel Prizes?
01:38:28.660 | So like, you know, obviously, maybe not so obviously,
01:38:33.300 | you can't give a Nobel Prize to a computer program.
01:38:38.020 | You, at least for now, give it to the designers
01:38:40.980 | of that program.
01:38:42.140 | But do you see one or multiple Nobel Prizes
01:38:46.060 | where AlphaFold2 is like a large percentage
01:38:51.060 | of what that prize is given for?
01:38:54.880 | Would it lead to discoveries at the level of Nobel Prizes?
01:38:58.940 | - I mean, I think we are definitely destined
01:39:05.380 | to see the Nobel Prize becoming sort of,
01:39:08.700 | to be evolving with the evolution of science.
01:39:12.300 | And the evolution of science is such
01:39:14.500 | that it now becomes like really multifaceted, right?
01:39:17.820 | So where you don't really have like a unique discipline,
01:39:21.300 | you have sort of the, a lot of cross-disciplinary talks
01:39:25.660 | in order to achieve sort of, you know,
01:39:28.460 | really big advancements, you know.
01:39:32.380 | So I think, you know, the computational methods
01:39:37.380 | will be acknowledged in one way or another.
01:39:42.500 | And as a matter of fact, you know,
01:39:46.860 | they were first acknowledged back in 2013, right?
01:39:50.580 | Where, you know, the first three people were, you know,
01:39:55.580 | awarded the Nobel Prize for the,
01:39:59.140 | for study of the protein folding, right, the principle.
01:40:01.460 | And, you know, I think all three of them
01:40:03.820 | are computational biophysicists, right?
01:40:06.940 | So, you know, that I think is unavoidable, you know.
01:40:11.940 | It will come with the time.
01:40:16.560 | The fact that, you know, alpha fold and, you know,
01:40:23.460 | similar approaches, 'cause again, it's a matter of time
01:40:26.340 | that people will embrace this, you know, principle.
01:40:31.700 | And we'll see more and more such, you know,
01:40:34.940 | such tools coming into play.
01:40:36.940 | But, you know, these methods will be critical
01:40:41.940 | in a scientific discovery, no doubts about it.
01:40:47.380 | - On the engineering side, maybe a dark question,
01:40:51.460 | but do you think it's possible
01:40:53.380 | to use these machine learning methods
01:40:55.140 | to start to engineer proteins?
01:40:59.000 | And the next question is something quite a few biologists
01:41:04.000 | are against, some are for, for study purposes,
01:41:07.280 | is to engineer viruses.
01:41:09.620 | Do you think machine learning, like something
01:41:11.860 | like alpha fold could be used to engineer viruses?
01:41:14.780 | - So to answering the first question, you know,
01:41:16.980 | it has been, you know, a part of the research
01:41:21.660 | in the protein science.
01:41:22.700 | The protein design is, you know,
01:41:25.500 | is a very prominent areas of research.
01:41:29.160 | Of course, you know, one of the pioneers is David Baker
01:41:32.000 | and Rosetta algorithm that, you know,
01:41:34.900 | essentially was doing the de novo design
01:41:38.220 | and was used to design new proteins, you know.
01:41:41.540 | - And design of proteins means design of function.
01:41:44.200 | So like when you design a protein, you can control,
01:41:47.320 | I mean, the whole point of a protein,
01:41:49.080 | with the protein structure comes a function,
01:41:52.200 | like it's doing something.
01:41:53.720 | - Correct, correct.
01:41:54.560 | - Or design different things.
01:41:56.240 | - So you can, yeah, so you can, well,
01:41:58.160 | you can look at the proteins from the functional perspective,
01:42:00.680 | you can also look at the proteins
01:42:02.720 | from the structural perspective, right?
01:42:04.200 | So the structural building blocks.
01:42:05.700 | So if you want to have a building block of a certain shape,
01:42:08.880 | you can try to achieve it by, you know,
01:42:11.160 | introducing a new protein sequence
01:42:13.160 | and predicting, you know, how it will fold.
01:42:17.240 | So with that, I mean, it's a natural,
01:42:22.040 | one of the natural applications of these algorithms.
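The "introduce a new sequence and predict how it will fold" loop can be caricatured as search against a scorer. Everything below is a placeholder: `fold_score` is a made-up stand-in for a real structure or stability predictor (Rosetta-style design uses far richer energy functions), and the sequences are arbitrary.

```python
import random

AA = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
HYDROPHOBIC = set("AVILMFWC")

def fold_score(seq):
    """Made-up objective: hydrophobic residues at even positions.
    A toy stand-in for a real fold predictor, nothing more."""
    return sum(1 for i, a in enumerate(seq)
               if i % 2 == 0 and a in HYDROPHOBIC)

def design(seq, steps=500, seed=0):
    """Greedy hill climb: propose point mutations, keep improvements."""
    rng = random.Random(seed)
    best = seq
    for _ in range(steps):
        i = rng.randrange(len(best))
        cand = best[:i] + rng.choice(AA) + best[i + 1:]
        if fold_score(cand) >= fold_score(best):
            best = cand
    return best

start = "GGGGGGGGGG"   # arbitrary starting sequence
designed = design(start)
```

Swap the toy objective for a real predictor and the same propose-score-accept loop becomes a crude version of de novo design for a target shape.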
01:42:27.040 | Now, talking about engineering a virus.
01:42:32.480 | - With machine learning.
01:42:35.120 | - With machine learning, right?
01:42:36.360 | So, well, you know, so luckily for us,
01:42:41.360 | I mean, we don't have that much data, right?
01:42:46.680 | We actually, right now, one of the projects
01:42:50.120 | that we are carrying on in the lab
01:42:53.680 | is we're trying to develop a machine learning algorithm
01:42:56.960 | that determines whether or not
01:43:00.040 | the current strain is pathogenic.
01:43:02.680 | And-- - The current strain
01:43:03.760 | of the coronavirus.
01:43:04.600 | - Of the virus, I mean, so there are applications
01:43:07.720 | to coronaviruses because we have strains of SARS-CoV-2,
01:43:11.440 | also SARS-CoV, MERS, that are pathogenic,
01:43:14.600 | but we also have strains of other coronaviruses
01:43:17.640 | that are not pathogenic, I mean,
01:43:20.440 | the common cold viruses and some other ones, right?
01:43:25.440 | So-- - Pathogenic meaning spreading.
01:43:29.000 | - Pathogenic meaning it's actually inflicting damage.
01:43:33.760 | Correct.
01:43:35.320 | There are also some seasonal versus pandemic strains
01:43:39.720 | of influenza, right?
01:43:41.760 | And determining what are the molecular determinant, right,
01:43:45.520 | so that are built in into the protein sequence,
01:43:48.320 | into the gene sequence, right?
01:43:50.720 | So, and whether or not the machine learning
01:43:53.000 | can determine those components, right?
01:43:58.000 | - Oh, interesting, so like using machine learning,
01:44:00.680 | that's really interesting, to given,
01:44:03.400 | the input is like, what, the entire--
01:44:06.800 | - Protein sequence.
01:44:07.640 | - The protein sequence, and then determine
01:44:09.760 | if this thing is gonna be able to do damage
01:44:12.360 | to a biological system.
01:44:14.640 | - Yeah.
01:44:15.920 | So, I mean-- - It's good machine learning,
01:44:17.480 | you're saying we don't have enough data for that?
01:44:19.720 | - I mean, for this specific one, we do.
01:44:22.640 | We might actually have to back up on this,
01:44:25.560 | 'cause we're still in the process.
01:43:27.240 | There was one work that appeared on bioRxiv
01:43:31.680 | by Eugene Koonin, who is one of these pioneers
01:44:35.480 | in evolutionary genomics, and they tried to look at this,
01:44:41.760 | but the methods were sort of standard,
01:44:45.080 | supervised learning methods, and now the question is,
01:44:50.200 | can you advance it further by using not so standard methods?
01:44:56.320 | So, there's obviously a lot of hope in transfer learning,
01:45:02.680 | where you can actually try to transfer the information
01:45:06.200 | that the machine learning learns
01:45:07.680 | about the proper protein sequences, right?
01:45:11.320 | And so, there is some promise in going this direction,
01:45:16.320 | but if we have this, it would be extremely useful,
01:45:20.440 | because then we could essentially forecast
01:45:22.960 | the potential mutations that would make a current strain
01:45:26.280 | more or less pathogenic.
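A hedged sketch of the featurization step such a pathogenicity classifier needs: turn a protein sequence into k-mer counts and compare against labeled examples. The sequences and the nearest-reference rule below are invented for illustration; the lab project described here would use real strains and a proper supervised model.

```python
from collections import Counter

def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def similarity(a, b):
    """Cosine similarity between two k-mer count vectors."""
    keys = set(a) | set(b)
    dot = sum(a[key] * b[key] for key in keys)  # Counter returns 0 for missing keys
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Invented reference sequences standing in for labeled strains
pathogenic_ref = kmer_counts("MKVLLAARRGGW" * 3)
benign_ref = kmer_counts("MSTNPKPQRKTK" * 3)

# Classify a query sequence by its nearest reference profile
query = kmer_counts("MKVLLAARRGGWMKVLL")
label = ("pathogenic"
         if similarity(query, pathogenic_ref) > similarity(query, benign_ref)
         else "benign")
```

Transfer learning, as mentioned below, would replace these hand-built k-mer counts with embeddings from a model pretrained on large protein-sequence corpora.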
01:45:27.560 | - Anticipate them from a vaccine development,
01:45:31.120 | for the treatment, antiviral drug development.
01:45:34.520 | - That would be a very crucial task.
01:45:36.840 | - But you could also use that system to then say,
01:45:41.840 | how would we potentially modify this virus
01:45:45.240 | to make it more pathogenic?
01:45:47.160 | - That's true, that's true.
01:45:50.240 | I mean, you know, again, the hope is, well, several things.
01:45:55.240 | So, one is that, even if you design a sequence, right?
01:46:06.760 | So, to carry out the actual experimental biology,
01:46:11.760 | to ensure that all the components working,
01:46:16.000 | you know, is a completely different matter.
01:46:19.080 | - Difficult process.
01:46:19.920 | - Yes, then we've seen in the past,
01:46:24.400 | there could be some regulation of the moment
01:46:27.680 | the scientific community recognizes
01:46:30.440 | that it's now becoming no longer a sort of a fun puzzle
01:46:34.600 | for machine learning.
01:46:36.640 | - Could be a weapon.
01:46:37.840 | - Yeah, so then there might be some regulation.
01:46:40.440 | So, I think back in, what, 2015,
01:46:44.760 | there was an issue on regulating the research
01:46:49.520 | on influenza strains, right?
01:46:52.480 | So, there were several groups use sort of mutation analysis
01:46:57.480 | to determine whether or not this strain will jump
01:47:01.820 | from one species to another.
01:47:03.280 | And I think there was like a half a year moratorium
01:47:06.520 | on the research, on the paper published,
01:47:09.760 | until scientists analyzed it
01:47:13.600 | and decided that it's actually safe.
01:47:16.440 | - I forgot what that's called,
01:47:17.600 | something of function, test of function.
01:47:20.040 | - Gain of function, loss of function.
01:47:21.280 | - Gain of function, yeah, gain of function,
01:47:23.040 | loss of function, that's right, sorry.
01:47:24.980 | It's like, let's watch this thing mutate for a while
01:47:29.640 | to see what kind of things we can observe.
01:47:33.760 | I guess, I'm not so much worried about that kind of research
01:47:37.240 | if there's a lot of regulation
01:47:38.600 | and if it's done very well
01:47:40.160 | and with competence and seriously.
01:47:42.760 | I am more worried about kind of this,
01:47:45.680 | the underlying aspect of this question
01:47:49.600 | is more like 50 years from now.
01:47:51.280 | Speaking to the Drake equation,
01:47:54.940 | one of the parameters in the Drake equation
01:47:57.280 | is how long civilizations last.
01:47:59.840 | And that seems to be the most important value, actually,
01:48:03.880 | for calculating if there's other alien
01:48:06.160 | intelligent civilizations out there.
01:48:08.080 | That's where there's most variability.
01:48:11.000 | Assuming, like if life, if that percentage
01:48:15.120 | that life can emerge is like not zero,
01:48:19.400 | like if we're super unique,
01:48:21.280 | then it's the how long we last
01:48:23.980 | is basically the most important thing.
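The point about L dominating can be made concrete: the Drake equation is a product of factors, so the estimate N scales linearly with L while the other terms only set the constant. The factor values below are illustrative placeholders, not estimates.

```python
def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    """N = R* * fp * ne * fl * fi * fc * L
    (expected number of communicating civilizations)."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L

# Hold every factor fixed (illustrative values) and vary only L,
# the average lifetime of a communicating civilization.
fixed = dict(R_star=1.0, f_p=0.5, n_e=2.0, f_l=0.1, f_i=0.1, f_c=0.1)
estimates = {L: drake(L=L, **fixed) for L in (100, 10_000, 1_000_000)}
# N grows linearly with L: ~0.1, ~10, ~1000 for these inputs
```

With everything else pinned, a civilization that survives 10,000x longer yields 10,000x more expected neighbors, which is why the lifetime term carries so much of the variability.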
01:48:26.200 | So from a selfish perspective,
01:48:29.000 | but also from a Drake equation perspective,
01:48:32.040 | I'm worried about our civilization lasting.
01:48:35.040 | And you kind of think about all the ways
01:48:37.640 | in which machine learning can be used
01:48:39.120 | to design greater weapons of destruction.
01:48:44.120 | And I mean, one way to ask that,
01:48:48.600 | if you look sort of 50 years from now,
01:48:50.560 | 100 years from now,
01:48:51.680 | would you be more worried about natural pandemics
01:48:55.800 | or engineered pandemics?
01:48:57.560 | Like who is the better designer of viruses,
01:49:02.640 | nature or humans, if we look down the line?
01:49:05.980 | - I think, in my view,
01:49:08.840 | I would still be worried about the natural pandemics,
01:49:12.680 | simply because, I mean, the capacity
01:49:15.600 | of the nature producing this.
01:49:20.720 | - It does a pretty good job, right?
01:49:22.720 | - Yes.
01:49:23.560 | - The motivation for using virus,
01:49:25.320 | engineering viruses as a weapon is a weird one,
01:49:29.060 | because maybe you can correct me on this,
01:49:31.520 | but it seems very difficult to target a virus, right?
01:49:35.640 | The whole point of a weapon, the way a rocket works,
01:49:38.440 | if a starting point, you have an end point,
01:49:40.120 | and you're trying to hit a target,
01:49:42.360 | to hit a target with a virus is very difficult.
01:49:44.680 | It's basically just, right?
01:49:47.040 | It's the target would be the human species.
01:49:50.100 | Oh man.
01:49:52.920 | - Yeah, I have a hope in us,
01:49:54.800 | I'm forever optimistic that we will not,
01:49:58.240 | there's insufficient evil in the world
01:50:02.120 | to lead to that kind of destruction.
01:50:04.560 | - Well, I also hope that, I mean, that's what we see.
01:50:07.760 | I mean, with the way we are getting connected,
01:50:11.760 | the world is getting connected,
01:50:14.440 | I think it helps for the world to become more transparent.
01:50:19.440 | - Yeah.
01:50:22.560 | - So the information spread is,
01:50:27.120 | I think it's one of the key things for the society
01:50:31.640 | to become more balanced.
01:50:35.600 | - Yeah. - One way or another.
01:50:36.440 | - This is something that people disagree with me on,
01:50:38.360 | but I do think that the kind of secrecy
01:50:41.920 | that governments have,
01:50:43.480 | so you're kind of speaking more to the other aspects,
01:50:47.060 | like research community being more open,
01:50:49.700 | companies are being more open,
01:50:52.160 | government is still like,
01:50:53.920 | we're talking about like military secrets.
01:50:57.880 | I think military secrets of the kind
01:51:01.440 | that could destroy the world
01:51:03.760 | will become also a thing of the 20th century.
01:51:07.320 | It'll become more and more open.
01:51:09.360 | - Yeah.
01:51:10.200 | - I think nations will lose power in the 21st century,
01:51:13.240 | like lose sufficient power towards secrecies.
01:51:16.000 | Transparency is more beneficial than secrecy,
01:51:18.900 | but of course it's not obvious.
01:51:21.200 | Let's hope so, let's hope so,
01:51:23.400 | that the governments will become more transparent.
01:51:28.400 | - So we last talked, I think, in March or April.
01:51:35.280 | What have you learned?
01:51:36.760 | How has your philosophical, psychological,
01:51:40.480 | biological worldview changed since then?
01:51:43.800 | Or you've been studying it nonstop
01:51:46.120 | from a computational biology perspective.
01:51:48.900 | How has your understanding and thoughts
01:51:50.420 | about this virus changed over those months,
01:51:53.000 | from the beginning to today?
01:51:54.440 | - One thing that I was really amazed
01:51:58.160 | at how efficient the scientific community was.
01:52:03.120 | I mean, and even just judging on this very narrow domain
01:52:08.120 | of protein structure,
01:52:12.520 | understanding the structural characterization
01:52:16.600 | of this virus from the components point of view,
01:52:19.880 | whole virus point of view.
01:52:21.480 | If you look at SARS, right,
01:52:26.040 | the something that happened, you know,
01:52:28.620 | less than 20, but close enough, 20 years ago.
01:52:34.040 | And you see what, when it happened,
01:52:38.500 | what was sort of the response by the scientific community.
01:52:42.460 | You see that the structural characterizations did occur,
01:52:47.100 | but it took several years, right?
01:52:51.640 | Now, the things that took several years,
01:52:54.920 | it's a matter of months, right?
01:52:56.880 | So we see that, you know, the research pop up.
01:53:01.620 | We are at the unprecedented level
01:53:03.960 | in terms of the sequencing, right?
01:53:06.000 | Never before we had a single virus sequenced so many times.
01:53:11.000 | You know, so which allows us to actually,
01:53:17.040 | to trace very precisely the sort of the evolutionary nature
01:53:22.040 | of this virus, what happens.
01:53:25.780 | And it's not just the, you know,
01:53:28.200 | this virus independently of everything.
01:53:31.780 | It's, you know, it's the, you know,
01:53:34.240 | the sequence of this virus linked,
01:53:36.520 | anchored to the specific geographic place,
01:53:39.940 | to specific people, because, you know,
01:53:42.160 | our genotype influences also, you know,
01:53:47.160 | the evolution of this, you know,
01:53:48.900 | it's always a host pathogen co-evolution that, you know,
01:53:53.900 | occurs.
01:53:55.400 | - It'd be cool if we also had a lot more data about,
01:53:58.960 | sort of, the spread of this virus.
01:54:01.280 | Not maybe, well, it'd be nice if we had it
01:54:05.120 | for like contact tracing purposes for this virus,
01:54:08.120 | but it'd be also nice if we had it
01:54:09.700 | for the study for future viruses,
01:54:11.960 | to be able to respond and so on.
01:54:13.600 | But it's already nice that we have geographical data
01:54:16.040 | and the basic data from individual humans.
01:54:18.240 | - Exactly.
01:54:19.080 | No, I think contact tracing is obviously a key component
01:54:24.080 | in understanding the spread of this virus.
01:54:28.020 | There is also, there is a number of challenges, right?
01:54:31.720 | So XPRIZE is one of them.
01:54:33.760 | We, you know, just recently, you know,
01:54:38.760 | took a part of this competition.
01:54:40.880 | It's the prediction of the number of infections
01:54:45.880 | in different regions.
01:54:47.880 | So, and, you know, obviously the AI is the main topic
01:54:53.880 | in those predictions.
01:54:56.280 | - Yeah, but it's still the data.
01:54:58.840 | I mean, that's a competition,
01:55:00.320 | but the data is weak on the training.
01:55:05.320 | Like, it's great.
01:55:07.620 | It's much more than probably before,
01:55:09.320 | but like, it would be nice if it was like really rich.
01:55:12.840 | Like I talked to Michael Mina from Harvard.
01:55:16.760 | I mean, he dreams that the community comes together
01:55:19.000 | with like a weather map for viruses, right?
01:55:22.960 | Like really high-resolution sensors on how,
01:55:27.840 | from person to person, the viruses travel,
01:55:29.880 | all the different kinds of viruses, right?
01:55:32.000 | Because there's a ton of them.
01:55:34.620 | And then you'd be able to tell the story
01:55:36.800 | that you've spoken about of the evolution of these viruses,
01:55:41.200 | like day-to-day mutations that are occurring.
01:55:44.800 | I mean, that'd be fascinating,
01:55:46.040 | just from a perspective of study
01:55:48.680 | and from the perspective of being able to respond
01:55:50.680 | to future pandemics.
01:55:51.640 | That's ultimately what I'm worried about.
01:55:53.940 | People love books.
01:55:56.440 | Is there some three or whatever number of books,
01:56:01.160 | technical fiction, philosophical,
01:56:02.960 | that brought you joy in life,
01:56:06.240 | had an impact on your life,
01:56:07.760 | and maybe some that you would recommend others?
01:56:11.360 | - So I'll give you three very different books,
01:56:13.640 | and I also have a special runner-up.
01:56:15.840 | - Honorable mention.
01:56:18.160 | - I mean, it's an audio book,
01:56:22.000 | and there's some specific reason behind it.
01:56:25.520 | So the first book is something
01:56:28.360 | that sort of impacted my earlier stage of life,
01:56:32.480 | and I'm probably not gonna be very original here.
01:56:36.240 | It's "Bulgakov's Master and Margarita."
01:56:39.120 | So that's probably, you know.
01:56:41.280 | - Well, not for a Russian, maybe it's not super original,
01:56:43.880 | but it's a really powerful book for even in English.
01:56:47.640 | So I read it in English, so.
01:56:49.160 | - It is incredibly powerful,
01:56:51.440 | and I mean, it's the way it ends, right?
01:56:55.440 | So I still have goosebumps
01:56:58.640 | when I read the very last sort of,
01:57:01.520 | it's called the epilogue,
01:57:03.120 | where it's just so powerful.
01:57:05.800 | - What impact did it have on you?
01:57:07.320 | What ideas, what insights did you get from it?
01:57:09.280 | - I was just taken by, you know,
01:57:12.200 | by the fact that you have those parallel lives
01:57:17.200 | apart from many centuries, right?
01:57:23.160 | And somehow they got sort of intertwined
01:57:26.880 | into one story.
01:57:28.960 | And that, to me, was fascinating.
01:57:33.840 | And, you know, of course, the romantic part of this book
01:57:38.840 | is like, it's not just romance,
01:57:41.760 | it's like the romance empowered by sort of magic, right?
01:57:45.840 | And maybe on top of that, you have some irony,
01:57:50.840 | which is unavoidable, right?
01:57:53.400 | Because it was that, you know, the Soviet time.
01:57:56.440 | - But it's very, it's deeply Russian.
01:57:58.560 | So that's the wit, the humor, the pain, the love,
01:58:03.560 | all of that is one of the books
01:58:06.200 | that kind of captures something about Russian culture
01:58:10.280 | that people outside of Russia should probably read.
01:58:12.560 | - I agree.
01:58:13.400 | - What's the second one?
01:58:14.240 | - So the second one is, again, another one that,
01:58:18.240 | it happened, I read it later in my life.
01:58:21.880 | I think I read it first time when I was a,
01:58:26.600 | a graduate student.
01:58:27.760 | And that's the Solzhenitsyn's "Cancer Ward."
01:58:30.880 | That is amazingly powerful book.
01:58:36.440 | It's-- - What is it about?
01:58:37.640 | - It's about, I mean, essentially based on,
01:58:41.640 | you know, Solzhenitsyn was diagnosed with cancer
01:58:44.760 | when he was reasonably young and he made a full recovery.
01:58:49.480 | But, you know, so this is about a person
01:58:54.480 | who was sentenced for life in one of these, you know, camps.
01:58:59.360 | And he had some cancer.
01:59:03.680 | So he was, you know, transported back
01:59:06.840 | to one of these Soviet republics,
01:59:10.000 | I think, you know, South Asian republics.
01:59:13.480 | And the book is about, you know,
01:59:19.820 | his experience being a prisoner,
01:59:24.820 | being a, you know, a patient in the cancer clinic,
01:59:29.820 | in a cancer ward, surrounded by people,
01:59:32.380 | many of which die, right?
01:59:35.340 | But in the way, you know, the way it reads,
01:59:41.580 | I mean, first of all, later on,
01:59:43.140 | I read the accounts of the doctors
01:59:47.580 | who described these, you know, the experiences,
01:59:51.780 | in the book, by the patient as incredibly accurate, right?
01:59:58.740 | So, you know, I read that there was some doctors saying
02:00:03.340 | that, you know, every single doctor should read this book
02:00:07.140 | to understand what the patient feels.
02:00:10.660 | But, you know, again, as many of the Solzhenitsyn's books,
02:00:17.060 | it has multiple levels of complexity.
02:00:19.540 | And obviously, you know, if you look above the cancer
02:00:24.540 | and the patient, I mean, the tumor that was growing
02:00:29.820 | and then disappeared in his body with some consequences,
02:00:35.340 | I mean, this is, you know, allegorically the Soviet,
02:00:45.300 | and, you know, and he actually, he agreed,
02:00:48.020 | you know, when he was asked, he said that this is
02:00:51.020 | what made him think about this, you know,
02:00:53.940 | how to combine these experiences.
02:00:56.140 | Him being a part of the, you know, of the Soviet regime,
02:01:00.140 | also being a part of the, you know,
02:01:04.020 | of someone sent to the Gulag camp, right?
02:01:08.020 | And also someone who experienced cancer in his life.
02:01:12.700 | You know, the Gulag Archipelago and this book,
02:01:16.580 | these are the works that actually made him,
02:01:20.300 | you know, receive a Nobel Prize.
02:01:22.860 | But, you know, to me, I've, you know,
02:01:25.940 | I've read other, you know, books by Solzhenitsyn.
02:01:30.940 | This one is, to me, is the most powerful one.
02:01:34.780 | - And by the way, both this one and the previous one,
02:01:37.100 | you read in Russian?
02:01:38.700 | - Yes, yes.
02:01:40.280 | So now there is, the third book is an English book,
02:01:44.460 | and it's completely different.
02:01:45.700 | So, you know, we're switching the gears completely.
02:01:48.760 | So this is the book, which it's not even a book,
02:01:52.260 | it's an essay by John von Neumann.
02:01:56.940 | - Oh, wow.
02:01:57.780 | - Called "The Computer and the Brain."
02:01:59.660 | And that was the book he was writing,
02:02:03.980 | knowing that he was dying of cancer.
02:02:07.760 | So the book was released back,
02:02:09.860 | it's a very thin book, right?
02:02:12.340 | But the power, the intellectual power
02:02:17.340 | in this book, in this essay is incredible.
02:02:21.320 | I mean, you probably know that von Neumann
02:02:24.320 | is considered to be one of the biggest thinkers, right?
02:02:28.680 | So his intellectual power was incredible, right?
02:02:32.500 | And you can actually feel this power in this book
02:02:36.500 | where, you know, the person is writing
02:02:38.220 | knowing that he will be, you know, he will die.
02:02:41.340 | The book actually got published only after his death,
02:02:44.260 | back in 1958, he died in 1957.
02:02:48.220 | And, but, so he tried to put as many ideas
02:02:53.060 | that, you know, he still, you know, hadn't realized.
02:02:58.060 | And, you know, so this book is very difficult to read
02:03:04.780 | because, you know, every single paragraph is just compact.
02:03:09.780 | You know, it's filled with these ideas
02:03:13.620 | and, you know, the ideas are incredible.
02:03:15.960 | You know, nowadays, you know, so he tried
02:03:19.900 | to put the parallels between the brain computing power,
02:03:24.900 | the neural system, and the computers, you know,
02:03:28.740 | as they were understood.
02:03:29.580 | - Do you remember what year he was working on this?
02:03:31.380 | Like approximately?
02:03:32.420 | - '57. - '57.
02:03:33.720 | - So that was right during his, you know,
02:03:36.420 | when he was diagnosed with cancer and he was essentially.
02:03:39.780 | - Yeah, he's one of those, there's a few folks
02:03:42.740 | people mention, I think Ed Witten is another,
02:03:45.540 | that like, everyone that meets them,
02:03:49.080 | they say he's just an intellectual powerhouse.
02:03:51.820 | - Yes.
02:03:52.640 | - Okay, so who's the honorable mention?
02:03:54.340 | - So, so, so, and this is, I mean,
02:03:56.620 | the reason I put it sort of in this separate section
02:03:59.500 | because this is a book that I recently listened to.
02:04:04.500 | So it's an audio book, and this is a book
02:04:09.240 | called Lab Girl by Hope Jahren.
02:04:12.560 | So Hope Jahren, she is a scientist,
02:04:16.480 | she's a geochemist that essentially studies
02:04:20.440 | the fossil plants and so she uses the fossil plants,
02:04:29.000 | the chemical analysis to understand what was the climate
02:04:38.320 | back, you know, thousands of years,
02:04:38.320 | hundreds of thousands of years ago.
02:04:40.280 | And so something that incredibly touched me by this book,
02:04:45.280 | it was narrated by the author.
02:04:48.400 | - Nice, excellent. - And it's an incredibly
02:04:51.680 | personal story, incredibly.
02:04:54.080 | So certain parts of the book you could actually
02:04:58.720 | hear the author crying.
02:05:00.100 | And that, to me, I mean, I never experienced
02:05:03.920 | anything like this, you know, reading the book.
02:05:06.400 | But it was like, you know, the connection
02:05:10.200 | between you and the author.
02:05:12.720 | And I think this is, you know, this is really a must read,
02:05:17.240 | but even better, a must listen to audio book
02:05:22.240 | for anyone who wants to learn about sort of, you know,
02:05:26.880 | academia, science, research in general.
02:05:30.920 | Because it's a very personal account
02:05:32.960 | about her becoming a scientist.
02:05:36.760 | - So we're just before New Year's, you know,
02:05:42.980 | we talked a lot about some difficult topics,
02:05:46.560 | the viruses and so on.
02:05:48.120 | Do you have some exciting things you're looking forward
02:05:51.840 | to in 2021, some New Year's resolutions,
02:05:56.120 | maybe silly or fun, or something very important
02:06:01.120 | and fundamental to the world of science
02:06:04.720 | or something completely unimportant?
02:06:06.680 | - Well, I'm definitely looking forward
02:06:11.160 | towards, you know, things becoming normal, right?
02:06:15.640 | So yes, so I really miss traveling.
02:06:19.840 | Every summer I go to an international,
02:06:25.900 | international summer school,
02:06:27.340 | it's called the School for Molecular and Theoretical Biology.
02:06:30.380 | It's held in Europe,
02:06:31.820 | it's organized by very good friends of mine.
02:06:34.380 | And this is the school for gifted kids
02:06:37.860 | from all over the world, and they're incredibly bright.
02:06:41.020 | It's like, every time I go there, it's like, you know,
02:06:43.940 | it's the highlight of the year.
02:06:46.580 | And we couldn't make it this August,
02:06:50.980 | so we did this school remotely, but it's different.
02:06:55.460 | So I am definitely looking forward
02:06:58.780 | to next August coming there.
02:07:01.140 | I also, I mean, you know, one of my personal resolutions,
02:07:05.900 | I realized that being in-house and working from home,
02:07:10.900 | you know, I realized that actually,
02:07:15.500 | I apparently missed a lot, you know,
02:07:20.460 | spending time with my family, believe it or not.
02:07:24.420 | So you typically, you know, with all the research
02:07:28.300 | and teaching and everything related to the academic life,
02:07:33.300 | I mean, you get distracted.
02:07:38.280 | And so, you know, you don't feel that, you know,
02:07:43.280 | the fact that you are away from your family
02:07:46.700 | doesn't affect you because you're, you know,
02:07:48.660 | naturally distracted by other things.
02:07:50.860 | And, you know, this time I realized that, you know,
02:07:55.140 | that that's so important, right?
02:07:58.180 | Spending your time with the family, with your kids.
02:08:01.940 | And so that would be my new year resolution
02:08:05.420 | in actually trying to spend as much time as possible.
02:08:09.900 | - Even when the world opens up.
02:08:11.300 | Yeah, that's a beautiful message.
02:08:13.900 | That's a beautiful reminder.
02:08:15.660 | I asked you if there's a Russian poem you could read
02:08:20.780 | that I could force you to read.
02:08:22.060 | And you said, okay, fine, sure.
02:08:23.960 | Do you mind reading?
02:08:27.700 | And you said that no paper needed, so.
02:08:30.700 | - Nope.
02:08:31.540 | So yeah, so this poem was written by my namesake,
02:08:35.580 | another Dmitry, Dmitry Kimelfeld.
02:08:38.060 | And it's a recent poem,
02:08:42.380 | and it's called "Sorceress", "Vedma" in Russian.
02:08:50.220 | Or actually, "Kaldunya".
02:08:52.940 | So that's sort of another connotation of sorceress or witch.
02:08:57.940 | And I really like it.
02:08:59.740 | And it's one of just a handful poems
02:09:02.740 | I actually can recall by heart.
02:09:05.460 | I also have a very strong association
02:09:08.660 | when I read this poem with The Master and Margarita,
02:09:12.540 | the main female character, Margarita.
02:09:18.260 | And also it's happening about the same time
02:09:23.220 | we are talking now, so around New Year, around Christmas.
02:09:28.220 | - Do you mind reading it in Russian?
02:09:31.980 | - I'll give it a try.
02:09:34.140 | (speaking in foreign language)
02:09:39.700 | (speaking in foreign language)
02:09:43.620 | (speaking in foreign language)
02:09:47.540 | (speaking in foreign language)
02:09:51.940 | (speaking in foreign language)
02:09:55.860 | (speaking in foreign language)
02:09:59.780 | (speaking in foreign language)
02:10:03.700 | (speaking in foreign language)
02:10:08.140 | (speaking in foreign language)
02:10:12.060 | (speaking in foreign language)
02:10:15.980 | - That's beautiful.
02:10:41.340 | I love how it captures a moment of longing
02:10:44.820 | and maybe love even.
02:10:49.740 | - Yes.
02:10:50.660 | To me, it has a lot of meaning about this,
02:10:55.340 | something that is happening, something that is far away,
02:10:59.940 | but still very close to you.
02:11:02.260 | And yes, it's the winter.
02:11:06.180 | - There's something magical about winter, isn't it?
02:11:08.580 | What is the, well, I don't know.
02:11:10.420 | I don't know how to translate it,
02:11:11.900 | but a kiss in winter is interesting.
02:11:14.980 | Lips in winter and all that kind of stuff.
02:11:17.580 | It's beautifully, I mean, Russian has a way.
02:11:20.260 | As a reason, Russian poetry is just,
02:11:22.620 | I'm a fan of poetry in both languages,
02:11:24.540 | but English doesn't capture some of the magic
02:11:28.220 | that Russian seems to, so thank you for doing that.
02:11:31.620 | That was awesome.
02:11:32.660 | Dmitry, it's great to talk to you again.
02:11:35.560 | It's contagious how much you love what you do,
02:11:39.260 | how much you love life.
02:11:40.780 | So I really appreciate you taking the time to talk today.
02:11:44.020 | - And thank you for having me.
02:11:46.340 | - Thanks for listening to this conversation
02:11:47.820 | with Dmitry Korkin, and thank you to our sponsors,
02:11:50.740 | Brave Browser, NetSuite Business Management Software,
02:11:54.940 | Magic Spoon Low Carb Cereal,
02:11:56.940 | and Eight Sleep Self-Cooling Mattress.
02:12:00.220 | So the choice is browsing privacy, business success,
02:12:04.100 | healthy diet, or comfortable sleep.
02:12:06.940 | Choose wisely, my friends.
02:12:08.580 | And if you wish, click the sponsor links below
02:12:11.140 | to get a discount and to support this podcast.
02:12:14.220 | Now, let me leave you with some words
02:12:16.620 | from Jeffrey Eugenides.
02:12:19.100 | Biology gives you a brain.
02:12:21.500 | Life turns it into a mind.
02:12:24.100 | Thank you for listening, and hope to see you next time.
02:12:27.780 | (upbeat music)
02:12:30.360 | (upbeat music)