Dmitry Korkin: Computational Biology of Coronavirus | Lex Fridman Podcast #90
Chapters
0:00 Introduction
2:33 Viruses are terrifying and fascinating
6:02 How hard is it to engineer a virus?
10:48 What makes a virus contagious?
29:52 Figuring out the function of a protein
53:27 Functional regions of viral proteins
79:09 Biology of a coronavirus treatment
94:46 Is a virus alive?
97:05 Epidemiological modeling
115:27 Russia
122:31 Science bobbleheads
126:31 Meaning of life
00:00:00.000 |
The following is a conversation with Dmitry Korkin. 00:00:09.320 |
where he specializes in bioinformatics of complex diseases, 00:00:18.360 |
I came across Dmitry's work when in February, 00:00:21.400 |
his group used the viral genome of the COVID-19 00:00:25.040 |
to reconstruct the 3D structure of its major viral proteins 00:00:29.040 |
and their interaction with the human proteins, 00:00:32.360 |
in effect, creating a structural genomics map 00:00:44.480 |
and how computational methods can help us understand 00:00:49.360 |
in order to develop antiviral drugs and vaccines. 00:00:58.680 |
For everyone feeling the medical, psychological, 00:01:04.760 |
Stay strong, we're in this together, we'll beat this thing. 00:01:16.040 |
support it on Patreon, or simply connect with me on Twitter 00:01:30.560 |
Cash App lets you send money to friends, buy Bitcoin, 00:01:33.680 |
and invest in the stock market with as little as $1. 00:01:40.840 |
in the context of the history of money is fascinating. 00:01:55.680 |
And Bitcoin, the first decentralized cryptocurrency, 00:02:00.600 |
So given that history, cryptocurrency is still very much 00:02:10.820 |
So again, if you get Cash App from the App Store, 00:02:18.240 |
you get $10, and Cash App will also donate $10 to FIRST, 00:02:22.160 |
an organization that is helping to advance robotics 00:02:25.000 |
and STEM education for young people around the world. 00:02:28.480 |
And now, here's my conversation with Dmitry Korkin. 00:02:32.260 |
Do you find viruses terrifying or fascinating? 00:02:36.920 |
- When I think about viruses, I think about them, 00:02:53.000 |
That is impossible not to be fascinated with them. 00:02:57.600 |
- So what do you imagine when you think about a virus? 00:03:07.280 |
Or do you imagine the whole pandemic, like society level, 00:03:11.040 |
when you say the efficiency at which they do their work, 00:03:18.960 |
that occupy a human body or a living organism, 00:03:26.680 |
or do you think of the individual little guy? 00:03:34.240 |
that allows you to move from micro scale to the macro scale. 00:03:48.800 |
But it is perfected to the way that it essentially 00:04:11.640 |
So it's a machine, it's an intelligent machine. 00:04:20.240 |
you're in danger of reducing the power of this thing 00:04:25.000 |
But you now mentioned that it's also possibly intelligent. 00:04:30.520 |
It seems that there's these elements of brilliance 00:04:37.520 |
of maximizing so many things about its behavior 00:04:55.880 |
I understand it differently than I think about intelligence 00:05:00.880 |
of humankind or intelligence of the artificial intelligence 00:05:13.280 |
I think the intelligence of a virus is in its simplicity, 00:05:50.020 |
where essentially the viruses act as the whole 00:06:00.820 |
- So what do you attribute the incredible simplicity 00:06:14.320 |
are you more worried about the natural pandemics 00:06:23.080 |
- Yes, it's a very, very interesting question 00:06:26.200 |
because obviously there's a lot of conversations 00:06:55.140 |
We keep seeing new strains of influenza emerging, 00:07:02.200 |
We keep seeing new strains of coronaviruses emerging. 00:07:20.960 |
I've read papers about scientists trying to study 00:07:25.960 |
the capacity of the modern, you know, biotechnology 00:07:42.000 |
it won't be our main concern in the nearest future. 00:07:57.080 |
and look at the history of the most dangerous viruses, 00:08:02.400 |
right, so the first thing that comes into mind 00:08:08.580 |
So right now there is perhaps a handful of places 00:08:14.520 |
where this, you know, the strains of this virus are stored. 00:08:24.120 |
of the whole society to limit the access to those viruses. 00:08:29.120 |
- You mean in a lab in a controlled environment 00:08:37.680 |
for which it should be stated there's a vaccine 00:08:48.040 |
I mean, in my opinion, it was perhaps the most dangerous 00:09:08.840 |
Biologically, it's a so-called double-stranded DNA virus, 00:09:13.840 |
but also in the way that it is much more contagious. 00:09:34.320 |
as person infected by the virus can spread to other people. 00:09:49.480 |
And, you know, there is still some discussion 00:10:00.280 |
You know, the estimations vary between, you know, 00:10:11.000 |
And we're talking about the exponential growth, right? 00:10:33.040 |
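Editorial sketch: the exponential growth mentioned here can be illustrated with a few lines of Python. This is only a toy model, assuming every infected person passes the virus to a fixed average number of others (called `r0` below, a name and parameter introduced for illustration) and that nothing slows the spread. The values used are hypothetical and are not estimates for any particular virus.

```python
def cases_after(generations: int, r0: float, initial: int = 1) -> float:
    """New infections in a given transmission generation, assuming each
    case infects r0 others on average and nothing limits the spread."""
    return initial * r0 ** generations


if __name__ == "__main__":
    # Hypothetical reproduction numbers, chosen only to show the contrast.
    for r0 in (1.5, 2.0, 3.0):
        print(r0, [round(cases_after(g, r0)) for g in range(0, 11, 5)])
```

Even a modest difference in `r0` compounds quickly over a handful of generations, which is why the estimates being discussed matter so much.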
but it's definitely, definitely more contagious 00:10:38.040 |
than the seasonal flu, than the current coronavirus 00:10:54.080 |
that come into play, but is it that whole discussion 00:11:00.880 |
if it's airborne, or is there some other stuff 00:11:17.880 |
so the ways in which the virus is spread is definitely one. 00:11:22.240 |
The ability of the virus to stay on surfaces. 00:11:55.920 |
that we, one really needs to take into account, 00:11:59.840 |
the percentage of the asymptomatic population. 00:12:09.920 |
and still are, you know, they still are contagious. 00:12:15.560 |
which I think is probably the most impressive size-wise, 00:12:27.720 |
is like, just the number of people who got infected 00:12:44.320 |
So the lucky thing there is the fatality rate is low. 00:12:52.320 |
an entire population so quickly, it's terrifying. 00:13:00.100 |
that's perhaps my favorite example of a butterfly effect. 00:13:23.280 |
a couple of small changes in the viral genome, 00:13:39.440 |
So this is, I mean, it's not even the size of a virus. 00:13:50.640 |
And all of a sudden this change has such a major impact. 00:13:55.640 |
- So is that a mutation like on a single virus? 00:14:04.640 |
the flap of a butterfly wing, like what's the first flap? 00:14:34.960 |
I mean, the fact that there are coronaviruses, 00:14:40.520 |
in various bat species, I mean, we know that. 00:14:47.400 |
they study them, they look at their genomic sequences. 00:14:54.360 |
what makes these viruses jump from bats to humans. 00:15:17.360 |
studying influenza virus, essentially, you know, 00:15:25.760 |
can jump from one species to another, you know, 00:15:30.680 |
by changing, I think, just a couple of residues. 00:15:39.000 |
I think there was a moratorium on this study for a while, 00:15:43.600 |
but then the study was released, it was published. 00:15:58.560 |
I personally think it is important to study this. 00:16:06.000 |
we should try to understand as much as possible 00:16:10.440 |
- But, so then, the engineering aspect there is, 00:16:18.440 |
because there's so many strains of viruses out there, 00:16:24.320 |
that are the deadliest from the virologist's perspective, 00:16:32.160 |
try to see how to, but see, there's a nice aspect to it. 00:16:37.160 |
The really nice thing about engineering viruses, 00:16:44.240 |
is it's hard for it to not lead to mutual self-destruction. 00:16:53.800 |
- Yeah, that's why, you know, in the beginning I said, 00:16:56.240 |
you know, I'm hopeful, because there definitely 00:17:01.760 |
are regulations that need to be introduced. 00:17:10.840 |
we are in charge of, you know, making the right actions, 00:17:31.520 |
by which the virus can become more, you know, 00:17:45.520 |
eventually lead to designing better vaccines, 00:17:50.560 |
And that would be a triumph of the, of science. 00:17:58.600 |
So is that something that, how universal is universal? 00:18:04.960 |
'cause you kind of mentioned the dream of this. 00:18:32.400 |
Now, if the next pandemic, influenza pandemic will occur, 00:18:38.320 |
most likely this vaccine would not save us, right? 00:18:44.160 |
Although it's, you know, it's the same virus, 00:18:52.480 |
So if we're able to essentially design a vaccine 00:19:17.160 |
might've been something that you would be worried 00:19:22.600 |
Well, we're sitting here in the middle of a COVID-19 pandemic 00:20:20.400 |
I mean, luckily for us, this strain was not pandemic. 00:20:25.400 |
All right, so it was jumping from birds to human 00:20:30.460 |
but I don't think it was actually transmittable 00:20:35.200 |
And, you know, this is actually a very interesting question 00:20:44.620 |
between the virus being very contagious, right, 00:20:55.020 |
you know, causing, you know, harms, you know, 00:21:05.300 |
So it looks like that the more pathogenic the virus is, 00:21:28.500 |
if you look at, you know, the deadlier relative, MERS. 00:21:42.280 |
again, the mortality rate from MERS is far above, 00:21:55.320 |
doesn't want us dead, 'cause it's balancing out nicely. 00:21:59.860 |
I mean, how do you explain that we're not dead yet? 00:22:13.400 |
- I mean, we also have, you know, a lot of protection, right? 00:22:31.680 |
And I think with the, now we're much better equipped, right? 00:22:36.000 |
So with the discoveries of vaccines and, you know, 00:22:44.120 |
that maybe 200 years ago would wipe us out completely. 00:22:49.120 |
But because of these vaccines, we are actually, 00:22:53.000 |
we are capable of eradicating pretty much fully, 00:22:58.840 |
- So if we could, can we go to the basics a little bit 00:23:13.740 |
And of course, the first one, the viral particle 00:23:22.340 |
In the case of coronavirus, there is a lot of evidence 00:23:42.260 |
there is a growing number of papers suggesting it. 00:23:45.740 |
Moreover, most recent, I think most recent results 00:23:51.140 |
suggest that this virus attaches more efficiently 00:24:02.900 |
so there is a family of viruses, the coronaviruses, 00:24:12.760 |
- So SARS actually stands for the disease that you get, 00:24:21.020 |
So SARS is the first strain, and then there's MERS. 00:24:27.340 |
- And there is, yes, people, scientists actually know 00:24:46.200 |
And so there is a lot of work done on this virus 00:25:01.220 |
So when you say attach, proteins are involved 00:25:16.880 |
and I mean, that's essentially because of this protein, 00:25:35.940 |
it makes three copies, and it makes so-called trimer. 00:25:40.940 |
So this trimer is essentially a functional unit, 00:25:45.700 |
a single functional unit that starts interacting 00:25:56.780 |
that now sits on the surface of a human cell, 00:26:19.420 |
it fuses its membrane with the host membrane, 00:26:32.500 |
and then essentially hijacks the machinery of the cell, 00:26:37.500 |
because none of the viruses that we know of have ribosome, 00:26:42.900 |
the machinery that allows us to print out proteins. 00:26:53.600 |
that are necessary for functioning of this virus, 00:26:55.940 |
it actually needs to hijack the host ribosomes. 00:27:00.340 |
- So a virus is an RNA wrapped in a bunch of proteins, 00:27:20.540 |
the protein that lives on the surface of the viral particle. 00:27:27.340 |
with the highest number of copies is the membrane protein. 00:27:38.300 |
sorry, the envelope of the protein of the viral particle, 00:27:54.700 |
Then there is another protein called envelope protein 00:27:59.660 |
or E protein, and it actually occurs in far less quantities. 00:28:13.820 |
So these are sort of the three major surface proteins 00:28:34.860 |
So it actually binds to the viral RNA, creates a capsid. 00:28:47.060 |
And, you know, if you compare the amount of the genes 00:28:58.000 |
it's significantly higher than of influenza virus, 00:29:05.020 |
Influenza virus has, I think, around eight or nine proteins 00:29:13.180 |
- Wow, that has to do with the length of the RNA strand. 00:29:17.660 |
- So, I mean, so it affects the length of the RNA strand. 00:29:21.220 |
Right, so because you essentially need to have 00:29:47.340 |
We've yet to uncover all functionalities of its proteins. 00:29:54.520 |
and can you say how one would try to figure out 00:30:02.140 |
So you've mentioned people are still trying to figure out 00:30:06.880 |
what the function of the envelope protein might be 00:30:15.300 |
that computational scientists do might be of help 00:30:19.340 |
because, you know, in the past several decades, 00:30:24.180 |
we actually have collected pretty decent amount of knowledge 00:30:28.940 |
about different proteins in different viruses. 00:30:37.820 |
and this is sort of, could be sort of our first lead 00:30:42.820 |
to a possible function, is to see whether those, 00:30:46.960 |
you know, say we have this genome of the coronavirus, 00:31:08.960 |
In such a way, we can, you know, for example, 00:31:11.880 |
clearly identify, you know, some critical components 00:31:15.640 |
that RNA polymerase or different types of proteases. 00:31:31.640 |
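Editorial sketch: one very simple way to quantify how "similar" a viral protein is to a known one is percent identity over an alignment. The function below assumes the two sequences have already been aligned by some other tool (gaps written as `-`); the function name and the toy sequences are hypothetical, and real pipelines use far more sensitive methods than this.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned (non-gap) positions where the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, not real viral sequences.
    print(percent_identity("MKT-AY", "MKTLAF"))  # 4 of 5 aligned positions match
```

A high score against a well-characterized protein gives a first lead to a possible function; a low score against everything known is what makes a protein "truly novel" in the sense discussed next.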
However, in some cases, you have truly novel proteins, 00:31:41.040 |
- Now, as a small pause, when you say similar, 00:31:53.960 |
Of course, you know, what bioinformatics does, 00:32:18.660 |
like reasonable predictions based on structure 00:32:27.640 |
is data grounded, good predictions of what should happen. 00:32:47.360 |
- So yeah, virology is one of the experimental sciences 00:32:53.760 |
They often work with other experimental scientists, 00:32:58.000 |
for example, the molecular imaging scientists, right? 00:33:07.160 |
and reconstructed through electron microscopy techniques. 00:33:33.920 |
So the techniques that are used are very similar 00:33:58.320 |
often involves the collaborations between virologists, 00:34:23.640 |
just how much we understand about RNA and DNA, 00:34:46.520 |
So for me, from a computer science perspective, 00:34:49.520 |
I know how to write a Python program, things are clear, 00:34:55.480 |
it feels like to me, from an outsider's perspective. 00:35:19.240 |
applications to much more fundamental systems, 00:35:27.680 |
or, you know, small chemical compounds, right? 00:35:43.880 |
of, you know, of the computer science, of mathematics. 00:35:54.040 |
that computer science can offer help in this messy world. 00:36:07.720 |
It just seems like a very complicated set of problems, 00:36:39.880 |
not even try to understand and figure out all the details, 00:36:55.800 |
at the beginning of understanding the human mind. 00:36:58.640 |
So where's biology sit in terms of understanding 00:37:21.600 |
to be able to mess with it using an antiviral, 00:37:32.720 |
- I think we are much farther in terms of understanding 00:37:35.680 |
of the complex genetic disorders, such as cancer, 00:37:50.200 |
with how many different layers of complexity, 00:37:56.560 |
that can be hijacked by cancer simultaneously. 00:38:00.200 |
And so, you know, I think biology in the past 20 years, 00:38:18.560 |
And one thing that where computational scientists 00:38:33.920 |
it's coming from the fact that we are now able 00:38:37.280 |
to generate a lot of information about the cell. 00:38:43.280 |
Whether it's next generation sequencing or transcriptomics, 00:39:07.920 |
And now the next step is to become equally efficient 00:39:19.800 |
- That could then be validated with experiment. 00:39:30.600 |
if we can match the new proteins you found in the virus 00:39:39.400 |
but there could be cases where it's a totally new protein. 00:39:50.440 |
and you're probably aware of the case of machine learning, 00:39:54.400 |
many of these methods rely on the previous knowledge. 00:40:00.560 |
So things that where we try to do from scratch 00:40:10.480 |
And this is, I mean, it's not just the function. 00:40:16.960 |
to predict the structures of these proteins in ab initio 00:40:39.120 |
And then somehow, magically, maybe you can tell me, 00:40:50.120 |
- And that's where actually the idea of protein folding, 00:40:57.080 |
or just not the idea, but the problem of figuring out 00:41:00.800 |
- The concept, yeah, how they fold into those weird shapes 00:41:05.520 |
So that's another side of computational work. 00:41:13.680 |
And maybe your thoughts on the Folding@home efforts 00:41:16.800 |
that a lot of people know that you can use your machine 00:41:30.680 |
So the reason for that is we've yet to understand 00:41:35.360 |
precisely how the protein gets folded so efficiently, 00:41:40.360 |
to the point that in many cases where you try to unfold it 00:41:47.560 |
due to the high temperature, it actually folds back 00:41:54.040 |
All right, so we know a lot about the mechanisms, right? 00:41:59.320 |
But putting those mechanisms together and making sense, 00:42:14.240 |
can they fold in an arbitrarily large number of ways, 00:42:16.880 |
or do they usually fold in a very small number of ways? 00:42:19.360 |
- No, it's typically, I mean, we tend to think that 00:42:23.080 |
there is a one sort of canonical fold for a protein, 00:42:26.880 |
although there are many cases where the proteins, 00:42:32.400 |
it can be folded into a different conformation. 00:42:46.160 |
So those structural units, we call them protein domains. 00:42:49.240 |
Essentially, a protein domain is a single unit 00:42:56.960 |
that typically carries out a single function, 00:42:59.640 |
and typically has a very distinct fold, right? 00:43:12.640 |
would have a bit of two or three such subunits. 00:43:19.360 |
And how they are trying to fold into the sort of, 00:43:32.680 |
and then they fold into the larger 3D structure, right? 00:43:42.440 |
but not to put together to be able to fold it. 00:43:44.960 |
- We're still, I mean, we're still struggling. 00:44:07.920 |
proteins that sit in the membranes of the cell. 00:44:19.640 |
- And so basically, there's a lot of degrees of freedom, 00:44:31.280 |
but it doesn't mean that we cannot approach it 00:44:34.240 |
from the non-con, not from the brute force approach. 00:44:43.840 |
you know, have emerged that try to tackle it. 00:44:57.400 |
and I remember, I was, I mean, it was long time ago. 00:45:07.360 |
'cause it was originally designed as the game. 00:45:18.960 |
you know, it's very transparent, very intuitive. 00:45:27.440 |
but, you know, kids are actually getting very good 00:45:39.680 |
not as a surprise, but actually as the sort of manifest 00:45:44.560 |
of, you know, our capacity to do this kind of, 00:45:52.680 |
when a paper was published in one of these top journals 00:45:57.680 |
with the co-authors being the actual players of this game. 00:46:07.480 |
So, and what happened is, was that they managed 00:46:12.480 |
to get better structures than the scientists themselves. 00:46:17.080 |
So, so that, you know, that was very, I mean, 00:46:23.660 |
it was kind of profound, you know, revelation that problems 00:46:28.680 |
that are so challenging for a computational science, 00:46:35.680 |
maybe not that challenging for a human brain. 00:46:58.280 |
So, if you look at what DeepMind does with AlphaFold. 00:47:03.960 |
- So, they kind of, that's a learning approach. 00:47:07.920 |
I mean, your background is in machine learning, 00:47:14.440 |
Are we in the Garry Kasparov, Deep Blue days? 00:47:19.280 |
Or are we in the AlphaGo playing the game of Go days 00:47:24.520 |
- Well, I think we are advancing towards this direction. 00:47:28.920 |
I mean, if you look, so there is sort of Olympic game 00:47:49.800 |
And of course, there is different sort of subtasks, 00:47:58.400 |
AlphaFold was among the top performing teams, 00:48:05.000 |
So, there is definitely a benefit from the data 00:48:10.000 |
that have been generated in the past several decades, 00:48:22.240 |
to summarize this data, to generalize this data, 00:48:34.160 |
- That's one of the really cool things here is, 00:48:38.600 |
There seems to be these open datasets of protein. 00:48:49.560 |
Is this a recent thing for just the coronavirus? 00:49:07.840 |
I mean, this is a great example of the community efforts 00:49:20.400 |
or a protein complex, this is where you submit it. 00:49:33.920 |
And we, bioinformaticians, use this information 00:49:42.720 |
- So there's no culture of like hoarding discoveries here. 00:49:55.240 |
whatever, we'll talk about details a little bit. 00:50:02.000 |
it's kind of amazing how open the culture here is. 00:50:08.200 |
And I think this pandemic actually demonstrated 00:50:34.900 |
in which people establish new collaborations, 00:50:37.960 |
in which people offer their help to each other. 00:50:50.520 |
to sort of do the very accelerated review cycle, 00:50:54.260 |
but so many preprints, so just posting a paper, going out, 00:51:05.040 |
I mean, the way we think about knowledge, I would say. 00:51:17.080 |
the knowledge is becoming sort of the core value, 00:51:30.400 |
in the times where it becomes really crystallized, 00:51:49.840 |
I hope we get a chance to talk about a little bit, 00:52:08.160 |
of fundamental ideas here in a much shorter amount 00:52:11.420 |
that's less traditionally acceptable by the journal context? 00:52:16.020 |
- So, well, so the first version that we posted, 00:52:34.260 |
So our submission, I think it took five or six days 00:52:43.820 |
So we, essentially we put the first pre-print on our website 00:53:13.940 |
with introducing the information that is necessary 00:53:26.060 |
- So maybe we can dive right in if it's okay. 00:53:30.420 |
- So this is a paper called Structural Genomics of SARS-CoV-2. 00:53:39.140 |
- By the way, COVID is such a terrible name, but it stuck. 00:53:46.540 |
conserved functional regions of viral proteins. 00:54:06.060 |
I mean, there's so many questions I could ask here, 00:54:07.660 |
but maybe at the, how do you get started doing this paper? 00:54:16.340 |
- Yes, so there is actually a little story behind it. 00:54:20.740 |
And so the story actually dated back in September of 2019. 00:54:31.060 |
we had another dangerous virus, triple E virus, 00:54:41.860 |
I have to admit I was sadly completely unaware. 00:54:52.820 |
The danger in this virus was that it actually, 00:55:39.620 |
because the kids were suffering quite tremendously 00:55:57.980 |
in Boston necessarily, but in the Metro West area. 00:56:18.200 |
So these viruses kind of attach to something in the body. 00:56:23.200 |
- So it essentially attaches to these proteins 00:56:43.780 |
from what I read, they have the epithelial cells inside. 00:57:10.100 |
where they are actually expressed in abundance. 00:57:35.460 |
this is something that we would call a neglected disease 00:57:39.340 |
because it's not big enough to make, you know, 00:57:44.340 |
the drug design companies to design a new antiviral 00:57:53.100 |
It's not big enough to generate a lot of grants 00:58:04.700 |
So does it mean we cannot do anything about it? 00:58:09.500 |
And so what I did is I taught a bioinformatics class 00:58:19.100 |
and we are very much a problem learning institution. 00:58:24.100 |
So I thought that that would be a perfect, you know, 00:58:42.100 |
to understand as much as possible about this virus. 00:58:51.660 |
was to understand the structures of the proteins, 00:58:55.740 |
to understand how they interact with each other 00:59:03.820 |
try to understand the evolution of this virus. 00:59:08.580 |
It's obviously, you know, a very important question, 00:59:14.680 |
how, you know, how it happened here, you know. 00:59:28.360 |
where all these undergraduate students will be co-authors. 00:59:51.280 |
And immediately I thought that, well, we just did that. 01:00:00.880 |
And so we started waiting for the genome to be released 01:00:04.640 |
because that's essentially the first piece of information 01:00:10.880 |
you can start doing a lot using bioinformatics. 01:00:23.760 |
the entire information encoded in the protein, right? 01:00:39.040 |
- So a gene is essentially a basic functional unit 01:00:47.120 |
So each gene in the virus would correspond to a protein. 01:00:56.920 |
It needs to be converted or translated into the protein 01:01:12.480 |
So the first step is to figure out the genome, 01:01:16.940 |
the sequence of things that will be then used 01:01:23.360 |
- So then the next step, so once we have this, 01:01:27.440 |
and so we use the existing information about SARS 01:01:41.120 |
and actually other related coronaviruses, MERS, 01:01:47.980 |
And we started by identifying the potential genes 01:01:54.520 |
'cause right now it's just a sequence, right? 01:02:09.160 |
- And we now need to define the boundaries of the genes 01:02:14.160 |
that would then be used to identify the proteins 01:02:23.960 |
- It's not, I mean, it's pretty straightforward. 01:02:45.360 |
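Editorial sketch: the step of going from a raw genome sequence to candidate genes and their proteins can be illustrated with a naive open-reading-frame scan. This is not the method used in the paper (the conversation describes relying on existing knowledge of SARS and related coronaviruses); it is a minimal toy that assumes the genome is written in DNA letters, scans only the three forward frames, and uses the standard genetic code. The function name `find_orfs` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(seq: str, min_codons: int = 1):
    """Return (start, end, protein) for each ATG...stop stretch found in the
    three forward reading frames. Coordinates are 0-based, end-exclusive."""
    seq = seq.upper().replace("U", "T")  # accept RNA letters as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":  # a start codon opens a candidate gene
                    start = i
            elif CODON_TABLE.get(codon) == "*":  # a stop codon closes it
                protein = "".join(CODON_TABLE.get(seq[j:j + 3], "X")
                                  for j in range(start, i, 3))
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, protein))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: ATG GCT TAA sits in the third reading frame.
    print(find_orfs("CCATGGCTTAAGG"))  # [(2, 11, 'MA')]
```

Each boundary pair marks where a gene could start and stop, and the translated string is the protein sequence that later modeling steps would work from.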
this is where sort of the first more traditional 01:02:57.000 |
and get the 3D information about those proteins. 01:03:15.960 |
- And here you're looking for similar proteins. 01:03:25.200 |
it's called homology or template-based modeling. 01:03:43.780 |
- And this is at the micro, at the very local scale and-- 01:04:00.500 |
pretty sophisticated modeling tools to do so. 01:04:08.300 |
Once we get the structures of the individual proteins, 01:04:16.620 |
we try to see whether or not these proteins act alone 01:04:31.580 |
And again, so this is sort of the next level of the modeling 01:04:34.980 |
because now you need to understand how proteins interact. 01:04:39.660 |
And it could be the case that the protein interacts 01:04:44.120 |
with itself and makes sort of a multimeric complex. 01:04:51.520 |
The same protein just repeated multiple times 01:04:54.360 |
and we have quite a few such proteins in SARS-CoV-2, 01:04:59.360 |
specifically spike protein needs three copies to function. 01:05:07.880 |
Envelope protein needs five copies to function. 01:05:14.640 |
And there are some other multimeric complexes. 01:05:18.400 |
- That's what you mean by interacting with itself 01:05:27.160 |
- Well, again, so there are two approaches, right? 01:05:29.460 |
So one is look at the previously solved complexes. 01:05:34.460 |
Now we're looking not at the individual structures 01:05:47.280 |
- And when you say glued, that's the interaction. 01:05:53.160 |
different sort of physical forces behind this. 01:05:59.680 |
but is it the interaction fundamentally structural 01:06:10.560 |
- That's actually a very good way to ask this question 01:06:14.600 |
because it turns out that the interaction is structural 01:06:30.680 |
to carry out very specific function of a protein. 01:06:38.080 |
figuring out you're really starting at the structure 01:06:44.020 |
So there's a beautiful figure two in the paper 01:07:03.440 |
that's through the step two that you mentioned 01:07:08.560 |
when you try to guess at the possible proteins, 01:07:12.200 |
that's what you're going to get is these blue cyan blobs. 01:07:35.680 |
we may not necessarily have the coverage of all 29 proteins. 01:07:40.680 |
However, the biggest advantage is that the accuracy 01:07:45.280 |
in which we can model these proteins is very high, 01:07:55.060 |
- So, but nevertheless, this figure also has, 01:08:12.040 |
so the difference you find is on the 2D sequence, 01:08:15.400 |
and then you try to infer what that will look like 01:08:18.520 |
- Yeah, so the difference actually is on 1D sequence. 01:08:24.760 |
- So, and so this is one of this first questions 01:08:39.160 |
which are SARS and a couple of bat coronavirus strains, 01:08:51.760 |
Now, what are the differences between this virus 01:08:58.360 |
And if you look, typically when you take a sequence, 01:09:03.160 |
those differences could be quite far away from each other. 01:09:08.240 |
So what the 3D structure makes those differences 01:09:13.240 |
do is, very often, they tend to cluster together. 01:09:28.080 |
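Editorial sketch: the point that mutations far apart in the 1D sequence can sit next to each other in 3D can be checked with a simple distance calculation. The code below assumes one representative coordinate per residue (for example, an alpha-carbon position in angstroms); the coordinates, residue numbers, and the 8 angstrom cutoff are all made-up illustrative values, not data from the paper.

```python
import math
from itertools import combinations


def close_pairs(coords, mutated, cutoff=8.0):
    """List (res_i, res_j, sequence_separation, distance) for every pair of
    mutated residues whose coordinates lie within `cutoff` of each other."""
    found = []
    for i, j in combinations(sorted(mutated), 2):
        distance = math.dist(coords[i], coords[j])
        if distance <= cutoff:
            found.append((i, j, j - i, round(distance, 1)))
    return found


# Hypothetical residue number -> (x, y, z) position.
TOY_COORDS = {
    10: (0.0, 0.0, 0.0),
    95: (5.0, 3.0, 0.0),
    200: (40.0, 40.0, 40.0),
}

if __name__ == "__main__":
    # Residues 10 and 95 are 85 positions apart in sequence but close in space.
    print(close_pairs(TOY_COORDS, [10, 95, 200]))  # [(10, 95, 85, 5.8)]
```

Pairs with a large sequence separation and a small spatial distance are the kind of cluster that would prompt the follow-up question of whether the region is functional.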
And sometimes they are there because they correspond, 01:09:36.400 |
So they are there because this is the functional site 01:09:53.480 |
that there's something, this could be actually indicative 01:10:01.360 |
and the 3D structure gives us a lot of information 01:10:11.400 |
to look at this information and then start to ask, 01:10:19.360 |
so this place of this protein that is highly mutated, 01:10:23.200 |
does it, is it the functional part of the protein? 01:10:36.960 |
with some other proteins or maybe with some other ligands, 01:10:50.000 |
- So you have a bunch of these mutated parts, 01:10:55.240 |
if like, I don't know, like how many are there 01:11:01.200 |
in the new novel coronavirus when you compare it to SARS? 01:11:10.880 |
And it's very interesting that if you look at that, 01:11:22.960 |
is that some of the proteins in the new coronavirus 01:11:31.640 |
So they're pretty much exactly the same as SARS, 01:11:38.560 |
whereas some others are heavily mutated, right? 01:11:47.880 |
is not occurring uniformly across the entire viral genome, 01:12:05.640 |
- Well, you know, so one of the most interesting findings 01:12:25.040 |
that get targeted by the known small molecules, 01:12:39.840 |
or small drug-like compounds can be efficient 01:12:51.000 |
- Ah, so this all actually maps to the drug compounds too, 01:13:13.560 |
and which parts are likely to behave similar. 01:13:25.840 |
but hopefully that sort of helps us to delineate 01:13:30.520 |
the regions of this virus that can be promising 01:13:45.720 |
and functional perspective does the new coronavirus 01:13:56.060 |
the overall structural characteristics of this virus, 01:14:09.220 |
- So that means you have the individual proteins, 01:14:15.500 |
Is that where this graph kind of interactome-- 01:14:25.020 |
so our prediction on the potential interactions, 01:14:55.220 |
- Yeah, are those already converged towards for SARS? 01:15:07.220 |
that now investigate the sort of the large-scale 01:15:14.940 |
set of interactions between the new SARS and its host. 01:15:19.940 |
And so I think that's an ongoing study, I think-- 01:15:32.740 |
not trying to figure out the entire particle, 01:15:40.260 |
So what this viral particle looks like, right? 01:15:43.900 |
So as I said, the surface of it is an envelope, 01:15:48.900 |
which is essentially a so-called lipid bilayer 01:15:59.100 |
So an average particle is around 80 nanometers, right? 01:16:09.780 |
So this particle can have about 50 to 100 spike proteins. 01:16:26.480 |
it's very comparable to MHV virus in mice and SARS virus. 01:16:31.480 |
- Micrographs are actual pictures of the actual-- 01:16:36.300 |
- Okay, so these are models, this is actual-- 01:16:56.200 |
- Well, they typically are not perfect, right? 01:17:12.660 |
And now, our collaborators from Texas A&M University, 01:17:27.800 |
and there's some actually evidence behind it, 01:17:34.820 |
but actually is an elongated ellipsoid-like particle. 01:17:39.820 |
So that's what we are trying to incorporate into our model. 01:17:47.260 |
And I mean, if you look at the actual micrographs, 01:17:54.660 |
you see that those particles are not symmetric, 01:18:05.520 |
it could be due to the treatment of the material, 01:18:10.360 |
it could be due to some noise in the imaging. 01:18:14.880 |
- Right, so there's a lot of uncertainty in all this. 01:18:16.800 |
So it's okay, so structurally figuring out the entire part. 01:18:31.440 |
virion, so virion particle, it's essentially a single virus. 01:18:37.920 |
'cause particle to me, from the physics perspective, 01:18:44.500 |
'cause there seems to be so much going on inside the virus. 01:18:51.000 |
- Yeah, well, yeah, it's probably, I think it's, 01:19:05.440 |
- Yes, so this is, so the virion has 50 to 100 spikes, 01:19:19.920 |
membrane protein dimers, and those are arranged 01:19:26.800 |
in a very nice lattice, so you can actually see 01:19:38.120 |
And occasionally you also see this envelope protein inside. 01:19:44.480 |
- Is that the one we don't know what it does? 01:19:46.080 |
- Exactly, exactly, the one that forms the pentamer, 01:19:51.840 |
And so, you know, so this is what we're trying to, 01:19:55.440 |
you know, we're trying to put now all our knowledge together 01:20:03.480 |
this overall virion model with an idea to understand, 01:20:08.480 |
you know, well, first of all, to understand how it looks like 01:20:16.720 |
how far it is from those images that were generated. 01:20:27.920 |
nanoparticle design that will mimic this virion particle. 01:20:41.600 |
- Yes, you know, so the one that can potentially compete 01:20:50.560 |
and therefore reduce the effect of the infection. 01:20:55.460 |
- So is this the idea of, like, what is a vaccine? 01:21:02.500 |
so there are two ways of essentially treating, 01:21:06.420 |
and in the case of vaccine is preventing the infection. 01:21:10.640 |
So vaccine is, you know, a way to train our immune system. 01:21:15.640 |
So our immune system becomes aware of this new danger. 01:21:25.300 |
And therefore is capable of generating the antibodies, 01:21:30.300 |
that will essentially bind to the spike proteins, 01:21:35.820 |
'cause that's the main target for the, you know, 01:21:40.020 |
for the vaccine's design, and block its functioning. 01:21:45.020 |
If you have the spike with the antibody on top, 01:21:55.200 |
- So the process of designing a vaccine, then, 01:22:00.780 |
is you have to understand enough about the structure 01:22:03.100 |
of the virus itself to be able to create an artificial, 01:22:12.180 |
nanoparticle is a very exciting and new research. 01:22:16.940 |
So there are already established ways to, you know, 01:22:21.020 |
to make vaccines, and there are several different ones, 01:22:25.460 |
right, so there is one where essentially the virus 01:22:30.460 |
gets through the cell culture multiple times, 01:22:47.380 |
compatible with the, you know, host human cells. 01:22:52.380 |
So, and therefore it's sort of the idea of the live vaccine, 01:23:12.220 |
And they can be introduced to the immune system. 01:23:18.180 |
and the person who gets this vaccine won't get, you know, 01:23:23.180 |
sick or, you know, will have mild, you know, mild symptoms. 01:23:29.460 |
So then there is sort of different types of the way 01:23:40.640 |
or the virus where some of the information is stripped down. 01:23:45.680 |
For example, the virus with no genetic material, 01:23:54.220 |
It cannot essentially perform most of its function. 01:24:00.140 |
What is the biggest hurdle to design one of these, 01:24:08.340 |
in the fundamental understanding of this new virus, 01:24:14.140 |
well, complicated world of experimental validation, 01:24:19.980 |
like going through the whole process of showing 01:24:21.740 |
this is actually gonna work with FDA approval, 01:24:28.020 |
of the molecular mechanisms will allow us to, you know, 01:24:32.380 |
to design, to have more efficient designs of the vaccines. 01:24:36.340 |
However, once you design the vaccine, it needs to be tested. 01:24:46.540 |
it seems like an exceptionally, historically speaking, 01:24:52.060 |
even 18 months seems like a very accelerated timeline. 01:25:08.140 |
and properly test a vaccine before its mass production. 01:25:25.420 |
I mean, the scientific community is really stepping up 01:25:28.440 |
The collaborative aspect is really interesting. 01:25:33.020 |
and then there's antivirals, antiviral drugs. 01:25:38.420 |
vaccines are typically needed to prevent the infection. 01:26:06.780 |
So there are a number of interesting candidates. 01:27:07.480 |
so antiviral drugs mess with some aspect of this process. 01:27:32.200 |
There is one that was originally designed for malaria, 01:27:37.200 |
which is a bacterial, you know, bacterial disease, so. 01:27:50.500 |
so like providing candidates for where drugs can plug in. 01:28:09.060 |
Now, is it gonna be efficient against new SARS-CoV-2? 01:28:14.060 |
We don't know that, and there are multiple aspects 01:29:02.740 |
are pretty much intact, which is very promising. 01:29:07.500 |
- So if we could just like zoom out for a second, 01:29:26.320 |
get figured out very quickly, probably drugs first. 01:29:49.560 |
Do you see the antiviral drugs or the work you're doing 01:30:07.080 |
So the social distancing, things like wearing masks, 01:30:14.760 |
will be the method with which we fight coronavirus 01:30:30.520 |
I would view that as the, at least the short-term solution. 01:30:41.860 |
of those new drug candidates being administered 01:30:55.920 |
- But do we need it in order to reopen the economy? 01:31:02.080 |
I cannot sort of speculate on how that will affect 01:31:11.640 |
you know, we are kind of deep into the pandemic, 01:31:16.640 |
and it's not just the states, it's also, you know, 01:31:28.640 |
of the second wave, as we, you know, as you mentioned, 01:31:33.640 |
and this is why, you know, we need to be super careful. 01:31:50.600 |
- Are you worried about the mutation of the virus? 01:31:58.960 |
Now, how, to what extent this virus can mutate, 01:32:13.340 |
and to become transmittable between humans, right? 01:32:36.900 |
- This is such a beautiful and terrifying process 01:32:41.020 |
that a virus, some viruses may be able to mutate 01:33:07.700 |
that makes it, you know, it just, the way it evolves, 01:33:12.820 |
and actually, it's the way it co-evolves with its host. 01:33:17.120 |
- It's just amazing, especially the evolutionary mechanisms, 01:33:22.260 |
especially amazing given how simple the virus is. 01:33:27.260 |
It's incredible that it's, I mean, it's beautiful. 01:33:32.700 |
It's beautiful because it's one of the cleanest examples 01:33:38.940 |
- Well, I think, I mean, the, one of the sort of, 01:33:47.180 |
it does not require all the necessary functions 01:33:58.460 |
of the necessary functions from the host cell. 01:34:05.380 |
in my view, reduces the complexity of this machine 01:34:10.380 |
drastically, although if you look at the, you know, 01:34:25.980 |
it actually, those discoveries made scientists 01:34:31.700 |
reconsider the origins of the virus, you know, 01:34:36.700 |
and what are the mechanisms and how, you know, 01:34:40.460 |
what are the mechanisms, the evolutionary mechanisms 01:34:46.900 |
- By the way, I mean, you did mention that viruses are, 01:34:50.460 |
I think you mentioned that they're not living. 01:34:56.500 |
Why do you think they're not living organisms? 01:35:15.420 |
Let me be the philosophical devil's advocate here and say, 01:35:27.940 |
So you can basically take every living organism 01:35:35.100 |
it's always going to have some aspects of its host 01:35:42.540 |
So is that really the key aspect of why a virus, 01:35:49.820 |
Because it seems to be very good at doing so many things 01:36:00.980 |
- Well, I mean, it, yeah, it's difficult to answer 01:36:42.380 |
That's how I separate the idea of the living organism, 01:36:47.380 |
on a very high level, between the living organism and-- 01:36:54.100 |
these are just terms, and perhaps they don't mean much, 01:36:57.500 |
but we have some kind of sense of what autonomous means, 01:37:24.580 |
and then the spread of viruses from one to the other. 01:37:27.220 |
How does, at a high level, agent-based simulation work? 01:37:33.240 |
- All right, so it's also one of those ironies of timing, 01:37:51.580 |
from my PhD student that the last experiments were completed. 01:38:16.880 |
the code name, it started with a bunch of undergraduates. 01:38:48.260 |
one of the biggest threats to the cruise ship economy. 01:39:00.560 |
And this is essentially one of these stomach flus 01:39:12.940 |
So there are, occasionally there are cruise ships get, 01:39:20.300 |
they get returned to the, back to the origin. 01:39:34.300 |
So we wanted to study this in a confined environment, 01:39:41.500 |
It could be other, you know, other places such as, 01:40:01.960 |
So we can actually see the whole course of the evolution, 01:40:15.260 |
and, you know, the host and the pathogen, et cetera. 01:40:50.320 |
that's where, you know, we introduce some novelty 01:40:58.940 |
So that allowed us to essentially model the behavior 01:41:06.400 |
on the host side, as well as on the pathogen side. 01:41:16.860 |
that allows us to integrate all the key parameters 01:41:36.480 |
How long does the virus survive on the surface, the fomite? 01:41:43.100 |
What is, you know, how much of the viral particles 01:41:52.540 |
does a host shed when he or she is asymptomatic 01:42:02.900 |
- And you can encode all of that into this path. 01:42:08.360 |
usually the agent represents a single human being. 01:42:11.700 |
And then there's some graphs, like contact graphs 01:42:16.080 |
that represent the interaction between those human beings. 01:42:30.700 |
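
The idea described here (one agent per person, a contact graph linking them, and pathogen parameters such as how much an asymptomatic host sheds) can be illustrated with a minimal sketch. This is not the model discussed in the conversation: the state names, the ring-shaped contact graph, and every parameter value (`p_asym`, `p_sym`, `asym_days`, `sym_days`) are hypothetical and chosen only to keep the example small.

```python
import random
from dataclasses import dataclass

# Hypothetical disease states: susceptible, asymptomatic (but shedding),
# symptomatic, recovered. The single-letter codes are used as history keys.
SUSCEPTIBLE, ASYMPTOMATIC, SYMPTOMATIC, RECOVERED = "S", "A", "I", "R"


@dataclass
class Agent:
    """One agent represents a single person."""
    state: str = SUSCEPTIBLE
    days_in_state: int = 0


def ring_contacts(n, k=2):
    """Toy contact graph: each agent meets its k nearest neighbours on each side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def step(agents, contacts, rng, p_asym=0.05, p_sym=0.15, asym_days=5, sym_days=7):
    """Advance one day and return the number of new infections.

    p_asym / p_sym are assumed per-contact, per-day transmission probabilities
    for asymptomatic and symptomatic hosts (illustrative values only).
    """
    # 1. Decide today's new infections from the state at the start of the day.
    newly_infected = set()
    for i, agent in enumerate(agents):
        if agent.state == ASYMPTOMATIC:
            p = p_asym
        elif agent.state == SYMPTOMATIC:
            p = p_sym
        else:
            continue
        for j in contacts[i]:
            if agents[j].state == SUSCEPTIBLE and rng.random() < p:
                newly_infected.add(j)

    # 2. Progress agents who were already infected: A -> I -> R.
    for agent in agents:
        if agent.state == ASYMPTOMATIC:
            agent.days_in_state += 1
            if agent.days_in_state >= asym_days:
                agent.state, agent.days_in_state = SYMPTOMATIC, 0
        elif agent.state == SYMPTOMATIC:
            agent.days_in_state += 1
            if agent.days_in_state >= sym_days:
                agent.state, agent.days_in_state = RECOVERED, 0

    # 3. Newly infected agents start out asymptomatic.
    for j in newly_infected:
        agents[j].state, agents[j].days_in_state = ASYMPTOMATIC, 0
    return len(newly_infected)


def simulate(n=200, days=60, seed=0, **params):
    """Run the toy outbreak and return per-day counts of each state."""
    rng = random.Random(seed)          # seeded so that runs are reproducible
    agents = [Agent() for _ in range(n)]
    agents[0].state = ASYMPTOMATIC     # a single index case
    contacts = ring_contacts(n)
    history = []
    for _ in range(days):
        step(agents, contacts, rng, **params)
        history.append({s: sum(a.state == s for a in agents) for s in "SAIR"})
    return history
```

Calling `simulate(n=200, days=60, seed=0)` returns one dictionary of state counts per day. Changing `p_asym`, or swapping `ring_contacts` for a graph that reflects a particular confined environment, is the kind of experiment such a model is meant to support.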
And we can provide instructions for these agents 01:42:51.540 |
It's kind of a brilliant, like a brilliant way 01:43:06.980 |
- So yeah, it was, you know, we realized that, 01:43:18.460 |
by very specific sort of aspects of the specific virus. 01:43:31.040 |
but we continued by modeling the Ebola virus outbreak, 01:43:51.380 |
So we actually modeled the virus from the movie Contagion. 01:44:02.060 |
that virus, and we tried to extract as much information. 01:44:06.980 |
Luckily, this movie was, the scientific consultant 01:44:11.580 |
was Ian Lipkin, a virologist from Columbia University, 01:44:17.020 |
who is actually, who provided, I think he designed 01:44:22.020 |
this virus for this movie based on Nipah virus, 01:44:34.140 |
And, you know, the movie surprisingly contained 01:44:39.140 |
enough details for us to extract and to model it. 01:44:43.740 |
- I was hoping you would like publish a paper 01:44:57.500 |
actually being a scientist and studying the virus 01:45:07.420 |
is assignment number one in my bioinformatics class 01:45:24.460 |
- So there is, you know, approximately a week 01:45:28.860 |
from the virus detection, we see a screenshot 01:45:41.460 |
you know, if you ask an experimental biologist, 01:45:56.220 |
If you ask a bioinformatician, they tell you, 01:46:23.520 |
So it was a lot of scientific thought put into the movie. 01:46:31.280 |
it was interesting to learn is that the origin 01:46:38.480 |
that led to the, you know, the zoonotic origin 01:46:57.920 |
this definitely feels like we're living in a simulation. 01:47:10.840 |
but larger scale, are used now to drive some policy. 01:47:14.880 |
So politicians use them to tell stories and narratives 01:47:24.200 |
But in your sense, are agent-based simulation 01:47:34.720 |
relative comparison of different intervention methods? 01:47:44.500 |
we're essentially learning that the current intervention 01:47:53.600 |
One thing that, one important aspect that I find 01:48:09.240 |
that was overlooked, you know, during the past pandemics, 01:48:19.480 |
This virus is different because it has such a long 01:48:49.420 |
for when you're asymptomatic, like that aspect, 01:49:04.080 |
how contagious the person with asymptomatic virus 01:49:21.240 |
So, so far, what I saw is the study that tells us 01:49:31.640 |
that the, you know, the person during the asymptomatic period 01:49:41.040 |
the person sheds enough viruses to infect another host. 01:49:46.040 |
- Yeah, and I think there's so many excellent papers 01:49:50.560 |
coming out, but I think I just saw maybe a Nature paper 01:49:53.520 |
that said the first week is when you're symptomatic 01:50:00.920 |
so the highest level of, like, they plot sort of, 01:50:05.200 |
in the 14-day period, they collected a bunch of subjects, 01:50:09.200 |
and I think the first week is when it's the most contagious. 01:50:15.360 |
I'm waiting to see sort of more populated studies, 01:50:29.400 |
a very recent one, where scientists determined 01:50:37.680 |
So, so there is, you know, so there is no viral shedding 01:50:45.660 |
- So they found one moist thing that's not contagious, 01:50:51.160 |
and I mean, there's a lot of, I've personally been, 01:50:59.240 |
that's looking at masks, and there's been so many 01:51:03.320 |
interesting debates on the efficacy of masks, 01:51:05.720 |
and there's a lot of work, and there's a lot of 01:51:08.720 |
interesting work on whether this virus is airborne. 01:51:19.000 |
whether it can travel in aerosols long distances. 01:51:22.920 |
I mean, do you have a, do you think about this stuff, 01:51:25.120 |
do you track this stuff, are you focused on the-- 01:51:27.240 |
- Yeah, I mean, you know-- - The bioinformatics of it? 01:51:35.700 |
I think the, I mean, and it's sort of a very simple 01:51:41.880 |
sort of idea, but I agree with people who say that 01:51:55.780 |
So it not only protects you from the incoming viral 01:52:00.780 |
particles, it also, you know, keeps a potentially 01:52:06.300 |
contagious person from spreading the viral particles. 01:52:36.860 |
- Yeah, exactly, 'cause I mean, you don't know where, 01:52:39.660 |
you know, and about 30%, as far as I remember, 01:52:53.540 |
I mean, you don't have any symptoms, yet you shed viruses. 01:52:59.020 |
- Do you think it's possible that we'll all wear masks? 01:53:06.500 |
this was like a week ago, maybe it's already changed, 01:53:12.780 |
I think the CDC has said that we should be wearing masks, 01:53:20.260 |
that this country will really struggle doing, or no? 01:53:33.220 |
during the Spanish flu, and you could see that the, 01:53:38.220 |
you know, pretty much everyone was wearing masks, 01:53:43.860 |
with some exceptions, and there were like, you know, 01:54:10.780 |
So how much do we treat this problem seriously? 01:54:31.940 |
just worry about the entirety, the entire big mess 01:54:42.620 |
You know, masks have a way of distancing us from others 01:54:50.120 |
and all that kind of stuff, but at the same time, 01:54:52.540 |
masks also signal that I care about your well-being. 01:54:58.540 |
- So it's a really interesting trade-off that's just-- 01:55:02.260 |
- Yeah, it's interesting, right, about distancing. 01:55:16.460 |
that's going to be a long road of rebuilding trust 01:55:23.620 |
Let me ask sort of, you have a bit of a Russian accent. 01:56:43.920 |
I wanted to be a pilot of a passenger jet plane. 01:57:10.580 |
Of course, math was one of my favorite subjects, 01:57:16.580 |
somehow I liked those four subjects together. 01:57:21.580 |
And yes, so essentially after a certain period of time, 01:57:32.500 |
I wanted to actually, back then it was a very popular 01:57:42.820 |
So it's sort of, it's not really computer science, 01:57:45.440 |
but it was like computational robotics in this sense. 01:58:14.300 |
I also realized that I really want to apply the knowledge. 01:58:54.480 |
From my distant perspective in computer science, 01:58:57.860 |
I don't, I'm not, we can go to conferences in Russia. 01:59:02.100 |
I sadly don't have many collaborators in Russia. 01:59:05.260 |
I don't know many people doing great AI work in Russia. 01:59:23.860 |
And I think this is the bioinformatics school 01:59:30.680 |
- In Moscow, in Novosibirsk, in St. Petersburg, 02:00:14.660 |
- It could be bioinformatics, too, and it could, yeah, 02:00:22.660 |
that I would, you know, you talk about the seminal people 02:00:30.100 |
France welcomes them with open arms in so many ways. 02:00:37.020 |
I do on the human beings, like people in general, 02:00:40.980 |
like friends and just cool, interesting people, 02:00:44.820 |
but from the scientific community, no conferences, 02:00:50.660 |
- Yeah, it's actually, you know, I'm trying to think. 02:00:54.500 |
Yeah, I cannot recall any big AI conferences in Russia. 02:01:04.380 |
I haven't, sadly, been back to Russia, so I should, 02:01:15.060 |
- I mean, I'm a citizen of the United States, 02:01:21.140 |
- I wanna be able to travel, like, you know, legitimately. 02:01:37.340 |
these unfortunate circumstances that we are in 02:01:41.740 |
will actually promote the remote collaborations. 02:01:49.220 |
- And I think we've, I think what we are experiencing 02:01:55.760 |
you know, being quarantined in your own homes. 02:02:01.540 |
- Especially when it comes, I mean, you know, 02:02:02.900 |
I certainly understand there is a very challenging time 02:02:07.140 |
I mean, I have many collaborators who are, you know, 02:02:10.740 |
who are affected by that, but for computational scientists. 02:02:14.260 |
- Yeah, we're really leaning into the remote communication. 02:02:17.400 |
Nevertheless, I had to force you to talk to you in person 02:02:21.660 |
'cause there's something that you just can't do 02:02:25.600 |
I don't know why, but in person is very much needed. 02:02:31.900 |
You have a collection of science bobbleheads. 02:02:46.900 |
So yeah, by the way, I was trying to bring it in, 02:02:53.980 |
They sort of demonstrate the social distance. 02:02:56.620 |
So they're nicely spaced away from each other. 02:03:14.540 |
it started with the two bobbleheads of Watson and Crick. 02:03:23.100 |
my last bobblehead in this collection for now, 02:03:26.740 |
and my favorite one, 'cause I felt so good when I got it, 02:03:47.580 |
of course, Charles Darwin, sorry, Charles Darwin, 02:04:17.360 |
- That's, you know, I've been always fascinated by 02:04:25.820 |
but also his dedication to educating young people, 02:05:11.900 |
would be Alan Turing, would be John von Neumann. 02:05:21.580 |
- Yes, well, I mean, they don't make them, you know. 02:05:24.580 |
I still am amazed they haven't made Alan Turing. 02:05:29.540 |
- Yet, yes, and I would also add Linus Pauling. 02:05:38.500 |
- So this is, to me, is one of the greatest chemists, 02:05:53.620 |
who was very close to solving the DNA structure, 02:06:02.220 |
but some of them were pretty sure that if not 02:06:07.220 |
for this, you know, photograph 51 by Rosalind Franklin, 02:06:12.380 |
that, you know, Watson and Crick got access to, 02:06:39.420 |
and its defenses and these enemies that are about, 02:06:44.420 |
from a biological perspective, bioinformatics perspective, 02:06:59.100 |
or just even seeing what it means to be human? 02:07:15.900 |
can impact the life of the whole humankind to such extent, 02:08:03.260 |
- Well, I don't think there's a better way to end it. 02:08:14.740 |
and thank you to our presenting sponsor, Cash App. 02:08:22.980 |
If you enjoy this podcast, subscribe on YouTube, 02:08:28.940 |
or simply connect with me on Twitter @lexfridman. 02:08:40.100 |
"The variety of genes on the planet in viruses exceeds 02:08:50.020 |
Thank you for listening and hope to see you next time.