
Dmitry Korkin: Computational Biology of Coronavirus | Lex Fridman Podcast #90


Chapters

0:00 Introduction
2:33 Viruses are terrifying and fascinating
6:02 How hard is it to engineer a virus?
10:48 What makes a virus contagious?
29:52 Figuring out the function of a protein
53:27 Functional regions of viral proteins
79:09 Biology of a coronavirus treatment
94:46 Is a virus alive?
97:05 Epidemiological modeling
115:27 Russia
122:31 Science bobbleheads
126:31 Meaning of life


00:00:00.000 | The following is a conversation with Dmitry Korkin.
00:00:02.720 | He's a professor of bioinformatics
00:00:04.560 | and computational biology at WPI,
00:00:07.280 | Worcester Polytechnic Institute,
00:00:09.320 | where he specializes in bioinformatics of complex diseases,
00:00:12.980 | computational genomics, systems biology,
00:00:16.000 | and biomedical data analytics.
00:00:18.360 | I came across Dmitry's work when in February,
00:00:21.400 | his group used the viral genome of the COVID-19
00:00:25.040 | to reconstruct the 3D structure of its major viral proteins
00:00:29.040 | and their interaction with the human proteins,
00:00:32.360 | in effect, creating a structural genomics map
00:00:34.920 | of the coronavirus and making this data open
00:00:37.560 | and available to researchers everywhere.
00:00:40.180 | We talked about the biology of COVID-19,
00:00:42.320 | SARS, and viruses in general,
00:00:44.480 | and how computational methods can help us understand
00:00:47.740 | their structure and function
00:00:49.360 | in order to develop antiviral drugs and vaccines.
00:00:52.840 | This conversation was recorded recently
00:00:56.360 | in the time of the coronavirus pandemic.
00:00:58.680 | For everyone feeling the medical, psychological,
00:01:01.040 | and financial burden of this crisis,
00:01:03.000 | I'm sending love your way.
00:01:04.760 | Stay strong, we're in this together, we'll beat this thing.
00:01:07.800 | This is the Artificial Intelligence Podcast.
00:01:11.520 | If you enjoy it, subscribe on YouTube,
00:01:13.680 | review it with five stars on Apple Podcasts,
00:01:16.040 | support it on Patreon, or simply connect with me on Twitter
00:01:19.400 | at Lex Fridman, spelled F-R-I-D-M-A-N.
00:01:23.480 | This show is presented by Cash App,
00:01:25.560 | the number one finance app in the App Store.
00:01:27.720 | When you get it, use code LEXPODCAST.
00:01:30.560 | Cash App lets you send money to friends, buy Bitcoin,
00:01:33.680 | and invest in the stock market with as little as $1.
00:01:36.680 | Since Cash App allows you to buy Bitcoin,
00:01:38.880 | let me mention that cryptocurrency
00:01:40.840 | in the context of the history of money is fascinating.
00:01:44.000 | I recommend "Ascent of Money"
00:01:46.120 | as a great book on this history.
00:01:48.200 | Debits and credits on ledgers
00:01:49.840 | started around 30,000 years ago.
00:01:52.760 | The US dollar created over 200 years ago.
00:01:55.680 | And Bitcoin, the first decentralized cryptocurrency,
00:01:58.440 | released just over 10 years ago.
00:02:00.600 | So given that history, cryptocurrency is still very much
00:02:04.120 | in its early days of development,
00:02:06.000 | but it's still aiming to, and just might,
00:02:08.760 | redefine the nature of money.
00:02:10.820 | So again, if you get Cash App from the App Store,
00:02:15.120 | Google Play, and use the code LEXPODCAST,
00:02:18.240 | you get $10, and Cash App will also donate $10 to FIRST,
00:02:22.160 | an organization that is helping to advance robotics
00:02:25.000 | and STEM education for young people around the world.
00:02:28.480 | And now, here's my conversation with Dmitry Korkin.
00:02:32.260 | Do you find viruses terrifying or fascinating?
00:02:36.920 | - When I think about viruses, I think about them,
00:02:42.280 | I mean, I imagine them as those villains
00:02:47.280 | that do their work so perfectly well
00:02:53.000 | that it is impossible not to be fascinated with them.
00:02:57.600 | - So what do you imagine when you think about a virus?
00:03:00.040 | Do you imagine the individual,
00:03:02.840 | sort of these 100 nanometer particle things?
00:03:07.280 | Or do you imagine the whole pandemic, like society level,
00:03:11.040 | when you say the efficiency at which they do their work,
00:03:15.200 | do you think of viruses as the millions
00:03:18.960 | that occupy a human body or a living organism,
00:03:23.560 | society level, like spreading as a pandemic,
00:03:26.680 | or do you think of the individual little guy?
00:03:29.240 | - Yes, I think this is a unique concept
00:03:34.240 | that allows you to move from micro scale to the macro scale.
00:03:39.680 | All right, so the virus itself,
00:03:41.720 | I mean, it's not a living organism.
00:03:45.080 | It's a machine, to me, it's a machine.
00:03:48.800 | But it is perfected to the way that it essentially
00:03:53.360 | has a limited number of functions
00:03:57.000 | it needs to do, necessary functions.
00:04:00.560 | And it essentially has enough information
00:04:05.560 | just to do those functions,
00:04:07.760 | as well as the ability to modify itself.
00:04:11.640 | So it's a machine, it's an intelligent machine.
00:04:18.280 | - So yeah, look, maybe on that point,
00:04:20.240 | you're in danger of reducing the power of this thing
00:04:23.320 | by calling it a machine, right?
00:04:25.000 | But you now mentioned that it's also possibly intelligent.
00:04:30.520 | It seems that there's these elements of brilliance
00:04:34.360 | that a virus has, of intelligence,
00:04:37.520 | of maximizing so many things about its behavior
00:04:42.520 | and to ensure its survival and its success.
00:04:48.240 | So do you see it as intelligent?
00:04:51.440 | - So I think it's a different,
00:04:55.880 | I understand it differently than I think about intelligence
00:05:00.880 | of humankind or intelligence of the artificial intelligence
00:05:06.760 | mechanisms.
00:05:13.280 | I think the intelligence of a virus is in its simplicity,
00:05:18.280 | the ability to do so much
00:05:26.120 | with so little material and information.
00:05:31.020 | But also I think it's interesting.
00:05:36.520 | It keeps me thinking, it keeps me wondering
00:05:41.360 | whether or not it's also an example
00:05:46.000 | of the basic swarm intelligence,
00:05:50.020 | where essentially the viruses act as the whole
00:05:56.200 | and they're extremely efficient in that.
00:06:00.820 | - So what do you attribute the incredible simplicity
00:06:05.240 | and the efficiency to?
00:06:07.560 | Is it the evolutionary process?
00:06:09.680 | So maybe another way to ask that,
00:06:12.040 | if you look at the next 100 years,
00:06:14.320 | are you more worried about the natural pandemics
00:06:18.320 | or the engineered pandemics?
00:06:20.360 | So how hard is it to build a virus?
00:06:23.080 | - Yes, it's a very, very interesting question
00:06:26.200 | because obviously there's a lot of conversations
00:06:30.640 | about whether we are capable of engineering
00:06:38.080 | a, you know, an even worse virus.
00:06:42.800 | I personally expect and am mostly concerned
00:06:47.640 | with the naturally occurring viruses
00:06:51.300 | simply because we keep seeing that.
00:06:55.140 | We keep seeing new strains of influenza emerging,
00:06:59.880 | some of them becoming pandemic.
00:07:02.200 | We keep seeing new strains of coronaviruses emerging.
00:07:07.800 | This is a natural process
00:07:09.840 | and I think this is why it's so powerful.
00:07:14.340 | You know, if you ask me, you know,
00:07:20.960 | I've read papers about scientists trying to study
00:07:25.960 | the capacity of the modern, you know, biotechnology
00:07:32.520 | to alter the viruses.
00:07:37.600 | But I hope that, you know,
00:07:42.000 | it won't be our main concern in the nearest future.
00:07:47.000 | - What do you mean by hope?
00:07:51.520 | - Well, you know, if you look back
00:07:57.080 | and look at the history of the most dangerous viruses,
00:08:02.400 | right, so the first thing that comes into mind
00:08:05.880 | is smallpox.
00:08:08.580 | So right now there is perhaps a handful of places
00:08:14.520 | where this, you know, the strains of this virus are stored.
00:08:21.080 | Right, so this is essentially the effort
00:08:24.120 | of the whole society to limit the access to those viruses.
00:08:29.120 | - You mean in a lab in a controlled environment
00:08:34.280 | in order to study? - Correct.
00:08:35.680 | - And then smallpox is one of the viruses
00:08:37.680 | for which it should be stated there's a vaccine
00:08:42.680 | is developed. - Yes, yes.
00:08:44.360 | And that's, you know, it's until '70s,
00:08:48.040 | I mean, in my opinion, it was perhaps the most dangerous
00:08:53.400 | thing that was there.
00:08:56.640 | - Is that a very different virus
00:08:59.200 | than the influenza and the coronaviruses?
00:09:04.120 | - It is, it is different in several aspects.
00:09:08.840 | Biologically, it's a so-called double-stranded DNA virus,
00:09:13.840 | but also in the way that it is much more contagious.
00:09:21.040 | So the R naught for, so this is the--
00:09:29.600 | - What's R naught?
00:09:31.000 | R naught is essentially the average number of other people
00:09:34.320 | a person infected by the virus can spread it to.
00:09:40.480 | So then the average number of people
00:09:45.200 | that he or she can spread it to.
00:09:49.480 | And, you know, there is still some discussion
00:09:55.240 | about the estimates of the current virus.
00:10:00.280 | You know, the estimations vary between, you know,
00:10:03.480 | 1.5 and three.
00:10:05.500 | In case of smallpox, it was five to seven.
00:10:11.000 | And we're talking about the exponential growth, right?
00:10:17.560 | So that's a very big difference.
00:10:21.120 | It's not the most contagious one.
00:10:25.680 | Measles, for example, it's, I think,
00:10:28.880 | 15 and up, so it's, you know,
00:10:33.040 | but it's definitely, definitely more contagious
00:10:38.040 | than the seasonal flu, than the current coronavirus
00:10:43.560 | or SARS, for that matter.
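[Illustrative note: the exponential-growth point above can be sketched numerically. This is a toy model, not an epidemiological forecast. It assumes one initial case, a fixed R naught, no immunity, and no interventions; the function names are made up for this example. The R naught values are the rough figures quoted in the conversation.]

```python
# Toy model: each infected person infects exactly r0 others per generation.
# Assumptions (not from the conversation): one initial case, constant r0,
# fully susceptible population, no interventions.

def new_cases_in_generation(r0: float, generation: int) -> float:
    """New infections in a given generation, starting from one case at generation 0."""
    return r0 ** generation


def cumulative_cases(r0: float, generations: int) -> float:
    """Total infections from generation 0 through `generations`, inclusive."""
    return sum(r0 ** g for g in range(generations + 1))


if __name__ == "__main__":
    # Rough R naught values mentioned in the conversation.
    scenarios = [
        ("current coronavirus, low estimate", 1.5),
        ("current coronavirus, high estimate", 3.0),
        ("smallpox, low estimate", 5.0),
        ("smallpox, high estimate", 7.0),
    ]
    for label, r0 in scenarios:
        cases = new_cases_in_generation(r0, 10)
        print(f"{label}: R0 = {r0}, about {cases:,.0f} new cases in generation 10")
```

[Under these assumptions, ten generations at R naught 1.5 give roughly 58 new cases, while R naught 5 gives nearly 10 million, which is why a gap between 1.5 and 5 is "a very big difference."]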
00:10:47.600 | - What makes the virus more contagious?
00:10:52.320 | Well, I'm sure there's a lot of variables
00:10:54.080 | that come into play, but is it that whole discussion
00:10:57.680 | of aerosol and, like, the size of droplets,
00:11:00.880 | if it's airborne, or is there some other stuff
00:11:03.280 | that's more biology-centered?
00:11:05.080 | - I mean, there are a lot of components,
00:11:06.640 | and there are biological components
00:11:09.760 | that are also, you know, social components.
00:11:13.320 | The ability of the virus to, you know,
00:11:17.880 | so the ways in which the virus is spread is definitely one.
00:11:22.240 | The ability of the virus to stay on the surfaces.
00:11:27.920 | To survive.
00:11:29.320 | The ability of the virus to replicate fast,
00:11:34.040 | or so, you know--
00:11:34.880 | - Or once it's in the cell, or whatever.
00:11:37.520 | - Once it's inside the host.
00:11:39.960 | And, interestingly enough, something that
00:11:42.680 | I think we didn't pay that much attention to
00:11:48.120 | is the incubation period, where, you know,
00:11:52.400 | hosts are asymptomatic.
00:11:53.760 | And now it turns out that another thing
00:11:55.920 | that we, one really needs to take into account,
00:11:59.840 | the percentage of the asymptomatic population.
00:12:05.680 | Because those people still shed this virus,
00:12:09.920 | and still are, you know, they still are contagious.
00:12:13.600 | - I saw there's an, the Iceland study,
00:12:15.560 | which I think is probably the most impressive size-wise,
00:12:18.480 | shows 50% asymptomatic, this virus.
00:12:23.600 | I also recently learned the swine flu
00:12:27.720 | is like, just the number of people who got infected
00:12:33.720 | was in the billions.
00:12:36.160 | It was some crazy number.
00:12:37.520 | It was like, it was like,
00:12:39.260 | like 20% of the, 30% of the population,
00:12:43.200 | something crazy like that.
00:12:44.320 | So the lucky thing there is the fatality rate is low.
00:12:48.820 | But the fact that a virus can just take over
00:12:52.320 | an entire population so quickly, it's terrifying.
00:12:56.920 | - I think, I mean, this is, you know,
00:13:00.100 | that's perhaps my favorite example of a butterfly effect.
00:13:03.620 | Because it's really, I mean,
00:13:05.840 | it's even tinier than a butterfly.
00:13:09.540 | And look at, you know, and with, you know,
00:13:11.720 | if you think about it, right,
00:13:13.120 | so it used to be in those bat species.
00:13:19.440 | And perhaps because of, you know,
00:13:23.280 | a couple of small changes in the viral genome,
00:13:28.280 | it first had, you know, become capable
00:13:32.600 | of jumping from bats to human.
00:13:34.840 | And then it became capable of jumping
00:13:37.480 | from human to human, right?
00:13:39.440 | So this is, I mean, it's not even the size of a virus.
00:13:42.720 | It's the size of several, you know,
00:13:45.960 | several atoms or a few atoms.
00:13:50.640 | And all of a sudden this change has such a major impact.
00:13:55.640 | - So is that a mutation like on a single virus?
00:14:01.560 | Is that like, so if we talk about those,
00:14:04.640 | the flap of a butterfly wing, like what's the first flap?
00:14:09.360 | - Well, I think this is the mutations
00:14:12.880 | that made this virus capable of jumping
00:14:17.880 | from bat species to human.
00:14:20.400 | Of course, there's, you know,
00:14:22.600 | the scientists are still trying to find,
00:14:24.720 | I mean, they're still even trying to find
00:14:27.080 | who was the first infected, right?
00:14:29.680 | The patient zero.
00:14:31.280 | - The first human.
00:14:32.360 | - The first human infected, right?
00:14:34.960 | I mean, the fact that there are coronaviruses,
00:14:38.200 | different strains of coronaviruses
00:14:40.520 | in various bat species, I mean, we know that.
00:14:43.520 | So we, you know, virologists observe them,
00:14:47.400 | they study them, they look at their genomic sequences.
00:14:51.720 | They're trying, of course, to understand
00:14:54.360 | what makes these viruses jump from bats to human.
00:14:59.360 | There was, you know, similar to that,
00:15:04.160 | and, you know, in influenza,
00:15:05.880 | there was, I think a few years ago,
00:15:07.560 | there was this, you know, interesting story
00:15:12.560 | where several groups of scientists
00:15:17.360 | studying influenza virus, essentially, you know,
00:15:21.840 | made experiments to show that this virus
00:15:25.760 | can jump from one species to another, you know,
00:15:30.680 | by changing, I think, just a couple of residues.
00:15:35.160 | And, of course, it was very controversial.
00:15:39.000 | I think there was a moratorium on this study for a while,
00:15:43.600 | but then the study was released, it was published.
00:15:46.600 | - So why was there a moratorium?
00:15:49.920 | 'Cause it shows through engineering it,
00:15:53.000 | through modifying it, you can make it jump.
00:15:56.160 | - Yes, yes.
00:15:58.560 | I personally think it is important to study this.
00:16:02.720 | I mean, we should be informed,
00:16:06.000 | we should try to understand as much as possible
00:16:09.200 | in order to prevent it.
00:16:10.440 | - But, so then, the engineering aspect there is,
00:16:15.100 | can't you then just start searching,
00:16:18.440 | because there's so many strains of viruses out there,
00:16:21.480 | can't you just search for the ones in bats
00:16:24.320 | that are the deadliest from the virologist's perspective,
00:16:29.080 | and then just try to engineer,
00:16:32.160 | try to see how to, but see, there's a nice aspect to it.
00:16:37.160 | The really nice thing about engineering viruses,
00:16:41.680 | it has the same problem as nuclear weapons,
00:16:44.240 | is it's hard for it to not lead to mutual self-destruction.
00:16:49.240 | So you can't control a virus,
00:16:51.360 | it can't be used as a weapon, right?
00:16:53.800 | - Yeah, that's why, you know, in the beginning I said,
00:16:56.240 | you know, I'm hopeful, because there definitely
00:17:01.760 | are regulations to be, needed to be introduced.
00:17:05.840 | And, I mean, as the scientific society is,
00:17:10.840 | we are in charge of, you know, making the right actions,
00:17:16.040 | making the right decisions.
00:17:18.960 | But I think we will benefit tremendously
00:17:24.200 | by understanding the mechanisms
00:17:28.560 | by which the virus can jump,
00:17:31.520 | by which the virus can become more, you know,
00:17:35.960 | more dangerous to humans,
00:17:40.120 | because all these answers would, you know,
00:17:45.520 | eventually lead to designing better vaccines,
00:17:48.120 | hopefully universal vaccines, right?
00:17:50.560 | And that would be a triumph of the, of science.
00:17:57.120 | - So what's the universal vaccine?
00:17:58.600 | So is that something that, how universal is universal?
00:18:02.200 | - Well, I mean, you know, so--
00:18:03.920 | - What's the dream, I guess,
00:18:04.960 | 'cause you kind of mentioned the dream of this.
00:18:07.040 | - I would be extremely happy if, you know,
00:18:11.120 | we designed the vaccine that is able,
00:18:14.800 | I mean, I'll give you an example, right?
00:18:16.680 | So every year we do a seasonal flu shot.
00:18:21.520 | The reason we do it is because, you know,
00:18:24.200 | we are in the arms race, you know,
00:18:26.440 | our vaccines are in the arms race
00:18:28.240 | with constantly changing virus, right?
00:18:32.400 | Now, if the next pandemic, influenza pandemic will occur,
00:18:38.320 | most likely this vaccine would not save us, right?
00:18:44.160 | Although it's, you know, it's the same virus,
00:18:48.320 | might be different strain.
00:18:52.480 | So if we're able to essentially design a vaccine
00:18:57.280 | against, you know, influenza A virus,
00:19:01.000 | no matter what's the strain,
00:19:02.360 | no matter which species did it jump from,
00:19:07.360 | that would be, I think that would be a huge,
00:19:10.840 | huge progress and advancement.
00:19:14.000 | - You mentioned smallpox until the '70s
00:19:17.160 | might've been something that you would be worried
00:19:19.240 | the most about.
00:19:20.680 | What about these days?
00:19:22.600 | Well, we're sitting here in the middle of a COVID-19 pandemic
00:19:27.600 | but these days, nevertheless,
00:19:30.920 | what is your biggest worry virus-wise?
00:19:33.460 | What are you keeping your eye out on?
00:19:36.640 | - It looks like, and you know,
00:19:40.880 | based on the past several years
00:19:43.800 | of the new viruses emerging,
00:19:48.240 | I think we're still dealing with
00:19:53.240 | different types of influenza.
00:19:55.400 | I mean, it's also the H7N9 avian flu
00:20:00.400 | that was, that emerged, I think,
00:20:05.320 | a couple of years ago in China.
00:20:07.380 | I think the mortality rate was incredible.
00:20:13.360 | I mean, it was, you know, I think above 30%,
00:20:17.880 | you know, so this is huge.
00:20:20.400 | I mean, luckily for us, this strain was not pandemic.
00:20:25.400 | All right, so it was jumping from birds to human
00:20:30.460 | but I don't think it was actually transmittable
00:20:33.640 | between the humans.
00:20:35.200 | And, you know, this is actually a very interesting question
00:20:38.680 | which scientists try to understand, right?
00:20:42.620 | So the balance, the delicate balance
00:20:44.620 | between the virus being very contagious, right,
00:20:48.240 | so efficient in spreading,
00:20:51.760 | and virus to be very pathogenic,
00:20:55.020 | you know, causing, you know, harms, you know,
00:21:01.400 | and deaths to their hosts.
00:21:05.300 | So it looks like that the more pathogenic the virus is,
00:21:10.300 | the less contagious it is.
00:21:14.520 | - Is that a property of biology or what is--
00:21:17.960 | - I don't have an answer to that.
00:21:19.580 | And I think this is still an open question.
00:21:22.900 | But, you know, if you look at, you know,
00:21:25.840 | with the coronavirus, for example,
00:21:28.500 | if you look at, you know, the deadlier relative, MERS.
00:21:33.500 | MERS was never a pandemic virus.
00:21:38.060 | - Right. - But the, you know,
00:21:42.280 | again, the mortality rate from MERS is far above,
00:21:46.880 | you know, I think 20 or 30%, so.
00:21:49.760 | - So whatever is making this all happen
00:21:55.320 | doesn't want us dead, 'cause it's balancing out nicely.
00:21:59.860 | I mean, how do you explain that we're not dead yet?
00:22:03.520 | Like, 'cause there's so many viruses
00:22:08.800 | and they're so good at what they do.
00:22:11.440 | Why do they keep us alive?
00:22:13.400 | - I mean, we also have, you know, a lot of protection, right?
00:22:18.760 | So we do the immune system.
00:22:21.160 | And so, I mean, we do have, you know,
00:22:26.160 | ways to fight against those viruses.
00:22:31.680 | And I think with the, now we're much better equipped, right?
00:22:36.000 | So with the discoveries of vaccines and, you know,
00:22:39.880 | there are vaccines against the viruses
00:22:44.120 | that maybe 200 years ago would wipe us out completely.
00:22:49.120 | But because of these vaccines, we are actually,
00:22:53.000 | we are capable of eradicating pretty much fully,
00:22:56.360 | as is the case with smallpox.
00:22:58.840 | - So if we could, can we go to the basics a little bit
00:23:02.000 | of the biology of the virus?
00:23:04.960 | How does a virus infect the body?
00:23:08.060 | - So I think there are some key steps
00:23:11.780 | that the virus needs to perform.
00:23:13.740 | And of course, the first one, the viral particle
00:23:18.540 | needs to get attached to the host cell.
00:23:22.340 | In the case of coronavirus, there is a lot of evidence
00:23:26.300 | that it actually interacts in the same way
00:23:29.700 | as the SARS coronavirus.
00:23:34.300 | So it gets attached to the ACE2 human receptor.
00:23:39.300 | And so there is, I mean, as we speak,
00:23:42.260 | there is a growing number of papers suggesting it.
00:23:45.740 | Moreover, most recent, I think most recent results
00:23:51.140 | suggest that this virus attaches more efficiently
00:23:56.140 | to this human receptor than SARS.
00:24:00.660 | - So just to sort of back off,
00:24:02.900 | so there is a family of viruses, the coronaviruses,
00:24:07.020 | and SARS, whatever the heck, forgot,
00:24:09.420 | this is, whatever that stands for.
00:24:12.760 | - So SARS actually stands for the disease that you get,
00:24:17.260 | is the syndrome of acute respiratory--
00:24:19.460 | - Respiratory syndrome.
00:24:21.020 | So SARS is the first strain, and then there's MERS.
00:24:25.180 | - MERS, and there is-- - Also that family.
00:24:27.340 | - And there is, yes, people, scientists actually know
00:24:31.380 | more than three strains.
00:24:32.520 | I mean, so there is the MHV strain,
00:24:36.300 | which is considered to be a canonical model,
00:24:43.540 | disease model in mice.
00:24:46.200 | And so there is a lot of work done on this virus
00:24:50.620 | because it's-- - Interesting.
00:24:52.460 | But it hasn't jumped to humans yet?
00:24:53.980 | - No, no, it's-- - Oh, interesting.
00:24:55.260 | - Yes. - That's fascinating.
00:24:56.700 | So, and then you mentioned ACE2.
00:25:01.220 | So when you say attach, proteins are involved
00:25:06.040 | on both sides. - Yes, yes.
00:25:07.180 | So we have this infamous spike protein
00:25:11.540 | on the surface of the virion particle,
00:25:15.260 | and it does look like a spike,
00:25:16.880 | and I mean, that's essentially because of this protein,
00:25:20.860 | we call the coronavirus coronavirus,
00:25:22.780 | so what makes corona on top of the surface.
00:25:28.100 | So this protein, it actually, it acts,
00:25:32.640 | so it doesn't act alone, it actually,
00:25:35.940 | it makes three copies, and it makes so-called trimer.
00:25:40.940 | So this trimer is essentially a functional unit,
00:25:45.700 | a single functional unit that starts interacting
00:25:50.380 | with the ACE2 receptor.
00:25:54.740 | So this is, again, another protein
00:25:56.780 | that now sits on the surface of a human cell,
00:26:01.100 | or a host cell, I would say,
00:26:03.100 | and that's essentially, in that way,
00:26:08.540 | the virus anchors itself to the host cell,
00:26:13.440 | because then it needs to actually,
00:26:16.900 | it needs to get inside, you know,
00:26:19.420 | it fuses its membrane with the host membrane,
00:26:24.060 | it releases the key components,
00:26:27.740 | it releases its, you know, RNA,
00:26:32.500 | and then essentially hijacks the machinery of the cell,
00:26:37.500 | because none of the viruses that we know of have ribosome,
00:26:42.900 | the machinery that allows us to print out proteins.
00:26:50.860 | So in order to print out proteins
00:26:53.600 | that are necessary for functioning of this virus,
00:26:55.940 | it actually needs to hijack the host ribosomes.
00:27:00.340 | - So a virus is an RNA wrapped in a bunch of proteins,
00:27:04.380 | one of which is this functional mechanism
00:27:06.540 | of a spike protein that does the attachment.
00:27:08.740 | - So yeah, so if you look at this virus,
00:27:12.340 | there are several basic components, right?
00:27:15.340 | So we start with the spike protein.
00:27:18.260 | This is not the only surface protein,
00:27:20.540 | the protein that lives on the surface of the viral particle.
00:27:24.340 | There is also perhaps the protein
00:27:27.340 | with the highest number of copies is the membrane protein.
00:27:33.820 | So it's essentially, it forms the capsid,
00:27:38.300 | sorry, the envelope of the protein of the viral particle,
00:27:43.300 | and essentially, you know,
00:27:48.260 | helps to maintain a certain curvature,
00:27:52.260 | helps to make a certain curvature.
00:27:54.700 | Then there is another protein called envelope protein
00:27:59.660 | or E protein, and it actually occurs in far less quantities.
00:28:04.660 | And still there is an ongoing research,
00:28:09.380 | what exactly does this protein do?
00:28:13.820 | So these are sort of the three major surface proteins
00:28:17.340 | that make the viral envelope.
00:28:22.220 | And when we go inside,
00:28:24.780 | then we have another structural protein
00:28:28.060 | called nucleoprotein.
00:28:29.780 | And the purpose of this protein
00:28:32.260 | is to protect the viral RNA.
00:28:34.860 | So it actually binds to the viral RNA, creates a capsid.
00:28:38.220 | And so the rest of the viral information
00:28:43.580 | is inside of this RNA.
00:28:47.060 | And, you know, if you compare the amount of the genes
00:28:52.060 | or proteins that are made of these genes,
00:28:58.000 | it's significantly higher than of influenza virus,
00:29:03.860 | for example.
00:29:05.020 | Influenza virus has, I think, around eight or nine proteins
00:29:08.740 | where this one has at least 29.
00:29:13.180 | - Wow, that has to do with the length of the RNA strand.
00:29:16.820 | I mean, what--
00:29:17.660 | - So, I mean, so it affects the length of the RNA strand.
00:29:21.220 | Right, so because you essentially need to have
00:29:24.940 | sort of the minimum amount of information
00:29:27.340 | to encode those genes.
00:29:29.460 | - How many proteins did you say?
00:29:30.780 | Say again. - 29.
00:29:31.820 | - 29 proteins.
00:29:34.100 | - Yes, so this is, you know,
00:29:36.780 | something definitely interesting
00:29:39.420 | because, you know, believe it or not,
00:29:42.140 | we've been studying, you know, coronaviruses
00:29:45.500 | for over two decades.
00:29:47.340 | We've yet to uncover all functionalities of its proteins.
00:29:52.340 | - Could we maybe take a small tangent
00:29:54.520 | and can you say how one would try to figure out
00:29:59.520 | what a function of a particular protein is?
00:30:02.140 | So you've mentioned people are still trying to figure out
00:30:06.880 | what the function of the envelope protein might be
00:30:09.300 | or what's the process?
00:30:11.900 | - So this is where the research
00:30:15.300 | that computational scientists do might be of help
00:30:19.340 | because, you know, in the past several decades,
00:30:24.180 | we actually have collected pretty decent amount of knowledge
00:30:28.940 | about different proteins in different viruses.
00:30:33.360 | So what we can actually try to do,
00:30:37.820 | and this is sort of, could be sort of our first lead
00:30:42.820 | to a possible function, is to see whether those,
00:30:46.960 | you know, say we have this genome of the coronavirus,
00:30:50.660 | of the novel coronavirus,
00:30:52.900 | and we identify the potential proteins.
00:30:57.340 | Then in order to infer the function,
00:30:58.980 | what we can do, we can actually see
00:31:01.100 | whether those proteins are similar
00:31:04.340 | to those ones that we already know.
00:31:07.760 | Okay?
00:31:08.960 | In such a way, we can, you know, for example,
00:31:11.880 | clearly identify, you know, some critical components
00:31:15.640 | like RNA polymerase or different types of proteases.
00:31:19.520 | These are the proteins that essentially
00:31:22.000 | clip the protein sequences.
00:31:26.220 | And so this works in many cases.
00:31:31.640 | However, in some cases, you have truly novel proteins,
00:31:36.520 | and this is a much more difficult task.
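[Illustrative note: the idea described above, inferring a new protein's function from its similarity to proteins that are already annotated, can be sketched as follows. This is a minimal toy version, not the method used by Korkin's group. Real pipelines use alignment tools and curated databases; here similarity is approximated by shared 3-letter fragments (k-mers), and the sequences, labels, and threshold are all invented for illustration.]

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All overlapping fragments of length k in an amino-acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def infer_function(
    query: str,
    annotated: Dict[str, str],
    threshold: float = 0.3,  # arbitrary cutoff chosen for this toy example
) -> Tuple[Optional[str], float]:
    """Return the label of the most similar annotated protein, or None
    if nothing is similar enough (a "truly novel" protein)."""
    best_label: Optional[str] = None
    best_score = 0.0
    for label, seq in annotated.items():
        score = similarity(query, seq)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    # Made-up sequences standing in for a database of known viral proteins.
    known = {
        "protease (toy)": "MKVLAAGIVGLLLAQ",
        "polymerase (toy)": "MSDNGPQNQRNAPRIT",
    }
    print(infer_function("MKVLAAGIVGALLAQ", known))  # close to the toy protease
    print(infer_function("WWWWWWWW", known))         # nothing similar: novel
```

[Any label returned this way is only a prediction, a first lead that still has to be validated experimentally, as discussed in the next part of the conversation.]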
00:31:41.040 | - Now, as a small pause, when you say similar,
00:31:45.000 | like what if some parts are different
00:31:46.800 | and some parts are similar?
00:31:48.800 | Like how do you disentangle that?
00:31:51.340 | - You know, it's a big question.
00:31:53.960 | Of course, you know, what bioinformatics does,
00:31:57.500 | it does predictions, right?
00:32:00.020 | So those predictions,
00:32:01.760 | they have to be validated by experiments.
00:32:05.560 | - Functional or structural predictions?
00:32:08.160 | - Both.
00:32:09.000 | I mean, we do structural predictions,
00:32:10.840 | we do functional predictions,
00:32:12.100 | we do interactions predictions.
00:32:14.800 | - Oh, so this is interesting.
00:32:15.640 | So you just generate a lot of predictions,
00:32:18.660 | like reasonable predictions based on structure
00:32:21.160 | and function, interaction, like you said,
00:32:23.600 | and then here you go.
00:32:25.920 | That's the power of bioinformatics
00:32:27.640 | is data grounded, good predictions of what should happen.
00:32:33.080 | - So in a way, I see it,
00:32:36.080 | we're helping experimental scientists
00:32:39.360 | to streamline their discovery process.
00:32:42.120 | - Yeah.
00:32:43.200 | And the experimental scientists,
00:32:45.000 | is that what a virologist is?
00:32:47.360 | - So yeah, virology is one of the experimental sciences
00:32:51.240 | that focus on viruses.
00:32:53.760 | They often work with other experimental scientists,
00:32:58.000 | for example, the molecular imaging scientists, right?
00:33:02.160 | So the viruses often can be viewed
00:33:07.160 | and reconstructed through electron microscopy techniques.
00:33:12.240 | So, but these are specialists
00:33:14.280 | that are not necessarily virologists.
00:33:17.000 | They work with small particles,
00:33:20.000 | whether it's viruses or it's an organelle
00:33:27.200 | of a human cell,
00:33:30.280 | whether it's a complex molecular machinery.
00:33:33.920 | So the techniques that are used are very similar
00:33:37.880 | in sort of in their essence.
00:33:42.360 | And so yeah, so typically,
00:33:44.320 | and we see it now,
00:33:47.320 | the research on,
00:33:52.080 | that is emerging and that is needed
00:33:58.320 | often involves the collaborations between virologists,
00:34:03.320 | you know, biochemists,
00:34:06.960 | people from pharmaceutical sciences,
00:34:14.160 | computational sciences.
00:34:16.240 | So we have to work together.
00:34:19.400 | - So from my perspective, just to step back,
00:34:21.440 | sometimes I look at this stuff,
00:34:23.640 | just how much we understand about RNA and DNA,
00:34:27.400 | how much we understand about protein,
00:34:28.680 | like your work, the amount of proteins
00:34:32.280 | that you're exploring,
00:34:33.800 | is it surprising to you that we were able,
00:34:38.000 | we descendants of apes,
00:34:39.600 | were able to figure all of this out?
00:34:41.720 | Like how, so you're a computer scientist.
00:34:46.520 | So for me, from a computer science perspective,
00:34:49.520 | I know how to write a Python program, things are clear,
00:34:52.120 | but biology is a giant mess,
00:34:55.480 | it feels like to me, from an outsider's perspective.
00:34:58.600 | Is how surprising is it, amazing is it,
00:35:01.680 | that we were able to figure this stuff out?
00:35:04.600 | - You know, if you look at the,
00:35:06.440 | you know, how computational science
00:35:09.480 | and computer science was evolving, right?
00:35:12.680 | I think it was just a matter of time
00:35:14.520 | that we would approach biology.
00:35:16.920 | So we started from, you know,
00:35:19.240 | applications to much more fundamental systems,
00:35:23.840 | physics, you know, and now we are,
00:35:27.680 | or, you know, small chemical compounds, right?
00:35:32.520 | So now we are approaching
00:35:36.200 | the more complex biological systems.
00:35:39.760 | And I think it's a natural evolution
00:35:43.880 | of, you know, of the computer science, of mathematics.
00:35:48.440 | - So sure, that's the computer science side,
00:35:50.120 | I just meant even in higher levels.
00:35:52.440 | So that to me is surprising,
00:35:54.040 | that computer science can offer help in this messy world.
00:35:57.520 | But it just means it's incredible
00:35:59.360 | that the biologists and the chemists
00:36:01.920 | can figure all this out.
00:36:03.120 | Or does that just sound ridiculous to you,
00:36:04.640 | that of course they would.
00:36:07.720 | It just seems like a very complicated set of problems,
00:36:10.280 | like the variety of the kinds of things
00:36:13.640 | that could be produced in the body.
00:36:16.120 | Just like you said, 29 pro, I mean,
00:36:19.240 | just getting a hang of it so quickly,
00:36:24.240 | it just seems impossible to me.
00:36:27.160 | - I agree, I mean, and I have to say,
00:36:29.640 | we are, you know, in the very, very
00:36:32.440 | beginning of this journey.
00:36:33.760 | I mean, we've yet to, I mean,
00:36:38.080 | we've yet to comprehend,
00:36:39.880 | not even try to understand and figure out all the details,
00:36:44.880 | but we've yet to comprehend the complexity
00:36:49.400 | of the cell.
00:36:51.160 | - We know that neuroscience is not even
00:36:55.800 | at the beginning of understanding the human mind.
00:36:58.640 | So where's biology sit in terms of understanding
00:37:04.200 | the function, deeply understanding
00:37:07.160 | the function of viruses and cells?
00:37:10.280 | So sometimes it's easy to say,
00:37:12.800 | when you talk about function,
00:37:14.360 | what you really refer to is perhaps
00:37:16.480 | not a deep understanding,
00:37:18.520 | but more of a understanding sufficient
00:37:21.600 | to be able to mess with it using an antiviral,
00:37:25.000 | like mess with it chemically
00:37:26.880 | to prevent some of its function.
00:37:29.680 | Or do you understand the function?
00:37:31.400 | - Well, I think-- - Deeply.
00:37:32.720 | - I think we are much farther in terms of understanding
00:37:35.680 | of the complex genetic disorders, such as cancer,
00:37:40.680 | where you have layers of complexity.
00:37:42.680 | And we, you know, as in my laboratory,
00:37:45.680 | we're trying to contribute to that research,
00:37:47.680 | but we're also, you know, we're overwhelmed
00:37:50.200 | with how many different layers of complexity,
00:37:53.240 | different layers of mechanisms
00:37:56.560 | that can be hijacked by cancer simultaneously.
00:38:00.200 | And so, you know, I think biology in the past 20 years,
00:38:06.240 | again, from the perspective of the outsider,
00:38:11.840 | 'cause I'm not a biologist,
00:38:13.560 | but I think it has advanced tremendously.
00:38:18.560 | And one thing that where computational scientists
00:38:23.280 | and data scientists are now becoming very,
00:38:28.280 | very helpful is in the fact,
00:38:33.920 | it's coming from the fact that we are now able
00:38:37.280 | to generate a lot of information about the cell.
00:38:43.280 | Whether it's next generation sequencing or transcriptomics,
00:38:48.200 | whether it's live imaging information,
00:38:51.480 | whether it is, you know, complex interactions
00:38:56.480 | between proteins or between proteins
00:38:59.080 | and small molecules such as drugs.
00:39:02.080 | We are becoming very efficient
00:39:05.520 | in generating this information.
00:39:07.920 | And now the next step is to become equally efficient
00:39:12.280 | in processing this information
00:39:16.440 | and extracting the key knowledge from that.
00:39:19.800 | - That could then be validated with experiment.
00:39:23.320 | - Yes. - I'm back.
00:39:24.160 | - Yes. - So maybe then going
00:39:25.920 | all the way back, we were talking,
00:39:27.160 | you said the first step is seeing
00:39:30.600 | if we can match the new proteins you found in the virus
00:39:34.080 | against something we've seen before
00:39:35.480 | to figure out its function.
00:39:37.600 | And then you also mentioned that,
00:39:39.400 | but there could be cases where it's a totally new protein.
00:39:42.640 | Is there something bioinformatics can offer
00:39:45.320 | when it's a totally new protein?
00:39:47.120 | - This is where many of the methods,
00:39:50.440 | and you're probably aware of the case of machine learning,
00:39:54.400 | many of these methods rely on the previous knowledge.
00:39:59.400 | - Right. - Right?
00:40:00.560 | So things that where we try to do from scratch
00:40:05.320 | are incredibly difficult.
00:40:08.360 | Something that we call ab initio.
00:40:10.480 | And this is, I mean, it's not just the function.
00:40:12.920 | I mean, we've yet to have a robust method
00:40:16.960 | to predict the structures of these proteins in ab initio
00:40:21.160 | by not using any templates
00:40:27.600 | of other related proteins.
00:40:32.040 | - So protein is a chain of amino acids.
00:40:36.160 | - It's residues.
00:40:37.640 | - Residues, yeah.
00:40:39.120 | And then somehow, magically, maybe you can tell me,
00:40:44.720 | they seem to fold in incredibly weird
00:40:47.520 | and complicated 3D shapes.
00:40:49.280 | - Yes.
00:40:50.120 | - And that's where actually the idea of protein folding,
00:40:57.080 | or just not the idea, but the problem of figuring out
00:40:59.640 | how the-- - Yeah, the concept.
00:41:00.800 | - The concept, yeah, how they fold into those weird shapes
00:41:04.680 | comes in.
00:41:05.520 | So that's another side of computational work.
00:41:09.240 | So can you describe what protein folding
00:41:11.720 | from the computational side is?
00:41:13.680 | And maybe your thoughts on the Folding@home efforts
00:41:16.800 | that a lot of people know that you can use your machine
00:41:19.800 | to do protein folding?
00:41:22.600 | - So yeah, protein folding is one of those
00:41:26.800 | one million dollar prize challenges, right?
00:41:30.680 | So the reason for that is we've yet to understand
00:41:35.360 | precisely how the protein gets folded so efficiently,
00:41:40.360 | to the point that in many cases where you try to unfold it
00:41:47.560 | due to the high temperature, it actually folds back
00:41:51.920 | into its original state.
00:41:54.040 | All right, so we know a lot about the mechanisms, right?
00:41:59.320 | But putting those mechanisms together and making sense,
00:42:04.320 | it's computationally a very expensive task.
00:42:11.280 | - In general, do proteins fold,
00:42:14.240 | can they fold in an arbitrarily large number of ways,
00:42:16.880 | or do they usually fold in a very small number of ways?
00:42:19.360 | - No, it's typically, I mean, we tend to think that
00:42:23.080 | there is a one sort of canonical fold for a protein,
00:42:26.880 | although there are many cases where the proteins,
00:42:30.640 | upon destabilization,
00:42:32.400 | it can be folded into a different conformation.
00:42:35.720 | And this is especially true when you look at
00:42:40.000 | sort of proteins that include
00:42:44.360 | more than one structural unit.
00:42:46.160 | So those structural units, we call them protein domains.
00:42:49.240 | Essentially, a protein domain is a single unit
00:42:53.720 | that typically is evolutionarily preserved,
00:42:56.960 | that typically carries out a single function,
00:42:59.640 | and typically has a very distinct fold, right?
00:43:04.640 | The structure, 3D structure organization.
00:43:07.240 | But turns out that if you look at human,
00:43:09.760 | an average protein in a human cell
00:43:12.640 | would have about two or three such subunits.
00:43:19.360 | And how they are trying to fold into the sort of,
00:43:24.360 | next level fold, right?
00:43:30.440 | - So within subunit, there's folding,
00:43:32.680 | and then they fold into the larger 3D structure, right?
00:43:38.880 | And all of that, there's some understanding
00:43:41.080 | of the basic mechanisms,
00:43:42.440 | but not to put together to be able to fold it.
00:43:44.960 | - We're still, I mean, we're still struggling.
00:43:47.360 | I mean, we're getting pretty good
00:43:49.480 | about folding relatively small proteins
00:43:52.920 | up to 100 residues.
00:43:55.400 | I mean, but we're still far away
00:43:59.240 | from folding larger proteins.
00:44:02.240 | And some of them are notoriously difficult.
00:44:06.120 | For example, transmembrane proteins,
00:44:07.920 | proteins that sit in the membranes of the cell.
00:44:12.920 | They're incredibly important,
00:44:16.360 | but they are incredibly difficult to solve.
00:44:19.640 | - And so basically, there's a lot of degrees of freedom,
00:44:22.960 | how it folds.
00:44:24.360 | And so it's a combinatorial problem,
00:44:26.120 | or it just explodes.
00:44:27.360 | There's so many dimensions.
00:44:28.960 | - Well, it is a combinatorial problem,
00:44:31.280 | but it doesn't mean that we cannot approach it
00:44:34.240 | from the non-con, not from the brute force approach.
00:44:39.240 | And so the machine learning approaches,
00:44:43.840 | you know, have emerged that try to tackle it.
00:44:47.440 | - So Folding@home,
00:44:49.800 | I don't know how familiar you are with it,
00:44:51.560 | but is that using machine learning,
00:44:54.000 | or is it more brute force?
00:44:55.120 | - No, so Folding@home, it was originally,
00:44:57.400 | and I remember, I was, I mean, it was long time ago.
00:45:01.000 | I was a postdoc, and we learned about this,
00:45:05.520 | you know, this game,
00:45:07.360 | 'cause it was originally designed as the game.
00:45:10.960 | And we, you know, I took a look at it,
00:45:15.200 | and it's interesting because it's really,
00:45:18.960 | you know, it's very transparent, very intuitive.
00:45:22.520 | So, and from what I heard,
00:45:25.280 | I've yet to introduce it to my son,
00:45:27.440 | but, you know, kids are actually getting very good
00:45:30.440 | at folding the proteins.
00:45:33.120 | And it, you know, it came to me as the,
00:45:39.680 | not as a surprise, but actually as the sort of manifest
00:45:44.560 | of, you know, our capacity to do this kind of,
00:45:49.560 | to solve this kind of problems,
00:45:52.680 | when a paper was published in one of these top journals
00:45:57.680 | with the co-authors being the actual players of this game.
00:46:07.480 | So, and what happened is, was that they managed
00:46:12.480 | to get better structures than the scientists themselves.
00:46:17.080 | So, so that, you know, that was very, I mean,
00:46:23.660 | it was kind of profound, you know, revelation that problems
00:46:28.680 | that are so challenging for a computational science,
00:46:35.680 | maybe not that challenging for a human brain.
00:46:38.200 | - Well, that's a really good,
00:46:40.360 | that's a hopeful message always,
00:46:42.000 | when there's the proof of existence,
00:46:47.000 | the existence proof that it's possible.
00:46:51.360 | That's really interesting.
00:46:52.440 | But it seems, what are the best ways
00:46:56.000 | to do protein folding now?
00:46:58.280 | So, if you look at what DeepMind does with AlphaFold.
00:47:02.760 | - AlphaFold, yes.
00:47:03.960 | - So, they kind of, that's a learning approach.
00:47:06.600 | What's your sense?
00:47:07.920 | I mean, your background is in machine learning,
00:47:10.120 | but is this a learnable problem?
00:47:13.040 | Is this still a brute force?
00:47:14.440 | Are we in the Garry Kasparov, Deep Blue days?
00:47:19.280 | Or are we in the AlphaGo playing the game of Go days
00:47:23.160 | of folding?
00:47:24.520 | - Well, I think we are advancing towards this direction.
00:47:28.920 | I mean, if you look, so there is sort of Olympic game
00:47:32.480 | for protein folders called CASP.
00:47:35.680 | And it's essentially, it's a competition
00:47:39.520 | where different teams are given exactly
00:47:43.920 | the same protein sequences,
00:47:47.400 | and they try to predict their structures.
00:47:49.800 | And of course, there is different sort of subtasks,
00:47:55.640 | but in the recent competition,
00:47:58.400 | AlphaFold was among the top performing teams,
00:48:02.000 | if not the top performing team.
00:48:05.000 | So, there is definitely a benefit from the data
00:48:10.000 | that have been generated in the past several decades,
00:48:15.520 | the structural data.
00:48:17.240 | And certainly, we are now at the capacity
00:48:22.240 | to summarize this data, to generalize this data,
00:48:28.280 | and to use those principles,
00:48:31.360 | in order to predict protein structures.
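As a rough illustration of how a predicted structure can be compared with an experimentally solved one, the Python sketch below computes the root-mean-square deviation (RMSD) between two coordinate lists. This is a simple stand-in, not CASP's actual scoring, which relies on other measures such as GDT_TS. The sketch also assumes the two structures are already superimposed and list the same atoms in the same order.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def rmsd(predicted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation between matching atoms, in the input units.

    Assumes both structures are already superimposed (no fitting is done here).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))

if __name__ == "__main__":
    # Toy coordinates, invented for the example.
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ref = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    print(round(rmsd(pred, ref), 3))
```

A lower value means the prediction sits closer to the reference; a real evaluation would first superimpose the two structures.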
00:48:34.160 | - That's one of the really cool things here is,
00:48:36.320 | there's, maybe you can comment on it.
00:48:38.600 | There seems to be these open datasets of protein.
00:48:42.640 | How did that?
00:48:43.480 | - Protein Data Bank?
00:48:45.920 | - The Protein Data Bank?
00:48:48.320 | I mean, that's crazy.
00:48:49.560 | Is this a recent thing for just the coronavirus?
00:48:52.440 | Or is this been a--
00:48:53.360 | - It's been for many, many years.
00:48:56.240 | I believe the first Protein Data Bank
00:48:59.200 | was designed on flashcards.
00:49:01.720 | So, yes, it's this,
00:49:07.840 | I mean, this is a great example of the community efforts
00:49:13.120 | of everyone contributing.
00:49:16.520 | 'Cause every time you solve a protein
00:49:20.400 | or a protein complex, this is where you submit it.
00:49:25.400 | And the scientists get access to it,
00:49:30.400 | scientists get to test it.
00:49:33.920 | And we, bioinformaticians, use this information
00:49:38.920 | to make predictions.
00:49:42.720 | - So there's no culture of like hoarding discoveries here.
00:49:47.720 | So that's, I mean, you've released a few,
00:49:52.920 | or a bunch of proteins that were matching
00:49:55.240 | whatever, we'll talk about details a little bit.
00:49:57.360 | But it's kind of amazing that that's,
00:50:02.000 | it's kind of amazing how open the culture here is.
00:50:07.320 | - It is.
00:50:08.200 | And I think this pandemic actually demonstrated
00:50:13.080 | the ability of scientific community
00:50:17.880 | to solve this challenge collaboratively.
00:50:23.840 | And this is, I think, if anything,
00:50:25.960 | it actually moved us to a brand new level
00:50:30.960 | of collaborations of the efficiency
00:50:34.900 | in which people establish new collaborations,
00:50:37.960 | in which people offer their help to each other.
00:50:42.760 | Scientists offer their help to each other.
00:50:44.720 | - And publish results, too.
00:50:45.880 | It's very interesting. - Exactly.
00:50:47.120 | - We're now trying to figure out,
00:50:49.040 | there's a few journals that are trying
00:50:50.520 | to sort of do the very accelerated review cycle,
00:50:54.260 | but so many preprints, so just posting a paper, going out,
00:50:58.560 | I think it's fundamentally changing
00:51:00.600 | the way we think about papers.
00:51:04.040 | - Yes.
00:51:05.040 | I mean, the way we think about knowledge, I would say.
00:51:08.600 | - Knowledge. - Yes.
00:51:09.740 | Because, yes, I completely agree.
00:51:12.440 | I think now it's,
00:51:17.080 | the knowledge is becoming sort of the core value,
00:51:21.600 | not the paper or the journal
00:51:24.800 | where this knowledge is published.
00:51:27.480 | And I think this is, again, we are living
00:51:30.400 | in the times where it becomes really crystallized,
00:51:35.400 | that the idea that the most important value
00:51:41.640 | is in the knowledge.
00:51:43.720 | - So maybe you can comment,
00:51:45.280 | what do you think the future
00:51:46.520 | of that knowledge sharing looks like?
00:51:47.980 | So you have this paper that we'll,
00:51:49.840 | I hope we get a chance to talk about a little bit,
00:51:52.480 | but it has a really nice abstract
00:51:54.560 | and introduction and related,
00:51:56.440 | it has all the usual, I mean,
00:51:58.200 | probably took a long time to put together.
00:52:00.600 | (laughs)
00:52:02.880 | But is that going to remain,
00:52:05.600 | you could have communicated a lot
00:52:08.160 | of fundamental ideas here in a much shorter amount
00:52:11.420 | that's less traditionally acceptable by the journal context?
00:52:16.020 | - So, well, so the first version that we posted,
00:52:21.940 | not even on bioRxiv,
00:52:25.180 | because bioRxiv back then,
00:52:27.580 | it was essentially overwhelmed
00:52:32.500 | with the number of submissions.
00:52:34.260 | So our submission, I think it took five or six days
00:52:38.820 | just for it to be screened and put online.
00:52:43.820 | So we, essentially we put the first pre-print on our website
00:52:49.020 | and it started getting accessed right away.
00:52:55.580 | So, and this original pre-print was
00:53:02.300 | in a much rougher shape than this paper.
00:53:08.460 | But we tried, I mean, we honestly tried
00:53:11.420 | to be as compact as possible
00:53:13.940 | with introducing the information that is necessary
00:53:20.060 | to explain our results.
00:53:26.060 | - So maybe we can dive right in if it's okay.
00:53:29.580 | - Sure.
00:53:30.420 | - So this is a paper called Structural Genomics of SARS-CoV-2.
00:53:34.420 | How do you even pronounce?
00:53:35.820 | - SARS-CoV-2.
00:53:37.020 | - Co-vee-too?
00:53:37.980 | - Yeah.
00:53:39.140 | - By the way, COVID is such a terrible name, but it stuck.
00:53:42.580 | - Yes.
00:53:43.580 | - SARS-CoV-2 indicates evolutionary
00:53:46.540 | conserved functional regions of viral proteins.
00:53:50.380 | So this is looking at all kinds of proteins
00:53:53.380 | that are part of this novel coronavirus
00:53:57.700 | and how they match up against the previous
00:54:00.540 | other kinds of coronaviruses.
00:54:02.580 | I mean, there's a lot of beautiful figures.
00:54:04.220 | I was wondering if you could,
00:54:06.060 | I mean, there's so many questions I could ask here,
00:54:07.660 | but maybe at the, how do you get started doing this paper?
00:54:11.940 | So how do you start to figure out
00:54:13.340 | the 3D structure of a novel virus?
00:54:16.340 | - Yes, so there is actually a little story behind it.
00:54:20.740 | And so the story actually dated back in September of 2019.
00:54:25.740 | And you probably remember that back then
00:54:31.060 | we had another dangerous virus, triple E virus,
00:54:35.820 | Eastern equine encephalitis virus.
00:54:39.460 | And--
00:54:40.820 | - Can you maybe linger on it?
00:54:41.860 | I have to admit I was sadly completely unaware.
00:54:45.980 | - So that was actually a virus outbreak
00:54:49.900 | that happened in New England only.
00:54:52.820 | The danger in this virus was that it actually,
00:54:56.180 | it targeted your brain.
00:54:58.620 | So the word death from this virus,
00:55:04.620 | it was, the main vector was mosquitoes.
00:55:09.620 | And obviously fall time is the time
00:55:15.940 | where you have a lot of them in New England.
00:55:19.660 | And on one hand, people realize
00:55:24.500 | this is actually a very dangerous thing.
00:55:27.060 | So it had an impact on the local economy.
00:55:33.080 | The schools were closed past six o'clock.
00:55:37.700 | No activities outside for the kids
00:55:39.620 | because the kids were suffering quite tremendously
00:55:44.100 | from when infected from this virus.
00:55:48.540 | - How do I not know about this?
00:55:50.100 | Was the universities impacted?
00:55:52.220 | - It was in the news.
00:55:53.500 | I mean, it was not impacted to a high degree
00:55:57.980 | in Boston necessarily, but in the Metro West area.
00:56:02.820 | And actually spread around, I think,
00:56:05.260 | all the way to New Hampshire, Connecticut.
00:56:09.340 | - And you mentioned affecting the brain.
00:56:11.060 | That's one other comment we should make.
00:56:13.620 | So you mentioned ACE2 for the coronavirus.
00:56:18.200 | So these viruses kind of attach to something in the body.
00:56:23.200 | - So it essentially attaches to these proteins
00:56:27.460 | in those cells in the body
00:56:30.820 | where those proteins are expressed,
00:56:32.420 | where they actually have them in abundance.
00:56:35.820 | - So sometimes that could be in the lungs,
00:56:37.620 | that could be in the brain, that could be--
00:56:39.540 | - So I think what they, right now,
00:56:43.780 | from what I read, they have the epithelial cells inside.
00:56:48.780 | So the cells essentially inside the,
00:56:53.340 | cells that are covering the surface.
00:56:58.300 | So inside the nasal surfaces, the throat,
00:57:03.300 | the lung cells, and I believe liver,
00:57:08.820 | a couple of other organs
00:57:10.100 | where they are actually expressed in abundance.
00:57:13.660 | - That's for the ACE2, you said?
00:57:15.260 | - For the ACE2 receptors.
00:57:17.140 | - So okay, so back to the story.
00:57:19.060 | - So yes. - In the fall.
00:57:20.700 | - So now the, these, you know,
00:57:24.860 | the impact of this virus is significant.
00:57:29.540 | However, it's a pretty local problem
00:57:33.460 | to the point that, you know,
00:57:35.460 | this is something that we would call a neglected disease
00:57:39.340 | because it's not big enough to make, you know,
00:57:44.340 | the drug design companies to design a new antiviral
00:57:49.820 | or a new vaccine.
00:57:53.100 | It's not big enough to generate a lot of grants
00:57:58.100 | from the national funding agencies.
00:58:04.700 | So does it mean we cannot do anything about it?
00:58:09.500 | And so what I did is I taught a bioinformatics class
00:58:14.500 | and in Worcester Polytechnic Institute,
00:58:19.100 | and we are very much a problem learning institution.
00:58:24.100 | So I thought that that would be a perfect, you know,
00:58:30.200 | perfect project for the class.
00:58:32.100 | - It's an ongoing case study.
00:58:33.540 | - So I asked, you know, so I,
00:58:37.460 | we essentially designed a study
00:58:40.100 | where we tried to use bioinformatics
00:58:42.100 | to understand as much as possible about this virus.
00:58:47.980 | And a very substantial portion of the study
00:58:51.660 | was to understand the structures of the proteins,
00:58:55.740 | to understand how they interact with each other
00:58:59.980 | and with the host proteins,
00:59:03.820 | try to understand the evolution of this virus.
00:59:08.580 | It's obviously, you know, a very important question,
00:59:11.460 | how, where it will evolve further,
00:59:14.680 | how, you know, how it happened here, you know.
00:59:19.680 | So we did all this, you know, projects,
00:59:25.360 | and now I'm trying to put them into a paper
00:59:28.360 | where all these undergraduate students will be co-authors.
00:59:31.260 | But essentially the projects were finished
00:59:36.320 | right about mid-December.
00:59:38.240 | And a couple of weeks later,
00:59:42.960 | I heard about this mysterious new virus
00:59:46.440 | that was discovered in, you know,
00:59:48.240 | was reported in Wuhan province.
00:59:51.280 | And immediately I thought that, well, we just did that.
00:59:55.880 | Can't we do the same thing with this virus?
01:00:00.880 | And so we started waiting for the genome to be released
01:00:04.640 | because that's essentially the first piece of information
01:00:07.680 | that is critical.
01:00:09.220 | Once you have the genome sequence,
01:00:10.880 | you can start doing a lot using bioinformatics.
01:00:13.880 | - When you say genome sequence,
01:00:15.200 | that's referring to the sequence of letters
01:00:18.640 | that make up the RNA?
01:00:20.760 | - So, well, the sequence that make up
01:00:23.760 | the entire information encoded in the protein, right?
01:00:28.760 | So that includes all 29 genes.
01:00:33.140 | - What are genes?
01:00:36.200 | What's the encoding of information?
01:00:39.040 | - So genes is essentially is a basic functional unit
01:00:42.520 | that we can consider.
01:00:47.120 | So each gene in the virus would correspond to a protein.
01:00:52.120 | So gene by itself doesn't do its function.
01:00:56.920 | It needs to be converted or translated into the protein
01:01:01.840 | that will become the actual functional unit.
01:01:07.440 | - Yeah, like you said, the printer.
01:01:10.040 | - So we need the printer for that.
01:01:11.600 | - We need the printer, okay.
01:01:12.480 | So the first step is to figure out the genome,
01:01:16.940 | the sequence of things that will be then used
01:01:20.320 | for printing the protein.
01:01:21.760 | So, okay.
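The "printer" step, reading a gene three letters at a time and emitting one amino acid per codon, can be sketched in a few lines of Python. This is an illustrative toy, not the group's pipeline: it uses the standard genetic code, treats RNA `U` as DNA `T`, stops at the first stop codon, and the input below is a made-up 12-letter sequence.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(gene: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes.

    Reading starts at position 0 and stops at the first stop codon.
    Codons with unknown letters (such as N) become 'X'.
    """
    seq = gene.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        if residue == "*":  # stop codon: the "printer" halts here
            break
        protein.append(residue)
    return "".join(protein)

if __name__ == "__main__":
    print(translate("AUGUUUGUUUAA"))  # toy input
```

Real viral genomes add complications this sketch ignores, such as overlapping reading frames and polyproteins that are cut into several functional proteins.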
01:01:23.360 | - So then the next step, so once we have this,
01:01:27.440 | and so we use the existing information about SARS
01:01:31.640 | 'cause the SARS genomics has been done
01:01:36.440 | in abundance.
01:01:38.200 | So we have different strains of SARS
01:01:41.120 | and actually other related coronaviruses, MERS,
01:01:45.120 | the bat coronavirus.
01:01:47.980 | And we started by identifying the potential genes
01:01:54.520 | 'cause right now it's just a sequence, right?
01:01:57.000 | So it's a sequence that is roughly,
01:01:59.900 | it's less than 30,000 nucleotides long.
01:02:05.080 | - Just a raw sequence.
01:02:06.680 | - It's a raw sequence.
01:02:07.520 | - No other information really.
01:02:09.160 | - And we now need to define the boundaries of the genes
01:02:14.160 | that would then be used to identify the proteins
01:02:20.800 | and protein structures.
01:02:22.480 | - How hard is that problem?
01:02:23.960 | - It's not, I mean, it's pretty straightforward.
01:02:27.760 | So, 'cause we use the existing information
01:02:32.040 | about SARS proteins and SARS genes.
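For readers who want a concrete picture of "defining the boundaries of the genes" in a raw sequence, here is a deliberately naive Python sketch. It swaps in a different and simpler technique than the one described above: instead of using known SARS genes as a guide, it scans the three forward reading frames for stretches that begin with a start codon and end at a stop codon. The function name, the `min_codons` threshold, and the toy sequence are all assumptions made for the example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(genome: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of open reading frames, end exclusive.

    Only the three forward frames are scanned; each ORF runs from the first
    ATG seen in a frame to the next in-frame stop codon (stop included).
    min_codons counts codons before the stop.
    """
    seq = genome.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # made-up sequence with one short ORF
    for start, end in find_orfs(toy):
        print(start, end, toy[start:end])
```

Annotation guided by a well-studied relative, as described in the conversation, is more reliable than a blind scan like this one, which is part of why the step was called straightforward.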
01:02:35.720 | - So once again, you kinda--
01:02:37.740 | - We are relying on the, yes.
01:02:40.360 | So, and then once we get there,
01:02:45.360 | this is where sort of the first more traditional
01:02:50.800 | bioinformatics steps, step begins.
01:02:54.480 | We're trying to use these protein sequences
01:02:57.000 | and get the 3D information about those proteins.
01:03:03.000 | So this is where we are relying heavily
01:03:08.000 | on the structure information,
01:03:11.400 | specifically from the Protein Data Bank
01:03:14.440 | that we were talking about.
01:03:15.960 | - And here you're looking for similar proteins.
01:03:18.640 | - Yes, so the concept that we are operating
01:03:23.160 | when we do this kind of modeling,
01:03:25.200 | it's called homology or template-based modeling.
01:03:28.380 | So essentially, using the concept that
01:03:31.620 | if you have two sequences that are similar
01:03:37.160 | in terms of the letters,
01:03:38.620 | the structures of the sequences
01:03:41.840 | are expected to be similar as well.
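The "similar in terms of the letters" test that template-based modeling starts from can be illustrated with a percent-identity calculation. The Python sketch below assumes the two sequences have already been aligned (gaps written as `-`). The sequences are invented for the example, and any threshold for "similar enough to use as a template" is left out on purpose, since none is given in the conversation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap positions where the two sequences agree.

    Both inputs must come from the same alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

if __name__ == "__main__":
    # Toy aligned fragments, not real viral proteins.
    query = "MFVFLV-LPL"
    template = "MFIFLLFLTL"
    print(f"{percent_identity(query, template):.1f}% identical")
```

The higher the identity between a new protein and one with a solved structure, the more confidently the known structure can serve as a template. Real modeling tools also weigh alignment coverage and substitution similarity.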
01:03:43.780 | - And this is at the micro, at the very local scale and--
01:03:48.540 | - At the scale of the whole protein.
01:03:50.420 | - At the whole protein.
01:03:51.460 | - Right, so actually, so, you know,
01:03:53.420 | of course the devil is in the details
01:03:58.060 | and this is why we need actually
01:04:00.500 | pretty sophisticated modeling tools to do so.
01:04:08.300 | Once we get the structures of the individual proteins,
01:04:16.620 | we try to see whether or not these proteins act alone
01:04:23.980 | or they have to be forming protein complexes
01:04:28.940 | in order to perform this function.
01:04:31.580 | And again, so this is sort of the next level of the modeling
01:04:34.980 | because now you need to understand how proteins interact.
01:04:39.660 | And it could be the case that the protein interacts
01:04:44.120 | with itself and makes sort of a multimeric complex.
01:04:51.520 | The same protein just repeated multiple times
01:04:54.360 | and we have quite a few such proteins in SARS-CoV-2,
01:04:59.360 | specifically spike protein needs three copies to function.
01:05:07.880 | Envelope protein needs five copies to function.
01:05:14.640 | And there are some other multimeric complexes.
01:05:18.400 | - That's what you mean by interacting with itself
01:05:20.640 | and you see multiple copies.
01:05:22.200 | So how do you make a good guess
01:05:25.120 | whether something's going to interact?
01:05:27.160 | - Well, again, so there are two approaches, right?
01:05:29.460 | So one is look at the previously solved complexes.
01:05:34.460 | Now we're looking not at the individual structures
01:05:36.920 | but the structures of the whole complex.
01:05:40.060 | - Complex is a bunch, multiple proteins.
01:05:43.480 | - Yeah, so it's a bunch of proteins
01:05:45.600 | essentially glued together.
01:05:47.280 | - And when you say glued, that's the interaction.
01:05:49.800 | - That's the interaction.
01:05:51.160 | So there are different forces,
01:05:53.160 | different sort of physical forces behind this.
01:05:57.760 | - Sorry to keep asking dumb questions,
01:05:59.680 | but is it the interaction fundamentally structural
01:06:04.680 | or is it functional?
01:06:07.080 | Like in the way you're thinking about it.
01:06:10.560 | - That's actually a very good way to ask this question
01:06:14.600 | because it turns out that the interaction is structural
01:06:19.560 | but in the way it forms the structure,
01:06:24.160 | it actually also carries out the function.
01:06:26.360 | So interaction is often needed
01:06:30.680 | to carry out very specific function of a protein.
01:06:34.540 | - But in terms of on the rotor side,
01:06:38.080 | figuring out you're really starting at the structure
01:06:41.760 | before you figure out the function.
01:06:44.020 | So there's a beautiful figure two in the paper
01:06:48.600 | of all the different proteins that make up,
01:06:51.000 | that you're able to figure out that make up
01:06:53.200 | the new, the novel coronavirus.
01:06:58.920 | What are we looking at?
01:07:02.080 | So these are like,
01:07:03.440 | that's through the step two that you mentioned
01:07:08.560 | when you try to guess at the possible proteins,
01:07:12.200 | that's what you're going to get is these blue cyan blobs.
01:07:16.660 | - Yes, so those are the individual proteins
01:07:20.440 | for which we have at least some information
01:07:25.280 | from the previous studies.
01:07:26.860 | So there is advantage and disadvantage
01:07:30.760 | of using previous studies.
01:07:31.880 | The biggest, well, the disadvantage is that,
01:07:35.680 | we may not necessarily have the coverage of all 29 proteins.
01:07:40.680 | However, the biggest advantage is that the accuracy
01:07:45.280 | in which we can model these proteins is very high,
01:07:48.480 | much higher compared to ab initio methods
01:07:51.920 | that do not use any template information.
01:07:55.060 | - So, but nevertheless, this figure also has,
01:07:59.880 | I mean, it's such a beautiful,
01:08:01.760 | and I love these pictures so much.
01:08:04.060 | It has like the pink parts,
01:08:07.400 | which are the parts that are different.
01:08:10.360 | So you're highlighting,
01:08:12.040 | so the difference you find is on the 2D sequence,
01:08:15.400 | and then you try to infer what that will look like
01:08:17.480 | on the 3D.
01:08:18.520 | - Yeah, so the difference actually is on 1D sequence.
01:08:23.000 | - 1D, sorry, not 2D, right.
01:08:24.760 | - So, and so this is one of the first questions
01:08:29.040 | that we try to answer is that,
01:08:32.760 | well, if you take this new virus
01:08:35.560 | and you take the closest relatives,
01:08:39.160 | which are SARS and a couple of bat coronavirus strains,
01:08:44.160 | they are already the closest relatives
01:08:49.740 | that we are aware of.
01:08:51.760 | Now, what are the differences between this virus
01:08:56.000 | and its closest relatives, right?
01:08:58.360 | And if you look, typically when you take a sequence,
01:09:03.160 | those differences could be quite far away from each other.
01:09:08.240 | So what the 3D structure makes those differences
01:09:13.240 | do is, very often, they tend to cluster together.
01:09:18.360 | - Interesting.
01:09:19.440 | - And all of a sudden, the differences
01:09:21.080 | that may look completely unrelated,
01:09:24.500 | actually relate to each other.
01:09:28.080 | And sometimes they are there because they correspond,
01:09:32.200 | they attack the functional site, right?
01:09:36.400 | So they are there because this is the functional site
01:09:40.680 | that is highly mutated.
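The clustering idea described here can be illustrated with a small sketch. This is not the group's actual pipeline; the residue indices, coordinates, and the helper `mean_pairwise` are hypothetical toy values chosen only to show how positions that are far apart in the 1D sequence can sit close together in the 3D structure.

```python
import math
from itertools import combinations


def mean_pairwise(items, dist):
    """Average distance over all unordered pairs of items."""
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy structure: residue index -> (x, y, z) in angstroms.
coords = {
    10: (0.0, 0.0, 0.0),
    55: (3.0, 0.0, 0.0),
    120: (0.0, 4.0, 0.0),
    200: (30.0, 30.0, 30.0),
}

# Hypothetical positions that differ between the new virus and a relative.
mutated = [10, 55, 120]

# Spread along the 1D sequence (in residues).
seq_spread = mean_pairwise(mutated, lambda a, b: abs(a - b))

# Spread in the folded 3D structure (in angstroms).
space_spread = mean_pairwise(
    mutated, lambda a, b: math.dist(coords[a], coords[b])
)

print(f"mean separation in sequence: {seq_spread:.1f} residues")
print(f"mean separation in 3D:       {space_spread:.1f} angstroms")
```

In this toy case the three differences are dozens of residues apart in the sequence but only a few angstroms apart in space, which is the kind of pattern that would prompt a closer look at a possible functional site.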
01:09:43.720 | - So that's a computational approach
01:09:47.480 | to figuring something out.
01:09:48.720 | And when it comes together like that,
01:09:51.220 | that's kind of a nice clean indication
01:09:53.480 | that there's something, this could be actually indicative
01:09:56.080 | of what's happening.
01:09:58.720 | - Yes, I mean, so we need this information
01:10:01.360 | and the 3D structure gives us a lot of information
01:10:06.400 | and gives us just a very intuitive way
01:10:11.400 | to look at this information and then start to ask,
01:10:16.400 | start asking questions such as,
01:10:19.360 | so this place of this protein that is highly mutated,
01:10:23.200 | does it, is it the functional part of the protein?
01:10:32.920 | So does this part of the protein interact
01:10:36.960 | with some other proteins or maybe with some other ligands,
01:10:41.080 | small molecules, right?
01:10:43.320 | So we will try now to functionally inform
01:10:47.400 | this 3D structure.
01:10:50.000 | - So you have a bunch of these mutated parts,
01:10:55.240 | if like, I don't know, like how many are there
01:11:01.200 | in the new novel coronavirus when you compare it to SARS?
01:11:03.840 | We're talking about hundreds, thousands,
01:11:06.120 | like these pink regions.
01:11:08.240 | - No, no, much less than that.
01:11:10.880 | And it's very interesting that if you look at that,
01:11:13.840 | so the first thing that you start seeing,
01:11:16.840 | you look at patterns, right?
01:11:18.640 | And the first pattern that becomes obvious
01:11:22.960 | is that some of the proteins in the new coronavirus
01:11:28.600 | are pretty much intact, right?
01:11:31.640 | So they're pretty much exactly the same as SARS,
01:11:35.640 | as the bat coronavirus,
01:11:38.560 | whereas some others are heavily mutated, right?
01:11:43.360 | So it looks like that the evolution
01:11:47.880 | is not occurring uniformly across the entire viral genome,
01:11:57.800 | but actually targets very specific proteins.
01:12:02.240 | - And what do you do with that,
01:12:03.560 | like from the Sherlock Holmes perspective?
01:12:05.640 | - Well, you know, so one of the most interesting findings
01:12:10.640 | we had was the fact that the viral,
01:12:18.280 | so the binding sites on the viral surfaces
01:12:25.040 | that get targeted by the known small molecules,
01:12:30.040 | they were pretty much not affected at all.
01:12:34.840 | And so that means that the same small drugs
01:12:39.840 | or small drug-like compounds can be efficient
01:12:45.400 | for the new coronavirus.
01:12:51.000 | - Ah, so this all actually maps to the drug compounds too,
01:12:55.000 | like so you're actually mapping out
01:12:57.960 | what old stuff is gonna work on this thing,
01:13:02.600 | and then possibilities for new stuff to work
01:13:05.480 | by mapping out the things that have mutated.
01:13:07.840 | - Yes, so we essentially know
01:13:10.320 | which parts behave differently
01:13:13.560 | and which parts are likely to behave similar.
01:13:18.200 | And again, of course, all our predictions
01:13:22.240 | need to be validated by experiments,
01:13:25.840 | but hopefully that sort of helps us to delineate
01:13:30.520 | the regions of this virus that can be promising
01:13:35.520 | in terms of the drug discovery.
01:13:39.200 | - You kind of mentioned this already,
01:13:41.440 | but maybe you can elaborate.
01:13:43.140 | So how different from the structural
01:13:45.720 | and functional perspective does the new coronavirus
01:13:49.680 | appear to be relative to SARS?
01:13:52.300 | - We now are trying to understand
01:13:56.060 | the overall structural characteristics of this virus,
01:13:59.940 | because that's our next step,
01:14:02.340 | trying to model the viral particle
01:14:05.820 | of a single viral particle of this virus.
01:14:09.220 | - So that means you have the individual proteins,
01:14:12.740 | like you said, you have to figure out
01:14:14.060 | what their interaction is.
01:14:15.500 | Is that where this graph kind of interactome--
01:14:20.020 | - So the interactome is essentially a,
01:14:25.020 | so our prediction on the potential interactions,
01:14:29.060 | some of them that we already deciphered
01:14:31.740 | from the structural knowledge,
01:14:33.540 | but some of them that are essentially
01:14:35.500 | are deciphered from the knowledge
01:14:38.220 | of the existing interactions
01:14:40.460 | that people previously obtained for SARS,
01:14:45.180 | for MERS, or other related viruses.
01:14:49.740 | - Is there kind of interactomes,
01:14:53.100 | am I pronouncing that correctly, by the way?
01:14:54.300 | - Yeah, interactome.
01:14:55.220 | - Yeah, are those already converged towards for SARS?
01:15:00.220 | - So I think there are a couple of papers
01:15:07.220 | that now investigate the sort of the large-scale
01:15:14.940 | set of interactions between the new SARS and its host.
01:15:19.940 | And so I think that's an ongoing study, I think--
01:15:26.140 | - And the success of that,
01:15:27.300 | the result would be an interactome.
01:15:29.220 | - Yes.
01:15:30.700 | - And so when you say,
01:15:32.740 | not trying to figure out the entire particle,
01:15:36.980 | the entire thing-- - The particle, right?
01:15:38.420 | So if you look, so structure, right?
01:15:40.260 | So what this viral particle looks like, right?
01:15:43.900 | So as I said, the surface of it is an envelope,
01:15:48.900 | which is essentially a so-called lipid bilayer
01:15:52.780 | with proteins integrated into the surface.
01:15:59.100 | So an average particle is around 80 nanometers, right?
01:16:09.780 | So this particle can have about 50 to 100 spike proteins.
01:16:14.780 | So at least we suspect it,
01:16:22.980 | and based on the micrographs images,
01:16:26.480 | it's very comparable to MHV virus in mice and SARS virus.
01:16:31.480 | - Micrographs are actual pictures of the actual--
01:16:35.460 | - Virus.
01:16:36.300 | - Okay, so these are models, this is actual--
01:16:38.500 | - This is the actual images, right?
01:16:40.620 | - What are they, sorry for the tangents,
01:16:42.340 | but what are these things?
01:16:43.900 | So when you look on the internet,
01:16:46.120 | the models and the pictures are,
01:16:48.100 | and the models you have here
01:16:50.220 | are just gorgeous and beautiful.
01:16:52.460 | When you actually take pictures of them
01:16:54.020 | with a micrograph, like what do we look?
01:16:56.200 | - Well, they typically are not perfect, right?
01:17:00.100 | So most of the images that you see now
01:17:03.900 | is the sphere with those spikes.
01:17:08.900 | - You actually see the spikes?
01:17:10.580 | - Yes, you do see the spikes.
01:17:12.660 | And now, our collaborator from Texas A&M University,
01:17:17.660 | Benjamin Neuman, he actually,
01:17:25.340 | in the recent paper about SARS, he proposed,
01:17:27.800 | and there's some actually evidence behind it,
01:17:32.080 | that the particle is not a sphere,
01:17:34.820 | but is actually is an elongated ellipsoid-like particle.
01:17:39.820 | So that's what we are trying to incorporate into our model.
01:17:47.260 | And I mean, if you look at the actual micrographs,
01:17:54.660 | you see that those particles are not symmetric,
01:18:02.400 | so some of them, and of course,
01:18:05.520 | it could be due to the treatment of the material,
01:18:10.360 | it could be due to some noise in the imaging.
01:18:14.880 | - Right, so there's a lot of uncertainty in all this.
01:18:16.800 | So it's okay, so structurally figuring out the entire part.
01:18:20.800 | By the way, again, sorry for the tangents,
01:18:22.860 | but why the term particle?
01:18:26.560 | Or is it just--
01:18:27.400 | - It's a single, so we call it the vira,
01:18:31.440 | virion, so virion particle, it's essentially a single virus.
01:18:35.700 | - Single virus, but it just feels like,
01:18:37.920 | 'cause particle to me, from the physics perspective,
01:18:41.620 | feels like the most basic unit,
01:18:44.500 | 'cause there seems to be so much going on inside the virus.
01:18:49.600 | It doesn't feel like a particle to me.
01:18:51.000 | - Yeah, well, yeah, it's probably, I think it's,
01:18:53.500 | virion is a good way to call it.
01:18:58.160 | So, okay, so trying to figure out
01:19:01.360 | the entirety of the system.
01:19:05.440 | - Yes, so this is, so the virion has 50 to 100 spikes,
01:19:10.440 | trimer spikes, it has roughly 200 to 400
01:19:19.920 | membrane protein dimers, and those are arranged
01:19:26.800 | in a very nice lattice, so you can actually see
01:19:29.720 | sort of the, it's like a, it's a carpet of--
01:19:34.720 | - On the surface again.
01:19:36.480 | - Exactly, on the surface.
01:19:38.120 | And occasionally you also see this envelope protein inside.
01:19:43.120 | And some--
01:19:44.480 | - Is that the one we don't know what it does?
01:19:46.080 | - Exactly, exactly, the one that forms the pentamer,
01:19:48.880 | this very nice pentameric ring.
01:19:51.840 | And so, you know, so this is what we're trying to,
01:19:55.440 | you know, we're trying to put now all our knowledge together
01:20:00.440 | and see whether we can actually generate
01:20:03.480 | this overall virion model with an idea to understand,
01:20:08.480 | you know, well, first of all, to understand how it looks like
01:20:16.720 | how far it is from those images that were generated.
01:20:21.320 | But I mean, the implications are, you know,
01:20:25.280 | there is a potential for the, you know,
01:20:27.920 | nanoparticle design that will mimic this virion particle.
01:20:33.960 | - Is the process of nanoparticle design,
01:20:38.680 | meaning artificially designing something
01:20:40.680 | that looks similar?
01:20:41.600 | - Yes, you know, so the one that can potentially compete
01:20:46.600 | with the actual virion particles,
01:20:50.560 | and therefore reduce the effect of the infection.
01:20:55.460 | - So is this the idea of, like, what is a vaccine?
01:20:59.420 | - So vaccine, vaccine, so, yeah,
01:21:02.500 | so there are two ways of essentially treating,
01:21:06.420 | and in the case of vaccine is preventing the infection.
01:21:10.640 | So vaccine is, you know, a way to train our immune system.
01:21:15.640 | So our immune system becomes aware of this new danger.
01:21:25.300 | And therefore is capable of generating the antibodies,
01:21:30.300 | that will essentially bind to the spike proteins,
01:21:35.820 | 'cause that's the main target for the, you know,
01:21:40.020 | for the vaccine's design, and block its functioning.
01:21:45.020 | If you have the spike with the antibody on top,
01:21:51.820 | it can no longer interact with ACE2 receptor.
01:21:55.200 | - So the process of designing a vaccine, then,
01:22:00.780 | is you have to understand enough about the structure
01:22:03.100 | of the virus itself to be able to create an artificial,
01:22:07.500 | an artificial particle?
01:22:10.320 | - Well, I mean, so, so, so,
01:22:12.180 | nanoparticle is a very exciting and new research.
01:22:16.940 | So there are already established ways to, you know,
01:22:21.020 | to make vaccines, and there are several different ones,
01:22:25.460 | right, so there is one where essentially the virus
01:22:30.460 | gets through the cell culture multiple times,
01:22:34.620 | so it becomes essentially, you know,
01:22:36.780 | adjusted to the specific embryonic cell,
01:22:41.780 | and as a result becomes less, you know,
01:22:47.380 | compatible with the, you know, host human cells.
01:22:52.380 | So, and therefore it's sort of the idea of the live vaccine,
01:22:57.560 | where the particles are there,
01:23:01.700 | but they are not so efficient, you know,
01:23:03.940 | so they cannot replicate, you know,
01:23:07.220 | as rapidly as, you know, before the vaccine.
01:23:12.220 | And they can be introduced to the immune system.
01:23:16.020 | The immune system will learn,
01:23:18.180 | and the person who gets this vaccine won't get, you know,
01:23:23.180 | sick or, you know, will have mild, you know, mild symptoms.
01:23:29.460 | So then there is sort of different types of the way
01:23:32.980 | to introduce the non-functional,
01:23:35.940 | non-functional parts of this virus,
01:23:40.640 | or the virus where some of the information is stripped down.
01:23:45.680 | For example, the virus with no genetic material,
01:23:49.620 | so with no RNA genome, exactly.
01:23:52.740 | So it cannot replicate.
01:23:54.220 | It cannot essentially perform most of its function.
01:23:58.500 | - This is fascinating.
01:24:00.140 | What is the biggest hurdle to design one of these,
01:24:04.580 | to arrive at one of these?
01:24:05.740 | Is it the work that you're doing
01:24:08.340 | in the fundamental understanding of this new virus,
01:24:11.020 | or is it in the, from our perspective,
01:24:14.140 | well, complicated world of experimental validation,
01:24:18.460 | and sort of showing that this,
01:24:19.980 | like going through the whole process of showing
01:24:21.740 | this is actually gonna work with FDA approval,
01:24:23.860 | all that kind of stuff?
01:24:24.940 | - I think it's both.
01:24:26.060 | I mean, you know, our understanding
01:24:28.020 | of the molecular mechanisms will allow us to, you know,
01:24:32.380 | to design, to have more efficient designs of the vaccines.
01:24:36.340 | However, once you design the vaccine, it needs to be tested.
01:24:43.140 | - But when you look at the 18 months
01:24:44.980 | and the different projections,
01:24:46.540 | it seems like an exceptionally, historically speaking,
01:24:50.340 | maybe you can correct me, but it's,
01:24:52.060 | even 18 months seems like a very accelerated timeline.
01:24:55.180 | - It is, it is.
01:24:56.620 | I mean, I remember reading about,
01:25:00.140 | in a book about some previous vaccines,
01:25:04.980 | that it could take up to 10 years to design
01:25:08.140 | and properly test a vaccine before its mass production.
01:25:13.140 | So yeah, we, you know,
01:25:16.100 | everything is accelerated these days.
01:25:18.140 | I mean, for better, for worse,
01:25:20.980 | but, you know, we definitely need that.
01:25:23.980 | - Well, especially with the coronavirus,
01:25:25.420 | I mean, the scientific community is really stepping up
01:25:27.540 | and working together.
01:25:28.440 | The collaborative aspect is really interesting.
01:25:30.700 | You mentioned, so the vaccine is one,
01:25:33.020 | and then there's antivirals, antiviral drugs.
01:25:36.140 | - So antiviral drugs.
01:25:37.320 | So where, you know,
01:25:38.420 | vaccines are typically needed to prevent the infection.
01:25:42.260 | Right, but once you have an infection,
01:25:44.060 | one, you know, so what we try to do,
01:25:47.300 | we try to stop it.
01:25:48.300 | So we try to stop a virus from functioning.
01:25:51.980 | And so the antiviral drugs are designed
01:25:55.220 | to block some critical functioning
01:25:58.940 | of the proteins
01:26:03.500 | from the virus.
01:26:06.780 | So there are a number of interesting candidates.
01:26:11.280 | And I think, you know, if you ask me,
01:26:15.300 | I, you know, I think Remdesivir
01:26:22.340 | is perhaps the most promising.
01:26:24.440 | It's, it has been shown to be, you know,
01:26:31.100 | an efficient and effective antiviral
01:26:35.520 | for SARS.
01:26:37.680 | Originally it was the antiviral drug
01:26:42.760 | developed for a completely different virus,
01:26:46.960 | I think for Ebola and Marburg.
01:26:49.560 | - At high levels, do you know how it works?
01:26:52.000 | - So it tries to mimic
01:26:54.400 | one of the nucleotides in RNA,
01:26:59.160 | and essentially that stops the replication.
01:27:05.100 | - So it messes, I guess that's what,
01:27:07.480 | so antiviral drugs mess with some aspect of this process.
01:27:11.480 | - So, you know, so essentially we try
01:27:13.080 | to stop certain functions of the virus.
01:27:17.700 | There are some other ones, you know,
01:27:20.660 | that are designed to inhibit the protease,
01:27:26.940 | the thing that clips protein sequences.
01:27:32.200 | There is one that was originally designed for malaria,
01:27:37.200 | which is a bacterial, you know, bacterial disease, so.
01:27:41.580 | - This is so cool, so but that's exactly
01:27:44.180 | where your work steps in,
01:27:45.520 | is you're figuring out the functional,
01:27:47.480 | and the structure of these different,
01:27:50.500 | so like providing candidates for where drugs can plug in.
01:27:54.380 | - Exactly, well yes, because, you know,
01:27:57.260 | one thing that we don't know is
01:28:01.140 | whether or not, so let's say we have
01:28:03.380 | a perfect drug candidate that is efficient
01:28:06.460 | against SARS and against MERS.
01:28:09.060 | Now, is it gonna be efficient against new SARS-CoV-2?
01:28:14.060 | We don't know that, and there are multiple aspects
01:28:20.060 | that can affect this efficiency.
01:28:22.500 | So for instance, if the binding site,
01:28:26.820 | so the part of the protein where this ligand
01:28:30.700 | gets attached, if this site is mutated,
01:28:34.780 | then the ligand may not be attachable
01:28:38.020 | to this part any longer.
01:28:40.820 | And, you know, our work and the work
01:28:44.260 | of other bioinformatics groups, you know,
01:28:47.660 | essentially are trying to understand
01:28:51.500 | whether or not that will be the case,
01:28:53.680 | or, and it looks like for the ligands
01:28:57.740 | that we looked at, the ligand binding sites
01:29:02.740 | are pretty much intact, which is very promising.
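The check described here, whether the sequence differences fall inside a known ligand binding site, can be sketched as follows. This is a simplified illustration, not the method used in the paper: the sequences, the site positions, and the function names are hypothetical, and the two sequences are assumed to be already aligned and of equal length.

```python
def differing_positions(ref, new):
    """Return the 0-based positions where two pre-aligned sequences differ."""
    if len(ref) != len(new):
        raise ValueError("sequences must be aligned to the same length")
    return {i for i, (a, b) in enumerate(zip(ref, new)) if a != b}


def binding_site_intact(ref, new, site_positions):
    """True if no difference between the sequences falls inside the site."""
    return differing_positions(ref, new).isdisjoint(site_positions)


# Hypothetical aligned protein fragments (one letter per amino acid).
reference = "MKTAYIAKQR"
new_virus = "MKSAYIAKQK"

# Hypothetical binding-site positions for a known ligand.
site = {4, 5, 6}

print(sorted(differing_positions(reference, new_virus)))
print(binding_site_intact(reference, new_virus, site))
```

An intact site in this sense only suggests that the ligand may still bind; as noted in the conversation, such predictions still need experimental validation.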
01:29:07.500 | - So if we could just like zoom out for a second,
01:29:10.580 | are you optimistic, so there's two,
01:29:17.140 | well, there's three possible ends
01:29:18.780 | to the coronavirus pandemic.
01:29:21.640 | So one is there's, or drugs or vaccines
01:29:26.320 | get figured out very quickly, probably drugs first.
01:29:30.600 | The other is the pandemic runs its course
01:29:35.600 | for this wave, at least.
01:29:37.560 | And then the third is, you know,
01:29:40.800 | things go much worse in some dark, bad,
01:29:45.400 | very bad direction.
01:29:47.240 | Do you see, let's focus on the first two.
01:29:49.560 | Do you see the antiviral drugs or the work you're doing
01:29:54.520 | being relevant for us right now
01:29:59.480 | in stopping the pandemic, or do you hope
01:30:04.200 | that the pandemic will run its course?
01:30:07.080 | So the social distancing, things like wearing masks,
01:30:12.080 | all those discussions that we're having
01:30:14.760 | will be the method with which we fight coronavirus
01:30:19.760 | in the short term, or do you think
01:30:22.880 | that it'll have to be antiviral drugs?
01:30:25.200 | - I think antivirals would be,
01:30:30.520 | I would view that as the, at least the short-term solution.
01:30:37.320 | I see more and more cases in the news
01:30:41.860 | of those new drug candidates being administered
01:30:46.860 | in hospitals, and I mean, this is,
01:30:52.280 | this is right now the best what we have.
01:30:55.920 | - But do we need it in order to reopen the economy?
01:30:59.320 | - I mean, we definitely need it.
01:31:02.080 | I cannot sort of speculate on how that will affect
01:31:07.080 | reopening of the economy, because we are,
01:31:11.640 | you know, we are kind of deep into the pandemic,
01:31:16.640 | and it's not just the states, it's also, you know,
01:31:20.280 | worldwide, you know, of course,
01:31:24.980 | you know, there is also the possibility
01:31:28.640 | of the second wave, as we, you know, as you mentioned,
01:31:33.640 | and this is why, you know, we need to be super careful.
01:31:39.540 | We need to follow all the precautions
01:31:47.120 | that the doctors tell us to do.
01:31:50.600 | - Are you worried about the mutation of the virus?
01:31:54.760 | - It's, of course, a real possibility.
01:31:58.960 | Now, how, to what extent this virus can mutate,
01:32:03.960 | it's an open question.
01:32:07.160 | I mean, we know that it is able to mutate,
01:32:11.180 | to jump from one species to another,
01:32:13.340 | and to become transmittable between humans, right?
01:32:19.340 | So, will it, you know, so let's imagine
01:32:24.060 | that we have the new antiviral.
01:32:26.980 | Will this virus become eventually resistant
01:32:31.980 | to this antiviral?
01:32:33.460 | We don't know.
01:32:34.300 | I mean, this is what needs to be studied.
01:32:36.900 | - This is such a beautiful and terrifying process
01:32:41.020 | that a virus, some viruses may be able to mutate
01:32:45.500 | to respond to the, to mutate around
01:32:48.860 | the thing we've put before it.
01:32:52.140 | Can you explain that process?
01:32:53.640 | Like, how does that happen?
01:32:55.100 | Is that just the way of evolution?
01:32:57.240 | - I would say so, yes.
01:32:59.660 | I mean, it's the evolutionary mechanisms.
01:33:02.780 | There is nothing imprinted into this virus
01:33:07.700 | that makes it, you know, it just, the way it evolves,
01:33:12.820 | and actually, it's the way it co-evolves with its host.
01:33:17.120 | - It's just amazing, especially the evolutionary mechanisms,
01:33:22.260 | especially amazing given how simple the virus is.
01:33:27.260 | It's incredible that it's, I mean, it's beautiful.
01:33:32.700 | It's beautiful because it's one of the cleanest examples
01:33:36.500 | of evolution working.
01:33:38.940 | - Well, I think, I mean, the, one of the sort of,
01:33:42.620 | the reasons for its simplicity is because
01:33:47.180 | it does not require all the necessary functions
01:33:52.180 | to be stored, right?
01:33:54.060 | So it actually can hijack the majority
01:33:58.460 | of the necessary functions from the host cell.
01:34:00.940 | And so, so, so, so the ability to do so,
01:34:05.380 | in my view, reduces the complexity of this machine
01:34:10.380 | drastically, although if you look at the, you know,
01:34:14.540 | most recent discoveries, right?
01:34:16.140 | So the scientists discovered viruses
01:34:19.300 | that are as large as bacteria, right?
01:34:22.340 | So these mimiviruses and mamaviruses,
01:34:25.980 | it actually, those discoveries made scientists
01:34:31.700 | to reconsider the origins of the virus, you know,
01:34:36.700 | and what are the mechanisms and how, you know,
01:34:40.460 | what are the mechanisms, the evolutionary mechanisms
01:34:43.940 | that leads to the appearance of the viruses?
01:34:46.900 | - By the way, I mean, you did mention that viruses are,
01:34:50.460 | I think you mentioned that they're not living.
01:34:52.900 | - Yes, they're not living organisms.
01:34:54.740 | - So let me ask that question again.
01:34:56.500 | Why do you think they're not living organisms?
01:35:00.620 | - Well, because they are dependent.
01:35:04.340 | The majority of the functions of the virus
01:35:08.420 | are dependent on the host.
01:35:13.020 | - So let me do the devil's advocate.
01:35:15.420 | Let me be the philosophical devil's advocate here and say,
01:35:19.820 | well, humans, which we would say are living,
01:35:23.380 | need our host planet to survive.
01:35:27.940 | So you can basically take every living organism
01:35:32.700 | that we think of as definitively living,
01:35:35.100 | it's always going to have some aspects of its host
01:35:39.380 | that it needs of its environment.
01:35:42.540 | So is that really the key aspect of why a virus,
01:35:48.340 | is that dependence?
01:35:49.820 | Because it seems to be very good at doing so many things
01:35:55.460 | that we consider to be intelligent.
01:35:58.300 | It's just that dependence part.
01:36:00.980 | - Well, I mean, it, yeah, it's difficult to answer
01:36:05.980 | in this way.
01:36:11.180 | I mean, the way I think about the virus is,
01:36:15.940 | in order for it to function,
01:36:22.540 | it needs to have the critical component,
01:36:26.700 | the critical tools that it doesn't have.
01:36:31.380 | So, I mean, that's, in my way,
01:36:36.900 | it's not autonomous.
01:36:42.380 | That's how I separate the idea of the living organism,
01:36:47.380 | on a very high level, between the living organism and--
01:36:52.100 | - And you have some, we have, I mean,
01:36:54.100 | these are just terms, and perhaps they don't mean much,
01:36:57.500 | but we have some kind of sense of what autonomous means,
01:37:00.980 | and that humans are autonomous.
01:37:02.800 | You've also done excellent work
01:37:08.660 | in the epidemiological modeling,
01:37:13.660 | the simulation of these things.
01:37:15.460 | So the zooming out outside of the body,
01:37:17.860 | doing the agent-based simulation.
01:37:19.740 | So that's where you actually simulate
01:37:22.140 | individual human beings,
01:37:24.580 | and then the spread of viruses from one to the other.
01:37:27.220 | How does, at a high level, agent-based simulation work?
01:37:33.240 | - All right, so it's also one of those ironies of timing,
01:37:39.100 | 'cause, I mean, we've worked on this project
01:37:44.700 | for the past five years,
01:37:46.860 | and the New Year's Eve, I got an email
01:37:51.580 | from my PhD student that the last experiments were completed.
01:37:56.580 | And three weeks after that,
01:38:00.300 | we get this Diamond Princess story,
01:38:03.300 | and emailing each other with the same,
01:38:07.940 | the same news saying like--
01:38:10.100 | - So the Diamond Princess is a cruise ship,
01:38:12.860 | and what was the project that you worked on?
01:38:14.900 | - So the project, I mean, it's,
01:38:16.880 | the code name, it started with a bunch of undergraduates.
01:38:22.360 | The code name was Zombies on a Cruise Ship.
01:38:26.180 | So they wanted to essentially model
01:38:29.660 | the zombie apocalypses on a cruise ship.
01:38:34.660 | And after having some fun,
01:38:39.020 | we then thought about the fact that
01:38:41.860 | if you look at the cruise ships,
01:38:44.140 | I mean, the infectious outbreak has been
01:38:48.260 | one of the biggest threats to the cruise ship economy.
01:38:53.260 | So perhaps the most frequently occurring
01:38:58.940 | is the Norwalk virus.
01:39:00.560 | And this is essentially one of these stomach flus
01:39:06.820 | that you have.
01:39:08.380 | And it can be quite devastating.
01:39:12.940 | So there are, occasionally there are cruise ships get,
01:39:17.940 | you know, they get canceled,
01:39:20.300 | they get returned to the, back to the origin.
01:39:24.460 | And so we wanted to study,
01:39:28.020 | and this is very different
01:39:29.140 | from the traditional epidemiological studies
01:39:31.780 | where the scale is much larger.
01:39:34.300 | So we wanted to study this in a confined environment,
01:39:39.060 | which is a cruise ship.
01:39:40.220 | It could be a school.
01:39:41.500 | It could be other, you know, other places such as,
01:39:46.500 | you know, the large company
01:39:50.660 | where people are in interaction.
01:39:54.260 | And the benefit of this model is
01:39:58.460 | we can actually track that in the real time.
01:40:01.960 | So we can actually see the whole course of the evolution,
01:40:05.660 | the whole course of the interaction
01:40:10.260 | between the infected host
01:40:15.260 | and, you know, the host and the pathogen, et cetera.
01:40:21.260 | So agent-based system or multi-agent system
01:40:26.260 | to be precisely,
01:40:29.500 | is a good way to approach this problem
01:40:37.820 | because we can introduce the behavior
01:40:42.820 | of the passengers, of the cruise.
01:40:47.700 | And what we did for the first time,
01:40:50.320 | that's where, you know, we introduce some novelty
01:40:53.940 | is we introduce a pathogen agent explicitly.
01:40:58.940 | So that allowed us to essentially model the behavior
01:41:06.400 | on the host side, as well as on the pathogen side.
01:41:11.400 | And all of a sudden we can have a flexible model
01:41:16.860 | that allows us to integrate all the key parameters
01:41:22.260 | about the infections.
01:41:23.940 | So for example, the virus, right?
01:41:28.940 | So the ways of transmitting the virus
01:41:33.140 | between the hosts.
01:41:36.480 | How long does virus survive on the surface, the fomite?
01:41:43.100 | What is, you know, how much of the viral particles
01:41:52.540 | does a host shed when he or she is asymptomatic
01:41:59.780 | versus symptomatic?
01:42:02.900 | - And you can encode all of that into this path.
01:42:05.460 | And just for people who don't know,
01:42:06.840 | so agent-based simulation,
01:42:08.360 | usually the agent represents a single human being.
01:42:11.700 | And then there's some graphs, like contact graphs
01:42:16.080 | that represent the interaction between those human beings.
01:42:18.840 | - So yes, so we, so essentially, you know,
01:42:22.400 | so agents are, you know, individual programs
01:42:27.240 | that are run in parallel.
01:42:30.700 | And we can provide instructions for these agents
01:42:35.700 | how to interact with each other,
01:42:40.140 | how to exchange information,
01:42:42.180 | in this case, exchange the infection.
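A minimal sketch of the idea, with hosts and the pathogen both represented explicitly, might look like the following. This is not the group's model: the class names, the parameters (`transmission_prob`, `asymptomatic_factor`, `incubation_days`), random mixing instead of a contact graph, and the absence of recovery or fomite transmission are all simplifying assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Pathogen:
    """Hypothetical pathogen agent holding the infection parameters."""
    transmission_prob: float    # per-contact infection chance, symptomatic host
    asymptomatic_factor: float  # relative shedding while asymptomatic
    incubation_days: int        # days before symptoms appear


@dataclass
class Host:
    """Host agent: S = susceptible, A = asymptomatic, I = symptomatic."""
    state: str = "S"
    days_infected: int = 0


def step(hosts, pathogen, contacts_per_day, rng):
    """Advance the simulation by one day."""
    newly_infected = set()
    for host in hosts:
        if host.state == "S":
            continue
        # Asymptomatic hosts shed less, scaled by the pathogen's factor.
        scale = pathogen.asymptomatic_factor if host.state == "A" else 1.0
        chance = pathogen.transmission_prob * scale
        for _ in range(contacts_per_day):
            j = rng.randrange(len(hosts))  # random mixing, no contact graph
            if hosts[j].state == "S" and rng.random() < chance:
                newly_infected.add(j)
    # Progress existing infections from asymptomatic to symptomatic.
    for host in hosts:
        if host.state == "A":
            host.days_infected += 1
            if host.days_infected >= pathogen.incubation_days:
                host.state = "I"
    for j in newly_infected:
        hosts[j].state = "A"


def simulate(n_hosts, days, pathogen, contacts_per_day=5, seed=0):
    """Return the number of infected hosts at the start and after each day."""
    rng = random.Random(seed)
    hosts = [Host() for _ in range(n_hosts)]
    hosts[0].state = "A"  # a single initial case
    history = [1]
    for _ in range(days):
        step(hosts, pathogen, contacts_per_day, rng)
        history.append(sum(h.state != "S" for h in hosts))
    return history


history = simulate(100, 10, Pathogen(0.3, 0.5, 2))
print(history)
```

Keeping the infection parameters on a separate pathogen object means a different virus can be modeled by swapping in another `Pathogen` instance without touching the host logic, which is the flexibility described in the conversation.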
01:42:45.540 | - But in this case, in your case,
01:42:46.820 | you've added a pathogen as an agent.
01:42:49.900 | I mean, that's kind of fascinating.
01:42:51.540 | It's kind of a brilliant, like a brilliant way
01:42:56.540 | to condense the parameters, to aggregate,
01:43:00.340 | to bring the parameters together
01:43:01.900 | that represent the pathogen, the virus.
01:43:04.500 | - Yes.
01:43:05.340 | - That's fascinating, actually.
01:43:06.980 | - So yeah, it was, you know, we realized that,
01:43:10.780 | you know, by bringing in the virus,
01:43:13.320 | we can actually start modeling.
01:43:15.540 | I mean, we are not, no longer bounded
01:43:18.460 | by very specific sort of aspects of the specific virus.
01:43:23.460 | So we end up, we started with, you know,
01:43:28.860 | Norwalk virus, and of course zombies,
01:43:31.040 | but we continued to modeling Ebola virus outbreak,
01:43:36.040 | flu, SARS, and because I felt that we need
01:43:41.580 | to add a little bit more sort of excitement
01:43:48.020 | for our undergraduate students.
01:43:51.380 | So we actually modeled the virus from the Contagion movie.
01:43:56.140 | - Yes.
01:43:57.060 | - So MEV-1, and you know, unfortunately,
01:44:02.060 | that virus, and we tried to extract as much information.
01:44:06.980 | Luckily, this movie was, the scientific consultant
01:44:11.580 | was Ian Lipkin, a virologist from Columbia University,
01:44:17.020 | who is actually, who provided, I think he designed
01:44:22.020 | this virus for this movie based on Nipah virus,
01:44:26.740 | and I think with some ideas behind SARS,
01:44:31.020 | or flu, like airborne viruses.
01:44:34.140 | And, you know, the movie surprisingly contained
01:44:39.140 | enough details for us to extract and to model it.
01:44:43.740 | - I was hoping you would like publish a paper
01:44:45.740 | of how this virus works.
01:44:47.380 | - Yeah, we are planning to publish it.
01:44:49.340 | - I would love it if you did.
01:44:50.540 | But it would be nice if the, you know,
01:44:52.500 | of the origin of the virus, but you're now
01:44:57.500 | actually being a scientist and studying the virus
01:45:00.500 | from that perspective.
01:45:01.780 | - But the origin of the virus, you know,
01:45:04.820 | the first time, actually, so this movie
01:45:07.420 | is assignment number one in my bioinformatics class
01:45:11.300 | that I give, because it also tells you that,
01:45:16.300 | you know, bioinformatics can be of use,
01:45:19.420 | 'cause if, I don't know, you watched it,
01:45:22.660 | have you watched it?
01:45:23.500 | - A long time ago, yeah.
01:45:24.460 | - So there is, you know, approximately a week
01:45:28.860 | from the virus detection, we see a screenshot
01:45:33.100 | of a scientist looking at the structure
01:45:37.500 | of the surface protein.
01:45:39.340 | And this is where I tell my students that,
01:45:41.460 | you know, if you ask an experimental biologist,
01:45:45.220 | they will tell you that it's impossible,
01:45:48.040 | because it takes months, maybe years,
01:45:51.100 | to get the crystal structure of this,
01:45:54.060 | you know, the structure that is represented.
01:45:56.220 | If you ask a bioinformatician, they tell you,
01:45:59.860 | sure, why not?
01:46:01.060 | You know, we'll just get it modeled.
01:46:03.940 | - Yeah.
01:46:04.780 | - And, yes, so, but it was very interesting
01:46:09.540 | to see that there's actually, you know,
01:46:12.380 | and if you do it, do screenshots,
01:46:17.500 | you actually see the phylogenetic tree,
01:46:19.420 | the evolutionary tree that relates this virus
01:46:21.960 | with other viruses.
01:46:23.520 | So it was a lot of scientific thought put into the movie.
01:46:27.840 | And one thing that I was actually, you know,
01:46:31.280 | it was interesting to learn is that the origin
01:46:34.480 | of this virus was, there were two animals
01:46:38.480 | that led to the, you know, the zoonotic origin
01:46:46.020 | of this virus were fruit bat and a pig.
01:46:50.460 | So, you know, so this is--
01:46:54.200 | - This doesn't feel like we're,
01:46:57.920 | this definitely feels like we're living in a simulation.
01:47:00.600 | Okay, but maybe a big picture,
01:47:05.600 | agent-based simulation now, larger scale,
01:47:08.980 | sort of not focused on a cruise ship,
01:47:10.840 | but larger scale, are used now to drive some policy.
01:47:14.880 | So politicians use them to tell stories and narratives
01:47:17.760 | and try to figure out how to move forward
01:47:21.680 | under so much, so much uncertainty.
01:47:24.200 | But in your sense, are agent-based simulation
01:47:28.200 | useful for actually predicting the future?
01:47:30.920 | Or are they useful mostly for comparing,
01:47:34.720 | relative comparison of different intervention methods?
01:47:38.340 | - Well, I think both, because, you know,
01:47:41.320 | in the case of new coronavirus,
01:47:44.500 | we're essentially learning that the current intervention
01:47:50.920 | methods may not be efficient enough.
01:47:53.600 | One thing that, one important aspect that I find
01:47:59.840 | to be so critical, and yet something
01:48:09.240 | that was overlooked, you know, during the past pandemics,
01:48:14.240 | is the effect of the asymptomatic period.
01:48:19.480 | This virus is different because it has such a long
01:48:23.400 | asymptomatic period, and all of a sudden,
01:48:28.400 | that creates a completely new game
01:48:31.880 | when trying to contain this virus.
01:48:34.280 | - In terms of the dynamics of the infection.
01:48:36.800 | - Exactly.
01:48:38.920 | - Do you also, I don't know how close
01:48:41.680 | you're tracking this, but do you also think
01:48:45.400 | that there's a different rate of infection
01:48:49.420 | for when you're asymptomatic, like that aspect,
01:48:53.480 | or does the virus not care?
01:48:55.520 | - So there were a couple of works.
01:48:59.840 | So one important parameter that tells us
01:49:04.080 | how contagious the person with asymptomatic virus
01:49:09.080 | versus symptomatic is, is looking at the number
01:49:14.160 | of viral particles this person sheds,
01:49:17.940 | you know, as a function of time.
01:49:21.240 | So, so far, what I saw is the study that tells us
01:49:31.640 | that the, you know, the person during the asymptomatic period
01:49:36.640 | is already contagious, and it sheds,
01:49:41.040 | the person sheds enough viruses to infect another host.
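[Editor's sketch] The idea of shedding "as a function of time" can be made concrete with a toy curve. Everything here is an assumption for illustration: the bell-shaped curve, the exponential dose-response link, the day of symptom onset, and all numbers are placeholders rather than measurements from any study mentioned in the conversation.

```python
import math


def shedding(day, peak_day=4.0, peak_load=1.0, width=3.0):
    """Hypothetical bell-shaped viral shedding curve, in arbitrary units."""
    return peak_load * math.exp(-((day - peak_day) / width) ** 2)


def infection_probability(day, exposure=1.5):
    """Chance that one contact on `day` infects a susceptible host.

    Uses a simple exponential dose-response form: more shed virus means a
    higher chance, saturating below 1. `exposure` is a made-up scale factor.
    """
    return 1.0 - math.exp(-exposure * shedding(day))


def presymptomatic_fraction(symptom_onset_day=5, last_day=14):
    """Share of total shedding that happens before symptoms appear."""
    days = range(last_day + 1)
    before = sum(shedding(d) for d in days if d < symptom_onset_day)
    total = sum(shedding(d) for d in days)
    return before / total


if __name__ == "__main__":
    for d in (1, 4, 8, 12):
        print(d, round(infection_probability(d), 3))
    print(round(presymptomatic_fraction(), 3))
```

With a curve like this, a host can cross the level needed to infect someone else before any symptoms show, which is the situation described above.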
01:49:46.040 | - Yeah, and I think there's so many excellent papers
01:49:50.560 | coming out, but I think I just saw maybe a Nature paper
01:49:53.520 | that said the first week is when you're symptomatic
01:49:58.520 | or asymptomatic, you're the most contagious,
01:50:00.920 | so the highest level of, like, they plot sort of,
01:50:05.200 | in the 14-day period, they collected a bunch of subjects,
01:50:09.200 | and I think the first week is when it's the most contagious.
01:50:11.600 | - Yeah, I think, I mean, I'm waiting,
01:50:15.360 | I'm waiting to see sort of more populated studies,
01:50:20.360 | or the study with higher numbers.
01:50:23.200 | My, one of my favorite studies was, again,
01:50:29.400 | a very recent one, where scientists determined
01:50:32.680 | that tears are not contagious.
01:50:37.680 | So, so there is, you know, so there is no viral shedding
01:50:44.000 | done through tears.
01:50:45.660 | - So they found one moist thing that's not contagious,
01:50:51.160 | and I mean, there's a lot of, I've personally been,
01:50:55.960 | 'cause I'm on a survey paper, somehow,
01:50:59.240 | that's looking at masks, and there's been so much
01:51:03.320 | interesting debates on the efficacy of masks,
01:51:05.720 | and there's a lot of work, and there's a lot of
01:51:08.720 | interesting work on whether this virus is airborne.
01:51:13.360 | I mean, it's a totally open question,
01:51:15.320 | it's leaning one way right now,
01:51:16.800 | but it's a totally open question,
01:51:19.000 | whether it can travel in aerosols long distances.
01:51:22.920 | I mean, do you have a, do you think about this stuff,
01:51:25.120 | do you track this stuff, are you focused on the--
01:51:27.240 | - Yeah, I mean, you know-- - The bioinformatics of it?
01:51:28.960 | - I mean, this is a very important aspect
01:51:33.240 | for our epidemiology study.
01:51:35.700 | I think the, I mean, and it's sort of a very simple
01:51:41.880 | sort of idea, but I agree with people who say that
01:51:48.680 | the mask, the masks work in both ways.
01:51:55.780 | So it not only protects you from the incoming viral
01:52:00.780 | particles, it also, you know, it makes the potentially
01:52:06.300 | contagious person not to spread the viral particles.
01:52:11.420 | - Who is, when they're asymptomatic,
01:52:13.420 | may not even know that they're--
01:52:14.860 | - Exactly. - In fact, it seems to be
01:52:16.780 | there's evidence that they don't,
01:52:18.420 | surgical, and certainly homemade masks,
01:52:21.620 | which is what's needed now, actually,
01:52:23.380 | because there's a huge shortage of,
01:52:25.780 | they don't work as to protect you that well.
01:52:29.500 | They work much better to protect others.
01:52:31.860 | So it's a motivation for us to all wear one.
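[Editor's sketch] The "masks work both ways" argument can be written as a one-line calculation. The multiplicative form and the two filtering fractions are assumptions chosen for illustration, not measured efficacies: `source_filter` is the fraction of outgoing particles blocked by the infectious person's mask, and `wearer_filter` is the fraction of incoming particles blocked by the susceptible person's mask.

```python
def transmission_probability(p_base, source_filter=0.0, wearer_filter=0.0):
    """Per-contact transmission chance after masking on either side.

    All arguments are fractions in [0, 1]; the filter values are hypothetical.
    """
    for name, value in (("p_base", p_base),
                        ("source_filter", source_filter),
                        ("wearer_filter", wearer_filter)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return p_base * (1.0 - source_filter) * (1.0 - wearer_filter)


if __name__ == "__main__":
    # Placeholder numbers: a mask that blocks more on the way out than on the way in.
    print(transmission_probability(0.2, source_filter=0.5, wearer_filter=0.2))
```

Under these assumptions, a mask that filters outgoing particles better than incoming ones lowers risk mostly for other people, which matches the point that homemade masks protect others more than the wearer.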
01:52:36.860 | - Yeah, exactly, 'cause I mean, you don't know where,
01:52:39.660 | you know, and about 30%, as far as I remember,
01:52:44.660 | at least 30% of the asymptomatic cases
01:52:48.380 | are completely asymptomatic, right?
01:52:51.060 | So you don't really cough, you don't,
01:52:53.540 | I mean, you don't have any symptoms, yet you shed viruses.
01:52:59.020 | - Do you think it's possible that we'll all wear masks?
01:53:01.380 | So I wore a mask at a grocery store,
01:53:03.780 | and you just, you get looks, I mean,
01:53:06.500 | this was like a week ago, maybe it's already changed,
01:53:09.860 | because I think CDC or somebody,
01:53:12.780 | I think the CDC has said that we should be wearing masks,
01:53:15.300 | like LA, it's starting to happen.
01:53:18.160 | But do you, it just seems like something
01:53:20.260 | that this country will really struggle doing, or no?
01:53:24.320 | - I hope not.
01:53:26.540 | I mean, you know, it was interesting,
01:53:28.460 | I was looking through the old pictures
01:53:33.220 | during the Spanish flu, and you could see that the,
01:53:38.220 | you know, pretty much everyone was wearing masks,
01:53:43.860 | with some exceptions, and there were like, you know,
01:53:46.500 | sort of iconic photograph of the,
01:53:49.340 | I think it was San Francisco, this tram,
01:53:52.980 | who was refusing to let in a, you know,
01:53:57.140 | someone without a mask.
01:53:59.320 | So I think, well, you know, it's also,
01:54:03.780 | you know, it's related to the fact of,
01:54:06.580 | you know, how much we are scared, right?
01:54:10.780 | So how much do we treat this problem seriously?
01:54:17.720 | And, you know, my take on it is we should,
01:54:22.720 | 'cause it is very serious.
01:54:27.360 | - Yeah, I, from a psychology perspective,
01:54:31.940 | just worry about the entirety, the entire big mess
01:54:36.380 | of a psychology experiment that this is,
01:54:38.680 | whether a mask will help it or hurt it.
01:54:42.620 | You know, masks have a way of distancing us from others
01:54:46.720 | by removing the emotional expression
01:54:50.120 | and all that kind of stuff, but at the same time,
01:54:52.540 | masks also signal that I care about your well-being.
01:54:57.540 | - Exactly.
01:54:58.540 | - So it's a really interesting trade-off that's just--
01:55:02.260 | - Yeah, it's interesting, right, about distancing.
01:55:05.260 | Aren't we distanced enough?
01:55:07.660 | - Right, exactly.
01:55:08.880 | And when we try to come closer together,
01:55:13.720 | when they do reopen the economy,
01:55:16.460 | that's going to be a long road of rebuilding trust
01:55:20.740 | and not all being huge germaphobes.
01:55:23.620 | Let me ask sort of, you have a bit of a Russian accent.
01:55:29.740 | Russian or no?
01:55:31.700 | Russian accent? - Russian.
01:55:33.060 | - So were you born in Russia?
01:55:35.700 | - Yes, and you're too kind.
01:55:38.020 | I have a pretty thick Russian accent.
01:55:40.840 | - What are your favorite memories of Russia?
01:55:45.180 | - So I moved first to Canada
01:55:50.180 | and then to the United States back in '99.
01:55:54.820 | So by that time, I was 22,
01:55:56.860 | so whatever Russian accent I got back then,
01:56:01.860 | it's stuck with me for the rest of my life.
01:56:12.540 | By the time the Soviet Union collapsed,
01:56:15.980 | I was a kid, but sort of old enough
01:56:22.900 | to realize that there are changes.
01:56:26.300 | - Did you want to be a scientist back then?
01:56:30.700 | - Oh yes, oh yeah.
01:56:31.700 | I mean, the first sort of 10 years
01:56:39.420 | of my sort of juvenile life,
01:56:43.920 | I wanted to be a pilot of a passenger jet plane.
01:56:50.060 | - Wow.
01:56:51.740 | - So yes, it was like, I was getting ready
01:56:55.820 | to go to a college to get the degree,
01:57:01.620 | but I've been always fascinated by science
01:57:06.940 | and so not just by math.
01:57:10.580 | Of course, math was one of my favorite subjects,
01:57:13.140 | but biology, chemistry, physics,
01:57:16.580 | somehow I liked those four subjects together.
01:57:21.580 | And yes, so essentially after a certain period of time,
01:57:32.500 | I wanted to actually, back then it was a very popular
01:57:37.500 | sort of area of science called cybernetics.
01:57:42.820 | So it's sort of, it's not really computer science,
01:57:45.440 | but it was like computational robotics in this sense.
01:57:50.440 | And so I really wanted to do that.
01:57:53.620 | But then I realized that, you know,
01:58:01.740 | my biggest passion was in mathematics.
01:58:06.740 | And later I, you know, when, you know,
01:58:11.580 | studying in Moscow State University,
01:58:14.300 | I also realized that I really want to apply the knowledge.
01:58:19.300 | So I really wanted to mix, you know,
01:58:25.940 | the mathematical knowledge that I get
01:58:28.620 | with real life problems.
01:58:30.940 | - And that could be, you mentioned chemistry
01:58:33.820 | and now biology.
01:58:36.320 | And I sort of, does it make you sad?
01:58:41.780 | Maybe I'm wrong on this, but it seems like
01:58:44.660 | it's difficult to be in collaboration,
01:58:50.580 | to do open, big science in Russia.
01:58:54.480 | From my distant perspective in computer science,
01:58:57.860 | I don't, I'm not, we can go to conferences in Russia.
01:59:02.100 | I sadly don't have many collaborators in Russia.
01:59:05.260 | I don't know many people doing great AI work in Russia.
01:59:08.960 | Does that make you sad?
01:59:12.620 | Am I wrong in seeing it this way?
01:59:14.940 | - Well, I mean, I am, I have to tell you,
01:59:17.660 | I am privileged to have collaborators
01:59:21.700 | in bioinformatics in Russia.
01:59:23.860 | And I think this is the bioinformatics school
01:59:26.540 | in Russia is very strong.
01:59:29.020 | We have-- - In Moscow?
01:59:30.680 | - In Moscow, in Novosibirsk, in St. Petersburg,
01:59:35.100 | have great collaborators in Kazan.
01:59:40.260 | And so at least, you know, in terms of,
01:59:46.220 | you know, my area of research--
01:59:51.340 | - There's strong people there.
01:59:53.300 | - Yeah, strong people, a lot of great ideas,
01:59:56.140 | very open to collaborations.
01:59:57.900 | So I, perhaps, you know, it's my luck,
02:00:02.900 | but I haven't experienced any difficulties
02:00:07.820 | in establishing collaborations.
02:00:13.380 | That's bioinformatics, though.
02:00:14.660 | - It could be bioinformatics, too, and it could, yeah,
02:00:17.660 | it could be person-by-person related,
02:00:19.620 | but I just don't feel the warmth and love
02:00:22.660 | that I would, you know, you talk about the seminal people
02:00:27.620 | who are French in artificial intelligence.
02:00:30.100 | France welcomes them with open arms in so many ways.
02:00:34.300 | I just don't feel the love from Russia.
02:00:37.020 | I do on the human beings, like people in general,
02:00:40.980 | like friends and just cool, interesting people,
02:00:44.820 | but from the scientific community, no conferences,
02:00:48.020 | no big conferences, and it's--
02:00:50.660 | - Yeah, it's actually, you know, I'm trying to think.
02:00:54.500 | Yeah, I cannot recall any big AI conferences in Russia.
02:00:59.500 | - It has an effect on, for me,
02:01:04.380 | I haven't, sadly, been back to Russia, so I should,
02:01:08.220 | but my problem is it's very difficult.
02:01:10.260 | So I'm now, I have to renounce citizenship.
02:01:14.220 | - Oh, is that right?
02:01:15.060 | - I mean, I'm a citizen in the United States,
02:01:16.580 | and they make it very difficult.
02:01:17.780 | There's a mess now, right, so.
02:01:19.740 | - Yeah.
02:01:21.140 | - I wanna be able to travel, like, you know, legitimately.
02:01:25.700 | - Yeah, yeah.
02:01:26.540 | - And it's not an obvious process.
02:01:29.220 | They don't make it super easy.
02:01:30.140 | I mean, that's part of that.
02:01:31.260 | Like, you know, it should be super easy
02:01:33.420 | for me to travel there.
02:01:34.700 | - Well, you know, hopefully,
02:01:37.340 | these unfortunate circumstances that we are in
02:01:41.740 | will actually promote the remote collaborations.
02:01:46.740 | - Yes.
02:01:49.220 | - And I think we've, I think what we are experiencing
02:01:52.420 | right now is that you still can do science,
02:01:55.760 | you know, being quarantined in your own homes.
02:02:00.700 | - Yeah.
02:02:01.540 | - Especially when it comes, I mean, you know,
02:02:02.900 | I certainly understand there is a very challenging time
02:02:05.460 | for experimental scientists.
02:02:07.140 | I mean, I have many collaborators who are, you know,
02:02:10.740 | who are affected by that, but for computational scientists.
02:02:14.260 | - Yeah, we're really leaning into the remote communication.
02:02:17.400 | Nevertheless, I had to force you to talk to you in person
02:02:21.660 | 'cause there's something that you just can't do
02:02:24.220 | in terms of conversation like this.
02:02:25.600 | I don't know why, but in person is very much needed.
02:02:29.260 | So I really appreciate you doing it.
02:02:31.900 | You have a collection of science bobbleheads.
02:02:35.100 | - Yes.
02:02:36.420 | - Which look amazing.
02:02:37.980 | Which bobblehead is your favorite
02:02:40.900 | and which real world version,
02:02:43.620 | which scientist is your favorite?
02:02:46.060 | - Yeah.
02:02:46.900 | So yeah, by the way, I was trying to bring it in,
02:02:50.220 | but they are quarantined now in my office.
02:02:53.980 | They sort of demonstrate the social distance.
02:02:56.620 | So they're nicely spaced away from each other.
02:03:01.620 | But so, you know, it's interesting.
02:03:04.220 | So I've been collecting those bobbleheads
02:03:07.660 | for the past, maybe 12 or 13 years.
02:03:10.900 | And it, you know, interestingly enough,
02:03:14.540 | it started with the two bobbleheads of Watson and Crick.
02:03:18.420 | And interestingly enough,
02:03:23.100 | my last bobblehead in this collection for now,
02:03:26.740 | and my favorite one, 'cause I felt so good when I got it,
02:03:31.740 | was the Rosalind Franklin.
02:03:33.800 | And so, you know, when I got--
02:03:38.780 | - Who's the full group?
02:03:40.140 | - So I have Watson, Crick, Newton, Einstein,
02:03:45.060 | Marie Curie, Tesla,
02:03:47.580 | of course, Charles Darwin, sorry, Charles Darwin,
02:03:53.740 | and Rosalind Franklin.
02:03:57.440 | I am definitely missing quite a few of my
02:04:01.420 | favorite scientists.
02:04:04.780 | And, but so, you know,
02:04:06.860 | if I were to add to this collection,
02:04:10.660 | so I would add, of course, Kolmogorov.
02:04:15.660 | (laughing)
02:04:16.540 | - Interesting.
02:04:17.360 | - That's, you know, I've been always fascinated by
02:04:21.840 | his, well, his dedication to science,
02:04:25.820 | but also his dedication to educating young people,
02:04:30.820 | the next generation.
02:04:33.740 | So it's very inspiring.
02:04:36.220 | - He's one of the, okay, yeah,
02:04:38.580 | he's one of Russia's greats.
02:04:40.540 | - Yes, he, yeah, so he also, you know,
02:04:44.860 | the school, the high school that I attended
02:04:47.900 | was named after him, and he was great,
02:04:51.300 | you know, so he founded the school,
02:04:52.940 | and he actually taught there.
02:04:57.220 | - Is this in Moscow?
02:04:59.460 | - Yes.
02:05:00.300 | So, but then, I mean, you know,
02:05:06.660 | other people that I would definitely like
02:05:08.860 | to see in my collections was,
02:05:11.900 | would be Alan Turing, would be John von Neumann.
02:05:16.740 | - Yeah, you're a little bit later
02:05:19.900 | than the computer scientists.
02:05:21.580 | - Yes, well, I mean, they don't make them, you know.
02:05:24.580 | I still am amazed they haven't made Alan Turing.
02:05:28.700 | - Yeah.
02:05:29.540 | - Yet, yes, and I would also add Linus Pauling.
02:05:35.060 | - Linus?
02:05:35.900 | - Pauling.
02:05:36.740 | - Who is Linus Pauling?
02:05:38.500 | - So this is, to me, is one of the greatest chemists,
02:05:43.500 | and the person who actually discovered
02:05:50.380 | the secondary structure of proteins,
02:05:53.620 | who was very close to solving the DNA structure,
02:05:58.620 | and, you know, people argue,
02:06:02.220 | but some of them were pretty sure that if not
02:06:07.220 | for this, you know, Photograph 51 by Rosalind Franklin,
02:06:12.380 | that, you know, Watson and Crick got access to,
02:06:19.980 | he would be the one who would solve it.
02:06:23.500 | - Science is a funny race.
02:06:27.540 | - It is.
02:06:29.580 | - It is. - Let me ask the biggest
02:06:33.740 | and the most ridiculous question.
02:06:35.060 | So you've kind of studied the human body
02:06:39.420 | and its defenses and these enemies that are about,
02:06:44.420 | from a biological perspective, bioinformatics perspective,
02:06:49.900 | a computer scientist's perspective,
02:06:51.780 | how has that made you see your own life?
02:06:55.860 | Sort of the meaning of it,
02:06:59.100 | or just even seeing what it means to be human?
02:07:02.940 | - Well, it certainly makes me realizing
02:07:08.340 | how fragile the human life is.
02:07:11.500 | If you think about this little tiny thing
02:07:15.900 | can impact the life of the whole humankind to such extent,
02:07:23.020 | so, you know, it's something to appreciate
02:07:28.020 | and to, you know, to remember that,
02:07:36.860 | that, you know, we are fragile,
02:07:42.900 | we have to bond together as a society,
02:07:51.860 | and, you know, it also gives me sort of hope
02:07:56.860 | that what we do as scientists is useful.
02:08:03.260 | - Well, I don't think there's a better way to end it.
02:08:07.340 | Dmitry, thank you so much for talking today.
02:08:09.140 | It was an honor.
02:08:10.260 | - Thank you very much.
02:08:12.140 | - Thanks for listening to this conversation
02:08:13.580 | with Dmitry Korkin,
02:08:14.740 | and thank you to our presenting sponsor, Cash App.
02:08:17.420 | Please consider supporting the podcast
02:08:19.100 | by downloading Cash App and using code
02:08:21.780 | LexPodcast.
02:08:22.980 | If you enjoy this podcast, subscribe on YouTube,
02:08:25.560 | review it with five stars on Apple Podcast,
02:08:27.800 | support it on Patreon,
02:08:28.940 | or simply connect with me on Twitter @lexfridman.
02:08:32.660 | And now, let me leave you with some words
02:08:35.940 | from Edward Osborne Wilson, E.O. Wilson.
02:08:40.100 | "The variety of genes on the planet in viruses exceeds
02:08:44.060 | or is likely to exceed that
02:08:46.300 | in all of the rest of life combined."
02:08:50.020 | Thank you for listening and hope to see you next time.
02:08:53.060 | (upbeat music)