Manolis Kellis: Biology of Disease | Lex Fridman Podcast #133


Chapters

0:00 Introduction
2:49 Molecular basis for human disease
26:48 Deadliest diseases
32:31 Genetic component of diseases
41:22 Genetic understanding of disease
57:09 Unified theory of human disease
63:10 Genome circuitry
88:13 CRISPR
99:50 Mitochondria
107:54 Future of biology research
137:30 The genetic circuitry of disease

00:00:00.000 | The following is a conversation with Manolis Kellis,
00:00:03.040 | his third time on the podcast.
00:00:05.360 | He's a professor at MIT
00:00:07.480 | and head of the MIT Computational Biology Group.
00:00:11.200 | This time we went deep on the science, biology and genetics.
00:00:16.200 | So this is a bit of an experiment.
00:00:19.800 | Manolis went back and forth between the basics of biology
00:00:23.360 | to the latest state of the art in the research.
00:00:26.320 | He's a master at this.
00:00:28.120 | So I just sat back and enjoyed the ride.
00:00:31.080 | This conversation happened at 7 a.m.
00:00:33.440 | So it's yet another podcast episode
00:00:36.400 | after an all-nighter for me.
00:00:38.160 | And once again, since the universe has a sense of humor,
00:00:42.000 | this one was a tough one for my brain to keep up,
00:00:45.400 | but I did my best and I never shy away from a good challenge.
00:00:50.240 | Quick mention of each sponsor,
00:00:51.680 | followed by some thoughts related to the episode.
00:00:55.600 | First is SEMrush, the most advanced SEO optimization tool
00:01:00.320 | I've ever come across.
00:01:02.200 | I don't like looking at numbers,
00:01:04.240 | but someone probably should.
00:01:06.040 | It helps you make good decisions.
00:01:08.360 | Second is Pessimist Archive.
00:01:10.740 | They're back.
00:01:11.840 | One of my favorite history podcasts
00:01:13.760 | on why people resist new things,
00:01:16.100 | from recorded music to umbrellas, to cars,
00:01:18.600 | chess, coffee, and the elevator.
00:01:22.120 | Third is Eight Sleep, a mattress that cools itself,
00:01:25.200 | measures heart rate variability, has an app,
00:01:28.160 | and has given me yet another reason
00:01:30.160 | to look forward to sleep,
00:01:31.400 | including the all-important power nap.
00:01:34.760 | And finally, BetterHelp, online therapy
00:01:37.800 | when you want to face your demons
00:01:39.440 | with a licensed professional,
00:01:41.280 | not just by doing the David Goggins-like physical challenges
00:01:45.000 | like I seem to do on occasion.
00:01:47.900 | Please check out these sponsors in the description
00:01:50.160 | to get a discount and to support this podcast.
00:01:54.120 | As a side note, let me say that biology in the brain
00:01:58.080 | and in the various systems of the body fills me with awe
00:02:01.300 | every time I think about how such a chaotic mess
00:02:04.900 | coming from its humble origins in the ocean
00:02:07.400 | was able to achieve such incredibly complex
00:02:10.640 | and robust mechanisms of life that survived
00:02:14.200 | despite all the forces of nature that want to destroy it.
00:02:18.440 | It is so unlike the computing systems
00:02:20.200 | we humans have engineered that it makes me feel
00:02:22.680 | that in order to create artificial general intelligence
00:02:26.120 | and artificial consciousness,
00:02:27.920 | we may have to completely rethink
00:02:30.440 | how we engineer computational systems.
00:02:33.480 | If you enjoy this thing, subscribe on YouTube,
00:02:36.040 | review it with five stars on Apple Podcasts,
00:02:38.280 | follow on Spotify, support on Patreon,
00:02:41.000 | or connect with me on Twitter @lexfridman.
00:02:44.800 | And now, here's my conversation with Manolis Kellis.
00:02:48.700 | So your group at MIT is trying to understand
00:02:52.120 | the molecular basis of human disease.
00:02:54.760 | What are some of the biggest challenges in your view?
00:02:57.360 | - Don't get me started.
00:02:59.320 | I mean, understanding human disease
00:03:01.440 | is the most complex challenge in modern science.
00:03:06.100 | So because human disease is as complex as the human genome,
00:03:11.100 | it is as complex as the human brain,
00:03:14.200 | and it is in many ways even more complex
00:03:18.720 | because the more we understand disease complexity,
00:03:22.240 | the more we start understanding genome complexity
00:03:24.920 | and epigenome complexity and brain circuitry complexity
00:03:28.920 | and immune system complexity and cancer complexity
00:03:31.160 | and so on and so forth.
00:03:32.180 | So traditionally, human disease
00:03:37.000 | was following basic biology.
00:03:39.320 | You would basically understand basic biology
00:03:41.040 | and model organisms like mouse and fly and yeast.
00:03:45.920 | You would understand sort of mammalian biology
00:03:48.960 | and animal biology and eukaryotic biology
00:03:52.280 | in sort of progressive layers of complexity,
00:03:56.480 | getting closer to human phylogenetically.
00:03:59.640 | And you would do perturbation experiments in those species
00:04:03.600 | to see if I knock out a gene, what happens?
00:04:07.800 | And based on the knocking out of these genes,
00:04:10.280 | you would basically then have a way to drive human biology
00:04:14.440 | because you would sort of understand
00:04:15.880 | the functions of these genes.
00:04:16.840 | And then if you find that a human gene, locus,
00:04:20.880 | something that you've mapped from human genetics
00:04:23.680 | to that gene is related to a particular human disease,
00:04:26.960 | you'd say, aha, now I know the function of the gene
00:04:29.240 | from the model organisms.
00:04:31.440 | I can now go and understand the function
00:04:34.000 | of that gene in human.
00:04:35.440 | But this is all changing.
00:04:38.200 | This has dramatically changed.
00:04:39.360 | So that was the old way of doing basic biology.
00:04:41.720 | You would start with the animal models,
00:04:43.360 | the eukaryotic models, the mammalian models,
00:04:46.120 | and then you would go to human.
00:04:48.360 | Human genetics has been so transformed
00:04:51.060 | in the last decade or two that human genetics
00:04:55.420 | is now actually driving the basic biology.
00:04:58.300 | There is more genetic mutation information
00:05:01.960 | in the human genome than there will ever be
00:05:04.480 | in any other species.
00:05:06.120 | - What do you mean by mutation information?
00:05:08.400 | - So perturbations is how you understand systems.
00:05:11.360 | So an engineer builds systems,
00:05:14.080 | and then they know how they work from the inside out.
00:05:16.240 | A scientist studies systems through perturbations.
00:05:19.300 | You basically say, if I poke that balloon,
00:05:22.280 | what's gonna happen?
00:05:23.120 | And I'm gonna film it in super high resolution,
00:05:24.760 | understand, I don't know, air dynamics or fluid dynamics,
00:05:27.280 | if it's filled with water, et cetera.
00:05:28.840 | So you can then make experimentation by perturbation.
00:05:32.160 | And then the scientific process is sort of building models
00:05:35.420 | that best fit the data, designing new experiments
00:05:40.000 | that best test your models and challenge your models
00:05:42.120 | and so on and so forth.
00:05:43.280 | That's the same thing with science.
00:05:44.520 | Basically, if you're trying to understand
00:05:45.840 | biological science, you basically want to do perturbations
00:05:49.560 | that then drive the models.
00:05:54.560 | - So how do these perturbations
00:05:56.080 | allow you to understand disease?
00:05:58.280 | - So if you know that a gene is related to disease,
00:06:02.360 | you don't wanna just know that it's related to the disease.
00:06:04.760 | You wanna know what is the disease mechanism
00:06:07.320 | because you wanna go and intervene.
00:06:09.900 | So the way that I like to describe it
00:06:11.800 | is that traditionally, epidemiology,
00:06:16.800 | which is basically the study of disease,
00:06:19.280 | sort of the observational study of disease,
00:06:21.820 | has been about correlating one thing with another thing.
00:06:25.720 | So if you have a lot of people with liver disease
00:06:28.240 | who are also alcoholics, you might say,
00:06:30.240 | well, maybe the alcoholism is driving the liver disease,
00:06:33.720 | or maybe those who have liver disease
00:06:35.120 | self-medicate with alcohol.
00:06:36.760 | So the connection could be either way.
00:06:39.880 | With genetic epidemiology,
00:06:42.520 | it's about correlating changes in the genome
00:06:45.600 | with phenotypic differences,
00:06:47.680 | and then you know the direction of causality.
00:06:50.040 | So if you know that a particular gene
00:06:52.120 | is related to the disease,
00:06:55.160 | you can basically say, okay,
00:06:57.760 | perturbing that gene in mouse
00:07:00.080 | causes the mice to have X phenotype.
00:07:02.720 | So perturbing that gene in human
00:07:06.280 | causes the humans to have the disease.
00:07:08.180 | So I can now figure out
00:07:09.360 | what are the detailed molecular phenotypes
00:07:11.680 | in the human that are related
00:07:15.240 | to that organismal phenotype in the disease.
00:07:18.760 | So it's all about understanding disease mechanism,
00:07:21.280 | understanding what are the pathways,
00:07:22.840 | what are the tissues,
00:07:24.440 | what are the processes that are associated with the disease
00:07:27.320 | so that we know how to intervene.
00:07:28.920 | You can then prescribe particular medications
00:07:31.320 | that also alter these processes.
00:07:33.300 | You can prescribe lifestyle changes
00:07:35.260 | that also affect these processes, and so on and so forth.
00:07:38.020 | - That's such a beautiful puzzle to try to solve,
00:07:40.960 | like what kind of perturbations
00:07:42.400 | eventually have this ripple effect
00:07:43.960 | that leads to disease across the population.
00:07:47.000 | And then you study that for animals or mice first,
00:07:50.340 | and then see how that might possibly connect to humans.
00:07:54.520 | How hard is that puzzle of trying to figure out
00:07:57.600 | how little perturbations might lead to,
00:08:01.260 | in a stable way, to a disease?
00:08:03.960 | - In animals, we make the puzzle simpler
00:08:08.340 | because we perturb one gene at a time.
00:08:10.920 | That's the beauty of it, it's the power of animal models.
00:08:13.460 | You can basically decouple the perturbations.
00:08:15.780 | You only do one perturbation,
00:08:17.860 | and you only do strong perturbations at a time.
00:08:21.140 | In human, the puzzle is incredibly complex
00:08:25.180 | because, I mean, obviously,
00:08:26.700 | you don't do human experimentation.
00:08:28.500 | You wait for natural selection
00:08:30.780 | and natural genetic variation
00:08:33.180 | to basically do its own experiments,
00:08:34.880 | which it has been doing for hundreds and thousands of years
00:08:38.960 | in the human population,
00:08:40.720 | and for hundreds of thousands of years
00:08:43.240 | across the history leading to the human population.
00:08:48.240 | So you basically take this natural genetic variation
00:08:52.700 | that we all carry within us.
00:08:54.300 | Every one of us carries six million perturbations.
00:08:57.320 | So I've done six million experiments on you,
00:09:00.480 | six million experiments on me,
00:09:01.880 | six million experiments on every one
00:09:03.940 | of seven billion people on the planet.
00:09:06.220 | - What's the six million correspond to?
00:09:08.620 | - Six million unique genetic variants
00:09:11.960 | that are segregating in the human population.
00:09:14.620 | Every one of us carries millions of polymorphic sites,
00:09:19.620 | poly, many, morph, forms.
00:09:22.420 | Polymorphic means many forms, variants.
00:09:25.020 | That basically means that every one of us
00:09:26.900 | has single nucleotide alterations
00:09:29.420 | that we have inherited from mom and from dad
00:09:31.780 | that basically can be thought of
00:09:33.560 | as tiny little perturbations.
00:09:36.000 | Most of them don't do anything,
00:09:38.720 | but some of them lead to all of the phenotypic differences
00:09:42.620 | that we see between us.
00:09:43.900 | The reason why two twins are identical
00:09:46.040 | is because these variants completely determine
00:09:49.100 | the way that I'm gonna look at exactly 93 years of age.
00:09:52.500 | - How happy are you with this kind of data set?
00:09:54.680 | Is it large enough of the human population of Earth?
00:09:59.420 | Is that too big, too small?
00:10:01.680 | - Yeah, so is it large enough is a power analysis question.
00:10:06.680 | And in every one of our grants,
00:10:08.260 | we do a power analysis based on what is the effect size
00:10:11.540 | that I would like to detect
00:10:13.540 | and what is the natural variation in the two forms.
00:10:18.540 | So every time you do a perturbation,
00:10:20.220 | you're asking I'm changing form A into form B.
00:10:23.160 | Form A has some natural phenotypic variation around it
00:10:27.460 | and form B has some natural phenotypic variation around it.
00:10:31.140 | If those variances are large
00:10:33.140 | and the differences between the mean of A
00:10:34.860 | and the mean of B are small,
00:10:37.200 | then you have very little power.
00:10:38.860 | The further the means go apart, that's the effect size,
00:10:43.020 | the more power you have,
00:10:44.420 | and the smaller the standard deviation,
00:10:47.300 | the more power you have.
00:10:48.700 | So basically when you're asking is that sufficiently large,
00:10:52.620 | certainly not for everything,
00:10:54.260 | but we already have enough power
00:10:56.100 | for many of the stronger effects
00:10:59.220 | in the more tight distributions.
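The power reasoning above (more power as the means of form A and form B move apart, and as the spread around each mean shrinks) can be sketched numerically. This is a minimal illustration, not the analysis used in any actual grant: it assumes a two-sided two-sample z-test with equal standard deviation and equal group sizes, and the function name `two_sample_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def two_sample_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes both forms (A and B) share the same standard deviation `sd`
    and that each group has `n_per_group` individuals.
    """
    z = NormalDist()  # standard normal
    # Standard error of the difference between the two group means.
    se = sd * (2.0 / n_per_group) ** 0.5
    # Effect size expressed in units of standard error.
    shift = abs(mean_b - mean_a) / se
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic lands in either rejection tail.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Same sample size, three scenarios: baseline, larger effect, tighter spread.
    for label, diff, sd in [("baseline", 0.1, 1.0),
                            ("larger effect", 0.3, 1.0),
                            ("smaller spread", 0.1, 0.3)]:
        power = two_sample_power(0.0, diff, sd, n_per_group=500)
        print(f"{label}: power = {power:.3f}")
```

With the sample size held fixed, both the larger difference in means and the smaller standard deviation raise the computed power, which matches the qualitative description in the conversation.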
00:11:01.700 | - So that's a hopeful message
00:11:02.860 | that there exists parts of the genome
00:11:07.300 | that have a strong effect that has a small variance.
00:11:12.300 | - That's exactly right.
00:11:14.140 | Unfortunately, those perturbations
00:11:16.100 | are the basis of disease in many cases.
00:11:18.700 | So it's not a hopeful message.
00:11:20.540 | Sometimes it's a terrible message.
00:11:22.640 | It's basically, well, some people are sick,
00:11:24.660 | but if we can figure out
00:11:26.740 | what are these contributors to sickness,
00:11:28.980 | we can then help make them better
00:11:30.700 | and help many other people better
00:11:32.420 | who don't carry that exact mutation,
00:11:34.900 | but who carry mutations on the same pathways.
00:11:38.900 | And that's what we like to call the allelic series of a gene.
00:11:42.780 | You basically have many perturbations
00:11:45.380 | of the same gene in different people,
00:11:49.180 | each with a different frequency in the human population
00:11:52.500 | and each with a different effect
00:11:54.520 | on the individual that carries them.
00:11:56.540 | - So you said in the past,
00:11:58.460 | there would be these small experiments
00:12:00.740 | on perturbations in animal models.
00:12:03.780 | What does this puzzle-solving process look like today?
00:12:08.300 | - So we basically have something
00:12:10.860 | like seven billion people on the planet,
00:12:12.420 | and every one of them carries
00:12:13.660 | something like six million mutations.
00:12:16.660 | You basically have an enormous matrix
00:12:19.660 | of genotype by phenotype
00:12:22.960 | by systematically measuring the phenotype
00:12:26.020 | of these individuals.
00:12:27.780 | And the traditional way of measuring this phenotype
00:12:30.540 | has been to look at one trait at a time.
00:12:33.660 | You would gather families,
00:12:35.540 | and you would sort of paint the pedigrees
00:12:38.260 | of a strong effect,
00:12:40.100 | what we like to call Mendelian mutation.
00:12:42.840 | So a mutation that gets transmitted
00:12:46.060 | in a dominant or a recessive,
00:12:48.680 | but strong effect form,
00:12:50.220 | where basically one locus plays a very big role
00:12:53.340 | in that disease.
00:12:54.540 | And you could then look at carriers versus non-carriers
00:12:56.900 | in one family,
00:12:58.220 | carriers versus non-carriers in another family,
00:13:01.100 | and do that for hundreds, sometimes thousands of families,
00:13:04.300 | and then trace these inheritance patterns,
00:13:06.580 | and then figure out what is the gene that plays that role.
00:13:09.500 | - Is this the matrix that you're showing
00:13:11.340 | in talks or lectures?
00:13:14.340 | - So that matrix is the input
00:13:18.940 | to the stuff that I show in talks.
00:13:20.980 | So basically that matrix
00:13:22.060 | has traditionally been strong effect genes.
00:13:24.940 | What the matrix looks like now
00:13:26.780 | is instead of pedigrees, instead of families,
00:13:29.540 | you basically have thousands,
00:13:31.820 | and sometimes hundreds of thousands
00:13:34.100 | of unrelated individuals,
00:13:36.100 | each with all of their genetic variants,
00:13:38.020 | and each with their phenotype,
00:13:39.820 | for example, height, or lipids,
00:13:42.740 | or whether they're sick or not for a particular trait.
00:13:46.180 | That has been the modern view.
00:13:49.740 | Instead of going to families,
00:13:51.300 | we're going to unrelated individuals
00:13:53.260 | with one phenotype at a time.
00:13:55.780 | And what we're doing now,
00:13:56.980 | as we're maturing in all of these sciences,
00:14:00.460 | is that we're doing this in the context
00:14:02.300 | of large medical systems,
00:14:04.780 | or enormous cohorts that are very well phenotyped
00:14:08.180 | across hundreds of phenotypes,
00:14:10.820 | sometimes with our complete electronic health record.
00:14:13.780 | So you can now start relating
00:14:15.500 | not just one gene, segregating in one family,
00:14:18.220 | not just thousands of variants,
00:14:20.820 | segregating with one phenotype,
00:14:22.860 | but now you can do millions of variants
00:14:24.900 | versus hundreds of phenotypes.
00:14:27.020 | And as a computer scientist,
00:14:28.300 | I mean, deconvolving that matrix,
00:14:30.540 | partitioning it into the layers of biology
00:14:35.340 | that are associated with every one of these elements
00:14:38.500 | is a dream come true.
00:14:39.780 | It's like the world's greatest puzzle.
00:14:42.860 | And you can now solve that puzzle
00:14:45.340 | by throwing in more and more knowledge
00:14:48.540 | about the function of different genomic regions
00:14:52.740 | and how these functions are changed across tissues
00:14:56.300 | and in the context of disease.
00:14:58.100 | And that's what my group and many other groups are doing.
00:15:00.740 | We're trying to systematically relate
00:15:02.340 | this genetic variation with molecular variation
00:15:05.900 | at the expression level of the genes,
00:15:08.380 | at the epigenomic level of the gene regulatory circuitry,
00:15:12.700 | and at the cellular level
00:15:14.260 | of what are the functions that are happening in those cells,
00:15:17.020 | at the single cell level, using single cell profiling,
00:15:20.340 | and then relate all that vast amount of knowledge
00:15:23.620 | computationally with the thousands of traits
00:15:27.500 | that each of these thousands of variants is perturbing.
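The genotype-by-phenotype matrix described above can be illustrated with a toy scan. This is a sketch under loud assumptions: the data are simulated, not real, one variant is made causal on purpose, and the association measure is a plain Pearson correlation rather than whatever statistics real studies use. The names `simulate`, `scan`, and `pearson` are hypothetical helpers.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no association signal
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people=2000, n_variants=50, causal=7, effect=0.5, seed=0):
    """Simulated genotype matrix (people x variants) and one phenotype.

    Each genotype entry counts copies (0, 1, or 2) of the alternate allele.
    Only the variant at index `causal` influences the phenotype.
    """
    rng = random.Random(seed)
    genotypes = [
        [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_variants)]
        for _ in range(n_people)
    ]
    phenotype = [effect * g[causal] + rng.gauss(0, 1) for g in genotypes]
    return genotypes, phenotype


def scan(genotypes, phenotype):
    """Correlate every variant column with the phenotype, one at a time."""
    n_variants = len(genotypes[0])
    return [pearson([g[j] for g in genotypes], phenotype)
            for j in range(n_variants)]


if __name__ == "__main__":
    genotypes, phenotype = simulate()
    scores = scan(genotypes, phenotype)
    best = max(range(len(scores)), key=lambda j: abs(scores[j]))
    print("strongest association at variant index", best)
```

Scaling this idea from one phenotype column to hundreds of phenotypes, and from fifty variants to millions, is what turns the table into the large puzzle described in the conversation.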
00:15:31.460 | - I mean, this is something we talked about,
00:15:32.900 | I think, last time.
00:15:34.220 | So there's these effects at different levels that happen.
00:15:36.460 | You said at a single cell level,
00:15:38.780 | you're trying to see things that happen
00:15:40.460 | due to certain perturbations.
00:15:42.700 | And then, it's not just like a puzzle
00:15:45.740 | of perturbation and disease.
00:15:49.460 | It's perturbation, then effect at a cellular level,
00:15:53.420 | then at an organ level, at a body,
00:15:55.780 | like how do you disassemble this
00:16:00.020 | into what your group is working on?
00:16:02.420 | You're basically taking a bunch of the hard problems
00:16:05.340 | in the space.
00:16:06.500 | How do you break apart a difficult disease
00:16:09.660 | and break it apart into puzzles
00:16:14.140 | that you can now start solving?
00:16:15.440 | - So there's a struggle here.
00:16:17.100 | Computer scientists love hard puzzles.
00:16:19.660 | And they're like, "Oh, I wanna build a method
00:16:21.820 | that just deconvolves the whole thing computationally."
00:16:24.620 | And that's very tempting and it's very appealing,
00:16:28.420 | but biologists just like to decouple
00:16:31.540 | that complexity experimentally,
00:16:32.980 | to just like peel off layers of complexity experimentally.
00:16:35.940 | And that's what many of these modern tools
00:16:37.780 | that my group and others have both developed and used.
00:16:41.540 | The fact that we can now figure out tricks
00:16:44.500 | for peeling off these layers of complexity
00:16:46.780 | by testing one cell type at a time,
00:16:49.780 | or by testing one cell at a time.
00:16:52.740 | And you could basically say,
00:16:54.100 | what is the effect of these genetic variants
00:16:55.880 | associated with Alzheimer's on human brain?
00:16:58.140 | Human brain sounds like, oh, it's an organ, of course,
00:17:02.520 | just go one organ at a time.
00:17:04.200 | But human brain has, of course,
00:17:05.920 | dozens of different brain regions.
00:17:08.420 | And within each of these brain regions,
00:17:10.200 | dozens of different cell types.
00:17:12.500 | And every single type of neuron,
00:17:14.660 | every single type of glial cell,
00:17:16.820 | between astrocytes, oligodendrocytes, microglia,
00:17:20.540 | between all of the neural cells and the vascular cells
00:17:25.540 | and the immune cells that are co-inhabiting the brain
00:17:29.520 | between the different types of excitatory
00:17:31.660 | and inhibitory neurons that are sort of interacting
00:17:34.360 | with each other between different layers of neurons
00:17:37.340 | in the cortical layers.
00:17:39.020 | Every single one of these has a different type of function
00:17:44.020 | to play in cognition, in interaction with the environment,
00:17:49.380 | in maintenance of the brain, in energetic needs,
00:17:54.140 | in feeding the brain with blood, with oxygen,
00:17:58.300 | in clearing out the debris that are resulting
00:18:01.460 | from the super high energy production
00:18:03.780 | of cognition in humans.
00:18:06.820 | So all of these things are basically
00:18:09.580 | potentially deconvolvable computationally,
00:18:15.200 | but experimentally, you can just do single cell profiling
00:18:19.140 | of dozens of regions of the brain
00:18:21.060 | across hundreds of individuals, across millions of cells.
00:18:24.420 | And then now you have pieces of the puzzle
00:18:28.100 | that you can then put back together
00:18:30.060 | to understand that complexity.
00:18:33.180 | - I mean, first of all, I mean, the human brain,
00:18:35.100 | the cells in the human brain are the most,
00:18:37.740 | okay, maybe I'm romanticizing it,
00:18:39.500 | but cognition seems to be very complicated.
00:18:42.340 | So separating into the function,
00:18:45.900 | breaking Alzheimer's down to the cellular level
00:18:52.780 | seems very challenging.
00:18:54.560 | Is that basically you're trying to find a way
00:18:59.540 | that some perturbation in genome results
00:19:05.220 | in some obvious major dysfunction in the cell?
00:19:10.220 | You're trying to find something like that.
00:19:14.340 | - Exactly.
00:19:15.180 | So what does human genetics do?
00:19:16.900 | Human genetics basically looks at the whole path
00:19:19.580 | from genetic variation all the way to disease.
00:19:22.260 | So human genetics has basically taken thousands
00:19:26.660 | of Alzheimer's cases and thousands of controls
00:19:31.700 | matched for age, for sex, for environmental backgrounds
00:19:36.700 | and so on and so forth.
00:19:38.320 | And then looked at that map where you're asking
00:19:41.640 | what are the individual genetic perturbations
00:19:44.540 | and how are they related to all the way
00:19:46.940 | to Alzheimer's disease?
00:19:48.360 | And that has actually been quite successful.
00:19:51.200 | So we now have more than 27 different loci,
00:19:54.820 | these are genomic regions that are associated
00:19:57.780 | with Alzheimer's at this end to end level.
00:20:02.340 | But the moment you sort of break up that very long path
00:20:05.180 | into smaller levels, you can basically say from genetics,
00:20:09.700 | what are the epigenomic alterations
00:20:12.380 | at the level of gene regulatory elements?
00:20:14.240 | Where that genetic variant perturbs
00:20:16.660 | the control region nearby.
00:20:19.060 | That effect is much larger.
00:20:20.580 | - You mean much larger in terms of
00:20:23.500 | this down the line impact?
00:20:25.460 | - It's much larger in terms of the measurable effect.
00:20:28.100 | This A versus B variance is actually so much cleanly defined
00:20:33.100 | when you go to the shorter branches.
00:20:35.660 | Because for one genetic variant to affect Alzheimer's,
00:20:39.380 | that's a very long path.
00:20:40.740 | That basically means that in the context of millions
00:20:42.940 | of these six million variants that every one of us carries,
00:20:45.500 | that one single nucleotide has a detectable effect
00:20:49.340 | all the way to the end.
00:20:51.300 | I mean, it's just mind boggling that that's even possible.
00:20:54.620 | - That's crazy. - But indeed, yeah.
00:20:55.940 | But indeed, there are such effects.
00:20:57.540 | - So the hope is, or the most, scientifically speaking,
00:21:01.420 | the most effective place where to detect
00:21:03.900 | the alteration that results in disease
00:21:07.460 | is earlier on in the pipeline, as early as possible.
00:21:10.700 | - It's a trade off.
00:21:12.620 | If you go very early on in the pipeline,
00:21:14.860 | now each of these epigenomic alterations,
00:21:17.680 | for example, this enhancer control region,
00:21:20.460 | is active maybe 50% less, which is a dramatic effect.
00:21:25.420 | Now you can ask, well, how much does changing
00:21:27.340 | one regulatory region in the genome
00:21:29.340 | in one cell type change disease?
00:21:31.260 | Well, that path is now long.
00:21:32.700 | So if you instead look at expression,
00:21:37.640 | the path between genetic variation
00:21:39.140 | and the expression of one gene
00:21:40.420 | goes through many enhancer regions,
00:21:42.700 | and therefore it's a subtler effect at the gene level.
00:21:45.500 | But then now you're closer because one gene is acting
00:21:49.620 | in the context of only 20,000 other genes,
00:21:52.300 | as opposed to one enhancer acting
00:21:53.700 | in the context of 2 million other enhancers.
00:21:56.320 | So you basically now have genetic,
00:22:00.020 | epigenomic, the circuitry, transcriptomic,
00:22:03.140 | the gene expression level, and then cellular,
00:22:06.580 | where you can basically say,
00:22:07.500 | I can measure various properties of those cells.
00:22:10.980 | What is the calcium influx rate
00:22:15.060 | when I have this genetic variation?
00:22:17.380 | What is the synaptic density?
00:22:19.580 | What is the electric impulse conductivity?
00:22:22.700 | And so on and so forth.
00:22:24.380 | So you can measure things along this path to disease,
00:22:29.380 | and you can also measure endophenotypes.
00:22:32.560 | You can basically measure your brain activity.
00:22:37.380 | You can do imaging in the brain.
00:22:39.580 | You can basically measure, I don't know,
00:22:41.180 | the heart rate, the pulse, the lipids,
00:22:43.140 | the amount of blood secreted, and so on and so forth.
00:22:46.540 | And then through all of that,
00:22:48.740 | you can basically get at the path to causality,
00:22:52.300 | the path to disease.
00:22:53.480 | - And is there something beyond cellular?
00:22:57.540 | So you mentioned lifestyle interventions
00:23:01.340 | or changes as a way to,
00:23:03.340 | or like be able to prescribe changes in lifestyle.
00:23:07.740 | Like what about organs?
00:23:09.240 | What about like the function of the body as a whole?
00:23:13.220 | - Yeah, absolutely.
00:23:14.060 | So basically, when you go to your doctor,
00:23:16.020 | they always measure your pulse.
00:23:18.140 | They always measure your height.
00:23:19.260 | They always measure your weight, your BMI.
00:23:21.380 | So basically, these are just very basic variables.
00:23:23.980 | But with digital devices nowadays,
00:23:26.300 | you can start measuring hundreds of variables
00:23:27.900 | for every individual.
00:23:29.500 | You can basically also phenotype cognitively
00:23:32.980 | through tests, Alzheimer's patients.
00:23:37.180 | There are cognitive tests that you can measure,
00:23:38.940 | that you typically do for cognitive decline,
00:23:43.620 | these mini mental observations
00:23:46.060 | that you have specific questions to.
00:23:48.420 | You can think of sort of enlarging
00:23:49.940 | the set of cognitive tests.
00:23:51.860 | So in the mouse, for example,
00:23:53.020 | you do experiments for how do they get out of mazes?
00:23:55.580 | How do they find food?
00:23:57.100 | Whether they recall a fear,
00:23:59.160 | whether they shake in a new environment,
00:24:01.300 | and so on and so forth.
00:24:02.340 | In the human, you can have much, much richer phenotypes,
00:24:04.900 | where you can basically say,
00:24:06.340 | not just imaging at the organ level,
00:24:10.680 | but, and all kinds of other activities at the organ level,
00:24:13.740 | but you can also do at the organism level,
00:24:17.540 | you can do behavioral tests.
00:24:19.360 | And how did they do on empathy?
00:24:20.980 | How did they do on memory?
00:24:22.740 | How did they do on long-term memory
00:24:24.780 | versus short-term memory?
00:24:26.020 | And so on and so forth.
00:24:26.860 | - I love how you're calling that phenotype.
00:24:28.740 | I guess it is. - It is.
00:24:30.900 | - But like your behavior patterns
00:24:32.500 | that might change over a period of a life.
00:24:36.060 | Your ability to remember things,
00:24:38.700 | your ability to be, yeah, empathetic or emotionally,
00:24:42.620 | or your intelligence, perhaps even.
00:24:44.940 | - Yeah, but intelligence has hundreds of variables.
00:24:46.940 | It can be your math intelligence,
00:24:48.380 | your literary intelligence,
00:24:49.460 | your puzzle-solving intelligence, your logic.
00:24:51.300 | It could be like hundreds of things.
00:24:52.780 | - And all of that, we're able to measure that
00:24:55.660 | better and better.
00:24:56.500 | And all of that could be connected to the entire pipeline.
00:24:58.820 | - We used to think of each of these as a single variable,
00:25:01.180 | like intelligence.
00:25:02.020 | I mean, that's ridiculous.
00:25:03.180 | It's basically dozens of different genes
00:25:06.600 | that are controlling every single variable.
00:25:10.780 | You can basically think of,
00:25:12.340 | imagine us in a video game
00:25:14.060 | where every one of us has measures of strength,
00:25:17.620 | stamina, energy left, and so on and so forth.
00:25:20.840 | But you could click on each of those five bars
00:25:23.060 | that are just the main bars,
00:25:24.120 | and each of those will just give you then hundreds of bars.
00:25:27.020 | And you can basically say,
00:25:27.980 | "Okay, great, for my machine learning task,
00:25:31.180 | I want someone who, a human,
00:25:34.280 | who has these particular forms of intelligence.
00:25:37.200 | I require now these 20 different things."
00:25:40.500 | And then you can combine those things
00:25:42.240 | and then relate them to, of course,
00:25:44.580 | performance in a particular task.
00:25:46.180 | But you can also relate them to genetic variation
00:25:49.420 | that might be affecting different parts of the brain.
00:25:52.700 | For example, your frontal cortex
00:25:54.260 | versus your temporal cortex
00:25:55.460 | versus your visual cortex, and so on and so forth.
00:25:57.920 | So genetic variation that affects expression of genes
00:26:01.060 | in different parts of your brain
00:26:02.540 | can basically affect your music ability,
00:26:05.120 | your auditory ability, your smell,
00:26:07.460 | your, you know, just dozens of different phenotypes
00:26:11.340 | can be broken down into, you know,
00:26:14.100 | hundreds of cognitive variables,
00:26:15.780 | and then relate each of those to thousands of genes
00:26:19.060 | that are associated with them.
00:26:20.880 | - So somebody who loves RPGs, role-playing games,
00:26:24.640 | there's too few variables that we can control.
00:26:28.340 | So I'm excited, if we're in fact living in a simulation,
00:26:31.180 | this is a video game,
00:26:32.560 | I'm excited by the quality of the video game.
00:26:35.700 | The game designer did a hell of a good job,
00:26:39.620 | so we're impressed.
00:26:41.020 | - So I don't know, the sunset last night
00:26:42.460 | was a little unrealistic.
00:26:43.620 | (both laughing)
00:26:45.060 | - Yeah, yeah, the graphics.
00:26:47.020 | - Exactly.
00:26:47.860 | - Come on, Nvidia.
00:26:48.860 | To zoom back out, we've been talking about
00:26:50.900 | the genetic origins of diseases,
00:26:53.980 | but I think it's fascinating to talk about
00:26:57.140 | what are the most important diseases to understand,
00:27:00.780 | and especially as it connects to the things
00:27:03.360 | that you're working on.
00:27:05.220 | - So it's very difficult to think about
00:27:07.460 | important diseases to understand.
00:27:08.820 | There's many metrics of importance.
00:27:10.260 | One is lifestyle impact.
00:27:13.100 | I mean, if you look at COVID,
00:27:14.300 | the impact on lifestyle has been enormous.
00:27:16.380 | So understanding COVID is important
00:27:18.500 | because it has impacted the wellbeing
00:27:22.040 | in terms of ability to have a job,
00:27:24.620 | ability to have an apartment,
00:27:25.660 | ability to go to work,
00:27:26.920 | ability to have a mental circle of support,
00:27:30.580 | and all of that for millions of Americans,
00:27:33.540 | like huge, huge impact.
00:27:35.460 | So that's one aspect of importance.
00:27:36.980 | So basically mental disorders,
00:27:38.780 | Alzheimer's has a huge importance
00:27:41.060 | in the wellbeing of Americans.
00:27:43.740 | Whether or not it kills someone,
00:27:46.000 | for many, many years, it has a huge impact.
00:27:48.140 | So the first measure of importance is just wellbeing.
00:27:51.900 | - Impact on the quality of life.
00:27:53.540 | - Impact on the quality of life, absolutely.
00:27:55.780 | The second metric, which is much easier to quantify,
00:27:58.420 | is deaths.
00:27:59.860 | - What is the number one killer?
00:28:01.860 | - The number one killer is actually heart disease.
00:28:04.700 | It is actually killing 650,000 Americans per year.
00:28:08.940 | Number two is cancer, with 600,000 Americans.
00:28:14.100 | Number three, far, far down the list, is accidents.
00:28:17.440 | Every single accident combined.
00:28:19.520 | So basically you read the news, accidents,
00:28:22.100 | like there was a huge car crash, all over the news.
00:28:25.700 | But the number of deaths, number three by far, 167,000.
00:28:30.700 | Lower respiratory disease, so that's asthma,
00:28:33.660 | not being able to breathe, and so on and so forth, 160,000.
00:28:37.900 | Alzheimer's, number five, with 120,000.
00:28:42.300 | And then stroke, brain aneurysms, and so on and so forth,
00:28:44.940 | that's 147,000.
00:28:46.700 | Diabetes and metabolic disorders, et cetera, that's 85,000.
00:28:51.080 | The flu, at 60,000.
00:28:53.540 | Suicide, 50,000.
00:28:56.060 | And then overdose, et cetera,
00:28:58.140 | you know, goes further down the list.
00:29:00.080 | So of course, COVID has creeped up
00:29:01.720 | to be the number three killer this year,
00:29:04.820 | with more than 100,000 Americans, and counting.
00:29:09.820 | And, you know, but if you think about sort of
00:29:14.500 | what do we use, what are the most important diseases,
00:29:17.020 | you have to understand both the quality of life,
00:29:20.460 | and the just sheer number of deaths,
00:29:22.820 | and just numbers of years lost, if you wish.
00:29:25.100 | - And each of these diseases you can think of as,
00:29:28.820 | and also including terrorist attacks,
00:29:30.900 | and school shootings, for example,
00:29:32.600 | things which lead to fatalities,
00:29:36.600 | you can look at as problems that could be solved.
00:29:41.060 | And some problems are harder to solve than others.
00:29:44.900 | I mean, that's part of the equation.
00:29:46.740 | So maybe if you look at these diseases,
00:29:48.620 | if you look at heart disease, or cancer, or Alzheimer's,
00:29:53.100 | or just like schizophrenia, and obesity,
00:29:56.700 | that'd be like, not necessarily things that kill you,
00:29:59.620 | but affect the quality of life.
00:30:01.620 | Which problems are solvable, which aren't,
00:30:04.940 | which are harder to solve, which aren't?
00:30:07.300 | - I love your question, because it puts it in the context
00:30:09.780 | of a global effort, rather than just a local effort.
00:30:14.780 | So basically, if you look at the global aspect,
00:30:19.580 | exercise and nutrition are two interventions
00:30:22.620 | that we can, as a society, do a much better job at.
00:30:26.080 | So if you think about sort of the availability
00:30:28.900 | of cheap food, it's extremely high in calories,
00:30:32.900 | it's extremely detrimental for you,
00:30:34.780 | like a lot of processed food, et cetera.
00:30:36.860 | So if we change that equation, and as a society,
00:30:40.900 | we made availability of healthy food much, much easier,
00:30:44.340 | and charged a burger at McDonald's,
00:30:48.500 | the price that it costs on the health system,
00:30:51.520 | then people would actually start buying more healthy foods.
00:30:56.260 | So basically, that's sort of a societal intervention,
00:30:58.660 | if you wish.
00:30:59.500 | In the same way, increasing empathy, increasing education,
00:31:03.980 | increasing the social framework and support
00:31:08.020 | would basically lead to fewer suicides,
00:31:09.960 | it would lead to fewer murders,
00:31:11.800 | it would lead to fewer deaths overall.
00:31:15.740 | So that's something that we as a society can do.
00:31:18.540 | You can also think about external factors
00:31:20.800 | versus internal factors.
00:31:21.900 | So the external factors are basically communicable diseases,
00:31:24.700 | like COVID, like the flu, et cetera.
00:31:26.960 | And the internal factors are basically things like cancer
00:31:31.780 | and Alzheimer's, where basically your genetics
00:31:34.260 | will eventually drive you there.
00:31:36.580 | And then of course, with all of these factors,
00:31:41.620 | every single disease has both a genetic component
00:31:44.480 | and environmental component.
00:31:46.100 | So heart disease, huge genetic contribution.
00:31:49.580 | Alzheimer's, it's like 60% plus genetic.
00:31:55.620 | So I think it's like 79% heritability.
00:31:58.960 | So that basically means that genetics alone
00:32:01.560 | explains 79% of Alzheimer's incidence.
00:32:06.040 | And yes, there's a 21% environmental component
00:32:10.240 | where you could basically enrich your cognitive environment,
00:32:15.240 | enrich your social interactions, read more books,
00:32:19.420 | learn a foreign language, go running,
00:32:21.900 | sort of have a more fulfilling life.
00:32:24.680 | All of that will actually decrease Alzheimer's,
00:32:26.640 | but there's a limit to how much that can impact
00:32:29.360 | because of the huge genetic footprint.
00:32:31.200 | - So this is fascinating.
00:32:32.040 | So each one of these problems have a genetic component
00:32:36.720 | and an environment component.
00:32:38.760 | And so when there's a genetic component,
00:32:41.240 | what can we do about some of these diseases?
00:32:43.000 | What have you worked on?
00:32:44.760 | What can you say that's in terms of problems
00:32:47.160 | that are solvable here or understandable?
00:32:50.400 | - So my group works on the genetic component,
00:32:52.800 | but I would argue that understanding the genetic component
00:32:56.040 | can have a huge impact even on the environmental component.
00:32:59.560 | Why is that?
00:33:00.560 | Because genetics gives us access to mechanism.
00:33:03.440 | And if we can alter the mechanism,
00:33:05.520 | if we can impact the mechanism,
00:33:07.680 | we can perhaps counteract
00:33:09.600 | some of the environmental components.
00:33:11.000 | - Oh, interesting.
00:33:11.960 | - So understanding the biological mechanisms
00:33:15.720 | leading to disease is extremely important
00:33:18.300 | in being able to intervene.
00:33:20.700 | But when you can intervene,
00:33:22.920 | the analogy that I like to give is, for example, for obesity.
00:33:27.320 | Think of it as a giant bathtub of fat.
00:33:30.280 | There's basically fat coming in from your diet
00:33:32.920 | and there's fat coming out from your exercise.
00:33:36.640 | That's an in-out equation.
00:33:40.120 | And that's the equation that everybody's focusing on.
00:33:42.100 | But your metabolism impacts that bathtub.
00:33:47.360 | Basically, your metabolism controls
00:33:50.460 | the rate at which you're burning energy.
00:33:52.980 | It controls the rate at which you're storing energy.
00:33:55.620 | And it also teaches you about the various valves
00:34:01.180 | that control the input and the output equation.
00:34:03.900 | So if we can learn from the genetics, the valves,
00:34:08.020 | we can then manipulate those valves.
00:34:09.920 | And even if the environment is feeding you a lot of fat
00:34:14.100 | and getting a little of that out,
00:34:15.960 | you can just poke another hole at the bathtub
00:34:18.180 | and just get a lot of the fat out.
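The bathtub analogy above can be sketched as a simple in-out balance. This is a minimal illustration, assuming a constant daily intake and burn in arbitrary units; the function name `simulate_fat_store`, the `extra_outflow` "valve", and all numbers are hypothetical and not from the conversation.

```python
def simulate_fat_store(days, intake, burn, extra_outflow=0.0, start=0.0):
    """Track a stored-fat level over time (arbitrary units).

    Each day the level gains `intake` and loses `burn + extra_outflow`.
    `extra_outflow` stands in for an additional metabolic "valve" opened
    by an intervention; it is an illustrative assumption.
    """
    level = start
    history = []
    for _ in range(days):
        # The store cannot go below empty.
        level = max(0.0, level + intake - burn - extra_outflow)
        history.append(level)
    return history


# Same diet and exercise in both runs; only the extra valve differs.
baseline = simulate_fat_store(days=30, intake=100.0, burn=90.0, start=200.0)
with_valve = simulate_fat_store(
    days=30, intake=100.0, burn=90.0, extra_outflow=15.0, start=200.0
)

print(baseline[-1], with_valve[-1])
```

With identical intake and burn, the run with the extra outflow drains the store while the baseline keeps filling, which is the point of "poking another hole in the bathtub".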
00:34:19.900 | - Yeah, that's fascinating.
00:34:21.680 | Yeah, so we're not just passive observers of our genetics.
00:34:25.700 | The more we understand,
00:34:26.860 | the more we can come up with actual treatments.
00:34:29.540 | - And I think that's an important aspect to realize.
00:34:33.080 | When people are thinking about strong effect
00:34:36.180 | versus weak effect variants.
00:34:37.980 | So some variants have strong effects.
00:34:39.520 | We talked about these Mendelian disorders
00:34:41.300 | where a single gene has a sufficiently large effect,
00:34:43.980 | penetrance, expressivity, and so on and so forth,
00:34:46.400 | that basically you can trace it in families
00:34:50.760 | with cases and not cases, cases not cases,
00:34:53.160 | and so on and so forth.
00:34:54.960 | But even the, you know, but,
00:34:58.320 | so these are the genes that everybody says,
00:35:01.280 | oh, that's the genes we should go after
00:35:02.920 | because that's a strong effect gene.
00:35:04.920 | I like to think about it slightly differently.
00:35:06.600 | These are the genes where genetic impacts
00:35:11.040 | that have a strong effect were tolerated.
00:35:13.940 | Because every single time we have a genetic association
00:35:17.500 | with disease, it depends on two things.
00:35:20.040 | Number one, the obvious one,
00:35:21.880 | whether the gene has an impact on the disease.
00:35:24.500 | Number two, the more subtle one,
00:35:26.220 | is whether there is genetic variation
00:35:30.300 | standing and circulating and segregating
00:35:33.180 | in the human population that impacts that gene.
00:35:36.400 | Some genes are so darn important
00:35:41.000 | that if you mess with them even a tiny little amount,
00:35:44.360 | that person's dead.
00:35:46.360 | So those genes don't have variation.
00:35:48.940 | You're not gonna find a genetic association
00:35:51.100 | if you don't have variation.
00:35:52.960 | That doesn't mean that the gene has no role.
00:35:55.380 | It simply that the gene,
00:35:56.460 | it simply means that the gene tolerates no mutations.
00:35:59.060 | - So that's actually a strong signal
00:36:00.420 | when there's no variation.
00:36:01.380 | That's so fascinating. - Exactly.
00:36:03.260 | Genes that have very little variation are hugely important.
00:36:06.740 | You can actually rank the importance of genes
00:36:08.900 | based on how little variation they have.
00:36:10.700 | And those genes that have very little variation
00:36:13.700 | but no association with disease,
00:36:16.740 | that's a very good metric to say,
00:36:18.140 | oh, that's probably a developmental gene
00:36:19.840 | because we're not good at measuring those phenotypes.
00:36:22.780 | So it's genes that you can tell evolution
00:36:25.100 | has excluded mutations from,
00:36:27.280 | but yet we can't see them associated
00:36:29.940 | with anything that we can measure nowadays.
00:36:32.040 | It's probably early embryonic lethal.
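The idea of ranking genes by how little variation they tolerate can be sketched as follows. This is only an illustration under stated assumptions: the gene names, the counts, the observed/expected ratio as the ranking score, and the 0.2 cutoff are all made up for the example and do not come from the conversation or from any real dataset.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    observed_variants: int         # variants seen in the population (made-up)
    expected_variants: float       # variants expected by chance (made-up)
    has_disease_association: bool  # any known association (made-up)


def constraint_ratio(gene: Gene) -> float:
    """Observed/expected variation; lower means less variation is tolerated."""
    return gene.observed_variants / gene.expected_variants


genes = [
    Gene("GENE_A", 2, 40.0, False),
    Gene("GENE_B", 35, 38.0, True),
    Gene("GENE_C", 5, 50.0, True),
    Gene("GENE_D", 60, 55.0, False),
]

# Most constrained (least variation) first.
ranked = sorted(genes, key=constraint_ratio)

# Highly constrained genes with no measurable association are the
# candidates for "probably developmental, early embryonic lethal".
candidates = [
    g.name
    for g in ranked
    if constraint_ratio(g) < 0.2 and not g.has_disease_association
]

print([g.name for g in ranked])
print(candidates)
```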
00:36:34.380 | - What are all the words you just said?
00:36:36.000 | Early embryonic what?
00:36:37.140 | - Lethal.
00:36:38.260 | - Meaning?
00:36:39.100 | - Meaning that--
00:36:39.920 | - If you don't have that, then--
00:36:40.760 | - If you don't have that, you'll die.
00:36:41.580 | - Okay.
00:36:42.420 | There's a bunch of stuff that is required
00:36:45.980 | for a stable functional organism--
00:36:48.380 | - Exactly. - Across the board.
00:36:49.700 | - Exactly. - For our entire,
00:36:51.900 | for an entire species, I guess.
00:36:53.840 | - If you look at sperm, it expresses thousands of proteins.
00:36:58.540 | Does sperm actually need thousands of proteins?
00:37:01.000 | No, but it's probably just testing them.
00:37:04.060 | So my speculation-- - Early on.
00:37:06.460 | - Is that misfolding of these proteins
00:37:09.500 | is an early test for failure.
00:37:11.880 | So that out of the millions of sperm that are possible,
00:37:16.440 | you select the subset that are just not
00:37:19.120 | grossly misfolding thousands of proteins.
00:37:21.840 | - So it's kind of an assert that this is folded correctly.
00:37:25.560 | - Correct.
00:37:26.400 | - Yeah, this, just because if this little thing
00:37:29.520 | about the folding of a protein isn't correct,
00:37:32.380 | that probably means somewhere down the line
00:37:34.520 | there's a bigger issue.
00:37:35.560 | - That's exactly right, so fail fast.
00:37:37.480 | So basically if you look at the mammalian investment
00:37:41.080 | in a newborn, that investment is enormous
00:37:45.880 | in terms of resources.
00:37:47.600 | So mammals have basically evolved mechanisms for fail fast.
00:37:52.600 | Where basically in those early months of development,
00:37:56.400 | I mean, it's horrendous, of course, at the personal level
00:37:59.680 | when you lose your future child,
00:38:03.320 | but in some ways there's so little hope
00:38:08.320 | for that child to develop and sort of make it
00:38:11.100 | through the remaining months that sort of fail fast
00:38:12.920 | is probably a good evolutionary principle.
00:38:16.360 | - Yeah, from an evolutionary perspective.
00:38:17.480 | - For mammals, and of course humans have a lot
00:38:22.100 | of medical resources that you can sort of give
00:38:24.780 | those children a chance.
00:38:26.720 | And we have so much more success in sort of giving folks
00:38:32.560 | who have these strong carrier mutations a chance,
00:38:35.400 | but if they're not even making it
00:38:36.640 | through the first three months, we're not gonna see them.
00:38:39.740 | So that's why when we say what are the most important genes
00:38:43.580 | to focus on, the ones that have a strong effect mutation
00:38:46.840 | or the ones that have a weak effect mutation,
00:38:49.200 | well, the jury might be out,
00:38:51.280 | because the ones that have a strong effect mutation
00:38:53.920 | are basically not mattering as much.
00:38:58.640 | The ones that only have weak effect mutations,
00:39:01.900 | by understanding through genetics
00:39:04.720 | that they have a weak effect mutation
00:39:07.120 | and understanding that they have a causal role
00:39:08.860 | on the disease, we can then say, okay, great,
00:39:11.280 | evolution has only tolerated a 2% change in that gene.
00:39:14.700 | Pharmaceutically, I can go in and induce a 70% change
00:39:19.480 | in that gene.
00:39:20.320 | And maybe I will poke another hole at the bathtub
00:39:24.800 | that was not easy to control in many of the studies
00:39:30.500 | in many of the other strong effect genetic variants.
00:39:35.060 | - So there's this beautiful map across the population
00:39:39.980 | of things that you're saying strong and weak effects,
00:39:42.660 | so stuff with a lot of mutations
00:39:44.880 | and stuff with little mutations, with no mutations.
00:39:48.260 | And you have this map and it lays out the puzzle.
00:39:51.320 | - Yeah, so when I say strong effect,
00:39:53.420 | I mean at the level of individual mutations.
00:39:56.020 | So basically genes where,
00:39:58.900 | so you have to think of first the effect of the gene
00:40:03.120 | on the disease.
00:40:03.960 | Remember how I was sort of painting that map earlier
00:40:07.100 | from genetics all the way to phenotype.
00:40:09.060 | That gene can have a strong effect on the disease,
00:40:14.140 | but the genetic variant might have a weak effect
00:40:16.880 | on the gene.
00:40:18.840 | So basically when you ask what is the effect
00:40:22.440 | of that genetic variant on the disease,
00:40:24.880 | it could be that that genetic variant impacts the gene
00:40:27.200 | by a lot, and then the gene impacts the disease by a little,
00:40:30.920 | or it could be that the genetic variant
00:40:32.440 | impacts the gene by a little,
00:40:33.600 | and then the gene impacts the disease by a lot.
00:40:35.800 | So what we care about is genes that impact the disease a lot,
00:40:40.800 | but genetics gives us the full equation.
00:40:43.560 | And what I would argue is if we couple the genetics
00:40:48.120 | with expression variation,
00:40:51.960 | to basically ask what genes change by a lot,
00:40:54.720 | and which genes correlate with disease by a lot,
00:41:01.240 | even if the genetic variants change them by a little,
00:41:04.560 | then those are the best places to intervene.
00:41:07.760 | Those are the best places where pharmaceutically,
00:41:10.360 | if I have even a modest effect,
00:41:12.680 | I will have a strong effect on the disease.
00:41:15.240 | Whereas those genetic variants
00:41:16.280 | that have a huge effect on the disease,
00:41:18.200 | I might not be able to change that gene by this much
00:41:20.480 | without affecting all kinds of other things.
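The two-step chain described above (variant affects gene, gene affects disease) can be sketched with a toy multiplicative model. The linear form and the gene-to-disease effect sizes are assumptions chosen for illustration; only the 2% and 70% figures echo the numbers used in the conversation.

```python
import math


def disease_effect(change_in_gene: float, gene_effect_on_disease: float) -> float:
    """Toy model: effect on disease = change in gene * gene's effect on disease."""
    return change_in_gene * gene_effect_on_disease


# Gene X: natural variants only shift it by 2%, but the gene matters a lot.
variant_on_x = disease_effect(0.02, 5.0)

# Gene Y: a variant shifts it by 50%, but the gene matters only a little.
variant_on_y = disease_effect(0.50, 0.2)

# Both variants look equally weak at the level of the disease...
print(variant_on_x, variant_on_y)

# ...but a drug that shifts gene X by 70% would have a far larger effect,
# because the weak association came from the small natural change, not the gene.
drug_on_x = disease_effect(0.70, 5.0)
print(drug_on_x)
```

Under this sketch, genetics alone cannot tell gene X from gene Y; coupling it with how much each gene's expression tracks the disease is what separates them.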
00:41:22.760 | - Interesting.
00:41:23.600 | So yeah, okay, so that's what we're looking at.
00:41:25.320 | And what have we been able to find
00:41:27.720 | in terms of which disease could be helped?
00:41:31.920 | - Again, don't get me started.
00:41:34.960 | This is, we have found so much.
00:41:38.840 | Our understanding of disease has changed so dramatically
00:41:43.520 | with genetics.
00:41:46.040 | I mean, places that we had no idea would be involved.
00:41:48.880 | So one of the worst things about my genome
00:41:51.720 | is that I have a genetic predisposition
00:41:53.520 | to age-related macular degeneration, AMD.
00:41:56.440 | So it's a form of blindness that causes you
00:41:59.080 | to lose the central part of your vision
00:42:01.680 | progressively as you grow older.
00:42:03.400 | My increased risk is fairly small.
00:42:06.240 | I have an 8% chance.
00:42:07.560 | You only have a 6% chance.
00:42:09.480 | - You, I'm an average.
00:42:11.120 | - Yeah.
00:42:12.080 | - By the way, when you say my, you mean literally yours.
00:42:14.440 | You know this about you.
00:42:15.880 | - I know this about me, yeah.
00:42:17.880 | - Which is kind of, I mean, philosophically speaking
00:42:22.600 | is a pretty powerful thing to live with.
00:42:26.240 | Maybe that's, so we agreed to talk again, by the way,
00:42:29.280 | for the listeners, where we're gonna try to focus
00:42:32.480 | on science today and a little bit of philosophy next time.
00:42:35.920 | But it's interesting to think about
00:42:40.480 | the more you're able to know about yourself
00:42:42.600 | from the genetic information in terms of the diseases,
00:42:45.960 | how that changes your own view of life.
00:42:48.760 | - Yeah, so there's a lot of impact there.
00:42:51.960 | And there's something called genetic exceptionalism,
00:42:55.960 | which basically thinks of genetics
00:42:58.120 | as something very, very different than everything else
00:43:01.120 | as a type of determinism.
00:43:03.760 | And, you know, let's talk about that next time.
00:43:07.280 | So basically-- - That's a good preview.
00:43:08.880 | - Yeah.
00:43:10.000 | So let's go back to AMD.
00:43:11.640 | So basically with AMD, we had no idea what causes AMD.
00:43:16.640 | You know, it was a mystery
00:43:20.080 | until the genetics were worked out.
00:43:23.600 | And now the fact that I know that I have a predisposition
00:43:27.160 | allows me to sort of make some life choices, number one.
00:43:30.480 | But number two, the genes that lead to that predisposition
00:43:34.920 | give us insights as to how does it actually work.
00:43:37.360 | And that's a place where genetics
00:43:40.800 | gave us something totally unexpected.
00:43:42.920 | So there's a complement pathway,
00:43:46.400 | which is an immune function pathway
00:43:48.840 | that was in, you know,
00:43:52.400 | most of the loci associated with AMD.
00:43:55.880 | And that basically told us that, wow,
00:43:57.600 | there's an immune basis to this eye disorder
00:44:02.520 | that people had just not expected before.
00:44:05.120 | If you look at complement,
00:44:06.760 | it was recently also implicated in schizophrenia.
00:44:10.640 | And there's a type of microglia
00:44:13.960 | that is involved in synaptic pruning.
00:44:16.960 | So synapses are the connections between neurons.
00:44:19.960 | And in this whole use it or lose it view
00:44:22.680 | of mental cognition and other capabilities,
00:44:26.840 | you basically have microglia, which are immune cells
00:44:31.200 | that are sort of constantly traversing your brain
00:44:33.720 | and then pruning neuronal connections,
00:44:36.160 | pruning synaptic connections that are not utilized.
00:44:40.080 | So in schizophrenia,
00:44:42.600 | there's thought to be a change in the pruning
00:44:47.040 | that basically if you don't prune your synapses
00:44:49.480 | the right way,
00:44:50.960 | you will actually have an increased risk of schizophrenia.
00:44:53.720 | This is something that was completely unexpected
00:44:56.320 | for schizophrenia.
00:44:57.160 | Of course, we knew it has to do with neurons,
00:44:58.960 | but the role of the complement complex,
00:45:01.280 | which is also implicated in AMD,
00:45:03.760 | which is now also implicated in schizophrenia,
00:45:05.400 | was a huge surprise.
00:45:06.280 | - What's the complement complex?
00:45:07.920 | - So it's basically a set of genes, the complement genes,
00:45:11.160 | that are basically having various immune roles.
00:45:15.320 | And as I was saying earlier,
00:45:16.440 | our immune system has been co-opted
00:45:18.520 | for many different roles across the body.
00:45:21.040 | So they actually play many diverse roles.
00:45:23.400 | - And somehow the immune system is connected
00:45:26.360 | to the synaptic pruning process.
00:45:28.200 | - Exactly. - The process of it.
00:45:29.120 | - Exactly.
00:45:29.960 | So immune cells were co-opted to prune synapses.
00:45:33.080 | - How did you figure this out?
00:45:34.280 | (laughing)
00:45:35.640 | How does one go about figuring this intricate connection,
00:45:40.200 | like pipeline of connections out?
00:45:42.040 | - Yeah, let me give you another example.
00:45:43.840 | So Alzheimer's disease,
00:45:45.960 | the first place that you would expect it to act
00:45:48.200 | is obviously the brain.
00:45:49.680 | So we had basically this
00:45:51.280 | Roadmap Epigenomics Consortium view of the human epigenome,
00:45:56.800 | the largest map of the human epigenome
00:45:59.080 | that has ever been built,
00:46:01.760 | across 127 different tissues and samples
00:46:05.680 | with dozens of epigenomic marks
00:46:07.800 | measured in hundreds of donors.
00:46:10.440 | So what we've basically learned through that
00:46:13.520 | is that you basically can map
00:46:15.840 | what are the active gene regulatory elements
00:46:18.040 | for every one of the tissues in the body.
00:46:20.200 | And then we connected these gene regulatory active maps
00:46:25.080 | of basically what regions of the human genome
00:46:28.360 | are turning on in every one of different tissues.
00:46:31.920 | We then can go back and say,
00:46:34.440 | where are all of the genetic loci
00:46:37.480 | that are associated with disease?
00:46:39.160 | This is something that my group,
00:46:41.720 | I think was the first to do back in 2010
00:46:45.160 | in this Ernst Nature Biotech paper.
00:46:48.640 | But basically we were for the first time able to show
00:46:50.960 | that specific chromatin states,
00:46:53.160 | specific epigenomic states,
00:46:54.560 | in that case enhancers,
00:46:56.160 | were in fact enriched in disease associated variants.
00:47:00.640 | We pushed that further
00:47:02.240 | in the Ernst Nature paper a year later,
00:47:05.440 | and then in this Roadmap Epigenomics paper,
00:47:08.920 | a few years after that.
00:47:11.320 | But basically that matrix that you mentioned earlier
00:47:15.120 | was in fact the first time that we could see
00:47:17.320 | what genetic traits have genetic variants
00:47:21.160 | that are enriched in what tissues in the body.
00:47:26.040 | And a lot of that map made complete sense.
00:47:28.640 | If you looked at a diversity of immune traits,
00:47:31.440 | like allergies and type 1 diabetes and so on and so forth,
00:47:34.720 | you basically could see that they were enriching,
00:47:37.800 | that the genetic variants associated with those traits
00:47:40.120 | were enriched in enhancers,
00:47:42.960 | in these gene regulatory elements,
00:47:44.600 | active in T cells and B cells
00:47:46.520 | and hematopoietic stem cells and so on and so forth.
00:47:49.040 | So that basically gave us a confirmation in many ways
00:47:54.560 | that those immune traits were indeed
00:47:56.880 | enriching in immune cells.
00:47:58.840 | If you looked at type 2 diabetes,
00:48:02.480 | you basically saw an enrichment in only one type of sample,
00:48:06.000 | and it was pancreatic islets.
00:48:07.880 | And we know that type 2 diabetes
00:48:10.760 | sort of stems from the dysregulation of insulin
00:48:14.680 | in the beta cells of pancreatic islets.
00:48:17.200 | And that sort of was spot on, super precise.
00:48:19.920 | If you looked at blood pressure,
00:48:22.800 | where would you expect blood pressure to occur?
00:48:25.760 | You know, I don't know, maybe in your metabolism,
00:48:27.800 | in ways that you process coffee or something like that,
00:48:30.160 | maybe in your brain, the way that you stress out
00:48:32.360 | and increases your blood pressure, et cetera.
00:48:34.240 | What we found is that blood pressure localized specifically
00:48:37.240 | in the left ventricle of the heart.
00:48:40.280 | So the enhancers of the left ventricle in the heart
00:48:42.680 | contained a lot of genetic variants
00:48:44.160 | associated with blood pressure.
00:48:45.760 | If you look at height,
00:48:48.320 | we found an enrichment specifically
00:48:50.480 | in embryonic stem cell enhancers.
00:48:52.280 | So the genetic variants predisposing you
00:48:54.800 | to be taller or shorter
00:48:56.400 | are in fact acting in developmental stem cells.
00:48:59.320 | Makes complete sense.
00:49:01.280 | If you looked at inflammatory bowel disease,
00:49:03.400 | you basically found inflammatory, which is immune,
00:49:06.640 | and also bowel disease, which is digestive.
00:49:09.760 | And indeed, we saw a double enrichment,
00:49:12.000 | both in the immune cells and in the digestive cells.
00:49:15.680 | So that basically told us that,
00:49:16.880 | aha, this is acting in both components.
00:49:18.880 | There's an immune component to inflammatory bowel disease
00:49:21.360 | and there's a digestive component.
00:49:23.320 | And the big surprise was for Alzheimer's.
00:49:25.840 | We had seven different brain samples.
00:49:28.120 | We found zero enrichment in the brain samples
00:49:33.120 | for genetic variants associated with Alzheimer's.
00:49:36.080 | And this is mind boggling.
00:49:37.880 | Our brains were literally hurting.
00:49:39.760 | What is going on?
00:49:41.560 | And what is going on is that the brain samples
00:49:44.520 | are primarily neurons, oligodendrocytes, and astrocytes
00:49:49.520 | in terms of the cell types that make them up.
00:49:52.920 | So that basically indicated that genetic variants
00:49:56.480 | associated with Alzheimer's were probably not acting
00:50:00.520 | in oligodendrocytes, astrocytes, or neurons.
00:50:04.400 | So what could they be acting in?
00:50:06.240 | Well, the fourth major cell type is actually microglia.
00:50:09.560 | Microglia are resident immune cells in your brain.
00:50:13.680 | - Oh, nice.
00:50:14.520 | The immune, oh, wow.
00:50:17.320 | - And they are CD14+, which is this sort of
00:50:21.280 | cell surface marker of those cells.
00:50:24.080 | So they're CD14+ cells, just like macrophages
00:50:27.000 | that are circulating in your blood.
00:50:29.080 | The microglia are resident monocytes
00:50:33.960 | that are basically sitting in your brain.
00:50:35.560 | They're tissue-specific monocytes.
00:50:38.280 | And every one of your tissues, like your fat, for example,
00:50:41.440 | has a lot of macrophages that are resident.
00:50:43.680 | And the M1 versus M2 macrophage ratio
00:50:46.720 | has a huge role to play in obesity.
00:50:49.120 | And so basically, again, these immune cells are everywhere,
00:50:51.760 | but basically what we found,
00:50:53.480 | through this completely unbiased view
00:50:56.080 | of what are the tissues that likely
00:50:58.280 | underlie different disorders,
00:51:00.440 | we found that Alzheimer's was humongously enriched
00:51:05.040 | in microglia, but not at all in the other cell types.
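The kind of tissue enrichment described above can be sketched as a hypergeometric test: given a background set of variants, how surprising is it that so many trait-associated variants land in one tissue's enhancers? This is a minimal sketch, not the method used in the papers mentioned; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(total, in_enhancers, trait_variants, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    total          -- all background variants considered
    in_enhancers   -- background variants inside this tissue's enhancers
    trait_variants -- variants associated with the trait
    overlap        -- trait variants that fall inside the enhancers
    """
    max_k = min(in_enhancers, trait_variants)
    denom = comb(total, trait_variants)
    tail = sum(
        comb(in_enhancers, k) * comb(total - in_enhancers, trait_variants - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / denom


# Made-up counts: 1000 background variants, 50 in each tissue's enhancers,
# 20 trait variants. By chance about 1 of the 20 would overlap.
overlaps = {"bulk brain sample": 1, "microglia-like cells": 8}
pvalues = {
    tissue: enrichment_pvalue(1000, 50, 20, k) for tissue, k in overlaps.items()
}

for tissue, p in pvalues.items():
    print(f"{tissue}: p = {p:.3g}")
```

A small p-value for one tissue and not for the others is the pattern being described: the trait's variants concentrate in that tissue's regulatory elements.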
00:51:08.960 | - So what are we supposed to make of that
00:51:11.080 | if you look at the tissues involved?
00:51:14.720 | Is that simply useful for indication
00:51:18.040 | of propensity for disease?
00:51:20.920 | Or does it give us somehow a pathway of treatment?
00:51:24.560 | - It's very much the second.
00:51:26.040 | If you look at the way to therapeutics,
00:51:31.040 | you have to start somewhere.
00:51:33.760 | What are you gonna do?
00:51:34.600 | You're gonna basically make assays
00:51:36.960 | that manipulate those genes and those pathways
00:51:41.960 | in those cell types.
00:51:43.680 | So before we know the tissue of action,
00:51:46.640 | we don't even know where to start.
00:51:49.120 | We basically are at a loss.
00:51:50.960 | But if you know the tissue of action,
00:51:52.520 | and even better, if you know the pathway of action,
00:51:55.240 | then you can basically screen your small molecules,
00:51:58.160 | not for the gene, you can screen them directly
00:52:00.360 | for the pathway in that cell type.
00:52:03.080 | So you can basically develop
00:52:04.560 | a high throughput multiplexed robotic system
00:52:08.880 | for testing the impact of your favorite molecules
00:52:12.960 | that you know are safe, efficacious,
00:52:14.800 | and sort of hit that particular gene, and so on and so forth.
00:52:18.800 | You can basically screen those molecules
00:52:20.800 | against either a set of genes that act in that pathway,
00:52:25.800 | or on the pathway directly by having a cellular assay.
00:52:29.640 | And then you can basically go into mice and do experiments
00:52:32.360 | and basically sort of figure out ways
00:52:33.800 | to manipulate these processes
00:52:36.280 | that allow you to then go back to humans
00:52:38.520 | and do a clinical trial that basically says,
00:52:40.200 | okay, I was able indeed to reverse these processes in mice,
00:52:43.880 | can I do the same thing in humans?
00:52:46.080 | So the knowledge of the tissues
00:52:49.080 | gives you the pathway to treatment.
00:52:51.280 | But that's not the only part.
00:52:52.640 | There are many additional steps
00:52:54.760 | to figuring out the mechanism of disease.
00:52:56.960 | - I mean, so that's really promising.
00:52:58.760 | Maybe to take a small step back,
00:53:01.760 | you've mentioned all these puzzles
00:53:03.560 | that were figured out with the Nature paper for,
00:53:07.720 | I mean, you've mentioned a ton of diseases,
00:53:10.540 | from obesity to Alzheimer's, even schizophrenia,
00:53:14.800 | I think you mentioned.
00:53:15.840 | What is the actual methodology of figuring this out?
00:53:20.640 | - So indeed, I mentioned a lot of diseases,
00:53:22.920 | and my lab works on a lot of different disorders.
00:53:25.760 | And the reason for that is that
00:53:29.280 | if you look at the,
00:53:31.680 | if you look at biology,
00:53:36.360 | it used to be zoology departments
00:53:39.560 | and botany departments and virology departments
00:53:42.560 | and so on and so forth.
00:53:43.400 | And MIT was one of the first schools
00:53:45.160 | to basically create a biology department,
00:53:47.080 | like, oh, we're gonna study all of life suddenly.
00:53:49.520 | Why was that even the case?
00:53:51.320 | Because the advent of DNA and the genome
00:53:55.240 | and the central dogma of DNA makes RNA makes protein,
00:53:58.520 | in many ways, unified biology.
00:54:00.620 | You could suddenly study the process of transcription
00:54:04.360 | in viruses or in bacteria
00:54:06.920 | and have a huge impact on yeast and fly
00:54:10.120 | and maybe even mammals,
00:54:12.040 | because of this realization
00:54:14.600 | of these common underlying processes.
00:54:16.440 | And in the same way that DNA unified biology,
00:54:22.400 | genetics is unifying disease studies.
00:54:27.080 | So you used to have,
00:54:28.460 | you used to have, I don't know,
00:54:34.560 | cardiovascular disease department
00:54:36.640 | and neurological disease department
00:54:40.120 | and neurodegeneration department
00:54:41.960 | and basically immune and cancer and so on and so forth.
00:54:46.960 | And all of these were studied in different labs
00:54:50.520 | because it made sense,
00:54:52.640 | because basically the first step
00:54:54.240 | was understanding how the tissue functions
00:54:56.440 | and we kind of knew the tissues involved
00:54:57.960 | in cardiovascular disease and so on and so forth.
00:55:00.560 | But what's happening with human genetics
00:55:01.840 | is that all of that,
00:55:03.520 | all of these walls and edifices
00:55:05.600 | that we had built are crumbling.
00:55:08.280 | And the reason for that is that genetics
00:55:11.680 | is in many ways revealing unexpected connections.
00:55:16.440 | So suddenly we now have to bring the immunologists
00:55:19.080 | to work on Alzheimer's.
00:55:20.300 | They were never in the room.
00:55:22.600 | They were in another building altogether.
00:55:25.800 | The same way for schizophrenia,
00:55:28.040 | we now have to sort of worry about
00:55:30.440 | all these interconnected aspects.
00:55:33.060 | For metabolic disorders,
00:55:34.300 | we're finding contributions from brain.
00:55:36.380 | So suddenly we have to call the neurologist
00:55:38.920 | from the other building and so on and so forth.
00:55:41.200 | So in my view, it makes no sense anymore
00:55:46.000 | to basically say, oh, I'm a geneticist
00:55:48.400 | studying immune disorders.
00:55:50.640 | I mean, that's ridiculous because,
00:55:52.440 | I mean, of course, in many ways,
00:55:54.280 | you still need to sort of focus.
00:55:56.260 | But what we're doing is that we're basically saying,
00:55:59.240 | we'll go wherever the genetics takes us.
00:56:02.080 | And by building these massive resources,
00:56:05.580 | by working on our latest maps, now 833 tissues,
00:56:10.280 | sort of the next generation of the epigenomics roadmap,
00:56:13.740 | which we've now called EpiMap, is 833 different tissues.
00:56:17.880 | And using those, we've basically found enrichments
00:56:21.140 | in 540 different disorders.
00:56:24.460 | Those enrichments are not like, oh, great,
00:56:26.200 | you guys work on that and we'll work on this.
00:56:29.180 | They're intertwined amazingly.
00:56:31.840 | So of course there's a lot of modularity,
00:56:34.500 | but there's these enhancers that are sort of broadly active
00:56:37.120 | and these disorders that are broadly active.
00:56:39.500 | So basically some enhancers are active in all tissues
00:56:41.820 | and some disorders are enriching in all tissues.
00:56:45.060 | So basically there's these multifactorial
00:56:47.740 | and these other class,
00:56:48.580 | which I like to call polyfactorial diseases,
00:56:51.420 | which are basically lighting up everywhere.
00:56:54.140 | And in many ways, it's sort of cutting across these walls
00:56:59.140 | that were previously built across these departments.
00:57:01.640 | - And for the polyfactorial ones, probably
00:57:03.980 | the previous structure of departments
00:57:06.180 | wasn't equipped to deal with those.
00:57:08.620 | I mean, again, maybe it's a romanticized question,
00:57:12.860 | but there's in physics, there's a theory of everything.
00:57:16.780 | Do you think it's possible to move towards
00:57:19.740 | an almost theory of everything of disease
00:57:22.520 | from a genetic perspective?
00:57:24.020 | So if this unification continues, is it possible that,
00:57:27.940 | like, do you think in those terms,
00:57:29.460 | like trying to arrive at a fundamental understanding
00:57:33.060 | of how disease emerges, period?
00:57:35.460 | - That unification is not just foreseeable, it's inevitable.
00:57:40.460 | I see it as inevitable.
00:57:43.440 | We have to go there.
00:57:45.140 | You cannot be a specialist anymore if you're a genomicist,
00:57:49.660 | you have to be a specialist in every single disorder.
00:57:53.740 | And the reason for that is that the fundamental understanding
00:57:58.260 | of the circuitry of the human genome
00:58:00.780 | that you need to solve schizophrenia,
00:58:03.800 | that fundamental circuitry is hugely important
00:58:08.020 | to solve Alzheimer's.
00:58:09.500 | And that same circuitry is hugely important
00:58:11.420 | to solve metabolic disorders.
00:58:13.020 | And that same exact circuitry is hugely important
00:58:16.940 | for solving immune disorders and cancer
00:58:19.340 | and every single disease.
00:58:22.180 | So all of them have the same subtask.
00:58:26.580 | And I teach dynamic programming in my class.
00:58:29.580 | Dynamic programming is all about sort of
00:58:31.380 | not redoing the work, it's reusing the work that you do once.
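The "reuse the work that you do once" idea can be illustrated with memoization, one common way to implement dynamic programming. This is only a minimal sketch on a textbook problem (Fibonacci numbers), not something from the conversation; the names `fib` and `calls` are chosen here for illustration.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, caching every subproblem."""
    global calls
    calls += 1
    # Each subproblem is solved once; later requests hit the cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
print(result, calls)
```

Without the cache, the same subproblems would be recomputed over a million times; with it, each of the 31 subproblems is solved exactly once, which is the analogy to solving the shared circuitry once and reusing it across diseases.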
00:58:36.380 | So basically for us to say, oh, great,
00:58:40.140 | you guys in the immune building,
00:58:41.780 | go solve the fundamental circuitry of everything.
00:58:44.140 | And then you guys in the schizophrenia building
00:58:46.120 | go solve the fundamental circuitry
00:58:47.380 | of everything separately, is crazy.
00:58:50.040 | So what we need to do is come together
00:58:52.700 | and sort of have a circuitry group,
00:58:55.580 | the circuitry building that sort of tries
00:58:57.500 | to solve the circuitry of everything.
00:58:59.300 | And then the immune folks who will apply this knowledge
00:59:03.840 | to all of the disorders that are associated
00:59:06.620 | with immune dysfunction.
00:59:09.080 | And the schizophrenia folks will basically interact
00:59:12.500 | with both the immune folks and with the neuronal folks.
00:59:15.460 | And all of them will be interacting
00:59:16.780 | with the circuitry folks and so on and so forth.
00:59:18.940 | So that's sort of the current structure of my group,
00:59:21.500 | if you wish.
00:59:22.340 | So basically what we're doing is focusing
00:59:24.180 | on the fundamental circuitry.
00:59:25.700 | But at the same time, we're the users of our own tools
00:59:31.160 | by collaborating with many other labs
00:59:34.860 | in every one of these disorders that we mentioned.
00:59:37.460 | We basically have a heart focus on cardiovascular disease,
00:59:41.060 | coronary artery disease, heart failure, and so on and so forth.
00:59:44.220 | We have an immune focus on several immune disorders.
00:59:48.840 | We have a cancer focus on metastatic melanoma
00:59:53.840 | and immunotherapy response.
00:59:55.520 | We have a psychiatric disease focus on schizophrenia,
01:00:00.360 | autism, PTSD, and other psychiatric disorders.
01:00:04.060 | We have an Alzheimer's and neurodegeneration focus
01:00:06.880 | on Huntington's disease, ALS, and AD-related disorders
01:00:11.780 | like frontotemporal dementia and Lewy body dementia.
01:00:14.380 | And of course, a huge focus on Alzheimer's.
01:00:16.660 | We have a metabolic focus on the role of exercise and diet
01:00:21.200 | and sort of how they're impacting metabolic organs
01:00:26.200 | across the body and across many different tissues.
01:00:28.980 | And all of them are interfacing with the circuitry.
01:00:33.980 | And the reason for that is another computer science principle
01:00:38.100 | of eat your own dog food.
01:00:40.640 | If everybody ate their own dog food,
01:00:44.900 | dog food would taste a lot better.
01:00:46.600 | The reason why Microsoft Excel and Word and PowerPoint
01:00:51.720 | were so important and so successful
01:00:55.140 | is because the employees that were working on them
01:00:58.340 | were using them for their day-to-day tasks.
01:01:01.460 | You can't just simply build a circuitry and say,
01:01:04.420 | "Here it is, guys, take the circuitry, we're done,"
01:01:07.060 | without being the users of that circuitry
01:01:09.380 | because you then go back.
01:01:11.220 | And because we span the whole spectrum
01:01:13.420 | from profiling the epigenome, using comparative genomics,
01:01:17.420 | finding the important nucleotides in the genome,
01:01:19.800 | building the basic functional map
01:01:21.740 | of what are the genes in the human genome,
01:01:24.500 | what are the gene regulatory elements of the human genome.
01:01:27.620 | I mean, over the years, we've written a series of papers
01:01:30.260 | on how do you find human genes in the first place,
01:01:32.740 | using comparative genomics.
01:01:34.060 | How do you find the motifs that are the building blocks
01:01:36.940 | of gene regulation, using comparative genomics?
01:01:39.620 | How do you then find how these motifs come together
01:01:43.060 | and act in specific tissues, using epigenomics?
01:01:46.220 | How do you link regulators to enhancers and enhancers
01:01:50.980 | to their target genes,
01:01:52.220 | using epigenomics and regulatory genomics?
01:01:55.220 | So through the years, we've basically built
01:01:57.500 | all this infrastructure for understanding
01:02:00.380 | what I like to say, every single nucleotide
01:02:03.700 | of the human genome and how it acts in every one
01:02:07.420 | of the major cell types and tissues of the human body.
01:02:10.500 | I mean, this is no small task.
01:02:12.020 | This is an enormous task that takes the entire field.
01:02:15.460 | And that's something that my group has taken on,
01:02:18.060 | along with many other groups.
01:02:19.500 | And we have also, and that sort of, I think,
01:02:23.100 | sets my group perhaps apart,
01:02:24.780 | we have also worked with specialists
01:02:26.940 | in every one of these disorders
01:02:28.820 | to basically further our understanding
01:02:30.700 | all the way down to disease.
01:02:32.460 | And in some cases, collaborating with pharma
01:02:34.420 | to go all the way down to therapeutics
01:02:36.580 | because of our deep, deep understanding
01:02:39.980 | of that basic circuitry
01:02:41.140 | and how it allows us to now improve the circuitry,
01:02:46.500 | not just treat it as a black box,
01:02:49.060 | but basically go and say, okay,
01:02:50.300 | we need a better cell-type-specific wiring
01:02:53.620 | than we now have at the tissue-specific level.
01:02:56.420 | So we're focusing on that because we're understanding
01:02:59.500 | the needs from the disease front.
01:03:01.780 | - So you have a sense of the entire pipeline.
01:03:04.060 | I mean, one, maybe you can indulge me,
01:03:07.940 | one nice question to ask would be,
01:03:09.780 | how do you, from the scientific perspective,
01:03:14.340 | go from knowing nothing about the disease
01:03:17.340 | to going, you said, to going through the entire pipeline
01:03:22.060 | and actually have a drug or a treatment
01:03:25.260 | that cures that disease?
01:03:26.740 | - So that's an enormously long path
01:03:30.100 | and an enormously great challenge.
01:03:32.460 | And what I'm trying to argue is that
01:03:34.940 | it progresses in stages of understanding
01:03:38.820 | rather than one gene at a time.
01:03:40.900 | The traditional view of biology was
01:03:42.740 | you have one postdoc working on this gene
01:03:44.860 | and another postdoc working on that gene.
01:03:47.460 | And they'll just figure out everything about that gene.
01:03:50.300 | And that's their job.
01:03:51.940 | What we've realized is how polygenic
01:03:54.500 | the diseases are.
01:03:56.260 | So we can't have one postdoc per gene anymore.
01:03:58.140 | We now have to have these cross-cutting needs.
01:04:03.140 | And I'm gonna describe the path to circuitry
01:04:07.460 | along those needs.
01:04:10.140 | And every single one of these paths,
01:04:12.900 | we are now doing in parallel across thousands of genes.
01:04:16.860 | So the first step is you have a genetic association.
01:04:21.660 | And we talked a little bit about sort of the Mendelian path
01:04:24.740 | and the polygenic path to that association.
01:04:27.660 | So the Mendelian path was looking through families
01:04:30.020 | to basically find gene regions and ultimately genes
01:04:34.500 | that are underlying particular disorders.
01:04:36.740 | The polygenic path is basically looking at
01:04:40.580 | unrelated individuals in this giant matrix
01:04:43.220 | of genotype by phenotype,
01:04:44.940 | and then finding hits where a particular variant
01:04:47.940 | impacts disease all the way to the end.
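The "giant matrix of genotype by phenotype" can be sketched as a toy association scan. This is a simplified illustration under stated assumptions, not the method used in any actual study: the genotype values (0, 1, or 2 copies of an allele), the case/control labels, and the function name `association_scan` are all made up here, and a real analysis would use a proper statistical test with correction for multiple testing and population structure.

```python
def mean(values):
    return sum(values) / len(values)


def association_scan(genotypes, phenotype):
    """For each variant (column), return mean allele dosage in cases minus controls."""
    n_variants = len(genotypes[0])
    scores = []
    for j in range(n_variants):
        cases = [row[j] for row, y in zip(genotypes, phenotype) if y == 1]
        controls = [row[j] for row, y in zip(genotypes, phenotype) if y == 0]
        scores.append(mean(cases) - mean(controls))
    return scores


# Rows are unrelated individuals, columns are variants (toy data).
genotypes = [
    [0, 2, 1],
    [1, 2, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
]
phenotype = [1, 1, 1, 0, 0, 0]  # 1 = has the disease, 0 = does not

scores = association_scan(genotypes, phenotype)
top = max(range(len(scores)), key=lambda j: abs(scores[j]))
print(top, scores)
```

The "hit" is the column whose dosage differs most between cases and controls. Note that it identifies a position in the genome, not a gene.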
01:04:51.300 | And then we now have a connection,
01:04:53.900 | not between a gene and a disease,
01:04:56.700 | but between a genetic region and a disease.
01:05:00.100 | And that distinction is not understood by most people.
01:05:03.460 | So I'm gonna explain it a little bit more.
01:05:05.580 | Why do we not have a connection between a gene
01:05:10.020 | and a disease, but we have a connection
01:05:11.340 | between a genetic region and a disease?
01:05:13.380 | The reason for that is that 93% of genetic variants
01:05:19.980 | that are associated with disease
01:05:21.900 | don't impact the protein at all.
01:05:25.220 | So if you look at the human genome, there's 20,000 genes.
01:05:29.940 | There's 3.2 billion nucleotides.
01:05:32.100 | Only 1.5% of the genome codes for proteins.
01:05:38.140 | The other 98.5% does not code for proteins.
01:05:46.100 | If you now look at where are the disease variants located,
01:05:49.460 | 93% of them fall in that outside the genes portion.
01:05:55.700 | Of course, genes are enriched,
01:05:57.340 | but they're only enriched by a factor of three.
01:06:00.500 | That means that still 93% of genetic variants
01:06:03.460 | fall outside the proteins.
01:06:05.700 | Why is that difficult?
01:06:08.140 | Why is that a problem?
01:06:09.420 | The problem is that when a variant falls outside the gene,
01:06:14.220 | you don't know what gene is impacted by that variant.
01:06:16.860 | You can't just say, "Oh, it's near this gene.
01:06:19.220 | Let's just connect that variant to the gene."
01:06:20.860 | And the reason for that is that the genome circuitry
01:06:24.340 | is very often long range.
01:06:26.740 | So you basically have that genetic variant
01:06:30.740 | that could sit in the intron of one gene.
01:06:34.500 | And an intron is sort of the place
01:06:36.300 | between the exons that code for proteins.
01:06:38.140 | So proteins are split up into exons and introns,
01:06:41.300 | and every exon codes for a particular subset
01:06:43.740 | of amino acids, and together they're spliced together,
01:06:46.860 | and then make the final protein.
01:06:49.180 | So that genetic variant might be sitting
01:06:50.660 | in an intron of a gene.
01:06:51.820 | It's transcribed with the gene,
01:06:53.620 | it's processed and then excised,
01:06:55.500 | but it might not impact this gene at all.
01:06:57.140 | It might actually impact another gene
01:06:59.500 | that's a million nucleotides away.
01:07:01.060 | - So it's just riding along,
01:07:02.380 | even though it has nothing to do
01:07:03.540 | with this nearby neighborhood.
01:07:05.820 | - That's exactly right.
01:07:07.580 | Let me give you an example.
01:07:09.580 | The strongest genetic association with obesity
01:07:12.380 | was discovered in this FTO gene,
01:07:15.660 | fat and obesity associated gene.
01:07:18.340 | So this FTO gene was studied ad nauseam.
01:07:23.340 | People did tons of experiments on it.
01:07:26.700 | They figured out that FTO is in fact
01:07:29.140 | an RNA methyltransferase.
01:07:32.980 | It basically, it sort of impacts something
01:07:36.060 | that we know that we call the epitranscriptome,
01:07:38.740 | just like the genome can be modified,
01:07:40.740 | the transcriptome, the transcripts of the genes
01:07:43.580 | can be modified.
01:07:44.660 | And we basically said, oh great,
01:07:46.620 | that means that epitranscriptomics
01:07:48.500 | is hugely involved in obesity
01:07:49.980 | because that gene FTO is clearly
01:07:53.740 | where the genetic locus is at.
01:07:55.620 | My group studied FTO in collaboration
01:08:00.180 | with a wonderful team led by Melina Claussnitzer.
01:08:04.140 | And what we found is that this FTO locus,
01:08:08.460 | even though it is associated with obesity,
01:08:11.740 | does not implicate the FTO gene.
01:08:14.860 | The genetic variant sits in the first intron
01:08:19.420 | of the FTO gene, but it controls two genes,
01:08:23.500 | IRX3 and IRX5, that are sitting 1.2 million nucleotides away,
01:08:28.500 | several genes away.
01:08:31.860 | - Oh boy.
01:08:34.500 | What am I supposed to feel about that?
01:08:36.140 | Isn't that like super complicated then?
01:08:38.620 | - So the way that I was introduced at a conference
01:08:40.740 | a few years ago was, and here's Manolis Kellis
01:08:43.780 | who wrote the most depressing paper of 2015.
01:08:46.940 | (laughs)
01:08:48.620 | And the reason for that is that
01:08:49.780 | the entire pharmaceutical industry was so comfortable
01:08:52.180 | that there was a single gene in that locus.
01:08:56.020 | Because in some loci, you basically have three dozen genes
01:08:58.500 | that are all sitting in the same region of association.
01:09:01.340 | And you're like, oh gosh, which ones of those is it?
01:09:04.020 | But even that question of which ones of those is it,
01:09:06.780 | is making the assumption that it is one of those,
01:09:09.860 | as opposed to some random gene just far, far away,
01:09:12.860 | which is what our paper showed.
01:09:14.820 | So basically what our paper showed is that
01:09:16.860 | you can't ignore the circuitry.
01:09:19.060 | You have to first figure out the circuitry,
01:09:21.060 | all of those long range interactions,
01:09:23.180 | how every genetic variant impacts the expression
01:09:25.780 | of every gene in every tissue imaginable
01:09:28.620 | across hundreds of individuals.
01:09:30.780 | And then you now have one of the building blocks,
01:09:33.980 | not even all of the building blocks,
01:09:35.660 | for then going and understanding disease.
01:09:39.340 | - So okay, so embrace the wholeness of the circuitry.
01:09:44.340 | - Correct.
01:09:45.500 | - But what, so back to the question of starting
01:09:48.060 | knowing nothing about the disease and going to the treatment.
01:09:51.660 | So what are the next steps?
01:09:53.460 | - So you basically have to first figure out the tissue
01:09:56.100 | and I described how you figure out the tissue.
01:09:57.980 | You figure out the tissue by taking
01:09:59.260 | all of these non-coding variants
01:10:01.300 | that are sitting outside proteins,
01:10:03.220 | and then figuring out what are the epigenomic enrichments.
01:10:06.700 | And the reason for that, thankfully,
01:10:10.220 | is that there is convergence,
01:10:12.420 | that the same processes are impacted in different ways
01:10:17.060 | by different loci.
01:10:18.180 | And that's a saving grace for our field.
01:10:23.060 | The fact that if I look at hundreds of genetic variants
01:10:26.140 | associated with Alzheimer's,
01:10:27.700 | they localize in a small number of processes.
01:10:31.820 | - Can you clarify why that's hopeful?
01:10:34.580 | So they show up in the same exact way
01:10:36.860 | in the specific set of processes?
01:10:40.180 | - Yeah, so basically there's a small number
01:10:42.460 | of biological processes that underlie,
01:10:44.980 | or at least that play the biggest role in every disorder.
01:10:48.460 | So in Alzheimer's, you basically have
01:10:51.380 | maybe 10 different types of processes.
01:10:53.900 | One of them is lipid metabolism.
01:10:56.180 | One of them is immune cell function.
01:10:58.740 | One of them is neuronal energetics.
01:11:02.300 | So these are just a small number of processes,
01:11:04.100 | but you have multiple lesions,
01:11:06.940 | multiple genetic perturbations
01:11:08.580 | that are associated with those processes.
01:11:10.860 | So if you look at schizophrenia,
01:11:12.740 | it's excitatory neuron function,
01:11:14.300 | it's inhibitory neuron function,
01:11:15.620 | it's synaptic pruning, it's calcium signaling,
01:11:17.780 | and so on and so forth.
01:11:18.820 | So when you look at disease genetics,
01:11:22.100 | you have one hit here and one hit there
01:11:24.580 | and one hit there and one hit there,
01:11:26.300 | completely different parts of the genome.
01:11:28.100 | But it turns out all of those hits
01:11:30.100 | are calcium signaling proteins.
01:11:32.340 | - Oh, cool.
01:11:33.180 | - You're like, aha, that means
01:11:35.060 | that calcium signaling is important.
01:11:37.300 | So those people who are focusing on one locus at a time
01:11:39.940 | cannot possibly see that picture.
01:11:42.540 | You have to become a genomicist.
01:11:44.500 | You have to look at the omics, the om,
01:11:46.940 | the holistic picture to understand these enrichments.
01:11:51.220 | - But you mentioned the convergence thing.
01:11:53.500 | So whatever the thing associated
01:11:56.820 | with the disease shows up.
01:11:58.300 | - So let me explain convergence.
01:11:59.700 | Convergence is such a beautiful concept.
01:12:01.780 | (laughs)
01:12:03.460 | So you basically have these four genes
01:12:07.380 | that are converging on calcium signaling.
01:12:11.340 | So that basically means that they are acting
01:12:15.260 | each in their own way, but together in the same process.
01:12:18.900 | But now in every one of these loci,
01:12:22.540 | you have many enhancers controlling each of those genes.
01:12:27.500 | That's another type of convergence
01:12:29.540 | where dysregulation of seven different enhancers
01:12:32.180 | might all converge on dysregulation of that one gene,
01:12:35.980 | which then converges on calcium signaling.
01:12:39.220 | And in each one of those enhancers,
01:12:41.020 | you might have multiple genetic variants
01:12:43.580 | distributed across many different people.
01:12:46.540 | Everyone has their own different mutation,
01:12:49.820 | but all of these mutations are impacting that enhancer,
01:12:52.860 | and all of these enhancers are impacting that gene,
01:12:55.140 | and all of these genes are impacting this pathway,
01:12:57.540 | and all of these pathways are acting in the same tissue,
01:12:59.980 | and all of these tissues are converging together
01:13:02.420 | on the same biological process of schizophrenia.
01:13:04.700 | - And you're saying the saving grace
01:13:07.820 | is that that convergence seems to happen
01:13:09.660 | for a lot of these diseases.
01:13:11.060 | - For all of them.
01:13:11.900 | Basically that for every single disease
01:13:14.620 | that we've looked at,
01:13:15.500 | we have found an epigenomic enrichment.
01:13:18.380 | How do you do that?
01:13:19.580 | You basically have all of the genetic variants
01:13:22.540 | associated with the disorder,
01:13:24.020 | and then you're asking for all of the enhancers
01:13:26.140 | active in a particular tissue.
01:13:27.940 | For 540 disorders,
01:13:30.860 | we've basically found that indeed there is an enrichment.
01:13:33.740 | That basically means that there is commonality,
01:13:37.020 | and from the commonality, we can just get insights.
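The enrichment question described here (do disease-associated variants land in a tissue's active enhancers more often than chance would predict?) can be sketched as follows. This is a minimal toy version under assumptions made for illustration: the coordinates, the function names, and the simple binomial null model are not from the conversation, and a real analysis would also need to account for co-inherited variants and a matched background.

```python
from math import comb


def overlaps(pos, intervals):
    """True if a position falls inside any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def enrichment(variants, enhancers, genome_length):
    """Return (hits, fold enrichment, one-sided binomial p-value).

    Assumes the enhancer intervals do not overlap each other.
    """
    n = len(variants)
    k = sum(overlaps(v, enhancers) for v in variants)
    covered = sum(end - start for start, end in enhancers)
    p = covered / genome_length  # chance that a random position hits an enhancer
    fold = (k / n) / p
    # Probability of seeing k or more hits out of n by chance alone.
    p_value = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return k, fold, p_value


# Toy genome of 1000 positions; enhancers cover 10% of it.
genome_length = 1000
enhancers = [(100, 150), (400, 450)]
variants = [110, 120, 405, 430, 700, 900]  # hypothetical disease-associated positions

k, fold, p_value = enrichment(variants, enhancers, genome_length)
print(k, round(fold, 2), round(p_value, 5))
```

Repeating this for each tissue's enhancer map and ranking tissues by enrichment is one way to nominate a likely tissue of action.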
01:13:40.620 | So to explain in mathematical terms,
01:13:43.300 | we're basically building an empirical prior.
01:13:47.060 | We're using a Bayesian approach to basically say,
01:13:50.020 | "Great, all of these variants are equally likely
01:13:53.540 | "in a particular locus to be important."
01:13:55.780 | So in a genetic locus,
01:13:58.420 | you basically have a dozen variants that are co-inherited,
01:14:02.660 | because the way that inheritance works in the human genome
01:14:05.460 | is through all of these recombination events during meiosis.
01:14:10.140 | You basically have, you know,
01:14:12.700 | you inherit maybe three, chromosome three, for example,
01:14:16.740 | in your body.
01:14:17.780 | It's inherited from four different parts.
01:14:20.140 | One part comes from your dad,
01:14:21.860 | another part comes from your mom,
01:14:23.060 | another part comes from your dad,
01:14:24.380 | another part comes from your mom.
01:14:25.860 | So basically, the way that it,
01:14:28.060 | sorry, from your mom's mom.
01:14:30.180 | So you basically have one copy that comes from your dad
01:14:33.140 | and one copy that comes from your mom.
01:14:34.540 | But that copy that you got from your mom
01:14:36.580 | is a mixture of her maternal and her paternal chromosome.
01:14:40.980 | And the copy that you got from your dad
01:14:42.260 | is a mixture of his maternal and his paternal chromosome.
01:14:45.620 | So these breakpoints that happen
01:14:48.300 | when chromosomes are lining up
01:14:51.180 | are basically ensuring, through these crossover events,
01:14:54.780 | they're ensuring that every child cell
01:14:59.780 | during the process of meiosis,
01:15:02.540 | where you basically have, you know,
01:15:04.460 | one spermatozoid that basically couples with one ovule
01:15:08.220 | to basically create one egg,
01:15:10.020 | to basically create the zygote.
01:15:12.180 | You basically have half of your genome that comes from dad
01:15:15.780 | and half your genome that comes from mom.
01:15:17.300 | But in order to line them up,
01:15:19.540 | you basically have these crossover events.
01:15:21.100 | These crossover events are basically leading
01:15:23.740 | to co-inheritance of that entire block
01:15:27.700 | coming from your maternal grandmother
01:15:30.260 | and that entire block coming from your maternal grandfather.
01:15:33.780 | Over many generations,
01:15:35.620 | these crossover events don't happen randomly.
01:15:38.780 | There's a protein called PRDM9
01:15:41.460 | that basically guides the double-stranded breaks
01:15:45.140 | and then leads to these crossovers.
01:15:48.180 | And that protein has a particular preference
01:15:50.820 | to only a small number of hotspots of recombination,
01:15:54.180 | which then lead to a small number of breaks
01:15:57.740 | between these co-inheritance patterns.
01:15:59.900 | So even though there are 6 million variants,
01:16:01.820 | there are 6 million loci,
01:16:03.300 | this variation is inherited in blocks.
01:16:09.580 | And every one of these blocks has like two dozen
01:16:11.940 | genetic variants that are all associated.
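The idea that variants are inherited in blocks separated by recombination hotspots can be sketched as a simple grouping step. This is a deliberate simplification with made-up coordinates: real blocks are estimated from correlation patterns across a population rather than from hard boundaries, and the function name `group_into_blocks` is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def group_into_blocks(variant_positions, hotspot_positions):
    """Assign each variant to the block between consecutive hotspots."""
    hotspots = sorted(hotspot_positions)
    blocks = defaultdict(list)
    for pos in sorted(variant_positions):
        # The block index is the number of hotspots at or before this position.
        blocks[bisect_right(hotspots, pos)].append(pos)
    return dict(blocks)


blocks = group_into_blocks([5, 12, 18, 25, 31, 47], hotspot_positions=[20, 40])
print(blocks)
```

Variants that share a block tend to travel together, which is why an association signal points to the whole block rather than to a single variant within it.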
01:16:14.220 | So in the case of FTO, it wasn't just one variant,
01:16:17.420 | it was 89 common variants
01:16:19.740 | that were all humongously associated with obesity.
01:16:23.180 | Which ones of those is the important one?
01:16:26.780 | Well, if you look at only one locus, you have no idea.
01:16:29.620 | But if you look at many loci,
01:16:32.140 | you basically say, "Aha, all of them are enriching
01:16:36.700 | in the same epigenomic map."
01:16:40.060 | In that particular case, it was mesenchymal stem cells.
01:16:44.140 | So these are the progenitor cells that give rise
01:16:47.100 | to your brown fat and your white fat.
01:16:50.660 | - Progenitor is like the early on developmental stem cells?
01:16:54.020 | - So you start from one zygote,
01:16:56.060 | and that's a totipotent cell type, it can do anything.
01:16:59.100 | You then, that cell divides, divides, divides,
01:17:04.340 | and then every cell division is leading to specialization,
01:17:09.340 | where you now have a mesodermal lineage,
01:17:13.060 | an ectodermal lineage, an endodermal lineage,
01:17:15.620 | that basically leads to different parts of your body.
01:17:19.220 | The ectoderm will basically give rise to your skin.
01:17:22.300 | Ecto means outside, derm is skin.
01:17:25.740 | So ectoderm, but it also gives rise to your neurons
01:17:28.740 | and your whole brain, so that's a lot of ectoderm.
01:17:31.060 | Mesoderm gives rise to your internal organs,
01:17:33.780 | including the vasculature and your muscle
01:17:37.220 | and stuff like that.
01:17:38.300 | So you basically have this progressive differentiation,
01:17:43.060 | and then if you look further, further down that lineage,
01:17:45.940 | you basically have one lineage that will give rise
01:17:47.900 | to both your muscle and your bone, but also your fat.
01:17:51.760 | And if you go further down the lineage of your fat,
01:17:55.840 | you basically have your white fat cells.
01:17:58.900 | These are the cells that store energy.
01:18:01.580 | So when you eat a lot, but you don't exercise too much,
01:18:03.940 | there's an excess set of calories, excess energy.
01:18:07.540 | What do you do with those?
01:18:08.620 | You basically create, you spend a lot of that energy
01:18:11.140 | to create these high-energy molecules, lipids,
01:18:14.660 | which you can then burn when you need them on a rainy day.
01:18:19.660 | So that leads to obesity if you don't exercise
01:18:22.580 | and if you overeat, because your body's like,
01:18:26.420 | oh, great, I have all these calories, I'm gonna store them.
01:18:28.620 | Ooh, more calories, I'm gonna store them too.
01:18:30.420 | Ooh, more calories.
01:18:31.460 | And 42% of European chromosomes
01:18:36.020 | have a predisposition to storing fat,
01:18:40.140 | which was selected probably in the food scarcity periods.
01:18:45.140 | Like basically as we were exiting Africa
01:18:50.140 | before and during the ice ages,
01:18:52.660 | there was probably a selection to those individuals
01:18:54.380 | who made it north to basically be able to store energy,
01:18:58.120 | a lot more energy.
01:19:00.820 | So you basically now have this lineage
01:19:03.860 | that is deciding whether you want to store energy
01:19:07.220 | in your white fat or burn energy in your beige fat.
01:19:11.580 | Turns out that your fat is,
01:19:14.340 | we have such a bad view of fat.
01:19:18.540 | Fat is your best friend.
01:19:19.900 | Fat can both store all these excess lipids
01:19:23.140 | that would be otherwise circulating through your body
01:19:26.500 | and causing damage, but it can also burn calories directly.
01:19:29.900 | If you have too much of energy,
01:19:33.180 | you can just choose to just burn some of that as heat.
01:19:36.220 | So basically when you're cold,
01:19:38.020 | you're burning energy to basically warm your body up
01:19:41.180 | and you're burning all these lipids
01:19:42.340 | and you're burning all these calories.
01:19:44.500 | So what we basically found is that across the board,
01:19:48.740 | genetic variants associated with obesity
01:19:50.540 | across many of these regions were all enriched repeatedly
01:19:54.260 | in mesenchymal stem cell enhancers.
01:19:58.340 | So that gave us a hint as to which of these genetic variants
01:20:02.020 | was likely driving this whole association.
01:20:05.940 | And we ended up with this one genetic variant
01:20:09.340 | called rs1421085.
01:20:13.340 | And that genetic variant out of the 89
01:20:17.340 | was the one that we predicted to be causal for the disease.
01:20:20.780 | So going back to those steps,
01:20:23.180 | first step is figure out the relevant tissue
01:20:25.900 | based on the global enrichment.
01:20:27.740 | Second step is figure out the causal variant
01:20:30.780 | among many variants in this linkage disequilibrium
01:20:34.900 | in this co-inherited block
01:20:37.140 | between these recombination hotspots,
01:20:39.660 | these boundaries of these inherited blocks.
01:20:42.500 | That's the second step.
01:20:43.900 | The third step is once you know that causal variant,
01:20:47.660 | try to figure out what is the motif
01:20:50.020 | that is disrupted by that causal variant.
01:20:52.540 | Basically, how does it act?
01:20:54.140 | Variants don't just disrupt elements,
01:20:56.060 | they disrupt the binding of specific regulators.
01:20:59.500 | So basically the third step there was
01:21:01.020 | how do you find the motif that is responsible,
01:21:04.660 | like the gene regulatory word,
01:21:07.020 | the building block of gene regulation
01:21:09.020 | that is responsible for that dysregulatory event.
01:21:12.420 | And the fourth step is finding out
01:21:13.980 | what regulator normally binds that motif
01:21:17.140 | and is now no longer able to bind.
01:21:19.900 | - And then once you have the regulator,
01:21:21.340 | can you then try to figure out how to,
01:21:23.740 | what, after it developed, how to fix it?
01:21:27.140 | - That's exactly right.
01:21:28.140 | You now know how to intervene.
01:21:30.260 | You have basically a regulator,
01:21:32.140 | you have a gene that you can then perturb.
01:21:34.140 | And you say, well, maybe that regulator
01:21:36.140 | has a global role in obesity.
01:21:38.740 | I can perturb the regulator.
01:21:40.180 | - Just to clarify, when we say perturb,
01:21:43.180 | like on the scale of a human life,
01:21:46.300 | can a human being be helped?
01:21:48.380 | - Of course, of course.
01:21:49.940 | Yeah, so perturb. - I guess understanding
01:21:51.180 | is the first step.
01:21:52.180 | - Exactly.
01:21:53.340 | No, no, but perturbed basically means
01:21:55.020 | you now develop therapeutics,
01:21:56.420 | pharmaceutical therapeutics against that.
01:21:59.140 | Or you develop other types of intervention
01:22:01.140 | that affect the expression of that gene.
01:22:03.660 | - What do pharmaceutical therapeutics look like
01:22:06.780 | when your understanding's on a genetic level?
01:22:11.340 | - Yeah. - Sorry if it's
01:22:12.180 | a dumb question. - No, no, no.
01:22:13.180 | It's a brilliant question,
01:22:14.260 | but I wanna save it for a little bit later
01:22:15.900 | when we start talking about therapeutics.
01:22:17.700 | - Perfect. - We've talked about
01:22:18.620 | the first four steps.
01:22:20.200 | There's two more.
01:22:21.540 | So basically the first step is figure out,
01:22:23.780 | I mean, the zeroth step,
01:22:25.060 | the starting point is the genetics.
01:22:26.740 | The first step after that is figure out
01:22:29.220 | the tissue of action.
01:22:31.060 | The second step is figuring out the nucleotide
01:22:34.300 | that is responsible or set of nucleotides.
01:22:36.900 | The third step is figure out the motif
01:22:38.700 | and the upstream regulator, number four.
01:22:40.800 | Number five and six is what are the targets?
01:22:44.260 | So number five is great.
01:22:45.740 | Now I know the regulator, I know the motif,
01:22:48.260 | I know the tissue, and I know the variant.
01:22:51.380 | What does it actually do?
01:22:53.380 | So you have to now trace it to the biological process
01:22:56.980 | and the genes that mediate that biological process.
01:23:00.420 | So knowing all of this can now allow you
01:23:03.340 | to find the target genes.
01:23:06.260 | By basically doing perturbation experiments
01:23:08.920 | or by looking at the folding of the epigenome
01:23:13.300 | or by looking at the genetic impact
01:23:16.180 | of that genetic variant on the expression of genes.
01:23:19.420 | And we use all three.
01:23:21.500 | So let me go through them.
01:23:22.500 | Basically, one of them is physical links.
01:23:26.300 | This is the folding of the genome onto itself.
01:23:29.780 | How do you even figure out the folding?
01:23:32.100 | It's a little bit of a tangent,
01:23:33.380 | but it's a super awesome technology.
01:23:35.220 | Think of the genome as, again,
01:23:39.120 | this massive packaging that we talked about
01:23:41.440 | of taking two meters worth of DNA
01:23:43.540 | and putting it in something that's a million times smaller
01:23:48.500 | than two meters worth of DNA, that's a single cell.
01:23:51.700 | You basically have this massive packaging,
01:23:53.540 | and this packaging basically leads to the chromosome
01:23:57.060 | being wrapped around in sort of tight ways,
01:24:01.320 | in ways, however, that are functionally capable
01:24:04.100 | of being reopened and reclosed.
01:24:05.940 | So I can then go in and figure out that folding
01:24:11.460 | by sort of chopping up the spaghetti soup,
01:24:14.860 | putting glue, and ligating the segments
01:24:17.900 | that were chopped up but nearby each other,
01:24:20.740 | and then sequencing through these ligation events
01:24:23.660 | to figure out that this region of this chromosome,
01:24:25.940 | that region of the chromosome were near each other,
01:24:28.300 | that means they were interacting,
01:24:30.020 | even though they were far away on the genome itself.
01:24:32.620 | So that chopping up, sequencing, and re-gluing
01:24:37.540 | is basically giving you folds of the genome that we call.
01:24:42.540 | - Sorry, can you backtrack?
01:24:44.580 | - Of course. - How does cutting it
01:24:46.420 | help you figure out which ones were close
01:24:49.220 | in the original folding?
01:24:50.500 | - So you have a bowl of noodles.
01:24:52.440 | - Go on.
01:24:54.460 | - And in that bowl of noodles,
01:24:56.100 | some noodles are near each other.
01:24:59.860 | So throwing a bunch of glue,
01:25:01.560 | you basically freeze the noodles in place,
01:25:05.580 | throwing a cutter that chops up the noodles
01:25:07.500 | into little pieces,
01:25:08.560 | now throwing some ligation enzyme
01:25:13.820 | that lets those pieces that were free
01:25:17.260 | re-ligate near each other.
01:25:19.060 | In some cases, they re-ligate what you had just cut,
01:25:22.740 | but that's very rare.
01:25:24.020 | Most of the time, they will re-ligate
01:25:26.700 | in whatever was proximal.
01:25:28.620 | You now have glued the red noodle
01:25:32.940 | that was crossing the blue noodle to each other.
01:25:35.380 | You then reverse the glue, the glue goes away,
01:25:40.260 | and you just sequence the heck out of it.
01:25:42.860 | Most of the time, you'll find red segment with,
01:25:46.140 | you know, a red segment,
01:25:47.700 | but you can specifically select for ligation events
01:25:50.180 | that have happened that were not from the same segment
01:25:52.500 | by sort of marking them a particular way,
01:25:54.420 | and then selecting those,
01:25:56.540 | and then you sequence and you look for red with blue matches
01:26:00.180 | of sort of things that were glued
01:26:01.860 | that were not immediately proximal to each other,
01:26:04.500 | and that reveals the linking of the blue noodle
01:26:07.780 | and the red noodle.
01:26:08.600 | You're with me so far? - Yeah.
01:26:09.740 | - Good.
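The noodle description above can be sketched in a few lines of Python. This is only an illustrative toy, not the actual analysis pipeline: the function name `contact_counts`, the bin size, and the read positions are all made up for the example. It bins each ligated pair of positions and counts how often two different bins were found glued together, discarding pairs that fall in the same bin (the "red with red" case).

```python
from collections import Counter

def contact_counts(ligated_pairs, bin_size):
    """Count ligation events between different genome bins.

    ligated_pairs: iterable of (pos_a, pos_b) positions on one chromosome
    that were found joined together in a sequenced fragment.
    """
    counts = Counter()
    for pos_a, pos_b in ligated_pairs:
        bin_a, bin_b = sorted((pos_a // bin_size, pos_b // bin_size))
        if bin_a != bin_b:  # skip same-segment re-ligation ("red with red")
            counts[(bin_a, bin_b)] += 1
    return counts

# Hypothetical ligation events (positions in nucleotides).
pairs = [(1_000, 1_200), (5_000, 605_000), (7_500, 608_000), (20_000, 30_000)]
counts = contact_counts(pairs, bin_size=10_000)
```

Here the two reads linking positions near 5 kb to positions near 600 kb land in the same pair of bins, which is the kind of repeated long-range signal that suggests the two regions were physically close in the folded genome.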
01:26:10.580 | So we've done these experiments.
01:26:11.420 | - That's the physical.
01:26:13.340 | - That's the physical.
01:26:14.180 | That's step one of the physical,
01:26:15.660 | and what the physical revealed
01:26:17.180 | is topologically associated domains,
01:26:18.900 | basically big blocks of the genome
01:26:20.820 | that are topologically, you know, connected together.
01:26:23.960 | That's the physical.
01:26:26.260 | The second one is the genetic links.
01:26:29.060 | It basically says, across individuals
01:26:32.660 | that have different genetic variants,
01:26:35.520 | how are their genes expressed differently?
01:26:39.140 | Remember before I was saying
01:26:40.300 | that the path between genetics and disease is enormous,
01:26:43.020 | but we can break it up to look at the path
01:26:44.620 | between genetics and gene expression.
01:26:47.460 | So instead of using Alzheimer's as the phenotype,
01:26:50.220 | I can now use expression of IRX3 as the phenotype,
01:26:54.180 | expression of gene A,
01:26:55.300 | and I can look at all of the humans
01:26:59.380 | who contain a G at that location
01:27:01.260 | and all the humans that contain a T at that location
01:27:03.980 | and basically say, wow,
01:27:05.220 | turns out that the expression of this gene is higher
01:27:07.420 | for the T humans than for the G humans at that location.
01:27:10.580 | So that basically gives me a genetic link
01:27:12.660 | between a genetic variant, a locus, a region,
01:27:16.220 | and the expression of nearby genes.
01:27:18.740 | Good on the genetic link?
01:27:20.980 | - I think so. - Awesome.
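As a rough illustration of the genetic link, the comparison between "T humans" and "G humans" amounts to grouping expression values by allele and comparing the averages. The sketch below assumes one allele label per individual and invented expression numbers; the name `expression_by_allele` is hypothetical, and a real analysis would typically use genotype dosage, covariates, and a statistical test rather than a bare difference of means.

```python
from statistics import mean

def expression_by_allele(genotypes, expression):
    """Average a gene's expression separately for each allele group."""
    groups = {}
    for allele, level in zip(genotypes, expression):
        groups.setdefault(allele, []).append(level)
    return {allele: mean(levels) for allele, levels in groups.items()}

# Hypothetical individuals: allele at the variant, and expression of one gene.
genotypes = ["T", "T", "G", "G", "T", "G"]
expression = [8.0, 9.0, 5.0, 4.0, 10.0, 6.0]
means = expression_by_allele(genotypes, expression)
```

In this made-up data the T group averages higher expression than the G group, which is the pattern described above.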
01:27:22.240 | So the third link is the activity link.
01:27:25.380 | What's an activity link?
01:27:26.420 | It basically says,
01:27:27.260 | if I look across 833 different epigenomes,
01:27:31.060 | whenever this enhancer is active,
01:27:34.180 | this gene is active.
01:27:35.980 | That gives me an activity link
01:27:37.540 | between this region of the DNA and that gene.
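One simple way to picture an activity link, assuming each epigenome has been reduced to an on/off call for the enhancer and for the gene, is to count how often the two calls agree. The function `coactivity` and the eight toy samples below are assumptions made for illustration; the conversation does not specify how the real links are scored.

```python
def coactivity(enhancer_active, gene_active):
    """Fraction of samples in which enhancer and gene share the same on/off state."""
    agree = sum(e == g for e, g in zip(enhancer_active, gene_active))
    return agree / len(enhancer_active)

# Hypothetical on/off calls across eight epigenomes (1 = active, 0 = inactive).
enhancer = [1, 1, 0, 0, 1, 0, 0, 1]
gene = [1, 1, 0, 0, 1, 0, 1, 1]
score = coactivity(enhancer, gene)
```

A score near 1 says the two tend to switch on and off together, which is correlational evidence only.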
01:27:40.840 | And then the fourth one is perturbations
01:27:43.940 | where I can go in and blow up that region
01:27:46.940 | and see what are the genes that change in expression,
01:27:49.340 | or I can go in and over-activate that region
01:27:51.980 | and see what genes change in expression.
01:27:53.980 | - So I guess that's similar to activity?
01:27:57.620 | - Yeah, yeah, so that's basically similar to activity.
01:28:00.380 | I agree, but it's causal rather than correlational.
01:28:03.180 | - Again, I'm a little weird.
01:28:05.380 | - No, no, you're 100% on.
01:28:07.060 | It's exactly the same as activity,
01:28:08.380 | but perturbation. - But the perturbations.
01:28:09.700 | - Where I go and intervene,
01:28:11.500 | I basically take a bunch of cells.
01:28:13.700 | So you know CRISPR, right?
01:28:15.860 | CRISPR is this genome guidance and cutting mechanism.
01:28:20.860 | It's what George Church likes to call genome vandalism.
01:28:24.640 | So you basically are able to, (laughs)
01:28:28.020 | you can basically take a guide RNA
01:28:31.700 | that you put into the CRISPR system,
01:28:34.380 | and the CRISPR system will basically use this guide RNA,
01:28:36.740 | scan the genome, find wherever there's a match,
01:28:39.380 | and then cut the genome.
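The "scan the genome, find wherever there's a match" step can be sketched as a plain string search. This is a deliberate simplification under stated assumptions: the sequences are invented, the helper `find_cut_sites` is hypothetical, only one strand is searched, and only exact matches count. Real CRISPR systems have further requirements (for example an adjacent PAM sequence for Cas9) and can tolerate some mismatches.

```python
def find_cut_sites(genome, guide_rna):
    """Return 0-based start positions where the guide sequence matches the genome."""
    target = guide_rna.upper().replace("U", "T")  # write the RNA guide in DNA letters
    sites = []
    start = genome.find(target)
    while start != -1:
        sites.append(start)
        start = genome.find(target, start + 1)
    return sites

# Hypothetical genome fragment and guide.
genome = "ACGTTGACCTGAAGCTTACGGATCCTGACCTGAAGCAT"
guide = "GACCUGAAGC"
sites = find_cut_sites(genome, guide)
```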
01:28:41.420 | So I digress, but it's a bacterial immune defense system.
01:28:47.500 | So basically bacteria are constantly attacked by viruses,
01:28:50.880 | but sometimes they win against the viruses,
01:28:55.300 | and they chop up these viruses,
01:28:56.820 | and remember as a trophy inside their genome,
01:28:59.820 | they have these loci, these CRISPR loci,
01:29:02.740 | that basically stands for clustered,
01:29:04.460 | repeats, interspersed, et cetera.
01:29:06.380 | So basically it's an interspersed repeats structure,
01:29:10.140 | where basically you have a set of repetitive regions,
01:29:13.220 | and then interspersed were these variable segments
01:29:16.760 | that were basically matching viruses.
01:29:19.540 | So when this was first discovered,
01:29:21.820 | it was basically hypothesized
01:29:23.260 | that this is probably a bacterial immune system
01:29:25.560 | that remembers the trophies of the viruses
01:29:27.940 | that they managed to kill.
01:29:29.060 | And then the bacteria pass on,
01:29:32.100 | you know, they sort of do lateral transfer of DNA,
01:29:34.460 | and they pass on these memories
01:29:36.340 | so that the next bacterium says,
01:29:37.660 | "Ooh, you killed that guy.
01:29:38.940 | "When that guy shows up again, I will recognize him."
01:29:41.580 | And the CRISPR system was basically evolved
01:29:44.020 | as a bacterial adaptive immune response
01:29:47.300 | to sense foreigners that should not belong,
01:29:50.920 | and to just go and cut their genome.
01:29:53.060 | So it's an RNA guided, RNA cutting enzyme,
01:29:57.480 | or an RNA guided DNA cutting enzyme.
01:30:00.220 | So there's different systems.
01:30:02.140 | Some of them cut DNA, some of them cut RNA,
01:30:04.340 | but all of them remember this sort of viral attack.
01:30:09.340 | So what we have done now as a field is, you know,
01:30:13.660 | through the work of, you know, Jennifer Doudna,
01:30:16.700 | Emmanuelle Charpentier, Feng Zhang, and many others,
01:30:19.340 | is co-opted that system of bacterial immune defense
01:30:24.300 | as a way to cut genomes.
01:30:26.460 | You basically have this guiding system
01:30:30.780 | that allows you to use an RNA guide
01:30:33.380 | to bring enzymes to cut DNA at a particular locus.
01:30:37.660 | - That's so fascinating.
01:30:39.140 | So this is like already a natural mechanism,
01:30:41.900 | a natural tool for cutting that was useful
01:30:45.580 | in this particular context.
01:30:47.260 | And we're like, well, we can use that thing to actually,
01:30:49.940 | it's a nice tool that's already in the body.
01:30:51.860 | - Yeah, yeah.
01:30:52.700 | It's not in our body, it's in the bacterial body.
01:30:55.380 | It was discovered by the yogurt industry.
01:30:58.300 | They were trying to make better yogurts,
01:31:01.660 | and they were trying to make their bacteria
01:31:03.560 | in their yogurt cultures more resilient to viruses.
01:31:06.800 | And they were studying bacteria,
01:31:10.060 | and they found that, wow, this CRISPR system is awesome.
01:31:12.420 | It allows you to defend against that.
01:31:14.700 | And then it was co-opted in mammalian systems
01:31:16.920 | that don't use anything like that as a targeting way
01:31:21.380 | to basically bring these DNA-cutting enzymes
01:31:23.980 | to any locus in the genome.
01:31:25.740 | Why would you want to cut DNA to do anything?
01:31:29.580 | The reason is that our DNA has a DNA repair mechanism,
01:31:33.860 | where if a region of the genome gets randomly cut,
01:31:36.580 | you will basically scan the genome for anything that matches
01:31:40.180 | and sort of use it by homology.
01:31:43.420 | So the reason why we're diploid
01:31:45.140 | is because we now have a spare copy.
01:31:47.200 | As soon as my mom's copy is deactivated,
01:31:49.220 | I can use my dad's copy.
01:31:50.580 | And somewhere else, if my dad's copy is deactivated,
01:31:52.540 | I can use my mom's copy to repair it.
01:31:55.180 | So this is called homology-directed repair.
01:31:58.640 | - So all you have to do is the cutting,
01:32:02.020 | and you don't have to do the fixing.
01:32:04.020 | - That's exactly right.
01:32:04.860 | You don't have to do the fixing.
01:32:06.140 | - 'Cause it's already built in.
01:32:07.220 | - That's exactly right.
01:32:08.400 | But the fixing can be co-opted
01:32:10.780 | by throwing in a bunch of homologous segments
01:32:14.340 | that instead of having your dad's version,
01:32:16.780 | have whatever other version you'd like to use.
01:32:20.540 | - So you then control the fixing
01:32:22.460 | by throwing in a bunch of other stuff.
01:32:23.980 | - That's exactly right.
01:32:24.820 | And that's how you do genome editing.
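A toy version of "cut, then let the repair copy from whatever matching template you supply" might look like the following. Everything here is an assumption made for illustration: the function `repair_with_donor`, the four-letter homology arms, and the sequences. The donor carries a left and right arm that match the genome on either side of the cut, with the desired new sequence in between.

```python
def repair_with_donor(genome, cut_pos, donor, arm=4):
    """Replace the region around a cut with a donor whose ends match the flanks."""
    left_arm, right_arm = donor[:arm], donor[-arm:]
    left = genome.rfind(left_arm, 0, cut_pos)   # nearest matching flank before the cut
    right = genome.find(right_arm, cut_pos)     # nearest matching flank after the cut
    if left == -1 or right == -1:
        return genome  # no homology found, so this toy model leaves the genome as-is
    return genome[:left] + donor + genome[right + arm:]

# Hypothetical sequences: the arms GACC and ATTC flank the new insert TTTT.
genome = "TTGACCATGGCATTCA"
donor = "GACC" + "TTTT" + "ATTC"
edited = repair_with_donor(genome, cut_pos=8, donor=donor)
```

The stretch between the two arms is swapped for the donor's middle segment, which is the "Band-Aid with the sequence you'd like" idea in miniature.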
01:32:26.380 | - So that's what CRISPR is.
01:32:27.780 | - That's what CRISPR is.
01:32:28.620 | - In popular culture, people use the term.
01:32:30.900 | I've never, wow, that's brilliant.
01:32:32.540 | That's just an awesome explanation.
01:32:34.340 | - Genome vandalism followed by a bunch of Band-Aids
01:32:38.420 | that have the sequence that you'd like.
01:32:39.900 | - And you can control the choices of Band-Aids.
01:32:42.740 | - Correct.
01:32:43.660 | And of course, there's new generations of CRISPR.
01:32:46.300 | There's something that's called prime editing
01:32:48.400 | that was sort of very, very much in the press recently,
01:32:51.860 | that basically instead of sort of making
01:32:53.160 | a double-stranded break, which again is genome vandalism,
01:32:57.240 | you basically make a single-stranded break.
01:33:00.800 | You basically just nick one of the two strands,
01:33:03.560 | enabling you to sort of peel off
01:33:06.180 | without sort of completely breaking it up,
01:33:08.740 | and then repair it locally using a guide
01:33:11.820 | that is coupled to your initial RNA
01:33:16.000 | that took you to that location.
01:33:18.440 | - Dumb question, but is CRISPR as awesome
01:33:22.200 | and cool as it sounds?
01:33:23.900 | I mean, technically speaking, in terms of like,
01:33:27.760 | as a tool for manipulating our genetics
01:33:31.840 | in the positive meaning of the word manipulating,
01:33:35.920 | or is there downsides, drawbacks,
01:33:38.680 | in this whole context of therapeutics
01:33:40.360 | that we're talking about, or understanding and so on?
01:33:42.840 | - So when I teach my students about CRISPR,
01:33:46.580 | I show them articles with the headline,
01:33:49.480 | Genome Editing Tool Revolutionizes Biology,
01:33:53.000 | and then I show them the date of these articles,
01:33:55.440 | and they're 2004, like five years
01:33:57.920 | before CRISPR was invented.
01:33:59.320 | And the reason is that they're not talking about CRISPR.
01:34:02.260 | They're talking about zinc finger enzymes
01:34:04.680 | that are another way to bring these cutters to the genome.
01:34:08.880 | It's a very difficult way of sort of designing
01:34:12.040 | the right set of zinc finger proteins,
01:34:13.800 | the right set of amino acids
01:34:15.480 | that will now target a particular long stretch of DNA,
01:34:19.160 | because for every location that you want to target,
01:34:22.160 | you need to design a particular regulator,
01:34:25.360 | a particular protein that will match that region well.
01:34:29.200 | There's another technology called TALENs,
01:34:31.760 | which are basically just a different way of using proteins
01:34:36.760 | to sort of guide these cutters
01:34:39.300 | to a particular location in the genome.
01:34:41.140 | These require a massive team of engineers,
01:34:44.620 | of biological engineers,
01:34:45.680 | to basically design a set of amino acids
01:34:48.100 | that will target a particular sequence of your genome.
01:34:51.420 | The reason why CRISPR is amazingly, awesomely revolutionary
01:34:55.720 | is because instead of having this team of engineers
01:34:58.760 | design a new set of proteins
01:35:00.640 | for every locus that you want to target,
01:35:02.560 | you just type it in your computer,
01:35:04.560 | and you just synthesize an RNA guide.
01:35:06.740 | The beauty of CRISPR is not the cutting.
01:35:10.000 | It's not the fixing.
01:35:11.000 | All of that was there before.
01:35:12.800 | It's the guiding, and the only thing that changes
01:35:15.860 | is that it makes the guiding easier
01:35:17.980 | by sort of just typing in the RNA sequence,
01:35:21.940 | which then allows the system to sort of scan the DNA
01:35:24.860 | to find that.
01:35:25.860 | - So the coding, the engineering of the cutter is easier
01:35:29.740 | in terms of SV. - Exactly.
01:35:32.180 | - That's kind of similar to the story of deep learning
01:35:34.260 | versus old school machine learning,
01:35:36.660 | is some of the challenging parts are automated.
01:35:39.540 | Okay, so, but CRISPR's just one cutting technology.
01:35:44.540 | And then there's, that's part of the challenges
01:35:47.120 | and exciting opportunities of the field
01:35:49.600 | is to design different cutting technologies.
01:35:52.960 | - So now, this was a big parenthesis on CRISPR,
01:35:56.080 | but now, when we were talking about perturbations,
01:36:00.760 | you basically now have the ability
01:36:02.380 | to not just look at correlation between enhancers and genes,
01:36:05.800 | but actually go and either destroy that enhancer
01:36:09.900 | and see if the gene changes in expression,
01:36:11.800 | or you can use the CRISPR targeting system
01:36:15.260 | to bring in not vandalism and cutting,
01:36:19.860 | but you can couple the CRISPR system with,
01:36:24.500 | and the CRISPR system is called usually CRISPR-Cas9
01:36:27.500 | because Cas9 is the protein that will then come and cut.
01:36:30.760 | But there's a version of that protein called dead Cas9
01:36:34.580 | where the cutting part is deactivated.
01:36:36.700 | So you basically use dCas9, dead Cas9,
01:36:39.640 | to bring in an activator or to bring in a repressor.
01:36:44.640 | So you can now ask, is this enhancer changing that gene?
01:36:48.640 | By taking this modified CRISPR,
01:36:51.860 | which is already modified from the bacteria
01:36:53.580 | to be used in humans,
01:36:54.740 | that you can now modify the Cas9 to be dead Cas9,
01:36:57.700 | and you can now further modify to bring in a regulator,
01:37:00.960 | and you can basically turn on or turn off that enhancer
01:37:03.980 | and then see what is the impact on that gene.
01:37:06.580 | So these are the four ways of linking the locus
01:37:10.100 | to the target gene, and that's step number five.
01:37:12.700 | Step number five is find the target gene,
01:37:16.100 | and step number six is what the heck does that gene do?
01:37:19.540 | You basically now go and manipulate that gene
01:37:22.180 | to basically see what are the processes that change.
01:37:26.620 | And you can basically ask, well, in this particular case,
01:37:31.060 | in the FTO locus, we found mesenchymal stem cells
01:37:34.660 | that are the progenitors of white fat and brown fat,
01:37:38.100 | or beige fat.
01:37:39.500 | We found the rs1421085 nucleotide variant
01:37:43.020 | as the causal variant.
01:37:44.820 | We found this large enhancer, this master regulator.
01:37:49.820 | I like to call it OB1 for obesity one,
01:37:53.300 | like the strongest enhancer associated with it.
01:37:55.620 | And OB1 was kind of chubby as the actor,
01:37:57.180 | I don't know if you remember him.
01:37:58.500 | (laughing)
01:38:00.420 | - Yeah.
01:38:01.260 | - So you basically are using this Jedi mind trick
01:38:02.980 | to basically find out the--
01:38:04.580 | - Thank you.
01:38:05.420 | (laughing)
01:38:06.300 | - The location of the genome that is responsible,
01:38:09.100 | the enhancer that harbors it, the motif,
01:38:12.540 | the upstream regulator, which is ARID5B,
01:38:15.540 | for AT-rich interacting domain 5B.
01:38:18.100 | That's a protein that sort of comes and binds normally.
01:38:20.960 | That protein is normally a repressor.
01:38:23.100 | It represses this super enhancer,
01:38:25.060 | this massive 12,000 nucleotide
01:38:26.900 | master regulatory control region,
01:38:28.740 | and it turns off IRX3,
01:38:32.500 | which is a gene that's 600,000 nucleotides away,
01:38:34.940 | and IRX5, which is 1.2 million nucleotides away.
01:38:38.340 | So those--
01:38:39.180 | - And what's the effect of turning them off?
01:38:40.620 | - That's exactly the next question.
01:38:42.260 | So step six is, what do these genes actually do?
01:38:45.420 | So we then ask, what does IRX3 and IRX5 do?
01:38:48.580 | The first thing we did is look across individuals
01:38:50.940 | for individuals that had higher expression of IRX3
01:38:53.740 | or lower expression of IRX3.
01:38:55.380 | And then we looked at the expression
01:38:56.700 | of all of the other genes in the genome.
01:38:58.860 | And we looked for simply correlation.
01:39:01.500 | And we found that IRX3 and IRX5
01:39:03.500 | were both correlated positively with lipid metabolism
01:39:08.340 | and negatively with mitochondrial biogenesis.
01:39:12.680 | You're like, what the heck does that mean?
01:39:16.300 | - Doesn't sound related to obesity.
01:39:17.940 | - Not at all, superficially.
01:39:20.320 | But lipid metabolism should,
01:39:22.840 | because lipids are these high energy molecules
01:39:26.620 | that basically store fat.
01:39:28.420 | So IRX3 and IRX5 are negatively correlated
01:39:32.380 | with lipid metabolism.
01:39:33.700 | So that basically means that when they turn on,
01:39:36.100 | lipid metabolism, or positively,
01:39:37.740 | when they turn on, they turn on lipid metabolism.
01:39:41.020 | And they're negatively correlated
01:39:42.460 | with mitochondrial biogenesis.
01:39:45.820 | What do mitochondria do in this whole process?
01:39:49.220 | Again, small parenthesis, what are mitochondria?
01:39:51.740 | Mitochondria are little organelles.
01:39:56.140 | They arose, they only are found in eukaryotes.
01:40:00.860 | Eu means good, karyote means nucleus.
01:40:03.900 | So truly, like a true nucleus.
01:40:05.820 | So eukaryotes have a nucleus.
01:40:07.740 | Prokaryotes are before the nucleus.
01:40:09.900 | They don't have a nucleus.
01:40:11.200 | So eukaryotes have a nucleus.
01:40:13.500 | Hmm, compartmentalization.
01:40:15.180 | Eukaryotes have also organelles.
01:40:18.620 | Some eukaryotes have chloroplasts.
01:40:22.700 | These are the plants, they photosynthesize.
01:40:25.860 | Some other eukaryotes, like us,
01:40:28.580 | have another type of organelle called mitochondria.
01:40:33.140 | These arose from an ancient species that we engulfed.
01:40:38.140 | This is an endosymbiosis event.
01:40:43.660 | Symbiosis, bio means life, sym means together.
01:40:47.260 | So symbiotes are things that live together.
01:40:49.860 | Endosymbiosis, endo means inside,
01:40:51.980 | so endosymbiosis means you live together,
01:40:53.780 | holding the other one inside you.
01:40:55.980 | So the pre-eukaryotes engulfed an organism
01:41:00.980 | that was very good at energy production,
01:41:05.620 | and that organism eventually shed most of its genome
01:41:11.140 | to now have only 13 genes in the mitochondrial genome.
01:41:14.740 | And those 13 genes are all involved in energy production,
01:41:21.060 | the electron transport chain.
01:41:23.340 | So basically, electrons are these massive,
01:41:25.620 | super energy-rich molecules.
01:41:27.280 | We basically have these organelles that produce energy,
01:41:33.340 | and when your muscle exercises,
01:41:35.700 | you basically multiply your mitochondria.
01:41:37.780 | You basically sort of use more and more mitochondria,
01:41:41.900 | and that's how you get beefed up.
01:41:43.740 | So basically, the muscle sort of learns
01:41:45.740 | how to generate more energy.
01:41:47.820 | So basically, every single time,
01:41:49.180 | your muscles will, you know, overnight regenerate
01:41:51.580 | and sort of become stronger
01:41:52.460 | and amplify their mitochondria and so forth.
01:41:55.180 | So what do the mitochondria do?
01:41:56.460 | The mitochondria use energy to sort of do any kind of task.
01:42:01.460 | When you're thinking, you're using energy.
01:42:04.920 | This energy comes from mitochondria.
01:42:06.740 | Your neurons have mitochondria all over the place.
01:42:09.800 | Basically, these mitochondria can multiply as organelles,
01:42:12.180 | and they can be spread along the body of your muscle.
01:42:14.900 | Some of your muscle cells have actually multiple nuclei.
01:42:17.340 | They're polynucleated,
01:42:18.500 | but they also have multiple mitochondria
01:42:20.380 | to basically deal with the fact
01:42:22.580 | that your muscle is enormous.
01:42:24.340 | You can sort of span this super, super long length,
01:42:26.780 | and you need energy throughout the length of your muscle.
01:42:29.320 | So that's why you have mitochondria throughout the length,
01:42:31.420 | and you also need transcription through the length,
01:42:32.820 | so you have multiple nuclei as well.
01:42:35.000 | So these two processes, lipids store energy.
01:42:40.000 | What do mitochondria do?
01:42:42.020 | So there's a process known as thermogenesis,
01:42:45.660 | thermal heat genesis generation.
01:42:48.020 | Thermogenesis is the generation of heat.
01:42:50.380 | Remember that bathtub with in and out?
01:42:55.060 | That's the equation that everybody's focused on.
01:42:57.100 | So how much energy do you consume?
01:42:58.800 | How much energy do you burn?
01:43:00.920 | But in every thermodynamic system,
01:43:03.280 | there's three parts to the equation.
01:43:05.940 | There's energy in, energy out, and energy lost.
01:43:10.780 | Any machine has loss of energy.
01:43:14.580 | How do you lose energy?
01:43:15.660 | You emanate heat.
01:43:17.540 | So heat is energy loss.
01:43:18.980 | So there's--
01:43:24.740 | - Which is where the thermogenesis comes in.
01:43:26.660 | - Thermogenesis is actually a regulatory process
01:43:30.140 | that modulates the third component
01:43:32.100 | of the thermodynamic equation.
01:43:33.940 | You can basically control thermogenesis explicitly.
01:43:37.180 | You can turn on and turn off thermogenesis.
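The three-part bookkeeping described here can be written as a one-line balance. The sketch below is a simplification with invented numbers; `stored_energy_change` is a hypothetical helper and the units are arbitrary. Whatever comes in and is neither used for work nor dissipated as heat ends up stored.

```python
def stored_energy_change(energy_in, energy_out, heat_lost):
    """Energy left over for storage after work done and heat dissipated."""
    return energy_in - energy_out - heat_lost

# Hypothetical daily budgets, same intake and activity, with and without heat dissipation.
with_thermogenesis = stored_energy_change(2500, 2000, 300)
without_thermogenesis = stored_energy_change(2500, 2000, 0)
```

With the same intake and activity, switching the heat term off leaves more energy to store, which is the role thermogenesis plays in the argument that follows.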
01:43:39.340 | - And that's where the mitochondria comes into play.
01:43:40.980 | - Exactly.
01:43:41.820 | So IRX3 and IRX5 turn out to be the master regulators
01:43:46.280 | of a process of thermogenesis
01:43:49.060 | versus lipogenesis, generation of fat.
01:43:52.300 | So IRX3 and IRX5 in most people burn heat,
01:43:56.940 | burn calories as heat.
01:43:58.660 | So when you eat too much,
01:43:59.820 | just burn it off in your fat cells.
01:44:02.660 | So that bathtub has basically a sort of dissipation knob
01:44:07.660 | that most people are able to turn on.
01:44:11.200 | I am unable to turn that on
01:44:13.880 | because I am a homozygous carrier
01:44:16.340 | for the mutation that changes a T into a C
01:44:20.420 | in the rs1421085 allele, a locus, a SNP.
01:44:24.620 | I have the risk allele twice,
01:44:26.940 | from my mom and from my dad.
01:44:28.240 | So I'm unable to thermogenize.
01:44:30.760 | I'm unable to turn on thermogenesis through IRX3 and IRX5
01:44:35.220 | because the regulator that normally binds here,
01:44:38.100 | ARID5B, can no longer bind
01:44:39.780 | because it's an AT-rich interacting domain.
01:44:42.620 | And as soon as I change the T into a C,
01:44:44.720 | it can no longer bind because it's no longer AT-rich.
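To make the "no longer AT-rich" point concrete, here is a small sketch that measures the A/T fraction of a short window before and after a T-to-C change. The window sequence is invented and is not the real sequence around rs1421085, and `at_fraction` is a hypothetical helper. Real binding predictions use position-specific motif models rather than a simple A/T count.

```python
def at_fraction(seq):
    """Fraction of bases in the sequence that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

# Hypothetical binding window; the variant changes the T at index 3 into a C.
ref = "ATTTAATTA"
risk = ref[:3] + "C" + ref[4:]

ref_score = at_fraction(ref)
risk_score = at_fraction(risk)
```

The risk version scores lower than the reference, a crude stand-in for the weakened match to an AT-rich binding preference.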
01:44:47.940 | - But doesn't that mean
01:44:48.780 | that you're able to use the energy more efficiently?
01:44:51.700 | You're not generating heat?
01:44:53.580 | Or is that--
01:44:54.420 | - That means I can eat less and get around just fine.
01:44:56.940 | - Yes.
01:44:57.780 | - Yeah, so--
01:44:58.620 | - That's a feature, actually.
01:44:59.820 | - It's a feature in a food-scarce environment.
01:45:02.140 | - Yeah, but--
01:45:02.980 | - If we're all starving, I'm doing great.
01:45:05.080 | If we all have access to massive amounts of food,
01:45:07.140 | I'm obese, basically.
01:45:08.980 | - That's taken us through the entire process
01:45:11.040 | of then understanding why mitochondria and then the lipids
01:45:15.540 | are both, even though distant, are somehow involved.
01:45:18.420 | - Different sides of the same coin.
01:45:20.580 | You basically choose to store energy
01:45:22.460 | or you can choose to burn energy.
01:45:23.860 | - And that all of that is involved in the puzzle of obesity.
01:45:27.640 | - And that's what's fascinating, right?
01:45:29.580 | Here we are in 2007,
01:45:31.460 | discovering the strongest genetic association with obesity
01:45:34.580 | and knowing nothing about how it works for almost 10 years.
01:45:39.360 | For 10 years, everybody focused on this FTO gene.
01:45:42.660 | And they were like, "Oh, it must have to do something
01:45:44.620 | "with RNA modification."
01:45:48.140 | And it's like, "No, it has nothing to do
01:45:49.280 | "with the function of FTO.
01:45:50.580 | "It has everything to do with all of this other process."
01:45:53.700 | And suddenly, the moment you solve that puzzle,
01:45:56.420 | which is a multi-year effort, by the way,
01:45:58.380 | a tremendous effort by Melina and many, many others.
01:46:01.780 | So this tremendous effort basically led us
01:46:04.180 | to recognize this circuitry.
01:46:07.100 | You went from having some 89 common variants associated
01:46:10.860 | in that region of the DNA, sitting on top of this gene,
01:46:13.800 | to knowing the whole circuitry.
01:46:16.040 | When you know the circuitry, you can now go crazy.
01:46:21.060 | You can now start intervening at every level.
01:46:24.420 | You can start intervening at the ARID5B level.
01:46:27.180 | You can start intervening with CRISPR-Cas9
01:46:28.980 | at the single SNP level.
01:46:31.200 | You can start intervening at IRX3 and IRX5, directly there.
01:46:34.760 | You can start intervening at the thermogenesis level
01:46:36.900 | because you know the pathway.
01:46:38.340 | You can start intervening at the differentiation level,
01:46:41.500 | where the decision to make either white fat or beige fat,
01:46:46.500 | the energy-burning beige fat,
01:46:47.980 | is made developmentally in the first three days
01:46:51.900 | of differentiation of your adipocytes.
01:46:53.940 | So as they're differentiating, you basically can choose
01:46:56.220 | to make fat-burning machines or fat-storing machines,
01:46:59.180 | and sort of that's how you populate your fat.
01:47:02.180 | You basically can now go in pharmaceutically
01:47:04.460 | and do all of that.
01:47:05.560 | And in our paper, we actually did all of that.
01:47:09.300 | We went in and manipulated every single aspect.
01:47:12.220 | At the nucleotide level,
01:47:13.620 | we use CRISPR-Cas9 genome editing
01:47:16.080 | to basically take primary adipocytes
01:47:18.220 | from risk and non-risk individuals,
01:47:20.660 | and show that by editing that one nucleotide
01:47:23.720 | out of 3.2 billion nucleotides in the human genome,
01:47:26.580 | you could then flip between an obese phenotype
01:47:29.660 | and a lean phenotype like a switch.
01:47:31.420 | You can basically take my cells that are non-thermogenizing
01:47:34.900 | and just flip into thermogenizing cells
01:47:36.580 | by changing one nucleotide.
01:47:38.540 | It's mind-boggling.
01:47:39.900 | - It's so inspiring that this puzzle
01:47:41.920 | could be solved in this way,
01:47:43.140 | and it feels within reach to then be able
01:47:46.020 | to crack the problem of some of these diseases.
01:47:50.380 | What are, so it's 2007 you mentioned,
01:47:54.260 | what are the technologies, the tools that came along
01:47:58.100 | that made this possible?
01:47:59.780 | Like what are you excited about,
01:48:01.780 | maybe if we just look at the buffet of things
01:48:03.620 | that you've kind of mentioned.
01:48:05.340 | Is there, what's involved, what should we be excited about,
01:48:09.460 | what are you excited about?
01:48:11.540 | - I love that question because there's so much ahead of us.
01:48:13.940 | There's so, so much.
01:48:15.180 | There's, so basically solving that one locus
01:48:20.980 | required massive amounts of knowledge
01:48:23.740 | that we have been building across the years
01:48:25.500 | through the epigenome, through the comparative genomics
01:48:28.320 | to find out the causal variant
01:48:29.820 | and the controller regulatory motif
01:48:33.380 | through the conserved circuitry.
01:48:35.260 | It required knowing this regulatory genomic wiring.
01:48:38.500 | It required Hi-C of these sort of topologically
01:48:41.500 | associated domains to basically find
01:48:42.940 | this long range interaction.
01:48:44.500 | It required eQTLs of this sort of genetic perturbation
01:48:48.460 | of these intermediate gene phenotypes.
01:48:51.060 | It required all of the arsenal of tools
01:48:53.200 | that I've been describing was put together for one locus.
01:48:57.060 | And this was a massive team effort,
01:48:59.380 | huge investment in time, energy, money,
01:49:03.980 | effort, intellectual, everything.
01:49:06.180 | - You're referring to, I'm sorry,
01:49:08.380 | just for the obesity one. - This one paper, yeah.
01:49:09.860 | This one paper. - This one single paper.
01:49:11.460 | - This one single locus.
01:49:12.500 | I like to say that this is a paper
01:49:14.340 | about one nucleotide in the human genome,
01:49:16.460 | about one bit of information,
01:49:17.940 | C versus T in the human genome.
01:49:20.400 | That's one bit of information
01:49:21.660 | and we have 3.2 billion nucleotides to go through.
01:49:25.240 | So how do you do that systematically?
01:49:29.240 | - I am so excited about the next phase of research
01:49:32.320 | because the technologies that my group
01:49:34.940 | and many other groups have developed
01:49:36.540 | allows us to now do this systematically,
01:49:39.040 | not just one locus at a time,
01:49:41.540 | but thousands of loci at a time.
01:49:44.980 | So let me describe some of these technologies.
01:49:47.920 | The first one is automation and robotics.
01:49:52.300 | So basically, we talked about how you can take
01:49:56.420 | all of these molecules and see
01:49:58.240 | which of these molecules are targeting each of these genes
01:50:00.660 | and what do they do.
01:50:02.100 | So you can basically now screen
01:50:03.420 | through millions of molecules,
01:50:06.300 | through thousands and thousands and thousands of plates,
01:50:09.140 | each of which has thousands and thousands
01:50:10.820 | and thousands of molecules,
01:50:12.700 | every single time testing all of these genes
01:50:17.500 | and asking which of these molecules perturb these genes.
01:50:21.960 | So that's technology number one, automation and robotics.
01:50:25.420 | Technology number two is parallel readouts.
01:50:29.200 | So instead of perturbing one locus
01:50:31.080 | and then asking if I use CRISPR-Cas9 on this enhancer
01:50:35.920 | to basically use dCas9 to turn on or turn off the enhancer,
01:50:39.720 | or if I use CRISPR-Cas9 on the SNP
01:50:42.140 | to basically change that one SNP at a time,
01:50:44.780 | then what happens?
01:50:46.520 | But we have 120,000 disease-associated SNPs
01:50:51.160 | that we wanna test.
01:50:52.360 | We don't wanna spend 120,000 years doing it.
01:50:55.620 | So what do we do?
01:50:58.820 | We basically develop this technology
01:51:01.300 | for massively parallel reporter assays, MPRA.
01:51:06.300 | So in collaboration with Tarjei Mikkelsen, Eric Lander,
01:51:09.980 | I mean Jay Shendure's group has done a lot of that.
01:51:12.020 | So there's a lot of groups
01:51:13.460 | that basically have developed technologies
01:51:15.980 | for testing 10,000 genetic variants at a time.
01:51:21.340 | How do you do that?
01:51:22.280 | We talked about microarray technology,
01:51:26.040 | the ability to synthesize these huge microarrays
01:51:29.120 | that allow you to do all kinds of things
01:51:30.720 | like measure gene expression by hybridization,
01:51:33.740 | by measuring the genotype of a person,
01:51:36.260 | by looking at hybridization with one version with a T
01:51:38.680 | versus the other version with a C,
01:51:41.560 | and then sort of figuring out
01:51:42.640 | that I am a risk carrier for obesity
01:51:45.560 | based on these hybridization,
01:51:47.380 | differential hybridization in my genome that says,
01:51:49.240 | oh, you seem to only have this allele
01:51:51.120 | or you seem to have that allele.
01:51:52.800 | Microarrays can also be used
01:51:54.340 | to systematically synthesize small fragments of DNA.
01:51:59.340 | So you can basically synthesize
01:52:01.360 | these 150 nucleotide long fragments
01:52:03.840 | across 450,000 spots at a time.
01:52:08.380 | You can now take the result of that synthesis,
01:52:14.120 | which basically works through all of these sort of layers
01:52:16.760 | of adding one nucleotide at a time.
01:52:18.680 | You can basically just type it into your computer
01:52:20.600 | and order it, and you can basically order 10,000
01:52:25.120 | or 100,000 of these small DNA segments at a time.
01:52:30.560 | And that's where awesome molecular biology comes in.
01:52:33.360 | You can basically take all these segments,
01:52:35.280 | have a common start and end barcode or sort of ligator,
01:52:39.960 | like just like pieces of a puzzle,
01:52:42.040 | you can make the same end piece
01:52:44.600 | and the same start piece for all of them.
01:52:47.920 | And you can now use plasmids,
01:52:50.760 | which are these extra chromosomal,
01:52:53.280 | small DNA circular segments
01:52:56.560 | that are basically inhabiting all our genomes.
01:53:00.520 | We basically have plasmids floating around.
01:53:03.040 | I mean, bacteria use plasmids for transferring DNA,
01:53:06.840 | and that's where they put a lot
01:53:07.940 | of antibiotic resistance genes.
01:53:10.640 | So they can easily transfer them
01:53:12.080 | from one bacterium to the other.
01:53:14.120 | So one bacterium evolves a gene to be resistant
01:53:17.040 | to a particular antibiotic.
01:53:19.680 | It basically says to all its friends,
01:53:20.880 | "Hey, here's that sort of DNA piece."
01:53:24.680 | We can now co-opt these plasmids into human cells.
01:53:28.400 | You can basically make a human cell culture
01:53:30.900 | and add plasmids to that human cell culture
01:53:34.080 | that contain the things that you want to test.
01:53:38.080 | You now have this library of 450,000 elements.
01:53:41.240 | You can insert them each into the common plasmid
01:53:45.280 | and then test them in millions of cells in parallel.
01:53:48.280 | - And the common plasmid is all the same
01:53:50.160 | before you add any-- - Exactly.
01:53:51.520 | The rest of the plasmid is the same.
01:53:53.220 | So it's called an episomal reporter assay.
01:53:57.240 | Episome means not inside the genome.
01:53:59.640 | It's sort of outside the chromosomes.
01:54:01.520 | So it's an episomal assay
01:54:03.860 | that allows you to have a variable region
01:54:05.720 | where you basically test 10,000 different enhancers,
01:54:09.200 | and you have a common region
01:54:10.760 | which basically has the same reporter gene.
01:54:13.680 | You now can do some very cool molecular biology.
01:54:16.560 | You can basically take the 450,000 elements
01:54:19.360 | that you've generated,
01:54:20.760 | and you have a piece of the puzzle here,
01:54:22.720 | piece of the puzzle here, which is identical,
01:54:24.360 | so they're compatible with that plasmid.
01:54:27.020 | You can chop them up in the middle
01:54:28.740 | to separate a barcode reporter from the enhancer,
01:54:32.600 | and in the middle, put the same gene,
01:54:34.360 | again, using the same pieces of the puzzle.
01:54:36.840 | You now can have a barcode readout
01:54:39.860 | of what is the impact of 10,000 different versions
01:54:43.080 | of an enhancer on gene expression.
01:54:45.560 | So we're not doing one experiment.
01:54:47.880 | We're doing 10,000 experiments.
01:54:49.560 | And those 10,000 can be 5,000 of different loci,
01:54:55.480 | and each of them in two versions, risk or non-risk.
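The barcode readout described here can be sketched in a few lines. This is a minimal illustration, not the analysis used in the paper: it assumes each construct's activity is estimated as the ratio of RNA barcode counts to DNA (plasmid) barcode counts, and that the risk and non-risk versions of a locus are then compared. The function names, the pseudocount, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def mpra_activity(dna_counts, rna_counts, pseudocount=1.0):
    """Estimate activity per construct as (RNA + pseudocount) / (DNA + pseudocount).

    The pseudocount is an illustrative choice to avoid division by zero.
    """
    activity = {}
    for construct, dna in dna_counts.items():
        rna = rna_counts.get(construct, 0)
        activity[construct] = (rna + pseudocount) / (dna + pseudocount)
    return activity


def allele_effects(activity):
    """For each locus, return activity(risk) / activity(non_risk)."""
    by_locus = defaultdict(dict)
    for (locus, allele), value in activity.items():
        by_locus[locus][allele] = value
    return {
        locus: alleles["risk"] / alleles["non_risk"]
        for locus, alleles in by_locus.items()
        if "risk" in alleles and "non_risk" in alleles
    }


# Hypothetical toy counts: two loci, each in a risk and a non-risk version.
dna = {
    ("locus1", "risk"): 100, ("locus1", "non_risk"): 100,
    ("locus2", "risk"): 80, ("locus2", "non_risk"): 120,
}
rna = {
    ("locus1", "risk"): 50, ("locus1", "non_risk"): 200,
    ("locus2", "risk"): 80, ("locus2", "non_risk"): 120,
}

effects = allele_effects(mpra_activity(dna, rna))
for locus, ratio in sorted(effects.items()):
    print(locus, round(ratio, 3))
```

In this toy data the risk version of `locus1` drives much less expression than the non-risk version, while `locus2` shows no allelic difference. A real analysis would also need replicates and a statistical test.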
01:55:00.360 | I can now test tens of thousands--
01:55:02.040 | - These are little hypotheses.
01:55:03.600 | - Exactly.
01:55:04.440 | - And then you can do 10,000, and wait--
01:55:06.840 | - You can test 10,000 hypotheses at once.
01:55:08.880 | - How hard is it to generate those 10,000?
01:55:12.240 | - Trivial, trivial.
01:55:13.800 | - But it's biology.
01:55:14.920 | - No, no, generating the 10,000 is trivial
01:55:16.880 | because you basically add, it's by technology.
01:55:20.680 | You basically have these arrays
01:55:22.720 | that add one nucleotide at a time at every spot.
01:55:26.480 | - So it's printing, and so you're able to control.
01:55:29.880 | - Yeah.
01:55:31.400 | - Super costly, is it?
01:55:33.160 | - 10,000 bucks.
01:55:34.360 | - So this is in millions?
01:55:35.800 | - 10,000 bucks for 10,000 experiments.
01:55:37.360 | Sounds like the right, you know.
01:55:39.640 | - I mean, so that's super, that's exciting,
01:55:42.040 | 'cause you don't have to do one thing at a time.
01:55:44.000 | - You can now use that technology,
01:55:45.440 | these massively parallel reporter assays,
01:55:47.320 | to test 10,000 locations at a time.
01:55:49.880 | We've made multiple modifications to that technology.
01:55:54.880 | One was Sharpr-MPRA, which stands for, you know,
01:55:59.880 | basically getting a higher resolution view
01:56:04.600 | by tiling these elements.
01:56:09.480 | So you can see where along the region of control
01:56:14.480 | are they acting.
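Tiling can be pictured as sliding a fixed-length window across a regulatory region so that neighboring fragments overlap. The sketch below uses the 150-nucleotide fragment length mentioned earlier in the conversation. The 5-nucleotide step and the function name are assumptions for illustration only, and the step that infers which positions drive activity from the overlapping tiles is not shown.

```python
def tile_region(sequence, tile_len=150, step=5):
    """Return (start, fragment) pairs for overlapping windows across a region."""
    if len(sequence) <= tile_len:
        return [(0, sequence)]
    return [
        (start, sequence[start:start + tile_len])
        for start in range(0, len(sequence) - tile_len + 1, step)
    ]


# Hypothetical 200-nucleotide region.
region = "ACGT" * 50
tiles = tile_region(region)
print(len(tiles), "tiles; last tile starts at", tiles[-1][0])
```

Because adjacent tiles differ by only a few nucleotides, a change in measured activity between neighboring tiles points to the positions that were gained or lost.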
01:56:16.000 | And we made another modification called HiDRA
01:56:18.440 | for high, you know, definition, regulatory annotation
01:56:23.440 | or something like that,
01:56:25.360 | which basically allows you to test 7 million of these
01:56:29.880 | at a time by sort of cutting them directly from the DNA.
01:56:32.800 | So instead of synthesizing,
01:56:34.320 | which basically has the limit of 450,000
01:56:36.720 | that you can synthesize at a time,
01:56:38.240 | we basically said, "Hey, if we wanna test
01:56:40.080 | all accessible regions of the genome,
01:56:42.440 | let's just do an experiment that cuts accessible regions.
01:56:45.400 | Let's take those accessible regions,
01:56:47.640 | put them all with the same end joints of the puzzles,
01:56:51.360 | and then now use those to create a much, much larger array
01:56:56.360 | of things that you can test.
01:56:59.360 | And then tiling all of these regions,
01:57:01.200 | you can then pinpoint what are the driver nucleotides,
01:57:04.000 | what are the elements, how are they acting
01:57:05.760 | across 7 million experiments at a time.
01:57:07.400 | So basically, this is all the same family of technology
01:57:11.080 | where you're basically using these parallel readouts
01:57:14.040 | of the barcodes.
01:57:15.680 | And then, you know, to do this,
01:57:18.000 | we used a technology called STARR-seq
01:57:20.240 | for self-transcribing reporter assays,
01:57:23.560 | a technology developed by Alex Stark,
01:57:25.600 | my former postdoc, who's now a PI over in Vienna.
01:57:30.000 | So we basically coupled the STARR-seq,
01:57:32.680 | the self-transcribing reporters,
01:57:35.520 | where the enhancer can be part of the gene itself.
01:57:39.280 | So instead of having a separate barcode,
01:57:40.920 | that enhancer basically acts to turn on the gene
01:57:43.520 | and is transcribed as part of the gene.
01:57:45.960 | - So you don't have to have the two separate parts.
01:57:47.320 | - Exactly, so you can just read them directly.
01:57:49.000 | - So there's a constant improvement in this whole process.
01:57:52.560 | By the way, generating all these options,
01:57:54.840 | is it basically brute force?
01:57:56.920 | How much human intuition is--
01:57:58.560 | - Oh gosh, of course it's human intuition
01:58:01.000 | and human creativity and incorporating
01:58:03.160 | all of the input data sets,
01:58:05.760 | because again, the genome is enormous, 3.2 billion.
01:58:09.200 | You don't wanna test that.
01:58:10.520 | Instead, you basically use all of these tools
01:58:12.520 | that I've talked about already.
01:58:14.120 | You generate your top favorite 10,000 hypotheses,
01:58:18.080 | and then you go and test all 10,000.
01:58:19.760 | And then from what comes out,
01:58:21.080 | you can then go to the next step.
01:58:22.720 | So that's technology number two.
01:58:25.760 | So technology number one is robotics, automation,
01:58:28.720 | where you have thousands of wells
01:58:30.200 | and you constantly test them.
01:58:31.880 | The second technology is instead of having wells,
01:58:34.680 | you have these massively parallel readouts
01:58:37.320 | in sort of these pooled asses.
01:58:39.920 | The third technology is coupling CRISPR perturbations
01:58:44.680 | with these single-cell RNA readouts.
01:58:50.680 | So let me make another parenthesis here
01:58:53.400 | to describe now single-cell RNA sequencing.
01:58:56.480 | So what does single-cell RNA sequencing mean?
01:58:59.560 | So RNA sequencing is what has been traditionally used,
01:59:04.560 | or well, traditionally, the last 20 years,
01:59:07.600 | ever since the advent of next-generation sequencing.
01:59:10.120 | So basically, before, RNA expression profiling
01:59:12.840 | was based on these microarrays.
01:59:14.560 | The next technology after that was based on sequencing.
01:59:17.400 | So you chop up your RNA
01:59:18.960 | and you just sequence small molecules,
01:59:21.440 | just like you would sequence a genome,
01:59:23.720 | basically reverse transcribe the small RNAs into DNA,
01:59:27.040 | and you sequence that DNA
01:59:29.000 | in order to get the number of sequencing reads
01:59:32.920 | corresponding to the expression level
01:59:35.600 | of every gene in the genome.
01:59:37.400 | You now have RNA sequencing.
01:59:39.520 | How do you go to single-cell RNA sequencing?
01:59:42.240 | That technology also went through stages of evolution.
01:59:45.760 | The first was microfluidics.
01:59:48.000 | You basically had these, or even chambers,
01:59:51.280 | you basically had these ways of isolating individual cells,
01:59:54.160 | putting them into a well for every one of these cells.
01:59:57.240 | So you have 384 well plates,
01:59:59.280 | and you now do 384 parallel reactions
02:00:02.320 | to measure the expression of 384 cells.
02:00:05.520 | That sounds amazing, and it was amazing,
02:00:08.200 | but we wanna do a million cells.
02:00:11.200 | How do you go from these wells to a million cells?
02:00:14.040 | You can't.
02:00:15.520 | So what the next technology was after that
02:00:18.640 | is instead of using a well for every reaction,
02:00:21.520 | you now use a lipid droplet for every reaction.
02:00:26.200 | So you use micro droplets as reaction chambers
02:00:30.320 | to basically amplify RNA.
02:00:32.080 | So here's the idea.
02:00:34.480 | You basically have microfluidics,
02:00:36.480 | where you basically have every single cell
02:00:38.680 | coming down one tube in your microfluidics,
02:00:41.440 | and you have little bubbles getting created in the other way
02:00:44.560 | with specific primers that mark every cell
02:00:47.440 | with its own barcode.
02:00:49.240 | You basically couple the two,
02:00:50.680 | and you end up with little bubbles that have a cell
02:00:54.320 | and tons of markers for that cell.
02:00:57.280 | You now mark up all of the RNA for that one cell
02:01:00.280 | with the same exact barcode,
02:01:01.680 | and you then lyse all of the droplets,
02:01:06.240 | and you sequence the heck out of that,
02:01:08.200 | and you have for every RNA molecule
02:01:10.280 | a unique identifier that tells you what cell was it on.
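Once every RNA molecule carries its cell's barcode, recovering per-cell expression amounts to grouping reads by barcode and counting genes. The sketch below is a minimal version of that bookkeeping. It assumes reads have already been reduced to `(cell_barcode, gene)` pairs; the barcodes and reads are made up for illustration.

```python
from collections import Counter, defaultdict


def count_matrix(reads):
    """Group (cell_barcode, gene) reads into a per-cell Counter of gene counts."""
    cells = defaultdict(Counter)
    for cell_barcode, gene in reads:
        cells[cell_barcode][gene] += 1
    return cells


# Hypothetical reads after all droplets were lysed and sequenced together.
reads = [
    ("AAAC", "IRX3"), ("AAAC", "IRX3"), ("AAAC", "FTO"),
    ("GGTA", "IRX5"), ("GGTA", "FTO"),
]

matrix = count_matrix(reads)
for barcode, genes in sorted(matrix.items()):
    print(barcode, dict(genes))
```

Real pipelines also correct sequencing errors in the barcodes and collapse duplicate reads, which this sketch leaves out.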
02:01:12.720 | - That is such good engineering, microfluidics,
02:01:16.320 | and using some kind of primer to put a label on the thing.
02:01:21.320 | I mean, you're making this sound easy.
02:01:24.040 | I assume it's-- - It's beautiful, right?
02:01:26.160 | - But it's gorgeous, yeah.
02:01:27.280 | - So there's the next generation.
02:01:29.440 | So that's the second generation.
02:01:30.840 | Next generation is, forget the microfluidics altogether.
02:01:33.800 | Just use big bottles.
02:01:35.560 | How can you possibly do that with big bottles?
02:01:37.840 | So here's the idea.
02:01:39.280 | You dissociate all of your cells,
02:01:41.160 | or all of your nuclei from complex cells like brain cells
02:01:44.200 | that are very long and sticky, so you can't do that.
02:01:47.640 | So if you have blood cells,
02:01:49.040 | or if you have neuronal nuclei or brain nuclei,
02:01:51.960 | you can basically dissociate, let's say, a million cells.
02:01:56.040 | You now want to add a unique barcode,
02:01:58.800 | a unique barcode in each one of a million cells
02:02:01.440 | using only big bottles.
02:02:03.240 | How can you possibly do that?
02:02:04.240 | Sounds crazy, but here's the idea.
02:02:06.000 | You use 100 of these bottles.
02:02:08.720 | You randomly shuffle all your million cells,
02:02:13.320 | and you throw them into those 100 bottles,
02:02:15.040 | randomly, completely randomly.
02:02:17.040 | You add one barcode out of 100 to every one of the cells.
02:02:21.440 | You then, you now take them all out,
02:02:23.440 | you shuffle them again,
02:02:24.760 | and you throw them again into the same 100 bottles,
02:02:27.200 | but now in a different randomization,
02:02:30.640 | and you add a second barcode.
02:02:33.840 | So every cell now has two barcodes.
02:02:35.760 | You take them out again,
02:02:38.000 | you shuffle them, and you throw them back in.
02:02:40.240 | A third barcode is added,
02:02:43.120 | randomly, from the same 100 barcodes.
02:02:47.240 | You've now labeled every cell probabilistically
02:02:51.920 | based on the unique path that it took
02:02:53.520 | of which of 100 bottles did it go for the first time,
02:02:55.760 | which of 100 bottles the second time,
02:02:57.240 | and which of 100 bottles the third time.
02:02:59.200 | 100 times 100 times 100 is a million unique barcodes
02:03:04.080 | in every single one of these cells
02:03:07.200 | without ever using microfluidics.
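The bottle scheme can be simulated directly. The sketch below follows the numbers in the conversation, 100 bottles and three rounds, and treats each cell's label as the tuple of bottles it passed through. The function name, the seed, and the choice of 10,000 cells are illustrative assumptions.

```python
import random
from collections import Counter


def split_pool(n_cells, n_bottles=100, n_rounds=3, seed=0):
    """Assign each cell a label: the bottle index it landed in at each round."""
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(n_bottles) for _ in range(n_rounds))
        for _ in range(n_cells)
    ]


barcode_space = 100 ** 3  # 100 x 100 x 100 possible paths
labels = split_pool(10_000)
counts = Counter(labels)
unique_fraction = sum(1 for label in labels if counts[label] == 1) / len(labels)

print("possible barcodes:", barcode_space)
print("fraction of cells with a label no other cell shares:", unique_fraction)
```

Since the labeling is probabilistic, two cells can follow the same path by chance. With far fewer cells than possible paths, as in this example, nearly every cell ends up with its own label. The collision rate rises as the number of cells approaches the number of possible paths.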
02:03:09.520 | - Very clever.
02:03:10.360 | - It's beautiful, right? - From a computer science
02:03:11.600 | perspective, that's very clever.
02:03:12.720 | - Yeah, so you now have the single cell
02:03:14.960 | sequencing technology.
02:03:16.000 | You can use the wells, you can use the bubbles,
02:03:18.680 | or you can use the bottles.
02:03:20.040 | You have ways--
02:03:22.360 | - The bubbles still sound pretty damn cool.
02:03:23.640 | - The bubbles are awesome,
02:03:24.600 | and that's basically the main technology that we're using.
02:03:26.480 | So the bubbles is the main technology.
02:03:29.040 | So there are kits now that companies just sell
02:03:32.360 | to basically carry out single cell RNA sequencing
02:03:34.840 | that you can basically, for $2,000,
02:03:37.480 | you can basically get 10,000 cells from one sample,
02:03:40.520 | and for every one of those cells,
02:03:44.800 | you basically have the transcription of thousands of genes.
02:03:48.040 | And of course, the data for any one cell is noisy,
02:03:52.680 | but being computer scientists,
02:03:54.200 | we can aggregate the data from all of the cells together
02:03:57.360 | across thousands of individuals together
02:03:59.320 | to basically make very robust inferences.
02:04:01.600 | So the third technology is basically single cell
02:04:05.560 | RNA sequencing that allows you to now start asking
02:04:08.520 | not just what is the brain expression level difference
02:04:12.680 | of that genetic variant,
02:04:14.240 | but what is the expression difference
02:04:15.960 | of that one genetic variant
02:04:17.440 | across every single subtype of brain cell?
02:04:21.520 | How is the variance changing?
02:04:23.320 | You can't just, you know, with a brain sample,
02:04:27.240 | you can just ask about the mean.
02:04:28.840 | What is the average expression?
02:04:30.680 | If I instead have 3,000 cells that are neurons,
02:04:35.280 | I can ask not just what is the neuronal expression,
02:04:38.320 | I can say for layer five excitatory neurons,
02:04:41.960 | of which I have, I don't know, 300 cells,
02:04:44.080 | what is the variance that this genetic variant has?
02:04:46.960 | So suddenly, it's amazingly more powerful.
02:04:50.960 | I can basically start asking
02:04:52.040 | about this middle layer of gene expression
02:04:54.200 | at unprecedented levels.
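The contrast between a bulk average and a per-cell-type readout can be made concrete with a small sketch. It assumes each cell has already been annotated with a cell type, the donor's genotype at the variant, and an expression value for one gene. All labels and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def summarize(cells):
    """Return {(cell_type, genotype): (mean, variance)} of expression values."""
    groups = defaultdict(list)
    for cell_type, genotype, expression in cells:
        groups[(cell_type, genotype)].append(expression)
    return {key: (mean(values), pvariance(values)) for key, values in groups.items()}


# Hypothetical single-cell measurements of one gene.
cells = [
    ("excitatory_L5", "risk", 2.0), ("excitatory_L5", "risk", 4.0),
    ("excitatory_L5", "non_risk", 5.0), ("excitatory_L5", "non_risk", 7.0),
    ("astrocyte", "risk", 1.0), ("astrocyte", "risk", 1.0),
]

bulk_mean = mean(expression for _, _, expression in cells)
summary = summarize(cells)

print("bulk mean:", round(bulk_mean, 3))
for key, (m, v) in sorted(summary.items()):
    print(key, "mean:", m, "variance:", v)
```

The bulk mean collapses everything into one number, while the grouped summary shows a genotype difference that is confined to one cell type.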
02:04:56.080 | - And when you look at the average,
02:04:57.080 | it washes out some potentially important signal
02:05:01.400 | that corresponds to ultimately the disease.
02:05:03.720 | - Completely.
02:05:04.560 | - Yeah.
02:05:05.400 | - So that, I can do that at the RNA level,
02:05:07.720 | but I can also do that at the DNA level for the epigenome.
02:05:11.600 | So remember how before I was telling
02:05:13.200 | about all this technology that we're using
02:05:14.320 | to probe the epigenome?
02:05:15.760 | One of them is DNA accessibility.
02:05:18.000 | So what we're doing in my lab is that
02:05:19.440 | from the same dissociation of, say, a brain sample,
02:05:23.280 | where you now have all these tens of thousands
02:05:24.960 | of cells floating around,
02:05:26.600 | you basically take half of them to do RNA profiling,
02:05:30.080 | and the other half to do epigenome profiling,
02:05:31.920 | both at the single cell level.
02:05:34.000 | So that allows you to now figure out
02:05:35.440 | what are the millions of DNA enhancers
02:05:39.840 | that are accessible in every one
02:05:41.920 | of tens of thousands of cells.
02:05:43.640 | And computationally, we can now take the RNA
02:05:47.440 | and the DNA readouts and group them together
02:05:50.560 | to basically figure out how is every enhancer
02:05:55.200 | related to every gene.
02:05:57.400 | And remember these sort of enhancer gene linking
02:05:59.400 | that we were doing across 833 samples?
02:06:02.320 | 833 is awesome, don't get me wrong,
02:06:05.000 | but 10 million is way more awesome.
02:06:08.080 | So we can now look at correlated activity
02:06:10.120 | across 2.3 million enhancers and 20,000 genes
02:06:14.560 | in each of millions of cells
02:06:16.520 | to basically start piecing together
02:06:18.080 | the regulatory circuitry of every single type of neuron,
02:06:22.520 | every single type of astrocyte, oligodendrocyte,
02:06:24.840 | microglial cell inside the brains
02:06:27.560 | of 1,500 individuals that we've sampled
02:06:30.400 | across multiple different brain regions
02:06:33.480 | across both DNA and RNA.
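One simple way to picture linking enhancers to genes through correlated activity is to correlate each enhancer's accessibility with each gene's expression across samples and keep the strongly correlated pairs. The sketch below uses plain Pearson correlation with an arbitrary threshold. It is an illustration of the idea, not the group's actual method, and the names and numbers are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def link_enhancers(accessibility, expression, threshold=0.8):
    """Return (enhancer, gene, r) for pairs whose correlation meets the threshold."""
    links = []
    for enhancer, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= threshold:
                links.append((enhancer, gene, r))
    return links


# Hypothetical values across four cell groups (same order in every list).
accessibility = {"enh1": [0, 1, 2, 3], "enh2": [3, 0, 2, 1]}
expression = {"geneA": [1, 3, 5, 7], "geneB": [2, 2, 1, 2]}

links = link_enhancers(accessibility, expression)
print(links)
```

At the scale mentioned here, millions of enhancers against 20,000 genes, testing every pair this way would be impractical. A real pipeline would presumably restrict the candidates, for example to enhancers near each gene, which is an assumption on our part.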
02:06:36.120 | So that's the dataset that my team generated last year alone.
02:06:39.520 | So in one year, we've basically generated 10 million cells
02:06:43.600 | from human brain across a dozen different disorders,
02:06:48.040 | across schizophrenia, Alzheimer's,
02:06:49.880 | frontotemporal dementia, Lewy body dementia, ALS,
02:06:53.560 | Huntington's disease, post-traumatic stress disorder,
02:06:58.640 | autism, bipolar disorder, healthy aging, et cetera.
02:07:04.280 | - So it's possible that even just within that dataset
02:07:08.000 | lie a lot of keys to understanding these diseases
02:07:13.000 | and then be able to directly lead to then treatment.
02:07:18.360 | - Correct, correct.
02:07:19.520 | So basically we are now-- - Motivating.
02:07:22.040 | - Yeah, so our computational team is in heaven right now
02:07:24.720 | and we're looking for people.
02:07:25.720 | I mean, if you have listeners who are--
02:07:27.080 | - They inspire, yeah, yeah.
02:07:28.000 | - Super smart computational people.
02:07:29.640 | - So this is a very interesting kind of side question.
02:07:32.960 | How much of this is biology, how much of this is computation?
02:07:36.160 | So you head the computational biology group,
02:07:38.600 | but how much of, should you be comfortable with biology
02:07:43.600 | to be able to solve some of these problems?
02:07:48.320 | If you just find, if you put several of the hats
02:07:51.440 | that you wear on, fundamentally,
02:07:53.640 | are you thinking like a computer scientist here?
02:07:56.400 | - You have to.
02:07:57.400 | This is the only way.
02:07:58.680 | As I said, we are the descendants
02:08:01.480 | of the first digital computer.
02:08:02.640 | We're trying to understand the digital computer.
02:08:04.920 | We're trying to understand the circuitry,
02:08:06.240 | the logic of this digital core computer
02:08:11.080 | and all of these analog layers surrounding it.
02:08:14.040 | So the case that I've been making
02:08:17.320 | is that you cannot think one gene at a time.
02:08:19.720 | The traditional biology is dead.
02:08:21.920 | There's no way, you cannot solve disease
02:08:23.760 | with traditional biology.
02:08:24.840 | You need it as a component.
02:08:27.000 | Once you figured out IRX3 and IRX5,
02:08:29.480 | you now can then say, hey,
02:08:31.120 | have you guys worked on those genes
02:08:32.360 | with your single gene approach?
02:08:33.760 | We'd love to know everything you know.
02:08:35.440 | And if you haven't,
02:08:36.400 | we now know how important these genes are.
02:08:38.760 | Let's now launch a single gene program
02:08:40.760 | to dissect them and understand them.
02:08:43.280 | But you cannot use that as a way to dissect disease.
02:08:46.640 | You have to think genomically.
02:08:48.480 | You have to think from the global perspective
02:08:50.720 | and you have to build these circuits systematically.
02:08:53.280 | So we need numbers of computer scientists
02:08:56.520 | who are interested and willing to dive into these data,
02:09:00.040 | you know, fully, fully in and sort of extract meaning.
02:09:04.800 | We need computer science people
02:09:06.920 | who can understand sort of machine learning and inference
02:09:10.080 | and sort of, you know, decouple these matrices,
02:09:12.960 | come up with super smart ways of sort of dissecting them.
02:09:16.240 | But we also need computer scientists who understand biology,
02:09:20.320 | who are able to design the next generation of experiments.
02:09:24.400 | Because many of these experiments,
02:09:26.320 | no one in their right mind would design them
02:09:28.400 | without thinking of the analytical approach
02:09:30.200 | that you would use to deconvolve the data afterwards.
02:09:32.840 | Because it's massive amounts of ridiculously noisy data.
02:09:35.640 | And if you don't have the computational pipeline
02:09:39.680 | in your head before you even design the experiment,
02:09:42.600 | you would never design the experiment that way.
02:09:44.640 | - That's brilliant, so in designing the experiment,
02:09:47.120 | you have to see the entirety of the computational pipeline.
02:09:50.040 | - That drives the design.
02:09:52.120 | That even drives the necessity for that design.
02:09:55.600 | Basically, you know, if you didn't have
02:09:57.880 | a computer scientist way of thinking,
02:10:00.120 | you would never design these hugely combinatorial,
02:10:03.480 | massively parallel experiments.
02:10:05.440 | So that's why you need interdisciplinary teams.
02:10:09.080 | You need teams.
02:10:10.240 | And I wanna sort of clarify that,
02:10:12.280 | what do we mean by computational biology group?
02:10:15.120 | The focus is not on computational,
02:10:16.760 | the focus is on the biology.
02:10:18.800 | So we are a biology group.
02:10:20.800 | What type of biology?
02:10:22.040 | Computational biology.
02:10:24.160 | That's the type of biology that uses the whole genome.
02:10:27.640 | That's the type of biology that designs experiments,
02:10:30.400 | genomic experiments, that can only be interpreted
02:10:33.120 | in the context of the whole genome.
02:10:34.720 | - Right, so it's philosophically
02:10:37.320 | looking at biology as a computer.
02:10:39.120 | - Correct, correct.
02:10:40.640 | - So which is, in the context of the history of biology,
02:10:45.080 | is a big transformation.
02:10:46.640 | - Yeah, yeah, you can think of the name as,
02:10:48.920 | what do we do?
02:10:50.040 | Only computation, that's not true.
02:10:51.880 | But how do we study it?
02:10:53.760 | Only computationally, that is true.
02:10:55.520 | So all of these single cell sequencing
02:10:58.560 | can now be coupled with the technology
02:11:00.480 | that we talked about earlier for perturbation.
02:11:02.880 | So here's the crazy thing.
02:11:04.320 | Instead of using these wells and these robotic systems
02:11:07.840 | for doing one drug at a time,
02:11:10.560 | or for perturbing one gene at a time in thousands of wells,
02:11:14.680 | you can now do this using a pool of cells
02:11:18.200 | and single cell RNA sequencing.
02:11:20.920 | You basically can take these perturbations using CRISPR,
02:11:25.920 | and instead of using a single guide RNA,
02:11:29.320 | you can use a library of guide RNAs
02:11:31.320 | generated exactly the same way using this array technology.
02:11:34.440 | So you synthesize a thousand different guide RNAs.
02:11:37.560 | You now take each of these guide RNAs
02:11:42.240 | and you insert them in a pool of cells
02:11:45.400 | where every cell gets one perturbation.
02:11:48.160 | And you use CRISPR editing or CRISPR,
02:11:51.360 | so with either CRISPR-Cas9 to edit the genome
02:11:56.520 | with these thousand perturbations.
02:11:57.720 | - Or the activation one. - Or with the activation
02:11:59.920 | or with the repression.
02:12:01.320 | And you now can have a single cell readout
02:12:04.400 | where every single cell has received
02:12:06.840 | one of these modifications.
02:12:09.520 | And you can now, in massively parallel ways,
02:12:12.000 | couple the perturbation and the readout
02:12:17.040 | in a single experiment.
02:12:18.280 | - How are you tracking which perturbations
02:12:20.200 | each cell received?
02:12:21.560 | - So there's ways of doing that,
02:12:23.560 | but basically one way is to make that perturbation
02:12:26.200 | an expressible vector so that part of your RNA reading
02:12:30.120 | is actually that perturbation itself.
02:12:33.120 | So you can basically put it in an expressible part,
02:12:36.440 | so you can self-drive it.
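The pooled design described above can be sketched in a few lines of Python. This is a toy simulation, not the lab's actual pipeline: the guide and gene names, the barcode counts, and the knockdown size are all assumptions made for illustration. It shows the core idea that each cell's perturbation is recovered from the expressed guide barcode in its own RNA readout, and that the effect on the target gene is then compared across cells in the same pool.

```python
import random
from statistics import mean

# Hypothetical guide names; each guide is assumed to knock down one target gene.
GUIDE_TO_GENE = {
    "guide_A": "gene_A",
    "guide_B": "gene_B",
    "guide_C": "gene_C",
    "guide_D": "gene_D",
}


def simulate_pool(n_cells=400, knockdown=5.0, seed=0):
    """Simulate a pool of cells where every cell receives exactly one guide."""
    rng = random.Random(seed)
    guides = list(GUIDE_TO_GENE)
    cells = []
    for _ in range(n_cells):
        delivered = rng.choice(guides)
        # Expressed guide barcodes: low background for all, high for the delivered one.
        barcodes = {g: rng.randint(0, 2) for g in guides}
        barcodes[delivered] += 20
        # Baseline expression near 10 for every gene; the targeted gene is reduced.
        expression = {gene: 10.0 + rng.gauss(0, 1) for gene in GUIDE_TO_GENE.values()}
        expression[GUIDE_TO_GENE[delivered]] -= knockdown
        cells.append({"delivered": delivered, "barcodes": barcodes, "expression": expression})
    return cells


def call_guide(cell):
    """Infer the perturbation from the readout: the most abundant guide barcode."""
    return max(cell["barcodes"], key=cell["barcodes"].get)


def summarize(cells):
    """Per guide: mean target expression in perturbed cells vs. all other cells."""
    summary = {}
    for guide, gene in GUIDE_TO_GENE.items():
        perturbed = [c["expression"][gene] for c in cells if call_guide(c) == guide]
        others = [c["expression"][gene] for c in cells if call_guide(c) != guide]
        summary[guide] = (mean(perturbed), mean(others))
    return summary


if __name__ == "__main__":
    for guide, (hit, rest) in summarize(simulate_pool()).items():
        print(f"{guide}: target at {hit:.2f} in perturbed cells vs {rest:.2f} elsewhere")
```

Real data would need handling for cells with zero or multiple guides and for noisy barcode capture, which this sketch ignores.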
02:12:37.680 | So the point that I wanna get across
02:12:40.320 | is that the sky's the limit.
02:12:42.040 | You basically have these tools,
02:12:43.560 | these building blocks of molecular biology.
02:12:46.440 | You have these massive datasets of computational biology.
02:12:50.240 | You have this huge ability to sort of use machine learning
02:12:54.200 | and statistical methods and linear algebra
02:12:57.080 | to sort of reduce the dimensionality
02:12:59.400 | of all these massive datasets.
02:13:01.680 | And then you end up with a series of actionable targets
02:13:06.680 | that you can then couple with pharma
02:13:11.080 | and just go after systematically.
02:13:13.280 | So the ability to sort of bring genetics to the epigenomics,
02:13:18.280 | to the transcriptomics, to the cellular readouts
02:13:21.440 | using these sort of high-throughput perturbation technologies
02:13:24.320 | that I'm talking about,
02:13:25.440 | and ultimately to the organismal
02:13:27.800 | through the electronic health record endophenotypes,
02:13:31.440 | and ultimately the disease battery of assays
02:13:35.480 | at the cognitive level, at the physiological level,
02:13:38.200 | and every other level,
02:13:41.760 | there is no better or more exciting field, in my view,
02:13:45.120 | to be a computer scientist in,
02:13:46.960 | or to be a scientist in, period.
02:13:48.920 | Basically, this confluence of technologies,
02:13:51.200 | of computation, of data, of insights,
02:13:53.960 | and of tools for manipulation
02:13:56.160 | is unprecedented in human history.
02:13:58.760 | And I think this is what's shaping the next century
02:14:02.640 | to really be a transformative century
02:14:04.720 | for our species and for our planet.
02:14:09.360 | - So you think the 21st century will be remembered
02:14:12.000 | for the big leaps in understanding
02:14:15.560 | and alleviation of biology?
02:14:18.640 | - If you look at the path
02:14:20.120 | between discovery and therapeutics,
02:14:22.720 | it's been on the order of 50 years.
02:14:24.760 | It's been shortened to 40, 30, 20,
02:14:27.040 | and now it's on the order of 10 years.
02:14:29.380 | But the huge number of technologies
02:14:31.960 | that are going on right now for discovery
02:14:35.480 | will result undoubtedly
02:14:37.920 | in the most dramatic manipulation of human biology
02:14:41.200 | that we've ever seen in the history of humanity
02:14:44.040 | in the next few years.
02:14:45.160 | - Do you think we might be able to cure some of the diseases
02:14:47.560 | we started this conversation with?
02:14:49.640 | - Absolutely, absolutely.
02:14:51.320 | It's only a matter of time.
02:14:54.040 | Basically, the complexity is enormous,
02:14:55.880 | and I don't want to underestimate the complexity,
02:14:58.360 | but the number of insights is unprecedented,
02:15:01.360 | and the ability to manipulate is unprecedented,
02:15:03.880 | and the ability to deliver these small molecules
02:15:07.520 | and other non-traditional medicine perturbations.
02:15:10.920 | There's a lot of sort of new,
02:15:13.080 | there's a new generation of perturbations
02:15:15.600 | that you can use at the DNA level, at the RNA level,
02:15:18.440 | at the microRNA level, the epigenomic level.
02:15:22.640 | There's a battery of new generations of perturbations.
02:15:26.440 | If you couple that with cell type identifiers
02:15:30.320 | that can basically sense when you are in the right cell
02:15:32.880 | based on a specific combination
02:15:34.080 | and then turn on that intervention for that cell,
02:15:37.360 | you can now think of combinatorial interventions
02:15:40.000 | where you can basically sort of feed
02:15:42.040 | a synthetic biology construct to someone
02:15:44.400 | that will basically do different things in different cells.
02:15:47.560 | So basically for cancer, this is one of the therapeutics
02:15:50.160 | that our collaborator Ron Weiss is using
02:15:52.400 | to basically start sort of engineering the circuits
02:15:54.800 | that will use microRNA sensors of the environment
02:15:57.120 | to sort of know if you're in a tumor cell,
02:15:59.080 | or if you're in an immune cell,
02:16:00.280 | or if you're in a stromal cell, and so on and so forth,
02:16:02.040 | and basically turn on particular interventions there.
02:16:04.800 | You can sort of create constructs
02:16:07.440 | that are tuned to only the liver cells,
02:16:10.240 | or only the heart cells, or only the brain cells,
02:16:14.680 | and then have these new generations of therapeutics
02:16:18.760 | coupled with this immense amount of knowledge
02:16:21.840 | on the sort of which targets to choose
02:16:23.920 | and what biological processes to measure
02:16:25.920 | and how to intervene.
02:16:27.560 | My view is that disease is gonna be fundamentally altered
02:16:31.800 | and alleviated as we go forward.
02:16:35.020 | - Next time we talk,
02:16:37.300 | we'll talk about the philosophical implications of that
02:16:39.700 | and the effect of life,
02:16:40.860 | but let's stick to biology for just a little longer.
02:16:44.180 | We did pretty good today.
02:16:45.100 | We stuck to the science.
02:16:47.200 | What are you excited in terms of the future of this field,
02:16:52.200 | the technologies in your own group, in your own mind?
02:16:58.580 | You're leading the world at MIT in the science
02:17:01.780 | and the engineering of this work,
02:17:04.340 | so what are you excited about here?
02:17:06.460 | - I could not be more excited.
02:17:08.580 | We are one of many, many teams who are working on this.
02:17:12.580 | In my team, the most exciting parts are manifold.
02:17:16.940 | So basically, we've now assembled
02:17:18.980 | this battery of technologies.
02:17:20.180 | We've assembled these massive, massive data sets,
02:17:22.420 | and now we're really sort of in the stage
02:17:25.020 | of our team's path of generating disease insights.
02:17:30.340 | So we are simultaneously working on a paper
02:17:34.420 | on schizophrenia right now
02:17:35.980 | that is basically using
02:17:37.020 | the single-cell profiling technologies,
02:17:38.580 | using these editing and manipulation technologies
02:17:40.980 | to basically show how the master regulators
02:17:45.260 | underlying changes in the brain
02:17:47.500 | that are sort of found in schizophrenia
02:17:49.900 | are in fact affecting excitatory neurons
02:17:52.220 | and inhibitory neurons in pathways
02:17:54.520 | that are active both in synaptic pruning,
02:17:57.740 | but also in early development.
02:17:59.180 | We've basically found this set of four regulators
02:18:01.660 | that are connecting these two processes
02:18:03.180 | that were previously separate in schizophrenia
02:18:06.460 | in sort of having a sort of more unified view
02:18:10.220 | across those two sides.
02:18:12.660 | The second one is in the area of metabolism.
02:18:15.420 | We basically now have a beautiful collaboration
02:18:17.420 | with the Goodyear Lab
02:18:18.460 | that's basically looking at multi-tissue perturbations
02:18:23.460 | in six or seven different tissues across the body
02:18:28.540 | in the context of exercise
02:18:30.300 | and in the context of nutritional interventions
02:18:33.220 | using both mouse and human,
02:18:35.860 | where we can basically see
02:18:37.180 | what are the cell-to-cell communications
02:18:39.980 | that are changing across them.
02:18:42.300 | And what we're finding is this immense role
02:18:44.940 | of both immune cells,
02:18:46.620 | as well as adipocyte stem cells
02:18:49.140 | in sort of reshaping that circuitry
02:18:51.100 | of all of these different tissues,
02:18:52.700 | and that's sort of pointing to a new path
02:18:54.500 | for therapeutical intervention there.
02:18:56.820 | In Alzheimer's, it's this huge focus on microglia,
02:19:00.780 | and now we're discovering different classes
02:19:02.620 | of microglial cells
02:19:04.220 | that are basically either synaptic or immune,
02:19:08.220 | and these are playing vastly different roles
02:19:13.300 | in Alzheimer's versus in schizophrenia.
02:19:15.980 | And what we're finding is this immense complexity
02:19:19.100 | as you go further and further down
02:19:21.420 | of how, in fact, there's 10 different types of microglia,
02:19:25.580 | each with their own sort of expression programs.
02:19:28.260 | We used to think of them as, oh, yeah, they're microglia,
02:19:30.900 | but in fact, now we're realizing
02:19:32.500 | just even in that sort of least abundant of cell types,
02:19:36.220 | there's this incredible diversity there.
02:19:38.260 | The differences between brain regions
02:19:41.420 | is another sort of major, major insight.
02:19:43.940 | Again, one would think that,
02:19:46.100 | oh, astrocytes are astrocytes no matter where they are,
02:19:48.620 | but no, there's incredible region-specific differences
02:19:52.420 | in the expression patterns
02:19:53.940 | of all of the major brain cell types
02:19:56.100 | across different brain regions.
02:19:57.780 | So basically there's the neocortical regions
02:19:59.420 | that are sort of the recent innovation
02:20:00.980 | that makes us so different from all other species.
02:20:03.460 | There's the sort of reptilian brain sort of regions
02:20:06.500 | that are sort of much more,
02:20:08.100 | very extremely distinct.
02:20:10.500 | There's the cerebellum.
02:20:11.660 | Each of those basically is associated
02:20:15.180 | in a different way with disease,
02:20:17.380 | and what we're doing now is looking into
02:20:19.300 | pseudotemporal models for how disease progresses
02:20:23.740 | across different regions of the brain.
02:20:25.740 | If you look at Alzheimer's,
02:20:27.060 | it basically starts in this small region
02:20:29.260 | called the entorhinal cortex,
02:20:30.500 | and then it spreads through the brain,
02:20:33.380 | and through the hippocampus,
02:20:35.500 | and ultimately affecting the neocortex.
02:20:39.460 | And with every brain region that it hits,
02:20:41.860 | it basically has a different impact on the cognitive
02:20:46.860 | and memory aspects, orientation,
02:20:50.500 | short-term memory, long-term memory, et cetera,
02:20:52.780 | which is dramatically affecting the cognitive path
02:20:56.700 | that the individuals go through.
02:20:58.220 | So what we're doing now is creating
02:21:00.980 | these computational models for ordering the cells
02:21:04.620 | and the regions and the individuals
02:21:07.060 | according to their ability to predict Alzheimer's disease.
02:21:10.460 | So we can have a cell-level predictor of pathology
02:21:14.740 | that allows us to now create a temporal time course
02:21:18.140 | that tells us when every gene turns on
02:21:19.780 | along this pathology progression,
02:21:22.580 | and then trace that across regions
02:21:25.060 | and pathological measures that are region-specific,
02:21:27.780 | but also cognitive measures, and so on and so forth.
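The ordering idea described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the group's method: each simulated cell carries a hypothetical pathology score between 0 and 1, each hypothetical gene switches on at a known score, and the code sorts cells by score and estimates where along that ordering each gene turns on.

```python
import random
from statistics import mean

# Hypothetical genes and the pathology score at which each one switches on.
TRUE_ONSETS = {"gene_early": 0.2, "gene_mid": 0.5, "gene_late": 0.8}


def simulate_cells(onsets, n_cells=300, seed=1):
    """Each cell gets a pathology score in [0, 1] and noisy step-like expression."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        score = rng.random()
        expr = {
            gene: (1.0 if score >= onset else 0.0) + rng.gauss(0, 0.2)
            for gene, onset in onsets.items()
        }
        cells.append((score, expr))
    return cells


def estimate_onsets(cells, window=15):
    """Order cells by score, smooth expression, and find where each gene turns on."""
    ordered = sorted(cells, key=lambda c: c[0])
    estimates = {}
    for gene in ordered[0][1]:
        values = [expr[gene] for _, expr in ordered]
        # Running mean over a sliding window of neighboring cells in the ordering.
        smoothed = [mean(values[i:i + window]) for i in range(len(values) - window + 1)]
        threshold = (min(smoothed) + max(smoothed)) / 2
        first = next(i for i, v in enumerate(smoothed) if v >= threshold)
        # Report the score of the cell at the center of the first window above threshold.
        estimates[gene] = ordered[first + window // 2][0]
    return estimates


if __name__ == "__main__":
    for gene, onset in estimate_onsets(simulate_cells(TRUE_ONSETS)).items():
        print(f"{gene}: estimated onset at pathology score {onset:.2f}")
```

In practice the per-cell score would itself come from a trained predictor, and expression changes are gradual rather than step-like; the sketch only shows how an ordering turns a snapshot of many cells into a pseudo time course.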
02:21:30.260 | So that allows us to now sort of, for the first time,
02:21:33.140 | look at, can we actually do early intervention
02:21:35.580 | for Alzheimer's, where we know that the disease
02:21:38.220 | starts manifesting for 10 years
02:21:39.980 | before you actually have your first cognitive loss?
02:21:43.100 | Can we start seeing that path to build new diagnostics,
02:21:47.980 | new prognostics, new biomarkers
02:21:50.140 | for this sort of early intervention in Alzheimer's?
02:21:53.460 | The other aspect that we're looking at is mosaicism.
02:21:56.940 | We talked about the common variants and the rare variants,
02:21:59.780 | but in addition to those rare variants,
02:22:01.860 | as your initial cell that forms the zygote
02:22:06.540 | divides and divides and divides,
02:22:08.260 | with every cell division,
02:22:09.740 | there are additional mutations that are happening.
02:22:12.340 | So what you end up with is your brain being a mosaic
02:22:16.220 | of multiple different types of genetic underpinnings.
02:22:19.060 | Some cells contain a mutation that other cells don't have.
02:22:23.180 | So every human has the common variants
02:22:27.300 | that all of us carry to some degree,
02:22:29.900 | the rare variants that your immediate tree
02:22:33.420 | of the human species carries,
02:22:35.260 | and then there's the somatic variant,
02:22:37.260 | which is the tree that happened after the zygote
02:22:40.460 | that sort of forms your own body.
02:22:44.100 | So these somatic alterations
02:22:47.100 | is something that has been previously inaccessible
02:22:50.100 | to study in human post-mortem samples.
02:22:53.060 | But right now,
02:22:54.220 | with the advent of single-cell RNA sequencing,
02:22:57.220 | in this particular case,
02:22:58.100 | we're using the well-based sequencing,
02:22:59.740 | which is much more expensive,
02:23:00.780 | but gives you a lot richer information
02:23:02.860 | about each of those transcripts.
02:23:04.420 | So we're using now that richer information
02:23:06.500 | to infer mutations that have happened
02:23:09.700 | in each of the thousands of genes
02:23:12.060 | that sort of are active in these cells,
02:23:15.380 | and then understand how the genome relates to the function,
02:23:20.380 | this genotype-phenotype relationship
02:23:24.780 | that we usually build in GWAS,
02:23:26.420 | between, in genome-wide association studies,
02:23:28.620 | between genetic variation and disease.
02:23:31.260 | We're now building that at the cell level,
02:23:33.780 | where for every cell,
02:23:34.980 | we can relate the unique specific genome of that cell
02:23:38.620 | with the expression patterns of that cell,
02:23:41.100 | and the predicted function,
02:23:42.940 | using these predictive models that I mentioned before,
02:23:44.980 | on dysregulation for cognition,
02:23:47.340 | for pathology in Alzheimer's, at the cell level.
02:23:50.820 | And what we're finding is that the genes that are altered,
02:23:53.940 | and the genetic regions that are altered in common variants
02:23:56.820 | versus rare variants versus somatic variants,
02:23:59.060 | are actually very different from each other.
02:24:01.140 | The somatic variants are pointing to neuronal energetics,
02:24:05.500 | and oligodendrocyte functions
02:24:08.580 | that are not visible in the genetic regions
02:24:10.660 | that you find for the common variants,
02:24:12.500 | probably because they have too strong of an effect
02:24:15.180 | that evolution is just not tolerating them
02:24:17.500 | on the common side of the allele frequency spectrum.
02:24:20.860 | - So the somatic one,
02:24:22.060 | that's the variation that happens after the zygote,
02:24:24.820 | after a new individual. - Correct.
02:24:26.980 | - I mean, this is a dumb question,
02:24:28.220 | but there's mutation and variation, I guess,
02:24:31.140 | that happens there,
02:24:32.460 | and you're saying that through this,
02:24:35.260 | if we focus in on individual cells,
02:24:37.100 | we're able to detect a story that's interesting there,
02:24:40.060 | and that might be a very unique
02:24:42.300 | kind of important variability that arises for,
02:24:46.620 | you said neuronal, or something,
02:24:49.140 | that would sound-- - Energetics.
02:24:50.260 | - Energetics,
02:24:51.100 | that's not a really cool term. - Energetics.
02:24:51.940 | So you're, I mean, the metabolism of humans
02:24:55.060 | is dramatically altered from that of nearby species.
02:24:59.020 | You know, we talked about that last time,
02:25:00.580 | that basically we are able to consume meat
02:25:03.240 | that is incredibly energy-rich,
02:25:05.900 | and that allows us to sort of have functions
02:25:09.800 | that are, you know,
02:25:11.620 | meeting this humongous brain that we have.
02:25:14.660 | So basically, on one hand,
02:25:15.660 | every one of our brain cells is much more energy-efficient
02:25:18.340 | than our neighbors, than our relatives.
02:25:20.420 | Number two, we have way more of these cells,
02:25:23.140 | and number three, we have, you know,
02:25:26.420 | this new diet that allows us to now feed all these needs.
02:25:29.980 | That basically creates a massive amount of damage,
02:25:33.420 | oxidative damage, from this huge,
02:25:36.280 | super-powered factory of ideas and thoughts
02:25:40.220 | that we carry in our skull.
02:25:41.980 | And that factory has energetic needs,
02:25:44.980 | and there's a lot of sort of biological processes
02:25:47.620 | underlying that, that we are finding are altered
02:25:51.060 | in the context of Alzheimer's disease.
02:25:52.820 | - That's fascinating that,
02:25:54.180 | so you have to consider all of these systems
02:25:57.220 | if you wanna understand even something like diseases
02:26:00.300 | that you would maybe traditionally associate
02:26:02.700 | with just the particular cells of the brain.
02:26:05.260 | - Yeah.
02:26:06.100 | - The immune system.
02:26:08.760 | - The metabolic system.
02:26:09.720 | - The metabolic system.
02:26:11.120 | - And these are all the things that makes us uniquely human.
02:26:13.320 | So our immune system is dramatically different
02:26:15.720 | from that of our neighbors.
02:26:16.960 | Our societies are so much more clustered.
02:26:19.520 | The history of infections that have plagued
02:26:21.680 | the human population is, you know,
02:26:24.040 | dramatically different from every other species.
02:26:26.320 | The way that our society and our population
02:26:28.640 | has sort of exploded has basically put unique pressures
02:26:31.800 | on our immune system, and our immune system
02:26:34.060 | has both coped with that density and also been shaped by,
02:26:37.360 | as I mentioned, the vast amount of death
02:26:40.080 | that has happened in the Black Plague
02:26:41.800 | and other sort of selective events in human history,
02:26:44.560 | famines, ice ages, and so forth.
02:26:47.080 | So that's number one on the sort of immune side.
02:26:49.840 | On the metabolic side, you know, again,
02:26:52.360 | we are able to sort of run marathons.
02:26:54.760 | You know, I don't know if you remember
02:26:56.240 | the sort of human versus horse experiment
02:26:58.400 | where the horse actually tires out faster than the human,
02:27:00.840 | and the human actually wins.
02:27:03.120 | So on the metabolic side, we're dramatically different.
02:27:05.780 | On the immune side, we're dramatically different.
02:27:07.460 | On the brain side, again, you know,
02:27:09.880 | no need to sort of, you know, it's a no-brainer
02:27:12.380 | of how our brain is like just enormously more capable.
02:27:16.700 | And then, you know, on the side of cancer,
02:27:19.500 | so basically the cancers that humans are having,
02:27:21.700 | the exposures, the environmental exposures,
02:27:23.980 | is again dramatically different.
02:27:25.660 | And the lifespan, the expansion of human lifespan
02:27:29.060 | is unseen in any other species
02:27:31.960 | in, you know, recent evolutionary history.
02:27:35.520 | And that now leads to a lot of new disorders
02:27:39.080 | that are starting to, you know, manifest late in life.
02:27:43.760 | So, you know, Alzheimer's is one example
02:27:46.240 | where basically, you know, these vast energetic needs
02:27:49.000 | over a lifetime of thinking
02:27:52.080 | can basically lead to all of these debris
02:27:54.320 | and eventually saturate the system
02:27:56.360 | and lead to, you know, Alzheimer's in the late life.
02:27:59.420 | But there's, you know, there's just such a dramatic
02:28:04.700 | set of frontiers when it comes to aging research
02:28:08.660 | that, you know, will, so what I often like to say
02:28:11.740 | is that if you want to engineer a car
02:28:14.740 | to go from 70 miles an hour to 120 miles an hour,
02:28:17.380 | that's fine, you can basically, you know,
02:28:19.080 | fix a few components.
02:28:20.380 | If you want it to now go at 400 miles an hour,
02:28:22.740 | you have to completely redesign the entire car.
02:28:25.720 | Because the system has just not evolved to go that far.
02:28:30.720 | Basically, our human body has only evolved
02:28:33.660 | to live to, I don't know, 120.
02:28:36.140 | Maybe we can get to 150 with minor changes.
02:28:39.180 | But if, you know, as we start pushing these frontiers
02:28:41.460 | for not just living, but well living,
02:28:45.100 | the eu zen that we talked about last time.
02:28:48.120 | So to basically push eu zen into the 80s and 90s
02:28:51.460 | and 100s and, you know, much further than that,
02:28:54.380 | we will face new challenges that have, you know,
02:28:58.780 | never been faced before.
02:29:00.300 | In terms of cancer, the number of divisions,
02:29:02.220 | in terms of Alzheimer's and brain related disorders,
02:29:05.120 | in terms of metabolic disorders, in terms of regeneration.
02:29:08.220 | There's just so many different frontiers ahead of us.
02:29:10.800 | So I am thrilled about where we're heading.
02:29:14.300 | So basically I see this confluence in my lab
02:29:16.580 | and many other labs of AI, of, you know,
02:29:20.480 | sort of, you know, the next frontier of AI for drug design.
02:29:23.100 | So basically these sort of graph neural networks
02:29:26.460 | on specific chemical designs that allow you to create
02:29:31.460 | new generations of therapeutics.
02:29:34.520 | These molecular biology tricks for intervening
02:29:38.980 | at the system at every level.
02:29:41.140 | These personalized medicine prediction, diagnosis,
02:29:45.380 | and prognosis using the electronic health records
02:29:49.420 | and using these polygenic risk scores,
02:29:51.900 | weighted by the burden, the number of mutations
02:29:55.980 | that are accumulating across common, rare
02:29:57.740 | and somatic variants, the burden converging
02:30:01.220 | across all of these different molecular pathways,
02:30:05.560 | the delivery of specific drugs and specific interventions
02:30:09.460 | into specific cell types.
02:30:10.840 | And again, you've talked with Bob Langer about this.
02:30:12.700 | There's, you know, many giants in that field.
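The polygenic risk scores and pathway burden mentioned above can be sketched as simple weighted sums. This is a minimal illustration in Python, assuming made-up variant names, effect sizes, and pathway labels; real scores are built from genome-wide association results and are considerably more involved.

```python
# Hypothetical per-variant effect sizes (for example, log odds ratios from a GWAS).
EFFECT_SIZES = {"var_common_1": 0.10, "var_common_2": 0.05, "var_rare_1": 0.80}

# Hypothetical mapping from variants to the pathway they are assumed to act in.
VARIANT_TO_PATHWAY = {
    "var_common_1": "lipid",
    "var_common_2": "immune",
    "var_rare_1": "lipid",
    "var_somatic_1": "energetics",
}


def polygenic_risk_score(dosages, effect_sizes):
    """Sum of (risk-allele dosage x effect size) over the scored variants."""
    return sum(effect * dosages.get(variant, 0) for variant, effect in effect_sizes.items())


def pathway_burden(dosages, variant_to_pathway):
    """Count risk alleles per pathway, pooling common, rare, and somatic variants."""
    burden = {}
    for variant, dose in dosages.items():
        pathway = variant_to_pathway.get(variant)
        if pathway is not None:
            burden[pathway] = burden.get(pathway, 0) + dose
    return burden


if __name__ == "__main__":
    # One hypothetical individual: dosage is the number of risk alleles carried.
    person = {"var_common_1": 2, "var_common_2": 1, "var_rare_1": 1, "var_somatic_1": 1}
    print("risk score:", round(polygenic_risk_score(person, EFFECT_SIZES), 3))
    print("burden by pathway:", pathway_burden(person, VARIANT_TO_PATHWAY))
```

Grouping the burden by pathway rather than by gene mirrors the point that many different variants can converge on the same molecular process.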
02:30:14.620 | And then the last concept is not intervening
02:30:18.500 | at the single gene level.
02:30:20.620 | And I want you to sort of conceptualize the concept
02:30:23.300 | of an on target side effect.
02:30:27.540 | What is an on target side effect?
02:30:29.140 | An off target side effect is when you design a molecule
02:30:31.780 | to target one gene and instead it targets another gene
02:30:34.520 | and you have side effects because of that.
02:30:36.580 | An on target side effect is when your molecule
02:30:38.900 | does exactly what you were expecting,
02:30:40.940 | but that gene is pleiotropic.
02:30:43.580 | Pleio means many, tropos means ways, many ways.
02:30:46.860 | It acts in many ways.
02:30:48.100 | It's a multifunctional gene.
02:30:49.980 | So you find that this gene plays a role in this,
02:30:52.500 | but as we talked about, the wiring of genes to phenotypes
02:30:56.580 | is extremely dense and extremely complex.
02:30:58.980 | So the next stage of intervention will be intervening,
02:31:03.000 | not at the gene level, but at the network level.
02:31:05.620 | Intervening at the set of pathways and the set of genes
02:31:08.620 | with multi input perturbations to the system,
02:31:12.100 | multi input modulations, pharmaceutical
02:31:14.820 | or other interventional.
02:31:17.660 | That basically allow you to now work
02:31:20.220 | at the sort of full level of understanding,
02:31:23.900 | not just in your brain, but across your body,
02:31:26.300 | not just in one gene, but across the set of pathways
02:31:29.020 | and so on and so forth for every one of these disorders.
02:31:31.900 | So I think that we're finally at the level of systems
02:31:35.340 | medicine of basically instead of sort of medicine
02:31:38.140 | being at a single gene level,
02:31:40.340 | medicine being at the systems level,
02:31:42.180 | where it can be personalized based on a specific set
02:31:45.260 | of genetic markers and genetic perturbations
02:31:47.100 | that you are either born with,
02:31:49.460 | or that you have developed during your lifetime,
02:31:52.940 | your unique set of exposures,
02:31:54.660 | your unique set of biomarkers,
02:31:56.940 | and your unique set of current set of conditions
02:32:00.860 | through your EHR and other ways.
02:32:04.420 | And the precision component of intervening
02:32:09.980 | extremely precisely in the specific pathways
02:32:12.980 | and in specific combinations of genes
02:32:14.620 | that should be modulated to sort of bring you
02:32:16.620 | from the disease state to the physiologically normal state,
02:32:20.620 | or even to physiologically improved state
02:32:22.620 | through this combination of interventions.
02:32:25.620 | So that's in my view, the field
02:32:26.980 | where basically computer science comes together
02:32:29.060 | with artificial intelligence, statistics,
02:32:31.220 | all of these other tools,
02:32:32.420 | molecular biology technologies and biotechnology
02:32:34.980 | and pharmaceutical technologies
02:32:36.140 | that are sort of revolutionary in the way of intervention.
02:32:39.380 | And of course, this massive amount of molecular biology
02:32:41.820 | and data gathering and generation and perturbation
02:32:44.500 | in massively parallel ways.
02:32:46.300 | So there's no better way, there's no better time,
02:32:49.660 | there's no better place to be sort of,
02:32:52.700 | you know, looking at this whole confluence of ideas.
02:32:56.740 | And I'm just so thrilled to be a small part
02:32:58.980 | of this amazing, enormous ecosystem.
02:33:01.300 | - It's exciting to imagine what humans of 100,
02:33:04.260 | 200 years from now, what their life experience is like,
02:33:08.660 | because these ideas seem to have potential
02:33:12.300 | to transform the quality of life.
02:33:15.060 | And I think that when they look back at us,
02:33:18.220 | they'll probably wonder how we put up
02:33:21.140 | with all the suffering in the world.
02:33:23.500 | Manolis, it's a huge honor.
02:33:25.180 | Thank you for spending this early Sunday morning with me.
02:33:29.220 | I deeply appreciate it.
02:33:30.700 | See you next time.
02:33:31.540 | - Sounds like a plan.
02:33:32.380 | Thank you, Lex.
02:33:33.740 | Thanks for listening to this conversation
02:33:35.300 | with Manolis Kellis, and thank you to our sponsors.
02:33:38.860 | SEMrush, which is an SEO optimization tool,
02:33:43.220 | Pessimist Archive, which is one
02:33:45.060 | of my favorite history podcasts,
02:33:47.380 | Eight Sleep, which is a self-cooling mattress
02:33:50.100 | with smart sensors and an app,
02:33:52.580 | and finally, BetterHelp, which is an online therapy service.
02:33:57.140 | Please check out these sponsors in the description
02:33:59.340 | to get a discount and to support this podcast.
02:34:03.300 | If you enjoy this thing, subscribe on YouTube,
02:34:05.560 | review it with Five Stars on Apple Podcast,
02:34:07.760 | follow on Spotify, support on Patreon,
02:34:10.460 | or connect with me on Twitter @lexfridman.
02:34:14.380 | And now, let me leave you with some words from Haruki Murakami.
02:34:18.820 | Human beings are ultimately nothing
02:34:20.460 | but carriers, passageways for genes.
02:34:24.500 | They ride us into the ground like racehorses
02:34:27.620 | from generation to generation.
02:34:30.020 | Genes don't think about what constitutes good or evil.
02:34:34.060 | They don't care whether we're happy or unhappy.
02:34:37.420 | We're just means to an end for them.
02:34:39.940 | The only thing they think about
02:34:42.540 | is what is most efficient for them.
02:34:44.780 | Thank you for listening, and hope to see you next time.
02:34:49.200 | (upbeat music)