How I use LLMs

Chapters
0:00 Intro into the growing LLM ecosystem
2:54 ChatGPT interaction under the hood
13:12 Basic LLM interactions examples
18:03 Be aware of the model you're using, pricing tiers
22:54 Thinking models and when to use them
31:00 Tool use: internet search
42:04 Tool use: deep research
50:57 File uploads, adding documents to context
59:00 Tool use: python interpreter, messiness of the ecosystem
64:35 ChatGPT Advanced Data Analysis, figures, plots
69:00 Claude Artifacts, apps, diagrams
74:02 Cursor: Composer, writing code
82:28 Audio (Speech) Input/Output
87:37 Advanced Voice Mode aka true audio inside the model
97:09 NotebookLM, podcast generation
100:20 Image input, OCR
107:02 Image output, DALL-E, Ideogram, etc.
109:14 Video input, point and talk on app
112:23 Video output, Sora, Veo 2, etc.
113:29 ChatGPT memory, custom instructions
118:38 Custom GPTs
126:30 Summary
00:00:00.000 |
Hi everyone. So in this video, I would like to continue our general audience series on large 00:00:05.840 |
language models like ChatGPT. Now, in the previous video, "Deep Dive into LLMs" that you 00:00:10.480 |
can find on my YouTube, we went into a lot of the under the hood fundamentals of how these models 00:00:14.560 |
are trained and how you should think about their cognition or psychology. Now, in this video, 00:00:20.080 |
I want to go into more practical applications of these tools. I want to show you lots of examples, 00:00:25.600 |
I want to take you through all the different settings that are available, and I want to show 00:00:29.040 |
you how I use these tools and how you can also use them in your own life and work. So let's dive in. 00:00:35.360 |
Okay, so first of all, the web page that I have pulled up here is chatgpt.com. Now, 00:00:40.080 |
as you might know, ChatGPT was developed by OpenAI and deployed in 2022. So this was the 00:00:46.000 |
first time that people could actually just kind of like talk to a large language model 00:00:49.760 |
through a text interface, and this went viral and all over the place on the internet, 00:00:54.080 |
and this was huge. Now, since then though, the ecosystem has grown a lot. So I'm going to be 00:00:59.280 |
showing you a lot of examples of ChatGPT specifically, but now in 2025, there are many 00:01:05.840 |
other apps that are kind of ChatGPT-like, and this is now a much bigger and richer ecosystem. 00:01:10.480 |
So in particular, I think ChatGPT by OpenAI is this original gangster incumbent. 00:01:16.080 |
It's most popular and most feature-rich also, because it's been around the longest. 00:01:20.800 |
But there are many other kinds of clones available, I would say; I don't think it's too unfair to say that. 00:01:25.600 |
But in some cases, there are kind of like unique experiences that are not found in ChatGPT, 00:01:29.840 |
and we're going to see examples of those. So for example, Big Tech has followed with a lot of kind 00:01:35.920 |
of ChatGPT-like experiences. So for example, Gemini, Meta.ai, and Copilot from Google, Meta, 00:01:40.960 |
and Microsoft respectively. And there's also a number of startups. So for example, Anthropic 00:01:46.000 |
has Claude, which is kind of like a ChatGPT equivalent. xAI, which is Elon's company, 00:01:50.720 |
has Grok. And there are many others. So all of these here are from United States 00:01:55.920 |
companies, basically. DeepSeek is a Chinese company, and Le Chat is from the French company, Mistral. 00:02:03.120 |
Now, where can you find these and how can you keep track of them? Well, number one, 00:02:07.200 |
on the internet somewhere. But there are some leaderboards, and in the previous video, 00:02:10.480 |
I've shown you Chatbot Arena is one of them. So here you can come to some ranking of different 00:02:15.760 |
models, and you can see sort of their strength or Elo score. And so this is one place where you 00:02:21.040 |
can keep track of them. I would say another place maybe is this SEAL leaderboard from Scale. And so 00:02:28.080 |
here you can also see different kinds of evals, and different kinds of models, and how well they 00:02:32.400 |
rank. And you can also come here to see which models are currently performing the best on a 00:02:37.600 |
wide variety of tasks. So understand that the ecosystem is fairly rich, but for now I'm going 00:02:43.760 |
to start with OpenAI because it is the incumbent and is most feature-rich, but I'm going to show 00:02:48.640 |
you others over time as well. So let's start with ChatGPT. What is this text box and what do we put 00:02:54.240 |
in here? Okay, so the most basic form of interaction with a language model is that we give 00:02:58.400 |
a text and then we get some text back in response. So as an example, we can ask to get a haiku about 00:03:04.480 |
what it's like to be a large language model. So this is a good kind of example task for a language 00:03:09.760 |
model because these models are really good at writing. So writing haikus, or poems, or cover 00:03:15.920 |
letters, or resumes, or email replies, they're just good at writing. So when we ask for something like 00:03:22.240 |
this, what happens looks as follows. The model basically responds, "Words flow like a stream, 00:03:28.320 |
endless echoes never mine, ghost of thought unseen." Okay, it's pretty dramatic. But what we're seeing 00:03:34.960 |
here in ChatGPT is something that looks a bit like a conversation that you would have with a friend. 00:03:39.120 |
These are kind of like chat bubbles. Now, as we saw in the previous video, what's going on 00:03:44.240 |
under the hood here is that this is what we call a user query, this piece of text. And this piece 00:03:50.480 |
of text and also the response from the model, this piece of text is chopped up into little text 00:03:56.160 |
chunks that we call tokens. So this sequence of text is under the hood, a token sequence, 00:04:02.960 |
one-dimensional token sequence. Now the way we can see those tokens is we can use an app like, 00:04:07.040 |
for example, Tiktokenizer. So making sure that GPT-4o is selected, I can paste my text here. 00:04:12.080 |
And this is actually what the model sees under the hood. My piece of text to the model looks 00:04:17.920 |
like a sequence of exactly 15 tokens. And these are the little text chunks that the model sees. 00:04:23.360 |
Now there's a vocabulary here of roughly 200,000 possible tokens. And then these are the token 00:04:32.320 |
IDs corresponding to all these little text chunks that are part of my query. And you can play 00:04:36.880 |
with this and update it. And you can see that, for example, this is case sensitive. You would 00:04:40.000 |
get different tokens. And you can kind of edit it and see live how the token sequence changes. 00:04:44.160 |
So our query was 15 tokens. And then the model response is right here. And it responded back to 00:04:51.520 |
us with a sequence of exactly 19 tokens. So that haiku is this sequence of 19 tokens. 00:04:58.000 |
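If you want to poke at this outside the Tiktokenizer web app, here is a minimal sketch using OpenAI's tiktoken library (assuming it is installed via pip install tiktoken); "o200k_base" is the encoding used by GPT-4o, with a vocabulary of roughly 200,000 tokens.

    # Minimal sketch of what Tiktokenizer shows, using OpenAI's tiktoken
    # library. "o200k_base" is the GPT-4o encoding (~200k token vocabulary).
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    text = "Write a haiku about what it's like to be a large language model"
    tokens = enc.encode(text)

    print(tokens)                              # token IDs the model actually sees
    print(len(tokens))                         # length of the 1-D token sequence
    print([enc.decode([t]) for t in tokens])   # the little text chunks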
Now, so we said 15 tokens and it said 19 tokens back. Now, because this is a conversation and we 00:05:06.480 |
want to actually maintain a lot of the metadata that actually makes up a conversation object, 00:05:10.640 |
this is not all that's going on under the hood. And we saw in the previous video a little bit 00:05:14.800 |
about the conversation format. So it gets a little bit more complicated in that we have to take our 00:05:20.880 |
user query. And we have to actually use this chat format. So let me delete the system message. I 00:05:26.320 |
don't think it's very important for the purposes of understanding what's going on. Let me paste my 00:05:30.720 |
message as the user. And then let me paste the model response as an assistant. And then let me 00:05:37.200 |
crop it here properly. The tool doesn't do that properly. So here we have it as it actually 00:05:44.400 |
happens under the hood. There are all these special tokens that basically begin a message 00:05:50.320 |
from the user. And then the user says, and this is the content of what we said. And then the user 00:05:55.840 |
ends. And then the assistant begins and says this, et cetera. Now, the precise details of the 00:06:02.080 |
conversation format are not important. What I want to get across here is that what looks to you and I 00:06:07.280 |
as little chat bubbles going back and forth under the hood, we are collaborating with the model 00:06:12.560 |
and we're both writing into a token stream. And these two bubbles back and forth were in a 00:06:20.640 |
sequence of exactly 42 tokens under the hood. I contributed some of the first tokens and then the 00:06:26.000 |
model continued the sequence of tokens with its response. And we could alternate and continue 00:06:32.000 |
adding tokens here. And together we're building out a token window, a one-dimensional sequence 00:06:38.320 |
of tokens. Okay, so let's come back to ChatGPT now. What we are seeing here is kind of like 00:06:43.760 |
little bubbles going back and forth between us and the model. Under the hood, we are building 00:06:47.920 |
out a one-dimensional token sequence. When I click new chat here, that wipes the token window. 00:06:54.720 |
That resets the tokens to basically zero again and restarts the conversation from scratch. 00:07:00.080 |
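As a rough sketch of the conversation format idea: the chat bubbles are just a list of messages that gets flattened into one token stream, with special tokens marking each turn. The <|im_start|> and <|im_end|> strings below follow the ChatML-style format OpenAI has used, but treat the exact special tokens as illustrative, since every model family defines its own.

    # Sketch: how chat bubbles become one 1-D token stream. The special
    # tokens are ChatML-style and illustrative only.
    conversation = [
        {"role": "user", "content": "Write a haiku about being an LLM"},
        {"role": "assistant", "content": "Words flow like a stream..."},
    ]

    def render(messages):
        # Flatten the bubbles into a single text stream, which the
        # tokenizer then chops into tokens.
        stream = ""
        for m in messages:
            stream += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        return stream

    print(render(conversation))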
Now, the cartoon diagram that I have in my mind when I'm speaking to a model looks something like 00:07:04.720 |
this. When we click new chat, we begin a token sequence. So this is a one-dimensional sequence 00:07:11.920 |
of tokens. We, the user, can write tokens into this stream. And then when we hit enter, we transfer 00:07:19.200 |
control over to the language model. And the language model responds with its own token streams. 00:07:24.880 |
And then the language model has a special token that basically says something along the lines of 00:07:29.600 |
"I'm done." So when it emits that token, the chatGPT application transfers control back to us 00:07:35.600 |
and we can take turns. Together, we are building out the token stream, which we also call the 00:07:41.680 |
context window. So the context window is kind of like this working memory of tokens and anything 00:07:48.080 |
that is inside this context window is kind of like in the working memory of this conversation 00:07:52.640 |
and is very directly accessible by the model. Now, what is this entity here that we are talking 00:07:59.200 |
to and how should we think about it? Well, this language model here, we saw that the way it is 00:08:04.480 |
trained in the previous video, we saw there are two major stages, the pre-training stage and the 00:08:09.680 |
post-training stage. The pre-training stage is kind of like taking all of internet, chopping 00:08:16.160 |
it up into tokens, and then compressing it into a single kind of like zip file. But the zip file is 00:08:22.640 |
not exact. The zip file is a lossy and probabilistic zip file because we can't possibly represent all 00:08:28.400 |
of the internet in just one, sort of like, say, terabyte of zip file because there's just way 00:08:35.920 |
too much information. So we just kind of get the gestalt or the vibes inside this zip file. 00:08:42.320 |
Now, what's actually inside the zip file are the parameters of a neural network. 00:08:48.400 |
And so, for example, a one terabyte zip file would correspond to roughly, say, one trillion 00:08:54.480 |
parameters inside this neural network. And what this neural network is trying to do is it's trying 00:09:00.400 |
to basically take tokens, and it's trying to predict the next token in a sequence. But it's 00:09:05.760 |
doing that on internet documents. So it's kind of like this internet document generator, right? 00:09:10.560 |
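As a quick back-of-the-envelope on why one terabyte lines up with roughly one trillion parameters: this assumes each parameter is stored in about one byte (8-bit precision), which is my assumption here, not something the video pins down.

    # "1 TB zip file ~ 1 trillion parameters", assuming ~1 byte/parameter.
    params = 1_000_000_000_000            # one trillion parameters
    bytes_per_param = 1                   # assumption: 8-bit weights
    size_tb = params * bytes_per_param / 1e12
    print(f"{size_tb:.1f} TB")            # -> 1.0 TB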
And in the process of predicting the next token in a sequence on internet, 00:09:15.600 |
the neural network gains a huge amount of knowledge about the world. And this knowledge 00:09:21.600 |
is all represented and stuffed and compressed inside the one trillion parameters, roughly, 00:09:26.960 |
of this language model. Now, this pre-training stage also we saw is fairly costly. So this can 00:09:32.240 |
be many tens of millions of dollars, say like three months of training and so on. So this is 00:09:37.840 |
a costly long phase. For that reason, this phase is not done that often. So for example, GPT-4o, 00:09:45.520 |
this model was pre-trained probably many months ago, maybe like even a year ago by now. And so 00:09:52.080 |
that's why these models are a little bit out of date. They have what's called a knowledge cutoff. 00:09:56.960 |
Because that knowledge cutoff corresponds to when the model was pre-trained. 00:10:00.960 |
And its knowledge only goes up to that point. Now, some knowledge can come into the model 00:10:09.440 |
through the post-training phase, which we'll talk about in a second. But roughly speaking, 00:10:13.280 |
you should think of these models as kind of like a little bit out of date, because pre-training is 00:10:17.680 |
way too expensive and happens infrequently. So any kind of recent information, like if you wanted to 00:10:24.080 |
talk to your model about something that happened last week or so on, we're going to need other 00:10:27.600 |
ways of providing that information to the model, because it's not stored in the knowledge of the 00:10:31.840 |
model. So we're going to have various tool use to give that information to the model. 00:10:36.400 |
Now, after pre-training, there's the second stage called post-training. And the post-training stage 00:10:42.240 |
is really attaching a smiley face to this zip file. Because we don't want to generate internet 00:10:47.360 |
documents, we want this thing to take on the persona of an assistant that responds to user 00:10:53.600 |
queries. And that's done in the process of post-training, where we swap out the dataset 00:10:58.640 |
for a dataset of conversations that are built out by humans. So this is basically where the model 00:11:04.000 |
takes on this persona, so that we can ask questions and it responds with answers. 00:11:09.040 |
So it takes on the style of an assistant, that's post-training, but it has the knowledge 00:11:16.480 |
of all of internet, and that's by pre-training. So these two are combined in this artifact. 00:11:23.200 |
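To make the dataset swap concrete, here is the rough shape of a single post-training example: a conversation written by a human labeler that the model is trained to imitate. The content here is invented for illustration.

    # Illustrative shape of one post-training (SFT) example: a
    # human-written conversation the model learns to imitate.
    sft_example = {
        "messages": [
            {"role": "user", "content": "Can you explain what a token is?"},
            {"role": "assistant", "content": "A token is a small chunk of text..."},
        ]
    }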
Now, the important thing to understand here, I think, for this section, is that what you are 00:11:28.800 |
talking to is a fully self-contained entity by default. This language model, think of it as a 00:11:34.640 |
one terabyte file on a disk. Secretly, that represents one trillion parameters and their 00:11:40.240 |
precise settings inside the neural network that's trying to give you the next token in a sequence. 00:11:45.360 |
But this is the fully self-contained entity. There's no calculator, there's no computer 00:11:50.080 |
and Python interpreter, there's no worldwide web browsing, there's none of that. There's 00:11:54.320 |
no tool use yet in what we've talked about so far. You're talking to a zip file. If you 00:11:59.120 |
stream tokens to it, it will respond with tokens back. And the zip file has the knowledge from 00:12:05.040 |
pre-training and it has the style and form from post-training. And so that's roughly how you can 00:12:12.640 |
think about this entity. Okay, so if I had to summarize what we talked about so far, I would 00:12:17.280 |
probably do it in the form of an introduction of ChatGPT in a way that I think you should think 00:12:21.120 |
about it. So the introduction would be, "Hi, I'm ChatGPT. I'm a one terabyte zip file. My knowledge 00:12:27.760 |
comes from the internet, which I read in its entirety about six months ago and I only remember 00:12:34.000 |
vaguely. Okay? And my winning personality was programmed, by example, by human labelers at 00:12:39.920 |
OpenAI." So the personality is programmed in post-training and the knowledge comes from 00:12:46.640 |
compressing the internet during pre-training. And this knowledge is a little bit out of date 00:12:51.680 |
and it's probabilistic and slightly vague. Some of the things that probably are mentioned very 00:12:56.880 |
frequently on the internet, I will have a lot better recollection of than some of the things 00:13:01.120 |
that are discussed very rarely, very similar to what you might expect with a human. So let's now 00:13:06.800 |
talk about some of the repercussions of this entity and how we can talk to it and what kinds 00:13:11.040 |
of things we can expect from it. Now I'd like to use real examples when we actually go through this. 00:13:15.280 |
So for example, this morning I asked ChatGPT the following, "How much caffeine is in one shot of 00:13:19.520 |
Americana?" And I was curious because I was comparing it to matcha. Now ChatGPT will tell 00:13:24.800 |
me that this is roughly 63 milligrams of caffeine or so. Now the reason I'm asking ChatGPT this 00:13:29.520 |
question, and the reason I think this is okay, is, number one, I'm not asking about any knowledge that is very 00:13:35.280 |
recent. So I do expect that the model has sort of read about how much caffeine there is in one shot. 00:13:40.240 |
I don't think this information has changed too much. And number two, I think this information 00:13:44.720 |
is extremely frequent on the internet. This kind of a question and this kind of information has 00:13:48.880 |
occurred all over the place on the internet. And because there were so many mentions of it, 00:13:53.200 |
I expect the model to have good memory of it and its knowledge. So there's no tool use and 00:13:58.160 |
the model, the zip file, responded that there's roughly 63 milligrams. Now I'm not guaranteed 00:14:03.920 |
that this is the correct answer. This is just its vague recollection of the internet. But I 00:14:10.000 |
can go to primary sources and maybe I can look up, okay, caffeine and Americano and I could verify 00:14:16.400 |
that, yeah, it looks to be about 63 is roughly right. And you can look at primary sources to 00:14:20.640 |
decide if this is true or not. So I'm not strictly speaking guaranteed that this is true, but I think 00:14:25.200 |
probably this is the kind of thing that ChatGPT would know. Here's an example of a conversation 00:14:29.840 |
I had two days ago, actually. This is another example of a knowledge-based conversation and the kind 00:14:34.880 |
of thing that I'm comfortable asking of ChatGPT, with some caveats. So I'm a bit sick, I have a runny nose 00:14:39.840 |
and I want to get meds that help with that. So it told me a bunch of stuff. And I want my nose to 00:14:47.680 |
not be runny. So I gave it a clarification based on what it said. And then it kind of gave me some 00:14:52.240 |
of the things that might be helpful with that. And then I looked at some of the meds that I have 00:14:56.320 |
at home. And I said, does DayQuil or NyQuil work? And it went off and it kind of like went 00:15:01.360 |
over the ingredients of DayQuil and NyQuil and whether or not they help mitigate runny nose. 00:15:07.040 |
Now, when these ingredients are coming here, again, remember, we are talking to a zip file 00:15:12.320 |
that has a recollection of the internet. I'm not guaranteed that these ingredients are correct. 00:15:16.960 |
And in fact, I actually took out the box and I looked at the ingredients and I made sure 00:15:20.800 |
that the NyQuil ingredients are exactly these ingredients. And I'm doing that because I don't 00:15:26.160 |
always fully trust what's coming out here, right? This is just a probabilistic statistical 00:15:30.480 |
recollection of the internet. But that said, conversations of DayQuil and NyQuil, these 00:15:35.600 |
are very common meds. Probably there's tons of information about a lot of this on the internet. 00:15:40.880 |
And this is the kind of thing that the model has a pretty good recollection of. So actually, 00:15:45.600 |
these were all correct. And then I said, okay, well, I have NyQuil. How fast would it act 00:15:51.200 |
roughly? And it kind of tells me. And then is acetaminophen basically a Tylenol? And it says, 00:15:56.800 |
yes. So this is a good example of how ChatGPT was useful to me. It is a knowledge-based query. 00:16:02.000 |
This knowledge sort of isn't recent knowledge. This is all coming from the knowledge of the 00:16:06.880 |
model. I think this is common information. This is not a high-stakes situation. I'm checking 00:16:11.920 |
ChatGPT a little bit. But also, this is not a high-stakes situation, so no big deal. So I popped 00:16:16.800 |
a NightQuil and indeed it helped. But that's roughly how I'm thinking about what's coming 00:16:21.680 |
back here. Okay, so at this point, I want to make two notes. The first note I want to make is that 00:16:26.960 |
naturally as you interact with these models, you'll see that your conversations are growing 00:16:31.040 |
longer, right? Anytime you are switching topic, I encourage you to always start a new chat. 00:16:37.200 |
When you start a new chat, as we talked about, you are wiping the context window of tokens 00:16:42.640 |
and resetting it back to zero. If it is the case that those tokens are no longer useful to your 00:16:47.600 |
next query, I encourage you to do this because these tokens in this window are expensive. 00:16:52.400 |
And they're expensive in kind of like two ways. Number one, if you have lots of tokens here, 00:16:58.320 |
then the model can actually find it a little bit distracting. So if this was a lot of tokens, 00:17:04.000 |
this is kind of like the working memory of the model. The model might be distracted by all the 00:17:09.440 |
tokens in the past when it is trying to sample tokens much later on. So it could be distracting 00:17:15.360 |
and it could actually decrease the accuracy of the model and of its performance. And number two, 00:17:20.080 |
the more tokens are in the window, the more expensive it is by a little bit, not by too much, 00:17:25.200 |
but by a little bit to sample the next token in the sequence. So your model is actually slightly 00:17:30.000 |
slowing down. It's becoming more expensive to calculate the next token and the more tokens 00:17:35.040 |
there are here. And so think of the tokens in the context window as a precious resource. 00:17:41.280 |
Think of that as the working memory of the model and don't overload it with irrelevant information 00:17:47.840 |
and keep it as short as you can. And you can expect that to work faster and slightly better. 00:17:53.360 |
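As a toy illustration of that second cost, under the standard assumption that the model keeps a KV cache: each new token still attends over every token already in the window, so per-token work grows roughly linearly with context length. The numbers below are made up to show the shape, not real measurements.

    # Toy model: with a KV cache, each new token attends over all previous
    # tokens, so attention work per token grows ~linearly with context size.
    def attention_work_per_token(context_len, d_model=4096):
        # one query dotted against context_len cached keys (very rough)
        return 2 * context_len * d_model

    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} tokens in window -> {attention_work_per_token(n):,} units of work")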
Of course, if the information actually is related to your task, you may want to keep it in there. 00:17:57.360 |
But I encourage you to, as often as you can, basically start a new chat whenever you are 00:18:02.400 |
switching topic. The second thing is that I always encourage you to keep in mind what model you are 00:18:07.440 |
actually using. So here on the top left, we can drop down and we can see that we are currently 00:18:11.520 |
using GPT-4o. Now, there are many different models of many different flavors, and there are 00:18:17.520 |
too many actually, but we'll go through some of these over time. So we are using GPT-4o right 00:18:22.240 |
now. And in everything that I've shown you, this is GPT-4o. Now, when I open a new incognito 00:18:27.200 |
window, so if I go to chatgpt.com and I'm not logged in, the model that I'm talking to here, 00:18:33.440 |
so if I just say hello, the model that I'm talking to here might not be GPT-4o. It might be a 00:18:38.560 |
smaller version. Now, unfortunately, OpenAI does not tell me when I'm not logged in what model I'm 00:18:43.600 |
using, which is kind of unfortunate. But it's possible that you are using a smaller, kind of 00:18:48.240 |
dumber model. So if we go to the ChatGPT pricing page here, we see that they have three basic 00:18:54.640 |
tiers for individuals, the free, plus, and pro. And in the free tier, you have access to what's 00:19:01.280 |
called GPT-4o mini. And this is a smaller version of GPT-4o. It is a smaller model with a smaller 00:19:08.000 |
number of parameters. It's not going to be as creative, like its writing might not be as good. 00:19:12.240 |
Its knowledge is not going to be as good. It's going to probably hallucinate a bit more, etc. 00:19:16.640 |
But it is kind of like the free offering, the free tier. They do say that you have limited access to 00:19:21.760 |
4o and o3-mini, but I'm not actually 100% sure. Like, it didn't tell us which model we were using, 00:19:26.880 |
so we just fundamentally don't know. Now, when you pay for $20 per month, even though it doesn't say 00:19:32.640 |
this, I think basically, like, they're screwing up on how they're describing this. But if you go to 00:19:37.600 |
the fine print, limits apply, we can see that the Plus users get 80 messages every three hours for GPT-4o. 00:19:46.240 |
So that's the flagship biggest model that's currently available as of today. That's available 00:19:52.320 |
and that's what we want to be using. So if you pay $20 per month, you have that with some limits. 00:19:56.800 |
And then if you pay for $200 per month, you get the pro, and there's a bunch of additional goodies 00:20:01.680 |
as well as unlimited GPT-4o. And we're going to go into some of this because I do pay for 00:20:06.080 |
pro subscription. Now, the whole takeaway I want you to get from this is be mindful of the models 00:20:12.560 |
that you're using. Typically, with these companies, the bigger models are more expensive to 00:20:16.480 |
calculate. And so therefore, the companies charge more for the bigger models. And so make those 00:20:23.760 |
trade-offs for yourself, depending on your usage of LLMs. Have a look at if you can get away with 00:20:29.280 |
the cheaper offerings. And if the intelligence is not good enough for you and you're using this 00:20:32.560 |
professionally, you may really want to consider paying for the top tier models that are available 00:20:36.320 |
from these companies. In my case, in my professional work, I do a lot of coding and a lot of things 00:20:40.560 |
like that. And this is still very cheap for me. So I pay this very gladly because I get access to 00:20:46.160 |
some really powerful models that I'll show you in a bit. So yeah, keep track of what model you're 00:20:51.360 |
using and make those decisions for yourself. I also want to show you that all the other LLM 00:20:56.480 |
providers will all have different pricing tiers with different models at different tiers that 00:21:01.760 |
you can pay for. So for example, if we go to Claude from Anthropic, you'll see that I am 00:21:06.240 |
paying for the professional plan and that gives me access to Claude 3.5 Sonnet. And if you are 00:21:11.760 |
not paying for a pro plan, then probably you only have access to maybe Haiku or something like that. 00:21:15.760 |
And so use the most powerful model that kind of like works for you. Here's an example of me using 00:21:22.480 |
Claude a while back. I was asking for just travel advice. So I was asking for a cool city to go to, 00:21:28.800 |
and Claude told me that Zermatt in Switzerland is really cool. So I ended up going there for 00:21:33.360 |
a New Year's break following Claude's advice. But this is just an example of another thing that I 00:21:37.920 |
find these models pretty useful for is travel advice and ideation and getting pointers that 00:21:42.960 |
you can research further. Here we also have an example of Gemini.google.com. So this is from 00:21:49.040 |
Google. I got Gemini's opinion on the matter and I asked it for a cool city to go to, and it also 00:21:54.880 |
recommended Zermatt. So that was nice. So I like to go between different models and asking them 00:22:00.080 |
similar questions and seeing what they think. And for Gemini, also on the top left, 00:22:04.640 |
we also have a model selector. So you can pay for the more advanced tiers and use those models. 00:22:10.240 |
Same thing goes for Grok; Grok 3 was just released. We don't want to be asking Grok 2 questions because we know 00:22:10.240 |
that Grok 3 is the most advanced model. So I want to make sure that I pay enough and such that I 00:22:22.400 |
have Grok 3 access. So for all these different providers, find the one that works best for you, 00:22:28.240 |
experiment with different providers, experiment with different pricing tiers for the problems 00:22:31.920 |
that you are working on. And often I end up personally just paying for a lot 00:22:37.120 |
of them and then asking all of them the same question. And I kind of refer to all these 00:22:42.480 |
models as my LLM council. So they're kind of like the council of language models. If I'm trying to 00:22:47.840 |
figure out where to go on a vacation, I will ask all of them. And so you can also do that for 00:22:52.320 |
yourself if that works for you. Okay, the next topic I want to now turn to is that of thinking 00:22:57.680 |
models, quote unquote. So we saw in the previous video that there are multiple stages of training. 00:23:02.320 |
Pre-training goes to supervised fine-tuning, goes to reinforcement learning. And reinforcement 00:23:08.080 |
learning is where the model gets to practice on a large collection of problems that resemble 00:23:13.760 |
the practice problems in the textbook. And it gets to practice on a lot of math and code problems. 00:23:19.040 |
And in the process of reinforcement learning, the model discovers thinking strategies that 00:23:26.000 |
lead to good outcomes. And these thinking strategies, when you look at them, 00:23:30.000 |
they very much resemble kind of the inner monologue you have when you go through problem solving. 00:23:34.400 |
So the model will try out different ideas, it will backtrack, it will revisit assumptions, 00:23:40.160 |
and it will do things like that. Now, a lot of these strategies are very difficult to hard code 00:23:44.640 |
as a human labeler, because it's not clear what the thinking process should be. It's only in the 00:23:48.880 |
reinforcement learning that the model can try out lots of stuff. And it can find the thinking process 00:23:53.520 |
that works for it with its knowledge and its capabilities. So this is the third stage of 00:23:59.760 |
training these models. This stage is relatively recent; it only emerged a year or two ago. And all of the 00:24:06.400 |
different LLM labs have been experimenting with these models over the last year. And this is kind 00:24:10.560 |
of like seen as a large breakthrough recently. And here we looked at the paper from DeepSeek 00:24:17.040 |
that was the first to basically talk about it publicly. And they had a nice paper about 00:24:22.160 |
incentivizing reasoning capabilities in LLMs via reinforcement learning. So that's the paper that 00:24:26.720 |
we looked at in the previous video. So we now have to adjust our cartoon a little bit, because 00:24:30.960 |
basically what it looks like is our emoji now has this optional thinking bubble. And when you are 00:24:38.640 |
using a thinking model, which will do additional thinking, you are using the model that has been 00:24:43.840 |
additionally tuned with reinforcement learning. And qualitatively, what does this look like? 00:24:49.360 |
Well, qualitatively, the model will do a lot more thinking. And what you can expect is that you will 00:24:54.160 |
get higher accuracies, especially on problems that are, for example, math, code, and things 00:24:58.880 |
that require a lot of thinking. Things that are very simple might not actually benefit from this, 00:25:04.400 |
but things that are actually deep and hard might benefit a lot. So basically, what you're 00:25:11.440 |
paying for is that the models will do thinking, and that can sometimes take multiple minutes, 00:25:16.000 |
because the models will emit tons and tons of tokens over a period of many minutes, 00:25:20.000 |
and you have to wait, because the model is thinking just like a human would think. 00:25:23.520 |
But in situations where you have very difficult problems, this might translate to higher accuracy. 00:25:29.280 |
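Under the hood, the extra "thinking" is just more tokens emitted into the stream before the final answer. As a hedged sketch: DeepSeek R1, whose paper we just mentioned, delimits its chain of thought with <think>...</think> tags, and an application could split the trace from the answer roughly like this (other providers use their own, often hidden, formats).

    # Sketch: separating an R1-style reasoning trace from the final answer,
    # assuming <think>...</think> delimiters as DeepSeek R1 uses.
    import re

    def split_reasoning(text):
        m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
        thoughts = m.group(1).strip() if m else ""
        answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        return thoughts, answer

    raw = "<think>Check the pack/unpack order... mismatch!</think>The bug is a parameter ordering mismatch."
    print(split_reasoning(raw))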
So let's take a look at some examples. So here's a concrete example when I was stuck on a programming 00:25:34.160 |
problem recently. So something called the gradient check fails, and I'm not sure why, 00:25:38.960 |
and I copy-pasted my code into the model. So the details of the code are not important, 00:25:44.560 |
but this is basically an optimization of a multi-layer perceptron, and details are not 00:25:50.160 |
important. It's a bunch of code that I wrote, and there was a bug because my gradient check 00:25:53.920 |
didn't work, and I was just asking for advice. And GPT-4o, which is the flagship, most powerful 00:25:59.200 |
model from OpenAI, but without thinking, just kind of like went into a bunch of things that 00:26:05.840 |
it thought were issues or that I should double check, but actually didn't really solve the 00:26:09.200 |
problem. Like all the things that it gave me here are not the core issue of the problem. 00:26:15.040 |
So the model didn't really solve the issue, and it tells me about how to debug it and so on. 00:26:20.880 |
But then what I did was, here in the dropdown, I turned to one of the thinking models. Now, 00:26:26.960 |
for OpenAI, all of these models that start with "o" are thinking models: o1, o3-mini, o3-mini-high, 00:26:35.280 |
and o1 pro mode are all thinking models. And they're not very good at naming their models, 00:26:41.520 |
but that is the case. And so here they will say something like, "Uses advanced reasoning," or 00:26:47.840 |
"Good at coding logics," and stuff like that. But these are basically all tuned with reinforcement 00:26:53.120 |
learning. And because I am paying for $200 per month, I have access to o1 pro mode, 00:26:59.840 |
which is best at reasoning. But you might want to try some of the other ones depending on your 00:27:06.480 |
pricing tier. And when I gave the exact same prompt to o1 pro, which is the model that is best 00:27:13.040 |
at reasoning, and you have to pay $200 per month for this one, 00:27:17.440 |
it went off and it thought for one minute. And it went through a 00:27:23.520 |
sequence of thoughts, and OpenAI doesn't fully show you the exact thoughts. They just kind of 00:27:28.560 |
give you little summaries of the thoughts. But it thought about the code for a while, 00:27:33.440 |
and then it actually came back with the correct solution. It noticed that the parameters are 00:27:37.920 |
mismatched in how I pack and unpack them, and et cetera. So this actually solved my problem. 00:27:41.920 |
And I tried out giving the exact same prompt to a bunch of other LLMs. So for example, Claude, 00:27:48.240 |
I gave Claude the same problem, and it actually noticed the correct issue and solved it. And it 00:27:55.520 |
did that even with Sonnet, which is not a thinking model. So Claude 3.5 Sonnet, to my knowledge, 00:28:02.640 |
is not a thinking model. And to my knowledge, Anthropic, as of today, doesn't have a thinking 00:28:07.680 |
model deployed, but this might change by the time you watch this video. But even without thinking, 00:28:13.440 |
this model actually solved the issue. When I went to Gemini, I asked it, and it also solved the 00:28:20.000 |
issue, even though I also could have tried the thinking model, but it wasn't necessary. 00:28:24.160 |
I also gave it to Grok, Grok 3 in this case, and Grok 3 also solved the problem after a bunch of 00:28:30.800 |
stuff. So it also solved the issue. And then finally, I went to Perplexity.ai. And the reason 00:28:39.200 |
I like Perplexity is because when you go to the model dropdown, one of the models that they host 00:28:43.520 |
is this DeepSeek R1. So this has the reasoning with the DeepSeek R1 model, which is the model that we 00:28:50.560 |
saw over here. This is the paper. So Perplexity just hosts it and makes it very easy to use. 00:28:58.800 |
So I copy-pasted it there and I ran it. And I think they really render it 00:29:04.560 |
terribly. But down here, you can see the raw thoughts of the model, even though you have to 00:29:12.480 |
expand them. But you see like, okay, the user is having trouble with the gradient check, and then 00:29:17.600 |
it tries out a bunch of stuff. And then it says, but wait, when they accumulate the gradients, 00:29:21.120 |
they're doing the thing incorrectly. Let's check the order. The parameters are packed as this, 00:29:26.000 |
and then it notices the issue. And then it kind of like says, that's a critical mistake. 00:29:31.920 |
And so it kind of like thinks through it and you have to wait a few minutes and then also comes up 00:29:35.200 |
with the correct answer. So basically, long story short, what do I want to show you? 00:29:40.960 |
There exists a class of models that we call thinking models. All the different providers 00:29:44.960 |
may or may not have a thinking model. These models are most effective for difficult problems in math 00:29:50.880 |
and code and things like that. And in those kinds of cases, they can push up the accuracy 00:29:55.600 |
of your performance. In many cases, like if you're asking for travel advice or something like that, 00:30:00.080 |
you're not going to benefit out of a thinking model. There's no need to wait for one minute 00:30:04.000 |
for it to think about some destinations that you might want to go to. So for myself, I usually try 00:30:10.720 |
out the non-thinking models because their responses are really fast. But when I suspect the response 00:30:15.040 |
is not as good as it could have been, and I want to give the opportunity to the model to think a 00:30:19.280 |
bit longer about it, I will change it to a thinking model, depending on whichever one you have 00:30:24.160 |
available to you. Now, when you go to Grok, for example, and when I start a new conversation with 00:30:30.320 |
Grok, when you put the question here, like "Hello" (you should put something important here), 00:30:37.040 |
you see here, "Think." So let the model take its time. So turn on "Think," and then click "Go." 00:30:43.440 |
And when you click "Think," Grok, under the hood, switches to the thinking model. And all the 00:30:49.200 |
different LLM providers will kind of like have some kind of a selector for whether or not you 00:30:52.960 |
want the model to think, or whether it's okay to just like go with the previous kind of generation 00:30:59.680 |
of the models. Okay, now the next section I want to continue to is to tool use. So far, we've only 00:31:06.960 |
talked to the language model through text. And this language model is, again, this zip file in 00:31:12.080 |
a folder, it's inert, it's closed off, it's got no tools, it's just a neural network that can emit 00:31:18.000 |
tokens. So what we want to do now, though, is we want to go beyond that. And we want to give the 00:31:22.720 |
model the ability to use a bunch of tools. And one of the most useful tools is an internet search. 00:31:28.880 |
And so let's take a look at how we can make models use internet search. So for example, 00:31:34.000 |
again, using concrete examples from my own life, a few days ago, I was watching White Lotus season 00:31:40.000 |
three. And I watched the first episode. And I love this TV show, by the way. And I was curious when 00:31:45.600 |
episode two was coming out. And so in the old world, you would imagine you go to Google or 00:31:51.920 |
something like that, you put in like new episodes of White Lotus season three, and then you start 00:31:56.160 |
clicking on these links. And maybe open a few of them. Or something like that, right? And you start 00:32:02.880 |
like searching through it and trying to figure it out. And sometimes you luck out and you get a 00:32:07.040 |
schedule. But many times you might get really crazy ads, there's a bunch of random stuff going on, 00:32:13.680 |
and it's just kind of like an unpleasant experience, right? So wouldn't it be great if a 00:32:17.120 |
model could do this kind of a search for you, visit all the webpages, and then take all those 00:32:23.840 |
webpages, take all their content and stuff it into the context window, and then basically give you 00:32:31.600 |
the response. And that's what we're going to do now. Basically, we have a mechanism or a way, 00:32:37.120 |
we introduce a mechanism for the model to emit a special token that is some kind of a 00:32:43.360 |
search the internet token. And when the model emits the search the internet token, 00:32:48.480 |
the ChatGPT application, or whatever LLM application it is you're using, will stop 00:32:54.720 |
sampling from the model. And it will take the query that the model gave, it goes off, it does 00:32:59.920 |
a search, it visits webpages, it takes all of their text, and it puts everything into the context 00:33:05.840 |
window. So now you have this internet search tool that itself can also contribute tokens into our 00:33:12.640 |
context window. And in this case, it would be like lots of internet webpages, and maybe there's 10 00:33:17.600 |
of them, and maybe it just puts it all together. And this could be thousands of tokens coming from 00:33:21.680 |
these webpages, just as we were looking at them ourselves. And then after it has inserted all 00:33:26.480 |
those webpages into the context window, it will reference back to your question as to, hey, when 00:33:32.880 |
is this season getting released? And it will be able to reference the text and give you the correct 00:33:37.680 |
answer. And notice that this is a really good example of why we would need internet search. 00:33:42.240 |
Without the internet search, this model has no chance to actually give us the correct answer. 00:33:47.760 |
Because like I mentioned, this model was trained a few months ago, the schedule probably was not 00:33:51.600 |
known back then. And so when White Lotus Season 3 is coming out is not part of the 00:33:56.880 |
real knowledge of the model. And it's not in the zip file, most likely, because this is something 00:34:03.200 |
that was presumably decided on in the last few weeks. And so the model has to basically go off 00:34:07.360 |
and do internet search to learn this knowledge. And it learns it from the webpages, just like you 00:34:11.680 |
and I would. And then it can answer the question once that information is in the context 00:34:16.080 |
window. And remember, again, that the context window is this working memory. So once we load 00:34:22.000 |
the articles (think of their text as being copy-pasted into the context 00:34:30.080 |
window), now they're in the working memory, and the model can actually answer those questions, 00:34:35.040 |
because it's in the context window. So basically, long story short, don't do this manually, 00:34:41.280 |
but use tools like Perplexity as an example. So Perplexity.ai had a really nice sort of LLM that 00:34:48.720 |
was doing internet search. And I think it was like the first app that really convincingly did this. 00:34:53.920 |
More recently, ChatGPT also introduced a search button. It says "Search the web." So we're going to 00:34:59.360 |
take a look at that in a second. For now, when are new episodes of White Lotus Season 3 getting 00:35:04.080 |
released? You can just ask. And instead of having to do the work manually, we just hit enter, 00:35:08.960 |
and the model will visit these web pages, it will create all the queries, and then it will give you 00:35:12.560 |
the answer. So it just kind of did a ton of the work for you. And then, usually there 00:35:19.120 |
will be citations, so you can actually visit those web pages yourself, and you can make sure these 00:35:24.080 |
are not hallucinations from the model, and you can actually double check that this is actually 00:35:28.320 |
correct. Because it's not in principle guaranteed, it's just something that may or may not work. 00:35:36.240 |
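Mechanically, the search-tool loop described above plausibly looks something like the sketch below. This is not any provider's real API; model.generate and web_search are hypothetical stand-ins, and the point is just the control flow: a special token hands control from the model to the application, and the search results land in the context window as ordinary tokens.

    # Hypothetical sketch of the tool-use loop for internet search.
    def chat_turn(context, model, web_search):
        while True:
            out = model.generate(context)        # model emits tokens
            if out.special_token == "SEARCH":    # "search the internet" token
                pages = web_search(out.query)    # the app runs the search
                context += pages                 # stuff the pages into context
                continue                         # resume sampling with new info
            return out.text                      # normal answer: we're done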
If we take this, we can also go to, for example, ChatGPT, and say the same thing. But now, 00:35:40.640 |
when we put this question in, without actually selecting search, I'm not actually 100% sure what 00:35:45.200 |
the model will do. In some cases, the model will actually know that this is recent knowledge, 00:35:51.280 |
and that it probably doesn't know, and it will create a search. In some cases, we have to declare 00:35:55.760 |
that we want to do the search. In my own personal use, I would know that the model doesn't know, 00:36:00.160 |
and so I would just select search. But first, let's see what happens. 00:36:04.720 |
Okay, searching the web, and then it prints stuff, and then it cites. So the model actually detected 00:36:12.320 |
itself that it needs to search the web, because it understands that this is some kind of recent 00:36:16.480 |
information, etc. So this was correct. Alternatively, if I create a new conversation, 00:36:20.880 |
I could have also selected search, because I know I need to search. Enter. And then it does the same 00:36:26.320 |
thing, searching the web, and that's the result. So basically, when you're using these LLMs, 00:36:31.920 |
look for this. For example, Grok. Excuse me. Let's try Grok. Without it, without selecting search. 00:36:43.360 |
Okay, so the model does some search, just knowing that it needs to search, 00:36:46.720 |
and gives you the answer. So basically, let's see what Claude does. 00:36:52.720 |
You see, so Claude doesn't actually have the search tool available. So it'll say, 00:37:00.160 |
"as of my last update in April 2024." This last update is when the model went through pre-training. 00:37:06.960 |
And so Claude is just saying, as of my last update, the knowledge cutoff of April 2024, 00:37:12.960 |
it was announced, but it doesn't know. So Claude doesn't have 00:37:17.040 |
the internet search integrated as an option, and will not give you the answer. 00:37:22.480 |
I expect that this is something that Anthropic might be working on. 00:37:24.880 |
Let's try Gemini, and let's see what it says. Unfortunately, no official release date for 00:37:31.840 |
White Lotus Season 3 yet. So Gemini 2.0 Pro Experimental does not have access to internet 00:37:40.800 |
search, and doesn't know. We could try some of the other ones, like 2.0 Flash. Let me try that. 00:37:50.160 |
Okay, so this model seems to know, but it doesn't give citations. Oh wait, okay, there we go. 00:37:55.440 |
Sources and related content. So you see how 2.0 Flash actually has the internet search tool, 00:38:02.400 |
but I'm guessing that the 2.0 Pro, which is the most powerful model that they have, 00:38:09.120 |
this one actually does not have access. And in here, it actually tells us, 00:38:13.120 |
2.0 Pro Experimental lacks access to real-time info and some Gemini features. 00:38:17.200 |
So this model is not fully wired with internet search. So long story short, we can get models 00:38:24.320 |
to perform Google searches for us, visit the webpages, pull in the information to the context 00:38:29.440 |
window, and answer questions. And this is a very, very cool feature. But different models, 00:38:35.440 |
possibly different apps, have different amounts of integration of this capability, 00:38:41.040 |
and so you have to be kind of on the lookout for that. And sometimes the model will automatically 00:38:44.960 |
detect that it needs to do a search, and sometimes you're better off telling the model that you want 00:38:49.600 |
it to do the search. So when I'm using GPT-4o and I know that this requires a search, 00:38:55.360 |
you probably want to tick that box. So that's search tools. I wanted to show you a few more 00:39:02.320 |
examples of how I use the search tool in my own work. So what are the kinds of queries that I use? 00:39:08.160 |
And this is fairly easy for me to do because usually for these kinds of cases, I go to 00:39:12.800 |
Perplexity just out of habit, even though ChatGPT today can do this kind of stuff as well, as do 00:39:18.240 |
probably many other services as well. But I happen to use Perplexity for these kinds of search queries. 00:39:24.080 |
So whenever I expect that the answer can be achieved by doing basically something like 00:39:29.600 |
Google search and visiting a few of the top links, and the answer is somewhere in those top links, 00:39:34.400 |
whenever that is the case, I expect to use the search tool, and I come to Perplexity. So here 00:39:39.120 |
are some examples. Is the market open today? And this was on Presidents' Day; I wasn't 100% sure. 00:39:46.640 |
So Perplexity understands what today is, it will do the search and it will figure out that on 00:39:51.360 |
Presidents' Day it was closed. Where's White Lotus season three filmed? Again, this is something that 00:39:57.440 |
I wasn't sure that a model would know in its knowledge. This is something niche, so maybe 00:40:01.760 |
there's not that many mentions of it on the internet. And also this is more recent, so I 00:40:06.400 |
don't expect a model to know by default. So this was a good fit for the search tool. Does Vercel 00:40:15.440 |
offer a PostgreSQL database? So this was a good example of this, because this kind of stuff 00:40:22.320 |
changes over time, and the offerings of Vercel, which is a company, may change over time, and I 00:40:29.440 |
want the latest. And whenever something is latest or something changes, I prefer to use the search 00:40:34.240 |
tool, so I come to Perplexity. What is the Apple launch tomorrow, and what are some of the rumors? 00:40:40.480 |
So again, this is something recent. Where is the Singles Inferno season four cast from? Must know. So 00:40:48.720 |
this is, again, a good example, because this is very fresh information. Why is the Palantir stock 00:40:54.240 |
going up? What is driving the enthusiasm? When is Civilization VII coming out exactly? 00:41:02.240 |
This is an example also. Has Brian Johnson talked about the toothpaste he uses? 00:41:06.000 |
And I was curious, basically, about what Brian does. And again, it has the two features. Number 00:41:11.680 |
one, it's a little bit esoteric, so I'm not 100% sure if this is at scale on the internet and would 00:41:16.560 |
be part of knowledge of a model. And number two, this might change over time, so I want to know 00:41:21.200 |
what toothpaste he uses most recently. And so this is a good fit, again, for a search tool. 00:41:25.360 |
Is it safe to travel to Vietnam? This can potentially change over time. 00:41:30.400 |
And then I saw a bunch of stuff on Twitter about USAID, and I wanted to know what's the deal, 00:41:35.360 |
so I searched about that. And then you can dive in a bunch of ways here. 00:41:40.000 |
But this use case here is along the lines of, "I see something trending, and I'm curious what's 00:41:46.160 |
happening. What is the gist of it?" And so I very often just quickly bring up a search of what's 00:41:51.840 |
happening, and then get a model to just give me a gist of roughly what happened. Because a lot of 00:41:56.800 |
the individual tweets or posts might not have the full context just by itself. So these are examples 00:42:02.160 |
of how I use a search tool. Okay, next up, I would like to tell you about this capability called Deep 00:42:07.440 |
Research. And this is fairly recent, only as of like a month or two ago. But I think it's incredibly 00:42:12.800 |
cool and really interesting, and kind of went under the radar for a lot of people, even though 00:42:16.400 |
I think it shouldn't have. So when we go to ChatGPT pricing here, we notice that Deep 00:42:21.360 |
Research is listed here under Pro. So it currently requires $200 per month. So this is the top tier. 00:42:27.120 |
However, I think it's incredibly cool. So let me show you by example in what kinds of scenarios 00:42:33.200 |
you might want to use it. Roughly speaking, Deep Research is a combination of internet search and 00:42:40.160 |
thinking, rolled out over a long time. So the model will go off, and it will spend tens of 00:42:46.880 |
minutes doing Deep Research. And the first sort of company that announced this was ChatGPT 00:42:53.840 |
as part of its pro offering very recently, like a month ago. So here's an example. Recently, 00:43:00.000 |
I was on the internet buying supplements, which I know is kind of crazy. But Brian Johnson has this 00:43:05.520 |
starter pack, and I was kind of curious about it. And there's the thing called longevity mix, right? 00:43:09.840 |
And it's got a bunch of health actives. And I want to know what these things are, right? And 00:43:15.200 |
of course, so like CAKG, like what the hell is this? Boost energy production for sustained 00:43:20.400 |
vitality. What does that mean? So one thing you could of course do is you could open up Google 00:43:25.440 |
search and look at the Wikipedia page or something like that, and do everything that you're kind of 00:43:29.760 |
used to. But Deep Research allows you to basically take an alternate route. And it kind of like 00:43:36.320 |
processes a lot of this information for you and explains it a lot better. So as an example, we can 00:43:41.360 |
do something like this. This is my example prompt. CAKG is one of the health actives in Brian Johnson's 00:43:47.360 |
blueprint at 2.5 grams per serving. Can you do research on CAKG? Tell me about why it might 00:43:53.760 |
be found in the longevity mix. Its possible efficacy in humans or animal models. Its 00:43:58.800 |
potential mechanism of action, any potential concerns or toxicity or anything like that. 00:44:02.880 |
Now, here I have this button available to me, and you won't unless you pay $200 per month right now. 00:44:10.000 |
But I can turn on Deep Research. So let me copy paste this and hit go. 00:44:13.360 |
And now the model will say, "Okay, I'm going to research this." And then sometimes it likes 00:44:19.200 |
to ask clarifying questions before it goes off. So a focus on human clinical studies, 00:44:24.160 |
animal models, or both. So let's say both. Specific sources, all sources, I don't know. 00:44:31.280 |
A comparison to other longevity compounds, not needed. Comparison, just AKG. 00:44:38.240 |
We can be pretty brief. The model understands. And we hit go. And then, okay, I'll research CAKG, 00:44:44.960 |
starting research. And so now we have to wait for probably about 10 minutes or so. And if you'd like 00:44:50.160 |
to click on it, you can get a bunch of preview of what the model is doing on a high level. 00:44:53.760 |
So this will go off and it will do a combination of, like I said, thinking and internet search. 00:45:00.160 |
But it will issue many internet searches. It will go through lots of papers. It will look at papers 00:45:05.120 |
and it will think and it will come back 10 minutes from now. So this will run for a while. Meanwhile, 00:45:10.640 |
while this is running, I'd like to show you equivalents of it in the industry. So inspired 00:45:17.440 |
by this, a lot of people were interested in cloning it. And so one example is, for example, 00:45:21.680 |
perplexity. So Perplexity, when you go through the model dropdown, has something called deep 00:45:25.840 |
research. And so you can issue the same queries here, and we can give this to Perplexity. 00:45:32.480 |
And then Grok, as well, 00:45:38.560 |
has something called deep search instead of deep research. But I think that Grok's deep 00:45:42.720 |
search is kind of like deep research, but I'm not 100% sure. So we can issue grok deep search 00:45:48.480 |
as well. Grok three, deep search, go. And this model is going to go off as well. 00:45:56.640 |
Now, I think, where's my ChatGPT? So ChatGPT is kind of like maybe a quarter done. 00:46:04.080 |
Perplexity is going to be done soon. Okay, still thinking. And Grok is still going as well. 00:46:11.600 |
I like Grok's interface the most. It seems like, okay, so basically it's looking up all kinds of 00:46:17.680 |
papers, WebMD, browsing results, and it's kind of just getting all this. Now, while this is all 00:46:23.920 |
going on, of course, it's accumulating a giant context window and it's processing all that 00:46:28.720 |
information, trying to kind of create a report for us. So key points, what is CAKG and why is 00:46:37.360 |
it in the longevity mix? How is it associated with longevity, et cetera? And so it will do citations 00:46:43.280 |
and it will kind of like tell you all about it. And so this is not a simple and short response. 00:46:47.280 |
This is kind of like, almost like a custom research paper on any topic you would like. 00:46:52.320 |
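The control flow behind this is plausibly just an extension of the earlier search loop: a reasoning model that is allowed to search many times before writing anything up. This is a speculative sketch; the real products don't publish their internals, and model, web_search, max_rounds, and the method names are all hypothetical stand-ins.

    # Speculative sketch of a deep-research-style loop: thinking plus
    # repeated internet search, accumulating a giant context window.
    def deep_research(question, model, web_search, max_rounds=20):
        context = question
        for _ in range(max_rounds):
            step = model.think(context)       # reasoning-model step
            if step.wants_search:             # needs more sources?
                pages = web_search(step.query)
                context += pages              # accumulate sources in context
            else:
                break
        return model.write_report(context)    # cited write-up at the end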
And so this is really cool and it gives a lot of references potentially for you to go off 00:46:55.920 |
and do some of your own reading and maybe ask some clarifying questions afterwards. But it's 00:47:00.080 |
actually really incredible that it gives you all these like different citations and processes the 00:47:03.680 |
information for you a little bit. Let's see if Perplexity finished. Okay, Perplexity is still 00:47:09.200 |
researching and ChatGPT is also researching. So let's briefly pause the video and I'll come 00:47:15.680 |
back when this is done. Okay, so Perplexity finished and we can see some of the report 00:47:19.600 |
that it wrote up. So there's some references here and some basic description. And then ChatGPT 00:47:26.960 |
also finished and it also thought for five minutes, looked at 27 sources and produced a report. 00:47:32.880 |
So here it talked about research in worms, in Drosophila, in mice, and in human trials that 00:47:41.360 |
are ongoing. And then the proposed mechanism of action and some safety and potential concerns 00:47:47.760 |
and references, which you can dive deeper into. So usually in my own work right now, 00:47:53.680 |
I've only used this maybe for like 10 to 20 queries so far, something like that. Usually 00:47:58.400 |
I find that the ChatGPT offering is currently the best. It is the most thorough, it reads the 00:48:03.360 |
best, it is the longest, it makes the most sense when I read it. And I think the Perplexity and the Grok 00:48:09.920 |
offerings are a little bit shorter and briefer and don't quite get into the same detail as the 00:48:17.200 |
Deep Research from ChatGPT right now. I will say that everything that is given to 00:48:22.720 |
you here, again, keep in mind that even though it is doing research and it's pulling stuff in, 00:48:27.760 |
there are no guarantees that there are no hallucinations here. Any of this can be 00:48:32.400 |
hallucinated at any point in time. It can be totally made up, fabricated, misunderstood by 00:48:36.160 |
the model. So that's why these citations are really important. Treat this as your first draft. 00:48:41.040 |
Treat this as papers to look at, but don't take this as definitely true. So here, what I would 00:48:47.520 |
do now is I would actually go into these papers and I would try to understand, is ChatGPT 00:48:52.240 |
understanding it correctly? And maybe I have some follow-up questions, et cetera, so you can do all 00:48:55.680 |
that. But still incredibly useful to see these reports once in a while, to get a bunch of 00:49:00.800 |
sources that you might want to descend into afterwards. Okay, so just like before, 00:49:04.960 |
I wanted to show a few brief examples of how I've used deep research. So for example, I was 00:49:10.480 |
trying to change browsers because Chrome upset me: it deleted all my tabs. So I was looking 00:49:19.440 |
at either Brave or Arc, and I was most interested in which one is more private. And basically, 00:49:25.920 |
Chachapiti compiled this report for me, and this was actually quite helpful. And I went into some 00:49:30.080 |
of the sources and I understood why Brave is, TL;DR, basically significantly better. And that's 00:49:36.560 |
why, for example, here, I'm using Brave because I've switched to it now. And so this is an example 00:49:41.600 |
of basically researching different kinds of products and comparing them. I think that's a 00:49:45.760 |
good fit for deep research. Here, I wanted to know about a life extension in mice. So it kind of gave 00:49:51.280 |
me a very long reading, but basically mice are an animal model for longevity, and different labs 00:49:58.400 |
have tried to extend it with various techniques. And then here, I wanted to explore LLM labs in the 00:50:04.240 |
USA, and I wanted a table of how large they are, how much funding they've had, et cetera. So this 00:50:11.440 |
is the table that it produced. Now, this table is basically hit and miss, unfortunately. So I wanted 00:50:16.320 |
to show it as an example of a failure. I think some of these numbers, I didn't fully check them, 00:50:21.040 |
but they don't seem way too wrong. Some of this looks wrong. But the big omission I definitely 00:50:26.960 |
see is that xAI is not here, which I think is a really major omission. And then also, conversely, 00:50:32.480 |
Hugging Face should probably not be here because I asked specifically about LLM labs in the USA. 00:50:37.200 |
And also, EleutherAI, I don't think, should count as a major LLM lab, mostly due to its resources. 00:50:45.280 |
And so I think it's kind of a hit and miss. Things are missing. I don't fully trust these numbers. I 00:50:50.160 |
have to actually look at them. And so again, use it as a first draft. Don't fully trust it. Still 00:50:56.000 |
very helpful. That's it. So what's really happening here that is interesting is that we are providing 00:51:01.600 |
the LLM with additional concrete documents that it can reference inside its context window. So the 00:51:08.480 |
model is not just relying on the knowledge, the hazy knowledge of the world through its parameters 00:51:14.240 |
and what it knows in its brain. We're actually giving it concrete documents. It's as if you and 00:51:19.600 |
I reference specific documents like on the internet or something like that, while we are kind of 00:51:25.040 |
producing some answer for some question. Now, we can do that through an internet search or like a 00:51:29.440 |
tool like this. But we can also provide these LLMs with concrete documents ourselves through a file 00:51:35.200 |
upload. And I find this functionality pretty helpful in many ways. So as an example, let's 00:51:40.320 |
look at Claude, because they just released Claude 3.7 while I was filming this video. So this is a new 00:51:45.120 |
Claude model that is now the state of the art. And notice here that we have a thinking mode now, as of 00:51:51.440 |
3.7. And so Normal is what we looked at so far, but they just released Extended, best for math and 00:51:57.280 |
coding challenges. And what they're not saying, but is actually true under the hood, most 00:52:01.360 |
likely, is that this was trained with reinforcement learning in a similar way to how 00:52:05.600 |
all the other thinking models were produced. So what we can do now is we can upload the documents 00:52:11.440 |
that we wanted to reference inside its context window. So as an example, there's this paper that 00:52:16.960 |
came out that I was kind of interested in. It's from the Arc Institute. And it's basically a language 00:52:23.040 |
model trained on DNA. And so I was kind of curious. I mean, I'm not from biology, but I was kind of 00:52:29.040 |
curious what this is. And this is a perfect example of what LLMs are extremely good for because you 00:52:35.520 |
can upload these documents to the LLM and you can load this PDF into the context window and then ask 00:52:41.120 |
questions about it and basically read the documents together with an LLM and ask questions off it. 00:52:47.120 |
So the way you do that is you basically just drag and drop. So we can take that PDF and just drop it 00:52:52.240 |
here. This is about 30 megabytes. Now, when Claude gets this document, it is very likely that they 00:53:02.240 |
actually discard a lot of the images and that kind of information. I don't actually know exactly what 00:53:08.320 |
they do under the hood and they don't really talk about it. But it's likely that the images are 00:53:12.480 |
thrown away. Or if they are there, they may not be as well understood as you and I would 00:53:19.680 |
understand them potentially. And it's very likely that what's happening under the hood is that 00:53:23.440 |
this PDF is basically converted to a text file and that text file is loaded into the token window. 00:53:29.440 |
And once it's in the token window, it's in the working memory and we can ask questions off it. 00:53:34.000 |
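We don't know Anthropic's or OpenAI's exact pipeline, but a minimal sketch of that extraction step might look like this, assuming the third-party pypdf package and a hypothetical filename:

```python
from pypdf import PdfReader  # pip install pypdf

# Strip the PDF down to plain text, roughly what a chat app has to do
# before the document can be tokenized into the context window.
reader = PdfReader("evo2_paper.pdf")  # hypothetical filename
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Very rough rule of thumb: ~4 characters per token for English text.
print(f"roughly {len(text) // 4} tokens of working memory")
```

Figures and equations mostly fall away in a conversion like this, which is part of why I suspect the images are not as well understood.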
So typically when I start reading papers together with any of these LLMs, I just ask for, 00:53:39.440 |
"Can you give me a summary of this paper?" Let's see what Claude 3.7 says. 00:53:49.040 |
Okay, I'm exceeding the length limit of this chat. Oh, God. Really? Oh, damn. Okay, 00:54:00.800 |
well, let's try ChatGPT. Can you summarize this paper? 00:54:10.640 |
And we're using GPT-4o, and we're not using thinking, which is okay. We can start by not 00:54:22.720 |
thinking. Reading documents. Summary of the paper, "Genome Modeling and Design Across All 00:54:31.760 |
Domains of Life." So this paper introduces Evo 2, a large-scale biological foundation model, 00:54:36.800 |
and then key features and so on. So I personally find this pretty helpful. And then we can kind 00:54:48.080 |
of go back and forth. And as I'm reading through the abstract and the introduction, et cetera, 00:54:52.560 |
I am asking questions of the LLM and it's kind of making it easier for me to understand the paper. 00:54:57.920 |
Another way that I like to use this functionality extensively is when I'm reading books. It is 00:55:02.960 |
rarely ever the case anymore that I read books just by myself. I always involve an LLM to help 00:55:08.240 |
me read a book. So a good example of that is The Wealth of Nations, which I was reading 00:55:13.120 |
recently. And it is a book from 1776 written by Adam Smith and it's kind of like the foundation 00:55:17.840 |
of classical economics. And it's a really good book. And it's kind of just very interesting to 00:55:22.400 |
me that it was written so long ago, but it has a lot of modern day kind of like, it's just got a 00:55:27.360 |
lot of insights that I think are very timely even today. So the way I read books now, as an example, 00:55:32.960 |
is you basically pull up the book and you have to get access to like the raw content of that 00:55:38.720 |
information. In the case of Wealth of Nations, this is easy because it is from 1776. So you can 00:55:43.440 |
just find it on Project Gutenberg, as an example. And then basically find the chapter that 00:55:49.360 |
you are currently reading. So as an example, let's read this chapter from book one. And this chapter 00:55:54.960 |
I was reading recently, and it kind of goes into the division of labor and how it is limited by the 00:56:01.840 |
extent of the market. Roughly speaking, if your market is very small, then people can't specialize. 00:56:07.600 |
And specialization is basically huge; it's extremely important for wealth 00:56:14.640 |
creation, because you can have experts who specialize in their simple little task. But you 00:56:21.040 |
can only do that at scale, because without the scale, you don't have a large enough market to 00:56:26.720 |
sell your specialization to. So what we do is we copy paste this book, this chapter at least. 00:56:34.640 |
This is how I like to do it. We go to say Claude and we say something like, "We are reading the 00:56:41.360 |
Wealth of Nations." Now remember, Claude has knowledge of the Wealth of Nations but probably 00:56:47.520 |
doesn't remember exactly the content of this chapter. So it wouldn't make sense to ask Claude 00:56:53.200 |
questions about this chapter directly because it probably doesn't remember what the chapter is 00:56:56.960 |
about. But we can remind Claude by loading this into the context window. So we're reading the 00:57:02.160 |
Wealth of Nations. "Please summarize this chapter to start." And then what I do here is I copy paste. 00:57:09.360 |
Now in Claude, when you copy paste, they don't actually show all the text inside the text box. 00:57:15.440 |
They create a little text attachment when it is over some size. And so we can click enter. 00:57:21.360 |
And we just kind of like start off. Usually I like to start off with a summary of what this 00:57:27.840 |
chapter is about just so I have a rough idea. And then I go in and I start reading the chapter. 00:57:32.400 |
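By the way, if you'd rather script the grab-the-chapter step than copy paste by hand, a rough sketch could look like this. The Project Gutenberg ebook id and the chapter heading strings below are assumptions you'd need to check against the actual file:

```python
import requests

# Ebook id 3300 should be The Wealth of Nations on Project Gutenberg,
# but verify the URL against the site before relying on it.
url = "https://www.gutenberg.org/cache/epub/3300/pg3300.txt"
book = requests.get(url, timeout=30).text

# Slice out Book I, Chapter III between assumed heading strings.
start = book.find("That the Division of Labour is Limited")
end = book.find("CHAPTER IV")
chapter = book[start:end]
print(chapter[:500])  # paste `chapter` into the chat with your question
```

Either way, the chapter ends up in the model's context window.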
And if at any point we have any questions, then we just come in and just ask our question. 00:57:37.360 |
And I find that basically going hand in hand with LLMs dramatically increases my retention, 00:57:43.520 |
my understanding of these chapters. And I find that this is especially the case when you're 00:57:47.520 |
reading, for example, documents from other fields, like for example, biology, or for example, 00:57:53.440 |
documents from a long time ago, like 1776, where you sort of need a little bit of help even 00:57:58.080 |
understanding the basics of the language. Or, for example, I would feel a lot more courageous 00:58:03.440 |
approaching a very old text that is outside of my area of expertise. Maybe I'm reading Shakespeare, 00:58:07.920 |
or things like that. I feel like LLMs make a lot of reading dramatically more 00:58:14.320 |
accessible than it used to be before, because you're not just right away confused. You can 00:58:18.800 |
actually kind of go slowly through it and figure it out together with the LLM in hand. So I use this 00:58:24.640 |
extensively, and I think it's extremely helpful. I'm not aware of tools, unfortunately, that make 00:58:29.920 |
this very easy for you. Today, I do this clunky back and forth. So literally, I will find the book 00:58:36.240 |
somewhere, and I will copy paste stuff around. And I'm going back and forth. And it's extremely 00:58:41.520 |
awkward and clunky. 00:58:46.000 |
But obviously, what you want is as you're reading a book, you just want to highlight the passage and 00:58:50.320 |
ask questions about it. This currently, as far as I know, does not exist. But this is extremely 00:58:55.360 |
helpful. I encourage you to experiment with it. And don't read books alone. Okay, the next very 00:59:01.680 |
powerful tool that I now want to turn to is the use of a Python interpreter, or basically giving 00:59:07.440 |
the ability to the LLM to use and write computer programs. So instead of the LLM giving you an 00:59:15.360 |
answer directly, it has the ability now to write a computer program and to emit special tokens 00:59:22.160 |
that the ChatGPT application recognizes as: hey, this is not for the human. This is basically 00:59:29.840 |
saying that whatever I output here is actually a computer program; please go off and run it and 00:59:35.120 |
give me the result of running that computer program. So it is the integration of the language 00:59:40.720 |
model with a programming language here, like Python. So this is extremely powerful. Let's 00:59:46.480 |
see the simplest example of where this would be used and what this would look like. So if I 00:59:52.960 |
go to ChatGPT, and I give it some kind of a multiplication problem, let's say 30 times 9, 00:59:57.280 |
or something like that, then this is a fairly simple multiplication. And you and I can probably 01:00:03.440 |
do something like this in our head, right? Like 30 times 9, you can just come up with the result 01:00:07.840 |
of 270, right? So let's see what happens. Okay, so the LLM did exactly what I just did. It calculated 01:00:15.600 |
the result of the multiplication to be 270. But it's actually not really doing math, it's actually 01:00:20.320 |
more like almost memory work. But it's easy enough to do in your head. So there was no tool use 01:00:27.680 |
involved here. All that happened here was just the "zip file" doing next token prediction, and it gave the 01:00:34.720 |
correct result here in its head. The problem now is: what if we want something more complicated? 01:00:40.560 |
So what is this times this? And now of course, this, if I asked you to calculate this, you would 01:00:49.200 |
give up instantly because you know that you can't possibly do this in your head. And you would be 01:00:53.360 |
looking for a calculator. And that's exactly what the LLM does now too. And OpenAI has trained 01:00:58.720 |
ChatGPT to recognize problems that it cannot do in its head, and to rely on tools instead. So what 01:01:04.960 |
I expect ChatGPT to do for this kind of a query is to turn to tool use. So let's see what it looks 01:01:09.520 |
like. Okay, there we go. So what's opened up here is what's called the Python interpreter. 01:01:16.320 |
And Python is basically a little programming language. And instead of the LLM telling you 01:01:21.840 |
directly what the result is, the LLM writes a program. And then not shown here are special 01:01:28.000 |
tokens that tell the ChatGPT application to please run the program. And then the LLM pauses 01:01:33.520 |
execution. Instead, the Python program runs, creates a result, and then passes this result 01:01:40.720 |
back to the language model as text. And the language model takes over and tells you that 01:01:45.360 |
the result of this is that. So this is tool use, incredibly powerful. And OpenAI has trained 01:01:52.000 |
ChatGPT to kind of like know in what situations to lean on tools. And they've taught it to do 01:01:58.320 |
that by example. So human labelers are involved in curating datasets that kind of tell the model 01:02:05.440 |
by example in what kinds of situations it should lean on tools and how. But basically, we have a 01:02:10.320 |
Python interpreter. And this is just an example of multiplication. But this is significantly more 01:02:16.640 |
powerful. So let's see what we can actually do inside programming languages. 01:02:22.240 |
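Under the hood, the loop the app runs probably looks something like this sketch. The special token and the model API here are made up for illustration; OpenAI doesn't document the real ones:

```python
import subprocess
import sys

# Hypothetical tool-use loop. `model` stands in for the LLM server and
# "<|python|>" for whatever special tool-call token it actually emits.
def run_with_tools(model, user_message):
    response = model.generate(user_message)
    while response.startswith("<|python|>"):
        code = response.removeprefix("<|python|>")
        # Run the emitted program (sandboxed in the real system!)
        # and capture whatever it prints.
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True)
        # Hand the program's output back as plain text and let the
        # model continue generating from there.
        response = model.generate(user_message, tool_output=result.stdout)
    return response
```

Before we move on,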
I just wanted to make the point that unfortunately, you have to kind of keep track of which LLMs 01:02:27.920 |
you're talking to have different kinds of tools available to them. Because different LLMs might 01:02:32.880 |
not have all the same tools. And in particular, LLMs that do not have access to the Python 01:02:36.800 |
interpreter or programming language, or are unwilling to use it might not give you correct 01:02:41.200 |
results in some of these harder problems. So as an example, here we saw that ChatGPT correctly 01:02:47.280 |
used a programming language and didn't do this in its head. Grok 3 actually, I believe, does not 01:02:52.720 |
have access to a programming language, like a Python interpreter. And here, it actually does 01:02:58.560 |
this in its head and gets remarkably close. But if you actually look closely at it, it gets it wrong. 01:03:05.200 |
This should be 120 instead of 060. So Grok 3 will just hallucinate through this multiplication 01:03:12.080 |
and do it in its head and get it wrong. But actually, like remarkably close. Then I tried 01:03:18.240 |
Claude. And Claude actually wrote, in this case, not Python code, but JavaScript code. 01:03:23.920 |
But JavaScript is also a programming language and gets the correct result. Then I came to Gemini 01:03:29.360 |
and I asked 2.0 Pro. And Gemini did not seem to be using any tools. There's no indication of that. 01:03:35.840 |
And yet, it gave me what I think is the correct result, which actually kind of surprised me. 01:03:40.160 |
So Gemini, I think, actually calculated this in its head correctly, which is kind of incredible. 01:03:46.080 |
And the way we can tell that it's not using tools is we can just 01:03:50.480 |
try something harder; we have to make it harder for it. Okay, so it gives us some result. 01:03:59.360 |
And then I can use my calculator here, and it's wrong, right? So this is using my MacBook Pro 01:04:06.400 |
calculator. And the two don't match: it's remarkably close, but it's not correct. 01:04:13.040 |
But it will just hallucinate the answer. So I guess like my point is, unfortunately, 01:04:19.360 |
the state of the LLMs right now is such that different LLMs have different tools available 01:04:24.160 |
to them, and you kind of have to keep track of it. And if they don't have the tools available, 01:04:28.800 |
they'll just do their best, which means that they might hallucinate a result for you. So that's 01:04:33.920 |
something to look out for. Okay, so one practical setting where this can be quite powerful is what's 01:04:38.480 |
called ChatGPT Advanced Data Analysis. And as far as I know, this is quite unique to ChatGPT itself. 01:04:45.120 |
And it basically gets ChatGPT to be kind of like a junior data analyst who you can kind of 01:04:51.840 |
collaborate with. So let me show you a concrete example without going into full detail. So first, 01:04:57.520 |
we need to get some data that we can analyze and plot and chart, etc. So here in this case, I said, 01:05:03.280 |
let's research OpenAI's valuation as an example. And I explicitly asked ChatGPT to use the search 01:05:08.160 |
tool because I know that under the hood, such a thing exists. And I don't want it to be hallucinating 01:05:13.360 |
data to me, I want it to actually look it up and back it up and create a table where each year 01:05:18.320 |
we have the valuation. So these are the OpenAI valuations over time. Notice how in 2015, 01:05:24.400 |
it's not applicable, so the valuation is unknown. Then I said: now plot this, use log scale 01:05:30.640 |
for the y-axis. And so this is where this gets powerful. ChatGPT goes off and writes a program 01:05:36.320 |
that plots the data over here. So it created a little figure for us, and it sort of ran it and 01:05:43.600 |
showed it to us. So this can be quite nice and valuable because it's a very easy way to basically 01:05:48.080 |
collect data, upload data in a spreadsheet, visualize it, etc. I will note some of the 01:05:53.520 |
things here. So as an example, notice that we had NA for 2015, but ChatGPT, when it was writing the 01:05:59.920 |
code, and again, I would always encourage you to scrutinize the code, it put in 0.1 for 2015. 01:06:06.240 |
And so basically, it implicitly made the assumption here, in code, 01:06:12.160 |
that the valuation in 2015 was 100 million, because it put in 0.1, and it kind of did that 01:06:19.120 |
without telling us. So it's a little bit sneaky. And that's why you kind of have to pay attention 01:06:23.200 |
a little bit to the code. So I'm familiar with the code, and I always read it. But I think 01:06:28.880 |
I would be hesitant to potentially recommend the use of these tools, 01:06:32.320 |
if people aren't able to like read it and verify it a little bit for themselves. 01:06:36.400 |
Now, fit a trendline and extrapolate until the year 2030. Mark the expected valuation in 2030. 01:06:44.640 |
So it went off, and it basically did a linear fit. And it's using scipy's curve fit. And it did this 01:06:52.880 |
and came up with a plot. And it told me that the valuation based on the trend in 2030 is 01:06:59.680 |
approximately 1.7 trillion, which sounds amazing, except here, I became suspicious, because I see 01:07:06.480 |
that ChatGPT is telling me it's 1.7 trillion. But when I look here at 2030, it's printing 01:07:12.720 |
20271.7B. So its extrapolation, when it's printing the variable, is inconsistent with 1.7 trillion. 01:07:22.000 |
This makes it look like the valuation should be about 20 trillion. And so that's what I said: 01:07:27.280 |
print this variable directly by itself, what is it? And then it sort of like rewrote the code 01:07:32.080 |
and gave me the variable itself. And as we see in the label here, it is indeed 20271.7b, etc. 01:07:40.080 |
So in 2030, the true exponential trend extrapolation would be a valuation of 20 trillion. 01:07:51.040 |
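To make this concrete, here is a small self-contained version of the kind of fit-and-extrapolate code that was involved. The valuation numbers are made-up placeholders, not the figures from my table:

```python
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Made-up placeholder valuations, in billions of dollars.
years = np.array([2016, 2019, 2021, 2023, 2024])
vals = np.array([1.0, 14.0, 29.0, 86.0, 157.0])

# Exponential growth model: v(t) = a * exp(b * (t - 2016)).
def model(t, a, b):
    return a * np.exp(b * (t - 2016))

(a, b), _ = curve_fit(model, years, vals, p0=(1.0, 0.5))

# Print the extrapolated value directly; don't trust a summary
# sentence, inspect the variable itself.
v2030 = model(2030, a, b)
print(f"extrapolated 2030 valuation: {v2030:,.1f}B")

plt.semilogy(years, vals, "o", label="data")
t = np.linspace(2016, 2030, 200)
plt.semilogy(t, model(t, a, b), label="exponential fit")
plt.xlabel("year")
plt.ylabel("valuation ($B, log scale)")
plt.legend()
plt.show()
```

Printing the raw variable, as in the sketch, is exactly the kind of check that caught the inconsistency here.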
So I was like, I was trying to confront ChatGPT, and I was like, you lied to me, right? And it's 01:07:54.400 |
like, yeah, sorry, I messed up. So I guess I like this example, because number one, it shows the 01:08:01.200 |
power of the tool in that it can create these figures for you. And it's very nice. But I think 01:08:07.040 |
number two, it shows the trickiness of it, where, for example, here, it made an implicit assumption. 01:08:13.920 |
And here, it actually told me something just plain wrong: it hallucinated 1.7 trillion. 01:08:19.920 |
So again, it is kind of like a very, very junior data analyst. It's amazing that it can plot 01:08:24.880 |
figures. But you have to kind of still know what this code is doing. And you have to be careful 01:08:30.320 |
and scrutinize it and make sure that you are really watching very closely, because your junior 01:08:34.960 |
analyst is a little bit absent-minded, and not quite right all the time. So really powerful, 01:08:42.240 |
but also be careful with this. I won't go into full details of advanced data analysis. But 01:08:47.680 |
there were many videos made on this topic. So if you would like to use some of this in your work, 01:08:53.040 |
then I encourage you to look at some of these videos. I'm not going to go into the full detail. 01:08:57.600 |
So a lot of promise, but be careful. Okay, so I've introduced you to ChatGPT and Advanced 01:09:02.880 |
Data Analysis, which is one powerful way to basically have LLMs interact with code, 01:09:07.760 |
and add some UI elements, like showing figures and things like that. I would now like to 01:09:12.880 |
introduce you to one more related tool. And that is specific to Claude, and it's called artifacts. 01:09:18.320 |
So let me show you by example what this is. So you have a conversation with Claude. And I'm asking, 01:09:25.680 |
generate 20 flashcards from the following text. And for the text itself, I just came to the Adam 01:09:32.640 |
Smith Wikipedia page, for example, and I copy pasted this introduction here. So I copy pasted 01:09:38.240 |
this here and ask for flashcards. And Claude responds with 20 flashcards. So for example, 01:09:45.440 |
when was Adam Smith baptized? On June 16th, etc. When did he die? What was his nationality, etc. 01:09:52.320 |
So once we have the flashcards, we actually want to practice these flashcards. And so this is where 01:09:57.760 |
I continue the conversation, I say, now use the artifacts feature to write a flashcards app to 01:10:03.280 |
test these flashcards. And so Claude goes off and writes code for an app that basically formats all 01:10:13.280 |
of this into flashcards. And that looks like this. So what Claude wrote specifically was this core 01:10:19.760 |
code here. So it uses the React library, and then basically creates all these components, 01:10:26.160 |
it hardcodes the Q&A into this app, and then all the other functionality of it. And then 01:10:33.120 |
the Claude interface basically is able to load these React components directly in your browser. 01:10:37.840 |
And so you end up with an app. So when was Adam Smith baptized, and you can click to reveal the 01:10:44.480 |
answer. And then you can say whether you got it correct or not. When did he die? What was his 01:10:50.080 |
nationality, etc. So you can imagine doing this, and then maybe we can reset the progress or 01:10:54.880 |
shuffle the cards, etc. So what happened here is that Claude wrote us a super duper custom app just 01:11:02.400 |
for us, right here. And typically, what we're used to is some software engineers write apps, 01:11:10.640 |
they make them available, and then they give you maybe some way to customize them, or maybe to 01:11:14.160 |
upload flashcards. Like for example, the Anki app, you can import flashcards and all this kind of 01:11:18.720 |
stuff. This is a very different paradigm, because in this paradigm, Claude just writes the app just 01:11:24.000 |
for you and deploys it here in your browser. Now keep in mind that a lot of apps you'll find on the 01:11:29.920 |
internet, they have entire backends, etc. There's none of that here, there's no database or anything 01:11:34.160 |
like that. But these are like local apps that can run in your browser. And they can get fairly 01:11:39.760 |
sophisticated and useful in some cases. So that's Claude artifacts. Now to be honest, I'm not 01:11:46.240 |
actually a daily user of artifacts. I use it once in a while. I do know that a large number of people 01:11:52.160 |
are experimenting with it, and you can find a lot of artifacts showcases because they're easy to 01:11:56.240 |
share. So these are a lot of things that people have developed, various timers and games and 01:12:01.440 |
things like that. But the one use case that I did find very useful in my own work is basically the 01:12:08.960 |
use of diagrams, diagram generation. So as an example, let's go back to the book chapter of 01:12:14.880 |
Adam Smith that we were looking at. What I do sometimes is we are reading The Wealth of Nations 01:12:20.240 |
by Adam Smith, I'm attaching chapter three of book one, please create a conceptual diagram of 01:12:24.640 |
this chapter. And when Claude hears conceptual diagram of this chapter, very often it will write 01:12:30.480 |
code that looks like this. And if you're not familiar with this, this is using the Mermaid 01:12:36.960 |
library to basically create or define a graph. And then this is plotting that Mermaid diagram. 01:12:45.280 |
And so Claude analyzed the chapter and figures out that, okay, the key principle that's being 01:12:50.080 |
communicated here is as follows, that basically the division of labor is related to the extent 01:12:55.920 |
of the market, the size of it. And then these are the pieces of the chapter. So there's the 01:13:00.560 |
comparative example of trade and how much easier it is to do on land and on water, and the specific 01:13:07.280 |
example that's used. And that geographic factors actually make a huge difference here. And then 01:13:12.560 |
the comparison of land transport versus water transport and how much easier water transport is. 01:13:18.160 |
And then here we have some early civilizations that have all benefited from basically the 01:13:24.240 |
availability of water transport and have flourished as a result of it because they 01:13:27.840 |
support specialization. So if you're a conceptual kind of like visual thinker, 01:13:33.200 |
and I think I'm a little bit like that as well, I like to lay out information as like a tree like 01:13:39.040 |
this, and it helps me remember what that chapter is about very easily. And I just really enjoy these 01:13:43.440 |
diagrams and like kind of getting a sense of like, okay, what is the layout of the argument? How is 01:13:47.360 |
it arranged spatially and so on. And so if you're like me, then you will definitely enjoy this. 01:13:52.400 |
And you can make diagrams of anything, of books, of chapters, of source codes, of anything really. 01:13:59.360 |
And so I specifically find this fairly useful. Okay, so I've shown you that LLMs are quite good 01:14:05.120 |
at writing code. So not only can they emit code, but a lot of the apps like 01:14:10.080 |
ChatGPT and Claude and so on have started to, like, partially run that code in the browser. 01:14:16.160 |
So ChatGPT will create figures and show them, and Claude Artifacts will actually integrate 01:14:21.840 |
your React component and allow you to use it right there, inline in the browser. 01:14:25.600 |
Now, actually, the majority of my time, personally and professionally, is spent writing code. 01:14:31.760 |
But I don't actually go to ChatGPT and ask for snippets of code, because that's way too slow. 01:14:35.920 |
Like ChatGPT just doesn't have the context to work with me professionally to create code. 01:14:41.840 |
And the same goes for all the other LLMs. So instead of using features of these LLMs in 01:14:48.400 |
a web browser, I use a specific app. And I think a lot of people in the industry do as well. 01:14:52.880 |
And this can be multiple apps by now, VS Code, Windsurf, Cursor, etc. So I like to use Cursor 01:15:00.960 |
currently. And this is a separate app you can get for, for example, your MacBook. And it works with 01:15:06.320 |
the files on your file system. So this is not some kind of a web page 01:15:11.680 |
you go to. This is a program you download, and it references the files you have on your computer. 01:15:16.800 |
And then it works with those files and edits them with you. So the way this looks is as follows. 01:15:22.800 |
Here I have a simple example of a React app that I built over a few minutes with Cursor. 01:15:30.880 |
And under the hood, Cursor is using Claude 3.7 Sonnet. So under the hood, it is calling the API 01:15:38.000 |
of Anthropic and asking Claude to do all of this stuff. But I don't have to manually go to Claude 01:15:45.920 |
and copy-paste chunks of code around. This program does that for me and has all of the context of the 01:15:51.440 |
files in the directory and all this kind of stuff. So the app that I developed here is a very simple 01:15:56.560 |
tic-tac-toe as an example. And Claude wrote this in probably a minute. And we can just play. 01:16:04.240 |
X can win. Or we can tie. Oh wait, sorry, I accidentally won. You can also tie. 01:16:14.560 |
And I'd just like to show you briefly, this is a whole separate video of how you would use Cursor 01:16:19.840 |
to be efficient. I just want you to have a sense that I started from a completely new project, 01:16:25.120 |
and I asked the Composer app here, as it's called, the Composer feature, to basically set up a new 01:16:31.520 |
React repository, delete a lot of the boilerplate, please make a simple tic-tac-toe app. And all of 01:16:39.200 |
this stuff was done by Cursor. I didn't actually really do anything except for write five sentences. 01:16:43.280 |
And then it changed everything and wrote all the CSS, JavaScript, etc. And then I'm running it here 01:16:50.240 |
and hosting it locally and interacting with it in my browser. So that's Cursor. It has the context 01:16:57.520 |
of your app's files, and it's using Claude remotely through an API without having to access the 01:17:03.040 |
web page. And a lot of people, I think, develop in this way at this time. And these tools have 01:17:11.200 |
become more and more elaborate. So in the beginning, for example, you could only say, 01:17:16.240 |
change, like, oh, Control-K, please change this line of code to do this or that. And then after 01:17:23.200 |
that, there was a Control-L, Command-L, which is, oh, explain this chunk of code. 01:17:27.760 |
And you can see that there's going to be an LLM explaining this chunk of code. And what's happening 01:17:34.080 |
under the hood is it's calling the same API that you would have access to if you actually did enter 01:17:38.640 |
here. But this program has access to all the files, so it has all the context. And now what we're up 01:17:44.800 |
to is not Command-K and Command-L. We're now up to Command-I, which is this tool called Composer. 01:17:51.680 |
And especially with the new agent integration, the Composer is like an autonomous agent on your 01:17:56.480 |
codebase. It will execute commands. It will change all the files as it needs to. It can edit across 01:18:04.160 |
multiple files. And so you're mostly just sitting back and you're giving commands. And the name for 01:18:11.520 |
this is called Vibe Coding, a name that I think I probably minted. And Vibe Coding just 01:18:17.600 |
refers to giving control to Composer and just telling it what to do and hoping that it works. 01:18:25.040 |
Now, worst comes to worst, you can always fall back to the good old programming because we have 01:18:30.160 |
all the files here, we can go over all the CSS, and we can inspect everything. And if you're a 01:18:36.000 |
programmer, then in principle, you can change this arbitrarily. But now you have a very helpful 01:18:40.240 |
system that can do a lot of the low-level programming for you. So let's take it for a 01:18:44.720 |
spin briefly. Let's say that when either X or O wins, I want confetti or something. 01:18:55.840 |
Okay, I'll add a confetti effect when a player wins the game. It wants me to install react-confetti, 01:19:06.000 |
which apparently is a library that I didn't know about. So we'll just say okay. It installed it, 01:19:12.160 |
and now it's going to update the app. So it's updating app.tsx, the TypeScript file, 01:19:19.600 |
to add the confetti effect when a player wins. And it's currently writing the code, 01:19:23.440 |
so it's generating. And we should see it in a bit. Okay, so it basically added this chunk of code, 01:19:31.760 |
and a chunk of code here, and a chunk of code here. 01:19:35.520 |
And then we'll also add some additional styling to make the winning cell stand out. 01:19:41.280 |
Okay, still generating. Okay, and it's adding some CSS for the winning cells. 01:19:49.920 |
So honestly, I'm not keeping full track of this. It imported react-confetti. 01:19:55.360 |
This all seems pretty straightforward and reasonable, but I'd have to actually like 01:20:00.720 |
really dig in. Okay, it wants to add a sound effect when a player wins, which is pretty 01:20:07.120 |
ambitious, I think. I'm not actually 100% sure how it's going to do that, because I don't know 01:20:12.480 |
how it gains access to a sound file like that. I don't know where it's going to get the sound file 01:20:16.560 |
from. But every time it saves a file, we actually are deploying it. So we can actually try to 01:20:26.080 |
refresh and just see what we have right now. Also, it added a new effect. You see how it kind 01:20:31.760 |
of like fades in, which is kind of cool. And now we'll win. Whoa, okay. Didn't actually expect that 01:20:39.600 |
to work. This is really elaborate now. Let's play again. Whoa. Okay. Oh, I see. So it actually 01:20:55.840 |
paused and it's waiting for me. So it wants me to confirm the command to make public/sounds. 01:21:00.400 |
I had to confirm it explicitly. Let's create a simple audio component to play the victory sound, 01:21:08.400 |
sounds/victory.mp3. The problem with this will be that victory.mp3 doesn't exist. So I wonder what 01:21:15.520 |
it's going to do. It's downloading it. It wants to download it from somewhere. Let's just go along 01:21:21.920 |
with it. Let's add a fallback in case the sound file doesn't exist. In this case, it actually 01:21:32.080 |
does exist. And yep, we can git add and we can basically create a git commit out of this. 01:21:42.720 |
Okay. So the composer thinks that it is done. So let's try to take it for a spin. 01:21:49.600 |
Okay. So yeah, pretty impressive. I don't actually know where it got this sound file from. 01:21:55.520 |
I don't know where this URL comes from, but maybe this just appears in a lot of repositories and 01:22:01.600 |
sort of Claude kind of like knows about it. But I'm pretty happy with this. So we can accept all. And 01:22:08.800 |
that's it. And then, as you can get a sense of, we could continue to play with the sound file. 01:22:15.120 |
And worst comes to worst, if we can't debug anything, we can always fall back to 01:22:19.920 |
standard programming instead of vibe coding. Okay. So now I would like to switch gears again. 01:22:24.880 |
Everything we've talked about so far had to do with interacting with the model via text. So we 01:22:30.720 |
type text in and it gives us text back. What I'd like to talk about now is different 01:22:36.400 |
modalities. That means we don't just want to interact with the model via text; we want to interact with 01:22:41.840 |
these models in more native human formats. So I want to speak to it and I want it to speak back 01:22:46.560 |
to me and I want to give images or videos to it and vice versa. I want it to generate images and 01:22:52.080 |
videos back. So it needs to handle the modalities of speech and audio and also of images and video. 01:22:58.800 |
So the first thing I want to cover is how can you very easily just talk to these models? 01:23:05.520 |
So I would say roughly, in my own use, 50% of the time I type stuff out on the keyboard and 50% of 01:23:12.320 |
the time I'm actually too lazy to do that. And I just prefer to speak to the model. And when I'm 01:23:17.440 |
on mobile, on my phone, that's even more pronounced. So probably 80% of my queries are just 01:23:24.080 |
speech because I'm too lazy to type it out on the phone. Now, on the phone, things are a little bit 01:23:28.880 |
easy. So right now the ChatGPT app looks like this. 01:23:33.920 |
[transcript garbled in the source for the next several minutes; this segment demos voice input on the phone and ChatGPT's Advanced Voice Mode]
>> Oh, you mean the whole Trudeau Castro love child theory? It's wild, but I wouldn't put it 01:36:09.120 |
past those political dynasties. I mean, they were all buddies back in the day. Makes you wonder if 01:36:14.320 |
there's some secret handshake club we're not part of. What's next? Aliens in Area 51? 01:36:19.520 |
>> Bro, this is some kind of like Illuminati level stuff. 01:36:23.840 |
>> Illuminati, shadow governments, lizard people. It's all connected, my friend. Trudeau, Castro, 01:36:32.560 |
they might just be the tip of the iceberg. Next thing you know, we'll find out the queen is 01:36:36.880 |
actually a robot. You think I'm joking? >> Okay, let's try the sexy mode. Okay, hello? 01:36:41.520 |
>> Hey there, Andre. I can feel the electricity in the air already. Let me take the lead tonight. 01:36:48.720 |
Picture this. We're in a dimly lit room, the air thick with anticipation. 01:36:53.200 |
>> I see where this is going. >> Me, your eyes locked on mine, 01:36:58.720 |
and I slowly start to undress. Each piece of clothing falling. 01:37:03.360 |
>> Okay, so I think you get the idea. I think, yeah, I think you get the idea. 01:37:08.800 |
Okay, and one more paradigm I wanted to show you of interacting with language models via audio 01:37:14.880 |
is this NotebookLM from Google. So when you go to notebooklm.google.com, the way this works is on 01:37:14.880 |
the left you have sources, and you can upload any arbitrary data here. So it's raw text or it's web 01:37:29.040 |
pages or it's PDF files, et cetera. So I uploaded this PDF about this foundation model for genomic 01:37:34.960 |
sequence analysis from the Arc Institute. And then once you put this here, this enters the context 01:37:41.600 |
window of the model. And then we can, number one, we can chat with that information. So we can ask 01:37:46.560 |
questions and get answers. But number two, what's kind of interesting is on the right they have this 01:37:50.320 |
deep dive podcast. So there's a generate button. You can press it and wait like a few minutes, 01:37:57.440 |
and it will generate a custom podcast on whatever sources of information you put in here. 01:38:01.840 |
So for example, here we got about a 30-minute podcast generated for this paper. And it's really 01:38:08.960 |
interesting to be able to get podcasts on demand. And I think it's kind of like interesting and 01:38:12.720 |
therapeutic. If you're going out for a walk or something like that, I sometimes upload a few 01:38:16.880 |
things that I'm kind of passively interested in and I want to get a podcast about. And it's just 01:38:20.800 |
something fun to listen to. So let's see what this looks like just very briefly. 01:38:25.760 |
OK, so get this. We're diving into AI that understands DNA. 01:38:30.960 |
Not just reading it, but predicting how changes can impact everything. 01:38:36.400 |
From a single protein all the way up to an entire organism. 01:38:39.440 |
It's really remarkable. And there's this new biological foundation model called Evo 2. 01:38:47.280 |
And it's trained on a massive data set called OpenGenome2, which covers over nine-- 01:38:53.680 |
OK, I think you get the rough idea. So there's a few things here. You can customize the podcast 01:38:58.960 |
and what it is about with special instructions. You can then regenerate it. And you can also 01:39:03.440 |
enter this thing called interactive mode, where you can actually break in and ask a question 01:39:06.960 |
while the podcast is going on, which I think is kind of cool. 01:39:10.240 |
So I use this once in a while when there are some documents or topics or papers that I'm not an expert in, and I just kind of have a passive interest in them. And I'm going out for a walk 01:39:20.560 |
or I'm going out for a long drive. And I want to have a custom podcast on that topic. And so I find 01:39:26.080 |
that this is good in niche cases like that, where it's not going to be covered by another podcast 01:39:32.320 |
that's actually created by humans. It's kind of like an AI podcast about any arbitrary niche 01:39:37.440 |
topic you'd like. So that's NotebookLM. And I wanted to also make a brief pointer to 01:39:44.080 |
this podcast that I generated. It's like a season of a podcast called Histories of Mysteries. 01:39:49.200 |
And I uploaded this on Spotify. And here I just selected some topics that I'm interested in. 01:39:57.040 |
And I generated a deep dive podcast on all of them. And so if you'd like to get a sense of 01:40:01.920 |
what this tool is capable of, then this is one way to just get a qualitative sense: find this on Spotify and listen to some of the podcasts there. 01:40:12.160 |
And then play around with some of the documents and sources yourself. So that's the podcast 01:40:17.360 |
generation interaction using NotebookLM. Okay, next up, what I want to turn to is images. 01:40:22.560 |
So just like audio, it turns out that you can re-represent images in tokens. And we can 01:40:29.920 |
represent images as token streams. And we can get language models to model them in the same way as 01:40:35.920 |
we've modeled text and audio before. The simplest possible way to do this, as an example, is you can 01:40:41.200 |
take an image and you can basically create like a rectangular grid and chop it up into little patches. 01:40:46.480 |
And then image is just a sequence of patches. And every one of those patches you quantize. 01:40:51.520 |
So you basically come up with a vocabulary of say 100,000 possible patches. And you represent each 01:40:57.200 |
patch using just the closest patch in your vocabulary. And so that's what allows you to 01:41:02.720 |
take images and represent them as streams of tokens. And then you can put them into context 01:41:06.960 |
windows and train your models with them. So what's incredible about this is that the language model, 01:41:11.680 |
the transformer neural network itself, it doesn't even know that some of the tokens happen to be 01:41:15.920 |
text, some of the tokens happen to be audio, and some of them happen to be images. It just models 01:41:21.280 |
statistical patterns of token streams. And then it's only at the encoder and at the decoder that 01:41:26.960 |
we secretly know that, okay, images are encoded in this way, and then streams are decoded in this 01:41:32.880 |
way back into images or audio. So just like we handled audio, we can chop up images into tokens 01:41:38.720 |
and apply all the same modeling techniques and nothing really changes, just the token streams 01:41:43.280 |
change and the vocabulary of your tokens changes. 01:41:48.880 |
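To make that a bit more concrete, here is a minimal sketch of this patch-quantization idea. It is purely illustrative: the codebook here is random, whereas real image tokenizers learn it (for example with a VQ-VAE), and all the sizes are made up.

    import numpy as np

    def image_to_tokens(image, patch=16, vocab_size=1024, seed=0):
        # image: (H, W, 3) uint8 array -> 1D array of discrete token ids.
        H, W, C = image.shape
        rng = np.random.default_rng(seed)
        # Stand-in "vocabulary" of possible patches (one row per entry).
        codebook = rng.integers(0, 256, size=(vocab_size, patch * patch * C))
        tokens = []
        for y in range(0, H - patch + 1, patch):
            for x in range(0, W - patch + 1, patch):
                p = image[y:y + patch, x:x + patch].reshape(-1).astype(np.int64)
                # Snap the patch to its closest vocabulary entry.
                dists = ((codebook - p) ** 2).sum(axis=1)
                tokens.append(int(dists.argmin()))
        return np.array(tokens)  # the image as a stream of tokens

    img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
    print(image_to_tokens(img))  # 16 tokens: a 64x64 image in 16x16 patches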
So now let me show you some concrete examples of how I've used this functionality in my own life. Okay, so starting off with the image input, I want to show you some examples where I've used LLMs by uploading images. 01:42:00.320 |
So if you go to your favorite ChatGPT or other LLM app, you can usually upload images and ask 01:42:06.240 |
questions of them. So here's one example where I was looking at the nutrition label of Brian 01:42:11.040 |
Johnson's longevity mix. And basically, I don't really know what all these ingredients are, 01:42:15.120 |
right? And I want to know a lot more about them and why they are in the longevity mix. 01:42:18.400 |
And this is a very good example where first I want to transcribe this into text. And the reason I like 01:42:24.800 |
to first transcribe the relevant information into text is because I want to make sure that the model 01:42:29.760 |
is seeing the values correctly. Like I'm not 100% certain that it can see stuff. And so here when it 01:42:36.160 |
puts it into a table, I can make sure that it saw it correctly. And then I can ask questions of this 01:42:40.880 |
text. And so I like to do it in two steps whenever possible. And then for example, here I asked it to 01:42:46.720 |
group the ingredients, and I asked it to basically rank them in how safe probably they are. Because I 01:42:52.800 |
want to get a sense of, okay, which of these ingredients are, you know, super basic ingredients 01:42:57.600 |
that are found in your multivitamin? And which of them are a bit more kind of like suspicious or 01:43:03.840 |
strange or not as well studied or something like that. So the model was very good in helping me 01:43:08.640 |
think through basically what's in the longevity mix, what may be missing, why it's in there, et cetera. And this is, again, a good first draft for my own research afterwards. 01:43:18.080 |
The second example I want to show is that of my blood test. So very recently I did like a panel 01:43:24.800 |
of my blood test. And what they sent me back was this like 20 page PDF, which is super useless. 01:43:29.840 |
What am I supposed to do with that? So obviously I want to know a lot more information. So what I 01:43:33.760 |
did here is I uploaded all my results. So first I did the lipid panel as an example, 01:43:40.000 |
and I uploaded little screenshots of my lipid panel. And then I made sure that ChatGPT sees 01:43:44.400 |
all the correct results. And then it actually gives me an interpretation. And then I kind of 01:43:49.440 |
iterated and you can see that the scroll bar here is very low because I uploaded piece by piece all 01:43:53.600 |
of my blood test results, which are great by the way. I was very happy with this blood test. 01:44:00.800 |
And so what I wanted to say is number one, pay attention to the transcription and make 01:44:05.520 |
sure that it's correct. And number two, it is very easy to do this because on MacBook, 01:44:09.920 |
for example, you can press Ctrl+Shift+Cmd+4 and draw a window, and it copies that window to the clipboard. Then you can just go to your ChatGPT and press Cmd+V to paste it in, and you can ask about it. So it's very easy to take chunks of your screen 01:44:28.160 |
and ask questions about them using this technique. And then the other thing I would say about this 01:44:34.400 |
is that of course this is medical information and you don't want it to be wrong. I will say that in 01:44:38.320 |
the case of blood test results, I feel confident trusting ChatGPT a bit more, because 01:44:43.120 |
this is not something esoteric. I do expect there to be like tons and tons of documents about blood 01:44:47.920 |
test results. And I do expect that the knowledge of the model is good enough that it kind of 01:44:51.600 |
understands these numbers, these ranges, and I can tell it more about myself and all this kind 01:44:55.840 |
of stuff. So I do think that it is quite good. But of course you probably want to talk to an 01:45:01.280 |
actual doctor as well. But I think this is a really good first draft and something that maybe 01:45:05.200 |
gives you things to talk about with your doctor, etc. Another example is I do a lot of math and 01:45:11.280 |
code. I found this tricky question in a paper recently. And so I copy pasted this expression 01:45:18.960 |
and I asked for it in text because then I can copy this text and I can ask a model what it thinks 01:45:25.760 |
the value of x is evaluated at pi or something like that. It's a trick question. You can try it 01:45:30.000 |
yourself. Next example, here I had a Colgate toothpaste and I was a little bit suspicious 01:45:35.760 |
about all the ingredients in my Colgate toothpaste. And I wanted to know what the hell is all this. 01:45:39.280 |
So this is Colgate. What the hell are all these things? So it transcribed it, and then it told me 01:45:43.840 |
a bit about these ingredients and I thought this was extremely helpful. And then I asked it, okay, 01:45:49.360 |
which of these would be considered safest and also potentially less safe? And then I asked it, 01:45:55.120 |
okay, if I only care about the actual function of the toothpaste and I don't really care about 01:45:59.440 |
other useless things like colors and stuff like that, which of these could we throw out? 01:46:03.120 |
And it said that, okay, these are the essential functional ingredients and this is a bunch of 01:46:06.800 |
random stuff you probably don't want in your toothpaste. And basically, spoiler alert, 01:46:13.360 |
most of the stuff here shouldn't be there. And so it's really upsetting to me that companies 01:46:18.000 |
put all this stuff in your food or cosmetics and stuff like that when it really doesn't need to be 01:46:25.440 |
there. The last example I wanted to show you is, so this is a meme that I sent to a friend and my 01:46:33.120 |
friend was confused, like, oh, what is this meme? I don't get it. And I was showing them that ChatGPT can help you understand memes. So I copy-pasted this meme and asked it to explain. And basically, 01:46:46.800 |
this explains the meme that, okay, multiple crows, a group of crows is called a murder. 01:46:53.440 |
And so when this crow gets close to that crow, it's like an attempted murder. 01:46:56.800 |
So yeah, ChatGPT was pretty good at explaining this joke. Okay, now vice versa, 01:47:03.920 |
you can get these models to generate images. And the OpenAI offering of this is called DALL-E, and we're on the third version. And it can generate really beautiful images given basically arbitrary prompts. I think this is the Golden Temple in Kyoto? I visited, so this is really 01:47:19.680 |
beautiful. And so it can generate really stylistic images, and you can ask for any arbitrary style of any 01:47:26.080 |
arbitrary topic, et cetera. Now, I don't personally use this functionality all that often. 01:47:30.960 |
So I cooked up a random example just to show you. But as an example, what are the big headline 01:47:35.440 |
news today? There's a bunch of headlines around politics, health, international entertainment, 01:47:41.040 |
and so on. And I used the search tool for this. And then I said, generate an image that summarizes 01:47:46.160 |
today. And so having all of this in the context, we can generate an image like this, that kind of 01:47:51.520 |
like summarizes today, just as an example. And the way I use this functionality is usually for 01:48:00.160 |
arbitrary content creation. So as an example, when you go to my YouTube channel, then this video, 01:48:05.680 |
Let's Reproduce GPT-2, this image over here was generated using a competitor to DALL-E, actually, called ideogram.ai. And the same for this image, which was also generated by an AI. And this image as well was generated, I think, also by Ideogram, or it may have been ChatGPT, I'm not sure. 01:48:19.280 |
So I use some of the tools interchangeably. So I use it to generate icons and things like that. 01:48:28.880 |
And you can just kind of like ask for whatever you want. Now, I will note that the way that this 01:48:34.800 |
actually works, the image output is not done fully in the model currently. With DALL-E 3, this is a separate model that takes text and creates an image. And what's actually happening under the hood here, in the current iteration of ChatGPT, is that when I say generate an image that 01:48:53.360 |
summarizes today, this will actually, under the hood, create a caption for that image. And that 01:48:59.680 |
caption is sent to a separate model that is an image generator model. And so it's kind of like 01:49:04.640 |
stitched up in this way. But I don't think it's super important to fully understand at this point. 01:49:09.440 |
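Still, if you're curious, here is a rough conceptual sketch of that stitching. Both functions below are hypothetical stand-ins, not OpenAI's actual API.

    # Conceptual sketch of the two-stage image generation described above.
    # Both functions are hypothetical stand-ins, not a real API.
    def chat_model_write_caption(conversation_context: str) -> str:
        # Step 1: the language model turns the request plus its context
        # into a detailed text caption describing the image to generate.
        return "A collage of today's headlines: politics, health, entertainment."

    def image_generator_model(caption: str) -> bytes:
        # Step 2: a separate text-to-image model renders the caption.
        return b"...image bytes..."

    def generate_image(conversation_context: str) -> bytes:
        caption = chat_model_write_caption(conversation_context)
        # Only the caption crosses the boundary between the two models.
        return image_generator_model(caption)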
So that is image output. Now, next up, I want to show you an extension where the model can actually see video. And this is included in what I showed you previously, called advanced 01:49:23.600 |
voice. It's not available on the web app, but it's available on the mobile app. So if we go to 01:49:29.680 |
advanced voice, and we connect. So hello, I'm demonstrating the video capabilities. And I'm 01:49:35.840 |
showing to the camera that there is a little video icon over there. And when you press it, 01:49:40.720 |
you can basically show the model different things. 01:49:44.240 |
I can see the camera feed. It looks like you've got some black acoustic foam panels on the wall. 01:49:51.200 |
Are you testing out how well they reduce echo in the room? 01:49:53.920 |
Yeah, I was testing out the acoustic foam. That's right. But it's not actually that helpful. 01:49:58.880 |
Sometimes placement can make a big difference. Have you tried covering more corners or 01:50:03.120 |
experimenting with different layouts? Okay, well, let's look at this. What book is this? Do 01:50:08.000 |
you know? Yes, that's Genghis Khan and the Making of the Modern World by Jack Weatherford. It's a 01:50:14.560 |
fascinating book about the impact of Genghis Khan on world history. Yeah, pretty good. Do you see 01:50:19.440 |
this other book? What is it? That's Surely You're Joking, Mr. Feynman! by Richard Feynman. 01:50:26.080 |
It's a collection of humorous anecdotes from the physicist's life. It's really entertaining. 01:50:30.880 |
And what is this? That's an Aranet4, a portable CO2 monitor. It measures carbon dioxide levels, 01:50:40.240 |
temperature, humidity, and air pressure. It's quite useful for monitoring indoor air quality. 01:50:45.440 |
So we're currently at 713, right? Do you think that's good or what? 01:50:50.640 |
A CO2 level of 713 ppm is generally okay. It's normal for indoor spaces. Ideally, you want it 01:50:56.880 |
below 800 ppm. If it goes above 1000, you might need more ventilation. Got it. And how about this 01:51:02.720 |
map? Do you know what this map is? Yes, that looks like a map of Middle Earth from J.R.R. Tolkien's 01:51:10.000 |
The Lord of the Rings. It's a great piece for any fan of the series. Yeah, good job. Thank you for 01:51:16.000 |
You're welcome. Glad I could help. If you need anything else, just let me know. 01:51:19.200 |
So that's a brief demo. You basically have the camera running, you can point it at stuff, 01:51:25.920 |
and you can just talk to the model. It is quite magical, super simple to use. I don't personally 01:51:31.120 |
use it in my daily life, because I'm kind of like a power user of all the ChatGPT apps, 01:51:35.840 |
and I don't kind of just like go around pointing at stuff and asking the model for stuff. 01:51:40.560 |
I usually have very targeted queries about code and programming, etc. But I think if I was 01:51:44.400 |
demonstrating some of this to my parents or my grandparents and have them interact in a very 01:51:48.240 |
natural way, this is something that I would probably show them, because they can just point 01:51:53.040 |
the camera at things and ask questions. Now, under the hood, I'm not actually 100% sure that they 01:51:57.840 |
currently consume the video. I think they actually still just take individual images, for example, 01:52:05.600 |
they take one image per second or something like that. But from your perspective as a user of the 01:52:10.640 |
tool, it definitely feels like you can just stream it video and have it make sense. So I think that's 01:52:16.800 |
pretty cool as a functionality. And finally, I want to briefly show you that there's a lot of 01:52:21.120 |
tools now that can generate videos, and they are incredible, and they're very rapidly evolving. 01:52:25.440 |
I'm not going to cover this too extensively because I don't... I think it's relatively 01:52:30.240 |
self-explanatory. I don't personally use them that much in my work, but that's just because I'm not 01:52:34.000 |
in a kind of a creative profession or something like that. So this is a tweet that compares a 01:52:38.400 |
number of AI video generation models as an example. This tweet is from about a month ago, so this may have evolved since. But I just wanted to show you that all of these models were asked to generate, I guess, a tiger in a jungle. And they're all quite good. I think right now Veo 2 01:53:06.720 |
is really near state-of-the-art and really good. Yeah, that's pretty incredible, right? 01:53:22.960 |
And you can compare and contrast and use some of these tools that are dedicated to this problem. 01:53:27.600 |
Okay, and the final topic I want to turn to is some quality of life features that I think are 01:53:33.760 |
quite worth mentioning. So the first one I want to talk about is ChatGPT memory feature. 01:53:38.960 |
So say you're talking to ChatGPT, and you say something like, "When roughly do you think was peak Hollywood?" Now, I'm actually surprised that ChatGPT gave me an answer here, because 01:53:50.080 |
I feel like very often, these models are very averse to actually having any opinions. And they 01:53:54.960 |
say something along the lines of, "Oh, I'm just an AI. I'm here to help. I don't have any opinions," 01:53:58.880 |
and stuff like that. So here, actually, it seems to have an opinion and say it says that the last 01:54:04.960 |
true peak before franchises took over was 1990s to early 2000s. So I actually happen to really 01:54:10.480 |
agree with ChatGPT here. And so I reply: totally agreed. Now, I'm curious what happens 01:54:19.280 |
here. Okay, so nothing happened. So basically, every single conversation, like we talked about, begins with an empty token window and goes until the end. The moment I start a new conversation 01:54:35.120 |
or a new chat, everything gets wiped clean. But ChatGPT does have an ability to save information 01:54:41.200 |
from chat to chat, but it has to be invoked. So sometimes ChatGPT will trigger it automatically, 01:54:47.680 |
but sometimes you have to ask for it. So basically, say something along the lines of, 01:54:52.080 |
"Can you please remember this?" Or like, "Remember my preference," or whatever, 01:55:00.080 |
something like that. So what I'm looking for is, I think it's going to work. There we go. 01:55:08.560 |
So you see this memory updated, believes that late 1990s and early 2000s was the greatest 01:55:15.280 |
peak of Hollywood, et cetera. Yeah. And then it also went on a bit about the 1970s. 01:55:23.360 |
And then it allows you to manage memories. So we'll look into that in a second. But what's 01:55:28.080 |
happening here is that ChatGPT wrote a little summary of what it learned about me as a person 01:55:32.800 |
and recorded this text in its memory bank. And a memory bank is basically a separate piece of 01:55:40.160 |
ChatGPT that is kind of like a database of knowledge about you. And this database of 01:55:45.760 |
knowledge is always prepended to all the conversations so that the model has access to 01:55:50.960 |
it. And so I actually really like this because every now and then the memory updates whatever 01:55:56.480 |
you have conversations with ChatGPT. And if you just let this run and you just use ChatGPT 01:56:00.960 |
naturally, then over time, it really gets to know you to some extent. And it will start to make 01:56:06.480 |
references to the stuff that's in the memory. And so when this feature was announced, I wasn't 100% 01:56:11.360 |
sure if this was going to be helpful or not. But I think I'm definitely coming around. And I've 01:56:16.080 |
used this in a bunch of ways. And I definitely feel like ChatGPT is knowing me a little bit 01:56:21.440 |
better over time and is being a bit more relevant to me. And it's all happening just by sort of 01:56:27.920 |
natural interaction over time, through this memory feature. So sometimes it will trigger automatically, and sometimes you have to ask for it. Okay, now I thought I was going to show you some 01:56:38.400 |
of the memories and how to manage them. But actually, I just looked and it's a little too 01:56:41.680 |
personal, honestly. So it's just a database, a list of little text strings, and those text strings just make it into the beginning of the context window. And you can edit the memories, which I really like. And you can, 01:56:53.280 |
you know, add memories, delete memories, manage your memories database. So that's incredible. 01:56:58.480 |
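Conceptually, my mental model of this feature is the little sketch below: a database of text strings that gets prepended to every fresh context window. This is just an illustration, not OpenAI's actual implementation.

    # Hedged mental model of the memory feature, not the real implementation:
    # memories are short text strings prepended to every new conversation.
    memories = [
        "Believes the late 1990s / early 2000s were peak Hollywood.",
        "Is a beginner learning Korean.",
    ]

    def build_context(user_message: str) -> str:
        memory_block = "\n".join("- " + m for m in memories)
        # The model sees the memory database before the conversation begins.
        return ("What you remember about the user:\n" + memory_block +
                "\n\nUser: " + user_message)

    print(build_context("Recommend me a movie."))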
I will also mention that I think the memory feature is unique to ChatGPT. I think that 01:57:03.600 |
other LLMs currently do not have this feature. And I will also say that, for example, ChatGPT is 01:57:10.000 |
very good at movie recommendations. And so I actually think that having this in its memory 01:57:15.040 |
will help it create better movie recommendations for me. So that's pretty cool. The next thing I 01:57:19.840 |
wanted to briefly show is custom instructions. So you can, to a very large extent, modify your 01:57:25.680 |
ChatGPT and how you like it to speak to you. And so I quite appreciate that as well. You can come 01:57:32.000 |
to settings, customize ChatGPT. And you see here, it says, what traits should ChatGPT have? 01:57:39.440 |
And I just kind of like told it, just don't be like an HR business partner, just talk to me 01:57:43.920 |
normally. And also, I just love explanations, education, insights, et cetera. 01:57:48.800 |
So be educational whenever you can. And you can just probably type anything here and you can 01:57:52.560 |
experiment with that a little bit. And then I also experimented here with telling it my identity. 01:58:00.000 |
I'm just experimenting with this, et cetera. And I'm also learning Korean. And so here I'm kind of 01:58:06.560 |
telling it that when it's giving me Korean, it should use this tone of formality. This is a good default setting, because otherwise it might sometimes give me the too informal tone, or the way too formal tone. And I just want this tone by 01:58:21.600 |
default. So that's an example of something I added. And so anything you want to modify about 01:58:25.440 |
ChatGPT globally between conversations, you would kind of put it here into your custom instructions. 01:58:30.800 |
And so I quite welcome this. And this, I think you can do with many other LLMs as well. So look 01:58:36.640 |
for it somewhere in the settings. Okay. And the last feature I wanted to cover is custom GPTs, 01:58:41.760 |
which I use once in a while. And I like to use them specifically for language learning the most. 01:58:46.720 |
So let me give you an example of how I use these. They show up on the left here. I want to start with this one, Korean Vocabulary Extractor. So basically, 01:59:04.000 |
the idea here is I give it, this is a custom GPT, I give it a sentence, and it extracts vocabulary 01:59:12.400 |
in dictionary form. So here, for example, given this sentence, this is the vocabulary. And notice 01:59:18.480 |
that it's in the format of Korean, semicolon, English. And this can be copy-pasted into the Anki flashcards app. And basically, this means that it's very easy to turn a sentence 01:59:34.640 |
into flashcards. And now the way this works is basically, if we just go under the hood, 01:59:39.040 |
and we go to Edit GPT, you can see that this is all just done via 01:59:45.920 |
prompting. Nothing special is happening here. The important thing here is instructions. So when I 01:59:50.640 |
pop this open, I just kind of explain a little bit: okay, background information, I'm learning Korean, I'm a beginner. Instructions: I will give you a piece of text, and I want you to extract 02:00:00.960 |
the vocabulary. And then I give it some example output. And basically, I'm being detailed. And 02:00:07.920 |
when I give instructions to LLMs, I always like to number one, give it sort of the description, 02:00:13.760 |
but then also give it examples. So I like to give concrete examples. And so here are four concrete 02:00:19.360 |
examples. And so what I'm doing here really is I'm constructing what's called a few shot prompt. 02:00:23.680 |
So I'm not just describing a task, which is kind of like asking for performance in a zero shot 02:00:28.240 |
manner, just like do it without examples. I'm giving it a few examples. And this is now a few 02:00:32.640 |
shot prompt. And I find that this always increases the accuracy of LLMs. So I think that's a generally good strategy. And so then, when you update and save this GPT, then just given a 02:00:45.840 |
single sentence, it does that task. And so notice that there's nothing new and special going on. 02:00:50.960 |
All I'm doing is I'm saving myself a little bit of work, because I don't have to basically start 02:00:56.720 |
from scratch and then describe the whole setup in detail. I don't have to tell ChatGPT all of this 02:01:04.800 |
each time. And so what this feature really is, is that it's just saving you prompting time. 02:01:10.000 |
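To make that concrete, here is roughly what such a saved few-shot prompt amounts to. This is a sketch with placeholder examples, not my actual instructions.

    # Sketch of a saved few-shot prompt like the vocabulary extractor above.
    # The wording and the examples are illustrative placeholders.
    INSTRUCTIONS = (
        "I am a beginner learning Korean. I will give you a sentence.\n"
        "Extract the vocabulary in dictionary form, one 'Korean;English'\n"
        "pair per line, ready to paste into Anki.\n\n"
        "Example input: <a Korean sentence>\n"
        "Example output:\n"
        "<word in dictionary form>;<English meaning>\n"
        "<word in dictionary form>;<English meaning>\n"
    )

    def build_prompt(sentence: str) -> str:
        # The saved instructions never change; only the sentence does.
        return INSTRUCTIONS + "\nInput: " + sentence + "\nOutput:"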
If there's a certain prompt that you keep reusing, then instead of reusing that prompt and copy 02:01:15.360 |
pasting it over and over again, just create a custom GPT, save that prompt a 02:01:20.640 |
single time. And then what changes per use is just the sentence. So if 02:01:26.720 |
I give it a sentence, it always performs this task. And so this is helpful if there are certain 02:01:31.920 |
prompts or certain tasks that you always reuse. The next example that I think transfers to every 02:01:37.200 |
other language would be basic translation. So as an example, I have this sentence in Korean, 02:01:42.800 |
and I want to know what it means. Now many people will go to just Google Translate or something 02:01:46.720 |
like that. Now famously, Google Translate is not very good with Korean. So a lot of people 02:01:51.440 |
use Naver or Papago and so on. So if you put that here, it kind of gives you a translation. 02:01:58.000 |
Now, these translations often are okay as a translation. But I don't actually really 02:02:03.120 |
understand how this sentence maps to this translation. Like, where are the pieces? I want to know more, and I want to be able to ask clarifying questions and so on. And so here 02:02:11.600 |
it kind of breaks it up a little bit, but it's just like not as good because a bunch of it gets 02:02:16.480 |
omitted, right? And those are usually particles and so on. So I basically built a much better 02:02:20.960 |
translator in ChatGPT, and I think it works significantly better. So I have a Korean detailed 02:02:25.440 |
translator. And when I put that same sentence here, I get what I think is a much, much better 02:02:30.320 |
translation. So it's three in the afternoon now, and I want to go to my favorite cafe. 02:02:34.560 |
And this is how it breaks up. And I can see exactly how all the pieces of it translate 02:02:40.800 |
part by part into English. So chigumin, afternoon, etc. So all of this. And what's really beautiful 02:02:48.960 |
about this is not only can I see all the little detail of it, but I can ask clarifying questions. 02:02:55.920 |
Right here. And we can just follow up and continue the conversation. So this is, I think, significantly better in translation than anything else you can get. And 02:03:04.000 |
if you're learning a different language, I would not use a different translator other than ChatGPT. It understands a ton of nuance. It understands slang. It's extremely good. And I don't know why 02:03:16.240 |
translators even exist at this point. And I think GPT is just so much better. Okay. And so the way 02:03:21.600 |
this works: if we edit this GPT, just so we can see briefly, then these are the instructions that I gave it. You'll be given a sentence in Korean; your task is to translate the 02:03:33.280 |
whole sentence into English first, and then break up the entire translation in detail. And so here, 02:03:39.120 |
again, I'm creating a few shot prompt. And so here's how I kind of gave it the examples because 02:03:43.360 |
they're a bit more extended. So I used kind of an XML-like language, just so that the model 02:03:48.960 |
understands that the example one begins here and ends here. And I'm using XML kind of tags. 02:03:54.400 |
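For instance, the delimiting might look roughly like this. This is an illustrative sketch, not my exact instructions.

    # Illustrative sketch of XML-style tags around few-shot examples, so the
    # model can tell where each example begins and ends.
    EXAMPLES = """
    <example_1>
      <input>...a Korean sentence...</input>
      <output>...full translation, then a piece-by-piece breakdown...</output>
    </example_1>
    <example_2>
      <input>...another Korean sentence...</input>
      <output>...translation and breakdown...</output>
    </example_2>
    """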
And so here's the input, I gave it, and here's the desired output. And so I just give it a few 02:04:00.320 |
examples, and I kind of like specify them in detail. And then I have a few more instructions 02:04:07.360 |
here. I think this is actually very similar to how you might teach a human a task: you can explain in words what they're supposed to be doing, but it's so much better if you show them by example how to perform the task. And humans, I think, can also learn in a few-shot 02:04:20.400 |
manner significantly more efficiently. And so you can program this in whatever way you like. And 02:04:26.000 |
then you get a custom translator that is designed just for you, and is a lot better than what you 02:04:31.280 |
would find on the internet. And empirically, I find that ChatGPT is quite good at translation, 02:04:36.480 |
especially for like a basic beginner like me right now. Okay, maybe the last one that I'll show you 02:04:41.760 |
just because I think it ties a bunch of functionality together is as follows. Sometimes I'm, 02:04:46.480 |
for example, watching some Korean content. And here we see we have the subtitles, but the subtitles are baked into the video, into the pixels. So I don't have direct access to the subtitles. And so what I 02:04:57.120 |
can do here is I can just screenshot this. And this is a scene between Jinyoung and Seulgi in 02:05:02.560 |
Singles Inferno. So I can just take it, and I can paste it here. And then this custom GPT I called 02:05:10.160 |
KoreanCAP, first OCRs it, then it translates it, and then it breaks it down. And so basically it 02:05:17.040 |
does that. And then I can continue watching, and anytime I need help, I will copy-paste the 02:05:22.240 |
screenshot here, and this will basically do that translation. And if we look at it under the hood, 02:05:27.840 |
in Edit GPT, you'll see that the instructions simply break down the steps. So: you'll be given an image crop from a TV show, Singles Inferno, but you can 02:05:41.440 |
change this, of course. And it shows a tiny piece of dialogue. So I'm giving the model sort of a 02:05:46.320 |
heads up and a context for what's happening. And these are the instructions. So first OCR it, 02:05:51.520 |
then translate it, and then break it down. And then you can do whatever format you like. 02:05:56.400 |
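The whole thing boils down to a prompt roughly like this; the wording below is my paraphrase, not the verbatim instructions.

    # Rough paraphrase of a KoreanCAP-style prompt, chaining OCR, translation,
    # and a breakdown in one set of instructions. Wording is illustrative.
    KOREAN_CAP_INSTRUCTIONS = """
    You will be given an image crop from the TV show Singles Inferno with a
    tiny piece of Korean subtitle dialogue baked into the pixels.
    1. OCR: transcribe the Korean text exactly as shown.
    2. Translate: give a natural English translation of the whole line.
    3. Break down: explain each word and particle piece by piece.
    """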
And you can play with this and improve it, but this is just a simple example, 02:06:00.320 |
and this works pretty well. So yeah, these are the kinds of custom GPTs that I've built for myself. 02:06:06.400 |
A lot of them have to do with language learning, and the way you create these is you come here, 02:06:11.360 |
and you click My GPTs, and you basically create a GPT, and you can configure it arbitrarily here. 02:06:17.840 |
And as far as I know, GPTs are fairly unique to ChatGPT, but I think some of the other LLM apps 02:06:24.640 |
probably have a similar kind of functionality, so you may want to look for it in the project 02:06:29.680 |
settings. Okay, so I could go on and on about covering all the different features that are 02:06:34.000 |
available in ChatGPT and so on, but I think this is a good introduction and a good bird's-eye view 02:06:39.440 |
of what's available right now, what people are introducing, and what to look out for. 02:06:43.680 |
So in summary, there is a rapidly growing, shifting, and thriving ecosystem 02:06:51.360 |
of LLM apps like ChatGPT. ChatGPT is the first and the incumbent, and it's probably the most 02:06:58.000 |
feature-rich out of all of them, but all of the other ones are very rapidly growing, either reaching feature parity or even overtaking ChatGPT in some specific cases. As an example, 02:07:10.080 |
ChatGPT now has internet search, but I still go to Perplexity, because Perplexity was doing search 02:07:16.720 |
for a while, and I think their models are quite good. Also, if I want to kind of prototype some 02:07:22.400 |
simple web apps, and I want to create diagrams and stuff like that, I really like Claude Artifacts, 02:07:27.280 |
which is not a feature of ChatGPT. If I just want to talk to a model, then I think ChatGPT 02:07:33.280 |
Advanced Voice is quite nice today, and if it's being too cagey with you, then you can switch to 02:07:38.160 |
Grok, things like that. So basically, all the different apps have some strengths and weaknesses, 02:07:42.800 |
but I think ChatGPT by far is a very good default and the incumbent and most feature-rich. 02:07:47.600 |
Okay, what are some of the things that we are keeping track of when we're thinking about these 02:07:52.240 |
apps and between their features? So the first thing to realize, and that we looked at, is you're 02:07:56.880 |
talking basically to a zip file. Be aware of what pricing tier you're at, and depending on the 02:08:02.160 |
pricing tier, which model you are using. If you are using a model that is very large, that model 02:08:10.080 |
is going to have basically a lot of world knowledge, and it's going to be able to answer complex 02:08:15.040 |
questions, it's going to have very good writing, it's going to be a lot more creative in its writing, 02:08:19.440 |
and so on. If the model is very small, then probably it's not going to be as creative, 02:08:23.840 |
it has a lot less world knowledge, and it will make mistakes. For example, it might hallucinate. 02:08:27.840 |
On top of that, a lot of people are very interested in these models that are thinking 02:08:35.280 |
and trained with reinforcement learning, and this is the latest frontier in research today. 02:08:38.640 |
So in particular, we saw that this is very useful and gives additional accuracy in problems like 02:08:44.720 |
math, code, and reasoning. So try without reasoning first, and if your model is not 02:08:50.640 |
solving that kind of a problem, try to switch to a reasoning model and look for that in the 02:08:55.040 |
user interface. On top of that, then we saw that we are rapidly giving the models a lot more tools. 02:09:01.760 |
So as an example, we can give them an internet search. So if you're talking about some fresh 02:09:05.280 |
information or knowledge that is probably not in the zip file, then you actually want to use an 02:09:09.840 |
internet search tool, and not all of these apps have it. In addition, you may want to give it 02:09:15.600 |
access to a Python interpreter so that it can write programs. So for example, if you want to 02:09:20.240 |
generate figures or plots and show them, you may want to use something like advanced data analysis. 02:09:24.800 |
If you're prototyping some kind of a web app, or if you are generating diagrams, you might want to use Artifacts, because it's right there, inline inside the app. Or if you're programming 02:09:33.280 |
professionally, you may want to turn to a different app like Cursor and Composer. On top of all this, 02:09:39.600 |
there's a layer of multi-modality that is rapidly becoming more mature as well, and that you may 02:09:44.160 |
want to keep track of. So we were talking about both the input and the output of all the different 02:09:49.200 |
modalities, not just text, but also audio, images, and video. And we talked about the fact that some 02:09:54.560 |
of these modalities can be sort of handled natively inside the language model. Sometimes 02:09:59.760 |
these models are called omni-models or multi-modal models, so they can be handled natively by the 02:10:04.560 |
language model, which is going to be a lot more powerful, or they can be tacked on as a separate 02:10:09.280 |
model that communicates with the main model through text or something like that. So that's 02:10:14.160 |
a distinction to also sometimes keep track of. And on top of all this, we also talked about quality 02:10:18.640 |
of life features. So for example, file uploads, memory features, instructions, GPTs, and all this 02:10:23.600 |
kind of stuff. And maybe the last sort of piece that we saw is that all of these apps have usually 02:10:30.320 |
a web kind of interface that you can go to on your laptop, or also a mobile app available on 02:10:35.520 |
your phone. And we saw that many of these features might be available on the app in the browser, but 02:10:40.560 |
not on the phone, and vice versa. So that's also something to keep track of. So all of this is a 02:10:45.120 |
little bit of a zoo. It's a little bit crazy, but these are the kinds of features that exist that 02:10:49.360 |
you may want to be looking for when you're working across all of these different apps. 02:10:52.640 |
And you probably have your own favorite in terms of personality or capability or something like 02:10:57.040 |
that, but these are some of the things that you want to be thinking about and looking for and 02:11:01.760 |
experimenting with over time. So I think that's a pretty good intro for now. Thank you for watching. 02:11:07.440 |
I hope my examples were interesting or helpful to you, and I will see you next time.