Building a ChatGPT Plugin for Lex Fridman Podcasts
Chapters
0:00 ChatGPT plugins for YouTube videos
0:19 Ask Lex ChatGPT Plugin so far
3:56 How ChatGPT is searching the podcast
6:58 How the Ask Lex ChatGPT plugin works
15:58 Finding the plugin code
16:40 Instructions for ChatGPT plugin
19:15 Creating and indexing the podcast transcripts
21:32 Hosting the API on DigitalOcean
22:33 Installing the plugin in ChatGPT
23:41 Having a conversation with "Ask Lex"
25:54 Final notes on ChatGPT plugins
00:00:00.000 |
Today, we're going to take a look at a ChatGPT plugin 00:00:04.640 |
that is going to let us search through podcast episodes 00:00:08.700 |
and basically answer questions based on those episodes 00:00:21.160 |
You can see up here, we have a little Lex Fridman logo. 00:00:26.200 |
That's because we are going to ask Lex some questions. 00:00:30.360 |
So we're going to ask Lex about the future of AI. 00:00:37.360 |
It decides that it needs to use the Ask Lex plugin. 00:00:41.200 |
And then it's kind of just giving us a summary 00:00:47.720 |
the source of these episodes, which is pretty nice. 00:00:58.720 |
So we get a pretty good summary of AI being discussed 00:01:06.720 |
This is, I think, one of the courses he did at MIT, right? 00:01:13.800 |
Now, what if I want a little more information 00:01:39.120 |
it knows that I still want to use this plugin. 00:01:46.400 |
in this kind of summary of that particular episode. 00:02:07.240 |
Now, let's just continue this conversation a little bit. 00:02:11.160 |
So interesting, just let's talk about space exploration. 00:02:23.640 |
Getting a few summaries like we did the other time. 00:02:30.960 |
One with Dava Newman and the other one with Ariel Ekblaw. 00:02:42.360 |
building space megastructures, that sounds interesting. 00:02:48.560 |
Can you tell me more about building space megastructures? 00:03:07.200 |
and decided that, yeah, we're probably talking 00:03:43.400 |
Let's kind of dive into how I actually built this 00:03:50.560 |
when ChatGPT is deciding to use the Ask Lex plugin. 00:04:03.120 |
We can open this to see what is being sent to this plugin. 00:04:19.080 |
If that doesn't make sense, I'm going to explain it. 00:04:21.640 |
But essentially you can think of this as being our question. 00:04:29.160 |
of transcribed audio from the Lex Fridman podcasts 00:04:51.640 |
You see there's a ton of things being spoken about here, 00:04:55.520 |
but ChatGPT is actually managing to kind of find, 00:05:07.880 |
that I couldn't have thought of in a greater world 00:05:09.760 |
I couldn't imagine, you know, all this sort of stuff, right? 00:05:12.040 |
So that's the way it's getting that information 00:05:13.720 |
from, which it then summarizes and gives us in that answer, okay? 00:05:18.720 |
And we can see that there's three entries there, right? 00:05:22.480 |
So we have the Erik Brynjolfsson episode, 00:05:41.080 |
you can see that the query has changed slightly. 00:05:46.240 |
And what we have done is actually given specific instructions 00:05:55.120 |
if someone wants to know more about a particular episode, 00:06:38.000 |
So here, it will just be a search, the query. 00:06:41.120 |
And then down here, I'm not sure what it's doing actually. 00:06:47.680 |
and it's also filtering for that particular episode 00:06:49.800 |
because it knows the information is in there. 00:06:57.000 |
but let's maybe dive into it in a little more detail 00:07:05.680 |
and visualize what we have actually built here. 00:07:29.480 |
because the code for that isn't actually public 00:07:33.280 |
is that you are basically passing your plugin, 00:07:38.280 |
which is like a tool for your ChatGPT instance to use. 00:07:43.960 |
You're basically passing that into the prompt 00:07:51.920 |
maybe it's reformatted on its way to ChatGPT, 00:07:54.480 |
I'm not sure, but basically somewhere in the prompts, 00:08:04.160 |
Okay, and then it will list each of these plugins 00:08:06.240 |
with a little description about each one of these plugins, 00:08:34.640 |
And when you see the words, ask Lex in the user's prompt, 00:08:47.840 |
and it will sometimes decide that it needs to use this. 00:09:01.680 |
actually, you already saw it, it contains queries. 00:09:04.600 |
And then within that, you have like a list of queries. 00:09:08.760 |
Okay, so it can actually pass multiple queries in there, 00:09:26.120 |
And then we have this response from the plugin. 00:09:40.680 |
so we pass the query from ChatGPT into our plugin, 00:09:47.240 |
that is not necessarily anything to do with ChatGPT. 00:09:50.720 |
Okay, so in this space is our own code or API, 00:10:00.040 |
is based on the ChatGPT retrieval plugin API 00:10:10.720 |
So we go to github.com/openai/chatgpt-retrieval-plugin. 00:10:19.240 |
that the ChatGPT retrieval plugin lets you easily search 00:10:24.400 |
by asking questions in everyday language, right? 00:10:30.920 |
It's just rather than personal work documents, 00:10:49.360 |
We've set it up to use OpenAI's text embedding model, 00:10:57.000 |
So ada-002, and you can basically think of this 00:11:01.120 |
as the same as the GPT models that you've seen, 00:11:06.920 |
this one actually generates numerical vectors 00:11:36.200 |
this space between this vector and the others, 00:11:39.160 |
this one will have some different meaning, okay? 00:11:58.120 |
So how do we perform searches through this space? 00:12:00.600 |
Okay, so imagine we have our user query over here, 00:12:07.040 |
It was, ask Lex about the future of AI, right? 00:12:17.720 |
and that is going to create an embedding, right? 00:12:19.880 |
And it's going to maybe place it over here, right? 00:12:28.040 |
these two documents that we already embedded, 00:12:38.840 |
to be relevant to our particular query, okay? 00:12:43.560 |
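The search step described here can be sketched in a few lines. This is a minimal illustration, not the plugin's code: the three-dimensional vectors and document texts are made up (real ada-002 embeddings have 1,536 dimensions), and the brute-force cosine-similarity scan stands in for the approximate nearest-neighbor search a vector database would use at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Toy "embeddings": hypothetical values chosen so the two AI passages sit
# close together and the space passage sits far away.
docs = {
    "AI will transform the economy": [0.9, 0.1, 0.0],
    "Machine learning and the future of work": [0.8, 0.3, 0.1],
    "Building habitats in orbit": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of "Ask Lex about the future of AI".
query_vec = [0.85, 0.2, 0.05]

for score, text in top_k(query_vec, docs):
    print(f"{score:.3f}  {text}")
```

The two AI passages come back as the nearest neighbors, which is the behavior the diagram describes: items whose vectors sit near the query vector are treated as relevant.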
this whole thing that you have going on here, 00:13:07.680 |
So you can have like hundreds of millions of these vectors 00:13:13.280 |
and you'll be able to retrieve relevant items 00:13:24.080 |
by the Vector Database component there, okay? 00:13:27.760 |
So we are then going to return those items out here, okay? 00:13:43.240 |
So let's say that was another item over here, right? 00:14:00.320 |
So they go into here and they go back to ChatGPT. 00:14:18.960 |
and the URL to the podcast episode and everything else, 00:14:22.440 |
which is exactly what you can see in here, right? 00:14:25.080 |
So these results, we have the original query, 00:14:27.480 |
and then we have the things that we returned, right? 00:14:37.600 |
which I think is actually being used for the URL. 00:14:44.120 |
or in the URL, we should actually add the actual URL 00:14:55.280 |
we've just basically augmented our original query 00:15:04.480 |
So now, our query is going to be a ton of text 00:15:10.960 |
and we've just returned it to ChatGPT over here, 00:15:23.800 |
The user's question is, and then it's whatever you, 00:15:33.360 |
And then, ChatGPT is going to answer your question, 00:15:41.560 |
where it's telling you a ton of stuff, right? 00:15:49.400 |
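The augmentation step can be pictured roughly as follows. This is only a sketch: how ChatGPT actually assembles its prompt is not public, so the prompt wording here is invented, and the response shape (a `results` list holding the original `query` plus scored chunks with `text` and `metadata`) is an assumption modeled on the retrieval plugin's query response. The sample chunk is fabricated for illustration.

```python
def build_augmented_prompt(question, plugin_response):
    """Combine retrieved transcript chunks and the user's question into one prompt."""
    lines = ["Context retrieved from the podcast transcripts:"]
    for result in plugin_response["results"]:
        for chunk in result["results"]:
            metadata = chunk.get("metadata", {})
            title = metadata.get("title", "unknown episode")
            url = metadata.get("url", "")
            lines.append(f"- [{title}] ({url}) {chunk['text']}")
    lines.append("")
    lines.append(f"The user's question is: {question}")
    return "\n".join(lines)


# Hypothetical plugin response with a single retrieved chunk.
sample_response = {
    "results": [
        {
            "query": "future of AI",
            "results": [
                {
                    "text": "An illustrative transcript excerpt about AI.",
                    "metadata": {
                        "title": "Example episode",
                        "url": "https://example.com/episode",
                    },
                    "score": 0.87,
                }
            ],
        }
    ]
}

print(build_augmented_prompt("Ask Lex about the future of AI", sample_response))
```

Keeping the title and URL next to each chunk is what lets the final answer link back to the source episode.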
and it's going to take us through to that episode, right? 00:15:53.080 |
So this is, yeah, the Erik Brynjolfsson episode. 00:15:58.800 |
But there is still a little more to this whole thing 00:16:05.400 |
I mean, one is actually the code to do all of this, 00:16:09.320 |
which you can actually find if you go to GitHub. 00:16:12.840 |
The plugin code itself is actually all here, right? 00:16:27.760 |
as what you would find in the ChatGPT retrieval plugin. 00:16:37.280 |
So for example, if you go to the .well-known directory here, 00:16:40.480 |
this is basically where you set the instructions 00:16:50.760 |
So I've changed the name for the model, okay? 00:16:53.640 |
So the name that ChatGPT will use to understand this model, 00:17:04.400 |
to answer user questions using Lex Fridman podcasts. 00:17:14.840 |
This tool can also be used for follow-up questions 00:17:19.560 |
to grab more information from specific podcasts 00:17:23.440 |
like by filtering for a specific podcast title, right? 00:17:31.560 |
on how to use this plugin, which are pretty light. 00:17:36.920 |
we've just set up where our API is hosted, right? 00:17:41.000 |
So we need to include that so that ChatGPT knows 00:17:44.760 |
where to find the OpenAPI spec for our app. 00:17:48.720 |
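For orientation, a plugin manifest of the kind discussed here might look like the sketch below, written as a Python dict and printed as JSON. The field names follow the ChatGPT plugin manifest format; every value is a hypothetical placeholder (the example.com URLs, the `asklex` name, the `none` auth type), and the model description is a paraphrase of what is read out in the video, not the file's actual text.

```python
import json

# All values are illustrative placeholders, not the plugin's real settings.
manifest = {
    "schema_version": "v1",
    "name_for_model": "asklex",  # assumed name; the video only says it was changed
    "name_for_human": "Ask Lex",
    "description_for_model": (
        "Use this tool to answer user questions using Lex Fridman podcasts. "
        "It can also be used for follow-up questions, grabbing more information "
        "from a specific podcast by filtering on the podcast title."
    ),
    "description_for_human": "Search Lex Fridman podcast transcripts.",
    "auth": {"type": "none"},  # assumption; a deployed plugin may require a token
    "api": {
        "type": "openapi",
        # Tells ChatGPT where the OpenAPI spec for the hosted API lives.
        "url": "https://example.com/.well-known/openapi.yaml",
    },
    "logo_url": "https://example.com/.well-known/logo.png",
    "contact_email": "hello@example.com",
    "legal_info_url": "https://example.com/legal",
}

print(json.dumps(manifest, indent=2))
```

The two parts that matter most for behavior are `description_for_model`, which tells ChatGPT when to reach for the plugin, and `api.url`, which points at the spec describing the endpoints.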
So these are basically going to be the instructions 00:18:00.920 |
And there were some changes made there as well. 00:18:06.880 |
So just put in the server, you have to do this. 00:18:18.720 |
So it says this is an array of search query objects, 00:18:22.840 |
each containing a natural language query string, query, 00:18:25.560 |
and an optional metadata filter, filter, okay? 00:18:29.680 |
So we also say here filters can help refine search results 00:18:33.120 |
based on criteria such as document title or time period. 00:18:41.400 |
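A request body matching that description could be built like this. It is a sketch under stated assumptions: the `queries`, `query`, and `filter` names come from the description being read out, while the `title`, `start_date`, and `end_date` filter keys and the example values are assumptions for illustration.

```python
import json


def build_query(query, title=None, start_date=None, end_date=None):
    """Build one search query object, adding a filter only when one is requested."""
    item = {"query": query}
    metadata_filter = {}
    if title is not None:
        metadata_filter["title"] = title  # assumed key for the episode title
    if start_date is not None:
        metadata_filter["start_date"] = start_date  # assumed key, ISO 8601 string
    if end_date is not None:
        metadata_filter["end_date"] = end_date  # assumed key, ISO 8601 string
    if metadata_filter:
        item["filter"] = metadata_filter
    return item


payload = {
    "queries": [
        # Broad first question: no filter.
        build_query("future of AI"),
        # Follow-up restricted to one episode (hypothetical title).
        build_query("economic impact of AI", title="Example episode title"),
    ]
}

print(json.dumps(payload, indent=2))
```

Leaving `filter` out entirely for unfiltered queries keeps the optional field optional, which matches how the first, unfiltered search in the demo differs from the follow-up.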
For example, the user may ask for more information 00:18:57.680 |
In that case, the filter field can then be used 00:19:27.400 |
it will be somewhere around the top of the video right now. 00:19:30.280 |
To this notebook, you'll just be able to run it through 00:19:34.720 |
Okay, so there are a few items that you need, 00:19:58.040 |
We're using a pre-built dataset in this example, 00:20:01.240 |
but actually, if you wanted to do the download 00:20:04.080 |
and transcribing yourself, you just open this, 00:20:07.000 |
and you can actually see all of the code that I use there. 00:20:10.520 |
So I actually built this PodGPT library to help with that. 00:20:18.160 |
So in reality, there's just a few lines of code, okay? 00:20:31.360 |
and then transcribing them using OpenAI's Whisper, 00:20:43.440 |
So you can actually just use my dataset here. 00:20:55.800 |
So then all you need to do is you initialize your -- 00:21:03.160 |
and also the embedding model that you're going to be using 00:21:16.920 |
and specifically the OpenAI text-embedding-ada-002 model, okay? 00:21:21.520 |
And then here, I'm just looping through all of the podcasts. 00:21:24.520 |
We're indexing everything, and that is actually it, okay? 00:21:28.480 |
That's all I did in order to index all of this text. 00:21:32.480 |
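To give a feel for what an indexing loop like this does under the hood, here is a generic sketch. It is not the PodGPT library's API: the chunk sizes, the `build_documents` helper, the document shape (`id`, `text`, `metadata`), and the sample episode are all assumptions. Embedding and upserting to the vector database are left out so the example runs offline.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping windows of roughly `size` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_documents(episodes, size=200, overlap=40):
    """Turn transcribed episodes into chunk-level documents ready for indexing."""
    documents = []
    for episode in episodes:
        for i, chunk in enumerate(chunk_words(episode["transcript"], size, overlap)):
            documents.append(
                {
                    "id": f"{episode['id']}-{i}",
                    "text": chunk,
                    # Metadata travels with each chunk so results can cite the episode.
                    "metadata": {"title": episode["title"], "url": episode["url"]},
                }
            )
    return documents


# Hypothetical episode; a real transcript would come from Whisper output.
episodes = [
    {
        "id": "episode-1",
        "title": "Example episode",
        "url": "https://example.com/episode-1",
        "transcript": " ".join(f"word{n}" for n in range(10)),
    }
]

for doc in build_documents(episodes, size=4, overlap=1):
    print(doc["id"], "->", doc["text"])
```

The overlap between neighboring chunks is a common choice so that a sentence cut at a chunk boundary still appears whole in at least one chunk.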
Then from there, I just went over to DigitalOcean. 00:21:40.560 |
which is pretty ideal for what we have, right? 00:21:57.400 |
Clicked on Apps, clicked on Deploy from GitHub, 00:22:03.520 |
And then it deploys, and I end up with this, right? 00:22:15.520 |
You can open it, and we will probably see this, okay? 00:22:26.920 |
Without the endpoints or without any file extensions, 00:22:49.400 |
So this is the -- we saw this, the ai-plugin.json file. 00:22:57.800 |
So from there, we can see, okay, we've got ValidateManifest, 00:23:13.600 |
But then from there, all you do is you go to your Plugin Store. 00:23:22.440 |
And then you just go ahead and use it, right? 00:23:25.480 |
So it's incredibly easy to pull this together. 00:23:45.960 |
I want to know what he thinks about World War II, right? 00:23:53.880 |
I know this is, like, a favorite talking point of his, 00:23:57.160 |
so there's probably plenty of things to talk about there. 00:24:05.920 |
We're getting, like, a pretty similar thing to before. 00:24:13.000 |
We can ask, okay, cool, who's, like, this guy? 00:24:23.200 |
this episode, and it will actually be able to do this. 00:24:41.800 |
Maybe, you know, something about this makes us think, 00:24:45.440 |
oh, I kind of want to know a little more about that. 00:24:49.760 |
so there's this Ian Kershaw biography on Adolf Hitler. 00:25:04.560 |
Now, I'm not sure if it's going to use the Ask Lex plugin. 00:25:11.120 |
It already probably knows some stuff about this. 00:25:14.280 |
Now, my internet isn't great, so it just cut out, 00:25:26.920 |
about this particular book that was kind of inspired 00:25:30.880 |
by us reading through the Lex Fridman information. 00:25:47.840 |
I can actually just give you all this information 00:26:18.600 |
I think with that, you can pretty much do all of this. 00:26:22.400 |
But if you are struggling to kind of follow along 00:26:26.240 |
or you just want a bit more of a technical deep dive, 00:26:28.600 |
I do actually have another video on ChatGPT plugins 00:26:41.080 |
So if you like and you want to go into technical details, 00:26:45.240 |
I would definitely recommend following along with that video 00:26:48.520 |
as I actually go through, like from start to finish, 00:26:51.680 |
building a plugin, not specifically for podcasts, 00:27:03.320 |
I hope this has been interesting and insightful 00:27:11.280 |
If you are right now looking at where to find these plugins, 00:27:16.280 |
there is right now still a wait list to get access to those. 00:27:24.480 |
is somewhere in either the description of the video