LLMs: A Hacker's Guide
00:00:00.000 |
So this is just meant to be relatively casual. 00:00:03.760 |
This is just an attempt to get through a bunch of stuff. 00:00:07.760 |
But a lot of it is just a lot of questions that I've been getting over the last 00:00:12.560 |
Just about AI and now about prompting and everything else. 00:00:15.160 |
Because a lot of the work that we now do with Gen AI kind of requires that you 00:00:20.920 |
So this is just a collection of all of those things, and 00:00:24.120 |
Just a tiny bit about me, I run a company called Grey Wing here in Singapore. 00:00:30.960 |
We do a ton of AI work, we do a ton of data work. 00:00:34.000 |
When LLMs came up, we started looking at them as NLP effectively on steroids. 00:00:39.600 |
Because suddenly all of the English language and people's messages, 00:00:43.920 |
emails were all accessible, so we started working there. 00:00:50.400 |
We built a product there, which was a comms automation tool, that's doing pretty well. 00:00:56.280 |
Far before OpenAI had tool usage or any of those things, 00:01:04.000 |
We do a ton of work with multi-modal RAG, which is visual RAG. 00:01:07.760 |
So being able to process sort of complex information, how to visually index it, and 00:01:11.640 |
sort of how to use that to answer mission critical questions. 00:01:14.120 |
So that is, for example, a DRAM data sheet from Samsung. 00:01:19.040 |
And in shipping, all of these data sets are really important, and 00:01:22.200 |
the importance of the output is also very high. 00:01:25.400 |
So this is just to say, we've done quite a bit of work across different parts of AI, 00:01:32.280 |
Also, I'll put up little QR codes just in lieu of links. 00:01:37.480 |
So if you see something and you're like, I wanna know a little bit more, just scan the code. 00:01:45.080 |
Everything today is gonna be about going from, let's say, 0 to 0.1, right? 00:01:51.160 |
So this is just an open source project that I recently released. 00:01:53.480 |
I'll use that as an example later on, how to go from script to project to release. 00:02:01.840 |
So we had a docs problem, everybody's got a docs problem. 00:02:07.040 |
We'd have tons of meetings, repeated meetings about the same things, 00:02:12.160 |
So now we have Whisper, we have all of these things accessible. 00:02:15.520 |
Can we just make docs out of things we've explained ten times before? 00:02:21.160 |
This has already had, I don't know, about 50 or 00:02:32.040 |
So the first kind of thing I wanna talk about really, and this is, 00:02:36.240 |
it took me a long time to discover and it made a huge difference, is the iterative loop. 00:02:41.080 |
And when I say iterative loop, I mean what is your process to build stuff, right? 00:02:45.360 |
So a long time ago, if you were coding that long ago, we had write, compile, run, right? 00:02:50.640 |
We basically would write code, it would take a long time to compile, 00:02:53.280 |
people still working on Xcode would have that same problem today. 00:02:56.520 |
We'd run it, we'd go back, write, compile, and run. 00:02:59.560 |
And then we had sort of interpretive languages come along, and 00:03:01.840 |
then we now have the REPL, which is effectively the read eval print loop, 00:03:05.560 |
right, which is you write code, it runs instantly, you keep changing it, 00:03:14.120 |
I don't fully agree with these guys, but they had a good point. 00:03:16.840 |
You have test-driven development, test, build, test, that kind of thing. 00:03:19.560 |
But I think AI is really different, because the way these models work is not deterministic. 00:03:25.720 |
Prompting can feel kind of code-like, and 00:03:28.720 |
working with prompts can feel kind of code-like, but it isn't quite the same. 00:03:33.160 |
So the pattern that me and my team, and a lot of other people that I know, 00:03:36.200 |
have fallen into is what I'm calling CPLN, so we'll see what that means. 00:03:44.520 |
Whatever your problem is, whatever you're trying to do, 00:03:47.000 |
just chat with models, make more and more and more examples, and 00:03:50.680 |
just keep changing prompts, keep changing what you're doing, right? 00:03:54.120 |
A lot of people, and myself included, get into this habit, 00:03:58.840 |
sort of building things, of doing it once, and it sort of works. 00:04:02.440 |
And then forever, you're iterating on that particular prompt. 00:04:05.720 |
You make very small changes, you keep fixing things, and 00:04:09.960 |
But I think, really, where you should be spending most of your time 00:04:12.680 |
is just changing and finding new approaches to solve things. 00:04:18.680 |
You wouldn't write something with the intention of rewriting it seven times 00:04:23.760 |
You'd write something, something'd be broken, you'd fix that broken thing, 00:04:27.080 |
you'd fix the next broken thing, and then you'd be done. 00:04:33.000 |
And I'm still surprised that people on my team, 00:04:35.840 |
me included, have this problem where we just don't play. 00:04:42.320 |
Once we get to a system that sort of gets to like 40%, we're like, okay, good enough. 00:04:51.320 |
The next one's just take whatever you've learned, go to the playground. 00:04:55.240 |
There's a lot of tools, and some of them are really good. 00:04:57.400 |
But in most cases, 90% of what you want is still just in the playground. 00:05:03.200 |
The big thing you get there is the ability to retroactively edit prompts and conversation histories. 00:05:06.920 |
Some people call it surfing the latent space, where you sort of make changes, right? 00:05:10.120 |
So this is where you'd spend maybe 20% of your time. 00:05:12.560 |
Once you've got that working, right, let's say you've got one of it working. 00:05:17.080 |
The next, in most cases, and this is just examples from Lamentis, is loop, right? 00:05:22.640 |
Add more data, add more test cases, a lot more. 00:05:25.480 |
See how solid your hypotheses were when you started, right? 00:05:31.320 |
Once you're done with that, right, nest, right? 00:05:38.280 |
By this point, you've got a general sense of the approach you want to take, 99.999% of the time. 00:05:43.840 |
And I almost dared a few people to do it, and I so 00:05:45.880 |
far haven't seen a single prompt or a single approach where it couldn't be nested. 00:05:50.160 |
And by that I mean effectively break the prompt, 00:05:53.560 |
break the work you're doing into smaller and smaller and smaller subsegments. 00:05:58.280 |
But if you're not going to accept a 700 line code file as good, 00:06:03.280 |
you shouldn't accept a 100 line prompt as good, right? 00:06:08.200 |
It could always be made simpler, it could always be broken down. 00:06:11.400 |
So really, you just want to keep doing that, right? 00:06:13.720 |
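To make the nesting idea concrete, here's a minimal sketch (not from the talk) of splitting one big prompt into three small chained prompts; `Complete` stands in for whatever chat-completion call you already use:

```ts
// Nesting: instead of one giant prompt that extracts, rewrites, and formats in
// a single shot, chain three small prompts with defined blast radiuses.
type Complete = (prompt: string) => Promise<string>;

async function summarizeMeeting(transcript: string, complete: Complete) {
  // Step 1: pull out just the decision points.
  const decisions = await complete(
    `List every decision made in this transcript, one per line:\n${transcript}`
  );

  // Step 2: one tiny prompt per decision, easy to debug in isolation.
  const sentences = await Promise.all(
    decisions
      .split("\n")
      .filter((line) => line.trim().length > 0)
      .map((d) => complete(`Rewrite this decision as one clear sentence:\n${d}`))
  );

  // Step 3: a final, trivial prompt only does formatting.
  return complete(`Format these as a bullet list:\n${sentences.join("\n")}`);
}
```

If the second step misbehaves, you can swap or fix that one prompt without touching the rest.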
And once you've gotten this far, as luck would have it, if you go to production you'll run into new problems. 00:06:20.120 |
And they can go through the exact same loop, right? 00:06:22.320 |
You run into a problem, you've got a new customer with new kind of data, 00:06:25.320 |
new problems, new things, you want to go back to the original loop. 00:06:27.880 |
So this is kind of where you want to be spending, or 00:06:32.840 |
where I find the best division of your time being, right? 00:06:37.320 |
This entire blue segment is just try new approaches, right? 00:06:40.840 |
Because these models, they've been around for about a year, but they are so new. 00:06:44.880 |
And we are still finding new ways to use them. 00:06:48.040 |
If you try something, you might be the first person in the world to have tried it. 00:06:50.320 |
You might genuinely be the first person to have thought of 00:06:53.080 |
that particular way of solving a problem with a model, right? 00:06:58.280 |
And you probably want to spend about 20% of your time tuning the prompts. 00:07:03.560 |
Because I'm presuming that, with the people you work with and 00:07:05.720 |
the things you're building, you guys are good at coding. 00:07:08.120 |
If you're not, there's tons of ways to get better at it. 00:07:17.920 |
Ideally, see if you can change the size and shape of that data. 00:07:29.320 |
The first one, and I still have this problem. 00:07:32.520 |
Although, it's wonderful to know that someone in this room has solved my 00:07:35.440 |
biggest problem, which is diarization, which is awesome. 00:07:42.400 |
I think a lot of people kind of forgot that when we got ChatGPT, and 00:07:46.360 |
in short order, we also got audio, we got vision, right? 00:07:50.720 |
And we got speech to text, and all of these different modalities. 00:07:56.080 |
Whatever you've got, you can transform it into so many things, right? 00:07:59.480 |
You can transform that into code to get a more structured representation. 00:08:02.480 |
You can get structured data, you can do language transformations. 00:08:05.120 |
So use all of the tools that you've got, right? 00:08:10.800 |
So speech, for example, here's where you'd use each one. 00:08:15.200 |
If you've got anything dealing with users, we love to talk, right? 00:08:18.360 |
This entire talk is probably gonna be, I don't know, 8,000 words, I'm hoping not more. 00:08:23.400 |
If you ask me to type out 8,000 words, it would take me far longer. 00:08:27.160 |
I would be far less likely to do it, and I'd probably tell you no, right? 00:08:30.960 |
If you present the users with a text box, they'll give you five words. 00:08:34.040 |
If you ask them to just press a button and talk, they'll give you a lot more. 00:08:37.760 |
And these models, the things that we work with, they love context. 00:08:40.880 |
The more context you can provide, the better. 00:08:47.880 |
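As a hedged sketch of that speech-input idea, this is roughly what turning a voice note into text looks like with the OpenAI Node SDK; the file name is a placeholder:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Transcribe a user's voice note so the rest of the pipeline gets rich,
// high-context text instead of five words from a text box.
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream("voice-note.m4a"), // placeholder file name
  model: "whisper-1",
});

console.log(transcription.text);
```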
There's a lot of relationships that you can capture visually. 00:08:55.240 |
Anytime you write as a person, you wanna put pictures in for the same reason, 00:08:59.240 |
So all of that can be captured, and now we're getting smarter and 00:09:01.520 |
smarter and smarter models that can understand that information. 00:09:04.720 |
You can use it as really expensive OCR if you want to, we do in some places. 00:09:09.240 |
But in a lot of cases, it's also far more dense, right? 00:09:12.720 |
Even if the diagram on the top left, 00:09:16.040 |
and that's an actual diagram that we use, by the way, were to be represented in text, 00:09:19.600 |
that would be far more token heavy than what that picture can encode, right? 00:09:23.400 |
Code is awesome for structure, both for input and output. 00:09:30.640 |
Like you almost always wanna be using structure both on the input and the output side. 00:09:37.000 |
Structure your input whenever possible, right? 00:09:39.760 |
Almost everything humans ever touch usually has some structure, right? 00:09:45.760 |
When you write a paragraph, there's a topic sentence. 00:09:47.720 |
Everything humans ever do usually has structure. 00:09:50.480 |
And if you're leaving it out, if you're not extracting it, you're throwing that information away. 00:09:58.200 |
So we use TypeScript and Zod to build type specs, and 00:10:01.440 |
that makes it so much easier to steer these models. 00:10:03.600 |
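For example, a type spec along these lines (field names invented for illustration, not from the talk) gives the model a concrete shape to aim at:

```ts
import { z } from "zod";

// A Zod type spec describing exactly what we want back from the model.
const ActionItem = z.object({
  owner: z.string(),
  task: z.string(),
  dueDate: z.string().nullable(), // ISO date, or null if none was mentioned
});

const MeetingSummary = z.object({
  title: z.string(),
  decisions: z.array(z.string()),
  actionItems: z.array(ActionItem),
});

// The same spec doubles as a TypeScript type for the rest of the codebase.
type MeetingSummary = z.infer<typeof MeetingSummary>;
```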
We use SQL when we wanna express something as a search query. 00:10:06.200 |
Even if we never run that SQL, it helps the model think, 00:10:09.200 |
it helps the AI system sort of better guide these things. 00:10:11.440 |
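A small sketch of that idea, with a made-up table and a stand-in `complete` helper; the point is the intermediate SQL, not running it:

```ts
declare const complete: (prompt: string) => Promise<string>; // stand-in chat-completion call

// Even if the SQL is never executed, asking for it forces the model to commit
// to concrete columns, filters, and sort order.
const sql = await complete(`
Table: vessels(name TEXT, dwt INTEGER, built_year INTEGER, flag TEXT)

Express this request as a single SQL query and nothing else:
"ships over 50,000 dwt built after 2010, newest first"
`);
// Roughly expected: SELECT * FROM vessels WHERE dwt > 50000 AND built_year > 2010 ORDER BY built_year DESC;
```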
Yeah, same thing here, use structured output as often as you can, 00:10:20.600 |
because you've got a type spec on the inside. 00:10:22.680 |
And structured output usually constrains the output that's coming out of it, so you get less hallucinations. 00:10:28.160 |
Sorry, far fewer hallucinations with structured output, right? 00:10:31.960 |
And I can talk about that more if we have time at the end, but 00:10:34.240 |
it usually has to do with token probabilities and the output set. 00:10:36.920 |
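A minimal sketch of structured output with the OpenAI SDK plus a Zod spec; the model name and schema are placeholders, and the general pattern (JSON mode plus validation) is the point:

```ts
import OpenAI from "openai";
import { z } from "zod";

const Summary = z.object({
  title: z.string(),
  decisions: z.array(z.string()),
});

const client = new OpenAI();

// Ask for JSON explicitly and validate what comes back; a failed parse tells
// you immediately which call misbehaved instead of surfacing downstream.
const response = await client.chat.completions.create({
  model: "gpt-4o-mini", // placeholder model name
  response_format: { type: "json_object" },
  messages: [
    { role: "system", content: "Reply only with JSON of the form {title, decisions: []}." },
    { role: "user", content: "Summarize: we agreed to ship the docs tool on Friday." },
  ],
});

const summary = Summary.parse(JSON.parse(response.choices[0].message.content ?? "{}"));
console.log(summary.decisions);
```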
The same thing is, again, use as much as you can, 00:10:42.360 |
cuz you got this massive model for free, right? 00:10:45.600 |
Commoditized down, and you got this massive model that had 2 trillion, 00:10:49.520 |
3 trillion tokens of human information thrown into it, right? 00:10:53.760 |
Use that as much as you can, lean into it, right? 00:10:57.040 |
There's a lot of libraries, let's say projects that I've either consulted or 00:11:01.280 |
advised with, where they're inventing their own DSLs. 00:11:03.960 |
They're inventing their own languages to express what they want. 00:11:06.520 |
When ideally, if they expressed it as a superset of something that existed, 00:11:10.160 |
say TypeScript, Python, English, Hindi, whatever's in there, the model would already know how to work with it. 00:11:23.880 |
None of these are hard rules, but they're general rules of thumb, 00:11:30.080 |
In AI, I mean, this is a meme at this point, but we are still very early. 00:11:34.720 |
This is not very early days of development or very early days of design. 00:11:39.360 |
Like, if you wanted to get into design, and you wanted to be a good painter or 00:11:43.280 |
a good designer, you wouldn't use DALL-E, right? 00:11:46.520 |
You wouldn't add an abstraction between you and the thing. 00:11:51.720 |
You actually want that harder knowledge of how these things work, 00:11:55.120 |
how they behave, what the actual nature of these things are. 00:11:58.560 |
The more abstractions and toolkits and libraries you put between yourself and 00:12:02.280 |
the model, when you're developing, the less you learn, right? 00:12:07.440 |
But that's also a problem, because they're really good, and 00:12:10.280 |
they have this little circle of things that they do really well. 00:12:13.320 |
And very quickly, or if you're lucky, somewhat slower, 00:12:17.200 |
you'll want to step out of it, and then it's just a wasteland, right? 00:12:22.040 |
If you've ever built something with WordPress or Squarespace and 00:12:24.680 |
then just wanted to do one thing that it didn't do, 00:12:29.440 |
you know that's impossible, everything will fight you. 00:12:33.640 |
I know it can be tempting, especially for people with a coding background, 00:12:36.360 |
I've sometimes seen them want to distance themselves from prompting, 00:12:39.120 |
distance themselves from the non-deterministic nature of these things. 00:12:46.440 |
The next one is also, I know we've got credits to OpenAI, but 00:12:53.040 |
everyone who's a provider wants to give you free credits these days. 00:13:01.000 |
They were all kind of similar when they came out, 00:13:03.400 |
because everyone was working with the same information set. 00:13:08.000 |
They're all practically different people, right? 00:13:09.720 |
It's almost like if you gave some work to someone on your team and 00:13:13.680 |
they couldn't do it, you wouldn't go, this is undoable. 00:13:16.160 |
You'd probably give it to someone else, right? 00:13:23.080 |
There's even different personalities in there. 00:13:24.520 |
This one is kind of easy to keep track of, right? 00:13:29.440 |
Basically, have a general rule of thumb that your outputs are not gonna be that much bigger than your inputs. 00:13:37.240 |
If they are, again, rule of thumb, that's not gonna end up well, right? 00:13:40.720 |
If you're looking to generate, let's say, 20 paragraphs of an article 00:13:44.640 |
from five words of input, you're usually just gonna get very generic output. 00:13:50.200 |
So try and keep those ratios relatively the same if you can, right? 00:13:53.720 |
Cool, some smaller FAQs, because these questions get asked a lot, right? 00:14:02.640 |
So agents, a lot of people have asked me about agents. 00:14:05.920 |
The simple answer there is anything with looping and 00:14:08.200 |
termination is usually considered an agent, right? 00:14:11.000 |
So anytime you've got a system and it basically loops on the same prompt or 00:14:14.600 |
some set of prompts, and it basically has the ability to continue execution and 00:14:18.240 |
then decide when it wants to stop, that's usually an agent. 00:14:25.280 |
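Here's a minimal, hedged skeleton of that definition: a loop over the same prompt where the model decides when to stop. The `complete` helper and the "DONE" convention are illustrative, not from the talk:

```ts
declare const complete: (prompt: string) => Promise<string>; // stand-in chat-completion call

// An agent in the loosest sense: loop on a prompt, let the model choose to terminate.
async function runAgent(goal: string, maxSteps = 10): Promise<string[]> {
  const steps: string[] = [];

  for (let i = 0; i < maxSteps; i++) {
    const reply = await complete(
      `Goal: ${goal}\nSteps so far:\n${steps.join("\n")}\n` +
        `Give the next step, or reply with exactly "DONE" if the goal is met.`
    );

    if (reply.trim() === "DONE") break; // the model decided to stop
    steps.push(reply.trim()); // otherwise keep looping
  }

  return steps; // maxSteps is the safety valve against infinite loops
}
```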
When you run into problems or when you start working on a project, or 00:14:28.040 |
you're just looking for a project to work on, 00:14:29.680 |
it's useful to know what capabilities just got added to the tool set, right? 00:14:38.120 |
If you've done NLP or anything close to it, it just got way better, right? 00:14:41.880 |
We can classify documents, we can classify information all sorts of ways, 00:14:47.240 |
we can do all sorts of things with them that previously NLP really couldn't do. 00:14:51.080 |
The second one's filtering and extraction, right? 00:14:58.000 |
So anytime you've got RAG, summarization, that's a transformation, right? 00:15:01.440 |
If you're doing code generation, a lot of cases, that's transformation. 00:15:04.680 |
If you're doing translation, that's transformation, right? 00:15:07.240 |
So oftentimes it's useful to look at your problem, right, in an industry or 00:15:11.080 |
your problem set in front of you, or you're just looking for ideas. 00:15:13.400 |
If you look for one of these four things, if you look for 00:15:15.720 |
one of these four classes, it's an easier way to structure 00:15:17.880 |
where you wanna go and where to put things. 00:15:21.440 |
The final one is generation, and I think some people are using it, but 00:15:23.760 |
I've seen that use case sort of go down for some reason. 00:15:27.640 |
That's where you want it to write things no one's ever written before. 00:15:32.240 |
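To make the first of those four classes concrete, here's a hedged sketch of classification with an LLM constrained to a fixed label set; the labels and the `complete` helper are invented for illustration:

```ts
declare const complete: (prompt: string) => Promise<string>; // stand-in chat-completion call

const LABELS = ["invoice", "charter_party", "port_notice", "other"] as const;
type Label = (typeof LABELS)[number];

// Classify a document by forcing the answer into a known label set and
// falling back to "other" for anything unexpected.
async function classifyDocument(text: string): Promise<Label> {
  const reply = await complete(
    `Classify this document as exactly one of: ${LABELS.join(", ")}.\n` +
      `Reply with the label only.\n\n${text}`
  );
  const label = reply.trim().toLowerCase() as Label;
  return LABELS.includes(label) ? label : "other";
}
```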
So some resources, I'm not gonna be talking about prompting, 00:15:42.080 |
If you don't like me, the top of it has people that I respect that are far 00:15:46.520 |
smarter than me, so click the links and go there and read those. 00:15:48.800 |
Cool, the next one, and this might be the final one, is debugging, right? 00:16:00.360 |
which I don't think I've heard that many people talk about. 00:16:02.280 |
I mean, among people who work with AI, this is a massive conversation, right? 00:16:07.880 |
Because the sort of curse and sort of the benefit that we got with modern AI 00:16:12.960 |
things is that it's very easy to build a demo. 00:16:14.960 |
It's very easy to get to something that sort of works, but 00:16:17.480 |
it's very hard to debug things when they go wrong, right? 00:16:25.400 |
If nothing works, right, always go down to the prompt level. 00:16:29.320 |
And if you can't, then get rid of your abstractions and work up from there, 00:16:34.960 |
Try going up a level of intelligence and see if it fixes it. 00:16:37.520 |
That should tell you where your problems are. 00:16:39.320 |
Or try going down a level of intelligence and see what happens. 00:16:45.360 |
In most cases, it's your input that's the issue. 00:16:47.880 |
Either it's too verbose, it's not the right transformation, 00:16:51.800 |
So any transformations you can do on the input are gonna make a massive difference. 00:16:56.200 |
And finally, if you're not doing this already, add more structure. 00:16:59.400 |
More structure is gonna help you point out where your problems are. 00:17:02.120 |
More structure is gonna tell you, sort of expose some of the big issues there. 00:17:05.440 |
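One cheap way to run the "go up or down a level of intelligence" check is to diff the same prompt across two tiers; a rough sketch with placeholder model names:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Run the exact same prompt at two model tiers and compare the behaviour.
async function tryAtTier(model: string, prompt: string) {
  const res = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content;
}

const prompt = "Extract the port names from: 'Vessel calls Singapore, then Rotterdam.'";
console.log("smart:", await tryAtTier("gpt-4o", prompt));      // placeholder tier
console.log("cheap:", await tryAtTier("gpt-4o-mini", prompt)); // placeholder tier
// If only the cheap tier fails, the prompt is probably under-specified;
// if both fail, look at the input and how the task is framed.
```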
Okay, so this doesn't usually happen to people. 00:17:15.800 |
It's kind of working, and I can spend another two weeks on it, and 00:17:19.040 |
it'll get a bit further down the line of kind of working. 00:17:26.520 |
In most cases, you wanna find out what separates your offending data, 00:17:30.280 |
which is where it doesn't work, from the stuff that does work, right? 00:17:34.720 |
One of those is gonna point to some sort of difference between the stuff that works and the stuff that doesn't. 00:17:45.000 |
And then we saw the classification before, right? 00:17:47.600 |
If you're trying to do more than one of those things inside the same system, 00:17:50.640 |
inside the same, with the same model, usually separate it out, right? 00:18:01.440 |
Most errors I've seen sort of fall into these three issues. 00:18:05.200 |
You've either got app level issues in terms of how that data's being fed in and 00:18:08.560 |
fed out, and how models are orchestrated once things get too large. 00:18:14.560 |
It's just making things up that don't exist, or 00:18:17.240 |
it's just giving you information that it really shouldn't, or 00:18:24.240 |
it's just not listening to the specific instructions that you're giving it, right? 00:18:27.880 |
And this is at the model level, but it happens at the meta level as well. 00:18:31.320 |
Even if you're working with, say, three models and 300 prompts, 00:18:39.760 |
The first one is, whatever you're doing, right? 00:18:45.360 |
Whatever you're doing as far as prompting and working with models go, 00:18:50.040 |
Because in most cases, it's English, and once we start adding things, 00:18:54.120 |
So you get to this sort of Pareto level of, it works, but it just doesn't. 00:19:01.120 |
Cut them down, there's usually space to cut them down, cut them again. 00:19:04.440 |
The lower your task complexity per prompt, or per task, 00:19:10.800 |
the easier it is to debug, the easier it is for you to have things with defined 00:19:14.520 |
blast radiuses, where if something goes wrong, you can swap it out and fix it. 00:19:18.600 |
Otherwise, if something goes wrong some day, you're gonna have a problem. 00:19:27.280 |
So this is just an example of that particular project that I mentioned earlier. 00:19:32.000 |
So it started with just a specific issue, honestly, it wasn't even me. 00:19:37.160 |
It was Hibi, who's actually here, who had a transcript for me. 00:19:39.920 |
And she was like, okay, can we make docs out of this, right? 00:19:45.400 |
There was a lot of trying to figure out what we can pull out, 00:19:51.100 |
You're trying to see if this can even be done. 00:19:52.560 |
You're just testing very high level hypothesis, right? 00:19:55.240 |
Some of the things I tested were sort of trying to pull out structure directly. 00:19:58.320 |
Some of the other ones were trying to classify that data before pulling out 00:20:01.120 |
structure. You learn just a lot about what it is. 00:20:03.880 |
You figure out where you wanna put the transcript, 00:20:07.800 |
All of that you can learn from just talking, right? 00:20:11.320 |
The next one is talk, but then start changing things, right? 00:20:18.540 |
And once you're done with that, the entire thing, and 00:20:22.460 |
this actually worked, was just this one script, right? 00:20:31.220 |
And really all it did was just loop twice over everything, and 00:20:36.220 |
then break it down into sections and use different models to write different things, 00:20:39.180 |
right, so there's one model, you know, that's generating the structure. 00:20:42.240 |
There's another model that's actually doing the long form writing. 00:20:44.880 |
And then the final one is just breaking it down into smaller and smaller sections. 00:20:50.080 |
So if you look in the repo, it's still not that big, right? 00:20:57.840 |
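The shape of that script, as a hedged sketch rather than the actual repo code: one cheaper call proposes the structure, a stronger model writes each section:

```ts
declare const completeFast: (prompt: string) => Promise<string>;   // cheap model for structure
declare const completeStrong: (prompt: string) => Promise<string>; // stronger model for long-form writing

async function transcriptToDocs(transcript: string): Promise<string> {
  // Pass 1: structure only.
  const outline = await completeFast(
    `List the section headings this transcript should be split into, one per line:\n${transcript}`
  );

  // Pass 2: long-form writing, one small prompt per section.
  const sections = await Promise.all(
    outline
      .split("\n")
      .filter((h) => h.trim().length > 0)
      .map((heading) =>
        completeStrong(
          `Write the "${heading}" section of the docs, using only this transcript:\n${transcript}`
        )
      )
  );

  return sections.join("\n\n");
}
```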
All of that stuff can go in after, like you've proven the thesis. 00:21:08.740 |
The last thing I wanna mention is a lot of people I speak to are still very concerned about cost, right? 00:21:12.180 |
I don't know how many of you guys watched the NVIDIA keynote that happened a couple 00:21:16.860 |
of days ago, but long story short, everything you're using now is gonna get 00:21:22.220 |
at least 10x, if not 50x cheaper, in very short order, right? 00:21:26.900 |
It's gonna get 10x, if not 50x faster, in very short order. 00:21:31.060 |
So what would you build if you were building for, say, six months from now, or 00:21:34.980 |
what would you make if you just presumed that today, right? 00:21:38.720 |
And it's a different way of working with these things. 00:21:42.360 |
If something costs a dollar, that's a different system than if it costs one cent, right? 00:21:45.440 |
If something takes an hour, that's different from if it takes six minutes, 00:21:49.520 |
right, so I would say this is a valid presumption to make, right, 00:21:54.800 |
The question is, what more can you do if you just presume that about the future? 00:21:59.040 |
Because we still haven't even gotten hardware level optimizations. 00:22:04.680 |
Memory level optimizations, again, still coming up. 00:22:10.800 |
So all of these things are almost being done now. 00:22:13.480 |
And they're very comparatively easy, engineering-wise. 00:22:16.640 |
It's just incremental optimization to get there. 00:22:21.160 |
Feel free to find me after, or just reach out on Twitter, I'm happy to help. 00:22:36.400 |
We got, like, what do you think about long context window models and 00:22:45.200 |
Because I might say something where I don't know what I'm talking about. 00:22:48.680 |
That said, this has been my question as well. 00:22:51.080 |
The problem with context windows is the algorithm we use for them. 00:22:56.160 |
What I mean by that is, it scales quadratically: to get twice as much context, 00:23:02.000 |
you've got to spend four times the amount of memory and compute. 00:23:06.280 |
There's no way to, we still don't know a good way to get around it, right? 00:23:09.920 |
So what that means effectively is to get really long context windows, 00:23:15.280 |
You effectively have to say, okay, I'm going to have something before I run 00:23:18.600 |
the model that's going to kind of figure out which part of the context to actually use. 00:23:22.560 |
So you don't actually get the full context window, right? 00:23:25.120 |
You kind of do, but if you take the full context window and 00:23:27.800 |
you're trying to use every single token in it to compute an answer, it doesn't really work. 00:23:32.140 |
So that is still very much a problem that could be solved. 00:23:36.080 |
I think it's an open problem that could be solved tonight by someone that's 00:23:39.280 |
working somewhere or ten years from now, we just don't know, right? 00:23:41.760 |
>> You've mentioned a bunch about transforming the input. 00:23:50.080 |
Do you use AI to transform, is your input global? 00:23:53.920 |
>> In most cases, yes, you're going to be using AI to transform it. 00:23:56.400 |
But there's also just a ton of structured stuff you can do, right, very easily. 00:23:59.640 |
Like most documents, let's say I've got a PDF, or I've got, let's say these slides, 00:24:04.160 |
or I've got one of my documents is in Markdown. 00:24:07.000 |
There's a ton of structure in there you can just grep for, right? 00:24:09.800 |
because I can very quickly figure out what the sections are. 00:24:14.440 |
That's all stuff that you can do today, right? 00:24:16.600 |
So even just knowing that that's got 300 sentences in it, 00:24:19.640 |
that's a transformation of the input that is valuable, super valuable, right? 00:24:24.040 |
Because we can make assumptions already, right? 00:24:26.400 |
If I give you a document that someone's written, 00:24:28.480 |
I can presume that the title is probably the highest compressed information in that document. 00:24:35.160 |
I can presume that the first section will have some sort of intro. 00:24:39.960 |
Those are all transformations, but yes, usually you use AI. 00:24:44.600 |
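A small sketch of the kind of cheap, deterministic transformation being described here, before any model sees the document; the regexes are rough and illustrative:

```ts
// Pull grep-able structure out of a Markdown document: headings and a rough
// sentence count, both useful signals to attach to the input.
function describeMarkdown(doc: string) {
  const headings = doc
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line)) // lines starting with #, ##, ...
    .map((line) => line.replace(/^#+\s*/, ""));

  const sentenceCount = (doc.match(/[.!?](\s|$)/g) ?? []).length; // rough count

  return { title: headings[0] ?? null, headings, sentenceCount };
}

// describeMarkdown("# Intro\nHello. World!\n## Details\nMore text.")
// -> { title: "Intro", headings: ["Intro", "Details"], sentenceCount: 3 }
```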
So how do you think about [INAUDIBLE] >> I actually haven't used Devin. 00:24:53.360 |
I just have not had the time, but I've had people tell me that it's good. 00:24:56.600 |
Look, coding is going to be where these models make just a massive difference. 00:25:00.480 |
I already use Cursor, which can understand just a massive amount of context. 00:25:07.880 |
It has been six months since I wrote any code that wasn't at least partially AI 00:25:11.760 |
generated, so it's just going to keep getting bigger and bigger and bigger. 00:25:16.120 |
That said, I will say the time that most devs that I know and 00:25:20.400 |
most companies that I know spend is in business logic, maintenance, and 00:25:25.400 |
sort of really trying to transform customer input to really massive systems 00:25:31.400 |
with a ton of legacy code, like we're a long way away from that, right? 00:25:35.040 |
What I mean is it's getting easier and easier for 00:25:37.600 |
you to spin up a more and more and more complex project from scratch, right? 00:25:43.000 |
But the massive dev work that sort of sits, kind of sits past that, right? 00:25:57.200 |
because that's where the money is in a lot of ways, 00:25:58.640 |
because that's where most enterprises are, right? 00:26:00.480 |
If you look at SAP, or you look at most of these guys, 00:26:07.920 |
they're still having trouble getting it to work with very large code bases, right? 00:26:12.760 |
Like, let's say any company above 50 people that's existed for a while. 00:26:17.520 |
That code base, so far, AI hasn't been able to touch, right? 00:26:24.840 |
>> So I recently read a, I wouldn't say read the paper, I read the abstract, right? 00:26:29.920 |
So where it was like, I think from Amazon or from somewhere, or 00:26:33.440 |
Netflix perhaps, that getting cosine similarities between embeddings, 00:26:39.080 |
it's not really a good measure for getting the meaning of things, right? 00:26:46.600 |
And also when we do vector searches and just try to pull relevant information, it doesn't always work that well. 00:26:56.800 |
I'm trying to figure out what am I doing wrong, how to do it better. 00:27:01.960 |
I watched Jerry Liu's talk from LlamaIndex, it was an 18-minute talk or something. 00:27:07.720 |
It's a very nice talk, but it just kind of flew over my head. 00:27:12.960 |
>> I think the problem here is embeddings are sort of fuzzy search on steroids. 00:27:19.440 |
If you're using them for anything more, I think even today you have a problem, 00:27:26.760 |
One, these are really tiny models, comparatively, right? 00:27:30.680 |
Big brain, small brain, tiny brain, these are really tiny models. 00:27:33.800 |
In most cases, they don't have a good understanding of the underlying text. 00:27:36.800 |
That's why long context embeddings never made sense, right? 00:27:40.040 |
The longer the context, it just doesn't really make sense. 00:27:43.200 |
Not to mention, in most cases, that's a transformation of the input, right? 00:27:45.880 |
What Hebe was saying, that's a transformation of the input, 00:27:49.440 |
Well, you're transforming it into a space where it's a lot harder for you to reason about. 00:27:55.320 |
And now the only thing you have is cosine similarity. 00:27:57.960 |
You can have a bias matrix, you can push that math a little bit more. 00:28:01.640 |
But because that model is unknown to you, the model's workings are unknown to you, 00:28:06.040 |
those are forever gonna be a bunch of numbers, right? 00:28:14.840 |
What is becoming very possible now, that I see a lot of companies switching to, 00:28:19.200 |
is just use the whole brain, use the LLM, right? 00:28:23.720 |
Whatever you're using embeddings for, you can use an LLM, right? 00:28:32.440 |
Like let's say you're using, I'll give you the most brute force example of this. 00:28:37.320 |
Let's say using embeddings to take 100,000 items and 00:28:40.400 |
see which ones are similar or which ones closest to your query. 00:28:43.160 |
You can take an LLM, run it through every single one of those documents and ask, 00:28:46.360 |
hey, is this close, is this close, is this close? 00:28:49.760 |
That is not a good way to do it, do not do it this way. 00:28:53.720 |
So they are kind of, you can substitute one for the other just a little bit. 00:28:59.080 |
But they should always be the last step in your pipeline. 00:29:01.920 |
You should cut down the search space as much as possible with structured search, 00:29:06.520 |
transformations, BM25, there's a bunch of stuff you can do, right? 00:29:11.280 |
You should never be searching your entire search space with embeddings, right? 00:29:14.600 |
You should always be searching some reduced search space where, hey, 00:29:17.680 |
here are the last 20 things, and I know these are relevant because of keywords. 00:29:23.800 |
I know these are relevant because an LLM told me after transformation, whatever. 00:29:30.160 |
But if you embed at the beginning, in most cases, it just doesn't work at scale. 00:29:34.360 |
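A rough sketch of that ordering: cut the search space down with cheap keyword filtering first, and only spend embeddings on what's left. `embed` and `cosine` stand in for whatever embedding client and similarity function you already have:

```ts
declare const embed: (text: string) => Promise<number[]>;   // stand-in embedding call
declare const cosine: (a: number[], b: number[]) => number; // stand-in similarity function

async function search(query: string, docs: { id: string; text: string }[]) {
  // Step 1: cheap structured/keyword filter (could be BM25, SQL, metadata, ...).
  const keywords = query.toLowerCase().split(/\s+/);
  const candidates = docs.filter((d) =>
    keywords.some((k) => d.text.toLowerCase().includes(k))
  );

  // Step 2: embeddings only over the reduced set, as the last step in the pipeline.
  const queryVec = await embed(query);
  const scored = await Promise.all(
    candidates.map(async (d) => ({ ...d, score: cosine(queryVec, await embed(d.text)) }))
  );

  return scored.sort((a, b) => b.score - a.score).slice(0, 20);
}
```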
>> So it's more like to get the results and then sort it. 00:29:40.120 |
>> More like to get the results, and yes, kind of to sort it, but 00:29:43.200 |
kind of also to identify useful parts of those results. 00:29:46.400 |
Let's say the results you got were pages, but you want sentences, right? 00:29:49.600 |
You wanna know which part of it is heat map wise the most important.