
LLMs: A Hacker's Guide


Whisper Transcript

00:00:00.000 | So this is just meant to be relatively casual.
00:00:02.360 | I didn't really prep for this.
00:00:03.760 | This is just an attempt to run through a bunch of stuff.
00:00:07.760 | But a lot of it is just a lot of questions that I've been getting over the last
00:00:11.160 | couple of years.
00:00:12.560 | Just about AI and now about prompting and everything else.
00:00:15.160 | Because a lot of the work that we now do with Gen AI kind of requires that you
00:00:18.680 | think about things differently, right?
00:00:20.920 | So this is just a collection of all of those things, and
00:00:22.920 | as we go through them, we'll see.
00:00:24.120 | Just tiny bit about me, I run a company called Grey Wing here in Singapore.
00:00:28.840 | We're focused on commercial shipping.
00:00:30.960 | We do a ton of AI work, we do a ton of data work.
00:00:34.000 | When LLMs came up, we started looking at them as NLP effectively on steroids.
00:00:39.600 | Because suddenly all of the English language, people's messages and
00:00:43.920 | emails, were all accessible, so we started working there.
00:00:46.360 | Out of that work came that initial launch,
00:00:50.400 | which was a comms automation tool, that's doing pretty well.
00:00:54.120 | Another one was an assistant of our own.
00:00:56.280 | Far before OpenAI had tool usage or any of those things,
00:00:59.720 | we were doing charts and a bunch of things.
00:01:01.240 | Then came sort of RAG.
00:01:04.000 | We do a ton of work with multi-modal RAG, which is visual RAG.
00:01:07.760 | So being able to process sort of complex information, how to visually index it, and
00:01:11.640 | sort of how to use that to answer mission critical questions.
00:01:14.120 | So that is, for example, a DRAM data sheet from Samsung.
00:01:17.640 | So they were testing the product.
00:01:19.040 | And in shipping, all of these data sets are really important, and
00:01:22.200 | the importance of the output is also very high.
00:01:25.400 | So this is just to say, we've done quite a bit of work across different parts of AI,
00:01:30.480 | and this is just everything we've learned.
00:01:32.280 | Also, I'll put up little QR codes just in lieu of links.
00:01:36.360 | The slides are gonna be up later.
00:01:37.480 | So if you see something and you're like, I wanna know a little bit more,
00:01:39.800 | there's gonna be a QR code.
00:01:40.760 | So this is, in the interest of hacking,
00:01:45.080 | everything today is gonna be about going from, let's say, 0 to 0.1, right?
00:01:49.720 | Where do you start?
00:01:50.320 | How do you start?
00:01:51.160 | So this is just an open source project that I recently released.
00:01:53.480 | I'll use that as an example later on, how to go from script to project to release.
00:01:59.680 | This is effectively a tool to generate docs.
00:02:01.840 | So we had a docs problem, everybody's got a docs problem.
00:02:04.480 | But what we did have is a ton of meetings.
00:02:07.040 | We'd have tons of meetings, repeated meetings about the same things,
00:02:10.720 | explaining the same things.
00:02:12.160 | So now we have Whisper, we have all of these things accessible.
00:02:15.520 | Can we just make docs out of things we've explained ten times before?
00:02:19.440 | So this was launched two days ago.
00:02:21.160 | This has already had, I don't know, about 50 or
00:02:23.760 | 60 projects make docs using it.
00:02:26.400 | Cuz it turns out, for people, it's easy to talk.
00:02:28.240 | It's very hard to write docs, right?
00:02:30.920 | So that's just an example.
00:02:32.040 | So the first kind of thing I wanna talk about really, and this is,
00:02:36.240 | it took me a long time to discover and it made a huge difference,
00:02:38.880 | is just the iterative loop, right?
00:02:41.080 | And when I say iterative loop, I mean what is your process to build stuff, right?
00:02:45.360 | So long time ago, if you were coding that long ago, we had write compile run, right?
00:02:50.640 | We basically would write code, it would take a long time to compile,
00:02:53.280 | people still working on Xcode would have that same problem today.
00:02:56.520 | We'd run it, we'd go back, write, compile, and run.
00:02:59.560 | And then we had sort of interpreted languages come along, and
00:03:01.840 | then we now have the REPL, which is effectively the read eval print loop,
00:03:05.560 | right, which is you write code, runs instantly, you keep changing it,
00:03:09.320 | you keep changing it, something happens.
00:03:10.360 | And then a bunch of guys came along.
00:03:14.120 | I don't fully agree with these guys, but they had a good point.
00:03:16.840 | You have test driven development, test, build, test, that kind of thing.
00:03:19.560 | But I think AI really, because the way these models work is not deterministic,
00:03:25.720 | and prompting can feel kind of code-like, and
00:03:28.720 | working with prompts can feel kind of code-like, but
00:03:30.360 | they're really the furthest thing.
00:03:31.760 | So they really need new patterns.
00:03:33.160 | So the pattern that me and my team, and a lot of other people that I know,
00:03:36.200 | have fallen into is what I'm calling CPLN, so we'll see what that means.
00:03:40.600 | So the first one's just chat, right?
00:03:44.520 | Whatever your problem is, whatever you're trying to do,
00:03:47.000 | just chat with models, make more and more and more examples, and
00:03:50.680 | just keep changing prompts, keep changing what you're doing, right?
00:03:54.120 | A lot of people, and myself included, get into this habit,
00:03:56.920 | because we have that habit from coding, and
00:03:58.840 | sort of building things, of doing it once, and it sort of works.
00:04:02.440 | And then forever, you're iterating on that particular prompt.
00:04:05.720 | You make very small changes, you keep fixing things, and
00:04:08.600 | you keep fixing things.
00:04:09.960 | But I think, really, where you should be spending most of your time, really,
00:04:12.680 | is just changing and finding new approaches to solve things.
00:04:15.560 | Cuz that makes a huge, huge difference.
00:04:17.280 | And that's unheard of in code, right?
00:04:18.680 | You wouldn't write something with the intention of rewriting it seven times
00:04:22.200 | before you got to the end, right?
00:04:23.760 | You'd write something, something'd be broken, you'd fix that broken thing,
00:04:27.080 | you'd fix the next broken thing, and then you'd be done.
00:04:30.040 | So the most important thing is just chat.
00:04:33.000 | And I'm still surprised that the people I talk to on my team,
00:04:35.840 | me included, have this problem where we just don't play.
00:04:39.800 | We don't chat enough, really, right?
00:04:42.320 | Once we get to a system that sort of gets to like 40%, we're like, okay,
00:04:45.840 | we're now going to production.
00:04:47.400 | We're just, this is almost good.
00:04:49.240 | And so that's a big problem, right?
00:04:51.320 | The next one's just take whatever you've learned, go to the playground.
00:04:55.240 | There's a lot of tools, and some of them are really good.
00:04:57.400 | But in most cases, 90% of what you want is still just in the playground.
00:05:00.840 | Everyone's got a playground.
00:05:02.120 | All you're really looking for
00:05:03.200 | is the ability to retroactively edit prompts and conversation histories.
00:05:06.920 | Some people call it surfing the latent space and sort of make changes, right?
00:05:10.120 | So this is where you'd spend maybe 20% of your time.
00:05:12.560 | Once you've got that working, right, let's say you've got one version of it working.
00:05:17.080 | The next, in most cases, and this is just examples from Lamentis, is loop, right?
00:05:22.640 | Add more data, add more test cases, a lot more.
00:05:25.480 | See how solid your hypotheses were when you started, right?
00:05:29.200 | And always reset if it doesn't work.
00:05:31.320 | Once you're done with that, right, nest, right?
00:05:37.320 | Once you're done with that,
00:05:38.280 | you've got a general sense of the approach you want to take. And 99.999% of the time it can be nested.
00:05:43.840 | I've almost dared a few people to disprove that, and I so
00:05:45.880 | far haven't seen a single prompt or a single approach where it couldn't be nested.
00:05:50.160 | And by that I mean effectively break the prompt,
00:05:53.560 | break the work you're doing into smaller and smaller and smaller subsegments.
00:05:56.680 | We'll go into it later, right?
00:05:58.280 | But if you're not going to accept a 700 line code file as good,
00:06:03.280 | you shouldn't accept a 100 line prompt as good, right?
00:06:06.560 | Or a 50 line prompt as good.
00:06:08.200 | It could always be made simpler, it could always be broken down.
00:06:11.400 | So really, you just want to keep doing that, right?
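(A minimal sketch of what that nesting can look like in practice: one oversized prompt broken into two small, single-purpose calls. The prompts, names, and the callModel signature below are illustrative assumptions, not code from the talk.)

```typescript
// Instead of one long prompt that classifies an email AND drafts a reply,
// two small prompts chained together, each doing exactly one thing.
// `callModel` is whatever client you already use; nothing here assumes a specific SDK.
type CallModel = (prompt: string) => Promise<string>;

// Step 1: classify only.
async function classifyEmail(callModel: CallModel, email: string): Promise<string> {
  const label = await callModel(
    `Classify this email as exactly one of: complaint, question, update.\n\n${email}`
  );
  return label.trim().toLowerCase();
}

// Step 2: draft a reply, with the classification as a small, controlled hand-off.
function draftReply(callModel: CallModel, email: string, category: string): Promise<string> {
  return callModel(`This email is a ${category}. Draft a short, polite reply:\n\n${email}`);
}

// If replies come out wrong, you only touch draftReply; classification keeps
// its own blast radius, which is the point of breaking the prompt down.
async function handleEmail(callModel: CallModel, email: string): Promise<string> {
  const category = await classifyEmail(callModel, email);
  return draftReply(callModel, email, category);
}
```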
00:06:13.720 | And once you've gotten this far, as luck would have it, if you go to production and
00:06:17.720 | you've got users, you've got subtasks now.
00:06:20.120 | And they can go through the exact same loop, right?
00:06:22.320 | You run into a problem, you've got a new customer with new kind of data,
00:06:25.320 | new problems, new things, you want to go back to the original loop.
00:06:27.880 | So this is kind of where you want to be spending, or
00:06:32.840 | where I find the best division of your time being, right?
00:06:37.320 | This entire blue segment is just try new approaches, right?
00:06:40.840 | Because these models, they've been around for about a year, but they are so new.
00:06:44.880 | And we are still finding new ways to use them.
00:06:47.080 | So much so that you might try something,
00:06:48.040 | you might be the first person in the world to have tried it.
00:06:50.320 | You might genuinely be the first person to have thought of
00:06:53.080 | that particular way of solving a problem with a model, right?
00:06:55.880 | So I really can't emphasize that enough.
00:06:58.280 | And you probably want to spend about 20% of your time tuning the prompts.
00:07:01.160 | Almost everything usually is a prompt issue.
00:07:03.560 | Because I'm presuming that the people you work with, and
00:07:05.720 | the things you're building, that you guys are good at coding.
00:07:08.120 | If you're not, there's tons of ways to get better at it.
00:07:10.080 | Computers are really good at coding.
00:07:11.520 | So in most cases, it's your prompt, right?
00:07:13.920 | If it's not your prompt, it's your input.
00:07:16.280 | It's the data you're providing.
00:07:17.920 | Ideally, see if you can change the size and shape of that data.
00:07:21.160 | And in most cases, that fixes your problem.
00:07:22.800 | So a couple of do's and don'ts.
00:07:29.320 | The first one, and I still have this problem.
00:07:32.520 | Although, it's wonderful to know that someone in this room has solved my
00:07:35.440 | biggest problem, which is diarization, which is awesome.
00:07:38.600 | But really, just use all modalities, right?
00:07:42.400 | I think a lot of people kind of forgot that when we got ChatGPT, and
00:07:46.360 | in short order, we also got audio, we got vision, right?
00:07:50.720 | And we got speech to text, and all of these different modalities.
00:07:53.360 | And even just the input modality of text,
00:07:56.080 | you can transform it into so many things, right?
00:07:58.640 | You can take text and
00:07:59.480 | transform that into code to get a more structured representation.
00:08:02.480 | You can get structured data, you can do language transformations.
00:08:05.120 | So use all of the tools that you've got, right?
00:08:07.480 | So let's, yeah.
00:08:10.800 | So speech, for example, here's where you'd use each one.
00:08:13.880 | Speech is verbose.
00:08:15.200 | If you've got anything dealing with users, we love to talk, right?
00:08:18.360 | This entire talk is probably gonna be, I don't know, I'm hoping not,
00:08:21.880 | like about 8,000 words, right?
00:08:23.400 | If you ask me to type out 8,000 words, it would take me far longer.
00:08:27.160 | I would be far less likely to do it, and I'd probably tell you no, right?
00:08:30.960 | If you present the users with a text box, they'll give you five words.
00:08:34.040 | If you ask them to just press a button and talk,
00:08:35.880 | they'll give you 200 words, right?
00:08:37.760 | And these models, the things that we work with, they love context.
00:08:40.880 | The more context you can provide, the better.
00:08:42.400 | Vision is insanely useful, right?
00:08:47.880 | There's a lot of relationships that you can capture
00:08:50.680 | with a picture that you can't with text.
00:08:52.720 | Like we know this, right?
00:08:54.120 | It's 1,000 words.
00:08:55.240 | Anytime you write as a person, you wanna put pictures in for the same reason,
00:08:58.720 | right?
00:08:59.240 | So all of that can be captured, and now we're getting smarter and
00:09:01.520 | smarter and smarter models that can understand that information.
00:09:04.720 | You can use it as really expensive OCR if you want to, we do in some places.
00:09:09.240 | But in a lot of cases, it's also far more dense, right?
00:09:12.720 | Even if the diagram on the top left,
00:09:16.040 | that's an actual diagram that we use, by the way, were to be represented in text,
00:09:19.600 | that representation would be far more token heavy than what that picture can encode, right?
00:09:23.400 | Code is awesome for structure, both for input and output.
00:09:30.640 | Like you almost always wanna be using structure both on the input and
00:09:34.240 | the output, right?
00:09:34.920 | Use structured output whenever possible.
00:09:37.000 | Structure your input whenever possible, right?
00:09:39.760 | Almost everything humans ever touch usually has some structure, right?
00:09:43.160 | Like when I talk, my talk has a structure.
00:09:45.760 | When you write a paragraph, there's a topic sentence.
00:09:47.720 | Everything humans ever do usually has structure.
00:09:50.480 | And if you're leaving it out, if you're not extracting it,
00:09:52.880 | it's a lot harder to control.
00:09:54.160 | Yeah, this is just stuff that we use, right?
00:09:58.200 | So we use TypeScript and Zod to build type specs, and
00:10:01.440 | that makes it so much easier to steer these models.
00:10:03.600 | We use SQL when we wanna express something as a search query.
00:10:06.200 | Even if we never run that SQL, it helps the model think,
00:10:09.200 | it helps the AI system sort of better guide these things.
00:10:11.440 | Yeah, same thing here, use structured output as often as you can,
00:10:17.600 | far easier to guide.
00:10:18.600 | It's also far less prone to hallucinations,
00:10:20.600 | because you've got a type spec on the inside.
00:10:22.680 | And structured output usually constrains the output that's coming out of it,
00:10:26.040 | that you see far fewer generations.
00:10:28.160 | Sorry, far fewer hallucinations with structured output, right?
00:10:31.960 | And I can talk about that more if we have time at the end, but
00:10:34.240 | it usually has to do with token probabilities and the output set.
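(A small sketch of the type-spec idea with TypeScript and Zod, roughly as described above. The schema fields are invented for illustration; the point is that whatever JSON the model returns gets validated against a spec instead of being trusted as-is.)

```typescript
import { z } from "zod";

// A type spec for what we want back from the model. Field names here are
// made up for the example; the constraint is what matters.
const ActionItem = z.object({
  owner: z.string(),
  task: z.string(),
  dueDate: z.string().nullable(), // ISO date, or null if none was mentioned
});
const MeetingSummary = z.object({
  topic: z.string(),
  actionItems: z.array(ActionItem),
});
type MeetingSummary = z.infer<typeof MeetingSummary>;

// Validate whatever JSON the model returns; a parse failure tells you exactly
// which field went wrong instead of handing you a silently malformed blob.
function parseSummary(rawModelOutput: string): MeetingSummary {
  return MeetingSummary.parse(JSON.parse(rawModelOutput));
}
```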
00:10:36.920 | The same thing is, again, use as much as you can,
00:10:42.360 | cuz you got this massive model for free, right?
00:10:44.880 | Kind of, right?
00:10:45.600 | Commoditized down, and you got this massive model that had 2 trillion,
00:10:49.520 | 3 trillion tokens of human information thrown into it, right?
00:10:53.760 | Use that as much as you can, lean into it, right?
00:10:57.040 | There's a lot of libraries, let's say projects that I've either consulted for or
00:11:01.280 | advised, where they're inventing their own DSLs.
00:11:03.960 | They're inventing their own languages to express what they want.
00:11:06.520 | When ideally, if they expressed it as a super set of something that existed,
00:11:10.160 | say TypeScript, Python, English, Hindi, whatever's in there,
00:11:15.080 | you'd get a lot more benefit out of that.
00:11:17.960 | Cool, so this is a bunch of don'ts.
00:11:23.880 | None of these are hard rules, but they're general rules of thumb,
00:11:27.600 | especially when you start out, right?
00:11:30.080 | In AI, I mean, this is a meme at this point, but we are still very,
00:11:33.560 | very early, right?
00:11:34.720 | This is not very early days of development or very early days of design.
00:11:39.360 | Like, if you wanted to get into design, and you wanted to be a good painter or
00:11:43.280 | a good designer, you wouldn't use DALL-E, right?
00:11:46.520 | You wouldn't add an abstraction between you and the thing.
00:11:48.840 | You would learn how to paint, right?
00:11:50.200 | Because you want that knowledge.
00:11:51.720 | You actually want that harder knowledge of how these things work,
00:11:55.120 | how they behave, what the actual nature of these things are.
00:11:58.560 | The more abstractions and toolkits and libraries you put between yourself and
00:12:02.280 | the model, when you're developing, the less you learn, right?
00:12:05.120 | Some of them, honestly, are really good.
00:12:07.440 | But that's also a problem, because they're really good, and
00:12:10.280 | they have this little circle of things that they do really well.
00:12:13.320 | And very quickly, or if you're lucky somewhat slower,
00:12:17.200 | you'll want to step out of it, and then it's just a wasteland, right?
00:12:22.040 | If you've ever built something with WordPress or Squarespace and
00:12:24.680 | then just wanted to do one thing that it didn't do,
00:12:27.880 | you know what I'm talking about, right?
00:12:29.440 | That's impossible, everything will fight you.
00:12:31.400 | So ideally, don't add abstractions.
00:12:33.640 | I know it can be tempting, especially for people with a coding background,
00:12:36.360 | who I've sometimes seen want to distance themselves from prompting,
00:12:39.120 | distance themselves from the non-deterministic nature of these things.
00:12:42.480 | Bad instinct, don't look away from it.
00:12:46.440 | The next one is also, I know we've got credits from OpenAI, but
00:12:51.680 | everyone wants to give you free money.
00:12:53.040 | Everyone wants to give you free credits these days if you're a provider.
00:12:55.960 | There's too much investor money in this space.
00:12:58.320 | Don't stick to one model, right?
00:13:00.000 | They're all very different.
00:13:01.000 | They were all kind of similar when they came out,
00:13:03.400 | because everyone was working with the same information set.
00:13:05.960 | But things have diverged massively.
00:13:08.000 | They're all practically different people, right?
00:13:09.720 | It's almost like if you gave some work to someone on your team and
00:13:13.680 | they couldn't do it, you wouldn't go, this is undoable.
00:13:16.160 | You'd probably give it to someone else, right?
00:13:18.280 | Same thing, work with different models.
00:13:19.760 | They're all very, very differently trained.
00:13:23.080 | There's even different personalities in there.
00:13:24.520 | This one is kind of easy to keep track of, right?
00:13:29.440 | Basically, have a general rule of thumb that your outputs shouldn't be that
00:13:34.760 | much bigger than your inputs, in most cases.
00:13:37.240 | Again, rule of thumb: if they are, that's not gonna end up well, right?
00:13:40.720 | If you're looking to generate, let's say, 20 paragraphs of an article
00:13:44.640 | from five words of input, you're usually just gonna get very generic,
00:13:47.880 | not so good output, right?
00:13:50.200 | So try and keep those ratios relatively the same if you can, right?
00:13:53.720 | Cool, some smaller FAQs, because these questions get asked a lot, right?
00:14:02.640 | So agents, a lot of people have asked me about agents.
00:14:05.920 | The simple answer there is anything with looping and
00:14:08.200 | termination is usually considered an agent, right?
00:14:11.000 | So anytime you've got a system and it basically loops on the same prompt or
00:14:14.600 | some set of prompts, and it basically has the ability to continue execution and
00:14:18.240 | then decide when it wants to stop, that's usually an agent.
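(A minimal sketch of that looping-plus-termination definition. The step and tool functions are placeholders for whatever client and tools you use, not a specific SDK.)

```typescript
// The same prompt loops; the model decides at each step whether to act or stop.
type AgentStep = { action: "tool" | "done"; detail: string };
type StepFn = (history: string[]) => Promise<AgentStep>;
type ToolFn = (request: string) => Promise<string>;

async function runAgent(step: StepFn, runTool: ToolFn, task: string, maxSteps = 10): Promise<string> {
  const history: string[] = [`Task: ${task}`];
  for (let i = 0; i < maxSteps; i++) {
    const next = await step(history);                // model decides what to do next
    if (next.action === "done") return next.detail;  // model chose to terminate
    const observation = await runTool(next.detail);  // otherwise execute the tool call...
    history.push(`Tool result: ${observation}`);     // ...and loop with the new context
  }
  return "Stopped: step limit reached";              // hard cap so it can never loop forever
}
```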
00:14:20.520 | This one is really helpful, right?
00:14:25.280 | When you run into problems or when you start working on a project, or
00:14:28.040 | you're just looking for a project to work on,
00:14:29.680 | it's useful to know what capabilities just got added to the tool set, right?
00:14:32.840 | With Gen AI.
00:14:33.960 | These are four of the biggest ones, right?
00:14:36.400 | The first one's just plain NLP.
00:14:38.120 | If you've done NLP or anything close to it, it just got way better, right?
00:14:41.880 | We can classify documents, we can classify information all sorts of ways,
00:14:45.160 | we can label them, and
00:14:47.240 | we can do all sorts of things with them that previously NLP really couldn't do.
00:14:51.080 | The second one's filtering and extraction, right?
00:14:53.080 | So you can pull information out, right?
00:14:56.520 | And the next one's sort of transformation.
00:15:01.440 | So anytime you've got RAG, summarization, that's a transformation, right?
00:15:01.440 | If you're doing code generation, a lot of cases, that's transformation.
00:15:04.680 | If you're doing translation, that's transformation, right?
00:15:07.240 | So oftentimes it's useful to look at your problem, right, in an industry or
00:15:11.080 | your problem set in front of you, or you're just looking for ideas.
00:15:13.400 | If you look for one of these four things, if you look for
00:15:15.720 | one of these four classes, it's an easier way to structure your thinking
00:15:17.880 | about where to put things, and maybe that's where you wanna go.
00:15:21.440 | The final one, and I think some people are using it, but
00:15:23.760 | I've seen that use case sort of go down for some reason.
00:15:25.960 | It's just general purpose generation, right?
00:15:27.640 | You want it to write things no one's ever written before.
00:15:30.680 | You want it to make things up.
00:15:32.240 | So some resources, I'm not gonna be talking about prompting,
00:15:37.880 | not gonna be talking about rag.
00:15:39.440 | These are just some articles.
00:15:41.120 | These are my articles.
00:15:42.080 | If you don't like me, the top of it has people that I respect that are far
00:15:46.520 | smarter than me, so click the links and go there and read those.
00:15:48.800 | Cool, the next one, and this might be the final one, is debugging, right?
00:16:00.360 | I don't think I've heard that many people talk about.
00:16:02.280 | I mean, among people who work with AI, this is a massive conversation, right?
00:16:06.920 | How do you debug?
00:16:07.880 | Because the sort of curse and sort of the benefit that we got with modern AI
00:16:12.960 | things is that it's very easy to build a demo.
00:16:14.960 | It's very easy to get to something that sort of works, but
00:16:17.480 | it's very hard to debug things when they go wrong, right?
00:16:20.040 | That's almost, again, new paradigm.
00:16:21.360 | So what is happening to you, right?
00:16:25.400 | If nothing works, right, always go down to the prompt level.
00:16:29.320 | And if you can't, then get rid of your abstractions and work up from there,
00:16:33.480 | right?
00:16:34.000 | Try a different model.
00:16:34.960 | Try going up a level of intelligence and see if it fixes it.
00:16:37.520 | That should tell you where your problems are.
00:16:39.320 | Or try going down a level of intelligence and see what happens.
00:16:42.480 | The next one is transform the input.
00:16:45.360 | In most cases, it's your input that's the issue.
00:16:47.880 | Either it's too verbose, it's not the right transformation,
00:16:50.240 | it's not structured the right way.
00:16:51.800 | So any transformations you can do on the input is gonna make a massive difference,
00:16:55.640 | right?
00:16:56.200 | And finally, if you're not doing this already,
00:16:57.800 | add more structure to the output, right?
00:16:59.400 | More structure is gonna help you point out where your problems are.
00:17:02.120 | More structure is gonna tell you, sort of expose some of the big issues there.
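(One hedged example of what "more structure to expose issues" can mean: asking for evidence and confidence fields in the output schema, so a bad answer points at where it went wrong. The field names are assumptions for illustration, using Zod as mentioned earlier in the talk.)

```typescript
import { z } from "zod";

// Fields that force the model to show its working. When an answer is wrong,
// the quoted evidence usually tells you whether retrieval, input shape, or
// the instruction itself is at fault.
const DebuggableAnswer = z.object({
  answer: z.string(),
  evidenceQuotes: z.array(z.string()), // verbatim spans the answer relies on
  confidence: z.enum(["low", "medium", "high"]),
});

function inspect(rawModelOutput: string) {
  const parsed = DebuggableAnswer.parse(JSON.parse(rawModelOutput));
  if (parsed.evidenceQuotes.length === 0) {
    console.warn("Answer with no supporting evidence; likely a factuality issue");
  }
  return parsed;
}
```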
00:17:05.440 | Okay, so this doesn't usually happen to people.
00:17:11.000 | This usually does, right?
00:17:12.720 | Which is: it's kind of working.
00:17:15.800 | It's kind of working, and I can spend another two weeks on it, and
00:17:19.040 | it'll get a bit further down the line of kind of working.
00:17:22.000 | But it's not working, necessarily, right?
00:17:24.720 | So again, I'm gonna go back to data.
00:17:26.520 | In most cases, you wanna find out what separates your offensive data,
00:17:30.280 | which is where it doesn't work, to the stuff that does work, right?
00:17:33.400 | Try all sorts of transformations.
00:17:34.720 | One of those is gonna point to some sort of difference between the stuff that works
00:17:39.000 | and the stuff that doesn't, right?
00:17:40.600 | If you do, that's a prompt, right?
00:17:43.080 | More validation's always gonna help.
00:17:45.000 | And then we saw the classification before, right?
00:17:47.600 | If you're trying to do more than one of those things inside the same system,
00:17:50.640 | inside the same, with the same model, usually separate it out, right?
00:17:55.080 | And it makes a huge difference.
00:17:56.000 | Finally, yeah, just classify your errors.
00:18:01.440 | Most errors I've seen sort of fall into these three issues.
00:18:05.200 | You've either got app level issues in terms of how that data's being fed in and
00:18:08.560 | fed out, and how models are orchestrated once things get too large.
00:18:12.640 | Or you get factuality issues, right?
00:18:14.560 | It's just making things up that don't exist, or
00:18:17.240 | it's just giving you information that it really shouldn't, or
00:18:19.800 | pulling out the wrong information.
00:18:20.840 | It's a factuality issue.
00:18:22.360 | The third one's just instruction following.
00:18:24.240 | Is it just not listening to the specific instructions that you're giving it, right?
00:18:27.880 | And this is at the model level, but it happens at the meta level as well.
00:18:31.320 | Even if you're working with, say, three models and 300 prompts,
00:18:34.320 | all of these things still apply.
00:18:35.320 | Okay, so what do you do, right?
00:18:39.760 | The first one is, whatever you're doing, right?
00:18:45.360 | Whatever you're doing as far as prompting and working with models go,
00:18:48.240 | you're almost always too verbose.
00:18:50.040 | Because in most cases, it's English, and once we start adding things,
00:18:53.120 | they kind of work.
00:18:54.120 | So you get to this sort of Pareto level of, it works, but it just doesn't fully.
00:18:59.040 | It's almost how humans behave.
00:19:01.120 | Cut them down, there's usually space to cut them down, cut them again.
00:19:04.440 | The lower your task complexity per prompt, or per task, or
00:19:07.840 | per function, the better, right?
00:19:10.800 | The easier it is to debug, the easier it is for you to have things with defined
00:19:14.520 | blast radiuses, where if something goes wrong, you can swap it out and fix it.
00:19:18.600 | Otherwise, if something goes wrong some day, you're gonna have a problem.
00:19:21.440 | So, how much time have we got left?
00:19:25.800 | Okay, ten minutes, perfect.
00:19:27.280 | So this is just an example of that particular project that I mentioned
00:19:31.000 | at the beginning, right?
00:19:32.000 | So it started with just a specific issue, honestly, it wasn't even me.
00:19:37.160 | It was Hibi, who's actually here, who had a transcript for me.
00:19:39.920 | And she was like, okay, can we make docs out of this, right?
00:19:42.000 | Or I think it came partly from that.
00:19:43.660 | So there was a lot of talking.
00:19:45.400 | There was a lot of trying to figure out what we can pull out,
00:19:47.760 | what it understood out of the transcript.
00:19:49.520 | You're trying to look for understanding.
00:19:51.100 | You're trying to see if this can even be done.
00:19:52.560 | You're just testing very high level hypothesis, right?
00:19:55.240 | Some of the things I tested were sort of trying to pull out structure directly.
00:19:58.320 | Some of the other ones were trying to classify that data before pulling out
00:20:01.120 | structure. You learn just a lot about what it is.
00:20:03.880 | You figure out where you wanna put the transcript,
00:20:05.720 | whether chunking is a valid strategy.
00:20:07.800 | All of that you can learn from just talking, right?
00:20:11.320 | The next one is talk, but then start changing things, right?
00:20:14.700 | Now you start adding steps.
00:20:15.820 | Now you start adding structure.
00:20:17.320 | You start getting information out.
00:20:18.540 | And once you're done with that, the entire thing, and
00:20:22.460 | this actually worked, was just this one script, right?
00:20:25.740 | Really, I mean, you don't have to read that.
00:20:27.300 | It's actually in the repo.
00:20:28.900 | It's just this one script, right?
00:20:31.220 | And really all it did was just loop twice over everything, and
00:20:36.220 | then break it down into sections and use different models to write different things,
00:20:39.180 | right, so there's one model, you know, that's generating the structure.
00:20:42.240 | There's another model that's actually doing the long form writing.
00:20:44.880 | And then the final one is just breaking it down into smaller and
00:20:49.240 | smaller and smaller functions.
00:20:50.080 | So if you look in the repo, it's still not that big, right?
00:20:52.360 | But there's a lot more state management.
00:20:55.760 | There's a lot of self-healing.
00:20:56.880 | There's a lot of correction.
00:20:57.840 | All of that stuff can go in after, like you've proven the thesis.
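(A rough sketch of the shape of that first script, as described: two passes over the transcript, one model for structure and another for long-form writing. The call signature, model labels, and prompts are placeholders, not the actual repo code.)

```typescript
type CallModel = (model: "structure" | "writing", prompt: string) => Promise<string>;

async function transcriptToDocs(callModel: CallModel, transcript: string): Promise<string> {
  // Pass 1 (structure): get a section outline out of the raw transcript.
  const outlineRaw = await callModel(
    "structure",
    `Return a JSON array of section titles that would document this transcript:\n\n${transcript}`
  );
  const outline: string[] = JSON.parse(outlineRaw);

  // Pass 2 (writing): a different model fills in each section from the same transcript.
  const sections: string[] = [];
  for (const title of outline) {
    sections.push(
      await callModel(
        "writing",
        `Using only this transcript, write the "${title}" section of the docs:\n\n${transcript}`
      )
    );
  }
  return sections.join("\n\n");
}
```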
00:21:00.920 | Cool, actually, I'm ahead of time.
00:21:04.640 | I didn't think I would be.
00:21:05.920 | So the final thing, and I will say this,
00:21:08.740 | is a lot of people I speak to are still very concerned about cost, right?
00:21:12.180 | I don't know how many of you guys watched the NVIDIA keynote that happened a couple
00:21:16.860 | of days ago, but long story short, everything you're using now is gonna get
00:21:22.220 | at least 10x, if not 50x cheaper, in very short order, right?
00:21:26.900 | It's gonna get 10x, if not 50x faster, in very short order.
00:21:31.060 | So what would you build if you were building for, say, six months from now, or
00:21:34.980 | what would you make if you just presumed that today, right?
00:21:38.720 | And it's a different way of working with these things.
00:21:40.720 | If something costs ten bucks,
00:21:42.360 | that's a different system than if it costs one cent, right?
00:21:45.440 | If something takes an hour, that's different from if it takes six minutes,
00:21:49.520 | right, so I would say this is a valid presumption to make, right,
00:21:53.240 | when you're building something.
00:21:54.800 | Is what more can you do if you just presume that about the future?
00:21:57.840 | Immediate future, right?
00:21:59.040 | Because we still haven't even gotten hardware level optimizations.
00:22:01.840 | That's what NVIDIA's doing now.
00:22:03.600 | That's a 10x.
00:22:04.680 | Memory level optimizations, again, still coming up.
00:22:07.360 | That's a 10x.
00:22:08.160 | Quantization, that's probably another 10x.
00:22:10.800 | So all of these things are almost being done now.
00:22:13.480 | And they're very comparatively easy, engineering-wise.
00:22:16.640 | It's just incremental optimization to get there.
00:22:19.280 | But cool, that's everything.
00:22:21.160 | Feel free to find me after, or just reach out on Twitter, I'm happy to help.
00:22:24.320 | >> [APPLAUSE]
00:22:29.320 | >> Do you have some questions?
00:22:30.400 | >> Sure, yeah.
00:22:31.040 | >> Yeah, I think I have a question.
00:22:36.400 | We got like, what do you think about long context window models and
00:22:40.640 | embedding models?
00:22:41.940 | >> Okay, so long context is tough, right?
00:22:45.200 | Because I might say something where I don't know what I'm talking about.
00:22:48.680 | That said, this has been my question as well.
00:22:51.080 | The problem with context windows is our algorithm for
00:22:54.120 | attention is quadratic.
00:22:56.160 | What I mean by that, it scales exponentially to get twice as much
00:23:01.020 | context out of something,
00:23:02.000 | you've got to spend four times the amount of memory and compute.
00:23:05.120 | We still have that curse.
00:23:06.280 | There's no way to, we still don't know a good way to get around it, right?
00:23:09.920 | So what that means effectively is to get really long context windows,
00:23:13.860 | you have to cheat.
00:23:15.280 | You effectively have to say, okay, I'm going to have something before I run
00:23:18.600 | the model that's going to kind of figure out which part of the context to
00:23:21.480 | actually pay attention to.
00:23:22.560 | So you don't actually get the full context window, right?
00:23:25.120 | You kind of do, but if you take the full context window and
00:23:27.800 | you're trying to use every single token in it to compute an answer,
00:23:31.160 | it's not going to work.
00:23:32.140 | So that is still very much a problem that could be solved.
00:23:34.700 | That's one of those open problems.
00:23:36.080 | I think it's an open problem that could be solved tonight by someone that's
00:23:39.280 | working somewhere or ten years from now, we just don't know, right?
00:23:41.760 | >> You've mentioned a bunch about transforming the input.
00:23:47.940 | How do you go about doing that?
00:23:50.080 | Do you use AI to transform, is your input global?
00:23:53.920 | >> In most cases, yes, you're going to be using AI to transform it.
00:23:56.400 | But there's also just a ton of structured stuff you can do, right, very easily.
00:23:59.640 | Like most documents, let's say I've got a PDF, or I've got, let's say these slides,
00:24:04.160 | or I've got one of my documents is in Markdown.
00:24:07.000 | There's a ton of structure in there you can just grep for, right?
00:24:09.800 | because I can very quickly figure out what the sections are.
00:24:12.360 | I can very easily separate by sentences.
00:24:14.440 | That's all stuff that you can do today, right?
00:24:16.600 | So even just knowing that that's got 300 sentences in it,
00:24:19.640 | that's a transformation of the input that is valuable, super valuable, right?
00:24:24.040 | Because we can make assumptions already, right?
00:24:26.400 | If I give you a document that someone's written,
00:24:28.480 | I can presume that the title is probably the highest compressed information in
00:24:32.240 | there, right, that is a good enough thing.
00:24:35.160 | I can presume that the first section will have some sort of intro
00:24:38.120 | of what the thing is, right?
00:24:39.960 | Those are all transformations, but yes, usually you use AI.
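(A sketch of the cheap, deterministic kind of transformation mentioned here: splitting a Markdown document into sections and counting sentences per section, before any model is involved. Generic code, not the speaker's tooling.)

```typescript
interface Section { heading: string; body: string; sentenceCount: number }

function splitMarkdown(doc: string): Section[] {
  const sections: Section[] = [];
  let heading = "(intro)";
  let body: string[] = [];

  const flush = () => {
    const text = body.join("\n").trim();
    if (!text) return; // skip empty sections
    const sentenceCount = (text.match(/[.!?](\s|$)/g) ?? []).length;
    sections.push({ heading, body: text, sentenceCount });
  };

  for (const line of doc.split("\n")) {
    if (/^#{1,6}\s/.test(line)) { // a Markdown heading starts a new section
      flush();
      heading = line.replace(/^#+\s*/, "");
      body = [];
    } else {
      body.push(line);
    }
  }
  flush();
  return sections;
}
```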
00:24:42.080 | >> I had a question.
00:24:44.600 | So how do you think about [INAUDIBLE] >> I actually haven't used Devin.
00:24:53.360 | I just have not had the time, but I've had people tell me that it's good.
00:24:56.600 | Look, coding is going to be where these models make just a massive,
00:24:59.400 | massive difference, right?
00:25:00.480 | I already use Cursor, which can understand just a massive amount of
00:25:05.560 | context and sort of work.
00:25:07.880 | It has been six months since I wrote any code that wasn't at least partially AI
00:25:11.760 | generated, so it's just going to keep getting bigger and bigger and bigger.
00:25:16.120 | That said, I will say the time that most devs that I know and
00:25:20.400 | most companies that I know spend is in business logic, maintenance, and
00:25:25.400 | sort of really trying to transform customer input to really massive systems
00:25:31.400 | with a ton of legacy code, like we're a long way away from that, right?
00:25:35.040 | What I mean is it's getting easier and easier for
00:25:37.600 | you to spin up a more and more and more complex project from scratch, right?
00:25:43.000 | But the massive dev work that sort of sits, kind of sits past that, right?
00:25:48.800 | That still hasn't been touched.
00:25:50.280 | >> [INAUDIBLE]
00:25:53.840 | >> The efforts to do something there,
00:25:57.200 | because that's where the money is in a lot of ways,
00:25:58.640 | because that's where most enterprises are, right?
00:26:00.480 | If you look at SAP, or you look at most of these guys,
00:26:03.640 | have not so far borne active fruit.
00:26:05.800 | I know most of the companies in that space,
00:26:07.920 | they're still having trouble getting it to work with very large code bases, right?
00:26:12.760 | Like, let's say anything above a 50% company that's existed for
00:26:16.480 | more than three years.
00:26:17.520 | That code base, so far, AI hasn't been able to touch, right?
00:26:24.240 | >> I had a question.
00:26:24.840 | So I recently read a, I wouldn't say read the paper, I read the abstract, right?
00:26:29.920 | So where it was like, I think from Amazon or from somewhere, or
00:26:33.440 | Netflix perhaps, that getting cosine similarities between embeddings,
00:26:39.080 | it's not really a good measure for getting the meaning of things, right?
00:26:43.800 | And this is a preface for my question.
00:26:46.600 | And also when we do vector searches and just try to pull relevant information,
00:26:53.080 | I don't know, it feels like it doesn't work.
00:26:56.800 | I'm trying to figure out what am I doing wrong, how to do it better.
00:27:01.960 | I watched Jerry Liu's talk from LlamaIndex, it was an 18-minute talk or something.
00:27:07.720 | It's a very nice talk, but it just kind of flew over my head.
00:27:10.720 | So what's your recommendation?
00:27:12.960 | >> I think the problem here is embeddings are sort of fuzzy search on steroids.
00:27:19.440 | If you're using them for anything more, I think even today you have a problem,
00:27:25.280 | right, because a couple of things.
00:27:26.760 | One, these are really tiny models, comparatively, right?
00:27:30.680 | Big brain, small brain, tiny brain, these are really tiny models.
00:27:33.800 | In most cases, they don't have a good understanding of the underlying text.
00:27:36.800 | That's why long context embeddings never made sense, right?
00:27:40.040 | The longer the context, it just doesn't really make sense.
00:27:43.200 | Not to mention, in most cases, that's a transformation of the input, right?
00:27:45.880 | What Hibi was saying, that's a transformation of the input,
00:27:48.200 | is you're transforming it.
00:27:49.440 | Well, you're transforming it into a space where it's a lot harder for
00:27:52.280 | you to work with it, right?
00:27:53.720 | You're transforming it to a set of numbers.
00:27:55.320 | And now the only thing you have is cosine similarity.
00:27:57.960 | You can have a bias matrix, you can push that math a little bit more.
00:28:01.640 | But because that model is unknown to you, the model's workings are unknown to you,
00:28:06.040 | those are forever gonna be a bunch of numbers, right?
00:28:10.160 | In some insanely high dimensional space.
00:28:12.720 | So there's not a lot to do there, right?
00:28:14.840 | What is becoming very possible now, that I see a lot of companies switching to,
00:28:19.200 | is just use the whole brain, use the LLM, right?
00:28:23.720 | Whatever you're using embeddings for, you can use an LLM, right?
00:28:27.120 | It's just more expensive, right?
00:28:30.600 | In most cases, you can use an LLM for that.
00:28:32.440 | Like let's say you're using, I'll give you the most brute force example of this.
00:28:37.320 | Let's say using embeddings to take 100,000 items and
00:28:40.400 | see which ones are similar or which ones are closest to your query.
00:28:43.160 | You can take an LLM, run it through every single one of those documents and ask,
00:28:46.360 | hey, is this close, is this close, is this close?
00:28:48.640 | And you'll get an answer, right?
00:28:49.760 | That is not a good way to do it, do not do it this way.
00:28:51.960 | But you see what I mean, right?
00:28:53.720 | So they are kind of, you can substitute one for the other just a little bit.
00:28:57.000 | I think embeddings have a place, right?
00:28:59.080 | But they should always be the last step in your pipeline.
00:29:01.920 | You should cut down the search space as much as possible with structured search,
00:29:06.520 | transformations, it's a BM25, there's a bunch of stuff you can do, right?
00:29:11.280 | You should never be searching your search space with embeddings, right?
00:29:14.600 | You should always be searching some reduced search space where, hey,
00:29:17.680 | last 20 things, and I know these are relevant because keywords.
00:29:21.840 | I know these are relevant because location.
00:29:23.800 | I know these are relevant because an LLM told me after transformation, whatever.
00:29:27.640 | Now I can embed, that's fine, right?
00:29:30.160 | But if you embed at the beginning, in most cases, it just doesn't work at scale.
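(A sketch of that "embeddings last" ordering: a cheap keyword pre-filter cuts the search space, then cosine similarity ranks only the survivors. The embed function is a placeholder for whatever embedding endpoint you use.)

```typescript
type Embed = (text: string) => Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function search(embed: Embed, query: string, docs: string[], topK = 5): Promise<string[]> {
  // Step 1: structured / keyword filtering cuts the search space first.
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const candidates = docs.filter(d => terms.some(t => d.toLowerCase().includes(t)));

  // Step 2: embed only the reduced set and rank by similarity to the query.
  const qVec = await embed(query);
  const scored = await Promise.all(
    candidates.map(async d => ({ d, score: cosine(qVec, await embed(d)) }))
  );
  return scored.sort((x, y) => y.score - x.score).slice(0, topK).map(s => s.d);
}
```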
00:29:34.360 | >> So it's more like to get the results and then sort it.
00:29:38.120 | Is that where embeddings come in?
00:29:40.120 | >> More like to get the results, and yes, kind of to sort it, but
00:29:43.200 | kind of also to identify useful parts of those results.
00:29:46.400 | Let's say the results you got were pages, but you want sentences, right?
00:29:49.600 | You wanna know which part of it is heat map wise the most important.
00:29:53.720 | You can use embeddings for that, right?
00:29:55.040 | All right, thank you so much.
00:30:04.260 | >> Yeah.
00:30:04.760 | >> [APPLAUSE]