back to index10x Development: LLMs For the working Programmer - Manuel Odendahl

00:00:00.000 |
Hello, are you all awake and you just have your coffee, that's great, welcome to my workshop 00:00:21.380 |
about LLMs for programming or like how to become a 10x engineer, which I'm like actually pretty 00:00:27.240 |
serious about. I put some handouts out, so I didn't print like 200 of them, but if you 00:00:35.220 |
can sit next to someone who has some, there's like a couple lying around in this part of 00:00:39.380 |
the room, and in general I encourage you to have a partner, two partners, because it's 00:00:44.520 |
a workshop and it's like nice to share ideas. There's a URL that's to a GitHub repo which 00:00:51.380 |
has nothing in it, but it has the handouts as PDF and it has like a list of links or something 00:00:58.100 |
like that. I encourage you to join the Slack channel, because of the size I won't be able 00:01:03.640 |
to really interact with individuals, but I will be able to through Slack. The only prerequisites 00:01:13.460 |
you need are an LLM, so I use OpenAI just like the chat interface or like CloudSonet is great, 00:01:22.440 |
but you can use like if you want to have fun and use like a small one on your laptop, you're 00:01:26.040 |
welcome to. Find.com is also great, like anything that's all you need. No code, no Git fork, no 00:01:34.460 |
nothing. And with that, I'm manual, I'm like a software engineer by heart, I knew I wanted to be a 00:01:43.640 |
programmer when I was six, and when GPT-3 came out, I like co-pilot, I was like what the, and started 00:01:52.220 |
just like doing everything I do usually after 25 years of being a professional software developer 00:01:58.800 |
with LLMs and just like keep finding new techniques until today. I kind of am not on social media, 00:02:04.780 |
I don't really follow what other people are doing, so this is kind of like all homegrown 00:02:09.520 |
kind of stuff, but always with the focus of like actually doing like e-commerce work, PHP programming, 00:02:15.040 |
just like down to earth stuff. And so I'm going to show first just like a few general concrete 00:02:21.920 |
techniques. Like I think a lot of programmers don't really vibe very well with the technology 00:02:28.800 |
especially because the hype around it is like it's going to replace you. Us. Which is kind 00:02:35.120 |
of true. But so I'll give like a bunch of techniques just like to ease and getting into it. So who of 00:02:43.040 |
you are programmers? Cool. Is there someone who's not a programmer because you can still stay because 00:02:51.600 |
it's everything will still apply actually. Who's using LLM's co-pilot, well I'll put co-pilot a little 00:03:00.960 |
bit apart, but like LLM's code generation for their work? All right, cool. I guess that's why you're coming 00:03:07.920 |
to the fair, huh? So after a bunch of general concrete techniques that I use which I feel are pretty 00:03:13.920 |
important I'll go into the main concept of how I approach programming with LLMs which is like treating them as 00:03:19.920 |
as translation engine. Not like as something that reasons or something that like writes code and can 00:03:26.240 |
figure things out with agents. It's just like oh they're good at translating one type of language to 00:03:30.400 |
another. Which is honestly what we do as programmers as well, right? And we take GitHub tickets or whatever 00:03:37.120 |
and we convert them to code. Or the other way around when the code doesn't work we create GitHub tickets. 00:03:42.880 |
Then a pretty important, this is like a new concept I haven't really played with but it's so massive is 00:03:50.240 |
treating LLMs as like word simulators and what we can do with that in the context of just like pragmatic 00:03:57.040 |
programming. And finally a bunch of like what I try to distill is like what are the skills that we need in 00:04:04.080 |
this new age of software developers to become really effective because they're very, very different from 00:04:08.480 |
from what they were in the past. But some of them stay the same which is also good. 00:04:14.800 |
So what I'm going to do is like I'm going to go through these pretty quickly. I wrote up, 00:04:22.160 |
I wrote them up on the handout and the handout you'll see these gray blocks. They're concrete examples of 00:04:28.560 |
just prompts. Like I don't want to paste 20 pages of GPT transcript that no one cares about. But what you can do and try to maybe it will work just like paste these prompts. 00:04:30.480 |
into chat GPT and say what happens and like there's some sequences which might or might not work depending on what the model outputs but hopefully you can follow your way around. 00:04:46.800 |
And there's links which obviously don't work on paper to some of the transcripts I just did yesterday. But really it's about you trying them out. Like don't try to, these are not examples you should follow. 00:04:50.800 |
this is not the way to do it. It's just like an example of applying what I really want to get at. 00:05:04.800 |
And I'll give you about 10 minutes after showing you these things to basically communicate with your neighbors, try to solve some problems that you have, come up with some or just use some of the examples I show 00:05:18.800 |
to get a sense of how these things work. So first technique which I see a lot of programmers not doing is not regenerating. 00:05:29.440 |
You'll come in, you'll put some prompt in it and I'll be like either it outputs something great and you're like holy crap or it outputs something that's buggy and you're like ah it doesn't work. 00:05:37.840 |
But if you regenerate 10 times you'll see like oh it maybe works three times and five times it outputs something that's like almost working and then two times it's outputting something that's completely random. 00:05:48.240 |
The API works better for that because you can up the temperature like I realized chat open AI used to be much more wide in its outputs. 00:05:57.440 |
So if you regenerate it often comes out with the same stuff. 00:06:02.640 |
Second one is like instead of trying to correct the models and like no this is wrong this is not the right method and stuff like that just go back and edit either your prompt or if your UI allows it like LM studio or Libra chat like other frontends allow you to edit the LLM's response. 00:06:17.440 |
So if the LLM outputs bad code just go back fix the code and then the LLM thinks it output the right code. 00:06:27.640 |
Then a next technique is to often clear the context like I don't think I do more than three four prompts in a conversation and then I just take the output at the end just start a new conversation that tells me if you know my technique is robust. 00:06:43.640 |
Because I can't you know if I have a 20 page transcript I don't really know what's going on. 00:06:48.840 |
I don't know what the context has been summarized to and usually by the end you're just like in LLL and it works much better with the bigger models now. 00:06:56.840 |
So you know maybe this doesn't apply anymore but it's book good practice anyway experiment with different models because they all kind of have different vibes and you get to feel these different vibes and how they change over time or like how smaller models might just need more coaxing. 00:07:10.840 |
But can actually do the same stuff than the bigger models and this is all through practice. 00:07:18.040 |
So like practice practice practice practice practice. 00:07:20.040 |
I have the trouble that I just get rabbit hole when I have a task to solve I try to write the prompt that solves a task and then I try to write the prompt that generates the prompt that solves the task. 00:07:39.240 |
Just make like custom system prompts that you can put into like chat open air whatever don't worry about changing them like if you have a session today where you're going to write PHP just change the system prompt to be PHP oriented like which libraries you use maybe what code formatting you use. 00:07:53.240 |
Just because it saves like a ton of time that's a very concrete technique there's a prompt on how to do that. 00:08:00.440 |
And I want once I get to this slide maybe we can try it out. 00:08:07.640 |
Generate helpers like every time you have a problem just ask the model to write a script to solve it right like often I put stuff in my clipboard and I have to like edit it to put it back into my code. 00:08:15.640 |
Maybe the imports I don't want to always like have to remove the imports when I paste into my ID so I can ask like a GPT write a shell script that just opens the clipboard and an editor and or open the shell script and just open the clipboard and remove the imports at the beginning. 00:08:32.840 |
It's like a simple sad three liner just generate it and at the end of the day you can throw it away or save it. 00:08:38.840 |
And this might be the most important technique summarizing your transcript and I want you to do that today to exchange with your neighbor when you try something at the end just like summarize what I did as a read me as a how to as a wiki entry as a RFC and then share it with your neighbor so that they only have to read like one clean document and not like 10 pages of whatever. 00:09:02.040 |
So I really want you to do that today if you're like working with your neighbor which I really encourage at the end summarize the transcript and just send them that even if you do like rabbit holes I'm fine trying to solve a problem right like oh what's this type script bug that I have at the end you can summarize your search and just send it over and say like look this is this is how I solve that here's all the links if you want to look at it. 00:09:25.240 |
So that's something you would never do before when you were readings like overflow and like Google searching you would never send someone your Google search history. 00:09:31.840 |
But now you can and a final technique is like when I ask it to when I try to figure out how to do something I use ridiculous domain examples like instead of working on my app which is often going to be kind of a meta app right like if I do infrastructure. 00:09:49.340 |
That's going to be like programming jargon and the model might get confused because I want to you know create a script that create scripts or manage the scripts. 00:09:58.740 |
So instead I always ask it to do like you know manage dinosaurs or like zoo animals or like write an operating system about Tolstoy and then you see in the output. 00:10:10.940 |
What is Tolstoy relevant is like what it fills into the pattern and then when it says like Colonel driver you know okay that's Colonel driver but if it says like Colonel driver system function interface you're like you're not sure if it's like a driver about functions if it's the function of the driver like. 00:10:25.940 |
So I have an example here like write an Arduino Arduino Arduino for remote controlled T-Rex and I'll show that on screen and while you know for the next 10 minutes I want you to try out some of these things like you know write some helpers like create some system prompts or summarize some transcripts that you have research how to how bourbon bourbon is made and summarize you know your research to share with your neighbors like just. 00:10:54.940 |
Try different things and while you're doing that I'm going to watch the slack if you have cool examples you can also open PR's on the GitHub on the on the GitHub repo so that everybody can see them later on and I'll answer questions and if no one has questions I'll just do my own stuff in the browser they can look at for inspiration so before we get into that like maybe I'll just show you one of these of these prompts. 00:11:23.940 |
And you know the code doesn't need to work but it like gives me an idea of like different concepts. 00:12:00.940 |
And here you can see right like the tasks are prey detection. 00:12:03.940 |
I know okay like it understood that the task name is something different than the tasks thing itself. 00:12:09.940 |
But so here I can tell that it used free artists and it's like no implement and artists from scratch and then you can just like rabbit hole right so going to implement it from scratch and I'm going to summarize the transcript and then you can see how that works and how useful it is. 00:12:25.940 |
So this doesn't necessarily need to work right but it gives me a bunch of stuff to look at and try to Google maybe. 00:12:35.940 |
But then I'm like okay well where did it kind of fetch this from. 00:12:40.940 |
You can probably Google it because it's in the training corpus. 00:12:42.940 |
And that gives you like a good start of either a lot of boilerplate that you want to change right. 00:12:48.940 |
And then maybe you're like okay well I know how to do a timer interrupt I will replace that with my own. 00:12:55.940 |
You can edit the LLM's response to put in your timer interrupt and regenerate and it will be like oh this is how you write interrupts. 00:13:01.940 |
I'm going to write interrupts that way as well. 00:13:03.940 |
And you have a really efficient thing because it's still two interactions with the LLM right like fast inference and stuff like that. 00:13:11.940 |
Summarize this as a read me for my colleagues. 00:13:16.940 |
And then this is like high value when you work on a team and you can just be like look this is how my artist works. 00:13:30.940 |
You can say like add examples list of caveats only output new text. 00:13:40.940 |
I will make many typos the model doesn't care. 00:13:56.940 |
If you want to share ideas of stuff to build like feel free to do it so that people who are you know early morning. 00:14:04.940 |
But can you later on asking the slack that way I don't have more people can do it. 00:14:10.940 |
I just wanted to say that it works nice for mermint the charts. 00:14:31.940 |
When you talk about doing something just type that into the LLM and then we'll do it. 00:18:14.800 |
Could you spend time to join the Slack channel? 00:18:20.840 |
So the Slack channel, to join it, the Slack channel is called LLM working programmer, workshop LLM working programmer. 00:18:28.960 |
Sorry about that, maybe I'll just leave this light on you have the handout, maybe, or, and I'll just mess around in a different tab. 00:23:46.040 |
I'm going to take a minute showing what I did. 00:25:15.980 |
In Slack, so that everybody can see them or make a pull request to the repo. 00:25:21.420 |
And for the sake of time, because we have 50 minutes left and I have a lot to show, let's 00:25:46.500 |
Because no one knows how LLMs work, no one knows how LLMs work and we never know how LLMs work, because LLMs are basically these weird computational artifacts that have been trained on whatever gigantic corpus we've trained them on, which no one has the time to read, let alone understand, right? 00:26:01.580 |
So no single human will ever understand what is the model, you can just kind of poke at it and connect it to what we know, and so in a way it's not like a computer technology or a math technology or whatever we want to call it, it's like a cultural technology. 00:26:17.000 |
It's like based on human culture and if you know your way around human culture, you'll be able to squeeze out more of the model, not just programming, right? 00:26:26.100 |
So it's all about language, the model really doesn't care if it's programming language, if like a formal language, or if it's like German or symbols or whatever, because the only thing it does is transform tokens into tokens and it looks back at its own tokens doing so. 00:26:41.680 |
So even if we say like, oh please output JSON, it's like, I'm going to output numbers, probabilities of all the numbers that exist, but so if you make the switch away, right, like you're writing code, but like there's no reason that what is in the code isn't actually also like language, suddenly it makes a lot of sense of calling a variable like year over year revenue and not call it A, even though as programmers we think like, ah, that doesn't really matter, right, like, but actually the model, it matters a lot. 00:27:11.660 |
Because one is like, the token A, the other one is the token year over year revenue, and then suddenly it's able to access all the like, ah, finance literature to do, to write your code, even though you think that's like just, you know, um, a stupid dashboard, so understanding that means that suddenly things like theater, poetry, marketing literature, like textbooks, mathematical formulas, all of that is like fair game to write code. 00:27:41.640 |
So for example, theater, I bring that up as the first one, the models are trained to be nice, right, like they're, they're re-instructed to be like, yeah, I, I, I understand, like please, ah, please tell me what to do, I'm, I'm happy, and you lose a lot of conflict, which is useful, like when you do a code review, I don't want someone to be nice in the code review and give me platitudes, or like, you know, like, shit sandwiches. 00:28:03.640 |
I want something, I want something that just like challenges every line I give it, and so if you say like make a theater play, code reviewing my code with like six different people just like arguing about it, then suddenly in one inference you have like a really good code review instead of someone telling me to handle my errors. 00:28:18.640 |
So that's kind of the main thing, right, it's like thinking of them as culture, not as technology, ah, or as cultural technology, ah, and so the way to approach is, I think of them as translation engines, right, like there's many, many different kinds of languages, like when I talk to my mom, I have a different language than when I give a talk, I have a different language than when I write Python, I have a different language than when I write Python, I have a different language than when I try to write like a math paper, um, and so some of them are like from less formal to more formal to more formal, 00:28:48.620 |
Once you step into the formal language world, you can use a computer to interpret it or do stuff with it more easily, um, but you can also, you know, transfer from one domain to another, you can go from the T-Rex domain to the real-time operating system domain or mix both together, um, it's just tokens, right, um, but so what that means is writing software now is about decomposing the problem that you want to solve into language translation steps and then let the model do these language translation steps, understanding which of these steps 00:29:18.620 |
are too much for certain model, or how to do them, like how to prompt it to say, transform, you know, when it outputs JSON, it doesn't output the right JSON, you might put in like, this is important for your career, that's like kind of a cultural technique to get it to output JSON, which, which is a little bit weird as a programmer, um, so here's a bunch of examples, and like, I'm not gonna go through every graph, but like, feel free to look at them in the handout, um, this is, for example, an example of how to transform, 00:29:48.600 |
for meeting transcripts, or like on Zoom, or you have like your software planning meeting, everybody interrupts each other, it's like a big chaos, no one's gonna go back at that transcript and try to understand what was discussed, uh, but now you can, right, and so the steps to do that, because if you just ask, like, tell me what we discussed last week, they're getting good at that, like, like, it's a little bit scary, but also, it used to really work not very well, 'cause you, you have like, 28 pages of transcripts, with a lot of, like, just people interrupting each other, 00:30:18.580 |
and using colloquial language, so the different steps to do it, that I like doing, is like, first I wanna get a summary, because Claude has gotten really good at, like, figuring out, actually most of the information that's in there is a bullet point list, it's not good at details, I just want, like, which topics did we discuss? 00:30:33.580 |
Bam, five bullet points, then I take this translation step, right, from transcript to five topics we discussed, and then for each topic I go back and I select, for each of these topics, tell me exactly what we did, and now it has, like, a direction to know what to grep for in this huge transcript, that's only relevant to it, so you get this, like, pretty good result. 00:30:54.580 |
People call that, like, there's plenty of papers about it, I think, right, like, with complicated names or whatever, but technique is really just translation steps, it's like, from a topic to get all the details to that topic, no need to have, like, really fancy names about it, no need for fancy prompts, you just go back, paste the summary, and then for each point in the summary, like, give me the details. 00:31:15.580 |
Then the next step is, once you have these technical details, and you take the transcript again, you say, like, oh, give me action points, like, who's going to do what, which we discussed in the planning meeting, and just give me a list of, like, oh, Enrique's going to do that, and Sarah said that maybe we should do this, you know, and you get, like, a clean list of action points for each of these, like, fairly detailed technical summaries of each transcript point. 00:31:38.580 |
If you do that one shot, there's no way that that will work. 00:31:41.580 |
But if you decompose in these, like, little translation steps, then you get it, and at the end, you can just take the action points and the technical details we discussed, maybe paste some of your code base in there, and create the GitHub issues, which already have, like, a plan laid out, they're assigned to the right person, you can ask to add tags to it, the way you do tags, like, name them correctly with whatever ticket naming convention you use. 00:32:05.580 |
So that would be an example of, like, a workflow for transforming transcripts into GitHub issues. 00:32:11.580 |
No need to really automate that, like, it's just a list of things you can paste, and then in the conversation, if you're seeing it veering off a little bit, you say, like, oh, no, please tell me what Enrique said about XYZ, I remember it. 00:32:24.580 |
I don't want to script and, like, kind of formalize that, just put in transcript tomorrow, I'll forget about it. 00:32:30.580 |
So, I have a couple of scripts to help me do prompts, and I'll get back to those, but I zero-shot everything, like, haven't done agents, whatever people call agents. 00:32:45.580 |
You can do other techniques, right, like, take this meeting transcript and the existing docs and code, and just say, like, oh, we talked about refactoring this thing, and now give me a concrete plan. 00:32:57.580 |
Like, tell me which API should be renamed to what based on what we discussed. 00:33:02.580 |
It's like I need to look for the functions or whatever. 00:33:04.580 |
Just paste all the names, say, like, how do you rename them. 00:33:07.580 |
Write the script to rename them with said, you know. 00:33:11.580 |
You can add in, like, the technical details we worked out, right, from the summary. 00:33:15.580 |
We had all these, like, oh, we discussed this way. 00:33:17.580 |
You can maybe put in, like, your RFC guidelines. 00:33:20.580 |
You know, this is how we write RFCs, and then you get an RFC out of the refactoring planning that we discussed in the meeting. 00:33:25.580 |
But with the right function names, maybe with a script or an actual refactoring that you can put into a test branch, for example. 00:33:34.580 |
And then you have an RFC that people can discuss, right, and then you can go back. 00:33:38.580 |
RFC comments, just put them back with the RFC and the technical details in the summary and just, like, update the RFC. 00:33:54.580 |
This one is, like, once we have the GitHub issues, right, from this transcript, I can formalize them, because just, like, output them as YAML. 00:34:00.580 |
Like, make me title, tags, assignee, ticket number, link to your whatever. 00:34:10.580 |
And then you have, now you have a formal representation that a computer that we can write programs for, right? 00:34:15.580 |
So then I ask it, like, oh, make a shell script to just take this YAML and create the issues for real, like just use the GitHub API. 00:34:21.580 |
Models are usually pretty good at zero-shotting that. 00:34:25.580 |
If they mess up the GitHub API like they used to do, just paste the API in there. 00:34:31.580 |
And then you can take this script, which is pretty simple, but iterate on it, right? 00:34:35.580 |
Like, add a help, add a readme, add more flags, maybe confirmation, add, like, colors and little emojis, make, like, an NPM package for it that everybody can install. 00:34:47.580 |
That's all, like, little translation steps going from a JavaScript to the package JSON. 00:34:55.580 |
But when I do it, like, it's really annoying and tiring. 00:34:59.580 |
And then I'll, like, forget that you have to write, like, ID uppercase and ID lowercase, and it costs me the afternoon. 00:35:09.580 |
Like, because I was talking about it, but if the model, if I just pasted it into the model, I would be there. 00:35:15.580 |
So, now, you can actually talk, okay, now that we have the script, how do we change our workflow within the team, within the company, within the open source project to just automate it, right? 00:35:28.580 |
Like, we have a meeting transcript where it's, like, create GitHub issues, boom, it's all kind of automated. 00:35:34.580 |
You maybe want to look at the YAML before you press fire, right? 00:35:38.580 |
But there's a decent chance that at least you get a lot of boilerplate and you can replace, like, a couple of LLM fringe-speak. 00:35:49.580 |
What you can do now is just, like, take the script and the Slackbot API and say, like, make a Slackbot for this. 00:35:55.580 |
Notify the people that have been assigned the tickets. 00:35:57.580 |
You take your GitHub transcript, paste it to the bot, and then everybody gets notified of their GitHub tasks that come out of it. 00:36:05.580 |
And then I did some meta-meta stuff, which I'm going to skip. 00:36:10.580 |
If we go back to this workflow now, right, which probably took us, like, maybe an hour to build all of this. 00:36:16.580 |
A good chance that, you know, if it's annoying to get, like, the Slack access API token, or if you like some OAuth stuff, 00:36:23.580 |
you just, like, write me a script to get a Slack API token. 00:36:30.580 |
And, but what you can do now is that as part of these prompts that we give it, every individual developer that you have on your team, 00:36:39.580 |
they can, like, put in their preferences, right? 00:36:41.580 |
They could say, like, well, I don't know about this library, so explain it a little bit more when it goes about this thing. 00:36:47.580 |
You know, my background is this and this and this, but currently I'm busy with this other part, so only tell me about parts that overlap. 00:36:53.580 |
You can, that's just, like, a TXT you can have on your GitHub, right, or somewhere, but every time the Slack bot, for example, 00:37:02.580 |
is going to create the GitHub issues assigned to you, it's going to put this thing into the context, 00:37:06.580 |
and then you will have, like, tailored documents that explain how to do something. 00:37:11.580 |
So, you know, if you have the intern, and you know the intern doesn't really know much about APIs or, like, HTTP, 00:37:17.580 |
you can just put in their prompt, like, explain HTTP, as it relates to this ticket, 00:37:23.580 |
or explain our internal infrastructure deployment script. 00:37:28.580 |
Onboarding, super easy, just have, like, an onboarding TXT that's ready to go, 00:37:33.580 |
explaining, like, the architecture of your code base, just put that into the new developers' GitHub issues, 00:37:48.580 |
So, for the sake of time, you can go through some of these. 00:37:52.580 |
Maybe I'll show, one that's really funny is, like, do you want to write API clients? 00:37:58.580 |
It's like, I record, say, I'm in a browser, and I click on things, 00:38:01.580 |
and I want to write, like, an API for the cloud UI, which they don't want me to do, but they have. 00:38:07.580 |
I just record the entire session, clean it up a little bit, and then just say, like, 00:38:11.580 |
well, write the documentation for this mass of HTTP requests that I have. 00:38:17.580 |
And once I have the API documentation, I'm just like, well, write me, like, a proxy. 00:38:20.580 |
And then I have a proxy for cloud AI's newest thing. 00:38:40.580 |
Maybe some code base or preferences that you want. 00:38:42.580 |
Create a wiki entry for your personal knowledge base. 00:38:45.580 |
So you want to learn about advanced TypeScript, Lambda things. 00:38:50.580 |
Take your existing knowledge, saying, like, my background's in Ruby. 00:38:53.580 |
Give me some exercises and project ideas to build. 00:38:56.580 |
Maybe use the guidelines for my company to format these exercises so I learn two things at once. 00:39:04.580 |
And then maybe once you have a prototype, you just, like, paste your actual API and select, well, just make it real, right? 00:39:11.580 |
But those are all, if I just said, like, you know, solve the problem and teach me how to do this, it's not going to work. 00:39:17.580 |
But if you decompose it to these little steps at each point, not only do you get value of doing it because you see the output, 00:39:23.580 |
but you also get, like, artifacts you can share, for example, right? 00:39:26.580 |
Like, once you explain how Lambda TypeScript stuff works and your colleague also wants to learn it, 00:39:31.580 |
you just give them the exercise of the transcript. 00:39:36.580 |
Maybe you put their personal prompt into it and their background's, like, in Lisp and then it will just update it. 00:39:43.580 |
So you can take some of these graphs in the transcript. 00:39:46.580 |
I'm going to give you, like, ten minutes really sharp so that I get to the mind-melting stuff. 00:39:58.580 |
If you know about DSLs, if you don't, you really should look it up. 00:40:01.580 |
LMs are great at creating a language, so you can actually create the target step itself. 00:40:06.580 |
You can say, like, well, I want to write reports, just, like, create a language to write reports. 00:40:09.580 |
And then you can target that and, you know, implement the interpreter for that language. 00:40:15.580 |
So I really -- there is an example set of prompts in here about DSLs where I create, like, a text adventure game. 00:40:24.580 |
I did that, you know, this morning at breakfast. 00:40:33.580 |
Creating adventure games, three lines of prompt, and I create, like, an alien-inspired adventure game with, like, a full source code and stuff. 00:40:43.580 |
A good target is also self-contained HTML plus JS plus whatever your favorite library is. 00:40:56.580 |
There's a couple over here, and then if you have neighbors, I recommend just working with your neighbor as well. 00:41:23.580 |
It's, like, yeah, I think that was the last handout. 00:41:34.580 |
If you want to see the repo again, I guess it's in the Slack channel. 00:41:41.580 |
So, the Slack channel is LLM workshop working programmer. 00:43:44.580 |
I'm going to accelerate a little bit, like four minutes. 00:44:12.580 |
You should be able to write like five programs in that time. 00:44:21.580 |
Like try out my prompts and just replace create a YAML DSL for the program you want to write. 00:45:08.920 |
If you're in Slack, are you in the AI engineering Slack? 00:45:14.600 |
Yeah, just look for workshop and then you should find it. 00:48:54.720 |
I'm moving on because I've got some really important stuff to show you. 00:48:58.840 |
And one is, so Josh, no, Brenton just asked, like, have you found LMs able to work with PCAP 00:49:06.540 |
I'm like, not really, but like create a program that outputs the PCAP format in a format that, 00:49:11.720 |
you know, in a language type that the model can understand. 00:49:15.040 |
So I said, write a YAML DSL to represent interesting DNS traffic out of the PCAP file. 00:49:21.280 |
Like, this is language that the LLM can do well with because it's called DNS traffic, destination 00:49:26.960 |
It's not called like byte 0x5 or like DNS underscore, I don't know, whatever. 00:49:32.860 |
And then I'm just like, write a Python file that takes a PCAP file and outputs that YAML. 00:49:37.320 |
And now I have a script that I can give every PCAP file, outputs that into something that DLLM 00:49:41.780 |
can use, and I can put it into my LLM agent loop or whatever. 00:49:47.380 |
That's what I mean with 10X engineering, right? 00:49:50.060 |
Like, in the three minutes here, I wrote another one because, so, what Josh was asking, right, 00:49:58.960 |
is like, oh, when I put in code base in my thing, like, what does that mean? 00:50:04.320 |
But usually when you interact with a code base as a human, you either interact with a small, 00:50:08.060 |
certain part, you read some docs, you maybe look at the function APIs. 00:50:12.080 |
So just create a YAML DSL to, like, grab your code base for interesting stuff, right? 00:50:16.820 |
So maybe I want to get all the functions in Python files as well as the classes. 00:50:22.420 |
So I said, like, invent a YAML DSL to create a program that finds functions and classes in 00:50:32.960 |
If I don't like the idea, just regenerate until I have a language that matches what I want. 00:50:38.220 |
Here it added, like, recursion, which I like. 00:50:41.660 |
And I was like, well, I also want, like, the markdown files. 00:50:44.400 |
So I'd, like, made up, you know, two things for the markdown. 00:50:47.940 |
And then my third prompt is just, like, implement it, please. 00:50:50.520 |
And then now I have a Python file with a YAML that I can tell how to look for stuff in 00:50:55.940 |
And then, well, now I can run it on my massive code base. 00:51:00.060 |
And I just have the function titles, the doc strings, the markdown document titles. 00:51:03.800 |
And that's a good context to give to my DSL, right? 00:51:11.840 |
Like, in real life, I would say, like, this is an afternoon project, maybe, to make it really 00:51:16.240 |
nice with the readme, with unit test, with, like, examples, and install package, whatever. 00:51:25.280 |
Like, this is maybe one of my main techniques, is, like, writing these little fragments that 00:51:29.100 |
I know are good context to do a certain task. 00:51:32.180 |
So some of them could be, like, oh, write a new widget to do something. 00:51:35.960 |
And then I'll actually add, like, grep for these widget names, maybe add the documentation. 00:51:41.900 |
Out of the Git log of last week, I'm going to create a how-to, and then that's going to be 00:51:48.200 |
And now I have, like, a little script that I can, every time I want to write a new widget, 00:51:51.640 |
now I just paste the result of that script in, so that's the life status of the codebase. 00:51:57.300 |
And then, usually it does pretty well, right? 00:51:58.960 |
Like, if you give it, like, first write the props, write this and write this. 00:52:03.420 |
So this thing, and I do this for third-party libraries, right, which often don't have, like, 00:52:08.280 |
I'll go over the codebase and just generate good documentation that I just need for, you know, 00:52:14.000 |
for the task at hand, and then it takes me, like, a minute, usually. 00:52:18.120 |
Sometimes it'll take me, like, the whole day. 00:52:21.520 |
But, um, pretty good technique, there's a graph on how I do it. 00:52:25.360 |
Um, and now I'm gonna quickly switch to the wild stuff. 00:52:33.600 |
So, word simulation, right, like, everything you tell the model is what it thinks reality 00:52:40.160 |
will be, or whatever it uses as patterns to grab its training corpus. 00:52:43.720 |
If I tell it you are a wizard, it will pretend to be a wizard. 00:52:47.700 |
If I tell it to write poetry, it will write poetry. 00:52:50.200 |
If I tell it, like, time travel exists, it will say, like, sure, time travel exists. 00:52:54.200 |
So anything that's kind of in the training corpus, so things that humans can think up so that 00:52:58.260 |
you can think up, the model will, like, usually know what to do with, right? 00:53:02.760 |
That also applies to, like, formal languages. 00:53:05.320 |
If you say, like, your world is, like, def function paren, whatever, and it's like, yeah, 00:53:11.720 |
Um, so, one thing you can do, for example, for code reviews, right, is, like, I tell it 00:53:16.700 |
you're a reality TV show, code survivor, and, like, each participant in this reality TV show 00:53:24.320 |
is one of my variables or functions in this piece of code, now have them, like, fight each 00:53:29.960 |
other and figure out who's the bad one, right, and then vote them out, like, who's got the 00:53:39.900 |
So this is based on the concept, like, whatever you tell it it is, I'm gonna use what's in 00:53:44.140 |
the training corpus, like, reality TV drama, um, so I'm gonna write some bad PHP code, 'cause 00:53:56.280 |
And I have a GPT for that, that I shared, um, but really, you can, like, write is the idea, 00:54:02.780 |
it's not the prompt, uh, I don't really do any prompt refinement engineering, except when 00:54:07.940 |
I want to repeat something in a, in a loop with a small model, um, I make typos and all 00:54:13.500 |
that stuff, so now I have, I have this, uh, I have this bad PHP, right, and I'll just paste 00:54:17.340 |
it into my code survivor thing, and this has multiple, right, so the contestants are, are, like, 00:54:25.180 |
different variables, and I told in the format and the prompt, you know, just, like, each one 00:54:31.000 |
gets to say one thing and just, like, attack someone else, identify the conflict, and then, 00:54:37.560 |
there's some kind of format in there, and, and one of the advantages is, like, A, I get conflict, 00:54:41.660 |
right, even though the model has been trained to be nice, this is, like, standard red team 00:54:45.880 |
tactics to just exploit LLMs, this works really well to do this stuff, just have it pretend to 00:54:53.000 |
It allows me, back when ChatGPT was, like, really annoying and said, like, please fill 00:54:57.920 |
in the rest, dot, dot, dot, um, which it doesn't do anymore, this would allow me to say, like, 00:55:02.840 |
make six rounds, and it would do six rounds, right, like, it was so strong in the prompt, 00:55:07.940 |
that I could, in one inference, right, like, compare that to code review my PHP, this has already 00:55:13.800 |
a lot of concrete information in it, and I can, if I don't like it, like, write three 00:55:18.020 |
more things, and there is the comeback participant, and it doesn't need to be a variable, right, 00:55:25.960 |
it's like, uh, uh, SQL injection, it can be a concept, like, concept can be a character, 00:55:32.140 |
it can give them, like, a personality, um, and then SQL injection will start to be, like, 00:55:39.800 |
dissing other people, which is pretty funny, um, but it's useful, right, like, this is proper 00:55:45.800 |
engineering, it's not, like, it's not fun, it will actually tell me, like, you should use 00:55:50.500 |
MySQLiClose when you finish this, and you forgot, like, I'll vote you out, um, and so once you 00:56:00.000 |
have all this garbage, which I don't even really need, it's, like, funny to look as it scrolls 00:56:03.720 |
by, because they're like, hey, we're just, sanitation is not our job, whatever, 00:56:09.760 |
um, at the, oh my god, this just goes on for pages and pages, which was really hard for 00:56:16.940 |
GPT to do back then, like, it would be like, oh yeah, I think this variable's bad, like, 00:56:24.900 |
Then you can do, like, write a sober code review report at the end, um, just based on everything 00:56:29.940 |
that's in the context, like, now strip the reality TV part out of your language and just 00:56:35.220 |
keep, just keep that stuff, right, um, and actually suggest fix, and I could then turn 00:56:41.560 |
that into GitHub issues or whatever, uh, you get the idea, um, but so this is the world simulation 00:56:47.560 |
thing, is that you can really make everything up, like, one technique that I have is time 00:56:51.280 |
travel, because when you correct the model, it's, like, useful information, right, it tells 00:56:55.440 |
the model what not to do, but if you have 25 messages telling you not what to do, it will 00:57:00.620 |
confuse it a little bit, so I'll do my 25 messages, I'll get to the right code at the 00:57:04.820 |
end, I'll select summarize what you did, and then, ah, thank you. 00:57:13.380 |
It will be, like, um, summarize what you did, then I'll rewind the 25 things, and I'll paste 00:57:17.720 |
in, like, this is you from the future, this is what you tell your past self, and now I have, 00:57:22.780 |
like, this pretty good, and it knows what to do with time travel, people coming back from 00:57:27.960 |
the future telling, like, your young version what to do, and it, like, it's able to generate 00:57:32.300 |
better code out of that, so, weird engineering concepts, um, so try it out, like, it, this 00:57:41.420 |
is not, like, this is not fun, this, this actually works, right, um, presidential debate is great, 00:57:48.160 |
uh, Greek drama is pretty funny, um, but where it gets really wild are prompts, like, you are the 00:57:57.960 |
application I want to build, please start, right, and now suddenly, like, going to pretend to be 00:58:05.020 |
what you want to build, and so you can tell, like, like, please output your UI as, like, a concise 00:58:10.080 |
DSL, and it's like, well, I've got, like, a main widget, I've got, like, a text area, and I've 00:58:14.200 |
got this, and I've got this, that's formal language that you can already, like, visualize, 00:58:18.880 |
um, but also you can be like, well, no, I want a sidebar, and it's like, yeah, sure, you have 00:58:22.560 |
a sidebar, uh, so, like, click the third element of the sidebar, and we'll pretend that I clicked 00:58:26.580 |
the third element of the sidebar, right, um, so just to show you, uh, you are an application 00:58:36.820 |
to plan flight, travel, and sightseeing for me and my kids, output your UI as a concise 00:58:47.100 |
DSL YAML, start, so, you know, regenerate if you don't like it, if you want it to be, like, 00:58:59.020 |
react widgets instead of, like, this high level description of what it is, uh, feel free to 00:59:04.140 |
do that, like, you can steer it, right, can be like, no, I don't want, like, my data, 00:59:07.500 |
I want, like, actually the UI here, I don't want to rewind it, I'm just going to correct 00:59:13.120 |
it, and then once I have the right thing, I can, like, gaslight it and say, like, this 00:59:16.360 |
is what you generated, like, uh, no, I want some HTML markup kind of YAML DSL with the actual 00:59:25.140 |
UI, um, zero prompt engineering, like, I don't know what I'm doing, but, you know, this is much 00:59:31.160 |
more closely to what I had in mind, it's like, oh, I want the actual widgets, um, and if I were 00:59:37.620 |
to, you know, write a JavaScript that parses this, and then as callback handles, just, like, output 00:59:42.340 |
press, I press the button X, right, it's like, uh, if I want to do, like, open the tab for day two, 00:59:49.320 |
I don't think there's even tabs for it, but it'll just be like, oh, yeah, I've got tabs, like, of course. 00:59:53.240 |
Um, and so you can interact with it, and it's kind of like, is this code, is it not code, but at the end, 01:00:03.700 |
you can say, like, oh, write a spec, right, it's going to write, like, a concise readme kind of spec, 01:00:09.380 |
like, and it can be like, implement yourself. Um, which, this is new, I haven't really played with 01:00:15.840 |
it too much, I came up with this two weeks ago, but it's kind of wild, because I'm into film photography, 01:00:21.980 |
I wanted an app with, like, three timers telling me what chemicals to mix and what, and I built it in five 01:00:26.560 |
minutes, and I was like, this is wild, like, I just said, like, I want an app to help me develop film 01:00:31.740 |
with this process, and, like, filled it in with the right chemicals, and all, and I was like, no, 01:00:36.280 |
I want, like, two parallel timers that, you know, loop, and I was like, yeah, sure, and then, at the end, 01:00:40.240 |
I was like, write a concise, had him help, JS, self-contained prototype, and then I can pass it on to the 01:00:48.040 |
graphic designer, or maybe the graphic designer does that all day, right, and he's like, ah, no, I want a sidebar, 01:00:52.940 |
wouldn't that be nice, and then tomorrow, he'll just send me the transcript, and I'll be like, okay, 01:00:57.080 |
I'll make this work for real, so there we go, this is, you are an application, I showed you the 01:01:05.480 |
code survivor, this is your turn now, five, six minutes, maybe, just, like, invent an application, 01:01:12.380 |
play with it, like, you're an application, use a concise HTML markup, YAML, DSL, start, and then see 01:01:28.080 |
TARGET, TARGET, TARGET, TARGET, TARGET, TARGET. 01:02:56.500 |
I'll actually interrupt this, I think afterwards you're going to have fun with this, but I had 01:03:01.460 |
a really interesting question come up, it's like how do I manage all of this, right, because 01:03:05.860 |
I don't know why every LLM company doesn't manage to make a decent history browser, it's 01:03:10.400 |
really not that hard, I could generate one in two days probably that has tags and allows 01:03:16.720 |
me to search, but no one manages to do that, I don't know why. 01:03:22.760 |
So what I use, and I would recommend not using my tool, don't take my tool as the thing, 01:03:28.760 |
I break it off and I tweak it to my things, but just build your own, the tool to manage 01:03:33.180 |
fragments is like you're managing text files and how to paste text files together, or maybe 01:03:37.340 |
manage shell scripts, so my most used tool is called Prompto, I literally generate it once, 01:03:43.660 |
it's a 100 line script, it will go over all my repos, I have config files where I say look 01:03:50.220 |
into all these directories, if it finds a text file, or a script, or a YAML template, 01:03:56.220 |
it will just list these, it's horrible, it has no pagination, it's whatever, but so once 01:04:02.600 |
I have like a context, right, for like my library, when I want to do something with parameters, 01:04:07.180 |
I have like a little script that gives me the current parameter API, and I can call it, right, 01:04:13.220 |
it's like if I do Prompto get glaze parameters, it will find, it will look into all these repositories, 01:04:19.220 |
like super inefficient, no indexing, no caching, it will find this parameters MD file, and it 01:04:25.220 |
will just paste it, or actually it grabs the Go thing and just outputs, it's super ugly, right, 01:04:32.220 |
there's no structure in this except dash dash dash, but so now what I can do, I have another 01:04:38.220 |
tool that just, it's really just like a command line LLM prompter, it's like HTTP request with 01:04:44.220 |
no format, and I can pass it, this as a context, right, I've got a couple of like pseudo things, 01:04:53.220 |
which I wrote two years ago, a year ago, they're not great, I can print a prompt that it's going 01:04:59.220 |
to output, and so like, oh, create a parameter for URLs, and so it will just paste everything 01:05:08.220 |
together in like a big blob, and at the beginning it says like, you're a great, you're a great 01:05:13.220 |
Go developer, actually I messed it up, it actually says PHP, so whatever, and then it just like literally 01:05:20.220 |
says like create a parameter for URLs, right, in Go, and then if I actually run this, it 01:05:28.220 |
will have, it will use my API, it will do a good job, like this is how I just add stuff 01:05:33.220 |
to my code, it will have the right packages, I can literally just paste this, and it will 01:05:39.220 |
probably run, and once I have this prototype, then I have other scripts to like integrate this 01:05:43.220 |
into the framework, like add the documentation style I want, and all of that stuff, but so this 01:05:48.220 |
tool, pasting strings together, and just like finding scripts in a directory, they're like nothing 01:05:54.220 |
scripts right there, so that's how I manage all my fragments, and we have eight minutes left, 01:06:00.220 |
so I'm going to skip to the final part, which is more experimental, it's like what are these, like you 01:06:07.220 |
realize now that I'm like kind of not joking with this 10x, because during this workshop, I wrote like six, 01:06:13.220 |
seven decent programs, right, they maybe don't work too well, but like you can guess that they're not far 01:06:22.220 |
from something that's of decent quality, and then if I want to add unit tests and whatever, I can, you know, 01:06:27.220 |
take an hour at lunch while watching Netflix to actually do it, sometimes it doesn't work, so this is like a 01:06:34.220 |
big part of it, is like no one to step away, don't, right, if it doesn't work like within 10 minutes, don't force it, 01:06:42.220 |
either do it by hand like you used to, it's fun, like we all love programming, maybe go for a walk and come back 01:06:48.220 |
when you realize which language step it didn't understand, that's a really important one, I still 01:06:54.220 |
get into full rabbit holes, where I just like don't get the task done that I actually want to do, I just 01:07:00.220 |
like generate programs that don't work to solve the task, and it's, so knowing when to step away, because it's 01:07:06.220 |
like breathtaking, right, like I generated seven programs, like what am I going to do with that, like I can't 01:07:11.220 |
mentally manage seven programs within a time span, like I can't do it, and I really have this problem, 01:07:20.220 |
it's like I have like a, an experimental repo where I put in all my scripts to kind of, I don't even, I 01:07:26.220 |
don't even bother putting them in, right, but I have like so many, and some of them are like 5,000 line code 01:07:34.220 |
things with like crazy features, and then I forget that I wrote them, and I write them again, and I go back, 01:07:39.220 |
and it's like oh, you already did it, it's like shopper approved customer management things with like 01:07:46.220 |
parallel worker queues, and I, I wrote it three times, oops, sorry, so, what however doesn't go away I think 01:07:56.220 |
is like fundamental and practical knowledge, like I know how a real-time operating system works, so it makes it 01:08:01.220 |
really easy to see if the model doesn't get it, right, like I'll be like no, let's, don't do it in assembly, 01:08:06.220 |
like do it, use this pattern, or use this framework, if I didn't do it by hand or like learn about it, 01:08:13.220 |
I wouldn't be able to do it, it would go into a big rabbit hole, I wouldn't get an RTOS, 01:08:18.220 |
but that's the difference between someone who knows RTOS and generates a new one with an LLM, 01:08:23.220 |
and can maybe do it in an hour, and someone will just like after a week just get nowhere, nothing 01:08:29.220 |
works, nothing compiles, the assembly's broken, so fundamentals, super useful, what you can't totally 01:08:36.220 |
forget is APIs, right, like I really don't care about Amazon's cloud formation API to do XYZ, like I really 01:08:42.220 |
couldn't care less, it's not knowledge I want to take with me into my retirement, and that's gone. 01:08:48.220 |
But however, the knowledge of how to deploy, you know, to do Coda's infrastructure for functions as a service, 01:08:54.220 |
that's like really useful pattern knowledge, and I can take that knowledge and apply it to like DigitalOcean 01:09:01.220 |
and to like Amazon and Google, what I don't need to do is like figure out after 10 minute deploys 01:09:08.220 |
that I forget to put like an IAM rule somewhere, like I really, I really don't care. 01:09:15.220 |
Some people do, and those are going to have a hard time I think in the next few years. 01:09:20.220 |
While practice, I already went over it, but as soon as you think about humans in the loop, right, 01:09:27.220 |
like, because language is always going to be useful when interpreted by a human, I don't care what the LLM, 01:09:32.220 |
it could generate like 10,000 programs that don't work, and like what are you going to do with that? 01:09:38.220 |
So everything you generate with the LLM, either it's like to kind of help the LLM along 01:09:43.220 |
to finally output something that's useful to humans, or it's directly useful to humans, right, 01:09:48.220 |
like a readme is useful for humans, it so happens that because LLM is trained on human language, 01:09:52.220 |
it's also useful for an LLM, but if you make it nice for humans, it will work well with LLMs as well. 01:09:59.220 |
Like if I have like, if I go on for three paragraphs about how error handling is useful, 01:10:04.220 |
like that doesn't matter very much for my senior software colleagues. 01:10:09.220 |
So I can strip it out and now have a smaller prompt and it's like going to be more focused. 01:10:13.220 |
So if you always think in this like language decomposition things of like, how would it work for humans, 01:10:19.220 |
how would it work for the user, what do I actually want to do, that's like a good engineering skill. 01:10:25.220 |
It was already before, right, but now it's actually in the small is also pretty good. 01:10:31.220 |
Divergent thinking, I don't know how to teach creativity or whatever, smoke weed. 01:10:38.220 |
But like thinking about, like these weird things of like, well I'm going to make like a reality TV show about my code review. 01:10:45.220 |
I don't know necessarily how I came up with it, but it's the best code review prompt I ever found, right. 01:10:53.220 |
It's, it's, I don't know, I don't know how to come up with that stuff. 01:10:57.220 |
And what's useful there is to look at people who are not programmers because they have amazing prompts usually. 01:11:03.220 |
And programmers are like, we're like very focused on like, oh we know how to do it and it's like it should write code. 01:11:07.220 |
And then someone comes along and it's like, you're an alien that writes code and it actually works. 01:11:12.220 |
That's why people who have no programming knowledge are able to build like full apps while a programmer comes in. 01:11:18.220 |
So it's like, well I forgot to do the for loop check and this is worthless. 01:11:22.220 |
And in the meantime the non-programmer is like, well I wrote an Android app that like allows me to fill my daughter. 01:11:28.220 |
And abstract thinking, I think a lot of it is like, once you build something, knowing how to build the thing that builds the thing. 01:11:37.220 |
Or find like the deeper abstraction of it and then focus that abstraction, I think is pretty useful. 01:11:45.220 |
So there's a bunch of languages like Common Lisp, Ruby, Haskell that are very focused on like creating abstractions. 01:11:52.220 |
Manipulating the language that actually solves the thing. 01:11:55.220 |
I think that's like a pretty interesting thing. 01:11:58.220 |
Not that you should write Common Lisp or Ruby, right. 01:12:00.220 |
But it's like getting familiar with the concepts that make up the community around that language of creating compilers. 01:12:11.220 |
Or finding like mathematical abstraction of how to do control flow or so. 01:12:15.220 |
That, that's really useful and that kind of goes into language design. 01:12:18.220 |
It's not just like programming language design. 01:12:25.220 |
It's like our head, like my favorite format is like title, one sentence, code example. 01:12:43.220 |
I can, I guess I can take live questions and then repeat them maybe. 01:12:55.220 |
It's amazing to be here in a room with like-minded people who will be probably geeking out on all these things for so long. 01:13:01.220 |
And thank you for sharing your, your findings. 01:13:05.220 |
One of them is, are you finding yourself working with other people in your team? 01:13:09.220 |
Kind of spearheading some of the incubation portions of work following these techniques. 01:13:15.220 |
And then the cover, the cavalry comes behind you to build the actual thing. 01:13:23.220 |
And then the second question is, I see you can push yourself going really far with all these techniques. 01:13:29.220 |
But how much are, have you tried using some of the auto GPT kind of metaphors as well? 01:13:38.220 |
I was like kind of solo for a long time and it's hard. 01:13:43.220 |
Uh, it's, I, I don't know how to bring people on board with like how I do things where I'm 01:13:47.220 |
like, I write 5,000 lines of code an hour and then I throw them away. 01:13:53.220 |
Um, I don't really know how to bring that into the team currently. 01:13:58.220 |
And then for the second question, I, I guess that's going to be the last one. 01:14:01.220 |
Cause I, I think the, the time is pretty strictly enforced. 01:14:07.220 |
How much do you rely on auto GPT or those kind of things? 01:14:11.220 |
Um, I just like zero shot stuff and then I'm in the chat API. 01:14:16.220 |
So I don't, I don't use, I, I use copilot for, you know, like just as, as copilot is,