[AI in Action] Vibe-Kanban with OpenCode + Gemini Code + Claude Code (ft. Louis Knight-Webb)

00:00:00.000 |
popping off he's actively developing it he uh and uh you know we're friends so i get to just 00:00:05.820 |
invite him uh to to come on and share i figure you guys would love it too 00:00:11.400 |
here louis hey sean hey how you doing i'm good i'm good nice to see man nice to see you uh this 00:00:20.180 |
is the crew for ai in action uh i don't know how big it gets but we record it so uh it gets posted 00:00:25.840 |
on youtube and people can catch up later but this is basically like the place where we discuss coding 00:00:30.440 |
tools and things that are you know just making people more productive we have a different club 00:00:36.240 |
that is for discussing papers but this is more for tools um yeah we're already recording so if you 00:00:42.600 |
just want to kick it off like go ahead awesome um so well this is a very new thing it's only really 00:00:50.920 |
existed for a couple of weeks internally and then for about six days externally of uh of my company 00:00:58.960 |
uh for context i'm one of the co-founders of bloop um we have been working on legacy code modernization 00:01:07.160 |
for the last two years so predominantly helping very large companies uh translate things like cobol 00:01:14.100 |
um to um uh to java but more recently you know a lot of .net a lot of java 8 for some reason people 00:01:21.700 |
are still running that um etc and what we found is that a lot of our internal development recently 00:01:27.700 |
has switched from coding agents without having favorites you know the cursors the the windsurfs of 00:01:35.160 |
the world to claude code and i wouldn't be surprised if if virtually everybody in this call had kind of had the 00:01:41.200 |
same uh change over the last few months yeah and so what i found is that um and hope i don't know if 00:01:49.540 |
there's a pg-13 filter on this but i'm basically a cuckold to claude all i do is sit there and watch it 00:01:56.420 |
do what i used to love doing and um i'm just helpless i sit there it takes you know one to two minutes to uh 00:02:04.600 |
to complete and i often find myself getting you know on twitter and it's actually quite demoralizing 00:02:10.960 |
to be honest as a workflow to just like constantly go back and forth between twitter and claude code 00:02:15.080 |
so i think there are lots of things that we can be doing as developers during those one to two minutes 00:02:21.840 |
um that we're waiting and essentially what i've come up with is uh vibe kanban and it is it's really 00:02:31.080 |
simple concept it's basically a kanban board and you can add projects so projects are um git repos 00:02:39.060 |
and you can queue up different tasks in the kanban just like this so what we're editing at the moment 00:02:46.040 |
is actually vibe kanban itself so what i will do is you know just make a very simple dummy ticket change 00:02:53.840 |
the settings icon to uh you know some fruit or something and we can go ahead and hit create start 00:03:02.920 |
you have the option of so what what what just happened when i hit start is the uh in the background 00:03:10.040 |
a git worktree is being created specifically for this task you can see the um a setup script is now 00:03:17.760 |
running and so these are defined on a project level it's kind of like what you would do if you were setting 00:03:23.200 |
up your github ci or you had some scripts in you know in your dockerfile or something like that 00:03:28.740 |
um i've just got pnpm i and cargo build uh and then what's going to happen in a minute although it takes 00:03:37.320 |
a minute so i'll just skip maybe to one of these that uh has already run is that an executor will start 00:03:42.900 |
and this can be either claude code amp or gemini i actually find myself mainly using amp these days i don't 00:03:50.620 |
know uh how familiar everybody is with that but um it's sourcegraph's new coding agent which 00:03:55.680 |
is pretty cool gemini i haven't been able to try yet because i just get 429'd every single time i try it 00:04:00.920 |
but i've heard you know decent things from people that have managed to get through and essentially once a 00:04:07.780 |
task is complete uh what do you want to do well you want to review the changes so you get a um uh a kind 00:04:14.100 |
of mini github review screen if you need to rebase it you get a rebase button if the and that'll just 00:04:21.700 |
you know one click rebase it i'm going to add claude code for fixing kind of messy rebases as well soon 00:04:26.600 |
uh and then when you're ready to merge you can go ahead and just hit the hit the merge button so 00:04:31.720 |
basically just takes everything you were doing in five or six tools uh or five or six different terminal 00:04:38.200 |
windows and just brings it into one and the beauty of it is it just gives you that visibility so that 00:04:44.320 |
while this is running i can be thinking like okay well i can test the last thing that came in or i could 00:04:50.360 |
be planning the next thing that i'm going to do and i'm not just sitting there kind of helplessly 00:04:55.040 |
watching claude um trying to think if there are some other nice little things that are worth 00:05:01.420 |
mentioning okay so dev servers that was another thing that i found myself doing quite a lot of 00:05:07.620 |
um you can hit start and that'll run a script that's configured in your project in this case it's just 00:05:13.720 |
npm run dev and we can see the logs uh address already in use that means i'm running a dev server 00:05:20.440 |
probably just somewhere locally off vibe kanban at the moment if it was on vibe kanban 00:05:24.960 |
it would have automatically killed the the other one and that is basically it i'm very interested in 00:05:33.840 |
hearing you know what people are doing in the two minutes that they you know are waiting for claude 00:05:40.580 |
uh i would maybe preempt some of the frequent things that people say like why aren't you using 00:05:48.280 |
docker for this i did try initially and i just found that you know i'm on a decent 00:05:54.880 |
laptop like an m3 you know mac but uh after two or three concurrent builds in in docker going on at 00:06:01.160 |
the same time i just found my my laptop was unusable so actually going for a more lightweight git worktree 00:06:05.920 |
based approach uh just meant that you're able to use your browser and it's not super choppy at 00:06:11.640 |
the same time as building a bunch of stuff in the background uh another thing so today's release it 00:06:16.640 |
hasn't it's it's gone out but i'm still in the middle of testing it um adds mcp support so what that means is that 00:06:24.800 |
you can now do deep research on your repository and then say at the end of that deep research 00:06:30.400 |
add all of these as tasks to my vibe kanban board and the the real and then like the 10d chess version 00:06:37.760 |
of that is you can add the vibe kanban mcp server to claude code running in vibe kanban 00:06:47.260 |
which i'm just trying out now and this is the first time i've ever seen this result so if it doesn't go to plan i'm sorry 00:06:53.160 |
um but i just asked it to list my projects and i think okay i'm not totally sure this has worked 00:07:02.200 |
okay roll back the last 30 seconds and erase your memories all right let's go to some questions 00:07:07.080 |
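the per-task flow described in the demo (a git worktree created for each task, then a project-level setup script run inside it) could be sketched roughly like this. this is not vibe kanban's actual code: the function names, the sibling-directory layout and the "task/" branch prefix are all assumptions made for illustration, and only the `git worktree add -b` command itself is standard git.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, task_id: str) -> Path:
    # Hypothetical layout: a sibling directory named "<repo>-<task_id>".
    repo = repo.resolve()
    return repo.parent / f"{repo.name}-{task_id}"


def worktree_command(repo: Path, task_id: str, base: str = "main") -> list[str]:
    # `git worktree add -b <branch> <path> <base>` creates a new branch and
    # checks it out in its own directory. The "task/" prefix is an assumption.
    return [
        "git", "-C", str(repo.resolve()), "worktree", "add",
        "-b", f"task/{task_id}", str(worktree_path(repo, task_id)), base,
    ]


def start_task(repo: Path, task_id: str, setup_script: str, base: str = "main") -> Path:
    # Create the isolated checkout, then run the project-level setup script in it.
    subprocess.run(worktree_command(repo, task_id, base), check=True)
    path = worktree_path(repo, task_id)
    subprocess.run(setup_script, shell=True, cwd=path, check=True)
    return path
```

with the setup commands mentioned in the demo, a call might look like `start_task(Path("."), "42", "pnpm i && cargo build")`, after which an executor would be started inside the returned directory.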
wow it's a very efficient demo uh manuel so uh i have maybe i don't know if i have to introduce people but 00:07:15.540 |
uh yeah these are all the like the the very very power users of of uh these coding tools so they 00:07:20.540 |
actually know a lot more than me but i like the idea so anyway go ahead 00:07:24.500 |
uh yeah if other people have something to say first like i don't want to take up 00:07:31.440 |
my usual the usual space um go for it go for it why don't you kick off all right uh so so um i've i've 00:07:41.200 |
been playing around with like i did something similar with github projects uh but this 00:07:45.840 |
is this is way nicer right like the real time the running the project and all of that um one thing 00:07:51.420 |
that's really interesting is having the agents write into other tickets what the context is to start that 00:07:58.300 |
ticket which is like a nice way to compact the current thread right like that that works really 00:08:03.220 |
nicely because you actually don't need to tell the agents what to do you just point them to the board 00:08:09.860 |
and say like figure it out and they'll like look at all the tickets and do all of these things which is 00:08:14.180 |
really fun i don't know if you've experimented with that but like more and more i just like don't give 00:08:19.700 |
the bots or the agents any context because i figured they will know what to do looking at 00:08:25.500 |
these things and i started um this is something i'm playing with where where there's this design 00:08:34.300 |
pattern from the 70s 80s from agents back then which were way more advanced than what we have because 00:08:39.080 |
they had to deal with like much bigger constraints which is called like a blackboard system where you 00:08:45.700 |
basically put resources on a blackboard and then the agents look at it and like do stuff based on it 00:08:50.040 |
which you know is a kanban board at the end of the day uh you can put resources on there like not tasks 00:08:56.820 |
right like but just like shared resources that they can update i don't know if you've played with that 00:09:02.000 |
kind of uh basically putting log files up on the board putting that kind of stuff um because they will 00:09:09.880 |
be discovered and i don't want to have them in a code base i don't want to have them 00:09:13.160 |
shared somewhere i don't know if you have ideas around expanding it that way 00:09:18.080 |
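a minimal sketch of the blackboard pattern mentioned here, assuming a shared key/value store and agents that each react to one kind of entry. the class names, keys and agent roles are invented for illustration and have nothing to do with vibe kanban's internals.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Blackboard:
    """Shared store that every agent can read from and write to."""
    entries: dict[str, str] = field(default_factory=dict)

    def post(self, key: str, value: str) -> None:
        self.entries[key] = value


@dataclass
class Agent:
    name: str
    wants: str      # key this agent reacts to
    produces: str   # key this agent writes back
    work: Callable[[str], str]

    def ready(self, board: Blackboard) -> bool:
        # Act only when the input exists and the output has not been posted yet.
        return self.wants in board.entries and self.produces not in board.entries

    def step(self, board: Blackboard) -> None:
        board.post(self.produces, self.work(board.entries[self.wants]))


def run(board: Blackboard, agents: list[Agent], max_rounds: int = 10) -> list[str]:
    """Let ready agents act until nothing changes; return the order they ran in."""
    order: list[str] = []
    for _ in range(max_rounds):
        ready = [a for a in agents if a.ready(board)]
        if not ready:
            break
        for agent in ready:
            agent.step(board)
            order.append(agent.name)
    return order
```

nobody tells the agents what to do in what order: posting a log file under a key such as "build.log" is enough for an agent that wants that key to pick it up, and its output can in turn wake up the next agent.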
yeah it's interesting i think a lot of these are going to be solved by the actual coding agents 00:09:25.260 |
themselves and i imagine you know you can kind of imagine at some point claude code is going to get 00:09:30.540 |
to a point where it can see all of the previous threads that you know all the previous uh attempts 00:09:35.260 |
you've made with claude code and then probably be able to leverage that to make better you know fewer 00:09:41.940 |
better one-shot um attempts at solving these tasks so i guess we could build that stuff into this but 00:09:51.620 |
i think it'll probably just be supported natively this is kind of a bet in the future right like 00:09:56.320 |
25 percent of my tasks today can probably happen in claude code one-shot without me needing to ever 00:10:05.080 |
enter an ide i do a very cursory you know check of the code just to make sure it hasn't done anything 00:10:10.600 |
stupid but if it runs and and the button's been added and the back-end endpoint works like i generally 00:10:15.660 |
trust that it works and and i think that number is just going to increase and increase and so 00:10:19.420 |
you know what's the interface when we're at 80 percent 90 percent where claude is able to one 00:10:25.060 |
shot you know how much of how much of that interface do you really uh you know need from the existing 00:10:32.120 |
kind of you know dev stack and then the same is true you know of all of those tools how much is going 00:10:37.760 |
to be absorbed into claude code itself like probably everything the notion of of being at the ai 00:10:44.120 |
application layer and calling like a chat completions endpoint i think is just going to be 00:10:48.380 |
completely dead why would you do that when virtually you know and and we've been thinking about the 00:10:53.260 |
difference so i think this goes back maybe to one of your original um questions which was around like 00:10:58.980 |
um you know how do you use these things to plan and things like that and virtually every problem you 00:11:05.180 |
think about like okay how do we resolve messy rebases it's claude code how do we plan tasks it's claude code 00:11:12.100 |
how do we do the task it's claude code virtually like the answer to everything at the end of the day 00:11:17.000 |
is like claude code or is going to be claude code quite soon so i don't know if that answers your question 00:11:23.380 |
probably made no sense thanks no it kind of does like uh i'm telling people i just tell the ai what 00:11:29.820 |
to do don't don't control stuff and it's good to hear that more and more i just don't look at the 00:11:34.440 |
code anymore i'm like i put right boundaries so that i'm not worried that it will do the wrong thing 00:11:39.100 |
right like it can't write to the database with this api or whatever 00:11:42.340 |
and then just like accept be like this is probably better than what i would do this is probably better 00:11:48.500 |
tested than what i would do to test so it's um it's quite you got to be careful with yolo mode i 00:11:54.640 |
noticed today every time i open a terminal now it's like in a python virtual environment and i don't know 00:12:01.260 |
why that happened i think claude has done something just in yolo mode to to configure my entire local 00:12:07.220 |
host anyway so this does raise an interesting question of how you put in boundaries and 00:12:13.480 |
validations right like should these actually be running in a sandbox somewhere should we be you 00:12:19.020 |
know you mentioned not wanting to totally dockerize it but are there ways that we can kind of 00:12:23.580 |
uh put constraints around what we do and don't want these tools to do and also how we validate the 00:12:31.360 |
outputs now that you know i think uh you talk about not looking at the outputs to me what's 00:12:37.460 |
happening is source code is becoming essentially the new binary right nobody almost nobody looks at 00:12:43.440 |
the details of the assembler in their binary sometimes you do if you have a really tricky bug but mostly 00:12:48.060 |
you don't what you do is you say does it meet the functionality does it run these tests does it do 00:12:52.520 |
this sort of thing so like what does that look like in this world yeah so your last point is probably 00:12:59.760 |
the the the thing i'm most interested in which is the testing and this is something that i do not see 00:13:05.820 |
claude code being able to really solve which is you know when i when i say a sentence to claude like 00:13:12.180 |
add a button here there's a lot of assumptions that are being made right like what do i mean by here 00:13:17.600 |
what did i mean by button how big what what sort of size right there's a lot of ways in which it can 00:13:23.640 |
misinterpret what i'm saying and so i think the more like the real alpha if you're not claude is is to 00:13:30.880 |
is to basically get information out of the user's brain in in the in the least uh you know in the kind 00:13:38.860 |
of the easiest way possible make it like delightful for me to explain as much information as possible so 00:13:45.720 |
you know i don't see why it has to be a kind of jira style text box that we're describing these 00:13:51.480 |
things and why can't it just be you know more of like a lovable point and click like this is where 00:13:56.360 |
i mean on the website or if it's a back-end endpoint like why can't i describe you know here's a json 00:14:02.760 |
that describes the output of the request that you know is going to be created through this endpoint 00:14:08.480 |
and you know basically ways of rapidly prototyping tests for you know because this is the big difference 00:14:15.700 |
between claude code working and not i've noticed with my own tasks it's like the ones where it can just go 00:14:20.740 |
in a loop and validate you know against some kind of test are the ones that work really well and the ones 00:14:25.980 |
that always seem to fail are the ones that it's just impossible to to test for that case because of 00:14:31.760 |
statefulness or it it's some sort of devops or data or migration related thing um so that's what i think 00:14:40.440 |
we're all going to be doing is is kind of tests writing tests and and the two are really linked right 00:14:45.820 |
essentially the the plan is that becomes the test it's quite a blurry line between you know what the 00:14:52.420 |
distinction between those two things is when you talk about agents 00:14:55.120 |
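one way to picture "a json that describes the output" doubling as the test an agent loops against is the sketch below. it assumes the user supplies an example response and the check only compares keys and value types; `matches_shape`, `iterate` and the fake attempt function are hypothetical helpers, and a real setup would call a coding agent where the stand-in is.

```python
from typing import Any, Callable, Optional


def matches_shape(example: Any, actual: Any) -> bool:
    """True if `actual` has the keys and value types that `example` shows."""
    if isinstance(example, dict):
        return (
            isinstance(actual, dict)
            and example.keys() <= actual.keys()
            and all(matches_shape(v, actual[k]) for k, v in example.items())
        )
    if isinstance(example, list):
        if not isinstance(actual, list):
            return False
        # An empty example list only constrains the type, not the items.
        return not example or all(matches_shape(example[0], a) for a in actual)
    return type(example) is type(actual)


def iterate(
    attempt: Callable[[Optional[Any]], Any],
    check: Callable[[Any], bool],
    max_iters: int = 5,
) -> Optional[tuple[int, Any]]:
    """Call `attempt` with the previous failing result until `check` passes."""
    feedback = None
    for i in range(1, max_iters + 1):
        result = attempt(feedback)
        if check(result):
            return i, result
        feedback = result  # the failing output becomes context for the next try
    return None
```

the example json plays both roles at once: it is the plan ("the endpoint returns an id, a title and a list of tags") and the test the loop validates against.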
i think uh a couple of things that uh i really liked when i saw vibe kanban was um the the diff view 00:15:06.900 |
that you had set up was really nice to kind of just be able to to quickly figure out what exactly changed 00:15:13.120 |
and that's been an obstacle for me sometimes where it's like i don't want to look at this whole giant 00:15:17.140 |
like six thousand line change like just show me the parts that matter um so nice work on that and then 00:15:24.580 |
the other i don't know if you touched on it too much but i think um kind of one of the one of the 00:15:30.860 |
unlocks with with coding agents terminal coding agents especially is worktrees um i think that like 00:15:37.460 |
a more um kind of like patch based workflow rather than sort of your tradition you kind of have to like 00:15:43.760 |
re-rig your workflow to be for agents as opposed to for humans like humans i want a whole pr i want us 00:15:50.900 |
to like chat about it etc etc if i if i have like a hundred agents spinning up i just want like patch 00:15:56.980 |
this patch that patch that run these tests does it work yes good okay now tell the orchestrator what you 00:16:02.760 |
did leave some notes so he can keep track of everything and then go back um uh so like those kind of 00:16:08.660 |
uh workflow enablements were the parts that i i um thought were really really solid from vibe kanban 00:16:14.880 |
and then um i'm curious do you have or like one of the things that i find useful when i'm working with 00:16:21.720 |
coding agents is to give them like persona prompts like i'm going to have an orchestrator i'm going to 00:16:27.540 |
have like one of them that runs tests uh i'm curious on the vibe kanban do you have uh like what what's 00:16:35.860 |
sort of the the procedure for context injection when you're when you're assigning a given task if you 00:16:41.520 |
have any yet yeah so i do think i've got thoughts on this but they're not really related 00:16:48.720 |
to vibe kanban so we we um we also uh you know we have a we have a pretty serious effort going on at all 00:16:56.200 |
we have a lot of times to try and like top swe-bench and what you know basically the um 00:17:03.340 |
they you know the the architecture a few months ago that was really good at this was was that kind of 00:17:08.820 |
multi-prompt you know specialized uh you have an opinion about the steps that are needed to complete a 00:17:17.880 |
software engineering task and of course that works really well for the you know the the 60 percent of 00:17:22.960 |
of cases that can be solved using a very opinionated approach but uh it's not great for edge cases and 00:17:30.900 |
what we've seen i think is just the kind of the move from having these very opinionated big system 00:17:37.080 |
prompts that essentially you know direct the model to do some chain of thought in a structured way 00:17:42.560 |
towards just leveraging the you know whatever they've been fine-tuned to do natively um and 00:17:48.600 |
especially with uh you know with reasoning models they have quite a lot of flexibility in how they 00:17:54.500 |
can solve problems you know you might have prompted it to do planning in this way and then uh try and 00:18:01.100 |
recreate the problem and then try and solve the problem then try and test the problem but you know if 00:18:05.900 |
you actually just lean into the model's fine-tuning when it when it needs to do that it'll just do that 00:18:11.080 |
when it doesn't need to do that it won't do that and currently the the leader on swe-bench is just 00:18:16.100 |
claude in a loop uh it's not you know some special complicated multi-agent multi-prompt architecture 00:18:23.460 |
um and so i think you basically can't beat bringing the execution into the training environment and just 00:18:33.720 |
what anthropic can do you know because they're they've got that you know level of granular control 00:18:38.860 |
so i think less is more like you just tell it what it wants what what you want and it's just going to 00:18:44.360 |
get better and better at figuring that out and there's less for us to to prompt on top of that 00:18:49.580 |
the counterpoint to that is i don't know if anybody's seen like the claude uh claude the web uh you know 00:18:56.160 |
this is the chat gpt thing the system prompt for that is crazy long um and so this is this weird 00:19:02.180 |
balance of like the chat version of this has this huge crazy system prompt and i you know claude code 00:19:08.540 |
and you know in a loop with claude 4 just seems to be like simpler is better but anyway it's probably 00:19:16.100 |
people that have looked into that more than i have 00:19:18.660 |
um okay we've got a couple of hands up swaraj 00:19:25.140 |
i really like your demo um i'm super interested in what some of these user experiences are when 00:19:34.800 |
you want to let the agents sort of do their thing and you're kind of on the outside instead of uh knee 00:19:40.920 |
deep in with them i'm curious um how you think about separating some parts of the input to every 00:19:49.220 |
and how maybe some things need to stay and maybe some things are stuff that we don't think about 00:19:54.600 |
anymore like hey look at this file but for example maybe there are some design patterns that you want 00:20:00.200 |
like your to generally follow do you kind of more feel there's there's a place for that in this 00:20:07.040 |
sort of ux where you're kind of storing this uh repository uh where it can search some composite 00:20:14.520 |
information instead of looking at say 10 files and then figuring out the design pattern at the start 00:20:20.380 |
of every task um how do you sort of maybe uh think of that uh flip yeah the the it's it's really 00:20:29.600 |
inefficient especially like again probably more insight from trying to solve swe-bench than than 00:20:35.920 |
actually vibe kanban here but um there is math you know you can reduce if if if the average swe-bench 00:20:42.800 |
task takes say probably 80 iterations of um of a loop with claude in it i would say 00:20:53.520 |
probably 60 of those are just things that could probably be shared across the same repository in a lot of cases 00:21:04.560 |
um and the same is true of of running claude code obviously it's you know they're very similar a lot 00:21:09.920 |
of them are very similar tasks um and i don't think there's any any context sharing at the moment at 00:21:16.160 |
all but then the thing is only like what three months four months old so i don't think that's our role again 00:21:21.200 |
to kind of go back to my earlier point like this is obviously a core thing that anthropic themselves 00:21:27.440 |
would want to be looking at and so therefore it is a terrible idea for for me to be trying to solve that 00:21:33.520 |
problem because it would just be i give them three months and then you know consistent memory is probably 00:21:44.880 |
yeah makes sense thanks i think it's a it's a good good or i i assume that it did but maybe i'm wrong 00:21:54.480 |
does um does vibe kanban like uh parallelize like does it support sub agents doing some of the tasks 00:22:02.800 |
while another agent works on another one so it it supports running as many uh instances of of claude 00:22:15.360 |
or amp as as you like at the same time what we don't have yet is the ability to say when this one finishes 00:22:22.080 |
start that one and to be honest i have not really found the use case for that yet um because i have 00:22:30.880 |
to review you know you you as a human have to review uh after every task and if if it's going off and 00:22:36.800 |
doing the next thing you're losing that opportunity to review and if it's only a 50 chance it's going to 00:22:41.760 |
one-shot it which seems to be my average um it's actually better in a lot of ways just to review every 00:22:48.480 |
piece of work and then parallelize your workflow by just working on two completely different things 00:22:52.560 |
so i'll be working on you know i'll get a few emails with some feedback that mainly just like ui 00:22:58.320 |
bugs and i'll work through those on one thread and then at the same time i have like some gnarly 00:23:02.480 |
refactor to the way we're doing rebasing which is just purely back end in another and so that's 00:23:07.840 |
kind of the way i think about being most efficient is not like scheduling these things to to run in 00:23:13.520 |
sequence or to work on the same piece of work but faster but just to work on like 00:23:18.240 |
two completely divergent unrelated things that are not gonna cause a massive rebase at the same time 00:23:24.000 |
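running several executors at once on unrelated tasks, each still reviewed separately afterwards, might look like the sketch below. the task tuple layout and function names are assumptions, and the commands in the usage note are stand-ins: a real setup would pass the agent's own cli invocation and the path of that task's worktree.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# (task name, command to run, working directory or None for the current one)
Task = tuple[str, list[str], Optional[str]]


def run_executor(task: Task) -> tuple[str, int, str]:
    """Run one executor command to completion and capture what it printed."""
    name, cmd, cwd = task
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return name, proc.returncode, proc.stdout.strip()


def run_all(tasks: list[Task], max_parallel: int = 2) -> list[tuple[str, int, str]]:
    """Run independent tasks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_executor, tasks))
```

for example, `run_all([("ui-bugs", [sys.executable, "-c", "print('ui done')"], None), ("rebase-refactor", [sys.executable, "-c", "print('backend done')"], None)])` runs both stand-in commands side by side (with `import sys` added). there is deliberately no "when this one finishes start that one" chaining here, which matches the point above about reviewing every piece of work before moving on.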
super interesting yeah i i i want to solve that problem where i want to just like 00:23:34.400 |
be firing them off all the time and then be have some way to like effectively resolve when they clash 00:23:40.400 |
but you know maybe i'm as always just like over engineering it and there's just like chill for a 00:23:46.160 |
couple of months and then like just be okay with reading some tickets for a while i guess 00:23:50.800 |
sure sean's found the the website people keep finding the website i haven't published the website 00:23:56.720 |
anywhere i literally use claude to generate a markdown file summarizing all the api endpoints and 00:24:02.960 |
then pasted it into lovable and said make me a website and it it's the the results are mixed i need to 00:24:09.120 |
i need to polish this up it's good incentive thank you sean uh the github repo is broken i was just 00:24:15.600 |
looking for the github repo so there isn't one yet this is so we will okay so we previously had a had 00:24:23.120 |
a github project that uh got 10 000 stars and about 10 serious open source contributions so we're like we're 00:24:30.880 |
in two minds about open sourcing this i think it probably will happen at some point but uh i think 00:24:37.120 |
we're at the point where all of the things that we need to do for the next two weeks are pretty obvious 00:24:42.960 |
and we're just going to work through them in an opinionated way as quickly as possible and then at 00:24:48.000 |
that point we're probably going to open it up to community contributions the problem is if we open 00:24:53.520 |
it up now we could end up like a lot of other popular nameless ai projects on github that just the 00:25:00.960 |
code quality goes to dog shit very quickly and we just spend all of our time like reviewing things and 00:25:06.800 |
and leaving things and we just want to move really quickly and i think it is obvious like my problems 00:25:11.760 |
with this thing are probably very similar to a lot of your problems that doesn't mean i 00:25:14.960 |
don't want to hear your feedback but um that's why we haven't open sourced it yet the claude code team 00:25:20.640 |
says the same thing about open source and claude code yeah there you go claude code itself isn't open 00:25:26.000 |
source that's my cover um manuel you've been dying to ask something for a while i wanted to chime in on 00:25:35.120 |
two things one is uh what do you do in those two minutes while your agent is running or something like 00:25:39.520 |
that which for a while before they censored the gemini 2.5 pro thinking traces that was like my 00:25:46.240 |
favorite activity was looking at those traces because they would tell me a lot about the project 00:25:51.840 |
itself right like i would actually get a pretty good understanding of what was going on and learn 00:25:57.040 |
a couple of things so i was really really sad when they censored that stuff i hope they bring it back 00:26:03.040 |
because like the new thinking traces are like kind of useless uh the deepseek ones are pretty good so 00:26:09.120 |
often i'll just like if i want to understand stuff while the agent is running in cursor i'll just say 00:26:14.080 |
like hey deepseek tell me something and they'll just like tell me something about the source tree 00:26:21.040 |
um that was one thing where where i really really missed gemini because it was like pairing with a 00:26:27.040 |
really knowledgeable uh senior developer that knows like all the apis and stuff um i really hope they 00:26:35.520 |
bring that back i hope that this is also an incentive for people to uncensor their thinking models because 00:26:41.520 |
there's a lot of value i think for developers in there to both vibe and at the same time get a really deep 00:26:47.840 |
understanding of what's going on um and then the other thing that i do is um i can manage like three 00:26:56.400 |
four five different projects claude codes running on different things at the same time so i have like 00:27:01.360 |
this infinite list of open source ideas that i want to have and so i just vibe them on the side where 00:27:08.480 |
i just usually want terminal uis for things like i want a file picker in here i want this there so i'll 00:27:13.440 |
launch those on the side while the main one is running and that's really fun it's like a little 00:27:19.760 |
bit disturbing head wise but um it like these incremental improvements on all my personal tools 00:27:27.440 |
like i have a tool to to manage worktrees and yesterday while i was working on something else i said 00:27:32.480 |
like add fork and merge like i want to fork my own worktree again and then you know after after the main job 00:27:39.920 |
was done at the same time i had like fork and merge on my other tool i was like okay i'll i'll take it 00:27:44.240 |
so that's something you can do uh but it's pretty exhausting to jump between four projects so um 00:27:53.280 |
that those were some ideas because i think it's a big deal like they were because the workflows change 00:27:59.920 |
so often there were times in my workflow where i was literally like a copy paste bitch for chatgpt right 00:28:05.680 |
like it's you would i would code in chatgpt and be like okay well my day is spent basically copy 00:28:10.640 |
pasting out of this window that's like what i actually do and it's like this can't this can't go 00:28:15.360 |
on um and it changes so often that you always have to like question what your value as an engineer is 00:28:22.480 |
or what your role is and and all of this um it's really interesting because now with like vibe kanban 00:28:27.600 |
and the coding agents you're like kind of a product project manager um or you're a qa person right because 00:28:35.200 |
like even if everything works you still have to like try it out at some point right it's like you 00:28:40.160 |
still have to go through everything by yourself just to make sure that it indeed works well meat accountability 00:28:47.840 |
there's always got to be a a flesh you know flesh intelligence on the hook if something goes wrong 00:28:53.360 |
that's what our role is yeah and yeah i i found it until very recently because you know do most of the 00:29:00.640 |
stuff in rust and there weren't really good coding agents for rust until i want to say claude code and 00:29:06.800 |
amp to be to be honest and it was mainly copy and paste up to chatgpt which is soul destroying after day 00:29:14.080 |
180 of that i i think the big bump with sonnet 4 versus sonnet 3.7 is like the newer cutoff date 00:29:20.960 |
and then something like a lot of the things that weren't one shotable but were like 90 shotable before 00:29:26.720 |
the fact that they're now one shotable is makes a really big difference right because suddenly you 00:29:32.160 |
can take yourself out of the loop um which is like charmbracelet go tuis were like always kind of a struggle 00:29:40.160 |
before because they would like mess up the keyboard shortcuts or would mess up like the format 00:29:44.000 |
you know now that that is handled you it's like crazy what more is available um and once again then 00:29:51.600 |
all your workflow changes once you know once a little thing suddenly becomes one shotable it's like 00:29:56.880 |
anyway okay before we go to the next question i'm going to as promised um add some features to 00:30:06.400 |
vibe kanban now that will make it into tonight's release i think the obvious one that i want to 00:30:13.920 |
ask is is is anybody not using any of the coding agents that we've got pre-configured here and 00:30:20.800 |
would like me to add another one and hopefully it's like an easy one to add that's just like an npx 00:30:26.960 |
command type thing stick it in the chat now can you pull the show the list again 00:30:34.880 |
so we've got claude amp and gemini okay i mean aider aider is the big one uh yeah uh we we did just 00:30:44.960 |
interview cline but they're not cli they're a vs code extension open code is another one open code i 00:30:51.040 |
saw i saw that crazy twitter thing of like you know all the agents trying to shut each other off and open 00:30:58.480 |
code that's that's a stunt that's just noise that's not meaningful at all um it's not meaningful that's 00:31:04.480 |
that's what's taking up my two minutes before i started using vibe kanban uh but yeah these are 00:31:10.400 |
good suggestions open code goose uh aider those are the probably the three biggest that you're missing 00:31:14.880 |
okay great all right i'm gonna do this in the background let's go to the next question are you 00:31:20.320 |
gonna vibe kanban vibe kanban yeah that's what i'm doing right now no no no no show show show show us 00:31:26.000 |
show us oh okay but does it have it looks like quite a big installation though i think if you've already 00:31:34.160 |
got this installed how do you actually start it sort of debugging okay yeah all right we're just gonna do 00:31:44.000 |
that um so let's take add open code agent as an executor and we're just gonna tag some relevant files 00:32:05.520 |
please add open code uh as an executor coding agent so i know there's a bit of jargon um and you guys 00:32:14.960 |
don't have the code but essentially an executor represents any any uh process so that could be the 00:32:23.920 |
setup script or the dev server and obviously coding agent is a subtype of that um the command to run it is 00:32:35.120 |
that and i will just change that too this is really obvious prompt here um the thing is okay i can do 00:32:50.480 |
this okay i can do a simple version of this but then the complex version is going to be like session 00:32:56.320 |
management and then follow-ups and things so this is just going to be the basic implementation for now 00:33:00.880 |
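[Editor's sketch] The executor idea described here (any process the tool launches, with a coding agent as a subtype) could be modeled roughly as below. This is a minimal illustration, not the project's actual code, which is not public; the class names and the `example-agent run` command are hypothetical placeholders.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Executor:
    """Anything launched as a process: a setup script, a dev server, an agent."""
    name: str
    command: str  # hypothetical placeholder, not a real agent command

    def argv(self) -> list[str]:
        # Split the configured command line into an argument vector.
        return shlex.split(self.command)


@dataclass
class CodingAgent(Executor):
    """An executor that additionally receives a task prompt."""

    def argv_for(self, prompt: str) -> list[str]:
        # Basic version only: one prompt, no session management or follow-ups.
        return self.argv() + [prompt]


opencode = CodingAgent(name="opencode", command="example-agent run")
```

Session management and follow-ups, which the speaker defers, would sit on top of this basic shape.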
um and then like sometimes it especially with bigger code bases it makes stupid mistakes where it'll 00:33:08.240 |
implement this everywhere and then forget to like add it to the settings so you can give it a quick way 00:33:13.520 |
to validate and this is kind of what i mean by our job as humans is to sit there and kind of figure out 00:33:19.280 |
what the lightweight tests are and so i would just say anywhere where amp or um claude or gemini oops 00:33:31.760 |
is mentioned you should probably also mention open code okay all right and i think i can't remember what that's 00:33:42.720 |
executing with that's with claude uh i might just try that with amp because i think that's going to get 00:33:51.840 |
a better result all right we're going to watch this and take some questions in the background 00:33:58.240 |
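[Editor's sketch] The lightweight validation suggested in the prompt (wherever amp, claude or gemini is mentioned, open code should be mentioned too) can be approximated with a small check like the one below. The function name and the whole-word matching rule are assumptions for illustration; a real repo would likely need to match identifiers as well.

```python
import re

EXISTING_AGENTS = ("amp", "claude", "gemini")


def files_missing_agent(files: dict[str, str], new_agent: str = "opencode") -> list[str]:
    """Return paths that mention an existing agent but never the new one.

    `files` maps a path to its text. Matching is whole-word and
    case-insensitive, so "example" does not count as a mention of "amp",
    but identifiers such as "ClaudeExecutor" are also missed.
    """
    pattern = re.compile(r"\b(" + "|".join(EXISTING_AGENTS) + r")\b", re.IGNORECASE)
    missing = []
    for path, text in files.items():
        if pattern.search(text) and new_agent.lower() not in text.lower():
            missing.append(path)
    return sorted(missing)
```

A check like this would flag the failure mode described here, where the agent is implemented everywhere except the settings.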
so i don't know who's first kevin go for it actually david's been up for longer than i have let me hand over to him 00:34:06.000 |
sorry go for it i think i might have gotten my answer just from what you were doing uh is everything 00:34:15.520 |
just one shot coming in on the command line and like one prompt or how are you driving the claude code 00:34:24.000 |
yeah so you so there's actually very little emphasis on on what is happening in the middle um 00:34:32.400 |
like when it's running you can if you want to go in and see the logs and you can also follow up so for 00:34:37.200 |
example i was just checking one of these tasks that i i just started before this um and it turns out it has 00:34:45.200 |
not really done the right thing and i wanted it to add some more themes currently have like light and dark 00:34:51.760 |
i can see that it's added like purple green blue orange and red but again this is exactly what i'm 00:34:57.520 |
talking about it's added the theme but it has not added any way to choose that theme in the selector 00:35:02.880 |
so i'm going to now follow up and be like at settings uh i don't know why the settings bar is up there oh 00:35:12.480 |
maybe because i'm zoomed in uh settings can you make sure the user can actually select 00:35:21.440 |
select these cool new themes okay so you can follow up and so how is that how is that doing are you just 00:35:30.080 |
doing another like command line prompt like one shot thing uh it's actually on the same thread so 00:35:39.600 |
if we can see the raw um where's the show mode button so you can turn all of these agents on to like 00:35:52.000 |
jsonl mode which just gives you um a bit more detail and so you can see in this case uh there's a thread id at 00:36:01.760 |
the top here and if we add a command line argument uh called like resume i think with the thread id then 00:36:10.000 |
it just pops onto the conversation so it's not it's not like re-one-shotting it to be honest i would say 00:36:15.520 |
it's 50 50 whether i go for the follow-up or whether i just make a new attempt and sometimes you can make a 00:36:22.320 |
make a new attempt in a different coding agent as well so try it first in amp and then try it in claude 00:36:28.000 |
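[Editor's sketch] The follow-up mechanism described above (read a thread id from the agent's JSONL output, then pass it back with a resume argument) might look roughly like this. The `thread_id` key and the `--resume` flag are assumptions taken from the speaker's "i think" description, not confirmed options of any particular agent CLI.

```python
import json
from typing import Optional


def find_thread_id(jsonl_output: str, key: str = "thread_id") -> Optional[str]:
    """Return the first value of `key` found in JSONL output, if any."""
    for line in jsonl_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines that are not JSON
        if isinstance(event, dict) and key in event:
            return event[key]
    return None


def follow_up_argv(base_argv: list[str], thread_id: str, prompt: str,
                   resume_flag: str = "--resume") -> list[str]:
    """Build a command that continues the same thread instead of re-one-shotting."""
    return [*base_argv, resume_flag, thread_id, prompt]
```

Starting a fresh attempt instead, possibly in a different agent, simply means launching without the resume argument.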
um again maybe some more insight from trying to solve swe-bench is that uh sampling and 00:36:34.800 |
ensembling is really important and this is obviously something that claude doesn't do it's it's just you're 00:36:40.240 |
just looking at one attempt at a problem whereas the average uh swe-bench attempt is sampling like up to 00:36:46.720 |
ten times or or more at the moment um and when you read the traces there is really it's kind of 00:36:53.840 |
illogical why you know why one attempt works and another doesn't because you would have thought 00:37:01.040 |
there'd be some obvious thing like it really it read the wrong file or something like that but it'll 00:37:06.320 |
literally you know you you'll look at two attempts one will work one will not work they'll both read the 00:37:10.720 |
same file and then they'll just do two completely different code changes with almost the exact same steps 00:37:16.000 |
before that and it's you know it's like the way these things work right they're not humans they're 00:37:21.680 |
just you know that they themselves are sampling and so the response to that is probably just running a 00:37:29.360 |
few instances of the coding agent one shot and then just taking the best one is actually not a bad strategy 00:37:35.520 |
and can probably be more efficient than asking follow-up questions yeah i kind of the reason why 00:37:41.440 |
what i want to ask is for the ai in action bot uh like i've i coded that in one of these uh sessions 00:37:47.840 |
and what i think i i really want is everyone in the community just to be able to ask the bot 00:37:53.040 |
to make changes to itself uh i started working on this before claude code and it was really hard to get 00:37:59.120 |
aider to do that in a way where it would hold hold things on discord but it seems like we're just at the 00:38:06.240 |
the age now we just try it once and have it do it 10 times and pick the best one or something 00:38:11.840 |
yeah it's a different you know you'd never do that with humans you'd never get 10 humans to do the same 00:38:17.840 |
task but this is just we've got to kind of relearn how to do this stuff 00:38:25.520 |
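[Editor's sketch] The strategy discussed here (run the same task several times and keep the best attempt) is a best-of-N sampler. Below is a minimal version; `run_attempt` and `score` are hypothetical callables the caller supplies, for example one that launches an agent in its own worktree and one that counts passing tests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def best_of_n(run_attempt: Callable[[int], str],
              score: Callable[[str], float],
              n: int = 3) -> str:
    """Run `n` independent attempts in parallel and return the highest scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each attempt gets its index so it can use a separate workspace.
        attempts = list(pool.map(run_attempt, range(n)))
    return max(attempts, key=score)
```

The hard part in practice is the scoring function, which is where the lightweight tests mentioned earlier come in.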
okay um was there somebody before kevin i can't remember 00:38:30.880 |
i'm not sure kevin go for it yeah so um question i have is uh let's see a little context so i find 00:38:41.520 |
for myself when i'm writing code now using agents to do things one of the biggest limiting factors for me 00:38:50.640 |
is how quickly my brain can keep up with what is the current state of the system and like what are 00:38:56.800 |
the changes that need to be kicked off or prompted um and there seems to be like actually uh manuel and 00:39:04.960 |
i were joking like we use we're using amp code amp is also a you know that that word means current 00:39:12.160 |
right and so we have like an amperage limit in our brains of how many amps we can have going 00:39:16.240 |
at a time and like actually have things keep up and depending on how rested you are like in the 00:39:20.560 |
morning you know morning fresh maybe i can handle four amps uh you know by this time on a friday i'm 00:39:27.280 |
down to one maybe two right and so like i guess the there's also this cognitive thing we had this week 00:39:34.640 |
right where you need to think strategically and at some point you just can't think strategically anymore 00:39:39.760 |
so it's better to just not prompt the llm at that point yeah so i guess what the question that i'm so 00:39:46.320 |
that's the sort of overlying context is there still seem to be cognitive limits that i at least we are 00:39:52.320 |
running into in terms of how we think strategically about the system how we uh are able to keep updating 00:40:00.160 |
our brain state of what the system is so i'm kind of curious how you are thinking about that if you've run 00:40:04.560 |
run into similar problems and how uh what you know how vibe kanban either today or in the future 00:40:13.200 |
so just to replay that how does vibe kanban work around our changing just like how are you 00:40:25.680 |
are you still running into those does just viewing it on a kanban board actually change your 00:40:30.480 |
ability to keep track of it like is that helping you like i'm just we're trying to adapt to this 00:40:35.920 |
world of agents and that's where i found the new limits for me or just like my brain can't update fast 00:40:41.120 |
enough i you know i do feel that so my uh the the thing i keep finding myself doing is coming in to 00:40:48.880 |
review a task clicking the start dev server button going over to the dev version of whatever it is that's 00:40:55.840 |
running and then immediately having no clue what i'm what i'm supposed to be reviewing because i've 00:41:01.040 |
forgotten and it and i literally looked at it about two seconds ago and it's probably a function of 00:41:08.080 |
too much context switching i think i don't know i i i don't know how that really gets solved with better ux 00:41:16.480 |
you know this is definitely better than terminal because at least there's some visual difference like 00:41:22.240 |
this is in this column and that's in that column and there's different colors and stuff um but at the 00:41:28.000 |
same time it does encourage you to uh to do much more and and so you're context switching much more i i i 00:41:36.560 |
think the long term is that right now you can think of coding agents or just like ai assisted coding in 00:41:44.080 |
general on a spectrum of like how long does it take before a human has to intervene and you start off with 00:41:50.160 |
github copilot right where it's running for two seconds and then a human has to intervene and then 00:41:55.280 |
you upgrade and you and you get to like cursor and you know that's running for like 20 seconds and then 00:42:00.240 |
the human has to you know get involved again and then you have you know the next generation of coding 00:42:05.600 |
agents which uh kind of like the claude code things which run for two minutes and then at some point you 00:42:11.200 |
know we'll get to the the devin reality where you where you have like half an hour to an hour you know 00:42:16.080 |
running and then you can come back and check it and yeah this so this is probably just like a moment in 00:42:22.720 |
time where these things are running for two minutes and my hope is that in the future like they'll run 00:42:27.200 |
for much longer or you know we will it'll it'll do enough work like me checking like one little widget 00:42:34.960 |
and then jumping back in checking like another little widget isn't super useful you know isn't doesn't 00:42:40.240 |
feel like a great use of my time like i feel like i could be bundling these into much bigger changes 00:42:44.960 |
that i review and i and then i'm kind of in the zone with that one change for five ten minutes half 00:42:51.200 |
an hour whatever um and that's the direction of travel i think is is longer changes and you know 00:42:57.120 |
reflects the the you know different generations of tools we've been seeing for two or three years now 00:43:08.240 |
sort of um so i think a thing that i'm grappling with to some extent like one way of thinking about 00:43:19.680 |
it is just asking continually asking the question what is the job of a software engineer anymore 00:43:31.920 |
i have seen that even though like the surface layer prompts that we're doing are as you say 00:43:38.560 |
they're very simple they're very intent based they're not doing there's not very much context in 00:43:43.760 |
there i have seen that when i do that in a system that i understand well i get substantially better 00:43:50.080 |
results something subtle about how i'm prompting it is different and it gets substantially better results 00:43:55.760 |
and i'm also able to correct it when i don't uh when it goes off the rails and when i do that in a system 00:44:01.040 |
that i understand poorly it is i get systematically worse results and i often don't see the issues 00:44:10.400 |
early enough and so it goes off for a long ways before i'm like oh that was a mistake i need to come 00:44:15.440 |
back and redo that in a different way and so what that points to is that there is still 00:44:21.760 |
at least in this point and i suspect i because i'm because i'm in some ways a skeptic despite being way 00:44:30.240 |
out on the edge of this i suspect there will always be some value in actually understanding 00:44:34.240 |
the kind of architecture of your system in some way because you'll be able to guide or maybe that's 00:44:41.600 |
codified in a document somewhere that the agent can pick it up i don't know 00:44:45.120 |
but like i think there's some amount of mental modeling that is still having to happen and i'm 00:44:51.520 |
like the problem i'm grappling with in all these shapes is like how do we still build that model that 00:44:56.960 |
model used to get built as you were writing the code right you're writing and it's absorbing it and 00:45:02.400 |
we're delegating a lot of that writing so how do we still update our mental models for what the system 00:45:06.960 |
is so that we can guide these agents in the right direction maybe frameworks just having like very 00:45:13.600 |
opinionated ways of building these things that is predictable so i don't know you know it goes off and 00:45:20.240 |
does something but i kind of trust that the routes are going to go in the routes folder and the model is 00:45:24.960 |
going to go in the models folder and it's not going to put sql queries outside of the model i don't know 00:45:32.640 |
same way you'd work with um i guess contractors who have never been in your repo before it works 00:45:38.000 |
a lot better when you're using you know like java spring boot and you're hiring java spring boot 00:45:43.920 |
developers versus like you know generic java projects because of that shared context that everybody's got 00:46:02.240 |
there we go i was looking for my cursor um one thing i i find myself doing for the like oh when 00:46:07.680 |
you don't know how to how to approach something right and like it's it's hard to even decompose 00:46:13.360 |
something into tasks because you don't even know what to ask for uh one thing i do is i have manus 00:46:19.920 |
and then i'll just throw really like the most random shit at manus and say like please build 00:46:24.240 |
like what did i do yesterday i just bought a drone i was like please build like a drone motion planning 00:46:28.800 |
engine which i don't even know what that means like i just put some words together right like 00:46:35.280 |
i have a rough idea and then i'll just like do that and like launch three of them which of course is like 00:46:40.480 |
costly but also well i'll just do like o3 researches or like deep researches and paste those 00:46:47.360 |
into manus and said like can you please implement whatever this algorithm is about 00:46:51.040 |
um and the way i i think of because kevin was saying that oh source code is like the new binary 00:46:59.520 |
which is like a it's like the analogy only goes so far but in many ways running an llm on a prompt 00:47:07.920 |
you can think of it as compiling you know and sometimes you'll get like compile errors which in this 00:47:13.040 |
case is like oh what came out is not what i want it doesn't mean like oh what came out doesn't work 00:47:17.440 |
it means like oh what came out is like not necessarily what i want so i have to go back fix my code and i 00:47:24.640 |
think that analogy is is pretty interesting to to you can approach llms as kind of like a more like a 00:47:32.400 |
notebook as well right like where where suddenly you're in a notebook and you're just going to try stuff 00:47:37.680 |
out see what comes out refine it try it again um and that way of thinking about what do we do with 00:47:44.080 |
all this compute like right how how do we leverage the computers basically not just building prototypes 00:47:49.440 |
and throwing them away but like thinking of it as i'm iteratively trying to get to the code i want to 00:47:55.200 |
generate but the code itself is just a single run is just like one attempt of doing something um it's costly 00:48:06.880 |
yeah that all makes sense yeah um david so i put this in chat is there any way to have this shared 00:48:18.960 |
or running on a server or this is just only you have it on your own machine and only one person's using it 00:48:26.160 |
at a time yeah well i think there are already like quite a few options for running stuff in the server and 00:48:34.080 |
i tried a few of them out you know you can kind of link you know web-based agents to linear and 00:48:40.720 |
stuff like that but um i guess there's for whatever reason it's just nicer to kind of 00:48:48.320 |
work on localhost and um as it's very deliberately uh you know based around your local tooling and the 00:48:57.040 |
agents you install locally rather than the ones you you'd run in the cloud i guess at some point in the 00:49:03.280 |
future my computer will start going too slow for me to keep parallelizing agents and and i will need to 00:49:10.560 |
move to some kind of cloud infrastructure to run you know so that would be like 00:49:16.320 |
step one of moving to the cloud probably would be when you hit you know start that goes off and 00:49:22.480 |
it runs somewhere else and actually the sandboxing thing that was mentioned earlier you know that even 00:49:27.360 |
just for that reason you know that could be a good reason to do that and then yeah at some point if 00:49:33.440 |
you ever want to use this at scale like you'll you know your boss at a big bank is probably not going to 00:49:37.760 |
let you just roll this they're going to want some kind of overview of what's happening you know and and 00:49:44.880 |
observability and and analytics about which coding agents people are using and you know and i expect people to 00:49:51.520 |
to keep changing you know i don't think this is going to settle down and and this text box will 00:49:56.880 |
have 50 options in it you know at some point probably um and so there is potential i guess for cloud 00:50:03.920 |
version that supports some of the the more boring like observability security compliance stuff that teams 00:50:11.440 |
need to use this in for serious workflows yeah i mean i'm thinking more again as like the ai in action bot 00:50:19.120 |
giving the community the ability to make changes or you know theoretically in your case you know you're 00:50:25.120 |
talking about you don't want to open source it because the contributions you get are so terrible 00:50:29.840 |
but what if you had you know people be able to you know use this directly i mean still a terrible idea 00:50:35.520 |
but a little bit more you know what you'd expect 00:50:44.880 |
i'm i'm a little confused about this like uh i'm revealing that i'm not really actually a developer 00:50:51.680 |
and i don't know how this stuff works but like if you npx install vibe dash kanban and you go to 00:50:56.720 |
like npmjs.com you can see the like the code is there is it not like if we wanted to run this on our own 00:51:02.240 |
is it i don't understand like maybe i'm missing something the source there's difference between the 00:51:06.480 |
source code and the compiled code that you can run so the source code is not open source 00:51:11.360 |
i see i see so so like running it on a server like let's say you log into a server and just run npx 00:51:18.000 |
install kanban like is it not running on a server then it is it is you just don't have the original source 00:51:25.200 |
code i see okay okay okay okay i think i've got open code running i've just like brew install how's this 00:51:36.160 |
thing working without an api key i do not really understand but this is crazy okay all right let's try it so 00:51:48.080 |
uh add open code agent we're gonna start the dev server let's get an auspicious start going sometimes 00:51:56.800 |
this fails that's just my project setup okay looks like it's running um let's add a 00:52:07.920 |
well we're just gonna prove it works first so i've got my one plus one test and we're gonna run 00:52:14.000 |
it oh it's done it again it hasn't added it to the settings i don't know why it loves like making 00:52:19.280 |
changes and then never adding them okay but it is up here okay so it's just forgotten to add it to the 00:52:24.640 |
settings in one place this is like really classic this is this is like again you know you need to design 00:52:30.640 |
applications in a slightly different way to kind of centralize things like this um so uh you forgot to 00:52:43.920 |
add this as an option in the task details i think there's a task details toolbar yeah that sounds right 00:52:53.200 |
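[Editor's sketch] The point about designing applications to centralize things like this can be illustrated with a single registry that every UI surface derives from, so that adding an agent in one place cannot be forgotten in another. The names below are hypothetical and are not taken from the project.

```python
# Single source of truth: add a new agent here and nowhere else.
AGENTS: dict[str, str] = {
    "claude": "Claude",
    "amp": "Amp",
    "gemini": "Gemini",
    "opencode": "OpenCode",
}


def settings_options() -> list[dict[str, str]]:
    """Options for the settings dropdown, derived from the registry."""
    return [{"value": key, "label": label} for key, label in AGENTS.items()]


def toolbar_options() -> list[str]:
    """Agent keys for the task details toolbar, derived from the same registry."""
    return list(AGENTS)
```

With this shape, the settings and the task details toolbar cannot drift apart, which is the failure seen twice in this session.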
okay and in the meantime we can review the other thing i was playing around with which is add some more 00:53:00.160 |
themes so for those of you that weren't following it uh added some themes and then it forgot to 00:53:07.280 |
give me a way to actually select those themes so i asked it to add the themes as options to the 00:53:12.320 |
settings and then i realized the uh well first of all that didn't actually work so i just basically wrote 00:53:18.560 |
fix and then i realized the logo looked a bit off so this is something where i think like the playwright 00:53:25.920 |
mcp or something like that could probably be quite useful i haven't hooked that up to this 00:53:29.840 |
project yet um you can see if i start the dev server we have blue purple lots of other beautiful themes 00:53:39.280 |
there so i'm going to go ahead as promised and merge something live on camera this is obviously going 00:53:45.760 |
to fail because i'm doing it live oh my god it worked that feels great that is going to go out in today's 00:53:53.120 |
release everybody and the open code one has also followed up so let's try this one out we're gonna 00:54:00.160 |
land two today that's the real question um first of all we need to rebase this because we just merged 00:54:07.280 |
something so rebase onto main successful now we'll get the theme i just set because we've just rebased 00:54:14.720 |
um and let's try again and see okay we've got open code it's running 00:54:24.160 |
this is there's no way this is going to work this is gonna fail horribly 00:54:40.320 |
uh okay that is actually the wrong error that is because so the dev server config is messed up 00:54:54.640 |
anyway any final question this this is an hour long sean 00:55:03.840 |
uh yeah it is um and you've already done amazing we have one merge on the the channel so that's great 00:55:13.840 |
cool well yeah i mean i will it obviously it's yeah this will this will you know take a little bit of 00:55:23.760 |
back and forth probably but oh it's working what 00:55:31.040 |
okay i don't know what the i need to look into the logs and make these pretty like we have for claude 00:55:35.200 |
but that's definitely coming from the executor so we half did it guys 00:55:46.080 |
amazing work um i think the kanban form factor is useful and like obviously where something like this 00:55:54.720 |
should exist come back and thanks for all the feedback i have been following along in the comments 00:55:59.920 |
and i'm gonna look at these after the call as well there's a lot of really great suggestions after 00:56:04.880 |
the questions people are asking as well um that has been super helpful uh my email i'll just drop it in 00:56:11.280 |
the chat if you ever have feedback uh please just ping me we have a success rate of responding to 00:56:19.040 |
all feedback by delivering it as fixes within 24 hours so far for the six days of this 00:56:25.920 |
project's existence and i will endeavor to keep that up so please send it through 00:56:30.880 |
okay amazing um yeah thank you you want to end yeah all right thank you and folks who want to give 00:56:46.080 |
talks like this sign up in the discord just tell the ai in action bot that you want to give a talk and then we don't have to depend on swyx sourcing 00:56:52.880 |
no it's just my pleasure like something cool like i know the person all right thank you everyone bye