
AI in Action 15 Aug 2025: How to Build and Manage a Team of AI Agents


Chapters

0:00 Introduction to AI Engineering
0:30 Challenges with Language Models
0:58 Compliance Rules
1:06 Execution Protocol
1:45 Task Delegation
7:02 Test-Driven Development (TDD) Strategy
8:44 Prompt Engineering and Improvement
9:50 Sub-Agent Orchestration and Workflow
31:06 Sub-Agent Types & Responsibilities
31:51 Contextual Identity and Protocol
33:04 Validation Hand-off
38:08 Role of the Reviewer Agent
39:58 Local Documentation Indexing
40:37 Hooks for Automation
40:52 Database Schema and Task Management
42:01 Architecture & Framework Compliance
42:30 Playwright Debugger for Front-end Testing
43:11 Visual Anthropologist for Design Decisions
43:54 Release Gate Auditor
43:58 Data Reviewer for Business Logic

Whisper Transcript | Transcript Only Page

00:00:00.000 | Thank you.
00:00:29.980 | I got a thumbs up.
00:00:36.660 | Dirk's like the one other person besides me actually on camera.
00:00:40.360 | Yeah, I would, but I'm quite sick.
00:00:44.060 | So no worries.
00:00:46.380 | No worries.
00:00:47.120 | Thank you for still being willing to kick us off then.
00:00:51.060 | Appreciate it.
00:00:52.020 | Yeah, no worries.
00:00:53.180 | Okay.
00:00:53.500 | I finally found it.
00:00:54.700 | Sorry for the delay.
00:00:55.740 | Awesome.
00:00:57.140 | All right.
00:00:57.800 | Well, I'll hand off.
00:00:58.780 | Would you, if you'd like,
00:00:59.740 | I can keep an eye on the chat
00:01:01.900 | and just flag things as they're there
00:01:03.200 | so you don't have to pay attention to it.
00:01:04.880 | And then, yep.
00:01:06.540 | So welcome everyone.
00:01:07.560 | I'm going to hand over to Olivier.
00:01:08.640 | He kicked off probably the most epic thread
00:01:11.800 | we've had in the AI in Action channel
00:01:13.840 | talking about Claude and subagents.
00:01:15.520 | So he's going to present about his approach and workflow.
00:01:18.040 | It sounds like he's been sick
00:01:20.840 | and he's in EU time zone.
00:01:22.480 | So he'll kick us off for the first, you know, 20, 30 minutes
00:01:25.280 | and then we can continue the conversation.
00:01:26.920 | And with that, I will go off camera,
00:01:29.160 | hand it over to you, Olivier.
00:01:30.120 | Great.
00:01:32.080 | Thank you so much.
00:01:32.840 | Okay.
00:01:34.820 | So let's give you some background,
00:01:38.260 | everyone who's watching.
00:01:40.280 | Markov is an AI engineering studio.
00:01:43.660 | We're based in Antwerp and long story short,
00:01:46.660 | we're focused on the automation of accounting systems.
00:01:50.680 | So end-to-end accounting and we're also in MedTech
00:01:53.880 | now bringing both systems to production
00:01:57.120 | on like a pan-European scale.
00:02:01.160 | For me with Claude Code
00:02:03.920 | and the really big unlock
00:02:06.180 | came around
00:02:07.840 | the four model series release
00:02:12.800 | and then these subagents
00:02:14.480 | just make things a lot more reliable.
00:02:16.140 | I think it's important,
00:02:19.660 | even though I won't go into this
00:02:21.560 | in high detail today,
00:02:23.440 | that everyone is aware of
00:02:26.940 | how we get to the point
00:02:28.560 | within Markov
00:02:30.000 | where we start using Claude Code, right?
00:02:33.260 | For us,
00:02:34.960 | the big learnings have been
00:02:36.380 | that as the models
00:02:37.720 | have been getting better
00:02:38.620 | at coding.
00:02:40.300 | So actually,
00:02:41.680 | you know,
00:02:42.060 | for everyone who's been
00:02:43.420 | working with these models
00:02:44.560 | for a long time,
00:02:45.360 | I'm sure you'll know
00:02:46.480 | like a few years,
00:02:48.220 | maybe even
00:02:49.060 | double digit months ago,
00:02:52.400 | they were still making
00:02:53.080 | a lot of syntax errors.
00:02:54.020 | That's not really the case anymore.
00:02:55.200 | And the models have gotten pretty good.
00:02:57.680 | So now we've come to the point
00:03:00.000 | at least for us internally
00:03:01.080 | where we will often,
00:03:03.040 | and this is like,
00:03:04.440 | you know,
00:03:04.820 | in the scope of like internal tools
00:03:06.420 | or like prototypes for customers
00:03:08.160 | because we also do a bit
00:03:10.040 | of consulting on the side.
00:03:11.460 | You know,
00:03:12.480 | our flagship products
00:03:13.780 | are medtech and accounting.
00:03:14.840 | So fintech,
00:03:15.700 | but we also do a bit
00:03:16.440 | of consulting on the side
00:03:17.380 | because we haven't raised
00:03:18.940 | any capital
00:03:19.500 | and it's a nice way
00:03:20.660 | to generate some cash flow.
00:03:21.760 | So what we have
00:03:24.980 | is basically
00:03:25.680 | on my screen,
00:03:27.820 | you'll see we have,
00:03:28.940 | so this is kind of our directory
00:03:30.520 | and all of our prompts
00:03:33.800 | that we share within our team
00:03:35.080 | are stored in here.
00:03:35.740 | The main reason
00:03:36.980 | that I'm sharing all of this,
00:03:38.620 | even though,
00:03:39.300 | you know,
00:03:39.780 | I think a lot of people
00:03:40.500 | would consider this
00:03:41.320 | to be quite proprietary data
00:03:42.840 | is because in a few months,
00:03:45.280 | it's all going to be irrelevant anyway
00:03:46.420 | because I'm pretty sure
00:03:47.500 | that at least
00:03:49.180 | that's been my experience
00:03:49.960 | so far,
00:03:50.380 | all this kind of harnessing
00:03:51.700 | and scaffolding
00:03:52.800 | as the models get better,
00:03:54.360 | it just gets washed away.
00:03:55.900 | So maybe I'll be able
00:03:58.540 | to help some people
00:03:59.340 | in the coming months
00:04:00.940 | with this.
00:04:02.040 | So as you can see,
00:04:04.260 | we have these like
00:04:05.500 | architecting prompts.
00:04:06.520 | We can see we have
00:04:07.980 | these like subsections
00:04:09.540 | of prototypes.
00:04:11.180 | We have a Jinja,
00:04:12.200 | FastAPI,
00:04:13.100 | local,
00:04:14.120 | Azure,
00:04:15.120 | you know,
00:04:15.700 | so this is like for,
00:04:16.860 | this is a prompt
00:04:17.820 | that we use
00:04:18.740 | to generate,
00:04:19.820 | to create prototype plans
00:04:21.940 | for like really,
00:04:24.840 | really quick prototypes
00:04:26.140 | that will just like
00:04:27.380 | run on our machine.
00:04:28.320 | Like we don't even
00:04:28.940 | deploy to Azure.
00:04:30.940 | It's like for super quick
00:04:32.020 | customer demos
00:04:32.760 | and this prompt
00:04:34.520 | is basically like
00:04:35.780 | it's catered fully
00:04:37.120 | to our stack.
00:04:38.200 | It has like a lot
00:04:40.100 | of methodologies
00:04:43.880 | that we use
00:04:44.640 | on how we want
00:04:46.000 | our plans
00:04:46.560 | to be constructed.
00:04:47.240 | So we always use
00:04:48.560 | OpenAI agents,
00:04:49.860 | we use LiteLLM
00:04:51.060 | with OpenRouter.
00:04:51.940 | So all our code
00:04:53.560 | is model agnostic
00:04:54.880 | or at least the prototypes.
00:04:56.040 | The big downside
00:04:57.860 | with this,
00:04:58.520 | I'm not sure
00:04:59.320 | if there's many
00:04:59.780 | other Europeans here
00:05:00.720 | is that
00:05:01.100 | it's more difficult
00:05:03.540 | to be GDPR compliant
00:05:05.380 | when you use
00:05:06.060 | LiteLLM
00:05:06.780 | and OpenRouter.
00:05:07.500 | So in production,
00:05:08.200 | it's usually
00:05:08.980 | OpenAI agents,
00:05:11.620 | and then some
00:05:12.980 | European endpoint.
00:05:14.680 | But for prototyping,
00:05:16.080 | it doesn't matter as much.
00:05:17.960 | And this is really nice
00:05:19.080 | because this way
00:05:20.620 | you can just easily
00:05:21.280 | swap models in and out
00:05:22.960 | and if it is already
00:05:24.320 | included in your
00:05:25.140 | planning stage,
00:05:26.000 | then it just makes
00:05:28.440 | life a lot easier.
00:05:30.220 | Then we usually
00:05:32.440 | go with hexagonal
00:05:34.640 | like structures
00:05:35.900 | or hexagonal
00:05:36.800 | software architecture.
00:05:37.580 | We define a bunch
00:05:40.780 | of variables
00:05:41.960 | that are important
00:05:43.040 | to us specifically
00:05:44.080 | for these local
00:05:45.760 | prototypes.
00:05:46.260 | We usually use
00:05:47.060 | SQLite
00:05:47.940 | and FastAPI.
00:05:49.980 | And then if we
00:05:50.700 | wanted to go really fast,
00:05:51.840 | we also have
00:05:52.320 | like Streamlit.
00:05:53.020 | So some Streamlit
00:05:54.240 | prototypes
00:05:54.920 | which don't even
00:05:57.200 | have like FastAPI.
00:05:58.300 | We also have one
00:05:59.140 | where we use
00:05:59.640 | Streamlit as a UI
00:06:00.900 | with a FastAPI
00:06:02.620 | and then two containers
00:06:03.500 | which we can then
00:06:04.460 | deploy to Azure
00:06:05.580 | if we want to.
00:06:06.300 | So that's kind of
00:06:08.020 | the essence of it, right?
00:06:09.160 | And you'll see
00:06:10.500 | like there's nothing
00:06:11.320 | in here
00:06:12.280 | that really takes us
00:06:13.060 | to production.
00:06:13.600 | That's because,
00:06:14.540 | well,
00:06:15.300 | you don't,
00:06:16.240 | you don't really
00:06:16.680 | architect software
00:06:17.920 | that often
00:06:18.580 | to take it to production,
00:06:20.020 | right?
00:06:20.500 | So that's usually
00:06:21.420 | like custom work
00:06:22.440 | that we only have to do
00:06:23.560 | when we like
00:06:25.140 | want to start
00:06:25.800 | a new flagship
00:06:26.320 | product or something.
00:06:27.360 | That's not really
00:06:28.160 | part of this.
00:06:28.780 | But if it were,
00:06:30.200 | then it would be
00:06:30.800 | something like FastAPI
00:06:32.360 | with Pydantic
00:06:33.900 | and then like
00:06:34.640 | if there's a frontend,
00:06:35.500 | frontend,
00:06:36.280 | it would be something
00:06:36.880 | TypeScript
00:06:37.560 | React-based, right?
00:06:39.380 | I think that's
00:06:40.360 | what most people
00:06:40.900 | are doing these days.
00:06:41.700 | So let's,
00:06:46.300 | and then I also
00:06:46.760 | have this agent
00:06:47.660 | blueprint
00:06:49.180 | prompt
00:06:52.560 | which will basically
00:06:53.820 | generate a plan
00:06:55.460 | of our agents
00:06:59.180 | that we want
00:06:59.720 | to create
00:07:00.060 | for a certain project.
00:07:01.480 | And we also have
00:07:02.160 | a test,
00:07:02.540 | oh yeah,
00:07:03.060 | we also have
00:07:03.720 | the test-driven
00:07:04.580 | development prompt.
00:07:06.800 | So to recap
00:07:09.020 | how it usually works
00:07:10.020 | is we have
00:07:11.200 | some idea
00:07:11.760 | of some prototype
00:07:12.580 | or some application
00:07:13.340 | that we want to make.
00:07:14.040 | Then we will pick
00:07:14.900 | the stack that we want.
00:07:16.540 | So FastAPI Jinja
00:07:18.140 | or Streamlit
00:07:18.980 | or something else.
00:07:21.480 | And then at the bottom
00:07:23.960 | of these prompts
00:07:24.640 | you can always see,
00:07:25.460 | sorry,
00:07:26.580 | you can always see
00:07:27.740 | like paste your idea
00:07:28.660 | below this line.
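
The "paste your idea below this line" convention described here can be sketched in a few lines of Python. This is only an illustration of the pattern, not the speaker's actual prompt files: the marker text, the template wording, and the `build_prompt` helper are all assumptions. Because the result is a plain string, it can be pasted into whichever model is currently best at architecting.

```python
# Hypothetical marker; the real prompts only say "paste your idea below this line".
IDEA_MARKER = "--- paste your idea below this line ---"


def build_prompt(template: str, idea: str) -> str:
    """Return the reusable template with the idea placed under the marker."""
    if IDEA_MARKER not in template:
        raise ValueError("template is missing the idea marker")
    # Keep everything above the marker, drop anything left over below it.
    head, _, _ = template.partition(IDEA_MARKER)
    return f"{head}{IDEA_MARKER}\n{idea.strip()}\n"


# Illustrative template text, not taken from the talk.
template = (
    "You are a software architect. Produce a detailed plan for a\n"
    "FastAPI + Jinja prototype that runs locally.\n"
    + IDEA_MARKER + "\n"
)

prompt = build_prompt(template, "  A tool that drafts social media ads.  ")
print(prompt)
```
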
00:07:29.340 | So this is nice
00:07:32.520 | because that means
00:07:33.400 | that whatever provider,
00:07:34.760 | whatever model
00:07:35.560 | is best at
00:07:36.620 | architecting,
00:07:37.600 | you know,
00:07:38.320 | for like a week
00:07:39.560 | it was Gemini
00:07:41.200 | 2.5 Deep Think
00:07:42.880 | and before that
00:07:44.500 | it was o3-pro
00:07:45.460 | and now it's,
00:07:48.000 | you know,
00:07:48.600 | GPT-5 Pro again.
00:07:50.800 | So you just swap
00:07:51.880 | between the models
00:07:52.560 | and usually
00:07:54.440 | how I go about
00:07:55.220 | is if I want to
00:07:56.240 | create some
00:07:56.900 | like internal tool,
00:07:58.820 | to give some context,
00:08:01.520 | our internal tools,
00:08:02.900 | like we build them
00:08:05.120 | and then we actually
00:08:05.700 | keep building on them.
00:08:06.500 | So we want
00:08:06.880 | to make sure
00:08:07.200 | that the first time
00:08:07.960 | we make them
00:08:08.500 | is like already
00:08:09.440 | maintainable,
00:08:10.920 | scalable code.
00:08:12.260 | So no spaghetti code
00:08:13.560 | and how it usually works
00:08:16.220 | is we'll,
00:08:17.200 | you know,
00:08:17.760 | depending on what
00:08:18.340 | I want to do,
00:08:18.820 | I will pick a prompt
00:08:19.740 | and then I will just
00:08:21.020 | paste this prompt in
00:08:21.940 | and then I will
00:08:23.380 | ideate with
00:08:24.480 | ChatGPT.
00:08:25.340 | So I will just
00:08:26.080 | use voice messages,
00:08:27.540 | transcribe,
00:08:28.520 | and basically
00:08:29.420 | spend like half an hour
00:08:30.900 | to an hour
00:08:31.460 | just going through
00:08:32.540 | the motions
00:08:33.120 | of what does this
00:08:34.320 | actually look like?
00:08:35.080 | What is the user
00:08:35.740 | journey that I want?
00:08:36.840 | So the user stories
00:08:38.380 | and what are the
00:08:39.700 | functional requirements?
00:08:40.500 | And then
00:08:41.560 | when I've gone
00:08:43.140 | through that,
00:08:43.680 | then I'll take
00:08:44.240 | the output of that
00:08:45.020 | and I'll start
00:08:45.860 | a new session
00:08:46.400 | and I basically
00:08:47.100 | paste it underneath
00:08:47.780 | here and then
00:08:48.460 | you get a very
00:08:49.120 | nice verbose plan.
00:08:52.040 | Then when you have
00:08:52.760 | that plan
00:08:53.240 | you can put that
00:08:54.920 | plan underneath
00:08:55.800 | this prompt
00:08:56.520 | which will basically
00:08:57.900 | create a whole
00:08:59.460 | test set
00:09:00.920 | for your code
00:09:02.840 | because,
00:09:03.740 | and we'll get to that
00:09:04.800 | in a bit,
00:09:05.220 | so within Claude Code
00:09:06.560 | the only way
00:09:08.420 | you can really get it
00:09:09.320 | to work reliably
00:09:09.940 | from my experience
00:09:10.760 | at least,
00:09:11.200 | it's obviously
00:09:12.080 | limited,
00:09:12.920 | is to really double down
00:09:15.640 | on test-driven development
00:09:16.760 | but these models
00:09:19.000 | are quite prone
00:09:19.900 | to reward hacking
00:09:21.700 | so you really
00:09:23.140 | want to make sure
00:09:23.760 | that the test-driven
00:09:25.420 | development
00:09:25.940 | for that in itself
00:09:27.720 | you also have
00:09:28.280 | like a standalone plan.
00:09:30.300 | So you'll see
00:09:31.660 | we kind of go through
00:09:32.960 | the motions
00:09:34.040 | of doing,
00:09:35.580 | you know,
00:09:35.880 | you want to do unit tests,
00:09:37.200 | end-to-end tests,
00:09:38.120 | smoke tests,
00:09:38.940 | live API keys,
00:09:40.160 | kind of everything.
00:09:42.920 | Then when this is done,
00:09:45.000 | so you'll have two plans,
00:09:46.560 | right,
00:09:46.840 | you'll have your test-driven
00:09:47.880 | development plan,
00:09:49.300 | so you have your test-plan.md
00:09:50.940 | and then you have
00:09:52.160 | your plan.md,
00:09:53.000 | right,
00:09:53.400 | and these two files
00:09:54.480 | depending on
00:09:55.880 | the complexity
00:09:57.680 | of the application
00:09:58.260 | that you're making,
00:09:58.940 | either you can
00:10:00.040 | separate them
00:10:00.960 | and have like
00:10:03.260 | two markdown files
00:10:03.920 | you're working with
00:10:04.460 | or you just append
00:10:05.680 | one after the other
00:10:06.540 | and you have your plan.md
00:10:08.100 | to get started.
00:10:08.800 | So that's kind of
00:10:11.620 | the whole journey
00:10:12.100 | before you get to
00:10:13.000 | actually using Claude Code
00:10:14.600 | and then
00:10:15.800 | this is the fun part.
00:10:18.220 | Real quick question
00:10:20.260 | came in in the chat
00:10:21.160 | before we move on
00:10:21.960 | from this section
00:10:22.580 | which is around
00:10:23.280 | for all of these
00:10:24.360 | sort of high-level
00:10:25.000 | guidance prompts,
00:10:25.820 | are these also generated
00:10:27.940 | by a model?
00:10:28.600 | Do you have like
00:10:29.340 | a base prompt
00:10:30.100 | that you use
00:10:30.540 | to generate them
00:10:31.440 | or like
00:10:31.840 | how are you managing
00:10:32.680 | those kind of prompts
00:10:34.680 | over time?
00:10:35.220 | Yeah,
00:10:36.420 | that's a good question.
00:10:37.800 | I mean,
00:10:39.040 | I've been working
00:10:39.720 | with these models
00:10:40.320 | for a very long time
00:10:41.200 | so it's kind of
00:10:42.260 | intuitive to me
00:10:43.020 | but we do have,
00:10:44.240 | so we have prompts
00:10:47.000 | for everything,
00:10:47.540 | right,
00:10:47.940 | and let's see.
00:10:50.580 | So depending on the subject matter
00:10:52.580 | I will usually,
00:10:55.300 | like if it's something
00:10:56.200 | I don't know anything about,
00:10:57.240 | like for example,
00:10:57.980 | we have this MedTech project,
00:11:00.500 | like flagship product,
00:11:01.620 | right,
00:11:01.900 | and obviously I'm not a doctor
00:11:03.400 | so I want to do a lot
00:11:06.360 | of deep research.
00:11:06.900 | So we have these prompts
00:11:08.800 | where you like,
00:11:09.720 | you basically,
00:11:10.360 | same principle,
00:11:11.340 | to put your query underneath
00:11:13.320 | and it will generate
00:11:15.140 | a highly verbose,
00:11:16.480 | like super detailed
00:11:17.580 | deep research query
00:11:19.340 | for you,
00:11:19.940 | which you can then again
00:11:21.100 | put into
00:11:21.840 | ChatGPT
00:11:22.460 | or whatever model
00:11:23.420 | you want to use
00:11:24.020 | for deep research.
00:11:26.440 | And then we also have
00:11:27.760 | like user query
00:11:30.200 | to prompt,
00:11:30.840 | same principle,
00:11:32.140 | it's a prompt
00:11:34.440 | that helps you
00:11:35.020 | create prompts,
00:11:35.700 | right,
00:11:35.940 | so you basically,
00:11:36.540 | you put your thoughts
00:11:39.840 | underneath
00:11:40.400 | and from my experience,
00:11:42.200 | I always,
00:11:43.380 | always,
00:11:43.780 | always just do
00:11:44.860 | speech to text,
00:11:46.620 | right,
00:11:47.180 | because with the bandwidth
00:11:48.300 | that you have
00:11:49.040 | when you're typing,
00:11:50.140 | you'll always,
00:11:51.760 | you're always going
00:11:52.440 | to take shortcuts
00:11:53.140 | whereas
00:11:53.780 | when you can just
00:11:55.080 | talk to these models,
00:11:56.040 | you will just
00:11:58.580 | use a lot more
00:11:59.620 | like,
00:12:00.100 | you know,
00:12:00.860 | of your own tokens,
00:12:01.900 | your bandwidth
00:12:02.900 | is much larger
00:12:03.800 | and you'll,
00:12:04.920 | like from my experience
00:12:06.040 | at least,
00:12:06.920 | obviously anecdotally,
00:12:08.240 | a lot of the times
00:12:09.920 | when you're like talking
00:12:10.880 | to these models
00:12:13.120 | or like recording
00:12:13.860 | this first message,
00:12:14.760 | halfway through
00:12:16.060 | you're going to realize
00:12:16.920 | that something
00:12:18.000 | you said actually
00:12:18.920 | doesn't make sense
00:12:19.660 | and then you just cancel
00:12:20.600 | and you go again.
00:12:21.320 | So you're kind of,
00:12:22.200 | you're kind of iterating
00:12:22.960 | with yourself
00:12:23.680 | and then when you're happy.
00:12:25.520 | Talking makes you think
00:12:26.720 | in a different way
00:12:27.360 | than writing does.
00:12:28.100 | Yeah, exactly.
00:12:29.220 | And then you put
00:12:31.860 | your idea underneath
00:12:33.120 | and then you get a prompt,
00:12:34.220 | right?
00:12:34.660 | And then
00:12:36.160 | for actually improving
00:12:38.320 | your prompts,
00:12:38.960 | that's kind of
00:12:41.160 | this nice thing
00:12:42.240 | with these models.
00:12:42.940 | So usually
00:12:44.380 | what I do
00:12:45.340 | is I just find
00:12:46.240 | like all the papers
00:12:47.400 | and like best practices
00:12:48.840 | that the labs release.
00:12:50.580 | So OpenAI,
00:12:51.980 | Anthropic,
00:12:53.060 | they always have this,
00:12:54.140 | their, like, best practices
00:12:55.720 | for prompting your models.
00:12:57.440 | So I will scrape that
00:12:59.040 | and then based on
00:13:00.840 | the documents
00:13:02.480 | that I've scraped,
00:13:03.200 | I will basically
00:13:04.060 | improve
00:13:05.060 | my prompt improver,
00:13:07.260 | right?
00:13:07.700 | And then
00:13:08.520 | you use that
00:13:09.980 | scraped documentation,
00:13:10.800 | you use the most
00:13:11.840 | intelligent model
00:13:12.600 | that's available to you
00:13:13.560 | and you just like
00:13:14.620 | iterate on
00:13:15.500 | your prompt improver.
00:13:17.340 | So it's kind of funny
00:13:18.460 | because it scales
00:13:19.280 | pretty well
00:13:19.700 | with model intelligence.
00:13:20.640 | And we also have
00:13:23.020 | these like best practices,
00:13:24.640 | right?
00:13:25.080 | So we have like this
00:13:25.900 | best practices
00:13:28.040 | prompting playbook.
00:13:29.300 | You know,
00:13:30.880 | you see you have
00:13:31.480 | some XML stuff here
00:13:32.540 | that's, you know,
00:13:33.200 | specifically for
00:13:34.320 | the Claude model series
00:13:36.200 | and then we have
00:13:37.020 | some best practices
00:13:38.700 | for the GPT-5 models.
00:13:41.540 | But yeah,
00:13:43.680 | that's a long story short,
00:13:44.600 | that's kind of
00:13:45.340 | how we go about it.
00:13:46.180 | So you just
00:13:47.580 | scrape what's out there
00:13:48.700 | and then you
00:13:50.160 | take that context
00:13:51.060 | and you ask
00:13:53.580 | the smartest model
00:13:54.820 | that's available to you
00:13:56.140 | to, you know,
00:13:58.120 | make it make sense
00:13:58.980 | and help you
00:13:59.880 | with prompt engineering.
00:14:00.960 | But it's always,
00:14:01.720 | it's very important
00:14:02.500 | that you,
00:14:03.000 | like this whole phase
00:14:05.140 | of planning
00:14:05.780 | and prompt engineering,
00:14:06.940 | it's not a hands-off thing.
00:14:08.500 | You actually have to monitor it
00:14:09.920 | and put in your own time,
00:14:12.440 | your own thoughts,
00:14:13.180 | else it's not going to work.
00:14:18.200 | So moving on.
00:14:21.440 | Right now,
00:14:25.160 | the flow that we have
00:14:28.740 | in Claude Code
00:14:30.100 | is basically
00:14:32.340 | you start with your plan,
00:14:34.900 | right?
00:14:35.320 | So you use your architect prompts
00:14:37.200 | and the moment you actually
00:14:38.400 | start in Claude Code,
00:14:40.040 | and maybe I can show you
00:14:41.240 | an example.
00:14:41.900 | Let me see if I have one here.
00:14:43.240 | Is this?
00:14:44.680 | This is maybe a bit too proprietary.
00:14:46.760 | Wait, let me,
00:14:47.460 | let me grab something
00:14:48.900 | from an internal repository.
00:14:50.560 | OK, so this is an example
00:15:00.840 | of a plan.
00:15:01.560 | This is for a nonprofit
00:15:03.500 | that I'm in
00:15:06.180 | where we basically,
00:15:07.680 | we want to do some AI.
00:15:10.120 | We want to use AI
00:15:11.400 | basically for managing
00:15:12.780 | our social media advertising.
00:15:14.740 | So we wanted to create
00:15:18.140 | a Jinja front-end,
00:15:19.440 | FastAPI, Pydantic.
00:15:20.460 | We use the OpenAI Agents SDK,
00:15:23.740 | and this is the plan, right?
00:15:26.000 | So you can see it's
00:15:26.620 | over a thousand lines.
00:15:28.480 | It's like one hour
00:15:29.800 | of iterating with GPT-5 Pro
00:15:32.940 | using our planning prompts.
00:15:35.000 | We have our test plan.
00:15:36.280 | As you can see,
00:15:37.460 | we have the test plan
00:15:38.440 | appended underneath our plan,
00:15:41.000 | the plan.md, so to speak.
00:15:43.040 | So the first part
00:15:44.220 | is focused more on
00:15:45.400 | like software architecture,
00:15:46.380 | data models,
00:15:47.460 | user stories, et cetera.
00:15:49.720 | And the second part
00:15:51.880 | is focused purely
00:15:53.300 | on the testing,
00:15:55.980 | the test-driven development
00:15:56.900 | of it all.
00:15:57.480 | Then we have our
00:16:00.500 | CLAUDE.md file.
00:16:01.340 | Very heavily focused
00:16:04.900 | on orchestration.
00:16:05.640 | So in this setup,
00:16:07.120 | your CLAUDE.md,
00:16:09.220 | let me, I can just
00:16:10.260 | start one here.
00:16:13.220 | So in this setup,
00:16:19.240 | your root agent,
00:16:21.680 | so to speak,
00:16:22.320 | is never actually
00:16:23.020 | going to write any code,
00:16:24.300 | right?
00:16:24.620 | All it does
00:16:25.180 | is orchestration
00:16:26.020 | and how it will
00:16:28.740 | always start
00:16:29.400 | is, you know,
00:16:31.840 | let's say you have
00:16:33.100 | a clean repository
00:16:33.980 | and you want
00:16:36.120 | to start
00:16:36.520 | with the implementation
00:16:37.160 | of your plan.
00:16:38.080 | You open it up,
00:16:39.940 | you tell it
00:16:40.360 | read CLAUDE.md,
00:16:41.160 | and then
00:16:42.020 | this isn't actually
00:16:43.640 | the root repository,
00:16:44.540 | right?
00:16:45.160 | Which is why
00:16:45.760 | it's not looking
00:16:46.820 | for the plan.
00:16:47.400 | It's a sub-repository.
00:16:48.880 | It's like just a prompt
00:16:49.980 | stash, so to speak.
00:16:53.260 | But what it would do
00:16:54.440 | is if you tell it,
00:16:55.280 | okay, read this plan,
00:16:56.260 | it will read the plan,
00:16:57.380 | then that's when
00:17:00.080 | the automation
00:17:01.000 | or the autonomous
00:17:02.400 | engineering comes into play.
00:17:03.740 | So the first thing
00:17:05.900 | it will do
00:17:06.420 | is it will summon
00:17:08.920 | or it will execute
00:17:10.500 | the architect agent.
00:17:12.860 | And what this will do
00:17:14.760 | is it will basically
00:17:16.300 | turn our plan,
00:17:17.760 | our markdown file
00:17:18.840 | into a task list.
00:17:20.260 | And for this task list,
00:17:22.240 | we have an MCP server.
00:17:24.240 | So I wrote
00:17:26.580 | my own MCP server
00:17:29.000 | for this,
00:17:30.120 | and this one is empty.
00:17:31.300 | Let me grab one
00:17:32.620 | from a different
00:17:33.060 | repository real quick.
00:17:34.380 | I believe this one.
00:17:40.180 | So it's very simple.
00:17:41.600 | It's a SQLite database.
00:17:43.060 | And what it,
00:17:46.840 | so the architect,
00:17:47.440 | what it does
00:17:48.260 | is it will go
00:17:48.880 | through the plan
00:17:49.500 | and it will basically
00:17:50.880 | create vertically
00:17:53.360 | sliced task lists
00:17:55.760 | for maximum
00:17:56.760 | parallelization.
00:17:58.560 | And then you can see
00:17:59.860 | it's a little bit bigger.
00:18:02.620 | Okay.
00:18:06.480 | So it will basically
00:18:08.560 | create tasks
00:18:09.480 | and then assign them
00:18:10.700 | to agents
00:18:11.280 | and execute them
00:18:13.820 | in parallel, right?
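
One way to picture how a task list could be sliced for parallel execution is to only run tasks together when the files they may edit do not overlap. The sketch below is an assumption about how such batching might work, not a description of the speaker's MCP server: the `Task` type, the `next_batch` helper, the file paths, and the greedy strategy are all hypothetical. The cap of four comes from the speaker's later remark that four executors can run in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    files: frozenset[str]  # files this task is allowed to edit (hypothetical field)


def next_batch(pending: list[Task], max_parallel: int = 4) -> list[Task]:
    """Greedily pick tasks whose editable files do not overlap."""
    batch: list[Task] = []
    claimed: set[str] = set()
    for task in pending:
        if len(batch) == max_parallel:
            break
        # Only schedule the task if no already-scheduled task touches its files.
        if claimed.isdisjoint(task.files):
            batch.append(task)
            claimed |= task.files
    return batch


# Illustrative tasks; the paths are made up for the example.
pending = [
    Task("T1", frozenset({"app/routes/ads.py"})),
    Task("T2", frozenset({"app/models/ad.py"})),
    Task("T3", frozenset({"app/routes/ads.py", "app/templates/ads.html"})),
    Task("T4", frozenset({"tests/test_smoke.py"})),
]

batch = next_batch(pending)
print([t.task_id for t in batch])  # T3 waits because it shares a file with T1
```
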
00:18:14.900 | So what's very important
00:18:17.420 | is that every agent,
00:18:21.020 | so we have one main agent
00:18:22.320 | to write code
00:18:23.120 | in this one.
00:18:23.840 | So it's the executor,
00:18:24.960 | it's Sonnet,
00:18:26.020 | it has access to Bash.
00:18:28.140 | It basically has access
00:18:29.040 | to all the tools
00:18:29.840 | and it also has access
00:18:31.340 | to all our MCP servers
00:18:34.940 | that we use for this.
00:18:35.680 | So we use Playwright
00:18:36.660 | for, like, runtime testing
00:18:40.100 | and then we have this
00:18:41.920 | core hub light,
00:18:43.700 | that's the MCP server.
00:18:44.800 | And what it'll do
00:18:46.920 | is it gets a task.
00:18:48.340 | The orchestrator agent
00:18:50.340 | is going to give it
00:18:51.440 | a contextual identity.
00:18:52.800 | So when the orchestrator
00:18:55.360 | summons or invokes
00:18:57.520 | an executor,
00:18:58.680 | it will give it
00:18:59.600 | a contextual identity
00:19:00.480 | and within that identity
00:19:02.300 | it will basically tell it
00:19:03.500 | this is your task,
00:19:04.800 | these are the files
00:19:06.460 | that you can edit,
00:19:07.060 | these are the tests
00:19:09.160 | that you have to run
00:19:09.960 | and this is the output
00:19:11.740 | that you have to generate
00:19:12.560 | and the output
00:19:13.500 | that it generates
00:19:14.500 | is a task artifact.
00:19:16.060 | Let me see
00:19:18.320 | if I can find one here.
00:19:19.580 | Right, so it's always
00:19:20.320 | red to green
00:19:21.540 | test-driven development.
00:19:22.740 | So it's very small.
00:19:28.680 | Let me see if there's
00:19:32.600 | a better way
00:19:33.080 | to display this.
00:19:38.080 | Right, so we use uv,
00:19:40.020 | we run tests
00:19:41.340 | and it will
00:19:42.320 | generate an artifact
00:19:44.120 | and after every time
00:19:47.300 | an executor agent runs,
00:19:51.640 | the orchestrator
00:19:54.420 | will always invoke
00:19:55.540 | a reviewer agent
00:19:56.600 | because you can only go
00:19:58.880 | so in our system
00:20:00.020 | in this MCP server
00:20:01.540 | which is being used
00:20:02.780 | by the orchestrator
00:20:03.620 | to do the task management,
00:20:04.840 | you can only go
00:20:07.340 | so a completed task
00:20:12.760 | so the first iteration
00:20:14.880 | of writing code
00:20:15.960 | will always go
00:20:17.400 | to the status
00:20:18.600 | needs review
00:20:19.580 | and that's the state machine
00:20:21.180 | so even the orchestrator
00:20:23.000 | cannot simply
00:20:23.840 | move a task
00:20:25.040 | from like in progress
00:20:26.120 | to approved
00:20:27.660 | or completed,
00:20:28.220 | it always has to go
00:20:29.100 | through in review
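
The rule that nothing can jump from in progress straight to completed is a small state machine, and it can be enforced in the task store itself rather than left to the prompt. Below is a minimal sketch using Python's built-in `sqlite3`. The table layout, the status names, and the `move_task` helper are assumptions, and the path from review back to in progress (a rejected review) is an added guess; the talk only states that every task has to pass through review.

```python
import sqlite3

# Allowed status transitions; names are hypothetical stand-ins for the real ones.
ALLOWED = {
    "todo": {"in_progress"},
    "in_progress": {"needs_review"},
    "needs_review": {"completed", "in_progress"},  # back to in_progress = review rejected
    "completed": set(),
}


def move_task(conn: sqlite3.Connection, task_id: str, new_status: str) -> None:
    """Change a task's status, refusing any transition that skips review."""
    row = conn.execute(
        "SELECT status FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None:
        raise KeyError(task_id)
    current = row[0]
    if new_status not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new_status}")
    conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (new_status, task_id))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO tasks VALUES ('T1', 'in_progress')")

# Even the orchestrator cannot mark the task done without a review.
try:
    move_task(conn, "T1", "completed")
    skipped_review = True
except ValueError:
    skipped_review = False

move_task(conn, "T1", "needs_review")
move_task(conn, "T1", "completed")
final_status = conn.execute("SELECT status FROM tasks WHERE id = 'T1'").fetchone()[0]
print(skipped_review, final_status)
```
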
00:20:30.540 | and that means
00:20:32.020 | the orchestrator
00:20:33.020 | will trigger
00:20:33.820 | this reviewer agent
00:20:35.680 | and the reviewer agent
00:20:37.700 | basically checks
00:20:39.020 | two things,
00:20:39.580 | it checks the output
00:20:40.920 | of the
00:20:42.120 | executor agent
00:20:43.540 | so it will check
00:20:44.320 | it will read
00:20:45.500 | the artifact
00:20:46.060 | the artifact
00:20:47.420 | is basically
00:20:48.240 | a highly verbose
00:20:50.260 | summary
00:20:51.620 | of what the
00:20:52.680 | executor agent
00:20:53.700 | has done
00:20:54.300 | let me see
00:20:55.000 | I think the template
00:20:55.760 | should be
00:20:56.620 | in here somewhere
00:20:57.940 | right
00:20:59.080 | so this
00:20:59.620 | this is a completion report
00:21:04.460 | the task ID
00:21:05.520 | that they got
00:21:06.900 | from the orchestrator
00:21:07.700 | right
00:21:08.360 | so the orchestrator
00:21:09.200 | will give
00:21:09.780 | the executor agent
00:21:10.900 | the task ID
00:21:12.060 | that's the contextual
00:21:13.920 | what do we call
00:21:17.100 | contextual identity
00:21:20.580 | and then
00:21:22.380 | I'm sorry
00:21:23.580 | when the agent
00:21:24.200 | is done
00:21:24.640 | it will create
00:21:27.280 | this artifact
00:21:28.180 | so the implementation
00:21:29.760 | context
00:21:30.560 | the stack
00:21:33.220 | the architecture
00:21:34.400 | adherence to the plan
00:21:35.880 | so every agent
00:21:37.160 | so every instance
00:21:38.940 | of this executor agent
00:21:40.160 | and remember
00:21:40.840 | we can run
00:21:41.560 | four in parallel
00:21:42.420 | we'll first read
00:21:43.700 | the plan
00:21:44.540 | it will read
00:21:45.920 | the framework guidelines
00:21:47.040 | it gets
00:21:48.520 | it gets
00:21:49.600 | from the orchestrator
00:21:51.180 | it gets
00:21:51.840 | a concrete list
00:21:53.060 | of all the
00:21:53.540 | files
00:21:54.420 | that it's able
00:21:55.100 | to edit
00:21:55.460 | so it will output
00:21:57.460 | in this summary
00:21:58.900 | so this artifact
00:21:59.680 | that we write
00:22:00.260 | to our SQLite
00:22:01.060 | database
00:22:01.940 | which is our
00:22:02.620 | MCP server
00:22:03.260 | all the files
00:22:04.300 | that it touched
00:22:04.840 | what it did
00:22:06.760 | what it built
00:22:07.680 | and the key decisions
00:22:09.360 | it made
00:22:09.760 | the test results
00:22:11.400 | passing tests
00:22:12.440 | coverage
00:22:13.100 | all tests
00:22:13.700 | passing
00:22:14.240 | validation handoff
00:22:16.440 | so ready for validation
00:22:17.900 | when it has
00:22:19.440 | the status
00:22:19.920 | that means
00:22:20.480 | it will be
00:22:21.400 | handed over
00:22:22.200 | to the
00:22:22.920 | to the reviewer
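The completion report described above (task ID, files touched, what was built, key decisions, test results, validation hand-off status) could be sketched roughly as follows. This is a minimal illustration, not the speaker's actual template: the field names, the `artifacts` table, and the `ready_for_validation` status string are all assumptions made for the example.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CompletionReport:
    """Hypothetical shape of the artifact an executor agent writes when done."""
    task_id: str                       # contextual identity given by the orchestrator
    files_touched: List[str]           # every file the agent edited
    summary: str                       # what it did and what it built
    key_decisions: List[str] = field(default_factory=list)
    tests_passed: int = 0
    tests_failed: int = 0
    coverage_percent: float = 0.0
    test_output: str = ""              # raw test output, kept as proof for the reviewer
    status: str = "ready_for_validation"


def save_report(conn: sqlite3.Connection, report: CompletionReport) -> None:
    """Store the report as JSON in an (assumed) artifacts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        "task_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO artifacts (task_id, body) VALUES (?, ?)",
        (report.task_id, json.dumps(asdict(report))),
    )
    conn.commit()


def load_report(conn: sqlite3.Connection, task_id: str) -> Optional[CompletionReport]:
    """Read a report back, as a reviewer agent would before checking the proof."""
    row = conn.execute(
        "SELECT body FROM artifacts WHERE task_id = ?", (task_id,)
    ).fetchone()
    if row is None:
        return None
    return CompletionReport(**json.loads(row[0]))
```

Keeping the raw test output in the artifact is what lets a later reviewer compare the claimed results against what actually ran.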
00:22:25.200 | and then
00:22:26.560 | we have
00:22:28.420 | actual
00:22:30.260 | outputs
00:22:31.380 | of the test
00:22:32.060 | right
00:22:32.420 | so what we
00:22:33.320 | mandate
00:22:35.060 | from these
00:22:35.860 | executor agents
00:22:36.840 | is that
00:22:37.840 | they have to
00:22:39.500 | basically
00:22:40.480 | deliver a
00:22:42.440 | let me see
00:22:43.580 | search
00:22:44.860 | real quick
00:22:45.520 | right
00:22:47.120 | so this
00:22:48.520 | kind of
00:22:49.100 | an important
00:22:49.720 | business
00:22:50.500 | value
00:22:50.960 | truth
00:22:52.620 | because
00:22:53.400 | these
00:22:55.240 | models
00:22:56.820 | very prone
00:22:58.020 | to reward
00:22:58.420 | hacking
00:22:59.580 | you have
00:23:00.160 | to force
00:23:00.880 | to show
00:23:01.940 | proof
00:23:03.420 | the tests
00:23:05.000 | that they
00:23:05.860 | created
00:23:08.340 | red to green
00:23:09.340 | test driven
00:23:09.840 | development
00:23:10.240 | so tests
00:23:10.780 | that they created
00:23:11.360 | the green
00:23:11.820 | tests
00:23:12.300 | especially
00:23:12.860 | in the
00:23:13.300 | phase
00:23:13.640 | where they're
00:23:14.640 | testing
00:23:15.120 | stuff at
00:23:16.220 | runtime
00:23:18.900 | smoke
00:23:19.920 | they have
00:23:21.300 | to deliver
00:23:21.660 | proof
00:23:22.100 | that what
00:23:22.580 | they did
00:23:23.060 | is actually
00:23:24.240 | because if
00:23:24.980 | we don't
00:23:25.960 | do that
00:23:26.500 | then they
00:23:27.560 | will simply
00:23:28.000 | use mock
00:23:28.840 | and they'll
00:23:29.240 | be like
00:23:29.500 | yeah sure
00:23:29.900 | passed
00:23:30.400 | and not
00:23:31.820 | really
00:23:32.080 | like it
00:23:32.480 | didn't
00:23:32.780 | and then
00:23:33.880 | after
00:23:34.880 | three hours
00:23:35.760 | of these
00:23:36.440 | agents
00:23:36.860 | running
00:23:37.160 | autonomously
00:23:37.760 | you boot up
00:23:38.340 | your application
00:23:38.920 | and it won't even
00:23:39.580 | start
00:23:39.900 | because they just
00:23:40.700 | reward
00:23:41.020 | hacked the whole
00:23:41.560 | thing
00:23:41.840 | and this
00:23:44.680 | kind of
00:23:45.240 | fixes that
00:23:47.360 | because
00:23:47.900 | the reviewer
00:23:49.440 | agent
00:23:49.780 | that comes
00:23:50.320 | after the
00:23:50.800 | executor
00:23:51.320 | agent
00:23:51.680 | is going
00:23:52.220 | to actually
00:23:52.520 | check the
00:23:53.020 | proof
00:23:53.300 | and it's
00:23:54.380 | going to
00:23:54.660 | review the
00:23:55.380 | and it's
00:23:55.700 | going to
00:23:55.860 | run the
00:23:56.100 | tests
00:23:56.420 | and if
00:23:56.980 | it sees
00:23:58.180 | if the
00:23:59.380 | reviewer
00:23:59.680 | agent
00:24:00.000 | establishes
00:24:00.820 | that the
00:24:01.540 | tests
00:24:01.880 | are faulty
00:24:04.080 | let's
00:24:06.380 | quick
00:24:07.660 | then I
00:24:12.200 | basically
00:24:15.060 | here we
00:24:17.040 | sorry
00:24:19.140 | for all
00:24:19.560 | scrolling
00:24:21.620 | it will
00:24:22.180 | basically
00:24:24.440 | executor
00:24:24.920 | agent
00:24:25.240 | and it
00:24:27.340 | right
00:24:28.700 | forth
00:24:28.980 | until
00:24:29.980 | actually
00:24:30.580 | correct
00:24:32.420 | reviewer
00:24:32.900 | agent
00:24:34.620 | approved
00:24:36.020 | approved
00:24:37.520 | orchestrator
00:24:45.980 | right
00:24:48.540 | experience
00:24:50.560 | arbitrary
00:24:50.940 | because
00:24:51.640 | you know
00:24:52.160 | agents
00:24:52.760 | quite
00:24:54.660 | while
00:24:56.480 | capped
00:24:57.560 | agents
00:24:58.080 | running
00:24:58.800 | parallel
00:24:59.160 | let's
00:25:03.100 | let me
00:25:03.500 | share
00:25:04.040 | other
00:25:04.260 | screen
00:25:04.700 | quick
00:25:05.040 | is my
00:25:12.380 | whole
00:25:12.580 | screen
00:25:12.800 | visible
00:25:16.860 | we've
00:25:17.500 | we've
00:25:19.380 | stuff
00:25:27.860 | is just
00:25:28.480 | stuff
00:25:28.680 | I was
00:25:28.940 | running
00:25:29.140 | earlier
00:25:30.700 | actually
00:25:31.460 | non-profit
00:25:33.220 | thing
00:25:33.520 | I showed
00:25:34.320 | earlier
00:25:38.520 | isn't
00:25:38.740 | running
00:25:39.480 | clear
00:25:42.000 | limited
00:25:42.320 | unlucky
00:25:47.740 | it'll
00:25:48.440 | it'll
00:25:48.620 | basically
00:25:50.380 | tasks
00:25:51.520 | these
00:25:51.760 | reviewer
00:25:52.220 | agents
00:25:52.620 | after
00:25:53.580 | executor
00:25:54.060 | agents
00:25:54.800 | we also
00:25:55.840 | have a
00:25:56.180 | release
00:25:56.680 | auditor
00:25:57.580 | every
00:25:59.480 | tasks
00:26:03.240 | phases
00:26:03.700 | right
00:26:04.220 | we have
00:26:04.400 | phase
00:26:04.960 | three
00:26:05.780 | seven
00:26:06.040 | whatever
00:26:07.360 | after
00:26:07.600 | every
00:26:07.820 | phase
00:26:08.840 | through
00:26:09.340 | release
00:26:09.880 | auditor
00:26:11.320 | think
00:26:11.500 | of it
00:26:12.200 | final
00:26:13.200 | review
00:26:13.460 | agent
00:26:13.860 | which
00:26:14.920 | again
00:26:15.520 | through
00:26:16.100 | artifacts
00:26:17.160 | created
00:26:18.720 | everything
00:26:21.800 | this one
00:26:22.100 | actually
00:26:22.300 | didn't run
00:26:22.620 | too long
00:26:23.060 | but most
00:26:24.480 | of the time
00:26:24.940 | these will
00:26:25.440 | take quite
00:26:25.840 | a while
00:26:29.960 | do it
00:26:30.200 | this way
00:26:31.900 | these
00:26:32.140 | agents
00:26:32.580 | we're
00:26:32.860 | running
00:26:33.180 | parallel
00:26:33.740 | quite
00:26:36.140 | the way
00:26:36.980 | going
00:26:38.060 | quite
00:26:38.280 | important
00:26:39.520 | through
00:26:40.320 | hooks
00:26:44.280 | a few
00:26:46.560 | hooks
00:26:46.840 | right
00:26:50.440 | that's
00:26:52.200 | agent
00:26:53.960 | agent
00:26:54.240 | being
00:26:55.300 | boot up
00:26:55.900 | Claude
00:26:56.860 | agent
00:26:57.360 | you're
00:26:57.560 | talking
00:26:58.260 | stops
00:27:01.200 | basically
00:27:04.080 | cannot
00:27:05.740 | resume
00:27:07.480 | it'll
00:27:08.500 | basically
00:27:09.180 | forces
00:27:12.560 | agent
00:27:12.980 | to do
00:27:13.800 | exactly
00:27:14.420 | right
00:27:16.480 | command
00:27:20.540 | it'll
00:27:21.020 | basically
00:27:21.500 | query
00:27:22.900 | database
00:27:24.960 | figure
00:27:26.200 | which
00:27:26.480 | tasks
00:27:27.340 | approved
00:27:27.660 | completed
00:27:28.040 | merged
00:27:28.440 | pending
00:27:29.000 | signed
00:27:30.680 | tasks
00:27:31.300 | approved
00:27:32.620 | it'll
00:27:34.400 | integration
00:27:35.420 | manager
00:27:36.860 | integration
00:27:37.280 | manager
00:27:37.700 | you can
00:27:38.820 | kind of
00:27:39.160 | consider
00:27:39.680 | to be
00:27:40.240 | agent
00:27:41.340 | handles
00:27:42.180 | repository
00:27:45.480 | commits
00:27:46.040 | and what
00:27:46.580 | and then
00:27:48.180 | after
00:27:49.620 | release
00:27:50.040 | auditor
00:27:50.440 | which
00:27:51.600 | pending
00:27:53.460 | assigned
00:27:53.780 | tasks
00:27:55.020 | ready
00:27:55.560 | it'll
00:27:56.500 | ownership
00:27:56.880 | preflight
00:27:57.620 | which
00:27:57.940 | basically
00:27:58.360 | checks
00:27:59.600 | let's
00:27:59.960 | we have
00:28:00.780 | executor
00:28:01.240 | agents
00:28:01.760 | who are
00:28:02.200 | assigned
00:28:03.280 | their
00:28:06.020 | tasks
00:28:06.500 | is there
00:28:07.320 | overlap
00:28:07.800 | between
00:28:08.200 | these
00:28:08.760 | agents
00:28:09.580 | files
00:28:10.300 | because
00:28:10.540 | obviously
00:28:10.980 | in this
00:28:12.260 | setup
00:28:12.480 | we're
00:28:12.860 | using
00:28:13.260 | separate
00:28:15.180 | trees
00:28:15.480 | right
00:28:16.340 | so we
00:28:18.480 | agents
00:28:19.960 | working
00:28:20.400 | independently
00:28:21.460 | their
00:28:21.720 | files
00:28:27.580 | these
00:28:27.760 | files
00:28:28.220 | we're
00:28:28.760 | giving
00:28:29.100 | signing
00:28:29.940 | these
00:28:30.100 | executor
00:28:30.600 | agents
00:28:31.860 | parallel
00:28:32.160 | they're
00:28:33.820 | there's
00:28:34.220 | conflict
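The ownership preflight mentioned here checks whether the files assigned to parallel executor agents overlap. A rough sketch of that idea is below. The function names, the task IDs, and the greedy batching strategy are assumptions for illustration; the cap of four reflects the "four in parallel" figure mentioned earlier in the talk.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def find_ownership_conflicts(
    assignments: Dict[str, Set[str]]
) -> List[Tuple[str, str, Set[str]]]:
    """Return every pair of tasks whose assigned file sets overlap."""
    conflicts = []
    for task_a, task_b in combinations(sorted(assignments), 2):
        shared = assignments[task_a] & assignments[task_b]
        if shared:
            conflicts.append((task_a, task_b, shared))
    return conflicts


def pick_parallel_batch(
    assignments: Dict[str, Set[str]], max_parallel: int = 4
) -> List[str]:
    """Greedily pick tasks that share no files, up to the parallelism cap."""
    batch: List[str] = []
    claimed: Set[str] = set()
    for task_id in sorted(assignments):
        files = assignments[task_id]
        if len(batch) < max_parallel and not (files & claimed):
            batch.append(task_id)
            claimed |= files
    return batch
```

Tasks left out of a batch would simply wait for a later round, once the files they need are no longer claimed.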
00:28:36.100 | batch
00:28:36.440 | assign
00:28:36.840 | tasks
00:28:37.300 | maximum
00:28:41.860 | simply
00:28:42.440 | start
00:28:43.840 | these
00:28:44.700 | agents
00:28:45.880 | parallel
00:28:48.100 | that's
00:28:48.320 | quite
00:28:49.140 | important
00:28:53.360 | reason
00:28:54.500 | takes
00:28:54.880 | quite a
00:28:57.300 | repetition
00:28:58.400 | actually
00:29:02.100 | really
00:29:02.540 | force
00:29:03.820 | spawn
00:29:04.940 | these
00:29:06.000 | agents
00:29:06.980 | message
00:29:07.400 | because
00:29:07.960 | from my
00:29:08.600 | experience
00:29:08.960 | what it
00:29:09.400 | usually
00:29:10.700 | it'll
00:29:12.040 | spawn
00:29:13.000 | these
00:29:13.340 | parallel
00:29:14.700 | it'll
00:29:15.000 | spawn
00:29:15.760 | agent
00:29:17.560 | blocked
00:29:20.240 | agent
00:29:20.460 | finishes
00:29:21.380 | spawn
00:29:23.000 | spawn
00:29:23.740 | agents
00:29:24.480 | parallel
00:29:26.620 | basically
00:29:28.080 | actually
00:29:28.480 | parallelize
00:29:32.320 | that's
00:29:32.720 | that's
00:29:37.600 | Claude
00:29:38.460 | a lot
00:29:39.440 | iteration
00:29:40.060 | through
00:29:43.740 | basically
00:29:45.500 | whole
00:29:45.660 | workflow
00:29:46.200 | all the
00:29:47.380 | agents
00:29:48.400 | artifact
00:29:50.080 | completion
00:29:50.640 | what agents
00:29:51.980 | are available
00:29:52.540 | to you
00:29:53.560 | how do you
00:29:54.280 | initialize
00:29:54.900 | when you start
00:29:55.680 | your session
00:29:56.780 | the boot
00:29:58.440 | artifacts
00:29:59.660 | to make
00:30:00.680 | when you're
00:30:01.140 | running
00:30:01.360 | an app
00:30:01.880 | when you're
00:30:02.300 | coding
00:30:02.520 | an application
00:30:03.080 | that it
00:30:04.200 | actually
00:30:04.520 | works
00:30:04.960 | right
00:30:05.440 | we have
00:30:05.740 | to force
00:30:06.200 | these
00:30:06.700 | agents
00:30:07.140 | actually
00:30:08.640 | and then
00:30:09.420 | debug
00:30:12.840 | it'll
00:30:13.100 | go for
00:30:13.500 | two hours
00:30:14.360 | won't
00:30:16.300 | there's
00:30:20.840 | a lot
00:30:21.700 | rules
00:30:24.140 | as to
00:30:25.020 | coordination
00:30:25.440 | has to
00:30:26.140 | place
00:30:28.560 | important
00:30:29.420 | it always
00:30:30.620 | needs
00:30:31.600 | review
00:30:32.560 | science
00:30:32.880 | reviewer
00:30:33.540 | approved
00:30:34.260 | goes to
00:30:35.260 | validation
00:30:35.840 | completed
00:30:37.100 | goes to
00:30:38.260 | integration
00:30:38.700 | coordinator
00:30:39.240 | these are
00:30:39.780 | all the
00:30:40.220 | agents
00:30:40.640 | just to be
00:30:41.240 | clear
00:30:41.440 | as you
00:30:41.680 | can see
00:30:41.960 | on the
00:30:42.240 | left hand
00:30:45.120 | basically
00:30:46.880 | it resets
00:30:48.560 | either
00:30:48.900 | depending
00:30:49.840 | severity
00:30:50.880 | issue
00:30:52.740 | architect
00:30:54.320 | fundamental
00:30:55.000 | issue
00:30:56.080 | architecture
00:30:58.580 | structural
00:30:58.880 | problem
00:30:59.380 | and the
00:31:00.120 | architect
00:31:00.600 | basically
00:31:00.940 | alter
00:31:03.440 | cycle
00:31:03.680 | starts
00:31:03.960 | again
00:31:07.620 | we also
00:31:08.400 | these
00:31:08.960 | dedicated
00:31:09.540 | testing
00:31:09.980 | agents
00:31:11.040 | review
00:31:11.400 | agent
00:31:13.060 | review
00:31:13.420 | agent
00:31:14.280 | think
00:31:15.680 | example
00:31:16.360 | project
00:31:16.780 | where
00:31:16.980 | we're
00:31:17.220 | creating
00:31:17.740 | synthetic
00:31:21.160 | review
00:31:21.460 | agent
00:31:22.060 | basically
00:31:22.580 | sanity
00:31:24.160 | check
00:31:27.660 | that's
00:31:27.960 | coming
00:31:28.240 | out of
00:31:29.460 | calls
00:31:30.500 | when you
00:31:30.820 | want to
00:31:31.040 | create
00:31:31.240 | synthetic
00:31:35.280 | actually
00:31:35.720 | checks
00:31:39.520 | you're
00:31:39.700 | trying
00:31:40.820 | basically
00:31:42.200 | judge
00:31:43.900 | Claude
00:31:44.620 | agent
00:31:49.460 | that's
00:31:49.840 | about
00:31:51.260 | experience
00:31:54.540 | would
00:31:55.700 | maybe
00:31:57.620 | three
00:31:58.540 | hours
00:31:58.940 | fully
00:31:59.240 | autonomously
00:32:02.560 | output
00:32:04.000 | would
00:32:05.140 | about
00:32:07.980 | there
00:32:08.280 | I have
00:32:09.060 | to be
00:32:09.280 | super
00:32:09.580 | honest
00:32:10.280 | don't
00:32:10.420 | think
00:32:10.620 | we've
00:32:11.080 | single
00:32:11.260 | project
00:32:11.780 | where
00:32:13.300 | everything
00:32:13.800 | worked
00:32:14.200 | there's
00:32:14.660 | always
00:32:15.940 | buttons
00:32:16.560 | don't
00:32:17.680 | tests
00:32:18.840 | reward
00:32:19.320 | hacked
00:32:21.720 | implemented
00:32:23.720 | business
00:32:24.420 | value
00:32:27.800 | truth
00:32:28.440 | that's
00:32:28.840 | something
00:32:29.280 | implemented
00:32:29.680 | today
00:32:31.460 | haven't
00:32:31.940 | chance
00:32:32.420 | really
00:32:34.820 | going
00:32:35.060 | through
00:32:35.340 | those
00:32:35.680 | motions
00:32:38.880 | actually
00:32:39.300 | really
00:32:39.800 | because
00:32:41.880 | whole
00:32:42.180 | value
00:32:43.740 | right
00:32:45.420 | actually
00:32:46.620 | proof
00:32:47.740 | tests
00:32:48.080 | executed
00:32:48.760 | especially
00:32:52.060 | right
00:32:53.460 | tests
00:32:54.260 | development
00:32:54.580 | server
00:32:55.060 | whatever
00:32:57.160 | those
00:32:57.420 | actually
00:32:57.760 | produced
00:32:59.280 | outputs
00:32:59.700 | seems
00:33:00.400 | to have
00:33:00.700 | solved
00:33:01.040 | a lot
00:33:01.280 | of reward
00:33:01.620 | hacking
00:33:02.220 | quite
00:33:02.400 | optimistic
00:33:02.800 | about
00:33:04.680 | added
00:33:06.260 | validator
00:33:07.400 | agent
00:33:07.820 | which
00:33:08.280 | basically
00:33:10.720 | second
00:33:11.980 | basically
00:33:13.240 | final
00:33:14.700 | reviewer
00:33:15.240 | agents
00:33:17.920 | build
00:33:19.400 | Docker
00:33:19.960 | image
00:33:20.760 | smoke
00:33:21.060 | tests
00:33:25.020 | basically
00:33:26.620 | tests
00:33:27.500 | everything
00:33:29.480 | there
00:33:30.780 | reward
00:33:31.120 | hacking
00:33:31.440 | whatsoever
00:33:31.840 | when it
00:33:32.440 | comes to
00:33:32.800 | the test
00:33:33.180 | driven
00:33:33.420 | development
00:33:33.840 | that's
00:33:34.360 | honestly
00:33:34.660 | the main
00:33:35.300 | challenge
00:33:35.780 | the code
00:33:36.420 | itself
00:33:38.740 | syntax
00:33:39.160 | perspective
00:33:40.620 | architecture
00:33:41.860 | perspective
00:33:42.880 | there's
00:33:43.940 | they're
00:33:45.720 | quite
00:33:45.940 | prone
00:33:46.500 | reward
00:33:47.060 | hacking
00:33:52.480 | in your
00:33:52.760 | Claude
00:33:53.500 | settings
00:33:54.940 | maintain
00:33:55.300 | project
00:33:55.780 | working
00:33:56.160 | directory
00:33:57.760 | important
00:33:58.980 | going
00:34:00.960 | sauce
00:34:01.220 | halfway
00:34:02.020 | through
00:34:03.100 | forget
00:34:03.580 | where
00:34:04.300 | actually
00:34:04.820 | works
00:34:07.600 | after
00:34:08.520 | think
00:34:08.700 | every
00:34:09.060 | three
00:34:10.060 | agents
00:34:10.480 | stops
00:34:11.560 | re-inject
00:34:13.180 | Claude
00:34:14.400 | context
00:34:14.740 | window
00:34:21.120 | think
00:34:24.720 | maybe
00:34:25.600 | thing
00:34:26.020 | framework
00:34:26.420 | guide
00:34:28.160 | project
00:34:28.600 | agnostic
00:34:31.840 | a playbook
00:34:32.720 | you know
00:34:33.100 | a cookbook
00:34:35.280 | write
00:34:36.980 | write
00:34:37.360 | tests
00:34:38.880 | target
00:34:39.140 | shape
00:34:40.340 | rules
00:34:41.820 | because
00:34:42.920 | force
00:34:43.260 | every
00:34:43.540 | agent
00:34:44.060 | every
00:34:44.980 | source
00:34:45.500 | every
00:34:46.020 | agent
00:34:47.600 | framework
00:34:47.880 | guidelines
00:34:50.920 | concrete
00:34:51.500 | instructions
00:34:52.820 | orchestrator
00:34:53.300 | agent
00:34:54.360 | means
00:34:55.640 | stays
00:34:56.180 | clean
00:34:58.260 | maintainable
00:34:59.580 | don't
00:34:59.740 | wind up
00:35:02.100 | massive
00:35:03.380 | files
00:35:04.160 | stuff
00:35:06.140 | that's
00:35:07.300 | place
00:35:09.780 | maybe
00:35:10.560 | thing
00:35:13.740 | proven
00:35:14.060 | super
00:35:14.460 | handy
00:35:15.060 | especially
00:35:15.340 | when you
00:35:15.660 | want to
00:35:15.920 | develop
00:35:16.760 | your own
00:35:17.540 | servers
00:35:17.980 | or your
00:35:19.020 | agents
00:35:21.920 | scrape
00:35:22.780 | documentation
00:35:24.540 | scraped
00:35:25.100 | documentation
00:35:26.520 | agents
00:35:28.240 | router
00:35:30.020 | crawl
00:35:31.640 | and then
00:35:33.020 | Anthropic
00:35:36.000 | agents
00:35:36.460 | documentation
00:35:37.860 | documentation
00:35:38.500 | because
00:35:40.520 | Claude
00:35:42.140 | agent
00:35:42.760 | orchestrator
00:35:44.360 | let's
00:35:46.040 | say for
00:35:46.260 | example
00:35:47.000 | developing
00:35:47.860 | server
00:35:48.640 | Claude
00:35:50.960 | instruct
00:35:51.840 | executor
00:35:52.340 | agents
00:35:53.680 | documentation
00:35:54.520 | available
00:35:55.240 | because
00:35:55.640 | obviously
00:35:56.420 | training
00:35:57.420 | let's
00:35:57.780 | they're
00:35:57.980 | working
00:35:59.100 | server
00:36:00.140 | every
00:36:00.360 | executor
00:36:02.000 | there
00:36:02.700 | prompt
00:36:03.880 | orchestrator
00:36:04.740 | to invoke
00:36:10.320 | directory
00:36:12.160 | index
00:36:13.000 | which
00:36:14.620 | basically
00:36:15.120 | allows
00:36:16.680 | tells
00:36:17.520 | models
00:36:20.420 | pattern
00:36:20.860 | documentation
00:36:22.260 | multi-agent
00:36:24.020 | evaluations
00:36:26.740 | agent
00:36:29.140 | always
00:36:29.360 | tells
00:36:30.140 | index
00:36:31.060 | knows
00:36:32.720 | running
00:36:33.300 | issues
00:36:33.980 | getting
00:36:34.940 | server
00:36:35.580 | Claude
00:36:37.120 | knows
00:36:37.980 | documentation
00:36:39.260 | Claude
00:36:40.460 | servers
00:36:40.840 | exactly
00:36:43.400 | saves
00:36:43.680 | a lot
00:36:45.500 | would
00:36:45.780 | anytime
00:36:46.180 | we're
00:36:46.460 | working
00:36:47.200 | project
00:36:49.460 | specific
00:36:50.380 | which
00:36:51.780 | models
00:36:52.080 | aren't
00:36:52.300 | super
00:36:52.520 | familiar
00:36:53.440 | scrape
00:36:53.800 | documentation
00:36:55.280 | make it
00:36:55.760 | available
00:36:56.640 | agents
00:36:58.820 | check
00:36:59.240 | before
00:36:59.640 | write
00:37:04.580 | think
00:37:04.780 | that's
00:37:05.100 | awesome
00:37:06.520 | we've
00:37:07.240 | accumulated
00:37:07.880 | a set
00:37:08.500 | questions
00:37:09.020 | if you
00:37:09.460 | if you're
00:37:10.140 | to go
00:37:10.440 | through
00:37:11.220 | questions
00:37:11.480 | people
00:37:13.480 | going
00:37:13.620 | start
00:37:13.880 | actually
00:37:16.040 | earlier
00:37:16.840 | actually
00:37:17.840 | tactically
00:37:18.680 | asked
00:37:19.000 | about
00:37:19.880 | there
00:37:20.880 | extension
00:37:21.320 | that's
00:37:21.640 | letting
00:37:23.120 | inside
00:37:25.520 | Cursor
00:37:26.580 | that's
00:37:28.580 | extension
00:37:30.240 | let's
00:37:39.340 | viewer
00:37:40.880 | perfect
00:37:42.440 | question
00:37:45.780 | framing
00:37:47.300 | handed
00:37:48.080 | reviewer
00:37:48.600 | agent
00:37:54.080 | meant
00:37:55.660 | framing
00:37:58.440 | producing
00:37:59.860 | perspective
00:38:00.380 | they're
00:38:00.860 | reviewing
00:38:05.320 | let me
00:38:07.860 | personally
00:38:08.920 | whenever
00:38:09.160 | I ask
00:38:09.560 | it to
00:38:09.740 | review
00:38:10.300 | stuff
00:38:11.360 | finds
00:38:11.720 | I mean
00:38:12.440 | thinks
00:38:13.060 | great
00:38:14.560 | review
00:38:14.960 | stuff
00:38:15.420 | thinks
00:38:15.780 | great
00:38:17.000 | review
00:38:17.240 | somebody
00:38:17.520 | else's
00:38:17.940 | stuff
00:38:20.680 | the same
00:38:20.940 | stuff
00:38:24.920 | from my
00:38:25.320 | experience
00:38:26.000 | hasn't
00:38:26.840 | a huge
00:38:28.040 | problem
00:38:28.360 | I think
00:38:28.780 | that's
00:38:29.000 | probably
00:38:29.260 | because
00:38:29.720 | we set
00:38:31.860 | strict
00:38:32.440 | compliance
00:38:33.020 | rules
00:38:34.320 | there's
00:38:35.200 | agent
00:38:38.420 | reject
00:38:39.300 | there's
00:38:40.000 | a whole
00:38:40.140 | bunch
00:38:40.420 | rules
00:38:40.780 | common
00:38:41.500 | violations
00:38:44.520 | testing
00:38:45.200 | and then
00:38:47.700 | there's
00:38:48.140 | a protocol
00:38:48.760 | that has
00:38:49.420 | follow
00:38:50.840 | verification
00:38:51.960 | of the
00:38:52.620 | tests
00:38:53.020 | right
00:38:56.360 | experience
00:38:57.860 | deliberate
00:38:59.040 | establishing
00:39:00.480 | for the
00:39:01.160 | agent
00:39:03.840 | violation
00:39:06.340 | these
00:39:06.940 | protocols
00:39:08.140 | violated
00:39:10.060 | decision
00:39:10.800 | needs
00:39:11.100 | fixes
00:39:11.480 | right
00:39:14.740 | think
00:39:14.960 | for me
00:39:15.320 | that's
00:39:16.020 | solved
00:39:16.380 | a lot
00:39:16.600 | of it
00:39:17.080 | think
00:39:17.900 | is it
00:39:20.120 | going
00:39:21.720 | great
00:39:22.340 | perfect
00:39:22.660 | nothing
00:39:23.360 | wrong
00:39:25.280 | specifically
00:39:26.380 | adhere
00:39:26.880 | these
00:39:27.220 | thresholds
00:39:28.280 | these
00:39:28.520 | requirements
00:39:30.460 | from my
00:39:31.100 | experience
00:39:31.400 | it does
00:39:31.780 | actually
00:39:34.000 | issues
00:39:35.140 | occur
00:39:37.160 | you're
00:39:37.280 | saying
00:39:37.920 | rubric
00:39:39.620 | exactly
00:39:43.360 | thank you
00:39:45.560 | question
00:39:48.180 | failed
00:39:48.960 | handover
00:39:49.780 | executor
00:39:50.740 | reviewer
00:39:51.320 | firmly
00:39:51.620 | defined
00:39:52.200 | couldn't
00:39:53.260 | easily
00:39:55.420 | complete
00:39:59.200 | orchestrator
00:39:59.800 | let's
00:40:01.820 | orchestrator
00:40:02.280 | invokes
00:40:02.740 | a bunch
00:40:03.800 | executor
00:40:04.800 | agents
00:40:08.440 | right
00:40:16.220 | certain
00:40:16.400 | status
00:40:16.880 | right
00:40:18.140 | example
00:40:21.260 | these
00:40:21.680 | fixes
00:40:24.100 | never
00:40:26.140 | that's
00:40:27.060 | executed
00:40:27.980 | that's
00:40:29.000 | completed
00:40:29.760 | executor
00:40:32.000 | anything
00:40:32.520 | besides
00:40:33.380 | needs
00:40:35.420 | review
00:40:43.200 | automatically
00:40:43.940 | assign
00:40:46.440 | completed
00:40:47.140 | executor
00:40:48.300 | review
00:40:48.680 | agent
00:40:49.780 | there's
00:40:50.200 | machine
00:40:50.500 | state
00:40:51.880 | SQLite
00:40:53.020 | database
00:40:55.000 | server
00:40:56.560 | doesn't
00:40:57.280 | allow
00:40:57.720 | orchestrator
00:41:01.760 | change
00:41:02.160 | status
00:41:03.140 | that's
00:41:04.220 | executed
00:41:05.120 | completed
00:41:07.140 | executor
00:41:08.980 | anything
00:41:09.780 | needs
00:41:10.220 | review
00:41:12.960 | status
00:41:14.460 | approved
00:41:14.880 | which
00:41:17.440 | review
00:41:17.680 | agent
00:41:18.080 | through
00:41:18.580 | their
00:41:21.420 | moved
00:41:22.520 | completed
00:41:24.040 | there's
00:41:24.720 | machine
00:41:25.040 | state
00:41:25.420 | guardrail
00:41:26.600 | means
00:41:28.280 | model
00:41:28.520 | simply
00:41:28.840 | can't
00:41:29.420 | reward
00:41:30.680 | themselves
00:41:32.080 | skipping
00:41:32.740 | review
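The guardrail described in this answer, where the database-backed server refuses status changes that would skip review, can be illustrated with a small transition table. This is a sketch under assumptions: the role names, status strings, and the exact set of allowed moves are hypothetical and only loosely follow the statuses mentioned in the talk (pending, assigned, needs review, needs fixes, approved, completed, merged).

```python
from typing import Dict, Set, Tuple


class TransitionError(Exception):
    """Raised when a role attempts a status change it is not allowed to make."""


# (role, current status) -> statuses that role may move the task to.
# Hypothetical table: the real rules live in the speaker's MCP server.
ALLOWED: Dict[Tuple[str, str], Set[str]] = {
    ("orchestrator", "pending"): {"assigned"},
    ("executor", "assigned"): {"needs_review"},
    ("executor", "needs_fixes"): {"needs_review"},
    ("reviewer", "needs_review"): {"approved", "needs_fixes"},
    ("validator", "approved"): {"completed", "needs_fixes"},
    ("integration_manager", "completed"): {"merged"},
}


def change_status(role: str, current: str, new: str) -> str:
    """Return the new status if the move is allowed, otherwise raise."""
    if new not in ALLOWED.get((role, current), set()):
        raise TransitionError(f"{role} may not move a task from {current} to {new}")
    return new
```

Because the check lives in the server rather than in a prompt, an agent that tries to mark its own work as completed gets an error instead of a silent success.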
00:41:38.620 | right
00:41:38.900 | let's
00:41:41.080 | question
00:41:41.940 | around
00:41:43.000 | models
00:41:44.040 | agents
00:41:45.400 | dictate
00:41:45.780 | which
00:41:46.000 | models
00:41:46.340 | they're
00:41:46.540 | using
00:41:48.320 | going
00:41:48.700 | default
00:41:50.880 | depending
00:41:51.920 | complexity
00:41:52.920 | pre-executor
00:41:54.340 | Sonnet
00:41:55.980 | since
00:41:56.600 | makes
00:41:59.140 | token
00:41:59.480 | usage
00:42:01.320 | everything
00:42:04.700 | thing
00:42:05.640 | writes
00:42:06.400 | Sonnet
00:42:07.060 | reviewer
00:42:07.660 | architect
00:42:08.500 | validator
00:42:12.640 | right
00:42:12.960 | let's
00:42:15.740 | question
00:42:16.700 | I saw
00:42:18.780 | considered
00:42:19.160 | breaking
00:42:19.900 | validation
00:42:20.620 | layers
00:42:21.060 | approached
00:42:21.780 | different
00:42:22.140 | agents
00:42:24.520 | and we
00:42:26.280 | I mean
00:42:27.180 | kind of
00:42:28.060 | do that
00:42:28.820 | it depends
00:42:29.380 | a bit
00:42:29.700 | so we
00:42:30.340 | also have
00:42:30.700 | like a
00:42:31.000 | Playwright
00:42:31.420 | debugger
00:42:37.840 | things
00:42:38.240 | especially
00:42:40.420 | runtime
00:42:40.720 | issues
00:42:42.460 | orchestrator
00:42:43.680 | can choose
00:42:44.580 | to hand
00:42:45.780 | Playwright
00:42:46.220 | debugger
00:42:46.760 | and this
00:42:48.420 | agent
00:42:49.340 | is then
00:42:50.340 | you know
00:42:51.100 | specialized
00:42:51.500 | is a big
00:42:52.720 | obviously
00:42:53.140 | it's just
00:42:53.460 | a prompt
00:42:53.860 | right
00:42:54.420 | focused
00:42:56.280 | debugging
00:42:58.200 | issues
00:42:58.720 | using
00:42:59.420 | Playwright
00:43:01.420 | for our
00:43:02.080 | frontend
00:43:02.700 | so we
00:43:03.520 | also have
00:43:03.880 | a frontend
00:43:05.040 | agent
00:43:05.880 | stack
00:43:06.480 | kind of
00:43:08.100 | the same
00:43:08.400 | principle
00:43:08.920 | they were
00:43:10.880 | like this
00:43:11.340 | it's a bit
00:43:12.100 | different
00:43:12.440 | because it's
00:43:13.140 | the frontend
00:43:14.100 | it's very
00:43:15.640 | at least
00:43:16.400 | you know
00:43:16.680 | I'm not
00:43:17.100 | good at
00:43:17.360 | frontend
00:43:17.760 | so maybe
00:43:18.240 | I should
00:43:18.480 | start with
00:43:19.080 | definitely
00:43:20.560 | frontend
00:43:21.020 | design
00:43:21.420 | but it's
00:43:23.320 | been very
00:43:23.760 | hard to
00:43:24.280 | just use
00:43:24.800 | design
00:43:25.160 | tokens
00:43:25.660 | and stuff
00:43:26.280 | to keep
00:43:26.780 | everything
00:43:27.660 | looking
00:43:28.260 | so we
00:43:29.680 | kind of
00:43:30.440 | went through
00:43:31.860 | this more
00:43:32.380 | esoteric
00:43:33.480 | approach
00:43:34.140 | we have
00:43:34.940 | like this
00:43:35.180 | visual
00:43:35.420 | anthropologist
00:43:36.100 | which will
00:43:36.500 | basically
00:43:36.980 | so it
00:43:41.380 | takes
00:43:41.640 | screenshots
00:43:42.200 | and then
00:43:42.900 | it just
00:43:43.360 | ingests
00:43:44.020 | the screenshots
00:43:44.600 | of the
00:43:45.120 | frontend
00:43:45.540 | to make
00:43:46.260 | design
00:43:47.580 | decisions
00:43:48.260 | so yes
00:43:49.800 | sorry
00:43:50.140 | that was a bit
00:43:51.040 | of a tangent
00:43:51.420 | but yes
00:43:51.900 | we do
00:43:52.480 | use specialized
00:43:53.060 | agents
00:43:53.860 | or sub
00:43:55.420 | agents
00:43:55.720 | rather for
00:43:56.400 | review
00:43:57.460 | if it's
00:43:58.160 | necessary
00:43:58.580 | and usually
00:43:59.380 | that will
00:43:59.760 | be the
00:44:00.440 | reviewer
00:44:01.000 | if it's
00:44:01.620 | specifically
00:44:02.120 | for stuff
00:44:02.880 | that's like
00:44:03.420 | business
00:44:04.120 | logic
00:44:04.460 | related
00:44:05.460 | synthetic
00:44:06.080 | that we
00:44:06.300 | generate
00:44:06.720 | and we
00:44:07.460 | frontend
00:44:07.920 | architect
00:44:08.520 | and then
00:44:09.320 | we have
00:44:09.440 | a Playwright
00:44:09.880 | debugger
00:44:10.620 | and that's
00:44:11.100 | kind of
00:44:11.420 | all we
00:44:12.280 | maybe
00:44:13.760 | the road
00:44:14.060 | it would
00:44:14.280 | make sense
00:44:14.660 | to have
00:44:15.140 | I don't know
00:44:16.020 | awesome
00:44:18.380 | scanning through
00:44:21.280 | I guess
00:44:21.560 | the last
00:44:22.400 | question I
00:44:23.340 | see is
00:44:24.020 | for the
00:44:24.500 | local
00:44:24.780 | documentation
00:44:25.480 | which
00:44:25.820 | there's
00:44:26.040 | a lot
00:44:26.200 | of people
00:44:26.480 | chiming in
00:44:27.920 | and agreeing
00:44:28.600 | on that
00:44:28.940 | approach
00:44:29.360 | and they
00:44:29.740 | love that
00:44:30.240 | do you
00:44:30.900 | do any
00:44:31.200 | sort of
00:44:31.520 | indexing
00:44:32.140 | of that
00:44:32.640 | or how
00:44:33.280 | do you
00:44:33.660 | manage
00:44:34.660 | which
00:44:35.040 | documentation
00:44:35.500 | is relevant
00:44:36.520 | for what
00:44:36.920 | context
00:44:39.120 | we have
00:44:39.960 | the index
00:44:41.040 | right
00:44:41.500 | documentation
00:44:42.160 | we have
00:44:42.580 | the index
00:44:42.920 | directory
00:44:43.300 | sorry
00:44:43.920 | the index
00:44:45.580 | orchestrator
00:44:46.520 | I believe
00:44:48.520 | also has
00:44:52.860 | an overview
00:44:54.640 | so we have
00:44:55.120 | documentation
00:44:55.620 | guidance
00:44:57.180 | where we
00:44:59.720 | kind of
00:45:00.060 | tell the
00:45:00.720 | orchestrator
00:45:01.200 | which documentation
00:45:02.260 | is available
00:45:02.860 | to it
00:45:03.320 | and then
00:45:04.500 | based on
00:45:06.120 | the index
00:45:07.260 | inform
00:45:09.520 | agents
00:45:09.940 | in the
00:45:10.500 | prompt
00:45:10.780 | when they're
00:45:11.100 | invoked
00:45:11.520 | what documentation
00:45:12.400 | they should
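The documentation index described in this answer, where the orchestrator uses an index to tell each agent which local docs apply, might look something like the sketch below. The keyword matching, the topic names, and the file paths are assumptions for illustration only; the talk does not say how the lookup is actually done.

```python
import re
from typing import Dict, List


def docs_for_task(index: Dict[str, List[str]], task_description: str) -> List[str]:
    """Return doc paths whose topic keyword appears in the task description."""
    words = set(re.findall(r"[a-z0-9]+", task_description.lower()))
    paths: List[str] = []
    for topic in sorted(index):
        if topic in words:
            paths.extend(index[topic])
    return paths


def build_executor_prompt(task_description: str, doc_paths: List[str]) -> str:
    """Compose the prompt text an orchestrator could pass to an executor agent."""
    lines = [f"Task: {task_description}"]
    if doc_paths:
        lines.append("Read this local documentation before writing code:")
        lines.extend(f"- {path}" for path in doc_paths)
    return "\n".join(lines)
```

A plain keyword match is the simplest possible lookup; it stands in here for whatever guidance the orchestrator prompt really contains.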
00:45:13.440 | have you
00:45:15.960 | considered
00:45:16.300 | an archivist
00:45:18.960 | specifically
00:45:20.140 | context
00:45:20.700 | and documentation
00:45:21.240 | management
00:45:21.860 | who can
00:45:24.240 | take that
00:45:24.960 | of the
00:45:25.360 | orchestrator's
00:45:26.060 | context
00:45:27.000 | window
00:45:29.080 | I mean
00:45:29.660 | that would
00:45:30.100 | make a lot
00:45:30.500 | of sense
00:45:30.880 | I mean
00:45:32.520 | I think
00:45:33.000 | that what
00:45:33.500 | I have
00:45:33.840 | here is
00:45:34.600 | version
00:45:36.600 | which is
00:45:37.260 | probably
00:45:37.520 | going to
00:45:37.720 | get washed
00:45:38.300 | by model
00:45:38.680 | improvements
00:45:39.140 | in a few
00:45:39.500 | months
00:45:39.840 | but I'm
00:45:41.100 | 100% sure
00:45:41.740 | there are
00:45:42.420 | a thousand
00:45:43.440 | that you
00:45:44.340 | can do
00:45:44.660 | what I'm
00:45:45.000 | doing
00:45:45.320 | better
00:45:46.480 | having
00:45:47.520 | granular
00:45:48.660 | management
00:45:49.060 | and more
00:45:49.640 | granular
00:45:50.080 | agents
00:45:50.760 | and a
00:45:51.100 | better
00:45:51.260 | system
00:45:51.880 | handoffs
00:45:53.440 | are you
00:45:55.380 | going to
00:45:55.580 | open source
00:45:56.420 | oh yeah
00:45:58.320 | I think
00:46:00.720 | thanks
00:46:01.280 | I built
00:46:02.920 | my own
00:46:03.240 | version
00:46:03.500 | of this
00:46:04.280 | but yours
00:46:05.860 | bigger
00:46:06.640 | I think
00:46:07.120 | you've
00:46:07.420 | put more
00:46:07.980 | in it
00:46:08.220 | than I
00:46:08.720 | and I
00:46:10.800 | what you
00:46:12.820 | I hope
00:46:13.240 | it works
00:46:14.340 | I hope
00:46:14.800 | it doesn't
00:46:15.160 | just work
00:46:15.540 | on my
00:46:15.800 | machine
00:46:16.100 | it should
00:46:16.860 | awesome
00:46:21.100 | this was
00:46:22.100 | super
00:46:22.960 | interesting
00:46:24.660 | engagement
00:46:25.100 | and the
00:46:25.520 | comments
00:46:26.180 | appreciate
00:46:26.740 | you taking
00:46:27.280 | the time
00:46:27.840 | I know
00:46:28.220 | it's late
00:46:28.760 | you're
00:46:29.940 | we've
00:46:30.520 | held you
00:46:31.260 | when we
00:46:32.460 | thought
00:46:32.820 | would
00:46:33.340 | thank
00:46:33.820 | Olivier
00:46:34.160 | really
00:46:36.460 | appreciate
00:46:36.980 | and hopefully
00:46:37.340 | we will
00:46:37.900 | from you
00:46:38.260 | again
00:46:38.520 | in the
00:46:38.760 | future
00:46:39.060 | great
00:46:40.420 | thank you
00:46:40.840 | very much
00:46:41.220 | have a
00:46:42.740 | all right
00:46:44.360 | awesome
00:46:47.900 | we have
00:46:48.960 | a few
00:46:49.300 | minutes
00:46:49.920 | anybody
00:46:50.080 | wants
00:46:51.380 | discuss
00:46:52.180 | debrief
00:46:52.700 | insights
00:46:53.120 | things
00:46:56.600 | minutes
00:46:56.840 | early
00:46:57.220 | whatever
00:47:00.080 | y'all
00:47:00.360 | prefer
00:47:00.680 | I think
00:47:03.700 | local
00:47:04.240 | documentation
00:47:04.720 | thing
00:47:05.120 | and keeping
00:47:05.440 | that up
00:47:05.720 | to date
00:47:05.900 | was super
00:47:06.600 | like the
00:47:07.000 | discussion
00:47:07.400 | around that
00:47:07.800 | seemed
00:47:08.020 | interesting
00:47:08.380 | and I
00:47:08.700 | want to
00:47:09.420 | I post
00:47:09.800 | all the
00:47:10.160 | links
00:47:11.000 | in that
00:47:11.640 | conversation
00:47:12.580 | Discord
00:47:15.280 | takeaway
00:47:16.800 | someone
00:47:17.120 | who's
00:47:17.300 | trying
00:47:18.960 | documentation
00:47:20.280 | interesting
00:47:20.640 | discussion
00:47:21.100 | but I
00:47:21.660 | think
00:47:21.880 | whole
00:47:21.980 | thing
00:47:22.240 | actually
00:47:22.460 | pretty
00:47:22.660 | interesting
00:47:24.440 | think
00:47:24.580 | there
00:47:24.800 | something
00:47:24.980 | to be
00:47:25.440 | because
00:47:27.300 | messages
00:47:30.900 | there
00:47:31.940 | orchestration
00:47:32.660 | which
00:47:33.060 | think
00:47:33.280 | there's
00:47:33.980 | probably
00:47:34.880 | element
00:47:35.780 | going
00:47:38.520 | think
00:47:38.760 | overall
00:47:39.260 | to see
00:47:39.480 | somebody's
00:47:39.840 | workflow
00:47:40.260 | always
00:47:40.500 | interesting
00:47:40.840 | every
00:47:41.640 | watch
00:47:41.920 | every
00:47:42.940 | those
00:47:43.760 | thoughts
00:47:47.020 | unfortunate
00:47:50.720 | those
00:47:51.600 | haven't
00:47:52.820 | thread
00:47:55.420 | think
00:47:57.060 | what I
00:47:58.660 | regard
00:48:00.360 | orchestration
00:48:01.240 | something
00:48:02.900 | you were
00:48:05.300 | touching
00:48:05.620 | on there
00:48:06.040 | where
00:48:11.020 | system
00:48:12.400 | building
00:48:12.720 | right
00:48:14.080 | probably
00:48:15.080 | going
00:48:15.420 | relevant
00:48:18.620 | model
00:48:18.980 | generations
00:48:20.420 | think
00:48:23.140 | seems
00:48:23.640 | there's
00:48:23.960 | value
00:48:25.220 | figuring
00:48:29.800 | what's
00:48:30.420 | minimum
00:48:30.820 | version
00:48:32.220 | system
00:48:32.680 | because
00:48:33.320 | you're
00:48:33.900 | going
00:48:35.540 | scratch
00:48:37.240 | generations
00:48:42.000 | minimum
00:48:42.300 | amount
00:48:45.120 | functionality
00:48:46.080 | you're
00:48:46.460 | using
00:48:50.920 | effectively
00:48:52.240 | The alternative
00:48:54.820 | angle is
00:48:55.420 | that you
00:48:55.660 | just make
00:48:55.980 | it good
00:48:56.300 | enough
00:48:56.600 | that it
00:48:57.040 | rebuild
00:48:57.400 | itself
00:48:57.880 | every
00:48:58.200 | generation
00:49:00.080 | exactly
00:49:00.980 | either
00:49:02.380 | could
00:49:04.200 | could
00:49:04.620 | either
00:49:04.820 | approach
00:49:05.200 | where
00:49:05.520 | could
00:49:06.300 | fairly
00:49:06.700 | sophisticated
00:49:07.280 | system
00:49:07.860 | that's
00:49:08.160 | built
00:49:08.920 | particular
00:49:09.280 | model
00:49:09.660 | that's
00:49:10.000 | spitting
00:49:10.480 | littler
00:49:10.840 | systems
00:49:13.160 | could
00:49:14.140 | these
00:49:15.900 | six or
00:49:17.860 | seven
00:49:18.280 | fairly
00:49:18.840 | straightforward
00:49:19.620 | approaches
00:49:20.760 | generally
00:49:21.340 | give me
00:49:21.960 | all the
00:49:22.520 | all the
00:49:23.140 | chassis
00:49:23.520 | I need
00:49:24.000 | to get
00:49:24.500 | thing
00:49:24.680 | going
00:49:25.320 | thing
00:49:25.560 | basically
00:49:27.180 | establish
00:49:28.020 | primitives
00:49:34.060 | think
00:49:34.200 | another
00:49:34.480 | takeaway
00:49:35.120 | watching
00:49:36.260 | using
00:49:37.140 | database
00:49:37.620 | orchestration
00:49:38.140 | really
00:49:38.960 | think
00:49:39.100 | that's
00:49:39.480 | something
00:49:40.600 | becoming
00:49:41.320 | database
00:49:41.840 | pilled
00:49:47.300 | quick
00:49:48.740 | using
00:49:49.240 | databases
00:49:50.100 | always
00:49:52.060 | environment
00:49:53.140 | production
00:49:53.640 | environment
00:49:54.160 | and name
00:49:58.220 | doesn't
00:49:58.860 | delete
00:49:59.300 | production
00:49:59.760 | database
00:50:03.380 | a big
00:50:03.960 | Neo4j
00:50:04.620 | myself
00:50:07.420 | I was
00:50:07.580 | going to
00:50:09.020 | recent
00:50:09.500 | database
00:50:10.360 | thing
00:50:11.940 | interested
00:50:13.580 | lately
00:50:14.000 | called
00:50:14.780 | where
00:50:16.420 | basically
00:50:20.580 | SQLite
00:50:21.460 | squished
00:50:21.880 | together
00:50:23.540 | database
00:50:24.400 | different
00:50:24.660 | branches
00:50:25.200 | whatever
00:50:25.700 | you'd
00:50:26.620 | branch
00:50:30.340 | branch
00:50:31.260 | wouldn't
00:50:34.620 | thanks
00:50:35.560 | pointing
00:50:38.000 | really good
00:50:38.540 | prompt
00:50:39.080 | literally
00:50:40.280 | SQLite
00:50:44.060 | create
00:50:44.580 | schema
00:50:46.040 | which
00:50:46.560 | which
00:50:47.160 | really
00:50:48.740 | which
00:50:49.480 | where
00:50:50.840 | difference
00:50:51.340 | between
00:50:52.140 | approach
00:50:52.520 | and Olivier's
00:50:53.580 | where he's
00:50:54.720 | a fairly
00:50:55.800 | sophisticated
00:50:56.340 | system
00:50:56.880 | that took
00:50:57.340 | a while
00:50:57.720 | to build
00:50:58.580 | whereas
00:50:58.980 | a lot
00:50:59.720 | of what
00:50:59.980 | I see
00:51:03.800 | applications
00:51:05.680 | where
00:51:06.340 | structure
00:51:08.500 | thing
00:51:09.120 | you're
00:51:09.460 | doing
00:51:10.620 | trying
00:51:11.220 | shoot
00:51:12.100 | minimal
00:51:13.220 | approach
00:51:13.800 | that's
00:51:14.200 | going
00:51:17.840 | leverage
00:51:18.640 | model
00:51:20.460 | model
00:51:22.940 | stuff
00:51:23.760 | opposed
00:51:26.160 | these
00:51:26.460 | things
00:51:30.060 | artifacts
00:51:32.900 | context
00:51:33.280 | window
00:51:34.980 | approach
00:51:38.080 | question
00:51:38.440 | everyone
00:51:39.020 | because
00:51:39.300 | don't
00:51:39.940 | experience
00:51:40.600 | databases
00:51:41.780 | checking
00:51:42.360 | Dolt
00:51:42.680 | as we
00:51:43.060 | speak
00:51:44.320 | anyone
00:51:44.900 | Marcus
00:51:45.820 | if you
00:51:46.260 | expand
00:51:46.960 | anyone
00:51:51.000 | dealing
00:51:51.520 | databases
00:51:52.320 | comes
00:51:52.860 | agent
00:51:53.380 | orchestration
00:51:54.800 | sense
00:51:57.320 | would
00:51:57.820 | personally
00:51:58.560 | sorry
00:52:05.600 | think
00:52:06.300 | Marcus
00:52:06.700 | speaking
00:52:07.280 | mistaken
00:52:10.620 | what I
00:52:11.320 | going
00:52:12.400 | something
00:52:12.620 | that's
00:52:12.860 | represented
00:52:13.640 | training
00:52:14.300 | because
00:52:16.820 | latest
00:52:17.280 | greatest
00:52:17.880 | certain
00:52:18.980 | thing
00:52:20.420 | might
00:52:21.100 | sense
00:52:22.040 | doesn't
00:52:22.340 | either
00:52:22.600 | operate
00:52:23.600 | existing
00:52:24.040 | schema
00:52:24.560 | that's
00:52:25.180 | training
00:52:29.960 | doesn't
00:52:31.140 | you're
00:52:32.380 | going
00:52:33.080 | getting
00:52:34.040 | recognize
00:52:34.560 | commands
00:52:34.980 | needs
00:52:37.140 | interact
00:52:38.000 | effectively
00:52:39.680 | server
00:52:40.000 | obviates
00:52:42.580 | server
00:52:43.100 | tools
00:52:43.400 | built
00:52:45.560 | handles
00:52:48.860 | personally
00:52:49.300 | found
00:52:50.640 | making
00:52:51.720 | schema
00:52:54.220 | model
00:52:54.500 | training
00:52:55.060 | makes
00:52:55.540 | massive
00:52:56.080 | difference
00:52:57.460 | consistency
00:53:00.020 | ability
00:53:01.060 | reliably
00:53:02.120 | interact
00:53:04.580 | database
00:53:09.040 | that's
00:53:09.240 | interesting
00:53:10.360 | Marcus
00:53:11.160 | I don't
00:53:12.580 | saying
00:53:12.720 | something
00:53:12.940 | about
00:53:13.780 | someone
00:53:16.080 | would
00:53:16.400 | start
00:53:17.220 | PostgreSQL
00:53:26.580 | look into
00:53:27.020 | and then
00:53:27.460 | it seems
00:53:28.180 | in the
00:53:28.400 | comments
00:53:28.780 | saying
00:53:29.900 | Supabase
00:53:31.740 | database
00:53:32.260 | cloud
00:53:33.980 | often
00:53:36.300 | makes
00:53:36.500 | things
00:53:36.740 | easier
00:53:37.640 | machine
00:53:38.000 | another
00:53:38.620 | collaborating
00:53:39.120 | sorry
00:53:40.540 | someone
00:53:42.100 | spoke
00:53:43.640 | before
00:53:45.000 | mentioned
00:53:45.780 | something
00:53:46.120 | about
00:53:46.860 | server
00:53:47.740 | working
00:53:50.300 | wasn't
00:53:53.220 | curious
00:53:55.940 | point
00:53:56.340 | again
00:54:02.260 | talking
00:54:02.600 | about
00:54:04.120 | wasn't
00:54:04.360 | necessarily
00:54:05.340 | servers
00:54:05.660 | wouldn't
00:54:06.180 | regardless
00:54:09.560 | you're
00:54:09.920 | using
00:54:11.240 | you're
00:54:11.500 | interacting
00:54:13.480 | database
00:54:13.880 | whether
00:54:14.260 | an MCP
00:54:14.760 | server
00:54:15.280 | custom
00:54:18.500 | agent
00:54:20.140 | you'll
00:54:23.000 | want to
00:54:23.220 | make sure
00:54:23.560 | that you're
00:54:23.940 | using
00:54:24.340 | a database
00:54:25.120 | that operates
00:54:26.560 | schema
00:54:28.220 | popular
00:54:30.240 | existing
00:54:31.660 | around
00:54:32.380 | while
00:54:33.640 | results
00:54:34.240 | because
00:54:38.620 | going
00:54:39.120 | between
00:54:39.460 | FalkorDB
00:54:39.860 | and
00:54:40.360 | Neo4j
00:54:40.980 | personally
00:54:41.840 | and I
00:54:43.280 | really liked
00:54:43.880 | what FalkorDB
00:54:44.360 | had
00:54:44.940 | to offer
00:54:45.420 | in terms
00:54:46.440 | features
00:54:47.080 | functionality
00:54:47.520 | but I
00:54:48.780 | personally
00:54:49.560 | found
00:54:50.080 | that Neo4j
00:54:51.480 | was just
00:54:52.180 | much better
00:54:52.840 | represented
00:54:53.360 | in the
00:54:54.840 | training
00:54:55.060 | and it
00:54:55.360 | was just
00:54:55.620 | a much
00:54:56.260 | accessible
00:54:56.580 | system
00:54:57.120 | to the
00:54:58.060 | themselves
00:54:58.620 | and so
00:55:00.300 | for that
00:55:02.320 | reason
00:55:02.560 | I've just
00:55:02.940 | had a lot
00:55:03.300 | better
00:55:04.200 | since
00:55:06.440 | moving
00:55:07.500 | Neo4j
00:55:08.660 | for that
00:55:10.920 | project
00:55:12.520 | my error
00:55:13.620 | rate has
00:55:14.000 | dropped
00:55:14.280 | substantially
00:55:16.100 | errors
00:55:16.980 | compound
00:55:18.200 | operations
00:55:19.000 | and over
00:55:19.500 | the length
00:55:20.700 | of time
00:55:21.000 | you're
00:55:21.240 | using
00:55:21.800 | the system
00:55:24.760 | difference
00:55:27.160 | lifetime
00:55:27.620 | project
00:55:28.700 | talking
00:55:28.960 | differences
00:55:31.120 | question
00:55:32.480 | on Falkor
00:55:33.700 | do you
00:55:35.360 | query
00:55:36.020 | of those
00:55:36.360 | databases
00:55:36.840 | in the
00:55:38.240 | different
00:55:38.700 | schema
00:55:39.120 | inside
00:55:39.600 | utilities
00:55:40.780 | you're
00:55:40.880 | getting
00:55:43.560 | forget
00:55:43.840 | exactly
00:55:44.280 | off the
00:55:45.760 | of my
00:55:47.940 | schema
00:55:49.280 | pattern
00:55:50.480 | Falkor
00:55:54.680 | don't
00:55:55.260 | remember
00:55:55.460 | exactly
00:55:55.740 | what it
00:55:57.000 | could
00:55:57.640 | reliably
00:55:58.160 | query
00:55:58.780 | a Falkor
00:56:00.140 | database
00:56:04.260 | thought
00:56:05.260 | there
00:56:07.080 | needs
00:56:07.600 | represented
00:56:08.580 | training
00:56:11.540 | needs
00:56:11.860 | to be
00:56:12.860 | accurately
00:56:13.980 | needs
00:56:14.160 | to have
00:56:14.360 | a lot
00:56:14.700 | examples
00:56:15.360 | training
00:56:17.220 | accurately
00:56:18.240 | query
00:56:19.200 | database
00:56:26.400 | Cypher
00:56:28.280 | think
00:56:28.460 | you're
00:56:28.580 | probably
00:56:30.280 | weird
00:56:31.240 | weird
00:56:31.660 | proprietary
00:56:32.140 | language
00:56:32.580 | you're
00:56:32.780 | going to
00:56:33.020 | be in
00:56:33.640 | trouble
00:56:34.820 | that's
00:56:35.520 | great way
00:56:36.160 | that's
00:56:36.540 | great
00:56:37.240 | works
00:56:38.120 | Cypher
00:56:38.840 | you're
00:56:41.200 | hours
00:56:41.500 | gentlemen
00:56:42.240 | y'all
00:56:42.940 | cable
00:56:45.880 | going
00:56:46.240 | thank
00:56:46.960 | great
00:56:47.800 | conversation
00:56:48.480 | appreciate
00:56:49.680 | everybody
00:56:50.460 | participated
00:56:52.180 | speaker
00:56:54.840 | inspired
00:56:56.740 | tinker
00:56:57.200 | something
00:56:57.600 | learned
00:56:57.800 | today
00:56:58.840 | report
00:56:59.240 | whatever
00:57:00.220 | might
00:57:02.080 | speak
00:57:04.400 | session
00:57:07.380 | mentioning
00:57:08.460 | action
00:57:09.960 | channel
00:57:11.900 | autonomous
00:57:12.640 | don't
00:57:13.860 | human
00:57:14.340 | folks
00:57:15.440 | Neo4j
00:57:16.040 | if you
00:57:16.480 | bring
00:57:17.080 | conversation
00:57:17.700 | about
00:57:18.300 | doesn't
00:57:24.240 | makes
00:57:24.760 | continue
00:57:26.760 | bring
00:57:27.960 | topics
00:57:28.340 | bring
00:57:28.960 | stuff
00:57:29.320 | doesn't
00:57:29.880 | polished
00:57:30.900 | interesting
00:57:32.580 | thank
00:57:32.960 | again
00:57:33.280 | we'll
00:57:34.460 | strong
00:57:35.460 | recommend
00:57:36.460 | podcast
00:57:39.460 | master
00:57:49.000 | Thank you.