State of Agents with Andrew Ng

00:00:08.980 |
So we'll be doing a fireside chat with Andrew Ng. 00:00:11.680 |
Andrew probably doesn't need any introduction to most folks here. 00:00:15.000 |
I'm guessing a lot of people have taken some of his classes on Coursera or deep learning. 00:00:20.540 |
But Andrew has also been a big part of the LangChain story. 00:00:24.040 |
So I met Andrew a little over two years ago at a conference when we started talking about 00:00:31.000 |
And he graciously invited us to do a course on LangChain with DeepLearning.AI. 00:00:34.680 |
I think it must have been the second or third one that they ever did. 00:00:39.060 |
And I know a lot of people here probably watched that course or got started on LangChain because of it. 00:00:43.860 |
So Andrew has been a huge part of the LinkChain journey. 00:00:47.640 |
And I'm super excited to welcome him on stage for a fireside chat. 00:01:09.060 |
I think Harrison and his team have taught six short courses so far on DeepLearning.AI. 00:01:14.000 |
And on metrics like net promoter score and so on, Harrison's courses are among our most popular. 00:01:23.060 |
I think they had the clearest explanations I have seen myself of a bunch of agentic concepts. 00:01:27.060 |
They've definitely helped make our courses and explanations better. 00:01:35.120 |
You've obviously touched and thought about so many things in this industry. 00:01:41.120 |
But one of your takes that I cite a lot, and probably people have heard me talk about, is 00:01:46.180 |
your take on talking about the agentic-ness of an application as opposed to whether or not something is an agent. 00:01:53.120 |
And so as we're here now at an agent conference, maybe we should rename it to an agentic conference, 00:01:58.120 |
but would you mind kind of like clarifying that? 00:02:01.120 |
And I think it was like almost a year and a half, two years ago that you said that. 00:02:05.120 |
And so I'm curious if things have changed in your mind since then. 00:02:08.120 |
So I remember Harrison and I both spoke at a conference like a year, over a year ago. 00:02:14.120 |
And at that time I think both of us were trying to convince other people that agents are a big deal. 00:02:20.120 |
And that was before, maybe mid-summer last year, when a bunch of marketers got a hold 00:02:25.120 |
of the agentic term and started sticking that sticker everywhere until it lost meaning. 00:02:30.120 |
But to Harrison's question, I think about a year and a half ago I saw that a lot of people 00:02:36.120 |
were having different arguments: is this truly autonomous or not, is this truly an agent? 00:02:40.120 |
And I felt that it was fine to have that argument, but that we would succeed better as a community 00:02:45.120 |
if we just say that there are degrees to which something is agentic. 00:02:49.120 |
And then if we just say, if you want to build an agentic system with a little bit of autonomy or a lot, that's fine. 00:02:56.120 |
No need to spend time arguing, is this truly an agent? 00:02:59.120 |
Let's just call all of these things agentic systems with different degrees of autonomy. 00:03:03.120 |
And I think that actually hopefully reduced the amount of time people wasted arguing, 00:03:10.120 |
And let's just call them all agentic and then get on with it. 00:03:15.120 |
Where on that spectrum of kind of like a little autonomy to a lot of autonomy do you see the most opportunity today? 00:03:22.120 |
Yeah, so my team routinely uses LangGraph for our hardest problems, right? 00:03:31.120 |
I'm also seeing tons of business opportunities that frankly are fairly linear workflows 00:03:36.120 |
or linear with just occasional side branches. 00:03:39.120 |
So a lot of businesses, there are opportunities where right now we have people 00:03:43.120 |
looking at a form on a website, doing web search, checking some of the database 00:03:47.120 |
to see if it's a compliance issue or if there are, you know, someone we shouldn't sell certain stuff to. 00:03:51.120 |
Or take something, copy-paste it, maybe do another web search, paste it into a different form. 00:03:57.120 |
So in business processes, there are actually a lot of fairly linear workflows 00:04:01.120 |
or linear with very small loops and occasional branches usually connoting a failure case. 00:04:08.120 |
So I see a lot of opportunities, but one challenge I see businesses have is it's still pretty difficult 00:04:14.120 |
to look at, you know, some stuff that's being done in your business 00:04:17.120 |
and figure out how to turn it into an agentic workflow. 00:04:20.120 |
So what is the granularity with which you should break down this thing into micro tasks? 00:04:26.120 |
And then, you know, after you build your initial prototype, if it doesn't work well enough, 00:04:30.120 |
which of these steps do you work on to improve the performance? 00:04:33.120 |
So I think that whole bag of skills on how to look at a bunch of stuff that people are doing, 00:04:38.120 |
break it into sequential steps, where are the small number of branches, 00:04:42.120 |
how do you put in place evals, you know, all that, that skill set is still far too rare, I think. 00:04:48.120 |
And then, of course, there are much more complex agentic workflows 00:04:51.120 |
that I think you heard a bunch about with very complex loops that's very valuable as well. 00:04:57.120 |
But I see much more in terms of the number of opportunities, if not the total value. 00:05:01.120 |
There's a lot of simpler workflows that I think are still being built out. 00:05:09.120 |
I think a lot of courses are in pursuit of helping people kind of like build agents. 00:05:13.120 |
And so what are some of the skills that you think agent builders all across the spectrum 00:05:18.120 |
should kind of like master and get started with? 00:05:22.120 |
Boy, it's a good question. I wish I knew a good answer to that. 00:05:25.120 |
I've been thinking a lot about this actually recently. 00:05:27.120 |
I think a lot of the challenges, if you have a business process workflow, 00:05:32.120 |
you often have people in compliance, legal, HR, whatever, doing these steps. 00:05:37.120 |
How do you put in place the plumbing, either through a LangGraph-type integration, 00:05:44.120 |
or we'll see if MCP helps with some of that too, to ingest the data? 00:05:48.120 |
And then how do you prompt or process and do the multiple steps in order to build this end-to-end system? 00:05:55.120 |
And then one thing I see a lot is putting in place the right evals framework, to not only understand the performance of the overall system, 00:06:07.120 |
but also so you can hone in on what's the one step that is broken, what's the one prompt that's broken, to work on. 00:06:13.120 |
I find that a lot of teams probably wait longer than they should just using human evals, 00:06:18.120 |
where every time you change something, you then sit there and look at a bunch of outputs yourselves, right? 00:06:22.120 |
I see most teams probably slower to put in place evals, systematic evals, than is ideal. 00:06:27.120 |
But I find that having the right instincts for what to do next in a project is still really difficult, right? 00:06:34.120 |
The less skilled teams, the teams that are still learning these skills, will often, you know, go down blind alleys, right? 00:06:42.120 |
Where you spend like a few months trying to improve one component, 00:06:45.120 |
the more experienced team will say, "You know what? I don't think this can ever be made to work." 00:06:49.120 |
So just don't, just find a different way around this problem. 00:06:53.120 |
I wish I knew, I wish I knew more efficient ways to get this kind of almost tactile knowledge. 00:06:59.120 |
Often you're there, you know, looking at the output, looking at the trace, looking at the LangSmith output, 00:07:07.120 |
trying to decide in minutes or hours what to do next, and that's still very difficult. 00:07:12.120 |
And is this kind of like tactile knowledge mostly around LLMs and their limitations, 00:07:17.120 |
or more around like just the product framing of things and that skill of taking a job and breaking it down? 00:07:23.120 |
That's something that we're still getting accustomed to. 00:07:28.120 |
So I feel like over the last couple of years, AI tool companies have created an amazing set of AI tools. 00:07:36.120 |
And this includes tools like, you know, LangGraph, but also ideas like how do you think about RAG? 00:07:45.120 |
Many, many different ways of approaching memory. 00:07:52.120 |
But I feel like there's this, you know, wide sprawling array of really exciting tools. 00:07:57.120 |
One picture I often have in my head is if all you have are, you know, purple Lego bricks, right? 00:08:07.120 |
And I think of these tools as being akin to Lego bricks, right? 00:08:10.120 |
And the more tools you have is as if you don't just have purple Lego bricks, but a red one and a black one and a yellow one and a green one. 00:08:17.120 |
And as you get more different colored and shaped Lego bricks, you can very quickly assemble them into really cool things. 00:08:23.120 |
And so I think of a lot of these tools, like the ones I was rattling off, as different types of Lego bricks. 00:08:28.120 |
And when you're trying to build something, you know, sometimes you need that, right, squiggly, weird-shaped Lego brick. 00:08:33.120 |
And some people know it and can plug it in and just get the job done. 00:08:36.120 |
But if you've never built evals of a certain type, then, you know, then you could actually end up spending, whatever, 00:08:43.120 |
three extra months doing something that someone else that's done that before could say, oh, well, we should just build evals this way. 00:08:49.120 |
Use the LLM as a judge and just go through that process to get it done much faster. 00:08:54.120 |
So one of the unfortunate things about AI is it's not just one tool. 00:09:00.120 |
And when I'm coding, I just use a whole bunch of different stuff, right? 00:09:04.120 |
And I'm not a master of enough stuff myself, but I've learned enough tools to assemble them quickly. 00:09:09.120 |
So, yeah, and I think having that practice with different tools also helps with much faster decision-making. 00:09:19.120 |
So, for example, because LLMs have been getting longer and longer context windows, a lot of the best practices for RAG from, you know, a year and a half ago or whatever, are much less relevant today, right? 00:09:31.120 |
And I remember, Harrison was really early to a lot of these things. 00:09:34.120 |
So I played with the early LangChain RAG frameworks, recursive summarization and all that. 00:09:39.120 |
As LLM context windows got longer, now we just dump a lot more stuff into the LLM context. 00:09:44.120 |
It's not that RAG has gone away, but the hyperparameter tuning has gotten way easier. 00:09:48.120 |
There's a huge range of hyperparameters that work, you know, like just fine. 00:09:52.120 |
So as LLMs keep progressing, the instincts we built up, you know, two years ago may or may not be relevant anymore today. 00:10:00.120 |
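The point above can be sketched in code. This is a toy illustration, not any specific framework's API: with a short context window, the retrieval hyperparameters (top-k, chunk size) matter a lot; with a long context window, a wide range of settings works because you can simply pack everything relevant in. The whitespace tokenizer is a crude stand-in for a real one.

```python
# Toy sketch: long context windows make RAG tuning more forgiving.

def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words."""
    return len(text.split())

def pack_context(chunks: list[str], budget: int) -> list[str]:
    """Greedily include relevance-ranked chunks until the token budget is used up."""
    packed, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        packed.append(chunk)
        used += cost
    return packed

# Chunks assumed already ranked by relevance (the retriever's job).
ranked = [
    "Refund policy: customers may request refunds within 30 days.",
    "Shipping: orders ship within 2 business days.",
    "Warranty: hardware is covered for one year.",
]

# With a tight budget (short-context era), how you pick chunks really matters.
print(len(pack_context(ranked, budget=10)))   # only the top chunk fits
# With a generous budget (long-context era), just dump it all in.
print(len(pack_context(ranked, budget=1000))) # all chunks fit
```

The design point is simply that the packing step stops being the sensitive part of the pipeline once the budget dwarfs the corpus you retrieve.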
You mentioned a lot of things that I want to talk about. 00:10:04.120 |
So, okay, what are some of the Lego bricks that are maybe underrated right now that you would recommend that people aren't talking about? 00:10:11.120 |
Like evals, you know, we had three people talk about evals, and I think that's top of people's mind. 00:10:15.120 |
But what are some things that most people maybe haven't thought of or haven't heard of yet that you would recommend them looking into? 00:10:25.120 |
So even though people talk about evals, for some reason people don't do it. 00:10:32.120 |
I think it's because people often have -- I saw a post on this on evals writer's block. 00:10:38.120 |
People think of writing evals as this huge thing that you have to do right. 00:10:42.120 |
I think of evals as something I'm going to throw together really quickly, you know, in 20 minutes, and it's not that good. 00:10:48.120 |
But it starts to complement my human eyeball evals. 00:10:52.120 |
And so what often happens is I'll build a system and there's one problem where I keep on getting regression. 00:11:02.120 |
Then I code up a very simple eval, maybe with, you know, five input examples and a very simple LLM-as-judge to just check for this one regression, right? 00:11:13.120 |
And then I'm not swapping out human evals for automated evals. 00:11:19.120 |
But when I change something, I'll run this eval to just, you know, take this burden off me so I don't have to think about it. 00:11:24.120 |
And then what happens is just like the way we write English, maybe, once you have some slightly helpful but clearly very broken, imperfect eval, then you start to go, you know what? 00:11:38.120 |
I can improve my eval to make it better, and I can improve it to make it better. 00:11:42.120 |
So just as when we build a lot of applications, we build some, you know, very quick and dirty thing that doesn't work and it will incrementally make it better. 00:11:49.120 |
For a lot of the evals I build, I start with really awful evals that barely help. 00:11:54.120 |
And then when you look at what it does, you go, you know what, I can make this better. 00:12:01.120 |
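A quick-and-dirty eval of the kind described above can be very little code. In this sketch, `run_agent` and the keyword "judge" are stand-ins (the names and canned answers are made up for illustration): in practice you would call your actual system and an LLM-as-judge prompt instead.

```python
# Minimal regression eval: a handful of examples plus a cheap judge,
# rerun on every change. Stand-ins throughout; not a real agent.

def run_agent(query: str) -> str:
    # Placeholder for the real agentic system under test.
    canned = {
        "cancel my subscription": "I've cancelled your subscription.",
        "what's my balance": "Your balance is $42.",
    }
    return canned.get(query, "Sorry, I can't help with that.")

# A few examples targeting the one regression we keep hitting.
EXAMPLES = [
    ("cancel my subscription", "cancelled"),
    ("what's my balance", "balance"),
]

def judge(output: str, must_contain: str) -> bool:
    # Stand-in for an LLM-as-judge call: here, a crude keyword check.
    return must_contain in output.lower()

def run_eval() -> float:
    passed = sum(judge(run_agent(q), kw) for q, kw in EXAMPLES)
    return passed / len(EXAMPLES)

print(run_eval())  # 1.0 on this toy system; rerun after every change
```

It doesn't replace human eyeballing; it just catches the one known regression automatically, and it can be grown incrementally from there.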
I'll mention one thing that people have talked a lot about, but I think is still underrated, is the voice stack. 00:12:09.120 |
It's one of the things that I'm actually very excited about voice applications. 00:12:12.120 |
A lot of my friends are very excited about voice applications. 00:12:14.120 |
I see a bunch of large enterprises really excited about voice applications, very large enterprises, very large use cases. 00:12:20.120 |
For some reason, while there are some developers in this community doing voice, the amount of developer attention on voice-stack applications is limited. 00:12:30.120 |
It's not that people have ignored it, but it feels much smaller than the large enterprise importance I see, as well as the applications coming down the pipe. 00:12:39.120 |
And not all of this is the real-time voice API. 00:12:42.120 |
It's not all speech-to-speech native audio-in, audio-out models. 00:12:47.120 |
I find those models are very hard to control, but when we use more of an agentic voice-stack workflow, we find it much more controllable. 00:12:56.120 |
I'm working with a ton of teams on voice-stack stuff, some of which hopefully will be announced in the near future. 00:13:06.120 |
And then other things I think are underrated. 00:13:09.120 |
One other one that maybe is not underrated, but more business should do it. 00:13:13.120 |
I think many of you have seen that developers that use AI assistance in their coding are so much faster than developers that don't. 00:13:21.120 |
It's been interesting to see how many companies, CIOs and CTOs, still have policies that don't let engineers use AI-assisted coding. 00:13:31.120 |
I think maybe sometimes for good reasons, but I think we have to get past that. 00:13:36.120 |
Because frankly, I don't know, my teams and I just hate to ever have to code again without AI assistants. 00:13:42.120 |
But I think some businesses still need to get through that. 00:13:46.120 |
I think underrated is the idea that I think everyone should learn to code. 00:13:50.120 |
One fun fact about AI Fund: everyone in AI Fund, including our front desk receptionist, and my CFO, and the general counsel, everyone actually knows how to code. 00:14:05.120 |
And it's not that I want them to be software engineers, they're not. 00:14:09.120 |
But in their respective job functions, many of them, by learning a little bit about how to code, are better able to tell a computer what they want it to do. 00:14:17.120 |
And so it's actually driving meaningful productivity improvements across all of these job functions that are not software engineering. 00:14:26.120 |
Talking about kind of like AI coding, what tools are you using for that personally? 00:14:33.120 |
So we're working on some things that we've not yet announced. 00:14:41.120 |
So maybe I'll just say I do use Cursor, Windsurf, and some other things. 00:14:54.120 |
If people here want to get into voice and they're familiar with building kind of like agents with LLMs, how similar is it? 00:15:00.120 |
Are there a lot of ideas that are transferable? 00:15:05.120 |
So it turns out there are a lot of applications where I think voice is important. 00:15:10.120 |
It creates certain interactions that are much more... 00:15:17.120 |
Actually, it turns out from an application perspective, input text prompt is kind of intimidating, right? 00:15:23.120 |
For a lot of applications, well, we can go to users and say, "Tell me what you think. 00:15:31.120 |
And one of the problems with that is people can use backspace. 00:15:35.120 |
And so, you know, people are just slower to respond via text. 00:15:39.120 |
Whereas for voice, you know, time rolls forward. 00:15:45.120 |
You could actually say, "Oh, I changed my mind. 00:15:48.120 |
And our model is actually pretty good at dealing with it. 00:15:50.120 |
But I find that for a lot of applications, the user friction to just getting them to use it is lower 00:15:56.120 |
when we just say, you know, "Tell me what you think." 00:16:01.120 |
So in terms of voice, the one biggest difference in terms of engine requirements is latency. 00:16:09.120 |
If someone says something, you kind of really want to respond in, you know, I don't know, 00:16:18.120 |
And we have a lot of agentic workflows that will run for many seconds. 00:16:23.120 |
So when DeepLearning.AI worked with RealAvatar to build an avatar of me... 00:16:30.120 |
Our initial version had kind of five to nine seconds of latency. 00:16:37.120 |
You say something, you know, nine seconds of silence. 00:16:45.120 |
So just as, you know, if you ask me a question, I might go, "Hmm, let me think about that," before answering, 00:16:51.120 |
we prompted an LLM to basically do that to hide the latency. 00:16:58.120 |
And there are all these other little tricks as well. 00:17:00.120 |
Turns out if you're building a voice customer service chatbot, 00:17:03.120 |
it turns out that if you play background noise of a customer contact center, 00:17:08.120 |
instead of dead silence, people are much more accepting of that, you know, latency. 00:17:13.120 |
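The latency-hiding trick described above can be sketched as an async pattern: kick off the slow agentic workflow, then immediately speak a short conversational filler while it runs. The stub `slow_agent` and the filler text are assumptions for illustration; a real system would stream TTS audio (and possibly background noise) instead of appending strings.

```python
# Sketch of hiding multi-second agent latency behind a spoken filler.

import asyncio

async def slow_agent(question: str) -> str:
    await asyncio.sleep(0.2)  # stands in for several seconds of agentic work
    return f"Here's what I found about {question}."

async def respond(question: str, speak) -> None:
    # Start the slow workflow first...
    task = asyncio.create_task(slow_agent(question))
    # ...then immediately fill the silence, the way a person would.
    speak("Hmm, let me think about that...")
    # Speak the real answer once the workflow finishes.
    speak(await task)

spoken = []
asyncio.run(respond("pricing", spoken.append))
print(spoken[0])  # the filler is uttered before the real answer arrives
```

The point of the design is that perceived latency is the time to the first utterance, not the time to the full answer.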
So I find that there are a lot of these things that are different than a pure text-based LLM. 00:17:19.120 |
But in applications where the voice-based modality lets the user be comfortable and just start talking, 00:17:25.120 |
I think it sometimes really reduces the user friction to, you know, getting some information out of them. 00:17:30.120 |
I think when we talk, we don't feel like we need to deliver perfection as much as when we write. 00:17:38.120 |
So it's somehow easier for people to just start blurting out their ideas and change their mind and go back and forth. 00:17:43.120 |
And that lets us get the information from them that we need to help the user to move forward. 00:17:49.120 |
One of the new things that's out there, and you mentioned briefly, is MCP. 00:17:58.120 |
How are you seeing that transform how people are building apps, what types of apps they're building, 00:18:03.120 |
or what's generally happening in the ecosystem? 00:18:07.120 |
Just this morning, we released with Anthropic a short course on MCP. 00:18:12.120 |
I actually saw a lot of stuff, you know, on the interweb on MCP that I thought was quite confusing. 00:18:20.120 |
So when we got together with Anthropic, we said, you know, let's create a really good short course on MCP that explains it clearly. 00:18:28.120 |
I think it filled a very clear market gap, and, you know, the fact that OpenAI adopted it 00:18:32.120 |
also, I think, speaks to the importance of this. 00:18:36.120 |
I think the MCP standard will continue to evolve, right? 00:18:39.120 |
So for example, so I think many of you know what MCP is, right? 00:18:43.120 |
It makes it much easier for agents primarily, but frankly, I think other types of software too, to plug into different types of data. 00:18:49.120 |
When I'm using LLMs myself or when I'm building applications, frankly, for a lot of us, we spend so much time on the plumbing. 00:18:58.120 |
So I think for those of you from large enterprises as well, the AI models, especially, you know, reasoning models, are pretty darn intelligent. 00:19:05.120 |
They could do a lot of stuff when given the right context. 00:19:09.120 |
So I find that I and my team spend a lot of time working on the plumbing, on the data integrations, to get the context to the LLM to make it, you know, do something that often is pretty sensible when it has the right input context. 00:19:22.120 |
So MCP, I think is a fantastic way to try to standardize the interface to a lot of tools or API calls as well as data sources. 00:19:30.120 |
It feels like, it feels a little bit like Wild West. 00:19:34.120 |
You know, a lot of MCP servers you find on the internet do not work, right? 00:19:37.120 |
And then the authentication systems are kind of, you know, clunky, even for the very large companies with MCP servers; it's not clear if the authentication token totally works. 00:19:48.120 |
And when it expires, there's a lot of that going on. 00:19:50.120 |
I think the MCP protocol itself is also early. 00:19:53.120 |
Right now, MCP gives a long list of the resources available. 00:19:57.120 |
You know, eventually, I think we need some more hierarchical discovery. 00:20:01.120 |
Imagine you want to build something; I don't know if there would even be an MCP interface to LangGraph. 00:20:08.120 |
But LangGraph has so many API calls, you just can't have like a long list of everything under the sun for agents to sort out. 00:20:16.120 |
So I think we'll need some sort of hierarchical discovery mechanism. 00:20:18.120 |
So I think MCP is a really fantastic first step. 00:20:23.120 |
It will make your life easier, probably, if you find a good MCP server implementation to help with some of the data integrations. 00:20:32.120 |
It's this idea of when you have, you know, N models or N agents and M data sources, 00:20:39.120 |
it should not be an N times M effort to do all the integrations, it should be N plus M. 00:20:47.120 |
It will need to evolve, but it's a fantastic first step toward that type of data integration. 00:20:52.120 |
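The N-plus-M point above is just counting, and a toy illustration (not the real MCP SDK) makes it concrete: without a standard, every agent-to-source pair needs custom glue; with a shared protocol, each agent needs one client and each source one server.

```python
# Toy illustration of why a shared protocol turns N x M integrations
# into N + M adapters.

def bespoke_integrations(n_agents: int, m_sources: int) -> int:
    # Without a standard, every (agent, data source) pair needs custom glue.
    return n_agents * m_sources

def protocol_adapters(n_agents: int, m_sources: int) -> int:
    # With an MCP-style standard: one client per agent, one server per source.
    return n_agents + m_sources

print(bespoke_integrations(5, 20))  # 100 custom integrations
print(protocol_adapters(5, 20))     # 25 adapters
```

The gap widens as either side grows, which is why standardizing the interface matters more the larger the ecosystem gets.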
Another type of protocol that's seen less buzz than MCP is some of the agent-to-agent stuff. 00:20:58.120 |
And I remember when we were at a conference a year or so ago, I think you were talking about multi-agent systems, 00:21:05.120 |
So how do you see some of the multi-agent or agent-to-agent stuff evolving? 00:21:10.120 |
Yeah, so I think, you know, agent-to-agent AI is still so early. 00:21:15.120 |
Most of us, right, including me, we struggle to even make our code work. 00:21:20.120 |
And so making my code, my agent work with someone else's agent, it feels like a two-miracle, you know, requirement. 00:21:28.120 |
So I see that when one team is building a multi-agent system, that often works, because we build a bunch of agents, 00:21:35.120 |
they communicate with each other, we understand the protocols; that works. 00:21:38.120 |
But right now, at least at this moment in time, and maybe I'm off, the number of examples I'm seeing of, you know, 00:21:45.120 |
one team's agent or collection of agents successfully engaging with 00:21:48.120 |
a totally different team's agent or collection of agents is still small. 00:21:54.120 |
But I'm not personally seeing, you know, real success, huge success stories of that yet. 00:22:05.120 |
I think if MCP is early, I think agent-to-agent stuff is even earlier. 00:22:08.120 |
Another thing that's kind of like top of people's mind right now is kind of vibe coding and all of that. 00:22:14.120 |
And you touched on it a little bit earlier with how people are using these AI coding assistants. 00:22:23.120 |
What kind of purpose does that serve in the world? 00:22:26.120 |
You know, so I think, you know, many of us code while barely looking at the code, right? 00:22:33.120 |
I think it's unfortunate that that's called vibe coding, because it's misleading a lot of people 00:22:37.120 |
into thinking, just go with the vibes, you know, accept this, reject that. 00:22:42.120 |
And frankly, when I'm coding for a day, you know, with vibe coding or whatever, with AI coding assistance, I'm exhausted by the end of the day; it's a deeply intellectual exercise. 00:22:53.120 |
And so I think the name is unfortunate, but the phenomenon is real and it's been taking off and it's great. 00:22:59.120 |
So over the last year, a few people have been advising others to not learn to code on the basis that AI will automate coding. 00:23:10.120 |
I think we'll look back at that as some of the worst career advice ever given. 00:23:14.120 |
Because over the last many decades, as coding became easier, more people started to code. 00:23:20.120 |
It happened, you know, when we went from punch cards to keyboards and terminals, right? 00:23:25.120 |
And I actually found some very old articles. 00:23:28.120 |
When programming went from assembly language to, you know, literally COBOL, there were people arguing back then, 00:23:34.120 |
you know, we have COBOL, it's so easy, we don't need programmers anymore. 00:23:37.120 |
And obviously, when it became easier, more people learned to code. 00:23:42.120 |
And so with AI coding assistance, a lot more people should code. 00:23:48.120 |
But I think, and it turns out, one of the most important skills of the future for developers and non-developers 00:23:53.120 |
is the ability to tell a computer exactly what you want so that it will do it for you. 00:23:58.120 |
And I think understanding at some level, which all of you do, I know, but understanding at some level 00:24:04.120 |
how a computer works, lets you prompt or instruct the computer much more precisely, 00:24:09.120 |
which is why I still try to advise everyone to, you know, learn one programming language, learn Python or something. 00:24:16.120 |
And then, I think, maybe some of you know this, I'm a much stronger Python developer than, say, a JavaScript developer, right? 00:24:25.120 |
But with AI-assisted coding, I now write a lot more JavaScript and TypeScript code than I ever used to. 00:24:32.120 |
But even when debugging, you know, JavaScript code that something else wrote for me that I didn't write with my own fingers, 00:24:38.120 |
really understanding, you know, what are the error cases, what does this mean? 00:24:42.120 |
That's been really important for me to, right, debug my JavaScript code. 00:24:47.120 |
If you don't like the name Vibe Coding, do you have a better name in mind? 00:24:51.120 |
Oh, that's a good question. I should think about that. 00:24:57.120 |
One of the things that you announced recently is a new fund for AI Fund, so congrats on that. 00:25:03.120 |
For people in the audience who are maybe thinking of starting a startup or looking into that, what lessons can you share? 00:25:12.120 |
So, we build companies and we exclusively invest in companies that we co-founded. 00:25:16.120 |
So, I think in terms of looking back on AI Fund's, you know, lessons learned, the number one, 00:25:23.120 |
I would say the number one predictor of a startup success is speed. 00:25:28.120 |
I know we're in Silicon Valley, but I see a lot of people that have never yet seen the speed 00:25:37.120 |
with which a strong team can move. And if you've never seen it before, and I know many of you have, 00:25:40.120 |
It's just so much faster than, you know, anything that slower businesses know how to do. 00:25:47.120 |
And I think the number two predictor, also very important, is technical knowledge. 00:25:51.120 |
It turns out, if we look at the skills needed to build a startup, there are some things like, 00:25:58.120 |
you know, go-to-market and so on. All that is important, but that knowledge has been around. 00:26:03.120 |
But the knowledge that's really rare is, how does technology actually work? 00:26:07.120 |
Because technology has been evolving so quickly. 00:26:09.120 |
So, I have deep respect for the go-to-market people. 00:26:17.120 |
And the most rare resource is someone that really understands how the technology works. 00:26:22.120 |
So, at AI Fund, we really like working with deeply technical people that have good instincts about the technology. 00:26:32.120 |
And then, I think, a lot of the business stuff, you know, that knowledge is very important, but it's easier to find. 00:26:46.120 |
We're going to go to a break now, but before we do, please join me in giving Andrew a big round of applause.