
State of Agents with Andrew Ng


Transcript

I'm really excited for this next section. So we'll be doing a fireside chat with Andrew Ng. Andrew probably doesn't need any introduction to most folks here. I'm guessing a lot of people have taken some of his classes on Coursera or DeepLearning.AI. But Andrew has also been a big part of the LangChain story.

So I met Andrew a little over two years ago at a conference when we started talking about LangChain. And he graciously invited us to do a course on LangChain with DeepLearning.AI. I think it must have been the second or third one that they ever did. And I know a lot of people here probably watched that course or got started on LangChain because of that course.

So Andrew has been a huge part of the LangChain journey. And I'm super excited to welcome him on stage for a fireside chat. So let's welcome Andrew Ng. Thanks for being here. Thanks for being here. By the way, Harrison was really kind. I think Harrison and his team have taught six short courses so far on DeepLearning.AI.

And on metrics like net promoter score and so on, Harrison's courses are among our most highly rated. So go take all of Harrison's courses. I think they have some of the clearest explanations I have seen of a bunch of agentic concepts. They've definitely helped make our courses and explanations better.

So thank you guys for that as well. You've obviously touched and thought about so many things in this industry. But one of your takes that I cite a lot and probably people have heard me talk about is your take on kind of like talking about the agentic-ness of an application as opposed to whether something's an agent.

And so as we're here now at an agent conference, maybe we should rename it to an agentic conference, but would you mind kind of like clarifying that? And I think it was like almost a year and a half, two years ago that you said that. And so I'm curious if things have changed in your mind since then.

So I remember Harrison and I both spoke at a conference like a year, over a year ago. And at that time I think both of us were trying to convince other people that agents are a thing and we should pay attention to it. And that was before, maybe mid-summer last year, when a bunch of marketers got a hold of the agentic term and started sticking that sticker on everything until it almost lost its meaning.

But to Harrison's question, I think about a year and a half ago I saw that a lot of people were arguing, is this an agent, is this not? There were different arguments: is this truly autonomous enough to be an agent? And I felt that it was fine to have that argument, but that we would succeed better as a community if we just say that there are degrees to which something is agentic.

And then if we just say, if you want to build an agentic system with a little bit of autonomy or a lot of autonomy, it's all fine. No need to spend time arguing, is this truly an agent? Let's just call all of these things agentic systems with different degrees of autonomy.

And I think that actually, hopefully, reduced the amount of time people wasted arguing about whether something is an agent. Let's just call them all agentic and then get on with it. So I think that actually worked out. Where on that spectrum of kind of like a little autonomy to a lot of autonomy do you see people building for these days?

Yeah, so my team routinely uses LangGraph for our hardest problems, right, with complex flows and so on. I'm also seeing tons of business opportunities that frankly are fairly linear workflows, or linear with just occasional side branches. So in a lot of businesses, there are opportunities where right now we have people looking at a form on a website, doing a web search, checking some database to see if there's a compliance issue or if it's, you know, someone we shouldn't sell certain stuff to.

Or they take something, copy-paste it, maybe do another web search, and paste it into a different form. So in business processes, there are actually a lot of fairly linear workflows, or linear with very small loops and occasional branches, where a branch usually connotes a failure, because they reject this workflow.

So I see a lot of opportunities, but one challenge I see businesses have is it's still pretty difficult to look at, you know, some stuff that's being done in your business and figure out how to turn it into an agentic workflow. So what is the granularity with which you should break down this thing into micro tasks?

And then, you know, after you build your initial prototype, if it doesn't work well enough, which of these steps do you work on to improve the performance? So I think that whole bag of skills on how to look at a bunch of stuff that people are doing, break it into sequential steps, where are the small number of branches, how do you put in place evals, you know, all that, that skill set is still far too rare, I think.

And then, of course, there are much more complex agentic workflows, which I think you heard a bunch about, with very complex loops, and those are very valuable as well. But in terms of sheer number of opportunities, and still a lot of value, there are a lot of simpler workflows that I think are still being built out.
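(Editor's note: as a rough illustration of the kind of mostly linear workflow with an occasional failure branch that Andrew describes, here is a minimal sketch using LangGraph's StateGraph API. The state fields, node functions, and compliance branch are made-up placeholders, not anything from the talk, and the exact API surface may differ across LangGraph versions.)

```python
# Editor's sketch (not from the talk): a mostly linear business workflow with
# one compliance branch. All node logic is placeholder; LangGraph API usage
# reflects recent versions and may differ.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    form: dict
    search_results: str
    compliant: bool

def read_form(state: State) -> dict:
    return {"form": {"customer": "Acme Co"}}       # placeholder: parse the web form

def web_search(state: State) -> dict:
    return {"search_results": "example results"}   # placeholder: call a search tool

def compliance_check(state: State) -> dict:
    return {"compliant": True}                      # placeholder: check a database/policy list

def fill_form(state: State) -> dict:
    return {}                                       # placeholder: paste results into the target form

def reject(state: State) -> dict:
    return {}                                       # failure branch: flag for human review

def route(state: State) -> str:
    return "fill_form" if state["compliant"] else "reject"

builder = StateGraph(State)
for name, fn in [("read_form", read_form), ("web_search", web_search),
                 ("compliance_check", compliance_check),
                 ("fill_form", fill_form), ("reject", reject)]:
    builder.add_node(name, fn)
builder.add_edge(START, "read_form")
builder.add_edge("read_form", "web_search")
builder.add_edge("web_search", "compliance_check")
builder.add_conditional_edges("compliance_check", route,
                              {"fill_form": "fill_form", "reject": "reject"})
builder.add_edge("fill_form", END)
builder.add_edge("reject", END)
graph = builder.compile()
# graph.invoke({"form": {}, "search_results": "", "compliant": False})
```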

Let's talk about some of those skills. So you've been doing DeepLearning.AI, and I think a lot of the courses are in pursuit of helping people kind of like build agents. And so what are some of the skills that you think agent builders all across the spectrum should kind of like master and get started with?

Boy, it's a good question. I wish I knew a good answer to that. I've been thinking a lot about this actually recently. I think a lot of the challenge is, if you have a business process workflow, you often have people in compliance, legal, HR, whatever, doing these steps. How do you put in place the plumbing, either through a LangGraph-type integration, or we'll see if MCP helps with some of that too, to ingest the data?

And then how do you prompt or process and do the multiple steps in order to build this end-to-end system? And then one thing I see a lot is putting in place the right evals framework to not only understand the performance of the overall system, but to trace the individual steps.

You can home in on what's the one step that is broken, what's the one prompt that's broken, to work on. I find that a lot of teams probably wait longer than they should just using human evals, where every time you change something, you then sit there and look at a bunch of outputs yourself, right?

I see most teams are probably slower to put in place evals, systematic evals, than is ideal. But I find that having the right instincts for what to do next in a project is still really difficult, right? Teams that are still learning these skills will often, you know, go down blind alleys, right?

Where you spend like a few months trying to improve one component, the more experienced team will say, "You know what? I don't think this can ever be made to work." So just don't, just find a different way around this problem. I wish I knew, I wish I knew more efficient ways to get this kind of almost tactile knowledge.

Often you're there, you know, looking at the output, looking at the trace, looking at the LangSmith output, and you just have to make a decision, right, in minutes or hours, on what to do next, and that's still very difficult. And is this kind of tactile knowledge mostly around LLMs and their limitations, or more around just the product framing of things and that skill of taking a job and breaking it down?

That's something that we're still getting accustomed to. I think it's all of the above, actually. So I feel like over the last couple of years, AI tool companies have created an amazing set of AI tools. And this includes tools like, you know, LangGraph, but also ideas like how do you think about RAG?

How do you think about building chatbots? Many, many different ways of approaching memory. I don't know what else. How do you build evals? How do you build guardrails? But I feel like there's this, you know, wide, sprawling array of really exciting tools. One picture I often have in my head is, if all you have are, you know, purple Lego bricks, right?

You can't build that much interesting stuff. And I think of these tools as being akin to Lego bricks, right? And the more tools you have, it's as if you don't just have purple Lego bricks, but a red one and a black one and a yellow one and a green one.

And as you get more differently colored and shaped Lego bricks, you can very quickly assemble them into really cool things. And so I think of a lot of these tools, like the ones we've been rattling off, as different types of Lego bricks. And when you're trying to build something, you know, sometimes you need that, right, squiggly, weird-shaped Lego brick.

And some people know it and can plug it in and just get the job done. But if you've never built evals of a certain type, then, you know, then you could actually end up spending, whatever, three extra months doing something that someone else that's done that before could say, oh, well, we should just build evals this way.

Use an LLM as a judge and just go through that process to get it done much faster. So one of the unfortunate things about AI is it's not just one tool. And when I'm coding, I just use a whole bunch of different stuff, right? And I'm not a master of enough stuff myself, but I've learned enough tools to assemble them quickly.

So, yeah, and I think having that practice with different tools also helps with much faster decision-making. And one other thing is, it also changes. So, for example, because LLMs have been getting longer and longer context windows, a lot of the best practices for RAG from, you know, a year and a half ago or whatever, are much less relevant today, right?

And I remember, Harrison was really early to a lot of these things. So I played with the early LangChain RAG frameworks, recursive summarization and all that. As LLM context windows got longer, now we just dump a lot more stuff into the LLM's context. It's not that RAG has gone away, but the hyperparameter tuning has gotten way easier.

There's a huge range of hyperparameters that work, you know, just fine. So as LLMs keep progressing, the instincts we formed, you know, two years ago may or may not be relevant anymore today. You mentioned a lot of things that I want to talk about. So, okay, what are some of the Lego bricks that are maybe underrated right now that you would recommend, that people aren't talking about?

Like evals, you know, we had three people talk about evals, and I think that's top of people's mind. But what are some things that most people maybe haven't thought of or haven't heard of yet that you would recommend them looking into? Good question. I don't know. Yeah. So even though people talk about evals, for some reason people don't do it.

Why don't you think they do it? I think it's because people often have... I saw a post on this, on eval writer's block. People think of writing evals as this huge thing that you have to do right. I think of evals as something I'm going to throw together really quickly, you know, in 20 minutes, and it's not that good.

But it starts to complement my human eyeball evals. And so what often happens is I'll build a system and there's one problem where I keep on getting regression. I thought I made it work, then it breaks. I thought I made it work, then it breaks. Well, darn it. This is getting annoying.

Then I code up a very simple eval, maybe with, you know, five input examples and a very simple LLM judge, to just check for this one regression, right? Did this one thing break? And then I'm not swapping out human evals for automated evals. I'm still looking at the output myself.

But when I change something, I'll run this eval to just, you know, take that one burden off so I don't have to think about it. And then what happens is, just like the way we write English, maybe, once you have some slightly helpful but clearly very broken, imperfect eval, then you start to go, you know what?

I can improve my eval to make it better, and I can improve it to make it better. So just as when we build a lot of applications, we build some, you know, very quick and dirty thing that doesn't work and then incrementally make it better, a lot of the time when I build evals, I build really awful evals that barely help.
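(Editor's note: a minimal sketch of the quick-and-dirty regression eval Andrew describes: a handful of input examples and a very simple LLM-as-judge. The model name, prompts, and the placeholder agent are assumptions for illustration, using the OpenAI Python client.)

```python
# Editor's sketch (not from the talk): a tiny regression eval with a few
# examples and a simple LLM-as-judge. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXAMPLES = [
    # (input, what the output must get right for this one regression)
    ("Cancel my order #123", "recognize this as a cancellation request"),
    ("Where is my package?", "recognize this as a shipping-status question"),
    # ...a few more examples...
]

def my_agent(user_input: str) -> str:
    # Placeholder for the system under test.
    return "placeholder agent response to: " + user_input

def judge(user_input: str, output: str, expectation: str) -> bool:
    # Deliberately simple LLM-as-judge: reply PASS or FAIL.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable judge model works here
        messages=[{
            "role": "user",
            "content": (f"Input: {user_input}\nOutput: {output}\n"
                        f"Expectation: the output should {expectation}.\n"
                        "Does the output meet the expectation? Reply PASS or FAIL."),
        }],
    )
    return "PASS" in resp.choices[0].message.content.upper()

for user_input, expectation in EXAMPLES:
    ok = judge(user_input, my_agent(user_input), expectation)
    print("PASS" if ok else "FAIL", "-", user_input)
```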

And then when you look at what it does, you go, you know what? This eval's broken. I can fix it. And you incrementally make it better. So that's one thing. I'll also mention one thing that people have talked a lot about but I think is still underrated, which is the voice stack.

It's one of the things I'm actually very excited about: voice applications. A lot of my friends are very excited about voice applications. I see a bunch of large enterprises really excited about voice applications, very large enterprises, very large use cases. But for some reason, while there are some developers in this community doing voice, the amount of developer attention on voice stack applications is relatively small, right?

It's not that people have ignored it, but it feels much smaller than the large-enterprise importance I see, as well as the applications coming down the pipe. And not all of this is the real-time voice API. It's not all speech-to-speech, native audio-in, audio-out models. I find those models very hard to control, whereas when we use more of an agentic voice stack workflow, we find it much more controllable.

I've been working with a ton of teams on voice stack stuff, some of which hopefully will be announced in the near future, and I've seen a lot of very exciting things. And then there are other things I think are underrated. One other one maybe is not underrated, but more businesses should do it.

I think many of you have seen that developers that use AI assistance in their coding are so much faster than developers that don't. It's been interesting to see how many companies, CIOs and CTOs, still have policies that don't let engineers use AI-assisted coding. Maybe sometimes for good reasons, but I think we have to get past that.

Because frankly, I don't know, my teams and I just hate to ever have to code again without AI assistance. But I think some businesses still need to get through that. Also underrated is the idea that everyone should learn to code. One fun fact about AI Fund: everyone at AI Fund, including the person that runs our front desk, our receptionist, and my CFO, and the general counsel, everyone at AI Fund actually knows how to code.

And it's not that I want them to be software engineers, they're not. But in their respective job functions, many of them, by learning a little bit about how to code, are better able to tell a computer what they want it to do. And so it's actually driving meaningful productivity improvements across all of these job functions that are not software engineering.

So that's been exciting as well. Talking about AI coding, what tools are you using for that personally? So we're working on some things that we've not yet announced. Oh, exciting. Yeah. But I do use Cursor, Windsurf, and some other things. All right, we'll come back to that later.

Talking about voice. If people here want to get into voice and they're familiar with building kind of like agents with LLMs, how similar is it? Are there a lot of ideas that are transferable? Or what's new? What will they have to learn? Yeah. So it turns out there are a lot of applications where I think voice is important.

It creates certain interactions that are much more natural. It turns out, from an application perspective, a text prompt input is kind of intimidating, right? For a lot of applications, we can go to users and say, "Tell me what you think. Here's a big text prompt.

Write a bunch of text for me." That's actually very intimidating for users. And one of the problems with that is people can use backspace. And so, you know, people are just slower to respond via text. Whereas for voice, you know, time rolls forward. You just have to keep talking.

You could change your mind. You could actually say, "Oh, I changed my mind. Forget that earlier thing." And the models are actually pretty good at dealing with that. So I find that for a lot of applications, voice lowers the friction of just getting users to use it. We just say, you know, "Tell me what you think." And then they respond in voice.

So in terms of voice, the one biggest difference in terms of engineering requirements is latency. Because if someone says something, you really want to respond in, you know, I don't know, sub one second, right? Less than 500 milliseconds is great, but really, ideally, sub one second.

And we have a lot of agentic workflows that will run for many seconds. So when DeepLearning.AI worked with RealAvatar to build an avatar of me... This is on the web page; you can talk to an avatar of me if you want. Our initial version had something like five to nine seconds of latency.

And it's just a bad user experience. You say something, you know, nine seconds of silence, then my avatar responds. So we wound up building things like what we call a pre-response. Just as, you know, if you ask me a question, I might go, "Huh, that's interesting," or, "Let me think about that," we prompted an LLM to basically do that, to hide the latency.

And it actually seems to work great. And there are all these other little tricks as well. It turns out that if you're building a voice customer service chatbot, if you play the background noise of a customer contact center instead of dead silence, people are much more accepting of that, you know, latency.
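(Editor's note: a minimal sketch of the "pre-response" trick described above: say a short acknowledgment immediately while the slower agentic pipeline produces the real answer. In the talk the pre-response itself is generated by an LLM; here it is drawn from canned phrases, and speak() and full_agentic_answer() are placeholders for the TTS call and the multi-second workflow.)

```python
# Editor's sketch (not from the talk): hide multi-second agent latency with a
# quick "pre-response" spoken while the real answer is computed.
import asyncio
import random

PRE_RESPONSES = ["Hmm, that's interesting.", "Let me think about that.", "Good question."]

async def speak(text: str) -> None:
    print(f"[tts] {text}")              # placeholder for a text-to-speech call

async def full_agentic_answer(question: str) -> str:
    await asyncio.sleep(5)              # stand-in for a 5-9 second agentic workflow
    return f"Here's my considered answer to: {question}"

async def handle_turn(question: str) -> None:
    answer_task = asyncio.create_task(full_agentic_answer(question))  # start the slow path
    await speak(random.choice(PRE_RESPONSES))   # pre-response covers the latency
    await speak(await answer_task)              # speak the real answer once ready

asyncio.run(handle_turn("What do you think of agentic workflows?"))
```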

So I find that there are a lot of these things that are different than a pure text-based LLM application. But in applications where a voice-based modality lets the user be comfortable and just start talking, I think it sometimes really reduces the user friction to, you know, getting some information out of them.

I think when we talk, we don't feel like we need to deliver perfection as much as when we write. So it's somehow easier for people to just start blurting out their ideas and change their mind and go back and forth. And that lets us get the information from them that we need to help the user to move forward.

That's interesting. One of the new things that's out there, and you mentioned it briefly, is MCP. How are you seeing that transform how people are building apps, what types of apps they're building, or what's generally happening in the ecosystem? Yes, I think it's really exciting. Just this morning, we released a short course on MCP with Anthropic.

I actually saw a lot of stuff, you know, on the interweb on MCP that I thought was quite confusing. So when we got together with Anthropic, we said, you know, let's create a really good short course on MCP that explains it clearly. I think MCP is fantastic. I think it filled a very clear market gap, and, you know, the fact that OpenAI adopted it

also, I think, speaks to the importance of this. I think the MCP standard will continue to evolve, right? So, for example, I think many of you know what MCP is, right? It makes it much easier for agents primarily, but frankly, I think, other types of software too, to plug into different types of data.

When I'm using LLMs myself, or when I'm building applications, frankly, a lot of us spend so much time on the plumbing, right? So I think for those of you from large enterprises as well, the AI models, especially, you know, reasoning models, are pretty darn intelligent. They can do a lot of stuff when given the right context.

So I find that I and my teams spend a lot of time working on the plumbing, on the data integrations, to get the context to the LLM so that it will, you know, do something that often is pretty sensible when it has the right input context. So MCP, I think, is a fantastic way to try to standardize the interface to a lot of tools or API calls as well as data sources.

It feels a little bit like the Wild West. You know, a lot of MCP servers you find on the internet do not work, right? And then the authentication systems, even for the very large companies' MCP servers, are kind of clunky, you know; it's not clear if the authentication token totally works.

Or when it expires. There's a lot of that going on. I think the MCP protocol itself is also early. Right now, MCP gives a long list of the resources available. You know, eventually, I think we need some more hierarchical discovery. Imagine you want to build something, I don't know, I don't even know if there would be an MCP interface to LangGraph.

But LangGraph has so many API calls, you just can't have a long list of everything under the sun for agents to sort through. So I think we need some sort of hierarchical discovery mechanism. So I think MCP is a really fantastic first step. I definitely encourage you to learn about it.

It will make your life easier, probably, if you find a good MCP server implementation to help with some of the data integrations. And I think it will be important. It's this idea of when you have, you know, N models or N agents and M data sources, it should not be an N times M effort to do all the integrations, it should be N plus M.
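(Editor's note: a minimal sketch of what that standardized interface looks like from the client side: connect to one MCP server and list the tools it exposes, so any number of agents can reuse the same integration. The server command and package are examples, and the MCP Python SDK usage here is an assumption that may differ by SDK version.)

```python
# Editor's sketch (not from the talk): connect to a single MCP server over stdio
# and print the "long list" of tools it advertises. SDK details are assumptions.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Example: the reference filesystem MCP server, launched via npx.
    server = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```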

And I think MCP is a fantastic first step. It will need to evolve, but it's a fantastic first step toward that type of data integration. Another type of protocol that's seen less buzz than MCP is some of the agent-to-agent stuff. And I remember when we were at a conference a year or so ago, I think you were talking about multi-agent systems, which this would kind of enable.

So how do you see some of the multi-agent or agent-to-agent stuff evolving? Yeah, so I think, you know, agent-to-agent AI is still so early. Most of us, right, including me, we struggle to even make our code work. And so making my code, my agent work with someone else's agent, it feels like a two-miracle, you know, requirement.

So I see that when one team is building a multi-agent system, that often works, because we build a bunch of agents, they communicate with each other, we understand the protocols, and that works. But right now, at least at this moment in time, and maybe I'm off, the number of examples I'm seeing of when, you know, one team's agent or collection of agents successfully engages with

a totally different team's agent or collection of agents, I think we're a little bit early to that. I'm sure we'll get there. But I'm not personally seeing, you know, real success, huge success stories of that yet. I'm not sure if you are. No, I agree. I think it's super early.

I think if MCP is early, I think agent-to-agent stuff is even earlier. Another thing that's kind of like top of people's mind right now is kind of vibe coding and all of that. And you touched on it a little bit earlier with how people are using these AI coding assistants.

But how do you think about vibe coding? Is that a different skill than before? What kind of purpose does that serve in the world? You know, so I think, you know, many of us now code while barely looking at the code, right? I think it's a fantastic thing to be doing.

I think it's unfortunate that it's called vibe coding, because it misleads a lot of people into thinking you just go with the vibes, you know, accept this, reject that. And frankly, when I'm coding for a day, you know, with vibe coding or whatever, with AI coding assistance, I'm frankly exhausted by the end of the day.

This is a deeply intellectual exercise. And so I think the name is unfortunate, but the phenomenon is real and it's been taking off and it's great. So over the last year, a few people have been advising others to not learn to code on the basis that AI will automate coding.

I think we'll look back at that as some of the worst career advice ever given. Because over the last many decades, as coding became easier, more people started to code. It turns out, you know, when we went from punch cards to keyboards and terminals, right? I actually found some very old articles.

When programming went from assembly language to, you know, literally COBOL, there were people arguing back then, you know, we have COBOL, it's so easy, we don't need programmers anymore. And obviously, when it became easier, more people learned to code. And so with AI coding assistance, a lot more people should code.

And it turns out one of the most important skills of the future, for developers and non-developers, is the ability to tell a computer exactly what you want so it will do it for you. And I think understanding at some level, which all of you do, I know, but understanding at some level how a computer works lets you prompt or instruct the computer much more precisely, which is why I still advise everyone to, you know, learn one programming language, learn Python or something.

And then, maybe some of you know this, I'm a much stronger Python developer than, say, a JavaScript developer, right? But with AI-assisted coding, I now write a lot more JavaScript and TypeScript code than I ever used to. But even when debugging, you know, JavaScript code that something else wrote for me, that I didn't write with my own fingers, really understanding, you know, what are the error cases, what does this mean?

That's been really important for me to, right, debug my JavaScript code. If you don't like the name Vibe Coding, do you have a better name in mind? Oh, that's a good question. I should think about that. We'll get back to you on that. That's a good question. One of the things that you announced recently is a new fund for AI Fund, so congrats on that.

Oh, thank you. For people in the audience who are maybe thinking of starting a startup or looking into that, what advice would you have for them? So, AI Fund is a venture studio. We build companies, and we exclusively invest in companies that we co-founded. So, looking back on AI Fund's, you know, lessons learned, I would say the number one predictor of a startup's success is speed.

I know we're in Silicon Valley, but I see a lot of people that have never yet seen the speed with which a skilled team can execute. And if you've never seen it before, and I know many of you have seen it, it's just so much faster than, you know, anything that slower businesses know how to do.

And I think the number two predictor, also very important, is technical knowledge. It turns out, if we look at the skills needed to build a startup, there's some things like, how do you market? How do you sell? How do you price? You know, all that is important, but that knowledge has been around.

So, it's a little bit more widespread. But the knowledge that's really rare is, how does technology actually work? Because technology has been evolving so quickly. So, I have deep respect for the go-to-market people. Pricing is hard. You know, marketing is hard. Positioning is hard. But that knowledge is more diffuse.

And the rarest resource is someone that really understands how the technology works. So at AI Fund, we really like working with deeply technical people that have good instincts and understand: do this, don't do that. That lets you go twice as fast. And then, I think, a lot of the business stuff, you know, that knowledge is very important, but it's usually easier to figure out.

All right. That's great advice for starting something. We are going to wrap this up. We're going to go to a break now, but before we do, please join me in giving Andrew a big hand, and thank you. Thank you. Thank you. Thank you.