
Windsurf everywhere, doing everything, all at once - Kevin Hou, Windsurf


Chapters

0:00 Introduction to Windsurf: Discover the rapid growth and key features of Windsurf, including web search, MCP support, auto-generated memories, and parallel agents.
2:18 The Core Philosophy: Learn about the "secret sauce" behind Windsurf's intuitive, mind-reading AI, which creates a shared timeline between humans and AI.
3:46 Windsurf Everywhere: See the vision for Windsurf to ingest context from all developer tools, including Google Docs, Figma, GitHub, Notion, and Linear.
6:21 Windsurf Doing Everything: Explore how the AI will expand beyond coding to interact with third-party services, write design documents, and more.
8:40 Windsurf On All the Time: Understand the goal of creating a nearly autonomous AI that works in the background to assist developers.
11:17 Introducing SWE-1: Get a first look at the new software engineering model trained for entire workflows.
11:36 Benchmarking Success: Learn about the End-to-End Task Benchmark and Conversational SWE Task Benchmark, showcasing Windsurf's impressive results.
13:32 The Data Flywheel: Understand the feedback loop that drives Windsurf's continuous improvement.
14:49 The Future of AI Products: Hear Kevin's thoughts on the harmony of model, data, and application needed to build successful AI products in 2025.

Transcript

Hello. How we doing? All right, how's the energy level? Good, good, yes, let's go, let's go. Two more, two more. My name is Kevin, I lead product at Windsurf, and I'm super excited to be back here. Thank you so much, Swix and Ben, it's always a pleasure to come back to AI Engineer World's Fair.

The velocity of our industry right now is incredible. It's like being on a kite on the ocean, and we're really excited to see where the winds are taking us. A year ago, we didn't have Windsurf. People were coding with autocomplete, no one had heard of an agent, and now the Windsurf editor is being used by millions and millions of people all around the world.

And hopefully this is a larger number than last time. How many people have heard of Windsurf? And how many people have used Windsurf? Good numbers, good numbers. We've got to improve that. And Windsurf itself has changed immensely in the last six months since its launch in November. We retired the name Codeium because we decided to catch this new wave, which is, by the way, what we call our next-generation innovations in the product.

We call them waves. And in case you missed it, we are now 10 waves in. And some of the key waves we've been really excited about: web search, MCP support, auto-generated memories, deploys, and parallel agents, to name just a few.

And as the waves keep growing, so does the number of people who have discovered and integrated Windsurf into their daily workflows. Today we are generating about 90 million lines of code every single day. And that equates to over a thousand messages sent every single minute.

But today is not about growth. I'm not going to sit here and tell you about the numbers. I'm going to tell you about the why. Why do people feel connected to the Windsurf editor? And I know no AI company really wants to disclose its secrets, but I had to come up with some content.

So today I'm going to let you in on one of ours. Our secret sauce is a shared timeline between the human and the AI. And this is what makes people feel like we're reading their minds. And now everything you do as a software engineer can be thought of on this shared timeline.

So if we rewind way back to the dark days - this is pre-autocomplete, when everyone knew how to write a for loop - you had to do everything. You had to edit files. You had to type every single character. Imagine that. But then once services like Copilot and Codeium launched, devs got really excited.

They started seeing a small percentage of their code being written by AI, and we started to abstract away and accelerate the small edits, the small actions, that we would do for a user. And in late 2024, with the advent of Windsurf's agent and the launch of the Windsurf editor, we saw that we could do more and more for the user.

We started being able to edit multiple files at once, perform background research across thousands and thousands of files, and execute terminal commands directly inside the editor. But at Windsurf, we're in the business of trying to change how software gets created. And this means that the timeline is actually a little bit more complicated.

It needs to handle actions taken outside of just the IDE. And so given how much of a developer's workflow happens outside of the editor, what does this mean for Windsurf? First, Windsurf is going to be everywhere. Specifically, Windsurf will need to be able to read and ingest context from every single source that a developer uses.

And if we zoom out and think about what makes you all, software engineers, successful, there are a couple of different categories. The first is coding-related: file reads, running terminal commands, seeing your history, even which tabs you have open inside of your editor. This all informs how to generate the correct code.

But it goes beyond that. There's external sources. Things like going onto GitHub and viewing a past history of commits. Maybe looking at a PR that is doing something similar to the feature you're about to implement. Doing online searches, web searches, looking at documentation. And then there's the third category, and this is where it gets a little bit interesting.

It's called meta-learning. It's the idea of what separates a junior engineer from a senior engineer from a staff engineer. These are the organizational best practices, the engineering preferences, that all get encoded into what makes good code. And so if we think about what this means in practice, let's say that we are going to build a new page on a data viz dashboard.

Let's walk through step by step. So first, you would probably start in Slack, as all good things start from Slack. You'll build context, looking at a bunch of maybe customer requests. Maybe you'll have some internal messages. You'll collect that context, and you'll start planning. And this means you're going to be in Google Docs.

You're going to be writing design docs, probably working on some infrastructure designs. You're going to be tracking tickets inside of Jira. And then you might have a designer who's actually working in Figma in parallel, putting together all this material. And then finally the fun part, or at least this is my favorite part, which is the actual writing of the code, and hopefully use something like Windsurf to do so.

But you're not done from there. Once your code is complete, you still have to open the PR. You've got to get reviews, you've got to merge into main, you've got to deploy, handle SEO and analytics - the list goes on and on and on. And this is really why we've built what we've built.

Because we know that for you, it's extremely important that we can fetch context from your Google Docs, that we can read your Figma files, and that we can one-click connect to any MCP service so that you can access your information in things like Notion, Linear, Stripe, and countless others.
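
To make that one-click connection concrete, here is a minimal sketch of what an MCP server entry could look like, expressed as a TypeScript object for illustration. The exact file location and schema Windsurf uses may differ, and the server names and environment variables below are assumptions, not a documented configuration.

```typescript
// Hypothetical sketch of MCP server configuration, for illustration only.
interface McpServerConfig {
  command: string;               // executable that launches the MCP server
  args?: string[];               // arguments passed to that executable
  env?: Record<string, string>;  // secrets the server needs, e.g. API keys
}

const mcpServers: Record<string, McpServerConfig> = {
  github: {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-github"],
    env: { GITHUB_PERSONAL_ACCESS_TOKEN: "<your token>" },
  },
  // Notion, Linear, Stripe, etc. would follow the same pattern:
  // each entry tells the editor how to launch one MCP server process.
};
```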

And we've spent the last 10 waves making sure that Windsurf can be ubiquitous. But we know that's also not enough. We know it's not enough just to read. We need to be able to do and write everything. We need to be able to do it all for you. And so the AI has to take action on a wide variety of surfaces beyond just the coding surface in order to accomplish what a human software engineer would do.

And so this doesn't mean just write code. This means interacting with third-party services, provisioning API keys, writing design docs, PRDs, wireframing, testing, and the list could go on and on and on. And so for the last six months we've oriented ourselves around how do we do everything. And if we go back to this concrete example of building a new web app, where do we start?

We start by running codebase-relevant terminal commands. This is something that we launched right at the advent of Windsurf. And what's really cool about what we can do here is that we can intelligently decide which commands we want to run automatically and which ones we want to pause and ask for explicit user approval on.
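
As a rough illustration of the idea (not Windsurf's actual logic), an agent might classify commands against allow and deny patterns and default to asking the user. The specific patterns below are assumptions chosen for the example.

```typescript
// Minimal sketch: decide whether a terminal command auto-runs or needs approval.
const AUTO_RUN = [/^git (status|diff|log)\b/, /^ls\b/, /^cat\b/, /^npm (test|run lint)\b/];
const ALWAYS_ASK = [/^rm\b/, /^git push\b/, /^npm publish\b/, /\bsudo\b/, /^curl\b.*\|\s*sh/];

type Decision = "run" | "ask";

function classifyCommand(cmd: string): Decision {
  if (ALWAYS_ASK.some((re) => re.test(cmd))) return "ask"; // destructive or irreversible
  if (AUTO_RUN.some((re) => re.test(cmd))) return "run";   // read-only / low-risk
  return "ask";                                            // default to asking the user
}

console.log(classifyCommand("git status"));   // "run"
console.log(classifyCommand("rm -rf build")); // "ask"
```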

Next, you'll open up Windsurf browser previews, which let you visually iterate with the agent: Windsurf can take control of Chrome just like you would, inspecting DOM elements, looking at your JS console, doing what a web developer would do.
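
For a sense of the kind of signals an agent can pull from a live preview, here is a sketch using Puppeteer. This is not Windsurf's actual browser integration; the localhost URL and the selector are placeholders.

```typescript
// Illustrative only: inspect a running preview's console output and DOM.
import puppeteer from "puppeteer";

async function inspectPreview() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Surface console output and page errors so the agent can react to them.
  page.on("console", (msg) => console.log(`[console.${msg.type()}] ${msg.text()}`));
  page.on("pageerror", (err) => console.log(`[pageerror] ${err.message}`));

  await page.goto("http://localhost:3000");

  // Read a piece of the DOM, e.g. the heading the agent just generated.
  const heading = await page.$eval("h1", (el) => el.textContent);
  console.log("Rendered heading:", heading);

  await browser.close();
}

inspectPreview().catch(console.error);
```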

And so now you could say our app is code complete. We'll use the GitHub MCP to open up a pull request, and we can use context from your other PRs to inform the description and the test plan. And code review is still a necessary part of any software company.

And so we launched Windsurf reviews, which can automatically leave comments and suggest changes asynchronously so that you can be confident that the code that hits main is production-ready. So now that your code is merged, you'll want to be able to deploy. And so we also released one-click deploys to Netlify, so that with Windsurf's custom tool integrations the agent will, in one click, deploy what you have to the live web.
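
Under the hood, a "one-click deploy" tool could reduce to something as simple as shelling out to the Netlify CLI, as in the sketch below. This is an illustration, not Windsurf's actual integration; the build directory is an assumption and the CLI must already be installed and authenticated.

```typescript
// Minimal sketch of a deploy tool wrapping the Netlify CLI.
import { execSync } from "node:child_process";

function deployToNetlify(dir = "dist") {
  // Requires `npm install -g netlify-cli` and a linked, authenticated site.
  execSync(`netlify deploy --prod --dir=${dir}`, { stdio: "inherit" });
}

deployToNetlify();
```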

And so as you can see, we've really built the ability for Windsurf to read everything that you can and do everything or almost everything that a software engineer can. So then you might ask, what's next? It's only inevitable that Windsurf will be on all the time, working for you even when you don't know it.

We pioneered agentic, human-in-the-loop, synchronous workflows back when we released Windsurf in 2024. And today, timelines are 80 to 90% agent, 10 to 20% human. But we're trying to build towards a future that gets to 99% agent and 1% human. We only want to ask the user for final approval.

And as more and more of these timelines and workflows become AI-powered, it becomes possible to have Windsurf working for you at all times. Not only as you type and use autocomplete and tab, but also in the background, researching while you're working, fully in parallel, only asking you to approve.

And we want to build this future where you can code anytime. You can write software at any time. This includes your bed, this includes the toilet, when you're on the bus, voice-activated Alexa, the possibilities are endless. And so now that we've defined the problem, it's a little bit more structured.

You could say, all right, we'll throw GPT, we'll throw Gemini at this timeline problem. But then from there, where do we go? How do we improve? And specifically, how is Windsurf able to tackle this problem of the timeline? And if we take a step back, this really doesn't look like we're writing code anymore.

This looks significantly more complicated than your average competitive programming question. Windsurf wants to revolutionize the way that software gets built. It's not just how code gets written. We are solving a broader set of tasks than just code. And while the industry focuses heavily on things like SWE-bench, we know that the future is not going to be tokens in, tokens out.

Software engineering workflows are going to be much messier than this. It means that you have to be able to pick up tasks mid-workflow. You have to be able to deal with messy codebase states mid-commit. And you will have to work with tools that are outside of the editor. And so we have to be able to ingest and perform this broad set of actions on this timeline to keep our users in the flow.

We have to be able to open up PRs. We have to know when to access analytics. We need to know how to debug your CI/CD all by itself. And this problem starts to look really, really different from what people are evaluating on. And because we have our own representation of this timeline, we needed a different system to handle these types of actions than what the off-the-shelf frontier models could give us.

And so where are we going with this? The realization of this is our brand new software engineering model called SWE-1. We realized that we could actually dream bigger and build the best software engineering model that we could. SWE-1 is trained to handle software engineering workflows, not just pure code generation.

And we use two main offline eval benchmarks. The first one is the End-to-End Task Benchmark. This is basically tackling pull requests. This is saying: given an intent and the starting point of a codebase, how do we get to the end and pass all the unit tests? Familiar paradigm. The second one is where it gets a little bit more interesting.

This is what we call the Conversational SWE Task Benchmark. And this is how well the model can assist when it's dropped into an existing user conversation or a partially completed task. And so this actually lends itself very nicely to the Windsurf paradigm, right? Because we're not going cleanly from start to end.

We're assisting in helping you along the way, mid-timeline. And so it results in a blended score of helpfulness, efficiency, and correctness, and really tests the model's ability to seamlessly integrate into the Windsurf style of working. And this initial performance really gives us a lot of confidence in SWE-1's architecture.
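
To make the idea of a blended score concrete, here is a hypothetical sketch. The three metric names come from the talk, but the 0-1 scales and the equal weighting are assumptions for illustration, not the actual scoring of the Conversational SWE Task Benchmark.

```typescript
// Hypothetical blended score over three per-task metrics.
interface ConversationalResult {
  helpfulness: number; // did the model move the partially completed task forward?
  efficiency: number;  // how few steps/tokens did it take?
  correctness: number; // did the resulting code behave as intended?
}

function blendedScore(
  r: ConversationalResult,
  weights = { h: 1 / 3, e: 1 / 3, c: 1 / 3 },
): number {
  return weights.h * r.helpfulness + weights.e * r.efficiency + weights.c * r.correctness;
}

console.log(blendedScore({ helpfulness: 0.9, efficiency: 0.7, correctness: 0.8 })); // ≈ 0.8
```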

Specifically, in how we've been able to train for software engineering workflows. We've been able to achieve near-frontier model results at a fraction of the cost and with a significantly smaller team. And one of Windsurf's greatest strengths, of course, is the value of its community: real software engineers doing real work, giving real feedback.

And what we found is that SWE-1 - it's in the little drop-down for the models - is right up there with the rest of the frontier models. People are choosing SWE-1 because it recognizes how they do work, not just how to generate code. And it's actually contributing at an even higher frequency than models like 3.7 and 3.5.

Windsurf builds at the frontier so that our users can build more with the best technology. We learn from our failure modes so that we can iterate from there. And what does this start to look like? Dare I say it? A data flywheel? We ship the best product. Devs and non-devs use that product to level up as a skill multiplier or as a skill enabler.

Users then help us find the frontier. They use things like thumbs up, thumbs down, accept, reject, constantly informing us not of what the SWE-bench frontier is, but of what the software engineering frontier is. What tools are missing? Which workflows are repeated? Where does the product fall short? And we take those insights and we build at this frontier.
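
Here is a sketch of the kind of feedback signal such a flywheel can be built on, and one way to aggregate it. The field names and workflow-step labels are assumptions for illustration, not Windsurf's actual telemetry schema.

```typescript
// Hypothetical feedback events feeding a data flywheel.
type FeedbackSignal =
  | { kind: "thumbs"; value: "up" | "down"; messageId: string }
  | { kind: "edit"; value: "accept" | "reject"; diffId: string };

interface FlywheelEvent {
  signal: FeedbackSignal;
  model: string;        // which model produced the suggestion, e.g. "swe-1"
  workflowStep: string; // e.g. "terminal-command", "pr-description", "code-edit"
  timestamp: number;
}

// Acceptance rate per workflow step highlights where the product falls short,
// i.e. where the next tools and training data should come from.
function acceptanceRate(events: FlywheelEvent[], step: string): number {
  const edits = events.filter((e) => e.signal.kind === "edit" && e.workflowStep === step);
  const accepted = edits.filter((e) => e.signal.kind === "edit" && e.signal.value === "accept");
  return edits.length ? accepted.length / edits.length : 0;
}
```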

We train a better model. We build more tools. We improve our agentic harness. We improve our memories, our checkpointing, with the goal of being everywhere, doing everything. And we will repeat this cycle. We will be shipping, finding the frontier, building at the margin, and repeating. And what gets me really personally excited about this is that SWE-1 is an example of this in action.

We have a very small team, significantly fewer resources than the larger companies, and we were able to achieve near-frontier model-quality results with SWE-1. And even more so, this is really a demonstration of what it means to build AI products in 2025. It demands this harmony of model, data, and application, where the application is actually mimicking the user behavior that you want to replicate inside of your model.

And this is how Windsurf will be everywhere, doing everything all at once. Thank you so much for listening. I won't give you any promises, but someone made a profit. But in all seriousness, thank you so much for listening. I want to make sure that every engineer out there is using the best possible tools.

So please give Windsurf a try today, and we are also hiring across a number of different roles. We have a booth downstairs, so please come join us. Help make this future a reality. Thank you.