Windsurf everywhere, doing everything, all at once

00:00:00.000 | Hello. How we doing? All right, how's the energy level? Good, good, yes, let's go, let's go. Two more,

00:00:27.600 | two more. My name is Kevin, I lead product at Windsurf, and I'm super excited to be back here.

00:00:33.760 | Thank you so much, Swix and Ben, it's always a pleasure to come back to AI Engineer World's Fair.

00:00:38.400 | The velocity of our industry right now is incredible. It's like being on a kite on the ocean,

00:00:45.840 | and we're really excited to see where the winds are taking us. A year ago, we didn't have Windsurf.

00:00:53.280 | People were coding with autocomplete, no one had heard of an agent, and now the Windsurf editor is

00:00:59.200 | being used by millions and millions of people all around the world. And hopefully this is a larger

00:01:05.280 | number than last time. How many people have heard of Windsurf? And how many people have used Windsurf?

00:01:12.880 | Good numbers, good numbers. We've got to improve that. And Windsurf itself has changed immensely in

00:01:19.600 | the last six months since its launch in November. We retired the name Codium because we decided to catch

00:01:25.280 | this new wave, which is, by the way, what we call our next generation innovations in the product. We call

00:01:31.840 | them waves. And in case you missed it, we are now 10 waves in. And some of the key waves we've been really

00:01:37.840 | excited about: web search, MCP support, auto-generated memories - oh, I was supposed to do that - auto-generated

00:01:45.120 | memories, deploys, and parallel agents, to name just a few. And as the waves keep growing, as do the number of

00:01:54.560 | people that have discovered and integrated Windsurf into their daily workflows. To this day we are

00:02:00.480 | generating about 90 million lines of code every single day. And that equates to around a thousand,

00:02:06.400 | over a thousand messages sent every single minute. But today is not about growth. I'm not going to sit

00:02:12.320 | here and tell you about the numbers. I'm going to tell you about the why. Why do people feel connected to

00:02:17.120 | the Windsurf editor? And I know no AI company really wants to disclose its secrets, but I had to come up with

00:02:24.160 | some content. So today I'm going to let you in on one of ours. Our secret sauce is a shared timeline

00:02:32.960 | between the human and the AI. And this is what makes people feel like we're reading their minds.

00:02:38.000 | And now everything you do as a software engineer can be thought of on this shared timeline. So if we

00:02:45.280 | rewind way back to the dark days - this is pre-autocomplete when everyone knew how to write a for loop - AI had to do

00:02:51.040 | everything. You had to edit files. You had to type every single character. Imagine that.

00:02:55.280 | But then once services like Copilot, like Codium, they launched, then devs got really excited. They

00:03:02.800 | started seeing a small percentage of their code being written by AI, and we started to abstract and

00:03:07.280 | accelerate the number of small edits, small actions that we would do for a user. And in late 2024,

00:03:13.840 | with the advent of Windsurf's agent and the launch of the Windsurf editor, we saw that we could do

00:03:18.560 | more and more for the user. We started being able to edit multiple files at once, perform background

00:03:24.640 | research across thousands and thousands of files, and execute terminal commands directly inside the

00:03:30.320 | editor. But at Windsurf, we're in the business of trying to change how software gets created. And this

00:03:38.080 | means that the timeline is actually a little bit more complicated. It needs to handle actions taken outside

00:03:43.440 | of just the IDE. And so given how much of a developer's workflow happens outside of the editor,

00:03:51.440 | what does this mean for Windsurf? First, Windsurf is going to be everywhere.

00:03:57.440 | Specifically, Windsurf will need to be able to read and ingest context from every single source that a

00:04:06.240 | developer uses. And if we zoom out and think about what makes you all, software engineers, successful,

00:04:12.480 | there are a couple of different categories. The first of which, coding related. File reads, running

00:04:18.960 | terminal commands, seeing your history, even which tabs you have open inside of your editor. This all informs

00:04:24.320 | how to generate the correct code. But it goes beyond that. There's external sources. Things like going

00:04:30.640 | onto GitHub and viewing a past history of commits. Maybe looking at a PR that is doing something similar to

00:04:35.600 | the feature you're about to implement. Doing online searches, web searches, looking at documentation.

00:04:40.720 | And then there's the third category, and this is where it gets a little bit interesting.

00:04:44.400 | It's called meta-learning. It's the idea of what separates a junior engineer from a senior engineer from

00:04:51.280 | from a staff engineer. These are the organizational best practices, the engineering preferences that

00:04:56.880 | all get encoded into what makes good code. And so if we think about what this means in practice,

00:05:04.080 | let's say that we are going to build a new page on a data viz dashboard. Let's walk through step by

00:05:08.800 | step. So first, you would probably start in Slack, as all good things start from Slack. You'll build context,

00:05:14.160 | looking at a bunch of maybe customer requests. Maybe you'll have some internal messages. You'll collect that

00:05:18.560 | context, and you'll start planning. And this means you're going to be in Google Docs. You're going to

00:05:23.040 | be writing design docs, probably working on some infrastructure designs. You're going to be tracking

00:05:27.840 | tickets inside of Jira. And then you might have a designer who's actually working in Figma in parallel,

00:05:33.040 | putting together all this material. And then finally the fun part, or at least this is my favorite part,

00:05:37.600 | which is the actual writing of the code, and hopefully use something like Windsurf to do so.

00:05:41.040 | But you're not done from there. Once your code complete, you still have to open the PR. You've got to get reviews,

00:05:46.800 | you've got to merge into main, you've got to deploy SEO, analytics, the list goes on and on and on.

00:05:51.760 | And this is really why we've built what we've built. Because we know that for you,

00:05:57.680 | it's extremely important that we can fetch context from your Google Docs, that we can read your Figma

00:06:03.440 | files, and that we can one-click connect to any MCP service so that you can access your information in

00:06:09.680 | things like Notion, Linear, Stripe, and countless others. And we've spent the last 10 waves making sure

00:06:16.640 | that Windsurf can be ubiquitous. But we know that's also not enough. We know it's not enough just to read.

00:06:24.480 | We need to be able to do and write everything. We need to be able to do it all for you.

00:06:28.560 | And so the AI has to take action on a wide variety of surfaces beyond just the coding surface in order to

00:06:37.440 | accomplish what a human software engineer would do. And so this doesn't mean just write code. This means

00:06:42.160 | interacting with third-party services, provisioning API keys, writing design docs, PRDs, wireframing,

00:06:49.200 | testing, and the list could go on and on and on. And so for the last six months we've oriented ourselves

00:06:55.280 | around how do we do everything. And if we go back to this concrete example of building a new web app,

00:07:02.880 | where do we start? We start by running code-based relevant terminal commands. This is something

00:07:09.040 | that we launched right at the advent of Windsurf. And what's really cool about what we can do here is

00:07:13.760 | that we can intelligently decide which commands we want to run automatically and which ones we want to wait

00:07:19.040 | and ask for explicit user approval. Next, you'll open up Windsurf browser previews, which allows you

00:07:26.480 | to iterate from there. It allows you to visually iterate with the agent so that Windsurf can take

00:07:30.560 | control of Chrome just like you would, inspecting DOM elements, looking at your JS console, being able to

00:07:36.000 | do what a web developer would do. And so now you could say our app is code complete. We'll use the GitHub MCP

00:07:43.760 | to open up a pull request. And we can use context from your other PRs to be able to inform the description.

00:07:48.880 | And inform the test plan. And code review is still a necessary part of any software company.

00:07:56.560 | And so we launched Windsurf reviews, which can automatically leave comments and suggest changes

00:08:02.000 | asynchronously so that you can be confident that the code that hits main is production ready.

00:08:08.320 | So now that your code is merged, you'll want to be able to deploy. And so we also released a one-click service

00:08:15.760 | to Netlify so that you can use Windsurf's custom tool integrations to actually just in one click the agent

00:08:21.600 | will deploy what you have to the live web. And so as you can see, we've really built the ability for

00:08:29.200 | Windsurf to read everything that you can and do everything or almost everything that a software engineer can.

00:08:36.160 | So then you might ask, what's next?

00:08:38.880 | It's only inevitable that Windsurf will be on all the time, working for you even when you don't know it.

00:08:46.880 | We pioneered the agentic human-in-the-loop synchronous workflows back when we released Windsurf in 2024.

00:08:54.560 | And today, timelines are 80 to 90% agent, 10 to 20% human. But we're trying to build towards a future that gets

00:09:04.240 | the 99% agent and 1% human. We only want to ask the user for final approval. And as more and more of

00:09:12.800 | these timelines and workflows become AI-powered, it becomes possible to have Windsurf working for you at

00:09:17.920 | all times. Not only as you type and use autocomplete and tab, but also in the background, researching when

00:09:24.960 | you're working, fully in parallel, only asking you to approve. And we want to build this future where you

00:09:32.160 | can code anytime. You can write software at any time. This includes your bed, this includes the toilet, when

00:09:39.360 | you're on the bus, voice-activated Alexa, the possibilities are endless. And so now that we've defined the

00:09:47.120 | problem, it's a little bit more structured. You could say, all right, we'll throw GPT, we'll throw

00:09:51.360 | Gemini at this timeline problem. But then from there, where do we go? How do we improve? And specifically,

00:09:58.080 | how is Windsurf able to tackle this problem of the timeline? And if we take a step back, this really

00:10:05.840 | doesn't look like we're writing code anymore. This looks significantly more complicated than your average

00:10:10.800 | competitive program in question. Windsurf wants to revolutionize the way that software gets

00:10:17.040 | built. It's not just how code gets written. We are solving a broader set of tasks than just code. And while the

00:10:24.480 | industry focuses heavily on things like Sweebench, we know that the future is not going to be tokens

00:10:30.080 | in, tokens out. Software engineering workflows are going to be much messier than this. It means that you have to be able to

00:10:35.280 | pick up tasks mid-workflow. You have to be able to deal with messy code-based states mid-commit. And you will have to work with tools that are outside of the editor.

00:10:44.720 | And so we have to be able to ingest and perform over this broad set of actions on this timeline to keep

00:10:52.160 | our users in the flow. We have to be able to open up PRs. We have to know when to access analytics. We need to know

00:10:58.160 | how to debug your CI/CD all by itself. And this problem starts to look really, really different from what

00:11:04.160 | people are evaling on. And because we have our own representation of this timeline we needed a different

00:11:11.600 | system to be able to handle these types of actions than what the off-the-shelf frontier models could give us.

00:11:15.600 | And so where are we going with this? The realization of this is our brand new software engineering model

00:11:24.000 | called SweeOne. We realized ourselves that we could actually dream bigger and build the best software

00:11:30.000 | engineering model that we could. SweeOne is trained to handle software engineering workflows, not just

00:11:37.360 | purely code generation. And we use two main offline eval benchmarks. The first one, end-to-end task

00:11:45.040 | benchmark. This is basically tackling pull requests. This is saying, given an intent, given the starting point of a

00:11:50.320 | code base, how do we get to the end and pass all the unit tests? Familiar paradigm. The second one is where it gets

00:11:56.960 | a little bit more interesting. This is what we call a conversational SweeTask benchmark. And this is how well the

00:12:03.120 | model can assist when it's being dropped into an existing user conversation or a partially completed task. And so this

00:12:10.960 | actually lends itself very nicely to the windsurf paradigm, right? Because we're not going

00:12:14.480 | cleanly from start to end. We're assisting in helping you along the way, mid-timeline. And so it results in this

00:12:20.880 | blended score of helpfulness, efficiency, and correctness, and really tests the model's ability to

00:12:26.720 | seamlessly integrate into the windsurf style of working. And this initial performance really gives us a lot of

00:12:32.880 | confidence in SweeOne's architecture. Specifically, how we've been able to train for software engineering workflows.

00:12:40.080 | And we've been able to achieve near-frontier model results at the fraction of the cost

00:12:46.480 | and with a significantly smaller team.

00:12:49.040 | And one of windsurf's greatest strengths, of course, is in the value of community. Real software engineers

00:12:56.800 | doing real work, giving real feedback. And what we found is that SweeOne, it's in the little drop-down for the models,

00:13:03.680 | it's right up there with the rest of the frontier models. People are choosing SweeOne because it

00:13:08.400 | recognizes how they do work, not necessarily how to generate code. And it's contributing, actually,

00:13:14.640 | an even higher frequency than models like 3.7 and 3.5.

00:13:19.280 | Windsurf builds at the frontier so that our users can build more with the best technology.

00:13:27.200 | We learn from our failure modes so that we can iterate from there. And what does this start to look

00:13:33.680 | like? Dare I say it? A data flywheel? We ship the best product. Devs and non-devs use that product to

00:13:41.920 | level up as a skill multiplier or as a skill enabler. Users then help us find the frontier. They use things

00:13:50.240 | like thumbs up, thumbs down, accept, reject, constantly informing us not of what the SweeBench frontier is,

00:13:56.800 | but what is the software engineering frontier. What tools are missing? Which workflows are repeated? Where does the product fall short?

00:14:06.000 | And we take those insights and we build at this frontier. We train a better model. We build more tools.

00:14:15.600 | We improve our agentic harness. We improve our memories, our checkpointing, with the goal of being everywhere,

00:14:22.240 | doing anything. And we will repeat this cycle. We will be shipping, finding the frontier, building at the margin,

00:14:31.360 | and repeating. And what gets me really personally excited about this is SweeOne is really an example of this in action.

00:14:39.360 | We have a very small team, significantly fewer resources than the larger companies, and we were able to

00:14:45.600 | achieve near-frontier model quality results with SweeOne. And even more so, this is really a demonstration

00:14:53.440 | of what it means to build AI products in 2025. It demands this harmony of model, data, and application,

00:15:03.520 | where the application is actually mimicking the user behavior that you want to replicate inside of your model.

00:15:08.720 | And this is how Windsurf will be everywhere, doing everything all at once.

00:15:17.040 | Thank you so much for listening.

00:15:20.160 | I won't give you any promises, but someone made a profit. But in all seriousness, thank you so much for

00:15:40.320 | listening. I want to make sure that every engineer out there is using the best possible tools. So please

00:15:45.200 | give Windsurf a try today, and we are also hiring across a number of different roles. We have a booth

00:15:49.280 | downstairs, so please come join us. Help make this future a reality. Thank you.

Windsurf everywhere, doing everything, all at once - Kevin Hou, Windsurf

Chapters