Good morning, everyone. So today, I want to start with a little story. A short history lesson, if you will. So, you know, sit back, get comfortable. I'm going to take us back to the year 1882. It's the dawn of the electrical revolution, really. The world's first commercial power plant just opened up.
Electricity, this amazing new force, is all the rage in the manufacturing industry. People are claiming that it's going to change everything. And yet, something very interesting happened around this time. Or rather, it didn't happen. You see, despite electricity's obvious superiority over the traditional techniques of the time, like the steam engine, it didn't immediately improve manufacturing productivity.
Why? Well, because factory owners were simply dropping the new technology in place of the old one, within an outdated paradigm. Let's picture a typical factory at the time. So we have a huge coal-fired steam engine on one end, and a network of line shafts and belts running across the ceiling, all driving hundreds of machines locked in the same rhythm.
These legacy steam-powered factories were incredibly inefficient. If even one station needed power, you had to fire up the entire steam engine, which then drove all of them. Factory layouts were dictated by the limitations of those shafts and belts, not by what was best for the process or for the workers.
When electricity arrived, many factory owners simply swapped out the steam engine for an electric one. And sure, they added some lights, and workers didn't have to toil next to a coal-fired furnace all day. But the fundamental limitations of the factory remained. The real electrical revolution didn't come until we reimagined factories from the ground up with electricity at their core.
Factories started to become flexible and adaptable. They allowed for smaller, specialized tools. Workers could bring their tools to the items instead of having to lug the items back to their workstations. The entire manufacturing process became more efficient, more humane, and more productive. Now, let's fast forward 140-something years to today, and you can see that we find ourselves at a similar point with regard to AI and LLMs.
Enterprises, startups, developers are all building and integrating LLMs into their products. But often, they're just tacking it onto their existing product surface, adding a few star icon buttons in the top left corner, and calling it a day. And this is not the first time we've seen this in Silicon Valley.
Let's think back to when mobile first emerged, right? Companies simply tried to just shrink down their website and put it on a phone. It wasn't until we redesigned apps from the ground up with the unique capabilities of mobile, like always-on camera and GPS, that we actually began to see true innovation in the space and adoption.
This is when the Snapchats and the Ubers of the world started to emerge. So, just as factories went through their replace-the-steam-engine-with-an-electric-one phase, and tech companies went through their just-hire-a-couple-of-mobile-web-devs phase, we're now in our magic star icon phase with respect to AI.
And, yeah, it's funny, but the thing is you can't blame any of the companies or developers that are actually trying to do this right now, right? Like, all of us are trying to do this, but in many ways, we're just still so early. LLMs are non-deterministic. They're hard to build on.
They're completely different from what most developers are used to working with. Reliability is still an issue. Prompts still take rounds and rounds of optimization. And we've also just started to scratch the surface of potential product opportunities. So far, not much has really stuck beyond just the text box. We've been missing something.
Something that's a little hard to put a finger on. But just last week, I think we scratched the surface of a potential new product feature that we can build. As some of you may have heard, last Thursday, we released our new model, Claude 3.5 Sonnet. 3.5 Sonnet is the first model that we released in the new Claude 3.5 family.
It's only the middle model, and yet it's better than our last best model, Claude 3 Opus. In my opinion, Claude 3.5 Sonnet is one of the best models in the world right now, and the benchmarks seem to back that up: MMLU, HumanEval, GPQA, tool use, all the usual suspects.
It's top of its class in many of these academic, lab-type evaluations. But what I'm most excited about is how it actually does in the real world. The model is particularly strong in RAG use cases, thanks to its 200K-token context window, and it has near-perfect recall over that entire context as well.
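To make that concrete, here's a rough sketch of what a long-context RAG call might look like with the Python SDK. The retrieval step is stubbed out, and the model ID, prompt wording, and helper name are just illustrative placeholders, not something from this talk.

```python
# Rough sketch of a RAG-style call: retrieved passages are simply concatenated
# into the prompt, leaning on the 200K-token context window rather than on
# aggressive truncation. The retrieval step itself is stubbed out.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer_with_context(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n---\n\n".join(retrieved_docs)
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model ID
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Answer the question using only the documents below.\n\n"
                f"<documents>\n{context}\n</documents>\n\n"
                f"Question: {question}"
            ),
        }],
    )
    return response.content[0].text
```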
On coding tasks, 3.5 Sonnet seems to grasp debugging problems better. It's not getting stuck in those same loops as much as previous models. One of the best methods that we've found for actually measuring more complicated chains of reasoning is pull requests. They have a defined task. They usually take a few steps to solve.
And the model is able to iteratively write and test its way to a solution. In our own internal pull request evals, we're seeing that Claude 3.5 Sonnet scores 64%. And to put that number in perspective, Claude 3 Opus only scored 38%.
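That write-and-test loop is easy to picture in code. Here's a heavily simplified sketch of the idea, not our internal eval harness: the model proposes a patch, we run the tests, and any failures get fed back until the suite passes or we give up. The apply_patch helper, the test command, and the model ID are all placeholders.

```python
# Simplified write-and-test loop: propose a patch, run the tests,
# feed failures back, repeat. Not the real eval harness.
import subprocess
import anthropic

client = anthropic.Anthropic()

def apply_patch(patch_text: str) -> None:
    # Placeholder: a real harness would parse the diff and apply it to the repo.
    pass

def attempt_task(task: str, max_rounds: int = 5) -> bool:
    messages = [{"role": "user",
                 "content": f"Task: {task}\nPropose a patch for this repository."}]
    for _ in range(max_rounds):
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # placeholder model ID
            max_tokens=4096,
            messages=messages,
        )
        patch = response.content[0].text
        apply_patch(patch)

        # Run the project's test suite and capture its output.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass, task solved

        # Otherwise, show the model what failed and let it revise.
        messages.append({"role": "assistant", "content": patch})
        messages.append({
            "role": "user",
            "content": f"The tests failed:\n{result.stdout[-3000:]}\nPlease revise the patch.",
        })
    return False
```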
3.5 Sonnet also has state-of-the-art vision abilities. It shows considerable improvement over 3 Opus on basically every vision benchmark we tested it on. Things like table transcription and OCR are a breeze now. We passed this table to 3.5 Sonnet, and it basically replicated it perfectly in Markdown. You probably can't read all those numbers, but trust me, I double-checked them to make sure they're all correct.
Vision capabilities were actually what amazed me the most when I started playing around with this model. It feels like we are really on the cutting edge of unlocking so many more use cases. And, you know, as you're hearing me say all this, you might be thinking, well, that's great, Alex, but, I mean, it doesn't mean anything if I can't actually use the model.
And you're right, and we heard you. And that's why 3.5 Sonnet is available on our API, AWS Bedrock, and Vertex AI. We understand that developers want choice when they're building, and we want Claude to be available wherever you are. In terms of pricing, 3.5 Sonnet is five times cheaper than 3 Opus.
It's only $3 per million input tokens and $15 per million output tokens. 3.5 Sonnet's combo of speed, intelligence, and low cost makes it much more economical to use and embed in your apps than 3 Opus.
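To make the economics concrete, here's the back-of-the-envelope math for a hypothetical long-prompt request at those rates; the token counts are made up for illustration.

```python
# Cost of a single hypothetical request at 3.5 Sonnet's list prices:
# $3 per million input tokens, $15 per million output tokens.
input_tokens = 150_000   # e.g. a long RAG prompt
output_tokens = 2_000

cost = (input_tokens / 1_000_000) * 3.00 + (output_tokens / 1_000_000) * 15.00
print(f"${cost:.2f}")  # -> $0.48
```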
But 3.5 Sonnet is not all that we've released in the past week. We also released a new product feature that I think is actually more inspiring to developers in terms of thinking about and building those AI products from the ground up. It's called Artifacts. Artifacts separates the content that Claude produces from the chat dialogue itself. This allows you to work collaboratively with Claude on everything from essays to SVGs to React websites. Artifacts becomes really powerful when you combine it with 3.5 Sonnet: those coding skills, plus that reasoning ability, plus that strong visual acuity enable a new product experience that's really fun to use.
It's also a developer's best friend, in that it allows you to take screenshots and Figma diagrams and quickly turn them into code and components that you can actually just go use. As you can see here, I basically cloned our entire Claude.ai chat layout in React from a single screenshot.
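If you want to reproduce that screenshot-to-code flow through the API rather than in the Artifacts UI, here's a rough sketch. The file name, model ID, and prompt are placeholders of mine, not what was used in the demo.

```python
# Send a screenshot as a base64 image block and ask for a React component back.
import base64
import anthropic

client = anthropic.Anthropic()

with open("chat_layout.png", "rb") as f:  # placeholder screenshot file
    screenshot_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model ID
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": screenshot_b64,
                },
            },
            {
                "type": "text",
                "text": "Recreate this UI as a single self-contained React component.",
            },
        ],
    }],
)
print(response.content[0].text)
```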
And this capability has practically been hiding in plain sight, just waiting to be discovered, for over a year and a half. Maybe this tweet is right, and we really are early on this S-curve of productionizing LLMs, which I think is actually pretty inspiring. And Artifacts is not the only AI feature that we launched recently.
On Tuesday, we released Projects. Projects enables dev teams to work and collaborate much more efficiently by grounding Claude's outputs in your own knowledge, whether that's style guides, code bases, transcripts, or even your past work. On our Claude Team plan, you can even share these projects and your chats with all your teammates.
At Anthropic, our engineers now upload the code repos and documentation that they use, and I've started to see people just share the chats and the artifacts instead of Google Docs or site documentation. Projects is another great example of how, when you think from an LLM-first, AI-first standpoint, you can actually start to build product experiences that complement these technologies and don't feel like a simple add-on to what you already have.
So now that hopefully the creative product juices are flowing in everyone's minds, I want to dive a little bit into API improvements that we've rolled out recently and things that allow you to actually build this cool stuff. I also want to give a preview of what's coming next that will enable you to build even more.
So a month ago, we released our new Tool Use API. Tool Use allows you to give Claude custom client-side functions that it can then intelligently leverage. Tool Use also enables things like consistent structured JSON output. With 3.5 Sonnet, I've actually started to see devs give Claude hundreds of tools at a time.
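Here's a minimal sketch of what defining a single tool looks like in the Python SDK. The weather tool, its schema, and the model ID are just illustrative stand-ins.

```python
# Define a client-side tool via a JSON schema; Claude decides when to call it
# and returns a tool_use block with the arguments it chose.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}]

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model ID
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in San Francisco right now?"}],
)

# You run the requested function yourself, then send the result back
# as a tool_result block in a follow-up message.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

The same mechanism is one way to get consistent structured JSON output: define a single tool whose input_schema is exactly the JSON shape you want, and read the arguments out of the tool_use block.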
On the developer console front, we're also continuing to iterate. We added a prompt generator that uses Claude to write prompts for you based on a task description. So you can see in this video, we put in a task description, and then out comes an optimized prompt. And then once that prompt is all done, you can actually just start editing it right in the workbench itself.
You can see we've also added support for variables, so you can edit prompt templates as well and test things like RAG use cases. And finally, we're also working on a new Evaluate feature, which is currently in the console with a beta tag. We plan to share more on this and continue to iterate on it very soon.
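On the variables support I mentioned a moment ago, here's a tiny sketch of how you might mirror that {{variable}} convention in your own code when testing prompt templates locally. The template, helper, and values here are just illustrative.

```python
# Naive {{variable}} substitution for quick local testing of prompt templates.
PROMPT_TEMPLATE = """You are a support assistant for {{product_name}}.
Answer the user's question using only the documentation below.

<docs>
{{documentation}}
</docs>

Question: {{question}}"""

def render(template: str, **variables: str) -> str:
    # Replace each {{name}} placeholder with the supplied value.
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = render(
    PROMPT_TEMPLATE,
    product_name="Acme CLI",
    documentation="...retrieved docs go here...",
    question="How do I reset my API key?",
)
print(prompt)
```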
So what else is next? Well, there's two things that I can share right now. First is that you can expect more models. 3.5 Haiku and 3.5 Opus are coming later this year. With each model generation, we're looking to increase the intelligence, decrease the latency, and decrease the cost. The number one thing that I tell developers is to not forget to build with that in mind.
Models will become smarter, cheaper, and faster on the order of months, not years. When you're planning your product roadmap, be ambitious enough to build with the belief that new models may arrive during your development period. We're also working on other areas of research, like interpretability. In one of our latest papers, Scaling Monosemanticity, we explained how we've been able to find features within models that activate for different topics.
Once you identify a feature, you're able to clamp its value and turn it up or down to actually steer the model's outputs. A few weeks ago, we showed Claude.ai users how this worked through Golden Gate Claude, which was a version of Claude that had the Golden Gate Bridge feature turned up significantly.
Yeah, fan favorite. We currently have a few beta testers experimenting with a steering API as well. This allows developers to find and clamp features for specific attributes and actually turn that dial up or down, which, again, lets you control Claude's outputs in addition to just prompting it. We hope to be able to roll this out to more developers in the very near future.
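The steering API itself is still in closed beta, so I won't show its interface, but the underlying idea of clamping a feature is easy to sketch conceptually. This is my own illustration of the mechanism described in the paper, not Anthropic code: project the hidden state onto a feature direction, then shift it so the feature reads whatever value you dial in.

```python
# Conceptual sketch of clamping an interpretability feature. Assumes a
# unit-norm feature direction from a sparse autoencoder; not real product code.
import numpy as np

def clamp_feature(hidden_state: np.ndarray,
                  feature_direction: np.ndarray,
                  target_activation: float) -> np.ndarray:
    # Current activation of the feature: projection onto its direction.
    current = hidden_state @ feature_direction
    # Shift the hidden state so the feature reads exactly target_activation.
    return hidden_state + (target_activation - current) * feature_direction
```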
Now, if anything in this talk has sparked any ideas, I want to encourage you to just go out there and build, and make quick prototypes as fast as you can to get that validation and feedback loop started. And for even more of an incentive, we just launched another Build with Claude contest yesterday.
It runs until July 10th. The top three projects will each receive $10K in Anthropic API credits. To see more details, just visit the link below; it's also at the top of our docs page, so you can find it there, too. I'll leave that up for a second.
And finally, if you have any questions or want to hear more about what we're thinking about, I'll be at the AWS booth down the hall for the next few hours. You can also find me on X/Twitter at AlexAlbert with two underscores. I do try to read all my DMs.
I spend way too much time on that site, so feel free to ask questions there as well. And with that, I want to say thank you guys very much, and enjoy the last day of the summit.