Anthropic released Claude 3.7 Sonnet with a blog post here. Here’s the system card.
Some highlights:
- The reasoning and non-reasoning models are the same, you do have to select which one you want They’ve exposed the thinking tokens (which is different than OpenAI’s o1/o3 models)
- I tried one of the examples in the system card to demonstrate the thinking tokens: https://claude.ai/share/919a9387-4cd0-4163-a216-6f5650c28c03
- Anthropic does do a fantastic job on safety testing. They did a controlled trial on bioweapons acquisitions.
Some highlights from AI (using Claude 3.7 Sonnet):
- Extended thinking mode - Claude 3.7 Sonnet features a new “extended thinking” capability that allows the model to reason through complex problems step-by-step before giving a final answer, particularly valuable for mathematical problems and multi-step reasoning tasks.
- Hybrid reasoning model - Claude 3.7 Sonnet is described as a “hybrid reasoning model” that combines standard thinking with this extended reasoning capability.
- Improved refusal rates - The model reduces unnecessary refusals by 45% in standard thinking mode and 31% in extended thinking mode compared to Claude 3.5 Sonnet, providing more helpful responses to ambiguous queries.
- Visible reasoning process - Anthropic has decided to make Claude’s reasoning process visible to users, enhancing transparency and allowing users to better understand how the model reaches conclusions.
- Safety evaluations - The model underwent comprehensive testing across CBRN (Chemical, Biological, Radiological, Nuclear), cybersecurity, and autonomy risks, with ASL-2 safeguards determined appropriate.
- Capabilities improvements - Claude 3.7 Sonnet shows improved performance across multiple domains while maintaining strong bias and safety metrics.
- Chain-of-thought faithfulness - Research shows that while reasoning chains improve outputs, they don’t always fully reveal all factors influencing the model’s conclusions.
- Knowledge cutoff date - Claude 3.7 Sonnet’s reliable knowledge extends through October 2024.
They also released Claude Code, an AI Coding agent. More of these are coming out. It does feel like the way software development is done is changing very rapidly.