
Spotlight on Databricks | Code w/ Claude



00:00:00.000 | Thank you for sticking around in here, I suppose,
00:00:16.240 | would probably be the most apropos thing to say.
00:00:18.220 | Thank you for joining me today.
00:00:20.540 | I wanted to talk a little bit about how all of this technology
00:00:23.760 | actually gets into the path of value
00:00:25.980 | inside large organizations and large businesses.
00:00:29.640 | As it would turn out, the ability for us to go prototype cool stuff
00:00:34.420 | versus our ability to go and deliver these things into the critical path
00:00:37.800 | can vary widely.
00:00:39.120 | I'm Craig.
00:00:39.760 | I lead product management for Databricks,
00:00:42.280 | in case you hadn't figured that out yet.
00:00:44.040 | Been with Databricks for about three years.
00:00:46.600 | Before that I was at Google, where I led product
00:00:49.660 | for the founding of Vertex AI,
00:00:51.840 | and before that I was the founding general manager of AWS SageMaker.
00:00:55.800 | So I've been, as my wife says, continuing to strike out
00:00:59.920 | as I try and get better and better at helping enterprises build AI.
00:01:04.160 | But as we dive into this,
00:01:07.000 | I wanted to quickly just set a little bit of context on who Databricks is,
00:01:11.140 | why Databricks, and why Databricks is here talking to you,
00:01:14.360 | and what have you.
00:01:15.340 | We are a leading cross-cloud data platform,
00:01:21.020 | tens of thousands of customers, billions of dollars in revenue,
00:01:26.240 | and moreover, the creator of a number of
00:01:29.280 | very popular open source projects:
00:01:31.340 | Spark, MLflow, Delta, et cetera.
00:01:36.120 | You know, Brad, just a minute ago,
00:01:39.560 | talked about the importance of the model
00:01:41.880 | and then the data you bring to the model.
00:01:44.420 | And the enterprises we work with have a kind of nightmarish data scenario
00:01:49.540 | because, you know, you talk to these large multinational banks
00:01:53.940 | or something like that,
00:01:54.980 | and they've done dozens, if not scores of acquisitions over the years,
00:02:00.940 | and they have data on every cloud, in every possible vendor,
00:02:05.260 | in every possible service,
00:02:06.820 | and they're trying at this moment
00:02:09.000 | to figure out how to take advantage
00:02:12.140 | of this kind of transformational technological moment,
00:02:16.980 | but they're doing it with kind of a mess in the back end, if you will, right?
00:02:21.140 | And it turns out the problem is actually much worse than this
00:02:23.860 | because it's not like they just have one data warehouse
00:02:26.540 | or something like that.
00:02:27.500 | They often have many of them, right?
00:02:29.540 | And often the experts in one or two of these systems
00:02:33.620 | are only experts in one or two of these systems,
00:02:36.360 | and they don't know the other systems.
00:02:38.240 | So if you're stuck in your data warehouse
00:02:40.740 | or your streaming person isn't a Gen AI person,
00:02:43.800 | you may find yourself kind of locked out
00:02:46.180 | of being able to bring your data into these systems
00:02:50.720 | as easily as you want to.
00:02:51.840 | Now, I'm not going to go head on into Databricks.
00:02:53.980 | Databricks, ultimately, we help you manage your data,
00:02:56.900 | and then on top of that management of your data,
00:02:59.520 | we have a whole series of capabilities.
00:03:01.360 | I'm going to really focus on our AI capabilities with Mosaic AI today.
00:03:06.480 | Now, we think of this as a difference between what we call general intelligence
00:03:12.700 | and data intelligence.
00:03:13.940 | Both of these things are extraordinarily useful and extraordinarily important.
00:03:20.000 | But as Brad talked about, particularly for businesses or large enterprises,
00:03:27.000 | as they want to move into using this technology to automate more of their systems
00:03:32.560 | or drive greater insights within their organization,
00:03:35.780 | almost always it comes back to connecting it.
00:03:39.780 | We saw here Brad connecting it to the web or connecting it to MCP servers,
00:03:44.480 | but inevitably it comes back to trying to connect it to their data estate, right?
00:03:49.720 | So for a really good example of this, FactSet.
00:03:52.820 | I don't know if you guys have heard of FactSet.
00:03:54.300 | FactSet is a financial services company that sells data about other companies.
00:03:59.420 | They sell financial data about companies to banks and hedge funds and what have you.
00:04:04.080 | FactSet has their own query language,
00:04:07.560 | which is now a yellow flag to me when considering employers.
00:04:12.900 | If your employer has their own query language,
00:04:15.460 | you've got to think about whether or not this is the right place to be.
00:04:19.540 | Having said that, I did work at Google,
00:04:21.160 | who I think probably has a dozen of their own query languages.
00:04:23.600 | So FactSet had this problem and opportunity,
00:04:28.060 | which is that any customer they had who wanted to access their data,
00:04:32.380 | they had to learn FQL, FactSet Query Language, a creative name there.
00:04:37.740 | And so when this whole Gen AI craze started,
00:04:41.460 | these guys lost their minds with excitement
00:04:43.160 | because they thought,
00:04:43.860 | what if we could translate English into FactSet query language?
00:04:48.660 | And so they went to their favorite cloud of choice.
00:04:51.580 | They hit the one-click rag button.
00:04:54.600 | I think they did a little more than the one-click rag button,
00:04:57.160 | but they basically showed up with this massive prompt
00:05:00.900 | of a bunch of examples and a bunch of documentation
00:05:03.900 | and then a massive VectorDB of a bunch more examples
00:05:10.680 | and a bunch more documentation.
00:05:11.780 | And this is what they ended up with, right?
00:05:14.560 | They ended up with 59% accuracy in about 15 seconds of latency.
00:05:20.340 | And I share with you that latency metric,
00:05:22.540 | not just because it's an important customer experience metric
00:05:27.340 | and all of these kinds of things,
00:05:28.480 | but in this world of Gen AI,
00:05:30.040 | it's probably the closest thing we have to a cost metric, right?
00:05:33.920 | You're more or less paying for compute time.
00:05:35.700 | And so that 15 seconds is basically 15 seconds of cost, right?
00:05:39.560 | And 59% accuracy.
00:05:41.340 | With this, they showed up,
00:05:43.200 | they contacted us and said,
00:05:44.560 | hey, good news.
00:05:45.940 | We've got a Gen AI solution.
00:05:48.160 | Bad news.
00:05:49.500 | It's just slightly better than a coin flip kind of thing, right?
00:05:53.440 | And so we worked with them on this problem
00:05:56.580 | and tried to understand what the opportunity was,
00:06:00.600 | what the challenge was.
00:06:01.440 | And really what we did was we just decomposed the prompt
00:06:05.780 | into each of the individual tasks
00:06:09.740 | that that prompt was being asked to perform, right?
00:06:12.380 | So effectively what we did was we took that prompt
00:06:15.340 | and created kind of something of an agent,
00:06:18.100 | a multi-node, a multi-step chain or process
00:06:22.300 | to be able to solve this problem more wholly.
00:06:25.120 | And really the reason we did that was
00:06:28.520 | because it allowed us the opportunity
00:06:30.780 | to start tuning performance
00:06:32.420 | at each step of this problem, right?
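
(As a minimal sketch of that decomposition pattern, assuming the Anthropic Python SDK: the step names, prompts, and example store below are invented for illustration and are not FactSet's actual pipeline. The point is that each step becomes a separate function you can tune and measure on its own.)

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

def call_claude(system: str, user: str) -> str:
    """One LLM call, shared by every step of the chain (hypothetical helper)."""
    msg = client.messages.create(
        model="claude-3-7-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return msg.content[0].text

# Illustrative few-shot store; a real system would retrieve these from an index.
EXAMPLES = {"price_history": "Q: 5-year price history for ACME\nA: <query>...</query>"}

def classify_intent(question: str) -> str:
    # Step 1: route the question to one of a small set of intents.
    return call_claude("Reply with a single intent label for this financial question.", question)

def retrieve_examples(intent: str) -> str:
    # Step 2: fetch few-shot examples relevant to that intent.
    return EXAMPLES.get(intent.strip(), "")

def generate_query(question: str, examples: str) -> str:
    # Step 3: translate English into the target query language using the examples.
    return call_claude("Translate the question into the target query language.\n" + examples, question)

def validate(query: str) -> str:
    # Step 4: cheap deterministic checks before anything is returned to the user.
    if not query.strip():
        raise ValueError("empty query")
    return query

def answer(question: str) -> str:
    intent = classify_intent(question)
    return validate(generate_query(question, retrieve_examples(intent)))
```

Because each step is a plain function, accuracy and latency can be measured per step, and only the underperforming step's prompt, model, or retrieval strategy needs to change.
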
00:06:35.820 | And you can see we got them to 85% accuracy
00:06:38.760 | in six seconds of latency.
00:06:40.320 | At 85% accuracy, they did two things.
00:06:43.940 | They turned to us and they said,
00:06:45.440 | cool, we're comfortable showing this
00:06:47.540 | to our existing customers.
00:06:48.820 | And they said, we get how you're helping us.
00:06:52.320 | We don't want to pay you to help us anymore.
00:06:54.320 | We'll take it from here.
00:06:55.640 | Last I talked to them, they had it into the 90s,
00:06:58.820 | and transitioning to Claude
00:07:00.200 | was one of their next roadmap items.
00:07:04.980 | The reason I say all of this
00:07:07.620 | is because there's a paper out there
00:07:09.440 | from the Berkeley Artificial Intelligence Research Lab,
00:07:13.240 | which if you look into it,
00:07:15.520 | yes, there's a little bit of cross-pollination
00:07:17.460 | between us and Berkeley.
00:07:19.280 | But basically, right after Gen AI
00:07:23.100 | really hit its stride,
00:07:26.420 | the folks at Berkeley went out and looked at all the popular AI systems
00:07:29.840 | that are out in production today.
00:07:32.620 | And what they found was that none of these systems
00:07:36.800 | were as simple as a basic
00:07:42.580 | single-input, single-output system.
00:07:45.880 | These systems were all kind of very complex,
00:07:50.140 | multi-node, kind of multi-part systems
00:07:53.600 | that were being chained together
00:07:54.860 | to create really fantastic outcomes.
00:07:57.200 | So our goal at Databricks
00:07:59.360 | is really to simplify the creation
00:08:01.780 | of these kinds of capabilities
00:08:03.780 | for our customers.
00:08:05.440 | But very specifically,
00:08:07.680 | we want to do it on the areas
00:08:10.300 | where there is financial and reputational risk.
00:08:12.740 | If what you're wanting to do
00:08:14.020 | is build a chatbot for you and your buddies
00:08:16.340 | to kind of search over your documents
00:08:20.200 | or your emails or what have you,
00:08:23.160 | your recent PRDs in my case,
00:08:25.140 | great, go for it.
00:08:27.000 | One click rag away at that thing,
00:08:30.260 | kind of, or prompt away at that thing.
00:08:32.320 | But if what you want to do is build something
00:08:34.440 | that you trust putting into a situation
00:08:37.280 | of financial or reputational risk,
00:08:38.880 | then it takes some additional capabilities.
00:08:40.720 | And not only that,
00:08:41.880 | but one of the things we see,
00:08:43.700 | and I'm sure you've seen this as well,
00:08:45.020 | is that many of the folks out there
00:08:47.200 | who are developing these systems,
00:08:48.880 | they're trying to develop
00:08:52.140 | deterministic systems
00:08:53.540 | using the most probabilistic portion
00:08:56.380 | of their entire software stack, right?
00:08:59.080 | And so one of the pieces of this
00:09:01.740 | is how do we help them
00:09:04.740 | consistently drive those levels
00:09:06.880 | of repeatable determinism?
00:09:08.520 | And we think it comes down to two things.
00:09:10.620 | All else being equal,
00:09:12.700 | we think it comes down to governance,
00:09:14.240 | making sure you can control
00:09:16.140 | at the tightest levels,
00:09:17.520 | at the lowest grain,
00:09:18.640 | what this thing has access to and can do.
00:09:21.160 | And then evaluation.
00:09:23.300 | I was super excited.
00:09:25.460 | I met with a company this morning,
00:09:27.280 | a global logistics provider,
00:09:29.120 | and it was one of the first times
00:09:30.940 | I had met with a customer who said,
00:09:32.780 | hey, we built this system,
00:09:34.120 | and it's like 85% accurate.
00:09:36.320 | And it was such a joy,
00:09:38.400 | because usually people say,
00:09:39.680 | hey, we built this system,
00:09:40.620 | we have it in production,
00:09:41.400 | we're super proud of it.
00:09:42.420 | And I say, how accurate is it?
00:09:43.980 | And they go, oh, it's pretty good.
00:09:45.080 | And so being able to really start to quantify
00:09:49.100 | and hill climb that,
00:09:50.160 | we believe is critical.
00:09:51.140 | So governance, what are we talking about?
00:09:52.740 | We're talking about really governing the access,
00:09:55.740 | treating these agents,
00:09:57.640 | or these prototype agents we're building,
00:10:00.480 | as principals within our data stack,
00:10:03.780 | and governing every single aspect of that.
00:10:07.300 | Now, on Databricks,
00:10:08.180 | we don't just govern your data.
00:10:10.860 | We also govern access to the models, right?
00:10:14.780 | And we govern tools, right?
00:10:17.660 | And we govern queries.
00:10:19.140 | So we govern access to the data,
00:10:20.740 | we govern access to the models,
00:10:22.220 | we govern access
00:10:23.080 | to all of the pieces.
00:10:24.160 | The one piece we don't yet govern
00:10:28.820 | is MCP servers.
00:10:31.300 | But stick with us.
00:10:32.340 | We have a conference in a few weeks.
00:10:33.620 | You might come check it out.
00:10:34.940 | And hopefully, we'll have news for you there.
00:10:37.320 | So how do we get all of this to reason over your data?
00:10:41.040 | And, you know,
00:10:42.460 | we do that by injecting it with either the vector store
00:10:45.220 | or the feature store.
00:10:46.860 | And then we, as I said,
00:10:49.140 | we govern all of the aspects,
00:10:50.540 | whether it's the data,
00:10:51.580 | the models,
00:10:52.220 | the tools.
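
(A rough sketch of the "inject it with the vector store" step, assuming the databricks-vectorsearch client; the endpoint name, index name, columns, and response shape are placeholders and may differ by library version.)

```python
from databricks.vector_search.client import VectorSearchClient

# Placeholder names; substitute your own endpoint, index, and columns.
vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="my_vs_endpoint",
    index_name="main.analytics.product_docs_index",
)

hits = index.similarity_search(
    query_text="How do I rotate credentials for the billing service?",
    columns=["doc_id", "chunk_text"],
    num_results=5,
)

# Stitch the retrieved chunks into the context that gets handed to the model.
context = "\n\n".join(row[1] for row in hits["result"]["data_array"])
```

The same shape applies for the feature store: look up governed features for the entity in question and pass them to the model as context rather than letting the model guess.
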
00:10:52.880 | And I want to stop for a second
00:10:54.420 | and talk about tools and tool calling.
00:10:56.320 | Because we saw some of it just a second ago
00:10:59.000 | in Brad's demos.
00:10:59.880 | And tool calling,
00:11:01.520 | when it comes to trying to build
00:11:03.460 | a deterministic system,
00:11:05.720 | usually what we actually see
00:11:08.940 | is someone
00:11:12.820 | using an LLM as a classifier
00:11:15.600 | to choose one of six or eight paths,
00:11:19.080 | right?
00:11:19.920 | One of six or eight tools.
00:11:21.140 | And those tools may be agents.
00:11:23.080 | Those tools may be SQL queries.
00:11:24.760 | Those tools are any sort of
00:11:26.320 | parameterizable function kind of thing, right?
00:11:28.420 | So we see them creating access to these tools.
00:11:31.240 | And then what do we see?
00:11:33.460 | We often see the next layer,
00:11:34.820 | another set of agents
00:11:38.260 | choosing between a set of tools.
00:11:41.240 | And so they end up
00:11:42.340 | with this massive decision tree,
00:11:44.020 | which is great
00:11:45.000 | from a kind of deterministic perspective
00:11:46.880 | on really reducing the entropy
00:11:49.320 | in these systems.
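
(Here is a bare-bones sketch of that LLM-as-classifier pattern with the Anthropic Python SDK; the tool names and schemas are invented for illustration, and in practice each tool might wrap a SQL query, another agent, or any parameterizable function.)

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

tools = [
    {
        "name": "lookup_order_status",
        "description": "Look up the status of a customer order by order id.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "name": "run_revenue_report",
        "description": "Run a parameterized revenue report for a given fiscal quarter.",
        "input_schema": {
            "type": "object",
            "properties": {"quarter": {"type": "string"}},
            "required": ["quarter"],
        },
    },
]

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # placeholder model id
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": "What's the status of order 48-1192?"}],
)

# The model's tool choice is the classification; dispatch on it deterministically.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

Each selected tool can itself front another set of tools, which is how the decision trees described here get layered up.
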
00:11:50.180 | The challenge for us
00:11:52.040 | was that before we had
00:11:54.340 | this relationship with Anthropic,
00:11:55.960 | we were talking to people
00:11:57.880 | about this stuff.
00:11:58.840 | But the tool calling
00:12:00.540 | just wasn't where it needed to be.
00:12:02.860 | You would have these moments
00:12:04.260 | where it would be unbelievably obvious
00:12:06.260 | what tools should be called,
00:12:07.640 | and the models would consistently
00:12:11.700 | not get it right.
00:12:13.020 | With Claude,
00:12:15.280 | that has changed completely, right?
00:12:18.440 | We now see
00:12:19.700 | that the ability of these systems
00:12:21.880 | to do tool calling
00:12:23.200 | really becomes
00:12:25.220 | the way in which
00:12:26.980 | software development engineers
00:12:29.160 | and app engineers
00:12:31.280 | can start building
00:12:33.400 | these quasi-deterministic systems
00:12:35.960 | using a highly probabilistic backend.
00:12:38.780 | Claude really in many ways
00:12:40.540 | completes this puzzle for us
00:12:42.580 | by giving us
00:12:43.540 | that frontier LLM
00:12:45.000 | available directly
00:12:46.280 | inside Databricks
00:12:47.500 | that has all
00:12:49.920 | of the capabilities needed
00:12:51.360 | to really superpower
00:12:52.780 | the use cases
00:12:54.200 | that our customers
00:12:54.900 | are putting together.
00:12:55.740 | So why Claude
00:12:57.140 | and Databricks together?
00:12:58.220 | First of all,
00:12:58.880 | Claude is natively available
00:13:00.260 | on Databricks
00:13:01.040 | on any cloud, right?
00:13:03.140 | So on Azure,
00:13:03.800 | on AWS,
00:13:04.460 | on GCP,
00:13:05.140 | you can call Claude
00:13:06.660 | within your Databricks instance.
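
(A minimal sketch of what calling Claude from a workspace can look like, assuming the OpenAI-compatible client pointed at Databricks model serving; the workspace URL, token variable, and endpoint name are placeholders, and the endpoints actually available depend on your workspace.)

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # personal access token (placeholder)
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

resp = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",  # assumed endpoint name; check your workspace
    messages=[
        {"role": "system", "content": "You answer questions over our governed sales data."},
        {"role": "user", "content": "Summarize last quarter's pipeline in three bullets."},
    ],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```
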
00:13:08.080 | You can build
00:13:09.220 | state-of-the-art agents
00:13:10.940 | on Databricks
00:13:11.740 | powered by Claude,
00:13:12.520 | and then fundamentally
00:13:13.540 | you can connect Claude.
00:13:15.140 | You know,
00:13:15.580 | the vast majority
00:13:16.420 | of folks who are using Databricks
00:13:17.860 | are much lower-level
00:13:19.420 | data engineers
00:13:20.260 | and what have you,
00:13:20.960 | building out
00:13:21.660 | kind of massive schemas
00:13:23.260 | and building out
00:13:23.840 | massive governance policies,
00:13:25.200 | systems,
00:13:26.160 | and what have you.
00:13:27.120 | And you can use Claude
00:13:28.640 | as a principal
00:13:30.620 | within that system,
00:13:32.760 | right?
00:13:33.380 | And as you can see,
00:13:34.480 | including MCP servers
00:13:36.620 | coming soon.
00:13:37.820 | So why use it with us,
00:13:40.840 | right?
00:13:41.080 | Well,
00:13:41.560 | it really comes down
00:13:42.440 | to really pairing
00:13:43.720 | the strongest model
00:13:45.080 | with the strongest platform,
00:13:46.640 | using it
00:13:47.640 | in a fully controlled environment,
00:13:48.800 | right?
00:13:49.040 | You know,
00:13:49.480 | when you talk
00:13:50.040 | to these companies,
00:13:50.780 | I was sitting
00:13:51.720 | with a collection
00:13:53.020 | of banks recently.
00:13:54.660 | There were 10 or 12 banks
00:13:56.720 | at the table.
00:13:57.340 | We were all talking
00:13:58.440 | about what they were working on.
00:13:59.660 | I think more than half
00:14:01.280 | the banks in the room
00:14:02.560 | were prototyping
00:14:04.220 | on Claude
00:14:04.860 | as we spoke.
00:14:05.640 | One of the banks
00:14:06.980 | did raise their hand
00:14:07.720 | and said,
00:14:08.040 | we're not allowed
00:14:08.760 | to use any of this
00:14:09.600 | generative stuff
00:14:10.380 | in this industry,
00:14:11.180 | at which point he was laughed at
00:14:13.260 | by the others in the room
00:14:14.400 | who were working
00:14:15.020 | on these things.
00:14:15.940 | And the real difference
00:14:18.400 | was what that guy
00:14:19.380 | was saying was,
00:14:20.580 | we don't have
00:14:21.080 | the controls in place
00:14:22.920 | to use this
00:14:23.940 | within our own organization,
00:14:25.240 | whereas the banks,
00:14:26.560 | the hospitals,
00:14:27.200 | the other highly governed,
00:14:28.660 | highly regulated areas
00:14:29.900 | that have gone through this
00:14:31.280 | now have full access
00:14:32.680 | to this technology
00:14:33.720 | and no longer need
00:14:34.660 | to wait for the technology
00:14:36.060 | to kind of come to them,
00:14:37.720 | right?
00:14:39.040 | Commercially,
00:14:39.860 | there are some advantages.
00:14:40.940 | Scale
00:14:41.880 | and operational capabilities
00:14:43.920 | really add to the reasons
00:14:45.340 | for why to use it all together.
00:14:48.400 | Together,
00:14:49.980 | we enable these
00:14:51.160 | really high-value use cases.
00:14:53.840 | And one of the great things
00:14:55.940 | about sitting at the intersection
00:14:58.080 | of Gen AI
00:14:58.960 | and enterprise
00:14:59.880 | is getting to see
00:15:01.140 | these high-value use cases
00:15:02.620 | that kind of come through
00:15:04.240 | and really,
00:15:05.160 | you know,
00:15:05.820 | give us the confidence
00:15:08.180 | to see that this technology
00:15:10.180 | is not going to be
00:15:11.460 | another kind of three-year
00:15:14.580 | flash in the pan,
00:15:15.600 | but is really going to end up
00:15:17.660 | changing the way we all work.
00:15:19.240 | And we can now see
00:15:20.520 | that coming to fruition
00:15:21.760 | in some of these organizations
00:15:23.600 | we're working with.
00:15:25.620 | I had said at the beginning
00:15:26.920 | that this comes down
00:15:28.560 | to governance
00:15:29.080 | and evaluation,
00:15:29.980 | right?
00:15:31.400 | and for us,
00:15:32.560 | one is not complete
00:15:34.340 | without the other.
00:15:35.360 | You can lock these things down.
00:15:37.200 | You can control
00:15:37.840 | what they have access to.
00:15:38.820 | You can control
00:15:39.560 | how they're going to operate
00:15:41.060 | within your data estate.
00:15:42.480 | But if you're not measuring
00:15:44.380 | the quality of the system,
00:15:47.160 | then you're really not going to know
00:15:49.620 | whether or not
00:15:50.480 | this system you've built
00:15:52.320 | is high enough quality
00:15:53.620 | to be able to start putting in
00:15:55.200 | to those higher risk
00:15:56.660 | use cases
00:15:57.460 | without necessarily
00:15:58.860 | a human approval
00:15:59.780 | in the loop
00:16:00.380 | at every step,
00:16:01.300 | right?
00:16:01.860 | And so that's where
00:16:02.920 | eval comes in.
00:16:03.900 | This is our eval platform,
00:16:05.280 | by the way.
00:16:05.800 | You bring in a golden data set.
00:16:07.340 | We have a series
00:16:08.280 | of LLM judges
00:16:09.420 | that help determine
00:16:10.860 | whether or not
00:16:11.720 | your performance
00:16:12.460 | is what it needs to be.
00:16:13.900 | And you can use this.
00:16:15.800 | By the way,
00:16:16.280 | this whole system
00:16:17.120 | has a secondary UI
00:16:18.520 | for your subject matter expert.
00:16:20.480 | Time and again,
00:16:21.240 | we see the app developers
00:16:22.560 | building these systems
00:16:23.740 | are not necessarily
00:16:25.180 | the subject matter experts
00:16:26.540 | on these topics.
00:16:27.460 | And so having a simplified UI
00:16:29.560 | for that subject matter expert
00:16:31.200 | to be able to kind of quickly
00:16:32.680 | and easily give context
00:16:35.280 | or correct a prompt
00:16:37.720 | or create a better answer
00:16:39.320 | is critical.
00:16:40.260 | But this is how we start
00:16:42.180 | down this path
00:16:44.060 | of gaining confidence
00:16:45.180 | that these systems
00:16:46.640 | can perform in robust,
00:16:48.120 | higher-risk situations.
00:16:52.180 | you know,
00:16:52.440 | I had a guy
00:16:54.080 | the other day
00:16:55.300 | who said,
00:16:56.080 | you know,
00:16:57.360 | oh, you're just unit
00:16:58.160 | testing the agent.
00:16:59.040 | And I kind of said,
00:17:01.060 | well, I'd like to think
00:17:01.880 | it's more clever than that,
00:17:03.080 | but yeah,
00:17:03.860 | you know,
00:17:04.640 | more or less, right?
00:17:05.680 | You know,
00:17:06.220 | it's really about searching
00:17:07.680 | across the question space
00:17:09.760 | that this system is expected
00:17:14.240 | to go after,
00:17:15.060 | and then diving in
00:17:16.580 | at the most granular levels
00:17:18.140 | to ensure that this system
00:17:19.460 | is performing.
00:17:20.040 | Now, this eval system,
00:17:21.620 | I should say,
00:17:22.320 | a lot of it is open source
00:17:24.420 | in MLflow.
00:17:25.520 | The LLM judges are not,
00:17:27.520 | but a lot of the capabilities
00:17:28.960 | here can be run.
00:17:30.100 | Whether or not
00:17:30.980 | you're using Databricks,
00:17:32.240 | you can use open source MLflow
00:17:34.220 | to do these evals,
00:17:35.300 | or, if you're using Databricks,
00:17:37.640 | you can hook it up
00:17:38.320 | and gain all the value
00:17:39.220 | of some of our custom judges
00:17:40.540 | and what have you,
00:17:41.760 | right?
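
(For a flavor of the open source piece, here is a rough sketch using MLflow 2.x evaluation against a small golden dataset; the data, the pre-computed agent answers, and the choice of LLM-judged metric are illustrative, and argument names can vary across MLflow versions. The hosted judges mentioned above are Databricks-specific.)

```python
import mlflow
import pandas as pd

# Illustrative golden dataset: questions, the agent's answers, and known-good answers.
golden = pd.DataFrame(
    {
        "inputs": ["What was ACME's FY2023 revenue?"],
        "predictions": ["ACME reported $4.2B in FY2023 revenue."],
        "ground_truth": ["$4.2B"],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        data=golden,
        predictions="predictions",        # column holding the agent's answers
        targets="ground_truth",
        model_type="question-answering",
        extra_metrics=[mlflow.metrics.genai.answer_correctness()],  # LLM-judged metric
    )
    print(results.metrics)
```

Growing the golden set and adding judges is how "it's pretty good" turns into a number you can hill-climb.
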
00:17:43.820 | So that is kind of the stack
00:17:46.440 | and that's how we're helping
00:17:48.100 | organizations bring Gen AI
00:17:50.640 | and particularly bringing
00:17:52.300 | Claude into this space.
00:17:55.100 | Before we wrap up, though,
00:17:56.480 | I wanted to share,
00:17:57.460 | you know,
00:17:58.680 | there are these analysts out there,
00:18:01.520 | Gartner and Forrester
00:18:03.020 | and all these,
00:18:03.680 | they go around
00:18:04.600 | and they write report cards
00:18:07.020 | on how good
00:18:08.520 | all the vendors are,
00:18:10.560 | right?
00:18:10.960 | which vendors are the leaders
00:18:12.600 | and what have you.
00:18:13.340 | We do pretty well in these,
00:18:15.060 | but I'm really excited to say
00:18:18.880 | that we're now using ARIA
00:18:20.980 | to do these.
00:18:21.820 | So to give you a sense,
00:18:22.960 | the last time we filled out
00:18:24.660 | the Gartner one of these things,
00:18:26.060 | we ended up writing
00:18:27.140 | a 450 page document,
00:18:29.220 | right?
00:18:29.980 | They had 180 questions for us
00:18:31.740 | and we ended up passing back
00:18:33.040 | to them a 450 page document.
00:18:34.940 | So using Claude,
00:18:39.340 | we actually have taken
00:18:43.400 | our blogs,
00:18:44.160 | our docs,
00:18:45.080 | a whole bunch of the information
00:18:48.060 | about our system,
00:18:49.380 | as well as past answers
00:18:51.480 | we've written
00:18:52.020 | for these types of things
00:18:53.360 | and we've actually gotten it
00:18:55.080 | so that when Gartner
00:18:56.940 | or Forrester
00:18:57.800 | or what have you
00:18:58.500 | send us these questionnaires,
00:19:01.040 | we just run them
00:19:01.900 | through the bot.
00:19:03.620 | And, you know,
00:19:04.420 | I'll say the answers,
00:19:05.760 | we still read the answers over
00:19:07.580 | and we correct some of them
00:19:10.660 | some of the time,
00:19:12.360 | but the ability for us
00:19:14.480 | to do this
00:19:15.080 | has made it
00:19:15.680 | so that now
00:19:16.480 | instead of it being
00:19:17.800 | kind of hundreds of hours
00:19:20.000 | of product managers
00:19:21.140 | and engineers
00:19:21.920 | and marketing folks
00:19:22.920 | all kind of pounding
00:19:23.760 | on the keyboard
00:19:24.480 | to try and put something together,
00:19:26.240 | now we're just editing
00:19:27.840 | what I wouldn't even call
00:19:29.540 | a rough draft.
00:19:30.300 | We're editing
00:19:30.840 | what's pretty darn close
00:19:31.920 | to a final draft
00:19:33.080 | coming out of Claude
00:19:34.480 | and the reason why
00:19:35.900 | I have this up here
00:19:36.960 | is because we built this,
00:19:39.920 | this went through
00:19:42.640 | many iterations.
00:19:43.500 | We started with
00:19:44.500 | open source models,
00:19:45.340 | then we went to
00:19:47.480 | non-anthropic models
00:19:49.620 | of a different vendor
00:19:51.480 | and then we started
00:19:53.460 | using Claude
00:19:54.440 | and it wasn't until
00:19:55.780 | we started using Claude
00:19:57.160 | that we,
00:20:02.280 | for the first time,
00:20:03.540 | had results
00:20:04.500 | that we could ship
00:20:05.420 | without touching them
00:20:07.100 | and that was a huge win
00:20:08.960 | for us
00:20:09.520 | and so this is one of those things
00:20:12.720 | that I'm super excited about
00:20:14.200 | because it makes
00:20:14.700 | my life way better.
00:20:15.880 | We just published
00:20:18.080 | a blog on this,
00:20:19.180 | really exciting stuff
00:20:21.420 | if you have to spend
00:20:23.120 | your days filling out
00:20:24.180 | these darn questionnaires.
00:20:25.480 | Block is also
00:20:27.160 | a customer of ours
00:20:28.440 | and Block has built
00:20:29.840 | this open source system
00:20:30.960 | called Goose
00:20:31.680 | and if you haven't
00:20:32.300 | given Goose a try,
00:20:33.240 | you should take a look.
00:20:34.100 | As I said,
00:20:35.080 | it's open source.
00:20:35.680 | It's really an agentic dev environment
00:20:40.600 | to accelerate their developers.
00:20:45.940 | It basically has
00:20:46.940 | Claude built into it
00:20:48.220 | and it has connections
00:20:49.100 | to all of their systems
00:20:50.560 | and all of their data
00:20:51.660 | so that they can
00:20:52.760 | much more quickly
00:20:54.240 | and easily build,
00:20:58.240 | and really accelerate
00:20:59.460 | the developer experience
00:21:01.540 | far beyond
00:21:02.360 | kind of what we're all used to
00:21:03.720 | with code completion
00:21:04.640 | or something like that
00:21:05.860 | into a much more
00:21:07.180 | purpose-built system
00:21:08.480 | to be able to go attack
00:21:10.040 | improvement of their workflows
00:21:11.740 | and things like this.
00:21:12.780 | You can see 40 to 50%
00:21:15.040 | weekly user adoption increase,
00:21:16.700 | 8 to 10 hours saved per week
00:21:19.100 | by using this
00:21:20.360 | and it's been really exciting
00:21:22.480 | to see Block be this successful
00:21:26.400 | with Claude on Databricks,
00:21:28.160 | as well as to see Goose
00:21:30.640 | start to pick up in the market
00:21:32.500 | and more and more people
00:21:33.600 | playing around
00:21:34.580 | and starting to try out Goose.
00:21:36.160 | So those are just a couple
00:21:37.560 | of the areas
00:21:38.160 | where we've had success
00:21:39.200 | getting these models
00:21:41.700 | and these systems
00:21:42.500 | in production
00:21:43.060 | and creating value
00:21:43.980 | for customers.
00:21:44.660 | So I'll just end it with,
00:21:47.080 | you know,
00:21:47.580 | I'm sure everyone here
00:21:49.460 | is deep enough in this
00:21:50.740 | that I don't need to tell you
00:21:51.840 | to start identifying
00:21:52.760 | your AI use cases,
00:21:54.020 | but once you've identified
00:21:55.720 | those AI use cases
00:21:56.880 | and you've started
00:21:57.580 | to understand
00:21:58.320 | what success may look like,
00:21:59.880 | you know,
00:22:01.200 | contact us,
00:22:01.960 | reach out to Databricks,
00:22:03.060 | reach out to Anthropic,
00:22:04.640 | happy to work with you,
00:22:06.560 | either,
00:22:07.240 | you know,
00:22:08.000 | kind of in your
00:22:08.740 | professional capacity
00:22:09.660 | with the organizations
00:22:10.580 | you work for
00:22:11.440 | and really help them
00:22:13.240 | gain the confidence.
00:22:14.740 | The meeting I was in
00:22:16.200 | earlier today,
00:22:17.000 | as I was walking
00:22:19.540 | out of the meeting,
00:22:20.120 | the head of AI
00:22:20.820 | came running over to me
00:22:22.080 | and he said,
00:22:22.720 | I really appreciate
00:22:23.500 | the session today.
00:22:24.300 | And I said,
00:22:25.240 | no worries,
00:22:25.760 | like happy to present,
00:22:27.080 | happy to chat with you
00:22:28.040 | about what we're doing.
00:22:28.740 | He goes,
00:22:29.780 | it wasn't learning
00:22:30.460 | about your stuff
00:22:31.180 | that I appreciated.
00:22:32.000 | It was you telling
00:22:33.160 | our chief data officer
00:22:34.600 | how hard my job is
00:22:35.820 | that I really appreciated,
00:22:37.060 | right?
00:22:37.380 | And so let us know
00:22:38.900 | how we can help you
00:22:39.900 | in this journey.
00:22:41.540 | With that,
00:22:42.320 | I wanted to open it up
00:22:43.500 | to any questions.
00:22:44.440 | So your safe score
00:22:46.260 | in the evaluation,
00:22:47.220 | is that
00:22:48.020 | leveraging adversarial
00:22:50.640 | testing problems
00:22:57.400 | So the question was,
00:22:58.300 | is the safe score
00:22:59.700 | among our LLM judges
00:23:01.100 | kind of using a red teaming
00:23:02.820 | or a kind of adversarial
00:23:04.580 | technique
00:23:05.080 | or something like that?
00:23:06.300 | It's much more
00:23:07.040 | of a kind of,
00:23:07.780 | think of it more
00:23:08.600 | as like a guardrail
00:23:09.540 | type measure
00:23:10.200 | around,
00:23:10.760 | you know,
00:23:11.240 | was this response
00:23:13.160 | a green response
00:23:14.500 | or a comfortable response
00:23:16.720 | kind of thing?
00:23:17.340 | Any other questions?
00:23:19.040 | Yeah.
00:23:20.840 | Would you think
00:23:22.780 | like Minecraft Cloud
00:23:23.740 | is a competitor?
00:23:24.340 | Like,
00:23:24.920 | do you feel some good
00:23:25.500 | of that?
00:23:25.760 | Yeah,
00:23:26.260 | I mean,
00:23:26.740 | you know,
00:23:27.900 | some of the folks
00:23:28.620 | over,
00:23:28.960 | you know,
00:23:29.800 | it's tough.
00:23:30.600 | We have a bunch
00:23:33.660 | of competitors
00:23:34.220 | for point solutions
00:23:35.980 | within the Gen AI space,
00:23:37.540 | right?
00:23:37.720 | You know,
00:23:37.940 | eval,
00:23:38.380 | you could say,
00:23:39.100 | you know,
00:23:39.760 | it might be Galileo,
00:23:41.000 | it might be,
00:23:42.020 | you know,
00:23:42.520 | Patronus,
00:23:43.740 | it might be others
00:23:44.540 | kind of thing,
00:23:45.220 | right?
00:23:45.500 | You know,
00:23:46.800 | and so there's
00:23:47.260 | some point-specific folks.
00:23:48.700 | I think
00:23:49.820 | the way we think
00:23:50.840 | about this
00:23:51.380 | is much more
00:23:52.260 | that the value
00:23:53.480 | comes in the connection
00:23:54.640 | between the AI system
00:23:55.860 | and the data system.
00:23:56.740 | Like,
00:23:57.360 | having worked
00:23:58.220 | at both AWS
00:23:59.080 | and GCP,
00:24:00.260 | I can say,
00:24:00.800 | like,
00:24:01.160 | the reason I'm at Databricks
00:24:03.320 | is because
00:24:04.080 | there was a conversation
00:24:05.740 | I had
00:24:06.280 | while at Vertex
00:24:07.980 | where we were
00:24:08.780 | sitting there saying,
00:24:10.220 | with MLOps,
00:24:11.020 | we had taken
00:24:11.660 | an order of magnitude
00:24:12.680 | off the development time.
00:24:14.020 | Where would the next
00:24:15.340 | order of magnitude
00:24:16.380 | off development time
00:24:17.640 | come from?
00:24:18.160 | And it really,
00:24:18.620 | I believe,
00:24:19.880 | comes from being able
00:24:21.000 | to really integrate
00:24:22.780 | the AI
00:24:23.360 | and the data layers
00:24:24.620 | together much,
00:24:25.520 | much more intimately
00:24:26.840 | and deeply
00:24:27.420 | than we've seen
00:24:28.180 | from most of the
00:24:28.800 | hyperscalers.
00:24:29.920 | Any other
00:24:31.180 | last questions?
00:24:31.880 | Yeah.
00:24:32.160 | In one of your earlier slides,
00:24:34.720 | the customer investment
00:24:35.860 | is not on Claude yet.
00:24:37.140 | Yeah.
00:24:37.960 | It looks like
00:24:38.980 | they have many agents
00:24:40.180 | going together.
00:24:41.420 | Have you found
00:24:43.680 | that with Claude
00:24:44.540 | or with 3.7
00:24:45.940 | that it's like
00:24:46.380 | you don't need
00:24:47.120 | many agents,
00:24:47.840 | it can kind of
00:24:49.100 | what you see
00:24:49.560 | if they manage that?
00:24:50.660 | Yeah.
00:24:51.200 | I mean,
00:24:51.960 | that's certainly,
00:24:52.780 | you know,
00:24:53.220 | we often encourage
00:24:56.420 | companies
00:24:56.880 | to take a more
00:24:58.840 | kind of
00:24:59.680 | composable agentic
00:25:00.880 | approach
00:25:01.420 | and we often
00:25:02.200 | encourage them
00:25:02.900 | to do that
00:25:03.420 | simply because
00:25:04.320 | when you're trying
00:25:05.400 | to build these systems
00:25:06.440 | to behave
00:25:07.440 | deterministically
00:25:08.280 | in a higher risk
00:25:11.240 | environment,
00:25:12.120 | then you need
00:25:14.820 | to be able
00:25:15.580 | to tune them
00:25:16.560 | at a much
00:25:17.560 | more granular level
00:25:18.680 | and so,
00:25:19.620 | you know,
00:25:20.080 | our goal
00:25:20.920 | is really
00:25:21.360 | to drive
00:25:21.820 | as much entropy
00:25:22.600 | out of these systems
00:25:23.660 | as possible
00:25:24.380 | in trying to
00:25:25.580 | get this determinism
00:25:26.980 | and so,
00:25:27.440 | you know,
00:25:29.220 | I think 3.7,
00:25:31.880 | I haven't gotten
00:25:32.860 | to play with 4
00:25:33.780 | nearly enough yet
00:25:34.760 | but I think 3.7
00:25:35.760 | probably could do
00:25:37.980 | a lot of that
00:25:38.700 | but I guess
00:25:39.740 | my only concern
00:25:40.620 | would be
00:25:41.200 | if we did find errors,
00:25:43.280 | would we have
00:25:43.740 | the knobs
00:25:44.180 | to be able
00:25:44.580 | to go
00:25:44.840 | and get them
00:25:45.440 | beyond just
00:25:46.280 | swapping up
00:25:47.800 | the prompts,
00:25:48.420 | right?
00:25:48.860 | And that's,
00:25:49.680 | I think,
00:25:49.980 | where,
00:25:50.280 | you know,
00:25:50.940 | even as these models
00:25:52.240 | have gotten much larger,
00:25:53.260 | I'll tell you,
00:25:54.280 | one of the things 3.7
00:25:55.380 | that I've really
00:25:56.340 | appreciated about 3.7
00:25:57.580 | is that
00:25:58.140 | it does a great job
00:26:00.200 | of taking prompts
00:26:01.300 | to other models
00:26:02.280 | and decomposing them
00:26:03.600 | into each of the steps.
00:26:04.860 | Like,
00:26:05.360 | I can take it
00:26:06.000 | and say,
00:26:06.640 | if I needed
00:26:07.560 | to rewrite this
00:26:08.580 | where it was
00:26:10.320 | as many small
00:26:11.620 | granular steps
00:26:12.620 | as possible,
00:26:13.360 | then 3.7
00:26:15.380 | has done
00:26:16.420 | a great job
00:26:17.000 | of that.
00:26:17.340 | So listen,
00:26:18.000 | I appreciate
00:26:18.680 | all the time
00:26:19.880 | and attention today.
00:26:20.660 | I'll be back
00:26:21.520 | if you have
00:26:21.920 | other questions
00:26:22.660 | back by the door
00:26:23.540 | or back outside
00:26:24.180 | and thanks again
00:26:25.240 | for coming today.
00:26:25.880 | Thank you.