
Spotlight on Databricks | Code w/ Claude



00:00:00.000 | Thank you for sticking around in here, I suppose,
00:00:16.240 | would probably be the most apropos thing to say.
00:00:18.220 | Thank you for joining me today.
00:00:20.540 | I wanted to talk a little bit about how all of this technology
00:00:23.760 | actually gets into the path of value
00:00:25.980 | inside large organizations and large businesses.
00:00:29.640 | As it would turn out, the ability for us to go prototype cool stuff
00:00:34.420 | versus our ability to go and deliver these things into the critical path
00:00:37.800 | can vary widely.
00:00:39.120 | I'm Craig.
00:00:39.760 | I lead product management for Databricks,
00:00:42.280 | in case you hadn't figured that out yet.
00:00:44.040 | Been with Databricks for about three years.
00:00:46.600 | Before that I was at Google, where I led product
00:00:49.660 | for the founding of Vertex AI,
00:00:51.840 | and before that I was the founding general manager of AWS SageMaker.
00:00:55.800 | So I've been, as my wife says, continuing to strike out
00:00:59.920 | as I try and get better and better at helping enterprises build AI.
00:01:04.160 | But as we dive into this,
00:01:07.000 | I wanted to quickly just set a little bit of context on who Databricks is,
00:01:11.140 | why Databricks, and why Databricks is here talking to you,
00:01:14.360 | and what have you.
00:01:15.340 | We are a leading cross-cloud data platform,
00:01:21.020 | tens of thousands of customers, billions of dollars in revenue,
00:01:26.240 | and moreover, the creator of a number of
00:01:29.280 | very popular open source projects:
00:01:31.340 | Spark, MLflow, Delta, et cetera.
00:01:36.120 | You know, Brad, just a minute ago,
00:01:39.560 | talked about the importance of the model
00:01:41.880 | and then the data you bring to the model.
00:01:44.420 | And the enterprises we work with have a kind of nightmarish data scenario
00:01:49.540 | because, you know, you talk to these large multinational banks
00:01:53.940 | or something like that,
00:01:54.980 | and they've done dozens, if not scores of acquisitions over the years,
00:02:00.940 | and they have data on every cloud, in every possible vendor,
00:02:05.260 | in every possible service,
00:02:06.820 | and they're trying at this moment
00:02:09.000 | to figure out how to take advantage
00:02:12.140 | of this kind of transformational technological moment,
00:02:16.980 | but they're doing it with kind of a mess in the back end, if you will, right?
00:02:21.140 | And it turns out the problem is actually much worse than this
00:02:23.860 | because it's not like they just have one data warehouse
00:02:26.540 | or something like that.
00:02:27.500 | They often have many of them, right?
00:02:29.540 | And often the experts in one or two of these systems
00:02:33.620 | are only experts in one or two of these systems,
00:02:36.360 | and they don't know the other systems.
00:02:38.240 | So if you're stuck in your data warehouse
00:02:40.740 | or your streaming person isn't a Gen AI person,
00:02:43.800 | you may find yourself kind of locked out
00:02:46.180 | of being able to bring your data into these systems
00:02:50.720 | as easily as you want to.
00:02:51.840 | Now, I'm not going to go head on into Databricks.
00:02:53.980 | Databricks, ultimately, we help you manage your data,
00:02:56.900 | and then on top of that management of your data,
00:02:59.520 | we have a whole series of capabilities.
00:03:01.360 | I'm going to really focus on our AI capabilities with Mosaic AI today.
00:03:06.480 | Now, we think of this as a difference between what we call general intelligence
00:03:12.700 | and data intelligence.
00:03:13.940 | Both of these things are extraordinarily useful and extraordinarily important.
00:03:20.000 | But as Brad talked about, particularly for businesses or large enterprises,
00:03:27.000 | as they want to move into using this technology to automate more of their systems
00:03:32.560 | or drive greater insights within their organization,
00:03:35.780 | almost always it comes back to connecting it.
00:03:39.780 | We saw here Brad connecting it to the web or connecting it to MCP servers,
00:03:44.480 | but inevitably it comes back to trying to connect it to their data estate, right?
00:03:49.720 | So for a really good example of this, FactSet.
00:03:52.820 | I don't know if you guys have heard of FactSet.
00:03:54.300 | FactSet is a financial services company that sells data about other companies.
00:03:59.420 | They sell financial data about companies to banks and hedge funds and what have you.
00:04:04.080 | FactSet has their own query language,
00:04:07.560 | which is now a yellow flag to me when considering employers.
00:04:12.900 | If your employer has their own query language,
00:04:15.460 | you've got to think about whether or not this is the right place to be.
00:04:19.540 | Having said that, I did work at Google,
00:04:21.160 | who I think probably has a dozen of their own query languages.
00:04:23.600 | So FactSet had this problem and opportunity,
00:04:28.060 | which is that any customer they had who wanted to access their data,
00:04:32.380 | they had to learn FQL, FactSet Query Language, a creative name there.
00:04:37.740 | And so when this whole Gen AI craze started,
00:04:41.460 | these guys lost their minds with excitement
00:04:43.160 | because they thought,
00:04:43.860 | what if we could translate English into FactSet query language?
00:04:48.660 | And so they went to their favorite cloud of choice.
00:04:51.580 | They hit the one-click rag button.
00:04:54.600 | I think they did a little more than the one-click rag button,
00:04:57.160 | but they basically showed up with this massive prompt
00:05:00.900 | of a bunch of examples and a bunch of documentation
00:05:03.900 | and then a massive VectorDB of a bunch more examples
00:05:10.680 | and a bunch more documentation.
00:05:11.780 | And this is what they ended up with, right?
00:05:14.560 | They ended up with 59% accuracy in about 15 seconds of latency.
00:05:20.340 | And I share with you that latency metric,
00:05:22.540 | not just because it's an important customer experience metric
00:05:27.340 | and all of these kinds of things,
00:05:28.480 | but in this world of Gen AI,
00:05:30.040 | it's probably the closest thing we have to a cost metric, right?
00:05:33.920 | You're more or less paying for compute time.
00:05:35.700 | And so that 15 seconds is basically 15 seconds of cost, right?
00:05:39.560 | And 59% accuracy.
00:05:41.340 | With this, they showed up,
00:05:43.200 | they contacted us and said,
00:05:44.560 | hey, good news.
00:05:45.940 | We've got a Gen AI solution.
00:05:48.160 | Bad news.
00:05:49.500 | It's just slightly better than a coin flip kind of thing, right?
00:05:53.440 | And so we worked with them on this problem
00:05:56.580 | and tried to understand what the opportunity was,
00:06:00.600 | what the challenge was.
00:06:01.440 | And really what we did was we just decomposed the prompt
00:06:05.780 | into each of the individual tasks
00:06:09.740 | that that prompt was being asked to perform, right?
00:06:12.380 | So effectively what we did was we took that prompt
00:06:15.340 | and created kind of something of an agent,
00:06:18.100 | a multi-node, a multi-step chain or process
00:06:22.300 | to be able to solve this problem more wholly.
00:06:25.120 | And really the reason we did that was
00:06:28.520 | because it allowed us the opportunity
00:06:30.780 | to start tuning performance
00:06:32.420 | at each step of this problem, right?
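
(As a minimal sketch of that decomposition pattern, assuming the Anthropic Python SDK: the step names, prompts, and example store below are invented for illustration and are not FactSet's actual pipeline. The point is that each step becomes a separate function you can tune and measure on its own.)

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

def call_claude(system: str, user: str) -> str:
    """One LLM call, shared by every step of the chain (hypothetical helper)."""
    msg = client.messages.create(
        model="claude-3-7-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return msg.content[0].text

# Illustrative few-shot store; a real system would retrieve these from an index.
EXAMPLES = {"price_history": "Q: 5-year price history for ACME\nA: <query>...</query>"}

def classify_intent(question: str) -> str:
    # Step 1: route the question to one of a small set of intents.
    return call_claude("Reply with a single intent label for this financial question.", question)

def retrieve_examples(intent: str) -> str:
    # Step 2: fetch few-shot examples relevant to that intent.
    return EXAMPLES.get(intent.strip(), "")

def generate_query(question: str, examples: str) -> str:
    # Step 3: translate English into the target query language using the examples.
    return call_claude("Translate the question into the target query language.\n" + examples, question)

def validate(query: str) -> str:
    # Step 4: cheap deterministic checks before anything is returned to the user.
    if not query.strip():
        raise ValueError("empty query")
    return query

def answer(question: str) -> str:
    intent = classify_intent(question)
    return validate(generate_query(question, retrieve_examples(intent)))
```

Because each step is a plain function, accuracy and latency can be measured per step, and only the underperforming step's prompt, model, or retrieval strategy needs to change.
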
00:06:35.820 | And you can see we got them to 85% accuracy
00:06:38.760 | in six seconds of latency.
00:06:40.320 | At 85% accuracy, they did two things.
00:06:43.940 | They turned to us and they said,
00:06:45.440 | cool, we're comfortable showing this
00:06:47.540 | to our existing customers.
00:06:48.820 | And they said, we get how you're helping us.
00:06:52.320 | We don't want to pay you to help us anymore.
00:06:54.320 | We'll take it from here.
00:06:55.640 | Last I talked to them, they had it into the 90s,
00:06:58.820 | and transitioning to Claude
00:07:00.200 | was one of their next roadmap items.
00:07:04.980 | The reason I say all of this
00:07:07.620 | is because there's a paper out there
00:07:09.440 | from the Berkeley Artificial Intelligence Research Lab,
00:07:13.240 | which if you look into it,
00:07:15.520 | yes, there's a little bit of cross-pollination
00:07:17.460 | between us and Berkeley.
00:07:19.280 | But basically, right after Gen AI
00:07:23.100 | really hit its stride,
00:07:26.420 | the folks at Berkeley went out and looked at all the popular AI systems
00:07:29.840 | that are out in production today.
00:07:32.620 | And what they found was that none of these systems
00:07:36.800 | were as simple as a basic
00:07:42.580 | single-input, single-output system.
00:07:45.880 | These systems were all kind of very complex,
00:07:50.140 | multi-node, kind of multi-part systems
00:07:53.600 | that were being chained together
00:07:54.860 | to create really fantastic outcomes.
00:07:57.200 | So our goal at Databricks
00:07:59.360 | is really to simplify the creation
00:08:01.780 | of these kinds of capabilities
00:08:03.780 | for our customers.
00:08:05.440 | But very specifically,
00:08:07.680 | we want to do it on the areas
00:08:10.300 | where there is financial and reputational risk.
00:08:12.740 | If what you're wanting to do
00:08:14.020 | is build a chatbot for you and your buddies
00:08:16.340 | to kind of search over your documents
00:08:20.200 | or your emails or what have you,
00:08:23.160 | your recent PRDs in my case,
00:08:25.140 | great, go for it.
00:08:27.000 | One click rag away at that thing,
00:08:30.260 | kind of, or prompt away at that thing.
00:08:32.320 | But if what you want to do is build something
00:08:34.440 | that you trust putting into a situation
00:08:37.280 | of financial or reputational risk,
00:08:38.880 | then it takes some additional capabilities.
00:08:40.720 | And not only that,
00:08:41.880 | but one of the things we see,
00:08:43.700 | and I'm sure you've seen this as well,
00:08:45.020 | is that many of the folks out there
00:08:47.200 | who are developing these systems,
00:08:48.880 | they're trying to develop
00:08:52.140 | deterministic systems
00:08:53.540 | using the most probabilistic portion
00:08:56.380 | of their entire software stack, right?
00:08:59.080 | And so one of the pieces of this
00:09:01.740 | is how do we help them
00:09:04.740 | consistently drive those levels
00:09:06.880 | of repeatable determinism?
00:09:08.520 | And we think it comes down to two things.
00:09:10.620 | All else being equal,
00:09:12.700 | we think it comes down to governance,
00:09:14.240 | making sure you can control
00:09:16.140 | at the tightest levels,
00:09:17.520 | at the lowest grain,
00:09:18.640 | what this thing has access to and can do.
00:09:21.160 | And then evaluation.
00:09:23.300 | I was super excited.
00:09:25.460 | I met with a company this morning,
00:09:27.280 | a global logistics provider,
00:09:29.120 | and it was one of the first times
00:09:30.940 | I had met with a customer who said,
00:09:32.780 | hey, we built this system,
00:09:34.120 | and it's like 85% accurate.
00:09:36.320 | And it was such a joy,
00:09:38.400 | because usually people say,
00:09:39.680 | hey, we built this system,
00:09:40.620 | we have it in production,
00:09:41.400 | we're super proud of it.
00:09:42.420 | And I say, how accurate is it?
00:09:43.980 | And they go, oh, it's pretty good.
00:09:45.080 | And so being able to really start to quantify
00:09:49.100 | and hill climb that,
00:09:50.160 | we believe is critical.
00:09:51.140 | So governance, what are we talking about?
00:09:52.740 | We're talking about really governing the access,
00:09:55.740 | treating these agents,
00:09:57.640 | or these prototype agents we're building,
00:10:00.480 | as principals within our data stack,
00:10:03.780 | and governing every single aspect of that.
00:10:07.300 | Now, on Databricks,
00:10:08.180 | we don't just govern your data.
00:10:10.860 | We also govern access to the models, right?
00:10:14.780 | And we govern tools, right?
00:10:17.660 | And we govern queries.
00:10:19.140 | So we govern access to the data,
00:10:20.740 | we govern access to the models,
00:10:22.220 | we govern access
00:10:23.080 | to all of the pieces.
00:10:24.160 | The one piece we don't yet govern
00:10:28.820 | is MCP servers.
00:10:31.300 | But stick with us.
00:10:32.340 | We have a conference in a few weeks.
00:10:33.620 | You might come check it out.
00:10:34.940 | And hopefully, we'll have news for you there.
00:10:37.320 | So how do we get all of this to reason over your data?
00:10:41.040 | And, you know,
00:10:42.460 | we do that by injecting it with either the vector store
00:10:45.220 | or the feature store.
00:10:46.860 | And then we, as I said,
00:10:49.140 | we govern all of the aspects,
00:10:50.540 | whether it's the data,
00:10:51.580 | the models,
00:10:52.220 | the tools.
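
(A rough sketch of the "inject it with the vector store" step, assuming the databricks-vectorsearch client; the endpoint name, index name, columns, and response shape are placeholders and may differ by library version.)

```python
from databricks.vector_search.client import VectorSearchClient

# Placeholder names; substitute your own endpoint, index, and columns.
vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="my_vs_endpoint",
    index_name="main.analytics.product_docs_index",
)

hits = index.similarity_search(
    query_text="How do I rotate credentials for the billing service?",
    columns=["doc_id", "chunk_text"],
    num_results=5,
)

# Stitch the retrieved chunks into the context that gets handed to the model.
context = "\n\n".join(row[1] for row in hits["result"]["data_array"])
```

The same shape applies for the feature store: look up governed features for the entity in question and pass them to the model as context rather than letting the model guess.
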
00:10:52.880 | And I want to stop for a second
00:10:54.420 | and talk about tools and tool calling.
00:10:56.320 | Because we saw some of it just a second ago
00:10:59.000 | in Brad's demos.
00:10:59.880 | And tool calling,
00:11:01.520 | when it comes to trying to build
00:11:03.460 | a deterministic system,
00:11:05.720 | usually what we actually see
00:11:08.940 | is someone
00:11:12.820 | using an LLM as a classifier
00:11:15.600 | to choose one of six or eight paths,
00:11:19.080 | right?
00:11:19.920 | One of six or eight tools.
00:11:21.140 | And those tools may be agents.
00:11:23.080 | Those tools may be SQL queries.
00:11:24.760 | Those tools are any sort of
00:11:26.320 | parameterizable function kind of thing, right?
00:11:28.420 | So we see them creating access to these tools.
00:11:31.240 | And then what do we see?
00:11:33.460 | We often see the next layer,
00:11:34.820 | another set of agents
00:11:38.260 | choosing between a set of tools.
00:11:41.240 | And so they end up
00:11:42.340 | with this massive decision tree,
00:11:44.020 | which is great
00:11:45.000 | from a kind of deterministic perspective
00:11:46.880 | on really reducing the entropy
00:11:49.320 | in these systems.
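
(Here is a bare-bones sketch of that LLM-as-classifier pattern with the Anthropic Python SDK; the tool names and schemas are invented for illustration, and in practice each tool might wrap a SQL query, another agent, or any parameterizable function.)

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

tools = [
    {
        "name": "lookup_order_status",
        "description": "Look up the status of a customer order by order id.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "name": "run_revenue_report",
        "description": "Run a parameterized revenue report for a given fiscal quarter.",
        "input_schema": {
            "type": "object",
            "properties": {"quarter": {"type": "string"}},
            "required": ["quarter"],
        },
    },
]

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # placeholder model id
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": "What's the status of order 48-1192?"}],
)

# The model's tool choice is the classification; dispatch on it deterministically.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

Each selected tool can itself front another set of tools, which is how the decision trees described here get layered up.
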
00:11:50.180 | The challenge for us
00:11:52.040 | was that before we had
00:11:54.340 | this relationship with Anthropic,
00:11:55.960 | we were talking to people
00:11:57.880 | about this stuff.
00:11:58.840 | But the tool calling
00:12:00.540 | just wasn't where it needed to be.
00:12:02.860 | You would have these moments
00:12:04.260 | where it would be unbelievably obvious
00:12:06.260 | what tools should be called,
00:12:07.640 | and the models would consistently
00:12:11.700 | not get it right.
00:12:13.020 | With Claude,
00:12:15.280 | that has changed completely, right?
00:12:18.440 | We now see
00:12:19.700 | that the ability of these systems
00:12:21.880 | to do tool calling
00:12:23.200 | really becomes
00:12:25.220 | the way in which
00:12:26.980 | software development engineers
00:12:29.160 | and app engineers
00:12:31.280 | can start building
00:12:33.400 | these quasi-deterministic systems
00:12:35.960 | using a highly probabilistic backend.
00:12:38.780 | Claude really in many ways
00:12:40.540 | completes this puzzle for us
00:12:42.580 | by giving us
00:12:43.540 | that frontier LLM
00:12:45.000 | available directly
00:12:46.280 | inside Databricks
00:12:47.500 | that has all
00:12:49.920 | of the capabilities needed
00:12:51.360 | to really superpower
00:12:52.780 | the use cases
00:12:54.200 | that our customers
00:12:54.900 | are putting together.
00:12:55.740 | So why Claude
00:12:57.140 | and Databricks together?
00:12:58.220 | First of all,
00:12:58.880 | Claude is natively available
00:13:00.260 | on Databricks
00:13:01.040 | on any cloud, right?
00:13:03.140 | So on Azure,
00:13:03.800 | on AWS,
00:13:04.460 | on GCP,
00:13:05.140 | you can call Claude
00:13:06.660 | within your Databricks instance.
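
(A minimal sketch of what calling Claude from a workspace can look like, assuming the OpenAI-compatible client pointed at Databricks model serving; the workspace URL, token variable, and endpoint name are placeholders, and the endpoints actually available depend on your workspace.)

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # personal access token (placeholder)
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

resp = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",  # assumed endpoint name; check your workspace
    messages=[
        {"role": "system", "content": "You answer questions over our governed sales data."},
        {"role": "user", "content": "Summarize last quarter's pipeline in three bullets."},
    ],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```
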
00:13:08.080 | You can build
00:13:09.220 | state-of-the-art agents
00:13:10.940 | on Databricks
00:13:11.740 | powered by Claude,
00:13:12.520 | and then fundamentally
00:13:13.540 | you can connect Claude.
00:13:15.140 | You know,
00:13:15.580 | the vast majority
00:13:16.420 | of folks who are using Databricks
00:13:17.860 | are much lower-level
00:13:19.420 | data engineers
00:13:20.260 | and what have you,
00:13:20.960 | building out
00:13:21.660 | kind of massive schemas
00:13:23.260 | and building out
00:13:23.840 | massive governance policies,
00:13:25.200 | systems,
00:13:26.160 | and what have you.
00:13:27.120 | And you can use Claude
00:13:28.640 | as a principal
00:13:30.620 | within that system,
00:13:32.760 | right?
00:13:33.380 | And as you can see,
00:13:34.480 | including MCP servers
00:13:36.620 | coming soon.
00:13:37.820 | So why use it with us,
00:13:40.840 | right?
00:13:41.080 | Well,
00:13:41.560 | it really comes down
00:13:42.440 | to really pairing
00:13:43.720 | the strongest model
00:13:45.080 | with the strongest platform,
00:13:46.640 | using it
00:13:47.640 | in a fully controlled environment,
00:13:48.800 | right?
00:13:49.040 | You know,
00:13:49.480 | when you talk
00:13:50.040 | to these companies,
00:13:50.780 | I was sitting
00:13:51.720 | with a collection
00:13:53.020 | of banks recently.
00:13:54.660 | There were 10 or 12 banks
00:13:56.720 | at the table.
00:13:57.340 | We were all talking
00:13:58.440 | about what they were working on.
00:13:59.660 | I think more than half
00:14:01.280 | the banks in the room
00:14:02.560 | were prototyping
00:14:04.220 | on Claude
00:14:04.860 | as we spoke.
00:14:05.640 | One of the banks
00:14:06.980 | did raise their hand
00:14:07.720 | and said,
00:14:08.040 | we're not allowed
00:14:08.760 | to use any of this
00:14:09.600 | generative stuff
00:14:10.380 | in this industry,
00:14:11.180 | at which point he was laughed at
00:14:13.260 | by the others in the room
00:14:14.400 | who were working
00:14:15.020 | on these things.
00:14:15.940 | And the real difference
00:14:18.400 | was what that guy
00:14:19.380 | was saying was,
00:14:20.580 | we don't have
00:14:21.080 | the controls in place
00:14:22.920 | to use this
00:14:23.940 | within our own organization,
00:14:25.240 | whereas the banks,
00:14:26.560 | the hospitals,
00:14:27.200 | the other highly governed,
00:14:28.660 | highly regulated areas
00:14:29.900 | that have gone through this
00:14:31.280 | now have full access
00:14:32.680 | to this technology
00:14:33.720 | and no longer need
00:14:34.660 | to wait for the technology
00:14:36.060 | to kind of come to them,
00:14:37.720 | right?
00:14:39.040 | Commercially,
00:14:39.860 | there are some advantages.
00:14:40.940 | Scale
00:14:41.880 | and operational capabilities
00:14:43.920 | really add to the reasons
00:14:45.340 | for why to use it all together.
00:14:48.400 | Together,
00:14:49.980 | we enable these
00:14:51.160 | really high-value use cases.
00:14:53.840 | And one of the great things
00:14:55.940 | about sitting at the intersection
00:14:58.080 | of Gen AI
00:14:58.960 | and enterprise
00:14:59.880 | is getting to see
00:15:01.140 | these high-value use cases
00:15:02.620 | that kind of come through
00:15:04.240 | and really,
00:15:05.160 | you know,
00:15:05.820 | give us the confidence
00:15:08.180 | to see that this technology
00:15:10.180 | is not going to be
00:15:11.460 | another kind of three-year
00:15:14.580 | flash in the pan,
00:15:15.600 | but is really going to end up
00:15:17.660 | changing the way we all work.
00:15:19.240 | And we can now see
00:15:20.520 | that coming to fruition
00:15:21.760 | in some of these organizations
00:15:23.600 | we're working with.
00:15:25.620 | I had said at the beginning
00:15:26.920 | that this comes down
00:15:28.560 | to governance
00:15:29.080 | and evaluation,
00:15:29.980 | right?
00:15:31.400 | and for us,
00:15:32.560 | one is not complete
00:15:34.340 | without the other.
00:15:35.360 | You can lock these things down.
00:15:37.200 | You can control
00:15:37.840 | what they have access to.
00:15:38.820 | You can control
00:15:39.560 | how they're going to operate
00:15:41.060 | within your data estate.
00:15:42.480 | But if you're not measuring
00:15:44.380 | the quality of the system,
00:15:47.160 | then you're really not going to know
00:15:49.620 | whether or not
00:15:50.480 | this system you've built
00:15:52.320 | is high enough quality
00:15:53.620 | to be able to start putting in
00:15:55.200 | to those higher risk
00:15:56.660 | use cases
00:15:57.460 | without necessarily
00:15:58.860 | a human approval
00:15:59.780 | in the loop
00:16:00.380 | at every step,
00:16:01.300 | right?
00:16:01.860 | And so that's where
00:16:02.920 | eval comes in.
00:16:03.900 | This is our eval platform,
00:16:05.280 | by the way.
00:16:05.800 | You bring in a golden data set.
00:16:07.340 | We have a series
00:16:08.280 | of LLM judges
00:16:09.420 | that help determine
00:16:10.860 | whether or not
00:16:11.720 | your performance
00:16:12.460 | is what it needs to be.
00:16:13.900 | And you can use this.
00:16:15.800 | By the way,
00:16:16.280 | this whole system
00:16:17.120 | has a secondary UI
00:16:18.520 | for your subject matter expert.
00:16:20.480 | Time and again,
00:16:21.240 | we see the app developers
00:16:22.560 | building these systems
00:16:23.740 | are not necessarily
00:16:25.180 | the subject matter experts
00:16:26.540 | on these topics.
00:16:27.460 | And so having a simplified UI
00:16:29.560 | for that subject matter expert
00:16:31.200 | to be able to kind of quickly
00:16:32.680 | and easily give context
00:16:35.280 | or correct a prompt
00:16:37.720 | or create a better answer
00:16:39.320 | is critical.
00:16:40.260 | But this is how we start
00:16:42.180 | down this path
00:16:44.060 | of gaining confidence
00:16:45.180 | that these systems
00:16:46.640 | can perform in robust,
00:16:48.120 | higher-risk situations.
00:16:52.180 | you know,
00:16:52.440 | I had a guy
00:16:54.080 | the other day
00:16:55.300 | who said,
00:16:56.080 | you know,
00:16:57.360 | oh, you're just unit
00:16:58.160 | testing the agent.
00:16:59.040 | And I kind of said,
00:17:01.060 | well, I'd like to think
00:17:01.880 | it's more clever than that,
00:17:03.080 | but yeah,
00:17:03.860 | you know,
00:17:04.640 | more or less, right?
00:17:05.680 | You know,
00:17:06.220 | it's really about searching
00:17:07.680 | across the question space
00:17:09.760 | that this system is expected
00:17:14.240 | to go after,
00:17:15.060 | and then diving in
00:17:16.580 | at the most granular levels
00:17:18.140 | to ensure that this system
00:17:19.460 | is performing.
00:17:20.040 | Now, this eval system,
00:17:21.620 | I should say,
00:17:22.320 | a lot of it is open source
00:17:24.420 | in MLflow.
00:17:25.520 | The LLM judges are not,
00:17:27.520 | but a lot of the capabilities
00:17:28.960 | here can be run.
00:17:30.100 | Whether or not
00:17:30.980 | you're using Databricks,
00:17:32.240 | you can use open source MLflow
00:17:34.220 | to do these evals,
00:17:35.300 | or, if you're using Databricks,
00:17:37.640 | you can hook it up
00:17:38.320 | and gain all the value
00:17:39.220 | of some of our custom judges
00:17:40.540 | and what have you,
00:17:41.760 | right?
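
(For a flavor of the open source piece, here is a rough sketch using MLflow 2.x evaluation against a small golden dataset; the data, the pre-computed agent answers, and the choice of LLM-judged metric are illustrative, and argument names can vary across MLflow versions. The hosted judges mentioned above are Databricks-specific.)

```python
import mlflow
import pandas as pd

# Illustrative golden dataset: questions, the agent's answers, and known-good answers.
golden = pd.DataFrame(
    {
        "inputs": ["What was ACME's FY2023 revenue?"],
        "predictions": ["ACME reported $4.2B in FY2023 revenue."],
        "ground_truth": ["$4.2B"],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        data=golden,
        predictions="predictions",        # column holding the agent's answers
        targets="ground_truth",
        model_type="question-answering",
        extra_metrics=[mlflow.metrics.genai.answer_correctness()],  # LLM-judged metric
    )
    print(results.metrics)
```

Growing the golden set and adding judges is how "it's pretty good" turns into a number you can hill-climb.
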
00:17:43.820 | So that is kind of the stack
00:17:46.440 | and that's how we're helping
00:17:48.100 | organizations bring Gen AI
00:17:50.640 | and particularly bringing
00:17:52.300 | Claude into this space.
00:17:55.100 | Before we wrap up, though,
00:17:56.480 | I wanted to share,
00:17:57.460 | you know,
00:17:58.680 | there are these analysts out there,
00:18:01.520 | Gartner and Forrester
00:18:03.020 | and all these,
00:18:03.680 | they go around
00:18:04.600 | and they write report cards
00:18:07.020 | on how good
00:18:08.520 | all the vendors are,
00:18:10.560 | right?
00:18:10.960 | which vendors are the leaders
00:18:12.600 | and what have you.
00:18:13.340 | We do pretty well in these,
00:18:15.060 | but I'm really excited to say
00:18:18.880 | that we're now using ARIA
00:18:20.980 | to do these.
00:18:21.820 | So to give you a sense,
00:18:22.960 | the last time we filled out
00:18:24.660 | the Gartner one of these things,
00:18:26.060 | we ended up writing
00:18:27.140 | a 450 page document,
00:18:29.220 | right?
00:18:29.980 | They had 180 questions for us
00:18:31.740 | and we ended up passing back
00:18:33.040 | to them a 450 page document.
00:18:34.940 | So using Claude,
00:18:39.340 | we actually have taken
00:18:43.400 | our blogs,
00:18:44.160 | our docs,
00:18:45.080 | a whole bunch of the information
00:18:48.060 | about our system,
00:18:49.380 | as well as past answers
00:18:51.480 | we've written
00:18:52.020 | for these types of things
00:18:53.360 | and we've actually gotten it
00:18:55.080 | so that when Gartner
00:18:56.940 | or Forrester
00:18:57.800 | or what have you
00:18:58.500 | send us these questionnaires,
00:19:01.040 | we just run them
00:19:01.900 | through the bot.
00:19:03.620 | And, you know,
00:19:04.420 | I'll say the answers,
00:19:05.760 | we still read the answers over
00:19:07.580 | and we correct some of them
00:19:10.660 | some of the time,
00:19:12.360 | but the ability for us
00:19:14.480 | to do this
00:19:15.080 | has made it
00:19:15.680 | so that now
00:19:16.480 | instead of it being
00:19:17.800 | kind of hundreds of hours
00:19:20.000 | of product managers
00:19:21.140 | and engineers
00:19:21.920 | and marketing folks
00:19:22.920 | all kind of pounding
00:19:23.760 | on the keyboard
00:19:24.480 | to try and put something together,
00:19:26.240 | now we're just editing
00:19:27.840 | what I wouldn't even call
00:19:29.540 | a rough draft.
00:19:30.300 | We're editing
00:19:30.840 | what's pretty darn close
00:19:31.920 | to a final draft
00:19:33.080 | coming out of Claude
00:19:34.480 | and the reason why
00:19:35.900 | I have this up here
00:19:36.960 | is because we built this,
00:19:39.920 | this went through
00:19:42.640 | many iterations.
00:19:43.500 | We started with
00:19:44.500 | open source models,
00:19:45.340 | then we went to
00:19:47.480 | non-anthropic models
00:19:49.620 | of a different vendor
00:19:51.480 | and then we started
00:19:53.460 | using Claude
00:19:54.440 | and it wasn't until
00:19:55.780 | we started using Claude
00:19:57.160 | that we,
00:20:02.280 | for the first time,
00:20:03.540 | had results
00:20:04.500 | that we could ship
00:20:05.420 | without touching them
00:20:07.100 | and that was a huge win
00:20:08.960 | for us
00:20:09.520 | and so this is one of those things
00:20:12.720 | that I'm super excited about
00:20:14.200 | because it makes
00:20:14.700 | my life way better.
00:20:15.880 | We just published
00:20:18.080 | a blog on this,
00:20:19.180 | really exciting stuff
00:20:21.420 | if you have to spend
00:20:23.120 | your days filling out
00:20:24.180 | these darn questionnaires.
00:20:25.480 | Block is also
00:20:27.160 | a customer of ours
00:20:28.440 | and Block has built
00:20:29.840 | this open source system
00:20:30.960 | called Goose
00:20:31.680 | and if you haven't
00:20:32.300 | given Goose a try,
00:20:33.240 | you should take a look.
00:20:34.100 | As I said,
00:20:35.080 | it's open source.
00:20:35.680 | It's really an agentic dev environment
00:20:40.600 | to accelerate their developers.
00:20:45.940 | It basically has
00:20:46.940 | Claude built into it
00:20:48.220 | and it has connections
00:20:49.100 | to all of their systems
00:20:50.560 | and all of their data
00:20:51.660 | so that they can
00:20:52.760 | much more quickly
00:20:54.240 | and easily build,
00:20:58.240 | and really accelerate
00:20:59.460 | the developer experience
00:21:01.540 | far beyond
00:21:02.360 | kind of what we're all used to
00:21:03.720 | with code completion
00:21:04.640 | or something like that
00:21:05.860 | into a much more
00:21:07.180 | purpose-built system
00:21:08.480 | to be able to go attack
00:21:10.040 | improvement of their workflows
00:21:11.740 | and things like this.
00:21:12.780 | You can see 40 to 50%
00:21:15.040 | weekly user adoption increase,
00:21:16.700 | 8 to 10 hours saved per week
00:21:19.100 | by using this
00:21:20.360 | and it's been really exciting
00:21:22.480 | to see Block be this successful
00:21:26.400 | with Claude on Databricks,
00:21:28.160 | as well as to see Goose
00:21:30.640 | start to pick up in the market
00:21:32.500 | and more and more people
00:21:33.600 | playing around
00:21:34.580 | and starting to try out Goose.
00:21:36.160 | So those are just a couple
00:21:37.560 | of the areas
00:21:38.160 | where we've had success
00:21:39.200 | getting these models
00:21:41.700 | and these systems
00:21:42.500 | in production
00:21:43.060 | and creating value
00:21:43.980 | for customers.
00:21:44.660 | So I'll just end it with,
00:21:47.080 | you know,
00:21:47.580 | I'm sure everyone here
00:21:49.460 | is deep enough in this
00:21:50.740 | that I don't need to tell you
00:21:51.840 | to start identifying
00:21:52.760 | your AI use cases,
00:21:54.020 | but once you've identified
00:21:55.720 | those AI use cases
00:21:56.880 | and you've started
00:21:57.580 | to understand
00:21:58.320 | what success may look like,
00:21:59.880 | you know,
00:22:01.200 | contact us,
00:22:01.960 | reach out to Databricks,
00:22:03.060 | reach out to Anthropic,
00:22:04.640 | happy to work with you,
00:22:06.560 | either,
00:22:07.240 | you know,
00:22:08.000 | kind of in your
00:22:08.740 | professional capacity
00:22:09.660 | with the organizations
00:22:10.580 | you work for
00:22:11.440 | and really help them
00:22:13.240 | gain the confidence.
00:22:14.740 | The meeting I was in
00:22:16.200 | earlier today,
00:22:17.000 | as I was walking
00:22:19.540 | out of the meeting,
00:22:20.120 | the head of AI
00:22:20.820 | came running over to me
00:22:22.080 | and he said,
00:22:22.720 | I really appreciate
00:22:23.500 | the session today.
00:22:24.300 | And I said,
00:22:25.240 | no worries,
00:22:25.760 | like happy to present,
00:22:27.080 | happy to chat with you
00:22:28.040 | about what we're doing.
00:22:28.740 | He goes,
00:22:29.780 | it wasn't learning
00:22:30.460 | about your stuff
00:22:31.180 | that I appreciated.
00:22:32.000 | It was you telling
00:22:33.160 | our chief data officer
00:22:34.600 | how hard my job is
00:22:35.820 | that I really appreciated,
00:22:37.060 | right?
00:22:37.380 | And so let us know
00:22:38.900 | how we can help you
00:22:39.900 | in this journey.
00:22:41.540 | With that,
00:22:42.320 | I wanted to open it up
00:22:43.500 | to any questions.
00:22:44.440 | So your safe score
00:22:46.260 | in the evaluation,
00:22:47.220 | is that
00:22:48.020 | leveraging adversarial
00:22:50.640 | testing problems
00:22:57.400 | So the question was,
00:22:58.300 | is the safe score
00:22:59.700 | among our LLM judges
00:23:01.100 | kind of using a red teaming
00:23:02.820 | or a kind of adversarial
00:23:04.580 | technique
00:23:05.080 | or something like that?
00:23:06.300 | It's much more
00:23:07.040 | of a kind of,
00:23:07.780 | think of it more
00:23:08.600 | as like a guardrail
00:23:09.540 | type measure
00:23:10.200 | around,
00:23:10.760 | you know,
00:23:11.240 | was this response
00:23:13.160 | a green response
00:23:14.500 | or a comfortable response
00:23:16.720 | kind of thing?
00:23:17.340 | Any other questions?
00:23:19.040 | Yeah.
00:23:20.840 | Would you think
00:23:22.780 | like Minecraft Cloud
00:23:23.740 | is a competitor?
00:23:24.340 | Like,
00:23:24.920 | do you feel some good
00:23:25.500 | of that?
00:23:25.760 | Yeah,
00:23:26.260 | I mean,
00:23:26.740 | you know,
00:23:27.900 | some of the folks
00:23:28.620 | over,
00:23:28.960 | you know,
00:23:29.800 | it's tough.
00:23:30.600 | We have a bunch
00:23:33.660 | of competitors
00:23:34.220 | for point solutions
00:23:35.980 | within the Gen AI space,
00:23:37.540 | right?
00:23:37.720 | You know,
00:23:37.940 | eval,
00:23:38.380 | you could say,
00:23:39.100 | you know,
00:23:39.760 | it might be Galileo,
00:23:41.000 | it might be,
00:23:42.020 | you know,
00:23:42.520 | Patronus,
00:23:43.740 | it might be others
00:23:44.540 | kind of thing,
00:23:45.220 | right?
00:23:45.500 | You know,
00:23:46.800 | and so there's
00:23:47.260 | some point-specific folks.
00:23:48.700 | I think
00:23:49.820 | the way we think
00:23:50.840 | about this
00:23:51.380 | is much more
00:23:52.260 | that the value
00:23:53.480 | comes in the connection
00:23:54.640 | between the AI system
00:23:55.860 | and the data system.
00:23:56.740 | Like,
00:23:57.360 | having worked
00:23:58.220 | at both AWS
00:23:59.080 | and GCP,
00:24:00.260 | I can say,
00:24:00.800 | like,
00:24:01.160 | the reason I'm at Databricks
00:24:03.320 | is because
00:24:04.080 | there was a conversation
00:24:05.740 | I had
00:24:06.280 | while at Vertex
00:24:07.980 | where we were
00:24:08.780 | sitting there saying,
00:24:10.220 | with MLOps,
00:24:11.020 | we had taken
00:24:11.660 | an order of magnitude
00:24:12.680 | off the development time.
00:24:14.020 | Where would the next
00:24:15.340 | order of magnitude
00:24:16.380 | off development time
00:24:17.640 | come from?
00:24:18.160 | And it really,
00:24:18.620 | I believe,
00:24:19.880 | comes from being able
00:24:21.000 | to really integrate
00:24:22.780 | the AI
00:24:23.360 | and the data layers
00:24:24.620 | together much,
00:24:25.520 | much more intimately
00:24:26.840 | and deeply
00:24:27.420 | than we've seen
00:24:28.180 | from most of the
00:24:28.800 | hyperscalers.
00:24:29.920 | Any other
00:24:31.180 | last questions?
00:24:31.880 | Yeah.
00:24:32.160 | In one of your earlier slides,
00:24:34.720 | the customer investment
00:24:35.860 | is not on Claude yet.
00:24:37.140 | Yeah.
00:24:37.960 | It looks like
00:24:38.980 | they have many agents
00:24:40.180 | going together.
00:24:41.420 | Have you found
00:24:43.680 | that with Claude
00:24:44.540 | or with 3.7
00:24:45.940 | that it's like
00:24:46.380 | you don't need
00:24:47.120 | many agents,
00:24:47.840 | it can kind of
00:24:49.100 | what you see
00:24:49.560 | if they manage that?
00:24:50.660 | Yeah.
00:24:51.200 | I mean,
00:24:51.960 | that's certainly,
00:24:52.780 | you know,
00:24:53.220 | we often encourage
00:24:56.420 | companies
00:24:56.880 | to take a more
00:24:58.840 | kind of
00:24:59.680 | composable agentic
00:25:00.880 | approach
00:25:01.420 | and we often
00:25:02.200 | encourage them
00:25:02.900 | to do that
00:25:03.420 | simply because
00:25:04.320 | when you're trying
00:25:05.400 | to build these systems
00:25:06.440 | to behave
00:25:07.440 | deterministically
00:25:08.280 | in a higher risk
00:25:11.240 | environment,
00:25:12.120 | then you need
00:25:14.820 | to be able
00:25:15.580 | to tune them
00:25:16.560 | at a much
00:25:17.560 | more granular level
00:25:18.680 | and so,
00:25:19.620 | you know,
00:25:20.080 | our goal
00:25:20.920 | is really
00:25:21.360 | to drive
00:25:21.820 | as much entropy
00:25:22.600 | out of these systems
00:25:23.660 | as possible
00:25:24.380 | in trying to
00:25:25.580 | get this determinism
00:25:26.980 | and so,
00:25:27.440 | you know,
00:25:29.220 | I think 3.7,
00:25:31.880 | I haven't gotten
00:25:32.860 | to play with 4
00:25:33.780 | nearly enough yet
00:25:34.760 | but I think 3.7
00:25:35.760 | probably could do
00:25:37.980 | a lot of that
00:25:38.700 | but I guess
00:25:39.740 | my only concern
00:25:40.620 | would be
00:25:41.200 | if we did find errors,
00:25:43.280 | would we have
00:25:43.740 | the knobs
00:25:44.180 | to be able
00:25:44.580 | to go
00:25:44.840 | and get them
00:25:45.440 | beyond just
00:25:46.280 | swapping up
00:25:47.800 | the prompts,
00:25:48.420 | right?
00:25:48.860 | And that's,
00:25:49.680 | I think,
00:25:49.980 | where,
00:25:50.280 | you know,
00:25:50.940 | even as these models
00:25:52.240 | have gotten much larger,
00:25:53.260 | I'll tell you,
00:25:54.280 | one of the things 3.7
00:25:55.380 | that I've really
00:25:56.340 | appreciated about 3.7
00:25:57.580 | is that
00:25:58.140 | it does a great job
00:26:00.200 | of taking prompts
00:26:01.300 | to other models
00:26:02.280 | and decomposing them
00:26:03.600 | into each of the steps.
00:26:04.860 | Like,
00:26:05.360 | I can take it
00:26:06.000 | and say,
00:26:06.640 | if I needed
00:26:07.560 | to rewrite this
00:26:08.580 | where it was
00:26:10.320 | as many small
00:26:11.620 | granular steps
00:26:12.620 | as possible,
00:26:13.360 | then 3.7
00:26:15.380 | has done
00:26:16.420 | a great job
00:26:17.000 | of that.
00:26:17.340 | So listen,
00:26:18.000 | I appreciate
00:26:18.680 | all the time
00:26:19.880 | and attention today.
00:26:20.660 | I'll be back
00:26:21.520 | if you have
00:26:21.920 | other questions
00:26:22.660 | back by the door
00:26:23.540 | or back outside
00:26:24.180 | and thanks again
00:26:25.240 | for coming today.
00:26:25.880 | Thank you.