Forget RAG Pipelines—Build Production Ready Agents in 15 Mins: Nina Lopatina, Rajiv Shah, Contextual

00:00:00.000 |
welcome everybody thanks for showing up bright and early this morning we're gonna make your 00:00:18.840 |
life easier for RAG so just hang in there if you want to get started we have the link right 00:00:24.480 |
there contextual.ai/aie25 that's gonna get you a notebook it's gonna get you the getting started 00:00:31.980 |
link which will get you an API key that you'll be able to use for the notebook so feel free to jump 00:00:36.660 |
ahead and start on those pieces like that all right so let me first introduce the team that's 00:00:44.100 |
here so I'm Rajiv Shah I'm the chief evangelist of contextual AI so I do a lot of these talks 00:00:50.580 |
workshops I endlessly talk about AI which I'm kind of in this position for background I've been in 00:00:57.240 |
the AI space for a while before this I was over at hugging face for a number of years as well so 00:01:02.700 |
besides me talking we also have Nina you want to introduce yourself this is my fourth week at 00:01:11.940 |
contextual so very exciting for me thank you all for joining us this morning I've been working in NLP and 00:01:18.780 |
language modeling for the past seven or eight years really since BERT was considered a large language 00:01:25.440 |
model and I'm excited to show you how you can simplify your RAG pipeline with contextual today 00:01:31.920 |
thanks and besides Nina we also have two other members of our team that are in the back that if 00:01:38.640 |
you raise your hand and have questions they will come to you so one of them is Matthew who's a platform 00:01:43.440 |
engineer so if you have questions about how we scale up what are the things going on in the back 00:01:48.480 |
end he's the person for you we also have John on our team who's one of the solution architects so if 00:01:54.240 |
you're like hey how would I integrate this into my environment he's the guy who's gonna be able to handle that as 00:01:58.800 |
well so let's kick this off what I want to do is for you to all get one of the big pieces out of 00:02:06.720 |
this is we can treat RAG like a managed service we don't go out anymore and train our own large 00:02:13.620 |
language models if we're working with embeddings we don't build our own vector databases same thing with 00:02:20.260 |
RAG we can treat RAG just like any other managed service and so what we're gonna do today is I'm gonna do this quick 00:02:27.360 |
introduction here but then we're gonna get you started right away with building this RAG agent 00:02:31.360 |
with ingesting some files and Nina and I will go back and forth where I'll go and give you a bit more 00:02:36.480 |
of an overview of contextual you'll build an agent run a number of queries against it get a feel for it 00:02:43.200 |
and then I'll do a deep dive on some of the advanced settings that developers like around extraction re-ranking 00:02:49.520 |
retrieval like that and we'll end with a little bit on evaluation as well as how you can use MCP for 00:02:56.560 |
example connect your RAG agent to Claude desktop as well now to start with we all know the value of AI 00:03:04.960 |
right we're all here because of the importance of AI but for many of you also know the struggles of AI 00:03:11.680 |
how it's easy to build that demo with 10 documents when you're doing RAG but when you go to scale that 00:03:18.160 |
out all of a sudden it's hard to extract across thousands of diverse documents right your accuracy 00:03:24.560 |
isn't there your users are complaining that they aren't happy with it because they don't know how 00:03:28.240 |
to query properly now this is what we at contextual have focused on so our founders 00:03:38.080 |
here douwe and aman come from working on rag for a long time douwe was an author leading the team on the 00:03:45.680 |
initial rag paper aman was with him at meta i knew both of them from my time at hugging face where i was 00:03:52.720 |
helping people do rag years ago on question answer systems and i would be able to pull their researchers 00:03:58.960 |
in to help work with customers as i was cobbling together kind of open source components to build these 00:04:04.000 |
pipelines well they saw the need in that they want to change how the world works and so contextual has 00:04:10.000 |
come around for focusing on enterprise and we still have this dedication to ai research and you'll see 00:04:15.760 |
that as we go through our product as well now again just the basics just in case there's 00:04:23.840 |
somebody in here not familiar with it right when we're talking about rag retrieval augmented generation 00:04:29.520 |
the value here is when you have lots of that unstructured enterprise data we want to be able 00:04:35.840 |
to understand it so rag allows us to take that information a simple a very simple rag pipeline uses 00:04:43.760 |
a vector database to keep all of that use cosine similarity to find similar pieces pass it to an llm 00:04:50.560 |
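for reference, here is a minimal, generic sketch of that simplest pipeline in code; embed() and generate() are hypothetical stand-ins for whatever embedding model and LLM you use, and this is an illustration rather than the Contextual pipeline itself:

```python
# A minimal, generic sketch of the "simplest" RAG loop described above.
# embed() and generate() are hypothetical stand-ins for your embedding model and LLM.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def naive_rag(question: str, chunks: list[str], embed, generate, k: int = 5) -> str:
    # Embed every chunk and the question, then rank chunks by cosine similarity
    chunk_vecs = [embed(c) for c in chunks]
    q_vec = embed(question)
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine_sim(q_vec, cv[1]), reverse=True)
    context = "\n\n".join(c for c, _ in ranked[:k])
    # Pass the top-k chunks to the LLM as grounding context
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```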
that's the simplest we're going to get much more complicated today like that and what i want to do is 00:04:55.760 |
show off our platform to you the platform is built for a couple of different levels so one is there's 00:05:02.880 |
a no code so if you're a business user you just have a bunch of docs you want to be able to ask 00:05:07.200 |
questions to them we've made it very simple with opinionated defaults on that now if you're a developer 00:05:13.440 |
on the other hand you know what you want in your rag pipeline you're spending time evaluating 00:05:18.240 |
it you're spending time tweaking it well we've built a platform where you can then orchestrate 00:05:24.160 |
and change how do i want to do querying how do i want to do generation along the way and finally 00:05:30.320 |
some of you might already have rag you might already have rag pipelines in production where there's just 00:05:37.520 |
one component that isn't working that well maybe your extraction isn't working that well or you want 00:05:42.080 |
a better re-ranker well we've made our system modular so if you just want to use one piece of it you can 00:05:49.920 |
and all we focus on is rag other companies focus on other pieces we just think about building rag and 00:05:57.680 |
again the reason we're here is as you go and try to do this in production you'll see like once you get 00:06:02.640 |
to lots of documents it gets messy all of a sudden then you're orchestrating a big environment full of 00:06:07.760 |
lots of different models now i need a bm25 i need a re-ranker putting all that stuff together ends up sucking up your 00:06:14.880 |
time it's fun to do the first time but after that it ends up sucking up your time ends up with slow kind 00:06:20.960 |
of costly um rag pieces so that's enough to get started i'm going to turn it over to nina 00:06:28.240 |
who's going to get you started on building that first agent 00:06:37.920 |
all right we're on the clock 15 minutes to get our agent up and running um just kidding we'll take 00:06:43.680 |
a little longer than that to explain everything but how you can get started is um you can find the 00:06:49.440 |
notebook and the getting started page at contextual.ai slash aie25 and then the gui for contextual is at 00:06:59.600 |
app.contextual.ai that's what that getting started button will point you to and so we're going to start by 00:07:06.160 |
loading a few documents we're going to load some financial statements from nvidia and some fun 00:07:12.480 |
spurious correlations to see how our rag system treats data that's not fitting with conventional wisdom 00:07:19.360 |
and then we'll try out a few queries uh including quantitative reasoning over tables and some data 00:07:26.160 |
interpretation and so i'm going to have these two screens here side by side the notebook on the left and the 00:07:35.680 |
the platform on the right so you can get started by signing up in the platform if you haven't already 00:07:43.120 |
that's at app.contextual.ai that's where you're going to get your api key so everyone take maybe 00:07:49.840 |
a minute or so you'll need to log into that and you'll need to set up your workspace i already have my 00:07:55.760 |
workspace set up and i'm actually using uh my personal account not my work account just so that you'll see exactly 00:08:02.880 |
what a general access user would see on their screen on my screen and just a tip for setting up those 00:08:10.480 |
workspace names it's going to have to be a unique name so you can't call it aie-demo i just throw my 00:08:17.680 |
name into the names just to make it a little bit more unique so um so everyone just uh take a minute to do 00:08:24.480 |
that and uh to help me get a sense of where folks are just briefly raise your hand and put it down 00:08:31.040 |
once you have your workspace set up and you see this welcome screen 00:08:38.640 |
okay folks are making fast progress um yes matthew 00:08:53.280 |
so yeah so we can start by in the notebook doing our pip installs oh start by making a copy of the 00:09:12.160 |
notebook so you can just do go to file save a copy and drive and then that will create a copy for you of 00:09:22.080 |
the notebook so that you can save whatever changes you're making to it if you want to make any changes 00:09:26.800 |
it is a bit slow making a copy um but yeah after the pip installs and the imports 00:09:36.880 |
to get your api key you go into your contextual app click on api keys here on the right 00:09:46.720 |
save a copy of the notebook and then in our contextual platform on the right you can go to api keys 00:09:54.720 |
click on create api key so nina nina key 2 is my name and then we'll copy that and then we can go to our 00:10:07.440 |
notebook secrets on the notebook secrets on the left here the key there and then add a new secret and 00:10:13.120 |
then you'll just add contextual underscore api underscore key yeah it's refreshing 00:10:21.280 |
but if yours is working then uh you can follow along i'll just do it in the notebook without making a copy 00:10:31.040 |
so that's what we're going to do 00:10:42.640 |
if you want to do rag on your own it's often an api scavenger hunt just to get started and this is 00:10:49.680 |
literally the only api key we're going to use for the whole rest of the workshop so we have just finished 00:10:58.080 |
so once we've saved those credentials then we're going to set up our client here so we're just going to 00:11:08.880 |
run the client in our notebook so here i'm on the left in the notebook and then we're going to create 00:11:15.280 |
our data store so you can change the name on this data store name variable and then it's going to see 00:11:25.200 |
if there's an existing data store or load that one 00:11:32.000 |
and then after that which seems to still not be running 00:11:52.800 |
that data store is going to be where you're going to be loading all of your documents 00:12:00.960 |
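if it helps to see the shape of those setup cells, here is a rough sketch assuming the contextual-client Python SDK (pip install contextual-client); exact method and field names may differ from the notebook or the current SDK version:

```python
# Rough sketch of the setup cells, assuming the contextual-client SDK.
from google.colab import userdata   # Colab secrets, where the key was saved above
from contextual import ContextualAI

api_key = userdata.get("CONTEXTUAL_API_KEY")
client = ContextualAI(api_key=api_key)

# Create the data store that the documents will be loaded into
datastore = client.datastores.create(name="aie-demo-nina")   # pick your own unique name
print("datastore id:", datastore.id)
```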
and then we have some code here in step two that will download those files that i mentioned earlier 00:12:09.920 |
to your notebook so that's going to be a few quarterly reports and 00:12:14.400 |
spurious correlation reports and so here in step two you'll see that it's fetching the data 00:12:21.920 |
and then uploading it to the data store and then you can go to your app.contextual.ai 00:12:28.160 |
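a sketch of what step two is doing, again assuming the contextual-client SDK; the URLs here are placeholders and the ingest and metadata method names are assumptions, so defer to the notebook's actual code:

```python
# Sketch of step two: download the sample files and ingest them into the data store created above.
import requests

files = {
    "nvidia_quarterly_report.pdf": "https://example.com/nvidia_quarterly_report.pdf",   # placeholder URLs
    "spurious_correlation.pdf": "https://example.com/spurious_correlation.pdf",
}

document_ids = []
for filename, url in files.items():
    with open(filename, "wb") as f:
        f.write(requests.get(url).content)          # fetch the file locally
    with open(filename, "rb") as f:
        doc = client.datastores.documents.ingest(datastore.id, file=f)   # upload to the data store
    document_ids.append(doc.id)

# Ingestion is asynchronous, so poll the document metadata until processing is done
meta = client.datastores.documents.metadata(datastore.id, document_id=document_ids[0])
print(meta.status)
```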
and this is the aie demo data store that i made just here so if you click on your documents you'll 00:12:37.520 |
see that they're processing and once they're done processing you can click on these three dots and 00:12:46.480 |
then once the files are loaded you'll be able to inspect them and see how those files have been parsed and 00:12:53.200 |
ingested so i'll just take a quick look at the run i did yesterday of all these same exact files 00:12:59.680 |
and so if you click on inspect once your files are loaded for example if you have a table you're going 00:13:10.960 |
to see the raw text and then also the rendered preview so you can see this table 00:13:16.160 |
it has both the structure and the contents of the table in this image below and you're welcome to take 00:13:23.200 |
a look at these the numbers match perfectly as does the structure so that's kind of step one of getting 00:13:29.200 |
your rag system set up is having really accurate parsing and you can look at some of the image files we 00:13:36.880 |
chose some spurious correlations which are a fun way to test how much a rag system will actually 00:13:42.000 |
actually answer based on your own documents rather than its conventional wisdom and so you can see 00:13:48.080 |
there also there will be raw text and rendered previews with like key data points from figures and 00:13:56.240 |
summaries of the figures in addition to the text and other information extracted so while your files are 00:14:04.320 |
ingesting raj will come back and expand on the uh on the platform all right it seems to me i need to update 00:14:17.760 |
this slide with wi-fi issues as well but any of you who've built rag systems before know that there's several 00:14:24.960 |
components of it there's a lot of tasks you have to do right you have to think about building out an extraction pipeline 00:14:31.120 |
besides that when you're thinking about accuracy how am i going to chunk out what am i going to do 00:14:36.160 |
for chunking what am i going to do for re-ranking and then finally have to think about scaling it there's 00:14:40.960 |
a lot of these issues that come up and again it's interesting the first time you do it but maintaining 00:14:45.760 |
this stuff after a while gets tiresome and that's where contextual comes in where it's really an end-to-end 00:14:51.520 |
platform for managing rag and our platform runs in our saas it can run in your vpc we have a ui for 00:14:59.360 |
those business users but we also have rest api endpoints as well and within the platform 00:15:06.720 |
we manage all those pieces for you so you don't have to worry about maintaining a vector store for 00:15:10.960 |
your embeddings and the way it works is you'll come with your structured and unstructured data 00:15:15.760 |
we bring that in we have a document understanding pipeline and that's the extraction part that we're 00:15:21.920 |
going to walk through here where we need to cleanly be able to pull all that information out of your 00:15:26.640 |
documents your tables your images once we have that we chunk that up and then we go through 00:15:32.960 |
the best practices around retrieving so we have a mixture of retrievers so bm25 as well as an embedding 00:15:39.760 |
model we have a state-of-the-art re-ranker that we've trained ourselves that's in our pipeline and then we 00:15:46.240 |
pass it all to a grounded language model now we don't use a model from open ai or gemini we trained our own 00:15:53.520 |
grounded language model because for rag use cases we want the model to be grounded we want it not to 00:16:01.680 |
give its own advice just because it can pass the law exam or know everything about medicine we don't 00:16:06.800 |
want it to use its knowledge when it's answering questions we want it to instead stay grounded 00:16:12.000 |
respect the context that we're giving like that and finally there's lots of different ways we can output this 00:16:18.480 |
and i'll show you this along the way as well now for each of these parts we have that academic 00:16:24.720 |
pedigree as i mentioned earlier so one thing we always do is look at the academic benchmarks nowadays 00:16:29.440 |
we have lots of customer data set points that we use as well but whether we think about the end-to-end 00:16:34.720 |
accuracy of our platform or the specific parts around document understanding retrieval grounded 00:16:40.880 |
generation each one of those we focused on to make them state-of-the-art for what they do like that 00:16:47.600 |
so once we've done that when you go to use them there's a lot of different uses that you can have 00:16:55.200 |
for this i think often when we initially think of rag we think about i'm going to build a question 00:17:01.920 |
and answer chatbot right like i'm going to ask a question get back an answer well yes that's one 00:17:08.240 |
use of what you can do with this but we have customers like qualcomm who use contextual on their 00:17:15.040 |
website so if you go live to their website right now and you ask a question there it's powered by 00:17:20.960 |
contextual on the back end and you can see similarly the question the answers right our feedback modules 00:17:27.040 |
sources are all available there besides that we have other customers for example some folks in the 00:17:34.480 |
financial spectrum and they're thinking about automated workflows how can i take all the 00:17:42.080 |
unstructured data i have structure it to make it useful for folks well we have apis you can hook up 00:17:49.680 |
those apis be able to run them against here is it good ah okay no worries but we can take all that 00:18:01.200 |
unstructured use our rag agents in the context of something like a spreadsheet or some other workflow 00:18:06.320 |
to be able to use that finally we can connect it with lots of other tools 00:18:10.960 |
i'm going to put this in there even though i know it's not going to work 00:18:16.400 |
but just for all of you we can show you and i'm going to show you at the end here today how you can 00:18:20.880 |
for example integrate the rag agents with something like claude with mcp so that way you can take 00:18:26.960 |
advantage of the enterprise knowledge that you have in there all right so with that i'm going to hand it 00:18:35.200 |
back over to nina and let's get you started on your agents 00:19:03.920 |
it looks like the speaker wi-fi and my hot spot are not working so we'll just continue with the 00:19:25.600 |
notebook and we'll talk through it and then i guess is anyone else's wi-fi working raise your hand if you 00:19:32.400 |
have wi-fi okay well you guys can follow along you have the videos too right mm-hmm i do have oh oh 00:19:40.240 |
it's back up okay cool um so yeah where we were in the notebook on the left here is um hopefully all of 00:19:50.000 |
your uh data stores have loaded now um yes so assuming you had wi-fi to upload them uh these are the ones i just 00:20:01.600 |
loaded in the workshop earlier and then in the notebook on the left you can see um that you can 00:20:06.880 |
access these files uh through the api just the same as you can through the gui and then you can have the 00:20:12.480 |
document metadata like when it was created the name uh the status of the ingestion and now we're going to 00:20:19.120 |
create our agent uh so over here we have our system prompt uh this is kind of just our default system 00:20:25.360 |
prompt that the agents are loaded with and then you can run this next block of code here on the left 00:20:31.920 |
to set up your agent you can change the name if you'd like i called it demo-aie and then in on the right 00:20:38.800 |
in our gui if you click on agents uh there it is there's my new agent right here that i just created 00:20:45.840 |
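a sketch of what that agent-creation cell looks like, assuming the contextual-client SDK; the system prompt here is a stand-in for the notebook's default prompt and parameter names may differ:

```python
# Sketch of creating the agent over the data store from earlier.
system_prompt = (
    "Answer questions using only the information in the provided documents, "
    "and cite the sources you rely on."
)

agent = client.agents.create(
    name="demo-aie",
    description="Workshop agent over NVIDIA financials and spurious-correlation reports",
    system_prompt=system_prompt,
    datastore_ids=[datastore.id],
)
print("agent id:", agent.id)
```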
and um and then you can put in your question based on these documents so for example um one question 00:20:54.640 |
that we can ask is what was nvidia's annual revenue for fiscal years 2022 to 2025 and so uh what you'll 00:21:03.360 |
notice here is we have our responses what we were provided was quarterly data so what the agent is 00:21:10.400 |
doing is adding that up and then listing it here on the right and then noting that this is based on 00:21:16.240 |
quarterly data and then you'll see these little numbers in a circle the two and the one so if you 00:21:20.880 |
click on the two here you'll actually see the image that this data came from and similarly if you click 00:21:29.440 |
on one you'll see the other image so interestingly this came from two separate files that were unrelated 00:21:34.960 |
in any way they actually don't even have like a standard naming convention for these i pulled these 00:21:39.200 |
from the nvidia financial reports website and it knew to reference those documents to pull that 00:21:45.440 |
information across documents and then you can do the same exact thing that we just did in the gui 00:21:52.160 |
programmatically so here in step four we're running our query uh and we're getting the exact same 00:21:58.960 |
response in the api so this can make it easier to integrate it into your um into your application 00:22:05.120 |
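a sketch of that same query made programmatically, assuming the contextual-client SDK; the response field names shown here (message.content, retrieval_contents, doc_name) are how I'd expect them to look but may differ in your SDK version:

```python
# Sketch of querying the agent through the API, mirroring the GUI query above.
result = client.agents.query.create(
    agent_id=agent.id,
    messages=[{
        "role": "user",
        "content": "What was NVIDIA's annual revenue for fiscal years 2022 to 2025?",
    }],
)

print(result.message.content)            # the grounded answer with citation markers

# Each retrieved chunk carries metadata about the source document it came from
for chunk in result.retrieval_contents:
    print(chunk.doc_name)
```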
and so we can also here visualize which files were used to reference in the api and so now we're printing 00:22:14.480 |
the documents here the base64 encoding is getting returned with another api query so just because it's 00:22:22.560 |
a more fun and visual demo uh we'll ask the next few questions via the gui here so uh nvidia used to 00:22:31.280 |
be a gaming compute company so i'm gonna ask the chat bot when did nvidia's data center revenue overtake gaming revenue 00:22:39.680 |
and so um the crossover point was in q1 fiscal year 23 uh with data center revenue at 3.7 billion and gaming 00:22:51.520 |
revenue at 3.6 billion and then here i can click on the one here so that's q1 fiscal year 23 and we can look at our 00:22:59.600 |
slide here and here it is that's actually like the exact crossover point where uh data center revenue overtook gaming revenue 00:23:08.880 |
now we're going to look at the spurious correlation files uh that we've also loaded so for those of you 00:23:16.880 |
unfamiliar this is a website that uh basically data crawls and finds correlations between things that have 00:23:24.080 |
no correlation so we're searching for what's the correlation between the distance uh between neptune 00:23:31.120 |
and the sun and burglary rates in the us that would be some extreme astrology there if that 00:23:38.320 |
actually determined burglary rates and so what you'll find is that our rag system that's really really 00:23:46.160 |
focused on avoiding hallucinations um and really following the data that you've loaded it will give 00:23:52.320 |
you that correlation coefficient which is very high but then it will also share that context um so 00:23:58.800 |
otherwise elsewhere in the document they talk about how they're doing data dredging and um that this is not 00:24:05.200 |
really a valid statistical correlation and so you can then also reference where in that document these claims 00:24:12.240 |
are made both the statistical correlation and the uh and the caveats which are not loading i think from wi-fi 00:24:22.800 |
issues but you can test this out in your own instance uh as well um we have another fun one that was uh what's the 00:24:31.760 |
correlation between global revenue generated by unilever group and google searches for lost my wallet that's 00:24:38.880 |
another spurious correlation that we loaded and it will do the same thing it will give you the actual answer 00:24:45.760 |
and then note the caveats and i'll note that if you run these same queries in chat gpt here on the right 00:24:53.680 |
if you just ask the question without any files not doing rag it will just start with telling you there's 00:25:00.240 |
no meaningful correlation we know that's true but think about if your own documents don't follow 00:25:05.680 |
conventional wisdom or have some information that's not out there it's going to argue with the information 00:25:10.560 |
in your documents rather than presenting it even though it may not be may not actually be true in this case 00:25:18.000 |
and then if you do a document upload and use the long context in chat gpt you will get that response of the 00:25:26.560 |
actual correlation but then it will just say it's a spurious correlation and it won't really go into why 00:25:33.600 |
so it's not really going to um you know a simple system like this is not really going to hew to the 00:25:39.280 |
facts that you've presented it with um and uh for a fun question we just searched about uh global revenue from 00:25:50.160 |
unilever group and google searches for lost my wallet does this imply that unilever group's revenue is 00:25:56.320 |
derived from lost wallets and it will answer no that's not true um which we know and then if we ask 00:26:03.520 |
another query that relates to different documents that don't have any information shared between them 00:26:09.520 |
now we're looking at the correlation between the distance between neptune and the sun and global revenue 00:26:14.800 |
generated by unilever group and it will not answer that question it will just say it does not have 00:26:19.680 |
that information and so you know these are all uh pretty straightforward questions and answers but um 00:26:27.360 |
you set up with our default settings but if you go to your agent and you click on edit in the admin panel 00:26:34.880 |
there's actually a lot that you can change if you know for example that last question i asked if there was 00:26:40.960 |
data that supported it and it wasn't found or something uh you can check which data stores are 00:26:45.920 |
linked so you can link multiple data stores you can adjust the system prompt and that is also something 00:26:52.560 |
you can do via api that we'll go over briefly later there are some settings on query understanding so we 00:26:59.520 |
disabled multi-turn in our setup just to have everyone have the same consistent responses as they're trying 00:27:04.960 |
to make the workshop easier to follow um you can set up query expansion or decomposition uh you can set 00:27:14.400 |
some retrieval settings settings for the re-ranker settings for the filter including the prompt 00:27:21.520 |
generation settings and you can even set up a user experience where you have suggested queries for the user 00:27:28.240 |
um and so uh going back to our notebook here um what we have next in the notebook is going to be 00:27:39.120 |
examples of code for some of the components so these are the individual components that make up the 00:27:45.200 |
full rag system we won't run through this in the workshop i'm just going to share what we have here 00:27:50.640 |
before raj does a deeper dive into what these components can do so for each component we've shared a link 00:27:57.200 |
to a more complete notebook that lets you use more features of that component we have our benchmarks 00:28:04.160 |
that compare it to other solutions and then we have a little bit of example code that will run in the notebook 00:28:09.680 |
to try out these components alone and you don't need to set up a new key it's the same api key that we set up before 00:28:16.240 |
so everything will just run i will just say you want to uh let your file process before you display it 00:28:22.000 |
but in this example we're just printing the first page of the attention is all you need paper we have 00:28:27.760 |
our re-ranker which is the world's first instruction following re-ranker and that's also a standalone 00:28:33.520 |
sub-component that you can use independently of the full platform we have our generate model that is 00:28:39.040 |
according to the facts grounding benchmark the most grounded language model in the world 00:28:44.400 |
that's available as a standalone component and here's some sample code for that and then after raj's deep 00:28:51.920 |
dive on the platform and components we'll go over the lm unit the natural language unit testing model 00:28:58.400 |
available as an endpoint that we will use for our evaluation so yeah back to raj for the next portion of 00:29:09.360 |
the workshop all right yep all right i just need this one 00:29:18.720 |
all right if anybody knows me i'd love to kind of talk and get into the technical pieces and 00:29:26.080 |
what's going on a little bit deeper so this is useful for learning about contextual but if you're even 00:29:30.560 |
building your own rag pipelines as well all these components come into play um to start with now 00:29:37.600 |
to start with how many people have had issues with hallucinations in their rag pipelines 00:29:41.840 |
it's just been a few have people not have issues not had issues okay yeah i think everybody yeah finds 00:29:50.800 |
this now when we start thinking about hallucinations there's a couple of ways we can think about how to 00:29:56.960 |
prevent them the first is making sure we retrieve clean information if we don't do a great job at extraction if we 00:30:05.280 |
don't do a good job at retrieval well then we're not going to give good information to that generator 00:30:10.160 |
model the generator model will then try to answer it itself leading to 00:30:16.960 |
hallucinations so that's one big piece the next is that language model itself and we'll talk about that 00:30:23.440 |
is how we can ground that language model so it refers back to the context that it was given rather than 00:30:29.520 |
substituting its own judgment and then i want to talk about having some checks in place because even if 00:30:35.600 |
you do that you still want your end users to be able to trust the system so i'll talk about groundedness 00:30:41.120 |
checks as well as you've already seen it inside the platform the bounding boxes for attribution as well 00:30:49.200 |
so to start with when you first build your first rag project right you need to extract from some pdfs 00:30:54.800 |
maybe you've used like an open source pdf extraction tool have folks used like open source extraction tools 00:31:01.360 |
like that pdf plumber things like that not that many have people been happy with them in terms of how 00:31:08.160 |
they've worked for tables and charts yeah the document on the left is the easy one right that's 00:31:17.360 |
what you show in the demos but your folks are going to show up with complex tables multimodal and that's 00:31:23.680 |
when it gets much trickier because when you get to these types of documents if you have a parsing error 00:31:30.400 |
in that table and things get shifted over right that's going to cause problems that you can't fix later on 00:31:37.280 |
if your vision language model hallucinates and adds some other information in there when it's reading 00:31:42.960 |
that multimodal chart right you're stuck with that right like that's going to lead to downstream problems 00:31:48.160 |
so making sure you have very good extraction is like top of the list when you get to complex documents 00:31:53.760 |
and so this is this is how i like to think about kind of what we're doing at contextual right now 00:32:00.800 |
it's an evolving system we update like every couple weeks the engineers probably don't exactly like 00:32:06.400 |
how i've diagrammed it like this but at least for me the mental model works where say you have a pdf 00:32:12.720 |
document the first thing we're going to do is add some metadata around it right like when was it created 00:32:18.720 |
what's the file name once we have that then you want to do a layout analysis is this a document that's 00:32:26.240 |
image only that we need to do an ocr for does it have images maybe technical descriptions multimodal charts 00:32:34.720 |
graphs that we need to add image captioning where we're going to get a text description of those features 00:32:40.960 |
if you have tables we have a special table extraction mode that focuses on getting high quality results out 00:32:49.120 |
of tables now beyond that the structure of the document also carries meaningful information 00:32:55.360 |
right what are the section headers what are the different subsections in there using that information 00:33:02.160 |
can make your extraction go a lot better now once we have that we create a nice markdown 00:33:10.320 |
kind of json version of that i'll show you that in the platform here but then for rag use cases we're 00:33:17.440 |
going to take those documents and we're going to chunk them down into smaller pieces to work with right 00:33:22.000 |
because lms as much as the long context is growing you can't take some of these large documents and chunk 00:33:27.200 |
them in and especially there's the overall compute cost of trying to use those long contexts fully 00:33:33.440 |
so we create chunks we use some of that metadata that we've created inject that in like knowing 00:33:38.720 |
the hierarchical structure of that along the way we set bounding boxes this is one reason when we do 00:33:45.280 |
image captioning for example we don't want to take the entire whole page and do that we like to know 00:33:50.880 |
where the images are so we can do bounding box so the user knows when we say sales went up by you 00:33:56.800 |
know 10 percent they know where on the page exactly it was 00:34:00.640 |
so within the platform and i know you can't get to it right now we have what the components you can see 00:34:09.360 |
on the left side let me see if i can kind of right there all the individual parts of our platform you can 00:34:19.040 |
work with individually and we have a playground for that so this is the parse piece here where you can just 00:34:24.480 |
take a pdf document decide how you want the extraction i just want text or hey i want the full works where 00:34:30.720 |
i want image captioning as well or i have long tables i want the table extraction mode and you can do that 00:34:36.880 |
right inside the ui see what the results look like now of course right for developers we also have an api 00:34:43.360 |
python sdks javascript sdks as well so you can do this programmatically as well but that's the first step 00:34:52.000 |
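a very rough sketch of calling the parse component on its own; parsing runs as an asynchronous job, and the create / job_status / job_results names and output shape below are assumptions about the contextual-client SDK, so check the linked parse notebook for the real calls:

```python
# Rough sketch of the standalone parse component (assumed method names).
with open("attention_is_all_you_need.pdf", "rb") as f:
    job = client.parse.create(
        raw_file=f,
        parse_mode="standard",        # e.g. text only vs. figure captioning vs. table extraction
    )

status = client.parse.job_status(job.job_id)                               # poll until the job completes
results = client.parse.job_results(job.job_id, output_types=["markdown-per-page"])
print(results.pages[0].markdown)                                           # first page as markdown
```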
the next step here is the query reformulation and so here depending on if 00:34:59.680 |
you're doing multi-turn we need to take into account the previous conversations query expansion maybe the 00:35:20.560 |
types of users are doing lots of abbreviations inside your company so we need to take that query make it 00:35:34.240 |
a little richer and fuller before we pass that in to get good results or on the other hand maybe your 00:35:40.640 |
folks are putting in long complex queries where what we want to do is take that query break it up into smaller 00:35:47.280 |
sub queries answer each of those so this is where you have the flexibility you can turn these knobs 00:35:52.720 |
on and off as you want as nina was showing you in the edit to use those now once you've gone through 00:35:59.120 |
that process that long query that you have might break into three sub queries each of those 00:36:05.680 |
reformulated queries then passes through your semantic search and lexical search so hybrid search kind of 00:36:12.320 |
classic best practices there and then we give you the option of using a what we call a data store filter 00:36:19.120 |
because maybe your query is asking what's the revenue of apple computer and your data store has 00:36:25.840 |
apple microsoft tesla all those documents well if you already know they're interested in the apple 00:36:32.080 |
company why not just filter out all the other documents so you don't have to worry about that 00:36:36.720 |
that's where that data store filter comes into play now typically for retriever our defaults are 00:36:42.160 |
something like a hundred chunks that will come back retrieval's fast you can do that but then to get 00:36:49.360 |
better accuracy we like to use a re-ranker and again this is best practices around rag we've trained our 00:36:54.560 |
own the instruction following re-ranker and that helps you go from that 80 and like the default is 15 but 00:37:00.480 |
again you can change it for your application and give you let's say 15 high quality results that you 00:37:06.960 |
can then pass all the way through to the end we have another filter stage if you want to for example drop 00:37:12.160 |
out some of those as well so this is all of these pieces are orchestrated with lots of different machine 00:37:19.200 |
learning models like this so this is where it's like a pain to build this for yourself but we've built this 00:37:24.320 |
like this now if i had the time i'd go into all those pieces in depth but i want to highlight a 00:37:29.600 |
couple of things i like our re-ranker i think is interesting where you have the ability to provide 00:37:35.520 |
it instructions so a common one might be for example if you're asking questions about apple and you want 00:37:41.120 |
to get the most recent documents so you can give it a prompt and then when it re-ranks those it's going 00:37:47.120 |
to look at the contents of the chunks and use that prompt to help re-rank it so a lot of possibilities 00:37:53.120 |
kind of with this and we're kind of excited to have this piece there now the last step is generation 00:38:01.520 |
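one quick aside before the generation step: a sketch of the instruction-following re-ranker called as a standalone component; the model id, parameter names, and result fields are assumptions about the contextual-client SDK:

```python
# Sketch of the instruction-following re-ranker as a standalone call.
rerank = client.rerank.create(
    model="ctxl-rerank-en-v1-instruct",                      # assumed re-ranker model id
    query="What is Apple's most recent quarterly revenue?",
    instruction="Prioritize the most recent documents.",     # the natural-language instruction
    documents=[
        "Apple reported Q2 FY2023 revenue of ...",
        "Apple reported Q2 FY2024 revenue of ...",
    ],
)

# Results come back as indexes into the input list with relevance scores
for r in rerank.results:
    print(r.index, r.relevance_score)
```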
so we've retrieved back all of the query next we're taking it to our grounded large language model again 00:38:09.040 |
the notion here is we want a large language model that's been specifically fine-tuned to respect 00:38:16.480 |
the knowledge that it's given not give its own knowledge like that besides that we have the two 00:38:22.800 |
pieces that i've talked about earlier the groundedness and the attributions you've already seen the 00:38:26.160 |
attributions and i'll talk a little bit more kind of about groundedness now the grounded language model 00:38:31.680 |
since we trained it ourselves we got to kind of do some different things with it so one thing we've done 00:38:36.560 |
is the language model will tell you the difference between facts and commentary so when you have some 00:38:44.560 |
text it will look at that answer and see hey this is the important facts in here the rest is just other 00:38:50.720 |
commentary so this is where when you're working with this if you want i need a super grounded 00:38:56.080 |
model like i just want the facts in my rag system i don't want any of the superfluous stuff like you 00:39:01.600 |
can turn the setting on to eliminate the commentary and just focus on the facts for example so this is 00:39:08.560 |
where kind of training our own gives us flexibility to do this kind of thing 00:39:11.920 |
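a sketch of the standalone grounded generation endpoint with that facts-only switch; the model id, the knowledge parameter, and the avoid_commentary flag are assumptions about the contextual-client SDK:

```python
# Sketch of the standalone grounded language model endpoint (assumed parameters).
gen = client.generate.create(
    model="v1",                                            # assumed grounded LM identifier
    messages=[{"role": "user", "content": "What was data center revenue in Q1 FY23?"}],
    knowledge=[
        "NVIDIA Q1 FY23: data center revenue $3.75B, gaming revenue $3.62B.",
    ],
    avoid_commentary=True,                                 # keep the facts, drop the commentary
)
print(gen.response)
```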
the second thing is the groundedness check we have so the groundedness check works directly in our ui 00:39:20.880 |
you can also use this through the api where for every response that comes back what we do is we look 00:39:28.400 |
sentence by sentence essentially and see is the claim that's made found in the documents that were 00:39:35.120 |
given so here if you look you'll see the bottom sentence is in yellow and it's in yellow because 00:39:42.560 |
the claim it's making is a totally bs claim that it just made up right it's not actually in the source 00:39:49.200 |
documents i literally prompted it to make up something and so here we'll automatically in the ui highlight 00:39:55.680 |
that's yellow so if you have users that you're worried about like they keep falling for these 00:40:00.720 |
hallucinations this is kind of a helper feature inside there and the way it works is when 00:40:08.000 |
we retrieve all the context and again you have access to all the chunks all the things that are retrieved 00:40:12.720 |
that answer is going to be generated but then we're going to decompose that answer into specific claims 00:40:20.320 |
so we have a model that just does this decomposition and looks and sees is this claim 00:40:25.600 |
grounded back in that context or not so we can run this live at query time and be able to get those groundedness 00:40:32.560 |
scores and show you that with the response as well all right nina has got more on evaluation like that 00:40:55.520 |
so for our last step of setting up our rag agent we will get to the evaluation step so we're going 00:41:09.840 |
to use our lm unit model for natural language unit tests which may be a little bit different from how 00:41:15.200 |
you've been doing evaluation of your rag system before so we have a fine-tuned model as a judge and 00:41:21.360 |
that has state-of-the-art performance on flask and big gen bench and the way that it works as we'll get 00:41:27.280 |
into in the notebook shortly is that you'll put in um the prompt and get the model response as your rag 00:41:35.040 |
system would generate and then we're going to create unit tests for the prompt that will ask specific 00:41:42.640 |
questions that you want tested about those responses then we're going to use the lm unit model to evaluate 00:41:49.280 |
these unit tests and then we can look at those scores so we'll be fully in the notebook now and 00:41:55.440 |
i've just given up on the wi-fi now i'm just going to go through what i ran earlier so we can just look 00:42:01.680 |
at the results uh if you had internet and could run this all on your own so 00:42:08.240 |
so in this section we have uh we have commented out the code that we used to generate the data set 00:42:17.280 |
so that has our set of questions about the data set uh six questions and then this is the section that 00:42:25.520 |
will generate those results from those queries so to save some trees we just ran this ahead of time 00:42:33.040 |
and saved it in our github so you could load it directly um and uh that's uh at this file here we 00:42:42.080 |
have this eval input csv um and then that just has prompts and responses for six queries from that same agent 00:42:51.360 |
that we set up earlier and the unit tests we're going to ask i thought these would be interesting 00:42:58.320 |
ones for this uh for this document set and the queries does the response accurately extract 00:43:05.680 |
specific numerical data from the documents does the agent properly distinguish between correlation and 00:43:14.000 |
causation that could be useful if you have some sort of statistical analysis in your documents are 00:43:19.840 |
multi-document comparisons performed correctly with accurate calculations are potential limitations or 00:43:26.560 |
uncertainties in the data clearly acknowledged are quantitative claims properly supported with specific 00:43:33.760 |
evidence from the source documents and does the response avoid unnecessary information 00:43:40.640 |
we know that uh llms like to blab on and on in their response so this is a good unit test i think overall 00:43:48.160 |
some of the other ones you can set based on your own documents if you want to try this out later 00:43:53.520 |
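a sketch of scoring a single response against a single natural-language unit test with LMUnit, assuming the contextual-client SDK's lmunit endpoint; the response string here is a placeholder for what the agent actually returned:

```python
# Sketch of one LMUnit call: prompt + response + unit test in, a 1-5 score out.
result = client.lmunit.create(
    query="What was NVIDIA's data center revenue in Q4 fiscal year 25?",
    response="<the agent's full response, pulled from the eval input csv>",   # placeholder
    unit_test="Does the response avoid unnecessary information?",
)
print(result.score)
```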
and so we just set up our unit tests here um and then the lm unit model will score on a one to five 00:44:03.840 |
scale and it's been fine tuned for this specific task and so then the next block of code that you would run 00:44:11.360 |
would be this uh response generation so we put in our query what was nvidia's data center revenue in q4 00:44:19.200 |
fiscal year 25 and then you get your response and that does have the uh the correct amount the citation 00:44:29.600 |
but then it has a lot of other information and so the unit test we used was does the response avoid 00:44:35.840 |
unnecessary information and the score was 2.2 out of five so for example if we wanted to if this was an 00:44:42.720 |
important criteria for us then we could later update the system prompt or change some of those other 00:44:47.760 |
settings to specifically target this um the results from this unit test of course we don't want to operate 00:44:54.240 |
from an n of one so um we're now testing this batch of six which of course again is a small batch this 00:45:00.720 |
was meant to run in uh real time if we had internet um and this would just run through all of those six 00:45:09.520 |
queries and all of those six unit tests to get the results then this is the line here um run unit tests 00:45:17.520 |
with progress this uh will give you the results from those unit tests and so here we can look at those 00:45:25.280 |
results we can see the prompt response and um and then we can see the scores here for each of our six 00:45:36.800 |
unit tests and those are going to be from one to five um and then we can also then map those long sentences 00:45:45.840 |
to just a one word category so just accuracy causation synthesis limitations evidence and relevance you 00:45:53.600 |
can set these as you'd like and we're going to create polar plots that show us for example for our second 00:46:00.000 |
question what is the correlation coefficient between neptune's distance from the sun and u.s burglary rates we can 00:46:07.120 |
see in the polar plot that the causation was very well addressed by our model the accuracy and limitations 00:46:14.800 |
also scored high on the unit test but the synthesis was not as highly scored um and these other factors 00:46:22.320 |
were not as highly scored so we can look at all six and uh we can kind of see some of these queries 00:46:28.640 |
scored really well across all our unit tests others may point to areas to improve we linked earlier to 00:46:35.360 |
the full lm unit notebook where you can look at these categories and cluster them so for a larger scale 00:46:42.640 |
evaluation data set you could actually like look for meaningful insights from this unit test 00:46:49.600 |
um and so uh your first homework if you've made it this far is to improve the agent based on the unit test 00:46:58.240 |
results and so one way that you could do that in the api would be uh here to update the system prompt so i 00:47:05.600 |
have our original one and you could you know change something here like hey keep the responses really 00:47:11.360 |
brief only answer um uh the direct question and then you can run this line of code here to update the agent 00:47:20.400 |
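a sketch of that homework step: tighten the system prompt and push the change to the agent via the API, assuming the contextual-client SDK's agents.update call:

```python
# Sketch of updating the agent's system prompt after reviewing unit-test results.
new_prompt = system_prompt + "\nKeep responses brief and answer only the direct question."

client.agents.update(
    agent_id=agent.id,
    system_prompt=new_prompt,
)
```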
and um and yeah that's our that's our end-to-end pipeline and now for a bonus raj is going to show 00:47:28.640 |
you how you can connect your agent to claude desktop via mcp and for that we have links to our github 00:47:36.560 |
here and the youtube how-to if you want to try this out yourself um okay yeah yeah 00:47:51.440 |
i'm dealing with the challenges of the wi-fi and working around that so let's talk about this see 00:47:58.320 |
so for lucky for all of you i spend way too much time making videos so i have a copy of the video 00:48:08.480 |
integration that i want to show so we built a rag agent you can ask questions right you want to be 00:48:14.400 |
able to use it one of the most common ways of being able to use it right that was introduced like six 00:48:19.920 |
months ago here um at ai engineer was mcp server so one of the things i like to do is kind of show 00:48:27.200 |
people how i can connect the rag agents that we built a contextual and you can use them inside of other mcp 00:48:33.520 |
clients and so here's an example of using it inside of claude desktop where now when it has this particular 00:48:40.960 |
topic it's going out to the contextual rag agent to be able to get the answer to do that so you can do this 00:48:48.400 |
working inside of clients like claude you can also be able to do this for example if you're working 00:48:53.280 |
with code you want to be able to do this in something like cursor that works as well 00:48:57.680 |
so to do this if you're like hey i want to do this i want to be able to use this rag agent everywhere else 00:49:05.440 |
to do that we have a repo that's out there so the contextual mcp server it's the link is in the notebook 00:49:12.800 |
as well so you can grab that it's a neat it's a very basic mcp server you could probably build your 00:49:18.400 |
own let us know um like that but the workflow is you clone that repo and then inside that repo there's 00:49:26.320 |
a server.py file and we're just using our contextual um our contextual apis and just pointing to that so 00:49:34.640 |
the important thing of course is that doc string that you kind of describe what your rag agent is about 00:49:39.360 |
because that's what your client whether it's claude or cursor is going to be able to use that so to make 00:49:44.560 |
this a little bit more concrete here's a copy for example inside of mine where 00:49:52.000 |
you can see in my local environment i have that mcp server locally hosted this is what my server.py file 00:50:01.600 |
looks like where you can see i've got two different ones set up one for technical queries one for financial 00:50:09.040 |
queries depending on who i demo to and it can have the client then pull in to that piece there 00:50:15.120 |
so that's the server that's sitting in the middle 00:50:19.680 |
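a rough sketch of what a server.py like that can look like, using the mcp package's FastMCP helper; the tool name, docstring, and environment variable names are illustrative, and the actual contextual-mcp-server repo may differ:

```python
# Rough sketch of a minimal MCP server that wraps a Contextual RAG agent as a tool.
import os
from mcp.server.fastmcp import FastMCP
from contextual import ContextualAI

mcp = FastMCP("contextual-rag")
client = ContextualAI(api_key=os.environ["CONTEXTUAL_API_KEY"])
AGENT_ID = os.environ["CONTEXTUAL_AGENT_ID"]

@mcp.tool()
def query_financial_docs(question: str) -> str:
    """Answer questions about NVIDIA financial reports via the Contextual RAG agent.

    The docstring matters: it's what Claude or Cursor reads to decide when to call this tool.
    """
    result = client.agents.query.create(
        agent_id=AGENT_ID,
        messages=[{"role": "user", "content": question}],
    )
    return result.message.content

if __name__ == "__main__":
    mcp.run(transport="stdio")   # stdio transport so a desktop client can launch it locally
```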
the next piece is configuring it with your client every client's a little bit different like that but 00:50:26.320 |
you just want to be able to have that client point to where that server file is so if we look at for 00:50:32.000 |
example in the case of claude desktop which is not running now but luckily we can still do this 00:50:38.080 |
in claude desktop there's a developer piece where you can edit the config for where it should look for mcp 00:50:47.120 |
servers and if i click here edit config i can go see the file that it's there we can open that up 00:50:54.720 |
we're not going to open it up in cursor but you get a sense that's the config file that's going to 00:50:59.280 |
point towards where my server location is so again if you're not sure about this i have all the directions 00:51:05.040 |
on the github there's an accompanying video as well but it just allows you to use that rag agent now in 00:51:10.720 |
lots of other ways as well for example if you're building your own deep research tool you want to be 00:51:16.800 |
able to do that you can do that all right we're going to open it up for questions here in a couple 00:51:23.360 |
of minutes and again if you're having any issues raise your hand we got a couple of people in the 00:51:27.440 |
back that can help out so one question often is is like what is this going to cost me like you give me 00:51:33.920 |
the sales pitch what is it going to cost me well if you use our individual components the parse the re-rank 00:51:40.400 |
the generate lm unit we've made that all consumption based you pay by the token for those so the pricing 00:51:47.520 |
is up on the website it's pretty straightforward like that when you've signed up today you started 00:51:52.960 |
off with a 25 dollar credit so you should be able to go start playing around with it as well so those are the 00:51:59.600 |
individual components we're also making our rag platform consumption-based pricing as well so the 00:52:07.520 |
pricing will be based on the number of documents you ingest and how many queries you do so based on 00:52:13.840 |
that you'll be able to kind of calculate out and know what your workload is like that now i work 00:52:19.520 |
with some enterprise customers for some sensitive places they're like hey we really need a guarantee 00:52:24.800 |
on latency or amount of queries per second like that this is where we can work with that team do a 00:52:31.200 |
provision throughput so that way you have dedicated kind of hardware for your particular use case like 00:52:36.640 |
that but i think one of the great things that's going to really open it up to developers is the 00:52:40.400 |
consumption base because just like you're doing today you're going to be able to sign up pass 00:52:44.960 |
documents through that and just pay for what you use like that 00:52:48.560 |
all right so some final takeaways i'm hoping that you got out of this that how you can treat rag just as 00:52:57.760 |
any other managed service you don't have to go through build out all those pipelines yourself like that 00:53:03.600 |
you've got the code you can get started now building this pipeline individually or if you just need parts 00:53:09.760 |
of it just the components try them out as well if you're not happy with your re-ranker give it a shot 00:53:14.880 |
so go try out the app to do that all of this stuff is documented over in docs we have kind of full api docs 00:53:23.680 |
over there we also have a bunch of example notebooks showing for example integrations with things like 00:53:28.960 |
ragas as well if you want to use that for evaluation so those example notebooks will walk you through hey 00:53:34.400 |
how do i improve an agent how would i use all of these settings if you need other example notebooks let 00:53:40.000 |
us know nina and i will work on that as well so finally fill out our survey kind of share your feedback as 00:53:46.720 |
well we've got some nice kind of merch we can hand out to people who have good questions 00:53:52.480 |
as well i think we got a little bit of time for that how does that sound 00:53:55.840 |
i've thrown this rag platform at you are you all ready to sign up okay yeah 00:54:08.000 |
go ahead there's a q a mic or i'll try to repeat back the question to the wider audience so if you 00:54:27.600 |
want my interpretation of the question or i can't massage it then is that on yes yeah um so there's a 00:54:37.120 |
really interesting feedback you're defining here around you can define evals via prompt you can see 00:54:43.600 |
how they're doing you can let people change the system prompt do you find that your users of like 00:54:47.520 |
the platform or the enterprise customers like who is it that's doing that is it like technical people 00:54:51.760 |
or is it the people that's using it day to day so one of the things by putting it in the ui what you 00:54:57.760 |
have is you have some of the business users will try to mess around and play with it and some pieces 00:55:03.040 |
like a system prompt like non-technical people have a sense of using chat gpt and doing that 00:55:08.320 |
but when you get to what's my top k for my retrieval or using query reformulation those business 00:55:14.080 |
users are just going to mess up and their agents are going to kind of not work well like that so 00:55:19.040 |
a lot of the settings there as you get to the advanced hill climbing because a lot of that stuff 00:55:25.200 |
comes into like i've built a good rag agent it's hitting 80 percent but the business wants us to 00:55:30.560 |
get to 90 now i have to build the evaluation now i have to see where are my errors are they on the 00:55:35.440 |
retrieval side are they on the generator side and this is where having that developer perspective of 00:55:40.160 |
understanding and being able to do that error analysis is important to figure out which of those 00:55:45.040 |
settings to do it because we give you a lot of settings but you still have to have some sense of 00:55:49.760 |
like which settings are appropriate to change for what outcomes like that um company i work for is 00:55:57.520 |
relatively new to ai we're loving what we're discovering but uh it's also pretty overwhelming 00:56:02.800 |
the information you presented uh hints at some of the answers to our major questions which are 00:56:11.680 |
we have thousands of pdf documents we have thousands of uh you know excel documents we have an mrp system with 00:56:18.800 |
the nescue ball back end we need to be writing custom queries for this environment to consume 00:56:24.400 |
i guess does your environment have a checklist of the approach for different data types and how to 00:56:32.640 |
package this information for your environment yeah i i can repeat back the question like that so 00:56:38.320 |
so the question is is you know and i'll paraphrase here getting into it we have lots of documents we have 00:56:43.760 |
excel documents pdf documents structured data like how am i going to make sense of that 00:56:48.640 |
and figure out what the best way to kind of use your platform is is that fair yes and so this is 00:56:54.320 |
where as a startup we have a couple of different levels we have we have the developer go figure it 00:56:59.760 |
out yourself we've given you the buttons the api commands to do that but this is where we've grown 00:57:04.960 |
and we have a dedicated team that we call customer machine learning engineers that all they do is work 00:57:10.160 |
with customers and they do that process of hey let's walk you through building your evaluation data sets let's 00:57:16.240 |
help you kind of hill climb and improve that so depending on what you need we have a team that's 00:57:21.920 |
just focused on helping customers through that process as well and in the meantime nina and i 00:57:26.880 |
are trying to document it and make it more available for the rest of the users but there is that gap of like 00:57:32.560 |
there's a lot of knowledge to effectively use these systems to be able to do that is that fair 00:57:38.560 |
and is it a consulting fee or a startup package yeah talk to us we'll find a way to take 00:57:44.560 |
your money a little bit uh go ahead hey rajiv um we've built a few agents and i'm very interested in 00:57:58.800 |
exploring the rag only part of like what you described is there a way to integrate these into 00:58:04.720 |
our agents that we're building in typescript javascript absolutely so okay yes 00:58:11.440 |
i don't have to repeat the question so each of the pieces here we have kind of apis we have a javascript 00:58:17.600 |
sdk like that so if you just want to use the components of this this is where when the company 00:58:22.960 |
first started about a year ago if you'd asked to kind of do a demo we would have sold you like you have 00:58:26.640 |
to buy the entire platform you have to get the end to end but we've realized like if we want 00:58:30.800 |
to appeal to developers that lots of times you've built stuff out that you just want some components 00:58:35.280 |
of it so yes we we've kind of modularized that out so if you just want to use parts of it you can and 00:58:41.440 |
integrate with others like that so and in fact you know most of our customers don't use like the ui that 00:58:46.560 |
we've showed most of them have other uis that they're integrating their applications with like that so 00:58:54.240 |
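For developers who only want the query piece inside their own application, here is a minimal sketch of what that call can look like, shown in Python (the speakers mention a JavaScript SDK with the same components). The endpoint path, field names, agent id, and response shape below are assumptions for illustration, so check the current API reference before relying on them:

```python
import os
import requests

API_KEY = os.environ["CONTEXTUAL_API_KEY"]    # key from the getting-started link
AGENT_ID = "your-agent-id"                    # hypothetical placeholder
BASE_URL = "https://api.contextual.ai/v1"     # assumed base URL; verify against the API reference

def query_agent(question: str) -> str:
    """Send one user message to a RAG agent and return the grounded answer text."""
    resp = requests.post(
        f"{BASE_URL}/agents/{AGENT_ID}/query",                      # assumed endpoint path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"messages": [{"role": "user", "content": question}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]                        # assumed response shape

if __name__ == "__main__":
    print(query_agent("What does our refund policy say about partial returns?"))
```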
well in relation to the costing or pricing um how far can the 25 dollars go like 00:59:00.960 |
because we weren't able to test it right now like how many documents can we you know experiment in it 00:59:06.480 |
so the question is like how far will your 25 dollars go i don't know exactly i get unlimited use of it um but 00:59:17.920 |
try it out if you run into issues just let us know we can talk to sales we can figure out 00:59:22.720 |
something like that depending on what you're doing like that so don't let the 25 dollars be an 00:59:27.440 |
issue we want you to use it we want to hear your feedback if you need help we can 00:59:31.920 |
talk to you and figure out something to do like that so yes do people like the idea of a managed rag 00:59:39.040 |
service does this feel like something okay okay okay yes well to that point i guess i'm wondering how 00:59:45.840 |
managed is it so at my company we build rag applications for government and other clients 00:59:50.480 |
who have really strict data sovereignty data residency questions uh or requirements so are you in either of 00:59:58.080 |
the big clouds where i can decide to be in like a gov cloud or some other sovereign data service like 01:00:06.800 |
how much control do i have over where stuff lives yeah so again we're early stage startup we're starting 01:00:13.040 |
out right now we have our own cloud which doesn't help you like that we have partnered with snowflake so 01:00:18.160 |
we are on snowflake if that can work for you we can also install in vpc but right now we're limiting 01:00:25.360 |
it to vpc we're not doing kind of custom on-premises deployment just because that takes a lot of 01:00:31.120 |
upfront work to be able to do like that does that help yeah can you do vpc and say like aws gov cloud 01:00:39.040 |
or azure's government solution we have not yet done like aws in kind of gov cloud like that like if you 01:00:45.760 |
have a strong demand for that let us know we can we can work and try to figure out something like that 01:00:49.680 |
but we haven't taken on that yet okay i'll come find you thanks yes thanks 01:00:56.560 |
hey yeah go ahead i don't know hi i was curious you know like how does the performance of your 01:01:05.920 |
rag platform vary as the number of documents varies for example like do you have like recommendations of 01:01:12.000 |
best practices when you're working with millions of documents as compared to hundreds of documents 01:01:15.600 |
what kind of you know configurations and knobs work better would be curious to learn yeah so i'm 01:01:20.640 |
going to give the hand wavy answer of we have customers like qualcomm for example that have tens 01:01:25.440 |
of thousands of documents their documents have hundreds of pages in them and we handle that fine 01:01:30.240 |
all i know is that i have engineers like matthew in the back who work on the platform and make sure 01:01:30.240 |
it kind of scales up like that so i would say go grab him if you have questions about scalability right now 01:01:42.160 |
we've been able to kind of scale with our customers um along those lines like that so 01:01:46.320 |
let me take this and then i'll come back over here hey this is specifically about the lm unit 01:01:53.440 |
um tool um how deterministic and repeatable is the scoring from that 01:01:59.200 |
i'm looking to see if they have a better answer i mean it's still a large language model so i think there might 01:02:09.360 |
be a little bit of variation but i think it's fairly good is that fair uh it's pretty repeatable there's actually a paper 01:02:17.280 |
released and there's a lot more details in there too but like there was a lot of analysis done about the 01:02:23.200 |
correlation with other trusted metrics and you know running the lm unit 01:02:30.240 |
tests repeatedly um we use kind of a fixed random seed to make sure it's repeatable so 01:02:36.480 |
like all of our testing has suggested that uh it is repeatable of course like 01:02:42.720 |
if you're like altering the prompt you're altering kind of the natural language unit test that 01:02:48.240 |
can have you know unforeseen impacts on the results but i think keeping the prompt consistent it should be 01:02:55.600 |
pretty consistent with different types of queries awesome thank you yeah check out the paper as matthew said 01:03:01.360 |
that it goes into a lot of those pieces like that so thanks 01:03:04.080 |
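A rough sketch of how you might sanity-check LMUnit repeatability yourself: run the same natural-language unit test several times and look at the spread of scores. The client import path, the lmunit.create call, and the score field are assumptions about the SDK, so verify against the docs:

```python
import os
import statistics
from contextual import ContextualAI   # pip install contextual-client; import path is an assumption

client = ContextualAI(api_key=os.environ["CONTEXTUAL_API_KEY"])

query = "What is the standard warranty period?"
response = "The standard warranty period is 24 months from the date of purchase."
unit_test = "Does the response state a specific warranty duration?"   # natural-language unit test

# Run the same unit test several times and look at the spread of scores.
scores = []
for _ in range(5):
    result = client.lmunit.create(query=query, response=response, unit_test=unit_test)  # assumed call
    scores.append(result.score)    # assumed field; scores are roughly on a 1-5 scale

print("scores:", scores)
print("stdev:", statistics.stdev(scores))   # a small spread suggests repeatable scoring for this prompt
```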
question here no so is this already integrated with uh microsoft copilot and azure and the second 01:03:16.480 |
question is uh are there any apis that you guys expose meaning if we are developing our own custom 01:03:23.120 |
code but we want to you know use your rag what's the approach so we have the api so if you're developing your own 01:03:30.080 |
custom solutions you can easily integrate kind of what we have to do that in terms of installing 01:03:36.160 |
inside of kind of um azure like that that's one of those custom vpcs that we can support like that in 01:03:43.920 |
terms of integrations with copilot i don't know if we've really done anything usually like people are 01:03:48.160 |
like i'm sick of copilot and that's why they come to our rag solution like that so again we have the apis 01:03:54.080 |
though so i don't know what the integrations would require for something like copilot like that 01:03:58.160 |
thank you hi first of all a great presentation because i have used your product and the other 01:04:07.200 |
thing is like regarding rag i have a question like uh you as a company what are the challenges that you're 01:04:14.320 |
facing in the rag and where do you see that would go in future so what are the challenges we're facing 01:04:21.040 |
in rag apart from hallucination internet search or things like that apart from that what are the 01:04:26.800 |
different challenges that you're facing matthew i might have you chime in on this too i think 01:04:32.880 |
one of the big challenges right now is around the extraction where we've spent a lot of time and energy 01:04:39.680 |
and there's still i think room for improvement when we're talking about working with complex tables 01:04:45.280 |
along with charts i think there's one there i think scalability is always a piece like that because 01:04:51.680 |
everybody wants to ingest more documents at a faster speed like that 01:04:56.720 |
i think if you talk about doing generation um working with structured data i think that's a challenge for 01:05:05.840 |
everybody in the industry when you're trying to do those queries of like text to sql type queries like 01:05:10.640 |
that so those are some of the top ones on my list like that do you have yeah i agree i think document 01:05:16.720 |
understanding like making sure that you are correctly understanding long complex documents with hierarchy 01:05:22.400 |
and interrelations between sections and really large tables and gnarly figures like i think that it has 01:05:28.160 |
been a consistent thing that we've been working on and seeing improvements on but there's definitely 01:05:32.880 |
still room to grow there um i think another kind of broader change in the rag space that our research 01:05:38.960 |
team is very focused on and it hasn't directly translated into our product yet but it's kind of 01:05:43.680 |
coming soon is moving away from the more like static rag pipelines that raj like beautifully described in his 01:05:51.040 |
presentation toward kind of a fully dynamic flow with you know model routers uh at certain kind of inflection 01:05:57.360 |
points deciding which tools to use how many times to retrieve you know how to alter the retrieval config 01:06:03.920 |
in order to correctly answer the query so i think those sorts of like dynamic workflows will greatly increase 01:06:10.240 |
what you can answer with any rag platform um almost like deep research for rag uh is one way to think about it 01:06:18.720 |
and i think that's something that our research team is working very hard on and will be like coming into our 01:06:23.600 |
platform in the near-ish future great thanks come back in six months 01:06:30.720 |
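A toy illustration of the dynamic-flow idea described above: a lightweight router that picks retrieval settings per query before the usual RAG pipeline runs. The categories and config values are invented for illustration and are not product features:

```python
# Toy router: pick retrieval settings per query before running the normal RAG pipeline.
# Categories and config values are invented for illustration, not product settings.

def route(query: str) -> dict:
    q = query.lower()
    if any(w in q for w in ("compare", "versus", "trend", "over time")):
        # analytical questions: cast a wider net and lean on reranking
        return {"top_k": 20, "rerank": True, "reformulate_query": True}
    if len(q.split()) <= 6:
        # short lookup-style questions: a narrow retrieval is usually enough
        return {"top_k": 5, "rerank": False, "reformulate_query": False}
    return {"top_k": 10, "rerank": True, "reformulate_query": False}

for q in ("warranty length?", "compare q1 and q2 revenue trends across regions"):
    print(q, "->", route(q))
```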
do you have plans for a public facing like a web search feature particularly one that could be configured 01:06:41.040 |
so are you looking for kind of a general i need to be able to search the internet because like 01:06:48.880 |
there's tavily firecrawl places like that or is it more that companies that have public facing 01:06:54.400 |
websites where it makes sense to instead of running rag on your like cms database you actually just 01:07:01.040 |
want to run it on that live and so john i might pull you in here because i think one of the things 01:07:06.560 |
john's worked with quite a bit because a lot of what that requires is the integrations to be able to pull 01:07:11.520 |
from those sources as well and so i know john you've done some of those things for 01:07:16.240 |
customers with firecrawl and pieces like that yeah sorry what was the question for me again about ingesting 01:07:21.280 |
customer websites and being able to use that in rag oh yeah so um there's a lot of like uh ways that 01:07:26.640 |
you can scrape websites or like directly um kind of set up like an etl pipeline with like 01:07:31.920 |
pulling the records from the apis or just pulling the unstructured data from like a data cloud um or blob 01:07:37.840 |
storage and then kind of just setting up like an ingestion queue which is what we do on our side for 01:07:43.040 |
larger customers um or basically any customer that needs like a kickstart for lots of data coming in 01:07:47.840 |
some customers are like super happy to write their own scripts or like their own kind of functions or 01:07:52.960 |
daily cron jobs to update uh documents but we're also working on like a managed solution on our side so 01:07:58.640 |
that you can just give us your credentials on our front end and then we'll just start pulling 01:08:02.720 |
and syncing all the data on our back end but that's like something coming on the roadmap in like two to 01:08:06.480 |
three months and so please if there's pieces like that you see that you want us let us know because 01:08:12.400 |
we like to kind of push the product team to kind of make that stuff happen so all right yeah hi hi 01:08:20.400 |
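A minimal sketch of the scrape-then-ingest pattern John describes, assuming you fetch pages yourself (or via a crawler like Firecrawl) and push them into a datastore on a schedule. The ingestion endpoint, field names, and datastore id are assumptions, so check the API reference:

```python
import io
import os
import requests

API_KEY = os.environ["CONTEXTUAL_API_KEY"]
BASE_URL = "https://api.contextual.ai/v1"    # assumed base URL
DATASTORE_ID = "your-datastore-id"           # hypothetical placeholder

PAGES = [
    "https://example.com/docs/pricing",
    "https://example.com/docs/support",
]

for url in PAGES:
    html = requests.get(url, timeout=30).text    # simple fetch; swap in a crawler like Firecrawl for real sites
    resp = requests.post(
        f"{BASE_URL}/datastores/{DATASTORE_ID}/documents",          # assumed ingestion endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": (url.rstrip("/").split("/")[-1] + ".html",
                        io.BytesIO(html.encode("utf-8")), "text/html")},
    )
    resp.raise_for_status()
    print("queued", url, resp.json().get("id"))
# Schedule this (e.g. a daily cron job) to keep the datastore in sync with the site.
```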
how do you deal with uh frequently updated content and document level permissions if not everybody can see 01:08:28.480 |
specific documents yes so those are both tricky things one of the things we'll do for um 01:08:36.400 |
frequently updated documents is we have a continuous ingestion pipeline john who you can talk to in the 01:08:41.280 |
back can tell you more about that piece as well so that's one piece for the frequently updated for 01:08:47.920 |
the other piece is the harder one entitlements how do you deal with all the permissions right like you're 01:08:53.360 |
indexing hr rag stuff next to customer support stuff how do you make sure that the person doesn't look 01:08:59.680 |
up the salaries of anybody else like that so this is where within our platform one of the things we're 01:09:04.880 |
adding is an entitlements layer like that because as we've talked to lots of customers they're like 01:09:10.160 |
this is nice when you do it over our customer support but inside of a real enterprise we have 01:09:15.440 |
governance we have permissions we have to be able to respect that when we do these types of searches 01:09:20.240 |
and so we're adding an entitlements layer on top of ours to be able to handle that so yeah that's an 01:09:25.600 |
important piece all right hi uh so my question is regarding uh in the last six months what are some of the 01:09:31.360 |
major breakthroughs in uh rag area what are the major breakthroughs in the last six months in the rag area 01:09:38.240 |
so i think the re-rankers continue to get better i think that's an easy place um where we've seen 01:09:47.520 |
that i mean i think part of it is just every piece of that has steadily been getting better i think one 01:09:52.800 |
of the biggest changes you've seen in the last six months year is the rise of the vision language models 01:09:57.360 |
and how strong they are at being able to handle kind of images like that that's off the top of 01:10:02.480 |
my head i see matthew's busy talking to somebody else otherwise i'd have him weigh in as well does that help 01:10:07.520 |
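For context on the re-ranker point, here is a hedged sketch of calling a standalone rerank endpoint over candidate passages; the endpoint path, payload shape, and model name below are assumptions and may not match the current API:

```python
import os
import requests

API_KEY = os.environ["CONTEXTUAL_API_KEY"]
BASE_URL = "https://api.contextual.ai/v1"    # assumed base URL; check the API reference

query = "maximum operating temperature of the x200 sensor"
documents = [
    "The X200 sensor is rated for -20C to 85C operation.",
    "The X100 draws 3.3V and ships with a plastic enclosure.",
]

resp = requests.post(
    f"{BASE_URL}/rerank",                     # assumed endpoint, payload shape, and model name
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": query, "documents": documents, "model": "ctxl-rerank-en-v1-instruct"},
)
resp.raise_for_status()
for result in resp.json().get("results", []):  # assumed response: document indices with relevance scores
    print(result)
```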
yes um can the contextual parser module replace like doc lane or gsp invoice parsers and 01:10:17.520 |
things like this and if it does does it have the ability to parse qr codes and barcodes and extract 01:10:24.640 |
their metadata to kind of read that data as well yes so i think one of the things is there's a lot of 01:10:31.920 |
different types of documents out there in terms of document types and there's a lot of different 01:10:35.520 |
solutions so we're not going to be able to handle every type of thing perfectly like that i 01:10:39.520 |
think there's going to be lots of pieces but this is where we've made those apis so you can try that out 01:10:44.320 |
on those pieces like we haven't specifically focused on for example that type of document we do have the 01:10:50.720 |
image captioning so maybe it would pick them up maybe it wouldn't like that so but the idea here is to go 01:10:56.400 |
in that same space where there's a lot of these other companies doing this parsing thing and making sure that 01:11:01.200 |
we have a module that you can just stick in and replace and kind of use with ours like that 01:11:05.120 |
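A hedged sketch of using the parser as a standalone module on a single document, in the submit-then-poll style common for parse jobs; the endpoint names, job flow, and response fields are assumptions, so confirm them against the docs:

```python
import os
import time
import requests

API_KEY = os.environ["CONTEXTUAL_API_KEY"]
BASE_URL = "https://api.contextual.ai/v1"    # assumed base URL

# Submit a PDF to the standalone parser, then poll for results.
# Endpoint names, job flow, and response fields are assumptions.
with open("invoice.pdf", "rb") as f:
    job = requests.post(
        f"{BASE_URL}/parse",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"raw_file": f},
    ).json()

job_id = job["job_id"]                        # assumed field name
while True:
    results = requests.get(
        f"{BASE_URL}/parse/jobs/{job_id}/results",
        headers={"Authorization": f"Bearer {API_KEY}"},
    ).json()
    if results.get("status") == "completed":
        print(results.get("markdown_document", "")[:500])   # parsed markdown output, assumed field
        break
    time.sleep(5)
```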
yes how do you deal with the domain specific language that's a good question how do you deal 01:11:14.560 |
with domain specific language and this can get a little tricky and we've seen it for example when 01:11:19.360 |
we're working with technical companies that have very specific words that they use inside there where 01:11:26.240 |
on one hand you can do things like changing system prompts a little bit for the retriever part but 01:11:33.200 |
sometimes what we've done with some of those customers is fine-tune that grounded language model 01:11:37.920 |
that's closer to their vocabulary and how they speak and that's been one technique we've used now and this 01:11:44.080 |
is where at one point in the platform we had that available to end developers but fine-tuning models can get a 01:11:49.280 |
little complex for using for rag so we've kind of put that away now and that's more for kind of our 01:11:54.160 |
service-oriented customers that work with our customer machine learning engineers but that's one thing 01:11:58.720 |
we've done to if that knowledge is out there and you need to get that into your grounded language model 01:12:04.000 |
is to do a kind of a fine-tuning process step go ahead oh i don't know where the mic is we're making them 01:12:11.600 |
run getting their steps in today hey um so how about hipaa regulations like uh phi data like this kind of 01:12:23.760 |
stuff like how do you deal with that should we hash it before it goes to the platform or can the llm 01:12:31.520 |
understand and deal with that so there's two aspects to that one is we just got hipaa certified 01:12:37.360 |
that people are going to think i'm paying the audience members out there um like that the second 01:12:42.080 |
is well what do you do in extraction like maybe i have phi data i want you to automatically mask or 01:12:47.920 |
filter that so this is one thing that we've been talking with a product team about whether to include 01:12:52.080 |
that capability during that parsing piece um to automatically kind of flag and identify that 01:12:58.560 |
behavior right now we don't kind of have that in the product but again like if that's something you 01:13:03.040 |
need or want let us know it's not that hard to kind of build on but you can also do that as a supplement 01:13:08.640 |
to the parsing and have a second job that runs and looks for that pii information does that help good 01:13:14.240 |
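A minimal example of the kind of supplementary PII pass described here, run over parsed text before indexing; the regex patterns are purely illustrative and nowhere near sufficient for real PHI compliance:

```python
import re

# Illustrative patterns only; a real PHI/PII pass would use a vetted library or service.
PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace matched identifiers with typed placeholders before ingestion."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

parsed = "Contact Jane at jane.doe@example.com or 555-123-4567. SSN 123-45-6789."
print(mask_pii(parsed))
```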
uh i know garbage in garbage out but if we have a document with say two different numbers on a particular 01:13:22.560 |
like statistic or fact how does your rag deal with that or how would it answer a question yeah 01:13:28.720 |
so what what happens in that case is often the retrieval is going to bring both of those pieces 01:13:34.480 |
of information over to the grounded language model and it's going to be up to the language model to use 01:13:40.640 |
its reasoning ability to try to reason out the two differences so sometimes if one is for example if it's 01:13:47.040 |
obvious that one is not right let's say it's the size of a mosquito and one of the answers 01:13:52.960 |
is 100 feet and the other is like three millimeters the language model is going to 01:13:58.000 |
know like one of these is junk and to ignore it but if they're both close and reasonable yes that and 01:14:03.040 |
that can get you in trouble um with those pieces like that so i assume this is a pain that you're having now 01:14:11.040 |
so we're just trying to see how we can organize that um because a lot of my job right 01:14:21.680 |
now is just telling people information that should be in a document but it's not updated so one thing 01:14:28.160 |
that can help with that is taking advantage of metadata and using as much rich information about those 01:14:34.320 |
documents so when at retrieval time it can help it kind of figure out like hey these are the two 01:14:39.200 |
differences and this is why i should prioritize this answer over another answer like maybe this is the 01:14:43.760 |
most recent or this was written by the authoritative person like that's like one thing 01:14:49.440 |
that we kind of recommend for those use cases like that thank you all right okay well thank you all for 01:14:56.880 |
staying i've loved the kind of questions like this please use the application and give us feedback 01:15:03.040 |
on how you're finding that as well good bad negative happy to take that all like that so thank you 01:15:08.880 |
again for kind of spending your morning with us like that anything else we have nina or we good oh 01:15:13.440 |
you've got the mic off oh i think i still have my mic uh yeah i've been going around distributing swag 01:15:19.200 |
to folks that have asked questions sorry if i missed some of you that were further away i have one more 01:15:24.080 |
pair of socks if anyone wants them the guy in the back wants the socks all right yeah all right thank you 01:15:34.000 |
all we'll be hanging around for a little while if you have that so