
Forget RAG Pipelines—Build Production Ready Agents in 15 Mins: Nina Lopatina, Rajiv Shah, Contextual



00:00:00.000 | welcome everybody thanks for showing up bright and early this morning we're gonna make your
00:00:18.840 | life easier for RAG so just hang in there if you want to get started we have the link right
00:00:24.480 | there contextual.ai/aie25 that's gonna get you a notebook it's gonna get you the getting started
00:00:31.980 | link which will get you an API key that you'll be able to use for the notebook so feel free to jump
00:00:36.660 | ahead and start on those pieces like that all right so let me first introduce the team that's
00:00:44.100 | here so I'm Rajiv Shah I'm the chief evangelist of contextual AI so I do a lot of these talks
00:00:50.580 | workshops I endlessly talk about AI which is kind of why I'm in this position for background I've been in
00:00:57.240 | the AI space for a while before this I was over at hugging face for a number of years as well so
00:01:02.700 | besides me talking we also have Nina you want to introduce yourself this is my fourth week at
00:01:11.940 | contextual so very exciting for me thank you all for joining us this morning I've been working in NLP and
00:01:18.780 | language modeling for the past seven or eight years really since BERT was considered a large language
00:01:25.440 | model and I'm excited to show you how you can simplify your RAG pipeline with contextual today
00:01:31.920 | thanks and besides Nina we also have two other members of our team that are in the back that if
00:01:38.640 | you raise your hand and have questions they will come to you so one of them is Matthew who's a platform
00:01:43.440 | engineer so if you have questions about how we scale up what are the things going on in the back
00:01:48.480 | end he's the person for you we also have John on our team who's one of the solution architects so if
00:01:54.240 | you're like hey how would I integrate this into my environment he's the guy who's gonna be able to handle that as
00:01:58.800 | well so let's kick this off what I want to do is for you to all get one of the big pieces out of
00:02:06.720 | this is we can treat RAG like a managed service we don't go out anymore and train our own large
00:02:13.620 | language models if we're working with embeddings we don't build our own vector databases same thing with
00:02:20.260 | RAG we can treat RAG just like any other managed service and so what we're gonna do today is I'm gonna do this quick
00:02:27.360 | introduction here but then we're gonna get you started right away with building this RAG agent
00:02:31.360 | with ingesting some files and Nina and I will go back and forth where I'll go and give you a bit more
00:02:36.480 | of an overview of contextual you'll build an agent run a number of queries against it get a feel for it
00:02:43.200 | and then I'll do a deep dive on some of the advanced settings that developers like around extraction re-ranking
00:02:49.520 | retrieval like that and we'll end with a little bit on evaluation as well as how you can use MCP for
00:02:56.560 | example connect your RAG agent to Claude desktop as well now to start with we all know the value of AI
00:03:04.960 | right we're all here because of the importance of AI but for many of you also know the struggles of AI
00:03:11.680 | how it's easy to build that demo with 10 documents when you're doing RAG but when you go to scale that
00:03:18.160 | out all of a sudden it's hard to extract across thousands of documents of diversity right your accuracy
00:03:24.560 | isn't there your users are complaining that they aren't happy with it because they don't know how
00:03:28.240 | to query properly now we at contextual have focused on this so our founders
00:03:38.080 | here douwe and aman come from working on rag for a long time douwe was an author leading the team on the
00:03:45.680 | initial rag paper aman was with him at meta i knew both of them from my time at hugging face where i was
00:03:52.720 | helping people do rag years ago on question answer systems and i would be able to pull their researchers
00:03:58.960 | in to help work with customers as i was cobbling together kind of open source components to build these
00:04:04.000 | pipelines well they saw the need and they wanted to change how the world works and so contextual has
00:04:10.000 | come around for focusing on enterprise and we still have this dedication to ai research and you'll see
00:04:15.760 | that as we go through our product as well now again just the basics just in case there's
00:04:23.840 | somebody in here not familiar with it right when we're talking about rag retrieval augmented generation
00:04:29.520 | the value here is when you have lots of that unstructured enterprise data we want to be able
00:04:35.840 | to understand it so rag allows us to take that information a very simple rag pipeline uses
00:04:43.760 | a vector database to keep all of that use cosine similarity to find similar pieces pass it to an llm
00:04:50.560 | that's the simplest we're going to get much more complicated today like that
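As a toy illustration of that simplest pipeline, the retrieval step boils down to something like the sketch below; the embeddings here are random stand-ins rather than output from a real embedding model or vector database.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in data: in a real pipeline the vectors come from an embedding model
# and live in a vector database.
chunks = [
    "NVIDIA data center revenue grew strongly in fiscal 2023.",
    "Gaming revenue declined in the same period.",
    "Unrelated boilerplate text.",
]
chunk_vecs = [np.random.rand(8) for _ in chunks]  # pretend chunk embeddings
query_vec = np.random.rand(8)                     # pretend query embedding

# Rank chunks by similarity and keep the top ones to pass to an LLM as context.
scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
top_chunks = [c for _, c in sorted(zip(scores, chunks), reverse=True)[:2]]
print(top_chunks)
```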
00:04:55.760 | and what i want to do is show off our platform to you the platform is built for a couple of different levels so one is there's
00:05:02.880 | a no code so if you're a business user you just have a bunch of docs you want to be able to ask
00:05:07.200 | questions to them we've made it very simple with opinionated defaults on that now if you're a developer
00:05:13.440 | on the other hand you know what you want in your rag pipeline you're spending time evaluating
00:05:18.240 | it you're spending time tweaking it well we've built a platform where you can then orchestrate
00:05:24.160 | and change how do i want to do querying how do i want to do generation along the way and finally
00:05:30.320 | some of you might already have rag you might already have rag pipelines in production where there's just
00:05:37.520 | one component that isn't working that well maybe your extraction isn't working that well or you want
00:05:42.080 | a better re-ranker well we've made our system modular so if you just want to use one piece of it you can
00:05:49.920 | and all we focus on is rag other companies focus on other pieces we just think about building rag and
00:05:57.680 | again the reason we're here is as you go and try to do this in production you'll see like once you get
00:06:02.640 | to lots of documents it gets messy all of a sudden then you're orchestrating a big environment full of
00:06:07.760 | lots of different models now i need a bm25 i need a re-ranker putting all that stuff together ends up sucking up your
00:06:14.880 | time it's fun to do the first time but after that it ends up sucking up your time ends up with slow kind
00:06:20.960 | of costly um rag pieces so that's enough to get started i'm going to turn it over to nina
00:06:28.240 | who's going to get you started on building that first agent
00:06:37.920 | all right we're on the clock 15 minutes to get our agent up and running um just kidding we'll take
00:06:43.680 | a little longer than that to explain everything but how you can get started is um you can find the
00:06:49.440 | notebook and the getting started page at contextual.ai slash aie25 and then the gui for contextual is at
00:06:59.600 | app.contextual.ai that's what that getting started button will point you to and so we're going to start by
00:07:06.160 | loading a few documents we're going to load some financial statements from nvidia and some fun
00:07:12.480 | spurious correlations to see how our rag system treats data that's not fitting with conventional wisdom
00:07:19.360 | and then we'll try out a few queries uh including quantitative reasoning over tables and some data
00:07:26.160 | interpretation and so i'm going to have these two screens here side by side the notebook on the left and the
00:07:35.680 | the platform on the right so you can get started by signing up in the platform if you haven't already
00:07:43.120 | that's at app.contextual.ai that's where you're going to get your api key so everyone take maybe
00:07:49.840 | a minute or so you'll need to log into that and you'll need to set up your workspace i already have my
00:07:55.760 | workspace set up and i'm actually using uh not my work account just so that you'll see exactly
00:08:02.880 | what a general access user would see on their screen on my screen and just a tip for setting up those
00:08:10.480 | workspace names it's going to have to be a unique name so you can't call it aie-demo i just throw my
00:08:17.680 | name into the names just to make it a little bit more unique so um so everyone just uh take a minute to do
00:08:24.480 | that and uh to help me get a sense of where folks are just briefly raise your hand and put it down
00:08:31.040 | once you have your workspace set up and you see this welcome screen
00:08:38.640 | okay folks are making fast progress um yes matthew
00:08:53.280 | so yeah so we can start by in the notebook doing our pip installs oh start by making a copy of the
00:09:12.160 | notebook so you can just go to file save a copy in drive and then that will create a copy for you of
00:09:22.080 | the notebook so that you can save whatever changes you're making to it if you want to make any changes
00:09:26.800 | it is a bit slow making a copy um but yeah after the pip installs and the imports
00:09:36.880 | to get your api key you go into your contextual app click on api keys here on the right
00:09:46.720 | save a copy of the notebook and then in our contextual platform on the right you can go to api keys
00:09:54.720 | click on create api key so nina key 2 is my name and then we'll copy that and then we can go to our
00:10:07.440 | notebook secrets on the left here the key there and then add a new secret and
00:10:13.120 | then you'll just add contextual underscore api underscore key yeah it's refreshing
00:10:21.280 | but if yours is working then uh you can i'll just do it in the notebook without making a copy
00:10:31.040 | so that's what we're going to do
00:10:44.880 | if you want to do rag on your own it's often an api scavenger hunt just to get started and this is
00:10:49.680 | literally the only api key we're going to use for the whole rest of the workshop so we have just finished
00:10:55.120 | the hardest part of setting this up
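In the Colab notebook, reading that secret back out looks roughly like this, assuming you named it CONTEXTUAL_API_KEY as described above; outside Colab an environment variable works the same way.

```python
import os

try:
    # In Google Colab, secrets added via the key icon are read through userdata.
    from google.colab import userdata
    api_key = userdata.get("CONTEXTUAL_API_KEY")
except ImportError:
    # Outside Colab, fall back to an environment variable.
    api_key = os.environ["CONTEXTUAL_API_KEY"]
```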
00:10:58.080 | so once we've saved those credentials then we're going to set up our client here so we're just going to
00:11:08.880 | run the client in our notebook so here i'm on the left in the notebook and then we're going to create
00:11:15.280 | our data store so you can change the name on this data store name variable and then it's going to see
00:11:25.200 | if there's an existing data store and load that one
00:11:32.000 | and then after that which seems to still not be running
00:11:45.760 | yeah after the pip installs and the imports
00:11:52.800 | that data store is going to be where you're going to be loading all of your documents
00:12:00.960 | and then we have some code here in step two that will download those files that i mentioned earlier
00:12:09.920 | to your notebook so that's going to be a few quarterly reports and
00:12:14.400 | spurious correlation reports and so here in step two you'll see that it's fetching the data
00:12:21.920 | and then uploading it to the data store and then you can go to your app.contextual.ai
00:12:28.160 | and this is the aie demo data store that i made just here so if you click on your documents you'll
00:12:37.520 | see that they're processing and once they're done processing you can click on these three dots and
00:12:46.480 | then once the files are loaded you'll be able to inspect them and see how those files have been parsed and
00:12:53.200 | ingested so i'll just take a quick look at the run i did yesterday of all these same exact files
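The upload loop in step two is roughly this shape; the local paths below are placeholders for the NVIDIA quarterly reports and spurious-correlation PDFs the notebook downloads, and the SDK calls are my best reading of the documents API.

```python
from pathlib import Path

# Placeholder directory: step two of the notebook downloads the PDFs here.
for pdf in Path("data").glob("*.pdf"):
    with open(pdf, "rb") as f:
        doc = client.datastores.documents.ingest(datastore_id, file=f)
    print(f"submitted {pdf.name} as document {doc.id}")

# Ingestion runs asynchronously, so check document status before querying.
for d in client.datastores.documents.list(datastore_id):
    print(d.name, d.status)
```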
00:12:59.680 | and so if you click on inspect once your files are loaded for example if you have a table you're going
00:13:10.960 | to see the raw text and then also the rendered preview so you can see this table
00:13:16.160 | it has both the structure and the contents of the table in this image below and you're welcome to take
00:13:23.200 | a look at these the numbers match perfectly and so does the structure so that's kind of step one of getting
00:13:29.200 | your rag system set up is having really accurate parsing and you can look at some of the image files we
00:13:36.880 | chose some spurious correlations which are a fun way to test how much a rag system will actually
00:13:42.000 | answer based on your own documents rather than its conventional wisdom and so you can see
00:13:48.080 | there also there will be raw text and rendered previews with like key data points from figures and
00:13:56.240 | summaries of the figures in addition to the text and other information extracted so while your files are
00:14:04.320 | ingesting raj will come back and expand on the platform all right it seems to me i need to update
00:14:17.760 | this slide with wi-fi issues as well but any of you who've built rag systems before know that there's several
00:14:24.960 | components of it there's a lot of tasks you have to do right you have to think about building out an extraction pipeline
00:14:31.120 | besides that when you're thinking about accuracy how am i going to chunk out what am i going to do
00:14:36.160 | for chunking what am i going to do for re-ranking and then finally have to think about scaling it there's
00:14:40.960 | a lot of these issues that come up and again it's interesting the first time you do it but maintaining
00:14:45.760 | this stuff after a while gets tiresome and that's where contextual comes in where it's really an end-to-end
00:14:51.520 | platform for managing rag and our platform runs in our sas it can run in your vpc we have a ui for
00:14:59.360 | those business users but we also have an api rest api endpoints as well and when within the platform
00:15:06.720 | we manage all those pieces for you so you don't have to worry about maintaining a vector store for
00:15:10.960 | your embeddings and the way it works is you'll come with your structured and unstructured data
00:15:15.760 | we bring that in we have a document understanding pipeline and that's the extraction part that we're
00:15:21.920 | going to walk through here where we need to cleanly be able to pull all that information out of your
00:15:26.640 | documents your tables your images once we have that we chunk that up and then we go through
00:15:32.960 | the best practices around retrieving so we have a mixture of retrievers so bm25 as well as an embedding
00:15:39.760 | model we have a state-of-the-art re-ranker that we've trained ourselves that's in our pipeline and then we
00:15:46.240 | pass it all to a grounded language model now we don't use a model from open ai or gemini we trained our own
00:15:53.520 | grounded language model because for rag use cases we want the model to be grounded we want it not to
00:16:01.680 | give its own advice just because it can pass the law exam or know everything about medicine we don't
00:16:06.800 | want it to use its knowledge when it's answering questions we want it to instead stay grounded
00:16:12.000 | respect the context that we're giving like that and finally there's lots of different ways we can output this
00:16:18.480 | and i'll show you this along the way as well now for each of these parts we have that academic
00:16:24.720 | pedigree as i mentioned earlier so one thing we always do is look at the academic benchmarks nowadays
00:16:29.440 | we have lots of customer data set points that we use as well but whether we think about the end-to-end
00:16:34.720 | accuracy of our platform or the specific parts around document understanding retrieval grounded
00:16:40.880 | generation each one of those we focused on to make them state-of-the-art for what they do like that
00:16:47.600 | so once we've done that when you go to use them there's a lot of different uses that you can have
00:16:55.200 | for this i think often when we think of initially of of rag we think about i'm going to build a question
00:17:01.920 | and answer chatbot right like i'm going to ask a question get back an answer well yes that's one
00:17:08.240 | use of what you can do with this but we have customers like qualcomm who use contextual on their
00:17:15.040 | website so if you go live to their website right now and you ask a question there it's powered by
00:17:20.960 | contextual on the back end and you can see similarly the question the answers right our feedback modules
00:17:27.040 | sources are all available there besides that we have other customers for example some folks in the
00:17:34.480 | financial spectrum and they're thinking about automated workflows how can i take all the
00:17:42.080 | unstructured data i have structure it to make it useful for folks well we have apis you can hook up
00:17:49.680 | those apis be able to run them against here is it good ah okay no worries but we can take all that
00:18:01.200 | unstructured use our rag agents in the context of something like a spreadsheet or some other workflow
00:18:06.320 | to be able to use that finally we can connect it with lots of other tools
00:18:10.960 | i'm going to put this in there even though i know it's not going to work
00:18:16.400 | but just for all of you we can show you and i'm going to show you at the end here today how you can
00:18:20.880 | for example integrate the rag agents with something like claude with mcp so that way you can take
00:18:26.960 | advantage of the enterprise knowledge that you have in there all right so with that i'm going to hand it
00:18:35.200 | back over to nina and let's get you started on your agents
00:18:45.280 | looks like the wi-fi might be down
00:19:03.920 | it looks like the speaker wi-fi and my hot spot are not working so we'll just continue with the
00:19:25.600 | notebook and we'll talk through it and then i guess is anyone else's wi-fi working raise your hand if you
00:19:32.400 | have wi-fi okay well you guys can follow along you have the videos too right mm-hmm i do have oh oh
00:19:40.240 | it's back up okay cool um so yeah where we were in the notebook on the left here is um hopefully all of
00:19:50.000 | your uh data stores have loaded now um yes so assuming you had wi-fi to upload them uh these are the ones i just
00:20:01.600 | loaded in the workshop earlier and then in the notebook on the left you can see um that you can
00:20:06.880 | access these files uh through the api just the same as you can through the gui and then you can have the
00:20:12.480 | document metadata like when it was created the name uh the status of the ingestion and now we're going to
00:20:19.120 | create our agent uh so over here we have our system prompt uh this is kind of just our default system
00:20:25.360 | prompt that the agents are loaded with and then you can run this next block of code here on the left
00:20:31.920 | to set up your agent you can change the name if you'd like i called it demo-aie and then in on the right
00:20:38.800 | in our gui if you click on agents uh there it is there's my new agent right here that i just created
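Creating the agent programmatically is roughly one call; the prompt below is a shortened stand-in for the notebook's default system prompt, and the parameter names are my best reading of the SDK.

```python
system_prompt = (
    "You are a helpful assistant. Answer only from the retrieved documents, "
    "cite your sources, and say so when the documents do not contain the answer."
)  # shortened stand-in; the notebook ships a longer default prompt

agent = client.agents.create(
    name="demo-aie",
    description="Q&A over NVIDIA financials and spurious-correlation reports",
    system_prompt=system_prompt,
    datastore_ids=[datastore_id],
)
agent_id = agent.id
```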
00:20:45.840 | and um and then you can put in your question based on these documents so for example um one question
00:20:54.640 | that we can ask is what was nvidia's annual revenue by fiscal year 2022 to 2025 and so uh what you'll
00:21:03.360 | notice here is we have our responses what we were provided was quarterly data so what the agent is
00:21:10.400 | doing is adding that up and then listing it here on the right and then noting that this is based on
00:21:16.240 | quarterly data and then you'll see these little numbers in a circle the two and the one so if you
00:21:20.880 | click on the two here you'll actually see the image that this data came from and similarly if you click
00:21:29.440 | on one you'll see the other image so interestingly this came from two separate files that were unrelated
00:21:34.960 | in any way they actually don't even have like a standard naming convention for these i pulled these
00:21:39.200 | from the nvidia financial reports website and it knew to reference those documents to pull that
00:21:45.440 | information across documents and then you can do the same exact thing that we just did in the gui
00:21:52.160 | programmatically so here in step four we're running our query uh and we're getting the exact same
00:21:58.960 | response in the api so this can make it easier to integrate it into your application
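That same query through the API is roughly the following; the response and retrieval field names are my best reading of the SDK, so check the notebook for the exact attributes.

```python
query_result = client.agents.query.create(
    agent_id=agent_id,
    messages=[{
        "role": "user",
        "content": "What was NVIDIA's annual revenue by fiscal year, 2022 to 2025?",
    }],
)

# The generated answer, plus the retrieved chunks it was grounded on.
print(query_result.message.content)
for retrieval in query_result.retrieval_contents:
    print(retrieval.doc_name, retrieval.page)
```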
00:22:05.120 | and so we can also here visualize which files were used to reference in the api and so now we're printing
00:22:14.480 | the documents here the base64 encoding is getting returned with another api query so just because it's
00:22:22.560 | a more fun and visual demo uh we'll ask the next few questions via the gui here so uh nvidia used to
00:22:31.280 | be a gaming compute company so i'm gonna ask the chat bot when did nvidia's data center revenue overtake gaming revenue
00:22:39.680 | and so um the crossover point was in q1 fiscal year 23 uh with data center revenue at 3.7 billion and gaming
00:22:51.520 | revenue at 3.6 billion and then here i can click on the one here so that's q1 fiscal year 23 and we can look at our
00:22:59.600 | slide here and here it is that's actually like the exact crossover point where uh data center revenue overtook gaming revenue
00:23:08.880 | now we're going to look at the spurious correlation files uh that we've also loaded so for those of you
00:23:16.880 | unfamiliar this is a website that uh basically data crawls and finds correlations between things that have
00:23:24.080 | no correlation so we're searching for what's the correlation between the distance uh between neptune
00:23:31.120 | and the sun and burglary rates in the us that would be some extreme astrology there if that
00:23:38.320 | actually determined burglary rates and so what you'll find is that our rag system that's really really
00:23:46.160 | focused on avoiding hallucinations um and really following the data that you've loaded it will give
00:23:52.320 | you that correlation coefficient which is very high but then it will also share that context um so
00:23:58.800 | otherwise elsewhere in the document they talk about how they're doing data dredging and um that this is not
00:24:05.200 | really a valid statistical correlation and so you can then also reference where in that document these claims
00:24:12.240 | are made both the statistical correlation and the uh and the caveats which are not loading i think from wi-fi
00:24:22.800 | issues but you can test this out in your own instance uh as well um we have another fun one that was uh what's the
00:24:31.760 | correlation between global revenue generated by unilever group and google searches for lost my wallet that's
00:24:38.880 | another spurious correlation that we loaded and it will do the same thing it will give you the actual answer
00:24:45.760 | and then note the caveats and i'll note that if you run these same queries in chat gpt here on the right
00:24:53.680 | if you just ask the question without any files not doing rag it will just start with telling you there's
00:25:00.240 | no meaningful correlation we know that's true but think about if your own documents don't follow
00:25:05.680 | conventional wisdom or have some information that's not out there it's going to argue with the information
00:25:10.560 | in your documents rather than presenting it even though it may not be may not actually be true in this case
00:25:18.000 | and then if you do a document upload and use the long context in chat gpt you will get that response of the
00:25:26.560 | actual correlation but then it will just say it's a spurious correlation and it won't really go into why
00:25:33.600 | so it's not really going to um you know a simple system like this is not really going to hew to the
00:25:39.280 | facts that you've presented it with um and uh for a fun question we just searched about uh global revenue from
00:25:50.160 | unilever group and google searches for lost my wallet does this imply that unilever group's revenue is
00:25:56.320 | derived from lost wallets and it will answer no that's not true um which we know and then if we ask
00:26:03.520 | another query that relates to different documents that don't have any information shared between them
00:26:09.520 | now we're looking at the correlation between the distance between neptune and the sun and global revenue
00:26:14.800 | generated by unilever group and it will not answer that question it will just say it does not have
00:26:19.680 | that information and so you know these are all uh pretty straightforward questions and answers but um
00:26:27.360 | these were set up with our default settings but if you go to your agent and you click on edit in the admin panel
00:26:34.880 | there's actually a lot that you can change if you know for example that last question i asked if there was
00:26:40.960 | data that supported it and it wasn't found or something uh you can check which data stores are
00:26:45.920 | linked so you can link multiple data stores you can adjust the system prompt and that is also something
00:26:52.560 | you can do via api that we'll go over briefly later there are some settings on query understanding so we
00:26:59.520 | disabled multi-turn in our setup just to have everyone have the same consistent responses as they're trying
00:27:04.960 | to make the workshop easier to follow um you can set up query expansion or decomposition uh you can set
00:27:14.400 | some retrieval settings settings for the re-ranker settings for the filter including the prompt
00:27:21.520 | generation settings and you can even set up a user experience where you have suggested queries for the user
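Most of those same knobs can also be set through the API when you update the agent; the nested config keys below are illustrative only, so verify the exact schema in the API reference before relying on them.

```python
# Illustrative only: the real agent_configs schema is documented in the API reference.
client.agents.update(
    agent_id=agent_id,
    agent_configs={
        "retrieval_config": {"top_k_retrieved_chunks": 100},  # hypothetical key names
        "reranker_config": {"top_k_reranked_chunks": 15},     # hypothetical key names
        "global_config": {"enable_multi_turn": False},        # hypothetical key names
    },
)
```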
00:27:28.240 | um and so uh going back to our notebook here um what we have next in the notebook is going to be
00:27:39.120 | examples of code for some of the components so these are the individual components that make up the
00:27:45.200 | full rag system we won't run through this in the workshop i'm just going to share what we have here
00:27:50.640 | before raj does a deeper dive into what these components can do so for each component we've shared a link
00:27:57.200 | to a more complete notebook that lets you use more features of that component we have our benchmarks
00:28:04.160 | that compare it to other solutions and then we have a little bit of example code that will run in the notebook
00:28:09.680 | to try out these components alone and you don't need to set up anything new it's the same api key that we set up before
00:28:16.240 | so everything will just run i will just say you want to uh let your file process before you display it
00:28:22.000 | but in this example we're just printing the first page of the attention is all you need paper we have
00:28:27.760 | our re-ranker which is the world's first instruction following re-ranker and that's also a standalone
00:28:33.520 | sub-component that you can use independently of the full platform we have our generate model that is
00:28:39.040 | according to the facts grounding benchmark the most grounded language model in the world
00:28:44.400 | that's available as a standalone component and here's some sample code for that and then after raj's deep
00:28:51.920 | dive on the platform and components we'll go over the lm unit the natural language unit testing model
00:28:58.400 | available as an endpoint that we will use for our evaluation so yeah back to raj for the next portion of
00:29:09.360 | the workshop all right yep all right i just need this one
00:29:18.720 | all right if anybody knows me i'd love to kind of talk and get into the technical pieces and
00:29:26.080 | what's going on a little bit deeper so this is useful for learning about contextual but if you're even
00:29:30.560 | building your own rag pipelines as well all these components come into play um to start with now
00:29:37.600 | to start with how many people have had issues with hallucinations in their rag pipelines
00:29:41.840 | it's just been a few have people not had issues okay yeah i think everybody yeah finds
00:29:50.800 | this now when we start thinking about hallucinations there's a couple of ways we can think about how to
00:29:56.960 | prevent them the first is making sure we retrieve clean information if we don't do a great job at extraction if we
00:30:05.280 | don't do a good job at retrieval well then we're not going to give good information to that generator
00:30:10.160 | model it makes it the generator model will then put its own we'll try to answer it itself leading to
00:30:16.960 | hallucinations so that's one big piece the next is that language model itself and we'll talk about that
00:30:23.440 | is how we can ground that language model so it refers back to the context that it was given rather than
00:30:29.520 | substituting its own judgment and then i want to talk about having some checks in place because even if
00:30:35.600 | you do that you still want your end users to be able to trust the system so i'll talk about groundedness
00:30:41.120 | checks as well as you've already seen it inside the platform the bounding boxes for attribution as well
00:30:49.200 | so to start with when you first build your first rag project right you need to extract from some pdfs
00:30:54.800 | maybe you've used like an open source pdf extraction tool have folks used like open source extraction tools
00:31:01.360 | like that pdf plumber things like that not that many have people been happy with them in terms of how
00:31:08.160 | they've worked for tables and charts yeah it's the document on the left is the easy one right that's
00:31:17.360 | what you show in the demos but your folks are going to show up with complex tables multimodal and that's
00:31:23.680 | when it gets much trickier because when you get to these types of documents if you have a parsing error
00:31:30.400 | in that table and things get shifted over right that's going to cause problems that you can't fix later on
00:31:37.280 | if your vision language model hallucinates and adds some other information in there when it's reading
00:31:42.960 | that multimodal chart right you're stuck with that right like that's going to lead to downstream problems
00:31:48.160 | so making sure you have very good extraction is like top of the list when you get to complex documents
00:31:53.760 | and so this is this is how i like to think about kind of what we're doing at contextual right now
00:32:00.800 | it's an evolving system we update like every couple weeks the engineers probably don't exactly like
00:32:06.400 | how i've diagrammed it like this but at least for me the mental model works where say you have a pdf
00:32:12.720 | document the first thing we're going to do is add some metadata around it right like when was it created
00:32:18.720 | what's the file name once we have that then you want to do a layout analysis is this a document that's
00:32:26.240 | image only that we need to do an ocr for does it have images maybe technical descriptions multimodal charts
00:32:34.720 | graphs that we need to add image captioning where we're going to get a text description of those features
00:32:40.960 | if you have tables we have a special table extraction mode that focuses on getting high quality results out
00:32:49.120 | of tables now beyond that the structure of the document also carries meaningful information
00:32:55.360 | right what are the section headers what are the different subsections in there using that information
00:33:02.160 | can make your make your extraction and go a lot better now once we have that we create a nice markdown
00:33:10.320 | kind of json version of that i'll show you that in the platform here but then for rag use cases we're
00:33:17.440 | going to take those documents and we're going to chunk them down into smaller pieces to work with right
00:33:22.000 | because lms as much as the long context is growing you can't take some of these large documents and chunk
00:33:27.200 | them in and especially there's the overall compute cost of trying to use those long contexts fully
00:33:33.440 | so we create chunks we use some of that metadata that we've created inject that in like knowing
00:33:38.720 | the hierarchical structure of that along the way we set bounding boxes this is one reason when we do
00:33:45.280 | image captioning for example we don't want to take the entire whole page and do that we like to know
00:33:50.880 | where the images are so we can do bounding box so the user knows when we say sales went up by you
00:33:56.800 | know 10 percent they know where on the page exactly it was
00:34:00.640 | so within the platform and i know you can't get to it right now we have what the components you can see
00:34:09.360 | on the left side let me see if i can kind of right there all the individual parts of our platform you can
00:34:19.040 | work with individually and we have a playground for that so this is the parse piece here where you can just
00:34:24.480 | take a pdf document decide how you want the extraction i just want text or hey i want the full works where
00:34:30.720 | i want image captioning as well or i have long tables i want the table extraction mode and you can do that
00:34:36.880 | right inside the ui see what the results look like now of course right when developers we also have an api
00:34:43.360 | python sdks javascript sdks as well so you can do this programmatically as well but that's the first step
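Calling the parse component on its own looks roughly like this; the job-based flow (submit, poll, fetch results) and the output types follow my reading of the /parse docs, so treat the details as approximate.

```python
import time

# Submit a PDF to the standalone parse endpoint.
with open("attention_is_all_you_need.pdf", "rb") as f:
    job = client.parse.create(raw_file=f, parse_mode="standard")

# Parsing is asynchronous: poll until the job finishes, then fetch the results.
while client.parse.job_status(job.job_id).status not in ("completed", "failed"):
    time.sleep(5)

results = client.parse.job_results(job.job_id, output_types=["markdown-per-page"])
print(results.pages[0].markdown)  # first page of the parsed document
```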
00:34:52.000 | the next step here is the query reformulation and so here depending on if
00:34:59.680 | you're doing multi-turn we need to take into account the previous conversations query expansion maybe the
00:35:20.560 | types of users are doing lots of abbreviations inside your company so we need to take that query make it
00:35:34.240 | a little richer and fuller before we pass that in to get good results or on the other hand maybe your
00:35:40.640 | folks are putting in long complex queries where what we want to do is take that query break it up into smaller
00:35:47.280 | sub queries answer each of those so this is where you have the flexibility you can turn these knobs
00:35:52.720 | on and off as you want as nina was showing you in the edit to use those now once you've gone through
00:35:59.120 | that process that long query might that you have might break into three sub queries each of those
00:36:05.680 | reformulated queries then passes through your semantic search and lexical search so hybrid search kind of
00:36:12.320 | classic best practices there and then we give you the option of using a what we call a data store filter
00:36:19.120 | because maybe your query is asking what's the revenue of apple computer and your data store has
00:36:25.840 | apple microsoft tesla all those documents well if you already know they they're interested in the apple
00:36:32.080 | company why not just filter out all the other documents so you don't have to worry about that
00:36:36.720 | that's where that data store filter comes into play now typically for retriever our defaults are
00:36:42.160 | something like a hundred chunks that will come back retrieval's fast you can do that but then to get
00:36:49.360 | better accuracy we like to use a re-ranker and again this is best practices around rag we've trained our
00:36:54.560 | own the instruction following re-ranker and that helps you go from that 100 down to like the default is 15 but
00:37:00.480 | again you can change it for your application and give you let's say 15 high quality results that you
00:37:06.960 | can then pass all the way through to the end we have another filter stage if you want to for example drop
00:37:12.160 | out some of those as well so this is all of these pieces are orchestrated with lots of different machine
00:37:19.200 | learning models like this so this is where it's like a pain to build this for yourself but we've built this
00:37:24.320 | like this now if i had the time i'd go into all those pieces in depth but i want to highlight a
00:37:29.600 | couple of things i like our re-ranker i think is interesting where you have the ability to provide
00:37:35.520 | it instructions so a common one might be for example if you're asking questions about apple and you want
00:37:41.120 | to get the most recent documents so you can give it a prompt and then when it re-ranks those it's going
00:37:47.120 | to look at the contents of the chunks and use that prompt to help re-rank it so a lot of possibilities
00:37:53.120 | kind of with this and we're kind of excited to have this piece there
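A standalone call to the instruction-following re-ranker is roughly the following; the model name and response fields are assumptions on my part, so check the rerank docs for the current values.

```python
rerank_response = client.rerank.create(
    query="What is Apple's most recent quarterly revenue?",
    instruction="Prefer the most recent filings over older documents.",
    documents=[
        "Apple reported Q3 2024 revenue of ...",
        "Apple reported Q1 2020 revenue of ...",
        "Unrelated marketing copy about laptops ...",
    ],
    model="ctxl-rerank-en-v1-instruct",  # model name is an assumption; check the docs
)

# Results come back as (index, relevance score) pairs over the input documents.
for result in rerank_response.results:
    print(result.index, result.relevance_score)
```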
00:38:01.520 | now the last step is generation so we've retrieved all of the relevant context next we're taking it to our grounded large language model again
00:38:09.040 | the notion here is we want a large language model that's been specifically fine-tuned to respect
00:38:16.480 | the knowledge that it's given not give its own knowledge like that besides that we have the two
00:38:22.800 | pieces that i've talked about earlier the groundedness and the attributions you've already seen the
00:38:26.160 | attributions and i'll talk a little bit more kind of about groundedness now the grounded language model
00:38:31.680 | since we trained it ourselves we got to kind of do some different things with it so one thing we've done
00:38:36.560 | is the language model will tell you the difference between facts and commentary so when you have some
00:38:44.560 | text it will look at that answer and see hey this is the important facts in here the rest is just other
00:38:50.720 | commentary so this is where when you're working with this if you want i need a super grounded
00:38:56.080 | model like i just want the facts in my rag system i don't want any of the superfluous stuff like you
00:39:01.600 | can turn the setting on to eliminate the commentary and just focus on the facts for example so this is
00:39:08.560 | where kind of training our own gives us flexibility to do this kind of thing
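The grounded language model is also callable on its own; a sketch of that, including the facts-only switch just described, is below. The parameter and field names are my reading of the /generate docs, so verify them there.

```python
generate_response = client.generate.create(
    model="v1",  # grounded language model version; an assumption, check the docs
    knowledge=[
        "NVIDIA data center revenue in Q1 FY23 was about 3.7 billion dollars.",
        "NVIDIA gaming revenue in Q1 FY23 was about 3.6 billion dollars.",
    ],
    messages=[{
        "role": "user",
        "content": "When did data center revenue overtake gaming revenue?",
    }],
    avoid_commentary=True,  # keep the answer to grounded facts, drop the commentary
)
print(generate_response.response)
```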
00:39:11.920 | the second thing is the groundedness check we have so the groundedness check works directly in our ui
00:39:20.880 | you can also use this through the api where for every response that comes back what we do is we look
00:39:28.400 | sentence by sentence essentially and see is the claim that's mated found in the documents that were
00:39:35.120 | given so here if you look you'll see the bottom sentence is in yellow and it's in yellow because
00:39:42.560 | the claim it's making is a totally bs claim that it just made up right it's not actually in the source
00:39:49.200 | documents i literally prompted it to make up something and so here we'll automatically in the ui highlight
00:39:55.680 | that's yellow so if you have users that you're worried about like they keep falling for these
00:40:00.720 | hallucinations this is kind of a helper feature inside there and the way it works is when
00:40:08.000 | we retrieve all the context and again you have access to all the chunks all the things that are retrieved
00:40:12.720 | that answer is going to be generated but then we're going to decompose that answer into specific claims
00:40:20.320 | so we have a model that just does this decomposition and looks and sees is this claim
00:40:25.600 | grounded back in that context or not so we can run this live at query time be able to get those groundedness
00:40:32.560 | scores show you that with the response as well all right nina has got more on evaluation like that
00:40:55.520 | so for our last step of setting up our rag agent we will get to the evaluation step so we're going
00:41:09.840 | to use our lm unit model for natural language unit tests which may be a little bit different from how
00:41:15.200 | you've been doing evaluation of your rag system before so we have a fine-tuned model as a judge and
00:41:21.360 | that has state-of-the-art performance on flask and big gen bench and the way that it works as we'll get
00:41:27.280 | into in the notebook shortly is that you'll put in um the prompt and get the model response as your rag
00:41:35.040 | system would generate and then we're going to create unit tests for the prompt that will ask specific
00:41:42.640 | questions that you want tested about those responses then we're going to use the lm unit model to evaluate
00:41:49.280 | these unit tests and then we can look at those scores so we'll be fully in the notebook now and
00:41:55.440 | i've just given up on the wi-fi now i'm just going to go through what i ran earlier so we can just look
00:42:01.680 | at the results uh if you had internet and could run this all on your own so
00:42:08.240 | so in this section we have uh we have commented out the code that we used to generate the data set
00:42:17.280 | so that has our set of questions about the data set uh six questions and then this is the section that
00:42:25.520 | will generate those results from those queries so to save some trees we just ran this ahead of time
00:42:33.040 | and saved it in our github so you could load it directly um and uh that's uh at this file here we
00:42:42.080 | have this eval input csv um and then that just has prompts and responses for six queries from that same agent
00:42:51.360 | that we set up earlier and the unit tests we're going to ask i thought these would be interesting
00:42:58.320 | ones for this uh for this document set and the queries does the response accurately extract
00:43:05.680 | specific numerical data from the documents does the agent properly distinguish between correlation and
00:43:14.000 | causation that could be useful if you have some sort of statistical analysis in your documents are
00:43:19.840 | multi-document comparisons performed correctly with accurate calculations are potential limitations or
00:43:26.560 | uncertainties in the data clearly acknowledged are quantitative claims properly supported with specific
00:43:33.760 | evidence from the source documents and does the response avoid unnecessary information
00:43:40.640 | we know that uh llms like to blab on and on in their response so this is a good unit test i think overall
00:43:48.160 | some of the other ones you can set based on your own documents if you want to try this out later
00:43:53.520 | and so we just set up our unit tests here um and then the lm unit model will score on a one to five
00:44:03.840 | scale and it's been fine tuned for this specific task and so then the next block of code that you would run
00:44:11.360 | would be this uh response generation so we put in our query what was nvidia's data center revenue in q4
00:44:19.200 | fiscal year 25 and then you get your response and that does have the uh the correct amount the citation
00:44:29.600 | but then it has a lot of other information and so the unit test we used was does the response avoid
00:44:35.840 | unnecessary information and the score was 2.2 out of five
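Putting those two steps together, generating a response from the agent and scoring it against one natural-language unit test is roughly the following; the lmunit call shape is my reading of the SDK.

```python
question = "What was NVIDIA's data center revenue in Q4 fiscal year 25?"

# Generate the response with the agent we created earlier.
answer = client.agents.query.create(
    agent_id=agent_id,
    messages=[{"role": "user", "content": question}],
).message.content

# Score that answer against a single natural-language unit test on a 1-5 scale.
result = client.lmunit.create(
    query=question,
    response=answer,
    unit_test="Does the response avoid unnecessary information?",
)
print(result.score)
```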
00:44:42.720 | so for example if this was an important criteria for us then we could later update the system prompt or change some of those other
00:44:47.760 | settings to specifically target this um the results from this unit test of course we don't want to operate
00:44:54.240 | from an n of one so um we're now testing this batch of six which of course again is a small batch this
00:45:00.720 | was meant to run in uh real time if we had internet um and this would just run through all of those six
00:45:09.520 | queries and all of those six unit tests to get the results then this is the line here um run unit tests
00:45:17.520 | with progress this uh will give you the results from those unit tests and so here we can look at those
00:45:25.280 | results we can see the prompt response and um and then we can see the scores here for each of our six
00:45:36.800 | unit tests and those are going to be from one to five um and then we can also then map those long sentences
00:45:45.840 | to just a one word category so just accuracy causation synthesis limitations evidence and relevance you
00:45:53.600 | can set these as you'd like and we're going to create polar plots that show us for example for our second
00:46:00.000 | question what is the correlation coefficient between neptune's distance from the sun and u.s. burglary rates we can
00:46:07.120 | see in the polar plot that the causation was very well addressed by our model the accuracy and limitations
00:46:14.800 | also scored high on the unit test but the synthesis was not as highly scored um and these other factors
00:46:22.320 | were not as highly scored so we can look at all six and uh we can kind of see some of these queries
00:46:28.640 | scored really well across all our unit tests others may point to areas to improve we linked earlier to
00:46:35.360 | the full lm unit notebook where you can look at these categories and cluster them so for a larger scale
00:46:42.640 | evaluation data set you could actually look for meaningful insights from this unit test
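The polar plots themselves are plain matplotlib; a minimal version with made-up scores (not the workshop's actual results) looks like this.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ["accuracy", "causation", "synthesis", "limitations", "evidence", "relevance"]
scores = [4.5, 4.8, 2.9, 4.2, 3.6, 3.1]  # illustrative unit-test scores, not real results

# Close the loop so the polygon wraps back around to the first category.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 5)
plt.show()
```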
00:46:49.600 | um and so uh your first homework if you've made it this far is to improve the agent based on the unit test
00:46:58.240 | results and so one way that you could do that in the api would be uh here to update the system prompt so i
00:47:05.600 | have our original one and you could you know change something here like hey keep the responses really
00:47:11.360 | brief only answer the direct question and then you can run this line of code here to update the agent
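That update is roughly one call; again the method and parameter names are my reading of the SDK.

```python
new_system_prompt = (
    system_prompt
    + "\nKeep responses brief and answer only the direct question that was asked."
)

client.agents.update(agent_id=agent_id, system_prompt=new_system_prompt)
```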
00:47:20.400 | and um and yeah that's our end-to-end pipeline and now for a bonus raj is going to show
00:47:28.640 | you how you can connect your agent to claude desktop via mcp and for that we have links to our github
00:47:36.560 | here and the youtube how-to if you want to try this out yourself um okay yeah yeah
00:47:51.440 | i'm dealing with the challenges of the wi-fi and working around that so let's talk about this see
00:47:57.280 | if we can get this running
00:48:08.480 | so luckily for all of you i spend way too much time making videos so i have a video of the
00:48:08.480 | integration that i want to show so we built a rag agent you can ask questions right you want to be
00:48:14.400 | able to use it one of the most common ways of being able to use it right that was introduced like six
00:48:19.920 | months ago here um at ai engineer was mcp server so one of the things i like to do is kind of show
00:48:27.200 | people how i can connect the rag agents that we built a contextual and you can use them inside of other mcp
00:48:33.520 | clients and so here's an example of using it inside of claw desktop where now when it has this particular
00:48:40.960 | topic it's going out to the contextual rag agent to be able to get the answer to do that so you can do this
00:48:48.400 | working inside of clients like claude you can also be able to do this for example if you're working
00:48:53.280 | with code you want to be able to do this in something like cursor that works as well
00:48:57.680 | so to do this if you're like hey i want to do this i want to be able to use this rag agent everywhere else
00:49:05.440 | to do that we have a repo that's out there so the contextual mcp server it's the link is in the notebook
00:49:12.800 | as well so you can grab that it's a very basic mcp server you could probably build your
00:49:18.400 | own let us know um like that but the workflow is you clone that repo and then inside that repo there's
00:49:26.320 | a server.py file and we're just using our contextual um our contextual apis and just pointing to that so
00:49:34.640 | the important thing of course is that doc string that you kind of describe what your rag agent is about
00:49:39.360 | because that's what your client whether it's claude or cursor is going to be able to use that so to make
00:49:44.560 | make this a little bit more concrete here's a copy for example inside of mine where
00:49:52.000 | you can see in my local environment i have that mcp server locally hosted this is what my server.py file
00:50:01.600 | looks like where you can see i've got two different ones set up one for technical queries one for financial
00:50:09.040 | queries depending on who i demo to and it can have the client then pull in to that piece there
00:50:15.120 | so that's the server that's sitting in the middle
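The shape of that server.py, using the standard MCP Python SDK, is roughly the following; the tool name, docstring, and agent id are placeholders, and the real file in the contextual mcp server repo will differ in its details.

```python
from mcp.server.fastmcp import FastMCP
from contextual import ContextualAI

mcp = FastMCP("contextual-rag")
client = ContextualAI()  # assumes the API key is available in the environment

@mcp.tool()
def query_financial_docs(question: str) -> str:
    """Answer questions about NVIDIA financial reports using a Contextual RAG agent.

    The docstring matters: it is what a client like Claude Desktop or Cursor reads
    when deciding whether to call this tool.
    """
    result = client.agents.query.create(
        agent_id="YOUR_AGENT_ID",  # placeholder for the agent created earlier
        messages=[{"role": "user", "content": question}],
    )
    return result.message.content

if __name__ == "__main__":
    mcp.run(transport="stdio")
```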
00:50:19.680 | the next piece is configuring it with your client every client's a little bit different like that but
00:50:26.320 | you just want to be able to have that client point to where that server file is so if we look at for
00:50:32.000 | example in the case of claude desktop which is not running now but luckily we can still do this
00:50:38.080 | in claude desktop has a developer piece where you can edit the config for where it should look for mcp
00:50:47.120 | servers and if i click here edit config i can go see the file that it's there we can open that up
00:50:54.720 | we're not going to open it up in cursor but you get a sense that's the config file that's going to
00:50:59.280 | point towards where my server location is so again if you're not sure about this i have all the directions
00:51:05.040 | on the github there's an accompanying video as well but it just allows you to use that rag agent now in
00:51:10.720 | lots of other ways as well for example if you're building your own deep research tool you want to be
00:51:16.800 | able to do that you can do that all right we're going to open it up for questions here in a couple
00:51:23.360 | of minutes and again if you're having any issues raise your hand we got a couple of people in the
00:51:27.440 | back that can help out so one question often is like what is this going to cost me like you give me
00:51:33.920 | the sales pitch what is it going to cost me well if you use our individual components the parse the re-rank
00:51:40.400 | the generate lm unit we've made that all consumption based you pay by the token for those so the pricing
00:51:47.520 | is up on the website it's pretty straightforward like that when you've signed up today you started
00:51:52.960 | off with a 25 dollar credit so you should be able to go start playing around with it as well so those are the
00:51:59.600 | individual components we're also making our rag platform consumption-based pricing as well so the
00:52:07.520 | pricing will be based on the number of documents you ingest and how many queries you do so based on
00:52:13.840 | that you'll be able to kind of calculate out know what your workload is like that now some we i work
00:52:19.520 | with some enterprise customers for some sensitive places they're like hey we really need a guarantee
00:52:24.800 | on latency or amount of queries per second like that this is where we can work with that team do a
00:52:31.200 | provision throughput so that way you have dedicated kind of hardware for your particular use case like
00:52:36.640 | that but i think one of the great things that's going to really open it up to developers is the
00:52:40.400 | consumption base because just like you're doing today you're going to be able to sign up pass
00:52:44.960 | documents through that and just pay for what you use like that
00:52:48.560 | all right so some final takeaways i'm hoping that you got out of this that how you can treat rag just as
00:52:57.760 | any other managed service you don't have to go through build out all those pipelines yourself like that
00:53:03.600 | you've got the code you can get started now building this pipeline individually or if you just need parts
00:53:09.760 | of it just the components try them out as well if you're not happy with your re-ranker give it a shot
00:53:14.880 | so go try out the app to do that all of this stuff is documented over in docs we have kind of full api docs
00:53:23.680 | over there we also have a bunch of example notebooks showing for example integrations with things like
00:53:28.960 | ragas as well if you want to use that for evaluation so those example notebooks will walk you through hey
00:53:34.400 | how do i improve an agent how would i use all of these settings if you need other example notebooks let
00:53:40.000 | us know nina and i will work on that as well so finally fill out our survey kind of share your feedback as
00:53:46.720 | well we've got some nice kind of merch we can hand out to people who have good questions
00:53:52.480 | as well i think we got a little bit of time for that how does that sound
00:53:55.840 | i've thrown this rag platform at you are you all ready to sign up okay yeah
00:54:08.000 | go ahead there's a q a mic or i'll try to repeat back the question to the wider audience so if you
00:54:27.600 | want my interpretation of the question or i can't massage it then is that on yes yeah um so there's a
00:54:37.120 | really interesting feedback you're defining here around you can define evals via prompt you can see
00:54:43.600 | how they're doing you can let people change the system prompt do you find that your users of like
00:54:47.520 | the platform or the enterprise customers like who is it that's doing that is it like technical people
00:54:51.760 | or is it the people that's using it day to day so one of the things by putting it in the ui what you
00:54:57.760 | have is you have some of the business users will try to mess around and play with it and some pieces
00:55:03.040 | like a system prompt like non-technical people have a sense of using chat gpt and doing that
00:55:08.320 | but when you get to what's your my top k for my retrieval or using query reformulation those business
00:55:14.080 | users are just going to mess up and their agents are going to kind of stop working well like that so
00:55:19.040 | a lot of the settings there as you get to the advanced hill climbing because a lot of that stuff
00:55:25.200 | comes into like i've built a good rag agent it's hitting 80 percent but the business wants us to
00:55:30.560 | get to 90 now i have to build the evaluation now i have to see where are my errors are they on the
00:55:35.440 | retrieval side are they on the generator side and this is where having that developer perspective of
00:55:40.160 | understanding and being able to do that error analysis is important to figure out which of those
00:55:45.040 | settings to do it because we give you a lot of settings but you still have to have some sense of
00:55:49.760 | like which settings are appropriate to change for what outcomes like that um company i work for is
00:55:57.520 | relatively new to ai we're loving what we're discovering but uh it's also pretty overwhelming
00:56:02.800 | the information you presented uh hints at some of the answers to to our major questions which are
00:56:11.680 | we have thousands of pdf documents we have thousands of uh you know excel documents we have an mrp system with
00:56:18.800 | the nescue ball back end we need to be writing custom queries for this environment to consume
00:56:24.400 | i guess does your environment have a checklist of the approach for different data types and how to
00:56:32.640 | package this information for your environment yeah i i can repeat back the question like that so
00:56:38.320 | so the question is is you know and i'll paraphrase here getting into it we have lots of documents we have
00:56:43.760 | excel documents pdf documents structured data like how am i going to make sense of that
00:56:48.640 | and figure out what the best way to kind of use your platform is is that fair yes and so this is
00:56:54.320 | where as a startup we have a couple of different levels we have we have the developer go figure it
00:56:59.760 | out yourself we've given you the buttons the api commands to do that but this is where we've grown
00:57:04.960 | and we have a dedicated team that we call customer machine learning engineers that all they do is work
00:57:10.160 | with customers and they do that process of hey let's walk you through building your evaluation data sets let's
00:57:16.240 | help you kind of hill climb and improve that so depending on what you need we have a team that's
00:57:21.920 | just focused on helping customers through that process as well and in the meantime nina and i
00:57:26.880 | are trying to document it and make it more available for the rest of the users but there is that gap of like
00:57:32.560 | there's a lot of knowledge to effectively use these systems to be able to do that is that fair
00:57:38.560 | and is that a consulting fee or a startup package yeah talk to us we'll find a way to take
00:57:44.560 | your money a little bit uh go ahead hey rajiv um we've built a few agents and i'm very interested in
00:57:58.800 | exploring the rag only part of like what you described is there a way to integrate these into
00:58:04.720 | our agents that we're building in typescript javascript absolutely so each of the okay yes
00:58:11.440 | i don't have to repeat so each of the pieces here we have kind of apis we have a javascript
00:58:17.600 | sdk like that so if you just want to use the components of this this is where when the company
00:58:22.960 | first started about a year ago if you'd asked to kind of do a demo we would have sold you like you have
00:58:26.640 | to buy the entire platform you have to get the end to end but we've realized like if we want
00:58:30.800 | to appeal to developers that lots of times you've built stuff out that you just want some components
00:58:35.280 | of it so yes we've kind of modularized that out so if you just want to use parts of it you can and
00:58:41.440 | integrate with others like that so and in fact you know most of our customers don't use like the ui that
00:58:46.560 | we've showed most of them have other uis that they're integrating their applications with like that so
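For anyone who, like the questioner, only wants the RAG piece inside their own agent: a sketch of calling an agent's query endpoint directly, written in Python to match the workshop notebook (the same request works from TypeScript or the JavaScript SDK). The path and payload shape are assumptions for illustration, not the documented API.

```python
import requests

API_KEY = "key-..."
AGENT_ID = "your-agent-id"

# Hypothetical endpoint and payload -- check the API reference or the
# JavaScript SDK for the real shapes before wiring this into your agent.
resp = requests.post(
    f"https://api.contextual.ai/v1/agents/{AGENT_ID}/query",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"messages": [{"role": "user", "content": "what does the q3 filing say about revenue?"}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # grounded answer plus retrieved chunks / attributions
```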
00:58:54.240 | well in addition to the costing or pricing um how far can the 25 dollars go like
00:59:00.960 | because we weren't able to test it right now like how many documents can we you know experiment with
00:59:06.480 | so the question is like how far will your 25 dollars go i don't know exactly i get unlimited use of it um but
00:59:17.920 | try it out if if you run into issues just let us know we can we can talk to sales we can figure out
00:59:22.720 | something like that depending on what you're doing like that so don't let the 25 dollars be an
00:59:27.440 | issue we want you to use it we want to hear your feedback if you need help we can
00:59:31.920 | talk to you and figure out something to do like that so yes do people like the idea of a managed rag
00:59:39.040 | service does this feel like something okay okay okay yes well to that point i guess i'm wondering how
00:59:45.840 | managed is it so at my company we build rag applications for government and other clients
00:59:50.480 | who have really strict data sovereignty data residency questions uh or requirements so are you in either of
00:59:58.080 | the big clouds where i can decide to be in like a gov cloud or some other sovereign data service like
01:00:06.800 | how much control do i have over where stuff lives yeah so again we're early stage startup we're starting
01:00:13.040 | out right now we have our own cloud which doesn't help you like that we have partnered with snowflake so
01:00:18.160 | we are on snowflake if that can work for you we can also install in vpc but right now we're limiting
01:00:25.360 | it to vpc we're not doing kind of custom on-premises deployment just because that takes a lot of
01:00:31.120 | upfront work to be able to do like that does that help yeah can you do vpc and say like aws gov cloud
01:00:39.040 | or azure's government solution we have not yet done like aws in kind of gov cloud like that like if you
01:00:45.760 | have a strong demand for that let us know we can we can work and try to figure out something like that
01:00:49.680 | but we haven't taken on that yet okay i'll come find you thanks yes thanks
01:00:52.720 | hello hi
01:00:56.560 | hey yeah go ahead i don't know hi i was curious you know like how does the performance of your
01:01:05.920 | rag platform vary as the number of documents varies for example like do you have like recommendations of
01:01:12.000 | best practices when you're working with millions of documents as compared to hundreds of documents
01:01:15.600 | what kind of you know configurations and knobs work better would be curious to learn yeah so i'm
01:01:20.640 | going to give the hand wavy answer of we have customers like qualcomm for example that have tens
01:01:25.440 | of thousands of documents their documents have hundreds of pages in them and we handle that fine
01:01:30.240 | all i know is that i have engineers like matthew in the back who work on the platform and make sure
01:01:36.400 | it kind of scales up like that so i would say go grab him if you have questions about scalability right now
01:01:42.160 | we've been able to kind of scale with our customers um along those lines like that so
01:01:46.320 | let me take this and then i'll come back over here hey this is specifically about the lmunit
01:01:53.440 | um tool um how deterministic and repeatable is the scoring from that
01:01:59.200 | i'm looking to see if they have i mean it's still a large language model so i think there might
01:02:09.360 | be a little bit of variance but i think it's fairly good is that right uh it's pretty repeatable there's actually a paper
01:02:17.280 | release there's a lot more details in there too but like there was a lot of analysis done about the
01:02:23.200 | correlation with other trusted metrics and you know running the lmunit
01:02:30.240 | tests repeatedly like um we use kind of a fixed random seed to make sure it's repeatable so like
01:02:36.480 | like all of our testing has suggested that uh it is repeatable of course like
01:02:42.720 | if you're like altering the prompt you're altering kind of the natural language unit test that
01:02:48.240 | can have you know unforeseen impacts on the results but i think keeping the prompt consistent it should be
01:02:55.600 | pretty consistent with different types of queries awesome thank you yeah check out the paper as matthew said
01:03:01.360 | that it goes into a lot of those pieces like that so thanks
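One way to sanity-check that repeatability claim yourself is to score the same query, response, and natural-language unit test several times and look at the spread. The endpoint path and response field below are assumptions sketched for illustration; the LMUnit paper and the API docs have the real details.

```python
import statistics
import requests

API_KEY = "key-..."
payload = {
    # Assumed field names for an LMUnit-style call: a query, a candidate
    # response, and a natural-language unit test scored on a 1-5 scale.
    "query": "what is our refund window?",
    "response": "refunds are accepted within 30 days of purchase.",
    "unit_test": "does the response state a specific number of days?",
}

scores = []
for _ in range(5):
    r = requests.post(
        "https://api.contextual.ai/v1/lmunit",   # assumed path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    r.raise_for_status()
    scores.append(r.json()["score"])             # assumed response field

print(scores, "stdev:", statistics.pstdev(scores))  # a small spread = repeatable
```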
01:03:04.080 | question here no so is this already integrated with uh microsoft copilot and azure and the second
01:03:16.480 | question is uh are there any apis that you guys expose meaning if we are developing our own custom
01:03:23.120 | code but we want to you know use your rag what's the approach so we have the api so if you're developing your own
01:03:30.080 | custom solutions you can easily integrate kind of what we have to do that in terms of installing
01:03:36.160 | inside of kind of um azure like that that's one of those custom vpcs that we can support like that in
01:03:43.920 | terms of integrations with copilot i don't know if we've really done anything usually like people are
01:03:48.160 | like i'm sick of copilot and that's why they come to our rag solution like that so again we have the apis
01:03:54.080 | though so i don't know what the integrations would require for something like copilot like that
01:03:58.160 | thank you hi first of all a great presentation because i have used your product and the other
01:04:07.200 | thing is like regarding rag i have a question like uh you as a company what are the challenges that you're
01:04:14.320 | facing in rag and where do you see that going in the future so what are the challenges we're facing
01:04:21.040 | in rag apart from hallucination internet search or things like that apart from that what are the
01:04:26.800 | different challenges that you're facing matthew i might have you chime in on this too i think
01:04:32.880 | one of the big challenges right now is around the extraction where we've spent a lot of time and energy
01:04:39.680 | and there's still i think room for improvement when we're talking about working with complex tables
01:04:45.280 | along with charts i think that's one there i think scalability is always a piece like that because
01:04:51.680 | everybody wants to ingest more documents at a faster speed like that
01:04:56.720 | i think if you talk about um working with structured data i think that's a challenge for
01:05:05.840 | everybody in the industry when you're trying to do those queries of like text to sql type queries like
01:05:10.640 | that so those are some of the top ones on my list like that do you have yeah i agree i think document
01:05:16.720 | understanding like making sure that you are correctly understanding long complex documents with hierarchy
01:05:22.400 | and interrelations between sections and really large tables and gnarly figures like i think that it has
01:05:28.160 | been a consistent thing that we've been working on and seeing improvements on but there's definitely
01:05:32.880 | still room to grow there um i think another kind of broader change in the rag space that our research
01:05:38.960 | team is very focused on and it hasn't directly translated into our product yet but it's kind of
01:05:43.680 | coming soon is moving away from the more like static rag pipelines that raj like beautifully described in his
01:05:51.040 | presentation toward kind of a fully dynamic flow with you know model routers uh at certain kind of inflection
01:05:57.360 | points deciding which tools to use how many times to retrieve you know how to alter the retrieval config
01:06:03.920 | in order to correctly answer the query so i think those sorts of like dynamic workflows will greatly increase
01:06:10.240 | what you can answer with any rag platform um almost like deep research for rag uh is one way to think about it
01:06:18.720 | and i think that's something that our research team is working very hard on and will be like coming into our
01:06:23.600 | platform in the near-ish future great thanks come back in six months
01:06:30.720 | do you have plans for a public facing like a web search feature particularly one that could be configured
01:06:39.360 | on a company's domain
01:06:41.040 | so are you looking for kind of a general i need to be able to search the internet because like
01:06:48.880 | there's tavily firecrawl places like that or is it more that companies that have public facing
01:06:54.400 | websites where it makes sense to instead of running rag on your like cms database you're actually just
01:07:01.040 | you want to run it on that live and so john i might pull you in here because i think one of the things
01:07:06.560 | john's worked with quite a bit because a lot of what that requires is the integrations to be able to pull
01:07:11.520 | from those sources as well and so i know john you've done some of those things for
01:07:16.240 | customers with firecrawl pieces yeah sorry what was the question for me again about ingesting
01:07:21.280 | customer websites and being able to use that in rag oh yeah so um there's a lot of like uh ways that
01:07:26.640 | you can scrape websites or like directly um kind of set up like an etl pipeline with like
01:07:31.920 | pulling the records from the apis or just pulling the unstructured data from like a data cloud um or blob
01:07:37.840 | storage and then kind of just setting up like an ingestion queue which is what we do on our side for
01:07:43.040 | larger customers um or basically any customer that needs like a kickstart for lots of data coming in
01:07:47.840 | some customers are like super happy to write their own scripts or like their own kind of functions or
01:07:52.960 | daily cron jobs to update uh documents but we're also working on like a managed solution on our side so
01:07:58.640 | that you can just give us your credentials and then on our front end and then we'll just start pulling
01:08:02.720 | and syncing all the data on our back end but that's like something coming on the roadmap in like two to
01:08:06.480 | three months and so please if there's pieces like that you see that you want us let us know because
01:08:12.400 | we like to kind of push the product team to kind of make that stuff happen so all right
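A rough sketch of the write-your-own-script path John mentions: a small job, run from cron, that pushes files from a synced folder into a datastore for ingestion. The upload endpoint and form field are assumptions for illustration only.

```python
import os
import requests

API_KEY = "key-..."
DATASTORE_ID = "your-datastore-id"
DOCS_DIR = "/data/docs"   # folder your ETL job keeps in sync with the source system

# Hypothetical documents endpoint -- check the ingestion API docs for the real one.
for name in os.listdir(DOCS_DIR):
    if not name.lower().endswith((".pdf", ".html", ".docx")):
        continue
    with open(os.path.join(DOCS_DIR, name), "rb") as f:
        r = requests.post(
            f"https://api.contextual.ai/v1/datastores/{DATASTORE_ID}/documents",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": (name, f)},
            timeout=120,
        )
    r.raise_for_status()
    print("queued for ingestion:", name)
```

Run daily from a crontab entry such as `0 2 * * * python ingest.py`, this is the do-it-yourself version of the managed connector described above.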
01:08:20.400 | yeah hi how do you deal with uh frequently updated content and document level permissions if not everybody can see
01:08:28.480 | specific documents yes so those are both tricky things one of the things we'll do for um
01:08:36.400 | frequently updated documents is we have a continuous ingestion pipeline john who you can talk to in the
01:08:41.280 | back can tell you more about that piece as well so that's one piece for the frequently updated for
01:08:47.920 | the other piece is the harder one entitlements how do you deal with all the permissions right like you're
01:08:53.360 | indexing hr rag stuff next to customer support stuff how do you make sure that the person doesn't look
01:08:59.680 | up the salaries of anybody else like that so this is where within our platform one of the things we're
01:09:04.880 | adding is an entitlements layer like that because as we've talked to lots of customers they're like
01:09:10.160 | this is nice when you do it over our customer support but inside of a real enterprise we have
01:09:15.440 | governance we have permissions we have to be able to respect that when we do these types of searches
01:09:20.240 | and so we're adding an entitlements layer on top of ours to be able to handle that so yeah that's an
01:09:25.600 | important piece all right hi uh so my question is regarding uh the last six months what are some of the
01:09:31.360 | major breakthroughs in uh rag area what are the major breakthroughs in the last six months in the rag area
01:09:38.240 | so i think the re-rankers continue to get better i think that's that's an easy place um where we've seen
01:09:47.520 | that i mean i think part of it is just every piece of that has steadily been getting better i think one
01:09:52.800 | of the biggest changes you've seen in the last six months year is the rise of the vision language models
01:09:57.360 | and how strong they are at being able to handle kind of images like that that's that's off the top of
01:10:02.480 | my head i see matthew's busy talking to somebody else otherwise i'd have him weigh in as well does that help
01:10:07.520 | yes um can the contextual parser module replace like docling or gsp invoice parsers and
01:10:17.520 | things like this and if it does does it have the ability to parse qr codes and barcodes and extract
01:10:24.640 | their metadata to kind of read that data as well yes so i think one of the things is there's a lot of
01:10:31.920 | different types of documents out there in terms of document types and there's a lot of different
01:10:35.520 | solutions so we're not going to be able to handle every type of thing perfectly like that i
01:10:39.520 | think there's going to be lots of pieces but this is where we've made those apis so you can try that out
01:10:44.320 | on those pieces like we haven't specifically focused on for example that type of document we do have the
01:10:50.720 | image captioning so maybe it would pick them up maybe it wouldn't like that so but the idea here is to go
01:10:56.400 | in that same space where there's a lot of these other companies doing this parsing thing and making sure that
01:11:01.200 | we have a module that you can just stick in and replace and kind of use with ours like that
01:11:05.120 | yes how do you deal with the domain specific language that's a good question how do you deal
01:11:14.560 | with domain specific language and this can get a little tricky and we've seen it for example when
01:11:19.360 | we're working with technical companies that have very specific words that they use inside there where
01:11:26.240 | on one hand you can do things like changing system prompts a little bit for the retriever part but
01:11:33.200 | sometimes what we've done with some of those customers is fine-tune that grounded language model
01:11:37.920 | that's closer to their vocabulary and how they speak and that's been one technique we've used now and this
01:11:44.080 | is where at one point in the platform we had that available to end developers but fine-tuning models can get a
01:11:49.280 | little complex for using for rag so we've kind of put that away now and that's more for kind of our
01:11:54.160 | service-oriented customers that work with our customer machine learning engineers but that's one thing
01:11:58.720 | we've done to if that knowledge is out there and you need to get that into your grounded language model
01:12:04.000 | is to do a kind of a fine-tuning process step go ahead oh i don't know where the mic is we're making them
01:12:11.600 | run getting their steps in today hey um so how about hipaa regulations like uh phi data like this kind of
01:12:23.760 | stuff like how do you deal with that should we hash it before it goes to the platform or can the llm
01:12:31.520 | understand and deal with that so there's two aspects to that one is we just got hipaa certified
01:12:37.360 | that people are going to think i'm paying the audience members out there um like that the second
01:12:42.080 | is well what do you do in extraction like maybe i have phi data i want you to automatically mask or
01:12:47.920 | filter that so this is one thing that we've been talking with a product team about whether to include
01:12:52.080 | that capability during that parsing piece um to automatically kind of flag and identify that
01:12:58.560 | behavior right now we don't kind of have that in the product but again like if that's something you
01:13:03.040 | need or want let us know it's not that hard to kind of build on but you can also do that as a supplement
01:13:08.640 | to the parsing and have a second job that runs and looks for that pii information does that help good
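A minimal sketch of that second-pass idea, assuming you scrub parsed chunks before indexing them; the regexes are deliberately crude placeholders, and real PHI handling needs a proper detector and review process rather than three patterns.

```python
import re

# Crude illustrative patterns only -- not a complete PII/PHI detector.
PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace matches with typed placeholders so chunks stay readable."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

chunks = ["call maria at 555-867-5309 or maria@example.com about the claim"]
print([scrub(c) for c in chunks])
```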
01:13:14.240 | uh i know garbage in garbage out but if we have a document with say two different numbers on a particular
01:13:22.560 | like statistic or fact how does your rag deal with that or how would it answer a question yeah
01:13:28.720 | so what what happens in that case is often the retrieval is going to bring both of those pieces
01:13:34.480 | of information over to the grounded language model and it's going to be up to the language model to use
01:13:40.640 | its reasoning ability to try to reason out the two differences so sometimes if one is for example if it's
01:13:47.040 | obvious that one is not like let's say it's the size of a mosquito and one of the answers
01:13:52.960 | is 100 feet and the other is right like three millimeters the language model is going to
01:13:58.000 | know like one of these is junk and to ignore it but if they're both close and reasonable yes that and
01:14:03.040 | that can get you in trouble um with those pieces like that so i assume this is a pain that you're having now
01:14:11.040 | so we're just trying to see how we can organize that um because a lot of my job right
01:14:21.680 | now is just telling people information that should be in a document but it's not updated so one thing
01:14:28.160 | that can help with that is taking advantage of metadata and using as much rich information about those
01:14:34.320 | documents so at retrieval time it can help it kind of figure out like hey these are the two
01:14:39.200 | differences and this is why i should prioritize this answer over another answer like maybe this is the
01:14:43.760 | most recent or this was written by the authoritative person like that's one thing
01:14:49.440 | that we kind of recommend for those use cases like that thank you
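One concrete way to act on that advice is to attach metadata such as a last-updated date and an owner when each document is uploaded, so retrieval or the grounded model can prefer the recent, authoritative version when two numbers conflict. The endpoint and field names here are hypothetical, for illustration only.

```python
import json
import requests

API_KEY = "key-..."
DATASTORE_ID = "your-datastore-id"

metadata = {
    "last_updated": "2025-06-01",   # lets retrieval prefer the newest figure
    "owner": "finance-ops",         # who is authoritative for this document
    "status": "published",          # vs. "draft" or "superseded"
}

# Hypothetical upload endpoint and metadata field -- check the ingestion API docs.
with open("pricing_policy_v3.pdf", "rb") as f:
    r = requests.post(
        f"https://api.contextual.ai/v1/datastores/{DATASTORE_ID}/documents",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("pricing_policy_v3.pdf", f)},
        data={"metadata": json.dumps(metadata)},
        timeout=120,
    )
r.raise_for_status()
```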
01:14:56.880 | all right okay well thank you all for staying i've loved the questions like this please use the application and give us feedback
01:15:03.040 | on how you're finding that as well good bad negative happy to take that all like that so thank you
01:15:08.880 | again for kind of spending your morning with us like that anything else do we have nina or are we good oh
01:15:13.440 | you've got the mic off oh i think i still have my mic uh yeah i've been going around distributing swag
01:15:19.200 | to folks that have asked questions sorry if i missed some of you that were further away i have one more
01:15:24.080 | pair of socks if anyone wants them the guy in the back wants the socks all right yeah all right thank you
01:15:34.000 | all we'll be hanging around for a little while if you have that so