EyeLevel Launch: Your RAG is Tripping, Here's the Real Reason Why

Hi, my name's Ben. I'm co-founder of EyeLevel.ai, and I've been building AI-powered applications for the last 15 years: first at IBM Research, then at IBM Watson, later working with major brands like the Weather Channel, and now at EyeLevel, where we've built the world's most accurate and scalable RAG platform. Using our no-code tools and APIs, our users can upload documents and receive the most accurate retrievals in minutes. We've been developing our solution for the last four years, and we were among the first users admitted to the GPT-3 beta program. We found it easy to get started with RAG and very difficult to master.
In our own experience, RAG applications can have error or hallucination rates as high as 35%, especially when the knowledge base consists of the kinds of complicated documents commonly found in the enterprise. The source of these errors is rarely the LLMs or the prompts. Instead, it's typically RAG itself, or more specifically, the quality and relevance of the retrieved content. The problems with content generally fall into one of three categories: bad or improperly extracted text, information from the surrounding parts of the document that's lost during chunking, or visual elements that are not extracted at all.
with RAG are content ingestion problems, and advanced RAG techniques that help you solve 00:01:51.920 |
these problems can take hundreds of hours to implement. We've spent the last four years 00:01:57.020 |
tackling these difficult data engineering problems and have built the solutions to them into our 00:02:02.720 |
ingestion pipeline. As a result, our users are able to build the most accurate RAG applications 00:02:09.220 |
in just minutes, and our customers such as Air France and Dartmouth tell us that their RAG applications 00:02:14.980 |
respond correctly more than 95% of the time. In a recent study, our platform achieved 98% accuracy 00:02:23.980 |
against complicated real-world documents and outperformed some of the most popular solutions 00:02:29.980 |
in market by as much as 120%. I'm going to quickly walk you through the unique approach we take to achieve this high level of accuracy, 00:02:38.560 |
and I'll start by telling you that we don't use vector databases at all. In fact, we think they may not be the best technology solution for a lot of RAG applications. Instead, we create what we call semantic objects, and we run a multi-field search across the attributes of those objects.
I'll show you what that means with a real example from Air France. Air France has been using our platform for the last year to build a ChatGPT-like copilot for their call center agents. It draws on their knowledge base, which consists of hundreds of thousands of documents just like this one, filled with tables, figures, and text scattered across the pages.
In our ingestion pipeline, the first thing we do is run a vision model, fine-tuned on millions of documents, to identify where the images, the tables, and the text are. Then we run each of those regions through dedicated multimodal processing pipelines to extract the visual and written information.
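To make that routing idea concrete, here is a minimal sketch of how detected regions might be dispatched to type-specific extractors. Everything in it is illustrative: the Region type, the extractor functions, and their outputs are assumptions, not EyeLevel's actual pipeline.

```python
# Illustrative sketch only: a fine-tuned vision model (not shown) would
# produce Region records; the extractor names and behavior are hypothetical.
from dataclasses import dataclass


@dataclass
class Region:
    kind: str   # "text", "table", or "figure", as labeled by the vision model
    page: int   # page number the region was found on
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) crop coordinates


def extract_text(region: Region) -> str:
    # A text pipeline would OCR or parse the crop into clean prose.
    return f"[clean text from page {region.page}]"


def extract_table(region: Region) -> str:
    # A table pipeline would rebuild rows and columns, not just OCR them.
    return f"[structured table from page {region.page}]"


def describe_figure(region: Region) -> str:
    # A multimodal model would write out what the figure shows.
    return f"[written description of figure on page {region.page}]"


# Route each region kind to its dedicated processing pipeline.
EXTRACTORS = {"text": extract_text, "table": extract_table, "figure": describe_figure}


def ingest(regions: list[Region]) -> list[str]:
    return [EXTRACTORS[r.kind](r) for r in regions]


if __name__ == "__main__":
    regions = [Region("text", 1, (0, 0, 612, 200)),
               Region("table", 1, (0, 210, 612, 500)),
               Region("figure", 2, (40, 60, 560, 420))]
    for piece in ingest(regions):
        print(piece)
```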
When you do RAG, you have to break this document apart into smaller chunks, and when you do that, you quite often lose the information around each chunk, such as which section of the document it came from. If you were to ask questions about a book and receive random paragraphs from it, chances aren't great you'd get good answers. That's essentially what's happening with chunking and the loss of context.
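As a toy illustration of that loss (the document text and chunk size below are invented purely for this example), fixed-size chunking strips away the section heading that gives a passage its meaning:

```python
# Toy example: fixed-size chunking with no regard for document structure.
# The document content is made up for illustration.
doc = ("Section 4.2: Checked baggage on transatlantic routes. "
       "Economy passengers may check one bag up to 23 kg. "
       "A second bag incurs a fee that varies by route.")


def naive_chunks(text: str, size: int = 60) -> list[str]:
    # Split on a fixed character budget; boundaries ignore sections and sentences.
    return [text[i:i + size] for i in range(0, len(text), size)]


for i, chunk in enumerate(naive_chunks(doc)):
    print(f"chunk {i}: {chunk!r}")

# The later chunks say nothing about which section, document, or topic they
# belong to -- they are the "random paragraphs from the book".
```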
That's why we created semantic objects. A semantic object consists of the original chunk text as well as auto-generated metadata that preserves the information around the text. Then we rewrite the text into two ideal formats, one for search and one for completion.
Let me show you what that looks like with an example. This is a figure from one of Air France's documents. If you were to OCR it, extract the text, vectorize it, and put it in your vector database, it would look something like this. Look at how much information is lost in the process, though. Instead, what comes out of our ingestion pipeline is something like this, and it includes both the search version and the completion version of the text.
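Put together, a semantic object might be shaped roughly like the sketch below. The field names are assumptions made for illustration; the actual schema isn't shown in this talk.

```python
# A guess at the shape of a semantic object, based on the description above:
# original chunk text, auto-generated contextual metadata, and two rewrites.
from dataclasses import dataclass, field


@dataclass
class SemanticObject:
    chunk_text: str                                # original extracted text
    metadata: dict = field(default_factory=dict)   # context preserved from around the chunk
    search_text: str = ""                          # rewrite optimized for matching queries
    completion_text: str = ""                      # rewrite optimized for the LLM prompt


figure_object = SemanticObject(
    chunk_text="[raw text extracted from the figure]",
    metadata={"document": "[source document]",
              "section": "[section the figure appeared in]",
              "element_type": "figure"},
    search_text="[keyword-dense restatement for retrieval]",
    completion_text="[fluent restatement handed to the LLM]",
)
```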
When we receive a search query, we do something similar: we rewrite the query into a format that's compatible with the objects. Then we search the entire object, including the original text, the auto-generated metadata, and the rewritten versions of the text. Finally, we use a fine-tuned LLM to re-rank the results and improve the accuracy.
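The query-side flow might look something like the sketch below. The rewriting step, the multi-field scorer, and the re-ranker are all toy stand-ins for the fine-tuned models just described.

```python
# Sketch of the described query flow: rewrite the query, score it against
# every field of each semantic object, then re-rank the top candidates.
# All three steps are placeholders, not a real API.

def rewrite_query(query: str) -> str:
    # In the real system an LLM rewrites the query into the objects' format.
    return query.lower()


def multi_field_score(query: str, obj: dict) -> int:
    # Toy multi-field match: count query terms appearing in any field.
    fields = [obj.get("chunk_text", ""), obj.get("search_text", ""),
              " ".join(str(v) for v in obj.get("metadata", {}).values())]
    return sum(term in f.lower() for term in set(query.split()) for f in fields)


def rerank(query: str, candidates: list[dict]) -> list[dict]:
    # Stand-in for the fine-tuned re-ranking LLM; a real one would reorder
    # candidates by judged relevance instead of keeping the initial order.
    return candidates


def search(query: str, objects: list[dict], k: int = 10) -> list[dict]:
    q = rewrite_query(query)
    top = sorted(objects, key=lambda o: multi_field_score(q, o), reverse=True)[:k]
    return rerank(q, top)
```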
And in total, across our ingestion and search pipelines, there are more than nine fine-tuned models at work.
The end result is the world's most accurate RAG platform, and our users are able to build enterprise-quality, production-ready applications in minutes.
For more information, visit us at EyeLevel.ai.