Hi, my name's Ben. I'm a co-founder of iLevel.ai, and I've been building AI-powered applications for the last 15 years: first at IBM Research, then at IBM Watson, later with major brands like the Weather Channel, and now at iLevel, where we've built the world's most accurate and scalable RAG platform.
Using our no-code tools and APIs, our users can upload documents and receive the most accurate retrievals in minutes. We've been developing our solution for the last four years, and we were among the first users admitted to the GPT-3 beta program. We found it easy to get started with RAG and very difficult to master.
In our experience, RAG applications can have error or hallucination rates as high as 35%, especially when the knowledge base consists of the kinds of complicated documents commonly found in the enterprise. The source of these errors is rarely the LLMs or the prompts. Instead, it's typically RAG itself, or more specifically, the quality and relevance of the retrieved content.
And the problems with content generally fall into one of three categories: badly or improperly extracted text; information from the surrounding parts of the document that's lost during chunking; or visual elements that aren't extracted at all. Most commonly, the problems with RAG are content ingestion problems, and the advanced RAG techniques that solve them can take hundreds of hours to implement.
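To make the chunking failure mode concrete, here is a minimal sketch of the kind of naive fixed-size chunker that causes it. This is illustrative only, not iLevel's code; the document text and chunk size are made up.

```python
def naive_chunk(text: str, chunk_size: int = 120) -> list[str]:
    """Split text into fixed-size character chunks, ignoring structure."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# A made-up snippet of an enterprise-style document.
doc = (
    "Section 4.2: Baggage allowance for transatlantic flights\n"
    "Economy passengers may check one bag up to 23 kg. "
    "Business passengers may check two bags up to 32 kg each."
)

for chunk in naive_chunk(doc):
    print(repr(chunk))
# The second chunk is cut mid-word and no longer says which section or
# document it came from, so a retriever can't tell whose rules it holds.
```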
We've spent the last four years tackling these difficult data engineering problems and have built the solutions to them into our ingestion pipeline. As a result, our users are able to build the most accurate RAG applications in just minutes, and our customers such as Air France and Dartmouth tell us that their RAG applications respond correctly more than 95% of the time.
In a recent study, our platform achieved 98% accuracy against complicated real-world documents and outperformed some of the most popular solutions on the market by as much as 120%. I'm going to quickly walk you through the unique approach we take to achieve this level of accuracy, and I'll start by telling you that we don't use vector databases at all. In fact, we think they may not be the best technology for a lot of RAG applications.
Instead, we create what we call semantic objects, and we run a multi-field search across the attributes of those objects. I'll show you what that means with a real example from Air France. Air France has been using our platform for the last year to build a ChatGPT-like copilot for their call center agents.
They wanted to understand their knowledge base, which consists of hundreds of thousands of documents just like this one, filled with tables, figures, and text scattered across the pages. In our ingestion pipeline, the first thing we do is run a vision model, fine-tuned on millions of documents, to identify where the images, the tables, and the text are.
Then we run each region through a dedicated multimodal processing pipeline to extract the visual and written information.
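Here is a minimal sketch of what that routing step could look like. Every function is a hypothetical stub standing in for a fine-tuned model; this is an illustration of the idea, not iLevel's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str                        # "table", "figure", or "text"
    bbox: tuple[int, int, int, int]  # pixel coordinates on the page
    page: int

def extract_table(image, bbox) -> str:
    return "<table extracted as structured text>"          # stub

def describe_figure(image, bbox) -> str:
    return "<figure rewritten as a written description>"   # stub

def ocr_text(image, bbox) -> str:
    return "<plain text from OCR>"                         # stub

def route_region(image, region: Region) -> str:
    """Dispatch each detected region to the pipeline built for its type."""
    pipelines = {"table": extract_table, "figure": describe_figure}
    return pipelines.get(region.kind, ocr_text)(image, region.bbox)

# Usage: in practice the regions would come from the layout-detection model.
regions = [Region("table", (40, 100, 560, 300), page=1),
           Region("figure", (40, 320, 560, 600), page=1)]
print([route_region(None, r) for r in regions])
```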
When you do RAG, you have to break the document apart into smaller chunks, and when you do that, you quite often lose information from around each chunk: which section of the document it came from, or even which document it came from. If you were to ask questions about a book and receive random paragraphs from it, chances aren't great you'd get good answers, and that's essentially what happens with chunking and the loss of context. That's why we created semantic objects. A semantic object consists of the original chunk text as well as auto-generated metadata that preserves the information around that text.
We then rewrite the text into two ideal formats, one for search and one for completion. Let me show you what that looks like with an example. This is a figure from one of Air France's documents. If you were to OCR it, extract the text, vectorize it, and put it in your vector database, it would look something like this.
Look at how much information is lost in the process, though. Instead, what comes out of our ingestion pipeline is something like this, and it includes both the search version and the completion version of the text.
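As a sketch of what such a semantic object might hold, based only on the description in this talk (the field names and example values are mine, not iLevel's schema):

```python
from dataclasses import dataclass, field

@dataclass
class SemanticObject:
    chunk_text: str       # the original extracted chunk
    search_text: str      # rewrite optimized for retrieval
    completion_text: str  # rewrite optimized for the LLM's answer
    metadata: dict = field(default_factory=dict)  # preserved surrounding context

# Hypothetical example in the spirit of the Air France figure.
obj = SemanticObject(
    chunk_text="Max 23kg econ / 32kg biz x2",
    search_text="checked baggage weight limits economy business transatlantic",
    completion_text=("Economy passengers may check one bag up to 23 kg; "
                     "business passengers may check two bags up to 32 kg each."),
    metadata={"document": "AF-Baggage-Policy.pdf",
              "section": "4.2 Baggage allowance",
              "element": "figure"},
)
```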
When we receive a search query, we do something similar: we rewrite the query into a format that's compatible with the objects. Then we search the entire object, meaning the original text, the auto-generated metadata, and the search version of the text, and we use a fine-tuned LLM to re-rank the results and improve accuracy. In total, more than nine fine-tuned models across our ingestion and search pipelines help deliver this level of accuracy.
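To close the loop, here is a matching retrieval sketch, continuing from the SemanticObject above. rewrite_query and rerank_with_llm are hypothetical stand-ins for the fine-tuned models mentioned in the talk, and the multi-field search is simple term overlap rather than a production search engine.

```python
def rewrite_query(query: str) -> str:
    """Stand-in: normalize the user's query into the objects' search format."""
    return query.lower()

def multi_field_score(obj: SemanticObject, query: str) -> int:
    """Score by term overlap across the searched fields of the object:
    original text, metadata, and the search version (not the completion)."""
    terms = set(rewrite_query(query).split())
    fields = [obj.chunk_text,
              obj.search_text,
              " ".join(map(str, obj.metadata.values()))]
    return sum(term in f.lower() for f in fields for term in terms)

def rerank_with_llm(query: str, candidates: list[SemanticObject]) -> list[SemanticObject]:
    """Stand-in for a fine-tuned re-ranking model; here it keeps the order."""
    return candidates

def retrieve(query: str, objects: list[SemanticObject], k: int = 5):
    """Score every object across its fields, keep the top k, then re-rank."""
    candidates = sorted(objects,
                        key=lambda o: multi_field_score(o, query),
                        reverse=True)[:k]
    return rerank_with_llm(query, candidates)

# The completion version of the top hit is what gets handed to the LLM.
print(retrieve("baggage limits for business class", [obj])[0].completion_text)
```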
The end result is the world's most accurate RAG platform, and our users are able to build enterprise-quality, production-ready applications in minutes, not months. I invite you to try it for yourself at iLevel.ai/Xray. Thank you very much.