Navigating Challenges and Technical Debt in LLMs Deployment: Ahmed Menshawy


Transcript

Hi everyone. I'll try to touch on three main things: how AI moved from excelling at structured data to LLMs and the unstructured data that most of our organizations have; intelligence augmentation, and the hype around AGI and doomsday; and finally, briefly, the challenges and technical debt, highlighting the findings we published recently.

So over the last 10 to 15 years, most of the value we have seen from AI has really come from structured data: supervised learning and deep learning have done really well at labelling things. But that is not the reality of most organizations. It is estimated that more than 80 percent of an organization's data is unstructured, and that 71 percent of organizations really struggle to manage and secure that kind of data. It would have been ideal to build automated systems that make recommendations based on this data, and now it is finally easy to use it: you can contextualize or customize language models with it, attach it as an extended memory to your language model, and have the model formulate answers based on the domain-specific data you have within your organization.

On AGI: the way we see LLMs, and generative AI in general, at Mastercard is as augmenting human productivity. We have seen a lot of hype about generative AI replacing our jobs, doomsday scenarios, AI taking over. I recommend a great article from Nature, "Stop talking about tomorrow's AI doomsday when AI poses risks today". Stop speculating about what AI will become tomorrow and what risks it will pose then, and really focus on the risks it poses today. Funny enough, some of the big doomsday voices are the very people who have put AI systems out there to end users with a lot of the risks we have seen in the past. Focusing on current risks will also help regulators: it helps them write the laws and policies that can regulate the AI systems we actually have, while staying proactive enough to adapt those laws whenever a new algorithmic approach comes along.

On the algorithmic foundations: AI, and generative AI specifically, has been transforming our lives in so many ways, but the algorithmic foundations behind LLMs are not the ones that will get us to AGI. I also recommend the talk by Yann LeCun, one of the fathers of deep learning, on objective-driven learning. His point is that, despite transforming our lives in so many ways, the LLM is really quite dumb at its core. Because generation is auto-regressive, whenever the model makes a mistake, that mistake amplifies over time, since every further token depends on the tokens the model has already generated.

And I can't help but quote Ada Lovelace, otherwise known as the world's first computer programmer. In her 1843 notes on the Analytical Engine she wrote that the engine, or machine learning as we would call it today, cannot originate anything by itself; it can only do what we know how to order it to perform. We simply do not have the algorithmic foundation that can get us to something that originates anything by itself, and despite being about 180 years old, that statement still holds, despite all the transformations we have seen across AI algorithms and applications.

Funny enough, I have met a lot of people who think OpenAI is the one behind language models, and I do hope you folks don't share that misconception. The idea of predicting the next token given a specific context is a very intuitive and simple idea, and it is not a few years old, it is a few decades old. What was really broken was the user interface, and a lot of folks have misunderstood what ChatGPT is all about. ChatGPT really fixed the user interface: you are able to prompt the LLM naturally, as we speak, and get your response. That is what was broken with language models before the GPT assistants, and before ChatGPT specifically, because that kind of data is really rare. OpenAI built their base model on internet-scale data, but in subsequent phases, before releasing the GPT assistant, they had to outsource to a lot of people to manually generate pairs of questions and responses.

As I said, LLMs, despite being dumb at the core, are really accelerating innovation everywhere, and we have seen great adoption in so many industries. Mastercard is no different. We have been de-risking this technology responsibly, of course, and in a recent press release in February our president announced how we used LLMs, generative AI specifically, to boost fraud detection, in some cases by 300 percent.

That brings me to the last topic of my session: the challenges. Let's first understand the essentials anyone needs to build a successful GenAI application. You need access to a variety of foundation models; an environment to customize contextual LLMs; an easy-to-use tool to build and deploy applications, because the widely used tools we had before GenAI aren't really applicable to the GenAI landscape; and finally a scalable ML infrastructure that can help you scale up and down, not just creating replicas, but creating replicas at a speed that works for your end users.

I have tried to color-code these essentials by the challenges we would see in building such applications. Access to a variety of foundation models is not so challenging: you still need to trade off cost against model size, but it is available. The environment to customize the language model itself is a bit challenging: most enterprises have their own AI environment, but it is not really built for such large models. The easy-to-use tool is, I think, the most challenging part of the whole equation, because none of the tools we saw before existed, and most of the tools that most of you use now are as new as LLMs themselves. Finally, the scalable ML infrastructure is a bit of a challenge as well: we have seen a nice curve from OpenAI showing that the GPU compute and RAM used for inference is actually becoming greater than the compute used to train the model itself.

Before I talk about the challenges in LLMs and highlight the paper we recently published, I just want to bring up a really nice chart from the NeurIPS 2015 paper on hidden technical debt in machine learning systems. It shows that ML code, which is at the core of building any machine learning system, is only a small fraction, less than five percent, of what goes into building the end-to-end pipeline. I have met a lot of folks, including before this talk, who think that being an AI engineer is all about connecting APIs and getting that kind of plumbing in place. I think it is more than that: it is everything around that ML-code box, building the end-to-end pipeline, which accounts for more than 95 percent of the work.

So, before the challenges, let me highlight the two approaches widely used in industry. The first is the closed-book approach: you take a foundation model and use it as it is, with zero-shot or few-shot learning, or you even fine-tune it with your domain-specific data. If you ask any of the folks in the enterprises, they will tell you they have a really hard time operationalizing such models, because of certain accuracy constraints. There is hallucination, and the models do it very confidently. There is attribution: we can't really understand why the models are saying what they are saying. There is staleness: they go out of date, as we have seen with the different releases coming out of OpenAI. There is revision: under GDPR, or even the California AI law, folks can opt out of AI systems, and their information can't be used again for training or for influencing the model's decisions, so you need to be able to do model editing, and that is really hard with a foundation model, or even with a fine-tuned model. And finally there is customization: you need to be able to customize these models with your own domain-specific data and make them more grounded, more factual, generating information based only on your domain-specific data.

It turns out that the solution to all of these problems is to couple the foundation model to an external memory, also known as RAG, retrieval-augmented generation. As you can see, the original setup remains as it is, but we add additional context coming from your domain-specific data. It is grounding, so it improves factual recall; there is a very nice paper, "Retrieval Augmentation Reduces Hallucination in Conversation", the title almost rhymes, which shows how this kind of architecture really reduces the hallucination of LLM systems. You can also keep the model up to date, because you can easily swap vector indices in and out. You can do revision, and of course attribution; all of the problems mentioned on the previous slide can be handled as part of this RAG setup, because you have access to the sources coming out of your retriever, so you can easily go back and understand why the model generated certain text or certain decisions.

But it is not so easy, right? There are so many questions that need to be answered for this system to really be optimized and able to work in production, and this is not even half of the questions we have out there. Most importantly: how do we optimize the retriever and the generator to work together? The mainstream kind of RAG that most people are doing right now treats the retriever and the generator as two separate brains, neither of which knows the other exists. But the actual RAG paper released by FAIR is about training these two jointly. For that you need access to the model parameters, and thanks to the people who believe in open source, this is possible: with open-source model parameters you can fine-tune the generator to generate factual information based on what it gets from the retriever. So it is not just attaching an external memory, with two sides of the brain totally separated.

This is our paper. It is very similar to the NeurIPS one, but it really shows the unique and different challenges you would see in building an end-to-end LLM application. Again, the boxes surrounding the LLM code, or the adoption of the foundation model, account for more than 90 percent of what goes into building such an application. And if we pick one box, domain-specific data collection: it is not just about building or generating the domain-specific data, it is also about how we preserve our enterprise access controls in these ecosystems. I am sure most of the organizations you work with have access controls: you can access certain systems but not others. So how do we make sure we don't have a global LLM system that can access all of the data we have behind the scenes? We need to maintain the same access controls, and have certain specialized models that work for certain tasks.

Coming back to the Nature article: we need to focus on the current risks that AI poses today and on how we build safeguards around them. That was really the core principle behind Mastercard's move to adopt LLMs. We have seven core principles for building responsible AI, covering privacy, security, reliability, and more, and we have a governing body and a clear strategy that really enforces those core principles in the building of such LLM applications. So yes, we can go about de-risking new technologies such as LLMs and use them for some of our services, but at the same time we need the right safeguards to make sure the access controls are in place and that we are not generating any biased information.

Funny enough, one of the reviewers who accepted this paper mentioned that, after reading it, he wondered whether LLMs are the right tool for solving some of these applications, given the huge number of challenges and the technical debt he had seen. But as the saying goes, you can't make an omelette without breaking a few eggs: you can't use this kind of transformative technology in your business without being challenged in so many ways. That's all I have for you. Do check out the booths we have from the AI engineering team at Mastercard; it is all about putting AI in production, and all the other boxes around the ML code, or the LLM, that we have seen in the figures. Thank you.
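The two RAG ideas in the talk, grounding the generator in an external memory and preserving enterprise access controls inside that memory, can be combined in one small sketch. This is a minimal illustration only: the corpus, the `retrieve`/`build_prompt` names, and the toy term-overlap scoring are all invented for the example; a real system would use an embedding-based vector index and pass the prompt to an actual LLM.

```python
# Minimal RAG sketch with per-document access controls (illustrative only).
# A real retriever would use embeddings + a vector index; the prompt below
# would be sent to an LLM rather than printed.

CORPUS = [
    {"id": "doc1", "text": "Chargeback disputes must be filed within 90 days.",
     "acl": {"fraud_team", "legal"}},
    {"id": "doc2", "text": "Fraud models are retrained weekly on new transactions.",
     "acl": {"fraud_team"}},
    {"id": "doc3", "text": "Executive compensation figures for the last quarter.",
     "acl": {"finance"}},
]

def score(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase terms."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, user_groups: set, k: int = 2):
    """Return the top-k documents, restricted to what the user may see.

    Filtering by ACL *before* retrieval means a single shared index never
    leaks documents the caller has no right to read."""
    allowed = [d for d in CORPUS if d["acl"] & user_groups]
    return sorted(allowed, key=lambda d: score(query, d["text"]), reverse=True)[:k]

def build_prompt(query: str, docs) -> str:
    """Ground the generator: answer only from the retrieved context,
    with document ids kept inline so answers stay attributable."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"

docs = retrieve("how often are fraud models retrained", {"fraud_team"})
print([d["id"] for d in docs])   # doc3 is excluded by the ACL filter
print(build_prompt("how often are fraud models retrained", docs))
```

Updating the corpus (swapping documents in and out) refreshes the model's knowledge without retraining, which is the "up to date" and "revision" property described above; the inline `[docN]` ids are a crude form of the attribution the retriever makes possible.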