Navigating Challenges and Technical Debt in LLMs Deployment: Ahmed Menshawy

00:00:00.000 |
Hi everyone. I'll try to touch on three main things: how AI moved from excelling 00:00:20.960 |
at structured data to LLMs and the unstructured data that most of our organizations 00:00:26.880 |
have; intelligence augmentation and the hype around AGI and doomsday scenarios; 00:00:34.160 |
and finally, briefly, the challenges and technical debt, highlighting the findings 00:00:40.320 |
we have published recently. So over the last 10 to 15 years, most of the AI value we have 00:00:50.400 |
seen has come from structured data, with supervised learning and deep learning 00:00:56.800 |
doing really well at labeling things. But that is not the reality for most organizations: it is 00:01:03.840 |
estimated that more than 80 percent of 00:01:10.160 |
an organization's data is unstructured, and that 71 percent of organizations really struggle to 00:01:17.600 |
manage and secure this kind of data. It would have been ideal to build automated systems 00:01:25.120 |
that make recommendations based on this data, and now it is easy to use it to 00:01:33.520 |
contextualize or customize language models: you can attach it as an extended memory to 00:01:42.160 |
your language model and have it formulate answers based on the domain-specific 00:01:47.760 |
data that you have within your organization. 00:01:57.680 |
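As a concrete illustration of that idea, here is a minimal sketch of turning unstructured documents into an extended memory: chunk the text, embed each chunk, and keep the vectors alongside the chunks for later lookup. The `embed` callable and the in-memory list are hypothetical stand-ins for whatever embedding model and vector store you actually use.

```python
from typing import Callable, List, Tuple

def build_memory(documents: List[str],
                 embed: Callable[[str], List[float]],
                 chunk_size: int = 500) -> List[Tuple[List[float], str]]:
    """Chunk and embed documents so a language model can retrieve from them later."""
    memory: List[Tuple[List[float], str]] = []
    for doc in documents:
        # Naive fixed-size chunking; real pipelines usually split on sentences or sections.
        for start in range(0, len(doc), chunk_size):
            chunk = doc[start:start + chunk_size]
            memory.append((embed(chunk), chunk))
    return memory
```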
Talking about AGI, and the way we see LLMs, or generative AI in general, at Mastercard: it is really about augmenting human productivity. We have seen 00:02:06.640 |
a lot of hype about generative AI replacing our jobs, doomsday scenarios, and AI taking 00:02:12.960 |
over, and I recommend this great article from Nature titled "Stop 00:02:19.680 |
talking about tomorrow's AI doomsday when AI poses risks today". Stop speculating about what 00:02:26.480 |
AI will become tomorrow and what risks it will pose tomorrow, and instead focus 00:02:32.880 |
on the current risks it poses today. Funny enough, some of the loudest voices about 00:02:39.360 |
doomsday are the same people who have shipped AI systems to end users with many of the risks 00:02:45.120 |
we have seen in the past. This focus will also help regulators: 00:02:51.120 |
if we highlight the current risks, they can concentrate on laws and policies 00:02:57.520 |
that regulate the AI systems we have today, and at the same time be 00:03:04.720 |
proactive enough to adapt those laws whenever a new algorithmic approach comes up. Also, when 00:03:12.320 |
it comes to the algorithmic foundations: AI, and generative AI specifically, has 00:03:17.600 |
been transforming our lives in so many ways, but the algorithmic foundation behind LLMs is not 00:03:25.120 |
the one that will get us to AGI. I also recommend the talk by Yann LeCun, one of the fathers of machine learning, 00:03:32.160 |
where he talks about objective-driven learning and the idea that, despite 00:03:39.280 |
the fact that it is transforming our lives in so many ways, an LLM is really quite dumb at its core, 00:03:46.000 |
because it is auto-regressive: whenever it makes a mistake, that mistake 00:03:51.280 |
amplifies over time, because the next tokens it generates depend so heavily on what it has already generated. 00:03:58.560 |
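To make the auto-regressive point concrete, here is a minimal greedy-decoding sketch; `model`, `prompt_tokens`, and `eos_token` are hypothetical placeholders rather than a specific library's API. The key line is the one that appends the prediction back into the context: an early wrong token is fed back in and can compound.

```python
from typing import Callable, List

def greedy_decode(model: Callable[[List[int]], List[float]],
                  prompt_tokens: List[int],
                  max_new_tokens: int,
                  eos_token: int) -> List[int]:
    """Generate tokens one at a time, each conditioned on everything generated so far."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = model(tokens)  # scores over the vocabulary given the current prefix
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)  # the prediction becomes part of the context for the next step
        if next_token == eos_token:
            break
    return tokens
```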
I cannot help but quote Ada Lovelace, otherwise known as 00:04:06.640 |
the world's first computer programmer. In her 1843 notes on the Analytical Engine she writes that the 00:04:13.280 |
Analytical Engine, or machine learning as we would call it today, cannot originate anything by itself; it can only do 00:04:20.000 |
whatever we ask or order it to perform, basically because we do not have an algorithmic 00:04:25.520 |
foundation that can get us to something that originates anything by itself. Despite being 00:04:31.600 |
about 180 years old, this statement still holds, despite the transformations we have seen in 00:04:37.680 |
so many AI algorithms and applications. Funny enough, I have met a lot of people who think 00:04:45.680 |
OpenAI invented language models, and I do hope you folks do not share the same 00:04:51.600 |
misconception. The whole idea of predicting the next token given a specific context is a very intuitive 00:04:58.080 |
and simple one, and it is not just a few years old, it is a few decades old. What was really 00:05:04.080 |
broken was the user interface, and a lot of folks have misunderstood what 00:05:11.760 |
ChatGPT is all about. ChatGPT fixed that user interface: you are able to 00:05:18.400 |
prompt the LLM naturally, just as we speak, and get your response, 00:05:25.760 |
and that is what was really broken with language models before 00:05:29.760 |
the GPT assistants, and ChatGPT specifically, because this kind of conversational data is really rare. 00:05:35.600 |
OpenAI built their base model on internet-scale data, 00:05:42.480 |
but then, in subsequent phases before releasing the GPT assistant, they had to 00:05:48.560 |
outsource to a lot of people the work of manually generating 00:05:54.640 |
pairs of questions and responses. 00:06:02.720 |
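As a rough illustration of what that outsourced labeling produces, here is a sketch of a single instruction-response pair in a JSON-lines supervised fine-tuning file; the field names and example text are made up, not OpenAI's actual schema.

```python
import json

example = {
    "prompt": "Explain what retrieval-augmented generation is in two sentences.",
    "response": "Retrieval-augmented generation couples a language model with an external "
                "document index. Retrieved passages are added to the prompt so answers stay "
                "grounded in your own data.",
}

# Each manually written pair becomes one line of the fine-tuning dataset.
with open("sft_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```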
As I said, despite being dumb at their core, LLMs are really accelerating innovation everywhere, and we have seen great adoption in so many 00:06:08.800 |
industries. Mastercard is no different: we have been de-risking this technology responsibly, of 00:06:15.360 |
course, and we have a recent press release, in February, where our president announced how we used LLMs, 00:06:24.640 |
generative AI specifically, to boost fraud detection in some cases by 300 percent. 00:06:29.840 |
To get into the last topic of my session, the challenges: 00:06:40.400 |
let's first understand the essentials anyone needs for building a successful GenAI application. 00:06:46.560 |
You need access to a variety of foundation models; you need an environment 00:06:52.560 |
to customize contextual LLMs; you need an easy-to-use tool to build and deploy applications, 00:06:59.440 |
because the widely used tools we had before 00:07:03.920 |
GenAI are not really applicable to the GenAI landscape; and finally, you need a scalable ML infrastructure 00:07:11.120 |
that helps in scaling up and down, not just creating replicas but creating replicas at a speed 00:07:17.520 |
that works for our end users. I have tried to color-code these essentials by the challenges 00:07:24.400 |
we would see in building such applications. 00:07:28.400 |
Access to a variety of foundation models is not so challenging: 00:07:32.880 |
yes, you still need to trade off cost against model size, but the models are available. 00:07:38.400 |
The environment to customize the language model itself is a bit challenging because, yes, most 00:07:45.280 |
enterprises have their own AI environment, but it is not really built for such 00:07:51.680 |
large models. The easy-to-use tooling, I think, is the most challenging part of the whole equation, 00:07:58.400 |
because most of the tools most of you use now 00:08:04.240 |
are as new as LLMs themselves; none of them existed before. And finally, the need for 00:08:11.440 |
scalable ML infrastructure is a bit of a challenge as well. We have seen this 00:08:16.640 |
nice curve from OpenAI showing that the GPU compute and RAM used for inference is becoming 00:08:23.200 |
greater than the compute used for training the model itself. Before I talk 00:08:31.200 |
about the challenges in LLMs and highlight the paper we recently published, I just 00:08:36.960 |
want to bring up this really nice chart from the 2015 NeurIPS paper on hidden technical debt in machine learning systems. It shows that ML code, which 00:08:44.880 |
is at the core of building any machine learning system, is only a small fraction of what goes into building an 00:08:51.040 |
end-to-end pipeline; specifically, it is less than five percent. 00:08:57.760 |
I have met a lot of folks before my talk 00:09:03.600 |
who think an AI engineer is all about connecting APIs and getting 00:09:08.560 |
that kind of plumbing in place, but I think it is more than that: it is everything around 00:09:13.600 |
this ML code box, building the end-to-end pipeline, which accounts for more than 95 percent of the work. 00:09:21.280 |
Sorry. So before the challenges, let me highlight the two approaches that are 00:09:30.400 |
widely used in industry. The first is the closed-book approach: you take 00:09:37.120 |
a foundation model and use it as it is, with zero-shot or few-shot learning, or even fine-tune it with your 00:09:43.520 |
domain-specific data. 00:09:49.440 |
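For illustration, here is what "using the model as it is" looks like in practice; the task and labels below are invented, and nothing is fine-tuned, the difference is only in the prompt.

```python
# Zero-shot: only the instruction, no examples.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative:\n"
    "Review: 'The new card benefits are great.'\nSentiment:"
)

# Few-shot: a handful of worked examples prepended to the same question.
few_shot_prompt = (
    "Review: 'The app keeps crashing on login.' Sentiment: negative\n"
    "Review: 'Support resolved my issue in minutes.' Sentiment: positive\n"
    "Review: 'The new card benefits are great.' Sentiment:"
)
# Either string is sent to the foundation model unchanged; no weights are updated,
# which is what separates this from fine-tuning on domain-specific data.
```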
If you ask folks in the enterprises, they will tell you they really have a hard time operationalizing such models because of certain accuracy constraints: 00:09:55.200 |
hallucination, and the models hallucinate very confidently; attribution, because we cannot 00:10:01.360 |
really understand why the models are saying what they are saying; staleness, because models go out of 00:10:06.400 |
date, as we have seen with the different releases coming out of OpenAI; revision, because, as in the 00:10:13.440 |
GDPR or even in California's AI law, people can opt out of AI systems and their information cannot 00:10:21.280 |
be used again for training or for influencing the model's decisions, so you need to be able to do model 00:10:27.120 |
editing, which is really hard in a foundation model, even one you have fine-tuned; 00:10:32.080 |
and finally customization, because you need to be able to customize these models with your own domain-specific 00:10:38.640 |
data and make them more grounded, more factual, generating information only based on your 00:10:45.200 |
domain-specific data. It turns out the solution to all of these problems is 00:10:51.120 |
to couple the foundation model to an external memory, also known as RAG, retrieval-augmented generation. With RAG, as you can see, 00:10:59.680 |
the original setup remains as it is, but we add additional context which comes from 00:11:07.280 |
your domain-specific data. It is grounding, so it improves factual recall, and there 00:11:14.400 |
is a very nice paper, "Retrieval Augmentation Reduces Hallucination in Conversation", it kind of rhymes, 00:11:21.280 |
which shows how this kind of architecture really reduces the hallucination of LLM 00:11:26.720 |
systems. You can also keep it up to date, because you can easily swap vector indices in and out; you can do 00:11:34.160 |
revision; and you can do attribution, of course. All of the problems we mentioned on the previous slide 00:11:39.760 |
can be addressed as part of this RAG setup, because you have access to the sources coming out of your retriever, 00:11:47.200 |
so you can easily go back and understand why the model generated certain text or certain decisions. 00:11:54.880 |
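Here is a minimal sketch of that RAG flow, assuming hypothetical `embed`, `vector_index.search`, and `llm.generate` helpers rather than any specific vendor API; returning the retrieved source ids is what makes attribution possible, and swapping the index is how the system stays up to date.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Doc:
    doc_id: str
    text: str

def answer_with_rag(question: str,
                    embed: Callable[[str], List[float]],
                    vector_index,   # hypothetical index with a .search(vector, top_k) method
                    llm,            # hypothetical model with a .generate(prompt) method
                    k: int = 4) -> dict:
    docs: List[Doc] = vector_index.search(embed(question), top_k=k)  # external memory lookup
    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    prompt = (
        "Answer using only the context below and cite the document ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    answer = llm.generate(prompt)
    # The retrieved ids give you attribution: every answer can be traced back
    # to the sources the retriever supplied.
    return {"answer": answer, "sources": [d.doc_id for d in docs]}
```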
But it is not so easy, right? There are so many questions that need to be answered for this system 00:12:01.040 |
to be optimized and able to work in production, and this is not even half of the questions 00:12:07.040 |
we have out there. Mainly: how do we optimize the retriever and the generator to work together? 00:12:14.000 |
The mainstream kind of RAG that most people are doing right now 00:12:20.800 |
treats the retriever and the generator as two separate brains, where neither knows 00:12:26.560 |
the other exists. But the actual RAG paper that was released by FAIR is about 00:12:34.640 |
training these two in parallel, so you need access to the model parameters. This is now 00:12:41.040 |
possible thanks to the people who believe in open source: you can have access 00:12:46.640 |
to the open-source model parameters, so you can fine-tune the generator to 00:12:51.280 |
generate factual information based on what it gets from the retriever. It is not just attaching 00:12:57.120 |
an external memory to two sides of a brain that are totally separated. 00:13:02.000 |
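As a rough sketch of that joint-training idea (not the paper's actual code), the loss marginalizes the generator's likelihood over the retrieved documents, so gradients reach both the generator and the query encoder; `query_encoder`, `retrieve`, and `generator_loglik` are hypothetical callables.

```python
import torch

def rag_sequence_loss(question, answer,
                      query_encoder,     # maps the question to a query embedding (trainable)
                      retrieve,          # returns top-k docs and their embeddings
                      generator_loglik,  # log p(answer | question, doc) for one document
                      k: int = 5) -> torch.Tensor:
    q = query_encoder(question)                         # gradients flow into the retriever side
    docs, doc_embs = retrieve(q, top_k=k)               # doc_embs: (k, d) tensor
    log_p_doc = torch.log_softmax(doc_embs @ q, dim=0)  # p(doc | question) from similarity scores
    log_p_ans = torch.stack([generator_loglik(question, d, answer) for d in docs])
    # Marginalize over documents, then take the negative log-likelihood.
    return -torch.logsumexp(log_p_doc + log_p_ans, dim=0)
```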
So this is our paper. It is very similar to the NeurIPS one, but it shows 00:13:10.560 |
the unique and different challenges we would see in building an end-to-end LLM application. 00:13:16.800 |
You can see that, again, the boxes surrounding the LLM code, or the 00:13:22.720 |
adoption of a foundation model, account for more than 90 percent of what goes into 00:13:28.960 |
building such an application. And if we pick one box, domain- 00:13:33.680 |
specific data collection, it is not just about building or generating the domain-specific data; it is also how 00:13:40.080 |
we preserve our enterprise access controls inside these ecosystems. 00:13:47.280 |
I am sure most of the organizations you work with have access controls: you can 00:13:52.640 |
access certain systems but not others. So how do we make sure we do not have a global LLM 00:13:58.240 |
system that has access to all of the data we have behind the scenes? We need to 00:14:02.960 |
maintain the same access controls and have specialized models that work for specific tasks. 00:14:09.360 |
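One way to preserve those access controls in a retrieval setup, sketched under the assumption that each indexed document carries an ACL and that the caller's group memberships come from your identity system, is to filter retrieved documents before they ever reach the LLM:

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class RetrievedDoc:
    doc_id: str
    text: str
    allowed_groups: Set[str] = field(default_factory=set)  # ACL attached at indexing time

def filter_by_acl(docs: List[RetrievedDoc], user_groups: Set[str]) -> List[RetrievedDoc]:
    """Keep only documents the calling user is entitled to see."""
    # Anything that fails the check is dropped before it can enter the prompt,
    # so the LLM never behaves like a global system with access to all data.
    return [d for d in docs if d.allowed_groups & user_groups]
```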
Also, coming back to that Nature article: we need to focus on the current risks 00:14:15.840 |
that AI poses today and how we build safeguards around them, and this was really the core 00:14:22.880 |
principle behind Mastercard's move to adopt LLMs. We have the same seven core principles 00:14:29.360 |
for building responsible AI, all around privacy, security, 00:14:36.160 |
reliability, and we also have a governing body and a clear strategy that 00:14:43.440 |
enforce these core principles in the building of such LLM applications. So yes, we can go about 00:14:49.520 |
de-risking new technologies such as LLMs and using them for some of the services we have, 00:14:55.120 |
but at the same time we need the right safeguards to make sure that the access 00:15:00.320 |
controls are in place and that we are not generating any biased information. Funny 00:15:08.400 |
enough, one of the reviewers who accepted this paper mentioned that after 00:15:13.280 |
reading it he was wondering whether an LLM is the right tool for solving some 00:15:19.680 |
of these applications, given the huge number of challenges and technical debt 00:15:25.360 |
he had seen. But as the saying goes, you cannot make an omelette without breaking a few 00:15:30.400 |
eggs: you cannot really use this kind of transformative technology in your business without being 00:15:36.320 |
challenged in so many ways. That is all I have for you. Do check out some of the books 00:15:43.120 |
we have from the AI engineering team at Mastercard; they are all about putting AI in production, 00:15:49.200 |
all the other boxes around the ML code or the LLM that we have seen in the figures. Thank you. 00:16:04.880 |