
Navigating Challenges and Technical Debt in LLMs Deployment: Ahmed Menshawy


Whisper Transcript

00:00:00.000 | Hi everyone. I'll touch on three main things: how AI moved from excelling
00:00:20.960 | at structured data to LLMs and the unstructured data that most of our organizations
00:00:26.880 | have; intelligence augmentation and the hype around AGI and doomsday scenarios;
00:00:34.160 | and finally, briefly, the challenges and technical debt, highlighting the findings
00:00:40.320 | we have published recently. Over the last 10 to 15 years, most of the AI value we have
00:00:50.400 | seen has really come from structured data, with supervised learning and deep learning
00:00:56.800 | doing really well at labeling things. But this is not the reality for most organizations: it is
00:01:03.840 | estimated that most of an organization's data is unstructured, specifically more than 80 percent of
00:01:10.160 | it, and it is also estimated that 71 percent of organizations really struggle to
00:01:17.600 | manage and secure this kind of data. It would have been ideal to build automated systems
00:01:25.120 | that make recommendations based on this data, and now it is easy to use it to
00:01:33.520 | contextualize or customize language models: you can attach it as an extended memory
00:01:42.160 | to your language model and have it formulate answers based on the domain-specific
00:01:47.760 | data you have within your organization. Turning to AGI, and the way we see
00:01:57.680 | LLMs, or generative AI in general, at Mastercard: it is really about augmenting human productivity. We have seen
00:02:06.640 | a lot of hype around generative AI replacing our jobs, doomsday, AI taking
00:02:12.960 | over, and I recommend this great article from Nature: "Stop
00:02:19.680 | talking about tomorrow's AI doomsday when AI poses risks today". Stop speculating about what
00:02:26.480 | AI will become tomorrow and what risks it will have then, and really focus
00:02:32.880 | on the risks it poses today. Funnily enough, some of the big voices on
00:02:39.360 | doomsday are the very people who have put AI systems in front of end users, with a lot of the risks
00:02:45.120 | we have seen in the past. Highlighting the current risks will of course also help regulators be more focused,
00:02:51.120 | so they can put in place the laws and policies
00:02:57.520 | that help them regulate current AI systems, and at the same time be
00:03:04.720 | proactive enough to adopt new rules whenever a new algorithmic approach comes up. Also, when
00:03:12.320 | it comes to the algorithmic foundations: AI, and generative AI specifically, has
00:03:17.600 | been transforming our lives in so many ways, but the algorithmic foundations behind LLMs are not
00:03:25.120 | the ones that will get us to AGI. I also recommend the talk from Yann LeCun, one of the fathers of deep learning,
00:03:32.160 | where he talks about objective-driven AI and the idea that, despite
00:03:39.280 | transforming our lives in so many ways, these models are really quite dumb at the core. That is
00:03:46.000 | because they are auto-regressive: whenever the model makes a mistake, that mistake
00:03:51.280 | amplifies over time, because the generation of each new token depends on what has already been generated.
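
To make the auto-regressive point concrete, here is a minimal sketch of greedy next-token decoding. It assumes the Hugging Face `transformers` library and the public `gpt2` checkpoint purely as illustrative stand-ins; the talk does not name any specific model.

```python
# Minimal sketch of auto-regressive decoding: each new token is chosen
# conditioned on everything generated so far, so an early mistake stays in
# the context and can compound over the rest of the sequence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Analytical Engine can"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                              # generate 20 tokens, one at a time
        logits = model(input_ids).logits             # shape: [1, seq_len, vocab]
        next_id = logits[0, -1].argmax()             # greedy pick of the next token
        # the chosen token is appended and becomes part of the context for
        # every later prediction -- earlier choices are never revisited
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```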
00:03:58.560 | And I can't help but bring up this quote from Ada Lovelace, otherwise known as
00:04:06.640 | the world's first computer programmer. In her 1843 notes on the Analytical Engine she writes that the
00:04:13.280 | Analytical Engine, or machine learning as we would call it today, cannot originate anything by itself; it can only do
00:04:20.000 | what we ask it, or order it, to perform. That is basically because we don't have an algorithmic
00:04:25.520 | foundation that can get us to something that truly originates anything by itself. Despite being
00:04:31.600 | about 180 years old, this statement still holds, despite the transformations we have seen in
00:04:37.680 | so many AI algorithms and applications. Funnily enough, I've met a lot of people who think
00:04:45.680 | OpenAI is the one behind language models, and I do hope you folks don't share the same
00:04:51.600 | misconception. The idea of predicting the next token given a specific context is a very intuitive
00:04:58.080 | and simple one, and it is not just a few years old; it is a few decades old. What was really
00:05:04.080 | broken was the user interface, and a lot of folks have misunderstood what
00:05:11.760 | ChatGPT is all about. ChatGPT fixed that user interface problem: you can
00:05:18.400 | naturally, just as we speak, prompt the LLM and get your response,
00:05:25.760 | and that is what was really broken with language models before
00:05:29.760 | the GPT assistants, and ChatGPT specifically, because that kind of data is really rare.
00:05:35.600 | OpenAI built its base model on internet-scale data,
00:05:42.480 | but then, in subsequent phases before releasing the GPT assistant, it had to
00:05:48.560 | outsource to a lot of people who manually wrote pairs of
00:05:54.640 | questions and responses.
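
As a toy illustration of the kind of data those annotators produced, here is a hypothetical handful of prompt/response pairs; the examples and field names are invented, not taken from any real dataset.

```python
# Hypothetical examples of the human-written prompt/response pairs used to
# turn a base model into an assistant. Supervised fine-tuning simply
# maximizes the likelihood of each response given its prompt, token by token.
instruction_pairs = [
    {"prompt": "Explain what a chargeback is to a new analyst.",
     "response": "A chargeback is a reversal of a card payment initiated by the issuer ..."},
    {"prompt": "Summarise this dispute email in two sentences: ...",
     "response": "The cardholder reports a duplicate charge and asks for a refund ..."},
]

for pair in instruction_pairs:
    print(pair["prompt"][:45], "->", pair["response"][:45])
```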
00:06:02.720 | As I said, despite being dumb at their core, LLMs are really accelerating innovation everywhere, and we have seen great adoption in so many
00:06:08.800 | industries; Mastercard is no different. We have been de-risking this technology responsibly, of
00:06:15.360 | course, and we have a recent press release, in February, where our president announced how we used LLMs,
00:06:24.640 | and generative AI specifically, to boost fraud detection in some cases by 300 percent.
00:06:29.840 | To get to the last topic of my session, the challenges:
00:06:40.400 | let's first understand the essentials anyone needs for building a successful GenAI application.
00:06:46.560 | Basically, you need access to a variety of foundation models, you need an environment
00:06:52.560 | to customize contextual LLMs, and you need an easy-to-use tool to build and deploy applications,
00:06:59.440 | because the widely used tools we had before
00:07:03.920 | GenAI aren't really applicable to the GenAI landscape. Finally, you need scalable ML infrastructure
00:07:11.120 | that can scale up and down, not just creating replicas but creating replicas at a speed
00:07:17.520 | that works for our end users. I've tried to color-code these essentials by the challenges
00:07:24.400 | we would see in building such applications.
00:07:28.400 | Access to a variety of foundation models is not so challenging:
00:07:32.880 | yes, you still need to trade off cost against model size, but it is available.
00:07:38.400 | The environment to customize the language model itself is a bit challenging, because yes, most
00:07:45.280 | enterprises have their own AI environment, but it is not really built for such
00:07:51.680 | large models. The easy-to-use tooling is, I think, the most challenging part of the whole equation,
00:07:58.400 | because most of the tools most of you use now
00:08:04.240 | are as new as LLMs themselves; none of them existed before. Finally, the need for
00:08:11.440 | scalable ML infrastructure is a bit of a challenge as well, and we have seen a
00:08:16.640 | nice curve from OpenAI showing that the GPU compute and RAM for inference are
00:08:23.200 | getting greater than the compute used for training the model itself. Before I talk
00:08:31.200 | about the challenges in LLMs and highlight the paper we recently published, I just
00:08:36.960 | want to bring up a really nice chart from the NeurIPS 2015 paper on hidden technical debt in machine learning systems. It shows that ML code, which
00:08:44.880 | is at the core of building any machine learning system, is only a small fraction of what goes into building an
00:08:51.040 | end-to-end pipeline; specifically, it is less than five percent of what goes into building an end-to-end pipeline.
00:08:57.760 | I have met a lot of folks before my talk
00:09:03.600 | who think that an AI engineer is all about connecting APIs and getting
00:09:08.560 | that kind of plumbing in place, but I think it is more than that. It is really everything around
00:09:13.600 | that ML-code box, building the end-to-end pipeline, which accounts for more than 95 percent of the work.
00:09:21.280 | Before the challenges, let's highlight the two different approaches that are
00:09:30.400 | widely used in the industry. The first one is the closed-book approach: you have
00:09:37.120 | a foundation model and you use it as is, with zero-shot or few-shot learning, or you even fine-tune it with your domain-specific
00:09:43.520 | data. If you ask folks in the enterprises, they will tell you they really
00:09:49.440 | have a hard time operationalizing such models because of certain accuracy constraints. Basically:
00:09:55.200 | hallucination, and the models hallucinate very confidently; attribution, because we can't
00:10:01.360 | really understand why the models are saying what they are saying; staleness, because they go out of
00:10:06.400 | date, as we have seen with the different releases that come out of OpenAI; revision, because under
00:10:13.440 | GDPR, or even the California AI law, people can opt out of AI systems and their information can't
00:10:21.280 | be used again for training or for influencing the model's decisions, so you need to be able to do model
00:10:27.120 | editing, which is really hard with a foundation model or even a fine-tuned model;
00:10:32.080 | and finally customization, because you need to be able to customize these models with your own domain-specific
00:10:38.640 | data and have them more grounded, or more factual, generating information based only on your
00:10:45.200 | domain-specific data. It turns out that the solution to all of these problems is
00:10:51.120 | to couple the foundation model to an external memory, also known as RAG. With RAG, as you can see,
00:10:59.680 | the original setup remains as it is, but we have added additional context coming from
00:11:07.280 | your domain-specific data. It is grounding, so it improves factual recall, and there
00:11:14.400 | is a very nice paper, "Retrieval Augmentation Reduces Hallucination in Conversation", which
00:11:21.280 | shows how this kind of architecture really reduces the hallucination of LLM
00:11:26.720 | systems. You can also keep it up to date, because you can easily swap vector indices in and out, so you can do
00:11:34.160 | revision; and you can do attribution, and in fact all of the problems mentioned on the previous slide
00:11:39.760 | can be addressed as part of this RAG setup. You have access to the sources coming out of your retriever,
00:11:47.200 | so you can easily go back and understand why the model generated certain text or certain decisions.
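
Here is a minimal sketch of that RAG pattern, using scikit-learn for a toy TF-IDF retriever; the corpus, the generate() stub, and the prompt wording are placeholders of my own, not anything described in the talk.

```python
# Toy RAG loop: retrieve domain-specific passages, prepend them to the prompt
# as grounding context, and keep the retrieved sources next to the answer so
# that attribution is possible.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [  # stand-in for the domain-specific documents / vector index
    "Chargeback disputes must be filed within 120 days.",
    "Tokenized card numbers are never stored in plain text.",
    "Fraud scores above 900 trigger a manual review.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)        # the "external memory"

def retrieve(question: str, k: int = 2):
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [(corpus[i], float(scores[i])) for i in top]

def generate(prompt: str) -> str:
    # stand-in for a call to whichever foundation model you have access to
    return "<answer grounded in the context above>"

def answer(question: str):
    sources = retrieve(question)
    context = "\n".join(text for text, _ in sources)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt), sources                 # sources enable attribution

print(answer("When do high fraud scores get reviewed?"))
```

Swapping in a fresh index (here, simply re-fitting the vectorizer on new documents) is also how the system stays up to date without retraining the model.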
00:11:54.880 | But it is not so easy. There are so many questions that need to be answered for this system
00:12:01.040 | to really be optimized and able to work in production, and this is not even half of the questions
00:12:07.040 | we have out there. Mostly: how do we optimize the retriever and the generator to work together?
00:12:14.000 | The mainstream kind of RAG that most people are doing right now
00:12:20.800 | has the retriever and the generator as two separate brains, neither of which knows
00:12:26.560 | the other exists. But the actual RAG paper released by FAIR is about
00:12:34.640 | training these two in parallel. For that you need access to the model parameters, and this is now
00:12:41.040 | possible thanks to the people who believe in open source: you can have access
00:12:46.640 | to the open-source model parameters, so you can fine-tune the generator to
00:12:51.280 | generate factual information based on what it gets from the retriever. It is not just attaching
00:12:57.120 | an external memory, with the two sides of the brain totally separated.
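
Roughly stated, and as my own summary rather than something shown in the talk, the objective in that original RAG paper marginalizes over the retrieved passages, so gradients reach both the retriever's query encoder and the generator:

```latex
% RAG-Sequence marginal likelihood, roughly as in Lewis et al. (2020):
% p_eta is the retriever scoring passages z for the input x,
% p_theta is the generator producing the answer tokens y_i.
p(y \mid x) \;\approx\;
  \sum_{z \,\in\, \mathrm{top}\text{-}k\, p_\eta(\cdot \mid x)}
    p_\eta(z \mid x)\,
    \prod_{i=1}^{N} p_\theta\bigl(y_i \mid x, z, y_{1:i-1}\bigr)
```

Training minimizes the negative log of this marginal, which is what couples the retriever and the generator instead of leaving them as two separate brains.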
00:13:02.000 | So this is our paper. It is very similar to the NeurIPS one, but it shows
00:13:10.560 | the unique and different challenges we would see in building an end-to-end LLM application.
00:13:16.800 | You can see that, again, the boxes surrounding the LLM code, or the
00:13:22.720 | adoption of the foundation model, account for more than 90 percent of what goes into
00:13:28.960 | building such an application. And if we pick one box, domain-specific
00:13:33.680 | data collection, it is not just about building or generating the domain-specific data; it is also about how
00:13:40.080 | we preserve our enterprise access controls within these ecosystems.
00:13:47.280 | I'm sure most of the organizations you work with have access controls: you can
00:13:52.640 | have access to certain systems but not others. So how do we make sure we don't have a global LLM
00:13:58.240 | system that has access to all of the data we have behind the scenes? We need to
00:14:02.960 | maintain the same access controls and have specialized models that work for specific tasks.
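
Here is a minimal sketch of what permission-aware retrieval can look like; the documents, group names, embeddings, and the retrieve() helper are hypothetical placeholders, not Mastercard's setup.

```python
# Permission-aware retrieval: documents carry the same access-control groups
# they have in the source systems, and retrieval only ranks what the
# requesting user is allowed to see.
import numpy as np

documents = [  # hypothetical corpus with per-document access groups
    {"text": "Q3 fraud-model evaluation notes", "groups": {"fraud-team"}, "emb": np.random.rand(8)},
    {"text": "Public product FAQ",              "groups": {"everyone"},   "emb": np.random.rand(8)},
    {"text": "Payroll summary 2024",            "groups": {"hr"},         "emb": np.random.rand(8)},
]

def retrieve(query_emb, user_groups, k=2):
    # keep only documents the user is entitled to, mirroring enterprise ACLs
    allowed = [d for d in documents if d["groups"] & user_groups]
    # rank the remainder by cosine similarity to the query embedding
    def score(d):
        return float(np.dot(query_emb, d["emb"]) /
                     (np.linalg.norm(query_emb) * np.linalg.norm(d["emb"])))
    return sorted(allowed, key=score, reverse=True)[:k]

hits = retrieve(np.random.rand(8), user_groups={"everyone", "fraud-team"})
print([h["text"] for h in hits])  # the payroll document is never a candidate
```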
00:14:09.360 | Coming back to that Nature article: we need to focus on the current risks
00:14:15.840 | that AI poses today and how we build safeguards around them, and this was really the core
00:14:22.880 | principle behind Mastercard's move to adopt LLMs. We have seven core principles
00:14:29.360 | for building responsible AI, covering everything around privacy, security, and
00:14:36.160 | reliability, and we also have a governing body and a clear strategy that
00:14:43.440 | enforce these core principles in the building of such LLM applications. So yes, we can go about
00:14:49.520 | de-risking new technologies such as LLMs and use them for some of the services we have,
00:14:55.120 | but at the same time we need the right safeguards to make sure that access
00:15:00.320 | controls are in place and that we are not generating any biased information.
00:15:08.400 | Funnily enough, one of the reviewers who accepted this paper mentioned that after
00:15:13.280 | reading it he wondered whether an LLM is the right tool to use for solving some
00:15:19.680 | of these applications, given the huge number of challenges and technical debt
00:15:25.360 | he had seen. But as the saying goes, you can't make an omelette without breaking a few
00:15:30.400 | eggs, and you can't use this kind of transformative technology in your business without being
00:15:36.320 | challenged in so many ways. That's all I have for you. Do check out some of the books
00:15:43.120 | we have from the AI engineering team at Mastercard; they are all about putting AI in production
00:15:49.200 | and all the other boxes around the ML code, or the LLM, that we have seen in the figures. Thank you.