What we really wanted to pay homage to today is that just 25 years ago we crossed the one-gigahertz speed barrier in microprocessors. What's really crazy is that when we started thinking about this talk, I assumed it had happened a lot earlier than 1999, just remembering my own arc of getting involved with computers, and I had to double- and triple-check it. This is the exact press release from when Intel broke the one-gigahertz speed barrier. That was interesting from a couple of perspectives: one, it was a really big number and a big moment; but two, it was after this that Intel started to change how they thought about how processors would be used, and they went toward multi-core designs and things like that. That's really something we need to think about in terms of what's going to happen with LLMs. If you go back to the rate of increase, it only took about two decades to get three orders of magnitude of speed improvement in microprocessors. So if we take a step back and look at where we are with LLMs, and think about the speed of innovation, a lot of people, including Jensen, say we're beyond the curve of Moore's law: we're actually innovating even faster than that in LLMs today. Just to look at what we've been able to do at Groq in a short amount of time: between April and June of this year we increased the speed of Llama 3 8B by over 50 percent. The improvements happening in this area are really quick and super exciting, and we're keen to dive into what could happen here. So let's think about the state of the art. There are models today that can process huge
inputs, say the equivalent of 10,000 input tokens per second, which gets you down to around a third of a second to process all of them. When you do that, you end up with capabilities, from a speed perspective, that far exceed human capabilities for both integrating and analyzing information, and it's happening really fast. The example I like to talk about here, and I highly recommend it if you haven't used it, is a really cool service called globe.engineer. You give it a task; the example I used was something like "help me plan a trip to New York to try the best pizza." I couldn't even capture the whole screen here, but it figures out all the different elements that have to happen, and it does this live, connected to the internet: everything from the flights to the taxi options to the hotel options, then the food options, then the itinerary and how I can do it, all in maybe less than five seconds. Think about what's really happening there. When I plan trips myself, I end up opening tens, sometimes even hundreds, of tabs, and each of those tabs is a research stream for me. Now all of that is solved in a simple interface, enabled by these LLMs being able to, one, process input tokens faster, and then, ultimately, output tokens faster. It's giving us a huge leg up in how we operate as humans. Where does this all go? If we start thinking about human superintelligence, and about optimizing and accelerating models, it takes us to interesting paradigms. We'll talk about this more in a second, but you
know, the high-level way to think about it is: what if an LLM becomes either an operating system, or the core of how we think about compute today? Then we can think about computing completely differently from any of the approaches we've had before: the way we program these things, the way our expectations are set, and how they analyze things. That's interesting in terms of where this is going, in terms of superintelligence, staying away from AGI, but more about changing the paradigm from where we are today. The thing that crosses my mind here is what happened in the Industrial Revolution. Think about three industries: making food, making cars, and making clothes. All of those, before the Industrial Revolution, were bespoke. You'd have people who could make one or two cars a day; you'd have people working on farms that could feed maybe less than a city, even a small village; someone making sweaters could make one a day, or maybe one a week. When the Industrial Revolution showed up, we suddenly had the ability to make hundreds or thousands of cars a day, to farm food at national scale, and to make clothing at national scale. We haven't had that in technology. The arc of technology has been, and this isn't my own framework, it comes from Paul Maritz, who was a long-time Microsoft guy, then VMware, then Pivotal, where he and I met: he said the first era of computing was just taking paper processes and making them digital. That's evident in the way the operating system is structured: files, folders, inbox, outbox. Those are all paper processes that got turned into digital processes. The next era was making those things connected; that's the internet era. And what we've
been through now, maybe in the last 15 years, is form-factor change: pushing things into the cloud for scale, or to mobile so you can do it on your phone. But finally, with AI, we're starting to get to a place where we have industrialization for technology, the same way we saw it in those manufacturing and physical industries. Eighteen or maybe 24 months ago, if you needed a Photoshopped artifact to put in a presentation, you'd go to your designer, and maybe the designer would make one or two a day for you. Now you can go to Midjourney and have a thousand made in the next minute if you want. We're going through that same kind of industrialization for technology. And if we dive deeper into where this goes: as we get to something like 10,000 complex decisions per second, that is, getting latency down to around 0.1 milliseconds per decision, and then really start increasing that, it does become viable to think about the core of our computing becoming an LLM. I think this is a real challenge for a lot of people, because we're really locked into existing paradigms, but this paradigm shift is fundamentally different in terms of how software will be built, how it will run, and how it will scale. We don't think about it much today, because we think about the current speed of running LLMs and their current capabilities. But if we can imagine the same growth we saw in CPUs happening in this era, we can imagine the core of these devices changing. This is again a hat tip to Karpathy, this is a diagram he drew, but we can imagine an LLM being the core: what happens in video and audio, which we're starting to see today; what happens in our browsers; how we interact with other LLMs; how we interact with code interpreters; and even our
file systems, and how we interact with those. So what is the art of the possible if we start doing this? I'll rattle off some things that crossed our minds as we put this presentation together. We don't spend a lot of time thinking about it, but many LLM responses today are only near real time, at roughly reading speed. If we get to instantaneous responses and decision making, this becomes a lot faster. Again, this is really evident in that globe.engineer example I showed: you take a task that would probably take you an afternoon, an evening, or a number of evenings, and it's done for you in a few seconds. Then there are personalized experiences. Today we don't really have a lot of personalization; we're starting to see elements of it. OpenAI has started to launch features that let it understand specifics of your world: your pets' names, your kids' names, your spouse's name. Where this really goes, and a lot of people push on this, two of my friends, Bill Gurley and Brad Gerstner, talk about it a lot on their pod, is that personalization is the next major frontier, and personalization and speed are going to go hand in hand if we're going to make that work seamlessly for folks. Next is a kind of universal natural-language interface. Our interface to software started with point-and-click and keyboards, and we've gone to touch with our mobile devices, but you really start to see the power of natural language here. Everyone's been super excited about the release of GPT-4o and its voice agents; I don't think we've fully gotten there yet, but I think they've shown the art of the possible
there with what they were able to do with voice. Then there's mixed interaction, what we refer to as XRX: any type of input, reasoning, and any type of output. The example I like to give: if you're trying to order something, you may want to interact with an agent in voice, but you may want to see the responses in text. Think about booking your haircut. You ask what times are available, and it tells you 9 a.m., 11 a.m., 3:30, and 5:30.
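As a toy illustration of that "any input, any output" idea, here's a minimal sketch. It's purely hypothetical (the `Slot`, `render_voice`, and `render_text` names are made up, not any real product's API): the agent reasons once over the available slots, then renders the same answer in two modalities, a short spoken sentence plus a structured on-screen list.

```python
# Hypothetical sketch of a multimodal (XRX-style) response: one answer,
# two renderings. Names here are illustrative, not a real agent API.
from dataclasses import dataclass


@dataclass
class Slot:
    label: str  # human-readable time, e.g. "9:00 AM"


def render_voice(slots: list[Slot]) -> str:
    """Compress the options into one short spoken sentence."""
    head = ", ".join(s.label for s in slots[:-1])
    return f"There are openings at {head}, and {slots[-1].label}. Which works?"


def render_text(slots: list[Slot]) -> list[str]:
    """Keep the full structured list so the user can see it, not memorize it."""
    return [f"{i + 1}. {s.label}" for i, s in enumerate(slots)]


slots = [Slot("9:00 AM"), Slot("11:00 AM"), Slot("3:30 PM"), Slot("5:30 PM")]
print(render_voice(slots))
print("\n".join(render_text(slots)))
```

The point of the sketch is just the split: voice carries the summary and the question, while the screen carries the list you'd otherwise have to hold in your head.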
That's hard to remember if it's just coming back to you in voice, so you want these interactions to be multimodal, which touches on my earlier point, and I think we're going to see a lot more of those interface changes as well. Then there are advanced virtual assistants: complex task scheduling. A lot of what we'll see in the back half of just this year is agents becoming much more capable, with a lot of focus from LLM providers on making complex tasks something that gets solved. It's interesting: today we generally measure the efficacy of an LLM through single-shot evaluation, and I think we do that because of the performance barrier we talked about at the start. But if you take any existing LLM today and multi-shot it, its scores get a lot better, and a couple of papers came out recently showing that if you have multiple agents working together on a problem, a model with far fewer parameters can compete with higher-parameter models, just by doing multi-shot reasoning or working together. I think we'll see a lot more of that as speed improves, and there's incredible optionality there. We saw a first cut of collaborative AI agents with Apple Intelligence, where something running on-device interacts with something off-device; I think it's a very early implementation, and these things will get much more sophisticated and better. Another area we've spent a lot of time on in our careers is analytics and predictive analytics. Today everything is pretty much action-oriented, derived from a human action. If the speed goes up, it can be a lot more predictive. What does that really mean? It's just an agent
that's always running in the background, because the compute cycles are next to free. We don't see that today, but I think we get there as we get higher up the curve. Context awareness, too: today we're generally limited in how much context we can provide, and even with models with bigger context windows we still have to be conscious of how many compute cycles we'll use. If that becomes next to free, it becomes quite powerful for us. Then there are creative tools and customizable content; I'll focus on the second one, because it's an area where I think many of us would like to see things go. One of my favorite shows was Seinfeld, and obviously it's not on anymore, but one of the things I like to do when I'm bored is go into my LLM of choice and have it write a Seinfeld episode made up of modern-day things that are happening. If you ever try that, it's super fun, because it does an incredible job of identifying which character in the scenarios you give it would have the funny or odd thing happen to them. Taking that beyond writing, into multimedia forms, is going to be really powerful going forward. Then there's complex decision making. Before our company was acquired by Groq, we were building a company called Definitive Intelligence, so we spent a lot of time in this space: not only natural-language analysis over SQL data, "text-to-SQL" as a lot of people would call it, but also, Rick, who's sitting here with us, was working on this really cool product for us called Pioneer, an automated data-science agent meant to run almost endlessly on a problem. You
define a KPI. If you think about how a business runs: a business has a bunch of KPIs, and a bunch of data coming in, and usually humans take that data, analyze it against the KPIs, and create PowerPoints and spreadsheets telling either senior management or the world how well they're doing. There's no reason that shouldn't just happen automatically, with an agent constantly looking at the new data coming in, asking additional questions, and diving into it. We had a lot of interesting things emerge. We let Pioneer loose on a dataset of human workers and their performance reviews, and one thing we saw was that it correlated things we hadn't thought about: depending on your age and the type of performance review you got, your productivity could fall off. Rick can correct me if I'm wrong later, but it was something along those lines, and it was always an interesting example for us. Then there are really interesting things around dynamic optimization, an area we're familiar with from before. When a bunch of us were at Ford, after the acquisition of Autonomic, we saw that for the supply chain, if you think about how cars are produced and shipped, there's pretty sophisticated software doing this, but it's still not efficient. The art of the possible with what we were talking about earlier could be very interesting for some of our old colleagues at Ford. I'll touch on a couple more things and then leave a couple of minutes for questions, if there are any. Edge AI and
decentralized AI: this is pretty cool. There's a really cool project called Hyperspace, a bit like SETI@home or Render, that lets people take their unused GPU compute and make it available in the cloud. Why that's interesting is that certain use cases don't necessarily require real time, so I think we'll see a lot more of that. It intersects really well with getting more throughput and lower latency out of existing systems, especially given the power consumption required; if you distribute that, it could be really interesting. A couple more here. Enhanced security and privacy is a big area. I was talking to one of our colleagues last night, and he was subjected to a really scary kind of phishing call: someone called in, sounded very formal, and had access to a lot of information. We've all seen those people who run scam call centers, and the people who go and attack them, but these folks, armed with AI, are much more sophisticated, because they can create stories and narratives much deeper than the call-center worker of the past. To protect against these systems, you'll almost need something on your side. With our colleague, the narrative was so good that he was just confused; the only way he could figure out the person was a scammer, rather than just hanging up, was to say, "send me some kind of formal message through the HSBC app, and then I'll know it's you." The person wasn't able to do that. And so I do think
that as voice cloning improves and more of our information goes online, we have to be really careful, and we will need these protective systems, and we need them to run incredibly fast. I think this is the last set here. Education is really important to us broadly at Groq; we think about making tokens cheaper and more broadly available, and about personalization. Sal Khan has a very good TED talk from a couple of years ago, the "two sigma" talk, where he highlights that you can take any student at any level, the highest performers or someone performing lower, and if you give them a personalized tutor, they can improve their test scores by two standard deviations. Imagine doing that with AIs that are, one, very cheap to use, and two, personalized to the student's learning experience. I was speaking to someone recently who was building an AI service for homeschooling, and what was powerful about that particular service: say you have a young child who's really into unicorns or ponies, and you want to teach them math, subtraction, addition, multiplication. It's a lot easier if you frame it in the context of those things: you have three ponies times two unicorns, what do you get? I'd never thought about that before, but customizing learning to the interests of the person is quite powerful, so we'll see more of that. And then interoperability and compatibility. If you've ever been in enterprise software, you know the majority of money spent deploying and maintaining enterprise software is related to interconnectivity, interoperability, and compatibility, so having really fast and cheap AI technologies will help us reduce
a huge burden that exists in the enterprise today. That's it; hopefully you enjoyed it.