
Breaking AI's 1-GHz Barrier: Sunny Madra (Groq)


Chapters

0:00 Intro
1:07 The speed of innovation
1:38 Groq's progress
2:05 State-of-the-art models
3:55 Human super intelligence
8:01 The art of the possible
10:33 Advanced virtual assistance
13:31 Complex decision making
18:25 Personalized learning

Whisper Transcript

00:00:00.000 | what we really wanted to pay homage to today is that just 25 years ago we crossed the
00:00:19.980 | one gigahertz speed barrier in microprocessors what's really crazy is when we started
00:00:26.700 | thinking about this talk i actually thought it happened well before 1999 and i just kind of
00:00:31.820 | remember my own arc of getting involved with computers but it really was 1999 i had to
00:00:38.240 | double and triple check it this is the exact press release from when intel broke the one gigahertz
00:00:43.940 | speed barrier and obviously that was interesting from a couple of perspectives one it was
00:00:49.440 | this really big number and moment but two it was really after this that intel
00:00:55.220 | started to change how they thought about the way processors would be used and they moved to
00:00:59.300 | multi-core designs and things like that and it's really something that we need to think
00:01:03.860 | about in terms of what's going to happen with llms and really if you go back to the rate of
00:01:10.640 | increase it only took about two decades to get three orders of magnitude of speed improvement
00:01:16.800 | in microprocessors so if we take a step back now and look at where we are with llms and we think about
00:01:23.500 | anything close to that speed of innovation and in fact what we hear a lot of people talk about
00:01:28.780 | including jensen is that we're beyond the curve of moore's law so we're actually
00:01:34.140 | innovating even faster than that in llms today just to look at what we've been able to do
00:01:41.580 | at groq in just a short amount of time this is between april and june of this year
00:01:47.980 | we were able to increase the speed of llama 3 8b by over 50 percent so the improvements that are
00:01:56.060 | happening in this area are really quick and super exciting and we're really keen to
00:02:02.540 | dive into what could happen here and so let's think about the state of the art
00:02:07.900 | right and so there are models today that we can run and others can run that
00:02:14.300 | process huge inputs on the order of 10 000 input tokens per second which gets you down to
00:02:19.980 | maybe a third of a second to process all of those and when you do that you actually end up
00:02:25.340 | with capabilities from a speed perspective that far exceed human capabilities for
00:02:31.500 | both integrating and analyzing information and it's happening really fast the example
00:02:38.460 | i like to talk about here and i don't know if you've used this but i highly recommend it is this
00:02:44.860 | really cool service called globe.engineer what it does is you give it a task so
00:02:51.980 | in the example i used here i said help me plan a trip to new york to try
00:02:56.380 | the best pizza or something like that and what it does and i couldn't even capture
00:03:01.100 | the whole screen here is basically figure out all the different elements that have to happen and
00:03:05.660 | it's doing this live online it's connected to the internet so everything from the flights to the taxi
00:03:11.740 | options to the hotel options and then the food options and then the itinerary and how i can do it and
00:03:17.580 | it does it all in maybe less than five seconds and if you think about
00:03:21.820 | what's really happening there and i like to think about when i try to plan trips myself
00:03:28.220 | i end up opening tens and sometimes even hundreds of tabs and those tabs each
00:03:34.380 | have a research stream happening for me and now all of that is solved in
00:03:39.420 | a simple interface really enabled by these llms being able to one process input tokens
00:03:46.300 | faster and two output tokens faster and it's really giving us a huge edge in
00:03:51.660 | how we operate as humans so where does this all go if we start thinking about
00:03:57.740 | human super intelligence and optimizing and accelerating models it really takes us to
00:04:05.100 | interesting paradigms and we'll talk about this more in a second but
00:04:09.820 | the high-level way to think about it is what if an llm really becomes either an operating system
00:04:17.020 | or the core of how we think about compute today and we can think about it completely
00:04:22.620 | differently than any of the approaches that we've had before the way we program these things
00:04:27.980 | what our expectations are and how they analyze things and that's interesting
00:04:33.740 | in terms of where this is going with super intelligence setting agi aside and thinking more
00:04:39.900 | about changing the paradigm from where we are today and the thing that crosses my mind here
00:04:46.060 | is what happened in the industrial revolution if we think about three industries let's think
00:04:51.500 | about making food making cars and making clothes all of those before the industrial revolution
00:04:57.900 | were bespoke right so you'd have people that would make one or two cars a day you'd have
00:05:02.460 | people working on farms who could maybe farm enough not for a city but for a small village or someone
00:05:08.060 | who was making sweaters could make maybe one a day or even one a week and
00:05:13.020 | when the industrial revolution showed up we suddenly had the ability to make hundreds or
00:05:17.580 | thousands of cars a day farming at national scale and clothing that could be made at
00:05:23.180 | national scale and we haven't had that in technology the arc of technology has been
00:05:31.020 | and this isn't my own framework it comes from paul maritz who was a long-time microsoft
00:05:36.700 | guy and then at vmware and then pivotal where he and i met he said the first era of
00:05:42.860 | computing was just taking paper processes and making them digital and he says that's evident in the way
00:05:47.900 | the operating system is structured files folders inbox outbox those are all paper
00:05:55.820 | processes that got turned into digital processes the next era for us was basically
00:06:00.940 | making those things connected right that's the internet era and what we've been through now
00:06:05.820 | maybe in the last 15 years is form factor changes right either pushing things into the cloud
00:06:10.620 | for scale or onto mobile so you can do it on your phone but finally with ai we're starting to get to a
00:06:16.060 | place where we have the same kind of industrialization we saw for manufacturing and
00:06:21.500 | physical industries but now for technology so 18 or maybe 24 months ago if you needed
00:06:28.860 | an image made in photoshop for some kind of artifact that you're going to put in a presentation you'd go to
00:06:36.060 | your designer and maybe the designer would make one or two a day for you now you can go to midjourney
00:06:40.380 | and have a thousand made in the next minute if you want to so we're going through that same kind of
00:06:44.380 | industrialization for technology and if we dive in deeper here into where we go
00:06:50.300 | we can get to something like 10 000 complex decisions per second just by getting each decision down to 0.1
00:06:56.140 | milliseconds and then if we really start increasing that it does become viable to
00:07:02.140 | think about the core of our computing becoming an llm and i think this is a real challenge for a lot of
00:07:08.540 | people because we obviously have existing paradigms that we're really locked into
00:07:13.660 | but this paradigm shift is fundamentally different in terms of how software will be built how software
00:07:19.500 | will run and how software will scale and we don't think about it too much today because we think in terms of
00:07:25.260 | the current speed of running llms and their current capabilities but if we can imagine the same
00:07:32.380 | growth that we saw in cpus happening in this era we can imagine the core of these devices changing
00:07:40.860 | and this is again a hat tip to karpathy this is a diagram that he drew but we can
00:07:46.300 | imagine an llm being the core of what happens in video and audio we're starting to see that
00:07:51.420 | today what happens in our browsers how we interact with other llms how we interact with code
00:07:57.180 | interpreters and even our file systems and how we interact with those types of things
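The speed figures quoted in the first two chapters are rate arithmetic, and the minimal Python sketch below spells out the conversions. The function names are illustrative and do not come from the talk.

- A 1,000x speedup over roughly 20 years compounds to about 41% a year, which is a doubling roughly every two years.
- At 0.1 ms per decision, one stream handles 10,000 sequential decisions per second.
- At 10,000 input tokens per second, a third of a second corresponds to roughly 3,300 tokens. The talk does not give a prompt size, so that number is only the implied arithmetic and is an assumption.

```python
import math


def implied_annual_growth(total_speedup: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_speedup over `years`."""
    return total_speedup ** (1.0 / years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(annual_factor)


def rate_per_second(latency_seconds: float) -> float:
    """Operations per second when each one takes latency_seconds and they run one after another."""
    return 1.0 / latency_seconds


def processing_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to read a prompt at a fixed input throughput (ignores output generation)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    g = implied_annual_growth(1_000, 20)  # three orders of magnitude in about two decades
    print(f"1000x over 20 years: {g:.3f}x per year, doubling every {doubling_time_years(g):.1f} years")
    print(f"0.1 ms per decision: {rate_per_second(0.1e-3):,.0f} decisions per second")
    # Assumed prompt size: about 3,333 tokens is what a third of a second at 10,000 tok/s implies.
    print(f"3,333 tokens at 10,000 tok/s: {processing_time_seconds(3_333, 10_000):.2f} s")
```

These are simple reciprocals and compound-growth formulas. Real serving latency also depends on batching, output length and network overhead, so treat the numbers as orders of magnitude only.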
00:08:01.900 | so what is the art of the possible if we start doing this i'll just rattle off
00:08:07.420 | some things here that crossed our minds as we were putting this presentation together
00:08:11.980 | we don't really spend a lot of time thinking about it but many responses today
00:08:17.980 | from llms are near real time they're at roughly reading speed but if we go to instantaneous
00:08:24.780 | responses and decision making everything becomes a lot faster again this is really evident when you think about
00:08:29.740 | something like that globe example i showed what you're really able to do there is take a task
00:08:34.700 | that would probably take you an afternoon or an evening or a number of evenings and have it done in just
00:08:39.500 | a few seconds then there are personalized experiences today we don't really have a
00:08:45.580 | lot of personalized experiences happening we're starting to see elements of it i think openai has
00:08:50.780 | started to launch a number of features that allow it to understand specifics of your world it could be your
00:08:57.100 | pets' names or kids' names or spouse's name but really where i think this goes and a lot of
00:09:02.860 | people push on this including two of my friends bill gurley and brad gerstner
00:09:07.660 | who talk about this a lot on their pod is personalization which they view as the next major frontier
00:09:13.580 | and personalization and speed are going to go hand in hand if we're going to make that work
00:09:18.060 | seamlessly for folks next i think is universal natural language processing so if we
00:09:25.100 | think about our interface to software today we started with point
00:09:31.180 | and click and keyboards and we've gone to touch with our mobile devices but really you
00:09:38.140 | start to see the power of this and i think everyone's been super excited for the release of gpt-4o
00:09:43.660 | and its voice agents i don't think they've fully got there yet but i think they've shown the art of the
00:09:48.620 | possible there with what they were able to do with voice and then there's that kind of mixed interaction
00:09:54.540 | which we refer to as xrx meaning any type of input then reasoning then
00:10:00.060 | any type of output the example i like to give people there is if you're trying to order something
00:10:05.660 | you may want to interact with an agent in voice but you may want to see the responses in text so
00:10:11.180 | think about if you're trying to book your haircut and you say well tell me what times are
00:10:14.780 | available and it tells you well there's 9 a.m. and 11 a.m. and 3:30 and 5:30.
00:10:20.140 | that's hard to remember if it's just coming back to you in voice so you basically want these
00:10:23.660 | interactions to be multimodal which touches on my second point there and i think we're going to
00:10:28.540 | start to see a lot more of those interface changes as well then there's advanced virtual
00:10:35.260 | assistance this is like complex task scheduling i think a lot of what we'll see in the back half of
00:10:40.540 | this year is agents becoming much more complex and a lot of focus from llm
00:10:48.140 | providers as well on making complex tasks something that can be solved it's
00:10:54.300 | interesting today because we generally measure the efficacy of an llm through single-shot results and i think
00:11:01.740 | we do that because of where we started the conversation which is
00:11:05.580 | the performance barrier but if you take any existing llm today and multi-shot it its
00:11:11.020 | scores get a lot better and there were a couple of papers that came out recently that showed if you just
00:11:15.820 | had multiple agents working together on a problem a model with far fewer parameters
00:11:23.260 | can compete with higher-parameter models just by doing multi-shot reasoning or working
00:11:27.660 | together and so i think we'll see a lot more of that as speed improves and i think there's
00:11:32.060 | an incredible amount of optionality there we saw the first cut of
00:11:38.380 | collaborative ai agents with apple intelligence where you see something maybe running on device interacting
00:11:43.980 | with something off device i think it's a very early implementation and i think these things will
00:11:49.100 | get much more sophisticated and better an area we've spent a lot of time on in our careers
00:11:54.540 | is analytics and predictive analytics i think today everything is pretty much action
00:12:00.380 | oriented and derived from a human action so i think if we get to a place where the speed goes up it can
00:12:05.820 | be a lot more predictive what does that really mean it's just an agent that's always running
00:12:09.820 | in the background because the compute cycles are next to free we don't see that today but i think we get
00:12:14.700 | there as we get higher up the curve then there's context awareness as well today we
00:12:20.700 | are generally limited in how much context we can provide and even
00:12:25.420 | with models with bigger context windows we still have to be conscious of how many compute cycles
00:12:30.540 | we're going to use but i think if that becomes next to free it becomes quite powerful for us
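The multi-shot point from the advanced virtual assistance segment can be illustrated with majority voting over several independent answers to the same question. This is one common form of the idea, often called self-consistency.

The talk does not name the papers it refers to, so this is not presented as their method. It is a minimal Python sketch under a few assumptions:

- `ask` is a hypothetical stand-in for a single model call, not a Groq or vendor API.
- Answers are compared after light string normalization.

```python
from collections import Counter
from typing import Callable

# Hypothetical signature for one model call: (question, sample_index) -> answer text.
# Replace with a real client; nothing here is a specific vendor API.
AskFn = Callable[[str, int], str]


def normalize(answer: str) -> str:
    """Light normalization so trivially different strings count as the same vote."""
    return answer.strip().lower()


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and the share of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


def multi_shot(ask: AskFn, question: str, n_samples: int = 5) -> tuple[str, float]:
    """Query the model n_samples times and aggregate the answers by majority vote."""
    answers = [ask(question, i) for i in range(n_samples)]
    return majority_vote(answers)


if __name__ == "__main__":
    # Fake model for illustration only: it answers wrongly on one of five samples.
    def fake_ask(question: str, i: int) -> str:
        return "41" if i == 2 else " 42 "

    print(multi_shot(fake_ask, "What is 6 * 7?"))  # ('42', 0.8)
```

The trade-off is that every question now costs `n_samples` model calls instead of one. That is why this style of reasoning becomes practical only as inference gets faster and cheaper.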
00:12:36.940 | then creative tools and customizable content i'll focus on the second one here this is an area
00:12:44.300 | where i think many of us would like to see things go the example i always like to
00:12:50.140 | give is that one of my favorite shows was seinfeld and obviously it's not on anymore but one of the
00:12:55.580 | things i like to do when i'm bored is go into my llm of choice and have it write a
00:13:01.980 | seinfeld episode made up of modern-day things that are happening and if you ever try that
00:13:07.340 | it's super fun because it does an incredible job of identifying which character in those
00:13:13.420 | scenarios that you give it would have the funny or odd thing happen to them
00:13:17.900 | and so the idea of taking that beyond writing and into multimedia forms is
00:13:23.740 | going to be really powerful going forward next is complex decision making
00:13:29.260 | before our company was acquired by groq we were building a company called
00:13:34.060 | definitive intelligence so we spent a lot of time in this space not only doing
00:13:40.700 | natural language to sql analysis or text-to-sql as a lot of people
00:13:47.500 | would call it but rick who's sitting here with us was working on this
00:13:52.140 | really cool product for us called pioneer which was an automated data science agent that's really meant to
00:13:58.700 | run almost endlessly on a problem you define a kpi if you think about how a
00:14:05.340 | business runs a business has a bunch of kpis and a bunch of data that's coming in
00:14:10.620 | and usually humans are taking that data and analyzing it against those kpis and creating powerpoints and
00:14:15.500 | spreadsheets and telling either senior management or the world how well they're doing well there's no
00:14:20.540 | reason that shouldn't just happen automatically right with an agent constantly
00:14:25.340 | looking at the new data that's coming in asking additional questions diving into it and i think
00:14:29.740 | we had a lot of interesting things emerge we let pioneer loose on a data set of human
00:14:37.980 | workers and their performance reviews and one of the things that we saw was it was able to find
00:14:45.820 | really interesting correlations that we hadn't thought of where depending on your age and
00:14:51.580 | depending on your performance review it really affected your output your productivity
00:14:57.980 | so it was able to discover that if you're of a certain age and you got a certain type
00:15:03.100 | of performance review your productivity would fall off and maybe rick can correct me later if i'm wrong but
00:15:07.500 | it was something along those lines and it was always an interesting example for us and then
00:15:12.140 | obviously there are a lot of really interesting things around dynamic optimization
00:15:18.700 | this is an area we're familiar with from before when a bunch of us were at ford after the
00:15:26.220 | acquisition of autonomic we really saw it in the supply chain if you think about how
00:15:31.580 | cars are produced and how they're shipped there's pretty sophisticated software that does
00:15:37.100 | this but it's still not efficient right and i think the art of the possible with
00:15:42.540 | what we were talking about earlier could be very interesting for some of our old colleagues
00:15:47.420 | at ford i'll touch on a couple more things and then leave a couple minutes for questions if there are
00:15:53.340 | any edge ai and decentralized ai this is pretty cool there's a really cool project called
00:16:00.860 | hyperspace ai and what they're doing is
00:16:08.140 | taking an approach like seti@home or even render where they're basically allowing people to take
00:16:13.900 | their unused gpu compute and make it available in the cloud and why that's
00:16:21.420 | interesting is there are certain use cases that don't necessarily require something to be real time
00:16:25.580 | and so i think we'll see a lot more of that now this intersects really well with us getting more
00:16:31.500 | throughput and lower latency out of existing systems so i think we'll see a lot more of that as
00:16:36.300 | well especially because of the amount of power consumption that's required if you distribute that it could be
00:16:41.420 | really interesting a couple more here are enhanced security and privacy this is a big area you
00:16:48.620 | know i was talking to one of our colleagues last night and he was subject to a really
00:16:56.060 | scary type of i guess maybe phishing call where someone had called in sounded very formal
00:17:03.580 | and had access to a lot of information now we've all seen
00:17:09.340 | people that run scam call centers and people that go and attack them but these folks
00:17:15.020 | armed with ai are much more sophisticated because they can create stories and narratives that are
00:17:19.740 | much deeper than the call center worker of the past and now i think in order to protect against
00:17:26.780 | these systems you'll almost need to have something on your side think
00:17:32.300 | about it with our colleague he was just so confused because the narrative was so good the only
00:17:37.660 | way he could really figure out that this person was a scammer rather than hanging up on them was saying hey
00:17:42.540 | send me some kind of formal message through the hsbc app and then i'll know it's you and
00:17:48.060 | the person wasn't able to do that and so i do think with voice cloning and with more of our
00:17:54.300 | information online we have to be really careful and we will need these protective systems that we can
00:18:00.220 | use and we need them to run incredibly fast and i think this is the last set of them
00:18:06.460 | here education is something that's really important to us broadly at groq
00:18:11.740 | we think about making tokens available more cheaply and more broadly
00:18:16.620 | and about being able to personalize sal khan has a very good ted talk from a couple years ago
00:18:22.060 | where he highlights this it's the two sigma talk he said you can take any student at any level
00:18:27.180 | the highest levels or even someone performing lower and if you give them a personalized tutor they can
00:18:31.980 | improve their test scores by two standard deviations so imagine doing that with
00:18:36.860 | ais that can be one very cheap to use and two personalized to each student's learning
00:18:42.460 | experience i was speaking to someone recently who was building an ai service for home
00:18:49.020 | schooling and what was powerful about that particular service is let's say you have a young
00:18:54.940 | child and they're really into unicorns or ponies and you want to teach them about math so
00:19:00.940 | subtraction addition multiplication it's a lot easier if you frame it in the context of
00:19:05.980 | those things you have three ponies times two unicorns what do you get and
00:19:10.540 | i never thought about that before but customizing learning to the interests of the
00:19:15.500 | person is quite powerful so we'll see more of that and then lastly there's interoperability and
00:19:20.700 | compatibility i think this is an area where if you've ever been in enterprise software you know
00:19:25.660 | the majority of money spent in deploying and maintaining enterprise software is really related
00:19:32.300 | to interconnectivity interoperability and compatibility and so having really
00:19:38.780 | fast and cheap ai technologies will help us reduce a huge burden that exists in the
00:19:45.900 | enterprise today so that's it hopefully you guys enjoyed that
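The "two sigma" figure from the education segment is easier to read as percentiles. The minimal Python sketch below assumes test scores in the comparison group are normally distributed. That assumption is not stated in the talk.

Under that assumption, a student at the median who improves by two standard deviations lands at roughly the 97.7th percentile of the original group. The function name is illustrative.

```python
from statistics import NormalDist

# Standard normal: scores expressed in units of the comparison group's standard deviation.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_after_shift(start_percentile: float, shift_in_sd: float) -> float:
    """Percentile (0 to 1) reached when a score rises by shift_in_sd standard deviations,
    assuming the comparison group's scores are normally distributed."""
    z = STANDARD_NORMAL.inv_cdf(start_percentile)  # starting score as a z-score
    return STANDARD_NORMAL.cdf(z + shift_in_sd)


if __name__ == "__main__":
    # A median student and a lower-performing student, each improved by two standard deviations.
    for start in (0.2, 0.5):
        after = percentile_after_shift(start, 2.0)
        print(f"{start:.0%} percentile -> {after:.1%} percentile")
```

The same two-standard-deviation gain moves students by different numbers of percentile points depending on where they start. That is one reason the effect is usually quoted in standard deviations rather than percentiles.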