Breaking AI's 1-GHz Barrier: Sunny Madra (Groq)

Chapters
0:00 Intro
1:07 The speed of innovation
1:38 Groq's progress
2:05 State-of-the-art models
3:55 Human superintelligence
8:01 The art of the possible
10:33 Advanced virtual assistants
13:31 Complex decision making
18:25 Personalized learning
What we really wanted to pay homage to today is that just 25 years ago we crossed the one-gigahertz speed barrier in microprocessors. What's really crazy is that when we started thinking about this talk, I actually thought it had happened a lot earlier than 1999; I was remembering my own arc of getting involved with computers, but really it was 1999, and I had to double- and triple-check it. This is the exact press release from when Intel broke the one-gigahertz speed barrier. That was interesting from a couple of perspectives: one, it was this really big number and a big moment, but two, it was really after this that Intel started to change how they thought processors would be used, and they went toward multiple cores and things like that. That's really something we need to think about in terms of what's going to happen with LLMs. If you go back to the rate of increase, it only took about two decades to get three orders of magnitude of speed improvement in microprocessors.
in in microprocessors and so if we take a step now and look at where we are with llms and we think about 00:01:23.500 |
anywhere close to the speed of innovation and in fact you know what we hear a lot of people talk about 00:01:28.780 |
um you know including jensen is that we're beyond the sort of curve of moore's law so we're actually 00:01:34.140 |
innovating even faster than that in in llms today um you know just to look at what we've been able to do 00:01:41.580 |
at grok just in a short amount of time uh you know this is between april and june of this year you know 00:01:47.980 |
we were able to increase the speed of llama 3 8b by over 50 percent and so uh the improvements that are 00:01:56.060 |
happening in this area are really really quick and and super exciting and we're really kind of keen to 00:02:02.540 |
So let's think about the state of the art. There are models today that we, and others, can run on huge inputs at the equivalent of, say, 10,000 input tokens per second, which gets you down to roughly a third of a second to process all of them. When you do that, you end up with capabilities that, from a speed perspective, far exceed human capabilities for both integrating and analyzing information, and it's happening really, really fast.
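To make that throughput arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The 3,300-token prompt size is just the value implied by "a third of a second at 10,000 tokens per second", not a number quoted in the talk, and the function name is mine:

```python
def prompt_ingest_time(prompt_tokens: int, input_tokens_per_sec: float) -> float:
    """Seconds needed just to read (prefill) a prompt at a given input rate."""
    return prompt_tokens / input_tokens_per_sec

print(prompt_ingest_time(3_300, 10_000))  # ~0.33 s: the "third of a second" figure
print(prompt_ingest_time(3_300, 500))     # 6.6 s on a much slower stack
```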
The example I like to talk about here is a really cool service called globe.engineer; I don't know if you've used it, but I highly recommend it. You give it a task; the example I use here was something like "help me plan a trip to New York to try the best pizza." I couldn't even capture the whole screen here, but it basically figures out all the different elements that have to happen, and it's doing this live, connected to the internet: everything from the flights to the taxi options to the hotel options, then the food options, and then an itinerary for how I can do it. It does it all in maybe less than five seconds. Think about what's really happening there: when I try to plan trips myself, I end up opening tens or sometimes even hundreds of tabs, and each of those tabs is a research stream happening for me. Now all of that is solved in a simple interface, really enabled by these LLMs being able to, one, process input tokens faster, and then ultimately output tokens faster. It's giving us a huge leg up in how we operate as humans.
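globe.engineer's internals aren't public, so the following is only a sketch of the general pattern described above: ask a model to decompose the task into research streams, then run those streams concurrently against a fast inference endpoint. The llm() helper is a placeholder that returns canned text so the example runs on its own:

```python
import asyncio

async def llm(prompt: str) -> str:
    # Placeholder for a call to a fast inference endpoint; returns canned text
    # so the sketch runs on its own.
    await asyncio.sleep(0.05)
    return f"[model output for: {prompt[:48]}...]"

async def plan_trip(task: str) -> dict[str, str]:
    # 1. Ask the model to decompose the task into independent research streams.
    decomposition = await llm(f"List the research sub-tasks needed to: {task}, one per line")
    streams = [line.strip() for line in decomposition.splitlines() if line.strip()]
    # 2. Research every stream concurrently; with fast enough inference the whole
    #    fan-out (flights, hotels, restaurants, itinerary) takes seconds rather
    #    than an afternoon of open browser tabs.
    results = await asyncio.gather(*(llm(f"Research and summarize: {s}") for s in streams))
    return dict(zip(streams, results))

print(asyncio.run(plan_trip("Plan a trip to New York to try the best pizza")))
```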
So where does this all go? If we start thinking about human superintelligence, and about optimizing and accelerating models, it takes us to some interesting paradigms. We'll talk about this more in a second, but the high-level way to think about it is: what if an LLM really becomes either an operating system, or the core of how we think about compute today, and we can think about it completely differently from any of the approaches we've had before, in the way we program these things, the way our expectations are set, and how they analyze things? That's what's interesting in terms of where this is going: superintelligence, while staying away from AGI, and more about changing the paradigm from where we are today.
The thing that crosses my mind here is what happened in the Industrial Revolution. Think about three industries: making food, making cars, and making clothes. All of those, before the Industrial Revolution, were bespoke. You'd have people who would make one or two cars a day; you'd have people working on farms that could feed maybe less than a city, even a small village; someone making sweaters could make one a day, or maybe even one a week. When the Industrial Revolution showed up, we suddenly had the ability to make hundreds or thousands of cars a day, to farm food at a national scale, and to make clothing at a national scale. And we really haven't had that in technology yet.
The arc of technology (and this isn't my own framework; it comes from Paul Maritz, a long-time Microsoft guy, then VMware, then Pivotal, where he and I met) is that the first era of computing was just taking paper processes and making them digital. That's evident in the way the operating system is structured: files, folders, inbox, outbox. Those are all paper processes that got turned into digital processes. The next era was making those things connected; that's the internet era. What we've been through in maybe the last 15 years is form-factor changes: either pushing things into the cloud for scale, or onto mobile so you can do it on your phone. But finally, with AI, we're starting to get to a place where we have industrialization in the same way we saw it for those manufacturing and physical industries; we see it for technology. Eighteen or maybe 24 months ago, if you needed a Photoshopped artifact of some kind to put in a presentation, you'd go to your designer, and maybe the designer would make one or two a day for you. Now you can go to Midjourney and have a thousand made in the next minute if you want to.
So we're going through that same kind of industrialization for technology. And if we dive deeper into where this goes: we can get to something like 10,000 complex decisions per second just by getting per-decision latency down to 0.1 milliseconds (1 second / 0.0001 seconds = 10,000). If we really keep increasing that, it becomes viable to think about the core of our computing becoming an LLM. I think this is a real challenge for a lot of people, because obviously we have existing paradigms that we're really locked into, but this paradigm shift is fundamentally different in terms of how software will be built, how software will run, and how software will scale. We don't think about it much today, because we think about the speed associated with running LLMs and their current capabilities. But if we can imagine the same growth we saw in CPUs happening in this era, we can imagine the core of these devices changing to become something else. This is again a hat tip to Karpathy (this is a diagram that he drew): we can imagine an LLM being the core of what happens in video and audio (we're starting to see that today), what happens in our browsers, how we interact with other LLMs, how we interact with code interpreters, and even how we interact with our file systems.
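As a rough illustration of that "LLM as the core" idea (in the spirit of Karpathy's diagram, not a description of any real system), here is a toy kernel loop where the model decides which peripheral to call next; the tool names, the llm_decide placeholder, and the dispatch format are all invented for this sketch:

```python
from typing import Callable

# Peripherals the "kernel" can route work to; each is a stand-in for a real subsystem.
TOOLS: dict[str, Callable[[str], str]] = {
    "browser":     lambda arg: f"<fetched {arg}>",        # stand-in for a web fetch
    "interpreter": lambda arg: str(eval(arg, {}, {})),    # stand-in for a sandboxed code run
    "filesystem":  lambda arg: f"<read file {arg}>",      # stand-in for file access
}

def llm_decide(state: str) -> tuple[str, str]:
    # Placeholder decision step: a real system would ask the model which
    # peripheral to call next and with what argument.
    return ("interpreter", "2 + 2")

def kernel_step(state: str) -> str:
    tool, arg = llm_decide(state)
    return TOOLS[tool](arg)

print(kernel_step("user: what is 2 + 2?"))  # -> "4"
```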
So what is the art of the possible if we start doing this? I'll rattle off some things that crossed our minds as we were putting this presentation together. We really don't spend a lot of time thinking about it, but many responses from LLMs today are only near real time, at roughly reading speed. If we get to instantaneous responses and decision making, this becomes a lot faster again. That's really evident in the globe example I showed: what you're able to do there is take a task that would probably take you an afternoon, an evening, or a number of evenings, and it's done for you in just a few seconds.
Then there are personalized experiences. Today we don't really have a lot of personalized experiences happening; we're starting to see elements of it, and I think OpenAI has started to launch a number of features that allow it to understand specifics of your world, whether that's your pets' names, your kids' names, or your spouse's name. But a lot of people push on where this really goes: two of my friends, Bill Gurley and Brad Gerstner, talk about it a lot on their pod, and they view personalization as the next major frontier. Personalization and speed are going to go hand in hand if we're going to make that work seamlessly for folks.
Next, I think, is a kind of universal natural-language interface. If we think about our interface to software today, we started with point and click and keyboards, and we've gone to touch with our mobile devices, but you really start to see the power of this with voice. I think everyone has been super excited about the release of GPT-4o and its voice agents; I don't think we've fully gotten there yet, but I think it showed the art of the possible with what they were able to do with voice. Then there's the kind of mixed interaction we refer to as "xRx": any type of input, reasoning, and any type of output. The example I like to give is ordering something: you may want to interact with an agent by voice but see the responses in text. Think about booking a haircut. You ask what times are available, and it tells you there's 9 a.m., 11 a.m., 3:30, and 5:30. That's hard to remember if it's only coming back to you in voice, so you want these interactions to be multimodal, which touches on my second point as well. I think we're going to start to see a lot more of those interface changes.
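A minimal sketch of that "any input, reasoning, any output" idea: the reasoning step stays modality-agnostic and a small policy picks how to render the answer. The speak/show_as_text helpers and the structured flag are assumptions made up for illustration, not part of any product mentioned in the talk:

```python
def show_as_text(text: str) -> None:
    print(f"[screen] {text}")

def speak(text: str) -> None:
    print(f"[voice] {text}")

def render(answer: str, structured: bool) -> None:
    # Structured payloads (lists of times, prices, tables) are easier to read
    # than to remember from audio; short confirmations are fine as voice.
    if structured:
        show_as_text(answer)
    else:
        speak(answer)

# A voice request comes in, but the list of slots goes to the screen:
render("Available slots: 9:00, 11:00, 15:30, 17:30", structured=True)
render("You're booked for 15:30.", structured=False)
```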
Advanced virtual assistants: this is things like complex task scheduling. I think a lot of what we'll see in just the back half of this year is agents starting to become much more complex, and a lot of focus from LLM providers on making complex tasks something that gets solved. It's interesting, because today we generally measure the efficacy of an LLM through single-shot evaluation, and I think we do that because of the performance barrier we talked about at the start of the conversation. But if you take any existing LLM today and multi-shot it, its scores get a lot better, and a couple of papers came out recently showing that if you have multiple agents working together on a problem, a model with far fewer parameters can compete with higher-parameter models just by doing multi-shot reasoning or working together. So I think we'll see a lot more of that as speed improves, and there's incredible optionality there.
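Here is a minimal sketch of why multi-shot helps, using a toy model that is right 70 percent of the time and a simple majority vote; sample_answer is a stand-in for calling a real model with temperature above zero, and the accuracy numbers come from this toy setup, not from the papers mentioned above:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # Placeholder: imagine a small model that answers correctly ~70% of the time.
    return "correct" if random.random() < 0.7 else "wrong"

def multi_shot(question: str, n: int = 9) -> str:
    # Sample n independent answers and keep the majority.
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]

# A 70%-accurate single shot becomes roughly 90% accurate with 9 votes, which is
# why cheap, fast inference can make smaller models competitive.
wins = sum(multi_shot("toy question") == "correct" for _ in range(1000))
print(wins / 1000)
```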
We saw what I think is the first cut of collaborative AI agents with Apple Intelligence, where you see something running on device interacting with something off device. I think it's a very early implementation, and these things will get much more sophisticated and better. An area we've spent a lot of time on in our careers is analytics and predictive analytics. Today everything is pretty much action-oriented and derived from a human action; if we get to a place where the speed goes up, it can be a lot more predictive. What does that really mean? It's just an agent that's always running in the background, because the compute cycles are next to free. We don't see that today, but I think we get there as we get higher up the curve. Context awareness as well: today we're generally limited in how much context we can provide, and even with models with bigger context windows we still have to be conscious of how many compute cycles we're going to use. If that becomes next to free, it becomes quite powerful for us.
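A small sketch of that compute-budget tension: even when the model's context window is large, every extra token costs prefill compute, so a practical system trims history to a token budget. The 4-characters-per-token estimate is a rough heuristic, not a real tokenizer:

```python
def rough_token_count(text: str) -> int:
    # Very rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    # Keep the most recent messages that fit under the token budget;
    # the oldest ones get dropped first.
    kept, used = [], 0
    for msg in reversed(messages):
        cost = rough_token_count(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["system prompt ...", "old chit-chat ...", "the question that matters ..."]
print(trim_history(history, budget_tokens=10))  # only the most recent message fits
```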
Creative tools and customizable content: I'll focus on the second one here, because this is an area where I think many of us would like to see things go. The example I always give: one of my favorite shows was Seinfeld, and obviously it's not on anymore, but one of the things I like to do when I'm bored is go into my LLM of choice and have it write a Seinfeld episode made up of modern-day things that are happening. If you ever try that, it's super fun, because it does an incredible job of identifying which character in the scenarios you give it would have the funny or odd thing happen to them. The idea of taking that beyond writing and into multimedia forms is going to be really powerful going forward.
Complex decision making: before our company was acquired by Groq, we were building a company called Definitive Intelligence, so we spent a lot of time in this space, not only doing natural-language analysis over SQL ("text-to-SQL", as a lot of people would call it), but also on a really cool product that Rick, who's sitting here with us, was building for us called Pioneer, an automated data-science agent meant to run almost endlessly on a problem. You define a KPI: if you think about how a business runs, a business has a bunch of KPIs and a bunch of data coming in, and usually humans are taking that data, analyzing it against the KPIs, and creating PowerPoints and spreadsheets to tell either senior management or the world how well they're doing. There's no reason that shouldn't just happen automatically, with an agent constantly looking at the new data coming in, asking additional questions, and diving into it.
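Pioneer itself isn't described in detail here, so this is only the general shape of such an always-on analysis agent: poll for new data, recompute a KPI, and have the model dig in when the KPI drifts. The data feed, the KPI, and ask_model are all placeholders:

```python
import time

def fetch_new_rows() -> list[dict]:
    # Placeholder data feed; a real agent would pull from a warehouse or stream.
    return [{"orders": 120, "refunds": 9}]

def kpi(rows: list[dict]) -> float:
    # Example KPI: refund rate.
    return sum(r["refunds"] for r in rows) / max(1, sum(r["orders"] for r in rows))

def ask_model(prompt: str) -> str:
    return f"[model analysis of: {prompt}]"  # placeholder LLM call

def watch(threshold: float = 0.05, poll_seconds: float = 60.0, max_cycles: int = 3) -> None:
    for _ in range(max_cycles):  # bounded here; "almost endless" in spirit
        rows = fetch_new_rows()
        value = kpi(rows)
        if value > threshold:
            # The agent digs in on its own instead of waiting for a human to ask.
            print(ask_model(f"Refund rate is {value:.1%}; which segments drive it?"))
        time.sleep(poll_seconds)

watch(poll_seconds=0.0)  # run the loop quickly for the sketch
```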
We saw a lot of interesting things emerge. We let Pioneer loose on a dataset of human workers and their performance reviews, and one of the things it was able to do was correlate things we hadn't thought about: depending on your age and depending on your performance review, it really affected your output, your productivity. It was able to discover that if you're of a certain age and you get a certain type of performance review, your productivity falls off. Maybe Rick can correct me later if I'm wrong, but it was something along those lines, and it was always an interesting example for us. Then there are a lot of really interesting things around dynamic optimization. This is an area we're familiar with from before: when a bunch of us were at Ford after the acquisition of Autonomic, we really saw, for the supply chain, if you think about how cars are produced and how they're shipped, that there's pretty sophisticated software doing this, but it's still not efficient. I think the art of the possible with what we were talking about earlier could be very interesting for some of our old colleagues at Ford. I'll touch on a couple more things and then leave a couple of minutes for questions, if there are any.
Edge AI and decentralized AI: this is pretty cool. There's a really cool project called Hyperspace where, a bit like SETI@home or even Render, they're allowing people to take their unused GPU compute and make it available in the cloud. Why that's interesting is that there are certain use cases that don't necessarily require something to be real time, so I think we'll see a lot more of that. It intersects really well with getting more throughput and lower latency out of existing systems, so I think we'll see a lot more of it, especially given the amount of power consumption that's required; if you distribute that, it could be really interesting.
A couple more here. Enhanced security and privacy: this is a big area. I was talking to one of our colleagues last night, and he was subject to a really scary kind of phishing call, where someone had called in, sounded very formal, and had access to a lot of information. We've all seen the people who run scam call centers, and the people who go and attack them, but these folks armed with AI are much more sophisticated, because they can create stories and narratives that are much deeper than those of the call-center worker of the past. I think that in order to protect against these systems you'll almost need to have something on your side. With our colleague, the narrative was so good that he was completely confused, and the only way he could figure out the person was a scammer, rather than just hanging up on them, was to say, "Send me some kind of formal message through the HSBC app, and then I'll know it's you." The person wasn't able to do that. So I do think that as voice cloning improves and as more of our information is online, we have to be really careful, and we will need these protective systems that we can use, and we need them to run incredibly fast.
And I think this is the last set here. Education is really important to us broadly at Groq; we think about making tokens available more cheaply and more broadly, and about being able to personalize. Sal Khan has a very good TED talk from a couple of years ago, the "two sigma" talk, where he highlights that you can take any student at any level, from the highest performers to someone performing lower, and if you give them a personalized tutor they can improve their test scores by two standard deviations. So imagine doing that with AIs that are, one, very cheap to use, and two, able to be personalized to the student's learning experience. I was speaking to someone recently who was building an AI service for homeschooling, and what was powerful about that particular service is this: say you have a young child who's really into unicorns or ponies, and you want to teach them math, subtraction, addition, multiplication. It's a lot easier if you frame it in the context of those things: you have three ponies times two unicorns, what do you get? I'd never thought about that before, but customizing learning around the interests of the person is quite powerful, so we'll see more of that.
Then there's interoperability and compatibility. If you've ever been in enterprise software, you know the majority of the money spent deploying and maintaining enterprise software is really related to interconnectivity, interoperability, and compatibility, so having really fast and cheap AI technologies will help us reduce a huge burden that exists in the enterprise today. That's it; hopefully you guys enjoyed it.