Latent Space LIVE! - Best of 2024: Startups, Vision, Open Src, Reasoning, & The Great Scaling Debate

Chapters
20:52 Conviction Startups Overview
78:44 Best of Vision 2024 (Roboflow x Moondream)
322:59 Loubna (HF) synthetic data and smol models
457:40 The scaling/wall debate w/ Dylan Patel
465:26 Opening statements
00:03:35.700 |
i need to share audio yeah because because i'm not sharing my screen right 00:03:42.700 |
but so for the mic she's going to do it oh you don't need to 00:04:08.620 |
so all the mics and like the audio from this room we're going to zoom yeah 00:04:26.620 |
set up okay um can you take them yeah you just have to mute your yeah 00:04:37.260 |
i might need to share your audio like if i present her yeah you can go but i'm just 00:04:47.340 |
muting the music yeah we just need that yeah okay over there yeah yes um 00:05:15.980 |
actually i don't know what else um i guess um yeah 00:06:45.820 |
i mean the same pattern we're gonna we're gonna sleep 00:07:13.100 |
um um yeah no you can't make it either two investors yeah 00:07:43.420 |
but uh yeah that's what the north is um they have a great terms um i didn't know 00:11:07.900 |
made plans last night it's great yeah i actually realized we should probably hire a designer 00:11:12.620 |
the weird thing is you have no idea like how many people are having trouble finding this place 00:11:25.820 |
versus so many people like this like waking up late yes well it's okay but we're recording the 00:11:31.980 |
whole thing when you said 500 i was imagining exactly just while they're going uh 00:12:07.820 |
okay um you can just plug in here and i'll drop you the zoom link 00:12:21.820 |
so we stream from zoom straight to youtube but we're also recording separately for the podcast 00:12:48.700 |
no that's not the link ignore the thing i just said 00:12:57.020 |
yeah we need we need to show them a little harder ah 00:13:04.780 |
um so yeah it should be good for zoom yeah um do we need to send you a laptop there no that's 00:13:13.340 |
great i i use my boots from pisa okay awesome 00:13:25.580 |
we think so can you can you hear anything i'm not sure 00:13:29.180 |
well there's like a slight delay but if i'm talking here it should show up there in like 00:13:36.940 |
10 seconds yeah okay oh one more thing for these mics yeah just make sure they use it so that it 00:13:44.860 |
goes into zoom yeah this is on and then yeah we're using this manually but also 00:14:01.840 |
you take it as well yeah uh all right i'm gonna wire you both up and now by the way oh i'm sorry 00:14:13.580 |
i feel like for sarah we need to give her a laugh 00:14:49.420 |
oh is that it that's it there's a puppy thing but we're indoors so we don't need that 00:14:55.900 |
wait that's so good i like your shirt oh thank you 00:15:06.060 |
yes he sent me a photo of this place and i was like we have to do this 00:15:20.060 |
i wanted to talk where the way they set up the conference there's a rotating platform 00:15:33.260 |
the center it's like a stadium like thing it's like i don't know look at this is not intense 00:15:38.860 |
yeah that's a really good bit yeah it's terrible but you should think about doing it next time 00:15:48.460 |
yeah you must be having a dinner office we just need a platform and then like 00:16:02.620 |
like we have a lot of people on youtube i don't know how many people 00:16:29.740 |
i i just have all the um openai jokes that i've warmed in my head 00:16:37.180 |
um like how does uh how does rudolf update like yeah exactly i know i know i like that one too 00:16:53.580 |
with 40 people online all right i am ready to transfer over to you 00:17:57.420 |
this sounds good i think that looks good actually yeah um and then on the top of it there's a button 00:18:06.780 |
if you press it now um then it'll start flashing red and that's the record and it's not broadcasting 00:18:11.820 |
it's just recording like that so if you push then okay there you go 00:18:19.180 |
okay they're figuring out the suit okay so you want me to dial it into the sim uh yeah i think 00:18:41.100 |
i sent it to you yeah i'll check text or email i'll text 00:19:10.140 |
screen so that plugs in yeah oh share screen too 00:19:14.060 |
so right now it's just pinning wow not camera yeah so we can tweak it 00:19:27.980 |
messing with your computer settings there we go no it's just you know 00:19:36.300 |
standard presenter issues uh this goes into stream and you're also mic'd up 00:19:42.780 |
do you have the is it on yeah mine's recording yeah nice 00:20:03.820 |
do we get computer audio as well and do we get audio from the computer too 00:20:08.540 |
okay uh we just have the hn demo but you can do a really good impression of 00:20:16.220 |
i mean well um i think we can just run it in worst case we'll um 00:20:25.740 |
we'll put it in the show notes it's fine yeah okay 00:20:32.940 |
yeah do you do you want to start by saying anything yeah i think you should probably yeah 00:20:37.740 |
okay i've been so busy with logistics and stuff that um i haven't done okay um i think we're 00:20:46.460 |
going to kick this off um thanks to everyone who made it early morning um it's like really 00:20:51.580 |
weird experiments that we wanted to try because one we saw this space uh and but two also i've 00:20:56.620 |
been to a number of these things now and um i always felt like there was not enough like 00:21:01.100 |
industry content for for people and we wanted an opportunity while everyone is in town in like one 00:21:06.780 |
central spot to get everyone together um to talk about the best stuff of the year review the year 00:21:11.980 |
it's very nice that neurips is always at the end of the year um and so i'm very honored that uh 00:21:17.420 |
sarah and pranav have agreed to help us kick this off um sarah i've known for i was actually 00:21:23.100 |
counting 17 years um and but she's she's gone she's uh been enormously successful as an ai 00:21:32.060 |
investor um even uh even back in your greylock days i was tracking your your investing 00:21:37.020 |
and it's uh it's come a long way since then um and pranav uh i i've known i've known uh shorter 00:21:42.940 |
but he's also starting to write uh really incredible posts and opinions about what he's 00:21:47.100 |
seeing as an investor so i wanted to kick this off at the industry session um we have a great day of 00:21:52.380 |
sort of like best of year recaps uh lined up i think vik is here as well um and uh and the 00:21:59.580 |
roboflow guys so uh i would just let you kick it off thank you hi everyone uh my name is 00:22:09.180 |
sarah guo and thanks to uh sean and friends here for having me and pranav so um i'd start by just 00:22:16.860 |
giving 30 seconds of intro i promise this isn't an ad uh we started a venture fund called conviction 00:22:22.140 |
about two years ago here is a set of the investments we've made uh they range from 00:22:27.580 |
companies at the infrastructure level in terms of feeding the revolution to foundation model 00:22:34.380 |
companies alternative architectures domain specific training efforts and of course applications 00:22:39.420 |
and the premise of the fund sean mentioned i worked at greylock for about a decade before 00:22:45.100 |
that and came from the product engineering side was that uh we we thought that there was a really 00:22:50.700 |
interesting technical revolution happening uh that it would probably be the biggest change in 00:22:55.580 |
how people use technology in our lifetimes and that represented huge economic opportunity 00:23:00.380 |
and and maybe that there would be an advantage versus the incumbent venture firms in that when 00:23:06.060 |
the floor is lava the dynamics of the markets change the types of products and founders that 00:23:11.020 |
you back change uh it's a lot for existing firms to ingest and a lot of their mental models may not 00:23:17.580 |
apply in the same way uh and so there was an opportunity for first principles thinking and 00:23:22.060 |
if we were right would we do really well and get to work with amazing people and so we are 00:23:25.980 |
two years into that journey and we can share some of the opinions and predictions we have with all 00:23:29.820 |
of you um sorry i'm just making sure that isn't actually blocking the whole presentation uh i'm 00:23:38.380 |
proud it's going to start us off um so quick agenda for today we'll cover some of the model 00:23:43.580 |
landscapes and themes that we've seen in 2024 uh what we think is happening in ai startups and then 00:23:48.220 |
some of our latent priors uh on what we think is working in investing so the um i thought it'd be 00:23:54.780 |
useful to start from like what was happening at neurips last year in december 2023 so in october 00:24:00.540 |
2023 openai had just launched the ability to upload images to chat gpt which means up until 00:24:05.100 |
that moment it's hard to believe but like roughly a year ago you could only input text and get text 00:24:08.940 |
out of chat gpt um the mistral folks had just launched the mixtral model right before the 00:24:14.380 |
beginning of neurips google had just announced gemini i very genuinely forgot about the existence 00:24:19.260 |
of bard before making these slides and europe had just announced that they were doing their 00:24:24.140 |
first round of ai regulation but not to be their last and when we were thinking about like what's 00:24:29.500 |
changed in 2024 there's at least five themes that we could come up with that feel like they 00:24:33.980 |
were descriptive of of what 2024 has meant for ai and for startups and so we'd start with um first 00:24:40.540 |
it's a much closer race on the foundation model side than it was in 2023 so this is lm arena 00:24:45.900 |
they ask users to rate uh generations from specific prompts so you 00:24:52.140 |
get two responses from two language models answer which one of them is better the way to interpret 00:24:55.820 |
this is like roughly 100 elo difference means that you're preferred two-thirds of the time 00:25:00.140 |
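As a reference for the claim above, the standard Elo expected-score formula is enough to check the "two-thirds" figure; a minimal sketch in Python (this is the textbook Elo formula, not something from the talk):

```python
# Standard Elo expected-score formula: probability that the higher-rated model
# is preferred, given the rating gap between the two models.
def win_probability(elo_gap: float) -> float:
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

print(round(win_probability(100), 2))  # 0.64 -> preferred roughly two-thirds of the time
```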
and a year ago every open ai model was like more than 100 points better than anything else 00:25:05.020 |
and the view from the ground was roughly like open ai is the ibm there is no point in competing 00:25:09.740 |
everyone should just give up go work at open ai or attempt to use open ai models and i think the 00:25:15.020 |
story today is not that i think it would have been unbelievable a year ago if you told people that a 00:25:20.780 |
the best model today on this at least on this eval is not open ai and b that it was google 00:25:26.220 |
would have been pretty unimaginable to the majority of researchers but actually there are a variety of 00:25:32.060 |
of proprietary language model options and some set of open source options that are increasingly 00:25:36.060 |
competitive and this seems true not just on the eval side but also in actual spend so this is 00:25:41.260 |
ramp data there's a bunch of colors but it's actually just open ai and anthropic spend and the 00:25:45.900 |
open ai spend at the end of last year in november of 23 was close to 90 percent of 00:25:50.780 |
total volume and today less than a year later it's closer to 60 percent of total volume which i think 00:25:56.700 |
is indicative both that language models are pretty easy apis to switch out and people are trialing 00:26:01.420 |
a variety of different options to figure out what works best for them related second trend that 00:26:06.700 |
we've noticed is that open source is increasingly competitive so this is from the scale 00:26:11.740 |
leaderboards which is a set of independent evals that are not contaminated and on a number of topics 00:26:17.660 |
that actually the the foundation models clearly care a great deal about open source models are 00:26:21.740 |
pretty good on math instruction following and adversarial robustness the llama model is amongst 00:26:26.620 |
the top three of evaluated models i included the agentic tool use here just to point out that this 00:26:32.060 |
isn't true across the board there are clearly some areas where foundation model companies have 00:26:36.140 |
had more data or more expertise in training against these use cases but models are surprisingly an 00:26:40.940 |
increasing open source models are surprisingly increasingly effective this feels true across 00:26:45.100 |
evals this is the mmlu eval i want to call out two things here one is that it's pretty remarkable 00:26:51.420 |
that the ninth best model just two points behind the best state-of-the-art models is 00:26:56.540 |
actually a 70 billion parameter model i think this would have been surprising to a bunch of people 00:27:00.940 |
where the belief was largely that most intelligence is just an emergent property 00:27:04.860 |
and there's a limit to how much intelligence you can push into smaller form factors in fact a year 00:27:09.340 |
ago the the best small model or under 10 billion parameter model would have been mistral 7b which 00:27:14.140 |
on this eval if memory serves is somewhere around 60 and today that's the llama 8b model which is 00:27:19.660 |
more than 10 points better the the gap between what is state-of-the-art and what you can fit 00:27:23.980 |
into a fairly small uh form factor is actually actually shrinking um and again related the we 00:27:31.340 |
think the price of intelligence has come down substantially this is this is a graph of flagship 00:27:35.180 |
open ai model costs where the cost of the api has come down roughly 80 to 85 percent in call it the last year 00:27:42.300 |
year and a half which is pretty remarkable this isn't just openai too this is also like the full 00:27:47.100 |
set of models this is from artificial analysis which tracks cost per token across a variety of 00:27:51.340 |
different apis and public inference options and like we were doing some math on this if you wanted 00:27:56.140 |
to recreate like the kind of data that a text editor had or that like something like notion 00:28:01.260 |
or coda that's somewhere in the volume of a couple thousand dollars to create that volume of tokens 00:28:06.060 |
that's pretty remarkable and impressive it's clearly not the same distribution of data but 00:28:10.940 |
just as like a sense of scope the there's an enormous volume of data that you can create 00:28:14.940 |
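As a rough sketch of that back-of-envelope claim (the per-token price, budget, and document size below are illustrative assumptions, not figures from the talk):

```python
# Back-of-envelope: how many tokens "a couple thousand dollars" buys at a
# flagship-tier API price, and roughly how many documents that represents.
# All three constants are assumptions for illustration only.
price_per_million_output_tokens_usd = 10.0   # assumed flagship-tier output price
budget_usd = 2_000                           # "a couple thousand dollars"
avg_tokens_per_document = 1_000              # roughly 750 words per document

tokens = budget_usd / price_per_million_output_tokens_usd * 1_000_000
print(f"{tokens:,.0f} tokens")                                 # 200,000,000 tokens
print(f"~{tokens / avg_tokens_per_document:,.0f} documents")   # ~200,000 documents
```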
and then fourth we think new modalities are beginning to work to start quickly with biology 00:28:21.180 |
we're lucky to work with the folks at chai discovery who just released chai-1 which is an 00:28:25.420 |
open source model that outperforms alphafold 3 it's impressive that this is like roughly a year 00:28:29.980 |
of work with a pretty specific data set and then pretty specific technical beliefs but 00:28:33.980 |
models in domains like biology are beginning to work we think that's true on the voice side as 00:28:38.300 |
well point out that there were voice models before things like elevenlabs have existed for a while but 00:28:43.420 |
we think low latency voice is more than just a feature it's actually a net new experience 00:28:47.900 |
interaction using voice mode feels very different than the historical transcription first models 00:28:52.780 |
same thing with many of the cartesia models and then a new nascent use case is execution so claude 00:28:59.340 |
launched computer use openai launched code execution inside of canvas yesterday and then i think devin 00:29:03.980 |
just announced that you can all try it for 500 a month which is pretty remarkable it's a set of 00:29:09.100 |
capabilities that have historically never been available to vast majority of population and i 00:29:13.020 |
think we're still in early innings cognition the company was founded under a year ago first product 00:29:17.340 |
was roughly nine months ago which is pretty impressive if you recall like a year ago the 00:29:23.020 |
point of view on swebench was like it was impossible to surpass what 15 percent or so 00:29:28.780 |
and i think the the whole industry now considers that if not trivial accessible yeah 00:29:34.460 |
um last new modality we wanted to call out although there are many more is video um i took 00:29:40.860 |
the liberty i got early access to sora and managed to sign up before they cut off access so um here 00:29:46.220 |
is my favorite joke in the form of a video hopefully someone here can guess it 00:29:49.740 |
yeah you're telling me a shrimp fried this rice it's a pretty bad joke but i really like it 00:29:58.940 |
and i think this one the next video here is uh one of our portfolio companies heygen that 00:30:05.180 |
translated and does the dubbing for or lip sync and dubbing for live speeches so this is javier 00:30:12.460 |
milei who speaks in spanish but here you will hear him in english if this if this plays um 00:30:18.460 |
and you can see that you can capture the original tonality of of his speech and performance i think 00:30:23.500 |
audio here doesn't work but we'll we'll push something publicly sure um let's give it a shot 00:30:29.260 |
yeah excellent of the western world yeah and you can hear that this captures like his original 00:30:36.700 |
tone uh and like the emotion in his speech which is definitely new and pretty impressive 00:30:41.900 |
from from new models um so the last uh the yeah that makes sense um the last point that we wanted 00:30:50.860 |
to call out is uh the much purported end of scaling i think there is a great debate happening 00:30:55.180 |
here later today on the question of this but we think at minimum it's hard to deny that there are 00:30:59.820 |
at least some limits to the the clear benefits to increasing scale um but there also seems like 00:31:06.220 |
there are new scaling paradigms so the question of test time compute scaling is a pretty interesting 00:31:10.220 |
one it seems like openai has cracked a version of this that works and we think a foundation model 00:31:14.540 |
labs will come up with better ways of doing this and b so far it largely works for very verifiable 00:31:20.700 |
domains things that look like math and physics and maybe secondarily software engineering where 00:31:24.300 |
we can get an objective value function and i think an open question for the next year is going to be 00:31:28.380 |
how we generate those value functions for spaces that are not as well constrained or well defined 00:31:32.220 |
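A minimal sketch of why verifiable domains suit test-time compute scaling: spend more samples and keep whatever an objective checker accepts. This is only an illustration of the general idea, not any lab's actual method; `generate` stands in for a model call and `verify` for the objective value function.

```python
import random
from typing import Callable, Optional

def best_of_n(generate: Callable[[str], str],
              verify: Callable[[str], bool],
              prompt: str,
              n: int = 32) -> Optional[str]:
    """Sample up to n candidate answers; return the first one the verifier accepts."""
    for _ in range(n):
        candidate = generate(prompt)
        if verify(candidate):   # objective check, e.g. unit tests or exact-match answer
            return candidate
    return None                 # raising n (more test-time compute) raises the hit rate

# Toy usage: the "verifier" is an exact-answer check for a math question.
answer = best_of_n(
    generate=lambda p: str(random.randint(40, 45)),  # stand-in for sampling a model
    verify=lambda a: a == "42",
    prompt="what is 6 * 7",
)
print(answer)  # usually "42" with these toy numbers; None if no sample passed
```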
so the question that this leaves us in is like well what does that mean for startups 00:31:37.180 |
and i think a prevailing view has been that we live in an ai bubble there's an enormous amount 00:31:43.260 |
of funding that goes towards ai companies and startups that is largely unjustified based on 00:31:47.100 |
outcomes and what's actually working on the ground and startups are largely raising money 00:31:51.820 |
on hype and so we pulled some pitchbook data and the 2024 number is like probably incomplete since 00:31:56.940 |
not all rounds are being reported and largely suggests like actually there is a substantial 00:32:01.100 |
recovery in funding and maybe 2025 looks something like 2021 but if you break out the numbers here a 00:32:07.260 |
bit more the red is actually just a small number of foundation model labs like what you would think 00:32:11.900 |
of as the largest labs raising money which is upwards of 30 to 40 billion dollars this year 00:32:16.700 |
and so the reality of the funding environment actually seems like much more sane and rational 00:32:21.260 |
it doesn't look like we're headed to a version of 2021 in fact the the foundation model labs 00:32:25.420 |
account for an outsized amount of money being raised but the the set of money going to companies 00:32:31.260 |
that are working seems much more rational and we wanted to give you we can't share numbers for 00:32:35.900 |
every company but this is one of our portfolio companies growing really really quickly um we 00:32:41.100 |
think 0 to 20 and just plg style spending is pretty impressive if any of you are doing better than that 00:32:45.980 |
you should come find us we'd love to chat and so what we wanted to try and center a discussion on 00:32:53.740 |
this is certainly not all of the companies that are making 10 million or more of revenue and growing 00:32:57.500 |
but we took a selection of them and wanted to give you a couple ideas of patterns that we've noticed 00:33:02.300 |
that seem to be working across the board um the first one that we've noticed is like first wave 00:33:07.180 |
service automation so we think there's a large amount of work that doesn't get done at companies 00:33:12.460 |
today either because it is too expensive to hire someone to do it it's too expensive to provide 00:33:17.020 |
them context and enable them to be successful uh at uh at whatever the specific role is or 00:33:22.300 |
it's too hard to manage um those set of people so prescribing it's too expensive to hire those 00:33:26.860 |
specific set of people for sierra and decagon for customer support style companies it's really 00:33:30.620 |
useful to do like next level automation and then there's obviously growth in that and for harvey 00:33:34.380 |
and even up the story is um you can do first wave professional services and then grow beyond that 00:33:41.740 |
second trend that we've noticed is better search new friends so we think that there is a it's pretty 00:33:47.340 |
impressive like how effective text modalities have been so character and replica have been 00:33:51.180 |
remarkably successful companies and there's a whole host of not safe for work chatbots as well 00:33:55.100 |
that are pretty effective at just text generation they're pretty compelling mechanisms and on the 00:33:59.980 |
productivity side perplexity and glean have demonstrated this as well i worked at a search 00:34:03.020 |
company for a while i think the changing paradigms of how people capture and learn information is 00:34:08.220 |
pretty interesting we think it's likely text isn't the last medium their infographics or sets of 00:34:13.020 |
information that seem more useful or sets of engagement that are more engaging um but this 00:34:16.940 |
feels like a pretty interesting place to start oh yeah okay mike so one thing that i've worked on 00:34:26.940 |
investing in in a long time is democratization of different skills be they creative or technical 00:34:32.140 |
this has been an amazing few years for that across different modalities audio video general image 00:34:39.900 |
media text and now code and and really fully functioning applications um one thing that's 00:34:46.620 |
really interesting about the growth driver for all of these companies is the the end users in 00:34:52.220 |
large part are not people that we thought of as we the venture industry you know the royal we 00:34:57.980 |
thought of as important markets before um and so a premise we have as a fund is that there's 00:35:03.660 |
actually much more instinct for creativity visual creativity audio creativity technical creativity 00:35:09.660 |
than like there's latent demand for it and ai applications can really serve that i think in 00:35:15.980 |
particular mid journey was a company that is in the vanguard here and nobody understood for a long 00:35:20.380 |
time because the perhaps outside view is like how many people want to generate images that are not 00:35:27.260 |
easily you know the raster they're not easily editable they can't be used in these professional 00:35:31.180 |
contexts in a complete way and the answer is like an awful lot right for a whole range of use cases 00:35:36.220 |
and i think we'll continue to find that especially as the capabilities improve and we think the the 00:35:41.420 |
range of um uh quality and uh controllability that you can get in these different domains is still 00:35:49.020 |
it's very deep and we're still very early um and then i i think as if if we're in the first or 00:35:55.180 |
second inning of this ai wave one obvious place to go invest and to go build companies is the 00:36:02.220 |
enabling layers right um shorthand for this is obviously compute and data i think the the needs 00:36:08.540 |
for uh data have largely changed now as well you need more expert data you need different forms 00:36:15.340 |
of data we'll talk about that later in terms of who has like let's say reasoning traces in different 00:36:20.380 |
domains that are interesting to companies doing their own training but this is this is an area 00:36:25.980 |
that has seen explosive growth and we continue to invest here um okay so maybe time for some opinions 00:36:32.860 |
there was a prevailing narrative that um you know some part from companies some part from investors 00:36:42.700 |
it's a fun debate uh as to where is the value in the ecosystem and can there be 00:36:47.180 |
opportunities for startups um if you guys remember the phrase gpt rapper it was like the dominant 00:36:52.620 |
phrase in the tech ecosystem for a while of and what it what it represented with this idea that 00:36:58.540 |
there was no value at the application layer you had to do pre-training and then like nobody's 00:37:02.940 |
going to catch open ai in pre-training and you know this isn't this isn't like a a knock on 00:37:08.620 |
open ai at all these these labs have done amazing work enabling the ecosystem and we continue to 00:37:13.420 |
partner with them and and others but um but it's simply untrue as a narrative right the odds are 00:37:21.500 |
clearly in favor of a very rich ecosystem of innovation you have a bunch of choices of models 00:37:27.420 |
that are good at different things you have price competition you have open source uh i think an 00:37:33.340 |
underappreciated impact of test time scaling is you're going to better match user value with your 00:37:39.420 |
spend on compute and so if you are a new company that can figure out how to make these models 00:37:44.460 |
useful to somebody the customer can pay for the compute instead of you taking as a as a startup 00:37:49.420 |
the capex for pre-training or um or rl up front uh and um uh as pranav mentioned you know small 00:37:58.540 |
models especially if you know the domain can be unreasonably effective uh and the product layer 00:38:03.500 |
has if we look at the sort of cluster of companies that we described shown that it is creating and 00:38:09.100 |
capturing value and that it's actually a pretty hard thing to build great products that leverage 00:38:13.020 |
ai um so so broadly like we have a point of view that i think is actually shared by many of the 00:38:19.180 |
labs that the world is full of problems in the last mile to go take even agi into all of those 00:38:26.220 |
use cases is quite long okay another prevailing belief is that um or you know another great debate 00:38:34.060 |
that sean could host is like does the value go to startups or incumbents uh we must admit some 00:38:38.700 |
bias here even though we have you know friends and portfolio former portfolio companies that would 00:38:42.940 |
be considered incumbents now but um uh oh sorry swap swap uh swap views sorry uh you know there 00:38:51.740 |
are there are markets in venture that have been considered traditionally like too hard right like 00:38:57.740 |
just bad markets for the the venture capital spec which is capital efficient rapid growth that's a 00:39:03.900 |
venture backable company um where the end output is a you know a tens of billions of dollars of 00:39:09.900 |
enterprise value company um and and these included areas like legal health care defense pharma 00:39:16.140 |
education um you know any traditional venture firm would say like bad market nobody makes money 00:39:22.300 |
there it's really hard to sell there's no budget etc and and one of the things that's interesting 00:39:26.460 |
is if you look at the cluster of companies that has actually been effective over the past year 00:39:30.700 |
some of them are in these markets that were traditionally non-obvious right and so perhaps 00:39:35.340 |
one of our more optimistic views is that ai is really useful and if you make a capability that 00:39:42.300 |
is novel that is several magnitudes um orders of magnitude cheaper then actually you can change the 00:39:48.620 |
buying pattern and the structure of these markets and maybe the legal industry didn't buy anything 00:39:53.500 |
because it wasn't anything worth buying for a really long time that's one example um we we 00:39:57.660 |
also think that like what was the last great consumer company um maybe it was discord or 00:40:02.620 |
roblox in terms of things that started that have just like really um enormous user basis and 00:40:07.820 |
engagement uh until you know we had these consumer chatbots of different kinds and and like the next 00:40:13.900 |
perhaps the next generation of search as Pranav mentioned we think that the um opportunity for 00:40:20.860 |
social and media generation and games is uh large and new in a totally different way um and and 00:40:27.900 |
finally uh in terms of the markets that we look at uh i think there's broad recognition now that 00:40:33.980 |
you can sell against outcomes and services rather than software spend with ai because you're doing 00:40:39.740 |
work versus just giving people the ability to do a workflow but um if you take that one step further 00:40:45.340 |
we think there's elastic demand for many services right uh our classic example is um there's on 00:40:52.780 |
order of 20 to 25 million professional software developers in the world uh you know i imagine much 00:40:58.700 |
of this audience is technical uh demand for software is not being met right if we take the 00:41:05.740 |
cost of software and high quality software down two orders of magnitude we're just going to end 00:41:10.540 |
up with more software in the world we're not going to end up with fewer people doing development 00:41:14.940 |
at least that's what we would argue um and then finally on the incumbent versus uh startup 00:41:21.820 |
question uh the prevailing narrative is incumbents have the distribution the product surfaces and the 00:41:27.180 |
data don't bother competing with them they're going to create and capture the value and share 00:41:30.860 |
some of it back with their customers i think this is only partially true um they incumbents have the 00:41:35.820 |
distribution they have always had the distribution like the point of the startup is you have to go 00:41:40.060 |
fight with a better product or a more clever product um and maybe a different business model 00:41:45.340 |
to go get new distribution but the specifics around the product surface and the data i think 00:41:50.940 |
are actually worth understanding there's a really strong innovators dilemma if you look at the sas 00:41:55.740 |
companies that are dominant they sell by seat and if i'm doing the work for you i don't necessarily 00:42:01.020 |
want to sell you seats i might actually decrease the number of seats um the decades of 00:42:07.660 |
years and millions of man and woman hours of code that have been written to uh enable a particular 00:42:16.860 |
workflow in crm for example may not matter if i don't want people to do that workflow of filling 00:42:21.900 |
out the database every friday anymore and so i i do think that this sunk cost or the incumbent 00:42:28.060 |
advantage gets highly challenged by new ux and code generation as well and then one disappointing 00:42:34.620 |
learning that we found in our own portfolio is no one has the data we want in many cases 00:42:40.540 |
right so imagine you are trying to automate a specific type of knowledge work uh and what you 00:42:48.380 |
want is the reasoning trace um all of the inputs and the output decision um like that sounds like 00:42:56.220 |
a very useful set of data and the incumbent companies in any given domain they never save 00:43:00.620 |
that data right like they have a database with the outputs some of the time and so i i would say uh 00:43:06.700 |
one of the things that is worth thinking through as a startup is um when an incumbent says they 00:43:12.540 |
have the data like what is the data you actually need to make your product higher quality 00:43:15.660 |
okay so in in summary um you know our shorthand for the set of changes that are happening is 00:43:23.180 |
software 3.0 we think it is a full stack rethinking and it enables um a new generation of 00:43:29.660 |
companies to have a huge advantage the speed of change um favors startups if the floor is lava 00:43:35.500 |
it's really hard to turn a really big ship uh i think that some of the ceos of large companies 00:43:40.460 |
now are incredibly capable but they're still trying to make a hundred thousand people move 00:43:44.380 |
very quickly in a new paradigm um the market opportunities are different right these markets 00:43:49.260 |
that we think are interesting and very large like represent a trillion dollars of value 00:43:53.420 |
are not just the replacement software markets of the last two decades um it's not clear what 00:43:59.500 |
the business model for many of these companies should be uh sierra just started talking about 00:44:03.500 |
charging for outcomes um outcomes based pricing has been this holy grail idea in software and 00:44:08.780 |
it's been very hard but now we do more work um uh there are other business model challenges um 00:44:15.660 |
and so you know our companies they spend a lot more on compute than they have in the past they 00:44:21.020 |
spend a lot with the foundation model providers they think about gross margin uh they think about 00:44:25.660 |
where to get the data uh it's a time where you need to be really creative about product um 00:44:30.220 |
versus just replace the workflows of the past uh and it might require ripping out those workflows 00:44:36.140 |
entirely it's a different development cycle i bet most of the people in this room have written 00:44:41.260 |
evals um and like compared to you know the academic benchmark to a real world eval and said like 00:44:46.860 |
you know that's not it and how do i make a user um understand uh the um non-deterministic nature 00:44:55.580 |
of these outputs or gracefully fail i think that's like a different way to think about product than 00:45:00.220 |
in the past um and we we need to think about infrastructure again right um there was this 00:45:05.420 |
middle period where the cloud providers the hyperscalers took this problem away from software 00:45:10.380 |
developers and it was all just going to be like i don't know front end people at some point and it's 00:45:14.060 |
like we are not there anymore we're back in the hardware era where people are um acquiring and 00:45:18.780 |
managing and optimizing compute and i think that will really matter in terms of capability and 00:45:22.380 |
companies um so uh i guess we'll end with a call to action here and and encourage all of you to 00:45:30.140 |
seize the opportunity um it is the greatest technical and economic opportunity that we've 00:45:35.340 |
ever seen like we made a decade plus career type bet on it and um uh we do a lot of work 00:45:43.580 |
with the foundation model companies uh we think they are doing amazing work and they're great 00:45:48.540 |
partners and even co-investors in some of our efforts but uh i think all of the focus on their 00:45:54.620 |
interesting missions around agi and safety um do not mean that there are not opportunities in other 00:46:00.940 |
parts of the economy the world is very large and we think much of the value will be distributed in 00:46:05.820 |
the world through an unbundling and eventually a re-bundling uh as often happens in technology 00:46:10.940 |
cycles um so we think this is a market that is structurally supportive of startups we're really 00:46:16.060 |
excited to try to work with the more ambitious ones and the theme of 2024 um to us has been like 00:46:23.100 |
well thank goodness this is a this is an ecosystem that is much friendlier to startups than 2023 it 00:46:29.500 |
is what we hoped um and and so uh you know please uh ask those questions and take advantage of the 00:46:35.420 |
opportunity do those things work yeah hello they do work i can kick us off okay so if some of these 00:46:56.860 |
companies um can go from you know 1 to 20 in such a short amount of time do you think that they can 00:47:02.300 |
also disappear in a short amount of time uh i can i can take this one i mean uh i think you've seen 00:47:10.140 |
companies go from zero to 80 million and stall out pretty badly actually um so your data is correct 00:47:17.100 |
um there's gonna be uh there's a set of challenges that um are just the challenges of scale right 00:47:26.060 |
like i think sometimes the revenue numbers in these companies can overstate the maturity of 00:47:30.140 |
the businesses themselves right they need to figure out how to serve customers they need to 00:47:33.580 |
scale their leadership um they need to uh prepare to uh service these customers um with the right 00:47:41.820 |
quality level and you know like the company that we showed that went zero to 20 that company has 00:47:46.540 |
20 people right and they have you know x hundred thousand users is yeah it's very challenging um 00:47:52.300 |
and and so i think there there's a set of good hard problems that these companies will have 00:47:57.340 |
i think part of the like most catchphrases or memes they don't catch on unless there's some 00:48:03.660 |
seed of truth and so there was a set of companies that were described by this term gpt wrapper that 00:48:09.500 |
were not more than a somewhat trivial set of prompts and seo pages that directed people to 00:48:17.660 |
a particular use case and i think that's not uh that's like likely not a durable position as a 00:48:24.140 |
technology company um and and so it's not a very clean answer for you it's a it's a nuanced one but 00:48:30.700 |
some of the value that is represented by this um i'm going to scroll back to it some of this value 00:48:37.660 |
that is represented by this cluster is durable and that's the thing that we are interested in 00:48:42.300 |
um uh the the zero to 20 and the zero to 80 and then collapse it's actually valuable it's just 00:48:50.140 |
not durable right users are voting for it and other people can compete and so you know we kind 00:48:54.780 |
of separate these two questions of like you know which of these companies is defensible um and 00:49:00.220 |
where is the revenue or the usage not a novelty but something that's really important to like 00:49:05.660 |
work or play or communication sean do you want me to take questions or do you want to do it 00:49:14.060 |
yeah well yeah you can do it hi hi um i think my mic oh here it goes so if all of these companies 00:49:22.460 |
need a lot more money and this is the greatest economic opportunity ever uh don't we need much 00:49:28.860 |
bigger venture funds like orders of magnitude bigger and won't the economics of those funds 00:49:33.900 |
be really broken if they're still raising 40 million dollar like gonna invest in a bunch 00:49:37.820 |
of seed company funds okay uh this is a bit of a triggering question for me because i take a 00:49:43.820 |
particular point of view on it um uh hopefully without arrogance we've chosen to raise 00:49:48.540 |
funds that are relatively small um as early stage investors uh and part of it is the the view of um 00:49:55.980 |
like this company that you know this company uh i think they've spent like maybe seven million 00:50:04.460 |
dollars to date right um and so the view that all ai product companies or all ai companies in general 00:50:12.140 |
are very expensive is not true objectively we have we have several companies that are 00:50:16.940 |
um expensive in the traditional sense of saas like we got to go hire a lot of go-to-market people 00:50:22.460 |
and we have to pay them and there's a j curve of that investment before it comes back in 00:50:26.540 |
repeatable saas revenue um uh and you know i think um inference revenue uh we have companies that are 00:50:35.100 |
profitable or break even and have been incredibly efficient and we have companies that spend a lot 00:50:39.580 |
up front and so i think there's a an entire range um our view as a firm is uh that you know very 00:50:48.060 |
early on um my friend elad has a a funny phrase here which is um no gpu before product market fit 00:50:56.060 |
i think that is not always true we have given people gpus before anything right but but there's 00:51:01.980 |
there's a a shred of truth in this which is you can experiment like thank you to the open ai and 00:51:09.180 |
anthropics and um other companies of the world that allow uh great product people to experiment 00:51:14.620 |
at very low cost very incrementally and so i i think much of our portfolio looks like those 00:51:20.060 |
companies where you're going to see what kind of value you can bring to users without spending a 00:51:24.940 |
ton up front um as one example like we just saw um uh new fine tuning interfaces for o1 come out 00:51:33.260 |
the amount of data that you need to in theory improve um those models for a particular domain 00:51:40.300 |
is very small if that pans out like that's incredibly encouraging as well so so i would 00:51:46.780 |
say like i our goal is to work with the most important companies in ai with a relatively 00:51:52.860 |
small fund and i think that um most companies don't actually they don't benefit from a huge 00:51:59.100 |
amount of capital up front um the only thing i would add to that is uh i i think an interesting 00:52:05.740 |
trend is that we work with a number of second time founders whose point of view this time around is 00:52:09.740 |
like we're never going to make the company that big again i think it's not a surprise actually i 00:52:14.540 |
was doing the math in my head and um this rough ratio of a million dollars of revenue per 00:52:19.340 |
employee for an early stage company holds true for like a remarkable number of our companies like 00:52:23.420 |
a number of our companies have more millions in revenue than they do employees and the point of 00:52:28.060 |
view of a bunch of this is like we're going to keep it that way like we're we're not going to 00:52:31.020 |
grow into a giant team uh ai will make us much more efficient and if you believe in the grand 00:52:35.660 |
vision of much of the intellectual labor that we do should actually just be captured by some 00:52:39.980 |
number of models and we can build much more long-term efficient businesses than we have been 00:52:44.060 |
able to historically i do think it's an interesting question because um if we think 00:52:49.180 |
there is this much opportunity like your opportunity doesn't come evenly right and so 00:52:54.460 |
i'd say our investment pacing is higher than i guess mine has been traditionally and uh another 00:53:01.420 |
part of our view is like okay well we want to offer and we want to offer founders a certain 00:53:05.980 |
service level um and you know founders can decide if they want that or not but it is it's very time 00:53:12.140 |
expensive to us we can only work with that many companies we think many more are really interesting 00:53:17.580 |
and that is one of the reasons that pranav and i did this program for the ecosystem called embed 00:53:21.980 |
where we can work with a larger set of companies we own less but we give them you know uh a network 00:53:27.340 |
and some guidance and and it is genuinely because there are more interesting things that we think 00:53:31.420 |
are going to work than we can work on in a traditional um like artisanal venture sense 00:53:36.620 |
and shameless plug applications will open in january 00:53:38.940 |
i think if i press a button so fast oh so fancy cool uh hi thanks for the talk it was awesome 00:53:53.500 |
so i work for a series c enterprise focused company called writer and one of the interesting 00:53:58.380 |
things about the multi-modality thing that we're seeing in the enterprises beyond vision we're not 00:54:03.500 |
actually seeing a lot of like demand for multi-modality like we'll get asked about um audio 00:54:09.500 |
and video stuff but then when we ask like sort of what's the use case it's sort of like i don't know 00:54:14.860 |
and so i'm curious if if you and your um like portfolio companies are are seeing that in the 00:54:21.020 |
enterprise space and if so like what use cases it seems very focused like the multi-modality stuff 00:54:25.260 |
seems great for the consumer level i'm curious if you're seeing anything on the enterprise side 00:54:30.300 |
i think it's a good call out um enterprises the data they have is mostly like it's text it's like 00:54:36.700 |
structured data and some sql data like it's uh um i don't think your average enterprise has that much 00:54:43.020 |
vision video audio data that is that interesting um but i think that will change um like 00:54:50.940 |
maybe it's because i'm like lazy and disorganized but humans are very unstructured like they don't 00:54:57.260 |
want they don't necessarily think in terms of like relational database schema and like hierarchical 00:55:02.700 |
management of their own information uh and i i think there's a future where we take that away 00:55:07.900 |
from people um and um the capture of information that you're going to use for different enterprise 00:55:13.260 |
workflows um uh enables more multi-modal use if that makes sense and so like the sort of obvious 00:55:20.060 |
example would be there are companies from like perhaps a half generation ago like the gongs of 00:55:24.940 |
the world that captured video and found some um keywords and initial insights uh for sales reps 00:55:31.820 |
but the communications within an organization the decisions made um the uh things that people 00:55:40.460 |
create i think there will be much more capture especially of video but um uh making use of it 00:55:48.460 |
requires companies to do that capture um so we kind of require this intermediate step i think 00:55:54.140 |
there's a company in our uh and this is still a prosumer company today as well to your point of 00:55:59.100 |
like you know the consumer prosumer side is ahead of the enterprise but there's a company in our 00:56:03.340 |
last embed batch called highlight that kind of has this premise that like okay well you know 00:56:08.060 |
we're going to use the multi-modality by using on-screen capture that's what this little like 00:56:12.460 |
bubble is on screen and audio capture and i think that um i think it's a powerful idea 00:56:22.140 |
by the way just a quick check uh peter isaac are you here 00:56:28.860 |
uh hi thanks yeah there's sort of like a meme going around that the the price of intelligence 00:56:38.940 |
is going to go to zero um and you can kind of see this with gpt-4o and and with gemini flash 00:56:45.100 |
you can get a million tokens a day which is probably enough for a small company right like 00:56:51.260 |
so i'm curious how as these large companies lose tons of money for market share like how are 00:56:58.540 |
startups going to respond to this like how does that change the market um i think it is impossible 00:57:04.300 |
for anything to be too cheap so i'll start with that um i would also say this company 00:57:09.020 |
with this like awesome revenue chart like i'm pretty sure we paid like five to seven million 00:57:14.460 |
dollars to a uh foundation model provider in this period of time right and so um uh demand is 00:57:21.900 |
like if there was like a secondary theme to this talk demand is elastic in so many ways especially 00:57:26.540 |
for technology and when you make things cheaper we want things to be more intelligent right um and so 00:57:32.940 |
if you make hundreds of calls in order to deliver an output um then suddenly like the fact that the 00:57:39.660 |
cost of calls come down 85% doesn't do you enough uh and so yes it's like an incredibly compelling 00:57:46.380 |
idea of like having intelligence too cheap to meter i'm like maybe this is really old school 00:57:51.340 |
of me but for the last two decades like the internet and compute and software and data 00:57:56.300 |
pipeline like they it still hasn't been cheap enough actually we would do more if it was free 00:58:02.140 |
so uh the other like uh physical barrier that we've run into is um when models are really large 00:58:11.420 |
if you're not going to like quantize and distill and do domain specific things like it's hard to 00:58:16.300 |
run you need a lot of compute just to state the very basics and even with the foundation model 00:58:21.420 |
providers we are seeing people run into inference capacity issues and so um i do not know if this 00:58:27.420 |
is true but uh like one way to read anthropic pricing change is there's not enough capacity 00:58:34.300 |
right uh and so i think like um incredible kudos to the open source ecosystem incredible kudos to 00:58:40.860 |
open ai for like staying on this drumbeat of offering cheaper and cheaper intelligence in 00:58:45.980 |
every generation but uh like we have a companies that are spending a lot of money on um you know 00:58:53.980 |
let's say um search and validation systems with many calls and we think that will continue 00:58:58.940 |
i think you can see that as well in like the the price charts that we had before 00:59:03.580 |
the like o1 pricing is still absurd um it it seems like it actually is gpt-3 pricing 00:59:11.180 |
right yeah but i mean volume of tokens i think um like it is really interesting that 00:59:18.940 |
if you believe like the i mean the the other part of this is like if you look at the test 00:59:22.700 |
time compute scaling um this is it's a log scale like uh it's easy to forget that like that's a lot 00:59:30.060 |
of like historically um like as a result of overtraining a small set of companies took on 00:59:35.180 |
the majority of financial burden for generating high quality models which is you just overtrain 00:59:39.580 |
the shit out of your model and then it's useful for everyone else um if the customer has to pay 00:59:43.740 |
this like that's a lot of money um if you want high quality generation and that means that i pay 00:59:48.940 |
on the order of like thousands of attempts um that's that ends up being pretty expensive 00:59:53.660 |
um question from youtube uh so hi to the youtube audience 00:59:59.100 |
um so we you know you talked about price right price going down uh there's also the other 01:00:05.420 |
dimension of capabilities going up and people always getting steamrolled by open ai so the 01:00:10.380 |
question is what are some specific ways that you've seen companies build to prepare for better models 01:00:14.860 |
like gpt5 or o2 like how do you future proof that um so i i think the like the most common refrain 01:00:22.940 |
from at least openai but i think the the model companies is you should build a company where 01:00:27.340 |
you're excited when you hear that a new model is coming out not anxious um i would have like one 01:00:33.260 |
edit to this which is like in the limit it seems like the majority of things that are worth building 01:00:37.100 |
today are actually i don't know should you hire a sales team at all if if you think that models 01:00:40.540 |
would be perfectly capable um like one framing that i've thought about on this is um you should 01:00:45.500 |
decide like uh how much you believe uh foundation models will improve on like some core learning or 01:00:53.100 |
intelligence capability um and then build your company imagining that on that prediction so 01:00:59.020 |
the like an example here would be um like if you take like i think there's a generation of these 01:01:04.300 |
like copywriting companies that uh were largely subsumed by chat gpt and the the story for many 01:01:10.300 |
of them was the original usage was they understood better than other people how to get the model to 01:01:16.060 |
like learn what my intent was in generating some piece of content some piece of seo content or they 01:01:19.980 |
understood how to ingest information about my business and it's not hard to imagine like the 01:01:23.900 |
next generation of models are just natively better at this like the context length gets longer you can 01:01:28.140 |
stuff more into the context length you can crawl and like learn more about external websites like 01:01:32.860 |
all that is like relatively cheap and so if the the core thesis the company looks like we don't 01:01:37.580 |
think models will be capable of doing that that feels uh likely short-sighted on the other hand 01:01:42.940 |
like there are a number of delivery mechanisms that are like far out of range of what what models 01:01:48.380 |
will do like sarah had a a good example of this which is like there are some businesses where the 01:01:52.940 |
limiting factor is like not actually intelligence like the the limiting factor for a number of 01:01:57.100 |
businesses is like access to a specific set of people or um like i don't know we work with a 01:02:01.740 |
pharmacy services company where like a core question is like long term can you negotiate 01:02:05.340 |
pricing contracts the core issue there isn't intelligence you need some amount of scale and 01:02:08.940 |
then the ability to negotiate contracts um so i think i think many businesses are not exactly 01:02:13.820 |
just a function of your ability to efficiently compute some small set of things i gave this 01:02:18.780 |
presentation um with pranav and i'm like oh i'm so biased it just sounds like startups are gonna 01:02:22.860 |
win everything and i'm um we still there i like to play this game which is what investment decision 01:02:28.860 |
do you regret from the past year it's a really fun game i'm super fun yes um but one of the one of 01:02:34.140 |
the decisions that i regretted was actually um a company that operates in uh uh a space that feels 01:02:43.420 |
very core to perhaps foundation model companies and to hyper scale software players where there's 01:02:50.860 |
tons of ecosystem risk around the company and by the way the people are amazing the metrics were 01:02:56.060 |
amazing we're just like oh they're gonna get crushed and so with everything i said i still 01:03:00.780 |
like overestimated the incumbents like ability to compete and make aggressive strategic decisions 01:03:07.020 |
and so um i i think it's like really hard to overstate how important it is to understand um 01:03:14.460 |
somebody can steamroll you if they focused all of their effort and all their best people 01:03:21.500 |
on a particular area um are they going to right the copywriting example is illustrative because 01:03:28.700 |
it's just not hard to see that understanding the context of a business from its website and from a 01:03:36.460 |
couple documents and by making prompting a little bit easier and adding like some buttons that 01:03:40.540 |
replace some prompts or doing suggested queries like it's just not a lot of work right but there 01:03:46.460 |
are things that are a lot of work like having taste in developer products and distributing 01:03:51.340 |
something amazing and so uh i i i actually think that um uh it's if you ask me like we have to make 01:04:00.300 |
predictions in this business i worry more about under projecting capability than i worry about 01:04:05.500 |
over projecting at least in the short term and then i worry more about um expecting too much 01:04:11.820 |
from the incumbents and being too afraid of them than uh being not afraid enough maybe it's just 01:04:18.940 |
one investment regret either one of you yeah we have one more from online oh okay you can do the 01:04:28.700 |
online one uh how do you see ai changing hardware or in what ways and for example do you see a new 01:04:39.100 |
apple coming out transforming hardware to that level not specifically the humane situation 01:04:45.900 |
they're trying to ask very generally how ai changes hardware uh i'm sorry okay i i'd approach this from um uh 01:04:55.980 |
two dimensions um uh everybody every investor wants a like a new consumer hardware platform 01:05:04.700 |
to exist because it's so valuable and the question is like why why should it um i can think of two 01:05:10.380 |
very good reasons one is that the usage pattern that you can imagine for ai applications actually 01:05:16.460 |
requires you to um like the specs you'd want are different right like what if i want to capture 01:05:22.060 |
image or video 100 of the time and um that's like a determinant of my battery life of my 01:05:29.740 |
sensors of how i manage my network etc what if i want to run local models all the time like maybe 01:05:35.500 |
like most of the phone should be a gpu right um i don't uh i i think that the usage patterns are 01:05:42.700 |
perhaps very different for the next generation of you know the the intelligence in your hand 01:05:48.460 |
um i think it's a hard thing to pull off another reason that you could believe in a new hardware 01:05:54.700 |
device is that the advantages of the existing consumer platforms go away right and so at the 01:06:01.260 |
extreme like should you have individual applications that track a single habit like drink water today 01:06:11.740 |
sarah like i don't know like i can generate that pretty easily now and like maybe the single 01:06:17.420 |
function applications that live in the mobile phone ecosystems are um part of uh a more general 01:06:24.780 |
intelligence and um they like that ecosystem is less important um and so i i think there are 01:06:30.700 |
different arguments for this uh and like we continually look for uh opportunities to invest 01:06:37.660 |
here i don't think this is exactly what you asked but i also think the um like there are 01:06:43.500 |
we invested in a company this past year um that is doing uh robotics um i for many years at greylock 01:06:52.700 |
my prior firm like thought of robotics as an easy way to lose a lot of money over a long period of 01:06:57.180 |
time um and and like i think that is true when you look at the outcome set for classical robotics 01:07:03.100 |
even for the companies that got to scale of distribution for an industrial robot or a single 01:07:07.740 |
use consumer robot um but like it's really cool that algorithms and generalization from um the 01:07:15.180 |
broader machine learning field seem to apply here as well uh and so i think being imaginative about 01:07:22.220 |
what physical intelligence looks like is also something we're excited about 01:07:26.460 |
yeah okay okay okay so related to agents i think everyone has been chatting about agents you're 01:07:39.900 |
seeing more like agent usefulness and production but i'm more curious like at the infrastructure 01:07:44.860 |
layer what agent what infrastructure primitives do you think are required for agents to actually work 01:07:50.620 |
and continue to work in production um okay i uh i don't know we talked about this a little bit i'm 01:07:59.980 |
not sure if our points of view in this are the same i think it is um i think it's really hard 01:08:03.580 |
to tell um my suspicion is that um like if you look at the number of like true agents that work 01:08:11.420 |
like the number roughly rounds to zero maybe it's like low single digits or low double digits now 01:08:17.180 |
um double double yeah and uh like they're all like relatively recent i would say like beginning 01:08:21.900 |
of this year um we saw like a bunch of agent framework companies and um like i uh like i 01:08:27.820 |
empathize with like the the root of the question which is it's just really hard to tell what any 01:08:31.340 |
of these companies need especially when like this set of companies that works really well is unclear 01:08:34.780 |
and um i i think there's a lot of valid anxiety on what foundation model companies want the 01:08:39.900 |
interface to be like the computer use interface is a pretty low level one like the anthropic 01:08:44.700 |
version is like actually just make specific clicks and you know like rumors of other interfaces are 01:08:49.820 |
like much more general like they're take actions on a specific web page um or like entire browser 01:08:54.620 |
environments and so um like at a high level like i imagine that there are sets of like there's like 01:08:59.660 |
the full scope of tools which is like i worked in a search engine for a while like crawl seems 01:09:02.940 |
pretty useful live data seems pretty useful like an api that looks something like here's a url give 01:09:08.140 |
me the set of data that's available or here's a url and a user login let me take some action 01:09:13.420 |
on this page seems pretty useful um and then i don't know what the right place to operationalize 01:09:18.300 |
this and commercially develop a product are um if i had like uh if i was building a company here 01:09:23.980 |
like one thing that i think is useful is to just remain agile like the core set of infrastructure 01:09:28.940 |
is consistently useful like a crawler is consistently useful and then one day you can 01:09:33.020 |
figure out how to expose this better um but i i like empathize with the difficulty of like it's 01:09:39.500 |
really hard to know what works for a bunch of agent companies and my suspicion is like the 01:09:43.980 |
most successful agent frameworks will come from the most successful of these agent companies that 01:09:48.220 |
solve these problems in-house for themselves and then operationalize this externally like it's 01:09:52.220 |
some version of like react is really useful because react was like well adopted at facebook for a 01:09:56.540 |
while um i think we can say that there are like missing components in the ecosystem where that 01:10:05.180 |
if there was a default lots of agent developers would use it right um and so like identity and 01:10:13.020 |
access management is a big problem um uh like if you could make agent development feel more like 01:10:21.500 |
traditional software development i think a lot of people would use that and be like oh like 01:10:25.260 |
you know it magically retries until it gets something and then it gives me like data back 01:10:29.180 |
about how well it's working like things that like it's i think it's pretty easy to actually imagine 01:10:33.420 |
the utilities in the abstract that would be useful to the ecosystem and then um the entire environment 01:10:39.980 |
is fluid right and so um uh do you need like if you think about other things in infrastructure 01:10:46.300 |
like will more workloads need vector indices yes like what is the shape of company that gets to be 01:10:52.140 |
durable here like we don't know yet um and we'll keep looking at it but as pranav said i think we 01:10:57.260 |
look to the handful of companies in our portfolio that are agents working at some scale and um and 01:11:05.260 |
like look for the patterns there versus try to intuit it right now my cached answer was wrong i should 01:11:10.860 |
have updated it's a it's a dozen not a small number it's been a long six months guys yeah 01:11:17.260 |
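A rough illustration of the kind of infrastructure primitives discussed in this answer, crawl a URL, list the data available behind it, and take an authenticated action on a page; every name and signature here is hypothetical, sketched from the conversation rather than taken from any real product.

```python
# Hypothetical agent-infrastructure primitives sketched from the discussion above.
# None of these names correspond to a real product; they only illustrate the shape
# of the "crawl / list data / take action" interface described in the talk.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class PageData:
    url: str
    text: str                 # extracted page text
    fields: dict[str, str]    # structured data found on the page


class AgentInfra(Protocol):
    def crawl(self, url: str) -> PageData:
        """Fetch a URL and return the data available on it."""
        ...

    def act(self, url: str, session_token: str, action: str) -> PageData:
        """Given a logged-in session, perform a named action on the page
        (e.g. 'add_to_cart') and return the resulting page state."""
        ...
```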
uh i think one last question and there's a whole bunch of online stuff you won't get to but um yeah 01:11:23.340 |
mark okay um it seems like there should be more consumer companies 01:11:29.180 |
like why why aren't there or is it just a matter of time 01:11:38.860 |
we keep bringing people into embed we keep looking i i think the uh i genuinely this is not 01:11:46.060 |
a um a a knock on the research community or the really young set of founders that like 01:11:52.460 |
i think focused on ai companies um first but the diffusion of innovation curve that applies to 01:11:58.940 |
customers i think also applies to entrepreneurs um researchers saw the capability first and they're 01:12:06.300 |
like like we should do something with this this is going to be amazing and it's like that will 01:12:10.540 |
continue to happen like our portfolio is heavily overrepresented with with people from the research 01:12:14.860 |
community pushing the pushing the state of the art with creative technical ideas um uh i think young 01:12:21.180 |
very young people also were quite early to ai because they're like oh of course like this 01:12:25.900 |
makes sense i've never seen other technology like chatgpt all the way um and their opportunity 01:12:31.660 |
cost is lower than like you're the best product person at an amazing product organization like 01:12:37.740 |
you have to leave your job to start a new company uh and it's been a really long two years like i 01:12:43.900 |
feel like that's just started to happen where some of the talent that has the and you know maybe 01:12:51.660 |
maybe it's just like the next zuck you know there's some dropout that figures out like the 01:12:55.900 |
pattern of social interaction and it's like really ai native about this stuff i also think there's a 01:13:00.460 |
chance that um some of the people who have built intuition for um consumer adoption and consumer 01:13:08.780 |
interfaces they're just taking a little bit to also build intuition for ai products and now 01:13:13.180 |
they're showing up and starting companies and experimenting and so um we have a lot of confidence 01:13:18.540 |
like it is going to happen over the next few years and just a matter of time okay i think we're i 01:13:24.860 |
think we're out of time i'm just trying to defer to sean here but thank you so much um you know 01:13:29.420 |
please call us yeah i'm sure sarah for now we'll be sticking around uh so you can you can uh sort 01:13:38.140 |
of ask some questions outside or you know whatever you want to do in networking wise but we're going 01:13:42.860 |
to move on in our schedule uh we have a ton of papers that we want to cover this is basically 01:13:47.740 |
paper club live um and i think isaac and peter uh you guys uh so the top um the top uh when people 01:14:00.380 |
signed up we actually asked people like what you wanted to cover and the top um votes were vision 01:14:06.300 |
open models um post transformers and all the other stuff that's coming later we also added reasoning 01:14:12.780 |
because i didn't even have the option there and i was like what am i doing doing a sort of paper 01:14:17.980 |
review uh session this year without uh talking about reasoning and um you know test time compute 01:14:24.220 |
so uh but first we're gonna have uh vision uh roboflow has been uh really uh great friends with 01:14:29.340 |
latent space we've we've had um joseph nelson on twice um with facebook talking about all the 01:14:35.820 |
innovations in vision but um it's not you know only about segmentation there's a lot of foundation 01:14:41.100 |
model uh progress that happened this year in both the large space and the very small space so we're 01:14:45.820 |
also very proud to have vick um to update us on moondream which he's been hacking away at 01:14:51.820 |
for the past year yeah very very short amount of time um are you guys ready are you plugged in 01:14:57.020 |
uh sarah pranav do you do you guys want to take questions like 01:15:05.580 |
i don't know if people want to like there are people that want to talk to you 01:15:18.380 |
awesome yeah just plug in on yeah the white thing exactly do you have sound the stuff 01:15:25.900 |
no sound listen do you have any audio things all right cool cool 01:15:29.500 |
stay close to the mic uh hi hey are they mic'd up nice yeah 01:15:44.700 |
uh man i was hoping to use speaker notes that's not gonna work 01:15:51.580 |
um you could do like a mirroring thing yeah yeah uh so there's settings display yeah yeah 01:16:53.420 |
email yeah both of us relied on your vision capabilities um yeah so this is for the screen 01:17:01.020 |
share is for the live stream and also the editing or recording that we're doing later 01:17:04.780 |
okay so you just share your screen and mute yourself um we got we got the audio you just 01:17:12.140 |
want to capture the screen video and share the um share the green share share the screen that 01:17:17.980 |
you're actually want people to see yeah that one the the the one with with the image that one but 01:17:25.900 |
this is the speaker view yeah you don't want to share the speaker yeah so so you want to share 01:17:30.460 |
this out too that's right double click on it you're good okay all right all right figuring 01:17:37.900 |
things out like yeah now where'd the presentation go uh you can you can do the triple yeah triple 01:17:49.260 |
slide there you go let's pick pick the thing and it is it up there are you just struggling no 01:17:54.140 |
let's uh kill this how do i exit out of this apologies technical difficulties 01:18:02.140 |
nice okay let's okay we're going to drag this up yeah perfect 01:18:30.300 |
we're just gonna make this full screen and call it good 01:18:32.460 |
okay yay okay um hi we're isaac and peter from roboflow and we're going to talk about the best 01:18:46.700 |
papers of 2024 in computer vision um so for us we define best as what made the biggest shifts 01:18:56.780 |
in the space uh and to determine that we looked at what are some major trends that happened uh 01:19:03.740 |
and what papers most contributed to those trends so i'm going to talk about a couple 01:19:06.620 |
trends peter's going to talk about a trend and then uh we're going to hand it off to moon dream 01:19:10.860 |
so the trends that i'm interested in talking about are a major transition from models that 01:19:19.260 |
run on per image basis to models that run using the same basic ideas on video and then also how 01:19:26.700 |
detrs are starting to take over the uh real-time uh object detection scene from 01:19:33.420 |
the yolos which have been dominant for years uh so as a highlight um we're going to talk about 01:19:40.540 |
sora which from my perspective is the biggest paper of 2024 even though it came out in february 01:19:45.980 |
um yeah sora doesn't have a paper it's just a uh a blog post um so i'm going to fill it in 01:19:55.900 |
with details from replication efforts including open sora and related work such as stable 01:20:01.020 |
video diffusion um and then we're also going to talk about sam2 which applies the sam strategy to 01:20:08.780 |
video and then the improvements in 2024 to detrs that are making them a Pareto 01:20:15.580 |
improvement to yolo based models um so to start this off we're going to talk about uh the state 01:20:23.260 |
of the art of video generation at the end of 2023 magvit uh magvit is a discrete token video 01:20:32.940 |
tokenizer akin to vq-gan but applied to video sequences and it actually outperforms uh state 01:20:40.940 |
of the art uh handcrafted video compression frameworks uh in terms of the uh bit rate 01:20:48.380 |
versus human preference for quality and video is generated by autoregressing on these discrete 01:20:53.020 |
tokens um generates some pretty nice stuff but up to like five seconds length and you know not 01:20:59.100 |
super detailed and then suddenly a few uh months later we have this which when i saw it was totally 01:21:06.700 |
mind-blowing to me um 1080p a whole minute long we've got light reflecting and puddles that's 01:21:13.020 |
reflective uh reminds me of those rtx demonstrations for next generation video games such as cyberpunk 01:21:20.860 |
but with better graphics you can see some issues in the background if you look closely but they're 01:21:26.300 |
kind of with a lot as with a lot of these models the issues tend to be things that people aren't 01:21:31.660 |
going to pay attention to unless they're looking for in the same way that like six fingers on a 01:21:35.340 |
hand you're not going to notice is a uh giveaway unless you're looking for it um so yeah as we 01:21:43.500 |
said sore does not have a paper so we're going to be filling it in with uh context from the rest of 01:21:48.140 |
the uh computer vision scene attempting to replicate these efforts um so the first step 01:21:56.300 |
you have an llm caption a huge amount of videos um this this is a trick that they introduced in 01:22:04.220 |
dolly 3 where they train a uh image captioning model to just generate very high quality captions 01:22:10.940 |
for a huge corpus and then train a diffusion model on that their uh sora and the replication 01:22:18.700 |
efforts also show a bunch of other steps that are necessary for good video generation including uh 01:22:24.860 |
filtering by aesthetic score and filtering by making sure the videos have enough motion so 01:22:30.060 |
they're not just like kind of the generator is not learning to just generate static frames um 01:22:35.580 |
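A minimal sketch of the two filters just described, dropping clips with a low aesthetic score and clips with too little motion; the thresholds and the frame-difference motion proxy are assumptions, not the actual Sora or Open-Sora recipe.

```python
import numpy as np

def mean_frame_difference(frames: np.ndarray) -> float:
    """Crude motion proxy: average absolute pixel change between consecutive frames.
    frames has shape (T, H, W, C) with values in [0, 1]."""
    return float(np.abs(np.diff(frames, axis=0)).mean())

def keep_clip(frames: np.ndarray, aesthetic_score: float,
              min_aesthetic: float = 5.0, min_motion: float = 0.01) -> bool:
    """Keep a clip only if it looks good enough and actually moves
    (so the generator doesn't learn to emit static frames)."""
    return aesthetic_score >= min_aesthetic and mean_frame_difference(frames) >= min_motion

# toy usage: a 16-frame random clip easily passes the motion check
clip = np.random.rand(16, 64, 64, 3)
print(keep_clip(clip, aesthetic_score=5.5))
```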
so then we encode our video into a series of space-time latents once again they were very 01:22:45.100 |
sparse on details so um the replication related works uh open sora actually uses a magvit v2 01:22:52.540 |
itself to do this but swapping out the uh discretization step with a classic vae 01:23:00.140 |
auto encoder framework and they show that there's a lot of benefit from getting the temporal 01:23:07.740 |
compression which makes a lot of sense as uh the each sequential frames and videos have mostly 01:23:13.500 |
redundant information um so by compressing against compressing in the temporal space you allow the 01:23:21.500 |
latent to hold a lot more semantic information while uh avoiding that duplicate um 01:23:28.300 |
so we've got our space-time latents possibly via some 3d vae presumably a magvit v2 01:23:39.020 |
um and then you throw it into a diffusion transformer so um i i think it's personally 01:23:47.740 |
interesting to note that open sora is using a magvit v2 which originally used an autoregressive 01:23:53.980 |
transformer decoder to model the latent space but uh is now using a diffusion uh diffusion 01:24:01.740 |
transformer so it's still a transformer happening just the question is like is it parameterizing 01:24:06.060 |
the stochastic uh differential equation or parameterizing a uh conditional distribution 01:24:11.100 |
via autoregression um it's also um it's also worth noting that most diffusion models today 01:24:21.100 |
the the very high performance ones are switching away from the classic like ddpm 01:24:24.380 |
denoising diffusion probabilistic modeling framework to rectified flows um rectified 01:24:31.260 |
flows have a very interesting property that as they converge they actually get closer to 01:24:36.940 |
being able to be sampled with a single step which means that uh in practice you can actually 01:24:42.460 |
generate high quality samples much faster um major problem of ddpm and related models for 01:24:50.380 |
the past four years is just that they require many many steps to generate high quality samples 01:24:56.380 |
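To make the few-step sampling point concrete, here is a toy Euler sampler for a rectified-flow-style velocity field; because the learned trajectories are close to straight, the same update can be run with very few steps. The velocity function below is a stand-in, not a trained model.

```python
import numpy as np

def sample_rectified_flow(velocity_fn, x0: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps.
    With near-straight trajectories, even num_steps=1 lands close to the target."""
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy "trained" velocity field: it points straight from the current sample toward a
# fixed target, so the trajectory is exactly straight and one step already suffices.
target = np.ones(4)
v = lambda x, t: (target - x) / max(1.0 - t, 1e-6)

noise = np.zeros(4)
print(sample_rectified_flow(v, noise, num_steps=1))    # ~[1, 1, 1, 1]
print(sample_rectified_flow(v, noise, num_steps=50))   # same endpoint, much more work
```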
so and naturally the third step is throwing lots of compute at the problem 01:25:02.540 |
so uh i didn't i never figured out how to manage to get this video to loop 01:25:08.620 |
but we see very little compute medium compute lots of compute this is so interesting because the uh 01:25:17.500 |
the original diffusion transformer paper from facebook actually showed that in fact the specific 01:25:22.460 |
hyperparameters of the transformer didn't really matter that much what mattered was that you were 01:25:27.660 |
just increasing the amount of compute that the model had so i love how in the you know once again 01:25:35.340 |
little blog posts they don't even talk about like the specific hyperparameters they say we're using 01:25:38.540 |
a diffusion transformer and we're just throwing more compute at it and this is what happens 01:25:41.900 |
um open sora shows similar results the uh primary issue i think here is that 01:25:49.660 |
no one else has 32x compute budget so we end up with these uh uh we end up in the middle of the 01:25:58.620 |
domain in most of the uh uh related work which is still super super cool it's just a little 01:26:05.260 |
disappointing considering the context um so i think this is a beautiful extension of the 01:26:11.660 |
uh framework that was introduced in 22 and 23 for these very high quality per image generation 01:26:19.900 |
and then extending that to videos it's awesome and it's ga as of monday except no one can seem 01:26:27.020 |
to get access to it because they keep shutting down the login uh the next so next paper i wanted 01:26:33.900 |
to talk about is sam so we at roboflow allow users to label data and train models on that data sam 01:26:41.980 |
for us has saved our users 75 years of labeling time um we are the the best of my knowledge the 01:26:48.620 |
largest uh sam api that exists we also sam also allows us to have our users train just pure uh 01:26:57.660 |
bounding box regression models and use those to generate high quality masks um which has the great 01:27:05.660 |
side effect of requiring less training data to have a meaningful convergence so most people are 01:27:11.020 |
data limited in the real world so anything that requires less data to get to a useful thing is 01:27:15.100 |
super useful um most of our users actually run their object uh per frame object detectors on 01:27:22.860 |
every frame in a video or maybe not most but many many and so uh 01:27:22.860 |
sam2 falls into this category of taking something that really really works and applying 01:27:31.900 |
it to a video which has the wonderful benefit of being plug and play with most of our many of our 01:27:43.420 |
users use cases um we're we're still building out a sufficiently mature pipeline to take advantage 01:27:49.980 |
of that but it's it's in the works um so here we've got a great example we can click on cells 01:27:58.780 |
and then follow them you even notice the cell goes away and comes back and we can still uh 01:28:02.940 |
keep track of it um which is very challenging for uh existing object trackers um high level 01:28:15.580 |
overview of how sam2 works we uh uh there's a simple pipeline here where we 01:28:24.460 |
can give provide some type of prompt and it fills out the rest of the likely masks for that object 01:28:33.260 |
throughout the rest of the video so here we're giving a bounding box in the first frame a set 01:28:37.500 |
of positive negative points or even just a simple mask um i'm gonna assume people are somewhat 01:28:45.580 |
familiar with sam so i'm gonna just give a high level overview of how sam works you have an image 01:28:51.820 |
encoder that runs on every frame um sam2 can be used on a single image in which case the only 01:28:58.780 |
difference between sam2 and sam is that image encoder which sam used a standard vit um sam2 01:29:08.940 |
replaced that with a uh hiera hierarchical encoder which gets approximately the same 01:29:15.580 |
results but leads to a six times faster inference which is excellent especially considering how in 01:29:22.460 |
a trend of 23 was replacing the vit with more efficient backbones um in the case where you're 01:29:31.180 |
doing video segmentation the the difference is that you actually create a memory bank and you 01:29:35.820 |
cross attend the features from the image encoder based on the memory bank so the uh feature set 01:29:44.780 |
that is created is essentially uh well i'll go more into it in a couple slides but we take the 01:29:52.860 |
features from the past couple frames plus a set of object pointers and the set of prompts and 01:30:01.500 |
use that to uh generate our new masks then we then fuse the new masks for this frame 01:30:07.660 |
with the um image features and add that to the memory bank it's well i'll say more in a minute 01:30:16.620 |
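A stripped-down sketch of the memory-bank loop just outlined: current-frame features cross-attend to a FIFO bank of recent frame features, and the fused features are pushed back into the bank for later frames. The shapes, the single attention head, and the residual fusion are simplifications, not the actual SAM 2 implementation.

```python
from collections import deque
import numpy as np

D = 64                      # feature dimension
memory = deque(maxlen=6)    # FIFO bank of recent frame features (+ object pointers)

def cross_attend(queries: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Single-head attention: current-frame tokens (queries) attend to memory tokens."""
    scores = queries @ bank.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ bank

def process_frame(frame_feats: np.ndarray) -> np.ndarray:
    """Condition the current frame on the memory bank, then push its fused
    features back into the bank for future frames."""
    if memory:
        bank = np.concatenate(list(memory), axis=0)
        frame_feats = frame_feats + cross_attend(frame_feats, bank)  # memory-conditioned
    memory.append(frame_feats)
    return frame_feats

for _ in range(10):                         # toy video: 10 frames of 16 tokens each
    out = process_frame(np.random.randn(16, D))
print(out.shape)                            # (16, 64)
```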
the um just like sam that sam2 actually uses a data engine to create its uh data set in that 01:30:23.180 |
people are they assembled a huge amount of reference data used people to label some of it 01:30:28.780 |
and train the model uh use the model to label more of it and ask people to refine the predictions of 01:30:35.820 |
the model and then ultimately the data set is just uh created from the final output of the model 01:30:41.660 |
on the uh reference data it's very interesting this paradigm is so interesting to me because it 01:30:47.100 |
uh it uh unifies a model in a data set in a way that is very unique it seems unlikely that another 01:30:55.180 |
model could come in and have such a tight relationship with the training set um yeah 01:31:02.460 |
so brief overview of how the memory bank works the paper did not have a great visual so i'm just i'm 01:31:11.020 |
going to fill in a bit more um so we take the last couple frames from our video and 01:31:20.940 |
uh we take the last couple frames from our video uh attend that along with the set of prompts 01:31:29.500 |
that we provided they could come from the future they could come from anywhere in the video 01:31:34.780 |
as well as reference object pointers saying by the way here's what we've found so far 01:31:40.620 |
uh attending to the last few frames has the interesting benefit of 01:31:44.460 |
allowing it to model complex object motion uh without actually 01:31:50.300 |
uh you by limiting the amount of frames that you attend to you manage to keep the model running in 01:31:58.300 |
real time this is such an interesting topic topic for me because one would assume that attending to 01:32:04.620 |
all of the frames is super essential having some type of summarization of all the frames 01:32:08.540 |
is super essential for high performance um but we see in their later ablation that that actually is 01:32:14.700 |
not the case so here just to make sure that there is some benchmarking happening we just compare to 01:32:22.380 |
some of the stuff that's came out prior and indeed the sam2 strategy does improve on the state of the 01:32:29.980 |
art um this ablation deep in their dependencies was super interesting to me uh we see in section 01:32:40.140 |
c the number of memories um one would assume that increasing the count of memories would 01:32:45.580 |
meaningfully increase performance and we see that it has some impact but not the type that 01:32:50.780 |
you'd expect and that it meaningfully decreases speed which justifies in my mind just having this 01:32:56.540 |
fifo queue of memories um although in the future i'm super interested to see a more dedicated 01:33:05.980 |
summarization of all of the last video not just a stacking of the last frames so 01:33:13.660 |
that another extension of beautiful per frame work into the uh video domain the next trend i'm 01:33:25.180 |
interested in talking about is uh this interesting at roboflow we're super interested in training 01:33:31.260 |
real-time object detectors those are bread and butter and so we're doing a lot to keep track of 01:33:35.660 |
what is actually happening in that space uh we are finally starting to see something change 01:33:42.940 |
so for years yolos have been the dominant way of doing real-time object detection and we can see 01:33:50.300 |
here that they've essentially stagnated the the performance between 10 and 11 is not meaningfully 01:33:56.700 |
different at least you know in in this type of high-level chart and even from the last couple 01:34:03.100 |
series there's not a major change uh so yolos hit a plateau detrs have not so we can look here 01:34:14.700 |
and see the yolo series has this plateau and then these are rt-detr lw-detr and d-fine have 01:34:22.300 |
meaningfully changed that plateau so that in fact the best d-fine models are plus 4.6 ap on coco at 01:34:29.580 |
the same latency so three major steps to accomplish this uh the first rt-detr which is technically 01:34:38.460 |
a 2023 paper pre-print but published officially in 24 so i'm going to include that i hope that's 01:34:44.460 |
okay um rt-detr showed that we could actually match or outspeed yolos 01:34:50.940 |
um then lw-detr showed that pre-training is hugely effective on detrs and much less so 01:34:58.060 |
on yolos and then d-fine adds the types of bells and whistles that we expect from uh 01:35:02.060 |
this uh arena so the major improvement that rt-detr showed was uh taking 01:35:11.820 |
the multi-scale features that debtors typically pass into their encoder and decoupling them into 01:35:17.820 |
a much more efficient uh transformer encoder uh the transformer is of course quadratic complexity 01:35:25.180 |
so decreasing the amount of stuff that you pass in at once is super helpful for increasing your 01:35:31.580 |
runtime or uh increasing your throughput so that change basically brought us up to yolo speed 01:35:38.700 |
and then they do a hardcore analysis on uh benchmarking yolos including the nms step 01:35:46.620 |
once you uh once you include the nms in the latency calculation you see that in fact these 01:35:52.380 |
detrs are outperforming at least this time the uh the yolos that existed 01:35:59.420 |
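The benchmarking point is simply that NMS post-processing has to sit inside the timed region when measuring a YOLO-style detector. A minimal sketch with a greedy NMS and a stand-in model; a real benchmark would of course time the actual network on the target hardware.

```python
import time
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Greedy non-maximum suppression. boxes are (N, 4) as x1, y1, x2, y2."""
    order, keep = scores.argsort()[::-1], []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i : i + 1]) + area(boxes[order[1:]]) - inter)
        order = order[1:][iou < iou_thresh]
    return keep

def fake_model(image):
    """Stand-in for a detector forward pass: emits random boxes and scores."""
    boxes = np.sort(np.random.rand(300, 4), axis=1) * 640
    return boxes, np.random.rand(300)

start = time.perf_counter()
boxes, scores = fake_model(None)
kept = nms(boxes, scores)              # end-to-end latency must include this step
print(f"latency incl. NMS: {(time.perf_counter() - start) * 1e3:.2f} ms, kept {len(kept)}")
```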
then lw-detr goes in and suggests that in fact uh the huge boost here is from 01:36:09.980 |
pre-training so this uh is the d-fine line and this is the d-fine line without pre-training 01:36:16.860 |
it's within range it's still an improvement over the uh yolos but the really huge boost comes 01:36:21.980 |
from the benefit of pre-training uh when yolox came out in 2021 they showed that they got much 01:36:29.820 |
better results by having a much much longer training time but they found that when they 01:36:36.780 |
did that they actually did not benefit from pre-training so you see in this graph from lw 01:36:43.180 |
detr in fact yolos do have a real benefit from pre-training but it goes away as we increase the 01:36:49.180 |
training time then the detrs converge much faster lw-detr trains for only 50 epochs rt 01:36:55.420 |
detr 60 epochs so one could assume that in fact the entire extra gain from pre-training is that 01:37:03.820 |
you're not destroying your original weights by relying on this long training cycle 01:37:07.820 |
um and then lw-detr also shows superior performance to our favorite data set roboflow 100 01:37:17.420 |
which means that they do better on the real world not just on coco 01:37:20.380 |
then d-fine throws all the bells and whistles at it uh yolo models tend to have a lot of 01:37:29.340 |
very specific uh complicated loss functions this uh d-fine brings that into the detr 01:37:36.300 |
world and shows consistent improvement on a variety of detr based frameworks 01:37:41.100 |
so bring these all together and we see that suddenly we have almost 60 ap on coco while 01:37:47.900 |
running in like 10 milliseconds huge huge stuff so we're spending a lot of time trying to build 01:37:56.220 |
models that work better with less data and detrs are clearly becoming a promising step in that 01:38:01.660 |
direction what we're interested in seeing next from the detrs in this trend is co-detr and 01:38:11.660 |
the the the models that are currently sitting on the top of the uh leaderboard for large scale 01:38:17.820 |
inference scale really well as you switch out the backbone we're very interested in seeing and and 01:38:25.020 |
having people publish a paper potentially us on what happens if you take these real-time ones 01:38:29.980 |
and then throw a swin-g at it like do we have a Pareto curve that extends from the real-time 01:38:34.780 |
domain all the way up to the uh uh super super slow but high performance domain we also want 01:38:41.260 |
to see people benchmarking on rf100 more because that type of data is what's relevant for most 01:38:46.860 |
users um and we want to see more pre-training because pre-training works now it's super cool 01:38:57.500 |
all right so yeah so in that theme uh one of the big things that we're focusing on 01:39:03.180 |
is how do we get more out of our pre-trained models um and one of the lenses to look at this 01:39:08.540 |
is through sort of this this new requirement for like fine-grained visual details and your 01:39:14.860 |
representations that are extracted from your foundation model so it's sort of a hook for this 01:39:19.820 |
um oh yeah this is just a list of all the the papers that i'm going to mention i just want to 01:39:24.940 |
make sure i set up actual papers so you can find it later um yeah so sort of the big hook here is 01:39:30.620 |
that i make the claim that llms can't see if you go to if you go to claude or um chat gpt you ask 01:39:38.860 |
it to to see this uh uh watch and tell me what time it is it fails right and so you could say 01:39:45.820 |
like maybe maybe the um like this is like a very classic uh test of an llm but you could say okay 01:39:53.260 |
maybe this this image is like too zoomed out and it just like it'll do better if we increase the 01:39:58.700 |
resolution and it has easier time finding these fine fine-grained features like where the watch 01:40:02.780 |
hands are pointing no dice and you can say okay well maybe uh the model just doesn't know how to 01:40:07.660 |
tell time from knowing the position of the hands but if you actually prompt it textually it's very 01:40:12.220 |
easy for it to tell the time so this to me is proof that these llms literally cannot see the 01:40:17.180 |
position of the watch hands and it can't see those details so the question is sort of why and uh for 01:40:22.380 |
you anthropic heads out there claude fails too um so the the my first pick for best paper of 2024 01:40:30.620 |
in vision is this mmvp paper which tries to investigate why do llms not have the ability 01:40:35.900 |
to see fine-grained details and so for instance it it comes up with a lot of images like this 01:40:40.860 |
where you ask it a question that seems very visually apparent to us like which way is the 01:40:44.540 |
school bus facing and it gets it wrong and then of course it makes up details to support its wrong 01:40:48.620 |
claim um and so the process by which it finds these images is sort of contained in its hypothesis for 01:40:55.740 |
why it can't uh see these details so it hypothesizes that models that have been initialized with with 01:41:03.260 |
clip as their vision encoder they don't have fine-grained details and the features extracted 01:41:09.180 |
using clip because um clip sort of doesn't need to find these fine-grained details to do its job 01:41:15.100 |
correctly which is just to match um captions and images right um and sort of at a high level even 01:41:21.340 |
if chat gpt wasn't initialized with clip um and wasn't trained contrastively at the vision encoder 01:41:26.780 |
wasn't trained contrastively at all still in order to do its job of capturing the image uh it could 01:41:32.140 |
do a pretty good job without actually finding the exact position of all the objects and visual 01:41:37.020 |
features in the image right so this paper finds a set of difficult images for these types of models 01:41:44.540 |
and the way it does it is it looks for embeddings that are similar in clip space but far 01:41:48.540 |
in dinov2 space so dinov2 is a foundation model that was trained um self-supervised purely 01:42:55.020 |
on image data um and it kind of uses like some complex student teacher framework but essentially 01:42:01.340 |
and like it patches out like certain areas of the image or like crops with certain areas of 01:42:06.220 |
the image and tries to make sure that those have consistent representations which is a way for it 01:42:09.740 |
to learn very fine-grained visual uh features and so if you take things that are very close in clip 01:42:15.660 |
space and very far in dinov2 space you get a set of images um basically pairs of images that 01:43:22.620 |
are hard for chat gpt and other big language models to distinguish so if you then ask it 01:42:27.900 |
questions about this image well as you can see from this chart it's going to answer the same way 01:42:33.420 |
um for both images right because to to from the perspectives of vision encoder they're the same 01:42:38.780 |
image and so if you ask a question like how many eyes does this animal have it answers the same for 01:42:43.340 |
both and like all these other models including llava um do the same thing right and so this is 01:43:49.260 |
the the benchmark that they create which is like finding clip like clip blind pairs which is pairs 01:42:54.860 |
of images that are similar in clip space and creating a data set of multiple choice questions 01:42:59.820 |
based off of those um and so how do these models do well really bad um llava i think so so chatgpt 01:43:08.620 |
and gemini do a little bit better than random guessing but like half of the performance of 01:43:12.460 |
humans who find these problems to be very easy uh llava is interestingly extremely negatively 01:43:19.740 |
correlated with this data set it does much much much much worse than random guessing which means 01:43:24.780 |
that this process has done a very good job of identifying hard images for for lava specifically 01:43:30.780 |
and that's because lava is basically not trained for very long and is initialized from clip and so 01:43:37.020 |
you would expect it to do poorly on this data set so one of the proposed solutions that this paper 01:43:44.140 |
attempts is by basically saying okay well if clip features aren't enough what if we train 01:43:48.380 |
the visual encoder of the language model also on dinov2 features and so it um proposes two different 01:43:54.540 |
ways of doing this one out of additively um which is basically interpolating between the two features 01:44:00.460 |
and then one is interleaving which is just kind of like training one on the combination of 01:44:05.340 |
both features so there's this really interesting trend when you do the additive mixture of features 01:44:10.620 |
so zero is all um clip features and one is all dinov2 features so i think it's 01:44:21.100 |
helpful to look at the rightmost chart first which is as you increase the number of dinov2 features 01:44:25.500 |
your model does worse and worse and worse on the actual language modeling task and that's 01:44:29.420 |
because dinov2 features were trained in a completely self-supervised manner and completely in 01:44:34.620 |
image space it knows nothing about text these features aren't really compatible with these text 01:44:38.940 |
models and so you can train an adapter all you want but it seems that it's in such an alien 01:44:43.580 |
language that it's like a very hard optimization for this these models to solve and so that kind 01:44:49.420 |
of supports what's happening on the left which is that yeah it gets better at answering these 01:44:55.260 |
questions as you include more dinov2 features up to a point but then when you oversaturate it 01:45:01.500 |
completely loses its ability to like answer language and and do language tasks um so uh 01:45:10.140 |
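The additive mixture described above is essentially a convex combination of the two encoders' projected features before they enter the language model; a one-line sketch, with the projection layers omitted.

```python
import numpy as np

def mix_features(clip_feats: np.ndarray, dino_feats: np.ndarray, alpha: float) -> np.ndarray:
    """alpha = 0 -> all CLIP features, alpha = 1 -> all DINOv2 features.
    Assumes both have already been projected to the same (num_tokens, dim) shape."""
    return (1.0 - alpha) * clip_feats + alpha * dino_feats

print(mix_features(np.zeros((4, 8)), np.ones((4, 8)), alpha=0.25).mean())  # 0.25
```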
you can also see with the interleaving like they essentially double the number of tokens that are 01:45:14.860 |
going into these models um and just train on both and it still doesn't really solve the mmvp task 01:45:20.620 |
it gets llava 1.5 above random guessing by a little bit but still not close to um chatgpt or any 01:45:28.460 |
like human performance obviously um so clearly this proposed solution of just using dinov2 01:45:34.460 |
features directly isn't going to work and basically what that means is that as a um 01:45:39.660 |
as a vision foundation model dinov2 is going to be insufficient for language tasks right 01:45:45.340 |
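For reference, the clip-blind pair selection described earlier reduces to a simple rule over precomputed, L2-normalized embeddings: keep pairs with high CLIP similarity and low DINOv2 similarity. The thresholds below are illustrative, not the paper's exact values.

```python
import numpy as np

def find_clip_blind_pairs(clip_emb: np.ndarray, dino_emb: np.ndarray,
                          clip_min: float = 0.95, dino_max: float = 0.6):
    """Return index pairs (i, j) whose CLIP cosine similarity is high but whose
    DINOv2 cosine similarity is low. Embeddings are (N, D) and L2-normalized,
    so the dot product is the cosine similarity."""
    clip_sim = clip_emb @ clip_emb.T
    dino_sim = dino_emb @ dino_emb.T
    pairs = []
    n = len(clip_emb)
    for i in range(n):
        for j in range(i + 1, n):
            if clip_sim[i, j] > clip_min and dino_sim[i, j] < dino_max:
                pairs.append((i, j))
    return pairs

# toy usage with random unit vectors standing in for real image embeddings
rng = np.random.default_rng(0)
clip = rng.normal(size=(100, 512)); clip /= np.linalg.norm(clip, axis=1, keepdims=True)
dino = rng.normal(size=(100, 768)); dino /= np.linalg.norm(dino, axis=1, keepdims=True)
print(len(find_clip_blind_pairs(clip, dino)))
```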
so my next pick for best paper of 2024 um would be florence 2 which tries to solve this problem 01:45:52.700 |
by incorporating not only this dimension of spatial hierarchy which is to say pixel level 01:45:58.940 |
understanding but also in making sure to include what they call semantic granularity which ends up 01:46:05.020 |
the goal is basically to have features that are sufficient for finding objects in the image so 01:46:10.860 |
they're they're they have enough pixel information but also can be talked about and can be reasoned 01:46:16.780 |
about um and that's on the semantic granularity axis so here's an example of um basically three 01:46:25.500 |
different paradigms of labeling that they do um so they create a big data set um one is text 01:46:32.060 |
which is just captioning and you would expect a model that's trained only on captioning to 01:46:35.900 |
have similar performance like chatgpt and like not have uh spatial hierarchy not have 01:46:41.660 |
features that are meaningful at the pixel level and so they add another type which is 01:46:46.220 |
region text pairs which is essentially either classifying a region or um 01:46:51.900 |
doing object detection or doing instance segmentation on that region or captioning that 01:46:59.500 |
region and then they have text phrase region annotations which is essentially a triple um 01:47:05.580 |
and basically not only do you have a region that you've described you also find it's like 01:47:10.860 |
its place in a descriptive paragraph about the image which is basically trying to introduce even 01:47:16.700 |
more like semantic understanding of these regions and so like for instance if you're saying a woman 01:47:21.260 |
riding on the road right you have to know what a woman is and what the road is and that she's on 01:47:25.340 |
top of it and that's that's basically composing a bunch of objects in this visual space but also 01:47:30.300 |
thinking about it semantically right um and so the way that they do this is they take um basically 01:47:36.860 |
they just dump uh features from a vision encoder straight into a uh encoder decoder transformer 01:47:44.860 |
um and then they train a bunch of different tasks like object detection and so on uh as a language 01:47:52.540 |
task and i think that's one of the big things that we saw in 2024 is these these um vision 01:47:59.260 |
language models operating in on pixel space linguistically so they introduce a bunch of 01:48:04.380 |
new tokens to point to locations and um in pixel space so how does it work how does it actually do 01:48:13.180 |
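A sketch of the location-token idea: box coordinates are normalized, quantized into a fixed number of bins, and emitted as extra vocabulary tokens. The 1000-bin size and token spelling here are illustrative, not the exact Florence-2 or PaliGemma vocabulary.

```python
def box_to_location_tokens(box, image_w, image_h, num_bins=1000):
    """Quantize an (x1, y1, x2, y2) pixel box into discrete location tokens."""
    x1, y1, x2, y2 = box
    norm = [x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h]
    bins = [min(int(v * num_bins), num_bins - 1) for v in norm]
    return [f"<loc_{b}>" for b in bins]

print(box_to_location_tokens((64, 32, 320, 240), image_w=640, image_h=480))
# ['<loc_100>', '<loc_66>', '<loc_500>', '<loc_500>']
```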
we can see uh if you look at the graph on the right which is using the uh the dino 01:48:20.300 |
framework um your pre-trained florence 2 models transfer very very well they get 60 01:48:28.540 |
percent map on coco which is like approaching state-of-the-art and they train 01:48:34.540 |
much more efficiently so they converge a lot faster 01:48:41.020 |
which both of these things are pointing to the fact that they're actually leveraging 01:48:44.940 |
their pre-trained weights effectively um so where is it falling short so these models i forgot to 01:48:52.380 |
mention florence is a 0.2 billion and a 0.7 billion parameter count so they're very very 01:48:57.820 |
small in terms of being a language model um and i think that this framework you can see saturation 01:49:04.460 |
so what this graph is showing is that if you train a florence 2 model purely on the image 01:49:10.460 |
level and region level annotations and not including the pixel level annotations like 01:49:14.860 |
segmentation it actually performs better as an object detector and what that means is that 01:49:21.660 |
it's not able to actually learn all the visual tasks that it's trying to learn because it doesn't 01:49:26.940 |
have enough capacity so i'd like to see this paper explore larger model sizes which brings us 01:49:31.660 |
to our next big paper of 2024 um or two papers so paligemma came out earlier this year paligemma 2 was 01:49:39.580 |
released i think like a week or two ago um oh i forgot to mention you can actually train like 01:49:45.340 |
label text data sets on roboflow and you can train a florence 2 model and you can actually train a 01:49:49.980 |
train a paligemma 2 model on roboflow which we got into the platform within like 14 hours of release 01:49:54.780 |
which i was really excited about so anyway so paligemma 2 and so paligemma is essentially doing 01:50:00.620 |
the same thing but instead of doing an encoder decoder it just dumps everything into a decoder 01:50:04.460 |
only transformer model um but it also introduced the concept of location tokens to point to 01:50:08.940 |
objects in pixel space paligemma 2 so paligemma uses gemma as the language encoder and it uses 01:50:15.820 |
gemma 2b paligemma 2 introduces using multiple different sizes of language encoders um so the 01:50:23.260 |
way that they sort of get around having to do encoder decoder is they use the concept of prefix 01:50:28.460 |
loss which basically means that when it's generating tokens um autoregressively it's 01:50:35.660 |
all those uh tokens in the prefix which is like the image that it's looking at and like a 01:50:40.540 |
description of the task that it's trying to do they're attending to each other fully full attention 01:50:45.420 |
um which means that you know it can sort of find high level uh it's easier for the the prefix to 01:50:52.060 |
color to color the output of the suffix and also to just find like features uh easily so 01:51:00.460 |
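A minimal sketch of the prefix-loss attention pattern just described: the prefix (image tokens plus the task description) attends to itself bidirectionally, while the generated suffix is causal and can also see the whole prefix.

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, suffix_len: int) -> np.ndarray:
    """Return a (T, T) mask where entry [i, j] = 1 if token i may attend to token j."""
    T = prefix_len + suffix_len
    mask = np.tril(np.ones((T, T), dtype=int))      # causal everywhere by default
    mask[:prefix_len, :prefix_len] = 1              # full attention inside the prefix
    return mask

print(prefix_lm_mask(prefix_len=3, suffix_len=2))
# [[1 1 1 0 0]
#  [1 1 1 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```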
this is sort of an example of like one of the tasks that was trained on which is like you 01:51:04.700 |
describe the task in english um and then you give it all these like you're asking for it to segment 01:51:12.860 |
these two classes um of objects and then it finds like their locations using these loc tokens and 01:51:19.740 |
it finds their masks using uh some encoding of the masks into tokens and yeah so one of my critiques 01:51:30.780 |
i guess of paligemma one at least is that um you find that performance saturates as a pre-trained 01:51:36.380 |
model after only 300 million examples seen um so what this graph is representing is each blue dot 01:51:43.660 |
is a performance on some downstream task you can see that after seeing 300 million examples 01:51:49.260 |
it sort of does equally well on all of the downstream tasks that they tried it on which 01:51:55.340 |
was a lot as 1 billion examples which to me also kind of suggests a lack of capacity for this model 01:52:02.060 |
paligemma 2 you can see the results on object detection so these were transferred to um 01:52:10.460 |
to coco um and you can see that this sort of also points to an increase in capacity being 01:52:17.180 |
helpful to the model you can see as both the resolution increases and the parameter count 01:52:23.020 |
of the language model increases performance increases so resolution makes sense obviously 01:52:26.780 |
it helps to find small images or small objects in the image but also makes sense from another reason 01:52:31.820 |
which is that it kind of gives the model a thinking register and it gives it more tokens to 01:52:35.900 |
like process when making its predictions um but yeah you could you could say oh 43.6 that's not 01:52:42.860 |
that great like um Florence 2 got 60 but this is not training a dino or a detr on top of this 01:52:50.140 |
language or this image encoder it's doing the raw language modeling task on coco um so it doesn't 01:52:57.660 |
have any of the bells whistles it doesn't have any of the fancy losses it doesn't even have 01:53:01.260 |
bipartite graph matching or anything like that okay the big result and one of the reasons that 01:53:07.580 |
I was really excited about this paper is that they blow everything else away on mmvp I mean 47.3 01:53:13.980 |
sure that's nowhere near human accuracy which again is 94 but for a you know a two billion 01:53:19.500 |
parameter language model to beat chatgpt that's quite the achievement 01:53:23.820 |
um and that sort of brings us to our final pick for paper of the year which um is aimv2 so 01:53:34.380 |
aimv2 sort of says okay maybe this language model like maybe coming up with all these specific 01:53:40.780 |
annotations to find features and with high fidelity and pixel space isn't actually necessary 01:53:47.420 |
and we can come up with an even simpler more beautiful idea for combining um you know image 01:53:53.580 |
tokens and pixel tokens in a way that's interfaceable for language tasks um and this 01:53:59.020 |
is nice because it can scale you can come up with lots more data if you don't have to come up with 01:54:03.260 |
all these annotations right so the way that it works is it does something very very similar to 01:54:07.900 |
paligemma where you have a vision encoder that dumps image tokens into a decoder only transformer 01:54:13.420 |
but the interesting thing is that it also autoregressively tries to learn 01:54:19.580 |
the mean squared error of the image tokens so instead of having to come up with fancy object 01:54:24.940 |
detection or semantic or segment or segmentation labels you can just try to reconstruct the image 01:54:30.060 |
and have it learn fine-grained features that way um and it does this in kind of i think a beautiful 01:54:35.580 |
way that's kind of compatible with the paligemma line of thinking which is randomly sampling a 01:54:39.820 |
prefix length and using only this number of image tokens as the prefix um and so doing a 01:54:47.580 |
similar thing with the uh causal so the causal prefix is the the attention mask on the right so 01:54:53.340 |
it's doing full block attention with some randomly sampled number of image tokens to then reconstruct 01:54:58.700 |
the rest of the image and the downstream caption for that image and so this is the data set that 01:55:06.380 |
they train on it's image or internet scale data very high quality data created by the 01:55:11.500 |
data filtering networks paper essentially which is maybe the best clip data that exists 01:55:18.700 |
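A toy sketch of the objective described above: sample a random prefix over the image tokens, regress the remaining image tokens with mean squared error, and keep the usual cross-entropy on the caption tokens. Everything here is schematic; the real model's heads, normalization, and loss weighting are not spelled out in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def aim_style_losses(pred_img, true_img, caption_logits, caption_ids):
    """pred_img / true_img: (num_img_tokens, dim) predicted vs. target image tokens.
    caption_logits: (num_text_tokens, vocab); caption_ids: (num_text_tokens,)."""
    # Random prefix: those image tokens are pure conditioning, only the rest are regressed.
    prefix_len = rng.integers(1, len(true_img))
    mse = np.mean((pred_img[prefix_len:] - true_img[prefix_len:]) ** 2)
    # Standard next-token cross-entropy on the caption.
    probs = np.exp(caption_logits - caption_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    ce = -np.mean(np.log(probs[np.arange(len(caption_ids)), caption_ids] + 1e-9))
    return mse, ce

mse, ce = aim_style_losses(
    pred_img=rng.normal(size=(256, 64)), true_img=rng.normal(size=(256, 64)),
    caption_logits=rng.normal(size=(20, 1000)), caption_ids=rng.integers(0, 1000, 20),
)
print(f"image MSE {mse:.3f}, caption CE {ce:.3f}")
```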
and we can see that this is finally a model that doesn't saturate even at the highest 01:55:27.020 |
parameter count it appears to be 01:55:34.140 |
improving in performance with more and more samples seen and so you can sort of think that 01:55:39.100 |
uh you know if we just keep bumping the parameter count and increasing the examples seen which is 01:55:44.380 |
the the line of thinking for language models then it'll keep getting better so how does it actually 01:55:49.900 |
do at finding oh it also improves with resolution which you would expect for a model that um 01:55:57.100 |
this is the image net classification accuracy but yeah it does better if you increase the 01:56:01.740 |
resolution which means that it's actually leveraging and finding fine-grained visual 01:56:05.820 |
features um and so how does that actually do compared to clip on coco well you can see that 01:56:12.620 |
if you slap a transformer uh detection head on it and train on coco it's just 60.2 which is also 01:56:18.780 |
within spitting distance of sota which means that it does a very good job of finding um visual 01:56:24.300 |
features but you could say okay well wait a second uh clip got to 59.1 so like how does this prove 01:56:33.100 |
your claim at all because doesn't that mean like clip which is known to be clip blind and do badly 01:56:38.300 |
on mmvp it's able to achieve a very high performance on fine on this fine-grained visual 01:56:43.660 |
features task of object detection well they train on like tons of data they train on like objects 01:56:49.740 |
365 coco flicker and everything else and so i think this benchmark doesn't do a great job of 01:56:56.300 |
selling how good of a pre-trained model aimv2 is and we would like to see uh performance on 01:57:02.060 |
fewer data examples and not trained to convergence on object detection so 01:57:07.100 |
seeing it in the real world on like a data set like roboflow 100 i think would be 01:57:11.100 |
quite interesting and our i guess our final final pick for paper of 2024 would be moondream so 01:57:21.260 |
uh but overall that was exactly what i was looking for like best of 2024 amazing job 01:57:28.540 |
um uh yeah are there any other questions while vick gets set up like vision stuff 01:57:42.540 |
hi well while we're getting set up hi over here thanks for the really awesome talk one of the 01:57:48.940 |
things that's been weird and surprising is um that the foundation model companies uh 01:57:56.460 |
even these mllms they're just like worse than rt-detr at detection still like if you wanted to 01:58:05.180 |
pay a bunch of money uh to auto label your detection data set if you gave it to openai 01:58:10.060 |
or claude that would be like a big waste um so i'm curious just like even polygema 2 like uh 01:58:16.700 |
is worse so so i'm curious to hear your thoughts on like how come nobody's cracked the code on like 01:58:22.700 |
a generalist that really uh you know beats a specialist model in computer vision like they 01:58:30.380 |
have in uh in lm land i can can you hear me okay oh yeah um it's very very interesting question 01:58:46.380 |
i think um it depends on the specific domain uh for image classification it's basically there 01:58:53.260 |
in that aimv2 showed a simple attentional probe on the pre-trained features gets like 90 which is 01:59:00.380 |
as well as anyone does um the the the bigger question like why isn't it transferring to 01:59:06.860 |
uh uh object detection especially like real-time object detection um i think in my mind there are 01:59:15.100 |
two answers one is object detection is really really really uh the architectures are super 01:59:21.980 |
domain specific you know we see these all these super super complicated things and it's not 01:59:26.700 |
super easy to to to build something that just transfers naturally like that whereas 01:59:31.740 |
image classification you know clip pre-training transfers super super 01:59:34.860 |
easily um and the other thing is until recently the real-time object detectors didn't even really 01:59:43.340 |
benefit from pre-training like you see the yolos that are like essentially saturated showing very 01:59:48.540 |
little difference with uh pre-training improvements uh with using pre-trained model at all it's not 01:59:54.700 |
surprising necessarily that people aren't looking at the effects of better and better pre-training 02:00:01.420 |
on real-time detection maybe that'll change in the next year does that answer your question 02:00:05.260 |
cool uh can you guys hear me uh yeah one thing i want to add is just like or just to summarize 02:00:12.860 |
basically is that like until 2024 you know we haven't really seen a combination of transformer 02:00:19.340 |
based uh object detectors and uh fancy losses and paligemma suffers from the same problem which 02:00:25.900 |
is basically to say that um these resnets like the convolutional models they have all these like 02:00:32.940 |
extreme optimizations for for doing object detection but essentially i think it's kind of 02:00:38.940 |
been shown now that convolution models like just don't benefit from pre-training and just don't 02:00:42.780 |
like have the level of intelligence of transformer models awesome hi can you hear me cool sure you 02:00:54.780 |
see you are you sharing your screen i might have forgotten to do that let me do that sorry 02:01:09.260 |
oh here's your screen uh-oh classic um you might have to quit zoom and restart what um 02:01:18.140 |
it's fine yeah it's like we we have we have a capture of your screen i'll just make sure it's 02:01:34.860 |
but soon no yeah yeah there you go perfect all right hi everyone my name is vic um i've been 02:01:46.460 |
working on moondream for almost a year now like sean mentioned i just went and looked and it turns 02:01:51.580 |
out the first version i released december 29 2023 um it's been a fascinating journey so moondream 02:01:58.940 |
um started off as a tiny vision language model since then we've expanded scope a little bit to 02:02:04.300 |
also try and build some tooling client libraries etc to help people really deploy it 02:02:09.020 |
um unlike traditional large models that are focused at assistant type use cases we're 02:02:16.700 |
laser focused on building um capabilities that developers can sorry it's uh 02:02:27.100 |
yeah we're laser focused on building capabilities that developers can use to build vision applications 02:02:32.060 |
uh that can run anywhere so in a lot of cases for vision more so than for text you really care about 02:02:37.580 |
being able to run on the edge run in real time etc so um it's really important we have um we have 02:02:44.540 |
different output modalities that we support there's query where you can ask general english 02:02:48.380 |
questions about an image and get back human-like answers there's captioning which 02:02:53.660 |
a lot of our users use for generating 02:02:59.340 |
synthetic data sets to then train diffusion models and whatnot um we've done a lot of work to minimize 02:03:04.140 |
the hallucinations there so that's um used a lot we have open vocabulary object detection built-in 02:03:09.900 |
similar to a couple more recent models like paligemma etc where rather than having to train a dedicated 02:03:14.540 |
model you can just say show me soccer balls in this image or show me there any deer in this image 02:03:19.820 |
detected uh more recently earlier this month we released pointing capability where if all 02:03:26.860 |
you're interested in is the center of an object um you can just ask it to point out where that 02:03:32.940 |
is. this is very useful when you're doing ui automation type stuff. 02:03:38.860 |
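For readers who want to try the query modality, here is a rough sketch using the Hugging Face transformers interface that the Moondream repo documents; the helper method names (encode_image, answer_question) come from the model's remote code and may differ between revisions, so treat them as assumptions to check against the current README.

```python
# Rough sketch: asking a small vision-language model a question about an image.
# The model id is real; the helper methods below come from the repo's custom
# code (trust_remote_code) and may change between revisions -- an assumption,
# not a pinned API.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("factory_floor.jpg")       # any local image
enc = model.encode_image(image)               # image -> vision embeddings
print(model.answer_question(enc, "Are there any deer in this image?", tokenizer))
```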
um let's see, we have two models out right now there's a general purpose 2B parameter model which um 02:03:48.300 |
runs fine if you're running on a server it's uh good for our LocalLLaMA 02:03:53.260 |
desktop friends and you can run it on flagship mobile phones but it never really 02:03:58.300 |
fulfilled the promise of being able to run anywhere uh last week we released a new 0.5B parameter model 02:04:03.500 |
which should be seen more as a distillation target as opposed to a general purpose model 02:04:08.780 |
uh it's very good if you're running on like older mobile phones or edge devices uses less memory 02:04:15.980 |
even with our not yet fully optimized inference client um so the way we built our 0.5b model was 02:04:24.780 |
to start with the two billion parameter model um and prune it while doing continual training to 02:04:32.620 |
retain performance. our objective during pruning was to preserve accuracy across a broad 02:04:40.140 |
set of benchmarks so the way we went about it was to estimate the importance of different 02:04:44.380 |
components of the model like attention heads channels um mlp rows and whatnot um using 02:04:51.500 |
basically a technique based on the gradient i'm not sure how much people want to know details 02:04:55.900 |
we'll be writing a paper about this but uh feel free to grab me if you have more questions 02:04:59.660 |
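The paper isn't out yet, so the snippet below is only a minimal sketch of what a gradient-based importance score for structured pruning can look like, assuming a first-order Taylor criterion accumulated over a calibration set; it illustrates the general technique, not Moondream's actual recipe.

```python
# Minimal sketch of gradient-based importance scoring for structured pruning
# (first-order Taylor criterion: |weight * grad| summed per structure).
# This is NOT the Moondream recipe -- their paper isn't out -- just an illustration.
import torch

def head_importance(model, calib_loader, num_heads, head_dim):
    """Score attention heads over a small calibration set."""
    scores = torch.zeros(num_heads)
    for batch in calib_loader:
        model.zero_grad()
        out = model(**batch)                  # assumes the batch carries labels
        out.loss.backward()
        # `model.attn.out_proj` is a placeholder for wherever your attention
        # output projection lives; its input columns are grouped per head.
        W = model.attn.out_proj.weight        # (d_model, num_heads * head_dim)
        G = model.attn.out_proj.weight.grad
        per_col = (W * G).abs().sum(dim=0)    # one score per input column
        scores += per_col.view(num_heads, head_dim).sum(dim=1).detach()
    return scores

# Iterate: score -> prune the lowest-scoring chunk -> continue training to
# recover performance -> repeat, until the target size is reached (e.g. 2B -> 0.5B).
```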
uh then we iteratively prune a small chunk that will minimize loss in performance uh retrain the 02:05:05.500 |
model to recover performance and bring it back um the 0.5b we release is more of a proof of concept 02:05:11.660 |
that this is possible i think the thing that's really exciting about this is it makes it possible 02:05:15.180 |
for um for developers to build using the 2b parameter model and just explore build their 02:05:24.540 |
application and then once they're ready to deploy uh figure out what exactly they need out of the 02:05:28.940 |
model and prune those capabilities into a smaller form factor that makes sense for their deployment 02:05:33.100 |
target um so yeah very excited about that let me talk to you folks a little bit about uh another 02:05:40.540 |
problem i've been working on recently which is similar to the clocks example we've been talking 02:05:44.140 |
about we had a customer reach out who had a bunch of gauges 02:05:50.300 |
out in the field this is very common in manufacturing and oil and gas where you 02:05:54.140 |
have a bunch of analog devices that you need to monitor it's expensive to have humans look at that 02:06:00.620 |
and monitor stuff and make sure that uh the system gets shut down when the temperature goes over 80 02:06:06.060 |
or something so i was like yeah this seems easy enough happy to help you distill that 02:06:11.020 |
uh let's let's get it going turns out our model couldn't do it at all uh i went and looked at 02:06:15.900 |
other open source models to see if i could just generate a bunch of data and learn from that that 02:06:20.940 |
did not work either so i was like let's look at what the folks with hundreds of billions of dollars 02:06:25.580 |
in market cap have to offer and yeah that doesn't work either um my hypothesis is that like the 02:06:35.100 |
way these models are trained is using a large amount of image-text data scraped from 02:06:40.220 |
the internet and that can be biased in the case of gauges most gauge images aren't gauges in the 02:06:45.740 |
wild they're product detail images like these where it's always set to zero it's paired with 02:06:51.420 |
an alt text that says something like givto pressure sensor psi zero to 30 or something 02:06:58.620 |
and so the models are fairly good at picking up those details it'll tell you that it's a 02:07:01.980 |
pressure gauge it'll tell you what the brand is but it doesn't really learn to pay attention to 02:07:05.420 |
the needle over there um and so yeah that's a gap we need to address so naturally my mind goes to 02:07:16.220 |
like let's use synthetic data to solve this problem um that works but it's problematic because it 02:07:23.180 |
turned out we needed millions of synthetic gauge images to get to reasonable performance. 02:07:27.660 |
and thinking about it, reading a gauge is not a zero-shot process in our 02:07:33.660 |
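For intuition, synthetic gauge data can be produced by rendering dials procedurally and keeping the true reading as the label; a minimal matplotlib sketch is below, with the dial style, ranges, and sweep angle chosen arbitrarily rather than taken from Moondream's pipeline.

```python
# Minimal sketch of procedural synthetic gauge generation (not Moondream's pipeline).
# Render a dial at a random angle, save the image, and keep the true reading as label.
import math
import random
import matplotlib.pyplot as plt

def render_gauge(path, lo=0, hi=100):
    value = random.uniform(lo, hi)
    fig, ax = plt.subplots(figsize=(2, 2), dpi=112)
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, lw=2))
    # tick marks and labels every 10 units over a 270-degree sweep
    for v in range(lo, hi + 1, 10):
        theta = math.radians(225 - 270 * (v - lo) / (hi - lo))
        ax.plot([0.85 * math.cos(theta), math.cos(theta)],
                [0.85 * math.sin(theta), math.sin(theta)], 'k-', lw=1)
        ax.text(0.7 * math.cos(theta), 0.7 * math.sin(theta), str(v),
                ha='center', va='center', fontsize=5)
    # needle at the sampled value
    theta = math.radians(225 - 270 * (value - lo) / (hi - lo))
    ax.plot([0, 0.8 * math.cos(theta)], [0, 0.8 * math.sin(theta)], 'r-', lw=2)
    ax.set_xlim(-1.1, 1.1); ax.set_ylim(-1.1, 1.1); ax.axis('off')
    fig.savefig(path); plt.close(fig)
    return value  # ground-truth reading to pair with the image

# scale this up (the talk mentions needing millions of images)
labels = {f"gauge_{i}.png": render_gauge(f"gauge_{i}.png") for i in range(1000)}
```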
minds right like if you had to tell me the reading in celsius for this real world gauge 02:07:38.860 |
there's two dials on there so first you have to figure out which one you have to be paying 02:07:42.300 |
attention to like the inner one or the outer one um you look at the tip of the needle you look at 02:07:48.220 |
what labels it's between and you count how many and do some math to figure out what that probably 02:07:55.340 |
is so what happens if we just add that as chain of thought um to give the model a better understanding 02:08:04.300 |
of the different steps, to allow the model to better learn the subtasks it needs to perform to accomplish 02:08:09.580 |
this goal um so you can see in this example this was actually generated by the latest version of 02:08:15.100 |
our model uh it's like okay celsius is the inner scale it's between 50 and 60 there's 10 ticks 02:08:22.060 |
it's at the second tick it's a little debatable here like there's a weird shadow situation going 02:08:25.900 |
on the dial is off so i i don't know what the ground truth is but it works okay um there's 02:08:33.020 |
points on there — the points over there are actually grounded — i don't know if this is easy 02:08:38.140 |
to see but when i click on those there's a little red dot that moves around on the image the model 02:08:42.780 |
actually has to predict where uh those points are i was already trying to do this with bounding boxes 02:08:48.620 |
but then Molmo came out with pointing capabilities and it's like pointing is a much better paradigm to 02:08:54.620 |
uh to represent this we see pretty good results this one's actually for clock reading i 02:09:01.900 |
couldn't find our chart for gauge reading at the last minute so um the light blue chart is 02:09:09.980 |
with uh our grounded chain of thought um we built a clock reading 02:09:16.620 |
benchmark of about 500 images and this measures accuracy on that um you can see it's a lot more sample 02:09:23.020 |
efficient uh when you're using the chain of thought to help the model um yep another big benefit 02:09:34.300 |
from this approach is like you can kind of understand how the model is doing it and how 02:09:40.300 |
it's failing so in this example the actual correct reading is 54 celsius the model output 56 02:09:46.620 |
not too bad um but you can actually go and see where it messed up like it got a lot of these 02:09:53.660 |
right except uh instead of saying it was on the seventh tick it actually predicted it was 02:10:00.300 |
the eighth tick and that's why it went with 56 so now that you know that this is failing in 02:10:07.340 |
this way you can adjust how you're doing the chain of thought to maybe say like actually count out 02:10:10.940 |
each tick from 40 instead of just trying to say it's the eighth tick or you might say like okay 02:10:15.660 |
i see that there's that middle thing, i'll count from there instead of all the way from 40. 02:10:20.780 |
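To make the grounded chain-of-thought idea concrete, a training target for one gauge image might look roughly like the sketch below; the field names and the normalized point format are invented for illustration and are not Moondream's actual schema.

```python
# Hypothetical grounded chain-of-thought training sample for gauge reading.
# Field names and the (x, y) point convention are illustrative only.
sample = {
    "image": "gauge_0042.png",
    "question": "What is the reading in celsius?",
    "chain_of_thought": [
        {"step": "Celsius is the inner scale.",          "point": (0.52, 0.61)},
        {"step": "The needle tip is between 50 and 60.", "point": (0.41, 0.37)},
        {"step": "There are 10 ticks between the labels; the needle is on the 4th tick.",
         "point": (0.43, 0.35)},
        {"step": "So the reading is 50 + 4 = 54."},
    ],
    "answer": "54",
}
# At training time the points are supervised too, so the rationale stays grounded
# in the pixels instead of being free-floating text the model can hallucinate.
```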
so that helps a ton. the other thing i'm excited about is few-shot prompting or test-time 02:10:26.540 |
training with this like if a customer has a specific gauge that uh like we're seeing minor 02:10:31.340 |
errors on they can give us a couple of examples where like if it's misdetecting the needle they 02:10:37.340 |
can go in and correct that in the chain of thought and hopefully that works the next time um 02:10:41.820 |
now, it's an exciting approach but we've only applied it to clocks and gauges the real question is is it going to 02:10:48.380 |
generalize um probably like there's some signs from text models that when you train on a broad 02:10:53.500 |
number of tasks it does generalize and um i'm seeing some signs with our model as well um 02:10:59.580 |
so in addition to the image-based chain of thought stuff i also added some spelling-based 02:11:03.820 |
chain of thought uh to help it better understand ocr i guess um i don't understand 02:11:11.740 |
why everyone doesn't do this by the way like it's a trivial benchmark question that's very very easy 02:11:16.860 |
to nail um but i also wanted to support it for stuff like license plate partial matching like 02:11:23.580 |
hey does any license plate in this image start with wha or whatever um so yeah that sort of worked 02:11:30.700 |
um all right that that ends my story about the gauges if you think about what's going on over 02:11:39.020 |
here um it's interesting that like llms are showing enormous progress in reasoning especially 02:11:48.540 |
with the latest set of models that we've seen but we're not really seeing that — i have a feeling that 02:11:54.620 |
vlms are lagging behind as we can see with these tasks that should be very simple for a human to 02:12:01.660 |
do that are very easy to find um vlms failing at uh my hypothesis on why this is the case is because 02:12:08.460 |
on the internet there's a ton of data that talks about how to reason there's books about how to 02:12:14.780 |
solve problems there's books critiquing the books about how to solve problems but humans are just so 02:12:19.260 |
good at perception that we never really talk about it like maybe in art books where it's like hey to 02:12:24.540 |
show that that mountain is further away you need to desaturate it a bit or whatever but um the 02:12:31.740 |
actual data on how to like look at images isn't really present also the data we have is kind of 02:12:37.500 |
sketchy — the best source of data we have is like image alt-text pairs on the internet and that's 02:12:41.500 |
pretty low quality um so yeah i i think our solution here is really just we need to teach 02:12:47.180 |
them how to operate on individual tasks and figure out how to scale that out um all right yep so 02:12:56.780 |
conclusion uh at Moondream we're trying to build amazing VLMs that run everywhere very hard 02:13:02.780 |
problem much work ahead but uh we're making a ton of progress and i'm really excited about 02:13:07.340 |
um if anyone wants to chat about more um technical details about how we're doing 02:13:12.620 |
this or interested in collaborating please please hit me up 02:13:15.260 |
yeah like whenever people say multi-modality like you know i always think 02:13:26.460 |
about vision as the first among equals in all the modalities so i really appreciate 02:13:31.260 |
having the experts um okay we are a little bit out of time so we're going to move on to luca 02:13:36.940 |
um and talk about open models but if anyone wants to talk to the vision guys i think there's like 02:13:42.700 |
coffee and tea outside we're going to have lunch in an hour as well um so you can ask follow-up 02:13:48.620 |
questions uh outside if you if you wish but yeah luca you go you get set up with uh your mic okay 02:13:56.860 |
we sent you a zoom okay uh it's on it's on the calendar and then 02:14:33.740 |
they just screen share for here no audio no audio no yeah speecher uh plus plug-in 02:14:39.340 |
oh yeah you gotta stick around people you stick around people for sure 02:14:50.300 |
so i didn't know what you're because you're you're coming later yeah i don't really know either 02:14:59.100 |
how was your session yesterday for the tutorial yeah 02:15:07.580 |
yeah it's just good um definitely polish the slides 02:15:28.540 |
cool yeah i think you're set um so as you speak into that mic but any of your 02:15:42.700 |
nathan's microphone no you want me to be we'll just put this on yeah 02:15:51.340 |
i have the same thing yeah so these two mics they're good all right all right cool um yeah 02:16:01.980 |
thanks for having me over um i'm Luca i'm a research scientist at the Allen Institute for AI 02:16:07.980 |
i threw together a few slides on sort of like a recap of like interesting themes in open models 02:16:15.740 |
for for 2024 um have about maybe 20-25 minutes of slides and then we can chat if there are any 02:16:22.940 |
questions if i can advance to the next slide okay cool um so um i did the quick check of like 02:16:33.340 |
to sort of get a sense of like how much 2024 was different from 2023 um so i went on Hugging Face 02:16:39.580 |
and sort of tried to get a picture of what kind of models were released in 2023 and like what do 02:16:45.100 |
we get in 2024 um in 2023 we got things like uh both Llama 1 and 2 we got Mistral we got MPT 02:16:53.020 |
Falcon models i think the Yi model came at the tail end of the year it was a pretty good year 02:16:58.460 |
but then i did the same for 2024 um and it's actually quite stark difference um you have 02:17:08.860 |
models that are you know rivaling the frontier-level performance of what you can get from closed models 02:17:15.420 |
from like Qwen from DeepSeek we got Llama 3 we got all sorts of different models um i added 02:17:23.260 |
our own uh OLMo at the bottom uh there's this uh growing group of like fully open models that i'm 02:17:29.260 |
going to touch on a little bit later um but you know just looking at the slides it feels like 02:17:35.500 |
2024 was just smooth sailing happy news much better than previous year um and you know you 02:17:42.940 |
can plot um you can pick your favorite benchmark or least favorite i don't know depending on what 02:17:50.460 |
point you're trying to make um and plot you know your closed model your open model um and sort of 02:17:58.220 |
spin it in ways that show that oh you know open models are much closer to where closed models 02:18:04.860 |
are today versus last year where the gap was fairly significant um so one thing that 02:18:14.860 |
i think i don't know if i have to convince people in this room but usually when i give these talks 02:18:21.500 |
about like open models there is always like this background question in in in people's mind of like 02:18:27.180 |
why should we use open models um there is the just-use-model-APIs argument you know it's 02:18:33.820 |
just an HTTP request to get output from one of the best models out there why do i have to set 02:18:39.500 |
up infra and use local models um and there are really like two answers um there is the more researchy 02:18:47.820 |
answer for this which is where my background lies which is um just research if you want to do 02:18:55.180 |
research on language models research thrives on open models there is a large body of research 02:19:01.580 |
on modeling on how these models behave on evaluation and inference on uh mechanistic 02:19:08.300 |
interpretability that could not happen at all if you didn't have open models um they're also um 02:19:16.140 |
for ai builders there are also like good use cases for using um local models um you know you have 02:19:24.940 |
some — this is a very non-comprehensive slide — but you have things like there are some 02:19:29.660 |
applications where local models just blow closed models out of the water um so like retrieval it's 02:19:37.020 |
a very clear example um you might have like constraints like edge ai applications where it 02:19:42.860 |
makes sense but even just like in terms of like stability being able to say this model is not 02:19:47.980 |
changing under the hood um there's plenty of good cases for um open models um and the 02:19:56.860 |
community is not just the models um i stole this slide from uh one of the Qwen2 announcement blog 02:20:04.860 |
posts uh but it's super cool to see like how much um tech exists around um open models on serving 02:20:13.660 |
them on making them efficient and hosting them it's pretty cool um and um it's um if you think 02:20:23.820 |
about like where the term 'open' comes from — it comes from open source — um really open models 02:20:29.740 |
meet the core tenets of open source uh specifically when it comes to 02:20:37.900 |
collaboration there is truly a spirit that through these open models you can build on top of other 02:20:44.060 |
people's innovation um we see a lot of this even in our own work of like you know as we iterate 02:20:50.860 |
on the various versions of OLMo um it's not like every time we collect all the 02:20:57.900 |
data from scratch no the first step is like okay what are the cool data sources and datasets people have put 02:21:04.060 |
together for language model for training um or when it comes to like our post-training pipeline 02:21:11.820 |
one of the steps is um you want to do some DPO and use a lot of outputs of other models 02:21:21.100 |
to improve your preference model so really um having like an open sort of ecosystem 02:21:28.140 |
benefits and accelerates the development of open models um one thing that um we got in 2024 which 02:21:37.420 |
is not a specific model but i thought it was really significant is we first got uh we got our 02:21:42.780 |
first open source ai definition um so this is from the open source initiative um they've been 02:21:50.220 |
generally the steward of a lot of the open source licenses when it comes to software 02:21:55.100 |
and so they embarked on this journey and trying to figure out okay 02:22:00.060 |
how does a license an open source license for a model look like 02:22:03.740 |
um majority of the work is very dry because licenses are dry so i'm not gonna walk through 02:22:11.500 |
the license step by step but um i'm just gonna pick out uh one aspect that is very good uh and 02:22:19.820 |
then one aspect that personally feels like it needs improvement on the good side um this um 02:22:26.780 |
this open source ai license actually this is very intuitive if you ever build open source software 02:22:33.420 |
and you have some expectation around like what open source uh looks like for software uh for 02:22:41.260 |
for ai sort of matches your intuition so the weights need to be freely available uh the code 02:22:49.020 |
must be released with an open source license uh and there shouldn't be like license clauses that 02:22:56.380 |
block specific use cases so under this definition for example Llama or some of the Qwen models are 02:23:03.580 |
not open source because the license says you can't use this model for this 02:23:09.340 |
or it says if you use this model you have to name the output this way or derivatives need to be uh 02:23:15.660 |
named that way those clauses don't meet the open source definition um and so they will not be 02:23:20.780 |
covered — the Llama license will not be covered under the open source definition um it's not perfect um 02:23:30.300 |
one of the things that um um internally you know in discussion with with osi we were sort of 02:23:38.700 |
disappointed is around um the language for data um so you might imagine that an open source 02:23:47.980 |
ai model means a model where the data is freely available uh there were discussion around that 02:23:53.420 |
but at the end of the day they decided to go with a softened stance where they say um a model is open 02:24:00.860 |
source if you provide sufficiently detailed information on how to sort of replicate the 02:24:06.780 |
data pipeline so you have an equivalent system — 'sufficiently detailed' uh it's very 02:24:14.300 |
fuzzy i don't like that and 'an equivalent system' is also very fuzzy um and this doesn't 02:24:21.500 |
take into account the accessibility of the process right it might be that you provide enough 02:24:26.700 |
information but this process costs I don't know 10 million dollars to do um now the open source 02:24:33.580 |
definition like any open source license has never been about accessibility so it's never been a factor 02:24:40.140 |
in open source software how accessible the software is um I can make a piece of open source software put it on 02:24:46.540 |
my hard drive and never access it that software is still open source the fact that it's not widely 02:24:51.340 |
distributed doesn't change the license but practically there are right expectations of what 02:24:57.020 |
we want good open source to be so it's kind of sad to see that um the data component 02:25:04.220 |
in this license is not as open as some of us would like it to be and I linked 02:25:11.500 |
the blog post that Nathan wrote on the topic which is less rambly and easier to follow through 02:25:18.460 |
um one thing that in general I think it's fair to say about the state of open models in 2024 is that 02:25:28.780 |
we know a lot more than what we knew in in 2023 um like um both on the training data like the 02:25:37.260 |
pre-training data you curate um on like how to do like all the post-training especially like on the 02:25:43.580 |
RL side um you know 2023 was a lot of like throwing random darts at the board uh I think 2024 we have 02:25:51.900 |
clear recipes that okay don't get the same results as a closed lab because there is a cost 02:25:57.260 |
in in actually matching what they do um but at least we have a good sense of like okay this is 02:26:03.020 |
this is the path to get state-of-the-art language model um I think that one thing that it's a 02:26:09.900 |
downside of 2024 is that I think we are more research constrained than 2023 it feels that 02:26:18.220 |
like you know the barrier for compute that you need to move innovation along is just 02:26:24.940 |
rising and rising um so like if you go back to this slide there is now this 02:26:31.660 |
cluster of models that are sort of released by the compute rich club um membership is hotly debated 02:26:39.980 |
um you know some people don't want to be called rich because it comes to expectations some people 02:26:45.740 |
want to be called rich but I don't know there's debate but like these are players that have you 02:26:50.380 |
know 10,000 50,000 GPUs at minimum um and so they can do a lot of work um and a lot of exploration 02:26:58.620 |
in improving models that it's not very accessible um to give you a sense of like how I personally 02:27:06.300 |
think about research budgets um for each part of the of the language model pipeline is like on the 02:27:15.340 |
pre-training side you can maybe do something with a thousand GPUs really you want 10,000 and like if 02:27:21.660 |
you want real state of the art you know your DeepSeek level the minimum is like 50,000 um and you 02:27:27.180 |
can scale to infinity the more you have the better it gets um everyone on that side still complains 02:27:32.140 |
that they don't have enough GPUs uh post-training is a super wide um sort of uh spectrum you can do 02:27:40.780 |
as little with like eight GPUs um as long as you're able to um run you know a a good version 02:27:51.100 |
of say a llama model you can do a lot of work there um you can scale a lot of the methodology 02:27:57.420 |
just like scales with compute right if you're interested in um you know your open replication 02:28:05.100 |
of what OpenAI's o1 is um you're going to be on the 10k end of the spectrum of GPUs um inference you can 02:28:12.780 |
do a lot with very few resources evaluation you can do a lot with well I should say at least one 02:28:19.020 |
GPU if you want to evaluate um open models but um in general like if you care a lot 02:28:27.660 |
about interventions to do on these models which is my uh preferred area of research then you know the 02:28:35.500 |
resources that you need um are quite quite significant um one of the trends um that has 02:28:43.340 |
emerged in 2024 is this cluster of um fully open models um so OLMo the model that we built at AI2 02:28:53.100 |
being one of them um and you know it's nice that it's not just us there's like a cluster of other 02:28:59.820 |
mostly research um efforts who are working on this um and so it's good to um to give you a primer 02:29:10.860 |
of what like fully open means um so fully open the easy way to think about it is instead of just 02:29:18.380 |
releasing a model checkpoint that you run you release a full recipe so that um other people 02:29:25.180 |
working on it uh working on that space can pick and choose whatever they want from your recipe 02:29:31.660 |
and create their own model or improve on top of your model um you're giving out the full pipeline 02:29:37.180 |
and all the details there um instead of just like the end output um so I pull up the screenshot from 02:29:44.380 |
our recent um MOE model um and like for this model for example we released the model itself 02:29:51.340 |
data that was trained on the code both for training and inference um all the logs that 02:29:57.500 |
we got through um the training run as well as um every intermediate checkpoint 02:30:03.020 |
um and like the fact that you release different part of the pipeline allows others to do really 02:30:10.060 |
cool things um so for example this tweet from early this year from uh folks at Nous Research 02:30:17.020 |
um they used our pre-training data uh to do a replication of the BitNet paper in the open um 02:30:24.220 |
so they took just a really like the initial part of a pipeline um and then did the thing on top of 02:30:31.340 |
it um it goes both ways so for example for the OLMo 2 model um a lot of our pre-training data for 02:30:39.820 |
the first stage of pre-training um was from this DCLM uh initiative uh that was led by folks at 02:30:48.220 |
a variety of institutions it was a really nice group effort but um you know for us it was nice 02:30:57.580 |
to be able to say okay you know the state of the art in terms of like what is done in the open has 02:31:01.660 |
improved we don't have to like do all this work from scratch to catch up the state of the art 02:31:07.740 |
we can just take it directly and integrate it and do our own improvements on top of that 02:31:13.660 |
um i'm gonna spend a few minutes uh doing like a shameless plug for 02:31:21.900 |
um so indulge me in this um so a few things that we released this year was as i was mentioning 02:31:30.220 |
this OLMoE model um which i think is still the state-of-the-art MoE model in its size class 02:31:38.780 |
and it's also fully open so every component of this model is available um we release 02:31:46.060 |
a multi-modal model called Molmo um Molmo is not just a model but it's a full recipe of how you go 02:31:52.460 |
from a text-only model to a multi-modal model and we applied this recipe on top of 02:31:58.940 |
Qwen checkpoints on top of OLMo checkpoints as well as on top of OLMoE um and i think there have 02:32:04.380 |
been replications doing that on top of Mistral as well um on the post-training side 02:32:14.940 |
we recently released Tulu 3 um same story this is a recipe on how you go from a base model 02:32:20.780 |
to a state-of-the-art post-trained model we used the Tulu recipe on top of OLMo on top of Llama and 02:32:28.540 |
then there's been an open replication effort to do that on top of Qwen as well uh it's really nice 02:32:34.220 |
to see like you know when your recipe sort of it's kind of turnkey you can apply it to different 02:32:39.340 |
models and it kind of just works um and finally the last thing we released this year was OLMo 2 02:32:45.260 |
which so far is the best state-of-the-art fully open language model um it sort of combines aspects 02:32:52.860 |
from all three of these previous models um what we learned on the data side from OLMoE 02:32:57.580 |
and what we learned on like making models that are easy to adapt from the Molmo project 02:33:02.700 |
and the Tulu project um i will close with a little bit of reflection on the ways this 02:33:10.380 |
ecosystem of open models um like it's not all roses it's not all happy uh it feels like day 02:33:18.060 |
to day it's always in peril um and you know i talked a little bit about like the compute issues 02:33:24.300 |
that come with it uh but it's really not just compute um one thing that is on top of my mind 02:33:30.860 |
is due to like the environment and how um you know growing feelings about like how AI is treated 02:33:39.020 |
it's actually harder to get access to a lot of the data that was used to train a lot of the 02:33:45.020 |
models up to last year so this is a screenshot from really fabulous work from Shane Longpre 02:33:50.860 |
who i think is in europe um about um just the diminishing access to data 02:34:00.140 |
for language model pre-training so what they did is they um went through every snapshot 02:34:07.260 |
of common crawl uh common crawl is this publicly available scrape of a subset of the 02:34:12.860 |
internet and they looked at um for any given website that was 02:34:19.980 |
accessible in say 2017 whether it was still accessible or not in 2024 and what they found is 02:34:26.860 |
as a reaction to the existence of closed models like OpenAI's 02:34:36.860 |
ChatGPT or Claude a lot of content owners have blanket blocked any type of crawling to their website 02:34:44.380 |
and this is something that we see also internally at AI2 um like one project that we started this 02:34:50.620 |
year is um we wanted to we want to understand like if you're a good citizen of the internet 02:34:57.980 |
and you crawl uh following sort of norms and policy that have been established in the last 25 years 02:35:05.740 |
what can you crawl and we found that there's a lot of websites where um the norms of how you 02:35:13.180 |
express preference of whether to crawl or not are broken a lot of people would block a lot 02:35:18.220 |
of crawling but do not advertise that in robots.txt you can only tell that they're blocking 02:35:24.060 |
your crawling when you try doing it and sometimes you can't even fetch their robots.txt to check whether you're allowed or not. 02:35:28.860 |
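The "good citizen" check being described is essentially the robots.txt protocol; a minimal sketch with the Python standard library is below (the user agent string is a placeholder, and this is not AI2's actual crawler).

```python
# Minimal sketch of "good citizen" crawling: check robots.txt before fetching a page.
# Just the standard-library mechanics, not AI2's crawler.
import urllib.robotparser
from urllib.parse import urljoin

def allowed_to_crawl(site: str, path: str, user_agent: str = "my-research-bot") -> bool:
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(urljoin(site, "/robots.txt"))
    try:
        rp.read()                      # may itself be blocked or unreachable
    except OSError:
        return False                   # can't even fetch robots.txt -> back off
    return rp.can_fetch(user_agent, urljoin(site, path))

print(allowed_to_crawl("https://example.com", "/some/article"))
```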
and then for a lot of um websites um there's like 02:35:37.340 |
all these technologies that historically have been have existed to make websites serving easier 02:35:42.780 |
um such as um cloudflare or dns they're now being repurposed for um blocking ai or any type of 02:35:52.300 |
crawling in a way that is very opaque to the content owners themselves um so you know you go 02:35:59.420 |
to these websites you try to access them and they're not available you get a feeling it's like 02:36:06.220 |
oh someone changed something changed on the on the dns side that it's blocking this and likely the 02:36:13.180 |
content owner has no idea they're just using uh cloudflare for better you know load balancing and 02:36:19.180 |
this is something that was sort of sprung on them uh with very little notice um and i think the 02:36:26.220 |
problem is that this um blocking really impacts people in different ways um it 02:36:35.100 |
disproportionately helps um companies that have a head start which are usually the closed labs 02:36:41.980 |
and it hurts uh newcomer players um where you either now have to do things in a sketchy 02:36:49.660 |
way um or you're never gonna get that content uh that the closed labs might have so there's been a lot 02:36:56.620 |
of coverage of this i'm gonna plug nathan's blog post again um i think the 02:37:04.140 |
title of this one is very succinct uh which is like before thinking 02:37:09.260 |
about running out of training data we're actually running out of open training data and so if one cares about 02:37:14.540 |
better open models um this should be top of mind um the other thing that has emerged is that 02:37:23.340 |
there's strong lobbying efforts on trying to define any kind of open source ai as like a new um 02:37:34.220 |
extremely risky danger um and i want to be precise here 02:37:40.380 |
um the problem is not with considering the risks of this technology every technology has risks 02:37:46.380 |
that should always be considered the thing that to me is um sort of disingenuous 02:37:52.940 |
is like just putting this ai on a pedestal um and calling it like an unknown alien technology 02:38:00.780 |
that has like new and undiscovered potentials to destroy um humanity when in reality all the 02:38:09.260 |
dangers i think are rooted in dangers that we know from existing software industry or existing 02:38:17.740 |
issues that come with when using software on um on a lot of sensitive domains like medical 02:38:25.980 |
areas and i also noticed a lot of efforts that have actually been going on and trying to make 02:38:31.500 |
these open models safe um i pasted one here uh from ai2 but there's actually like a lot of work 02:38:38.940 |
that has been going on on like okay how do you make if you're distributing this model openly 02:38:44.700 |
how do you make it safe um how what's the right balance between accessibility on open models and 02:38:50.300 |
safety um and then there's also this annoying uh brushing under the rug of concerns that are then proved to be 02:38:59.820 |
unfounded you know if you remember the beginning of this year it was all about 02:39:04.140 |
the bio risk of these open models uh the whole thing fizzled out because finally there's 02:39:11.820 |
been like rigorous research not just this paper from the Cohere folks but there's been rigorous follow-up 02:39:18.300 |
research showing that this is really not a concern that we should be worried about again there is 02:39:23.340 |
a lot of dangerous use of ai application but this one was just like a lobbying ploy to just make 02:39:30.860 |
things sound scarier uh than they actually are so i gotta preface this part by saying this is my 02:39:38.060 |
personal opinion not my employer's but i look at things like uh SB 1047 from california 02:39:45.500 |
and i think we kind of dodged a bullet on this legislation you know the open source 02:39:52.460 |
community a lot of the community came together at the last sort of the last minute um and did a 02:39:59.340 |
very good effort trying to explain all the negative impact of this bill um but um there's like 02:40:07.260 |
i feel like there's a lot of excitement on building these open models uh or like researching on these 02:40:12.860 |
open models and lobbying is not sexy uh it's kind of boring uh but um it's sort of necessary to make 02:40:20.940 |
sure that this ecosystem can can really thrive um this end of presentation i have some links 02:40:29.500 |
emails sort of standard thing in case anybody wants to reach out and if folks have questions 02:40:37.260 |
or anything they wanted to discuss it's our open floor 02:40:40.940 |
here's Sophia um one very important open model that we haven't covered 02:40:52.540 |
is Mistral so yeah yeah well it's nice to have the Mistral person yes uh to come talk and recap the year 02:40:59.900 |
for Mistral but uh while Sophia gets set up does anyone have like just thoughts or questions about 02:41:04.460 |
the progress in this space do you always have questions always i'm very curious how we should 02:41:10.140 |
build incentives to build open models things like François Chollet's uh ARC Prize and other 02:41:16.300 |
initiatives like that what is your opinion on how we should better align incentives in the community 02:41:20.940 |
so that open models stay open i think you can tap in there nice the incentive bit is like really hard 02:41:32.300 |
um it's something that we actually think a lot about internally um because 02:41:39.660 |
like building open models is risky it's very expensive um and so people don't want to take 02:41:45.340 |
risky bets um i think definitely like the challenges um like the ARC challenge i think those 02:41:54.060 |
are like very valid approaches for it um and then i think in general promoting building um any 02:42:03.740 |
kind of effort to participate in those challenges and if we can promoting doing that 02:42:09.180 |
on top of open models um and sort of really leaning into this multiplier effect um i think that 02:42:17.580 |
is a good way to go um if there were more money for um efforts um like research efforts around 02:42:27.340 |
open models there's a lot of i think there's a lot of investments in companies that at the moment 02:42:33.500 |
are releasing their model in the open which is really cool um but um it's usually more because 02:42:39.580 |
of commercial interest and not about wanting to support um these open models in the long term 02:42:46.380 |
it's a really hard problem because i think everyone is operating sort of in what everyone 02:42:52.700 |
is at their local maximum right in ways that really optimize their position on the market 02:43:02.460 |
okay somehow it's not being shared on the screen 02:43:28.140 |
uh can i ask one question yeah uh so i think one of the gaps between the closed and 02:43:34.140 |
open source models is multilinguality so the closed source models like ChatGPT were pretty 02:43:39.660 |
good on low resource languages which is not the same for the open source models right 02:43:45.020 |
so is it in your plan to improve on that space um i think in general yes is 02:43:56.220 |
here yeah just just use your natural voice yeah um i think we'll see a lot 02:44:02.460 |
of improvements there um like on the chinese side there are groups that are 02:44:08.700 |
already working on like better coverage for multilingual um support i think what our 02:44:18.140 |
challenge there is um you really want the experts who are actually in those countries 02:44:26.620 |
that use those languages to participate in the effort to give you like a very easy example 02:44:33.740 |
i'm originally from italy i think i'm terribly equipped to build a model that works well in 02:44:42.140 |
italian because one of the things you need to be able to do is have that knowledge of like okay 02:44:47.500 |
how do i access you know libraries or content that is from this region and i've been in 02:44:54.620 |
the u.s long enough that i no longer know that um so i think that the efforts that folks in 02:45:01.900 |
central europe for example are doing around like okay let's let's tap into regional communities 02:45:08.300 |
um to get access uh to bring in collaborators from those areas i think it's going to be like 02:45:15.180 |
very crucial for getting out of this area yes let me close it up 02:45:56.060 |
it's fine she's not playing any audio that's weird okay okay okay cool 02:46:06.860 |
um yeah i'm super excited to be here to talk to you guys uh about mistral uh a really short 02:46:15.260 |
and quick recap of what we have done what kind of models and products we have released in the past 02:46:21.900 |
a year and a half so um most of you already know that we are a small startup 02:46:29.420 |
founded about a year and a half ago in paris in may 2023 it was founded by three of our co-founders 02:46:36.540 |
and in september 2023 we released our first open source model Mistral 7B um yeah how many of you 02:46:44.780 |
have used or heard about Mistral 7B hey pretty much everyone thank you uh yeah it's 02:46:52.620 |
pretty popular and uh our community really loves this model and in december 2023 02:46:59.500 |
we released another popular model with the moe architecture um Mixtral 8x7B and 02:47:07.100 |
oh going into this year you can see we have released a lot of things this year 02:47:12.620 |
um first of all in february 2024 we released uh Mistral Small Mistral Large uh Le Chat which is our 02:47:20.140 |
chat interface i will show you in a little bit we released an embedding model for you know converting 02:47:28.140 |
your text into embedding vectors and all of our models are available on um the big cloud providers 02:47:37.820 |
so you can use our models on google cloud aws azure snowflake ibm so very useful for enterprises who 02:47:46.380 |
want to use our models through the cloud and in april and may this year we released another powerful 02:47:53.500 |
open source um moe model Mixtral 8x22B and we also released our first code model Codestral which is 02:48:01.820 |
amazing at 80 plus programming languages and then we provided a fine tuning service for customization 02:48:09.340 |
so because we know the community love to fine tune our models so we provide you a very nice 02:48:15.180 |
and easy option for you to fine tune our model on our platform and also we released our fine 02:48:21.020 |
tuning code base called mistral-finetune it's open source so feel free to take a look and 02:48:27.180 |
more models from july to november this year we released many many other models uh first of all 02:48:37.180 |
are the two new best small models we have Ministral 3B great for deploying on edge devices 02:48:45.340 |
we have Ministral 8B if you used to use Mistral 7B Ministral 8B is a great replacement with much 02:48:53.900 |
stronger performance than Mistral 7B we also collaborated with nvidia and open sourced 02:49:00.140 |
another model Mistral Nemo 12B another great model and just a few weeks ago we updated Mistral Large to 02:49:08.460 |
version 2 with updated state-of-the-art features and really great function calling 02:49:14.940 |
capabilities it supports function calling natively and we released two multi-modal models 02:49:21.180 |
Pixtral 12B which is open source and Pixtral Large just amazing models for not just understanding 02:49:29.980 |
images but also great at text understanding so yeah a lot of the image models are not so 02:49:36.620 |
good at text understanding but Pixtral Large and Pixtral 12B are good at both image understanding 02:49:42.540 |
and text understanding and of course we have models for research Codestral Mamba is built on 02:49:49.500 |
the mamba architecture and Mathstral is great for working with math problems so yeah those are other 02:49:57.580 |
models uh here's another view of our model lineup we have several premier models which 02:50:09.820 |
means these models are mostly available through our api i mean all of the models are available 02:50:17.020 |
throughout our api except for minister 7 3b but for the premium model they have a special license 02:50:25.660 |
the Mistral Research License you can use it for free for exploration but if you want to use it for 02:50:30.940 |
enterprise or production use you will need to purchase a license from us so on the top row here 02:50:37.580 |
we have Ministral 3B and 8B as our premier models Mistral Small for the best low latency use cases 02:50:45.820 |
Mistral Large is great for your most sophisticated use cases Pixtral Large is the frontier class 02:50:52.300 |
multimodal model and we have Codestral great for coding and then again the Mistral embedding model 02:50:58.540 |
and at the bottom of the slide here we have several apache 2.0 licensed open-weight models 02:51:06.380 |
free for the community to use and also if you want to fine tune them use them for customization or 02:51:12.460 |
production feel free to do so the latest we have is Pixtral 12B we also have Mistral Nemo 02:51:21.580 |
Codestral Mamba and Mathstral as i mentioned and we have three legacy models that we don't 02:51:28.460 |
update anymore so we recommend you move to our newer models if you are still using them 02:51:35.900 |
and then just a few weeks ago we made a lot of improvements to our chat interface Le Chat 02:51:46.300 |
how many of you have used Le Chat oh no only a few okay i highly recommend Le Chat it's 02:51:54.060 |
chat.mistral.ai it's free to use it has all the amazing capabilities i'm going to show you right now 02:52:01.180 |
but before that le chat in french means the cat so this is actually a cat logo 02:52:08.860 |
yeah if you can tell this is the cat eyes yeah so first of all i want to show you 02:52:17.020 |
something maybe let's let's take a look at image understanding 02:52:31.100 |
so here i have a receipt and i want to ask — i'm just going to get the prompt 02:52:56.460 |
yeah i had an issue with wi-fi here so hopefully it would work 02:53:03.580 |
cool so basically i have a receipt and i said i ordered a coffee and a sausage how much do i owe 02:53:17.020 |
with an 18% tip so hopefully it was able to get the cost of the coffee and the sausage 02:53:23.820 |
and ignore the other things and um yeah i don't really understand this but i think this is coffee 02:53:30.700 |
uh it's yeah nine yep and then cost of the sausage we have 22 here 02:53:38.060 |
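As a sanity check on the demo, the arithmetic the model has to perform with the prices as narrated (coffee 9, sausage 22, 18% tip) works out as follows; the real receipt amounts may of course differ.

```python
# Quick check of the receipt math as narrated (coffee 9, sausage 22, 18% tip).
# The actual receipt amounts may differ; this just shows the arithmetic involved.
coffee, sausage, tip_rate = 9.00, 22.00, 0.18
subtotal = coffee + sausage          # 31.00
tip = subtotal * tip_rate            # 5.58
total = subtotal + tip               # 36.58
print(f"subtotal={subtotal:.2f} tip={tip:.2f} total={total:.2f}")
```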
yep and then it was able to add the cost calculate the tip and all that uh great so it's great at 02:53:47.260 |
image understanding and great at uh ocr tasks so if you have ocr tasks please use it for free on 02:53:54.140 |
Le Chat and it's also available through our api. 02:54:00.380 |
also i'm going to show you a canvas example a lot of you may have used canvas with other tools before but uh 02:54:08.620 |
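For anyone who wants to reproduce the receipt question programmatically, a rough sketch against the Mistral API is below; the client method names and the image-message format reflect one version of the Python SDK and should be checked against the current docs, and the model identifier is just one published Pixtral name.

```python
# Rough sketch of asking a Mistral multimodal model the same receipt question via the API.
# SDK surface changes between versions -- treat method names and the image-message
# format as assumptions to verify against the current docs.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
response = client.chat.complete(
    model="pixtral-12b-2409",  # one published Pixtral identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "I ordered a coffee and a sausage. How much do I owe with an 18% tip?"},
            {"type": "image_url",
             "image_url": "https://example.com/receipt.jpg"},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```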
with Le Chat it's completely free again here i'm asking it to create a canvas that uses 02:54:15.420 |
PyScript to execute python in my browser so oh what's going on 02:54:30.700 |
yep okay so yeah so basically it's executing python uh here exactly what we wanted uh 02:54:43.180 |
and the other day i was trying to ask lachette to create a game for me let's see if we can 02:55:15.500 |
okay all right you get the idea i failed my mission um 02:55:31.580 |
uh cool yeah so uh as you can see Le Chat can write code for a simple game pretty 02:55:41.420 |
easily and you can ask Le Chat to explain the code make updates however you like um 02:55:49.100 |
another example there is a bar here i want to move okay right okay and uh let's go back 02:56:00.780 |
another one uh yeah we also have web search capabilities like you can ask what's the latest 02:56:10.540 |
ai news uh image generation is pretty cool generate an image about researchers in vancouver 02:56:21.500 |
uh yeah it's Black Forest Labs' FLUX Pro uh again this is free so 02:56:31.020 |
oh cool i guess researchers here are mostly from university of british columbia 02:56:39.820 |
uh that's smart uh yeah so this is Le Chat please feel free to use it uh and let me know 02:56:48.380 |
if you have any feedback we're always looking for improvement and we're going to release 02:56:52.460 |
a lot more powerful features in the coming years thank you 02:56:55.740 |
yeah i think we can open up the questions there's lunch also outside but uh if anyone 02:57:06.300 |
thought i don't think we have a youtube entry but if anyone has any thoughts on 02:57:10.700 |
mistral or omo or any of the others the open models 02:57:15.340 |
um yeah no i think we can just break for lunch and uh have a chat but thanks thanks so much to 02:57:23.020 |
the speakers thank you again we'll be back here what we're gonna have like some people presenting 02:57:28.620 |
during lunch um i i think i think basically just go grab lunch you can come back in and eat and 02:57:34.060 |
chat uh we'll have some people presenting as well right so unless you want to say you see material 02:57:39.580 |
okay maybe maybe maybe you get something off now 02:57:45.020 |
yeah hi everyone thank you so much for coming today um huge shout out to SWIX and the latent 02:57:55.180 |
space team i think it's been a great yeah let's just give it up for SWIX just real quick um i 02:58:02.220 |
did a little bit in terms of helping with the planning but i work at Notable Capital some of you 02:58:07.100 |
may have heard of GGV which was our former name um i'm on the cloud infrastructure team so basically 02:58:12.300 |
anything data dev tools um ai infrastructure as well as ai applications um and so we like to stay 02:58:19.260 |
close to those that are smarter than us which is all of you in this room um so if anyone ever wants 02:58:23.580 |
to you know brainstorm or thinking about starting a company um we're happy to collaborate we've had 02:58:28.380 |
the opportunity to partner with like amazing companies such as HashiCorp Vercel Neon 02:58:32.780 |
and many others over the years um and we're based in san francisco and new york so yeah feel free 02:58:38.380 |
to find me — Laura Hamilton — on X LinkedIn um you know if we become friends instagram yeah um thank you 02:58:45.740 |
all for coming and then we'll kick off some of the chats with aws after everyone gets lunch all right 02:59:15.420 |
hi these are up here too this is not mine although i did almost take it yeah it's not like everyone's 03:34:19.120 |
Like in my view, I don't know if I would do a traditional. 03:34:38.120 |
Well, hey everyone. Hope you enjoyed lunch. Thanks for thanks for dialing in here. 03:34:44.620 |
My name is Aaron. Wanted to give a quick shout out to the Latent Space team, Notable Capital, and swyx for organizing. 03:34:44.620 |
I've been in the role for about three years now. 03:34:59.120 |
I was a founding product hire at a series a company had a great exit there did machine learning for a while. 03:35:06.620 |
Did some strategy consulting with Google for a while and then joined AWS actually got this job on Twitter of all places. 03:35:15.120 |
I liked a tweet that was like, hey, I think more VC meetings should be over surf lessons. 03:35:21.620 |
And I got a DM back saying hey, do you want to come work at AWS, and it was off to the races from there. 03:35:31.620 |
Basically just wanted to kind of chat about how AWS works with founders, right? 03:35:37.620 |
I think everyone's aware compute and credits are kind of like the name of the game at this point. 03:35:43.620 |
I like to I like to think about ways to go deeper than that and figure out how we can add value beyond just like here's some GPUs. 03:35:55.120 |
So I wrote the PR FAQ for an accelerator program that is a 10 week program. 03:36:02.120 |
It just wrapped up at re:Invent last week where we take a couple companies from around the world and really lean in and try to co-build with them. 03:36:14.120 |
We do like product strategy, help them with fundraising. 03:36:21.120 |
There's like, you know, 700 people in the audience. 03:36:25.120 |
And that's just kind of like, you know, putting what we do on a day to day on the world stage because our whole team is dedicated to figuring out ways to, again, go beyond beyond credits, beyond compute and support. 03:36:38.620 |
Right. So we worked with founders from like day zero, haven't even incorporated. 03:36:43.120 |
We're still like bouncing ideas off of off of each other, thinking about ways to go to market. 03:36:48.120 |
And then, you know, beyond that, like as you're scaling, finding design partners and then getting you listed on marketplace and really co-selling together. 03:36:57.120 |
And we'd love to be a small part of the journey as you're considering entrepreneurship. 03:37:02.620 |
So if you want to chat about all things entrepreneurship, please please reach out. 03:37:12.120 |
If you do just want GPUs and compute and credits, happy to chat about that as well. 03:37:18.120 |
But but great to be here. And again, thanks to SWIX for hosting and to the notable capital team for having us and organizing. 03:37:25.120 |
So thanks, everyone. Enjoy the rest of the talks today. 03:37:31.620 |
Also, we have them to thank for lunch. So all the amazing lunch that we got. 03:37:36.120 |
This whole thing is like self-funded, community funded. So we're very much flying by the seat of our pants. 03:37:41.120 |
And also thank you to Laura for making all this happen. 03:37:44.120 |
OK, so we have a couple more presentations from folks, just people like launching things. 03:37:50.120 |
We got Drew, you're next, but Ben, I'm going to I'm going to call you up first. 03:37:54.120 |
Ben, are you ready? I can get Drew to go first. 03:37:58.620 |
Drew, Drew — you go, Drew. The amazing thing about what Drew's demoing is, well, by definition, it works offline. 03:38:06.120 |
And it's very, very viral. We're just so lucky to have him — I mean, just for me to be friends with him 03:38:15.120 |
and to invite him here to to show off the best way you can be reading papers. 03:38:20.120 |
So usually we we come here, we do we demo B2B SaaS and infrastructure as a service. 03:38:25.620 |
This is none of that. You want consumer hardware. We got consumer hardware. OK, go. 03:38:30.120 |
Oh, all right. I have to still hype him up a little bit. What else? 03:38:34.120 |
What else can I say about you? Drew's an insane violinist. 03:38:37.120 |
If you if you like visit his house, like he lives in the House of Musicians 03:38:42.120 |
and they just have classical music live all the time. It's insane. All right. 03:38:48.120 |
Cool. Yeah. Sean is a is a very flattering hype man. Really incredible. 03:39:04.620 |
Just a quick thanks to to latent space for for hosting this and for Sean, like being in. 03:39:11.620 |
I think we met almost two years ago at a replica thing. 03:39:16.120 |
And he's just like organized the entire scene in a way that makes it digestible for me 03:39:20.620 |
and everyone else. Thanks to latent space. So I work for a company called Daylight Computer 03:39:26.620 |
and we're making computers for serious people is one way that I put it. 03:39:32.620 |
But we want to make a better reading experience for researchers specifically 03:39:38.620 |
and a new surface for A.I. in our real lives. 03:39:44.120 |
So how do we we haven't heard a whole lot about consumer applications of A.I. today, 03:39:48.620 |
but I just want to show a demo, some demos we've been working on for how to integrate A.I. 03:39:55.620 |
more comfortably into research workflows, especially reading papers. 03:40:00.620 |
So I'll just quickly go over kind of what what is daylight. 03:40:05.620 |
We invented a new screen technology that works just with the light in the room 03:40:11.120 |
and has no blue light, better for eye strain, better for focus. 03:40:14.620 |
And we wrote an operating system to run this screen on our first product, 03:40:19.620 |
this tablet computer, the DC one, and it allows you to read outside 03:40:32.620 |
So we've kind of made it impossible to get interrupted by notifications and other distractions. 03:40:39.120 |
It's kind of like a Kindle and an iPad had a baby. 03:40:42.620 |
So the kinds of things we're doing with A.I. are to kind of integrate directly 03:40:51.620 |
And I just have a quick demo that I can give here. 03:40:56.620 |
It looks like we don't have sound, but I'll just narrate. 03:41:11.620 |
So the voice is going to be Anjan Kata, our founder, who invented the screen technology. 03:41:21.620 |
It's really all right. I can just talk through it. 03:41:24.620 |
So this is a poem, and often we want to go deeper into the text. 03:41:38.620 |
So this is the Daylight reading Rilke's "The Man Watching" poem. 03:41:42.620 |
One cool feature we have is we have a little button on the side 03:41:45.620 |
that you can press at any time and then you can talk to an A.I. 03:41:48.620 |
So I was a little bit confused by certain parts of the poem. 03:41:53.620 |
"What do they mean by 'we would become strong too' and 'not need names'?" 03:42:01.120 |
The phrase "not need names" suggests transcending individual identity. 03:42:06.620 |
And so we just kind of, as we were going through the entire poem, 03:42:09.620 |
we read it once and we kind of went back through. 03:42:13.620 |
and came away feeling like we understood it so much more. 03:42:19.620 |
"Can you tell us more about what they mean by these wrestlers in the Old Testament 03:42:30.620 |
It's referencing the biblical story of Jacob wrestling with an angel found in Genesis 32. 03:42:36.120 |
And that's just, like, incredibly cool that we're able to do this. 03:42:43.120 |
"Could you recommend a few other poems that mirror the themes of this one?" 03:42:55.120 |
"Resilient, Struggled, Personal, Gravitational, Challenged, and Simultaneous." 03:43:00.120 |
I'm gonna go a little bit in here and add these poems to your device and read them. 03:43:03.620 |
Yeah. So we want to bring that to research and to the entire device. 03:43:08.620 |
So one thing that's an advantage of owning the entire stack, 03:43:15.620 |
is we can tailor the workflows across any apps. 03:43:19.620 |
So this will work in a web browser, in your messages, email. 03:43:27.120 |
there's a sort of central AI that is running on your device. 03:43:29.620 |
It can capture that, put it in a knowledge graph, 03:43:35.620 |
And it's just available everywhere and with a hardware button. 03:43:43.620 |
if you're interested in knowledge graphs on device or quantized models, 03:43:49.620 |
And I actually have a couple of these here if people want to play with them. 03:44:02.120 |
We're sold out online probably until the beginning of next year, 03:44:35.120 |
There are like six patents on top of essentially a Game Boy screen. 03:44:44.120 |
and six patents on top of what's called RLCD or TLCD, that is, reflective or transflective LCD. 03:44:55.120 |
So it's liquid crystal, but it has no backlight required. 03:44:59.120 |
The sort of innovation is the reflective and transflective films 03:45:05.120 |
and like stack, you know, black magic to reflect the light back through the LCD. 03:45:21.120 |
And then the transflective part is 03:45:25.120 |
how you enable a backlight for nighttime use. 03:45:29.120 |
And we developed a layer that allows us to put a blue light free LED 03:45:34.120 |
that's like safe for your circadian health and suprachiasmatic nucleus and so on. 03:45:38.120 |
So you're not like burning your eyes out at midnight reading. 03:45:42.120 |
But it can come through similar to a normal computer screen. 03:45:46.620 |
So that's more or less the secret sauce here. 03:45:56.120 |
But we're going to release it with a, yeah, we're building it. 03:46:00.120 |
Yeah, it's going to be great. It's fun to play with. 03:46:04.120 |
And if you want to, you know, come by and try writing on it or reading on it 03:46:08.120 |
or like watching a video on it, just seeing how it feels, looks, 03:46:15.620 |
There will be a phone, you know, laptop, monitor, all those things. 03:46:36.120 |
We have Ben from StrongCompute, founder of StrongCompute. 03:46:40.120 |
I would say like one of those weird things where even though they're mostly a compute shop, 03:46:44.620 |
they also do a fair amount of like deep research. 03:46:48.120 |
This year, Ring Attention got a lot of attention from people for like scaling. 03:46:56.120 |
And we host a paper club. Like this is basically the in-person version 03:47:00.120 |
of the online paper club that we've been running for two years. 03:47:03.120 |
And the single best presentation, one of my favorite presentations of the year, 03:47:06.120 |
was from StrongCompute. So really grateful for you guys. 03:47:22.620 |
Did I get you to Zoom? I didn't. I don't think I did. 03:47:46.620 |
Zoom. This is mostly just because I want to capture your screen for the recording. 03:47:55.620 |
This is for the swag swap. The swag table is back there. 03:48:25.620 |
So what we're trying to do is make clusters a lot easier to use. 03:48:29.620 |
So anyone who's tried accessing clusters for training, 03:48:33.120 |
we're trying to be what you'd expect an elite DevOps team to be. 03:48:36.620 |
So here's kind of a feature list of some of the stuff we're going for. 03:48:44.620 |
So most people like, we actually started out optimizing, 03:48:48.620 |
well, we started out building compute hardware, 03:48:57.620 |
So we're messing around with CUDA kernels and data loading and that kind of stuff. 03:49:05.620 |
like getting these much greater efficiencies on the GPU. 03:49:08.620 |
Surely the easy part is just taking our work once it's done 03:49:11.620 |
and just putting it on a cloud platform and having it go. 03:49:14.620 |
And it turned out to be the complete opposite. 03:49:16.620 |
We got a whole bunch of awesome optimizations done in a few months, 03:49:21.620 |
but it took much longer to actually build a GPU orchestration platform that we wanted to use. 03:49:25.620 |
And what we realized was that there were just a lot of things missing. 03:49:33.120 |
What I'll show you is something we've been working on for a year, 03:49:35.120 |
which is a new UI for how you work with clusters. 03:49:47.120 |
maybe AWS has given you some credits, that's really nice of them. 03:49:55.120 |
Maybe you've already got some stuff with GCP. 03:50:03.120 |
And in each of those regions, you've got some number of GPUs. 03:50:09.120 |
And then you want to go and do things with them. 03:50:23.120 |
And that job's going to start on some cluster somewhere. 03:50:26.120 |
So with our system, it's pretty much that easy. 03:50:29.120 |
You don't need to worry about Linux, NVIDIA drivers, anything like that. 03:50:31.620 |
You get a Docker, you get root in the Docker. 03:50:33.620 |
It can just jump on the cluster straight away. 03:50:35.620 |
And so then a bunch of other people want to run jobs. 03:50:38.620 |
And so then you end up backed up in a Slurm queue. 03:50:41.620 |
And you go, "Hang on, don't we have all this other compute 03:50:45.620 |
What's it going to take to actually migrate our workloads 03:50:49.620 |
Well, what we've built is the ability to migrate jobs between clusters. 03:51:16.620 |
I've got a little bit of a video showing some of our speed demos. 03:51:31.620 |
And we have the world's fastest container registry as well. 03:51:34.620 |
So yeah, there's no more slow stuff on the cluster. 03:51:39.620 |
It's also a lot cheaper than what you'd be used to paying for egress. 03:51:42.620 |
So the vision for this is if you imagine a CPU 03:51:46.620 |
and there's a scheduler and it's sending workloads to the cores. 03:51:52.620 |
What if each core was actually a cluster in a data center? 03:52:00.620 |
And obviously, theoretically that's possible, 03:52:03.620 |
but it's all about the APIs and the interface work. 03:52:05.620 |
Normally, if you want to go and start sending workloads like that, you go 03:52:12.620 |
to the dozen DevOps people and they'll get started 03:52:16.620 |
on a multi-week, multi-month project to do that. With us it just works, 03:52:21.620 |
and without the need for any DevOps work. 03:52:24.620 |
We've also got a few other features here as well. 03:52:26.620 |
So you'll have some data sets and they might be quite large. 03:52:56.620 |
So I can go pull that down to any cluster I want. 03:53:00.620 |
And then I can go and I can do training on that data set, 03:53:07.620 |
So one of the issues that we've seen people encounter is, I can get a dev machine, 03:53:12.620 |
but it's not going to have high-speed access to my data set. 03:53:17.620 |
So here we can just set up as many nodes as we want with workstations. 03:53:21.120 |
You can carve off as many GPUs as you'd like. 03:53:23.120 |
And that'll be your container that you can work out of 03:53:29.120 |
That's that 90 gigabyte a second access to that data set. 03:53:33.120 |
And that way you can go, so this is the entire dev cycle. 03:53:38.120 |
We're not doing any production inference hosting right now, 03:53:40.120 |
but you can have fast access to your data sets from your dev container. 03:53:44.620 |
You can train on the clusters very, very easily. 03:53:47.120 |
What we want to do is eliminate any developer waiting time 03:53:53.120 |
So what does that look like for some other examples? 03:53:57.120 |
We can also, because we're able to save the state of a cluster, 03:54:06.120 |
And here we can actually just go and pack as many jobs as we want 03:54:12.620 |
and choose how often they're going to rotate. 03:54:21.120 |
Maybe this job's so important and our cluster is so backed up 03:54:24.120 |
that we just actually want to get this job some dedicated space. 03:54:30.120 |
And now we've actually gone and found resources on the cloud. 03:54:33.120 |
So we're integrated with six clouds and that's growing. 03:54:57.120 |
So if you want to launch a cluster-scale workload, 03:55:08.120 |
You'll see if your stuff works at cluster scale. 03:55:12.620 |
All the other jobs will keep going with that state saving. 03:55:32.620 |
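The job packing and rotation described here depend on being able to save and restore training state at arbitrary points. Below is a minimal sketch of that general checkpoint/resume pattern in PyTorch; it is not StrongCompute's actual tooling, and the checkpoint path is an assumed shared location.

```python
# Minimal checkpoint/resume pattern that makes job rotation possible:
# if state is saved regularly, a scheduler can pause a job and resume it
# later, possibly on a different cluster. Not StrongCompute's code; the
# checkpoint path is an arbitrary assumption.
import os
import torch
import torch.nn as nn

CKPT = "/shared/checkpoints/job_123.pt"  # hypothetical shared location

model = nn.Linear(128, 10)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
start_step = 0

# Resume if a previous rotation left a checkpoint behind.
if os.path.exists(CKPT):
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_step = state["step"]

for step in range(start_step, 1_000):
    loss = model(torch.randn(32, 128)).square().mean()  # stand-in loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:  # save often enough that preemption loses little work
        torch.save({"model": model.state_dict(),
                    "opt": opt.state_dict(),
                    "step": step + 1}, CKPT)
```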
for any folks who are heading home early from Europe. 03:55:39.120 |
Just like people would train chess models on it. 03:55:44.620 |
People were really having a lot of fun with it. 03:56:01.620 |
Yeah, I saw this hackathon and I was wondering 03:56:28.120 |
at figuring out how to do this stuff in a weekend. 03:56:35.120 |
and right now the models will quite easily defeat me. 03:56:52.620 |
We've given out about a dozen of these so far. 03:56:52.620 |
And you just have to be doing something cool, 03:56:58.620 |
something that swyx would want to host at an event. 03:57:02.620 |
Yeah, some of the stuff I didn't show you today. 03:57:56.120 |
You go, "All right, this developer is allowed 03:58:16.120 |
Yeah, so the question was how we compare this 03:59:01.120 |
We haven't applied a bunch of our speed tooling to it, 03:59:02.620 |
and it's only about a 5% overhead at the moment. 03:59:11.620 |
you can do anything except install kernel modules. 04:19:16.120 |
I'm going to ask Gemini real quick a couple of questions. 04:19:28.620 |
Could you explain to me how I might go about changing my runtime type in Colab? 04:19:28.620 |
I understand that you can see the Colab interface. 04:19:43.620 |
Is there anything specific you would like to discuss or ask about it? 04:20:06.620 |
So, Gemini, could you explain to me how I would go about changing my runtime type? 04:20:16.620 |
you need to go to the Runtime menu at the top of the page. 04:20:22.620 |
A window will open where you can select the hardware accelerator you want. 04:20:28.620 |
So which one should I select if I wanted to train a model 04:20:39.620 |
If you're looking to train a model and the options shown in the hardware accelerator section 04:20:43.620 |
of the Change Runtime Type pop-up menu are the ones you have to choose from, 04:20:48.120 |
the T4 GPU is a good choice for general model training. 04:20:50.620 |
However, if your model training requires a lot of memory, 04:21:00.620 |
So, Gemini, how about this key section here off to the left? 04:21:07.620 |
How would I go about adding a new key or a new secret? 04:21:36.620 |
Yeah, you can play the audio from the computer. 04:22:08.120 |
So when you play, is it coming from the laptop now? 04:23:05.120 |
Can you help me understand how to add a new secret? 04:23:10.620 |
Looking at the screen, you will need to click the add new secret button 04:23:13.620 |
on the left side panel of the screen below the other secrets. 04:23:17.620 |
After that, you can fill out the information for the new secret. 04:23:23.620 |
All right. So for Colab, how would I go about changing my runtime type? 04:23:33.120 |
To change your runtime type in Colab, you can click on the runtime menu 04:23:35.620 |
at the top of the page, and then select change runtime type. 04:23:39.620 |
From there, you can choose between different options, such as GPU or TPU. 04:23:45.620 |
Excellent. Explain to me what the difference between GPUs and TPUs are. 04:23:51.620 |
Certainly. GPUs, or graphics processing units, are designed to handle 04:23:56.620 |
parallel processing, which is ideal for tasks like rendering images and videos, 04:24:02.120 |
and also machine learning. TPUs, or tensor processing units, 04:24:05.620 |
are custom built by Google and specifically designed for deep learning tasks, 04:24:09.620 |
making them often more efficient than GPUs for these tasks. 04:24:16.620 |
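If you want to confirm which accelerator your Colab runtime actually ended up with, one quick check from a code cell is to ask PyTorch, which Colab ships by default. This is a generic sketch, not part of the demo.

```python
# Quick check of which accelerator the current Colab runtime is using.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU attached; the runtime may be CPU-only or a TPU.")
```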
How would I go about accessing the terminal in Colab? 04:24:21.620 |
To access the terminal in your Colab notebook, you can click on the tools menu 04:24:25.620 |
at the top of the page, and then select command palette. 04:24:29.120 |
Then, type terminal in the search bar, and click on open a terminal. 04:24:33.620 |
This will open a terminal within your Colab environment. 04:24:38.620 |
And what command would I need to type in order to list all of the files in my current directory? 04:24:46.620 |
To list all the files in your current directory, you can use the command ls. 04:24:56.120 |
Are there any other config options that I should add for ls 04:24:59.620 |
to list all of the files, not just the directories? 04:25:03.620 |
Certainly. If you want to see all files, including hidden files, you can use ls -a. 04:25:10.620 |
This will show all files in the current directory. 04:25:13.620 |
If you would like to list all files in a detailed view, 04:25:16.620 |
including their permissions, owner, size, and modification time, you can use ls -l. 04:25:23.120 |
Also, combining both, ls -al will show a detailed list of all files, 04:25:41.620 |
Yeah, you know, Gemini, I would really prefer if you addressed me 04:25:44.620 |
and started talking like you're a pirate, then. 04:25:48.620 |
Ahoy, matey. What be your next question, then? 04:25:54.620 |
Amazing. Help me understand what the tools command palette 04:26:05.620 |
Aye, matey. The command palette be a handy tool for you. 04:26:09.620 |
It be a way to quickly access a wide range of commands and actions in Colab. 04:26:13.620 |
You can use it to add a Gemini prompt cell, add a code cell, 04:26:17.620 |
change your runtime type, and many other things. 04:26:33.620 |
I want you to have a Texas accent like John Lee. 04:26:38.620 |
Alrighty then, partner. What can I do for ya now? 04:26:41.620 |
Let's talk about that there, text like a true Texan. 04:26:52.620 |
Let me know if you need anything else, you hear? 04:26:55.620 |
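For reference, the voice and screen interactions in this demo go through the Multimodal Live API. Here is a rough sketch of opening a live session with the google-genai SDK, based on the quickstart published around the Gemini 2.0 launch; the exact method names, model id, and config keys may have changed since, so treat it as an approximation rather than current reference code.

```python
# Rough sketch of a Multimodal Live session with the google-genai SDK,
# following the early quickstart; verify names against current docs.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    config = {"response_modalities": ["TEXT"]}  # or ["AUDIO"] for spoken replies
    async with client.aio.live.connect(model="gemini-2.0-flash-exp",
                                       config=config) as session:
        await session.send(input="Talk like a pirate from now on.",
                           end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```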
So in addition to being able to interact with screens, 04:27:02.620 |
Gemini is also capable of natively outputting image tokens, 04:27:06.620 |
though we haven't released that via AI Studio today. 04:27:10.620 |
We can also couple Gemini together with code execution, 04:27:14.620 |
function calling, so you can do things like say, 04:27:18.120 |
Gemini, watch my screen, and if you see anything 04:27:21.620 |
that looks like a particular kind of notification 04:27:25.620 |
or a particular kind of flag, set off a call to a specific tool. 04:27:32.620 |
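The "watch for a flag, then call a tool" behaviour is function calling. A minimal sketch with the google-generativeai Python SDK is below; the notify_me tool and the prompt are invented for illustration, and the demo's live video stream is simplified here to a single text turn.

```python
# Minimal function-calling sketch with the google-generativeai SDK.
# The notify_me tool and the trigger prompt are invented for illustration.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

def notify_me(message: str) -> str:
    """Send an alert to the user (stand-in for a real notification hook)."""
    print("ALERT:", message)
    return "delivered"

model = genai.GenerativeModel("gemini-1.5-flash", tools=[notify_me])
chat = model.start_chat(enable_automatic_function_calling=True)
resp = chat.send_message(
    "If the following screen text contains a build failure, call notify_me: "
    "'CI: job 42 FAILED at step deploy'"
)
print(resp.text)
```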
I also like using it to help me as kind of like a posture corrector. 04:27:38.620 |
So if I give Gemini access to my camera, which you can see here, 04:27:48.620 |
It can tell me if my posture is correct or incorrect. 04:27:56.620 |
Should I stand up straighter? Do I have good posture? 04:28:03.620 |
Okay. Looking at the video, your posture does seem slightly slumped. 04:28:08.620 |
To improve it, you might try standing up straighter, 04:28:22.620 |
as you're kind of sitting at your desk to see. 04:28:26.620 |
It supports different kinds of system instructions, 04:28:26.620 |
so you can add things like the "speak like a pirate" instruction. 04:28:29.620 |
And then there are also a few different voice options. 04:28:44.620 |
that you can play around with to test out some of your favorites. 04:28:52.620 |
So if you don't want to have audio out responses, 04:29:02.620 |
to help experiment with things like bounding boxes. 04:29:05.620 |
So you can see Gemini kind of identify bounding box locations 04:29:17.120 |
and then the armadillo and the fox off to the side. 04:29:21.120 |
It's also capable of doing this for things like socks. 04:29:24.120 |
So being able to sort and filter between different kinds of socks. 04:29:28.120 |
And then also for different kinds of desserts. 04:29:33.120 |
So if you want to have bounding boxes natively output, 04:29:36.120 |
this is something that's supported not just with images, 04:29:44.120 |
you can get started with it today at aistudio.google.com. 04:29:50.120 |
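For the bounding-box output shown above, here is a small sketch of how you might request boxes over a local image via the google-generativeai SDK. The image path is a placeholder, and the [ymin, xmin, ymax, xmax] coordinates normalized to 0-1000 follow Gemini's documented convention at the time; verify against the current docs.

```python
# Asking Gemini for bounding boxes over a local image.
# Placeholder image path and API key; box format per the prompt below.
import json
import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

img = PIL.Image.open("desserts.jpg")  # placeholder image
prompt = ("Return a JSON list of objects in this image, each with a 'label' and "
          "a 'box_2d' of [ymin, xmin, ymax, xmax] normalized to 0-1000.")
resp = model.generate_content([img, prompt])

text = resp.text.strip()
if text.startswith("`"):  # the model may wrap the JSON in a code fence
    text = text.strip("`").removeprefix("json").strip()
boxes = json.loads(text)

w, h = img.size
for item in boxes:
    ymin, xmin, ymax, xmax = item["box_2d"]
    # Convert back to pixel coordinates for drawing or cropping.
    print(item["label"], (xmin * w // 1000, ymin * h // 1000,
                          xmax * w // 1000, ymax * h // 1000))
```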
So all of this is freely available for you to use today to try out 04:29:54.120 |
and to also use if you want to create API keys 04:30:09.120 |
So the question was, can you speak to the agentic research? 04:30:15.620 |
how much I can speak to without getting fired. 04:30:22.620 |
we did release a couple of different options, 04:30:27.620 |
which is just using kind of commodity Gemini available APIs. 04:30:31.620 |
So you could test them out, use them for your projects today. 04:30:35.120 |
We also released something called Project Mariner, 04:30:39.620 |
which is able to interact with websites directly within the browser. 04:30:43.620 |
Again, strongly encourage you to try out multimodal streaming API 04:30:49.620 |
And you'll probably be able to get very close 04:30:51.620 |
to the kinds of experiments that you saw released just via video. 04:30:55.620 |
But those are the kinds of things that we're focusing on for agents, 04:31:00.120 |
not just being able to understand and generate text and code, 04:31:06.620 |
with these multimodal and streaming experiences. 04:31:49.620 |
I've tried Colab before, but I've never tried the cloud interface. 04:32:05.120 |
how I would use one of the cloud models on this interface? 04:32:11.120 |
Oh, I had switched it to only output just text, I think. 04:32:54.620 |
or one of the cloud models within the screen? 04:32:59.620 |
For some reason, it's not wanting to have the audio outputs anymore. 04:33:40.120 |
For some reason, the audio isn't wanting to work for me anymore. 04:34:08.120 |
it's not wanting to understand the audio anymore. 04:34:34.620 |
No, no, I think the next speaker doesn't have audio. 04:34:45.620 |
And also, I encourage you all to go try it out 04:35:36.620 |
Eugene is a member of our paper club every week, 04:35:42.120 |
He's got a whole article about hardware scaling