Why Compound AI + Open Source will beat Closed AI — with Lin Qiao, CEO of Fireworks AI
00:00:06.160 |
This is Alessio, partner and CTO at Decibel Partners, 00:00:08.920 |
and I'm joined by my co-host, Swyx, founder of Smol AI. 00:00:11.920 |
- Hey, and today we're in a very special studio 00:00:29.160 |
but I think our relationship is a bit unusual 00:00:34.800 |
Yeah, I'm super excited to talk about very interesting 00:00:41.200 |
- You just celebrated your two-year anniversary yesterday. 00:00:45.080 |
We circle around and share all the crazy stories 00:00:47.480 |
across these two years, and it has been super fun. 00:00:51.300 |
All the way from when we experienced the Silicon Valley Bank run. 00:00:57.540 |
- To when we deleted some data that shouldn't have been deleted. 00:01:02.540 |
Operationally, we went through a massive scale 00:01:08.160 |
where we actually are busy getting capacity to... 00:01:13.160 |
Yeah, we learned to kind of work with it as a team 00:01:17.260 |
with a lot of brilliant people across different places, 00:01:24.640 |
- When you started, did you think the technical stuff 00:01:27.280 |
would be harder or the bank run and then the people side? 00:01:34.600 |
the hardest thing is going to be building the product, 00:01:36.520 |
and then you have all these different other things. 00:01:38.440 |
So, were you surprised by what your experience has been? 00:01:44.600 |
my focus has always been on the product side, 00:01:49.420 |
And I didn't realize the rest would be so complicated. 00:02:00.120 |
So, I think I just somehow don't think about it too much 00:02:04.180 |
and solve whatever problems come our way, and it worked. 00:02:08.440 |
- So, I guess let's start at the pre-history, 00:02:13.740 |
You ran the PyTorch team at Meta for a number of years, 00:02:34.400 |
- My background is deep in distributed systems, 00:02:43.180 |
And I saw this tremendous amount of data growth, 00:02:50.440 |
And it's clear that AI is driving all this data generation. 00:02:58.880 |
Meta was going through the transition from mobile-first to AI-first. 00:03:05.040 |
And there's a fundamental reason behind that sequence, 00:03:07.880 |
because mobile-first gave a full range of user engagement 00:03:14.320 |
And all this user engagement generated a lot of data, 00:03:19.560 |
So, then the whole entire industry was also going through that shift, 00:03:34.180 |
and I'm like, I want to dive in there and help this movement. 00:03:44.940 |
There was a kind of proliferation of AI frameworks, 00:03:49.940 |
but all of those AI frameworks focused on production, 00:03:59.280 |
and they use that to drive the model actuation 00:04:16.620 |
and I'm gonna do something different for myself. 00:04:21.720 |
PyTorch actually started as the framework for researchers. 00:04:34.100 |
There are so many researchers across academia, 00:04:40.460 |
and they put their results out there in open source. 00:04:43.620 |
And that powers the downstream productionization. 00:04:58.740 |
So, that's kind of a strategy behind PyTorch. 00:05:02.980 |
it's kind of classic that Meta established PyTorch 00:05:05.580 |
as the framework for both research and production. 00:05:10.540 |
And we had to kind of rethink how to architect PyTorch 00:05:13.380 |
so we can really sustain production workloads, 00:05:18.100 |
since all these production concerns were never a concern before, 00:05:37.500 |
from site integrity detecting bad content automatically using AI, 00:05:44.340 |
to image classification, object detection, all of this. 00:05:47.140 |
And also across AI running on the server side, 00:05:49.940 |
on mobile phones, on AR/VR devices, the whole wide spectrum. 00:05:54.580 |
So by that time, we actually basically managed 00:05:57.780 |
to support AI ubiquitously, everywhere across Meta. 00:06:02.540 |
But interestingly, through open source engagement, 00:06:07.940 |
this industry started to take on the AI-first transition. 00:06:22.460 |
For many companies we engaged with through PyTorch, 00:06:28.980 |
hey, if we create Fireworks and support the industry 00:06:44.220 |
with extreme optimization, the industry will be different. 00:06:58.620 |
- When you and I chatted about like the origins of Fireworks, 00:07:01.780 |
it was originally envisioned more as a PyTorch platform. 00:07:06.380 |
And then later became much more focused on generative AI. 00:07:13.300 |
- Right, so I would say our initial blueprint 00:07:22.020 |
and there's no SaaS platform to enable AI workloads. 00:07:34.340 |
Because in 2022, there was still like TensorFlow, 00:07:41.580 |
and PyTorch was kind of getting more and more adoption, 00:07:45.140 |
but there was no PyTorch-first SaaS platform in existence. 00:08:04.940 |
Instead of building a horizontal PyTorch cloud, 00:08:07.060 |
we want to build a verticalized platform first. 00:08:13.140 |
And interestingly, we started the company in September 2022, 00:08:16.980 |
and in October, November, OpenAI announced ChatGPT. 00:08:21.700 |
And then boom, then when we talk with many customers, 00:08:28.340 |
So of course, there are some open-source models. 00:08:32.620 |
but people are already putting a lot of attention there. 00:08:35.700 |
Then we decide that if we're going to pick a vertical, 00:08:39.620 |
The other reason is all Gen-AI models are PyTorch models. 00:08:44.260 |
We believe that because of the nature of Gen-AI, 00:08:47.020 |
it's going to generate a lot of human consumable content. 00:08:58.900 |
Our prediction is for those kinds of applications, 00:09:01.700 |
the inference is much more important than training 00:09:12.860 |
Of course, each training round could be very expensive. 00:09:15.980 |
Although PyTorch supports both inference and training, 00:09:23.100 |
And we launched our public platform August last year. 00:09:35.860 |
We started with LLMs, and later on, we added a lot of models. 00:09:43.220 |
So we love to kind of dive deep into what we offer. 00:09:46.180 |
So, but that's a very fun journey in the past two years. 00:09:49.780 |
- What was the transition from when you started focused on PyTorch, 00:09:53.220 |
and people wanted to understand the framework and get it live. 00:09:56.340 |
And now I would say maybe most people that use you 00:09:58.500 |
don't even really know much about PyTorch at all. 00:10:08.060 |
you were just like, "Hey, most people just care 00:10:43.580 |
first hire a team who is capable of crunching data, 00:11:01.860 |
and not many companies can afford it, actually. 00:11:05.300 |
And Gen-AI is a very different game right now 00:11:12.620 |
That makes AI much more accessible as a technology. 00:11:19.740 |
they can interact with Gen-AI models directly. 00:11:34.980 |
doesn't make any sense anymore with this new technology. 00:11:38.620 |
And then building easy, accessible APIs is the most important. 00:11:44.540 |
we decided we're going to be OpenAI compatible. 00:12:01.180 |
Gemini announced that they have OpenAI compatible APIs. 00:12:06.180 |
to adopt it overnight, and then we have everyone. 00:12:17.900 |
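For readers who want a concrete picture of what "OpenAI compatible" means in practice, here is a minimal sketch using the standard `openai` Python client pointed at Fireworks' OpenAI-compatible endpoint; the model identifier is only an example, so substitute whatever model you actually want from the catalog.

```python
# Minimal sketch: the standard OpenAI client, repointed at Fireworks'
# OpenAI-compatible endpoint. The model id below is an example; check the
# Fireworks model catalog for the exact identifier you want.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # example model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```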
And Meta has decided to donate many very, very strong open source models. 00:12:29.740 |
the upper-level stack, built on top of Llama models. 00:12:37.180 |
They instead want to build a community around the stack 00:12:49.980 |
because they are kind of creating the top-of-the-line 00:12:54.540 |
because this is the most used open source model. 00:12:57.340 |
So I think it's really a lot of fun working at this time. 00:13:01.540 |
- I've been a little bit more doubtful on Llama Stack. 00:13:28.980 |
That's why I kind of will work very closely with them 00:13:33.340 |
The feedback to the Meta team is very important. 00:13:35.660 |
So then they can use that to continue to improve the model 00:13:46.420 |
And I know Meta team would like to kind of work 00:13:49.340 |
with a broader set of community, but it's very early. 00:14:01.100 |
you started betting heavily on this term of compound AI. 00:14:03.820 |
It's not a term that we've covered very much in the podcast, 00:14:06.460 |
but I think it's definitely getting a lot of adoption 00:14:09.340 |
from Databricks and the Berkeley people and all that. 00:14:16.100 |
- Right, so let me give a little bit of context 00:14:24.140 |
there was no message, and now it's like on your landing page. 00:14:31.300 |
from when we first launched our public platform. 00:14:34.540 |
We are a single product, and we are a distributed 00:14:36.380 |
inference engine, where we do a lot of innovation, 00:14:45.860 |
and build distributed disaggregated execution, 00:14:50.180 |
inference execution, and build all kinds of caching. 00:14:55.940 |
So it is the fastest, most cost-efficient inference platform. 00:15:00.540 |
we know we basically have a special PyTorch build for that, 00:15:07.900 |
we realized, oh, the distributed inference engine, 00:15:14.940 |
then everyone come in, and no matter what kind of 00:15:26.140 |
all customers have different kind of use cases. 00:15:28.460 |
The use cases come in all different form and shape. 00:15:37.900 |
with the data distribution in the training data 00:15:46.580 |
what's not important, like in preparing data for training. 00:15:57.540 |
So then we're saying, okay, we want to heavily invest 00:16:02.740 |
And we actually announced it, called FireOptimizer. 00:16:04.980 |
So FireOptimizer basically helps users navigate 00:16:16.180 |
And even for one company, for different use case, 00:16:22.100 |
So we automate that process for our customer. 00:16:32.620 |
And then we spit out inference deployment config 00:16:43.740 |
So that product thinking is one size fits one. 00:16:43.740 |
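To make the "one size fits one" idea concrete, here is a purely hypothetical sketch of what a workload-specific deployment configuration could look like; every field name below is invented for illustration and is not the actual FireOptimizer output format.

```python
# Hypothetical illustration only: invented field names showing the *shape* of
# a per-workload deployment config (quality/latency/cost knobs), not the real
# FireOptimizer output.
example_deployment_config = {
    "base_model": "accounts/fireworks/models/llama-v3p1-70b-instruct",  # example id
    "hardware": {"gpu_sku": "H100", "count": 8, "layout": "disaggregated"},
    "quantization": "fp8",                      # trade a little quality for cost
    "speculative_decoding": {
        "enabled": True,
        "draft_model": "example-small-draft",   # placeholder name
    },
    "targets": {"p50_latency_ms": 300, "max_cost_per_1m_tokens_usd": 0.50},
}
print(example_deployment_config["targets"])
```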
We also provide a huge variety of state-of-the-art models, 00:16:49.740 |
ranging from text to state-of-the-art audio models. 00:17:02.820 |
we realize, oh, audio and text are very, very close. 00:17:06.420 |
Many of our customers start to build assistants, 00:17:31.020 |
because a lot of information doesn't live in plain text. 00:17:34.420 |
A lot of information live in multimedia format, 00:17:50.060 |
So vision is important, and we also support vision models. 00:17:52.580 |
Various different kinds of vision models specialize 00:17:54.580 |
in processing different kinds of sources and extraction. 00:17:58.580 |
And we're also gonna have another announcement 00:18:08.220 |
and then extract very accurate information out 00:18:19.380 |
And in addition to that, we also support text to image, 00:18:22.180 |
image generation models, text to image, image to image, 00:18:25.100 |
and we're adding text to video as well in our portfolio. 00:18:28.540 |
So it's a very comprehensive model catalog, 00:18:39.260 |
and then we realize one model is not sufficient 00:18:44.060 |
And it's very clear, because one is, the model hallucinates, 00:18:47.860 |
and many customers, when they onboard this Gen-AI journey, 00:18:52.340 |
think Gen-AI is gonna solve all my problems magically, 00:18:54.460 |
but then they realize, oh, this model hallucinates. 00:18:57.100 |
It hallucinates because it's not deterministic, 00:19:00.540 |
So it's designed to always give you an answer, 00:19:14.380 |
And different models also have different specialties. 00:19:16.900 |
To solve a problem, you want to ask a different specialized model 00:19:25.060 |
and have an expert model solve that task really well. 00:19:28.140 |
And of course, the model doesn't have all the information. 00:19:32.140 |
because the training data is finite, not infinite. 00:19:34.580 |
So model oftentimes doesn't have real-time information. 00:19:49.660 |
A compound AI system basically is gonna have multiple models, 00:19:58.180 |
plus access to APIs, whether it's public APIs, internal proprietary APIs, 00:20:20.260 |
but it's public information, like MongoDB is our investor, 00:20:23.740 |
and we have been working closely with them for a while. 00:20:32.180 |
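As a rough, self-contained sketch of the compound AI idea just described — a model call grounded by an external knowledge source — the snippet below stubs out retrieval with a placeholder function; in a real system that step would hit a vector store (for example MongoDB Atlas) or an internal API, and the model id is again only an example.

```python
# Rough sketch of a compound AI flow: retrieve context externally, then have
# the model answer grounded in it. The retrieval step is a stub; the model id
# is an example.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

def retrieve_context(query: str) -> str:
    # Placeholder for a vector-database or internal-API lookup.
    return "Fireworks AI provides a distributed inference engine for open models."

def answer_with_context(question: str) -> str:
    context = retrieve_context(question)            # step 1: external knowledge
    response = client.chat.completions.create(      # step 2: grounded model call
        model="accounts/fireworks/models/llama-v3p1-8b-instruct",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer_with_context("What does Fireworks AI build?"))
```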
it's almost like you're centralizing a lot of the decisions 00:20:39.140 |
It's like you have GPUs in like a lot of different clusters, 00:20:41.740 |
so like you're sharding the inference across. 00:20:45.460 |
So first of all, we run across multiple GPUs. 00:20:49.620 |
But the way we distribute across multiple GPUs is unique. 00:20:54.060 |
We don't distribute the whole model monolithically 00:21:26.700 |
like different continent wakes up at a different time. 00:21:29.740 |
And you want to kind of load balancing across. 00:21:35.300 |
we manage various different kinds of hardware SKUs, 00:21:35.300 |
whether it's long context, short context, long generation. 00:21:44.580 |
So all these different types of workload are best fitted 00:21:47.820 |
the image that Ray, I think, has been working on 00:22:07.700 |
with like all the different modalities that you offer. 00:22:10.140 |
Like to me, it's basically you offer the open source version 00:22:13.620 |
of everything that OpenAI typically offers, right? 00:22:31.940 |
I think we're betting on the open source community 00:22:41.140 |
- And there's amazing video generation companies. 00:22:48.140 |
Like cross-border, the innovation is off the chart 00:22:58.460 |
- I think I want to restate the value proposition 00:23:00.420 |
of Fireworks for people who are comparing you 00:23:02.940 |
versus like a raw GPU provider, like a RunPod, 00:23:08.820 |
which is like you create the developer experience layer 00:23:12.380 |
and you also make it easily scalable or serverless 00:23:25.860 |
for all large language models, all your models. 00:23:32.740 |
- Yeah, almost for all models we serve, we have. 00:23:52.460 |
- Yeah, I think the typical challenge for people 00:23:59.860 |
who are also offering open source models, right? 00:24:05.100 |
like a good experience for all these customers. 00:24:07.580 |
But if your existence is entirely reliant on people 00:24:17.660 |
So that's the kind of foundation we build on top of. 00:24:28.900 |
So what's happening in the industry right now 00:24:39.740 |
They help me understand the existing way of doing PowerPoint, 00:24:56.380 |
how to fit my storytelling into this format, 00:25:19.460 |
combined with automated content generation through Gen-AI, 00:25:24.580 |
is the new thing that many founders are doing. 00:25:34.620 |
they are consumer, personal, and developer facing, 00:25:40.180 |
It's just a kind of product experience we all get used to. 00:25:46.340 |
Otherwise, nobody wants to spend time, right? 00:25:48.740 |
So again, and then that requires low latency. 00:25:52.700 |
the nature of consumer, personal, and developer facing 00:25:57.180 |
You want to scale up to product market fit quickly. 00:26:07.740 |
But when I scale, I scale out of my business. 00:26:09.900 |
So that's kind of very funny to think about it. 00:26:13.020 |
So then having low latency and low cost is essential 00:26:18.020 |
for those new applications and products to survive 00:26:25.620 |
our distributed inference engine and FireOptimizer. 00:26:43.940 |
And we automate that because we don't want you 00:26:46.980 |
as app developer or product engineer to think about 00:26:49.740 |
how to figure out all these low-level details. 00:27:07.380 |
Every week, there's at least a new model coming out. 00:27:43.180 |
You give developer tools to dictate how to do it. 00:27:49.300 |
where a developer tells what they want to do, not how. 00:27:52.660 |
So these are completely two different designs. 00:27:55.380 |
So the analogy I want to draw is in the data world, 00:27:59.740 |
the database management system is a declarative system 00:28:19.900 |
And the database management system will figure out, 00:28:22.340 |
generate a new best plan, and execute on that. 00:28:34.660 |
On the imperative side, there are a lot of ETL pipelines. 00:28:56.460 |
I don't think one is gonna subsume the other, 00:29:02.740 |
because from the lens of the app developer and product engineer, they care about the what, not the how. 00:29:08.220 |
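To ground the database analogy, here is a small self-contained example using Python's built-in sqlite3: the declarative query states what is wanted and lets the engine plan the how, while the imperative version spells out each step by hand.

```python
# Declarative vs. imperative, in miniature: same result, two styles.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (model TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("llama-8b", 120.0), ("llama-70b", 450.0), ("llama-8b", 95.0)],
)

# Declarative: say *what* you want; the engine decides *how* to compute it.
declarative = conn.execute(
    "SELECT model, AVG(latency_ms) FROM requests GROUP BY model"
).fetchall()

# Imperative: spell out *how*, step by step.
totals = {}
for model, latency in conn.execute("SELECT model, latency_ms FROM requests"):
    totals.setdefault(model, []).append(latency)
imperative = {m: sum(v) / len(v) for m, v in totals.items()}

print(declarative, imperative)
```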
- I understand that's also why PyTorch won as well, right? 00:29:26.100 |
So another announcement is we will also announce 00:29:38.020 |
And this model is inspired by o1's announcement. 00:29:42.460 |
You should see that by the time we announce this or soon. 00:29:52.860 |
We actually have trained a model called FireFunction. 00:30:08.740 |
We have a pre-baked set of APIs the model has learned. 00:30:18.340 |
So we have a very high quality function calling model 00:30:28.180 |
that you don't even need to use function calling model. 00:30:35.060 |
approaching very high, like OpenAI's quality. 00:30:50.620 |
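Here is a hedged sketch of OpenAI-style function calling against a function-calling model such as FireFunction; the model id and the tool definition are illustrative, and a production caller should handle the case where the model chooses not to call a tool.

```python
# Sketch of OpenAI-style tool calling; model id and tool are illustrative.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # hypothetical tool
        "description": "Get the latest price for a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",  # example model id
    messages=[{"role": "user", "content": "What is NVDA trading at?"}],
    tools=tools,
)

message = resp.choices[0].message
if message.tool_calls:  # the model decided a tool call is needed
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```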
is this a next Gemini model or a MADIS model? 00:30:57.500 |
We're like watching the Reddit discussion right now. 00:31:00.420 |
- I mean, I have to ask more questions about this. 00:31:07.300 |
it's a single model or whether it's like a chain of models. 00:31:10.420 |
And basically everyone on the Strawberry team 00:31:17.100 |
for reinforcement learning, chain of thought, 00:31:24.500 |
Have you done the same amount of work on RL as they have 00:31:32.100 |
Where I do agree is the caliber of the team is very high, right? 00:31:51.300 |
We are definitely, for that, I fully agree with them. 00:31:54.740 |
But we're taking a completely different approach 00:32:02.140 |
All of that is because we build on the shoulders of giants, right? 00:32:05.140 |
So the current model available we have access to 00:32:09.300 |
The future trend is the gap between the open source model, 00:32:19.180 |
That's why I think our early investment in inference 00:32:22.820 |
and all the work we do around balancing across quality, 00:32:29.780 |
because we have accumulated a lot of experience there 00:32:32.260 |
and that empower us to release this new model 00:32:40.340 |
what do you think the gap to catch up will be? 00:32:44.700 |
with open source models eventually will catch up. 00:32:47.340 |
And I think with GPT-4, then with Llama 3.2, 3.1, 405B, 00:32:52.420 |
And then o1 just reopened the gap so much, and it's unclear. 00:32:55.900 |
Obviously you're saying your model will have- 00:33:02.340 |
- So here's the thing that's happened, right? 00:33:06.620 |
But in reality, open source model in certain dimension 00:33:11.140 |
already on par or beat closed source model, right? 00:33:22.100 |
like FireFunction is also really, really good. 00:33:24.220 |
So it's all a matter of whether you build one model 00:33:28.220 |
and you want to be the best of solving all the problems 00:33:31.260 |
or in the open source domain, it's gonna specialize, right? 00:33:34.580 |
All these different model builders specialize 00:33:39.260 |
And it's logical that they can be really, really good 00:33:44.500 |
And that's our prediction is with specialization, 00:33:48.540 |
there will be a lot of expert models really, really good 00:34:07.140 |
'cause you're basically fighting the bitter lesson. 00:34:13.900 |
about someone specializing doing something really well, right? 00:34:17.300 |
And that's how we, like when we evolved from ancient times, 00:34:19.980 |
we were all generalists, we did everything in the tribe too. 00:34:22.580 |
Now we heavily specialize in different domain. 00:34:30.420 |
you get short-term gains by having specialists, 00:34:33.700 |
domain specialists, and then someone just needs to train 00:34:36.060 |
like a 10X bigger model on 10X more inference, 00:34:43.780 |
And then it supersedes all the individual models 00:34:46.380 |
because of some generalized intelligence/world knowledge. 00:34:50.220 |
You know, I think that is the core insight of the GPTs, 00:35:00.180 |
you have increasing amount of data to train from 00:35:04.780 |
So I think on the data side, we're approaching the limit 00:35:11.300 |
And then there's like, what is the secret sauce there, right? 00:35:23.340 |
they are shifting from the training scaling law 00:35:28.260 |
So I definitely believe that's the future direction 00:35:31.660 |
and that's what we're really good at, doing inference. 00:35:35.580 |
Are you planning to share your reasoning traces? 00:35:48.660 |
it's interesting that like, for example, SWE-bench, 00:36:01.980 |
So that's why you don't see o1-preview on SWE-bench, 00:36:05.300 |
because they don't submit their reasoning traces. 00:36:13.620 |
So your model is not going to be open source, right? 00:36:16.100 |
Like it's going to be an endpoint that you provide. 00:36:25.740 |
- This is, I don't have actually information. 00:36:35.540 |
It's nice to just talk about it as it goes live. 00:36:39.620 |
you want feedback on or you're thinking through? 00:36:41.700 |
It's kind of nice to just talk about something 00:36:43.980 |
when it's not decided yet about this new model. 00:36:56.860 |
So there's already a Reddit discussion about it 00:37:00.020 |
and the people are asking very deep medical questions. 00:37:15.740 |
So we're having a lot of fun testing this internally. 00:37:19.740 |
But I'm more curious, how will people use it? 00:37:22.780 |
What kind of application they're going to try 00:37:26.020 |
And that's where we'll really like to hear feedback 00:37:30.940 |
And also feedback to us, like what works out well, 00:37:37.660 |
And what kind of thing they think we should improve on? 00:37:41.620 |
And those kind of feedback will be tremendously helpful. 00:37:44.500 |
- Yeah, I mean, so I've been a production user 00:37:55.180 |
and for, oh, just like they made the previous 00:38:00.220 |
Like it's really that stark, that difference. 00:38:15.860 |
But sometimes you know how hard the problem is 00:38:27.980 |
So we actually thought about that requirement 00:38:31.180 |
and it should be at some point we need to support that. 00:38:35.540 |
Not initially, but that makes a lot of sense. 00:38:41.020 |
of just like the things that you're working on. 00:38:44.860 |
I don't know if I've ever given you this feedback, 00:38:50.300 |
Because like, you know, I think when you first met me, 00:39:08.540 |
You know, I think your surface area is very big. 00:39:13.900 |
- Yeah, and now here you are trying to compete 00:39:24.220 |
So there's no, there's no thing I can just copy. 00:39:29.540 |
- I think we are all very aligned on the culture. 00:39:55.220 |
we are delivering a lot of business values to the customer. 00:40:15.300 |
So yeah, so that's just how we work as a team. 00:40:18.820 |
And the caliber of the team is really, really high as well. 00:40:38.460 |
Let's talk a little bit about that customer journey. 00:40:40.300 |
I think one of your more famous customers is Cursor. 00:40:44.780 |
and then obviously since then they have blown up. 00:40:48.180 |
But you guys especially worked on a fast apply model 00:40:54.940 |
to work on speculative decoding in a production setting. 00:41:00.020 |
what was the behind the scenes of working with Cursor? 00:41:03.220 |
- I will say, Cursor is a very, very unique team. 00:41:14.380 |
although like many companies including Copala, 00:41:17.340 |
they will say, I'm going to build a whole entire stack 00:41:20.700 |
And they are unique in the sense they seek partnership. 00:41:24.980 |
Not because they cannot, they're fully capable, 00:41:30.660 |
And of course they want to find the best partner. 00:41:39.180 |
because for them to deliver high caliber product experience 00:41:47.540 |
So actually we expanded our product features quite a lot 00:41:55.220 |
and we massively scaled quickly across multiple regions. 00:41:59.460 |
And we developed a pretty intense inference stack, 00:42:07.900 |
I think that's a very, very interesting engagement. 00:42:10.700 |
And through that, there are a lot of trust being built. 00:42:18.820 |
That comes back to, hey, we're really customer obsessed. 00:42:32.700 |
Yeah, so you almost feel like working as one team. 00:42:41.940 |
but most of the time people will be using closed models. 00:42:53.980 |
or like their house brand models are concerned, right? 00:43:04.620 |
- Very obviously the dropdown is GPT-4o and then Cursor, right? 00:43:04.620 |
So like, I assume that the Cursor side is the Fireworks side 00:43:11.220 |
and then the other side, they're calling out the other. 00:43:15.420 |
And then like, do you see any more opportunity on like the, 00:43:26.380 |
Actually, when I mentioned FireOptimizer, right? 00:43:36.780 |
Basically optimized for their specific workload. 00:43:39.220 |
And that's a lot of juice to extract out of there. 00:43:46.380 |
So that's why we started a separate product line 00:43:50.820 |
So speculative decoding is just one approach. 00:43:58.020 |
There's so many different ways to do speculative decoding. 00:43:59.940 |
You can pair a small model with a large model 00:44:15.260 |
or, you know, small, big model pair much better 00:44:20.900 |
So all of that is part of the FireOptimizer offering. 00:44:27.020 |
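For intuition on the draft-and-verify pattern mentioned above, here is a toy, self-contained sketch of greedy speculative decoding over characters; real systems verify all draft tokens with a single large-model forward pass, and none of this reflects Fireworks' actual implementation.

```python
# Toy sketch of draft-and-verify speculative decoding (greedy case), using
# fake character-level "models" so the example runs on its own.
TARGET_TEXT = "the quick brown fox jumps over the lazy dog"

def target_next(prefix: str) -> str:
    # Toy "large model": always continues the target text correctly.
    return TARGET_TEXT[len(prefix)]

def draft_next(prefix: str) -> str:
    # Toy "small model": usually right, occasionally wrong.
    correct = TARGET_TEXT[len(prefix)]
    return "x" if len(prefix) % 7 == 3 else correct

def speculative_step(prefix: str, k: int = 4) -> str:
    accepted = ""
    for _ in range(k):
        if len(prefix) + len(accepted) >= len(TARGET_TEXT):
            break
        proposal = draft_next(prefix + accepted)   # cheap draft token
        expected = target_next(prefix + accepted)  # large model verifies it
        if proposal == expected:
            accepted += proposal                   # draft token accepted "for free"
        else:
            accepted += expected                   # mismatch: take the real token, stop
            break
    return accepted

output = ""
while len(output) < len(TARGET_TEXT):
    output += speculative_step(output)
print(output)
```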
I think the other question that people always have 00:44:30.260 |
So you get different performance on different platforms. 00:44:40.540 |
But maybe, you know, using speculative decoding, 00:44:47.740 |
How should people think about how much they should care 00:44:58.020 |
- Okay, so there are two big development cycles. 00:45:01.300 |
One is experimentation, where they need fast iteration. 00:45:14.420 |
but scaling and the quality is really important 00:45:17.020 |
and latency and all the other things are becoming important. 00:45:24.740 |
Make sure even like Gen-AI is the right solution 00:45:31.260 |
then that's kind of the three-dimensional optimization curve 00:45:34.660 |
start to kick in across quality, latency, cost, 00:45:42.980 |
For many products, if you choose a lower quality, 00:45:49.380 |
but it doesn't make a difference to the product experience, 00:45:53.300 |
So that's why I think inference is part of the validation. 00:46:02.180 |
we'll go through A/B testing through inference 00:46:09.780 |
So this is like traditional product evaluation. 00:46:18.020 |
and different model setup into the consideration. 00:46:33.100 |
And maybe you want to set the record straight 00:46:51.460 |
- Specifically by name, which is normally not what- 00:46:56.820 |
and have certain interpretation of our quality. 00:47:16.900 |
So we actually refrain from doing any of those, 00:47:37.380 |
we wrote out actually a very thorough blog post 00:47:42.580 |
We have various different quantization schemes. 00:47:45.100 |
We can quantize very different parts of the model, 00:47:47.940 |
from weights to activations to cross-GPU communication, 00:47:50.540 |
and they can use different quantization schemes 00:48:01.740 |
we actually let them find the best optimized point 00:48:11.460 |
But for self-serve, there's only one point to pick. 00:48:23.420 |
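As a toy illustration of why quantization choices move the quality/cost point, the snippet below applies symmetric int8 quantization to a random weight matrix and measures the rounding error; real deployments can mix different schemes for weights, activations, the KV cache, and communication, which is the design space described above.

```python
# Toy symmetric int8 weight quantization: smaller weights, measurable error.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                        # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale)).max()
print(f"max abs rounding error: {error:.5f}")              # the quality cost, made visible
```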
And I think the end results like AA published, 00:48:36.580 |
that's why what I mean is I will leave the evaluation 00:48:42.700 |
and work with them to find the most fair benchmark approach 00:48:47.900 |
But I'm not a fan of the approach of calling out specific names 00:48:52.900 |
and critique other competitors in a very biased way. 00:48:59.580 |
I think you're the more politically correct one. 00:49:11.820 |
No, actually all these directions we build together. 00:49:22.300 |
on just the last one on the competition side, 00:49:30.060 |
and we talked about the competitiveness in the market. 00:49:32.660 |
Do you aim to make margin on open source models? 00:49:39.140 |
So, but I think it really, when we think about pricing, 00:49:49.620 |
or there are a lot of people delivering same value, 00:49:53.180 |
There's only one way to go is going down, right? 00:50:00.140 |
we're more compared with like closed model providers, 00:50:05.980 |
their cost structure is even more interesting 00:50:27.780 |
So that created very interesting dynamics of, 00:51:14.980 |
We are not just a single-model-as-a-service provider; 00:51:27.700 |
we significantly simplify your interaction 00:51:38.220 |
- What do people not know about the work that you do? 00:51:45.860 |
Is there any kind of like underrated part of Fireworks 00:52:00.100 |
Fireworks can allow me to upload the LoRA adapter. 00:52:13.580 |
Like, we rolled out multi-LoRA last year, actually, 00:52:17.020 |
and we actually have had this function for a long time, 00:52:39.700 |
so I'm happy that user is marketing it for us. 00:53:02.860 |
- We have had prompt caching since way back last year also. 00:53:06.820 |
So yeah, so I think that is one of the underrated feature, 00:53:22.620 |
is not because they feel like charging people. 00:53:36.100 |
- Yeah, so this is kind of our technique called multi-LoRA, 00:53:54.340 |
and then basically all these different LoRA adapters 00:53:58.100 |
direct their traffic to the same base model, 00:54:17.500 |
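As a conceptual sketch of multi-LoRA serving — one shared base weight matrix, many small low-rank adapters, and requests routed by adapter id — the snippet below uses NumPy; it illustrates the memory-sharing idea only and is not how Fireworks' serving stack is implemented.

```python
# Conceptual multi-LoRA sketch: one shared base matrix, per-customer low-rank
# adapters, traffic routed by adapter id.
import numpy as np

d, r = 16, 2                                   # hidden size, LoRA rank
base_W = np.random.randn(d, d)                 # loaded once, shared by everyone

adapters = {                                   # tiny per-customer deltas (A @ B)
    "customer-a": (np.random.randn(d, r), np.random.randn(r, d)),
    "customer-b": (np.random.randn(d, r), np.random.randn(r, d)),
}

def forward(x: np.ndarray, adapter_id: str) -> np.ndarray:
    A, B = adapters[adapter_id]
    return x @ base_W + x @ A @ B              # base projection + low-rank correction

x = np.random.randn(1, d)
print(forward(x, "customer-a").shape, forward(x, "customer-b").shape)
```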
or you're looking for model-wise or tooling-wise 00:54:20.860 |
that you think someone should be working on in this? 00:54:23.420 |
- Yeah, so we really want to get a lot of feedback 00:54:35.020 |
or starting to think about new use cases and so on, 00:54:41.740 |
and let us know what works out really well for you, 00:54:54.820 |
typically we want to launch to a small group of people. 00:55:05.180 |
We have a lot of communication going on there. 00:55:22.220 |
infrastructure cloud, infrastructure engineers, 00:55:31.300 |
who have done a lot of fine-tuning and so on.