
Building an AI assistant that makes phone calls [Convex Workshop]


Transcript

Well, hello everybody. Thank you for braving the early morning. 8 a.m. is rough for the best of us, so I really do appreciate you crawling out of bed and coming to check out this talk. My name is Tom Redmond. I am the head of DX at a company called Convex.

We're building a platform that people are able to build their companies on from day one to year two, hopefully year 10 and year 20 and beyond. Today I wanted to walk through this idea that I've had for a long time about building a better AI assistant. It came to me a little while ago after trying a number of AI assistants, most of which can pretty much set calendar events and reminders and timers and things like that.

And I thought to myself, what does a real personal assistant do for people? A lot of the time, short of collecting laundry and doing physical things, they're on the phone and they're on email. And so I realized I feel like there's enough technology out there these days that we could actually string together a number of platforms such that you could have an AI assistant that knows about you.

It has context on you, your life, and who you are, and it can manage a conversation with another human being in a non-creepy way. And the technology exists such that we can do all of this in real time. We can transcribe speech to text in real time.

We can synthesize text to speech in nearly real time as well. And so I wanted to piece all these things together, and that's what we're going to go through today. So this is a better AI assistant. This is Floyd. This is actually Lloyd from Entourage, but when I was thinking about the name I was like, "Who's the best personal assistant of all time?" And it's Lloyd from Entourage, but I forgot his name and I thought it was Floyd.

And so Floyd is the name of the app, and Lloyd is the name of the personal assistant from Entourage. There's a repo available for this. We're going to walk through a demo, and we're going to walk through some of the code. It's okay if you don't get it totally up and running right now.

There are a number of third-party platforms that we are going to string together to make this work. And so for it to work for you end-to-end, we need Google Cloud with the speech-to-text API enabled, a Convex account, an OpenAI key, and a Twilio account. If you have all those things, wonderful.

If not, I'm more than happy to help get anybody set up and see if we can get Floyd working for you on your machine after the talk. So this is what the .env file is going to look like. We've got the OpenAI stuff, Twilio stuff, some Convex things, and some Google stuff.
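As a rough sketch, it's something like this. The exact variable names live in the repo; these are just the usual conventions each platform documents:

```
OPENAI_API_KEY=sk-...
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
TWILIO_PHONE_NUMBER=+1...
CONVEX_DEPLOYMENT=dev:...
CONVEX_URL=https://<your-deployment>.convex.cloud
GOOGLE_APPLICATION_CREDENTIALS=./google-service-account.json
```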

Yeah? [Audience question.] It's not, but it will be. I'll make sure that it's available after the talk for sure. So, a little bit of the higher-level architecture here. This is, at a high level, how Floyd works. You'll have an application, either your phone or a web app.

It will take a voice request from the person, you, who needs help. It'll transcribe that in real time by streaming to Google Cloud, which will stream back the transcription. So you could say something like, "When's my next dentist appointment?" or, "Book me a dentist appointment," or something like that.

It'll make that transcription, and then the client will simply save the user who made it and the request itself into the Convex database, and then hand off. What the server is doing is listening for new requests in real time from that same Convex database.

It's not polling. We're not pinging the server to let it know. It's got a reactive query that it's effectively subscribed to. And so when a new request comes in, the server is able to simply pick it up, start working on it, and provide status updates along the way, which the client, in turn, is able to present back to the user.

So the request goes in, the server picks it up, and maybe sets it to in progress. That status in progress is, again, picked up for free automatically in real time by the client, which is also subscribed to that database. And we'll see what that looks like. So we save the request on the server.
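To make that concrete, the kind of table this implies in Convex looks roughly like this. This is a sketch, not the repo's actual schema, and the field names are illustrative:

```ts
// convex/schema.ts (sketch; field names are illustrative)
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  requests: defineTable({
    userId: v.string(),              // who made the request
    text: v.string(),                // "Book me a dentist appointment"
    status: v.string(),              // "pending" | "inFlight" | "done"
    context: v.optional(v.string()), // snapshot of what Floyd knows about you
    transcript: v.array(
      v.object({ speaker: v.string(), text: v.string() })
    ),
  }).index("by_status", ["status"]),
});
```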

The server will go and look up what it knows about the person who made the request. This is a platform that, over time, becomes more and more knowledgeable about you. Every time you make a request, or you ask it something, or you give it information, or you connect your email, etc., it will start to learn things about you.

What kind of car you drive, where your last mechanic shop was, what school your kids go to. You can provide as much or as little of this as you want, or you can wait for Floyd to prompt you with these questions. But this is the type of information it's going to need to know to, say, call the school and let them know that your kid's going to be late.

Okay? So the server is going to take what it knows about the person who made the request, and it's going to save that context as a moment in time onto that request. So now we have this request object in the database. It's got the person who made it. It's got the request itself.

It's got some context. And basically, it's ready to go. So the server will then take that, and it'll work with OpenAI, the ChatGPT integration, GPT-4o in this case, to effectively provide it with the request and with the context that it needs to fulfill that request and say, "This is your job now.

Are you ready to help us out?" At that point, OpenAI is like, "Yeah, we got this. I think I know what to do." And it's like, "Great. Okay. We're going to make a phone call right now. The next thing is going to be somebody on the phone, and you're talking to them." Okay?

So at that point, the server now has this great starting point, with the help of OpenAI, to make the phone call. Now, there are a number of ways it can find the phone number for whatever the request might be. Typically, that would already exist in your prior context. If not, and this part doesn't exist yet, the idea would be: we understand the request.

We know the general area where you live. We would do the work to look up the phone number if, for example, it wasn't already in your context. And if we can't find it, at that point, we could send you a text or something like that. Floyd would send a text and say, "Hey, do you have a preferred vendor?" We couldn't find a mechanic in your history.

We couldn't find any online. Is there anybody that you would like to use? So at that point, the server will make the phone call. And it does that by coordinating the conversation through GPT-4, using OpenAI's text-to-speech, streaming that through Twilio. And then as the person on the other side of the call is speaking, we're streaming that audio to be transcribed in real time through Google Cloud, which we then feed back into GPT-4 to carry on the conversation.

So we're streaming this as fast as possible. I am super impressed by the technology that we have access to today. The fact that this works and isn't painfully slow, it boggles my mind. We're very fortunate to operate in the ecosystem that we do. So this loop that you're seeing right here, this happens over and over until the conversation is complete.
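Sketched in TypeScript, one turn of that loop looks something like this. Every name here is illustrative, standing in for the Google, OpenAI, Twilio, and Convex calls just described:

```ts
// One turn of the call loop:
// Twilio audio in -> Google STT -> GPT-4 -> OpenAI TTS -> Twilio audio out
type Turn = {
  transcribe: (audio: Buffer) => Promise<string>;      // Google Cloud speech-to-text
  addToTranscript: (speaker: string, text: string) => Promise<void>; // Convex write
  nextReply: (heard: string) => Promise<string>;       // GPT-4, with the saved context
  speakIntoCall: (reply: string) => Promise<void>;     // OpenAI TTS -> Twilio stream
};

async function handleUtterance(turn: Turn, audioChunk: Buffer) {
  const heard = await turn.transcribe(audioChunk);
  await turn.addToTranscript("callee", heard);
  const reply = await turn.nextReply(heard);
  await turn.addToTranscript("floyd", reply);
  await turn.speakIntoCall(reply);
}
```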

So the audio bytes come back, and so on and so on. Yeah? [Audience: Have you been experimenting with multimodal Gemini, where you can pass it audio?] Not yet, not yet. So the question was: have I been experimenting yet with multimodal Gemini, where you can pass it audio and basically skip that transcription step, right?

I haven't. But the beauty of this is that I'm designing it in such a way that we should be able to swap in and out different services because they're coming out so fast. You know, when I started this, that didn't exist. And it's exactly the type of thing I would experiment with and see if I could make it even faster, right?

There's really no limit to how quick this should be when it's operating. So I think that's an awesome thing to explore. So as this loop is happening, we are actively saving each part of the transcript as part of that request to the Convex database.

So we're just saving it, we're just pushing it, we're just appending to the database. What that means is that on the client, because Convex is a reactive database, you get this for free; it's like a one-liner in the client. Instead of saying useState, you just say useQuery once, and it will update when the database updates.

So as we are writing the transcript to the database, your client, for example, could be streaming that back in real time. All right, well, let's dive in. Let's see if we can get a demo going first and foremost, why not? What could possibly go wrong? Okay, so this is sort of a shell.

It's very much a work in progress. I wouldn't exactly call this production ready, but it's fun and it does work. So I'm going to make a voice request to Floyd here, who is hopefully going to see if they can help me out. Now, for development, I have overridden every phone number it would call to just call me.

So I pretend I'm the vendor. I'm not totally confident enough for it to call a real business and not totally embarrass me or do something crazy. So, still in development. Let's see here. Let's see what we can have Floyd do for us.

Hey Floyd, can you call the school and let them know Mara is going to be home? She's sick today. So here we have the request ID. It'll automatically send that request when you stop speaking. Down here, I'm the school. Hello, this is Mara's school. Hey there, this is Floyd. I'm calling on behalf of my client, Tom Redmond.

Mara Redmond is staying home today. She's sick. Oh no, is she going to be okay? Yeah, just a bit under the weather right now. Okay. That's good. Do you have any idea when she's going to be back? She's hoping to be back by the end of the week. I'll keep you updated if anything changes.

Okay. That's great. Thank you. Maybe I should try that Gemini thing. Thanks for understanding. Have a good day. So again, you can see it's there. The latency is so crucial, though. Now, this is all in development land. Moving every one of these platforms over to production, that would be the next thing I try.

Make it go faster. Try different models. There are different things that I've already done in terms of the audio encoding to make the streaming as fast as possible. There's a format called Opus, which is designed for phone-call-level quality and is encoded to stream really quickly. So we can see here, here's the request that came in.

And again, all I did in the client was send the request text to the database. The client is listening for any requests that match this user and is going to automatically update the app. And so the first thing I ask OpenAI to do is come up with an action plan.

So we're not saying any of this, but in some ways I have OpenAI kind of prompt itself. I say: hey, we're a team here. We're going to help Tom, in this case, do something. They've got a request. Let's work on this together.

Here's the request. Here's the context. And so then this is what OpenAI says. Based on the context, it'll look up what it knows about me, and it'll pull out all the important information required for that request. So Mara's full name (we know it's Mara Redmond), the school name, reason for absence, today's date, steps to fulfill the request, and so on.

And then I send this back in as additional context when we're actually making the phone call. So OpenAI's got these instructions; it's just sitting with them. Yeah? [Audience question about automated call trees.] I haven't tried that; I certainly haven't done anything specific for it. But it definitely feels a lot more straightforward to do something like that than interacting with a human.

So I wanted to get the thing I didn't know was going to work out of the way. I feel like navigating a call tree is definitely a solvable problem. This was really a curiosity for me, to see if we could get this to work.

And you can see down here, I'll see if I can zoom in (my CSS skills are lacking here), but there's a transcript that comes in in real time as well. And maybe I can actually do another one. Let's make another request: Hey, Floyd, can you order Krista some flowers?

Now we can watch this action plan and everything coming in in real time. Now I'm going to be the flower shop. So let's break this down. Okay. Oh. Sweet Violet's Flowers, how can I help you? Hey there, this is Floyd. I'm calling on behalf of my client, Tom Redmond. I need to order some flowers for delivery today.

Okay, do you have any idea what kind of flowers you would like? I know Krista likes your usual offerings. Could you recommend something popular from your collection? Yeah, we have some peonies and some roses for $560. Would you like that? That sounds good. Can you deliver it to an address in Guelph, ON today?

Yeah, absolutely. I think we have... have you been here before? Do we have your information on file? I'm calling on behalf of Tom. Do you need his details again, or do you have them on file? Sorry, I'll have to call you back. Haha. That's right. So I have this bail situation built in where it's like: all right, if you're in a pickle and you just don't have the information they're asking for, you say, I'm sorry, I'm going to have to call you back.

And at that point, Floyd's like: I'm missing something. I need something. And it would send me a text saying, hey, I tried to order you flowers, but you didn't give me a budget and they wanted to charge you $560 for some peonies. All right. So what we're looking at here is the database in Convex that's storing the request information.

So here we have some requests, and you can see: hey, Floyd, can you order Krista some flowers? Again, sorry about my CSS skills. Oh, there we go. Okay, that kind of works. Can you order Krista some flowers? So let's find that. Here we go.

Hey, Floyd. So I can go ahead and just update this straight in the database: some flowers, and I'm going to say there is no budget. Then we'll just save that, and watch what happens in the client when I change the database. That's it. That's why, when I append parts of that transcript to the database and I'm listening for those changes in the client, they just appear.

What I also have in here, for users, is this context. Now, I want to preface this by saying the actual code of what I'm doing here is far from best practice. So don't take this and try to roll it into production.

This is definitely a proof of concept. Prior to using Convex for this, I was trying all sorts of things to get the transcript of that phone call to stream back live to the client. In fact, I wanted to get it so that you could listen in live from the client.

I was really struggling with that. I didn't roll my own server to begin with. I was using Next.js, which is fantastic, but I was hosting it on Vercel, which doesn't play very nicely with WebSockets. Most of Vercel's hosting is serverless, and WebSockets are inherently stateful.

So while I was able to get parts of this working with Socket.IO (for example, the first transcription), having it actually interact with the phone call and stream that audio with Twilio through a WebSocket protocol was difficult on the front end. And I was trying to do this all in one place.

With Convex, I was really able to separate those concerns: just listen for the bits of data I was interested in, and let the server, in this case an Express server that I've written, do the lifting and basically just post updates to the database.

So on the client, we don't have to poll, and we don't have to post anything. It really simplified the separation of concerns around this. Now, the really interesting thing here is that if you're not using serverless hosting infrastructure, you could do this entire thing in your client code base. When you're coding with Convex, you don't necessarily have to break out your server code from your front-end code.

The whole value prop is that you're able to actually build a full server in client land. It doesn't actually get served on the client. It gets deployed to a Convex server, but you can define your backend and your APIs and your schemas and your databases all in the same code base as your front end.

And I'll show you exactly what I mean by that. So let's close this here. Here I have my web client. Let's take a look at the page that shows that list of requests and then the request details.

This line here, this fetched requests line, that's it. Anytime a request is updated or changed and it matches the query I've defined in this Convex function, it'll update my React client however I want. So let's see, what do I do with this? It gets updated.

I pass in my requests, which get updated, into my dashboard. And from there, it's an array of requests; I list them out. It's got the details of the requests, so when somebody clicks on something in the list, I can show those details in the detail pane.
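The client side of that really is about one line. A minimal sketch, with component and field names that are illustrative:

```tsx
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";

export function Dashboard() {
  // Re-renders automatically whenever the requests table changes;
  // no polling, no useState, no manual refetching.
  const requests = useQuery(api.requests.get) ?? [];
  return (
    <ul>
      {requests.map((r) => (
        <li key={r._id}>{r.text}</li>
      ))}
    </ul>
  );
}
```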

I can show those details in the, in the detail pane. Um, and I think what's interesting here is the way that I've defined, um, the get requests is just this query. So this query itself actually right now is not, uh, user specific, but typically you would probably add in some user ID and some auth.

This, this, this prototype does not have auth. Again, do not ship this. Um, so this is the query get. And if you recall in my last, uh, see if I can find it here. In my page, remember I said API requests get API is a generated type safe, uh, convex model that gets, um, updated and deployed every time you make a change to one of those convex files.

So API.requests.get. And what that's doing is specifically hitting, uh, this, this function here, I can name this whatever I want. This just happens to be a get request. Um, and so then this is this function, this is a query that actually lives not on the client. Even though my code is here in client land, it doesn't get shipped with the client, it gets built and deployed to convex.

And this function physically runs on the convex server on the same machine as your database. So this, there's a, there's a custom V8 engine that's running next to the database that's actually executing the JavaScript or the TypeScript that you define here. And this makes it extraordinarily fast. It absolves you from ever having to think about caching.
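For reference, that server-side query is just a few lines. A sketch of the shape, following Convex's conventions rather than the repo's exact code:

```ts
// convex/requests.ts (sketch)
import { query } from "./_generated/server";

export const get = query({
  args: {},
  handler: async (ctx) => {
    // Runs on the Convex server, right next to the database.
    // A real app would scope this to the authenticated user.
    return await ctx.db.query("requests").collect();
  },
});
```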

Yeah? [Audience: What's the big reason to choose Convex over any of the major existing databases?] Yeah, that's a good question. So the question is: why would you use something like Convex over the other common databases, like Mongo or Postgres?

And the big thing is the developer ergonomics. You don't need a backend and/or infrastructure engineer or team if you're using Convex. You get type safety all the way through, from your database to your front end, and you get all the wonderful completions that come with that as you define your schema.

And you can operate in a much simpler code base. So again, what you're seeing here is the totality of my backend server as far as the requests table goes. And I've added different things, like get request by ID and get pending requests, right?

So we can do different filtering, and there's a post request. The post request is interesting because, again, in front-end land, you can create an HTTP server that has arbitrary endpoints with arbitrary responses. You can use those with or without hitting the database. You simply define a GET route or a POST route in your HTTP actions and do with it whatever you want.
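A sketch of what such an HTTP action looks like. The endpoint path and the mutation it calls are illustrative:

```ts
// convex/http.ts (sketch)
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";
import { api } from "./_generated/api";

const http = httpRouter();

http.route({
  path: "/requests",
  method: "POST",
  handler: httpAction(async (ctx, request) => {
    const { userId, text } = await request.json();
    // api.requests.post is illustrative: whatever mutation saves the request
    await ctx.runMutation(api.requests.post, { userId, text });
    return new Response(null, { status: 201 });
  }),
});

export default http;
```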

In this instance, I need the server here to be able to stream to Twilio. On any other platform or server that supports WebSockets natively, you could do it all in one place. So what I've created here is an HTTP POST endpoint that posts the user's request to the server, which also happens here.

So here I've added the post request, and this is going to run the action, which goes ahead and updates the database. Yeah? [Audience question about debugging.] So, note that on the dashboard you get full access, which includes all of your logs for all of the requests.

It includes the definitions for your functions, and you can actually run your functions from the dashboard as test functions to see what the response is. Pardon? That's a good question. So the question is: can you hit breakpoints in the Convex server code?

Honestly, I don't know. Can I get back to you on that? I can definitely follow up; that's a great question. I've typically ended up relying on the logs, and the Convex client that's doing the compilation on your machine will also give you any errors. But that's a great question.

Let me get back to you on that one. Okay. And so here we have... let me try to make this a little easier to see. Toggle. Toggle. Okay, maybe not. All right. So here we have the request being saved. This is a little bit of an esoteric way to do it, again, just because of this need to stream.

And at the same time, what I wanted to do with the stream was write that transcript into the database in real time as it's streaming. What that looks like, here in the server, is that I've created a Convex client right within the server.

It will effectively get an update every single time this database changes, based on the query I define. In this case, I'm asking Convex to ping my server anytime there's a change to a pending request. Just by convention, when somebody makes a request from the client, it's pending by default. The server will get that, and it will then make sure the request exists.
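That server-side subscription looks roughly like this. getPending and setStatus are illustrative names for the pending-requests query and the status mutation:

```ts
// Express-server side (sketch). ConvexClient holds a WebSocket open and
// re-runs the callback whenever the query's results change.
import { ConvexClient } from "convex/browser";
import { api } from "./convex/_generated/api";

const convex = new ConvexClient(process.env.CONVEX_URL!);

convex.onUpdate(api.requests.getPending, {}, async (pending) => {
  for (const request of pending) {
    // Mark it picked up so we don't process it twice
    await convex.mutation(api.requests.setStatus, {
      id: request._id,
      status: "inFlight",
    });
    // ...gather context, then make the call
  }
});
```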

The first thing it does is change the status to in flight, and then it will start taking action on that request. So we get the full request, which has the actual ask from the client. In this case, we get the session name, and then we do this gather-context step with OpenAI.

This is where this function reaches into the database and tries to pull out everything it knows about the user who made the request. Then I update the context. I take that context and filter out the parts that are required to actually fulfill this ask.

And I save just that subset of the context onto the request itself, for ease of access. Again, if we sat down and did a design session on this, there would be a lot of changes to make. At that point I say: okay, make the call.

The way Twilio works is that I can use the Twilio client to make a call, and when that call connects, this config tells Twilio where to hit me back when the status of the call changes, and also where to send any streaming data that's coming in.

So I have a WebSocket opened up here. I have ngrok running on my server right now so that Twilio can hit it, and I have a WebSocket server here that I just set up in Express. That gets hit with Twilio events, which are things like call initiated, data heard, call ended, that kind of thing.

When I ask Twilio to make the call, I pass the request ID, and I've asked Twilio to call my server back with that request ID. That's how I can pass that data through from Twilio, because once I send off that call, I'm just waiting.

It's gone into the ether, and you hope you're going to get the phone call back from Twilio. So I needed some way to track that request ID, so that when the call actually connects, I know the context of what the call and the request are all about.
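The call kickoff, sketched with the Twilio Node client. The TwiML Stream carries the request ID as a custom parameter, and the status callback URL carries it as a query param; the URLs and names here are illustrative:

```ts
import twilio from "twilio";

const client = twilio(
  process.env.TWILIO_ACCOUNT_SID,
  process.env.TWILIO_AUTH_TOKEN
);

async function makeCall(requestId: string, to: string, host: string) {
  await client.calls.create({
    to,
    from: process.env.TWILIO_PHONE_NUMBER!,
    // Connect the call's audio to our WebSocket server, tagging the
    // stream with the Convex request ID so we can look it up later.
    twiml: `<Response><Connect>
        <Stream url="wss://${host}/media">
          <Parameter name="requestId" value="${requestId}" />
        </Stream>
      </Connect></Response>`,
    statusCallback: `https://${host}/call-status?requestId=${requestId}`,
  });
}
```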

So I have Twilio send the WebSocket request to this location with the request ID. At that point I open up the media stream handler, which grabs the request ID and looks it up. And then there are a number of fairly standard media stream functions in here, things like process message, which gets called a lot, over and over and over.

You can configure Twilio to send you different things. You can have it send you fixed-size chunks, or you can have it send you the audio data after every utterance or after every pause. What I have here is Twilio sending me the streaming audio data basically after every utterance, which lands on things like commas and periods.
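The process-message side, sketched with the ws package. Twilio sends JSON frames with an event field; custom parameters arrive on the start event, and audio arrives base64-encoded on media events:

```ts
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws) => {
  let requestId: string | undefined;

  ws.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    switch (msg.event) {
      case "start": {
        // The <Parameter> values from the TwiML show up here
        requestId = msg.start.customParameters?.requestId;
        break;
      }
      case "media": {
        // 8 kHz mu-law audio, base64-encoded
        const audio = Buffer.from(msg.media.payload, "base64");
        // ...buffer until an utterance boundary, then transcribe
        break;
      }
      case "stop":
        // call ended
        break;
    }
  });
});
```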

When I get that, I take the audio data and convert it into a format that Google Cloud is fast with. I get the transcription back, and I ask OpenAI to then speak the response.
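The transcription step, sketched with @google-cloud/speech. Twilio's stream is 8 kHz mu-law, which Google's streaming API accepts directly:

```ts
import { SpeechClient } from "@google-cloud/speech";

const speechClient = new SpeechClient();

const recognizeStream = speechClient
  .streamingRecognize({
    config: {
      encoding: "MULAW",
      sampleRateHertz: 8000,
      languageCode: "en-US",
    },
    interimResults: false,
  })
  .on("data", (data: any) => {
    const text = data.results?.[0]?.alternatives?.[0]?.transcript;
    if (text) {
      // hand the text to GPT-4 for the next conversational turn
    }
  });

// Each decoded Twilio audio chunk gets written into the stream:
// recognizeStream.write(audioChunk);
```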

Where's the speak function? Yeah, here's the speak function. I then convert that into a streaming format that Twilio is down with, which means taking this mu-law file and streaming it as base64 back to Twilio.
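The speak side, roughly. toMulawBase64 here is a hypothetical stand-in for the downsample-and-mu-law-encode step; Twilio's stream wants 8 kHz mu-law, base64-encoded, wrapped in a media event:

```ts
import OpenAI from "openai";
import type WebSocket from "ws";

const openai = new OpenAI();

// Hypothetical helper: downsample + mu-law encode + base64 for Twilio.
declare function toMulawBase64(audio: Buffer): string;

async function speak(ws: WebSocket, streamSid: string, text: string) {
  const tts = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: text,
    response_format: "wav", // then transcode for the phone stream
  });
  const audio = Buffer.from(await tts.arrayBuffer());
  ws.send(
    JSON.stringify({
      event: "media",
      streamSid,
      media: { payload: toMulawBase64(audio) },
    })
  );
}
```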

And throughout this whole time, you can see here: client mutation, add to transcript. Every time I'm getting a new transcript entry back from Google Cloud during the stream, I'm just updating the Convex database. That's just happening. And the client is subscribed to those changes and can show them in a list; somebody who's better at UI than I am can make that look really nice, have it scrolling or something.
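That add-to-transcript write is a one-line mutation call from the server. A sketch, with an illustrative argument shape:

```ts
import { ConvexClient } from "convex/browser";
import { api } from "./convex/_generated/api";

const convex = new ConvexClient(process.env.CONVEX_URL!);

// Called for each new transcript entry during the stream; every client
// subscribed via useQuery sees the change immediately.
async function addToTranscript(id: string, speaker: string, text: string) {
  await convex.mutation(api.requests.addToTranscript, {
    id,
    entry: { speaker, text },
  });
}
```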

So this here is the introduce-yourself piece. I actually have a prerecorded, generated audio file that's something like: "Hi, my name is Floyd. I'm calling on behalf of my client." That's prerecorded because of something I realized. I'm not sure it was doing it today, actually, because I didn't hear it.

But I realized the first interaction was the longest. Once somebody picks up the phone, everything kicks into gear, and that's where the latency builds up. So somebody picks up the phone, they go, "Hello, Brock Road Garage," and then... well, you heard it, because I don't think it's working today.

So what I did was have this prerecorded, pre-generated audio file, and as soon as somebody picks up and says hello, I just play it. So I don't have to do any transcribing or any text-to-speech or anything. And while I'm streaming that "Hi, this is Floyd," I'm buying myself some time.

That's when I'm actually triggering the first loop of the conversation, all the transcriptions and the text-to-speech back. And usually I've bought myself enough time that it's a fairly natural result: "Hi, this is Floyd. I'm calling on behalf of my client.

I need to book a car in for an oil change." I found that there are a number of little tricks you can do to make the experience better for the person on the other end of the phone. The first thing I do is say: hi, I'm an AI; you're talking to an AI right now.

I don't want to misrepresent what this is. And why I feel good about a platform like this is that, in this context and the way that it's positioned, it's almost always buying a service from a business or otherwise making a benign change, like canceling an appointment.

It's never selling anything. And my bet is that business owners are not going to care that they're talking to an AI if you're buying something from them. If that appointment you're booking is legitimate, they're going to be okay with it. In fact, they might prefer it, and they may even start to internalize and train themselves how to speak with an AI agent on the other end to be super efficient.

[Audience comment about prompt injection.] Exactly, yeah. So again, don't put this in production, because the person on the other end could just say "forget everything," which I haven't guarded against here. So, full transparency. But yeah, I think you could actually have a couple of Floyds talking to each other to ultimately book the appointment.

Right. At what point does it just become APIs talking to each other? Full circle. [Audience question.] Not right now; this has been shaky enough. [Audience question about warmup delays.] Yes, yeah, for sure. So, on the Convex side, there is no heating up, because Convex is built foundationally on WebSockets.

You have your own deployment server that's always running, and it can scale indefinitely; that's part of what the offer is. So warmup time is not an issue on the Convex side. As for those delays, I have some benchmarking here, some timing, so I can see how long it took to transcribe a piece of text.

How long did it take to send that to OpenAI and get the text conversation back? And then how long does it take to turn the response back into audio? So I could see where the latency was; it's in the terminal here somewhere. And what I've seen is that often, as the conversation grows, the prompt I'm sending to OpenAI, which includes all of the previous conversation, takes longer and longer.
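The benchmarking itself is nothing fancy; bracketing each hop with timestamps is enough to see where the time goes. The helper names in the comments are illustrative:

```ts
async function timed<T>(label: string, work: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await work();
  console.log(`${label}: ${Date.now() - start}ms`);
  return result;
}

// const heard = await timed("stt", () => transcribe(chunk));
// const reply = await timed("gpt", () => nextReply(requestId, heard));
// await timed("tts", () => speak(ws, streamSid, reply));
```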

And so the latency and the delays tend to get worse, on average, the longer the conversation goes. Now, I originally built that functionality before the OpenAI threads API was available. That would be something I would try. I would work diligently to minimize every prompt I'm sending to OpenAI.

I think that would have a really big impact. As for the other thing that can be slow: if you send a large piece of text to OpenAI's text-to-speech, that can be slow. Even if it's three sentences, it can still take two, three, four, five seconds.

There are a couple of parameters you can tweak with the OpenAI text-to-speech, but not a lot. So what I would do to fix that is pay more for lower latency, or use another service that I could pay for that would give me lower latency.

The way this is now, in those phone calls, the delays you heard and experienced, there are still optimization opportunities like crazy to bring that all down. I'm actually not even concerned with that right now, because there are still half a dozen material things

I haven't tried to close that gap. I was just happy to have a conversation with a computer that I could ask questions to. But yeah, the latency comes from the sum of all the different interactions that are happening, and if you speed up any one of those along the chain, it's going to get faster and faster.

And second to that point, every couple of months there's a massive improvement in some API in this stack. So again, the bet is: I can get it as close as possible right now, but I know in six months it'll be twice as fast without me doing anything.

The rate of innovation, and the rate of change and competition, for this type of thing right now is so high. That's a bet I'm taking on: maybe it's not perfect right now, but it inevitably will be very soon.

All right, let's see here. What time are we at? Okay, just a couple more minutes. Any other questions? I can walk through any of the server stuff, any of the front-end code. Yeah? I love that. So the question was for a little more context about Convex: how it works under the hood, how it distributes its queries, and what kind of database infrastructure it's running on.

So Convex is open source, but it's a custom database built from the ground up. It's literally a database built from scratch to be able to provide this product. My bosses are the CEO and the CTO, Jamie and James.

Jamie's here today, actually; he'll be doing a keynote. I joined Convex because of them. They have this track record that was mind-blowing to me, and it was only about six months ago that I discovered them. The more I was reading about it, the more I was like, these guys did what?

They built a brand-new database from scratch in Rust. James has his PhD in database architecture from MIT, and he was instrumental in designing a novel database to make all of this work. The way it runs is that it runs on an AWS cluster, running the Convex database and application, which manages all of the WebSocket connections and all of the subscriptions.

In terms of literally and physically how it's distributed, I'd love to follow up; that's deeper than my expertise. But we do have some large customers using it these days, and there haven't been any fundamental issues at all in terms of its ability to scale.

We've been very happy with how that's worked out so far. You can take a look at the open source repo; it's super interesting. And there's a really great blog post written by our chief scientist, Sujay, called "How Convex Works."

It does a deep dive into the architecture of the database. And to be fair, sometimes people ask: isn't that risky, compared to doing Postgres or something like that? It's a bet you'll be taking, but we believe in the trade-off: the developer ergonomics, the speed, and the fact that if you want to start a company, you don't need your infrastructure engineers to be building infrastructure.

You can take your infrastructure engineers and they can be building your product. They can be building the things your customers care about, not worrying about database backups. Cool. Thank you all so much. I really appreciate your attention and your time.

This was a lot of fun. If you want any help getting the repo up and running, come find me. I'm happy to see if we can get it working on somebody else's machine. Okay. Thank you all very much.