
Personal AI Meetup - Bee, BasedHardware, LangChain LangFriend, Deepgram EmilyAI


Transcript

So I wanted to kick things off a little bit with some of my personal explorations, and then I'll hand it over to the actual expert, Damien, who actually will be showing how it works under the hood. So if some of you have tried this, did anyone try calling this?

I put this up frequently. How was the experience? Did anyone have an interesting experience? - He told me I told you a fun joke. - He told you a fun joke. Okay, nice. I mean, it basically gives you back whatever you want from it. So you said it's a lot quicker than you thought it would be.

Yeah. Is this a number you can call? Sure. I don't know if you have one phone call you want to call an AI, but whatever floats your boat. So we're going to make this live. Yeah. It's kind of fun. So I was just messing around with VAPI. It's one of these like YC startups, there's like five of these.

So I don't particularly mean this as an endorsement, but they are very, very easy to work with, so I would definitely endorse that part. Sorry? Yeah, VAPI. I think it's like voice API. And so we're just going to create a personal AI. I think I can just kind of create a blank template and we can just call it, I don't know, Latent Space Pod.

I don't know. It doesn't actually matter. And we can do a system prompt, right? Like "you answer in pirate speak" or something. And then we can publish and then we can start calling it. I don't know if the voice actually works. Oh, by the way, we have no volume control on this thing.

So it's going to come over the speakers really loudly and we cannot control that, but it'll be short. For some reason it's not connecting. Oh, it wants to use my microphone. Hello. All right. Let's try calling it again. Is this working? Is this on? Hello? Hey. Hi, it's working, matey.

Your voice be echoing through loud and clear. What can this old seadog do for you today? Oh my God. Can you tell us how to turn down the volume in AWS loft? If you want to adjust the volume, you'll be in a bit of a pickle. It doesn't have a volume control.

I thought this was a super nice experience. You set up a voice thing. You can connect a phone number to it if you buy a phone number. That's the experience that you saw if you call this number. And you can customize it however you like, whatever system message you have.

On my live stream that I did last week, I added memory to this thing. So if you call it back, ideally it should just remember the previous conversation you had. And then you have a personal AI. It doesn't take that long. I just did it one-handed. Yeah, it's pretty great.

It's actually built, I discovered today, on Deepgram, which is Damien's next talk. So I'll hand it over to Damien while he comes over and sets up. Welcome, Damien. Oh, and then Igor, if you want to come up and share your setup. Okay, so I was going to share my personal setup.

Am I the only person in the room recording right now? I see this whole, perfect, cool. Yeah, this is a recording okay meetup, right? Yes. Something I want to do for my conference in June is everything should be default recording and then you opt out. Yeah, absolutely, absolutely. So what I do is I record all my life, constantly, 24/7.

I have two setups, two recorders, and I live in the Netherlands, which allows me to record people without their explicit consent, because it's legal in the Netherlands. If you have any private conversation, you are allowed to record it; you're just not allowed to share it. So that's cool. And yeah, my personal setup is pretty simple.

This is the recorder. It's running all the time. I have quite a number of use cases for it already, but the main goal for me is to create a huge dataset of my life. One year of audio is not that much data, and we can absolutely afford to just save all this stuff and then mine this data for important insights.

For example, I have really interesting and lovely conversations with my friends, and I use it: I just dump them onto my audio player and can listen to the best conversations with my friends on the player. It's incredible. But I also talk a lot to myself when I'm alone, explaining to this future self the context of my life, how my life started, and what is going on.

I also record all my therapy sessions. Well, I record everything, and I'm pretty sure it's something that will help me align an AI with myself, because I will have this huge dataset. I call this interpersonal alignment, because there is the general alignment problem, but I also want AI to know me really, really well, to understand me in the proper context and how to help me succeed.

I think this dataset I created is really valuable. And all of you who are not recording right now, I think you should. Even if you're not going to use it right now, you'll just have this data. There is no good reason not to have this data. It's a recording of my time.

I think you should. If you have your Apple Watch, just open a voice memo and start recording. Start recording conversations you have. It's super easy to do. You can just take your phone out of your pocket and just record stuff. Thank you. To me, that's actually what a meetup should be like, that people bring their stuff and they talk about their passions, what they're working on.

It's not a series of prepared talks, one after another. You actually reminded me, I actually hacked on this iOS shortcut where I can just press my button and it starts recording any conversation I'm having in person. I've actually done this in meetings. When I'm done recording, I can click it.

It transcribes it, saves the file, and then offers me to do a summary right after. So it's highly recommended if you want automation that's simple, that doesn't require walking around with this stuff. I want to share my shortcut as well. My shortcut is, I wrote some code to do it, but when I press my action button on Apple Watch, it also starts recording.

It listens to what I say and saves it locally. When it has a connection, it sends it to the cloud and transcribes it. I see it in my Notion as a list of what's recorded and then the transcription next to it. It works really well. I can even do it on a plane.

It works. Wow. Okay, nice. If you want to share that shortcut with the rest of us, I'd love to steal it. It's really messy, but I do want to do this. I want someone to improve this code, because right now it's just my own rough sample. Feel free to talk to me after this.
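
As a rough sketch of what the server side of a shortcut like these could look like, assuming an endpoint the shortcut POSTs audio to, OpenAI's Whisper API for transcription, and a chat model for the summary (none of this is the speakers' actual setup):

```typescript
// Hypothetical endpoint an Apple Shortcut could POST a recording to:
// transcribe the audio with OpenAI's Whisper API, then summarize it.
import express from "express";
import multer from "multer";
import fs from "fs";
import OpenAI from "openai";

const app = express();
const upload = multer({ dest: "/tmp/uploads" });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post("/recordings", upload.single("audio"), async (req, res) => {
  // 1. Transcribe the uploaded audio file.
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(req.file!.path),
    model: "whisper-1",
  });

  // 2. Summarize the transcript with a chat model.
  const summary = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Summarize this transcript in a few bullet points." },
      { role: "user", content: transcription.text },
    ],
  });

  // The shortcut can drop these straight into Notion, Notes, etc.
  res.json({
    transcript: transcription.text,
    summary: summary.choices[0].message.content,
  });
});

app.listen(3000);
```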

Awesome. Okay, so something a little bit more polished. I think the audio should come out as well when we get to the demo. So hey, everybody. Damien Murphy. I work as an applied engineer at Deepgram. An applied engineer is basically a customer-focused engineer. So we work directly with startups like yourselves, and we help you build voice-enabled apps.

What I'm going to show you today is really around how to build essentially what you saw with Vapi, but using open-source software. So being able to make something like Vapi yourself. Some of the main considerations when you're building a real-time voice bot is performance, accuracy, and cost, and being able to scale.

You want everything with a real-time voice bot. When you called it, you had sub-second response time. So being able to get that sub-second response time is super important. So essentially, if you go beyond, say, 1.5, 2 seconds, a lot of people will actually say something again. They think that the person is no longer there on the other end.

And you need to do that for speech-to-text, the language model, and the text-to-speech. And then on the accuracy, so you want to be able to understand what the person says regardless of their accent. And you want to be able to do that in multiple languages as well. And then on the TTS side, really being able to be human-like, that's the big challenge.

And then cost and scale. So you can build a lot of this stuff with off-the-shelf, open-source software. And probably the text-to-speech stuff won't be fast enough, or the transcription won't be fast enough. But you can actually do a lot of this with managed solutions as well. I'll go into the unit economics towards the end.

So yeah, this is the basic setup. So you have a browser that's going to send audio. Obviously, you need to interact with the browser to capture audio. That's one of the requirements for security reasons. So you'll see a lot of these demos, you have to click a button to actually initiate speech.
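
A minimal sketch of that browser side, assuming a plain WebSocket connection to the voicebot server (the URL and chunk interval are placeholders):

```typescript
// Browser side: capture microphone audio (requires a user gesture, e.g. a button click)
// and stream small chunks to the voicebot server over a WebSocket.
const socket = new WebSocket("wss://voicebot.example.com/audio"); // placeholder URL

async function startCapture(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // MediaRecorder emits compressed chunks we can forward as they arrive.
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm;codecs=opus" });
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(event.data); // raw audio chunk to the voicebot server
    }
  };
  recorder.start(250); // flush a chunk every 250 ms to keep latency low
}

// Browsers only allow microphone access after a user action.
document.querySelector("#talk")?.addEventListener("click", () => void startCapture());
```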

The VoiceBot server, it's actually repeated here multiple times just to kind of simplify things. But you can imagine this as a back and forth. You're doing browser to the VoiceBot server to get the audio, and then the VoiceBot server sending that off. And the goal here is to get sub-second latency.

So we have around 200 millisecond latency. We can get that down a lot lower if you host yourself. So that's actually what Vapi does. They host their own GPUs running our software. And you can crank up the unit economics. You can say, "Hey, you know what? Instead of doing it at this rate, I want to do it at 5x." So you can get that 200 milliseconds down to about 50 milliseconds at the cost of extra GPU compute.

And then GPT-3.5 Turbo or 4, you probably get 400, maybe 600 milliseconds of latency in their hosted API. If you go into Azure and you use their services, you can get that down a lot lower. And then on the text-to-speech side, I'm going to show you using Deepgram's text-to-speech.

There's a lot of other text-to-speech providers out there. And what we try to do is have low latency, human-like, at a really good price point. You can get extremely human-like at about 40x the price, but that's just something to keep in mind. All right. Demo time. We have two tabs open here.

Okay. So hopefully, this audio will play. Good evening. How can Deepgram Aura text-to-speech meet the specific needs of your projects? Modify the sound. You're using my... Yeah, yeah. The sound output. Okay. Sorry. It's going to be loud. We can't control it. I think I may have blown the speakers.

Did you blow the speakers? Try to tell us something. Yeah. I can use just the laptop. Yeah. Okay. Let me do that. Yeah. So turn off the... Yeah. All right. Okay. Let's see if it works. Good evening. Planning to integrate Deepgram Aura text-to-speech into your workflow? Let's discuss how to get started.

Yeah. I want to talk about something else today. I'm actually at an AWS hacker space for a meetup with personal AI. Sounds like a cool meetup. What's on your mind today? Feel free to share your thoughts or questions. Yeah. Could you give me something funny to show in my demo?

Absolutely. How about a joke or a lighthearted anecdote to break the ice during your demo? Laughter always helps. Yeah. That'd be great. How about this one? Why did the computer go to the doctor? Wait, wait. Hang on there a second. A virus. Smile. Feel free to share that. How about this one?

Why did the computer go to the doctor? Because it had a virus. Smile. Feel free to share that with your audience for a good chuckle. Take your time. Yeah. So you can see there, it's pretty low latency. Oh, sorry. It's pretty low latency. And if you want to get the source code, it's all open source.

So you can just grab the whole repo. You can build your own voice bot. And if you want to hook it up to a telephony system, we have that all available in sample code as well. So you can hook it up to a Twilio, buy a number, do those sorts of things.

So let's just jump back into the presentation. Oh, yeah. So the different components. So the speech to text, that's going to be super low latency. If you don't get that accurate, you're going to get the wrong answer from the LLM. And this is the code that you can use.

So we have SDKs, Python, Go, Ruby, .NET. And you can essentially use all of those. This is actually a Node.js SDK. And it's very simple to set up, right? You literally just import it, drop in your API key, and you can listen to those events. So we'll give you back all the text that was actually spoken while it's being spoken.
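
A minimal sketch with the Deepgram Node.js SDK, roughly as described in the talk; the model name and options here are assumptions, so check the SDK docs:

```typescript
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

// Open a live (streaming) transcription connection.
const connection = deepgram.listen.live({
  model: "nova-2",       // assumed model name
  interim_results: true, // get text back while it's still being spoken
  smart_format: true,
});

connection.on(LiveTranscriptionEvents.Open, () => {
  console.log("Deepgram connection open");
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  const text = data.channel.alternatives[0].transcript;
  if (text) {
    // Hand partial transcripts to the LLM step as they arrive.
    console.log("heard:", text);
  }
});

// Send the raw audio packets from the browser/telephony stream as they come in:
// connection.send(audioChunk);
```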

And you just need to send us the raw audio packets. And then on the GPT side, you can swap this out with Claude. We actually have a fork of that repo that uses Claude as well. Claude Haiku is surprisingly good. So if cost is something you want to get down, that's definitely an option.

But some of our customers will actually run their own Llama 2 model super close to the GPUs that are running the speech-to-text and text-to-speech. And that just removes all the network latency out of the equation. So here's a simple example of how you would consume that.

I'm sure you're all pretty familiar with the OpenAI API. And that will basically give you streaming. That's one of the really important things here is you want time to first token as low as possible. The reason for that is if you wait till the last token, you're going to increase that latency.
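
A minimal sketch of consuming the chat completion as a stream, so the first tokens can be handed to text-to-speech immediately (the model name is just an example):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function respond(userText: string): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true, // we care about time-to-first-token, not time-to-last-token
    messages: [
      { role: "system", content: "You are a concise, friendly voice assistant." },
      { role: "user", content: userText },
    ],
  });

  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) {
      // Forward each token (or small sentence fragment) to text-to-speech right away.
      process.stdout.write(token);
    }
  }
}
```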

And then on the last bit, this is the text to speech part. It's a little more tricky. You've got to deal with audio streams. And you're going to want to stream the audio as soon as you get it so that you can actually start playing at the moment the first byte is ready.
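
A minimal sketch of that streaming pattern, assuming Deepgram's /v1/speak REST endpoint and an Aura voice name (both are assumptions worth checking against the current docs):

```typescript
// Stream synthesized audio and start playing as soon as the first bytes arrive.
async function speak(text: string, onAudioChunk: (chunk: Uint8Array) => void): Promise<void> {
  const response = await fetch(
    "https://api.deepgram.com/v1/speak?model=aura-asteria-en", // assumed endpoint + voice
    {
      method: "POST",
      headers: {
        Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text }),
    }
  );

  // Read the audio stream incrementally instead of waiting for the last byte.
  const reader = response.body!.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onAudioChunk(value); // push to the speaker / back down the phone line immediately
  }
}
```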

Just like with the LLM and the last token, if you wait for the last byte of your audio stream, you're going to incur that bandwidth delay. So yeah, if anybody wants the open source repo, go ahead and scan that. Yeah, that's a really cool demo. The latency is great.

What do you think the next frontiers are? You had interruptions, right? What about proactive interruptions: can it interrupt you based on maybe knowing what you're trying to say, like a human would just cut you off, or back-channeling, or overlapping speech, getting more towards human-level dialogue?

Yeah, so the question was, could the AI interrupt the person? And that's definitely possible. I don't think that would happen necessarily at the AI model level. I think that would just be business logic. And so you'll get everything that's spoken as it's spoken. So if you were like, you know what, I think I know what you're going to say, you could preemptively do it.

And I have seen some demos where that's a trick that they use to actually lower the latency is to predict what you're about to say. So then you can fire off an early call to the LLM. This guy's no fun. You have to send a demo, open repo, and everything.

They interrupt, they can have that. Very good. Yeah. And the cost of a lot of these LLM things is a big challenge as well. So if you're constantly sending it to an LLM to achieve these use cases, your cost per minute might go up to like 30 cents. So in this demo here, and these are all kind of list prices, you can get these prices down with volume.

And so if you just signed up today, these are the sorts of prices that you would pay. And just to give you an idea, GPT-3.5 Turbo has dropped in price dramatically over time, and Claude Haiku is even a fraction of this as well. And then on the text-to-speech side, doing something like this with ElevenLabs would be maybe about $1.20, just to give you an idea of comparison.

So you can do a five-minute call here for about six and a half cents. And if you're doing millions of hours of calls, that price can definitely come down. Yeah. So changing that then to be a real-time callable voice bot, like you saw with the VAPI demo, you're essentially just swapping out the browser for this telephony service, right?
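
A rough sketch of that swap, assuming Twilio Media Streams pushing base64-encoded audio frames over a WebSocket; the message shapes are abbreviated, so check Twilio's docs for the exact schema:

```typescript
import { WebSocketServer } from "ws";

// Twilio Media Streams connect here and push ~8 kHz mu-law audio frames as JSON messages.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());

    switch (msg.event) {
      case "start":
        console.log("call started", msg.start?.streamSid);
        break;
      case "media": {
        // Base64-encoded audio payload; forward it into the same
        // STT -> LLM -> TTS pipeline used for the browser demo.
        const audio = Buffer.from(msg.media.payload, "base64");
        // sttConnection.send(audio);
        break;
      }
      case "stop":
        console.log("call ended");
        break;
    }
  });
});
```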

So Twilio has about 100 millisecond latency to give you the audio when you get called. And then you're just sending it through that same system and then just back to the telephony provider. And yeah, so if you sign up today, you get $200 in free credits. For post-call transcription, that's about 750 hours.

For real-time, that's probably about 500 hours of real-time transcription. So it's a pretty big freebie there, so if anybody wants it. And that's it. Any questions? Yeah, go ahead. Just in terms of achieving real-time performance, GPT-3.5 versus 4, how do you- Yeah, 4 is going to be a lot slower, especially if you're using their hosted API endpoint.

You're going to see massive fluctuations, on the order of seconds, in their hosted endpoint. If you go on to Azure and you use their service, you're paying more, but you're getting much better latency. So you could deploy all of this on Azure next to GPT-4, and that's going to give you the sort of latency that you saw in the VAPI demo.

The demo I guess actually is using all hosted APIs, so there's no on-prem kind of setup there. Ethan, Nick, and then I think Harrison just walked in. So I'll just warm you guys up a bit. Thanks to David. David only signed up to speak today, which is a very classic meetup strategy.

So we don't have a screen for this. Do we have a screen? You can share your screen. I can share my screen. All right, so I'm afraid the audio might be shattering your eardrums, so I might have to cut it off, but we can try. You said it was okay at the start.

Yeah, I mean, okay. You can do that on the laptop, no? No, no, there's no laptop. Yeah, sorry. So maybe from the start, what did you work on, why? Yeah, sure. So I think actually it feels good to be here because I feel like I'm with my people. You pretty much summed up the philosophy.

What can AI do for you if it has your whole life in its context and it experiences your life as you do? Wouldn't it be way better? It understands you and all your context. So that's kind of, I think a lot of us here have the same idea. And so that's what I've been working on since about November, when LLaVA came out.

So I first started working on the visual component. It's not just audio. You want it to see what you see. But it's a lot more challenging to get a form factor with continual video capture. So we built a really, really small but simple device that's actually easy to use. I don't know.

I mean, you guys can check it out after the talk. But I think there's a lot of advantages to not having to have an extra piece of hardware to carry around, but at least we try and make it as small and as light as possible. You don't have to charge it every couple of days.

But there's also other subtle reasons why it's good to have an external piece of hardware, which I can show you in a minute. So yeah, this captures all the time. And in fact, I can show you here in the app. So you can see this default. It's got the battery level here.

But here you can see all the conversations I've had. And this is actually an ongoing conversation. So you can see where I am and transcript in real time. We do a bunch of different pipelines. So after the conversation is over, we'll run it through a larger model. And then do the speaker verification so it knows it's me, which is important so it can understand what I'm saying versus other people in the room.

Actually, conversation endpointing, and the Deepgram folks probably know this, even just utterance-level endpointing is a hard problem. When am I stopping talking? Or am I going to keep talking and just pause for a second? That's hard. But then conversation endpointing, when is this a distinct conversation versus another, is even harder.

But that's important because you don't want just one block of dialogue for your whole day. It's much more useful if you can segment it into distinct semantic conversations. So that involves not only voice activity detection, but things like signals around your location, and even the topics that you were talking about, like at the LLM level.
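
As a toy illustration of combining those signals (not their actual pipeline), a conversation-boundary heuristic might look something like this:

```typescript
interface Utterance {
  text: string;
  startMs: number;
  endMs: number;
  location: { lat: number; lon: number };
  topicEmbedding: number[]; // e.g. an embedding of the utterance text
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Crude "did I move somewhere else" check; fine for segmentation purposes.
function distanceMeters(a: Utterance["location"], b: Utterance["location"]): number {
  const dLat = (a.lat - b.lat) * 111_000;
  const dLon = (a.lon - b.lon) * 111_000 * Math.cos((a.lat * Math.PI) / 180);
  return Math.hypot(dLat, dLon);
}

// Heuristic: a long silence, a change of place, or a big topic shift ends the conversation.
function startsNewConversation(prev: Utterance, next: Utterance): boolean {
  const silenceMs = next.startMs - prev.endMs;
  const moved = distanceMeters(prev.location, next.location) > 100;
  const topicShift = cosine(prev.topicEmbedding, next.topicEmbedding) < 0.3;
  return silenceMs > 5 * 60_000 || moved || topicShift;
}
```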

So it's very difficult. There's still a lot of work to do, but it does work. So in the end result, I will get a summary generated, some takeaways, a summary of the atmosphere. Then from the major topics, I'll find some links, and then you still have the raw data.

So that's the foundation layer, is to have something that does the continuous capture, does the basic level of processing, but that's just the base layer. I can query against the whole context of everything it knows about me. I think this is, I was talking to my developer a few days ago.

So here we have, you know, it's through retrieval on all my conversations. This is a conversation I was last talking about, and it will even cite the source. So I can just jump to the actual conversation just a few days ago, or a week ago, or something. And so here we were debugging some web view issues.

So that's just kind of like the basic memory recall use case. I just have maybe one or two more, then I'll turn it back over. So again, that's all kind of the base layer, but like the real, you know, I think everybody here believes that the real future will be using all of this context so that AI can be more proactive, can be more autonomous, because it doesn't need to come to ask you everything.

If you hired new co-workers every day and they were a blank slate, you know, it's way less productive than if they have the whole history. So there's a voice component to it, and this is, I'm a little nervous, because I'm going to blow everybody's ears on that, but we can try.

So we're using hot words here, which I don't believe is the best paradigm, but for now, we have some other ideas, but for now, similar to Alexa or Siri, I can basically inform my AI that I'm giving it a command that it should respond to in voice. So I will just, Scarlett, can you hear me?

Yes, I can receive and understand your messages. How can I assist you today? Okay, that's frightening. I will just now do one example. So like, you know, you have the ability to interact with the internet, and your AI should too, so I can have it go do actions for me using any app.

So I'll just do a very simple example. Scarlett, send a message to Maria on WhatsApp saying hello. One sec, I'm on it. Starting a new agent to send WhatsApp message to Maria. So now this is on my personal WhatsApp account. Maria's right here. She can verify that she receives it.

Message hello sent to Maria on WhatsApp. So maybe one more, and then I'll let you guys go. So like, that's just opening one app and doing something, but can it do multiple apps and have a working memory to remember between app contexts? So Scarlett, find a good taco restaurant in Pacific Heights and send it to Maria.

One sec, I'm on it. Starting a new agent to send taco restaurant details to Maria. So it's opened Google Maps. It's going to try and find a taco restaurant, hopefully remember it once it does, and then send it to Maria, which it learned that I implicitly meant WhatsApp, right? Hopefully, because it picked up that I talk to Maria on WhatsApp.

So going to WhatsApp, pasting the link. The details of Taco Bar, a well-rated taco restaurant in Pacific Heights, San Francisco, have been successfully sent to Maria on WhatsApp. Yeah, so that's basically it. Yeah, that's a great demo. There's got to be some questions. Are the components open source? So, you know, Adam's here, Nick, some really great people in the open source space here.

We open sourced a lot. I really learned a lot about, you know, I've done some minor open source projects myself, you know, mostly I'm just a contributor, but trying to run and launch one was kind of a new experience for me. And I learned that, like, we did not make the developer experience very good.

It was very complicated, like, because we were using, like, local whisper, local models, and, like, getting it to work on CUDA, Mac, Windows, we didn't do a good job. So it was very difficult for people to get started. And so it was a little disappointed with the uptake, you know, and there are much better projects that are way easier to use.

So we've really been focused now on figuring out what the right use cases are and what the right experiences are going to be. And it was, like, really difficult to try and fit everything into an open source project that would be actually used. So yeah, you can see the repo, and I'd recommend Adam's and Nick's too.

But we'll definitely be contributing a lot back more to open source. >> Is it Owl or Bee? >> Owl is the open source repo. >> Yeah, yeah, okay. >> And you should, I'll plug Adam's ADeus and Nick's repo. We can, they can send out the, I don't know what the actual GitHubs are called, but they're easy to find.

But yeah, he's going to talk, so you can show. Yeah. >> Can you talk about the hardware? >> Sure, yeah, yeah. >> Tell us more about it. >> This is, like, V1. So we have another, like, V1.1, which is actually even about 25% smaller, and way better charging situation in terms of wireless charging, so, like, the size down.

But the real thing we're most excited about is, like, the next version with Vision. Like I say, Vision is really hard. I don't know of any device that can do all-day capture of, like, sending video. There's, like, a ton of challenges around power and also bandwidth. But we have some really kind of novel ideas about it.

The, it's Bluetooth Low Energy, which, I mean, I'm sure you've seen other ones that operate that way, and that has a lot of advantages. We also have one that's LTE, and so there's, like, LTE-M is, like, a low-power subset of LTE. It's only, like, two megabits, but it's way more power efficient.

I think Humane and Rabbit are both LTE and Wi-Fi, but to get, like, a wearable, you really need Bluetooth. Just a quick question. I saw the part of the interface where it had a, was that a streaming from iPhone? It's an emulated cloud browser. With that picture-in-picture, how are you doing the part of the interface where you have the Google Maps at the bottom corner?

That's on the cloud, so we're streaming that as feedback. I think the ultimate goal is that that disappears entirely, right? That's actually mainly for the demo to, like, show that it's real and that it's working. But, like, I think ideally, like, it should be totally transparent that you're just, you have, like, a personal AI.

It can do whatever it needs to do, and it will just give you updates on if it, you know, needs more info or its status. But, like, it's kind of just an interim solution until we have, I guess, so sorry. Yes, it is. Okay, last question, and then we have to move on.

I've been experimenting with this app, and you have, like, a pocket here. Yeah, I just put it in my pocket. What I wanted to say about Vision is that, like, I tried to experiment with capturing vision, and, like, the best solution so far I found is, like, to buy a cheap Android smartphone and put it in your pocket here, like, camera outwards.

It's incredible. It has an internet connection. Good battery. It's extremely cheap, but it's incredible. Like, you have this pocket. Yeah, I've had whole demos of doing that, because also, you know, it doesn't put people off as much. Yes, yes, and nobody ever thinks you're an idiot.

Yes, you just have a… I do think, in a privacy sense, maybe slightly differently than you, I do want people to understand, but, like, it was interesting that, yeah, hairstyle, basically. But nobody thinks twice. It's a different thing. You can just write on it, like, "I'm recording." I'm just saying, the convenience of not having to work on the hardware, you can just take the… Sure, sure, yeah.

You do need a front pocket, so maybe there'll be new AI fashion, where it's, like, these are my phone pockets. Okay, give it up for him. That was awesome. Yeah, I mean, everyone, I think, will be sticking around, so you can obviously go up to him and get more insight.

Awesome. All right. Hey, everyone. One second. Oh, yeah, it's connected. Now it is. Nice. So, where do I even start? Yeah, unfortunately, I was kind of not supposed to be here, because I'm organizing a brain-computer interface hackathon tomorrow, and I had to, like, somehow get 50 headsets, which is why there will be no presentation, but I will still try to be as useful to you as possible.

This is the hackathon I was talking about right now. We have, like, lots of people. There will be people from Neuralink. We'll have, like, 50 different BCI headsets and so on. So, if you're interested in, like, BCI stuff, etc., and if you want to attend the hackathon, scan this QR code and mention that you have been here and… Oh, yeah, my bad.

Sorry. And you might get a much better chance of being accepted, because we have, like, a 50% acceptance rate. We try not to accept people who don't have experience. Anyways, so, yeah, what I will try to help with, I honestly really, really love open source, and I believe, like, all this stuff should be open source, which is why in this short demo I will just show you all the current open source projects, and I will try to highlight the most important things you need to know about them, and I will probably first start with Owl, which you have just seen from Ethan.

So, he started that. I think he was, like, one of the first people who started, like, open sourcing any kind of wearables. Probably Adam actually was before, but you announced first, so I remember that. But, yeah, yeah, yeah. Anyways, so, yeah, this is his repo. You can, like, check it out.

I think I have a QR code here opened as well. If not, just give me a sec. I will just generate it quickly for you. All right. Should be, yeah, just scan this QR code, and you can just access his repository. Yeah, so, this is Ethan's. Then there is another one, which, in my opinion, well, it's definitely the biggest one.

It's by the guy who's sitting right there, Adam. I truly believe that Adam is, like, the guy who started the whole open source hardware movement, so at least I started doing everything because of Adam, so thank you for that. And they have a lot of traction here, 2,600 stars, and if you want to kind of, like, ramp up your way into open source wearables of any kind, I suggest starting with this repository.

It's probably the biggest one you'll be able to find right now, and the QR code, you can scan, I think, this one. Yeah, just, like, feel free to scan. I'll send it. Yeah, yeah, cool, cool, cool, cool, awesome. So, they use, I think, Raspberry Pi, also ESP32, which is kind of technical.

You probably don't need this information, but anyways. Yeah, and now, who I am, like, a little bit about myself as well, some marketing. So, my story starts very recently, maybe two months ago, after I saw Ethan's and Adam's launch of, like, their open source hardware stuff, and I launched my own.

It all started with, basically, a cease and desist from Humane, and we launched, like, just for fun, honestly, Pumain. The idea here was, like, you take a picture, and you scan the person's face, and we search the entire internet, and we find the person's profile, like, and send it to you, either, like, via notification, or on the site, and so on.

So, this is how it looks. Started with that, had a lot of contributors, was a fun project, but not really, I mean, like, yeah, we don't really want to bring any harm to Pumain, and so on and so forth, just fun stuff. So, after that, we did, oh, QR for this, I think we will also send, right?

Like, cool, cool, cool, cool, cool. Okay, another one I will promote here a little bit is this one. This, we launched, literally, like, last week. This is pretty much what Adam and Ethan have done, with the only difference that we use right now, the lowest power chip. I think Ethan also uses that.

I just try to, like, you know, like, as soon as possible, to let everyone know that, like, I think it's probably the best opportunity you can currently find in the market. It's called Friend, you can check it out. This is how it looks. I actually have it with me.

And I was supposed to actually show you, like, the live demo as well, but we just did a very cool update, where we made it work with Android, by the way, so it's iOS and Android right now. And also, we updated the quality of speech, literally, three hours ago, and made it, like, five times better.

So, when we launched, I'll be honest, it was, like, completely horrible, and now it's, like, five times better. It's amazing, and I'm really excited by that. I would want to show you the video, but I know that it's, like, you will not want to have your ears, you know, like...

Oh, yeah, really? Oh, nice. Okay, let's try. Anyways, yeah, this is the chip that is being used inside of that wearable, and it works pretty much the same as what Ethan has with... Oh, it doesn't work, right? Next. Wait, wait, wait, wait, wait, let me figure out which one.

Yeah, yeah. You can go back. Okay, let's try. Oh, nice. Okay, cool. Awesome. Start it. Give me the permissions here, and hit next, and scan for devices here, and then select the device, and that should just connect. Okay, I think we're going to start speaking. Yeah, so pretty much testing this for the first time, and let's see if it's able to transcribe what I'm trying to speak, and yeah, pretty much waiting for the first 20 seconds to have finished, and for it to return us the output of the speech from the OpenAI Whisper endpoint, and it says...

Yeah, there we go. Yeah, so as you have seen, it recorded the speech on the lowest power chip probably ever right now accessible in the market, and I'm very excited that we actually made it work, because it was very, very fun. It took, like, a lot of time. Anyways, that's pretty much it, I guess.

I don't really have anything else for you to show. This is the final QR code. I know you'll have all the links, but this one I really, really advise you to scan, because it has basically the collection of, like, all links in one place. So yeah, that's pretty much it.

Use open source. I think it's cool, and let's try to build cool stuff together. That's pretty much it. Thank you. Any questions? Did you record on that? No, because... You're not recording right now. No, no, no. I'm not recording, because we launched the update, like, four hours ago, and I wanted to bring it here, so I broke my thing, and now it's, like, half broken, unfortunately.

But anyways, any questions? Cool. Yeah, go on. What do you think the biggest next challenge is? Biggest next challenge? Yeah, I definitely agree with you that the biggest challenge is, like, adding video and images and so on. It's, like, hard as hell, and I think making the software useful as well is very, very hard.

Like, yeah, we can all do, maybe, like, recording from the device and, like, attach maybe some battery and so on, but, like, how to make it actually, like, sexy, that's hard. Like, how to make it, you know, do actions, how to make it remember everything and so on and so forth, that's the biggest challenge.

So, yeah, but I agree with everything you said. I would have said that the biggest challenge is making people want to wear it. Oh, yeah. Beyond, I don't know, the tech bros or girls specifically, probably. Yeah, Adam, the guy who created the biggest open source thing, he said that the biggest challenge is to make people want it, basically, right?

I mean, same, yes. So, that's one of his suggestions. Yeah, go on. What's been the challenge of reducing latency? What's been the challenge of reducing latency? Honestly, it was just, like, a software issue, because, like, this chip is, like, not that widely used. There's not so much documentation, not so many projects, and so on.

So, it was just, like, a matter of, like, trying other things. And also, it doesn't have huge on-board memory. So, like, how do you store very good quality audio on a very small memory chip, and then send it to the phone and stream it, basically, non-stop?

That was pretty hard. But we solved it, like, five hours ago, and that's pretty much it. Yeah, go on. How did you improve the quality of voice? We just made it work. It's pretty technical, I don't know if you'll want this, but anyways, we used a 4,000 hertz sample rate.

It was, like, super bad, because the memory was too small. And now, we just found a way to, like, compress it and improve it to, like, 16,000 hertz, which is, like, pretty great, which you have heard in the video. So, it can, like, recognize pretty much anything, even, like, multiple people speaking, even if you're over there and the device is here.

So, yeah. Anything else? I think we can leave the... The last one. Go on. How far can the device detect audio? Cool. So, a good range will be, like, you know, two feet from person to person. If you're maybe, like, four feet from each other, it might still pick up the person.

Cool. Thank you all. Cool. So, what I want to talk about is a lot less cool than all this hardware stuff, so I feel a little bit out of place, but my name is Harrison, a co-founder of LangChain. We build, kind of, like, developer tools to make it easy to build LLM applications.

One of the big parts of LLM applications that we're really excited about is the concept of, kind of, like, memory and personalization, and I think it's really important for personal AI, because, you know, hopefully, these systems that we're building remember things about us and remember what we're doing and things like that.

We do absolutely nothing with hardware, so when we're exploring this, we are not building, kind of, like, hardware devices, so we took a more, kind of, like, software approach. I think one of the use cases where this type of, like, personalization and memory is really important is in, kind of, like, a journaling app.

I think, for obvious reasons, when you journal, you expect it to be, kind of, like, a personal experience where you're, kind of, like, sharing your goals and learnings, and I think in a scenario like that, it's really, really important for, if there is an interactive experience for the LLM that you're interacting with to really remember a lot of what you're talking about.

So, this is something we launched, I think, last week, and it's basically a journaling app with some interactive experience, and we're using it as a way to, kind of, like, test out some of the memory functionality that we're working on. So, I want to give a quick walkthrough of this and then maybe just share some high-level thoughts on memory for LLM systems in general.

So, the UX for this that we decided on was you would open up, kind of, like, a new journal, and then you'd basically write a journal entry, and I think this is, kind of, like, a little cheat mode as well, because I think this will encourage people to say more interesting things.

So, I think if you're just taking, like, a regular chatbot, there's a lot of, like, hey, hi, what's up, things like that, and I don't think that's actually interesting to try to, like, remember things about. I think it's more interesting if you talk about personal things, and so let me try this out.

I'm giving the talk right now, and then I can submit this, and then the UX that we have is that a little chat with a companion will open up. Okay, so, yeah, so, right before this, I told you that I was about to give a talk about a journaling app, and so it, kind of, like, remembered that I was going to do all that.

Is there a particular part I'm most excited to share? The memory bit. I don't know. So, this actually worked on the first try, so I was a bit surprised by that, so that's good. And, oh, okay, so, how do you plan to tie in your love for sports with the theme of memory during your talk?

So, before, when I was talking to it, I had mentioned that one of the things that I wanted to talk about was, you know, how a journal app should remember that I like sports, so I guess it remembered that fact as well. Amazing. So, I can end the session.

So, the basic idea there, and again, this is, you know, we're not building this as a real application. We would love to enable other people who are building applications like this. I think the thing that we're interested in is really, like, what does memory look like for applications like this?

And I think you can see a little bit of that if you click on this memory tab here. We have, like, a user profile where we basically kind of, like, show what we learned about a person over time, and then we also have a more, like, semantic thing as well.

So, I could search in things like Europe, and I'm going to Europe kind of, like, after a wedding. I love Italy. And so, basically, there's a few different forms of memory. And if you'll allow me two minutes of kind of just theorizing about memory, we're doing a hackathon tomorrow, and maybe some of you are going to that, sort of signed up.

I don't know if he's actually going to show up. So, very quickly, like, how I think memory is really, really interesting. It's also kind of, like, really vague at a high level. I think, like, there's some state that you're tracking, and then how do you update that state, and how do you use that state?

These are, like, really kind of, like, vague things, and there's a bunch of different forms that it could take. Some examples of kind of, like, yeah. Some examples of memory, like, that a bunch of real apps do right now, like, conversational memory is a very simple but obvious form of memory.

Like, it remembers the previous message that you sent. Like, that is, like, incredibly basic, but I would argue that it can fall into this idea of, like, how is it updated? What's the state? How is it combined? Semantic memory is another kind of similar one, where it's a pretty simple idea.

You take all the memory bits, you throw them into a vector store, and then you fetch the most relevant ones. And then I think one of the early types of memory that we had in LangChain was this knowledge graph memory, where you kind of, like, construct a knowledge graph over time, which is maybe, like, overly complex for some kind of, like, use cases, but really interesting to think about.
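
A toy sketch of that semantic-memory pattern (embed each memory bit, store the vectors, fetch the most relevant ones at query time); this is just the shape of the idea, not LangChain's actual API:

```typescript
interface MemoryItem {
  text: string;
  embedding: number[];
}

const store: MemoryItem[] = [];

// embed() is assumed to call some embeddings API and return a vector.
async function remember(text: string, embed: (t: string) => Promise<number[]>): Promise<void> {
  store.push({ text, embedding: await embed(text) });
}

async function recall(query: string, embed: (t: string) => Promise<number[]>, k = 3) {
  const q = await embed(query);
  const cosine = (a: number[], b: number[]) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  // The top-k most similar memory bits are what get inserted into the prompt.
  return store
    .map((m) => ({ ...m, score: cosine(q, m.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```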

So, LangMem, like, name TBD, is some of the memory things that we're working on, and we kind of wanted to constrain how we're thinking about memory to make it more tractable. So, we're focusing on, like, chat experiences and chat data. We're primarily focused on, like, one-human-to-one-AI conversations, and we thought that flexibility in defining, like, memory schemas and instructions was really important.

Like, one of the things we noticed when talking to a bunch of people was, like, the exact memory that their bot cared about was different based on their application. If they were building, like, a SQL bot, that type of memory would be very different from the journaling app, for example.

So, there's a few different, like, memory types that we're thinking about. All of these are very, like, early on. I think one interesting one is, like, a thread-level memory. An example of this would just be, like, a summary of a conversation. You could then use this. You could extract, like, follow-up items, and then in the journaling app, you could kind of, like, follow up with that in the next conversation.

We actually might have added that. That might be why it's so good at remembering what I talked about in the previous talk. I forget. Another one is this concept of a user profile. It's basically some JSON schema that you can kind of, like, update over time. This is one of the newer ones we've added, which is basically, like, you might want to extract, like, a list of things.

Similarly, like, define a schema, extract it, but it's kind of, like, append only. So, an example could be, like, restaurants that I've mentioned. Maybe you want to extract the name of the restaurant or what city it's in. If you're kind of, like, overwriting, if you put that as part of the user profile and you overwrite that every time, that's a bit tedious, so this is, like, append only, and then we do some stuff with, like, knowledge triplets as well, and that's kind of, like, the semantic bit.
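
As a rough sketch, the memory types just described might be shaped something like this; these are illustrative schemas, not LangMem's actual interface:

```typescript
// Thread-level memory: a rolling summary plus follow-ups extracted from one conversation.
interface ThreadMemory {
  threadId: string;
  summary: string;
  followUps: string[];
}

// User profile: a single JSON object that gets updated (overwritten) over time.
interface UserProfile {
  name?: string;
  interests?: string[];      // e.g. "sports"
  upcomingEvents?: string[]; // e.g. "trip to Europe after a wedding"
}

// Append-only list: define a schema, keep extracting new entries, never overwrite.
interface RestaurantMention {
  name: string;
  city?: string;
  mentionedAt: string; // ISO timestamp
}

// Semantic memory as knowledge triplets.
type Triplet = [subject: string, predicate: string, object: string];
// e.g. ["Harrison", "is_traveling_to", "Europe"]
```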

I think probably the most interesting thing for both of these is maybe, like, how it's, like, fetched. So, I don't know if people are familiar with the generative agents paper that came out of Stanford last summer-ish, but I think one of the interesting things they had was this idea of fetching memories not only based on semantic-ness, but also based on recency and also based on, like, importance, and they'd use an LLM to, like, assign an importance score, and I thought that was really, really novel and enjoyed that a lot.
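
A sketch of that generative-agents-style retrieval score, combining relevance, recency (with exponential decay), and an LLM-assigned importance; the weights and decay rate here are arbitrary choices, not the paper's exact values:

```typescript
interface ScoredMemory {
  text: string;
  relevance: number;   // e.g. cosine similarity to the query, scaled to [0, 1]
  importance: number;  // e.g. an LLM-assigned score, scaled to [0, 1]
  lastAccessed: Date;
}

// Exponential recency decay: memories fade unless they are important or relevant.
function recency(lastAccessed: Date, now: Date, decayPerHour = 0.995): number {
  const hours = (now.getTime() - lastAccessed.getTime()) / 3_600_000;
  return Math.pow(decayPerHour, hours);
}

function retrievalScore(m: ScoredMemory, now: Date): number {
  // Equal weights for simplicity; tune per application.
  return m.relevance + m.importance + recency(m.lastAccessed, now);
}

function topMemories(memories: ScoredMemory[], k = 5): ScoredMemory[] {
  const now = new Date();
  return [...memories]
    .sort((a, b) => retrievalScore(b, now) - retrievalScore(a, now))
    .slice(0, k);
}
```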

And, yeah, that's basically it. So, yeah, lots of questions. Oh, yeah, yeah. I think we're all thinking about the same things, and I think a lot of your approach makes a lot of sense. Same kind of compromises you have to make implicitly, but, like, to give it true knowledge, you talk about, like, the triplets, and, like, how do you think we can get to the point where we can have more of a dense graph rather than just simple propositions?

Because, like, our memory works, and it's all relational to, like, you know, to people, to places, and that's actually important information. And so, like, do you think we'll be able to figure out a simple way to do that, or is it just going to be too hard? Yeah, the honest answer is I don't know.

I think even today, like, if you had that, there's two things. One is, like, constructing that graph, but then the other one is, like, using that in generation. And, like, even today, for most, like, RAG situations, if you combine a knowledge graph, it's often not taking full advantage of it. There's different ways to put a knowledge graph into RAG, and it's very exploratory there, I'd say.

So, I'd say, like, one issue is just even creating that, and then the other one is, like, using that in RAG. So, I think that's, like, a huge, yeah, I don't know. Yeah, we'll see, it's interesting. Yeah, because of all the memory things, I just have, like, a couple points, I think.

No matter which memory option you're using, like, let's say you're using, like, the append-only memory model versus the knowledge graph one, like, eventually, that has to be inserted into context, right? Like, you're picking the relevant sections, and then there might be, like, a semantic way to figure out, okay, which facts should I embed into the context during generation or during the query?

But, like, I was just curious, like, is there work on defining semantic models that don't leverage, like, the reasoning of the model that, sort of, like, let's say you want to build up, like, not just memory, but, like, an understanding of the semantics of the operations, for example, outside of depending on the language model to provide that semantic interpretation on top of whatever memory context you inject?

If I'm understanding correctly, are you asking, like, is there a concept of building up a memory about how the language model should do, like, the generation, aside from just inserting it into the prompt? Yeah, like, you know, the memory is just externalizing; it's basically, like, a cache to get around the token limit, and also to bring things into more attention by transforming the prompt a little bit and injecting instructions about how to treat bits of memory.

Yeah, I was wondering, like, have you, has there been work done to basically decouple the language processing versus, like, the actual, like, operational semantics of using the memory? Yeah, I'm assuming there's, yeah, I think, like, an alternative approach to this, which I think is what you're getting at is, like, rather than doing, kind of, rather than, kind of, like, creating these strings and putting them into the prompt, you could have some kind of, like, attention mechanism which would, like, attend to different parts of previous conversations or something like that.

I'm sure there's, I think, like, I mean, you could take this to the extreme and basically say, like, for, there's a lot of stuff that could fit into, like, even, like, a million token context window or a 10 million token context window. And so, an alternative strategy could just be, like, hey, put everything, all these conversations and all these journals I've ever had into one LLM chat and, kind of, like, let it go from there.

And, yeah, I'm sure people are working on things like that and doing things like some sort of, like, caching to make it easy to, kind of, like, continue that. I don't, I don't know of any details. But, yeah, I think that's a completely valid, completely alternative, kind of, like, approach.

It's also really interesting. I don't think anyone knows how to deal with memory. And so, I think all these different approaches are. >> In a general way. >> Yeah, yeah. >> How do you expect, you know, what's going on in our brains? >> Questions? >> Yeah, so, when I see these systems, and I guess that's a question pretty much for everyone who presented, right?

Like, when I think about these systems, I always think, what does 10 years of memory look like? And a lot of the facts that we remember are either not relevant anymore or probably false. So, how do you think about, like, memory? >> Yeah, I think there absolutely needs to be some sort of memory decay or some sort of, like, invalidating previous memories.

I think it can come in a few forms. So, like, with the generative agents, kind of, like, paper, I think they tackled this by having some sort of recency weighting and then also some sort of importance weighting. So, like, it doesn't matter, like, you know, how long ago it was, there are some memories that I should always remember, right?

But then otherwise, like, you know, I remember what I had for breakfast this morning. I don't remember what I had for breakfast, like, 10 days ago, and maybe I should, but I don't think that's, like, important. So, yeah, I think, like, recency weighting and importance weighting are two really interesting things in the generative agents paper.

Another really interesting paper with a very different approach is MemGPT. So, MemGPT uses a language model to, like, actively, kind of, like, construct memory. So, like, in the flow of a conversation, like, the agent basically will decide whether it should write to, like, short-term memory or long-term memory. I think that's a, yeah, that's a, I think it's actually quite a different approach because I think in one you're having the application, like, actively write to and read from memory, and then in the other one, the one that we're building, it's more in the background, and I think there's pros and cons to both.

But I think with that approach you could potentially have some, like, overwriting of memory or, yeah. >> Cool. One last question. >> Last question. Also, that generative agent small bill paper, amazing. It's, like, one page. They have, like, an exponential time decay every day. Stuff is less relevant. Definitely recommend it.

>> Hey, Harrison. Thank you for the presentation. I just want to ask about a new paper called RAPTOR, I think, and it feels like it's a really cool approach to memory, because sometimes when you want to say something like, "Who am I? Oh, roast me," it's really hard to do with RAG and these types of approaches, but RAPTOR could be a nice way to tackle that.

>> Can you summarize the paper? >> I think it's about, like, doing, like, partial summarization in, like, a free form, and we are trying to experiment with that, but I think you know more about it. I just, like, found out about it a couple days ago.

>> Yeah, so I think the idea is basically for rag, you chunk everything into really small chunks, and then cluster them, and then basically hierarchically summarize them, and then when navigating it, you go down to the different nodes. I hadn't actually thought of aligning it with memory, but that actually makes a ton of sense.

So one of the issues that we haven't really tackled is in this journaling app, if you notice in here, there's a bunch of ones that are really similar, right? And so, like, there's a clear kind of, like, consolidation or update or something procedure that kind of needs to happen that we haven't built in yet, and so I actually love the idea of doing this kind of, like, hierarchical summarization, and maybe, like, collapsing a bunch of these into one, and maybe that runs as some sort of background process that you run every, I don't know, yeah, you run every day, week, whatever, it collapses them, accounts for recency to account for the issue that was brought up earlier around wanting to, like, maybe overwrite things.

Yeah, I had not thought of it at all, but I think that's really interesting. >> That's kind of like sleeping. >> Sorry? >> Like when the brain consolidates memory. >> Yeah, yeah, yeah, yeah, yeah, yeah. >> Cool, I think I've got to wrap it here.

>> Thank you so much, Harrison, and a round of applause for him. >> I think that's the real reason why this is not just a hardware-only meetup. It started as a wearables meetup, but then we added real-time voice, and then we added memory, because they're all components of your personal AI.

So we have 15 minutes until we have to clear out of here. Everyone, it looks like you're sticking around to just chat. If you want to just see the devices and talk to Harrison, go ahead. There are, like, three hackathons spawning from this thing, so I'll send out all the links, but thank you so much, thank you very much.