Personal AI Meetup - Bee, BasedHardware, LangChain LangFriend, Deepgram EmilyAI
00:00:00.000 |
So I wanted to kick things off a little bit with some of my personal explorations, and then I'll 00:00:04.000 |
hand it over to the actual expert, Damien, who actually will be showing how it works under the 00:00:14.400 |
hood. So if some of you have tried this, did anyone try calling this? I put this up frequently. 00:00:20.240 |
How was the experience? Did anyone have an interesting experience? 00:00:24.800 |
- He told you a fun joke. Okay, nice. I mean, it basically gives you back whatever you want from 00:00:29.040 |
it. So you said it's a lot quicker than you thought it would be. Yeah. 00:00:34.160 |
Is this a number you can call? Sure. I don't know if you have one phone call you want to call an AI, 00:00:43.200 |
but whatever floats your boat. So we're going to make this live. Yeah. It's kind of fun. So I was 00:00:51.680 |
just messing around with VAPI. It's one of these like YC startups, there's like five of these. 00:00:57.040 |
So I don't particularly mean this as like an endorsement, but they are very, very easy to 00:01:01.040 |
work with. So I would definitely endorse that. Sorry? Yeah, VAPI. I think it's 00:01:08.240 |
like voice API. And so we're just going to create a personal AI. I think I can just kind of create 00:01:13.520 |
like a blank template and we can just call this like, I don't know, latent space pod. I don't know. 00:01:21.600 |
It doesn't actually matter. And we can do like a system prompt, right? Like you answer in pirates 00:01:30.160 |
speak something. And then we can publish and then we can start calling it. I don't know if the voice 00:01:37.920 |
actually works. Oh, by the way, we have no volume control on this thing. So it's going to come over 00:01:41.840 |
the speakers really loudly and we cannot control that, but it'll be short. For some reason it's 00:01:47.600 |
not connecting. Oh, it wants to use my microphone. Hello. All right. Let's try calling it again. 00:01:55.840 |
Is this working? Is this on? Hello? Hey. Hi, it's working, matey. Your voice be echoing 00:02:07.280 |
through loud and clear. What can this old seadog do for you today? Oh my God. Can you tell us how 00:02:12.720 |
to turn down the volume in AWS loft? If you want to adjust the volume, you'll be in a bit of a 00:02:21.200 |
pickle. It doesn't have a volume control. I thought this was a super nice experience. You 00:02:31.440 |
set up a voice thing. You can connect a phone number to it if you buy a phone number. That's 00:02:35.200 |
the experience that you saw if you call this number. And you can customize it however you 00:02:40.560 |
like, whatever system message you have. On my live stream that I did last week, I added memory 00:02:46.720 |
to this thing. So if you call it back, ideally you should just remember the previous conversation 00:02:52.160 |
you had. And then you have a personal AI. It doesn't take that long. I just did it one handed. 00:02:55.920 |
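The memory trick mentioned here can be as simple as summarizing each call and prepending those summaries to the assistant's system prompt before the next call. A minimal sketch of that idea; the function name and wording are illustrative, not Vapi's actual API:

```python
def build_system_prompt(base_prompt: str, past_summaries: list[str]) -> str:
    """Carry memory across calls by prepending summaries of earlier
    conversations to the assistant's system prompt."""
    if not past_summaries:
        return base_prompt
    memory = "\n".join(f"- {s}" for s in past_summaries)
    return (
        f"{base_prompt}\n\n"
        f"Memory of previous calls with this caller:\n{memory}\n"
        "Refer back to these naturally when relevant."
    )

# After each call, summarize it (e.g. with an LLM) and store the summary;
# before the next call, rebuild the prompt with everything stored so far.
prompt = build_system_prompt(
    "You answer in pirate speak.",
    ["Caller asked for a joke about parrots."],
)
```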
Yeah, it's pretty great. It's actually built, today I discovered it's built on Deepgram, 00:03:02.240 |
which is Damien's next talk. So I'll hand it over to Damien while he comes over and sets up. 00:03:08.160 |
Welcome to Damien. Oh, and then Igor, if you want to come up and share your setup. 00:03:12.480 |
Okay, so I was going to share my personal setup. Am I the only person in the room recording right 00:03:19.920 |
now? I see this whole, perfect, cool. Yeah, this is a recording okay meetup, 00:03:24.720 |
right? Yes. Something I want to do for my conference in June is everything should be 00:03:28.880 |
default recording and then you opt out. Yeah, absolutely, absolutely. So what I do is I record 00:03:35.600 |
all my life, constantly, 24/7. I have two setups, two recorders, and I live in the Netherlands, 00:03:41.120 |
which allows me to record people without their explicit consent because it's legal in the 00:03:45.440 |
Netherlands. If you're part of a private conversation, you are just allowed to record it; you're just not 00:03:50.080 |
allowed to share. So that's cool. And yeah, my personal setup is pretty simple. This is the 00:03:59.920 |
recorder. It's running all the time. I have quite a number of use cases for it already, 00:04:07.760 |
But the main goal for me is to create a huge data set on my life. One year of audio 00:04:15.280 |
is really not that much data, 00:04:19.760 |
and we absolutely can afford just saving all this stuff and then mining this data for important 00:04:28.080 |
insights. For example, I have really interesting and lovely conversations with my friends and I 00:04:32.800 |
use it. I just dump them on my audio player and just can listen to the best conversations with 00:04:38.000 |
my friends on the player. It's incredible. But I also talk a lot to myself when I'm alone and 00:04:45.360 |
I'm just explaining to my future self the context of my life, how my life started, and what is 00:04:51.920 |
going on. I also record all my therapy sessions. Well, I record everything and I'm pretty sure it's 00:04:58.720 |
something that will help me to align an AI with myself because I will have this huge data set. 00:05:05.520 |
I call this interpersonal alignment because there is the big general alignment problem, but I also want 00:05:12.000 |
AI to know me really, really well, to understand me in the proper context and how to help me to 00:05:20.000 |
succeed. I think this data set I created is really valuable. And all of you who are not recording 00:05:25.280 |
right now, I think you should. Even if you're not going to use it right now, you'll just have this 00:05:32.240 |
data. There is no good reason not to have this data. I'm recording 24/7 myself. I think you 00:05:37.840 |
should too. If you have your Apple Watch, just open a voice memo and start recording. Start recording 00:05:42.480 |
conversations you have. It's super easy to do. You can just take your phone out of the pocket 00:05:47.120 |
and just record stuff. Thank you. To me, that's actually what a meetup should be like, that people 00:05:57.840 |
bring their stuff and they talk about their passions, what they're working on. It's not a 00:06:01.760 |
series of prepared talk after talk. You actually reminded me, I actually hacked on this iOS shortcut 00:06:07.280 |
where I can just always press my button and it starts recording anything I have in person. I've 00:06:13.360 |
actually done this in meetings. When I'm done recording, I can click it. It transcribes it, 00:06:18.880 |
saves the file, and then offers me to do a summary right after. So it's highly recommended 00:06:23.760 |
if you want automation that's simple, that doesn't require walking around with this stuff. 00:06:28.720 |
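The shortcut's record, transcribe, summarize flow can be sketched as a small pipeline. The backends are stubbed here so it runs offline; in practice the `transcribe` and `summarize` callables would wrap a hosted STT model and an LLM (those names and the wiring are assumptions, not the actual shortcut):

```python
from typing import Callable

def process_recording(audio_path: str,
                      transcribe: Callable[[str], str],
                      summarize: Callable[[str], str]) -> dict:
    """The shortcut's pipeline: transcribe the saved recording, then ask
    a model for a summary, and return both so they can be saved or shared."""
    transcript = transcribe(audio_path)
    summary = summarize(transcript)
    return {"file": audio_path, "transcript": transcript, "summary": summary}

# Offline stand-ins for a real STT model and an LLM, so the sketch runs as-is.
note = process_recording(
    "meeting.m4a",
    transcribe=lambda path: "We agreed to ship the beta on Friday.",
    summarize=lambda text: text.split(".")[0] + ".",
)
```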
I want to share my shortcut as well. My shortcut is, I wrote some code to do it, but when I press 00:06:33.680 |
my action button on Apple Watch, it also starts recording. It listens to what I say and saves it 00:06:40.640 |
locally. When it's in connection, it sends it to the cloud and transcribes it. I see it in my 00:06:49.600 |
Notion as a list of what's recorded and then transcription next to it. It works really well. 00:06:55.440 |
I can even do it on a plane. It works. Wow. Okay, nice. If you want to share that 00:07:02.960 |
shortcut with the rest of us, I'd love to steal it. It's really messy, but I do want to do this. 00:07:09.760 |
I want someone to improve this code, because it's just a rough sample. 00:07:12.240 |
Feel free to talk to me after this. Awesome. Okay, so something a little bit more polished. 00:07:25.040 |
I think the audio should come out as well when we get to the demo. So hey, everybody. 00:07:38.080 |
Damien Murphy. I work as an applied engineer at Deepgram. What an applied engineer is, 00:07:44.560 |
is it's basically a customer-focused engineer. So we work directly with startups like yourselves, 00:07:50.160 |
and we help you build voice-enabled apps. What I'm going to show you today is really around 00:07:57.520 |
how to build essentially what you saw with Vapi, but using open-source software. So being able to 00:08:04.240 |
make something like Vapi yourself. Some of the main considerations when you're building a real-time 00:08:09.840 |
voice bot are performance, accuracy, and cost, and being able to scale. You want all of that 00:09:16.240 |
with a real-time voice bot. When you called it, you had sub-second response time. So being able 00:08:21.760 |
to get that sub-second response time is super important. So essentially, if you go beyond, 00:08:27.280 |
say, 1.5, 2 seconds, a lot of people will actually say something again. They think that the person is 00:08:32.640 |
no longer there on the other end. And you need to do that for speech-to-text, the language model, 00:08:38.400 |
and the text-to-speech. And then on the accuracy, so you want to be able to understand what the 00:08:42.960 |
person says regardless of their accent. And you want to be able to do that in multiple languages 00:08:48.240 |
as well. And then on the TTS side, really being able to be human-like, that's the big challenge. 00:08:56.640 |
And then cost and scale. So you can build a lot of this stuff with off-the-shelf, 00:09:01.760 |
open-source software. And probably the text-to-speech stuff won't be fast enough, 00:09:06.400 |
or the transcription won't be fast enough. But you can actually do a lot of this with 00:09:11.040 |
managed solutions as well. I'll go into the unit economics towards the end. 00:09:15.760 |
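A useful way to think about the sub-second requirement is as a budget across the three stages. The numbers below are the ballpark hosted-API figures quoted in this talk, not measurements:

```python
# Each stage's time-to-first-output, in milliseconds (ballpark hosted-API
# figures from the talk; self-hosting the STT can cut its share to ~50 ms).
stages_ms = {
    "speech_to_text": 200,
    "llm_first_token": 500,   # GPT-3.5 Turbo hosted API: roughly 400-600 ms
    "tts_first_byte": 200,
}

BUDGET_MS = 1000  # past ~1.5-2 s, callers assume nobody is on the line
total_ms = sum(stages_ms.values())
assert total_ms <= BUDGET_MS, f"over budget: {total_ms} ms"
```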
So yeah, this is the basic setup. So you have a browser that's going to send audio. Obviously, 00:09:22.560 |
you need to interact with the browser to capture audio. That's one of the requirements for security 00:09:28.000 |
reasons. So you'll see a lot of these demos, you have to click a button to actually initiate 00:09:32.720 |
speech. The VoiceBot server, it's actually repeated here multiple times just to kind of 00:09:38.000 |
simplify things. But you can imagine this as a back and forth. You're doing browser to the 00:09:43.600 |
VoiceBot server to get the audio, and then the VoiceBot server sending that off. And the goal 00:09:49.040 |
here is to get sub-second latency. So we have around 200 millisecond latency. We can get that 00:09:55.440 |
down a lot lower if you host yourself. So that's actually what Vapi does. They host their own 00:10:00.720 |
GPUs running our software. And you can crank up the unit economics. You can say, "Hey, you know 00:10:06.880 |
what? Instead of doing it at this rate, I want to do it at 5x." So you can get that 200 milliseconds 00:10:11.680 |
down to about 50 milliseconds at the cost of extra GPU compute. And then with GPT-3.5 Turbo or 4, 00:10:19.760 |
you probably get 400, maybe 600 milliseconds of latency in their hosted API. If you go into Azure 00:10:26.240 |
and you use their services, you can get that down a lot lower. And then on the text-to-speech side, 00:10:30.880 |
I'm going to show you using Deepgram's text-to-speech. There's a lot of other text-to-speech 00:10:35.760 |
providers out there. And what we try to do is have low latency, human-like at a really good 00:10:41.280 |
price point. And you can get extremely human-like at about 40x price. But that's just something to 00:10:48.000 |
keep in mind. All right. Demo time. We have two tabs open here. Okay. So hopefully, this audio 00:10:57.920 |
will play. Good evening. How can Deepgram Aura text-to-speech meet the specific needs of your 00:11:03.600 |
projects? Modify the sound. You're using my... Yeah, yeah. The sound output. Okay. Sorry. It's 00:11:13.200 |
going to be loud. We can't control it. I think I may have blown the speakers. Did you blow the 00:11:20.080 |
speakers? Try to tell us something. Yeah. I can use just the laptop. Yeah. Okay. Let me do that. 00:11:28.320 |
Yeah. So turn off the... Yeah. All right. Okay. Let's see if it works. 00:11:36.000 |
Good evening. Planning to integrate Deepgram Aura text-to-speech into your workflow? 00:11:42.640 |
Let's discuss how to get started. Yeah. I want to talk about something else today. 00:11:46.400 |
I'm actually at an AWS hacker space for a meetup with personal AI. 00:11:50.400 |
Sounds like a cool meetup. What's on your mind today? Feel free to share your thoughts or 00:11:57.120 |
questions. Yeah. Could you give me something funny to show in my demo? Absolutely. How about a joke 00:12:04.800 |
or a lighthearted anecdote to break the ice during your demo? Laughter always helps. Yeah. That'd be 00:12:10.000 |
great. How about this one? Why did the computer go to the doctor? Wait, wait. Hang on there a 00:12:16.560 |
second. A virus. Smile. Feel free to share that. How about this one? Why did the computer go to 00:12:21.200 |
the doctor? Because it had a virus. Smile. Feel free to share that with your audience for a good 00:12:26.240 |
chuckle. Take your time. Yeah. So you can see there, it's pretty low latency. Oh, sorry. It's 00:12:31.200 |
pretty low latency. And if you want to get the source code, it's all open source. So you can just 00:12:41.120 |
grab the whole repo. You can build your own voice bot. And if you want to hook it up to a telephony 00:12:46.960 |
system, we have that all available in sample code as well. So you can hook it up to a Twilio, 00:12:52.240 |
buy a number, do those sorts of things. So let's just jump back into the presentation. 00:12:57.440 |
Oh, yeah. So the different components. So the speech to text, that's going to need to be super low 00:13:07.360 |
latency. And if it's not accurate, you're going to get the wrong answer from the LLM. 00:13:12.240 |
And this is the code that you can use. So we have SDKs, Python, Go, Ruby, .NET. And you can 00:13:20.400 |
essentially use all of those. This is actually a Node.js SDK. And it's very simple to set up, 00:13:26.000 |
right? You literally just import it, drop in your API key, and you can listen to those events. So 00:13:32.320 |
we'll give you back all the text that was actually spoken while it's being spoken. And you just need 00:13:38.240 |
to send us the raw audio packets. And then on the GPT side, you can swap this out with Claude. 00:13:46.560 |
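The demo uses the Node.js SDK, but the underlying live API is a websocket you stream raw audio into while transcript events come back as JSON. A hedged Python sketch of the consuming side; the message shape follows Deepgram's documented live-transcription responses, but check the current docs before relying on it:

```python
import json

# Deepgram's live transcription endpoint: audio goes out over the socket,
# JSON results come back while the speaker is still talking.
LIVE_URL = "wss://api.deepgram.com/v1/listen"

def extract_transcript(message: str) -> str:
    """Pull the recognized text out of one streaming result message."""
    data = json.loads(message)
    return data["channel"]["alternatives"][0]["transcript"]

# A result message shaped like the documented live responses.
sample = json.dumps({
    "is_final": True,
    "channel": {"alternatives": [{"transcript": "hello world",
                                  "confidence": 0.99}]},
})
text = extract_transcript(sample)
```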
We actually have a fork of that repo that uses Claude as well. Claude Haiku is surprisingly good. 00:13:52.480 |
So if cost is something we want to get down, that's definitely an option. But some of our 00:13:58.640 |
customers will actually run their own Llama 2 model super close to the GPUs that are running 00:14:06.000 |
the speech to text and text to speech. And that just removes all the network latency out of the 00:14:10.640 |
equation. So here's a simple example of how you would consume that. I'm sure you're all pretty 00:14:16.160 |
familiar with the OpenAI API. And that will basically give you streaming. That's one of 00:14:21.680 |
the really important things here is you want time to first token as low as possible. The reason for 00:14:27.200 |
that is if you wait till the last token, you're going to increase that latency. And then on the 00:14:32.960 |
last bit, this is the text to speech part. It's a little more tricky. You've got to deal with audio 00:14:39.600 |
streams. And you're going to want to stream the audio as soon as you get it so that you can 00:14:44.480 |
actually start playing at the moment the first byte is ready. Again, with the LLM, if you wait 00:14:50.720 |
for the last token, if you wait for the last byte of your audio stream, you're going to incur that 00:14:55.600 |
bandwidth delay. So yeah, if anybody wants the open source repo, go ahead and scan that. 00:15:02.080 |
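The key discipline on the TTS leg is to hand audio to the player the moment each chunk arrives rather than waiting for the whole response. A toy sketch of that streaming pattern (the chunk source and the player are stand-ins, not a real audio API):

```python
def stream_playback(chunks, play):
    """Forward each audio chunk to the player as soon as it arrives,
    instead of buffering the whole response -- this keeps
    time-to-first-byte low on the text-to-speech leg."""
    bytes_out = 0
    for chunk in chunks:
        play(chunk)             # audible playback starts on the first chunk
        bytes_out += len(chunk)
    return bytes_out

# Two fake 4-byte audio chunks stand in for a network stream.
played = []
total = stream_playback([b"\x00" * 4, b"\x01" * 4], played.append)
```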
Yeah, that's a really cool demo. The latency is great. What do you think the next frontiers are? 00:15:11.040 |
You had interruptions, right? What about can it proactively interrupt you based on maybe it knows 00:15:19.280 |
what you're trying to say, like a human would just cut you off, or back channeling or overlapping 00:15:25.040 |
speech, and more generally getting towards human-level kind of dialogue? Yeah, so the question was, 00:16:30.800 |
could the AI interrupt the person? And that's definitely possible. I don't think that would 00:15:37.040 |
happen necessarily at the AI model level. I think that would just be business logic. 00:15:41.680 |
And so you'll get everything that's spoken as it's spoken. So if you were like, you know what, 00:15:47.600 |
I think I know what you're going to say, you could preemptively do it. And I have seen some 00:15:52.000 |
demos where that's a trick that they use to actually lower the latency is to predict what 00:15:57.280 |
you're about to say. So then you can fire off an early call to the LLM. This guy's no fun. 00:16:04.640 |
You have to send a demo, open repo, and everything. They interrupt, they can have that. 00:16:08.880 |
Very good. Yeah. And the cost of a lot of these LLM things is a big challenge as well. So if 00:16:18.000 |
you're constantly sending it to an LLM to achieve these use cases, your cost per minute might go up 00:16:26.160 |
to like 30 cents. So in this demo here, and these are all kind of list prices, you can get these 00:16:32.560 |
prices down with volume. And so if you just signed up today, these are the sorts of prices that you 00:16:39.920 |
would pay. And just to give you an idea, GPT-3.5 Turbo has dropped in price dramatically over 00:16:47.200 |
time. And so Claude Haiku is even a fraction of this as well. And then on the text-to-speech side, 00:16:54.160 |
doing something like this with ElevenLabs would be about maybe $1.20, just to give you an idea 00:17:00.880 |
of comparison. So you can do a five-minute call here for about six and a half cents. 00:17:05.040 |
And if you're doing millions of hours of calls, that price can definitely come down. 00:17:09.680 |
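The unit economics above reduce to simple per-minute arithmetic. The per-stage prices below are illustrative placeholders chosen to land near the talk's roughly-six-and-a-half-cents figure, not current list prices:

```python
# Illustrative per-minute prices for each stage of the pipeline
# (placeholders, not list prices).
prices_per_min = {
    "speech_to_text": 0.0059,
    "llm": 0.0020,
    "text_to_speech": 0.0050,
}

cost_per_min = sum(prices_per_min.values())   # ~$0.013 per minute
five_min_call = 5 * cost_per_min              # ~$0.065, i.e. ~6.5 cents
```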
Yeah. So changing that then to be a real-time callable voice bot, like you saw with the VAPI 00:17:17.840 |
demo, you're essentially just swapping out the browser for this telephony service, right? So 00:17:23.680 |
Twilio has about 100 millisecond latency to give you the audio when you get called. And then you're 00:17:29.280 |
just sending it through that same system and then just back to the telephony provider. 00:17:34.240 |
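Hooking this up to Twilio typically means answering the inbound call with TwiML that opens a bidirectional media stream to your voice-bot server. The `<Connect><Stream>` verbs are Twilio's; the websocket URL is whatever your server exposes (the hostname here is made up):

```python
def media_stream_twiml(ws_url: str) -> str:
    """TwiML that answers an inbound call by opening a bidirectional
    media stream from Twilio to the voice-bot server."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        f'<Connect><Stream url="{ws_url}" /></Connect>'
        "</Response>"
    )

# Served from the webhook URL configured on the purchased Twilio number.
twiml = media_stream_twiml("wss://voicebot.example.com/audio")
```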
And yeah, so if you sign up today, you get $200 in free credits. For post-call transcription, 00:17:41.600 |
that's about 750 hours. For real-time, that's probably about 500 hours of real-time transcription. 00:17:49.440 |
So it's a pretty big freebie there, so if anybody wants it. And that's it. Any questions? 00:17:58.080 |
Yeah, go ahead. Just in terms of achieving real-time performance, 00:18:01.680 |
GPT-3.5 versus 4, how do you- Yeah, 4 is going to be a lot slower, especially if you're using 00:18:08.800 |
their hosted API endpoint. You're going to see massive multi-second fluctuations in their hosted 00:18:14.960 |
endpoint. If you go on to Azure and you use their service, you're paying more, but you're getting 00:18:21.360 |
much better latency. So you could deploy all of this on Azure next to GPT-4, and that's going to 00:18:29.120 |
give you the sort of latency that you saw in the VAPI demo. The demo I guess actually is using all 00:18:35.120 |
hosted APIs, so there's no on-prem kind of setup there. Ethan, Nick, and then I think Harrison 00:18:45.360 |
just walked in. So I'll just warm you guys up a bit. Thanks to Damien. Damien only signed up to 00:18:53.040 |
speak today, which is a very classic meetup strategy. So we don't have a screen for this. 00:19:02.320 |
Do we have a screen? You can share your screen. I can share my screen. All right, so I'm afraid 00:19:07.680 |
the audio might be shattering your eardrums, so I might have to cut it off, but we can try. 00:19:13.760 |
You said it was okay at the start. Yeah, I mean, okay. You can do that on the laptop, 00:19:19.760 |
no? No, no, there's no laptop. Yeah, sorry. So maybe from the start, what did you work on, 00:19:25.040 |
why? Yeah, sure. So I think actually it feels good to be here because I feel like I'm with my 00:19:29.840 |
people. You pretty much summed up the philosophy. What can AI do for you if it has your whole life 00:19:36.160 |
in this context and it experiences your life as you do? Wouldn't it be way better if it understood 00:19:42.880 |
you with all your context? So that's kind of, I think a lot of us here have the same idea. 00:19:50.160 |
And so that's what I've been working on since about November, when LLaVA came out. So first 00:19:56.720 |
started working on the visual component. It's not just audio. You want it to see what you see. 00:20:01.520 |
But it's a lot more challenging to get a form factor with continual video capture. So 00:20:09.840 |
built a really, really small but simple device that's actually easy to use. 00:20:15.120 |
I don't know. I mean, you guys can check it out after the talk. But I think there's a lot 00:20:25.280 |
of advantages to not having to have an extra piece of hardware to carry around, but at least 00:20:29.120 |
we try and make it as small and as light as possible. You don't have to charge it every 00:20:32.560 |
couple of days. But there's also other subtle reasons why it's good to have an external piece 00:20:38.480 |
of hardware, which I can show you in a minute. So yeah, this captures all the time. And in fact, 00:20:45.200 |
I can show you here in the app. So you can see this default. It's got the battery level here. 00:20:55.440 |
But here you can see all the conversations I've had. And this is actually an ongoing conversation. 00:21:02.240 |
So you can see where I am and transcript in real time. We do a bunch of different pipelines. So 00:21:13.200 |
after the conversation is over, we'll run it through a larger model. And then do the speaker 00:21:20.960 |
verification so it knows it's me, which is important so it can understand what I'm saying 00:21:24.720 |
versus other people in the room. Actually, conversation endpointing, and the Deepgram folks 00:21:30.880 |
probably know this, even just utterance-level endpointing is complicated as hell. When am I 00:21:35.600 |
stopping talking? Or am I going to keep talking and just pause for a second? That's hard. But 00:21:40.560 |
then conversation end pointing, when is this a distinct conversation versus another, is even 00:21:45.680 |
harder. But that's important because you don't want just one block of dialogue for your whole 00:21:50.880 |
day. It's much more useful if you can segment it into distinct semantic conversations. So that 00:21:56.800 |
involves not only voice activity detection, but things like signals around your location, 00:22:03.200 |
and even the topics that you were talking about, like at the LLM level. So it's very difficult. 00:22:08.000 |
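The cheap half of conversation endpointing, silence gaps plus location changes, can be sketched as a heuristic; the topic-level LLM signals described above would sit on top of this. Thresholds and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    start: float      # seconds
    end: float
    location: str     # coarse location label

def segment_conversations(utterances, max_gap_s=300.0):
    """Start a new conversation on a long silence or a location change.
    (The real pipeline adds topic-level signals from an LLM on top.)"""
    conversations, current = [], []
    for u in utterances:
        if current and (u.start - current[-1].end > max_gap_s
                        or u.location != current[-1].location):
            conversations.append(current)
            current = []
        current.append(u)
    if current:
        conversations.append(current)
    return conversations

convs = segment_conversations([
    Utterance(0.0, 10.0, "office"),
    Utterance(20.0, 30.0, "office"),    # short gap, same place -> same conv
    Utterance(1000.0, 1010.0, "cafe"),  # long gap + new place -> new conv
])
```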
There's still a lot of work to do, but it does work. So in the end result, I will get a summary 00:22:14.880 |
generated, some takeaways, a summary of the atmosphere. Then from the major 00:22:26.640 |
topics, I'll find some links, and then you still have the raw data. So that's the foundation layer, 00:22:33.200 |
is to have something that does the continuous capture, does the basic level of processing, 00:22:39.280 |
but that's just the base layer. I can query against the whole context of everything it 00:22:49.360 |
knows about me. I think this is, I was talking to my developer a few days ago. So here we have, 00:23:07.440 |
you know, it's through retrieval on all my conversations. This is a conversation I was 00:23:13.120 |
last talking about, and it will even cite the source. So I can just jump to the actual 00:23:18.080 |
conversation just a few days ago, or a week ago, or something. And so here we were debugging some 00:23:22.800 |
web view issues. So that's just kind of like the basic memory recall use case. I just have 00:23:30.000 |
maybe one or two more, then I'll turn it back over. So again, that's all kind of the base layer, 00:23:37.600 |
but like the real, you know, I think everybody here believes that the real future will be using 00:23:41.680 |
all of this context so that AI can be more proactive, can be more autonomous, because it 00:23:48.800 |
doesn't need to come ask you everything. If you had a new co-worker every day, 00:23:52.720 |
and it's a blank slate, you know, it's way less productive than if they have the whole history. 00:23:58.480 |
So there's a voice component to it, and this is, I'm a little nervous, because I'm going to blow 00:24:03.920 |
everybody's ears on that, but we can try. So we're using hot words here, which I don't believe is the 00:24:11.840 |
best paradigm, but for now, we have some other ideas, but for now, similar to Alexa or Siri, 00:24:17.680 |
I can basically inform my AI that I'm giving it a command that it should respond to in voice. 00:24:32.080 |
Yes, I can receive and understand your messages. How can I assist you today? 00:24:35.760 |
Okay, that's frightening. I will just now do one example. So like, you know, you have the ability 00:24:47.920 |
to interact with the internet, and your AI should too. So I can have it go do actions for me using any app. 00:24:55.120 |
So I'll just do a very simple example. Scarlett, send a message to Maria on WhatsApp saying hello. 00:25:03.920 |
One sec, I'm on it. Starting a new agent to send WhatsApp message to Maria. 00:25:10.320 |
So now this is on my personal WhatsApp account. Maria's right here. 00:25:15.120 |
She can verify that she receives it. Message hello sent to Maria on WhatsApp. 00:25:23.360 |
So maybe one more, and then I'll let you guys go. So like, that's just opening one app and doing 00:25:29.840 |
something, but can it do multiple apps and have a working memory to remember between app contexts? 00:25:36.240 |
So Scarlett, find a good taco restaurant in Pacific Heights and send it to Maria. 00:25:41.920 |
One sec, I'm on it. Starting a new agent to send taco restaurant details to Maria. 00:25:49.600 |
So it's opened Google Maps. It's going to try and find a taco restaurant, 00:25:54.320 |
hopefully remember once it does, and then send it to Maria, which it learned that I implicitly 00:26:03.200 |
meant WhatsApp, right? Hopefully, because it picked up that I talked to Maria on WhatsApp. 00:26:19.120 |
The details of Taco Bar, a well-rated taco restaurant in Pacific Heights, 00:26:23.120 |
San Francisco, have been successfully sent to Maria on WhatsApp. 00:26:28.800 |
Yeah, that's a great demo. There's got to be some questions. 00:26:45.600 |
Components are open source. So there's industry, you know, Adam's here, 00:26:49.360 |
great, Nick, some really great people in the open source space here. We 00:26:53.600 |
open sourced a lot. I really learned a lot about, you know, I've done some minor open source 00:27:01.120 |
projects myself, you know, mostly I'm just a contributor, but trying to run and launch one 00:27:05.280 |
was kind of a new experience for me. And I learned that, like, we did not make the developer experience 00:27:09.920 |
very good. It was very complicated, like, because we were using, like, local whisper, local models, 00:27:15.280 |
and, like, getting it to work on CUDA, Mac, Windows, we didn't do a good job. So it was 00:27:19.120 |
very difficult for people to get started. And so I was a little disappointed with the uptake, 00:27:24.080 |
you know, and there are much better projects that are way easier to use. So really been focused now 00:27:31.120 |
on just trying to focus on figuring out what the right use cases are and what the right experiences 00:27:36.320 |
are going to be. And it was, like, really difficult to try and 00:27:40.080 |
fit everything into an open source project that would be actually used. So 00:27:45.520 |
yeah, you can see the repo, and I'd recommend Adam's and Nick's too. But we'll definitely 00:27:55.440 |
be contributing a lot back more to open source. >> Is it OWL or Bee? 00:27:59.520 |
>> OWL is the open source, the repo. >> Yeah, yeah, okay. 00:28:02.240 |
>> And you should, I'll plug Adam's Adeus and Nick's repo. We can, they can send out the, 00:28:09.920 |
I don't know what the actual GitHubs are called, but they're easy to find. 00:28:14.160 |
But yeah, he's going to talk, so you can show. Yeah. 00:28:19.760 |
>> Can you talk about the hardware? >> Sure, yeah, yeah. 00:28:22.400 |
>> Tell us more about it. >> This is, like, V1. So we have another, 00:28:27.760 |
like, V1.1, which is actually even about 25% smaller, and way better charging situation in 00:28:39.520 |
terms of wireless charging, so, like, the size down. But the real thing we're most excited about 00:28:44.160 |
is, like, the next version with Vision. Like I say, Vision is really hard. I don't know of any 00:28:48.320 |
device that can do all-day capture of, like, sending video. There's, like, a ton of challenges 00:28:53.680 |
around power and also bandwidth. But we have some really kind of novel ideas about it. The, 00:29:00.880 |
it's Bluetooth Low Energy, which, I mean, I'm sure you've seen other ones that operate that 00:29:10.160 |
way, and that has a lot of advantages. We also have one that's LTE, and so there's, like, LTE-M 00:29:16.240 |
is, like, a low-power subset of LTE. It's only, like, two megabits, but it's way more power 00:29:21.280 |
efficient. I think Humane and Rabbit are both LTE and Wi-Fi, but to get, like, a wearable, 00:29:30.160 |
you really need Bluetooth. Just a quick question. I saw the part of the interface where it had a, 00:29:38.400 |
was that a streaming from iPhone? It's an emulated cloud browser. With that picture-in-picture, 00:29:46.400 |
how are you doing the part of the interface where you have the Google Maps at the bottom corner? 00:29:50.800 |
That's on the cloud, so we're streaming that as feedback. I think the ultimate goal 00:29:56.800 |
is that that disappears entirely, right? That's actually mainly for the demo to, like, 00:30:01.200 |
show that it's real and that it's working. But, like, I think ideally, like, it should be totally 00:30:06.320 |
transparent that you're just, you have, like, a personal AI. It can do whatever it needs to do, 00:30:11.200 |
and it will just give you updates on if it, you know, needs more info or its status. But, 00:30:16.240 |
like, it's kind of just an interim solution until we have, I guess, so sorry. 00:30:22.160 |
Yes, it is. Okay, last question, and then we have to move on. 00:30:30.560 |
I've been experimenting with this app, and you have, like, a front pocket here. 00:30:37.440 |
Yeah, I just put it in my pocket. What I wanted to say about Vision is that, 00:30:40.960 |
like, I tried to experiment with capturing Vision, and, like, the best solution so far 00:30:45.040 |
I found is, like, to buy a cheaper Android smartphone and put it in your front pocket here, 00:30:50.080 |
like, camera outwards. It's incredible. It has an internet connection. Good battery. 00:30:54.720 |
Good battery. It's incredible. It's extremely cheap, but it's incredible. Like, 00:30:58.640 |
you have this in your front pocket. Yeah, I've had whole demos of doing that, 00:31:04.000 |
because also, you know, also, it doesn't put people off as much. 00:31:07.440 |
Yes, yes, and nobody ever thinks you're an idiot. Yes, you just have a… 00:31:11.680 |
I do think, in privacy sense, maybe slightly different than you, I do want people to 00:31:17.200 |
understand, but, like, it was interesting that, yeah, hairstyle, basically. But nobody thinks 00:31:26.880 |
twice. It's a different thing. You can just write on it. It's like, I'm recording it. I'm just 00:31:31.200 |
saying, the convenience of not working on the hardware, you can just take the… 00:31:34.320 |
Sure, sure, yeah. You do need a front pocket, so maybe there'll be new AI fashion, where it's, 00:31:40.800 |
like, these are my phone pockets. Okay, give it up for him. That was awesome. 00:31:45.040 |
Yeah, I mean, everyone will, I think, will be sticking around, so you can 00:31:54.320 |
obviously go up to him and get more insight. Awesome. 00:32:00.960 |
All right. Hey, everyone. One second. Oh, yeah, it's connected. Now it is. Nice. 00:32:08.400 |
So, where do I even start? Yeah, unfortunately, I was kind of not supposed to be here, because 00:32:16.000 |
I'm organizing a brain-computer interface hackathon tomorrow, and I had to, like, somehow 00:32:21.360 |
get 50 headsets, which is why there will be no presentation, but I will still try to be as 00:33:27.200 |
useful to you as possible. This is the hackathon I was telling you about just now. We have, like, 00:33:33.280 |
lots of people. There will be people from Neuralink. We'll have, like, 50 different 00:32:37.600 |
BCI headsets and so on. So, if you're interested in, like, BCI stuff, etc., and if you want 00:32:43.840 |
to attend the hackathon, scan this QR code and mention that you have been here and… 00:32:49.440 |
Oh, yeah, my bad. Sorry. And you might get a much better chance of being accepted, because 00:32:56.000 |
we have, like, a 50% acceptance rate. We try not to accept people who don't have experience. 00:33:01.280 |
Anyways, so, yeah, what I will try to help with, I honestly really, really love open 00:33:06.560 |
source, and I believe, like, all this stuff should be open source, which is why now on 00:33:10.560 |
this short demo I will just show you all current open source projects, and I will try to highlight 00:33:16.320 |
most important things you need to know about them, and I will probably first start with 00:33:20.720 |
Owl, which you have just seen by Ethan. So, he started that. I think he was, like, one 00:33:27.840 |
of the first people who started, like, open sourcing any kind of wearables. Probably Adam 00:33:32.080 |
actually was before, but you announced first, so I remember that. But, yeah, yeah, yeah. 00:33:38.080 |
Anyways, so, yeah, this is his repo. You can, like, check it out. I think I have a QR code 00:33:43.040 |
here opened as well. If not, just give me a sec. I will just generate it quickly for you. 00:33:49.440 |
All right. Should be, yeah, just scan this QR code, and you can just access his repository. 00:33:55.280 |
Yeah, so, this is Ethan's. Then there is another one, which, in my opinion, well, 00:34:02.960 |
it's definitely the biggest one. It's by the guy who's sitting right there, Adam. I truly 00:34:08.000 |
believe that Adam is, like, the guy who started this whole open source hardware movement, so at least 00:34:13.280 |
I started doing everything because of Adam, so thank you for that. And they have a lot 00:34:18.640 |
of traction here, 2,600 stars, and if you want to kind of, like, ramp up your way into open source, 00:34:26.240 |
like, wearables of any kind, I suggest starting with this repository. It's probably the biggest 00:34:31.840 |
one you'll be able to find right now, and the QR code, you can scan, I think, this one. Yeah, 00:34:37.120 |
just, like, feel free to scan. I'll send it. Yeah, yeah, cool, cool, cool, cool, awesome. 00:34:41.120 |
So, they use, I think, Raspberry Pi, also ESP32, which is kind of technical. You probably don't 00:34:48.080 |
need this information, but anyways. Yeah, and now, who I am, like, a little bit about myself as well, 00:34:55.040 |
some marketing. So, my story starts very recently, maybe two months ago, after I saw Ethan's and 00:35:01.040 |
Adam's launch of, like, their open source hardware stuff, and I launched my own. It all started, 00:35:07.360 |
basically, with a cease and desist from Humane, and we launched, like, just for fun, honestly, Whomane. 00:35:15.440 |
The idea here was, like, you take a picture, and you scan the person's face, and we searched the 00:35:21.120 |
entire internet, and we find the person's profile, like, and send it to you, while, like, either, 00:35:26.560 |
like, via notification, or on the site, and so on. So, this is how it looks. Started with that, 00:35:31.840 |
had a lot of contributors, was a fun project, but not really, I mean, like, yeah, we don't really 00:35:36.800 |
want to bring any harm to Humane, and so on and so forth, just fun stuff. So, after that, we did, 00:35:43.280 |
oh, QR for this, I think we will also send, right? Like, cool, cool, cool, cool, cool. Okay, 00:35:49.040 |
another one I will promote here a little bit is this one. This, we launched, literally, like, 00:35:54.240 |
last week. This is pretty much what Adam and Ethan have done, with the only difference that we 00:36:01.760 |
use the lowest-power chip available right now. I think Ethan also uses that. I just try to, like, you know, 00:36:06.960 |
like, as soon as possible, to let everyone know that, like, I think it's probably the best 00:36:11.440 |
opportunity you can currently find in the market. It's called Friend, you can check it out. This is 00:36:15.840 |
how it looks. I actually have it with me. And I was supposed to actually show you, like, the live 00:36:22.400 |
demo as well, but we just did a very cool update, where we made it work with Android, by the way, 00:36:30.320 |
so it's iOS and Android right now. And also, we updated the quality of speech, literally, 00:36:37.040 |
three hours ago, and made it, like, five times better. So, when we launched, I'll be honest, 00:36:41.280 |
it was, like, completely horrible, and now it's, like, five times better. It's amazing, 00:36:45.360 |
and I'm really excited by that. I would want to show you the video, but I know that, like, 00:36:50.320 |
you will not want to hurt your ears, you know, like... Oh, yeah, really? Oh, nice. Okay, let's try. 00:36:58.320 |
Anyways, yeah, this is the chip that is being used inside of that wearable, and it works pretty much 00:37:04.960 |
the same as what Ethan has with... Oh, it doesn't work, right? Next. Wait, wait, wait, wait, wait, 00:37:16.800 |
let me figure out which one. Yeah, yeah. You can go back. Okay, let's try. Oh, nice. Okay, cool. 00:37:26.800 |
Start it. Give me the permissions here, and hit next, and scan for devices here, 00:37:35.360 |
and then select the device, and that should just connect. 00:37:41.040 |
Okay, I think we're going to start speaking. Yeah, so pretty much testing this for the first time, 00:37:50.720 |
and let's see if it's able to transcribe what I'm trying to speak, and yeah, pretty much 00:37:58.640 |
waiting for the first 20 seconds to finish, and for it to return us the output of the speech 00:38:09.680 |
from the OpenAI Whisper endpoint, and it says... Yeah, there we go. Yeah, so as you have seen, 00:38:21.920 |
it recorded the speech on the lowest power chip probably ever right now accessible in the market, 00:38:27.360 |
and I'm very excited that we actually made it work, because it was very, very fun. It took, 00:38:33.120 |
like, a lot of time. Anyways, that's pretty much it, I guess. I don't really have anything else for 00:38:38.880 |
you to show. This is the final QR code. I know you'll have all the links, but this one I really, 00:38:44.080 |
really advise you to scan, because it has basically the collection of, like, all links in one place. 00:38:48.800 |
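The flow demoed above (record on the low-power chip, stream the audio to the phone, send it to OpenAI's Whisper endpoint for transcription) can be sketched roughly like this. The packet format and function names are illustrative, not Friend's actual code:

```python
def assemble_stream(chunks):
    """Reassemble streamed audio packets of (sequence_number, payload)
    into one byte stream, dropping duplicates and ordering by sequence."""
    seen = {}
    for seq, payload in chunks:
        seen.setdefault(seq, payload)
    return b"".join(seen[seq] for seq in sorted(seen))


def transcribe(wav_path):
    """Send one recorded segment to OpenAI's Whisper endpoint."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    with open(wav_path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text
```

Batching audio into segments before transcribing would explain the visible wait before the first transcript appears in the demo.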
So yeah, that's pretty much it. Use open source. I think it's cool, 00:38:54.640 |
and let's try to build cool stuff together. That's pretty much it. Thank you. 00:39:02.720 |
Any questions? Did you record on that? No, because... You're not recording right now. 00:39:14.320 |
No, no, no. I'm not recording, because we launched the update, like, four hours ago, 00:39:18.400 |
and I wanted to bring it here, so I broke my thing, and now it's, like, 00:39:22.640 |
half broken, unfortunately. But anyways, any questions? Cool. Yeah, go on. 00:39:31.920 |
What do you think the biggest next challenge? Biggest next challenge? Yeah, I definitely agree 00:39:38.560 |
with you that the biggest challenge is, like, editing video and images and so on. It's, like, 00:39:42.160 |
very hard as hell, and I think to make software useful as well is very, very hard. Like, yeah, 00:39:48.560 |
we can all do, maybe, like, recording from the device and, like, attach maybe some battery and 00:39:52.640 |
so on, but, like, how to make it actually, like, sexy, that's hard. Like, how to make it, you know, 00:39:58.560 |
do actions, how to make it remember everything and so on and so forth, that's the biggest challenge. 00:40:02.960 |
So, yeah, but I agree with everything you said. I would have said that the biggest challenge is 00:40:08.560 |
making people want to wear it. Oh, yeah. I don't know the tech bros or girls, or specific, probably. 00:40:15.040 |
Yeah, Adam, the guy who created the biggest open source thing, he said that the biggest 00:40:21.680 |
challenge is to make people want it, basically, right? I mean, same, yes. So, that's one of his 00:40:27.840 |
suggestions. Yeah, go on. What's been the challenge of reducing latency? 00:40:32.080 |
What's been the challenge of reducing latency? Honestly, it was just, like, a software issue, 00:40:38.880 |
because, like, this chip is, like, not that widely used. There's not so much documentation, 00:40:44.640 |
not so many projects and so on. So, just, like, a matter of, like, trying other things. And also, 00:40:48.640 |
it doesn't have huge on-board memory. So, like, how do you store very good quality audio on a 00:40:54.080 |
chip with very small memory, and then send it to the phone and stream it, 00:41:01.040 |
basically, non-stop? That was pretty hard. But we solved it, like, five hours ago, 00:41:05.680 |
and that's pretty much it. Yeah, go on. How did you improve the quality of voice? 00:41:10.480 |
We just made it work. It's pretty technical, I don't know 00:41:18.880 |
if you'll want this, but anyways, we used a 4,000-hertz sample rate. It was, like, 00:41:23.600 |
super bad, because the memory was too small. And now, we just found a way to compress it 00:41:29.520 |
and improve it to 16,000 hertz, which is, like, pretty great, which you have heard on the video. 00:41:35.600 |
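For scale, the jump from 4 kHz to 16 kHz is easy to put in numbers. Assuming raw 16-bit mono PCM (the talk only mentions sample rates, so bit depth and channel count are assumptions):

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth=16, channels=1):
    """Raw PCM bandwidth for a given sample rate (bit depth and channel
    count are assumptions, not from the talk)."""
    return sample_rate_hz * (bit_depth // 8) * channels


low = pcm_bytes_per_second(4_000)     # 8,000 bytes/s at the launch quality
high = pcm_bytes_per_second(16_000)   # 32,000 bytes/s after the update
```

So the update means buffering and streaming four times as many bytes per second through the same small on-chip memory, which is why compression mattered.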
So, it can, like, recognize pretty much anything, even, like, multiple people speaking, even if you 00:41:39.520 |
will be there and the device will be here. So, yeah. Anything else? I think we can leave the... 00:41:46.400 |
The last one. Go on. How far can the device detect audio? Cool. So, a good range will be, like, 00:41:58.400 |
you know, two feet from person to person, like, good. If you have maybe, like, four feet from 00:42:03.680 |
each other, it might still pick up a person. Cool. Thank you all. Cool. So, what I want to talk 00:42:24.480 |
about is a lot less cool than all this hardware stuff, so I feel a little bit out of place, 00:42:29.280 |
but my name is Harrison, a co-founder of LangChain. We build, kind of, like, 00:42:35.600 |
developer tools to make it easy to build LLM applications. One of the big parts of LLM 00:42:39.440 |
applications that we're really excited about is the concept of, kind of, like, memory and 00:42:43.600 |
personalization, and I think it's really important for personal AI, because, you know, hopefully, 00:42:48.160 |
these systems that we're building remember things about us and remember what we're doing and things 00:42:53.440 |
like that. We do absolutely nothing with hardware, so when we're exploring this, we are not building, 00:42:59.120 |
kind of, like, hardware devices, so we took a more, kind of, like, software approach. 00:43:02.080 |
I think one of the use cases where this type of, like, personalization and memory 00:43:06.320 |
is really important is in, kind of, like, a journaling app. I think, for obvious reasons, 00:43:12.640 |
when you journal, you expect it to be, kind of, like, a personal experience where you're, 00:43:15.920 |
kind of, like, sharing your goals and learnings, and I think in a scenario like that, it's really, 00:43:21.120 |
really important for, if there is an interactive experience for the LLM that you're interacting 00:43:26.480 |
with to really remember a lot of what you're talking about. So, this is something we launched, 00:43:32.640 |
I think, last week, and it's basically a journaling app with some interactive experience, 00:43:38.000 |
and we're using it as a way to, kind of, like, test out some of the memory functionality that 00:43:41.360 |
we're working on. So, I want to give a quick walkthrough of this and then maybe just share 00:43:46.960 |
some high-level thoughts on memory for LLM systems in general. So, the UX for this that we decided 00:43:53.680 |
on was you would open up, kind of, like, a new journal, and then you'd basically write a journal 00:43:58.240 |
entry, and I think this is, kind of, like, a little cheat mode as well, because I think 00:44:02.240 |
this will encourage people to say more interesting things. So, I think if you're just taking, like, 00:44:08.960 |
a regular chatbot, there's a lot of, like, hey, hi, what's up, things like that, and I don't think 00:44:13.840 |
that's actually interesting to try to, like, remember things about. I think it's more interesting 00:44:18.240 |
if you talk about personal things, and so let me try this out. I'm giving the talk right now, 00:44:28.640 |
and then I can submit this, and then the UX that we have is that a little chat with a companion 00:44:34.160 |
will open up. Okay, so, yeah, so, right before this, I told you that I was about to give a talk 00:44:39.440 |
about a journaling app, and so it, kind of, like, remembered that I was going to do all that. 00:44:47.280 |
Is there a particular part I'm most excited to share? The memory bit. I don't know. So, 00:44:54.160 |
this actually worked on the first try, so I was a bit surprised by that, so that's good. 00:44:58.960 |
And, oh, okay, so, how do you plan to tie in your love for sports with the theme of memory during 00:45:05.440 |
your talk? So, before, when I was talking to it, I had mentioned that one of the things that I 00:45:10.560 |
wanted to talk about was, you know, how a journal app should remember that I like sports, so I guess 00:45:15.280 |
it remembered that fact as well. Amazing. So, I can end the session. So, the basic idea there, 00:45:20.480 |
and again, this is, you know, we're not building this as a real application. We would love to 00:45:25.920 |
enable other people who are building applications like this. I think the thing that we're interested 00:45:30.400 |
in is really, like, what does memory look like for applications like this? And I think you can 00:45:35.840 |
see a little bit of that if you click on this memory tab here. We have, like, a user profile 00:45:41.680 |
where we basically kind of, like, show what we learned about a person over time, and then we also 00:45:46.960 |
have a more, like, semantic thing as well. So, I could search in things like Europe, 00:45:51.040 |
and I'm going to Europe kind of, like, after a wedding. I love Italy. And so, basically, 00:45:57.200 |
there's a few different forms of memory. And if you'll allow me two minutes of kind of just 00:46:05.040 |
theorizing about memory, we're doing a hackathon tomorrow, and maybe some of you are going to that, 00:46:10.880 |
sort of signed up. I don't know if he's actually going to show up. So, very quickly, like, how I 00:46:18.240 |
think memory is really, really interesting. It's also kind of, like, really vague at a high level. 00:46:23.040 |
I think, like, there's some state that you're tracking, and then how do you update that state, 00:46:28.400 |
and how do you use that state? These are, like, really kind of, like, vague things, 00:46:33.120 |
and there's a bunch of different forms that it could take. Some examples of kind of, like, 00:46:38.720 |
yeah. Some examples of memory, like, that a bunch of real apps do right now, like, 00:46:46.320 |
conversational memory is a very simple but obvious form of memory. Like, it remembers the previous 00:46:51.600 |
message that you sent. Like, that is, like, incredibly basic, but I would argue that it 00:46:55.600 |
can fall into this idea of, like, how is it updated? What's the state? How is it combined? 00:46:59.920 |
Semantic memory is another kind of similar one, where it's a pretty simple idea. You take all the 00:47:04.720 |
memory bits, you throw them into a vector store, and then you fetch the most relevant ones. And 00:47:08.560 |
then I think one of the early types of memory that we had in LangChain was this knowledge graph 00:47:13.200 |
memory, where you kind of, like, construct a knowledge graph over time, which is maybe, 00:47:16.000 |
like, overly complex for some kind of, like, use cases, but really interesting to think about. 00:47:19.760 |
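As a rough illustration of the semantic-memory idea described here (throw the memory bits into a vector store, fetch the most relevant ones), here is a minimal sketch. The bag-of-words "embedding" is a stand-in for a real embedding model:

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words embedding; a real app would call an embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticMemory:
    """Store memory snippets with embeddings; fetch the closest on query."""

    def __init__(self):
        self.items = []  # list of (text, embedding)

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

This is what makes the journaling app's "search for Europe" lookup work: the query is embedded and matched against stored memories rather than keyword-searched.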
So, LangMem, like, name TBD, is some of the memory things that we're working on, and we 00:47:28.320 |
kind of wanted to constrain how we're thinking about memory to make it more tractable. So, 00:47:33.040 |
we're focusing on, like, chat experiences and chat data. We're primarily focused on, like, 00:47:38.560 |
one human to one AI conversations, and we thought that flexibility in defining, like, memory schemas 00:47:44.400 |
and instructions was really important. Like, one of the things we noticed when talking to a bunch 00:47:47.600 |
of people was, like, the exact memory that their bot cared about was different based on their 00:47:52.240 |
application. If they were building, like, a SQL bot, that type of memory would be very different 00:47:56.320 |
from the journaling app, for example. So, there's a few different, like, memory types that we're 00:48:01.440 |
thinking about. All of these are very, like, early on. I think one interesting one is, like, 00:48:06.240 |
a thread-level memory. An example of this would just be, like, a summary of a conversation. You 00:48:10.320 |
could then use this. You could extract, like, follow-up items, and then in the journaling 00:48:14.720 |
app, you could kind of, like, follow up with that in the next conversation. We actually might have 00:48:18.320 |
added that. That might be why it's so good at remembering what I talked about in the previous 00:48:21.680 |
talk. I forget. Another one is this concept of a user profile. It's basically some JSON schema 00:48:27.840 |
that you can kind of, like, update over time. This is one of the newer ones we've added, 00:48:34.640 |
which is basically, like, you might want to extract, like, a list of things. Similarly, 00:48:39.520 |
like, define a schema, extract it, but it's kind of, like, append only. So, an example could be, 00:48:44.160 |
like, restaurants that I've mentioned. Maybe you want to extract the name of the restaurant or what 00:48:48.080 |
city it's in. If you're kind of, like, overwriting, if you put that as part of the user profile and 00:48:53.040 |
you overwrite that every time, that's a bit tedious, so this is, like, append only, and then 00:48:57.280 |
we do some stuff with, like, knowledge triplets as well, and that's kind of, like, the semantic bit. 00:49:01.120 |
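A minimal sketch of the two shapes just described, an overwritten user profile versus an append-only list. The field names and the example restaurant are illustrative, not LangMem's actual schema:

```python
class ProfileMemory:
    """Fixed-schema state; each new extraction overwrites the old value."""

    def __init__(self, schema):
        self.state = dict(schema)

    def update(self, patch):
        self.state.update(patch)


class AppendOnlyMemory:
    """Extracted items accumulate; nothing is ever overwritten."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


profile = ProfileMemory({"name": None, "likes_sports": None})
profile.update({"name": "Harrison"})
profile.update({"likes_sports": True})  # overwrite-in-place semantics

restaurants = AppendOnlyMemory()
restaurants.add({"name": "Some Trattoria", "city": "Rome"})  # hypothetical item
```

The append-only shape avoids exactly the tedium mentioned above: a growing list (restaurants, follow-ups) never has to be re-serialized into the profile on every update.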
I think probably the most interesting thing for both of these is maybe, like, how it's, like, 00:49:06.480 |
fetched. So, I don't know if people are familiar with the generative agents paper that came out of 00:49:11.360 |
Stanford last summer-ish, but I think one of the interesting things they had was this idea of 00:49:16.240 |
fetching memories not only based on semantic-ness, but also based on recency and also based on, 00:49:21.520 |
like, importance, and they'd use an LLM to, like, assign an importance score, 00:49:24.400 |
and I thought that was really, really novel and enjoyed that a lot. And, yeah, that's basically 00:49:31.760 |
it. So, yeah, lots of questions. Oh, yeah, yeah. I think they're all about all the same things, 00:49:38.000 |
and I think a lot of your approach makes a lot of sense. Same kind of compromise you have to 00:49:44.000 |
make implicit, but, like, to give a true knowledge, you talk about, like, the triplets, and, like, 00:49:50.400 |
how do you think that we can get to the point where we can have more of a dense graph rather 00:49:54.960 |
than just simple propositions about it? Because, like, our memory works, and it's all relational 00:50:00.080 |
to, like, you know, to people, to places, and that's actually important information. 00:50:04.560 |
And so, like, do you think we'll be able to figure out a simple way to do that, or is it 00:50:08.960 |
just going to be too hard? Yeah, the honest answer is I don't know. I think even today, 00:50:13.280 |
like, if you had that, and I think, like, there's two things. Like, one's, like, constructing that 00:50:18.640 |
graph, but then the other one's, like, using that in generation, and, like, even today, like, 00:50:23.520 |
for most, like, RAG situations, combining, if you combine a knowledge graph, it's often not 00:50:29.040 |
taking advantage of, like, it's really, like, it's, there's different ways to put a knowledge 00:50:36.160 |
graph into RAG, and it's not, yeah, it's very exploratory there, I'd say. So, I'd say, like, 00:50:41.440 |
one issue is just even creating that, and then the other one is, like, using that in RAG. So, 00:50:46.320 |
I think that's, like, a huge, yeah, I don't know. Yeah, we'll see, it's interesting. 00:50:52.000 |
Yeah, because of all the memory things, I just have, like, a couple points, I think. 00:51:01.760 |
No matter which memory option you're using, like, let's say you're using, like, 00:51:10.720 |
the append-only memory model versus the knowledge graph one, like, eventually, that has to be 00:51:15.200 |
inserted into context, right? Like, you're picking the relevant sections, and then there might be, 00:51:21.680 |
like, a semantic way to figure out, okay, which ones, which facts should I embed into the context 00:51:28.400 |
during generation or during the query? But, like, I was just curious, like, is there work on 00:51:36.480 |
defining semantic models that don't leverage, like, the reasoning of the model that, sort of, like, 00:51:44.080 |
let's say you want to build up, like, not just memory, but, like, an understanding of the 00:51:50.720 |
semantics of the operations, for example, outside of depending on the language model to provide 00:52:03.200 |
that semantic interpretation on top of whatever memory context you inject? 00:52:07.600 |
If I'm understanding correctly, are you asking, like, is there a concept of building up a memory 00:52:14.880 |
about how the language model should do, like, the generation aside from just inserting it into the 00:52:18.800 |
prompt? Yeah, like, you know, the memory is just externalizing; it's basically, like, a cache 00:52:27.680 |
to, like, get around the token limit, and also bring things into more attention by 00:52:32.640 |
transforming the prompt a little bit and injecting instructions about how to treat bits of memory. 00:52:40.160 |
Yeah, I was wondering, like, have you, has there been work done to 00:52:46.800 |
basically decouple the language processing versus, like, the actual, like, operational 00:52:57.680 |
semantics of using the memory? Yeah, I'm assuming there's, yeah, I think, like, an 00:53:07.600 |
alternative approach to this, which I think is what you're getting at is, like, rather than doing, 00:53:11.680 |
kind of, rather than, kind of, like, creating these strings and putting them into the prompt, 00:53:15.600 |
you could have some kind of, like, attention mechanism which would, like, attend to different 00:53:19.520 |
parts of previous conversations or something like that. I'm sure there's, I think, like, I mean, 00:53:25.200 |
you could take this to the extreme and basically say, like, for, there's a lot of stuff that could 00:53:30.160 |
fit into, like, even, like, a million token context window or a 10 million token context window. 00:53:34.880 |
And so, an alternative strategy could just be, like, hey, put everything, all these conversations 00:53:40.640 |
and all these journals I've ever had into one LLM chat and, kind of, like, let it go from there. 00:53:44.880 |
And, yeah, I'm sure people are working on things like that and doing things like some sort of, 00:53:52.400 |
like, caching to make it easy to, kind of, like, continue that. I don't, I don't know of any 00:53:58.800 |
details. But, yeah, I think that's a completely valid, completely alternative, kind of, like, 00:54:03.040 |
approach. It's also really interesting. I don't think anyone knows how to deal with memory. And 00:54:07.520 |
so, I think all these different approaches are valid. >> In a general way that 00:54:14.720 |
reflects, you know, what's going on in our brains? >> Questions? 00:54:19.600 |
>> Yeah, so, when I see these systems, and I guess that's a question pretty much for everyone who 00:54:29.440 |
presented, right? Like, when I think about these systems, I always think, what does 10 years of 00:54:34.720 |
memory look like? And a lot of the facts that we remember are either not relevant anymore or 00:54:42.560 |
probably false. So, how do you think about, like, memory? >> Yeah, I think there absolutely 00:54:48.560 |
needs to be some sort of memory decay or some sort of, like, invalidating previous memories. 00:54:52.800 |
I think it can come in a few forms. So, like, with the generative agents, kind of, like, 00:54:56.640 |
paper, I think they tackled this by having some sort of recency weighting and then also some sort 00:55:02.880 |
of importance weighting. So, like, it doesn't matter, like, you know, how long ago it was, 00:55:07.360 |
there are some memories that I should always remember, right? But then otherwise, like, you 00:55:11.680 |
know, I remember what I had for breakfast this morning. I don't remember what I had for breakfast, 00:55:15.440 |
like, 10 days ago, and maybe I should, but I don't think that's, like, important. So, yeah, I think, 00:55:15.440 |
like, recency weighting and importance weighting are two really interesting things in the generative 00:55:24.880 |
AI or the generative agents paper. Another really interesting paper with a very different approach 00:55:30.720 |
is MemGPT. So, MemGPT uses a language model to, like, actively, kind of, like, construct memory. 00:55:37.520 |
So, like, in the flow of a conversation, like, the agent basically will decide whether it should 00:55:42.720 |
write to, like, short-term memory or long-term memory. I think that's a, yeah, that's a, I think 00:55:48.800 |
it's actually quite a different approach because I think in one you're having the application, like, 00:55:52.880 |
actively write to and read from memory, and then in the other one, the one that we're building, 00:55:57.840 |
it's more in the background, and I think there's pros and cons to both. But I think with that 00:56:02.080 |
approach you could potentially have some, like, overwriting of memory or, yeah. 00:56:06.000 |
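A toy version of that contrast: in the MemGPT-style approach the application explicitly writes to short-term versus long-term memory in the flow of the conversation. Here a simple capacity rule stands in for the LLM's own decision about what to keep in context:

```python
class AgentMemory:
    """MemGPT-style sketch: explicit short-term (in-context) and
    long-term (archival) stores, with eviction instead of forgetting."""

    def __init__(self, short_term_capacity=3):
        self.short_term = []  # stays in the prompt
        self.long_term = []   # searched on demand
        self.capacity = short_term_capacity

    def write(self, message, important=False):
        if important:
            # The agent decides this should go straight to long-term memory.
            self.long_term.append(message)
        self.short_term.append(message)
        # Evict overflow to long-term memory instead of dropping it.
        while len(self.short_term) > self.capacity:
            self.long_term.append(self.short_term.pop(0))
```

The contrast with the background approach is in who calls `write`: here the agent does it actively mid-conversation, whereas a background system extracts memories after the fact.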
>> Cool. One last question. >> Last question. Also, that generative 00:56:12.560 |
agents Smallville paper, amazing. It's, like, one page. They have, like, an exponential time decay 00:56:18.480 |
every day. Stuff is less relevant. Definitely recommend it. 00:56:21.760 |
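The retrieval rule from the generative agents paper (semantic relevance plus LLM-assigned importance plus recency, with recency as an exponential decay) can be sketched like this. The half-life and the equal weighting of the three terms are illustrative choices, not the paper's exact constants:

```python
def retrieval_score(relevance, importance, age_days, half_life_days=7.0):
    """Combine the three retrieval signals: semantic relevance to the
    query, LLM-assigned importance, and recency, where recency decays
    exponentially with the memory's age."""
    recency = 0.5 ** (age_days / half_life_days)  # halves every half-life
    return relevance + importance + recency
```

With this shape, an unimportant memory fades as it ages, while a high-importance memory keeps a floor on its score no matter how old it is, which matches the "some memories I should always remember" point above.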
>> Hey, Harrison. Thank you for the presentation. I just want to ask about there was a new paper 00:56:27.360 |
called RAPTOR, I think, and it feels like it's a really cool approach to memory because sometimes 00:56:34.560 |
when you want to say something like, who am I? Oh, roast me. It's really hard to do with RAG and 00:56:40.400 |
these types of approaches, but RAPTOR could be a nice way to tackle that. 00:56:45.040 |
>> Can you summarize the paper? >> I think it's about, like, doing, like, 00:56:51.600 |
partial summarization in, like, a free form, and we are trying to experiment with that, 00:56:57.760 |
and it seems like, but I think you know more about it. I just, like, found out about it a 00:57:01.840 |
couple days ago. >> Yeah, so I think the idea is basically 00:57:04.960 |
for RAG, you chunk everything into really small chunks, and then cluster them, and then basically 00:57:11.040 |
hierarchically summarize them, and then when navigating it, you go down to the different nodes. 00:57:16.000 |
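The RAPTOR recipe summarized here (chunk, cluster, summarize, recurse into a tree) in miniature; `summarize` and `cluster` are stand-ins for an LLM call and embedding-based clustering:

```python
def summarize(texts):
    # Placeholder for an LLM summarization call.
    return " / ".join(texts)


def cluster(texts, size=2):
    # Placeholder for clustering over embeddings; here, fixed-size groups.
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def build_tree(chunks):
    """Recursively cluster and summarize until one root summary remains.
    levels[0] holds the raw chunks, levels[-1] the single root summary."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        summaries = [summarize(group) for group in cluster(levels[-1])]
        levels.append(summaries)
    return levels
```

Applied to journal memories, a background job could periodically rebuild this tree, which is exactly the consolidation of near-duplicate memories discussed next.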
I hadn't actually thought of aligning it with memory, but that actually makes a ton of sense. 00:57:19.200 |
So one of the issues that we haven't really tackled is in this journaling app, 00:57:23.200 |
if you notice in here, there's a bunch of ones that are really similar, right? And so, like, 00:57:29.920 |
there's a clear kind of, like, consolidation or update or something procedure that kind of 00:57:34.560 |
needs to happen that we haven't built in yet, and so I actually love the idea of doing this kind of, 00:57:39.040 |
like, hierarchical summarization, and maybe, like, collapsing a bunch of these into one, 00:57:45.120 |
and maybe that runs as some sort of background process that you run every, I don't know, 00:57:50.960 |
yeah, you run every day, week, whatever, it collapses them, accounts for recency to account 00:57:56.480 |
for the issue that was brought up earlier around wanting to, like, maybe overwrite things. 00:58:00.400 |
Yeah, I think that's a, I had not thought of it at all, but I think that's really interesting. 00:58:04.880 |
>> Like when the brain consolidates memory. >> Yeah, yeah, yeah, yeah, yeah, yeah. 00:58:08.640 |
>> Cool, I think I've got to wrap it here. >> Thank you so much, Harrison, 00:58:12.720 |
and a round of applause for him. >> I think that's the real reason why 00:58:19.680 |
this is not just a hardware-only meetup. It started as a wearables meetup, but then we added 00:58:23.920 |
real-time voice, and then we added memory, because it's all components of your personal AI. 00:58:29.600 |
So we have 15 minutes until we have to clear out of here. Everyone is, 00:58:32.960 |
it looks like you're sticking around to just chat. If you want to just see the devices and 00:58:36.960 |
talk to Harrison, go ahead, I guess. There are, like, three hackathons spawning 00:58:41.520 |
from this thing, so I'll send all the links on Google, but thank you so much, thank you very much.