
Personal AI Meetup - Bee, BasedHardware, LangChain LangFriend, Deepgram EmilyAI


Whisper Transcript

00:00:00.000 | So I wanted to kick things off a little bit with some of my personal explorations, and then I'll
00:00:04.000 | hand it over to the actual expert, Damien, who actually will be showing how it works under the
00:00:14.400 | hood. So if some of you have tried this, did anyone try calling this? I put this up frequently.
00:00:20.240 | How was the experience? Did anyone have an interesting experience?
00:00:23.120 | - It told me a fun joke.
00:00:24.800 | - It told you a fun joke. Okay, nice. I mean, it basically gives you back whatever you want from
00:00:29.040 | it. So you said it's a lot quicker than you thought it would be. Yeah.
00:00:34.160 | Is this a number you can call? Sure. I don't know if you have one phone call you want to call an AI,
00:00:43.200 | but whatever floats your boat. So we're going to make this live. Yeah. It's kind of fun. So I was
00:00:51.680 | just messing around with VAPI. It's one of these like YC startups, there's like five of these.
00:00:57.040 | So I don't particularly mean this as an endorsement, but they are very, very easy to
00:01:01.040 | work with. So I was pretty, I would definitely endorse that. Sorry? Yeah, VAPI. I think it's
00:01:08.240 | like voice API. And so we're just going to create a personal AI. I think I can just kind of create
00:01:13.520 | like a blank template and we can just call this like, I don't know, latent space pod. I don't know.
00:01:21.600 | It doesn't actually matter. And we can do like a system prompt, right? Like you answer in pirates
00:01:30.160 | speak something. And then we can publish and then we can start calling it. I don't know if the voice
00:01:37.920 | actually works. Oh, by the way, we have no volume control on this thing. So it's going to come over
00:01:41.840 | the speakers really loudly and we cannot control that, but it'll be short. For some reason it's
00:01:47.600 | not connecting. Oh, it wants to use my microphone. Hello. All right. Let's try calling it again.
00:01:55.840 | Is this working? Is this on? Hello? Hey. Hi, it's working, matey. Your voice be echoing
00:02:07.280 | through loud and clear. What can this old seadog do for you today? Oh my God. Can you tell us how
00:02:12.720 | to turn down the volume in AWS loft? If you want to adjust the volume, you'll be in a bit of a
00:02:21.200 | pickle. It doesn't have a volume control. I thought this was a super nice experience. You
00:02:31.440 | set up a voice thing. You can connect a phone number to it if you buy a phone number. That's
00:02:35.200 | the experience that you saw if you call this number. And you can customize it however you
00:02:40.560 | like, whatever system message you have. On my live stream that I did last week, I added memory
00:02:46.720 | to this thing. So if you call it back, ideally you should just remember the previous conversation
00:02:52.160 | you had. And then you have a personal AI. It doesn't take that long. I just did it one handed.
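Conceptually, the whole setup described above is just a configuration payload: a name, a voice, and a system prompt. A minimal sketch in Python (the field and provider names here are assumptions for illustration, not Vapi's actual schema):

```python
# Hypothetical payload builder for a Vapi-style voice assistant.
# The exact field names a real provider expects are an assumption here;
# check the provider's API docs before using this shape.
def build_assistant_payload(name, system_prompt, voice="default"):
    """Assemble the JSON body you would POST to create a voice assistant."""
    return {
        "name": name,
        "voice": voice,
        "model": {
            "provider": "openai",      # assumed provider identifier
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "system", "content": system_prompt},
            ],
        },
    }

payload = build_assistant_payload(
    "latent-space-pod",
    "You answer everything in pirate speak.",
)
```

In practice you would send this payload to the provider's assistant-creation endpoint with your API key, then attach a purchased phone number to it.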
00:02:55.920 | Yeah, it's pretty great. It's actually built, today I discovered it's built on Deepgram,
00:03:02.240 | which is Damien's next talk. So I'll hand it over to Damien while he comes over and sets up.
00:03:08.160 | Welcome to Damien. Oh, and then Igor, if you want to come up and share your setup.
00:03:12.480 | Okay, so I was going to share my personal setup. Am I the only person in the room recording right
00:03:19.920 | now? I see this whole, perfect, cool. Yeah, this is a recording okay meetup,
00:03:24.720 | right? Yes. Something I want to do for my conference in June is everything should be
00:03:28.880 | default recording and then you opt out. Yeah, absolutely, absolutely. So what I do is I record
00:03:35.600 | all my life, constantly, 24/7. I have two setups, two recorders, and I live in the Netherlands,
00:03:41.120 | which allows me to record people without their explicit consent, because it's legal in the
00:03:45.440 | Netherlands. If you have any private conversation, you are allowed to record it; you're just not
00:03:50.080 | allowed to share it. So that's cool. And yeah, my personal setup is pretty simple. This is the
00:03:59.920 | recorder. It's running all the time. I have quite a number of use cases for it already,
00:04:07.760 | and the main goal for me is to create a huge data set on my life. One year of
00:04:15.280 | audio is less than a terabyte of data,
00:04:19.760 | and we absolutely can afford just saving all this stuff and then mining this data for important
00:04:28.080 | insights. For example, I have really interesting and lovely conversations with my friends and I
00:04:32.800 | use it. I just dump them on my audio player and just can listen to the best conversations with
00:04:38.000 | my friends on the player. It's incredible. But I also talk a lot to myself when I'm alone and
00:04:45.360 | I'm just explaining to my future self the context of my life, how my life started, and what is
00:04:51.920 | going on. I also record all my therapy sessions. Well, I record everything, and I'm pretty sure it's
00:04:58.720 | something that will help me to align an AI with myself, because I will have this huge data set.
00:05:05.520 | I call this interpersonal alignment because there is this gross alignment problem, but I also want
00:05:12.000 | AI to know me really, really well, to understand me in the proper context and how to help me to
00:05:20.000 | succeed. I think this data set I created is really valuable. And all of you who are not recording
00:05:25.280 | right now, I think you should. Even if you're not going to use it right now, you'll just have this
00:05:32.240 | data. There is no good reason not to have this data. It's recording-okay meetup time. I think you
00:05:37.840 | should. If you have your Apple Watch, just open Voice Memos and start recording. Start recording
00:05:42.480 | conversations you have. It's super easy to do. You can just take your phone out of the pocket
00:05:47.120 | and just record stuff. Thank you. To me, that's actually what a meetup should be like, that people
00:05:57.840 | bring their stuff and they talk about their passions, what they're working on. It's not a
00:06:01.760 | series of prepared talk after talk. You actually reminded me, I actually hacked on this iOS shortcut
00:06:07.280 | where I can just always press my button and it starts recording anything I have in person. I've
00:06:13.360 | actually done this in meetings. When I'm done recording, I can click it. It transcribes it,
00:06:18.880 | saves the file, and then offers me to do a summary right after. So it's highly recommended
00:06:23.760 | if you want automation that's simple, that doesn't require walking around with this stuff.
00:06:28.720 | I want to share my shortcut as well. My shortcut is, I wrote some code to do it, but when I press
00:06:33.680 | my action button on Apple Watch, it also starts recording. It listens to what I say and saves it
00:06:40.640 | locally. When it's in connection, it sends it to the cloud and transcribes it. I see it in my
00:06:49.600 | Notion as a list of what's recorded and then transcription next to it. It works really well.
00:06:55.440 | I can do it on the phone on a plane. It works. Wow. Okay, nice. If you want to share that
00:07:02.960 | shortcut with the rest of us, I'd love to steal it. It's really messy, but I do want to do this.
00:07:09.760 | I want someone to improve this code because I'm working on my sample.
00:07:12.240 | Feel free to talk to me after this. Awesome. Okay, so something a little bit more polished.
00:07:25.040 | I think the audio should come out as well when we get to the demo. So hey, everybody.
00:07:38.080 | Damian Murphy. I work as an applied engineer at Deepgram. What an applied engineer is,
00:07:44.560 | is it's basically a customer-focused engineer. So we work directly with startups like yourselves,
00:07:50.160 | and we help you build voice-enabled apps. What I'm going to show you today is really around
00:07:57.520 | how to build essentially what you saw with Vapi, but using open-source software. So being able to
00:08:04.240 | make something like Vapi yourself. Some of the main considerations when you're building a real-time
00:08:09.840 | voice bot is performance, accuracy, and cost, and being able to scale. You want everything
00:08:16.240 | with a real-time voice bot. When you called it, you had sub-second response time. So being able
00:08:21.760 | to get that sub-second response time is super important. So essentially, if you go beyond,
00:08:27.280 | say, 1.5, 2 seconds, a lot of people will actually say something again. They think that the person is
00:08:32.640 | no longer there on the other end. And you need to do that for speech-to-text, the language model,
00:08:38.400 | and the text-to-speech. And then on the accuracy, so you want to be able to understand what the
00:08:42.960 | person says regardless of their accent. And you want to be able to do that in multiple languages
00:08:48.240 | as well. And then on the TTS side, really being able to be human-like, that's the big challenge.
00:08:56.640 | And then cost and scale. So you can build a lot of this stuff with off-the-shelf,
00:09:01.760 | open-source software. And probably the text-to-speech stuff won't be fast enough,
00:09:06.400 | or the transcription won't be fast enough. But you can actually do a lot of this with
00:09:11.040 | managed solutions as well. I'll go into the unit economics towards the end.
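The latency considerations above amount to a simple budget: the per-turn delay is roughly the sum of speech-to-text, LLM, and text-to-speech latencies plus network overhead, and it needs to stay under the point where callers assume the line went dead. A toy calculator using the ballpark numbers from the talk (all values illustrative, not measured):

```python
# Rough per-turn latency budget for a real-time voice bot.
def total_latency_ms(stt_ms, llm_ms, tts_ms, network_ms=0):
    """Sum per-component latencies for one conversational turn."""
    return stt_ms + llm_ms + tts_ms + network_ms

# Hosted APIs: ~200 ms STT, ~400-600 ms LLM, plus TTS and network hops.
hosted = total_latency_ms(stt_ms=200, llm_ms=500, tts_ms=200, network_ms=80)

# Self-hosting next to the GPUs trades compute cost for lower latency.
self_hosted = total_latency_ms(stt_ms=50, llm_ms=400, tts_ms=100)

# Past ~1.5-2 s of silence, callers tend to assume nobody is there.
assert hosted < 1500
assert self_hosted < hosted
```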
00:09:15.760 | So yeah, this is the basic setup. So you have a browser that's going to send audio. Obviously,
00:09:22.560 | you need to interact with the browser to capture audio. That's one of the requirements for security
00:09:28.000 | reasons. So you'll see a lot of these demos, you have to click a button to actually initiate
00:09:32.720 | speech. The VoiceBot server, it's actually repeated here multiple times just to kind of
00:09:38.000 | simplify things. But you can imagine this as a back and forth. You're doing browser to the
00:09:43.600 | VoiceBot server to get the audio, and then the VoiceBot server sending that off. And the goal
00:09:49.040 | here is to get sub-second latency. So we have around 200 millisecond latency. We can get that
00:09:55.440 | down a lot lower if you host yourself. So that's actually what Vapi does. They host their own
00:10:00.720 | GPUs running our software. And you can crank up the unit economics. You can say, "Hey, you know
00:10:06.880 | what? Instead of doing it at this rate, I want to do it at 5x." So you can get that 200 milliseconds
00:10:11.680 | down to about 50 milliseconds at the cost of extra GPU compute. And then GPT-3.5 Turbo or 4,
00:10:19.760 | you probably get 400, maybe 600 milliseconds of latency in their hosted API. If you go into Azure
00:10:26.240 | and you use their services, you can get that down a lot lower. And then on the text-to-speech side,
00:10:30.880 | I'm going to show you using Deepgram's text-to-speech. There's a lot of other text-to-speech
00:10:35.760 | providers out there. And what we try to do is have low latency, human-like at a really good
00:10:41.280 | price point. And you can get extremely human-like at about 40x price. But that's just something to
00:10:48.000 | keep in mind. All right. Demo time. We have two tabs open here. Okay. So hopefully, this audio
00:10:57.920 | will play. Good evening. How can Deepgram Aura text-to-speech meet the specific needs of your
00:11:03.600 | projects? Modify the sound. You're using my... Yeah, yeah. The sound output. Okay. Sorry. It's
00:11:13.200 | going to be loud. We can't control it. I think I may have blown the speakers. Did you blow the
00:11:20.080 | speakers? Try to tell us something. Yeah. I can use just the laptop. Yeah. Okay. Let me do that.
00:11:28.320 | Yeah. So turn off the... Yeah. All right. Okay. Let's see if it works.
00:11:36.000 | Good evening. Planning to integrate Deepgram Aura text-to-speech into your workflow?
00:11:42.640 | Let's discuss how to get started. Yeah. I want to talk about something else today.
00:11:46.400 | I'm actually at an AWS hacker space for a meetup with personal AI.
00:11:50.400 | Sounds like a cool meetup. What's on your mind today? Feel free to share your thoughts or
00:11:57.120 | questions. Yeah. Could you give me something funny to show in my demo? Absolutely. How about a joke
00:12:04.800 | or a lighthearted anecdote to break the ice during your demo? Laughter always helps. Yeah. That'd be
00:12:10.000 | great. How about this one? Why did the computer go to the doctor? Wait, wait. Hang on there a
00:12:16.560 | second. A virus. Smile. Feel free to share that. How about this one? Why did the computer go to
00:12:21.200 | the doctor? Because it had a virus. Smile. Feel free to share that with your audience for a good
00:12:26.240 | chuckle. Take your time. Yeah. So you can see there, it's pretty low latency. Oh, sorry. It's
00:12:31.200 | pretty low latency. And if you want to get the source code, it's all open source. So you can just
00:12:41.120 | grab the whole repo. You can build your own voice bot. And if you want to hook it up to a telephony
00:12:46.960 | system, we have that all available in sample code as well. So you can hook it up to a Twilio,
00:12:52.240 | buy a number, do those sorts of things. So let's just jump back into the presentation.
00:12:57.440 | Oh, yeah. So the different components. So the speech to text, that's going to be super low
00:13:07.360 | latency. If you don't get that accurate, you're going to get the wrong answer from the LLM.
00:13:12.240 | And this is the code that you can use. So we have SDKs, Python, Go, Ruby, .NET. And you can
00:13:20.400 | essentially use all of those. This is actually a Node.js SDK. And it's very simple to set up,
00:13:26.000 | right? You literally just import it, drop in your API key, and you can listen to those events. So
00:13:32.320 | we'll give you back all the text that was actually spoken while it's being spoken. And you just need
00:13:38.240 | to send us the raw audio packets. And then on the GPT side, you can swap this out with Claude.
00:13:46.560 | We actually have a fork of that repo that uses Claude as well. Claude Haiku is surprisingly good.
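The event-driven speech-to-text flow described a moment ago, where interim transcripts arrive while the person is still speaking and are later superseded by final ones, can be sketched with a small accumulator. The event shape used here (`is_final`, `transcript`) mimics what Deepgram-style streaming APIs emit, but the exact field names are an assumption:

```python
# Minimal accumulator for streaming speech-to-text events.
class TranscriptAccumulator:
    def __init__(self):
        self.finals = []    # confirmed, finalized segments
        self.interim = ""   # latest provisional (still-speaking) segment

    def on_event(self, event):
        """Handle one transcript event from the streaming connection."""
        if event["is_final"]:
            self.finals.append(event["transcript"])
            self.interim = ""       # the final result supersedes the interim
        else:
            self.interim = event["transcript"]

    def current_text(self):
        """Best current view of everything spoken so far."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)

acc = TranscriptAccumulator()
acc.on_event({"is_final": False, "transcript": "hello"})
acc.on_event({"is_final": True, "transcript": "hello there"})
acc.on_event({"is_final": False, "transcript": "how"})
print(acc.current_text())  # "hello there how"
```

The interim results are what let the bot start thinking about a reply before the speaker has finished, which is where much of the perceived latency win comes from.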
00:13:52.480 | So if cost is something we want to get down, that's definitely an option. But some of our
00:13:58.640 | customers will actually run their own LLAMA2 model super close to the GPUs that are running
00:14:06.000 | the speech to text and text to speech. And that just removes all the network latency out of the
00:14:10.640 | equation. So here's a simple example of how you would consume that. I'm sure you're all pretty
00:14:16.160 | familiar with the OpenAI API. And that will basically give you streaming. That's one of
00:14:21.680 | the really important things here is you want time to first token as low as possible. The reason for
00:14:27.200 | that is if you wait till the last token, you're going to increase that latency. And then on the
00:14:32.960 | last bit, this is the text to speech part. It's a little more tricky. You've got to deal with audio
00:14:39.600 | streams. And you're going to want to stream the audio as soon as you get it so that you can
00:14:44.480 | actually start playing at the moment the first byte is ready. Again, with the LLM, if you wait
00:14:50.720 | for the last token, if you wait for the last byte of your audio stream, you're going to incur that
00:14:55.600 | bandwidth delay. So yeah, if anybody wants the open source repo, go ahead and scan that.
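The point about streaming the first byte instead of waiting for the last one can be made concrete with a toy comparison (chunk timings are illustrative):

```python
# Why streaming matters: time until the caller hears the first byte of
# audio when you stream chunks vs. wait for the whole response.
def time_to_first_audio(chunk_times_ms, stream=True):
    """chunk_times_ms: generation time of each audio chunk, in order."""
    if stream:
        return chunk_times_ms[0]   # start playing as soon as chunk 0 is ready
    return sum(chunk_times_ms)     # otherwise you wait for the last byte

chunks = [120, 150, 150, 180]
assert time_to_first_audio(chunks) == 120              # streamed
assert time_to_first_audio(chunks, stream=False) == 600  # buffered
```

The same argument applies one layer up: consuming LLM tokens as they stream means TTS can begin on the first sentence while later sentences are still being generated.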
00:15:02.080 | Yeah, that's a really cool demo. The latency is great. What do you think the next frontiers are?
00:15:11.040 | You had interruptions, right? What about can it proactively interrupt you based on maybe it knows
00:15:19.280 | what you're trying to say, like a human would just cut you off, or back channeling or overlapping
00:15:25.040 | speech and all the more getting more towards human level kind of dialing? Yeah, so the question was,
00:15:30.800 | could the AI interrupt the person? And that's definitely possible. I don't think that would
00:15:37.040 | happen necessarily at the AI model level. I think that would just be business logic.
00:15:41.680 | And so you'll get everything that's spoken as it's spoken. So if you were like, you know what,
00:15:47.600 | I think I know what you're going to say, you could preemptively do it. And I have seen some
00:15:52.000 | demos where that's a trick that they use to actually lower the latency is to predict what
00:15:57.280 | you're about to say. So then you can fire off an early call to the LLM. This guy's no fun.
00:16:04.640 | You have to send a demo, open repo, and everything. They interrupt, they can have that.
00:16:08.880 | Very good. Yeah. And the cost of a lot of these LLM things is a big challenge as well. So if
00:16:18.000 | you're constantly sending it to an LLM to achieve these use cases, your cost per minute might go up
00:16:26.160 | to like 30 cents. So in this demo here, and these are all kind of list prices, you can get these
00:16:32.560 | prices down with volume. And so if you just signed up today, these are the sorts of prices that you
00:16:39.920 | would pay. And just to give you an idea, GPT-3.5 Turbo has dropped in price dramatically over
00:16:47.200 | time. And so Claude Haiku is even a fraction of this as well. And then on the text-to-speech side,
00:16:54.160 | doing something like this with ElevenLabs would be about maybe $1.20, just to give you an idea
00:17:00.880 | of comparison. So you can do a five-minute call here for about six and a half cents.
00:17:05.040 | And if you're doing millions of hours of calls, that price can definitely come down.
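The unit economics above reduce to a per-minute rate summed across the three components. A back-of-the-envelope calculator (the component rates below are illustrative stand-ins chosen to match the talk's "five-minute call for about six and a half cents" figure, not quoted list prices):

```python
# Back-of-the-envelope cost for a real-time voice-bot call.
def call_cost_cents(minutes, stt_cents_per_min, llm_cents_per_min,
                    tts_cents_per_min):
    """Total cost in cents: per-minute rates summed across components."""
    rate = stt_cents_per_min + llm_cents_per_min + tts_cents_per_min
    return minutes * rate

# Illustrative split totaling ~1.3 cents/min, i.e. ~6.5 cents for 5 minutes.
cost = call_cost_cents(5, stt_cents_per_min=0.6, llm_cents_per_min=0.2,
                       tts_cents_per_min=0.5)
assert abs(cost - 6.5) < 0.01
```

Swapping one component (say, a pricier but more human-like TTS) just changes one term of the rate, which makes it easy to see where a 30-cents-per-minute bill actually comes from.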
00:17:09.680 | Yeah. So changing that then to be a real-time callable voice bot, like you saw with the VAPI
00:17:17.840 | demo, you're essentially just swapping out the browser for this telephony service, right? So
00:17:23.680 | Twilio has about 100 millisecond latency to give you the audio when you get called. And then you're
00:17:29.280 | just sending it through that same system and then just back to the telephony provider.
00:17:34.240 | And yeah, so if you sign up today, you get $200 in free credits. For post-call transcription,
00:17:41.600 | that's about 750 hours. For real-time, that's probably about 500 hours of real-time transcription.
00:17:49.440 | So it's a pretty big freebie there, so if anybody wants it. And that's it. Any questions?
00:17:58.080 | Yeah, go ahead. Just in terms of achieving real-time performance,
00:18:01.680 | GPT-3.5 versus 4, how do you- Yeah, 4 is going to be a lot slower, especially if you're using
00:18:08.800 | their hosted API endpoint. You're going to see multi-second fluctuations in their hosted
00:18:14.960 | endpoint. If you go on to Azure and you use their service, you're paying more, but you're getting
00:18:21.360 | much better latency. So you could deploy all of this on Azure next to GPT-4, and that's going to
00:18:29.120 | give you the sort of latency that you saw in the VAPI demo. The demo I guess actually is using all
00:18:35.120 | hosted APIs, so there's no on-prem kind of setup there. Ethan, Nick, and then I think Harrison
00:18:45.360 | just walked in. So I'll just warm you guys up a bit. Thanks to David. David only signed up to
00:18:53.040 | speak today, which is a very classic meetup strategy. So we don't have a screen for this.
00:19:02.320 | Do we have a screen? You can share your screen. I can share my screen. All right, so I'm afraid
00:19:07.680 | the audio might be shattering your eardrums, so I might have to cut it off, but we can try.
00:19:13.760 | You said it was okay at the start. Yeah, I mean, okay. You can do that on the laptop,
00:19:19.760 | no? No, no, there's no laptop. Yeah, sorry. So maybe from the start, what did you work on,
00:19:25.040 | why? Yeah, sure. So I think actually it feels good to be here because I feel like I'm with my
00:19:29.840 | people. You pretty much summed up the philosophy. What can AI do for you if it has your whole life
00:19:36.160 | in its context and it experiences your life as you do? Wouldn't it be way better if it understands
00:19:42.880 | you and all your context? So that's kind of, I think a lot of us here have the same idea.
00:19:50.160 | And so that's what I've been working on for, it's about November when LLaVA came out. So first
00:19:56.720 | started working on the visual component. It's not just audio. You want it to see what you see.
00:20:01.520 | But it's a lot more challenging to get a form factor with continual video capture. So
00:20:09.840 | built a really, really small but simple device that's actually easy to use.
00:20:15.120 | I don't know. I mean, you guys can check it out after the talk. But I think there's a lot
00:20:25.280 | of advantages to not having to have an extra piece of hardware to carry around, but at least
00:20:29.120 | we try and make it as small and as light as possible. You don't have to charge it every
00:20:32.560 | couple of days. But there's also other subtle reasons why it's good to have an external piece
00:20:38.480 | of hardware, which I can show you in a minute. So yeah, this captures all the time. And in fact,
00:20:45.200 | I can show you here in the app. So you can see this default. It's got the battery level here.
00:20:55.440 | But here you can see all the conversations I've had. And this is actually an ongoing conversation.
00:21:02.240 | So you can see where I am and transcript in real time. We do a bunch of different pipelines. So
00:21:13.200 | after the conversation is over, we'll run it through a larger model. And then do the speaker
00:21:20.960 | verification so it knows it's me, which is important so it can understand what I'm saying
00:21:24.720 | versus other people in the room. Actually, conversation end pointing, and the VPN person
00:21:30.880 | probably knows, even just utterance level end pointing is complicated at heart. When am I
00:21:35.600 | stopping talking? Or am I going to keep talking and just pause for a second? That's hard. But
00:21:40.560 | then conversation end pointing, when is this a distinct conversation versus another, is even
00:21:45.680 | harder. But that's important because you don't want just one block of dialogue for your whole
00:21:50.880 | day. It's much more useful if you can segment it into distinct semantic conversations. So that
00:21:56.800 | involves not only voice activity detection, but things like signals around your location,
00:22:03.200 | and even the topics that you were talking about, like at the LLM level. So it's very difficult.
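A toy version of the conversation-endpointing logic described above, combining the silence-gap, location, and topic signals (the rules and thresholds are made up for illustration; the real pipeline is described as far harder):

```python
# Toy conversation endpointer: decide whether a new utterance starts a
# distinct conversation or continues the current one.
def is_new_conversation(gap_seconds, location_changed, topic_changed,
                        gap_threshold=120):
    """Combine pause length, location, and topic into one decision."""
    if location_changed:
        return True    # moved somewhere else: treat as a new context
    if gap_seconds > gap_threshold and topic_changed:
        return True    # long pause AND a new topic: likely a new conversation
    return False       # otherwise keep extending the current conversation

assert is_new_conversation(5, False, False) is False    # brief pause
assert is_new_conversation(300, False, True) is True    # long gap, new topic
assert is_new_conversation(10, True, False) is True     # location changed
```

Even this sketch shows why voice activity detection alone isn't enough: the topic signal requires an LLM-level judgment, which is what makes segmenting a whole day of audio into semantic conversations genuinely hard.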
00:22:08.000 | There's still a lot of work to do, but it does work. So in the end result, I will get a summary
00:22:14.880 | generated, some takeaways, a summary of the atmosphere. Then from the major
00:22:26.640 | topics, I'll find some links, and then you still have the raw data. So that's the foundation layer,
00:22:33.200 | is to have something that does the continuous capture, does the basic level of processing,
00:22:39.280 | but that's just the base layer. I can query against the whole context of everything it
00:22:49.360 | knows about me. I think this is, I was talking to my developer a few days ago. So here we have,
00:23:07.440 | you know, it's through retrieval on all my conversations. This is a conversation I was
00:23:13.120 | last talking about, and it will even cite the source. So I can just jump to the actual
00:23:18.080 | conversation just a few days ago, or a week ago, or something. And so here we were debugging some
00:23:22.800 | web view issues. So that's just kind of like the basic memory recall use case. I just have
00:23:30.000 | maybe one or two more, then I'll turn it back over. So again, that's all kind of the base layer,
00:23:37.600 | but like the real, you know, I think everybody here believes that the real future will be using
00:23:41.680 | all of this context so that AI can be more proactive, can be more autonomous, because it
00:23:48.800 | doesn't need to come ask you everything. If you had a new co-worker every day,
00:23:52.720 | and it's a blank slate, you know, it's way less productive than if they have the whole history.
00:23:58.480 | So there's a voice component to it, and this is, I'm a little nervous, because I'm going to blow
00:24:03.920 | everybody's ears on that, but we can try. So we're using hot words here, which I don't believe is the
00:24:11.840 | best paradigm, but for now, we have some other ideas, but for now, similar to Alexa or Siri,
00:24:17.680 | I can basically inform my AI that I'm giving it a command that it should respond to in voice.
00:24:23.840 | So I will just, Scarlett, can you hear me?
00:24:32.080 | Yes, I can receive and understand your messages. How can I assist you today?
00:24:35.760 | Okay, that's frightening. I will just now do one example. So like, you know, you have the ability
00:24:47.920 | to interact with the internet, your AI should, so I can have it go do actions for me using any app.
00:24:55.120 | So I'll just do a very simple example. Scarlett, send a message to Maria on WhatsApp saying hello.
00:25:03.920 | One sec, I'm on it. Starting a new agent to send WhatsApp message to Maria.
00:25:10.320 | So now this is on my personal WhatsApp account. Maria's right here.
00:25:15.120 | She can verify that she receives it. Message hello sent to Maria on WhatsApp.
00:25:23.360 | So maybe one more, and then I'll let you guys go. So like, that's just opening one app and doing
00:25:29.840 | something, but can it do multiple apps and have a working memory to remember between app contexts?
00:25:36.240 | So Scarlett, find a good taco restaurant in Pacific Heights and send it to Maria.
00:25:41.920 | One sec, I'm on it. Starting a new agent to send taco restaurant details to Maria.
00:25:49.600 | So it's opened Google Maps. It's going to try and find a taco restaurant,
00:25:54.320 | hopefully remember once it does, and then send it to Maria, which it learned that I implicitly
00:26:03.200 | meant WhatsApp, right? Hopefully, because it picked up that I talked to Maria on WhatsApp.
00:26:08.400 | So going to WhatsApp, pasting the link.
00:26:19.120 | The details of Taco Bar, a well-rated taco restaurant in Pacific Heights,
00:26:23.120 | San Francisco, have been successfully sent to Maria on WhatsApp.
00:26:26.480 | Yeah, so that's basically it.
00:26:28.800 | Yeah, that's a great demo. There's got to be some questions.
00:26:45.600 | Components are open source. So there's industry, you know, Adam's here,
00:26:49.360 | great, Nick, some really great people in the open source space here. We
00:26:53.600 | open sourced a lot. I really learned a lot about, you know, I've done some minor open source
00:27:01.120 | projects myself, you know, mostly I'm just a contributor, but trying to run and launch one
00:27:05.280 | was kind of a new experience for me. And I learned that, like, we did not make the developer experience
00:27:09.920 | very good. It was very complicated, like, because we were using, like, local whisper, local models,
00:27:15.280 | and, like, getting it to work on CUDA, Mac, Windows, we didn't do a good job. So it was
00:27:19.120 | very difficult for people to get started. And so it was a little disappointed with the uptake,
00:27:24.080 | you know, and there are much better projects that are way easier to use. So really been focused now
00:27:31.120 | on just trying to focus on figuring out what the right use cases are and what the right experiences
00:27:36.320 | are going to be. And it was, like, really difficult to try and
00:27:40.080 | fit everything into an open source project that would be actually used. So
00:27:45.520 | yeah, you can see the repo, and I'd recommend Adam's and Nick's too. But we'll definitely
00:27:55.440 | be contributing a lot back more to open source. >> Is it Owl or Bee?
00:27:59.520 | >> Owl is the open source repo. >> Yeah, yeah, okay.
00:28:02.240 | >> And you should, I'll plug Adam's Adeus and Nick's repo. We can, they can send out the,
00:28:09.920 | I don't know what the actual GitHubs are called, but they're easy to find.
00:28:14.160 | But yeah, he's going to talk, so you can show. Yeah.
00:28:19.760 | >> Can you talk about the hardware? >> Sure, yeah, yeah.
00:28:22.400 | >> Tell us more about it. >> This is, like, V1. So we have another,
00:28:27.760 | like, V1.1, which is actually even about 25% smaller, and way better charging situation in
00:28:39.520 | terms of wireless charging, so, like, the size down. But the real thing we're most excited about
00:28:44.160 | is, like, the next version with Vision. Like I say, Vision is really hard. I don't know of any
00:28:48.320 | device that can do all-day capture of, like, sending video. There's, like, a ton of challenges
00:28:53.680 | around power and also bandwidth. But we have some really kind of novel ideas about it. The,
00:29:00.880 | it's Bluetooth Low Energy, which, I mean, I'm sure you've seen other ones that operate that
00:29:10.160 | way, and that has a lot of advantages. We also have one that's LTE, and so there's, like, LTE-M
00:29:16.240 | is, like, a low-power subset of LTE. It's only, like, two megabits, but it's way more power
00:29:21.280 | efficient. I think Humane and Rabbit are both LTE and Wi-Fi, but to get, like, a wearable,
00:29:30.160 | you really need Bluetooth. Just a quick question. I saw the part of the interface where it had a,
00:29:38.400 | was that a streaming from iPhone? It's an emulated cloud browser. With that picture-in-picture,
00:29:46.400 | how are you doing the part of the interface where you have the Google Maps at the bottom corner?
00:29:50.800 | That's on the cloud, so we're streaming that as feedback. I think the ultimate goal
00:29:56.800 | is that that disappears entirely, right? That's actually mainly for the demo to, like,
00:30:01.200 | show that it's real and that it's working. But, like, I think ideally, like, it should be totally
00:30:06.320 | transparent that you're just, you have, like, a personal AI. It can do whatever it needs to do,
00:30:11.200 | and it will just give you updates on if it, you know, needs more info or its status. But,
00:30:16.240 | like, it's kind of just an interim solution until we have, I guess, so sorry.
00:30:22.160 | Yes, it is. Okay, last question, and then we have to move on.
00:30:30.560 | I've been experimenting with this app, and you have, like, a focus here.
00:30:37.440 | Yeah, I just put it in my pocket. What I wanted to say about Vision is that,
00:30:40.960 | like, I tried to experiment with capturing Vision, and, like, the best solution so far
00:30:45.040 | I found is, like, to buy a cheap Android smartphone and put it in your front pocket,
00:30:50.080 | like, camera outwards. It's incredible. It has an internet connection. Good battery.
00:30:54.720 | Good battery. It's incredible. It's extremely cheap, but it's incredible. Like,
00:30:58.640 | you have this front pocket. Yeah, I've had whole demos of doing that,
00:31:04.000 | because also, you know, also, it doesn't put people off as much.
00:31:07.440 | Yes, yes, and nobody ever thinks you're an idiot. Yes, you just have a…
00:31:11.680 | I do think, in privacy sense, maybe slightly different than you, I do want people to
00:31:17.200 | understand, but, like, it was interesting that, yeah, hairstyle, basically. But nobody thinks
00:31:26.880 | twice. It's a different thing. You can just write on it. It's like, I'm recording it. I'm just
00:31:31.200 | saying, the convenience of not working on the hardware, you can just take the…
00:31:34.320 | Sure, sure, yeah. You do need a front pocket, so maybe there'll be new AI fashion, where it's,
00:31:40.800 | like, these are my phone pockets. Okay, give it up for him. That was awesome.
00:31:45.040 | Yeah, I mean, everyone will, I think, will be sticking around, so you can
00:31:54.320 | obviously go up to him and get more insight. Awesome.
00:32:00.960 | All right. Hey, everyone. One second. Oh, yeah, it's connected. Now it is. Nice.
00:32:08.400 | So, where do I even start? Yeah, unfortunately, I was kind of not supposed to be here, because
00:32:16.000 | I'm organizing a brain-computer interface hackathon tomorrow, and I had to, like, somehow
00:32:21.360 | get 50 headsets, which is why there will be no presentation, but I will still try to be as
00:32:27.200 | useful to you as possible. This is the hackathon I was telling you about right now. We have, like,
00:32:33.280 | lots of people. There will be people from Neuralink. We'll have, like, 50 different
00:32:37.600 | BCI headsets and so on. So, if you're interested in, like, BCI stuff, etc., and if you want
00:32:43.840 | to attend the hackathon, scan this QR code and mention that you have been here and…
00:32:49.440 | Oh, yeah, my bad. Sorry. And you might get a much better chance to be accepted, because
00:32:56.000 | we have, like, a 50% acceptance rate. We try not to accept people who don't have experience.
00:33:01.280 | Anyways, so, yeah, what I will try to help with, I honestly really, really love open
00:33:06.560 | source, and I believe, like, all this stuff should be open source, which is why now on
00:33:10.560 | this short demo I will just show you all current open source projects, and I will try to highlight
00:33:16.320 | most important things you need to know about them, and I will probably first start with
00:33:20.720 | Owl, which you have just seen by Ethan. So, he started that. I think he was, like, one
00:33:27.840 | of the first people who started, like, open sourcing any kind of wearables. Probably Adam
00:33:32.080 | actually was before, but you announced first, so I remember that. But, yeah, yeah, yeah.
00:33:38.080 | Anyways, so, yeah, this is his repo. You can, like, check it out. I think I have a QR code
00:33:43.040 | here opened as well. If not, just give me a sec. I will just generate it quickly for you.
00:33:49.440 | All right. Should be, yeah, just scan this QR code, and you can just access his repository.
00:33:55.280 | Yeah, so, this is Ethan's. Then there is another one, which, in my opinion, well,
00:34:02.960 | it's definitely the biggest one. It's by the guy who's sitting right there, Adam. I truly
00:34:08.000 | believe that Adam is, like, the guy who started all open source hardware movement, so at least
00:34:13.280 | I started doing everything because of Adam, so thank you for that. And they have a lot
00:34:18.640 | of traction here, 2,600 stars, and if you want to kind of, like, ramp up your way into open source,
00:34:26.240 | like, wearables of any kind, I suggest starting with this repository. It's probably the biggest
00:34:31.840 | one you'll be able to find right now, and the QR code, you can scan, I think, this one. Yeah,
00:34:37.120 | just, like, feel free to scan. I'll send it. Yeah, yeah, cool, cool, cool, cool, awesome.
00:34:41.120 | So, they use, I think, Raspberry Pi, also ESP32, which is kind of technical. You probably don't
00:34:48.080 | need this information, but anyways. Yeah, and now, who I am, like, a little bit about myself as well,
00:34:55.040 | some marketing. So, my story starts very recently, maybe two months ago, after I saw Ethan's and
00:35:01.040 | Adam's launch of, like, their open source hardware stuff, and I launched my own. It all started with,
00:35:07.360 | basically, seeing the disaster from Humane, and we launched, like, just for fun, honestly, Whomane.
00:35:15.440 | The idea here was, like, you take a picture, and you scan the person's face, and we search the
00:35:21.120 | entire internet, and we find the person's profile, like, and send it to you, either
00:35:26.560 | via notification, or on the site, and so on. So, this is how it looks. Started with that,
00:35:31.840 | had a lot of contributors, was a fun project, but not really, I mean, like, yeah, we don't really
00:35:36.800 | want to bring any harm with Whomane, and so on and so forth, just fun stuff. So, after that, we did,
00:35:43.280 | oh, QR for this, I think we will also send, right? Like, cool, cool, cool, cool, cool. Okay,
00:35:49.040 | another one I will promote here a little bit is this one. This, we launched, literally, like,
00:35:54.240 | last week. This is pretty much what Adam and Ethan have done, with the only difference that we use
00:36:01.760 | right now, the lowest power chip. I think Ethan also uses that. I just try to, like, you know,
00:36:06.960 | like, as soon as possible, to let everyone know that, like, I think it's probably the best
00:36:11.440 | opportunity you can currently find in the market. It's called Friend, you can check it out. This is
00:36:15.840 | how it looks. I actually have it with me. And I was supposed to actually show you, like, the live
00:36:22.400 | demo as well, but we just did a very cool update, where we made it work with Android, by the way,
00:36:30.320 | so it's iOS and Android right now. And also, we updated the quality of speech, literally,
00:36:37.040 | three hours ago, and made it, like, five times better. So, when we launched, I'll be honest,
00:36:41.280 | it was, like, completely horrible, and now it's, like, five times better. It's amazing,
00:36:45.360 | and I'm really excited by that. I would want to show you the video, but I know that it's, like,
00:36:50.320 | you will not want to have your ears, you know, like... Oh, yeah, really? Oh, nice. Okay, let's try.
00:36:58.320 | Anyways, yeah, this is the chip that is being used inside of that wearable, and it works pretty much
00:37:04.960 | the same as what Ethan has with... Oh, it doesn't work, right? Next. Wait, wait, wait, wait, wait,
00:37:16.800 | let me figure out which one. Yeah, yeah. You can go back. Okay, let's try. Oh, nice. Okay, cool.
00:37:26.240 | Awesome.
00:37:26.800 | Start it. Give me the permissions here, and hit next, and scan for devices here,
00:37:35.360 | and then select the device, and that should just connect.
00:37:41.040 | Okay, I think we're going to start speaking. Yeah, so pretty much testing this for the first time,
00:37:50.720 | and let's see if it's able to transcribe what I'm trying to speak, and yeah, pretty much
00:37:58.640 | waiting for the first 20 days must have finished, and for it to return us the output of the speech
00:38:09.680 | from the OpenAI Whisper endpoint, and it says... Yeah, there we go. Yeah, so as you have seen,
00:38:21.920 | it recorded the speech on the lowest power chip probably ever right now accessible in the market,
00:38:27.360 | and I'm very excited that we actually made it work, because it was very, very fun. It took,
00:38:33.120 | like, a lot of time. Anyways, that's pretty much it, I guess. I don't really have anything else for
00:38:38.880 | you to show. This is the final QR code. I know you'll have all the links, but this one I really,
00:38:44.080 | really advise you to scan, because it has basically the collection of, like, all links in one place.
00:38:48.800 | So yeah, that's pretty much it. Use open source. I think it's cool,
00:38:54.640 | and let's try to build cool stuff together. That's pretty much it. Thank you.
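The pipeline Nik describes — record on the chip, stream to the phone, send batches to a speech-to-text endpoint like OpenAI's Whisper — can be sketched roughly like this. This is not BasedHardware's actual code; the 20-second batch length and the function name are assumptions for illustration:

```python
# Sketch: batch a raw PCM audio stream into fixed-duration chunks before
# each one is wrapped in a WAV header and POSTed to a transcription
# endpoint (e.g. OpenAI's /v1/audio/transcriptions).

SAMPLE_RATE = 16_000   # samples per second (the 16 kHz mentioned in the talk)
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM
CHUNK_SECONDS = 20     # hypothetical batch length

def chunk_pcm(pcm: bytes, seconds: int = CHUNK_SECONDS) -> list[bytes]:
    """Split a PCM byte stream into fixed-duration chunks."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * seconds
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

The transcription call itself is omitted: the interesting constraint on-device is deciding how much audio to buffer before handing it off.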
00:39:02.720 | Any questions? Did you record on that? No, because... You're not recording right now.
00:39:14.320 | No, no, no. I'm not recording, because we launched the update, like, four hours ago,
00:39:18.400 | and I wanted to bring it here, so I broke my thing, and now it's, like,
00:39:22.640 | half broken, unfortunately. But anyways, any questions? Cool. Yeah, go on.
00:39:31.920 | What do you think the biggest next challenge? Biggest next challenge? Yeah, I definitely agree
00:39:38.560 | with you that the biggest challenge is, like, editing video and images and so on. It's, like,
00:39:42.160 | very hard as hell, and I think to make software useful as well is very, very hard. Like, yeah,
00:39:48.560 | we can all do, maybe, like, recording from the device and, like, attach maybe some battery and
00:39:52.640 | so on, but, like, how to make it actually, like, sexy, that's hard. Like, how to make it, you know,
00:39:58.560 | do actions, how to make it remember everything and so on and so forth, that's the biggest challenge.
00:40:02.960 | So, yeah, but I agree with everything you said. I would have said that the biggest challenge is
00:40:08.560 | making people want to wear it. Oh, yeah. I don't know the tech bros or girls, or specific, probably.
00:40:15.040 | Yeah, Adam, the guy who created the biggest open source thing, he said that the biggest
00:40:21.680 | challenge is to make people want it, basically, right? I mean, same, yes. So, that's one of his
00:40:27.840 | suggestions. Yeah, go on. What's been the challenge of reducing latency?
00:40:32.080 | What's been the challenge of reducing latency? Honestly, it was just, like, software issue,
00:40:38.880 | because, like, this chip is, like, not that widely used. There's not so much documentation,
00:40:44.640 | not so many projects and so on. So, just, like, a matter of, like, trying other things. And also,
00:40:48.640 | it doesn't have huge on-board memory. So, like, how do you store very good quality audio on a
00:40:54.080 | very small memory chip, and then send it to phones and stream it,
00:41:01.040 | basically, non-stop? That was pretty hard. But we solved it, like, five hours ago,
00:41:05.680 | and that's pretty much it. Yeah, go on. How did you improve the quality of voice?
00:41:10.480 | We just made it work. It's pretty technical, I don't know if you'll want this,
00:41:18.880 | but anyways, we used, like, 4,000 hertz quality. It was, like,
00:41:23.600 | super bad, because the memory was too small. And now, we just found a way to, like, compress it
00:41:29.520 | and improve it to, like, 16,000, which is, like, pretty great, which you have heard on the video.
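A quick back-of-envelope calculation (mine, not from the talk) shows why going from 4,000 Hz to 16,000 Hz strains a chip with little on-board memory: raw 16-bit mono PCM grows linearly with sample rate, so 16 kHz needs four times the storage and link bandwidth of 4 kHz, which is why some form of compression is needed:

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bytes_per_sample: int = 2) -> int:
    """Raw (uncompressed) mono PCM throughput for one minute of audio."""
    return sample_rate_hz * bytes_per_sample * 60

low = pcm_bytes_per_minute(4_000)    # 480,000 bytes/min at 4 kHz
high = pcm_bytes_per_minute(16_000)  # 1,920,000 bytes/min at 16 kHz
```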
00:41:35.600 | So, it can, like, recognize pretty much anything, even, like, multiple people speaking, even if you
00:41:39.520 | will be there and the device will be here. So, yeah. Anything else? I think we can leave the...
00:41:46.400 | The last one. Go on. How far can the device detect audio? Cool. So, good one will be, like,
00:41:58.400 | you know, two feet from person to person, like, good. If you have maybe, like, four feet from
00:42:03.680 | each other, it might still pick up a person. Cool. Thank you all. Cool. So, what I want to talk
00:42:24.480 | about is a lot less cool than all this hardware stuff, so I feel a little bit out of place,
00:42:29.280 | but my name is Harrison, a co-founder of LangChain. We build, kind of, like,
00:42:35.600 | developer tools to make it easy to build LLM applications. One of the big parts of LLM
00:42:39.440 | applications that we're really excited about is the concept of, kind of, like, memory and
00:42:43.600 | personalization, and I think it's really important for personal AI, because, you know, hopefully,
00:42:48.160 | these systems that we're building remember things about us and remember what we're doing and things
00:42:53.440 | like that. We do absolutely nothing with hardware, so when we're exploring this, we are not building,
00:42:59.120 | kind of, like, hardware devices, so we took a more, kind of, like, software approach.
00:43:02.080 | I think one of the use cases where this type of, like, personalization and memory
00:43:06.320 | is really important is in, kind of, like, a journaling app. I think, for obvious reasons,
00:43:12.640 | when you journal, you expect it to be, kind of, like, a personal experience where you're,
00:43:15.920 | kind of, like, sharing your goals and learnings, and I think in a scenario like that, it's really,
00:43:21.120 | really important for, if there is an interactive experience for the LLM that you're interacting
00:43:26.480 | with to really remember a lot of what you're talking about. So, this is something we launched,
00:43:32.640 | I think, last week, and it's basically a journaling app with some interactive experience,
00:43:38.000 | and we're using it as a way to, kind of, like, test out some of the memory functionality that
00:43:41.360 | we're working on. So, I want to give a quick walkthrough of this and then maybe just share
00:43:46.960 | some high-level thoughts on memory for LLM systems in general. So, the UX for this that we decided
00:43:53.680 | on was you would open up, kind of, like, a new journal, and then you'd basically write a journal
00:43:58.240 | entry, and I think this is, kind of, like, a little cheat mode as well, because I think
00:44:02.240 | this will encourage people to say more interesting things. So, I think if you're just taking, like,
00:44:08.960 | a regular chatbot, there's a lot of, like, hey, hi, what's up, things like that, and I don't think
00:44:13.840 | that's actually interesting to try to, like, remember things about. I think it's more interesting
00:44:18.240 | if you talk about personal things, and so let me try this out. I'm giving the talk right now,
00:44:28.640 | and then I can submit this, and then the UX that we have is that a little chat with a companion
00:44:34.160 | will open up. Okay, so, yeah, so, right before this, I told it that I was about to give a talk
00:44:39.440 | about a journaling app, and so it, kind of, like, remembered that I was going to do all that.
00:44:47.280 | Is there a particular part I'm most excited to share? The memory bit. I don't know. So,
00:44:54.160 | this actually worked on the first try, so I was a bit surprised by that, so that's good.
00:44:58.960 | And, oh, okay, so, how do you plan to tie in your love for sports with the theme of memory during
00:45:05.440 | your talk? So, before, when I was talking to it, I had mentioned that one of the things that I
00:45:10.560 | wanted to talk about was, you know, how a journal app should remember that I like sports, so I guess
00:45:15.280 | it remembered that fact as well. Amazing. So, I can end the session. So, the basic idea there,
00:45:20.480 | and again, this is, you know, we're not building this as a real application. We would love to
00:45:25.920 | enable other people who are building applications like this. I think the thing that we're interested
00:45:30.400 | in is really, like, what does memory look like for applications like this? And I think you can
00:45:35.840 | see a little bit of that if you click on this memory tab here. We have, like, a user profile
00:45:41.680 | where we basically kind of, like, show what we learned about a person over time, and then we also
00:45:46.960 | have a more, like, semantic thing as well. So, I could search in things like Europe,
00:45:51.040 | and I'm going to Europe kind of, like, after a wedding. I love Italy. And so, basically,
00:45:57.200 | there's a few different forms of memory. And if you'll allow me two minutes of kind of just
00:46:05.040 | theorizing about memory, we're doing a hackathon tomorrow, and maybe some of you are going to that,
00:46:10.880 | sort of signed up. I don't know if he's actually going to show up. So, very quickly, like, how I
00:46:18.240 | think memory is really, really interesting. It's also kind of, like, really vague at a high level.
00:46:23.040 | I think, like, there's some state that you're tracking, and then how do you update that state,
00:46:28.400 | and how do you use that state? These are, like, really kind of, like, vague things,
00:46:33.120 | and there's a bunch of different forms that it could take. Some examples of kind of, like,
00:46:38.720 | yeah. Some examples of memory, like, that a bunch of real apps do right now, like,
00:46:46.320 | conversational memory is a very simple but obvious form of memory. Like, it remembers the previous
00:46:51.600 | message that you sent. Like, that is, like, incredibly basic, but I would argue that it
00:46:55.600 | can fall into this idea of, like, how is it updated? What's the state? How is it combined?
00:46:59.920 | Semantic memory is another kind of similar one, where it's a pretty simple idea. You take all the
00:47:04.720 | memory bits, you throw them into a vector store, and then you fetch the most relevant ones. And
00:47:08.560 | then I think one of the early types of memory that we had in LangChain was this knowledge graph
00:47:13.200 | memory, where you kind of, like, construct a knowledge graph over time, which is maybe,
00:47:16.000 | like, overly complex for some kind of, like, use cases, but really interesting to think about.
00:47:19.760 | So, LangMem, like, name TBD, is some of the memory things that we're working on, and we
00:47:28.320 | kind of wanted to constrain how we're thinking about memory to make it more tractable. So,
00:47:33.040 | we're focusing on, like, chat experiences and chat data. We're primarily focused on, like,
00:47:38.560 | one human to one AI conversations, and we thought that flexibility in defining, like, memory schemas
00:47:44.400 | and instructions was really important. Like, one of the things we noticed when talking to a bunch
00:47:47.600 | of people was, like, the exact memory that their bot cared about was different based on their
00:47:52.240 | application. If they were building, like, a SQL bot, that type of memory would be very different
00:47:56.320 | from the journaling app, for example. So, there's a few different, like, memory types that we're
00:48:01.440 | thinking about. All of these are very, like, early on. I think one interesting one is, like,
00:48:06.240 | a thread-level memory. An example of this would just be, like, a summary of a conversation. You
00:48:10.320 | could then use this. You could extract, like, follow-up items, and then in the journaling
00:48:14.720 | app, you could kind of, like, follow up with that in the next conversation. We actually might have
00:48:18.320 | added that. That might be why it's so good at remembering what I talked about in the previous
00:48:21.680 | talk. I forget. Another one is this concept of a user profile. It's basically some JSON schema
00:48:27.840 | that you can kind of, like, update over time. This is one of the newer ones we've added,
00:48:34.640 | which is basically, like, you might want to extract, like, a list of things. Similarly,
00:48:39.520 | like, define a schema, extract it, but it's kind of, like, append only. So, an example could be,
00:48:44.160 | like, restaurants that I've mentioned. Maybe you want to extract the name of the restaurant or what
00:48:48.080 | city it's in. If you're kind of, like, overwriting, if you put that as part of the user profile and
00:48:53.040 | you overwrite that every time, that's a bit tedious, so this is, like, append only, and then
00:48:57.280 | we do some stuff with, like, knowledge triplets as well, and that's kind of, like, the semantic bit.
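The two schema shapes Harrison contrasts — a user profile that gets overwritten in place, and an append-only list like "restaurants I've mentioned" — reduce to something like the sketch below. The field names are hypothetical, and the extraction step that produces the dicts would be an LLM call in practice:

```python
# Sketch of two schema-driven memory shapes: an overwrite-in-place user
# profile, and an append-only list where new mentions only accumulate.

profile: dict = {"name": None, "interests": []}
restaurants: list[dict] = []   # append-only memory

def update_profile(extracted: dict) -> None:
    """Overwrite profile fields with newly extracted (non-empty) values."""
    for key, value in extracted.items():
        if key in profile and value:
            profile[key] = value

def append_restaurant(name: str, city: str) -> None:
    """Append-only: nothing is ever overwritten, entries just accumulate."""
    restaurants.append({"name": name, "city": city})
```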
00:49:01.120 | I think probably the most interesting thing for both of these is maybe, like, how it's, like,
00:49:06.480 | fetched. So, I don't know if people are familiar with the generative agents paper that came out of
00:49:11.360 | Stanford last summer-ish, but I think one of the interesting things they had was this idea of
00:49:16.240 | fetching memories not only based on semantic-ness, but also based on recency and also based on,
00:49:21.520 | like, importance, and they'd use an LLM to, like, assign an importance score,
00:49:24.400 | and I thought that was really, really novel and enjoyed that a lot. And, yeah, that's basically
00:49:31.760 | it. So, yeah, lots of questions. Oh, yeah, yeah. I think they're all about all the same things,
00:49:38.000 | and I think a lot of your approach makes a lot of sense. Same kind of compromise you have to
00:49:44.000 | make implicit, but, like, to give a true knowledge, you talk about, like, the triplets, and, like,
00:49:50.400 | how do you think that we can get to the point where we can have more of a dense graph rather
00:49:54.960 | than just simple propositions about it? Because, like, our memory works, and it's all relational
00:50:00.080 | to, like, you know, to people, to places, and that's actually important information.
00:50:04.560 | And so, like, do you think we'll be able to figure out a simple way to do that, or is it
00:50:08.960 | just going to be too hard? Yeah, the honest answer is I don't know. I think even today,
00:50:13.280 | like, if you had that, and I think, like, there's two things. Like, one's, like, constructing that
00:50:18.640 | graph, but then the other one's, like, using that in generation, and, like, even today, like,
00:50:23.520 | for most, like, RAG situations, combining, if you combine a knowledge graph, it's often not
00:50:29.040 | taking advantage of, like, it's really, like, it's, there's different ways to put a knowledge
00:50:36.160 | graph into RAG, and it's not, yeah, it's very exploratory there, I'd say. So, I'd say, like,
00:50:41.440 | one issue is just even creating that, and then the other one is, like, using that in RAG. So,
00:50:46.320 | I think that's, like, a huge, yeah, I don't know. Yeah, we'll see, it's interesting.
00:50:52.000 | Yeah, because of all the memory things, I just have, like, a couple points, I think.
00:51:01.760 | No matter which memory option you're using, like, let's say you're using, like,
00:51:10.720 | the append-only memory model versus the knowledge graph one, like, eventually, that has to be
00:51:15.200 | inserted into context, right? Like, you're picking the relevant sections, and then there might be,
00:51:21.680 | like, a semantic way to figure out, okay, which ones, which facts should I embed into the context
00:51:28.400 | during generation or during the query? But, like, I was just curious, like, is there work on
00:51:36.480 | defining semantic models that don't leverage, like, the reasoning of the model that, sort of, like,
00:51:44.080 | let's say you want to build up, like, not just memory, but, like, an understanding of the
00:51:50.720 | semantics of the operations, for example, outside of depending on the language model to provide
00:52:03.200 | that semantic interpretation on top of whatever memory context you inject?
00:52:07.600 | If I'm understanding correctly, are you asking, like, is there a concept of building up a memory
00:52:14.880 | about how the language model should do, like, the generation aside from just inserting it into the
00:52:18.800 | prompt? Yeah, like, you know, the memory is just externalizing, it's basically, like, a cache for
00:52:27.680 | to, like, get rid of the token limit, and also bring things into more attention by
00:52:32.640 | transforming the prompt a little bit and injecting instructions about how to treat bits of memory.
00:52:40.160 | Yeah, I was wondering, like, have you, has there been work done to
00:52:46.800 | basically decouple the language processing versus, like, the actual, like, operational
00:52:57.680 | semantics of using the memory? Yeah, I'm assuming there's, yeah, I think, like, an
00:53:07.600 | alternative approach to this, which I think is what you're getting at is, like, rather than doing,
00:53:11.680 | kind of, rather than, kind of, like, creating these strings and putting them into the prompt,
00:53:15.600 | you could have some kind of, like, attention mechanism which would, like, attend to different
00:53:19.520 | parts of previous conversations or something like that. I'm sure there's, I think, like, I mean,
00:53:25.200 | you could take this to the extreme and basically say, like, for, there's a lot of stuff that could
00:53:30.160 | fit into, like, even, like, a million token context window or a 10 million token context window.
00:53:34.880 | And so, an alternative strategy could just be, like, hey, put everything, all these conversations
00:53:40.640 | and all these journals I've ever had into one LLM chat and, kind of, like, let it go from there.
00:53:44.880 | And, yeah, I'm sure people are working on things like that and doing things like some sort of,
00:53:52.400 | like, caching to make it easy to, kind of, like, continue that. I don't, I don't know of any
00:53:58.800 | details. But, yeah, I think that's a completely valid, completely alternative, kind of, like,
00:54:03.040 | approach. It's also really interesting. I don't think anyone knows how to deal with memory. And
00:54:07.520 | so, I think all these different approaches are. >> In a general way.
00:54:11.200 | >> Yeah, yeah. >> How do you
00:54:14.720 | expect, you know, what's going on in our brains? >> Questions?
00:54:19.600 | >> Yeah, so, when I see these systems, and I guess that's a question pretty much for everyone who
00:54:29.440 | presented, right? Like, when I think about these systems, I always think, what does 10 years of
00:54:34.720 | memory look like? And a lot of the facts that we remember are either not relevant anymore or
00:54:42.560 | probably false. So, how do you think about, like, memory? >> Yeah, I think there absolutely
00:54:48.560 | needs to be some sort of memory decay or some sort of, like, invalidating previous memories.
00:54:52.800 | I think it can come in a few forms. So, like, with the generative agents, kind of, like,
00:54:56.640 | paper, I think they tackled this by having some sort of recency weighting and then also some sort
00:55:02.880 | of importance weighting. So, like, it doesn't matter, like, you know, how long ago it was,
00:55:07.360 | there are some memories that I should always remember, right? But then otherwise, like, you
00:55:11.680 | know, I remember what I had for breakfast this morning. I don't remember what I had for breakfast,
00:55:15.440 | like, 10, like, and maybe I should, but I don't think that's, like, important. So, yeah, I think,
00:55:20.800 | like, recency weighting and importance weighting are two really interesting things in the generative
00:55:24.880 | AI or the generative agents paper. Another really interesting paper with a very different approach
00:55:30.720 | is MemGPT. So, MemGPT uses a language model to, like, actively, kind of, like, construct memory.
00:55:37.520 | So, like, in the flow of a conversation, like, the agent basically will decide whether it should
00:55:42.720 | write to, like, short-term memory or long-term memory. I think that's a, yeah, that's a, I think
00:55:48.800 | it's actually quite a different approach because I think in one you're having the application, like,
00:55:52.880 | actively write to and read from memory, and then in the other one, the one that we're building,
00:55:57.840 | it's more in the background, and I think there's pros and cons to both. But I think with that
00:56:02.080 | approach you could potentially have some, like, overwriting of memory or, yeah.
00:56:06.000 | >> Cool. One last question. >> Last question. Also, that generative
00:56:12.560 | agents Smallville paper, amazing. It's, like, one page. They have, like, an exponential time decay
00:56:18.480 | every day. Stuff is less relevant. Definitely recommend it.
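The retrieval rule from the generative agents paper combines semantic relevance with recency and importance, with recency decaying exponentially as just described. A minimal sketch, assuming all three components are normalized to [0, 1] and an hourly decay factor (the exact weights and decay rate here are illustrative, not the paper's):

```python
def retrieval_score(relevance: float, importance: float,
                    hours_since_access: float, decay: float = 0.995) -> float:
    """Rank a memory by relevance + importance + exponentially decayed recency,
    in the spirit of the generative agents (Smallville) paper."""
    recency = decay ** hours_since_access
    return relevance + importance + recency
```

With this shape, a highly important memory can still surface long after the fact, while trivial ones fade as their recency term decays toward zero.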
00:56:21.760 | >> Hey, Harrison. Thank you for the presentation. I just want to ask about there was a new paper
00:56:27.360 | called RAPTOR, I think, and it feels like it's a really cool approach to memory because sometimes
00:56:34.560 | when you want to say something, like, who am I? Oh, roast me. It's really hard to do with rag and
00:56:40.400 | these type of approaches, but the Rapture could be a nice way to tackle that.
00:56:45.040 | >> Can you summarize the paper? >> I think it's about, like, doing, like,
00:56:51.600 | partial summarization in, like, a free form, and we are trying to experiment with that,
00:56:57.760 | and it seems like, but I think you know more about it. I just, like, found out about it a
00:57:01.840 | couple days ago. >> Yeah, so I think the idea is basically
00:57:04.960 | for rag, you chunk everything into really small chunks, and then cluster them, and then basically
00:57:11.040 | hierarchically summarize them, and then when navigating it, you go down to the different nodes.
00:57:16.000 | I hadn't actually thought of aligning it with memory, but that actually makes a ton of sense.
00:57:19.200 | So one of the issues that we haven't really tackled is in this journaling app,
00:57:23.200 | if you notice in here, there's a bunch of ones that are really similar, right? And so, like,
00:57:29.920 | there's a clear kind of, like, consolidation or update or something procedure that kind of
00:57:34.560 | needs to happen that we haven't built in yet, and so I actually love the idea of doing this kind of,
00:57:39.040 | like, hierarchical summarization, and maybe, like, collapsing a bunch of these into one,
00:57:45.120 | and maybe that runs as some sort of background process that you run every, I don't know,
00:57:50.960 | yeah, you run every day, week, whatever, it collapses them, accounts for recency to account
00:57:56.480 | for the issue that was brought up earlier around wanting to, like, maybe overwrite things.
00:58:00.400 | Yeah, I think that's a, I had not thought of it at all, but I think that's really interesting.
00:58:03.120 | >> That's kind of like sleeping. >> Sorry?
00:58:04.880 | >> Like when the brain consolidates memory. >> Yeah, yeah, yeah, yeah, yeah, yeah.
00:58:08.640 | >> Cool, I think I've got to wrap it here. >> Thank you so much, Harrison,
00:58:12.720 | and a round of applause for him. >> I think that's the real reason why
00:58:19.680 | this is not just a hardware-only meetup. It started as a wearables meetup, but then we added
00:58:23.920 | real-time voice, and then we added memory, because they're all components of your personal AI.
00:58:29.600 | So we have 15 minutes until we have to clear out of here. Everyone is,
00:58:32.960 | it looks like you're sticking around to just chat. If you want to just see the devices and
00:58:36.960 | talk to Harrison, go ahead, I guess. There are, like, three hackathons spawning
00:58:41.520 | from this thing, so I'll send all the links on Google, but thank you so much, thank you very much.