Books reimagined: AI to create new experiences for things you know — Lukasz Gandecki, TheBrain.pro



00:00:00.040 | So my name is Łukasz Gandecki and I've been programming since I was a little kid and I want
00:00:19.680 | to tell you about my newest project, Books Reimagined. So how to use AI to create new
00:00:26.520 | experiences for things you already know. So how it all started. I was reading a book
00:00:32.760 | about Donald Trump's re-election and since, as you can hear, I'm not from the
00:00:38.080 | United States, there were a few too many characters for me. I didn't follow
00:00:44.040 | everyone, so I decided to vibe code my way to understanding. I built a
00:00:50.820 | little bit of an AI companion application. It looked terrible, but it gave me
00:00:57.720 | context for the people that were on the page. It found the
00:01:02.520 | images for them and gave me a little bit of a summary in the context of the
00:01:08.060 | page that I was at. And a month later, it turned into something different. So this is
00:01:21.760 | going to be the Snow Queen. This is one of the first experiences we've built. This is
00:01:25.580 | the Snow Queen book, and this is the part where the sorcerer's apprentices are flying away
00:01:31.820 | with the mirror that distorts reality. So all right. So it tells a story about the
00:01:46.820 | flying and flying and the heaven is so far away. There's music and it reads, but you
00:01:53.440 | can't go into your time story. But then the crash happens and the mirror shatters and it
00:02:02.600 | distorts everything all around. So this is one of the first experiences we've built. But
00:02:08.820 | it's all in Polish, so I want to actually demonstrate one that we built just for this
00:02:14.560 | conference that's in English. This is 1984. And what's interesting here, which I don't think I'll be able
00:02:23.340 | to show you, is that you can send a quick voice note to the book and ask what's going
00:02:29.460 | on in this scene right now. I don't really have audio. But the point is that there are many
00:02:39.860 | different AI voice assistants, but they are almost always just terrible, if not all of them, to
00:02:48.380 | be honest, seriously terrible. We had a demo from Google yesterday. They were
00:02:53.240 | saying up front that it works 50 percent of the time. Usually there's a delay. They start talking
00:02:59.360 | in the wrong position. I mean, at the wrong time. Then they interrupt you. So we built here
00:03:07.120 | a system where you hold a button just to specify when you are speaking and then you let it go.
00:03:15.320 | And it responds to you immediately, in 100 milliseconds. And then you could scroll further and
00:03:21.360 | then ask a question like, "What happened between the last time I asked a question and now," and
00:03:26.120 | it can summarize what's going on. So you have to believe me on that. You can check later on bookgenius.net.
00:03:31.880 | Another thing that we were thinking about is the search. That's a very common thing, searching.
00:03:41.600 | So the most normal search would be just exact search. But if you want to -- that's not the way our brains
00:03:47.480 | work, they don't memorize the pages. So if you want to find a scene where Winston
00:03:52.240 | met O'Brien, then exact search is not going to work. But embeddings work. So you can quickly
00:04:00.400 | find the scene you were thinking about this way. And then you can go to that -- go to that
00:04:06.240 | spot, read a bit more. And you can go back to the place where you were reading. But there's
00:04:12.160 | also one step forward -- I mean, one more thing you can do. You can basically say, "Talk about
00:04:20.640 | all the ways the party propaganda works." And you can do deep research. And it's going to actually read the whole book
00:04:30.480 | up to the point that you finished at to give you the answer. So it's very useful. It's going to take a couple minutes. I'm going to go back to the presentation.
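As a rough illustration of the search idea described above, the sketch below ranks passages by vector similarity instead of exact string match, and only considers passages up to the reader's current position so that results stay spoiler-free. This is not the BookGenius implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the sample passages, `search`, and `read_up_to` are all invented for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(passages: list[str], query: str, read_up_to: int, top_k: int = 1) -> list[int]:
    """Return indexes of the best-matching passages the reader has already reached."""
    query_vec = embed(query)
    # Spoiler guard: ignore every passage after the reader's current position.
    visible = [(i, p) for i, p in enumerate(passages) if i <= read_up_to]
    ranked = sorted(visible, key=lambda item: cosine(query_vec, embed(item[1])), reverse=True)
    return [i for i, _ in ranked[:top_k]]


# Invented sample sentences, not quotations from the novel.
passages = [
    "Winston wrote in his diary while the telescreen droned on.",
    "O'Brien met Winston in the corridor and spoke about the dictionary.",
    "In Room 101 O'Brien showed Winston the cage.",
]

if __name__ == "__main__":
    # The reader has only reached passage 1, so passage 2 is never returned.
    print(search(passages, "the scene where Winston met O'Brien", read_up_to=1))
```

A real system would swap `embed` for a learned embedding model, which is what lets a query match a scene even when it shares no exact words with it. The position filter is the same idea the deep research step relies on when it reads only up to the point the reader has finished.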
00:04:41.240 | So I started with vibe coding, vanilla JS, very confusing code. But it gave me the freedom to iterate very quickly.
00:04:49.240 | You basically don't know what you don't know. And if you start to -- especially right now, the time it takes to plan
00:04:56.360 | everything up front is often wasted. Because you can much quicker just tell your thinking to the AI and generate
00:05:02.680 | something that works. And then you see, "Oh, that's actually not that great. Let's try this and that." And I realized
00:05:09.000 | that throwing away code that you poured your heart into often feels terrible. Like, you're invested. You've spent so much time. But throwing away
00:05:16.080 | code written by AI actually feels great. So I would describe this as waves of changes. So basically, once I start
00:05:27.440 | feeling that I don't rewrite the whole code base day after day, like the amplitude of the waves is getting
00:05:34.000 | lower and lower. And there comes a time where I can start old-school engineering. I can start adding tests and
00:05:40.640 | refactoring. But there are traps to refactoring. Do I refactor the worst piece of code? I would suggest
00:05:46.000 | that it's better to focus on low-hanging fruit. So for example, I had a piece of code from OpenAI audio
00:05:52.320 | processing. And it's like JavaScript, very quickly written, no types, very confusing. But I never have
00:05:58.800 | to touch it. So I'm not refactoring it. Although it was very tempting. So we often think about refactoring
00:06:05.360 | by asking this: How bad? How painful? How easy? But if something is very bad and very easy to change, but it's not
00:06:11.360 | painful at all, then it's probably not a good idea to change it. So I would suggest that it's better to
00:06:16.560 | look at how bad the code is multiplied by how painful and multiplied by how easy. And when all those factors
00:06:21.760 | are taken into consideration, then it starts making sense to make a decision. So a lot of the AI experiences
00:06:30.560 | that we see and talk about are basically either ChatGPT wrappers or image generators or half-working
00:06:36.720 | useless voice assistants, including Siri. So our approach was to hide the AI from the user.
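The refactoring heuristic from a moment earlier (how bad, multiplied by how painful, multiplied by how easy) can be written down as a small scoring function. The sketch below is only one possible reading of it: the 0-to-1 scale, the `Candidate` class, and all of the numbers are assumptions made for illustration. The one example taken from the talk is the audio processing code, which is bad but never has to be touched, so its pain factor is zero.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A piece of code under consideration for refactoring (scores are 0..1)."""
    name: str
    bad: float      # how bad the code is
    painful: float  # how much it hurts day to day
    easy: float     # how easy it is to change

    @property
    def priority(self) -> float:
        # Multiplying means any factor near zero vetoes the refactor.
        return self.bad * self.painful * self.easy


# Hypothetical numbers; only the audio example comes from the talk.
candidates = [
    Candidate("audio_processing", bad=0.9, painful=0.0, easy=0.3),
    Candidate("scroll_sync", bad=0.6, painful=0.8, easy=0.5),
    Candidate("config_loader", bad=0.9, painful=0.1, easy=0.9),
]

ranked = sorted(candidates, key=lambda c: c.priority, reverse=True)

if __name__ == "__main__":
    for c in ranked:
        print(f"{c.name}: {c.priority:.3f}")
```

With a product rather than a sum, the worst-looking code (`audio_processing`) drops to the bottom of the list because nobody ever has to touch it.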
00:06:45.120 | So when we produce the books, the AI does the initial draft and we do the rest. And I would argue that the
00:06:51.360 | human touch is invaluable in situations like this. AI cannot tell if the music that it generated isn't
00:06:58.800 | good. It cannot say if the graphics are good-looking or if the avatar is actually matching the vibe of the
00:07:05.440 | person that the book is talking about. So we want to make the AI disappear. And multiple things connected
00:07:13.520 | together, simple things, simple building blocks, make for a magical experience for the reader. There's nothing
00:07:18.560 | really new here. You could already ask a friend a question about the book, but is your friend available
00:07:25.040 | 24/7 and all-knowing? Probably not. You can already search, but is the search spoiler-free?
00:07:31.600 | Is it natural language search or exact match? So I think that beautiful graphics help you get into the mood
00:07:39.200 | of the book and help you with the character recall. And music that matches the scene makes the experience
00:07:46.080 | like watching a movie. And we know that music influences the emotions hugely. And it's very nice
00:07:50.960 | when you're reading the book and the music just flows with the book and gives you this great experience.
00:07:57.840 | So nothing new, but at the same time completely new, which is what AI allows us to do nowadays. And I would
00:08:04.640 | encourage everyone to think about those tiny little niches where we can create some completely new
00:08:10.080 | experiences on top of something that we have known for such a long time. So in thousands of years,
00:08:15.600 | it was never possible to read books like this or even really produce books like this. Because if I had
00:08:20.400 | to do all those graphics and music for every single book, it would cost me, I don't know, $100,000 per book.
00:08:25.920 | So it never made sense to do this. So how do we do this? The process is we use a combination of LLMs
00:08:32.640 | to do the scene analysis and book character detection. We give the AI an overall music theme. So we say for
00:08:39.600 | Sherlock Holmes books, for example, that it's like Victorian London and all that, and it should be
00:08:44.960 | noir music and kind of on a sad note. So with scene analysis plus mood detection, we do music generation,
00:08:53.840 | and we also do structured XML with metadata. So for example, we have a text like this, and AI is very good at
00:09:00.960 | doing this kind of a mapping, which then is very easy for us to use in the book when we say, like,
00:09:07.120 | we can display the avatars that are in the scene. It would be very time consuming for a person to go
00:09:12.480 | through the whole book and map every single thing like this. So today we are open sourcing the player,
00:09:17.920 | so anyone can create Netflix-style experiences for books. And if you want AI that feels like magic,
00:09:23.840 | not like chatbots, come talk to me. We build AI experiences that ship in real life and not slides,
00:09:29.920 | although I hope the slides were nice. So thank you, and you can find me at those places.
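The structured XML with metadata mentioned in the talk was shown on a slide that is not part of this transcript, so the format below is a guess. As a minimal sketch, assume each scene is an element carrying a mood attribute and each character mention is wrapped in a tag with a stable reference; the element names, attributes, and sample sentences are all invented. Given such markup, finding which avatars to display for a scene takes only a few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real schema used for the books is not shown in the talk.
SAMPLE = """
<book>
  <scene id="1" mood="melancholic">
    <p><character ref="holmes">Holmes</character> looked at
       <character ref="watson">Watson</character> and sighed.</p>
  </scene>
  <scene id="2" mood="tense">
    <p><character ref="holmes">He</character> waited alone in the fog.</p>
  </scene>
</book>
"""


def characters_by_scene(xml_text: str) -> dict[str, dict]:
    """Map each scene id to its mood and the characters mentioned in it."""
    root = ET.fromstring(xml_text.strip())
    scenes = {}
    for scene in root.iter("scene"):
        refs = []
        for mention in scene.iter("character"):
            ref = mention.get("ref")
            if ref not in refs:  # keep first-appearance order, no duplicates
                refs.append(ref)
        scenes[scene.get("id")] = {"mood": scene.get("mood"), "characters": refs}
    return scenes


if __name__ == "__main__":
    for scene_id, info in characters_by_scene(SAMPLE).items():
        print(scene_id, info["mood"], info["characters"])
```

The same per-scene mood attribute could feed the music selection, and the character references could drive the avatars shown alongside the text.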