back to indexBooks reimagined: AI to create new experiences for things you know — Lukasz Gandecki, TheBrain.pro

00:00:00.040 |
So my name is Łukasz Gądecki and I've been programming since I was a little kid and I want 00:00:19.680 |
to tell you about my newest project, Books Reimagined. So how to use AI to create new 00:00:26.520 |
experiences for things you already know. So how it all started. I was reading a book 00:00:32.760 |
about Donald Trump re-election and since, as you can hear, I'm not from the 00:00:38.080 |
United States, there was a few too many characters to me. I didn't follow 00:00:44.040 |
everyone, so I decided to vibe code my way through the understanding. I built a 00:00:50.820 |
little bit of an AI companion application. It looked terrible, but it gave me 00:00:57.720 |
context for the people that were on the page with a little bit. It found the 00:01:02.520 |
images for them and gave me a little bit of a summary in the context of the of the 00:01:08.060 |
page that I was at. And a month later, it turned into something different. So this is 00:01:21.760 |
going to be the Snow Queen. This is one of the first experiences we've built. This is 00:01:25.580 |
the Snow Queen book, and this is the part where the sorcerer's parentheses are flying away 00:01:31.820 |
with the mirror that distorts the reality. So all right. So it tells a story about the 00:01:46.820 |
flying and flying and the heaven is so far away. There's music and it reads, but you 00:01:53.440 |
can't go into your time story. But then the crash happens and the mirror shutters and it 00:02:02.600 |
distorts everything all around. So this is one of the first experiences we've built. But 00:02:08.820 |
it's all in Polish, so I want to actually demonstrate one that we built just for this 00:02:14.560 |
conference that's in English. This is 1984. And what's interesting here, which I don't think I'll be able 00:02:23.340 |
to show you, is that you can send a quick voice note to the book and ask what's going 00:02:29.460 |
on in this scene right now. I don't really have audio. But the point is that there's many 00:02:39.860 |
different AI voice assistants, but they are almost always just terrible, if not all of them, to 00:02:48.380 |
be honest, serious terrible. We had a demo from Google yesterday. They were 00:02:53.240 |
saying up front that it works 50 percent. It's usually there's a delay. They start talking 00:02:59.360 |
in the wrong position. I mean, at the wrong time. Then they interrupt you. So we built here 00:03:07.120 |
a system where you hold it as to just specify when you are speaking and then you let it go. 00:03:15.320 |
And it immediately, 100 milliseconds, responds to you. And then you could scroll further and 00:03:21.360 |
then ask a question like, "What happened between the last time I asked a question and now," and 00:03:26.120 |
it can summarize what's going on. So you have to believe me that. You can check later on bookgenius.net. 00:03:31.880 |
Another thing that we were thinking about is the search. That's a very common thing, searching. 00:03:41.600 |
So the most normal search would be just exact search. But if you want to -- the way our brains 00:03:47.480 |
doesn't work, they don't memorize the pages. So if you want to find a scene where Winston 00:03:52.240 |
met O'Brien, then exact search is not going to work. But embeddings work. So you can quickly 00:04:00.400 |
find the scene you were thinking about this way. And then you can go to that -- go to that 00:04:06.240 |
spot, read a bit more. And you can go back to the place where you were reading. But there's 00:04:12.160 |
also one step forward -- I mean, one more thing you can do. You can basically say, "Talk about 00:04:20.640 |
all the way the party propaganda works." And you can do deep research. And it's going to actually read the whole book 00:04:30.480 |
point that you finished at to give you the answer. So it's very useful. It's going to take a couple minutes. I'm going to go back to presentation. 00:04:41.240 |
So I started with VibeCoding, Vanilla.js, very confusing code. But it gave me the freedom to iterate very quickly. 00:04:49.240 |
You basically don't know what you don't know. And if you start to -- especially right now, the time it takes to plan 00:04:56.360 |
everything up front is often wasted. Because you can much quicker just tell your thinking to the AI and generate 00:05:02.680 |
something that works. And then you see, "Oh, that's actually not that great. Let's try this and that." And I realized 00:05:09.000 |
that throwing away code that you poured your heart into often feels terrible. Like, you're invested. You've spent so much time. But throwing away 00:05:16.080 |
code written by AI actually feels great. So I would describe this as waves of changes. So basically, once I start 00:05:27.440 |
feeling that I don't rewrite the whole code base day after day, like the amplitude of the waste is getting 00:05:34.000 |
lower and lower. And there comes a time where I can start old-school engineering. I can start getting tests and 00:05:40.640 |
refactor. But there are traps to refactoring. Do I refactor the worst piece of code? I would suggest 00:05:46.000 |
that it's better to focus on no hanging fruits. So for example, I had a piece of a code from OpenAI audio 00:05:52.320 |
processing. And it's like JavaScript, very quickly written, no types, very confusing. But I never have 00:05:58.800 |
to touch it. So I'm not refactoring it. Although it was very tempting. So we often think about refactoring 00:06:05.360 |
by adding this. How bad? How painful? How easy? But if something is very bad and very easy to change, but it's not 00:06:11.360 |
painful at all. And it's probably not a good idea to change it. So I would suggest that it's better to 00:06:16.560 |
look at how bad the code is multiplied by how painful and multiplied by how easy. And when all those factors 00:06:21.760 |
are taken into consideration, then it starts making sense to make a decision. So a lot of the AI experiences 00:06:30.560 |
that we see and talk about are basically either chat GPT wrappers or image generators or half-working 00:06:36.720 |
useless voice assistants, including Siri. So our approach was to hide the AI from the user. 00:06:45.120 |
So when we produce the books, the AI does the initial draft and we do the rest. And I would argue that the 00:06:51.360 |
human touch is invaluable in situations like this. AI cannot tell if the music that it generated isn't 00:06:58.800 |
good. It cannot say if the graphics are good-looking or if the avatar is actually matching the vibe of the 00:07:05.440 |
person that the book is talking about. So we want to make the AI disappear. And multiple things connected 00:07:13.520 |
together, simple things, simple building blocks, make for the magical experience for the reader. There's nothing 00:07:18.560 |
really new here. You could already ask a friend a question about the book, but is your friend available 00:07:25.040 |
24/7 and all-knowing? Probably not. You can already search, but is the search the spoiler-free search? 00:07:31.600 |
Is it natural language search or exact match? So I think that beautiful graphics help you get into the mood 00:07:39.200 |
of the book and help you with the character recall. And music that matches the scene makes it the experience 00:07:46.080 |
like watching a movie. And we know that music influences the emotions hugely. And it's very nice 00:07:50.960 |
when you're reading the book and the music just flows with the book and gives you this great experience. 00:07:57.840 |
So nothing new, but at the same time completely new, which is what AI allows us to do nowadays. And I would 00:08:04.640 |
encourage everyone to think about those tiny little niches where we can create some completely new 00:08:10.080 |
experiences on top of something that we have known for such a long time. So in thousands of years, 00:08:15.600 |
it was never possible to read books like this or even really produce books like this. Because if I had 00:08:20.400 |
to do all those graphics and music for every single book, it would cost me, I don't know, $100,000 per book. 00:08:25.920 |
So it never made sense to do this. So how do we do this? The process is we use a combination of LLMs 00:08:32.640 |
to the scene analysis, book characters detection. We give the AI an overall music theme. So we say for 00:08:39.600 |
Sherlock Holmes books, for example, that it's like Victoria London and all that, and it should be 00:08:44.960 |
norm music and kind of on a sad node. So with scene analysis plus mood detection, we do music generation, 00:08:53.840 |
and we also destructured XML with metadata. So for example, we have a text like this, and AI is very good at 00:09:00.960 |
doing this kind of a mapping, which then is very easy for us to use in the book when we say, like, 00:09:07.120 |
we can display the avatars that are in the scene. It would be very time consuming for a person to go 00:09:12.480 |
through the whole book and map every single thing like this. So today we are open sourcing the player, 00:09:17.920 |
so anyone can create the Netflix-style experiences for books. And if you want AI that feels like magic, 00:09:23.840 |
not like chatbots, come talk to me. We build AI experiences that ship in the light and not slides, 00:09:29.920 |
although I hope the slides were nice. So thank you, and you can find me at those places.