Hey everyone, I'm Linus, and I'm here to talk about embeddings. I'm grateful to be here at the inaugural AI Engineer conference. Who learned something new today? Yeah. Before I talk about that, a little bit about myself. If you don't know me already, I am Linus. I've been working on AI at Notion for the last year or so.
Before that, I did a lot of independent work prototyping and experimenting with language models and with traditional NLP, things like TF-IDF and BM25, to build interesting interfaces for reading and writing. In particular, I worked a lot with embedding models and the latent spaces of models, which is what I'll be talking about today.
But before I do that, I want to take a moment to say it's been almost a year since Notion launched Notion AI. Our public beta was first announced in around November 2022. So as we get close to a year, we've been steadily launching new and interesting features inside Notion AI.
Since that November launch, we've added AI autofill inside databases, translation, and there are things coming soon, though not today, so keep an eye on this space. And obviously we're hiring, just like everybody else here. We're looking for AI engineers, product engineers, and machine learning engineers to tackle the full gamut of problems that people have been talking about today.
Agents, tool use, evaluations, data, training, and all the interface stuff that we'll see today and tomorrow. So if you're interested, please grab me and we'll have a little chat. Now, it wouldn't be a Linus talk without talking about latent spaces, so let's talk about them. One of the problems I always find myself motivated by is the problem of steering language models.
And I always say that prompting language models feels a lot like you're steering a car from the backseat with a pool noodle. Like, yes, technically you have some control over the motion of the vehicle; there's some connection. But you're not really in the driver's seat. The control isn't really there; it's not direct.
There are like three layers of indirection between you and what the vehicle's doing. And to me, trying to prompt a model with just tokens, just prompts, especially the smaller, more efficient models we can use in production, feels a lot like there are too many layers of indirection. And even though models are getting better at understanding prompts, I think there's always going to be this fundamental barrier between indirectly controlling models with prompts alone and getting the models to do what we want them to do.
And so perhaps we can get a closer, more direct layer of control by looking inside the model, which is where latent spaces come in. Latent spaces arise, I think, most famously inside embedding models. If you embed some piece of text, that vector of 1536 or 1024 numbers lives inside a high-dimensional vector space.
That's a latent space, but you can also look at latent spaces inside the activation spaces of models, inside token embeddings, inside image models, and obviously inside other model architectures like autoencoders. Today we're going to be looking at embedding models, but I think a lot of the general takeaways apply to other models, and there's a lot of fascinating research happening with other models as well.
When you look at an embedding, you kind of see this, right? You see rows and rows of numbers if you ever debug some kind of embedding pipeline and print out the embedding. You can kind of tell it has a thousand-odd numbers, but it's like looking at a Matrix screen of numbers raining down.
But in theory, there's a lot of information actually packed inside those embeddings. If you get an embedding of a piece of text or an image, these latent spaces, these embeddings, represent, in theory, the most salient features of the text or image that the model is using to lower its loss or do its task.
And so maybe, if we can disentangle some meaningful attributes or features out of these embeddings, if we can look at them a little more closely and interpret them a little better, we can build more expressive interfaces that let us control the model by intervening inside it.
Another way to say that is that embeddings show us what the model sees in a sample of input. So maybe we can read out what it sees and better understand what the model's doing. And maybe we can even control the embeddings, or the intermediate activations, to steer what the model generates.
So let's see some of that. Some of this, some of you might have seen before, but I promise there's some new stuff at the end, so hang tight. Here's a sentence I have. It's about one of my favorite novels, Diaspora, a science fiction novel by Greg Egan that explores evolution and existence, post-human artificial intelligences, something to do with alien civilizations, and questions about the nature of reality and consciousness, which you might be doing a lot of given all the things that are happening.
And so I have trained this model that can generate embeddings from this text. So if I hit enter, it's going to give us an embedding. It's an embedding of length 2048, so it's quite large, but it's just a row of numbers. And then I have a decoder half of this model that can take this embedding and try to reconstruct the original input that may have produced it.
So in this case, it recovered the original sentence. There's some variation; you can tell it's not exactly the same length, maybe. But it's mostly reconstructed the original sentence, including specific details like the title of the book and so on. So we have an encoder that goes from text to embedding and a decoder that goes from embedding back to text.
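As a rough sketch of that setup in code (the encode and decode functions here are hypothetical stand-ins for the two halves of my custom model, not a published API):

```python
import numpy as np

# Hypothetical stand-ins for the two halves of the custom bottleneck model
# described in this talk -- placeholders, not a real published API.
def encode(text: str) -> np.ndarray:
    """Text -> fixed-length latent vector (2048 dimensions here)."""
    raise NotImplementedError  # encoder half of the model

def decode(embedding: np.ndarray) -> str:
    """Latent vector -> reconstructed text."""
    raise NotImplementedError  # decoder half of the model

sentence = '"Diaspora" is a science fiction novel by Greg Egan about ...'
z = encode(sentence)   # z.shape == (2048,)
print(decode(z))       # should roughly reproduce the original sentence
```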
And now we can start to do things with the embedding to vary it a little bit and see what the decoder might see if we make some modifications to the embedding. So here I've tried to kind of blur the embedding and sample some points around the embedding with this blur radius.
And you can see the text that's generated from those blurry embeddings. They're a little off. Like, this is not the correct title. The title's kind of gone here. It still kept the name Greg, but it's a different person. And so there's kind of a semantic blur that's happened here.
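In code, the blur is just noise around the embedding, something like this (again using the hypothetical encode and decode stand-ins from above; Gaussian noise is my assumption about what "blur radius" means here):

```python
import numpy as np

# Sample points around an embedding by adding Gaussian noise with a given
# radius, then decode each noisy sample back into text.
def blur_samples(z: np.ndarray, radius: float, n: int = 4) -> list:
    rng = np.random.default_rng(0)
    return [decode(z + rng.normal(scale=radius, size=z.shape)) for _ in range(n)]

for text in blur_samples(encode(sentence), radius=0.2):
    print(text)  # semantically "blurry" variants of the original sentence
```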
But this is kind of boring; it's not really useful. What's a little more useful is trying to actually manipulate things in more meaningful directions. Now we have the same piece of text, and here I have a bunch of controls. So maybe I want to find a direction in this embedding space.
Here I've computed a direction where if you push an embedding in that direction, that's going to represent a shorter piece of text of roughly the same topic. And so I pick this direction and I hit go. And it'll try to push the embedding of this text in that direction and decode them out.
And you can tell they're a little bit shorter. If I push it even further, taking that shorter direction and moving a little farther along it, then sampling and generating text out of those embeddings again, they're even a little bit shorter, but they've still kept the general idea, the general topic.
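A minimal sketch of that, assuming the direction is computed the way I describe later in the talk, as a difference of centroids between example embeddings:

```python
import numpy as np

# Illustrative example texts; in practice you'd use more and better examples.
long_examples = ["A long, winding paragraph about a novel ...",
                 "Another long, detailed passage ..."]
short_examples = ["A terse summary.", "Another short line."]

def centroid(texts):
    return np.mean([encode(t) for t in texts], axis=0)

# The "make it shorter" direction: short centroid minus long centroid.
shorter_dir = centroid(short_examples) - centroid(long_examples)
shorter_dir /= np.linalg.norm(shorter_dir)

# Push the embedding along the direction; more strength, shorter output.
z = encode(sentence)
for strength in (0.5, 1.0, 2.0):
    print(decode(z + strength * shorter_dir))
```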
And with that kind of building block, you can build really interesting interfaces. Like, for example, I can plop this piece of text down here, and maybe I want to generate a couple of shorter versions. So this is a little bit shorter, and this is shorter still.
But maybe I like this version. So I'm going to clone this over here. And I'm going to make the sentiment of the sentence a little more negative. And you can start to explore the latent space of this embedding model, this language model, by actually moving around in a kind of spatial canvas interface, which is kind of interesting.
Another thing you can do with this kind of embedding model is now that we have a vague sense that there are specific directions in this space that mean specific things, we can start to more directly look at a text and ask the model, hey, where does this piece of text lie along your length direction or along your negative sentiment direction?
So this is the original text that we've been playing with. It's pretty objective, like a Wikipedia-style piece of text. Here I've asked ChatGPT to take the original text and make it sound a lot more pessimistic. So things like the futile quest for meaning and plunging deeper into the abyss of nihilism.
And if I embed both of these, what I'm asking the model to do is embed both of these things in its embedding space, and then project those embeddings down onto each of these directions. So one way to read this table is that this default piece of text sits at this value along the negative-sentiment direction, which by itself doesn't mean anything, but it's clearly less than this one.
So this piece of text is much further along the negative-sentiment axis inside this model. When you look at other properties, like how much it talks about artistic topics, that's roughly the same; the length is roughly the same; maybe the negative-sentiment text is a bit more elaborate in its vocabulary. So you can start to project these things onto these meaningful directions and ask: what are the features, the attributes, that the model is finding in the text we're feeding it?
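Reading out that table is just projecting each embedding onto each direction, roughly like this (the two texts and the feature directions here are assumed to be defined elsewhere, for example with the centroid trick above):

```python
import numpy as np

def project(z: np.ndarray, direction: np.ndarray) -> float:
    """Scalar position of an embedding along a feature direction."""
    return float(np.dot(z, direction) / np.linalg.norm(direction))

z_default = encode(default_text)        # the Wikipedia-style version
z_negative = encode(pessimistic_text)   # the ChatGPT-pessimized version

directions = {                          # assumed precomputed feature directions
    "negative sentiment": negative_dir,
    "length": length_dir,
    "artistic topic": artistic_dir,
}
for name, d in directions.items():
    print(f"{name:18s} default={project(z_default, d):+.3f} "
          f"negative={project(z_negative, d):+.3f}")
```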
Another way you could test out some of these ideas is by mixing embeddings. So here I'm going to embed both of these pieces of text. This one's the one we've been playing with. This one is the beginning of a short story I wrote once, about a town on the Mediterranean coast that's calm and a little bit old.
Both of these have been embedded, and each is a 2048-dimensional embedding. So I'm going to say: give me a new embedding that's just the first 1,000 or so dimensions from one embedding and the last 1,000 or so dimensions from the second embedding, and just slam them together into a new embedding.
And naively, you wouldn't really think that would amount to much; you'd expect kind of gibberish. But actually, if you generate some samples from it, you can see you get a sentence that's kind of a semantic mix of both, with structural similarities to both of those things.
Like, you have this structure where there's a quoted title of a book at the beginning. There are topical similarities, punctuation similarities, tone similarities. So this is an example of interpolating in latent space.
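The splice itself is about as naive as it sounds; a sketch, using the same hypothetical encode and decode stand-ins (the two input texts are placeholders for the demo texts):

```python
import numpy as np

# Take the first half of one embedding and the second half of the other,
# concatenate them, and decode the spliced vector.
z_a = encode(diaspora_text)        # the Wikipedia-style sentence
z_b = encode(short_story_opening)  # the Mediterranean-town opening
half = z_a.shape[0] // 2           # 1024 of the 2048 dimensions
z_mix = np.concatenate([z_a[:half], z_b[half:]])
print(decode(z_mix))               # often a blend of topic, tone, and structure
```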
The last thing I have, which you may have seen on Twitter: I have this embedding model and a kind of un-embedding model, a decoder, and that works pretty well. Can I take this un-embedding model and somehow fine-tune it, or otherwise adapt it, so we can read out text from other kinds of embedding spaces? So this is the same sentence we've been using, but now when I hit this run button, it's going to embed this text not using my embedding model, but using OpenAI's text-embedding-ada-002.
And then there's a linear adapter that I've trained so that my decoder model can read out not from my embedding model but from OpenAI's embedding space. So I'm going to embed it, and it's going to try to decode out the text given just the OpenAI embedding. And you can see, okay, it's not perfect, but there's a surprising amount of detail that we've recovered from just the embedding, with no reference to the source text.
So you can see this proper noun, Diaspora, is surprisingly still in there. This feature where there's a quoted title of a book is in there. It's roughly about the same topic, things like the rogue AI. Sometimes when I rerun this, there are also references to the author where the name is roughly correct.
So even surprising features like proper nouns, punctuation, things like the quotes, and obviously the general structure and topic, are recoverable given just the embedding, because of the amount of detail that these high-capacity embedding spaces hold.
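At inference time, the read-out looks roughly like this; the adapter matrix W is assumed to be already trained (the training side comes later), and decode is still the stand-in for my decoder:

```python
import numpy as np
from openai import OpenAI

# Embed the text with OpenAI's text-embedding-ada-002, map the 1536-dim
# vector into the custom model's 2048-dim latent space with a trained linear
# adapter W, then decode text from the adapted embedding.
client = OpenAI()
resp = client.embeddings.create(model="text-embedding-ada-002", input=[sentence])
openai_z = np.array(resp.data[0].embedding)   # shape (1536,)

z_adapted = W @ openai_z                      # W: (2048, 1536) linear adapter
print(decode(z_adapted))                      # approximate reconstruction
```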
But you can do this not only in the text space; you can also do it in image space. So here I have a few prepared files. Let's start with me. And for dumb technical reasons, I have to put two of me in. And then let's try to interpolate in this image space. This is now using CLIP's embedding space. I'm going to try to generate, say, six images in between me and the Notion avatar version of me, the cartoon version of me, if the backend will warm up; cold-starting models is sometimes difficult.
There we go. So now it's generating six images, interpolating between the photographic version of me and the cartoon version of me. And again, it's not perfect, but you can see that on the left it's quite photographic, and as you move further along the interpolation, more cartoony features appear. It's actually a surprisingly smooth transition.
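Under the hood this is just interpolating between two CLIP image embeddings and decoding each intermediate point. The talk doesn't pin down the interpolation scheme, so spherical interpolation is my assumption, and the two helper functions are hypothetical wrappers around a CLIP image encoder and an unCLIP decoder like Karlo:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between two embeddings."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# Hypothetical wrappers: a CLIP image encoder and an unCLIP image decoder.
z_photo = clip_embed_image("linus_photo.png")
z_avatar = clip_embed_image("linus_notion_avatar.png")

frames = [generate_from_clip_embedding(slerp(z_photo, z_avatar, t))
          for t in np.linspace(0.0, 1.0, 6)]
```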
Another thing you can do on top of this is text manipulations, because CLIP is a multimodal text-and-image model. So I can say, let's add some text. I'm going to subtract the vector for a photo of a smiling man.
And instead, I'm going to add the vector for a photo of a very sad, crying man. And then I'll embed these pieces of text. And empirically, I find that for text, I have to be a little more careful. So I'm going to dial down how much of those vectors I'm adding and subtracting.
And then generate six again. And... It's taking a bit. Okay. I'm really sad. And you can do even more fun things. Like, you can try to add... Like, here's a photo of a beach. I'm going to try to add some beach in this. This time, maybe just generate four for the sake of time.
Or maybe there's a bug and it won't let me generate. Okay, the beach didn't quite survive the latent space arithmetic. But in all these demos, both in the text and image domains, the only thing I'm doing is calculating embeddings for examples and adding them together with some normalization. And it's surprising that just by doing that, you can manipulate interesting features in text and images. With this, you can also do things like add style and subject at the same time. This is a cool image that I generated when I made my first demo.
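The arithmetic behind these image edits is roughly the following; the 0.4 weight is an illustrative guess (in the demo I only said I dial the text vectors down), and the helpers are the same hypothetical CLIP and unCLIP wrappers as before:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Start from an image embedding, subtract one caption's direction, add
# another's, and re-normalize so the vector keeps its original magnitude.
z_img = clip_embed_image("linus_photo.png")
scale = np.linalg.norm(z_img)

z = z_img - 0.4 * scale * normalize(clip_embed_text("a photo of a smiling man"))
z = z + 0.4 * scale * normalize(clip_embed_text("a photo of a very sad, crying man"))
z = normalize(z) * scale

image = generate_from_clip_embedding(z)   # a sadder-looking version of the photo
```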
And then you can also do some pretty smooth transitions between landscape imagery, which is interesting. In all these prototypes, one principle I've tried to reiterate to myself is that oftentimes, when you're studying these very complex, sophisticated models, you don't necessarily have the ability to look inside and say, "Okay, what's happening?" Even getting an intuitive understanding of what the model is thinking, what it's looking at, can be difficult. And I think these are some of the ways I've tried to render these invisible parts of the model a little more visible, to let you more directly observe the representations the model is operating in. And sometimes you can also let humans directly interact with those representations, to explore what these spaces represent. I think there's a ton of interesting, pretty groundbreaking research happening here. On the left here is the Othello world model paper, which is fascinating.
There's also the Neurons in a Haystack paper. And then on the right is a very, very recent one that I had to add in at the last minute because it's super relevant. In a lot of these examples, I've calculated these feature directions by just giving examples and computing centroids from them. But here, Anthropic's new work, along with other work from Conjecture and other labs, has found unsupervised ways to automatically discover these directions inside models.
So that's super exciting. In general, I'm really excited to see latent spaces that appear to encode, by some definition, interpretable, controllable representations of the model's input and output. I want to talk a little bit in the last few minutes about the models that I'm using. The text model is a custom model.
I won't go into too much detail, but it's fine-tuned from a T5 checkpoint as a denoising autoencoder. It's an encoder-decoder transformer with some modifications that you can see in the code. So here's a general transformer. Encoder on the left, decoder on the right. I have some pooling layers to get an embedding.
This is like a normal T5 embedding model stack. And then on the right, I have this special kind of gated layer that pulls from the embedding in order to decode from it. You can look at the code; it's a little easier to understand there. But we take this model, and we can adapt it to other models as well, as you saw with the OpenAI embedding recovery.
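To give a flavor of the encoder side, here's a minimal sketch of the "pool the encoder states into one vector" idea using a stock T5 encoder from HuggingFace; the actual model adds a bottleneck and the gated decoder-side layer on top of something like this:

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-large")
encoder = T5EncoderModel.from_pretrained("t5-large")

text = '"Diaspora" is a science fiction novel by Greg Egan.'
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state        # (1, seq_len, d_model)

# Mean-pool the token states (masking padding) into a single embedding.
mask = inputs["attention_mask"].unsqueeze(-1).float()   # (1, seq_len, 1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```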
And so on the left is the normal training regime, where you have an encoder, you get an embedding, and you try to reconstruct the text. On the right, we just train this linear adapter layer to go from the embedding of a different model to then reconstruct the text with the normal decoder.
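The adapter training loop is conceptually simple; a sketch, where the dataloader and the frozen-decoder reconstruction loss are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Learn only a linear map from the foreign embedding space (1536-dim ada-002)
# into the model's own 2048-dim latent space; everything else stays frozen.
adapter = nn.Linear(1536, 2048, bias=False)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

for foreign_emb, target_token_ids in dataloader:        # (embedding, token ids) pairs
    z = adapter(foreign_emb)                            # map into the latent space
    loss = reconstruction_loss(z, target_token_ids)     # cross-entropy through frozen decoder
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```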
And today, I'm excited to share that these models I've been working with, which you may have asked about before, are open on HuggingFace. So you can go download them and try them out now. These are the links. On the left are the HuggingFace models, and there's a Colab notebook that lets you get started really quickly and try things like interpolation and interpretation of these features.
And so if you find any interesting results with these, please let me know. And if you have any questions, also reach out and I'll be able to help you out. The image model I was using at the end is Kakao Brain's Karlo. Excited to see Korea stepping up there. This model is an unCLIP model, trained kind of the way DALL-E 2 was trained: a diffusion model trained to invert CLIP embeddings.
So it goes from CLIP embeddings of images back to images, and that lets us do similar things to what we did with the text model. In all this prototyping, I think there's a general principle, if you have one takeaway from this talk: when you're working with these really complex models and these inscrutable pieces of data, if you can get something into a form that feels like it can fit in your hand, that you can play with, concretely see, observe, and interact with, that can be directly manipulated and visualized, then all the tools and prototypes you can build around these things help us get a deeper understanding of how these models work and how we can improve them.
And in that way, I think models, language models and image models, generative models, are a really interesting laboratory for knowledge, for studying how these different kinds of modalities can be represented. And Bret Victor said, "The purpose of a thinking medium is to bring thought outside the head, to represent these concepts in a form that can be seen with the senses and manipulated with the body.
In this way, the medium is literally an extension of the mind." And I think that's a great, poetic way to describe the philosophy I've approached a lot of my prototyping with. So, if you follow some of these principles, try to dig deeper into what the models are actually looking at, and build interfaces around them, I think more humane interfaces to knowledge are possible.
I'm really excited to see that future. Thank you.