AI Frontiers in Trust and Safety: Combatting Multifaceted Harm on Tinder at Scale - Vibhor Kumar

My name is Vibhor, VB for short, and I'm a senior AI engineer at Tinder, where I've been working on trust and safety for the last five years. I also work on and maintain some open-source AI projects. But we only have 15 minutes, so I'll jump right into it.
Today I'll be talking about AI frontiers in trust and safety: combating multifaceted harm on Tinder at scale.
We'll first go over what trust and safety actually is, for everyone in the audience, and more specifically what it means at Tinder. Then we'll go over the complex interaction between generative AI and trust and safety: some of the problems which most people think about, but also some of the tremendous opportunities which will fundamentally change the space. Next we'll dive specifically into how to actually use LLMs for detecting trust and safety violations in text, covering the end-to-end stack from training to fine-tuning to productionization, and an overview of how we're doing this at Tinder. Finally we'll cover what the future looks like for this effort, and what we should be most excited about.
So, what is trust and safety? It's not really something that's well defined, and it's more of an art than a science. Oftentimes we have to make ethical judgment calls, but it's helpful to look at a breakdown of the goals of TNS by Del Harvey, who led trust and safety at Twitter for 13 years. Ultimately, TNS is about preventing risk, reducing risk, detecting harm, and mitigating harm. The ultimate goal is to protect users, and also the companies creating the products that they use. In this presentation we'll focus on the detecting-harm part, which is where we devote a lot of our time at Tinder.
Speaking of which, as the largest dating app in the world, we encounter many, many types of violative behavior on Tinder. Here are some of the different categories, and a representative, synthetic textual example of each. First, we have social media handles in your profile, a relatively minor but rather prevalent violation of our private information policy. On the other side of the spectrum, we have things that are low prevalence but high harm: hate speech, harassment, and pig butchering scams.
So now that we have a sense of what TNS is, let's move on to the problems that generative AI is causing for the industry. One of the biggest problems is that Gen AI enables rapid generation of content, which makes it particularly easy to spread misinformation, propaganda, and low-quality spam by drowning out genuine content. That's what's known as the content pollution phenomenon. Additionally, there's some risk that platforms where the content is posted will essentially inherit the known copyright issues plaguing consumer Gen AI tools today, like those from OpenAI. Another problem is the accessibility of deepfake technology, which lowers the bar of entry to impersonation. This also enables malicious interpersonal harm, as in the case of revenge porn. Lastly, Gen AI can be used to scale up organized spam and scam operations. Bad actors can rapidly create profiles by generating text and images, which means that existing signals, the ones we use today that rely on similarity matching or hashes, will be increasingly less likely to work. As a side note, this is why TNS teams dealing with automated fraud will need to increasingly take advantage of metadata-type signals associated with physical bottlenecks in meatspace: things like IP address, ISP information, and phone numbers.
So we've covered some of the major problems that Gen AI is causing for the trust and safety industry, but there are actually some big reasons to be hopeful as well. The first is that AI labs at both startups and big companies have already done the hard work of pre-training and open-sourcing LLMs for everyone's benefit. Out of the box, these are really powerful in that they have latent semantic capability and often global language coverage as well. Just try few-shot prompting Llama 3 or Mistral to detect any kind of violation. By fine-tuning these models, we can actually achieve state-of-the-art performance, in some cases better than few-shot GPT-4 performance, on downstream textual detection tasks. And the act of fine-tuning has been made a lot easier because the open-source community has produced libraries and tools that are relatively mature and maintained now. It's easier than ever to do fine-tuning, with the low-level details abstracted away, and there are libraries built for every stage of model development and productionization.
The next two opportunities I wanted to bring up are things that every trust and safety organization should be paying attention to. First, because we're fine-tuning rather than starting from scratch, and because of the strong open-source library support, we can actually accelerate the model development process from months to weeks. Additionally, one of the major reasons fine-tuned open-source LLMs are so valuable here is that, in general, model performance in trust and safety degrades quickly due to the adversarial nature of automated fraud. For example, whenever we release a rule to block one spam wave, bad actors are incentivized to, and ultimately do, change their behavior to get around it. But the generalization performance of LLMs slows the model degradation curve significantly, and we basically get this for free when we use LLMs.
Okay, so let's move on to some of the specifics of actually using LLMs for TNS violation detection. The first major step, and arguably the most important one, is creating our training dataset. That's due, in part, to how easy some of the later steps are, as we'll see, but it's also because fine-tuning requires much smaller datasets than training from scratch: in some cases, hundreds to thousands of examples. And that necessitates creating a pretty high-quality dataset.
So what should this dataset look like? Well, GPT-type LLMs, like the closed-source GPT-4 and Claude Opus, and also the open-source Llama and Mistral models, can be thought of as text-in, text-out machines. This is an approximate mental model, but it helps for understanding what the dataset should look like. In our case, the text-in is the potentially violating text we want the model to make a prediction on, wrapped by some prompt. And the text-out is a classification label, or some extracted characters representing the violation. Here's a synthetic example of how we would detect whether a user is underage from their written bio.
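To make that concrete in code, here's a minimal sketch of what a single text-in/text-out training record could look like; the prompt template, field names, and label set are illustrative placeholders rather than our actual schema.

```python
# A minimal, illustrative JSONL-style training record for the underage-in-bio task.
# The prompt template, field names, and label vocabulary are hypothetical examples.
PROMPT_TEMPLATE = (
    "You are a trust and safety classifier. Does the following dating-app bio "
    "indicate that the user is under 18? Answer with exactly one word: yes or no.\n\n"
    "Bio: {bio}\n\nAnswer:"
)

example = {
    # text-in: the potentially violating text, wrapped by the prompt
    "input": PROMPT_TEMPLATE.format(bio="finishing 10th grade this year lol"),
    # text-out: the classification label the model should produce
    "output": "yes",
}
```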
As for actually assembling this dataset, it's possible to do it entirely by hand, because, again, we only need hundreds to thousands of examples. But one thing we've seen work quite well is to incorporate the largest LLMs into the data generation process. We could generate purely synthetic training examples with few-shot prompting, but this introduces some risk that the data won't resemble the true data distribution and will be platform-agnostic rather than specific to Tinder. What we've found works better is a hybrid process: we use GPT-4, with some clever prompting to avoid running into its built-in alignment, to make predictions on internal analytics data and mine examples for our training set that way. The cost of doing this is inversely proportional to the true prevalence of the harm, but that cost is still pretty negligible, and it provides a prevalence metric that alone is really helpful for TNS operations teams to track anyway. It's also possible to restrict the LLM calls to more likely candidates. Finally, when we get the mined examples from using GPT-4 effectively as an auto-moderator, we do a manual verification pass, fixing any mislabeled data and making judgment calls where the label is more ambiguous.
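A hedged sketch of that mining step, using the OpenAI Python client with a hypothetical prompt and a hard-coded candidate list standing in for an analytics query, might look like this.

```python
# Sketch of mining labeled examples by running GPT-4 as an auto-moderator over
# candidate bios pulled from internal analytics data. The prompt and candidate
# list are illustrative; real candidates would come from a warehouse query.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELING_PROMPT = (
    "Does this dating-app bio contain a social media handle? "
    "Answer with exactly one word: yes or no.\n\nBio: {bio}"
)

def label_candidate(bio: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": LABELING_PROMPT.format(bio=bio)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

candidates = ["add me on ig @someuser", "love hiking and dogs"]
mined = [(bio, label_candidate(bio)) for bio in candidates]
# Each (bio, label) pair then goes through manual verification before entering the training set.
```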
Okay, so we've got the training set; now let's talk about fine-tuning. One question you might have is: why don't we just directly use the API LLMs, like GPT-4, to do this detection in production? Tinder has a huge, real-time volume of profile interactions, and hitting OpenAI APIs that often doesn't scale in terms of cost, latency, and throughput. By fine-tuning our own models, we have full control over the model weights and can re-fine-tune when production performance inevitably degrades, without needing to worry about changes in the underlying base model. One additional benefit for classification tasks is that we have access to the output probability distribution, which means we can essentially create a confidence for the prediction, like in a traditional machine learning model.
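As a rough sketch of how that confidence can be read off the next-token distribution; the model name, prompt, and label tokens here are illustrative, not our production setup.

```python
# Sketch of turning the next-token probability distribution into a classifier-style
# confidence score. Model name and label tokens are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompt = (
    "Does this bio contain a social media handle? Answer yes or no.\n"
    "Bio: add me @someuser\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # distribution over the next token

# Use the first subword of each label as a rough proxy for the label probability.
yes_id = tokenizer(" yes", add_special_tokens=False).input_ids[0]
no_id = tokenizer(" no", add_special_tokens=False).input_ids[0]
probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
confidence_violation = probs[0].item()      # P("yes") renormalized over the two labels
```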
As for actually doing the fine-tuning, the relatively mature open-source ecosystem makes this really easy. Hugging Face libraries make it as simple as writing a few hundred lines of code, without really needing to understand anything about deep learning or transformers. We've also had particular success building out training pipelines in notebooks. There are also libraries which abstract fine-tuning down to just config files, like Axolotl, Ludwig, and LLaMA Factory. And finally, there are managed solutions emerging that provide additional UI and dataset management support for rapid experimentation, like H2O LLM Studio and Predibase, the latter of which we've had a lot of success with.
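To give a sense of how little code this takes, here's a minimal sketch using Hugging Face's TRL and PEFT libraries; the model name, dataset path, and hyperparameters are placeholders, and the exact SFTTrainer arguments vary across TRL versions.

```python
# Minimal supervised fine-tuning sketch with Hugging Face TRL + PEFT.
# Dataset path, base model, and hyperparameters are illustrative placeholders;
# exact SFTTrainer argument names differ between TRL versions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("json", data_files="tns_training_examples.jsonl", split="train")

def to_text(example):
    # Concatenate text-in and text-out into a single training string.
    return {"text": example["input"] + " " + example["output"]}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-Instruct-v0.2",   # assumed base model
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=TrainingArguments(
        output_dir="adapter-out",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.save_model("adapter-out")  # saves only the LoRA adapter weights
```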
So many of you are probably familiar with parameter-efficient fine-tuning. Low-rank adaptation, or LoRA, freezes the weights of the base model and can create a fine-tuned model while only really needing to learn megabytes of additional weights. Accordingly, the fine-tuning can be done quickly on one or a few GPUs, which enables rapid experimentation and also unlocks using larger base models. Parameter-efficient fine-tunes of larger base models are more likely to be better than full fine-tunes of smaller base models, especially for classification tasks. We've also had a lot of success with QLoRA, which unlocks fine-tuning even the largest models on a single GPU.
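As an illustration of what a QLoRA setup looks like in code, here's a hedged sketch using the peft and bitsandbytes integrations in transformers; the base model and LoRA hyperparameters are just examples.

```python
# Sketch of a QLoRA setup: load the base model in 4-bit, then attach a small LoRA adapter.
# Model name and LoRA hyperparameters are illustrative.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit NF4 quantization keeps the frozen base model small
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # assumed base model
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections are a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()          # only the adapter weights (a few MB) are trainable
```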
Lastly, one of the biggest reasons to use LoRA is that we can take advantage of massive inference optimizations, as we'll see now.
In production, we use LoRAX, an open-source framework that allows users to efficiently serve thousands of fine-tuned models on a single GPU. It exploits the fact that a fine-tuned LoRA adapter, which is just a single fine-tuned model, is only a few megabytes in size. Many adapters can be served jointly by simply shuffling and batching adapters and requests for efficient serving. In practice, this means that the marginal cost of serving a new adapter on the same base model is virtually zero. I just want to let the implication of that sink in. It means that we can train adapters for the many, many different types of trust and safety violations possible: hate speech, promotion, catfishing, pig butchering scams, and so on. And we can serve all those adapters on one, or even a small set of, GPUs and not need to worry about horizontal scaling. Incorporating a new adapter in production is as simple as storing the megabytes of weights on some file system and modifying a request to the LoRAX client.
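As a rough sketch of what that request looks like with the open-source LoRAX Python client, assuming a running LoRAX deployment and a hypothetical adapter path.

```python
# Sketch of routing one request to a specific fine-tuned adapter via the LoRAX client.
# Endpoint URL, adapter path, and prompt are placeholders.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # assumed LoRAX deployment endpoint

prompt = (
    "Does this bio contain a social media handle? Answer yes or no.\n"
    "Bio: add me @someuser\nAnswer:"
)

# adapter_id points at the megabytes of LoRA weights stored on a shared filesystem or hub;
# the same base model serves every adapter.
response = client.generate(prompt, adapter_id="tns-adapters/social-handle", max_new_tokens=1)
print(response.generated_text)
```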
Special thanks to the Predibase team, who developed and maintain LoRAX and have provided us a lot of support with it. The optimizations in LoRAX basically enable us to do real-time inference: on 7-billion-parameter models, we can support tens of QPS at around 100 milliseconds of latency on A10 GPUs.
And for the other use cases, the high-frequency domains, we can further reduce the required throughput by gating requests with heuristics. For example, for detecting social media handles in profiles, we can make predictions only on bios that contain some word that's not in a dictionary.
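A minimal sketch of that kind of gate, with a placeholder word-list source, might look like this.

```python
# Sketch of the heuristic gate: only send a bio to the LLM adapter if it contains a
# token that isn't a plain dictionary word. The word-list path is an assumed placeholder.
import re

with open("/usr/share/dict/words") as f:  # assumed dictionary file
    DICTIONARY = {w.strip().lower() for w in f}

def needs_llm_check(bio: str) -> bool:
    tokens = re.findall(r"[a-z0-9_.@]+", bio.lower())
    # Handles like "@some_user" or obfuscations like "sn4pchat" won't be in the dictionary.
    return any(tok not in DICTIONARY for tok in tokens)

if needs_llm_check("love hiking, add me on ig @someuser"):
    pass  # call the social-handle adapter; otherwise skip the LLM entirely
```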
We're also exploring doing cascade classification through a distillation process, where we train adapters on smaller base models optimizing for recall, and only call the larger base-model adapters when the smaller one gives a high enough score.
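Conceptually, the cascade looks something like the following sketch, where the scoring functions and thresholds are placeholders rather than real models.

```python
# Sketch of cascade classification: a cheap, high-recall adapter on a small base model
# screens everything, and only high-scoring items hit the larger, more precise adapter.
def small_adapter_score(text: str) -> float:
    # Placeholder: in practice, a recall-tuned adapter on a small base model.
    raise NotImplementedError

def large_adapter_score(text: str) -> float:
    # Placeholder: in practice, a precision-tuned adapter on a larger base model.
    raise NotImplementedError

SMALL_MODEL_THRESHOLD = 0.3  # deliberately low so the first stage rarely misses violations

def is_violation(text: str) -> bool:
    if small_adapter_score(text) < SMALL_MODEL_THRESHOLD:
        return False                 # the vast majority of traffic stops here
    return large_adapter_score(text) > 0.5
```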
Another advantage for us in the TNS space: in general, LLM outputs are computationally expensive because generation is done autoregressively, one token at a time. But classification or extraction tasks require only one or a few output tokens, so inference stays fast. And compared to NLP models of the past, we're seeing that we can get massive improvements in precision and recall, just due to the much higher latent semantic capability of today's LLMs. We can achieve near-100% recall on simpler tasks like social handle detection, and significant improvements over the baseline on more semantically complex tasks.
The other huge benefit that we get is way better generalization performance. In particular, this is important for TNS because it's an adversarial game: bad actors and other violative users always try to avoid detection, for example with intentional typos, mixing letters and numbers, and innuendos. But LLMs are much better at making sense of these, meaning that these new adapter-based models get stale less quickly than traditional machine learning models and are a better defense against harm in the long run.
Looking ahead, we're interested in the growing work on non-textual modalities and how we can leverage that for detection purposes. For example, we can use pre-trained visual language models like LLaVA to do explicit-image detection, and that's an active area of exploration for us.
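As a sketch of what that exploration could look like with the Hugging Face LLaVA integration; the model name, prompt format, and image path here are illustrative, not a production pipeline.

```python
# Sketch of using a pre-trained visual language model (LLaVA via Hugging Face) as a
# yes/no image classifier for explicit content. Everything here is illustrative.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("profile_photo.jpg")
prompt = "USER: <image>\nDoes this photo contain sexually explicit content? Answer yes or no. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(processor.decode(output[0], skip_special_tokens=True))
```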
Overall, we're excited about rapidly training adapters for detecting harm along the long tail of TNS violations. We can create high-quality datasets with trust and safety operations and policy experts, with AI in the loop. We can automate training and retraining pipelines for fine-tuning adapters. And we can take advantage of LoRAX to slot in new adapters for inference with low marginal cost. Ultimately, we can build a next-generation defensive moat against harm that takes advantage of the Gen AI landscape today, leading to a safer, healthier platform.