- Can you briefly speak to the Alexa Prize for people who are not familiar with it, and also maybe where things stand, what you have learned, and what you've seen that's surprising from this incredible competition? - Absolutely, it's a very exciting competition. The Alexa Prize is essentially a grand challenge in conversational artificial intelligence, where we threw down the gauntlet to universities doing active research in the field: can you build what we call a social bot that can converse with you coherently and engagingly for 20 minutes?
That is an extremely hard challenge. Talking to someone you're meeting for the first time, or even someone you've met quite often, for 20 minutes on an evolving set of topics is super hard. We have completed two successful years of the competition. The first was won by the University of Washington, the second by the University of California.
We are now in our third instance, with an extremely strong cohort of 10 university teams, and the third Alexa Prize is underway. We are seeing a constant evolution. The first year was definitely a learning experience: a lot of things had to be put together, and we had to build a lot of infrastructure to enable these universities to build magical experiences and do high-quality research.
- Just a few quick questions, sorry for the interruption. What does failure look like in the 20-minute session? What does it mean to fail, to not reach the 20-minute mark? - Oh, awesome question. First of all, I forgot to mention one more detail.
It's not just 20 minutes; the quality of the conversation matters too. And the beauty of this competition, before I answer your question on what failure means, is that the social bots actually converse with millions and millions of customers. There are multiple judging phases before we get to the finals, and the finals are a much more controlled setting, where we bring in judges and interactors who interact with these social bots.
Up until the finals, all the judging is essentially done by Alexa customers, who rate their experience by answering one simple question about how good it was. In that phase we are not testing whether the 20-minute boundary was crossed, because for the prize you want a clear-cut winner chosen against an absolute bar.
Whether you really broke that 20-minute barrier is why we have to test in a more controlled setting with interactors and see how the conversation goes. So there is a subtle difference between how it's tested in the field with real customers versus in the lab to award the prize.
In that latter setting, failure means essentially that of the three judges, two of them say the conversation has stalled. - Got it, and the judges are human experts? - Judges are human experts. - Okay, great. So this is the third year.
So what's been the evolution? With the DARPA challenge for autonomous vehicles, nobody finished in the first year, and in the second year a few finished in the desert. So how far along are we in this, I would say, much harder challenge? - This challenge has come a long way, but we are definitely not close to breaking that 20-minute barrier with a coherent and engaging conversation.
I think we are still five to ten years away from completing that. But the progress is immense. What you're finding is that the accuracy of the kind of responses these social bots generate is getting better and better. What's even more amazing to see is that there's now humor coming in.
The bots are quite-- - Awesome. (laughs) - You're talking about the ultimate science of intelligence. I think humor is a very high bar in terms of what it takes to create it. And I don't mean just being goofy; a good sense of humor is also a sign of intelligence in my mind, and something very hard to do.
So these social bots are now exploring not only what we think of as natural language abilities, but also personality attributes: when to inject an appropriate joke, and when you don't know the domain, how to come back with something intelligible so that you can continue the conversation. If you and I are talking about AI and we are domain experts, we can speak to it.
But if you suddenly switch to a topic that I don't know, how do I steer the conversation? You're starting to notice these elements as well. And that's coming partly from the nature of the 20-minute challenge: people are getting quite clever about how to really converse and essentially mask understanding defects where they exist.
- So some of this, this is not Alexa the product; it's somewhat for fun, for research, for innovation and so on. I have a question about this modern era: if you look at Twitter and Facebook and so on, there's a lot of public discourse going on.
And some things that are a little bit too edgy get people blocked and so on. Just out of curiosity, are people in this context pushing the limits? Is anyone using the F word? Is anyone pushing back, arguing, I guess I should say, as part of the dialogue to really draw people in?
- First of all, let me back up a bit in terms of why we are doing this, right? You said it's fun. I think fun is more about the engagement side for customers. It is also one of the most used skills in our skill store.
But that apart, the real goal was this: with a lot of AI research moving to industry, we felt that academia runs the risk of not having at its disposal the same resources that we have, which is lots of data, massive computing power, and clear ways to test these AI advances with real customer benefit.
So we brought all three of those together in the Alexa Prize. That's why it's one of my favorite projects at Amazon. And with that, the secondary effect is, yes, it has become engaging for our customers as well. We're not yet where we want it to be, right?
But it's huge progress. Coming back to your question on how the conversations evolve: yes, there are some natural occurrences of what you said, in terms of argument and some amount of swearing. The way we take care of that is with a sensitive-content filter we have built.
- That's some keywords and so on? - It's more than keywords. Of course there's a keyword-based component too, but these words can be very contextual, as you can see. And the topic itself can be something you don't want a conversation to happen around, because this is a communal device as well.
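To make the layered idea concrete, here is a minimal sketch in Python. It is purely hypothetical and not Amazon's actual filter: a fast keyword blocklist as the first pass, plus a contextual topic check as the second. Every name here, such as SensitiveFilter and blocked_topics, is invented for illustration; a production system would use trained classifiers over the full conversation context.

    # Illustrative sketch only: not the actual Alexa sensitive-content filter.
    # It mirrors the layered idea described above: a fast keyword pass, then a
    # contextual topic pass, since the same word can be fine in one context
    # and not in another.
    from dataclasses import dataclass, field

    @dataclass
    class SensitiveFilter:
        # First layer: plain keyword blocklist (hypothetical word list).
        blocked_words: set = field(default_factory=lambda: {"swearword1", "swearword2"})
        # Second layer: topics to keep off a communal device (hypothetical labels;
        # a real system would predict the topic with a trained classifier).
        blocked_topics: set = field(default_factory=lambda: {"violence", "adult"})

        def keyword_hit(self, utterance: str) -> bool:
            tokens = {t.strip(".,!?").lower() for t in utterance.split()}
            return bool(tokens & self.blocked_words)

        def topic_hit(self, predicted_topic: str) -> bool:
            return predicted_topic.lower() in self.blocked_topics

        def allow(self, utterance: str, predicted_topic: str) -> bool:
            # Block the turn if either layer fires.
            return not (self.keyword_hit(utterance) or self.topic_hit(predicted_topic))

    if __name__ == "__main__":
        f = SensitiveFilter()
        print(f.allow("let's chat about last night's game", "sports"))  # True
        print(f.allow("tell me something", "adult"))                    # False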
A lot of people use these devices. So we have put in a lot of guardrails so that the conversation is more useful for advancing AI, and not so much about those other issues you attributed to what's happening in the field as well. - Right, so this is actually a serious opportunity.
I didn't use the right word, fun. I think it's an open opportunity to do some of the best innovation in conversational agents in the world. - Absolutely. - Why just universities? - Why just universities? Because, as I said, I really felt-- - Young minds. - Young minds. It's also, if you think about the other aspect of where the whole industry is moving with AI, that there's a dearth of talent given the demand.
So you do want universities to have a clear place where they can invent and do research, and not fall behind to the point that they can't motivate students. Imagine if all the grad students left for industry like us, or the faculty members, which has happened too. So if you're passionate about a field where you feel industry and academia need to work well together, this is a great example and a great way for universities to participate.
- So what do you think it takes to build a system that wins the Alexa Prize? - I think you have to start focusing on aspects of reasoning. Right now it is still more a matter of looking up what intents the customer is asking for and responding to those, rather than really reasoning about the elements of the conversation.
For instance, if the conversation is about games, about a recent sports event, there's so much context involved, and you have to understand the entities being mentioned so that the conversation stays coherent, rather than suddenly switching to some fact you know about a sports entity and just relaying that without understanding the true context of the game.
If you just say, "I learned this fun fact about Tom Brady," rather than really speaking to how he played the game the previous night, then the conversation is not really that intelligent. So you have to get to the reasoning elements of understanding the context of the dialogue and giving more appropriate responses, which tells you that we are still quite far, because a lot of the time it's facts being looked up, something that's close enough as an answer but not really the answer.
So that is where the research needs to go: more toward actual, true understanding and reasoning. And that's why I feel this is a great way to do it, because you have an engaged set of users helping make these AI advances happen. - You mentioned customers there quite a bit, and there's a skill.
What is the experience for the user who's helping? Just to clarify, as far as I understand, this skill is a standalone for the Alexa Prize. I mean, it's focused on the Alexa Prize. It's not you ordering certain things, checking the weather, or playing Spotify, right?
It's a separate skill. - Exactly. - And so you're focused on helping that. How do customers think of it? Are they having fun? Are they helping teach the system? What's the experience like? - I think it's both, actually. And let me tell you how you invoke the skill.
So all you have to say is, "Alexa, let's chat." And the first time you say, "Alexa, let's chat," it comes back with a clear message that you're interacting with one of the university social bots. So you know exactly who you are interacting with, right? And that is why it's very transparent.
You are being asked to help, right? And we have a lot of mechanisms: when we are in the first feedback phase, we send a lot of emails to our customers, so they know that the teams need a lot of interactions to improve the accuracy of their systems.
So we know we have a lot of customers who really want to help these university bots and are conversing with them. Some are just having fun by saying, "Alexa, let's chat." And there's also some adversarial behavior, to test how much you really understand as a social bot.
So I think we have a good, healthy mix of all three situations. - So if we talk about solving the Alexa Prize challenge, what does the data set of really engaging, pleasant conversations look like? Because if we think of this as a supervised learning problem, and I don't know if it has to be, but if it does, maybe you can comment on that.
Do you think there needs to be a data set of what it means to be an engaging, successful, fulfilling conversation? - I think that's part of the research question here. I think we at least got the first part right, which is to have a way for universities to build and test in a real-world setting.
Now you're asking about the next phase of questions, which we are also asking, by the way: what does success look like as an optimization function? As researchers, we are used to having a great corpus of annotated data and then tuning our algorithms on it, right?
Fortunately and unfortunately, in this world of the Alexa Prize, that is not the way we are going after it. You have to focus more on learning based on live feedback. That is another unique element. I just described how you enter and experience this capability as a customer.
What happens when you're done? They ask you a simple question: on a scale of one to five, how likely are you to interact with this social bot again? That is good feedback, and customers can also leave more open-ended feedback. And I think that's partly an answer to the question you're asking; it's a mental-model shift. As researchers, you have to change your mindset: this is not a DARPA evaluation or an NSF-funded study where you have a nice corpus.
This is the real world; you have real data. - The scale is amazing. - It's amazing. - That's a beautiful thing. And then the customer, the user, can quit the conversation at any time. - Exactly, the user can, and that is also a signal for how good you were at that point.
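To make the learning-from-live-feedback idea concrete, here is a minimal hypothetical sketch in Python, not any team's actual system: treat each candidate response strategy as an arm of a simple bandit, turn the one-to-five "would you chat again?" rating and early quits into a reward, and keep a running estimate per strategy. Names such as ResponseStrategyBandit and the strategy labels are invented for illustration.

    # Hypothetical sketch of learning from live feedback, as discussed above:
    # the 1-5 "would you chat again?" rating (and early quits) become a reward
    # signal for choosing among response strategies. Not a real Alexa Prize system.
    import random
    from collections import defaultdict

    class ResponseStrategyBandit:
        def __init__(self, strategies, epsilon=0.1):
            self.strategies = list(strategies)
            self.epsilon = epsilon                  # exploration rate
            self.counts = defaultdict(int)          # conversations seen per strategy
            self.values = defaultdict(float)        # running mean reward per strategy

        def choose(self):
            # Epsilon-greedy: mostly exploit the best-rated strategy, sometimes explore.
            if random.random() < self.epsilon or not self.counts:
                return random.choice(self.strategies)
            return max(self.strategies, key=lambda s: self.values[s])

        def update(self, strategy, rating, quit_early):
            # Map the 1-5 rating to [0, 1]; penalize conversations the user quit early.
            reward = (rating - 1) / 4.0
            if quit_early:
                reward *= 0.5
            self.counts[strategy] += 1
            self.values[strategy] += (reward - self.values[strategy]) / self.counts[strategy]

    if __name__ == "__main__":
        bandit = ResponseStrategyBandit(["fact_lookup", "topic_dive", "humor"])
        # Simulated conversations: pretend "topic_dive" tends to earn higher ratings.
        for _ in range(1000):
            s = bandit.choose()
            rating = random.choice([4, 5]) if s == "topic_dive" else random.choice([2, 3])
            bandit.update(s, rating, quit_early=random.random() < 0.2)
        print({s: round(bandit.values[s], 2) for s in bandit.strategies})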
- And on that scale, is it one to five or one to three? Do they ask how likely you are, or is it just binary? - It's one to five. - One to five. Wow, okay. That's such a beautifully constructed challenge. Okay, let's go to the next question.
- Okay, so I'm gonna go with this one.