Rohit Prasad: Alexa Prize | AI Podcast Clips


00:00:00.000 | - Can you briefly speak to the Alexa Prize
00:00:04.160 | for people who are not familiar with it
00:00:06.120 | and also just maybe where things stand
00:00:10.120 | and what have you learned and what's surprising?
00:00:12.880 | What have you seen that's surprising
00:00:14.360 | from this incredible competition?
00:00:15.920 | - Absolutely, it's a very exciting competition.
00:00:18.440 | Alexa Prize is essentially a grand challenge
00:00:21.520 | in conversational artificial intelligence
00:00:24.340 | where we threw the gauntlet to the universities
00:00:26.880 | who do active research in the field
00:00:29.400 | to say, can you build what we call a social bot
00:00:32.800 | that can converse with you coherently
00:00:34.800 | and engagingly for 20 minutes?
00:00:37.280 | That is an extremely hard challenge.
00:00:39.960 | Talking to someone who you're meeting for the first time,
00:00:43.920 | or even someone you've met quite often,
00:00:47.100 | and speaking for 20 minutes on any topic,
00:00:51.020 | across an evolving set of topics, is super hard.
00:00:55.200 | We have completed two successful years of the competition.
00:00:59.080 | The first was won by the University of Washington,
00:01:00.880 | the second by the University of California.
00:01:03.040 | We are in our third instance.
00:01:04.360 | We have an extremely strong cohort of 10 teams,
00:01:07.100 | and the third instance of the Alexa Prize is underway now.
00:01:11.420 | And we are seeing a constant evolution.
00:01:14.960 | The first year was definitely a learning experience;
00:01:16.400 | there were a lot of things to be put together.
00:01:18.640 | We had to build a lot of infrastructure
00:01:21.120 | to enable these universities
00:01:23.440 | to be able to build magical experiences
00:01:25.760 | and do high quality research.
00:01:29.040 | - Just a few quick questions, sorry for the interruption.
00:01:31.360 | What does failure look like in the 20 minute session?
00:01:34.720 | So what does it mean to fail,
00:01:36.200 | to not reach the 20-minute mark?
00:01:37.440 | - Oh, awesome question.
00:01:38.700 | So, first of all,
00:01:40.840 | I forgot to mention one more detail.
00:01:42.840 | It's not just 20 minutes,
00:01:44.020 | but the quality of the conversation too that matters.
00:01:46.760 | And the beauty of this competition
00:01:48.960 | before I answer that question on what failure means
00:01:51.280 | is that these social bots actually converse
00:01:54.080 | with millions and millions of customers.
00:01:58.320 | So during the judging phases,
00:02:00.720 | there are multiple phases
00:02:02.480 | before we get to the finals,
00:02:03.800 | which is a very controlled judging situation
00:02:05.440 | where we bring in judges
00:02:07.880 | and we have interactors who interact with these social bots;
00:02:11.880 | that is a much more controlled setting.
00:02:13.400 | But until the point we get to the finals,
00:02:16.440 | all the judging is essentially by the customers of Alexa.
00:02:20.200 | And there you basically rate on a simple question
00:02:23.680 | how good your experience was.
00:02:25.920 | So that's where we are not testing for the 20-minute
00:02:28.680 | boundary being crossed,
00:02:30.280 | because you do want a
00:02:33.240 | clear-cut winner to be chosen,
00:02:35.320 | and it's an absolute bar.
00:02:37.560 | So "did you really break that 20-minute barrier?"
00:02:40.280 | is why we have to test it in a more controlled setting
00:02:43.400 | with interactors, essentially,
00:02:46.140 | and see how the conversation goes.
00:02:48.320 | So this is why it's a subtle difference
00:02:51.640 | between how it's being tested in the field
00:02:54.480 | with real customers versus in the lab to award the prize.
00:02:57.920 | So on the latter one,
00:02:59.240 | what it means is that
00:03:01.720 | there are essentially three judges,
00:03:05.440 | and two of them have to say this conversation
00:03:07.760 | has stalled.
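
As a concrete aside, the stopping rule described here is simple enough to sketch in code. The following is a minimal illustration, assuming per-conversation stall votes from the three judges; the names are hypothetical, not Amazon's actual judging tooling.

```python
# Illustrative sketch of the finals stopping rule: the conversation is
# considered stalled once any two of the three judges flag it as such.
# Names are hypothetical; this is not Amazon's judging software.

def conversation_stalled(stall_votes: list[bool], needed: int = 2) -> bool:
    """True once at least `needed` judges have voted 'stalled'."""
    return sum(stall_votes) >= needed

# Example: judges 1 and 3 think the conversation has stalled.
print(conversation_stalled([True, False, True]))  # True
```
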
00:03:09.160 | - Got it, and the judges are human experts?
00:03:13.120 | - Judges are human experts.
00:03:14.360 | - Okay, great.
00:03:15.200 | So this is in the third year.
00:03:16.520 | So what's been the evolution?
00:03:18.280 | How far? So, in the DARPA challenge
00:03:22.040 | with the autonomous vehicles,
00:03:23.120 | nobody finished in the first year;
00:03:25.160 | in the second year, a few more finished in the desert.
00:03:27.080 | So how far along in this,
00:03:30.680 | I would say much harder, challenge are we?
00:03:33.760 | This challenge has come a long way,
00:03:35.120 | but we're definitely not close
00:03:37.840 | to the 20-minute barrier with coherent
00:03:40.120 | and engaging conversation.
00:03:42.120 | I think we are still five to 10 years away
00:03:44.240 | from completing that.
00:03:46.880 | But the progress is immense.
00:03:48.800 | Like, what you're finding is that the accuracy
00:03:51.480 | of the responses these social bots generate
00:03:54.760 | is getting better and better.
00:03:56.600 | What's even more amazing to see is that now there's humor coming in.
00:04:00.760 | The bots are quite--
00:04:02.320 | - Awesome.
00:04:03.160 | (laughs)
00:04:04.000 | - You're talking about the ultimate science of intelligence.
00:04:06.880 | I think humor is a very high bar
00:04:09.240 | in terms of what it takes to create humor.
00:04:12.320 | And I don't mean just being goofy.
00:04:13.920 | I really mean good sense of humor
00:04:16.880 | is also a sign of intelligence in my mind
00:04:19.000 | and something very hard to do.
00:04:20.520 | So these social bots are now exploring
00:04:22.440 | not only what we think of as natural language abilities,
00:04:25.960 | but also personality attributes
00:04:27.800 | and aspects of when to inject an appropriate joke,
00:04:31.520 | when you don't know the domain,
00:04:35.840 | how you come back with something more intelligible
00:04:38.800 | so that you can continue the conversation.
00:04:40.600 | If you and I are talking about AI
00:04:42.600 | and we are domain experts, we can speak to it.
00:04:44.920 | But if you suddenly switch the topic to something
00:04:46.720 | I don't know of, how do I change the conversation?
00:04:49.560 | So you're starting to notice these elements as well.
00:04:52.640 | And that's coming partly from the nature
00:04:55.960 | of the 20-minute challenge:
00:04:57.560 | people are getting quite clever
00:04:59.960 | about how to really converse
00:05:03.000 | and essentially mask some of the understanding defects
00:05:06.000 | if they exist.
00:05:07.280 | - So some of this, this is not Alexa the product.
00:05:10.120 | This is somewhat for fun,
00:05:13.040 | for research, for innovation and so on.
00:05:15.240 | I have a question sort of in this modern era,
00:05:17.720 | there's a lot of, if you look at Twitter
00:05:20.840 | and Facebook and so on, there's discourse,
00:05:23.240 | public discourse going on.
00:05:24.560 | And some things that are a little bit too edgy,
00:05:26.240 | people get blocked and so on.
00:05:28.080 | I'm just out of curiosity,
00:05:29.720 | are people in this context pushing the limits?
00:05:33.400 | Is anyone using the F word?
00:05:35.160 | Is anyone sort of pushing back,
00:05:38.840 | sort of arguing, I guess I should say,
00:05:43.400 | as part of the dialogue to really draw people in?
00:05:45.760 | - First of all, let me just back up a bit
00:05:47.800 | in terms of why we are doing this, right?
00:05:49.600 | So you said it's fun.
00:05:51.760 | I think fun is more the engaging part for customers.
00:05:56.760 | It is one of the most used skills as well
00:05:59.960 | in our skill store.
00:06:01.840 | But that apart, the real goal was essentially this:
00:06:04.680 | what was happening is, with a lot of AI research
00:06:07.880 | moving to industry,
00:06:09.360 | we felt that academia runs the risk of not having
00:06:12.480 | the same resources at its disposal that we have,
00:06:15.640 | which is lots of data, massive computing power,
00:06:20.200 | and clear ways to test these AI advances
00:06:23.800 | with real customer benefits.
00:06:26.000 | So we brought all these three together in the Alexa Prize.
00:06:28.360 | That's why it's one of my favorite projects in Amazon.
00:06:31.360 | And with that, the secondary effect is, yes,
00:06:35.280 | it has become engaging for our customers as well.
00:06:38.440 | We're not there in terms of where we want it to be, right?
00:06:41.400 | But it's a huge progress.
00:06:42.520 | But coming back to your question on
00:06:44.560 | how the conversations evolve:
00:06:46.280 | yes, there are some natural attributes of what you said
00:06:49.360 | in terms of argument and some amount of swearing.
00:06:51.640 | The way we take care of that is that
00:06:54.280 | there is a sensitive filter we have built.
00:06:56.560 | - That's some keywords and so on?
00:06:57.880 | - It's more than keywords.
00:07:00.960 | Of course, there's a keyword-based part too,
00:07:02.360 | but it goes further:
00:07:04.400 | these words can be very contextual, as you can see.
00:07:06.920 | And also the topic can be something that
00:07:10.400 | you don't want a conversation to happen about,
00:07:12.880 | because this is a communal device as well.
00:07:14.760 | A lot of people use these devices.
00:07:16.680 | So we have put in a lot of guardrails for the conversation
00:07:20.040 | to be more useful for advancing AI
00:07:23.360 | and not so much about these other issues you mentioned
00:07:28.360 | that are happening in the AI field as well.
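
The multi-stage filter described here (a keyword pass plus topic-level and contextual checks) can be sketched roughly as follows. Everything in this sketch, the lists, names, and classifier interface, is an illustrative assumption rather than Amazon's implementation.

```python
from typing import Callable

# Hypothetical multi-stage sensitive filter: a cheap keyword pass first,
# then a topic-level block, then a contextual model for words whose
# offensiveness depends on usage. All entries below are placeholders.

BLOCKED_KEYWORDS = {"badword1", "badword2"}    # placeholder entries
SENSITIVE_TOPICS = {"violence", "self_harm"}   # placeholder entries

def is_sensitive(
    utterance: str,
    topic: str,
    contextual_model: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Return True if the utterance or topic should be filtered out."""
    tokens = utterance.lower().split()
    # Stage 1: hard keyword match, cheap and unambiguous.
    if any(tok in BLOCKED_KEYWORDS for tok in tokens):
        return True
    # Stage 2: topic-level block, since the device is communal.
    if topic in SENSITIVE_TOPICS:
        return True
    # Stage 3: contextual classifier for words that are only
    # offensive in certain usages.
    return contextual_model(utterance) >= threshold

# Example with a stub classifier that flags nothing.
print(is_sensitive("let's chat about movies", "movies", lambda u: 0.0))  # False
```
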
00:07:30.320 | - Right, so this is actually a serious opportunity.
00:07:32.720 | I didn't use the right word, fun.
00:07:34.320 | I think it's an open opportunity to do
00:07:37.920 | some of the best innovation
00:07:39.440 | in conversational agents in the world.
00:07:42.200 | - Absolutely.
00:07:43.360 | - Why just universities?
00:07:46.440 | - Why just universities?
00:07:47.320 | Because as I said, I really felt-
00:07:48.960 | - Young minds.
00:07:49.800 | - Young minds. It's also that,
00:07:52.520 | if you think about the other aspect
00:07:55.320 | of where the whole industry is moving with AI,
00:07:58.840 | there's a dearth of talent, given the demands.
00:08:02.320 | So you do want universities to have a clear place
00:08:07.320 | where they can invent and research, and not fall behind
00:08:09.920 | to the point that they can't motivate students.
00:08:11.360 | Imagine if all grad students left to industry like us,
00:08:16.360 | or faculty members, which has happened too.
00:08:20.320 | So this is a way that if you're so passionate
00:08:22.640 | about the field where you feel industry
00:08:25.480 | and academia need to work well,
00:08:27.200 | this is a great example and a great way
00:08:30.320 | for universities to participate.
00:08:31.920 | - So what do you think it takes to build a system
00:08:34.760 | that wins the Alexa Prize?
00:08:37.080 | I think you have to start focusing on aspects
00:08:41.880 | of reasoning. Right now, there are still more lookups
00:08:46.880 | of what intents the customer is asking for,
00:08:51.640 | and responses to those, rather than really reasoning
00:08:56.360 | about the elements of the conversation.
00:08:59.960 | For instance, if the conversation is about games,
00:09:03.720 | and it's about a recent sports event,
00:09:05.560 | there's so much context involved,
00:09:08.720 | and you have to understand the entities
00:09:10.760 | that are being mentioned so that the conversation
00:09:13.280 | is coherent, rather than suddenly just switching
00:09:16.520 | to some fact you know about a sports entity
00:09:19.000 | and relaying that without understanding
00:09:22.640 | the true context of the game.
00:09:26.160 | Like, if you just said, "I learned this fun fact
00:09:29.760 | about Tom Brady," rather than really saying how he played
00:09:34.320 | the game the previous night,
00:09:36.760 | then the conversation is not really that intelligent.
00:09:40.280 | So you have to go to more reasoning elements
00:09:43.640 | of understanding the context of the dialogue
00:09:46.600 | and giving more appropriate responses,
00:09:48.680 | which tells you that we are still quite far
00:09:51.160 | because a lot of times it's more facts being looked up
00:09:54.880 | and something that's close enough as an answer
00:09:57.400 | but not really the answer.
00:09:59.520 | So that is where the research needs to go more
00:10:02.520 | in actual true understanding and reasoning.
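
To make the lookup-versus-reasoning distinction concrete, here is a toy illustration of my own: one bot relays a canned fact about a mentioned entity, while the other first grounds the entity in the dialogue context (the recent game) before responding. All names and data are hypothetical.

```python
# Toy illustration of fact lookup vs. context-grounded response.
# All data, names, and structure here are hypothetical.

FACTS = {"Tom Brady": "Tom Brady has won seven Super Bowls."}

def lookup_response(entity: str) -> str:
    """Fact lookup: relays something true but ignores the dialogue context."""
    return FACTS.get(entity, "I don't know much about that.")

def grounded_response(entity: str, dialogue_context: dict) -> str:
    """Context-grounded: ties the entity to what the conversation is about."""
    event = dialogue_context.get("recent_event")
    if event and entity in event.get("players", []):
        return (f"{entity} played in {event['name']} last night; "
                f"the final score was {event['score']}.")
    # Fall back to a plain fact only when no shared context exists.
    return lookup_response(entity)

context = {"recent_event": {"name": "the Patriots game",
                            "players": ["Tom Brady"],
                            "score": "24-17"}}
print(grounded_response("Tom Brady", context))
```
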
00:10:05.840 | And that's why I feel it's a great way to do it,
00:10:07.920 | because you have an engaged set of users
00:10:10.960 | working to help make these AI advances happen in this case.
00:10:15.680 | - You mentioned customers there quite a bit
00:10:18.120 | and there's a skill.
00:10:19.600 | What is the experience for the user that's helping?
00:10:23.960 | So just to clarify, as far as I understand,
00:10:27.560 | this skill is a standalone skill
00:10:30.000 | for the Alexa Prize.
00:10:31.000 | I mean, it's focused on the Alexa Prize.
00:10:32.800 | It's not you ordering certain things,
00:10:34.720 | or, like, checking the weather
00:10:36.640 | or playing Spotify, right?
00:10:38.120 | It's a separate skill. - Exactly.
00:10:39.920 | - And so you're focused on helping that.
00:10:43.000 | I don't know, how do people, how do customers think of it?
00:10:45.960 | Are they having fun?
00:10:47.240 | Are they helping teach the system?
00:10:49.440 | What's the experience like?
00:10:50.440 | - I think it's both actually.
00:10:52.040 | And let me tell you how you invoke the skill.
00:10:55.240 | So all you have to say is, Alexa, let's chat.
00:10:57.600 | And then the first time you say, Alexa, let's chat,
00:11:00.720 | it comes back with a clear message
00:11:02.080 | that you're interacting
00:11:02.920 | with one of those university social bots.
00:11:05.400 | And there's a clear disclosure,
00:11:06.720 | so you know exactly what you're interacting with, right?
00:11:09.200 | And that is why it's very transparent.
00:11:11.480 | You are being asked to help, right?
00:11:13.640 | And we have a lot of mechanisms where,
00:11:18.320 | when we are in the first feedback phase,
00:11:21.040 | we send a lot of emails to our customers,
00:11:24.120 | and then they know that the teams need a lot of interactions
00:11:29.160 | to improve the accuracy of the systems.
00:11:31.320 | So we know we have a lot of customers
00:11:33.280 | who really want to help these university bots
00:11:36.320 | and are conversing with them.
00:11:37.800 | And some are just having fun with just saying,
00:11:40.080 | Alexa, let's chat.
00:11:41.400 | And also some adversarial behavior, to see
00:11:44.720 | how much you understand as a social bot.
00:11:47.640 | So I think we have a good healthy mix
00:11:49.680 | of all three situations.
00:11:51.320 | - So if we talk about solving the Alexa challenge,
00:11:52.720 | the Alexa Prize,
00:11:55.440 | what does the data set of really engaging,
00:11:58.040 | pleasant conversations look like?
00:12:04.840 | 'Cause if we think of this as a supervised learning problem,
00:12:07.960 | I don't know if it has to be,
00:12:09.560 | but if it does, maybe you can comment on that.
00:12:12.760 | Do you think there needs to be a data set
00:12:14.840 | of what it means to be an engaging,
00:12:18.520 | successful, fulfilling conversation?
00:12:19.960 | - I think that's part of the research question here.
00:12:22.120 | I think we at least
00:12:23.320 | got the first part right,
00:12:26.560 | which is to have a way for universities to build and test
00:12:31.560 | in a real-world setting.
00:12:33.160 | Now you're asking the next phase of questions,
00:12:35.960 | which we are also asking, by the way:
00:12:38.440 | what does success look like as an optimization function?
00:12:42.760 | That's what you're asking.
00:12:44.520 | We as researchers are used to having a great corpus
00:12:46.920 | of annotated data
00:12:50.000 | and then tuning our algorithms on it, right?
00:12:54.920 | And fortunately and unfortunately,
00:12:57.960 | in this world of the Alexa Prize,
00:13:00.240 | that is not the way we are going after it.
00:13:02.680 | So you have to focus more on learning
00:13:05.040 | based on live feedback.
00:13:08.240 | That is another element that's unique,
00:13:10.280 | where just now I started with giving you how you ingress
00:13:14.600 | and experience this capability as a customer.
00:13:18.840 | What happens when you're done?
00:13:20.920 | So they ask you a simple question
00:13:23.600 | on a scale of one to five,
00:13:24.960 | how likely are you to interact with this social bot again?
00:13:29.280 | That is good feedback,
00:13:31.240 | and customers can also leave more open-ended feedback.
00:13:34.840 | And I think that, to me,
00:13:38.280 | is one part of the question you're asking,
00:13:40.040 | which I'm saying requires a mental model shift:
00:13:42.000 | as researchers, you have to change your mindset.
00:13:45.960 | This is not a DARPA evaluation or an NSF-funded study
00:13:50.080 | where you have a nice corpus.
00:13:52.400 | This is the real world; you have real data.
00:13:56.160 | - The scale is amazing.
00:13:57.240 | - It's amazing. - That's a beautiful thing.
00:13:58.960 | And then the customer, the user can quit the conversation
00:14:03.160 | at any time. - Exactly.
00:14:04.000 | The user can, and that is also a signal
00:14:05.840 | for how good you were at that point.
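
A minimal sketch of what learning from this kind of live feedback could look like, assuming each conversation ends with a one-to-five rating and treating an early quit as the lowest reward: a simple epsilon-greedy bandit over candidate response strategies. The strategies and reward mapping are my own illustrative assumptions, not the Alexa Prize teams' methods.

```python
import random

# Epsilon-greedy bandit over hypothetical response strategies, updated
# from live end-of-conversation feedback: a 1-5 rating, with an early
# quit mapped to the lowest reward. All names are illustrative.

STRATEGIES = ["topic_dive", "inject_humor", "switch_topic"]

class LiveFeedbackBandit:
    def __init__(self, strategies, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in strategies}
        self.mean_reward = {s: 0.0 for s in strategies}

    def choose(self) -> str:
        # Explore occasionally; otherwise pick the best-rated strategy so far.
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, strategy: str, rating: float | None, quit_early: bool):
        # Map feedback to a reward in [0, 1]: an early quit counts as 0.
        reward = 0.0 if quit_early else (rating - 1) / 4
        self.counts[strategy] += 1
        n = self.counts[strategy]
        # Incremental mean update.
        self.mean_reward[strategy] += (reward - self.mean_reward[strategy]) / n

bandit = LiveFeedbackBandit(STRATEGIES)
s = bandit.choose()
bandit.update(s, rating=4.0, quit_early=False)
```
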
00:14:09.160 | - So, and then on a scale of one to five, one to three,
00:14:12.440 | do they say how likely are you, or is it just a binary?
00:14:15.120 | - A one to five. - One to five.
00:14:17.440 | Wow, okay.
00:14:18.280 | That's such a beautifully constructed challenge, okay.
00:14:20.960 | Let's go to the next question.
00:14:22.800 | - Okay, so I'm gonna go with this one.