The CVUT Alquist team, together with two other teams, has made it to the finals of the $2.5 million Alexa Prize, a university competition. Our team developed the Alquist social bot.
The whole team met in the eClub during the summer of 2016. At that time we were working on YodaQA, a question answering system. YodaQA is a somewhat complex system, and the students learned classic NLP on it. Of course, everybody wanted to use neural networks and design end-to-end systems. We were also playing with simple conversational systems for home automation. Then, to our surprise, Amazon announced the Alexa Prize, and everything clicked together. We quickly put together a team and submitted a proposal. One Ph.D., three MSc, and one BSc student made up a team with strong experience in NLP.
In the beginning, we were competing with more than a hundred academic teams to get into the top twelve and receive the 100k USD scholarship funding. We were lucky, and once we were selected in November 2016, we began working hard. We started with many different incarnations of NNs (LSTM, GRU, attention NN, ...), but we soon realized the bigger problem: a lack of high-quality training data. We tried many sources, including movie scripts, Reddit dialogues, and others, with mixed results. The systems performed poorly. Sometimes they picked an interesting answer, but mostly the replies were very generic and boring. We humbly returned to the classical information retrieval approach with a bunch of rules. The final design is a combination of the traditional approach and some NNs.
We finally managed to put together a reasonably working system that could keep up with a human for at least tens of seconds. This is where the hard labor started. We invented and implemented several paradigms for authoring the dialogues and acquiring knowledge from the Internet. As the first topic, we chose movies, since it is also our favorite topic. Then, step by step, we added more and more dialogues. While perfecting the dialogues, we kept improving the IR algorithms. We improved the user experience further when Amazon introduced SSML; since then, Alexa's voice has sounded more natural.
While developing Alquist, we have gained a lot of experience. A significant change is that we now have to look at Alquist more as a product than as an interesting university experiment. The consequences are dramatic. We need to keep Alquist running, which means we must test every new version very thoroughly. Testing conversational applications is a research problem in itself. We have designed software to evaluate user behavior statistically. The first task is to find problems in the dialogues, such as misunderstandings. Second, we try to estimate how happy users are with particular parts of the conversation so that we can make further improvements. Thanks to Amazon, we have reasonably significant traffic, and because we store all conversations, we can accumulate a large amount of data for new experiments. Extensive data is a necessary condition for training more advanced systems. We have many new ideas in mind for enhancing the dialogues, and we will report on them in future posts.
Many thanks go to Amazon for the scholarship, which was a real blessing for our team. It helped us keep the team together with a single focus on a real task. The students worked hard for more than ten months, and the scholarship helped us succeed.
Today we are thrilled to have made it to the finals alongside the University of Washington in Seattle with their Sounding Board and the wild card team from Heriot-Watt University in Edinburgh, Scotland, with their What’s up Bot. Celebrate with us and keep your fingers crossed. There is half a million at stake.