Friday, September 2, 2016

Bots and question answering in ESC 2016

A team of talented students formed around conversational applications at the eClub Summer Camp 2016. They continue to develop YodaQA and a simple bot called Alquist. Both systems are built on top of a set of services ranging from dialog managers to NLP services.

We started with YodaQA, a factoid question answering system inspired by IBM Watson. It is a fairly sophisticated engine that builds on many NLP algorithms, Lucene search, RDF databases, and more. A description of the architecture and technology is available on GitHub, along with a test website and an Android application.

The latest work concentrates on teaching YodaQA Czech. This requires replacing some of the components with Czech versions. The most important are the Stanford syntactic parser, the named entity recognition (NER), and the answer classifier. For the syntactic parser, we use Google's TensorFlow and SyntaxNet together with the Czech dependencies dataset. We get accuracy similar to that of the classical top-of-the-line algorithms. Currently, we are developing a basic version of the entity recognition algorithm based on Conditional Random Fields (CRF). We also plan to implement NER using neural networks.

The biggest problem in machine learning is the training sets. For the initial answer scoring algorithm, we have put together a set of question-answer pairs. To make the set as rich as possible, we have been enriching it with variables for entities and synonyms, which allows us to algorithmically generate a large number of questions. The live system logs questions and answers, which helps us create better training sets. The sets still require some manual processing, but it is worth the effort.
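To illustrate the idea of generating questions from variables, here is a minimal sketch in Python. It is not the code used in YodaQA; the template, the slot names, and the values are all made up for illustration. Each slot in a template is filled with every one of its entity or synonym values, so a single template yields many questions.

```python
from itertools import product


def expand_template(template, slots):
    """Yield every question obtained by filling each {slot} with each of its values."""
    names = sorted(slots)
    for combo in product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


# Hypothetical template and slot values, for illustration only.
template = "{ask} the {role} of {entity}?"
slots = {
    "ask": ["Who is", "Who was"],      # synonymous question openings
    "role": ["author", "writer"],      # synonyms for the asked relation
    "entity": ["Hamlet", "Faust"],     # entity variable
}

questions = list(expand_template(template, slots))
```

With two values in each of the three slots, this single template produces eight questions. A real training set would also need the matching answer for every entity, which this sketch leaves out.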

The emergence of conversational bots caught our attention too. Initially, we tested Wit.ai, Microsoft LUIS, Meya, and Amazon Echo for English. We soon ran into many different limitations. Because YodaQA is put together from a set of independent services (NLU processors), we decided to use the same services to build simpler conversational bots. The bots rely on two essential parts: the intent recognizer and the entity recognizer. The bot processes the user's input query, and the extracted intent and entities are saved to a context object. The dialog manager (DM) uses the context to control the dialog flow.
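The flow described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not our actual services: the keyword-based intent recognizer, the regular-expression entity recognizer, and all names in the code (`Context`, `update_context`, `next_action`, the intents and actions) are hypothetical stand-ins for the real NLU components.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Context:
    """Holds what has been extracted from the conversation so far."""
    intent: Optional[str] = None
    entities: Dict[str, str] = field(default_factory=dict)


# Toy recognizers (assumptions): real ones would be trained NLU services.
INTENT_KEYWORDS = {"weather": ["weather", "forecast"], "greeting": ["hello", "hi"]}
CITY_PATTERN = re.compile(r"\bin ([A-Z][a-z]+)")


def recognize_intent(text: str) -> str:
    words = re.findall(r"[a-z]+", text.lower())
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"


def recognize_entities(text: str) -> Dict[str, str]:
    match = CITY_PATTERN.search(text)
    return {"city": match.group(1)} if match else {}


def update_context(context: Context, text: str) -> Context:
    """Process one user query and save the extracted intent and entities."""
    context.intent = recognize_intent(text)
    context.entities.update(recognize_entities(text))
    return context


def next_action(context: Context) -> str:
    """A tiny dialog manager: choose the next step from the context alone."""
    if context.intent == "weather":
        return "report_weather" if "city" in context.entities else "ask_city"
    if context.intent == "greeting":
        return "greet"
    return "fallback"


context = update_context(Context(), "What is the weather like?")
first_action = next_action(context)   # no city known yet
context = update_context(context, "And the weather in Prague?")
second_action = next_action(context)  # city is now in the context
```

Because entities accumulate in the context object across turns, the dialog manager can ask for a missing piece of information in one turn and act on it in the next.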

Since YodaQA does not use a DM, we had to develop one. During our experiments with commercially available bots, we particularly liked the Meya DM because of its simple dialog declaration in YAML. We decided to go in a similar direction and created our own version called Alquist. It allows us to write even complicated dialogs. The implementation went quickly, and today we are running our first version of the Alquist DM.

All this work is done by about ten students, and the team is growing. Over the last few weeks, we have made considerable progress. We have several applications in the works; stay tuned to be the first to test them.
