Jan Sedivy: Just some comments: July 2016

The eClub Summer Camp is in full swing and our lab is full of students. The projects fall into two groups: IoT and Machine Learning.

The IoT group is busy designing an architecture for connecting HUBs to cloud servers. We assume the cloud will have to serve millions of HUBs that collect information from sensors and control actuators. We are discussing and estimating how many events we will collect from sensors, how active the smartphone users will be, and how much administrative traffic (heartbeats, updates, etc.) we will have to support. The user profiles, sensors, and HUB configurations need to be maintained in databases. We also plan to save all logs to provide access to historical data. There will probably be two different systems, one for handling the incoming data and another for storing the logs. Hadoop with HDFS seems to be the choice for managing the logs, and Spark for filtering and managing the events. Of course, security is one of the most important features of the system, and we are busy studying communication protocols. It is a large project, and many students are working on preparing the specification and testing parts of the design. Our goal is to create a proof of concept demonstrating the HUB-to-cloud communication before the end of this year.
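
To give a feel for the kind of estimate we are discussing, here is a minimal back-of-the-envelope sketch in Python. Every number and name in it (the `TrafficAssumptions` fields, the HUB count, the event rates, the heartbeat interval) is a hypothetical placeholder chosen for illustration, not a figure from our design. It only computes average rates; real traffic will be bursty, so peak load needs a separate safety factor.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class TrafficAssumptions:
    """Hypothetical inputs for a rough load estimate (all values are guesses)."""
    hubs: int                              # number of connected HUBs
    sensors_per_hub: int                   # sensors attached to each HUB
    events_per_sensor_per_day: float       # sensor readings sent to the cloud
    heartbeat_interval_s: float            # one heartbeat per HUB every N seconds
    app_requests_per_hub_per_day: float    # smartphone user activity per HUB


def average_messages_per_second(a: TrafficAssumptions) -> dict:
    """Return the average message rate per traffic class and in total."""
    sensor = a.hubs * a.sensors_per_hub * a.events_per_sensor_per_day / SECONDS_PER_DAY
    heartbeat = a.hubs / a.heartbeat_interval_s
    app = a.hubs * a.app_requests_per_hub_per_day / SECONDS_PER_DAY
    return {
        "sensor": sensor,
        "heartbeat": heartbeat,
        "app": app,
        "total": sensor + heartbeat + app,
    }


if __name__ == "__main__":
    # Placeholder scenario: one million HUBs with ten sensors each.
    scenario = TrafficAssumptions(
        hubs=1_000_000,
        sensors_per_hub=10,
        events_per_sensor_per_day=864,
        heartbeat_interval_s=60,
        app_requests_per_hub_per_day=20,
    )
    for name, rate in average_messages_per_second(scenario).items():
        print(f"{name:>9}: {rate:,.0f} msg/s")
```

Changing one assumption at a time shows which traffic class dominates, which is useful when deciding how to split the work between the event-handling system and the log store.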

We also have a large group focusing on conversational systems. The work is centered on the open-source YodaQA factoid question answering engine, which was inspired by the Watson Jeopardy! system. It already answers questions in English. Our major task is to port it to Czech and improve its functionality. We are working on integrating the Wikidata knowledge base, and we have to retrain many of the NLU blocks for Czech. One of the students is working on a Czech parser model for the Google SyntaxNet parser.

We are also looking at bots, which are well suited to building simple conversational apps, for example for controlling a simple home IoT setup. Bot technology is based on an information retrieval approach: we search for the best answer to a particular question. In this field, we have been working on Sentence Pair Similarity algorithms, which can be trained to recognize the intent of a question. Students are also looking at new development packages such as wit.ai, api.ai, Microsoft LUIS, and others. We try to develop examples of small, simple applications. We believe the hands-on experience will help us understand where the limitations of the small, IR-based systems lie and where we need to opt for the YodaQA technology. The least interesting but essential part of our effort is creating training databases. Everybody is involved in the hard work of data set collection.
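
The retrieval idea behind such bots can be illustrated with a small sketch. This is not our Sentence Pair Similarity model, which is trained. It is an untrained bag-of-words cosine similarity baseline that picks the stored example closest to the user's question. The example sentences and intent names (`light_on`, `light_off`, `read_temperature`) are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical labelled examples: (sentence, intent).
EXAMPLES = [
    ("turn on the light in the kitchen", "light_on"),
    ("switch off the light", "light_off"),
    ("what is the temperature in the living room", "read_temperature"),
]


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_intent(question: str, examples=EXAMPLES) -> tuple:
    """Return (score, intent) of the stored example most similar to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(text))), intent) for text, intent in examples]
    return max(scored)


if __name__ == "__main__":
    for question in ["please turn on the kitchen light", "how warm is the living room"]:
        score, intent = best_intent(question)
        print(f"{question!r} -> {intent} (score {score:.2f})")
```

A baseline like this fails as soon as the wording differs from the stored examples, which is exactly the kind of limitation a trained similarity model and a larger training database are meant to address.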

If you are interested, join us or visit us; we may be able to help you select an interesting project. There is still time to join.

Wednesday, July 20, 2016

eClub Summer Camp IoT, Machine Learning