Wednesday, August 28, 2013

Summer projects

Summer is great: I can finally focus on real research and development. We have pursued several interesting projects in machine learning, and we designed one Android application in the Cloud Computing Center.

Tomas has been focusing mostly on document categorization. His task: for a given document, find the best category. In practice this kind of algorithm allows automatic tagging of documents, it is good for grouping similar documents, and it can be used for pairing ads with content in web advertising. To categorize the documents we need to build models for all categories. We have generated a large number of models with the Latent Dirichlet Allocation (LDA) algorithm. The final classification into the best category was done with a Random Forest (RF) classifier trained on a set of manually categorized documents. The training set was kindly provided by Seznam.cz. LDA requires a lot of computational power. To accommodate these requirements we selected the Cloudera virtual images with preinstalled Hadoop. The LDA is written in Python and was run as MapReduce jobs. The work is in progress and the results are promising.

Ondrej and other students are looking into another interesting problem: they are trying to model online gamers. We have received data from the Pool Live Tour game made by Geewa. The data shows user behaviour in detail. The task is simple to state: model users well enough to predict their readiness to buy a new cue. Players gain skills, proceed to higher levels, spend more time in the game, etc. All this has to be captured to estimate the right moment for a cue upgrade. The biggest problem in such projects is to get through the data and extract the right features. Mirek from Geewa helped us decode how it works. The data includes gamers who bought a cue as well as non-buying users, which is good for supervised training. Again we decided to use our popular RF algorithm for classification. Currently we are planning tests. The models we have developed are also useful for clustering users into groups, for uncovering cheating, etc. Similar models can be used for many other games or activities. These algorithms may uncover the necessary insight and help us make the game or activity more challenging and engaging. This field leaves a lot of room for further improvement. The greatest advantage: there are a great many gamers producing a lot of data, and more data gives the models more power.
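To make the feature extraction step more concrete, here is a minimal sketch of how raw game events could be aggregated into one feature vector per player, with a label saying whether the player bought a cue. The event format, the event names and the chosen features are all hypothetical; the real Geewa data and the features we use are not shown here.

```python
from collections import defaultdict

# Hypothetical event log: (player_id, day, event_type, value).
EVENTS = [
    ("p1", 1, "match_played", 1), ("p1", 1, "match_won", 1),
    ("p1", 2, "match_played", 1), ("p1", 2, "level_up", 3),
    ("p1", 3, "match_played", 1), ("p1", 3, "match_won", 1),
    ("p2", 1, "match_played", 1), ("p2", 5, "match_played", 1),
]
# Hypothetical set of players who bought a cue (the supervised label).
PURCHASES = {"p1"}


def extract_features(events):
    """Aggregate raw events into one feature vector per player."""
    stats = defaultdict(lambda: {"played": 0, "won": 0, "level": 1, "days": set()})
    for player, day, kind, value in events:
        s = stats[player]
        s["days"].add(day)
        if kind == "match_played":
            s["played"] += 1
        elif kind == "match_won":
            s["won"] += 1
        elif kind == "level_up":
            s["level"] = max(s["level"], value)
    features = {}
    for player, s in stats.items():
        win_rate = s["won"] / s["played"] if s["played"] else 0.0
        # Feature order: matches played, win rate, level reached, active days.
        features[player] = [s["played"], win_rate, s["level"], len(s["days"])]
    return features


def build_training_set(events, buyers):
    """Return player ids, feature matrix X and labels y (1 = bought a cue)."""
    feats = extract_features(events)
    players = sorted(feats)
    X = [feats[p] for p in players]
    y = [1 if p in buyers else 0 for p in players]
    return players, X, y


players, X, y = build_training_set(EVENTS, PURCHASES)
```

The resulting X and y are in the shape most random forest implementations expect as training input.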

In June we started to work on an Android application for CTU students. The app allows them to search for faculties and students from a mobile device. Students can find their classes with detailed descriptions. The app also offers detailed information about the university. The built-in RSS feed reader aggregates the most important university information sources. Finally, the application allows users to check today's menu in the canteens. We are almost finished and are starting testing and final debugging. Our plan is to give the app to students next month. It will be downloadable through Google Play for Android owners. We plan to introduce an iPhone version later, since the iPhone is less common among students. Thanks go to the group of programmers at the FIT faculty of CTU who prepared the required KOS API.

There are some more projects cooking in our lab, but I will report on them next time. If you are interested in our projects or if you want to join our group, let me know. We have plenty of interesting tasks.

Tuesday, July 30, 2013

We have finished several interesting machine learning projects recently. I have updated the Cloud Computing Center page, where you can find more details, other finished projects, as well as a description of what we are working on now.

First, let me describe a new implementation of contactless heartbeat measurement on the iPad. Jan Plešek has designed an iPad app which works similarly to "Mirror, mirror, tell me who's the most beautiful?" You just watch the iPad screen, keeping your face inside a square, and in a few moments the system tells you a pulse rate estimate. The application uses a unique algorithm for pulse frequency estimation based on the observed changes in the color of the skin under the eyes. The human heart pumps the blood, and the built-in iPad camera can recognize the difference in color between high and low pressure. To focus on the right place the app must first find the face, nose and eyes, and then it focuses on the area under the eyes. The color changes constitute a time series to which we apply a simple algorithm estimating its base frequency. We have achieved accuracy similar to that of standard heart rate equipment.
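The last step, estimating the base frequency of the color time series, can be illustrated with a small sketch. This is not the algorithm used in the app, which is not described here; it is a plain periodogram scan over plausible pulse rates, and the function names, the 40 to 180 bpm search range and the synthetic signal are my assumptions.

```python
import math


def estimate_pulse_bpm(samples, fps, min_bpm=40, max_bpm=180):
    """Return the pulse rate (beats per minute) with the strongest spectral power.

    samples: average skin color values, one per video frame (assumed input).
    fps: camera frame rate in frames per second.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove the constant skin tone
    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        omega = 2 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum(x * math.cos(omega * n) for n, x in enumerate(centered))
        im = sum(x * math.sin(omega * n) for n, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def synthetic_skin_signal(bpm, fps, seconds):
    """Hypothetical test signal: a constant tone plus a small periodic change."""
    return [0.5 + 0.02 * math.sin(2 * math.pi * (bpm / 60.0) * n / fps)
            for n in range(int(fps * seconds))]


signal = synthetic_skin_signal(72, 30.0, 10)
print(estimate_pulse_bpm(signal, 30.0))
```

A real recording is much noisier than this synthetic signal, so a practical version would at least need filtering and a longer or sliding observation window.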

The next project was done in cooperation with AVG. Yes, the famous antivirus company. I bet that on many of your machines their Free Antivirus software is softly humming while you read this. Ondrej Pluskal has specialised in the development of malware detection, particularly in the algorithm estimating anomalies. The antivirus is continuously watching what is going on on your PC: what files are downloaded, what apps are started, what DLLs are loaded, etc. All these and similar actions are signals forming feature vectors. The task is then simple: identify those signalling an anomaly. Designing an algorithm that separates perfectly legal vectors or situations from malicious ones requires learning. Teaching a classifier requires a lot of vectors describing usual situations and also malicious vectors, so that it can discover the difference between them. We have received such data, collected on running PCs, from AVG. Our classifier uses the Support Vector Machine (SVM). After a lot of work on preprocessing, testing and tuning of the operating point, we have delivered a new classification vector improving the performance. We hope our solution will soon get into the product.
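Tuning the operating point deserves a short illustration. A classifier such as an SVM produces a score for each feature vector, and we still have to choose the score threshold above which a vector is reported as malicious. The sketch below picks the lowest threshold that keeps the false alarm rate within a given budget. The scores and the 10% budget are made up, and this is only one simple way of choosing the operating point, not a description of what went into the AVG product.

```python
def pick_threshold(benign_scores, malicious_scores, max_false_positive_rate):
    """Return (threshold, false_positive_rate, detection_rate).

    A vector is flagged when its score is >= threshold. The false positive
    rate never increases as the threshold grows, so the first threshold that
    fits the budget also gives the highest detection rate within it.
    """
    candidates = sorted(set(benign_scores) | set(malicious_scores))
    candidates.append(float("inf"))  # flags nothing, always within budget
    for threshold in candidates:
        false_alarms = sum(1 for s in benign_scores if s >= threshold)
        fpr = false_alarms / len(benign_scores)
        if fpr <= max_false_positive_rate:
            detected = sum(1 for s in malicious_scores if s >= threshold)
            return threshold, fpr, detected / len(malicious_scores)


# Hypothetical classifier scores for known benign and malicious vectors.
BENIGN = [-2.1, -1.5, -1.2, -0.8, -0.3, 0.1, -1.9, -0.6, -1.1, 0.4]
MALICIOUS = [0.9, 1.4, 0.2, -0.4, 2.0]

print(pick_threshold(BENIGN, MALICIOUS, 0.1))
```

For an antivirus the budget for false alarms would be far stricter than in this toy example, because every false alarm blocks a legitimate program.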

Third, Tonda Novák was involved in the Design of Probabilistic Models for Text Input Correction project. It was a great opportunity to explore the capabilities of learning to rank algorithms, which became the core of the solution. What does it do? When entering a query into a search engine, users make mistakes, and the engine needs to correct them before starting the search. Finding out what the user really wants to enter is not a simple task. Users enter many more different words than we can find in a standard dictionary, many words appear or are created on an everyday basis, some of the words appear in multi-word phrases, and some users do not know the spelling but know the phonetic version. We have collected all this information to guess what the user wants to type. Of course, there are multiple choices for each word. The problem is to put all the information together and run an algorithm deciding which word or phrase the user most likely intends to enter. Tonda decided to use a pairwise learning to rank algorithm. It outputs a ranked list of corrected queries. It is a supervised algorithm requiring a training set. We have used a corpus of queries from the seznam.cz search engine. Jointly with seznam.cz we set up a testing server to run tests on an unseen test set to measure the accuracy. Our machine works quite well. The proof: parts of our algorithm are already in the production version of seznam.cz query correction. Hooray!
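For readers who have not met pairwise learning to rank, here is a minimal sketch of the idea. For every training query we know which candidate correction is the right one, and the model learns weights such that the right candidate scores higher than each wrong one. The two features (similarity to the typed text and popularity of the candidate), their values and the logistic update rule are assumptions made for illustration; the model and features in the real project are different and much richer.

```python
import math


def train_pairwise(groups, epochs=100, lr=0.5):
    """Learn linear weights from (candidate_feature_vectors, correct_index) groups.

    For each pair (correct, wrong) the weights are pushed so that the correct
    candidate scores higher, using a logistic loss on the feature difference.
    """
    w = [0.0] * len(groups[0][0][0])
    for _ in range(epochs):
        for candidates, correct in groups:
            good = candidates[correct]
            for i, bad in enumerate(candidates):
                if i == correct:
                    continue
                diff = [g - b for g, b in zip(good, bad)]
                margin = sum(wi * di for wi, di in zip(w, diff))
                grad = 1.0 - 1.0 / (1.0 + math.exp(-margin))
                w = [wi + lr * grad * di for wi, di in zip(w, diff)]
    return w


def rank(w, candidates):
    """Return candidate texts ordered from the most to the least likely."""
    def score(item):
        return sum(wi * fi for wi, fi in zip(w, item[1]))
    return [text for text, _ in sorted(candidates, key=score, reverse=True)]


# Hypothetical training data; features are [similarity to input, popularity].
TRAINING = [
    ([[0.9, 0.8], [0.9, 0.1], [0.4, 0.9]], 0),
    ([[0.8, 0.6], [0.3, 0.7], [0.8, 0.2]], 0),
]
weights = train_pairwise(TRAINING)

# Hypothetical candidates for a mistyped query.
CANDIDATES = [("sesnam", [1.0, 0.0]), ("seznam", [0.85, 0.9]), ("sezam", [0.7, 0.3])]
print(rank(weights, CANDIDATES))
```

The literal input wins on similarity alone, but the learned weights also value popularity, so the common word ends up first.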

I have chosen these three examples of our projects to show how we want to progress in the future. It is simple: we want to focus on machine learning algorithms in practical applications. Most students want to work on real industry problems. They would like to get practical experience before leaving school and to cooperate with a company to find out how it feels to work for it. This is also the best form of cooperation for the faculty, because we can apply the latest research results and confront them with the industry. It is not easy to find the right projects and put all the required ingredients together. It requires people with vision and empathy on both sides, in industry and in academia. Not all companies are ready; not all companies compete by delivering better and technologically more advanced solutions. At the university it is a never-ending process of searching for the best partners, with leaders interested in innovation and in bringing their customers the best solutions. We are looking for more partners with a clear technological vision, to help them solve the most challenging problems in AI, machine learning and computer science. If you know a suitable company, let me know.


Sunday, June 30, 2013

How Googlers defend their MSc theses

Vojta Jina successfully defended his MSc thesis at the Czech Technical University. He designed and wrote Karma, a JavaScript test runner. I was his supervisor and I feel very relieved: I have managed to push him through the process of finishing exams, writing a thesis and defending it.

I met Vojta about two years ago, and at that time he was already a real geek. He knew what he wanted to do. His life was coding and his chosen language was JavaScript. He had just arrived from a stay at Google London, where he had been doing a student internship. I was very happy to meet a student with a clear idea for a diploma thesis. It was great to see him formulating, step by step, what he needed for developing JS code. He was self-motivated enough that I did not need to do anything except keep an eye on him finishing exams on time - geeks have very radical opinions, which are not a great match for very theoretically focused examiners. What is more, the university curriculum was not of top interest to Vojta, but apart from a few flops he got through. Shortly after we agreed on the diploma task he got hired by the AngularJS team in Mountain View. I know some of the AngularJS folks and it is a great team. They live for their projects, they love programming, and they are free to make their own development decisions; this is the spirit most programmers are looking for. Vojta immediately became one of them, dedicated to the project. It became even more difficult to stay in touch. We had regular Skype chats at 8:00 am Europe and 11:00 pm California time. I enjoyed the chats, discovering what was going on in Silicon Valley. Vojta was making big progress in improving the code, giving presentations and other activities. I was very happy that Vojta was moving ahead, but the deadline for submitting the diploma thesis was approaching quickly and the fear of the blank page grew. He already had well-working code on GitHub and the number of downloads was rapidly increasing, but the university needs the diploma thesis. It was a struggle: every week we had to go through the wording, discovering how to write a technical report and what its structure is. It was also a big effort to get rid of a lot of geeky wording.

The submission day approached and Vojta’s sister was chasing me to get the last signatures before officially submitting the work. Finally, a few days before the state exam and diploma defense, Vojta arrived with a Californian smile on his face. We met in my house. He brought the latest Google Chromebook. We had a great time and some glasses of scotch, but I had no idea what was awaiting me.

The state committee is a panel of very serious, very formal professors, a total mismatch with Vojta's attitude. Vojta entered the room very relaxed, no tie, and he immediately started: "hi guys, how are you doing these days? Ok, life is short and time is swift, I’ll skip the details and show you some nifty tricks". He continued: "if you did not know, this is a terminal screen. Just give me a second, I need to launch a bunch of daemons to make things move". Then he started typing the first commands at a speed of about 200 chars/min. Each hit of the enter key emitted a full screen of text. In a sunny room with an obsolete projector nobody had a chance to read a single line. Vojta did not show any sign of nervousness and in a cheerful tone praised his great ideas with comments like: “isn’t it great, this is even better, I have almost forgotten, I need to show you this trick”. He certainly succeeded in waking the professors up; they had never encountered a candidate like him. Vojta finished the presentation very abruptly, saying: “Ok, this is maybe enough, if you have any other questions let me know”. The room became dead silent, no questions. I was scared: how would they perceive it? It was a go/no-go situation: either they would get angry and kick him out, or they would accept the style and Vojta would become a hero. Luckily the supervisor can comment while the committee is deliberating. To cut the story short: yes, he is a hero. He has done it. Great job!

The lesson I have learned: I am looking forward to supervising another batch of new geeks next year. Let me know.

Friday, May 17, 2013

eClub winners, great new projects


Yesterday evening we awarded the eClub summer 2013 winners. The winners are ApeMan boards, BlindShell and MixTuram. eClub and the MediaLab Foundation awarded scholarships to support further work on their ideas.

The ApeMan team introduced plans for manufacturing cool longboards, which have recently become very popular again. They started by creating a community of stylish enthusiasts as well as racers with distinguished taste interested in longboarding. This way they have learned what the dream board looks like, how to design it and how to manufacture it. They bring even a little more: great design, which is a very important part of their image; check their cool site. The team includes designers, longboard racers, PhD students from the Faculty of Mechanical Engineering, and business people. The company manufactures high-tech carbon-based boards with a unique design and great shapes, ranging from the middle to the high end. I wish the team a quick passage through the initial organizational problems, and I hope we will see many more enthusiasts and racers smoothly sliding on their boards. I have no doubt the main award is in the right hands; ApeMan will effectively use the 40k scholarship to invest further in technology.

The second team, called BlindShell, consists of two students from the Faculty of Electrical Engineering. With their tutor they have been researching user interfaces for disabled people for quite some time. BlindShell is a launcher for touch-based Android phones customized for visually impaired users. The latest smartphones bring many new sensors providing a lot of information to enable new and smarter applications, but the whole interface is mostly visually intensive. For people with poor vision it is sometimes very difficult to take effective advantage of all the goodies. The whole idea is based on a gesture- and voice-based interface allowing quick access even for the visually impaired. The team still needs to turn the idea into a product, and I hope we will soon see BlindShell on new Androids.

The third team is MixTuram, offering custom-tailored food supplements. Check the web: you can mix the right set of vitamins and other ingredients to make you stronger. MixTuram will mail you a nice small box with 100 pills. Get one and boost yourself for better performance. They have already received all the legal approvals in the Czech Republic and they work with the FDA to offer the products on the US market. I was lucky and got the first box, with the serial number #00001. I am looking forward to showing it to my friends once MixTuram is a major worldwide company.

It was a great evening and we stayed long, having a great discussion with all the winners after the results were announced. Everybody was energetic and charged up to continue working on their ideas. For the first time, two of the winners already have a running company and one company is already selling. Overall I feel happy: something good is starting, and eClub and MediaLab are part of it. We are all looking forward to the next eClub season. Watch us on the web and FB; we will start in September.

Tuesday, May 14, 2013

Final presentations in eClub


Eleven new teams met on Thursday to present their new ideas in the eClub. If you did not join us, see the video. All presentations were great. The new teams are awesome and the international jury will have a hard time selecting the best ones. We will award the winners this Thursday.

Let me share some of my observations. Karel Obluk opened the whole series of presentations this semester. He talked about the pros and cons of entrepreneurship, the importance of education and the joy of building a startup. This semester we have included new educational presentations. Thanks to Jan Vesely we had a great series of Lean Startup lessons. He explained the steps of today's de facto standard methodology for developing a startup. Lukas Fittl nicely continued with a great presentation about actionable metrics, showing how to use and interpret company data. Lucie Havlickova reviewed very well how to deliver the project pitch, how to get the audience involved and how to communicate the basic ideas. We closed the educational part with an entertaining session about pricing delivered by Jiri Fabian (Vendavo) and Vaclav Lorenc (he calls himself a pricing idiot). They put together an insightful pricing guide for small companies as well as for corporations.

As usual we have also been mixing in motivational presentations. David Vavra, an independent developer, has shown how to develop a “lonely ranger” application. Working as a micropreneur is a great example. Following his example, students can get in touch with technology, make some money and understand how the business works. Jarda Gergic presented some techniques for project management, using his work at GoodData as an example. Finally, the business star Honza Rezab from Socialbakers shared with us some of his thoughts about entrepreneurship and his life.

Thanks to ERA Svet we have enjoyed a great space right in the middle of Prague. They have great people who helped us a lot with organization, streaming and recording all the presentations. This year we streamed not only to our friendly universities; we also made the streams freely available to everybody. Google Hangout technology proved to be excellent for streaming and sharing the video on YouTube. I am also very grateful to our other sponsor, ICT Alliance, for helping us with catering.

Overall I have the feeling we did very well and offered great speakers and valuable presentations with a lot of know-how. We certainly want to continue, starting in September. Let me know if you have any ideas for improvement or any comments. I also hope I will be able to help students with starting their projects during the summer vacation.

This Thursday, the jury will announce the winners and the CTU Media Lab will give away the scholarships. Join us on Thursday or watch the web for the winners.



Tuesday, April 16, 2013

Demo day in eClub


New teams will meet in eClub this Thursday for dry run presentations. The purpose is to test the ideas, get feedback and prepare for the final presentation in front of a panel of judges on May 9th.

This time eClub tried to put together a curriculum to help you develop and communicate your ideas, so that you can successfully compete for seed money in our competition. Two weeks ago Lucie Havlickova taught eClubbers the art of delivering a stunning presentation. She listed and explained all the basic rules. It is up to you now to practice in front of an audience. The presentation is the first test of how well you can formulate your ideas. It is also the best first test of the quality of the idea. Jan Vesely delivered a series of Lean Startup presentations formulating the first steps on the road to testing your hypotheses. One of the most important messages is: get out of the building. This is Steve Blank’s famous advice. Finding customers and making sure they need your solution, your product, is the first essential step in building a successful startup. A presentation in front of your peers will be the first real step in this direction.

We will organize the dry run presentations in a similar way as in previous years. We will meet in the great Era Svet rooms, having coffee in a relaxed atmosphere. Every team will have about 10 minutes for a short presentation, and then we will open the discussion. Ask for explanations and clarifications, find out what is essential, try to learn how you could better formulate the final idea, how to communicate briefly, etc. As usual we will record all presentations. This is a great chance to watch yourself and fix all the flaws.

eClub focuses on student teams. There needs to be at least one student in a competing team. We do not expect the teams to describe finished products; we are looking rather for newly created teams planning to start an interesting fresh business. How to prepare for the presentation? Come with a maximum of four slides to support your ideas. Try to present to some of your friends beforehand to firm up the ideas and see how long it will take. I will appreciate it if you register before Thursday; this will allow us to better plan for the evening. If you don't, that is OK too, just show up; registration will be open until the final demo day in May.

I hope that for the final presentation all teams will already have a web page with a short product or service description. You do not need it for the dry run. The final competition will be run in front of a panel of international judges. It is therefore important to deliver your presentation in English. In exceptional cases we will allow the dry run in Czech, but it is highly recommended to test your English skills.

Join us and present your new ideas. Show up even if you are not presenting; you can give valuable feedback to the new teams. I am sure it will be a great experience for all.

Sunday, March 31, 2013

Huge crowd in eClub

More than 80 people arrived for Lukas Fittl's presentation this Thursday. The folks were excited by his “Actionable metrics” presentation.

It might be because of the great rooms in the middle of Prague, which EraSvet kindly provides to us, or it might be because of the snacks we serve in the networking part of the eClub meetings. Who knows, but I believe it is because we have attracted great speakers.

First we welcomed David Vavra, a well-known Android application developer. His presentation was a continuation of Tonda Hildebrand's presentation. David explained the advantages of being an independent developer, a micropreneur. He reviewed step by step how he has been developing the “Dluznicek” alias “Settle Up” (English version) mobile application. David showed chronologically how he improved and enhanced his application and how he attracted users. All the development steps and actions were accompanied by references to many interesting sites which can help you advertise, analyze and sell your application. Watch the presentation.

What matters most to me is that I first met David as my student, and then he joined eClub and entered one of the first competitions. It is great to see that he has not stopped working on his ideas. Today, tens of thousands of people are using his application. Great job! What is more, his application is earning money. See the presentation and get inspired, do something similar; this is the way to make dreams happen.

Lukas Fittl attracted the largest crowd this year. We did not have enough chairs and the food was gone before we started, but it was a great success this Thursday.

Lukas is well known on the European startup scene. He mostly promotes the Lean Startup. This time he gave a very interesting presentation about actionable metrics. The presentation analyzed step by step how to answer quantitatively questions like: Am I making progress in the right direction? Does my product fit the market requirements? How do I increase revenue? He also touched on cohort analysis. In his presentation a cohort is a group of people sharing a common characteristic over a period of time. Following the Lean Startup theory, he showed how to deliver and measure value. The final part focused on how to take action, how to plan, run and evaluate experiments, and how to quantify the goals. Look at the video recording and the presentation; I am sure you will find it very inspiring.
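Cohort analysis is easy to try on your own data. The sketch below groups users into cohorts by the week they signed up and computes, for each cohort, the share of its users still active in each following week. The activity log, its format and the choice of signup week as the shared characteristic are my assumptions for illustration, not material from the presentation.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, signup_week, active_week).
ACTIVITY = [
    ("u1", 1, 1), ("u1", 1, 2), ("u1", 1, 3),
    ("u2", 1, 1), ("u2", 1, 2),
    ("u3", 2, 2), ("u3", 2, 3),
    ("u4", 2, 2),
]


def cohort_retention(activity):
    """Return {signup_week: [share of the cohort active 0, 1, 2, ... weeks later]}."""
    cohort_users = defaultdict(set)  # signup week -> all users of that cohort
    active = defaultdict(set)        # (signup week, weeks since signup) -> active users
    for user, signup, week in activity:
        cohort_users[signup].add(user)
        active[(signup, week - signup)].add(user)
    table = {}
    for cohort, users in sorted(cohort_users.items()):
        max_age = max(age for (c, age) in active if c == cohort)
        table[cohort] = [len(active.get((cohort, age), set())) / len(users)
                         for age in range(max_age + 1)]
    return table


for cohort, shares in cohort_retention(ACTIVITY).items():
    print(cohort, shares)
```

Comparing the rows shows whether newer cohorts keep their users better than older ones, which is the kind of question an actionable metric should answer.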

There are more presentations ahead of us: Lucie Havlickova will teach how to give a catchy presentation and Jarda Gergic will talk about organizing a group of developers. He runs the GoodData group of developers in Prague. But the most important event comes just after that: the first presentation day of this year's eClub competition. Join us too and compete for interesting prizes. Looking forward to meeting you at the next eClub meetings.