The end of April is the deadline for applying for PhD study at our faculty. I am putting together a new research team. The target is gesture recognition. I am looking for PhD and MSc candidates with a strong statistical and programming background.
What is the story? Students have been approaching me with suggestions for diploma theses or doctoral research. Tomas Tunys recently came with an idea of working on phone gestures; he is interested in recognizing gestures made with the phone in hand. Tomas Gogar, on the other hand, is interested in dead reckoning for his startup application; his goal is to improve positioning even in locations without GPS. Working on these problems we identified a lot of similarities, at the very least they both use the phone's sensors. Reading the literature, I discovered that gesture recognition is based on the same mathematical framework as speech recognition. What a surprise.
All smartphones are equipped with many different sensors: they help the OS automatically switch the screen between landscape and portrait, they turn the phone into a pedometer, they track our bike trips, they record gestures, and so on. Drawing a gesture on the touch screen, gesturing with the phone, or walking with the phone in a pocket are movements that generate a time series from the motion sensors. Time series analysis is a well established mathematical discipline. I was very lucky to work for twenty years in speech recognition, and suddenly the smartphone sensors gave me a great chance to refresh my math. Fantastic, I can again use vector quantization and the Linde-Buzo-Gray algorithm for generating a k-means vector quantizer. Check it out on Wikipedia; they published the algorithm in 1980. My first US patent partly uses this algorithm. Gesture modeling relies on Hidden Markov Models. HMMs are the core of all current speech recognition systems, using the old, well known Viterbi algorithm. I remember all this from the time I had the privilege to work with a fantastic team at IBM led by Fred Jelinek. I worked with Raimo Bakis, David Nahamoo, and other researchers who pioneered this mathematical background and applied it to speech recognition.
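To make this concrete, here is a minimal Python sketch of the kind of pipeline I have in mind: accelerometer frames are quantized with a k-means codebook (the full LBG algorithm adds codeword splitting on top of plain k-means), and the resulting symbol sequence is scored against a discrete HMM with the Viterbi algorithm. The function names and parameters below are only illustrative assumptions for this post, not code from any existing system.

```python
# Illustrative sketch only: vector quantization of accelerometer frames
# followed by Viterbi scoring of the codeword sequence with a discrete HMM.
import numpy as np

def train_codebook(frames, k=16, iters=20, seed=0):
    """Plain k-means on (N, 3) accelerometer frames; returns a (k, 3) codebook."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each frame to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each codeword as the mean of its assigned frames.
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = frames[labels == j].mean(axis=0)
    return codebook

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def viterbi_log_prob(obs, log_pi, log_A, log_B):
    """Log-probability of the best state path for a discrete-observation HMM.

    obs    : (T,) codeword indices
    log_pi : (S,) initial state log-probabilities
    log_A  : (S, S) transition log-probabilities
    log_B  : (S, K) emission log-probabilities over K codewords
    """
    delta = log_pi + log_B[:, obs[0]]
    for t in range(1, len(obs)):
        # Best predecessor for each state, then add the emission score.
        delta = (delta[:, None] + log_A).max(axis=0) + log_B[:, obs[t]]
    return delta.max()

# Usage idea: train one HMM per gesture, quantize a new recording with the
# shared codebook, and pick the gesture whose model gives the highest score.
```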
After a little state-of-the-art analysis, we found a lot of articles focusing on gesture recognition, and the similarity with speech recognition was immediately obvious: they use the same algorithms. The other surprise is that many gesture recognition algorithms still utilize only a fraction of the sophistication developed for speech. How much more can we improve gesture recognition using better algorithms? Can we build better sensing software that recognizes your gestures or your movements? Can we build a system that recognizes well after learning from a single instance? Can we build software recognizing complicated movements of your hands, the whole body, or other even more complex movements? Can we build fusion systems combining many different inputs from GPS, compass, WiFi access points, cameras, speech, and so on?
What is the motivation? There are many applications which would benefit from sensors. With the introduction of the iPad, the whole UI is being redefined today. I can imagine an iPad the size of an office desk serving for everyday office tasks, helping graphical designers, helping CAD designers, and so on. A wristwatch may monitor your daily movements or predict what you want to do. You can gesture-control your TV, HiFi, or other home devices. Dead reckoning for robots or people is also a very challenging problem. By combining gestures with image or speech recognition, we can create smarter devices with simpler, more intuitive UIs. Gesture UIs for medicine have been envisioned and tested for several years already. Rehabilitation and aids for the handicapped are another segment where smarter sensors will make a change. The list of opportunities is very long; I have mentioned only a fraction. Just look around and you will find many.
I am looking for students at the MSc and PhD level who are interested in joining the new team focused on enabling all these capabilities. I would like to apply old algorithms and study new ones, improving and uncovering the possibilities of the sensors. I am interested in deep research with a clear vision for practical deployment. I am looking for people who are committed to showing the power of research by building pilots. I want to build practical applications that change the way we live.
If you are interested, let me know.