A few days ago I posted a video demonstration of a deceptively simple application that combines two powerful forms of machine learning: computer vision and speech synthesis.
The app is more of a showcase of what you can do by stacking existing services on top of one another. Specifically, I'm using IBM Watson for visual recognition and Ivona for speech synthesis with a life-like voice.
The app takes an image URL as input; IBM Watson performs visual recognition on the image, and Ivona then speaks aloud what Watson has identified.
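The flow can be sketched roughly as below. Note that `recognize_image` and `speak` are hypothetical placeholders I'm using for illustration, not the actual Watson or Ivona API calls; the real code in the repo wires up those services properly.

```python
# Sketch of the pipeline: image URL -> visual recognition -> spoken description.
# recognize_image and speak are hypothetical stand-ins for the real IBM Watson
# Visual Recognition and Ivona API calls.

def recognize_image(image_url):
    # Placeholder: a real implementation would send the image URL to Watson's
    # visual recognition service and parse the class labels it returns.
    return ["dog", "grass"]

def speak(text):
    # Placeholder: a real implementation would request synthesized audio for
    # the text from Ivona and play it back.
    print(text)

def describe(image_url):
    # Glue the two services together: recognize, build a sentence, speak it.
    labels = recognize_image(image_url)
    sentence = "I can see " + " and ".join(labels) + "."
    speak(sentence)
    return sentence

describe("http://example.com/photo.jpg")
```

The point is just how little glue code is needed once the two services are in place.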
I've posted a detailed blog about it, and the code is on my GitHub. Feel free to grab it and modify it to your liking. See the video demo below.
I recently learned about something much more powerful: a neural network implementation in TensorFlow that generates descriptions of images. It's called Show and Tell, and it would be interesting to explore, should I find the (freaking) time...

To stay in touch with me, follow @cristi
Cristi Vlad, Self-Experimenter and Author