Saturday, August 7, 2010

iPhone app for voice-controlled TV

I was looking to do something useful during my extended sabbatical "between jobs" last year, so I contacted ability.org.au to see if I could assist with any projects. Their director, Graeme, suggested an iPhone app to allow voice control of a TV. He noted the relatively low cost of iPhones (compared to typical assistive devices), their ubiquity, the availability of IR dongles for iPhones (ironically, nearly all mobile phones prior to the iPhone included IR capability), and the availability of voice-recognition software for the iPhone (specifically Dragon Dictation, Google's voice search and the Voice Control utility in iOS). So now seemed like a good time to try this. It might even be possible to get some research funding to at least cover costs. (And to be honest, if it works it might be saleable on the App Store to general users, not just those who can't accurately hit the buttons on a typical TV remote.)

Graeme suggested I look at the L5 Remote dongle for iPhone because its makers have recently open-sourced the API. I downloaded the package and it does indeed seem usable. The L5 costs US$49.95 (+S&H), about AU$70. Another dongle I looked at is the My TV Remote (originality in naming doesn't seem to be part of the plan). It sells for US$9.99 and plugs into the iPhone's audio socket (the L5 plugs into the dock connector). I emailed the company and although they don't sell to Australia yet, they are considering it, and they are willing to send me their API if I want it. The My TV Remote (MTR) uses the audio socket, which might make using a headset mic difficult (I'd probably use Bluetooth to get around that). OTOH the L5 uses the dock connector, which might make charging and/or long-term use difficult. (I couldn't find a charging double adapter for the dock.) I purchased an L5 and it arrived within a week, but it's just sat on my desk while work and life have intervened :-)

My initial hope was to use the recogniser in Google's voice search, but so far I haven't been able to find an open-source API for it. Dragon Dictation is proprietary and requires licensing, which I'd rather avoid if possible.

Last week I discovered the CMU Sphinx project (http://cmusphinx.sourceforge.net/), an open-source voice recognition project. Brian King has made an iPhone Objective-C wrapper available for the pocketsphinx library (http://github.com/KingOfBrian/VocalKit) so I'm currently trying to learn how to use Sphinx.

The project, as I see it, requires the following:
1) A recent Xcode and iOS 4 SDK installed on my MacBook
2) pocketsphinx library added to Xcode's static lib list
3) L5's lib added to Xcode's static lib list
4) Some glue code to output IR codes when one of a small list of command words is recognised.
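
As a rough sketch of what the glue code in step 4 might look like, here is the basic logic in Python (handy for prototyping on the MacBook before porting to Objective-C). It takes the text the recogniser produces, looks for one of a small set of command phrases, and hands the matching IR code to a sender function. The command phrases, the IR code values and the send_ir callback are all placeholders I made up for illustration; none of them come from the L5 API or from pocketsphinx.

```python
# Hypothetical command table: the phrases and IR code values are placeholders,
# not codes captured from a real remote or defined by the L5 API.
COMMANDS = {
    "volume up": 0x10,
    "volume down": 0x11,
    "channel up": 0x20,
    "channel down": 0x21,
    "mute": 0x30,
    "power": 0x40,
}


def match_command(hypothesis):
    """Return the command phrase found in the recogniser's text, or None."""
    # Normalise case and whitespace so "Volume  UP" matches "volume up".
    text = " ".join(hypothesis.lower().split())
    padded = " " + text + " "
    # Try longer phrases first so the most specific phrase wins.
    for phrase in sorted(COMMANDS, key=len, reverse=True):
        # Padding with spaces means we only match whole words.
        if " " + phrase + " " in padded:
            return phrase
    return None


def handle_hypothesis(hypothesis, send_ir):
    """Send the IR code for a recognised command; ignore anything else."""
    phrase = match_command(hypothesis)
    if phrase is None:
        return None
    # send_ir stands in for whatever call the dongle library ends up needing.
    send_ir(COMMANDS[phrase])
    return phrase


if __name__ == "__main__":
    sent = []  # collect codes in a list instead of driving real hardware
    handle_hypothesis("please VOLUME up", sent.append)
    handle_hypothesis("hello world", sent.append)
    print(sent)
```

Passing the sender in as a function keeps the matching logic testable without the dongle attached, which matters because the extra hardware can't be exercised off the phone.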

Each of the above steps is a project in itself:
1a) Renew developer subscription with Apple
1b) Download latest Xcode and iOS 4 SDK
1c) Install
1d) Install/update app signing certificate
1e) Write a test app, compile and test on the iPhone Simulator
1f) Install and run on iPhone

2a) Download Brian King's iPhone wrapper
2b) Install in Xcode as per README
2c) Write and test a "hello world" app
2ca) Do I need a special dictionary or is default dictionary adequate?
2cb) Does Sphinx need training for Australian accent?
2cc) Should I test Sphinx on MacBook first to answer these questions? (Probably yes.)
2d) Modify pocketsphinx output if necessary to make connecting to the L5 easier
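
One quick way to start on question 2ca) is to check whether every command word already appears in the pronunciation dictionary. Sphinx dictionaries are plain text, one entry per line: the word, then its phones, with alternate pronunciations marked like WORD(2). The sketch below assumes that format; the sample entries and the command list are illustrative only and are not copied from the dictionary that ships with pocketsphinx.

```python
# Illustrative sample in the CMU-style dictionary layout (word, then phones).
# These entries are assumptions for the example, not the shipped dictionary.
SAMPLE_DICT = """\
CHANNEL  CH AE N AH L
DOWN  D AW N
MUTE  M Y UW T
UP  AH P
VOLUME  V AA L Y UW M
VOLUME(2)  V AA L Y AH M
"""


def dictionary_words(dict_text):
    """Collect the distinct words in a CMU-style dictionary text."""
    words = set()
    for line in dict_text.splitlines():
        parts = line.split()
        # Skip blank lines and ';;;' comment lines.
        if not parts or line.startswith(";;;"):
            continue
        # Strip the "(2)" suffix used for alternate pronunciations.
        words.add(parts[0].split("(")[0].lower())
    return words


def missing_words(command_phrases, dict_text):
    """Return command words that have no entry in the dictionary."""
    known = dictionary_words(dict_text)
    needed = {w for phrase in command_phrases for w in phrase.lower().split()}
    return sorted(needed - known)


if __name__ == "__main__":
    phrases = ["volume up", "channel down", "mute", "power"]
    print(missing_words(phrases, SAMPLE_DICT))
```

If the list comes back empty for the real dictionary, the default is probably adequate for a first test; any missing words would need entries added by hand.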

3a) Download and install L5 Remote app on iPhone
3b) Upload app with test controller's IR codes. (Can use digital tuner remote controller for sampling and testing.)
3c) Verify app works.
3d) Download and install L5 API in Xcode
3e) Write and test "Hello world" app
3ea) Specify what a "Hello world" app should do.
3eb) Write, test, debug on iPhone (Can't use the Simulator for extra hardware like the L5)

4a) Design control program
4aa) GUI is almost non-existent.
4ab) Functionality to copy existing L5 app controls
4b) Code and test on iPhone
4c) Repeat 4a) and 4b) until working :-)

OK, so today we are up to 1e): write and install a test app on my 3GS with iOS 4.0.1.

More later.
