Automatic speech/voice recognition software for the Oracle
Oracle Database Tips by Donald Burleson, January 19, 2015
The 21st Century steno pool - Jott
There are few tools that are not a re-hash of older
technology, but one tool that shows great promise is the Internet-based "Jott"
tool. (Thanks to Steve
Karam for bringing this useful dictation software to my attention!)
Jott is a great way to do direct cell phone to e-mail
transcription, and after a short set-up you can effortlessly dictate for up to
30 seconds and have the transcribed text sent via e-mail to anyone you wish.
Few executives today remember the "steno pool", where
roomfuls of young women honed their skills with Gregg Shorthand. The old phrase
"take a letter" is long-gone, and the art of stenography has gone the way of the
"Ms. Jones, please come have a
seat on my lap and take a letter"
The original voice recognition tools were plagued with problems:
they were unable to differentiate between speech and background noise,
leading to botched transcriptions like this:
Background noise plagued early speech
Let's take a look at how far we have come since the nascent
beginnings of ASR (VR), more than 15 years ago.
A brief history of voice recognition
I remember talking with the early developers of speech
recognition, including an interview with a brilliant quadriplegic programmer in
Toronto who invented the first speech recognition software out of necessity.
Back then, the problem was always RAM shortages, as a good voice recognition
(VR) tool must be able to cache huge amounts of rules in order to be performant.
Back in the early 1990s I was asked to write some articles
for Computerworld magazine, and I vividly remember my research for a 1994
Computerworld article on
automatic speech recognition (ASR) technology.
"ASR provides a task-driven interface to the computer,
but programmers need an entirely new skill set to create user-friendly
systems," says Dan Thompson, co-founder of KolVox.
Dan was an inspiration to me, and I was happy to know that
Dan Thompson worked as an
inspirational speaker, telling people how he overcame adversity with sheer
will and programming skill.
But voice recognition has come a long way in the past
Jott promised to be a very functional tool for the busy
executive because it allows you to use your cell phone to dictate messages (but
only up to 30 seconds) and automatically transcribe your verbal message and send
it via text-messaging or e-mail.
Testing Jott Voice recognition
I've been waiting for 30 years for voice recognition
technology to become a reality, and there has yet to be a tool that will fully
conquer context-sensitive grammar.
For more details, see my notes on
context-sensitive grammar and my 1994 Computerworld article on
automatic speech recognition (ASR) technology.
But Jott comes close to being quite useful. On my cell
phone, I had this interaction to send myself an e-mail with a note.
Jott is great for
transcribing cell phone dictation
While Jott is great for ordinary transcription, how does it
work for complex grammar? For a more advanced test, I interjected three
phrases that can be confusing to voice recognition software:
- Complex terminology - I asked for a
"tachistoscopic episcatistor", a super-fast camera and projector shutter
that is used in psychology perception experiments.
- Sound-alike phrase - I used the term "Currier
and Ives", which some VR tools will transcribe as "Career knives".
- Context-sensitive sounds - The age-old test has
always been "Please write to Mr. Wright right now."
Here is a sample session with Jott. I used a very
clear enunciation (I'm a licensed auctioneer) and my dictation was clear, crisp
and free of background noise:
- Me - Press voice-dial button on cell phone
- Phone - "Who do you wish to call?"
- Me - "call Jott"
- Phone - "Calling
- Jott - "Who do you wish to Jott?"
- Me - "Jott self"
- Jott - "Jotting self"
- Me - "I need a tachistoscopic episcatistor and a Currier and Ives print for the meeting. And please write to Mr. Wright, right now."
- Me - Hang up
Here is what I received in my e-mail inbox:
I need [Unclear speech] tester(?) and the courier in
i-Sprint(?) for the meeting. And please write to Mr. Wright, right now.
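A common way to put a number on a comparison like this is word error rate (WER): the word-level edit distance between what was dictated and what was transcribed, divided by the number of words dictated. The following is a minimal sketch, not anything Jott provides; the function names and the tokenization rule (lowercase, punctuation dropped, so "[Unclear speech]" counts as two wrong words) are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9'-]+", text.lower())

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# The dictated sentence and the e-mail text from the session above.
dictated = ("I need a tachistoscopic episcatistor and a Currier and Ives print "
            "for the meeting. And please write to Mr. Wright, right now.")
received = ("I need [Unclear speech] tester(?) and the courier in i-Sprint(?) "
            "for the meeting. And please write to Mr. Wright, right now.")

print(f"WER: {word_error_rate(dictated, received):.2f}")
```

A WER of 0 means a perfect transcript. Because the metric treats every word equally, it does not distinguish a botched technical term from a botched "the", so it complements rather than replaces a read-through of the result.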
Conclusions on Jott
Jott is great for most common communications, but there are
some areas for improvement of the transcription software:
- Jott did great with the context-sensitive
grammar ("Please write to Mr. Wright right now").
- Jott did not know the programmable "cue" that
Currier and Ives go together, transcribing "courier in i-Sprint" instead of
"Currier and Ives print".
- Jott failed my test on complex terminology. This
should be an easy software fix: add scientific words to the dictionary.
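To illustrate what a dictionary-based fix could look like, here is a minimal post-processing sketch that snaps near-miss words in a transcript onto a user-supplied vocabulary. It is not how Jott works internally; the `CUSTOM_TERMS` table, the `correct_with_vocabulary` function, and the 0.8 similarity cutoff are all hypothetical choices. Only `difflib.get_close_matches` and `re.sub` are real standard-library calls.

```python
import difflib
import re

# Hypothetical user-supplied vocabulary: proper nouns and technical terms
# the recognizer is unlikely to know. Keys are lowercase for matching;
# values carry the preferred spelling and capitalization.
CUSTOM_TERMS = {
    "currier": "Currier",
    "ives": "Ives",
    "tachistoscopic": "tachistoscopic",
}

def correct_with_vocabulary(transcript, terms=CUSTOM_TERMS, cutoff=0.8):
    """Replace each word that closely resembles a custom term with that term."""
    def fix(match):
        word = match.group(0)
        close = difflib.get_close_matches(
            word.lower(), list(terms), n=1, cutoff=cutoff
        )
        # Leave the word untouched when nothing in the vocabulary is close.
        return terms[close[0]] if close else word
    return re.sub(r"[A-Za-z]+", fix, transcript)

print(correct_with_vocabulary("the courier and Ives print"))
```

A spelling-level pass like this can only repair words that are already nearly right, such as "courier" for "Currier". It would not recover "courier in i-Sprint" or an "[Unclear speech]" marker, which need the vocabulary to be available to the recognizer itself.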
As of January 2015, the Jott software will not personalize
and accept feedback on mistakes or learn your voice cadence, but that may be
Overall, I'm finally ready to concede that voice
recognition has matured to the point where it is usable.