Automatic Speech Recognition (ASR)
Oracle Database Tips by Donald Burleson
Often a new technology is driven by
necessity. For example, a quadriplegic programmer developed an
automatic speech recognition system because he does not have the use of
his hands. The only thing holding him back from making commercially
available products was the high cost of RAM. At the time the system
required about a hundred megabytes of memory. He now has what is called
continuous speech recognition running in 200 megabytes of RAM, which is
still expensive for today's PCs but is certainly not out of reach for
many kinds of applications.
This system has become a commercial product for
use by physicians and attorneys, who can dictate their information directly
into the computer without having people transcribe it. So here again, within
the marketplace, clerical skills as we know them are going away. Just as nobody
has learned Gregg shorthand in secretarial school in the last twenty years
because it was displaced by tape recorders, so too will keyboarding become
obsolete.
A lot of the new technology has been driven by
the U.S. Federal Government. If we examine automatic speech recognition (ASR),
it started with the CIA doing research for the Northwest Parallel Computing
Resource Center in Syracuse, New York. The Government has been working on
databases where a user can make an English query against a Russian database and
have the results returned in French. These kinds of dynamic language
translations are of immense value, but they didn't come without a cost. In the
1980s IBM spent years and millions of dollars learning how to parse spoken
English with a computer. For example, it is very difficult for a computer to
understand that the phrase "bus station" is two words and not one, since they
are pronounced as a single sound.
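The word-boundary problem can be illustrated with a minimal sketch. It assumes
a hypothetical toy vocabulary and treats the utterance as an unbroken string of
letters rather than a real acoustic signal, so it only shows why one stream can
have more than one valid split, not how any actual ASR product works.

```python
def segmentations(stream, vocabulary):
    """Return every way to split `stream` into words from `vocabulary`."""
    if not stream:
        return [[]]  # an empty stream has exactly one split: no words
    results = []
    for end in range(1, len(stream) + 1):
        word = stream[:end]
        if word in vocabulary:
            # Keep this word and segment whatever is left over.
            for rest in segmentations(stream[end:], vocabulary):
                results.append([word] + rest)
    return results


# Hypothetical toy vocabulary, chosen only for illustration.
VOCABULARY = {"bus", "station", "stat", "ion"}

for split in segmentations("busstation", VOCABULARY):
    print(" ".join(split))
```

Even with four words in the vocabulary, the stream yields two candidate
splits, and the program has no way to know which one the speaker meant.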
But parsing out the spoken words is just the
beginning. Once you have parsed the words, you still have to derive meaning
from them, and that is a very difficult challenge in and of itself, especially
because of the natural ambiguities of human language. Given even a simple
phrase such as "Mary had a little lamb", a computer will get confused because
it cannot resolve the verb "had" in the sentence. The computer would ask, "What
do you mean, Mary had a little lamb? Did Mary buy a little lamb? Did Mary eat a
little lamb? Do we take a biblical interpretation of 'had'? What do you mean,
Mary 'had' a little lamb?"
These kinds of natural ambiguities occur in
human English all the time, and it's very difficult to get computers to resolve
the nuances of the English language. For example, there was a product in the
1970s marketed by Excalibur Corporation called Savvy, which was a natural
language interface to databases. It worked for very simple queries, but when
you gave it something sophisticated or ambiguous, like "How long has Joe been
with us?", Savvy would come back and say, "What are you asking for, Joe's date
of birth or Joe's date of hire?"
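The clarification behavior described above can be sketched as follows. This is
not how Savvy was implemented. The phrase table, the column names
(date_of_birth, date_of_hire) and the interpret function are all hypothetical,
made up only to show a query interface asking for clarification when a phrase
could refer to more than one column.

```python
# Hypothetical mapping from a question phrase to the columns it might mean.
PHRASE_TO_COLUMNS = {
    "how long": ["date_of_birth", "date_of_hire"],
    "hired": ["date_of_hire"],
}


def interpret(question):
    """Return ("answer", column), ("clarify", candidates) or ("unknown", [])."""
    text = question.lower()
    for phrase, columns in PHRASE_TO_COLUMNS.items():
        if phrase in text:
            if len(columns) == 1:
                return ("answer", columns[0])  # only one possible meaning
            return ("clarify", columns)  # ambiguous: ask the user to choose
    return ("unknown", [])


print(interpret("How long has Joe been with us?"))
print(interpret("When was Joe hired?"))
```

A real natural language interface would need far more than substring matching,
but the sketch shows the design choice: when more than one reading survives,
hand the question back to the user instead of guessing.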
So these kinds of natural ambiguities have to be
addressed, and language translation routines often produce hilarious results.
Consider the phrase "the spirit is willing, but the flesh is weak". It was
translated into Russian, and when the translation was translated back into
English it read, "the vodka is good, but the meat is rotten". Hardly a literal
translation of that phrase.
But this type of technology is getting much more
sophisticated, and we are going to start seeing ASR, the whole area of
Automatic Speech Recognition, taking over the desktop. We will be keyboard and
mouse free, and people will interact with their computers much as they would
with other individuals today. Within the more ho-hum, mundane realm of
information systems, we're seeing another nascent technology finally taking
off. And here, we're talking about object technology.