Automatic speech recognition (ASR) is intended to allow a computer system to understand anyone who speaks to it, regardless of what they say. Although it has not yet fulfilled expectations, it is generating a lot of interest, and programmers should prepare to meet its challenges.
New ASR products have built-in vocabularies, but the association of words with events, such as ``display the order for the customer,'' is done by programmers.
From the user's perspective, voice programs appear easy to use. For instance, MedTalk, an ASR product from Kol Vox, Inc. in Toronto, allows complex patient scheduling and records management to be performed by voice. A physician can simply state a patient's name, which invokes a procedure that extracts the patient's records from a database and lets the physician append information verbally.
Despite this apparent simplicity, a great deal of sophisticated programming lies behind the interface. ``ASR provides a task-driven interface to the computer, but programmers need an entirely new skill set to create user-friendly systems,'' says Dan Thompson, co-founder of Kol Vox.
A voice interface is similar to the macro definition facility within Windows. When the macro recorder is turned on, all subsequent keystrokes are recorded. The programmer ``names'' the keystroke sequence, usually by assigning it to a function key. When the function key is pressed, the keystrokes are executed.
Voice interfaces work in much the same way. A programmer determines the keystroke sequence and defines it to the voice interface by assigning the sequence to an utterance (a spoken word or phrase). When that word or phrase is spoken, the keystrokes are executed.
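The mapping from utterance to keystroke sequence can be sketched in a few lines of Python. This is only an illustration of the idea, not the interface of any product named here: the VoiceMacros class, the key names and the send_keys callback are all hypothetical, and the recognizer is assumed to deliver each utterance as a text string.

```python
from typing import Callable, Dict, List


class VoiceMacros:
    """Hypothetical table that maps an utterance to a keystroke sequence."""

    def __init__(self, send_keys: Callable[[List[str]], None]) -> None:
        # send_keys stands in for whatever replays keystrokes to the application.
        self._macros: Dict[str, List[str]] = {}
        self._send_keys = send_keys

    def define(self, utterance: str, keystrokes: List[str]) -> None:
        # Normalize case and spacing so "Open Window" and "open window" match.
        self._macros[utterance.strip().lower()] = list(keystrokes)

    def hear(self, utterance: str) -> bool:
        # Replay the keystrokes for a known utterance; report unknown ones.
        keys = self._macros.get(utterance.strip().lower())
        if keys is None:
            return False
        self._send_keys(keys)
        return True


# Usage: collect the "sent" keystrokes in a list instead of driving a real GUI.
sent: List[List[str]] = []
macros = VoiceMacros(send_keys=sent.append)
macros.define("open window", ["Alt+F", "O"])  # illustrative key names
macros.hear("Open Window")
```

Keeping the table separate from the keystroke sender means the same definitions can be tested without a live application.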
Newer ASR systems are being developed for industry-specific applications, but many do not require programming. They simply plug and play. More general products, such as Dragon Systems, Inc.'s DragonDictate and IBM's VoiceType 2, however, can be customized, allowing programmers to create custom voice interfaces for different applications.
Other products such as voice macro tools provide an interface to voice systems and are used to associate phrases with events such as ``open window'' and ``close document.''
The biggest challenge for programmers arises when voice macros are parameterized. A parameter is a keyword that is expected to influence the sequence of events. For example, the Print utterance can have a file name parameter, so when Print Mydoc is uttered, the system takes the Mydoc file and directs it to an output device.
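A parameterized macro can be sketched as a command word followed by an argument. The code below is a minimal illustration under stated assumptions: the recognizer is assumed to return the utterance as text, and the print_file handler, the COMMANDS table and the message it returns are hypothetical stand-ins for real printing logic.

```python
from typing import Callable, Dict, Optional, Tuple


def print_file(file_name: str) -> str:
    # Hypothetical handler: a real one would hand the file to an output device.
    return f"sending {file_name} to the default output device"


# Command word -> handler that takes the spoken parameter.
COMMANDS: Dict[str, Callable[[str], str]] = {"print": print_file}


def parse(utterance: str) -> Tuple[str, Optional[str]]:
    # The first word is the command; whatever follows is the parameter.
    words = utterance.split()
    if not words:
        raise ValueError("empty utterance")
    command = words[0].lower()
    parameter = " ".join(words[1:]) or None
    return command, parameter


def dispatch(utterance: str) -> str:
    command, parameter = parse(utterance)
    handler = COMMANDS.get(command)
    if handler is None:
        raise KeyError(f"unknown command: {command}")
    if parameter is None:
        raise ValueError(f"'{command}' needs a parameter")
    return handler(parameter)


result = dispatch("Print Mydoc")
```

The programmer has to decide what happens when the parameter is missing or unrecognized; here the sketch simply raises an error.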
Polymorphic parameters are especially intricate. Polymorphism occurs when one word modifies another and changes the function that is invoked. For example, a Print utterance will invoke different procedures depending upon the destination of the print job. Print to Fax will behave very differently from Print to Printer.
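One way to picture this is a dispatch table keyed on both the command and its modifier. The sketch below assumes utterances of the form ``command to destination''; the handler names and the strings they return are hypothetical placeholders for the real procedures.

```python
from typing import Callable, Dict, Tuple


def print_to_fax() -> str:
    # Placeholder for a procedure that dials out and transmits pages.
    return "dial fax number and transmit pages"


def print_to_printer() -> str:
    # Placeholder for a procedure that spools the job to a printer.
    return "spool job to printer"


# The same command word selects a different procedure per destination.
HANDLERS: Dict[Tuple[str, str], Callable[[], str]] = {
    ("print", "fax"): print_to_fax,
    ("print", "printer"): print_to_printer,
}


def dispatch_polymorphic(utterance: str) -> str:
    words = [word.lower() for word in utterance.split()]
    # Expect exactly three words: "<command> to <destination>".
    if len(words) != 3 or words[1] != "to":
        raise ValueError(f"cannot parse utterance: {utterance!r}")
    handler = HANDLERS.get((words[0], words[2]))
    if handler is None:
        raise KeyError(f"no procedure for {words[0]} to {words[2]}")
    return handler()
```

For example, dispatch_polymorphic("Print to Fax") and dispatch_polymorphic("Print to Printer") run two unrelated procedures even though both begin with the same word.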
Synonyms are also problematic. A good implementation of ASR should include all utterances that possess a common meaning. For example, the Exit function should be associated with the utterances Clear, Leave, Bye and Close. Programmers must learn to program for this ambiguity.
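Synonym handling can be sketched as a lookup that folds every alternative utterance into one canonical function name. The table below uses the Exit example from the text. The function names and the decision to reject a word claimed by two functions are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Canonical function -> alternative utterances with the same meaning.
SYNONYMS: Dict[str, List[str]] = {
    "exit": ["clear", "leave", "bye", "close"],
}


def build_lookup(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    # Flatten the table so every spoken word points at its canonical function.
    lookup: Dict[str, str] = {}
    for canonical, alternatives in synonyms.items():
        for word in [canonical, *alternatives]:
            owner = lookup.setdefault(word.lower(), canonical)
            if owner != canonical:
                # The same word claimed by two functions is an ambiguity.
                raise ValueError(f"'{word}' maps to both {owner} and {canonical}")
    return lookup


def canonicalize(utterance: str, lookup: Dict[str, str]) -> Optional[str]:
    # Return the canonical function name, or None for an unknown utterance.
    return lookup.get(utterance.strip().lower())


LOOKUP = build_lookup(SYNONYMS)
```

Rejecting conflicts when the table is built surfaces ambiguous vocabulary early, before a user ever speaks the word.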
Despite these difficulties, experts predict that programmers will
eventually be able to create ASR front ends for databases that accept user
slang and shorthand. In the long term, ASR technology will
improve the speed and the efficiency of operations and make some tasks
easier. It is only a matter of time before more companies jump on the