Tuesday, 4 December 2018

Speech Recognition

In computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or just speech to text (STT).

Some SR systems are speaker-independent, while others use training, in which an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription. Systems that do not use training are called speaker-independent systems; systems that use training are called speaker-dependent systems.


Applications

Speech recognition applications include voice user interfaces such as voice dialling (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and voice control in aircraft (usually termed Direct Voice Input).

The term voice recognition or speaker identification refers to identifying who is speaking, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.

In in-car systems, a manual control input, for example a finger control on the steering wheel, typically enables the speech recognition system, and this is signaled to the driver by an audio prompt. Following the audio prompt, the system has a listening window during which it may accept a speech input for recognition.
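
The button, prompt, and listening-window sequence described above can be sketched as a small state object. This is only an illustrative model, not how any particular car system is implemented: the class name `PushToTalkSession`, the 5-second window, and the `"beep"` prompt are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushToTalkSession:
    """Hypothetical model of a push-to-talk listening window."""
    window_seconds: float = 5.0        # assumed window length, not from the text
    opened_at: Optional[float] = None  # None means the recognizer is idle

    def press_button(self, now: float) -> str:
        # The manual control enables recognition and triggers the audio prompt.
        self.opened_at = now
        return "beep"

    def accepts_speech(self, now: float) -> bool:
        # Speech is only accepted while the listening window is open.
        if self.opened_at is None:
            return False
        return 0.0 <= now - self.opened_at <= self.window_seconds


session = PushToTalkSession(window_seconds=5.0)
before_press = session.accepts_speech(now=0.0)    # recognizer not yet enabled
prompt = session.press_button(now=10.0)           # driver presses the control
inside_window = session.accepts_speech(now=12.5)  # 2.5 s after the prompt
after_window = session.accepts_speech(now=16.0)   # 6 s after the prompt
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to check.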

Simple voice commands may be used to initiate phone calls, select radio stations or play music from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition capabilities vary between car makes and models. Some of the most recent car models offer natural-language speech recognition in place of a fixed set of commands, allowing the driver to use full sentences and common phrases. With such systems, there is therefore no need for the user to memorize a set of fixed command words.
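
To illustrate why a fixed command set requires memorization, here is a minimal sketch of exact-phrase matching applied to an already-transcribed utterance. The command phrases, the action names, and the function `match_fixed_command` are hypothetical and chosen only for the example; a natural-language system would replace this lookup with some form of intent understanding.

```python
from typing import Optional

# Hypothetical fixed vocabulary; phrases and action names are illustrative only.
FIXED_COMMANDS = {
    "call home": "phone.call_home",
    "play music": "media.play",
    "next station": "radio.next_station",
}


def match_fixed_command(transcript: str) -> Optional[str]:
    """Return the action for an exact command phrase, or None if unrecognized."""
    # Normalize case and whitespace, then require an exact phrase match.
    normalized = " ".join(transcript.lower().split())
    return FIXED_COMMANDS.get(normalized)


exact = match_fixed_command("Call  Home")
free_form = match_fixed_command("could you call home please")
```

The exact phrase maps to an action, while the free-form sentence returns `None` because it is not in the fixed vocabulary.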

In the health care sector, speech recognition can be implemented at the front end or the back end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document. Back-end or deferred speech recognition is where the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine, and the recognized draft document is routed along with the original voice file to the editor, where the draft is edited and the report is finalized.

Deferred speech recognition is currently widely used in the industry.
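
The back-end (deferred) workflow described above can be pictured as a record that carries both the original audio and the recognized draft to the editor. The sketch below is an assumption-laden illustration, not a real dictation system's data model: `DeferredJob`, `finalize`, the file name, and the sample text are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DeferredJob:
    """Hypothetical record routed to the editor in a back-end workflow."""
    audio_path: str          # original voice file, kept alongside the draft
    draft_text: str          # recognizer output, not yet reviewed
    final_text: str = ""     # filled in by the editor
    finalized: bool = False  # True once the editor has signed off


def finalize(job: DeferredJob, edited_text: str) -> DeferredJob:
    # The editor corrects the draft against the audio and finalizes the report.
    job.final_text = edited_text
    job.finalized = True
    return job


job = DeferredJob(audio_path="dictation_001.wav",
                  draft_text="no acute fracture scene")
job = finalize(job, "No acute fracture seen.")
```

Keeping the draft and the final text as separate fields preserves the recognizer's output for later comparison with the editor's corrections.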
