Front-end of Wake-Up-Word Speech Recognition System Design on FPGA (Dissertation)

Date

2015-2

Type

PhD Thesis

Author(s)

Mohamed Muftah Eljhani

Abstract

A typical speech recognition system is push-button operated (push-to-talk), which requires hand movement and hence a mixed multi-modal interface. However, for disabled patients and for users of hands-busy applications (e.g., where the user has objects to manipulate or a device to control while asking for assistance from another device), movement may be restricted or impossible. The only alternative is a speech-only interface. The proposed method is called Wake-Up-Word Speech Recognition (WUW-SR). A WUW-SR system allows the user to activate many systems (cell phone, computer, elevator, etc.) with speech commands instead of hand movements. This work defines a new front-end paradigm for Wake-Up-Word Speech Recognition on Field-Programmable Gate Arrays (FPGAs). The state-of-the-art front-end of the WUW-SR system is based on three subsystems that produce three sets of features: (1) Mel-frequency Cepstral Coefficients (MFCC), (2) Linear Predictive Coding Coefficients (LPC), and (3) Enhanced Mel-frequency Cepstral Coefficients (ENH_MFCC). The extracted features are compressed and transmitted over a dedicated channel to the server, where they are subsequently decoded with the corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR system. In the WUW-SR system, the front-end processor is located at the terminal (e.g., a mobile phone), which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW front-end can be added to any hand-held electronic device compatible with WUW-SR, which can then be activated by voice alone (with no push-to-talk, as is presently required). The WUW front-end is designed and implemented on an Altera DSP development kit with a Cyclone III FPGA as a portable system, acting as a processor capable of computing the three sets of features at a much faster rate than software. It is cost-effective, consumes very little power, and is not tied to a general-purpose computer, so it can be used on any portable device.
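
As an illustration of one of the three feature sets named in the abstract, the following is a minimal software sketch of LPC coefficient estimation for a single frame, using the textbook autocorrelation method with the Levinson-Durbin recursion. It is not the thesis's FPGA design: the function names (`autocorrelation`, `levinson_durbin`, `lpc`) and the frame length and model order in the usage lines are assumptions chosen for this example, and a practical front-end would normally apply pre-emphasis and windowing to the frame first.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the short-time autocorrelation of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where a[k-1] is the predictor coefficient for
    x[n-k] in the model x[n] ~ sum_k a_k * x[n-k], and err is the
    final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    """Estimate LPC coefficients of the given order for one frame (hypothetical helper)."""
    r = autocorrelation(frame, order)
    return levinson_durbin(r, order)


# Usage: a decaying exponential behaves like a first-order process,
# so the first coefficient should be close to 0.9 and the second close to 0.
frame = [0.9 ** n for n in range(200)]
coeffs, err = lpc(frame, 2)
print(coeffs, err)
```

The recursion needs only multiply-accumulate operations and one division per model order, which is one reason this method is commonly chosen for fixed hardware; whether the thesis uses it is not stated in the abstract.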
