A voice detector is a device created to detect the sounds that are made
when people speak or sing. Computer scientists have been searching for ways to
enable computers to record, interpret and understand human speech since the
1960s. This has been a daunting task throughout the decades. Even the most
rudimentary problem, such as sampling voice, was a huge challenge in the early
years. It took until the 1980s before the first systems arrived that could
actually decipher
speech (Goel and Singh, 2014).
With the expectation that sound handling techniques would evolve, inventors and
engineers built the first voice recognition system in the 1950s, which could
only recognize digits (Pinola, 2011). “Audrey”, the first voice recognition
system, introduced in 1952, was able to recognize spoken digits (Warren, 2014).
In other words, “Audrey” could only distinguish between the ten digits from
zero to nine. When it was revealed at the Seattle World’s Fair in 1962, the IBM
Shoebox was the most advanced voice recognition machine because it could
understand 16 words spoken in English (Kane, 2015). The improvement of voice recognition
technology can be seen 20 years later in the Harpy system. Harpy is a voice
recognition system developed at Carnegie-Mellon University as a result of
analyzing the performance of various design choices in two earlier speech
recognition systems, the Hearsay-I system and the Dragon system (Lowerre,
1976). According to Pinola (2011), the Harpy system could understand 1,011
words, approximately the vocabulary of a three-year-old child. In
the 1980s, the Hidden Markov Model (HMM) marked the turning point from voice
recognition to voice prediction (Gales and Young, 2007). The HMM allows sound
input to be converted accurately into written words by using voice prediction
technology. In the 1990s, the first consumer voice recognition product,
Dragon Dictate, was released. Its successor, Dragon NaturallySpeaking, could
recognize continuous speech at about 100 words per minute (Pinola, 2011).
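
To make the idea of HMM-based prediction more concrete, the following minimal
Python sketch decodes the most probable sequence of hidden words from a
sequence of observed sound labels using the Viterbi algorithm, a standard
decoding method for HMMs. The two-word vocabulary, the sound labels and all
probabilities are hypothetical values chosen for illustration only; they are
not taken from Gales and Young (2007) or from any real recognizer.

```python
# Toy HMM decoding with the Viterbi algorithm (all values are illustrative).

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] holds (probability, path) of the best path that ends in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor state that maximises the path probability.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Hypothetical model: two hidden words and two coarse acoustic labels.
STATES = ["yes", "no"]
START_P = {"yes": 0.5, "no": 0.5}
TRANS_P = {
    "yes": {"yes": 0.7, "no": 0.3},
    "no": {"yes": 0.4, "no": 0.6},
}
EMIT_P = {
    "yes": {"sibilant": 0.8, "nasal": 0.2},
    "no": {"sibilant": 0.1, "nasal": 0.9},
}

heard = ["nasal", "sibilant", "sibilant"]
probability, words = viterbi(heard, STATES, START_P, TRANS_P, EMIT_P)
print(words, probability)
```

For the three observed labels, the sketch selects "no, yes, yes" as the most
probable word sequence. A real recognizer works with far larger state spaces
and with continuous acoustic features rather than two discrete labels.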
In the late 2000s, Google introduced voice recognition software
that would serve as the foundation for the company’s later Voice Search product
(Huang, Baker, and Reddy, 2014).
According to Martins, Trancoso, Abad, and Meinedo (2009), current voice
detector technology can recognize the gender of a speaker from the voice.
This means that the speaker’s gender can be determined by analyzing the detected
voice. Voice detectors are also used to detect unusual vocal sounds for nursing
system purposes (Wilson et al., 2009). Examples of such sounds include coughs,
groans, wheezes and cries. In addition, voice/non-voice (VNV) detection, which
determines the regions of vocal fold activity in the speech signal, is widely
used in speech processing applications such as speech enhancement, speech
coding and speech recognition (Kumar and Rao, 2016).
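
As a rough illustration of what VNV detection involves, the Python sketch below
labels short frames of a signal as voiced or non-voiced using two classic
heuristics: voiced speech tends to have high short-time energy and a low
zero-crossing rate. This is a simplified stand-in and not the method of Kumar
and Rao (2016); the frame length, the thresholds and the synthetic test signal
are assumptions chosen for illustration.

```python
import math


def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)


def label_frames(signal, frame_len=160, energy_min=0.01, zcr_max=0.25):
    """Label each non-overlapping frame as 'voiced' or 'non-voiced'."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energy, zcr = frame_features(signal[start:start + frame_len])
        # Assumed rule: voiced frames are loud and cross zero rarely.
        voiced = energy >= energy_min and zcr <= zcr_max
        labels.append("voiced" if voiced else "non-voiced")
    return labels


# Synthetic signal at an assumed 8 kHz sampling rate (two frames per segment).
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 120 * n / RATE) for n in range(320)]  # voiced-like
hiss = [0.3 * (1 if n % 2 == 0 else -1) for n in range(320)]  # noise-like
silence = [0.0] * 320

labels = label_frames(tone + hiss + silence)
print(labels)
```

The low-frequency tone is labelled voiced, the rapidly alternating segment is
rejected because of its high zero-crossing rate, and the silent segment is
rejected because of its low energy.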
With the implementation of Artificial Intelligence (AI) in sound technology,
interaction between humans and machines such as computers and smartphones is
now possible. For instance, Siri on Apple smartphones and iPads, Google search
and Cortana in Windows 10 allow interaction between humans and smart devices.