Voice Activated Un-Lock Technology

V.A.U.L.T.: A MATLAB-Based Simulation

By Siddharth Advani (B2213401), Anand Gokhale (B2213420) and Vishal Jain (B2213426)
Guided by Dr. P.M. Patil

OBJECTIVE: Make a correct decision on a speaker’s identity claim, given a speech segment (the password).

MOTIVATION
• Speech contains speaker-specific characteristics
• A voiceprint can serve as a biometric (a distinguishing trait)
• Speech is a natural and economical means of identification

DEFINITIONS
Client: a speaker registered on the system
Impostor: a speaker who claims a false identity
Mel-filtering: filtering on a frequency scale (the mel scale) that is approximately linear below 1000 Hz and approximately logarithmic above 1000 Hz, matching how the ear perceives changes in frequency
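The mel scale in the definition above can be made concrete with a short sketch. The slides give no formula; this assumes the widely used mapping mel(f) = 2595 log10(1 + f/700), under which 1000 Hz maps to about 1000 mel and equal mel steps cover ever wider Hz ranges as frequency rises. The names `hz_to_mel` and `mel_to_hz` are illustrative, not taken from VAULT.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mel (assumed formula: 2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Below ~1 kHz mel grows almost linearly with Hz; above it, the growth flattens.
for f in (250, 500, 1000, 2000, 4000):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

Doubling from 2000 Hz to 4000 Hz adds far fewer mel than the first 2000 Hz do, which is the logarithmic behaviour the definition describes.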

What is Simulation?
A simulation is the imitation of the operation of a real-world process or system over time. VAULT uses MATLAB as a tool to simulate a voice recognition system.

Software Implementation

MATLAB Features:
• Interpreted language, meant for simulation in R&D
• High-performance numerical computation
• Signal Processing Toolbox

Visual Basic Features:
• Easy to implement
• Very user-friendly and interactive
• Can be interfaced with MATLAB and runs on Windows
• Simpler for building GUIs than MATLAB's GUI tools
• Other Microsoft applications can be embedded in a VB application

Zones of VAULT

PHASE 1 - IDENTIFICATION
[Block diagram: WORD → FEATURE EXTRACTION → PATTERN RECOGNITION → USER ID; pattern recognition uses the SYSTEM DATABASE built during TRAINING]

PHASE 1 - IDENTIFICATION (process: WORD → FEATURE EXTRACTION → VECTOR QUANTIZER → DECISION)
• The word is sampled at 11.025 kHz.
• The word is divided into segments of 256 samples each.
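A minimal sketch of the segmentation step. It assumes non-overlapping segments and discards leftover samples at the end of the word; the slides state neither detail. At 11.025 kHz, one 256-sample segment spans about 23.2 ms. `split_into_frames` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 11025  # sampling rate stated on the slide
FRAME_LEN = 256         # samples per segment, as stated on the slide

def split_into_frames(samples, frame_len=FRAME_LEN):
    """Split a sampled word into consecutive, non-overlapping segments.

    Assumption: no overlap between segments, and a trailing partial
    segment is dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

frame_ms = 1000.0 * FRAME_LEN / SAMPLE_RATE_HZ  # duration of one segment in ms

# Example: half a second of (silent) audio
word = [0.0] * int(0.5 * SAMPLE_RATE_HZ)
frames = split_into_frames(word)
print(len(frames), "segments of", round(frame_ms, 1), "ms each")
```

Overlapping frames (e.g. a hop of 128 samples) are a common alternative; they would roughly double the number of cepstrum vectors per word.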

PHASE 1 - IDENTIFICATION
• 8 cepstrum coefficients are calculated for each segment.

PHASE 1 - IDENTIFICATION
• Vector quantization is used to create a codebook.
• The cepstrum coefficients are quantized using a codebook of 128 vectors.
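Quantizing cepstrum vectors against a codebook amounts to a nearest-neighbour search. This sketch assumes Euclidean distance and reports the average distance to the chosen codevectors as the match score; the slides do not name the distance measure. `quantize` is a hypothetical helper, and the toy codebook is 2-dimensional for readability, whereas VAULT's codebook would hold 128 codevectors of 8 cepstrum coefficients each.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quantize(vectors, codebook):
    """Replace each feature vector by the index of its nearest codevector.

    Returns (indices, average_distance). The average distance (distortion)
    is assumed here to be the score used when comparing codebooks.
    """
    indices = []
    total = 0.0
    for v in vectors:
        dists = [euclidean(v, c) for c in codebook]
        best = min(range(len(codebook)), key=lambda i: dists[i])
        indices.append(best)
        total += dists[best]
    return indices, total / len(vectors)

# Toy example: 2-D vectors and a 2-entry codebook (illustrative values only)
codebook = [[0.0, 0.0], [10.0, 0.0]]
frames = [[1.0, 0.0], [9.0, 0.0], [0.0, 2.0]]
indices, distortion = quantize(frames, codebook)
print(indices, distortion)
```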

PHASE 1 - IDENTIFICATION
[Figure: the unknown word (?) is compared with the codebooks of speakers 1 to 4, giving distances of 8, 16, 5 and 12; the smallest distance (5) identifies the speaker as CLIENT 3.]
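The identification decision picks the enrolled speaker whose codebook gives the smallest distance. This sketch applies that rule to the distances read from the Phase 1 figure (8, 16, 5 and 12 for speakers 1 to 4); how each distance is computed is left out, and `identify` is a hypothetical name.

```python
def identify(distances):
    """Return the tag of the speaker whose codebook is closest to the input word.

    distances: dict mapping speaker tag -> distance between the input word's
    cepstrum vectors and that speaker's codebook (smaller = better match).
    """
    return min(distances, key=distances.get)

# Distances as read from the Phase 1 identification figure
figure_distances = {1: 8, 2: 16, 3: 5, 4: 12}
print("Identified speaker:", identify(figure_distances))
```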

IDENTIFICATION DATABASE
[Figure: database of speakers 1 to 4, labelled ‘Zero’]
Every speaker is given a tag.

PHASE 2 - AUTHENTICATION
[Block diagram: PASS-WORD → FEATURE EXTRACTION → PATTERN RECOGNITION → ACCEPT / REJECT; pattern recognition uses the SYSTEM DATABASE built during TRAINING]

PHASE 2 - AUTHENTICATION (process: WORD → FEATURE EXTRACTION → VECTOR QUANTIZER → CODEBOOK)
• The speech is sampled and the cepstrum coefficients are calculated the same way as in the identification phase.

PHASE 2 - AUTHENTICATION
• This time the quantizer uses the personal codebook that the real user trained with his/her password.

PHASE 2 - AUTHENTICATION
• The distance between the spoken password and the claimed client's codebook is compared with a THRESHOLD.
• The threshold decides the outcome: ACCEPT or REJECT.
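The accept/reject rule can be sketched as a single comparison. This assumes that a distance equal to the threshold is accepted (the slides do not say how ties are handled) and uses T = 5, the threshold shown on the results slide, purely as an example value. `verify` is a hypothetical name.

```python
def verify(distance, threshold):
    """Decide an identity claim from the distance to the claimed client's
    personal codebook; a small distance means a good match.

    Assumption: distance == threshold counts as ACCEPT.
    """
    return "ACCEPT" if distance <= threshold else "REJECT"

THRESHOLD = 5  # example value taken from the results slide
for d in (3.2, 5.0, 7.8):
    print(d, verify(d, THRESHOLD))
```

Raising the threshold accepts more claims (fewer false rejections, more false acceptances); lowering it does the opposite.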

Main Obstacle
• How to define and extract the unique features of the human voice
CEPSTRUM
cepstrum(frame) = IDFT(log(|DFT(frame)|))
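The cepstrum formula above translates directly into code. This pure-Python sketch uses a direct O(N²) DFT so it needs no external libraries (a MATLAB or NumPy version would use an FFT), adds a small `eps` to avoid log(0), and keeps the first 8 coefficients including c[0]; which 8 coefficients VAULT actually keeps is an assumption. `real_cepstrum` is a hypothetical name.

```python
import cmath
import math

def dft(x):
    """Direct DFT: X[m] = sum_k x[k] * exp(-2j*pi*m*k/N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Inverse DFT: x[k] = (1/N) * sum_m X[m] * exp(2j*pi*m*k/N)."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def real_cepstrum(frame, n_coeffs=8, eps=1e-12):
    """cepstrum(frame) = IDFT(log(|DFT(frame)|)), truncated to n_coeffs values."""
    log_mag = [math.log(abs(X) + eps) for X in dft(frame)]
    # For a real frame the log-magnitude spectrum is real and even,
    # so its IDFT is real up to rounding error; keep the real part.
    return [c.real for c in idft(log_mag)[:n_coeffs]]

# Example on a synthetic 256-sample segment sampled at 11.025 kHz
frame = [math.sin(2 * math.pi * 440 * k / 11025)
         + 0.3 * math.sin(2 * math.pi * 1200 * k / 11025)
         for k in range(256)]
print(real_cepstrum(frame))
```

A useful property: scaling the frame by a gain only shifts c[0] by the log of that gain, so the higher coefficients describe the spectral envelope independently of loudness.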

PATTERN MATCHING
• Template models (deterministic; better score = minimum distance): Dynamic Time Warping, Vector Quantization, Nearest Neighbour
• Stochastic models (probabilistic; better score = maximum probability): Hidden Markov Model, Gaussian Mixture Model

VECTOR QUANTIZATION
Goal: finding how the data is clustered
• The (feature) vector space is broken into cells
• Speaker model: a codebook
• Codebook: a set of prototype vectors (codevectors)
• Codevector: a vector computed from "similar" individual feature vectors (e.g. the centroid of a group of similar vectors, each made of 8 cepstrum coefficients)

CLUSTERING
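Codebook training is a clustering problem: the training vectors are grouped into cells and each codevector is the centroid of its cell. The slides do not name the clustering algorithm; this sketch assumes plain k-means (Lloyd's algorithm) with randomly chosen initial codevectors, whereas speaker-recognition systems often use the LBG splitting variant. `train_codebook` is a hypothetical name, and the toy data are 2-D so the two clusters are easy to see; VAULT's codebooks would hold 128 codevectors of 8 cepstrum coefficients.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iterations=20, seed=0):
    """Train a VQ codebook with plain k-means (assumed algorithm)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]  # initial codevectors
    for _ in range(iterations):
        # Assignment step: put every training vector into the cell of its nearest codevector.
        cells = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        # Update step: move each codevector to the centroid of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codevector
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook

# Toy training data: two well-separated groups of 2-D "feature vectors"
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(sorted(train_codebook(points, 2)))
```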

RESULTS
[Figure: verification results with THRESHOLD = 5, divided into REJECT and ACCEPT regions]

PERFORMANCE EVALUATION
• False Rejection (FR): a client's claim to his/her own identity is rejected
• False Acceptance (FA): an impostor's claim to a client's identity is accepted
• Genuine Acceptance (GA): a client's claim to his/her own identity is accepted

ACCURACY
• FAR (False Acceptance Rate): probability of false acceptance, estimated as $\mathrm{FAR} = \frac{\#\,\text{false acceptances}}{\#\,\text{false claims}}$
• FRR (False Rejection Rate): probability of false rejection, estimated as $\mathrm{FRR} = \frac{\#\,\text{false rejections}}{\#\,\text{true claims}}$
• GAR (Genuine Acceptance Rate): probability of genuine acceptance, estimated as $\mathrm{GAR} = \frac{\#\,\text{true acceptances}}{\#\,\text{true claims}}$
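The three estimates can be computed from the distances produced by genuine and impostor trials at a given threshold. This sketch assumes the accept rule "distance <= T"; the trial distances in the example are invented for illustration and are not VAULT results. Because every true claim is either accepted or falsely rejected, GAR = 1 - FRR. `error_rates` is a hypothetical name.

```python
def error_rates(client_distances, impostor_distances, threshold):
    """Estimate (FAR, FRR, GAR) at one threshold.

    client_distances: distances from true claims (client as himself/herself)
    impostor_distances: distances from false claims (impostor as a client)
    Assumption: a claim is accepted when its distance is <= threshold.
    """
    false_acceptances = sum(d <= threshold for d in impostor_distances)
    false_rejections = sum(d > threshold for d in client_distances)
    true_acceptances = len(client_distances) - false_rejections
    far = false_acceptances / len(impostor_distances)  # / number of false claims
    frr = false_rejections / len(client_distances)     # / number of true claims
    gar = true_acceptances / len(client_distances)     # / number of true claims
    return far, frr, gar

# Hypothetical trial distances, for illustration only
clients = [2, 3, 4, 6]
impostors = [4.5, 7, 8, 9, 10]
print(error_rates(clients, impostors, threshold=5))
```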

GRAPHS

THRESHOLD
The threshold T can be determined by:
1) choosing T to satisfy a fixed FA or FR criterion, or
2) varying T to find different FA/FR ratios and choosing the T that gives the desired FA/FR ratio.
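Criterion 1) can be sketched as a sweep over candidate thresholds. FAR can only grow as T increases, so the largest T whose FAR stays within the target gives the lowest FRR under that constraint; printing FAR and FRR at each step of the same sweep also gives the FA/FR trade-off needed for criterion 2). The accept rule "distance <= T", the candidate set (the observed distances) and the example distances are assumptions for illustration; `choose_threshold` is a hypothetical name.

```python
def choose_threshold(client_distances, impostor_distances, max_far):
    """Return (T, FAR, FRR) for the largest candidate T with FAR <= max_far,
    or None if no candidate meets the target.

    Assumption: a claim is accepted when its distance is <= T.
    """
    candidates = sorted(set(client_distances) | set(impostor_distances))
    best = None
    for t in candidates:
        far = sum(d <= t for d in impostor_distances) / len(impostor_distances)
        frr = sum(d > t for d in client_distances) / len(client_distances)
        if far <= max_far:
            best = (t, far, frr)  # a larger passing T never raises FRR, so keep the last one
    return best

# Hypothetical trial distances, for illustration only
clients = [2, 3, 4, 6]
impostors = [4.5, 7, 8, 9, 10]
print(choose_threshold(clients, impostors, max_far=0.0))
print(choose_threshold(clients, impostors, max_far=0.2))
```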

SOURCES OF ERROR
CLIENT:
• Bad pronunciation
• Extreme emotional states (e.g. stress)
• Sickness (head colds alter the vocal tract)
• Aging (the vocal tract can drift away from the enrolled models with age)
• Channel mismatch (using different microphones for enrollment and verification)
IMPOSTOR:
• Mimicry
AMBIENT NOISE

STRENGTHS & WEAKNESSES
Strengths:
• SPEECH IS EASY TO PRODUCE
• LOW COMPUTATION REQUIREMENTS
Weaknesses:
• SPEECH IS A BEHAVIORAL SIGNAL (IT VARIES WITH THE SPEAKER'S STATE)
• SPOOFING OF SYSTEMS

APPLICATIONS
• Security systems
• Voice dialing
• Access control to computers / databases
• Remote access to computer networks
• Electronic commerce
• Forensics
• Telephone banking

Hardware Application: Robotics
Aim: to control a robot by voice

Robot Control via Voice

Parallel Port Interface
• 25-pin D-type male connector
• The computer's parallel port has 3 registers:
  - Data register
  - Status register
  - Control register

FM Transmitter-Receiver
• Frequency of operation: 433.92 MHz
• Modulation type: ASK
• Bandwidth: 200 kHz

FEA: The Robot
Features:
• Wireless
• Prime mover: DC motors

Relay Driver IC: ULN2803
• Array of eight Darlington pairs
• Internal free-wheeling diodes
• Inputs compatible with TTL logic

FEA’s Driver IC: L293B Motor Driver
• Four-channel driver
• Bidirectional motor drive
• High-voltage, high-current outputs

PROJECT TIME DISTRIBUTION
JAN: Participated in IIT Techfest
FEB: (a) Submitted a paper at Techkriti, Kanpur (b) Built FEA for Fervor at COEP (c) MATLAB and Visual Basic training
MAR: Phase 1 and Phase 2 completed in MATLAB
APR: MATLAB and Visual Basic interface
MAY: Evaluation of the software (FAR, FRR and GAR)
JUNE: Application board

Future Expansion
• Implementation on a DSP board
• Making the system work in real time
• Speech recognition
