
NetTalk Project


Presentation Transcript


  1. NetTalk Project Speech Generation Using a Neural Network Michael J Euhardy

  2. The Speech Generation Idea • Input: a specific letter whose sound is to be generated • Input: the three letters on each side of it, for a total of seven letters of input • Output: the sound that should be generated, based on the input letter and its surrounding letters
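
To make the sliding window concrete, here is a minimal sketch (in Python rather than the project's Matlab; the pad character and function name are illustrative, not from the slides):

```python
def windows(word, radius=3, pad="_"):
    """Yield one 7-letter window per letter of the word: the target
    letter plus 3 letters of context on each side, padding past the
    ends of the word."""
    padded = pad * radius + word.lower() + pad * radius
    for i in range(len(word)):
        yield padded[i:i + 2 * radius + 1]

print(list(windows("network")))
# ['___netw', '__netwo', '_networ', 'network',
#  'etwork_', 'twork__', 'work___']
```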

  3. The Strategy • 26 possible letters • 7 input positions • Map each letter in each position to a unique input: 7 × 26 = 182 total inputs

  4. The Strategy • 57 possible sounds can be generated • Map each sound to a unique output label
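
A sketch of the mapping the two strategy slides describe, assuming a one-hot coding (the slides give the sizes 7 × 26 = 182 and 57 but not the exact scheme; NumPy and the names are mine):

```python
import numpy as np

N_LETTERS, N_POSITIONS, N_SOUNDS = 26, 7, 57

def encode_window(window):
    """7-letter window -> 182-element input vector: one 26-entry
    block per position, with a 1 at the index of that position's
    letter. Assumes lowercase a-z; pad characters leave their
    block all zeros."""
    x = np.zeros(N_POSITIONS * N_LETTERS)
    for pos, ch in enumerate(window):
        if ch.isalpha():
            x[pos * N_LETTERS + ord(ch) - ord("a")] = 1.0
    return x

def encode_sound(sound_index):
    """Sound class (0..56) -> 57-element one-hot label vector."""
    y = np.zeros(N_SOUNDS)
    y[sound_index] = 1.0
    return y

assert encode_window("network").shape == (182,)
```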

  5. The Resulting ANN A fully connected single-layer perceptron with 182 inputs and 57 outputs
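
A minimal sketch of such a network; the slides fix only the shape (182 inputs, 57 outputs, fully connected, no hidden layer), so the delta-rule update, learning rate, and initialization here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(57, 182))  # one weight row per output
b = np.zeros(57)

def predict(x):
    """Forward pass: the output unit with the largest activation
    names the sound to generate."""
    return int(np.argmax(W @ x + b))

def train_step(x, target, lr=0.1):
    """Widrow-Hoff (delta rule) step toward the one-hot target."""
    global W, b
    err = target - (W @ x + b)   # (57,) error vector
    W += lr * np.outer(err, x)
    b += lr * err
```

Training would then loop train_step over every (encoded window, encoded sound) pair in the data set, which is exactly the loop the Time slide complains about.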

  6. The Findings • The trained neural network performs very well; the larger the training set and the longer it is trained, the better it performs • Training can take an extremely long time when a high classification rate is desired and the training set is large

  7. Problems • Time • Space

  8. Time • You can’t rush training the network. Even on a dual Pentium III-733 with 512 MB of RAM, training any data set of significant size took a very long time, and just converting all of the characters in the data file into the input and label matrices took hours.

  9. Space • 20,000 words of data at roughly 7 letters per word is 140,000 letter windows, giving a 140,000 × 239 matrix (182 input values plus 57 label values per row) • In Matlab’s double precision, that’s a lot of memory
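
For scale (my arithmetic, not on the slides): 140,000 × 239 ≈ 33.5 million entries, and at 8 bytes per double that is roughly 268 MB — more than half the 512 MB of RAM mentioned on the Time slide, before Matlab makes any working copies.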

  10. Workarounds • Use a smaller data set of only 1,000 words • Lower the training standard: train only to 80% classification

  11. Next Time • C++ • Matlab is way too slow and way too memory-intensive • Start earlier; it’s a long process • Multi-Layer Perceptron

  12. Conclusion • I give up! • I don’t know how Microsoft’s Narrator does it, but I bet it doesn’t do it this way.
