The long-term goal of this project is a portable meeting recorder for impromptu meetings in natural environments. It would detect multiple speakers, allow corrections and annotations, and support indexing and searching of the recorded content. As an intermediate step, the prototype handles a single user dictating text and concentrates on correction mechanisms that make editing faster and more intuitive. Future work includes user studies, wireless operation, editing synchronization, gesture-based correction, and support for punctuation and paragraphs.
A Prototype Personal Dictation System Adam Janin janin@icsi.berkeley.edu
Final Goal – A Portable Meeting Recorder • Record impromptu meetings in a natural environment. • Detect multiple speakers. • Allow correction and annotation. • Support indexing and searching. • Self-contained (using IRAM).
Intermediate Goal – A Personal Dictation System • Record a single user dictating text. • Allow correction and editing. • Hosted system: • ASR runs on workstation. • GUI runs on Pilot. • Communicate via wired network. • Close-talking mic. • Limited domain (Broadcast News).
Asides... • Why not Wizard of Oz? • Structure of correction mechanism is recognizer-specific. • Develop infrastructure. • Produce a working demo. • Informal user study, mostly with speech researchers.
Architecture • Palm Pilot: correct transcripts, edit transcripts, create new text. • Sun Workstation: audio frontend, speech recognizer, correction server.
Correcting and Editing • Correcting – informing the recognizer that it has made an error. • If recognizer has a good idea of alternatives, it may be faster to correct than to edit. • Recognizer can adapt to user and vocabulary. • Editing – changing the output. • “That’s not what I meant to say”. • Text vs. speech input.
Correction Methods: Background • Lattice contains recognizer’s best guesses. • More compact than N-best lists. • Contains word order and timing. • Example hypotheses: 1. the records … 2. a rack ... 3. the wreck or … 4. a record ...
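To make the lattice concrete, here is a minimal Python sketch of a toy word lattice that encodes the four example hypotheses. The node numbering, arc times, and scores are invented for illustration because the slides give no numbers, and `Arc`, `paths`, and `n_best` are hypothetical names. Each arc carries a word, its start and end times, and a log score. Enumerating the paths from the start node to the final node and sorting them by total score recovers an N-best list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int          # source node id
    dst: int          # destination node id
    word: str
    start: float      # start time in seconds (invented for the example)
    end: float        # end time in seconds (invented for the example)
    log_prob: float   # combined score, higher is better (invented)


# Hypothetical toy lattice: node 0 is the start and node 4 the end.
# Every arc goes from a lower node id to a higher one.
ARCS = [
    Arc(0, 1, "the", 0.00, 0.15, -0.2),
    Arc(0, 2, "a", 0.00, 0.10, -0.9),
    Arc(1, 4, "records", 0.15, 0.70, -0.5),
    Arc(1, 3, "wreck", 0.15, 0.45, -1.6),
    Arc(3, 4, "or", 0.45, 0.70, -0.8),
    Arc(2, 4, "rack", 0.10, 0.70, -1.0),
    Arc(2, 4, "record", 0.10, 0.70, -1.8),
]
FINAL = 4


def paths(arcs, node, final):
    """Yield (words, total log score) for every path from node to final."""
    if node == final:
        yield [], 0.0
        return
    for arc in arcs:
        if arc.src == node:
            for words, score in paths(arcs, arc.dst, final):
                yield [arc.word] + words, arc.log_prob + score


def n_best(arcs, final, n, start=0):
    """Return up to n hypotheses as (sentence, score), best first."""
    hyps = [(" ".join(w), s) for w, s in paths(arcs, start, final)]
    return sorted(hyps, key=lambda h: h[1], reverse=True)[:n]


for rank, (sentence, score) in enumerate(n_best(ARCS, FINAL, n=4), 1):
    print(f"{rank}. {sentence}  ({score:.1f})")
```

The arcs for “the” and “a” are each stored once and shared by two hypotheses. That sharing is where a lattice saves space over an N-best list. Exhaustive path enumeration is exponential in general and is used here only because the toy lattice has four paths.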
Correction Methods: Selecting Hypotheses • User corrects “records”. • System picks all words that overlap in time. • Presents in order from most likely to least. • Note: full overlap is probably not optimal. 1. the records … 2. a rack ... 3. the wreck or … 4. a record ...
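The selection step can be sketched on the same invented toy lattice. This sketch makes two assumptions. First, two arcs overlap when their time intervals intersect, so arcs that only touch at an endpoint do not count. Second, candidates are ranked by the score of the best complete path through each arc, computed with one forward pass and one backward pass over the lattice. The slides do not say how the prototype ranks candidates, so this scoring is a stand-in. `best_scores` and `alternatives` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int          # source node id
    dst: int          # destination node id
    word: str
    start: float      # seconds (invented)
    end: float        # seconds (invented)
    log_prob: float   # higher is better (invented)


# Same hypothetical toy lattice; node ids are in topological order.
ARCS = [
    Arc(0, 1, "the", 0.00, 0.15, -0.2),
    Arc(0, 2, "a", 0.00, 0.10, -0.9),
    Arc(1, 4, "records", 0.15, 0.70, -0.5),
    Arc(1, 3, "wreck", 0.15, 0.45, -1.6),
    Arc(3, 4, "or", 0.45, 0.70, -0.8),
    Arc(2, 4, "rack", 0.10, 0.70, -1.0),
    Arc(2, 4, "record", 0.10, 0.70, -1.8),
]
FINAL = 4


def best_scores(arcs, final):
    """Best log score from the start to each node (fwd) and from each node to final (bwd)."""
    nodes = sorted({a.src for a in arcs} | {a.dst for a in arcs})
    fwd = {n: float("-inf") for n in nodes}
    bwd = {n: float("-inf") for n in nodes}
    fwd[nodes[0]] = 0.0
    bwd[final] = 0.0
    for n in nodes:  # ascending ids are a topological order here
        for a in arcs:
            if a.src == n:
                fwd[a.dst] = max(fwd[a.dst], fwd[n] + a.log_prob)
    for n in reversed(nodes):
        for a in arcs:
            if a.dst == n:
                bwd[a.src] = max(bwd[a.src], a.log_prob + bwd[n])
    return fwd, bwd


def alternatives(arcs, final, selected):
    """Other words whose arcs overlap `selected` in time, most likely first."""
    fwd, bwd = best_scores(arcs, final)
    scored = {}
    for a in arcs:
        if a.word == selected.word:
            continue
        if a.start < selected.end and selected.start < a.end:  # intervals intersect
            score = fwd[a.src] + a.log_prob + bwd[a.dst]  # best full path through a
            if score > scored.get(a.word, float("-inf")):
                scored[a.word] = score
    return sorted(scored, key=scored.get, reverse=True)


tapped = next(a for a in ARCS if a.word == "records")
print(alternatives(ARCS, FINAL, tapped))
```

In this example “wreck” and “or” get identical scores because they lie on the same path. Offering them as two separate one-word choices hints at why a plain per-word overlap rule may not be the best way to group alternatives.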
Correction Methods: Rescoring • User corrects “records” to “record”. • Select only paths with “record”. • Rescore lattice. • Unexpected changes! 1. the records … 2. a rack ... 3. the wreck or … 4. a record ...
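Rescoring can be sketched as keeping only the lattice paths that contain the corrected word and then taking the best remaining path. This brute-force version uses the hypothetical names `all_paths` and `rescore` and the same invented toy scores. It assumes the corrected word appears in the lattice. The slides say the real system also accepts words outside the lattice, and this sketch only signals that case by returning `None`. In the toy lattice, the constrained best path also changes “the” to “a”. That is the kind of unexpected change the slide warns about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int          # source node id
    dst: int          # destination node id
    word: str
    start: float      # seconds (invented)
    end: float        # seconds (invented)
    log_prob: float   # higher is better (invented)


# Same hypothetical toy lattice as the earlier sketches.
ARCS = [
    Arc(0, 1, "the", 0.00, 0.15, -0.2),
    Arc(0, 2, "a", 0.00, 0.10, -0.9),
    Arc(1, 4, "records", 0.15, 0.70, -0.5),
    Arc(1, 3, "wreck", 0.15, 0.45, -1.6),
    Arc(3, 4, "or", 0.45, 0.70, -0.8),
    Arc(2, 4, "rack", 0.10, 0.70, -1.0),
    Arc(2, 4, "record", 0.10, 0.70, -1.8),
]
FINAL = 4


def all_paths(arcs, node, final):
    """Yield (tuple of words, total log score) for every path from node to final."""
    if node == final:
        yield (), 0.0
        return
    for a in arcs:
        if a.src == node:
            for words, score in all_paths(arcs, a.dst, final):
                yield (a.word,) + words, a.log_prob + score


def rescore(arcs, final, required_word, start=0):
    """Best sentence among paths containing required_word, or None if no path has it."""
    kept = [(w, s) for w, s in all_paths(arcs, start, final) if required_word in w]
    if not kept:
        return None  # word not in lattice: a real system would need a fallback
    best_words, _ = max(kept, key=lambda p: p[1])
    return " ".join(best_words)


print(rescore(ARCS, FINAL, "records"))  # unconstrained best path also contains it
print(rescore(ARCS, FINAL, "record"))   # correction forces a different path
```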
Editing • Allows user to add or edit text arbitrarily. • Must synchronize with correction server. • Edit vs. Correct is currently implemented modally, with on-screen push buttons. • Gestural interface for correcting and editing would be preferable.
Details... • Correction allows for words not in the lattice. • Tap-to-correct worked better than press-and-hold. • System updates text when user pauses. • Doesn’t handle punctuation, paragraphs, etc. • Correction is fast, but dictation is slow.
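The slides do not say how the system decides that the user has paused. A common approach, assumed here and not taken from the prototype, is to threshold per-frame audio energy. The text is updated once a run of quiet frames follows some speech. The threshold, the run length, and the function name `pause_points` are all hypothetical.

```python
def pause_points(frame_energies, threshold, min_pause_frames):
    """Return frame indices at which a pause is detected.

    A pause is min_pause_frames consecutive frames with energy below
    threshold that follow at least one speech frame since the last
    detected pause. Leading silence therefore triggers nothing.
    """
    points = []
    quiet_run = 0
    speech_since_update = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            quiet_run = 0
            speech_since_update = True
        else:
            quiet_run += 1
            if quiet_run == min_pause_frames and speech_since_update:
                points.append(i)  # moment to push an updated transcript to the GUI
                speech_since_update = False
    return points


energies = [0.1, 0.1, 0.1, 5, 6, 5, 0.1, 0.2, 0.1, 7, 8, 0.1, 0.1, 0.1, 0.1]
print(pause_points(energies, threshold=1.0, min_pause_frames=3))
```

Requiring speech since the last update keeps long silences from triggering repeated updates. A deployed endpointer would likely adapt the threshold to the background noise level.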
Future Work • “Real” user studies. • Experiment more with correction mechanisms. • Implement editing synchronization. • Implement gestures. • Move to wireless network and mic. • Add punctuation, paragraphs, etc.