
Summarization of Broadcast News using Speaker Tracking

Sree Harsha Yella, Kishore Prahallad, Vasudeva Varma. LTRC, IIIT-Hyderabad.


Presentation Transcript


  1. Summarization of Broadcast News using Speaker Tracking Sree Harsha Yella, Kishore Prahallad, Vasudeva Varma LTRC, IIIT-Hyderabad.

  2. Introduction • Summarization is the process of extracting the important information present in a medium, by either extraction or abstraction, and presenting it to the user in the desired manner. • Speech summarization systems take a speech signal as input and produce a summary in either text or speech form. [Diagram: speech signal → speech summarization → summary (text or speech)]

  3. Previous work • Broadly classified into two categories • Application of text summarization approaches such as MMR and LSA to ASR output [Diagram: speech signal → ASR → text → text summarization → summary] • Supervised systems using both lexical and acoustic features [Diagram: speech signal → ASR → lexical features; speech signal → signal processing → acoustic features; both feature sets feed a classifier, trained on manual summaries, that produces the summary]

  4. Previous work • Issues • Dependence on ASR output, and/or • Dependence on human reference summaries for training the classifier • The current work aims to summarize news without these issues by using anchor speaker tracking

  5. Scope and Aim • Scope • Broadcast news featuring an anchor speaker with reporters and other speakers taking turns to present a news story • Aim of current summarizer • Generate extractive audio summaries that are indicative or informative

  6. Motivation • Anchor speaker segments are precise, informative and well formed • They may form good candidates for extractive summarization • Key idea • Apply speaker tracking technology to track anchor speakers • Use the automatically identified anchor speaker segments for speech summarization

  7. Dataset used • Global News podcast of BBC News • Daily report of news stories from around the world • Single anchor speaker per show • 10 shows in total, with 5 anchor speakers • 3 male and 2 female

  8. Overview of system • Block diagram of the summarization system [Diagram: speech signal → feature extraction → anchor speaker tracking → post processing → final anchor segments → concatenation → summary]

  9. Speaker Tracking • Autoassociative neural network (AANN) • Identity mapping in the input space • Input vector is given as the desired output • Iterative training • 13 MFCCs for each speech frame • Initial 30 seconds of speech from the anchor speaker • Confidence measure (c[n]) based on the mean squared error (e[n]) • c[n] = exp(-e[n]), where n is the frame number
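The confidence computation on this slide can be illustrated with a minimal Python sketch. It assumes the confidence decays with the reconstruction error as c[n] = exp(-e[n]), so that frames the model reconstructs well score close to 1. The `reconstruct` argument is a hypothetical stand-in for a trained AANN, and the moving-average `smooth` helper is also an assumption, since the slides do not say how the contour is smoothed.

```python
import math


def frame_confidence(frames, reconstruct):
    """Return (errors, confidences), one value per frame.

    frames: list of feature vectors (e.g. 13 MFCCs per frame).
    reconstruct: hypothetical callable standing in for a trained AANN;
                 it maps a feature vector to its reconstruction.
    """
    errors, confidences = [], []
    for x in frames:
        y = reconstruct(x)
        # Mean squared error between the input and its reconstruction.
        e = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
        errors.append(e)
        # Assumed confidence measure: low error gives confidence near 1.
        confidences.append(math.exp(-e))
    return errors, confidences


def smooth(values, width=5):
    """Centered moving average; the window shrinks at the edges.

    The smoothing method is an assumption made for illustration.
    """
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out
```

A perfect reconstruction gives e[n] = 0 and therefore c[n] = 1, while speech from other speakers should reconstruct poorly and score lower.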

  10. Speaker Tracking • Smoothed confidence contour with anchor regions marked

  11. Speaker Tracking • Iterative training • Smoothed confidence contour is divided into non-overlapping segments of 5 sec each • Threshold is computed as mean of the confidence scores of the training data • Confidence scores of each segment are compared against the threshold • Identified segments are added to training data • The process is repeated until the model converges
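The iterative procedure above can be sketched roughly as follows. Several details are assumptions not stated on the slide: a segment is accepted when its mean confidence reaches the threshold, 500 frames make up a 5-second segment (a 10 ms frame shift), and convergence means that an iteration adds no new segments. The `train_and_score` callable is hypothetical; it stands for retraining the model on the current training data and rescoring the show.

```python
def select_anchor_segments(confidence, threshold, frames_per_segment=500):
    """Return indices of non-overlapping segments accepted as anchor speech.

    A segment is accepted when its mean confidence is >= threshold
    (assumed comparison rule). Trailing frames that do not fill a whole
    segment are ignored.
    """
    selected = []
    n_segments = len(confidence) // frames_per_segment
    for s in range(n_segments):
        seg = confidence[s * frames_per_segment:(s + 1) * frames_per_segment]
        if sum(seg) / len(seg) >= threshold:
            selected.append(s)
    return selected


def iterative_tracking(train_and_score, frames_per_segment=500, max_iters=20):
    """Grow the set of anchor segments until no new segment is added.

    train_and_score(segment_indices) is a hypothetical callable that
    retrains the model on the initial data plus the given segments and
    returns (training_confidences, smoothed_show_confidences).
    """
    anchor = set()
    for _ in range(max_iters):
        train_conf, show_conf = train_and_score(sorted(anchor))
        # Threshold: mean confidence score of the training data.
        threshold = sum(train_conf) / len(train_conf)
        found = set(select_anchor_segments(show_conf, threshold,
                                           frames_per_segment))
        updated = anchor | found
        if updated == anchor:  # assumed convergence test
            break
        anchor = updated
    return sorted(anchor)
```

The `max_iters` cap is only a safety limit for the sketch; the slides do not mention one.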

  12. Performance of Speaker Tracking

  13. Post processing • Missed segment identification • Unidentified segments having anchor speaker segments on either side • False alarm detection • Isolated segments without an anchor speaker segment in the neighbourhood of 10 sec • Final anchor speaker segments are obtained by adding the missed segments and removing the false alarms
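One possible reading of these two post-processing rules is sketched below in Python. It assumes the tracker output is a list of booleans, one per 5-second segment, that "on either side" means the two immediately adjacent segments, and that both rules are evaluated against the original labels before any change is applied. These choices are illustrative and are not confirmed by the slides.

```python
def post_process(labels, segment_sec=5, neighbourhood_sec=10):
    """Add missed anchor segments and remove isolated false alarms.

    labels: list of booleans, True where a segment was identified as
            anchor speech. Both rules read the original labels, so the
            order in which they are applied does not matter.
    """
    reach = neighbourhood_sec // segment_sec  # neighbourhood in segments
    n = len(labels)
    result = list(labels)
    for i in range(n):
        if not labels[i]:
            # Missed segment: anchor segments on both adjacent sides.
            if 0 < i < n - 1 and labels[i - 1] and labels[i + 1]:
                result[i] = True
        else:
            # False alarm: no other anchor segment within the neighbourhood.
            lo, hi = max(0, i - reach), min(n, i + reach + 1)
            neighbours = labels[lo:i] + labels[i + 1:hi]
            if not any(neighbours):
                result[i] = False
    return result
```

For example, in the label sequence True, False, True the middle segment is filled in, while a lone True surrounded by at least two False segments on each side is dropped.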

  14. Summary construction • Concatenation with compression • Compression ratio (cr) • Summary length (Sl) = cr * Tl, where Tl is the total length of the news show • Approximate number of news stories • Final anchor speaker regions (N) • Duration of each story in the summary (D) • D = Sl/N • Concatenation of the initial D seconds of speech from each news story
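The arithmetic on this slide can be worked through with a short Python sketch. It assumes each final anchor region is given as a (start, end) pair in seconds and marks the start of one news story. Clipping the excerpt at the end of a region shorter than D is an added assumption, and the function name is hypothetical.

```python
def build_summary(anchor_regions, total_length, cr):
    """Return the (start, end) excerpts to concatenate into the summary.

    anchor_regions: list of (start, end) times in seconds, one per story.
    total_length:   Tl, total length of the news show in seconds.
    cr:             compression ratio, e.g. 0.25 for a 25% summary.
    """
    if not anchor_regions:
        return []
    summary_length = cr * total_length               # Sl = cr * Tl
    per_story = summary_length / len(anchor_regions)  # D = Sl / N
    # Take the initial D seconds of each story, clipped to the region end
    # (the clipping is an assumption, not stated on the slide).
    return [(start, min(end, start + per_story))
            for start, end in anchor_regions]
```

With Tl = 1200 s, cr = 0.25 and N = 3 regions, Sl is 300 s and D is 100 s, so each story contributes at most its first 100 seconds.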

  15. Evaluation • Two types of evaluation • ROUGE-based evaluation • Measures the n-gram overlap between the reference summaries and the automatic summary • Human evaluation • Assesses the quality of the audio summaries
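To make the n-gram overlap idea concrete, here is a simplified Python sketch of ROUGE-N style recall, precision and F-measure against a single reference. It is not the official ROUGE toolkit: it uses plain whitespace tokenization and lowercasing, with no stemming, stop-word handling or multiple references, and the function names are chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f_measure) of n-gram overlap.

    Overlap uses clipped counts: each n-gram is matched at most as many
    times as it occurs in both texts.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure
```

Recall rewards covering more of the reference, so it tends to rise as the summary gets longer, while precision penalizes content that the reference does not contain.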

  16. Evaluation • ROUGE-based evaluation • Automatic audio summaries are transcribed by hand into text • Model summaries are written by humans for a 25% compression ratio

  17. Evaluation • Recall (solid line), Precision (dashed line), F-measure (dotted line) for various cr

  18. Evaluation • Human evaluation • Question based • Questions of the types what, where, who and when • Five undergraduate students listened to summaries at different compression ratios and answered the questions

  19. Conclusions • Proposed a method to generate automatic audio summaries of broadcast news • Good overlap between the reference summaries and the automatic summaries • Audio summaries showed an increase in recall with increasing compression ratio, without much drop in precision • Human evaluation of the audio summaries showed a similar trend

