
ROSIDS - Rapid Open Source Intelligence Deployment System

ROSIDS - Rapid Open Source Intelligence Deployment System. Mark P. Pfeiffer, SAIL LABS Technology AG, mark.pfeiffer@sail-technology.com, August 7, 2006. Open source intelligence is intelligence gathered from publicly accessible sources (TV, radio, newspapers, Internet...).


Presentation Transcript


  1. ROSIDS - Rapid Open Source Intelligence Deployment System Mark P. Pfeiffer, SAIL LABS Technology AG mark.pfeiffer@sail-technology.com August 7, 2006

  2. Open source intelligence IS • intelligence gathered from publicly accessible sources (TV, radio, newspapers, Internet...) • 85% of used intelligence is open source intelligence • OSINT is only a single-digit % of the intelligence budget

  3. Government - SAIL LABS Project • “A” Navy needed a reliable, robust, independent, maintenance-free, real-time and inexpensive open source intelligence (OSINT) tool for Arabic TV and radio … • and they needed it fast.

  4. 1st Step: Needs Assessment • “Need” • Closed caption insert • Time shift • 60 seconds (10-20s speech engine*, 10s translation, 30s safety buffer) • “Have” • SAIL LABS: reliable, real-time and robust ASR for Arabic • Sakhr: fast, reliable Arabic translation engine • * Due to the nature of languages themselves, the engine requires only 2s
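
The time-shift budget on slide 4 is a plain sum of the per-stage delays, sketched here in Python. The function and variable names are illustrative assumptions and not part of ROSIDS; only the per-stage figures (10-20s speech engine, 10s translation, 30s safety buffer) come from the slide.

```python
def time_shift_budget(asr_s: float, mt_s: float, buffer_s: float) -> float:
    """Total delay between broadcast and captioned output, in seconds."""
    return asr_s + mt_s + buffer_s


# Figures taken from the slide; names are hypothetical.
MT_S = 10          # translation
BUFFER_S = 30      # safety buffer

best_case = time_shift_budget(asr_s=10, mt_s=MT_S, buffer_s=BUFFER_S)
worst_case = time_shift_budget(asr_s=20, mt_s=MT_S, buffer_s=BUFFER_S)

print(f"time shift: {best_case}-{worst_case} seconds")
```

With the slower speech-engine figure the sum reaches the 60 seconds quoted as the required time shift; with the faster figure it is 50 seconds.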

  5. Result • They said one of our competitors could deliver in 30 days at very little cost! • We said: “Sorry, but we don’t want to disappoint a customer.”

  6. 2nd Step: 1 year later (366 days exactly) • The same Navy still needed a reliable, working, robust, independent, maintenance-free, real-time and inexpensive open source intelligence (OSINT) tool for Arabic TV and radio ... • fast (well, at least as quick as it works!)

  7. Result We decided to build and offer ROSIDS (Rapid Open Source Intelligence Deployment System)

  8. Building ROSIDS

  9. Building ROSIDS • Requires close work with • Someone who knows time shifting • Someone who knows ASR • Someone who knows translation technologies • Someone who knows how to put this all together

  10. Situational Awareness • International Crisis Management • Open Source Intelligence • Real-time Speech-to-Text (ASR) • Translation (MT) • ROSIDS: Arabic to English • Also to and from: Arabic, English, French, German, Greek, Polish, Spanish, …

  11. Schematic Layout • Inputs: Satellite, TV Antenna, Cable, Radio • Real-time, 30s latency • Sail Labs ROSIDS: Speech Recognition, Text Translation • Store in archive • Sail Labs Media Mining
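
The flow on slide 11 (broadcast input, speech recognition, text translation, archive) can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions: `Segment`, `process`, `fake_recognize` and `fake_translate` are hypothetical names, the two engines are stand-in stubs rather than the SAIL LABS or Sakhr engines, and only the 30s latency figure comes from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

LATENCY_S = 30.0  # real-time latency quoted on the slide


@dataclass
class Segment:
    source: str            # e.g. "satellite", "tv-antenna", "cable", "radio"
    captured_at_s: float   # capture time in seconds
    audio: bytes
    transcript: str = ""
    translation: str = ""
    available_at_s: float = 0.0


def process(
    segments: Iterable[Segment],
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    archive: List[Segment],
) -> List[Segment]:
    """Run each segment through ASR, then MT, then store it in the archive."""
    for seg in segments:
        seg.transcript = recognize(seg.audio)
        seg.translation = translate(seg.transcript)
        seg.available_at_s = seg.captured_at_s + LATENCY_S
        archive.append(seg)
    return archive


# Stand-in engines (assumptions): they only label their input.
def fake_recognize(audio: bytes) -> str:
    return f"<transcript of {len(audio)} bytes>"


def fake_translate(text: str) -> str:
    return f"<translation of {text}>"


archive = process(
    [Segment("satellite", 0.0, b"\x00" * 16000)],
    fake_recognize,
    fake_translate,
    [],
)
print(archive[0].translation, "available at", archive[0].available_at_s, "s")
```

Passing the engines in as callables keeps the sketch independent of any particular ASR or MT product.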

  12. Accuracy Hits • How do you make this thing readable? • ASR WER is 5-25% (depends on audio, domain, etc.) • Translation error rate is 5-30% (depends on source text) • Combined untreated error rate CAN GO ANYWHERE! • Context is much more important than WER!
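
For readers unfamiliar with the metric on slide 12, word error rate (WER) is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The Python sketch below computes it, and also shows what a naive "independent errors" combination of the slide's ASR and MT ranges would give. That combination formula is an assumption made purely for illustration; the slide's own point is that real combined error does not behave this predictably. All function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def naive_combined_error(asr_error: float, mt_error: float) -> float:
    """Illustrative only: assumes ASR and MT errors are independent."""
    return 1.0 - (1.0 - asr_error) * (1.0 - mt_error)


wer = word_error_rate("the navy needs a tool", "the navy needs tool")
low = naive_combined_error(0.05, 0.05)
high = naive_combined_error(0.25, 0.30)
print(f"WER: {wer:.2f}, naive combined range: {low:.4f} to {high:.4f}")
```

Even under this optimistic assumption the range is wide, and in practice one misrecognized word can change the meaning of an entire translated sentence, which is why the slide stresses context over WER.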

  13. Machine Translation (MT) • Traditional MT draws its sources from books • With MT + ASR, MT must assume unstructured, ungrammatical input with no syntax • New MT models were adapted to broadcast news

  14. context relation reference meaning No impact Impact of ASR and MT combined errors

  15. Remedies BAD RESULT source ASR domain MT vocab

  16. Remedies VERY BAD RESULT source domain MT ASR vocab

  17. Remedies VERY BAD RESULT MT source domain ASR vocab

  18. Remedies GOOD RESULT source ASR MT domain vocab

  19. Remedies GOOD RESULT domain MT source ASR vocab

  20. Remedies GOOD RESULT domain source MT ASR vocab

  21. Human vs Machine • It will always be necessary to get somebody who is familiar with the language, and even better with the cultural environment, to look at the relevant piece and decide what it means. • ROSIDS just helps a non-linguist decide when to get (wake) the analyst and when it is better to let him sleep!

  22. Mark P. Pfeiffer, SAIL LABS Technology AG mark.pfeiffer@sail-technology.com US cell: (571) 224 7275
