Rapid Overview of Large Volumes of Text

Presented by: Roger Bradford, 28 June 2005.



Presentation Transcript


  1. Rapid Overview of Large Volumes of Text Presented by: Roger Bradford, 28 June 2005

  2. Workflow Environment – Inserting Conceptual Processing Visualization Data Acquisition Systems Stored Data Users Content Analyst Alert Generation New Entity Identification Taxonomy Generation Categorization Entity Extraction Machine Translation Novelty Ranking Alias Detection Other Software

  3. Scenario • Analyst Reviewing Newly Acquired Data • Topic of Interest = A Specific Terrorist Group – Salafist Group for Call and Combat (GSPC)

  4. Data Used • 1M News Articles: 900K Analyzed, 100K New • Processing Time on a Single PC: 900K ~<24 Hrs, 100K ~<3 Hrs • NO Auxiliary Structures: Taxonomies, Ontologies, Grammars

  5. Chemical Nuclear Bioterrorism Rapid Information Overview Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  6. Alerts Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Chemical Nuclear Bioterrorism Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  7. Alert Generation Analysts Indicate High-interest Topics using Example Documents: "Documents like this Relate to Biological Warfare… Alert me if Anything Else like this Comes in" Xxxxxxxxx Xxxxxxxxx ..smallpox.. Xxxxxxxxx ..anthrax… Newly Acquired Document(s) Relevant to Designated Topic: Associated Country Highlighted
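
The alert-by-example idea on this slide can be sketched in a few lines. The slides do not say how the system measures "documents like this", so the sketch below substitutes plain bag-of-words cosine similarity as a stand-in; the function names (`term_vector`, `cosine`, `should_alert`), the sample texts, and the 0.3 threshold are all illustrative assumptions, not part of the presentation.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words counts (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_alert(new_doc, exemplars, threshold=0.3):
    """Alert when the new document is close enough to any exemplar.

    The threshold is a hypothetical tuning value.
    """
    vec = term_vector(new_doc)
    best = max((cosine(vec, term_vector(e)) for e in exemplars), default=0.0)
    return best >= threshold


# Hypothetical exemplars chosen by an analyst for a biological-warfare topic.
exemplars = [
    "Officials fear a smallpox release",
    "Anthrax spores were mailed to offices",
]

print(should_alert("Anthrax spores found in mailed letter", exemplars))  # True
print(should_alert("Stock markets closed higher today", exemplars))      # False
```

A word-overlap measure like this only fires on shared vocabulary. It would not, by itself, generalize from smallpox and anthrax exemplars to documents about cholera or Marburg virus, which is the behavior the Conceptual Generalization slide reports for the presented system.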

  8. Example – Top Priority (Red) = WMD-related Israel Warns of Nuke Development in Iran and Syria Israel has stepped up its warnings in recent weeks of the potential dangers posed by Iran and Syria. In particular, Israeli officials warn that Iran could soon develop nuclear weapons and they object to what they say were Syrian attempts to purchase upgraded missiles from Russia. Earlier this week, the chief of Israel's Mossad intelligence agency told a parliamentary committee that by the end of 2005 Iran will have the technology needed to develop nuclear weapons within several years after that. The head of Israel's parliamentary security and foreign affairs committee, Yuval Steinitz told VOA, the warnings are urgent. "The Iranian very ambitious nuclear program is combined with a very ambitious ballistic missile program and the real aim of this is not becoming a regional nuclear power, but a global nuclear superpower and if this will happen a dark curtain will cover the Middle East and Europe and the rest of the world." The U.N. nuclear watchdog agency, the I.A.E.A. has been looking into Iran's nuclear program, but has yet to find clear evidence that Tehran intends to make nuclear weapons. Iran's Foreign Ministry spokesman rejected the Israeli allegations and accused Israel of simply trying to divert attention from its own nuclear program. Israel is widely believed to have nuclear weapons, but will neither confirm nor deny their existence and has not signed on to the international nuclear non-proliferation treaty. Israeli officials have also recently raised alarm bells about what they say were plans by Syria to purchase upgraded weapons from Russia, in particular the SA-18 Igla shoulder-fired surface to air missile. The Israeli warning came prior to a visit to Moscow this week by Syrian President Bashar al-Assad. In Moscow, President Assad denied that the issue was under discussion, but also defended his country's right to buy what he called "defensive" weapons. 
Israeli security analyst, Shlomo Brum of Tel Aviv's Jaffee Center for Strategic Studies says the SA-18 missile is not really new. He says he inspected one of them 15 years ago. "When I was Israeli Defense Attache in South Africa and the South Africans had captured some of these missiles in Angola. We know it for the last 15 years and so I presume we are quite capable of developing countermeasures against Shlomo Brum says, however, such missiles in the hands of Syria or Syrian-supported terrorist groups such as Hizbollah would present a challenge though it would not likely change the balance of military power between Syria and Israel. But, Yuval Steinitz says there is information that Syria is also attempting to acquire nuclear capability.

  9. Example – Top Priority (Red) = WMD-related (Cont’d) Iraq Insurgents Seeking Chemical, Germ Weapons Insurgent networks across Iraq are increasingly trying to acquire and use toxic nerve gases, blister agents and germ weapons against U.S. and coalition forces, according to a CIA report. Investigators said one group recruited scientists and sought to prepare poisons over seven months before it was dismantled in June. U.S. officials say the threat is especially worrisome because leaders of the previously unknown group, which investigators dubbed the "Al Abud network," were based in Fallujah close to insurgents aligned with fugitive militant Abu Musab al-Zarqawi. The CIA says al-Zarqawi, who is blamed for numerous attacks on U.S. forces and beheadings of hostages, has long sought to get chemical and biological weapons for use against targets in Europe as well as Iraq. An exhaustive report released last week by Charles Duelfer, the CIA's chief weapons investigator in Iraq, concluded that deposed Iraqi President Saddam Hussein destroyed his stockpiles of chemical and biological weapons in the early 1990s and never tried to rebuild them. But a little-noticed section of the 960-page report warns that the danger of a "devastating" attack with unconventional weapons has grown since the U.S.-led invasion and occupation of Iraq last year. The Bush administration, which went to war primarily to disarm the Baghdad regime of suspected illicit stockpiles, has not previously disclosed that the insurgent groups that have emerged and steadily expanded since Saddam's ouster now are seeking to develop their own crude supplies of such deadly agents as mustard gas, ricin and the nerve gas tabun. Neither of the two chemists who worked for Al Abud had any ties to Saddam's long-defunct weapons programs, and Duelfer's investigators found no evidence that the group's poison project was part of a "prescribed plan by the former regime to fuel an insurgency."
For now, the leaders and financiers of the network "remain at large, and alleged chemical munitions remain unaccounted," the report said. It added that other insurgent groups are "planning or attempting to produce or acquire" chemical and biological agents throughout Iraq, and warned that the availability of chemicals and munitions, as well as sympathetic former Iraqi weapons scientists, "increases the future threat."

  10. Example – Top Priority (Red) = WMD-related (Cont’d) Iran Denies Smuggling Uranium from Monitored Site TEHRAN (Reuters) - Iran denied Wednesday a report that it may have secretly moved some sensitive nuclear material from a site being monitored by the U.N.'s atomic watchdog. Diplomats told Reuters Tuesday that the International Atomic Energy Agency (IAEA) was taking an inventory of processed uranium at the Isfahan uranium conversion facility in central Iran amid concerns some may have been moved. The diplomats said one intelligence agency had accused Iran of spiriting an unspecified amount of processed uranium, which could be processed further and enriched for weapons purposes, out of Isfahan to an unknown location. Iran says its nuclear program is exclusively for generating electricity, not making bombs. Foreign Ministry spokesman Hamid Reza Asefi denied Iran had smuggled any uranium out of Isfahan. "Our nuclear activities are transparent and under the supervision of the IAEA," the official IRNA news agency quoted him as saying. "Iran seeks nuclear technology for peaceful purposes. It would be meaningless for Iran to smuggle" uranium from Isfahan, he added. Iran agreed last November to freeze work at Isfahan and all other nuclear fuel-related activities while it tries to reach agreement with the European Union over the future of its nuclear program. Diplomats say Britain, Germany and France, who are leading the EU talks with Iran, are currently considering an Iranian proposal to allow Tehran to keep a pilot uranium enrichment facility under IAEA supervision. But Iran's chief nuclear negotiator Hassan Rohani on Wednesday denied that any limitations on Iran's nuclear program were being considered.

  11. Conceptual Generalization Alerts Generated Covered: Dhori Virus, Hemorrhagic Fever, West Nile Virus, Cholera, Marburg Virus, Typhoid, Rabies, Bovine Brucellosis, Rift Valley Fever, Diphtheria, Foot-and-mouth Disease. BW Alert Exemplars Mentioned: Smallpox, Plague, Anthrax

  12. Terminology Variant Clustering Djamel Beghal: Djamal Beghal, Jamal Begal, Jamal Beghal, Jamal Baghal, Djamel Begal, Djamel Baghal, Gemal Baqqal
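
As a rough illustration of grouping spelling variants of a name, the sketch below clusters names by character-level similarity using Python's standard `difflib.SequenceMatcher`. This is an assumed stand-in: the slides do not describe how the presented system clusters variants, and the 0.75 threshold and the greedy single-link grouping are arbitrary illustrative choices.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.75):
    """True when two names are orthographically close (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_variants(names, threshold=0.75):
    """Greedy single-link clustering: a name joins the first cluster
    that already contains at least one sufficiently similar member."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if any(similar(name, member, threshold) for member in cluster):
                cluster.append(name)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([name])
    return clusters


names = ["Djamel Beghal", "Djamal Beghal", "Jamal Begal",
         "Gemal Baqqal", "Mourad Merabet"]
for cluster in cluster_variants(names):
    print(cluster)
```

A purely orthographic measure is limited. A distant transliteration such as Gemal Baqqal may fall below any threshold that still keeps unrelated names apart, so a system that groups it with Djamel Beghal, as this slide shows, presumably relies on more than spelling.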

  13. Can deal Effectively with Text that is Highly Corrupted

  14. Taxonomic Overview Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Chemical Nuclear Bioterrorism Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  15. Taxonomy of New Documents

  16. Taxonomy (Enlarged)

  17. Taxonomy – Second Level

  18. Taxonomy – Individual Document Titles

  19. Drill Down to Individual Documents A Chechen lead has emerged in the case of discovery of ricin poison Antiterrorist police have focused on an alleged Algerian-dominated network whose operatives are believed to have received specialized training with biological and chemical weapons at Al Qaeda camps in the Russian republic of Chechnya. One of the suspected leaders is Abu Musab Zarqawi, a veteran terrorist who has operated in Iraq, according to US officials. In January, British police arrested suspected members of the so-called "Chechen network" during a raid on a makeshift ricin lab in London. That group was linked to cells previously dismantled in the Paris suburbs of Romainville and La Courneuve. The arrests were part of a crackdown in Britain, France, and Spain that may well have averted cyanide gas attacks on the Russian Embassy here and on the London subway, French officials said. France's interior minister said yesterday that the new ricin case probably involves the same network. "One can think that there are ties, without being certain, to the Al Qaeda movement and the teams that were arrested in Romainville and La Courneuve," Interior Minister Nicolas Sarkozy said. "But no information at our disposal leads us to affirm that France was targeted." ... One flask was labeled "X-4 Pakistan."

  20. Relevant and Unique Documents Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Chemical Nuclear Bioterrorism Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  21. Highlighting Novel Information Analyst’s Notes: Draft Report  Text Documents In Relevance Order Documents In Novelty Order Previously Reviewed Documents Lists Tables Previously Seen Information
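
To make the contrast between relevance order and novelty order concrete, here is a minimal sketch that ranks new documents by how much of their vocabulary the analyst has not yet seen. The scoring rule (fraction of unseen terms) and the names `terms`, `novelty`, and `rank_by_novelty` are assumptions for illustration only; the slides do not specify how novelty is computed.

```python
import re


def terms(text):
    """Set of lower-cased word tokens in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def novelty(doc, seen_terms):
    """Fraction of the document's terms not present in reviewed material."""
    doc_terms = terms(doc)
    if not doc_terms:
        return 0.0
    return len(doc_terms - seen_terms) / len(doc_terms)


def rank_by_novelty(new_docs, reviewed_docs):
    """Order new documents from most to least novel."""
    seen = set()
    for doc in reviewed_docs:
        seen |= terms(doc)
    return sorted(new_docs, key=lambda d: novelty(d, seen), reverse=True)


# Hypothetical snippets standing in for previously reviewed and new items.
reviewed = ["ricin lab found in London"]
incoming = [
    "ricin lab found in London flat",
    "dirty bomb plot arrests in Edgware",
]

print(rank_by_novelty(incoming, reviewed)[0])
# dirty bomb plot arrests in Edgware
```

In practice a novelty score would be combined with a relevance filter, so that only on-topic documents are ranked this way.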

  22. Novel Item UK: Roque Fernandes and Dominic Martins named in London dirty bomb arrests Three men accused of taking part in a terrorist plot were today remanded in custody. The men, charged with offences under the Terrorism Act, were arrested following information from the News of the World, which claimed it had uncovered a plan to supply radioactive material for a dirty bomb. Roque Flaviano Fernandes, 43, of De Havilland Road, Edgware; Dominic Agnello Martins, 44, of Du Cros Drive, Stanmore, both north-west London and Abdurahman Kanyare, 52, of Milling Road, also Edgware, appeared at a brief hearing at Horseferry Road Magistrates' Court. The court heard that all three faced charges under Section 17 of the Terrorism Act and are accused of having involvement in a plot to supply dangerous radioactive material to a third party between July and September this year.

  23. Relevant Term Highlighting

  24. Summarization • Although Gulf Islamists have long criticized their Algerian brothers for being too talkative and careless, they regard with respect militants of the GSPC [Salafist Group for Preaching and Combat], a breakaway group from the GIA [Armed Islamic Group], headed by Emir Hassan Hattab. • Despite the latest discoveries, they believe that London will remain a rear base, perhaps even a "terrorism laboratory" where explosives and chemicals destined for the continent are stored. • There were recently contacts in Algeria between Hassan Hattab and a Yemenite emissary of al-Qa'ida, Emad Abdelwahid Ahmed Alwan, killed this fall by government forces.

  25. Relevant Segment Identification ……….     "If I was thinking like al-Qaida, the prize would not be Mali or Chad, but Nigeria," said Lyman, listing its history of corruption, massive Muslim population and religious tensions as evidence.     Lyman said though he "gets mixed reports of al-Qaida's established presence in the area," Nigeria is ripe for ideological entrepreneurs who exploit poverty-stricken areas and popular discontent with governments.     Nigeria's 120 million people are almost evenly divided between the Muslim-dominated north and the Christian south, though much intermingling goes on. The adoption of strict Islamic law in some northern provinces has sparked outbreaks of violence in which thousands have died.     Many analysts fear the growth of Al-Sunnah wal Jama'ah, an extremist group that has attacked authorities and stirred sectarian unrest with the goal of creating a "Taliban-style" state in the north.     Believing West Africa to be the new front in the global war on terror, U.S. military officials have been engaged in the Pan-Sahel Initiative, a low-profile $7.75 million military program that began in 2004 to improve border security throughout the region.     PSI-trained troops of the Chadian army reportedly killed 43 GSPC militants in a cross-border firefight last March. Since then, the improved efforts of Algerian and Malian troops are said to have critically undermined GSPC activities. But more preventive measures are now under way.     The PSI will morph into the Trans-Saharan Counter Terrorism Initiative in June, the Pentagon announced this week. The beefed up $125 million initiative, which may draw more federal funding, will deploy U.S. special operations forces to train counterparts in Morocco, Algeria, Mauritania, Chad, Tunisia, Senegal, Nigeria, Mali and Niger and facilitate better collaboration on regional security.     
"Everyone agrees the holistic approach is most effective," said Crisis Group's McGovern, adding he was satisfied that Washington had reached a decision and moved to push it forward.     "West Africa is not a hotbed of terrorism, but a place we want to sure up so it doesn't become one in the future," he said. "I think that's worth the investment of a few hundred million dollars." ………… Relevant Segment

  26. Dealing with New Terminology Today MOX is widely used in Europe and is planned to be used in Japan. Currently over 30 reactors in Europe (Belgium, Switzerland, Germany and France) are using MOX and a further 20 have been licensed to do so. Japan also plans to use MOX in around a third of its reactors by 2010. Most reactors use it as about one third of their core, but some will accept up to 50% MOX assemblies. France aims to have all its 900 MWe series of reactors running with at least one third MOX. Japan aims to have one third of its reactors using MOX by 2010, and has

  27. Instant Context Display uranium-plutonium mixed-oxide bnfl reactor reactor's plutonium-uranium bnfl's nuclear-fuel vver Today MOX is widely used in Europe and is planned to be used in Japan. Currently over 30 reactors in Europe (Belgium, Switzerland, Germany and France) are using MOX and a further 20 have been licensed to do so. Japan also plans to use MOX in around a third of its reactors by 2010. Most reactors use it as about one third of their core, but some will accept up to 50% MOX assemblies. France aims to have all its 900 MWe series of reactors running with at least one third MOX. Japan aims to have one third of its reactors using MOX by 2010, and has

  28. Chemical Nuclear Bioterrorism New Entities Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  29. New Entities List Mourad Merabet Abdelouahab Djouba Venissieux network URSSAF mosque

  30. Most Relevant Terms Display Omar Teguer Vaux-en-velin Belmehel Beddaidj Maamar Bedderar Menad Benchellali Mohammed Naouars Venisseux Belmehel Venissuex Vennissieux network Mourad Merabet Abdelouahab Djouba Venissieux network URSSAF mosque

  31. New Entities – Exploration • Omar Teguer: GSPC Member, Romainville • Vaux-en-velin: Nearby French Town • Belmehel Beddaidj: GSPC Member, Romainville • Maamar Bedderar: GSPC Member, Romainville • Menad Benchellali: Al Qaeda-trained CW Expert • Mohammed Naouars: Terrorist Suspect Arrested in Lyons • Venisseux: French Town • Belmehel: Unidentified • Venissuex: French Town - Misspelled Vennissieux network

  32. Vennissieux Network Omar Teguer Maamar Bedderar Menad Benchellali Mohammed Naouars Belmehel Beddaidj

  33. Categorization Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Chemical Nuclear Bioterrorism Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  34. Chemical Biological Nuclear Categorization Approach Incoming Document Stream Content Analyst Exemplars Chosen by Analyst Xxxxxxxxx Xxxxxxxxx ..smallpox.. Xxxxxxxxx ..anthrax… Categorized Items
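
The exemplar-driven categorization on this slide can be sketched as a nearest-profile classifier: each category is represented by the pooled terms of its analyst-chosen exemplars, and an incoming document goes to the closest category. The bag-of-words cosine measure, the `min_score` cutoff, the function names, and the exemplar texts are illustrative assumptions; the slides do not state the actual representation or similarity measure.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words counts (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(exemplars_by_category):
    """Pool each category's exemplar documents into one term profile."""
    profiles = {}
    for category, docs in exemplars_by_category.items():
        profile = Counter()
        for doc in docs:
            profile.update(term_vector(doc))
        profiles[category] = profile
    return profiles


def categorize(doc, profiles, min_score=0.1):
    """Return the best-matching category, or None if nothing is close."""
    vec = term_vector(doc)
    best_category, best_score = None, 0.0
    for category, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= min_score else None


# Hypothetical exemplars for the three categories named on the slide.
profiles = build_profiles({
    "Chemical": ["sarin nerve gas attack", "mustard gas munitions"],
    "Biological": ["smallpox outbreak", "anthrax spores and ricin toxin"],
    "Nuclear": ["enriched uranium centrifuges", "plutonium reactor fuel"],
})

print(categorize("Traces of enriched uranium found in centrifuges", profiles))  # Nuclear
print(categorize("Football results", profiles))                                 # None
```

Returning `None` for weak matches keeps unrelated documents out of every category instead of forcing them into the least-bad one.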

  35. Categorization – Nuclear Example IAEA: Traces of enriched uranium found in Libya A leaked report from the International Atomic Energy Agency says that highly enriched uranium was found in centrifuges in Libya's nuclear facilities, a diplomat who has seen the document said Friday. The report was given to the 35 member countries of the IAEA's governing board ahead of a meeting scheduled for Tuesday to consider Libya's nuclear program. Details of the IAEA reports often are leaked after they have been handed to member countries. "The report says low and high enriched uranium was found in centrifuges in Libyan nuclear facilities but there is no determination as to the source of the uranium," the diplomat said. The diplomat, familiar with recent IAEA programs, said the findings were similar to recent controversial infringements found in centrifuges in Iran. Iran claimed the traces of uranium were on recently imported equipment obtained from the black market. "As far as is known, Libya has not enriched any uranium itself so a similar explanation may be feasible," the diplomat said. The diplomat said the report did not mean Libya had breached the commitments it recently made to end its weapons of mass destruction programs. Chemical Biological Nuclear

  36. Categorization – Biological Example Arrests foil new ricin poison plot POLICE and the intelligence services were on alert last night after the discovery of ricin and explosives revealed the extent of the threat to Britain from the expanding terror network in Europe. Spanish authorities launched dawn raids on 12 addresses in Barcelona and on the Costa Brava, following tip-offs from France and Britain. They discovered substantial quantities of the deadly toxic agent ricin. In Italy, police uncovered Semtex explosives. Both operations were linked to the discovery of traces of ricin in a house in London this month. Spanish police arrested at least 16 people and seized two barrels of ricin as well as bomb detonators and a number of false passports and credit cards. The Spanish operation involved more than 150 officers, while members of anti-terrorist forces from other countries, including Scotland Yard’s SO13, flew to Spain prior to the operation. The arrests - in Barcelona, Gerona and the Costa Brava resort town of Bagnols - came barely 24 hours after Italian police detained five terror suspects believed to have been plotting an attack on central London. The men, all of Moroccan origin, were arrested after 2.2lbs of plastic explosive and detailed maps of the London Underground were found in a farmhouse near Venice. Scotland Yard sources expressed concerns that the arrests of the terrorist suspects in Spain may point to a larger plot to target the two million British tourists who visit the Spanish Costas each summer. Gerona is the main air travel hub for tourists visiting the popular holiday resorts of Salou and Sitges. A Scotland Yard source said: "These arrests were directly linked to information we received following last weekend’s arrests in west London. The location of the Spanish arrests causes us obvious concern, especially when you consider how many British tourists travel through Gerona airport en route to the Costas. 
There is a large Arab population along Spain’s Mediterranean coast and the area’s geographical location makes it the perfect enclave for terrorist cells originating out of North Africa." It is understood that most of those detained yesterday were Algerian nationals known to be members of the Salafist Group for Combat and Preaching (GSPC), an Islamic fundamentalist extremist group active in Algeria’s bloody 11-year-old civil war and with close connections with al-Qaeda. Of the 200 men arrested by western intelligence agencies in Europe in the past 15 months on suspicion of terrorist offences, about 80 per cent have been of Algerian origin. Chemical Biological Nuclear

  37. Categorization – Chemical Example Israel fears chemical attack by Hamas suicide bombers Israeli Intelligence chiefs believe that Palestinian bomb-makers are trying to acquire lethal toxins to use in future suicide attacks. It is believed that leaders of the military wing of Hamas, the radical Islamist group, living abroad in Qatar, Syria and Jordan have decided to include chemical weapons in their arsenal. According to Israeli Intelligence, Hamas first added poisonous chemicals to home-made bombs in 1997. "They used pesticides and other poisons that are relatively easy to get hold of," a senior security source said. "The concern is that they are becoming ambitious and are trying to get hold of sarin and other nerve gases." Chemical Biological Nuclear

  38. Exemplar Document - GSPC The Salafist Group for Call and Combat

  39. Leveraging Analysis Artifacts GSPC Documents for Review Draft Report Tables Notes Lists

  40. Dynamic Exemplar Update • Dynamic Class Exemplar • Content Analyst • Documents for Review • Extracted Information Items • Analyst’s Working Notes • Automatic Updates to Classification Exemplar

  41. Possible Aliases Taxonomic Overview Incoming Document Stream Alerts Relevant And Unique Items Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Chemical Nuclear Bioterrorism Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim New Entities Categorized Items Possible Aliases

  42. Possible Aliases - List Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim

  43. Possible Alias #1 – Closest Names Laurent Mourad Djamel Beghal Djamel Loiseau Yacine Akhnouche Brahim Yadel Meroine Berrahal Slimane Khalafaoui Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim

  44. Closest Names to Laurent Mourad • Laurent Mourad • Djamel Beghal: Head of a GSPC Cell in Europe • Djamel Loiseau: Killed at Tora Bora • Yacine Akhnouche: Member of GSPC Cell in Frankfurt • Brahim Yadel: Held at Guantanamo • Meroine Berrahal: Had Contact with Two People now Held at Guantanamo • Slimane Khalafaoui: Main Correspondent for Al Qaeda Networks in England

  45. Possible Alias #2 – Closest Names • Menad Benchellali • Abdel Hakim • Imad Kanouni • Ridouane Khalid • Youcef Abou el Bassir • Djamel Beghal • Ammar Saaefi Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim

  46. Closest Names Exploration • Menad Benchellali • Abdel Hakim: Alias Used by Menad Benchellali • Imad Kanouni: Captured in Afghanistan with Benchellali’s Brother • Ridouane Khalid: Captured in Afghanistan • Youcef Abou el Bassir: Alias used by Mohamed Lounis, Head of External Relations for GSPC • Djamel Beghal: Head of a GSPC Cell in Europe • Ammar Saaefi: GSPC Commander in Eastern Algeria

  47. No Auxiliary Structures Required Taxonomies Ontologies Thesauri Grammars

  48. New Information Discovery [Two charts: Efficiency and Comprehensiveness, each plotted against Database Size (# of Documents) at 500K and 1M; vertical-axis ticks at 40%, 60%, 80%]

  49. Multilingual Processing Chemical Nuclear Bioterrorism Korean Russian Chinese Farsi Arabic Doc Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim

  50. Chemical Nuclear Bioterrorism Cross-lingual Capabilities Incoming Documents in: Content Analyst • Mourad Merabet • Abdelouahab Djouba • Venissieux network • URSSAF mosque English-Language Exemplars Chosen by Analyst Xxxxxxxxx Xxxxxxxxx ..smallpox.. Xxxxxxxxx ..anthrax… Laurent Mourad  Djamel Beghal Menad Benchellali  Abdel Hakim
