
Hidden Markov Models, Logistic Regression and Term Covering Algorithms

John M. Conroy and Judith D. Schlesinger, Institute for Defense Analyses, Center for Computing Sciences; Mary Ellen Okurowski, DOD; Dianne P. O’Leary


Presentation Transcript


  1. Hidden Markov Models, Logistic Regression and Term Covering Algorithms. John M. Conroy and Judith D. Schlesinger (Institute for Defense Analyses, Center for Computing Sciences); Mary Ellen Okurowski (DOD); Dianne P. O’Leary (Dept. of Computer Science & Institute for Advanced Computer Studies, Univ. of MD)

  2. Outline • Single Document Summaries • Pseudo-query terms • Tagging Data • HMM • Logistic Regression • Multi-Document Summaries • HMM to flag interesting sentences • Two token covering schemes to select sentences from reduced set: greedy and pivoted QR.

  3. Single Document Summaries: Pseudo-query Terms (one-tailed Z-test) • Build a unigram language model on the tokens of the entire DUC corpus. • Score the token frequencies of each document set against the unigram model. • Identify any token whose score exceeds 10 standard deviations as a “query term.” • To do: compare with Dunning 1993, Hovy & Lin 2000.
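The slide does not give the exact test statistic, so the following is only a minimal sketch of one plausible reading: treat each token's count in a document set as binomial with the corpus-wide unigram probability, and flag tokens whose count lies more than 10 standard deviations above its expected value. The function name `pseudo_query_terms`, its parameters, and the choice to skip tokens never seen in the corpus are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def pseudo_query_terms(set_tokens, corpus_tokens, threshold=10.0):
    """Return {token: z} for tokens over-represented in set_tokens.

    Assumption: counts are modeled as binomial under the corpus-wide
    unigram probabilities (normal approximation, one-tailed test).
    """
    corpus_counts = Counter(corpus_tokens)
    corpus_total = len(corpus_tokens)
    set_counts = Counter(set_tokens)
    n = len(set_tokens)

    terms = {}
    for token, count in set_counts.items():
        p = corpus_counts.get(token, 0) / corpus_total
        if p <= 0.0 or p >= 1.0:
            continue  # assumption: skip tokens with a degenerate probability
        expected = n * p
        std = math.sqrt(n * p * (1.0 - p))
        z = (count - expected) / std
        if z > threshold:  # one-tailed: only unusually frequent tokens
            terms[token] = z
    return terms
```

For example, a token that makes up 0.1% of the corpus but half of a 100-token document set scores far above the threshold, while a common token that is under-represented in the set gets a negative score and is ignored.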

  4. Hidden Markov Model [Figure: HMM state diagram; state labels: no, 1, no, 2, no, k, no]

  5. HMM Summarization Model • Hidden states: {summary, non-summary}. • Observations for each sentence: log(# tokens in sentence + 1) and log(# pseudo-query terms in sentence + 1). • Model output as a multivariate Gaussian. • The HMM γ’s give the posterior probability of each sentence being a summary sentence.
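As a rough illustration of how such posteriors can be computed, here is a sketch of the two observation features and a scaled forward-backward pass for an HMM with bivariate Gaussian emissions. It is a two-state simplification (summary, non-summary); the authors' model may use more states, and every parameter value passed in (start probabilities, transition matrix, means, covariances) is hypothetical rather than taken from the slides.

```python
import math


def features(sentence_tokens, query_terms):
    """Observation vector: (log(#tokens + 1), log(#query terms + 1))."""
    n_query = sum(1 for t in sentence_tokens if t in query_terms)
    return (math.log(len(sentence_tokens) + 1), math.log(n_query + 1))


def gaussian2(x, mean, cov):
    """Bivariate Gaussian density; cov is ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def posteriors(obs, start, trans, means, covs):
    """Scaled forward-backward.

    Returns gamma, where gamma[t][s] = P(state s at sentence t | all obs).
    """
    n, T = len(start), len(obs)
    emit = [[gaussian2(o, means[s], covs[s]) for s in range(n)] for o in obs]

    # Forward pass, rescaling each step to avoid underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [start[s] * emit[0][s] for s in range(n)]
        else:
            row = [
                emit[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
                for s in range(n)
            ]
        c = sum(row)
        alpha.append([v / c for v in row])
        scale.append(c)

    # Backward pass, using the same scaling factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(n):
            beta[t][s] = sum(
                trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in range(n)
            ) / scale[t + 1]

    gamma = []
    for t in range(T):
        row = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(row)
        gamma.append([v / z for v in row])
    return gamma
```

With state 0 taken as "summary", `gamma[t][0]` is the kind of per-sentence score that can be thresholded or ranked to pick extract sentences.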

  6. Logistic Regression • (V1) number of unique query terms in a sentence. • (V2) number of tokens (non-stop words) • (V3) log distance of a sentence from one with a query term • (V4) position of a sentence in document.

  7. More on Logistic Regression Score = $f(a + b_1 V_1 + b_2 V_2 + b_3 V_3 + b_4 V_4)$, where $f(x) = 1/(1 + e^{-x})$ is the logistic function.
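The score can be sketched in a few lines, assuming f is the standard logistic function. The intercept `a` and coefficients `b` would come from fitting on the tagged training data; no fitted values appear in the slides, so any numbers supplied to this function are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sentence_score(v, a, b):
    """Score = f(a + b1*V1 + b2*V2 + b3*V3 + b4*V4).

    v: feature values (V1..V4) for one sentence
    a: intercept; b: coefficients (b1..b4), assumed fitted elsewhere
    """
    return logistic(a + sum(bi * vi for bi, vi in zip(b, v)))
```

The result lies between 0 and 1, so it can be read as an estimated probability that the sentence belongs in the summary and compared across sentences.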

  8. Self Evaluation on Training Data • Automatic extracts from DUC abstracts were inadequate • Often missing key points. • Often too long. • Human extracts derived from abstracts. • 148 documents marked. • Extracts on average were 60% longer than abstracts.

  9. Precision Results for Training Data

  10. Multi-document Methods • Use HMM score to identify candidate sentences. • Select from reduced set with pivoted QR or greedy term covering. • Sort sentences by date and then within document.
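The two selection schemes can be sketched as follows, under stated assumptions: candidate sentences are already tokenized, the greedy scheme repeatedly takes the sentence covering the most still-uncovered terms within a word budget, and the QR scheme applies column pivoting to an unweighted term-by-sentence count matrix (implemented here with Gram-Schmidt projections in plain Python). The function names, the word-budget handling, and the absence of any term weighting are illustrative choices, not details given in the slides.

```python
import math


def greedy_cover(sentences, max_words):
    """Greedy term covering: pick sentences that add the most new terms."""
    uncovered = {t for s in sentences for t in s}
    remaining = set(range(len(sentences)))
    chosen, words = [], 0
    while uncovered and remaining:
        # Ties go to the lowest sentence index.
        best = max(sorted(remaining),
                   key=lambda i: len(uncovered & set(sentences[i])))
        if not uncovered & set(sentences[best]):
            break  # nothing left adds a new term
        remaining.discard(best)
        if words + len(sentences[best]) > max_words:
            continue  # too long for the budget; try the next best
        chosen.append(best)
        words += len(sentences[best])
        uncovered -= set(sentences[best])
    return chosen


def pivoted_qr_select(sentences, k):
    """Pick up to k sentences by QR with column pivoting.

    Each step takes the column (sentence) with the largest remaining norm,
    then removes its direction from every column, so later picks favor
    sentences whose terms are not already represented.
    """
    vocab = sorted({t for s in sentences for t in s})
    index = {t: i for i, t in enumerate(vocab)}
    cols = []
    for s in sentences:
        col = [0.0] * len(vocab)
        for t in s:
            col[index[t]] += 1.0
        cols.append(col)

    chosen = []
    for _ in range(min(k, len(cols))):
        norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
        best = max(range(len(cols)), key=lambda j: norms[j])
        if norms[best] < 1e-12:
            break  # every remaining column is already spanned
        chosen.append(best)
        q = [x / norms[best] for x in cols[best]]
        for j, c in enumerate(cols):
            dot = sum(qi * ci for qi, ci in zip(q, c))
            cols[j] = [ci - dot * qi for ci, qi in zip(c, q)]
    return chosen
```

Both functions return sentence indices in selection order; the final ordering step on the slide (by date, then by position within the document) would be applied to those indices afterwards.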

  11. Evaluating our Models on Training Data

  12. Cover: 200 words on d30 He had decided he could not comply with requirements of a consulting job he had accepted from the Agency for International Development, and he was scrambling to come up with a suitable substitute. The bulk of African debt is owed to official lenders under various aid agreements. The debts represent loans with a substantial grant element. The debts of African countries have often been cancelled or rescheduled, frequently several times for the same country. MR Lewis Preston, World Bank president, yesterday promised to strengthen the bank's efforts to reduce poverty in developing countries. ' We will look for specific increases in the share of lending going for these purposes.' Internationally, it appears that it will be even more difficult for economically troubled developing nations to attract new bank loans. But at the World Bank, Mr. Conable finds himself under fire. The bank has tried to help the countries by tiding them over with some new loans.

  13. QR: 200 words on d30 Foreign Minister Roberto de Abreau Sodre of Brazil told the opening session of the 42nd General Assembly that the Third World economic picture was dimming ``due to the lack of progress in international economic relations.'' ``It is ... sad to note that we, American, Asian, African brothers, still suffer from the same horrors and the same desolation which so badly affected our forebears,'' he said, adding, ``hunger ... is endemically spreading throughout the continents.'' What he foresees is a body blow to world poverty through bootstrap economics. The link between loans and poverty relief forms part of a new drive to make poverty alleviation the bank's central mission in the 1990s. The shift in priorities is also reflected in a commitment to make comprehensive assessments of the nature and extent of poverty in the third world, allowing the bank to design more effective policies to fight poverty. The figure, contained in the bank's annual report and made public before its annual meeting here Sept. 23, was almost a third larger than in 1987, when the net pay-back totaled $38.3 billion. Domestically, the move reflects the competitive advantage that regional banks with large loan-loss reserves have over their big brothers in such money centers as New York, Chicago and San Francisco. World Bank President Barber Conable was so well regarded during his 20-year career as a Republican congressman from New York that some journalists nicknamed him "H.R." -- for "highly respected."

  14. Conclusions • Systems are naïve. • No pronoun resolution. • Sentence extraction based. • Glitches • Buggy code for Multi-doc: Ouch!!! • Boilerplate not always filtered out. • Sentence boundaries not always found.

  15. "I will be brief. Not nearly so brief as Salvador Dali, who gave the world's shortest speech. He said, 'I will be so brief I have already finished,' and he sat down." - Edward O. Wilson
