Natural Language Processing Project

  1. Natural Language Processing Project, Spring 2011. Hong-Jie Dai, IIS, Academia Sinica

  2. Course Information I
     • Instructor: 戴鴻傑 (Hong-Jie Dai)
     • Email: hongjie@saturn.yzu.edu.tw
     • Office hours: Mon. 2:00-4:00 PM
     • Teaching assistants:
     • 蔡偉棋, Email: s986004@mail.yzu.edu.tw
     • 賴政佑, Email: s986038@mail.yzu.edu.tw
     • Office hours: Mon., Tue., Wed. 3:00-6:00 PM
     • Please confirm the time by email in advance.

  3. Course Information II
     • Grading:
     • Homework: 70%
     • Attendance: 30%
     • Answering questions, demos: bonus
     • Today:
     • Introduction to natural language processing
     • Applications of natural language processing
     • Project content and homework overview

  4. Natural Language Processing (NLP): processing (處理) natural language (自然語言)

  5. Natural Language
     • A language that is spoken, signed, or written by humans for general-purpose communication. (Wikipedia, http://en.wikipedia.org/wiki/Natural_language)
     • Mathematical expressions?
     • Computer programming languages?

  6. Natural language processing (NLP)
     • A field of computer science concerned with the interactions between computers and human (natural) languages. (Wikipedia, http://en.wikipedia.org/wiki/Natural_language_processing)

  7. IBM Watson System

  8. (no text)

  9. Jeopardy! Facebook game: http://apps.facebook.com/jeopardybygsn/?ref_source=61aap

  10. Natural language: vocabulary; syntax and semantics; life experience and concepts

  11. Goals of natural language processing (NLP)
     • Understand human language
     • Understand its syntax and semantics
     • Understand human concepts
     • The relations between concepts
     • Generate human language
     A subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. (Wikipedia, http://en.wikipedia.org/wiki/Natural_language_processing)

  12. Computer Strengths
     • Evaluating mathematical expressions
     • 2 + 3 = 5
     • ln((12,546,798 * ) ^ 2) / 34,567.46 = ?

  13. Computer Strengths
     • Evaluating mathematical expressions
     • 2 + 3 = 5
     • ln((12,546,798 * ) ^ 2) / 34,567.46 = 0.00885

  14. Computer Weakness
     • Natural language
     • Implicit
     • Highly contextual
     • Ambiguous
     • Often imprecise

  15. Importance
     • NLP is one of the ten emerging technology trends that will change the world (MIT Technology Review, 2001)
     • NLP is one of the twelve most important information technologies of 2000 to 2010 (Gartner Group, Nov. 2000)
     • NLP can speed up the research and development of new drugs (Gartner Group, April 2004)
     • One of the research directions of Microsoft Research Asia
     Universities on the list of NLP courses (Wikipedia, http://en.wikipedia.org/wiki/User:Stevenbird/List_of_NLP_Courses): Brown University; University of California, Berkeley; Carnegie Mellon University; Columbia University; Duke University; Georgetown University; Harvard University; Indiana University; Johns Hopkins University; New Mexico State University; New York University; National Chengchi University; National Taiwan University; National TsingHua University; University of Pennsylvania; …

  16. Applications of Natural Language Processing

  17. The first artificial intelligence software program: ELIZA

  18. ELIZA
     • A computer program and an early example (by modern standards) of primitive natural language processing.
     • The first prototype of the ELIZA program was written by Joseph Weizenbaum between 1964 and 1966. (Wikipedia, http://en.wikipedia.org/wiki/ELIZA)

  19. ELIZA in 1966
     • User: Men are all alike.
     • ELIZA: IN WHAT WAY
     • User: They're always bugging us about something or other.
     • ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
     • User: Well, my boyfriend made me come here.
     • ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
     • User: He says I'm depressed much of the time.
     • ELIZA: I AM SORRY TO HEAR YOU ARE DEPRESSED

  20. Intelligence?
     • The user's input is processed with string substitution or regular expressions:
     • s/.* all .*/IN WHAT WAY/
     • s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
     • s/.* I am (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
     • s/.* I am (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
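The substitution rules on this slide can be sketched in a few lines. The following is a minimal illustration in Python (chosen for brevity; the course itself uses Visual C++), not the original ELIZA implementation. The function name `respond`, the fallback reply `PLEASE GO ON`, the space padding around the input, and the choice of the first of the two `I am (depressed|sad)` responses are all assumptions made for this sketch.

```python
import re

# Rules taken from the slide, tried in order; the first match wins.
# The slide lists two responses for the "I am (depressed|sad)" pattern;
# this sketch keeps only the first one (assumption).
RULES = [
    (r".* all .*", "IN WHAT WAY"),
    (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (r".* I am (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),
]

# Hypothetical fallback; the slide does not say what happens when no rule matches.
DEFAULT_REPLY = "PLEASE GO ON"


def respond(user_input):
    """Return the reply of the first rule whose pattern matches the whole input."""
    # Pad with spaces so a keyword at the very start or end of the input
    # still has the surrounding spaces the patterns require (assumption).
    padded = f" {user_input} "
    for pattern, template in RULES:
        match = re.fullmatch(pattern, padded)
        if match:
            # expand() fills in \1; upper() mimics ELIZA's all-caps replies.
            return match.expand(template).upper()
    return DEFAULT_REPLY
```

Calling `respond("Men are all alike.")` selects the first rule. An input such as "He says I'm depressed much of the time." falls through to the fallback, because the contraction "I'm" does not match the literal `I am` in the pattern; the rules are pure string matching with no understanding of the sentence.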

  21. Question answering systems
     • http://asqa.iis.sinica.edu.tw/clqa2006/

  22. Named entity recognition
     Systems that automatically recognize person names, place names, organization names, times, and other kinds of named entities. http://ir.hit.edu.cn/demo/ltp/

  23. Gene names: named entity recognition
     http://asqa.iis.sinica.edu.tw/biocreative2/

  24. Syntactic and semantic parsing
     • Automatically analyze the syntactic structure of sentences. http://parser.iis.sinica.edu.tw/

  25. http://ir.hit.edu.cn/demo/ltp/

  26. Document classification
     • Classify documents according to their content
     • News classification
     • Spam email
     http://asqa.iis.sinica.edu.tw/biocreative2/

  27. Automatic summarization
     http://ir.hit.edu.cn/demo/ltp/

  28. Machine translation
     • Website: http://translate.google.com/translate_t
     • Google Language API: http://code.google.com/apis/ajaxlanguage/

  29. Chinese input methods
     http://iasl.iis.sinica.edu.tw/products/goingindex.htm

  30. Speech analysis
     http://mir.cs.nthu.edu.tw/demo.htm

  31. Speech synthesis
     • http://mir.cs.nthu.edu.tw/demo/TTS/default.asp

  32. What do they have in common?

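  33. Chinese word segmentation
     • For Chinese NLP, word segmentation is a basic but very important step:
     • 現在時刻下午八點整 ("The time is now exactly 8 PM")
     • 現在 時刻 下午 八點 整
     • Subsequent processing is then carried out on the segmented result.

  34. Chinese word segmentation
     • Named entity recognition:
     • Decide whether each word produced by segmentation is a named entity
     • Machine translation (Chinese to English):
     • Translate the segmented words into the corresponding English
     • Chinese input methods:
     • Analysis of competing word candidates (搶詞)
     • Speech synthesis:
     • Find suitable segments in a speech database and use signal processing techniques to synthesize the corresponding speech.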

  35. Project goals
     • Develop a Chinese word segmentation system
     • Longest-word-first algorithm (長詞優先)
     • Development tools:
     • Microsoft Visual C++
     • Googletest
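A longest-word-first segmenter can be sketched as follows. This is only an illustration, written in Python for brevity rather than in the Visual C++ the project calls for, and it assumes the common left-to-right (forward maximum matching) reading of "longest word first"; the slides do not specify the scanning direction. The function name `segment`, the `max_word_len` parameter, and the toy dictionary in the usage note are hypothetical.

```python
def segment(text, dictionary, max_word_len=None):
    """Split text into words, always preferring the longest dictionary word.

    Scans left to right (assumption). At each position it tries the longest
    candidate first and shrinks it until a dictionary word is found; a single
    character is accepted as a fallback even when it is not in the dictionary.
    """
    if max_word_len is None:
        # Longest dictionary entry, but never less than one character.
        max_word_len = max([1] + [len(word) for word in dictionary])

    words = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words
```

With a toy dictionary such as `{"現在", "時刻", "下午", "八點", "整"}`, calling `segment("現在時刻下午八點整", dictionary)` splits the example sentence from slide 33 into its five words. The quality of the output depends entirely on the dictionary, and greedy matching can pick a wrong split when a long word overlaps the start of the correct next word.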

  36. Homework
     • ELIZA
     • Include at least the following rules:
     • s/.* all .*/IN WHAT WAY/
     • s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
     • Add randomness:
     • s/.* I am sad .*/WHY DO YOU THINK YOU ARE SAD/
     • s/.* I am sad .*/I AM SORRY TO HEAR YOU ARE SAD/
     • Implement a tokenize function
     • For the detailed homework requirements, see the teaching assistants' slides.
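One possible shape for the homework is sketched below, in Python for brevity; the actual requirements are in the teaching assistants' slides and may differ. The behavior of `tokenize` (splitting into words and single punctuation marks), the `rng` parameter, and the fallback reply `PLEASE GO ON` are assumptions made for this illustration.

```python
import random
import re

# Patterns from the homework slide; a pattern with several replies
# gets one of them picked at random.
RULES = [
    (r".* all .*", ["IN WHAT WAY"]),
    (r".* always .*", ["CAN YOU THINK OF A SPECIFIC EXAMPLE"]),
    (r".* I am sad .*", ["WHY DO YOU THINK YOU ARE SAD",
                         "I AM SORRY TO HEAR YOU ARE SAD"]),
]


def tokenize(text):
    """Split text into word tokens and single punctuation marks (assumed behavior)."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def respond(text, rng=random):
    """Reply using the first matching rule, choosing randomly among its replies."""
    # Re-joining the tokens with spaces puts a space around every word and
    # punctuation mark, so "I am sad." still matches ".* I am sad .*".
    padded = " " + " ".join(tokenize(text)) + " "
    for pattern, replies in RULES:
        if re.fullmatch(pattern, padded):
            return rng.choice(replies)
    return "PLEASE GO ON"  # hypothetical fallback, not on the slide
```

Passing a seeded `random.Random` instance as `rng` makes the random choice reproducible, which is convenient when writing unit tests for the rules.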
