
An Introduction to Some Natural Language Processing and Data Mining Tools

An introduction to some natural language processing and data mining tools. TRẦN MAI VŨ. Vietnamese NLP Tools. JVnTextPro: http://sourceforge.net/projects/jvntextpro/ (sentence segmentation, sentence tokenization, word segmentation, POS tagging).


Presentation Transcript


  1. An Introduction to Some Natural Language Processing and Data Mining Tools. TRẦN MAI VŨ

  2. Vietnamese NLP Tools
     • JVnTextPro: http://sourceforge.net/projects/jvntextpro/
       • Sentence segmentation, sentence tokenization, word segmentation, POS tagging
     • VnToolkit: http://www.loria.fr/~lehong/softwares.php
       • Software for automatically extracting LTAGs* from treebanks
       • An automatic tagger for Vietnamese texts
       • A tokenizer for automatic word segmentation of Vietnamese texts
       • A sentence detector for automatically detecting sentences in Vietnamese texts
     • VLSP Tools: http://vlsp.vietlp.org:8080/demo/?page=resources
       • Vietnamese chunking
     (*) LTAG: Lexicalized Tree Adjoining Grammar

  3. NLP Tools
     • LingPipe: http://alias-i.com/lingpipe/
     • GATE - General Architecture for Text Engineering: http://gate.ac.uk/
     • Mallet - Machine Learning for Language Toolkit: http://mallet.cs.umass.edu/
     • MinorThird: http://sourceforge.net/projects/minorthird/
     • OpenNLP: http://opennlp.sourceforge.net/

  4. Preprocessing Tools
     • TextCat - Java Text Categorizing Library: http://textcat.sourceforge.net/
     • HTML Parser: http://htmlparser.sourceforge.net/
     • CyberNeko HTML Parser: http://nekohtml.sourceforge.net/
     • Crawler4J: http://code.google.com/p/crawler4j/
     • Lucene: http://lucene.apache.org/

  5. Other Tools
     • SVM-Light Support Vector Machine: http://svmlight.joachims.org/
     • CRF: http://crf.sourceforge.net/
     • Text Clustering Toolkit: http://mlg.ucd.ie/tct
     • A Java implementation of Latent Dirichlet Allocation (LDA) using Gibbs sampling for parameter estimation and inference: http://jgibblda.sourceforge.net/

  6. Data Mining Tools
     • Weka - Machine Learning Software in Java: http://sourceforge.net/projects/weka/
     • RapidMiner - Data Mining, ETL, OLAP, BI: http://sourceforge.net/projects/yale/
     • RSES - Rough Set Exploration System: http://logic.mimuw.edu.pl/~rses/

  7. Ontology Tools
     • The Protégé Ontology Editor and Knowledge Acquisition System: http://protege.stanford.edu/
     • Jena Semantic Web Framework: http://jena.sourceforge.net/
