The BioText Project, led by Marti Hearst at UC Berkeley and supported by NSF and Genentech, aims to enhance access to bioscience information through intelligent text analysis. By focusing on textual data from journal articles, the project integrates advanced search interfaces with databases and ontologies. Recent talks cover various topics, including the layered query language, gene function determination from text, and protein-protein interaction identification. The project seeks to provide flexible solutions for bioscience applications, leveraging cutting-edge computational linguistics.
The BioText Project: Recent Work
Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu
Supported by NSF DBI-0317510 and a gift from Genentech
Project Team
• Project Leaders
  • PI: Marti Hearst
  • Co-PI: Adam Arkin
• Computational Linguistics
  • Preslav Nakov
  • Emilia Stoica
  • Sarah Poon
• IR/Databases/Software
  • Ariel Schwartz
  • Itai Brickner
  • Brian Wolf
• Bioscience
  • Janice Hamer
• Alumni
  • Dr. Barbara Rosario
  • Dr. TingTing Zhang
  • Gaurav Bhalotia
BioText Project Goals
• Provide flexible, intelligent access to information for use in bioscience applications.
• Focus on
  • Textual information from journal articles
  • Tightly integrated with other resources
    • Ontologies
    • Record-based databases
BioText Architecture
• Sophisticated Text Analysis
• Annotations in Database
• Improved Search Interface
Today’s Talks
• Intro (Marti)
• Design and Implementation of the Layered Query Language (Ariel & Brian)
• Adding Fulltext to LQL (Itai)
• Determining Gene Function from Text (Emilia)
• Using the Web as an Implicit Training Corpus (Preslav)
• Identifying Protein-Protein Interactions (Marti, covering Barbara’s work)
• Citances (Marti)
• Discussion: what should our user interface do?
Recent Papers
• Predicting Gene Functions from Text Using a Cross-Species Approach, Emilia Stoica and Marti Hearst, to appear in PSB 2006.
• Multi-way Relation Classification: Application to Protein-Protein Interaction, Barbara Rosario and Marti Hearst, in HLT/EMNLP 2005.
• Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution, Preslav Nakov and Marti Hearst, in HLT/EMNLP 2005.
Recent Papers
• Scaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing, Preslav Nakov, Ariel Schwartz, Brian Wolf, and Marti Hearst, in ACL/ISMB SIGLINK 2005.
• Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing, Preslav Nakov and Marti Hearst, in CoNLL 2005.
• Citances: Citation Sentences for Semantic Analysis of Bioscience Text, Preslav Nakov, Ariel Schwartz, and Marti Hearst, in the SIGIR'04 workshop on Search and Discovery in Bioinformatics.