Trends in Web Search and its relevance to Digital Libraries
Min-Yen Kan
Web IR NLP Group (WING)
National University of Singapore
Tips on Web Searching
• Visualize results, then come up with multiple queries
• Use multiple search engines
• Advanced Search
  • inurl:, site:
  • “Phrasal search”
But that’s just general search…
• Federated resources / Niche search engines
26 Sep 2008
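The advanced operators on this slide can be combined mechanically. The following is a minimal sketch in Python, assuming the common `site:`, `inurl:` and quoted-phrase syntax. The helper names `build_query` and `query_variants` and the example terms are hypothetical, and operator support varies between search engines.

```python
def build_query(terms, phrase=None, site=None, inurl=None):
    """Compose a query string (hypothetical helper, not a search engine API)."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')      # exact ("phrasal") match
    if site:
        parts.append(f"site:{site}")     # restrict results to one site or domain
    if inurl:
        parts.append(f"inurl:{inurl}")   # require a word to appear in the URL
    return " ".join(parts)


def query_variants(terms, phrase, sites):
    """Produce several formulations of one information need, one per site."""
    return [build_query(terms, phrase=phrase, site=s) for s in sites]


if __name__ == "__main__":
    print(build_query(["trends"], phrase="digital libraries", site="nus.edu.sg"))
    for q in query_variants(["search"], "web ir", ["example.org", "example.edu"]):
        print(q)
```

Generating several variants up front matches the first tip: decide what a good result looks like, then try more than one query.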
Site- and Task-specific resources
• Site Prestige: know what others think and do
  • Google PageRank (Link structure), Alexa (Traffic)
  • Google Trends / Insight (Queries)
• Social Searching (Web 2.0): the voice of the reader / critic
  • (Bookmarks / Tags) Del.icio.us, Citeulike.org, Bibsonomy.org
  • (News) Digg / Slashdot
  • (Blogs) Google Blog, Technorati
• People Search: finding public information on a person
  • Spock (web), Zabasearch (US only)
  • LinkedIn, Facebook
  • Must validate your sources
http://labs.digg.com/arc/
Expert Search
Find people who will advocate on your behalf
• What do they want?
• Scholar:
  • Active? → Check their recent articles
  • Names common? → Define area of interest
  • Compare against peers
  • Download vs. citation counts
• Patent search:
  • Referenced by: (citation count; different than scholar)
• Identifying webfaced advocates:
  • Blog search, PageRank → Impact
http://flickr.com/photos/phauly/
• How do machines do it?
  • Expert search task as benchmark test
  • Download web pages to analyze
  • Needed to deal with spam pages
  • Used PageRank to assess prestige
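To make "used PageRank to assess prestige" concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not the specific system used in the expert search benchmark; the toy link graph, the damping factor of 0.85 and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration. `links` maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical toy web: three pages point at "c", and "c" points back at "a".
    toy = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, which is the sense in which link structure serves as a proxy for prestige.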
Revenue from print continually declining
Students and researchers rely on internet
Researchers want archiving rights: freedom of academic information
Characteristics:
  Not zero-sum content
  Distribution is now largely the role of search engines
  → Necessitates new role of publisher and new revenue model
Will classic models work? Advertising, Subscription, Transactional & Bundling
Variants? Versioning (Varian), Moving window (JSTOR)
Problem or opportunity? The game has fundamentally changed
http://flickr.com/photos/danielbroche/
Forecasting
– Content is becoming free
  MIT / Stanford opening up textbooks
  Open access archiving
  → Long term: content will not be primary revenue source
  eBook revenue hasn’t lived up to its promise yet…
  Device gap: iPhone and next-gen devices
  → Revenue may be further down the pipe
+ Academic publishers
  Connect to libraries and federations at institution level
  Individual customers are secondary
  Trusted source
  Expertise in copyediting, typesetting, project management, distribution, social networking
  Many individual web publishers rediscovering same problems
  → Consultancy model
  → Win-win partnerships with individual authors
Web Trends
Social Content
Wisdom of masses: Crowdsourcing
Rich Media
Open Source / Access
Paradigmatic change
  Classifieds → Craigslist
  POTS → Skype
  CD store → iTunes
  Publishers → ??
http://www.informationarchitects.jp/slash/iA_WebTrends_2007_2_1024_768.gif
Where is research going?
Server centric / User centric
• Search API usage
• Browser as computer
• Web page structure, mining text data
• Modeling web users at tasks: Exploring / Fact-finding
• Personalization, recommending
• Social networks
• Understanding opinion
• Query and log analysis
http://flickr.com/photos/alisdair/
WING@NUS
Webfaced pop quiz: which is which?
American Statistical Society
World Scientific
Springer
courtesy: http://pagerank.si/
Forecast: Know your strengths
Get advocates
  Make it easy for individuals to insist that their institution buy your materials
  Know who is accessing (not necessarily buying) your content
Content revenue will continue to decline
  Find an economic model that works for you
  Work as partners in content creation
Be savvy on trends
  Be visible: do “white hat” Search Engine Optimization (SEO)
  Make your abstracts indexable by others
+ Academic publishers
  Connect to libraries and federations at institution level
  Individual customers are secondary
  Trusted source
  Expertise in copyediting, typesetting, project management, distribution, social networking
  Many individual web publishers rediscovering same problems
  → Consultancy model
  → Win-win partnerships with individual authors
Trends in Digital Libraries >> WING @ NUS
• Expanding types of information in search
• Automated tools for DLs
• Usability in E-books and online media
• User modeling
• Personalization, annotation and relation to other user tasks
http://flickr.com/photos/pathfinderlinden
Scholarly Digital Libraries
• ForeCite: our scholarly DL
• Data Cleaning
• Slide and Document Alignment
• Searching in the OPAC
• Math Information Retrieval
ForeCite: Beyond the document as an item
Figure labels: Server, Client
• A user-centric DL framework
• Put author / reader functionality together
• Tagging, correction, annotation and viewing
• Automatic tools: keyphrases and sentence classification
• For use on and offline, organizes local PDF files for you
• Only need your web browser
Data Cleaning
Addresses
  Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802
  LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343
Products
  Honda Fit vs. Honda Jazz
  Apple iPod Nano 4GB vs. 4GB iPod nano 4GB
Idea: use web as additional context for disambiguation and clustering
Placed 3rd in Web People Search Task (WEPS 2007)
• Search results:
  • “Jeffrey D. Ullman”: 384,000 pages; with “aho”: 174,000 pages (45%)
  • “J. Ullman”: 124,000 pages; with “aho”: 41,000 pages (33%)
  • “Shimon Ullman”: 27,300 pages; with “aho”: 66 pages (0%)
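The percentages on this slide are the share of a name's pages that also mention the context term “aho”. A minimal sketch of that arithmetic in Python, using the page counts from the slide; the function names and the 0.2 decision threshold are hypothetical and are not taken from the WING system.

```python
# Page counts copied from the slide: (name alone, name together with "aho").
HIT_COUNTS = {
    "Jeffrey D. Ullman": (384_000, 174_000),
    "J. Ullman": (124_000, 41_000),
    "Shimon Ullman": (27_300, 66),
}


def cooccurrence_ratio(name_hits, joint_hits):
    """Fraction of the name's pages that also mention the context term."""
    if name_hits == 0:
        return 0.0
    return joint_hits / name_hits


def ratios_percent(hit_counts):
    """Rounded percentages, as displayed on the slide."""
    return {
        name: round(100 * cooccurrence_ratio(alone, joint))
        for name, (alone, joint) in hit_counts.items()
    }


def likely_same_entity(hit_counts, threshold=0.2):
    """Names whose ratio clears a hypothetical threshold (0.2 is an assumption)."""
    return [
        name
        for name, (alone, joint) in hit_counts.items()
        if cooccurrence_ratio(alone, joint) >= threshold
    ]


if __name__ == "__main__":
    print(ratios_percent(HIT_COUNTS))
    print(likely_same_entity(HIT_COUNTS))
```

Under these assumptions the two “Ullman” variants that co-occur with “aho” group together, while “Shimon Ullman” does not, which is the disambiguation signal the slide points at.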
Slides and their relationship to documents
Document in focus
Slides in Focus
Searching in Libraries
http://linc.comp.nus.edu.sg
Symbolic Information Search
How do users want to search math materials?
Our answer: Text-to-Expression Linking
• Resolve text keywords to expressions
  • e.g., “Pythagorean Theorem” → “$a^2+b^2=c^2$” or “$x^2+y^2=z^2$”
• Reduce the need for expression input
• Solves the notational variation problem
Not quite right…
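One way to picture text-to-expression linking is as a lookup from a normalized keyword to every known notational variant of the expression. The sketch below is only an illustration under that assumption: the table, its single entry and the function names are hypothetical, and the slide does not describe how the actual system builds or stores these links.

```python
# Hypothetical keyword-to-expression table. A real system would need a much
# larger resource, whether curated by hand or mined from documents.
EXPRESSIONS = {
    "pythagorean theorem": ["a^2 + b^2 = c^2", "x^2 + y^2 = z^2"],
}


def normalize(keyword):
    """Lowercase and collapse whitespace so lookups ignore surface variation."""
    return " ".join(keyword.lower().split())


def link_text_to_expressions(keyword, table=EXPRESSIONS):
    """Return all notational variants linked to a text keyword (may be empty)."""
    return list(table.get(normalize(keyword), []))


if __name__ == "__main__":
    print(link_text_to_expressions("Pythagorean Theorem"))
    print(link_text_to_expressions("unknown identity"))
```

Returning every variant for one keyword is what lets a user type words instead of an expression and still reach documents that use a different notation.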
Conclusions
• Consider us your research WING!
• Trade data and problems for solutions and interns
Meanwhile:
• Use better search strategies
• Practice white hat SEO
• Identify webfaced advocates
References
• Kahin and Varian (2000) Internet Publishing and Beyond
• Towle et al. (2007) Electronic Books in the 2003-2005 Period, Pub Res Q 23:95-104
Photo Credits
• Flickr Creative Commons Search
Thanks to all of you for listening & my fellow WING group members
Abstract
• I will present trends in current academic research on web search and digital libraries, and discuss their relevance to publishers and their economic model. With respect to the web, I will cover how search engines are starting to specialize and use click-through and ad data to improve relevance ranking. With respect to digital library research, I discuss my group's research at NUS on advancing the state of the art in scholarly digital libraries. I cover advances in how we deal with data cleaning issues, and slide and equation retrieval and alignment.