The Networked Economy (10): Information Management, Strategy, and Innovation
Search
Search: Key Points
• Technology
• Index
• Crawl (or spider)
• Speed
• Store everything, so you need search; for search to be fast, you need to build an index
• Trade-off: Results are very fast, but pre-computing and storage are needed
• Relevance (algorithmic)
• Information production, information finding, information filtering / ranking
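The index trade-off on this slide can be illustrated with a minimal sketch of an inverted index. The function names `build_index` and `search` and the three sample documents are assumptions made for illustration, not part of the lecture. Building the index is the pre-computing and storage cost; answering a query then only needs a few set lookups.

```python
from collections import defaultdict

def build_index(docs):
    """Pre-compute a mapping from each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents.
docs = {
    1: "search engines crawl the web",
    2: "an index makes search fast",
    3: "directories are organized manually",
}
index = build_index(docs)
print(search(index, "search fast"))
```

Without the index, every query would have to scan every stored document; with it, the scan happens once, ahead of time.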
Desktop search
• Money or attention?
• Pay with money: Buy software (e.g., X1 in 2005)
• Pay with attention: Yahoo, Google, MSN search toolbars
• Sites understand user behavior and situation better, so they can target ads better
Search
• Desktop
• Build index
• Intranet is similar
• Web
• How to find information, products, etc. on the web?
Find without search?
• 1. GUESS
• If you know the location on the web (URL)
• 2. BROWSE
• Use directories
• Organize manually using expert “surfers”
• Does not scale: Manual directories are difficult to maintain!
• Organize using community of web users
URL (Uniform Resource Locator, e.g., http://www.ceibs.edu) is the address that identifies the web page
Crawl
• Early search engines
• Basic idea
• Crawl through the web, following hyperlinks*
• Extract words from each page
• Then build an index of the web
• Match user input (search terms) to the index
• *Example: <a href="http://www.weigend.com/">Home of my professor.</a>
Hyperlink: Takes you to another page when you click on it
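The crawl steps above (follow hyperlinks, extract words, build an index) can be sketched roughly as follows. This is a toy under stated assumptions: the "web" is an in-memory dictionary named `PAGES` with made-up example.test URLs, so nothing is fetched over the network, and the names `PageParser` and `crawl` are illustrative. A real crawler would also need politeness rules, error handling, and URL normalization.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects hyperlink targets and visible words from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(re.findall(r"\w+", data.lower()))

def crawl(pages, start):
    """Breadth-first crawl over `pages` (url -> HTML); returns word -> set of urls."""
    index = defaultdict(set)
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # the link leads outside the toy web
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

# Hypothetical two-page web; the first page links to the second.
PAGES = {
    "http://example.test/a": '<p>Course notes</p><a href="http://example.test/b">Home of my professor.</a>',
    "http://example.test/b": "<p>Welcome to my home page</p>",
}
index = crawl(PAGES, "http://example.test/a")
print(sorted(index["home"]))
```

The second page is only discovered because the first page links to it, which is the core idea behind crawling.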
Relevance (in organic search results)
New problem: Relevance
• How to rank the pages? What to show on top?
• What information can be used to help with this decision?
• A) Within page
• Location of search term on page
• Number of occurrences of the search term on page
• Metatags
• B) Static: Link structure
• E.g., number of hyperlinks going into page
• Leverages other websites
• C) Dynamic: Click behavior
• Choice within set of links
• Action: Move results up or down
• Understand overall trajectory (e.g., for typos)
• Q: What information does the user see?
• Leverages users
• Example: Google search for “weigend”
• 309,000 results returned for weigend, in 0.2 seconds
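One way to picture how the three kinds of signal (within page, link structure, click behavior) might be combined is a simple weighted score. The formula, the weights, and the sample data below are assumptions chosen for illustration only; the slide does not say how any search engine actually weighs these signals.

```python
import math

def score(page, term, inlinks, clicks, weights=(1.0, 1.0, 1.0)):
    """Combine on-page, link, and click signals into one number (illustrative formula)."""
    words = page["text"].lower().split()
    term = term.lower()
    occurrences = words.count(term)
    if occurrences == 0:
        return 0.0
    # A) Within page: number of occurrences plus a bonus for an early first occurrence.
    position_bonus = 1.0 - words.index(term) / len(words)
    on_page = occurrences + position_bonus
    w_page, w_links, w_clicks = weights
    # B) Static link count and C) dynamic click count, both damped with log1p.
    return (w_page * on_page
            + w_links * math.log1p(inlinks.get(page["url"], 0))
            + w_clicks * math.log1p(clicks.get(page["url"], 0)))

def rank(pages, term, inlinks, clicks):
    """Return urls of matching pages, best score first."""
    scored = [(score(p, term, inlinks, clicks), p["url"]) for p in pages]
    return [url for s, url in sorted(scored, reverse=True) if s > 0]

# Hypothetical pages and signals.
pages = [
    {"url": "a", "text": "weigend teaches data courses"},
    {"url": "b", "text": "notes from a course by weigend"},
    {"url": "c", "text": "unrelated page"},
]
inlinks = {"a": 50, "b": 2}
clicks = {"a": 10, "b": 30}
print(rank(pages, "weigend", inlinks, clicks))
```

Changing the click counts can move a result up or down without touching the page itself, which mirrors the "Action" bullet above.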
Business models of search (sponsored search etc.)
• Search is a necessary competence…
• Has become entry point to everything (or at least key necessity)
• Customers have become empowered
• Customers get smarter. Can’t fool them any more. Transparency empowers them, too
• Power of community
• Other examples of search
• Product search
• Books
Search Inside the Book (Amazon.com, 2003)
Search statistics
• 1 billion searches per day (January 2005 estimate)
• 0.3 billion searches per day (January 2003)
• Search statistics (January 2003) (searchenginewatch.com)
• Unique users per month (Google, June 2003)
• 81.9 million
• (Nielsen/NetRatings)
Vertical search
• Many internet businesses are essentially vertical search
• Limitations of horizontal search?
• Complexity of products and services
• Domain knowledge
• Information often in deep web, not in surface web
• Travel
• Aggregation: Intermediation and disintermediation
Vertical search
• Shopping comparison
• Initially: Spider sites, e.g., Amazon.com
• What should be Amazon’s response?
• Should they make it hard or easy?
• Now feeds and web services (a win-win)
• Business models for shopping comparison engines
Vertical search
Insurance comparison
• Market structure: Often through agents; health insurance often through employment
• Essentially an information good
• Still a long way to go
Vertical search
• Cars
• 70% of customers do research on the web before going to the dealer
• Challenge: Dealers don’t think of their business as e-business
• Huge advertising budget, need to move to mixed-channel marketing
• Basically, the car market can also be seen as vertical search
Vertical search
• Real estate
• Large part of the economy; most expensive purchase for most people
• Market structure: 6% commission
• But: Real estate also is essentially an information search problem
Vertical search
• Music
• Information sources
• Human ratings
• Metadata (composer etc.)
• Machine analysis
• Payment
• From buy (possession) to rent (subscription)
• China piracy rate: 92% of consumers are using pirated materials
Local search
• Total market size: $90 billion (CitySearch)
• Technology
• Know location of user via IP address or registration
• Mobile: LBS (location-based services)
People search
• Dating and social networking sites
• Note: Social networking companies are purely an information play
• Network effects are key
• The product is the customer
• The buyers are the inventory
• Online dating platforms
Other examples
• Craigslist
• No real-time chat
• Local markets (San Francisco, New York, etc.)
• Monetization
• Only people who post jobs pay
• Genealogy
• Amazing stories of people finding relatives
Personalized search
• Explicit
• “Customization”
• User tells interests explicitly
• Implicit
• Based on user’s past behavior
• Needs persistent history
• Problem: Multiple personalities
• A9, Google giving access to entire search history on their platforms
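Implicit personalization, as described above, could look roughly like this: keep a persistent history of what the user searched for or clicked, and use it to reorder results. The functions `build_profile` and `personalize`, the word-overlap scoring, and the sample history are all assumptions for illustration; they are not taken from A9 or Google.

```python
from collections import Counter

def build_profile(history):
    """Count the words in the user's past queries or clicked titles."""
    profile = Counter()
    for text in history:
        profile.update(text.lower().split())
    return profile

def personalize(results, profile):
    """Reorder result titles so those sharing words with the history come first."""
    def affinity(title):
        return sum(profile[word] for word in set(title.lower().split()))
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=affinity, reverse=True)

# Hypothetical history and results for an ambiguous query.
history = ["jaguar car dealer", "used car prices"]
results = [
    "Jaguar habitat and diet",
    "Jaguar car reviews",
    "Jaguar operating system",
]
profile = build_profile(history)
print(personalize(results, profile))
```

The "multiple personalities" problem shows up directly here: if two people share one history, or one person searches for work and for leisure, the profile mixes interests and the reordering can mislead.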
Relevance is everything
• The Search Paradigm
• 2.4 words, a few clicks, and done
• Only possible if results are relevant
• Relevance is ‘speed’
• Time from task initiation to resolution
• Important factors:
• Location of useful result
• UI clutter
• Latency
• Relevance is relative
• Context dependent
• E.g., ‘football’ in the UK vs. the US
• Task dependent
• E.g., ‘mafia’ when shopping vs. researching
Relevance is hard to measure
• Poorly defined, subjective notion
• Depends on task, user context, etc.
• Analysts have focused on surrogates that are easier to measure
• Index size, traffic, speed
• Anecdotal relevance tests
• E.g., vanity queries
• Methodology important
• Averaged over queries
• Averaged over users
• Development cycle: Tune ranking, evaluate metrics
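Averaging a relevance measure over queries could be sketched as below. Precision at k is used here as one common, easy-to-compute measure; the slide does not name a specific metric, and the relevance judgments in the example are invented for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that were judged relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)

def mean_precision(runs, judgments, k=3):
    """Average precision@k over all queries in `runs` (query -> ranked result list)."""
    scores = [
        precision_at_k(ranked, judgments.get(query, set()), k)
        for query, ranked in runs.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical rankings and human relevance judgments.
runs = {
    "weigend": ["a", "b", "c"],
    "football": ["x", "y", "z"],
}
judgments = {
    "weigend": {"a", "c"},
    "football": {"y"},
}
print(mean_precision(runs, judgments))
```

The same averaging step can be repeated over users by collecting one set of judgments per user, which guards against a single vanity query or a single person's taste dominating the result.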
User interface
• Relevance-ranked result lists
• Document summaries are critical
• Hit highlighting
• Dynamic abstracts
• Assisted search
• Spell correction
• Specialized indices
• Via tabs
• Blended results
• Multiple sources
• Predefined segmentation
• E.g., paid listing
• Intermixed with results from other sources
• E.g., news
• Localization
• Country language experience
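Two of the interface features above, hit highlighting and spell correction, can be sketched with the Python standard library. The `**` marker, the 0.8 similarity cutoff, and the tiny vocabulary are assumptions for illustration; production systems typically derive suggestions from query logs rather than a fixed word list.

```python
import difflib
import re

def highlight(summary, term, marker="**"):
    """Wrap each whole-word, case-insensitive occurrence of `term` in `marker`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: marker + m.group(0) + marker, summary)

def suggest(term, vocabulary):
    """Return the closest known word for a possibly misspelled term, or None."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None

# Hypothetical vocabulary of known index terms.
vocabulary = ["technology", "index", "crawl"]
print(highlight("Search engines make search fast", "search"))
print(suggest("technolgoy", vocabulary))
```

Highlighting helps the user judge a summary at a glance, and a suggestion lets the engine recover from a typo instead of returning an empty result list.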
Future Trends
• Question answering
• New contexts
• Ubiquitous searching
• Toolbars, desktop, phone
• Implicit searching
• Computed links
• New tasks
• E.g., local search
Search: Summary
• No longer about filing and organizing, but about searching
• Whether it’s about your email or knowledge in your company
• And then about ranking / sorting / relevance
• Why does search replace directories / categories?
• Can be done automatically, in contrast to manual categories