How to build a better Google? Adam Bak IST 497E November 21, 2002
Google Timeline • 1995 • March-December – Ph.D. candidates Sergey Brin and Larry Page meet at Stanford University and discuss ideas about new search technology • 1996-1997 • January 1996-December – Brin and Page create BackRub
Google Timeline • 1998 • August-December – Sergey and Larry raise one million dollars in funding and create Google Corporation • 10,000 search queries per day • 1999 • February-June – 500,000 search queries per day
Google Timeline • 1999 • August-December – 3 million searches per day • 2000 • May-June – 18 million search queries per day • November-December – 60 million searches per day
Google Timeline • 2002 • May – 150 million searches per day
Google’s Current Technology • PageRank • Does not simply count direct links; it also weighs the importance of the pages that link in • Page A would have a lower rank if pages B and C, which link to it, did not have a high weighting
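The idea that a page's rank depends on the weight of the pages linking to it can be sketched with a small power-iteration PageRank. This is a minimal illustration, not Google's implementation: the damping factor of 0.85, the iteration count, and the two toy graphs are assumptions that do not appear in the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    damping=0.85 and iterations=50 are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# In both graphs B and C link to A. Only in `strong` do D, E and F link to B and C.
strong = {"A": [], "B": ["A"], "C": ["A"],
          "D": ["B", "C"], "E": ["B", "C"], "F": ["B", "C"]}
weak = {"A": [], "B": ["A"], "C": ["A"],
        "D": ["E"], "E": ["F"], "F": ["D"]}

print("A with well-linked B and C:", round(pagerank(strong)["A"], 3))
print("A with poorly linked B and C:", round(pagerank(weak)["A"], 3))
```

Page A has the same two incoming links in both graphs, yet it ranks higher in the first one because B and C themselves carry more weight there.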
Google’s Current Technology • Hypertext-Matching Analysis • Font size – The larger and bolder the fonts, the higher the weights • Capitalization – Higher weights • Relative Distance – Query words that appear close together on the page weigh more (example: “Peanut Butter”)
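The relative-distance signal can be illustrated with a small proximity score. This is only a sketch under stated assumptions: the function names and the 1/distance formula are hypothetical, and the font-size and capitalization weights from the slide are not modeled.

```python
import re


def min_distance(words, first, second):
    """Smallest gap, in word positions, between two terms (None if either is missing)."""
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    if not pos_first or not pos_second:
        return None
    return min(abs(i - j) for i in pos_first for j in pos_second)


def proximity_score(text, first, second):
    """Hypothetical score: 1.0 for adjacent terms, smaller as they drift apart."""
    words = re.findall(r"\w+", text.lower())
    distance = min_distance(words, first.lower(), second.lower())
    if distance is None:
        return 0.0
    # max(..., 1) guards against a zero distance when both terms are the same word.
    return 1.0 / max(distance, 1)


print(proximity_score("Peanut butter sandwich", "peanut", "butter"))
print(proximity_score("Peanut oil and cocoa butter", "peanut", "butter"))
```

A page where "peanut" and "butter" sit side by side scores higher than one where the two words merely appear somewhere in the same sentence.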
Google’s Search Capabilities • Images • Usenet • Search by language • File Types (key word filetype:) • News (new feature)
Google’s Key Words • cache: Will retrieve the page that Google has stored in its cache • link: Will display pages that link to the given page • related: Will display pages that are similar to the specified page • info: Will show information about a particular page
Google’s Key Words • stocks: Will treat the query terms as stock ticker symbols • site: Will restrict the search to the given domain • allintitle: Will search words found only in the title • intitle: Will display results with the first word appearing in the title • allinurl: Will search words found only in the URL • inurl: Will display results with the first word appearing in the URL
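To make the keyword syntax concrete, the sketch below splits a query string into keyword restrictions and ordinary search terms. It is an illustration only and not how Google parses queries: `parse_query` is a hypothetical helper, and it attaches each keyword to a single word, so the "all..." variants that apply to every following word are not modeled.

```python
# Keywords listed on the slides, plus filetype: from the search capabilities slide.
OPERATORS = {"cache", "link", "related", "info", "stocks", "site",
             "allintitle", "intitle", "allinurl", "inurl", "filetype"}


def parse_query(query):
    """Split a query into ({keyword: value}, [plain terms])."""
    operators = {}
    terms = []
    for token in query.split():
        # partition splits at the first colon only, so URLs keep their "://".
        name, colon, value = token.partition(":")
        if colon and value and name.lower() in OPERATORS:
            operators[name.lower()] = value
        else:
            terms.append(token)
    return operators, terms


print(parse_query("site:stanford.edu intitle:search engine"))
print(parse_query("link:http://www.google.com"))
```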
The big question • Can any improvements be made to make Google better than it already is?
Google’s Programming Contest • Started this year (2002) • Winner - Daniel Egnor • His Idea – A geographic search • “Converted street addresses found within a large corpus of documents to latitude-longitude-based coordinates” • Would allow the user to specify a query such as “What are the closest movie theaters to my house?”
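Once addresses have been turned into latitude-longitude pairs, a "closest to my house" query reduces to sorting by distance. The sketch below assumes the geocoding step has already happened and uses the standard haversine great-circle formula. The place names, coordinates, and function names are made up for illustration and are not taken from the contest entry.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest(origin, places, k=3):
    """Return the k places nearest to origin.

    origin is (lat, lon); each place is (name, lat, lon).
    """
    return sorted(
        places,
        key=lambda p: haversine_km(origin[0], origin[1], p[1], p[2]),
    )[:k]


# Hypothetical, already geocoded documents.
home = (40.00, -75.00)
theaters = [
    ("Theater far", 41.00, -75.00),
    ("Theater near", 40.01, -75.00),
    ("Theater mid", 40.10, -75.00),
]
for name, lat, lon in closest(home, theaters, k=2):
    print(name, round(haversine_km(home[0], home[1], lat, lon), 1), "km")
```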
Personalized results based on location • The server knows your IP • Find the server closest to you by running a traceroute • http://www.calweb.com/cgi-bin/traceroute • The relative geographic location of your computer can be found by doing a whois query on your IP’s server • http://dns411.com/cgi-bin/whois.pl • Once your location is found, your results can be customized based on where you live
Personalized results from Cookies • Google could ask the user to answer a one-time survey and store the results as a cookie • For example: • Age • Sex • Education • A query done by a 60-year-old man for “rock” might give back different results than the same search done by a teenager
Linguistic Approach • Google could tailor results based on the language used • For example the English word “Java” has many definitions • The programming language • The coffee • The Indonesian island
File type restriction • Google already has the ability to search for file types with its keyword filetype: • What if a user does not want to find a file of a certain type, but instead needs to find a page that either embeds that file type or links to it? • For example: Find me only pages that have audio files and Java applets
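One way such a filter could work is to scan each page's markup for the file extensions it links to or embeds. The sketch below uses Python's standard `html.parser`. The tag-to-attribute table, the class name, and the sample page are assumptions made for illustration, and relying on file extensions is a simplification (a real crawler would also look at content types).

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class FileTypeCollector(HTMLParser):
    """Collects the file extensions a page links to or embeds."""

    # Which attribute carries the file reference for each tag (illustrative subset).
    REFERENCE_ATTR = {"a": "href", "embed": "src", "object": "data",
                      "applet": "code", "audio": "src", "source": "src"}

    def __init__(self):
        super().__init__()
        self.extensions = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.REFERENCE_ATTR.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Keep only the path part, so query strings do not hide the extension.
                ext = posixpath.splitext(urlparse(value).path)[1].lower()
                if ext:
                    self.extensions.add(ext)


def page_has_types(html, required):
    """True if the page references every extension in `required`."""
    collector = FileTypeCollector()
    collector.feed(html)
    return set(required) <= collector.extensions


page = '<a href="song.mp3">Listen</a><applet code="Demo.class"></applet>'
print(page_has_types(page, {".mp3", ".class"}))
print(page_has_types('<a href="index.html">Home</a>', {".mp3"}))
```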
Authorities and Hubs • Authorities – Highly cited pages • Hubs – Pages that link to many authorities • Difference between search on www.Google.com and www.inquirus.com when searching for “Pasta”
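The mutual reinforcement between hubs and authorities is commonly computed with the HITS iteration, sketched below. The slides do not say which algorithm either search engine uses, so this is an illustration of the concept only. The toy link graph and the iteration count are assumptions.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Toy HITS. `links` maps each page to the list of pages it links to.

    Returns (hub_scores, authority_scores).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = normalize(auth)

        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        hub = normalize(hub)
    return hub, auth


# Hypothetical graph: two directory pages point at the same two recipe pages.
graph = {
    "directory1": ["recipes_a", "recipes_b"],
    "directory2": ["recipes_a", "recipes_b"],
    "blog": ["recipes_a"],
}
hub_scores, auth_scores = hits(graph)
print("Top authority:", max(auth_scores, key=auth_scores.get))
print("Top hub:", max(hub_scores, key=hub_scores.get))
```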
Business Improvements • Develop Google software for the PC market • A single search query using the built-in search tool on a Windows machine is relatively slow compared to a Google search done online
P2P • If Google created software for the PC market, the number of searchable documents might increase drastically. • Perhaps with this P2P technology one would be able to find a computer science document about search engine technology that sits on a professor’s computer at Stanford
B2B • Business to Business • Google could act as an intermediary between corporations that are looking for the business of other corporations • Coupled with the geographic technology, a business could perform a sample query: Find me all the businesses that sell paper around the Philadelphia region
Other ideas • Include commercial databases • Library catalogs • Proquest • Cluster documents by topic • After searching for the keyword “Law,” Google should cluster the documents pertaining to the type of law (property law, banking law, criminal law)
Resources • http://www.google.com/corporate/tech.html • http://www.google.com/corporate/timeline.html • http://www.google.com/programming-contest/winner.html • http://citeseer.nj.nec.com/borodin01finding.html • http://www.calweb.com/cgi-bin/traceroute • http://dns411.com/cgi-bin/whois.pl • Aaron Steward – Finance Major