Searching for information on the World Wide Web is done through:

Introductory WWW Topics: Searching for information on the World Wide Web. Search engines. The Google engine and the search algorithms it uses. Μαρία Χατζηπανηγύρη.


Presentation Transcript


  1. Introductory WWW Topics: Searching for information on the World Wide Web. Search engines. The Google engine and the search algorithms it uses. Μαρία Χατζηπανηγύρη

  2. Searching for information on the World Wide Web is done through:
     • search engines
     • Web directories

  3. Some search engines are:
     Search engine | URL | Web pages indexed (millions)
     • AltaVista | www.altavista.com | 140
     • AOL Netfind | www.aol.com/netfind/ | -
     • Excite | www.excite.com | 55
     • Google | google.stanford.edu | 25
     • GoTo | goto.com | -
     • HotBot | www.hotbot.com | 110
     • Infoseek | www.infoseek.com | 30
     • Lycos | www.lycos.com | 30
     • Magellan | www.mckinley.com | 55

  4. (Table 13.2, continued)
     Search engine | URL | Web pages indexed (millions)
     • Microsoft | search.msn.com | -
     • Northern Light | www.nlsearch.com | 67
     • WebCrawler | www.webcrawler.com | 2
     Table 13.2: URLs and estimated size (millions) of the largest search engines (May 1998). (page 375) (R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Ch. 13, Searching the Web, ACM Press, 2000)

  5. Some other search engines:
     • AOL Netfind for Kids Only | www.aol.com/netfind/kids
     • AOL Netfind | www.aol.com/search
     • Lycos Kids Guide | www.lycos.com/kids
     • MetaCrawler | www.metacrawler.com
     • In.gr | www.in.gr

  6. Web directories
     Web directory | URL | Web sites (thousands) | Categories (thousands)
     • eBlast | www.eblast.com | 125 | -
     • LookSmart | www.looksmart.com | 300 | 24
     • Lycos Subjects | a2z.lycos.com | 50 | -
     • Magellan | www.mckinley.com | 60 | -
     • NewHoo | www.newhoo.com | 100 | 23
     • Netscape | www.netscape.com | - | -
     • Search.com | www.search.com | - | -
     • Snap | www.snap.com | - | -
     • Yahoo! | www.yahoo.com | 750 | -
     Table 13.3: URLs, Web pages indexed and categories (both in thousands) of some Web directories (beginning of 1998). (op. cit., page 385)

  7. Smart search engines
     • They accept queries whose answers refer to more specialized information.
     • E.g. from audio, video and image files.
     • HotBot: extracts some additional information from the structure of the page.
     • Most of them are also called meta-engines (metasearch engines) because they draw on larger databases.

  8. How do search engines work?
     • Most use a centralized crawler-indexer architecture.
     • Most are based in the United States and focus on documents in English.
     • Some search engines specialize in particular languages and countries, e.g. Kanji (Chinese, Japanese, Korean).
     • They use programs, called agents, robots, crawlers and spiders, that traverse the World Wide Web and report what they find.

  9. Starting from known sites, these programs find the links on each page and follow them, and the process repeats.
     • The data that users supply is turned by the indexing programs into a huge database of text, addresses and links.
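The crawl-then-index process described in slides 8 and 9 can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_WEB` and its `example.com` URLs are hypothetical stand-ins for real pages, nothing is fetched over the network, and the index is a plain word-to-URL mapping rather than anything a real engine is known to use.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download pages over HTTP instead.
TOY_WEB = {
    "http://example.com/": (
        "search engines index the web",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": (
        "crawlers follow links",
        ["http://example.com/b"],
    ),
    "http://example.com/b": (
        "the index maps words to pages",
        ["http://example.com/"],
    ),
}


def crawl(seed_urls):
    """Visit the known sites, follow the links found there, and repeat."""
    seen = set(seed_urls)
    queue = deque(seed_urls)
    visited = []
    while queue:
        url = queue.popleft()
        if url not in TOY_WEB:
            continue  # dead link: nothing to read
        visited.append(url)
        _text, links = TOY_WEB[url]
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def build_index(urls):
    """Build an inverted index: word -> set of URLs containing it."""
    index = {}
    for url in urls:
        text, _links = TOY_WEB[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


crawled = crawl(["http://example.com/"])
index = build_index(crawled)
```

Looking up a word in `index` then returns the set of crawled pages that contain it, which is the basic service the indexer provides to the query side.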

  10. The Google engine
     • It is a very useful search engine, available at www.google.gr
     • It is extremely popular and has the "richest" content.
     • Like all other search engines, it uses algorithms "to be as smart as humans".
     • It uses mathematical algorithms to "understand" the contents of pages and match them to the words / phrases of the search.

  11. Google's algorithms
     • Google uses PageRank, which is part of its ranking algorithm.
     • PageRank simulates a user navigating the Web at random, who jumps to a random page with probability q or follows a random hyperlink (on the current page) with probability 1-q.

  12. It is further assumed that the user never goes back to the previous page by following an already traversed hyperlink backwards.
     • This process can be modelled as a Markov chain, from which the stationary probability of being on each page can be computed.
     • This value is used as part of Google's ranking mechanism.
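The random-surfer model of slides 11 and 12 can be approximated by direct simulation. The sketch below is illustrative only: the three-page graph is made up, every link target is assumed to appear as a key of `links`, a page without outgoing links is treated as a forced random jump, and the "never goes back" assumption is not modelled. The fraction of steps spent on each page is a rough estimate of the stationary probability.

```python
import random


def random_surfer(links, q=0.15, steps=100_000, seed=0):
    """Estimate how often a random surfer is on each page.

    links maps every page to the list of pages it links to.
    With probability q the surfer jumps to a random page,
    otherwise it follows a random link on the current page.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        if rng.random() < q or not outgoing:
            current = rng.choice(pages)      # random jump
        else:
            current = rng.choice(outgoing)   # follow a link
    return {page: count / steps for page, count in visits.items()}


# Hypothetical three-page web used only for illustration.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
frequencies = random_surfer(toy_links)
```

In this toy graph page `b` has a single incoming link, from a page that splits its links two ways, so the surfer spends the least time there.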

  13. PageRank
     The PageRank $PR(a)$ of a page $a$ is defined as
     $$PR(a) = q + (1-q)\sum_{i=1}^{n} \frac{PR(p_i)}{C(p_i)}$$
     where $C(a)$ is the number of outgoing links of page $a$, page $a$ is pointed to by pages $p_1$ to $p_n$, and $q$ must be set by the system (0.15 is a typical value). It can be computed with an iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix of the Web.
     Finally, to help the ranking algorithms, page designers should include informative titles, headings, meta fields and good links.
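The iterative computation mentioned in slide 13 can be sketched as repeated application of the formula PR(a) = q + (1-q) Σ PR(p_i)/C(p_i) until the values settle. This is a minimal sketch, not Google's implementation: the function name, the fixed iteration count, the starting value of 1.0 and the toy graph are all assumptions, and each page is assumed to list a given target at most once.

```python
def pagerank(links, q=0.15, iterations=100):
    """Iterate PR(a) = q + (1 - q) * sum(PR(p) / C(p)) over pages p linking to a.

    links maps each page to the list of pages it links to;
    C(p) is the number of outgoing links of p.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            incoming = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = q + (1 - q) * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web used only for illustration.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_links)
```

On this toy graph page `c`, which is linked from both other pages, ends up with the highest value and `b` with the lowest. With this form of the formula the values sum to the number of pages as long as every page has at least one outgoing link.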

  14. Google

  15. In.gr

  16. Yahoo.gr

  17. HotBot

  18. Q. My page no longer comes up tops at Google for a particular search term. Why not?
     • Google, like all search engines, uses a system called an algorithm to rank the web pages it knows about. All search engines make periodic changes to their ranking algorithms in an effort to improve the results they show searchers. These changes can cause pages to rise or fall in rank. Small changes may produce little ranking differences, while large changes may have a dramatic impact.
     • Google made a change to its algorithm at the end of last month. This fact is obvious to any educated search observer, plus Google itself confirms it. The change has caused many people to report that some of their pages fell in ranking. These pages no longer please Google's algorithm as much as in the past.
     • If your page has suddenly dropped after being top ranked for a relatively long period of time (at least two or three months), then it's likely that your page is one of those no longer pleasing the new Google algorithm. Running what's called the filter test may help confirm this for you, at least in the short term.
     • Keep in mind that while many pages dropped in rank, many pages also consequently rose. However, those who dropped are more likely to complain about this in public forums than those who've benefited from the move. That's one reason why you may hear that "everyone" has lost ranking. In reality, for any page that's been dropped, another page has gained. In fact, WebmasterWorld is even featuring a thread with some comments from those who feel the change has helped them.

  19. Q. Does this mean Google no longer uses the PageRank algorithm?
     • Google never used PageRank alone to rank web pages. PageRank is simply one component of the overall ranking algorithm, a system Google uses to measure how important a page is based on links to it. It has always -- ALWAYS -- been the case that the context of links to the page was also considered, as well as the content on the page itself.
     • Unfortunately, some of those writing about Google have called its system of ranking PageRank, and Google itself sometimes makes this mistake, as seen in its webmaster's information page:
     • "The method by which we find pages and rank them as search results is determined by the PageRank technology developed by our founders, Larry Page and Sergey Brin."
     • In reality, the page describing Google's technology more accurately puts PageRank at the "heart" of the overall system, rather than giving the system that overall name.
     • By the way, PageRank has never been the factor that beats all others. It has been and continues to be the case that a page with low PageRank might get ranked higher than another page. Search for books, and if you have the PageRank meter switched on in the Google Toolbar, you'll see how the third-ranked Online Books Page with a PageRank of 8 comes above O'Reilly, even though O'Reilly has a PageRank of 9. That's just one quick example, but I've seen others exactly like this in the past, and you can see plenty first-hand by checking yourself.

  20. Technology Overview
     • Google stands alone in its focus on developing the "perfect search engine," defined by co-founder Larry Page as something that, "understands exactly what you mean and gives you back exactly what you want." To that end, Google has persistently pursued innovation and refused to accept the limitations of existing models. As a result, Google developed its own serving infrastructure and breakthrough PageRank™ technology that changed the way searches are conducted.
     • From the beginning, Google's developers recognized that providing the fastest, most accurate results required a new kind of server setup. Whereas most search engines ran off a handful of large servers that often slowed under peak loads, Google employed linked PCs to quickly find each query's answer. The innovation paid off in faster response times, greater scalability and lower costs. It's an idea that others have since copied, while Google has continued to refine its back-end technology to make it even more efficient.
     • The software behind Google's search technology conducts a series of simultaneous calculations requiring only a fraction of a second. Traditional search engines rely heavily on how often a word appears on a web page. Google uses PageRank™ to examine the entire link structure of the web and determine which pages are most important. It then conducts hypertext-matching analysis to determine which pages are relevant to the specific search being conducted. By combining overall importance and query-specific relevance, Google is able to put the most relevant and reliable results first.
     • PageRank Technology: PageRank performs an objective measurement of the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms. Instead of counting direct links, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives.

  21. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. Important pages receive a higher PageRank and appear at the top of the search results. Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.
     • Hypertext-Matching Analysis: Google's search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), Google's technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. Google also analyzes the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.
     • Google's innovations don't stop at the desktop. To bring its accurate and speedy search results to users accessing the web through portable devices, Google also pioneered the first wireless search technology for on-the-fly translation of HTML to formats optimized for WAP, i-mode, J-SKY, and EZWeb. Currently, Google provides its wireless technology to numerous market leaders, including AT&T Wireless, Sprint PCS, Nextel, Palm, Handspring, and Vodafone, among others.

  22. Life of a Google Query
     • The life span of a Google query normally lasts less than half a second, yet involves a number of different steps that must be completed before results can be delivered to a person seeking information.
     1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query.
     2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.
     3. The search results are returned to the user in a fraction of a second.
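The three steps above can be sketched as a toy pipeline. Everything here is hypothetical and for illustration only: `DOCS` stands in for the doc servers, `INDEX` for the index servers, and the snippet rule (a fixed window of characters around the first query word) is an assumption, not a description of how Google builds snippets.

```python
# Hypothetical stored documents (the role of the "doc servers").
DOCS = {
    "doc1": "PageRank measures the importance of a page from the links pointing to it.",
    "doc2": "A crawler follows links to discover new pages on the web.",
    "doc3": "The index tells which pages contain the words of a query.",
}


def build_word_index(docs):
    """Like the index at the back of a book: word -> documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word.strip(".,"), set()).add(doc_id)
    return index


# The role of the "index servers".
INDEX = build_word_index(DOCS)


def index_lookup(query):
    """Step 1: find the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(INDEX.get(word, set()) for word in words))
    return sorted(matches)


def make_snippet(text, query, width=30):
    """Step 2: cut a short window of text around the first query word."""
    first_word = query.lower().split()[0]
    position = text.lower().find(first_word)
    start = max(0, position - width)
    end = min(len(text), position + len(first_word) + width)
    return text[start:end]


def handle_query(query):
    """Step 3: return (document id, snippet) pairs to the user."""
    return [(doc_id, make_snippet(DOCS[doc_id], query))
            for doc_id in index_lookup(query)]
```

Calling `handle_query("links")` returns one pair for each stored document that contains the word, each with a short excerpt around the match.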
