Seek And Ye Shall Find The Collected Wisdom Gleaned from the EdSeek Project Enlightenments of the Glaringly Obvious Only After We Learned How Glaringly Obvious They Were.
Training Ground Seekology Primary School: Life on the Playground • Eating your own dog food • Try finding content on your own website in major search engines • 1) School Lunch Menu • “shiloh lunch menu” • 2) CNC Router • “cnc router”
Shiloh School Sept. 2003 Reality of the Real World
How do “spiders” find pages? • “Robots” or “spiders” follow links • They follow standard HTML links • They DO NOT follow image maps, Java animated menus, etc. • Therefore, you need standard links to all pages in your site • Ideally, include a SITEMAP.html file • EdSeek.org – functional sitemap • www.shiloh.k12.il.us – main navigation menu
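The link-following rule above can be sketched in a few lines of Python. This is only an illustration using the standard library's HTML parser, not the code of any real search engine; the sample page and its file names are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from standard <a> tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only plain anchor tags count here. Image-map <area> tags and
        # anything built by script are skipped on purpose, to mimic the
        # basic spider described on the slide.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


# Hypothetical page: one standard link, one image-map area, one script menu.
SAMPLE_PAGE = """
<html><body>
<a href="sitemap.html">Site map</a>
<map name="nav"><area href="lunch.html" shape="rect" coords="0,0,10,10"></map>
<script>var menu = ["hidden.html"];</script>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
```

Only sitemap.html is reported. The pages reachable solely through the image map or the script menu stay invisible to this sketch, which is the argument for a plain sitemap page.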
Categorized How? • The search engine software parses and reads all TEXT in the page (script/comments ignored) • It assigns priorities to the words • based on location in page • Title, Header section, Body section • based on number of times it is used • based on proximity of requested words to each other • Priority is given to Meta-Tags
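A rough sketch of location-and-frequency weighting, in Python. The weights are invented for illustration (the slide says location matters, not by how much), and proximity of query words is left out to keep the example short.

```python
import re
from collections import Counter

# Hypothetical weights: assumed values, not taken from any real engine.
WEIGHTS = {"title": 5, "meta": 4, "header": 3, "body": 1}


def score_words(sections):
    """sections maps a location name ("title", "body", ...) to its text."""
    scores = Counter()
    for location, text in sections.items():
        weight = WEIGHTS.get(location, 1)
        # Every occurrence adds the weight of the place it was found in,
        # so both location and repetition raise a word's score.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            scores[word] += weight
    return scores


# Hypothetical page content, split by location.
page = {
    "title": "Shiloh School Lunch Menu",
    "header": "Lunch Menu",
    "body": "The lunch menu for September is posted here.",
}
scores = score_words(page)
print(scores["lunch"], scores["september"])
```

Here "lunch" scores far higher than "september" because it appears in the title and header as well as the body.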
Description Meta-Tag Example • <html> • <head> • <title>Smithsonian Institution</title> • <meta NAME="description" CONTENT="The Smithsonian Institution is composed of sixteen museums and galleries and the National Zoo and numerous research facilities in the United States and abroad."> • </head> Smithsonian Institute
Keyword Meta-Tag Example • <html> • <head> • <title>Illinois Technology Conference for Educators IL-TCE</title> • <meta name="keywords" content="education, educators, educational, youth, conference, opportunities, improve, alternative, program, training, equipment, illinois, technology, ideas, schools"> • </head> IL-TCE Conference - IL-Ed&Tech Conference
Taking Control of Which Pages Are Indexed • “ROBOTS.TXT” file • Placed at root of Webserver • Or start in any folder
What is invisible to search engines? • Images (use alt attributes) • Script (Java etc., some Image Maps) • Comments/Scripts • PDF & DOC files are not easily indexed • Dynamically generated pages from databases
Seekology 101: Introduction to Seekology A Primer for Uber-Geeks, Alpha-Geeks, Neo-Geeks and Non-Geeks Who Seek
Enlightenment No. 1 Nothing is available on the global network (web) unless someone puts it there.
Enlightenment No. 2 If you put something on the global network and don’t tell anyone that it’s there, it might as well not be on the global network.
Enlightenment No. 3 The most fundamental unit of information on the global network is the file.
Enlightenment No. 4 The most fundamental method of accessing the most fundamental unit of information is the hypertext link.
Enlightenment No. 5 Humans access units of information by following hypertext links using a process colloquially called “clicking”.
Enlightenment No. 6 Web servers are software programs designed to listen and respond to requests for files (clicks) from clients on the global network.
Enlightenment No. 7 If it cannot be “clicked”, it probably cannot be located by the average human user, unless one knows the exact location (URL or address).
Enlightenment No. 8 Search engines consist of two separate software programs: “crawlers” and “indexers”.
Enlightenment No. 9 “Crawling” is done by software programs called “robots” (aka “spiders” and “spidering bots”).
Enlightenment No. 10 “Robots” work the same way humans do…they “click” on hypertext links and follow them from file to file.
Enlightenment No. 11 If there is no hypertext link to a file on a web server, a “robot” cannot find that file. Links on your home page are your key to seekology enlightenment.
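Enlightenments 10 and 11 amount to a graph traversal. The sketch below assumes a made-up site, described as a dictionary of which file links to which, and walks it breadth-first from the home page.

```python
from collections import deque

# Hypothetical site: each file maps to the files it links to.
SITE = {
    "index.html": ["about.html", "sitemap.html"],
    "about.html": ["index.html"],
    "sitemap.html": ["lunch.html"],
    "lunch.html": [],
    "orphan.html": ["index.html"],  # links out, but nothing links to it
}


def crawl(site, start):
    """Return the set of files reachable by following links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl(SITE, "index.html")
# Files that exist on the server but were never reached by the robot.
print(sorted(set(SITE) - found))
```

The file orphan.html exists on the server, but since no page links to it the robot never sees it.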
Enlightenment No. 12 Even if a robot finds a file on the web, it may not be able to parse (read) it.
Enlightenment No. 13 Basic spidering robots can “read” ASCII. Only the most advanced robots can read *.doc, *.xls, *.rtf, *.pdf, etc.
Enlightenment No. 14 Unless your web server publishes “indexes” of files, any file that is not the target of a hyperlink is invisible to robots.
Example of Enlightenment No. 15 This is how one brand of web server shows a file index
Enlightenment No. 16 The “indexing” software component of the search engine parses files and stores keywords in a database on the server.
Enlightenment No. 17 Your interaction with the search engine is in the form of a keyword search of the database, from which it creates pages of hyperlinks to the files that contain the keywords along with a brief listing of what those files contain.
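The indexer and the keyword search described in Enlightenments 16 and 17 can be pictured as an inverted index: a table from each word to the files that contain it. This is a minimal sketch with made-up file names and text, and it keeps the "database" in memory.

```python
import re
from collections import defaultdict

# Hypothetical crawled files and the text found in them.
FILES = {
    "lunch.html": "Shiloh School lunch menu for September",
    "shop.html": "The shop class uses a CNC router",
    "index.html": "Shiloh School home page",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Map every keyword to the set of files that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the files that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(FILES)
print(search(index, "shiloh lunch menu"))
print(search(index, "cnc router"))
```

A real engine would also rank the results and show a short summary for each hit; this sketch only returns the matching file names.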
Enlightenment No. 18 The robot and indexing software are designed to pay special attention to text found in META tags in the header area of web pages. <meta name="description" content="info">
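As an illustration of how an indexer might pick up header metadata, the sketch below reads the name and content attributes of each meta tag with Python's standard HTML parser. The sample page is a shortened, made-up variant of the description and keyword examples shown earlier.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # The parser lowercases attribute names, so NAME= and name= both work.
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content:
            self.meta[name.lower()] = content


def read_meta(html_text):
    reader = MetaReader()
    reader.feed(html_text)
    return reader.meta


# Hypothetical page header.
PAGE = """<html><head>
<title>Smithsonian Institution</title>
<meta NAME="description" CONTENT="Sixteen museums and galleries and the National Zoo.">
<meta name="keywords" content="museum, zoo, research">
</head></html>"""

meta = read_meta(PAGE)
print(meta["description"])
print(meta["keywords"])
```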
Enlightenment No. 19 Search Engines are not intelligent.
Enlightenment No. 20 Search Engines are only as effective as the organization of the global network allows them to be.
Enlightenment No. 21 When material is placed on your web, make sure there is a “clickable” path to find it.
Seekology 380: Optimization Strategies – Beyond the Primer Enlightenments You Can Use on Monday.
Enlightenment No. 22 Use either the web server’s automatic indexing system or a tool such as “dir2html” to create hyperlinked indexes of files.
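For readers without such a tool, here is a small Python stand-in that builds a hyperlinked index page for one folder. It is not dir2html and does not reproduce its options; the function name and the sample files are invented for the example.

```python
import html
import tempfile
from pathlib import Path
from urllib.parse import quote


def build_index_page(folder):
    """Return an HTML page with one standard link per file in folder."""
    folder = Path(folder)
    names = sorted(p.name for p in folder.iterdir() if p.is_file())
    items = [
        # quote() makes the name safe inside a URL, html.escape() inside text.
        f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>'
        for name in names
        if name != "index.html"  # do not list the index page itself
    ]
    return (
        "<html><head><title>Index of files</title></head><body><ul>\n"
        + "\n".join(items)
        + "\n</ul></body></html>\n"
    )


# Demonstration on a throwaway folder with two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("minutes.txt", "lunch menu.html"):
        Path(tmp, name).write_text("placeholder")
    page = build_index_page(tmp)
    print(page)
```

Saving the returned text as index.html in the folder gives robots a clickable path to every file there.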
Enlightenment No. 23 Publish in ASCII when possible. This can include plain text, html, asp, php, or other text-based coding. Advantages: small, easily parsed, no plugins.
Enlightenment No. 24 Use META-tags abundantly for high-profile documents. META-tag Generators make this easy.
Enlightenment No. 25 Use “ALT=” attributes to describe graphics that carry meaningful content. <img src="krebs-cycle.gif" alt="Diagram of the Krebs Cycle">
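One way to act on this is to scan a page for images that have no ALT text. The checker below is a sketch using Python's standard HTML parser, and the sample page is made up. Purely decorative images can reasonably have an empty ALT, so treat the output as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Records the src of every <img> whose alt text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        fields = dict(attrs)
        if not (fields.get("alt") or "").strip():
            self.missing.append(fields.get("src") or "(no src)")


def images_without_alt(html_text):
    checker = AltChecker()
    checker.feed(html_text)
    return checker.missing


# Hypothetical page with one described image and two undescribed ones.
PAGE = """
<img src="krebs-cycle.gif" alt="Diagram of the Krebs Cycle">
<img src="logo.gif">
<img src="spacer.gif" alt="">
"""

print(images_without_alt(PAGE))
```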
Enlightenment No. 27 Deflect robots from your sensitive server areas with the “robots.txt” file. • User-agent: * • Disallow: /search • Disallow: /groups • Disallow: /images
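The effect of these rules can be checked with Python's standard urllib.robotparser module, which applies them the way a well-behaved robot would. The robot name "ExampleBot" and the paths tested are made up for the example. The rules are parsed from a string here, so no web server is needed.

```python
from urllib.robotparser import RobotFileParser

# The rules from the slide, one directive per line as robots.txt requires.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """True if a robot named agent may fetch the given path."""
    return parser.can_fetch(agent, path)


for path in ("/index.html", "/search/results.html", "/images/logo.gif"):
    print(path, allowed(path))
```

Keep in mind that robots.txt is a request, not a lock: polite robots honor it, but it does not stop anyone from fetching those paths directly.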
Seekology Graduate School: Seekology Secrets 5010 Big Secret No. 1 Anyone can deploy a Search Engine.