Searching the Web


The Search Tools

  • Search Engines
  • Web Directories
  • Multi-engine searches
  • Specialized Sites

Search Engines

Search engines are searchable databases that appear as web pages, with built-in query systems that accept common, plain-English terms. Their databases are built in two steps. First, "spiders" or "robots" roam the web, sending back information on the web pages they encounter. Alta Vista calls its spider "Scooter," for example, and Scooter is able to visit 6 million sites a day.

Spiders don't "invade" your web pages. They visit web servers and politely request documents, which they then examine. They may find these documents by following links on pages they've already indexed, by looking at "What's New?" pages, or by paying particular attention to popular web sites. Most search engines also allow authors to submit their sites for inclusion in the database.
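The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration, not how Scooter or any real spider is written: the TOY_WEB dictionary, the example URLs, and the crawl function are all made up for this sketch, and a real spider would request each document from a web server rather than look it up in memory.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch
# these documents over HTTP from web servers instead.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D</a>',
    "http://d.example/": "no links here",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit pages breadth-first from start_url; return URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)          # request the document
        if html is None:           # page is missing, so skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:  # queue only links not seen before
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("http://a.example/", TOY_WEB.get)
print(order)
```

Starting from a single page, the sketch reaches all four toy pages because each one is linked from a page already visited. A page that nothing links to would never be found this way, which is one reason engines accept site submissions from authors.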

Once a spider encounters a new web site, it informs the indexing software back home and indexing begins. An index can be a list of all relevant words found on a page, or perhaps just the words from specific locations such as the title, special "meta" tags in the header, or the first few paragraphs of each page. Engines that index all the words on a page may give extra weight to certain words depending on how often and where they appear.
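To make the idea of an index concrete, here is a minimal sketch of one that records every word on a page and gives extra weight to words in the title. The sample pages, the TITLE_WEIGHT value, and the function names are assumptions chosen for illustration; actual engines keep their weighting schemes to themselves.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL -> (title, body text).
PAGES = {
    "page1": ("Sponges of the Reef", "Sponges are simple animals. Sponges filter water."),
    "page2": ("Kitchen Tips", "Clean sponges last longer."),
}

TITLE_WEIGHT = 3  # assumed bonus for a word that appears in the title


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Return an inverted index: word -> {url: weight}."""
    index = defaultdict(dict)
    for url, (title, body) in pages.items():
        for word in tokenize(body):    # each occurrence in the body counts once
            index[word][url] = index[word].get(url, 0) + 1
        for word in tokenize(title):   # title words count extra
            index[word][url] = index[word].get(url, 0) + TITLE_WEIGHT
    return index


index = build_index(PAGES)
print(index["sponges"])  # {'page1': 5, 'page2': 1}
```

Here "sponges" appears twice in the body of page1 and once in its title, so page1 ends up with a weight of 5, while page2 mentions the word once and gets a weight of 1.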

Since the web changes so rapidly, spiders must revisit previously indexed sites to make sure they still exist and to look for changes. The "freshness" of search results depends on how often this is done, so a high return of 404 error messages and irrelevant sites would indicate that the engine is not revisiting sites often enough. Alta Vista appears to revisit most sites about once a month.

Studies have concluded that no search engine indexes more than one-third to one-half of the web. (Lawrence and Giles, Science, 280 (5360), April 3, 1998) Plus, when you factor in the proprietary differences between indexing techniques, it turns out that there is little overlap between search engine results. This is the reason why we should remain flexible in our selections and approach to web searching.
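If you want to put a number on how much two engines overlap for a query of your own, one simple measure is the share of distinct results that both engines returned. The sketch below uses that measure (often called Jaccard overlap), which is a choice made here for illustration and not necessarily the method used in the study cited above. The result lists are invented.

```python
def overlap(results_a, results_b):
    """Shared results divided by all distinct results (Jaccard overlap)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical result lists from two engines for the same query.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u3", "u4", "u5", "u6"]

print(round(overlap(engine_a, engine_b), 2))  # 0.33
```

Two of the six distinct pages appear in both lists, so the overlap is one-third. A value near zero means the second engine is showing you mostly new material.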

Am I searching the entire web?

No, not really. When you use a search engine, you are submitting your query to the previously assembled index, which looks for your word or phrase and returns a list of pages that contain it.
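A small sketch shows why a search only ever sees what is already in the index. The INDEX dictionary and its site names are invented for this example, and the search function simply returns the pages that contain every word in the query; real engines offer many more query options.

```python
# Hypothetical prebuilt index: word -> set of pages containing that word.
INDEX = {
    "heart": {"cardiology.example", "dating.example"},
    "disease": {"cardiology.example", "plants.example"},
    "match": {"dating.example"},
}


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


print(search(INDEX, "heart disease"))  # {'cardiology.example'}
print(search(INDEX, "unicorn"))        # set()
```

The second query returns nothing, not because no page on the web mentions unicorns, but because no page in this particular index does.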

So many results! How can I possibly sort through it all?

You can't, so don't let the numbers bother you. In the first place, you won't be able to see those thousands of pages anyway, as most search engines will only display a smaller sample. In the second place, if what you need isn't on the first couple of pages of the list, go back and refine your search term or find another search tool.

This is because the major search engines perform a third task, called relevancy rating, and you need to be aware of it. The lists that are returned are generally ranked according to their relevancy to your query. How this is accomplished varies: the proprietary algorithms differ widely among the search engines and are constantly being fine-tuned. In general, relevance is related to how often the search term appears in the page, whether or not it appears in the title or first paragraph, how often the web page is visited, or how many other pages link to it. So if the results you get are way off base, that means the search engine is having trouble understanding your question, and you need to rethink it.
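As a rough illustration of relevancy rating, the sketch below scores each page on three of the factors just mentioned: how often the term appears, whether it appears in the title, and how many other pages link to it. The Page class, the sample pages, and the weights (5 for a title match, 0.5 per inbound link) are assumptions invented for this example; real engines use their own proprietary formulas.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    body: str
    inbound_links: int  # how many other pages link to this one


def relevance(page, term):
    """Score a page for a single search term using assumed weights."""
    term = term.lower()
    score = page.body.lower().split().count(term)  # term frequency in the body
    if term in page.title.lower().split():         # assumed title bonus
        score += 5
    score += 0.5 * page.inbound_links              # assumed link bonus
    return score


def rank(pages, term):
    """Return pages sorted from most to least relevant."""
    return sorted(pages, key=lambda p: relevance(p, term), reverse=True)


# Hypothetical pages.
pages = [
    Page("b.example", "Cleaning supplies", "sponges and porifera", 10),
    Page("a.example", "Porifera overview", "porifera are sponges porifera live in water", 4),
    Page("c.example", "Porifera", "porifera", 1),
]

ranked = [p.url for p in rank(pages, "porifera")]
print(ranked)  # ['a.example', 'c.example', 'b.example']
```

Notice that b.example has the most inbound links but still ranks last, because the term is missing from its title. Change the weights and the order changes, which is why the same query can produce different rankings on different engines.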

So if it's on the Internet it must be right. Right?

No way. First, a search engine will generally return ALL the references it has for a topic. It makes no distinction between reports from 4th graders and reports from the National Academy of Sciences, and is just as likely to return the name of a rock group as a scientific phenomenon. The other difficulty is harder to address: anyone can get a web account and "set up shop" on the internet. Judging the credibility and the accuracy of a site is our individual burden, and it will only get worse as the Information Superhighway adds more lanes and raises the speed limit.

Things aren't always what they seem.

It's difficult for a search engine to distinguish between identical words with different meanings. These contextual ambiguities result in a return of sites about contraceptive sponges when you're searching for information on the phylum Porifera, and matchmaking services when you're really interested in heart disease.

Which is the best search engine?

You'll find that loyalty runs deep when it comes to everyone's very own favorite search engine, and reviews will vary widely in their recommendations. But to cling to one method is to fail to take advantage of the fluid, rapidly evolving state of the art. My suggestion is to periodically take the time to run some tests yourself. Everyone's an expert on something, so perform some searches on topics that are familiar to you. See if you're satisfied not only with the depth of the search but also with the relevancy rankings of the sites. If you have special requirements, such as the need to explore current news sources, some sites may be better than others for your needs. Be aware that competition is fierce in this area, and all of the top contenders are constantly upgrading and refining their user interfaces and their techniques. So it pays to explore now and then, and to revisit an old engine or check out the new ones that crop up.

Web Directories...the other option

Web directories are best used for broad, general information. They are indexed by actual people, not software programs. They categorize sites based on topics, which are provided either by the author submitting the site or by the directory's own staff. Most rely on submissions for their material and, as a result, their databases will not be as large. But what you do find will be more sharply focused. Most directories do provide search engines, though their power lies in their directory structure. Yahoo is the best known web directory, and while it uses Alta Vista's search engine, its 1400 categories are well structured and easy to navigate.
 

"Searching the Web" - © 2001 - E. Barbara Meyer - EdTech Center - Life Sciences - University of Illinois - Urbana, IL USA - http://www.life.uiuc.edu/edtech/search.html
