Search Engines

All about Search Engines…

For documents related to FAST and Autonomy, see the links under Search Engines in the left pane.

How do Search Engines Work?

Search engines for the general web (like all those listed above) do not really search the World Wide Web directly. Each one searches a database of the full text of web pages, automatically harvested from the billions of pages residing on servers around the world. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on a link in a search engine's results, you retrieve the current version of the page from its server.

Search engine databases are selected and built by computer robot programs called spiders. These “crawl” the web, finding pages for potential inclusion by following the links in the pages they already have in their database (i.e., already “know about”). They cannot think or type a URL or use judgment to “decide” to go look something up and see what’s on the web about it. (Computers are getting more sophisticated all the time, but they are still brainless.)

If a web page is never linked to in any other page, search engine spiders cannot find it. The only way a brand new page – one that no other page has ever linked to – can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included. All search engine companies offer ways to do this.
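
The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine is implemented: the `TOY_WEB` dictionary, its example URLs, and the `crawl` function are all hypothetical, and the "web" is an in-memory dictionary so that nothing is fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical miniature "web": URL -> HTML source of that page.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> '
                         '<a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": "<p>No outgoing links.</p>",
    # Links out to A, but no page links to it.
    "http://orphan.example/": '<a href="http://a.example/">A</a>',
}


def crawl(seed_urls, web):
    """Breadth-first crawl: fetch every page reachable from the seeds via links."""
    known = set(seed_urls)   # URLs the spider already "knows about"
    queue = deque(seed_urls)
    fetched = set()
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:
            continue  # dead link: nothing to fetch
        fetched.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in known:
                known.add(link)
                queue.append(link)
    return fetched


# Starting from page A, the spider never discovers the orphan page.
print(sorted(crawl(["http://a.example/"], TOY_WEB)))

# Once a human submits the orphan's URL as an extra seed, it is found.
print(sorted(crawl(["http://a.example/", "http://orphan.example/"], TOY_WEB)))
```

The second call stands in for a URL submitted by hand: the orphan page only shows up once its address is given to the spider directly.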

After spiders find pages, they pass them on to another computer program for “indexing.” This program identifies the text, links, and other content in the page and stores it in the search engine database’s files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.
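
One common way to make pages searchable by keyword is an inverted index, which maps each word to the set of pages containing it. The sketch below assumes that approach for illustration; real indexing programs are far more elaborate, and the sample pages, `build_index`, and `search` are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical page texts, as handed over by a spider.
pages = {
    "http://a.example/": "Library catalogs and article databases",
    "http://b.example/": "How search engine spiders crawl the web",
    "http://c.example/": "Search engine databases are built by spiders",
}

index = build_index(pages)
print(sorted(search(index, "search spiders")))
print(sorted(search(index, "databases")))
```

A page is returned only if every query word appears in its text, which is the simplest form of "the page will be found if your search matches its content."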

Many web pages are excluded from most search engines by policy. The contents of most of the searchable databases mounted on the web, such as library catalogs and article databases, are excluded because search engine spiders cannot access them. All this material is referred to as the “Invisible Web” — what you don’t see in search engine results.

One Response

  1. Here is a web service that provides RSS results for Google Search

    http://www.ecubicle.net/gsearch_rss.asmx

    Since the returned XML is RSS 2.0 compliant, the web service can also be called in a feed reader like this:

    http://www.ecubicle.net/gsearch_rss.asmx/GetSearchResults?searchPage=0&gQuery=tutorial+asp+net&numOfResults=15

    Change the searchPage, gQuery and numOfResults query-string parameters to suit your search. The gQuery parameter accepts all standard Google search operators.
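
As a rough illustration of the comment above, the feed URL can be assembled from its three parameters with Python's standard library. The `build_feed_url` helper is a hypothetical name; only the base URL and the parameter names come from the comment. The sketch builds the URL string and does not contact the service, so it makes no assumption that the service is still available.

```python
from urllib.parse import urlencode

# Endpoint quoted in the comment above.
BASE_URL = "http://www.ecubicle.net/gsearch_rss.asmx/GetSearchResults"


def build_feed_url(query, page=0, num_results=15):
    """Build the feed URL; urlencode escapes the query and turns spaces into '+'."""
    params = {
        "searchPage": page,
        "gQuery": query,
        "numOfResults": num_results,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_feed_url("tutorial asp net"))
print(build_feed_url("spiders site:example.com", page=1, num_results=10))
```

The first call reproduces the example URL from the comment; the second shows that characters such as the colon in a `site:` operator are percent-encoded automatically.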
