The Deep Web

Richard Fidler rfidler at
Wed Jan 22 10:24:46 MST 2003

[A year and a half ago, I posted an article on "the deep web" to this list:

It identified and described a number of sites of interest to researchers
that are not normally accessible through the search engines we commonly use.

Here is a follow-up article, from which I have scanned the following
excerpts: "Searching the Internet in the Age of Globalization -- the Deep
Web", by Yvan Cloutier, from L'Actualité terminologique/Terminology Update,
35/4, December 2002.

While addressed primarily to those looking for bilingual or multilingual
language resources, the article will be of general interest to all those
using the internet for serious research in virtually all domains. Note
especially the links at the end of these excerpts. -- R.F.]

   * * *

Data on the Internet are often disorganized, and finding a piece of
information is not always as straightforward as it is in traditional tools
such as terminology databases, thesauri and libraries. It is easier to find
useful information when the user understands the tools designed to handle
the Web's inherent order and chaos.


The deep Web

The deep Web is structured using proven archiving methods, which makes it
easier to search. General search engines like Google and Wisenut do not
index this part of the Web -- the hidden part of the iceberg.

The deep (or invisible) Web is often made up of directories, indexes and
databases that sometimes have user fees. It is a huge reserve library.
Researchers working in specific subject fields can browse the deep Web to
find sustainable resources, which they can eventually add to their personal
bookmark lists.


Characteristics of the deep Web

-- The deep Web includes PDF (Portable Document Format) files,
dynamically generated pages, ASP (Active Server Pages) pages, databases with
restricted access or user fees, and firewall- or password-protected pages.

-- The resources are updated more often than on the surface Web.

-- Information is added faster than it is to the surface Web.

-- Search results are more relevant.

-- Pages often come from sites in domains such as .edu and .org, reserved for
educational institutions and international organizations, and therefore
offer a higher level of language and greater expertise.

-- The deep Web contains some 550 billion electronic documents not indexed by
regular search engines.

-- More than half of deep-Web content is in specialized databases.

-- Access to 55% of the information on the deep Web is free.
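
As a rough illustration of the first characteristic above, here is a
minimal Python sketch that flags a URL as likely deep-Web content from
its surface form alone. The function name looks_deep and the list of
extensions are assumptions made for this example, not anything taken
from the article, and a real crawler would need far more than this.

```python
from urllib.parse import urlparse

# Assumed, illustrative markers only: PDF files and ASP pages,
# both named in the list of characteristics above.
DEEP_EXTENSIONS = (".pdf", ".asp")


def looks_deep(url: str) -> bool:
    """Return True if the URL shows a surface sign of deep-Web content."""
    parts = urlparse(url)
    path = parts.path.lower()
    # A query string usually means the page is generated on request.
    return path.endswith(DEEP_EXTENSIONS) or bool(parts.query)
```

A URL such as http://example.org/search.asp?q=x would be flagged, while
http://example.org/index.html would not. Restricted-access and
password-protected pages cannot be detected this way, since nothing in
the address reveals them.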


Partial list of deep-Web sites

Eurêka database (multilingual)

Bubl Link (English)

Complete Planet (English)

INFOMINE (English)

Invisible Web (English)

Invisible-Web (English)

ProFusion (English)
