CHAPTER ONE
1.0 INTRODUCTION
The volume of information on the Web is already vast and is increasing at a very fast rate, according to Deepweb.com [1]. The Deep Web is a vast repository of web pages, usually generated by database-driven websites, that are available to web users yet hidden from traditional search engines. The crawler, the computer program that searches the Internet for newly accessible information to be added to the index examined by a standard search engine [2], cannot reach most of the pages created on-the-fly by dynamic sites such as e-commerce, news and major content sites [1].
According to a study by Bright Planet [3], the deep Web is estimated to be up to 550 times larger than the ‘surface Web’ accessible through traditional search engines, and over 200,000 database-driven websites are affected (i.e. not accessible through traditional search engines). Sherman & Price [4] estimate the amount of quality pages in the deep Web to be 3 to 4 times more than the pages accessible through search engines such as Google, About and Yahoo. While the actual figures are debatable, it is clear that the deep Web is far bigger than the surface Web and is growing at a much faster pace [1].
In a simplified description, the Web consists of two parts: the surface Web and the deep Web (also called the invisible Web or hidden Web). The deep Web came into public awareness only recently, with the publication of the landmark book by Sherman & Price [4], “The Invisible Web: Uncovering Information Sources Search Engines Can’t See”. Since then, many books, papers and websites have emerged to help explore this vast landscape further, and these also deserve attention.
Most people access Web content through surface search engines, yet an estimated 99% of Web content is not accessible through them.
A complete approach to conducting research on the Web incorporates both surface search engines and deep Web databases. However, while most Internet users are skilled in at least elementary use of search engines, the skill of accessing the deep Web is limited to a much smaller population. It is desirable for most Web users to be able to reach most of the Web’s content. This work therefore examines how the deep Web affects search engines, websites and searchers, and proffers solutions.
The broad objective of this study is to aid IT researchers in finding quality information in less time. The main objective of the project work can be stated more clearly as follows:
The study of the deep Web is necessary because it brings into focus the problems encountered by search engines, websites and searchers. More importantly, the study will provide information on the results of searches made using both surface search engines and deep Web search tools. Finally, it presents the deep Web not as a substitute for surface search engines, but as a complement within a complete search approach that is highly relevant to academia and the general public.
What is the Deep Web?
Wikipedia [5] defines the surface Web (also known as the visible Web or indexable Web) as that portion of the World Wide Web that is indexed by conventional search engines. Search engines construct a database of the Web using programs called spiders or Web crawlers that begin with a list of known Web pages. For each page the spider knows of, it retrieves the page and indexes it, and any hyperlinks to new pages are added to the list of pages to be crawled. Eventually all reachable pages are indexed, unless the spider runs out of time or disk space. The collection of reachable pages defines the surface Web.
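To make the crawling process just described concrete, the sketch below implements it using only Python’s standard library: start from a seed list of known pages, retrieve and index each page, and add any newly discovered hyperlinks to the list of pages to be crawled. The seed URLs, the page limit and the in-memory index are illustrative assumptions, not a description of any real search engine’s implementation.

    # A minimal crawler sketch, assuming hypothetical seed URLs and an
    # in-memory index; real search engines are far more elaborate.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects the href targets of anchor tags found on a page."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        # Resolve relative links against the page's own URL.
                        self.links.append(urljoin(self.base_url, value))

    def crawl(seed_urls, max_pages=100):
        """Breadth-first crawl: index every reachable page from the seeds."""
        to_visit = list(seed_urls)   # the list of known Web pages
        index = {}                   # url -> raw page text
        while to_visit and len(index) < max_pages:
            url = to_visit.pop(0)
            if url in index:
                continue             # already retrieved and indexed
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except OSError:
                continue             # unreachable page: the spider skips it
            index[url] = html        # "retrieves the page and indexes it"
            extractor = LinkExtractor(url)
            extractor.feed(html)
            to_visit.extend(extractor.links)  # new hyperlinks join the crawl list
        return index

Pages that cannot be fetched, or that no indexed page links to, never enter the index; that gap is exactly where the deep Web begins.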
For various reasons (e.g., the Robots Exclusion Standard, links generated by JavaScript and Flash, password-protection) some pages cannot be reached by the spider. These ‘invisible’ pages are referred to as the Deep Web.
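The first of these barriers, the Robots Exclusion Standard, can be illustrated directly: a polite spider consults a site’s robots.txt before fetching a page and skips anything it is asked not to crawl. The sketch below uses Python’s standard urllib.robotparser module; the user-agent name and URLs are assumptions made for illustration.

    # Checking robots.txt before fetching: a minimal sketch.
    from urllib.robotparser import RobotFileParser

    def allowed_to_crawl(page_url, robots_url, user_agent="ExampleSpider"):
        """Return True if robots.txt permits this agent to fetch the page."""
        parser = RobotFileParser()
        parser.set_url(robots_url)
        parser.read()                # download and parse robots.txt
        return parser.can_fetch(user_agent, page_url)

    # Usage: a disallowed page stays out of the index, and hence in the deep Web.
    if allowed_to_crawl("https://example.com/reports/2024.html",
                        "https://example.com/robots.txt"):
        print("fetch and index the page")
    else:
        print("skip it; the page remains invisible to this crawler")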
Bergman [6] defines the deep Web (also known as the Deepnet, invisible Web or hidden Web) as World Wide Web content that is not part of the surface Web indexed by search engines. Dr. Jill Ellsworth coined the term “invisible Web” in 1994 to refer to websites that are not registered with any search engine.
Sherman and Price [4] define the deep Web as text pages, files, or other often high-quality, authoritative information available via the World Wide Web that general-purpose search engines cannot (due to technical limitations) or will not (due to deliberate choice) add to their indices of Web pages. It is sometimes referred to as the “invisible Web” or “dark matter”.
Origin of the Deep Web
In 1994, Dr. Jill H. Ellsworth, a university professor who is also an Internet consultant for Fortune 500 companies, was the first to coin the term “invisible Web” [6]. A January 1996 article describes it thus: “It would be a site that’s possibly reasonably designed, but they didn’t bother to register it with any of the search engines. So, no one can find them! You’re hidden. I call that the invisible Web.”
The first commercial deep Web tool (although they referred to it as the “Invisible Web”) was AT1 (@1) from Personal Library Software (PLS), announced on December 12, 1996 in partnership with large content providers. According to the press release of that date, AT1 started with 5.7 terabytes of content, which was estimated to be 30 times the size of the nascent World Wide Web.
Another early use of the term “invisible Web” was by Bruce Mount (Director of Product Development) and Dr. Matthew B. Koll (CEO/Founder) of Personal Library Software (PLS) when describing AT1 (@1) to the public. PLS was acquired by AOL in 1998 and AT1 (@1) was abandoned [7], [8].
AT1 was promoted as an invisible-Web service that allowed users to find content “below, behind and inside the Net”, helping them identify high-quality content amidst multiple terabytes of data, with top publishers joining as charter members.