Monday, July 7, 2008

Where did Search Engines Begin?

The first search engine was Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal. The name was originally meant to be "archives," but Unix conventions called for a shorter filename.

Tim Berners-Lee was around at this point, but there was not yet a World Wide Web. The main way people shared data back then was via File Transfer Protocol (FTP). If you had a file you wanted to share, you would set up an FTP server. Anyone interested in the data could then retrieve it using an FTP client.

This process worked effectively in small groups, but as more data was collected, it became increasingly fragmented.


The First Search Engines - Archie, Veronica and Jughead

Archie helped solve this fragmentation problem by building a database of the filenames held on public FTP servers, which it matched against users' queries.

Archie soon became so popular that in 1993 the University of Nevada System Computing Services group developed Veronica. Veronica served the same purpose as Archie, but it worked on plain text files.

Soon another user interface named Jughead appeared with the same purpose as Veronica. Both were used for files shared via Gopher, which Mark McCahill created at the University of Minnesota in 1991 as an alternative to Archie.

The Web Robot or Bot used by Search Engines

Soon the web's first robot arrived. Matthew Gray introduced the World Wide Web Wanderer. He initially wanted to measure the growth of the web and created this bot to count active web servers. He soon upgraded the bot to capture actual URLs, and his database became known as the Wandex. The Wanderer was as much of a problem as it was a solution, because it caused system lag by accessing the same page hundreds of times a day. It did not take long for him to fix the software, but people began to wonder whether robots were a good or bad thing.

Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce. On the internet, the term bot usually describes anything that interfaces with the user or collects data. Search engines use "spiders," which search (or spider) the web for information. Another example is the chatterbot, which draws on a large store of material about a specific topic and attempts to act like a human and converse with people on that topic.

In October 1993 Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB, in response to the Wanderer. ALIWEB let users submit the pages they wanted indexed, along with their own description of each page. This meant it needed no bot to collect data and did not use up excessive bandwidth. The downside of ALIWEB was that many people did not know how to submit their site.

The Search Engine Spider

By December 1993, three full-fledged, bot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider.

JumpStation gathered the titles and headers of web pages and retrieved them using a simple linear search. As the web grew, JumpStation slowed to a stop. The WWW Worm indexed titles and URLs. The problem with JumpStation and the World Wide Web Worm was that they listed results in the order in which they found them, with no ranking by relevance. The RBSE spider did implement a ranking system.

Excite Search Engine

Excite came from the Architext project, which was started in February 1993 by six Stanford undergraduate students. They had the idea of using statistical analysis of word relationships to make searching more efficient. They were soon funded, and in mid-1993 they released copies of their search software for use on web sites.

Many of these results seemed somewhat irrelevant, because the spiders of the day were not intelligent enough to understand what the links meant. If you did not know the exact name of what you were looking for, it was extremely hard and sometimes impossible to find. Out of necessity, the EINet Galaxy web directory was born in January 1994. It was organized much like web directories are today. The biggest reason for EINet Galaxy's success was that it offered Gopher and Telnet search features in addition to web search. In truth, the web in early 1994 was not large enough to require a web directory; however, others soon followed.


The birth of the Yahoo Search Engine

In April 1994 David Filo and Jerry Yang created Yahoo as a collection of their favourite web pages. As their number of links grew, they had to reorganize and become a searchable directory. What set these directories above the Wanderer was that they provided a description with each URL.

Brian Pinkerton of the University of Washington released WebCrawler on April 20, 1994. It was the first crawler to index entire pages. It soon became so popular that it could not be used during daytime hours. AOL eventually purchased WebCrawler and ran it on its network. Then in 1997 Excite bought WebCrawler, and AOL began using Excite to power its NetFind. WebCrawler opened the door for many other services to follow suit: within a year of its debut came Lycos, Infoseek, and OpenText.

Lycos Search Engine

Lycos was the next major development, designed at Carnegie Mellon University around July 1994. Michael Mauldin was responsible for this search engine and remains the chief scientist at Lycos Inc.

On July 20, 1994, Lycos launched publicly with a catalogue of 54,000 documents. In addition to ranked relevance retrieval, Lycos provided prefix matching and word-proximity bonuses. But Lycos' main difference was the sheer size of its catalogue: by August 1994 Lycos had identified 394,000 documents; by January 1995 the catalogue had reached 1.5 million documents; and by November 1996 Lycos had indexed over 60 million documents, more than any other web search engine. In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word ‘surf’.

A Search Engine called Infoseek

Infoseek also started out in 1994, claiming to have been founded in January. They did not bring a whole lot of innovation to the table, but they offered a few add-ons, and in December 1995 they convinced Netscape to use them as its default search engine, which gave them major exposure. AltaVista debuted online that same month and brought many important features to the web scene. They had nearly unlimited bandwidth (for that time), were the first to allow natural language queries and advanced search techniques, and let users add or delete their own URLs within 24 hours. They even allowed inbound link checking, and they also provided search tips.

The Looksmart Search Engine

The Looksmart directory came about in 1996.

The Inktomi Corporation came about on May 20, 1996, with its search engine HotBot. Two UC Berkeley cohorts created Inktomi from the improved technology gained from their research. HotWired listed the site, and it quickly became hugely popular. Inktomi has since been bought by Yahoo.

Ask Jeeves

In April 1997 the Ask Jeeves search engine was launched. Northern Light was also launched in 1997.



.... and then there was Google .... the world's largest Search Engine

In 1998 Google was launched, the last of the current search superpowers and the most powerful to date. It chose to rank pages using an important concept: implied value from inbound links. This makes the web somewhat democratic, as each outgoing link acts as a vote. Google has become so popular that major portals such as AOL and Yahoo have used its results, letting its search technology take the lion’s share of web searches.
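The "each link is a vote" idea can be illustrated with a simplified PageRank-style calculation. This is a minimal sketch under stated assumptions, not Google's actual ranking system. The function name `link_vote_rank`, the four-page link graph, the damping factor of 0.85 and the fixed iteration count are all hypothetical choices made for illustration.

```python
def link_vote_rank(links, damping=0.85, iterations=100):
    """Simplified PageRank-style scoring (illustrative assumption, not Google's system).

    Each page splits its current score evenly among the pages it links to,
    so every outgoing link acts as a 'vote' for its target.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share, modelling a reader who
        # sometimes jumps to a random page instead of following a link.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "home".
web = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "shop": ["home"],
}
scores = link_vote_rank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "home" collects the most votes and ranks first. "blog" and "shop" receive no inbound links, so they keep only the baseline share. The scores always sum to 1, so a vote from a highly ranked page is worth more than one from an obscure page.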

Google went public at $85 a share on August 19, 2004 and its first trade was at 11:56 am ET at $100.01.

On September 30, 2004, Vivisimo launched Clusty, the first major search engine to offer tabs for blogs and similar content types.

On October 5, 2004 Bill Gross (founder of Overture) relaunched Snap as a search engine with a completely transparent business model (showing search volumes, revenues, and advertisers). Snap has many advanced sorting features but it may be a bit more than what most searchers were looking for.

On November 10, 2004, Google opened up its Google Advertising Professional program.

On November 18, 2004, Google launched the Google Scholar search program.

On January 21, 2005, Google opened up a free cross-platform ad tracking service.

Geico took Google to court for trademark violation for allowing the Geico name to be used as a keyword trigger for ads. Geico lost this US-based case on December 15, 2004. Google lost a similar French trademark case brought by Le Meridien Hotels on December 16, 2004.

On January 18, 2005, Google, MSN, and Yahoo! announced support for a nofollow link attribute (rel="nofollow"), which allows blog owners to stop comment spam from passing link popularity. It will not deter spam bots, though, and it was quickly adopted by many non-blog sites. Wikipedia was the first major non-blog site to use nofollow.
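To show what the attribute does, here is a minimal Python sketch of a link extractor that counts only links allowed to pass link popularity. It skips any `<a>` tag whose `rel` attribute contains `nofollow`. The class name `VotingLinkParser` and the sample page are hypothetical. The sketch shows only the markup and the filtering idea, not how any search engine actually handles nofollow internally.

```python
from html.parser import HTMLParser


class VotingLinkParser(HTMLParser):
    """Hypothetical extractor: separates links that may pass link popularity
    from links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.voting_links = []
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "external nofollow";
        # a valueless attribute arrives as None, so fall back to "".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow_links.append(href)
        else:
            self.voting_links.append(href)


# Hypothetical blog page: the author's own link is normal,
# while the link left in a comment is marked nofollow.
page = """
<p>Read my <a href="http://example.com/post">latest post</a>.</p>
<div class="comment">
  <a href="http://spam.example/pills" rel="nofollow">cheap pills</a>
</div>
"""
parser = VotingLinkParser()
parser.feed(page)
print("Counts as a vote:", parser.voting_links)
print("Ignored (nofollow):", parser.nofollow_links)
```

The comment spammer's link still appears on the page and can still be clicked. It simply falls out of the "vote" list, which is why nofollow removes the ranking incentive but does not stop spam bots from posting.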

The NineMSN Search Engine

In 1998 MSN Search was launched. The Open Directory and Direct Hit were also launched in 1998.

Disney released the Go Network, which has lost much of its popularity since 1999. FAST released its search technology, which is thought to be Google's closest competitor.

In 2000 the Teoma search engine was released, which uses clustering to organize sites by Subject Specific Popularity. In 2001 Ask Jeeves bought Teoma to replace the Direct Hit search engine.

In 2003 Google released a contextual ad program named AdSense, which allows people like me to earn revenue from the automated placement of relevant ads on our pages.

In late 2003 (November 15, to be specific), Google began introducing many more semantic elements into its search product. Researchers and SEOs like me have noticed the wild changes in search relevancy, but many searchers remain unaware of them.

The Other Search Engines

LookSmart bought the WiseNut search engine in March 2002 and has used it to power the back end of its search. One of the largest problems with LookSmart is that its directory-first mentality has hurt its relevancy.

In 2004 MSN dropped LookSmart and switched to Inktomi to power its results. LookSmart struggles to make ends meet because it is putting the cart before the horse. RELEVANCY WINS DISTRIBUTION.

In 2003 Overture purchased AllTheWeb and AltaVista. Yahoo gobbled up Inktomi and Overture.

In 2004 Yahoo dumped Google in favour of its own in-house search engine. Yahoo!'s crawler, Slurp, is believed to be collecting data for a new database separate from the Inktomi database. The new Yahoo! database replaced both AltaVista and AllTheWeb in March 2004.

The latest in Search Engines

Microsoft is investing heavily in developing new search technology, which should give the other major search engines some cause for concern.

Over the course of this history, many smaller search engines have come and gone as the search industry has struggled to balance profitability and relevancy. There are niche-specific engines and meta engines, and in 1997 Overture (then named GoTo) launched the pay-per-click variety.

Meta engines search multiple other engines at the same time. The idea is that by drawing from multiple sources, they can refine the results to a higher quality. The problem with meta search engines is that they are usually overstuffed with advertisements; you are only as strong as your weakest link. InfoSpace powers most of the larger meta search engines.

The newest search engine concepts are web site clustering, semantics, and smaller industry-specific search engines and portals.

Nutch and Dipsie were due to launch in 2004 but have not yet made huge waves.

Nutch - open source search engine
Dipsie - an upcoming large search engine that claims it will index over 10,000,000,000 documents this year
Acoona - endorsed by Bill Clinton, but out of the gate its relevancy was a bit questionable

MSN Beta began to power a large portion of the MSN Search queries on January 20, 2005.

What is the future of Search Engines?

Some of the things that search engines will do in the future include:

Image Scanning Search Engines

Image scanning is clearly one of the major upgrades that will soon be possible with search engines. In early 2004 Princeton released its 3D search engine, which can search for images resembling what you sketch.

Streaming Media Search Engines

Singingfish is already offering streaming media searches. This is interesting today, but it will be boring in a few years. There is nothing exciting about connecting information to hungry minds if you are not interesting. I am not actively involved in the future of search engines, but I am interested and excited, to say the very least.

Voice Recognition & Emotion Understanding

In the future, computers will become better at understanding speech and matching the appropriate words to the sounds we make. Dragon NaturallySpeaking already does a good job of this. Also, as the information revolution takes place, computers will become better able to understand emotion and what we are really "searching" for. Eventually, advanced biofeedback monitoring equipment will help us discover our true passions and what we want (what we are searching for).

Better Resources to Search Through

Currently programs like Google AdSense encourage the creation of solid content. This will improve the quality of content which search engines are able to find - currently one of their limiting factors.

Search Engines Indexing Dynamic Content

On the technical back end, computers also need to be able to follow links and dynamic content with greater ease. This will require the design and implementation of ultra-premium spider monitoring software. Meanwhile, collecting information about the world around us (and the worlds around it) will only become easier. As the pool of data continues to grow, so will the quality of distributed computing.

Yahoo!'s content acquisition program aims to index more dynamic content. Dipsie is due to launch in 2004 with the ability to fill in and submit form boxes.
