Friday, October 10, 2008

What to know about Web search engines... Week 7 Readings

This week's readings are:

David Hawking, Web Search Engines: Part 1 and Part 2. IEEE Computer, June 2006.

M. Henzinger et al., Challenges in Web Search Engines. ACM SIGIR 2002.


First up, David Hawking's Web Search Engines Part 1:

Crawling, what a name to describe a computer function! This reading is basically a clear and concise bare-bones description of web search engine infrastructure, and it explains the process and function of 'crawling'. Honestly, it blows my mind that something that takes mere milliseconds on the user end is actually a fairly involved process that includes queues, fetching, sifting through URLs, and organizing, among other things. But really, these things happen so fast!
Besides being amazed at the speed at which these things happen, I was also wondering what exactly changed. People (computer scientists?) apparently once thought the web could never be indexed, and yet it is now indexed with much more information and at a far greater speed than probably ever imagined. I wonder if it had to do with the fact that someone had the foresight to see that web search engines could and would be hugely profitable, and thus indexing became possible? This reading also struck me as making search engines sound much more capable than they actually are. Okay, not capable, but maybe accurate? I guess, ideally, web search engines always link to useful information, but in my experience, depending on the search engine, it is not always the most useful information.
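
For anyone curious what the queue-and-fetch loop might look like, here is a minimal Python sketch of the crawling idea (a queue of URLs, fetching, and skipping URLs already seen). It is only a toy under stated assumptions: the `TOY_WEB` dictionary and the `fetch_links` helper are made up for illustration and stand in for real HTTP fetching and HTML parsing, and a real crawler would also have to deal with politeness, robots rules, duplicates and spam.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch pages over HTTP and parse the links out of HTML.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://c.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def fetch_links(url):
    """Stand-in for fetching a page and extracting its links (assumption)."""
    return TOY_WEB.get(url, [])


def crawl(seed_urls, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were fetched."""
    queue = deque(seed_urls)  # URLs waiting to be fetched
    seen = set(seed_urls)     # URLs already queued, so none is fetched twice
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)
        for link in fetch_links(url):
            if link not in seen:  # sift out URLs we already know about
                seen.add(link)
                queue.append(link)
    return fetched


if __name__ == "__main__":
    print(crawl(["http://a.example/"]))
```

The `seen` set is what keeps the crawler from going around in circles when pages link back to each other, which is a big part of the "sifting through URLs" step.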

PART 2
Oh, indexing! It is what makes googling terms and phrases so much easier for us! This reading breaks down the processes behind search engines and explains the functions and algorithms that make searching for things on the Internet via search engines so easy and quick.
I found this reading very useful for its explanation of how things are indexed and how search results are generated.
I just wonder about the development of algorithms and functions that make searching and indexing even more accurate and useful. Are computer scientists and programmers done, or is the technology constantly changing? Are there any standards involved? Are companies required to share new algorithms or processes they come up with, or are they allowed to keep the technology to themselves?
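
To make the indexing idea a bit more concrete, here is a small Python sketch of an inverted index, the basic "term to list of documents" structure that makes lookups fast. This is a simplified illustration, not the exact design from the reading: the `DOCS` sample, the function names, and the whitespace-only tokenizing are all assumptions, and real engines add ranking, term positions, compression and much more.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by a made-up document id.
DOCS = {
    1: "web search engines crawl the web",
    2: "search engines index pages",
    3: "spam pages annoy users",
}


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Copy the first term's set so intersecting does not alter the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "search engines"))
    print(search(idx, "web pages"))
```

The point of building the index ahead of time is that a query only has to look up a few terms and intersect their document sets, instead of scanning every page.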


Challenges in Web Search Engines~ Monika R. Henzinger, Rajeev Motwani and Craig Silverstein
This reading provides a much more in-depth discussion of the processes and issues involved in web search engines' information retrieval. The article covers issues such as spam, quality of content, web conventions, duplication, etc.
I found this article informative, but my questions are more about quality evaluation. Commercial search engines use user-behavior data to evaluate ranking, thus (hopefully) providing higher-quality information. However, when reading this, I wondered: do commercial search engines really use this just to help users? My experiences with searching left me wondering about this, and about how much true quality really counts to search engine companies.
Also, spam=so annoying.

Overall, each of these readings did a wonderful job of explaining processes, functions and algorithms associated with web search engines. I don't believe it is possible to read about everything that is involved with algorithms, quality of content, indexing and organizing and not be blown away.
