Thursday, July 16, 2009

Crawling the web for...infringements?

While in class, you find yourself utterly bored. You try to go to your favorite site, only to find out that it has been blocked. Oh no. Happiness does end, after all. But you do not give up. You resort to whatever you think can bring you happiness, and so you stare at your homepage. You think: what could you possibly google? You take a look around, and after realizing that there's no one worth looking at, you decide to google an image of someone you think is worth looking at.

You find a good picture and click the link, and you find yourself staring at the entire page that contains it, framed inside the search engine's own window.

That is what the web crawler does.

It's hard to explain what a web crawler does, but I guess that pretty much explains it. Whenever you click on an image or any other link and the entire page appears inside another box within the search engine's window, that is made possible by a web crawler. That's the explanation for us lay people. Wikipedia defines a crawler as a computer program that browses the internet in a methodical, systematic manner. The online encyclopedia further adds that crawlers, which are used by search engines such as Google and Yahoo, make their own copies of the pages they visit, which the search engine later processes and indexes to make searching faster.
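For the curious, here is a toy sketch in Python of the two things Wikipedia describes: systematically following links from page to page, and keeping a copy of every page visited so it can be indexed later. This is not how Google or Yahoo actually crawl. The "web" here is a hypothetical in-memory dictionary mapping page addresses to HTML, and the names `crawl` and `LinkAndTextParser` are made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Hypothetical helper: collects <a href> targets and the page's words."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(w.lower() for w in data.split())


def crawl(web, seed):
    """Breadth-first crawl of a toy `web` (dict of URL -> HTML) from `seed`.

    Returns (copies, index): the stored copy of every page visited, and an
    inverted index mapping each word to the set of URLs that contain it.
    """
    copies = {}
    index = {}
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        if url in copies or url not in web:
            continue  # already visited, or a dead link
        html = web[url]
        copies[url] = html  # the crawler keeps its own copy of the page
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any trailing text still buffered
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        frontier.extend(parser.links)  # follow links systematically
    return copies, index


# Usage on a two-page toy web (addresses use the reserved .example domain).
toy_web = {
    "a.example": '<p>Beach photos</p><a href="b.example">more</a>',
    "b.example": '<p>Holiday photos</p><a href="a.example">back</a>',
}
copies, index = crawl(toy_web, "a.example")
print(sorted(copies))           # ['a.example', 'b.example']
print(sorted(index["photos"]))  # ['a.example', 'b.example']
```

Note that `copies` holds the full HTML of each page, which is exactly the stored copy that makes the intellectual property question below interesting.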

So you may ask: is this just your attempt at stretching the topic so you would have an excuse to post pictures of Victoria's Secret supermodels? Frankly, yes; however, there is an intellectual property rights issue here. Basically, what a crawler does is copy the site and reproduce it in the search engine's window. Take note: the crawler is able to recreate, repost, and store the webpage for people to view. The link to the source website is still posted, but even with the link there, a person can always choose not to browse outside the search engine's window and content himself with the page the search engine has retrieved for him.

This, to me, presents two issues. First, are websites even protected by patents? Sure, the labels and names can be protected as trademarks, but what about the page itself, i.e. the page that contains particular elements arranged in a certain manner, with specific information? Second, if websites can be protected by patents, is the complete and absolute reproduction of a page, even one that provides all the links to the source, an infringement under the law? There's also the issue of profit lost by some websites because of the crawler: monetized websites, or sites whose earnings depend on the number of hits they receive, may actually lose profit when the hits stop at the search engine instead of reaching these pages.

Yes, there IS an issue, after all.

Web crawler definition from en.wikipedia.org.

1 comment:

Daniel Lising said...

I think websites come under copyright, not patents or trademarks. However, only the expression of the idea is protected, not the idea itself.