Monday, April 23, 2007

web history

After pulling my hair out on many occasions, and after a (rather civilized, I must add) rant about not being able to find what I had only recently seen, there is finally something that claims to be an anodyne, at least in this regard.

Everyone who had to has already blogged about Google's latest offering, so I'm not going to do that. These are just the first few thoughts that crossed my mind when I read about it.

A little voice in my head told me that it might not be wise to jump into it. I know this doesn't make sense and probably sounds paranoid, but Google already knows too much about me. This is very personal, private data, and I will have no control over it. Hell, I'm all for gathering data and for trusting organizations when they say the data will only be used in aggregate and all that. Has my Google usage crossed some threshold in my mind?

Give me a desktop solution and I'm all for it. Why can't this be a browser plugin? Or something similar to Google Desktop, where the data and the index sit on my desktop? Yes, the data can grow too large; yes, I could lose the data. Okay, let's work this one out.

What is important is the index. The data can reside on an open server for all I care. So, as I browse, the pages get cached locally and indexed. Once the data is older than a day or two, it gets pushed off to the server, freeing up space on my local hard drive. The index stays on my desktop and points to the data on the server. This solves the problem of the data growing too large. What about the index? Wait a moment... I only have 24 hours in a day to surf the web. How large can the index grow? With today's desktop hardware almost as powerful as the "cheap commodity hardware" the internet behemoths love to brag about, I think I can be self-sufficient. As for losing the index, well, I should be able to back it up to the server regularly, with the backup locked by a key that I provide.
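Here is a minimal Python sketch of that split, assuming a plain word-to-URL index and using an in-memory dict as a stand-in for the remote server. The class name `HistoryStore`, its method names, and the two-day cutoff are hypothetical choices for illustration, not anything Google or any browser offers. The key-locked backup of the index is left out, since doing that properly needs a real cryptography library rather than a few lines of standard-library code.

```python
import time
from collections import defaultdict

# Hypothetical cutoff: "a day or two" in the post; two days assumed here.
MAX_LOCAL_AGE_SECONDS = 2 * 24 * 60 * 60


class HistoryStore:
    """Sketch: the index always stays local, page content moves to a
    (simulated) server once it is older than the cutoff."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of URLs, kept on the desktop
        self.local_cache = {}          # url -> (visit timestamp, page text), recent pages only
        self.server = {}               # url -> page text; stands in for the remote server

    def visit(self, url, text, now=None):
        """Cache a visited page locally and add its words to the local index."""
        now = time.time() if now is None else now
        self.local_cache[url] = (now, text)
        for term in text.lower().split():
            self.index[term].add(url)

    def offload_old_pages(self, now=None):
        """Move pages older than the cutoff from the local cache to the server.
        Returns the URLs that were moved."""
        now = time.time() if now is None else now
        stale = [url for url, (ts, _) in self.local_cache.items()
                 if now - ts > MAX_LOCAL_AGE_SECONDS]
        for url in stale:
            _, text = self.local_cache.pop(url)
            self.server[url] = text
        return stale

    def search(self, term):
        """Look the term up in the local index, then fetch each page's
        content from wherever it currently lives."""
        results = {}
        for url in sorted(self.index.get(term.lower(), ())):
            if url in self.local_cache:
                results[url] = ("local", self.local_cache[url][1])
            else:
                results[url] = ("server", self.server[url])
        return results
```

For example, a page visited three days ago is moved to the server on the next offload pass, but a search for one of its words still goes through the local index, which reports that the content now lives on the server.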

The problem with this approach is that you can't pool the data of hundreds of thousands of users and come up with really powerful analyses and intelligent software, which is the whole point of having something like Web History.

If only some other company had decided to build it....


Pallavi Nopany said...

Wait a second, so are you saying that those indexes are actually stored somewhere in serverland? Or has Google taken measures to delete those indexes?

Umang said...

Yep. All on serverland. Scary, huh?
