Wednesday, July 30, 2008

The Search Engine That Could

The cool thing about the technology behind search engines like Google is that it can find ANYTHING. Whether you want recipes, telephone numbers, the latest news, or pictures of your favorite rock star, you can type it into a search engine and find what you want. The technology is pretty complex, because you're not just looking for a phrase in a large body of text; sometimes you need to understand the semantics of the language to pick up on key phrases or to judge how relevant the sites you index are. It's very hard to get right. But when you do, you've got a money maker on your hands.
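To make the relevance point a bit more concrete, here is a minimal sketch of one classic way to score relevance: TF-IDF (term frequency times inverse document frequency). This is just my illustration, not how Google or anyone else actually ranks pages; real engines combine many more signals. The function names are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending TF-IDF score.

    A document's score is the sum, over query terms, of the term's
    frequency in that document times log(N / df), where N is the number
    of documents and df is how many documents contain the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for i, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:  # skip terms absent from this document
                idf = math.log(n_docs / df[term])
                score += (counts[term] / len(tokens)) * idf
        scores.append((i, score))

    # Highest score first; the sort is stable, so ties keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With a tiny collection like `["fresh pasta recipe with tomato", "phone numbers directory", "tomato soup recipe", "rock star pictures"]` and the query "tomato recipe", the short soup document comes out on top, because the query words make up a bigger share of it.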

Even though we use search engines for pretty much everything online, I think they are not doing anywhere near as much as they could. Here's what I mean. Search engines are natural and essential whenever you have a database, and you can market a search algorithm for almost anything.

If you have a database of people's information, you need a search engine to find people. If you have a database of clothes, you need a search engine to find the clothes you want. Amazon, iTunes, and Netflix are all high-profile web companies that would dissolve into chaos if their search algorithms went bad. And I would suggest that all such algorithms are connected in some way.

Suppose someone writes a blog post analyzing the moral complexities of the relatively unknown comic Dodobirdman, and it gets a lot of incoming links in the next few hours and lands on the front page of Digg the next day. Naturally, when you're browsing for books and type in "birdman," the store should suggest Dodobirdman and automatically show the newest volume of the comic, as well as Dodobirdman backpacks and, of course, the popular Dodoman key chains. Conversely, when a book gets rave reviews and becomes a bestseller, web pages that refer to things related to the book should get more attention.
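Here is a rough sketch of how that cross-over could work in a store's search box, assuming the store somehow gets a feed of inbound-link counts per topic from the web at large. The `Product` type, its `topic` tag, the `web_links` feed, and the `buzz_weight` knob are all hypothetical, and the buzz score (log-damped link growth) is just one of many possible choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    topic: str  # hypothetical tag tying the product to a topic on the web


def buzz(links_before, links_now):
    """Log-damped relative growth in inbound links (0 when links did not grow)."""
    growth = max(0.0, (links_now - links_before) / max(links_before, 1))
    return math.log1p(growth)


def suggest(query, products, web_links, buzz_weight=0.5):
    """Rank products whose name contains the query by text match plus web buzz.

    web_links maps a topic to (links_before, links_now): a hypothetical feed
    of inbound-link counts measured over some recent time window.
    """
    q = query.lower()
    ranked = []
    for p in products:
        if q not in p.name.lower():
            continue  # only products whose name contains the query
        # Exact name matches start ahead of partial matches.
        match = 1.0 if p.name.lower() == q else 0.5
        before, now = web_links.get(p.topic, (0, 0))
        ranked.append((match + buzz_weight * buzz(before, now), p.name))
    # Highest combined score first; stable sort keeps catalog order on ties.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]
```

With no link data, a search for "birdman" puts an exact "Birdman" title first; once the Dodobirdman topic jumps from 10 to 500 inbound links, the Dodobirdman volume and backpack overtake it. Running the idea the other way (bestseller sales boosting related web pages) would just swap which side supplies the buzz.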

So the first idea is that a search engine company should be a search engine company. Sure, website search engines are very different from Amazon's recommendation system or the Digg algorithm, but they are all related. Why should a company spend money building hardware with enough computing power to analyze all of its data, plus the staff required to maintain it, or write an SVM or a multiple-tree model and constantly update it with the latest machine learning research, when it has the option of outsourcing the job to a reliable company?

There are many ways of providing such a search engine service, and it might be pretty tricky to find the right one. The content and all the "signals" (as Google puts it) may be the client's confidential information, and the search algorithm would be the service's business secret, but the algorithm needs the data to work, and when you're optimizing something like that, you're bound to access a little bit of both. One way is to build servers with a collection of different search engine tools built in, on which you can put your data and have it processed the way you want. Even if the hardware isn't top-notch, if the software is unique, companies would buy them up like hot cakes.

I could probably write more about the subject, but unless you're Microsoft, Yahoo, or Google, you probably won't be interested, so let's move on to IDEA 2!!!

One of the problems with search engines is the trade-off between effectiveness and privacy. You can learn all of a particular user's interests and recommend everything they're interested in with high accuracy, but you would need training data that people don't really want big-name companies keeping in their databases. If search engines can already provide relevant results now, imagine what they could do with your private data.

So idea 2 would be to create a client-side search engine that keeps track of the things you buy, the searches you do, and the kinds of websites you like, without revealing your information to anyone else in any way. It will be your personalized search engine. When you encounter any kind of large collection, say, search results from Google or recommendations from buy.com, the local search engine would take those results and reorder them based on the user's past behavior. Of course, to protect its user's privacy, it would have to download information indiscriminately rather than only about the things it actually wants to look for, so that no server can tell what the user is interested in. That increases bandwidth use, but bandwidth matters less and less to companies.
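A minimal sketch of the local re-ranking step, assuming the results from the remote engine have already been downloaded as plain-text titles. The `LocalProfile` class, its word-count profile, and the `keep_weight` blend are hypothetical choices for illustration; a real plug-in would persist the profile to disk and use much better features. Nothing in it touches the network, which is the whole point.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalProfile:
    """A per-user interest profile that never leaves the user's machine.

    Hypothetical design: every purchase, search, or visited page title is
    reduced to word counts. This class makes no network calls.
    """

    def __init__(self):
        self.interests = Counter()

    def record(self, text):
        """Update the profile from something the user bought, searched, or read."""
        self.interests.update(words(text))

    def affinity(self, text):
        """Sum of the user's interest counts over the distinct words in a result."""
        return sum(self.interests[w] for w in set(words(text)))

    def rerank(self, results, keep_weight=1.0):
        """Reorder results received from a remote engine.

        Each result's score blends its original position (earlier is better,
        scaled to at most keep_weight) with its local affinity, so an empty
        profile leaves the remote ranking unchanged.
        """
        n = len(results)
        scored = [
            (self.affinity(r) + keep_weight * (n - i) / n, r)
            for i, r in enumerate(results)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored]
```

For example, after recording "Dodobirdman volume 2 comic" and "search: dodobirdman key chain", the results `["Birdman (film) review", "Bird watching guide", "Dodobirdman backpack"]` come back with the backpack first, while the other two keep their original relative order.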

If people find what they want, they'll buy more things. So the development of this software (like a browser plug-in, or a browser feature) could be a collaborative effort among major web companies. That would give the sponsors the advantage of understanding the inner workings of the local search engine for optimization, while still giving startups the means to take advantage of local search with little investment.

If this program REALLY behaves itself, maybe it could also expand to showing little AdWords-like popups when you aren't online at all. It could respond to the text it sees you typing into your computer, or to the programs you have running. If the idea evolves to that stage, though, there are many other things you can do with this personal e-butler. It might become so good that you never miss out on any Dodobirdman memorabilia ever again! WOOT!