Wednesday, October 08, 2008

A.I.: Modeling reality through supposed (uncertain) associations

If you have an account at Amazon, you may have noticed that somewhere on the screen after you log in, the system produces a list of recommendations of things that you may find interesting. This is a little project started by Greg Linden. You could consider this some kind of A.I. engine. The basis of the idea is the assumption/claim that an association exists: when customer A buys books X and Y and customer B buys only book X, customer B may also be interested in book Y. This model can be extended further by logging what has been in the shopping cart at some point in time, since such an item is probably of interest to a person whether or not they end up buying it.
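To make the A/B/X/Y idea concrete, here is a minimal sketch in Python. It assumes purchases are kept as a mapping from customer to a set of titles; the name `recommend` and the scoring by number of shared purchases are my own illustration and say nothing about how Amazon's engine actually works.

```python
from collections import Counter

def recommend(purchases, customer, top_n=3):
    """Suggest items bought by customers who share at least one purchase."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(own & items)  # how many purchases the two customers share
        if overlap == 0:
            continue  # no shared purchase, so no assumed association
        for item in items - own:
            scores[item] += overlap  # more shared purchases, stronger hint
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical data: A bought X and Y, B bought only X.
purchases = {
    "A": {"X", "Y"},
    "B": {"X"},
    "C": {"Z"},
}
print(recommend(purchases, "B"))  # prints ['Y']
```

Customer C shares nothing with anybody, so the sketch has nothing to recommend there, which is exactly the "sparse information" problem.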

Does the relationship really exist? Probably in x % of the cases the relationship is real, but I have bought books for my wife, for example, and since then the engine keeps recommending me books on corporate social responsibility. Although I do find the topic interesting, I'd rather hear summaries about it than dive into a 400-page bible describing it :).

But such is life then. A computer has very sparse information about online customers to reason with. And once you develop such technology, it's a good thing to shout about it, since it's good marketing. However, the point of this story is not to evaluate the effectiveness of the algorithm or engine behind Amazon recommendations; it's to show that these A.I. systems are not necessarily that complicated.

The first thing to do is to understand that modeling this space is all about finding sensible relationships / associations. In Amazon's case, at one end of the spectrum is a customer that you may be able to profile further. Do you know their age? What is their profession? Are they reading fiction/novels? Are they reading professional books? Where is their IP from? Can you find out if they're behind a firewall of a large company / university? When you send them some material to try them out, did they click your links? Did they then also buy the book? And why? Of course, you wouldn't start by finding out as much as possible, but you need to think about which properties of a customer are important and figure out a way to determine them.

At the other end of the spectrum are books, waiting for readers. A book has a category, a total number of pages, a target age group, and customer reviews with stars describing its popularity. Some are paperbacks. Some are always sold once put in the shopping cart, others are removed later, and some are clicked on when you send small campaigns to a select customer group. Thus, very soon, the two domains are somehow married together in the middle, but in many different ways, some of which cannot be analyzed with great certainty.

A little bit of data-mining helps here to test the certainty of your hypothesis. The next step is to think of a model where you put these things together. You could consider using a neural network... but why? That'll work well for data that is more or less similar, but can consumer behaviour really be considered that way?

Other approaches consider production rules. These are not much different from IF-THEN rules, except that they are not processed in the order in which they are declared in the program. The problem here lies in the fact that you have millions of books that you may be able to match to millions of customers, and testing every possible combination would certainly cost a lot of processing cycles for nothing. So you need some more intelligence to wisely pre-select sets.
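A rough sketch of what "wisely pre-selecting sets" could look like: index the books by category first, then run the rules only against books in categories the customer has already shown interest in. Everything here (the field names `category`, `pages`, `stars`, `max_pages`, the two rules, the sample titles) is a made-up assumption for illustration, not a real rule engine.

```python
from collections import defaultdict

def build_index(books):
    """Map each category to the set of titles in it (built once, up front)."""
    index = defaultdict(set)
    for title, info in books.items():
        index[info["category"]].add(title)
    return index

def candidates(customer, index):
    """Pre-select: only books in categories the customer cares about."""
    found = set()
    for category in customer["categories"]:
        found |= index.get(category, set())
    return found - customer["owned"]

# Hypothetical rules; each takes (customer, book) and returns True or False.
RULES = [
    lambda c, b: b["pages"] <= c["max_pages"],
    lambda c, b: b["stars"] >= 4,
]

def matches(customer, books, index):
    """Apply every rule, but only to the pre-selected candidates."""
    return sorted(
        title
        for title in candidates(customer, index)
        if all(rule(customer, books[title]) for rule in RULES)
    )

books = {
    "Intro to AI": {"category": "ai", "pages": 300, "stars": 5},
    "AI Bible": {"category": "ai", "pages": 900, "stars": 5},
    "Weak AI Book": {"category": "ai", "pages": 150, "stars": 2},
    "Cooking Basics": {"category": "cooking", "pages": 200, "stars": 4},
}
customer = {"categories": {"ai"}, "owned": set(), "max_pages": 400}
print(matches(customer, books, build_index(books)))  # prints ['Intro to AI']
```

The cooking book is never even looked at by the rules, which is the whole point: the index does the cheap filtering, the rules do the expensive part.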

The ideal thing would be to develop a system that is perfectly informed. That is, it knows exactly what your interests are at a certain time and it tries to match products against those interests. There are two problems here. First, consumer behaviour tells us nobody is going to stop at a website to enter their interests. Second, a customer may not know they're really looking for something until they see it. The second reason is much more interesting, since it's "impulse" buying to a high degree. Exactly what you'd need.

Well, and in case you were expecting a finale where I give you the secret to life, the universe and everything.... :)... This is where it ends. There is no other final conclusion but to understand that a server in 2008 cannot have perfect information about you, especially not when you choose to be anonymous and known at the same time.

So... reasoning and dealing with uncertainty it remains. The effectiveness of recommendations is highly dependent (100% dependent actually) on the relationships and associations that you assume in the model. In the case of Amazon, they started with "what customer A bought, customer B might also find interesting", and developed their concepts further with "wish lists" and by mixing in other information. That still does not capture interests that arise suddenly, which is generally what happens when changes occur in your life. You may for example start buying a house, start a new course in cooking, start a business, or have a colleague who talked about DNA and find it really interesting.

Also, chances are that once you've bought books about a subject and let's say it's technical, you're saturated by that knowledge (or author), and thus your interest wanes. The recommendations you'll see are very likely bound to the same domain. So they are not nearly as effective (except for those who are totally consumed by the subject :).

As a change, you can also attack this from a totally different angle. The information you can build up about your products can be very deep. You could theoretically use consumer behaviour to find out more about your products, rather than applying it to understand your customers better. The idea is to generate intricate networks of associations between your products. Then link those associations back to anonymous users later on. The more you know about your products and the hidden associations they may have, the more quickly you can react to anonymous demand. You could also use it not to search for books with a certain term in the title or text, but to find books that are ontologically related to the term.
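One simple way to start such a network of product associations, sketched under the assumption that all you have is anonymous baskets (lists of titles that ended up in the same cart): count how often each pair of products shows up together. The function name `product_associations` and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def product_associations(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order: (a, b) with a < b
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical anonymous baskets; no customer identity is needed.
baskets = [["X", "Y"], ["X", "Y", "Z"], ["Z"]]
counts = product_associations(baskets)
print(counts[("X", "Y")])  # prints 2
```

Note that nothing in here knows who the customer is. The associations belong to the products, so they can be used for a visitor who has not logged in at all.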

For example, a customer types "artificial intelligence". It's tempting to show books about A.I., but is it really what the customer is looking for? You could make this into a kind of game. Start with a very generic entry point, then quickly zoom in on an "area of interest", which is interconnected with a host of products, books and other types. Then start showing 5 options that allow the user to browse your space differently. Always show 20 sample products after that. When a user clicks a specific product, it gets a score that binds the product to the terms selected (the path), and it's shown again to other users with the same or a similar path. The higher a product scores (the more popular it is), the more often it'll automatically pop up.
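The click-scoring part of this game could be sketched roughly as follows. The class name `PathScores`, its methods, and the sample terms are all my own invention; the sketch only handles exactly matching paths, and matching "similar" paths would need some extra notion of similarity that I leave out.

```python
from collections import defaultdict

class PathScores:
    """Remember which products get clicked after which sequence of terms."""

    def __init__(self):
        # path (tuple of terms) -> product -> number of clicks
        self.scores = defaultdict(lambda: defaultdict(int))

    def record_click(self, path, product):
        """Bind a clicked product to the path of terms that led to it."""
        self.scores[tuple(path)][product] += 1

    def top_products(self, path, limit=20):
        """Products for this exact path, most clicked first (ties by name)."""
        ranked = self.scores.get(tuple(path), {})
        return sorted(ranked, key=lambda p: (-ranked[p], p))[:limit]

scores = PathScores()
path = ["artificial intelligence", "robotics"]
scores.record_click(path, "Book A")
scores.record_click(path, "Book B")
scores.record_click(path, "Book B")
print(scores.top_products(path))  # prints ['Book B', 'Book A']
```

The default `limit=20` mirrors the 20 sample products mentioned above. A path nobody has walked yet simply returns an empty list, so you would still need some fallback selection for new paths.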

The above model could then be expanded. The idea is that you're not just seeing products that are easily related to the domain of interest, but also products that have less obvious relationships to it. That allows customers to see things they wouldn't have looked for themselves, and it can pique their interest. It's a bit like entering a store without exactly knowing what you want. It's also a bit like searching on the internet. Who knows what you generate if you don't allow the most obvious associations, but only the less obvious ones (which could be part of the heuristics/score).
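One possible reading of "only the less obvious ones", as a small sketch: treat the product associations as a graph and return only the products that are two steps away, skipping the direct neighbours. The graph layout (a dict of product to set of associated products), the function name, and the sample products are assumptions made for the example.

```python
def indirect_associations(graph, product):
    """Products two hops away that are NOT directly associated."""
    direct = graph.get(product, set())
    indirect = set()
    for neighbour in direct:
        indirect |= graph.get(neighbour, set())
    # Drop the obvious ones: the product itself and its direct neighbours.
    return indirect - direct - {product}

# Hypothetical association graph.
graph = {
    "ai book": {"robotics book"},
    "robotics book": {"ai book", "lego kit"},
    "lego kit": {"robotics book"},
}
print(indirect_associations(graph, "ai book"))  # prints {'lego kit'}
```

Someone looking at an A.I. book gets shown a Lego kit, which they would probably never have searched for themselves.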

Just be careful not to get this scenario. :)
