Friday, January 29, 2010

ATI Stream on Ubuntu 9.10 Karmic

Just as Nvidia has CUDA for GPU programming, ATI cards have an SDK of their own, called ATI Stream, to get those hardware threads revving for complex, scientific or brute-force problems. I have a 64-bit OS with a quad-core Intel i920 and one ATI Radeon HD5850 on Ubuntu Karmic 9.10. On the 27th, new drivers for the ATI card were released on the ATI website, which, together with the nopat flag, I think have resolved my problems getting the SDK to run.

After downloading the SDK, you need to set LD_LIBRARY_PATH so the system can find the OpenCL shared libraries and whatever else is in lib. The installation documents aren't that easy to find either; they're hidden two levels deep in links, but here's the full documentation for the SDK (the link to which could be stated more visibly on the ATI Stream page).

Officially, ATI Stream is not yet supported on Ubuntu 9.10. Initially I got only segfaults for any application, or it simply wouldn't find any devices, even after downloading and installing the latest Catalyst drivers from the ATI site (27-Jan-2010) and performing a full restart. However, when the "nopat" option is supplied as a kernel boot parameter, things work a lot better :). Right now, most samples seem to run, except for one specific sample that my card doesn't seem to support. Because Karmic uses grub2, the place to modify kernel boot parameters has changed: you need to edit /etc/default/grub instead of /boot/grub/menu.lst. I inserted nopat here:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nopat"

After these modifications (remember to run update-grub and reboot so the new kernel parameter takes effect), the /usr/lib/OpenCL/vendors links and the environment changes for LD_LIBRARY_PATH, yours should be working as well.
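
If you want a quick sanity check that OpenCL actually sees your card, a small script like the sketch below lists the available platforms and devices. It assumes you have the pyopencl Python bindings installed, which are not part of the ATI Stream SDK, so treat it as an optional extra rather than part of the official installation:

# List OpenCL platforms and devices to verify the installation.
import pyopencl as cl

for platform in cl.get_platforms():
    print("Platform: " + platform.name)
    for device in platform.get_devices():
        print("  Device: " + device.name)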

There is some more information available about OpenCL, which I think is a very nice advance towards ensuring the portability of GPU code to other platforms. A very good explanation of the CPU vs. the GPU can be found here :).

I'm planning to use it for some really new and innovative experiments involving different perspectives on neural networks, network spiking and so on. The idea is that I need a large number of parallel processors, which can be very modest in processing power, as they only need to move a tiny bit of data and perform extremely simple processing steps. The details and the complexity of the work are in the synchronization, so I'll be taking a look into that :).

Wednesday, January 27, 2010

What knowledge may actually be about...

I've been cycling in the rain today and suddenly got this awesome train of thought. Nothing much to do with rain, but I guess that the cooling characteristics of the cold rain and the cold weather probably caused some superconductivity somewhere. I can't go into too much detail about what I've come up with to resolve the general and (as I argue) incorrect assumptions below, but I can at least discuss the problems I perceived that might restrict the ability to develop more intelligent behaviours in computer programs. :)

To give you a starting frame of thought: I'm arguing that there are questionable assumptions about the origin, extent or representation of knowledge in many artificial intelligence reasoning models, in neural networks and whatever else is out there. Here is why.

Most computer programs function by taking some parameters from an environment, file or whatever as input and, based on that, performing some action. A program without input or output is not a computer program. The general idea is thus that by applying some function F(X) to a range of input parameters X, one can derive an action or actions Y that should most logically ensue. The diagram here helps to illustrate this further. It is a depiction of a so-called Markov diagram. These are often used in A.I. to reason about and between states. One can move from state to state along the transitions, but the arrows also restrict movement, so you cannot jump to just any other state. The diagram need not be statically defined a priori; there are sometimes ways to dynamically extend it at execution time, usually determined by a set of rules. M here is the entire world, consisting of possible transitions or relations R, states S (which is more of a placeholder in this case) and measurable parameters T from which you determine the current state S.
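
Just to make that structure concrete, here's a tiny sketch of such a model in Python. The state names and allowed transitions are made up for illustration; the point is only that the arrows restrict which moves are possible:

# A minimal state/transition model: movement is only allowed along
# the transitions (arrows) that are explicitly defined.
transitions = {
    "idle":        ["searching"],
    "searching":   ["idle", "approaching"],
    "approaching": ["searching", "at_target"],
    "at_target":   ["idle"],
}

def can_move(current_state, next_state):
    return next_state in transitions.get(current_state, [])

print(can_move("idle", "searching"))   # True
print(can_move("idle", "at_target"))   # False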

The above model works very well for a set of discrete parameters, since they always fit into one single state. For continuous parameters, however, it becomes more difficult, because if every single value is considered, the number of states in the model becomes infinite. Hence, classification is needed to reduce the number of states, or the development of some kind of 'probability distribution', where one is actually a little bit in all states at once, or one can come up with a fuzzy reasoning method for reasoning between just the two states S1 and S2 that are closest to the current 'continuous' state.
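
The fuzzy variant could look something like the sketch below, where a continuous reading is expressed as a degree of membership in the two nearest states instead of being forced into one hard state. The state centres are arbitrary illustration values:

# Express a continuous value as membership in the two closest states.
state_centres = {"cold": 5.0, "warm": 20.0, "hot": 35.0}

def memberships(value):
    # Pick the two centres closest to the value and interpolate between them.
    (s1, c1), (s2, c2) = sorted(state_centres.items(),
                                key=lambda kv: abs(kv[1] - value))[:2]
    w1 = max(0.0, min(1.0, 1.0 - abs(value - c1) / abs(c2 - c1)))
    return {s1: w1, s2: 1.0 - w1}

print(memberships(16.0))   # mostly 'warm', a little 'cold'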

The huge assumption made here is that all information leading to some desired action can be derived from a snapshot of the external world, just by inspecting some range of percepts X. Reasoning and intelligence are then reduced to the application of some function F(X), even though this can be quite a complicated affair when there is a possibly very large number of percepts X or a wide range of possible actions Y. The complexity due to the dimensionality of the problem is irrelevant for this conclusion, however. Even more worrying is that there's only one single assignment of intention or purpose in this entire system. The purpose is the definition of the function F(X) itself, and it'll be difficult to change this once the system has been constructed or trained... Hmmm... :). So one cannot expect this system to generalize a bit and then specialize further into other things.

So, to recap... having a function F(X) implies a direct, causal relationship between (a range of) perception(s) and a most logical (range of) action(s), which only holds if the purpose is fixed, pre-defined and doesn't change. Or worse... if the context does not change. Also, knowledge is then only defined as the function that needs to be applied to the percepts in order to derive the action Y; it is not contextual knowledge about what is required in which context. In effect, the only way to make the system better able to deal with situations is to measure more and more in that single snapshot and then try to fit the function F(X) to suit those measurements.

However, instead of considering that it is the state(s) that embed the information, one should consider that it is the transitions that allow one to collect the real knowledge about events, since transitions embody relationships. The transitions maintain the real information about some state S1, the next state S2, the difference in the percepts, the action(s) that was undertaken and whether S2 is actually a better state to be in than S1, thus deriving whether the action was desirable or not, given a certain context. Note that this doesn't yet require much information about a goal state Sg, so one might just as easily conclude that there is no direct need to know about the eventual goal state or have a real conception of what it looks like; we can at this point still assume we're slowly and gradually moving towards the goal state and that there's still a capability to change direction or impact the environment to reach the goal (but this is getting too complicated at this point).
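
As a rough sketch of what I mean, here's what recording knowledge in the transitions rather than the states might look like. Every field, name and value here is purely illustrative:

# Each record stores where we were, what we did, where we ended up,
# how the percepts changed and whether the step improved things.
transition_log = []

def record_transition(s1, action, s2, percept_before, percept_after):
    transition_log.append({
        "from": s1,
        "action": action,
        "to": s2,
        "percept_delta": percept_after - percept_before,
        "improved": percept_after > percept_before,  # was S2 better than S1?
    })

record_transition("searching", "turn_left", "approaching", 0.2, 0.6)
print(transition_log[-1])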

This is why I think that discarding the thought that there is a causal relationship between percepts and the most likely or desired action should be a first objective in order to develop more intelligent systems. Knowledge is more likely some kind of imagination or dreaming of the effect of actions from any given particular 'general state', and of the kinds of changes in perception by which one can measure such progress, all of this framed in a particular context. This should allow systems to reason with shorter horizons, or reason at different levels with different horizons, but always within specific and particular contexts at each level. The definition of how many transitions you allow before drawing conclusions, or the extent of correlation between events, is then part of the game and part of the experience...

Monday, January 25, 2010

Cybernetics

For my work, I've been looking at neural networks in more detail, specifically at how they may be applied to problems with temporal properties. Most artificial neural networks do not retain their values or activations as some kind of remanent activation. Such neural networks can only work in situations where the input applied to the network is complete enough to derive conclusions from. In that case, it doesn't matter whether there is a known formula for the correlation between input and output values or whether there is one, albeit a complex one, that is being approximated. Such a direct correlation after one single frame of observation is the typical neural network that you get today. More recent developments also give you recurrent networks or LSTM cells, but those model patterns or 'data' over time. Thus, they establish correlations or relations between sequences of events. For all these networks, you'd generally expect a mathematical explanation or approach to back up the learning methods applied to them (if they haven't been trained with evolutionary algorithms).

Even in the case of evolutionary algorithms, though, there is significant complexity involved in training the network so that it performs well, unless the input/output correlation in the classification or pattern-recognition task is very strong, which makes the problem quite easy. Basically, this means that in the same way that a formula calculates a set of output value(s) in a deterministic way (and 1+1 is always assumed to equal 2 and never changes), a neural network is just a very deterministic approximator. Given a trained network, the same input always yields the same output. This is true unless you use an RNN. But the RNN is designed to work on temporal problems, so given the same sequence, it yields the same output to classify or reproduce the sequence.
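
To make the determinism point concrete, here's a minimal feed-forward pass with fixed, made-up weights. Run it twice with the same input and you get exactly the same answer:

import math

def forward(inputs, weights_hidden, weights_out):
    # One hidden layer and one output layer, both with tanh activations.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
              for row in weights_hidden]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
            for row in weights_out]

w_hidden = [[0.5, -0.2], [0.1, 0.8]]   # 2 inputs -> 2 hidden units
w_out = [[1.0, -1.0]]                  # 2 hidden units -> 1 output
print(forward([0.3, 0.7], w_hidden, w_out))
print(forward([0.3, 0.7], w_hidden, w_out))   # identical, every time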

The picture above shows a neuron receptor that is involved in a process called chemotaxis. This process allows the worm to navigate in an environment where food is available. The worm has no eyes, can feel heat (thermotaxis), and wriggles its way through its environment. If the worm is wriggling towards a food source, it's climbing an increasing gradient in which the chemicals picked up by the neuron increase. When it climbs such an increasing gradient, the worm is less likely to make sharp turns. If the worm is on a decreasing gradient, or doesn't detect any food source, it is more likely to make sharp turns.
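
The behaviour boils down to something as simple as the sketch below: compare the current concentration to the previous one and adjust the probability of a sharp turn. The probabilities are invented for illustration, not taken from the biology:

import random

def turn_sharply(previous_concentration, current_concentration):
    # Climbing the gradient: mostly keep going straight.
    # Falling gradient (or no food detected): turn sharply much more often.
    if current_concentration > previous_concentration:
        p_sharp_turn = 0.1
    else:
        p_sharp_turn = 0.6
    return random.random() < p_sharp_turn

print(turn_sharply(0.2, 0.5))   # usually False
print(turn_sharply(0.5, 0.2))   # often True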

Such a problem sounds very simple to resolve, but remember that we have eyes. How do you locate a particular source of smell if you only have a nose? You need to move around in a room, being careful not to cause any harm to the body. Depending on the sensitivity of the olfactory sense and the distance to the smelly source, the distance between one sample and the next must be larger or can be smaller. The samples must also be taken within a certain amount of time, otherwise no significant difference can be observed.

The above shows how some organism, actor or agent must interact with the environment in order to sense it and draw conclusions about the layout or composition of that environment. Without the ability to perform actions in the environment, I don't think there can be any development of intelligence at all.

Such statements are derived from a particular field of knowledge called 'cybernetics'. That sounds like a science fiction term, I'm sure, but it's not just related to cyborgs, cyberspace, the internetz or what have you.

Cybernetics is a relatively recent term from the 1940s and 50s, defined by Norbert Wiener. The word is derived from the Greek "kybernētēs", which means steersman or helmsman, and it shares its root with 'government'. The field was initially intended to study self-regulating systems, but quickly expanded to other areas like management (Stafford Beer), biology and also computer science. The science is quite broad in scope now and poses questions about how systems make sense of their environment by interacting with it. Finding out what an environment is about isn't exactly the same as just perceiving it.

The difference with Artificial Intelligence is quite strong. Cybernetics is more theoretical, whereas A.I. has basically just gone ahead with a lot of assumptions, trying to wrap reality into mathematical models and executing them on a computer. My criticism of A.I. up to this point is that it is simply too deterministic for anything 'new' to come out of it. The term A.I. has been a bit bold as well. Sure, we have large-scale systems analyzing emails and other kinds of data that know how to extract information from them to make predictions, classifications or specific groupings of data. But those capabilities were intended by their designers. There is nothing such algorithms or machines can ever do differently; they can only execute those particular implementations.

In that sense, expecting anything truly intelligent to come out of A.I. as it is currently approached (through mathematical formulas) is hopeful at best. A better understanding of knowledge, its roots, how it may be represented over time and how it interacts with new environments is needed.

That is not to say that A.I. is useless. It's saying that A.I. certainly has approaches for doing pseudo-cognitive work, by helping to massage details out of very large mountains of data or finding specific correlations that we cannot even imagine. So there's certainly room for A.I. as we know it now, alongside some new kind of vision of what truly intelligent machines can bring us. Those kinds of machines, however, need a different basis than just mathematics and strong correlative behaviour. They need to find out for themselves what is important and have their own ways of interacting with the environment (which is how they find that out :).

Monday, January 11, 2010

Neural network dynamics

The picture to the left is an LSTM cell, which can be used in some neural networks to 'remember' sequences of previous inputs. It can therefore re-create outputs similar to those observed in previous runs, or it can be used to associate one thing with another in temporal terms. Anyway, the reason for posting today is that there are particular dynamics one should understand in order to choose the right 'kind' of network, or even to decide whether one should use a neural network at all. The more obvious design criteria are related to the explainability of certain classifications or outputs. If you expect a neural network to give you reasons for why it produced a certain classification or output, you're out of luck. So they have limited use in knowledge-based systems where there must be traceability of the observations that lead to some conclusion, or of the observations that need to be taken in order to derive a certain final conclusion (enrichment and convergence towards a certain diagnosis, for example). In such cases, you should definitely use different technology.

Input signals can be interpreted or preprocessed in many ways, and the exact method you choose may have complicating effects on the design required of the neural network that is supposed to resolve the problem. For example, the mine sweeper at http://www.ai-junkie.com/ is given its own orientation and the direction to the closest mine as input. Then you have a couple of weights, and the end result is the output at two terminals, which is defined as the activation of the right and the left track of the mine sweeper. The overall goal is to get mine sweepers that consistently reorient their own direction to match the direction of the closest mine, such that they can clean up the mines as fast as possible.

However, another possible design choice is to use the distance to the closest mine instead. This really changes the entire problem scope, because a single 'frame' of the situation no longer immediately provides all the information necessary to direct the tank towards the mine. Maybe the reorientation of the tank isn't successful from frame 1 to frame 2, but in the direction/orientation approach you can at least rest assured that convergence is guaranteed, and that a network can be trained to exhibit this obvious correlation (and, basically, behaviour).

In the case of distance, however, the mine may be anywhere on a circle around the tank at that distance. The only way the tank can find out more about the actual location of the mine is to choose a single action and observe the difference in the distance. That action and the observed difference therefore form the key decision that determines the action the tank should take in the frame thereafter (the time between frames in a simulation isn't necessarily 0.0000001 seconds; we could easily take 1 second and be happy with it. It does determine the resolution and accuracy of the end solution, but even minor fluctuations should provide this information).
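
In other words, a controller for the distance-only case has to remember at least the previous distance and compare, something like the sketch below (the decision rule, action names and values are just for illustration):

def choose_action(previous_distance, current_distance, last_action):
    if previous_distance is None:
        return "probe_turn"       # first frame: nothing to compare against yet
    if current_distance < previous_distance:
        return last_action        # getting closer: keep doing what worked
    return "turn"                 # getting further away: try a different heading

print(choose_action(None, 8.0, None))        # probe_turn
print(choose_action(8.0, 6.5, "forward"))    # forward
print(choose_action(6.5, 7.2, "forward"))    # turn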

So, the reasoning that takes place here cannot be executed successfully by observing a single frame of information. In other words, you need changes in the external environment, sometimes induced by your personal actions, in order to determine what happens next. This often occurs in robotics or other controller situations, where the relation or 'directionality' is not always known. Or sometimes you can derive it, but you don't know how you should react to it. Also, changes induced by other forces in that environment will then allow the system to react to those in similar terms, since the observation is changing.

The point is that there are certain dynamics in real-life scenarios that standard feed-forward neural networks can never deal with, because FFN's must have full information available in a single frame for the output to converge conclusively to some expected result. In cases where the external environment has a longer reaction period... for example the temperature of the cooling water for a very large diesel engine, these neural networks will very likely never be able to react successfully, because they do not understand or embed the concept of memory.

This is why the type of the network is important and why there has been so much research into Recurrent Neural Networks. These RNN's have the capability to store observations of a number of past events in the memory of the network, such that observations made in the past impact the actual decisions taken at the output. Standard RNN's can remember information for about 10 timeframes (whatever time resolution you've chosen), whereas LSTM's have the capability to remember significant events up to 1000 timesteps back. The idea is that significant recurrences of some events cause these cells to suddenly behave differently and indicate to output neurons that some event is recurring, inducing a different kind of response than in other, more regular or random situations.
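
The essential trick is that the hidden state feeds back into itself, so the same input can produce different outputs depending on what came before. A minimal (non-LSTM, untrained) sketch of that idea, with arbitrary weights:

import math

hidden = 0.0
w_in, w_rec, w_out = 0.8, 0.9, 1.0   # illustration weights, not trained values

def rnn_step(x):
    global hidden
    hidden = math.tanh(w_in * x + w_rec * hidden)   # old state feeds back in
    return w_out * hidden

# The same input (0.0) gives different outputs depending on the history.
print(rnn_step(1.0))
print(rnn_step(0.0))
print(rnn_step(0.0))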

Now... there's a worm on this earth called Caenorhabditis elegans, which has exactly 302 neurons. This worm reacts to the smell (chemical concentration) of bacterial residue, because it knows it can feed on the bacteria that are necessarily present there. Some bacteria infect the worm and harm it, whereas others are excellent feeding grounds. It innately avoids the smell of some types of harmful bacteria, but there are two types of harmful bacteria that it is especially attracted to. Lab research has shown that after exposure to these two harmful bacteria, it learns to avoid them later. Other behaviour includes social feeding, where worms gather together on large piles of food, up to some threshold determined by the amount of oxygen that the worms detect with one or two oxygen-sensing neurons.

If you ever wonder what a neural network in biology looks like, here's a full description of such a network. You can trace the network from sensor neurons to motor neurons and the actual muscle. But beware! You'll also see that this type of network is extremely intricate.

This shows that the dynamics of particular problems aren't necessarily obvious. The things you can observe interact with the other dynamics of the problem, and a solution has to be found that doesn't violate critical constraints but still allows the problem to converge to some optimal solution.

Monday, January 04, 2010

Fitness functions

I'm reading up on Genetic Algorithms, and the most important thing for these algorithms is to choose a correct fitness function. This is a function that determines a score for the performance of a certain instance or candidate solution (or closest approximation) for a given problem.

For some mathematical problems, if you have a set of data that you wish to approximate, this fitness function can be given, for example, by the root mean square error between the calculated results and the known results. A neural network wouldn't be used if the actual function were available, so let's assume that the problem is somewhat hard and cannot be approximated appropriately with either a rule or a simple function approximation.
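
A minimal sketch of such a fitness function, where the RMS error is turned into a score so that lower error means higher fitness (the exact conversion is an arbitrary choice):

import math

def rmse_fitness(predicted, expected):
    error = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, expected))
                      / len(expected))
    return 1.0 / (1.0 + error)   # lower error -> higher fitness

print(rmse_fitness([1.1, 1.9, 3.2], [1.0, 2.0, 3.0]))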

Neural networks are sometimes trained using genetic algorithms by encoding the neuron weights as genes, randomizing these in the first generation and then incrementally mutating, mixing, pairing and selecting these network instances until one network is found that seems to perform appropriately.
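
Roughly, such a loop looks like the sketch below. The population size, mutation rate and the stand-in fitness function are all illustrative; in a real setup the fitness would come from decoding the weights and running the network:

import random

def fitness(weights):
    # Stand-in for "decode the weights, run the network and score it";
    # here we simply prefer weights close to 0.5.
    return -sum((w - 0.5) ** 2 for w in weights)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(weights, rate=0.1):
    return [w + random.gauss(0, 0.3) if random.random() < rate else w
            for w in weights]

population = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]   # keep the better half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

print(max(fitness(individual) for individual in population))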

Eventually, after 50 generations or so, networks emerge that exhibit the desired behaviour expressed in the fitness function. But if we take this as an analogue of the biological world (where both ANN's and GA's come from), then this fitness function isn't a single expression.

If what Darwin says is true, imagine the time it must have taken to evolve from simpler organisms into more complex ones, and the evolutions and changes that must have occurred in populations. The genetic effects are measured over the entire population (not on individuals).

For computers, we assume the role of the overall biological system, but I argue that this is only suitable for the simplest of cases (assuming this role effectively means defining the fitness function).

Besides considering the plain survivability of a particular instance, the social behaviour of the overall group should also be considered (the generation the individual net is part of). Some animals, for example, increase their survival rates by working together on occasion, which may leave room for other developments that are highly favourable (although negatively impacting individual survivability). And if one individual succeeds in subverting or influencing another individual to its own gain, then that is also something that should be taken into account in the fitness function.

Come to think of it: many GA's are only used during a training session and discarded otherwise. Most neural network training so far has been about pursuing a particular unknown function with a large amount of observational data, and each network is generally constructed as a lone individual in the entire population. Still, there is at least some research in the area of neural network ensembles, where cooperation is sometimes used to solve sub-problems, or in environments where multiple objectives are present. In such cases, successful combinations of neural networks are what count most, not individual results. So cooperation becomes an important factor for individual survival.

As with the previous post, for anything interesting to come out of neural networks, it may be necessary to find a fitness or cost function for a very complex environment in which the neural network is supposed to execute actions. That sounds easy enough, but consider the example of a robot that needs to find its way through a corridor as fast as possible.

A simple fitness function is a measure of the amount of time taken to reach the end of the corridor. We may now feed the robot some sensory data, for example when it bumps into a wall. We always start the robot off facing the correct direction. The robot can rotate two tracks, one on either side. Rotating one track faster than the other makes the robot turn to one side, giving it the ability to steer. Rotating both tracks at the same speed makes it go forward.

You may think that the robot that goes fastest wins, but then the scientist sees that the robot has so much speed that it hits the wall with unforgiving force and breaks off a track. A very stern measure would be to restrict the speed of the tracks, such that none of the individuals can move at that speed. A different solution, however, is to somehow factor this speed or wall-hitting force into the fitness and penalize those robots where it occurs often. Then we may have two different kinds of winners:
  • Robots that still maintain a very high speed, but hit the walls sometimes
  • Robots that maintain lower speeds and gradually arrive at the end of the corridor in good time.
Both strategies have their advantages and disadvantages. The one that is faster can also run faster from danger and may therefore survive better. The one that is more careful has less chance of considerable harm.
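
A sketch of such a penalized fitness function, just to show the trade-off; the penalty weight is an arbitrary choice, and tuning it decides which of the two strategies ends up winning:

def corridor_fitness(time_taken, wall_hits, hit_penalty=2.0):
    # Faster runs score higher, but every wall hit adds a time-like penalty.
    return 1.0 / (time_taken + hit_penalty * wall_hits)

print(corridor_fitness(time_taken=10.0, wall_hits=3))   # fast but reckless
print(corridor_fitness(time_taken=14.0, wall_hits=0))   # slower but careful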

So the jury is still out on all of this technology. For specific tasks these approaches work, because we can think of a suitable fitness function for them. As soon as things get tougher and broader, though, the environment needs to penalize and reward individuals appropriately. Good environments also allow for diversity in the population. And then one wonders whether such diversified elements would continue to specialize in populations of their own and wander away from their original ones. A bit like evolution outside of their normal populations.