Sunday, June 01, 2008

Kudos to CUDA

CUDA is an NVIDIA technology that gives programmers access to the processing power of GPUs, virtually turning your computer into a supercomputer by offloading certain heavy mathematical processing onto the GPU itself. A GPU is essentially a collection of little processors, generally around 128 of them, all running functions in parallel. So a GPU will probably benefit you most if you're processing a large dataset that fits into the card's memory. For many simple business applications, it's not worth the trouble.
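To give an idea of what that offloading looks like in code, here's a minimal sketch of a CUDA program that squares an array of numbers on the GPU. The names and sizes are just illustrative, not taken from any particular project:

#include <stdio.h>
#include <cuda_runtime.h>

// Each GPU thread squares one element of the array.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];
}

int main(void)
{
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; i++)
        host[i] = (float)i;

    // Copy the data to the GPU, run the kernel, copy the result back.
    float *dev;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    square<<<n / 256, 256>>>(dev, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[10] = %f\n", host[10]);  // should print 100.0
    return 0;
}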

The GPU needs loads of threads to become efficient. The program therefore slices the work up into many small pieces and lets the GPU's processors loose on them in parallel, which means the design of your program has to be adjusted accordingly. Compare this with a single CPU: with only one processor, a large number of threads quickly saturates the processing power with thread-context-switching overhead, which undoes the benefit of the parallelism. That's why, for this kind of work, it's better to have many little processors than one very large one.
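In CUDA terms that slicing is expressed in the block/thread configuration of a kernel launch. Here's a small sketch of how n units of work get carved up; the block size of 256 is just an example I'm assuming, not a required value:

#include <cuda_runtime.h>

// A trivial kernel: each thread handles one 'unit of work' i.
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// Slice n units of work into blocks of 256 threads and launch them all;
// the hardware schedules the blocks over the (roughly 128) processors.
void run(float *dev_data, int n)
{
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up so every element is covered
    work<<<blocks, threadsPerBlock>>>(dev_data, n);
}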

For suitable algorithms, the GPU can process data about 250 times faster than a dual-core 2.4 GHz CPU. In other words, if you have a problem that lends itself to being loaded onto the GPU, you get the equivalent of a 250-node CPU cluster by buying an off-the-shelf graphics card of 250 euros. That is very good value for money! And there are motherboards available where you can load 4 of those cards into your computer. You'll need a 1500 W power supply, but that is far less than 250 machines at 300-400 W each. There are people at the University of Antwerp who have built such a supercomputer for the price of 4,000 euros.

Here's a link to a tutorial on how to install this on Linux:

http://lifeofaprogrammergeek.blogspot.com/2008/05/cuda-development-in-ubuntu.html

One of the obvious reasons why I'm looking at this is the ability to use the GPU for neural network applications, more or less like what has been done here:

http://www.codeproject.com/KB/graphics/GPUNN.aspx

The memory bandwidth is on the order of 20-100 GB/sec, as you can see here:

http://www.nvidia.com/page/geforce8.html

As I have stated in previous posts, the constraints on a working artificial network are the processing power that is available and the amount of memory needed to store the weights and so on. If things are hardware-accelerated, it might open up new avenues for processing. One of the larger remaining problems is probably the design of encoding the network into simpler data elements as efficiently as possible. Leaving everything to one processor puts pressure on the frequency that can be achieved, but if there are 128 processors instead, it becomes very interesting indeed.
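As a sketch of what "simpler data elements" could mean, here's how I imagine it could be laid out: flat arrays of states, weights and input indices, with one GPU thread updating one neuron. Everything here (the fixed fan-in, the threshold activation, the array layout) is an assumption of mine for illustration, not a worked-out design:

// Hypothetical, heavily simplified neuron update: each thread sums the
// weighted inputs of one neuron and applies a threshold.
#define FANIN 16

__global__ void update_neurons(const float *state_in,   // activation per neuron
                               float *state_out,
                               const float *weights,    // FANIN weights per neuron
                               const int *inputs,       // FANIN input indices per neuron
                               int n_neurons)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_neurons)
        return;

    float sum = 0.0f;
    for (int j = 0; j < FANIN; j++)
        sum += weights[i * FANIN + j] * state_in[inputs[i * FANIN + j]];

    // Simple threshold activation.
    state_out[i] = (sum > 1.0f) ? 1.0f : 0.0f;
}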

In one of my previous posts I suggested that I don't (yet) believe there's some kind of output wire where the results of the network end up, but rather that the state of the network itself, or sub-states of cell assemblies, somehow makes up consciousness. And the more neurons are available, the more "consciousness" you'll have.

One should never forget that the idea of something having consciousness or awareness depends on our perception, and perception is always influenced by our own ideas and assumptions. So when we look at something that appears to act intelligently, we also assume it has the level of consciousness that pertains to those tasks, or even more. That's not necessarily an accurate assumption.

A probable mistake in a previous post concerns the frequency of the network. We probably can't really speak of a "frequency" at all, since there's no clear cycle or sweep in the biological network: if we assume that neurons fire whenever they want, a "sweep" that collects all currently activated neurons would come along more or less at random. Computers, in contrast, are generally programmed with well-defined work cycles followed by a single cycle that collects the results.

The frequency of brain waves has been put at somewhere between 5 and 40 Hz, so the figure of 1000 Hz that I used before may be way off. If we take 40 Hz, things become a lot easier. And if we work with simpler processors running at that frequency on equal units of work, perhaps it brings things closer to the real human brain.

From the perspective of raw processing, the GPU would give a calculation speed of about 250 times that of a CPU. And if we lower the frequency from 1000 Hz to 40 Hz, that is another multiplication factor of 25. That brings the number of neurons that can be processed to 2,500,000,000, which is only a factor of 40 lower than the human brain!
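Spelled out, the arithmetic looks like this (the 400,000-neuron baseline for a single CPU at 1000 Hz isn't repeated here, but it's what these factors imply, and the human brain is taken at roughly 100 billion neurons):

    400,000 neurons x 250 (GPU vs. CPU) x 25 (1000 Hz -> 40 Hz) = 2,500,000,000 neurons
    100,000,000,000 neurons (human brain) / 2,500,000,000 = 40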

Thus, if we put 40 graphics cards together, we'd probably be close to the processing power of the brain itself. That would be about 8,000 W versus the brain's roughly 12 W. Not too bad, because using CPUs it would be about 2,000,000 W. The remaining problem is the storage of the network information itself. That was earlier set at 2 GB, and with some smarter designs, optimizations or reductions in the number of connections this could be brought down to 500 MB or so, so that a network of 500,000 neurons could run on a single graphics card. That's still not enough. A single byte per neuron is possibly sufficient, though; you wouldn't typically use a single byte for processing on a CPU because of byte-alignment optimizations, but on the GPU that shouldn't matter too much.
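A rough sketch of what byte-per-neuron storage could look like on the GPU (again purely an assumption of mine, including the decay step): keep the activations as plain unsigned bytes and only widen them inside the kernel.

// Hypothetical byte-packed activations: 1 byte per neuron instead of a
// 4-byte float, so 500,000 neurons take about 0.5 MB plus the weights.
__global__ void decay_activations(unsigned char *activation, int n_neurons)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_neurons)
        return;

    // Widen to int for the arithmetic, then store back as a single byte.
    int a = activation[i];
    a = (a * 9) / 10;             // simple decay step, purely illustrative
    activation[i] = (unsigned char)a;
}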

It's time to delve into the design of the GPU's processors and see what they can deal with efficiently. If they work on bytes rather than words, it becomes a lot easier to fit things in memory, thereby increasing the size of the network that can run.

And beyond a single card it doesn't matter too much either: perhaps SLI can help to distribute tasks across cards and assign specific tasks to specific ones, like visual processing, reasoning and so on. Graphics cards generally work with texture maps and the like, and those can be read back from one card and loaded onto another as a way to share information between networks.
