Friday, January 29, 2010

ATI Stream on Ubuntu 9.10 Karmic

Just as Nvidia has CUDA for GPU programming, ATI cards have their own SDK, called ATI Stream, for getting those hardware threads revving on complex scientific or brute-force problems. I run 64-bit Ubuntu Karmic 9.10 on a quad-core Intel Core i7 920 with a single ATI Radeon HD 5850. On the 27th, new drivers for the ATI card were released on the ATI website; together with the nopat flag, I think these have resolved my problems getting the SDK to run.

After downloading the SDK, you need to set LD_LIBRARY_PATH so the system can find the OpenCL shared libraries and whatever else is in lib. The installation documents are actually not that easy to find (they're hidden two levels deep in links), but here's the full documentation for the SDK; the link could be stated more visibly on the ATI Stream page.
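For example, assuming the SDK was unpacked under your home directory (the exact path depends on the SDK version you downloaded, so adjust it), adding something like this to ~/.bashrc does the trick:

```shell
# Hypothetical unpack location; point this at wherever you extracted the SDK
export ATISTREAMSDKROOT=$HOME/ati-stream-sdk-v2.0-lnx64
# 64-bit libraries live in lib/x86_64; use lib/x86 on a 32-bit install
export LD_LIBRARY_PATH=$ATISTREAMSDKROOT/lib/x86_64:$LD_LIBRARY_PATH
```

Open a new shell (or `source ~/.bashrc`) so the samples pick the variables up.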

Officially, ATI Stream does not yet support Ubuntu 9.10. Initially I got only segfaults, or applications simply wouldn't find any devices, even after downloading and installing the latest Catalyst drivers from the ATI site (27-Jan-2010) and performing a full restart. However, when the "nopat" option is supplied as a kernel boot parameter, things work a lot better :). Right now most samples run, except for one specific sample that my card doesn't seem to support. Because Karmic uses GRUB 2, the place to modify kernel boot parameters has changed: you need to edit /etc/default/grub instead of /boot/grub/menu.lst. I inserted nopat here:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nopat"
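Note that editing /etc/default/grub by itself isn't enough: GRUB 2 boots from the generated /boot/grub/grub.cfg, so you have to regenerate that file and reboot for the flag to take effect:

```shell
sudo update-grub   # regenerates /boot/grub/grub.cfg from /etc/default/grub
sudo reboot
# after the reboot, check that the flag actually reached the kernel:
cat /proc/cmdline  # should contain "nopat"
```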

After these modifications, the /usr/lib/OpenCL/vendors links, and the LD_LIBRARY_PATH changes, yours should be working as well.
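For reference, the vendors links are just symlinks that let applications locate the ATI OpenCL implementation. A sketch, assuming the 64-bit SDK unpacked under your home directory (the library name is the one from the 64-bit SDK's lib directory, so check what yours is called):

```shell
sudo mkdir -p /usr/lib/OpenCL/vendors
# link the ATI OpenCL library into the vendors directory
sudo ln -sf $HOME/ati-stream-sdk-v2.0-lnx64/lib/x86_64/libatiocl64.so \
    /usr/lib/OpenCL/vendors/libatiocl64.so
```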

There is some more information about OpenCL, which I think is a very nice step toward ensuring portability of GPU code to other platforms. A very good explanation of the CPU vs. the GPU can be found here :).

I'm planning to use it for some really new and innovative experiments involving different perspectives on neural networks, network spiking and so on. The idea is that I need a large number of parallel processors, which can be very modest in processing power, as they only need to move very little data and perform extremely simple processing steps. The details and the complexities of the work are in the synchronization, so I'll be looking into that :).
