RasPi only lets you program the ARM11 directly. There are also several video and graphics processors, along with a DSP (or so I've read), none of which are documented. I wonder how much performance you could get from a single RasPi if you could program those processors as well?
I've heard Eben say the GPU offers 24 GFLOPS of general-purpose compute. This cluster had 64 Pis, so that would be about 1.5 TFLOPS (24 × 64 = 1,536 GFLOPS), I believe. The trouble, of course, is that there's no way to tap into that yet. I asked Eben about it back in May at Maker Faire, and he said they would like to open it up, but the interface won't be OpenCL. I'm not sure what the current plan is.
Take the claimed 24 GFLOPS (single precision only) times 64 Pis, which gives about 1.5 TFLOPS in the ideal case. That assumes perfect performance scaling across 100 Mbit Ethernet implemented over USB 2.0, and that the GPU could be opened up for users to program. Even then, it is not competitive with a single modern GPGPU graphics card at 3.2 TFLOPS single precision.
http://www.amd.com/us/press-releases/Pages/amd-powerful-server-graphics-2012aug27.aspx
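For anyone who wants to check the arithmetic, here is a rough sketch of the comparison in Python. The numbers are just the figures quoted in this thread (claims, not measurements), the function name is made up for illustration, and the result is an ideal peak that ignores networking and any other scaling losses.

```python
# Figures quoted in the thread; treated here as claims, not measurements.
GPU_GFLOPS_PER_PI = 24.0    # claimed single-precision GPU throughput per Pi
NUM_PIS = 64                # boards in the cluster
SINGLE_CARD_TFLOPS = 3.2    # single-precision figure quoted for one GPGPU card


def cluster_peak_tflops(gflops_per_node: float, nodes: int) -> float:
    """Ideal aggregate peak in TFLOPS, assuming perfect scaling across nodes."""
    return gflops_per_node * nodes / 1000.0


cluster_tflops = cluster_peak_tflops(GPU_GFLOPS_PER_PI, NUM_PIS)
ratio = cluster_tflops / SINGLE_CARD_TFLOPS

print(f"Cluster ideal peak: {cluster_tflops:.3f} TFLOPS")
print(f"Fraction of one card: {ratio:.2f}")
```

So even in the best case the 64-Pi cluster comes out at a bit under half of the one card, and real scaling over 100 Mbit Ethernet would only lower that.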