With the public availability of the CUDA Toolkit 5.5 Release Candidate (RC), ARM platforms are now supported out of the box.
We are still testing CUDA 5.5; in the meantime, you can try CUDA 5.0 on the Apalis T30 as explained below.
First download and extract our latest Apalis T30 Embedded Linux BSP:
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Images/old/Apalis_T30_LinuxImageV2.0Beta2_20130816.tar.bz2
sudo tar xjvf Apalis_T30_LinuxImageV2.0Beta2_20130816.tar.bz2
Next, download and extract the Apalis T30 CUDA package (note that this will complement or overwrite files from the previously extracted regular BSP package):
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/Apalis_T30_LinuxImageV2.0-CUDA_5.0_v1.1.tar.bz2
sudo tar xjvf Apalis_T30_LinuxImageV2.0-CUDA_5.0_v1.1.tar.bz2
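The two download-and-extract steps above can be sketched as one small script. This is only an illustration: `fetch_and_extract` and `DRY_RUN` are hypothetical names, not part of the BSP, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: fetch_and_extract and DRY_RUN are hypothetical names,
# not part of the Toradex BSP.
BASE=http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux
DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

fetch_and_extract() {
    url=$1
    file=${url##*/}     # archive name is the last path component of the URL
    if [ "$DRY_RUN" = 1 ]; then
        echo "wget -c $url"
        echo "sudo tar xjvf $file"
    else
        wget -c "$url" && sudo tar xjvf "$file"
    fi
}

# Order matters: the CUDA package overwrites files from the regular BSP.
fetch_and_extract "$BASE/Images/old/Apalis_T30_LinuxImageV2.0Beta2_20130816.tar.bz2" &&
fetch_and_extract "$BASE/Extra/Apalis_T30_LinuxImageV2.0-CUDA_5.0_v1.1.tar.bz2"
```

Running it with `DRY_RUN=0` in the environment would execute the downloads and extraction instead of printing them.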
Now flash it as usual:
cd Apalis_T30_LinuxImageV2.0
./update.sh
As a final step, on the Apalis T30 target itself, update the X server to a Xinerama-enabled version and reboot:
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/xserver-xorg_1.11.2-r11_armv7ahf-vfp.ipk
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/libdrm2_2.4.39-r3.0_armv7ahf-vfp.ipk
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/xserver-xorg-extension-dri2_1.11.2-r11_armv7ahf-vfp.ipk
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/xserver-xorg-module-libwfb_1.11.2-r11_armv7ahf-vfp.ipk
opkg install xserver-xorg_1.11.2-r11_armv7ahf-vfp.ipk libdrm2_2.4.39-r3.0_armv7ahf-vfp.ipk xserver-xorg-extension-dri2_1.11.2-r11_armv7ahf-vfp.ipk xserver-xorg-module-libwfb_1.11.2-r11_armv7ahf-vfp.ipk
reboot
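The four package files share one naming pattern (name, version, architecture suffix), so the list can be generated once and reused for both the download and the install. A minimal sketch, assuming a hypothetical helper named `ipk_files`; the package names and versions are taken from the commands above:

```shell
#!/bin/sh
# Package name and version pairs, taken from the commands above.
PACKAGES='xserver-xorg=1.11.2-r11
libdrm2=2.4.39-r3.0
xserver-xorg-extension-dri2=1.11.2-r11
xserver-xorg-module-libwfb=1.11.2-r11'

# Hypothetical helper: print one .ipk file name per package.
ipk_files() {
    for entry in $PACKAGES; do
        # Text before '=' is the package name, text after it is the version.
        printf '%s_%s_armv7ahf-vfp.ipk\n' "${entry%%=*}" "${entry#*=}"
    done
}

# Print the file names only; nothing is downloaded or installed here.
ipk_files

# On the target, the list could then be used like this (not executed here):
#   BASE=http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra
#   for f in $(ipk_files); do wget -c "$BASE/$f"; done
#   opkg install $(ipk_files) && reboot
```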
Some samples can be found in the home directory:
root@apalis-t30:~# ls CUDA-5.0_samples/
BlackScholes              data                  result.dat
FDTD3d                    dct8x8                scalarProd
FlowCPU.flo               deviceQuery           scan
FlowGPU.flo               deviceQueryDrv        segmentationTreeThrust
FunctionPointers          dwtHaar1D             shfl_scan
HSOpticalFlow             dxtc                  simpleAssert
MC_EstimatePiInlineP      eigenvalues           simpleAtomicIntrinsics
MC_EstimatePiInlineQ      fastWalshTransform    simpleCUBLAS
MC_EstimatePiP            fluidsGL              simpleCUFFT
MC_EstimatePiQ            histogram             simpleCallback
MC_SingleAsianOptionP     imageDenoising        simpleCubemapTexture
Mandelbrot                inlinePTX             simpleDevLibCUBLAS
MersenneTwisterGP11213    interval              simpleGL
MonteCarloMultiGPU        level_00.ppm          simpleHyperQ
SobelFilter               level_01.ppm          simpleIPC
SobolQRNG                 level_02.ppm          simpleLayeredTexture
alignedTypes              level_03.ppm          simpleMultiCopy
asyncAPI                  level_04.ppm          simpleMultiGPU
bandwidthTest             level_05.ppm          simpleP2P
barbara_cuda1.bmp         level_06.ppm          simplePitchLinearTexture
barbara_cuda2.bmp         level_07.ppm          simplePrintf
barbara_cuda_short.bmp    level_08.ppm          simpleSeparateCompilation
barbara_gold1.bmp         level_09.ppm          simpleStreams
barbara_gold2.bmp         lineOfSight           simpleSurfaceWrite
batchCUBLAS               marchingCubes         simpleTemplates
bicubicTexture            matrixMul             simpleTexture
bilateralFilter           matrixMulCUBLAS       simpleTexture3D
bindlessTexture           matrixMulDrv          simpleTextureDrv
binomialOptions           matrixMulDynlinkJIT   simpleVoteIntrinsics
boxFilter                 mergeSort             simpleZeroCopy
cdpAdvancedQuicksort      nbody                 smokeParticles
cdpLUDecomposition        newdelete             sortingNetworks
cdpQuadtree               oceanFFT              stereoDisparity
cdpSimplePrint            output.pgm            template
cdpSimpleQuicksort        output_CPU.pgm        template_runtime
clock                     output_GPU.pgm        threadFenceReduction
concurrentKernels         particles             threadMigration
conjugateGradient         postProcessGL         transpose
conjugateGradientPrecond  ptxjit                vectorAdd
convolutionFFT2D          quasirandomGenerator  vectorAddDrv
convolutionSeparable      radixSortThrust       volumeFiltering
convolutionTexture        randomFog             volumeRender
cppIntegration            recursiveGaussian
cudaOpenMP                reduction
root@apalis-t30:~# xrandr --output DP-0 --mode 1920x1080 --right-of DP-1
root@apalis-t30:~# cd CUDA-5.0_samples/
root@apalis-t30:~/CUDA-5.0_samples# ./FunctionPointers
./FunctionPointers Starting...
Reading image: lena.pgm
I: display Image (no filtering)
T: display Sobel Edge Detection (Using Texture)
S: display Sobel Edge Detection (Using SMEM+Texture)
Use the '-' and '=' keys to change the brightness.
b: switch block filter operation (Mean/Sobel)
p: switch point filter operation (Threshold ON/OFF)
root@apalis-t30:~/CUDA-5.0_samples# ./Mandelbrot
[CUDA Mandelbrot/Julia Set] - Starting...
> Device 0: < NVS 310 >, Compute SM 2.1 detected
GPU Device 0: "NVS 310" with compute capability 2.1
Data initialization done.
Initializing GLUT...
Loading extensions: No error
OpenGL window created.
Starting GLUT main loop...
Press [s] to toggle between GPU and CPU implementations
Press [j] to toggle between Julia and Mandelbrot sets
Press [r] or [R] to decrease or increase red color channel
Press [g] or [G] to decrease or increase green color channel
Press [b] or [B] to decrease or increase blue color channel
Press [e] to reset
Press [a] or [A] to animate colors
Press [c] or [C] to change colors
Press [d] or [D] to increase or decrease the detail
Press [p] to record main parameters to file params.txt
Press [o] to read main parameters from file params.txt
Left mouse button + drag = move (Mandelbrot or Julia) or animate (Julia)
Press [m] to toggle between move and animate (Julia) for left mouse button
Middle mouse button + drag = Zoom
Right mouse button = Menu
Press [?] to print location and scale
Press [q] to exit
Creating GL texture...
Texture created.
Creating PBO...
PBO created.
root@apalis-t30:~/CUDA-5.0_samples# ./SobelFilter
CUDA Sobel Edge-Detection Starting...
Reading image: lena.pgm
I: display Image (no filtering)
T: display Sobel Edge Detection (Using Texture)
S: display Sobel Edge Detection (Using SMEM+Texture)
Use the '-' and '=' keys to change the brightness.
root@apalis-t30:~/CUDA-5.0_samples# ./bicubicTexture
Starting bicubicTexture
[CUDA BicubicTexture] (OpenGL Mode)
CUDA device [NVS 310] has 1 Multi-Processors
Loaded 'lena_bw.pgm', 512 x 512 pixels
Controls
=/- : Zoom in/out
b : Run Benchmark g_FilterMode
c : Draw Bicubic Spline Curve
[esc] - Quit
Press number keys to change filtering g_FilterMode:
1 : nearest filtering
2 : bilinear filtering
3 : bicubic filtering
4 : fast bicubic filtering
5 : Catmull-Rom filtering
root@apalis-t30:~/CUDA-5.0_samples# ./bilateralFilter
./bilateralFilter Starting...
Loading ./data/nature_monte.bmp...
BMP width: 640
BMP height: 480
BMP file loaded successfully!
Loaded './data/nature_monte.bmp', 640 x 480 pixels
Found 1 CUDA Capable device(s) supporting CUDA
Device 0: "NVS 310"
CUDA Runtime Version : 5.0
CUDA Compute Capability : 2.1
Found CUDA Capable Device 0: "NVS 310"
Setting active device to 0
Using device 0: NVS 310
Running Standard Demonstration with GLUT loop...
Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'e' and 'E' to change Euclidean delta
Press 'g' and 'G' to changle Gaussian delta
Press 'a' or 'A' to change Animation mode ON/OFF
root@apalis-t30:~/CUDA-5.0_samples# ./boxFilter
./boxFilter Starting...
Loaded './data/lenaRGB.ppm', 1024 x 1024 pixels
Found 1 CUDA Capable device(s) supporting CUDA
Device 0: "NVS 310"
CUDA Runtime Version : 5.0
CUDA Compute Capability : 2.1
Found CUDA Capable Device 0: "NVS 310"
Setting active device to 0
Running Standard Demonstration with GLUT loop...
Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'a' or 'A' to change animation ON/OFF
root@apalis-t30:~/CUDA-5.0_samples# ./imageDenoising
CUDA ImageDenoising Starting...
[CUDA ImageDenoising]
Allocating host and CUDA memory and loading image file...
Loading ./data/portrait_noise.bmp...
BMP width: 320
BMP height: 408
BMP file loaded successfully!
Data init done.
Initializing GLUT...
OpenGL window created.
Loading extensions: No error
Creating GL texture...
Texture created.
Creating PBO...
PBO created.
Starting GLUT main loop...
Press [1] to view noisy image
Press [2] to view image restored with knn filter
Press [3] to view image restored with nlm filter
Press [4] to view image restored with modified nlm filter
Press [ ] to view smooth/edgy areas [RED/BLUE] Ct's
Press [f] to print frame rate
Press [?] to print Noise and Lerp Ct's
Press [q] to exit
root@apalis-t30:~/CUDA-5.0_samples# ./nbody
Run "nbody -benchmark [-numbodies=]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies= (number of bodies (>= 1) to run in simulation)
-device= (where d=0,1,2.... for the CUDA device to use)
-numdevices= (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy= (load a tipsy model file for simulation)
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 2.1 CUDA device: [NVS 310]
root@apalis-t30:~/CUDA-5.0_samples# ./oceanFFT
[CUDA FFT Ocean Simulation]
Left mouse button - rotate
Middle mouse button - pan
Right mouse button - zoom
'w' key - toggle wireframe
[CUDA FFT Ocean Simulation]
root@apalis-t30:~/CUDA-5.0_samples# ./particles
CUDA Particles Simulation Starting...
grid: 64 x 64 x 64 = 262144 cells
particles: 16384
root@apalis-t30:~/CUDA-5.0_samples# ./postProcessGL
./postProcessGL Starting...
(Interactive OpenGL Demo)
OpenGL device is Available
Creating a Texture render target GL_RGBA16F_ARB
Shader compilation error:
Fragment info
-------------
0(4) : warning C7533: global variable gl_Color is deprecated after version 120
Controls (right click mouse button for Menu)
[ ] : Toggle CUDA Post Processing (on/off)
[a] : Toggle Animation (on/off)
[=] : Increase Blur Radius
[-] : Decrease Blur Radius
[esc] - Quit
root@apalis-t30:~/CUDA-5.0_samples# ./randomFog
Random Fog
==========
CURAND initialized
Random number visualization
On creation, randomFog generates 200,000 random coordinates in spherical coordinate space (radius, angle rho, angle theta) with curand's XORWOW algorithm. The coordinates are normalized for a uniform distribution through the sphere.
The X axis is drawn with blue in the negative direction and yellow positive.
The Y axis is drawn with green in the negative direction and magenta positive.
The Z axis is drawn with red in the negative direction and cyan positive.
The following keys can be used to control the output:
s Generate a new set of random numbers and display as spherical coordinates (Sphere)
e Generate a new set of random numbers and display on a spherical surface (shEll)
b Generate a new set of random numbers and display as cartesian coordinates (cuBe/Box)
p Generate a new set of random numbers and display on a cartesian plane (Plane)
i,l,j Rotate the negative Z-axis up, right, down and left respectively
a Toggle auto-rotation
t Toggle 10x zoom
z Toggle axes display
x Select XORWOW generator (default)
c Select Sobol' generator
v Select scrambled Sobol' generator
r Reset XORWOW (i.e. reset to initial seed) and regenerate
] Increment the number of Sobol' dimensions and regenerate
[ Reset the number of Sobol' dimensions to 1 and regenerate
+ Increment the number of displayed points by 8,000 (up to maximum 200,000)
- Decrement the number of displayed points by 8,000 (down to minimum 8,000)
q/[ESC] Quit the application.
root@apalis-t30:~/CUDA-5.0_samples# ./recursiveGaussian
CUDA Recursive Gaussian Starting...
Loaded './data/lena.ppm', 512 x 512 pixels
Press '+' and '-' to change filter width
0, 1, 2 - change filter order
root@apalis-t30:~/CUDA-5.0_samples# ./simpleGL
simpleGL (VBO) starting...
root@apalis-t30:~/CUDA-5.0_samples# ./simpleTexture3D
simpleTexture3D Starting...
Read './data/Bucky.raw', 32768 bytes
Press space to toggle animation
Press '+' and '-' to change displayed slice
root@apalis-t30:~/CUDA-5.0_samples# ./smokeParticles
CUDA Smoke Particles Starting...
Loaded './data/floortile.ppm', 256 x 256 pixels
root@apalis-t30:~/CUDA-5.0_samples# ./volumeFiltering
CUDA 3D Volume Filtering Starting...
Found 1 CUDA Capable Device(s).
Device 0: "NVS 310"
CUDA Runtime Version : 5.0
CUDA Compute Capability : 2.1
Found CUDA Capable Device 0: "NVS 310"
Setting active device to 0
Read './data/Bucky.raw', 32768 bytes
Press 'SPACE' to toggle animation
'p' to toggle pre-integrated transfer function
'+' and '-' to change density (0.01 increments)
']' and '[' to change brightness
';' and ''' to modify transfer function offset
'.' and ',' to modify transfer function scale
root@apalis-t30:~/CUDA-5.0_samples# ./volumeRender
CUDA 3D Volume Render Starting...
Read './data/Bucky.raw', 32768 bytes
Press '+' and '-' to change density (0.01 increments)
']' and '[' to change brightness
';' and ''' to modify transfer function offset
'.' and ',' to modify transfer function scale
The deviceQuery sample prints information about the graphics card used:
root@apalis-t30:~/CUDA-5.0_samples# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVS 310"
  CUDA Driver Version / Runtime Version           5.0 / 5.0
  CUDA Capability Major/Minor version number:     2.1
  Total amount of global memory:                  512 MBytes (536543232 bytes)
  ( 1) Multiprocessors x ( 48) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock rate:                                 1046 MHz (1.05 GHz)
  Memory Clock rate:                              875 Mhz
  Memory Bus Width:                               64-bit
  L2 Cache Size:                                  65536 bytes
  Max Texture Dimension Size (x,y,z)              1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers         1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:                65536 bytes
  Total amount of shared memory per block:        49152 bytes
  Total number of registers available per block:  32768
  Warp size:                                      32
  Maximum number of threads per multiprocessor:   1536
  Maximum number of threads per block:            1024
  Maximum sizes of each dimension of a block:     1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:      65535 x 65535 x 65535
  Maximum memory pitch:                           2147483647 bytes
  Texture alignment:                              512 bytes
  Concurrent copy and kernel execution:           Yes with 1 copy engine(s)
  Run time limit on kernels:                      Yes
  Integrated GPU sharing Host Memory:             No
  Support host page-locked memory mapping:        No
  Alignment requirement for Surfaces:             Yes
  Device has ECC support:                         Disabled
  Device supports Unified Addressing (UVA):       No
  Device PCI Bus ID / PCI location ID:            1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = NVS 310
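When only one or two of these values are needed, for example in a setup script, they can be filtered out of the deviceQuery output. The following is a minimal sketch; `cuda_capability` and `cuda_device_name` are hypothetical helper names, and the sample text is copied from the output above. It assumes deviceQuery prints each property on its own line.

```shell
#!/bin/sh
# Hypothetical helper: read deviceQuery output on stdin and print the
# compute capability, e.g. "2.1".
cuda_capability() {
    sed -n 's/^ *CUDA Capability Major\/Minor version number: *\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Hypothetical helper: print the device name from the 'Device N: "..."' line.
cuda_device_name() {
    sed -n 's/^Device [0-9][0-9]*: "\(.*\)"$/\1/p'
}

# Three lines copied from the deviceQuery output above.
sample='Device 0: "NVS 310"
  CUDA Driver Version / Runtime Version           5.0 / 5.0
  CUDA Capability Major/Minor version number:     2.1'

printf '%s\n' "$sample" | cuda_device_name   # prints: NVS 310
printf '%s\n' "$sample" | cuda_capability    # prints: 2.1
```

On the target, the same helpers could be fed directly from the sample, for instance `./deviceQuery | cuda_capability`.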
The modinfo command shows information about the graphics driver used:
root@apalis-t30:~# modinfo nvidia
filename:       /lib/modules/3.1.10-carma/kernel/drivers/video/nvidia.ko
alias:          char-major-195-*
version:        313.24
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:
vermagic:       3.1.10-carma SMP preempt mod_unload ARMv7
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
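To check a single field such as the driver version, the modinfo output can be filtered. A minimal sketch, assuming a hypothetical helper named `modinfo_field` that reads modinfo output on stdin; the sample lines are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: read modinfo output on stdin and print the value of
# one field (first match only), e.g. "version" or "vermagic".
modinfo_field() {
    awk -v key="$1:" '$1 == key { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Lines copied from the modinfo output above.
sample='filename:       /lib/modules/3.1.10-carma/kernel/drivers/video/nvidia.ko
alias:          char-major-195-*
version:        313.24
vermagic:       3.1.10-carma SMP preempt mod_unload ARMv7'

printf '%s\n' "$sample" | modinfo_field version    # prints: 313.24
printf '%s\n' "$sample" | modinfo_field vermagic   # prints: 3.1.10-carma SMP preempt mod_unload ARMv7
```

On the target this would be used as `modinfo nvidia | modinfo_field version`. Many modinfo implementations also accept `-F version` to print a single field directly, though whether the one on this image does has not been checked here.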