A new driver has been spotted in the NVIDIA developer section, so please treat it as a beta driver. We have a discussion thread open here.
New and Improved CUDA Libraries
CUSPARSE, a new library of GPU-accelerated sparse matrix routines for sparse/sparse and dense/sparse operations
CURAND, a new library of GPU-accelerated random number generation (RNG) routines, supporting Sobol quasi-random and XORWOW pseudo-random generators for use in both host and device code
CUFFT performance tuned for radix-3, -5, and -7 transform sizes on Fermi architecture GPUs
CUBLAS performance improved 50% to 300% on Fermi architecture GPUs, for matrix multiplication of all datatypes and transpose variations
H.264 encode/decode libraries that were previously available in the GPU Computing SDK are now part of the CUDA Toolkit
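For readers wondering what "XORWOW" refers to in the CURAND entry above, here is a minimal host-side C++ sketch of Marsaglia's xorwow generator as published in his "Xorshift RNGs" paper. It is not CURAND's code: the struct name `Xorwow`, the helpers `next_unit` and `estimate_pi`, and the default state constants (taken from the paper) are assumptions made for illustration. CURAND's own seeding and per-thread sequence handling are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Host-side sketch of Marsaglia's xorwow generator ("Xorshift RNGs").
// Illustration only; this is not how CURAND seeds or splits sequences.
struct Xorwow {
    // Default state constants from Marsaglia's paper.
    std::uint32_t x = 123456789u, y = 362436069u, z = 521288629u,
                  w = 88675123u, v = 5783321u, d = 6615241u;

    // Advance the state and return the next 32-bit value.
    std::uint32_t next() {
        std::uint32_t t = x ^ (x >> 2);
        x = y; y = z; z = w; w = v;
        v = (v ^ (v << 4)) ^ (t ^ (t << 1));
        d += 362437u;  // Weyl sequence added to the xorshift output
        return d + v;
    }

    // Map the next 32-bit value to a double in [0, 1).
    double next_unit() { return next() * (1.0 / 4294967296.0); }
};

// Hypothetical helper: Monte Carlo estimate of pi from n random points
// in the unit square, counting those inside the quarter circle.
double estimate_pi(Xorwow& rng, int n) {
    assert(n > 0);
    int inside = 0;
    for (int i = 0; i < n; ++i) {
        const double px = rng.next_unit();
        const double py = rng.next_unit();
        if (px * px + py * py < 1.0) ++inside;
    }
    return 4.0 * inside / n;
}
```

Calling `estimate_pi` with a default-constructed `Xorwow` and a few hundred thousand points should land close to 3.14, which is the same kind of experiment the EstimatePi SDK samples run on the GPU.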
CUDA Driver & CUDA C Runtime
Support for new 6GB Quadro and Tesla products
Support for debugging GPUs with more than 4GB device memory
Integrated Tesla Compute Cluster (TCC) support in standard Windows driver packages
Multi-GPU debugging support for both cuda-gdb and Parallel Nsight
Added cuda-memcheck support for Fermi architecture GPUs
NVCC support for Intel C Compiler (ICC) v11.1 on 64-bit Linux distros
Support for malloc() and free() in CUDA C compute kernels
NVIDIA System Management Interface (nvidia-smi) support for reporting % GPU busy, and several GPU performance counters
New GPU Computing SDK Code Samples
Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, and SingleAsianOptionP
Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE together
Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
Simple Printf, demonstrating best practices for using both printf and cuprintf in compute kernels
Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising that is implemented in CUDA C with OpenGL rendering
SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
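To give a rough idea of what the Interval Computing sample in the list above is about, here is a small C++ sketch of interval arithmetic operators written as templates. It is not taken from the SDK sample: the `Interval` type and its operators are hypothetical, recursion is left out, and a real interval library would also control the floating-point rounding direction so that the bounds stay conservative.

```cpp
#include <algorithm>
#include <cassert>

// Closed interval [lo, hi] over an arithmetic type T.
// Sketch only: rounding direction is not controlled here.
template <typename T>
struct Interval {
    T lo, hi;
    Interval(T l, T h) : lo(l), hi(h) { assert(l <= h); }
};

// [a, b] + [c, d] = [a + c, b + d]
template <typename T>
Interval<T> operator+(const Interval<T>& a, const Interval<T>& b) {
    return Interval<T>(a.lo + b.lo, a.hi + b.hi);
}

// [a, b] - [c, d] = [a - d, b - c]
template <typename T>
Interval<T> operator-(const Interval<T>& a, const Interval<T>& b) {
    return Interval<T>(a.lo - b.hi, a.hi - b.lo);
}

// The product interval is bounded by the smallest and largest of the
// four endpoint products.
template <typename T>
Interval<T> operator*(const Interval<T>& a, const Interval<T>& b) {
    const T p1 = a.lo * b.lo, p2 = a.lo * b.hi;
    const T p3 = a.hi * b.lo, p4 = a.hi * b.hi;
    return Interval<T>(std::min({p1, p2, p3, p4}),
                       std::max({p1, p2, p3, p4}));
}
```

For example, multiplying `Interval<double>(1.0, 2.0)` by `Interval<double>(-3.0, 4.0)` gives the interval from -6 to 8, since the result has to cover every product of a value from the first range with a value from the second.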
Windows developers should be sure to check out the new debugging and profiling features in Parallel Nsight for Visual Studio at www.nvidia.com/ParallelNsight.
Please refer to the Release Notes and Getting Started Guides for more information.
In CUDA Toolkit 3.2 and the accompanying release of the CUDA driver, some important changes have been made to the CUDA Driver API to support large memory access for device code and to enable further system calls such as malloc and free. Please refer to the CUDA Toolkit 3.2 Readiness Tech Brief for a summary of these changes.