pynx.processing_unit: detecting, initializing and using computing or graphical processing units
- class pynx.processing_unit.Backend(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Processing backend used for the computations
- exception pynx.processing_unit.ProcessingUnitException
- exception pynx.processing_unit.ProcessingUnitWarning
- pynx.processing_unit.opencl_device.available_gpu_speed(cl_platform=None, fft_shape=(16, 256, 256), axes=(-1, -2), min_gpu_mem=None, verbose=False, gpu_name=None, only_gpu=True, return_dict=False, ranking='fft')
Get a list of all available GPUs, sorted by FFT speed (Gflop/s) or bandwidth (Gbytes/s).
- Parameters:
cl_platform – the OpenCL platform (default=None, all platforms are tested)
fft_shape – the FFT shape against which the FFT speed is calculated. If None, no benchmark is performed and the speed for all devices is reported as 0.
axes – the FFT axes
min_gpu_mem – the minimum amount of GPU memory desired (bytes). Devices with less are ignored.
verbose – if True, print the FFT speed and memory for the GPUs found
gpu_name – if given, only GPUs whose name includes this substring will be tested and reported. This can also be a list of acceptable strings.
only_gpu – if True (the default), will skip non-GPU OpenCL devices
return_dict – if True, a dictionary will be returned instead of a list, with both timing and gflops listed
ranking – either ‘fft’ or ‘bandwidth’.
- Returns:
a list of tuples (GPU device, speed (Gflop/s)), ordered by decreasing speed. If return_dict is True, a dictionary is returned instead, in which each entry is itself a dictionary holding the gflops and dt results.
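A minimal usage sketch for the OpenCL ranking described above. pick_fastest and rank_opencl_gpus are hypothetical helper names, not part of PyNX. The import is guarded so the snippet still loads on a machine without PyNX or pyopencl, and the device is assumed to be a pyopencl.Device (hence the .name attribute).

```python
def pick_fastest(ranked):
    """Return the first (device, speed) tuple of a list ordered by
    decreasing speed, or None if the list is empty."""
    return ranked[0] if ranked else None


def rank_opencl_gpus(gpu_name=None, min_gpu_mem=None):
    """Benchmark the available OpenCL GPUs; return [] if PyNX cannot be imported."""
    try:
        from pynx.processing_unit.opencl_device import available_gpu_speed
    except ImportError:
        return []
    # Default benchmark: a stack of 16 2D FFTs of size 256 x 256
    return available_gpu_speed(fft_shape=(16, 256, 256), axes=(-1, -2),
                               min_gpu_mem=min_gpu_mem, gpu_name=gpu_name,
                               ranking='fft')


if __name__ == "__main__":
    best = pick_fastest(rank_opencl_gpus())
    if best is None:
        print("No OpenCL GPU found")
    else:
        device, speed = best
        # Assumption: device is a pyopencl.Device, which has a .name attribute
        print(f"Fastest device: {device.name} ({speed:.1f} Gflop/s)")
```

Because the list is already ordered by decreasing speed, selecting the first entry is enough; no extra sorting is needed.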
- pynx.processing_unit.opencl_device.cl_device_fft_speed(d=None, fft_shape=(16, 256, 256), axes=(-1, -2), verbose=False, nb_test=4, nb_cycle=1, timing=False, shuffle_axes=False)
Compute the FFT calculation speed for a given OpenCL device.
- Parameters:
d – the pyopencl.Device. If not supplied, pyopencl.create_some_context() will be called, and a device can be chosen interactively. This will result in a new context created for each call, and is not efficient (the context memory cannot be freed).
fft_shape – (nz,ny,nx) the shape of the complex fft transform, treated as a stack of nz 2D transforms of size nx * ny, or as a single 3D FFT, depending on the value of ‘axes’
axes – (1,2) the axes for the FFT. Default value is (-1,-2), which will perform a stacked 2d fft. Using None will perform a 3d fft.
verbose – if True, print the speed and timing for the given transform
nb_test – number of times the calculations will be repeated; the best result is returned
nb_cycle – each test consists of nb_cycle forward and backward FFTs.
timing – if True, also return the time needed for a single FFT (dt)
shuffle_axes – if True, the order of axes for the transform will be shuffled to find the fastest combination, and the optimal axes order will be returned. Only useful for gpyfft, ignored when pyvkfft is used.
- Returns:
The computed speed in Gflop/s (if timing is False) or a tuple (flops, dt). If shuffle_axes is True, the optimal axes order is also returned.
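The roles of nb_test and nb_cycle described above can be illustrated with a generic best-of-N timing loop. This is a sketch of the general technique, not PyNX's actual implementation: best_fft_time and run_cycle are hypothetical names, and run_cycle stands for any callable that performs one forward plus one backward transform and waits for it to finish.

```python
import time


def best_fft_time(run_cycle, nb_test=4, nb_cycle=1):
    """Return the best time (in seconds) per single transform.

    run_cycle: hypothetical callable doing one forward + one backward FFT.
    """
    best = float("inf")
    for _ in range(nb_test):
        t0 = time.perf_counter()
        for _ in range(nb_cycle):
            run_cycle()
        elapsed = time.perf_counter() - t0
        # Keep only the fastest of the nb_test repetitions
        best = min(best, elapsed)
    # Each cycle holds two transforms (forward and backward)
    return best / (2 * nb_cycle)


if __name__ == "__main__":
    # Dummy CPU workload standing in for a real FFT pair
    dt = best_fft_time(lambda: sum(range(10000)), nb_test=4, nb_cycle=2)
    print(f"Best time per transform: {dt:.2e} s")
```

Taking the minimum rather than the mean reduces the influence of one-off delays such as kernel compilation or other processes using the device.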
- pynx.processing_unit.opencl_device.cl_device_global_mem_bandwidth(d)
Get the OpenCL device global memory bandwidth.
- Parameters:
d – the OpenCL device.
- Returns:
the memory bandwidth in Gbytes/s
- pynx.processing_unit.cuda_device.available_gpu_speed(fft_shape=(16, 256, 256), batch=True, min_gpu_mem=None, verbose=False, gpu_name=None, return_dict=False, ranking='fft')
Get a list of all available GPUs, sorted by FFT speed (Gflop/s) or memory bandwidth (Gbytes/s).
- Parameters:
fft_shape – the FFT shape against which the fft speed is calculated
batch – if True, perform a batch 2D FFT rather than a 3D one
min_gpu_mem – the minimum amount of GPU memory desired (bytes). Devices with less are ignored.
verbose – if True, print the speed and memory for the GPUs found
gpu_name – if given, only GPUs whose name includes this substring will be tested and reported. This can also be a list of acceptable strings.
return_dict – if True, a dictionary will be returned instead of a list, with both timing and gflops listed
ranking – either ‘fft’ or ‘bandwidth’.
- Returns:
a list of tuples (GPU device, speed (Gflop/s) or memory bandwidth), ordered by decreasing values. If return_dict is True, a dictionary is returned instead, in which each entry is itself a dictionary holding the gflops and dt results.
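The gpu_name and min_gpu_mem filters described above can be pictured with the following sketch. It is not PyNX code: matches_name and filter_devices are hypothetical helpers operating on plain (name, memory in bytes) records made up for illustration, and the case-insensitive matching is an assumption, since this section does not state whether the comparison is case-sensitive.

```python
def matches_name(device_name, gpu_name):
    """True if gpu_name is None, or if any accepted substring occurs in device_name."""
    if gpu_name is None:
        return True
    # gpu_name may be a single string or a list of acceptable strings
    patterns = [gpu_name] if isinstance(gpu_name, str) else list(gpu_name)
    # Assumption: case-insensitive comparison
    return any(p.lower() in device_name.lower() for p in patterns)


def filter_devices(devices, gpu_name=None, min_gpu_mem=None):
    """Keep the (name, memory_bytes) records passing both filters."""
    kept = []
    for name, mem in devices:
        if min_gpu_mem is not None and mem < min_gpu_mem:
            continue  # devices with less memory are ignored
        if not matches_name(name, gpu_name):
            continue
        kept.append((name, mem))
    return kept


if __name__ == "__main__":
    GB = 1024 ** 3
    # Illustrative records, not measured data
    devices = [("Tesla V100", 32 * GB), ("GeForce GTX 1080", 8 * GB),
               ("Tesla K20", 5 * GB)]
    print(filter_devices(devices, gpu_name="tesla", min_gpu_mem=8 * GB))
```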
- pynx.processing_unit.cuda_device.cuda_device_fft_speed(d=None, fft_shape=(16, 256, 256), batch=True, verbose=False, nb_test=4, nb_cycle=1, timing=False)
Compute the FFT calculation speed for a given CUDA device.
- Parameters:
d – the pycuda.driver.Device. If not given, the default context will be used.
fft_shape – (nz,ny,nx) the shape of the complex FFT, treated as a stack of nz 2D transforms of size nx * ny, or as a single 3D FFT, depending on the value of ‘batch’
batch – if True, will perform a batch 2D FFT. Otherwise, will perform a 3D FFT.
verbose – if True, print the speed and timing for the given transform
nb_test – number of times the calculations will be repeated; the best result is returned
timing – if True, also return the time needed for a single FFT (dt)
- Returns:
The computed speed in Gflop/s (if timing is False) or a tuple (flops, dt)
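To make the relation between the returned speed and dt concrete, here is a sketch that converts a measured time into Gflop/s. It assumes the conventional estimate of 5 N log2(N) floating-point operations for a complex FFT of N points; this section does not say which operation count PyNX uses, so the numbers may differ from what the library reports. fft_gflops is a hypothetical helper name.

```python
import math


def fft_gflops(fft_shape, dt, batch=True):
    """Estimate the FFT speed in Gflop/s from the time dt (s) of one transform.

    Assumption: 5 * N * log2(N) flop for a complex FFT of N points.
    """
    nz, ny, nx = fft_shape
    n_total = nz * ny * nx
    # batch=True: nz independent 2D transforms of ny * nx points each
    # batch=False: a single 3D transform over all points
    n_transform = ny * nx if batch else n_total
    flop = 5 * n_total * math.log2(n_transform)
    return flop / dt / 1e9


if __name__ == "__main__":
    # Example with a made-up timing of 1 ms for the default shape
    print(f"{fft_gflops((16, 256, 256), dt=1e-3):.2f} Gflop/s")
```

With this counting, the batched 2D case costs fewer operations than the 3D case for the same shape, since each point takes part in log2(ny * nx) stages instead of log2(nz * ny * nx).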
- pynx.processing_unit.cuda_device.cuda_device_global_mem_bandwidth(d, measured=False)
Get the CUDA device global memory bandwidth.
- Parameters:
d – the CUDA device.
measured – if True, measure the bandwidth
- Returns:
the memory bandwidth in Gbytes/s
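When the bandwidth is not measured, it is presumably derived from device properties; this section does not say how. A common estimate of the theoretical peak, shown here as a sketch, multiplies the memory clock by the bus width and by two for double data rate memory. theoretical_bandwidth_gbytes is a hypothetical helper, and the inputs correspond to what pycuda exposes as the MEMORY_CLOCK_RATE (kHz) and GLOBAL_MEMORY_BUS_WIDTH (bits) device attributes.

```python
def theoretical_bandwidth_gbytes(mem_clock_khz, bus_width_bits):
    """Theoretical peak global memory bandwidth in Gbytes/s.

    Assumption: double data rate memory (factor 2), which is the usual
    estimate and not necessarily the formula used by PyNX.
    """
    clock_hz = mem_clock_khz * 1e3
    bytes_per_transfer = bus_width_bits / 8
    return 2 * clock_hz * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Example: 1546 MHz memory clock, 384-bit bus
    print(f"{theoretical_bandwidth_gbytes(1546000, 384):.1f} Gbytes/s")
```

A measured bandwidth is normally somewhat lower than this theoretical peak.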