To compile and run CUDA code, you need to use an external Kokkos installation that has been configured with CUDA support. To enable deal.II's own CUDA backend, you will need your GPU to have compute capability 6.0 or higher. Independently from the GPU itself, you also need a version of CUDA between 10.2 and 11.8.
To configure deal.II's own CUDA backend use the following option:
-DDEAL_II_WITH_CUDA=ON
CUDA versions prior to 11.0 don't support C++17 or higher. You will have
to make sure that C++17 is disabled when using earlier versions.
Several MPI implementations are able to perform MPI operations with data located in device memory directly without the need to copy to CPU memory explicitly first. This feature is commonly known as "CUDA-aware MPI". In case deal.II is compiled with MPI support and the MPI implementation supports this feature, you can tell deal.II to use it by configuring with
-DDEAL_II_WITH_MPI=ON
-DDEAL_II_MPI_WITH_DEVICE_SUPPORT=ON
Note, that there is no check that detects if the MPI implementation
really is CUDA-aware. Activating this flag for incompatible MPI libraries
will lead to segmentation faults in MPI calls.
Using CUDA in combination with architecture-specific C++ compiler flags
like -march=native is known to be fragile and there might be
compatibility issues with other libraries, e.g. using CUDA 10.1 with
-DDEAL_II_WITH_TBB=ON and
-DDEAL_II_CXX_FLAGS=-march=native results in compile time
errors like:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx512fintrin.h(11265): error: identifier "__builtin_ia32_scalefsd_round" is undefined
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx512fintrin.h(11274): error: identifier "__builtin_ia32_scalefss_round" is undefined
Since vectorization in VectorizedArray is disabled when compiling with
CUDA support anyway, it is recommended to drop the compile flag in that
case.