If you are not familiar with the zero-copy feature in CUDA C, it lets compute kernels running on many newer CUDA-enabled graphics processors read and write host system memory directly, with no explicit copy to device memory.

The basic idea is to use the original CUDA C functions to allocate host arrays that are page-locked (aka pinned) and carry the attributes required by the zero-copy feature of CUDA. Since we are using a standard Fortran compiler, we can't use the built-in allocator (it has no knowledge of pinned memory). Instead, we need a couple of extra steps: call the CUDA allocator in C, and then pass the resulting C pointer to Fortran using the function C_F_Pointer provided by the ISO C bindings (ISO_C_BINDING, introduced in Fortran 2003).

To declare the mapped arrays, we need to perform the following steps:

1. Set the device flag for mapping host memory: this is done with a call to cudaSetDeviceFlags with the flag cudaDeviceMapHost.
2. Allocate the host mapped arrays: this is done with cudaHostAlloc with the flag cudaHostAllocMapped.
3. Get the device pointers to the mapped memory: this is done with calls to cudaHostGetDevicePointer. These are the pointers that we will pass to the CUDA kernels.

The arrays are declared as follows:

    Integer, parameter :: fp_kind = kind(0.0d0) ! Double precision
    Real(fp_kind), pointer, dimension(:) :: A, C
    Real(fp_kind), allocatable, dimension(:) :: B

A is the input array and C is the output array from the GPU computation. Since we want to use the zero-copy feature on these two, we will allocate them with cudaHostAlloc. B is an array that we will use to compute a reference solution on the CPU; we will use the standard Fortran allocator for this one.

You can use the Fortran index operator () on the C++ side, which makes kernels significantly more readable. … created with hipMalloc) on the Fortran side.
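The three steps above can be sketched in standard Fortran roughly as follows. This is a minimal sketch under a few stated assumptions: the module and routine names (zero_copy_mod, enable_mapped_memory, alloc_mapped, free_mapped) are hypothetical; the numeric flag values are those defined for cudaDeviceMapHost (0x08) and cudaHostAllocMapped (0x02) in the CUDA runtime headers; and the program is linked against the CUDA runtime library (for example with -lcudart). Each CUDA C function is bound through a bind(C) interface, and the pointer returned by cudaHostAlloc is turned into a Fortran array with C_F_Pointer.

```fortran
! Sketch (assumption): zero-copy host arrays from standard Fortran through
! ISO C bindings to the CUDA runtime. Module/routine names are hypothetical.
! Link against the CUDA runtime, e.g. gfortran zc.f90 -lcudart
module zero_copy_mod
  use iso_c_binding
  implicit none
  private
  public :: fp_kind, enable_mapped_memory, alloc_mapped, free_mapped

  integer, parameter :: fp_kind = kind(0.0d0)          ! double precision
  ! Flag values as defined in the CUDA runtime header driver_types.h
  integer(c_int), parameter :: cudaDeviceMapHost   = 8  ! 0x08
  integer(c_int), parameter :: cudaHostAllocMapped = 2  ! 0x02

  interface
    ! cudaError_t cudaSetDeviceFlags(unsigned int flags)
    integer(c_int) function cudaSetDeviceFlags(flags) &
        bind(C, name="cudaSetDeviceFlags")
      import :: c_int
      integer(c_int), value :: flags
    end function cudaSetDeviceFlags
    ! cudaError_t cudaHostAlloc(void **pHost, size_t size, unsigned int flags)
    integer(c_int) function cudaHostAlloc(pHost, nbytes, flags) &
        bind(C, name="cudaHostAlloc")
      import :: c_int, c_ptr, c_size_t
      type(c_ptr) :: pHost                  ! by reference, i.e. void**
      integer(c_size_t), value :: nbytes
      integer(c_int), value :: flags
    end function cudaHostAlloc
    ! cudaError_t cudaHostGetDevicePointer(void **pDevice, void *pHost, unsigned int flags)
    integer(c_int) function cudaHostGetDevicePointer(pDevice, pHost, flags) &
        bind(C, name="cudaHostGetDevicePointer")
      import :: c_int, c_ptr
      type(c_ptr) :: pDevice                ! by reference, i.e. void**
      type(c_ptr), value :: pHost
      integer(c_int), value :: flags
    end function cudaHostGetDevicePointer
    ! cudaError_t cudaFreeHost(void *ptr)
    integer(c_int) function cudaFreeHost(ptr) bind(C, name="cudaFreeHost")
      import :: c_int, c_ptr
      type(c_ptr), value :: ptr
    end function cudaFreeHost
  end interface

contains

  ! Step 1: enable mapped host memory; call before any other CUDA work.
  subroutine enable_mapped_memory()
    if (cudaSetDeviceFlags(cudaDeviceMapHost) /= 0) &
      error stop "cudaSetDeviceFlags failed"
  end subroutine enable_mapped_memory

  ! Steps 2 and 3: allocate n mapped reals in C, view them from Fortran
  ! with C_F_Pointer, and return the device pointer for the kernels.
  subroutine alloc_mapped(n, x, x_dev)
    integer, intent(in) :: n
    real(fp_kind), pointer, intent(out) :: x(:)
    type(c_ptr), intent(out) :: x_dev
    type(c_ptr) :: x_host
    integer(c_size_t) :: nbytes

    nbytes = int(n, c_size_t) * int(storage_size(0.0_fp_kind) / 8, c_size_t)
    if (cudaHostAlloc(x_host, nbytes, cudaHostAllocMapped) /= 0) &
      error stop "cudaHostAlloc failed"
    call c_f_pointer(x_host, x, [n])        ! C pointer -> Fortran array
    if (cudaHostGetDevicePointer(x_dev, x_host, 0_c_int) /= 0) &
      error stop "cudaHostGetDevicePointer failed"
  end subroutine alloc_mapped

  ! Release an array obtained from alloc_mapped.
  subroutine free_mapped(x)
    real(fp_kind), pointer, intent(inout) :: x(:)
    if (cudaFreeHost(c_loc(x(1))) /= 0) error stop "cudaFreeHost failed"
    nullify(x)
  end subroutine free_mapped

end module zero_copy_mod
```

A driver program would call enable_mapped_memory() first, then alloc_mapped(n, A, A_dev) and alloc_mapped(n, C, C_dev), fill A on the host, and hand A_dev and C_dev to a kernel launcher written in CUDA C (not shown here). B stays an ordinary allocatable array for the CPU reference result, and free_mapped releases the mapped arrays with cudaFreeHost, the counterpart of cudaHostAlloc.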