CUDA Programming Model Basics:
The CUDA programming model is a heterogeneous model in which both the CPU and GPU are used. In CUDA, the host refers to the CPU and its memory, while the device refers to the GPU and its memory. Code run on the host can manage memory on both the host and the device, and it also launches kernels, which are functions executed on the device. These kernels are executed by many GPU threads in parallel.
Given the heterogeneous nature of the CUDA programming model, a typical sequence of operations for a CUDA C program is:
- Declare and allocate host and device memory.
- Initialize host data.
- Transfer data from the host to the device.
- Execute one or more kernels.
- Transfer results from the device to the host.
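The five steps above can be sketched as a small program. This is a minimal illustration, not code from any particular project: the kernel doubleElements, the function runDoubleExample, and the block size of 256 are all assumptions chosen for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void doubleElements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard threads past the end
        data[i] *= 2.0f;
    }
}

// Runs the five steps for n floats; returns true if every result equals 2.0f.
bool runDoubleExample(int n)
{
    const size_t bytes = (size_t)n * sizeof(float);

    // 1. Declare and allocate host and device memory.
    float *h_data = (float *)malloc(bytes);
    float *d_data = NULL;
    if (h_data == NULL) return false;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) {
        free(h_data);
        return false;
    }

    // 2. Initialize host data.
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // 3. Transfer data from the host to the device.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // 4. Execute the kernel with enough blocks to cover all n elements.
    const int threadsPerBlock = 256;  // assumed block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleElements<<<blocks, threadsPerBlock>>>(d_data, n);

    // 5. Transfer results from the device to the host.
    //    This cudaMemcpy waits for the kernel above to finish.
    bool ok = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    for (int i = 0; ok && i < n; ++i) {
        if (h_data[i] != 2.0f) ok = false;
    }

    cudaFree(d_data);
    free(h_data);
    return ok;
}

int main(void)
{
    printf("%s\n", runDoubleExample(1 << 20) ? "ok" : "failed");
    return 0;
}
```

The program can be built with nvcc, for example `nvcc example.cu -o example`, where the file name is arbitrary.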
Let us look at some CUDA programs in C:
1. CUDA C program that adds the elements of two arrays and stores the result in a third array.
Problem statement: You are given two arrays of integer or real values. Your task is to add the corresponding elements of the two arrays and store the sums in another array.
Solution: The main task is to write a CUDA kernel for this, and writing the kernel is not difficult. Let us see how.
Suppose each array has N elements, where N is given by “arraySize” (here it is 5; change it as needed). The program has two functions: the kernel and addWithCuda(…), which allocates memory on the device and invokes the kernel. d_a and d_b are the device arrays that store the input elements, and d_c is the array that stores the element-wise sum of d_a and d_b.
We launch arraySize threads, one per element, to add the elements of the arrays.
So, adding corresponding elements is quite easy: each thread only needs to know its own thread ID. We store the ID of the thread within its block in “i” and add the elements at index i of the two arrays.
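A minimal sketch of this approach follows. The names d_a, d_b, d_c, arraySize, and addWithCuda come from the description above, but the parameter list of addWithCuda, the kernel name addKernel, and the sample input values are assumptions made for illustration, since the description does not spell them out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
// i is the ID of the thread within its block.
__global__ void addKernel(int *d_c, const int *d_a, const int *d_b)
{
    int i = threadIdx.x;
    d_c[i] = d_a[i] + d_b[i];
}

// Assumed signature: allocates device memory, copies the inputs over,
// launches the kernel, and copies the sums back into c.
cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size)
{
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;
    const size_t bytes = size * sizeof(int);
    cudaError_t status;

    status = cudaMalloc((void **)&d_a, bytes);
    if (status == cudaSuccess) status = cudaMalloc((void **)&d_b, bytes);
    if (status == cudaSuccess) status = cudaMalloc((void **)&d_c, bytes);
    if (status == cudaSuccess) status = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (status == cudaSuccess) status = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    if (status == cudaSuccess) {
        // One block of `size` threads, one thread per element.
        addKernel<<<1, size>>>(d_c, d_a, d_b);
        status = cudaGetLastError();  // reports launch errors
    }
    if (status == cudaSuccess) status = cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a NULL pointer performs no operation, so this is safe
    // even if an allocation above failed.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return status;
}

int main(void)
{
    const int arraySize = 5;
    const int a[arraySize] = {1, 2, 3, 4, 5};       // sample inputs (assumed)
    const int b[arraySize] = {10, 20, 30, 40, 50};
    int c[arraySize] = {0};

    if (addWithCuda(c, a, b, arraySize) != cudaSuccess) {
        fprintf(stderr, "addWithCuda failed\n");
        return 1;
    }
    for (int i = 0; i < arraySize; ++i) printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```

Because the kernel uses only threadIdx.x and a single block, this sketch works only while arraySize does not exceed the maximum number of threads per block (1024 on current GPUs). Larger arrays would need several blocks and a global index computed from blockIdx.x and blockDim.x as well.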
Here is the link for the code:
2. CUDA C program for matrix multiplication:
Problem statement:
Multiply two matrices A and B. The necessary and sufficient condition for the product to be defined is that the number of columns in matrix A equals the number of rows in matrix B.
Solution:
- Loop over each row i of matrix A.
- Inside that loop, loop over each column j of matrix B and initialize the output element C[i,j] to 0.
- Inside that loop, loop over each column k of matrix A.
- Multiply A[i,k] by B[k,j] and add this value to C[i,j].
- Return the output matrix C.
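The loops above describe a sequential algorithm. One common way to map it onto the GPU, sketched here under stated assumptions, is to replace the two outer loops with a 2D grid of threads so that each thread computes one element C[i,j] and keeps only the inner loop over k. The names matMulKernel and matMul, the row-major storage, the 16x16 block size, and the sample matrices are all assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A is M x K, B is K x N, C is M x N, all stored row-major.
// Each thread computes one element C[i,j].
__global__ void matMulKernel(const float *A, const float *B, float *C,
                             int M, int K, int N)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row of A and C
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column of B and C
    if (i < M && j < N) {
        float sum = 0.0f;                           // C[i,j] starts at 0
        for (int k = 0; k < K; ++k) {
            sum += A[i * K + k] * B[k * N + j];     // A[i,k] * B[k,j]
        }
        C[i * N + j] = sum;
    }
}

// Hypothetical host wrapper: copies A and B to the device, launches the
// kernel, and copies the product back into C.
cudaError_t matMul(const float *A, const float *B, float *C,
                   int M, int K, int N)
{
    float *d_A = NULL, *d_B = NULL, *d_C = NULL;
    const size_t bytesA = (size_t)M * K * sizeof(float);
    const size_t bytesB = (size_t)K * N * sizeof(float);
    const size_t bytesC = (size_t)M * N * sizeof(float);
    cudaError_t status;

    status = cudaMalloc((void **)&d_A, bytesA);
    if (status == cudaSuccess) status = cudaMalloc((void **)&d_B, bytesB);
    if (status == cudaSuccess) status = cudaMalloc((void **)&d_C, bytesC);
    if (status == cudaSuccess) status = cudaMemcpy(d_A, A, bytesA, cudaMemcpyHostToDevice);
    if (status == cudaSuccess) status = cudaMemcpy(d_B, B, bytesB, cudaMemcpyHostToDevice);

    if (status == cudaSuccess) {
        dim3 block(16, 16);  // assumed block size
        dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
        matMulKernel<<<grid, block>>>(d_A, d_B, d_C, M, K, N);
        status = cudaGetLastError();  // reports launch errors
    }
    if (status == cudaSuccess) status = cudaMemcpy(C, d_C, bytesC, cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return status;
}

int main(void)
{
    // Sample 2x3 and 3x2 matrices (assumed values).
    const float A[6] = {1, 2, 3, 4, 5, 6};
    const float B[6] = {7, 8, 9, 10, 11, 12};
    float C[4] = {0};

    if (matMul(A, B, C, 2, 3, 2) != cudaSuccess) {
        fprintf(stderr, "matMul failed\n");
        return 1;
    }
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```

The bounds check in the kernel matters because the grid is rounded up to whole blocks, so some threads fall outside the matrix whenever M or N is not a multiple of the block size.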
Here is the link for the code: