The data structure is as follows:
    typedef struct cuda_agent {
        bool   _enabled;
        float3 _position;
        float3 _velocity;
        float3 _forward;
        float  _radius;
        float3 _goalQueue[20];
        int    _goalQSize;
        int    _curGoal;
        int    _usedGoal;
        AABox  _oldBounds;
        AABox  _newBounds;
    } cuda_agent;

At first, I copied the data back and forth between the CPU and the GPU, but I found this very inefficient. So now I keep the updated data on the GPU side only. Once the computation has advanced in time, I copy the updated data back to the CPU and draw the agents. This makes it run faster.
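To make the idea concrete, here is a minimal sketch of keeping the agents resident on the GPU and copying them back only when it is time to draw. It is not the actual SteerSuite code: `mini_agent` is a hypothetical, cut-down stand-in for `cuda_agent`, and `step_agents` and `simulate_on_device` are illustrative names, with a simple position update standing in for the real steering computation.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical, reduced stand-in for cuda_agent (only the fields used here).
struct mini_agent {
    bool   _enabled;
    float3 _position;
    float3 _velocity;
};

// One simulation step: move every enabled agent by velocity * dt.
// The real steering update would go here instead.
__global__ void step_agents(mini_agent* agents, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || !agents[i]._enabled) return;
    agents[i]._position.x += agents[i]._velocity.x * dt;
    agents[i]._position.y += agents[i]._velocity.y * dt;
    agents[i]._position.z += agents[i]._velocity.z * dt;
}

// Upload once, run `steps` updates entirely on the device, download once.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t simulate_on_device(std::vector<mini_agent>& host, int steps, float dt) {
    if (host.empty()) return cudaSuccess;  // nothing to launch

    const int    n     = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(mini_agent);

    mini_agent* dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    // Single host-to-device copy before the loop.
    err = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks  = (n + threads - 1) / threads;
    for (int s = 0; s < steps && err == cudaSuccess; ++s) {
        step_agents<<<blocks, threads>>>(dev, n, dt);  // data stays on the GPU
        err = cudaGetLastError();
    }

    // Single device-to-host copy, only when the result is needed for drawing.
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

In a real frame loop the device buffer would be allocated once and reused across frames rather than allocated per call. The point of the sketch is only that the two `cudaMemcpy` calls sit outside the step loop.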
So far, I have compared the performance of the i7-2600 and the GTX520, and they perform about the same. I suspect this is because the GTX520 does not have enough compute power. I will rerun SteerSuite on other computers with better GPUs, where I expect to get better results.
There are machines with several different cards available in the SIG lab. Also, dig deeper into your performance analysis. Is there enough parallelism? What is the computational intensity, i.e., how much compute are you doing relative to memory access? Are too many registers being used? etc.
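As a starting point for the register and parallelism questions, the CUDA runtime can report per-kernel resource usage and a theoretical occupancy figure. The sketch below is only an illustration: `update_kernel` is a hypothetical placeholder for the real agent-update kernel, `kernel_report` and `inspect_kernel` are made-up names, and it assumes a toolkit recent enough to provide `cudaOccupancyMaxActiveBlocksPerMultiprocessor` (CUDA 6.5 or later).

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical placeholder; substitute the real agent-update kernel.
__global__ void update_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Illustrative summary of what the runtime reports for one kernel.
struct kernel_report {
    int    regs_per_thread;         // registers used by each thread
    size_t local_bytes_per_thread;  // local memory per thread; nonzero can indicate spilling
    int    active_blocks_per_sm;    // resident blocks per multiprocessor at this block size
    float  occupancy;               // resident threads / maximum threads per multiprocessor
};

// Fills `out` for update_kernel launched with `block_size` threads per block.
cudaError_t inspect_kernel(int block_size, kernel_report* out) {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, update_kernel);
    if (err != cudaSuccess) return err;

    int blocks = 0;
    err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks, update_kernel, block_size, 0 /* no dynamic shared memory */);
    if (err != cudaSuccess) return err;

    int device = 0;
    err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    out->regs_per_thread        = attr.numRegs;
    out->local_bytes_per_thread = attr.localSizeBytes;
    out->active_blocks_per_sm   = blocks;
    out->occupancy = static_cast<float>(blocks * block_size) /
                     static_cast<float>(prop.maxThreadsPerMultiProcessor);
    return cudaSuccess;
}
```

Occupancy is a theoretical upper bound on resident threads, not a measure of speed. Computational intensity still has to be estimated separately, by counting the arithmetic operations per agent against the bytes of `cuda_agent` that each thread reads and writes.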