Friday, April 20, 2012

Finally, SteerSuite works with the GPU

Because of the limitations of CUDA C++, I dropped the original plan of putting everything on the GPU side. Instead, I copy only the necessary information, such as position, velocity, and direction, to the GPU side.

The data structure is as follows:
typedef struct cuda_agent {
    bool   _enabled;
    float3 _position;
    float3 _velocity;
    float3 _forward;
    float  _radius;
    float3 _goalQueue[20];
    int    _goalQSize;
    int    _curGoal;
    int    _usedGoal;
    AABox  _oldBounds;
    AABox  _newBounds;
} cuda_agent;
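
As a rough illustration of copying only the necessary fields, here is a minimal host-side sketch that packs a CPU agent into a flat struct shaped like the one above. Everything in it is an assumption for illustration: Float3 and Box are stand-ins for float3 and AABox so it compiles without the CUDA toolkit, HostAgent and packAgent are hypothetical names that are not part of SteerSuite, truncating to 20 goals is my guess at how the fixed-size queue would be filled, and _usedGoal is left out because its meaning is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Stand-ins (assumptions) for CUDA's float3 and SteerSuite's AABox.
struct Float3 { float x, y, z; };
struct Box { float xmin, xmax, ymin, ymax, zmin, zmax; };

constexpr int kMaxGoals = 20;  // matches the _goalQueue[20] array above

// Flat struct mirroring most of cuda_agent (member names simplified).
struct CudaAgentSketch {
    bool   enabled;
    Float3 position, velocity, forward;
    float  radius;
    Float3 goalQueue[kMaxGoals];
    int    goalQSize;
    int    curGoal;
    Box    oldBounds, newBounds;
};

// A byte-wise copy such as cudaMemcpy is only safe for trivially copyable types.
static_assert(std::is_trivially_copyable<CudaAgentSketch>::value,
              "agent record must be trivially copyable");

// Hypothetical CPU-side agent holding a dynamically sized goal list.
struct HostAgent {
    bool enabled;
    Float3 position, velocity, forward;
    float radius;
    std::vector<Float3> goals;
};

// Copy the fields the GPU needs; goals beyond the fixed capacity are dropped.
CudaAgentSketch packAgent(const HostAgent& a) {
    CudaAgentSketch out{};  // zero-initialize, including both bounds
    out.enabled  = a.enabled;
    out.position = a.position;
    out.velocity = a.velocity;
    out.forward  = a.forward;
    out.radius   = a.radius;
    std::size_t n = a.goals.size() < static_cast<std::size_t>(kMaxGoals)
                        ? a.goals.size()
                        : static_cast<std::size_t>(kMaxGoals);
    for (std::size_t i = 0; i < n; ++i) out.goalQueue[i] = a.goals[i];
    out.goalQSize = static_cast<int>(n);
    out.curGoal   = 0;
    return out;
}
```

An array of such records could then be moved to the device in a single copy, which is the main reason for keeping the struct free of pointers and containers.
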

At first, I copied the data back and forth between the CPU and the GPU, but I found this very inefficient. Therefore, I now keep the updated data only on the GPU side. After the computation has advanced in time, I copy the updated data back to the CPU and draw the agents. This makes it run faster.

So far, I have compared the performance of an i7-2600 against a GTX520, and they perform about the same. I guess this is because the GTX520 does not have enough computing power. I will rerun SteerSuite on other computers that have better GPUs.

I think I can get better results.

Wednesday, April 11, 2012

Update agent in parallel

As I mentioned in the last post, all my data is on the GPU side. According to the existing algorithm, there is no conflict when updating the built-in KD tree, so I decided to update each agent in parallel, although some modifications are needed.

For simpleAI, the critical method updateAI in the SimpleAgent class actually takes two steps.
First, an agent checks whether it has already reached its current goal. If it has, it checks whether that goal is the last one. If so, the agent disables itself and finishes the update; otherwise, the next goal becomes the current goal. Second, a simple movement is calculated, and the agent updates its position and its location in the KD tree. The annoying thing here is that the call that relocates an object in the KD tree lives on the CPU side, so it cannot run on the GPU side. Therefore, I store newBounds and oldBounds in the agent, copy them to the CPU side, and finish the update there.
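
The two steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the actual SimpleAgent code: Vec3, Bounds, AgentState, updateAgent, and collectRelocations are hypothetical names, the movement is a plain straight-line step toward the goal at a fixed speed, and "reached" is taken to mean being within the agent's radius of the goal. The HD macro lets the same per-agent function compile as ordinary C++ or as a function callable from a CUDA kernel.

```cpp
#include <cassert>
#include <math.h>
#include <vector>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { float x, y, z; };
struct Bounds { float xmin, xmax, zmin, zmax; };

// Hypothetical per-agent record, loosely following cuda_agent.
struct AgentState {
    bool   enabled;
    Vec3   position;
    float  radius;
    Vec3   goals[20];
    int    goalCount;
    int    curGoal;
    Bounds oldBounds, newBounds;
};

HD inline Bounds boundsOf(const Vec3& p, float r) {
    return Bounds{p.x - r, p.x + r, p.z - r, p.z + r};
}

// Update one agent; each GPU thread would call this for its own agent.
HD inline void updateAgent(AgentState& a, float speed, float dt) {
    if (!a.enabled) return;

    // Step 1: goal check. Disable on the last goal, otherwise advance.
    Vec3 g = a.goals[a.curGoal];
    float dx = g.x - a.position.x, dz = g.z - a.position.z;
    float dist = sqrtf(dx * dx + dz * dz);
    if (dist <= a.radius) {
        if (a.curGoal + 1 >= a.goalCount) { a.enabled = false; return; }
        ++a.curGoal;
        g = a.goals[a.curGoal];
        dx = g.x - a.position.x; dz = g.z - a.position.z;
        dist = sqrtf(dx * dx + dz * dz);
    }

    // Step 2: move, recording bounds instead of touching the spatial database.
    a.oldBounds = boundsOf(a.position, a.radius);
    if (dist > 0.0f) {
        float step = fminf(speed * dt, dist);
        a.position.x += dx / dist * step;
        a.position.z += dz / dist * step;
    }
    a.newBounds = boundsOf(a.position, a.radius);
}

// CPU side: gather the pending relocations after the agents are copied back.
struct Relocation { int agent; Bounds from, to; };

std::vector<Relocation> collectRelocations(const std::vector<AgentState>& agents) {
    std::vector<Relocation> out;
    for (int i = 0; i < static_cast<int>(agents.size()); ++i)
        if (agents[i].enabled)
            out.push_back(Relocation{i, agents[i].oldBounds, agents[i].newBounds});
    return out;
}
```

Each Relocation would then be handed to whatever CPU-side call relocates an object in the spatial database, so that part stays serial while the per-agent work runs in parallel.
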

For pprAI, the idea is similar but takes more steps. The perceptive, predictive, and reactive steps could be done in one big step, but the required queries have to be done beforehand. Afterwards, the update part is the same as in simpleAI.

Hopefully, I can finish simpleAI this week and move on to modifying pprAI next week. I will keep the blog updated.

Putting the database on the GPU side

The data stored in GridDatabase2D is pointed to by _basePtr, a SpatialDatabaseItemPtr *. To put this data on the GPU side, I changed _basePtr to a SpatialDatabaseItemPtr **. First, _basePtr is set by cudaMalloc so that it points to a memory address on the GPU side. Then, by launching a kernel, I use new on the CUDA side to allocate a chunk of GPU memory and assign it to the pointer that _basePtr points to. The _cells member of GridDatabase2D is handled in the same way.

The reason I do it this way is that a kernel can only access addresses on the GPU side, so another level of pointer indirection is required.

For the same reason, all agents and obstacles have to be allocated in this way, so most of the interfaces in the original program will have to be modified.

So far, I have successfully allocated the memory blocks for _basePtr, _cells, and the obstacles. For the agents, however, I always get a kernel error when I try to launch the kernel that allocates them. I will keep looking into it. Hopefully, it can be resolved soon so that I can run the whole program.