Friday, April 20, 2012

SteerSuite finally works with the GPU

Because of the limitations of CUDA C++, I dropped the original plan of putting everything on the GPU side. Instead, I copy only the necessary information, such as position, velocity, and direction, to the GPU side.

The data structure is as follows:
typedef struct cuda_agent{
    bool _enabled;
    float3 _position;
    float3 _velocity;
    float3 _forward;
    float _radius;
    float3 _goalQueue[20];  /* fixed-size array in place of the std::queue */
    int _goalQSize;
    int _curGoal;
    int _usedGoal;
    AABox _oldBounds;       /* bounds before the move */
    AABox _newBounds;       /* bounds after the move */
} cuda_agent;
At first, I copied the data back and forth between the CPU and the GPU, but I found this very inefficient. Therefore, I now keep the updated data on the GPU side only. Once the computation has advanced in time, I copy the updated data back to the CPU and draw it. This makes the program run faster.

So far, I have compared the performance of an i7-2600 and a GTX520, and they perform the same. I guess this is because the GTX520's compute power is too weak. I will rerun SteerSuite on other computers that have better GPUs.

I think I can get better results.

Wednesday, April 11, 2012

Update agent in parallel

As I mentioned in the last post, all my data is on the GPU side. According to the existing algorithm, there is no conflict when updating the built-in KD tree, so I decided to update each agent in parallel. Some modifications are needed, though.

For simpleAI, the critical method updateAI in the SimpleAgent class actually takes two steps.
First, an agent checks whether it has already reached its goal. If it has, it checks whether the reached goal is the last one. If so, the agent disables itself and finishes the update; otherwise, the subsequent goal becomes the current goal. Afterwards, a simple movement is calculated, and the agent updates its position and its location in the KD tree. The annoying thing here is that relocating an object in the KD tree is invoked on the CPU side, so it is impossible to run it on the GPU side. Therefore, I store newBounds and oldBounds in the agent, then copy them to the CPU side and finish the update there.
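The two steps above can be sketched in host-only C++ roughly as follows. This is a minimal illustration, not the actual updateAI code: Float3 and Box are plain stand-ins for float3 and AABox, movement is assumed to happen in the x-z plane, "reached" is assumed to mean "within one radius of the goal", and the speed and dt parameters are hypothetical.

```cpp
#include <cmath>

struct Float3 { float x, y, z; };               // stand-in for CUDA's float3
struct Box { float xmin, xmax, zmin, zmax; };   // hypothetical 2D stand-in for AABox

const int kMaxGoals = 20;

struct FlatAgent {
    bool enabled;
    Float3 position;
    float radius;
    Float3 goalQueue[kMaxGoals];
    int goalQSize;
    int curGoal;
    Box oldBounds;
    Box newBounds;
};

// Distance between two points, ignoring the y (up) component.
float distanceXZ(const Float3& a, const Float3& b) {
    const float dx = b.x - a.x;
    const float dz = b.z - a.z;
    return std::sqrt(dx * dx + dz * dz);
}

// Square bounds of radius r around p.
Box boundsOf(const Float3& p, float r) {
    return Box{p.x - r, p.x + r, p.z - r, p.z + r};
}

// One update of one agent. It touches only this agent's own data, which is
// what makes it possible to run one such update per thread.
void updateAgent(FlatAgent& a, float speed, float dt) {
    if (!a.enabled || a.goalQSize == 0) return;

    // Step 1: goal check. Disable after the last goal, otherwise advance.
    if (distanceXZ(a.position, a.goalQueue[a.curGoal]) <= a.radius) {
        if (a.curGoal + 1 >= a.goalQSize) {
            a.enabled = false;
            return;
        }
        ++a.curGoal;
    }

    // Step 2: simple movement toward the current goal. The bounds before and
    // after the move are recorded so the CPU can relocate the agent later.
    const Float3 goal = a.goalQueue[a.curGoal];
    const float dist = distanceXZ(a.position, goal);
    a.oldBounds = boundsOf(a.position, a.radius);
    if (dist > 0.0f) {
        const float step = std::fmin(speed * dt, dist);
        a.position.x += (goal.x - a.position.x) / dist * step;
        a.position.z += (goal.z - a.position.z) / dist * step;
    }
    a.newBounds = boundsOf(a.position, a.radius);
}
```

After a batch of such updates, only oldBounds and newBounds of each agent would need to reach the CPU for the KD tree relocation.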

For pprAI, the idea is similar but takes more steps. The perceptive, predictive, and reactive steps could be done in one big step, but the required queries have to be done beforehand. Afterwards, the updating part is the same as in simpleAI.

Hopefully, I can finish simpleAI this week and move on to modifying pprAI next week. I will keep the blog updated.

Putting the database on the GPU side

The data stored in GridDatabase2D is pointed to by _basePtr, a SpatialDatabaseItemPtr *. To put the data on the GPU side, I made _basePtr a SpatialDatabaseItemPtr **. First, _basePtr is set by cudaMalloc to point to a memory address on the GPU side. Then, by launching a kernel, I use new on the CUDA side to assign a chunk of GPU memory to the pointer that _basePtr points to. The _cells member of GridDatabase2D is treated in the same way.

The reason I do it this way is that a kernel can access only addresses on the GPU side, so one more level of pointer indirection is required.

For the same reason, all agents and obstacles have to be allocated in this way, so most of the interfaces in the original program will have to be modified.

So far, I have successfully allocated the memory blocks for _basePtr, _cells, and the obstacles. For the agents, however, I always encounter a kernel error when I try to launch the kernel for them. I will keep looking into it. Hopefully, it can be resolved soon so that I can run the whole program.

Saturday, March 31, 2012

Issues with porting SteerSuite to CUDA and some possible solutions

I have been trying to port the SteerSuite code to CUDA for about a week. The largest obstacle is the way the data structures are referenced across classes.

In the class GridDatabase2D, the pointer _basePtr points to the overall database, which stores every single item we need to draw and move, and the pointer _cells points to a GridCell array, which stores pointers to all the items according to the items' locations.

But in the class SimulationEngine, there is a list of agent pointers and a list of obstacle pointers; these are the basic items in the whole database. The updates are done via the list of agent pointers. In the agent module, there is a global pointer to the GridDatabase2D, so each agent can update the database.

Adding items, either Agent or Obstacle, works somewhat differently. The test case file is read and parsed in TestCasePlayerModule, then obstacles and agents are added into _obstacles and _agents respectively. But agents are added into the database in a different way from the obstacles: via the createAgent method in SimulationEngine, by resetting each agent.

The interplay among those classes complicates the porting work, so I am planning not to reuse the updateAI code directly but to copy the underlying data, _basePtr and _cells, out, process it on the GPU, and copy it back afterwards. This is quite a lot of work, but it seems to be the only way unless I can get a better graphics card that supports CUDA with compute capability 2.x. Then I could reuse the C++ code directly, provided CUDA fully supports the C++ features used in SteerSuite.

Wednesday, March 28, 2012

SteerSuite underlying data structure analysis

Before porting SteerSuite to CUDA, I have to fully understand how SteerSuite works.

From, you can download the code.

There are seven projects in the SteerSuite solution: pprAI, simpleAI, steerlib, steersim, steerbench, steertool, and glfw. Most of the work will rely on the first three projects, which include the underlying data structure storing agents and obstacles, and the way each agent moves with the help of the AI.

First of all, all the data is stored in the GridDatabase2D class, in the form of an array of SpatialDatabaseItemPtr. Each item can be either an agent or an obstacle, depending on the initial options. Besides this array, another array of GridCell covers the same database but holds only pointers to the original data. The whole geometry is divided among this array of GridCell, and if agents and obstacles are close enough, their pointers will be stored in the same GridCell.
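To make the GridCell idea concrete, here is a minimal host-only C++ sketch of a uniform grid laid over the x-z plane. It is not the GridDatabase2D implementation: the Grid struct, makeGrid, cellIndex, and insertItem are hypothetical names, each item is treated as a single point rather than a bounding box, and integer indices into the base array stand in for the pointers that the real GridCell stores.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical uniform grid: xCells * zCells square cells of side cellSize,
// starting at (xOrigin, zOrigin).
struct Grid {
    float xOrigin, zOrigin;
    float cellSize;
    int xCells, zCells;
    std::vector<std::vector<int> > cells;  // per cell: indices of the items inside it
};

Grid makeGrid(float xOrigin, float zOrigin, float cellSize, int xCells, int zCells) {
    Grid g;
    g.xOrigin = xOrigin;
    g.zOrigin = zOrigin;
    g.cellSize = cellSize;
    g.xCells = xCells;
    g.zCells = zCells;
    g.cells.resize(static_cast<std::size_t>(xCells) * zCells);
    return g;
}

// Flat index of the cell containing (x, z), or -1 if the point is off the grid.
int cellIndex(const Grid& g, float x, float z) {
    const int cx = static_cast<int>(std::floor((x - g.xOrigin) / g.cellSize));
    const int cz = static_cast<int>(std::floor((z - g.zOrigin) / g.cellSize));
    if (cx < 0 || cx >= g.xCells || cz < 0 || cz >= g.zCells) return -1;
    return cz * g.xCells + cx;
}

// Registers an item in the cell that contains its position.
// Returns false when the position lies outside the grid.
bool insertItem(Grid& g, int itemIndex, float x, float z) {
    const int idx = cellIndex(g, x, z);
    if (idx < 0) return false;
    g.cells[idx].push_back(itemIndex);
    return true;
}
```

Two items whose positions fall into the same cell end up in the same list, so a neighborhood query only needs to look at the cells around an agent instead of the whole base array.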

At every step, each agent is updated by its corresponding AI, either simpleAI or pprAI. First, the agent reads its neighborhood information from the whole database, then it does the corresponding computation. Finally, the new position is written back to the agent, and the contents of the GridCell array change to follow the agents' new positions.

Examining the simpleAI, pprAI, and steerlib projects shows the following subclasses of SpatialDatabaseItem: SimpleAgent, PPRAgent, BoxObstacle, CircleObstacle, and OrientedBoxObstacle. For now, I will only take care of SimpleAgent.

SimpleAgent has _position, _velocity, _forward, _enabled, _radius, and _goalQueue as class members. Because some of them use self-defined types like Util::Point and Util::Vector, and standard template library containers like std::queue<SteerLib::AgentGoalInfo>, I have to use float3 and arrays instead.
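As an illustration of that conversion, the following host-only C++ sketch flattens a goal queue into a fixed-size array. The names here are assumptions made for the example: Float3 is a plain stand-in for float3, GoalInfo is a reduced, hypothetical stand-in for SteerLib::AgentGoalInfo that keeps only a target location, and the bound of 20 is an arbitrary fixed capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

struct Float3 { float x, y, z; };     // stand-in for CUDA's float3

// Hypothetical, reduced stand-in for SteerLib::AgentGoalInfo.
struct GoalInfo { Float3 target; };

const int kMaxGoals = 20;             // arbitrary fixed capacity for the sketch

// Fixed-size replacement for the std::queue member.
struct FlatGoals {
    Float3 goalQueue[kMaxGoals];
    int goalQSize;  // number of valid entries in goalQueue
    int curGoal;    // index of the goal currently being pursued
};

// Copies the queue into the flat array, front of the queue first.
// The queue is taken by value, so the caller's queue is left untouched.
FlatGoals flattenGoals(std::queue<GoalInfo> goals) {
    assert(goals.size() <= static_cast<std::size_t>(kMaxGoals));
    FlatGoals flat = {};  // zero-initialized: goalQSize == 0, curGoal == 0
    while (!goals.empty()) {
        flat.goalQueue[flat.goalQSize++] = goals.front().target;
        goals.pop();
    }
    return flat;
}
```

Popping a goal then becomes incrementing curGoal, which needs no dynamic memory and so fits a plain struct that can be copied byte for byte.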

The basic routine will be: first copy the data from host to device, then update each agent in parallel on the device side, and finally copy the modified data back.
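The shape of that routine can be sketched in host-only C++ as below. This is only an analogy, not CUDA code: a second std::vector plays the role of device memory, an ordinary loop plays the role of the kernel launch, and AgentState, stepOne, and simulate are hypothetical names. In a real port the two copies would be cudaMemcpy calls and stepOne would be the body of a kernel, with one thread per index.

```cpp
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };   // stand-in for CUDA's float3

struct AgentState {
    Float3 position;
    Float3 velocity;
};

// Stand-in for a kernel body: index i reads and writes only element i,
// so every index could be handled by a different thread.
void stepOne(std::vector<AgentState>& dev, std::size_t i, float dt) {
    dev[i].position.x += dev[i].velocity.x * dt;
    dev[i].position.y += dev[i].velocity.y * dt;
    dev[i].position.z += dev[i].velocity.z * dt;
}

std::vector<AgentState> simulate(const std::vector<AgentState>& host, int steps, float dt) {
    // 1. Host to device: one copy before the simulation starts.
    std::vector<AgentState> device = host;

    // 2. Update on the "device". The data stays there across all steps.
    for (int s = 0; s < steps; ++s) {
        for (std::size_t i = 0; i < device.size(); ++i) {
            stepOne(device, i, dt);
        }
    }

    // 3. Device to host: one copy back when the result is needed.
    return device;
}
```

The point of the structure is that the inner loop has no dependency between different values of i, which is the property a parallel update relies on.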

If this method does not work well, it may be worth trying to rewrite the database part completely in CUDA. Let us see.