Friday, April 20, 2012

Finally, SteerSuite works with the GPU

Because of the limitations of CUDA C++, I dropped the original plan of putting everything on the GPU side. Instead, I copy only the necessary information, such as position, velocity, and direction, to the GPU side.

The data structure is as follows:
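typedef struct cuda_agent{
    bool _enabled;          // false once the agent has reached its last goal
    float3 _position;
    float3 _velocity;
    float3 _forward;
    float _radius;
    float3 _goalQueue[20];  // fixed-size array standing in for std::queue
    int _goalQSize;
    int _curGoal;
    int _usedGoal;
    AABox _oldBounds;       // bounds before the update, for relocation in the KD tree
    AABox _newBounds;       // bounds after the update
}cuda_agent;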
At first, I copied the data back and forth between the CPU and the GPU, but I found that very inefficient. Therefore, I now keep the updated data only on the GPU side. After the computation has advanced in time, I copy the updated data back to the CPU and draw it. This makes it run faster.

Currently, I have compared the performance of the i7-2600 and the GTX520, and they turn out to have the same performance. I guess this is because the GTX520 is too weak computationally. I will rerun SteerSuite on other computers that have better GPUs.

I think I can get better results.

Wednesday, April 11, 2012

Updating agents in parallel

As I mentioned in the last post, all my data are on the GPU side. According to the existing algorithm, there is no conflict when updating the built-in KD tree, so I decided to update each agent in parallel. Some modification is needed, though.

For simpleAI, the critical method updateAI in the SimpleAgent class actually takes two steps.
First, an agent checks whether it has already reached its goal. If it has, it checks whether that goal is the last one. If so, the agent disables itself and finishes the update; otherwise, the subsequent goal becomes the current goal. Afterwards, a simple movement is calculated, and the agent updates its position and its location in the KD tree. The annoying thing is that the code that relocates an object in the KD tree lives on the CPU side, so it cannot run on the GPU side. Therefore, I store the newBounds and oldBounds in the agent, then copy them to the CPU side and finish the update there.
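The two steps of updateAI described above can be sketched roughly as follows. This is a plain C++ sketch, not the SteerSuite code: Vec3, Bounds, AgentState, and updateAgent are hypothetical names, and the rule that a goal counts as reached within one radius is my assumption. In a kernel, the same function body would run once per thread.

```cpp
#include <cmath>

// Hypothetical stand-ins: Vec3 for CUDA's float3, Bounds for AABox.
struct Vec3 { float x, y, z; };
struct Bounds { Vec3 min, max; };

const int kMaxGoals = 20;

struct AgentState {
    bool   enabled;
    Vec3   position;
    float  radius;
    Vec3   goals[kMaxGoals];
    int    goalCount;
    int    curGoal;
    Bounds oldBounds;
    Bounds newBounds;
};

inline float distanceBetween(const Vec3& a, const Vec3& b) {
    float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

inline Bounds boundsAround(const Vec3& p, float r) {
    return Bounds{ Vec3{p.x - r, p.y - r, p.z - r},
                   Vec3{p.x + r, p.y + r, p.z + r} };
}

// Returns true when the agent moved. In that case oldBounds and newBounds
// hold what the CPU side needs to relocate the agent in the spatial database.
inline bool updateAgent(AgentState& a, float speed, float dt) {
    if (!a.enabled || a.goalCount <= 0) return false;

    // Step 1: goal check. "Reached" means within one radius (an assumption).
    if (distanceBetween(a.position, a.goals[a.curGoal]) <= a.radius) {
        if (a.curGoal + 1 >= a.goalCount) {
            a.enabled = false;   // last goal reached: disable and stop
            return false;
        }
        ++a.curGoal;             // otherwise the next goal becomes current
    }

    // Step 2: simple movement straight toward the current goal.
    const Vec3 goal = a.goals[a.curGoal];
    float dist = distanceBetween(a.position, goal);
    if (dist <= 0.0f) return false;           // already exactly on the goal
    float step = std::fmin(speed * dt, dist); // never overshoot the goal
    float ux = (goal.x - a.position.x) / dist;
    float uy = (goal.y - a.position.y) / dist;
    float uz = (goal.z - a.position.z) / dist;

    a.oldBounds = boundsAround(a.position, a.radius);
    a.position.x += ux * step;
    a.position.y += uy * step;
    a.position.z += uz * step;
    a.newBounds = boundsAround(a.position, a.radius);
    return true;
}
```

Because the function only touches its own agent and records the bounds instead of calling into the database, the relocation can be deferred to the CPU after the copy back.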

For pprAI, the idea is similar but takes more steps. The perceptive, predictive, and reactive steps could be done in one big step, but the required queries have to be done beforehand. Afterwards, the updating part would be the same as in simpleAI.

Hopefully, I can finish simpleAI this week and move on to modifying pprAI next week. I will keep the blog updated.

Putting the database on the GPU side

The data stored in GridDatabase2D are pointed to by _basePtr, a SpatialDatabaseItemPtr *. To put the data on the GPU side, I made _basePtr a SpatialDatabaseItemPtr **. First, _basePtr is set by cudaMalloc to point to a memory address on the GPU side. Then, by launching a kernel, I use new on the CUDA side to allocate a chunk of GPU memory and assign it to the pointer that _basePtr points to. The _cells member of GridDatabase2D is treated in the same way.

The reason I do it this way is that a kernel can only access addresses on the GPU side, so another level of pointer indirection is required.

For the same reason, all agents and obstacles have to be allocated in this way, so most of the interfaces in the original program will have to be modified.

Currently, I can successfully allocate the memory blocks for _basePtr, _cells, and the obstacles. For the agents, however, I always get a kernel error whenever I try to launch the kernel for them. I will keep looking into it. Hopefully, it can be resolved soon so that I can run the whole program.

Saturday, March 31, 2012

Issues with porting SteerSuite to CUDA and some possible solutions

I have been trying to port the SteerSuite code to CUDA for about a week. The largest obstacle is that the data structures are referenced across classes.

In the class GridDatabase2D, the pointer _basePtr points to the overall database, which stores every single item we need to draw and move, and the pointer _cells points to a GridCell array, which stores pointers to all the items according to their locations.

But in the class SimulationEngine, there is a list of agent pointers and a list of obstacle pointers, which are the basic items in the whole database. The updates are done via the list of agent pointers. In the agent module, there is a global pointer to the GridDatabase2D, so each agent can update the database.

Adding items, either Agent or Obstacle, works somewhat differently. The test case file is read and parsed in TestCasePlayerModule, and then obstacles and agents are added to _obstacles and _agents respectively. But agents are added to the database in a different way from obstacles: via the createAgent method in SimulationEngine, by resetting each agent.

The interplay among those classes complicates the porting work, so I am planning not to reuse the updateAI code directly but to copy the underlying data, _basePtr and _cells, out, process them on the GPU, and copy them back afterwards. This is quite a lot of work, but it seems to be the only way unless I can get a better graphics card that supports CUDA compute capability 2.x. Then I could reuse the C++ code directly, provided CUDA fully supports the C++ features used in SteerSuite.

Wednesday, March 28, 2012

SteerSuite underlying data structure analysis

Before porting SteerSuite to CUDA, I have to fully understand how SteerSuite works.

From http://www.magix.ucla.edu/steersuite/, you can download the code.

There are seven projects in the SteerSuite solution: pprAI, simpleAI, steerlib, steersim, steerbench, steertool, and glfw. Most of the work will rely on the first three projects, which include the underlying data structure that stores agents and obstacles and the logic for how each agent moves with the help of the AI.

First of all, all the data are stored in the GridDatabase2D class, in the form of an array of SpatialDatabaseItemPtr. Each item can be either an agent or an obstacle, depending on the initial options. Besides this array, another array of GridCell covers the same database but holds only pointers to the original data. The whole geometry is divided among this array of GridCell, and if agents and obstacles are close enough, their pointers are stored in the same GridCell.
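To make the two-array layout concrete, here is a minimal C++ sketch of a uniform grid over the x-z plane. It is not the GridDatabase2D implementation: Item, Grid2D, and their members are hypothetical names, each item is filed under a single cell by its position only, and the cells hold indices into the item array rather than raw pointers so that they stay valid when the array grows.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical simplified item: a position plus an agent/obstacle flag.
struct Item { float x, z; bool isAgent; };

class Grid2D {
public:
    Grid2D(float originX, float originZ, float cellSize, int cellsX, int cellsZ)
        : originX_(originX), originZ_(originZ), cellSize_(cellSize),
          cellsX_(cellsX), cellsZ_(cellsZ),
          cells_(static_cast<std::size_t>(cellsX) *
                 static_cast<std::size_t>(cellsZ)) {}

    // Maps a position to a row-major cell index, clamping to the grid edges.
    int cellIndex(float x, float z) const {
        int cx = clampCell(static_cast<int>(std::floor((x - originX_) / cellSize_)), cellsX_);
        int cz = clampCell(static_cast<int>(std::floor((z - originZ_) / cellSize_)), cellsZ_);
        return cz * cellsX_ + cx;
    }

    // Stores the item once and files its index under the cell it falls in.
    std::size_t add(const Item& item) {
        items_.push_back(item);
        std::size_t id = items_.size() - 1;
        cells_[static_cast<std::size_t>(cellIndex(item.x, item.z))].push_back(id);
        return id;
    }

    const Item& item(std::size_t id) const { return items_[id]; }

    const std::vector<std::size_t>& cell(int index) const {
        return cells_[static_cast<std::size_t>(index)];
    }

private:
    static int clampCell(int c, int n) {
        return c < 0 ? 0 : (c >= n ? n - 1 : c);
    }

    float originX_, originZ_, cellSize_;
    int cellsX_, cellsZ_;
    std::vector<Item> items_;                      // plays the role of _basePtr
    std::vector<std::vector<std::size_t>> cells_;  // plays the role of _cells
};
```

A neighborhood query then only needs to look at the cells around an agent instead of scanning the whole item array.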

On every update, each agent is updated by the corresponding AI, either simpleAI or pprAI. First, each agent reads its neighborhood information from the whole database; then it does the corresponding computation; finally, the new position is written back to the agent, and the contents of the GridCell array change along with the agents' positions.

By examining the simpleAI, pprAI, and steerlib projects, I found several subclasses of SpatialDatabaseItem, including SimpleAgent, PPRAgent, BoxObstacle, CircleObstacle, and OrientedBoxObstacle. For now I will only take care of SimpleAgent.


In SimpleAgent, the class members include _position, _velocity, _forward, _enabled, _radius, and _goalQueue. Because some of them use self-defined types such as Util::Point and Util::Vector, or standard template library containers such as std::queue<SteerLib::AgentGoalInfo>, I have to use float3 and plain arrays instead.

The basic routine will be to first copy the data from host to device, then update each agent in parallel on the device side, and finally copy the modified data back.

If this method does not work well, it may be worth trying to rewrite the database part completely in CUDA. Let us see.
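As an illustration of that conversion, the sketch below flattens a goal queue into a fixed-size array plus a count. It is only a sketch under assumptions: Float3, Point, and GoalInfo are hypothetical stand-ins for float3, Util::Point, and SteerLib::AgentGoalInfo, the member name target is made up, and the capacity of 20 is an arbitrary choice for the example.

```cpp
#include <queue>

struct Float3 { float x, y, z; };     // stand-in for CUDA's float3
struct Point { float x, y, z; };      // stand-in for Util::Point
struct GoalInfo { Point target; };    // stand-in for SteerLib::AgentGoalInfo

const int kMaxGoals = 20;

// Plain-old-data replacement for the queue: safe to copy byte-for-byte.
struct FlatGoals {
    Float3 goals[kMaxGoals];
    int count;
};

// Copies at most kMaxGoals goals in queue order. The queue is taken by
// value, so the caller's queue is left untouched; extra goals are dropped.
inline FlatGoals flattenGoals(std::queue<GoalInfo> q) {
    FlatGoals out{};
    while (!q.empty() && out.count < kMaxGoals) {
        const Point& p = q.front().target;
        out.goals[out.count++] = Float3{p.x, p.y, p.z};
        q.pop();
    }
    return out;
}
```

The fixed capacity is the price of this approach: a test case with more goals than the array holds would need either a larger array or some way of refilling it from the host.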

Sunday, March 18, 2012

Integrating CUDA into SteerSuite

SteerSuite is a pure C++ project, and I need to do something in CUDA in this project. So the first thing to do is to make CUDA code run with this project.

By following this post: http://www.ademiller.com/blogs/tech/2011/05/visual-studio-2010-and-cuda-easier-with-rc2/, I successfully made CUDA run with SteerSuite, or more accurately, with steerlib, because I am planning to first optimize data retrieval in GridDatabase.

Integrating CUDA code into an existing VS project is fairly easy, and it can be done in the following steps:

1) Select the project in the Solution Explorer (here I chose steerlib), and then select the Project--Build Customization menu. In the dialog, check the CUDA 4.0 targets.

2) Then right-click on the .cu file and select Properties. Make sure that in Configuration Properties--General, "Item type" is CUDA C/C++.


3) Make sure that the NVCC CUDA compiler targets the same platform as your original project, either Win32 or x64.
In the project's Properties, open Configuration Properties--CUDA C/C++ and check that "Target Machine Platform" is correct. Here my target is Win32.

4) Open Configuration Properties--Linker--Input and add cudart.lib to the list in "Additional Dependencies". Do not forget to use a semicolon (;) to separate the items.


Now you are all done. Just rebuild your project. But wait, how do you use the CUDA code from the rest of the source code?
You just need to decorate your functions in the .cu file with extern "C" and call them from your original project code. Do not forget to pre-declare each function, in the same extern "C" form, at the head of the file in which you use it.
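The declaration and definition pair might look like the sketch below. The function name scalePositions is hypothetical, and its body is plain C++ standing in for a kernel launch so that the sketch compiles without nvcc; in the real project the definition would live in the .cu file and the declaration in the .cpp file that calls it.

```cpp
// --- In the .cpp file that uses the function ---
// Declared with the same extern "C" linkage as the definition, so the
// linker looks for the unmangled symbol name.
extern "C" void scalePositions(float* positions, int count, float factor);

// Ordinary C++ code can then call it like any other function.
inline void doubleAll(float* positions, int count) {
    scalePositions(positions, count, 2.0f);
}

// --- In the .cu file ---
// extern "C" turns off C++ name mangling for this symbol. A real version
// would copy the data to the device, launch a kernel, and copy it back;
// a CPU loop stands in for that here.
extern "C" void scalePositions(float* positions, int count, float factor) {
    for (int i = 0; i < count; ++i) {
        positions[i] *= factor;
    }
}
```

Keeping only plain types such as float* and int in the signature is what lets the same declaration be shared between the two compilers.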

I followed the steps above and ran a simple function from a .cu file. Hopefully more complicated ones will work as well.

Wednesday, March 14, 2012

Crowd Simulation with SteerSuite starts


Crowd simulation is the process of simulating the movement of a large number of entities or characters, now often appearing in 3D computer graphics for film. While simulating these crowds, observed human behavior interaction is taken into account, to replicate the collective behavior. 

In my final project, I am going to use SteerSuite, which was developed at UCLA, to show my crowd simulation. SteerSuite provides a framework for developing AI that steers objects in a crowd simulation. In the project, I will convert the built-in pprAI, which is based on the paper “A modular framework for adaptive agent-based steering”, from CPU-based code to GPU-based code and observe the performance boost brought by GPU acceleration.

In SteerSuite, there are mainly three steps for each agent to move towards its goal. First, the agent reads environment data from the database; this stage is also called the perception phase. Then the agent analyzes the situation based on the environment data it received; this stage can be subdivided into a prediction phase and a reactive phase, which calculate possible collisions and steering respectively. Finally, the steering result is written back to the database, and the movement is rendered by the graphics library.

Parallelism can be exploited in each of the three steps described above. We can parallelize reading the environment data, do the analysis simultaneously, and write the data back in parallel. I will apply these parallelizations one after another and measure the acceleration brought by each change.
At the end of this project, we should expect to see a certain amount of speed-up from the GPU over the CPU.

 
References:

Crowd Simulation: http://en.wikipedia.org/wiki/Crowd_simulation
SteerSuite: http://www.magix.ucla.edu/steersuite/
pprAI algorithm: http://dl.acm.org/citation.cfm?id=1944769