1: for each beam b
2:     pb ← partner beam (a beam slightly further)
3:     for each point P in beam pb
4:         y ← distance(P)
5:         for each point Q in beam b, around P (+/- a few degrees)
6:             x ← distance(Q)
7:             E[P] ← expected distance of P, as a function of x
8:             if y < E[P]
9:                 P is an obstacle
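As a rough illustration, here is what the core loop of the pseudocode might look like in C++. The `Point` type and the `expectedDistance` model are hypothetical placeholders, not the real implementation; the actual expected-distance formula depends on the beams' vertical angles.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point {
    double distance;        // range measurement
    double angle;           // horizontal angle (radians)
    bool   obstacle = false;
};

// Hypothetical expected-distance model: given the range x of a point on the
// lower beam, return the range we would expect on the partner beam if the
// ground were flat. The coefficient is made up for illustration.
double expectedDistance(double x) {
    return x * 1.1;
}

// Label points of the partner beam `pb` as obstacles by comparing them
// against nearby points of beam `b` (lines 3-9 of the pseudocode).
void labelObstacles(std::vector<Point>& pb, const std::vector<Point>& b,
                    double maxAngleDiff) {
    for (Point& P : pb) {
        const double y = P.distance;
        for (const Point& Q : b) {
            if (std::fabs(Q.angle - P.angle) > maxAngleDiff) continue;
            const double x = Q.distance;
            if (y < expectedDistance(x)) {
                P.obstacle = true;
            }
        }
    }
}
```

The nested loops make the quadratic cost visible: every point of the partner beam is tested against every angular neighbor on the lower beam.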
That can be up to a million distance tests (line 8 of the pseudocode), depending on the situation. Velodyne data arrives every 100 ms, and a lot more work remains once the points have been labeled, so this step should run in about 10 ms. When I first coded it, it ran in a couple of seconds. This is the story of how I brought it down to 12 ms on a single core of an average laptop, without using a GPU or OpenMP.
Since the structure of the Velodyne is fixed, we can pre-compute the beam association (beam partnering) and the formula for the expected distance, which is a function of the beams' vertical angles.
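A minimal sketch of that precomputation, assuming a simplified pairing rule (partner = next beam up) and storing the tangents of the fixed vertical angles for later use in the expected-distance formula; both assumptions are mine, not details from the post:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-beam data that never changes at runtime, so it is computed once
// at startup from the sensor's fixed vertical beam angles.
struct BeamInfo {
    int    partner;     // index of the partner beam
    double tanSelf;     // tan of this beam's vertical angle
    double tanPartner;  // tan of the partner beam's vertical angle
};

std::vector<BeamInfo> precomputeBeams(const std::vector<double>& verticalAngles) {
    std::vector<BeamInfo> info(verticalAngles.size());
    for (size_t i = 0; i < verticalAngles.size(); ++i) {
        // Simplified pairing: partner with the next beam up; the topmost
        // beam is paired with itself so the index stays valid.
        const size_t partner = (i + 1 < verticalAngles.size()) ? i + 1 : i;
        info[i] = {static_cast<int>(partner),
                   std::tan(verticalAngles[i]),
                   std::tan(verticalAngles[partner])};
    }
    return info;
}
```

With this table in place, the inner loop never recomputes trigonometry: it only does lookups and comparisons.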
Then I realized that once a point had been labeled as an obstacle, there was no need to compare it with its remaining neighbors. This cut the execution time from a few seconds down to about 500 ms.
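The early exit is a one-line change: break out of the neighbor loop as soon as the label is set. A toy sketch (the point type and the list of expected distances are placeholders for the real beam comparison):

```cpp
#include <cassert>
#include <vector>

struct Pt {
    double y;               // measured distance
    bool   obstacle = false;
};

// Compare each point against a list of expected distances, stopping at the
// first one that triggers the obstacle label.
void labelWithEarlyExit(std::vector<Pt>& points,
                        const std::vector<double>& neighborExpected) {
    for (Pt& p : points) {
        for (double e : neighborExpected) {
            if (p.y < e) {
                p.obstacle = true;
                break;  // no need to test the remaining neighbors
            }
        }
    }
}
```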
I had organized the data in a 2D array, with the beam index as the first dimension. For each point I stored its angular position, so that I could search for neighbors by angular distance. After running the code through a profiler (gperftools), I realized that this organization was slowing the algorithm down considerably: the algorithm did not really know where to look for neighboring points, so some searching was necessary, and the angle comparisons were too slow.
So I reorganized the points into a 3D array, with points binned into small groups according to their angular position in encoder ticks (an integer). I am using 700 bins, which leads to a couple of points per bin. Finding neighboring points then becomes straightforward: knowing the encoder value of a point, simply divide it by the maximum encoder value and multiply by the number of bins. This led to a huge speedup, down to about 35 ms.
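The bin lookup reduces to one integer computation per point. A sketch, where the 36000 ticks-per-revolution figure is my assumption about the encoder resolution, not a value from the post (700 bins is from the post):

```cpp
#include <cassert>

constexpr int kNumBins    = 700;    // number of angular bins (from the post)
constexpr int kMaxEncoder = 36000;  // assumed encoder ticks per revolution

// Map an encoder reading to a bin index: encoder / max * bins, kept in
// integer arithmetic. The multiply comes first so the division does not
// truncate everything to zero.
int binIndex(int encoder) {
    int bin = static_cast<int>(
        static_cast<long long>(encoder) * kNumBins / kMaxEncoder);
    return (bin < kNumBins) ? bin : kNumBins - 1;  // clamp encoder == max
}
```

Neighbor search is then a matter of visiting the bin of a point and the one or two bins on either side, instead of scanning the whole beam.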
Again, I inspected the code with the profiler and noticed that it was spending a lot of time accessing points in the array. So I cached a reference whenever the same location was used several times:
    Obj &p = table[i][j];
    for (...) {
        // do something with p
    }

is faster than

    for (...) {
        // do something with table[i][j]
    }
This brought down the execution time to about 20ms.
Letting the compiler optimize the code (until now I had been compiling in debug mode) brought the execution time down to 12 ms!
ps: code formatting from http://codeformatter.blogspot.com/