15418-Final-Project

Final Report

https://docs.google.com/document/d/1qyeijTi3HBF1TzQkeOrlOfEhb-BM_51niq_WDS4tv78/edit?usp=sharing

Milestone Report

https://docs.google.com/document/d/1Q0LRbJ3SyzvPCi-ktEX833drrTi7vuU1dRujk6NUMDc/edit?usp=sharing

Project Proposal

https://frankkim220.github.io/15418-Final-Project/

Title

Accelerate Execution of a Cone Detection Algorithm using LiDAR Pointclouds for Deployment on a Driverless Racecar

URL

https://frankkim220.github.io/15418-Final-Project/

Summary

We are going to implement an optimized LiDAR clustering algorithm for cone detection and analyze which parallel implementation performs best when running within the full driverless racecar software stack.

Background

Carnegie Mellon Racing - Driverless is a team that develops an autonomous FSAE racecar every year. The goal of this project is to develop a racecar that can autonomously navigate an unknown track denoted by blue and yellow cones.

This year, the team has chosen to utilize a LiDAR as the central piece of its Perception system to identify the cones that make up the track. The LiDAR outputs a dense point cloud (roughly 50k points per second), and the team currently uses a sequential, Python-based algorithm to identify clusters of points that are likely to be cones. The points in each cluster can then be averaged to estimate the three-dimensional position of the cone with a high degree of accuracy.

The cone detection algorithm consists of three steps; a rough sketch of the full pipeline follows the list:

  1. Ellipse Cut - points that are outside the expected track bounds or very far from the racecar are irrelevant to the Perception system: points outside the expected track boundaries are unlikely to contain cones, and points that are far away are more likely to have inaccurate depth estimates.

  2. Ground Filtering - most of the points that remain lie on the ground rather than on cones, so they carry no useful information and should be removed.

  3. Clustering - of the points that remain, identify groups of points that are close together and likely to represent cones, then average each group's positions to produce a single estimate of that cone's position.
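
Below is a minimal Python sketch of this three-step pipeline, assuming the point cloud arrives as an (N, 3) NumPy array of x/y/z coordinates. The function names, the ellipse radii, the ground-height threshold, and the use of scikit-learn's DBSCAN for the clustering step are illustrative assumptions, not the team's actual implementation.

```python
# Illustrative sketch of the three-step cone detection pipeline (not the team's code).
import numpy as np
from sklearn.cluster import DBSCAN

def ellipse_cut(points, a=20.0, b=10.0):
    """Keep only points inside an ellipse (semi-axes a, b meters) around the car."""
    x, y = points[:, 0], points[:, 1]
    mask = (x / a) ** 2 + (y / b) ** 2 <= 1.0
    return points[mask]

def ground_filter(points, height_thresh=0.1):
    """Fit a plane z = c0*x + c1*y + c2 by least squares and drop near-ground points."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    residuals = points[:, 2] - A @ coeffs
    return points[residuals > height_thresh]

def cluster_cones(points, eps=0.3, min_points=5):
    """Group nearby points and average each cluster into one cone position estimate."""
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(points[:, :2])
    return np.array([points[labels == k].mean(axis=0)
                     for k in set(labels) if k != -1])

def detect_cones(cloud):
    return cluster_cones(ground_filter(ellipse_cut(cloud)))
```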

We hope to speed up our sequential algorithm by employing parallelism and to determine experimentally whether multi-core (CPU) parallelism or GPU parallelism yields the greater improvement.
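
As a rough illustration of the multi-core direction (not the team's actual code), the sketch below splits the point cloud into chunks and runs the ellipse cut across worker processes with Python's multiprocessing module; a GPU variant would instead perform the same filtering through CUDA kernels via PyCUDA or a similar library. The chunk count and parameters are placeholders.

```python
# Hedged sketch: chunked ellipse cut across CPU cores with multiprocessing.
import numpy as np
from multiprocessing import Pool

def _ellipse_cut_chunk(chunk, a=20.0, b=10.0):
    # Filter one chunk of points against the ellipse bound.
    x, y = chunk[:, 0], chunk[:, 1]
    return chunk[(x / a) ** 2 + (y / b) ** 2 <= 1.0]

def ellipse_cut_parallel(points, num_workers=8):
    # Split the cloud into num_workers chunks and filter them in parallel.
    chunks = np.array_split(points, num_workers)
    with Pool(num_workers) as pool:
        return np.vstack(pool.map(_ellipse_cut_chunk, chunks))
```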

The Challenge

  1. We will need to spend significant time understanding the multiprocessing + CUDA libraries available to us in Python.

  2. Points arrive in no particular order. Thus, a good scheme is needed for distributing work to processors/GPU cores, since the Ellipse Cut and Ground Filtering will remove more points in some cases than in others, leading to high workload imbalance.

  3. Ground Filtering requires the generation of a uniform plane that models the ground. If we were to naively split points across processors, we would end up with many disjoint planes that each model a local segment of the ground. Thus, we may need to employ some sort of boundary-communication scheme in which neighboring processors hold copies of the boundary points so that a single, smooth ground plane can be constructed (see the sketch after this list).

  4. In addition to this LiDAR cone detection algorithm, many other algorithms run onboard the racecar's single compute unit (e.g., trajectory generation, control-action generation, and localization and mapping). How will context switching with these other processes affect the performance of our entire pipeline?
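
To make the boundary-communication idea in item 3 concrete, here is a hedged sketch of one possible approach: partition points into strips along the x-axis, pad each strip with copies of its neighbors' boundary points (ghost points), and fit a local plane per strip whose coefficients could later be reconciled into a single ground plane. The strip count, overlap width, and function names are illustrative assumptions.

```python
# Hedged sketch of ghost-point partitioning for distributed ground-plane fitting.
import numpy as np

def split_with_ghost_points(points, num_strips=8, overlap=0.5):
    """Partition points into x-strips, each padded with nearby boundary points."""
    edges = np.linspace(points[:, 0].min(), points[:, 0].max(), num_strips + 1)
    strips = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (points[:, 0] >= lo - overlap) & (points[:, 0] <= hi + overlap)
        strips.append(points[mask])
    return strips

def fit_local_plane(strip):
    """Least-squares plane z = c0*x + c1*y + c2 for one strip of points."""
    A = np.c_[strip[:, 0], strip[:, 1], np.ones(len(strip))]
    coeffs, *_ = np.linalg.lstsq(A, strip[:, 2], rcond=None)
    return coeffs
```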

Resources

The Driverless team only has space for a single compute unit on the racecar. For this year, we have chosen a machine with a single eight-core Intel i7-9700TE processor and an NVIDIA Quadro RTX 3000 embedded GPU. All of our analyses will have to happen on this machine, as we are bound by the computational limits of this onboard computer.

We will be starting with an existing algorithm written by the Driverless team. It has been deployed on the car with real pointcloud data streaming and has been verified to generate correct outputs. However, the algorithm needs to be sped up in order to hit the team’s goal of producing cone estimates at 20Hz.

Goals and Deliverables

PLAN TO ACHIEVE: We plan to deliver a full algorithm set that outputs cone estimates at 20Hz, which is necessary for running in real time on the racecar. Experimentation will be needed to determine the best implementation, given that our algorithm contends for limited computational resources with various other driverless algorithms.

HOPE TO ACHIEVE: Some of the best teams in the world run their driverless software algorithms at 40Hz, allowing them to process more data and make more decisions per second. While this would be an ambitious goal, speeding up our algorithms to run at 40Hz would make us even more competitive against other racing teams.

DEMO: The team has collected terabytes of pointcloud data from past practice runs. Our demo will consist of running the algorithm on this recorded real-world data and observing that cone estimates are output at a rate of at least 20Hz. At the poster session, we hope to show videos of this demo as well as footage of the algorithm deployed on the racecar helping it complete a full race.
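
A simple way to verify the 20Hz target during the demo would be a timing harness along these lines; the `clouds` iterable and `detect_cones` callable below are placeholders for the team's actual recorded data and detection pipeline.

```python
# Hedged sketch of a demo timing harness: replay recorded point clouds
# through the detection pipeline and report the achieved frame rate.
import time

def measure_rate(clouds, detect_cones):
    start = time.perf_counter()
    for cloud in clouds:
        detect_cones(cloud)
    elapsed = time.perf_counter() - start
    return len(clouds) / elapsed  # frames per second; target is >= 20 Hz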

Platform Choice

As stated previously, we are locked into our hardware because it is the compute unit used onboard the racecar. We believe this machine is well suited to our purposes, though, since it pairs an eight-core processor with a GPU that has 1920 CUDA cores. Thus, we should have plenty of computational resources available to successfully parallelize the algorithm (although this may be bottlenecked by the other software running onboard the machine).
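
For reference, a GPU version of the ellipse cut might look like the following sketch. The proposal mentions PyCUDA; CuPy is used here only to keep the example short, and the parameter values are illustrative.

```python
# Hedged sketch of the ellipse cut running on the GPU via CuPy.
import cupy as cp

def ellipse_cut_gpu(points_np, a=20.0, b=10.0):
    pts = cp.asarray(points_np)                    # copy host -> device
    x, y = pts[:, 0], pts[:, 1]
    mask = (x / a) ** 2 + (y / b) ** 2 <= 1.0      # elementwise kernels on GPU
    return cp.asnumpy(pts[mask])                   # copy filtered points back to host
```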

Schedule

Week 1 (March 25 - March 29): Submit Project Proposal
Week 2 (April 1 - April 5): Investigate Python CUDA support and multiprocessing libraries; create test scripts to understand how to use these libraries
Week 3 (April 8 - April 12): Implement and test improvements to Ellipse Cut, Ground Filtering, and Clustering on the eight-core processor using Python’s threading/multiprocessing libraries
Week 4.0 (April 15 - April 17): Implement PyCUDA for phase one; finish sequential implementation of phase two algorithms; submit Milestone Report (due April 16th)
Week 4.5 (April 18 - April 21): Conduct in-depth analyses of how the threshold parameter changes load balancing for the phase two algorithms; investigate speedups achievable through threading, multiprocessing, and PyCUDA for phase two
Week 5.0 (April 22 - April 24): Look into a ghost-cells implementation for phase three; identify Python message-passing interfaces that would enable this phase to be developed
Week 5.5 (April 25 - April 28): Test phase three implementation
Week 6.0 (April 29 - May 1): Buffer days for any parts of the project that take longer than expected; work on a draft of the final report and poster pages
Week 6.5 (May 2 - May 5): Submit final report; prepare for final presentation and finish poster pages