I recently gave a talk at embedded systems night at a Chicago hackerspace called Pumping Station One (www.pumpingstationone.org). In it, I explained why I chose to go with embedded vision for obstacle avoidance in my autonomous car and discussed options for overcoming the performance issues with embedded systems.
I’m working on an RC-size autonomous car (http://jeffsinventions.com/?p=577). Basically, you give it a location and it goes there. Currently, it goes from one point to another in a straight line. Unfortunately, it is not smart enough to avoid crashing into things. To solve this, I’m planning to use a depth map of the environment in front of the car. That is, for each angle in front of the car, it specifies how far away the nearest object is.
From that, I can make the car turn toward the direction that is closest to its destination and in which the nearest object is adequately far away.
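To make that concrete, here is a minimal sketch of the decision rule I have in mind. The representation of the depth map, the function names, and the 2 m clearance threshold are all placeholders, not final design choices.

```python
# Minimal sketch of the steering rule: among the angles whose nearest obstacle
# is far enough away, pick the one closest to the bearing of the destination.
# depth_map is assumed to be {angle_in_degrees: distance_in_meters}; the names
# and the 2 m clearance threshold are placeholders.

def pick_heading(depth_map, destination_bearing, min_clearance=2.0):
    clear = [angle for angle, dist in depth_map.items() if dist >= min_clearance]
    if not clear:
        return None  # nothing is clear enough; the car should stop
    return min(clear, key=lambda angle: abs(angle - destination_bearing))

# Example: obstacle dead ahead, destination slightly to the right
depth_map = {-30: 5.0, -15: 4.0, 0: 0.8, 15: 3.5, 30: 6.0}
print(pick_heading(depth_map, destination_bearing=5))  # -> 15
```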
But before I can do this, I need to figure out how to get a depth map. Whatever system produces it has to meet several requirements:
- The field of view, the angle in front of the car that can be sensed, has to be wide enough. Imagine what would happen if the car were only able to see a one-degree cone in front of it and were going through a croquet wicket. It would try to go through, but the sides of the wicket would stop it (or it would tear the wicket out of the ground).
- The system has to have an adequate resolution in order to distinguish between obstacles and viable paths. Consider the extreme case where the resolution was one pixel wide—the car wouldn’t be able to tell where to go.
- The system can’t be too heavy. I don’t want the car to drag its “stomach” on the ground. This also implies that the power requirements can’t be excessive (if they were, I would need a big battery or unacceptably frequent recharges).
- The system has to be timely in taking in information about the world. If the car took ten seconds to acquire information and act accordingly, it would crash before it realized that it was about to.
- The system must work outside.
- The system must detect a range of objects (it would be unacceptable, for example, if it could not detect trees).
- The system must have a reasonable cost.
Active sensors bounce waves off objects and calculate time of flight, i.e., how long it takes for the wave to return to the transmitter (the arithmetic is sketched after this list). There are a couple of variations on this theme and none of them works in my context.
- Ultrasound. I tried mounting a Parallax Ping (http://www.parallax.com/tabid/768/ProductID/92/Default.aspx) on a servo in front of the car. Unfortunately, it didn’t seem to pick up curbs. This is puzzling, because I know that there are commercial cars and aftermarket kits that use ultrasound for curb detection (e.g., http://www.apogeekits.com/ultrasonic_parking_sensor.htm).
- Infrared. At first glance, the Kinect would seem to be an attractive option for this. However, outdoors, at the Rutgers engineering fair, I saw a poor soul try to use a Kinect for obstacle avoidance on a robot. Unfortunately, the infrared radiation from the sun washed the sensor out.
- Lidar would be nice, but unfortunately it is tens of thousands of dollars.
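For reference, the time-of-flight arithmetic these sensors perform is straightforward: halve the round-trip time and multiply by the speed of the wave. A minimal sketch for an ultrasonic ping, assuming the echo time has already been measured and sound travels at roughly 343 m/s:

```python
# Sketch of the time-of-flight calculation an active sensor performs.
# echo_time_s is the round-trip time of the pulse; dividing by two gives the
# one-way distance. 343 m/s is the speed of sound in roughly 20 C air.

SPEED_OF_SOUND_M_S = 343.0

def ping_distance_m(echo_time_s):
    return SPEED_OF_SOUND_M_S * echo_time_s / 2.0

print(ping_distance_m(0.006))  # a 6 ms echo corresponds to roughly 1.03 m
```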
The main alternative to active sensors for depth perception is computer vision, the use of a computer to make judgments about images from a camera. To get a depth map with computer vision, one can use stereopsis (http://en.wikipedia.org/wiki/Stereopsis). Imagine taking two images of one object from slightly offset viewpoints. If you overlay one on top of the other and compare them, the regions where the images are similar are far away and the regions where they differ are close. To get a sense of this, put your finger in front of your nose and close one eye at a time in rapid succession. Your finger appears in a very different part of the scene for each eye. Then move your finger far away from your nose and close one eye at a time again. This time, the two views are similar. (Attempts have also been made to use monocular vision, but they are difficult to implement; e.g., http://www.cs.cornell.edu/~asaxena/rccar/.)
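To give a sense of what this looks like in code, here is a minimal sketch of computing a disparity map (large disparity means a close object) from a rectified stereo pair with OpenCV’s block matcher. It uses the newer cv2 interface, and the file names and matcher parameters are example values, not my final settings.

```python
# Sketch: disparity (inverse depth) from a rectified stereo pair using
# OpenCV's block matcher. left.png / right.png are placeholder file names
# for two images taken a small distance apart; larger disparity = closer.
import cv2

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)  # fixed-point values, scaled by 16

# Normalize to 0-255 so the result can be viewed as an ordinary image
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype('uint8')
cv2.imwrite('disparity.png', vis)
```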
How I will implement vision
Pulling off computer vision requires a camera (which grabs a frame in the form of a two-dimensional array of pixels), software instructions for analyzing the frame, and hardware to run those instructions.
I chose to go with a CMOS camera because CMOS sensors have lower power requirements than CCDs (http://www.siliconimaging.com/ARTICLES/cmos_advantages_over_ccd.htm asserts that CMOS cameras require 20-50 mW versus the 2000-5000 mW required by CCDs). An attractive sensor is the Sony Exmor R, a $20 CMOS chip that can do 5 MP (the idea came from http://downloads.deusm.com/designnews/21020908_BDTI_DN_Day_2.pdf).
For software, I opted for OpenCV (http://opencv.willowgarage.com/wiki/Welcome), a set of computer vision libraries originally written by Intel, because it is well supported. To familiarize myself with it, I wrote a program in Python that takes frames from a webcam and uses an algorithm called Canny to detect edges (I won’t use Canny edge detection for depth perception, but it was a useful exercise). The code is here: https://github.com/JeffsInventions/autonomouscar/blob/master/canny.py.
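A minimal sketch along the lines of that script, using the cv2 interface (the 50/150 thresholds are just example values):

```python
# Minimal sketch of grabbing webcam frames and running Canny edge detection,
# along the lines of the canny.py linked above.
import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    cv2.imshow('edges', edges)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```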
A couple of options for hardware processing that I considered turned out not to be viable.
- It’s tempting to use a smart camera like the ones I’ve seen used to route containers on conveyor systems (e.g., http://www.cognex.com/IS7000.aspx), because they are relatively easy to set up. Unfortunately, I haven’t seen any with built-in depth algorithms.
- Another option is to send frames from the camera to a laptop wirelessly, as the laptop has more processing power. However, it takes too long to send the information back and forth.
Consequently, I’ve decided to connect the camera to a small computer (e.g., a Raspberry Pi). The biggest concern with these, however, is performance. Folks are reporting that with a Raspberry Pi they are getting less than 5 frames per second at 320×240 while doing basic image processing operations in OpenCV (http://www.raspberrypi.org/phpBB3/viewtopic.php?f=29&t=9764, http://www.raspberrypi.org/phpBB3/viewtopic.php?f=37&t=11745, and http://eduardofv.com/read_post/185-Installing-OpenCV-on-the-Raspberry-Pi).
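To check whether a given board is fast enough, the frame rate can be measured directly. A rough sketch, timing Canny edge detection over 100 frames at 320×240 (the operation and resolution are chosen to mirror the reports above; the constants assume the cv2 interface):

```python
# Rough sketch for checking throughput on a small board: time a basic OpenCV
# operation (Canny edge detection here) over 100 frames at 320x240.
import time
import cv2

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

n_frames = 100
start = time.time()
for _ in range(n_frames):
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.Canny(gray, 50, 150)
elapsed = time.time() - start

print('%.1f frames per second' % (n_frames / elapsed))
cap.release()
```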
Several good suggestions for overcoming this bottleneck came up in the discussion at the talk:
- Overclocking my Raspberry Pi
- Finding a way to run OpenCV on my Pi’s GPU (my preliminary research turned up no one who had done this and porting it myself is a more ambitious project than I want to take on)
- Picking up a faster processor (e.g., Rikomagic MK802 II http://store.cloudsto.com/rikomagic/rikomagic-mk802-ii-detail.html, Olimex A13 https://www.olimex.com/Products/OLinuXino/A13/A13-OLinuXino/, or BeagleBone http://beagleboard.org/bone). The Cortex-A8 processor in these boards was said to be more than twice as fast as the processor in the Pi.
- Using a Fresnel lens
- Ensuring that the camera, not the Pi, is doing image compression
- Only analyzing the parts of the image with potential obstacles, since finding out that the sky is far away is not informative (see the sketch after this list)
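Here is the sketch mentioned in the last suggestion: crop each frame to the lower band where ground-level obstacles would appear before doing any further analysis. The 40% cutoff and the file name are arbitrary example values.

```python
# Sketch of the region-of-interest idea: crop each frame to the lower band
# where ground-level obstacles would appear before doing any further analysis.
# frame.png and the 40% cutoff are arbitrary example values.
import cv2

frame = cv2.imread('frame.png')
h = frame.shape[0]
roi = frame[int(h * 0.4):, :]  # keep only the bottom 60% of the image
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)  # any further processing runs on less data
```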
Resources
- Introduction to embedded vision (http://www.designnews.com/lecture.asp?doc_id=248898&piddl_promo=765&cid=emb3)
- Setting up OpenCV + Python on Windows (http://luugiathuy.com/2011/02/setup-opencv-for-python/)
- Learning Python (http://www.diveintopython.net/)
- OpenCV + Python reference (http://opencv.willowgarage.com/documentation/python/index.html)
- OpenCV + Python examples (http://www.neuroforge.co.uk/index.php/getting-started-with-python-a-opencv)