From The Cape Lab: Automating Road Detection Via Image Processing
By Peter Lorenzen, PhD & Kayvan Farzaneh
In 2018, one of our team members shared a news article from The Economist about the most recent outbreak of Ebola and an emergency effort to map roads and buildings in the Democratic Republic of Congo:
“On May 9th, the day after the first cases of Ebola were confirmed in Bikoro, an urgent request came into the headquarters of Médecins Sans Frontières (MSF) [Doctors Without Borders], an international charity. Maps of this part of the Democratic Republic of Congo were needed to deliver vaccines and medical help. Yet accurate ones did not exist.
MSF turned to the crowd for help. Volunteers, trained using an online tutorial, started analysing satellite pictures and drawing maps. About 450 volunteers have already managed to plot some 67,000 structures and 1,000km of roads in the area of the outbreak, completing in days a task that could have taken months. Some of these new maps (see above) are already in the field.”
Mapping 1,000 kilometers of road in a few days is no easy feat. But, being in the field of computer vision and aerial imagery analysis, we couldn’t help but wonder if this task could be automated using modern image processing techniques. Perhaps it could be done in hours or minutes, rather than days? If so, it could open a host of benefits to groups like Doctors Without Borders.
This discussion led us to our first side project at Cape Analytics, wherein we decided to explore methods for identifying roads from aerial imagery. As part of this project, we were most interested in seeing whether a model could successfully extract road networks, estimate road width, and evaluate road surface types (such as telling the difference between dirt and asphalt).
As a company focused on property analytics, we encountered a number of additional technical challenges inherent to roads. From an overhead perspective, buildings are 2D objects that can be represented as simple, compact polygons. Roads, however, are not as simple. How do you determine the polygons of a complex road network across a wide variety of topologies and challenging features like driveways, bisecting overpasses, and footpaths? Moreover, how do you break up and process large areas? Where does one road network end and another begin?
To answer these questions we had to get to work.
A crucial first step was the development of well-defined road presence and road surface taxonomies. For roads, we defined two labels: road or background. For road surface, we defined five labels: concrete, asphalt, dirt, unknown, and background. We crowdsourced semantic labeling campaigns for ground truth generation, and then trained and evaluated our models in house.
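One hypothetical encoding of these two taxonomies (the integer class ids and the `decode` helper are illustrative, not Cape's actual scheme):

```python
# Illustrative class-id assignments for the two taxonomies described above.
ROAD_LABELS = {0: "background", 1: "road"}
SURFACE_LABELS = {0: "background", 1: "concrete", 2: "asphalt",
                  3: "dirt", 4: "unknown"}

def decode(mask_row, labels):
    """Map a row of integer class ids back to human-readable label names."""
    return [labels[v] for v in mask_row]
```

With a mapping like this, the crowdsourced annotations and the model outputs share a single integer vocabulary per task.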
Below is an example of an RGB reference image (left) versus a human-annotated ground truth mask (right):
We then developed fully convolutional networks (FCNs) for pixel-level road segmentation (2-class) and road surface estimation (5-class). We chose FCNs for this project because they are well suited to image segmentation and admit input imagery of arbitrary size.
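To illustrate why an FCN admits inputs of arbitrary size, here is a minimal sketch in PyTorch (the post does not name a framework, and `MiniFCN` and its layer sizes are illustrative, not the production architecture). Because every layer is convolutional, the spatial dimensions of the output track those of the input:

```python
import torch
import torch.nn as nn

class MiniFCN(nn.Module):
    """Minimal fully convolutional network: RGB input of any (even)
    spatial size in, per-pixel class logits of the same size out."""
    def __init__(self, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                          # downsample by 2
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2),  # upsample by 2
            nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),            # 1x1 classification head
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

road_model = MiniFCN(num_classes=2)     # road vs. background
surface_model = MiniFCN(num_classes=5)  # concrete/asphalt/dirt/unknown/background
```

There are no fully connected layers to fix the input size, so the same weights can be applied to image tiles of any (even) height and width.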
To extract road networks, these FCNs were applied to aerial RGB imagery to generate heatmaps of pixel-level road likelihood, which we then thresholded into binary road masks. We extracted road centerlines by identifying mask-contained Voronoi cell edges whose seeds are sampled at three-meter intervals along the edges of the road mask [ref Ojaswa Sharma]. We found Voronoi diagrams superior to both thinning and distance maps for this implementation because they best preserve the mask's geometric and topological properties. From the diagrams, road networks were extracted as graphs that carry both road width and road surface type along road centerlines.
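The centerline step can be sketched with SciPy's `Voronoi`. This is a simplified illustration, not Cape's implementation: `centerline_segments` is a hypothetical helper, the boundary heuristic is a basic 4-neighbour test, and `step` is in pixels rather than the three-meter ground intervals described above.

```python
import numpy as np
from scipy.spatial import Voronoi

def centerline_segments(mask, step=3):
    """Approximate road centerlines from a binary road mask.

    Boundary pixels of the mask are sampled every `step` pixels and used
    as Voronoi seeds; finite Voronoi edges whose endpoints both fall
    inside the mask approximate the road's medial axis (centerline).
    """
    m = mask.astype(bool)
    padded = np.pad(m, 1)
    # A pixel is interior if all four 4-neighbours are also road pixels.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = np.argwhere(m & ~interior)
    seeds = boundary[::step].astype(float)
    vor = Voronoi(seeds)

    def inside(p):
        r, c = int(round(p[0])), int(round(p[1]))
        return 0 <= r < m.shape[0] and 0 <= c < m.shape[1] and m[r, c]

    segments = []
    for v0, v1 in vor.ridge_vertices:
        if v0 == -1 or v1 == -1:  # skip ridges extending to infinity
            continue
        a, b = vor.vertices[v0], vor.vertices[v1]
        if inside(a) and inside(b):
            segments.append((tuple(a), tuple(b)))
    return segments
```

For a straight strip of road, the surviving Voronoi edges run down the middle of the strip; stitching such segments into a graph then yields the road network, with local road width available as the distance from each centerline vertex to its seed points.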
Here is an input RGB image (left), a road heatmap (center) and, finally, a road mask (right):
An example of a road network derived from a Voronoi Diagram:
Finally, the below figure shows the 5-class road surface type model generating accurate results (left) when compared to ground truth (right).
Overall, we found that this approach preserves the desirable geometric and topological properties of the road mask and offers a tunable trade-off between extraction accuracy and speed. A network-merging methodology that would allow the models to scale across large regions is currently in development.
Our 2-class road extraction pixel model achieved a mean Jaccard Similarity Index (JSI) of 0.73, while our 5-class road surface type model achieved mean JSIs of 0.47, 0.65, and 0.41 for concrete, asphalt, and dirt, respectively. The current top-ranking road extraction method was presented at the Computer Vision and Pattern Recognition (CVPR) conference in June 2018, as part of a challenge called “DeepGlobe: Parsing the Earth through Satellite Images.” The winning team's method achieved a JSI of 0.64. Although the CVPR challenge extracted roads from satellite rather than aerial imagery, our results are very likely consistent with, or better than, the state of the art.
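For reference, the Jaccard Similarity Index is intersection-over-union computed per class; a minimal NumPy version (the `jaccard` helper is ours, for illustration):

```python
import numpy as np

def jaccard(pred, truth, cls):
    """Jaccard Similarity Index (intersection over union) for one class."""
    p = (pred == cls)
    t = (truth == cls)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # class absent from both masks: perfect agreement
    return np.logical_and(p, t).sum() / union
```

The mean JSI figures above are this quantity averaged over the evaluation imagery, reported per class for the surface model.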
Some Challenges We Faced
Of course, the model also had trouble in certain cases:
- Tree occlusion: as tree cover over a road increases, the model tends to classify those pixels as background. However, the model also learned to “imagine” the underlying road where trees only partially occluded it.
- Foreground and background with similar appearance: the model has difficulty differentiating between roads and parking lots. We suspect the training dataset contains samples in which roads border parking lots, so the model gives its best guess of where the road is located. In this case, we found that the model reproduces the behavior of a human labeler quite closely.
- Dirt roads: dirt roads vary widely in appearance, and the model tends to over-segment dirt road areas, especially where the image data is ambiguous. In addition, home driveways in dirt road areas are hard to distinguish from actual public roads.
For the purposes of linking this project back to the work done by the Doctors Without Borders volunteers, both tree occlusion and dirt roads would pose significant challenges in an environment like the Democratic Republic of Congo. However, with further refinement, we believe these could be overcome.
Potential Use Cases and Next Steps
Although there is still a lot of work to be done, this early example serves to show both feasibility and initial accuracy of automated road detection. Many organizations have a need for better road access information, and most rely on eyewitness accounts and volunteers to parse aerial imagery and annotate maps. Automating a portion of this process could be extremely useful to aid operations and first responders, particularly in post-catastrophe scenarios. For example, below is a map from the World Food Program, detailing road access in Nepal, where many roads were damaged during the 2015 earthquake:
By mapping the unmapped roads of the world, aid organizations could streamline operations — knowing exactly where roads exist, their current condition and passability, as well as their composition. All of this would allow for more efficient monitoring of conflict or the deployment of resources such as food, water, and medicine to remote areas, and even provide insight into more detailed information like speed of travel (a dirt road is much slower than an asphalt one, for example).
From this point, in order to make an automated solution fully operational, it would be necessary to develop models that scale across a variety of regions and conditions. A substantial amount of ground truth would need to be developed to cover corner cases — which would, of course, necessitate upfront manual labor. But, once that is complete, the process of road detection could feasibly be automated and completed in hours instead of days or weeks.
Special thanks to Ingo Kossyk and Ayush Jain for critical contributions to this project, including model building and road network analysis, respectively.