Access to electricity is one of the most important prerequisites for economic and societal development. It is associated with decreased maternal mortality, increased education levels, and decreased poverty. [P. Alstone et al., Decentralized Energy Systems for Clean Electricity Access] However, 1 billion people still lack access to electricity. To enable electrification and grid expansion in energy-poor regions, it is crucial to know where the existing infrastructure is. This information can help policymakers and businesses decide whether to expand the national grid, build a microgrid, or provide direct off-grid solar PV. Current approaches to identifying and mapping energy infrastructure tend to be expensive and time-intensive, consisting of aggregating survey data from the ground level and roughly scaling it across regions. This is where we want to help.
For three years, the Duke Energy Data Analytics Lab has worked on developing deep learning models that identify energy infrastructure, with an end goal of generating maps of power grid networks that can aid policymakers in implementing effective electrification strategies. Researchers have already created an object identification model that can recognize different types of energy infrastructure, as shown here: [2018-19 Bass Connections Team]
In high-resolution satellite imagery, infrastructure such as buildings, cars, and towers is clearly visible; low-resolution imagery would not support the same prediction accuracy.
We use the model to detect objects in the training dataset: energy infrastructure such as transmission lines and towers, along with cars and buildings. The identified objects are bounded by blue boxes.
Finally, the model is tested on new images, where each identified object (bounded by red boxes) is assigned a probability score of belonging to a certain class.
Image segmentation is a deep learning task in which a model segments images and identifies target objects at scale by assigning each pixel a class probability. Each satellite image can then be simplified and partitioned into segments based on object features such as color, texture, and gradient, offering insight into the model's generalizability across different geographic domains. We chose Models for Remote Sensing (MRS) [B. Huang et al., Large-Scale Semantic Classification: Outcome of the First Year of Inria Aerial Image Labeling Benchmark], an encoder-decoder model, to perform the segmentation.
The output of the model is a set of predicted bounding boxes around the objects of interest, in our case, buildings. In order to apply Intersection over Union to evaluate an (arbitrary) object detector, we need:
Precision measures how accurate our predictions are, i.e., the percentage of our predictions that are correct.
Recall measures how well we find all of the positives, i.e., the percentage of ground-truth objects that we detect.
Area of union is the area encompassed by both the predicted bounding box and the ground-truth bounding box.
Area of overlap is the common area between the predicted bounding box and the ground-truth bounding box.
Dividing the area of overlap by the area of union yields our final score — the Intersection over Union.
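The definitions above can be put into a few lines of Python. This is a minimal sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples:

```python
# Intersection over Union (IoU) for two axis-aligned bounding boxes,
# each given as (x_min, y_min, x_max, y_max).
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Area of overlap (zero if the boxes do not intersect).
    overlap = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Area of union = sum of both areas minus the double-counted overlap.
    union = area_a + area_b - overlap
    return overlap / union

def precision(tp, fp):
    # Fraction of our predictions that are correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of ground-truth positives that we found.
    return tp / (tp + fn)

# A prediction shifted 2 px from a 10x10 ground-truth box:
print(iou((0, 0, 10, 10), (2, 2, 12, 12)))  # ≈ 0.47
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common default).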
A detailed blogpost explaining the accuracy metrics with an example can be found here.
The task was to identify transmission lines across the four cities shown.
In these plots, each colored precision-recall curve corresponds to a single city, and the black curve corresponds to a model trained on all cities. The higher the area under the PR curve, the better the performance of the model.
As the plots show, the USA model performed best in each test case, while the models trained on single cities performed equivalently to the USA model only in their own city and performed poorly in the other cities.
Source: Mapping electric transmission line infrastructure from aerial imagery with deep learning, Hu, Alexander, Cathcart, Hu, Nair, Zuo, Malof, Collins, Bradbury (2020)
As shown in the table, the most accurate model for a given city is the model that is trained on that same city, and models trained on a single city don't necessarily generalize well to the other cities. Furthermore, we found that training a model on all of the cities results in higher test accuracies across the board. To address this problem of transferring learning across geographic domains, we want to: visualize the model's representation of the current training domain (Goal 2), and provide ways to diversify this training domain (Goal 3).
As seen in the two object detection tasks, model accuracy increases when the model is trained on multiple cities. The geography and building types of each city differ enormously, and accuracy improves because the model is able to learn the features of all the different cities.
The buildings in Austin and Kitsap are widely spaced, whereas those in Chicago and Vienna are tightly packed. Kitsap has more green cover than the other cities, and its vegetation is a deeper green. The rooftops in Vienna have a peculiar reddish tone. Because of these differences in physical features, the training dataset must be representative of all of them for model accuracy to improve significantly.
Synthetic imagery can thus help us diversify our training data by adding more examples of different, representative satellite images. Diversified training data, such as the "All Cities" example in the table above, can help us overcome these geographic differences and build robust models that are agnostic to geography. Another great thing about synthetic data is that it removes the need for hand-annotation and manual labeling.
Example of Synthetic Images from the Synthinel-1 Dataset
However, synthetic imagery is visibly different from real satellite imagery in its textures and styles. We can see from the three synthetic satellite images above that they do not resemble real-life buildings closely enough; there is something "off" and different about them.
To make them more realistic, we plan on transferring textures from real satellite imagery to synthetic imagery:
We switched to buildings because the INRIA Aerial Image Labeling dataset (link) has high-resolution satellite imagery of cities with labeled buildings. If we can build models that identify buildings, they can be extended to identify other types of infrastructure as well.
Source: INRIA Dataset
In order to automate the extraction, synthesis, and substitution of building rooftop textures, we created a pipeline that takes any of the 2,500 images of cities in the INRIA dataset, captures the building rooftop texture in each city, transfers the visual content and style onto a texture patch while maintaining image resolution, and finally substitutes these rooftop textures onto buildings in simulated environments to create synthetic cities that are geography-agnostic.
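This write-up does not spell out how a rectangular roof patch is extracted from a building footprint; one standard approach (sketched here purely as an illustration, not as our exact implementation) is to find the largest all-roof rectangle inside the building's binary mask, using the classic row-by-row histogram method:

```python
# Find the largest all-ones axis-aligned rectangle in a binary mask
# (e.g. a labeled building footprint). Works row by row: maintain a
# histogram of consecutive roof pixels above each column, then solve
# "largest rectangle in a histogram" with a monotonic stack.
def largest_rectangle(mask):
    """mask: list of rows of 0/1. Returns (area, top, left, height, width)."""
    if not mask:
        return (0, 0, 0, 0, 0)
    n_cols = len(mask[0])
    heights = [0] * n_cols
    best = (0, 0, 0, 0, 0)
    for row, line in enumerate(mask):
        # Histogram of consecutive 1s ending at this row.
        for c in range(n_cols):
            heights[c] = heights[c] + 1 if line[c] else 0
        # Largest rectangle in this histogram (sentinel column flushes stack).
        stack = []  # column indices with increasing heights
        for c in range(n_cols + 1):
            h = heights[c] if c < n_cols else 0
            while stack and heights[stack[-1]] >= h:
                top_h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if top_h * width > best[0]:
                    best = (top_h * width, row - top_h + 1, left, top_h, width)
            stack.append(c)
    return best

mask = [[1, 1, 0],
        [1, 1, 1],
        [1, 1, 1]]
area, top, left, height, width = largest_rectangle(mask)
print(area, (top, left), (height, width))  # 6 (0, 0) (3, 2)
```

The returned rectangle can then be cropped out of the source image as the roof patch fed to the texture-synthesis step.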
Once we have obtained the largest rectangular roof patch, we want to be able to transfer the visual content and style of this roof texture to buildings in any other image. Using a picture of red peppers shown above as an example, the model should be able to take a snapshot of the red peppers’ color, shapes, background, and other characteristics and reconstitute them in a new image, shown on the right.
In our situation, we accomplished this by applying a feed-forward convolutional neural network model that recreates the color, gradient, rooftop structure, and more in a new texture patch, as seen below with the example of a roof of a building in Austin.
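The exact architecture of our network is not detailed here, but feed-forward texture-synthesis models are generally trained against a Gram-matrix loss: channel-by-channel correlations of feature maps that capture texture (color, gradient statistics) while discarding spatial layout. A small numpy sketch of that statistic, with hypothetical helper names, and using raw pixel channels where a real model would use features from a pretrained CNN:

```python
import numpy as np

# The Gram matrix of a feature map measures which channels co-activate,
# which characterizes texture independently of where it appears.
def gram_matrix(features):
    """features: array of shape (channels, height, width)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # each row = one flattened channel
    return flat @ flat.T / (h * w)      # (c, c) correlation matrix

# A texture loss compares the Gram matrices of a synthesized patch and a
# reference roof patch.
def texture_loss(gen_features, ref_features):
    g1, g2 = gram_matrix(gen_features), gram_matrix(ref_features)
    return float(np.mean((g1 - g2) ** 2))

rng = np.random.default_rng(0)
roof = rng.random((8, 32, 32))
# Shuffling pixel locations (the same way in every channel) leaves the
# Gram matrix unchanged: the statistic ignores spatial layout.
perm = rng.permutation(32 * 32)
shuffled = roof.reshape(8, -1)[:, perm].reshape(8, 32, 32)
print(texture_loss(roof, roof))       # 0.0
print(texture_loss(roof, shuffled))   # ~0: same texture statistics
```

Minimizing such a loss while also matching the content of the target image is what lets the new patch keep the roof's color and gradient but fit a different building.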
We then apply this model on unique rooftops for all 2500 images of 5 cities in the INRIA dataset, and create a bank of rooftop textures arranged by the city from which they were extracted. This way, we have a bank of textures to draw from when we create synthetic imagery, as seen in the example below with the city of Vienna, Austria.
Using building rooftops as a proof of concept in our goal to include more diverse landscapes and infrastructure types in training data, we envision creating whole scenes, simulated environments with buildings and landscapes such as rolling hills and rivers, from our rooftop texture bank. With successful applications of our model to forests, rivers, carparks, and farm fields, we are one step closer to being truly geography-agnostic and to creating a range of scenes that simulate actual landscapes in our target regions.
Once we are equipped with the texture bank, we move onto the next step: feeding textures from our synthetic texture bank to the texture substitution algorithm.
The last part of our pipeline is texture substitution. Now that we have extracted rooftops from source images and generated synthetic textures, we want to apply these synthesized textures to new target geographies. Figure 8 shows our texture substitution process.
We take a source image, choose a target image, and match rooftop textures based on the size of the target rooftops. This helps make our synthetic cities more realistic, which in turn better diversifies our training data.
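As an illustration of this step, here is a minimal numpy sketch (the helpers `pick_patch` and `substitute_texture` are hypothetical names, not our production code): pick the bank texture whose area best matches the target rooftop, tile it, and paste it only where the building mask is set:

```python
import numpy as np

def pick_patch(bank, mask):
    # Choose the bank texture whose pixel area best matches the rooftop size.
    target = mask.sum()
    return min(bank, key=lambda p: abs(p.shape[0] * p.shape[1] - target))

def substitute_texture(image, mask, patch):
    """image: (H, W, 3) array; mask: (H, W) bool rooftop mask;
    patch: (h, w, 3) texture patch. Returns a new image."""
    out = image.copy()
    H, W = mask.shape
    ph, pw = patch.shape[:2]
    # Tile the patch to cover the full image, then copy only masked pixels.
    reps_y = -(-H // ph)   # ceiling division
    reps_x = -(-W // pw)
    tiled = np.tile(patch, (reps_y, reps_x, 1))[:H, :W]
    out[mask] = tiled[mask]
    return out

# Toy example: a 2x2 rooftop in the middle of a 4x4 image.
img = np.zeros((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
bank = [np.full((2, 2, 3), 0.5), np.full((5, 5, 3), 0.9)]
result = substitute_texture(img, mask, pick_patch(bank, mask))
```

In practice the patch would also be rotated and blended to match the building's orientation and lighting; this sketch only shows the size-matched paste.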
As shown in the figure on the left, the rooftops are flat, solid colors. However, they can provide the basis of realistic, textured synthetic cities. This is the process laid out in the figure on the right.
At this point, we can supplement our real satellite imagery with textured synthetic imagery, where the rooftops have been replaced. This will allow us to convincingly represent new regions and create more accurate building maps. As this research progresses, the team will expand this pipeline to other types of infrastructure besides buildings, eventually allowing us to map out entire energy grids around the world.
Our team's focus has been to create a geography-agnostic model that identifies energy infrastructure in satellite images. Existing models' accuracy decreased significantly due to the lack of diversity in their satellite-image training data. We need synthetic images to make training datasets more diverse and more representative of a broader range of geographies.
As a proof of concept, we have worked on generating synthetic rooftops in satellite images. We extracted building rooftops from images in the INRIA dataset, synthesized textures characteristic of these rooftops, and substituted them atop buildings in other images, which gave us satellite images with synthetic rooftops of our desired texture.
Our final output consists of a bank of textures of building rooftops from different cities, along with several examples of applying these textures onto simulated urban environments generated by CityEngine (a 3-D modelling software) to generate synthetic images of cities that capture unique features from different geographies.
Class of 2020
Class of 2021
Computer Science & Statistical Science
Class of 2021
Class of 2022
Computer Science & Mathematics
Class of 2022
Public Policy & Physics
Class of 2020
Masters in Interdisciplinary Data Science
Class of 2021
Dr. Kyle Bradbury
Dr. Leslie Collins
Dr. Jordan Malof