Synthetic data for better, cost-efficient satellite data-based AI models

Exploring the usability of synthetic satellite data across several use cases to optimize the cost, time, and performance of AI model development.

This article presents the results of the first stage of the collaboration between the Innovation Center of NTT DATA and Bifrost AI. The collaboration tackles the challenges that arise when developing AI with satellite data, which is both a growing need among our clients and a promising technology crossover that will open a slew of innovation opportunities in the near future.

The partnership aimed, through a proof of concept (PoC), to leverage Bifrost AI's synthetic data generation technology to address the challenges of data accessibility and high development costs, and thereby to enhance the development of AI models for various use cases. The goal of the PoC was not only to assess potential improvements in model performance and robustness, but also to determine whether it is possible to reduce the reliance on real images, which are often limited and costly, while still obtaining high-performing models. By doing so, the teams aimed to demonstrate increased efficiency and scalability for future satellite AI projects.

The PoC showed significant improvements in AI performance with the use of synthetic data. For the object detection use case, the mean average precision (mAP) increased by almost 10 percentage points. For the change detection use case, the overall F1-score increased by almost 3 percentage points. These increases were obtained using the same satellite images as the original dataset, with some of the images synthetically enhanced, which indicates that performance gains do not have to depend on acquiring more real data. Additionally, the PoC highlighted the ease of adoption of Bifrost AI's technology, its high customization capabilities, and the benefit of requiring no additional annotation effort.

Index

Background & Context
About Bifrost AI
Collaboration Scope & Objectives
PoC Approach
Results & Key Findings
Technical Insights & Lessons Learned
Strategic & Future Outlook

Background & Context

Recent advancements in satellite technology, particularly in higher temporal and spatial resolution, have made it possible to monitor the Earth with unprecedented detail and frequency. This has opened the door to entirely new use cases employing AI - from near-real-time infrastructure monitoring to precision agriculture and dynamic risk assessment. These advancements are allowing NTT DATA to address complex challenges and create significant new value for clients across industries. Furthermore, satellite technology continues to evolve rapidly, and we expect a significant expansion of opportunities in the long term.

Despite the many success stories and a consistent pipeline of opportunities involving satellite data, we have also identified significant challenges in acquiring and utilizing the large volumes of satellite data that are generally essential for robust, high-performing AI. These challenges are driven by the high costs of data acquisition, the difficulty of obtaining specific data because target objects or target situations appear infrequently, and the high costs associated with data annotation.

To remain competitive and meet growing demands, it became essential to identify a reliable way to streamline satellite data-based AI development and reduce its costs. The Satellite Team at the Innovation Center identified synthetic data as the best solution to these challenges and Bifrost AI as a potential partner due to their synthetic data generation technology.

This collaboration aimed not only to test the impact on AI development of Bifrost AI's synthetic data technology, but also to explore its integration into development workflows and business processes.

About Bifrost AI

Bifrost AI, headquartered in San Francisco and founded in 2020 by Charles Wong and Aravind Kandiah, is transforming how physical AI is built by making high-quality data accessible at scale through advanced synthetic data generation.

Its foundation is a physically-accurate 3D simulation engine, designed to be fully programmatically accessible. This enables users to generate realistic, tailored datasets that replicate rare, complex, or high-risk scenarios - the kinds of conditions that are often prohibitively expensive or impossible to capture in the real world at the volume AI systems require.

This programmatic approach offers a level of precision, automation, and customization that outpaces traditional 3D simulators and the current wave of 2D generative AI tools. For organizations like ours, it creates new opportunities to accelerate AI development, unlock the full potential of satellite data, and move faster toward strategic goals.

Examples of synthetic images generated by Bifrost AI's tool

Collaboration Scope & Objectives

The joint collaboration was defined with the following core objectives:

Evaluate the usability of Bifrost AI's technology and determine how easily it could be adopted into our existing AI development workflow;
Assess the extent to which Bifrost AI's simulation platform could be used to customize synthetic data to address specific requirements and scenarios;
Analyze the direct impact of synthetic data generated through Bifrost AI's technology on the performance of developed AI.

The collaboration was designed as a proof of concept over a period of 3 months. The target AI models were an object detection model and a change detection model, where the change detection model targeted a specific type of change rather than general change.

PoC Approach

Technical Setup: The PoC leveraged AI models already developed by NTT DATA and the satellite image datasets that each model had been originally developed on.

For each scenario, we took the following common approach to preparing the new datasets, which include synthetic images, and to training and evaluating the updated AI models:

Prepare the original dataset used to develop each model, keeping the same training, validation and test split
Place target objects (for object detection) or target changes (for change detection) in images from the original dataset to generate new, synthetic images
Train the model again from its untrained state on the new training dataset
Test the new model on the original test dataset (unchanged, not containing synthetic images)
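
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the PoC's actual pipeline: the `Sample` class, the `build_training_set` helper and the image identifiers are hypothetical, and the sketch assumes that synthetically enhanced images replace their originals in the training split rather than being added alongside them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    synthetic: bool = False  # True when synthetic content was placed in the image


def build_training_set(train, enhanced):
    """Swap original training samples for their synthetically enhanced versions.

    `enhanced` maps an original image_id to the id of its enhanced version
    (hypothetical naming; assumes replacement rather than addition).
    """
    unknown = set(enhanced) - {s.image_id for s in train}
    if unknown:
        raise ValueError(f"enhanced images not in training split: {sorted(unknown)}")
    return [
        Sample(enhanced[s.image_id], synthetic=True) if s.image_id in enhanced else s
        for s in train
    ]


# Toy split: the test set is left untouched and contains no synthetic images.
train = [Sample("img_001"), Sample("img_002"), Sample("img_003")]
test = [Sample("img_101"), Sample("img_102")]

new_train = build_training_set(train, {"img_002": "img_002_syn"})

assert len(new_train) == len(train)         # dataset size is unchanged
assert not any(s.synthetic for s in test)   # evaluation stays on real images only
```

Keeping the test split free of synthetic images means any measured gain reflects performance on real imagery.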

Data details: We used 2487 images as the original dataset for the object detection model and 414 images for the change detection model. Both datasets consist of high-resolution optical satellite images.

Evaluation: The evaluation metrics were recall, mean average precision (mAP) and F1-score, all typical metrics for evaluating image-based AI models.
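
For readers less familiar with mAP, the sketch below shows one common way to compute it: average precision (AP) per class as the area under the interpolated precision-recall curve, then the mean over classes. The article does not state which mAP variant or IoU threshold was used in the PoC, so the function names, the toy detections and the interpolation choice here are assumptions for illustration only.

```python
def average_precision(detections, n_ground_truth):
    """AP for one class using all-point interpolation.

    `detections` is a list of (confidence, is_true_positive) pairs;
    `n_ground_truth` is the number of real objects of that class.
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from right to left (interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area of the rectangles under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """Mean of per-class AP; `per_class` maps class name -> (detections, n_gt)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Toy example with two hypothetical classes.
toy = {
    "class_a": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "class_b": ([(0.95, True), (0.6, True)], 2),
}
print(round(mean_average_precision(toy), 3))
```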

Results & Key Findings

The PoC demonstrated the value of using synthetic data to train AI models, showing significant improvements in AI performance obtained with a small number of synthetic images:

for object detection, 86 images were updated to include synthetic objects
for change detection, 70 images were updated to include synthetic change
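
To put these counts in proportion, the short calculation below relates them to the dataset sizes given under Data details. It is a rough indication only: the denominators are the full original datasets, since the article does not give the size of each training split, and the helper name is hypothetical.

```python
def updated_share(updated, total):
    """Fraction of a dataset's images that were synthetically updated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return updated / total


# (use case, updated images, images in the full original dataset)
for name, updated, total in [
    ("object detection", 86, 2487),
    ("change detection", 70, 414),
]:
    print(f"{name}: {updated_share(updated, total):.1%} of images updated")
```

Under these assumptions, roughly 3.5% of the object detection images and 16.9% of the change detection images were updated.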

With these small dataset updates we obtained the following results:

Object detection: the mAP increased by almost 10 percentage points, reaching 0.89, while recall of object detection increased by 4 percentage points, reaching 0.93
Change detection: the overall F1-score increased by almost 3 percentage points, with precision reaching 0.7 and recall reaching 0.57.
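
As a reminder of how precision and recall combine, the F1-score is their harmonic mean. The snippet below applies that formula to the change detection values reported above. It is an illustrative calculation: the article does not say how the overall F1-score was aggregated, so the result may differ from the figure measured in the PoC.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Change detection values reported above: precision 0.7, recall 0.57.
print(f"{f1_score(0.70, 0.57):.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, the lower recall limits the resulting F1-score more than the higher precision raises it.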

Qualitative insights we have obtained through this PoC include:

Ease of use and smooth adoption of Bifrost AI's technology by the data scientist team due to the programmatic aspect of the synthetic data generation platform
High customization of the position and appearance of objects and target changes, as well as ease in matching the appearance of synthetic objects with the original image to create highly realistic, visually homogeneous images
No additional annotation effort, because the Bifrost AI platform outputs annotations alongside the synthetic images; overall, the tool delivered performance gains at no additional data purchase or annotation cost.

Technical Insights & Lessons Learned

For the object detection scenario, we observed very high similarity between the original and synthetic images; the synthetic images were generally indistinguishable from the original ones. In the change detection scenario, on the other hand, we learned that synthetic images can still have a positive impact on model performance despite visual discrepancies between the real and the synthetic images showing change. Our assessment of the results suggests that improving how change is integrated within Bifrost AI's platform could yield a larger performance gain, but the current integration method already brings improvement.

During the project we also attempted to update an object detection model using additional satellite images rather than the images of the original dataset. Due to budget constraints, the new satellite images had a lower resolution than the original dataset. Objects were easy to integrate at the appropriate resolution (the synthetic objects themselves had a lower resolution in the new images), but the results did not show improvement from the synthetic images, because the test dataset contained original images and objects in higher resolution.
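
One practical takeaway is to check resolution consistency between training and test imagery before training. The sketch below flags a mismatch in ground sample distance (GSD, in metres per pixel). The function name, the 10% tolerance and the example GSD values are hypothetical; the article does not report the actual resolutions involved.

```python
def gsd_mismatch(train_gsd_m, test_gsd_m, tolerance=0.1):
    """Return True when the two GSDs differ by more than `tolerance` (relative)."""
    if train_gsd_m <= 0 or test_gsd_m <= 0:
        raise ValueError("GSD values must be positive")
    ratio = max(train_gsd_m, test_gsd_m) / min(train_gsd_m, test_gsd_m)
    return ratio - 1.0 > tolerance


# Hypothetical values: lower-resolution training images, higher-resolution test set.
print(gsd_mismatch(train_gsd_m=1.5, test_gsd_m=0.5))  # mismatch
print(gsd_mismatch(train_gsd_m=0.5, test_gsd_m=0.5))  # consistent
```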

Strategic & Future Outlook

The collaboration between NTT DATA and Bifrost AI highlights how combining deep AI expertise and cutting-edge simulation innovation can produce measurable technical and strategic benefits.

Daria Piwowarczyk

Senior Expert
Innovation Center, Innovation Technology Department, Technology and Innovation General Headquarters, NTT DATA Group

Daria Piwowarczyk is a Senior Expert at the Innovation Center of NTT DATA. She is a Data Scientist with extensive experience in AI/ML across various industries. She is currently leading the Satellite x AI focused team, looking to unlock new avenues for this rapidly evolving technology symbiosis.