Actually Useful Synthetic Data
What is Mirage?
Mirage is a platform that helps computer vision engineers put synthetic data to effective use in their models.
Our vision is to drastically shorten the iteration cycle for machine learning models and accelerate humanity’s journey to an autonomous future.
Platform
Our platform helps engineers identify and fix the weaknesses in their datasets. First, we train a model to identify data that is out of distribution. Then, we use the identified samples to create representative synthetic data with NVIDIA Omniverse Replicator and generative techniques (we plan to work with diffusion models).
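To make the out-of-distribution step concrete, here is a minimal sketch. It is not Mirage's model: it assumes each image has already been reduced to a feature vector, and it swaps in a simple standardized distance from the training centroid as the score. The function names and the threshold of 3.0 are hypothetical, chosen only for illustration.

```python
import math

def fit_centroid(features):
    """Return per-dimension mean and standard deviation of in-distribution features."""
    n = len(features)
    dims = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dims)]
    # Fall back to 1.0 when a dimension has zero spread, to avoid dividing by zero.
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in features) / n) or 1.0
        for d in range(dims)
    ]
    return mean, std

def ood_score(feature, mean, std):
    """Standardized Euclidean distance from the centroid; larger means more unusual."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(feature, mean, std)))

def select_ood(candidates, mean, std, threshold=3.0):
    """Return (index, score) pairs whose score exceeds the threshold, worst first."""
    scored = [(i, ood_score(f, mean, std)) for i, f in enumerate(candidates)]
    flagged = [(i, s) for i, s in scored if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

The flagged indices are the samples a synthetic data generator would then try to imitate. A learned detector would replace `ood_score`, but the selection step would look much the same.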
With active learning, we integrate the synthetic data back into the dataset to improve both dataset quality and model performance. The platform is meant to be iterative: capture the worst edge cases, create synthetic data that targets the model's weaknesses, integrate the best synthetic data into the dataset, and repeat.
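The loop described above can be sketched in a few lines. This is only an outline under stated assumptions: `find_weak`, `generate`, and `usefulness` are hypothetical callables standing in for the edge-case detector, the synthetic data generator, and whatever selection score the active learning step uses. None of them is taken from Mirage's code.

```python
def improvement_round(dataset, find_weak, generate, usefulness, keep=2):
    """Run one round: find weak samples, synthesize candidates, keep the best ones."""
    weak = find_weak(dataset)
    # Every weak sample may yield several synthetic candidates.
    candidates = [sample for w in weak for sample in generate(w)]
    best = sorted(candidates, key=usefulness, reverse=True)[:keep]
    return dataset + best, best

def run_rounds(dataset, find_weak, generate, usefulness, rounds=3, keep=2):
    """Repeat improvement rounds, stopping early once nothing new is added."""
    for _ in range(rounds):
        dataset, added = improvement_round(dataset, find_weak, generate, usefulness, keep)
        if not added:
            break
    return dataset
```

In a real pipeline the model would be retrained between rounds, so `find_weak` would surface a different set of edge cases each time.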
Demo of Current Platform
Architecture Diagram and Replicator
Here’s a simplified architecture diagram illustrating how we run Omniverse Replicator on AWS. When the user requests data generation from the frontend, our backend constructs the execution command from the user's configuration. It then connects to Amazon Elastic Container Service (ECS), which deploys the Replicator Docker image and generates the synthetic data. The results are stored in S3 for downstream analysis and retrieval.
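As a rough illustration of the backend step, the sketch below turns a user configuration into the keyword arguments that boto3's ECS `run_task` call accepts. Everything specific here is an assumption: the `run_generation.py` entry point, its flags, the configuration keys, and the S3 layout are hypothetical and are not Replicator's or Mirage's actual interface. The EC2 launch type is also a guess, made because GPU workloads on ECS need EC2-backed capacity.

```python
def build_run_task_request(config, cluster, task_definition, container_name, bucket):
    """Build keyword arguments for an ECS run_task call from a user configuration."""
    # Hypothetical wrapper script and flags; the real command depends on the image.
    command = [
        "python", "run_generation.py",
        "--scene", config["scene"],
        "--num-frames", str(config["num_frames"]),
        "--output", f"s3://{bucket}/{config['job_id']}/",
    ]
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "EC2",
        "count": 1,
        # Override the container's default command with the per-request one.
        "overrides": {
            "containerOverrides": [{"name": container_name, "command": command}]
        },
    }
```

With AWS credentials configured, the request could be submitted with `boto3.client("ecs").run_task(**request)`. Keeping the request builder separate from the network call makes it easy to test without touching AWS.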
Use Cases
Mirage can work with any unstructured data and helps reduce the data bottleneck faced by industries that rely on sensor-based data.
We help these companies quickly adapt to new environments and changing sensor sets!
If you are working with any unstructured data, please reach out! We are happy to chat! My email is aman@mirageml.com

