
[Ed's TALK] The genesis story behind Graviti and future development of open dataset platform

Published at 2022-01-24

This blog was originally published elsewhere; see the original post for the full article.

Edward, founder and CEO of Graviti, recently gave an interview to Unite.AI. He talked about the genesis story behind Graviti and shared his ideas on how Graviti, as a data platform, manages and structures data at scale, and why businesses and developers will need it for their data management.

Topics Discussed

Could you share the genesis story behind Graviti?

Working at Uber was very complicated at the beginning because people didn’t use big machine learning models, and we lacked the compute power and the data management platform to train models. The data we collected for self-driving cars was all unstructured: images, videos, LIDAR points, all the kinds of data that come from real-world sensors, and we collected tons of it every day. One statistic we ran showed that the amount of data the self-driving car division collected in a week was equal to the data collected for the entire global restaurant business in a whole year. Unstructured data piled up every single day, and that created big problems: how to store that data, how to manage it, and how to use it to actually generate value for different organizations.

After three years working at Uber, I saw the opportunity to improve how large-scale unstructured data could be managed. So I founded Graviti in 2019 to accelerate innovation in AI by building an unstructured data management platform.

Can you discuss how Graviti is a platform to manage and structure data at scale?

Graviti aims to launch the first data platform that enables organizations to work with large volumes of unstructured data to power innovative AI applications. The platform takes the hassle out of managing large amounts of unstructured data and helps developers do so together with their teams.

The vast majority of the information available in AI development is low-quality and unstructured, so development teams usually spend over 50% of their time not on building models but on identifying, augmenting, or cleansing unstructured data, and that’s just the beginning of their work. Graviti offers a more specialized approach to data management that frees developers from this work and gives them more time to analyze unstructured data and train artificial intelligence models.

We help developers in three dimensions: data discovery, data iteration, and workflow automation.

Why is unstructured data the future of AI?

Over 80% of enterprise data is now unstructured, including images, recordings, videos, social media posts, etc. AI is the key to delivering value from unstructured data. Enterprises are starting to leverage unstructured data to support in-depth research and further analysis.

What are some of the benefits of open-source datasets?

They benefit researchers because scientists have more free resources to use to train models and complete research.

They benefit enterprises, which use the datasets to start building AI abilities and power up the transition from traditional enterprises to AI enterprises.

How does Graviti authenticate the quality of the datasets?

Even popular datasets such as COCO and KITTI are not perfect for developers. Bugs always occur when developers train models, and no one has yet found an excellent way to improve dataset quality. Graviti believes that a dataset evaluation model, or some other technical breakthrough, will eventually help the community solve this problem, and achieving that is part of Graviti’s mission for the future.

What is your vision for how developers will access data in the future?

For a small amount of data, developers should be able to access that data easily. For larger amounts of data, such as the more diverse datasets needed for training models, federated learning technology would help developers work collaboratively by decoupling the ability to do machine learning from the need to store the data on a central server.
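
To make the federated learning idea concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration only, not Graviti's implementation: the function names (`local_update`, `federated_average`, `train`), the toy linear model, and the two hypothetical clients are all assumptions made for this example. Each client trains on data that never leaves it, and the server only ever sees model weights.

```python
from typing import List, Tuple

Point = Tuple[float, float]      # one (x, y) training example
Weights = Tuple[float, float]    # (slope, intercept) of y = w * x + b


def local_update(weights: Weights, data: List[Point],
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error on this client's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine client weights, weighting each by its number of examples."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(clients: List[List[Point]], rounds: int = 200) -> Weights:
    """Server loop: only weights travel; raw data stays on each client."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


# Hypothetical clients, each holding noise-free samples of y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (-1.0, -1.0)],
]
w, b = train(clients)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

In this toy setup the shared model recovers the underlying line even though no client ever uploads its examples. A real deployment would also have to handle communication, privacy protections, and clients whose data distributions differ, which this sketch leaves out.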