Synthetic data is the future of AI. At least, that’s what USC senior Michael Naber (’21) and his co-founder Jacob Hauck say. Last week, the St. Louis natives launched Simerse, a new startup focused on creating synthetic datasets to train AI and computer vision algorithms.
“Everyone hears about the magic of AI, but what they don’t realize is that training data is actually really tough to gather and label. We figured out that it’s actually easier to make fake data than to collect real data,” said Naber. Hauck laid out the process: “We take what is basically a 3D video game environment and use that to create training data. If you think about it, super realistic 3D rendering is already used in video games and film, but now we’re using it to help train AI.”
Their first use case? Agriculture. “What we realized early on is that agriculture has so much variation, and real image datasets just don’t transfer well across regions. For example, orchards in Florida are different from vineyards in California. So instead of having to collect separate datasets for each farm by hand, we generalize a synthetic farm and then tweak it to the region and crop,” said Naber. “That way we can generate massive training sets at exponentially lower costs than manual gathering.”
Hauck also praised the potential for computer vision in agriculture: “Computer vision has all kinds of applications in agriculture. You can do yield mapping, weed detection, crop classification, etc. These insights can lead to real outcomes, such as adjusting pesticide usage based on early indicators of yield. There’s a lot of interest from farmers, and the agriculture industry is just getting started with this stuff.”
Simerse points out that agriculture is a huge market: “Agriculture is a $50 billion industry in California alone. There are many customers, no overwhelmingly dominant industry player, and a lot of opportunities to add value.” The sector’s venture capital space is burgeoning, too. According to AgFunder, nearly $20 billion was invested in AgTech startups in 2018. While the company wants to start with agriculture, Hauck sees many opportunities in adjacent markets as well. He believes synthetic data is applicable across industries, and the company plans to refine its technology in agriculture before expanding.
It’s no secret that AI is increasingly hyped. In 2019, AI companies raised over $18 billion, according to the National Venture Capital Association. That figure exceeds the previous year’s total by nearly $2 billion. The co-founders of Simerse believe this trend will continue: “There’s no stopping AI. We’re going to need more and more training data, and that’s exactly the market gap we hope to fill.”