Our daily lives are more linked into a globalized grid than ever before. Products are sourced and shipped from afar; traveling to a place 3,000 miles away can be easier than getting across a big city in traffic; and information disseminates to anyone and everyone at the tap of a finger.
A startup called Sanas has developed AI voice technology that aims to make one critical component of that grid work more smoothly — helping people who speak the same language, but with different accents, understand each other better by converting accented voices into other accents in real time. Today the startup is announcing $32 million in funding on the heels of strong momentum for its tools as it comes out of stealth and launches more widely.
Insight Partners is leading the investment, with participation from new backers GV (formerly Google Ventures), strategic backer Assurant Ventures and angel investor Gokul Rajaram. Previous backers Human Capital, General Catalyst, Quiet Capital and DN Capital are also participating in this Series A round. Along with the investment, Sanas is announcing a strategic partnership with Alorica, one of the largest business process outsourcing companies (BPOs) in the world, which is rolling out the tech to 100,000 employees and 250 enterprise customers globally.
The company is not disclosing its valuation, but we understand it to be $150 million post-money. This Series A is one of the biggest for a voice AI startup, and from what we understand, it comes after Sanas turned down an acquisition offer from Google. (If you can’t buy ’em, invest in ’em!)
As you might have surmised from its list of investors, Sanas’ tech is already being deployed in call centers. Specifically, it’s found a lot of traction with far-flung customer service providers, which have become a hotbed of abuse against agents who might speak the same language as a customer, but with a heavy accent.
In addition to insurance giant Assurant and BPO leviathan Alorica, other customers include the large collection agency ERC and travel industry BPO IGT. In a sad comment on the state of our world, Sanas’ CEO and co-founder Maxim Serebryakov said that deploying the tech in these places has dramatically reduced the harassment agents face.
Sanas’ plan is to use the funding both to continue expanding its business in that vertical and to start shaping up for other use cases in the enterprise — for example, as a plug-in for video calls, or for voice-based interactive services, to help machines (and ML-based systems) understand a wider range of accents.
Serebryakov initially co-founded the company with Shawn Zhang and Andrés Pérez Soderi, two fellow students at Stanford’s artificial intelligence lab, after a fourth friend of theirs needed to leave school and return to his native country, Nicaragua, to take a job to help with a family emergency.
The friend took a job at a call center back home serving customers in the US, and even though he was completely fluent — and a student taking a break from Stanford no less — he faced endless abuse over the phone from people who didn’t like his accent.
The three others understood that judgment, response and abuse all too well, being first-generation immigrants themselves (and I’ll add that I know this very well first-hand, too, both in my current life and growing up as a first-generation immigrant in the US). And so they decided to put their AI learnings to the test to see if they could fix it. (Earlier this year, Sanas also picked up a fourth co-founder, Sharath Keshava, now also its COO, who left another company he co-founded, Observe.ai, after learning about Sanas and wanting to be involved in building it.)
There are tons of tools out there today to “autotune” and modify a person’s voice, in real time or after the fact — they are about as common as photo filters at this point. But as Serebryakov notes, it’s especially tricky to preserve a person’s natural, actual voice while changing how it says what it’s saying.
Interestingly, Sanas has approached the problem at such a level of abstraction — ingesting thousands of hours of differently accented speech into a system and training it to match those sounds with others, with the whole mix of technology and method now going through the patent process — that its accent “translation” engine can be used with any language at all, not just English as you might have assumed. (Serebryakov tells me it’s already being used to “smooth out” accents across Japan, China and South Korea, for example.)
“Technology like this is applicable globally, from one accent to another,” he said. “It will take time, but our goal is to let people communicate in any accent at all.”
There is a certain unease around the very concept of what Sanas is tackling here. It raises a lot of questions of potential abuse, and apart from that, some might find it distasteful and retrograde for technology to be developed specifically to obscure a person’s true identity: shouldn’t the people who judge others by their accents be the ones to learn to be more open-minded and accepting, rather than everyone else forever accommodating prejudice by hiding whatever marks them out as outsiders or different?
There are counterpoints, too, though. Sanas is specifically not building any consumer applications or making its tech accessible to consumers at the moment, precisely because of how it might be misused. Even its customers are not using a cloud-based version of the tech: to keep things extra-secure, it sits on premises, so customers control the data that passes through and is generated by Sanas.
As for obscuring true identity, that is certainly a bigger issue that we all need to tackle on a daily basis. In the meantime, this gives those at the sharper end of those jabs a way of coping better, and in some very practical ways it makes it easier for people (even those who are well-intentioned) simply to understand each other without accents getting in the way.
I had a demo of the service during my interview, in which Sanas rang up one of its customers’ agents in India and got him to chat with me, first in his own accent and then with his Midwestern-neutral tone “turned on.” It was a little creepy knowing what was happening in the background, but on the surface, I was surprised at how natural it all seemed — well, natural enough, at least. His voice was clear, but maybe a little too clear: almost a tiny bit robotic and emotion-free.
Apparently, that, too, is somewhat intentional for now, and might evolve if that’s what customers and other users want.
“The reason we are focused on call centers is that it’s a low-hanging fruit,” Serebryakov said, noting that building what is effectively groundbreaking tech was hard enough, but the use case happens to fit. “For us when building this, it was important to go down the path of least resistance. No singing, no laughter, no hyper emotive speech. What we are tackling is we’re trying to give control over how these users interact at work.” There’s no crying in baseball, and there’s no fun and games in the call center, either.
“Insight Partners is thrilled to deepen its relationship with Sanas on such cutting-edge and transformative technology,” said Ganesh Bell, an MD at Insight, in a statement. “As the company comes out of its stealth phase, I look forward to working with this immensely talented and passionate team to build a product that will, among many things, help eliminate the unfortunate biases and discrimination experienced by those who speak English as a second language, which includes many of Sanas’ employees.”