FEED Autumn 2022 Web



What’s your origin story? My name is Jesse Shemen and I am the CEO of Papercup. We founded Papercup in 2017 to solve a decades-old problem: the billions of hours of video content trapped in a single language. It was clear to me that the global dubbing industry was constrained because it couldn’t support the volume of requests. That issue has been compounding year after year, resulting in enormous pent-up demand. The idea that every citizen is suddenly being given a platform to make their voice heard on a world stage isn’t matching reality. Speakers of many languages are locked out of this mass exchange. With traditional dubbing costing up to $10,000 per hour, per language – sometimes even more for select languages and voice artists – the process prices everyone but the wealthy studios out of the future of digital content.

I met my co-founder Jiameng Gao through Entrepreneur First, a company builder that helps people looking to found start-ups, and we bonded over this shared concern. Jiameng had studied machine learning and speech processing at Cambridge, and wrote a thesis focused on neural machine translation and speaker-adaptive speech processing. He put forward the idea of building a technology to solve the ‘dubbing problem.’

The first challenge we solved was creating synthetic voices better than anything else on the market. Then, by marrying this with auto transcription, machine translation and the proprietary human-in-the-loop system, we made a platform to automate large parts of the dubbing procedure. We assembled the best possible team, bringing in renowned speech processing and synthesis experts, Simon King and Mark Gales, as advisors – and outlined a plan for running our model to scale. It’s been amazing to watch the company go from strength to strength.

What is the company working on right now? Content translated by Papercup was watched by 300 million people last year – and we’re building on that momentum. Lots of energy goes into enhancing our ability to control the expressiveness of our voices, to meet the demands posed by quirky comedic content and emotional nuances of quality dramas. While we’ve already translated more than 1500 hours of factual content for Sky News and several seasons of Bob Ross’ The Joy of Painting, our ambition is to grow and tackle emotionally varied material. Expanding into new languages is a priority, and we’re improving our German system. Employing the human-in-the-loop method, in which translations are bolstered by experts and native speakers, our content delivers realism currently unmatched in the market.

What is the next step? From a commercial perspective, we’re aiming to start operating more prolifically in markets where we’re confident the technology will excel. Many people don’t realise the untapped value in certain content that has never been considered for localisation. Papercup will lead the way in unlocking it. On the research front, one of the most exciting plans is to nail down our real-time translation services. Once these are ready for market, we’ll be able to translate live news, sporting events and even presidential speeches as they happen, allowing for even more shared experiences.

What one thing does the business need most? Data, data and even more data. Top-quality output can only be guaranteed by a vast reserve of reliable input. We use a mix of third-party training data, as well as commissioning our own collection processes. But the reality of our tremendous ambition is that we require large quantities of varied voice and annotation data. Given the vocal complexity of even the most monotonous human speaker, our machine-learning engine needs the best fuel possible if it is to reproduce the most iconic voices. We never said this was going to be easy; only that it would be generation-defining.

