Launch HN: Pinch (YC W25) – Video conferencing with immersive translation

72 points, 5 months ago

Hey HN! I'm Christian, and my co-founder Keyu and I are building Pinch (https://startpinch.com), a virtual conferencing platform with translation that mimics your voice and synchronizes your lips in real-time to make you sound and appear as a native speaker in over 15 languages.

Here's a demo: https://youtu.be/Cu7KlbZ3gjw, but you can also try it for free on our website.

Over the last three years, Keyu and I worked at a company where we had to lead engineering and research teams across the U.S., China, India, and Europe. We felt the language barrier actively limiting our team's potential in terms of collaboration and productivity. The existing tools we tried operated in low-bandwidth mediums (mostly text), which means they 1) are slower, because they need to convert audio to text before translation, and 2) lose all information about how something was said.

At that point we knew there had to be a better way to connect across different languages and cultures, so we started building Pinch. Shortly after, we found out how challenging translation truly is. Balancing latency and accuracy for chunk-based audio translation, capturing inflection and tonality per statement, handling culturally specific phrasing, and making a seamless meeting experience are all unsolved problems we're taking on.
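A minimal sketch of the latency/accuracy trade-off in chunk-based audio translation, assuming a simple energy threshold stands in for a real voice activity detector. All names and thresholds here are hypothetical, not Pinch's implementation: the stream is cut at a pause, but a cut is forced after a maximum duration so latency stays bounded.

```python
from typing import Iterable, Iterator, List

# Hypothetical parameters, chosen only for illustration.
FRAME_MS = 20          # duration of one audio frame
SILENCE_RMS = 0.01     # frames below this RMS energy count as silence
PAUSE_MS = 300         # a pause this long closes the current chunk
MAX_CHUNK_MS = 3000    # hard cap so worst-case latency stays bounded

Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def has_speech(chunk: List[Frame]) -> bool:
    return any(rms(f) >= SILENCE_RMS for f in chunk)


def chunk_stream(frames: Iterable[Frame]) -> Iterator[List[Frame]]:
    """Group frames into chunks, cutting at a pause or at MAX_CHUNK_MS."""
    chunk: List[Frame] = []
    silent_ms = 0
    for frame in frames:
        chunk.append(frame)
        # Count consecutive silent milliseconds; any speech resets the count.
        silent_ms = silent_ms + FRAME_MS if rms(frame) < SILENCE_RMS else 0
        if silent_ms >= PAUSE_MS or len(chunk) * FRAME_MS >= MAX_CHUNK_MS:
            if has_speech(chunk):  # skip chunks that are pure silence
                yield chunk
            chunk, silent_ms = [], 0
    if has_speech(chunk):  # flush whatever is left at end of stream
        yield chunk
```

Shorter PAUSE_MS and MAX_CHUNK_MS values let translation start sooner but give the translator less context per chunk, which is exactly the latency/accuracy tension described above.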

So far, we've seen some really interesting use-cases (many we hadn't considered!), from personal connections like a first conversation with foreign in-laws, to more business-oriented usage in sales and meeting foreign clients.

After a long stretch building conversational AI for video and audio, we're incredibly excited to see what these same technologies can unlock for human<>human communication.

You can try a demo or create a meeting for free: https://startpinch.com

All feedback is appreciated, and we'd love to know how we're doing on the overall meeting UX and translation accuracy for your language. Thanks all!

skeeter2020 • 5 months ago

>> The system continuously learns and improves from usage while maintaining privacy and security.

Are you training your own translation models? or using third-party services?

>> Think real-time translation + natural expressions + perfect lip sync.

not yet, based on the demo.

skylerwiernik • 5 months ago

Cool idea, but just watching your demo it looks like it doesn’t work. Is there any change in the video? The lip movements certainly don’t look synchronized, and audio often continues after the person stops talking. It also doesn’t do any audio mimicking. It really doesn’t look like it does anything that Google Translate doesn’t.

christiansafka • 5 months ago

Appreciate the feedback. On the video side, we currently synchronize the video to play out with the translated audio (as often as possible), matching the moment you started speaking to the moment the translated audio starts. As mentioned in another comment, we're still working on audio mimicking (voice cloning, then inflection transfer). Our model does a lot that Google Translate doesn't, even just around translation, such as taking into account who you're talking to in the meeting and the conversation context. Plus, we have to do it much faster, so we process smaller audio chunks at a time!
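One way to read "matching when you started speaking to the moment the translated audio starts" is as a per-utterance video delay. A minimal sketch under that assumption; the VideoFrame type, the timestamps, and the function name are hypothetical, not Pinch's code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoFrame:
    capture_ts: float         # seconds: when the camera captured the frame
    playout_ts: float = 0.0   # seconds: when the receiver should show it


def align_video(frames: List[VideoFrame], speech_start_ts: float,
                translated_audio_start_ts: float) -> float:
    """Delay the video so the frame captured when the speaker started talking
    is shown at the moment the translated audio starts. Returns the delay."""
    # Never schedule video earlier than it was captured.
    delay = max(0.0, translated_audio_start_ts - speech_start_ts)
    for frame in frames:
        frame.playout_ts = frame.capture_ts + delay
    return delay
```

Because the delay is recomputed per utterance, a real player would also need to smooth changes in delay between utterances (for example by briefly speeding up or slowing down the video); that part is left out of this sketch.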

tpae • 5 months ago

I really like the concept, but I don't understand why you guys are building an entire video conferencing platform. That sounds like years of work building the network and millions of VC funds. It could be a standalone app that exports video to existing conferencing services. I would pay good money for that.

christiansafka • 5 months ago

Thanks! We have a virtual camera on our roadmap as well, but by building the conferencing platform end to end we can optimize both latency and conversation UX to a much higher degree. We're also lucky to be building this now and not five years ago: there are some solid WebRTC infra companies and open source projects to build on.

btown • 5 months ago

Curious about this approach! It looks like you're connecting to https://livekit.io/ under the hood - do you have custom server side components intercepting the video stream and doing video analysis & lip movement patching there?

https://news.ycombinator.com/item?id=41743327 - looks like they do quite a bit of heavy lifting out of the gate!

AznHisoka • 5 months ago

Would companies switch their video conferencing solution to yours, or do you envision them using both side by side?

christiansafka • 5 months ago

We're hoping companies with international teams will switch over fully (we have internally), but our initial goal is to attract the subset of the market that has cross-lingual needs and remove as many obstacles as possible to them using it more.

elwillbo • 5 months ago

Would love the virtual camera; I'll be on the lookout for it to arrive.

michaelmior • 5 months ago

By export I assume you mean as a virtual webcam? I would definitely prefer that as a user to be able to use any videoconferencing app.

davidz • 5 months ago

It's really cool to see how you guys are using the voice AI stack to overcome the language barrier.

(btw I work at LiveKit, so let me know if we could make Agents easier to use for your use case.)

bschwindHN • 5 months ago

Just nit-picking, but in the Japanese section the subtitles translate to "You can speak Japanese" but the spoken content translates to "Please speak Japanese".

Are the subtitles (and subsequent conversation transcripts) consistent with what is actually vocalized, or is this an artifact of a manually-edited video?

VenturingVole • 5 months ago

This is one of the coolest things I've seen, as an English speaker who moved to Brazil and firmly believed that technology was going to break down barriers to global access: You're making it happen and helping level the global playing field.

In my situation, you've just given a highly enthusiastic Spanish/Portuguese-speaking apprentice software engineer I'm mentoring and training the capacity to actually communicate with a friend or client in near real time, in a more human way than I've ever seen before.

Yes, they're learning English (at same time as software engineering) but you've just made connections more tangible and real.

Love it!

FlamingMoe • 5 months ago

I work with a lot of overseas developers who speak with thick accents and sometimes it can be very difficult to understand them or for them to understand me. I could definitely see this being a more pleasant experience for everyone.

lolpanda • 5 months ago

Great idea! The demo looks impressive. What are your thoughts on real-time translated captioning compared to AI voice? I guess it's still difficult to mimic nonverbal elements like laughter and pauses.

christiansafka • 5 months ago

Fantastic question. Our opinion on this is that the higher-bandwidth we can make the communication, the more useful it will be. The reason we've moved from IRC->VoIP->Video is because of the efficiency of information transfer and additionally the empathic element of face-to-face conversation.

From the technical side, speech-to-speech models have more potential for accuracy (no explicit ASR, no audio->text information loss). We have a few options for mimicking nonverbal elements: we could decide when to naturally mix in the original audio, or train our end-to-end model to handle those nonverbal audio chunks. We'll be trying both, but the first option will likely come sooner!
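The first option, mixing in the original audio for nonverbal moments, could look roughly like the sketch below. It assumes a hypothetical classifier has already labeled each segment as "speech" or "nonverbal" and uses a short linear crossfade to avoid clicks at the boundaries; none of this is Pinch's actual pipeline.

```python
from typing import List, Tuple

# (label, original_audio, translated_audio); labels are assumed to come from a
# hypothetical upstream classifier that tags segments as "speech" or "nonverbal".
Segment = Tuple[str, List[float], List[float]]


def crossfade(a: List[float], b: List[float], n: int) -> List[float]:
    """Join a and b, linearly blending the last n samples of a into the first n of b."""
    n = min(n, len(a), len(b))
    if n == 0:
        return a + b
    tail = a[len(a) - n:]
    blend = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of b ramps from near 0 to near 1
        blend.append(tail[i] * (1 - w) + b[i] * w)
    return a[:len(a) - n] + blend + b[n:]


def mix_output(segments: List[Segment], fade: int = 4) -> List[float]:
    """Pass nonverbal sounds (e.g. laughter) through in the original audio and
    use the translated audio for speech, crossfading at every boundary."""
    out: List[float] = []
    for label, original, translated in segments:
        chosen = original if label == "nonverbal" else translated
        out = crossfade(out, chosen, fade)
    return out
```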

nhod • 5 months ago

Very cool guys. I am beyond excited about this — I can see how this could transform certain projects and relationships — and it is useless if it doesn’t work in the language people need.

What languages do you actually support? The site says “20+” and “over 20” and there’s even a FAQ entry listing a handful of them, but it doesn’t list all of them. What is the thinking in not just listing all of them?

alloysmila • 5 months ago

Just this morning I told myself I should build something like this. I work in global supply chain and the language barriers are an absolute mess.

christiansafka • 5 months ago

It's hard, don't do it :D We have a few supply chain companies trying us out though! Would love to hear more about your experience.

asimpleusecase • 5 months ago

Please just keep a list of supported languages on the site. The FAQ only gives a few, and the bubble on your site that says more than 29 languages was not clickable on my iPhone. I won't go any further until I can see the supported languages. Bonus points if you list languages that are coming, in order of expected availability.

brap • 5 months ago

Assuming the tech is solid, I think that if you had developed this as a browser extension to work on top of Meet/Teams/etc, not only would your dev time have been much shorter and adoption much faster, but Google/Microsoft/etc would have probably bought you out in a blink of an eye.

debarshri • 5 months ago

I'm not sure the dev time would be shorter, because Teams and Meet have their own nuances, and you would be limited to what the tool itself allows. Also, I don't think you would go into every call with this plugin on.

This is very valuable where the communication barrier is high, and it has specialized use cases in industries like supply chain and outsourcing.

hassleblad23 • 5 months ago

Getting bought out is not a bad option here.

brap • 5 months ago

Yup that’s what I’m saying

bongwater_OS • 5 months ago

Hey just a heads up the demo on your site is broken (for me). English transcriptions are coming through fine but translations aren't being spoken, despite the output video stuttering for a moment at the time when it should.

christiansafka • 5 months ago

We noticed that Swedish isn't currently working properly, but we weren't able to replicate this with any other languages. Please let us know if it's still having issues!

instagary • 5 months ago

Congrats, really cool idea! You should add the demo video to your website in addition to the interactive version.

Jingyuan_Design • 5 months ago

Kudos to the Pinch team for tackling such a challenging yet crucial problem. Excited to see where this goes!

Aspos • 5 months ago

Impressive demo! Note that at 01:46 it says "You can speak Korean" in Ukrainian lol.

elixirnogood • 5 months ago

Are you guys using LiveKit for WebRTC? If yes, are you using LiveKit Agents as well?

christiansafka • 5 months ago

Yes! LiveKit is great, and we are using LiveKit Agents, but we had to override a few low-level library components for our use case.

elixirnogood • 5 months ago

Do you have any concerns around scaling? I like the LiveKit stack, but if I'm not mistaken their agent architecture is based on multiprocessing (one OS process per 'session'/'conversation'), which doesn't sound very scalable. Btw, great demo, this is a cool technical problem to solve. I've spent a couple of months in this space (using a similar stack) and know for a fact that it's not easy.

christiansafka • 5 months ago

Thanks, there are certainly a lot of fun and challenging problems to take on in the space. On scaling, the agent architecture isn't limited to one machine, so you can also autoscale your machines. It's essentially Python's Celery, if you've tried that. It gets trickier when you require GPUs, though!
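The comparison to Celery suggests a pool of worker machines that grows with load. A minimal sketch of the scaling decision, assuming a fixed number of concurrent sessions per worker machine; the function, its parameters, and the policy are illustrative assumptions, not how LiveKit or Pinch actually autoscale:

```python
import math


def desired_workers(active_sessions: int, sessions_per_worker: int,
                    min_workers: int = 1, max_workers: int = 50,
                    headroom: float = 0.25) -> int:
    """How many worker machines to run for the current load.

    Every argument is a hypothetical tuning knob. `headroom` keeps spare
    capacity so a new session can start without waiting for a cold machine.
    """
    if sessions_per_worker <= 0:
        raise ValueError("sessions_per_worker must be positive")
    needed = math.ceil(active_sessions * (1 + headroom) / sessions_per_worker)
    # Clamp to the allowed fleet size.
    return max(min_workers, min(max_workers, needed))
```

With GPU workers, sessions_per_worker tends to be bounded by GPU memory and new machines take longer to come up, which is one reason to keep more headroom in that case.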

elixirnogood • 5 months ago

the demo doesn't mimic my voice unless I misunderstand 'mimic'

christiansafka • 5 months ago

I didn't make this clear enough in the post, but we're still working on voice cloning and inflection transfer. Voice cloning is easier, but to support inflection transfer we have to modality-align an LLM.

aloukissas • 5 months ago

What does real-time mean here? Would it work e.g. in a live stream?

christiansafka • 5 months ago

Yes, right now we're ranging from 0.75-3 seconds for the translation to start, and we're hoping to move the average time lower with our next updates. There will always be some limitation to how fast we can translate (different languages have different sentence structures and phrasing), but for livestreaming usually you'd have even a bit more wiggle room for the latency.

Also in case you're interested in the logistics of using us for livestreaming: If our current platform won't work for your use-case and you need to use OBS + a virtual camera, it's on our roadmap.

zachanderson • 5 months ago

Of course I fully associate "video conferencing" with "Pinch".

Wake up you SV product manager dorks! Lazy effort in naming things!

motoxpro • 5 months ago

What would you name it? Something that explains exactly what it does in the name like Hulu, Apple, Snowflake, Oracle, Google, Ford, or Bungie?

AznHisoka • 5 months ago

Pinch is the worst of both possibilities. It doesn’t describe what it does, but it’s also not a catchy memorable “brand-like” name like Google either.

Tinkeringz • 5 months ago

Slack, Zoom, Apple, etc. were really successful without that mattering at all.

lefstathiou • 5 months ago

Telemedicine!

adltereturn • 5 months ago

cool! I need this.
