> When I first ran into this issue back in 2017, I posted in the React issue tracker that I had "fixed" my app by blocking translation entirely.
Please do not do this! In almost every instance I've encountered severe Translate-related broken-ness, it's still worked well enough to get me a snapshot of the current page translated. Fighting through this is still less cumbersome than the alternatives.
> The only alternative solution that I can think of, is to implement your own localization within your app (i.e. internationalization)
I will add, please make sure that language is an independent setting, and not derived from locale! I sometimes have to use translate on sites that have my preferred language available, but won't show it to me because it's tied to locale and that changes other things that I don't want, like currency.
On one such site I used a browser extension to rewrite the request for the language strings file.
I had to learn this the hard way when a React app I built showed random crashes I couldn't explain.
"Fortunately" it also auto-translated the "I happily accept the terms of use" checkbox in one case into (back-translated) "I happily dying die perish", which also couldn't be clicked. That led to a very high priority ticket and made us realize that all DOM manipulators might break the site.
Very early on in my career, I was working on a greeting-card app in Facebook (back when Facebook apps were a thing).
Got a bug report from one of our own team members that some of her greeting cards didn't show up in the list. The link appeared, but no image. We figured out that the difference was she was running an ad-blocker. We couldn't figure out precisely what rule the blocker was applying, but it seemed to be:
- image
- within some particular size bounds
- with the consecutive letters 'ad' in the URL.
... and we were using hexadecimal encodings to track individual entities in the UI.
We solved the problem by replacing 'a' with 'g' in our hex encoding. And then I had to take a long walk and accept that if I was going to do web development on the public Internet, I'd be sharing the space with intentionally-modified user agents forever, and would have to account for every such modification as we discovered it.
I still won't run my own ad-blocker for this reason.
> I still won't run my own ad-blocker for this reason.
I maintained an extension for a public website for a couple of years. (It did things like, for example, adding information that was available in the API to the page, for power users.) I eventually gave up with the conclusion that the concept of a browser extension was fundamentally unsound. So I also don’t use an ad blocker.
Which is why a decent ad blocker has the option to selectively permit things it thinks are ads. Without blocking I've had multiple occasions of encountering a website that was completely unusable, it would be completely overlaid with ads as soon as they loaded.
So the fundamental problem is the DOM is too inefficient to do applications on it. No surprise there, considering the original design of HTML was for presenting information, not for interactive applications.
I think the fundamental problem is you have two applications, the primary application and google translate altering the application state of the primary application at the same time, without any possible communication between the two regarding locking or alterations or anything.
I'm pretty sure most applications in the history of computing would not fare any better if you constructed that situation.
Both Google and React are guilty here if I read the article right.
Google replaces an element with a different element (Text with Font containing Text?), and React's virtual DOM keeps the old, deleted elements alive because the virtual DOM still references them.
React "applications" would crash when Google Translate changes their stuff from under them if they didn't accidentally keep the old elements alive. Which would be much better behaviour.
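A rough sketch of that failure mode, using a toy node class rather than the real DOM (the class and all names below are made up for illustration). The only real-DOM behaviour it leans on is that `removeChild` throws when the node passed in is not a child of the parent:

```typescript
// Toy stand-in for DOM nodes; NOT the real DOM API, just enough to show the failure.
class ToyNode {
  parentNode: ToyNode | null = null;
  children: ToyNode[] = [];
  constructor(public label: string, public text = "") {}

  appendChild(child: ToyNode): ToyNode {
    child.parentNode = this;
    this.children.push(child);
    return child;
  }

  // Like the real removeChild, this throws when `child` is not a direct child.
  removeChild(child: ToyNode): void {
    const i = this.children.indexOf(child);
    if (i === -1) throw new Error("removeChild: node is not a child of this node");
    this.children.splice(i, 1);
    child.parentNode = null;
  }

  replaceChild(next: ToyNode, old: ToyNode): void {
    const i = this.children.indexOf(old);
    if (i === -1) throw new Error("replaceChild: node is not a child of this node");
    this.children[i] = next;
    next.parentNode = this;
    old.parentNode = null;
  }
}

// The framework renders <p>Hello</p> and remembers the text node it created.
const paragraph = new ToyNode("p");
const rememberedText = paragraph.appendChild(new ToyNode("#text", "Hello"));

// A translator swaps that text node for <font>Bonjour</font>.
const fontWrapper = new ToyNode("font");
fontWrapper.appendChild(new ToyNode("#text", "Bonjour"));
paragraph.replaceChild(fontWrapper, rememberedText);

// Later the framework tries to remove the node it still remembers.
function tryRemove(): string {
  try {
    paragraph.removeChild(rememberedText);
    return "removed";
  } catch (e) {
    return (e as Error).message;
  }
}
const outcome = tryRemove();
```

In this toy version `outcome` ends up holding the error message, because the remembered node was detached behind the framework's back.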
They both do reasonable things, so I wouldn't really blame either. Google Translate was there first and is a big accessibility advantage for the web. At the same time, Google Translate is the user-specific browser extension that is executed last, on top of existing apps. It affects not just React[1], so even if React were to implement a fix, Google Translate would continue to interfere with other webapps.
I think any real fix to Google Translate would be very complicated. I fear the only solution might be to elevate Google Translate to be part of the browser's rendering engine instead of acting like an extension. This would allow it to work in the rendering pipeline without modifying the DOM, but even that will probably run into site-bugs because of things like text being longer or shorter.
[1]: https://martijnhols.nl/blog/everything-about-google-translat...
> Why is an extra element required, and why is it <font> of all things?
I don't know, but perhaps it's because, due to CJK unification in Unicode, rendering Chinese or Japanese without explicitly setting a font designed for that particular language can output incorrect characters (those of the other language, which are considered "the same character" despite being different). Thus, a translation tool would have to explicitly set a font in order to display these languages correctly in a reliable manner, because the surrounding context certainly cannot be assumed to have the appropriate font. And I could easily imagine that someone would choose to keep the same code path for all languages instead of branching for this particular case, resulting in a <font> even for languages other than those two.
Chinese/Japanese -> everything else.
They're so terse, two Unicode characters can be like 10+ letters lol
Sometimes I translate to understand the page then refresh to use it unborked in the original language
That doesn't track with me. The React application is the website. It should be able to run while expecting some other third party thing isn't going to dig into the internals of its view and modify those internals.
It would be fine if Translate was just modifying text, but changing the actual structure of the HTML goes too far.
I mean they have to keep the old elements alive because that is the data they use to render to the DOM.
What React could do is to catch that there have been changes made to the structure of the DOM by someone other than them and then re-render the full page. Which would probably not be the most performant solution for anyone.
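For what it's worth, the detection half of that idea could look something like the sketch below. It is only a heuristic built on an assumption (that the translator inserts `<font>` elements, as discussed in this thread), and `looksTranslated` and `RecordLike` are made-up names, not anything React provides:

```typescript
// Minimal shape of the parts of a MutationRecord this sketch looks at.
interface RecordLike {
  type: string;
  addedNodes: ArrayLike<{ nodeName: string }>;
}

// Heuristic (an assumption, not documented behaviour): treat an inserted <font>
// element as a sign that a translator rewrote part of the page.
function looksTranslated(records: RecordLike[]): boolean {
  return records.some(
    (r) =>
      r.type === "childList" &&
      Array.from(r.addedNodes).some((n) => n.nodeName === "FONT"),
  );
}

// In a browser this could be fed from a MutationObserver, for example:
//   new MutationObserver((records) => {
//     if (looksTranslated(records)) rerenderEverything(); // hypothetical app hook
//   }).observe(rootElement, { childList: true, subtree: true });
```

The expensive part is still what to do afterwards: a full re-render would throw away the translation, which is exactly the complaint the next line predicts.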
But anyway then people would complain that React was breaking Google Translate.
Essentially you have two applications fighting to control rendering of the application state of one of them.
Yeah, Google Translate shouldn't have to research or reverse-engineer what kind of framework is being used on any random website it translates. Assuming that it is simply interacting with static information would still be reasonable. That said, Google Translate is also at fault if it changes the DOM structure. Why not leave things the same and just exchange text nodes? Seems silly.
I am just supposing, but have not checked, that if they change the text they must also change the DOM by at least changing the lang attribute on nodes affected, meaning probably the lang attribute on the html element, but could also be a lang attribute on each element wrapping a textNode.
on edit: I figured the article must have said something about this and I missed it, and yes, it shows that the DOM is changed to be lang="nl" on the html element, which means obviously if React rerenders but does not rerender the HTML element (which many React applications do not control) then the language would be out of sync, of course.
The primary application is the browser.
The problem here is that react "application" builds its own state of the web page, fails to reconcile it with changes to actual state, ends up in a detached state with stale information and then proceeds to alter actual state based on the corrupted information it has.
I disagree. The browser is a VM that runs other applications, in this case one that's written in React.
Then the browser, via an extension, is corrupting that application's internal state and is then surprised when that application stops working correctly. Well, duh.
If Translate were to only modify the text on the website, I'd think React would be able to deal with it better. But it seems to be modifying the structure too (adding <font> tags); I think it's not reasonable to expect every JS framework to be able to deal with that properly.
The browser extension is not touching "application's" state whatsoever. The browser extension is, however, making a perfectly legal store on a shared system resource (if you consider browser to be VM), but the application ignores those modifications and continues to issue state modification calls computed from its own, now long invalidated, internal cache.
> I think it's not reasonable to expect every JS framework to be able to deal with that properly.
It would be unreasonable to call any framework out of pre-alpha prototype stage, much less production ready, if it can't handle such basic stuff, sorry.
I disagree with that. Fundamentally, it comes down to who is in charge. Does React control the DOM or does the DOM control react? I see this as a cache concurrency issue--React is trying to cache the DOM and breaking when that fails.
either that or the browser is the new OS, as people keep saying.
If there is such a thing as web apps, then the primary application is the web app, and that primary application runs in the browser.
As a concrete example: I'm racking my brain trying to think of any instances of a working realtime whole-desktop translation overlay application, and I don't think such a thing exists.
Translators for specific chunks of the screen, yes. Selectable translators, sure. Translators you can configure to work with one application at a time.
But rewriting all text in the entire windowing environment? While preserving selection, copying, and editing? Without hooks to the underlying apps? Functionally preposterous as a proposition.
I doubt it. Say you write a native app and some other side app swoops in and changes all your UI state while running. That is going to cause problems.
But only if the DOM is used in other ways than being an output of whatever is done to calculate its update. If the framework uses the DOM for other things than directly updating it or as a representation of internal state, then that's on the framework. Is it reasonable to assume the DOM itself is part of the state? Would it not be more reasonable to have an internal state? But we have this with virtual DOM. So maybe the issue is in the way it makes use of the DOM in conjunction with virtual DOM internally. Maybe it is an optimization "hack" that goes badly in this scenario.
That sounds pretty arbitrary. Upon what grounds do you feel it's appropriate to say that the DOM is only allowed to be used in the way you describe?
Disagree. I've swooped in on programs and done major changes to their filesystem behind their back. Programs that had no concept of working with externally supplied data in a network world. The programs happily went along doing their job, completely unaware that every label was a phantom whose meaning would always change if selected. The contents were appropriate (everything was simple CSV), the proper commands would be sent to the attached machinery.
It's the responsibility of the swooper to ensure everything's put back sane.
That hasn’t been true for at a minimum half a decade now and much longer in practice I think.
> No surprise there, considering the original design of HTML was for presenting information, not for interactive applications.
Even if we’re just using HTML to display information, there’s effectively this second application (Google Translate) changing the structure of the original application’s XML, which would still break display (or XPaths) in a “normal HTML presenting information” scenario.
No. It's not an issue with efficiency at all.
That is not the case anymore.
React and many other SPA frameworks use an additional virtual DOM which gets mapped onto the real DOM. This used to be faster 10 years ago and allowed for a unified interface.
Any addon manipulating the DOM forces the virtual DOM to go out of sync thus crashing the app.
As shown by the likes of Svelte, the virtual DOM is just legacy; modern browsers are fast enough to get by without it.
Virtual DOM or not doesn't matter, even Svelte has the potential to be disrupted by these Google translate shenanigans since it manipulates the DOM.
Actually it seems they got hit also, https://github.com/sveltejs/svelte/issues/15090
It’s not that modern browsers are faster - Svelte is a different approach and figures out how to update at compile time rather than using a runtime virtual DOM. 10 years ago it would also have been faster
Another solution would be a React-native machine translation implementation that updates the TextNodes without replacing them. It would still have the issues of merging adjacent nodes to get a proper translation, but at least it could update on any state change.
One idea that crossed my mind while reading the article: for websites that already use react-intl, have react-intl implement an API to allow supplying machine translations of messages into languages otherwise not supported by the app.
The problem with this is that it will only help sites that already put in effort to internationalization. Whereas the main target of Google translate are the sites that do not bother with i18n.
Still, it'd be quite valuable to the sites early in their internationalization journeys to get support for tons of languages right upon introducing internationalization.
that requires google to care about this issue
I ran into this. We worked around it with solution 2 from the article, i.e. never render text by itself next to another element, always wrap the text in its own element. Not that much of an inconvenience since we have a Text component that controls font size etc. anyway.
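To make the rule concrete, here is a framework-free sketch on a made-up tree type (`Child` and `wrapBareText` are illustrative names, not React APIs). It wraps any string that has siblings in its own `span`, which is a slightly stricter reading of the rule than "next to another element":

```typescript
// A child is either bare text or an element with its own children.
type Child = string | { tag: string; children: Child[] };

// Wrap every string that has at least one sibling in its own <span>;
// a string that is the only child of its parent is left as-is.
function wrapBareText(children: Child[]): Child[] {
  const hasSiblings = children.length > 1;
  return children.map((c) => {
    if (typeof c === "string") {
      return hasSiblings ? { tag: "span", children: [c] } : c;
    }
    return { tag: c.tag, children: wrapBareText(c.children) };
  });
}

// "Hello " sits next to <b>, so it gets wrapped; "world" is alone inside <b>.
const wrapped = wrapBareText(["Hello ", { tag: "b", children: ["world"] }]);
```

In a component-based app the equivalent is simply always rendering text through something like the Text component mentioned above.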
How does Firefox translation interact with React?
Not sure if this has the same root cause, but some websites 'break' with Firefox translate, as if the visible text in an HTML element is somehow being used as an identifier in the website's application logic.
I don't like the way React updates subtrees. Other frameworks get it wrong, too, by using the same incorrect model. Employing professional opinion, it's just wrong. The document should be considered the source of truth, not some internal private state.
e.g. Input values on the HTMLInputElement are the real input, not some clone to a private object in JavaScript.
As a result of React's blatantly wrong treatment of the document.body, you have to ensure that when it reuses element siblings within an arbitrary tree, its values are squashed to whatever private fields you're using in your component.
It screams wrong, and side effects like the one in this article make it obvious why.
No one is going to go out of their way to touch your special internal state, we're all going to use the web API to touch nodes and events from standard interfaces. You can't take the ball into a private court and expect the rest of the game to function.
Substack.com is a typical example of a site where the Google Translate extension can crash the page or cause parts of it to explode.
This is essentially a bug in React VDOM which blindly assumes it is the only one updating DOM nodes. Imho, it's long overdue to remove VDOM from React as other renderers have shown that it's not needed for performance.
Knowing Google they'd build a private extension of the Web standard to fix this in Chrome.
Just out of curiosity, what web apps are affected? I tried to find the "other web apps" and can't find anything (quick scan of the article)
Anything that affects the DOM and relies on TextNodes behaving predictably. It could be as simple as `e.target` of a click event being different (the `font` element gets in between what was actually clicked), but the main issues are when apps try to update or replace what used to be TextNodes.
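For the `e.target` case, a defensive handler can walk up from whatever was clicked to the element it actually cares about (in the real DOM, `Element.closest()` does this walk for you). A minimal sketch on a toy element type, where `ToyElement`, `action` and `findAction` are made-up names:

```typescript
// Toy element: just a tag, a parent pointer, and an optional action marker.
interface ToyElement {
  tag: string;
  parentEl: ToyElement | null;
  action?: string;
}

// Walk up from the clicked node until an element carrying an action is found.
function findAction(target: ToyElement | null): string | undefined {
  for (let el = target; el !== null; el = el.parentEl) {
    if (el.action !== undefined) return el.action;
  }
  return undefined;
}

// After translation the click lands on an injected <font> inside the button.
const saveButton: ToyElement = { tag: "button", parentEl: null, action: "save" };
const injectedFont: ToyElement = { tag: "font", parentEl: saveButton };
const resolved = findAction(injectedFont);
```

Here `resolved` is still the button's action even though the click target was the injected wrapper.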
Imagine you're building a framework, and a consumer renders a clock. The only thing that changes every tick, is the text value; `00:00` becomes `00:01`. In an attempt to be as efficient as possible, it's only natural for the framework builder to decide to keep a reference to the `TextNode` and only update its `textContent` every frame. This scales the best for even the most complicated app, but it leads to interference from Google Translate as the article shows.
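The clock scenario can be sketched with plain objects standing in for DOM nodes (everything below is a toy model, and the translator's behaviour is an assumption based on the `<font>` wrapping described in the article):

```typescript
// Toy model of a page fragment. Not the real DOM.
interface TextLike { data: string }
interface ElementLike { tag: string; children: (TextLike | ElementLike)[] }

// Collect the text a user would actually see under an element.
function visibleText(el: ElementLike): string {
  return el.children
    .map((c) => ("data" in c ? c.data : visibleText(c)))
    .join("");
}

// Framework side: create the text node once, then only mutate its data per tick.
const clockText: TextLike = { data: "00:00" };
const clockEl: ElementLike = { tag: "span", children: [clockText] };
const tick = (value: string): void => {
  clockText.data = value;
};

tick("00:01");
const beforeTranslate = visibleText(clockEl); // "00:01"

// Translator side (assumed): replace the text node with <font> holding a copy.
clockEl.children[0] = { tag: "font", children: [{ data: clockText.data }] };

tick("00:02"); // only updates the detached node
const afterTranslate = visibleText(clockEl); // still "00:01"
```

No error is thrown here, which is arguably worse: the framework's own bookkeeping says `00:02` while the page keeps showing the stale value.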
It strikes me that a straightforward solution to this problem would be to have Google Translate dispatch a new CustomEvent with a particular "type" and a reference to the new element in the detail field. So React and other frameworks could listen to this event and instead of dealing with Text node X, they could instead refer to the translated element Y.
I use Google translate all the time, and I definitely noticed you have to kick it to update.
The real fix is to have a new web feature which is to have the 'page dom' and the 'display dom'.
By default there would be a 1:1 mapping, but things like browser extensions could write code to define how a particular bit of page dom would display to the user.
React is not the right way to build webapps, so I see no problem here.
How about react get on with times and support everyday needs of people instead?
So the web is basically broken. Can WASM provide a solution?
Last part of the post hints at a possible future for SPAs: wasm. No DOM modified by browser extensions so no surprise.
Or switching back to desktop apps? Also no DOM manipulation there. :-)
WASM enables us to bring what we were doing there to the web. The distinction always was a bit artificial.
This notion that you can only have DOM/CSS/JavaScript on the web did not age well. There's a whole generation of programmers that built their careers on targeting that and are confusing that status quo with something that is set in stone for good reasons. Those reasons never really existed. JavaScript was a bit shit but it was there so people used it. Fast forward 30 years and you still have people proving that point on a daily basis by creating very mediocre and underwhelming things with it.
WASM opens up the web to 30 years of progress in UI development elsewhere (mobile phones, game consoles, VR/AR, desktop, etc.).
What people will do with that is of course an open question. There are a few frameworks emerging but they are still kind of niche. And there are lots of attempts to bring retro UIs to the web unmodified. Links to e.g. Winamp in a browser, VB 6 running in a browser, etc. are easy to find. Some people even boot entire operating systems in a browser. I think I came across windows 95 at some point. A few versions of Linux, and some other stuff. Cool, but I'm more interested in new stuff.
The web has a bit of an imagination deficit. Creativity on the web mostly died along with Flash. HTML + JavaScript never managed to fill that void. Just the wrong tech for that job.
> Creativity on the web mostly died along with Flash.
Even though I was against Flash sites at the time, I can't deny it allowed many fun experiences with browser games, which seem absent today.
Therefore, I prefer svelte (besides superior ergonomics & web-standards compliance): It's not a framework, but a compiler that outputs only pure JS. Svelte simply has no virtual DOM that can be messed up. Just Simplicity & efficiency.
You are confused about the issue, and the OP does its part in contributing to the confusion. It's not a VDOM issue, it's not React exclusive (that part the post is explicit about) and indeed Svelte is affected as well: https://github.com/sveltejs/svelte/issues/15090
When seeing issues like this one pop up with React in the title, one should really have a good think about whether this is solved in a fundamentally different way in other frameworks OR, and this should be the null hypothesis, whether React is in the title because it is more widely used than all the others combined.
Fair. The Svelte sites I tried it out on had no issues, so I assumed it might be limited to React.
Svelte breaks just the same...
Go, get down off your high horse and try it yourself, finish the counter in their tutorial, put a console log in the handler, and translate the page to French...
C'est tres borked.
But wouldn't it be the same when Google translate is actively replacing nodes?
> I will add, please make sure that language is an independent setting, and not derived from locale!
Websites already have exactly what they need to provide you with the language you want via the Accept-Language header your browser sends. In your browser's settings you can configure a list of languages (country-specific if desired) which get sent with every request.
For example, such a list can say: prefer British English, fall back to any English, and if neither is available use Dutch.
This is already entirely separate from your OS locale! Although it will default to filling it in with that locale's language if you don't configure it yourself of course.
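As a sketch of what a server could do with that header: the value below is a hypothetical one matching the preference order described above (the exact `q` weights are my assumption), and `parseAcceptLanguage` and `pickLanguage` are illustrative helpers, not part of any library. It ignores the `*` wildcard for brevity.

```typescript
interface LanguagePref { tag: string; q: number }

// Parse an Accept-Language value into tags sorted by descending quality.
function parseAcceptLanguage(header: string): LanguagePref[] {
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const pieces = part.split(";").map((s) => s.trim());
      const tag = pieces[0] ?? "";
      const qParam = pieces.slice(1).find((p) => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1; // q defaults to 1 when omitted
      return { tag, q: Number.isNaN(q) ? 0 : q };
    })
    .sort((a, b) => b.q - a.q);
}

// Pick the first supported language that matches a preferred tag exactly,
// or that is a regional variant of a bare language tag (e.g. "en" -> "en-US").
function pickLanguage(header: string, supported: string[]): string | undefined {
  for (const { tag } of parseAcceptLanguage(header)) {
    const wanted = tag.toLowerCase();
    const hit = supported.find((s) => {
      const have = s.toLowerCase();
      return have === wanted || have.startsWith(wanted + "-");
    });
    if (hit) return hit;
  }
  return undefined;
}

// Hypothetical header: British English, then any English, then Dutch.
const exampleHeader = "en-GB,en;q=0.9,nl;q=0.8";
const chosen = pickLanguage(exampleHeader, ["fr", "nl"]);
```

With only French and Dutch on offer, `chosen` falls through both English entries and lands on Dutch.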
This should be the primary way to decide upon a language, but in addition to that offering a way to switch languages for a specific site on that site itself is a user-friendly gesture appreciated by many.
This is not true. E.g., Safari is tied to the OS settings, Firefox has some dependencies regarding the locale of the first install, etc…
Moreover, probably most people speak or can read more than a single language. There may be reasons for accessing a site in a particular language other than the standard locale.
Please empower users to make their own choice! Do not assume to know better.
For example, when the translation is shit and you prefer to use the English one because the one in your language is impossible to understand.
This does not help in many many situations.
I am a Hongkonger, natively speak Cantonese, am fluent in English and am learning Japanese.
If I go to Google I want an English UI and to prioritise traditional Chinese results, then English, then simplified Chinese.
On the other hand if I go to a Japanese website, I don’t want them to translate for me; just displaying the original Japanese will be fine, unless I toggle.
This kind of complex setup can never be achieved if we don’t have a per-site locale policy. And seriously, a toggle per site is easier than navigating three levels deep into the browser settings page.
> A toggle per site is easier than navigating three levels deep into the browser settings page.
I don't disagree with your overall point, that flexibility is useful for website visitors, but your statement requires asking the question: "easier for whom?"
Certainly relying on Accept-Language is significantly easier for the website maintainer. And overall it would be a lot easier if the small handful of web browser maintainers added saner settings (even a per-website Accept-Language toggle), than if we were to require the thousands (tens of thousands? millions?) of multi-language website developers to provide their own toggle. Not to mention having a standardized way to manage this would be better for users than having to discover each website's language toggle UI.
But sure, we don't have those easy-to-use browser settings, so it's (unfortunately) up to every website developer to solve this problem, over and over and over.
(As an aside, it would be cool if websites could return a hypothetically-standardized Available-Languages header in their responses so browsers could display the appropriate per-website UI, with only the supported languages.)
The problem is when you understand multiple languages.
If a website is made by an English speaking team, as I understand English I'd like it to be English first and not a possibly broken French version. If a website is developed with French language first I'd like to have it in French and not a second-rate English translation.
> In your browser's settings you can configure a list of languages (country-specific if desired) which get sent with every request.
Customising this list at all makes your browser fingerprint thousands of times less common than it was before you did this, and many websites you visit could then probably uniquely identify you as the same user across all of your sessions.
That and a thousand other things. A highly privacy focussed browser could offer to enable this setting only on whitelisted websites (and send 'en' plus a bunch of random language codes on others).
Not if the random part of the list changes with each request.
There are two ways to defeat fingerprinting: 1) make everyone look exactly the same (pretty difficult to do), or 2) introduce enough noise and randomness to fingerprinting signals to each request so that each person looks like many different people (much easier to do).
> This should be the primary way to decide upon a language
Google developers are very intelligent, but not intelligent enough to understand this.
They probably understand it just fine. Someone higher-up has just over-ruled them. There may even be a good reason for it, but because of the way companies work, we will probably never find out what it is.
Almost no website uses this, even big ones like Google who insist on showing me pages in German rather than English or French.
On the other hand, sometimes the ads that are shown are in German. Easier to mentally filter out.
> that changes other things that I don't want, like currency.
Oh god Google is so bad at this. They don't even let me change the currency in many cases when looking at hotels (yes on the website; not in the Google Maps app)
Google is so ridiculously bad at this, when an account that only ever uses English is explicitly asking for English search results, but happens to be located in Thailand, it will give you English results, but use the Thai calendar to display years, which is 543 years ahead of the Gregorian calendar. Are there any people at all who expect to read English text but expect to see the year 2568 instead of the year 2025, when no part of their system or account is configured for Thai?
I'd have no issue leaving translation enabled, if the translator was an optional feature that the user must opt-in to, and that's clearly communicated as something not controlled by the developer.
But I've received reports from Edge-users that didn't even know translation was enabled.
Yeah, I agree that's problematic. And I would have no objection to implementing a UI feature that displayed a warning banner of some kind if it detected that the page had been translated.
When you say locale, you mean your current location, e.g. as detected by geoip?
A locale is a combination of several things, including a language, but not only.
E.g. I'm from Portugal. I'm visiting an American site, which does not have professional Portuguese translations, but does have auto-generated ones.
I don't like the auto-generated ones and can read English just fine, so I want to have the language set to English (en-US in this case).
However, I still want it to apply some locale-specific things from Portugal, e.g.:
- Units (Metric vs. Imperial vs. Whatever mess countries like the UK do)
- Date formatting (DD/MM/YYYY vs MM/DD/YYYY)
- Time formatting (AM/PM vs. 24hr clock)
- Currency formatting (10€ vs. 10 € vs. 10 EUR vs. €10)
- Number formatting (10,000.00 vs 10.000,00)
- When the week starts (Monday vs. Sunday)
If you take a look at the windows locale options, it mostly lets you mix-and-match whatever you want (which is great! Now if only the MS apps let me stop using the localized keyboard shortcuts...): https://learn.microsoft.com/en-us/globalization/locale/langu...
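In code, keeping these independent mostly means not deriving everything from one locale string. A minimal sketch, where the `Prefs` shape, the message table and `renderPrice` are all made up for illustration; the only real API used is the standard `Intl.NumberFormat`:

```typescript
interface Prefs {
  language: string;     // which translation of the UI text to show
  formatLocale: string; // which regional conventions to use for numbers
  currency: string;     // ISO 4217 code, chosen independently of both
}

// Hypothetical message table; a real app would load these from its i18n files.
const EN_MESSAGES = { price: "Price" };
const MESSAGES: Record<string, typeof EN_MESSAGES> = {
  en: EN_MESSAGES,
  pt: { price: "Preço" },
};

function renderPrice(amount: number, prefs: Prefs): string {
  // The label follows the language setting only.
  const label = (MESSAGES[prefs.language] ?? EN_MESSAGES).price;
  // Number and currency formatting follow the regional setting only.
  const formatted = new Intl.NumberFormat(prefs.formatLocale, {
    style: "currency",
    currency: prefs.currency,
  }).format(amount);
  return `${label}: ${formatted}`;
}

// English text, Portuguese conventions, euros: the combination described above.
const mixed = renderPrice(10, { language: "en", formatLocale: "pt-PT", currency: "EUR" });
```

Dates, units and week start could get the same treatment with their own fields; the point is only that `language` is never inferred from `formatLocale` or the other way round.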
Locale I'm using as a shorthand for "the bundle of variables that your service or business needs to tweak between customers in different markets". It may determine things like currency, date/time or currency formatting, or relevant regulatory framework. My argument is that language should always be settable independently of the other variables locale controls.
For an example of a site that almost gets it right, see https://www.finnair.com/ . You are first prompted to set location, and then language. I say "almost" because although they will allow you to select English in any market, they won't allow you to select any offered language in any market.
In comparison, https://www.flysas.com/ you get one dropdown which sets market, currency, and language in one go.
Sometimes, but not always. Sometimes it is also based on the locales in your browser.
It means system/browser settings like the one available in navigator.language.