Hello,
Our community has decided that machine translation in Content Translation Tool should be completely disabled.
Hopefully, our decision will not be overridden.
Homo_ergaster
May 3 2024, 7:47 AM
F51346339: monthly-translations-by-user-edit-count-bucket-2024-05-09T09-35-29.614Z.jpg
May 9 2024, 10:05 AM
F51346500: translations-across-all-wikis-2024-05-09T09-37-33.342Z.jpg
May 9 2024, 10:05 AM
Thanks for sharing the request. We want to be especially cautious with requests to disable a feature. While some editors may find issues with the feature, the fact that it is frequently used suggests that other editors may find it useful.
We are happy to adjust the tool so that it better serves the whole community.
To add some context I'll share some data. Data never tells the whole story, but combined with your knowledge of the Lithuanian community it can give us a better understanding (and serve as a reference after making any adjustment).
Lithuanian Wikipedia is one of the few cases where articles created with Content Translation are deleted more often than those started from scratch without the tool. For example, during last year (2023), 28% of the articles created with Content Translation were deleted, while only 5% of the articles created from scratch were deleted. This is not the case for most wikis, and it would be great to understand which particular factors contribute to this (from the request, I assume that the low quality of machine translation may be one, but it would be great to hear more details).
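To put the two deletion rates side by side, here is a minimal sketch in Python. It uses only the two percentages quoted above (28% and 5%). The function name and its parameters are hypothetical, chosen for illustration, and are not part of any Wikimedia analytics tooling.

```python
def compare_deletion_rates(cx_pct: float, scratch_pct: float) -> tuple[float, float]:
    """Compare two deletion rates given as percentages.

    Returns (ratio, percentage_point_gap). Both parameter names are
    illustrative assumptions, not fields from a real dataset.
    """
    if scratch_pct <= 0:
        raise ValueError("scratch_pct must be positive to compute a ratio")
    ratio = cx_pct / scratch_pct   # how many times more often CX articles are deleted
    gap = cx_pct - scratch_pct     # absolute difference in percentage points
    return ratio, gap


# Figures quoted for Lithuanian Wikipedia in 2023
ratio, gap = compare_deletion_rates(28, 5)
print(f"Deleted {ratio:.1f}x as often, a gap of {gap} percentage points")
```

With these figures, articles created with Content Translation were deleted about 5.6 times as often as articles started from scratch.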
Looking at the recent activity (last 2 years) I've noticed a particular spike in February 2023, with over 100 translations, whereas the number of translated articles in other months is usually in the 20-30 range. Does anyone know what may have caused this spike? Were any big events/campaigns happening at the time?
Looking at the distribution by user edit count, this activity was driven by experienced users (users with more than 10K edits).
Based on the above, it makes sense to apply some adjustments. We can consider:
Feel free to share any additional details which will be very useful to support not only Lithuanian but also other communities that may be in similar situations.
Thanks!
@Pginer-WMF: Thanks for a more detailed look into this. Let me provide some context behind this decision of the community:
The request is not necessarily about complete removal of the tool. However, encouraging new users to use it is counter-productive and we argue that it should at least be hidden or disabled by default. We can still retain the option to enable the tool in the "Preferences -> Appearance" section. It would actually be useful for our sole productive user of the tool. In general, those users who can contribute meaningful content are also more likely to figure out how to enable some tools.
P.S. From a quick glance, for the sample text, the NLLB-200 model produces a significantly worse result than Google Translate (it didn't even spell "Jazz" correctly).
Some auto-translated articles are created by foreigners who often write about their villages, local celebrities and so on. I'm not sure about the spike you mentioned, but it's likely that a significant part of the articles were written by users who don't speak Lithuanian. Also, I'm under the impression that many auto-translated articles are written by children or teenagers who just play with the UI and don't have any interest in sticking around and actually learning something.
Indeed, we pretty much have only one editor who has been producing good enough articles using the Content Translation tool. Given the circumstances, it would be reasonable if the tool was disabled by default.
And even that user (https://lt.wikipedia.org/wiki/Naudotojas:ArunasG) rewrites a lot of text before posting it (compare https://en.wikipedia.org/w/index.php?oldid=1211409903 and https://lt.wikipedia.org/w/index.php?title=Messerschmitt_Me_321_Gigant&oldid=7267980), then edits it a lot, and the result is very average (plenty of syntax and hyperlink errors that others have to fix).
The translator is useful only in two cases. The first is when the languages are very similar and the text needs only small changes. I used it to translate articles from Lithuanian to Samogitian (:sgs:/bat-smg), which is a dialect or a very close language. But Lithuanian has no larger similar language: English, Russian, German, Spanish and all the other big languages from which one could translate articles are significantly different, and the closest one, Latvian, is still quite different, and hardly anyone speaks it in Lithuania. The second case is importing large data sets such as tables and templates, where you want to copy the formatting and the translated hyperlinks. But even that doesn't work well: it is impossible to transfer only the part of an article that contains the data set, and it doesn't work if your article is in the List namespace.
To sum up, the translator can sometimes be useful, but only to experienced users, and it does nothing for new or random users. As I monitor many small Wikipedias, I see that it is mostly used by spammers who want to promote things in languages they don't speak, or by irresponsible users who want to boost their Wikipedia's article count. I had to delete a bunch of such bad autotranslations from Guarani Wikipedia (:gn).
It's been quite some time since our community requested to disable the Content Translation tool on the Lithuanian Wikipedia. Are there any technical or "political" obstacles to the implementation of our request?
Based on the above I can see two possible courses of action:
Each has its own pros and cons, but I think we can start with the original approach requested (A), to disable MT.
@KartikMistry feel free to disable MT in CX for Lithuanian as a target language, when you have a slot available.
We just released an MT usage analysis report; future reports could help identify some of the effects and inform next steps. In addition, input from the community will be appreciated for making further adjustments (if MT can be useful for some related languages, etc.).
Change #1083292 had a related patch set uploaded (by KartikMistry; author: KartikMistry):
[operations/mediawiki-config@master] Disable MT in Content Translation on Lithuanian Wikipedia
Change #1083292 merged by jenkins-bot:
[operations/mediawiki-config@master] Disable MT in Content Translation on Lithuanian Wikipedia
Mentioned in SAL (#wikimedia-operations) [2024-10-28T07:45:31Z] <kartik@deploy2002> Started scap sync-world: Backport for [[gerrit:1083292|Disable MT in Content Translation on Lithuanian Wikipedia (T364073)]]
Mentioned in SAL (#wikimedia-operations) [2024-10-28T07:56:13Z] <kartik@deploy2002> kartik: Backport for [[gerrit:1083292|Disable MT in Content Translation on Lithuanian Wikipedia (T364073)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2024-10-28T08:07:55Z] <kartik@deploy2002> Finished scap sync-world: Backport for [[gerrit:1083292|Disable MT in Content Translation on Lithuanian Wikipedia (T364073)]] (duration: 22m 24s)