Last November, The Bookseller reported Dutch publisher Veen Bosch & Keuning, owned by publishing titan Simon & Schuster, was testing the use of artificial intelligence to help translate several of its books to English.
Actually, as to your edit, it sounds like you’re fine-tuning the model on your data, not training it from scratch. So the LLM has already seen English and Chinese during its initial training. Also, LLMs represent words as vectors, and what usually happens is that similar words’ vectors end up close together. So substituting e.g. Dad for Papa looks almost the same to an LLM. Same across languages. But that’s not understanding, that’s behavior that much simpler models also have.
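To make the "vectors are close together" point concrete, here's a toy sketch. The 3-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions and come from a trained model), and `toy_embeddings` is a hypothetical name, not any library's API. Cosine similarity is one common way to measure closeness:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-picked vectors: NOT taken from any real model.
toy_embeddings = {
    "Dad": [0.90, 0.10, 0.05],
    "Papa": [0.88, 0.12, 0.07],
    "carburetor": [0.05, 0.20, 0.95],
}

# Near-synonyms point in almost the same direction...
sim_synonyms = cosine_similarity(toy_embeddings["Dad"], toy_embeddings["Papa"])
# ...while an unrelated word points somewhere else entirely.
sim_unrelated = cosine_similarity(toy_embeddings["Dad"], toy_embeddings["carburetor"])

print(f"Dad vs Papa:       {sim_synonyms:.3f}")
print(f"Dad vs carburetor: {sim_unrelated:.3f}")
```

With these made-up numbers, the Dad/Papa pair scores far higher than the Dad/carburetor pair, which is the kind of geometry even simple word-embedding models tend to show.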
True! Models not trained on a specific language are generally bad at that language.
However, there are some exceptions, like a Japanese fine-tune of Qwen 32B which dramatically improves its Japanese, but the training has to be pretty extensive.
And even that aside… the effect is still there. The point is to illustrate that LLMs are sort of “language independent” internally, like you said.