Last November, The Bookseller reported that Dutch publisher Veen Bosch & Keuning, owned by publishing titan Simon & Schuster, was testing the use of artificial intelligence to help translate several of its books into English.

  • JustTesting@lemmy.hogru.ch · 2 days ago

    Actually, as to your edit, it sounds like you’re fine-tuning the model on your data, not training it from scratch. So the LLM has already seen English and Chinese during its initial training. Also, these models represent words as vectors, and what usually happens is that similar words’ vectors end up close together. So substituting e.g. “Dad” for “Papa” looks almost the same to an LLM, and the same holds across languages. But that’s not understanding; that’s behavior that much simpler models also have.
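    To make the “close vectors” point concrete, here is a minimal sketch with made-up toy embeddings (real models use hundreds or thousands of dimensions, and the numbers below are invented purely for illustration): near-synonyms like “dad” and “papa” score high on cosine similarity, while an unrelated word does not.

    ```python
    import math

    # Toy 3-dimensional "embeddings" -- values invented for illustration only.
    embeddings = {
        "dad":  [0.90, 0.80, 0.10],
        "papa": [0.88, 0.79, 0.12],
        "car":  [0.10, 0.20, 0.95],
    }

    def cosine_similarity(a, b):
        """Cosine of the angle between two vectors: 1.0 means same direction."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Near-synonyms sit close together in the vector space...
    print(cosine_similarity(embeddings["dad"], embeddings["papa"]))
    # ...while an unrelated word points in a different direction.
    print(cosine_similarity(embeddings["dad"], embeddings["car"]))
    ```

    In a trained model the same geometry tends to hold across languages, which is why swapping a word for its translation often lands in a nearby region of the space.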

    • brucethemoose@lemmy.world · 2 days ago

      True! Models not trained on a specific language are generally bad at that language.

      However, there are some exceptions, like a Japanese fine-tune of Qwen 32B that dramatically enhances its Japanese, but the training has to be pretty extensive.

      And even that aside… the effect is still there. The point is to illustrate that LLMs are sort of “language independent” internally, like you said.