OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train its large language models (LLMs).
But is the output original? That’s the real question here. If humans are allowed to learn from information publicly available, why can’t AI?
No, it isn’t original. AI output is just reorganized content that it has already seen.
AI doesn’t learn, and it doesn’t create derivative works. It does nothing more than reshuffle what it has already seen, to the point that it will frequently use phrases pulled directly from training data.
You are saying that it isn’t original content because AI can’t be original. I’m saying if the content isn’t distinguishable from original content, and can’t be directly traced to the source, in what way is it not original?
I think you hear a lot of college students say the same thing about their original work.
What I need is to see output from an AI and the original content side by side, and be able to say “yeah, the AI ripped this off”. If you can’t do that, then the AI is effectively emulating human learning.
No, it isn’t.
AI is math. That’s it. This over-humanization is crazy, and it’s scary that people can’t see the difference. It does not learn like a human.
https://www.vice.com/en/article/m7gznn/ai-spits-out-exact-copies-of-training-images-real-people-logos-researchers-find
https://techcrunch.com/2022/12/13/image-generating-ai-can-copy-and-paste-from-training-data-raising-ip-concerns/amp/
https://www.technologyreview.com/2023/02/03/1067786/ai-models-spit-out-photos-of-real-people-and-copyrighted-images/amp/
It’s a well-established problem. Tech companies have explicitly told employees not to use these services on company hardware or servers. The data is not abstracted from the user, and these models have been shown to output data that was put into them.