‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

Pressure grows on artificial intelligence firms over the content used to train their products

      • dhork@lemmy.world · 9 months ago

        ¿Porque no los dos?

        I don’t understand why people are defending AI companies sucking up all human knowledge by saying “well, yeah, copyrights are too long anyway”.

        Even if we went back to the pre-1976 term of 28 years, renewable once for a total of 56 years, there’s still a ton of recent works that AI are using without any compensation to their creators.

        I think it’s because people are taking this “intelligence” metaphor a bit too far and think if we restrict how the AI uses copyrighted works, that would restrict how humans use them too. But AI isn’t human, it’s just a glorified search engine. At least all standard search engines do is return a link to the actual content. These AI models chew up the content and spit out something based on it. It simply makes sense that this new process should be licensed separately, and I don’t care if it makes some AI companies go bankrupt. Maybe they can work adequate payment for content into their business model going forward.

        • deweydecibel@lemmy.world · 8 months ago

          It shouldn’t be cheap to absorb and regurgitate the works of humans the world over in an effort to replace those humans and subsequently enrich a handful of silicon valley people.

          Like, I don’t care what you think about copyright law and how corporations abuse it, AI itself is corporate abuse.

          And unlike copyright, which does serve its intended purpose of helping small time creators as much as it helps Disney, the true benefits of AI are overwhelmingly for corporations and investors. If our draconian copyright system is the best tool we have to combat that, good. It’s absolutely the lesser of the two evils.

          • lolcatnip@reddthat.com · 8 months ago

            Do you believe it’s reasonable, in general, to develop technology that has the potential to replace some human labor?

            Do you believe compensating copyright holders would benefit the individuals whose livelihood is at risk?

            the true benefits of AI are overwhelmingly for corporations and investors

            “True” is doing a lot of work here, I think. From my perspective, the main beneficiaries of technology like LLMs and Stable Diffusion are people trying to do their work more efficiently, people playing around, and small-time creators who suddenly have custom graphics to illustrate their videos, articles, etc. Maybe you’re talking about something different, like deepfakes? The downside of using a vague term like “AI” is that it’s too easy to accidentally conflate things that have little in common.

            • EldritchFeminity@lemmy.blahaj.zone · 8 months ago

              There are two general groups when it comes to AI, in my mind: those whose work would benefit from the increased efficiency AI in its various forms can bring, and those who want the rewards of work without putting in the effort of working.

              The former includes people like artists, who could use it to create iterations of concept sketches before choosing one to develop into a piece, making that part of their job easier/faster.

              Much of the opposition to AI comes from people who worry about, or who have been harmed by, the latter group. And it all comes down to the way the data sets are sourced.

              These are people who want to use the hard work of others for their own benefit without compensating them, and the corporations fall pretty squarely into this group. As does your comment about “small-time creators who suddenly have custom graphics to illustrate their videos, articles, etc.” Before AI, they were free to hire an artist to do that for them. MidJourney, for example, falls into this same category - the developers were caught discussing various artists that they “launder through a fine tuned Codex” (their words, not mine, here for source) for prompts. If these sorts of generators were using opt-in data sets, paying licensing fees to the creators, or getting permission to use their work in some other way, this tech could have tons of wonderful uses, like for those small-time creators. This is how music already works: there are entire businesses that license copyright-free music out to small-time creators for their videos and such, but they don’t go around recording bands and then splicing their songs up to create synthesizers to sell. They pay musicians to create those songs.

              Instead of doing what the guy behind IKEA did when he thought “people besides the rich deserve to be able to have furniture”, they’re cutting up Bob Ross paintings to sell as part of their collages to people who want to make art without having to actually learn how to make it or pay somebody to turn their idea into reality.

              Artists already struggle in a world that devalues creativity (I could make an entire rant on that, but the short of it is that the starving artist stereotype exists for a reason), and the way companies want to use AI like this is to turn the act of creating art into a commodity even more; to further divest the inherently human part of art from it. They don’t want to give people more time to create and think and enjoy life; they merely want to wring even more value out of them more efficiently. They want to take the writings of their journalists and use them to train the AI that they’re going to replace them with, like a video game journalism company did last fall with all of the writers they had on staff in their subsidiary companies. They think, “why keep 20 writers on staff when we can have a computer churn out articles for our 10 subsidiaries?”

              Last year, some guy took a screenshot of a piece of art that one of the artists for Genshin Impact was working on while livestreaming, ran it through some form of image generator, and then came back threatening to sue the artist for stealing his work.

              Copyright laws don’t favor the small guy, but they do help them protect their work as a byproduct of working for corporate interests. In the case of the Genshin artist, the fact that they were livestreaming their work and had undeniable, recorded proof that the work was theirs and not some rando in their stream meant that copyright law would’ve been on their side if it had actually gone anywhere rather than some asshole just being an asshole. Trademark isn’t quite the same, but I always love telling the story of the time my dad got a cease and desist letter from a company in another state for the name of a product his small business made. So he did some research, found out that they didn’t have the trademark for it in that state, got the trademark himself, and then sent them back their own letter with the names cut out and pasted in the opposite spots. He never heard from them again!

        • AnneBonny@lemmy.dbzer0.com · 8 months ago

          I don’t understand why people are defending AI companies sucking up all human knowledge by saying “well, yeah, copyrights are too long anyway”.

          Would you characterize projects like wikipedia or the internet archive as “sucking up all human knowledge”?

          • dhork@lemmy.world · 8 months ago

            In Wikipedia’s case, the text is (well, at least so far), written by actual humans. And no matter what you think about the ethics of Wikipedia editors, they are humans also. Human oversight is required for Wikipedia to function properly. If Wikipedia were to go to a model where some AI crawls the web for knowledge and writes articles based on that with limited human involvement, then it would be similar. But that’s not what they are doing.

            The Internet Archive is on somewhat less steady legal ground (see the recent legal actions), but in its favor, it only stores information for archival and lending purposes; it doesn’t use that information to generate derivative works which it then sells. (And it’s the lending that is getting it into trouble right now, not the archiving.)

            • phillaholic@lemm.ee · 8 months ago

              The Internet Archive has no ground to stand on at all. It would be one thing if they only allowed downloading of orphaned or unavailable works, but that’s not the case.

            • randon31415@lemmy.world · 8 months ago

              Wikipedia has had bots writing articles since the 2000 census information was first published. The 2000 census article writing bot was actually the impetus for Wikipedia to make the WP:bot policies.

          • assassin_aragorn@lemmy.world · 8 months ago

            Wikipedia is free to the public. OpenAI is more than welcome to use whatever they want if they become free to the public too.

          • MBM@lemmings.world · 8 months ago

            Does Wikipedia ever have issues with copyright? If you don’t cite your sources or use a copyrighted image, it will get removed

        • lolcatnip@reddthat.com · 8 months ago

          I don’t understand why people are defending AI companies

          Because it’s not just big companies that are affected; it’s the technology itself. People saying you can’t train a model on copyrighted works are essentially saying nobody can develop those kinds of models at all. A lot of people here are naturally opposed to the idea that the development of any useful technology should be effectively illegal.

          • BURN@lemmy.world · 8 months ago

            You can make these models just fine using licensed data. So can any hobbyist.

            You just can’t steal other people’s creations to make your models.

            • lolcatnip@reddthat.com · 8 months ago

              Of course it sounds bad when you use the word “steal”, but I’m far from convinced that training is theft, and using inflammatory language just makes me less inclined to listen to what you have to say.

              • BURN@lemmy.world · 8 months ago

                Training is theft imo. You have to scrape and store the training data, which amounts to copyright violation based on replication. It’s an incredibly simple concept. The model isn’t the problem here, the training data is.

      • HelloThere@sh.itjust.works · 9 months ago

        I’m no fan of the current copyright law - the Statute of Anne was much better - but let’s not kid ourselves that some of the richest companies in the world have any desire whatsoever to change it.

        • Gutless2615@ttrpg.network · 9 months ago

          My brother in Christ I’m begging you to look just a little bit into the history of copyright expansion.

            • Gutless2615@ttrpg.network · 8 months ago

              I only discuss copyright on posts about AI copyright issues. Yes, brilliant observation. I also talk about privacy issues on privacy-relevant posts, labor issues on worker-rights articles, and environmental justice on global warming pieces. Truly a brilliant and skewering observation. You’re a true internet private eye.

              Fair use and pushing back against (corporate serving) copyright maximalism is an issue I am passionate about and engage in. Is that a problem for you?

      • Fisk400@feddit.nu · 9 months ago

        As long as capitalism exists in society, just being able to go “yoink” and take everyone’s art will never be a practical rule set.

    • S410@lemmy.ml · 9 months ago

      Every work is protected by copyright, unless stated otherwise by the author.
      If you want to create a capable system, you want real data and you want a wide range of it, including data that is rarely considered to be a protected work, despite being one.
      I can guarantee you that you’re going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that’s compiled with permission of every copyright holder involved.

      • Exatron@lemmy.world · 9 months ago

        How hard it is doesn’t matter. If you can’t compensate people for using their work, or exclude the work of people who don’t want it used, you just don’t get that data.

        There’s plenty of stuff in the public domain.

      • HelloThere@sh.itjust.works · 9 months ago

        I never said it was going to be easy - and clearly that is why OpenAI didn’t bother.

        If they want to advocate for changes to copyright law then I’m all ears, but let’s not pretend they actually have any interest in that.

      • deweydecibel@lemmy.world · 8 months ago

        I can guarantee you that you’re going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that’s compiled with permission of every copyright holder involved.

        You make this sound like a bad thing.

  • unreasonabro@lemmy.world · 8 months ago

    Finally capitalism will notice how many times it has shot itself in the foot with its ridiculous, greedy infinite-copyright scheme.

    As a musician, people not involved in the making of my music make all my money nowadays instead of me anyway. burn it all down

  • 800XL@lemmy.world · 9 months ago

    I guess the lesson here is: pirate everything under the sun, and as long as you establish a company and train a bot, everything is a-ok. I wish we knew this back when everyone was getting dinged for torrenting The Hurt Locker.

    Remember when the RIAA got caught with pirated mp3s and nothing happened?

    What a stupid timeline.

  • Alien Nathan Edward@lemm.ee · 8 months ago

    if it’s impossible for you to have something without breaking the law you have to do without it

    if it’s impossible for the aristocrat class to have something without breaking the law, we change or ignore the law

      • Krauerking@lemy.lol · 8 months ago

        Oh sure. But why is it that only the massive AI push, with large companies owning models full of stolen material that churn out basic forgeries of the stolen works, gets to ignore the bullshit copyright laws?

        It wouldn’t be because it is super profitable for multiple large industries right?

    • NeatNit@discuss.tchncs.de · 8 months ago

      hijacking this comment

      OpenAI was IMHO well within its rights to use copyrighted materials when it was just doing research. They were* doing research on how far large language models can be pushed, where’s the ceiling for that. It’s genuinely good research, and if copyrighted works are used just to research and what gets published is the findings of the experiments, that’s perfectly okay in my book - and, I think, in the law as well. In this case, the LLM is an intermediate step, and the published research papers are the “product”.

      The unacceptable turning point is when they took all the intermediate results of that research and flipped them into a product. That’s not the same, and most or all of us here can agree - this isn’t okay, and it’s probably illegal.

      * disclaimer: I’m half-remembering things I’ve heard a long time ago, so even if I phrase things definitively I might be wrong

      • dasgoat@lemmy.world · 8 months ago

        True, with the acknowledgement that this was their plan all along and the research part was always intended to be used as a basis for a product. They just used the term ‘research’ as a workaround that allowed them to do basically whatever to copyrighted materials, fully knowing that they were building a marketable product at every step of their research

        That is how these people essentially function: they’re the tax-loophole guys who make sure Amazon pays less in taxes than you and I do. They are scammers who have no regard for ethics, and they can and will use whatever they can to reach their goal. If that involves lying about doing research when in actuality you’re doing product development, they will do that without hesitation. The fact that this product now exists means lawmakers are faced with a reality where the crimes are already in the past, and all they can do is try to legislate around this thing that now exists. And they will do that poorly, because they don’t understand AI.

        And this just covers fraud with regard to research and copyright. Recently it came out that LAION-5B, the image data set used to train Stable Diffusion, contained at least 1,000 images of child sexual abuse material. We don’t know what OpenAI did to mitigate the risk of their seemingly indiscriminate web scrapers picking up harmful content.

        AI is not a future, it’s a product that essentially functions to repeat garbled junk out of things we have already created, all the while creating a massive burden on society with its many, many drawbacks. There are little to no arguments FOR AI, and many, many, MANY to stop and think about what these fascist billionaire ghouls are burdening society with now. Looking at you, Peter Thiel. You absolute ghoul.

        • NeatNit@discuss.tchncs.de · 8 months ago

          True, with the acknowledgement that this was their plan all along and the research part was always intended to be used as a basis for a product. They just used the term ‘research’ as a workaround that allowed them to do basically whatever to copyrighted materials, fully knowing that they were building a marketable product at every step of their research

          I really don’t think so. I do believe OpenAI was founded with genuine good intentions. But around the time it transitioned from a non-profit to a for-profit, those good intentions were getting corrupted, culminating in the OpenAI of today.

          The company’s unique structure, with a non-profit’s board of directors controlling the company, was supposed to subdue or prevent short-term gain interests from taking precedence over long-term AI safety and other such things. I don’t know any of the details beyond that. We all know it failed, but I still believe the whole thing was set up in good faith, way back when. Their corruption was a gradual process.

          There are little to no arguments FOR AI

          Outright not true. There’s so freaking many! Here’s some examples off the top of my head:

          • Just today, my sister told me how ChatGPT (her first time using it) identified a song for her based on her vague description of it. She has been looking for this song for months with no success, even though she had pretty good key details: it was a duet, released around 2008-2012, and she even remembered a certain line from it. Other tools simply failed, and ChatGPT found it instantly. AI is just a great tool for these kinds of tasks.
          • If you have a huge amount of data to sift through, looking for something specific but that isn’t presented in a specific format - e.g. find all arguments for and against assisted dying in this database of 200,000 articles with no useful tags - then AI is the perfect springboard. It can filter huge datasets down to just a tiny fragment, which is small enough to then be processed by humans.
          • Using AI to identify potential problems and pitfalls in your work, which can’t realistically be caught by directly programmed QA tools. I have no particular example in mind right now, unfortunately, but this is a legitimate use case for AI.
          • Also today, I stumbled upon Rapid, a map editing tool for OpenStreetMap which uses AI to predict and suggest things to add - with the expectation that the user will make sure the suggestions are good before accepting them. I haven’t formed a full opinion about it in particular (and I’m especially wary because it was made by Facebook), but these kinds of productivity boosters are another legitimate use case for AI. Also in this category is GitHub’s Copilot, which is its own can of worms, but if Copilot’s training data hadn’t been stolen the way it was, I don’t think I’d have many problems with it. It looks like a fantastic tool (I’ve never used it myself) with very few downsides for society as a whole. Again, other than the way it was trained.

          As for generative AI and pictures especially, I can’t as easily offer non-creepy uses for it, but I recommend you see this video which takes a very frank take on the matter: https://nebula.tv/videos/austinmcconnell-i-used-ai-in-a-video-there-was-backlash if you have access to Nebula, https://www.youtube.com/watch?v=iRSg6gjOOWA otherwise.
          Personally I’m still undecided on this sub-topic.

          Deepfakes etc. are just plain horrifying, you won’t hear me give them any wiggle room.

          Don’t get me wrong - I am not saying OpenAI isn’t today rotten at the core - it is! But that doesn’t mean ALL instances of AI that could ever be are evil.

          • dasgoat@lemmy.world · 8 months ago

            ‘It’s just this one that is rotten to the core’

            ‘Oh and this one’

            ‘Oh this one too huh’

            ‘Oh shit the other one as well’

            Yeah, you’re not convincing me of shit. I haven’t even mentioned the goddamn digital slavery these operations are running, or how this shit is polluting our planet so someone somewhere can get some AI child porn. Fuck that shit.

            You’re afraid to look behind the curtains because you want to ride the hypetrain. Have fun while it lasts, I hope it burns every motherfucker who thought this shit was a good idea to the motherfucking ground.

  • S410@lemmy.ml · 9 months ago

    They’re not wrong, though?

    Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use and only a fraction of that already small fraction has been explicitly licensed using permissive licenses.

    Things that we don’t even think about as “protected works” are in fact just that. Doesn’t matter what it is: napkin doodles, writings on bathrooms stall walls, letters written to friends and family. All of those things are protected, unless stated otherwise. And, I don’t know about you, but I’ve never seen a license notice attached to a napkin doodle.

    Now, imagine trying to raise a child while avoiding every piece of information like that; information that you aren’t licensed to use. You wouldn’t end up with a person well suited to exist in the world. They’d lack education in science and technology, they’d have no understanding of pop culture, they’d know no brand names, etc.

    Machine learning models are similar. You can train them that way, sure, but they’d be basically useless for real-world applications.

    • AntY@lemmy.world · 9 months ago

      The main difference between the two in your analogy, that has great bearing on this particular problem, is that the machine learning model is a product that is to be monetized.

      • S410@lemmy.ml · 9 months ago

        Not necessarily. There’s plenty that are open source and available for free to anyone willing to provide their own computational power.
        In cases where you pay for a service, it could be argued that you aren’t paying for the access to the model or its results, but the convenience and computational power necessary to run the model.

    • Exatron@lemmy.world · 9 months ago

      The difference here is that a child can’t absorb and suddenly use massive amounts of data.

      • S410@lemmy.ml · 9 months ago

        The act of learning is absorbing and using massive amounts of data. Almost any child can, for example, re-create copyrighted cartoon characters in their drawing or whistle copyrighted tunes.

        If you look at pretty much any human-created work, you will be able to trace elements of it to many different sources. We usually call that “sources of inspiration”. Of course, in the case of human-created works, it’s not a big deal. Generally, it’s considered transformative and fair use.

        • Barbarian@sh.itjust.works · 9 months ago

          I really don’t understand this whole “learning” thing that everybody claims these models are doing.

           A Markov chain algorithm that takes text as input and outputs the next predicted word isn’t colloquially called “learning”, yet it’s fundamentally the same process, just less sophisticated.

          They take input, apply a statistical model to it, generate output derived from the input. Humans have creativity, lateral thinking and the ability to understand context and meaning. Most importantly, with art and creative writing, they’re trying to express something.

          “AI” has none of these things, just a probability for which token goes next considering which tokens are there already.
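
           The Markov-chain comparison is easy to make concrete. Below is a minimal sketch (the corpus and function names are my own illustration, not from any real library): build a table of which words follow which, then repeatedly sample the next word from the observed frequencies.

```python
import random
from collections import defaultdict

def train_markov(text):
    """Record, for each word, every word that follows it in the corpus."""
    table = defaultdict(list)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        table[cur].append(nxt)
    return table

def generate(table, start, n, rng):
    """Sample the next word from the observed followers, up to n times."""
    out = [start]
    for _ in range(n):
        followers = table.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
table = train_markov(corpus)
print(generate(table, "the", 5, random.Random(0)))
```

           Sampling more frequent followers more often is exactly a statistical next-token model; an LLM replaces the lookup table with a neural network conditioned on far more context.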

          • agamemnonymous@sh.itjust.works · 9 months ago

            Humans have creativity, lateral thinking and the ability to understand context and meaning

            What evidence do you have that those aren’t just sophisticated, recursive versions of the same statistical process?

            • Barbarian@sh.itjust.works · 8 months ago

              I think the best counter to this is to consider the zero learning state. A language model or art model without any training data at all will output static, basically. Random noise.

              A group of humans socially isolated from the rest of the world will independently create art and music. It has happened an uncountable number of times. It seems to be a fairly automatic emergent property of human societies.

              With that being the case, we can safely say that however creativity works, it’s not merely compositing things we’ve seen or heard before.
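
              The zero-training point is easy to demonstrate: with no learned statistics, every token is equally likely, so “generation” is indistinguishable from uniform random noise. A tiny sketch (the vocabulary is an arbitrary choice for illustration):

```python
import random

def sample_untrained(vocab, n, rng):
    """A model with no training data has uniform probabilities over its
    vocabulary, so sampling from it just draws tokens at random."""
    return [rng.choice(vocab) for _ in range(n)]

vocab = list("abcdefghijklmnopqrstuvwxyz ")
print("".join(sample_untrained(vocab, 40, random.Random(42))))
```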

              • agamemnonymous@sh.itjust.works · 8 months ago

                I disagree with this analysis. Socially isolated humans aren’t isolated, they still have nature to imitate. There’s no such thing as a human with no training data. We gather training data our whole life, possibly from the womb. Even in an isolated group, we still have others of the group to imitate, who in turn have ancestors, and again animals and natural phenomena. I would argue that all creativity is precisely compositing things we’ve seen or heard before.

          • sus@programming.dev · 8 months ago

            I don’t think “learning” is a word reserved only for high-minded creativeness. Just rote memorization and repetition is sometimes called learning. And there are many intermediate states between them.

          • testfactor@lemmy.world · 9 months ago

            Out of curiosity, how far do you extend this logic?

            Let’s say I’m an artist who does fractal art, and I do a line of images where I take JPEGs of copyright-protected art and use the data as a seed for my fractal generation function.

            Have I then, in that instance, taken a copyrighted work and simply applied some static algorithm to it and passed it off as my own work, or have I done something truly transformative?

            The final image I’m displaying as my own art has no meaningful visual cues to the original image, as it’s just lines and colors generated using the image as a seed, but I’ve also not applied any “human artistry” to it, as I’ve just run it through an algorithm.

            Should I have to pay the original copyright holder?
            If so, what makes that fundamentally different from me looking at the copyrighted image and drawing something that it inspired me to draw?
            If not, what makes that fundamentally different from AI images?
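
            For what it’s worth, the image-as-seed idea can be made concrete in a few lines (a hypothetical sketch; the hashing step and Julia-set parameters are my own assumptions, not anyone’s actual process):

```python
import hashlib
import random

def julia_c_from_image(image_bytes: bytes) -> complex:
    # Hash the source image's bytes and use the digest as an RNG seed.
    seed = int.from_bytes(hashlib.sha256(image_bytes).digest()[:8], "big")
    rng = random.Random(seed)
    # The Julia-set parameter is drawn from the seeded RNG; the rendered
    # fractal shares no pixels with the source image.
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

def escape_time(z: complex, c: complex, max_iter: int = 100) -> int:
    # Standard escape-time iteration used to color each pixel.
    for i in range(max_iter):
        if abs(z) > 2:
            return i
        z = z * z + c
    return max_iter
```

            The output is fully determined by the copyrighted bytes, yet visually unrelated to them, which is exactly what makes the question interesting.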

              • testfactor@lemmy.world
                link
                fedilink
                English
                arrow-up
                1
                arrow-down
                1
                ·
                8 months ago

                I feel like you latched on to one sentence in my post and didn’t engage with the rest of it at all.

                That sentence, in your defense, was my most poorly articulated, but I feel like you responded devoid of any context.

                Am I to take it, from your response, that you think a fractal image that uses a copyrighted image as the seed to its random number generator would be copyright infringement?

                If so, how much do I, as the creator, have to “transform” that base binary string to make it “fair use” in your mind? Are random bit flips sufficient?
                If so, how is me doing that by hand different from having a machine do it as a tool? If not, how is that different from me editing the bits using a graphical tool?
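
                The “bit flips” question at least has a concrete form (a toy sketch; the function name and parameters are made up for illustration):

```python
import random

def flip_random_bits(data: bytes, n_flips: int, seed: int = 0) -> bytes:
    # XOR a randomly chosen bit in a randomly chosen byte, n_flips times.
    rng = random.Random(seed)
    buf = bytearray(data)
    for _ in range(n_flips):
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    return bytes(buf)
```

                At n_flips = 1 the result is trivially derivative; at millions of flips it is indistinguishable from noise. Where on that dial “fair use” begins is exactly the question.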

        • Exatron@lemmy.world
          link
          fedilink
          English
          arrow-up
          2
          ·
          8 months ago

          The problem is that a human doesn’t absorb exact copies of what it learns from, and fair use doesn’t include taking entire works, shoving them in a box, and shaking it until something you want comes out.

          • S410@lemmy.ml
            link
            fedilink
            English
            arrow-up
            1
            arrow-down
            2
            ·
            8 months ago

            Except for all the cases when humans do exactly that.

            A lot of learning is, really, little more than memorization: spelling of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don’t count?

            Then there’s things like sayings, which are entire phrases that only really work if they’re repeated verbatim. You sure can deliver the same idea using different words, but it’s not the same saying at that point.

            To make a cover of a song, for example, you have to memorize the lyrics and melody of the original, exactly, to be able to re-create it. If you want to make that cover in the style of some other artist, you, obviously, have to learn their style: that is, analyze and memorize what makes that style unique. (e.g. C418 - Haggstrom, but it’s composed by John Williams)

            Sometimes the artists don’t even realize they’re doing exactly that, so we end up with “subconscious plagiarism” cases, e.g. Bright Tunes Music v. Harrisongs Music.

            Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, I, or even current machine learning systems. And for that they’re praised.

            • Exatron@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              8 months ago

              Except they literally don’t. Human memory doesn’t retain an exact copy of things. Very good isn’t the same as exactly. And human beings can’t grab everything they see and instantly use it.

              • S410@lemmy.ml
                link
                fedilink
                English
                arrow-up
                1
                arrow-down
                1
                ·
                edit-2
                8 months ago

                Machine learning doesn’t retain an exact copy either. How on earth could a model trained on terabytes of data be only a few gigabytes in size, yet contain “exact copies” of everything? If “AI” could function as a compression algorithm, it’d definitely be used as one. But it can’t, so it isn’t.
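
                The size argument is easy to check with back-of-envelope numbers (the figures below are illustrative assumptions, not stats for any specific model):

```python
# Illustrative numbers only (assumptions, not figures for any real model):
training_data_tb = 45    # size of the training corpus, in terabytes
model_size_gb = 350      # size of the trained weights, in gigabytes

ratio = (training_data_tb * 1024) / model_size_gb
print(f"training data is ~{ratio:.0f}x larger than the model")
# No general-purpose lossless compressor gets anywhere near a 100x-plus
# ratio on mixed data, so the weights cannot store verbatim copies of
# the whole training set.
```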

                Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an “exact” re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.

                Here’s an image from an article claiming that machine learning image generators plagiarize things.

                However, if you take a second to look at the image, you’ll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren’t one-to-one copies. It doesn’t take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.

                If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific shot from a movie they’ve seen a couple dozen or hundred times, they’d most likely produce something remarkably similar: very close in accuracy to what machine learning image generators are capable of at the moment.

  • Milk_Sheikh@lemm.ee
    link
    fedilink
    English
    arrow-up
    24
    arrow-down
    3
    ·
    8 months ago

    Wow! You’re telling me that onerous and crony copyright laws stifle innovation and creativity? Thanks for solving the mystery guys, we never knew that!

  • McArthur@lemmy.world
    link
    fedilink
    English
    arrow-up
    22
    arrow-down
    2
    ·
    8 months ago

    It feels to me like every other post on Lemmy is talking about how copyright is bad and should be changed, or how piracy is caused by fragmentation and difficulty accessing content (streaming sites). Then whenever this topic comes up, everyone completely flips. But in my mind all this would do is fragment the AI market much like streaming services (suddenly you have 10 different models with different licenses) and make it harder for non-mega-corps without infinite money to fund their own LLMs (of good quality).

    Like seriously, can’t we just stay consistent and keep saying copyright is bad even in this case? It’s not really an AI problem that jobs are affected, just a capitalism problem. Throw in some good social safety nets and tax these big AI companies, and we wouldn’t even have to worry about artists’ well-being.

    • Marxism-Fennekinism@lemmy.ml
      link
      fedilink
      English
      arrow-up
      18
      ·
      edit-2
      8 months ago

      I think looking at copyright in a vacuum is unhelpful because it’s only one part of the problem. IMO, the reason people are okay with piracy of name-brand media but are not okay with OpenAI using human-created artwork follows from the same logic of not liking companies and capitalism in general. People don’t like that AI extracts value from individual artists to make the rich even richer while giving nothing back to those artists, in the same way we object to massive and extremely profitable media companies paying their artists peanuts. It’s also extremely hypocritical that the government, and by extension “copyright”, seems to care much more that OpenAI is using name-brand media than that OpenAI is scraping the internet for independent artists’ work.

      Something else to consider is that AI is also undermining copyleft licenses. We saw this with GitHub Copilot, a 100% proprietary product that was nonetheless trained on all of GitHub’s user-generated code, including GPL and other copyleft-licensed code. The art equivalent would be CC-BY-SA licenses, where derivatives also have to be Creative Commons.

      • McArthur@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        8 months ago

        Maybe I’m optimistic, but I think your comparison to big media companies paying their artists peanuts highlights to me that the best outcome is to let AI go wild and just… provide some form of government support (I don’t care what form, that’s another discussion). Because in the end, the more stuff we can train AI on freely, the faster we automate away labour.

        I think another good comparison is reparations. If you could come to me with some plan that perfectly pays out the correct amount of money to every person on earth impacted by slavery and other racist policies, to make up for what they missed out on, I’d probably be fine with it. But that is such a complex (impossible, I’d say) task that it can’t be done, so I end up being against reparations and instead just say “give everyone money; it might overcompensate some, but better that than undercompensating others”. Why bother figuring out such a complex, costly, and bureaucratic way to repay artists when we could just give everyone robust social services, paid for by taxing AI products an amount equal to however much money they have removed from the workforce through automation?

    • MrSqueezles@lemm.ee
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      8 months ago

      Journalist: Read a press release. Write it in my own words. See some Tweets. Put them together in a page padded with my commentary. Learn from, reference, and quote copyrighted material everywhere.

      AI

      I do that too.

      Journalists

      How dare AI learn! Especially from copyrighted material!

      • Boiglenoight@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        arrow-down
        1
        ·
        8 months ago

        Journalists need to survive. AI is a tool for profit, with no need to eat, sleep, pay for kids clothes or textbooks.

    • rottingleaf@lemmy.zip
      link
      fedilink
      English
      arrow-up
      1
      ·
      8 months ago

      Which jobs are going to be affected really?

      One thing is for certain, the “open” web is going to become a junkyard even more than it is now.

  • CosmoNova@lemmy.world
    link
    fedilink
    English
    arrow-up
    19
    arrow-down
    1
    ·
    edit-2
    9 months ago

    Let’s wait until everyone is laid off and it’s ‘impossible’ to get by without mass looting then, shall we?

  • Treczoks@lemmy.world
    link
    fedilink
    English
    arrow-up
    20
    arrow-down
    3
    ·
    8 months ago

    If a business relies on breaking the law as the foundation of its business model, it is not a business but an organized crime syndicate. A mafia.

  • Chee_Koala@lemmy.world
    link
    fedilink
    English
    arrow-up
    33
    arrow-down
    17
    ·
    9 months ago

    But our current copyright model is so robust and fair! They will only have to wait 70 years after the author dies (or 95 years for corporate works), which is a completely normal period.

    If you want to control your creations, you are completely free to NOT publish them. Nowhere is it stated that, to be valuable or beautiful, a work has to be shared on the world podium.

    We could have a very restrictive copyright for works that were not globally transmitted/published, and another for works whose copyright owner DID choose to broadcast them globally. They get a couple of years to cash in, and then after, I dunno, 5 years, we can all use the work as we see fit. If you use mass media to broadcast creative works but then get mad when the public transforms or remixes your work, you are part of the problem.

    Current copyright is just a tool for folks with power to control that power. It’s what a boomer would make driving their tractor / SUV while chanting to themselves: I have earned this.

      • just_change_it@lemmy.world
        link
        fedilink
        English
        arrow-up
        10
        arrow-down
        2
        ·
        9 months ago

        I think it’s pretty amazing when people just run with the dogma that empowers billionaires.

        Every creator hopes they’ll be the next Taylor Swift, that they’ll retain control of their art for those life-plus-70 years and make enough to create their own little dynasty.

        The reality is that long duration copyright is almost exclusively a tool of the already wealthy, not a tool for the not-yet-wealthy. As technology improves it will be easier and easier for wealth to control the system and deny the little guy’s copyright on grounds that you used something from their vast portfolio of copyright/patent/trademark/ipmonopolyrulelegalbullshit. Already civil legal disputes are largely a function of who has the most money.

        I don’t have the solution that helps artists earn a living, but it doesn’t seem like copyright is doing them many favors as-is unless they are retired rockstars who have already earned in excess of the typical middle class lifetime earnings by the time they hit 35, or way earlier.

      • drislands@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        arrow-down
        2
        ·
        9 months ago

        Them: “Oh yeah I have 10 minutes until my dentist appointment, I’ll check that out.”

      • Chee_Koala@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        2
        ·
        edit-2
        8 months ago

        First:

        I truly believe that they don’t matter as an individual when looking at their creation as a whole. It matters among their loved ones, and to that person themselves. Why do you need more… importance? From whom? Why do you need to matter in the scope of creation? Is it a creation for you? Then why publish it? Is it a creation for others? Then why does your identity matter? It just seems like egotism with extra steps. Using copyright to combat this seems like a red herring argument made by people who have portfolios against people who don’t…

        You are not only your own person, you carry human culture remnants distilled out of 12000 years of humanity! You plagiarised almost the whole of humanity while creating your ‘unique’ addition to culture. But, because your remixed work is newer and not directly traceable to its direct origins, we’re gonna pretend that you wrote it as a hermit living without humanity on a rock and establish the rules from there on out. If it was fair for all the players in this game, it would already be impossible to not plagiarise.

      • h3rm17@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        4
        arrow-down
        4
        ·
        9 months ago

        Funny thing is, human artists work quite similarly to AI, in that they take the whole of human art creation, build on it, and create something new (sometimes quite derivative). No art comes out of a vacuum; it builds on previous works. I would not really say AI plagiarizes anything, unless it reproduces pretty much the exact work of someone.

  • phillaholic@lemm.ee
    link
    fedilink
    English
    arrow-up
    16
    arrow-down
    2
    ·
    8 months ago

    A ton of people need to read some basic background on how copyright, trademark, and patents protect people. Having none of those things would be horrible for modern society: it would wipe out millions of jobs and medical advancements, and put control in the hands of whichever companies can steal and strong-arm best. If you want to live in a world run by Mafia-style big business, then sure.

    • 31337@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      2
      ·
      8 months ago

      Meh, patents are monopolies over ideas, do much more harm than good, and help big business much more than they help the little guy. Being able to own an idea seems crazy to me.

      I marginally support copyright laws, just because they provide a legal framework to enforce copyleft licenses. Though I think copyright is abused too much in places like YouTube. In regards to training generative AI, the goal is not to copy works; that would make the models less useful. It’s very much fair use.

      Trademarks are generally good, but sometimes abused as well.

      • phillaholic@lemm.ee
        link
        fedilink
        English
        arrow-up
        1
        ·
        8 months ago

        Patents don’t let you own an idea. They give you an exclusive right to use the idea for a limited time in exchange for detailed documentation on how your idea works. Once the patent expires everyone can use it. But while it’s under patent anyone can look up the full documentation and learn from it. Without this, big business could reverse engineer the little guys invention and just steal it.

        • 31337@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          8 months ago

          It goes both ways. As someone who has tried bringing new products to market, it’s extremely annoying that nearly everything you can think of already has a similar patent. I’ve also reverse-engineered a few things (circuits and disassembled code) as a little guy working for a small business. I don’t think people usually scan patents to learn things, and reverse engineering usually isn’t too hard.

          If I were a capitalist, I’d argue that if a big business “steals” an idea, and implements it more effectively and efficiently than the small business, then the small business should probably fail.

          • phillaholic@lemm.ee
            link
            fedilink
            English
            arrow-up
            1
            ·
            8 months ago

            Amazon is practically a case study on your last point. They routinely copy competitors products that use their platform to sell, taking most of the profits for themselves and sometimes putting those others out of business. I don’t see that as a good thing, it’s anticompetitive and eventually the big business just squeezes for more profit.

    • xenoclast@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      8 months ago

      I agree with you in part. It’s moot anyway: it’s the current law of the land, the glue of society and all that. It’s illegal now, so they shouldn’t do it.

      If you have enough money (required) and make a solid legal argument to change the laws (optional: depends on how much money you start with) then they can do it… But for now they should STFU and shut the fuck down.

    • BlueMagma@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      1
      ·
      8 months ago

      I see and understand your point regarding trademark, but I don’t understand how removing copyright or patents would have this effect, could you elaborate ?

      • mihnt@lemmy.world
        link
        fedilink
        English
        arrow-up
        13
        ·
        8 months ago

        Small business comes up with something, big business takes idea and puts it in all their stores/factories. Small business loses out because they can’t compete. Small business goes poof trying to compete.

        • BlueMagma@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          4
          arrow-down
          2
          ·
          8 months ago

          Isn’t that what is already happening with our current system? The little guy never has the resources to fight a legal battle against the big guy and enforce his “intellectual property”.

          And the opposite would be true in a world without patents: small businesses could win because they would be free to reuse and adapt big businesses’ ideas.

          It feels very simplistic to reduce patents to “protection of the little business”; in our current world they mostly protect the big ones.

          Also, this small example doesn’t explain how removing copyright would so negatively affect our society.

          • BURN@lemmy.world
            link
            fedilink
            English
            arrow-up
            6
            ·
            8 months ago

            I mean we’ve seen it work multiple times against Apple where a smaller company has been able to enforce their patent against them.

          • aesthelete@lemmy.world
            link
            fedilink
            English
            arrow-up
            6
            ·
            edit-2
            8 months ago

            There’s a reason why the sharks on shark tank ask if ideas are patented. Without a patent, your idea can be ripped off without any recompense.

            Sure there are problems with some patents, such as software patents, but the system should be reformed rather than completely tossed.

          • mihnt@lemmy.world
            link
            fedilink
            English
            arrow-up
            2
            ·
            8 months ago

            Well, I was just giving an example of something that is bad about not having a patent system. Personally, I think the patent system is good thing, but it needs a lot of reworking and we don’t and probably won’t ever have the proper government to fix it what with all the big businesses living in the politician’s pockets.

  • Venia Silente@lemm.ee
    link
    fedilink
    English
    arrow-up
    18
    arrow-down
    4
    ·
    8 months ago

    “Impossible”? They just need to ask for permission from each source. It’s not like they don’t already know who the sources are, since the AIs are issuing HTTP(S) requests to fetch them.