• Star@sopuli.xyzOP
    link
    fedilink
    English
    arrow-up
    143
    arrow-down
    5
    ·
    edit-2
    8 months ago

    It’s so ridiculous: when corporations steal everyone’s work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it’s somehow illegal, unethical, immoral and whatnot.

    • richieadler@lemmy.myserv.one
      link
      fedilink
      English
      arrow-up
      4
      ·
      edit-2
      8 months ago

      Cue the Max Headroom episode where the blanks (disconnected people) are chased by the censors because the blanks steal cable so their children can watch the educational shows and learn to read, and they are forced to use clandestine printing presses to teach them.

      • mPony@kbin.social
        link
        fedilink
        arrow-up
        3
        ·
        8 months ago

        what’s this? an anti-corporate message that sneers at cable TV companies??? CANCEL THAT SHOW!!!

        that show was so amazingly prescient: the theme of the first episode was how advertising literally kills its viewers and the news covers things up. No wonder they didn’t get renewed. ;)

    • Grimy@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      1
      ·
      8 months ago

      Using publicly available data to train isn’t stealing.

      Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can’t use copyrighted materials freely to train on, it brings up the cost in such a way that only a handful of companies can afford the data.

      They want to kill the open-source scene and are manipulating you to do so. Don’t build their moat for them.

      • givesomefucks@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        edit-2
        8 months ago

        And using publicly available data to train gets you a shitty chatbot…

        Hell, even using copyrighted data to train isn’t that great.

        Like, what do you even think they’re doing here for your conspiracy?

        You think OpenAI is saying they should pay for the data? They’re trying to use it for free.

        Was this a meta joke and you had a chatbot write your comment?

        • tourist@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          8 months ago

          “Was this a meta joke and you had a chatbot write your comment?”

          if someone said this to me I’d cry

        • webghost0101@sopuli.xyz
          link
          fedilink
          English
          arrow-up
          0
          ·
          edit-2
          8 months ago

          The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out. Prime example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

          This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega-corporations that can pay copyright fines as a cost of doing business will be able to afford to train decent AI.

          The only other option to produce any ai of such type is a very narrow curated set of known materials with a public use license but that is not going to get you anything competent on its own.

          EDIT: In case it isn’t clear, I am clarifying what I understood from Grimy@lemmy.world’s comment, not adding to it.

          • givesomefucks@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            ·
            8 months ago

            That’s insane logic…

            Like you’re essentially saying I can copy/paste any article without a paywall to my own blog and sell adspace on it…

            And you’re still saying OpenAI is trying to make AI companies pay?

            Like, do you think AI runs off free cloud services? The hardware is insanely expensive.

            And OpenAI is trying to argue the opposite, that AI companies shouldn’t have to pay to use copyrighted works.

            You have zero idea what is going on, but you are really confident you do

            • webghost0101@sopuli.xyz
              link
              fedilink
              English
              arrow-up
              0
              ·
              8 months ago

              I clarified the comment above which was misunderstood, whether it makes a moral/sane argument is subjective and i am not covering that.

              I am not sure why you think there is a claim that openAI is trying to make companies pay, on the contrary the comment i was clarifying (so not my opinion/words) states that openAI is making an argument that anyone should be able to use copyrighted materials for free to train AI.

              The cost of running an online service like ChatGPT is wildly beside the point of the argument presented. You can run your own open-source large language models at home about as well as you can run Bethesda’s Starfield on a PC with the same specs.

              Those Open source large language models are trained on the same collections of data including copyrighted data.

              The logic being used here is:

              If it becomes globally forbidden to train AI with copyrighted materials, or there is a large price or fine attached to using them for training, then the non-corporate, free, open-source side of AI will perish or have to go underground, while the for-profit mega-corporations will continue to exploit and train AI as usual because they can pay to settle in court.

              The Ethical dilemma as i understand it is:

              Allowing AI to train for free is a direct threat to creatives and a win for BigProfit Entertainment; not allowing it to train for free is a threat to public, democratic AI and a win for BigTech merging with BigCrime.

              • Grimy@lemmy.world
                link
                fedilink
                English
                arrow-up
                0
                arrow-down
                1
                ·
                8 months ago

                That is very well put, I really wish I could have started with that.

                Though I envision it as a loss for BigProfit Entertainment, since I see this as a real boon for the indie gaming, animation and eventually filmmaking industries.

                It’s definitely overall quite a messy situation.

          • RainfallSonata@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            arrow-down
            1
            ·
            8 months ago

            I didn’t want any of this shit. IDGAF if we don’t have AI. I’m still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.

            • RememberTheApollo@lemmy.world
              link
              fedilink
              English
              arrow-up
              0
              ·
              8 months ago

              It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.

              That’s what the drive for AI is all about.

            • Grimy@lemmy.world
              link
              fedilink
              English
              arrow-up
              0
              arrow-down
              1
              ·
              8 months ago

              You don’t have to use it. You can even disconnect from the internet completely.

              What’s the benefit of stopping me from using it?

        • Grimy@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          arrow-down
          1
          ·
          8 months ago

          If the data has to be paid for, OpenAI will gladly do it with a smile on their face. It guarantees them a monopoly and ownership of the economy.

          Paying more but having no competition except Google is a good deal for them.

          • givesomefucks@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            ·
            edit-2
            8 months ago

            Eh, the issue is lots of people wouldn’t be willing to sell tho.

            Like, you think an author wants the chatbot to read their collected works and use that? Regardless of if it’s quoting full texts or “creating” text in their style.

            No author is going to want that.

            And if it’s up to publishers, they likely won’t either. Why take one small payday if that could potentially lead to a loss of sales a few years down the road?

            It’s not like the people making the chatbots just need to buy a retail copy of the text to be in the legal clear.

            • Grimy@lemmy.world
              link
              fedilink
              English
              arrow-up
              0
              arrow-down
              1
              ·
              8 months ago

              The publishers will absolutely sell, imo. They just publish; the book will be worth the same with or without the help of AI to write it.

              I guess there is a possibility that people start replacing bought books with personalized book llm outputs but that strikes me as unlikely.

      • TwilightVulpine@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        8 months ago

        OpenAI is definitely not the one arguing that they have stolen data to train their AIs, and Disney will be fine whether AI requires owning the rights to training materials or not. Small artists, the ones protesting the most against it, will not. They are already seeing jobs and commission opportunities declining due to it.

        Being publicly available in some form is not permission to use and reproduce those works however you feel like. Only the real owner has the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but as we get to a point where we are driving away the very same artists that we enjoy and get inspired by, maybe we should be a bit more understanding about their position.

        • Grimy@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          ·
          edit-2
          8 months ago

          That’s basically my main point: Disney doesn’t need the data, and neither does Getty. AI isn’t going away and the jobs will be lost no matter what.

          Putting a price tag in the high millions for any kind of generative model only benefits the big players.

          I feel for the artists. It was already a very competitive domain that didn’t really pay well and it’s now much worse but if they aren’t a household name, they aren’t getting a dime out of any new laws.

          I’m not ready to give the economy to Microsoft, Google, Getty and Adobe so GRRM can get a fat payday.

          • TwilightVulpine@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            ·
            8 months ago

            If AI companies lose, small artists may have the recourse of seeking compensation for the use and imitation of their art too. Just feeling for them is not enough if they are going to be left to the wolves.

            There isn’t a scenario here in which big media companies lose so talking of it like it’s taking a stand against them doesn’t make much sense. What are we fighting for here? That we get to generate pictures of Goofy? The small AI user’s win here seems like such a silly novelty that I can’t see how it justifies just taking for granted that artists will have it much rougher than they already have.

            The reality here is that even if AI gets the free pass, large media and tech companies are still primed to profit from them far more than any small user. They will be the one making AI-assisted movies and integrating chat AI into their systems. They don’t lose in either situation.

            There are ways to train AI without relying on unauthorized copyrighted data. Even if OpenAI loses, it wouldn’t be the death of the technology. It may be more efficient and effective to train them with that data, but why is “efficiency” enough to justify this overreach?

            And is it even wise to be so callous about it? Because it’s not going to stop with artists. This technology has the potential to replace large swaths of service industries. If we don’t think of the human costs now, it will be even harder to make a case for everyone else.

            • Grimy@lemmy.world
              link
              fedilink
              English
              arrow-up
              0
              arrow-down
              1
              ·
              8 months ago

              I fully believe AI will be able to replace 50% or more of desk jobs in the near future. It’s definitely a complicated situation and you make good points.

              First and foremost, I think it’s imperative the barrier for entry for model training is as low as possible. Anything else basically gives a select few companies the ability to charge a huge subscription fee on all our goods and services.

              The data needed is pretty heavy as well; it’s not very feasible to go off of donated or public domain data.

              I also think any job loss is virtually guaranteed and trying to save them is misguided as well as not really benefiting most of those affected.

              And yea, the big companies win either way but if it’s easier to use this new tech, we might not lose as hard. Disney for instance doesn’t have any competition but if a bunch of indie animation companies and groups start popping up, it levels the playing field a bit.

      • kibiz0r@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        8 months ago

        We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.

        Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?

        • Grimy@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          arrow-down
          1
          ·
          8 months ago

          Essentially yes. There isn’t a happy solution where FOSS gets the best images and remains competitive. The amount of data needed is outside what can be donated. Any open source work will be so low in quality as to be unusable.

          It also won’t be up to them. The platforms where the images are posted will be selling and brokering. No individual is getting a call unless they are a household name.

          None of the artists are getting paid either way so yeah, I’m thinking of society in general first.

      • deweydecibel@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        edit-2
        8 months ago

        The point is the entire concept of AI training off people’s work to make profit for others is wrong without the permission of and compensation for the creator regardless if it’s corporate or open source.

        • ANGRY_MAPLE@sh.itjust.works
          link
          fedilink
          English
          arrow-up
          0
          ·
          8 months ago

          I think I’ve decided to not publish anything that I want to keep ownership of, just in case. There’s an entire planet’s worth of countries, which will all have their own sets of laws. It takes waay too long to polish something, only to just give it away for free haha. Someone else is free to do that work if it is that easy. No skin off my back.

          I think it’s similar to many other hand-made crafts/items. Most people will buy their clothes from stores, but there are definitely still people who make beautiful clothing from hand better than machines could.

          Don’t even get me started on stuff like knitting. It already costs the creator a crap ton of money just for the materials. It takes a crap ton of time to make those, too. Despite the costs, many people just expect those knitted pieces for practically free. The people who expect that pricing are also free to go with machine-produced crafts/items instead.

          It comes down to what people want, and what they’re willing to pay, imo. Some people will find value in something physically being put together by another human, and other people will find value in having more for less. Neither is “wrong” necessarily, so long as no one is literally ripped off. (With over 8 billion people, it’s bound to happen at least once. I feel bad for whoever that is.)

          That being said, we’ll never be able to honestly say that the specific skills and techniques that are currently required are the exact same. It would be like calling a photographer amazing at realism painting because their photo looks like real life. Photographers and painters both have their place, but they are not the exact same.

          I think that’s also part of what’s frustrating so many artists. Coding AI is not the same as using the colour wheel, choosing materials, working fine motor control, etc. It’s not learning about shadows, contrast, focal points, etc. I can definitely understand people not wanting those aspects to be brushed off, especially since it usually takes most of a lifetime to achieve. A music generator and a violin may both make great music, but they are not the same, and they require different technical skills.

          I’ll never buy AI art if I have any say in the matter. I’ll support handmade stuff first, every time.

          • Grimy@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            arrow-down
            1
            ·
            edit-2
            8 months ago

            There is definitely more value in handmade art. Even the fanciest prints on canvas can’t compare, and I don’t think AI art will be evoking the same feelings a John Waterhouse exhibit does any time soon.

            On the subject of publishing, I’ve chosen to embrace it personally. My view is that even the hidden stuff on our comp ends up in a Chinese or US database anyway.

      • grue@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        edit-2
        8 months ago

        “They want to kill the open-source scene”

        Yeah, by using the argument you just gave as an excuse to “launder” copyleft works in the training data into permissively-licensed output.

        Including even a single copyleft work in the training data ought to force every output of the system to be copyleft. Or if it doesn’t, then the alternative is that the output shouldn’t be legal to use at all.

        • Grimy@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          arrow-down
          1
          ·
          8 months ago

          100% agree, making all outputs copyleft is a great solution. We get to keep the economic and cultural boom that AI brings while keeping the big companies in check.

  • Alien Nathan Edward@lemm.ee
    link
    fedilink
    English
    arrow-up
    92
    arrow-down
    5
    ·
    8 months ago

    this is because the technocrats are allowed to steal from you, but when you steal from them what they’ve stolen from actual researchers that’s a problem

    • blazeknave@lemmy.world
      link
      fedilink
      English
      arrow-up
      21
      arrow-down
      1
      ·
      8 months ago

      There are no technocrats. Just oligarchs who are titans of newer industries. Same as the old boss. Don’t give them more credit than that. It’s evil capitalism. Lump them with bankers, not UX designers, imho.

        • blazeknave@lemmy.world
          link
          fedilink
          English
          arrow-up
          8
          ·
          8 months ago

          You’re not confused, you’re getting the point. Musk has more in common with Jamie Dimon than with the tech workers with whom he’s lumped by industry.

          It’s not a tech people/company problem. They’re just like accountants; they don’t own the enterprise.

  • I Cast Fist@programming.dev
    link
    fedilink
    English
    arrow-up
    67
    arrow-down
    1
    ·
    8 months ago

    What really breaks the suspension of disbelief in this reality of ours is that fucking advertising is the most privacy invasive activity in the world. Seriously, even George Orwell would call bullshit on that.

    • skarlow181@lemmy.world
      link
      fedilink
      English
      arrow-up
      11
      ·
      8 months ago

      What I find even more mind boggling is that despite all that tracking, advertising still misses the mark by a mile. I regularly see the same ad repeated 10 times in a row while also being completely irrelevant to me. Meanwhile I also frequently miss stuff that would be relevant for me and that should be covered by ads (e.g. movie releases, I might pick up the first trailer, but completely miss when the movie actually hits cinemas).

      For the money and effort spent on ads, you’d think they could do a lot better than they are.

      • trolololol@lemmy.world
        link
        fedilink
        English
        arrow-up
        9
        ·
        8 months ago

        Ads know your profile better than yourself. It’s telling you you’re a cheap bastard who won’t actually buy popcorn at the movies, making the theater run at a loss.

        /S

  • hottari@lemmy.ml
    link
    fedilink
    English
    arrow-up
    78
    arrow-down
    16
    ·
    8 months ago

    This is different. AI as a transformative tech is going to usher the US economy into the next boom of prosperity. The AI revolution will change the world and allow people to decide if they want to work for money or not (read UBI). In case you haven’t caught on, I am being sarcastic.

    All this despite ChatGPT being a total complete joke.

    • douglasg14b@lemmy.world
      link
      fedilink
      English
      arrow-up
      54
      arrow-down
      1
      ·
      edit-2
      8 months ago

      Honestly couldn’t tell if you were being sarcastic or not, because of Poe’s law, until I saw your note.

      If all the wealth created by these sorts of things didn’t funnel up to the 0.01% then yeah. It could usher in economic changes that help bring about greater prosperity in the same way mechanical automation should have.

      Unfortunately it’s just going to be another vector for more wealth to be removed from your average American and transferred to a corporation

    • TurtleJoe@lemmy.world
      link
      fedilink
      English
      arrow-up
      42
      arrow-down
      1
      ·
      8 months ago

      This was a case where you needed the sarcasm tag. Up to then, it was a totally “reasonable” comment from an AI bro.

      BTW, plug “crypto” in to your comment for AI, and it’s a totally normal statement from 2020/21. It’s such a similar VC grift.

        • SparrowRanjitScaur@lemmy.world
          link
          fedilink
          English
          arrow-up
          7
          arrow-down
          4
          ·
          8 months ago

          Ah yes, of course. I remember this video. Not all of the specific points, but I do remember Adam Conover really chewing into large language models. Interestingly, that same Adam Conover must have believed AI isn’t actually that useless seeing as he became a leading member of the 2023 Hollywood writers strike, in which AI was a central focus:

          Writers also wanted artificial intelligence, such as ChatGPT, to be used only as a tool that can help with research or facilitate script ideas and not as a tool to replace them.

          https://en.wikipedia.org/wiki/2023_Writers_Guild_of_America_strike

          That said, I’m not going to rewatch a 25-minute video for a discussion on Lemmy. Any specific points you want to make against ChatGPT?

          • wikibot@lemmy.worldB
            link
            fedilink
            English
            arrow-up
            3
            arrow-down
            1
            ·
            8 months ago

            Here’s the summary for the wikipedia article you mentioned in your comment:

            From May 2 to September 27, 2023, the Writers Guild of America (WGA)—representing 11,500 screenwriters—went on strike over a labor dispute with the Alliance of Motion Picture and Television Producers (AMPTP). With a duration of 148 days, the strike is tied with the 1960 strike as the second longest labor stoppage that the WGA has performed, only behind the 1988 strike (153 days). Alongside the 2023 SAG-AFTRA strike, which continued until November, it was part of a series of broader Hollywood labor disputes. Both strikes contributed to the biggest interruption to the American film and television industries since the COVID-19 pandemic. The lack of ongoing film and television productions resulted in some studios having to close doors or reduce staff. The strike also jeopardized long-term contracts created during the media streaming boom: big studios could terminate production deals with writers through force majeure clauses after 90 days, saving them millions of dollars. In addition, numerous other areas within the global entertainment ecosystem were impacted by the strike action, including the VFX industry and prop making studios. Following a tentative agreement, union leadership voted to end the strike on September 27, 2023. On October 9, the WGA membership officially ratified the contract with 99% of WGA members voting in favor of it. Its combined impact with the 2023 SAG-AFTRA strike resulted in the loss of 45,000 jobs, and "an estimated $6.5 billion" loss to the economy of Southern California.


    • Joe Cool@lemmy.ml
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      8 months ago

      So, I feel taking an .epub and putting it in a .zip is pretty transformative.

      Also you can make ChatGPT (or Copilot) print out quotes with a bit of effort, now that it has Internet.

    • UnderpantsWeevil@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      2
      ·
      8 months ago

      “In case you haven’t caught on, am being sarcastic.”

      It sounds like a completely sincere Marc Andreessen post to me.

  • Maggoty@lemmy.world
    link
    fedilink
    English
    arrow-up
    66
    arrow-down
    9
    ·
    edit-2
    8 months ago

    Oh OpenAI is just as illegal as SciHub. More so because they’re making money off of stolen IP. It’s just that the Oligarchs get to pick and choose. So of course they choose the arrangement that gives them more control over knowledge.

    • Lemminary@lemmy.world
      link
      fedilink
      English
      arrow-up
      22
      arrow-down
      48
      ·
      edit-2
      8 months ago

      They’re not serving you the exact content they scraped, and that makes all the difference.

      • localhost443@discuss.tchncs.de
        link
        fedilink
        English
        arrow-up
        25
        arrow-down
        4
        ·
        8 months ago

        Well, if you believe that, you should look at the Times lawsuit.

        Word for word on hundreds/thousands of pages of stolen content; it’s damning.

        • Lemminary@lemmy.world
          link
          fedilink
          English
          arrow-up
          4
          arrow-down
          11
          ·
          8 months ago

          Why do you assume that I haven’t? The case hasn’t been resolved and it’s not clear how The NY Times did what they claim, which may as well be manipulation. It’s a fair rebuttal by OpenAI. The Times hasn’t provided the steps they used to achieve that.

          So unless that’s cleared up, it’s not damning in the slightest. Not yet, anyway. And that still doesn’t invalidate my statement above, because it’s still under very specific circumstances when that happens.

          • Emy@lemmy.world
            link
            fedilink
            English
            arrow-up
            4
            arrow-down
            2
            ·
            8 months ago

            Also, intention is pretty important when determining guilt for many crimes. OpenAI doesn’t intentionally spit back an author’s exact words; their intention is to summarize and create unique content.

              • Lemminary@lemmy.world
                link
                fedilink
                English
                arrow-up
                3
                ·
                8 months ago

                No, the real defense is “that’s not how LLMs work”, but you are all hinging on the wrong idea. If you think that an LLM is capable of doing what you claim, I’d love to hear the mechanism in detail and the steps to replicate it.

              • whofearsthenight@lemm.ee
                link
                fedilink
                English
                arrow-up
                1
                arrow-down
                1
                ·
                8 months ago

                I mean, I’m not sure why this conversation even needs to get this far. If I write an article about the history of Disney movies, and make it very clear the way I got all of those movies was to pirate them, this conversation is over pretty quick. OpenAI and most of the LLMs aren’t doing anything different. The Times isn’t Wikipedia, most of their stuff is behind a paywall with pretty clear terms of service and nothing entitles OpenAI to that content. OpenAI’s argument is “well, we’re pirating everything so it’s okay.” The output honestly seems irrelevant to me, they never should have had the content to begin with.

                • Lemminary@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  ·
                  8 months ago

                  That’s not the claim that they’re making. They’re arguing that OpenAI retains their work they made publicly available, which OpenAI claims is fair use because it’s wholly transformative in the form of nodes, weights and biases, and that they don’t store those articles in a database for reuse. But their other argument is that they created a system that threatens their business which is just ludicrous.

        • Lemminary@lemmy.world
          link
          fedilink
          English
          arrow-up
          8
          arrow-down
          12
          ·
          8 months ago

          What a colorful mischaracterization. It sounds clever at face value but it’s really naive. If anything about this is deceptive, it’s the lengths that people go to to slander what they dislike.

          • jacksilver@lemmy.world
            link
            fedilink
            English
            arrow-up
            6
            arrow-down
            4
            ·
            8 months ago

            Actually, content laundering is the best term I’ve heard to describe the process. Just like money laundering, you no longer know the source and now it’s technically legal to use and distribute.

            I mean, if the copyrighted content wasn’t so critical, they would train models without it. They’re essentially derivative works, but no one wants to acknowledge it because it would either require changing our copyright laws or make this potentially lucrative and important work illegal.

            • Lemminary@lemmy.world
              link
              fedilink
              English
              arrow-up
              5
              arrow-down
              1
              ·
              8 months ago

              Content laundering is not a good way to describe it because it’s misleading as it oversimplifies and mischaracterizes what a language model actually does. It’s a fundamental misunderstanding of how it works. Training language models is typically a transparent and well-documented process as described by the mountains of research over the past decades. The real value comes from the weights of the nodes in the neural network and not the source that it spits out in its entirety when it was trained. The source material is evaluated and wholly transformed into new data in the form of nodes and weights. The original content does not exist as it was within the network because there’s no way to encode it that way. It’s a statistical system that compounds information.

              And while LLMs do have the capacity to create derivative works in other ways, it’s not all that they do, or what they always do. It’s only one of the many functions that it has. What you say would probably be true if it was only trained on a single source, but that’s not even feasible. But when you train it on millions of sources, what remains are the overall patterns of language within those works. It’s much more sophisticated and flexible than what you describe.

              So no, if it was cut and dry there would be grounds for a legitimate lawsuit. The problem is that people are arguing points that do not apply but sound reasonable when they haven’t seen a neural network work under the hood. If anything, new laws need to be created to address what LLMs do if you’re so concerned about proper compensation.

              • jacksilver@lemmy.world
                link
                fedilink
                English
                arrow-up
                1
                arrow-down
                2
                ·
                8 months ago

                I am familiar with how LLMs work and are trained. I’ve been using transformers for years.

                The core question I’d ask is, if the copyrighted material isn’t essential to the model, why don’t they just train the models without that data? If it is core to the model, then can you really say they aren’t derivative of that content?

                I’m not saying that the models don’t do something more, just that the more is built upon copyrighted material. In any other commercial situation, you’d have to license/get approval for the underlying content if you were packaging it up. When sampling music, for example, the output will differ greatly from the original song, but because you are building off someone else’s work you must compensate them.

                It’s why content laundering is a great term. The models intermix so much data that it’s hard to know if the content originated from copyrighted materials. Just like how money laundering is trying to make it difficult to determine if the money comes from illicit sources.

          • Jilanico@lemmy.world
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            3
            ·
            8 months ago

            I feel most people critical of AI don’t know how a neural network works…

            • Lemminary@lemmy.world
              link
              fedilink
              English
              arrow-up
              4
              arrow-down
              5
              ·
              8 months ago

              That is exactly what’s going on here. Or they hate it enough that they don’t mind making stuff up or mischaracterizing what it does. Seems to be a common thread on the Fediverse. It’s not the first time this week I’ve seen it.

      • Cethin@lemmy.zip
        link
        fedilink
        English
        arrow-up
        11
        arrow-down
        6
        ·
        8 months ago

        It’s great how most of us are taught that just changing the order of words is still plagiarism. Yet these models frequently end up using the exact same words as other things, and people still argue it’s somehow intelligent and somehow not plagiarism.

        • Lemminary@lemmy.world
          link
          fedilink
          English
          arrow-up
          4
          arrow-down
          2
          ·
          8 months ago

          “Changing the order of words” is what it does? That’s news to me. And do you have examples of it “using the exact same words as other things” without prompt manipulation?

          • asret@lemmy.zip
            link
            fedilink
            English
            arrow-up
            1
            arrow-down
            1
            ·
            8 months ago

            Why does the prompting matter? If I “prompt” a band to play copyrighted music does that mean they get a free pass?

            • Lemminary@lemmy.world
              link
              fedilink
              English
              arrow-up
              3
              ·
              edit-2
              8 months ago

              That’s not a very good analogy because the band would be reproducing an entire work of art, which an LLM does not and cannot. And by prompt manipulation I mean purposely making it seem like the LLM is doing something it wouldn’t do on its own. The operative word is seem, which is what I meant by manipulation. The prompting here is irrelevant, but how it’s done is not. So unless The Times releases the steps they used to get ChatGPT to output what it did, you can’t really claim that that’s what it does.

              In a blog post, OpenAI said the Times “is not telling the full story.” It took particular issue with claims that its ChatGPT AI tool reproduced Times stories verbatim, arguing that the Times had manipulated prompts to include regurgitated excerpts of articles. “Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” OpenAI said.

            • stewsters@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              edit-2
              8 months ago

              If you passed them a sheet of music I’d say that’s on you, it would be your responsibility to not sell recordings of them playing it.

              Just like if I typed the first chapter of Harry Potter into Word: it is not Microsoft’s intent to breach copyright, it would have been my intent to make it do so. It would be my responsibility not to sell that first chapter, and they should come after me if I did, even though MS is the corporation that supplied the tools.

  • Jknaraa@lemmy.ml
    link
    fedilink
    English
    arrow-up
    37
    arrow-down
    1
    ·
    8 months ago

    And people wonder why there’s so much push back against everything corps/gov does these days. They do not act in a manner which encourages trust.

      • Gutless2615@ttrpg.network
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        3
        ·
        edit-2
        8 months ago

        That’s a pretty strong accusation. You seem to like to wade through people’s post history but to my cursory glance nothing would indicate this poster is a troll.

        You understand AI posts frequently surface on this platform and people will engage with those posts even if they disagree with you?

          • Gutless2615@ttrpg.network
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            2
            ·
            edit-2
            8 months ago

            Yeah you keep spamming that screen shot. Idk I’m not seeing it. I read the thread you’re posting and it seems like you’re just digging in and insisting that someone that disagrees with you must be a troll.

            For what it’s worth, you made the same accusation against me yesterday, and after I think I pretty effectively (and unnecessarily, I might add) defended myself, you deleted those posts. Making spurious accusations like that (and, as I read it, this) is also trollish behavior that doesn’t further any discussion. I’ve looked at the thread you’re posting. You come out flying with accusations based on extremely flimsy evidence. I think OP’s responses seemed entirely warranted.

              • Gutless2615@ttrpg.network
                link
                fedilink
                English
                arrow-up
                4
                arrow-down
                2
                ·
                8 months ago

                No, see, it actually isn’t self evident. After being accused of being disingenuous because he only talked about open source in the context of AI — again almost the verbatim ridiculous accusation you lobbed at me before cowardly deleting it - he asked for a citation relevant to the issue and someone sent a CNN article about Duolingo laying off staff. That isn’t the gotcha you think it is. It doesn’t “destroy my reputation” lmao to point out that you are, in fact, acting like a troll. This is a pattern of yours. Be better.

  • Uriel238 [all pronouns]@lemmy.blahaj.zone
    link
    fedilink
    English
    arrow-up
    25
    arrow-down
    1
    ·
    8 months ago

    The IP system, which goes to great lengths to block things like open-access scientific publications, is borked borked borked borked borked.

    If OpenAI and other generative AI projects are the means by which we finally break it so we can have culture and a public domain again, well, we had to nail Capone with tax evasion.

    Yes, industrialists want to use AI [exactly the way they want to use every other idea – plausible or not] to automate more of their industries so they can pay fewer people less money for more productivity. And this is a problem in which generative AI figures centrally, but it’s not really all that new, and eventually we’re going to have to force our society to recognize that it works for the public and not money. I don’t think AI is going to break the system and lead us to communist revolution ( The owning class will tremble…! ) But eventually it will be 1789 all over again. Or we’ll crush the fash and realize the only way we can get the fash to not come back is by restoring and extending FDR’s New Deal.

    I am skeptical the latter can happen without piles of elite heads and rivers of politician blood.

    • JoeKrogan@lemmy.world
      link
      fedilink
      English
      arrow-up
      13
      ·
      8 months ago

      That’s actually not a bad idea: train a model with all the data in scihub and then release the model to the public.

    • Maggoty@lemmy.world
      link
      fedilink
      English
      arrow-up
      9
      ·
      8 months ago

      We need to ban the publishing business from academic stuff. Have the universities host a site that’s free access. They can also better run the peer review system, and the journals would no longer control what research sees the light of day, even behind a paywall.

      • Liz@midwest.social
        link
        fedilink
        English
        arrow-up
        6
        ·
        8 months ago

        How would you publish if you’re not a part of a major research institution? Los Alamos National Lab could host its own papers just fine, but what about small-time labs? I know of at least one person who doesn’t even officially work in science but publishes original research they do in their free time.

        The journal system still provides a service, even if they over-charge for access. The peer review system has value. Imagine if there was zero barrier to publish. As a reader, you’d have to wade through piles of trash to find decent science.

        Where would you find it all? Currently we use journal aggregators, whose service also has value and costs money. Are you really going to go to every university’s website looking for research relevant to your area? We could do that again, but with everyone responsible for publishing their own work, well, who gets indexed with the aggregators?

        • Maggoty@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          ·
          8 months ago

          You get published with a university instead of a for profit publishing system. And universities would get a good or bad reputation for their peer review, just like journals. The aggregator could easily be run by a coalition of universities with government grants to make the maintenance and upkeep free to the users and universities.

          We do not have to lock research behind paywalls.

      • cecinestpasunbot@lemmy.ml
        link
        fedilink
        English
        arrow-up
        5
        arrow-down
        1
        ·
        8 months ago

        The problem isn’t just publishing though, it’s academia as well. Scientists are incentivized to publish in “prestigious” closed access journals such as Nature. They are led to believe it’s better for their career than publishing in open access journals such as PLOS One. As such, groundbreaking papers often get paywalled. Universities then feel obligated to pay outrageous subscription fees to access them.

    • Imgonnatrythis@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      6
      ·
      8 months ago

      Don’t mind? Hell, we want people to read that shit. We don’t profit at all if it’s paywalled; it hurts us and hurts science in general. This is 100% the wishes of for-profit scientific journals.

    • breakfastmtn@lemmy.ca
      link
      fedilink
      English
      arrow-up
      5
      ·
      8 months ago

      Academics don’t care because they don’t get paid for them anyway. A lot of the time you have to pay to have your paper published. Then companies like Elsevier just sit back and make money.

    • brsrklf@jlai.lu
      link
      fedilink
      English
      arrow-up
      3
      ·
      8 months ago

      I follow a few researchers with interesting youtube channels, and they often mention that if you ask them or their colleagues for a publication of theirs, chances are they’ll be glad to send it to you.

      A lot of them love sharing their work, and don’t care at all for science journal paywalls.

      • andrew_bidlaw@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        0
        ·
        8 months ago

        Besides being happy for the attention and curious about what else you might find in their field, they get quoted, and that pushes their reputation a little higher. Locking up works heavily limits that, and the only reason behind it is a promise of basic quality control when accepting works, which is not ideal either: there are many shady publications. Other than that it’s cash from ordinary consumers and subscription money from institutes for works these companies took hold of and maybe don’t even have physical editions of anymore, simply because, return to fig. 1, researchers depend on being published and quoted.

        • brsrklf@jlai.lu
          link
          fedilink
          English
          arrow-up
          2
          ·
          8 months ago

          Sure, that’s a motivation too, but they were also talking about random people who’d find a reference and were curious about their work, not just other researchers who may quote them. It’s not all about h-index.

          When a guy literally makes, among other things, regular paleontology news reports and whole videos of his own university course material during summer breaks, and puts all that to youtube it’s safe to assume he just likes popularizing his subject.

    • honey_im_meat_grinding@lemmy.blahaj.zone
      link
      fedilink
      English
      arrow-up
      2
      arrow-down
      1
      ·
      8 months ago

      I’m starting to think the term “piracy” is morally neutral. The act can be either positive or negative depending on the context. Unfortunately, the law does not seem to flow from morality, or even the consent of the supposed victims of this piracy.

      The morals of piracy also depend on the economic system you’re under. If you have UBI, the “support artists” argument is far less strong, because we’re all paying taxes to support the UBI system that enables people to become skilled artists without worrying about starving or homelessness - as has already happened to a lesser degree before our welfare systems were kneecapped over the last 4 decades.

      But that’s just the art angle, a tonne of the early-stage (i.e. risky and expensive) scientific advancements had significant sums of government funding poured into them, yet corporations keep the rights to the inventions they derive from our government funded research. We’re paying for a lot of this stuff, so maybe we should stop pretending that someone else ‘owns’ these abstract idea implementations and come up with a better system.

  • Flying Squid@lemmy.world
    link
    fedilink
    English
    arrow-up
    21
    arrow-down
    6
    ·
    8 months ago

    Yeah, but did SciHub pay Nigerians a pittance to look at and read about child rape? Because- wait, I have no idea what I’m even arguing. Fuck OpenAI though.

    • owlet@lemmy.ml
      link
      fedilink
      English
      arrow-up
      2
      arrow-down
      5
      ·
      8 months ago

      OpenAI did that subhuman training of ChatGPT in Kenya, not Nigeria. And since the Kenyan govt is a western lapdog these days, nothing will ever come of that.

  • UnderpantsWeevil@lemmy.world
    link
    fedilink
    English
    arrow-up
    17
    arrow-down
    3
    ·
    8 months ago

    Consider who sits on OpenAI’s board and owns all their equity.

    SciHub’s big mistake was to fail to get someone like Sundar Pichai or Jamie Iannone with a billion-dollar stake in the company.

  • rivermonster@lemmy.world
    link
    fedilink
    English
    arrow-up
    16
    arrow-down
    2
    ·
    8 months ago

    Kind of a strawman, I’d like everything to be FOSS, and if we keep Capitalism (which we shouldn’t), it should be HEAVILY regulated not the laissez-faire corporatocracy / oligarchy we have now.

    I don’t want any for-profit capitalists to have any control of AI. It should all be owned by the public and all productive gains from it taxed at 100%. But open source AI models, right on.

    And team SciHub–FUCK YEAH!

    • BURN@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      ·
      8 months ago

      Cyberpunk 2077 had a whole giant plot point that the old net was overtaken by rogue AIs and the AI wars were a thing.

      I’m not sure they’re that far off base

    • sndrtj@feddit.nl
      link
      fedilink
      English
      arrow-up
      24
      ·
      8 months ago

      A website where you can download paywalled scientific literature. Most scientific literature is paywalled by publishers, and costs a real significant amount to read (like 30-50$ per article if you don’t have a subscription).

      Scihub basically just pirates it. And it has been shut down several times. But as most scientific studies are already paid for with public money, scihub isn’t that unethical at all.

      • andros_rex@lemmy.world
        link
        fedilink
        English
        arrow-up
        14
        ·
        8 months ago

        Lots of scientists will just send you their article if you email them. They don’t get the money when you pay to read it - often they pay to submit. Reviewing journal articles is a privilege and doesn’t get you paid. The prestige of a scientific article is from the number of times people have cited it. The only “harm” done is that the publisher doesn’t get to make 100% profit for doing nothing.

        Journal publishing is mostly a way to extract money from universities. Elsevier and its ilk name whatever price they think a research university can afford.

        • Gargantu8@lemmy.world
          link
          fedilink
          English
          arrow-up
          5
          ·
          8 months ago

          Very true. Also, a new federal policy is now in place and requires any research funded even in part by federal money be open access. As a result we should see much more high quality research becoming open access (already has begun). Only downside is research labs like mine have to use more money to publish to these journals because open access costs more for the authors. Hopefully this system gets reformed during my lifetime.

          But yes, please just email the authors! Works most of the time and I think it’s fun.