AI training with open data is becoming increasingly complex.
Microsoft’s AI chief has sparked outrage by asserting that any publicly accessible data used to train AI models is “freeware.”
Speaking in a CNBC interview during the Aspen Ideas Festival, Mustafa Suleyman, the CEO of Microsoft AI, sought to distinguish between content that is freely accessible on the internet and content that is specifically protected by copyright.
He did, however, acknowledge that the picture is more complicated for material that publishers have taken steps to protect from scraping.
Should online content be used for AI training?
Suleyman also underlined the need for responsible development and governance during a wide-ranging discussion that covered the current state of AI technology, its potential impact on industries and society, the difficulties and concerns surrounding its development, and the future role of AI.
The discussion also touched on the controversy around open-source versus closed-source AI models, with Suleyman arguing that in international development, especially with China, cooperation is preferable to an adversarial strategy.
Nevertheless, content creators have claimed that their intellectual property is being used without compensation, regardless of where AI models are trained. Many have even suggested that the continued unauthorized use of their work jeopardizes both their livelihoods and, to some extent, the integrity of generative AI.
Ongoing legal proceedings support Suleyman’s claim that the bounds of AI model training remain ambiguous. The Center for Investigative Reporting sued OpenAI and its largest backer, Microsoft, shortly after the talk for allegedly exploiting the nonprofit’s content without consent or payment.
“OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material,” said Monika Bauerlein, CEO of the nonprofit.
Although Microsoft is still under fire for how it handles data for AI, at least it has provided users of its GenAI tools with copyright protection to shield them from lawsuits.
“We are collaborating with the news industry and working with international news publishers to display their content in our products like ChatGPT, including summaries, quotes, and attribution, to drive traffic back to the source articles,” an OpenAI representative informed us. “The capacity to apply different machine learning and training techniques to exploit publisher material is one of the partnerships’ components. This helps us optimize the content’s display and increase its usefulness to consumers.”