Artificial Intelligence and the Use of Copyrighted Content: An Emerging Legal Conflict

Artificial intelligence (AI) has advanced at a rapid pace in recent years, generating applications that affect almost every aspect of modern life. From natural language processing to the creation of images or music, AI models have demonstrated their versatility and their ability to transform multiple industries.

However, this progress in artificial intelligence has brought with it several ethical and legal issues, especially regarding the use of copyrighted content. As AI companies train their models with large amounts of data, questions arise about whether they are violating the rights of content creators by using their works without permission.

This situation is leading to an increasing number of copyright infringement lawsuits that are shaping the future of AI use. Below, ITD Consulting provides details of a recent lawsuit against OpenAI.

The Use of Publicly Available Content in AI Model Training

To understand the current legal conflict, it is first necessary to explain how AI companies train their models. AI models, such as the one that powers ChatGPT or other similar virtual assistants, require a massive amount of data to generate coherent and useful responses.

These AI models are trained using texts, images, and other types of information available on the internet. Some of this content is publicly available, such as blog articles, forums, social media posts, and other materials that AI can use to learn language patterns and general knowledge.

AI models like ChatGPT, for example, have been fed enormous amounts of data to improve their ability to generate informed and accurate responses.

However, the issue becomes more complicated when it comes to content protected by copyright. In this case, many argue that AI companies should obtain permission from the creators or owners of this content before using it to train their models.

Using copyrighted material without proper consent could be considered a violation of intellectual property rights. The legal dilemma is clear: AI companies often claim that the data they use is publicly available and, therefore, can be used under the principle of "fair use." However, content creators, such as journalists and artists, argue that they should be compensated for their work when it is used in this way by AI.

Lawsuits for Copyright Infringement

One of the most notable cases highlighting the issue of copyright in relation to AI is the lawsuit brought against OpenAI by the publications Raw Story and Alternet. In February 2024, both publications filed suit, accusing the company of using thousands of their articles without permission to train its ChatGPT model.

According to the plaintiffs, OpenAI had deliberately removed copyright information, such as author names and copyright notices, allowing the model to generate responses based on those contents without acknowledging the original source. This type of practice, according to the publications, facilitates plagiarism and unauthorized reproduction of protected material.

The crux of the accusation refers to the removal of copyright management information (CMI), which would enable the AI to generate responses based on copyrighted articles without properly attributing the content. This issue has significant implications in the realm of intellectual property, as the lack of author attribution could be seen as a violation of the creator’s rights.

In addition to removing CMI, the publications accused OpenAI of using their material without proper compensation. The plaintiffs argue that the exploitation of their work in this context not only represents theft of intellectual property but also harms the media financially by reducing their web traffic and, therefore, their advertising revenue.

In this case, Raw Story and Alternet are seeking financial compensation for each infringement that has occurred, which, according to their calculations, amounts to billions of dollars.

The Judicial Response: A First Victory for OpenAI

In a significant decision, federal judge Colleen McMahon in New York dismissed the lawsuit filed by Raw Story and Alternet against OpenAI. In her ruling, the judge argued that the plaintiffs failed to demonstrate any concrete harm resulting from the alleged copyright infringement.

McMahon stated that the plaintiffs did not present sufficient evidence that ChatGPT had directly reproduced the content of their articles in a way that violated their copyrights. Furthermore, she emphasized that the widespread and public use of materials on the web makes it difficult to track and accurately attribute AI-generated content.

McMahon also pointed out that the alleged removal of CMI from the articles did not represent a tangible harm. According to the judge, the plaintiffs did not show how the removal of this metadata had significantly affected their business.

In her ruling, the judge expressed skepticism about the media outlets' ability to show the kind of demonstrable harm required to sustain a lawsuit under U.S. copyright law.

Although this was a victory for OpenAI, the case is not closed. Raw Story and Alternet have the option to appeal the decision and present new arguments to the court. According to statements from the publications' lawyers, they are working on a new case that could address the judge’s concerns about the lack of direct evidence of harm.

The Rise of Copyright Infringement Lawsuits

The Raw Story and Alternet case is not an isolated incident. As AI capabilities continue to expand, other companies, media outlets, and content creators are beginning to file similar lawsuits.

In 2023, The New York Times sued OpenAI and its partner Microsoft, accusing them of using millions of its articles to train AI models without obtaining proper authorization. This lawsuit was based on the claim that the newspaper's articles were used to teach ChatGPT without any licensing or compensation.

The New York Times is not the only media outlet to take legal action. Other publications, such as The Wall Street Journal, CNN, and several smaller editorial groups, have started to investigate how their content is being used by AI companies.

These media outlets argue that not only should they be paid for the use of their material, but clear rules should also be established to protect their intellectual property.

The growth of these lawsuits is creating a tense legal environment where tech giants like OpenAI, Microsoft, and Google are struggling to find a balance between using large volumes of data to train their AI models and respecting the copyright of content creators.

The Lack of Transparency on Training Data

One of the main criticisms leveled at AI companies is the lack of transparency regarding the data they use to train their models. While OpenAI and other companies have claimed they use publicly available data from the web, they have not provided clear details on how this data is collected or what specific types of information are used.

This secrecy has fueled concerns that AI companies are using copyrighted materials without the consent of their owners.

The lack of clarity about the data sets used to train AI models has led many media outlets and content creators to question whether their works are being used without their knowledge. This creates an atmosphere of distrust, where content creators feel vulnerable to the power of the large tech companies that dominate the AI sector.

In response to these concerns, some AI companies have begun to adopt more transparent approaches, seeking licensing agreements with content creators.

For example, OpenAI has reached agreements with media outlets like El País and Le Monde, allowing the company to use their content in exchange for compensation. However, most companies have not adopted this approach, and the lack of clear regulations on the use of copyrighted content remains a contentious issue.

The Response of Big Tech: "Fair Use" and Licensing

One of the most common defenses by AI companies against copyright infringement accusations is that their use of data is covered under the principle of "fair use."

According to this legal doctrine, the use of copyrighted material without permission is allowed in specific situations, such as research, criticism, or the creation of new works. AI companies argue that by using the content to train language models and improve technology, they are within the limits of fair use.

Google and Meta, for example, have openly defended their right to use publicly available data from the web to train their AI models, arguing that this type of use benefits society by making technological advancements accessible. However, content creators do not always agree with this interpretation of "fair use."

Many argue that the benefit to large AI tech companies does not justify the harm to content creators, who often see their work used without receiving any compensation.

In response to these lawsuits, some companies are beginning to negotiate agreements with media outlets and creators. For example, OpenAI signed agreements with Prisa (publisher of El País) and Le Monde, allowing their content to be used to train AI models in exchange for compensation. However, this type of agreement is still limited and does not solve the widespread issue of unauthorized use of copyrighted content.

Towards a Future of Licensing Agreements or Regulation

Experts suggest that one possible way to resolve this conflict would be to establish clear licensing agreements between AI companies and content owners. Instead of using data in a generalized, unauthorized way, companies could negotiate agreements that allow the legal use of content in exchange for compensation.

On the other hand, there is also a discussion about the possibility of establishing a new global regulation to address the use of data in the context of AI. Perhaps creating a legal framework that requires AI companies to pay for the use of copyrighted content will be the solution that allows for a balance between technological innovation and intellectual property protection.

The debate over the use of copyrighted content in the training of AI models is a complex issue that will continue to be a legal and ethical challenge. AI companies argue that access to large volumes of data is essential for the development and improvement of their technologies and defend the idea that the use of publicly available information should be allowed under the principle of "fair use."

However, content creators, such as media outlets, authors, and artists, point out that these practices not only infringe on their intellectual property but also reduce their economic opportunities, since they receive no compensation for the use of their work. The lack of clear regulation on the use of copyrighted content in AI is creating an environment of legal uncertainty, marked by a constant battle between technological innovation and the protection of creators' rights.

As copyright infringement lawsuits increase, the legal landscape is becoming more complicated. Courts will have to determine whether AI companies truly benefit from "fair use" or if, on the contrary, they are unfairly exploiting others' work.

If judicial decisions continue to favor tech companies, content creators may be forced to adapt to an environment where their intellectual property rights are systematically disregarded. On the other hand, if the courts rule in favor of creators, AI companies could be forced to rethink their business models and establish more equitable agreements that compensate creators for the use of their works. Such a shift could deeply alter the development of artificial intelligence.

Ultimately, the solution will likely lie in a combination of licensing agreements between AI companies and content creators, accompanied by clearer and stronger regulation at the national and international levels. Although some steps have already been taken, such as agreements between OpenAI and media outlets like El País and Le Monde, there is still a long way to go.

The creation of specific legal frameworks that address the use of copyrighted data in AI model training could balance the interests of large tech companies and content creators, enabling a fairer and more sustainable development of artificial intelligence. This process will take time, but it is essential to ensuring that technological innovation is carried out ethically and in a way that respects the rights of individuals and institutions.

If you want to learn more about the controversies AI faces and how to use it safely, write to us at [email protected]. We have a dedicated tech team ready to advise you.
