Copyright in the Era of Generative AI: Untangling a Web of Legal Complexities

Introduction: The rise of generative artificial intelligence (Gen AI) presents intriguing legal challenges, particularly in the realm of copyright law. As Gen AI technologies like large language models (LLMs) become more advanced and pervasive, they raise significant questions about the potential for copyright infringement and the boundaries of creative ownership. This blog post explores the complex intersection of Gen AI, creativity, and copyright legislation.

1. The Potential for Copyright Infringement with Gen AI

AI systems can be directed to create content that mimics a specific style or even reproduces the thematic essence of a copyrighted work. For instance, if an AI is asked to write a story resembling Ernest Hemingway's The Old Man and the Sea without replicating its exact text, the result challenges traditional notions of copyright infringement. The crux of the issue is determining when, and how, the use of generative AI amounts to infringement.

What if Gen AI created a prequel or sequel to The Old Man and the Sea using the same characters and Hemingway's style? Would that be copyright infringement? It would be a new work, but it clearly builds on the original story and could easily be considered a derivative work of the novel.

2. Stages of AI Involvement and Legal Considerations

  • Material Selection for Training: AI models are trained on vast datasets culled from the internet, which raises questions about the legality of including copyrighted material in those datasets. The initial selection of training data has generally not been the main focus of infringement claims, but that is starting to change. For example, The New York Times has sued OpenAI and Microsoft, accusing them of using its articles to train their models: https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
  • AI Training Process: Much as a human reader might be influenced by reading Hemingway, an AI model learns patterns from the material it processes. The current legal stance seems to be that training itself is not infringement, but that the way training data surfaces in generated content might be, since the model's output depends on specific user prompts.
  • Inference and Output Generation: The key legal issue often emerges at the point of content generation. For example, using AI to translate copyrighted software from Java to Go while preserving its functionality and design would seem to be a clear case of copyright infringement.

3. The Role of User Prompts in AI-Generated Content

User prompts play a crucial role in determining the nature of AI-generated content. The Old Man and the Sea examples above show how a prompt can lead to infringing content: telling Gen AI to keep the same story, characters, themes, and style but change only the names still produces infringing content. A prompt could even deliberately obfuscate the result to make it appear non-infringing. For example, if you feed source code into an LLM and ask it to recreate the app with a different structure, different variable names, a different programming language, new comments, and so on, is the result still infringing? At some point we might rely on Gen AI itself to compare the source code of two applications and determine whether, and to what degree, one infringes the copyright of the other. All of this highlights the need for clarity on how much transformation or obfuscation is necessary to avoid legal pitfalls.
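To make the obfuscation question concrete, here is a minimal Python sketch of why renaming alone hides little. It is an illustration only, not how any court or commercial tool measures similarity. It compares two small Java-like snippets twice: once as raw text, and once after replacing every identifier with a placeholder token. The tokenizer, keyword list, and function names are hypothetical simplifications, and Python's difflib.SequenceMatcher stands in for a real code-comparison engine.

```python
import difflib
import re

# Hypothetical, deliberately tiny keyword set for Java-like code.
# A real tool would use a proper language-specific lexer.
KEYWORDS = {"int", "for", "if", "else", "return", "while", "public", "static", "void"}

# Matches identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalized_tokens(source: str) -> list[str]:
    """Tokenize source, mapping non-keyword identifiers to 'ID' and
    integer literals to 'NUM', so that renaming has no effect."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def raw_similarity(a: str, b: str) -> float:
    """Character-level similarity of the unmodified text (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def structural_similarity(a: str, b: str) -> float:
    """Similarity of the identifier-normalized token streams (0.0 to 1.0)."""
    return difflib.SequenceMatcher(
        None, normalized_tokens(a), normalized_tokens(b)
    ).ratio()


original = """
int total(int[] prices) {
    int sum = 0;
    for (int p : prices) { sum += p; }
    return sum;
}
"""

# Same logic with every name changed, as an obfuscating prompt might request.
renamed = """
int computeAmount(int[] itemCosts) {
    int acc = 0;
    for (int c : itemCosts) { acc += c; }
    return acc;
}
"""

print(f"raw text similarity:   {raw_similarity(original, renamed):.2f}")
print(f"structural similarity: {structural_similarity(original, renamed):.2f}")
```

For this pair the structural score is 1.00, while the raw text score is lower, because only the names changed. A sketch like this would not catch a translation into another language or a deeper restructuring, which is exactly the harder case raised above, and neither score is a legal test of infringement.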

4. Legal Implications of AI Mimicry and Residual Knowledge

Court cases such as Ed Sheeran's copyright trial underscore the complexity of determining infringement in creative works. At what point does being influenced by another artist, as Oasis was influenced by the Beatles, cross into infringement? AI complicates this further because it can retain residual knowledge from its training data at a scale no human can, which may lead to outputs that inadvertently infringe on copyrighted material. For example, some artists have a very distinctive style. That style is captured as residual knowledge in a Gen AI model, so if you ask the AI for a stylistically similar piece of art, the output could appear infringing because of its striking similarity.

5. Guardrails and Corporate Responsibility in AI Deployment

Companies deploying AI services must establish robust guardrails to prevent blatant copyright infringement. This includes monitoring prompts for copyrighted material and possibly restricting certain types of content generation. However, legal responsibility often remains ambiguous, especially when the AI generates content resembling protected material it was never directly exposed to.

A Gen AI service might refuse to translate software source code from one language to another for fear of infringing the original copyright. But a company may be modernizing its software and using Gen AI to convert its own COBOL application to Golang. The translation would be a derivative work that would clearly infringe the COBOL application's copyright, except that the company owns that copyright, so it will want the ability to “Click to Accept Copyright Liability”. I expect we will see a layered approach to this transfer of liability: first the terms of service, then an acknowledgment when the service starts up, and finally a stronger disclaimer when an action would produce potentially infringing output like the example above.
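The layered approach described above could be modeled roughly as follows. This is a hypothetical Python sketch, not any provider's actual consent system: the three Layer values mirror the terms of service, the start-up notice, and the per-action disclaimer, and the flagged argument stands in for an assumed guardrail that decides a request (such as a source-code translation) may produce infringing output. All names here are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """Hypothetical consent layers, from broadest to most specific."""
    TERMS_OF_SERVICE = "accept the terms of service"
    SESSION_START = "acknowledge the notice shown at service start-up"
    PER_ACTION = "click to accept copyright liability for this request"


@dataclass
class UserConsent:
    """What a given user has agreed to so far (hypothetical model)."""
    accepted_terms: bool = False
    acknowledged_session_notice: bool = False
    # IDs of individual requests for which the user accepted liability.
    accepted_actions: set[str] = field(default_factory=set)


def missing_layers(consent: UserConsent, action_id: str, flagged: bool) -> list[Layer]:
    """Return the consent layers still needed before running a request.

    `flagged` stands in for whatever guardrail decides that the output may
    infringe; that check is assumed here, not implemented.
    """
    missing = []
    if not consent.accepted_terms:
        missing.append(Layer.TERMS_OF_SERVICE)
    if not consent.acknowledged_session_notice:
        missing.append(Layer.SESSION_START)
    if flagged and action_id not in consent.accepted_actions:
        missing.append(Layer.PER_ACTION)
    return missing


# A company converting its own COBOL application to Go.
consent = UserConsent(accepted_terms=True, acknowledged_session_notice=True)
request_id = "convert-billing-cobol-to-go"  # hypothetical request ID

# Prints ['click to accept copyright liability for this request']
print([layer.value for layer in missing_layers(consent, request_id, flagged=True)])

# After the user explicitly accepts liability for this specific request:
consent.accepted_actions.add(request_id)
print(missing_layers(consent, request_id, flagged=True))  # prints []
```

In this sketch the COBOL-to-Go request first reports the one missing layer, the per-action acceptance, and then an empty list once the user has accepted liability for that specific request. Tying the strongest consent to a request ID, rather than a global switch, reflects the idea that the final disclaimer applies only to the specific output at risk.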

Conclusion

As AI continues to evolve, so too must our legal frameworks. Copyright law needs to adapt to the realities of Gen AI. Finding a balance between fostering creativity and protecting intellectual property rights is crucial. This might involve establishing clear guidelines for Gen AI service providers, users, and the legal system itself.

Call to Action

Stay updated with the latest developments in AI and copyright law by subscribing to our newsletter. Engage with us through comments to share your perspectives on how AI is reshaping the creative landscape.


Mike Hogan

My team and I build amazing web & mobile apps for our companies and for our clients. With over $2B in value built among our various companies including an IPO and 3 acquisitions, we've turned company building into a science.
