The long-awaited fair use judgment on Anthropic’s use of copyrighted works to train its “Claude” AI models has just been issued in the Northern District of California, USA. Here are some important takeaways for business owners:
Respect copyright and lawfully acquire any copyrighted material you use, especially when building data libraries. Piracy is a significant legal risk.
Training AI models on copyrighted material can be considered “transformative fair use” (and therefore not infringe copyright), particularly if the outputs do not reproduce the original works. Even where training is fair use, mechanisms that prevent the LLM from producing infringing outputs remain critical.
Different stages of data handling (acquisition, storage, processing, training) may be subject to separate fair use analyses. A transformative end-use does not justify infringing acts earlier in the process.
Maintain robust internal controls and record-keeping for data acquisition and usage, as transparency is critical in litigation. Anthropic’s resistance to disclosing which specific copies were used to train its LLMs, including clawing back a spreadsheet, was held against it by the court.
Interestingly, the court held that the potential for LLMs to generate new works that compete with authors’ works (e.g., alternative summaries or narratives) is not the “kind of competitive or creative displacement that concerns the Copyright Act.” The Act aims to advance original authorship, not shield authors from competition.

