Training the Machine: AI, Copyright, and India’s Statutory Licensing Crossroads

Author: Nikhil Saini
Student, Manipal University Jaipur, Jaipur

💡 3 Quick Takeaways

India’s Copyright Act, 1957 does not currently provide a clear legal basis for the commercial use of copyrighted works in AI training.
The closed-list fair dealing framework under Section 52 is unlikely to offer meaningful protection to commercial AI developers engaged in large-scale training activities.
A carefully designed statutory licensing regime, coupled with fair remuneration mechanisms and safeguards for creators and innovators, may provide a balanced path forward.

Abstract

The explosive growth of generative artificial intelligence (AI) models has exposed a significant gap within India’s copyright framework. The Copyright Act, 1957, designed primarily for a pre-digital creative economy, contains no explicit provision governing the use of copyrighted works as training data for AI systems.

This article addresses two interrelated questions. First, does the large-scale ingestion of copyrighted works for training AI models amount to copyright infringement under Sections 14 and 51 of the Copyright Act, 1957? Second, does the fair dealing exception under Section 52(1)(a) provide any meaningful legal protection for commercial AI developers?

Drawing upon the DPIIT Working Paper on Artificial Intelligence and Copyright (2025), the ongoing litigation in ANI Media Pvt. Ltd. v. OpenAI Inc., and comparative approaches adopted in the United States, the European Union, Japan, and Singapore, the article argues that India’s fair dealing framework is too limited to function as a viable safe harbour for commercial AI training.

The article further evaluates the Department for Promotion of Industry and Internal Trade’s proposed blanket licensing framework and contends that a tiered royalty model, combined with careful TRIPS compliance analysis, is necessary before any statutory licensing regime is implemented.

Keywords: Artificial Intelligence, Copyright Act 1957, Fair Dealing, Text and Data Mining, Statutory Licensing, AI Training, Copyright Infringement

Introduction

Disruptive technologies have repeatedly compelled legal systems to confront new realities. From the printing press to the internet, copyright law has continually faced the challenge of adapting statutory frameworks to technological transformation.

Generative artificial intelligence presents perhaps the most significant challenge yet.

Unlike earlier technologies that merely reproduced or distributed creative works, large language models consume vast quantities of copyrighted material, identify patterns within those works, and generate outputs that may reflect or overlap with the content on which they were trained.

India currently finds itself at a crucial juncture. The country’s generative AI ecosystem has expanded rapidly, growing from approximately 240 startups during the first half of 2024 to more than 890 by the first half of 2025. At the same time, India’s creative economy, valued at approximately USD 43 billion and contributing significantly to national GDP, supplies much of the culturally specific and linguistically diverse content required for high-quality AI systems.

These competing interests generate an important legal question: how can copyright law simultaneously support technological innovation and protect the interests of creators?

The issue has acquired particular significance following ANI Media Pvt. Ltd. v. OpenAI Inc., one of India’s first major AI-copyright disputes, which is currently before the Delhi High Court. Simultaneously, the Department for Promotion of Industry and Internal Trade (DPIIT) has initiated policy discussions concerning copyright reform and AI training data.

Against this backdrop, the article examines whether India’s existing copyright framework can accommodate commercial AI training and explores potential pathways for reform.

AI Training and Copyright Infringement: The Sections 14 and 51 Question

Any assessment of copyright liability must begin with an understanding of how AI systems are trained.

Training a large language model involves much more than simply reading content. The process requires large-scale ingestion of source materials, digitisation, tokenisation, repeated processing of datasets, and storage of statistical relationships within the model.

Each of these activities potentially implicates copyright law.
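The stages described above can be illustrated with a deliberately simplified sketch. It is not how any particular model is trained: the function names, the whitespace tokeniser, and the bigram counts are all hypothetical stand-ins chosen for illustration, and production systems use subword tokenisers and neural network weights rather than simple counts. The sketch only shows where a stored copy of a work arises in the pipeline and where derived statistics replace it.

```python
from collections import Counter


def ingest(source_text: str) -> str:
    # Ingestion (hypothetical): the full text of the work is held in the
    # training corpus before any further processing takes place.
    return str(source_text)


def tokenise(text: str) -> list[str]:
    # Toy whitespace tokeniser (assumption): real systems split text
    # into subword units instead of whole words.
    return text.lower().split()


def bigram_counts(tokens: list[str]) -> Counter:
    # Stand-in for "statistical relationships": how often one token
    # directly follows another in the ingested material.
    return Counter(zip(tokens, tokens[1:]))


work = "The quick brown fox jumps over the lazy dog"
stored_copy = ingest(work)      # a complete copy of the work is stored
tokens = tokenise(stored_copy)  # the copy is converted into tokens
stats = bigram_counts(tokens)   # only pairwise counts are kept at this stage
```

In this toy version, the ingestion step holds the entire work, whereas the final step retains only counts derived from it. Whether either step amounts to reproduction is the legal question the section goes on to address.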

Section 14 of the Copyright Act, 1957 grants copyright owners the exclusive right to reproduce their works in any material form, including by storing them in any medium by electronic means. Section 51 further provides that copyright is infringed when a person undertakes an act reserved exclusively to the copyright owner without authorisation.

The central doctrinal issue is therefore whether the copying and storage involved in AI training constitute reproduction within the meaning of Section 14.

The statutory language suggests an affirmative answer.

The DPIIT Working Paper acknowledges that Indian copyright law presently contains no distinct defence for “non-expressive use.” Reproduction under Section 14 depends upon the act of copying itself rather than the purpose for which the copying occurs.

This position finds some support in international developments. In Kadrey v. Meta Platforms, Inc., a United States District Court observed that even statistical representations generated during AI training may embody elements of creative expression.

The counterargument relies on the distinction between ideas and expression. Proponents of AI training contend that models learn patterns and structures rather than reproducing protected expression itself.

However, Indian copyright law presently lacks broad doctrines such as “transformative use” or “non-expressive use.” As a result, the article argues that large-scale copying of copyrighted works for commercial AI training constitutes prima facie infringement under Sections 14 and 51 in the absence of an express statutory exception.

The Fair Dealing Defence: Why Section 52 Falls Short

Given the potential infringement concerns, AI developers may seek protection under Section 52 of the Copyright Act, which specifies acts that do not constitute copyright infringement.

The article examines three provisions that may appear relevant.

Section 52(1)(a): Fair Dealing

The most frequently cited provision is Section 52(1)(a), which permits fair dealing for purposes such as private or personal use, research, criticism, review, and reporting of current events.

However, the concept of fair dealing under Indian law differs significantly from the American doctrine of fair use.

Unlike the open-ended four-factor analysis available under Section 107 of the United States Copyright Act, India’s fair dealing provision operates as a closed-list exception. The permitted purposes are specifically enumerated and cannot be expanded indefinitely through judicial interpretation.

Although Indian courts have occasionally drawn upon American jurisprudence, the article notes that such reliance remains too uncertain to serve as a basis for commercial planning.

The DPIIT Working Paper similarly concludes that commercial AI training does not comfortably fit within the notion of private or personal use.

Section 52(1)(b): Transient or Incidental Storage

Another possible defence arises under Section 52(1)(b), which exempts the transient or incidental storage of a work purely in the technical process of electronic transmission or communication to the public.

The article argues that AI training datasets do not satisfy this requirement.

Training data is neither transient nor incidental. Rather, it forms a deliberate and essential component of the AI development process.

Section 52(1)(zb): Electronic Storage Exceptions

The article also considers Section 52(1)(zb), which permits electronic storage in limited circumstances.

However, the provision remains tied to specific purposes enumerated elsewhere in Section 52, none of which expressly encompass commercial AI development.

Consequently, the article concludes that Section 52 offers no clear legal pathway for commercial AI training under the current statutory framework.

Global Regulatory Models: Comparative Perspectives

India is not alone in confronting these challenges.

Several jurisdictions have experimented with different approaches to balancing copyright protection and AI innovation.

United States

The United States relies largely upon the flexible fair use doctrine under Section 107 of the Copyright Act.

However, recent litigation demonstrates continuing uncertainty. Courts have reached differing conclusions regarding market harm and transformative use in AI training cases, highlighting the limitations of relying solely upon judicial interpretation.

European Union

The European Union has adopted a more structured approach through the Copyright in the Digital Single Market (CDSM) Directive.

Article 3 establishes a research-oriented text and data mining exception, while Article 4 introduces a commercial text and data mining framework subject to an opt-out mechanism for rights holders.

The article notes criticisms of this approach, particularly concerns that the opt-out system effectively places the burden of protection upon creators and may inadequately safeguard individual rights holders.

Japan and Singapore

Japan and Singapore have adopted more permissive frameworks.

Japan’s Copyright Act permits use of copyrighted works for information analysis under specified circumstances, while Singapore’s Copyright Act, 2021 allows computational data analysis without obtaining prior consent from rights holders.

Although these approaches encourage innovation, the article argues that they may not be suitable for India because they provide little direct remuneration to creators within a substantial and rapidly growing creative economy.

The DPIIT Working Paper and Proposed Licensing Framework

Following consultations with technology companies, content creators, and other stakeholders, the DPIIT Working Paper rejected both blanket exceptions and pure opt-out systems as standalone solutions.

The committee behind the Working Paper concluded that:

A zero-cost exception may undermine creative incentives;
Pure opt-out systems disadvantage smaller creators; and
A licensing-based approach offers a more balanced solution.

The committee therefore proposed a compulsory blanket licensing framework administered through a centralised body known as the Copyright Royalties Collective for AI Training (CRCAT).

Under this model:

AI developers would receive access to copyrighted works for training purposes;
Rights holders would be compensated through royalty payments;
Developers would contribute a percentage of commercial revenues;
A central collecting body would distribute royalties; and
Developers would be required to provide meaningful disclosures regarding training datasets.
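The revenue-based flow listed above can be sketched as simple arithmetic. Everything specific here is an assumption made for illustration: the Working Paper, as described in this article, does not fix a royalty rate or a distribution formula, so the 2% rate (200 basis points), the pro-rata split, the usage weights, and the function names are hypothetical.

```python
def royalty_pool(commercial_revenue: float, rate_bps: int) -> float:
    # Developer contribution: a percentage of commercial revenue,
    # expressed in basis points (hypothetical rate, 100 bps = 1%).
    return commercial_revenue * rate_bps / 10_000


def distribute(pool: float, weights: dict[str, float]) -> dict[str, float]:
    # Pro-rata split by weight (assumption): the central body could
    # divide the pool in many other ways.
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {holder: pool * w / total for holder, w in weights.items()}


# Hypothetical figures, in unspecified currency units.
pool = royalty_pool(1_000_000, rate_bps=200)
shares = distribute(pool, {"publisher_a": 3, "author_b": 1})
```

With these illustrative inputs the pool is 20,000 units, split 15,000 and 5,000. The sketch also makes one dependency visible: any weight-based split presupposes the dataset disclosures that the model requires of developers.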

The proposal also contemplates retrospective compensation for AI systems already commercialised using copyrighted material.

The article notes that the CRCAT model builds upon existing collective management structures recognised under Section 33 of the Copyright Act, thereby maintaining institutional continuity within the Indian copyright framework.

Critical Analysis of the Proposed Framework

While supportive of the DPIIT’s general direction, the article identifies several concerns requiring further attention before legislative implementation.

Startup Viability

The proposed revenue-sharing model may function effectively for established AI companies but could create barriers for early-stage startups lacking commercial revenue.

The article suggests that a tiered royalty structure or deferred payment mechanism may be necessary to avoid discouraging innovation.
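One way to picture a tiered structure is a marginal schedule, similar to income tax brackets, in which revenue below a threshold attracts no royalty and higher bands attract progressively higher rates. The thresholds and rates below are purely hypothetical and appear nowhere in the DPIIT proposal; the sketch only shows how such a schedule would spare pre-revenue startups while still scaling with commercial success.

```python
# Hypothetical schedule: (upper revenue bound, rate in basis points).
# None of these figures comes from the DPIIT Working Paper.
TIERS = [
    (1_000_000, 0),       # first band: exempt
    (10_000_000, 100),    # second band: 1%
    (float("inf"), 300),  # top band: 3%
]


def tiered_royalty(revenue: float) -> float:
    # Apply each band's rate only to the revenue falling inside that band.
    owed = 0.0
    lower = 0.0
    for upper, rate_bps in TIERS:
        if revenue <= lower:
            break
        in_band = min(revenue, upper) - lower
        owed += in_band * rate_bps / 10_000
        lower = upper
    return owed


startup = tiered_royalty(500_000)        # entirely within the exempt band
scaleup = tiered_royalty(5_000_000)      # 4,000,000 charged at 1%
incumbent = tiered_royalty(20_000_000)   # 9,000,000 at 1% plus 10,000,000 at 3%
```

Under these assumed figures the three developers would owe 0, 40,000, and 390,000 units respectively. A deferred payment mechanism would be a separate design, postponing when amounts fall due rather than changing how much is owed.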

Retroactive Liability

The proposal’s retrospective compensation mechanism raises concerns regarding legal certainty and proportionality.

Although compensating creators for past use may be normatively attractive, the scope and calculation of retrospective obligations require careful consideration.

TRIPS Compatibility

India’s obligations under the Berne Convention and the TRIPS Agreement necessitate compliance with the three-step test governing copyright limitations and exceptions.

The article argues that any future licensing regime should be accompanied by a formal analysis of compatibility with international intellectual property obligations.

Governance of Collective Management Organisations

The success of CRCAT depends upon effective governance.

The article notes that collective management organisations in India have historically faced concerns regarding transparency, accountability, and timely distribution of royalties.

Accordingly, the governance framework for CRCAT would need to be significantly strengthened if it is to manage a nationwide AI licensing system involving potentially millions of creators.

Conclusion

India stands at a critical crossroads in the development of AI and copyright law.

The ongoing ANI v. OpenAI litigation highlights questions that the Copyright Act, 1957 was never specifically designed to address. At the same time, legislative reform offers an opportunity to create a framework capable of balancing innovation with protection of creative labour.

This article advances three central propositions.

First, large-scale copying of copyrighted works for commercial AI training constitutes prima facie infringement under the current Copyright Act, while the existing fair dealing framework provides little meaningful protection for commercial AI developers.

Second, comparative international experience suggests that neither zero-cost exceptions nor pure opt-out systems adequately balance the interests of creators and innovators.

Third, the DPIIT’s proposed licensing framework represents a promising direction but requires important refinements before implementation, including tiered royalty mechanisms, carefully defined retroactivity provisions, robust governance standards, and comprehensive TRIPS compatibility review.

Ultimately, the debate surrounding AI training data concerns how the costs of technological progress should be allocated. The emerging Indian approach seeks to recognise those costs as a matter of remuneration rather than externalising them entirely onto creators.

The challenge now lies in translating that policy vision into legislation that is doctrinally coherent, technologically responsive, and capable of securing confidence among both the creative community and the AI industry.

Disclaimer: The views expressed in this article are those of the author and do not necessarily reflect the views of The Lawscape.
