ANI v. OpenAI: Where Will the Court Draw the Line?

Authors: Rishi Anand (Partner), Riddhi Misra (Principal Associate) and Deeksha Dubey (Associate)

Introduction

The past decade has marked a revolution in the technology space, with artificial intelligence (“AI”) evolving from a futuristic abstraction into a present-day reality. AI has not only rapidly transformed how information is created, accessed, and consumed but has also become part of everyday life. At the same time, its growing presence is testing the limits of existing legal frameworks, particularly in areas such as intellectual property, data use, and liability.

In this article, we delve into the increasingly debated question of whether the use of copyrighted material to train AI systems without authorisation is permissible or constitutes infringement under existing copyright law. This tension stems from the nature of AI itself. On one hand, developers require data at an unprecedented, almost planetary scale to build effective models. On the other hand, much of this data, including news articles, books, music, and other creative works, is protected by copyright and owned by individuals and organisations entitled to control and monetise its use. In India, this conflict has come to the forefront in ANI Media Pvt. Ltd. v. OpenAI Inc. & Anr. (CS(COMM) 1028/2024) (“ANI v. OpenAI”), in which the Hon’ble Delhi High Court (“Court”) has heard the matter over multiple dates and has now reserved judgment. The case represents the first significant judicial examination of whether AI training practices are compatible with the Indian Copyright Act, 1957 (“Act”), a statute enacted long before the advent of the digital age, let alone AI. With the judgment in ANI v. OpenAI now reserved, the outcome is expected to have far-reaching implications, not only for AI developers and content creators but also for the future trajectory of copyright law in India.

    How large language models use training data

    To understand the legal dispute in ANI v. OpenAI, it is pertinent to consider the broader legal and technological context. The first step is to briefly examine how large language models (“LLMs”) are built. LLMs sit at the centre of this revolution as the core engine of modern AI systems. They are trained on vast volumes of data to generate responses, and training begins with large-scale ingestion of data from datasets like ‘Common Crawl’, which contain vast amounts of publicly available, web-scraped content. This includes, among other material, news articles that are often republished across platforms, allowing content created by organisations like ANI to enter training datasets through broad scraping rather than deliberate selection. Developers may also use proprietary web crawlers, which operate on an opt-out basis. This means website owners can prevent such AI crawlers from accessing or using their content by specifying restrictions in files like robots.txt. This practice of opt-out shifts the burden onto content creators to block access by AI web-crawlers, while raising questions about effectiveness, particularly where the same content remains accessible through other third-party platforms or licensed sources.
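    The opt-out mechanism described above works through a plain-text robots.txt file placed at the root of a website. The following is an illustrative sketch only; crawler names vary by operator, with GPTBot and CCBot being the user agents publicly documented by OpenAI and Common Crawl respectively. A publisher wishing to exclude its content from AI crawling might publish:

```
# robots.txt — illustrative sketch of an AI-crawler opt-out
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Common Crawl's crawler, whose archives feed many training datasets
User-agent: CCBot
Disallow: /

# All other crawlers (e.g. ordinary search engines) remain unrestricted
User-agent: *
Allow: /
```

    As the article notes, compliance with such directives is voluntary, and they do not reach copies of the same articles republished on third-party sites.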

    The web-crawled data is then processed through tokenisation, converting text into numerical representations that enable the model to learn patterns within language. While AI developers argue that this does not involve storing original works in a readable form, concerns arise where AI generates outputs that are substantially similar, or even identical, to the original content, suggesting possible ‘memorisation’ of protected work. This is followed by reinforcement learning, a method in which an AI system improves by learning from feedback on its responses. While this helps make outputs more fluent and coherent, incorrect feedback may also lead to inaccurate or fabricated outputs, including false attribution in news-related contexts. Advanced AI systems also incorporate real-time retrieval capabilities, allowing them to access and summarise live information from the internet. While this enhances usability, it blurs the line between summarisation and substitution, particularly where users can obtain the substance of original work without accessing the source. Taken together, these processes illustrate why the legal issues are complex: each stage of data ingestion, processing, and output affects rights protected under copyright law. The central question before the Court in ANI v. OpenAI is whether these processes merely facilitate learning, or whether they amount to unauthorised use of protected expression.
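    The tokenisation step described above can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration: production LLM tokenisers use subword schemes such as byte-pair encoding rather than whole-word lookup, but the principle, that text is converted into a sequence of integer IDs from which the model learns statistical patterns, is the same.

```python
def build_vocab(corpus):
    """Assign a unique integer ID to every distinct word seen in the corpus."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def tokenise(text, vocab):
    """Convert text into the numerical representation a model trains on."""
    return [vocab[w] for w in text.lower().split() if w in vocab]


# Toy corpus standing in for web-scraped training data
corpus = ["News agencies publish articles", "Models train on articles"]
vocab = build_vocab(corpus)
ids = tokenise("models publish news", vocab)  # sequence of integer IDs
```

    Nothing in this sketch stores the original text in readable form, which is essentially the developers’ argument; the counter-argument is that patterns learned from billions of such ID sequences can still reproduce protected expression at the output stage.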

    The Indian Copyright Framework

    The resolution of this dispute ultimately rests on the framework of the Act, which defines both the rights of authors and the limits on how their works can be used by others. Although the Act was enacted long before the digital age, its provisions are broad enough to pose critical questions when applied to modern technologies like AI. Under the Act, authors are granted exclusive rights to reproduce, communicate, adapt, and distribute their works, and these rights arise automatically upon creation; no registration is required. News articles are treated as “literary works” under Section 13 of the Act, and the Supreme Court in Eastern Book Company v. D.B. Modak clarified that copyright subsists where a work reflects a certain level of creativity, a threshold that ANI’s news content is likely to satisfy. The Act also provides for certain exceptions under Section 52. However, courts in India have generally treated these exceptions as exhaustive. In other words, if a particular use does not fall within one of the listed exceptions, it would generally amount to infringement, regardless of how useful or innovative the purpose may be. This is an important difference from U.S. copyright law, where courts apply a flexible “fair use” doctrine and can recognise new categories of permissible use through a broader, case-by-case analysis under Section 107 of the Copyright Act of 1976 (17 U.S.C. § 107).

    The exception most relevant in this context is Section 52(1)(a), which allows “fair dealing” with a work for purposes such as private or personal use, including research. However, Indian courts have generally interpreted this provision quite narrowly. They tend to look at factors such as how much of the work has been used, whether the use has a commercial element, and what impact it has on the market for the original work. A key limitation is that the exception is meant for private or personal use. Once there is any element of commercial use, even indirectly, the protection of Section 52 of the Act becomes difficult to claim. This creates a challenge for AI systems, which are ultimately developed and deployed as commercial products. While the 2012 amendments did introduce a limited allowance for “transient or incidental” storage under clauses (b) and (c) of Section 52(1), Indian courts have clarified that this applies to routine technical processes such as browsing or caching. It does not extend to the kind of large-scale and sustained use of data involved in AI training.

    Unlike some other jurisdictions, India does not currently have a specific exception for text and data mining (“TDM”). For example, the European Union (“EU”) allows certain forms of TDM, including limited commercial use, as long as rightsholders have not expressly reserved their rights. Japan has also introduced similar provisions. In contrast, Section 52 of the Act offers no clear basis for large-scale use of copyrighted material for AI training. This gap lies at the heart of the ANI v. OpenAI case, raising an important question for the Court: whether such an exception can be read into the existing framework through interpretation, or whether it is a matter that should be left to the legislature.

    The Dispute: Arguments and Interim Developments

    ANI’s case rests on three primary claims: First, it contends that OpenAI reproduced its copyrighted articles or substantial parts of them during the training process, thereby allegedly infringing its reproduction right under Section 14(a)(i) of the Act. Second, it argues that ChatGPT’s outputs, which allegedly paraphrase or closely restate ANI news content, may implicate the adaptation and reproduction rights under Section 14 of the Act. Third, and most heavily relied upon, it submits that allowing a commercially operated chatbot to substitute for the original source materially undermines the market for licensed syndication of ANI’s journalism.

    OpenAI’s primary substantive defence is that training constitutes research or private use within the meaning of Section 52(1) and that, in any event, it is exempted under the “fair dealing” principle, having regard to the limited and transformative nature of the copying involved. It also raises a jurisdictional objection: as a US corporation whose servers are located outside India, it questions whether the Court has territorial jurisdiction at all. ANI, however, has argued that jurisdiction may still arise in India, since the alleged harm is felt here and ChatGPT is accessible to Indian users. The issue of territorial jurisdiction remains open for determination by the Court.

    At an early stage of the proceedings, the Court was assisted by expert submissions on the broader legal and policy questions involved. These reflected divergent views on the application of copyright law to AI training. One line of argument emphasised that the use of copyrighted content for training could amount to infringement, and that the fair dealing exception under Indian law, being limited in scope, may not extend to such use. An alternative view suggested that temporary storage and processing of data for training purposes may not, by itself, be infringing, and that the more relevant inquiry lies in whether copyrighted material is reproduced at the output stage. The experts also indicated that the absence of a clear TDM exception in Indian law raises a significant question in the context of large-scale AI training. More broadly, Government submissions and parliamentary responses have acknowledged this gap and suggested that reforms in this area may be under consideration.

    International Precedents and Comparative Perspectives

    No common law court has yet delivered a final ruling on AI training and copyright, but the landscape of pending litigation and emerging doctrine offers useful guidance. In the United States (“US”), the pivotal fair-use precedent is Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015), where the Second Circuit upheld Google’s mass digitisation of library books on the basis that the resulting search index was “highly transformative” and did not serve as a market substitute for the originals. AI defendants have eagerly invoked Google Books, but critics note a crucial distinction: Google displayed only short “snippets” of source text, whereas a generative model may produce lengthy passages that do function as substitutes for the original.

    More directly on point is The New York Times Company v. Microsoft Corporation and OpenAI, No. 1:23-cv-11195 (S.D.N.Y., filed December 2023). In this case, The New York Times Company has demonstrated through carefully constructed prompts that ChatGPT and GPT-4 can be induced to reproduce verbatim paragraphs from its articles, including content sitting behind a paywall. OpenAI’s principal response has been to invoke the transformative-use doctrine and to characterise any verbatim reproduction as an unintended “memorisation” artefact rather than a design choice. The litigation is ongoing, but the evidentiary record showing near-perfect replication of journalistic prose has strengthened plaintiffs’ arguments in AI‑training cases worldwide, and the case has been cited in commentary and policy discussions, including in the context of the ANI proceedings before the Court.

    In the United Kingdom (“UK”), the Copyright, Designs and Patents Act 1988 (Section 29A) permits TDM only for non-commercial research without the copyright owner’s consent. Following its 2022 proposal for a broad commercial TDM exception, which was abandoned after widespread opposition from rights holders, the UK Government launched a fresh Copyright and AI Consultation in late 2024. Having received responses to the consultation, the UK Government has yet to commit to a broad commercial TDM exception, with the position remaining under active legislative review. The position under English law thus remains unsettled in relation to commercial AI training, with no broad statutory exception currently in place. The EU landscape, by contrast, is shaped by the Digital Single Market Directive, which provides a non-waivable TDM exception for scientific research, alongside a separate general TDM exception extending to commercial uses from which rightholders may opt out by expressly reserving their rights in an appropriate manner; in practice, the scope of that reservation determines how far European publishers can exclude their content from AI training datasets.

    Japan presents the most permissive framework: its Copyright Act (Act No. 48 of 1970), as amended in 2018 by Act No. 30 of 2018 to introduce Article 30-4, broadly permits information analysis, including commercial AI training. However, two important statutory guardrails apply: the exception does not cover uses where the purpose includes “enjoyment” of the copyrighted expression (meaning AI outputs that substantially replicate training works are not protected), nor does it apply where use “unreasonably prejudices the interests of the copyright owner,” for instance, reproducing a database sold specifically for AI training purposes. Japan’s Agency for Cultural Affairs clarified both limits in May 2024 guidelines. Subject to these caveats, the framework offers considerably more flexibility than India or the EU. It has attracted both admiration and criticism; it is nonetheless unlikely to serve as a model for India, whose domestic content industry has a strong lobbying interest in the outcome.

    Indian courts may also draw on the Supreme Court’s approach to interpreting copyright exceptions, as reflected in Super Cassettes Industries Ltd. v. Music Broadcast Pvt. Ltd., where the Court emphasised that such exceptions must be understood in a manner that gives effect to their purpose, but without stretching the language of the statute beyond its ordinary meaning. In that light, courts should be cautious both about reading a new TDM exception into Section 52 and about interpreting the phrase ‘private use’ so broadly as to encompass large‑scale commercial use of copyrighted data.

    What’s at Stake: Two Scenarios

    The Court now finds itself at a critical juncture, tasked with balancing the encouragement of innovation against the protection of established rights, making it essential to examine both the technological foundations of AI and the evolving legal principles that seek to govern it. The eventual judgment will, in substance, produce one of two outcomes: either a finding that training without a licence is an infringement, or a finding that it falls within fair dealing, and each carries significant downstream consequences. It is also possible that the Court does not reach the merits at this stage and instead simply preserves the status quo pending legislative action, but even that choice would carry its own message to the market. We have briefly analysed the two possible outcomes below.

    Scenario A: Training Without a Licence Constitutes Infringement

    If the Court holds that training on copyrighted works without authorisation constitutes infringement, whether by treating the transient copies made during ingestion as reproductions, or by treating model weights that encode the statistical patterns of source texts as adaptations, the consequences for the Indian AI landscape would be immediate and far-reaching.

    Developers of foundation models would face the prospect of requiring licences for every substantial corpus they use, mirroring the licensing ecosystems that exist in the music industry. This is not inherently unworkable; several major publishers and news agencies have already negotiated data-licensing agreements with AI companies, but it would impose compliance costs that may disproportionately burden smaller Indian AI start-ups compared with well-resourced multinational companies. There is a credible concern that a strict infringement finding, unaccompanied by clear legislative guidance on what a compliant training programme looks like, would create a “chilling effect” on AI research and development in India at precisely the moment when domestic investment in the sector is accelerating.

    On the other side of the ledger, a finding of infringement would vindicate the property rights of Indian content creators like journalists, authors, academics, and film-makers who currently have no practical means of preventing or being compensated for the use of their work in AI training. It would also align India more closely with the precautionary approach that the EU has adopted, offering a degree of international regulatory coherence that may matter to Indian companies seeking to operate across jurisdictions. The Court would also need to consider the burden of proof in establishing copyright infringement. Given the opaque nature of large-scale training systems, plaintiffs would at the very least need to show that their specific works were included in the training dataset. This is precisely why disclosure-related directions may assume significance during such proceedings.

    Remedially, a bare declaration of infringement would be of limited practical value unless accompanied by injunctive relief, damages, or both. A permanent injunction against further training on Indian news content would be technically difficult to enforce but would create powerful incentives for licensing. Any assessment of damages, whether based on notional licence fees or an account of profits, would face a practical difficulty: it would be extremely hard to determine the contribution of any individual article to the overall commercial value of a model trained on hundreds of billions of tokens.

    Scenario B: Fair Dealing and the Limits of Section 52

    If, conversely, the Court holds that AI training constitutes fair dealing for research or private use within Section 52(1)(a), that finding would carry its own difficulties. The most obvious is textual: OpenAI’s use of ANI’s articles is plainly commercial, and the phrase “private or personal use” in Section 52(1)(a) does not readily extend to the operations of a for-profit company generating revenue from a commercially deployed product. A court that stretches the provision to cover such uses would be doing precisely what the Supreme Court has warned against: departing from the natural language of exceptions in pursuit of a judicially preferred policy outcome.

    A fair-dealing finding would, however, spare the Indian Legislature from having to legislate urgently on a technically complex subject. It would create immediate breathing room for the Indian AI industry and would align India functionally with the US, where the fair-use doctrine has historically been deployed to protect new technologies against copyright challenges. The risk is that, in the absence of a clearly defined opt-out mechanism, rights holders may be left without any practical remedy even where they object to the use of their works for commercial AI training. This may raise questions regarding consistency with India’s obligations under the Berne Convention and the TRIPS Agreement, which require that limitations on exclusive rights satisfy the three-step test, i.e., that they apply only in certain special cases, do not conflict with normal exploitation of the work, and do not unreasonably prejudice the author’s legitimate interests.

    A finding in OpenAI’s favour on the merits would also not resolve the output-stage claim. Even if training is shielded, a separate analysis is required for instances where ChatGPT reproduces ANI content verbatim or near verbatim. That reproduction occurs at the point of user interaction, not during training, and cannot be sheltered by a research-or-private-use exception if the user is receiving the output through a commercial platform. This dimension of the case may ultimately prove easier for the Court to resolve against OpenAI than the training question.

    Conclusion

    ANI v. OpenAI is not just a dispute between a news agency and a technology company. It raises a larger question of whether a copyright law drafted in an earlier era can be meaningfully applied to modern AI systems, or whether it needs to be updated to address these new realities. As the law currently stands, the Act does not comfortably accommodate large-scale commercial AI training within its existing exceptions. At the same time, a strict finding of infringement without clear legislative guidance may create uncertainty for both rights holders and developers. The Government’s acknowledgement of this gap suggests that reform is already under consideration. In this context, a narrowly framed judicial approach focused on the facts of the case, particularly instances of verbatim reproduction, may offer a more balanced path, while leaving broader policy questions to the Indian Legislature.

    Irrespective of the final outcome, the case has already brought issues of AI training data, copyright protection, and creators’ rights to the forefront of India’s technology policy discourse. This is also reflected in the DPIIT’s Working Paper on Generative AI and Copyright (December 8, 2025), released by a committee constituted on April 28, 2025 (“Working Paper”), to examine AI-copyright issues and recommend reforms. After reviewing approaches in the US, UK, EU, Japan, and Singapore, the committee concluded that no single existing model adequately serves India’s objectives. The challenge, as the Working Paper recognises, lies in balancing innovation with fair compensation. A purely binary approach, either permitting unrestricted use or imposing a complete restriction, may not be sustainable. Instead, a more balanced framework could be considered. One possible approach is a structured licensing model, such as a centralised or blanket licensing mechanism, which allows access to data while ensuring remuneration to rights holders. A further challenge lies in the opacity of training datasets, which makes it difficult for rights holders to establish whether their works were used at all. Therefore, transparency measures such as requiring AI developers to provide meaningful disclosures about training data sources can help address concerns around accountability and enforceability, while being designed in a way that does not unduly burden innovation.

    Such an approach would help balance the interests of India’s content industries with its ambitions in the AI sector, while also signalling a responsible and forward-looking regulatory stance. The outcome of this case will not settle the issue entirely, but it will shape the direction of the debate. What is clear, however, is that the question can no longer be deferred; the proceedings in ANI v. OpenAI have ensured that India must now actively define how copyright law will interact with the future of AI.

    Disclaimer: This article represents our understanding and interpretation of the relevant laws as on the date hereof and is provided without expressing any opinion, advice, or recommendation. The interpretations set out herein are subject to change, and there can be no assurance that any regulator, authority, or judicial body will concur with or adopt a position consistent with our views expressed in this article. This article is furnished solely for academic and informational purposes and should not be construed as legal advice or relied upon for any purpose whatsoever.