Data Scraping: Through the Lens of IP Law

Data scraping, i.e., the extraction of data from websites through automated processes, is used across industries for purposes such as market research, consumer analytics, and business strategy. However, its misuse can lead to violations of Intellectual Property Rights (IPR) and privacy laws, as well as breaches of a website’s terms of use. This post gives an overview of the legal intricacies involved in data scraping and explores the interplay between data scraping and IP rights.
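To make “automated extraction” concrete, the following is a minimal sketch in Python of what a scraper does at its core: it reads a page’s markup and pulls out selected fields. The page content, the class names (`profile`, `name`, `title`) and the helper names (`ProfileScraper`, `scrape`) are all hypothetical and chosen for illustration only; the markup is inlined so that the example does not access any real website.

```python
from html.parser import HTMLParser

# Hypothetical page markup, inlined so the sketch runs without network access.
SAMPLE_PAGE = """
<html><body>
  <div class="profile"><span class="name">A. Example</span><span class="title">Analyst</span></div>
  <div class="profile"><span class="name">B. Sample</span><span class="title">Engineer</span></div>
</body></html>
"""


class ProfileScraper(HTMLParser):
    """Collects the text of <span class="name"> and <span class="title"> per profile."""

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per <div class="profile">
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "profile":
            self.records.append({})
        elif tag == "span" and cls in ("name", "title") and self.records:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a span we are interested in.
        if self._field:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(html):
    """Return the extracted records for one page of markup."""
    parser = ProfileScraper()
    parser.feed(html)
    return parser.records


records = scrape(SAMPLE_PAGE)
print(records)
```

A real scraper would typically repeat this over many pages fetched automatically, which is where the questions of authorization, terms of use and copyright discussed in this post arise.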

Towards the end of June 2023, a class action lawsuit was filed against OpenAI for data scraping. It was claimed that the company’s Artificial Intelligence (AI) tools, ChatGPT and DALL-E, use “stolen private information”, thereby violating terms of service agreements along with state and federal privacy laws. Days after this suit was voluntarily dismissed, another class action lawsuit was instituted by the Authors Guild and renowned writers, including John Grisham and George R. R. Martin, seeking an injunction restraining OpenAI from infringing their copyrights. The complaint, filed before the US District Court for the Southern District of New York, alleges that OpenAI copied the Plaintiffs’ works and used them to train its AI language models.

Further, concerns over AI data scraping led X (formerly Twitter) to temporarily bar users from viewing tweets or browsing the platform without logging in, and to limit the number of posts that users could read.

These instances reflect the upsurge in the use of data scraping in the technology-driven world and demonstrate the need to have a comprehensive legal framework to address the issues that arise from such use.

LinkedIn Data Scraping Case and User Agreement

The landmark ruling in the data scraping dispute between LinkedIn and hiQ Labs helps in understanding the legal intricacies involved in this domain. The key aspects of the case are discussed below.

LinkedIn sent a cease-and-desist letter to hiQ, a data analytics company, for scraping data from public LinkedIn profiles for its business. The letter stated that any future access would violate provisions of the Digital Millennium Copyright Act of 1998, the Computer Fraud and Abuse Act of 1986 (CFAA), the California Penal Code and the California common law of trespass, and LinkedIn took measures to block hiQ’s access to LinkedIn profiles. Following this, hiQ approached the US District Court, seeking an injunction against LinkedIn. A preliminary injunction was issued in 2017, and the Ninth Circuit upheld this decision.

Thereafter, LinkedIn filed a petition before the Supreme Court seeking a writ of certiorari, challenging the said order of the Ninth Circuit. In view of the Van Buren decision[1], which interpreted the phrase “exceeds authorized access” under the Computer Fraud and Abuse Act of 1986, the Court remanded the matter to the Ninth Circuit for reconsideration. The said Act imposes criminal liability on anyone who “intentionally accesses a computer without authorization or exceeds authorized access”.

Reaffirming its earlier stand in a ruling dated April 18, 2022, the Ninth Circuit observed that the “gates-up-or-down inquiry” from the Van Buren case envisages two situations: one where authorization is necessary and has been given (the gates are up), and another where authorization is required but has not been granted (the gates are down)[2]. It held that in the case of public websites there are “no gates to lift or lower in the first place”, and that there was therefore no violation of the provisions of the said Act. However, the Ninth Circuit clarified that even if the CFAA does not apply, other causes of action, such as copyright infringement, breach of contract, or violation of privacy law, can still be pursued.

Since it could not be demonstrated that the provisions of the CFAA were violated, LinkedIn based its argument on the violation of its user agreement (which expressly prohibited the scraping of profiles, information, technology, etc.) and, accordingly, sought summary judgment. In a decision dated November 4, 2022, the United States District Court for the Northern District of California held that hiQ Labs had breached LinkedIn’s user agreement by scraping data from public profiles on LinkedIn. In the following month, the parties settled the dispute by way of a consent judgment and permanent injunction.

Data Scraping under the Indian Copyright Law

If data scraping infringes the copyright of the website owner, it may result in legal action. As per the provisions of the Copyright Act, 1957 (“Act”), copyright subsists in original literary, dramatic, musical, and artistic works, cinematograph films and sound recordings. The applicable type of work depends on the data that is scraped. If the data is in the form of text, it could constitute a literary work under the Act, in which case the owner has the exclusive right to reproduce the work and to make copies, translations, and adaptations of it. Likewise, if the data that is scraped is an artwork, it could qualify as an artistic work, and so on.

The exercise of such rights without a license from the owner of the copyright amounts to copyright infringement. However, Section 52 of the Act lists specific acts that are not treated as infringement. For instance, fair dealing with a work (for private use, criticism, review, or reporting of current events) is excluded from the purview of infringement. It follows that data scraping would be compliant with copyright law if done in the manner and for any of the purposes enumerated under the said provision.

In the case of Eastern Book Company & Ors. v. DB Modak & Anr.[3], the Appellants approached the Supreme Court alleging that the copying of the case reports included in their journal, Supreme Court Cases (SCC), onto the Respondents’ CD-ROMs amounted to copyright infringement under Section 51 of the Act. It was contended that the said case reports were the Appellants’ original literary works, and they had the exclusive right to make copies of the said works under Section 14. 

The Court noted that a joint reading of Section 2(k), proviso (d) to Section 17 and Section 52(1)(q)(iv) makes it clear that the reproduction or publication of any judgment or order of a Court, Tribunal or other judicial authority would not constitute an infringement of the government’s copyright unless the Court, Tribunal, or judicial authority prohibits the same.

After examining the inputs added by the Appellants, including the addition of Section or rule numbers and the completion or correction of case names and citations, the Court held that these did not meet the required standard of creativity. It observed that, in order to establish copyright, the applicable creativity standard is “not that something must be novel or non-obvious, but some amount of creativity in the work to claim a copyright is required”. However, the Court accepted that the Appellants had copyright in their inputs pertaining to the depiction of judges’ opinions as dissenting, partly dissenting, etc.

Theft of Trade Secrets

Using data scraping to gather confidential information about competitors in order to gain a competitive advantage could amount to the misappropriation of trade secrets by way of industrial espionage. In an instance from the United States, a former Uber employee alleged in a letter made public in 2017 that Uber had stolen trade secrets and used data scraping techniques to obtain information from its competitors’ platforms.[4] India, however, has no specific legislation governing trade secrets; such rights are enforced through principles of equity or a common law action for breach of confidence. If offences under criminal law are involved, one can seek relief under the Indian Penal Code, 1860 and the Information Technology Act, 2000.

The Way Forward

The legal position of data scraping in India is still unsettled because neither the enacted laws nor the proposed measures make any direct reference to it. Hence, stakeholders must carefully assess the applicable terms of use and the IP implications before scraping. Though one can always follow best practices to avoid legal pitfalls, this does not negate the need for clear rules governing the protection of intellectual property and the avoidance of dishonest or harmful behaviour in data scraping. It will be interesting to see how the law develops as the discussion around these topics evolves.

References:

[1] Van Buren v. United States, 141 S. Ct. 1648, 1649 (2021)

[2] https://casetext.com/case/hiq-labs-inc-v-linkedin-corp-5

[3] Eastern Book Company & Ors. v. DB Modak & Anr., CA No. 6472 of 2004

[4] https://www.nytimes.com/2017/12/15/technology/uber-letter-illegal-spying.html

Image Credits:

Photo by Lewis Kang’ethe Ngugi on Pexels
