Does your algorithm care for copyrights, or for data protection regulation? | Photo: Markus Spiske | Free use

Web Scraping Social Media: Pitfalls of Copyright and Data Protection Law

The increasing popularity of web scraping methods does not come without a plethora of legal questions. In our first article, we analyzed the growing popularity of web scraping methods and how the Terms of Service of the social media platforms relate to this issue. In this article we discuss further questions of copyright law and data protection law regarding web scraping. The German legal situation in copyright law is discussed as an example here.

Web scraping and copyright

Web scraping for research purposes may be permitted from a copyright point of view. Two aspects have to be taken into account: On the one hand, text content published on Facebook can be a protected language work in accordance with Section 2 paragraph 1 no. 1 of the German Act on Copyright and Related Rights (German: Urheberrechtsgesetz or UrhG). On the other hand, Facebook as a database manufacturer may invoke a special intellectual property right under Section 87a ff UrhG. Web scraping procedures can affect these rights. They technically require a reproduction of contributions.

First, web scraping could violate the rights of social media users who are the authors of text contributions. Assuming that at least some of the contributions enjoy copyright protection, either the rights holder would have to license them or a legal permit would have to apply (Section 44a ff UrhG). Licensing each individual contribution, however, is unrealistic. Our attention must therefore turn to the legal permits.

In this case, Section 60c paragraph 1 and paragraph 3 UrhG are particularly relevant. This regulation allows the reproduction of small-scale works for the purposes of non-commercial scientific research to the necessary extent. Posts on social networks, such as Facebook, are typically such small-scale works. They are usually less extensive than poems, lyrics or printed works of up to 25 pages, which are mentioned by the legal rationale as typical works of small scope. Technical precautions can also ensure that web scraping procedures only cover contributions of a certain size. The copyrights of the users are therefore not infringed by web scraping.
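The technical precaution mentioned above, restricting scraping to contributions of a certain size, could be implemented as a filter that runs before any post is stored. The following Python sketch is only an illustration under stated assumptions: the `Post` type, the `filter_small_posts` function and the `MAX_CHARS` threshold are hypothetical names, and the threshold value is an arbitrary placeholder, not a figure taken from the UrhG or its legal rationale.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical upper bound on post length, in characters.
# Placeholder for illustration only; the value is not derived from the law.
MAX_CHARS = 2000


@dataclass(frozen=True)
class Post:
    """Minimal stand-in for a scraped post (assumed structure)."""
    post_id: str
    text: str


def filter_small_posts(posts: Iterable[Post], max_chars: int = MAX_CHARS) -> List[Post]:
    """Keep only posts whose text does not exceed max_chars characters."""
    return [p for p in posts if len(p.text) <= max_chars]


if __name__ == "__main__":
    sample = [Post("a", "short post"), Post("b", "x" * 5000)]
    kept = filter_small_posts(sample)
    print([p.post_id for p in kept])
```

Which size still counts as "small-scale" is a legal question; the code can only enforce whatever limit has been chosen on that basis.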

Second, the question remains whether scraping infringes the rights of the provider of a social media platform as a manufacturer of a database in the copyright sense. This would be the case if during scraping, a substantial part of the database were to be reproduced by type or scope (Section 87b paragraph 1 sentence 1 UrhG) or if a part of the database, which is insignificant measured by type and scope, were to be repeatedly and systematically reproduced, provided that these acts run counter to a standard evaluation of the database or unreasonably impede the legitimate interests of the database-producing company (Section 87b paragraph 1 sentence 2 UrhG).

A substantial part of a database will at least not be reproduced if scraping is technically limited to the required extent within the framework of the research objective. Assuming that scraping is limited to posts relevant to individual topics, such as posts including certain hashtags, from the numerous written contributions available on a social media platform, there is usually no reproduction of substantial parts of a database.
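Limiting the collection to posts on individual topics, as described above, might look roughly like the following Python sketch. It assumes that posts are available as plain strings and that a hashtag is a `#` followed by word characters; the function names `extract_hashtags` and `filter_by_hashtags` are hypothetical and chosen only for this example.

```python
import re
from typing import Iterable, List, Set

# Assumed hashtag syntax: '#' followed by letters, digits or underscores.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> Set[str]:
    """Return the lowercased hashtags found in text, without the leading '#'."""
    return {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}


def filter_by_hashtags(texts: Iterable[str], wanted: Set[str]) -> List[str]:
    """Keep only texts that contain at least one of the wanted hashtags."""
    wanted_lower = {w.lower().lstrip("#") for w in wanted}
    return [t for t in texts if extract_hashtags(t) & wanted_lower]


if __name__ == "__main__":
    sample = [
        "Join the #ClimateStrike today",
        "Lunch was great #food",
        "no tags here",
    ]
    print(filter_by_hashtags(sample, {"#climatestrike"}))
```

Applying such a filter as early as possible in the pipeline keeps the amount of reproduced content tied to the research objective.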

Repeated and systematic reproduction of non-essential parts can be technically barred. In addition, web scraping will not typically run counter to a standard evaluation of the database since the evaluation of social media content for research purposes is not used to develop competing products. Finally, scraping will not unreasonably affect the interests of social media providers, as it does not affect the economic exploitation of their systems. Non-commercial research activities do not compete economically with the business activities of global companies such as Facebook.

Data protection issues

Since web scraping involves the mass processing of personal data, questions of data protection law also arise. The processing of data from publicly accessible areas of social media is permitted for research purposes to a large extent without the consent of the users.

For research institutions organized under private law, the decisive factor is whether the processing serves legitimate interests in accordance with Article 6 paragraph 1 littera f General Data Protection Regulation (GDPR). In the case of publicly available data, these interests will regularly outweigh the interests of the users in excluding processing. For research institutions organized under public law, Article 6 paragraph 1 littera e GDPR applies; whether data processing is necessary to fulfill their research tasks is determined according to similar criteria.

An additional balancing of interests is to be carried out in accordance with Article 9 paragraph 2 littera j GDPR in conjunction with Section 27 paragraph 1 sentence 1 of the German Federal Data Protection Act (BDSG) for special categories of personal data. Additionally, Article 9 paragraph 2 littera e GDPR applies when personal data is shared by users themselves in the public areas of social media sites.

The Terms of Service of social media platforms which prohibit automated access to their content are not in conflict with the processing of data as part of web scraping procedures for research purposes on the basis of this legal framework. The regulations of the Terms of Service are to be taken into account when balancing data protection interests in the context of Article 6 paragraph 1 littera f GDPR and Article 9 paragraph 2 littera j GDPR in conjunction with Section 27 paragraph 1 sentence 1 BDSG. They also serve to protect the users of these social media platforms.

Even if the relevant passage of the Terms of Service can only be partially applied, it shapes the expectations of users regarding data protection and can influence their decision about the use of Facebook – and thus the disclosure of data. These expectations of users must be taken into account when balancing the interests. The violation of the Terms of Service could lead to a predominant interest in the exclusion of data processing if users of Facebook could rely and did rely on not having their data automatically processed for research purposes when registering. However, Facebook reserves the right to authorize the automated evaluation of data by third parties without further consultation with the users. It appears doubtful that Facebook would take the possible conflicting interests of the users into account when deciding on the granting of the approval in a particular case. Ultimately, this speaks against a protected trust of users in the absence of automated access.

Web scraping without infringing copyright and data protection law

Web scraping for scientific research purposes on social media platforms entails various legal uncertainties. The copyrights of users and social media providers are not infringed by web scraping if suitable technical measures are taken to comply with the legal requirements. Regarding data protection law, users cannot expect that their data will not be processed for research purposes as long as social media platforms reserve the right to approve automated access to the data by third parties.

Sebastian Golla

Sebastian Golla is a postdoctoral researcher at the Chair for Public Law, Information Law and Data Protection Law at the Johannes Gutenberg-University Mainz. In his research, he works on security law and information criminal law.
Denise Müller

Denise Müller studies law and is a student assistant at the Chair of Public Law, Information Law and Data Protection Law at the Johannes Gutenberg-University Mainz.
