Can You Still Perform Web Scraping With The New CNIL Guidelines?

Jun 9, 2020

The General Data Protection Regulation (GDPR) is a law that deals with data privacy and security in the European Union. It affects companies anywhere in the world if they target or collect data from people living in the European Union. The GDPR went into effect on May 25, 2018, and carries large fines for those who break it.

While the GDPR went into effect a couple of years ago, data and web scraping are still a topic of discussion in some European Union countries. One of those countries is France; the French Data Protection Authority (CNIL) released new guidelines on web scraping. These new guidelines demonstrate the level of care companies must take in complying with the GDPR. They set in place specific procedures that companies which perform web scraping must follow in order to maintain compliance with the GDPR. This includes companies that collect data on publicly available websites. Conforming to these new guidelines ensures that any company and its vendors perform the necessary procedures to fully comply with all GDPR requirements.

What Are The New CNIL Guidelines?

The CNIL released its new guidelines on April 30, 2020. They make it clear that publicly available data is still personal data, which means that publicly available personal data cannot be repurposed without the knowledge of the person to whom it belongs. The guidelines have the potential to affect every French citizen, as well as the companies that collect data, because they allow individuals to opt out of having their data collected. They also set out clear procedures that every business must follow when collecting data.

Obtaining Unequivocal Consent To Reuse Publicly Available Private Data

One important aspect to understand when it comes to web scraping is what it means to get unequivocal consent and how you can get unequivocal consent as a web scraper.

Unequivocal consent means that you are given clear and firm consent to perform a specific task. There are several ways for a web scraper to get unequivocal consent, and some rules apply. For starters, users must receive all relevant information about what data is collected and what it will be used for. It is recommended that web scrapers provide the following information to anyone whose data they collect:

The purpose of the data
The number of other people/users who have access to the data
Whether consent is valid for the ability to scrape data throughout other websites or apps
The fact that users are able to withdraw consent at any time

This information must be clear, easily accessible, and exhaustive before a company is able to gain someone’s consent.

The CNIL follows the GDPR criteria for consent and has given the following recommendations. For starters, the consent should be individual for each purpose. For web scrapers, this means having a distinct data set for each purpose, as well as clearly defining that purpose. The CNIL also recommends that users are able to consent or refuse consent with equal simplicity. Also, if a user does not give their consent for data collection, they cannot be asked again for a specified period of time. Consent should also be renewed at an appropriate interval of time.
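The consent rules above can be sketched as a small record type. This is a minimal illustration, not a compliance tool: the `ConsentRecord` schema and both intervals are hypothetical, since the recommendations only speak of "a specified period" and "an appropriate interval" without giving values here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical placeholder values; pick intervals that fit your own legal advice.
REASK_DELAY = timedelta(days=180)       # wait this long after a refusal
RENEWAL_INTERVAL = timedelta(days=365)  # consent older than this must be renewed


@dataclass
class ConsentRecord:
    """One person's consent decision for exactly one purpose (assumed schema)."""
    subject_id: str
    purpose: str
    granted_at: Optional[datetime] = None
    refused_at: Optional[datetime] = None
    withdrawn_at: Optional[datetime] = None

    def is_valid(self, purpose: str, now: datetime) -> bool:
        """Consent counts only for the purpose it was given for."""
        if purpose != self.purpose or self.granted_at is None:
            return False
        if self.withdrawn_at is not None and self.withdrawn_at <= now:
            return False  # withdrawal is possible at any time
        return now - self.granted_at < RENEWAL_INTERVAL

    def may_ask_again(self, now: datetime) -> bool:
        """After a refusal, do not prompt the same person again too soon."""
        if self.refused_at is None:
            return True
        return now - self.refused_at >= REASK_DELAY
```

Keeping one record per person and per purpose mirrors the recommendation that consent be individual for each purpose: consent given for one data set cannot silently cover another.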

Performing a Data Protection Impact Assessment

You do not need to perform a data protection impact assessment (DPIA) every time you want to perform web scraping. However, you do need to perform one if there is a high risk to data subjects. There are two scenarios in which an assessment is required. The first is when the planned processing appears on the CNIL's list of processing operations for which an assessment is compulsory. The second is when the processing meets two out of the nine criteria laid out in the G29 (Article 29 Working Party) guidelines.
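The two triggers described above reduce to a simple check. The sketch below is only an illustration: the function name and the criterion labels in the usage example are hypothetical, and deciding which criteria a given scraping project actually meets remains a legal judgment.

```python
from typing import Iterable

# "Two out of the nine criteria", as stated in the text above.
CRITERIA_THRESHOLD = 2


def dpia_required(on_cnil_list: bool, criteria_met: Iterable[str]) -> bool:
    """Return True if either trigger for a DPIA applies.

    on_cnil_list: the processing appears on the CNIL's compulsory list.
    criteria_met: labels of the G29 criteria the processing meets
                  (labels are free-form here; duplicates count once).
    """
    distinct = set(criteria_met)
    return on_cnil_list or len(distinct) >= CRITERIA_THRESHOLD
```

For example, `dpia_required(False, ["large_scale", "dataset_matching"])` returns `True` because two distinct criteria are met, while a single criterion on its own does not trigger the assessment.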

There are several pros and cons to conducting a data protection impact assessment. For starters, it will take more time for companies to get approval and this could cost them. However, this extra time gives users more protection over their data. It also ensures that companies take the time to evaluate how they will handle and protect the data. Another con, however, is that some useful data collection may not take place if their data protection impact assessment is not accepted. If more companies have to perform these types of assessments, there might be fewer companies that do data collection and, thus, leave data in the hands of only a few companies.

There are several key steps to follow when carrying out a data protection impact assessment. The first step is identifying the need for a DPIA. If you do need one, the second step is to describe the processing and lay out how and why you plan to use the data you have collected. This should include the nature, scope, context, and purposes of the processing. The third step is to consider a consultation. Next, assess the necessity and proportionality of the data collection. You also need to identify and assess any risks that may come up and, once you have identified them, find ways to mitigate them. Finally, you need to record the outcome: any additional measures you will take to collect and protect the data, whether each risk has been eliminated, reduced, or accepted, the overall level of residual risk, and whether you need to consult the ICO (the UK data protection regulator; the equivalent authority in France is the CNIL).

What Do the New CNIL Guidelines Mean for Web Scraping Services?

The new CNIL guidelines have affected web scraping because they limit whose data you are able to collect. For example, you are no longer allowed to collect data from people who have indicated that they do not want their data collected, even if that data is publicly available.

The new guidelines also state that the data should be relevant and collected only from websites that allow data collection. This means that companies which offer web scraping services are no longer allowed to collect irrelevant, excessive, or sensitive data.
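In practice, these two restrictions can be applied as a filtering step right after scraping. The following is a minimal sketch under stated assumptions: the field names, the `ALLOWED_FIELDS` allowlist, and the idea of an opt-out set keyed by `profile_id` are all hypothetical and would depend on your own data and purpose.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical allowlist: only the fields the stated purpose actually needs.
ALLOWED_FIELDS: Set[str] = {"profile_id", "job_title", "company"}


def minimise(records: Iterable[Dict[str, str]],
             opted_out: Set[str]) -> List[Dict[str, str]]:
    """Drop opted-out people entirely and strip every non-allowlisted field."""
    kept = []
    for record in records:
        if record.get("profile_id") in opted_out:
            continue  # this person has objected to collection
        kept.append({k: v for k, v in record.items() if k in ALLOWED_FIELDS})
    return kept
```

An allowlist is used rather than a blocklist so that any unexpected field, including a sensitive one, is discarded by default instead of being kept by accident.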

You Can Still Perform Legal Web Scraping. Here’s How:

The CNIL did issue some guidance on collecting data. For starters, companies need to know how long the data processing or web scraping will last. They also need to know where the scraped data came from, especially if the source website restricts commercial reuse of its data. Companies must limit the data they collect to what is necessary for the identified task and not collect anything irrelevant to that task. Individuals also have to be informed if they are affected by the collection of their personal data.
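One way to keep track of origin and duration is to store both alongside every scraped item. This is a rough sketch, not a prescribed format: the `ScrapedItem` fields and the `purge_expired` helper are hypothetical names, and the retention period is whatever you have defined for the task.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class ScrapedItem:
    """A scraped record with its provenance attached (assumed schema)."""
    source_url: str         # where the data came from
    collected_at: datetime  # when it was collected
    purpose: str            # the identified task it was collected for
    payload: dict           # the (already minimised) data itself


def purge_expired(items: Iterable[ScrapedItem],
                  retention: timedelta,
                  now: datetime) -> List[ScrapedItem]:
    """Keep only items still inside the retention period."""
    return [item for item in items if now - item.collected_at < retention]
```

Running `purge_expired` on a schedule gives the processing a defined end, and the `source_url` field makes it possible to check later whether a given source restricted reuse of its data.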

On top of this, the CNIL wants companies to carefully oversee all of their vendor relationships in regard to data processing. It is recommended that companies comply with all GDPR requirements. Finally, companies may need to do a data protection impact assessment, depending on the type of data they collect and the methods they use to collect it.

The LinkedIn vs. HiQ Case and What it Means for You

LinkedIn is a popular professional networking website that lets workers connect with each other and with companies, helping them grow their business relationships. HiQ is a data scraping company that collects publicly available data from LinkedIn. An increase in the number of companies scraping data from LinkedIn resulted in a ban on those companies. However, HiQ was able to get around that ban by hiding the IP address used for the web scraping. LinkedIn then sent a cease-and-desist letter to HiQ, claiming that HiQ had breached LinkedIn’s ToS and violated the Computer Fraud and Abuse Act. HiQ filed for a preliminary injunction so that it could keep operating while the case was decided.

The case made its way to the Ninth Circuit, where the preliminary injunction was upheld, allowing HiQ to keep collecting the data that it wanted. This could affect web scraping services because it could allow them to collect publicly available data from a website even if doing so breaches that website’s ToS. However, the case is not completely over, and LinkedIn may appeal to the Supreme Court.

Summing It All Up

The world of web scraping is changing as more people and countries push to protect their data. One of the countries trying to protect its citizens’ data is France, whose data protection authority has issued new guidelines on the collection of data. The CNIL decided that publicly available data still falls under the umbrella of the GDPR and established several guidelines on web scraping technologies.

CNIL also laid down regulations on how companies should handle data protection impact assessments. These assessments have the potential to protect people’s data better and make sure that data collection companies are also following the rules.

While these new guidelines do not stop a company from collecting data from individuals, they do allow individuals to opt out of having their data collected, and they limit the amount of data that can be collected. And although the guidelines appear to limit web scraping, a Ninth Circuit ruling in the United States seemed to favor web scraping continuing even when it breaks a company’s ToS.

Looking for more information? Check out the legal web scraping guide.

Written by FindDataLab.com