Only a tiny percentage of the 500 most visited websites in Spain - including government sites and streaming and adult content platforms - correctly fulfil the requirements of the General Data Protection Regulation (GDPR).
The study, carried out by researchers from the Universitat Oberta de Catalunya (UOC), the University of Girona and the Center for Cybersecurity Research of Catalonia (CYBERCAT), not only provides valuable insights into compliance with online privacy laws but also highlights the significance of the algorithms used to study it. With an enormous number of pages and platforms on the internet, manually analysing each case is impractical, which makes automating the process crucial.
Web-tracking techniques can be challenging to detect, with no clear indicators of their presence. The researchers overcame this challenge by developing a proprietary method that employs four algorithms and a measure called the Websites Level of Confidence. This approach assesses regulatory compliance and helps detect hard-to-spot tracking techniques.
The results, published in open access in the scientific journal Computers & Security, were reached using novel automated methods for analysing web-tracking techniques and compliance with internet privacy regulations.
The European Parliament's approval of the General Data Protection Regulation in 2016 was set to forever change how companies, websites and digital platforms manage users' personal data. The European regulation, transposed into Spanish law in 2018 as the Organic Law on the Protection of Personal Data and Guarantee of Digital Rights, was supposed to mark a turning point in protecting citizens' privacy. However, six years later, the actual implementation of this regulation is progressing at a faltering pace.
Sites lack forms to obtain consent for cookies
The researchers developed several algorithms for this study to analyse the 500 most visited websites in Spain according to the Alexa ranking. The results revealed a high percentage of sites that lack an appropriate form to obtain users' consent for using cookies and other data collection tools.
The analysis tools also detected around seven tracking cookies on average per website and 11 web beacons, which are small pieces of code embedded in the site to collect certain information from web traffic invisibly. In addition, 10% of the sites analysed in the study use browser fingerprinting techniques, which are also challenging to detect.
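To illustrate what detecting a web beacon can involve (this sketch is not the authors' tooling, and the class and domain names are hypothetical), a beacon is often a tiny image, typically 1×1 pixels, loaded from a third-party domain, so a simple scan of a page's HTML can flag likely candidates:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class BeaconFinder(HTMLParser):
    """Flags <img> tags that look like web beacons: zero- or one-pixel
    images loaded from a domain other than the page's own."""

    def __init__(self, page_domain):
        super().__init__()
        self.page_domain = page_domain
        self.beacons = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # A beacon is usually invisible: width/height of 0 or 1 pixel.
        tiny = a.get("width") in ("0", "1") and a.get("height") in ("0", "1")
        src = a.get("src", "")
        host = urlparse(src).netloc
        # Loaded from a different (third-party) domain than the page itself.
        third_party = bool(host) and not host.endswith(self.page_domain)
        if tiny and third_party:
            self.beacons.append(src)

html = ('<img src="https://tracker.example.net/p.gif" width="1" height="1">'
        '<img src="/logo.png" width="120" height="40">')
finder = BeaconFinder("example.com")
finder.feed(html)
print(finder.beacons)  # the tracker pixel is flagged, the site's logo is not
```

Real beacons can also be injected by scripts or styled invisible via CSS, which is part of why automated detection at scale, as in the study, is non-trivial.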
"The purpose of all these techniques is usually to track the online behaviour of web users in order to create profiles that can then be used to adjust the advertising that will be shown or the prices that will be offered for services or products," says Pérez-Solà.
The analysis carried out by the researchers from the UOC (Pérez-Solà and Albert Jové) and the University of Girona (David Martínez and Eusebi Calle) shows that only 8.91% of websites that obtain users' consent as required apply this consent successfully in practice.
"Our method uses a combination of automation and manual inspection. The implemented algorithms automatically browse the analysed websites and take screenshots that are then manually inspected," says Pérez-Solà.
Each of the algorithms used by the researchers has a defined function:
- The Consent Inspector Algorithm (CIA) captures clear images of the website's cookie banners and identifies buttons allowing users to customise these tracking elements.
- The Website Evidence Collector (WEC) gathers information on the different web-tracking techniques used on each website.
- The Cookies Detector Algorithm (CDA) categorises the cookies that websites use in browsers without user consent based on the data provided by the WEC.
- The Web Beacons Detection Algorithm (BDA) not only extracts web beacons detected by the WEC but also identifies browser fingerprinting techniques.
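The division of labour above can be pictured as a simple pipeline. The following sketch is purely illustrative: the function names mirror the article's acronyms, but the signatures and data structures are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class WebsiteReport:
    """Illustrative container for the evidence each step contributes."""
    url: str
    banner_screenshots: list = field(default_factory=list)       # CIA output
    raw_evidence: dict = field(default_factory=dict)             # WEC output
    cookies_without_consent: list = field(default_factory=list)  # CDA output
    beacons: list = field(default_factory=list)                  # BDA output
    fingerprinting: bool = False                                 # BDA output

def analyse(url, cia, wec, cda, bda):
    """Run the four (hypothetical) steps in order: CIA screenshots the
    consent banner, WEC collects tracking evidence, then CDA and BDA
    classify cookies, beacons and fingerprinting from the WEC data."""
    report = WebsiteReport(url)
    report.banner_screenshots = cia(url)
    report.raw_evidence = wec(url)
    report.cookies_without_consent = cda(report.raw_evidence)
    report.beacons, report.fingerprinting = bda(report.raw_evidence)
    return report

# Toy stand-ins for the four algorithms, just to show the data flow:
r = analyse(
    "https://example.com",
    cia=lambda u: ["banner.png"],
    wec=lambda u: {"cookies": ["_track"], "requests": []},
    cda=lambda ev: [c for c in ev["cookies"] if c.startswith("_")],
    bda=lambda ev: ([], False),
)
print(r.cookies_without_consent)  # ['_track']
```

The key design point the list conveys is that the WEC acts as the shared evidence collector, with the CDA and BDA as downstream classifiers over its output, while the CIA handles the consent-banner side independently.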
"Understanding the details of the regulations that apply at any given time and knowing how to tell what techniques a website is using are beyond the grasp of most users," she says.