Blog post

How to Train an AI with GDPR Limitations

Learn how AI companies can comply with the new European data protection regulation

September 13, 2019

8 mins read

In May 2018, the General Data Protection Regulation (GDPR) came into force. This European Union regulation aims to protect the privacy of EU citizens. And it affects not only Europe-based companies but all companies processing and holding the personal data of those residing in the EU. In essence, the GDPR imposes new data protection rules on modern technologies. But the regulation's strict limitations are a real game-changer for AI and machine learning development. Let's take a look at the challenges the GDPR has created for the development of artificial intelligence in Europe.

In this article, you’ll read about:

  • Why the GDPR affects AI development
  • The main challenges that arise due to GDPR limitations on AI
  • How to develop GDPR-friendly artificial intelligence

Why the GDPR affects AI development

Why does the GDPR have a significant impact on artificial intelligence? The regulation touches on two main aspects of machine learning (ML). First, it enhances data security by imposing strict obligations on companies that collect and process any personal data. Machine learning is closely connected with big data: most AI-based systems require large volumes of information to train and learn from, and personal data is usually among these training datasets. This means that the impact of the GDPR on machine learning and AI development is inevitable.
Image source: AppMachine

Second, the regulation explicitly addresses “automated individual decision-making” and profiling. According to Article 22, a person has the right not to be subject to a decision based solely on automated processing, including profiling, if it produces legal effects concerning them. Automated individual decision-making here covers an AI’s decisions made without any human intervention. Profiling means the automated processing of personal data to evaluate certain things about the data subject. For instance, an AI system might analyze a user’s credit card history to identify the user’s spending patterns.
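As an illustration, profiling in the GDPR sense can be as simple as the following sketch: automated processing of personal data (here, imaginary credit card transactions) to evaluate something about the data subject, namely their dominant spending pattern. The transaction data and categories are made up for the example.

```python
from collections import Counter

# Hypothetical credit card transactions: (category, amount)
transactions = [
    ("groceries", 54.10), ("travel", 310.00), ("groceries", 23.75),
    ("dining", 41.20), ("travel", 120.50), ("travel", 89.99),
]

# Automated processing: aggregate spending per category
spend_per_category = Counter()
for category, amount in transactions:
    spend_per_category[category] += amount

# Evaluation of the data subject: their dominant spending pattern
top_category, top_amount = spend_per_category.most_common(1)[0]
print(f"dominant spending category: {top_category} (${top_amount:.2f})")
```

Under Article 22, a decision with legal effects based solely on such an automated profile is exactly what a data subject has the right to contest.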

Read more: Discover five use cases of machine learning in FinTech and banking and learn how to apply best practices to your business

The GDPR provides for the right not to be subjected to a decision made entirely by a machine, with some exceptions.

Giovanni Buttarelli, European Data Protection Supervisor

What challenges arise from GDPR limitations on AI?

How will the GDPR affect AI? The GDPR has six data protection principles at its core. According to a report by the Norwegian Data Protection Authority, artificial intelligence faces four challenges associated with these principles: fulfilling fairness, purpose limitation, data minimization, and transparency together with the right to information.

Six core principles of GDPR
Source: Network ROI

    • Fairness and discrimination

The GDPR fairness principle addresses fair processing of personal data: in other words, data must be processed with respect for the data subject’s interests. The regulation also requires data controllers to take measures to prevent discriminatory effects on individuals. It’s no secret that many AI systems are trained on biased data, or that their algorithmic models contain certain biases. That’s why AI systems often demonstrate racial, gender, health, religious, or ideological discrimination. To comply with the GDPR, companies have to learn how to mitigate those biases in their AI systems.

    • Purpose limitation

The purpose limitation principle states that a data subject has to be informed about the purpose of data collection and processing. Only then can a person choose whether to consent to processing. The interesting thing is that sometimes AI systems use information that’s a side product of the original data collection. For instance, an AI application can use social media data for calculating a user’s insurance rate. The GDPR states that data can be processed further if the further purpose is compatible with the original. If it isn’t, the data collector should get additional approval from the data subject. But this principle has a few exceptions.

Further data processing is always compatible with the previous purpose if it’s connected to scientific, historical, or statistical research. Herein lies a problem, since there’s no clear definition of scientific research. Which means that in some cases, AI development may be considered such research. The rule of thumb is that when the AI model is static and already deployed, the purpose of its data collection can’t be regarded as research.

Read more: Learn how Intellias helped a global payment provider comply with PCI DSS requirements.
    • Data minimization

This principle controls the degree of intervention into a data subject’s privacy. It ensures that the data collected fits the purpose of the project: collected information should be adequate, limited, and relevant. These requirements encourage developers to think through the application of their AI models. Engineers have to determine what data, and how much of it, is necessary for the project. Sometimes this can be a challenge, since it’s not always possible to predict how and what a model will learn from data. Developers should continuously reassess the type and minimum quantity of training data required to fulfill the data minimization principle.
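One practical way to reassess the minimum quantity of training data is a learning curve: grow the training set and watch the validation error plateau. The sketch below uses ordinary least squares on synthetic data as a stand-in model; a real project would substitute its own model and metric.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression task: 5 features, known weights, small noise
n, d = 2000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

# Hold out a fixed validation set
X_val, y_val = X[1500:], y[1500:]

def val_error(n_train):
    """Fit on the first n_train samples, return validation MSE."""
    Xt, yt = X[:n_train], y[:n_train]
    w, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return np.mean((X_val @ w - y_val) ** 2)

for n_train in (10, 50, 200, 1000):
    print(n_train, round(val_error(n_train), 4))
```

Once the error stops improving, collecting more personal data adds little value, which is precisely the point where data minimization says to stop.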

    • Transparency and the right to information

The GDPR aims to ensure that individuals have the power to decide which of their information is used by third parties. This means that data controllers have to be open and transparent about their actions. They should provide a detailed description of what they’re doing with personal information to the owners of that information. Unfortunately, with AI systems, this may be hard to do.

That’s because AI is essentially a black box. It’s not always clear how the model makes certain decisions. Which makes it impossible to explain an AI’s complicated processes to an everyday user. Naturally, when AI is not entirely transparent, the question of liability arises.

According to the GDPR, a data subject has the right to an explanation of an automated decision. So data controllers have to figure out ways to give one.

How to develop GDPR-friendly artificial intelligence

Like it or not, IT companies have to ensure all their processes are compliant with GDPR. Data processors and data controllers who violate this regulation will have to pay significant fines. Luckily, there are several ways of making AI compliant with GDPR. Take a look at these GDPR-friendly methods of AI development.

We need to find a way to design and use machine learning algorithms in a way that is compliant with the GDPR, because they will generate value for both service providers and data subjects if done correctly.

Alessandro Guarino, Senior Information Security Professional at StudioAG

GANs (Generative Adversarial Networks). Today, the trend in AI development is to use less data more efficiently rather than to accumulate lots of data. A GAN reduces the need for training data by generating new, synthetic samples that resemble the real ones. To achieve this, we train two neural networks against each other: one is the generator, the other is the discriminator.

The generator learns how to put data together to produce samples that resemble the real data. The discriminator learns how to tell the difference between real data and the data produced by the generator. The problem is that GANs still require lots of data to be trained properly. So this method doesn’t eliminate the need for training data; it just allows us to reduce the amount of initial data and generate a lot of similar augmented data. But if we start from a small initial dataset, we risk ending up with a biased AI model. So generative adversarial networks don’t fully solve these issues, though they do decrease the need for initial data.
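The generator-versus-discriminator interplay can be sketched with a deliberately tiny example. Here the “real data” is a one-dimensional Gaussian, the generator is a linear map G(z) = a·z + b, and the discriminator is a single logistic unit D(x) = sigmoid(w·x + c); the gradients are written out by hand. All sizes and hyperparameters are illustrative assumptions, nothing like a production GAN.

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0          # generator parameters: G(z) = a*z + b
w, c = 0.1, 0.0          # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, batch = 0.02, 64

for step in range(3000):
    real = rng.normal(4.0, 1.0, batch)   # "real data" ~ N(4, 1)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator step: learn to tell real samples from generated ones
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = np.mean(-(1 - d_real) * real + d_fake * fake)
    grad_c = np.mean(-(1 - d_real) + d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: learn to fool the (updated) discriminator
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    grad_a = np.mean(-(1 - d_fake) * w * z)
    grad_b = np.mean(-(1 - d_fake) * w)
    a -= lr * grad_a
    b -= lr * grad_b

samples = a * rng.normal(0.0, 1.0, 10000) + b
print(f"generated mean: {samples.mean():.2f} (real data mean: 4.0)")
```

After training, the generator’s output distribution has been pulled toward the real data, which is the mechanism that lets a GAN produce augmented training samples.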

Federated learning is another method of reducing the need to collect data in AI development. Remarkably, it doesn’t require collecting data centrally at all. In federated learning, personal data doesn’t leave the system that stores it: it’s never collected or uploaded to a central server. With federated learning, an AI model trains locally on each device with local data. Later, the locally trained model is merged into the master model as an update. The problem is that a locally trained AI model is limited, since it’s personalized. And even if no data leaves the device, the model is still largely based on personal data. Unfortunately, this conflicts with the GDPR’s transparency principle.
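The core of federated averaging can be sketched as follows: each “device” fits a model on its own local data, and only the model weights (never the raw data) are sent back and averaged into the master model. This is a simplified one-round version of the scheme; real federated learning runs many rounds of local gradient steps from a shared initialization, but the data-locality property is the same.

```python
import numpy as np

rng = np.random.default_rng(7)
true_w = np.array([2.0, -1.0, 0.5])  # shared pattern across all devices

def local_update(n_samples):
    """Train on one device; the raw data never leaves this function."""
    X = rng.normal(size=(n_samples, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n_samples)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w  # only the learned weights are shared

# Each of 10 clients contributes a locally trained model
client_weights = [local_update(200) for _ in range(10)]

# The server merges the updates into the master model by averaging
master_w = np.mean(client_weights, axis=0)
print("master model weights:", np.round(master_w, 2))
```

The master model recovers the shared pattern even though the server never saw a single raw data point.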

The AI model is personalized on the user’s phone. All the training data remains on the device and is not uploaded to the cloud.
Source: Google AI Blog

Transfer learning is a method that enables the effective reuse of prior work and leads to the democratization of artificial intelligence. In this case, the AI model doesn’t train from scratch. Instead, an existing model is taken as a starting point and retrained for the current purpose. Since the AI model builds on a pre-existing model, it requires significantly fewer computing resources and less data. But transfer learning works best when the previous model has been trained on a large dataset. The previous model also has to be reliable and free of biases. So transfer learning can minimize data use but doesn’t fully exclude the need for data.
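The pattern looks like this in miniature: a “pre-trained” feature extractor (standing in for a model trained elsewhere on a large dataset) is frozen, and only a small new head is fit on the much smaller dataset for the new task. The weights and data here are synthetic assumptions for the sake of a runnable sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# Frozen feature extractor, standing in for a model pre-trained on big data
W_pretrained = rng.normal(size=(8, 4))
def features(X):
    return np.tanh(X @ W_pretrained)  # frozen: never updated

# Small dataset for the new task (only 30 samples)
X_new = rng.normal(size=(30, 8))
true_head = np.array([1.0, -2.0, 0.5, 0.0])
y_new = features(X_new) @ true_head + 0.05 * rng.normal(size=30)

# Retrain only the head: a least-squares fit on top of the frozen features
head, *_ = np.linalg.lstsq(features(X_new), y_new, rcond=None)

def predict(X):
    return features(X) @ head

print("learned head:", np.round(head, 2))
```

Because only four head weights are learned, 30 samples suffice, whereas training the whole network from scratch would need far more data.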

The explainable AI (XAI) method helps to reduce the black box effect of artificial intelligence. The goal of explainable AI is to help humans understand what’s happening under the hood of an AI system. With this method, an AI model can explain its decisions, characterize its own abilities, and give some insight into its future behavior. Explainable AI cannot directly reduce the need for data, but it lets us understand exactly which data is required to improve model accuracy. Researchers can then extend the training dataset with only the required data instead of adding lots of meaningless information.
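One simple explainability technique that supports exactly this is permutation importance: shuffle one feature at a time and measure how much the model’s error grows. Features whose shuffling barely hurts are candidates to drop, which also serves data minimization. The model below is ordinary least squares on synthetic data, a stand-in for whatever model is actually deployed.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: feature 0 matters a lot, feature 1 a little, feature 2 not at all
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.0 * X[:, 2]

# Stand-in "model": ordinary least squares
w, *_ = np.linalg.lstsq(X, y, rcond=None)
base_error = np.mean((X @ w - y) ** 2)

# Permutation importance: error increase when each feature is shuffled
importance = []
for j in range(3):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importance.append(np.mean((X_perm @ w - y) ** 2) - base_error)

print("importance per feature:", np.round(importance, 2))
```

The irrelevant feature scores near zero, telling us it never needed to be collected in the first place.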

XAI concept
Source: DARPA

The simple truth is that all of these AI training methods we’ve mentioned are somewhat limited. They may comply with one GDPR principle but contradict another. This means that to train AI models properly and achieve great results, you’ll have to combine several methods.

All in all, artificial intelligence and GDPR are closely connected. The data privacy regulation affects not only the development of artificial intelligence (AI) in Europe but also any company whose AI system processes the data of EU residents.

No doubt the impact of the GDPR on machine learning will be huge. Tech companies have to revise their data privacy and artificial intelligence policies. Data controllers have to ensure that their AI systems don’t violate the regulation. Luckily, there are several methods of making AI compliant with the GDPR. GANs, federated learning, transfer learning, and explainable AI can help you develop a GDPR-friendly artificial intelligence system.


If you’re thinking about making your AI compliant with GDPR, contact Intellias. Our experts will help to ensure your system meets the new data privacy requirements.
