AI: data round up - French authority's guidance on data protection issues in generative AI

The French national data protection authority (the Commission Nationale de l’Informatique et des Libertés, or CNIL) has published its first set of guidelines on the development of AI systems and the use of the datasets these systems learn from.

The guidelines are in the form of seven “how to” sheets, each dealing with a different data protection issue as it relates to the development stage of generative AI and the systems involved in the processing of personal data. The “how to” sheets aim to help those working in generative AI to comply with their obligations under the GDPR by providing guidance on how key data protection topics affect this area, together with some commentary on various examples. For example, Sheet 3 discusses the responsibilities of a controller in some scenarios, as well as looking at joint controllers and processors. You can read the “how to” sheets here.

The CNIL also published responses to concerns raised by those working in AI as to how the purpose limitation, data minimisation and storage limitation principles can work in the context of machine learning. In summary, the responses were:

In respect of purpose limitation, the CNIL understands that it is not possible to define at the training stage all the applications generative AI may have in future. This should not offend the purpose limitation principle if the potential functionalities and system type of the AI are well defined.
The data minimisation principle does not prohibit the use of large datasets for training an AI algorithm, though the datasets should be selected with the aim of ensuring optimal training and the unnecessary use of personal data should be avoided.
The long-term use of datasets for algorithm training will not contravene the storage limitation principle, since long retention periods may well be justified given that the development of such datasets often requires significant investment.

Helpfully, the CNIL also confirmed that it is possible to re-use datasets as long as the data was not collected unlawfully in the first place and the purpose of the re-use is compatible with the initial data gathering.

The CNIL opened up the “how to” sheets to a public consultation which ended in December last year. The CNIL hoped that those working within the AI sector would comment on the issues of data subjects’ rights and freedoms that will arise in the use of generative AI.

The CNIL planned to publish final versions of the “how to” sheets in the early part of this year following a review of the responses to the consultation, but to date these have not been published. Despite that and the fact that the “how to” sheets are limited in scope to the development stage of generative AI only, those working in the sector will want to keep an eye out for the updated documents to see what views the regulator is taking following the public consultation.

Contact our experts for further advice

Beverley Flynn