Using AI large language models for government work poses privacy risks, says Victorian deputy privacy commissioner

Victoria’s deputy privacy commissioner says governments should steer clear of using large language models like ChatGPT in their daily work because of privacy risks.

“The privacy risk is not extreme, but it is present,” says Rachel Dixon, Privacy and Data Protection Deputy Commissioner at the Office of the Victorian Information Commissioner.

“At the moment it’s not appropriate for governments to use these tools in normal government work,” she tells Cosmos.

Rachel Dixon, Privacy and Data Protection Deputy Commissioner, Office of the Victorian Information Commissioner / Supplied

Government has special responsibilities that don’t necessarily apply to the private sector, Dixon says, and there are potential risks associated with using artificial intelligence (AI) software like ChatGPT for emails, letters and reports if those documents contain any personal information.

In addition to privacy, Dixon flags risks related to the accuracy and credibility of outputs from large language models.

Under Australian state and federal privacy laws, governments are bound by principles governing how people’s personal information and data can be collected, used, stored, shared, and kept up-to-date.

In her role, Dixon oversees compliance by Victorian public sector organisations and Ministers with Victoria’s Privacy and Data Protection Act 2014.

She is particularly concerned about moves by companies like Microsoft and Google to embed AI models into their enterprise software products, which, for example, might give users the ability to write and respond to emails with the help of large language models.

“If you’re writing something with Microsoft Office, or the Google products […] and those documents are being used to train the large language model further (which is part of the terms and conditions) then if the document in question contains sensitive personal information, you really can’t do that […] because it contains personal information,” she says.

On Monday, Home Affairs Secretary Michael Pezzullo told Senate estimates he had issued an internal directive requiring approval for use of ChatGPT by departmental staff, according to iTnews.

Professor Jeannie Paterson, co-director of the Centre for AI and Digital Ethics at the University of Melbourne, agrees the training data used in AI large language models poses privacy risks “because of what might be spewed out”.

“It’s possible that information about individuals could be produced in answer to a prompt. That’s the immediate concern.”

People using AI tools – entering prompts or questions into these models – often find their inputs becoming part of the training data set for the next iteration, Paterson says.

“One of the concerns about ChatGPT is that people are using it in businesses in relation to data, [that] they themselves are obliged to keep private or confidential.”

Professor Jeannie Paterson, co-director of the Centre for AI and Digital Ethics at the University of Melbourne / Supplied

Dixon says governments such as Victoria’s are substantial customers of enterprise software products, and she’s surprised there hasn’t been more engagement between technology companies and governments about minimising privacy risks.

For instance, the Victorian Government contracts Microsoft to supply software for the public service, and contracts Google to provide Workspace for Education to Victorian government students, teachers and staff.

Both Paterson and Dixon argue there needs to be more transparency around the data used in training AI models.

Paterson says OpenAI has been pursued by the Italian privacy regulator over ChatGPT because it hasn’t followed European Union law on obtaining consent for the use of personal data.

“Individuals don’t know whether their personal data is in there, or indeed, whether there’s a lawful basis for the use of that data,” she says.

It is even possible that personal data collected by governments could find its way into training data by other pathways, such as hacks where data is released, she says.

Once personal or private information finds its way into training data, it’s almost impossible to delete, Paterson says.

Samantha Floreani, Program Lead at advocacy group Digital Rights Watch, says privacy is an important principle that enables people to retain a sense of autonomy and control over their personal information.

Privacy is also important collectively as a way to push back against pervasive data gathering and monetisation practices, commonly known as ‘surveillance capitalism’, she says.

Floreani says it is important to consider the provenance of data used for training AI models: where it came from, how it was collected, and how it came to be shared, sold, or scraped in order to be used in AI. 

“That’s important because the very same companies who stand to benefit the most from widespread development and adoption of AI technologies, are the same ones who also benefited from the normalisation of surveillance capitalism, and were able to collect, generate and commodify immense amounts of our personal information with impunity with very little regulation to stop them,” she says.

Personal data is often collected in circumstances which are sneaky, deceptive or exploitative, and can end up being used and combined in ways that people may never have fully expected, understood or consented to, Floreani says.

“In many ways, the big tech, AI industry kind of exists as a way of like rationalising and creating additional profit from widespread consumer surveillance, which is fundamentally based on invasions of privacy.” 

Samantha Floreani, Program Lead at Digital Rights Watch / Supplied

The Australian and Victorian Governments were contacted in relation to public sector use of large language models and privacy, but did not respond by deadline. Technology companies Microsoft and Google did not respond to Cosmos’ questions about AI privacy risks.

Paterson says the solution for governments and regulators isn’t yet clear. But she says a starting point is the review of the Privacy Act 1988, currently underway, as well as broader regulation of generative AI to address risks beyond privacy.

“We do need to sort of come up with a view about how much we value personal privacy and what we think is an appropriate trade-off between the development of technology and the importance of personal privacy,” she says.
