HB 2503 (AI Training data)
Chair Ryu, Ranking Member Barnard, and members of the Committee,
I'm Jon Pincus of Bellevue. I run the Nexus of Privacy newsletter, and served on the state Automated Decision Systems Workgroup in 2022. My position on HB 2503 is PRO.
In 2023, Stanford researchers discovered child abuse material in an image database that had been used to train many gen AI systems, including Stable Diffusion. The database was quickly removed from the internet … but its traces remain in every gen AI model that had been trained on it, and as a result those models are more likely to generate synthetic CSAM.
Which models are those? Good question! We have no idea of the complete list. That's one reason why disclosure of AI training data is so important.
Another important reason: disclosure of training data can reveal situations where a gen AI system was trained on data that the owner doesn't have rights to.
I also want to highlight the importance of treating a violation as an unfair or deceptive act under the Consumer Protection Act. For this bill to be effective, there needs to be a realistic prospect of strong enforcement. The Consumer Protection Act allows both AG enforcement and a private right of action. This is especially important given fiscal constraints on the AGO.
If California didn't include a strong enforcement mechanism, congrats to the tech lobbyists. But that's California's problem, not a roadmap for what we should do here in Washington.
It's long past time to end the tech industry's expectation that it should be allowed to engage in unfair and deceptive business practices without consequence.