Study Shows AI Image-Generators Being Trained on Explicit Photos of Children
By Associated Press
Published December 21, 2023

Hidden inside the foundation of popular artificial intelligence image-generators are thousands of images of child sexual abuse, according to a new report that urges companies to take action to address a harmful flaw in the technology they built.

Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.

Until recently, anti-abuse researchers thought the only way that some unchecked AI tools produced abusive imagery of children was by essentially combining what they’ve learned from two separate buckets of online images — adult pornography and benign photos of kids.

But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the giant AI database LAION, an index of online images and captions that’s been used to train leading AI image-makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement. It said roughly 1,000 of the images it found were externally validated.

The response was immediate. On the eve of the Wednesday release of the Stanford Internet Observatory’s report, LAION told The Associated Press it was temporarily removing its datasets.

LAION, which stands for the nonprofit Large-scale Artificial Intelligence Open Network, said in a statement that it “has a zero tolerance policy for illegal content and in an abundance of caution, we have taken down the LAION datasets to ensure they are safe before republishing them.”

While the images account for just a fraction of LAION’s index of some 5.8 billion images, the Stanford group says they are likely influencing the ability of AI tools to generate harmful outputs and reinforcing the prior abuse of real victims who appear multiple times.

Not an Easy Problem to Fix

It’s not an easy problem to fix, and traces back to many generative AI projects being “effectively rushed to market” and made widely accessible because the field is so competitive, said Stanford Internet Observatory’s chief technologist David Thiel, who authored the report.

“Taking an entire internet-wide scrape and making that dataset to train models is something that should have been confined to a research operation, if anything, and is not something that should have been open-sourced without a lot more rigorous attention,” Thiel said in an interview.

A prominent LAION user that helped shape the dataset’s development is London-based startup Stability AI, maker of the Stable Diffusion text-to-image models. New versions of Stable Diffusion have made it much harder to create harmful content, but an older version introduced last year — which Stability AI says it didn’t release — is still baked into other applications and tools and remains “the most popular model for generating explicit imagery,” according to the Stanford report.

“We can’t take that back. That model is in the hands of many people on their local machines,” said Lloyd Richardson, director of information technology at the Canadian Centre for Child Protection, which runs Canada’s hotline for reporting online sexual exploitation.

Stability AI on Wednesday said it only hosts filtered versions of Stable Diffusion and that “since taking over the exclusive development of Stable Diffusion, Stability AI has taken proactive steps to mitigate the risk of misuse.”

“Those filters remove unsafe content from reaching the models,” the company said in a prepared statement. “By removing that content before it ever reaches the model, we can help to prevent the model from generating unsafe content.”

LAION was the brainchild of a German researcher and teacher, Christoph Schuhmann, who told the AP earlier this year that part of the reason to make such a huge visual database publicly accessible was to ensure that the future of AI development isn’t controlled by a handful of powerful companies.

“It will be much safer and much more fair if we can democratize it so that the whole research community and the whole general public can benefit from it,” he said.

Much of LAION’s data comes from another source, Common Crawl, a repository of data constantly trawled from the open internet, but Common Crawl’s executive director, Rich Skrenta, said it was “incumbent on” LAION to scan and filter what it took before making use of it.

LAION said this week it developed “rigorous filters” to detect and remove illegal content before releasing its datasets and is still working to improve those filters. The Stanford report acknowledged LAION’s developers made some attempts to filter out “underage” explicit content but might have done a better job had they consulted earlier with child safety experts.

Many text-to-image generators are derived in some way from the LAION database, though it’s not always clear which ones. OpenAI, maker of DALL-E and ChatGPT, said it doesn’t use LAION and has fine-tuned its models to refuse requests for sexual content involving minors.

Google built its text-to-image Imagen model based on a LAION dataset but decided against making it public in 2022 after an audit of the database “uncovered a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.”

Trying to clean up the data retroactively is difficult, so the Stanford Internet Observatory is calling for more drastic measures. One is for anyone who’s built training sets off of LAION-5B — named for the more than 5 billion image-text pairs it contains — to “delete them or work with intermediaries to clean the material.” Another is to effectively make an older version of Stable Diffusion disappear from all but the darkest corners of the internet.

“Legitimate platforms can stop offering versions of it for download,” particularly if they are frequently used to generate abusive images and have no safeguards to block them, Thiel said.

As an example, Thiel called out CivitAI, a platform favored by people making AI-generated pornography but which he said lacks safeguards against generating images of children. The report also calls on AI company Hugging Face, which distributes the training data for models, to implement better methods to report and remove links to abusive material.

Hugging Face said it is regularly working with regulators and child safety groups to identify and remove abusive material. Meanwhile, CivitAI said it has “strict policies” on the generation of images depicting children and has rolled out updates to provide more safeguards. The company also said it is working to ensure its policies are “adapting and growing” as the technology evolves.

The Stanford report also questions whether any photos of children — even the most benign — should be fed into AI systems without their family’s consent due to protections in the federal Children’s Online Privacy Protection Act.

Rebecca Portnoff, the director of data science at the anti-child sexual abuse organization Thorn, said her organization has conducted research that shows the prevalence of AI-generated images among abusers is small, but growing consistently.

Developers can mitigate these harms by making sure the datasets they use to develop AI models are free of abuse materials. Portnoff said there are also opportunities to mitigate harmful uses down the line, after models are already in circulation.

Tech companies and child safety groups currently assign videos and images a “hash,” a unique digital signature, to track and take down child abuse materials. According to Portnoff, the same concept can be applied to AI models that are being misused.

“It’s not currently happening,” she said. “But it’s something that in my opinion can and should be done.”
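To make the hash-matching idea concrete, here is a minimal sketch in Python of the kind of exact-match check the article describes. It assumes a hypothetical blocklist of known-file digests (the KNOWN_ABUSE_HASHES set below, with a placeholder value); deployed systems such as Microsoft’s PhotoDNA instead use perceptual hashes that survive resizing and re-encoding, so this illustrates the concept rather than any production tool.

```python
import hashlib
from pathlib import Path

# Hypothetical blocklist of SHA-256 digests of known abusive files, as a
# clearinghouse might distribute. The entry below is a placeholder, not a
# real hash value.
KNOWN_ABUSE_HASHES = {
    "0000000000000000000000000000000000000000000000000000000000000000",
}

def file_sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

def flag_known_material(paths: list[Path]) -> list[Path]:
    """Return the paths whose exact bytes match an entry on the blocklist."""
    return [p for p in paths if file_sha256(p) in KNOWN_ABUSE_HASHES]
```

An exact cryptographic match breaks as soon as a file is re-encoded, which is why deployed systems favor perceptual hashing; Portnoff’s suggestion is to extend the same fingerprint-and-takedown workflow to the misused models themselves.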
