Is AI training on your content?

There are bots crawling all over the internet. They’ve been doing it for a long time — since the early days of search engines like Google. Bots crawl websites, index the content, and report back to search engines. When someone pops a search term or question into the search engine, the response is informed by what the bots have found.

Web crawlers, or “spiders”, have historically been welcomed by web developers because they help drive traffic to clients’ sites. Conventions such as the robots.txt file also let site owners tell these bots which pages they can and can’t index.

AI crawlers work in a slightly different way because they have different goals. When an AI bot crawls a website, it might be looking for one or all of the following:

● Data to train a large language model

● Information to supplement AI-generated summaries and answers for users

● Content to index so they can come back to it when needed

Automated, bot-led retrieval of data from websites for these purposes is known as AI scraping.

The ability to retrieve and process large volumes of data can be beneficial to charities in some circumstances — in research and data analysis, for example. But without proper safeguards, data that is private, personal, or the intellectual property of third sector organisations could be scraped and used to train AI.

AI scraping is a growing issue, and deciding how charity content may be used is an increasing concern for the sector. Data from Cloudflare shows that from July 2024 to July 2025, raw requests from GPTBot rose 147% and raw requests from Meta-ExternalAgent rose 843%. These bots are scraping data to train ChatGPT and Meta’s AI models respectively.

If an AI model has been trained on a charity’s data, it is unlikely to repeat that data verbatim in response to a question or search query. However, AI models can memorise strings of training data, which can then be extracted in cyber attacks. Classifying content as public, available to a limited audience, private, or protected can help charities decide where and how to publish it, and where to limit AI access.

Should charities try to stop AI scraping their content?

Charities may want to make some content accessible to AI crawler and scraper bots while actively trying to conceal other content.

For example, in a charity’s role as a subject expert and educator, it might want to make factual information about the issue it works on available for scraping to help prevent AI misinformation on the topic.

Charities delivering services directly to communities may use personal stories on their websites and in social media content. They should have sought permission from the person whose story is being told, but they may not have discussed the possibility of the story being scraped by AI for indexing and training. In that case, they may want to prevent it.

Is it possible to prevent AI from scraping your content?

Once social sector organisations have guidelines on which types of content are okay to be scraped and which should be protected, there are a few ways to try to shield content from AI bots.

Robots.txt

A robots.txt file gives bots instructions on how to access your website, and its use is evolving to take AI crawler and scraper bots into account. Charities can use it to tell all bots, or specific named bots, which pages they are and aren’t allowed to crawl or scrape.
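To see how such rules are read, the sketch below uses Python’s standard-library urllib.robotparser to evaluate a robots.txt file the way a compliant crawler would. The file contents, the example.org domain and the /stories/ path are hypothetical. GPTBot and Meta-ExternalAgent are the user-agent names of the two bots mentioned earlier. The parser only reports what the file asks for. It cannot stop a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a charity website; rules are grouped by user agent.
ROBOTS_TXT = """\
# Block OpenAI's training crawler from the whole site
User-agent: GPTBot
Disallow: /

# Keep Meta's crawler out of personal stories only (hypothetical path)
User-agent: Meta-ExternalAgent
Disallow: /stories/

# Every other bot, e.g. search engine crawlers, may crawl everything
User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask what a well-behaved crawler would be allowed to fetch (example.org is a placeholder).
checks = [
    ("GPTBot", "https://example.org/about"),
    ("Meta-ExternalAgent", "https://example.org/stories/amira"),
    ("Meta-ExternalAgent", "https://example.org/about"),
    ("Googlebot", "https://example.org/stories/amira"),
]
for agent, url in checks:
    print(f"{agent:<20} {url:<36} allowed={parser.can_fetch(agent, url)}")
```

Run as a script, this prints one line per check. GPTBot is refused everywhere, Meta-ExternalAgent is refused only under /stories/, and other crawlers such as Googlebot are allowed. Python’s parser applies the first matching rule within a group, which is simpler than the longest-match rule in the current robots.txt standard (RFC 9309). The order of lines therefore matters if a group mixes Allow and Disallow rules.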

HTTP headers and meta tags

It’s also possible to give bots instructions about a specific page, or a specific file on a page, using an HTTP header or a meta tag. For example, the header ‘X-Robots-Tag: noindex’, or the equivalent meta tag ‘&lt;meta name="robots" content="noindex"&gt;’ in a page’s HTML, asks bots not to index the content.
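The sketch below shows both forms side by side in a minimal Python WSGI application that uses only the standard library. Any page under a hypothetical /stories/ path gets the X-Robots-Tag: noindex header and the equivalent robots meta tag. The app, the path and the simulate_request helper are assumptions for illustration and are not part of any particular CMS. As with robots.txt, these directives only work if the bot chooses to honour them.

```python
import html
from wsgiref.util import setup_testing_defaults

# Hypothetical URL prefix where a charity publishes personal stories.
PROTECTED_PREFIXES = ("/stories/",)


def app(environ, start_response):
    """Minimal WSGI app that marks protected pages as 'noindex' in two ways."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    head = ""
    if path.startswith(PROTECTED_PREFIXES):
        # HTTP header: sent with the response, so it also works for PDFs and images.
        headers.append(("X-Robots-Tag", "noindex"))
        # Meta tag: the same instruction written into the page's own HTML.
        head = '<meta name="robots" content="noindex">'
    page = f"<html><head>{head}</head><body>{html.escape(path)}</body></html>"
    body = page.encode("utf-8")
    headers.append(("Content-Length", str(len(body))))
    start_response("200 OK", headers)
    return [body]


def simulate_request(path):
    """Call the app directly (no network) and capture status, headers and body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    for path in ("/about", "/stories/amira"):
        status, headers, _ = simulate_request(path)
        print(path, status, headers.get("X-Robots-Tag", "(none)"))
```

To try it in a browser, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it locally. On most charity websites, the same header would more likely be set in the web server or CMS configuration than in application code.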

Cloaking and poisoning images

Charities can make it harder for bots to scrape images from their websites by using cloaking tools that distort images for bots, or using poisoning tools that cause bots to misinterpret what they’re seeing.

Opt out or hide content

There are also options at the content management system or platform level that charities can use to stop AI scraping their content. Examples include password protecting web pages, or changing settings on website and social media platforms to opt out of AI training.
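Most content management systems offer page-level password protection as a setting. The sketch below illustrates the underlying idea with HTTP Basic authentication wrapped around a Python WSGI app. Requests without the right credentials receive a 401 response, so a scraper without the password sees nothing of the page. Unlike robots.txt or noindex, this does not depend on the bot's cooperation. The username, password, page and helper functions are hypothetical. Basic authentication only base64-encodes credentials, so it should only be used over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; load real ones from secure config.
USERNAME = "supporters"
PASSWORD = "change-me"


def password_protect(app, realm="Supporters area"):
    """Wrap a WSGI app so every request must carry valid HTTP Basic credentials."""

    def wrapped(environ, start_response):
        auth = environ.get("HTTP_AUTHORIZATION", "")
        if auth.startswith("Basic "):
            try:
                raw = base64.b64decode(auth[len("Basic "):], validate=True)
                decoded = raw.decode("utf-8")
            except ValueError:  # covers bad base64 (binascii.Error) and bad UTF-8
                decoded = ""
            user, _, pwd = decoded.partition(":")
            # Constant-time comparisons avoid leaking information through timing.
            user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
            pwd_ok = hmac.compare_digest(pwd.encode(), PASSWORD.encode())
            if user_ok and pwd_ok:
                return app(environ, start_response)
        # Missing or wrong credentials: bots and visitors only ever see this response.
        start_response(
            "401 Unauthorized",
            [("WWW-Authenticate", f'Basic realm="{realm}"'),
             ("Content-Type", "text/plain; charset=utf-8")],
        )
        return [b"Password required\n"]

    return wrapped


def stories_page(environ, start_response):
    """Hypothetical page holding personal stories for registered supporters."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Personal stories for registered supporters\n"]


protected = password_protect(stories_page)


def basic(user, pwd):
    """Build a Basic Authorization header value, as a browser would."""
    token = base64.b64encode(f"{user}:{pwd}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def status_for(authorization=None):
    """Call the protected app directly (no network) and return the response status."""
    environ = {"PATH_INFO": "/stories/", "REQUEST_METHOD": "GET"}
    if authorization is not None:
        environ["HTTP_AUTHORIZATION"] = authorization
    result = {}

    def start_response(status, headers, exc_info=None):
        result["status"] = status

    b"".join(protected(environ, start_response))
    return result["status"]


if __name__ == "__main__":
    print(status_for())                                 # no credentials
    print(status_for(basic("supporters", "change-me")))  # correct credentials
```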

Dealing with AI scraping isn’t a simple case of following best practice. There are ethical questions at stake which may be answered differently by each organisation and for each type of content.

Some scraping prevention methods, such as robots.txt files and robots meta tags, rely on AI companies complying voluntarily, and AI legislation is still too early-stage and patchy to fully cover the multi-jurisdictional issues that AI presents. Making decisions about how content should or shouldn’t be used by AI, and staying informed about methods for handling crawlers, are the cornerstones of a proactive approach.

Helen Olszowska

Managing Director, Seashell Collective
