An In-Depth Look at the Prevalence of AI Crawler Blocking Among Leading English-Language News Websites
Press Gazette’s recent analysis sheds light on how many of the top 100 English-language news websites restrict artificial intelligence (AI) web crawlers from accessing their content. The investigation, which covered 106 sites, found that a little under half of these news platforms permit unrestricted access to AI crawlers, while the remainder impose varying degrees of restriction.
The Distribution of Restrictions
Of the 106 sites surveyed, 45 (about 42%) did not impose any restrictions on AI crawlers. The remaining 61 sites imposed some level of restriction by blocking at least one AI bot. Among these, 32 sites take a more stringent stance and block two or more AI crawlers, with some barring as many as five different bots.
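The shares implied by these counts can be checked with a few lines of arithmetic. The sketch below takes the figures of 106 sites, 61 blocking sites, and 32 multi-bot blockers at face value; the variable and function names are illustrative and do not come from the Press Gazette analysis.

```python
# Counts quoted in the article.
TOTAL_SITES = 106
BLOCK_AT_LEAST_ONE = 61
BLOCK_TWO_OR_MORE = 32


def share(count, total=TOTAL_SITES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Derived counts: sites with no restrictions, and sites blocking exactly one bot.
no_restrictions = TOTAL_SITES - BLOCK_AT_LEAST_ONE
block_exactly_one = BLOCK_AT_LEAST_ONE - BLOCK_TWO_OR_MORE

summary = {
    "no restrictions": (no_restrictions, share(no_restrictions)),
    "block at least one bot": (BLOCK_AT_LEAST_ONE, share(BLOCK_AT_LEAST_ONE)),
    "block exactly one bot": (block_exactly_one, share(block_exactly_one)),
    "block two or more bots": (BLOCK_TWO_OR_MORE, share(BLOCK_TWO_OR_MORE)),
}

for label, (count, pct) in summary.items():
    print(f"{label}: {count} sites ({pct}%)")
```

On these numbers, the 61 blocking sites are a majority of the sample, and fewer than a third of all sites block more than one bot.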
Which AI Crawlers Face the Most Restriction?
Among blocked AI crawlers, OpenAI’s GPTBot and Google-Extended lead the list. Some 56.6% of the surveyed news websites disallow access to GPTBot, while around 17.3% restrict Google-Extended. Other AI crawlers restricted to varying degrees across the surveyed websites include Claude-Web, ClaudeBot, anthropic-ai, Cohere-ai, Perplexity-ai, Seekr, and Meltwater.
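As a rough illustration of what blocking a crawler looks like in practice, the sketch below assumes that sites express these restrictions through robots.txt rules, which the article does not state explicitly. It uses Python's standard urllib.robotparser module on a hypothetical robots.txt; the sample file, the helper name blocked_agents, and the exact casing of the user-agent tokens are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few user-agent tokens named in the article; the exact spelling each
# crawler uses is an assumption here.
AI_AGENTS = ["GPTBot", "Google-Extended", "ClaudeBot", "anthropic-ai"]

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents=AI_AGENTS, path="/"):
    """Return the agents that the given robots.txt text disallows from path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


blocked = blocked_agents(SAMPLE_ROBOTS_TXT)
print(blocked)
```

In this hypothetical file only GPTBot and Google-Extended have their own Disallow rule, so the other two agents fall through to the wildcard entry and are allowed. A survey like the one described would repeat such a check across each site's live robots.txt, and robots.txt is only a request that crawlers may or may not honor.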
Noteworthy Exclusions and Inclusions
Despite the widespread adoption of AI crawler blocking, certain publishers choose not to impose any restrictions. Notable examples include the Mirror, Express, Manchester Evening News, Ladbible, Unilad, and the Lebedev-owned Independent and Evening Standard. Additionally, Politico, a subsidiary of Axel Springer, permits access to AI crawlers due to a content-sharing agreement with OpenAI.
In an intriguing twist, The Daily Beast, owned by IAC, chooses not to block any AI bots even though its chairman has advocated for AI companies to compensate publishers. Several politically conservative websites, including GB News, Newsmax, Zero Hedge, Breitbart, and Fox News, likewise opt not to block AI crawlers. In the case of Fox News, that stance contrasts with other Murdoch-owned publications.
Implications and Future Outlook
The differing stances news publishers take on AI crawler access reflect the ongoing debate over content usage, intellectual property rights, and distribution in the digital age. While some publishers prioritize maintaining control over their content and safeguarding against unauthorized use, others focus on collaborating with AI companies on content dissemination and innovation.
As the digital landscape continues to evolve, it remains to be seen how publishers, AI companies, and regulatory bodies will navigate the complex intersection of technology, content ownership, and user privacy. The decisions news publishers make not only affect the distribution of news but also shape the broader conversation about digital content usage and intellectual property rights.