Cloudflare reported that Perplexity AI used stealth crawlers to get around robots.txt directives and Web Application Firewall (WAF) blocks. Instead of identifying themselves, these crawlers impersonated legitimate Chrome browsers to scrape content that site owners had marked as off-limits. According to Cloudflare, this stealth traffic amounted to 3 to 6 million requests per day across tens of thousands of domains. The technique undermines any protection that depends on a crawler identifying itself honestly: robots.txt is honored only voluntarily, and WAF rules that match a bot's declared identity do not fire when the crawler presents itself as a regular browser. The incident highlights how hard it is to enforce crawling preferences, and why respecting them matters for protecting digital content from unauthorized scraping and access.
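To make the mechanism concrete, here is a minimal sketch of why identity-based protections fail against a spoofed browser. It is not Cloudflare's detection logic or Perplexity's crawler. The bot name `ExampleAIBot`, the robots.txt contents, the blocklist, and both helper functions are hypothetical and chosen only for illustration. The robots.txt check uses Python's standard `urllib.robotparser`; the "WAF rule" is a deliberately simplified user-agent substring match.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one declared AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical WAF-style blocklist keyed to the declared bot name.
BLOCKED_UA_SUBSTRINGS = ("exampleaibot",)

# A declared crawler user agent and a browser-like one (both illustrative).
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def robots_allows(product_token: str, url: str) -> bool:
    """Check robots.txt the way a compliant crawler would, using its own token."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(product_token, url)


def ua_rule_blocks(user_agent: str) -> bool:
    """Simplified stand-in for a WAF rule that matches on the User-Agent header."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in BLOCKED_UA_SUBSTRINGS)


if __name__ == "__main__":
    url = "https://example.com/article"
    # A compliant crawler asks robots.txt under its real name and is told no.
    print("robots.txt allows ExampleAIBot:", robots_allows("ExampleAIBot", url))
    # The server-side rule catches the declared user agent...
    print("UA rule blocks declared UA:", ua_rule_blocks(DECLARED_UA))
    # ...but not the same crawler once it claims to be Chrome.
    print("UA rule blocks spoofed UA:", ua_rule_blocks(SPOOFED_UA))
```

The point of the sketch is that the robots.txt check runs on the crawler's side, so a crawler that never performs it is never stopped by it, and a rule that matches only on a self-reported name has nothing to match once that name is changed. Real WAF products typically combine more signals than a user-agent string, which this example does not attempt to model.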
For more insights, check out the original tweet here: https://twitter.com/Fatesblind/status/1952595419822948412