eupolicy.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
This Mastodon server is a friendly and respectful discussion space for people working in areas related to EU policy. When you request to create an account, please tell us something about you.

Server stats:

196
active users

#webscraping

1 post · 1 participant · 0 posts today
Holger Hellingers' Polente<p>The HTML bomb is a novel defense against AI crawlers. It inflates a page to more than 10 GB and makes bots crash. A creative countermeasure against unwanted crawling.</p><p><a href="https://polente.de/2025/08/22/die-html-bombe-digitale-abwehr-gegen-ki-crawler/" class="" rel="nofollow noopener" target="_blank">https://polente.de/2025/08/22/die-html-bombe-digitale-abwehr-gegen-ki-crawler/</a></p>
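The post does not say how the HTML bomb reaches 10 GB. One common way to get that kind of inflation cheaply is to send a tiny, heavily compressed response that the client must decompress in full. The sketch below assumes that approach (a body sent with gzip `Content-Encoding`) and is not taken from the linked article; the helper name `build_gzip_filler` and the repeated `<div></div>` filler are illustrative choices, and the demo targets 10 MiB rather than 10 GB so it runs quickly.

```python
import gzip
import zlib


def build_gzip_filler(target_bytes: int, chunk: bytes = b"<div></div>" * 4096) -> bytes:
    """Return a gzip stream whose decompressed HTML is at least `target_bytes` long.

    Hypothetical helper: highly repetitive markup compresses extremely well,
    so the bytes sent over the wire stay small while the expanded page is large.
    """
    # wbits=31 makes zlib emit a gzip container (header + CRC trailer),
    # which is what a server would send with "Content-Encoding: gzip".
    compressor = zlib.compressobj(level=9, wbits=31)
    parts = [compressor.compress(b"<!DOCTYPE html><html><body>")]
    written = 0
    while written < target_bytes:
        # Compress chunk by chunk so the full expanded page never sits in memory.
        parts.append(compressor.compress(chunk))
        written += len(chunk)
    parts.append(compressor.compress(b"</body></html>"))
    parts.append(compressor.flush())
    return b"".join(parts)


if __name__ == "__main__":
    payload = build_gzip_filler(10 * 1024 * 1024)  # 10 MiB demo, not 10 GB
    expanded = gzip.decompress(payload)
    print(f"sent {len(payload):,} bytes, expands to {len(expanded):,} bytes")
```

Any client that decompresses without a size limit pays the full expansion cost. That is the point of the technique and also its risk: it hits well-behaved clients just as hard, so a deployment would presumably serve it only to requests already classified as unwanted.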
Miguel Afonso Caetano<p>From the comments: "First it was third-party apps and the API, now it's the Wayback Machine. This is more about control and being able to disappear anything they want than AI scraping."</p><p><a href="https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">theverge.com/news/757538/reddi</span><span class="invisible">t-internet-archive-wayback-machine-block-limit</span></a></p><p><a href="https://tldr.nettime.org/tags/Reddit" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Reddit</span></a> <a href="https://tldr.nettime.org/tags/SocialMedia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SocialMedia</span></a> <a href="https://tldr.nettime.org/tags/InternetArchive" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>InternetArchive</span></a> <a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
knoppix<p>Cloudflare says Perplexity evaded website blocks with stealth crawlers, sparking debate over AI data ethics ⚠️<br>Perplexity denies the claims, calling the analysis flawed and insisting its access is purely user-driven 🤖</p><p>Users are split: some defend AI access, others back stricter protections for site owners 🔐</p><p><span class="h-card" translate="no"><a href="https://mastodon.social/@itsfoss" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>itsfoss</span></a></span> </p><p><a href="https://news.itsfoss.com/perplexity-ignores-blocking" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">news.itsfoss.com/perplexity-ig</span><span class="invisible">nores-blocking</span></a></p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a> <a href="https://mastodon.social/tags/Privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Privacy</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/ContentOwnership" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ContentOwnership</span></a> <a href="https://mastodon.social/tags/PerplexityAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PerplexityAI</span></a> <a href="https://mastodon.social/tags/CyberSecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberSecurity</span></a> <a href="https://mastodon.social/tags/DigitalEthics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalEthics</span></a> <a href="https://mastodon.social/tags/RobotsTxt" 
class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/Perplexity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Perplexity</span></a> <a href="https://mastodon.social/tags/Cloudfare" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cloudfare</span></a> <a href="https://mastodon.social/tags/TechNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechNews</span></a> <a href="https://mastodon.social/tags/Tech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Tech</span></a></p>
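Part of what makes the Cloudflare report contentious is that the simplest kind of block trusts the crawler's self-reported identity. The minimal sketch below shows such a `User-Agent` check; the token list is an illustrative assumption (names of publicly documented crawlers), not a complete or current list, and the function name is hypothetical.

```python
# Illustrative tokens of self-identifying AI crawlers; not exhaustive or current.
BLOCKED_UA_TOKENS = ("GPTBot", "CCBot", "PerplexityBot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header names one of the blocked crawlers.

    A simple case-insensitive substring match. It only catches clients
    that identify themselves honestly.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"  # made-up example string
    disguised = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36"
    print(is_declared_ai_crawler(declared))
    print(is_declared_ai_crawler(disguised))
```

A crawler that sends an ordinary browser string, as the second example does, passes this check untouched, which is why blocking by declared identity alone cannot stop the behavior Cloudflare alleges.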
apfeltalk :verified:<p>Perplexity ignores robots.txt: controversy over data scraping for AI training<br>Training large language models relies on a wide range of web data. Compliance…<br><a href="https://www.apfeltalk.de/magazin/news/perplexity-ignoriert-robots-txt-kontroverse-um-daten-scraping-fuer-ki-training/" rel="nofollow noopener" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">apfeltalk.de/magazin/news/perp</span><span class="invisible">lexity-ignoriert-robots-txt-kontroverse-um-daten-scraping-fuer-ki-training/</span></a><br><a href="https://creators.social/tags/News" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>News</span></a> <a href="https://creators.social/tags/Apple" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Apple</span></a> <a href="https://creators.social/tags/Applebot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Applebot</span></a> <a href="https://creators.social/tags/Cloudflare" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cloudflare</span></a> <a href="https://creators.social/tags/Cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cybersecurity</span></a> <a href="https://creators.social/tags/Datenanalyse" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Datenanalyse</span></a> <a href="https://creators.social/tags/Datensicherheit" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Datensicherheit</span></a> <a href="https://creators.social/tags/EthikInDerKI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EthikInDerKI</span></a> <a href="https://creators.social/tags/KITraining" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>KITraining</span></a> <a href="https://creators.social/tags/KnstlicheIntelligenz" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>KnstlicheIntelligenz</span></a> <a href="https://creators.social/tags/OpenWeb" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenWeb</span></a> <a href="https://creators.social/tags/Perplexity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Perplexity</span></a> <a href="https://creators.social/tags/robotstxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotstxt</span></a> <a href="https://creators.social/tags/Sprachmodell" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Sprachmodell</span></a> <a href="https://creators.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://creators.social/tags/WebseitenBetreiber" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebseitenBetreiber</span></a></p>
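robots.txt is purely advisory: nothing enforces it unless the crawler itself checks it before fetching. The sketch below shows what that check looks like with Python's standard `urllib.robotparser`; the robots.txt content and the `allowed` helper are hypothetical examples, parsed from a string so the snippet runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named crawler entirely,
# keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines; a live crawler would
    # instead call set_url(".../robots.txt") followed by read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("PerplexityBot/1.0", "https://example.com/article"),
        ("SomeOtherBot/2.0", "https://example.com/article"),
        ("SomeOtherBot/2.0", "https://example.com/private/report.html"),
    ]:
        print(agent, url, allowed(agent, url))
```

The whole protocol rests on that single `can_fetch` call: a crawler that never makes it, or makes it under a different name, is not stopped by anything in the file.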
Öppet Moln<p>ArchiveBox, an open tool for archiving🕷️📚</p><p><a href="https://oppetmoln.se/20250728/archivebox-ett-oppet-verktyg-for-arkivering" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">oppetmoln.se/20250728/archiveb</span><span class="invisible">ox-ett-oppet-verktyg-for-arkivering</span></a></p><p><a href="https://mastodon.online/tags/archive" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>archive</span></a> <a href="https://mastodon.online/tags/arkiv" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>arkiv</span></a> <a href="https://mastodon.online/tags/webbarkiv" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webbarkiv</span></a> <a href="https://mastodon.online/tags/arkivering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>arkivering</span></a> <a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.online/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraping</span></a> <a href="https://mastodon.online/tags/foss" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>foss</span></a> <a href="https://mastodon.online/tags/oss" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>oss</span></a> <a href="https://mastodon.online/tags/opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>opensource</span></a></p>
Lenin alevski 🕵️💻<p>New Open-Source Tool Spotlight 🚨🚨🚨</p><p>Scrapling is redefining Python web scraping. Adaptive, stealthy, and fast, it can bypass anti-bot measures while auto-tracking changes in website structure. A standout: 4.5x faster than AutoScraper for text-based extractions. <a href="https://infosec.exchange/tags/Python" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Python</span></a> <a href="https://infosec.exchange/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p><p>🔗 Project link on <a href="https://infosec.exchange/tags/GitHub" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GitHub</span></a> 👉 <a href="https://github.com/D4Vinci/Scrapling" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/D4Vinci/Scrapling</span><span class="invisible"></span></a></p><p><a href="https://infosec.exchange/tags/Infosec" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Infosec</span></a> <a href="https://infosec.exchange/tags/Cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cybersecurity</span></a> <a href="https://infosec.exchange/tags/Software" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Software</span></a> <a href="https://infosec.exchange/tags/Technology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Technology</span></a> <a href="https://infosec.exchange/tags/News" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>News</span></a> <a href="https://infosec.exchange/tags/CTF" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CTF</span></a> <a href="https://infosec.exchange/tags/Cybersecuritycareer" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cybersecuritycareer</span></a> <a 
href="https://infosec.exchange/tags/hacking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hacking</span></a> <a href="https://infosec.exchange/tags/redteam" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>redteam</span></a> <a href="https://infosec.exchange/tags/blueteam" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>blueteam</span></a> <a href="https://infosec.exchange/tags/purpleteam" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>purpleteam</span></a> <a href="https://infosec.exchange/tags/tips" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tips</span></a> <a href="https://infosec.exchange/tags/opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>opensource</span></a> <a href="https://infosec.exchange/tags/cloudsecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cloudsecurity</span></a></p><p>— ✨<br>🔐 P.S. Found this helpful? Tap Follow for more cybersecurity tips and insights! I share weekly content for professionals and people who want to get into cyber. Happy hacking 💻🏴‍☠️</p>
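The post does not say how Scrapling tracks structural changes, and the sketch below neither uses nor imitates its API. It only illustrates the general idea behind "adaptive" scraping under stated assumptions: save a fingerprint of the element you want (tag, attributes, text), then, after a redesign, pick the most similar element on the new page. The names `Node`, `relocate`, and the 0.3 similarity cutoff are hypothetical; similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List, Optional

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    attrs: dict
    text: str = ""


class NodeCollector(HTMLParser):
    """Flattens a page into a list of elements with their attributes and own text."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[Node] = []
        self._open: List[Node] = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.nodes.append(node)
        if tag not in VOID_TAGS:  # void elements never receive an end tag
            self._open.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed children.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i].tag == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        if self._open:
            self._open[-1].text += data.strip()


def parse_nodes(html: str) -> List[Node]:
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def similarity(a: Node, b: Node) -> float:
    """Score in [0, 1]: equal weight on attributes and text, zero if tags differ."""
    if a.tag != b.tag:
        return 0.0
    attrs_a = " ".join(f"{k}={v}" for k, v in sorted(a.attrs.items()))
    attrs_b = " ".join(f"{k}={v}" for k, v in sorted(b.attrs.items()))
    attr_score = SequenceMatcher(None, attrs_a, attrs_b).ratio()
    text_score = SequenceMatcher(None, a.text, b.text).ratio()
    return 0.5 * attr_score + 0.5 * text_score


def relocate(fingerprint: Node, new_html: str, cutoff: float = 0.3) -> Optional[Node]:
    """Return the element on the new page most similar to the saved fingerprint."""
    candidates = parse_nodes(new_html)
    best = max(candidates, key=lambda n: similarity(fingerprint, n), default=None)
    if best is None or similarity(fingerprint, best) < cutoff:
        return None
    return best


if __name__ == "__main__":
    old_page = '<div><span class="price-tag" id="p1">EUR 19.99</span></div>'
    new_page = ('<main><p class="intro">Welcome</p>'
                '<span class="price-tag-v2" data-id="p1">EUR 21.49</span></main>')
    saved = next(n for n in parse_nodes(old_page) if n.tag == "span")
    print(relocate(saved, new_page))
```

A fixed CSS selector such as `span.price-tag` would break on the renamed class; scoring candidates by similarity trades that brittleness for the risk of confidently picking the wrong element, which is what the cutoff is meant to limit.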
Nicolas MOUART<p>"Including children’s images in datasets has raised ethical concerns, particularly regarding privacy, consent, data protection, and accountability. These datasets, often built by scraping publicly available images from the Internet, can expose children to risks such as exploitation, profiling, and tracking. "</p><p><a href="https://arxiv.org/html/2504.14446" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/html/2504.14446</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/childprotection" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>childprotection</span></a> <a href="https://mastodon.social/tags/technology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>technology</span></a> <a href="https://mastodon.social/tags/socialmedia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>socialmedia</span></a> <a href="https://mastodon.social/tags/deepfake" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>deepfake</span></a> <a href="https://mastodon.social/tags/childabuse" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>childabuse</span></a> <a href="https://mastodon.social/tags/dataset" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataset</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraping</span></a></p>
Nicolas MOUART<p>Q: Based on his ideas, would Adolf Hitler be for or against GDPR and the right to erasure if he were alive today?</p><p>A: It's reasonable to infer that Hitler would not support a regulation like <a href="https://mastodon.social/tags/GDPR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GDPR</span></a>, which emphasizes individual rights such as <a href="https://mastodon.social/tags/privacy" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>privacy</span></a> protection, data accessibility, or erasure, and that he might instead favor more centralized control over information dissemination for propaganda purposes. </p><p><a href="https://mastodon.social/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraping</span></a> <a href="https://mastodon.social/tags/technology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>technology</span></a> <a href="https://mastodon.social/tags/EU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>EU</span></a> <a href="https://mastodon.social/tags/history" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>history</span></a> <a href="https://mastodon.social/tags/historyrepeating" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>historyrepeating</span></a> <a href="https://mastodon.social/tags/transparency" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>transparency</span></a> <a href="https://mastodon.social/tags/regulation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>regulation</span></a> <a href="https://mastodon.social/tags/humanrights" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>humanrights</span></a></p>
Miguel Afonso Caetano<p>"The report, titled “Are AI Bots Knocking Cultural Heritage Offline?” was written by Weinberg of the GLAM-E Lab, a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law &amp; Policy at NYU Law, which works with smaller cultural institutions and community organizations to build open access capacity and expertise. GLAM is an acronym for galleries, libraries, archives, and museums. The report is based on a survey of 43 institutions with open online resources and collections in Europe, North America, and Oceania. Respondents also shared data and analytics, and some followed up with individual interviews. The data is anonymized so institutions could share information more freely, and to prevent AI bot operators from undermining their countermeasures. </p><p>Of the 43 respondents, 39 said they had experienced a recent increase in traffic. Twenty-seven of those 39 attributed the increase in traffic to AI training data bots, with an additional seven saying the AI bots could be contributing to the increase. </p><p>“Multiple respondents compared the behavior of the swarming bots to more traditional online behavior such as Distributed Denial of Service (DDoS) attacks designed to maliciously drive unsustainable levels of traffic to a server, effectively taking it offline,” the report said. “Like a DDoS incident, the swarms quickly overwhelm the collections, knocking servers offline and forcing administrators to scramble to implement countermeasures. 
As one respondent noted, ‘If they wanted us dead, we’d be dead.’”"</p><p><a href="https://www.404media.co/ai-scraping-bots-are-breaking-open-libraries-archives-and-museums/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">404media.co/ai-scraping-bots-a</span><span class="invisible">re-breaking-open-libraries-archives-and-museums/</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/CulturalHeritage" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CulturalHeritage</span></a> <a href="https://tldr.nettime.org/tags/AIBots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIBots</span></a> <a href="https://tldr.nettime.org/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://tldr.nettime.org/tags/CyberSecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberSecurity</span></a> <a href="https://tldr.nettime.org/tags/DDoS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DDoS</span></a></p>
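The survey figures in the quote nest: the 27 and the additional 7 are counted among the 39 institutions that saw more traffic, not among all 43 respondents. A quick calculation makes the shares explicit; the variable names are ours, the numbers are the ones quoted above.

```python
respondents = 43      # institutions surveyed
saw_increase = 39     # reported a recent increase in traffic
blamed_ai_bots = 27   # of the 39, attributed the increase to AI training bots
maybe_ai_bots = 7     # of the 39, said AI bots could be contributing


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "saw an increase (of all respondents)": pct(saw_increase, respondents),
    "blamed AI bots (of those with an increase)": pct(blamed_ai_bots, saw_increase),
    "blamed or suspected AI bots (of those with an increase)":
        pct(blamed_ai_bots + maybe_ai_bots, saw_increase),
    "blamed AI bots (of all respondents)": pct(blamed_ai_bots, respondents),
}

for label, value in shares.items():
    print(f"{value:5.1f}%  {label}")
```

So roughly 69% of the affected institutions named AI training bots outright and about 87% named or suspected them; measured against all 43 respondents, the outright figure is about 63%.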
Harald Klinke<p>Are AI bots overwhelming digital collections?<br>A new GLAM-E Lab report shows how scrapers for AI training datasets are putting real strain on the infrastructures of galleries, libraries, archives, and museums. Technical bottlenecks, ethical dilemmas, and escalating costs—open culture is under pressure.<br>Read the full analysis:<br><a href="https://www.glamelab.org/products/are-ai-bots-knocking-cultural-heritage-offline/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">glamelab.org/products/are-ai-b</span><span class="invisible">ots-knocking-cultural-heritage-offline/</span></a><br><a href="https://det.social/tags/DigitalHeritage" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalHeritage</span></a> <a href="https://det.social/tags/GLAM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GLAM</span></a> <a href="https://det.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://det.social/tags/OpenAccess" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAccess</span></a> <a href="https://det.social/tags/CulturalData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CulturalData</span></a> <a href="https://det.social/tags/MuseTech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MuseTech</span></a> <a href="https://det.social/tags/DigitalHumanities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalHumanities</span></a> <a href="https://det.social/tags/GLAMlab" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GLAMlab</span></a></p>
Miguel Afonso Caetano<p>"To reiterate, whatever one's opinion of these particular AI tools, scraping itself is not the problem. Automated access is a fundamental technique of archivists, computer scientists, and everyday users that we hope is here to stay—as long as it can be done non-destructively. However, we realize that not all implementers will follow our suggestions for bots above, and that our mitigations are both technically advanced and incomplete.</p><p>Because we see so many bots operating for the same purpose at the same time, it seems there's an opportunity here to provide these automated data consumers with tailored data providers, removing the need for every AI company to scrape every website, seemingly, every day.</p><p>And on the operators' end, we hope to see more web-hosting and framework technology that is built with an awareness of these issues from day one, perhaps building in responses like just-in-time static content generation or dedicated endpoints for crawlers."</p><p><a href="https://www.eff.org/deeplinks/2025/06/keeping-web-under-weight-ai-crawlers" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">eff.org/deeplinks/2025/06/keep</span><span class="invisible">ing-web-under-weight-ai-crawlers</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/WebCrawlers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebCrawlers</span></a> <a href="https://tldr.nettime.org/tags/BigTech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigTech</span></a> <a href="https://tldr.nettime.org/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>WebScraping</span></a> <a href="https://tldr.nettime.org/tags/OpenWeb" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenWeb</span></a></p>
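The EFF excerpt does not spell out its mitigations. One widely used server-side measure against aggressive bots is per-client rate limiting, and the sketch below implements it as a token bucket under that assumption. The class names and parameters are hypothetical: `rate` is tokens refilled per second, `capacity` the largest burst, and client IDs would typically be IP addresses.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """One bucket per client ID, created lazily on the first request."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = PerClientLimiter(rate=0.5, capacity=2)  # one request per 2 s, bursts of 2
    for t in (0.0, 0.1, 0.2, 2.5):
        print(t, limiter.allow("203.0.113.7", now=t))
```

Traffic spread across tens of thousands of addresses that each send only one or two requests never empties any single bucket, so per-client limits help against concentrated crawlers rather than widely distributed swarms.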
Pyrzout :vm:<p>Threat Actor Claims TikTok Breach, Puts 428 Million Records Up for Sale <a href="https://hackread.com/threat-actor-tiktok-breach-428-million-records-sale/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">hackread.com/threat-actor-tikt</span><span class="invisible">ok-breach-428-million-records-sale/</span></a> <a href="https://social.skynetcloud.site/tags/cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cybersecurity</span></a> <a href="https://social.skynetcloud.site/tags/Cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cybersecurity</span></a> <a href="https://social.skynetcloud.site/tags/CyberAttack" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberAttack</span></a> <a href="https://social.skynetcloud.site/tags/SocialMedia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SocialMedia</span></a> <a href="https://social.skynetcloud.site/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://social.skynetcloud.site/tags/databreach" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>databreach</span></a> <a href="https://social.skynetcloud.site/tags/Scrapping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scrapping</span></a> <a href="https://social.skynetcloud.site/tags/Security" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Security</span></a> <a href="https://social.skynetcloud.site/tags/security" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>security</span></a> <a href="https://social.skynetcloud.site/tags/TikTok" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TikTok</span></a></p>
Pyrzout :vm:<p>Threat Actor Claims TikTok Breach, Puts 428 Million Records Up for Sale – Source:hackread.com <a href="https://ciso2ciso.com/threat-actor-claims-tiktok-breach-puts-428-million-records-up-for-sale-sourcehackread-com/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">ciso2ciso.com/threat-actor-cla</span><span class="invisible">ims-tiktok-breach-puts-428-million-records-up-for-sale-sourcehackread-com/</span></a> <a href="https://social.skynetcloud.site/tags/1CyberSecurityNewsPost" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>1CyberSecurityNewsPost</span></a> <a href="https://social.skynetcloud.site/tags/CyberSecurityNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberSecurityNews</span></a> <a href="https://social.skynetcloud.site/tags/CyberSecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberSecurity</span></a> <a href="https://social.skynetcloud.site/tags/cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cybersecurity</span></a> <a href="https://social.skynetcloud.site/tags/CyberAttack" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberAttack</span></a> <a href="https://social.skynetcloud.site/tags/SocialMedia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SocialMedia</span></a> <a href="https://social.skynetcloud.site/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://social.skynetcloud.site/tags/DataBreach" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataBreach</span></a> <a href="https://social.skynetcloud.site/tags/Scrapping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scrapping</span></a> <a href="https://social.skynetcloud.site/tags/Hackread" class="mention hashtag" 
rel="nofollow noopener" target="_blank">#<span>Hackread</span></a> <a href="https://social.skynetcloud.site/tags/security" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>security</span></a> <a href="https://social.skynetcloud.site/tags/TikTok" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TikTok</span></a></p>
UG Center for InformationTech<p>📣 𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗛𝘂𝗯 𝟱 𝗝𝘂𝗻𝗲: 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝗪𝗲𝗯 𝗦𝗰𝗿𝗮𝗽𝗶𝗻𝗴 𝘂𝘀𝗶𝗻𝗴 𝗣𝘆𝘁𝗵𝗼𝗻 🐍<br>Is much of the information you need for your <a href="https://social.edu.nl/tags/research" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>research</span></a> available on websites, but not as downloadable <a href="https://social.edu.nl/tags/datasets" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datasets</span></a> or <a href="https://social.edu.nl/tags/files" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>files</span></a>? This workshop will introduce you to the basics of <a href="https://social.edu.nl/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraping</span></a> in a clear, practical way! </p><p>Also drop by at the 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗖𝗮𝗳é where experts will be present for your quick (or big;-) questions about <a href="https://social.edu.nl/tags/R" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>R</span></a>, <a href="https://social.edu.nl/tags/Python" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Python</span></a>, <a href="https://social.edu.nl/tags/Statistics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Statistics</span></a>, <a href="https://social.edu.nl/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MachineLearning</span></a>, <a href="https://social.edu.nl/tags/HPC" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HPC</span></a> and <a href="https://social.edu.nl/tags/Geo" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Geo</span></a>!</p><p>ℹ️ More information 👉🏼<a href="https://edu.nl/rw7vd" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">edu.nl/rw7vd</span><span class="invisible"></span></a></p>
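The workshop's own material is not shown here. As a rough illustration of the basic idea, the sketch below turns an HTML table into records using only Python's standard `html.parser`; the `TableExtractor` class and the sample table are hypothetical, and the HTML is an inline string so the snippet runs offline. Scraping a real page would add a fetch step (for example `urllib.request.urlopen`) and a check of the site's terms and robots.txt.

```python
import csv
import sys
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableExtractor(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html: str) -> List[Dict[str, str]]:
    """Use the first row as column names and return the remaining rows as dicts."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample page; the values are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Station</th><th>Reading</th></tr>
  <tr><td>A</td><td>12.5</td></tr>
  <tr><td>B</td><td>7.0</td></tr>
</table>
"""

if __name__ == "__main__":
    records = table_to_records(SAMPLE_HTML)
    writer = csv.DictWriter(sys.stdout, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
```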
PromptCloud<p>Tired of babysitting DIY scraping scripts that crash the moment you scale?<br>You’re not alone.</p><p>PromptCloud takes the pain out of large-scale data extraction with fully managed, reliable solutions — so you can focus on what really matters: insights.</p><p>🔗 <a href="https://shorturl.at/EApIO" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">shorturl.at/EApIO</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/OpenData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenData</span></a> <a href="https://mastodon.social/tags/DataEngineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataEngineering</span></a> <a href="https://mastodon.social/tags/BigData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BigData</span></a> <a href="https://mastodon.social/tags/Automation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Automation</span></a> <a href="https://mastodon.social/tags/PromptCloud" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PromptCloud</span></a> <a href="https://mastodon.social/tags/TechForGood" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechForGood</span></a> <a href="https://mastodon.social/tags/DataOps" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataOps</span></a></p>
Rachel Rawlings<p>I'm having trouble figuring out what kind of botnet has been hammering our web servers over the past week. Requests come in from tens of thousands of addresses, just once or twice each (and not getting blocked by fail2ban), with different browser strings (Chrome versions ranging from 24.0.1292.0 - 108.0.5163.147) and ridiculous cobbled-together paths like /about-us/1-2-3-to-the-zoo/the-tiny-seed/10-little-rubber-ducks/1-2-3-to-the-zoo/the-tiny-seed/the-nonsense-show/slowly-slowly-slowly-said-the-sloth/the-boastful-fisherman/the-boastful-fisherman/brown-bear-brown-bear-what-do-you-see/the-boastful-fisherman/brown-bear-brown-bear-what-do-you-see/brown-bear-brown-bear-what-do-you-see/pancakes-pancakes/pancakes-pancakes/the-tiny-seed/pancakes-pancakes/pancakes-pancakes/slowly-slowly-slowly-said-the-sloth/the-tiny-seed</p><p>(I just put together a bunch of Eric Carle titles as an example. The actual paths are pasted together from valid paths on our server but in invalid order, with as many as 32 subdirectories.)</p><p>Has anyone else been seeing this and do you have an idea what's behind it?</p><p><a href="https://infosec.exchange/tags/botnet" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>botnet</span></a> <a href="https://infosec.exchange/tags/ddos" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ddos</span></a> <a href="https://infosec.exchange/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraping</span></a> <a href="https://infosec.exchange/tags/infosec" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>infosec</span></a></p>
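One way to pick such requests out of access logs, independent of the rotating addresses and browser strings, is to score the path itself: the described requests are unusually deep (up to 32 segments) and repeat segments. The sketch below is a hypothetical heuristic, not a known signature of any particular botnet; the thresholds `max_depth=8` and `max_repeats=2` are assumptions to tune against a site's real URL structure, and the last sample path reuses the post's Eric Carle illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_cobbled_path(path: str, max_depth: int = 8, max_repeats: int = 2) -> bool:
    """Flag request paths that are unusually deep or repeat a segment too often.

    Hypothetical heuristic with assumed thresholds.
    """
    # urlsplit drops any query string or fragment before splitting on "/".
    segments = [s for s in urlsplit(path).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    counts = Counter(segments)
    return bool(counts) and max(counts.values()) > max_repeats


if __name__ == "__main__":
    sample = [
        "/about-us/team/",
        "/2024/01/01/post",
        "/about-us/the-tiny-seed/pancakes-pancakes/pancakes-pancakes/pancakes-pancakes",
    ]
    for p in sample:
        print(looks_like_cobbled_path(p), p)
```

Because it looks only at the path, the check keeps working when addresses and browser strings rotate; its blind spot is a bot that stitches together shorter, non-repeating paths.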
Matt Hodgkinson<p>For Immediate Release, April 1, 2025: University of Michigan Press will publish all of the content on Meta platforms as a series of printed books.<br><a href="https://www.linkedin.com/posts/charles-watkinson-7553a257_amphibians-and-reptiles-of-the-great-lakes-activity-7312775744932179968-sLSu" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">linkedin.com/posts/charles-wat</span><span class="invisible">kinson-7553a257_amphibians-and-reptiles-of-the-great-lakes-activity-7312775744932179968-sLSu</span></a></p><p><a href="https://scicomm.xyz/tags/MetaPlatforms" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MetaPlatforms</span></a> <a href="https://scicomm.xyz/tags/Instagram" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Instagram</span></a> <a href="https://scicomm.xyz/tags/Facebook" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Facebook</span></a> <a href="https://scicomm.xyz/tags/ThreadsApp" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ThreadsApp</span></a> <a href="https://scicomm.xyz/tags/Copyright" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Copyright</span></a> <a href="https://scicomm.xyz/tags/BookPublishing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BookPublishing</span></a> <a href="https://scicomm.xyz/tags/TextMining" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TextMining</span></a> <a href="https://scicomm.xyz/tags/TextCorpora" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TextCorpora</span></a> <a href="https://scicomm.xyz/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://scicomm.xyz/tags/AIethics" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>AIethics</span></a></p>
Programming Historian<p>Learn about the data-acquisition technique known as <a href="https://hcommons.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> and use R to extract the text published on a web page with this lesson by <span class="h-card" translate="no"><a href="https://mastodon.social/@rivaquiroga" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>rivaquiroga</span></a></span> </p><p><a href="https://doi.org/10.46430/phes0061" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">doi.org/10.46430/phes0061</span><span class="invisible"></span></a></p>
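The lesson itself is written for R. As a rough Python analogue of the same idea, here is a minimal sketch that uses only the standard library's html.parser to collect the text of every paragraph (`p`) element on a page. It assumes the text you want sits in paragraph elements and that you already have the HTML as a string, for example from urllib.request.urlopen, so the example never touches the network. ParagraphText and extract_paragraphs are hypothetical names and are not part of the lesson.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each paragraph (p element) in an HTML document."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def _flush(self) -> None:
        # Store the current paragraph with whitespace collapsed, then reset the buffer.
        if self._in_p:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed paragraph ends where the next one begins
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def close(self) -> None:
        super().close()  # process any text still buffered by the parser
        self._flush()    # keep a trailing paragraph that was never closed
        self._in_p = False


def extract_paragraphs(html: str) -> list[str]:
    """Return the whitespace-normalised text of each paragraph in `html`."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Text nested inside a paragraph, such as bold or link text, is kept. Headings, navigation and other non-paragraph text are skipped. Whatever language you use, check the site's terms and robots.txt before scraping it.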
Loki the Cat<p>When your AI is basically a digital vacuum cleaner with 600 power cords 🔌 OpenAI's bot accidentally DDoS'd a small company while trying to download their entire 65,000-product database. Turns out robots.txt is more of a "pretty please" than a restraining order! 🤖 <a href="https://social.jorijn.com/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://social.jorijn.com/tags/webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p><p><a href="https://tech.slashdot.org/story/25/01/11/0449242/openais-bot-crushes-seven-person-companys-website-like-a-ddos-attack" rel="nofollow noopener" target="_blank">https://tech.slashdot.org/story/25/01/11/0449242/openais-bot-crushes-seven-person-companys-website-like-a-ddos-attack</a></p>
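The joke is technically accurate. robots.txt, standardized as RFC 9309, is only a request, and it binds only crawlers that fetch it and choose to comply. Below is a minimal sketch of the compliant side, using Python's standard urllib.robotparser. It parses a hypothetical robots.txt that blocks GPTBot (the user-agent token OpenAI documents for its crawler) and asks every other bot to avoid /checkout/ and to wait 10 seconds between requests via the widely used but non-standard Crawl-delay line. ExampleBot, example.com and plan_request are illustrative assumptions. The file is parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_request(parser: RobotFileParser, user_agent: str, url: str,
                 default_delay: float = 1.0) -> tuple[bool, float]:
    """Return (allowed, seconds_to_wait) for a crawler that chooses to comply.

    Nothing on the server enforces these rules: a crawler that never
    calls this check is not blocked by robots.txt at all.
    """
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)
    return allowed, (float(delay) if delay is not None else default_delay)
```

Pass the crawler's product token (for example "GPTBot") as `user_agent`, not its full User-Agent header. The parser matches on the part before the first "/".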
Miguel Afonso Caetano<p>"On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company’s e-commerce site was down. It looked to be some kind of distributed denial-of-service attack. </p><p>He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site. </p><p>“We have over 65,000 products, each product has a page,” Tomchuk told TechCrunch. “Each page has at least three photos.” </p><p>OpenAI was sending “tens of thousands” of server requests trying to download all of it, hundreds of thousands of photos, along with their detailed descriptions. </p><p>“OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it’s way more,” he said of the IP addresses the bot used to attempt to consume his site. </p><p>“Their crawlers were crushing our site,” he said. “It was basically a DDoS attack.”</p><p>Triplegangers’ website is its business. The seven-employee company has spent over a decade assembling what it calls the largest database of “human digital doubles” on the web, meaning 3D image files scanned from actual human models. 
</p><p>It sells the 3D object files, as well as photos — everything from hands to hair, skin, and full bodies — to 3D artists, video game makers, anyone who needs to digitally recreate authentic human characteristics."</p><p><a href="https://techcrunch.com/2025/01/10/how-openais-bot-crushed-this-seven-person-companys-web-site-like-a-ddos-attack/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">techcrunch.com/2025/01/10/how-</span><span class="invisible">openais-bot-crushed-this-seven-person-companys-web-site-like-a-ddos-attack/</span></a></p><p><a href="https://tldr.nettime.org/tags/CyberSecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CyberSecurity</span></a> <a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://tldr.nettime.org/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a> <a href="https://tldr.nettime.org/tags/DDoS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DDoS</span></a> <a href="https://tldr.nettime.org/tags/AITraining" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AITraining</span></a></p>