#datalabeling

"The incredible demand for high-quality human-annotated data is fueling soaring revenues of data labeling companies. In tandem, the cost of human labor has been consistently increasing. We estimate that obtaining high-quality human data for LLM post-training is more expensive than the marginal compute itself1 and will only become even more expensive. In other words, high-quality human data will be the bottleneck for AI progress if these trends continue.

The revenue of major data labeling companies and the marginal compute cost of training frontier models for major AI providers in 2024.

To assess the proportion of data labeling costs within the overall AI training budget, we collected and estimated both data labeling and compute expenses for leading AI providers in 2024:

- Data labeling costs: We collected revenue estimates of major data labeling companies, such as Scale AI, Surge AI, Mercor, and LabelBox.
- Compute costs: We gathered publicly reported marginal costs of compute associated with training top models released in 2024, including Sonnet 3.5, GPT-4o, DeepSeek-V3, Mistral Large, Llama 3.1-405B, and Grok 2.

We then take the sum of costs in each category as the estimate of the market total. As shown above, the total cost of data labeling is approximately 3.1 times higher than the total marginal compute cost. This finding provides clear evidence: the cost of acquiring high-quality human-annotated data is rapidly outpacing the compute costs required for training state-of-the-art AI models."

ddkang.substack.com/p/human-da

Daniel’s Substack · Human Data is (Probably) More Expensive Than Compute for Training Frontier LLMs · By Daniel Kang
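The arithmetic behind that 3.1x figure is simple enough to sketch. Below is a minimal Python illustration of the method the post describes: sum the labeling companies' revenues, sum the reported per-model marginal training costs, and take the ratio. Every dollar figure here is a made-up placeholder for illustration, not one of the post's actual estimates.

```python
# Sketch of the post's estimation method. All figures are hypothetical
# placeholders in millions of USD, NOT the post's actual estimates.

# Proxy for total data labeling spend: 2024 revenues of major labeling firms.
labeling_revenue_musd = {
    "Scale AI": 900,
    "Surge AI": 1000,
    "Mercor": 80,
    "LabelBox": 60,
}

# Publicly reported marginal compute costs of training 2024 frontier models.
marginal_compute_musd = {
    "Sonnet 3.5": 120,
    "GPT-4o": 100,
    "DeepSeek-V3": 6,
    "Mistral Large": 30,
    "Llama 3.1-405B": 170,
    "Grok 2": 110,
}

# Sum each category as the estimate of the market total, then compare.
total_labeling = sum(labeling_revenue_musd.values())
total_compute = sum(marginal_compute_musd.values())
ratio = total_labeling / total_compute

print(f"Data labeling total:    ${total_labeling:,}M")
print(f"Marginal compute total: ${total_compute:,}M")
print(f"Labeling / compute ratio: {ratio:.1f}x")
```

With the post's actual collected estimates in place of these placeholders, the same sum-and-divide comparison is what yields the reported ~3.1x.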

Data Annotation vs Data Labelling: Find the Right One for You

Key takeaways:

• Understand the core difference between annotation and labeling
• Explore use cases across NLP, computer vision & more
• Learn how each process impacts model training and accuracy

Read now to make smarter data decisions:

hitechbpo.com/blog/data-annota

"Scale AI is basically a data annotation hub that does essential grunt work for the AI industry. To train an AI model, you need quality data. And for that data to mean anything, an AI model needs to know what it's looking at. Annotators manually go in and add that context.

As is the means du jour in corporate America, Scale AI built its business model on an army of egregiously underpaid gig workers, many of them overseas. The conditions have been described as "digital sweatshops," and many workers have accused Scale AI of wage theft.

It turns out this was not an environment for fostering high-quality work.

According to internal documents obtained by Inc, Scale AI's "Bulba Experts" program to train Google's AI systems was supposed to be staffed with authorities across relevant fields. But instead, during a chaotic 11 months between March 2023 and April 2024, its dubious "contributors" inundated the program with "spam," which was described as "writing gibberish, writing incorrect information, GPT-generated thought processes."

In many cases, the spammers, who were independent contractors who worked through Scale AI-owned platforms like Remotasks and Outlier, still got paid for submitting complete nonsense, according to former Scale contractors, since it became almost impossible to catch them all. And even if they did get caught, some would come back by simply using a VPN.

"People made so much money," a former contributor told Inc. "They just hired everybody who could breathe.""

futurism.com/scale-ai-zuckerbe

Futurism · The AI Company Zuckerberg Just Poured $14 Billion Into Is Reportedly a Clown Show of Ludicrous Incompetence · By Frank Landymore

"The production of artificial intelligence (AI) requires human labour, with tasks ranging from well-paid engineering work to often-outsourced data work. This commentary explores the economic and policy implications of improving working conditions for AI data workers, specifically focusing on the impact of clearer task instructions and increased pay for data annotators. It contrasts rule-based and standard-based approaches to task instructions, revealing evidence-based practices for increasing accuracy in annotation and lowering task difficulty for annotators. AI developers have an economic incentive to invest in these areas as better annotation can lead to higher quality AI systems. The findings have broader implications for AI policy beyond the fairness of labour standards in the AI economy. Testing the design of annotation instructions is crucial for the development of annotation standards as a prerequisite for scientific review and effective human oversight of AI systems in protection of ethical values and fundamental rights."

journals.sagepub.com/doi/10.11

According to Reuters, a major shift is underway as Google plans to cut ties with Scale AI, its largest data-labeling partner, following Meta's acquisition of a 49% stake in Scale. This strategic move aims to protect proprietary interests amid rising competitive threats. As Google explores alternatives for AI services, this could significantly impact Scale's revenue and open doors for new competitors. Read more about the implications [here](cnbc.com/2025/06/14/google-sca). Kudos to Reuters for the insightful coverage! #Google #ScaleAI #Meta #AI #DataLabeling #MachineLearning #BusinessStrategy #Technology #Competitors

CNBC · Google, Scale AI's largest customer, plans split after Meta deal, sources say · Google plans to cut ties with Scale AI after news broke that rival Meta is taking a 49% stake in the AI data-labeling startup, Reuters reported, citing sources.

TechXplore: Third-party data annotators often fail to accurately read the emotions of others, study finds. “Machine learning algorithms and large language models (LLMs), such as the model underpinning the functioning of the platform ChatGPT, have proved to be effective in tackling a wide range of tasks. These models are trained on various types of data (e.g., texts, images, videos, and/or […]

https://rbfirehose.com/2025/05/22/techxplore-third-party-data-annotators-often-fail-to-accurately-read-the-emotions-of-others-study-finds/

"The familiar narrative is that artificial intelligence will take away human jobs: machine-learning will let cars, computers and chatbots teach themselves - making us humans obsolete.

Well, that's not very likely, and we're gonna tell you why. There's a growing global army of millions toiling to make AI run smoothly. They're called "humans in the loop:" people sorting, labeling, and sifting reams of data to train and improve AI for companies like Meta, OpenAI, Microsoft and Google. It's gruntwork that needs to be done accurately, fast, and - to do it cheaply – it's often farmed out to places like Africa –

Naftali Wambalo: The robots or the machines, you are teaching them how to think like human, to do things like human.

We met Naftali Wambalo in Nairobi, Kenya, one of the main hubs for this kind of work. It's a country desperate for jobs… because of an unemployment rate as high as 67% among young people. So Naftali, father of two, college educated with a degree in mathematics, was elated to finally find work in an emerging field: artificial intelligence."

cbsnews.com/news/labelers-trai

#AI #GenerativeAI #LLMs #AITraining #DataLabeling #GigEconomy: "Who are the workers behind the training datasets powering the biggest LLMs on the market? In this explainer, we delve into data labeling as part of the AI supply chain, the labourers behind this data labeling, and how this exploitative labour ecosystem functions, aided by algorithms and larger systemic governance issues that exploit microworkers in the gig economy.

Key points:

- High-quality training data is the crucial element for producing a performant LLM, and high-quality training data means labeled datasets.

- Several digital labour platforms have risen to the task of supplying data labeling for LLM training. However, a lack of transparency and the use of algorithmic decision-making models undergird their exploitative business models.

- Workers are often not informed about who or what they are labeling raw datasets for, and they are subjected to algorithmic surveillance and decision-making systems that leave job stability unreliable and wages unpredictable."

privacyinternational.org/expla

Privacy International · Humans in the AI loop: the data labelers behind some of the most powerful LLMs' training datasets · Behind every machine is a human person who makes the cogs in that machine turn: there's the developer who builds (codes) the machine, the human evaluators who assess the basic machine's performance, even the people who build the physical parts for the machine.