Data Leverage Blog
Archives of blog posts from the Data Leverage Substack.
AI Jobs Apocalypse Roundup
Jun 5, 2025
How do we know our AI output is good? Double checks, bar charts, vibes, and training data
Connecting evaluation and dataset documentation via the lens of “AI as ranking”.
May 30, 2025
Each Instance of ‘AI Utility’ Stems from Some Human Act(s) of Information Recording and Ranking
It’s ranking information all the way down.
May 28, 2025
Google and TikTok rank bundles of information; ChatGPT ranks grains
Google and others solve our attentional problem by ranking discrete bundles of information, whereas ChatGPT ranks more granular chunks. This lens can help us reason about AI policy.
May 27, 2025
[microblog] One book is worth ‘0.06%’ benchmark points to AI; is ‘no different from noise’. What gives?
Commenting on recent coverage of, and discussion about, Meta’s arguments about training data value quantification.
Apr 21, 2025
Public AI, Data Appraisal, and Data Debates
A consortium of Public AI labs can substantially improve data pricing, which may also help to concretize debates about the ethics and legality of training practices.
Apr 3, 2025
Evaluation Data Leverage: Advances like ‘Deep Research’ Highlight a Looming Opportunity for Bargaining Power
Research agents and increasingly general reasoning models open the door for immense “evaluation data leverage”.
Mar 2, 2025
Tipping Points for Content Ecosystems
Our AI design choices in 2024 could preclude ‘Powerful AI’ in 2030.
Feb 12, 2025
AI Labs Should Open Source Data Protection Technologies
There’s still incredible tension in the current data paradigm, but sharing “data protection” technologies, like those used by OpenAI to accuse DeepSeek of model theft, can help cut a path forward.
Jan 31, 2025
Live by the free-content-for-training sword, die by the free-content-for-training sword
There’s deep tension in the current ask-for-forgiveness-free-for-all approach to acquiring data for model training. Will “open” models cause this tension to reach a breaking point?
Jan 28, 2025
Selling AGI like AG1: Will Consumers Push Back Against Proprietary Blends of Herbs and of Data?
The race to produce premier AI products with high price tags might change the standards around data disclosure.
Dec 12, 2024
Perplexity CEO’s Interaction with Striking New York Times Workers Does Not Reflect Well on the AI Industry
The idea that data-dependent AI systems are ready and willing to crush any leverage from knowledge workers is unlikely to make the AI industry look good to the public.
Nov 9, 2024
Is Zuckerberg right to say that your specific creative work has no value to AI?
Examining the Meta CEO’s claim that the “individual work of most creators isn’t valuable enough for it to matter” in the context of AI training.
Sep 28, 2024
“Many Models” and “Track Changes” for AI: Some Thoughts on LLM Interfaces
Interacting with many models and harnessing the power of diff
Aug 8, 2024
Building a Data Pipeworks for Democratic AI: From Human Knowledge to Records to AI Systems
Focusing on feedback loops – connecting modern AI to early cybernetics-style thinking – could help solve looming challenges and support democratic inputs to AI.
Nov 13, 2023
Will the New York Times Data Strike Have a Large Impact on ChatGPT?
How can we start thinking about how opt-out decisions by content-producing organizations will affect LLMs?
Sep 28, 2023
A Harbinger of the Future of Content? The New York Times Starts a Data Strike
The New York Times is trying to remove its content from OpenAI models, surfacing tensions around copyright, economic harms, privacy, and the distribution of AI benefits.
Aug 25, 2023
The WGA Strike is a Canary in the Coal Mine for AI Labor Concerns
Could Upcoming Data Legislation Enable a “Right to Data Strike”?
May 5, 2023
Reddit, StackOverflow, and Europe: All Trending Towards Data Dignity
May 1, 2023
Data Leverage Recap: December 2022 - April 2023
The Last Three Months in Review: What’s New and What’s Next
Apr 18, 2023
Bing Rewards for the AI Age
Introduction
Mar 30, 2023
Plural AI Data Alignment
Measuring the Alignment of AI Systems Based on their Data Pipelines
Mar 2, 2023
AI Technologies are System Maps, and You are a Cartographer
Mapping a Seaside Village
Feb 3, 2023
AI Artist or AI Art Thief? Innovation, Public Mandates, and the Case for Talking in Terms of Leverage
Dec 16, 2022
ChatGPT is Awesome and Scary: You Deserve Credit for the Good Parts (and Might Help Fix the Bad Parts)
More on why you’re an expert language model trainer
Dec 4, 2022
The Paradox of Reuse, Language Models Edition
Background
Dec 2, 2022
Don’t give OpenAI all the credit for GPT-3: You might have helped create the latest ‘astonishing’ advance in AI too
The much-celebrated GPT-3 that can answer questions, write poems, and more wouldn’t be possible without content written by millions of people around the world. Shouldn’t they get some credit?
Sep 22, 2020