
You're Going to Build an AI Layer Anyway. Build the Useful Version

The Cost of Doing Nothing Has Been Quietly Compounding

[ 8 min read · Part of a series on AI-powered moderation ]

Every platform running a reactive moderation system is making a choice. Not a considered choice with a cost-benefit analysis attached — a default choice, made by inertia, to keep doing what was done before because changing it requires a decision and not changing it doesn't.

That choice has a cost. It just doesn't show up in a single line on a budget report. It accumulates in user churn, community trust erosion, reputational incidents, and the quiet departure of exactly the kind of users who make a platform worth being on. By the time it's visible it's already been compounding for a while.


What's being proposed — and why it's different from what exists

Three major platforms have already demonstrated that AI assessment of posts before they go live works at scale. Instagram has been doing it since 2019. Reddit's Post Check has reduced rule-breaking submissions measurably. Steam checks every post for spam and malicious content at publication. The technology is proven. What none of them has built is an AI layer: something that reads every post as a moderation function and then puts that same reading to several other useful jobs at minimal marginal cost.

The distinction matters. A moderation tool that intercepts posts and asks "are you sure?" is experienced as surveillance. An AI layer that's genuinely useful to the people posting — and happens to also be doing moderation work — is experienced as a feature. The moderation happens either way. The user experience is completely different depending on which one you build.

Since the AI is reading every post at the point of submission anyway, here is what it can do while it's reading:

Find the thread that already has the answer. When a user starts a new topic, the AI checks whether a sufficiently similar discussion already exists. It does this by semantic similarity rather than keyword matching, which returns too many loosely related results for users to bother checking. "How do I reach the chest behind the waterfall in zone 3" and "I can't find the chest near the falls in the third area" are the same question, yet apart from "chest" they share no meaningful keywords. Semantic matching finds that pair. Keyword search mostly doesn't. The user gets directed to an existing thread that probably already has what they need. If the match is close but not exact, the new thread is flagged as a potential merge candidate for moderators rather than blocked. The forum gets cleaner over time without anyone having to deduplicate manually.
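As a rough illustration of the duplicate-thread check, here is a minimal sketch. It assumes each thread title has already been turned into an embedding vector by some model (not shown), and the two thresholds, the action names, and the toy three-dimensional vectors are all hypothetical placeholders rather than values from any real system.

```python
import math

# Hypothetical thresholds; real values would need tuning per community.
REDIRECT_THRESHOLD = 0.90  # close enough to point the user at the existing thread
MERGE_THRESHOLD = 0.75     # close but not exact: flag for moderators, do not block


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_new_topic(new_vec, existing_threads):
    """Return (action, thread_id) for the embedding of a new topic.

    existing_threads maps thread_id -> embedding vector.
    """
    best_id, best_score = None, 0.0
    for thread_id, vec in existing_threads.items():
        score = cosine_similarity(new_vec, vec)
        if score > best_score:
            best_id, best_score = thread_id, score
    if best_score >= REDIRECT_THRESHOLD:
        return ("suggest_existing", best_id)
    if best_score >= MERGE_THRESHOLD:
        return ("flag_merge_candidate", best_id)
    return ("post_normally", None)


# Toy vectors standing in for real embeddings of thread titles.
threads = {
    "waterfall-chest": [0.9, 0.1, 0.0],
    "boss-strategy": [0.0, 0.2, 0.9],
}
print(check_new_topic([0.88, 0.15, 0.02], threads))
```

The middle band is the design point worth noting: a near match never blocks the post, it only creates work for a moderator to confirm or dismiss.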

Help users say what they actually mean. A user who misread the post they're replying to, or whose frustration is legitimate but whose draft doesn't convey their actual point, gets a brief AI conversation — not a warning prompt, a dialogue. What are you trying to say? Is this what you meant? The interaction model is familiar: users in 2026 already know how to talk to an AI. They've used ChatGPT, Gemini, Claude. A forum with a helpful AI built in doesn't feel like surveillance. It feels like a feature they use elsewhere, now integrated into the community they're part of. Most good-faith users who go through this interaction post something better than they would have otherwise. Some realise they don't need to post at all. Both outcomes are good for the forum.

Assess whether a post contributes to the discussion it's entering. This is where the moderation function runs — underneath the helpful surface. The AI's assessment of whether a post contributes in good faith to its thread is the same assessment that catches bad-faith content, ragebait, and irrelevant replies. It isn't pattern-matching against previously reported content. It's contextual judgment about the relationship between a post and what it's responding to. Harder to game. Less prone to the false positives that make users feel unfairly targeted. And operating on the same reading the AI is already doing for the helpful functions above. Every post that fails the contribution assessment gets hidden before other users see it — not deleted, just held back pending review.

Watch what live posts produce. Once a post is live, the system monitors how many of its replies are being hidden this way. When that count is anomalously high compared to similar posts in the same community, the original post gets flagged for human review automatically — regardless of how it looked at submission. This is the second independent safety net: a sophisticated bad actor who crafts their instigating post carefully enough to pass the pre-submission check still gets caught when their post starts generating replies that keep failing the contribution check. Nobody reported it. The system observed what it produced. Each hidden reply isn't only a moderation action — it's simultaneously evidence about the post that provoked it.
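One simple way to express "anomalously high compared to similar posts" is a z-score against peer posts in the same community. The article does not specify the statistic, so treat this as a sketch under that assumption; the function name, the threshold of 3.0, and the minimum count of 5 are hypothetical.

```python
from statistics import mean, pstdev


def is_anomalous(hidden_count, peer_counts, z_threshold=3.0, min_hidden=5):
    """Flag a post whose hidden-reply count is far above comparable posts.

    hidden_count: replies to this post that failed the contribution check.
    peer_counts:  the same count for similar posts in the same community.
    """
    # Ignore tiny counts and tiny samples so one hidden reply never flags a post.
    if hidden_count < min_hidden or len(peer_counts) < 2:
        return False
    mu = mean(peer_counts)
    sigma = pstdev(peer_counts)
    if sigma == 0:
        # All peers are identical; any count above that level stands out.
        return hidden_count > mu
    return (hidden_count - mu) / sigma > z_threshold


peers = [0, 1, 0, 2, 1, 0, 1, 1]
print(is_anomalous(9, peers))  # far above the peer average
print(is_anomalous(2, peers))  # below the minimum count, never flagged
```

A flag here only routes the original post to human review. It does not hide or remove anything by itself.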

Build account profiles without categories. Every time a submission reaches a new stage of the review process, a counter increments. No violation categories. No reason codes. Just a count of how far each submission went. Over time that produces an account profile that the AI can reason about in plain language for a human moderator — without anyone having to decide in advance what category a new type of bad behaviour belongs to. Novel bad-faith patterns are already in the data. They don't need a new category to be found.
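The category-free profile can be sketched as nothing more than per-account counters keyed by the furthest review stage each submission reached. The stage names and the `AccountProfiles` class below are hypothetical; the article only says that a counter increments per stage.

```python
from collections import Counter, defaultdict

# Hypothetical review stages, ordered from least to most escalated.
STAGES = ("posted", "ai_dialogue", "held_for_review", "removed_by_moderator")


class AccountProfiles:
    """Counts how far each account's submissions went. No reason codes."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, account_id, furthest_stage):
        """Record the furthest stage one submission reached."""
        if furthest_stage not in STAGES:
            raise ValueError(f"unknown stage: {furthest_stage}")
        self._counts[account_id][furthest_stage] += 1

    def summary(self, account_id):
        """Return a plain dict of counts, with zeros for unused stages."""
        counts = self._counts.get(account_id, Counter())
        return {stage: counts[stage] for stage in STAGES}


profiles = AccountProfiles()
for stage in ("posted", "posted", "held_for_review", "posted"):
    profiles.record("user-123", stage)
print(profiles.summary("user-123"))
```

A summary like this is what would be handed to the AI to describe in plain language for a moderator. Nothing in the data structure needs to change when a new kind of bad behaviour appears.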

The result is an AI layer that users experience as useful — familiar, helpful, unobtrusive — while running a complete proactive moderation system underneath. The best moderation is the kind users don't experience as moderation. This is designed to be exactly that.


The honest objections — and why they're less compelling than they appear

Decision-makers who have considered proactive AI moderation before will have a list of reasons it hasn't been built yet. Most of those reasons are real. Applied to this specific system, none of them holds up as well as it first appears.

Toxicity drives engagement. Flame wars and outrage keep users on the platform. A system that reduces toxic content will reduce certain engagement numbers in the short term. This is true. It's also true that rage-driven engagement is the cheapest and most fragile kind. Users who stay because they're angry burn out. The communities that rely on conflict to generate activity lose their best contributors first — the people who post thoughtfully and engage genuinely leave quietly when the environment deteriorates. What remains generates lower-quality engagement that drives away advertisers. The numbers stay up until they suddenly don't. Protecting high-quality engagement at the cost of low-quality engagement is not a loss.

Latency is a real constraint. Running AI assessment on every post before submission adds processing time. In real-time chat — game lobbies, Discord conversations, instant messaging — even a half-second delay destroys the experience. This objection is valid for those contexts and this system doesn't apply to them. Asynchronous discussion forums, where a conversation unfolds over minutes or hours, have no meaningful latency problem at the submission stage. The scoping matters.

The "Clippy" effect. Being interrupted by an AI that asks "are you sure?" feels paternalistic — especially when it fires on benign content. This is a real problem with Instagram's implementation, which matches against reported content patterns and produces false positives that make users feel unfairly targeted. A contextual contribution assessment that explains what it noticed and asks what the user is trying to say is a different experience. More importantly, a system that leads with finding duplicate threads and helping users articulate their posts isn't presenting itself as a moderation tool at all. The paternalism problem largely dissolves when the AI's primary visible job is being useful.

Algorithmic bias. Any AI system that assesses communication style risks disadvantaging communities whose vernacular or norms differ from the majority. A contribution-based contextual assessment is less vulnerable to this than pattern-matching against a corpus of historically reported content — but it isn't immune. This is a genuine cost that requires genuine investment in testing and calibration per community. It belongs in the budget as a line item, not a footnote.

Free speech positioning. Some platforms have commercially committed to minimal moderation as a brand differentiator. For those platforms this system is incompatible with their market positioning. That's a legitimate strategic choice with known tradeoffs — a different advertiser base, a different user composition, different regulatory exposure. It isn't the audience for this argument.


What it actually costs

The assumption that proactive AI moderation is prohibitively expensive doesn't survive contact with current infrastructure costs.

Running a complete system at Reddit's scale, approximately 45 million comments per day, costs an estimated $2.8 million per year in AI infrastructure at current model pricing. Reddit's 2025 revenue was $2.2 billion, so that is about 0.13% of revenue. The baseline pre-submission contribution check alone, running on every single post, costs approximately $82,000 per year at that scale. That is less than the fully loaded annual cost of a single mid-level trust and safety employee, and the check runs at 3am on a Sunday at the same quality as at 2pm on a Tuesday.
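The arithmetic behind those figures can be checked in a few lines. The inputs are the article's own estimates; the per-million-comment breakdown is derived here for illustration and assumes the daily volume holds for all 365 days of the year.

```python
# Figures quoted in the article (estimates, not measured values).
COMMENTS_PER_DAY = 45_000_000
FULL_SYSTEM_COST_PER_YEAR = 2_800_000    # USD
BASELINE_CHECK_COST_PER_YEAR = 82_000    # USD
ANNUAL_REVENUE = 2_200_000_000           # USD

# Assumption: the daily volume is constant across the year.
comments_per_year = COMMENTS_PER_DAY * 365

share_of_revenue = FULL_SYSTEM_COST_PER_YEAR / ANNUAL_REVENUE
full_cost_per_million = FULL_SYSTEM_COST_PER_YEAR / comments_per_year * 1_000_000
baseline_cost_per_million = BASELINE_CHECK_COST_PER_YEAR / comments_per_year * 1_000_000

print(f"Comments per year:           {comments_per_year:,}")
print(f"Full system, share of revenue: {share_of_revenue:.2%}")
print(f"Full system, per 1M comments:  ${full_cost_per_million:.2f}")
print(f"Baseline check, per 1M:        ${baseline_cost_per_million:.2f}")
```

On these inputs the full system comes to roughly $170 per million comments and the baseline check to roughly $5 per million.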

The relevant comparison is not this cost against zero. It is this cost against what reactive moderation is currently spending — headcount processing a report backlog that never empties, engineering time maintaining an ever-more-complex ruleset in a permanent arms race against bad actors, advertiser revenue lost to brand safety concerns, user acquisition cost replacing good-faith contributors that a deteriorating community environment drives away. Set against that full picture the proactive system isn't an additional cost. It's a reallocation with a better return.

AI infrastructure pricing has also fallen consistently and substantially year over year. The cost of running this system in two years will be lower than it is today. The cost of the user trust that erodes while the decision is deferred will not be.


Where regulation and competition are heading

The regulatory trajectory is not ambiguous. The EU Digital Services Act, the UK Online Safety Act, Australia's Online Safety Act, and the direction of travel in multiple other jurisdictions all point the same way: platforms will be expected to demonstrate proactive measures to protect users, not just reactive responses to reported violations. The platforms that build these systems now build toward compliance as a competitive advantage. The ones that don't accumulate compliance debt — the expensive, rushed, externally-imposed version of the same work, done under regulatory pressure rather than strategic intent, without the benefit of having moved first.

The competitive dynamic is equally clear. Community trust compounds. Platforms that earn it retain the users who care about quality. Those users generate the contributions that make the community worth joining — the human knowledge, testimony, and experience that only community produces and that no AI will replicate, because it requires someone to have been there. More quality contributors attract more. The composition of the platform improves over time in a direction that's very hard to reverse once established. The platforms that don't earn community trust get the inverse dynamic. It also compounds.

The question isn't whether proactive AI moderation gets built across the industry. It's which platforms build it first and which ones spend the following years explaining to their users — and their regulators — why moderation still works the way it worked in 2010.


The decision being made right now

The AI is going to read posts on your platform eventually — for moderation, for recommendation, for advertising targeting, for some combination of all three. The only question is whether it does that reading in a way that's also useful to the people posting, or in a way that isn't.

Building the useful version costs roughly the same as building the invisible version. It produces better moderation outcomes, a better user experience, a cleaner forum, and a defensible position when moderation decisions get challenged. It also produces the kind of community that has a compelling answer to the question of why it still exists when AI can answer most questions faster — because what it offers isn't faster answers, it's the irreplaceable human knowledge and testimony that only community produces, protected by a system that keeps the noise out.

The default choice — to keep doing what was done before because changing it requires a decision — is still a choice. It just stops being a neutral one.


