The different stages of a Moderation operation

April 3, 2020

Bernat Fages

What is Moderation?

Any website, app or platform that deals with user-generated content or facilitates user interactions faces considerable risk if it doesn't police its users' behaviour. Social networks, dating apps, online communities, marketplaces and sites with comment sections typically fall into this category.

This risk surfaces in the form of bad experiences for the end user, which may end up hurting your business. Some examples might be:

Moderation is what ensures that undesirable behaviour is kept off the platform so users can have a positive experience.

Implementing Moderation

Any Moderation effort will go through various stages. How far you go will mainly depend on the size of the user base, the tolerance level for undesirable behaviour, and ultimately how detrimental to the business such behaviour is.


  1. Reports
  2. Rules
  3. Classifiers


Reports

The idea behind user reports is to have users do part of the moderation work for you, and to uncover blind spots you might not be aware of.

In my experience, user reports are a double-edged sword. They certainly help surface bad users, interactions or content that would otherwise remain undetected. However, they also tend to get weaponized against benign users.

The general advice is to definitely launch reports, but to make sure someone looks at them before any action is taken. Operations is a great ally to have here: they will help put the right review processes and teams in place to meet the business's demands.
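To make the "review before acting" advice concrete, here is a minimal sketch of a report queue in Python. Everything in it (the `ReviewQueue` class, its method names, and the choice to prioritise by number of distinct reporters) is my own illustration rather than an established design. The point is only that filing a report never triggers an action by itself, and that repeated reports from the same person count once, which blunts one obvious way of weaponizing reports.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    """Holds user reports until a human moderator has looked at them (hypothetical design)."""

    # reported user id -> set of distinct reporter ids
    pending: dict = field(default_factory=lambda: defaultdict(set))
    # users a human reviewer has decided to action
    actioned: set = field(default_factory=set)

    def report(self, reporter_id: str, reported_id: str) -> None:
        # Filing a report only queues it. Using a set means the same
        # reporter cannot inflate the count by reporting repeatedly.
        self.pending[reported_id].add(reporter_id)

    def next_for_review(self):
        # Surface the user with the most distinct reporters first.
        if not self.pending:
            return None
        return max(self.pending, key=lambda uid: len(self.pending[uid]))

    def resolve(self, reported_id: str, reviewer_confirms: bool) -> None:
        # Only the reviewer's decision leads to an action.
        self.pending.pop(reported_id, None)
        if reviewer_confirms:
            self.actioned.add(reported_id)
```

A reviewer would call `next_for_review()`, inspect the case, and then call `resolve(...)` with their decision. Dismissed reports simply leave the queue without touching the reported user.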

Good tooling for reviewing reports will also be vital. A platform like Human Lambdas can power your review processes and give you visibility into the entire operation.

Finally, once these reports start surfacing undesirable users, you will begin to see patterns. These patterns contain invaluable knowledge that you can use to start automating your moderation efforts in the form of rules.


Rules

A common industry practice is to deploy rules that flag or filter specific, clearly undesirable content or interactions. These are usually applied through a rules engine, which makes it easy to define conditions and apply them to everything that happens on the site.

These rules may trigger different follow-up actions, depending on how accurate the rule is and how harmful the content or interaction might be.
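As a rough illustration of what such a rules engine boils down to, here is a small Python sketch. The two example rules, the event fields (`text`, `messages_last_hour`), the thresholds and the action names are all made up for the example. The structure is the part that matters: each rule pairs a condition with an action, and the most severe action among the matching rules wins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the rule matches an event
    action: str                        # "flag" (send to review) or "block"


# Hypothetical rules: a less precise rule only flags, a very precise one blocks.
RULES = [
    Rule("contains_external_link",
         lambda e: "http" in e.get("text", "").lower(), "flag"),
    Rule("mass_messaging",
         lambda e: e.get("messages_last_hour", 0) > 100, "block"),
]

SEVERITY = {"allow": 0, "flag": 1, "block": 2}


def evaluate(event: dict, rules=RULES):
    """Return the most severe action among matching rules, plus the rule names that fired."""
    matched = [r for r in rules if r.condition(event)]
    action = max((r.action for r in matched),
                 key=SEVERITY.__getitem__, default="allow")
    return action, [r.name for r in matched]


print(evaluate({"text": "check http://example.com", "messages_last_hour": 3}))
```

Returning the names of the rules that fired alongside the action keeps every decision easy to explain to a reviewer or an affected user.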

The main qualities of rules are reliability, interpretability and scalability. Unlike human moderators, rules are 100% consistent in their behaviour. They are easy for anyone to reason about, which is important for ensuring there's no bias, and they scale to monitor the entire user base at virtually no cost.

Of course, finding good candidates for new rules and implementing them takes work, so an analyst-type role should be part of this initiative.


Classifiers

Classifiers take the idea of a rules engine to the next level by automatically recognising the underlying patterns of a given type of bad behaviour. In effect, they do the job of finding rules for you. People may call this Machine Learning, Artificial Intelligence or, more simply, Statistical Modelling. All of these terms refer to the same broad set of techniques.

While the promise of a classifier sounds very attractive, it usually requires expensive Machine Learning development and a considerable number of examples for the algorithm to learn from. Another downside of most Machine Learning classifiers is their lack of explainability and the risk of amplifying bias, which fundamentally comes from the initial training examples.
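To show the difference from hand-written rules, here is a toy text classifier in plain Python. It is a bare-bones multinomial Naive Bayes with add-one smoothing, chosen only because it fits in a few lines. The four training examples and the labels `bad` and `ok` are invented, and a real system would need far more data and a proper Machine Learning library. Notice that nobody writes a condition like "contains the word cheap": the model derives word weights from the labelled examples, which is also why its output is only as good, and as unbiased, as those examples.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        # Vocabulary = every word seen in any class during training.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                smoothed = self.word_counts[c][word] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Invented training examples, e.g. content confirmed by human review.
texts = [
    "buy cheap followers now",
    "cheap pills buy now",
    "great photo of your dog",
    "see you at the meetup tomorrow",
]
labels = ["bad", "bad", "ok", "ok"]

model = NaiveBayesTextClassifier().fit(texts, labels)
print(model.predict("buy cheap pills"))
```

The labelled examples would typically come from the earlier stages: reports confirmed by reviewers and content caught by rules.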

I would tend to recommend going down this path only when the incremental detection figures have plateaued and it's clear that undesirable behaviour is still going on.

Putting it all together

Whether to establish reporting mechanisms, rule systems or classifiers is not an either-or question. They are, in a way, successive lines of defence. The best-run Trust & Safety operations in the online world will almost certainly leverage all of them.
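One way to picture the "lines of defence" idea is a single decision function that consults each layer in turn. The sketch below is only that, a picture: the function name `moderate`, the `open_reports` field, the 0.8 threshold and the order in which the layers are checked at runtime are all assumptions of mine. The classifier is optional, so the same function works before that stage exists.

```python
from typing import Callable, Optional


def moderate(event: dict,
             rule_action: Callable[[dict], str],
             classifier_score: Optional[Callable[[dict], float]] = None,
             review_threshold: float = 0.8) -> str:
    """Combine the three layers; returns "allow", "flag", "block" or "review"."""
    # Layer 1: explicit rules, cheap and fully explainable.
    action = rule_action(event)
    if action != "allow":
        return action

    # Layer 2: an optional classifier. A high score only sends the item
    # to human review, since the model cannot explain itself.
    if classifier_score is not None and classifier_score(event) >= review_threshold:
        return "review"

    # Layer 3: anything users have reported also goes to human review.
    if event.get("open_reports", 0) > 0:
        return "review"

    return "allow"


# Rules and reports only, no classifier yet.
print(moderate({"text": "hi", "open_reports": 1}, rule_action=lambda e: "allow"))
```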

But to get there, it's probably a good idea to execute them in sequence.