Implementing a Moderation Enforcement Framework

April 23, 2020

Bernat Fages

Whether you run a social network or a marketplace, if you're looking into implementing moderation you will most likely have to design an effective policing system, one that takes different enforcement actions depending on the severity of each detected instance of bad behaviour.

In this post we propose a framework to help navigate the process of designing an effective, consistent and fair tiered enforcement strategy for moderation teams to apply against violating behaviour.

Having multiple enforcement tiers matters because you probably don't want to punish all instances of bad behaviour equally. After all, not all bad behaviour is equally harmful.

The ideal enforcement action for a given violation should remediate the issue and prevent it from happening again. At the same time, we don't want to take overly harsh actions unless we strictly have to, because the harsher an action is, the more likely it is to hurt that user's engagement with your product. For milder, sometimes borderline, violations like spam or harassment, you clearly don't want to wipe those users off your platform entirely; if you do, your top-line metrics are probably going to suffer.

An enforcement system needs to consider two aspects: how severe a given violation is, and which enforcement action to take in response.

In the sections below we outline the frameworks we use to come up with sound strategies to address violating behaviour.

Severity framework

The severity of a violation can be determined as a function of:

PSA: it is worth noting that some people deem recidivism – the tendency of an actor to reoffend – to be a separate component affecting the severity judgement. For this post's purposes, I'll assume it is encapsulated by the frequency dimension.

Once you have mapped your platform's violation types to these dimensions and decided on how much weight they carry in your severity assessment function, you should be ready to start exploring your enforcement strategy.
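To make this concrete, here is a minimal Python sketch of what a weighted severity assessment function could look like. The dimension names (harm, intent, frequency) and the weights are purely illustrative assumptions, not a prescription; substitute whichever dimensions and weighting your platform settles on.

```python
from dataclasses import dataclass

# Hypothetical severity dimensions, each scored between 0.0 and 1.0 by your
# detection or review pipeline. Swap these for the dimensions your platform uses.
@dataclass
class ViolationSignal:
    harm: float       # how damaging the behaviour is to other users
    intent: float     # how deliberate the behaviour appears to be
    frequency: float  # how often this actor has violated before

# Illustrative weights; tune them to reflect your own severity judgements.
WEIGHTS = {"harm": 0.5, "intent": 0.3, "frequency": 0.2}

def severity_score(signal: ViolationSignal) -> float:
    """Collapse the severity dimensions into a single 0.0-1.0 score."""
    return (
        WEIGHTS["harm"] * signal.harm
        + WEIGHTS["intent"] * signal.intent
        + WEIGHTS["frequency"] * signal.frequency
    )

# Example: a repeat spammer with limited harm but high frequency.
print(severity_score(ViolationSignal(harm=0.2, intent=0.6, frequency=0.9)))  # 0.46
```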

Enforcement framework

The two most useful dimensions to consider when enforcing on a violation are the entity level the action is applied to, and the strength of the action itself.

Entity levels:

  1. Violating entity: the offending content itself; this can be a comment, a post, a picture, etc.
  2. Actor: the user, page or group responsible for the violation.
  3. Platform: the device, phone number or IP address hosting the responsible actor.

Actions, by strength:

  1. Downrank: limits the exposure of an entity, for instance in a feed or in a recommendations list.
  2. Hide: hides an entity from a fraction of the population, or adds an opt-in warning screen to gatekeep the violating entity.
  3. Remove: removes the entity permanently.
  4. Penalty: temporarily bans the parent entities responsible for a violation; repeated penalties may accrue, leading to incrementally harsher enforcement consequences.
  5. Ban: permanently removes the parent entities responsible for a violating entity.
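
As a rough sketch of how these two dimensions could be encoded together, the snippet below (in Python, with entirely hypothetical severity thresholds) maps a severity score to an entity level and an action. The policy table is an assumption for illustration; which action applies at which severity is a choice each platform has to make for itself.

```python
from enum import Enum

class EntityLevel(Enum):
    CONTENT = "content"    # the violating comment, post, picture, etc.
    ACTOR = "actor"        # the user, page or group responsible
    PLATFORM = "platform"  # the device, phone number or IP address

class Action(Enum):
    DOWNRANK = 1  # limit exposure in feeds or recommendations
    HIDE = 2      # hide from a fraction of users, or add a warning screen
    REMOVE = 3    # permanently remove the entity
    PENALTY = 4   # temporarily ban the responsible parent entity
    BAN = 5       # permanently remove the responsible parent entity

# Hypothetical policy: (minimum severity score, entity level, action),
# checked from harshest to mildest. The thresholds are placeholders.
POLICY = [
    (0.9, EntityLevel.ACTOR, Action.BAN),
    (0.7, EntityLevel.ACTOR, Action.PENALTY),
    (0.5, EntityLevel.CONTENT, Action.REMOVE),
    (0.3, EntityLevel.CONTENT, Action.HIDE),
    (0.0, EntityLevel.CONTENT, Action.DOWNRANK),
]

def choose_enforcement(severity: float):
    """Return the strongest (level, action) pair the severity score warrants."""
    for threshold, level, action in POLICY:
        if severity >= threshold:
            return level, action
    return EntityLevel.CONTENT, Action.DOWNRANK

print(choose_enforcement(0.46))  # EntityLevel.CONTENT, Action.HIDE
```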

Another action to consider is Quarantine. This one is only applied as a temporary enforcement action until a conclusive decision on the severity of the violation is taken. For example, a classifier might flag a piece of content with medium confidence, upon which we could temporarily hide it until a human reviewer has evaluated it and taken a final enforcement decision.
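
A minimal sketch of that routing, assuming hypothetical classifier confidence thresholds, might look like the following: high-confidence flags are enforced on immediately, mid-confidence flags are quarantined (hidden) and queued for a human reviewer, and anything below that is left alone.

```python
# Hypothetical confidence thresholds; tune them against your classifier's
# precision/recall trade-off.
AUTO_ENFORCE_THRESHOLD = 0.95
QUARANTINE_THRESHOLD = 0.60

def route_flagged_content(content_id: str, confidence: float) -> str:
    """Decide what to do with a piece of content flagged by a classifier."""
    if confidence >= AUTO_ENFORCE_THRESHOLD:
        # Confident enough to enforce without waiting for a human.
        return f"remove {content_id}"
    if confidence >= QUARANTINE_THRESHOLD:
        # Quarantine: hide temporarily and queue for human review; the
        # reviewer takes the final enforcement decision.
        return f"hide {content_id} and queue it for human review"
    # Too uncertain to act on.
    return f"take no action on {content_id}"

print(route_flagged_content("post_123", 0.72))  # hide post_123 and queue it for human review
```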

Implementing an enforcement strategy

Once you have designed your enforcement strategy, you will need to train your moderation team and adapt their review systems so they can start operating on it.

With Human Lambdas you can customise and adjust your review flows to the latest version of your enforcement framework without getting blocked on Engineering resources.

And remember, each action has a cost to the user. No moderator will enforce 100% accurately, so you will inevitably suffer from False Positives.

Our recommendation for users running HITL (Human in the loop) operations on Human Lambdas is to ensure there is a Quality Assurance process in place that guarantees an ongoing measurement of rep accuracy, and thus False Positive rate. With the right setup this metric can even be segmented by severity for a more granular overview.
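
For illustration, here is a small sketch of how that measurement could be computed from a sample of QA-audited decisions. The record shape and the severity tiers are assumptions made for the example, not a format Human Lambdas prescribes.

```python
from collections import defaultdict

# Hypothetical audit sample: each moderator decision re-reviewed by QA,
# tagged with the severity tier it was enforced under and whether QA agreed.
audited_decisions = [
    {"severity_tier": "low", "qa_agrees": True},
    {"severity_tier": "low", "qa_agrees": False},   # a false positive
    {"severity_tier": "high", "qa_agrees": True},
    {"severity_tier": "high", "qa_agrees": True},
]

def false_positive_rate_by_tier(decisions):
    """Share of enforcement decisions overturned by QA, per severity tier."""
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for decision in decisions:
        totals[decision["severity_tier"]] += 1
        if not decision["qa_agrees"]:
            overturned[decision["severity_tier"]] += 1
    return {tier: overturned[tier] / totals[tier] for tier in totals}

print(false_positive_rate_by_tier(audited_decisions))  # {'low': 0.5, 'high': 0.0}
```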

It is also a good idea to keep an eye on the decisions made by the reviewers in the first days of operation, which you can trivially do through the Audits section in your Human Lambdas instance.