Model validation in Django REST Framework

When Django REST Framework's ModelSerializer validates an API request, it doesn't run Model.full_clean(). It's reasonable to have assumed that it does, because Django's ModelForm does, and the two are very similar in some other ways.

Django REST Framework used to run Model.full_clean() though. It was removed in version 3.0:

> We no longer use the .full_clean() method on model instances, but instead perform all validation explicitly on the serializer. This gives a cleaner separation, and ensures that there's no automatic validation behavior on ModelSerializer classes that can't also be easily replicated on regular Serializer classes.
>
> The .clean() method will not be called as part of serializer validation, as it would be if using a ModelForm.

# But why?

I had a bit of a dig, and found Django REST Framework contributor Xavier Ordoquy justifying this design in reply to a Stack Overflow question. The justification is multifaceted and worth a read. I am, however, going to focus on Xavier's implication that a Django project complex enough to justify custom model-level validation should have a distinct 'business logic' layer.

This sounds entirely reasonable on the face of it. Abstracting lower-level 'software implementation detail' (e.g. 'find and update a row in the accounts database') within routines that follow 'domain language' (e.g. 'ban the user') is a cornerstone of the software engineering discipline.

One potential (and common) approach to this in a Django project is viewing lower-level 'data access' details as being handled entirely by Django. Django then provides you with a sensible 'open by default' high-level API with many ways (custom model managers, model validation, signals, etc.) to integrate your business logic. If your project is centred around data fetching and mutation (a lot are!), this can be a perfectly reasonable solution. A lot of Django's batteries (like ModelForm and the Django admin interface) are primarily built for audiences who can satisfactorily implement their business logic as described above, or by building it directly into custom ModelForms, ModelAdmins, etc.

Whilst this approach is suitable for plenty of cases, it can be problematic for projects that are more complex, have more people working on them, or just exist in an environment where there's more of an emphasis on correctness. It takes a lot of ~~scar tissue~~ wisdom to see these problems before you run into them first-hand. Some of you will already be nodding your heads. For everyone else, consider how you'd implement sending an email to a user when their account has been deactivated. Ensure that your solution:

- Allows for updating multiple records with a single SQL query, e.g. by using QuerySet.update() or QuerySet.bulk_update() (ergo no Model.save() or signals!), and
- Ensures that a hypothetical developer rewriting the 'deactivate user' form, API endpoint, management command, etc. 6 months from now won't accidentally bypass this notification email in a way that goes unnoticed.

Let me be clear that there are myriad ways to skin this particular cat. This is a mammoth topic unto itself, and reasonable people can disagree. However, my thought process is as follows:

- I can abstract "deactivate user account" and "deactivate user accounts" into standalone routines that take responsibility for also sending related emails.
- All this can easily be sidestepped by an errant User.objects.update(is_active=False), which is easy to do accidentally if you're used to having 'feature code' interact directly with the Django ORM.
- Ensuring that all 'account deactivation' feature code uses these new routines will be a game of cat and mouse, especially on larger projects with more developers.
- A good way to ensure compliance is to adopt this explicit business logic layer convention project-wide, so that non-compliance sticks out like a sore thumb. Developers can (hypothetically) use the business logic layer with confidence, knowing that, if a routine has been provided, it takes into account everything that needs to be taken into account. This is in contrast to, e.g., Django's ORM, which provides a deceptively high-level state management API that may not take care of everything it should.

What could this look like? In a blog post, Django REST Framework author Tom Christie describes a convention where code never directly changes a model's state or triggers a database write. Instead, all state change operations occur via custom model methods and model manager methods:

> Never write to a model field or call save() directly. Always use model methods and manager methods for state changing operations.
>
> The convention is more clear-cut and easier to follow than "Fat models, thin views", and does not exclude your team from laying an additional business logic layer on top of your models if suitable.
>
> Adopting this as part of your formal Django coding conventions will help your team ensure a good codebase style, and give you confidence in your application-level data integrity.

This seems like a reasonable execution of a 'business logic layer'. It may, however, run counter to how you currently work with Django.
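The following is a minimal, framework-free sketch of the 'deactivate user accounts' routine described earlier. It uses the standard library's sqlite3 in place of the Django ORM. The users table, `deactivate_user_accounts()`, `deactivate_user_account()`, `send_deactivation_email()` and the in-memory `OUTBOX` are all hypothetical. The state change is a single UPDATE statement, so no per-row save() is involved. Emails are sent only after that UPDATE has committed. The raw UPDATE at the end of the demo plays the role of an errant User.objects.update(is_active=False): the state changes, but nobody is notified.

```python
import sqlite3

# Hypothetical stand-in for an email backend: messages are collected in memory.
OUTBOX: list[str] = []


def send_deactivation_email(email: str) -> None:
    OUTBOX.append(f"To: {email}: your account has been deactivated.")


def deactivate_user_accounts(conn: sqlite3.Connection, user_ids: list[int]) -> int:
    """Deactivate many accounts with one UPDATE, then notify each affected user.

    Returns the number of accounts that were actually deactivated.
    """
    if not user_ids:
        return 0
    # Only "?" placeholders are interpolated; the IDs themselves are bound parameters.
    placeholders = ",".join("?" for _ in user_ids)
    # Only users who are currently active should receive an email.
    emails = [
        row[0]
        for row in conn.execute(
            f"SELECT email FROM users WHERE is_active = 1 AND id IN ({placeholders})",
            user_ids,
        )
    ]
    with conn:  # commits the UPDATE on success, rolls it back on error
        conn.execute(
            f"UPDATE users SET is_active = 0 WHERE is_active = 1 AND id IN ({placeholders})",
            user_ids,
        )
    # Send emails only after the state change has been committed.
    for email in emails:
        send_deactivation_email(email)
    return len(emails)


def deactivate_user_account(conn: sqlite3.Connection, user_id: int) -> bool:
    """Single-account variant built on top of the bulk routine."""
    return deactivate_user_accounts(conn, [user_id]) == 1


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "a@example.com", 1), (2, "b@example.com", 1), (3, "c@example.com", 0)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(deactivate_user_accounts(conn, [1, 2, 3]))  # 2 (user 3 was already inactive)
    print(len(OUTBOX))  # 2
    # The errant bypass: change state directly, skipping the routine.
    conn.execute("UPDATE users SET is_active = 1")
    conn.execute("UPDATE users SET is_active = 0")
    print(len(OUTBOX))  # still 2: nobody was notified
```

Under Tom Christie's convention, the same logic would likely live on a custom manager or queryset method rather than a free function. Either way, the property that matters is that the routine is the only sanctioned way to deactivate accounts.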

# Relating this back to Django REST Framework

In his blog post, Tom Christie vaguely implies that the Django REST Framework behaviour change is in service of his posed convention:

> Django REST framework's Serializer API ... follows a similar approach to validation as Django's ModelForm implementation. In the upcoming 3.0 release the validation step will become properly decoupled from the object-creation step, allowing you to strictly enforce model class encapsulation while using REST framework serializers.

Let's look at the Django REST Framework documentation's summary of ModelSerializer's functionality:

- It will automatically generate a set of fields for you, based on the model.
- It will automatically generate some validators for the serializer, based on the underlying Model, such as unique_together validators.
- It includes simple default implementations of .create() and .update().

Again, looking at the above, I see it as entirely reasonable to be under the impression that ModelSerializer will deal with everything for you. ModelSerializer's resulting behaviour is in my opinion fence-sitting in a way that leaves no winners. Those that want to handle the state change process themselves will override create() and update(), or not use ModelSerializer altogether in favour of a vanilla Serializer. Everyone else will reimplement anything they already have in Model.clean() just for Django REST Framework, or—worse—just not realise that Model.clean() isn't being called in the first place.

The third option, calling Model.clean() from ModelSerializer.validate(), needs to be addressed separately. Django REST Framework's 3.0 announcement post states in no uncertain terms that you should need a good reason to do this, and really pushes you to reimplement validation on your ModelSerializer.

I've used this approach for years in mature projects with little issue. My experience leads me to view the language used in the announcement post as unnecessarily strong.

Xavier's Stack Overflow answer from earlier alludes to Django REST Framework having painted itself into a corner with some functionality (e.g. nested writable serializers) that is fundamentally incompatible with supporting model validation. I seldom use this functionality myself. I acknowledge that I may be missing something, but I see Xavier's take that "models shouldn't care about the business layer" as overly simplistic. It may well be true for some Django projects, but it certainly isn't true for others. Using Model.clean() is endorsed by Django's documentation, after all. I've gotten by just fine using this pattern while delivering projects that were, by all accounts, technical and business successes. And, again, I agree that there are other situations in which it is a less appropriate approach.

Anyway, there are a few ways you can go from here. Let's take a look.

# Option 1: Reimplement your validation logic on your `ModelSerializer`

First, check whether something like field-level validation is more appropriate.

The Django REST Framework 3.0 announcement blog post recommends overriding ModelSerializer.validate(). This is a hook for validation specifically, separate from the actual state change logic.

```python
# serializers.py
from rest_framework import serializers


class UserSerializer(serializers.ModelSerializer):

    # ...

    def validate(self, data):
        # Object-level validation: runs after the individual fields have been validated.
        if data['name'] == 'Bruce' and data['age'] < 40:
            raise serializers.ValidationError("I don't believe you.")
        return data
```

# Option 2: Implement a business logic layer

You could extend the previous example by also overriding ModelSerializer.create() and ModelSerializer.update() to interact with your business logic layer. It might look something like this:

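```python
# serializers.py
from rest_framework import serializers

from .models import User


class UserSerializer(serializers.ModelSerializer):

    # ...

    def validate(self, data):
        if data['name'] == 'Bruce' and data['age'] < 40:
            raise serializers.ValidationError("I don't believe you.")
        return data

    def create(self, validated_data):
        # Delegate creation to the business logic layer (placeholder manager method).
        return User.objects.your_create_method(**validated_data)

    def update(self, instance, validated_data):
        # Delegate the state change to the business logic layer (placeholder model method).
        return instance.your_update_method(**validated_data)
```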

You may look at this and decide that the (existing) validation logic should actually be the responsibility of the business logic layer. There's no harm in performing validation in create() / update() instead of validate(). The validate() hook is just offered for the sake of convenience / improved semantics.

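```python
# serializers.py
from rest_framework import serializers

from .models import User


class UserSerializer(serializers.ModelSerializer):

    # ...

    # No validate() override: validation is now the business logic layer's job.
    def create(self, validated_data):
        return User.objects.your_create_method(**validated_data)

    def update(self, instance, validated_data):
        return instance.your_update_method(**validated_data)
```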

This looks simple on the face of it, but it leaves plenty of questions unanswered. For example:

- How does your business logic layer report that it's been provided invalid data? Does it raise ValidationError? Or does it raise some other exception, which you catch in your ModelSerializer and re-raise as a ValidationError?
- Bonus question: if you raise ValidationError, do you raise Django's ValidationError or Django REST Framework's ValidationError? They are different!
- Some validation is still being handled by Django REST Framework, as ModelSerializer is still generating fields and validators for you. What is the contract between your ModelSerializer and your business logic layer? What guarantees are being made regarding the validity of the provided data? Can you even assume that the data types are correct? Are you, at some point, better off just using Serializer and doing everything yourself?
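One possible answer to the first two questions is shown below, sketched without Django so that it runs standalone. In this sketch, the business logic layer raises its own exception carrying a field-to-messages mapping, and the serializer's create() translates that exception at the boundary. `BusinessRuleViolation`, `create_user()` and `create_via_service()` are hypothetical. `SerializerValidationError` stands in for `rest_framework.serializers.ValidationError`; in a real `create()` override you would raise `serializers.ValidationError(exc.errors)` instead. The `non_field_errors` key mirrors Django REST Framework's default key for object-level errors.

```python
from dataclasses import dataclass


class BusinessRuleViolation(Exception):
    """Hypothetical exception raised by the business logic layer.

    `errors` maps a field name to a list of messages, the same shape as a
    serializer's `.errors` dict.
    """

    def __init__(self, errors: dict[str, list[str]]) -> None:
        super().__init__(errors)
        self.errors = errors


class SerializerValidationError(Exception):
    """Stand-in for rest_framework.serializers.ValidationError in this sketch."""

    def __init__(self, detail: dict[str, list[str]]) -> None:
        super().__init__(detail)
        self.detail = detail


@dataclass
class User:
    name: str
    age: int


def create_user(name: str, age: int) -> User:
    """Hypothetical business-logic routine that owns its own validation."""
    errors: dict[str, list[str]] = {}
    if age < 0:
        errors.setdefault("age", []).append("Age cannot be negative.")
    if name == "Bruce" and age < 40:
        errors.setdefault("non_field_errors", []).append("I don't believe you.")
    if errors:
        raise BusinessRuleViolation(errors)
    return User(name=name, age=age)


def create_via_service(validated_data: dict) -> User:
    """Roughly what a ModelSerializer.create() override might do: delegate to
    the service and translate its exception into the API layer's error type."""
    try:
        return create_user(**validated_data)
    except BusinessRuleViolation as exc:
        raise SerializerValidationError(exc.errors) from exc
```

Because the service's exception imports nothing from any framework, the same routine can back a form, a management command or an API endpoint. The translation happens once, in the layer that knows about HTTP.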

There are no one-size-fits-all answers to these questions. This is what software development is all about! The important thing is to understand that these things do matter, and to be thoughtful and consistent in your approach.

# Option 3: Re-use model-level validation

You can throw caution to the wind and persevere with running Model.clean() from your ModelSerializer's validation routine.

Django REST Framework's 3.0 announcement provides a brief example of this:

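```python
# serializers.py
def validate(self, attrs):
    # Build an unsaved instance from the validated data and run the model's own
    # clean(). ExampleModel stands in for your model class. A Django
    # ValidationError raised here is converted by DRF into a serializer error.
    instance = ExampleModel(**attrs)
    instance.clean()
    return attrs
```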

Whilst the battle-tested version I use in my day job has grown to be more complex than this, the above snippet is sufficient for many purposes, and serves as a good starting point for anything more complex.
