Feed: Nikita Sobolev

Entries found: 10

Python ParamSpec guide

Published: 2021-12-31T00:00:00+00:00
Updated: 2021-12-31T00:00:00+00:00
UTC: 2021-12-31 00:00:00+00:00
URL: https://sobolevn.me/2021/12/paramspec-guide

Before ParamSpec (PEP612) was released in Python 3.10 and typing_extensions, there was a big problem with typing decorators that change a function’s signature.

Let’s start with a basic example. How can one type a decorator function that does not change anything?

from typing import Callable, TypeVar

C = TypeVar('C', bound=Callable)

def logger(function: C) -> C:
    def decorator(*args, **kwargs):
        print('Function called!')
        return function(*args, **kwargs)
    return decorator

Notice the most important part here: C = TypeVar('C', bound=Callable)

What does it mean? It means that we take any callable in and return the exact same callable.

This allows you to decorate any function and preserve its signature:

@logger
def example(arg: int, other: str) -> tuple[int, str]:
    return arg, other

reveal_type(example)  # (arg: int, other: str) -> tuple[int, str]

But, there’s a problem when a decorator does want to change something. Imagine that some decorator might also add None as a return value in some cases:

def catch_exception(function):
    def decorator(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except Exception:
            return None
    return decorator

This is perfectly valid Python code. But how can we type it? Note that we cannot use TypeVar('C', bound=Callable) anymore, since we are changing the return type now.

Initially, I’ve tried something like:

def catch_exception(function: Callable[..., T]) -> Callable[..., Optional[T]]:
    ...

But, this means a different thing: it turns all of the function’s arguments into *args: Any, **kwargs: Any, while the return type stays correct. Generally, this is not what we need when it comes to type-safety.
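
To make the unsafety concrete, here is a minimal sketch (the div function and the wrong call are illustrative, not from the original article) of what mypy happily accepts with this annotation:

from typing import Any, Callable, Optional, TypeVar

T = TypeVar('T')

def catch_exception(function: Callable[..., T]) -> Callable[..., Optional[T]]:
    def decorator(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return function(*args, **kwargs)
        except Exception:
            return None
    return decorator

@catch_exception
def div(arg: int) -> float:
    return arg / arg

div('a', unexpected=True)  # mypy is silent: all arguments became `Any`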

The second way to do this in a type-safe manner is adding a custom Mypy plugin. Here’s our example from dry-python/returns to support decorators that change return types. But, plugins are quite hard to write (you need to learn a bit of Mypy’s API), they are not universal (for example, Pyright does not understand Mypy plugins), and they must be explicitly installed by the end user.

That’s why ParamSpec was added. Here’s how it can be used in this case:

from typing import Callable, TypeVar, Optional
from typing_extensions import ParamSpec  # or `typing` for `python>=3.10`

T = TypeVar('T')
P = ParamSpec('P')

def catch_exception(function: Callable[P, T]) -> Callable[P, Optional[T]]:
    def decorator(*args: P.args, **kwargs: P.kwargs) -> Optional[T]:
        try:
            return function(*args, **kwargs)
        except Exception:
            return None
    return decorator

Now, all decorated functions will preserve their argument types and change their return type to include None:

@catch_exception
def div(arg: int) -> float:
    return arg / arg

reveal_type(div)  # (arg: int) -> Optional[float]

@catch_exception
def plus(arg: int, other: int) -> int:
    return arg + other

reveal_type(plus)  # (arg: int, other: int) -> Optional[int]

The recent release of Mypy 0.930 with ParamSpec support allowed us to remove our custom Mypy plugin and use a well-defined primitive. Here’s a commit to show how easy our transition was. It was even released today in returns@0.18.0, check it out!

What’s next? Concatenate

But, that’s not all! Because some decorators modify argument types, PEP612 also adds the Concatenate type that allows prepending, appending, transforming, or removing function arguments.

Unfortunately, Mypy does not support Concatenate just yet, but I can show you some examples from the PEP itself. Here’s how it is going to work.

Let’s start with some basic definitions:

from typing import Callable
from typing_extensions import ParamSpec, Concatenate  # or `typing` for `python>=3.10`

P = ParamSpec('P')

def bar(x: int, *args: bool) -> int: ...

We are going to change the type of the bar function with the help of the P parameter specification. First, let’s prepend a str argument to this function:

def add(x: Callable[P, int]) -> Callable[Concatenate[str, P], int]: ...

add(bar)  # (str, /, x: int, *args: bool) -> int

Notice that a positional-only str argument is added to the return type of add(bar). Now, let’s try removing an argument:

def remove(x: Callable[Concatenate[int, P], int]) -> Callable[P, int]: ...

remove(bar)  # (*args: bool) -> int

Because we use P and Concatenate in the argument type, the return type will not have an int argument anymore.

And finally, let’s change an argument type from int to str and the return type from int to bool:

def transform(
    x: Callable[Concatenate[int, P], int]
) -> Callable[Concatenate[str, P], bool]: ...

transform(bar)  # (str, /, *args: bool) -> bool

Looking forward to the new Mypy release with Concatenate support. I already know some places where it will be useful.
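
For illustration, here is one hedged sketch of such a place: a decorator that injects a logging.Logger as the first positional argument and removes it from the public signature. The names here (with_logger, process) are illustrative, not from the article:

import logging
from typing import Callable, TypeVar
from typing_extensions import Concatenate, ParamSpec

P = ParamSpec('P')
T = TypeVar('T')

def with_logger(
    function: Callable[Concatenate[logging.Logger, P], T],
) -> Callable[P, T]:
    def decorator(*args: P.args, **kwargs: P.kwargs) -> T:
        # Prepend the logger, forward everything else untouched:
        return function(logging.getLogger(__name__), *args, **kwargs)
    return decorator

@with_logger
def process(logger: logging.Logger, count: int) -> int:
    logger.info('Processing %d items', count)
    return count

process(10)  # `logger` is no longer part of the public signature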

Conclusion

PEP612 adds two very powerful abstractions that allow us to better type our functions and decorators, which play a very important role in the Python world.

Complex projects (like Django) and simple type-safe scripts alike can benefit greatly from this new typing feature. And I hope you will too!

Happy New Year!

The newly released PEP612 allows you to do a lot of advanced typing things with functions and their signatures

Typeclasses in Python

Published: 2021-06-30T00:00:00+00:00
Updated: 2021-06-30T00:00:00+00:00
UTC: 2021-06-30 00:00:00+00:00
URL: https://sobolevn.me/2021/06/typeclasses-in-python

Today I am going to introduce a new concept for Python developers: typeclasses. It is the concept behind our new dry-python library called classes.

I will tell you in advance that it will look very familiar to what you already know and possibly even use. Moreover, we reuse a lot of existing code from Python’s standard library. So, you can call this approach “native” and “pythonic”. And it is still going to be interesting: I am showing examples in 4 different languages!

But, before discussing typeclasses themselves, let’s discuss what problem they solve.

Some functions must behave differently

Ok, this one is a familiar problem to all of the devs out there. How can we write a function that will behave differently for different types?

Let’s create an example. We want to greet different types differently (yes, “hello world” examples, here we go). We want to greet:

  • str instances as Hello, {string_content}!
  • MyUser instances as Hello again, {username}

Note that greet as a simple example does not really make much “business” sense, but more complicated things like to_json, from_json, to_sql, from_sql, and to_binary do make a lot of sense and can be found in almost any project. But, for the sake of implementation simplicity, I’m going to stick to our greet example.

The first approach that comes to our minds is to use isinstance() checks inside the function itself. And it can work in some cases! The only requirement is that we must know all the types we will work with in advance.

Here’s how it would look:

from dataclasses import dataclass

@dataclass
class MyUser(object):
    name: str

def greet(instance: str | MyUser) -> str:
    if isinstance(instance, str):
        return 'Hello, "{0}"!'.format(instance)
    elif isinstance(instance, MyUser):
        return 'Hello again, {0}'.format(instance.name)
    raise NotImplementedError(
        'Cannot greet "{0}" type'.format(type(instance)),
    )

The main limitation is that we cannot extend this function for other types easily (we could use a wrapper function, but I consider this a redefinition).

But, in some cases isinstance won’t be enough, because we need extendability. We need to support other types, which are unknown in advance. Our users might need to greet their custom types.

And that’s the part where things begin to get interesting.

All programming languages address this problem differently. Let’s start with Python’s traditional OOP approach.

OOP extendability and over-abstraction problems

So, how does Python solve this problem?

We all know that Python has magic methods for some builtin functions: for example, len() calls __len__. This solves exactly the same problem.
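
Here is a tiny illustration of that dispatch mechanism (Box is just an example class):

class Box(object):
    def __init__(self, items: list) -> None:
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

print(len(Box([1, 2, 3])))
# 3, because `len()` dispatches to `Box.__len__`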

Let’s say we want to greet a user:

@dataclass
class MyUser(object):
    name: str

    def greet(self) -> str:
        return 'Hello again, {0}'.format(self.name)

You can use this method directly or you can create a helper with typing.Protocol:

from typing_extensions import Protocol

class CanGreet(Protocol):
    def greet(self) -> str:
        """
        It will match any object that has the ``greet`` method.

        Mypy will also check that ``greet`` must return ``str``.
        """

def greet(instance: CanGreet) -> str:
    return instance.greet()

And then we can use it:

print(greet(MyUser(name='example')))
# Hello again, example

So, it works? Not really.

There are several problems.

First, some classes do not want to know some details about themselves to maintain abstraction integrity. For example:

from typing import Sequence

class Person(object):
    def become_friends(self, friend: 'Person') -> None:
         ...

    def is_friend_of(self, person: 'Person') -> bool:
        ...

    def get_pets(self) -> Sequence['Pet']:
        ...

Does this Person (pun intended) deserve to know that some to_json conversion exists that can turn this poor Person into textual data? What about binary pickling? Of course not: these details should not be added to a business-level abstraction. Doing otherwise is what we call a leaky abstraction.

Moreover, I think that mixing structure and behavior into a single abstraction is bad. Why? Because you cannot tell in advance what behavior you would need from a given structure.

For abstractions on this level, it is way easier to have behavior near the structure, not inside it. Mixing these two only makes sense when we work on a higher level, like services or processes.

Second, it only works for custom types. Existing types are hard to extend. For example, how would you add the greet method to the str type?

You can create a str subtype with a greet method in it:

class MyStr(str):
    def greet(self) -> str:
        return 'Hello, {0}!'.format(self)

But, this would require a change in our usage:

print(greet(MyStr('world')))
# Hello, world!

print(greet('world'))
# fails with TypeError

Monkey-patching

Some might suggest that we can just insert the needed methods directly into an object / type. Some dynamically typed languages went down this path: JavaScript (in the 2000s and early 2010s, mostly popularized by jQuery plugins) and Ruby (still happening right now). Here’s how it looks:

String.prototype.greet = function () {
    return `Hello, ${this}!`
}

It is quite obvious that it is not going to work for anything complex. Why?

  • Different parts of your program might monkey-patch methods with the same name, but with different functionality, and nothing will work
  • It is hard to read, because the original source does not contain the patched method, and the patching location might be hidden deep in other files
  • It is hard to type: mypy, for example, does not support it at all
  • The Python community is not used to this style; it would be rather hard to persuade them to write their code like this (and that’s a good thing!)

I hope that it is clear: we won’t fall into this trap. Let’s consider another alternative.

Extra abstractions

People familiar with things like django-rest-framework might recommend adding special abstractions to greet different types:

import abc
from dataclasses import dataclass
from typing import Generic, TypeVar

_Wrapped = TypeVar('_Wrapped')

class BaseGreet(Generic[_Wrapped], metaclass=abc.ABCMeta):
    """Abstract base class of all greet wrappers."""

    def __init__(self, wrapped: _Wrapped) -> None:
        self._wrapped = wrapped

    @abc.abstractmethod
    def greet(self) -> str:
        raise NotImplementedError

class StrGreet(BaseGreet[str]):
    """Wrapped instance of built-in type ``str``."""

    def greet(self) -> str:
        return 'Hello, {0}!'.format(self._wrapped)

# Our custom type:

@dataclass
class MyUser(object):
    name: str

class MyUserGreet(BaseGreet[MyUser]):
    def greet(self) -> str:
        return 'Hello again, {0}'.format(self._wrapped.name)

And we can use it like so:

print(greet(StrGreet('world')))
# Hello, world!

print(greet(MyUserGreet(MyUser(name='example'))))
# Hello again, example

But, now we have a different problem: there is a gap between real types and their wrappers. There’s no easy way to wrap a type into its wrapper. How can we match them? We have to do it either by hand or use some kind of registry like Dict[type, Type[BaseGreet]], as the sketch below shows.
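
A minimal sketch of such a hand-written registry, reusing the classes defined above (the registry and greet_any names are made up for this example):

from typing import Dict, Type

_GREET_REGISTRY: Dict[type, Type[BaseGreet]] = {
    str: StrGreet,
    MyUser: MyUserGreet,
}

def greet_any(instance: object) -> str:
    wrapper_type = _GREET_REGISTRY.get(type(instance))
    if wrapper_type is None:
        # Unregistered types fail at runtime, not at type-checking time:
        raise TypeError('Cannot greet "{0}" type'.format(type(instance)))
    return wrapper_type(instance).greet()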

And it is still not enough: there will be runtime errors! In practice, it ends up like <X> is not json-serializable, as many of us might have seen with drf’s serializers when trying to serialize a custom unregistered type.

Typeclasses and similar concepts

Let’s look at how functional languages (and Rust, people still argue whether it is functional or not) handle this problem.

Some common knowledge:

  • All these languages don’t have the class concept as we know it in Python and, of course, there’s no subclassing
  • All the languages below don’t have objects as we do in Python, they don’t mix behavior and structure (however, Elixir has Alan Kay’s real objects)
  • Instead, these languages use ad-hoc polymorphism to make functions behave differently for different types via overloading
  • And, of course, you don’t have to know any of the languages below to understand what is going on

Elixir

Let’s start with one of my favorites. Elixir has Protocols to achieve what we want:

@doc "Our custom protocol"
defprotocol Greet do
  # This is an abstract function,
  # that will behave differently for each type.
  def greet(data)
end

@doc "Enhancing built-in type"
defimpl Greet, for: BitString do
  def greet(string), do: "Hello, #{string}!"
end

@doc "Custom data type"
defmodule MyUser do
  defstruct [:name]
end

@doc "Enhancing our own type"
defimpl Greet, for: MyUser do
  def greet(user), do: "Hello again, #{user.name}"
end

I am pretty sure that my readers were able to read and understand Elixir even if they are not familiar with this language. That’s what I call beauty!

Usage of the code above:

# Using our `Greet.greet` function with both our data types:
IO.puts(Greet.greet("world"))
# Hello, world!
IO.puts(Greet.greet(%MyUser{name: "example"}))
# Hello again, example

The thing with Elixir’s Protocols is that it is not currently possible to express, for Elixir’s typechecker, that some type supports our Greet.greet. But, this is not a big deal for Elixir, which is 100% dynamically typed.

Protocols are very widely used, they power lots of the language’s features. Here are some real-life examples:

  • Enumerable allows working with collections: counting elements, finding members, reducing, and slicing
  • String.Chars is something like __str__ in Python, it converts structures to a human-readable format

Rust

Rust has Traits. The concept is pretty similar to Protocols in Elixir:

// Our custom trait
trait Greet {
    fn greet(&self) -> String;
}

// Enhancing built-in type
impl Greet for String {
    fn greet(&self) -> String {
        return format!("Hello, {}!", &self);
    }
}

// Defining our own type
struct MyUser {
    name: String,
}

// Enhancing it
impl Greet for MyUser {
    fn greet(&self) -> String {
        return format!("Hello again, {}", self.name);
    }
}

And of course, due to Rust’s static typing, we can express that some function’s argument supports the trait we have just defined:

// We can express that `greet` function only accepts types
// that implement `Greet` trait:
fn greet(instance: &dyn Greet) -> String {
    return instance.greet();
}

pub fn main() {
    // Using our `greet` function with both our data types:
    println!("{}", greet(&"world".to_string()));
    // Hello, world!
    println!("{}", greet(&MyUser { name: "example".to_string() }));
    // Hello again, example
}

See? The idea is so similar that it uses almost the same syntax as Elixir.

Notable real-life examples of how Rust uses its Traits:

  • Copy and Clone - duplicating objects
  • Debug to show a better repr of an object, closer to __repr__ in Python

Basically, Traits are the core of this language; they are widely used whenever you need to define any shared behavior.

Haskell

Haskell has typeclasses to do almost the same thing.

So, what’s a typeclass? A typeclass is a group of types, all of which satisfy some common contract. It is also a form of ad-hoc polymorphism that is mostly used for overloading.

I am a bit sorry for the Haskell syntax below, it might not be very pleasant and clear to read, especially for people who are not familiar with this brilliant language, but we have what we have:

{-# LANGUAGE FlexibleInstances #-}

-- Our custom typeclass
class Greet a where
  greet :: a -> String

-- Enhancing built-in type with it
instance Greet String where
  greet str = "Hello, " ++ str ++ "!"

-- Defining our own type
data MyUser = MyUser { name :: String }

-- Enhancing it
instance Greet MyUser where
  greet user = "Hello again, " ++ (name user)

Basically, we do the same thing as we have already done for Rust and Elixir:

  1. We define a Greet typeclass that has a single function to implement: greet
  2. Then we define an instance implementation for the String type, which is a built-in (alias for [Char])
  3. Then we define custom MyUser type with name field of String type
  4. Implementing the Greet typeclass for MyUser is the last thing we do

Then we can use our new greet function:

-- Here you can see that we can use `Greet` typeclass to annotate our types.
-- I have made this alias entirely for this annotation demo,
-- in real life we would just use `greet` directly:
greetAlias :: Greet a => a -> String
greetAlias = greet

main = do
  print $ greetAlias "world"
  -- Hello, world!
  print $ greetAlias MyUser { name="example" }
  -- Hello again, example

Some real-life examples of typeclasses:

  • Show converts values to human-readable strings, much like __str__ in Python
  • Eq and Ord power equality checks and ordering comparisons

I would say that among our three examples, Haskell relies on its typeclasses the heaviest.

It is important to note that typeclasses from Haskell and traits from Rust are a bit different, but we won’t go into these details to keep this article rather short.

But, what about Python?

dry-python/classes

There’s an awesome function in the Python standard library called singledispatch.

It does exactly what we need. Do you still remember that we are looking for a way to change the function’s behavior based on the input type?

Let’s have a look!

from dataclasses import dataclass
from functools import singledispatch

@singledispatch
def greet(instance) -> str:
    """Default case."""
    raise NotImplementedError

@greet.register
def _greet_str(instance: str) -> str:
    return 'Hello, {0}!'.format(instance)

# Custom type

@dataclass
class MyUser(object):
    name: str

@greet.register
def _greet_myuser(instance: MyUser) -> str:
    return 'Hello again, {0}'.format(instance.name)

Looks cool! Moreover, it is in the standard library: you don’t even have to install anything!

And we can use it like a normal function:

print(greet('world'))
# Hello, world!
print(greet(MyUser(name='example')))
# Hello again, example

So, what’s the point in writing a completely different library like we did with dry-python/classes?

We even reuse some parts of the singledispatch implementation, but there are several key differences.

Better typing

With singledispatch you cannot be sure that everything will work, because it is not supported by mypy.

For example, you can pass unsupported types:

greet(1)  # mypy is ok with that :(
# runtime will raise `NotImplementedError`

In dry-python/classes we have fixed that. You can only pass types that are supported:

from classes import typeclass

@typeclass
def greet(instance) -> str:
    ...

@greet.instance(str)
def _greet_str(instance: str) -> str:
    return 'Hello, {0}!'.format(instance)

greet(1)
# Argument 1 to "greet" has incompatible type "int"; expected "str"

Or you can break the @singledispatch signature contract:

@greet.register
def _greet_dict(instance: dict, key: str) -> int:
    return instance[key]  # still no mypy error

But, not with dry-python/classes:

@greet.instance(dict)
def _greet_dict(instance: dict, key: str) -> int:
    ...
# Instance callback is incompatible
# "def (instance: builtins.dict[Any, Any], key: builtins.str) -> builtins.int";
# expected
# "def (instance: builtins.dict[Any, Any]) -> builtins.str"

@singledispatch also does not allow defining generic functions:

from typing import TypeVar

X = TypeVar('X')

@singledispatch
def copy(instance: X) -> X:
    """Default case."""
    raise NotImplementedError

@copy.register
def _copy_int(instance: int) -> int:
    return instance
# Argument 1 to "register" of "_SingleDispatchCallable"
# has incompatible type "Callable[[int], int]";
# expected "Callable[..., X]"

reveal_type(copy(1))
# Revealed type is "X`-1"
# Should be: `int`

This is, again, possible with dry-python/classes; we fully support generic functions:

from typing import TypeVar
from classes import typeclass

X = TypeVar('X')

@typeclass
def copy(instance: X) -> X:
    ...

@copy.instance(int)
def _copy_int(instance: int) -> int:
    ...  # ok

reveal_type(copy(1))  # int

And you cannot restrict @singledispatch to work with only subtypes of specific types, even if you want to.

Protocols are unsupported

Protocols are an important part of Python. Sadly, they are not supported by @singledispatch:

@greet.register
def _greet_iterable(instance: Iterable) -> str:
    return 'Iterable!'
# TypeError: Invalid annotation for 'instance'.
# typing.Iterable is not a class

Protocol support is also solved with dry-python/classes:

from typing import Iterable
from classes import typeclass

@typeclass
def greet(instance) -> str:
    ...

@greet.instance(Iterable, is_protocol=True)
def _greet_iterable(instance: Iterable) -> str:
    return 'Iterable!'

print(greet([1, 2, 3]))
# Iterable!

No way to annotate types

Let’s say you want to write a function and annotate one of its arguments to say that it must support the greet function. Something like:

def greet_and_print(instance: '???') -> None:
    print(greet(instance))

It is impossible with @singledispatch. But, you can do it with dry-python/classes:

from classes import AssociatedType, Supports, typeclass

class Greet(AssociatedType):
    """Special type to represent that some instance can `greet`."""

@typeclass(Greet)
def greet(instance) -> str:
    """No implementation needed."""

@greet.instance(str)
def _greet_str(instance: str) -> str:
    return 'Hello, {0}!'.format(instance)

def greet_and_print(instance: Supports[Greet]) -> None:
    print(greet(instance))

greet_and_print('world')  # ok
greet_and_print(1)  # type error with mypy, exception at runtime
# Argument 1 to "greet_and_print" has incompatible type "int";
# expected "Supports[Greet]"

Conclusion

We have come a long way: from basic stacked isinstance() conditions, through OOP, to typeclasses.

I have shown that this native and pythonic idea deserves wider recognition and usage. And our extra features in dry-python/classes can save you from lots of mistakes and help you write more expressive and safe business logic.

As a result of using typeclasses, you will untangle your structures from behavior, which will allow you to get rid of useless and complex abstractions and write dead-simple typesafe code. You will have your behavior near the structures, not inside them. This will also solve the extendability problem of OOP.

Combine it with other dry-python libraries for extra effect!

Future work

What do we plan for the future?

There are several key aspects to improve:

  1. Our Supports type should take any number of type arguments: Supports[A, B, C]. This type will represent a type that supports all three typeclasses A, B, and C at the same time
  2. We don’t support concrete generics just yet. So, for example, it is impossible to define different cases for List[int] and List[str]. This might require adding a runtime typechecker to dry-python/classes
  3. I am planning to make tests a part of this app as well! We will ship a hypothesis plugin to test users’ typeclasses in a single line of code

Stay tuned!

If you like this article you can:

  1. Donate to future dry-python development on GitHub
  2. Star our classes repo
  3. Subscribe to my blog for more content!
Typeclasses are a new (but familiar) idea of how you can organize behavior around your types

Make tests a part of your app

Published: 2021-02-28T00:00:00+00:00
Updated: 2021-02-28T00:00:00+00:00
UTC: 2021-02-28 00:00:00+00:00
URL: https://sobolevn.me/2021/02/make-tests-a-part-of-your-app

Higher Kinded Types in Python

Published: 2020-10-24T00:00:00+00:00
Updated: 2020-10-24T00:00:00+00:00
UTC: 2020-10-24 00:00:00+00:00
URL: https://sobolevn.me/2020/10/higher-kinded-types-in-python

How async should have been

Published: 2020-06-07T00:00:00+00:00
Updated: 2020-06-07T00:00:00+00:00
UTC: 2020-06-07 00:00:00+00:00
URL: https://sobolevn.me/2020/06/how-async-should-have-been

In the last few years, the async keyword and semantics made their way into many popular programming languages: JavaScript, Rust, C#, and many other languages that I don’t know or don’t use.

Of course, Python also has async and await keywords since python3.5.

In this article, I would like to provide my opinion about this feature, think of alternatives, and provide a new solution.

Colours of functions

When introducing async functions into a language, we actually end up with a split world. Now, some functions start to be red (or async) and old ones continue to be blue (sync).

The thing about this division is that blue functions cannot call red ones, while red ones potentially can call blue ones. In Python, for example, it is partially true: async functions can only call sync non-blocking functions. Is it possible to tell whether a function is blocking or not by its definition? Of course not! Python is a scripting language, don’t forget about that!

This division creates two subsets of a single language: the sync one and the async one. Five years have passed since the release of python3.5, but async support is nowhere near what we have in the sync Python world.

Read this brilliant piece if you want to learn more about colors of functions.

Code duplication

Different colors of functions lead to a more practical problem: code duplication.

Imagine that you are writing a CLI tool to fetch sizes of web pages, and you want to support both sync and async ways of doing it. This is very useful for library authors, when you don’t know how your code is going to be used. It is not limited to just PyPI libraries: it also includes your in-company libraries with shared logic for different services written, for example, in Django and aiohttp. Or any other sync and async code. But, I must admit that single applications are mostly written in either a sync or an async way only.

Let’s start with the sync pseudo-code:

def fetch_resource_size(url: str) -> int:
    response = client_get(url)
    return len(response.content)

Looking pretty good! Now, let’s add its async counterpart:

async def fetch_resource_size(url: str) -> int:
    response = await client_get(url)
    return len(response.content)

It is basically the same code, but filled with async and await keywords! And I am not making this up, just compare the code samples in the httpx tutorial:

They show exactly the same picture.
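
The tutorial’s screenshots are not reproduced here; a hedged reconstruction of the duplication they show looks roughly like this:

import httpx

def fetch_resource_size(url: str) -> int:
    response = httpx.get(url)
    return len(response.content)

async def fetch_resource_size_async(url: str) -> int:
    # The exact same logic, but with `async` / `await` noise:
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
    return len(response.content)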

Abstraction and Composition

Ok, we find ourselves in a situation where we need to rewrite all sync code and add async and await keywords here and there, so our program would become asynchronous.

These two principles can help us in solving this problem.

First of all, let’s rewrite our imperative pseudo-code into functional pseudo-code. This will allow us to see the pattern more clearly:

def fetch_resource_size(url: str) -> Abstraction[int]:
    return client_get(url).map(
        lambda response: len(response.content),
    )

What is this .map method? What does it do?

This is a functional way of composing complex abstractions and pure functions. This method allows creating a new abstraction from the existing one with a new state. Let’s say that calling client_get(url) initially returns Abstraction[Response], and calling .map(lambda response: len(response.content)) transforms it into the needed Abstraction[int] instance.
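
To make .map concrete, here is a minimal, purely illustrative sketch of such an Abstraction for the sync case (not the real dry-python type):

from typing import Callable, Generic, TypeVar

_Value = TypeVar('_Value')
_New = TypeVar('_New')

class Abstraction(Generic[_Value]):
    def __init__(self, value: _Value) -> None:
        self._value = value

    def map(self, function: Callable[[_Value], _New]) -> 'Abstraction[_New]':
        # Apply a pure function to the wrapped value and rewrap the result:
        return Abstraction(function(self._value))

assert Abstraction('abc').map(len)._value == 3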

Now the steps are pretty clear! Notice how easily we turned several independent steps into a single pipeline of function calls. We have also changed the return type of this function: now it returns some Abstraction.

Now, let’s rewrite our code to work with async version:

def fetch_resource_size(url: str) -> AsyncAbstraction[int]:
    return client_get(url).map(
        lambda response: len(response.content),
    )

Wow, that’s mostly it! The only thing that is different is the AsyncAbstraction return type. Other than that, our code stays exactly the same. We also don’t need async and await keywords anymore: we don’t use await at all (that’s the whole point of our journey!), and async functions do not make any sense without await.

The last thing we need is to decide which client we want: an async or a sync one. Let’s fix that!

def fetch_resource_size(
    client_get: Callable[[str], AbstractionType[Response]],
    url: str,
) -> AbstractionType[int]:
    return client_get(url).map(
        lambda response: len(response.content),
    )

Our client_get is now an argument of a callable type that receives a single URL string as input and returns some AbstractionType over a Response object. This AbstractionType is either the Abstraction or the AsyncAbstraction we have already seen in the previous samples.

When we pass Abstraction, our code works like a sync one; when AsyncAbstraction is passed, the same code automatically starts to work asynchronously.

IOResult and FutureResult

Luckily, we already have the right abstractions in dry-python/returns!

Let me introduce to you a type-safe, mypy-friendly, framework-independent, pure-Python tool that provides awesome abstractions you can use in any project!

Sync version

Before we go any further, to make this example reproducible, I need to provide dependencies that are going to be used later:

pip install returns httpx anyio

Let’s move on!

One can rewrite this pseudo-code as real working Python code. Let’s start with the sync version:

from typing import Callable

import httpx

from returns.io import IOResultE, impure_safe

def fetch_resource_size(
    client_get: Callable[[str], IOResultE[httpx.Response]],
    url: str,
) -> IOResultE[int]:
    return client_get(url).map(
        lambda response: len(response.content),
    )

print(fetch_resource_size(
    impure_safe(httpx.get),
    'https://sobolevn.me',
))
# => <IOResult: <Success: 27972>>

We have changed a couple of things to make our pseudo-code real:

  1. We now use IOResultE, which is a functional way to handle sync IO that might fail. Remember, exceptions are not always welcome! Result-based types allow modeling exceptions as separate Failure() values, while successful values are wrapped in the Success type. In a traditional approach, no one cares about exceptions. But, we do care ❤️
  2. We use httpx, which can work with both sync and async requests
  3. We use the impure_safe function to convert the return type of httpx.get to the abstraction we need: IOResultE

Now, let’s try the async version!

Async version

from typing import Callable

import anyio
import httpx

from returns.future import FutureResultE, future_safe

def fetch_resource_size(
    client_get: Callable[[str], FutureResultE[httpx.Response]],
    url: str,
) -> FutureResultE[int]:
    return client_get(url).map(
        lambda response: len(response.content),
    )

page_size = fetch_resource_size(
    future_safe(httpx.AsyncClient().get),
    'https://sobolevn.me',
)
print(page_size)
print(anyio.run(page_size.awaitable))
# => <FutureResult: <coroutine object async_map at 0x10b17c320>>
# => <IOResult: <Success: 27972>>

Notice that we have exactly the same result, but now our code works asynchronously. And its core part didn’t change at all!

However, there are some important notes:

  1. We changed the sync IOResultE into the async FutureResultE and impure_safe into future_safe, which does the same thing but returns another abstraction: FutureResultE
  2. We now also use AsyncClient from httpx
  3. We are also required to run the resulting FutureResult value, because red functions cannot run themselves! To demonstrate that this approach works with any async library (asyncio, trio, curio), I am using the anyio utility

Combining the two

And now I can show you how you can combine these two versions into a single type-safe API.

Update after HKT support is released :

Now, after returns@0.14.0 is released, you can have a look at what this program looks like with Higher Kinded Types, link. It is 100% recommended over the version above.

I am going to keep the old version for historical reasons.

Old version :

Sadly, Higher Kinded Types and proper type-classes are work-in-progress, so we will use regular @overload function cases:

from typing import Callable, Union, overload

import anyio
import httpx

from returns.future import FutureResultE, future_safe
from returns.io import IOResultE, impure_safe

@overload
def fetch_resource_size(
    client_get: Callable[[str], IOResultE[httpx.Response]],
    url: str,
) -> IOResultE[int]:
    """Sync case."""

@overload
def fetch_resource_size(
    client_get: Callable[[str], FutureResultE[httpx.Response]],
    url: str,
) -> FutureResultE[int]:
    """Async case."""

def fetch_resource_size(
    client_get: Union[
        Callable[[str], IOResultE[httpx.Response]],
        Callable[[str], FutureResultE[httpx.Response]],
    ],
    url: str,
) -> Union[IOResultE[int], FutureResultE[int]]:
    return client_get(url).map(
        lambda response: len(response.content),
    )

With @overload decorators we describe which combinations of inputs are allowed and what return type they will produce. You can read more about the @overload decorator here.

Finally, calling our function with both sync and async client:

# Sync:
print(fetch_resource_size(
    impure_safe(httpx.get),
    'https://sobolevn.me',
))
# => <IOResult: <Success: 27972>>

# Async:
page_size = fetch_resource_size(
    future_safe(httpx.AsyncClient().get),
    'https://sobolevn.me',
)
print(page_size)
print(anyio.run(page_size.awaitable))
# => <FutureResult: <coroutine object async_map at 0x10b17c320>>
# => <IOResult: <Success: 27972>>

As you can see, fetch_resource_size with a sync client immediately returns IOResult and can execute itself. In contrast, the async version requires an event loop to execute it, like a regular coroutine. We use anyio for the demo.

mypy is pretty happy about our code too:

» mypy async_and_sync.py
Success: no issues found in 1 source file

Let’s try to screw something up:

-    lambda response: len(response.content),
+    lambda response: response.content,

And check that the new error will be caught by mypy:

» mypy async_and_sync.py
async_and_sync.py:33: error: Argument 1 to "map" of "IOResult" has incompatible type "Callable[[Response], bytes]"; expected "Callable[[Response], int]"
async_and_sync.py:33: error: Argument 1 to "map" of "FutureResult" has incompatible type "Callable[[Response], bytes]"; expected "Callable[[Response], int]"
async_and_sync.py:33: error: Incompatible return value type (got "bytes", expected "int")

As you can see, there’s nothing magical in the way async code can be written with the right abstractions. Inside our implementation, there’s still no magic. Just good old composition. The real magic is that we provide the same API for different types: this allows us to abstract away how, for example, HTTP requests work, synchronously or asynchronously.

I hope that this quick demo shows how awesome async programs can be! Feel free to try the new dry-python/returns@0.14 release, it has lots of other goodies!

Other awesome features

Speaking about goodies, I want to highlight several features I am most proud of:

from returns.curry import curry, partial

def example(a: int, b: str) -> float:
    ...

reveal_type(partial(example, 1))
# note: Revealed type is 'def (b: builtins.str) -> builtins.float'

reveal_type(curry(example))
# note: Revealed type is 'Overload(def (a: builtins.int) -> def (b: builtins.str) -> builtins.float, def (a: builtins.int, b: builtins.str) -> builtins.float)'

This means that you can use @curry like so:

@curry
def example(a: int, b: str) -> float:
    return float(a + len(b))

assert example(1, 'abc') == 4.0
assert example(1)('abc') == 4.0

You can now use functional pipelines with full type inference that is augmented by a custom mypy plugin:

from returns.pipeline import flow
assert flow(
    [1, 2, 3],
    lambda collection: max(collection),
    lambda max_number: -max_number,
) == -3

We all know how hard it is to work with lambdas in typed code, because their arguments always have the Any type. And this might break regular mypy inference.

Now, we always know that lambda collection: max(collection) has the Callable[[List[int]], int] type inside this pipeline. And lambda max_number: -max_number is just Callable[[int], int]. You can pass any number of arguments to flow, they all will work perfectly. Thanks to our custom plugin!

There is also an abstraction over the FutureResult we have already covered in this article. It might be used to explicitly pass dependencies in a functional manner in your async programs.

To be done

However, there are more things to be done before we can hit 1.0:

  1. We need to implement Higher Kinded Types or their emulation, source
  2. Adding proper type-classes, so we can implement the needed abstractions, source
  3. We would love to try the mypyc compiler. It will potentially allow us to compile our type-annotated Python programs to binary. And as a result, simply dropping dry-python/returns into your program will make it several times faster, source
  4. Finding new ways to write functional Python code, like our ongoing investigation of “do-notation”

Conclusion

Composition and abstraction can solve any problem. In this article, I have shown you how they can solve the problem of function colors and allow people to write simple, readable, and still flexible code that works. And type-checks.

Check out dry-python/returns, provide your feedback, learn new ideas, maybe even help us to sustain this project!

And as always, follow me on GitHub to keep up with new awesome Python libraries I make or help with!

Gratis

Thanks to @AlwxSin and @ditansu for reviewing this post.

Your sync and async code can be identical, yet still work differently. It is a matter of the right abstractions. In this article, I show how one can write sync code to run async programs in Python.

Do not log

Published: 2020-03-11T00:00:00+00:00
Updated: 2020-03-11T00:00:00+00:00
UTC: 2020-03-11 00:00:00+00:00
URL: https://sobolevn.me/2020/03/do-not-log

Almost every week I accidentally get into this logging argument. Here’s the problem: people tend to log different things and call it a best-practice. And I am not sure why. When I start discussing this with other people I always end up repeating the exact same ideas over and over again.

So. Today I want to criticize the whole logging culture and provide a bunch of alternatives.

Logging does not make sense

Let’s start with the most important one. Logging does not make any sense!

Let’s review a popular code sample that you can find all across the internet:

try:
    do_something_complex(*args, **kwargs)
except MyException as ex:
    logger.exception(ex)

So, what is going on here? Some complex computation fails and we log it. It seems like a good thing to do, doesn’t it? Well. In this situation, I usually ask several important questions.

The first question is: can we make bad states for do_something_complex() unreachable? If so, let’s refactor our code to be type-safe and eliminate all possible exceptions that can happen here with mypy. Sometimes this helps. And in that case, we would not have any exception handling and logging at all.
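
One hedged illustration of this idea (all the names here are made up for the example): validate once at the boundary and let the types carry the guarantee, so later steps cannot fail.

from typing import NewType, Optional

ValidatedOrder = NewType('ValidatedOrder', dict)

def validate(raw: dict) -> Optional[ValidatedOrder]:
    if 'item_id' not in raw:
        return None  # handle the bad state once, at the boundary
    return ValidatedOrder(raw)

def do_something_complex(order: ValidatedOrder) -> None:
    # Can no longer fail on a missing 'item_id': the type guarantees it.
    ...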

The second question I ask: is it important that do_something_complex() failed? There are a lot of cases when we don’t care: because of retries, queues, or because it might be an optional step that can be skipped. If this failure is not important, then just forget about it. But, if this failure is important, I want to know exactly everything: why and when it failed. I want to know the whole stack trace, the values of local variables, the execution context, the total number of failures, and the number of affected users. I also want to be immediately notified of this important failure. And to be able to create a bug ticket from this failure in one click.

And yes, you got it correctly: it sounds like a job for Sentry, not logging.

Sentry

I either want a quick notification about some critical error with everything inside, or I want nothing: a peaceful morning with tea and youtube videos. There’s nothing in between for logging.

The third question is: can we instead apply business monitoring to make sure our app works? We don’t really care about exceptions and how to handle them; we care about the business value that we provide with our app. Sometimes your app does not raise any exception to be caught by Sentry. It can be broken in different ways. For example, your form validation can return errors when it should not normally happen. And you have zero exceptions, but a dysfunctional application. That’s where business monitoring shines!

PostHog

We can track different business metrics and make sure there are new orders, new comments, new users, etc. And if not, I want an emergency notification. I don’t want to waste extra money on reading logging information after angry clients call or text me. Please, don’t treat your users as a monitoring service!

The last question is: do you normally expect do_something_complex() to fail? Like HTTP calls or database access. If so, don’t use exceptions, use the Result monad instead. This way you can clearly indicate that something is going to fail, and act with confidence. And do not log anything. Just let it fail.
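
A minimal sketch of that last point, assuming dry-python/returns and httpx are installed (the function itself is illustrative):

import httpx
from returns.result import Result, safe

@safe  # turns raised exceptions into `Failure(exc)` values
def fetch_page_size(url: str) -> int:
    response = httpx.get(url)
    return len(response.content)

result: Result[int, Exception] = fetch_page_size('https://sobolevn.me')
# Either `Success(27972)` or `Failure(HTTPError(...))`:
# the failure is an explicit value the caller must handle, nothing is logged.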

Logging is a side effect

One more important monad-related topic is the pureness of a function that has a logger call inside. Let’s compare two similar functions:

def compute(arg: int) -> float:
    return (arg / PI) * MAGIC_NUMBER

And:

def compute(arg: int):
    result = (arg / PI) * MAGIC_NUMBER
    logger.debug('Computation is:', result)
    return result

The main difference between these two functions is that the first one is a perfect pure function and the second one is an IO-bound impure one.

What consequences does it have?

  1. We have to change our return type to IOResult[float], because logging is impure and can fail (yes, loggers can fail and break your app)
  2. We need to test this side effect. And we need to remember to do so! Unless this side effect is explicit in the return type

Wise programmers even use the special Writer monad to make logging pure and explicit. And that requires significantly changing the whole architecture around this function.
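
To give a taste of the idea, here is a minimal Writer-style sketch (not a real Writer monad): the log entries become part of the return value instead of a side effect. The constants are illustrative stand-ins for the ones the examples above assume.

from typing import List, Tuple

PI = 3.141592653589793
MAGIC_NUMBER = 42  # illustrative value

def compute(arg: int) -> Tuple[float, List[str]]:
    result = (arg / PI) * MAGIC_NUMBER
    # The "log" is returned, pure and explicit, not written anywhere:
    return result, ['Computation is: {0}'.format(result)]

value, log = compute(1)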

We also might want to pass the correct logger instance, which probably implies that we have to use dependency injection based on RequiresContext monad.

All I want to say with all these abstractions is that proper logging architecture is hard. It is not just about writing logger.log() here and there. It is a complex process of creating proper abstractions, composing them, and maintaining strict layers of pure and impure code.

Is your team ready for this?

Logging is a subsystem

We used to log things into a single file. It was fun!

Even this simple setup required us to do periodic log file rotation. But still, it was easy. We also used grep to find things in this file. But, even then we had problems. Do you happen to have several servers? Good luck finding the required information across all of those files and ssh connections.

But, now it is not enough! With all these microservices, cloud-native, and other 2020-ish tools, we need complex subsystems to work with logging. It includes (but is not limited to): collecting and transporting logs from all your services, storing and indexing them in a searchable database, and visualizing them in some kind of UI.

And here’s an example of what it takes to set this thing up with the ELK stack:

version: '3.2'

services:
  elasticsearch:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./elasticsearch/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: elasticsearch
        target: /usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
      ELASTIC_PASSWORD: changeme
      discovery.type: single-node
    networks:
      - elk

  logstash:
    build:
      context: logstash/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./logstash/config/logstash.yml
        target: /usr/share/logstash/config/logstash.yml
        read_only: true
      - type: bind
        source: ./logstash/pipeline
        target: /usr/share/logstash/pipeline
        read_only: true
    ports:
      - "5000:5000/tcp"
      - "5000:5000/udp"
      - "9600:9600"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
    depends_on:
      - elasticsearch

  kibana:
    build:
      context: kibana/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./kibana/config/kibana.yml
        target: /usr/share/kibana/config/kibana.yml
        read_only: true
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      - elasticsearch

networks:
  elk:
    driver: bridge

volumes:
  elasticsearch:

Isn’t it a bit too hard?! Look, I just want to write strings to stdout. And then store them somewhere.

But no. You need a database. And a separate web-service. You also have to monitor your logging subsystem. And periodically update it, you also have to make sure that it is secure. And has enough resources. And everyone has access to it. And so on and so forth.

Of course, there are cloud providers just for logging. It might be a good idea to consider using them.

Logging is hard to manage

In case you are still using logging after all my previous arguments, you will find out that it requires a lot of discipline and tooling. There are several well-known problems:

  1. Logging should be very strict about its format. Do you remember that we are probably going to store our logs in a NoSQL database? Our logs need to be indexable. You would probably end up using structlog or a similar solution. In my opinion, this should be the default

  2. The next thing to fight over is levels. All developers will use their own ideas of what is critical and what’s not, unless you (as an architect) write a clear policy that covers most of the cases. You might also need to review it carefully. Otherwise, your logging database might blow up with useless data

  3. Your logging usage should be consistent! All people tend to write in their own style. There’s a linter for that! It will enforce:

     logger.info(
         'Hello {world}',
         extra={'world': 'Earth'},
     )
    

    Instead of:

     logger.info(
         'Hello {world}'.format(world='Earth'),
     )
    

    And many other edge cases.

  4. Logging should be business-oriented. I usually see people using logging with a minimal amount of useful information. For example, if you are logging an invalid current object state: it is not enough! You need to do more: you need to show how this object got into this invalid state. There are different approaches to this problem. Some use simple solutions like version history, some people use EventSourcing to compose their objects from changes. And some libraries log the entire execution context, the logical steps that were taken, and the changes made to the object. Like dry-python/stories (docs on logging). And here’s what the context looks like:

    ApplyPromoCode.apply
      find_category
      find_promo_code
      check_expiration
      calculate_discount (errored: TypeError)
    
    Context:
      category_id = 1024                # Story argument
      category = <example.Category>     # Set by ApplyPromoCode.find_category
      promo_code = <example.PromoCode>  # Set by ApplyPromoCode.find_promo_code
    

    See? It contains the full representation of what happened and how to recreate this error. Not just some random state information here and there. And you don’t even have to call the logger yourself: it will be handled for you. By the way, it even has a native Sentry integration, which is better in my opinion.

  5. You should pay attention to what you log. There are GDPR rules on logging and specialized security audits for your logs. Common sense dictates that logging passwords, credit cards, emails, etc. is not secure. But, sadly, common sense is not enough. This is a complex process to follow.

There are other problems to manage as well. My point here is to show that you will need senior people to work on that: by creating policies, writing down processes, and setting up your logging toolchain.

What to do instead?

Let’s do a quick recap:

  1. Logging does not make much sense for monitoring and error tracking. Use better tools instead: like error and business monitoring with alerts
  2. Logging adds significant complexity to your architecture. And it requires more testing. Use architecture patterns that will make logging an explicit part of your contracts
  3. Logging is a whole infrastructure subsystem on its own. And quite a complex one. You will have to maintain it or outsource this job to existing logging services
  4. Logging should be done right. And it is hard. You will have to use a lot of tooling. And you will have to mentor developers that are unaware of the problems we have just discussed

Is logging worth it? You should make an informed decision based on this knowledge and your project requirements. In my opinion, it is not required for most regular web apps.

Please, get this right: I understand that logging can be really useful (and sometimes even the only source of useful information), like for on-premise software, or for the initial steps when your app is not fully functioning yet. It would be hard to understand what is going on without logging. I am fighting the “overlogging” culture, when logs are used for no good reason, because developers just do it without analyzing the costs and tradeoffs.

Are you joining my side?

A lot of developers consider logging as a silver bullet to fix all things at once. And they don't realize how hard it actually is to work with logging properly.

Conditional coverage

Published: 2020-02-25T00:00:00+00:00
Updated: 2020-02-25T00:00:00+00:00
UTC: 2020-02-25 00:00:00+00:00
URL: https://sobolevn.me/2020/02/conditional-coverage

Typed functional Dependency Injection in Python

Published: 2020-02-02T00:00:00+00:00
Updated: 2020-02-02T00:00:00+00:00
UTC: 2020-02-02 00:00:00+00:00
URL: https://sobolevn.me/2020/02/typed-functional-dependency-injection

Complexity Waterfall

Published: 2019-10-13T00:00:00+00:00
Updated: 2019-10-13T00:00:00+00:00
UTC: 2019-10-13 00:00:00+00:00
URL: https://sobolevn.me/2019/10/complexity-waterfall

Testing Django Migrations

Published: 2019-10-13T00:00:00+00:00
Updated: 2019-10-13T00:00:00+00:00
UTC: 2019-10-13 00:00:00+00:00
URL: https://sobolevn.me/2019/10/testing-django-migrations

Dear internet,

Today we have screwed up by applying a broken migration to the running production service and causing a massive outage for several hours… Because the rollback function was terribly broken as well.

As a result, we had to restore a backup that was made several hours ago, losing some new data.

Why did it happen?

The easiest answer is just to say: “Because it is X’s fault! He is the author of this migration, he should learn how databases work”. But, it is counterproductive.

Instead, as a part of our “Blameless environment” culture, we tend to put all the guilt on the CI. It was the CI that let the broken code into the master branch. So, we need to improve it!

We always write post-mortems for all massive incidents that we experience. And we write regression tests for all bugs, so they won’t happen again. But, this situation was different, since it was a broken migration that worked during the CI process, and it was hard or impossible to test with the current set of instruments.

So, let me explain the steps we took to solve this riddle.

Existing setup

We use a very strict django project setup with several quality checks for our migrations:

  1. We write all data migrations as typed functions in our main source code. Then we check everything with mypy and test them as regular functions
  2. We lint migration files with wemake-python-styleguide , it drastically reduces the possibility of bad code inside the migration files
  3. We use tests that automatically set up the database by applying all migrations before each session
  4. We use django-migration-linter to find migrations that are not suited for zero-time deployments
  5. And then we review the code by two senior people
  6. Then we test everything manually with the help of the review apps

And somehow it is still not enough: our server was dead.

When writing the post-mortem for this bug, I spotted that the data in our staging and production services were different. And that’s why our data migration crashed and left one of the core tables in a broken state.

So, how can we test migrations on some existing data?

django-test-migrations

That’s where django-test-migrations comes in handy.

The idea of this project is simple:

  1. Set some migration as a starting point
  2. Create some model’s data that you want to test
  3. Run the new migration that you are testing
  4. Assert the results!

Let’s illustrate it with some code samples. Full source code is available here.

Here’s the latest version of our model:

class SomeItem(models.Model):
    """We use this model for testing migrations."""

    string_field = models.CharField(max_length=50)
    is_clean = models.BooleanField()

This is a pretty simple model that serves only one purpose: to illustrate the problem. The is_clean field is related to the contents of string_field in some manner, while string_field itself contains only regular text data.

Imagine that you have a data migration that looks like so:

def _is_clean_item(instance: 'SomeItem') -> bool:
    """
    Pure function used by the actual migration.

    Ideally, it should be moved to ``main_app/logic/migrations``.
    But, as an example it is easier to read them together.
    """
    return ' ' not in instance.string_field

def _set_clean_flag(apps, schema_editor):
    """
    Performs the data-migration.

    We can't import the ``SomeItem`` model directly as it may be a newer
    version than this migration expects.

    We are using ``.all()`` because
    we don't have a lot of ``SomeItem`` instances.
    In real-life you should not do that.
    """
    SomeItem = apps.get_model('main_app', 'SomeItem')
    for instance in SomeItem.objects.all():
        instance.is_clean = _is_clean_item(instance)
        instance.save(update_fields=['is_clean'])

def _remove_clean_flags(apps, schema_editor):
    """
    This is just a noop example of a rollback function.

    It is not used in our simple case,
    but it should be implemented for more complex scenarios.
    """

class Migration(migrations.Migration):
    dependencies = [
        ('main_app', '0002_someitem_is_clean'),
    ]

    operations = [
        migrations.RunPython(_set_clean_flag, _remove_clean_flags),
    ]

And here’s how we are going to test this migration. At first, we will have to set some migration as a starting point:

old_state = migrator.before(('main_app', '0002_someitem_is_clean'))

Then we have to get the model class. We cannot import the model directly from models, because it might be a different version: migrations change models compared to our stored definition:

SomeItem = old_state.apps.get_model('main_app', 'SomeItem')

Then we need to create some data that we want to test:

# One instance will be `clean`, the other won't be:
SomeItem.objects.create(string_field='a') # clean
SomeItem.objects.create(string_field='a b') # contains whitespace, is not clean

Then we will run the migration that we are testing and get the new project state:

new_state = migrator.after(('main_app', '0003_auto_20191119_2125'))
SomeItem = new_state.apps.get_model('main_app', 'SomeItem')

And the last step: we need to make some assertions on the resulting data. We have created two model instances before: one clean and one with whitespace. So, let’s check that:

assert SomeItem.objects.count() == 2
# One instance is clean, the other is not:
assert SomeItem.objects.filter(is_clean=True).count() == 1
assert SomeItem.objects.filter(is_clean=False).count() == 1

And that’s how it works! Now we have the ability to test our schema and data transformations with ease. Here is the complete test example:

@pytest.mark.django_db
def test_main_migration0002(migrator):
    """Ensures that the second migration works."""
    old_state = migrator.before(('main_app', '0002_someitem_is_clean'))
    SomeItem = old_state.apps.get_model('main_app', 'SomeItem')
    # One instance will be `clean`, the other won't be:
    SomeItem.objects.create(string_field='a')
    SomeItem.objects.create(string_field='a b')

    assert SomeItem.objects.count() == 2
    assert SomeItem.objects.filter(is_clean=True).count() == 2

    new_state = migrator.after(('main_app', '0003_auto_20191119_2125'))
    SomeItem = new_state.apps.get_model('main_app', 'SomeItem')

    assert SomeItem.objects.count() == 2
    # One instance is clean, the other is not:
    assert SomeItem.objects.filter(is_clean=True).count() == 1

By the way, we also support raw unittest cases.

Conclusion

Don’t be sure about your migrations. Test them!

You can test forward and rollback migrations and their ordering with the help of django-test-migrations . It is simple, friendly, and already works with the test framework of your choice.

I also want to say “thank you” to these awesome people. Without their work, it would have taken me much longer to come up with a working solution.

When migrating schema and data in Django, multiple things can go wrong. It is better to test what you are doing in advance.