Feed: James Bennett

Entries found: 15

Litestar is worth a look

Published: 2025-08-06T14:25:43-05:00
Updated: 2025-08-06T15:55:00.219053-05:00
UTC: 2025-08-06 20:55:00+00:00
URL: https://www.b-list.org/weblog/2025/aug/06/litestar/

A few years ago at work, I had a project which offered an opportunity to look at the new generation of async-first, type-hint-driven Python web frameworks. For reasons which aren’t particularly relevant today, on that project I ended up choosing Litestar, which is the one that doesn’t have a ravenous all-consuming hype machine surrounding it. And I’m very glad I did, because today I’m more convinced than ever it was the right choice, and for the last 18 months or so every new project I’ve started at my day job has been built with Litestar.
Content Preview

A few years ago at work, I had a project which offered an opportunity to look at the new generation of async-first, type-hint-driven Python web frameworks. For reasons which aren’t particularly relevant today, on that project I ended up choosing Litestar, which is the one that doesn’t have a ravenous all-consuming hype machine surrounding it. And I’m very glad I did, because today I’m more convinced than ever it was the right choice, and for the last 18 months or so every new project I’ve started at my day job has been built with Litestar.

But even if you’re someone who does Python web apps for a living, and even if you’re someone who builds asynchronous type-hint-driven web apps, you might not be familiar with this absolute gem of the Python web ecosystem, and today I want to remedy that.

A taste

Here’s the traditional single-file-app demo:

from litestar import Litestar, get


@get("/greet")
async def greet(name: str) -> str:
    return f"Hi, {name}!"


app = Litestar([greet])

You save this as app.py, run with litestar run or hand it directly to the ASGI server of your choice, and it launches a web application. You go to /greet?name=Bob and it replies “Hi, Bob!”. Leave out the name parameter and it responds with an HTTP 400 telling you the name parameter is required.

So what. Big deal. The FastAPI Evangelism Strike Force will be along soon to bury you under rocket-ship emoji while explaining that FastAPI does the same thing but a million times better. And if you’re a Java person used to Spring, or a .NET person used to ASP.NET MVC, well, there’s nothing here that’s new to you; you’ve had this style of annotation/signature-driven framework for years (and in fact one thing I like about Litestar is how often it reminds me of the good parts of those frameworks). And did anyone tell you FastAPI does this, too! 🚀🚀🚀🚀🚀🚀🚀🚀🚀

But there are a lot of things that make Litestar stand out to me in the Python world. I’m going to pick out three to talk about today, and one of them is hiding in plain sight in that simple example application.

What’s in a name?

You might see older material referring to Litestar as “Starlite”, which was its original name.

Starlette is a toolkit for doing async Python web development, which can be used standalone or as a component in a more complex library or framework. FastAPI still uses Starlette under the hood, for example. And Litestar originally was built on Starlette too, and was named “Starlite”, presumably in recognition of that. Over time, though, it dropped the Starlette dependency in favor of its own implementations for that functionality, and people on social media complained that the “Starlite” name was confusing, especially since Starlette was no longer being used. So the project which had been “Starlite” was renamed to Litestar for version 2.0, released in 2023, and has had that name ever since.

Scaling (the other kind)

It’s a bit unfortunate that the term “scaling” is almost always assumed to mean handling larger and larger quantities of traffic, because that’s only one axis on which any given piece of technology can “scale” (and, I’d argue, possibly the least important one). The type of scaling I want to talk about here is scaling of a codebase: how does something (in this case, a web framework) help or hinder you as you deal with different amounts of code?

Django, for example, has a reputation for not scaling “down” all that well. You can do it if you really want to, and every so often someone will come up with a new demo of doing a Django project in a single Python file, but it’s just not something that comes naturally to Django. Quite the opposite: if you work through the official beginner Django tutorial and do things the “Django way”, you’ll have around a dozen files laid out in a specific structure of directories and subdirectories before you’ve written a single meaningful line of your own code.

But “micro” frameworks have often had the opposite problem: they’re great at starting out with a tiny single-file application, and then get painful as your codebase grows and needs to spread out (single-file Django approaches have the same problem: you have to do a lot of work to get a “micro-Django” working, and then you have to undo all that work as soon as the code grows large enough to be worth splitting across multiple files).

Let’s look at an example. Here’s a FastAPI equivalent of the basic Litestar application I showed above:

from fastapi import FastAPI


app = FastAPI()

@app.get("/greet")
async def greet(name: str) -> str:
    return f"Hello, {name}!"

Notice that the get() decorator here is attached to the application object. This is a common pattern (Flask/Quart do the same thing, for example, and Starlette used to but has deprecated its entire decorator-based routing system), but it creates a problem once you have multiple files. You need to import the main application object into the other files in order to decorate the routes, but you need to import the other files into your “main” application file to make sure the route registrations are visible from there, and now you have a circular import, and that doesn’t work.

The general solution these frameworks offer is some sort of alternative sub-application object which can act as a per-file route registry that’s safe to import into the file where your application object is defined. FastAPI calls this object an “API router”; Flask/Quart call it a “blueprint”. Either way, it’s a necessary construct for those frameworks because their route decorators are always bound to some parent object, either the application object in a single-file app or an “API router”/“blueprint”/etc. object in a multi-file app.
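For concreteness, here’s a minimal sketch of that pattern in FastAPI — the file names and route are hypothetical, but the APIRouter/include_router mechanism is the standard one:

# widgets.py -- a hypothetical module; note it never imports the app object.
from fastapi import APIRouter

router = APIRouter()


@router.get("/greet")
async def greet(name: str) -> str:
    return f"Hi, {name}!"


# main.py -- the application file imports the routers, avoiding the circular import.
from fastapi import FastAPI

import widgets

app = FastAPI()
app.include_router(widgets.router)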

That solves the circular-import problem, but creates a new issue: the whiz-bang quickstart demos of “micro” frameworks generally register all the example routes on the application object in a single file in order to keep everything as simple and flashy as possible, but now in order to build a real application (which will almost never stay in a single file) you’ll need to use a different mechanism, or start out following the demo and then switch later on. You also have to know about that different mechanism; in one framework’s documentation that I looked at, you can (at the time I’m writing this post) apparently get 40 pages into the user guide before encountering the section on how to register routes in a multi-file app 😖😖😖.

Litestar avoids this entire mess by having the route decorators be standalone functions, not bound to a parent application or application-like object. This may seem like a small thing to focus on, but if you’ve spent time with popular Python microframeworks you’ve probably had to deal with the transition from single- to multi-file applications.

More importantly, this small change in approach frees up Litestar’s documentation to introduce route-grouping constructs early on and to present them as part of a coherent layered architecture/configuration concept rather than as an escape hatch for avoiding circular imports. Which is great, because Litestar’s layered architecture is one of its best features: its grouping constructs, and their ability to share configuration, offer an elegant way to compose functionality. For example, a common pattern I use when writing a set of CRUD endpoints looks like this:

from litestar import Router
from litestar.di import Provide

# Imagine some CRUD routes for widgets defined here...

_write_widget_router = Router(
    guards=[some_auth_function],
    route_handlers=[
        create_widget,
        delete_widget,
        update_widget,
    ]
)

widget_router = Router(
    dependencies={"widget_dependency": Provide(some_widget_dependency)},
    path="/widgets",
    route_handlers=[
        get_widget,
        get_widget_list,
        _write_widget_router,
    ]
)

This provides a single “public” Router instance with all the widget routes, all of which have access to the same core dependencies, but with the data-modifying routes also having auth applied. That composability is extremely powerful, and is less obvious if the “router” has to be introduced initially as a way to solve circular-import problems.

Litestar’s approach also means it’s easy to do things like register a single route multiple times, each with different configuration, which enables use cases like the following (sketched in code after the list):

  • Different authentication/authorization schemes for each registration. For example, a data-editing route might be written once, and registered once under a router which applies API key auth for machine-to-machine requests, then registered again under a router which uses session auth for interaction by a human user.
  • Different sets of dependencies for each registration. For example, a route which queries and returns a list of widgets might just declare that it accepts an argument of type WidgetRepository, and leave it up to the router configuration to decide whether to dependency-inject one that sees all widgets, or perhaps only a subset, or only those which are active, etc.
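As a rough sketch of the first of those use cases (the guard functions and the update_widget handler here are assumed to exist elsewhere, and aren’t Litestar built-ins):

from litestar import Litestar, Router

# The same handler, registered twice, with different auth applied per router.
machine_api = Router(
    path="/api",
    guards=[require_api_key],        # hypothetical API-key guard
    route_handlers=[update_widget],
)

human_api = Router(
    path="/manage",
    guards=[require_session_auth],   # hypothetical session-auth guard
    route_handlers=[update_widget],
)

app = Litestar(route_handlers=[machine_api, human_api])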

If you know what you’re doing, you can emulate some of this in the FastAPI/Flask style of bound route registration, but the techniques you’ll end up using for that feel to me like fighting against the framework, which is something I usually want to avoid.

Not to be too Pydantic

Pydantic is a popular package for defining schema objects which perform validation and serialization/deserialization, driven by Python type annotations, and one major use case for this is the input/output schemas of web applications. FastAPI appears to use Pydantic exclusively, which comes with both upsides and downsides. Pydantic is very useful and very powerful, of course, but it also means FastAPI is somewhat limited by what Pydantic can support: mostly, this is Pydantic’s own classes, and the Python standard library’s dataclasses.

One crucial limitation is an inability to derive validation/serialization behavior directly from SQLAlchemy ORM classes, even though they both support a very similar type-annotation-based declaration format. Which means that to use SQLAlchemy with a Pydantic-only framework (and SQLAlchemy is basically the standard database toolkit and ORM for Python), you either have to write out the shape of your data multiple times—once for SQLAlchemy, and then at least one more time (possibly more than one time) for Pydantic—or turn to a third-party package to help bridge the gap. FastAPI’s author worked around this by writing a new DB toolkit which combines SQLAlchemy and Pydantic, and pushing it in FastAPI’s documentation.

Litestar, meanwhile, supports Pydantic, but is not tightly coupled to Pydantic, which gives a bit more flexibility. By default Litestar lets you define input/output schemas using Pydantic models, dataclasses, or msgspec; ships with plugins to enable the use of attrs and of SQLAlchemy models; and provides a protocol for writing your own serialization plugins to extend support to other kinds of objects.
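As a small illustration of that flexibility, here’s a sketch (with invented field names) of a plain dataclass serving as a response schema, no Pydantic involved:

from dataclasses import dataclass

from litestar import Litestar, get


@dataclass
class WidgetOut:
    sku: str
    name: str
    price_cents: int


@get("/widgets/example")
async def example_widget() -> WidgetOut:
    # Litestar serializes the dataclass to JSON based on its type hints.
    return WidgetOut(sku="W-1", name="Example widget", price_cents=1999)


app = Litestar([example_widget])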

That’s very convenient already, but the convenience is amplified by Litestar’s system for automatically deriving data-transfer objects from data-access or domain objects. Suppose, for example, that we have the following SQLAlchemy model class:

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Widget(Base):
    __tablename__ = "widget"

    id: Mapped[int] = mapped_column(primary_key=True)
    internal_notes: Mapped[str]
    sku: Mapped[str] = mapped_column(unique=True)
    name: Mapped[str]
    price_cents: Mapped[int]

In a Pydantic-only world, we’d need to write multiple Pydantic models representing different use cases:

  • A “read” schema for use in HTTP responses, which would probably not include the internal_notes field and probably also not id (since sku is more likely to be the public identifier)
  • A “write” schema for creating widgets, which would exclude id since that likely is auto-generated on insert
  • Another “write” schema for updating widgets, setting all fields optional to allow partial update

As well as possibly more schemas like an admin-view “read” schema that does include the internal fields, etc. Even if you get clever and use inheritance to share field definitions among all these Pydantic classes, you still will write out the full set of fields for widgets at least twice, and the second time it will be fragmented across multiple Pydantic classes, creating the risk of making a change to the ORM model and forgetting to update all the corresponding field definitions in the Pydantic models.

Litestar’s approach is a significant improvement on this. For example, here’s how to use Litestar’s DTO helpers to define the “read” schema:

from litestar.dto import DTOConfig
from litestar.plugins.sqlalchemy import SQLAlchemyDTO

class ReadWidget(SQLAlchemyDTO[Widget]):
    config = DTOConfig(exclude={"id", "internal_notes"})

This will give you a DTO class containing all the fields of the Widget ORM model except the two explicitly excluded, and will derive that set of fields, and the correct data types, from introspecting Widget. It will also automatically handle conversion to and from instances of Widget when you specify it as the input or return DTO type of a route. Similarly, it’s possible to declare a list of fields to include, or to re-map field names for public consumption, or to declare a DTO which makes fields optional for partial updates. This means there’s only one canonical definition of the fields—on the original class, which might be a SQLAlchemy ORM model, might be a dataclass, etc.—and it doesn’t have to be repeated in the DTOs because the DTOs will always derive their field definitions directly from the source class you point them at.
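For example, sketches of those variations might look roughly like this (the DTOConfig parameter names here — include, rename_fields, partial — are from my reading of the docs, so double-check them against the version you’re using):

from litestar.dto import DTOConfig
from litestar.plugins.sqlalchemy import SQLAlchemyDTO


class PublicWidget(SQLAlchemyDTO[Widget]):
    # Only expose these fields, and remap one name for public consumption.
    config = DTOConfig(
        include={"sku", "name", "price_cents"},
        rename_fields={"price_cents": "priceCents"},
    )


class PatchWidget(SQLAlchemyDTO[Widget]):
    # partial=True makes every field optional, for PATCH-style partial updates.
    config = DTOConfig(exclude={"id", "internal_notes"}, partial=True)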

Of course, there are going to be cases where your DTOs are sufficiently different from your DAOs and domain objects that this isn’t a big help, but my own experience is that “the DTO is a subset of the DAO’s fields” is extremely common in real-world applications, so Litestar’s approach really pays off in both reduced boilerplate and reduced errors from manual “transcription” of fields between different class definitions.

Alchemical architecture

I wasn’t exaggerating earlier when I said that SQLAlchemy is the database toolkit and ORM for Python. While there are others out there, the only one I’m aware of that sees anything close to SQLAlchemy’s usage is the Django ORM , and only because it’s built into and tightly integrated with Django. So if you’re going to be writing a database-backed web application in Python, and you’re not doing Django, you are almost certainly going to be using SQLAlchemy.

And Litestar makes that easy. While officially remaining agnostic as to whether you even have a persistence layer, it still includes good integrations for SQLAlchemy: the serialization plugin mentioned earlier allows the direct use of SQLAlchemy ORM classes as input and output schemas; the DTO helpers can derive subsets and remappings of fields from SQLAlchemy ORM classes; and Litestar also ships with a plugin that manages a SQLAlchemy engine and per-request ORM session for you, as well as a single SQLAlchemy mega-plugin combining all the SQLAlchemy plugins’ functionality.

So it’s already pretty convenient to use SQLAlchemy in Litestar applications. But there’s more! The Litestar team also maintains the excellent Advanced Alchemy library which provides a bunch of useful features on top of SQLAlchemy. While Advanced Alchemy is framework-agnostic, Litestar’s SQLAlchemy plugin makes use of it and re-exports much of its functionality, giving you access to it automatically, and it does include Litestar-specific helpers for registering certain utility classes with Litestar’s dependency injection.

Advanced Alchemy provides a lot of quality-of-life improvements for SQLAlchemy, including a variety of base classes and mixins and data types doing useful things like database-agnostic big-integer primary keys, automatic create/update timestamps, UUID-keyed models, a proper UTC timestamp type, and a JSON type which chooses the best column type for your database. There are also command-line helpers for database management (including creating and working with Alembic migrations), database dumping and seeding to/from JSON, and a lot more.

But the place where Advanced Alchemy really shines is in providing a generic repository implementation (both sync and async flavors) on top of SQLAlchemy models, along with a service-layer abstraction and helpers to integrate them into Litestar’s dependency injection system.

Here’s a basic example using the Widget class from above:

from litestar.plugins.sqlalchemy import repository

class WidgetRepository(repository.SQLAlchemyAsyncRepository[Widget]):
    model_type = Widget

WidgetRepository will have all the methods you’d expect—list(), get_one(), add(), delete(), etc.—automatically derived from the Widget model. And let me just say that having repository implementations automatically derived from any SQLAlchemy model, with not just basic CRUD operations but also things like paginated fetches, is a massive productivity boost compared to just using vanilla SQLAlchemy. It’s maybe not quite on the level of Django’s generic views, but it’s a big step in that direction, and you probably could produce something like Django’s generic views with Litestar and Advanced Alchemy if you wanted to (perhaps one day, in my copious free time, I’ll even make an attempt at it).
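Here’s a rough sketch of how that might be wired into a route handler; the db_session dependency name assumes the defaults of Litestar’s SQLAlchemy plugin, so treat this as illustrative rather than copy-paste ready:

from litestar import get
from litestar.di import Provide
from sqlalchemy.ext.asyncio import AsyncSession


async def provide_widget_repo(db_session: AsyncSession) -> WidgetRepository:
    # db_session is assumed to come from the plugin's per-request session dependency.
    return WidgetRepository(session=db_session)


@get("/widgets", dependencies={"widget_repo": Provide(provide_widget_repo)})
async def list_widgets(widget_repo: WidgetRepository) -> list[Widget]:
    # Returning ORM instances works via the serialization plugin or a DTO.
    return await widget_repo.list()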

I know it may seem strange to hear me saying this, since a few years ago I went on record as being strongly against these sorts of abstractions—specifically service layers—in Django. And I still think you absolutely should not try to retrofit repository or service-layer abstractions onto Django! They’re not the native patterns of Django’s architecture, and instead I think you should stick to what I recommended back then , which is to leverage Django’s own architecture, especially its “manager” abstraction, rather than trying to force abstractions onto it that don’t fit.

I also still think there are a lot of bad use cases for repositories and service layers that people should avoid, but that’s a digression which should probably become its own post, so I’ll just say for now that I think it’s fine to use repositories and service layers as an organizing principle when you’re using a less-structured framework which doesn’t express strong opinions about how you should lay out your code. And that’s exactly what I do when working with Litestar.

A lightweight star of Python

There are plenty of other features and conveniences in Litestar, many of which I use daily. Its auth system, supporting both simple guard functions and middlewares for attaching identity and complex authn/authz logic to requests. Its “stores” framework, which makes caching and similar tasks convenient. Its logging integrations which support both the Python standard library’s logging module and popular third-party tools like structlog. Its built-in support for transforming errors to standard “problem details” structures. Its built-in support for recording and exporting metrics in standard Prometheus or OpenTelemetry formats. Its htmx support.

You can do this stuff in other microframeworks, but it typically involves a lot of tracking down of third-party add-ons and/or writing your own glue code to integrate things. Litestar manages to keep the “microframework” feel when starting a new project while also having all these nice bits optionally available with the framework itself when and if you decide you want them, and that’s nothing to sneeze at. That’s what I was getting at earlier when I said it reminds me of the things I like in certain frameworks from other languages. Litestar doesn’t feel, to me, like it’s trying to be a replacement for any pre-existing Python web framework. It’s not trying to be the next Django or the next Flask or whatever; instead, it feels to me like a Pythonic take on the good parts of something like Spring Boot (and the way I like to set it up, doing things like using svcs behind the scenes as a service locator to feed things to both Litestar’s and pytest’s dependency injection, makes it feel even more that way).

I could go on for a lot longer listing things I like about Litestar, and probably wind up way too far into my own subjective preferences, but hopefully I’ve given you enough of a realistic taste of what it offers that, next time you’re about to build a Python web app, you might decide to reach for 💡⭐ to carry you to the moon 🚀🚀🚀.

Introducing DjangoVer

Published: 2024-11-18T02:04:01-06:00
Updated: 2024-11-18T02:04:26.772204-06:00
UTC: 2024-11-18 08:04:26+00:00
URL: https://www.b-list.org/weblog/2024/nov/18/djangover/

Version numbering is hard, and there are lots of popular schemes out there for how to do it. Today I want to talk about a system I’ve settled on for my own Django-related packages, and which I’m calling “DjangoVer”, because it ties the version number of a Django-related package to the latest Django version that package supports.
Content Preview

Version numbering is hard, and there are lots of popular schemes out there for how to do it. Today I want to talk about a system I’ve settled on for my own Django-related packages, and which I’m calling “DjangoVer”, because it ties the version number of a Django-related package to the latest Django version that package supports.

But one quick note to start with: this is not really “introducing” the idea of DjangoVer, because I know I’ve used the name a few times already in other places. I’m also not the person who invented this, and I don’t know for certain who did — I’ve seen several packages which appear to follow some form of DjangoVer and took inspiration from them in defining my own take on it.

Django’s version scheme: an overview

The basic idea of DjangoVer is that the version number of a Django-related package should tell you which version of Django you can use it with. Which probably doesn’t help much if you don’t know how Django releases are numbered, so let’s start there. In brief:

  • Django issues a “feature release” — one which introduces new features — roughly once every eight months. The current feature release series of Django is 5.1.
  • Django issues “bugfix releases” — which fix bugs in one or more feature releases — roughly once each month. As I write this, the latest bugfix release for the 5.1 feature release series is 5.1.3 (along with Django 5.0.9 for the 5.0 feature release series, and Django 4.2.16 for the 4.2 feature release series).
  • The version number scheme is MAJOR.FEATURE.BUGFIX, where MAJOR, FEATURE, and BUGFIX are integers.
  • The FEATURE component starts at 0, then increments to 1, then to 2, then MAJOR is incremented and FEATURE goes back to 0. BUGFIX starts at 0 with each new feature release, and increments for the bugfix releases for that feature release.
  • Every feature release whose FEATURE component is 2 is a long-term support (“LTS”) release.

This has been in effect since Django 2.0 was released, and the feature releases have been: 2.0, 2.1, 2.2 (LTS); 3.0, 3.1, 3.2 (LTS); 4.0, 4.1, 4.2 (LTS); 5.0, 5.1. Django 5.2 (LTS) is expected in April 2025, and then eight months later (if nothing is changed) will come Django 6.0.

I’ll talk more about SemVer in a bit, but it’s worth being crystal clear that Django does not follow Semantic Versioning, and the MAJOR number is not a signal about API compatibility. Instead, API compatibility runs LTS-to-LTS, with a simple principle: if your code runs on a Django LTS release and raises no deprecation warnings, it will run unmodified on the next LTS release. So, for example, if you have an application that runs without deprecation warnings on Django 4.2 LTS, it will run unmodified on Django 5.2 LTS (though at that point it might begin raising new deprecation warnings, and you’d need to clear them before it would be safe to upgrade any further).
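One low-tech way to hold yourself to that principle is to escalate deprecation warnings to errors in your test run; a minimal sketch (Django’s deprecation warnings subclass the standard warning categories):

# Early in test settings or a conftest.py, for example:
import warnings

# Fail loudly on anything deprecated, so nothing slips through before the next LTS.
warnings.simplefilter("error", DeprecationWarning)
warnings.simplefilter("error", PendingDeprecationWarning)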

DjangoVer, defined

In DjangoVer, a Django-related package has a version number of the form DJANGO_MAJOR.DJANGO_FEATURE.PACKAGE_VERSION, where DJANGO_MAJOR and DJANGO_FEATURE indicate the most recent feature release series of Django supported by the package, and PACKAGE_VERSION begins at zero and increments by one with each release of the package supporting that feature release of Django.

Since the version number only indicates the newest Django feature release supported, a package using DjangoVer should also use Python package classifiers to indicate the full range of its Django support (such as Framework :: Django :: 5.1 to indicate support for Django 5.1 — see examples on PyPI).

But while Django takes care to maintain compatibility from one LTS to the next, I do not think DjangoVer packages need to do that; they can use the simpler approach of issuing deprecation warnings for two releases, and then making the breaking change. One of the stated reasons for Django’s LTS-to-LTS compatibility policy is to help third-party packages have an easier time supporting Django releases that people are actually likely to use; otherwise, Django itself generally just follows the “deprecate for two releases, then remove it” pattern. No matter what compatibility policy is chosen, however, it should be documented clearly, since DjangoVer explicitly does not attempt to provide any information about API stability/compatibility in the version number.

That’s a bit wordy, so let’s try an example:

  • If you started a new Django-related package today, you’d (hopefully) support the most recent Django feature release, which is 5.1. So the DjangoVer version of your package should be 5.1.0.
  • As long as Django 5.1 is the newest Django feature release you support, you’d increment the third digit of the version number. As you add features or fix bugs you’d release 5.1.1, 5.1.2, etc.
  • When Django 5.2 comes out next year, you’d (hopefully) add support for it. When you do, you’d set your package’s version number to 5.2.0. This would be followed by 5.2.1, 5.2.2, etc., and then eight months later by 6.0.0 to support Django 6.0.
  • If version 5.1.0 of your package supports Django 5.1, 5.0, and 4.2 (the feature releases receiving upstream support from Django at the time of the 5.1 release), it should indicate that by including the Framework :: Django, Framework :: Django :: 4.2, Framework :: Django :: 5.0, and Framework :: Django :: 5.1 classifiers in its package metadata (see the sketch below).
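In packaging terms, that might look something like the following sketch (shown setuptools-style with a hypothetical package name; the same version and classifiers go in pyproject.toml if that’s what you use):

from setuptools import setup

setup(
    name="django-example-package",  # hypothetical
    version="5.1.0",  # DjangoVer: newest supported Django feature release, package release 0
    classifiers=[
        "Framework :: Django",
        "Framework :: Django :: 4.2",
        "Framework :: Django :: 5.0",
        "Framework :: Django :: 5.1",
    ],
)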

Why another version system?

Some of you probably didn’t even read this far before rushing to instantly post the XKCD “Standards” comic as a reply. Thank you in advance for letting the rest of us know we don’t need to bother listening to or engaging with you. For everyone else: here’s why I think in this case adding yet another “standard” is actually a good idea.

The elephant in the room here is Semantic Versioning (“SemVer”). Others have written about some of the problems with SemVer, but I’ll add my own two cents here: “compatibility” is far too complex and nebulous a concept to be usefully encoded in a simple value like a version number. And if you want my really cynical take, the actual point of SemVer in practice is to protect developers of software from users, by providing endless loopholes and ways to say “sure, this change broke your code, but that doesn’t count as a breaking change”. It’ll turn out that the developer had a different interpretation of the documentation than you did, or that the API contract was “underspecified” and now has been “clarified”, or they’ll just throw their hands up, yell “Hyrum’s Law” and say they can’t possibly be expected to preserve that behavior.

A lot of this is rooted in the belief that changes, and especially breaking changes, are inherently bad and shameful, and that if you introduce them you’re a bad developer who should be ashamed. Which is, frankly, bullshit. Useful software almost always evolves and changes over time, and it’s unrealistic to expect it not to. I wrote about this a few years back in the context of the Python 2/3 transition :

Though there is one thing I think gets overlooked a lot: usually, the anti-Python-3 argument is presented as the desire of a particular company, or project, or person, to stand still and buck the trend of the world to be ever-changing.

But really they’re asking for the inverse of that. Rather than being a fixed point in a constantly-changing world, what they really seem to want is to be the only ones still moving in a world that has become static around them. If only the Python team would stop fiddling with the language! If only the maintainers of popular frameworks would stop evolving their APIs! Then we could finally stop worrying about our dependencies and get on with our real work! Of course, it’s logically impossible for each one of those entities to be the sole mover in a static world, but pointing that out doesn’t always go well.

But that’s a rant for another day and another full post all its own. For now it’s enough to just say I don’t believe SemVer can ever deliver on what it promises. So where does that leave us?

Well, if the version number can’t tell you whether it’s safe to upgrade from one version to another, perhaps it can still tell you something useful. And for me, when I’m evaluating a piece of third-party software for possible use, one of the most important things I want to know is: is someone actually maintaining this? There are lots of potential signals to look for, but some version schemes — like CalVer — can encode this into the version number. Want to know if the software’s maintained? With CalVer you can guess a package’s maintenance status, with pretty good accuracy, from a glance at the version number.

Over the course of this year I’ve been transitioning all my personal non-Django packages to CalVer for precisely this reason. Compatibility, again, is something I think can’t possibly be encoded into a version number, but “someone’s keeping an eye on this” can be. Even if I’m not adding features to something, Python itself does a new version every year and I’ll push a new release to explicitly mark compatibility (as I did recently for the release of Python 3.13). That’ll bump the version number and let anyone who takes a quick glance at it know I’m still there and paying attention to the package.

For packages meant to be used with Django, though, the version number can usefully encode another piece of information: not just “is someone maintaining this”, but “can I use this with my Django installation”. And that is what DjangoVer is about: telling you at a glance the maintenance and Django compatibility status of a package.

DjangoVer in practice

All of my own personal Django-related packages are now using DjangoVer, and say so in their documentation. If I start any new Django-related projects they’ll do the same thing.

A quick scroll through PyPI turns up other packages doing something that looks similar; django-cockroachdb and django-snowflake, for example, versioned their Django 5.1 packages as “5.1”, and explicitly say in their READMEs to install a package version corresponding to the Django version you use (they also have a maintainer in common, who I suspect of having been an early inventor of what I’m now calling “DjangoVer”).

If you maintain a Django-related package, I’d encourage you to at least think about adopting some form of DjangoVer, too. I won’t say it’s the best, period, because something better could always come along, but in terms of information that can be usefully encoded into the version number, I think DjangoVer is the best option I’ve seen for Django-related packages.

Three Django wishes

Published: 2024-11-04T01:46:14-06:00
Updated: 2024-11-04T23:21:58.371721-06:00
UTC: 2024-11-05 05:21:58+00:00
URL: https://www.b-list.org/weblog/2024/nov/04/django-wishes/

’Tis the season when people are posting their “Django wishlists”, for specific technical or organizational or community initiatives they’d like to see undertaken. Here are a few examples from around the Django community:
Content Preview

’Tis the season when people are posting their “Django wishlists”, for specific technical or organizational or community initiatives they’d like to see undertaken. Here are a few examples from around the Django community:

So, in the spirit of the season, here is my own list, which I’ve narrowed down to three wishes (in the tradition of many stories about wishes), consisting of one organizational item and two technical ones.

Pass the torch

This one requires a bit of background, so please bear with me.

The Django Software Foundation — usually just abbreviated “DSF” — is the nonprofit organization which officially “owns” Django. It’s the legal holder of all the intellectual property, including both the copyright to the original Django codebase (generously donated by the Lawrence Journal-World, where it was first developed) and the trademarks, such as the registered trademark on the name “Django” itself. The DSF does a lot (and could do more, with a bigger budget) to support the Django community, and offers financial support to the development of Django itself, but does not directly develop Django, or oversee Django’s development or technical direction.

Originally, that job went to Django co-creators Adrian Holovaty and Jacob Kaplan-Moss. They granted commit permissions to a growing team of collaborators, but remained the technical leaders of the Django project until 2014, when they stepped aside and a new body, called the Technical Board, was introduced to replace them. The Technical Board was elected by the Django committers — by this point usually referred to as “Django Core” — and although the committers continued to have broad authority to make additions or changes to Django’s codebase, the Technical Board became the ultimate decision-maker for things that needed a tie-breaking vote, or that were too large for a single committer to do solo (usually via Django Enhancement Proposals, or DEPs, modeled on the processes of many other open-source projects, including Python’s “PEPs”).

One thing the DSF has done is use some of its funds on the Django Fellowship program, which pays contractors (the “Django Fellows”) to carry out tasks like ticket triage, pull-request review, etc. which would otherwise rely on volunteer labor (with all the problems that involves).

But the system of “Django Core” committers and Technical Board did not work out especially well. Many of the committers were either intermittently active or just completely inactive, new committers were added rarely if ever, and it was unclear what sort of path there was (or even if there was a path) for a motivated contributor to work their way toward committer status. About the only thing that did work well was the Fellowship program, which largely was what kept Django running as a software project toward the end of that era.

This caused a lot of debates focused on the theme of what to do about “Django Core” and how to reform the project and get it back on a healthy footing. The end result of that was a Django Enhancement Proposal numbered as DEP 10, which I spent most of 2018 and 2019 working on. I wrote an explanation at the time, and I’ll just link it here and mention that DEP 10 (which passed in early 2020) kept the Technical Board as a tie-breaking and oversight body, and introduced two other main roles — “Mergers” and “Releasers” — which have mostly but not exclusively been filled by the Django Fellows. The first DEP 10 Technical Board drafted and passed another DEP, DEP 12, renaming themselves to “Steering Council” (similar to Python’s technical governing body, but a name I’ve never liked because the Django version doesn’t meaningfully “steer” Django) and making a few tweaks.

So, that brings us to the present day. Where, sadly, the DEP 10/12 era is looking like as much of a failure as the preceding “Django Core” + committer-elected Technical Board era. The DEP 10 Technical Boards/Steering Councils have been dysfunctional at best, and there’s been no influx of new people from outside the former “Django Core”. A stark example: I ran for the Steering Council last year to try to work on fixing some of this, but the Steering Council election attracted only four total candidates for five seats, all of them former “Django Core” members.

Recently there was a lot of discussion on the DSF members’ forum about what to do with the Steering Council, and a few attempts to take action which failed in frustrating ways. The end result was the resignation of two Steering Council members, which brought the group below quorum and has automatically triggered an election (though one that will run under the existing DEP 10/12 rules, since triggering an election locks the eligibility and election rules against changes).

I believe the ongoing inability to develop stable technical governance and meaningful turnover of technical leadership is the single greatest threat to Django’s continued viability as a project. This is an unfortunate vindication of what I said six years ago in that blog post about developing DEP 10:

Django’s at risk of being badly off in the future; for some time now, the project has not managed to bring in new committers at a sufficient rate to replace those who’ve become less active or even entirely inactive, and that’s not sustainable for much longer.

The good news is there’s a new generation of contributors who I believe are more than ready to take up the technical leadership of Django, and even a structured program — not run by former “Django Core”! — for recruiting and mentoring new contributors on an ongoing basis and helping them build familiarity with working on and contributing to Django. The bad news is there’s a huge obstacle in their way: all of us old-time “Django Core” folks who keep occupying all the official leadership positions. Just recruiting people to run against such long-time well-known names in the project is difficult, and actually winning against us is probably close to impossible.

So the biggest thing I’d like for Django, right now, is for the entire former “Django Core” group — myself included! — to simply get out of the way. I thought I could come back last year and help fix things after stepping down post-DEP-10, but doing so was a mistake and only prolonged the problem. I will not be running in the upcoming Steering Council election and I beg my “Django Core” colleagues to all do likewise. There are qualified, motivated folks out there who should be given their chance to step up and run things, and we should collectively place Django into their capable hands. Then they can sort out the rest of the technical governance however they see fit.

And honestly, I’ve been in and out of just about every formal role the Django project has for (checks calendar) seventeen years now. It’s time. It’s time for me, and the rest of the old guard, to give way to new folks before we do serious harm to Django by continuing to hold on to leadership roles.

Give Django a hint

Python 3.0 introduced the ability to add “annotations” to function and method declarations, and though it didn’t specify what they were to be used for, people almost immediately started developing ways to specify static type information via annotations, which came to be known as “type hints”. Python 3.5 formalized this and introduced the typing module in the standard library with tools to make the type-hint use case easier, and Python 3.6 introduced the ability to annotate other names, including standalone variables and class attributes.

Django has a complicated history with this feature of modern Python. There’ve been multiple efforts to add type annotations directly in Django’s own code, there’s a third-party package which provides annotations as an add-on, a proposed DEP never went anywhere because the Technical Board at the time was against it, and it’s just been stuck as a frequently-requested feature ever since.

Let me be absolutely clear: I don’t have any issue with statically-typed programming languages as a concept. I’ve used both statically- and dynamically-typed languages and liked and disliked examples of each. If I weren’t writing Python, personally I probably would be writing C# (statically-typed). But I also have absolutely no interest in static type checking for Python as a feature or a use case.

What I do have an interest in is all the other use cases type hints enable. There’s a whole booming ecosystem of modern Python tools out there now which use type hints to enable all sorts of interesting runtime behavior. Pydantic and msgspec do runtime derivation of validation and serialization/deserialization behavior from type hints. FastAPI and Litestar are web frameworks which use type hints to drive input/output schemas, dependency injection and more. SQLAlchemy as of version 2.0 can use type hints to drive ORM class definitions.
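For a quick taste of what “runtime behavior derived from type hints” means, here’s roughly what Pydantic does with an annotated class (the model and values are invented):

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


print(Person(name="Ada", age="36"))  # the string "36" is coerced to the int 36

try:
    Person(name="Ada", age="not a number")
except ValidationError as exc:
    print(exc)  # the validation failure is derived entirely from the type hints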

I am very interested in those sorts of things, and right now they’re not available from vanilla Django because Django doesn’t do type hints (you can use a third-party package to turn Django into something resembling one of the newer type-hint-driven frameworks, but it’s an extra package and a whole new way of doing things that doesn’t “feel like Django”).

Compare, for example, this Django ORM model:

from django.db import models

class Person(models.Model):
    name = models.CharField()
    date_of_birth = models.DateField()

With its modern SQLAlchemy equivalent:

from datetime import date   
from sqlalchemy.orm import DeclarativeBase, Mapped

class Base(DeclarativeBase):
    pass

class Person(Base):
    __tablename__ = "person"
    name: Mapped[str]
    date_of_birth: Mapped[date]

You can use SQLAlchemy’s mapped_column() function to be more verbose and specify a bunch more information, but for a basic column you don’t have to. Just write a type hint and it does the right thing.

I think type hint support in Django has the potential to unlock a huge variety of useful new features and conveniences, and the lack of it is causing Django to fall well behind the current state of the art in Python web development. So if I could somehow wave a magic wand and get any single technical change instantly made to Django, type hints would be it.

More generic Django

Django includes a feature known as “generic views” (keep in mind that Django doesn’t strictly follow regular MVC terminology, and so a Django “view” is what most pure MVC implementations would call the “controller”), which are reusable implementations of common operations like “CRUD” (create, retrieve, update, delete operations — including both individual-result and list-of-results), date-based archives of data, etc.

And basically everybody agrees Django’s generic views are too complicated. There’s a giant complex inheritance tree of classes involved, with a huge mess of different attributes and methods you can set or override to affect behavior depending on exactly which set of classes you’re inheriting from, creating a steep learning curve and requiring even experienced developers to spend a lot of time with both official and unofficial documentation (ccbv.co.uk is the usual reference people are directed to).

There’s a reason for this complexity: originally, Django’s generic views were functions, not classes, and you customized their behavior by passing arguments to them. The class-based generic views were introduced in Django 1.3 (released in 2011), and for compatibility and ease of migration at the time, were implemented in a way which precisely mirrored the functionality of the function-based views. Which means that for every thing you could do via an argument to the function-based views, there is a mixin class, method, or attribute on the class-based ones corresponding to it.

This made some sense at the time, because it was a big migration to ask people to go through. It makes much less sense now, over 13 years later, when the complexity of Django’s hierarchy of class-based views mostly just scares people and makes them not want to use what is otherwise a pretty useful feature: class-based views are a huge reduction in repetitive/boilerplate code when you know how to use them (for example, see the views used by this site for date-based browsing of entries and detail/list views of entries by category — that really is all the code needed to provide all the backend logic).
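For example, a list view and a detail view can be roughly this small (the Entry model and the attribute values here are hypothetical):

from django.views.generic import DetailView, ListView

from .models import Entry


class EntryListView(ListView):
    model = Entry
    paginate_by = 20  # pagination handled for you


class EntryDetailView(DetailView):
    model = Entry  # object lookup by pk or slug comes from the URLconf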

At this point the overcomplexity of Django’s generic views is basically a meme in the community, and is one of the things I see most often cited by new Django users as making their experience difficult. So if I were going to be given the magic wand a second time and allowed to make another instant technical change, it’d be to finally deprecate the complicated generic-view class hierarchy and replace it with a ground-up rewrite aimed at providing a clear, powerful API rather than maintaining compatibility with a set of older functions that were deprecated nearly a decade and a half ago.

What do you wish for?

Of course, there’s a lot more that could be done to or for Django besides the three items I’ve outlined here. I’d encourage anyone who uses Django to think about what they’d like to see, to post about it, and, ideally, to get involved with Django’s development. That’s not just a way to get bits of your own wishlist implemented; it’s also the way to make sure Django continues to be around for people to have wishes about, and I hope that continues for many years to come.

There can't be only one

Published: 2024-08-27T16:02:50-05:00
Updated: 2024-09-28T20:04:59.859888-05:00
UTC: 2024-09-29 01:04:59+00:00
URL: https://www.b-list.org/weblog/2024/aug/27/highlander-problem/

There’s a concept that I’ve heard called by a lot of different names, but my favorite name for it is “the Highlander problem”, which refers to the catchphrase of the campy-yet-still-quite-fun Highlander movie/TV franchise. In Highlander, immortal beings secretly live amongst us and sword-fight each other in hopes of being the last one standing, who will then get to rule the world forever. And when one of them is about to eliminate another, they often repeat the formulaic phrase: “There can be only one!”
Content Preview

There’s a concept that I’ve heard called by a lot of different names, but my favorite name for it is “the Highlander problem”, which refers to the catchphrase of the campy-yet-still-quite-fun Highlander movie/TV franchise. In Highlander, immortal beings secretly live amongst us and sword-fight each other in hopes of being the last one standing, who will then get to rule the world forever. And when one of them is about to eliminate another, they often repeat the formulaic phrase: “There can be only one!”

The basic idea of the Highlander problem is that you can cause yourself a lot of trouble, as a programmer, by introducing a “there can be only one!” limit into your code. To take a real example: I once worked at a company that managed medical visits via video calls. A simplistic “there can be only one” view of that would model a visit as being between one doctor and one patient, perhaps with foreign keys at the database level. But that becomes a problem when the actual visit in front of you involves the patient, a primary doctor, a nurse who does triage, a second doctor who consults on the case, a translator who helps them all communicate with the patient, and the patient’s legal guardian/decision-maker. A better model would accept that the visit has multiple participants and multiple possible roles, and keep track of who was in which visit and in which roles.
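A sketch of what that “multiple participants, multiple roles” model might look like as Django models (names and fields invented for illustration):

from django.db import models


class Visit(models.Model):
    scheduled_at = models.DateTimeField()


class Participant(models.Model):
    class Role(models.TextChoices):
        PATIENT = "patient"
        DOCTOR = "doctor"
        NURSE = "nurse"
        TRANSLATOR = "translator"
        GUARDIAN = "guardian"

    visit = models.ForeignKey(Visit, on_delete=models.CASCADE, related_name="participants")
    person = models.ForeignKey("people.Person", on_delete=models.PROTECT)  # hypothetical Person model
    role = models.CharField(max_length=20, choices=Role.choices)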

Similar issues occur all the time. Systems assume someone will only have one of a thing — common examples include physical addresses, email addresses, phone numbers, and payment methods — when many people will quite reasonably have or want to have more than one. And it’s often a lot of trouble to try to fix the system after the fact, because assumptions like these tend to be deeply embedded in the code and often in database schemas.

The result of a few experiences with this is that I’m automatically suspicious of any system which hard-codes a “there can be only one!” assumption. This week, amusingly, I got a new example to use and I expect to get a lot of mileage out of it.

Who’s at bat?

A bit of background: baseball is, by tradition, an outdoor sport. And in the United States, although there are some stadiums with retractable roofs (and, currently, one major-league stadium — Tropicana Field in St. Petersburg, Florida, home of the Tampa Bay Rays — with a permanent domed roof), most at the professional level are open to the elements. And unlike some other sports, baseball does not work well in inclement weather: when it’s raining too much, the ball becomes much more difficult to grip, fielding and running and sliding are affected, and the game becomes a bit of a mess.

The rules allow, under some circumstances, for a game to be ended early due to weather, if certain specific conditions (like a minimum number of completed innings) are met. Or a game can be suspended — literally paused, as-is, and scheduled for completion at a later date.

Teams can also trade players with each other. There are rules about who can be traded (players with a certain amount of major-league playing time start gaining the right to refuse trades), and when (the exact date of the major-league “trade deadline” varies from year to year for game-scheduling reasons, but falls in the last week of July or first week of August). But within those rules there’s a lot of freedom for teams to swap players.

These two factors, combined, created a possibility which had always been theoretical, until yesterday.

On June 26, 2024, the Toronto Blue Jays and Boston Red Sox started a game at Boston’s Fenway Park which was suspended due to rain in the second inning. At the time the game was suspended, Danny Jansen — catcher for the Blue Jays — was at bat.

On July 27, 2024, Jansen was traded to the Red Sox. You may now be able to see where this is going.

Yesterday, August 26, 2024, the schedule had the Blue Jays back in Boston, and the game from June was finally played to completion. As is common for a suspended game being resumed, the official scorer’s listing includes a flurry of simultaneous player substitutions in the second inning, among which two are notable:

Pinch-hitter Daulton Varsho replaces Danny Jansen

Danny Jansen replaces Enmanuel Valdez, batting 7th, playing catcher

In other words, Jansen — who, remember, had been at bat when the game was suspended — was substituted out of the game by the Blue Jays (who he no longer plays for) and replaced with another player. And at the same time, Jansen was substituted into the game, as the new catcher, by the Red Sox (who he does now play for — when the game began, Reese McGuire was catching for the Red Sox, but he’s now in the minor leagues). Usually, a player who has been substituted out cannot re-enter the same game, but this case is an exception.

And thus, for the first time in the over 150-year-long history of major-league baseball in the United States, a single player — Danny Jansen — played for multiple teams in the same game (though it had also happened at least once in lower-level leagues; as this article about the Danny Jansen situation points out, a minor-league player did it in 1986).

Baseball statistics sites, to their credit, already have to handle a lot of weird things. Like the fact that a player can be listed as playing in multiple games for multiple teams on the same day, which is uncommon but not unheard-of (for example, it’s happened several times that two teams playing a doubleheader — two games against each other in one day — have traded players between the two games). And there are a lot of other fun quirks around suspended games: for record-keeping purposes, they occur completely on the day they began, rather than on the day they completed, so it’s possible for a player who took part in the completion of the game to seemingly “time travel” and join a team months earlier than they actually did, due to being officially listed as playing on the original date of the suspended game.

But this — a single player, playing for multiple teams in the same major-league game (most stats sites don’t keep the same kind of comprehensive records on minor-league games) — was an entirely new situation, and appears to have broken some assumptions that a player can only play for one team per game. This reddit thread points out problems baseball-reference — probably the largest and most popular baseball statistics site — is having, and a representative of the site showed up to say they’re working on it, and apparently have a whole internal thread of issues surfaced by Danny Jansen’s accomplishment.

“There can be only one!” has struck again.

A piece of advice

I’ve said many times before that I don’t like the “falsehoods programmers believe about…” genre of posts, because they tend to be contextless lists of things that you’re just told are wrong, with no explanation of why they’re wrong or what you could do instead that would be right. In that spirit, I’d like to offer a bit of guidance.

Wikipedia helpfully mentions the “zero one infinity” rule — the idea that a system should allow either exactly zero of a thing, or exactly one, or an unbounded number limited only by available resources — but I personally tend to lean toward a “zero or infinity” version of the rule: if a thing is worth allowing, it’s almost always worth not putting a hard limit on.

In database schemas, this means you should almost never have a column that is both a foreign key and the only column referenced in a UNIQUE constraint (creating a “one-to-one” instead of “many-to-one” relation — for Django users, this means you should almost never use its OneToOneField).

Allowing multiple and, at most, designating one as “primary”, is usually a better approach. Better still is allowing multiple and adding labels; for example, labeling each of a user’s multiple phone numbers as having different purposes and allowing the user to choose which one to use in a given circumstance.
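For instance, here’s a sketch of the phone-number case as Django models, allowing many labeled numbers per user with at most one marked primary (via a partial unique constraint):

from django.conf import settings
from django.db import models


class PhoneNumber(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="phone_numbers",
    )
    label = models.CharField(max_length=50)  # e.g. "home", "work", "mobile"
    number = models.CharField(max_length=30)
    is_primary = models.BooleanField(default=False)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["user"],
                condition=models.Q(is_primary=True),
                name="one_primary_phone_per_user",
            ),
        ]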

Know your Python container types

Published: 2023-12-24T19:07:54-06:00
Updated: 2024-09-12T13:02:48.473149-05:00
UTC: 2024-09-12 18:02:48+00:00
URL: https://www.b-list.org/weblog/2023/dec/24/python-container-types/

This is the last of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is the last of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.

Python contains multitudes

There are a lot of container types available in the Python standard library, and it can be confusing sometimes to keep track of them all. So since it’s Christmas Eve and time to wrap any last-minute gifts, I’d like to wrap up this “Advent calendar” series with a guide to the most common types of data containers and what kinds of things you might want to wrap in them.

  • list is a mutable data type, and often used to store multiple objects of the same type — though unlike, say, arrays in other languages like C, there’s no requirement that all values in a list be of the same type. Just note that many static type checkers for Python will default to assuming a list’s contents are all of the same type (i.e., if it sees you put an int in a given list, it will assume a type of list[int] and error if you then add a value of another type).
  • tuple is an immutable/heterogeneous data type, closer in purpose to the “record” types or structs of other languages. Very often you’ll see code which generates lots of tuples which each have the same “structure” — for example, a color library might represent color values as 3-tuples of int.
  • collections.namedtuple and typing.NamedTuple are two different ways to write the same thing: tuple subclasses with fields that can be accessed by name as well as by numeric index, and instantiated using keyword-argument syntax. The key difference is the typing.NamedTuple version supports a type-hint-based declarative syntax. I like using named tuples as a way to define tuple types that will be reused a lot (in the example above, it would probably make sense to define an RGBColor named tuple with red , green , and blue fields; there’s a sketch of that just after this list).
  • set is a container which enforces uniqueness of its elements. No matter how many times you add the same value to a set, it still ends up with only one copy.
  • dict (short for “dictionary”) is a hash table, mapping keys to values; there is no requirement that all keys or values be the same type, but type checkers will generally still assume such a requirement. There’s also typing.TypedDict for explicitly type-hinting the expected structure of a dictionary.
  • dataclasses.dataclass is not really a “container” at all, though it sometimes gets used as one. The dataclass decorator is primarily a shortcut for declaring a class with a set of attributes (using type-hint syntax) and having it auto-derive a constructor for you which will expect arguments for those attributes and set them appropriately (though it will not do runtime type-checking of the values of those arguments).
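
To make that named-tuple suggestion concrete, here’s a minimal sketch using the typing.NamedTuple declarative syntax (the RGBColor name and the values are purely illustrative):

from typing import NamedTuple


class RGBColor(NamedTuple):
    red: int
    green: int
    blue: int


goldenrod = RGBColor(red=218, green=165, blue=32)
# Fields are readable by name or by numeric index, like any tuple.
assert goldenrod.red == goldenrod[0] == 218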

There are also other container types in the standard library — the collections module and the array module , for example, provide some specialized container types like Counter , which acts as a histogram structure, or array.array which works like a numeric-type array in C — but they’re more rarely used.
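
And as a quick illustration of that histogram behavior, here’s Counter in action:

from collections import Counter

# Counter tallies how many times each distinct element occurs.
letter_counts = Counter("abracadabra")
print(letter_counts["a"])            # 5
print(letter_counts.most_common(1))  # [('a', 5)]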

In general, my advice is:

  • Use a list for most cases where you just want an iterable/indexable sequence.
  • Use a tuple as a struct-like type where multiple instances will have the same structure, but consider using named tuples to make that structure clearer (a lot of people don’t like named tuples because of the fact that they support iteration and numeric indexing as well as named field access, but I don’t personally mind this).
  • Use dict for key-value mappings.
  • Use set when uniqueness matters, though this is not super common; most of the value of sets is in the union/intersection/etc. operations they support.
  • Don’t use dataclass as a “super-tuple”; if what you really want is just a plain data container with named field access, just use a named tuple. Use dataclass when you also want the result to be an ordinary mutable Python class (tuples are immutable) with potentially extra behavior attached via methods.
  • Avoid most of the other container types unless you know they’re the right thing for your specific use case. And if you’re not sure whether they’re right, they aren’t; you generally will just know when one of them is the right fit (for example, collections.Counter is very useful for the exact specific thing it does, and not really useful at all for anything else).

Compare strings the right way

Published: 2023-12-23T19:14:06-06:00
Updated: 2023-12-23T21:14:40.116268-06:00
UTC: 2023-12-24 03:14:40+00:00
URL: https://www.b-list.org/weblog/2023/dec/23/compare-python-strings/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Unicode’s unique complexity

It is the year 2023 — almost 2024! — and hopefully you’re using a programming language that is fully Unicode-aware. Python is; its string type is a sequence of Unicode code points, and has been since Python 3.3 (yes, 3.3 — see this older post for details of why 3.3 and not 3.0, but the short version for those familiar with Unicode is that in Python 3.0, 3.1, and 3.2 the string type, as with the unicode type of Python 2, was a sequence of code units, usually UTF -16 code units but sometimes UTF -32, and didn’t change to code points until Python 3.3).

So let’s get some terminology down right away: a “code point” is the atomic unit of Unicode, and is a sort of abstract thing of which properties can be asserted. It is not the same as a “character”, at least not in the way people typically use the word “character”. The Unicode equivalent of that is a “grapheme”, which might be one or multiple code points (or one code point might be multiple graphemes; it can get complicated, because human language and writing is complicated, and if you want examples look up that older blog post I linked above).

But to be useful to a computer, we need to encode these abstractions, transforming them into sequences of bytes, typically using a Unicode Transformation Format, abbreviated “ UTF ”. Common transformation formats are UTF -8, UTF -16, and UTF -32. A transformation format in turn consists of “code units”, which are the native-sized units of the format. UTF -16’s code units, for example, are two bytes (16 bits). But it’s also a variable-width encoding, so some code points use one code unit while some others use two, according to a somewhat complex scheme which was retrofitted on when Unicode first expanded past the number of code points that could be represented by a 16-bit integer.

And Unicode often provides more than one way to write the same thing. Mostly this is because Unicode tries to maintain round-trip compatibility with older character sets and encoding systems; the idea is you should be able to convert from the old system, to Unicode, and then back again and get exactly the input you started with. But this means that, for example, if you want to write “é” you can either write it as a “decomposed” sequence made up of an “e” and an accent mark, or as a single “composed” code point which exists for compatibility with older character sets in which each character/accent combination got its own separate representation.

And in fact there are multiple types of equivalence in Unicode: canonical equivalence, and compatibility equivalence. Canonical equivalence applies to, say, the two different ways of writing “é” because they all produce the same visual result and are always interchangeable with one another. But compatibility equivalence applies to things which don’t necessarily look identical and aren’t always interchangeable; for example, the sequence 1⁄2 and the single code point ½ have compatibility equivalence.

Unicode provides a concept of normalization for dealing with this and making sequences that “should be” equivalent actually equivalent, and rules for four different normalization forms (depending on what type of equivalence you want to use and whether you want the result to use composed or decomposed forms).

There are also multiple ways of handling case in Unicode; I’ve written in detail about case in Unicode before , but the gist of it is that there are multiple different definitions of it which can give different answers to questions like “is this uppercase” or “is this lowercase”, depending on what you need (for example, you might want to always get an answer even if a given code point is not really “cased”). Unicode defines case mappings, and in particular defines one called “case folding”, which has the useful property that two sequences of code points which differ only in case will be equal after both have been case folded (don’t fall into the trap of thinking case folding is like lowercasing; although many case-folded strings look lowercase to speakers of many European languages, there are scripts — Cherokee is an example — for which case folding produces an uppercase-looking result).

Getting to the (code) point

All of which means that, unfortunately, comparing two strings in a Unicode-aware world isn’t as simple as using the == operator: you might be dealing with strings that ought to be considered equivalent even though their code points differ, and you might also need to decide whether case should matter.

The unicodedata module in the Python standard library provides an implementation of Unicode’s normalization rules, and Python’s str type provides a casefold() method in addition to the lower() and upper() (and title() ) case transformations. If you’re unsure what to pick, a good general piece of advice is to use normalization form NFKC unless and until you know you want something else, and to case fold after normalizing for comparisons which need to be case-insensitive. So you could do something like this:

from unicodedata import normalize


def compare(s1: str, s2: str) -> bool:
    return normalize("NFKC", s1).casefold() == normalize("NFKC", s2).casefold()

This is about as safe as you can get, within just the Python standard library, for comparing things like identifiers (say, usernames) without running into problems with case or equivalence. Values which will be used for comparison should also generally be stored in normalized format — the default user model in django.contrib.auth , for example, will normalize the username, and the AbstractBaseUser class provides a method which performs the correct normalization (though it does not case-fold the username).
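
A quick illustration of what normalizing and case folding buys you (the example strings are arbitrary):

>>> composed = "caf\u00e9"        # “é” as a single composed code point
>>> decomposed = "cafe\u0301"     # “e” followed by a combining acute accent
>>> composed == decomposed
False
>>> compare(composed, decomposed)
True
>>> compare("STRASSE", "straße")  # case folding maps “ß” to “ss”
True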

Set cookies the right way

Published: 2023-12-22T17:29:56-06:00
Updated: 2023-12-22T19:30:26.197730-06:00
UTC: 2023-12-23 01:30:26+00:00
URL: https://www.b-list.org/weblog/2023/dec/22/set-django-cookies/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Cookies in the cookie jar

Django’s request and response objects, and their attributes and methods, make dealing with cookies easy. You can read from the request.COOKIES dictionary to get a cookie, and you can use the response’s set_cookie() method to set cookies . What else do you need?

Well, potentially a few things.

First of all, it’s worth reviewing those arguments to set_cookie() , because several of them are important. You should probably always set:

  • secure=True : This instructs browsers to only send the cookie on secure (i.e., HTTPS ) requests, never on unencrypted connections. This helps avoid some potential leaks of cookie values — for example, if the user bookmarked or manually typed or pasted in a plain HTTP URL , there’s a chance of visiting it first before being redirected to an HTTPS version, and that initial request will not use an encrypted connection, which exposes the values of all cookies in the request to anyone who can observe the network traffic. For a similar reason, you should set the Django settings SESSION_COOKIE_SECURE and CSRF_COOKIE_SECURE both to True to ensure those cookies are only sent on HTTPS connections.
  • samesite="Lax" (if not "Strict" ): This controls whether a cookie is sent when making cross-site requests. Setting to at least "Lax" will shut down quite a lot of CSRF attack vectors. The SESSION_COOKIE_SAMESITE and CSRF_COOKIE_SAMESITE Django settings control this for the session and CSRF cookies, and default to "Lax" .

You should also strongly consider setting httponly=True when possible; this will instruct the browser not to let JavaScript access that cookie (though it will still be sent normally with requests). The Django settings SESSION_COOKIE_HTTPONLY and CSRF_COOKIE_HTTPONLY control this for the session and CSRF cookies; the session cookie defaults to True , while the CSRF cookie defaults to False (since, as the documentation points out, if an attacker can already run JavaScript that would access that cookie, you’re busted anyway).

Django also supports using signed cookies as a tamper-proofing mechanism. To set a signed cookie, use the set_signed_cookie() method , and to read it, use the get_signed_cookie() method (instead of directly accessing request.COOKIES ). You should also look into providing the salt argument, which acts as an additional input to the signing algorithm to help partition different usages of Django’s signing utilities (which by default use the value of your SECRET_KEY setting as the signing key; specifying the salt will combine the salt value and the SECRET_KEY to produce a unique signing key). This prevents use of signing tools in one part of a site from acting as an oracle for signing on another part of the same site.
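
Pulling those recommendations together, here’s a rough sketch of a pair of views setting and reading cookies this way; the cookie names, values, and salt are hypothetical:

from django.http import HttpResponse


def save_preferences(request):
    response = HttpResponse("Preferences saved")
    # A plain cookie with the recommended attributes.
    response.set_cookie(
        "theme",
        "dark",
        secure=True,      # only send over HTTPS
        httponly=True,    # not readable from JavaScript
        samesite="Lax",   # not sent on most cross-site requests
    )
    # A signed cookie; the salt partitions this use of the signing
    # machinery from other signing uses elsewhere in the site.
    response.set_signed_cookie(
        "plan", "premium", salt="preferences.plan",
        secure=True, httponly=True, samesite="Lax",
    )
    return response


def read_preferences(request):
    # Reading the signed cookie back requires the same salt.
    plan = request.get_signed_cookie("plan", default=None, salt="preferences.plan")
    return HttpResponse(f"Plan: {plan}")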

Just be careful about trying to do too much with signed cookies. Signing only gives you a check against tampering, and doesn’t encrypt or otherwise usefully obscure the value stored in the cookie, and browsers tend to have a limit (usually 4KB ) on how much data they can store in a single cookie, so trying to put large amounts of data in a signed cookie isn’t a great idea.

Don't use Python's property

Published: 2023-12-21T20:31:48-06:00
Updated: 2023-12-21T22:32:15.969397-06:00
UTC: 2023-12-22 04:32:15+00:00
URL: https://www.b-list.org/weblog/2023/dec/21/dont-use-python-property/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Attributing the problem

Suppose you’re writing Java and you write a class with an attribute :

public class MyClass {
    public int value;
}

And then later on you realize that value really wants to have some extra work happen at runtime — maybe it actually needs to be calculated “live” from other values, or needs auditing logic to note accesses, or similar. So what you actually want is something more like:

public class MyClass {
    private int value;

    public int getValue() {
        // Do the work and return the value...
    }
}

Except, at this point, you can’t turn the public integer attribute value into access through an integer-returning getValue() method without forcing everything which used this class — which may be a lot of stuff, not all of which is yours — to adapt to this new interface and recompile.

So the standard recommendation in Java is never to declare a public attribute; always declare it private, and write methods to retrieve and set it (called “getters” and “setters”).

Python has a solution

In Python, on the other hand, you could write your class:

class MyClass:
    value: int

    def __init__(self, value: int):
        self.value = value

And then when you later decide you need value to be a method, you can do that without forcing anyone else to change their code that used your class, by declaring it as a “property”:

class MyClass:
    _value: int

    def __init__(self, value: int):
        self.value = value

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, value: int):
        self._value = value

A property in Python lets you create a method — or up to three, for the operations to get, set and delete — which acts like a plain attribute. You don’t need to write parentheses to call it. For example, the following will work:

>>> m = MyClass(value=3)
>>> m.value
3
>>> m.value = 5
>>> m.value
5

But you (mostly) shouldn’t use it

If you have exactly the use case above — you started out with an attribute, and later it became more complex and now needs to be a method, but you don’t want to have to rewrite all code using that class — then you should use property . That’s what it’s for (and in other languages, too; C#, for example, has built-in syntax for declaring wrappers similar to Python’s property ).

But way too much use of property just comes down to “this is a method, and always was a method, I just wanted it to look like an attribute for aesthetic reasons”. Which is not a good use.

For example:

import math


class Circle:
    center: tuple[float, float]
    radius: float

    def __init__(self, center: tuple[float, float], radius: float):
        self.center = center
        self.radius = radius

    @property
    def area(self):
        return math.pi * (self.radius**2)

    @property
    def circumference(self):
        return 2 * math.pi * self.radius

Using property in this way can be deeply misleading, since it creates the impression of simple attribute access when in reality there might be complex logic — or even things like database queries or network requests! — going on. This is a necessary trade-off sometimes for plain attributes that later turn into methods (and should still be documented when it occurs), but when there’s no technical need to do this, you shouldn’t do it. In the example above, area() and circumference() should just be plain methods, not properties.
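
For comparison, here’s roughly what those look like as plain methods; callers then write circle.area() and circle.circumference(), and the parentheses make it clear a computation is happening:

import math


class Circle:
    def __init__(self, center: tuple[float, float], radius: float):
        self.center = center
        self.radius = radius

    def area(self) -> float:
        return math.pi * (self.radius**2)

    def circumference(self) -> float:
        return 2 * math.pi * self.radius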

The one potential exception is for more complex use of descriptors — and Python’s property is a relatively simple example of a descriptor — which let you create objects that emulate attribute behavior and offer fine-grained control of that behavior. Django’s ORM , for example, uses descriptors to let you read and assign fields of a model class in ways that look like plain attribute access although under the hood there may be data conversion or even delayed/deferred querying going on. And even that has its detractors: needing to remember to use select_related() or prefetch_related() to avoid numerous extra queries for related objects can be annoying, and is a common “gotcha” when working with Django’s ORM (SQLAlchemy, by contrast, sets the default relationship-loading behavior on the relationship ).

In general, though, unless you really know what you’re doing with advanced descriptor use cases, or you have the very specific use case of turning a pre-existing attribute into a method, you probably should be avoiding property , and writing descriptors in general, in your Python code.

Also, if you’re coming from a language like Java, don’t interpret this as “always write getter/setter methods from the start, just wrap them in property ” — always start with plain attributes, and only change to getter/setter methods if you need to later. The point of property is that you can do so without breaking the API of your class.

Use Django's system checks

Published: 2023-12-20T22:40:37-06:00
Updated: 2023-12-21T00:41:11.511187-06:00
UTC: 2023-12-21 06:41:11+00:00
URL: https://www.b-list.org/weblog/2023/dec/20/django-system-check/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Check it out

While you can do very minimal Django setups, more typical projects involve a mix of applications — your own, some third-party, and some from django.contrib — plus their associated configuration, and even if you run a lot of “code quality” checks , it can still be daunting to verify you’re using and configuring Django correctly.

Which is where Django’s system-check framework comes in. The system check runs automatically as part of manage.py runserver and manage.py migrate , and can be explicitly run any time you like with manage.py check .

The system-check framework consists of a large number of built-in checks which look at your code and configuration and report problems (or things that are likely to be problems). You can also run subsets of checks corresponding to specific tags .

Notably, these are not static checks: they give you the ability to do live importing and introspection of code, and to run checks which see how things will actually look at runtime, which can be useful in a framework with as much runtime metaprogramming as Django.

For most Django apps you should consider running manage.py check regularly, and perhaps even as part of a CI process; manage.py check --deploy , or at the very least manage.py check --tag security , should probably run before you deploy anything, and should make sure to use the real deployment settings module (via the --settings flag to manage.py check ) rather than local-dev or test-only settings.

You also can write your own system checks . The general idea is that you’re passed a list of applications to check (as Django AppConfig instances ), and then you can do whatever you want with them: scan files in them, fetch a list of model classes, pretty much anything you like. If you’re writing an application which wants to be configured and used in a certain way, or even if you’re just doing some fancy things with custom models, managers, or model fields, writing some custom checks can be a good idea to make sure the code is being used as intended. For examples you can study the django/core/checks/ folder in Django itself . Model checks tend to make use of the undocumented/internal model-introspection API , though, so a future post may need to talk about how that works (it’s less scary than it sounds from that description).
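
As a rough sketch of the shape of a custom check (the check itself is a toy, purely for illustration), here’s one that warns about models which never override the default __str__() :

from django.apps import apps
from django.core import checks
from django.db import models


@checks.register(checks.Tags.models)
def check_str_is_overridden(app_configs, **kwargs):
    # app_configs is None when the whole project is being checked.
    if app_configs is None:
        app_configs = apps.get_app_configs()
    errors = []
    for app_config in app_configs:
        for model in app_config.get_models():
            if model.__str__ is models.Model.__str__:
                errors.append(
                    checks.Warning(
                        f"{model._meta.label} does not override __str__().",
                        obj=model,
                        id="myproject.W001",
                    )
                )
    return errors

Checks like this are typically registered from (or via) an application’s AppConfig.ready() method, so they’re loaded whenever the app is.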

Show Python deprecation warnings

Published: 2023-12-19T17:47:49-06:00
Updated: 2023-12-19T19:48:14.297303-06:00
UTC: 2023-12-20 01:48:14+00:00
URL: https://www.b-list.org/weblog/2023/dec/19/show-python-deprecation-warnings/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Let this be a warning

Python provides the ability to issue a warning as a step below raising an exception; warnings are issued by calling the warnings.warn() function , which at minimum should be called with a message and a warning type .

One of the most common ways warnings are used is to provide notification that an API is deprecated and will be removed in the future; this typically uses the DeprecationWarning type , and the message should give some hint about what to use instead of the deprecated API , and when the deprecated API will be removed.
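
For example, a library deprecating an old function might do something like this (the function names are hypothetical):

import warnings


def new_function():
    return 42


def old_function():
    # stacklevel=2 attributes the warning to the caller's line of code,
    # rather than to this line inside the deprecated function.
    warnings.warn(
        "old_function() is deprecated and will be removed in version 2.0; "
        "use new_function() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_function()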

To take a real-world example, Django uses a two-release deprecation cycle: when an API is first deprecated, it begins issuing a PendingDeprecationWarning , which then upgrades one release later to a DeprecationWarning , after which the API can be removed.

Django also tends to use subclasses of the built-in warning types, named for the specific Django version which will see the removal. For example, Django 5.0 includes a DeprecationWarning subclass named RemovedInDjango51Warning , used as the final warning for APIs which will be removed in Django 5.1.

Django doesn’t use SemVer

You may be wondering why an API can be removed from Django without needing a major version bump, since that’s what Semantic Versioning (“SemVer”) would require. But Django does not use SemVer as its versioning scheme; instead, Django’s deprecation cycle is based on its long-term-support (“ LTS ”) releases, which are always the third and final release in a major version of Django. Django 2.2, 3.2, and 4.2 were all LTS releases, and 5.2 will be one when it comes out sometime in 2025.

The API compatibility policy, broadly speaking, is that code which runs on an LTS without emitting any deprecation warnings will also run on the next LTS without needing to be changed. So an application that runs on Django 3.2 LTS without deprecation warnings would be able to also run on Django 4.2 LTS (though it would need to handle any new deprecation warnings to become ready for 5.2).

But you have to look for them

Python’s default behavior is to ignore several categories of warnings , meaning that even when they’re being issued, you won’t see them show up in logs. So you won’t usually see deprecation warnings, and can be caught by surprise when code suddenly breaks because an API you were relying on has been removed.

If you’re using the pytest testing framework, it changes this for you and shows warnings by default .

But you can also change this by invoking Python with the -W flag and specifying a warning policy, or by setting a warning policy in the PYTHONWARNINGS environment variable. You can check the warning filter documentation for full details, and use it to adjust how often you want to see warnings; the filter action can be always , once , ignore , or several other options (telling Python not to re-display the same warning every time can be helpful if, say, you call a function hundreds of times and it emits a deprecation warning every time — that can quickly fill your logs with noise).

I generally use a policy of once::DeprecationWarning for my own projects, but you should play with it and settle on something that works for you.

Also note that you can set a policy of error to tell Python to turn any emitted warning into an exception, and quite a few command-line tools support this too. For example, the Sphinx documentation builder has its own -W command-line flag which turns warnings into errors, which is useful for things like the spell-checking extension — it emits warnings for misspelled words, and the -W flag will turn them into errors. I use this to break my CI builds in case of misspellings in documentation.

Running async tests in Python

Published: 2023-12-18T23:18:13-06:00
Updated: 2023-12-19T01:21:34.517257-06:00
UTC: 2023-12-19 07:21:34+00:00
URL: https://www.b-list.org/weblog/2023/dec/18/async-python-tests/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

A-sync-ing feeling

Async Python can be useful in the right situation, but one of the tricky things about it is that it requires a bit more effort to run than normal synchronous Python, because you need an async event loop that can run and manage all your async functions.

This is especially something you’ll notice when testing, since async code basically has to have async tests (remember, it’s a syntax error to use await outside of an async function).

If you’re using the standard-library unittest module you don’t have to do too much extra work; you can group async tests into an async test case class , and then the standard python -m unittest discover runner will find and run them for you. As noted in the docs, unittest.main() in a file containing async test cases also works.
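
With the standard library, that async test case class is unittest.IsolatedAsyncioTestCase ; here’s a minimal sketch:

import asyncio
import unittest


class AsyncSmokeTests(unittest.IsolatedAsyncioTestCase):
    # Each async test method runs inside an event loop that the
    # test case class creates and tears down for you.
    async def test_sleep_returns_none(self):
        self.assertIsNone(await asyncio.sleep(0))


if __name__ == "__main__":
    unittest.main()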

If you’re using Django, and Django’s standard testing tools, you also don’t need to do much extra work; asynchronous testing is supported on the standard Django test-case class, which provides an async HTTP client and async versions of its standard methods (prefixed with a for async — alogin() in place of login() , etc.), and then manage.py test will run your async tests just as easily as it runs synchronous tests.

Where things get tricky is with pytest , which — as I write this — does not natively support running async functions as tests, so you’ll need a plugin. I’ve generally used the pytest plugin shipped by AnyIO , which provides a pytest mark that can be applied at module or function level to mark and run async tests, but there’s also pytest-asyncio , which similarly provides a pytest mark for async tests. Mostly this will be a matter of taste and of how much you’re committing to additional libraries on top of the base Python asyncio setup — AnyIO provides a bunch of additional async helpers, while pytest-asyncio just does the one thing in its name.
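
For reference, here’s a minimal sketch using pytest-asyncio (AnyIO’s plugin works the same way, just with a mark named anyio ):

import asyncio

import pytest


@pytest.mark.asyncio
async def test_sleep_returns_none():
    assert await asyncio.sleep(0) is None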

Also it’s worth noting that although pytest-django provides pytest equivalents of many of Django’s testing helpers, including the async helpers, it does not (as far as I’m aware) provide an async test runner plugin, so you’d still need another plugin like pytest-asyncio or AnyIO to run async Django tests with it.

Just note that the async pytest plugins generally don’t support running async class-based tests (whereas normal synchronous pytest can run unittest.TestCase classes), and that if you’re going to use async pytest fixtures you’ll probably want to make sure those fixtures will work with the pytest helper plugin you’ve chosen.

Don't use class methods on Django models

Published: 2023-12-17T16:13:11-06:00
Updated: 2023-12-17T18:13:34.527386-06:00
UTC: 2023-12-18 00:13:34+00:00
URL: https://www.b-list.org/weblog/2023/dec/17/django-model-classmethod/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Being methodical about Python

Python classes support three basic types of methods:

  • Instance methods , which are what you get by default when writing a def statement inside a class body. These are called from an instance of the class, and Python will ensure that the instance is passed as the first positional argument (conventionally named self ) when this method is called.
  • Class methods , which are created by the @classmethod decorator . These are called from the class object itself, and Python ensures the class object is passed as the first positional argument (conventionally named cls ).
  • Static methods , which are created by the @staticmethod decorator . These are called from the class object itself, but no special implicit first argument will be passed to these.

The general advice for these is that when you’re writing a class, you almost always want plain instance methods, and you pretty much never want static methods (which are sort of a weird attempt to give Python an equivalent of Java’s static methods, except Java has to have them because it historically didn’t support standalone functions — everything had to be a method of a class — but Python doesn’t have that limitation). And sometimes you want class methods, but they tend to be pretty special case. Most often it’s so you can have alternate constructors. For example, suppose you’re writing a class that represents an RGB color. You might start with:

from dataclasses import dataclass


@dataclass
class RGBColor:
    red: int
    green: int
    blue: int

This will get an automatic constructor (courtesy of the dataclass helper ) which takes three arguments and sets them as the values of the red , green , and blue attributes. Then you might decide you also want to support creating an RGBColor from a hexadecimal string like #daa520 or #000080 . So you could add an alternate constructor which will parse out the integer values from the hex, and it’d be best to mark this as a classmethod :

from dataclasses import dataclass


@dataclass
class RGBColor:
    red: int
    green: int
    blue: int

    @classmethod
    def from_hex(cls, hex_str: str):
        # Put the logic to parse out the red, green, blue components
        # here...
        #
        # Then make the new instance and return it:
        return cls(red, green, blue)

But what’s that about Django?

Except you shouldn’t do this with Django models. You can , and it’ll work, and it won’t break anything, but generally a Django model class should contain only instance-level definitions and behaviors; class-level behavior should be defined on a model’s manager class .

This is because Django already puts class-level (or whole-table-level, if you’re thinking in SQL terms) behavior on the manager automatically, and it’s conceptually simpler to be consistent about the class-level versus instance-level manager/model split than to have some class-level behavior on the model and some on the manager.

So a similar example in the Django ORM should look like:

from django.db import models


class RGBColorManager(models.Manager):
    def from_hex(self, hex_str):
        # Put the logic to parse out the red, green, blue components
        # here...
        #
        # Then make the new instance and return it:
        return self.model(red, green, blue)


class RGBColor(models.Model):
    red = models.IntegerField()
    green = models.IntegerField()
    blue = models.IntegerField()

    objects = RGBColorManager()

And then you’d call it like: RGBColor.objects.from_hex("#daa520") .

This model probably also wants to have some validators attached to its fields to enforce that the red, green, and blue values are all in the permitted 0-255 range, but that’s orthogonal to where to put the alternate constructor.
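
For completeness, here’s a rough sketch of what those validators might look like (keep in mind that field validators run during full_clean() and form validation, not automatically on save() ):

from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class RGBColor(models.Model):
    # Same model as above, now enforcing the 0-255 range on each
    # component during full_clean()/form validation.
    red = models.IntegerField(validators=[MinValueValidator(0), MaxValueValidator(255)])
    green = models.IntegerField(validators=[MinValueValidator(0), MaxValueValidator(255)])
    blue = models.IntegerField(validators=[MinValueValidator(0), MaxValueValidator(255)])

    objects = RGBColorManager()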

Say what you mean in a regex

Published: 2023-12-16T18:54:30-06:00
Updated: 2023-12-16T20:55:05.637580-06:00
UTC: 2023-12-17 02:55:05+00:00
URL: https://www.b-list.org/weblog/2023/dec/16/python-regex-unicode-digits/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

An URL-y warning

Suppose you’re writing a blog in Django, and you get to the point where you’re setting up the URLs for the entries. Django has two ways to write URLs, depending on your preferred style:

  1. The path() function , which uses path-converter syntax to let you declare the types you expect things to be and derives the matching rules from that.
  2. The re_path() function , which uses regex syntax to describe the URL .

Here’s an example of each:

from django.urls import path, re_path

from blog import views


urlpatterns = [
    re_path(
        r"^(?P<year>\d{4})/$",
        views.EntryArchiveYear.as_view(),
        name="entries_by_year",
    ),
    path(
        "<int:year>/<int:month>/",
        views.EntryArchiveMonth.as_view(),
        name="entries_by_month",
    ),
]

But there’s a bug here. Can you spot it?

A digital extravaganza

Ever since the Python 3 transition, Python’s regex implementation in the re module has been Unicode-aware, and will use Unicode properties when determining whether something fits in a particular character class. So this works:

>>> import re
>>> year_pattern = re.compile(r"^(?P<year>\d{4})/$")
>>> year_pattern.match('2020/')
<re.Match object; span=(0, 5), match='2020/'>

But, crucially, so does this :

>>> year_pattern.match('۵७੪୭/')
<re.Match object; span=(0, 5), match='۵७੪୭/'>

That sequence is U+06F5 EXTENDED ARABIC-INDIC DIGIT FIVE, U+096D DEVANAGARI DIGIT SEVEN, U+0A6A GURMUKHI DIGIT FOUR, U+0B6D ORIYA DIGIT SEVEN, in case you’re interested.

And that behavior probably isn’t what was wanted, but is what the regex asked for: the \d regex metacharacter matches anything that Unicode considers to be a digit, which is a much larger set of things than just the ten ASCII digits. Many languages around the world have their own digit characters, after all, and Unicode recognizes all of them.

So the correct pattern is not \d{4} , but [0-9]{4} , matching only the ten ASCII digits. This is a bug I’ve seen multiple times now in real-world codebases, sometimes lurking for years after a Python 3 migration, and can pop up anywhere you use regex, so it’s worth keeping an eye out for and probably even actively auditing your code for if you’re feeling ambitious.
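
With the corrected pattern, the non-ASCII digit sequence from above no longer matches:

>>> year_pattern = re.compile(r"^(?P<year>[0-9]{4})/$")
>>> year_pattern.match('2020/')
<re.Match object; span=(0, 5), match='2020/'>
>>> print(year_pattern.match('۵७੪୭/'))
None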

Python packaging: use the "src"

Published: 2023-12-15T17:39:57-06:00
Updated: 2023-12-15T19:40:18.607130-06:00
UTC: 2023-12-16 01:40:18+00:00
URL: https://www.b-list.org/weblog/2023/dec/15/python-packaging-src-layout/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

A lurking problem

Imagine you write a Python library named, say, foo . And you diligently set up the configuration to package it for distribution (which is not that hard; you can learn the basics of packaging up some Python code in an hour or two, and it’s only when you get into things like bundling compiled extensions in C or other languages that things get annoying). And you also diligently write some tests and periodically run them, and they pass, both on your local machine and in whatever CI you’ve set up.

Except one day someone tells you your package doesn’t actually work. Even though the tests are passing. Huh?

A surprising amount of the time, this can be traced back to a simple decision: to put the directory foo , containing your library’s main importable Python module, at the top level of the repository (so that if you were looking at it you’d see the pyproject.toml and other config files, and the foo/ directory all alongside each other).

The issue with this is that the current directory is, by default, on the Python import path. Which means import foo was always going to work even if your packaging config was broken . That’s why the tests passed: they were still able to successfully import everything they needed. But the packaging wasn’t replicating that structure to someone else’s machine, so they couldn’t use it.

The solutions

There are two things you can and should do to avoid this problem. One is to begin running your tests, at least in CI , with the -I flag to the Python interpreter (i.e., instead of running pytest , run python -Im pytest ). This puts the interpreter into “isolated mode” where, among other things, the current directory is not on the import path. This is a good idea for almost any command you’ll run in CI , because isolated mode can cut off a lot of easy attack vectors.

The other thing is… stop putting your module at the top level. Instead, adopt the src/ layout or a variation of it, where your module is inside a directory called src/ . This way, even locally and even when you forget to use -I , running tests will require that the package successfully build and install, because the module will no longer be top-level and thus no longer implicitly on the import path.
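
Concretely, a src/ layout for the hypothetical foo package looks roughly like this (file and directory names are just illustrative):

foo-repo/
    pyproject.toml
    src/
        foo/
            __init__.py
    tests/
        test_foo.py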

This is not the only reason to prefer a src/ layout ( here’s a post listing a few more , and fortunately things have gotten easier since it was written — setuptools now automatically recognizes and explicitly supports src/ layouts, for example), but it is a compelling one, and it’s the one that got me to change over (after not believing the broken-package problem would ever happen to me — at least I was in good company , and now I’m happy to tell Hynek he was right and I should’ve listened when he wrote that post).

Database functions in Django

Published: 2023-12-14T19:09:04-06:00
Updated: 2023-12-14T21:09:34.565253-06:00
UTC: 2023-12-15 03:09:34+00:00
URL: https://www.b-list.org/weblog/2023/dec/14/django-database-functions/

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Content Preview

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction .

Functionally a database

On top of the basic query syntax we’re all used to, SQL databases tend to have a large number of built-in functions — some of which are standardized, some of which are unique to a particular database — and often the ability to write your own custom ones as well.

And Django’s ORM supports a lot of standard functions , making it easy to use them in ORM query expressions. But you can also write your own.

Trigram matching

So as a simple example, let’s consider the trigram-matching extension that ships with PostgreSQL . You can read its official documentation for an idea of how it works, but the basic concept is that it breaks text values down into trigram patterns which are surprisingly effective at fuzzy text matching, and it provides both functions and operators for querying with trigrams and computing trigram similarity.

Django supports the operators from the trigram extension , and already has implementations of database functions for the trigram functions, too , but the point here isn’t to fix something missing in Django — it’s to show how you can do this, and I think the basic trigram function is a nice example to use. So let’s reimplement it.

We’re going to be writing a subclass of Django’s base Func SQL expression here, and the absolute simplest possible version would just set the function name and the expected argument count. The base trigram function name is SIMILARITY , and it takes two arguments, so:

from django.db import models


class Similarity(models.Func):
    """
    Django ORM function expression which calls the PostgreSQL
    trigram SIMILARITY() function.

    """

    arity = 2
    function = "SIMILARITY"

Now suppose we want to query a blog entry model for titles that are similar to a search string. We could do it like so:

from django.db import models
from django.db.models.functions import Lower

search_result = Entry.objects.annotate(
    match=Similarity(
        Lower("title"),
        Lower(models.Value(search_term)),
    )
).filter(match__gt=0.5)

This uses annotate() to add a calculated column containing the similarity between the entry title and the search term to each row of the result, and then filters for rows which have a similarity score above a certain threshold.

Improving it

But we can do better than this. There are two specific things in particular that should be done here:

  1. We should specify the output type of this expression so Django knows what model field type to use for it.
  2. We should wrap any literal arguments we receive in instances of Django’s Value() expression so they can be correctly resolved as expressions when run. This way users of the function don’t have to remember to do this manually as above.

Here’s the new version:

class Similarity(models.Func):
    """
    Django ORM function expression which calls the PostgreSQL
    trigram SIMILARITY() function.

    """

    arity = 2
    function = "SIMILARITY"
    output_field = models.FloatField()

    def __init__(self, first_expression, second_expression, **extra):
        if not hasattr(first_expression, "resolve_expression"):
            first_expression = models.Value(first_expression)
        if not hasattr(second_expression, "resolve_expression"):
            second_expression = models.Value(second_expression)
        super().__init__(first_expression, second_expression, **extra)

The __init__() method checks whether the two arguments are already some type of Django SQL expression by looking for a standard attribute of all expression types and, if not found, wraps them in Value . And the output_field attribute tells the ORM what type of model field the return value corresponds to.

The usage is the same as above, except now without the need to remember to wrap anything in Value :

from django.db import models
from django.db.models.functions import Lower

search_result = Entry.objects.annotate(
    match=Similarity(
        Lower("title"),
        Lower(search_term),
    )
).filter(match__gt=0.5)