feincms3-data¶

Why¶

Utilities for loading and dumping database data as JSON.

These utilities (partially) replace Django’s built-in dumpdata and loaddata management commands.

Suppose you want to move data between systems incrementally. In this case it isn’t sufficient to only know the data which has been created or updated; you also want to know which data has been deleted in the meantime. Django’s dumpdata and loaddata management commands only support the former case, not the latter. They also do not including dependent objects in the dump.

This package offers utilities and management commands to address these shortcomings.

How¶

pip install feincms3-data.

Add feincms3_data to INSTALLED_APPS so that the included management commands are discovered.

Add datasets somewhere describing the models and relationships you want to dump, e.g. in a module named app.f3datasets:

from feincms3_data.data import (
    specs_for_app_models,
    specs_for_derived_models,
    specs_for_models,
)
from app.dashboard import models as dashboard_models
from app.world import models as world_models


def districts(args):
    pks = [int(arg) for arg in args.split(",") if arg]
    return [
        *specs_for_models(
            [world_models.District],
            {
                "filter": {"pk__in": pks},
                "delete_missing": True,
            },
        ),
        *specs_for_models(
            [world_models.Exercise],
            {
                "filter": {"district__in": pks},
                "delete_missing": True,
            },
        ),
        # All derived non-abstract models which aren't proxies:
        *specs_for_derived_models(
            world_models.ExercisePlugin,
            {
                "filter": {"parent__district__in": pks},
                "delete_missing": True,
            },
        ),
    ]


def datasets():
    return {
        "articles": {
            "specs": lambda args: specs_for_app_models(
                "articles",
                {"delete_missing": True},
            ),
        },
        "pages": {
            "specs": lambda args: specs_for_app_models(
                "pages",
                {"delete_missing": True},
            ),
        },
        "teachingmaterials": {
            "specs": lambda args: specs_for_models(
                [
                    dashboard_models.TeachingMaterialGroup,
                    dashboard_models.TeachingMaterial,
                ],
                {"delete_missing": True},
            ),
        },
        "districts": {
            "specs": districts,
        },
    }

Add a setting with the Python module path to the specs function:

FEINCMS3_DATA_DATASETS = "app.f3datasets.datasets"

Now, to dump e.g. pages you would run:

./manage.py f3dumpdata pages > tmp/pages.json

To dump the districts with the primary key of 42 and 43 you would run:

./manage.py f3dumpdata districts:42,43 > tmp/districts.json

The resulting JSON file has three top-level keys:

"version": 1: The version of the dump, because not versioning dumps is a recipe for pain down the road.
"specs": [...]: A list of model specs.
"objects": [...]: A list of model instances; uses the same serializer as Django’s dumpdata, everything looks the same.

Model specs consist of the following fields:

"model": The lowercased label (app_label.model_name) of a model.
"filter": A dictionary which can be passed to the .filter() queryset method as keyword arguments; used for determining the objects to dump and the objects to remove after loading.
"delete_missing": This flag makes the loader delete all objects matching "filter" which do not exist in the dump.
"ignore_missing_m2m": A list of field names where deletions of related models should be ignored when restoring. This may be especially useful when only transferring content partially between databases.
"save_as_new": If present and truish, objects are inserted using new primary keys into the database instead of (potentially) overwriting pre-existing objects.
"defer_values": A list of fields which should receive random garbage when loading initially and only receive their real value later. This is especially useful to avoid unique constraint errors when loading partial graphs.

Note

When using save_as_new and delete_missing together, you may need to specify how primary keys should be mapped to avoid inadvertent deletion of objects. If you have a parent model with save_as_new and child models with both save_as_new and delete_missing, you should use the dictionary form of delete_missing with a map parameter to map old primary keys to new ones. For example:

{
    "filter": {"parent__in": [42]},
    "save_as_new": True,
    "delete_missing": {
        "map": [
            ("parent__in", "app.Parent"),
        ],
    },
}

This ensures that child objects are matched against the new parent primary keys rather than the old ones, preventing old data from being kept and new data from being inadvertently deleted.

Note that the mapping fails loudly on purpose if the primary key mapping isn’t available. The reason could be that the parent model doesn’t actually use save_as_new. Since this is a potentially destructive operation it’s better to fail loudly than to silently eat data.

The dumps can be loaded back into the database by running:

./manage.py f3loaddata -v2 tmp/pages.json tmp/districts.json

Each dump is processed in an individual transaction. The data is first loaded into the database; at the end, data matching the filters but whose primary key wasn’t contained in the dump is deleted from the database (if "delete_missing": True).

Change log¶

Added testing using Python 3.14.

IMPORTANT: Using save_as_new together with delete_missing had the potential to leave old data lying around, since delete_missing would keep both the data with the old and the new primary key. This has been fixed now, but this change also reintroduces the problem fixed in 0.3 where too much data was being deleted if you’re not being extra careful. If you have a parent model with save_as_new and also child models with save_as_new and delete_missing you can now specify how primary keys should be mapped so that child objects hanging off the original will not be deleted inadvertently.

Fixed a crash because of mishandled many to many relations when using save_as_new on both sides of the relation.
Added testing using Django 6.0.

Ensured that dumps only ever contain distinct objects from single specs.
Raised the minimum version requirements to Python 3.10 and Django 4.2.
Added Django 5.2.
Changed the JSON dumps to include a final newline. This change is inspired by the same change to Django.

Added ./manage.py f3dumpdata - which allows reading JSON data from stdin.
Added Python 3.12, 3.13 and Django 5.0, 5.1.
Order objects by their primary key when dumping specs. This helps with comparing JSON files by hand.

Fixed the broken argument validation of ./manage.py f3dumpdata.
Switched to hatchling and ruff.
Made specs_for_*_models helpers return a list instead of a generator.
Changed the assert statement for checking the dump version into a raise statement since assertions could be optimized out.
Renamed InvalidSpec to InvalidSpecError to make ruff happier.
Added an objects argument to dump_specs which allows overriding the list of objects to dump.

Added argument validation to ./manage.py f3dumpdata instead of crashing when the dataset isn’t known.
Added the defer_values spec field which allows specifying a list of fields whose real values should be deferred and only saved after missing data has been deleted. This is especially useful when you have unique fields where partial updates could produce constraint validation errors (unique constraints cannot be deferred it seems).

Added various tests for expected behavior.
Refactored load_dump to pass the code complexity checker.
Added an expected failure test which shows that handling unique fields isn’t yet working in all cases.
Added Django 4.2a1 to the CI matrix.
Allowed data loading with save_as_new to work in more scenarios where the data contains cyclic dependencies.

Fixed a crash when using nullable foreign keys to a model which uses save_as_new.
Fix a behavior when using save_as_new together with delete_missing: When the parent of a model with both flags set is updated, the content is now duplicated (because the old object hadn’t been removed and the new one has been saved according to save_as_new). Previously, the old object has been removed automatically. Worse, this made is impossible to use save_as_new for duplicating top-level objects (their object graph would be removed after inserting the copy). This change may still be backwards incompatible for you though, so better check twice.
Fixed deletion of missing objects in the presence of protected objects by processing specs in reverse.
Changed specs_for_derived_models to skip proxy models instead of skipping unmanaged models.
Started deferring constraints and resetting sequences when loading dumps.

Added Django 4.1.
Added the mappers argument to dump_specs which allows changing the serialized representation of models before writing the JSON.
Changed load_dump to also validate specs.
Replaced FEINCMS3_DATA_SPECS with the more flexible and opinionated FEINCMS3_DATA_DATASETS.

Initial release!

feincms3-data¶

Why¶

How¶

Change log¶

feincms3-data

Navigation

Related Topics