Geordi data lifecycle/terminology

As data moves in and through geordi it goes through several stages. This document explains the various steps and introduces the terminology used to refer to each step.

Importing (to geordi)

The process of putting data into geordi. This step is done by an admin of the geordi installation, using commands provided by manager.py. In general, importing will use importers, which are implemented in the geordi/data/importer/indexes directory. Should data be updated, or mappings be updated, reimporting the same data performs an update.

Mapping

The process of turning data in raw form into geordi’s standard mapping format. This step is performed as part of importing, above, but is mentioned separately as it’s separated in the codebase. Mappings are defined in the geordi/data/mapping/indexes, per index and item type. Mappings are defined declaratively using rules specifying a source in the raw JSON data and a destination, along with, potentially, item links, conditions, transformations, and so-called “blank node” destinations. Mapping is discussed separately in the Mapping document.

Matching

After importing, items that are already in MusicBrainz may be marked as such by way of matching. This step is performed by users, through the web interface. (not yet implemented)

Seeding (importing to MusicBrainz)

Geordi’s mapped data can be converted into a format that allows adding it to MusicBrainz. Seeding is initiated by users who wish to add an item not already in MusicBrainz to it, or update some data in MusicBrainz. (not yet implemented)

Merging and Splitting

Since multiple items from several indexes can represent the same object, they can be marked as the same thing and thus merged, by way of an interface to specify their connection (potentially via intermediate objects such as release groups, areas, etc.). Likewise, it should be possible to split items by assigning their data items, links, and matches to two sides. (not yet implemented)