Vocabularies

Overview

The portal uses a few vocabularies:

  • In source data and in the artist reconciliation sheets, ISNI, ULAN, and LCNAF identifiers are used to deduplicate people in the dataset. These identifiers are later matched against Wikidata entity lists to walk between ID spaces.
  • Labels for AAT terms used within the dataset for classifications and techniques are generated using a SPARQL query. The pipeline also keeps a cached copy of external vocabulary data in case the source endpoints are unreachable.
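
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the portal's actual pipeline: the entity-list shape, the function names, and the sample identifiers are all hypothetical, and the AAT ID in the usage lines is a placeholder rather than a verified record. The Wikidata property IDs (P213 for ISNI, P245 for ULAN, P244 for the Library of Congress authority ID) are the standard ones, and the query assumes labels are exposed as skos:prefLabel with plain language tags; regional tags may call for langMatches instead.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

# Wikidata properties that carry each identifier scheme in an entity export.
WIKIDATA_PROPS = {"isni": "P213", "ulan": "P245", "lcnaf": "P244"}

IdPair = Tuple[str, str]  # (scheme, identifier)


def build_crosswalk(entities: Iterable[dict]) -> Dict[IdPair, str]:
    """Index a (hypothetical) Wikidata entity list by every known identifier."""
    index: Dict[IdPair, str] = {}
    for entity in entities:
        for scheme in WIKIDATA_PROPS:
            value = entity.get(scheme)
            if value:
                index[(scheme, value)] = entity["qid"]
    return index


def same_person(a: IdPair, b: IdPair, index: Dict[IdPair, str]) -> bool:
    """Two identifiers match when both resolve to the same Wikidata entity."""
    qid_a, qid_b = index.get(a), index.get(b)
    return qid_a is not None and qid_a == qid_b


def aat_label_query(uris: Sequence[str], langs: Sequence[str] = ("en", "fr")) -> str:
    """Build a SPARQL query for preferred labels of the given term URIs."""
    values = " ".join(f"<{uri}>" for uri in uris)
    lang_list = ", ".join(f'"{code}"' for code in langs)
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?term ?label WHERE {\n"
        f"  VALUES ?term {{ {values} }}\n"
        "  ?term skos:prefLabel ?label .\n"
        f"  FILTER(lang(?label) IN ({lang_list}))\n"
        "}"
    )


def labels_with_fallback(fetch: Callable[[], dict], cache: dict) -> dict:
    """Use live labels when the endpoint answers, the cached copy otherwise."""
    try:
        return fetch()
    except OSError:  # covers urllib's URLError and socket-level failures
        return cache


# Usage with made-up identifiers (none of these are real records).
entities = [
    {"qid": "Q-EXAMPLE-1", "isni": "isni-a", "ulan": "ulan-a"},
    {"qid": "Q-EXAMPLE-2", "lcnaf": "lcnaf-b"},
]
index = build_crosswalk(entities)
query = aat_label_query(["http://vocab.getty.edu/aat/300000000"])  # placeholder ID
```

Passing the fetch step in as a callable keeps the fallback logic independent of any particular HTTP client, which also makes it easy to exercise without network access.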

Localization considerations

The Duchamp Research Portal handles two languages in source data and two languages in display and search. To support multiple languages, we handle vocabulary terms as URIs during data handling and only use locale-specific labels in user-facing features like item pages, search facets, and relationship labels.
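
As a rough sketch of that separation, records can carry only URIs while labels are looked up at render time. Everything here is hypothetical: the Item shape, the LABELS store, the example.org URIs, and the choice of "en" and "fr" as locale codes are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical label store: term URI -> {locale: label}.
LABELS: Dict[str, Dict[str, str]] = {
    "http://example.org/vocab/photography": {
        "en": "photography",
        "fr": "photographie",
    },
}


@dataclass
class Item:
    """A record as the data layer sees it: terms are URIs, never labels."""
    title: str
    technique_uris: List[str] = field(default_factory=list)


def display_labels(item: Item, locale: str, default_locale: str = "en") -> List[str]:
    """Resolve labels only at display time, falling back to the default
    locale and finally to the bare URI when no label is known."""
    resolved = []
    for uri in item.technique_uris:
        by_locale = LABELS.get(uri, {})
        resolved.append(by_locale.get(locale) or by_locale.get(default_locale) or uri)
    return resolved


item = Item(
    title="Untitled",
    technique_uris=[
        "http://example.org/vocab/photography",
        "http://example.org/vocab/unlabeled-term",
    ],
)
french = display_labels(item, "fr")
```

Because the stored record never contains a label, adding or correcting a translation changes only the label store and leaves the data untouched.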

Display translations for technique and classification terms are managed with a simple table-based vocabulary that records URIs, localized labels, and any synonym URIs. This has worked well but does not fully prevent locale-specific editorial voice from leaking into the data. In practice, this presents challenges where a more managed approach would help resolve clusters of terms whose field-specific relationships remain stubborn to hierarchization, such as silver gelatin print vs. photography vs. fotographie.
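
A table of this kind might be loaded along the following lines. The column names (uri, label_en, label_fr, synonym_uris), the pipe separator for synonyms, and the example.org URIs are assumptions for the sake of the sketch, not the portal's actual schema.

```python
import csv
import io
from typing import Dict, Tuple

# Hypothetical vocabulary table; synonym URIs are pipe-separated.
TABLE = """uri,label_en,label_fr,synonym_uris
http://example.org/vocab/photography,photography,photographie,http://example.org/legacy/photo|http://example.org/legacy/fotographie
http://example.org/vocab/drawing,drawing,dessin,
"""


def load_vocabulary(text: str) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Return (canonical, labels): canonical maps every known URI, including
    synonyms, to its canonical URI; labels maps canonical URI -> locale labels."""
    canonical: Dict[str, str] = {}
    labels: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        uri = row["uri"]
        labels[uri] = {"en": row["label_en"], "fr": row["label_fr"]}
        canonical[uri] = uri
        for synonym in filter(None, (row["synonym_uris"] or "").split("|")):
            canonical[synonym] = uri
    return canonical, labels


canonical, labels = load_vocabulary(TABLE)
# A legacy synonym URI resolves to the canonical term before labeling.
resolved = canonical["http://example.org/legacy/fotographie"]
label_fr = labels[resolved]["fr"]
```

A flat synonym map like this collapses variant URIs onto one term, but it cannot express that one term is narrower than another, which is the gap the paragraph above describes.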
