Data flows through three layers on its way to becoming front-end pages: tabular data, linked data as statements, and linked data as documents.
The first of these layers is simply the data as we receive it from a source system. Museum collections data usually arrives at the pipeline as CSV exports or via a JSON API; for archival materials we ingest EAD3/XML documents. If an archival source is only available as EAD2002, we convert it with the EAD2002-to-EAD3 XSLT transform provided by the Society of American Archivists Technical Subcommittee on Encoded Archival Standards.
There are a few kinds of fetchable data sources:
In each of these cases the data is fetched, stored in output/<format>, and transformed into linked data that is made available for loading.
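The fetch-and-store step can be sketched as follows. This is an illustration, not the pipeline's actual code; the function name and directory layout are assumptions based on the output/<format> convention described above.

```python
from pathlib import Path


def store_fetched(payload: bytes, fmt: str, name: str, out_root: str = "output") -> Path:
    """Store a fetched payload under output/<format>/ for later transformation.

    `fmt` is the source format directory, e.g. "csv", "json", or "ead3".
    """
    target_dir = Path(out_root) / fmt
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)
    return target
```

For example, `store_fetched(b"id,title\n1,Vase\n", "csv", "objects.csv")` writes the export to `output/csv/objects.csv`.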
In some cases the source data needs special handling, such as selecting a branch of an archival hierarchy or escaping embedded HTML. When that is needed, the pipeline generates an intermediate representation that is then passed to the linked data transforms.
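As one example of such handling, escaping HTML embedded in a CSV export before the linked data transforms see it might look like this (a minimal stdlib sketch; the function name and the decision to escape every field are assumptions):

```python
import csv
import html
import io


def escape_html_fields(csv_text: str) -> list[dict]:
    """Build an intermediate representation of a CSV export, escaping any
    embedded HTML in field values so the linked data transforms receive
    plain, safe strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: html.escape(value) for key, value in row.items()} for row in reader]
```

Given a row whose note field contains `<b>fragile</b>`, the intermediate representation carries `&lt;b&gt;fragile&lt;/b&gt;` instead.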
Every data source is passed through a linked data transform:
At the end of the transformation process, linked data is loaded into the triplestore. More details on transformations can be found in the collections app repository.
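The pipeline's actual loading mechanism lives in the collections app repository; as a sketch of one common approach, triples can be rendered as a SPARQL 1.1 INSERT DATA update and posted to the triplestore's update endpoint. The IRIs and the literal-detection heuristic below are illustrative assumptions:

```python
def insert_data_update(triples: list[tuple[str, str, str]], graph_iri: str) -> str:
    """Render triples as a SPARQL 1.1 INSERT DATA update against a named graph.

    Subjects and predicates are assumed to be absolute IRIs; objects without
    an IRI scheme are emitted as plain string literals (a simplification --
    real loaders must also handle typed and language-tagged literals).
    """
    def term(value: str) -> str:
        if value.startswith(("http://", "https://", "urn:")):
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    statements = "\n".join(f"    <{s}> <{p}> {term(o)} ." for s, p, o in triples)
    return f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{statements}\n  }}\n}}"
```

The resulting string is what a loader would send in the body of an HTTP POST to the store's SPARQL update endpoint.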
At this stage we begin enhancing the graph we've loaded into the database. We:
Then we take the graph of triples, query it with SPARQL CONSTRUCT queries for entities and their underlying properties, and frame the results into JSON-LD documents that present objects with their identifiers, descriptive cataloguing, participants, and so on.
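In production this framing is done with a JSON-LD framing processor over CONSTRUCT results; the following toy sketch (stdlib only, predicate names invented, acyclic graph assumed) just shows the shape of the operation: flat triples in, a nested document per entity out.

```python
from collections import defaultdict


def frame_triples(triples: list[tuple[str, str, str]], root: str) -> dict:
    """Toy illustration of framing: assemble the triples describing `root`
    (and any nodes it points to) into one nested JSON-LD-like document.
    Assumes the subgraph reachable from `root` is acyclic."""
    by_subject: dict[str, dict] = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s].setdefault(p, []).append(o)

    def build(node: str) -> dict:
        doc = {"@id": node}
        for predicate, objects in by_subject.get(node, {}).items():
            # Recurse into objects that are themselves described; keep the
            # rest (literals, external references) as plain values.
            doc[predicate] = [build(o) if o in by_subject else o for o in objects]
        return doc

    return build(root)
```

Framing an object with a production event, for instance, pulls the event's own triples into a nested node under the object rather than leaving them as disconnected statements.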
At this stage we also create a data release with a versioned tag that indicates the date it was produced.
Finally, we take the JSON-LD documents and produce simplified versions, which the page builder uses to produce HTML documents for the site and which the search index uses for the browse and search pages.
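The simplification step reduces a framed document to the flat fields the page builder and search index consume. A minimal sketch, with field names that are assumptions rather than the pipeline's actual output shape:

```python
def simplify(framed: dict) -> dict:
    """Reduce a framed JSON-LD document to flat fields for the page builder
    and search index. The field names here are illustrative assumptions."""
    def first(value):
        # Framed documents typically carry values as lists; take the first.
        if isinstance(value, list):
            value = value[0] if value else None
        if isinstance(value, dict):
            return value.get("@id")
        return value

    return {
        "id": framed.get("@id"),
        "title": first(framed.get("title")),
        "participants": [p.get("@id") if isinstance(p, dict) else p
                         for p in framed.get("participant", [])],
    }
```

The search index can ingest these flat records directly, while the page builder merges them into its HTML templates.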