Archives data

Data transformation details

  • We start with data sources exported in EAD3.
  • We parse the XML tree to extract all the archival components from the EAD/XML document.
  • We transform the full finding aid to create a linked data representation of the collection-level details and the finding aid details themselves.
  • We transform the extracted components to create a linked data representation of the individual archival components.

At the end of the transformation we have RDF statements that represent four things: the finding aid itself; the cataloguing data for the top-level archival component (usually a "collection" in DACS terms); the cataloguing data for the enclosed components (usually a collection's series, subseries, etc.); and the hierarchy between components.
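The extraction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes unnumbered <c> elements (EAD3 also allows c01 to c12, which are not handled here), and the names local_name, extract_components and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

def local_name(element):
    """Return the tag name without any XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]

def extract_components(root):
    """Collect every <c> element in document order (parents before children)."""
    return [el for el in root.iter() if local_name(el) == "c"]

# Hypothetical, heavily trimmed finding aid used only for this sketch.
SAMPLE = """<ead>
  <archdesc level="collection">
    <did><unittitle>Marcel Duchamp Exhibition Records</unittitle></did>
    <dsc>
      <c id="aspace_ref13_x97" level="series">
        <did><unittitle>Philadelphia Museum of Art, 1973</unittitle></did>
        <c id="aspace_ref23_lnh" level="item">
          <did><unitid>MDE_B017_F001_126</unitid></did>
        </c>
      </c>
    </dsc>
  </archdesc>
</ead>"""

root = ET.fromstring(SAMPLE)
components = extract_components(root)
levels = [(c.get("id"), c.get("level")) for c in components]
print(levels)
```

Each extracted element can then be transformed on its own, while the full tree is kept for the collection-level and finding aid details.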

URIs, CRM type, and classification

For our model, the minimal expression of an EAD3 component is a <c> tag with an identifier attribute and a DACS cataloguing level attribute:

<c id="aspace_ref23_lnh" level="item">
  <unitid>MDE_B017_F001_126</unitid>
</c>

Every archival component follows our model's basic patterns of having a URI, a type, and a classified_as property. For archival components:

* The `"id"` URI is derived from the component identifier: `http://data.duchamparchives.org/<organization>/archive/component/<unitid_tag_content>`
* The type is `{ "type": "ManMadeObject" }` for components catalogued at the `item` level, or `{ "type": "PhysicalObject" }` for components catalogued at any other DACS level
* The classification marks the component as an archival thing (`aat:300375748`, "archives (groupings)")

When this is rendered in JSON-LD we get:

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "http://data.duchamparchives.org/pma/archive/component/MDE_B017_F001_126",
  "type": "ManMadeObject",
  "classified_as": [
    {
      "id": "aat:item",
      "label": "item",
      "type": "Type"
    },
    {
      "id": "aat:300375748",
      "label": "archives (groupings)",
      "type": "Type"
    }
  ]
}
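The URI, type and classification rules above could be combined along these lines. This is a sketch under assumptions: the function name component_stub and the constants BASE and ARCHIVES are made up for illustration, and the level classification is written as a plain `aat:<level>` stub.

```python
BASE = "http://data.duchamparchives.org"
ARCHIVES = {"id": "aat:300375748", "label": "archives (groupings)", "type": "Type"}

def component_stub(organization, unitid, level):
    """Build the minimal JSON-LD skeleton for one archival component."""
    # Items are ManMadeObject; every other DACS level is PhysicalObject.
    crm_type = "ManMadeObject" if level == "item" else "PhysicalObject"
    return {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": f"{BASE}/{organization}/archive/component/{unitid}",
        "type": crm_type,
        "classified_as": [
            {"id": f"aat:{level}", "label": level, "type": "Type"},
            dict(ARCHIVES),  # copy so callers cannot mutate the shared constant
        ],
    }

item = component_stub("pma", "MDE_B017_F001_126", "item")
series = component_stub("pma", "EXAMPLE_SERIES", "series")  # hypothetical unitid
print(item["id"], item["type"], series["type"])
```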

Archival level classification

Every component in an EAD/XML hierarchy can carry a level attribute that expresses the component's level within the hierarchy. Each level is passed on as a classification whose id is an AAT stub named after that level. For example:

{
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "http://data.duchamparchives.org/pma/archive/component/MDE_B017_F001_126",
    "type": "ManMadeObject",
    "classified_as": [
        {
            "id": "aat:item",
            "label": "item",
            "type": "Type"
        },
        {
            "id": "aat:300375748",
            "label": "archives (groupings)",
            "type": "Type"
        },
        {
            "id": "aat:300026877",
            "label": "correspondence",
            "type": "Type"
        }
    ]
}

These classifications are rendered as the lowercased, slugified form of the level attribute value, with any non-ASCII characters or whitespace normalized. For otherlevel components, the stub is taken from the otherlevel attribute instead:

EAD3 Tag                                    Classification
<c level="fonds">                           aat:fonds
<c level="collection">                      aat:collection
<c level="group">                           aat:group
<c level="subgroup">                        aat:subgroup
<c level="series">                          aat:series
<c level="file">                            aat:file
<c level="item">                            aat:item
<c level="otherlevel" otherlevel="Object">  aat:object
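A stub of this kind might be derived as in the sketch below. The source does not specify the exact slugging rules, so the choices here (dropping accents via NFKD normalization, and joining words with a hyphen) are assumptions, as is the function name level_stub.

```python
import re
import unicodedata

def level_stub(level, otherlevel=None):
    """Turn a level (or otherlevel) attribute value into an aat:<slug> stub."""
    # For level="otherlevel" the meaningful value lives in the otherlevel attribute.
    raw = otherlevel if level == "otherlevel" and otherlevel else level
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    # Lowercase and collapse runs of other characters (including whitespace) to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return f"aat:{slug}"

print(level_stub("item"))
print(level_stub("otherlevel", "Object"))
print(level_stub("otherlevel", "Sub Series"))
```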

Identifiers and Names

In EAD3 we can express an archival component's basic identifiers with a combination of <unitid> tags and a <unittitle>:

<c id="aspace_ref23_lnh" level="item">
  <did>
    <unittitle>Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass</unittitle>
    <unitid>MDE_B017_F001_126</unitid>
    <container id="aspace_ee2c950f6f2741ae317f124e42e0cd4c" label="Mixed Materials" localtype="box">17</container>
    <container id="aspace_04257636d45150a17b00babb37d042c6" localtype="folder" parent="aspace_9ef5b4fd4ac5a29dd6272888874002bd">1</container>
  </did>
</c>

Following our basic patterns, these can be modelled as names and identifiers:

  • A Name with the value of <unittitle>, classified preferred (aat:300404670)
  • An Identifier with the value of <unitid>, classified with a stub, aat:accession
  • An Identifier with the value of the <c> tag's id attribute, classified preferred (aat:300404670)
  • Two Identifiers representing the box and folder locations, classified with stubs (aat:box and aat:folder)
  • An Identifier with the value of the component's sequence among its siblings, classified as the sequence (aat:300192339)

The identifiers for an archival item should look something like:

   {
       "@context": "https://linked.art/ns/v1/linked-art.json",
       "id": "http://data.duchamparchives.org/pma/archive/component/MDE_B017_F001_126",
       "identified_by": [
        {
            "classified_as": [
                {
                    "id": "aat:300404670",
                    "label": "preferred terms",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/component/aspace_475f21f96aeb4b46bb88b1eaa97d0a61/unittitle",
            "type": "Name",
            "value": "Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass"
        },
        {
            "classified_as": [
                {
                    "id": "aat:300404670",
                    "label": "preferred terms",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/component/aspace_475f21f96aeb4b46bb88b1eaa97d0a61/id",
            "type": "Identifier",
            "value": "aspace_475f21f96aeb4b46bb88b1eaa97d0a61"
        },
        {
            "classified_as": [
                {
                    "id": "aat:folder",
                    "label": "folder",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/container/aspace_281472aec2d4b6b005c495aa24072557",
            "type": "Identifier",
            "value": "1"
        },
        {
            "classified_as": [
                {
                    "id": "aat:box",
                    "label": "box",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/container/aspace_ee2c950f6f2741ae317f124e42e0cd4c",
            "label": "Mixed Materials",
            "type": "Identifier",
            "value": "17"
        },
        {
            "classified_as": [
                {
                    "id": "aat:300192339",
                    "label": "sequences",
                    "type": "Type"
                }
            ],
            "type": "Identifier",
            "value": "125"
        },
        {
            "classified_as": [
                {
                    "id": "aat:accession",
                    "label": "accession",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/component/aspace_475f21f96aeb4b46bb88b1eaa97d0a61/unitid/0",
            "type": "Identifier",
            "value": "MDE_B017_F001_126"
        }
         ]
   }
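The identifier patterns listed above could be assembled from a component's <did> roughly as follows. This is a sketch, not the production transform: it assumes un-namespaced EAD3 elements, fixes the organization to "pma", and leaves the sequence number to the caller because the source does not say how it is counted. The names identifiers_for and stub are hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "http://data.duchamparchives.org/pma/archive"  # organization fixed for brevity
PREFERRED = {"id": "aat:300404670", "label": "preferred terms", "type": "Type"}
SEQUENCE = {"id": "aat:300192339", "label": "sequences", "type": "Type"}

def stub(name):
    """Placeholder classification such as aat:box or aat:accession."""
    return {"id": f"aat:{name}", "label": name, "type": "Type"}

def identifiers_for(component, sequence):
    """Build the identified_by list for one <c> element."""
    cid = component.get("id")
    did = component.find("did")
    found = []
    title = did.findtext("unittitle")
    if title:
        found.append({"id": f"{BASE}/component/{cid}/unittitle", "type": "Name",
                      "classified_as": [PREFERRED], "value": title})
    found.append({"id": f"{BASE}/component/{cid}/id", "type": "Identifier",
                  "classified_as": [PREFERRED], "value": cid})
    for container in did.findall("container"):
        entry = {"id": f"{BASE}/container/{container.get('id')}", "type": "Identifier",
                 "classified_as": [stub(container.get("localtype"))],
                 "value": container.text}
        if container.get("label"):
            entry["label"] = container.get("label")
        found.append(entry)
    found.append({"type": "Identifier", "classified_as": [SEQUENCE],
                  "value": str(sequence)})
    for index, unitid in enumerate(did.findall("unitid")):
        found.append({"id": f"{BASE}/component/{cid}/unitid/{index}",
                      "type": "Identifier", "classified_as": [stub("accession")],
                      "value": unitid.text})
    return found

SAMPLE = """<c id="aspace_ref23_lnh" level="item">
  <did>
    <unittitle>Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass</unittitle>
    <unitid>MDE_B017_F001_126</unitid>
    <container id="aspace_ee2c950f6f2741ae317f124e42e0cd4c" label="Mixed Materials" localtype="box">17</container>
    <container id="aspace_04257636d45150a17b00babb37d042c6" localtype="folder">1</container>
  </did>
</c>"""

identified_by = identifiers_for(ET.fromstring(SAMPLE), sequence=125)
for entry in identified_by:
    print(entry["type"], entry["value"])
```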

Descriptive Cataloguing

Most of the descriptive cataloguing in an EAD document becomes LinguisticObject nodes whose classifications roughly match the meaning of the descriptive tag. In this example document, the <relatedmaterial> tag becomes a node classified as aat:related:

    {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": "http://data.duchamparchives.org/pma/archive/component/MDE_B017_F001_126",
        "referred_to_by": [
            {
                "id": "http://data.duchamparchives.org/pma/archive/component/aspace_f936701a5829bcbd57f1b94da1601352/related/0",
                "type": "LinguisticObject",
                "classified_as": [
                    {
                        "id": "aat:related",
                        "label": "Related Material",
                        "type": "Type"
                    }
                ],
                "value": "The Bride Stripped Bare by Her Bachelors, Even (The Large Glass)"
            }
        ]
    }

Individual EAD tags are classified with the following rubric:

EAD3 Tag             Classification
<prefercite>         aat:300311705 ("citations")
<abstract>           aat:300026032 ("abstracts")
<bioghist>           aat:300026972 ("biography files")
<processinfo>        aat:300135016 ("archival processing")
<arrangement>        aat:300054631 ("classification")
<scopecontent>       aat:scopenote (stub)
<relatedmaterial>    aat:related (stub)
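As an illustration, the rubric can be treated as a lookup table keyed on tag name. The sketch below assumes un-namespaced elements that are direct children of the component, flattens any nested markup to plain text, and omits node URIs because the source only shows the URI pattern for one tag. The label "scopenote" and the names NOTE_TYPES and referred_to_by_nodes are assumptions.

```python
import xml.etree.ElementTree as ET

# Tag name -> (classification id, label), following the rubric above.
NOTE_TYPES = {
    "prefercite": ("aat:300311705", "citations"),
    "abstract": ("aat:300026032", "abstracts"),
    "bioghist": ("aat:300026972", "biography files"),
    "processinfo": ("aat:300135016", "archival processing"),
    "arrangement": ("aat:300054631", "classification"),
    "scopecontent": ("aat:scopenote", "scopenote"),  # label is an assumption
    "relatedmaterial": ("aat:related", "Related Material"),
}

def referred_to_by_nodes(component):
    """Build LinguisticObject nodes for the descriptive tags of one component."""
    nodes = []
    for child in component:
        mapping = NOTE_TYPES.get(child.tag)
        if mapping is None:
            continue  # not a descriptive tag covered by the rubric
        aat_id, label = mapping
        # Flatten nested markup such as <p> and normalize whitespace.
        text = " ".join("".join(child.itertext()).split())
        nodes.append({
            "type": "LinguisticObject",
            "classified_as": [{"id": aat_id, "label": label, "type": "Type"}],
            "value": text,
        })
    return nodes

SAMPLE = """<c id="aspace_ref23_lnh" level="item">
  <did><unitid>MDE_B017_F001_126</unitid></did>
  <relatedmaterial>
    <p>The Bride Stripped Bare by Her Bachelors, Even (The Large Glass)</p>
  </relatedmaterial>
</c>"""

notes = referred_to_by_nodes(ET.fromstring(SAMPLE))
print(notes)
```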

Archival hierarchy

EAD/XML documents represent the hierarchical 'belonging' relationships between components as a tree of nested children. Minimally this looks like:

   <ead>
       <archdesc level="collection">
           <did>
               <unittitle>Marcel Duchamp Exhibition Records</unittitle>
           </did>
           <dsc>
               <c level="series">
                   <did>
                       <unittitle>Philadelphia Museum of Art, "Marcel Duchamp," 1973</unittitle>
                   </did>
                   <c level="item">
                       <did>
                           <unittitle>Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass</unittitle>
                       </did>
                   </c>
               </c>
           </dsc>
       </archdesc>
   </ead>

That is, Marcel Duchamp Exhibition Records contains Philadelphia Museum of Art, "Marcel Duchamp," 1973, which contains a photograph (whose title is catalogued as "Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass").

These relationships are expressed in two ways in JSON-LD. First, children describe the path to their parents with a part_of property that contains the nodes of all the child's ancestors. For example:

{
    "part_of": [
        {
            "classified_as": [
                {
                    "id": "aat:300375748",
                    "label": "archives (groupings)",
                    "type": "Type"
                },
                {
                    "id": "aat:collection",
                    "label": "collection",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/collection/marcel-duchamp-exhibition-records",
            "identified_by": [
                {
                    "classified_as": [
                        {
                            "id": "aat:300404670",
                            "label": "preferred terms",
                            "type": "Type"
                        }
                    ],
                    "id": "_:b98",
                    "type": "Name",
                    "value": "Marcel Duchamp Exhibition Records"
                }
            ],
            "part": [
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref571_il6",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref13_x97",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref604_xlc",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref152_nys",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref584_xko",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref119_3fj",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref543_pjn",
                "http://data.duchamparchives.org/pma/archive/component/aspace_2d01407c020feaeb17422e7b223c30c4",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref212_493"
            ],
            "type": "PhysicalObject"
        },
        {
            "classified_as": [
                {
                    "id": "aat:300375748",
                    "label": "archives (groupings)",
                    "type": "Type"
                },
                {
                    "id": "aat:series",
                    "label": "series",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/component/aspace_ref13_x97",
            "identified_by": [
                {
                    "classified_as": [
                        {
                            "id": "aat:300404670",
                            "label": "preferred terms",
                            "type": "Type"
                        }
                    ],
                    "id": "http://data.duchamparchives.org/pma/archive/component/aspace_ref13_x97/unittitle",
                    "type": "Name",
                    "value": "Philadelphia Museum of Art, \"Marcel Duchamp,\" 1973"
                }
            ],
            "part": [
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref95_m2f",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref83_69y",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref111_hr7",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref107_cc5",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref49_xn6",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref75_pyz",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref101_iuo",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref43_zej",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref77_kwq",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref55_jgj",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref117_ybn",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref99_gt7",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref41_176",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref105_884",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref52_j0b",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref97_r8k",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref91_c2g",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref17_pp7",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref93_ptc",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref25_7vl",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref85_l0s",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref113_bv6",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref115_3zi",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref37_i2b",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref89_nw6",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref19_gmp",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref28_m4u",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref109_b9d",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref23_lnh",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref73_ncm",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref87_u6h",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref69_o2r",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref21_6hu",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref59_cwo",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref67_wpo",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref57_xzv",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref34_2pu",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref39_0z1",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref71_geh",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref15_b6o",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref103_ocl",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref31_rdk",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref63_xk6",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref47_mv8",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref45_u7w",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref65_t6p",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref61_h5s",
                "http://data.duchamparchives.org/pma/archive/component/aspace_ref80_zok"
            ],
            "type": "PhysicalObject"
        }
    ]
}
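The ancestor list behind a part_of property can be computed with a simple recursive walk over the nested tree. The following is a sketch under assumptions: elements are un-namespaced, each node is keyed by its id attribute or, failing that, by its <unittitle> (the sample <archdesc> carries no id), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def node_key(element):
    """Identify a node by its id attribute, falling back to its title."""
    return element.get("id") or element.findtext("did/unittitle")

def part_of_index(archdesc):
    """Map every node to the keys of all its ancestors, outermost first."""
    index = {}

    def walk(element, ancestors):
        key = node_key(element)
        index[key] = list(ancestors)
        # Children are nested <c> tags, or <c> tags under <dsc> for <archdesc>.
        for child in element.findall("c") + element.findall("dsc/c"):
            walk(child, ancestors + [key])

    walk(archdesc, [])
    return index

SAMPLE = """<ead>
  <archdesc level="collection">
    <did><unittitle>Marcel Duchamp Exhibition Records</unittitle></did>
    <dsc>
      <c id="aspace_ref13_x97" level="series">
        <did><unittitle>Philadelphia Museum of Art, 1973</unittitle></did>
        <c id="aspace_ref23_lnh" level="item">
          <did><unittitle>Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass</unittitle></did>
        </c>
      </c>
    </dsc>
  </archdesc>
</ead>"""

index = part_of_index(ET.fromstring(SAMPLE).find("archdesc"))
print(index["aspace_ref23_lnh"])
```

Here the item lists both the collection and the series, matching the two part_of entries shown in the JSON above.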

Second, the document that expresses collection and finding aid information shows the full hierarchy from the top down with a part property:

{
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "http://data.duchamparchives.org/pma/archive/collection/marcel-duchamp-exhibition-records",
    "type": "PhysicalObject",
    "classified_as": [
            {
                "id": "aat:300375748",
                "label": "archives (groupings)",
                "type": "Type"
            },
            {
                "id": "aat:collection",
                "label": "collection",
                "type": "Type"
            }
    ],
    "part": [
        {
            "classified_as": [
                {
                    "id": "aat:300026877",
                    "label": "correspondence",
                    "type": "Type"
                },
                {
                    "id": "aat:item",
                    "label": "item",
                    "type": "Type"
                },
                {
                    "id": "aat:300375748",
                    "label": "archives (groupings)",
                    "type": "Type"
                }
            ],
            "id": "http://data.duchamparchives.org/pma/archive/component/aspace_ref23_lnh",
            "identified_by": [
                {
                    "classified_as": [
                        {
                            "id": "aat:300404670",
                            "label": "preferred terms",
                            "type": "Type"
                        }
                    ],
                    "id": "http://data.duchamparchives.org/pma/archive/component/aspace_ref23_lnh/unittitle",
                    "type": "Name",
                    "value": "Alexina Duchamp at Philadelphia Museum of Art exhibition standing near The Large Glass"
                }
            ]
        }
    ]
}

  • NB, if you're using the graph representation: in the graph, only the child-to-parent relationship is expressed. This reduces graph cycles and prevents infinite query recursion. All of these child-to-parent relationships are expressed in the graph with crm:P9_is_part_of predicates.

Relevant Field-Level Mappings

The mappings for each field in the model are available in the data review portion of this documentation website.
