Image management

Overview

  • Images are handed to the pipeline as TIFFs or JPEGs.
    • They can come from the source data systems (Navigart, TMS API) when images are available there.
    • They can come from an AWS S3 bucket that contains hi-res source images.
  • Each image is decompressed and normalized.
    • It is then compressed to JPEG2000 with OpenJPEG and placed in a systematically organized AWS S3 bucket, where the image server retrieves it.
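
As an illustration of the compression step, here is a minimal sketch that shells out to OpenJPEG's opj_compress command-line tool. It assumes the tool is on the PATH; the pipeline itself may invoke OpenJPEG differently. The function names, file names, and ratio values are hypothetical.

```python
import subprocess
from pathlib import Path


def build_opj_compress_command(source_tiff, dest_jp2, ratios=None):
    """Return an argv list for OpenJPEG's opj_compress CLI.

    The file names are placeholders; opj_compress picks the output
    container from the extension of the -o argument.
    """
    cmd = ["opj_compress", "-i", str(source_tiff), "-o", str(dest_jp2)]
    if ratios:
        # -r takes comma-separated compression ratios, one per quality layer.
        # The values passed in are illustrative, not the pipeline's settings.
        cmd += ["-r", ",".join(str(r) for r in ratios)]
    return cmd


def compress_to_jp2(source_tiff, dest_jp2, ratios=None):
    """Run opj_compress; raises CalledProcessError if the tool fails."""
    cmd = build_opj_compress_command(source_tiff, dest_jp2, ratios)
    subprocess.run(cmd, check=True)
    return Path(dest_jp2)
```

Building the argument list separately from running it makes it possible to log or check the exact command before any image is touched.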

Normalization Details

The DRP data pipeline uses Pillow to inspect and normalize images. This can lead to unexpected behavior with less common image formats (e.g., 16+ bpp TIFFs). The normalization steps are:

  • Decompress JPEGs
  • Normalize the colorspace to 8-bit RGB
  • Remove compression from intermediate TIFF representations, if present
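
The steps above can be sketched with Pillow roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function name is hypothetical, and it does nothing special for high bit-depth input, which is exactly where the unexpected behavior mentioned above tends to show up.

```python
from PIL import Image


def normalize_image(source, destination):
    """Decode `source` and write it as an uncompressed 8-bit RGB TIFF.

    `source` and `destination` may be paths or binary file objects.
    Hypothetical helper for illustration only.
    """
    with Image.open(source) as image:
        # convert() forces a full decode (this is where a JPEG is
        # decompressed) and returns a new image with 8 bits per channel.
        rgb = image.convert("RGB")
    # compression=None asks Pillow's TIFF writer for uncompressed output,
    # even if the source image was a compressed TIFF.
    rgb.save(destination, format="TIFF", compression=None)
    return rgb.size
```

Passing compression=None explicitly matters because Pillow otherwise reuses the compression setting recorded on an image that was opened from a compressed TIFF.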

IIIF Server

The portal uses the Cantaloupe image server to respond to IIIF Image API requests, though most requests are answered from the Varnish cache.
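
For reference, an IIIF Image API request follows the pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below builds such URLs; the base URL and identifiers are made-up examples, and the real endpoint depends on how Cantaloupe and Varnish are deployed.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API URL for one rendition of an image."""
    # An identifier containing "/" must be percent-encoded so that it
    # stays a single path segment.
    encoded = quote(identifier, safe="")
    base = base_url.rstrip("/")
    return f"{base}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base_url, identifier):
    """Build the URL of the image's info.json description document."""
    encoded = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded}/info.json"
```

Requesting info.json is a quick way to see the dimensions the image server reports for a given JPEG2000.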

Troubleshooting

Because the image repositories are so large, we use S3 directly as the object store for both the source and the destination of the compression process.

This means that the simplest troubleshooting step is to examine the JPEG2000 directly (for orientation, maximum size, color, etc.), then delete it and/or the source image before re-running the compression routine (pipeline images compress).
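
A rough sketch of that routine is shown below. It assumes an S3 client object with boto3's download_file and delete_object methods, for example one created with boto3.client("s3"). The function names, bucket names, and keys are hypothetical placeholders.

```python
def fetch_for_inspection(s3_client, bucket, key, local_path):
    """Download one object so it can be opened in a local image viewer."""
    s3_client.download_file(bucket, key, local_path)
    return local_path


def reset_image(s3_client, derivative_bucket, derivative_key,
                source_bucket=None, source_key=None):
    """Delete a JPEG2000 derivative and, optionally, its source image.

    Returns the (bucket, key) pairs that were deleted. After this,
    re-run the compression routine (pipeline images compress).
    """
    deleted = []
    s3_client.delete_object(Bucket=derivative_bucket, Key=derivative_key)
    deleted.append((derivative_bucket, derivative_key))
    # Only remove the source when both parts of its location are given.
    if source_bucket and source_key:
        s3_client.delete_object(Bucket=source_bucket, Key=source_key)
        deleted.append((source_bucket, source_key))
    return deleted
```

Taking the client as a parameter keeps the helpers free of credentials and region settings, and allows them to be exercised against a stand-in client before they are pointed at the real buckets.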

Reference: