Raw image file header patching

FITS headers are not always correct. Instrument or observatory software can be buggy, sensor or ADC hardware can be broken, humans can make mistakes. But FITS files should not be modified, if only because you then have more than one version of a given file.

We want a mechanism for patching broken FITS headers, subject to the following fundamental requirements:

We propose modifying FITS headers when reading them, based on git-versioned patch files. Details are laid out below.

Examples of fixable problems

All of the following cards come from external actors, which might not be working correctly:

  • The pfsDesign (FITS card is W_PFDSGN). Comes from fpsActor at Subaru, dcbActor at LAM.
  • The calibration lamp cards, which come from dcbActor at LAM and from a source that is still undefined at Subaru.
  • EXPTIME, which usually comes from the enuActor.
  • IMAGETYP, which usually comes from the spsActor.

All the above are required by the DRP.

Some other possible examples:

  • W_SRCCD0 or W_SRH4RG, the serial numbers of a CCD and H4RG.
  • W_XTASIC, the ASIC temperature. Maybe not necessary, but we might want to fix cards like this.

Proposed interface

From the user's point of view, the mechanism should look exactly like reading the file. The DRP Butler would know how to transparently patch FITS headers when creating an Exposure. This would happen _before_ the FITS header is consumed to construct the Exposure's metadata.

As a proof-of-concept and for non-DRP users, a pure-python implementation using astropy.io.fits will be in the pfs_utils product, which is specifically constructed to not depend on instrument or pipeline software.

Proposed mechanism

We propose saving YAML files, each containing a dictionary of lists of rules, in the pfs_instdata git repo. The YAML file names will be based on the DATE-OBS FITS card: data/imagefiles/patches/2019-03-20.yaml, or maybe 2019-03.yaml, say. Most dates will not need or have a patch file, but I think putting all patches in one file is likely to be hard to deal with.
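
As a rough sketch of the file lookup, assuming one file per DATE-OBS day under the directory named above: the helper name patch_file_for is hypothetical, and the fallback to no_date-obs.yaml is only one of the options raised under "Desiderata and questions", not a settled decision.

```python
from pathlib import PurePosixPath

# Directory from the proposal, relative to the pfs_instdata checkout.
PATCH_DIR = PurePosixPath("data/imagefiles/patches")


def patch_file_for(date_obs):
    """Return the per-day patch file path for a DATE-OBS value.

    Hypothetical helper: assumes DATE-OBS starts with YYYY-MM-DD, and
    falls back to no_date-obs.yaml when the card is missing or empty.
    """
    if not date_obs:
        return PATCH_DIR / "no_date-obs.yaml"
    day = str(date_obs)[:10]  # drop any "Thh:mm:ss" time part
    return PATCH_DIR / f"{day}.yaml"
```

Switching to per-month files would only change the slice length, which is one reason to keep the lookup in a single function.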

Inside the patch file, the outermost keys are file globs (PFJA01403012.fits or PFS[AB]00123*21.fits) indicating which files to act on, and the values are lists of what to do to the matching files. If more than one glob matches a single file, all the matching rules are run in the order they are found in the patch file.
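
The matching could be sketched with the standard-library fnmatch module, which understands both the * and [AB] forms used above. The function name rules_for and the sample rules are made up for illustration, and the sketch assumes the YAML loader hands back the globs in file order.

```python
from fnmatch import fnmatchcase


def rules_for(filename, patches):
    """Collect every rule whose glob matches filename, in file order."""
    matched = []
    for glob, rules in patches.items():  # dicts preserve insertion order
        if fnmatchcase(filename, glob):  # case-sensitive on every platform
            matched.extend(rules)
    return matched


# Made-up patch content, shaped like a loaded patch file.
patches = {
    "PFJA01403012.fits": [{"action": "deleteCard", "name": "W_XTASIC"}],
    "PFJA014030*.fits": [{"action": "addCard", "name": "IMAGETYP", "value": "bias"}],
    "PFS[AB]00123*21.fits": [{"action": "deleteCard", "name": "EXPTIME"}],
}
```

Here PFJA01403012.fits matches the first two globs, so it would collect the deleteCard rule followed by the addCard rule.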

As it stands, all actions modify FITS cards, and are expressed as actions on FITS cards.

fileGlob:
 - action: addCard
   name: NAME
   value: VALUE
   comment: COMMENT         # Optional
   overwriteExisting: True  # Optional, default is False

 - action: deleteCard
   name: NAME

 - action: modifyCard
   name: NAME
   newValue: NEWVALUE
   unlessValueIs: VALUE  # Optional
   onlyIfValueIs: VALUE  # Optional
   addIfMissing: True    # Optional, default is False
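
To make the intended semantics concrete, here is a minimal sketch of the three actions applied to a header held as a plain dict of name -> (value, comment). The function name apply_rule is hypothetical, and the behaviour in the corner cases (an existing card without overwriteExisting, deleting a missing card) is my assumption, not something this page has settled.

```python
def apply_rule(header, rule):
    """Apply one patch rule to header, a dict of card name -> (value, comment)."""
    action, name = rule["action"], rule["name"]
    if action == "addCard":
        # Assumption: an existing card is left alone unless overwriteExisting is set.
        if name not in header or rule.get("overwriteExisting", False):
            header[name] = (rule["value"], rule.get("comment", ""))
    elif action == "deleteCard":
        # Assumption: deleting a card that is not there is not an error.
        header.pop(name, None)
    elif action == "modifyCard":
        if name not in header:
            if rule.get("addIfMissing", False):
                header[name] = (rule["newValue"], "")
            return
        value, comment = header[name]
        if "onlyIfValueIs" in rule and value != rule["onlyIfValueIs"]:
            return
        if "unlessValueIs" in rule and value == rule["unlessValueIs"]:
            return
        header[name] = (rule["newValue"], comment)  # keep the old comment
    else:
        raise ValueError(f"unknown action: {action!r}")
```

A real implementation would act on an astropy.io.fits header rather than a dict, but the decision logic should carry over unchanged.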

I will suggest adding a couple more routines, but these should be discussed separately:

- declareFileBad
   # Shorthand for ``action: modifyCard; name: W_QUALTY; newValue: bad``

- callRoutine: python.name
   # For truly strange modifications. Using this is *strongly*
   # discouraged. [And not yet implemented]

And I, personally, do not object to security horrors like:

- modifyCard: NAME
  newValue: NEWVALUE
  onlyIfValueIs: EXPR

where ``EXPR`` is passed to ``eval()``.

Desiderata and questions

  • What if we want to apply a patch to files covering a very long time? [e.g. to patch _all_ LAM data without a W_PFDSGN card]? Maybe have a global_patches.yaml which is checked for all files?
  • What if there is no DATE-OBS card? We look in no_date-obs.yaml. Surely there will not be too many of these?
  • Syntactic sugar? It would be easy to accept addCard: W_GBLDGK instead of, or as well as, action: addCard, name: W_GBLDGK
  • Are filename globs the right key? Strings are the easiest YAML keys, and I do expect globs when dealing with filenames. I did think about just using the stem (PFLA01234511), but it is handy to have an anchor when looking for all r3 files: PFLA*32.fits. So how about a regexp on the stem instead? Votes?
  • Time chunks. One night feels like the right unit to me, but I can see arguments for per-run. Longer feels dangerous.
  • Multiple HDUs. Specifically, when HDU0 has most cards but HDU1 has the WCS because of compression. And for NIR ramps. Not sure.
  • STARS/SMOKA ingest?
  • This mechanism should work for non-science PFS FITS files: from the MCS or the guiders. Patches for those obviously cannot be used in operations, so it is not obvious that there is any real value in supporting this.
  • Per-institution directories? There will be overlap between LAM and Subaru.
  • It is not clear that this is the right place for tracking data file validity (declareFileBad). But it is convenient.
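
On the syntactic sugar question, the short form could be expanded into the long form before any rule is run, so the rest of the code only ever sees action/name pairs. This is just a sketch; the name normalize_rule is hypothetical and only the three actions defined above are recognised.

```python
ACTIONS = ("addCard", "deleteCard", "modifyCard")


def normalize_rule(rule):
    """Expand {addCard: NAME, ...} into {action: addCard, name: NAME, ...}."""
    if "action" in rule:
        return dict(rule)  # already in the long form
    short = [key for key in ACTIONS if key in rule]
    if len(short) != 1:
        raise ValueError(f"expected exactly one action key, got {short}")
    action = short[0]
    rest = {k: v for k, v in rule.items() if k != action}
    return {"action": action, "name": rule[action], **rest}
```

Accepting both forms costs one small function, at the price of having two spellings to search for in the patch files.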