Instrument control development / INSTRM-2500

drpActor at summit hanging during reductions

    Details

    • Story Points:
      2
    • Sprint:
      PreRun21Mar

      Description

      Twice in the last few days, the drpActor running at the summit has "hung up". The actor was unresponsive to `ping` commands, suggesting that the main user thread was hung. In both cases, there were three live `drpActor` processes. In one case, killing the newest one freed the actor to continue. In the second case, that didn't work and the actor had to be restarted.
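
      When the actor next hangs, it may help to record which `drpActor` processes are alive and which is newest before killing anything. The sketch below is only an illustration: it assumes a Linux host where `ps -eo pid,etimes,args` is available and that the processes can be matched by the string `drpActor` in their command line. The helper names `parse_ps` and `live_processes` are hypothetical and are not part of the actor code.

```python
import subprocess


def parse_ps(ps_output, needle):
    """Return (pid, elapsed_seconds, args) tuples for matching processes, newest first.

    Expects the output of `ps -eo pid,etimes,args` (Linux procps).
    """
    procs = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that is not "pid etimes args".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, etimes, args = parts
        if needle in args:
            procs.append((int(pid), int(etimes), args))
    # Smallest elapsed time first, i.e. the most recently started process.
    return sorted(procs, key=lambda p: p[1])


def live_processes(needle="drpActor"):
    """Hypothetical helper: list live processes whose command line contains `needle`."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, needle)
```

      Calling `live_processes()` on the summit host would list the matching processes with the newest first, so the one that was killed in the first incident can be identified and noted in the ticket.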

      In the second case, the actor was running reductions on visit=121963. Only the n1 and n2 detrends did not finish, and the detrends started finishing at 2025-03-22 17:11:30.019Z.

      In the first case, the log shows that one of the reductions (b2) failed:

      2025-03-22 08:09:49.183Z lsst.ctrl.mpexec.mpGraphExecutor 20 mpGraphExecutor.py:650 Executed 11 quanta successfully, 1 failed and 0 remain out of total 12 quanta.
      2025-03-22 08:09:49.183Z lsst.ctrl.mpexec.mpGraphExecutor 40 mpGraphExecutor.py:666 Failed jobs:
      2025-03-22 08:09:49.184Z lsst.ctrl.mpexec.mpGraphExecutor 40 mpGraphExecutor.py:669   - FAILED: <TaskDef(lsst.obs.pfs.isrTask.PfsIsrTask, label=isr) dataId={instrument: 'PFS', arm: 'b', spectrograph: 2, visit: 121509, ...}>
      2025-03-22 08:09:49.197Z actor            20 engine.py:175 New pfsConfig available: /data/raw/2025-03-21/pfsConfig/pfsConfig-0x5b4744a63e7757a2-121510.fits
      2025-03-22 08:09:49.297Z cmds             20 Actor.py:524 new cmd: ping
      2025-03-22 08:09:49.299Z cmds             20 CommandLink.py:122 > 2 43 : text='Present and (probably) well'
      

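      But I happen to know that the ping was sent to the actor at 08:06:39, while the log only records it at 08:09:49, about three minutes later and right after the reductions finished. The actor really did hang during the reductions.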
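
      The delay between the time the ping was sent and the `new cmd: ping` log line can be checked with a few lines of Python. This is a minimal sketch: it assumes the 08:06:39 send time is UTC on 2025-03-22, the same date and timezone as the log timestamps, and `parse_log_time` is a hypothetical helper, not part of the actor.

```python
from datetime import datetime, timezone


def parse_log_time(stamp):
    """Parse an actor log timestamp such as '2025-03-22 08:09:49.297Z' as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


# Assumption: the ping was sent at 08:06:39 UTC on the same day as the log lines.
sent = datetime(2025, 3, 22, 8, 6, 39, tzinfo=timezone.utc)

# Timestamp of the 'new cmd: ping' line in the log excerpt.
received = parse_log_time("2025-03-22 08:09:49.297Z")

delay = (received - sent).total_seconds()
print(f"ping waited {delay:.3f} s before the actor handled it")
```

      Under that assumption, the ping waited a little over three minutes, and was handled only about 0.1 s after the executor reported the failed quantum.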

              People

              • Assignee:
                arnaud.lefur
              • Reporter:
                cloomis
