  1. Instrument control development
  2. INSTRM-2045

Harden tron hub and recover better on exit.

    Details

    • Type: Task
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Component/s: tron_tron
    • Labels:
      None

      Description

      We had a tron hub shutdown on 2023-07-26, triggered by the /data NFS mount going away. Recovery was pretty clean, but I don't think we can count on that.

      • restarted the tron hub on pfs@mhs-ics, with {setup tron_tron; tron restart}
      • found that all actors reconnected automatically, with {oneCmd.py hub actors} and some manual inspection. I am actually surprised this worked so well.

      The /data NFS bounce also caused all the actors to stop writing their logs. I am not sure how we would notice this in general. That was fixed by telling all the non-hub actors to restart their logging by reloading their configuration:

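      # List the actors known to the hub: sed drops everything through ",msg,"
      # on the actors= line and splits the remaining names on commas.
      for a in $(oneCmd.py hub actors | sed -n '/actors=/{s/^.*,msg,//; s/,/ /g; p}'); do
          echo "==== $a"
          # Reloading the configuration restarts the actor's logging.
          oneCmd.py "$a" reloadConfiguration
          sleep 1
          # Check the actor's status after the reload.
          oneCmd.py "$a" status
      done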
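
One possible way to notice stalled logging is to look for log directories that have seen no recent writes. The following is only a sketch under assumptions that are not in this ticket: that logs live under a single root with one subdirectory per actor, and that any file modified within the chosen window counts as "still logging". The function name stale_log_dirs and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: report log directories with no recent writes.
# ASSUMPTION: one subdirectory per actor under a common log root;
# the real layout on /data may differ.

# stale_log_dirs ROOT MINUTES
# Prints each immediate subdirectory of ROOT that contains no regular
# file modified within the last MINUTES minutes.
stale_log_dirs() {
    root=$1
    minutes=$2
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when ROOT has no subdirectories.
        [ -d "$dir" ] || continue
        # find prints recently modified files; empty output means stale.
        if [ -z "$(find "$dir" -type f -mmin "-$minutes" | head -n 1)" ]; then
            echo "${dir%/}"
        fi
    done
}
```

Run periodically against the actual log root (for example, a hypothetical /data/logs) with a window such as 10 minutes, any directory it prints would be a candidate for the reloadConfiguration loop above. A quiet actor that legitimately logs nothing for long periods would show up as a false positive.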
      

      Will attach more tickets to this.

              People

              • Assignee: cloomis
              • Reporter: cloomis