[INSTRM-2045] Harden tron hub and recover better on exit. Created: 27/Jul/23  Updated: 02/Aug/23

Status: Open
Project: Instrument control development
Component/s: tron_tron
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Normal
Reporter: cloomis Assignee: cloomis
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to INSTRM-2046 Add tron watchdog Open
relates to INSTRM-322 Switch actor logging to rsyslog. Open

 Description   

We had a tron hub shutdown on 2023-07-26, triggered by the /data NFS mount going away. Recovery was pretty clean, but I don't think we can count on that.

  • restarted the tron hub on pfs@mhs-ics, with {setup tron_tron; tron restart}
  • found that all actors reconnected automatically, with {oneCmd.py hub actors} and some manual inspection. I am actually surprised this worked so well.

The /data NFS bounce also caused all the actors to stop writing their logs. Not sure how we would notice this in general. That was fixed by telling all the non-hub actors to reload their configuration, which restarts their logging:

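# Pull the actor names that follow ",msg," out of the hub's actors= reply,
# then ask each one to reload its configuration and report its status.
for a in $(oneCmd.py hub actors | sed -n '/actors=/{s/^.*,msg,//; s/,/ /g; p}'); do
    echo "==== $a"
    oneCmd.py "$a" reloadConfiguration
    sleep 1
    oneCmd.py "$a" status
done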
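
On noticing stalled logging in general: one possible check is to look for log directories that have not been written to recently. This is only a sketch. The function name stale_logs, the one-subdirectory-per-actor layout, and the age threshold are all assumptions for illustration, not how the actor logs are actually organized.

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout (one subdirectory per actor under a
# common root) is an assumption, not taken from the real log tree.

# stale_logs ROOT MAX_AGE_MINUTES
# Print each immediate subdirectory of ROOT that contains no regular file
# modified within the last MAX_AGE_MINUTES minutes.
stale_logs() {
    local root=$1 max_age=$2 dir
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when ROOT has no subdirectories.
        [ -d "$dir" ] || continue
        # find prints the first recently modified file and then stops;
        # empty output means nothing was written within the window.
        if [ -z "$(find "$dir" -type f -mmin "-$max_age" -print -quit)" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
}
```

Something like {stale_logs "$LOG_ROOT" 10}, with LOG_ROOT pointing at wherever the actor logs live, could be run periodically. A quiet actor that legitimately logs nothing for a while would show up as a false positive, so the threshold would need tuning.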

Will attach more tickets to this.



 Comments   
Comment by cloomis [ 02/Aug/23 ]

Hmm. That command to get the actors to restart logging did not, of course, restart the logging for the Gen2 side of the gen2Actor.

Generated at Sat Feb 10 16:42:26 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.