[INSTRM-2046] Add tron watchdog Created: 27/Jul/23  Updated: 10/Nov/23

Status: Open
Project: Instrument control development
Component/s: tron_tron
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Normal
Reporter: cloomis
Assignee: cloomis
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to INSTRM-2045 Harden tron hub and recover better on... Open

 Description   

tron_tron stopped, which it essentially never does. Once it was restarted, the actors reconnected as we would have hoped.

But the restart itself could have been automatic. Write some cron/at/whatever script that looks for a missing runhub.py process and starts the hub if it has been down for a minute or two.
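
A minimal sketch of what such a watchdog might look like, assuming it is run from cron once a minute. The state file location and the start command are placeholders (hypothetical, not taken from the tron setup), and the helper names are made up for illustration. The only detail from this ticket is the runhub.py process name and the "minute or two" grace period.

```python
#!/usr/bin/env python3
"""Watchdog sketch: start the hub if runhub.py has been missing for a while."""
import subprocess
import sys
import time
from pathlib import Path

PATTERN = "runhub.py"   # process to look for (from the ticket)
GRACE_S = 120           # "a minute or two"
# Hypothetical placeholders: replace with the real paths for the site.
STATE_FILE = Path("/tmp/tron_watchdog.down_since")
START_CMD = ["/path/to/start_hub.sh"]


def hub_is_running(ps_lines, pattern=PATTERN):
    """Return True if any process command line contains the pattern."""
    return any(pattern in line for line in ps_lines)


def decide(running, down_since, now, grace_s=GRACE_S):
    """Return (action, new_down_since); action is 'ok', 'wait' or 'restart'."""
    if running:
        return "ok", None
    if down_since is None:
        # First time we notice the hub is gone: remember when.
        return "wait", now
    if now - down_since >= grace_s:
        return "restart", None
    return "wait", down_since


def main():
    # List the command line of every process; the watchdog itself must not
    # have "runhub.py" in its own command line or it will match itself.
    ps = subprocess.run(["ps", "-eo", "args="],
                        capture_output=True, text=True, check=True)
    running = hub_is_running(ps.stdout.splitlines())
    down_since = float(STATE_FILE.read_text()) if STATE_FILE.exists() else None
    action, new_down_since = decide(running, down_since, time.time())
    if new_down_since is None:
        STATE_FILE.unlink(missing_ok=True)
    else:
        STATE_FILE.write_text(str(new_down_since))
    if action == "restart":
        subprocess.Popen(START_CMD)


# Only act when asked explicitly, so importing the module has no side effects.
if __name__ == "__main__" and sys.argv[1:] == ["--check"]:
    main()
```

Keeping the decision in decide() separate from the ps and file handling makes the grace-period logic easy to check without a running hub. A cron entry would call the script with --check every minute.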



 Comments   
Comment by cloomis [ 10/Nov/23 ]

tron_actorcore does use a Twisted ReconnectingClientFactory, but that seems to stop retrying after a while. There are several knobs for delays, retries, and limits. Those need to be checked.
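
For reference when checking those knobs, a rough sketch of how the retry delay would grow under exponential backoff. The default values used here (initial delay 1 s, factor e, cap 3600 s) are what I recall of Twisted's ReconnectingClientFactory defaults and are an assumption to verify against the installed Twisted. Jitter is left out, and backoff_schedule is a made-up helper, not part of Twisted or tron_actorcore.

```python
import math


def backoff_schedule(n_retries, initial_delay=1.0, factor=math.e,
                     max_delay=3600.0):
    """Return the delay (seconds) before each of the first n_retries retries.

    Models plain exponential backoff with a cap: each retry multiplies the
    previous delay by `factor`, then clamps it to `max_delay`.
    The default values are assumed, not read from tron_actorcore.
    """
    delays = []
    delay = initial_delay
    for _ in range(n_retries):
        delay = min(delay * factor, max_delay)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    for i, d in enumerate(backoff_schedule(12), start=1):
        print(f"retry {i:2d}: wait {d:8.1f} s")
```

If those defaults hold, the delay reaches the one-hour cap by the ninth retry, which from the outside could look like the client has stopped trying. That is only a hypothesis until the values actually set in tron_actorcore are checked.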

Generated at Sat Feb 10 16:42:26 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.