- Type: Task
- Status: Open
- Priority: Normal
- Resolution: Unresolved
- Component/s: tron_actorcore
- Labels: None
- Story Points: 3
The machine that hosts tron at JHU rebooted, and the actors running on other machines did not reconnect after the server came back up. I'm not sure whether reconnection ever worked, or whether we simply assumed that a reboot of that machine signals something dire.
Note that at JHU, one machine does everything: tron, PostgreSQL, the archiver, DHCP, DNS, NFS, etc. So the failure to recover cleanly may be specific to JHU.
The quick workaround is to reboot the client machines (at JHU, only the BEEs) and let ics_launch reconnect things.
In any case, look into it.
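
As a starting point for the investigation, here is a rough sketch of what client-side reconnection with exponential backoff could look like. It is not taken from tron_actorcore and does not describe how the actors currently connect. The names `backoff_delays`, `connect_with_retry`, and `open_hub_socket`, as well as the host name and port, are hypothetical and exist only for illustration.

```python
import socket
import time


def backoff_delays(initial=1.0, factor=2.0, maximum=60.0):
    """Yield an endless sequence of retry delays that grows up to a cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def connect_with_retry(connect, max_attempts=None, sleep=time.sleep, delays=None):
    """Call connect() until it succeeds, sleeping between failed attempts.

    connect      -- zero-argument callable that returns a connection or
                    raises OSError (socket errors are OSError subclasses).
    max_attempts -- give up and re-raise after this many failures;
                    None means keep trying for as long as delays last.
    sleep        -- injectable so the loop can be tested without waiting.
    delays       -- iterable of delays in seconds; defaults to backoff_delays().
    """
    if delays is None:
        delays = backoff_delays()
    attempt = 0
    for delay in delays:
        attempt += 1
        try:
            return connect()
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
    # Only reachable when a finite delays iterable has been used up.
    raise ConnectionError("ran out of retry delays without connecting")


def open_hub_socket(host="tron-host.example", port=9999, timeout=5.0):
    """Open a TCP connection to the hub. Host and port are placeholders."""
    return socket.create_connection((host, port), timeout=timeout)
```

An actor could then call `connect_with_retry(open_hub_socket)` whenever it notices that its connection has dropped. If the actors already sit on an event loop, the loop's own reconnection support would probably be a better fit than a blocking loop like this one.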