[INSTRM-935] Make actors re-connect after tron restart - PFS-JIRA

Details

Type: Task
Status: Open
Priority: Normal
Resolution: Unresolved
Component/s: tron_actorcore
Labels: None

Story Points: 3

Description

The machine that hosts tron at JHU rebooted. The actors running on other machines did not reconnect after the server came back up. I'm not sure whether this ever worked, or whether we simply assumed that a reboot of that machine signals something dire.

Note that at JHU, one machine does everything: tron, postgresql, the archiver, DHCP, DNS, NFS, etc. So the failure to recover cleanly may be specific to JHU.

The quick workaround is to reboot the client machines (at JHU, only the BEEs) and let ics_launch reconnect things.

In any case, look into it.
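One generic client-side pattern for this problem is to keep retrying the connection with capped exponential backoff rather than giving up after the first failure. The sketch below is written under that assumption and is not tron_actorcore's actual code: `backoff_delays` and `connect_with_retry` are hypothetical helpers, and the `connect` callable stands in for however an actor really opens its connection to tron.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   max_delay: float = 60.0) -> Iterator[float]:
    """Yield an endless sequence of exponentially growing delays, capped at max_delay."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, max_delay)


def connect_with_retry(connect: Callable[[], T],
                       max_attempts: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep,
                       **backoff_kwargs: float) -> T:
    """Call connect() until it succeeds, sleeping between failed attempts.

    Any OSError (e.g. ConnectionRefusedError while the hub host is down) is
    treated as retryable. With max_attempts=None the loop never gives up.
    """
    attempt = 0
    for delay in backoff_delays(**backoff_kwargs):
        attempt += 1
        try:
            return connect()
        except OSError:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and propagate the last connection error
            sleep(delay)
    raise AssertionError("backoff_delays() is infinite; not reached")
```

For example, `connect_with_retry(lambda: socket.create_connection((host, port)))` would keep retrying until tron accepts connections again, where `host` and `port` are placeholders. This only re-establishes the TCP connection; anything an actor has to do after connecting is outside the sketch.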

People

Assignee: cloomis

Reporter: cloomis

Votes: 0

Watchers: 1

Dates

Created: 29/Mar/20 8:43 PM

Updated: 29/Mar/20 8:43 PM