[INSTRM-790] ion pumps communication issue Created: 07/Oct/19 Updated: 17/Oct/19 |
|
Status: | Open |
Project: | Instrument control development |
Component/s: | ics_xcuActor |
Affects Version/s: | None |
Fix Version/s: | None |
Type: | Bug | Priority: | Normal |
Reporter: | fmadec | Assignee: | cloomis |
Resolution: | Unresolved | Votes: | 0 |
Labels: | SPS | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Sprint: | SM1-2019 P |
Description |
This is not new, but it happened often this time. It can happen when the ion pumps start, as shown below, or during operation, from the monitoring. It would be "nice" to have a way to handle this before starting operations at Subaru.
2019-10-03 15:23:19 xcu_r1 w text="failed to create connect or send to ion pump: [Errno 111] Connection refused"
2019-10-03 15:23:19 xcu_r1 f text="command failed: ConnectionRefusedError(111, 'Connection refused') in sendOneCommand() at /software/mhs/products/Linux64/ics_xcuActor/1.11.1/python/xcuActor/Controllers/ionpump.py:65"
|
Comments |
Comment by cloomis [ 08/Oct/19 ] |
The chained pair of ionpump controllers which control all six ion pumps for the three cryostats is accessible as two RS-485 nodes behind a single MOXA RS-485 port. This cluster of things is commanded from three independent actors. This cannot be reliable, really.

One of the earlier tickets discussing ion pump communications problems suggested having either an ionpumpActor or at least a single program to sequence all commands to the controllers. The traffic is very simple, so either seems rational. Will think about which is best.

I looked more carefully at the MOXA configuration options, and do not believe we can use it to sequence TCP connections: MOXAs can allow multiple concurrent connections, but that would be much worse.

An inventory of all the Connection refused failures in 2019 does suggest that simply retrying after some small random delay would significantly improve matters. But that cannot entirely fix the real problem. For r1 and b1 together, 2019 showed 1_260_000 commands with all but 49 being status commands; 11_000 of those connections were refused. All but 136 of the refused connections were from when the periodic status queries for the two cryostats synced up (07-1* and 09-1*). The periodic tasks are re-scheduled based on the system clock and not from the end of a command, so it is very easy to get locked into a hopeless failure. Different ticket, but naively retrying to fix this ticket's problems would probably make things worse without addressing that. I'd rather add a new actor than change that. |
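
For reference, a minimal sketch of the "retry after some small random delay" idea mentioned in the comment above. This is not the code in ionpump.py: `send_with_retry`, its parameters, and the `send` callable (standing in for one connect/send/read transaction with the controller) are all hypothetical names chosen for illustration. As the comment notes, this would only paper over collisions and does not address the periodic status queries syncing up.

```python
import random
import time


def send_with_retry(send, max_tries=4, base_delay=0.2,
                    sleep=time.sleep, rng=random.random):
    """Call send(), retrying refused connections after a random delay.

    `send` is a hypothetical zero-argument callable that performs one
    complete transaction and returns the reply. `sleep` and `rng` are
    injectable only so the behaviour can be checked without waiting.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return send()
        except ConnectionRefusedError:
            if attempt == max_tries:
                # Out of attempts: let the caller see the original error.
                raise
            # Delay in [base_delay, 2 * base_delay), randomized so that two
            # actors which collided are unlikely to retry at the same instant.
            sleep(base_delay * (1.0 + rng()))
```

Any other exception is passed straight through, so only the Connection refused case from the log above is retried.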