[INSTRM-643] ccd r1 fee crash Created: 04/Apr/19 Updated: 24/Dec/20 |
|
| Status: | In Progress |
| Project: | Instrument control development |
| Component/s: | ics_ccdActor |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | fmadec | Assignee: | cloomis |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
ccd r1 crashed during biases exposure
{{2019-04-04 10:40:19 ccd_r1 f text="command failed: KeyError('fee',) in fee() at python/ccdActor/main.py:47" }}
then tried disconnect/connect
2019-04-04 10:40:48 cmdIn=ccd_r1 connect controller=fee
2019-04-04 10:40:49 ccd_r1 w controllers="ccd"
2019-04-04 10:40:49 ccd_r1 w text="failed to connect controller fee"
2019-04-04 10:40:49 ccd_r1 f text="command failed: RuntimeError('failed to arm for readout)',) in pyFPGA.FPGA.setClockLevels() at fpga/pyFPGA.pyx:127"
then tried power off/on and connect but:
2019-04-04 10:45:14 ccd_r1 w controllers="ccd"
2019-04-04 10:45:14 ccd_r1 w text="failed to connect controller fee"
2019-04-04 10:45:14 ccd_r1 f text="command failed: RuntimeError('failed to arm for readout)',) in pyFPGA.FPGA.setClockLevels() at fpga/pyFPGA.pyx:127"
|
| Comments |
| Comment by fmadec [ 04/Apr/19 ] |
|
a restart of ccd actor solved the issue...
|
| Comment by cloomis [ 04/Apr/19 ] |
|
Slightly confused by the output; will look more carefully at the logs. |
| Comment by cloomis [ 05/Apr/19 ] |
|
This is an amusing cluster of problems....
Problem 0: the exposure after 15317 "hung". Specifically, the background thread that was started to run the integration and readout never started the readout (hence no exposureState=readout). This command/code path (running exposures in a background thread) is not used in production (or will not be after
Problem 1: the exposure did not finish, so the exposure was cleared and the fee reconnected. But it couldn't be reconnected, because the exposure thread was not really dead (Python cannot kill stuck threads) and still had a reference to an old fee object, which had the hardware connection open.
The two problems explain why there are two "concurrent" and very, very different complaints about the FEE.
What to do? I'm inclined to instrument a few things better and make fee disconnection more violent/effective, but that's it. At first these look like very serious problems, but only because of the background thread, which is going away. So understanding exactly why that thread hung is not worth working on. If there is an underlying problem, it will be much easier to see and fix without the thread. |
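One way to picture "more violent/effective" disconnection is to close the hardware handle explicitly instead of waiting for the last reference to go away. The sketch below is illustrative only: `FakeFee`, `controllers`, and `force_disconnect` are hypothetical stand-ins, not the actual ics_ccdActor code, and the stuck thread is simulated with an event that is never set until the end.

```python
import threading


class FakeFee:
    """Hypothetical stand-in for the FEE controller object (assumption)."""

    def __init__(self):
        self.is_open = True

    def close(self):
        # Release the (pretend) hardware connection explicitly.
        self.is_open = False


# Hypothetical controller registry, keyed by controller name.
controllers = {"fee": FakeFee()}
release = threading.Event()


def stuck_exposure(fee):
    # Simulates a hung exposure thread that keeps a reference to the fee.
    release.wait()


worker = threading.Thread(
    target=stuck_exposure, args=(controllers["fee"],), daemon=True
)
worker.start()


def force_disconnect(name):
    """Drop the controller and close it, even if other references remain."""
    dev = controllers.pop(name, None)
    if dev is not None:
        dev.close()
    return dev


old_fee = force_disconnect("fee")

# The thread cannot be killed, but the connection it references is closed.
thread_still_alive = worker.is_alive()
connection_released = not old_fee.is_open

# Clean up the simulated hang so the example exits promptly.
release.set()
worker.join(timeout=5)
```

The point of the sketch is that the stuck thread's reference no longer matters once the handle is closed out from under it; whether the real FEE and FPGA objects can be closed safely this way is not established here.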
| Comment by arnaud.lefur [ 17/Dec/19 ] |
|
It happened today, but I had an issue with spsaitActor, so it's probably my fault; I have to check the logs... |
| Comment by cloomis [ 24/Dec/20 ] |
|
Bump, maybe. Tx, arnaud.lefur. The 2020-12-17 problems might well have happened because the file writing took forever, so the per-exposure thread never closed out. At the very least, new exposures should be rejected if an old exposure thread exists. What to do to better kill a stuck thread is still not clear. |
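A minimal sketch of the "reject new exposures if an old exposure thread exists" guard, using only the standard `threading` module. `ExposureRunner` and `ExposureBusy` are hypothetical names introduced for illustration; the real exposure command path in ics_ccdActor may be structured differently.

```python
import threading


class ExposureBusy(RuntimeError):
    """Raised when a previous exposure thread has not finished (hypothetical)."""


class ExposureRunner:
    """Hypothetical helper that allows only one live exposure thread."""

    def __init__(self):
        self._thread = None
        self._lock = threading.Lock()

    def start(self, target, *args):
        # The lock makes the check-and-start step atomic across commands.
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                raise ExposureBusy("previous exposure thread is still running")
            self._thread = threading.Thread(target=target, args=args, daemon=True)
            self._thread.start()

    def join(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)


# Usage: the first "exposure" blocks on an event, so the second is rejected.
runner = ExposureRunner()
gate = threading.Event()
runner.start(gate.wait)

try:
    runner.start(gate.wait)
    rejected = False
except ExposureBusy:
    rejected = True

# Let the first exposure finish; a new one is then accepted again.
gate.set()
runner.join(timeout=5)
```

This only prevents stacking new exposures on top of a stuck one; it does not address how to kill the stuck thread itself, which remains open.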