[INSTRM-1993] Fix pixel drops at start of ramps Created: 15/Jun/23 Updated: 28/Jul/23 Resolved: 28/Jul/23 |
|
| Status: | Done |
| Project: | Instrument control development |
| Component/s: | hxhal, ics_hxActor |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | cloomis | Assignee: | cloomis |
| Resolution: | Done | Votes: | 0 |
| Labels: | near-term | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Eng12July | ||||||||
| Description |
|
Ramps sporadically fail when the actor cannot get enough pixels from the DAQ to complete the ramp. There is no per-read framing, so this is only discovered at the end of the ramp, and the drops could have occurred at any point. On n1 at Subaru, we are seeing quite a few of these: 19/516 ramps from 2023-05 to now, ~4%. It turns out that most/all of the failed ramps are simply missing pixels at the beginning of the ramp, visible even in the RESET frame. This is probably good news: even if we cannot fully recover that frame, it is essentially never used. I have not yet found anything stupid on my side, sadly, nor seen warnings of SAM FIFO overflows. There are various counters on the SAM and the ASIC which can be reported more enthusiastically; I'll work on adding those under this ticket. |
| Comments |
| Comment by cloomis [ 07/Jul/23 ] |
|
Partly inspired by |
| Comment by cloomis [ 11/Jul/23 ] |
|
About a third of the failures happened because the number of pixels expected for the ramp is calculated from values read back from the ASIC, and sometimes those values are read back wrong. Simple fix for these particular ramps: calculate from the values we wrote to the ASIC when configuring it. The bad readbacks seem moderately likely to come from errors on the SAM (ASIC register reads are done by programming some SAM registers; in one case the value read back for the register was the address of the register), so I think we will find similar problems with some of the other failed ramps. |
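
A minimal sketch of the fix described above, not the actual hxhal code: the expected pixel count is derived from the configuration that was written, and the readback is only compared against it for diagnostics. The names `RampConfig`, `expected_pixels`, and `readback_mismatches`, and the field names, are hypothetical.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RampConfig:
    """Values written to the ASIC when configuring a ramp (hypothetical fields)."""
    n_resets: int
    n_reads: int
    rows: int
    cols: int


def expected_pixels(cfg: RampConfig) -> int:
    """Pixel count for the whole ramp, computed only from what we wrote."""
    return (cfg.n_resets + cfg.n_reads) * cfg.rows * cfg.cols


def readback_mismatches(cfg: RampConfig, readback: dict) -> list[str]:
    """Names of fields whose readback value differs from the written value.

    The readback is used for reporting only; it never feeds expected_pixels().
    """
    return [name for name, written in asdict(cfg).items()
            if readback.get(name) != written]


cfg = RampConfig(n_resets=1, n_reads=3, rows=4, cols=8)
# Simulate a bad register read: 'cols' comes back as a bogus value.
bad_readback = {"n_resets": 1, "n_reads": 3, "rows": 4, "cols": 0x6000}
print(expected_pixels(cfg), readback_mismatches(cfg, bad_readback))
```

With this split, a corrupted register read can still be logged, but it can no longer change how many pixels the actor waits for.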
| Comment by cloomis [ 14/Jul/23 ] |
|
I have made most of the changes required to notice failed ramps quickly, and restructured the inner takeRamp method to support resuming a stopped ramp. But I need an actual failure at this point before I can implement the resume logic. |
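
One way noticing a failed ramp quickly could look, as a sketch only: it assumes the actor can sample a cumulative received-pixel count after each read, which the ticket does not confirm. The function name and arguments are hypothetical and this is not the takeRamp implementation.

```python
from __future__ import annotations


def first_short_read(pixels_per_read: int, received_counts: list[int]) -> int | None:
    """Return the index of the first read whose cumulative pixel count is short.

    received_counts[i] is the cumulative number of pixels received once
    read i should be complete. Returns None if every read is complete.
    """
    for i, got in enumerate(received_counts):
        want = (i + 1) * pixels_per_read
        if got < want:
            return i
    return None


# Pixels dropped in the first (RESET) frame are flagged at index 0,
# rather than only being discovered at the end of the ramp.
print(first_short_read(100, [90, 190, 290]))
```

Checking after every read, instead of once at the end, is what would let a stopped ramp be resumed from a known read index.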
| Comment by cloomis [ 25/Jul/23 ] |
|
I am going to close this one, although I do expect to have smaller followup tickets. Did four things, basically:
|
| Comment by cloomis [ 26/Jul/23 ] |
|
Yeah, close it, but also look at 98434 on n1. |
| Comment by cloomis [ 28/Jul/23 ] |
|
As commented, there will be more work, but this ticket should be closed. hxhal: tagged 3.1.1, which is what we have been running for a couple of nights at the end of the 2023-07 run. |