[INSTRM-918] SM1 at Subaru : shutters failed Created: 11/Mar/20  Updated: 22/Jun/21  Resolved: 22/Jun/21

Status: Done
Project: Instrument control development
Component/s: ics_enuActor
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Normal
Reporter: fmadec Assignee: arnaud.lefur
Resolution: Done Votes: 0
Labels: SM1, SPS
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to INSTRM-1239 Expose function operates blue shutter... Won't Fix
relates to INSTRM-1226 trigger an alert whenever biashaFSM[1... Done
relates to INSTRM-1232 add single enuActor top level command... Done
Story Points: 2
Sprint: SM1PD-2020 C, SM1PD-2020 D, SM1PD-2020 G, SM1PD-2020 G2, SM1PD-2020 H, SM1PD-2020 J, SM1PD-2021 A 3, SM1PD-2021 A 4, SM1PD-2021 A 5, SM1PD-2021 A 6, SM1PD-2021 A 11

 Description   

 

03:53:07.430948 spsait w text="enu_sm1 shutters failed: FysomError('event init inappropriate in current state FAILED',) in fn() at /software/conda/lib/python3.6/site-packages/fysom/_init_.py:272"

Don't know why they failed

Basically biasha board is stuck in a peculiar state where it doesn't answer correctly.

alefur@shell-ics:~$ oneCmd.py enu_sm1 biasha raw=status                                                            
2020-08-13T16:00:44.454 sent enu_sm1 biasha raw=status (from alefur_shell_ics)                                     
2020-08-13T16:00:44.478 enu_sm1 : text="returned: 30"                                                              
alefur@shell-ics:~$ oneCmd.py enu_sm1 biasha raw=statword                                                          
2020-08-13T16:01:01.772 sent enu_sm1 biasha raw=statword (from alefur_shell_ics)                                   
2020-08-13T16:01:01.792 enu_sm1 : text="returned: n"

 The only fix is to power cycle the board



 Comments   
Comment by fmadec [ 11/Mar/20 ]

just did disconnect/connect , status became ok.

 

a dark test image works.

but then I started a sequence but it failed again:

 

04:06:44.656414 spsait w text="enu_sm1 shutters failed: RuntimeError('biasha has replied nok, red_open inappropriate in current state ',) in _gotoState() at /software/mhs/products/Linux64/ics_enuActor/1.0.0/python/enuActor/Controllers/biasha.py:344" 04:06:44.656966 spsait w text="command failed: RuntimeError('Exposure has failed',) in expose() at /software/mhs/devel/ics_spsActor/python/spsActor/Controllers/expose.py:58" 04:06:44.711224 spsait f text="command failed: RuntimeError('subCmd has failed.. sequence aborted..',) in processSubCmd() at /software/mhs/products/Linux64/ics_spsaitActor/1.0.3/python/spsaitActor/utils/experiment.py:80"

 

 

Comment by arnaud.lefur [ 12/Mar/20 ]

It's seems that the arduino is stuck in a weird state, some command are not recognized correctly. 

Power cycle fixed it, but that's the second time it's happening at Subaru and we never seen it at LAM, a bit worrysome...

Comment by fmadec [ 14/Apr/20 ]

it happened again today

07:07:55.644614 spsait w text="enu_sm1 shutters failed: RuntimeError('biasha has replied nok, shut_open inappropriate in current state ',) in _gotoState() at /software/mhs/products/Linux64/ics_enuActor/1.0.0/python/enuActor/Controllers/biasha.py:344"
07:07:55.645212 spsait w text="command failed: RuntimeError('Exposure has failed',) in expose() at /software/mhs/devel/ics_spsActor/python/spsActor/Controllers/expose.py:58"
07:07:55.699574 spsait f text="command failed: RuntimeError('subCmd has failed.. sequence aborted..',) in processSubCmd() at /software/mhs/products/Linux64/ics_spsaitActor/1.0.3/python/spsaitActor/utils/experiment.py:80"

Comment by hassan [ 16/Apr/20 ]

As we heard during today's ICS/SpS telecon: not much can be done without significant software changes. No longer required for the spsait->Subaru migration. Problem appears on an exposure-by-exposure basis, so the worst that can happen is that one exposure is lost. Nominal situation then recovered after a power-cycle.

Will lower priority for this.

Comment by hassan [ 04/Jun/20 ]

From arnaud.lefur: not much can be done about this. We just need to monitor the situation regularly.

Comment by arnaud.lefur [ 14/Aug/20 ]

the bug again using only blue shutters.
board was stuck and needed to be power-cycled.

alefur@shell-ics:~$ oneCmd.py enu_sm1 biasha raw=status                                                            
2020-08-13T16:00:44.454 sent enu_sm1 biasha raw=status (from alefur_shell_ics)                                     
2020-08-13T16:00:44.478 enu_sm1 : text="returned: 30"                                                              
alefur@shell-ics:~$ oneCmd.py enu_sm1 biasha raw=statword                                                          
2020-08-13T16:01:01.772 sent enu_sm1 biasha raw=statword (from alefur_shell_ics)                                   
2020-08-13T16:01:01.792 enu_sm1 : text="returned: n"
Comment by arnaud.lefur [ 08/Oct/20 ]

happened again today !

Comment by hassan [ 08/Oct/20 ]

Requires a firmware update and discussions with LAM support on this.

Comment by hassan [ 05/Nov/20 ]

Arnaud will organize a meeting with LAM support (Philip?) to address this.

Comment by cloomis [ 18/Jan/21 ]

And on 2021-01-16:

2021-01-16 22:37:05.780Z biasha           10 bufferedSocket.py:64 sending b'init\r\n'
2021-01-16 22:37:05.795Z biasha           10 bufferedSocket.py:133 received 'ok'
2021-01-16 22:37:05.798Z biasha           10 bufferedSocket.py:64 sending b'blue_open\r\n'
2021-01-16 22:37:05.811Z biasha           10 bufferedSocket.py:133 received 'nok'
Comment by hassan [ 25/Feb/21 ]

A similar problem is occurring at LAM (SM2). Arnaud will investigate this further at LAM.

Comment by hassan [ 25/Mar/21 ]

Arnaud starting testing at LAM. Ongoing.

Comment by arnaud.lefur [ 31/Mar/21 ]

currently running a fixed version on SM3.

Comment by hassan [ 15/Apr/21 ]

Problem not seen so far at LAM.

Comment by hassan [ 22/Apr/21 ]

Problem after firmware update at LAM, no further problems have been seen.

Following discussions with Fabrice, Arnaud and Craig: will update the firmware further and coordinate with Subaru in ~ 1 month to install the updated firmware.

Comment by fmadec [ 02/Jun/21 ]

Problem occured this morning at LAM with the new firmware version:

 

2021-06-02 07:15:05.729Z cmds             20 CommandLink.py:121 > 2 1048 f text="command failed: RuntimeError('error : shutter switch timeoutn') in _gotoState() at /software/mhs/p 
roducts/Linux64/ics_enuActor/1.1.8/python/enuActor/Controllers/biasha.py:347" 
2021-06-02 07:15:53.886Z cmds             20 CommandLink.py:121 > 2 1050 f text="command failed: FysomError('event init inappropriate in current state FAILED') in fn() at /softwar 
e/conda/envs/conda-ics/lib/python3.7/site-packages/fysom/__init__.py:272" 
2021-06-02 07:16:08.940Z cmds             20 CommandLink.py:121 > 0 0 w text="command failed: RuntimeError('statword 010100  does not match current state') in shutterStatus() at / 
software/mhs/products/Linux64/ics_enuActor/1.1.8/python/enuActor/Controllers/biasha.py:237" 
2021-06-02 07:16:23.993Z cmds             20 CommandLink.py:121 > 0 0 w text="command failed: RuntimeError('statword 010100  does not match current state') in shutterStatus() at / 
software/mhs/products/Linux64/ics_enuActor/1.1.8/python/enuActor/Controllers/biasha.py:237" 
2021-06-02 07:16:37.191Z cmds             20 CommandLink.py:121 > 2 1052 f text="command failed: FysomError('event init inappropriate in current state FAILED') in fn() at /softwar 
e/conda/envs/conda-ics/lib/python3.7/site-packages/fysom/__init__.py:272" 
2021-06-02 07:16:52.244Z cmds             20 CommandLink.py:121 > 0 0 w text="command failed: RuntimeError('statword 010100  does not match current state') in shutterStatus() at / 
software/mhs/products/Linux64/ics_enuActor/1.1.8/python/enuActor/Controllers/biasha.py:237" 
2021-06-02 07:17:07.297Z cmds             20 CommandLink.py:121 > 0 0 w text="command failed: RuntimeError('statword 010100  does not match current state') in shutterStatus() at / 
software/mhs/products/Linux64/ics_enuActor/1.1.8/python/enuActor/Controllers/biasha.py:237" 
2021-06-02 07:17:20.296Z cmds             20 CommandLink.py:121 > 2 1054 f text="command failed: FysomError('event init inappropriate in current state FAILED') in fn() at /softwar 
e/conda/envs/conda-ics/lib/python3.7/site-packages/fysom/__init__.py:272" 
2021-06-02 07:17:35.349Z cmds             20 CommandLink.py:121 > 0 0 w text="command failed: RuntimeError('statword 010100  does not match current state') in shutterStatus() at / 
so

 

Comment by arnaud.lefur [ 02/Jun/21 ]

It looks like another issue.

2021-06-02 07:15:00.126Z cmds             20 CommandLink.py:121 > 2 1048 i elapsedTime=6.43                                                                         
2021-06-02 07:15:02.127Z cmds             20 CommandLink.py:121 > 2 1048 i elapsedTime=8.43      
2021-06-02 07:15:03.701Z cmds             20 CommandLink.py:121 > 0 0 i biashaFSM=ONLINE,BUSY      
2021-06-02 07:15:03.701Z cmds             20 CommandLink.py:121 > 0 0 i metaFSM=ONLINE,BUSY
2021-06-02 07:15:03.701Z biasha           10 bufferedSocket.py:64 sending b'red_close\r\n'                                                                                                                     
2021-06-02 07:15:05.722Z biasha           10 bufferedSocket.py:133 received 'shutter switch timeoutnok'         
2021-06-02 07:15:05.722Z cmds             20 CommandLink.py:121 > 0 0 i biashaFSM=ONLINE,FAILED
2021-06-02 07:15:05.722Z cmds             20 CommandLink.py:121 > 0 0 i metaFSM=ONLINE,FAILED  

An exposure was being taken, the red shutter did not close as it's supposed to.
But the board itself was responding well:

2021-06-02 07:16:08.936Z biasha           10 bufferedSocket.py:133 received '84ok'                                                                                  
2021-06-02 07:16:08.936Z cmds             20 CommandLink.py:121 > 0 0 i shb=0,1,0                                                                                   
2021-06-02 07:16:08.937Z cmds             20 CommandLink.py:121 > 0 0 i shr=1,0,0                                                                                   
2021-06-02 07:16:08.937Z cmds             20 CommandLink.py:121 > 0 0 w shutters=undef                    
2021-06-02 07:16:08.940Z cmds             20 CommandLink.py:121 > 0 0 w text="command failed: RuntimeError('statword 010100  does not match current state') in shutterStatus() at /software/mhs/products/Linux64/ics_enuActor/1.1.8/python/enuActor/Controllers/biasha.py:237"
2021-06-02 07:16:23.941Z cmds             20 CommandLink.py:121 > 0 0 i biashaFSM=ONLINE,FAILED                                                                     
2021-06-02 07:16:23.941Z cmds             20 CommandLink.py:121 > 0 0 i biashaMode=operation              
2021-06-02 07:16:23.942Z biasha           10 bufferedSocket.py:64 sending b'status\r\n'                                                                             
2021-06-02 07:16:23.947Z biasha           10 bufferedSocket.py:133 received '0ok'                                                                                   
2021-06-02 07:16:23.947Z cmds             20 CommandLink.py:121 > 0 0 i biasha=0

This is reflected here, in biasha=0 both shutters are supposed to be closed, but red was actually still open.

It would have been interesting to see whether a biasha init command could fix the issue.

Comment by arnaud.lefur [ 22/Jun/21 ]

after 3 month of test, this mysterious bug never came back !
So let's declare victory now and close this ticket.
Will open a new one if it appears to be necessary.

Generated at Sat Feb 10 16:29:59 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.