[INSTRM-1180] ingest using shell call to ingestPfsImages Created: 13/Feb/21 Updated: 16/Feb/21 Resolved: 16/Feb/21 |
|
| Status: | Done |
| Project: | Instrument control development |
| Component/s: | ics_drpActor |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Normal |
| Reporter: | cloomis | Assignee: | cloomis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Story Points: | 1 | ||
| Description |
|
We currently ingest using a Task. Switch to a shell call to ingestPfsImages. The hope is to make new-file processing faster and less prone to registry failures. |
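The switch described above can be sketched as a subprocess call. This is a minimal illustration, not the drpActor code: the `ingestPfsImages.py` invocation in the comment is illustrative and its flags are not verified against obs_pfs; the demonstration below substitutes a trivial stand-in command so it can actually run.

```python
import subprocess
import sys

def ingest_by_shell(ingest_cmd, filepath, timeout=60):
    """Run an external ingest command as a subprocess instead of an
    in-process Task.  check=True raises CalledProcessError on a non-zero
    exit, so a registry failure surfaces immediately in the actor, and
    timeout= bounds a registry update that hangs."""
    return subprocess.run(
        [*ingest_cmd, filepath],
        check=True,           # raise on ingest failure
        capture_output=True,  # keep stdout/stderr for the actor log
        text=True,
        timeout=timeout,
    )

# Stand-in demonstration; the real call would look something like
#   ingest_by_shell(["ingestPfsImages.py", repo_root], fits_path)
# (script name, arguments, and the .fits path below are hypothetical).
result = ingest_by_shell([sys.executable, "-c", "print('ingested')"],
                         "/tmp/PFSA00000011.fits")
print(result.stdout.strip())
```

One design consequence worth noting: a subprocess opens and closes its own registry connection per call, which sidesteps in-process connection-lifetime problems but pays the process-startup overhead on every exposure.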
| Comments |
| Comment by cloomis [ 14/Feb/21 ] |
|
Doesn't really help. The delay depends on the size of the registry, so we do have to figure this out. It's not obvious that the copy-update-rename hack fixed by the DM ticket is the problem, as much as I hate it and want to blame it. The slowdown does seem to be at least partially tied up with a sequential scan of the db right before closing. More to come. |
| Comment by cloomis [ 14/Feb/21 ] |
|
Good news: ingest really does take a long time, due to a fixable obs_pfs problem updating the raw_visit table. Bad news: the drpActor does have associated sequencing problems which still need to be addressed. But I think addressing those will either take care of |
| Comment by arnaud.lefur [ 14/Feb/21 ] |
|
Great, it used to be ~atomic, so that makes sense. |
| Comment by cloomis [ 15/Feb/21 ] |
|
This is still a mess. I am running obs_pfs cpl on top of pfs_pipe2d w.2021.06b, which disables the raw_visit updates that cause ingest to slow down to tens of seconds (everywhere, not just on shell2-ics). There may still be gains to be made on ingest.

I re-ordered the internal processing of ingest & detrend in order to guarantee that only one operation is running at a time, and added isr.doDefect=False to get around a temporary detrend bug. Even though shell2-ics is under-provisioned (4-core, 4 GB), I do not see bad system contention (per dstat 1), but detrend is still slow (12-30 s).

There are two oddities which need to be figured out before I will trust or suspect anything else. One of the loggers (not the main actorcore/python logger, nor the LSST logger) multiplies itself, adding one line for each file processed. And when you quit the drpActor, there is a correspondingly increasing number of complaints from sqlite and from the multiplying logger. But according to the main thread we only ever get four threads (not many), and the thread ids printed by the sqlite complaints at the end are all the same:

2021-02-14 18:40:49.316Z actor 20 Actor.py:550 reactor dead, cleaning up...
actor INFO: reactor dead, cleaning up...
actor INFO: reactor dead, cleaning up...
actor INFO: reactor dead, cleaning up...
    [... the same "actor INFO: reactor dead, cleaning up..." line repeated 11 more times ...]
Exception ignored in: <function SqlRegistry.__del__ at 0x7fd4fea88400>
Traceback (most recent call last):
File "/software/drp-5.2/stack/miniconda3-4.5.12-1172c30/Linux64/daf_persistence/18.1.0/python/lsst/daf/persistence/registries.py", line 323, in __del__
self.conn.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140551400564480 and this is thread id 140553341876032.
    [... the same "Exception ignored in: <function SqlRegistry.__del__ ...>" traceback and sqlite3.ProgrammingError repeated 9 more times, each with the identical pair of thread ids ...]
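The sqlite complaint in the log above is reproducible in isolation: by default the stdlib `sqlite3` module binds a connection to the thread that created it, and any use from another thread (including the implicit close in `SqlRegistry.__del__`) raises exactly this `ProgrammingError`. A minimal sketch, unrelated to the drpActor code itself:

```python
import sqlite3
import threading

# A connection created in the main thread...
conn = sqlite3.connect(":memory:")

errors = []

def close_from_other_thread():
    # ...may not even be close()d from a different thread.
    try:
        conn.close()
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))

t = threading.Thread(target=close_from_other_thread)
t.start()
t.join()
print(errors[0])  # "SQLite objects created in a thread can only be used in that same thread..."

# check_same_thread=False disables the guard; the caller then has to
# serialize access itself (e.g. with a lock or a single worker thread).
shared = sqlite3.connect(":memory:", check_same_thread=False)
t2 = threading.Thread(target=shared.close)
t2.start()
t2.join()
```

This is consistent with the observation that the complaints appear at shutdown: the registry connections were created on worker threads but garbage-collected from a different thread once the reactor died.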
|
| Comment by cloomis [ 16/Feb/21 ] |
|
A cute thing to notice in that last log: the "actor" logger is an actorcore thing, along with the "cmds" logger. Both always use the formatting you see in the top line: the date, logger name, etc. But after the detrend task is called for the first time, every output line on the actor logger also generates the multiplying lines, formatted by the LSST logging system: the logger name, the level, the message.

Also, detrend was being called for each camera, but without specifying the arm, so the second call processed both arms. Fixed.

FWIW, with the ingest raw_visit problem fixed, ingest+detrend takes ~20-25 s, which is fast enough to avoid piling up when taking, say, a sequence of biases. So I propose:

Frankly, grouping all exposures of a visit and calling detrend.py externally is probably as good as we can do in any case: the detrend.py overhead will be amortized over the 8 exposures. And finally, I will request that shell2-ics be beefed up some more. This configuration is silly. |
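The multiplying-logger symptom described above (one extra copy of every "actor" line per file processed, in LSST-style formatting) is the classic signature of attaching a fresh handler to the same named logger on every task invocation. A minimal sketch of the failure mode and the usual guard; the handler-per-call setup here is a hypothesis about what the detrend path is doing, not confirmed against the actual code:

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("actor")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo self-contained

# Suspected bug shape: task setup attaches a new handler every call,
# so each subsequent message is emitted once per accumulated handler.
def naive_task_setup():
    log.addHandler(logging.StreamHandler(stream))

for _ in range(3):          # "process three files"
    naive_task_setup()
log.info("reactor dead, cleaning up...")
copies = stream.getvalue().count("reactor dead")
print(copies)               # 3 copies of one log call

# Guarded setup keeps the handler count at one, however often it runs.
stream2 = io.StringIO()
log.handlers.clear()
def guarded_task_setup():
    if not log.handlers:
        log.addHandler(logging.StreamHandler(stream2))

for _ in range(3):
    guarded_task_setup()
log.info("reactor dead, cleaning up...")
guarded_copies = stream2.getvalue().count("reactor dead")
print(guarded_copies)       # 1
```

This would also explain why the duplicates carry LSST-style formatting while the first line keeps the actorcore format: the original handler stays attached, and each extra handler brings its own formatter.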
| Comment by arnaud.lefur [ 16/Feb/21 ] |
|
"Also, detrend was being called on each camera, but without specifying the arm: the second call processed both arms. Fixed."

Agreed, will try to fix. |
| Comment by cloomis [ 16/Feb/21 ] |
|
Merged at 76ee97a. Intended to be a sloppy mess. |