[PIPE2D-1180] profile isrTask reducing n arm Created: 08/Mar/23 Updated: 05/May/23 Resolved: 05/May/23 |
|
| Status: | Done |
| Project: | DRP 2-D Pipeline |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Normal |
| Reporter: | arnaud.lefur | Assignee: | arnaud.lefur |
| Resolution: | Done | Votes: | 0 |
| Labels: | EngRun | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | PreEng11Apr1 |
| Description |
|
When reducing n2 data at LAM, we noticed that it takes quite a long time (~2min30) to get an postISR image, please profile the current code and try to find improvement that would shorten that time. FYI, that how we get those images : import numpy as np
from lsst.obs.pfs.isrTask import PfsIsrTask as IsrTask
config = IsrTask.ConfigClass()
config.doBias = False
config.doDark = False
config.doFringe = False
config.doFlat = False
config.doLinearize = False
config.doDefect = True
config.doIPC = True
config.ipcCoeffs = np.array([13e-3, 6e-3])
config.doSaturationInterpolation = False
config.validate()
isrTask = IsrTask(config=config)
raw = butler.get('raw', dataId=dataId)
defects = butler.get('defects', dataId)
calexp = isrTask.run(raw, defects=defects)
|
| Comments |
| Comment by price [ 08/Mar/23 ] |
|
Would you please run that script under python -m cProfile -o profile.dat, and then post the result of running python -c 'from pstats import Stats; Stats("profile.dat").sort_stats("cumulative").print_stats(30)'? |
| Comment by arnaud.lefur [ 10/Mar/23 ] |
|
Here's the results : Thu Mar 9 22:56:44 2023 profile.dat
16660323 function calls (16092265 primitive calls) in 195.285 seconds
Ordered by: cumulative time
List reduced from 15070 to 30 due to restriction <30>
ncalls tottime percall cumtime percall filename:lineno(function)
3708/1 0.078 0.000 195.304 195.304 {built-in method builtins.exec}
1 0.000 0.000 195.303 195.303 test_detrend_n2.py:1(<module>)
1 0.000 0.000 155.743 155.743 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/isrTask.py:422(run)
1 0.037 0.037 155.743 155.743 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/isrTask.py:550(runH4RG)
1 0.197 0.197 148.283 148.283 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/isrTask.py:2497(maskAndInterpolateDefects)
1 0.421 0.421 145.692 145.692 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/isrFunctions.py:157(interpolateFromMask)
1 0.000 0.000 141.637 141.637 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/isrFunctions.py:77(interpolateDefectList)
1 140.737 140.737 140.866 140.866 {built-in method lsst.meas.algorithms.interp.interpolateOverDefects}
2 0.000 0.000 23.277 11.639 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1377(get)
2 0.000 0.000 21.831 10.916 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1414(callback)
3 0.000 0.000 21.279 7.093 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/posixStorage.py:262(read)
1 0.000 0.000 21.279 21.279 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1409(callback)
1 0.000 0.000 21.279 21.279 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1583(_read)
1 0.000 0.000 21.279 21.279 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/repository.py:186(read)
1 0.000 0.000 21.278 21.278 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/posixStorage.py:565(readFitsStorage)
1 0.124 0.124 21.278 21.278 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/calibType.py:432(readFits)
1 4.021 4.021 20.786 20.786 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:492(fromTable)
3/2 1.035 0.345 15.111 7.556 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:83(__init__)
3150/8 0.028 0.000 14.570 1.821 <frozen importlib._bootstrap>:986(_find_and_load)
3132/5 0.022 0.000 14.570 2.914 <frozen importlib._bootstrap>:956(_find_and_load_unlocked)
4050/4 0.006 0.000 14.556 3.639 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
3010/6 0.023 0.000 14.556 2.426 <frozen importlib._bootstrap>:650(_load_unlocked)
2631/6 0.012 0.000 14.556 2.426 <frozen importlib._bootstrap_external>:837(exec_module)
1063/54 0.004 0.000 14.030 0.260 {built-in method builtins.__import__}
1 0.000 0.000 11.582 11.582 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/__init__.py:1(<module>)
1 0.000 0.000 10.386 10.386 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/pfsMapper.py:1(<module>)
3620/1018 0.014 0.000 10.299 0.010 <frozen importlib._bootstrap>:1017(_handle_fromlist)
662458/441639 2.446 0.000 9.557 0.000 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:174(_normalize)
2 0.000 0.000 7.421 3.710 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:641(fromFootprintList)
3 7.180 2.393 7.180 2.393 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:261(maskPixels)
|
| Comment by price [ 10/Mar/23 ] |
|
So the tall pole is lsst.meas.algorithms.interp.interpolateOverDefects. There are a lot of defects in the NIR detector, and that function is well-known to be relatively slow. One option for a significant speed boost would be to simply replace bad pixels with the image median, but it would not be as visually pleasing. rhl should advise. |
| Comment by arnaud.lefur [ 07/Apr/23 ] |
|
We will be living with this, at least on short term, so we should probably close this ticket, unless someone wants to chime in. |
| Comment by cloomis [ 07/Apr/23 ] |
|
How hard would it be to allow median interpolation? Given that it takes > 2min we would certainly turn interpolation off otherwise. rhl? |
| Comment by rhl [ 07/Apr/23 ] |
|
We can close this once a ticket is filed to address the problem. In the short term, a median infill of the neighbours is probably OK (and should be fast). In the longer run, we need better interpolation in all arms – and this is true for Rubin too. Doug FInkbeiner claims to have fast 2-D GP interpolation (in julia |
| Comment by arnaud.lefur [ 11/Apr/23 ] |
|
so I did check what scipy.ndimage has to offer, their median_filter is quick, ~3 secs, but it's not doing great with NaNs or masked array. |
| Comment by arnaud.lefur [ 11/Apr/23 ] |
|
I've added a median interpolation function, just using numpy, which runs in ~30sec. I've also noticed that the final rotation was taking ~4s, using afwMath.rotateImageBy90 but it can be reduced to few millisec by manipulating the array dimension directly. |
| Comment by rhl [ 12/Apr/23 ] |
ny, nx = im.shape cube = np.zeros((9, ny - 2, nx - 2)) i = 0 cube[i] = im[0:-2, 0:-2]; i += 1 cube[i] = im[0:-2, 1:-1]; i += 1 cube[i] = im[0:-2, 2:]; i += 1 cube[i] = im[1:-1, 0:-2]; i += 1 cube[i] = im[1:-1, 1:-1]; i += 1 cube[i] = im[1:-1, 2:]; i += 1 cube[i] = im[2:, 0:-2]; i += 1 cube[i] = im[2:, 1:-1]; i += 1 cube[i] = im[2:, 2:]; i += 1 med = np.median(cube, axis=0) im[1:-1, 1:-1] = med I think that this perform a 3x3 median filter; it takes c. 1.1s for a 4k x 4k image. How does this compare to your code? Also, I worry that just flipping the definition of the array rather than flipping the pixels will cost us in cache when we actually use the image for anything; but maybe not. |
| Comment by arnaud.lefur [ 13/Apr/23 ] |
|
I tested your code, which I modified slightly for the edge cases, but it is faster indeed, because less dimension probably, constructing a (9, 4096, 4096) vs (4096, 4096, 3, 3) array. Full task run now in 15s instead 30s on LAM machine. |