[PIPE2D-1180] profile isrTask reducing n arm Created: 08/Mar/23  Updated: 05/May/23  Resolved: 05/May/23

Status: Done
Project: DRP 2-D Pipeline
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Normal
Reporter: arnaud.lefur Assignee: arnaud.lefur
Resolution: Done Votes: 0
Labels: EngRun
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: PreEng11Apr1

 Description   

When reducing n2 data at LAM, we noticed that it takes quite a long time (~2min30) to get an postISR image, please profile the current code and try to find improvement that would shorten that time.

FYI, that how we get those images :

import numpy as np
from lsst.obs.pfs.isrTask import PfsIsrTask as IsrTask

config = IsrTask.ConfigClass()
config.doBias = False
config.doDark = False
config.doFringe = False
config.doFlat = False
config.doLinearize = False
config.doDefect = True
config.doIPC = True
config.ipcCoeffs = np.array([13e-3, 6e-3])
config.doSaturationInterpolation = False
config.validate()

isrTask = IsrTask(config=config)

raw = butler.get('raw', dataId=dataId)
defects = butler.get('defects', dataId)
calexp = isrTask.run(raw, defects=defects)


 Comments   
Comment by price [ 08/Mar/23 ]

Would you please run that script under python -m cProfile -o profile.dat, and then post the result of running python -c 'from pstats import Stats; Stats("profile.dat").sort_stats("cumulative").print_stats(30)'?

Comment by arnaud.lefur [ 10/Mar/23 ]

Here's the results :

Thu Mar  9 22:56:44 2023    profile.dat

         16660323 function calls (16092265 primitive calls) in 195.285 seconds

   Ordered by: cumulative time
   List reduced from 15070 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   3708/1    0.078    0.000  195.304  195.304 {built-in method builtins.exec}
        1    0.000    0.000  195.303  195.303 test_detrend_n2.py:1(<module>)
        1    0.000    0.000  155.743  155.743 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/isrTask.py:422(run)
        1    0.037    0.037  155.743  155.743 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/isrTask.py:550(runH4RG)
        1    0.197    0.197  148.283  148.283 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/isrTask.py:2497(maskAndInterpolateDefects)
        1    0.421    0.421  145.692  145.692 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/isrFunctions.py:157(interpolateFromMask)
        1    0.000    0.000  141.637  141.637 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/isrFunctions.py:77(interpolateDefectList)
        1  140.737  140.737  140.866  140.866 {built-in method lsst.meas.algorithms.interp.interpolateOverDefects}
        2    0.000    0.000   23.277   11.639 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1377(get)
        2    0.000    0.000   21.831   10.916 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1414(callback)
        3    0.000    0.000   21.279    7.093 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/posixStorage.py:262(read)
        1    0.000    0.000   21.279   21.279 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1409(callback)
        1    0.000    0.000   21.279   21.279 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/butler.py:1583(_read)
        1    0.000    0.000   21.279   21.279 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/repository.py:186(read)
        1    0.000    0.000   21.278   21.278 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/daf_persistence/g6a31054a6e+4ae6138134/python/lsst/daf/persistence/posixStorage.py:565(readFitsStorage)
        1    0.124    0.124   21.278   21.278 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/calibType.py:432(readFits)
        1    4.021    4.021   20.786   20.786 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:492(fromTable)
      3/2    1.035    0.345   15.111    7.556 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:83(__init__)
   3150/8    0.028    0.000   14.570    1.821 <frozen importlib._bootstrap>:986(_find_and_load)
   3132/5    0.022    0.000   14.570    2.914 <frozen importlib._bootstrap>:956(_find_and_load_unlocked)
   4050/4    0.006    0.000   14.556    3.639 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
   3010/6    0.023    0.000   14.556    2.426 <frozen importlib._bootstrap>:650(_load_unlocked)
   2631/6    0.012    0.000   14.556    2.426 <frozen importlib._bootstrap_external>:837(exec_module)
  1063/54    0.004    0.000   14.030    0.260 {built-in method builtins.__import__}
        1    0.000    0.000   11.582   11.582 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/__init__.py:1(<module>)
        1    0.000    0.000   10.386   10.386 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/obs_pfs/w.2023.09/python/lsst/obs/pfs/pfsMapper.py:1(<module>)
3620/1018    0.014    0.000   10.299    0.010 <frozen importlib._bootstrap>:1017(_handle_fromlist)
662458/441639    2.446    0.000    9.557    0.000 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:174(_normalize)
        2    0.000    0.000    7.421    3.710 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:641(fromFootprintList)
        3    7.180    2.393    7.180    2.393 /software/stack/20220609/stack/miniconda3-py38_4.9.2-3.0.0/Linux64/ip_isr/g8142d5ee96+96f555657d/python/lsst/ip/isr/defects.py:261(maskPixels)

Comment by price [ 10/Mar/23 ]

So the tall pole is lsst.meas.algorithms.interp.interpolateOverDefects. There are a lot of defects in the NIR detector, and that function is well-known to be relatively slow.

One option for a significant speed boost would be to simply replace bad pixels with the image median, but it would not be as visually pleasing.

rhl should advise.

Comment by arnaud.lefur [ 07/Apr/23 ]

We will be living with this, at least on short term, so we should probably close this ticket, unless someone wants to chime in.

Comment by cloomis [ 07/Apr/23 ]

How hard would it be to allow median interpolation? Given that it takes > 2min we would certainly turn interpolation off otherwise. rhl?

Comment by rhl [ 07/Apr/23 ]

We can close this once a ticket is filed to address the problem.  In the short term, a median infill of the neighbours is probably OK (and should be fast).  In the longer run, we need better interpolation in all arms – and this is true for Rubin too.  Doug FInkbeiner claims to have fast 2-D  GP interpolation (in julia ), but whether it'd be fast enough here I don't know (the current interpolation is a simplified 1-D GP)

Comment by arnaud.lefur [ 11/Apr/23 ]

so I did check what scipy.ndimage has to offer, their median_filter is quick, ~3 secs, but it's not doing great with NaNs or masked array.
I rewrote a lousy algorithm, using np.nanmedian and np.ma.median but I couldn't reproduce their results.

Comment by arnaud.lefur [ 11/Apr/23 ]

I've added a median interpolation function, just using numpy, which runs in ~30sec.
Results looks okay, although in my case, the cosmic rays detection bumped from 5k to 15k.
5k was already too much to be real, the settings need to be tweaked in any case.

I've also noticed that the final rotation was taking ~4s, using afwMath.rotateImageBy90 but it can be reduced to few millisec by manipulating the array dimension directly.

Comment by rhl [ 12/Apr/23 ]
ny, nx = im.shape
cube = np.zeros((9, ny - 2, nx - 2))

i = 0
cube[i] = im[0:-2, 0:-2]; i += 1
cube[i] = im[0:-2, 1:-1]; i += 1
cube[i] = im[0:-2, 2:];   i += 1

cube[i] = im[1:-1, 0:-2]; i += 1
cube[i] = im[1:-1, 1:-1]; i += 1
cube[i] = im[1:-1, 2:];   i += 1

cube[i] = im[2:, 0:-2]; i += 1
cube[i] = im[2:, 1:-1]; i += 1
cube[i] = im[2:, 2:];   i += 1

med = np.median(cube, axis=0)

im[1:-1, 1:-1] = med

I think that this perform a 3x3 median filter; it takes c. 1.1s for a 4k x 4k image. How does this compare to your code?

Also, I worry that just flipping the definition of the array rather than flipping the pixels will cost us in cache when we actually use the image for anything; but maybe not.

Comment by arnaud.lefur [ 13/Apr/23 ]

I tested your code, which I modified slightly for the edge cases, but it is faster indeed, because less dimension probably, constructing a (9, 4096, 4096) vs (4096, 4096, 3, 3) array.

Full task run now in 15s instead 30s on LAM machine.

Generated at Sat Feb 10 16:03:54 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.