- Type: Task
- Status: Done
- Priority: Normal
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels:
FitPfsFluxReferenceTask calls FitContinuumTask so many times that FitContinuumTask accounts for one of the largest shares of FitPfsFluxReferenceTask's execution time. Most of FitContinuumTask's execution time is spent in binData(). We processed visit=92114 to get the following profile:
ncalls   tottime  percall   cumtime   percall  filename:lineno(function)
     1     0.014    0.014  5329.600  5329.600  fitPfsFluxReference.py:292(run)
 38657     8.654    0.000  1604.840     0.042  fitContinuum.py:83(fitContinuum)
154628    59.177    0.000  1002.344     0.006  fitContinuum.py:264(binData)
We can make binData() faster by calling np.median() only once. Here is the profile after this modification:
     1     0.019    0.019  4658.235  4658.235  fitPfsFluxReference.py:292(run)
 38657     8.498    0.000   942.938     0.024  fitContinuum.py:88(fitContinuum)
154628    11.449    0.000   361.209     0.002  fitContinuum.py:271(binData)
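The idea of replacing many np.median() calls with one can be sketched as follows. This is only an illustration, not the code in fitContinuum.py: the function names bin_medians_loop and bin_medians_single_call are hypothetical, and the sketch assumes the array length is an exact multiple of the number of bins, so that the data can be reshaped into one row per bin.

```python
import numpy as np


def bin_medians_loop(values, num_bins):
    """Hypothetical baseline: one np.median() call per bin."""
    values = np.asarray(values, dtype=float)
    bin_size = len(values) // num_bins
    return np.array([
        np.median(values[i * bin_size:(i + 1) * bin_size])
        for i in range(num_bins)
    ])


def bin_medians_single_call(values, num_bins):
    """Hypothetical variant: a single np.median() call for all bins.

    Assumption: len(values) is divisible by num_bins. Uneven bins
    need extra handling that this sketch does not attempt.
    """
    values = np.asarray(values, dtype=float)
    if len(values) % num_bins != 0:
        raise ValueError("this sketch only handles evenly divisible lengths")
    # One row per bin, then the median along each row in one call.
    return np.median(values.reshape(num_bins, -1), axis=1)


values = np.arange(96, dtype=float)
loop_result = bin_medians_loop(values, 8)
single_result = bin_medians_single_call(values, 8)
```

Both functions return the same medians for evenly divisible input. The single-call version avoids the per-bin Python loop, which is where the per-call overhead of np.median() accumulates.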
This modification saves about 10 minutes per visit with virtually no impact on the pipeline's outputs. I say "virtually" because my binning strategy differs from the original code's when numBins does not divide the array length. In many cases the two binnings agree, but they differ when, say, the array length is 100 and numBins = 8. binData(np.arange(100).astype(float), numBins=8) produces the following results:
original: [ 6.  18.5 31.  43.5 56.  68.5 81.  93.5]
mine:     [ 6.  18.5 31.  43.5 55.5 68.  80.5 93. ]
I believe mine is better because the "*.0" and "*.5" elements are distributed symmetrically in its output.
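The difference between the two results can be reproduced with a small sketch. The bin edges below are assumptions: they are hard-coded to match the outputs quoted above and are not taken from either version of binData(). The helper median_per_bin is likewise hypothetical.

```python
import numpy as np


def median_per_bin(values, edges):
    """Median of values[lo:hi] for each consecutive pair of edges."""
    return np.array([
        np.median(values[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])
    ])


values = np.arange(100, dtype=float)

# Assumed edges, inferred from the quoted outputs (not from the source code).
edges_original = [0, 13, 25, 38, 50, 63, 75, 88, 100]   # bin sizes 13,12,13,12,13,12,13,12
edges_symmetric = [0, 13, 25, 38, 50, 62, 75, 87, 100]  # bin sizes 13,12,13,12,12,13,12,13

original = median_per_bin(values, edges_original)
symmetric = median_per_bin(values, edges_symmetric)

# Values: 6, 18.5, 31, 43.5, 56, 68.5, 81, 93.5
print(original)
# Values: 6, 18.5, 31, 43.5, 55.5, 68, 80.5, 93
print(symmetric)

# With mirror-symmetric edges, each bin median and its mirror image
# sum to first + last element of the input (0 + 99 here).
mirror_sums = symmetric + symmetric[::-1]
```

With the symmetric edges, the bin sizes read the same forwards and backwards, so every entry of mirror_sums equals 99. The original edges do not have this property (for example 6 + 93.5 = 99.5).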