[PIPE2D-1145] Make fitPfsFluxReference.py faster Created: 18/Jan/23  Updated: 27/Apr/23  Resolved: 27/Jan/23

Status: Done
Project: DRP 2-D Pipeline
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Normal
Reporter: sogo.mineo Assignee: sogo.mineo
Resolution: Done Votes: 0
Labels: flux-calibration
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File affected_fluxstd.png     PNG File time_threshold.png    
Issue Links:
Relates
relates to PIPE2D-1208 Make fitPfsFluxReference even faster Done
Reviewers: price

 Description   

The current fitPfsFluxReference.py is slow.
It takes a few hours to process a single visit.

We aim at 1 hour/visit for now,
though we are not sure whether even that is acceptable.

Here is the profiler output from processing the integration test:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000 1981.366 1981.366 cmdLineTask.py:621(parseAndRun)
 12088    1.538    0.000 1360.010    0.113 fitPfsFluxReference.py:427(computeContinuum)
 48326    0.515    0.000  318.018    0.007 fitPfsFluxReference.py:981(convolveLsf)

(Because the integration test contains two visits, these figures must be
divided by 2 to get per-visit values. They must then be multiplied by 10
because the "6k" model set is used in the integration test, whereas the
"60k" model set is used in actual data processing.)
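(For example, computeContinuum's cumtime scales to 1360.010 / 2 × 10 ≈ 6800
seconds ≈ 1.9 hours per visit, which by itself already exceeds the
1 hour/visit goal.)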

The two hot spots are:

  • computeContinuum (Fit a continuum to a model spectrum)
  • convolveLsf (Convolve a model spectrum with an LSF)

We already have a mechanism to skip these two calls when they are unnecessary:
"If prior[model] / max(prior) <= th, then skip computing likelihood[model]." (th = 1e-8)
The prior probability distribution is computed from broad-band fluxes.
Since the integration test contains only a single broad-band flux (i2_hsc),
the prior is the uniform distribution. The mechanism therefore has no effect
in the integration test, but it does appear to save time
in actual data processing.
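
For illustration, here is a minimal sketch of the skip rule (the function
name and array layout are assumptions for this sketch, not the task's
actual code):

    import numpy as np

    def selectCandidates(prior, threshold=1e-8):
        """Return indices of models worth computing the likelihood for.

        ``prior`` is a 1-D array of prior probabilities over the model grid.
        Models with prior[model] / max(prior) <= threshold are skipped.
        """
        return np.nonzero(prior / np.max(prior) > threshold)[0]

Only the returned candidates go through computeContinuum and convolveLsf;
every other model's likelihood computation is skipped.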

We have 85 prior probability distributions as --debug by-products
obtained from processing visit=82596. Using these distributions,
we can count how many calls to the two functions would happen if
we set the threshold th to various values.
From the profiler output, we can then estimate the execution time
of fitPfsFluxReference as a function of the threshold.
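
A sketch of this counting, with random stand-ins for the saved prior
arrays (every name here is hypothetical):

    import numpy as np

    def countSurvivingModels(priors, thresholds):
        """For each threshold, count the (fiber, model) pairs that pass
        the cut, i.e. how many times computeContinuum would be called."""
        return {th: sum(np.count_nonzero(p / p.max() > th) for p in priors)
                for th in thresholds}

    # Stand-ins for the 85 prior distributions; the real ones are the
    # --debug by-products from visit=82596.
    rng = np.random.default_rng(0)
    priors = [rng.dirichlet(np.ones(57141)) for _ in range(85)]
    print(countSurvivingModels(priors, [1e-8, 1e-4, 1e-2]))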

We can see from this plot (time_threshold.png, attached) that we have to set
th=0.01 or above if we want the per-visit execution time to be less than an hour.

One concern is that, with a threshold that large, the model that would have
been chosen in the absence of a threshold may be discarded prematurely.
We can examine whether a FLUXSTD fiber would be affected by the threshold
by looking at prior[argmax(posterior)] / max(prior).
(The posterior distributions are also --debug by-products obtained from
processing visit=82596.)
If prior[argmax(posterior)] / max(prior) is less than the threshold,
the best model (argmax(posterior)) will not be selected when th is set
to that value.
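
A sketch of this diagnostic (the function name is hypothetical):

    import numpy as np

    def bestModelPriorRatio(prior, posterior):
        """prior[argmax(posterior)] / max(prior).

        If this ratio is below th, the model that the posterior would
        select is skipped before its likelihood is ever computed.
        """
        return prior[np.argmax(posterior)] / np.max(prior)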

It appears from affected_fluxstd.png (attached) that the samples below 0.01
are outliers, which can be neglected.
(More than 40% of FLUXSTD fibers are outliers at present, but that is
a separate problem.)

In conclusion, we will:

  • Modify the program to make the threshold a config parameter (a sketch follows this list).
  • Change the default threshold from 1e-8 to 0.01.
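
A minimal sketch of what the config change could look like, assuming the
task's config uses lsst.pex.config like other CmdLineTask configs; the
class and field names here are hypothetical, not necessarily those in
the actual commit:

    from lsst.pex.config import Config, Field

    class FitPfsFluxReferenceConfig(Config):
        # Hypothetical field name; default changed from 1e-8 to 1e-2.
        modelPriorCutoff = Field(
            dtype=float,
            default=1e-2,
            doc="Skip computing the likelihood of a model if "
                "prior[model] / max(prior) <= this value.",
        )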


 Comments   
Comment by sogo.mineo [ 19/Jan/23 ]

I copied Yamashita-san's files and ran fitPfsFluxReference.py on them to profile it on real data. I set the threshold to 0.01.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 10937.974 10937.974 cmdLineTask.py:621(parseAndRun)
    28655    5.009    0.000 4212.784    0.147 fitPfsFluxReference.py:435(computeContinuum)
    57141   18.758    0.000 3688.403    0.065 fluxModelSet.py:120(readSpectrum)
  4849760 1056.633    0.000 1644.092    0.000 pfsFiberArraySet.py:366(extractFiber)
    87918    1.490    0.000  806.743    0.009 fitPfsFluxReference.py:989(convolveLsf)

It still took three hours. convolveLsf is no longer problematic with threshold=0.01, since its cumtime was much less than that of extractFiber.

A new hot spot is readSpectrum, a function that reads a model template from disk storage. I have yet to find out why readSpectrum is called twice as many times as computeContinuum.

Comment by sogo.mineo [ 20/Jan/23 ]

I reduced the number of calls to readSpectrum and extractFiber:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 8893.141 8893.141 cmdLineTask.py:621(parseAndRun)
    28655    4.785    0.000 4416.048    0.154 fitPfsFluxReference.py:435(computeContinuum)
    28655   11.147    0.000 3055.535    0.107 fluxModelSet.py:120(readSpectrum)
    87918    1.432    0.000  874.030    0.010 fitPfsFluxReference.py:995(convolveLsf)
      170    0.086    0.001    0.114    0.001 pfsFiberArraySet.py:366(extractFiber)

Cumtime per call of readSpectrum increased because the storage was under heavy load during the profiling.
Had the storage been idle, the total execution time would have been about 2 hours.
I have to reduce it by another hour.

computeContinuum was called 28655 times because 28655 of the 57141 flux
templates were possible candidates for at least one of the 85 FLUXSTD fibers.
I think I can do without computeContinuum by pre-computing the continua
of all the flux templates with several typical Gaussian LSFs, saving them
to files, and reading those files instead of calling computeContinuum
(see the sketch below).
It may seem hopeless because readSpectrum (file reads) took almost as long
as computeContinuum, but we should recall that the storage was under
heavy load.
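
A minimal sketch of such a continuum cache (all names are hypothetical,
and the actual continuum computation is elided as a callback):

    import os
    import numpy as np

    def getContinuum(templateId, lsfSigma, cacheDir, compute):
        """Read a pre-computed continuum from cacheDir if present;
        otherwise compute it (for a Gaussian LSF of width lsfSigma,
        quantized to one of a few typical values) and cache it.
        """
        path = os.path.join(cacheDir, f"continuum-{templateId}-{lsfSigma:.2f}.npy")
        if os.path.exists(path):
            return np.load(path)
        continuum = compute(templateId, lsfSigma)
        np.save(path, continuum)
        return continuum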

Another idea is to reduce the number of interpolation points in the model
parameter space: if the number of flux templates is halved, the execution
time of fitPfsFluxReference will also be halved.

Comment by sogo.mineo [ 24/Jan/23 ]

Yamashita-san is against sacrificing any accuracy for the sake of speed at this point. Since the commits on this ticket branch increase speed without changing the output, I want to merge the ticket branch before closing this ticket (as "Won't Fix"?).

Comment by Takuji Yamashita [ 24/Jan/23 ]

Because we do not yet have any firm requirements on processing time, I do not think we need to reduce it to the point of affecting accuracy. This ticket adds the cutoff threshold as an option, and the speed increases without the outputs changing, as Mineo-san said. I agree to merging this ticket.

Comment by sogo.mineo [ 26/Jan/23 ]

To make extra sure, I restored the threshold to 1e-8, keeping the other changes, and took a profile again:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000 14380.434 14380.434 cmdLineTask.py:621(parseAndRun)
 45006    7.391    0.000 6854.570    0.152 fitPfsFluxReference.py:436(computeContinuum)
 45006   18.192    0.000 4610.827    0.102 fluxModelSet.py:120(readSpectrum)
241737    3.420    0.000 2070.246    0.009 fitPfsFluxReference.py:996(convolveLsf)

Compared to the threshold-1e-2 case, the total execution time was 1.6 times longer (14380 s vs. 8893 s).
A threshold as high as 1e-2 proved indispensable.

I'm making a pull request.

Comment by sogo.mineo [ 26/Jan/23 ]

Could you review this PR? The execution time has been almost halved, though the original aim (execution time <~ 1 hour/visit) has not been achieved. I cannot reduce the execution time any further without sacrificing accuracy. I don't know which resolution this issue should get, "Done" or "Won't Fix", but I want to merge the branch to master anyway.

Comment by sogo.mineo [ 27/Jan/23 ]

Merged. Thank you for the review. Let me close this issue as "Done".
