- Type: Task
- Status: Done
- Priority: Normal
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels:

The current fitPfsFluxReference.py is slow:
it takes a few hours to process a single visit.
For now we aim at 1 hour/visit,
though we are not yet sure whether that is acceptable.
Below is the profiler output from processing the integration test.
ncalls  tottime  percall   cumtime   percall  filename:lineno(function)
     1    0.000    0.000  1981.366  1981.366  cmdLineTask.py:621(parseAndRun)
 12088    1.538    0.000  1360.010     0.113  fitPfsFluxReference.py:427(computeContinuum)
 48326    0.515    0.000   318.018     0.007  fitPfsFluxReference.py:981(convolveLsf)
(Because the integration test contains two visits, these times must be divided by 2
to get per-visit values. They must then be multiplied by 10, because the integration test
uses the "6k" model set, whereas actual data processing uses the "60k" model set.)
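As a worked version of this scaling, the short sketch below converts the cumulative times in the profiler output into estimated per-visit hours for actual data processing. It assumes, as the note above does, that every time scales linearly with the size of the model set; the function and constant names are illustrative only.

```python
# Cumulative times (seconds) copied from the profiler output above.
cumtime_s = {
    "parseAndRun": 1981.366,
    "computeContinuum": 1360.010,
    "convolveLsf": 318.018,
}

VISITS_IN_TEST = 2    # the integration test processes two visits
MODEL_SET_RATIO = 10  # "60k" model set (actual data) vs. "6k" model set (test)


def per_visit_hours(seconds: float) -> float:
    """Scale an integration-test time to an estimated per-visit time in hours."""
    return seconds / VISITS_IN_TEST * MODEL_SET_RATIO / 3600.0


estimates = {name: per_visit_hours(t) for name, t in cumtime_s.items()}
for name, hours in estimates.items():
    print(f"{name}: {hours:.2f} h/visit")
```

Under these assumptions the total comes to about 2.75 hours per visit, of which roughly 1.89 hours is computeContinuum and 0.44 hours is convolveLsf, consistent with the "few hours" observed.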
The two hot spots are:
- computeContinuum (Fit a continuum to a model spectrum)
- convolveLsf (Convolve a model spectrum with an LSF)
We already have a mechanism to skip these two calls when they are unnecessary:
"If prior[model] / max(prior) <= th, then skip computing likelihood[model]." (th = 1e-8)
The prior probability distribution is computed from broad-band fluxes.
Since the integration test contains only a single broad-band flux (i2_hsc),
the prior is uniform. The mechanism therefore has no effect
in the integration test, but it does appear to save time
in actual data processing.
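The skip rule can be illustrated with a minimal sketch. This is not the code in fitPfsFluxReference.py; the function name `select_models` and the example prior values are hypothetical, and only the rule itself (skip when prior[model] / max(prior) <= th) comes from the description above.

```python
import numpy as np


def select_models(prior: np.ndarray, th: float = 1e-8) -> np.ndarray:
    """Return indices of models whose likelihood would still be computed.

    A model is skipped when prior[model] / max(prior) <= th.
    """
    ratio = prior / prior.max()
    return np.flatnonzero(ratio > th)


# Uniform prior, as in the integration test: no model is skipped.
uniform = np.full(6, 1.0 / 6)
kept_uniform = select_models(uniform)

# Peaked prior (hypothetical values; normalization does not matter
# because only the ratio to the maximum is used).
peaked = np.array([0.9, 0.09, 0.0045, 1e-4, 1e-10, 0.0])
kept_default = select_models(peaked, th=1e-8)  # drops only the last two models
kept_strict = select_models(peaked, th=0.01)   # keeps only the first two models
```

With a uniform prior every ratio equals 1, so no threshold below 1 removes anything, which is why the integration test cannot show the saving.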
We have 85 prior probability distributions, obtained as --debug by-products
of processing visit=82596. Using these distributions,
we can count how many calls to the two functions would be made
for various values of the threshold th.
Combining these counts with the profiler output, we can then estimate
the execution time of fitPfsFluxReference as a function of the threshold.
The resulting plot shows that we have to set th=0.01 or above
if we want the per-visit execution time to be less than an hour.
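The counting procedure might look roughly like the sketch below. The priors here are small made-up arrays, not the 85 distributions from visit=82596, and the cost model is an assumption: it takes one computeContinuum call per evaluated model and about four convolveLsf calls per computeContinuum call (the ratio of call counts in the profiler output), and it ignores all time spent outside these two functions.

```python
import numpy as np


def count_evaluated_models(priors, thresholds):
    """For each threshold, count models passing the prior cut, summed over fibers."""
    counts = []
    for th in thresholds:
        n = sum(int(np.count_nonzero(p / p.max() > th)) for p in priors)
        counts.append(n)
    return np.array(counts)


# Per-call costs derived from the profiler output (cumtime / ncalls).
T_CONTINUUM = 1360.010 / 12088  # seconds per computeContinuum call
T_CONVOLVE = 318.018 / 48326    # seconds per convolveLsf call
# Assumption: convolveLsf calls per evaluated model, from the profiler's call counts.
CONVOLVE_PER_MODEL = 48326 / 12088


def estimate_seconds(n_models):
    """Estimated time in the two hot spots for n_models evaluated models."""
    return n_models * (T_CONTINUUM + CONVOLVE_PER_MODEL * T_CONVOLVE)


# Two made-up prior distributions standing in for the real per-fiber priors.
priors = [
    np.array([1.0, 0.5, 1e-3, 1e-9]),
    np.array([1.0, 1e-5, 1e-5, 1e-12]),
]
thresholds = [1e-8, 1e-4, 1e-2]

counts = count_evaluated_models(priors, thresholds)
seconds = estimate_seconds(counts)
```

Raising the threshold can only reduce the number of evaluated models, so the estimated time is non-increasing in th.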
One concern is that, with such a large threshold, the model that
would otherwise have been chosen may be discarded prematurely.
We can examine whether a FLUXSTD fiber will be affected by the threshold
by looking at prior[argmax(posterior)] / max(prior).
(The posterior distributions are also --debug by-products obtained from
processing visit=82596.)
If prior[argmax(posterior)] / max(prior) is below a given threshold value,
the best model (argmax(posterior)) will not be selected when th is set
to that value.
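This check can be sketched per fiber as follows. The function names and the example arrays are hypothetical; the comparison uses `<=` to match the skip rule quoted earlier (prior[model] / max(prior) <= th).

```python
import numpy as np


def best_model_prior_ratio(prior: np.ndarray, posterior: np.ndarray) -> float:
    """Return prior[argmax(posterior)] / max(prior) for one fiber."""
    best = int(np.argmax(posterior))
    return float(prior[best] / prior.max())


def is_affected(prior: np.ndarray, posterior: np.ndarray, th: float) -> bool:
    """True if threshold th would skip the model that maximizes the posterior."""
    return best_model_prior_ratio(prior, posterior) <= th


# Made-up example: three models for one fiber.
prior = np.array([1.0, 0.5, 0.002])
posterior_a = np.array([0.1, 0.8, 0.1])  # best model has a high prior
posterior_b = np.array([0.1, 0.1, 0.8])  # best model has a low prior

ratio_a = best_model_prior_ratio(prior, posterior_a)
ratio_b = best_model_prior_ratio(prior, posterior_b)
```

In this example the second fiber would lose its best model at th=0.01 but not at th=1e-8, which is the situation the concern above describes.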
It appears that the samples below 0.01 are outliers, which can be neglected.
(More than 40% of FLUXSTD fibers are currently outliers, but that is a separate problem.)
In conclusion, we will:
- Modify the program to make the threshold a config parameter.
- Change the default threshold from 1e-8 to 0.01.
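As a rough illustration of the two changes, the sketch below exposes the threshold as a setting with the new default. It uses a plain dataclass as a stand-in for whatever config framework the pipeline actually uses; the class name `FitFluxReferenceSettings` and the field name `prior_cutoff` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FitFluxReferenceSettings:  # hypothetical name, not the pipeline's config class
    # Hypothetical field name. Models with prior / max(prior) <= prior_cutoff
    # are skipped. The previous hard-coded value was 1e-8.
    prior_cutoff: float = 0.01

    def __post_init__(self):
        # A cutoff of 1 or more would skip even the model with the maximum
        # prior (its ratio is exactly 1), leaving nothing to fit.
        if not 0.0 <= self.prior_cutoff < 1.0:
            raise ValueError("prior_cutoff must satisfy 0 <= prior_cutoff < 1")


default_settings = FitFluxReferenceSettings()
legacy_settings = FitFluxReferenceSettings(prior_cutoff=1e-8)
```

Keeping the old value reachable through the setting makes it possible to reproduce earlier results when needed.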
- relates to: PIPE2D-1208 Make fitPfsFluxReference even faster (Done)