- Type: Task
- Status: Done
- Priority: Normal
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels:

We are trying to make fitPfsFluxReference.py faster and to reduce its memory usage. Here is what we have found:
(A) An RBFInterpolator instance with a finite number of neighbors does not hold a precomputed solution of the interpolation problem; it solves the problem anew every time it interpolates a spectrum, because the set of neighbors from which the interpolated spectrum is estimated depends on the parameters at which it is evaluated. Hence memory usage is halved, but the computation time is not reduced much. For example, RBFInterpolator with neighbors=None interpolates a spectrum in 600 ms, and with neighbors=128 in 300 ms. With fewer neighbors, we are afraid the interpolation quality may degrade.
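The trade-off in (A) can be illustrated with a toy grid (the real grid and spectra come from the flux model data; the arrays below are random stand-ins):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Toy stand-in for the model grid: 3 stellar parameters -> flux vector.
rng = np.random.default_rng(0)
params = rng.uniform(0.0, 1.0, size=(500, 3))   # (n_models, n_params)
fluxes = rng.normal(size=(500, 100))            # (n_models, n_wavelengths)

# neighbors=None solves the full interpolation problem once at
# construction, so each call is cheap but memory scales with the grid.
global_interp = RBFInterpolator(params, fluxes, neighbors=None)

# With a finite `neighbors`, a local subproblem is re-solved on every
# call, because the neighbor set depends on the query point.
local_interp = RBFInterpolator(params, fluxes, neighbors=128)

query = np.array([[0.5, 0.5, 0.5]])
flux_a = global_interp(query)
flux_b = local_interp(query)
print(flux_a.shape, flux_b.shape)  # both (1, 100)
```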
(B) If we adopt something like the gradient descent method to find max(probability(model parameters)), then we have to interpolate model spectra dynamically at whatever parameters the optimizer requests at run time. It follows from (A) that we pay 600 ms (neighbors=None) or 300 ms (neighbors=128) for each search step.
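The cost structure in (B) can be sketched as follows. The interpolator and likelihood here are hypothetical stand-ins (in the real task the interpolation would be an RBFInterpolator call costing 300-600 ms, invoked once per optimizer step):

```python
import numpy as np
from scipy.optimize import minimize

def interpolate_model(theta):
    # Hypothetical "spectrum" that varies smoothly with the parameters;
    # stands in for the expensive RBF interpolation of a model spectrum.
    x = np.linspace(0.0, 1.0, 50)
    return np.exp(-((x - theta[0]) ** 2) / (2 * theta[1] ** 2 + 1e-6))

observed = interpolate_model(np.array([0.4, 0.2]))

def negative_log_prob(theta):
    # Each evaluation triggers one interpolation; with the real grid
    # this is where the 300-600 ms per step would be spent.
    model = interpolate_model(theta)
    return float(np.sum((model - observed) ** 2))

result = minimize(negative_log_prob, x0=np.array([0.6, 0.3]),
                  method="Nelder-Mead")
print(result.x)  # should be close to [0.4, 0.2]
```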
(C) If we keep brute-forcing the optimal parameters, we can still make fitPfsFluxReference.py run in less than 1 hour/visit with the full fluxmodeldata set:
(C.1) Pack all the FITS files in fluxmodeldata/spectra/ into a single uncompressed zip file. With this change, the profiler reports:
ncalls   tottime  percall    cumtime    percall  filename:lineno(function)
     1     0.000    0.000  12315.259  12315.259  cmdLineTask.py:621(parseAndRun)
114124     9.324    0.000  10226.402      0.090  fitPfsFluxReference.py:501(computeContinuum)
228238     2.036    0.000    942.012      0.004  fitPfsFluxReference.py:1086(convolveLsf)
114124     7.996    0.000    867.236      0.008  fluxModelSet.py:126(readSpectrum)
Previously (see PIPE2D-1145), readSpectrum consumed about as much time as computeContinuum, but its cumtime is now negligible compared to computeContinuum.
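The packing step in (C.1) can be sketched with the standard zipfile module: storing many small files in one uncompressed archive turns each readSpectrum call into a cheap seek-and-read within a single open file instead of a separate filesystem open. File names and contents below are illustrative; the actual packing is done by packFluxModelData.py.

```python
import io
import zipfile

# Build an uncompressed (ZIP_STORED) archive of "FITS" files in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
    for name in ["spec_0001.fits", "spec_0002.fits"]:
        zf.writestr(name, b"dummy FITS bytes")

# Reading back: one open ZipFile handle serves random access to members.
with zipfile.ZipFile(buf) as zf:
    data = zf.read("spec_0002.fits")
print(data)
```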
(C.2) Downsample the model spectra before feeding them into computeContinuum. The model spectra are sampled very finely so as to trace their very narrow LSF, but we can safely downsample them after convolving them with the wider LSF from the observation. We then get the following result:
ncalls   tottime  percall   cumtime   percall  filename:lineno(function)
     1     0.000    0.000  6927.600  6927.600  cmdLineTask.py:621(parseAndRun)
114124     6.386    0.000  4865.802     0.043  fitPfsFluxReference.py:517(computeContinuum)
228238     1.725    0.000   917.864     0.004  fitPfsFluxReference.py:1148(convolveLsf)
114124     7.921    0.000   625.456     0.005  fluxModelSet.py:126(readSpectrum)
This cumtime (7000 sec) is the time to process the two visits in the integration test; a single visit takes only 3500 sec. Furthermore, the flux standards in the integration test each have only one broadband flux, so no model parameters can be cut off by their prior probability. In a practical situation, where the cut-off mechanism works, we expect an even shorter execution time.
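The idea of (C.2), convolve first, then downsample, can be sketched as below. The kernel width and the downsampling factor are illustrative numbers, not the ones used in fitPfsFluxReference.py:

```python
import numpy as np

# Finely sampled toy model spectrum.
n = 10000
wavelength = np.linspace(400.0, 900.0, n)
flux = 1.0 + 0.1 * np.sin(wavelength)

# Gaussian observational LSF kernel, much wider than the native sampling.
sigma_pix = 20.0
half = int(5 * sigma_pix)
x = np.arange(-half, half + 1)
kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
kernel /= kernel.sum()

convolved = np.convolve(flux, kernel, mode="same")

# After convolution the spectrum is band-limited by the wide LSF, so
# keeping every 8th sample loses essentially no information.
step = 8
wl_ds = wavelength[::step]
flux_ds = convolved[::step]
print(len(flux), len(flux_ds))  # 10000 1250
```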
We want to merge the changes of (C.1) and (C.2) into the master branch in this ticket.
EDIT:
Use packFluxModelData.py to create the zip file of the spectrum files in exactly the form we expect.
EDIT:
I uploaded two packages that are required by this ticket branch:
https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230428-small.tar.xz
https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230428-full.tar.xz
These packages are almost the same as fluxmodeldata-ambre-20220714, except that the spectra have been packed into a zip file and the new broadband filter names (PIPE2D-1110) are now used.
The ticket branch is incompatible with older fluxmodeldata packages. Users must update their package to one of these two versions.
Issue links:
- relates to:
  - PIPE2D-1168 fitPfsReferenceFlux breaks Tiger (Done)
  - PIPE2D-1145 Make fitPfsFluxReference.py faster (Done)