[PIPE2D-1145] Make fitPfsFluxReference.py faster Created: 18/Jan/23 Updated: 27/Apr/23 Resolved: 27/Jan/23 |
|
| Status: | Done |
| Project: | DRP 2-D Pipeline |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Normal |
| Reporter: | sogo.mineo | Assignee: | sogo.mineo |
| Resolution: | Done | Votes: | 0 |
| Labels: | flux-calibration | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Reviewers: | price |
| Description |
|
The current fitPfsFluxReference.py is slow. We will aim at 1 hour/visit for now. Here is the profiler output from processing the integration test.
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1981.366 1981.366 cmdLineTask.py:621(parseAndRun)
12088 1.538 0.000 1360.010 0.113 fitPfsFluxReference.py:427(computeContinuum)
48326 0.515 0.000 318.018 0.007 fitPfsFluxReference.py:981(convolveLsf)
(Because the integration test contains two visits, these figures have to be divided by 2.) The two hot spots are computeContinuum and convolveLsf.
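For reference, a profile like the one above can be collected with Python's standard cProfile and pstats modules. This is only a sketch; main() is a stand-in for the actual driver (parseAndRun in cmdLineTask.py):

import cProfile
import pstats

def main():
    # Stand-in for the actual entry point, e.g. the task's parseAndRun().
    ...

# Dump raw statistics to a file, then rank functions by cumulative time,
# which is what surfaces hot spots such as computeContinuum above.
cProfile.run("main()", "fitPfsFluxReference.prof")
pstats.Stats("fitPfsFluxReference.prof").sort_stats("cumulative").print_stats(20)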
We already have a mechanism to skip these two calls when they are unnecessary: model templates whose prior probability falls below a cutoff threshold th are excluded from the expensive computations. We have 85 prior probability distributions as --debug by-products.
We can see from this plot that we have to set th=0.01 or above. One concern is that the model that would be chosen, were it not for the cutoff, might itself fall below the threshold and be skipped.
It appears that the samples below 0.01 are outliers, which can be neglected. In conclusion, we will set the cutoff threshold to th=0.01 and skip computeContinuum and convolveLsf for templates below it, as sketched below.
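A minimal sketch of the cutoff idea, with hypothetical names (the actual selection logic in fitPfsFluxReference.py may differ):

PRIOR_CUTOFF = 0.01  # th=0.01, motivated by the plot of prior probabilities

def select_templates(templates, priors, cutoff=PRIOR_CUTOFF):
    # Keep only the model templates whose prior probability reaches the
    # cutoff; the expensive computeContinuum() and convolveLsf() calls are
    # then made only for the survivors.
    return [t for t, p in zip(templates, priors) if p >= cutoff]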
|
| Comments |
| Comment by sogo.mineo [ 19/Jan/23 ] |
|
I copied Yamashita-san's files and ran fitPfsFluxReference.py on them to get an actual profile on real data. I set the threshold to 0.01.
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 10937.974 10937.974 cmdLineTask.py:621(parseAndRun)
28655 5.009 0.000 4212.784 0.147 fitPfsFluxReference.py:435(computeContinuum)
57141 18.758 0.000 3688.403 0.065 fluxModelSet.py:120(readSpectrum)
4849760 1056.633 0.000 1644.092 0.000 pfsFiberArraySet.py:366(extractFiber)
87918 1.490 0.000 806.743 0.009 fitPfsFluxReference.py:989(convolveLsf)
It still took three hours. convolveLsf is no longer problematic with threshold=0.01, for its cumtime was much less than that of extractFiber. A new hot spot is readSpectrum, a function that reads a model template from disk storage. I have yet to find out why readSpectrum is called twice as many times as computeContinuum.
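For illustration, here is a self-contained sketch (all names are hypothetical stand-ins, not the real API) of the kind of redundancy suggested by the call counts above (28655 templates × ~170 fibers ≈ 4.87 million, close to the 4849760 extractFiber calls), and of hoisting the extraction out of the per-template loop:

def extract_fiber(fiber_arrays, fiber_id):
    # Stand-in for PfsFiberArraySet.extractFiber(): a costly per-fiber copy.
    return [row[fiber_id] for row in fiber_arrays]

def chi_square(template, spectrum):
    # Stand-in for the per-template goodness-of-fit evaluation.
    return sum((t - s) ** 2 for t, s in zip(template, spectrum))

fiber_arrays = [[1.0, 2.0], [3.0, 4.0]]   # toy flux rows for two fibers
templates = [[1.0, 3.0], [2.0, 4.0]]      # toy model templates
fiber_ids = [0, 1]

# Extract each fiber once, outside the loop over templates, instead of
# re-extracting it for every (template, fiber) pair.
spectra = {fid: extract_fiber(fiber_arrays, fid) for fid in fiber_ids}
best = {fid: min(templates, key=lambda t: chi_square(t, spectra[fid]))
        for fid in fiber_ids}
|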
| Comment by sogo.mineo [ 20/Jan/23 ] |
|
I reduced the number of calls to readSpectrum and extractFiber:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 8893.141 8893.141 cmdLineTask.py:621(parseAndRun)
28655 4.785 0.000 4416.048 0.154 fitPfsFluxReference.py:435(computeContinuum)
28655 11.147 0.000 3055.535 0.107 fluxModelSet.py:120(readSpectrum)
87918 1.432 0.000 874.030 0.010 fitPfsFluxReference.py:995(convolveLsf)
170 0.086 0.001 0.114 0.001 pfsFiberArraySet.py:366(extractFiber)
Cumtime percall of readSpectrum increased because the storage was in heavy use during the profiling. computeContinuum was called 28655 times because 28655 flux templates out of 57141 passed the threshold cut. Another idea is to reduce the number of interpolation points in the model parameter space.
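One way to avoid redundant template reads (a sketch only, with hypothetical parameter names; the actual change on the ticket branch may differ) is to memoize the read so that each template hits the disk at most once per run:

from functools import lru_cache

def _read_spectrum_from_disk(teff, logg, m):
    # Stand-in for the expensive I/O behind fluxModelSet.py:readSpectrum().
    ...

@lru_cache(maxsize=None)
def read_spectrum(teff, logg, m):
    # The model parameters act as the cache key, so repeated requests for
    # the same template are served from memory instead of the storage.
    return _read_spectrum_from_disk(teff, logg, m)

Caching trades memory for I/O; with tens of thousands of templates, maxsize would likely need a finite bound.
|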
| Comment by sogo.mineo [ 24/Jan/23 ] |
|
Yamashita-san is against sacrificing any accuracy for the sake of speed at this point. Since the commits on this ticket branch increase speed without changing the output, I want to merge the ticket branch before closing this ticket (as "Won't Fix"?). |
| Comment by Takuji Yamashita [ 24/Jan/23 ] |
|
Because we do not have any firm requirements on processing time at this point, I think we do not need to reduce the processing time to the extent of affecting accuracy. With this ticket, we gain the option of the cutoff threshold, and the speed increases without changing the outputs, as Mineo-san said. I agree to merging this ticket. |
| Comment by sogo.mineo [ 26/Jan/23 ] |
|
To make extra sure, I restored the threshold to 1e-8, keeping the other changes, and took a profile again:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 14380.434 14380.434 cmdLineTask.py:621(parseAndRun)
45006 7.391 0.000 6854.570 0.152 fitPfsFluxReference.py:436(computeContinuum)
45006 18.192 0.000 4610.827 0.102 fluxModelSet.py:120(readSpectrum)
241737 3.420 0.000 2070.246 0.009 fitPfsFluxReference.py:996(convolveLsf)
Compared to the case of threshold 1e-2, the total execution time was 1.6 times longer (14380.434 s / 8893.141 s ≈ 1.62). I'm making a pull request. |
| Comment by sogo.mineo [ 26/Jan/23 ] |
|
Could you review this PR? The execution time has been almost halved, though the original aim (execution time <~ 1 hour/visit) has not been achieved. I cannot reduce the execution time any further without sacrificing accuracy. I don't know which resolution should be given to this issue, "Done" or "Won't Fix"; but I want to merge the branch to master anyway. |
| Comment by sogo.mineo [ 27/Jan/23 ] |
|
Merged. Thank you for the review. Let me close this issue as "Done". |