[PIPE2D-1231] Reduce memory usage of fitPfsReferenceFlux Created: 05/Jun/23  Updated: 16/Jun/23  Resolved: 09/Jun/23

Status: Done
Project: DRP 2-D Pipeline
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Normal
Reporter: sogo.mineo Assignee: sogo.mineo
Resolution: Done Votes: 0
Labels: flux-calibration
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File pcaresiduals.png     PNG File rms.relative.error.png    
Issue Links:
Relates
relates to PIPE2D-1238 fitPfsFluxReference should not brute-... Done
relates to PIPE2D-1168 fitPfsReferenceFlux breaks Tiger Done
relates to PIPE2D-1060 Tune hyperparameters of fluxmodel int... Done
Reviewers: price

 Description   

I want to use PCA to reduce memory usage of fitPfsReferenceFlux.

Performing PCA on the 6040 input spectra, each a 1e5-dimensional vector,
takes several days and 500GB of memory, but the program does finish.
The result suggests that the 6040 input vectors can be approximated by
linear combinations of 1024 basis vectors with an RMS relative error of 1.6e-4.
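
The PCA step described above can be sketched on a small scale as follows. This is only an illustration with synthetic data: the function names, the mean subtraction, and the use of a plain SVD are assumptions, not the code on the ticket branch.

```python
import numpy as np


def pca_basis(spectra, n_components):
    """Return (mean, basis); basis has shape (n_components, n_wavelengths)."""
    mean = spectra.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_components]


def rms_relative_error(spectra, mean, basis):
    """RMS over spectra of the per-spectrum RMS relative reconstruction error."""
    coeffs = (spectra - mean) @ basis.T   # project onto the basis
    approx = mean + coeffs @ basis        # linear combination of basis vectors
    per_spectrum = np.sqrt(np.mean(np.square((approx - spectra) / spectra), axis=1))
    return np.sqrt(np.mean(np.square(per_spectrum)))


# Synthetic stand-in for the flux models: 60 positive "spectra" of length 200
# whose variation about the mean has rank 5 (assumed data, for illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 5))
mixing = rng.normal(size=(5, 200))
spectra = 10.0 + 0.1 * latent @ mixing

mean, basis = pca_basis(spectra, n_components=5)
error = rms_relative_error(spectra, mean, basis)
print(f"RMS relative error with 5 basis vectors: {error:.2e}")
```

Because the synthetic data has exactly rank-5 variation, five basis vectors reconstruct it to floating-point precision. Real spectra would leave a residual that shrinks as more basis vectors are kept.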

The interpolator of flux models has been a function from the 4-dimensional parameter space
(Teff, log(g), metal, alpha) to 1e5-dimensional vector space (R^4 → R^1e5).
With PCA, I can replace it with a function R^4 → R^1024,
"R^1024" being the set of coefficients in linear combinations of the 1024 basis vectors.

I decide to use RBF to fit this function R^4 → R^1024.
(Another candidate is polynomials, but I am not sure I would be able to avoid overfitting
if I were to use polynomials.)
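
A minimal sketch of an RBF fit from a 4-dimensional parameter space to a vector of coefficients is given here. The Gaussian kernel, the `epsilon` value, the class name, and the synthetic data are all assumptions chosen for illustration; the kernel and hyperparameters actually used in the pipeline are not stated in this ticket.

```python
import numpy as np


class GaussianRBF:
    """Minimal Gaussian RBF interpolator for vector-valued data (illustrative)."""

    def __init__(self, params, values, epsilon=10.0):
        self.params = np.asarray(params, dtype=float)
        self.epsilon = epsilon
        kernel = self._kernel(self.params, self.params)
        # A single linear solve yields the weights for every output dimension.
        self.weights = np.linalg.solve(kernel, np.asarray(values, dtype=float))

    def _kernel(self, a, b):
        sqdist = np.sum(np.square(a[:, None, :] - b[None, :, :]), axis=-1)
        return np.exp(-self.epsilon * sqdist)

    def __call__(self, query):
        query = np.atleast_2d(np.asarray(query, dtype=float))
        return self._kernel(query, self.params) @ self.weights


# Synthetic nodes in a 4-dimensional parameter space (standing in for
# Teff, log(g), metal, alpha after rescaling to [0, 1]) and 8 coefficients each.
rng = np.random.default_rng(1)
params = rng.uniform(size=(40, 4))
coeffs = np.sin(params @ rng.normal(size=(4, 8)))

rbf = GaussianRBF(params, coeffs)
coeffs_at_nodes = rbf(params)           # shape (40, 8); reproduces the inputs
coeffs_between = rbf(params[:2].mean(axis=0))  # shape (1, 8); a new point
```

An interpolated spectrum would then be the linear combination of the PCA basis vectors with the interpolated coefficients. For real use, `scipy.interpolate.RBFInterpolator` provides a maintained implementation with several kernels.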

I use the following procedure to estimate interpolation errors:

For each x[i] of input ~6000 spectra:
    Make RBF from the input ~6000 spectra except x[i]
    Interpolate a spectrum y at the same parameter as x[i]
    rms[i] = sqrt(mean(square((y - x[i]) / x[i])))

error = sqrt(mean(square(rms)))
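
The leave-one-out procedure above can be written as runnable Python roughly as follows. This is a sketch under stated assumptions: `make_rbf` is a hypothetical helper using a Gaussian kernel with an arbitrary `epsilon`, the data are synthetic, and the spectra are interpolated directly rather than through PCA coefficients to keep the example short.

```python
import numpy as np


def make_rbf(params, values, epsilon=10.0):
    """Hypothetical helper: fit a Gaussian RBF and return it as a callable."""
    def kernel(a, b):
        sqdist = np.sum(np.square(a[:, None, :] - b[None, :, :]), axis=-1)
        return np.exp(-epsilon * sqdist)

    weights = np.linalg.solve(kernel(params, params), values)
    return lambda query: kernel(np.atleast_2d(query), params) @ weights


def leave_one_out_error(params, spectra, epsilon=10.0):
    """Return (rms, error) as defined in the procedure above."""
    n = len(params)
    rms = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # every spectrum except x[i]
        rbf = make_rbf(params[keep], spectra[keep], epsilon)
        y = rbf(params[i])[0]                         # interpolate at x[i]'s parameters
        rms[i] = np.sqrt(np.mean(np.square((y - spectra[i]) / spectra[i])))
    return rms, np.sqrt(np.mean(np.square(rms)))


# Synthetic, strictly positive "spectra" on 30 random nodes (illustration only).
rng = np.random.default_rng(2)
params = rng.uniform(size=(30, 4))
spectra = 10.0 + np.sin(params @ rng.normal(size=(4, 50)))

rms, error = leave_one_out_error(params, spectra)
print(f"leave-one-out error: {error:.3e}")
```

Each iteration refits the interpolator from scratch, so the cost grows quickly with the number of spectra.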

Below is the histogram of "rms" with the best hyperparameters. The RMS of "rms" is 2.6e-4.

I can see that the increase in error relative to PIPE2D-1060 is negligible.

I want this new PCA-based interpolation merged to the master branch.
This method will reduce memory usage to 1/6 of the current method.
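
A back-of-envelope check of the 1/6 figure is sketched here. It rests on assumptions not stated in the ticket: that memory is dominated by one stored value per node and output dimension, that values are 8-byte floats, and that the sizes are exactly 6040 spectra, 1e5 wavelengths, and 1024 basis vectors.

```python
n_spectra = 6040          # input spectra (from the description)
n_wavelengths = 100_000   # "1e5-dimensional" taken literally (assumption)
n_components = 1024       # PCA basis vectors (from the description)
bytes_per_value = 8       # float64 (assumption)

# Current method: one 1e5-dimensional vector per input spectrum.
full = n_spectra * n_wavelengths * bytes_per_value

# PCA-based method: 1024 coefficients per input spectrum plus the basis itself.
reduced = (n_spectra * n_components + n_components * n_wavelengths) * bytes_per_value

ratio = reduced / full
print(f"full: {full / 1e9:.2f} GB, reduced: {reduced / 1e9:.2f} GB, ratio: {ratio:.2f}")
```

Under these assumptions the ratio comes out at roughly 0.18, which is in the neighbourhood of the quoted 1/6.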

The new code requires a new version of fluxmodeldata, which I have uploaded here:

[-https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230602.tar.gz-]
https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230608.tar.gz

To use the new fluxmodeldata, users have to run ./install.sh --prefix=/path/to/pfs-packages [--set=small] to
pre-compute ~6,000 or ~60,000 spectra from the PCA basis vectors included in the package.



 Comments   
Comment by sogo.mineo [ 05/Jun/23 ]

Could you review this PR?

Comment by price [ 07/Jun/23 ]

The packaging for fluxmodeldata doesn't make much sense to me. install.py doesn't actually do an installation, but it precomputes the data. Moreover, it puts the products in the same directory, not actually installing it anywhere. That means that if I want both small and full packages, I have to untar the tarball, rename the directory, and run the script in both. It's probably not worth fixing now, but for the next iteration it would be helpful to solve.

Comment by sogo.mineo [ 07/Jun/23 ]

I am making a new fluxmodeldata package whose install.py actually installs files to PREFIX/fluxmodeldata-ambre-20230602-small or PREFIX/fluxmodeldata-ambre-20230602-full according to the --set option. (I will upload it tomorrow.)

I have made changes to the ticket branch (and reverted makeFluxModelInterpolator.py) to keep supporting old fluxmodeldata packages. I forgot to revise the commit message, and I don't have time to amend it today. Please give me comments, if any, on anything except the commit message.

A question: I put the @deprecated decorator on the makeFluxModelInterpolator() function (in makeFluxModelInterpolator.py), but no DeprecationWarning is shown when the program is run. This seems to be Python's (not deprecated's) default behavior. What is the best way to inform the user of the deprecation? Should I use print()?
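
For context on the question above: Python's default warning filters ignore DeprecationWarning unless it is triggered by code in `__main__`, whereas FutureWarning is intended for end users and is shown by default. A minimal sketch using only the standard `warnings` module follows; the function is a hypothetical stand-in and the message text is illustrative, not the wording used in the pipeline.

```python
import warnings


def make_flux_model_interpolator():
    """Hypothetical stand-in for a deprecated entry point."""
    warnings.warn(
        "makeFluxModelInterpolator() is deprecated. See PIPE2D-1231.",
        category=FutureWarning,  # shown to end users by default
        stacklevel=2,            # attribute the warning to the caller
    )


# Record the warning so the example can inspect what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    make_flux_model_interpolator()

categories = [w.category for w in caught]
print(categories)
```

If the decorator in use accepts a warning category argument, passing FutureWarning there may achieve the same effect without calling `warnings.warn()` by hand; that depends on the decorator and is not confirmed here.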

Comment by sogo.mineo [ 08/Jun/23 ]

I uploaded the new fluxmodeldata package (https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230608.tar.gz), and wrote this specific version into the deprecation messages in the sources. (For example, "NaiveFluxModelInterpolator has been replaced by PCAFluxModelInterpolator, which requires fluxmodeldata >= ambre-20230608. See PIPE2D-1231.")

I decide to let makeFluxModelInterpolator.py exit immediately if fluxmodeldata is new, because it is not compatible with the new fluxmodeldata. I keep makeFluxModelInterpolator.py only for old versions of fluxmodeldata.
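
The early-exit behaviour described above could look roughly like the sketch below. How the real script detects the package version is not stated in this ticket, so the version-string parsing, the function names, and the exit code are all hypothetical.

```python
import re
import sys

# First fluxmodeldata release that requires the PCA-based interpolator,
# taken from the deprecation message quoted in this ticket.
PCA_MIN_DATE = 20230608


def uses_pca_format(version):
    """Return True if a version string like 'ambre-20230608' is PCA-based.

    Hypothetical helper: assumes the date is embedded as 'ambre-YYYYMMDD'.
    """
    match = re.search(r"ambre-(\d{8})", version)
    if match is None:
        raise ValueError(f"cannot parse fluxmodeldata version: {version!r}")
    return int(match.group(1)) >= PCA_MIN_DATE


def main(version):
    """Return a process exit code; refuse to run against new packages."""
    if uses_pca_format(version):
        print(
            "makeFluxModelInterpolator.py is not compatible with "
            f"fluxmodeldata {version}. See PIPE2D-1231.",
            file=sys.stderr,
        )
        return 1
    # ... the old interpolator would be built here for old packages ...
    return 0
```

A script entry point would then call `sys.exit(main(version))` with the version of the installed package.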

Comment by sogo.mineo [ 09/Jun/23 ]

Merged. Thanks for reviewing.

Comment by price [ 15/Jun/23 ]

I just tried installing the new fluxmodeldata package, and it was much smoother, thanks! The only problem I've found so far is that the ups directory isn't installed.

Comment by price [ 15/Jun/23 ]

Ah, no! I wasn't aware that the package name was added to the prefix, so everything was installed down a level.
