DRP 2-D Pipeline / PIPE2D-1231

Reduce memory usage of fitPfsReferenceFlux

    Details

    • Type: Task
    • Status: Done
    • Priority: Normal
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      I want to use PCA (principal component analysis) to reduce the memory usage of fitPfsReferenceFlux.

      Performing PCA on 6040 spectra, each a 1e5-dimensional vector, takes several days and 500GB of memory,
      but the program does finish. It appears that the 6040 input vectors can be approximated by
      linear combinations of 1024 basis vectors with an RMS relative error of 1.6e-4.
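      A minimal sketch of the truncated-PCA step, assuming NumPy and a thin SVD of the mean-subtracted
      spectra (the ticket does not say which PCA implementation was used). The function names and the
      toy data are hypothetical; for the real data the input would be a 6040 × 1e5 array with n_components=1024.

```python
import numpy as np


def pca_basis(spectra, n_components):
    """Truncated PCA of `spectra` (shape: n_spectra x n_pixels).

    Returns (mean, basis, coeffs) with
      mean:   shape (n_pixels,)
      basis:  shape (n_components, n_pixels), orthonormal rows
      coeffs: shape (n_spectra, n_components)
    so that spectra ~= mean + coeffs @ basis.
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # Thin SVD: centered = U @ diag(s) @ Vt; the rows of Vt are the
    # principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coeffs = centered @ basis.T
    return mean, basis, coeffs


def relative_rms(approx, exact):
    """RMS of the relative error over all pixels of all spectra."""
    return float(np.sqrt(np.mean(np.square((approx - exact) / exact))))


# Toy stand-in for the real spectra: 200 "spectra" of 5000 pixels that
# lie (after subtracting the mean) in a 10-dimensional subspace.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(10, 5000))
spectra = 100.0 + rng.normal(size=(200, 10)) @ true_basis

mean, basis, coeffs = pca_basis(spectra, n_components=10)
reconstructed = mean + coeffs @ basis
error = relative_rms(reconstructed, spectra)
print(f"RMS relative error with 10 components: {error:.1e}")
```

      With real spectra the error does not vanish; n_components is chosen so that the relative RMS error
      is acceptably small, which in the ticket gave 1024 components and 1.6e-4.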

      The flux-model interpolator has so far been a function from the 4-dimensional parameter space
      (Teff, log(g), metal, alpha) to the 1e5-dimensional vector space (R^4 → R^1e5).
      With PCA, I can replace it with a function R^4 → R^1024,
      where R^1024 is the space of coefficients of linear combinations of the 1024 basis vectors.

      I decided to use RBF (radial basis function) interpolation to fit this function R^4 → R^1024.
      (Another candidate is polynomials, but I am not sure I could avoid overfitting
      if I were to use polynomials.)
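      A minimal sketch of fitting R^4 → R^k with a single vector-valued RBF interpolant, assuming
      scipy.interpolate.RBFInterpolator (SciPy ≥ 1.7). The ticket does not name the RBF implementation,
      kernel, or smoothing; the thin-plate-spline kernel, the toy coefficients, and the random orthonormal
      basis below are all assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)

# Hypothetical training set: 300 models at random points of a normalized
# 4-D parameter space standing in for (Teff, log(g), metal, alpha), each
# summarized by k PCA coefficients (k = 1024 in the ticket, 8 here).
n_models, k, n_pixels = 300, 8, 2000
params = rng.uniform(size=(n_models, 4))
coeffs = np.sin(params @ rng.normal(size=(4, k)))  # smooth toy R^4 -> R^k map

# One RBFInterpolator handles vector-valued data: coeffs has shape
# (n_models, k), so the interpolant maps R^4 -> R^k.
interp = RBFInterpolator(params, coeffs, kernel="thin_plate_spline", smoothing=0.0)

# Toy PCA basis (orthonormal rows) and mean spectrum.
basis = np.linalg.qr(rng.normal(size=(n_pixels, k)))[0].T  # shape (k, n_pixels)
mean = np.ones(n_pixels)

# Interpolate coefficients at new parameters, then expand to full spectra.
new_params = rng.uniform(size=(5, 4))
new_coeffs = interp(new_params)          # shape (5, k)
new_spectra = mean + new_coeffs @ basis  # shape (5, n_pixels)
```

      Because the kernel depends on Euclidean distances, real parameters with different units
      (Teff in K versus log(g) in dex) would need rescaling before fitting. The kernel and the
      smoothing are the kind of hyperparameters one would tune with a leave-one-out procedure.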

      I used the following leave-one-out procedure to estimate the interpolation error:

      For each x[i] among the ~6000 input spectra:
          Build an RBF from all the input spectra except x[i]
          Interpolate a spectrum y at the same parameters as x[i]
          rms[i] = sqrt(mean(square((y - x[i]) / x[i])))
      
      error = sqrt(mean(square(rms)))
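      A runnable sketch of the leave-one-out procedure, assuming the RBF is fitted to PCA coefficients with
      scipy.interpolate.RBFInterpolator and that the PCA mean and basis are held fixed (not recomputed without
      x[i]); the ticket does not say whether the basis was refitted per fold. The function name and toy data
      are hypothetical.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator


def leave_one_out_error(params, coeffs, mean, basis, spectra, **rbf_kwargs):
    """Leave-one-out estimate of the relative interpolation error.

    params:  (n, 4) model parameters
    coeffs:  (n, k) PCA coefficients of each model
    mean, basis: PCA mean (n_pixels,) and basis (k, n_pixels), held fixed
    spectra: (n, n_pixels) the original spectra x[i]
    Returns (error, rms), where rms[i] is the relative RMS for model i.
    """
    n = len(params)
    rms = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Build the RBF from every model except model i.
        interp = RBFInterpolator(params[keep], coeffs[keep], **rbf_kwargs)
        # Interpolate the spectrum y at the parameters of model i.
        y = mean + interp(params[i:i + 1])[0] @ basis
        x = spectra[i]
        rms[i] = np.sqrt(np.mean(np.square((y - x) / x)))
    error = float(np.sqrt(np.mean(np.square(rms))))
    return error, rms


# Toy data: 80 models whose spectra lie exactly in a 6-dimensional PCA space.
rng = np.random.default_rng(2)
n, k, n_pixels = 80, 6, 1000
params = rng.uniform(size=(n, 4))
coeffs = np.sin(params @ rng.normal(size=(4, k)))
basis = np.linalg.qr(rng.normal(size=(n_pixels, k)))[0].T
mean = np.full(n_pixels, 50.0)
spectra = mean + coeffs @ basis

error, rms = leave_one_out_error(params, coeffs, mean, basis, spectra,
                                 kernel="thin_plate_spline")
print(f"leave-one-out RMS of rms: {error:.2e}")
```

      Comparing this error across kernels and smoothing values is one way to pick the "best hyperparameters";
      with ~6000 models each evaluation refits ~6000 interpolants, so it is far more expensive than this toy run.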
      

      [Figure: histogram of "rms" with the best hyperparameters]
      The RMS of "rms" is 2.6e-4.

      The increase in error relative to PIPE2D-1060 is negligible.

      I would like this new PCA-based interpolation merged into the master branch.
      It will reduce memory usage to 1/6 of that of the current method.

      The new code requires a new version of fluxmodeldata, which I have uploaded here:

      https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230608.tar.gz
      (supersedes the earlier upload https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230602.tar.gz)

      To use the new fluxmodeldata, users have to run ./install.sh --prefix=/path/to/pfs-packages [--set=small],
      which pre-computes ~6,000 or ~60,000 spectra (depending on --set) from the PCA basis vectors included in the package.

            Activity

            sogo.mineo added a comment:

            Could you review this PR?

            price added a comment:

            The packaging for fluxmodeldata doesn't make much sense to me. install.py doesn't actually do an installation; it precomputes the data. Moreover, it puts the products in the same directory rather than installing them anywhere. That means that if I want both the small and full packages, I have to untar the tarball, rename the directory, and run the script in both. It's probably not worth fixing now, but it would be helpful to solve for the next iteration.

            sogo.mineo added a comment:

            I am making a new fluxmodeldata package whose install.py actually installs files to PREFIX/fluxmodeldata-ambre-20230602-small or PREFIX/fluxmodeldata-ambre-20230602-full according to the --set option. (I will upload it tomorrow.)
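            A hypothetical sketch of the directory layout described above, assuming argparse-style option handling;
            the real install.py is not shown in the ticket, and the default of "full" for --set is an assumption.

```python
import argparse
import os


def install_dir(prefix, version, data_set):
    """Directory for one data set, following the layout in the comment:
    PREFIX/fluxmodeldata-<version>-<set>."""
    return os.path.join(prefix, f"fluxmodeldata-{version}-{data_set}")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical install.py options")
    parser.add_argument("--prefix", required=True, help="installation prefix")
    # Assumption: "full" is used when --set is omitted.
    parser.add_argument("--set", choices=["small", "full"], default="full",
                        help="which pre-computed spectrum set to install")
    return parser.parse_args(argv)


args = parse_args(["--prefix", "/path/to/pfs-packages", "--set", "small"])
print(install_dir(args.prefix, "ambre-20230602", args.set))
```

            With this layout the package name is appended to PREFIX, so on a POSIX system
            --prefix=/path/to/pfs-packages --set=small installs into /path/to/pfs-packages/fluxmodeldata-ambre-20230602-small.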

            I have made changes to the ticket branch (and reverted makeFluxModelInterpolator.py) to keep supporting old fluxmodeldata packages. I forgot to revise the commit message, and I don't have time to amend it today. Please give me comments, if any, on anything except the commit message.

            A question: I put the @deprecated decorator on the makeFluxModelInterpolator() function (in makeFluxModelInterpolator.py), but no DeprecationWarning is shown when the program is run. This seems to be Python's default behavior, not that of the deprecated package. What is the best way to inform the user of the deprecation? Should I use print()?
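            On the question above: Python's default warning filters ignore DeprecationWarning unless it is attributed
            to code in __main__ (Python 3.7+), whereas FutureWarning, which the warnings documentation describes as
            aimed at end users, is shown by default. One option, sketched here with only the standard library, is a
            small decorator that emits FutureWarning. warn_deprecated is a hypothetical name, and the function body is
            a placeholder because the real signature of makeFluxModelInterpolator() is not shown in the ticket.

```python
import functools
import warnings


def warn_deprecated(message):
    """Hypothetical decorator that emits a FutureWarning on each call.

    FutureWarning is displayed under Python's default warning filters,
    unlike DeprecationWarning raised from library code.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not this wrapper.
            warnings.warn(message, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@warn_deprecated("NaiveFluxModelInterpolator has been replaced by PCAFluxModelInterpolator, "
                 "which requires fluxmodeldata >= ambre-20230608. See PIPE2D-1231.")
def makeFluxModelInterpolator():
    # Placeholder body standing in for the real function.
    return "interpolator"
```

            Alternatively, the existing DeprecationWarning becomes visible without code changes when the program is
            run as python -W default::DeprecationWarning or with PYTHONWARNINGS=default.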

            sogo.mineo added a comment:

            I uploaded the new fluxmodeldata package (https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230608.tar.gz) and included this specific version in the deprecation messages in the sources (for example, "NaiveFluxModelInterpolator has been replaced by PCAFluxModelInterpolator, which requires fluxmodeldata >= ambre-20230608. See PIPE2D-1231.").

            I decided to make makeFluxModelInterpolator.py exit immediately if the installed fluxmodeldata is the new version, because the script is not compatible with it. I am keeping makeFluxModelInterpolator.py only for old versions of fluxmodeldata.
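            A hypothetical sketch of such an early exit, assuming the installed fluxmodeldata version is available as a
            string such as "ambre-20230608"; how the real script detects the version is not shown in the ticket, and
            is_pca_fluxmodeldata and main are illustrative names.

```python
import re
import sys

# Assumption: versions look like "ambre-YYYYMMDD"; the first PCA-based
# release named in the deprecation messages is ambre-20230608.
FIRST_PCA_DATE = 20230608


def is_pca_fluxmodeldata(version):
    """True if `version` ("ambre-YYYYMMDD") is ambre-20230608 or later."""
    match = re.fullmatch(r"ambre-(\d{8})", version)
    if match is None:
        raise ValueError(f"unrecognized fluxmodeldata version: {version!r}")
    return int(match.group(1)) >= FIRST_PCA_DATE


def main(version):
    if is_pca_fluxmodeldata(version):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("makeFluxModelInterpolator.py is not compatible with "
                 "fluxmodeldata >= ambre-20230608. See PIPE2D-1231.")
    # Old fluxmodeldata: the original interpolator-building code would run here.
    return "build interpolator"
```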

            sogo.mineo added a comment:

            Merged. Thanks for reviewing.

            price added a comment (edited):

            I just tried installing the new fluxmodeldata package, and it was much smoother, thanks! The only problem I've found so far is that the ups directory isn't installed.

            price added a comment:

            Ah, no! I wasn't aware that the package name was added to the prefix, so everything was installed down a level.


              People

              • Assignee: sogo.mineo
              • Reporter: sogo.mineo
              • Reviewers: price
              • Votes: 0