DRP 2-D Pipeline / PIPE2D-1231

Reduce memory usage of fitPfsReferenceFlux


    Details

    • Type: Task
    • Status: Done
    • Priority: Normal
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      I want to use principal component analysis (PCA) to reduce the memory usage of fitPfsReferenceFlux.

      Performing PCA on 6040 spectra, each a 1e5-dimensional vector, takes several days
      and 500 GB of memory, but the program does finish. The result suggests that the
      6040 input vectors can be approximated by linear combinations of 1024 basis vectors
      with an RMS relative error of 1.6e-4.
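      A minimal sketch of this compression step, assuming PCA is done by a thin SVD of the
      mean-subtracted spectra and that "RMS relative error" means the RMS over spectra of each
      spectrum's RMS relative reconstruction error (the same metric as the leave-one-out test
      in this ticket). The function names pca_basis and rms_relative_error and the toy data are
      hypothetical; the ticket does not say how the real PCA was computed.

```python
import numpy as np

def pca_basis(spectra, n_components):
    """Return (mean, basis); basis rows are orthonormal principal axes.

    spectra: array of shape (n_spectra, n_pixels).
    """
    mean = spectra.mean(axis=0)
    # Thin SVD of the mean-subtracted data; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_components]

def rms_relative_error(spectra, mean, basis):
    """RMS over spectra of each spectrum's RMS relative reconstruction error."""
    coeffs = (spectra - mean) @ basis.T   # project onto the basis
    approx = mean + coeffs @ basis        # reconstruct from the coefficients
    rms = np.sqrt(np.mean(np.square((approx - spectra) / spectra), axis=1))
    return np.sqrt(np.mean(np.square(rms)))

# Hypothetical toy data: 200 "spectra" of 1000 pixels built from 5 smooth
# components plus a little noise (the real data are 6040 x 1e5).
rng = np.random.default_rng(0)
wavelength = np.linspace(0.0, 1.0, 1000)
components = np.array([np.sin((k + 1) * np.pi * wavelength) for k in range(5)])
spectra = 1.0 + 0.05 * rng.normal(size=(200, 5)) @ components
spectra += 1e-5 * rng.normal(size=spectra.shape)

mean, basis = pca_basis(spectra, n_components=5)
print(rms_relative_error(spectra, mean, basis))
```

      With 5 components the error drops to about the injected noise level; with fewer it is
      dominated by the discarded components. At the real size, the SVD of a 6040 x 1e5 matrix
      is what makes this step so expensive.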

      Until now, the flux-model interpolator has been a function from the 4-dimensional
      parameter space (Teff, log(g), metal, alpha) to a 1e5-dimensional vector space
      (R^4 → R^1e5). With PCA, I can replace it with a function R^4 → R^1024, where
      R^1024 is the space of coefficients of linear combinations of the 1024 basis vectors.

      I decided to use radial basis function (RBF) interpolation to fit this function R^4 → R^1024.
      (Polynomials are another candidate, but I am not sure I could avoid overfitting
      with them.)
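      A minimal sketch of the R^4 → R^k map followed by expansion in the PCA basis, assuming
      SciPy's RBFInterpolator as the RBF implementation. The ticket does not name the library,
      kernel, or hyperparameters: kernel="thin_plate_spline" and smoothing=0.0 are placeholders,
      and params, coeffs, mean, basis, and interpolate_spectrum are hypothetical toy stand-ins
      (16 coefficients instead of 1024).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)

# Hypothetical toy stand-ins: 300 models whose parameters (Teff, log(g),
# metal, alpha) are already rescaled to [0, 1], with 16 PCA coefficients each.
params = rng.uniform(size=(300, 4))
directions = rng.normal(size=(16, 4))
coeffs = np.sin(3.0 * params @ directions.T)          # shape (300, 16)

# Hypothetical PCA mean and basis with orthonormal rows, shape (16, 1000).
mean = np.ones(1000)
basis = np.linalg.qr(rng.normal(size=(1000, 16)))[0].T

# One interpolator maps R^4 -> R^16; all coefficient columns share the
# same RBF system, so it is built and solved only once.
rbf = RBFInterpolator(params, coeffs, kernel="thin_plate_spline", smoothing=0.0)

def interpolate_spectrum(theta):
    """Interpolate PCA coefficients at one parameter vector, then expand them."""
    c = rbf(np.atleast_2d(theta))[0]    # shape (16,)
    return mean + c @ basis             # shape (1000,)

spectrum = interpolate_spectrum([0.5, 0.5, 0.5, 0.5])
```

      Rescaling the parameters matters because RBF kernels use a single Euclidean distance:
      without it, Teff (thousands of kelvin) would swamp log(g), metal, and alpha.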

      I used the following leave-one-out procedure to estimate interpolation errors:

      For each x[i] among the ~6000 input spectra:
          Build an RBF from all input spectra except x[i]
          Interpolate a spectrum y at the parameters of x[i]
          rms[i] = sqrt(mean(square((y - x[i]) / x[i])))
      
      error = sqrt(mean(square(rms)))
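      The procedure above can be sketched in Python as follows. This is an illustration under
      stated assumptions, not the code in the pipeline: it assumes SciPy's RBFInterpolator, keeps
      the PCA mean and basis fixed (only the RBF is refitted for each i), and fits the RBF to PCA
      coefficients rather than to raw spectra. leave_one_out_error and the toy data are
      hypothetical.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def leave_one_out_error(params, spectra, mean, basis, **rbf_kwargs):
    """Return (error, rms) for the leave-one-out test.

    params:  (n, 4) model parameters, rescaled to comparable ranges.
    spectra: (n, n_pixels) input spectra x[i].
    mean, basis: PCA mean (n_pixels,) and orthonormal basis rows (k, n_pixels).
    """
    coeffs = (spectra - mean) @ basis.T
    n = len(spectra)
    rms = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # RBF built from every spectrum except x[i].
        rbf = RBFInterpolator(params[keep], coeffs[keep], **rbf_kwargs)
        # Interpolated spectrum y at the parameters of x[i].
        y = mean + rbf(params[i : i + 1])[0] @ basis
        rms[i] = np.sqrt(np.mean(np.square((y - spectra[i]) / spectra[i])))
    return np.sqrt(np.mean(np.square(rms))), rms

# Hypothetical toy data whose coefficients are linear in the parameters.
rng = np.random.default_rng(2)
params = rng.uniform(size=(40, 4))
basis = np.linalg.qr(rng.normal(size=(500, 8)))[0].T   # (8, 500), orthonormal rows
mean = np.ones(500)
spectra = mean + 0.01 * (params @ rng.normal(size=(4, 8))) @ basis
error, rms = leave_one_out_error(params, spectra, mean, basis)
```

      SciPy's default thin-plate-spline kernel includes a degree-1 polynomial term, so these
      linear toy coefficients are reproduced exactly and the error is at rounding level; real
      spectra give a nonzero error. With ~6000 spectra the loop solves ~6000 dense systems of
      roughly 6000 x 6000, so it is expensive.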
      

      Below is the histogram of "rms" obtained with the best hyperparameters. The RMS of "rms" is 2.6e-4.

      [Figure: histogram of "rms" with the best hyperparameters]

      The increase in error relative to PIPE2D-1060 is negligible.

      I want this new PCA-based interpolation merged into the master branch.
      This method will reduce memory usage to 1/6 of that of the current method.
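      The ticket does not show how the 1/6 figure was obtained. One plausible back-of-the-envelope
      accounting, assuming memory is dominated by one stored value per pixel per model in the
      current method (for example, RBF weights) and by the basis plus per-model coefficients in the
      new one, with exactly 1e5 pixels and the 6040-spectrum set:

```python
# Hypothetical accounting: count stored floats, ignoring dtype and overheads.
n_models, n_pixels, n_components = 6040, 100_000, 1024

full = n_models * n_pixels                                 # one value per pixel per model
pca = n_components * n_pixels + n_models * n_components    # basis + per-model coefficients
ratio = full / pca
print(f"ratio = {ratio:.2f}")  # ratio = 5.56
```

      This gives roughly 1/5.6, close to the stated 1/6; the actual figure also depends on array
      dtypes and on whatever else the interpolator keeps in memory.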

      The new code requires a new version of fluxmodeldata, which I have uploaded here:

      https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230608.tar.gz
      (This supersedes the earlier upload, https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20230602.tar.gz)

      To use the new fluxmodeldata, users must run

          ./install.sh --prefix=/path/to/pfs-packages [--set=small]

      which pre-computes ~6,000 or ~60,000 spectra (depending on the chosen set) from the
      PCA basis vectors included in the package.

    People

    • Assignee: sogo.mineo
    • Reporter: sogo.mineo
    • Reviewers: price
    • Votes: 0
    • Watchers: 2
