[PIPE2D-920] How to put the large set of the AMBRE model templates in github for flux calibration Created: 25/Oct/21 Updated: 22/Dec/21 Resolved: 22/Dec/21 |
|
| Status: | Done |
| Project: | DRP 2-D Pipeline |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Story | Priority: | Normal |
| Reporter: | Takuji Yamashita | Assignee: | sogo.mineo |
| Resolution: | Done | Votes: | 0 |
| Labels: | flux-calibration, model-templates | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Epic Link: | flux calibration | ||||||||
| Reviewers: | price | ||||||||
| Description |
|
Initially, we planned to store the 6k templates (2.4 GB) in GitHub for flux calibration. Because this may be too large, Mineo-san and Yamashita are discussing how to reduce the size. |
| Comments |
| Comment by Takuji Yamashita [ 26/Oct/21 ] |
|
We are leaning toward saving the 6k templates as fixed-point numbers to reduce the file size. This could cut the size roughly in half, to ~1-2 GB. We will take the logarithm of the templates and then convert them to fixed-point numbers. We need to test the accuracy. |
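To make the log-then-fixed-point idea concrete, here is a minimal sketch of one way it could work. It assumes strictly positive flux values and 16-bit unsigned codes; the function names are hypothetical and this is not necessarily the encoding that was eventually used. FITS itself supports scaled integer storage through the BSCALE and BZERO keywords, which would be a natural place to keep the offset and scale.

```python
import math
from array import array


def encode_log_fixed_point(flux):
    """Quantize log10(flux) onto 16-bit unsigned integers.

    Assumes every flux value is strictly positive (hypothetical helper).
    Returns the integer codes plus the offset and scale needed to decode.
    """
    logs = [math.log10(f) for f in flux]
    lo, hi = min(logs), max(logs)
    levels = (1 << 16) - 1  # largest 16-bit unsigned code
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = array("H", (round((v - lo) / scale) for v in logs))
    return codes, lo, scale


def decode_log_fixed_point(codes, offset, scale):
    """Invert encode_log_fixed_point, returning approximate flux values."""
    return [10.0 ** (offset + c * scale) for c in codes]
```

Comparing the decoded values with the originals gives the relative error introduced by quantization, which is the accuracy test mentioned above.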
| Comment by price [ 26/Oct/21 ] |
|
We should also use WCS to do the wavelengths if we can. |
| Comment by Takuji Yamashita [ 26/Oct/21 ] |
|
The spectra are saved in FITS again. We can use WCS for wavelengths. |
| Comment by sogo.mineo [ 26/Oct/21 ] |
|
The size estimate in the description does not include a wavelength column. The FITS files contain flux only; wavelengths are computed by means of WCS. |
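As an illustration of computing wavelengths from WCS rather than storing them, the sketch below assumes a simple linear axis described by the standard FITS keywords CRVAL1, CDELT1 and CRPIX1. The header is represented as a plain dictionary with hypothetical values; the real template files may use a different WCS (for example, one that is linear in log wavelength).

```python
def wavelengths_from_linear_wcs(header, n_pixels):
    """Return the wavelength of each pixel for a linear 1-D WCS.

    FITS pixel indices are 1-based, so 0-based pixel i maps to
    CRVAL1 + CDELT1 * (i + 1 - CRPIX1).
    """
    crval = header["CRVAL1"]           # wavelength at the reference pixel
    cdelt = header["CDELT1"]           # wavelength step per pixel
    crpix = header.get("CRPIX1", 1.0)  # reference pixel (1-based)
    return [crval + cdelt * (i + 1 - crpix) for i in range(n_pixels)]


# Hypothetical header values, for illustration only.
example_header = {"CRVAL1": 300.0, "CDELT1": 0.5, "CRPIX1": 1.0}
example_wavelengths = wavelengths_from_linear_wcs(example_header, 4)
```

With this scheme only three header values describe the whole wavelength grid, so the file size is set by the flux array alone.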
| Comment by hassan [ 27/Oct/21 ] |
|
Missing important comment from @rhl:
That way, the access and processing software are decoupled from the stored data format. |
| Comment by hassan [ 27/Oct/21 ] |
|
Discussed this a little further with rhl and price. Is it possible to store the data on a server somewhere? That would be easier to manage than to store the data under git-lfs. |
| Comment by sogo.mineo [ 27/Oct/21 ] |
|
We can indeed put the heavy things on hscdata.mtk.nao.ac.jp, for example. The problem is how to let the test process see them. I would like to hook the first call to getModelSpectrum() and get all spectra downloaded into ... some directory. I don't want to use /tmp, since astropy does this and fills up the limited capacity of /tmp all too soon; the process is then killed, leaving the system unstable. Another solution might be to make valid a path starting with https:// and download the models one by one every time they are requested, just as we actually open and read the model files one by one every time they are requested. |
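The idea of accepting a path that starts with https:// could be sketched roughly as follows. The helper name fetch_model and its cache_dir argument are hypothetical and are not part of drp_stella. Unlike the plan above, this sketch keeps each downloaded file in a caller-chosen directory (deliberately not /tmp), so a model is fetched at most once.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_model(path_or_url, cache_dir):
    """Return a local path for a model file, downloading it if necessary.

    Hypothetical helper: plain paths are returned unchanged, while
    http(s) URLs are downloaded once into cache_dir and then reused.
    """
    if not path_or_url.startswith(("http://", "https://")):
        return Path(path_or_url)

    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / path_or_url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download to a temporary name first, so that an interrupted
        # transfer never leaves a truncated file under the final name.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(path_or_url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    return target
```

Dropping the cache and streaming every request would match the "one by one every time" wording more literally, at the cost of repeated network traffic.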
| Comment by sogo.mineo [ 27/Oct/21 ] |
|
I would like to take the last route ("Another solution might be to make valid a path...") because it is the easiest thing to do. If I take this route, we may no longer have to reduce the model size and could instead use all 60k models, which was Yamashita-san's first plan. One problem is that the task of making calibration references will take a few hours to process a single fiber (even if the models are in local storage), so a unit test will take at least a few hours to complete. |
| Comment by rhl [ 28/Oct/21 ] |
|
I was assuming that the test data would just be a dependency, so it'd be installed once (using as it were curl) and then used whenever you run the tests. |
| Comment by sogo.mineo [ 28/Oct/21 ] |
|
Then what I have to do is:
Do I understand correctly? |
| Comment by price [ 01/Nov/21 ] |
|
Yes, that's great. Please be sure to include a README file that explains what the data are and where they came from, and include a version string (usually the date) in the directory name. You're welcome to put it on the tiger cluster at Princeton (e.g., /projects/HSC/PFS/fluxCal/fluxCal-20211101), and we can serve it via http from there. |
| Comment by sogo.mineo [ 02/Nov/21 ] |
|
I tentatively created a package just now, but I am not sure whether the synthetic spectra are redistributable. I am now checking this. |
| Comment by sogo.mineo [ 22/Nov/21 ] |
|
Tanaka-san said the author has given us permission to redistribute the synthetic spectra. Yamashita-san found some flaws in the conversion of the original spectra to the format he uses. He is now recreating the data files. |
| Comment by sogo.mineo [ 06/Dec/21 ] |
|
I have uploaded the smaller dataset here: https://hscdata.mtk.nao.ac.jp/hsc_bin_dist/pfs/fluxmodeldata-ambre-20190419-small.tar.xz . The full dataset has not been completed yet. We found that we had to add more spectra to the dataset, and those spectra have yet to be made. We must also examine whether the tremendous size of the full dataset and its eon-long execution time really contribute to the accuracy of the calibration task. |
| Comment by sogo.mineo [ 07/Dec/21 ] |
|
If the above dataset is approved and installed on the server where the tests run, I would like to push changes to drp_stella and drp_pfs_data that are named after this issue. With these changes, the broadband photometry table referred to by FitBroadbandSEDTask is moved from drp_pfs_data to the above package. The broadband photometry table must reside close to the spectrum set because the two must match each other. |
| Comment by Takuji Yamashita [ 14/Dec/21 ] |
|
We can close this ticket because we have discussed this issue and Mineo-san has uploaded the model template dataset. I will file two new tickets for the work Mineo-san described above: |
| Comment by price [ 18/Dec/21 ] |
|
I've retrieved the tarball listed above, and placed it in /projects/HSC/PFS/fluxCal on our Tiger cluster. It looks good to me. |
| Comment by sogo.mineo [ 20/Dec/21 ] |
|
I have made two PRs. One makes drp_stella depend on the fluxmodeldata package; the other removes the photometry table from drp_pfs_data. The former change should be made before the latter. Could you review them? |
| Comment by price [ 21/Dec/21 ] |
|
I don't think we can require that every installation of the pipeline contains a 2.4 GB data package. You should make the data package setupOptional in the table file, and protect the tests with checks, e.g., here. |
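A test guarded in this way might look like the sketch below. It assumes that EUPS exports a FLUXMODELDATA_DIR environment variable when the package is set up, and that the table file declares the dependency as setupOptional(fluxmodeldata). The helper and the test class are hypothetical and are not the actual drp_stella tests.

```python
import os
import unittest


def package_dir(product, environ=os.environ):
    """Return the directory of a set-up package, or None if it is absent.

    Assumption: setting up a product exports <PRODUCT>_DIR in the environment.
    """
    return environ.get(product.upper() + "_DIR")


FLUXMODELDATA_DIR = package_dir("fluxmodeldata")


@unittest.skipIf(FLUXMODELDATA_DIR is None, "fluxmodeldata is not set up")
class FluxModelDataTestCase(unittest.TestCase):
    """Runs only where the optional data package is installed."""

    def testPackageDirExists(self):
        self.assertTrue(os.path.isdir(FLUXMODELDATA_DIR))
```

With the decorator on the class, installations without the data package report the tests as skipped rather than failed.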
| Comment by sogo.mineo [ 21/Dec/21 ] |
|
I changed fluxmodeldata from required to optional. Could you review the newly pushed patch? |
| Comment by price [ 22/Dec/21 ] |
|
Awesome, thanks! |
| Comment by sogo.mineo [ 22/Dec/21 ] |
|
Thanks for the review. I merged my two pull requests to master. |