[DAMD-127] Definition of pfsReference Created: 21/Dec/21  Updated: 15/Jul/22  Resolved: 18/Feb/22

Status: Done
Project: Data Model
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Normal
Reporter: Takuji Yamashita Assignee: Takuji Yamashita
Resolution: Done Votes: 0
Labels: flux-calibration
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks PIPE2D-922 Write a main code to run the blue par... Done
Duplicate
is duplicated by DAMD-43 Data model representation for flux ca... Won't Fix
Relates
relates to DAMD-128 Update the pfsReference class Done
Reviewers: hassan

 Description   

We have pfsReference, which includes the best-fit model template (or set of templates) of the flux standards. pfsReference is generated by `calculateReferenceFlux` and then passed to `fluxCalibrate`. Its datamodel definition has not yet been fixed and is not described in datamodel.txt, so we propose a definition here. This definition may change as flux calibration development continues, but we want to share our thinking and get your feedback. We will update datamodel.txt according to the discussion here.
The biggest change from the initial assumption in the documents is that pfsReference contains the set of model templates for a single visit. pfsReference was initially assumed to inherit from PfsSimpleSpectrum, which represents the spectrum of a single object (see https://github.com/Subaru-PFS/datamodel/blob/92c6865d560524971530f000162ebdc59b51308b/python/pfs/datamodel/drp.py#L31 ).

 
Our proposal:
 
Reference model
 
The reference model set is a product of `calculateReferenceFlux` and is stored in the pfsReference file. pfsReference is used in the `fluxCalibrate` procedure together with pfsMerged to produce flux-calibrated spectra. pfsReference should include the wavelength and flux of the reference spectra of standard stars. Optionally, information useful to developers for debugging is also included. We propose that pfsReference contain multiple objects from a visit, like pfsMerged.
 
The reference model set is saved to:
"pfsReference-%06d.fits" % (visit,)
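A minimal Python sketch of this naming convention, assuming only the format string quoted above. The parsing helper and its regular expression are hypothetical additions for illustration, not part of the datamodel package. Note that `%06d` zero-pads to six digits but does not truncate, so visits of 1,000,000 or more produce longer numbers; the pattern allows for that.

```python
import re

# Filename template quoted in the proposal: zero-padded six-digit visit number.
REFERENCE_TEMPLATE = "pfsReference-%06d.fits"

# Hypothetical pattern for recovering the visit from a filename.
# "%06d" pads to at least six digits, so allow six or more.
_REFERENCE_RE = re.compile(r"^pfsReference-(\d{6,})\.fits$")


def reference_filename(visit: int) -> str:
    """Return the pfsReference filename for a visit."""
    return REFERENCE_TEMPLATE % (visit,)


def parse_reference_filename(filename: str) -> int:
    """Return the visit number encoded in a pfsReference filename."""
    match = _REFERENCE_RE.match(filename)
    if match is None:
        raise ValueError(f"Not a pfsReference filename: {filename!r}")
    return int(match.group(1))


if __name__ == "__main__":
    name = reference_filename(123)
    print(name)  # pfsReference-000123.fits
    print(parse_reference_filename(name))  # 123
```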
 
The file has several HDUs:
 
HDU #0 PDU
HDU #1 FIBERID Fiber identifier [32-bit INT] NFIBER
HDU #2 WAVELENGTH Wavelength in units of nm [FLOAT] NROW*NFIBER
HDU #3 FLUX Flux of reference models in units of nJy [FLOAT] NROW*NFIBER
HDU #4 FITFLAG Flag if the fitting was successful or not [32-bit INT] NFIBER
HDU #5 PDF PDFs and Chi-square of fitting [FLOAT] NFIBER*6 optional
HDU #6 METADATA Table of by-products [BINARY TABLE] optional
 
PDU includes the visit number.
 
HDU #4 holds a flag indicating whether the fitting process succeeded. The flag also encodes the causes of failure.
 
HDUs #5 and #6 are for debugging. HDU #5 contains the probability distribution functions and chi-squares for three types of fit: broad-band SED fitting, spectral fitting, and the combination of the two. HDU #6 is a binary table of by-products of the `calculateReferenceFlux` process, including at least the stellar parameters of the reference models, the estimated radial velocity, the parameters of the continuum fit, the scaling factor, and the Galactic extinction.
 



 Comments   
Comment by Takuji Yamashita [ 21/Dec/21 ]

And, in this case, how do we implement the pfsReference class? Do we inherit from PfsFiberArraySet and then add new HDUs and dedicated methods (e.g., reading/writing a FITS file), or do we make a new class for it?

Comment by price [ 22/Dec/21 ]

Don't worry about the implementation for now. It's probably a new class, but it shouldn't be hard to put together once we've agreed on the datamodel.

1. Can we use the same wavelength scale for each fiber, and save a large fraction of the space? I expect all the models use a common wavelength scale.
2. How many flags do we need? It sounds like it's either success or failure, in which case FITFLAG should use booleans rather than 32-bit integers.
3. How do you fit a full probability distribution function and chi^2 into six floats per fiber?
4. Let's not use the name METADATA for HDU#6: too generic. It seems to me that these are the fit parameters (so, FITPARAMS?). Is there a reason we can't put the PDFs in there too, as array columns?

Comment by Takuji Yamashita [ 22/Dec/21 ]

1. You are right. We can use the same wavelength scale for all the fibers, and we will describe it with header keywords instead.
2. We can use booleans (success or failure), as you suggested.
3. Sorry, I made a mistake. The correct size is (the number of models) * (3 PDFs + 3 chi^2 arrays = 6) * NFIBER.
4. How about DEBUGDATA? It includes more than the fit parameters; for example, it includes the estimated radial velocity. We cannot put the PDFs in it as well, because the dimensions of HDU #5 differ from those of HDU #6.
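The reply does not say which header keywords will describe the shared wavelength scale. As a sketch only, assuming the standard FITS linear WCS keywords `CRVAL1`, `CDELT1` and `CRPIX1` (an assumption, not something agreed in this ticket), a reader could rebuild the common grid like this:

```python
from typing import List, Mapping


def wavelength_from_header(header: Mapping[str, float], num_pixels: int) -> List[float]:
    """Reconstruct a linear wavelength grid (nm) shared by all fibers.

    Assumes standard FITS linear WCS keywords (hypothetical choice here):
    CRVAL1 is the wavelength at reference pixel CRPIX1 (1-based) and
    CDELT1 is the wavelength step per pixel.
    """
    crval = float(header["CRVAL1"])
    cdelt = float(header["CDELT1"])
    crpix = float(header.get("CRPIX1", 1.0))  # FITS default reference pixel
    # FITS pixel coordinates are 1-based, so array index i is pixel i + 1.
    return [crval + cdelt * (i + 1 - crpix) for i in range(num_pixels)]


if __name__ == "__main__":
    header = {"CRVAL1": 400.0, "CDELT1": 0.5, "CRPIX1": 1.0}
    print(wavelength_from_header(header, 3))  # [400.0, 400.5, 401.0]
```

Three header cards replace an NROW*NFIBER array, which is where the space saving comes from. A non-linear grid (for example, log-linear sampling) would need different keywords.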

Comment by rhl [ 22/Dec/21 ]

I think you almost always end up wanting binary flags (or an enum) not a simple bool.
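rhl's suggestion can be illustrated with Python's `enum.IntFlag`: each failure cause gets its own bit, so a single 32-bit integer per fiber (as FITFLAG is typed in the proposal) still answers "did the fit succeed?" (the value is zero) while recording why it failed. The bit names below are hypothetical; the ticket does not define them.

```python
from enum import IntFlag
from typing import List


class FitFlag(IntFlag):
    """Hypothetical FITFLAG bits; actual assignments are not defined in this ticket."""

    OK = 0
    NO_CONVERGENCE = 1 << 0       # fit did not converge
    LOW_SNR = 1 << 1              # spectrum too noisy to fit
    NO_PHOTOMETRY = 1 << 2        # broad-band magnitudes unavailable
    BAD_RADIAL_VELOCITY = 1 << 3  # radial-velocity estimate failed


def is_success(value: int) -> bool:
    """A fit succeeded only if no failure bit is set."""
    return FitFlag(value) == FitFlag.OK


def causes(value: int) -> List[str]:
    """Names of the failure bits set in a FITFLAG value, in definition order."""
    return [
        name
        for name, member in FitFlag.__members__.items()
        if member.value and (value & member.value) == member.value
    ]


if __name__ == "__main__":
    flag = FitFlag.LOW_SNR | FitFlag.NO_PHOTOMETRY
    print(int(flag), is_success(flag), causes(flag))  # 6 False ['LOW_SNR', 'NO_PHOTOMETRY']
```

A plain boolean would lose the failure cause, whereas the bitmask keeps it at no extra storage cost for up to 32 independent causes.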

Comment by sogo.mineo [ 22/Dec/21 ]

I think that PfsReference may contain a small amount of "metadata" in the strict sense of "data that describes data", but "debug" information should go into another file. PfsReference should concisely contain only the information that is necessary and sufficient for later tasks, especially if you are worried about the size of the wavelength array. A "PDF" contains 60,000 floating-point numbers per fiber, which is already far larger than the wavelength array, and the proposal calls for 6 arrays of this size. A separate debugging file would be of no interest to ordinary users, and by making it a separate file, we can make its creation optional.
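A rough check of the sizes discussed here, assuming 32-bit floats (4 bytes per value) and taking the 60,000-values-per-array figure from this comment and the (number of models) * 6 * NFIBER layout from the previous reply. The fiber count is a free parameter, not a number from the ticket.

```python
BYTES_PER_FLOAT32 = 4  # assumption: PDFs and chi^2 stored as 32-bit floats


def pdf_hdu_bytes(num_models: int, num_fibers: int,
                  arrays_per_fiber: int = 6,
                  bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw data size of the proposed PDF HDU: NMODEL * 6 * NFIBER values.

    The 6 arrays are the 3 PDFs plus the 3 chi^2 arrays; FITS header and
    padding overhead is ignored.
    """
    return num_models * arrays_per_fiber * num_fibers * bytes_per_value


if __name__ == "__main__":
    per_fiber = pdf_hdu_bytes(num_models=60_000, num_fibers=1)
    print(per_fiber)  # 1440000 bytes, about 1.4 MB per fiber
```

At roughly 1.4 MB per fiber, the PDF HDU grows linearly with the number of flux standards in a visit, which is the motivation for moving it out of pfsReference.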

Comment by Takuji Yamashita [ 11/Jan/22 ]

I have discussed this with Mineo-san and Tanaka-san offline. The data volume of the PDFs is very large, and I do not expect the PDFs themselves to be useful to users. So I propose removing the PDF extension from pfsReference, and I have revised the pfsReference definition as follows. We can save the PDFs in a separate file for developers.
 
I also removed the wavelength HDU, renamed METADATA to FITPARAMS, and changed the dimensions of FLUX from NROW*NFIBER to (the number of pixels) * NFIBER.
 
 
 
Updated version
 
Reference model
 
The reference model set is a product of `calculateReferenceFlux` and is stored in the pfsReference file. pfsReference is used in the `fluxCalibrate` procedure together with pfsMerged to produce flux-calibrated spectra. pfsReference should include the wavelength and flux of the reference spectra of standard stars, as well as the fitting results. We propose that pfsReference contain multiple objects from a visit, like pfsMerged.
 
The reference model set is saved to:
"pfsReference-%06d.fits" % (visit,)
 
The file has several HDUs:
 
HDU #0 PDU
HDU #1 FIBERID Fiber identifier [32-bit INT] NFIBER
HDU #2 FLUX Flux of reference models in units of nJy [FLOAT] (the number of pixels of a model)*NFIBER
HDU #3 FITFLAG Flag if the fitting was successful or not [32-bit INT] NFIBER
HDU #4 FITPARAMS Table of by-products [BINARY TABLE]
 
PDU includes the visit number.
 
HDU #3 holds a flag indicating whether the fitting process succeeded. The flag also encodes the causes of failure.
 
HDU #4 is a binary table of by-products of the `calculateReferenceFlux` process. It includes at least the stellar parameters of the reference models, the estimated radial velocity, the parameters of the continuum fit, the scaling factor, and the Galactic extinction.
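To make the updated layout concrete, here is a hypothetical Python container that mirrors the data HDUs and checks their dimensions. It is not the `PfsReference` class later written for the datamodel package (see DAMD-128); the class and field names are illustrative, and treating FITPARAMS as one row per fiber is an assumption the text above does not state.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class PfsReferenceSketch:
    """Hypothetical in-memory mirror of the updated pfsReference HDUs.

    Only checks that the arrays have the dimensions listed in the proposal.
    """

    visit: int                   # PDU: visit number
    fiberId: List[int]           # HDU #1 FIBERID, length NFIBER
    flux: List[List[float]]      # HDU #2 FLUX (nJy), NFIBER rows of NPIXEL values
    fitFlag: List[int]           # HDU #3 FITFLAG, length NFIBER
    # HDU #4 FITPARAMS; assumption: one row (dict of columns) per fiber.
    fitParams: List[Dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        nFiber = len(self.fiberId)
        if len(self.flux) != nFiber or len(self.fitFlag) != nFiber:
            raise ValueError("FLUX and FITFLAG must have one entry per fiber")
        # All models share one wavelength scale, so every row has the same length.
        if len({len(row) for row in self.flux}) > 1:
            raise ValueError("All model spectra must have the same number of pixels")
        if self.fitParams and len(self.fitParams) != nFiber:
            raise ValueError("FITPARAMS must have one row per fiber")

    @property
    def shape(self) -> Tuple[int, int]:
        """(NFIBER, NPIXEL) of the FLUX array."""
        nPixel = len(self.flux[0]) if self.flux else 0
        return (len(self.fiberId), nPixel)
```

For example, `PfsReferenceSketch(visit=42, fiberId=[1, 2], flux=[[1.0, 2.0], [3.0, 4.0]], fitFlag=[0, 0]).shape` is `(2, 2)`, while a ragged FLUX array is rejected at construction time.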
 

 

Comment by Takuji Yamashita [ 28/Jan/22 ]

I would appreciate it if you could give me your comments on the updated version. If there is no further comment, we will go with this.

Comment by rhl [ 03/Feb/22 ]

Could we just save the parameters of the chosen reference star and provide a function to regenerate the reference spectrum? If we later wanted to generate denormalised products for users who only want to look at FITS files, that would be possible, but if they're using the sciDB, we'd just provide the code (similar to the way PSFs are handled in hsc).

Comment by Takuji Yamashita [ 04/Feb/22 ]

It is possible. One concern is that regenerating a set of model templates would take a user a certain amount of time, because the current code (RBF interpolation) needs several seconds per template.
The change you propose would require restructuring the current flux calibration code (spectral reddening and absolute scaling). Because this change has no scientific impact, I would like to go with a FITS-format pfsReference at this stage. At some point, we can change pfsReference to save only the parameters.

Comment by Takuji Yamashita [ 07/Feb/22 ]

I have added the pfsReference definition to datamodel.txt. I would like to ask you, or another relevant person, to review it. We need to make a new pfsReference class in the datamodel module; Mineo-san will be assigned that task in a separate branch.

Comment by hassan [ 15/Feb/22 ]

Proposed text looks fine to me. Minor comment added to pull request.

Comment by Takuji Yamashita [ 18/Feb/22 ]

The branch has been merged to master.

Generated at Sat Feb 10 15:34:32 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.