[DAMD-92] Update 1D pipeline DM with multi-type candidates Created: 27/Oct/20 Updated: 29/Jan/22 Resolved: 12/Jan/22
| Status: | Done |
| Project: | Data Model |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Story | Priority: | Normal |
| Reporter: | Pierre-Yves CHABAUD | Assignee: | Pierre-Yves CHABAUD |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Sprint: | 2DDRP-2021 A, 2DDRP-2021 A 6, 2DDRP-2021 A 7, 2DDRP-2021 A 8 |
| Description |
The 1D pipeline computes the probability that a source is a galaxy, a star or a QSO. The current data model provides redshift candidates only for the type with the highest probability.
This story covers a major update of the 1D pipeline DM: the goal is to include in the DM the candidates for every type, whatever the probability of the source being a galaxy, a star or a QSO.
We propose the following DM:
HDU#0 PHDU
HDU#1 GALAXY_CANDIDATES Binary table [FITS BINARY TABLE] NGALAXY_CANDIDATES
Columns:
CRANK INT32 Rank of galaxy candidate (best=0)
Z FLOAT Redshift
Z_ERR FLOAT Redshift error
Z_PROBA FLOAT XXXX (dz=+/-3.e-3)
SUBCLASS STR Subclassifications
CFILE STR Continuum template file name
LFILE STR Line catalog ratio template file name
MODELFLUX FLOAT[NROW] Spectrum model (unit: nJy)
HDU#2 GALAXY_PDF Binary table [FITS BINARY TABLE] NPIX
Columns:
PDF FLOAT[NROW] PDF marginalised over all models
HDU#3 GALAXY_LINES Binary table [FITS BINARY TABLE] NLINE
Columns:
LINENAME STR Line name
LINEWAVE FLOAT Catalog wavelength for this line in vacuum (unit: nm)
LINEZ FLOAT Redshift
LINEZ_ERR FLOAT Redshift error
LINESIGMA FLOAT Gaussian width (unit: nm)
LINESIGMA_ERR FLOAT Error in Gaussian width
LINEVEL FLOAT Gaussian width (unit: km/s)
LINEVEL_ERR FLOAT Error in Gaussian width
LINEFLUX FLOAT Area of Gaussian fit (unit: erg/cm^2/s)
LINEFLUX_ERR FLOAT Flux error
LINEEW FLOAT Equivalent width (unit: nm)
LINEEW_ERR FLOAT Equivalent width error
LINECONTLEVEL FLOAT Continuum level at line center (unit: nJy)
LINECONTLEVEL_ERR FLOAT Error in continuum level at line center
HDU#4 QSO_CANDIDATES Binary table [FITS BINARY TABLE] NQSO_CANDIDATES
Columns:
CRANK INT32 Rank of QSO candidate (best=0)
Z FLOAT Redshift
Z_ERR FLOAT Redshift error
Z_PROBA FLOAT XXXX (dz=+/-3.e-3)
SUBCLASS STR Subclassifications
MODELFLUX FLOAT[NROW] Spectrum model (unit: nJy)
HDU#5 QSO_PDF Binary table [FITS BINARY TABLE] NPIX
Columns:
PDF FLOAT PDF marginalised over all models
HDU#6 QSO_LINES Binary table [FITS BINARY TABLE] NLINE
Columns:
LINENAME STR Line name
LINEWAVE FLOAT Catalog wavelength for this line in vacuum (unit: nm)
LINEZ FLOAT Redshift
LINEZ_ERR FLOAT Redshift error
LINESIGMA FLOAT Gaussian width (unit: nm)
LINESIGMA_ERR FLOAT Error in Gaussian width
LINEVEL FLOAT Gaussian width (unit: km/s)
LINEVEL_ERR FLOAT Error in Gaussian width
LINEFLUX FLOAT Area of Gaussian fit (unit: erg/cm^2/s)
LINEFLUX_ERR FLOAT Flux error
LINEEW FLOAT Equivalent width (unit: nm)
LINEEW_ERR FLOAT Equivalent width error
LINECONTLEVEL FLOAT Continuum level at line center (unit: nJy)
LINECONTLEVEL_ERR FLOAT Error in continuum level at line center
HDU#7 STAR_CANDIDATES Binary table [FITS BINARY TABLE] NSTAR_CANDIDATES
Columns:
CRANK INT32 Rank of star candidate (best=0)
V FLOAT Star velocity (unit: km/s)
V_ERR FLOAT Star velocity error
T_PROBA FLOAT XXXX
SUBCLASS STR Subclassifications
TFILE STR Template file name
MODELFLUX FLOAT[NROW] Spectrum model (unit: nJy)
HDU#8 STAR_PDF Binary table [FITS BINARY TABLE] NPIX
Columns:
PDF FLOAT PDF related to best model
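To make the proposed layout easy to check against real files, it can be captured as plain data. The sketch below is a minimal, hypothetical helper (the names `HduSpec`, `PROPOSED_LAYOUT`, `extension_index` and `check_columns` are illustrative, not part of the datamodel package); it only encodes the HDU order and column names listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HduSpec:
    index: int      # HDU number in the proposal
    extname: str    # EXTNAME of the binary table
    nrows: str      # symbolic row count, as written in the proposal
    columns: tuple  # expected column names, in order

# Line-measurement columns shared by GALAXY_LINES and QSO_LINES.
LINE_COLUMNS = (
    "LINENAME", "LINEWAVE", "LINEZ", "LINEZ_ERR", "LINESIGMA", "LINESIGMA_ERR",
    "LINEVEL", "LINEVEL_ERR", "LINEFLUX", "LINEFLUX_ERR", "LINEEW", "LINEEW_ERR",
    "LINECONTLEVEL", "LINECONTLEVEL_ERR",
)

PROPOSED_LAYOUT = (
    HduSpec(1, "GALAXY_CANDIDATES", "NGALAXY_CANDIDATES",
            ("CRANK", "Z", "Z_ERR", "Z_PROBA", "SUBCLASS", "CFILE", "LFILE", "MODELFLUX")),
    HduSpec(2, "GALAXY_PDF", "NPIX", ("PDF",)),
    HduSpec(3, "GALAXY_LINES", "NLINE", LINE_COLUMNS),
    HduSpec(4, "QSO_CANDIDATES", "NQSO_CANDIDATES",
            ("CRANK", "Z", "Z_ERR", "Z_PROBA", "SUBCLASS", "MODELFLUX")),
    HduSpec(5, "QSO_PDF", "NPIX", ("PDF",)),
    HduSpec(6, "QSO_LINES", "NLINE", LINE_COLUMNS),
    HduSpec(7, "STAR_CANDIDATES", "NSTAR_CANDIDATES",
            ("CRANK", "V", "V_ERR", "T_PROBA", "SUBCLASS", "TFILE", "MODELFLUX")),
    HduSpec(8, "STAR_PDF", "NPIX", ("PDF",)),
)

_BY_NAME = {spec.extname: spec for spec in PROPOSED_LAYOUT}

def extension_index(extname):
    """Return the proposed HDU number for an EXTNAME (KeyError if unknown)."""
    return _BY_NAME[extname].index

def check_columns(extname, present):
    """Return the expected columns of `extname` that are missing from `present`."""
    present = set(present)
    return [name for name in _BY_NAME[extname].columns if name not in present]
```

A reader could call `check_columns("STAR_CANDIDATES", table_column_names)` on each extension of a file and report anything missing; keeping the schema as data means a later revision of the DM only changes `PROPOSED_LAYOUT`.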
| Comments |
| Comment by Pierre-Yves CHABAUD [ 27/Oct/20 ] |
The header of the FITS file should contain the version of the pipeline used to process the spectrum.
| Comment by rhl [ 28/Oct/20 ] |
How are you planning to track all the configuration parameters?
| Comment by Pierre-Yves CHABAUD [ 28/Oct/20 ] |
rhl It's a really good question. There may be a lot of parameters, and they are not easy to represent: the current parameter file uses a structured JSON format, whereas a FITS header is flat.
Another solution could be to track only the parameters that differ from the defaults, since the default values are embedded in the pipeline version. But this implies a dynamic header format (a keyword may or may not appear), and I'm not sure that is a good idea.
How do you manage that on the 2D side?
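The "only non-default parameters" option can be sketched as two steps: flatten the nested JSON into dotted keys, then keep the keys whose value differs from the defaults. This is a minimal illustration, not the pipeline's code; the function names and the parameter keys in the example (`galaxy.redshiftrange`, `galaxy.method`, `star.enabled`) are hypothetical.

```python
import json

def flatten(params, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

_MISSING = object()  # sentinel: a key absent from the defaults always counts as an override

def non_default_parameters(params, defaults):
    """Return the flattened parameters whose value differs from (or is absent in) the defaults."""
    flat_defaults = flatten(defaults)
    return {
        name: value
        for name, value in flatten(params).items()
        if flat_defaults.get(name, _MISSING) != value
    }

# Hypothetical parameter files.
defaults = json.loads('{"galaxy": {"redshiftrange": [0.0, 6.0], "method": "linemodel"},'
                      ' "star": {"enabled": true}}')
params = json.loads('{"galaxy": {"redshiftrange": [0.0, 3.0], "method": "linemodel"},'
                    ' "star": {"enabled": true}}')
overrides = non_default_parameters(params, defaults)
```

Here `overrides` holds only `galaxy.redshiftrange`. Note that standard FITS keyword names are limited to 8 characters, so dotted names like these would need the HIERARCH convention or a separate table, and the set of keywords still changes from file to file, which is the concern raised above.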
| Comment by Masayuki Tanaka [ 28/Oct/20 ] |
I agree with having a PDF for each type. In addition, science users will need P(galaxy), P(QSO), and P(star). Is that information included in the primary header?
| Comment by Pierre-Yves CHABAUD [ 28/Oct/20 ] |
Masayuki Tanaka You're absolutely right. I have updated the PHDU with the CLASS keyword (GALAXY, QSO or STAR) and the computed values of P(galaxy), P(QSO) and P(star):
HDU#0 PHDU
CLASS KEYWORD Spectro classification: GALAXY, QSO, STAR
P_GALAXY KEYWORD Probability to be a galaxy
P_QSO KEYWORD Probability to be a QSO
P_STAR KEYWORD Probability to be a star
| Comment by Pierre-Yves CHABAUD [ 11/Nov/20 ] |
In accordance with the proposals of Masayuki Tanaka and price, I have updated the PHDU with the pipeline and product versions. I have also moved the classification results to a dedicated CLASSIFICATION HDU:
HDU#0 PHDU
D1D_VER KEYWORD Version of the DRP_1D library
D1DP_VER KEYWORD Version of the DRP_1DPIPE pipeline
DAMD_VER KEYWORD Version of the data model
PAR_FILE KEYWORD Parameters file name
HDU#1 CLASSIFICATION
CLASS KEYWORD Spectro classification: GALAXY, QSO, STAR
P_GALAXY KEYWORD Probability to be a galaxy
P_QSO KEYWORD Probability to be a QSO
P_STAR KEYWORD Probability to be a star
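The CLASSIFICATION keywords can be filled from the three probabilities. The sketch below assumes, without confirmation from the ticket, that P_GALAXY + P_QSO + P_STAR = 1 and that CLASS is the most probable type (matching the description of the current DM); `classification_keywords` and `PROPOSED_KEYWORDS` are hypothetical names. It also checks that every proposed keyword fits the 8-character limit for standard FITS keyword names.

```python
import math

CLASSES = ("GALAXY", "QSO", "STAR")

# Keyword names proposed for the PHDU and the CLASSIFICATION HDU.
PROPOSED_KEYWORDS = ("D1D_VER", "D1DP_VER", "DAMD_VER", "PAR_FILE",
                     "CLASS", "P_GALAXY", "P_QSO", "P_STAR")

def classification_keywords(p_galaxy, p_qso, p_star, tol=1e-6):
    """Build the CLASSIFICATION keywords from the three class probabilities."""
    probs = {"GALAXY": p_galaxy, "QSO": p_qso, "STAR": p_star}
    if any(p < 0.0 for p in probs.values()):
        raise ValueError("probabilities must be non-negative")
    # Assumption: the three class probabilities are normalized.
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=tol):
        raise ValueError("P_GALAXY + P_QSO + P_STAR should be 1")
    # Assumption: CLASS is the most probable type; ties resolve in CLASSES order.
    best = max(CLASSES, key=probs.__getitem__)
    return {"CLASS": best, "P_GALAXY": p_galaxy, "P_QSO": p_qso, "P_STAR": p_star}

# Standard FITS keyword names are limited to 8 characters.
assert all(len(name) <= 8 for name in PROPOSED_KEYWORDS)
```

For example, `classification_keywords(0.7, 0.2, 0.1)` yields CLASS = "GALAXY" together with the three probabilities, ready to be written as header cards.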
| Comment by Ali Allaoui [ 26/Nov/20 ] |
Regarding line measurements, which unit should we use for the flux: erg/cm^2/s or 10^-35 W/m2/Hz (more consistent with nJy)?
| Comment by rhl [ 01/Dec/20 ] |
Why not just use nJy?
| Comment by vlebrun [ 01/Dec/20 ] |
nJy is the unit for the spectrum flux density. For the line flux, we have to integrate over the line profile, which gives a flux in W/m2 (or any other multiple, like the vintage erg/cm2/s, or nJy.Hz, which I have never seen used, but we could be pioneers on that one).
| Comment by rhl [ 01/Dec/20 ] |
Right; I was being stupid. So you meant to propose 10^-35 W/m2 (not "/Hz")?
| Comment by vlebrun [ 01/Dec/20 ] |
Well, the original mistake was mine, actually.
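The unit discussion above comes down to one conversion factor. Since 1 Jy = 1e-26 W m^-2 Hz^-1, 1 nJy = 1e-35 W m^-2 Hz^-1, so integrating a profile in nJy over frequency in Hz gives nJy.Hz = 1e-35 W/m2. With 1 erg = 1e-7 J and 1 cm^2 = 1e-4 m^2, 1 erg/cm^2/s = 1e-3 W/m2 = 1e32 nJy.Hz. A minimal sketch (the function name is illustrative):

```python
ERG_PER_CM2_S_IN_W_PER_M2 = 1e-7 / 1e-4  # 1 erg = 1e-7 J, 1 cm^2 = 1e-4 m^2
NJY_IN_W_PER_M2_PER_HZ = 1e-9 * 1e-26    # 1 nJy = 1e-9 Jy, 1 Jy = 1e-26 W m^-2 Hz^-1

def erg_cm2_s_to_njy_hz(flux):
    """Convert a line flux from erg/cm^2/s to nJy.Hz (i.e. units of 1e-35 W/m^2)."""
    return flux * ERG_PER_CM2_S_IN_W_PER_M2 / NJY_IN_W_PER_M2_PER_HZ

def njy_hz_to_erg_cm2_s(flux):
    """Inverse conversion, from nJy.Hz back to erg/cm^2/s."""
    return flux * NJY_IN_W_PER_M2_PER_HZ / ERG_PER_CM2_S_IN_W_PER_M2
```

A typical faint emission line of 1e-17 erg/cm^2/s is therefore about 1e15 nJy.Hz, which shows that either choice stays well within float range.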
| Comment by Pierre-Yves CHABAUD [ 01/Dec/20 ] |
In the description of the "PDF" HDUs, the header will be changed from TTYPE2 = 'PDF' to TTYPE2 = 'ln PDF' to avoid future confusion.
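Because the column holds the natural log of the PDF, a user has to exponentiate before integrating. The sketch below assumes the PDF HDU also provides a redshift grid (possibly non-uniform, e.g. logarithmic) and integrates exp(ln PDF) with the trapezoidal rule over a window such as the ±3e-3 mentioned next to Z_PROBA; it is illustrative only and not necessarily how Z_PROBA is computed by the pipeline. `window_probability` is a hypothetical name.

```python
import math

def window_probability(redshifts, ln_pdf, z_center, half_width):
    """Fraction of the PDF (given as ln PDF on a redshift grid) inside z_center +/- half_width.

    Uses the trapezoidal rule and only counts grid intervals lying entirely inside
    the window, so the result is coarse when the window spans few grid intervals.
    """
    if len(redshifts) != len(ln_pdf) or len(redshifts) < 2:
        raise ValueError("need matching grids with at least two points")
    # Shift by the maximum before exponentiating so exp() does not underflow.
    peak = max(ln_pdf)
    pdf = [math.exp(v - peak) for v in ln_pdf]

    def integrate(lo, hi):
        total = 0.0
        for i in range(len(redshifts) - 1):
            z0, z1 = redshifts[i], redshifts[i + 1]
            if lo <= z0 and z1 <= hi:
                total += 0.5 * (pdf[i] + pdf[i + 1]) * (z1 - z0)
        return total

    # Normalize by the integral over the whole grid (the shift by `peak` cancels).
    norm = integrate(-math.inf, math.inf)
    return integrate(z_center - half_width, z_center + half_width) / norm

# Hypothetical Gaussian ln PDF centred on z = 0.5.
z_grid = [i / 1000 for i in range(1001)]
ln_pdf = [-0.5 * ((z - 0.5) / 0.01) ** 2 for z in z_grid]
p_window = window_probability(z_grid, ln_pdf, 0.5, 3e-3)
```

Subtracting the maximum before `exp` matters in practice: ln PDF values far from the peak can be large negative numbers whose exponential underflows to zero.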
| Comment by Ali Allaoui [ 29/Jan/21 ] |
Current proposal for the datamodel (from https://github.com/Subaru-PFS/datamodel/blob/tickets/DAMD-92/datamodel.txt):
HDU#0 PHDU
HDU#1 CLASSIFICATION
HDU#1 GALAXY_CANDIDATES Binary table [FITS BINARY TABLE] NGALAXY_CANDIDATES
HDU#2 GALAXY_PDF Binary table [FITS BINARY TABLE] NPIX
HDU#3 GALAXY_LINES Binary table [FITS BINARY TABLE] NLINE
HDU#4 QSO_CANDIDATES Binary table [FITS BINARY TABLE] NQSO_CANDIDATES
HDU#5 QSO_PDF Binary table [FITS BINARY TABLE] NPIX
HDU#6 QSO_LINES Binary table [FITS BINARY TABLE] NLINE
HDU#7 STAR_CANDIDATES Binary table [FITS BINARY TABLE] NSTAR_CANDIDATES
HDU#8 STAR_PDF Binary table [FITS BINARY TABLE] NPIX
| Comment by Masayuki Tanaka [ 06/Jul/21 ] |
I am terribly sorry for being extremely slow. Morishima-san at NAOJ (a database person) and I carefully looked at the datamodel as well as real pfsZCandidates files from drp_1dpipe (ver. 0.20.1) + drp_1d (ed36a29) + datamodel (w.2021.26). The majority of the work was kindly done by Morishima-san, and thanks go to him.
HDU #0: We see that TRACT, PATCH, CATID, OBJID, NVISIT and VHASH are gone. They are in the file names, but a user may well rename the file, so it is easy to lose this key information. We prefer to have them in the primary header. I discussed with Ali offline, and I think it would be good to explicitly have the line list version in the header. The keyword PAR_FILE may not be very informative; we could put the configs defined in PAR_FILE in the header if there are not too many of them. Finally, this is not a datamodel issue, but where can I find the definition of Z_WARNING?
HDU #2: Where can I find the wavelengths of the best-fit model? I guess you used CRPIX, CRVAL, CD from pfsObject, but it would be nice to copy them to pfsZCandidates. Also, it might make sense to have the galaxy and QSO PDFs in the same HDU so they can share the REDSHIFT column. By the way, I do not understand the numbers in the REDSHIFT column (you can look at the sample files in datapack-0.20.1/validation). This is not a datamodel issue, though.
HDU #4: It would be nice to be explicit about the units in the header (LINEWAVE is nm, LINEVEL is km/s). I see that some of the Balmer lines are measured twice: once in absorption and once in emission. Is this intended?
HDU #7: What does T_PROBA mean? Is it an integrated probability? If so, over what range?
HDU #8: REDSHIFT does not mean much for stars. Maybe we can use RADIAL_VELOCITY (or simply VELOCITY) instead? The explanation of PDF should be 'PDF marginalized over all models', no?
HDU #9: Do you plan to make line measurements for stars? If so, they should be included here.
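The unit question for the line HDUs is easiest to see from how the two width columns relate. Assuming LINESIGMA is the Gaussian width measured in the observed frame and LINEWAVE is the rest (vacuum catalog) wavelength, the velocity width is sigma_v = c * sigma_lambda / (LINEWAVE * (1 + LINEZ)). This is a sketch of that standard relation, not a statement of how drp_1d computes LINEVEL; the function names are illustrative. In the file itself, the units would naturally go in the standard FITS TUNITn keywords of each column.

```python
C_KM_S = 299_792.458  # speed of light in vacuum, km/s

def sigma_nm_to_kms(sigma_nm, rest_wave_nm, z):
    """Observed-frame Gaussian width (nm) -> velocity width (km/s)."""
    observed_wave_nm = rest_wave_nm * (1.0 + z)
    return C_KM_S * sigma_nm / observed_wave_nm

def sigma_kms_to_nm(sigma_kms, rest_wave_nm, z):
    """Velocity width (km/s) -> observed-frame Gaussian width (nm)."""
    observed_wave_nm = rest_wave_nm * (1.0 + z)
    return sigma_kms * observed_wave_nm / C_KM_S
```

For instance, a 1 nm width on a line with a 500 nm rest wavelength observed at z = 1 (1000 nm observed) corresponds to about 300 km/s.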
| Comment by Ali Allaoui [ 09/Sep/21 ] |
First of all, you can find the correct definition of the datamodel here
HDU GALAXY_CANDIDATES: Indeed, what would be useful is a goodness-of-fit estimate, the reduced chi-square being one possibility. We are currently investigating several criteria. You are right that we missed that column for each candidate (HDU #2, HDU #4 and HDU #7); could it be named "reliability" or "goodness_fit"?
HDU GALAXY_CANDIDATES model: We use the WavelengthArray from PfsObject. We can add CRPIX, CRVAL, CD from pfsObject to HDU #0 (as they are common to HDU #1, #4 and #7).
HDU GALAXY_PDF: The galaxy and QSO PDFs do not share the redshift column, as the solvers work on different redshift ranges. They are on a logarithmic scale.
HDU GALAXY_LINES: Yes, this is intended. These lines can be in emission or absorption, possibly simultaneously (with different velocity dispersions).
HDU STAR_CANDIDATES: It is the a posteriori probability of the star template. The probabilities of all templates sum to 1, so the description needs to be changed.
HDU STAR_PDF: The name has been changed. The explanation is correct.
Line measurements for stars are not planned.
| Comment by Masayuki Tanaka [ 18/Sep/21 ] |
Thank you, Ali! I am sorry for the delay, but the changes look good to me. If you expect the meaning of Z_WARNING to evolve with time, then it would be a good idea to have the definition in the FITS header. If not, a README is perhaps fine (but we could of course have it in the header even in this case, so the user does not have to go back to the README). I do not have a strong opinion on reliability vs goodness_fit. You can pick the one that best describes the parameter you choose to use.
| Comment by hassan [ 18/Dec/21 ] |
Any updates on this, Ali Allaoui?
| Comment by Ali Allaoui [ 12/Jan/22 ] |
branch tickets/
| Comment by hassan [ 14/Jan/22 ] |
ticket branch merged to master. |