[DAMD-92] Update 1D pipeline DM with multi-type candidates Created: 27/Oct/20  Updated: 29/Jan/22  Resolved: 12/Jan/22

Status: Done
Project: Data Model
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Normal
Reporter: Pierre-Yves CHABAUD Assignee: Pierre-Yves CHABAUD
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to PIPE1D-25 Output pfsZcandidate*.fits files for ... In Review
Sprint: 2DDRP-2021 A, 2DDRP-2021 A 6, 2DDRP-2021 A 7, 2DDRP-2021 A 8

 Description   

The 1D pipeline computes the probability that a source is a galaxy, a star or a QSO. The current data model provides redshift candidates only for the type with the highest probability.

 

This story covers a major update of the 1D pipeline DM. The goal is to include in the DM all the candidates of every type (galaxy, star and QSO), whatever the probability of the source being of that type.

 

We propose the following DM:

HDU#0 PDU

HDU#1 GALAXY_CANDIDATES         Binary table            [FITS BINARY TABLE]     NGALAXY_CANDIDATES
        Columns for :
        CRANK                   INT32                   Rank of galaxy candidate (best=0)
        Z                       FLOAT                   Redshift
        Z_ERR                   FLOAT                   Redshift error
        Z_PROBA                 FLOAT                   XXXX (dz=+/-3.e-3)
        SUBCLASS                STR                     Subclassifications
        CFILE                   STR                     Continuum template file name
        LFILE                   STR                     Linecatalog ratio template file name
        MODELFLUX               FLOAT[NROW]             Spectrum model (unit: nJy)

HDU#2 GALAXY_PDF                Binary table            [FITS BINARY TABLE]     NPIX
        Columns for :
        PDF                     FLOAT[NROW]             PDF marginalised over all models

HDU#3 GALAXY_LINES              Binary table            [FITS BINARY TABLE]     NLINE
        Columns for :
        LINENAME                STR                     Line name
        LINEWAVE                FLOAT                   Catalog wavelength for this line in vacuum (unit: nm)
        LINEZ                   FLOAT                   Redshift
        LINEZ_ERR               FLOAT                   Redshift error
        LINESIGMA               FLOAT                   Gaussian width (unit: nm)
        LINESIGMA_ERR           FLOAT                   Error in gaussian width
        LINEVEL                 FLOAT                   Gaussian width (unit: km/sec)
        LINEVEL_ERR             FLOAT                   Error in gaussian width
        LINEFLUX                FLOAT                   Area in gaussian fit (unit: erg/cm^2/s)
        LINEFLUX_ERR            FLOAT                   Flux error
        LINEEW                  FLOAT                   Equivalent width (unit: nm)
        LINEEW_ERR              FLOAT                   Equivalent width error
        LINECONTLEVEL           FLOAT                   Continuum level at line center (unit: nJy)
        LINECONTLEVEL_ERR       FLOAT                   Error in continuum level at line center

HDU#4 QSO_CANDIDATES            Binary table            [FITS BINARY TABLE]     NQSO_CANDIDATES
        Columns for :
        CRANK                   INT32                   Rank of QSO candidate (best=0)
        Z                       FLOAT                   Redshift
        Z_ERR                   FLOAT                   Redshift error
        Z_PROBA                 FLOAT                   XXXX (dz=+/-3.e-3)
        SUBCLASS                STR                     Subclassifications
        MODELFLUX               FLOAT[NROW]             Spectrum model (unit: nJy)

HDU#5 QSO_PDF                   Binary table            [FITS BINARY TABLE]     NPIX
        Columns for :
        PDF                     FLOAT                   PDF marginalised over all models

HDU#6 QSO_LINES                 Binary table            [FITS BINARY TABLE]     NLINE
        Columns for :
        LINENAME                STR                     Line name
        LINEWAVE                FLOAT                   Catalog wavelength for this line in vacuum (unit: nm)
        LINEZ                   FLOAT                   Redshift
        LINEZ_ERR               FLOAT                   Redshift error
        LINESIGMA               FLOAT                   Gaussian width (unit: nm)
        LINESIGMA_ERR           FLOAT                   Error in gaussian width
        LINEVEL                 FLOAT                   Gaussian width (unit: km/sec)
        LINEVEL_ERR             FLOAT                   Error in gaussian width
        LINEFLUX                FLOAT                   Area in gaussian fit (unit: erg/cm^2/s)
        LINEFLUX_ERR            FLOAT                   Flux error
        LINEEW                  FLOAT                   Equivalent width (unit: nm)
        LINEEW_ERR              FLOAT                   Equivalent width error
        LINECONTLEVEL           FLOAT                   Continuum level at line center (unit: nJy)
        LINECONTLEVEL_ERR       FLOAT                   Error in continuum level at line center

HDU#7 STAR_CANDIDATES            Binary table            [FITS BINARY TABLE]        NSTAR_CANDIDATES
        Columns for :
        CRANK                   INT32                   Rank of star candidate (best=0)
        V                       FLOAT                   Star velocity (unit: kms-1)
        V_ERR                   FLOAT                   Star velocity error
        T_PROBA                 FLOAT                   XXXX
        SUBCLASS                STR                     Subclassifications
        TFILE                   STR                     Template file name
        MODELFLUX               FLOAT[NROW]             Spectrum model (unit: nJy)

HDU#8 STAR_PDF                   Binary table            [FITS BINARY TABLE]     NPIX
        Columns for :
        PDF                     FLOAT                   PDF related to best model


 Comments   
Comment by Pierre-Yves CHABAUD [ 27/Oct/20 ]

The header of the FITS file should contain the version of the pipeline used to process the spectrum.

Comment by rhl [ 28/Oct/20 ]

How are you planning to track all the configuration parameters?

Comment by Pierre-Yves CHABAUD [ 28/Oct/20 ]

rhl  It's a really good question. There might be a lot of parameters, and they are not easy to represent: the current parameter file is in a structured JSON format, whereas a FITS header is flat.

 

Another solution could be to track only the parameters that differ from their defaults, since the default values are fixed by the pipeline version. But this solution implies a dynamic header format (a keyword may or may not appear). I'm not sure that is such a good idea.
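
A minimal sketch of the "non-default parameters only" idea, in Python. It flattens a nested JSON-like parameter set into dotted keys and keeps only the entries that differ from the defaults. The parameter names and values below are hypothetical and are not taken from the actual drp_1dpipe parameter file.

```python
_MISSING = object()  # sentinel: key absent from the defaults


def flatten(params, prefix=""):
    """Flatten nested dicts into a flat {'a.b': value} mapping."""
    flat = {}
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def non_default(params, defaults):
    """Return only the flattened entries whose value differs from the default."""
    flat_defaults = flatten(defaults)
    return {
        key: value
        for key, value in flatten(params).items()
        if flat_defaults.get(key, _MISSING) != value
    }


# Hypothetical parameter sets, for illustration only.
defaults = {"galaxy": {"zmin": 0.0, "zmax": 6.0}, "star": {"enabled": True}}
params = {"galaxy": {"zmin": 0.0, "zmax": 4.0}, "star": {"enabled": True}}

changed = non_default(params, defaults)
print(changed)
```

Dotted keys like these are not valid standard FITS keywords (which are limited to 8 characters), so such a scheme would probably need a convention such as HIERARCH or a dedicated table HDU.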

 

How do you manage that on the 2D side?

Comment by Masayuki Tanaka [ 28/Oct/20 ]

I agree to have a PDF for each type.  In addition, science users will need P(galaxy), P(QSO), and P(star).  Is that information included in the primary header?

Comment by Pierre-Yves CHABAUD [ 28/Oct/20 ]

Masayuki Tanaka You're absolutely right. I have updated the PHDU with the CLASS keyword (GALAXY, QSO or STAR) and the computed values of P(galaxy), P(QSO) and P(star).

HDU#0 PHDU
     CLASS        KEYWORD     Spectro classification: GALAXY, QSO, STAR
     P_GALAXY     KEYWORD     Probability to be a galaxy
     P_QSO        KEYWORD     Probability to be a QSO
     P_STAR       KEYWORD     Probability to be a star
Comment by Pierre-Yves CHABAUD [ 11/Nov/20 ]

Following the proposals from Masayuki Tanaka and price, I have updated the PHDU with the pipeline and product versions. I have also moved the classification results to a dedicated classification HDU.

HDU#0 PHDU
     D1D_VER      KEYWORD     Version of the DRP_1D library
     D1DP_VER     KEYWORD     Version of the DRP_1DPIPE pipeline
     DAMD_VER     KEYWORD     Version of the data model
     PAR_FILE     KEYWORD     Parameters file name

HDU#1 CLASSIFICATION
     CLASS        KEYWORD     Spectro classification: GALAXY, QSO, STAR
     P_GALAXY     KEYWORD     Probability to be a galaxy
     P_QSO        KEYWORD     Probability to be a QSO
     P_STAR       KEYWORD     Probability to be a star
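
As an illustration of how the CLASSIFICATION keywords relate to each other, here is a short Python sketch. It assumes that CLASS is simply the type with the highest probability and that the three probabilities sum to 1; neither assumption is confirmed in this ticket, and the function name is hypothetical.

```python
def classification_keywords(p_galaxy, p_qso, p_star, tol=1e-6):
    """Build the CLASSIFICATION keywords from the three type probabilities.

    Assumption: CLASS is the type with the highest probability.
    """
    probs = {"GALAXY": p_galaxy, "QSO": p_qso, "STAR": p_star}
    if abs(sum(probs.values()) - 1.0) > tol:
        raise ValueError("probabilities are expected to sum to 1")
    best = max(probs, key=probs.get)
    return {
        "CLASS": best,
        "P_GALAXY": p_galaxy,
        "P_QSO": p_qso,
        "P_STAR": p_star,
    }


header = classification_keywords(0.7, 0.2, 0.1)
print(header["CLASS"])
```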

Comment by Ali Allaoui [ 26/Nov/20 ]

Regarding line measurement, which unit should we use for the flux: erg/cm^2/s or 10^-35 W/m2/Hz (more consistent with nJy)?

Comment by rhl [ 01/Dec/20 ]

Why not just use nJy?

Comment by vlebrun [ 01/Dec/20 ]

nJy is the unit of the spectrum flux density. For the line flux, we have to integrate over the line profile, which gives a flux in W/m2 (or any other multiple, like the vintage erg/cm2/s, or nJy.Hz, which I have never seen, but we could be pioneers on that one).

Comment by rhl [ 01/Dec/20 ]

Right; I was being stupid. So you meant to propose 10^-35 W/m2 (not "/Hz")?

Comment by vlebrun [ 01/Dec/20 ]

Well, the original mistake was mine actually, so yes, I meant W/m2.
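
The unit discussion above can be made concrete with a small Python sketch. It relies only on the definitions 1 nJy = 10^-35 W/m2/Hz and 1 erg/cm^2/s = 10^-3 W/m2; the function names are hypothetical, and the Gaussian line profile is used purely as an example of integrating a flux density over frequency.

```python
import math

C_NM_PER_S = 2.99792458e17  # speed of light in nm/s


def erg_cm2_s_to_1e35_w_m2(flux_cgs):
    """Convert a line flux from erg/cm^2/s to units of 10^-35 W/m^2.

    1 erg/s/cm^2 = 1e-7 W / 1e-4 m^2 = 1e-3 W/m^2.
    """
    return flux_cgs * 1e-3 / 1e-35


def sigma_nm_to_hz(sigma_nm, wave_nm):
    """Convert a small wavelength width to a frequency width: c * dlambda / lambda^2."""
    return C_NM_PER_S * sigma_nm / wave_nm**2


def gaussian_line_flux(peak_njy, sigma_hz):
    """Area of a Gaussian line in nJy.Hz, numerically equal to units of 10^-35 W/m^2."""
    return peak_njy * sigma_hz * math.sqrt(2.0 * math.pi)


# A line flux of 1e-17 erg/cm^2/s expressed in units of 10^-35 W/m^2.
print(erg_cm2_s_to_1e35_w_m2(1e-17))
```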

Comment by Pierre-Yves CHABAUD [ 01/Dec/20 ]

The description of the "PDF" HDU will change: the header goes from TTYPE2 = 'PDF' to TTYPE2 = 'ln PDF' to avoid future confusion.
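
Since the column stores ln PDF, a reader has to exponentiate it before integrating. The Python sketch below shows one way a peak probability such as Z_PROBA could be recovered from a redshift grid and its ln PDF. It is an assumption that the window is +/-3.e-3 in z around the candidate and that the PDF must be renormalised over the grid; the function names are hypothetical and this is not the pipeline's implementation.

```python
import math


def trapezoid(x, y):
    """Trapezoidal integral of y over x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def peak_probability(z_grid, ln_pdf, z_peak, half_width=3e-3):
    """Fraction of the PDF integral that lies within z_peak +/- half_width."""
    shift = max(ln_pdf)  # subtract the maximum to avoid overflow in exp()
    pdf = [math.exp(value - shift) for value in ln_pdf]
    total = trapezoid(z_grid, pdf)
    inside = [(z, p) for z, p in zip(z_grid, pdf) if abs(z - z_peak) <= half_width]
    if len(inside) < 2:
        return 0.0
    zs, ps = zip(*inside)
    return trapezoid(zs, ps) / total


# Synthetic example: a narrow Gaussian peak at z = 1 on a linear grid.
z_grid = [0.9 + i * 1e-4 for i in range(2001)]
ln_pdf = [-0.5 * ((z - 1.0) / 5e-4) ** 2 for z in z_grid]
prob = peak_probability(z_grid, ln_pdf, z_peak=1.0)
print(prob)
```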

Comment by Ali Allaoui [ 29/Jan/21 ]

Current proposal for the datamodel (from https://github.com/Subaru-PFS/datamodel/blob/tickets/DAMD-92/datamodel.txt) :

HDU#0 PHDU
     D1D_VER      KEYWORD     Version of the DRP_1D library
     D1DP_VER     KEYWORD     Version of the DRP_1DPIPE pipeline
     DAMD_VER     KEYWORD     Version of the data model
     PAR_FILE     KEYWORD     Parameters file name
     ZWARNING     INT64       Quality flag

HDU#1 CLASSIFICATION
     CLASS        KEYWORD     Spectro classification: GALAXY, QSO, STAR
     P_GALAXY     KEYWORD     Probability to be a galaxy
     P_QSO        KEYWORD     Probability to be a QSO
     P_STAR       KEYWORD     Probability to be a star

HDU#1 GALAXY_CANDIDATES         Binary table            [FITS BINARY TABLE]     NGALAXY_CANDIDATES
        Columns for :
        CRANK                   INT32                   Rank of galaxy candidate (best=0)
        Z                       FLOAT                   Redshift
        Z_ERR                   FLOAT                   Redshift error
        Z_PROBA                 FLOAT                   Area of the PDF peak (dz=+/-3.e-3)
        SUBCLASS                STR                     Subclassifications
        CFILE                   STR                     Continuum template file name
        LFILE                   STR                     Linecatalog ratio template file name
        MODELFLUX               FLOAT[NROW]             Spectrum model (unit: nJy)

HDU#2 GALAXY_PDF                Binary table            [FITS BINARY TABLE]     NPIX
        Columns for :
        REDSHIFT                FLOAT[NPIX]             Redshift grid
        PDF                     FLOAT[NPIX]             PDF marginalised over all models

HDU#3 GALAXY_LINES              Binary table            [FITS BINARY TABLE]     NLINE
        Columns for :
        LINENAME                STR                     Line name
        LINEWAVE                FLOAT                   Catalog wavelength for this line in vacuum (unit: nm)
        LINEZ                   FLOAT                   Redshift
        LINEZ_ERR               FLOAT                   Redshift error
        LINESIGMA               FLOAT                   Gaussian width (unit: nm)
        LINESIGMA_ERR           FLOAT                   Error in gaussian width
        LINEVEL                 FLOAT                   Gaussian width (unit: km/sec)
        LINEVEL_ERR             FLOAT                   Error in gaussian width
        LINEFLUX                FLOAT                   Area in gaussian fit (unit: 10^-35 W/m2)
        LINEFLUX_ERR            FLOAT                   Flux error
        LINEEW                  FLOAT                   Equivalent width (unit: nm)
        LINEEW_ERR              FLOAT                   Equivalent width error
        LINECONTLEVEL           FLOAT                   Continuum level at line center (unit: nJy)
        LINECONTLEVEL_ERR       FLOAT                   Error in continuum level at line center

HDU#4 QSO_CANDIDATES            Binary table            [FITS BINARY TABLE]     NQSO_CANDIDATES
        Columns for :
        CRANK                   INT32                   Rank of QSO candidate (best=0)
        Z                       FLOAT                   Redshift
        Z_ERR                   FLOAT                   Redshift error
        Z_PROBA                 FLOAT                   Area of the PDF peak (dz=+/-3.e-3)
        SUBCLASS                STR                     Subclassifications
        MODELFLUX               FLOAT[NROW]             Spectrum model (unit: nJy)

HDU#5 QSO_PDF                   Binary table            [FITS BINARY TABLE]     NPIX
        Columns for :
        REDSHIFT                FLOAT[NPIX]             Redshift grid
        PDF                     FLOAT[NPIX]             PDF marginalised over all models

HDU#6 QSO_LINES                 Binary table            [FITS BINARY TABLE]     NLINE
        Columns for :
        LINENAME                STR                     Line name
        LINEWAVE                FLOAT                   Catalog wavelength for this line in vacuum (unit: nm)
        LINEZ                   FLOAT                   Redshift
        LINEZ_ERR               FLOAT                   Redshift error
        LINESIGMA               FLOAT                   Gaussian width (unit: nm)
        LINESIGMA_ERR           FLOAT                   Error in gaussian width
        LINEVEL                 FLOAT                   Gaussian width (unit: km/sec)
        LINEVEL_ERR             FLOAT                   Error in gaussian width
        LINEFLUX                FLOAT                   Area in gaussian fit (unit: erg/cm^2/s)
        LINEFLUX_ERR            FLOAT                   Flux error
        LINEEW                  FLOAT                   Equivalent width (unit: nm)
        LINEEW_ERR              FLOAT                   Equivalent width error
        LINECONTLEVEL           FLOAT                   Continuum level at line center (unit: nJy)
        LINECONTLEVEL_ERR       FLOAT                   Error in continuum level at line center

HDU#7 STAR_CANDIDATES           Binary table            [FITS BINARY TABLE]     NSTAR_CANDIDATES
        Columns for :
        CRANK                   INT32                   Rank of star candidate (best=0)
        V                       FLOAT                   Star velocity (unit: kms-1)
        V_ERR                   FLOAT                   Star velocity error
        T_PROBA                 FLOAT                   Area of the PDF peak
        SUBCLASS                STR                     Subclassifications
        TFILE                   STR                     Template file name
        MODELFLUX               FLOAT[NROW]             Spectrum model (unit: nJy)

HDU#8 STAR_PDF                  Binary table            [FITS BINARY TABLE]     NPIX
        Columns for :
        REDSHIFT                FLOAT[NPIX]             Redshift grid
        PDF                     FLOAT[NPIX]             PDF related to best model

Comment by Masayuki Tanaka [ 06/Jul/21 ]

I am terribly sorry for being extremely slow. Morishima-san at NAOJ (he is a database person) and I carefully looked at the datamodel as well as real pfsZCandidates files from drp_1dpipe (ver.0.20.1) + drp_1d (ed36a29) + datamodel (w.2021.26). Most of the work was kindly done by Morishima-san, and thanks go to him.

HDU #0: We see that TRACT, PATCH, CATID, OBJID, NVISIT and VHASH are gone. They are in the file names, but a user may well rename the file, so it is easy to lose this key information. We prefer to have them in the primary header. I discussed with Ali offline, and I think it would be good to explicitly have the line list version in the header. The keyword PAR_FILE may not be very informative, and we could have the configs defined in PAR_FILE in the header if there are not too many of them. Finally, this is not a datamodel issue, but where can I find the definition of Z_WARNING?
HDU #1: In addition to probabilities, it would be good to have reduced chi-squares.  It is useful; we don't trust a fit with chi2_nu=1e9 even if P(galaxy)=1 and Z_PROBA=0.99.  Maybe such a bad fit is captured by Z_WARNING, but again, I was not able to find its definition.
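
For reference, the reduced chi-square mentioned for HDU #1 can be sketched in a few lines of Python. This is the textbook definition (chi-square divided by the number of degrees of freedom); how the pipeline would count free parameters is not specified here, so `n_free_params` is a hypothetical input.

```python
def reduced_chi2(flux, model, sigma, n_free_params):
    """Chi-square per degree of freedom between a spectrum and its model."""
    dof = len(flux) - n_free_params
    if dof <= 0:
        raise ValueError("not enough data points for the number of free parameters")
    chi2 = sum(((f - m) / s) ** 2 for f, m, s in zip(flux, model, sigma))
    return chi2 / dof


# Toy spectrum: one pixel is off by one sigma, with two free parameters.
value = reduced_chi2(
    flux=[1.0, 2.0, 3.0, 4.0],
    model=[1.0, 2.0, 3.0, 5.0],
    sigma=[1.0, 1.0, 1.0, 1.0],
    n_free_params=2,
)
print(value)
```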

HDU #2: Where can I find the wavelengths of the best-fit model? I guess you used the CRPIX, CRVAL, CD from pfsObject, but it would be nice to copy them to pfsZCandidates. Also, it might make sense to have the galaxy and QSO PDF in the same HDU so they can share the REDSHIFT column. By the way, I do not understand the numbers in the REDSHIFT column (you can look at the sample files in datapack-0.20.1/validation). This is not a datamodel issue, though.

HDU #4: It would be nice to be explicit about the units in the header (LINEWAVE is nm, LINEVEL is km/s).   I see some of the Balmer lines are measured twice: once for absorption and the other for emission.  Is this intended?

HDU #7: What does T_PROBA mean?  It is an integrated probability over what area?

HDU #8: REDSHIFT does not mean a lot for stars.  Maybe we can use RADIAL_VELOCITY (or simply VELOCITY) instead?  The explanation of PDF should be 'PDF marginalized over all models', no?

HDU #9: Do you plan to make line measurements for stars?  If so, it should be included here.

Comment by Ali Allaoui [ 09/Sep/21 ]

First of all, you can find the correct definition of the datamodel here
Some of it is still under development (https://pfspipe.ipmu.jp/jira/projects/PIPE1D/issues/PIPE1D-52)
HDU #0: All remarks have been taken into account. Z_WARNING will be defined in the drp1d_pipe README, if that's OK. For now it is restricted to 1 bit, which tells whether line measurement is deactivated; this definition is temporary. The main content of this flag is still under development; the specification (not yet definitive, extracted from RM-5638) is:

Bit Name                  Binary Digit   Description
INVALID_SPECTRA_FLUX      0              Invalid spectra flux (NaNs, Infs, ...)
INVALID_NOISE             1              Invalid noise (NaNs, Infs, err=0, ...)
SMALL_WAVELENGTH_RANGE    2              Available wavelength range is too small
NEGATIVE_CONTINUUM        3              Negative continuum found (AMPLITUDE + 3 * AMPLITUDE_ERR < 0); at all z, at all lambda? (VLB) (=> flat continuum PDF?)
BAD_CONTINUUMFIT          4              Bad continuum fit (=> linemodel fit will be bad, except when a polynomial is added to the model); how do we know? (VLB)
EXPECTED_LINES            5              Detectability of H_alpha, OII, ... lines in case of truncated spectra; for one line, or all?
CANDIDATE_ELIMINATION     6              Candidate eliminated at second-pass range border
NULL_AMPLITUDES           7              All amplitudes are null at all z (=> flat PDF)
PEAK_NOT_FOUND_PDF        8              PDF peak finder has failed
MAX_AT_BORDER_PDF         9              PDF extremum is too close to the border
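
To illustrate how a user might decode such a flag, here is a Python sketch based on the provisional table above. It assumes that "Binary Digit" is the bit position counted from the least significant bit; the class and function names are hypothetical, and the bit assignments may change since the specification is not definitive.

```python
from enum import IntFlag


class ZWarning(IntFlag):
    """Provisional ZWARNING bits, copied from the draft specification."""

    INVALID_SPECTRA_FLUX = 1 << 0
    INVALID_NOISE = 1 << 1
    SMALL_WAVELENGTH_RANGE = 1 << 2
    NEGATIVE_CONTINUUM = 1 << 3
    BAD_CONTINUUMFIT = 1 << 4
    EXPECTED_LINES = 1 << 5
    CANDIDATE_ELIMINATION = 1 << 6
    NULL_AMPLITUDES = 1 << 7
    PEAK_NOT_FOUND_PDF = 1 << 8
    MAX_AT_BORDER_PDF = 1 << 9


def decode(value):
    """Return the names of all bits set in a ZWARNING integer."""
    return [flag.name for flag in ZWarning if value & flag]


# Bits 2 and 9 set: 4 + 512 = 516.
names = decode(516)
print(names)
```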

HDU GALAXY_CANDIDATES: Indeed, what would be useful is a goodness-of-fit estimate, the reduced chi-square being one of the possibilities. We are currently investigating several criteria for this. You are right, we missed that column for each candidate (HDU #2, HDU #4 & HDU #7); could it be named "reliability" or "goodness_fit"?

HDU GALAXY_CANDIDATES model: We use the wavelength array (WavelengthArray) from PfsObject. We can add CRPIX, CRVAL, CD from pfsObject to HDU #0 (as it is common to HDU #1, #4 and #7).

HDU GALAXY_PDF: The galaxy and QSO PDFs do not share the redshift column, as the solvers work on different redshift ranges. The redshift grids are on a logarithmic scale.
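
A logarithmic redshift grid can be sketched in Python as follows. This assumes that "logarithmic" means uniform steps in ln(1+z), which would explain why a REDSHIFT column does not look evenly spaced in z; the actual grid definition, ranges and step used by the solvers are not given in this ticket, so the values below are hypothetical.

```python
import math


def log_redshift_grid(z_min, z_max, step):
    """Redshift grid with uniform spacing `step` in ln(1+z), from z_min up to at most z_max."""
    start = math.log1p(z_min)
    n_points = int(math.floor((math.log1p(z_max) - start) / step)) + 1
    return [math.expm1(start + i * step) for i in range(n_points)]


# Hypothetical range and step, for illustration only.
grid = log_redshift_grid(0.0, 1.0, 0.01)
print(len(grid), grid[:3])
```

With such a grid, two solvers covering different redshift ranges produce columns of different lengths and values, which is consistent with keeping one REDSHIFT column per PDF HDU.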

HDU GALAXY_LINES: Yes, this is intended. These lines can be either in emission or in absorption, possibly both simultaneously (with different velocity dispersions).

HDU STAR_CANDIDATES: It is the a posteriori probability of the star template, with the probabilities of all templates summing to 1 => description to be changed.
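
One way to obtain template probabilities that sum to 1 is to normalise per-template evidences, as in the Python sketch below. It assumes that each star template comes with a log-evidence and that all templates have equal priors; this is an illustration of the normalisation only, not the pipeline's method, and the template names are hypothetical.

```python
import math


def template_posteriors(ln_evidence):
    """Normalise per-template log-evidences into posterior probabilities."""
    shift = max(ln_evidence.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(value - shift) for name, value in ln_evidence.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical templates and log-evidences.
posteriors = template_posteriors({"star_A": -1000.0, "star_B": -1000.0, "star_C": -1050.0})
print(posteriors)
```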

HDU STAR PDF: Name has been changed. Explanation is correct

Line measurements for stars are not planned.

Comment by Masayuki Tanaka [ 18/Sep/21 ]

Thank you, Ali!  I am sorry for this delay, but the changes look good to me. If you expect the meaning of Z_WARNING to evolve with time, then it would be a good idea to have the definition in the fits header. If not, README is perhaps fine (but we could of course have them in the header even in this case, so the user does not have to go back to README). I do not have a strong opinion on reliability vs goodness_fit. You can pick one that best describes the parameter you choose to use.

Comment by hassan [ 18/Dec/21 ]

Any updates on this Ali Allaoui?

Comment by Ali Allaoui [ 12/Jan/22 ]

Branch tickets/DAMD-92 is rebased and ready to be merged.

Comment by hassan [ 14/Jan/22 ]

ticket branch merged to master.

Generated at Sat Feb 10 15:34:09 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.