[DAMD-40] Define relative locations of the 1d + 2d pipeline outputs Created: 11/Jan/19 Updated: 15/Sep/20 Resolved: 15/Sep/20 |
|
| Status: | Done |
| Project: | Data Model |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Story | Priority: | Normal |
| Reporter: | Masayuki Tanaka | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
The outputs from the 2d pipeline have to be passed to the 1d pipelines. For this, the relative locations of the pipeline outputs have to be defined. A few constraints:
There may be other constraints we need to consider. |
| Comments |
| Comment by Masayuki Tanaka [ 11/Jan/19 ] |
|
I first thought that all the 1d outputs could be under a 2d rerun directory, something like this:
2d/
├── BIAS
├── CALIB
├── DARK
├── FLAT
├── OTHER_STUFF
└── rerun
    ├── dr1
    ├── dr2
    └── dr3
        ├── ga1d
        │   ├── STUFF
        │   └── rerun
        │       ├── test1
        │       └── test2
        └── lam1d
            ├── STUFF
            └── rerun
                ├── test1
                └── test2
But this may require having 1d repos for each 2d rerun. Perhaps not a good idea. Another naive idea is:
./
├── 2d
│   ├── BIAS
│   ├── CALIB
│   ├── DARK
│   ├── FLAT
│   ├── OTHER_STUFF
│   └── rerun
│       ├── dr1
│       ├── dr2
│       └── dr3
├── ga1d
│   ├── STUFF
│   └── rerun
│       ├── dr3_test1
│       └── dr3_test2
└── lam1d
    ├── STUFF
    └── rerun
        ├── dr3_test1
        └── dr3_test2
For this, we need a mapping between the 2d and 1d reruns. One way to do it is to name each 1d rerun "2d_rerun_name"_"your_string". In the above example, "dr3_test1" is based on the "dr3" rerun from 2d. There are probably other (and better) options.
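The "2d_rerun_name"_"your_string" convention can be sketched in a few lines of Python. This is only an illustration of the naming idea, not code from either pipeline: the function names `make_1d_rerun_name` and `parent_2d_rerun` are hypothetical, and the sketch assumes the list of existing 2d reruns is known, because a name such as "dr3_v2_test1" cannot be split unambiguously on underscores alone.

```python
from typing import Iterable


def make_1d_rerun_name(rerun_2d: str, suffix: str) -> str:
    """Build a 1d rerun name of the form "<2d_rerun_name>_<your_string>"."""
    if not rerun_2d or not suffix:
        raise ValueError("both the 2d rerun name and the suffix must be non-empty")
    return f"{rerun_2d}_{suffix}"


def parent_2d_rerun(rerun_1d: str, known_2d_reruns: Iterable[str]) -> str:
    """Recover the 2d rerun that a 1d rerun name is based on.

    The longest matching prefix wins, so "dr3_v2_test1" maps to "dr3_v2"
    rather than "dr3" when both 2d reruns exist.
    """
    matches = [name for name in known_2d_reruns if rerun_1d.startswith(name + "_")]
    if not matches:
        raise LookupError(f"no known 2d rerun is a prefix of {rerun_1d!r}")
    return max(matches, key=len)
```

Looking the parent up against the known 2d reruns, instead of splitting on the first underscore, keeps the mapping well defined even if 2d rerun names themselves contain underscores.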
|
| Comment by rhl [ 16/Feb/19 ] |
|
Sorry, this got lost. I have no problem with defining the layout of the outputs here, but it should merely document the implementation in the butler templates (for 2-D). In particular, this means that the reruns and calibrations should probably be laid out a bit differently. Is this urgent? |
| Comment by Masayuki Tanaka [ 18/Feb/19 ] |
|
No, this is not urgent. We will probably implement one of the above in the prototype database v2, but we do not have real 2d outputs there (only simulated psfObject files) and the directory structure will be different anyway. |
| Comment by Masayuki Tanaka [ 13/Apr/20 ] |
|
sogo.mineo suggested that this is also a possible option:
ROOT
+-- BIAS
+-- CALIB
:   :
+-- 1DSTUFF
+-- rerun
    +-- 2d/dr1
    +-- 2d/dr2
    +-- 2d/dr3
    +-- gal1d/dr3_test1
    +-- gal1d/dr3_test2
    +-- lam1d/dr3_test1
    +-- lam1d/dr3_test2
Do we prefer to have different roots for the 2d and 1d pipelines? |
| Comment by hassan [ 14/Apr/20 ] |
|
Need a definitive agreement in the next 3 weeks. |
| Comment by hassan [ 14/Apr/20 ] |
|
Once an agreement has been made, a document capturing that agreement will be prepared and uploaded to PbWorks. Hassan will take that action. |
| Comment by rhl [ 12/May/20 ] |
|
I really don't care much, but I don't understand the examples – what is BIAS, for example? The reruns need to be under ROOT, at least via a symbolic link, so that you can point the butler at e.g. ROOT/rerun/drp2 or ROOT/rerun/rhl/foo. I don't know how 1-D will handle reruns, but I'd expect them to be able to process any 2-D rerun, including all the book-keeping so we know what we did. Ideally this would be transparent (as is true on the 2-D side), but this is independent of the logical layout. Until we understand how 1-D will handle reprocessing it's hard to take a decision. |
| Comment by Masayuki Tanaka [ 12/May/20 ] |
|
If you ingest new data, you'll have directories $(OBJECT) under your root directory. That is what BIAS etc. meant. Anyway, that is not very important. Here is how you run the 1d pipeline: It looks like you will have to tell the pipeline where your spectra are with respect to the root directory. I am not sure if you have any control over the output directory (it does not look like it). We could ask the LAM team to handle reruns. Another option might be to ask Mineo-kun to somehow handle 1d and 2d reruns internally in his code. I'd prefer the former because we want to have a good level of transparency at the rerun level. |
| Comment by Pierre-Yves CHABAUD [ 02/Jul/20 ] |
|
The relative locations of the input (where the 2D outputs are stored) and output (where the DRP-1D data products are stored) directories are command line parameters of the pipeline (--spectra_dir and --output_dir), and default values can be adapted as a function of implementation (e.g. the output directory can be named from the input one with a time stamp). Thus you can produce as many reruns of a given input catalog as wanted. |
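As a rough illustration of the behaviour described above, the following Python sketch parses `--spectra_dir` and `--output_dir` and, when no output directory is given, derives one from the input directory name plus a time stamp. It is not the DRP-1D pipeline's own command line code: the wrapper functions `parse_args` and `default_output_dir`, the time-stamp format, and the choice to place the default output next to the input directory are all assumptions made for the example.

```python
import argparse
from datetime import datetime
from pathlib import Path


def default_output_dir(spectra_dir: Path, now: datetime) -> Path:
    """Name the output directory after the input one plus a time stamp (assumed format)."""
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return spectra_dir.parent / f"{spectra_dir.name}_{stamp}"


def parse_args(argv, now=None):
    """Parse the two directory options; fill in a time-stamped default output."""
    parser = argparse.ArgumentParser(description="Hypothetical 1d pipeline wrapper")
    parser.add_argument("--spectra_dir", type=Path, required=True,
                        help="directory holding the 2D outputs (input spectra)")
    parser.add_argument("--output_dir", type=Path, default=None,
                        help="directory for the 1D data products")
    args = parser.parse_args(argv)
    if args.output_dir is None:
        # No explicit output directory: derive one from the input directory.
        args.output_dir = default_output_dir(args.spectra_dir, now or datetime.now())
    return args
```

Because every invocation without `--output_dir` gets a fresh time-stamped directory, repeated runs on the same input never overwrite each other, which is what makes multiple reruns of one input catalog possible.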
| Comment by Masayuki Tanaka [ 03/Jul/20 ] |
|
Thank you! The ability to specify input/output directories is exactly what we want for our purpose. I am going to educate myself on how to run the 1d pipeline a little further. I am not sure if we are going to run the 1d pipeline at the summit (with different config). But, we may change the config and rerun the pipeline for testing purposes or whatever reason and that is what 'rerun' is for. |
| Comment by Masayuki Tanaka [ 25/Aug/20 ] |
|
I've run the LAM 1d pipeline on ~60k simulated PFS spectra. I think any of the options discussed on this ticket will work. HSC has about 80 reruns for SSP production so far (there should be more for testing purposes), and if I think about >10 years of PFS operations, I would suggest that we have separate root directories for 2ddrp, lam1d, (and probably ga1d as well, although I do not know yet what the ga1d output looks like).
./
├── 2d
│   ├── BIAS
│   ├── CALIB
│   ├── DARK
│   ├── FLAT
│   ├── OTHER_STUFF
│   └── rerun
│       ├── dr1
│       ├── dr2
│       └── dr3
├── ga1d
│   ├── STUFF
│   └── rerun
│       ├── dr3_test1
│       └── dr3_test2
└── lam1d
    ├── STUFF
    └── rerun
        ├── dr3_test1
        └── dr3_test2
The 1d rerun names should inherit the 2d rerun name to make the relationship clear. |
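With separate roots per pipeline and 1d rerun names that inherit the 2d rerun name, locating a rerun and listing the 1d reruns derived from a given 2d rerun becomes a matter of path arithmetic. The sketch below assumes the layout proposed in this comment (a top directory containing `2d`, `ga1d` and `lam1d`, each with a `rerun` subdirectory); the helper names `rerun_path` and `derived_1d_reruns` are hypothetical and not part of any pipeline.

```python
from pathlib import Path

# Pipeline roots as proposed in the layout above (assumed directory names).
PIPELINE_ROOTS = ("2d", "ga1d", "lam1d")


def rerun_path(top: Path, pipeline: str, rerun: str) -> Path:
    """Return <top>/<pipeline>/rerun/<rerun> for one of the known pipeline roots."""
    if pipeline not in PIPELINE_ROOTS:
        raise ValueError(f"unknown pipeline root: {pipeline!r}")
    return top / pipeline / "rerun" / rerun


def derived_1d_reruns(top: Path, pipeline: str, rerun_2d: str) -> list:
    """List 1d reruns of `pipeline` whose names start with "<rerun_2d>_"."""
    rerun_root = top / pipeline / "rerun"
    return sorted(p.name for p in rerun_root.glob(f"{rerun_2d}_*") if p.is_dir())
```

For example, `derived_1d_reruns(Path("."), "lam1d", "dr3")` would return the names of the `dr3_*` directories under `./lam1d/rerun`, if that layout exists on disk.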
| Comment by rhl [ 11/Sep/20 ] |
|
This is fine, except that I'm not sure what you mean by CALIB. The 2d pipeline has one CALIB directory which contains the biases, flats, fiberTraces, etc. The raw biases etc. are in the same place as all the other raw data, so it's more like:
./
+-- 2d
|   +-- raw
|   |   +-- data
|   +-- CALIB
|   |   +-- BIAS
|   |   +-- DARK
|   |   +-- FLAT
|   |   +-- FIBER_TRACE
|   |   +-- OTHER_STUFF
|   +-- rerun
|       +-- rhl
|       |   +-- test1
|       +-- dr1
|       +-- dr2
|       +-- dr3
+-- ga1d
|   +-- STUFF
|   +-- rerun
|       +-- dr3_test1
|       +-- dr3_test2
+-- lam1d
    +-- STUFF
    +-- rerun
        +-- dr3_test1
        +-- dr3_test2
Except that I'd expect that we'd need some new CALIBs for every rerun (because the algorithms to build e.g. fibre traces and arc solutions will change, and I have no idea how much smarter we'll get with the H4RGs), so I'd put a CALIB in each rerun (and probably symbolic link it to the top level, but that's a real detail). On the 2-D side the reruns logically include the raw data, but the butler handles that so it's not visible to the file system. |
| Comment by Masayuki Tanaka [ 15/Sep/20 ] |
|
Thank you for the useful comment! This ticket is about relative locations of the 2d and 1d outputs, and I think we can safely close it now. |