[SCIDB-68] prototype v2: directory structure of the 2d outputs Created: 14/Nov/18 Updated: 20/Nov/18 Resolved: 20/Nov/18 |
|
| Status: | Done |
| Project: | Science Database |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Story | Priority: | Normal |
| Reporter: | Masayuki Tanaka | Assignee: | hassan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Epic Link: | prototype v2 |
| Description |
|
Come up with a tentative plan for the output directory structure. |
| Comments |
| Comment by hassan [ 15/Nov/18 ] |
|
I discussed this with the Princeton team. In short, the directory structure is specified by you. PFS scripts make use of the LSST Butler concept for input/output. All input data and output products are read from/written to what is known as a 'Butler Data Repository'. On a file system, this corresponds to a single directory. For each PFS script called, this directory needs to be passed as the first argument. Below that directory, a number of subdirectories can be created, for calibs, and for output data. The names of those are also specified by the user when the PFS script in question is called. So you can decide the names of those too. See https://pipelines.lsst.io/v/DM-11034/getting-started/data-setup.html as an example. I understand that sogo.mineo is also aware of Butler concepts. Please drop me a comment or message me if I have misunderstood the request. |
| Comment by price [ 15/Nov/18 ] |
|
The directory structure we're currently using is specified in the PfsMapper.yaml file; I think this is a reasonable starting point. |
| Comment by Masayuki Tanaka [ 15/Nov/18 ] |
|
OK. I think one possible option might be to have directories like: |
| Comment by price [ 16/Nov/18 ] |
|
The datamodel currently specifies:
Of course, this is under a rerun directory within the data repository, so the full path would be something like: /path/to/dataRepo/rerun/<rerunName>/<tract>/<patch>/pfsObject-<stuff>.fits Please note that this is subject to change, for a couple of reasons: Nevertheless, if you use the data butler to read and write the data products, then the need to know the precise directory structure vanishes, and the work required to adapt to changes is small, if any. |
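As a rough illustration of the layout quoted above, the directory part of the path could be assembled as below. The helper name `rerun_output_dir` and the example tract/patch values are hypothetical, and the file name (`pfsObject-<stuff>.fits`) is left out because its exact form is not given here. This is only a sketch for orientation; when the data butler is used, it resolves these paths itself.

```python
from pathlib import PurePosixPath


def rerun_output_dir(data_repo, rerun_name, tract, patch):
    """Return <data_repo>/rerun/<rerun_name>/<tract>/<patch> as a POSIX path.

    Hypothetical helper: it only mirrors the layout quoted in this ticket and
    does not consult the butler or the mapper policy.
    """
    return PurePosixPath(data_repo) / "rerun" / rerun_name / str(tract) / str(patch)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(rerun_output_dir("/path/to/dataRepo", "myRerun", 0, "1,1"))
```

Because the layout is subject to change, a helper like this is best kept to documentation or sanity checks rather than used for reading and writing products.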
| Comment by Masayuki Tanaka [ 19/Nov/18 ] |
/path/to/dataRepo/rerun/<rerunName>/<tract>/<patch>/pfsObject-<stuff>.fits OK, I think we should follow this. I would propose that we use the prefix 'sim_' for simulation reruns. For example, in our case: /path/to/dataRepo/rerun/sim_prototype_v2/<tract>/<patch>/pfsObject-<stuff>.fits This would allow us to distinguish real runs from simulated ones. I understand that the data model is subject to change (e.g., I would expect something like 'deepCoadd/' somewhere in the directory structure), but is there a way to indicate which rerun is based on which version of the data model? If not, perhaps we should invent one? |
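The proposed 'sim_' convention could be applied and checked with a couple of small helpers, sketched below. The names `SIM_PREFIX`, `simulation_rerun_name`, and `is_simulation_rerun` are hypothetical; nothing in the pipeline enforces this prefix, so it remains a naming convention only.

```python
SIM_PREFIX = "sim_"  # prefix proposed in this ticket for simulation reruns


def simulation_rerun_name(base_name):
    """Return base_name with the 'sim_' prefix, adding it only if missing."""
    if base_name.startswith(SIM_PREFIX):
        return base_name
    return SIM_PREFIX + base_name


def is_simulation_rerun(rerun_name):
    """Return True if the rerun name follows the 'sim_' convention."""
    return rerun_name.startswith(SIM_PREFIX)
```

For example, `simulation_rerun_name("prototype_v2")` gives the rerun name used in this ticket, `sim_prototype_v2`.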
| Comment by price [ 19/Nov/18 ] |
|
You can encode whatever information you want into the rerun name, but I would advise against making it complicated: the rerun name is based on policy only, so it's not guaranteed to always be what you want. The LSST code writes the version information as the packages dataset. So it's not possible to tell which version was used simply by looking at the directory structure; it requires a little bit of Python code. |
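The "little bit of Python code" might look roughly like the sketch below. It assumes the Gen2 butler (`lsst.daf.persistence.Butler`), that the version information is retrievable under the dataset name `packages`, and that the butler can be pointed directly at the rerun directory; the helper names `rerun_root` and `read_package_versions` are hypothetical. It has not been checked against the PFS mapper, so treat it as a starting point rather than a recipe.

```python
import os


def rerun_root(data_repo, rerun_name):
    """Return the rerun directory inside the data repository (layout assumed)."""
    return os.path.join(data_repo, "rerun", rerun_name)


def read_package_versions(data_repo, rerun_name):
    """Fetch the 'packages' dataset recorded for a rerun.

    Assumption: the Gen2 butler is available and the rerun wrote a
    'packages' dataset. The import is done lazily so this module can be
    loaded on a machine without the LSST stack.
    """
    from lsst.daf.persistence import Butler

    butler = Butler(rerun_root(data_repo, rerun_name))
    return butler.get("packages")
```

A call such as `read_package_versions("/path/to/dataRepo", "sim_prototype_v2")` would then return the recorded package versions for that rerun, provided the assumptions above hold.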
| Comment by Masayuki Tanaka [ 20/Nov/18 ] |
|
We will adopt /path/to/dataRepo/rerun/sim_prototype_v2/<tract>/<patch>/pfsObject-<stuff>.fits
|