[PIPE2D-761] Restructure reduceExposure to process visits one by one Created: 07/Mar/21  Updated: 12/Mar/21  Resolved: 12/Mar/21

Status: Done
Project: DRP 2-D Pipeline
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Normal
Reporter: rhl Assignee: price
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: 2DDRP-2021 A3
Reviewers: hassan

 Description   

reduceExposure currently appears to process all the requested data before writing any of it out, e.g. processing all the b data before starting on r.

This blows up its memory footprint and frustrates anyone waiting for a given visit to finish.  

Please restructure things to process all the arms for a given visit, write out the results, and clean up memory before proceeding to the next visit.



 Comments   
Comment by price [ 07/Mar/21 ]

reduceExposure acts like processCcd: it most certainly writes out the results as it goes along. As for the memory footprint, I'm not sure, but I don't think there's any reason why it should be different from processCcd's.

Comment by rhl [ 08/Mar/21 ]

I don't think so.  It processes all the data to form exposureList before starting any extractions, then loops over all the exposures, and then finally calls self.write.  Please check.

Comment by price [ 09/Mar/21 ]

I'm sorry, you're right that it works through an exposureList. However, I believe that exposureList consists only of exposures from the same visit (i.e., all arms from all spectrographs), so we truly do parallelise over visits. The reason for this is that we want to derive a single 2D sky subtraction solution from all exposures in the same visit, and that has to happen before we extract the spectra.

I propose that we should create a new CmdLineTask to perform a single-arm spectra extraction, which will allow faster processing of SuNSS data.

Comment by rhl [ 11/Mar/21 ]

I thought I'd commented on this already...

I think we want to pass in a list of visits and do the right thing (process all 12 arms for each visit before proceeding to the next visit), and each visit is independent (and may have its own pfsConfig).

I don't think we need a new command;  if I process a single spectrograph and two bands it'll process all of that visit before moving on.

Comment by price [ 11/Mar/21 ]

When you say, "we want to pass in a list of visits", are you referring to using the ReduceExposureTask directly in Python, and therefore working around the ReduceExposureRunner that ensures we operate on a visit at a time?

Comment by rhl [ 11/Mar/21 ]

I'm talking about the command line.  If I run

 reduceExposure.py /projects/HSC/PFS/Subaru --id visit=50000..500100 arm=b^r

logging suggests that it runs ISR (etc.) and extracts spectra and writes pfsArm files from all the b visits in turn before starting on the r ones. We only have one spectrograph, but from what you say it'd have handled all 4 spectrographs in b before doing the extraction. I'd like to process all the b and all the r before moving on to the next visit, but as I can achieve this (more or less) by running two jobs, one for b and one for r, this isn't all that urgent. If we add an option (or a new command) to also run the merge step this will become important again.

I am still worried about when it releases the memory, as it seemed to crash on very long visit lists, but that's a different issue. Maybe this was just a problem on the mountain on small memory machines (I just processed 632 b^r visits on tiger without problems).

Comment by price [ 12/Mar/21 ]

So the ReduceExposureTask is fine and doesn't need to be restructured (it operates on all arms of the same kind in an exposure, and that's the way it needs to operate for sky subtraction), but the request is to change the order of the inputs that it receives, which is set by the ReduceExposureRunner. I've made a small change which reorders the inputs so that it iterates over visits more quickly than arms.
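
The reordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual ReduceExposureRunner code: the function name `order_by_visit` and the plain-dict data IDs are hypothetical, and it assumes the intended order is "all arms of one visit before the next visit", as requested in the description. Exposures sharing a visit and arm (i.e. the different spectrographs) stay together in one group, since that is the unit the task needs for sky subtraction.

```python
from collections import defaultdict


def order_by_visit(data_ids):
    """Group data IDs by (visit, arm) and order the groups visit-first.

    ``data_ids`` is an iterable of dicts with "visit" and "arm" keys
    (hypothetical stand-ins for the pipeline's data references).
    Returns a list of groups; each group holds every data ID sharing a
    visit and an arm, and all groups for one visit come before any group
    for the next visit.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for data_id in data_ids:
        groups[data_id["visit"]][data_id["arm"]].append(data_id)
    return [
        groups[visit][arm]
        for visit in sorted(groups)       # outer loop: visits
        for arm in sorted(groups[visit])  # inner loop: arms within a visit
    ]
```

With inputs listed arm-first (all b, then all r), the result comes out as visit 1 b, visit 1 r, visit 2 b, visit 2 r, and so on.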

Note that I'm not sure how Python's multiprocessing (used when we specify a -j flag) will change the ordering we provide, but hopefully it will do something approaching FIFO.
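
One way to probe the ordering question is a small experiment like the one below. It is only a sketch under stated assumptions: it uses `multiprocessing.pool.ThreadPool` (which exposes the same interface as `multiprocessing.Pool`) so that start order can be recorded in a shared list, and the helper name `run_targets` is hypothetical. It does not show how the -j flag actually drives the pool in the pipeline. What it does illustrate is that `imap` returns results in input order, and that with a single worker the targets are started strictly first-in, first-out.

```python
import threading
from multiprocessing.pool import ThreadPool


def run_targets(targets, processes):
    """Run dummy work over ``targets`` and report the ordering.

    Returns ``(started, results)``: ``started`` is the order in which
    workers began each target, ``results`` is the order in which
    ``imap`` yielded them.
    """
    started = []
    lock = threading.Lock()

    def work(target):
        # Record when this target is picked up by a worker.
        with lock:
            started.append(target)
        return target

    with ThreadPool(processes) as pool:
        # chunksize=1 hands targets to workers one at a time.
        results = list(pool.imap(work, targets, chunksize=1))
    return started, results
```

With more than one worker, `started` may differ slightly from the input order because workers race to record their start, but every target is still handed out from the front of the list.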

Comment by hassan [ 12/Mar/21 ]

Approved with no additional changes requested.

Comment by price [ 12/Mar/21 ]

Merged to master.
