[INSTRM-1171] Read-only access to summit opDB Created: 02/Feb/21  Updated: 07/Dec/22  Resolved: 07/Dec/22

Status: Done
Project: Instrument control development
Component/s: spt_operational_database
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Normal
Reporter: hassan Assignee: Kiyoto Yabe
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks PIPE2D-706 Regular processing of arcs, flats Done
is blocked by INSTRM-1145 Replication test for opDB using two t... Done
Story Points: 4

 Description   

In order to ensure regular processing of data from the summit (e.g. PIPE2D-706) at non-Subaru sites (e.g. Princeton and Marseille), read-only access to data from the ‘live’ opDB at the summit is required. This access is needed to feed development and testing both of the reductions themselves and of any automated reduction scripts.

Since processing will be automated, we need direct access that does not go through a VPN or require 2FA.

Direct (read-only) access to the Hilo-based postgres port from some tiny list of external machines would be ideal. Barring that, chained streaming replication from Hilo to some other postgres instance would be fine, although quite a lot more work.
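For illustration, the postgres side of such direct access might look roughly like the following; the role, database, and schema names here are placeholders, not actual project values, and the host allow-list itself would live in pg_hba.conf:

```sql
-- Hypothetical read-only role for external sites (all names are placeholders).
CREATE ROLE opdb_reader WITH LOGIN PASSWORD 'changeme';
GRANT CONNECT ON DATABASE opdb TO opdb_reader;
GRANT USAGE ON SCHEMA public TO opdb_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO opdb_reader;
-- Also cover tables created after this point:
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO opdb_reader;
```

A matching pg_hba.conf entry restricted to the small list of external machines would complete the picture.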

We can probably assume that the Hilo server will be just as good as the summit server. If streaming replication works to Hilo, as we expect, the servers will be effectively identical. See INSTRM-1145.
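For reference, chained (cascading) replication in postgres would make the off-site copy a standby of the Hilo standby; a minimal sketch of the downstream server's settings, with placeholder hostnames and slot names, might be:

```
# postgresql.conf on the downstream (off-site) standby -- placeholders throughout
primary_conninfo  = 'host=hilo-replica.example.org port=5432 user=replicator'
primary_slot_name = 'offsite_slot'
hot_standby = on    # serve read-only queries while replaying WAL
# plus an empty standby.signal file in the data directory (postgres 12+)
```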

For the moment, we can work without direct access: dump-restore copies from any server suffice, and we can increase the cadence to meet initial needs (e.g. processing stability data from SM1). Since we expect to use SuNSS to learn how to run PFS in regular operations, we should put the final mechanism in place as soon as possible.

For developing and running the automated scripts it is valuable for all sites to have the same view of the available data. Especially for debugging the reduction scripts, it would be useful to include the time sequence of the transactions, as this allows us to reproduce the state of the opDB at any time. Without this full replication we will have to develop and debug on the mountain or in Hilo.



 Comments   
Comment by hassan [ 02/Feb/21 ]

Access to the (copy of the) opDB should not be through a VPN - this complicates accessing the data from Princeton.

Comment by Kiyoto Yabe [ 05/Feb/21 ]

Just out of curiosity, is it technically possible to automatically open a VPN session only while the processing is running?

Comment by rhl [ 05/Feb/21 ]

If we made it insecure enough, I think we could. We would have to put all the credentials needed to make a VPN connection into the script (and if Subaru switches to 2-factor authentication using e.g. duo at some point, we'd have to add a back-door around that too). cloomis should answer too.

Comment by cloomis [ 05/Feb/21 ]

Technically, we surely can. It is up to Subaru what identity they would want us to use (a new one just for this? Some particular user's identity? etc.; I would not even try before that was provided). We might also need raw socket access, etc. at the other end.

Comment by price [ 06/Feb/21 ]

I played around with opening a VPN automatically on our end. I believe it requires root access. Perhaps it's possible we could get our cluster gurus to set something up for us, but given that we need database copies anyway, I think that might just be wasted effort.

Comment by Hisanori Furusawa [ 08/Mar/21 ]

Could you possibly summarize the current request for a way to access the opDB (replica)?
My understanding is as follows; any corrections and suggestions would be appreciated.

In the short to medium term (for XXX months from now):

  • periodic dumps, passed to remote nodes by HTTPS or some other protocol
    Which protocol or way of accessing the dumped files would work for you?
    e.g., uploading the files to an HTTPS server at the observatory for you to fetch,
    or pushing the files from the observatory to your site by rsync or HTTP etc.

In the long term

  • capability of direct queries (select) against the live opDB
  • or, make the dump interval short enough to keep up with the engineering

Questions

  • What is the timescale for short and long terms?
  • Can working on the Hilo data analysis computers through the VPN be a substitute for external opDB access?
  • Would you never use the Hilo computers for engineering processing?

Comment by cloomis [ 10/Mar/21 ]

Let me provide a status update.

Princeton currently dumps most of the opDB tables once a day (at ~15h00 HST), copies the dump to Princeton using Globus, and restores it into the postgres instance running there. That takes only a second or so at each end, so it is not a burden. Globus is what makes this work, by removing the need for ssh tunnels, which do not work trivially at either end: into Subaru because of the VPN, into Princeton because of 2FA.
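As a sketch, the daily cycle above might look like the following; the endpoint IDs, paths, and database names are placeholders (not the project's actual values), and the commands are only printed as a dry run rather than executed:

```shell
#!/bin/sh
# Dry-run sketch of the daily dump -> globus -> restore cycle.
# Endpoint UUIDs, paths, and database names are placeholders, not project values.
STAMP=$(date +%F)                 # dated filename, e.g. opdb-2021-03-10.dump
DUMP="opdb-${STAMP}.dump"
SRC_EP="SRC-ENDPOINT-UUID"        # Globus endpoint at the Hilo end
DST_EP="DST-ENDPOINT-UUID"        # Globus endpoint at Princeton

run() { echo "+ $*"; }            # dry run: print each command instead of running it

# Hilo end: dump the opdb tables in postgres custom format.
run pg_dump --format=custom --no-owner --file="${DUMP}" opdb
# Transfer with the Globus CLI: no ssh tunnel through the VPN or 2FA needed.
run globus transfer "${SRC_EP}:/dumps/${DUMP}" "${DST_EP}:/dumps/${DUMP}"
# Princeton end: restore into the local postgres, replacing existing objects.
run pg_restore --clean --if-exists --no-owner --dbname=opdb "/dumps/${DUMP}"
```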

I am not sure how well that will work in the long run, when far more data is in the opDB tables.

Comment by Kiyoto Yabe [ 07/Dec/22 ]

The current mechanism works at each site so far. File another ticket if we need a different mechanism in the future.

Generated at Sat Feb 10 16:32:37 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.