[INSTRM-2146] Move drpActor in Hilo to dedicated node. Created: 18/Jan/24  Updated: 03/Feb/24

Status: Open
Project: Instrument control development
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Normal
Reporter: cloomis
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: near-term
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates: relates to INFRA-327, "Configure Hilo nodes to run drpActor" (Status: Open)
Sprint: PreEngRun15Mar

 Description   

The drpActor in Hilo currently shares a machine with large JupyterLab sessions, and it has been killed a few times by the OOM-killer.

We should move it to a dedicated host. We could use pfsa-usr02, but I am wondering whether we should use pfsa-an0[45]. Hisanori Furusawa? Yoshida, Hiroshige?
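
To confirm which processes the OOM-killer has actually taken down on the shared machine, one option is to scan the kernel log for its kill reports. The following is a minimal sketch, not an existing PFS tool: the function name find_oom_kills is hypothetical, and it assumes the kernel log contains the usual "Killed process <pid> (<command>)" report line. Note that the kernel reports the process command name, so a Python-based actor would probably show up as "python" or similar rather than "drpActor".

```python
import re

# Matches the kernel's OOM-killer report, e.g.
#   "Out of memory: Killed process 4321 (python3) total-vm:..."
# Older kernels print "Killed process 4321 (python3) ..." on a separate line;
# the pattern covers both forms.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, command) pairs for OOM kills found in kernel log text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL_RE.search(line)
        if match is not None:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The input text could come from `dmesg` or `journalctl -k` on the affected host (either may need elevated privileges); matching the reported pids against the drpActor's known pids is left to the operator.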



 Comments   
Comment by Hisanori Furusawa [ 18/Jan/24 ]

It would be good to assign appropriate uses to pfsa-usr01, 02, and 03. pfsa-usr03 was bought primarily for intensive netflow use in future operations, but for now it could also be used for some general CPU-bound jobs. The pfsa-anXX nodes are reserved for batch jobs, and that setup is still under configuration, so it would be helpful for now if you could keep your interactive jobs within pfsa-usr0[1-3]. Yoshida, Hiroshige may comment on this.

Comment by Yoshida, Hiroshige [ 19/Jan/24 ]

(Temporary) use of pfsa-usr03 sounds ok to me.

Comment by cloomis [ 19/Jan/24 ]

I just want to underscore the fact that drpActor operation is fundamental to any drp/QA work in Hilo: it is the drpActor which ingests the raw data and immediately performs on-the-fly reductions of fresh data. In other words, it should be treated as an essential part of data reductions, and not as part of "interactive" or "development" work. I believe that we should settle on the right host for it.

Comment by Kiyoto Yabe [ 03/Feb/24 ]

As per discussion with Hisanori Furusawa last week, there are two possibilities:

  • run drpActor on the master node of the cluster system (the longer-term option, though we should keep considering it for the next run)
  • run drpActor on pfsa-usr03 (netflow machine) as a backup plan

The technical issue with the cluster's file system has been resolved (I believe), so we can move forward with the first possibility. Once I get full access to pfsa-usr03, I will work on the second possibility.

Generated at Sat Feb 10 16:43:27 JST 2024 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.