[INSTRM-338] Improve performance on PFS machines at Subaru Created: 26/Apr/18  Updated: 10/May/18  Resolved: 26/Apr/18

Status: Open
Project: Instrument control development
Component/s: ics_production
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Normal
Reporter: cloomis Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The VMs seem to be very slow, probably due to I/O.
I'll compare installing conda on shell-ics, onto an NFS-mounted /software, against installing on an oldish physical machine with unremarkable spinning disks. I'd expect NFS to cause some slowdown, though I thought the server had a fast cache for that. But the result is much worse than expected – bad enough to be a problem.

bash Miniconda3-latest-Linux-x86_64.sh -b -p /software/conda: 2m22s vs. 13s
conda update -y conda: 1m5s vs. 16s
conda install numpy cython twisted ply future astropy ruamel_yaml ipython: 6m48s vs. 2m42s
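As a sketch, the comparison above could be repeated with a small timing helper (the `bench` function below is an assumption for illustration, not part of the original test; substitute the real install target for the commented-out example):

```shell
# Minimal wall-clock timing helper to repeat the NFS-vs-local comparison.
bench() {
    # run a command, discard its output, and report elapsed seconds
    local start end
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo "$(( end - start ))s: $*"
}

# Example targets (run once against /software, once against a local disk):
#   bench bash Miniconda3-latest-Linux-x86_64.sh -b -p /software/conda
#   bench /software/conda/bin/conda update -y conda
bench sleep 1
```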



 Comments   
Comment by shimono [ 30/Apr/18 ]

This is a known, reported, and discussed issue that I have not managed to resolve in more than two or three years: jfs+NFS has a significant (~2x) performance penalty when appending to files larger than 1-2 TB, and xfs+NFS has a significant (~10x, about an order of magnitude) penalty on any stat of a directory or file. So for now the PFS servers at IPMU use jfs for their NFS server, but following Princeton's preference I configured the summit NFS storage as xfs.
So: help wanted, or wontfix.
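A rough way to see the stat penalty in isolation (a sketch, not the team's actual measurement; the directory path is a throwaway assumption) is to create many small files and time stat'ing them all, once on a local filesystem and once under the NFS mount:

```shell
# Micro-benchmark for per-file stat cost: 1000 empty files, one
# stat(2) call each.  Compare the 'time' output local vs. NFS.
dir=$(mktemp -d)                     # assumption: any writable directory
for i in $(seq 1 1000); do : > "$dir/f$i"; done
time stat "$dir"/f* > /dev/null      # stats every file once
rm -rf "$dir"
```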

Comment by cloomis [ 01/May/18 ]

The disk controller has a battery-backed RAM cache, right? Are the xfs barriers turned off (along with disk caches)?

Comment by shimono [ 01/May/18 ]

It's due to the combination of xfs and NFS, so it does not depend on the storage device. Roughly an order-of-magnitude performance degradation was measured between physical and NFS access, as previously reported.

Comment by cloomis [ 01/May/18 ]

But XFS barriers do matter, especially on NFS servers – if you can legitimately turn them off because you have a battery-backed cache in the right place, you gain enormously.
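For reference, a hedged sketch of what disabling barriers looks like on an older kernel (device name and mount point are placeholders; note the xfs `nobarrier` option was deprecated and eventually removed in newer kernels, so check yours first):

```shell
# Show mounted xfs filesystems and their current options; barriers are
# the default when no barrier option appears:
mount -t xfs

# /etc/fstab entry with barriers disabled -- ONLY safe when the write
# cache is battery- or flash-backed all the way down:
#   /dev/sdb1  /export  xfs  defaults,nobarrier  0 0

# Or remount a live filesystem without barriers (requires root):
#   mount -o remount,nobarrier /export
```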

Comment by shimono [ 10/May/18 ]

If this were an issue of bulk (or even sequential) read/write performance, async or caching could work well, but the issue is the handling of stat.
The async option could help stat performance a bit, but it takes effect after the NFS server-client exchange (e.g. the server replying immediately and treating the commit as a no-op) and within the filesystem + block device layers, so it would not make any significant difference for our issue.

  1. Actually, testing with noatime (client) and async (server) does not change stat performance.
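For concreteness, the combination tested above would look roughly like this (server export path and client mount point are placeholders, not the actual PFS configuration):

```shell
# /etc/exports on the NFS server -- async lets the server acknowledge
# writes before they reach stable storage:
#   /export  *(rw,async,no_subtree_check)

# Client-side mount with noatime, which skips atime updates on access:
#   mount -t nfs -o noatime server:/export /software
```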
Generated at Mon Apr 14 16:22:50 JST 2025 using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.