[INFRA-31] Create repo for unittest data files | Created: 21/Feb/15 | Updated: 04/Sep/16 | Resolved: 06/Jul/16 |
|
| Status: | Done |
| Project: | Software Development Infrastructure |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | cloomis | Assignee: | aritter |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Sprint: | 2014-13, 2014-14 |
| Reviewers: | swinbank |
| Description |
|
We need an optionally installed repository for inconveniently large data files (FITS, etc.) that some unittests need, like 'afwdata' in HSC. This ticket is intended to elicit HSC/LSST experience. In particular, is one repo enough (pfs_testdata?), or will we want more (drp_testdata, ics_testdata for the camera actors, etc.)? I'd vote for one. |
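Because the repository is optional, tests that depend on it should skip rather than fail when it is absent. The sketch below shows one way to do that with Python's `unittest`. It assumes the data checkout is located through an environment variable. The name `PFS_TESTDATA_DIR` and the file `example.fits` are hypothetical placeholders, not anything defined in this ticket.

```python
import os
import unittest

# Hypothetical variable naming the checkout of the optional test-data repository.
DATA_DIR = os.environ.get("PFS_TESTDATA_DIR")
HAVE_DATA = DATA_DIR is not None and os.path.isdir(DATA_DIR)


def data_path(*parts):
    """Return a path inside the test-data repository, or None if it isn't installed."""
    if not HAVE_DATA:
        return None
    return os.path.join(DATA_DIR, *parts)


@unittest.skipUnless(HAVE_DATA, "test-data repository not installed")
class BigDataTestCase(unittest.TestCase):
    def test_data_file_exists(self):
        # "example.fits" is a placeholder for a real file in the data repository.
        self.assertTrue(os.path.exists(data_path("example.fits")))
```

Run the file with `python -m unittest`. If the data are not installed, the class reports a skipped test instead of a failure.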
| Comments |
| Comment by swinbank [ 25/Jun/16 ] |
|
cloomis, aritter – looks like Andreas marked this in progress, but JIRA reckons it's assigned to Craig. Should we reassign it? |
| Comment by swinbank [ 25/Jun/16 ] |
|
LSST, for what it's worth, has adopted git-lfs for medium-sized test data repositories (repositories with tiny data files still tend to end up in plain old git, and it's clear that git-lfs doesn't scale to really big data volumes). We've seen a number of issues with the complexity of git-lfs setup and usage; see https://community.lsst.org/t/issues-using-git-lfs-for-cloning-test-data-repositories/880/14 for an example from just the other day. However, that is largely because LSST uses the git-lfs protocol on top of its own server infrastructure. Using git-lfs on GitHub's infrastructure should be much more seamless. It's free for modest data volumes (1 GB total storage, 1 GB/month bandwidth). Would that be usable for PFS? |
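To judge whether the free quota is usable, it helps to estimate how many full downloads of the LFS content fit in the monthly bandwidth. The sketch below is rough and rests on two assumptions: "1 GB" is treated as 10**9 bytes, and every clone fetches the whole LFS payload. As far as I understand GitHub's accounting, storage also counts every pushed version of a file, not only the latest one.

```python
# Assumption: "1 GB" in GitHub's free git-lfs quota is treated as 10**9 bytes.
FREE_BANDWIDTH_BYTES = 10**9


def full_clones_per_month(lfs_payload_bytes, monthly_bandwidth_bytes=FREE_BANDWIDTH_BYTES):
    """How many complete downloads of the LFS payload fit in one month's bandwidth quota."""
    if lfs_payload_bytes <= 0:
        raise ValueError("payload size must be positive")
    # Partial downloads don't count as usable clones, so round down.
    return monthly_bandwidth_bytes // lfs_payload_bytes


# Example: one ~160MB FITS file, the size quoted later in this ticket.
print(full_clones_per_month(160 * 10**6))  # 6
```

If continuous-integration jobs clone the data repository, they draw on the same bandwidth quota. The budget can therefore run out well before six human clones are made.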
| Comment by shimono [ 25/Jun/16 ] |
|
Actually, I've migrated from our local gitolite server to GitHub using GitHub's git-lfs, and PFS has its own storage there. From the infrastructure point of view, we can do this, but a policy needs to be defined. |
| Comment by swinbank [ 25/Jun/16 ] |
|
aritter tells me that, per the discussion on allhands-software, he intends to use a plain git repository in this case. In general, though, being able to use git-lfs seems like an excellent thing. How would we go about defining a policy? |
| Comment by aritter [ 26/Jun/16 ] |
|
There's a small problem with using gzipped files: the butler can't find them (see https://jira.lsstcorp.org/browse/DM-4924). Given the file size (~160MB) and GitHub's 100MB per-file limit for plain git repositories, the FITS files need to be compressed. Should we mark https://jira.lsstcorp.org/browse/DM-4924 as blocking? |
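Before pushing to a plain git repository on GitHub, you can list the files that exceed the per-file limit. These files would need compression or git-lfs. The sketch below assumes the limit is 100 MiB; check GitHub's current documentation for the exact figure.

```python
import os

# Assumed threshold: GitHub rejects pushes containing files larger than 100 MiB.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024


def oversized_files(root, limit=GITHUB_FILE_LIMIT):
    """Yield (path, size) for regular files under root larger than limit bytes."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into the repository's own metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are stored as links, not file contents
            size = os.path.getsize(path)
            if size > limit:
                yield path, size
```

For example, `for path, size in oversized_files("."): print(path, size)` lists the candidates in the current checkout.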
| Comment by aritter [ 26/Jun/16 ] |
|
For the discussion of https://jira.lsstcorp.org/browse/DM-4924 see https://community.lsst.org/t/how-does-the-butler-support-compression/502/15 |
| Comment by swinbank [ 26/Jun/16 ] |
|
The Butler work hasn't been scheduled on LSST: it'll have to happen sometime, but we can't say when. Plus (per RHL in the e-mail thread linked above), git-lfs is the "correct solution", and (if we're using GitHub infrastructure) I think it should be relatively painless. Given all that, I suggest going ahead with git-lfs. |
| Comment by aritter [ 28/Jun/16 ] |
|
Created repo "drp_stella_data" using git lfs. |
| Comment by swinbank [ 02/Jul/16 ] |
|
I can confirm that the drp_stella_data repository has data stored using git-lfs. However, you haven't included a .gitattributes file telling git-lfs to manage the data, so they won't get fetched when you clone the repository. I've created the branch u/swinbank/lfs, which contains an appropriate .gitattributes. Could you please check it and, if you're happy, merge it to master?

I'm not sure I've understood shimono's comment about defining a policy for this. Did you discuss with him whether this was appropriate? Do we know how much quota we have available for this data?

It looks like your changes to drp_stella have already been merged to master. That's fine, since I don't think we've discussed or formalized a policy for code review. In future, though, we should hold them on the ticket branch until the reviewer has had a chance to look; then, when both you and the reviewer are happy, you can merge. I did go through and add some comments to your commit here: https://github.com/Subaru-PFS/drp_stella/commit/aa634c375f77287bf38783db93dc36536b2ecefb#diff-cd53403f864c50438a9c2afd10f43148R34. Most of them are just minor ways in which the code could be cleaner, but there are a couple of workflow issues we should talk about. |
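For each tracked pattern, `git lfs track` writes a line of the form `*.fits filter=lfs diff=lfs merge=lfs -text` to `.gitattributes`. That file has to be committed so that clones run the LFS filter. The actual contents of the u/swinbank/lfs branch aren't reproduced in this ticket, so the patterns below (`*.fits`, `*.fits.gz`) are assumptions. The sketch shows the format and gives a rough check of which files a set of patterns covers. It uses `fnmatch`, which only approximates gitattributes matching for simple slash-free patterns.

```python
import fnmatch
import os

# The attribute string `git lfs track` writes for each tracked pattern.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_lines(patterns):
    """Return .gitattributes lines that route the given patterns through git-lfs."""
    return [f"{p} {LFS_ATTRS}" for p in patterns]


def lfs_patterns(gitattributes_text):
    """Extract the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: match the basename against slash-free patterns only."""
    name = os.path.basename(path)
    return any("/" not in p and fnmatch.fnmatchcase(name, p) for p in patterns)
```

In practice `git lfs track` edits the file for you. The sketch is only useful for checking that a committed `.gitattributes` covers the data files.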
| Comment by aritter [ 06/Jul/16 ] |
|
All comments applied and merged with master |
| Comment by rhl [ 08/Jul/16 ] |
|
shimono, we think you had some thoughts about git-lfs. Do you have a problem with this solution? |
| Comment by shimono [ 08/Jul/16 ] |
|
Sorry for slow response. |
| Comment by rhl [ 09/Jul/16 ] |
|
In case it's helpful, here's an LSST document on what they did: https://sqr-001.lsst.io |