[DAMD-94] Fix discrepancy between createHash and datamodel.txt Created: 09/Nov/20 Updated: 05/Jan/21 Resolved: 14/Nov/20 |
|
| Status: | Done |
| Project: | Data Model |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Normal |
| Reporter: | hassan | Assignee: | hassan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Story Points: | 1 | ||||||||
| Sprint: | 2DDRP-2021 A | ||||||||
| Description |
|
Currently, datamodel.txt defines a hash as a 63-bit unsigned integer, fitted into a 64-bit signed integer:
This is inconsistent with the datamodel.utils.createHash() function, which generates a 64-bit hash:
Fix this discrepancy, following the proposal by Sogo Mineo and Craig Loomis in the datamodel channel on 2020-11-06, by updating the datamodel.txt text mentioned above so that a 64-bit hash is generated, in line with createHash, and so that this hash can be fitted into a standard 64-bit signed integer. This will allow identifiers that use the hash, such as the pfsDesignId, to be stored in the opDB Postgres database using a standard bigint (int8) data type, without the need for additional conversion routines, as discussed in
|
| Comments |
| Comment by hassan [ 09/Nov/20 ] |
|
Other identifiers such as the objId are also affected. Hassan is investigating whether the Gaia DR2 object identifier, the sourceId, can be stored in the positive range of a 64-bit signed int to avoid possible confusion. Yabe-san is checking the HSC ID. |
| Comment by sogo.mineo [ 10/Nov/20 ] |
|
I think the hexadecimal notation of the 64-bit hash will be kept unsigned as it is now, e.g. 0xfedcba9876543210.
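# Offset by 2**63, as FITS does when storing an unsigned 64-bit value in a signed column.
def fits_cast_unsigned_to_signed(x):
    return x - 0x8000_0000_0000_0000

# Two's-complement reinterpretation, as a C++ cast does: subtract 2**64 if the top bit is set.
def cplusplus_cast_unsigned_to_signed(x):
    return x - ((x & 0x8000_0000_0000_0000) << 1)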
|
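A minimal sketch of the round trip implied by the comment above, assuming the hash is stored as a two's-complement signed 64-bit value (as a Postgres bigint would hold it) and must be turned back into its unsigned form before being written as hex. The helper names to_signed64 and to_unsigned64 are hypothetical and are not part of the datamodel package.

```python
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def to_signed64(x):
    """Reinterpret the bits of an unsigned 64-bit integer as signed."""
    return struct.unpack("<q", struct.pack("<Q", x))[0]


def to_unsigned64(y):
    """Recover the unsigned 64-bit value from its signed reinterpretation."""
    return y & MASK64


h = 0xfedcba9876543210          # example value from the comment above
stored = to_signed64(h)         # negative, because the top bit of h is set
restored = to_unsigned64(stored)
hex_text = f"{restored:016x}"   # unsigned hex notation is preserved
```

Any value with the top bit set becomes negative when stored, so every reader of the column would have to apply the inverse conversion before formatting the hash.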
| Comment by hassan [ 13/Nov/20 ] |
|
Following subsequent discussions with cloomis and sogo.mineo: generating a hash truncated from the original 160-bit SHA-1 to 64 bits, rather than to 63 bits, would cause more confusion and problems than it solves. For example, if the resultant 64-bit hash is carried internally as a 64-bit signed integer (for example when read from the opDB, where the column type is 64-bit signed), then approximately 50% of all generated hashes would be read as negative values. Care then needs to be taken when using these, for example when writing such hash values as hex representations in file names, as performed by the LSST Gen2 Butler (tests show that the Butler would in fact raise an error in such situations). It appears to be much simpler and safer to truncate the SHA-1 to 63 bits. This way, all hash values will be positive. |
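The 63-bit truncation described above can be sketched as follows. This is an illustration under stated assumptions, not the createHash implementation: the function name truncated_sha1 is hypothetical, and keeping the low-order bits of the digest is an assumption about which bits survive the truncation.

```python
import hashlib

INT64_MAX = (1 << 63) - 1  # largest value of a signed 64-bit integer


def truncated_sha1(data, bits=63):
    """Return the low `bits` bits of the 160-bit SHA-1 digest of `data`."""
    digest = int(hashlib.sha1(data).hexdigest(), 16)
    return digest & ((1 << bits) - 1)


payload = b"example design"            # arbitrary input, for illustration only
h63 = truncated_sha1(payload, bits=63)
h64 = truncated_sha1(payload, bits=64)

# A 63-bit hash always fits in the positive range of a signed 64-bit integer,
# so it reads back unchanged from a bigint column and formats cleanly as hex.
fits_in_bigint = 0 <= h63 <= INT64_MAX
hex_text = f"{h63:016x}"
```

With bits=64, any digest whose top bit is set exceeds INT64_MAX and would appear negative once held in a signed 64-bit column. Masking to 63 bits removes that case.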
| Comment by hassan [ 14/Nov/20 ] |
|
merged to master (d472332) |