[INSTRM-1367] obslog: global search on header keyword Created: 18/Sep/21 Updated: 14/Oct/22 Resolved: 14/Oct/22 |
|
| Status: | Done |
| Project: | Instrument control development |
| Component/s: | spt_operational_database |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Normal |
| Reporter: | hassan | Assignee: | michitaro |
| Resolution: | Done | Votes: | 0 |
| Labels: | EngRun | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Story Points: | 6 |
| Description |
|
Following discussions in the #obslog slack channel during Engineering Run 3, it would be very useful to be able perform a global search on all raw FITS files for a keyword value matching a certain condition. For example: W_CLNEOT>14 would return all visits/files matching the case where the keyword W_CLNEOT (Neon lamp request time) has a value of 14 or greater. |
| Comments |
| Comment by hassan [ 19/Sep/21 ] |
|
Another example is
W_AITNEO='T'
(or corresponding correct boolean expression) to find all exposures which have been taken under the Neon DCB lamp. |
| Comment by michitaro [ 20/Sep/21 ] |
|
Kiyoto Yabe-san, How do we go ahead? |
| Comment by Kiyoto Yabe [ 20/Sep/21 ] |
|
I still don't know how larger the fix is, but anyway if you make a fix easily please make a pull request. |
| Comment by michitaro [ 07/Oct/21 ] |
|
I'm thinking about how to store FITS headers in the opdb. Method 1 (naive)To make a wide table with columns for all keywords the FITS headers have. class FitsHeader(Base): __tablename__ = 'fitsheader' filename = Column(String, primary_key=True) hdu_index = Column(Integer, primary_key=True) SIMPLE = Column(String) BITPIX = Column(Integer) NAXIS = Column(Integer) EXTEND = Column(Boolean) DATA_TYP = Column(String) FRAMEID = Column(String) EXP_ID = Column(String) DETECTOR = Column(String) GAIN = Column(Float) DET_TMP = Column(Float) DET_ID = Column(String) DISPAXIS = Column(Integer) DISPRSR = Column(Float) WAV_MIN = Column(Float) WAV_MAX = Column(FLoat) ... ProsSimple and no violation of SQL standards. ConsSchema change always will be required when the set of keywords changes. Method 2 (EAV)To make a table whose each record represents a FITS card. class FitsHdu(Base): __tablename__ = 'fits_hdu' id = Column(Integer, primary_key=True) class FitsCardString(Base): __tablename__ = 'fits_card_string' hdu_id = Column(Integer, ForeignKey('fits_hdu.id')) keyword = Column(String, nullable=False) value = Column(String, nullable=False) class FitsCardFloat(Base): __tablename__ = 'fits_card_float' hdu_id = Column(Integer, ForeignKey('fits_hdu.id')) keyword = Column(String, nullable=False) value = Column(Float, nullable=False) class FitsCardBoolean(Base): __tablename__ = 'fits_card_boolean' hdu_id = Column(Integer, ForeignKey('fits_hdu.id')) keyword = Column(String, nullable=False) value = Column(Boolean, nullable=False) ProsNo violation of SQL standards. ConsThis approach requires a complicated implementation. Method 3 (JSON)To make a table with a column of JSON(B) type to store the header cards. class FitsHeader(Base): __tablename__ = 'fitsheader' id = Column(Integer, primary_key=True) cards = Column(JSONB) with Session(future=True) as session: records = [ FitsHeader(cards={ 'FRAMEID': 'PFS001', 'favorite_food': 'rice', }), FitsHeader(cards={ 'FRAMEID': 'PFS002', 'favorite_food': 'noodle', }), ] session.add_all(records) session.commit() q = session.query(FitsHeader).filter( FitsHeader.cards['FRAMEID'].astext == 'PFS002') r = list(q)[0] assert r.cards['favorite_food'] == 'noodle' # type: ignore ProsSimple implementation. ConsLess potable. A JSON column is out of SQL standards (, but Postgres, MySQL, SQLite and MSSQLServer support it.) |
| Comment by michitaro [ 07/Oct/21 ] |
|
If we have no plan to run the code on DBMS other than Postgres, I like the 3rd method. |
| Comment by michitaro [ 08/Oct/21 ] |
|
cloomis-san, what do you think about introducing JSON to opdb? |
| Comment by Kiyoto Yabe [ 14/Oct/21 ] |
|
I'm now positively thinking about using JSON columns. We do not probably use any backends other than Postgres. |
| Comment by michitaro [ 15/Oct/21 ] |
|
Thank you for the comment. I will make a prototype using JSON column approach. |
| Comment by michitaro [ 18/Nov/21 ] |
|
I have made a PR for this. https://github.com/Subaru-PFS/spt_operational_database/pull/54 cloomis san, to be able to search by FITS headers, it's necessary to add an `obslog_fits_header` record when a new `pfs_visit` record is issued. As the PR shows, codes to add a `obslog_fits_header ` record from a `pyfits.HDUList` will be something like this: import opdb.obslog import astropy.io.fits as pyfits from opdb.models import pfs_visit db = SessionClass() pfs_visit_id = 66453 filestem = 'PFSA06645311' hdul = pyfits.HDUList([ pyfits.PrimaryHDU(header=pyfits.Header([ ('PI', 3.14, "ratio of a circle's circumference to its diameter"), ])), ]) db.add_all(opdb.obslog.fits_headers_from_hdulist(hdul, pfs_visit_id, filestem): db.commit() Is it feasible to add some codes like the above to your code? Sorry for the delayed progress. |
| Comment by cloomis [ 18/Nov/21 ] |
|
[ Sorry, I missed your much much earlier request. Yes, I'm fine with the clever JSONB encoding. ] I have opened INSTRM-1463 to insert the headers into opdb just before gen2 archiving – that is not the earliest possible time but close. |
| Comment by michitaro [ 19/Nov/21 ] |
|
Thanks! |
| Comment by michitaro [ 09/Dec/21 ] |
|
I have added the header cards in existing sps and mcs FITS files to opdb. Now we can query visits by their FITS header value on obslog like this:
where fits_header['CHECKSUM'] = 'V4faV3fYV3faV3fW' where fits_header['OBSERVER'] like '%tamura%' where fits_header['DET-TMP']::float between 163 and 164 where fits_header['W_ENIISH'] = 'true' -- not 'T' or 'F' Query on a boolean value, we should use 'true' and 'false' instead of 'T' or 'F.'
|
| Comment by Kiyoto Yabe [ 09/Dec/21 ] |
|
Thank you, Koike-san! That would be very useful. Can we combine the previous search method (for instance just typing `defocused` to select defocused data) and this new query method (for instance, `where fits_header['EXPTIME']::float < 10`)? In other words, how can we query to obslog columns such as description and notes? Sorry, but I probably don't understand the search mechanism. |
| Comment by michitaro [ 09/Dec/21 ] |
|
I have to make some documents for this. If the text in the search box starts with `where ` (like the example above), the search query will be sent to the server as is. But if the search query does not start with `where ` (for instance, just `defocused`) the search query will be compiled into a particular format before the query will be sent to the server. For the case of `defocused`, it will be compiled into `where any_column like '%defocused%'. `any_column` is a special virtual column that will be expanded to some real columns in the opdb.
# any_column will be expanded the columns below
M.pfs_visit.pfs_visit_id, String,
M.pfs_visit.pfs_visit_description,
M.obslog_visit_note.body,
visit_note_user.account_name,
M.obslog_visit_set_note.body,
visit_set_note_user.account_name,
M.iic_sequence.name,
M.iic_sequence.sequence_type,
M.sps_annotation.notes,
M.obslog_mcs_exposure_note.body,
mcs_exposure_note_user.account_name,
In addition to `fits_header`, we can use more virtual columns below. (ast.String('any_column'),): AnyColumn, (ast.String('visit_id'),): M.pfs_visit.pfs_visit_id, (ast.String('id'),): M.pfs_visit.pfs_visit_id, (ast.String('sequence_type'),): M.iic_sequence.sequence_type, (ast.String('issued_at'),): M.pfs_visit.issued_at, (ast.String('is_sps_visit'),): M.sps_visit.pfs_visit_id != None, (ast.String('is_mcs_visit'),): M.mcs_exposure.mcs_frame_id != None, (ast.String('is_agc_visit'),): M.agc_exposure.agc_exposure_id != None, (ast.String('fits_header'),): M.obslog_fits_header.cards_dict, |
| Comment by Kiyoto Yabe [ 09/Dec/21 ] |
|
Thank you! That information is very helpful. And thank you for documenting. |
| Comment by hassan [ 30/Mar/22 ] |
|
michitaro Kiyoto Yabe Is this ticket complete? Can we close this? |
| Comment by michitaro [ 30/Mar/22 ] |
|
Yes, we can close this ticket, but we need to register FITS files on the DB manually periodically. |
| Comment by yuki.moritani [ 14/Oct/22 ] |
|
Following the above comments, I close this ticket. |