[INSTRM-66] archiver startup is slow Created: 06/Jan/17 Updated: 03/Jul/18 Resolved: 03/Jul/18 |
|
| Status: | Done |
| Project: | Instrument control development |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | cloomis | Assignee: | arnaud.lefur |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
The archiver takes a long time to start, evidently because of checking/crawling all the existing rows. At LAM, the raw tables have ~170M rows:
It is not clear whether that scan is necessary. If not perhaps we can remove it. If it is, perhaps we need to use a union database or something. |
| Comments |
| Comment by arnaud.lefur [ 23/May/17 ] |
|
"select count However it's required for reply_raw, and reply_hdr, in that case, i'm just taking the last row_id, and it seems ok. https://github.com/Subaru-PFS/ics_archiver/commit/ea01a7421cba16da3054e7ac3d0d3316191ca266 I'll merge later, if it's fine. |
| Comment by shimono [ 23/May/17 ] |
|
Or if it is int/bigint type, you could use MAX() instead of??
|
| Comment by cloomis [ 23/May/17 ] |
|
I agree with using MAX(): archiver=# explain select count(id) from reply_raw; QUERY PLAN ---------------------------------------------------------------------------- Aggregate (cost=1695406.31..1695406.32 rows=1 width=8) -> Seq Scan on reply_raw (cost=0.00..1533464.05 rows=64776905 width=8) (2 rows) versus: archiver=# explain select max(id) from reply_raw; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Result (cost=0.61..0.62 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.56..0.61 rows=1 width=8) -> Index Only Scan Backward using reply_raw_pkey on reply_raw (cost=0.56..2732366.40 rows=64776905 width=8) Index Cond: (id IS NOT NULL) (5 rows) Note that we should perhaps be using postgresql's `BIGSERIAL` type for the ids. Probably does not matter for single-writer tables. But it does make the monotonically increasing integer constraint explicit. |
| Comment by shimono [ 24/May/17 ] |
|
Due to MVCC used by pgsql, you need to use some trick or configuration to get rid from seq scan into index only scan for count(xxx). For just getting max number from auto_increment-ish column, the easiest way in pgsql is max()... |
| Comment by arnaud.lefur [ 24/May/17 ] |
|
Yes, I should have used max(), what I did is equivalent but not very clean. |
| Comment by shimono [ 28/Jun/17 ] |
|
arnaud.lefur close this? |
| Comment by cloomis [ 03/Jul/18 ] |
|
This has been fixed for a year. |