Big image tables maintenance
Hi,

I have a 10 TB table with multiple bytea columns (image & doc), which brings the DB size to 20 TB. I have a couple of issues maintaining this DB:

1. I would like to separate the image column from the 10 TB table and place it in a separate schema. The change should not require any query changes in the application. Is that possible? Doing this should also not affect performance. (A sketch follows below.)

2. I can't maintain the files on the file system, as the count is huge, so I'm thinking of using a NoSQL store, most likely MongoDB. Is that recommended, or can PostgreSQL itself handle it?

3. Taking a backup of 20 TB of data is a big task. Is there any more feasible solution than online backup/pg_dump?

Each image retrieval is

Currently we are on PG 9.4 and moving to 10.5 soon.

Thanks,
GJ.
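For question 1, one common approach is to move the bytea column into its own table and hide the split behind a view that keeps the old table name, so the application's queries stay unchanged. A minimal sketch with hypothetical names (big_table, id, img_col, and a blobs schema), not the poster's actual objects:

    -- Move the image column into its own table in a separate schema
    CREATE SCHEMA blobs;
    CREATE TABLE blobs.big_table_images (
        id      bigint PRIMARY KEY REFERENCES big_table (id),
        img_col bytea
    );
    INSERT INTO blobs.big_table_images (id, img_col)
        SELECT id, img_col FROM big_table;

    -- Drop the column and recreate the old name as a view,
    -- so SELECTs in the application keep working unchanged
    ALTER TABLE big_table RENAME TO big_table_base;
    ALTER TABLE big_table_base DROP COLUMN img_col;
    CREATE VIEW big_table AS
        SELECT b.*, i.img_col
        FROM big_table_base b
        LEFT JOIN blobs.big_table_images i USING (id);

Note that writes through the view would need INSTEAD OF triggers, and large bytea values are TOASTed out of line anyway, so a split like this mostly helps maintenance rather than read performance.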
Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Buffers: shared hit=72620045 read=45297330
I/O Timings: read=57489958.088
Execution time: 61141110.516 ms

If I'm reading this correctly, it took 57M ms out of an elapsed time of 61M ms to read 45M pages from the filesystem? If the average service time per read is < 5 ms, is this a case of a bloated index where re-indexing is warranted?

Thanks

explain (analyze,buffers,timing,verbose,costs)
select count(*) from jim.pitations ;

                                QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=72893810.73..72893810.74 rows=1 width=8) (actual time=61141110.437..61141110.437 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=72620045 read=45297330
   I/O Timings: read=57489958.088
   ->  Index Only Scan using pit_indx_fk03 on jim.pitations  (cost=0.58..67227187.37 rows=2266649344 width=0) (actual time=42.327..60950272.189 rows=2269623575 loops=1)
         Output: vsr_number
         Heap Fetches: 499950392
         Buffers: shared hit=72620045 read=45297330
         I/O Timings: read=57489958.088
 Planning time: 14.014 ms
 Execution time: 61141110.516 ms
(11 rows)

Time: 61141132.309 ms
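One way to check the bloat suspicion directly, before reaching for REINDEX: pgstatindex from the pgstattuple extension reports leaf density, and the plan's ~500M "Heap Fetches" suggest a stale visibility map, which VACUUM refreshes. A minimal diagnostic sketch (installing the extension is assumed to be possible; the object names are taken from the plan above):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- avg_leaf_density well below ~90% indicates a bloated index
    SELECT avg_leaf_density, leaf_fragmentation
    FROM pgstatindex('jim.pit_indx_fk03');

    -- ~500M heap fetches mean many pages are not marked all-visible;
    -- VACUUM sets those bits so an index-only scan can skip the heap
    VACUUM (VERBOSE) jim.pitations;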
Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Fred Habash wrote:
> If I'm reading this correctly, it took 57M ms out of an elapsed time of 61M
> ms to read 45M pages from the filesystem?
> If the average service time per read is < 5 ms, is this a case of a bloated
> index where re-indexing is warranted?
>
> explain (analyze,buffers,timing,verbose,costs)
> select count(*) from jim.pitations ;
>
>                                 QUERY PLAN
> ------------------------------------------------------------------------
>  Aggregate  (cost=72893810.73..72893810.74 rows=1 width=8) (actual
>  time=61141110.437..61141110.437 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=72620045 read=45297330
>    I/O Timings: read=57489958.088
>    ->  Index Only Scan using pit_indx_fk03 on jim.pitations
>  (cost=0.58..67227187.37 rows=2266649344 width=0) (actual
>  time=42.327..60950272.189 rows=2269623575 loops=1)
>          Output: vsr_number
>          Heap Fetches: 499950392
>          Buffers: shared hit=72620045 read=45297330
>          I/O Timings: read=57489958.088
>  Planning time: 14.014 ms
>  Execution time: 61141110.516 ms
> (11 rows)

2269623575 / (45297330 + 72620045) ~ 20, so you have an average of 20 index items per block. That is not many, and the index does indeed seem bloated.

Looking at the read times, you average out at about 1 ms per block read from I/O, but with that many blocks that's of course still a long time.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
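To act on that without blocking writes, one possible pattern is to build a fresh index concurrently and swap it in (REINDEX ... CONCURRENTLY only exists from PostgreSQL 12 on; the column vsr_number is assumed from the plan's Output line, and the index is assumed not to back a constraint):

    -- Build a compact replacement without locking out writes
    CREATE INDEX CONCURRENTLY pit_indx_fk03_new
        ON jim.pitations (vsr_number);

    -- Swap it in; DROP INDEX takes a short exclusive lock on the table
    BEGIN;
    DROP INDEX jim.pit_indx_fk03;
    ALTER INDEX jim.pit_indx_fk03_new RENAME TO pit_indx_fk03;
    COMMIT;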
