Big image tables maintenance

2018-09-17 Thread still Learner
Hi,

I have a 10 TB table with multiple bytea columns (images & docs), which
brings the total DB size to 20 TB. I have a couple of issues maintaining this DB.

1. I would like to separate the image column from the 10 TB table and place
it in a separate schema, without requiring any query changes in the
application. Is that possible? Doing this should also not affect performance.
(One possible approach is sketched after this list.)

2. I can't maintain the files on the file system because the count is huge,
so I am thinking of using a NoSQL store, most likely MongoDB. Is that
recommended? Or can PostgreSQL itself handle this?

3. Taking a backup of 20 TB of data is a big task. Is there a more feasible
solution than an online backup/pg_dump?
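
For question 1, one common approach is to move the bytea columns into a side
table (possibly in a separate schema) and put a view with the original table
name in front of the split, so application SELECTs keep working unchanged.
A minimal sketch, with hypothetical names (images, images_base, images_data,
id, image, doc are placeholders, not from this post):

-- The original table would first be renamed, e.g. public.images ->
-- public.images_base, after which the wide columns move out:
CREATE SCHEMA IF NOT EXISTS blobs;

CREATE TABLE blobs.images_data (
    image_id bigint PRIMARY KEY REFERENCES public.images_base (id),
    image    bytea,
    doc      bytea
);

-- Once the data is copied over and the old columns are dropped, a view
-- under the original name keeps reads transparent to the application:
CREATE VIEW public.images AS
SELECT b.id, d.image, d.doc
FROM   public.images_base AS b
LEFT   JOIN blobs.images_data AS d ON d.image_id = b.id;

Reads through such a view are a simple join on the primary key; writes would
need INSTEAD OF triggers on the view. Note also that large bytea values are
TOASTed (stored out of line) anyway, so the split does not necessarily change
the I/O pattern much.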

Currently, we are on pg 9.4 and moving to 10.5 soon.

Thanks,
GJ.


Re: Select count(*) on a 2B-Row Table Takes ~20 Hours

2018-09-17 Thread Fred Habash
Buffers: shared hit=72620045 read=45297330
I/O Timings: read=57489958.088
Execution time: 61141110.516 ms

If I'm reading this correctly, it took 57M ms out of an elapsed time of 61M
ms to read 45M pages from the filesystem?
If the average service time (per sar) is < 5 ms, is this a case of a bloated
index where reindexing is warranted? (One way to measure the bloat is
sketched after the plan below.)

Thanks

explain (analyze,buffers,timing,verbose,costs)
select count(*) from jim.pitations ;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=72893810.73..72893810.74 rows=1 width=8)
   (actual time=61141110.437..61141110.437 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=72620045 read=45297330
   I/O Timings: read=57489958.088
   ->  Index Only Scan using pit_indx_fk03 on jim.pitations
       (cost=0.58..67227187.37 rows=2266649344 width=0)
       (actual time=42.327..60950272.189 rows=2269623575 loops=1)
         Output: vsr_number
         Heap Fetches: 499950392
         Buffers: shared hit=72620045 read=45297330
         I/O Timings: read=57489958.088
 Planning time: 14.014 ms
 Execution time: 61141110.516 ms
(11 rows)
Time: 61141132.309 ms
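
One way to quantify the suspected bloat is the contrib extension pgstattuple;
its pgstatindex() function reports the average leaf density of a B-tree
index. A minimal sketch (the index name is taken from the plan above):

-- Needs the pgstattuple contrib extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density far below the default fillfactor of 90 suggests
-- a bloated B-tree
SELECT avg_leaf_density, leaf_fragmentation
FROM   pgstatindex('jim.pit_indx_fk03');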


Re: Select count(*) on a 2B-Row Table Takes ~20 Hours

2018-09-17 Thread Laurenz Albe
Fred Habash wrote:
> If I'm reading this correctly, it took 57M ms out of an elapsed time of 61M
> ms to read 45M pages from the filesystem?
> If the average service time (per sar) is < 5 ms, is this a case of a bloated
> index where reindexing is warranted?
> 
> explain (analyze,buffers,timing,verbose,costs)
> select count(*) from jim.pitations ;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Aggregate  (cost=72893810.73..72893810.74 rows=1 width=8)
>    (actual time=61141110.437..61141110.437 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=72620045 read=45297330
>    I/O Timings: read=57489958.088
>    ->  Index Only Scan using pit_indx_fk03 on jim.pitations
>        (cost=0.58..67227187.37 rows=2266649344 width=0)
>        (actual time=42.327..60950272.189 rows=2269623575 loops=1)
>          Output: vsr_number
>          Heap Fetches: 499950392
>          Buffers: shared hit=72620045 read=45297330
>          I/O Timings: read=57489958.088
> Planning time: 14.014 ms
> Execution time: 61141110.516 ms
> (11 rows)
> (11 rows)

2269623575 / (45297330 + 72620045) ~ 20, so you have on average about 20
index items per block.  That is not many, so the index does indeed seem bloated.

Looking at the read times, you average out at about 1.3 ms per block read
from I/O, but with that many blocks that is of course still a long time.
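
If rebuilding is warranted, note that plain REINDEX takes a lock that blocks
writes, so on versions before PostgreSQL 12 (which added REINDEX
CONCURRENTLY) the usual pattern is to build a replacement index concurrently
and swap it in. A sketch, assuming the index is a plain index on vsr_number
(inferred from the plan's Output line, not confirmed in the thread):

-- Cannot run inside a transaction block
CREATE INDEX CONCURRENTLY pit_indx_fk03_new ON jim.pitations (vsr_number);

-- Swap the new index in; DROP ... CONCURRENTLY avoids a long exclusive lock
DROP INDEX CONCURRENTLY jim.pit_indx_fk03;
ALTER INDEX jim.pit_indx_fk03_new RENAME TO pit_indx_fk03;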

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com