How to see/calculate size of index in memory?

2018-09-19 Thread Sam R.

Hi!

I have the following question; I would appreciate any help.

Question 1: How can I see/calculate the size of an index in memory?
(B-tree, hash index.)

I can see the size of an index e.g. with pg_relation_size on the entry in 
pg_class (after REINDEX). Does that report the size of the index on disk?

I would be interested in how big a part of the index is in memory. (The whole index?)

PG10/PG11.
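
A sketch of the kind of queries I mean (assuming the pg_buffercache extension 
is installed, the default 8 kB block size, and a made-up index name 
my_hash_idx; this only shows shared_buffers, not the OS page cache):

  -- on-disk size of the index
  SELECT pg_size_pretty(pg_relation_size('my_hash_idx'));

  -- how much of that index is currently cached in shared_buffers
  SELECT count(*) * 8192 AS bytes_in_shared_buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
  WHERE c.relname = 'my_hash_idx';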
Best Regards, Sam



To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Hi!

Related to my other email (size of index in memory):

Other questions:
Q: To keep _index(es)_ in memory, is a large enough effective_cache_size sufficient?
Q: Does the size of shared_buffers not matter for keeping an index in memory?
Or have I missed something; does it matter (for keeping indexes in memory)?

Background info: I plan to use hash indexes: there is a very large amount of 
data in the DB tables, but the (e.g. hash) indexes could be kept in memory.

I am using PostgreSQL 10. I could start using PostgreSQL 11 after it has been 
released.

Best Regards, Sam


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Hi!
Is it possible to force PostgreSQL to keep an index in memory? The data in the 
DB table columns does not need to be kept in memory, only the index. (Hash index.)

That would sound optimal in our scenario. I think Oracle has the capability to 
keep an index in memory (in-memory DB functionality). Does PostgreSQL have such 
functionality? (I keep searching.)
I have read:
Tuning Your PostgreSQL Server - PostgreSQL wiki
(effective_cache_size, shared_buffers)
I have seen responses to:
PostgreSQL Index Caching



Should I actually set shared_buffers to tens of gigabytes also, if I want to 
keep one very big index in memory?

I am also reading a PG book.

Best Regards, Sam


 

On Wednesday, September 19, 2018 11:40 AM, Sam R. wrote:
...

Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Sergei wrote:
> You cannot pin any table or index to shared buffers.
Thanks, this is the answer to my other question!
In our case, this might be an important feature. 
(Index in memory, other data / columns not.)
> shared_buffers is the cache for both table and index pages.
OK. So we should also set shared_buffers large.

BR Sam
 

On Wednesday, September 19, 2018 12:10 PM, Sergei Kornilov  
wrote:
 

 Hi

effective_cache_size is not a cache. It is just an approximate value for the 
query planner: how much data can be found in RAM (both in shared_buffers and 
the OS page cache).

> Q: Does the size of shared_buffers not matter for keeping an index in memory?
shared_buffers is the cache for both table and index pages. All data in tables 
and indexes is split into pages of 8 kB each (usually 8 kB; the size can be 
redefined at compile time).
The shared buffers cache is fully automatic: actively used pages are kept in 
memory, less used pages may be evicted. You cannot pin any table or index to 
shared buffers.
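
For reference, these values can be inspected from psql (illustrative only; the 
block size is the compile-time default unless it was changed):

  SHOW block_size;            -- page size in bytes, 8192 by default
  SHOW shared_buffers;        -- size of PostgreSQL's own buffer cache
  SHOW effective_cache_size;  -- planner estimate only, allocates no memory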

regards, Sergei


   

Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Hi!
Thanks for all of the comments!
David wrote:
> if you mention how much RAM the server has and how big the data is now
Let's say, for example:
RAM: 64 GB
Data: 500 GB - 1.5 TB, for example. 
(RAM: less would of course be better, e.g. 32 GB, but we could maybe go for an 
even slightly larger value than 64 GB if needed.)
BR Sam

On Wednesday, September 19, 2018 1:11 PM, David Rowley 
 wrote:
 

 On 19 September 2018 at 21:18, Sam R.  wrote:
> OK. So we should also set shared_buffers large.

It might not be quite as beneficial as you might think. If your
database is larger than RAM, often having a smaller shared_buffers
setting yields better performance.  The reason is that with a very
large shared_buffers, the same buffers can end up cached in both the
kernel page cache and shared buffers. With a smaller shared_buffers
setting, the chances of that double buffering are reduced and the
chances of finding a page cached somewhere increase.

However, if your database is quite small and you can afford to fit all
your data in shared buffers, with enough free RAM for everything else,
then you might benefit from a large shared_buffers, but it's important
to also consider that some operations, such as DROP TABLE, can become
slow if shared_buffers is very large.

You might get more specific recommendations if you mention how much
RAM the server has and how big the data is now and will be in the
future.
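
As a rough illustration only, for a 64 GB server like the one described above, 
a commonly cited starting point would be something in this ballpark (a hedged 
sketch to react to, not a recommendation; with data much larger than RAM, a 
smaller shared_buffers is also worth testing, as discussed above):

  ALTER SYSTEM SET shared_buffers = '16GB';        -- takes effect after a restart
  ALTER SYSTEM SET effective_cache_size = '48GB';  -- planner hint only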

-- 
 David Rowley                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


   

Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
The size of the index of one huge table has been e.g. 16-20 GB (after REINDEX). 

Such an index is quite big.
  
BR Samuli

On Wednesday, September 19, 2018 2:01 PM, Sam R. wrote:
...

Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Thanks for the comments!
Sam wrote:

>> The data in the DB table columns does not need to be kept in memory, only 
>> the index. (Hash index.)


Jeff Janes wrote:
> This sounds like speculation.  Do you have hard evidence that this is 
> actually the case?
In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.

In the generation phase it is a very large random number. Our application may 
not even generate the random ID itself.

We use a hash index over the ID.
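
As a minimal sketch of that setup (the table and index names are hypothetical; 
hash indexes support only equality lookups, which matches this access pattern, 
and they are WAL-logged and crash-safe from PostgreSQL 10 on):

  CREATE TABLE items (id bigint NOT NULL, payload jsonb);
  CREATE INDEX items_id_hash_idx ON items USING hash (id);
  -- equality lookup on the random ID goes through the hash index
  SELECT payload FROM items WHERE id = 123456789012345;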

At the moment, in "pure theory", we will read randomly through the hash index. 
So, no one can know which part of the data (from the table) should be kept in 
memory. 
Side note: of course there may be (even many) use cases where the same data is 
read again and again. Still, I am thinking now from a very theoretical point of 
view (which we may still apply in practice).

In general: 
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / 
wish of keeping only the index in memory may be faulty. (This is one reason 
for these discussions.)

BR Sam

  
   

Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-24 Thread Sam R.
Hi!
"Index in memory" topic:
After read operation starts,
I think / it seems that a big part of an index gets loaded to memory quite 
quickly. A lot of IDs fit to one 8 KB page in PostgreSQL. When reading 
operation starts, pages start to be loaded to memory quickly.
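
A rough back-of-the-envelope check (the index name is a placeholder, and the 
default 8 kB block size is assumed):

  -- number of 8 kB pages the index occupies
  SELECT pg_relation_size('items_id_hash_idx')
         / current_setting('block_size')::bigint AS index_pages;
  -- each hash index entry is roughly a 4-byte hash code plus an index tuple
  -- header and line pointer (about 20 bytes in total), so a few hundred
  -- entries fit per 8 kB page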

So, this "feature" / PostgreSQL may work well in our very long "random" IDs 
cases still. 
It is not needed / not possible to keep the whole index in memory: E.g. if 
there is not enough memory / size of index is bigger than memory, it is not 
even possible to keep whole index in memory. 
( Regarding in memory DB functionalities: I do not know would "In-memory" index 
/ db work in such situations, if index would not fit in memory. We would like 
to keep most of the index in memory, but not whole index in all cases e.g. when 
there is not enough memory available. )
So, again, maybe PostgreSQL works well in our case.
Regarding double buffering: I do not know how much double buffering would slow 
down operations. 
It could also be possible to turn off the kernel page cache on our DB server to 
avoid double buffering, although we may still keep it in use.

BR Sam

On Wednesday, September 19, 2018 5:50 PM, Sam R. wrote:
...

   

Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-27 Thread Sam R.
Hi!
> The double buffering itself does not slow anything down. 

That is what I was starting to suspect. Double buffering may not matter in our 
case, because the whole server is meant for PostgreSQL only.
In our case, we can e.g. reserve almost "all memory" for PostgreSQL (shared 
buffers etc.).

Please correct me if I am wrong.
BR Sam


 
 
On Tue, Sept 25, 2018 at 23:55, David Rowley wrote:
On Tue, 25 Sep 2018 at 18:36, Sam R. wrote:
> Regarding double buffering: I do not know how much double buffering would 
> slow down operations.
> It could also be possible to turn off the kernel page cache on our DB server 
> to avoid double buffering, although we may still keep it in use.

I think you've misunderstood double buffering.  The double buffering
itself does not slow anything down. If the buffer is in shared buffers
already then it does not need to look any further for it. Double
buffering only becomes an issue when buffers existing 2 times in
memory causes other useful buffers to appear 0 times.

-- 
 David Rowley                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
  


One big table or split data? Writing data. From disk point of view. With a good storage (GBs/s, writing speed)

2018-10-12 Thread Sam R.
Hi!

Could someone comment on the following? It would be great to hear your thoughts!
There is good storage: according to "fio", the write speed could be e.g. 3 GB/s. 
(It is the first time I am using the command, so I am not certain of the real 
speed measured with "fio", e.g. with --bs=100m and direct=1. The measurement 
result may also be faulty, so I am still checking.)
Currently there is one table without partitioning. The table contains JSON 
data. PostgreSQL on Linux.
The write speed through PostgreSQL can be e.g. 300 - 600 MB/s, measured with 
dstat while inserting. shared_buffers is large in PostgreSQL.
With storage/disk that "scales", is there some way to write faster to the disk 
through PostgreSQL, inside the same server?
Does splitting the data help? A partitioned table / splitting into smaller 
tables? Should I test it? (See the partitioning sketch below.)
Should I change settings somewhere? Block sizes, 8 KB / 16 KB? Is it "dangerous" 
to change them?
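A hash-partitioned table (available from PostgreSQL 11) would look roughly like 
this; the table and column names are made up, and whether it actually improves 
write throughput on this storage would have to be measured:

  CREATE TABLE docs (id bigint NOT NULL, doc jsonb) PARTITION BY HASH (id);
  CREATE TABLE docs_p0 PARTITION OF docs FOR VALUES WITH (MODULUS 4, REMAINDER 0);
  CREATE TABLE docs_p1 PARTITION OF docs FOR VALUES WITH (MODULUS 4, REMAINDER 1);
  CREATE TABLE docs_p2 PARTITION OF docs FOR VALUES WITH (MODULUS 4, REMAINDER 2);
  CREATE TABLE docs_p3 PARTITION OF docs FOR VALUES WITH (MODULUS 4, REMAINDER 3);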
2nd question, sharding:
If the storage / "disk" scales, could a better *disk writing speed* be achieved 
(in total) with a sharding kind of splitting of the data? (The same NAS storage, 
which scales, in use in all shards.) Sharding, or only one server? From a pure 
disk writing speed point of view.

BR Sam


Re: One big table or split data? Writing data. From disk point of view. With a good storage (GBs/s, writing speed)

2018-10-15 Thread Sam R.
Hi!
Still, "writing" speed to disk:
With "fio" 3 GB / s.With PostgreSQL: 350 -500 MB / s, also with a partitioned 
table.
(and something similar with an other DBMS).Monitored with dstat.
With about 12 threads, then it does not get better anymore.Same results with 
data split to 4 partitions, in a partitioned table. CPU load increased, but not 
full yet.Same results with open_datasync.
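
(For reference, the open_datasync test mentioned above refers to the 
wal_sync_method setting; switching it can be sketched like this, with the usual 
caveat that the safe and fastest method depends on the platform and storage:)

  ALTER SYSTEM SET wal_sync_method = 'open_datasync';
  SELECT pg_reload_conf();  -- wal_sync_method can be changed with a reload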
BR Sam

 
 
On Fri, Oct 12, 2018 at 19:27, Sam R. wrote:
...


Re: Scale out postgresql

2019-03-28 Thread Sam R.
Hi!
With the following kinds of keywords it is possible to search for cloud-native 
(SQL) implementations, e.g. with Google: "cloud native sql database".
E.g. CockroachDB, YugaByteDB.
I do not know whether you are planning to do it by other means (by yourself).
I myself would be interested: does someone have experience with these? Is HA 
provided "ready made"? Does HA work well and recover from / handle all 
situations, or do additional algorithms need to be implemented on top, e.g. for 
automatic recovery (by "myself")?
I could start another email chain if this chain is meant for something else.
Best Regards, Sam

 
 
On Thu, Mar 28, 2019 at 12:10, Mariel Cherkassky wrote:
Hey,
I was searching for a solution to scale my PostgreSQL instance in the cloud. 
I'm aware that I can create many read-only replicas in the cloud and that this 
would improve my read performance. I wanted to hear which solutions you are 
familiar with. Are there any sharding solutions that are commonly used (Citus? 
pg_shard?)? My instance has many DBs (one per customer), and big customers can 
generate a lot of load on the others.


Thanks.