PG Sharding

2018-01-29 Thread Matej
Hi Everyone.

We are looking at a rather large fin-tech installation. As the scalability
requirements are high, we are of course looking at sharding.

I have looked at many sources on PostgreSQL sharding, but we are a little
confused as to whether to shard with schemas, with databases, or both.


So far our understanding (a rough sketch of both layouts follows the lists):

*SCHEMA.*

PROS:
- seems native to PG
- backup seems easier
- connection pooling seems easier, as you can use the same connection
across shards.

CONS:
- schema changes seem a little more complicated
- heard of backup and maintenance problems
- also some caching problems.

*DATABASE:*

PROS:
- schema changes are a little easier
- backup and administration seem more robust

CONS:
- heard of vacuum problems
- connection pooling is hard, as 100 shards would mean 100 pools
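
To make the two options concrete, here is a rough sketch of what routing
could look like in each layout (all names below are placeholders, not an
actual design):

  -- schema-per-shard: one database, many schemas, route via search_path
  CREATE SCHEMA shard_0001;
  CREATE TABLE shard_0001.accounts (id bigint PRIMARY KEY, balance numeric);
  SET search_path TO shard_0001;  -- the same connection can serve any shard

  -- database-per-shard: many databases, route by connecting to the right one
  CREATE DATABASE shard_0001;
  -- each shard then needs its own connection string, hence its own pool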


So what is actually the right approach? If anyone could shed some light on
this, it would be much appreciated.

*Thanks*


Re: PG Sharding

2018-01-29 Thread Matej
Hi Thomas.

Thanks.

Also looked at those solutions:
- PGXL: I am a little afraid we would be the test dummies; I have not heard
of many production installs.
- Citus seems heavily limited scalability-wise, because of the master-node
design.

Regarding partitioning, we ourselves are considering pg_pathman. We were
hoping for PG10 partitioning, but currently it does not bring many
performance improvements.
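
For reference, PG10 declarative monthly range partitioning looks roughly
like this (table and column names are placeholders, for illustration only):

  CREATE TABLE transactions (
      id      bigint      NOT NULL,
      tx_time timestamptz NOT NULL,
      amount  numeric     NOT NULL
  ) PARTITION BY RANGE (tx_time);

  CREATE TABLE transactions_2018_01 PARTITION OF transactions
      FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');
  CREATE TABLE transactions_2018_02 PARTITION OF transactions
      FOR VALUES FROM ('2018-02-01') TO ('2018-03-01');

As far as we can tell, pg_pathman's create_range_partitions() can build an
equivalent monthly layout with less planning and routing overhead, which is
why we lean towards it for now.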

Overall we are still considering manual application-level sharding, as this
seems to be the most scalable approach with the least added latency. The
built-in solutions seem to introduce extra lag, and I am afraid of what
happens when something goes wrong: then it is not a black box anymore and
you have to study the details.
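
As an illustration of the application-level routing we have in mind (the
key and shard count are placeholders; hashtext() is an internal, undocumented
PostgreSQL hash, and any stable hash computed in the application would do
just as well):

  -- pick a shard for a tenant key: hash it, take the modulo of the shard count
  SELECT abs(hashtext('tenant-42')::bigint) % 256 AS shard_id;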

For node loss we plan a master-slave setup, and there will not be many
cross-shard queries.

BR
Matej


2018-01-29 16:15 GMT+01:00 Thomas Boussekey :

> Hello,
>
> Facing the same situation, I'm considering 3 solutions:
> - Sharding with postgres_xl (waiting for a Pg10 release)
> - Sharding with citusdata (Release 7.2, compatible with Pg10 and
> pg_partman, seems interesting)
> - Partitioning with PG 10 native partitioning or pg_partman
>
> With colleagues, we have tested the 3 scenarios.
> Sharding looks interesting, but you have to anticipate its behaviour in
> case of node loss or cross-node queries.
>
> Thomas
>
> 2018-01-29 15:44 GMT+01:00 Melvin Davidson :
>
>>
>>
>> You might also want to consider GridSQL. IIRC it was originally
>> developed by EnterpriseDB. I saw a demo of it a few years ago and it was
>> quite impressive, but I've had no interaction with it since, so you will
>> have to judge for yourself.
>>
>> https://sourceforge.net/projects/gridsql/?source=navbar
>>
>> --
>> *Melvin Davidson*
>> I reserve the right to fantasize.  Whether or not you
>> wish to share my fantasy is entirely up to you.
>>
>
>


Re: PG Sharding

2018-01-30 Thread Matej
We are looking for multi-tenancy, but at scale; that is why the sharding and
partitioning. It depends on how you look at the distributed part.

BR

Matej

On 29 Jan 2018 at 17:50, "Rakesh Kumar"  wrote:

>
>
>
> From your description it seems your requirement is more of multi-tenancy
> in a non-distributed environment, rather than a distributed sharding
> environment.
>
>
>


Re: PG Sharding

2018-01-30 Thread Matej
As already said, it is missing two-level sharding and is constrained by a
SPOF (single point of failure).

BR

Matej

2018-01-30 12:05 GMT+01:00 Rakesh Kumar :

>
>
>
> >We are looking for multi tenancy but at scale. That's why the sharding
> and partitioning. It depends how you look at the distributed part.
>
> Citusdb.
>


Re: PG Sharding

2018-01-31 Thread Matej
Thanks, Thomas.

Still fancying the manual approach a little bit more.

We will probably go with 8 databases and 32 schemas per machine. This way we
will stay within limits for the administration tools as well as autovacuum,
and will also be ready for connection pooling, as 8 databases is not too
many.

This will give us 256 shards per machine, but it will be tunable. The lower
number will also prevent too much memory/disk fragmentation and, with it,
bad cache hit ratios.
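
To make the layout concrete, the mapping we have in mind is roughly the
following (8 databases x 32 schemas = 256 shards; all names are purely
illustrative, and the shard id is assumed to come from hashing the tenant
key as described earlier):

  -- assuming shard_id in 0..255 has already been derived from the tenant key
  SELECT 'shard_db_' || (shard_id / 32) AS database_name,
         'shard_'    || (shard_id % 32) AS schema_name
  FROM (VALUES (201)) AS t(shard_id);   -- shard 201 -> shard_db_6, shard_9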

We will also use monthly partitioning per shard, to reduce the chance of big
tables forming.

Thanks everyone.


2018-01-30 15:26 GMT+01:00 Thomas Boussekey :

> Using citusdb enterprise, you can replicate the table shards.
>
> Here is the link to the documentation:
> https://docs.citusdata.com/en/v7.2/reference/user_defined_
> functions.html#replicate-table-shards
>
> Regards,
> Thomas
>
>


Re: PG Sharding

2018-02-02 Thread Matej
I thought that this kind of solution had high latency and poor OLTP
capabilities (low transactions per second)? Analytics is not a high priority
for us.

BR

2018-02-01 19:01 GMT+01:00 Dan Wierenga :

>
>
> On Wed, Jan 31, 2018 at 7:48 PM, Steven Lembark 
> wrote:
>
>> Suggest looking at the XtremeData product. It is a parallel,
>> shared-nothing implementation of PG that should solve your
>> needs rather nicely.
>>
>> You can see a description of their product at
>> https://xtremedata.com/
>>
>> Happy scaling :-)
>>
>>
> Having been a production DBA for both the DBX (XtremeData) and the
> Greenplum MPP database platforms, IMO Greenplum is far superior to DBX.
> Issues with the GP master node being a single point of failure are solved
> by a secondary master node and automatic failover technology e.g.
> keepalived.
>
> But, it sounds like the OP is not really looking for the kind of scale
> that an MPP solution provides, but rather the kind of scale that is
> typically solved by a service-orchestration suite.  I don't think that "a
> rather large fin-tech installation" with "high scalability requirements" is
> really enough detail to give a recommendation on orchestration software.
>
> -dan
>