Re: Latest advice on SSD?

2018-04-10 Thread Aaron
RDBMSs such as pg are beasts engineered to turn random IO requests, traditionally 
slow on spinning drives, into sequential IO. The WAL is a good example of this. 

SSDs are generally slower than spinning disks at sequential IO and way faster 
at random.

Expect SSD, therefore, to help if you are random-IO bound. (Some cloud 
vendors offer SSD as a way to get dedicated local IO and bandwidth - so 
sometimes it helps stabilize performance vs. virtualized shared IO.)

A REASONABLE PERSON SHOULD ASSUME THAT UNBENCHMARKED AND UNRESEARCHED MIGRATION 
FROM TUNED SPINNING TO SSD WILL SLOW YOU DOWN
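
To make the random-vs-sequential point concrete, here is a minimal Python 
sketch (all sizes are toy values I picked for illustration; for meaningful 
numbers, test a file much larger than RAM so the OS page cache doesn't hide 
the device):

```python
import os
import random
import tempfile
import time

BLOCK = 8192      # one Postgres-sized 8 KB page
NBLOCKS = 2048    # 16 MB test file (toy size; use >> RAM for a real test)

def make_test_file():
    """Write NBLOCKS random pages to a temp file and return its path."""
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(os.urandom(BLOCK * NBLOCKS))
    f.close()
    return f.name

def timed_read(path, offsets):
    """Read one BLOCK at each offset; return (elapsed_seconds, bytes_read)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            total += len(f.read(BLOCK))
    return time.perf_counter() - start, total

path = make_test_file()
seq = [i * BLOCK for i in range(NBLOCKS)]
rand = list(seq)
random.shuffle(rand)

t_seq, n_seq = timed_read(path, seq)     # sequential pass
t_rand, n_rand = timed_read(path, rand)  # same pages, random order
print(f"sequential: {t_seq:.4f}s  random: {t_rand:.4f}s")
os.unlink(path)
```

On spinning disks the random pass is dramatically slower; on SSDs the two 
converge, which is exactly why SSDs help most when you are random-IO bound.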

/Aaron 


> On Apr 10, 2018, at 12:54 PM, Benjamin Scherrey wrote:
> 
> You don't mention the size of your database. Does it fit in memory? If so, 
> your disks aren't going to matter a whole lot outside of potentially being 
> i/o bound on the writes. Otherwise, getting your data onto SSDs can easily 
> yield a multiple-fold performance improvement. NVMe M.2 drives can really 
> pump out the data. Maybe put your WAL on those (as few motherboards have 
> more than two M.2 slots) and use regular SSDs for your data if you have high 
> write rates.
> 
> Meanwhile, if you're looking for strong cloud hosting for Postgres but the 
> speed of physical hardware, feel free to contact me as my company does this 
> for some companies who found i/o limits on regular cloud providers to be way 
> too slow for their needs. 
> 
> good luck (and pardon the crass commercial comments!),
> 
>   -- Ben Scherrey
> 
>> On Tue, Apr 10, 2018 at 9:36 AM, Craig James  wrote:
>> One of our four "big iron" (spinning disks) servers went belly up today. 
>> (Thanks, Postgres and pgbackrest! Easy recovery.) We're planning to move to 
>> a cloud service at the end of the year, so bad timing on this. We didn't 
>> want to buy any more hardware, but now it looks like we have to.
>> 
>> I followed the discussions about SSD drives when they were first becoming 
>> mainstream; at that time, the Intel devices were king. Can anyone recommend 
>> what's a good SSD configuration these days? I don't think we want to buy a 
>> new server with spinning disks.
>> 
>> We're replacing:
>>   8 core (Intel)
>>   48GB memory
>>   12-drive 7200 RPM 500GB
>>  RAID1 (2 disks, OS and WAL log)
>>  RAID10 (8 disks, postgres data dir)
>>  2 spares
>>   Ubuntu 16.04
>>   Postgres 9.6
>> 
>> The current system peaks at about 7000 TPS from pgbench.
>> 
>> Our system is a mix of non-transactional searching (customers) and 
>> transactional data loading (us).
>> 
>> Thanks!
>> Craig
>> 
>> -- 
>> -
>> Craig A. James
>> Chief Technology Officer
>> eMolecules, Inc.
>> -
> 


Re: Half billion records in one table? RDS

2017-12-05 Thread Aaron Werman
Why not store metadata in pg and the payload in S3?
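
A minimal sketch of that split, with sqlite3 standing in for Postgres and a 
plain dict standing in for S3 (names and schema are illustrative; in real code 
you would use psycopg2 and boto3):

```python
import json
import sqlite3
import uuid

# Stand-ins for the real services (assumption: sqlite3 ~ Postgres, dict ~ S3).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE messages (
    message_id TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    title      TEXT,
    s3_key     TEXT NOT NULL)""")
s3 = {}

def store_message(user_id, title, body):
    """Keep small metadata in the DB; push the large JSON body to object storage."""
    message_id = str(uuid.uuid4())
    s3_key = f"messages/{user_id}/{message_id}.json"
    s3[s3_key] = json.dumps(body)  # s3_client.put_object(...) in real code
    db.execute("INSERT INTO messages VALUES (?, ?, ?, ?)",
               (message_id, user_id, title, s3_key))
    return message_id

def fetch_message(message_id):
    """Look up metadata in the DB, then pull the payload from object storage."""
    user_id, title, s3_key = db.execute(
        "SELECT user_id, title, s3_key FROM messages WHERE message_id = ?",
        (message_id,)).fetchone()
    return {"user_id": user_id, "title": title,
            "body": json.loads(s3[s3_key])}  # s3_client.get_object(...) in real code

mid = store_message("user-1", "hello", {"kind": "greeting", "text": "hi"})
print(fetch_message(mid))
```

The row stays a few hundred bytes no matter how big the payload grows, which 
keeps the table and its indexes small and cache-friendly.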

On Mon, Nov 27, 2017 at 11:58 AM Jean Baro  wrote:

> Hi there,
>
> We are creating a new DB which will behave most like a file system, I
> mean, there will be no complex queries or joins running in the DB. The idea
> is to grab the WHOLE set of messages for a particular user and then filter,
> order, combine or full text search in the function itself (AWS Lambda). The
> maximum number of messages is limited to 1,000 messages per user, so we
> expect Postgres to perform very well for this scenario.
>
> As I am not really familiar with PG (9.6, or 10, in case RDS releases it
> before February) I would like to share what we are planning to do for this
> DB. So if you guys could share your thoughts, that would be great! :)
>
> Table structure:
>
>
>
> · MessageID (UUID) - PK
>
> · UserCountry (ISO)
>
> · UserRole (TEXT 15)
>
> · UserID (TEXT 30) – FK (although there is no constraint)
>
> · LifeCycle (RANGE DATE? Or 2 TimeStampWithTZ? Start_date and
> end_date?)
>
> · Channel (TEXT 15)
>
> · Tags (TEXT 2000)
>
> · Menu (TEXT 200)
>
> · Icon (TEXT 500) – URL to an image which will be used as an icon;
>
> · Title (TEXT 150)
>
> · Body (JSON – up to 10K) – Metadata describing all the data for
> a specific type of message. The JSON changes according to the type of
> message. We are assuming most messages will use less than 1K for this
> field.
>
> · Delete (BOOLEAN) – Soft Delete
>
> · Created (Timestamp – With TZ)
>
> · CreatedBy (TEXT 50)
>
>
>
> Only 1 table
>
> · Messages
>
> 3 indexes:
>
> · MessageID PK (UUID)
>
> · Main fetch key (UserCountry + UserID)
>
> · End_date (to locate old messages that can be moved to another
> DB, which will hold the archived messages);
>
>
>
> Sizing and worst case scenario:
>
>
>
> · 500MM messages in the main DB
>
> · 4K queries per second (by UserID) – max time of 500ms per
> query. A simple SELECT with no ORDER BY or GROUP BY; just grab all the
> messages for a particular user (MAX 1,000 messages per USER).
>
> · 1K inserts per second on average (So that in 1 hour we can
> insert around 3MM messages)
>
> · 1K deletes per second on average (So that in 1 hour we can
> remove around 3MM messages)
>
>
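
For reference, the quoted numbers work out roughly like this (the ~1 KB 
average Body size is the poster's own figure; the ~300 bytes for the 
remaining columns plus per-row overhead is my assumption):

```python
rows = 500_000_000           # 500MM messages, the stated worst case
avg_body = 1_024             # poster expects most Body values under 1 KB
other_cols = 300             # assumed: remaining columns + per-row overhead

table_bytes = rows * (avg_body + other_cols)
print(f"table ≈ {table_bytes / 1024**4:.2f} TiB")        # ≈ 0.60 TiB, before indexes

inserts_per_hour = 1_000 * 3600
print(f"inserts/hour ≈ {inserts_per_hour / 1e6:.1f}MM")  # 3.6MM

# 1K deletes/sec of churn means autovacuum must reclaim on the order of
# 86.4MM dead tuples per day.
dead_per_day = 1_000 * 86_400
print(f"dead tuples/day ≈ {dead_per_day / 1e6:.1f}MM")   # 86.4MM
```

That delete churn, more than the raw size, is what makes the autovacuum 
question below the important one.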
> My question is:
>
>
>- Can we use any kind of compression for PostgreSQL which would result
>in reduced IO and disk size?
>- We are not relying on any kind of table partitioning; is that the
>best approach for this scenario?
>- Is PG on RDS capable of delivering this type of performance while
>requiring low maintenance?
>- What about autovacuum? Any suggestions on how to optimize it for such a
>workload (we will insert and delete millions of rows every day)?
>
> P.S.: We are going to test all this, but if we don't get the performance
> we are expecting, all optimization tips from you guys will be really
> appreciated. :)
>
> Thanks
>
>
>
> --

Regards,
/Aaron