Re: Using enum instead of join tables

2018-04-10 Thread PT
On Tue, 10 Apr 2018 11:24:49 +0100
hmidi slim  wrote:

> Hi,
> Is it a good thing to use an enum type for a column in a table instead of
> making a foreign key which references another table?
> I found these links talking about enums and when to use them:
> http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/
> https://www.percona.com/blog/2008/01/24/enum-fields-vs-varchar-vs-int-joined-table-what-is-faster/
> 
> Is there any advice, or are there any recent developments, on using the enum
> type in order to decrease the number of joins and optimize performance and
> query runtime?

Performance isn't always the only issue.

Enums work well if you're 100% sure that the values will never change.
While it's not impossible to change them, it's essentially a schema change,
which can be tricky to do on an active database, due to locking.

A separate lookup table referenced by a foreign key has the advantage of being
easy to change, but with the extra join required to get the text representation.

A lookup table keyed on the text value itself doesn't require the join, but takes
up more space and requires a cascading change if you need to change a value.
Adding new values is pretty easy, though.

Another option is a text field with a check constraint to ensure the data
in it stays valid. This is somewhere in between: changing the check constraint
is easier than changing an enum, but harder than updating a lookup table. It
doesn't require a join to get the text representation of the value, but takes up
more space (depending on the length of the text for each value).

So you have to balance the requirements of your use case to decide what method
is best.
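
For illustration, here is a minimal sketch of the options above, using a
hypothetical order status value (all type, table, and column names are made up
for the example):

  -- 1) Native enum: compact storage and no join, but changing the list of
  --    values is effectively a schema change.
  CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'cancelled');
  CREATE TABLE orders_enum (
      id     bigserial PRIMARY KEY,
      status order_status NOT NULL
  );

  -- 2) Lookup table with a surrogate key: values are easy to change, but a
  --    join is required to get the text representation.
  CREATE TABLE status_lookup (
      id   smallint PRIMARY KEY,
      name text NOT NULL UNIQUE
  );
  CREATE TABLE orders_fk (
      id        bigserial PRIMARY KEY,
      status_id smallint NOT NULL REFERENCES status_lookup (id)
  );

  -- 3) Lookup table keyed on the text itself: no join needed to read the
  --    value, but renaming a value cascades into the referencing table.
  CREATE TABLE status_text (
      name text PRIMARY KEY
  );
  CREATE TABLE orders_text (
      id     bigserial PRIMARY KEY,
      status text NOT NULL REFERENCES status_text (name) ON UPDATE CASCADE
  );

  -- 4) Text column with a CHECK constraint: no join and no extra table, but
  --    the constraint has to be dropped and re-added to change the list.
  CREATE TABLE orders_check (
      id     bigserial PRIMARY KEY,
      status text NOT NULL
             CHECK (status IN ('pending', 'shipped', 'cancelled'))
  );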

-- 
Bill Moran



Re: How to find the hits on the databases and tables in Postgres

2018-05-05 Thread PT
On Fri, 4 May 2018 17:14:39 +0530
nikhil raj  wrote:

>  Hi,
> Can anyone please help me out?
> 
> How can I monitor the hits on a database, and how many hits on each user
> table, through a query?
> Are there any other tools that can fulfill this requirement?

pgBadger has always been my goto tool for that: 
https://github.com/dalibo/pgbadger
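
As an aside, if rough per-database and per-table activity counters are enough,
the built-in statistics views can be queried directly; a minimal sketch:

  -- Cumulative per-database activity since the last statistics reset.
  SELECT datname, xact_commit, xact_rollback, blks_hit, blks_read
  FROM pg_stat_database;

  -- Per-table scan and modification counters for user tables.
  SELECT schemaname, relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables
  ORDER BY seq_scan + coalesce(idx_scan, 0) DESC;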

-- 
Bill Moran 



Re: Out of Memory

2018-09-25 Thread PT
On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
greigwise  wrote:

> Well, I've been unsuccessful so far on creating a standalone test.   
> 
> I have put some scripting in place to capture some additional information on
> the server with the out of memory issues.   I have a script which just
> periodically dumps the output of free -m to a text file.  
> 
> So, the output of free -m immediately before and after the out of memory
> error looks like this:
> 
> Just before:
>                total    used    free  shared  buff/cache  available
> Mem:           14877    2978     132    3553       11766       7943
> Swap:              0       0       0
> 
> Just after:
>                total    used    free  shared  buff/cache  available
> Mem:           14877    2946     649    3548       11280       7982
> Swap:              0       0       0
> 
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Doesn't the default NUMA setup mean that it can't actually allocate all
the available memory to a single NUMA zone (or whatever it's called)?

Or am I talking ancient Linux history with that?

-- 
Bill Moran



Re: Adding an extra boolean column to "information_schema.columns" or "pg_attribute"

2017-12-30 Thread PT
On Sat, 30 Dec 2017 20:33:28 +0100
GPT  wrote:

> Please visit the following link:
> 
> https://stackoverflow.com/q/48028501/8895614
> 
> but the issue may go further/deeper than it seems.

Adding columns to a built-in system table is liable to cause all sorts of
problems in the future: when you upgrade, migrate to another server, need to
create dev environments, etc.

The view technique described in the stackunderwhelm article seems like
a good solution to me. Maybe you could explain why you are against it?
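
A hedged sketch of the general idea (not necessarily the exact approach from
the linked answer): keep the extra per-column flag in a metadata table of your
own and expose it alongside the catalog through a view. The names
column_metadata, my_flag and columns_with_flag are made up:

  CREATE TABLE column_metadata (
      table_schema text NOT NULL,
      table_name   text NOT NULL,
      column_name  text NOT NULL,
      my_flag      boolean NOT NULL DEFAULT false,
      PRIMARY KEY (table_schema, table_name, column_name)
  );

  CREATE VIEW columns_with_flag AS
  SELECT c.table_schema,
         c.table_name,
         c.column_name,
         c.data_type,
         coalesce(m.my_flag, false) AS my_flag
  FROM information_schema.columns c
  LEFT JOIN column_metadata m
         ON  m.table_schema = c.table_schema
        AND  m.table_name   = c.table_name
        AND  m.column_name  = c.column_name;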

> Krds
> 
> On 12/30/17, Karsten Hilbert  wrote:
> > On Sat, Dec 30, 2017 at 08:17:34PM +0100, GPT wrote:
> >
> >> I would like to add an extra boolean attribute to table columns,
> >> something like NULL. Unfortunately Pg does not support such a feature:
> >>
> >> ADD ATTRIBUTE  TYPE  TO COLUMN
> >> ;
> > ...
> >> I have already been suggested to use VIEW or dynamic SQL but I am
> >> looking something different.
> >
> > Can you explain why ?
> >
> > One thing that comes to mind is complex user types.
> >
> > Or columns of type table.
> >
> > Karsten
> > --
> > GPG key ID E4071346 @ eu.pool.sks-keyservers.net
> > E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
> >
> >
> 


-- 
Bill Moran



Re: best way to storing logs

2018-01-29 Thread PT
On Tue, 30 Jan 2018 00:38:02 +0300
Ibrahim Edib Kokdemir  wrote:

> Hi,
> In our environment, we are logging "all" statements because of security
> considerations (future auditing if necessary). But the system is very big
> and produces 100GB of logs per hour, and we expect this to become much
> more. We are having trouble finding the disk space for this amount of data.
> 
> Now, we are considering one of the following paths:
> - deduped file system for all logs.
> - parsing useful lines with a syslog server or pgaudit. And are there any
> drawbacks to using remote syslog for the logs?
> 
> What should be the right path for that?

I did something like this quite recently. Here's what I learned:

rsyslog works fine. It can handle quite a bit of traffic with no problem.
There is theoretical data loss if you use UDP, so if you need true
auditability, be sure to use TCP. We had a single dedicated logging
server receiving TONS of log data from multiple Postgres servers and
never had a problem with performance.
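
For reference, the Postgres side of that setup can be configured from SQL; a
minimal sketch, assuming the local syslog daemon relays to the central rsyslog
server and that the facility/ident values match your rsyslog configuration:

  -- Send the server log to syslog instead of (or in addition to) stderr.
  ALTER SYSTEM SET log_destination = 'syslog';
  ALTER SYSTEM SET syslog_facility = 'local0';
  ALTER SYSTEM SET syslog_ident    = 'postgres';
  -- Log every statement, as the original poster requires for auditing.
  ALTER SYSTEM SET log_statement   = 'all';
  SELECT pg_reload_conf();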

Rotate the logs often. Many smaller logs are easier to manage than fewer
large ones. We were rotating once an hour, and the typical size
was tens of G to about 100G per hour.

Compress older logs. Everyone rants about these newer compression algorithms,
like bzip2, but remember that they save a little more space at the cost of
a lot more processing time. gzip is still the way to go, if you ask me, but
YMMV.

After logs got a week old we shipped them off to AWS Glacier storage.
It was a pretty cost effective way to offload the problem of ever-increasing
storage requirements. YMMV on that as well.

We didn't bother to trim out any data from the logs, so I can't say how
much that would or wouldn't help.

-- 
Bill Moran



Re: Can parallel vacuum commands lead to a lock in Postgres 10.2

2018-02-15 Thread PT
On Thu, 15 Feb 2018 17:40:48 +0100
Meikel Bisping  wrote:

> Hello,
> 
> we have lots of scripts which issue individual vacuum commands to tables 
> like "vacuum full gxstage_bs" but also general "vaccum full" commands.
> Until now these scripts run sequentially and there are no problems.
> Now the idea is to run some scripts in parallel; my question is whether
> parallel vacuum commands can lead to a lock in 10.2.
> Unfortunately it isn't easy to change the scripts and rely on 
> auto-vacuum since they are installed on lots of customers' hosts and 
> installing updates is very complex.

They lock already. Running in parallel isn't going to lead to any
new locks.

I don't think that's what you're asking, though. I'm guessing that
what you're asking is whether you're going to experience deadlocks or other
types of unresolvable lock scenarios.

By design, no. In my experience, no.

In the real world ... it's always possible that you will uncover some
edge-case bug that leads to an unresolvable deadlock. I would rate that
as pretty unlikely, though. Locking is a pretty fundamental part of
Postgres that gets a lot of testing.
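
If you want to watch the locking for yourself while the parallel scripts run,
a quick sketch against the system views shows any sessions currently waiting
on a lock:

  SELECT a.pid, a.state, l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE NOT l.granted;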

-- 
Bill Moran



Re: query's performance

2018-02-15 Thread PT
On Thu, 15 Feb 2018 22:43:59 +0100
hmidi slim  wrote:

> Hi,
> I have a table establishment which contains these columns: id, name,
> longitude, latitude.
> I want to select all the establishments in the table establishment within a
> radius from a reference point  which is a given establishment.
> For example:
> I have a given establishment called establishment1 and has an id=1.
> I want to display all the establishments within a radius 1km from this
> establishment.
> I wrote this query:
> select e.name,e1.name, e1.longitude, e1.latitude
> from establishment as e, establishment as e1
> where e.id = 1
> and e.id <> e1.id
> and ST_DWithin(ST_SetSRID(ST_MakePoint(e1.longitude, e1.latitude),
> 4326),ST_MakePoint(e.longitude, e.latitude)::geography, 1000) ;
> 
> Is it good practice to iterate over the rows of the table like I did, and
> will it affect performance as the number of rows in the table
> increases?

If performance is a concern, don't use long/lat in 2 columns. Instead use a
GEOGRAPHY type to store the point. You can then index on that GEOGRAPHY
column and queries against it will be quite fast.

The way you have it now is going to degrade in performance linearly with
the number of rows in the table. With the proper types and indexes, performance
degradation will be considerably less.
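
A hedged sketch of that setup, reusing the table and column names from the
question (the geography column name "geog" is made up):

  -- Store the point once as geography and index it.
  ALTER TABLE establishment
      ADD COLUMN geog geography(Point, 4326);

  UPDATE establishment
  SET geog = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography;

  CREATE INDEX establishment_geog_idx
      ON establishment USING GIST (geog);

  -- The radius search can then use the index on geog.
  SELECT e1.name
  FROM establishment e
  JOIN establishment e1 ON e1.id <> e.id
  WHERE e.id = 1
    AND ST_DWithin(e1.geog, e.geog, 1000);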

-- 
Bill Moran



Re: query performance

2018-02-20 Thread PT
On Sun, 18 Feb 2018 00:35:18 +0100
hmidi slim  wrote:

> Hi,
> I have two tables: establishment which contains these columns: id, name,
> longitude, latitude, geom (Geometric column)
> Product contains: id, name, establishment_id
> First of all I want to select the establishment within a radius.
> I run this query:
> select e.name, e1.name
> from establishment as e, establishment as e1
> where e.id <> e1.id
> and e1.id = 1
> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
> 1000)
> 
> The result of this query returns all the establishments within a radius of 1KM
> from a given establishment which has an id = 1.
> 
> After that I want to get the product's name of each establishment from the
> query's result.
> 
> Is there another, more optimized way to make a query such as this:
> select * from (
> select e.name, e1.name, e.id
> from establishment as e, establishment as e1
> where e.id <> e1.id
> and e1.id = 1
> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
> 1000)) as tmp inner join product as p on p.establishment_id = tmp.id

Did you not see my earlier response?

-- 
Bill Moran



Re: Performance issues during backup

2018-02-22 Thread PT
On Fri, 23 Feb 2018 01:46:22 +
Dylan Luong  wrote:

> Hi
> Thanks for the rely.
> I have trialed the ionice -c 2 -n 7 tar…. change to our backup script and it 
> appears to have helped but not by much.

If "ionice -c 2 -n 7" helped some, maybe try "ionice -c 3" to set IO priority
to the lowest possible level.

However, I've used this technique many times on heavily loaded systems without
issue. If you're having to squirm this much to avoid problems, you probably
need to get better hardware or investigate the possibility that your
hardware is faulty or some other IO related issue.

> The affected queries are mostly update/delete/insert queries. Could
> pg_start_backup be causing locking of some sort?

Not in my experience. And the fact that they are write queries having trouble
makes me theorize that you're saturating the write capacity of your disks.

> From: Rene Romero Benavides [mailto:rene.romer...@gmail.com]
> Sent: Wednesday, 21 February 2018 1:37 AM
> To: Laurenz Albe 
> Cc: Dylan Luong ; pgsql-general@lists.postgresql.org
> Subject: Re: Performance issues during backup
> 
> What about sending the backup to a different server? through ssh / rsync or 
> something, that would save lots of IO activity
> 
> 2018-02-20 2:02 GMT-06:00 Laurenz Albe <laurenz.a...@cybertec.at>:
> Dylan Luong wrote:
> > We perform nightly base backups of our production PostgreSQL instance. We
> > have a script that basically puts the instance
> > into backup mode and then backs up (tar) the /Data directory and then takes
> > it out of backup mode.
> > Ie,
> > psql -c "SELECT pg_start_backup('${DATE}');"
> > tar -cvf - ${DATA_DIR} --exclude ${DATA_DIR}/pg_log | split -d -b 
> > $TAR_SPLIT_SIZE - ${BACKUP_DIR}/${BACKUP_NAME}
> > psql -c "SELECT pg_stop_backup();"
> >
> > The size of our database is about 250GB and it usually takes about 1 hour 
> > to backup.
> > During this time, we have performance issue where queries can take up to 
> > 15secs to return where normally it takes 2 to 3 seconds.
> > During this time (1:30am) usage is low (less than 10 users) on the system.
> >
> > Has anyone experienced the same problem, and are there any suggestions on where to look
> > to resolve the problem?
> 
> The "tar" is probably taking up too much I/O bandwidth.
> 
> Assuming this is Linux, you could run it with
> 
>   ionice -c 2 -n 7 tar ...
> 
> or
> 
>   ionice -c 3 tar ...
> 
> Of course then you can expect the backup to take more time.
> 
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
> 
> 
> 
> --
> Genius is 1% inspiration and 99% perspiration.
> Thomas Alva Edison
> http://pglearn.blogspot.mx/
> 


-- 
Bill Moran



Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

2018-02-23 Thread PT
On Fri, 23 Feb 2018 17:10:56 +1300
David Rowley  wrote:

> On 23 February 2018 at 04:00, Bill Moran  wrote:
> > 2) The negative impact of an unused index is tiny compared to the negative
> >  impact of a missing index.
> 
> I'd say that would entirely depend on the workload of the table and
> the entire cluster. Disk space and I/O to write WAL and index pages to
> is surely a finite resource. Not to mention the possibility of
> disallowing HOT-Updates in the heap.

I feel like you missed my point. You're absolutely right, but the
disagreement was not on whether or not an unused index could cause
problems, but on the likelihood that the OP was going to build the
simulation code to actually determine whether the index is needed
or not. Information from the original question led me to believe
that simulation was either beyond his skill level or beyond his
time allocation; so I provided a less good, but more likely to be
helpful (in my opinion) answer.

The pattern that almost ALL new ventures I've seen follow is:
1) Hack something together based on an idea for a product
2) If the product actually succeeds, experience tons of issues
 related to scaling and performance
3) Run around like a crazy person fixing all the scaling and
 performance issues
4) Sell the company to someone else who ultimately becomes responsible
 for maturing the software

In theory, there's no reason this pattern _has_ to be so prevalent,
yet it is. Probably because it appears to minimize the up-front cost,
which the people footing the bill just love.

Given that process, "shotgun" indexing is part of step 1 or step 3.
Whereas the building of load simulations and _real_ tuning of the
system is relegated to step 4.

Since investors tend to want to get out quickly if #2 isn't going to
happen, they don't want people doing work that they consider part
of step #4.

I'm a pragmatist. I'd love to see everyone build software in a sane,
well-considered manner. I'd also love to see government without
corruption. However, in the world I _actually_ see, those things aren't
prevalent.

> It would seem to me that anyone using the "shotgun" indexing method
> may end up having to learn more about indexing the hard way.  Learning
> the hard way is something I like to try to avoid, personally. Probably
> it all has to come down to how important it is that your app actually
> can handle the load vs devs/dba experience level divided by time, both
> of the latter two are also a finite resource. So, it probably all has
> to be carefully balanced and quite possibly a person's opinion
> strongly relates to their experience.  If you were in the air traffic
> control business, perhaps your opinion might not be the same!? ...
> Sorry, the aeroplane crashed because the replica instance lagged and
> the plane's location wasn't updated... Analysis shows that the DBA
> indexed every column in the table and the WAL volume was more than the
> network's bandwidth could handle over the holiday period.  (Note: I
> know nothing about air traffic control, but it does seem like
> something you'd want to make stable systems for, games on the
> internet, probably less so...).

I really hope that people writing ATC software have the experience to
do really good testing (including load simulation, etc.), but the Three Mile
Island accident happened partially because of faulty sensor design, so
there's no guarantee that's the case.

Interesting discussion.

-- 
Bill Moran