Re: Using enum instead of join tables
On Tue, 10 Apr 2018 11:24:49 +0100 hmidi slim wrote:

> Hi,
> Is it a good thing to use enum type such a column in a table instead of
> making a foreign key which references to another table?
> I found these links talking about enum and when I will use them:
> http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/
> https://www.percona.com/blog/2008/01/24/enum-fields-vs-varchar-vs-int-joined-table-what-is-faster/
>
> Is there any advices or new updates to use enum type in order to decrease
> the number of join between the table and optimize the performance and query
> runtime?

Performance isn't always the only issue.

Enums work well if you're 100% sure that the values will never change. While
it's not impossible to change them, it's essentially a schema change, which
can be tricky to do on an active database, due to locking.

A separate lookup table referenced by a foreign key has the advantage of being
easy to change, but with the extra join required to get the text
representation. A lookup table keyed on a text value doesn't require the join
(the referencing column holds the text itself), but takes up more space and
requires a cascading change if you need to change a value. Adding new values
is pretty easy, though.

Another option is a text field with a check constraint to ensure the data in
it stays valid. This is somewhere in between: changing the check constraint is
easier than with an enum, but harder than with a lookup table. It doesn't
require a join to get the text representation of the value, but takes up more
space (depending on the length of the text for each value).

So you have to balance the requirements of your use case to decide which
method is best.

--
Bill Moran
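For illustration, a rough sketch of the three approaches described above; the
type, table, and value names are invented:

-- 1) enum: compact and join-free, but changing the value list is a schema change
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'cancelled');
CREATE TABLE orders_enum (
    id     serial PRIMARY KEY,
    status order_status NOT NULL
);

-- 2) lookup table keyed on the text value: no join needed to read the value,
--    and a rename cascades to the referencing rows
CREATE TABLE statuses (
    status text PRIMARY KEY
);
CREATE TABLE orders_fk (
    id     serial PRIMARY KEY,
    status text NOT NULL REFERENCES statuses (status) ON UPDATE CASCADE
);

-- 3) text column with a check constraint: no join, and the constraint can be
--    dropped and re-created to change the allowed values
CREATE TABLE orders_check (
    id     serial PRIMARY KEY,
    status text NOT NULL CHECK (status IN ('pending', 'shipped', 'cancelled'))
);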
Re: How to find the hits on the databases and tables in Postgres
On Fri, 4 May 2018 17:14:39 +0530 nikhil raj wrote:

> Hi,
> Any one can please help me out
>
> How to monitor the Hits on database and how many hits on each user tables
> Through query.
> Is there any other tools for that so it can full fill my requirement for it

pgBadger has always been my go-to tool for that:

https://github.com/dalibo/pgbadger

--
Bill Moran
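pgBadger works by parsing the server log, so the log has to contain enough
detail to be useful. A rough sketch of the settings it is usually pointed at
(these roughly follow pgBadger's documented recommendations; treat the exact
values as a starting point, since logging every statement duration is not
free):

-- Enable detailed logging for pgBadger to analyze
ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement with its duration
ALTER SYSTEM SET log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h ';
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();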
Re: Out of Memory
On Tue, 25 Sep 2018 11:34:19 -0700 (MST) greigwise wrote:

> Well, I've been unsuccessful so far on creating a standalone test.
>
> I have put some scripting in place to capture some additional information on
> the server with the out of memory issues. I have a script which just
> periodically dumps the output of free -m to a text file.
>
> So, the output of free -m immediately before and after the out of memory
> error looks like this:
>
> Just before:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2978         132        3553       11766        7943
> Swap:             0           0           0
>
> Just after:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2946         649        3548       11280        7982
> Swap:             0           0           0
>
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Doesn't the default NUMA setup mean that it can't actually allocate all the
available memory to a single NUMA zone (or whatever it's called)? Or am I
talking ancient Linux history with that?

--
Bill Moran
Re: Adding an extra boolean column to "information_schema.columns" or "pg_attribute"
On Sat, 30 Dec 2017 20:33:28 +0100 GPT wrote:

> Please visit the following link:
>
> https://stackoverflow.com/q/48028501/8895614
>
> but thing may go further/deeper than it seems.

Adding columns to a built-in system table is liable to cause all sorts of
problems in the future: when you upgrade, migrate to another server, need to
create dev environments, etc.

The view technique described in the stackunderwhelm article seems like a good
solution to me. Maybe if you could explain why you are against it?

> Krds
>
> On 12/30/17, Karsten Hilbert wrote:
> > On Sat, Dec 30, 2017 at 08:17:34PM +0100, GPT wrote:
> >
> >> I would like to add an extra boolean attribute to table columns,
> >> something like NULL. Unfortunately Pg does not support such a feature:
> >>
> >> ADD ATTRIBUTE TYPE TO COLUMN
> >> ;
> > ...
> >> I have already been suggested to use VIEW or dynamic SQL but I am
> >> looking something different.
> >
> > Can you explain why ?
> >
> > One thing that comes to mind is complex user types.
> >
> > Or columns of type table.
> >
> > Karsten
> > --
> > GPG key ID E4071346 @ eu.pool.sks-keyservers.net
> > E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

--
Bill Moran
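To make the suggestion concrete, one way the view technique might look,
assuming a small side table that carries the extra per-column flag (all object
names below are invented):

-- Hypothetical side table holding the extra boolean attribute per column
CREATE TABLE column_flags (
    table_schema text,
    table_name   text,
    column_name  text,
    is_special   boolean NOT NULL DEFAULT false,
    PRIMARY KEY (table_schema, table_name, column_name)
);

-- View exposing information_schema.columns plus the extra attribute,
-- without modifying any system catalog
CREATE VIEW columns_with_flags AS
SELECT c.*, COALESCE(f.is_special, false) AS is_special
FROM information_schema.columns c
LEFT JOIN column_flags f
       ON  f.table_schema = c.table_schema::text
       AND f.table_name   = c.table_name::text
       AND f.column_name  = c.column_name::text;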
Re: best way to storing logs
On Tue, 30 Jan 2018 00:38:02 +0300 Ibrahim Edib Kokdemir wrote:

> Hi,
> In our environment, we are logging "all" statements because of the security
> considerations (future auditing if necessary). But the system is very big
> and produces 100GB logs for an hour and we expect that this will be much
> more. We are having trouble to find the disk for this amount of data.
>
> Now, we are considering one of the following paths:
> - deduped file system for all logs.
> - parsing useful lines with syslog server or pgaudit. And are there any
> drawbacks for using remote syslog for the logs?
>
> What should be the right path for that?

I did something like this quite recently. Here's what I learned:

rsyslog works fine. It can handle quite a bit of traffic with no problem.
There is theoretical data loss if you use UDP, so if you need true
auditability, be sure to use TCP. We had a single dedicated logging server
receiving TONS of log data from multiple Postgres servers and never had a
problem with performance.

Rotate the logs often. Many smaller logs are easier to manage than fewer large
ones. We were rotating once an hour, and the typical size was tens of GB to
about 100GB per hour.

Compress older logs. Everyone raves about these newer compression algorithms,
like bzip2, but remember that they save a little more space at the cost of a
lot more processing time. gzip is still the way to go, if you ask me, but
YMMV.

After logs got a week old we shipped them off to AWS Glacier storage. It was a
pretty cost-effective way to offload the problem of ever-increasing storage
requirements. YMMV on that as well.

We didn't bother to trim out any data from the logs, so I can't say how much
that would or wouldn't help.

--
Bill Moran
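On the remote syslog side, the Postgres half of the setup is just a handful of
settings; a rough sketch (the facility and ident values are arbitrary, and the
rsyslog forwarding configuration itself is separate):

-- Send the server log to syslog, which rsyslog can then forward over TCP
ALTER SYSTEM SET log_destination = 'syslog';
ALTER SYSTEM SET syslog_facility = 'local0';
ALTER SYSTEM SET syslog_ident = 'postgres';
ALTER SYSTEM SET log_statement = 'all';  -- the "log everything" requirement from the question
SELECT pg_reload_conf();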
Re: Can parallel vacuum commands lead to a lock in Postgres 10.2
On Thu, 15 Feb 2018 17:40:48 +0100 Meikel Bisping wrote:

> Hello,
>
> we have lots of scripts which issue individual vacuum commands to tables
> like "vacuum full gxstage_bs" but also general "vaccum full" commands.
> Until now these scripts run sequentially and there are no problems.
> Now the idea is to run some scripts in parallel, my question is if
> parallel vacuum commands can lead to a lock in 10.2?
> Unfortunately it isn't easy to change the scripts and rely on
> auto-vacuum since they are installed on lots of customers' hosts and
> installing updates is very complex.

They lock already. Running in parallel isn't going to lead to any new locks.

I don't think that's what you're asking, though. I'm guessing that what you're
asking is: are you going to experience deadlocks or other types of
unresolvable lock scenarios?

By design, no. In my experience, no. In the real world ... it's always
possible that you will uncover some edge-case bug that leads to an
unresolvable deadlock. I would rate that as pretty unlikely, though. Locking
is a pretty fundamental part of Postgres that gets a lot of testing.

--
Bill Moran
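If you want to see for yourself what the scripts hold while they run, watching
pg_locks from another session is a quick way to do it; a rough sketch (the
table name in the comment is just the one from the question):

-- Show locks held or awaited by any running VACUUM, e.g. "vacuum full gxstage_bs"
SELECT a.pid, a.query, l.relation::regclass AS relation, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE a.query ILIKE 'vacuum%'
ORDER BY a.pid, l.relation;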
Re: query's performance
On Thu, 15 Feb 2018 22:43:59 +0100 hmidi slim wrote:

> Hi,
> I have a table establishment which contains these columns: id, name,
> longitude, latitude.
> I want to select all the establishments in the table establishment within a
> radius from a reference point which is a given establishment.
> For example:
> I have a given establishment called establishment1 and has an id=1.
> I want to display all the establishments within a radius 1km from this
> establishment.
> I wrote this query:
> select e.name,e1.name, e1.longitude, e1.latitude
> from establishment as e, establishment as e1
> where e.id = 1
> and e.id <> e1.id
> and ST_DWithin(ST_SetSRID(ST_MakePoint(e1.longitude, e1.latitude),
> 4326),ST_MakePoint(e.longitude, e.latitude)::geography, 1000) ;
>
> Is it a good practice to iterate the rows of the table like I did and does
> not have an effect in the performance if the number of table's rows
> increase?

If performance is a concern, don't use long/lat in 2 columns. Instead use a
GEOGRAPHY type to store the point. You can then index on that GEOGRAPHY column
and queries against it will be quite fast.

The way you have it now is going to degrade in performance linearly with the
number of rows in the table. With the proper types and indexes, performance
degradation will be considerably less.

--
Bill Moran
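A rough sketch of what that change might look like (the column and index names
are invented):

-- Store the point once as geography and index it
ALTER TABLE establishment ADD COLUMN geog geography(Point, 4326);
UPDATE establishment
   SET geog = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography;
CREATE INDEX establishment_geog_idx ON establishment USING gist (geog);

-- The radius search can then use the index instead of building points per row
SELECT e1.name
FROM establishment e
JOIN establishment e1 ON e1.id <> e.id
WHERE e.id = 1
  AND ST_DWithin(e1.geog, e.geog, 1000);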
Re: query performance
On Sun, 18 Feb 2018 00:35:18 +0100 hmidi slim wrote:

> Hi,
> I have two tables: establishment which contains these columns: id, name,
> longitude, latitude, geom (Geometric column)
> Product contains: id, name, establishment_id
> First of all I want to select the establishment within a radius.
> I run this query:
> select e.name, e1.name
> from establishment as e, establishment as e1
> where e.id <> e1.id
> and e1.id = 1
> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
> 1000)
>
> The result of this query returns all the establishment within a radius 1KM
> from from a given establishment which has an id = 1.
>
> After that I want to get the product's name of each establishment from the
> query's result.
>
> Is there an other optimized solution to make a query such this:
> select * from (
> select e.name, e1.name, e.id
> from establishment as e, establishment as e1
> where e.id <> e1.id
> and e1.id = 1
> and ST_DWithin(geom, ST_MakePoint(e1.longitude, e1.latitude)::geography,
> 1000)) as tmp inner join product as p on p.establishment_id = tmp.id

Did you not see my earlier response?

--
Bill Moran
Re: Performance issues during backup
On Fri, 23 Feb 2018 01:46:22 + Dylan Luong wrote:

> Hi
> Thanks for the rely.
> I have trialed the ionice -c 2 -n 7 tar…. change to our backup script and it
> appears to have helped but not by much.

If "ionice -c 2 -n 7" helped some, maybe try "ionice -c 3" to set IO priority
to the lowest possible level. However, I've used this technique many times on
heavily loaded systems without issue. If you're having to squirm this much to
avoid problems, you probably need to get better hardware, or investigate the
possibility that your hardware is faulty or there is some other IO-related
issue.

> The affected queries are more of the update/delete/insert queries. Could
> pg_start_backup be causing locking of some sort.

Not in my experience. And the fact that they are write queries having trouble
makes me theorize that you're saturating the write capacity of your disks.

> From: Rene Romero Benavides [mailto:rene.romer...@gmail.com]
> Sent: Wednesday, 21 February 2018 1:37 AM
> To: Laurenz Albe
> Cc: Dylan Luong; pgsql-general@lists.postgresql.org
> Subject: Re: Performance issues during backup
>
> What about sending the backup to a different server? through ssh / rsync or
> something, that would save lots of IO activity
>
> 2018-02-20 2:02 GMT-06:00 Laurenz Albe <laurenz.a...@cybertec.at>:
> Dylan Luong wrote:
> > We perform nighty base backup of our production PostgreSQL instance. We
> > have a script that basically puts the instance
> > into back mode and then backs up (tar) the /Data directory and then takes
> > it out of backup mode.
> > Ie,
> > psql -c "SELECT pg_start_backup('${DATE}');"
> > tar -cvf - ${DATA_DIR} --exclude ${DATA_DIR}/pg_log | split -d -b
> > $TAR_SPLIT_SIZE - ${BACKUP_DIR}/${BACKUP_NAME}
> > psql -c "SELECT pg_stop_backup();"
> >
> > The size of our database is about 250GB and it usually takes about 1 hour
> > to backup.
> > During this time, we have performance issue where queries can take up to
> > 15secs to return where normally it takes 2 to 3 seconds.
> > During this time (1:30am) usage is low (less than 10 users) on the system.
> >
> > Has anyone experience the same problem and any suggestions where to look at
> > to resolve the problem?
>
> The "tar" is probably taking up too much I/O bandwidth.
>
> Assuming this is Linux, you could run it with
>
> ionice -c 2 -n 7 tar ...
>
> or
>
> ionice -c 3 tar ...
>
> Of course then you can expect the backup to take more time.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
> --
> Genius is 1% inspiration and 99% perspiration.
> Thomas Alva Edison
> http://pglearn.blogspot.mx/

--
Bill Moran
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
On Fri, 23 Feb 2018 17:10:56 +1300 David Rowley wrote:

> On 23 February 2018 at 04:00, Bill Moran wrote:
> > 2) The negative impact of an unused index is tiny compared to the negative
> > impact of a missing index.
>
> I'd say that would entirely depend on the workload of the table and
> the entire cluster. Disk space and I/O to write WAL and index pages to
> is surely a finite resource. Not to mention the possibility of
> disallowing HOT-Updates in the heap.

I feel like you missed my point. You're absolutely right, but the disagreement
was not on whether or not an unused index could cause problems, but on the
likelihood that the OP was going to build the simulation code to actually
determine whether the index is needed or not. Information from the original
question led me to believe that simulation was either beyond his skill level
or beyond his time allocation; so I provided a less good, but more likely to
be helpful (in my opinion) answer.

The pattern that almost ALL new ventures I've seen follow is:

1) Hack something together based on an idea for a product
2) If the product actually succeeds, experience tons of issues related to
   scaling and performance
3) Run around like a crazy person fixing all the scaling and performance
   issues
4) Sell the company to someone else who ultimately becomes responsible for
   maturing the software

In theory, there's no reason this pattern _has_ to be so prevalent, yet it is.
Probably because it appears to minimize the up-front cost, which the people
footing the bill just love.

Given that process, "shotgun" indexing is part of step 1 or step 3, whereas
the building of load simulations and _real_ tuning of the system is relegated
to step 4. Since investors tend to want to get out quick if #2 isn't going to
happen, they don't want people doing work that they consider part of step #4.

I'm a pragmatist. I'd love to see everyone build software in a sane,
well-considered manner. I'd also love to see government without corruption.
However, in the world I _actually_ see, those things aren't prevalent.

> It would seem to me that anyone using the "shotgun" indexing method
> may end up having to learn more about indexing the hard way. Learning
> the hard way is something I like to try to avoid, personally. Probably
> it all has to come down to how important it is that your app actually
> can handle the load vs devs/dba experience level divided by time, both
> of the latter two are also a finite resource. So, it probably all has
> to be carefully balanced and quite possibly a person's opinion
> strongly relates to their experience. If you were in the air traffic
> control business, perhaps your opinion might not be the same!? ...
> Sorry, the aeroplane crashed because the replica instance lagged and
> the plane's location wasn't updated... Analysis shows that the DBA
> indexed every column in the table and the WAL volume was more than the
> network's bandwidth could handle over the holiday period. (Note: I
> know nothing about air traffic control, but it does seem like
> something you'd want to make stable systems for, games on the
> internet, probably less so...).

I really hope that people writing ATC software have the experience to do
really good testing (including load simulation, etc) but the Three Mile Island
accident happened partially because of faulty sensor design, so there's no
guarantee that's the case.

Interesting discussion.

--
Bill Moran