Re: Version upgrade: is restoring the postgres database needed?
On Thu, Mar 1, 2018 at 11:15 AM, Ron Johnson wrote:
> No, I do:
>
> $ pg_dump -Fc PROD > PROD.pgdump
> $ pg_dumpall --globals-only > globals.sql
> $ pg_dump -Fc postgres > postgres.pgdump
>
That's how I back them up as well. You are correct that all you need to do is restore globals.sql, then each "pgdump" file individually. Just ignore the warning when it tries to restore your initial postgres superuser, since that role was already created by initdb. You probably don't need the "postgres" db at all, since it is just there to allow the client to connect to something on initial install. Normally you don't use it in production.
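Roughly, the restore side of that recipe looks like this on the new cluster (database and file names are the ones from the example above; adjust to taste):

$ psql -U postgres -f globals.sql postgres      # roles and tablespaces; ignore the error on the existing superuser
$ createdb -U postgres PROD
$ pg_restore -U postgres -d PROD PROD.pgdump

pg_restore can also take -j N to restore in parallel if the server has spare I/O.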
Re: Version upgrade: is restoring the postgres database needed?
On Thu, Mar 1, 2018 at 11:51 AM, Ron Johnson wrote:
> Good. What, then, have I forgotten to restore such that the "Access
> privileges" are showing on my current 9.2 servers, but not on the
> newly-restored 9.6.6 server?
>
> *Current*
> postgres=# \l
>                                    List of databases
>     Name     | Owner | Encoding |   Collate   |    Ctype    | Access privileges
> -------------+-------+----------+-------------+-------------+-------------------
>  CSSCAT_STI  | CSS   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | CSS=CTc/CSS      +
>              |       |          |             |             | =Tc/CSS          +
>              |       |          |             |             | app_user=CTc/CSS
>  CSSCAT_STIB | CSS   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | CSS=CTc/CSS      +
>              |       |          |             |             | =Tc/CSS          +
>              |       |          |             |             | app_user=CTc/CSS
>  CSSCAT_STIC | CSS   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | CSS=CTc/CSS      +
>              |       |          |             |             | =Tc/CSS          +
>              |       |          |             |             | app_user=CTc/CSS
>
> *Newly restored*
> postgres=# \l
>                                    List of databases
>     Name     |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges
> -------------+----------+----------+-------------+-------------+-------------------
>  CSSCAT_STIB | CSS      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>  CSSCAT_STIC | CSS      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>  postgres    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>
FWIW none of my databases other than template0 and template1 have anything listed for Access privileges like that, and I'm not even sure exactly what those are for :( Any privileges from REVOKEs and GRANTs on objects inside a database will be in that database's dump, so those are restored. The column shown by \l, though, is database-level privileges; those are emitted by a full pg_dumpall rather than by a per-database pg_dump or the globals-only dump, so you would need to re-issue those GRANTs by hand.
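If you do want to recreate by hand what the 9.2 server shows, the ACL shorthand decodes as C=CREATE, T=TEMPORARY, c=CONNECT, so "CSS=CTc/CSS", "=Tc/CSS", "app_user=CTc/CSS" means the owner has create/temp/connect, PUBLIC has temp/connect, and app_user has create/temp/connect. An untested sketch for one of the databases:

GRANT CREATE, TEMPORARY, CONNECT ON DATABASE "CSSCAT_STI" TO "CSS";
GRANT TEMPORARY, CONNECT ON DATABASE "CSSCAT_STI" TO PUBLIC;
GRANT CREATE, TEMPORARY, CONNECT ON DATABASE "CSSCAT_STI" TO app_user;

Repeat for the other databases and \l should show the same Access privileges column.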
Re: Is there a continuous backup for pg ?
On Fri, Mar 2, 2018 at 2:34 PM, Gary M wrote: > Thanks Josh, > > PITR is the option I was recommending. Project management diligence > dictates I need at least one other option from a different recommending > source, two other options optimally. > A file system snapshot on a robust system like ZFS: you can roll back to any snapshot almost trivially, and view any snapshot trivially.
Re: Jira database won't start after disk filled up
On Fri, Mar 2, 2018 at 4:32 PM, Paul Costello wrote: > I have a database that wouldn't start due to the disk filling up back on > 1/10, unbeknownst to us until 2/27. This is jira, so it's critical data. > It appears jira was running in memory that entire time. > Those first two sentences seem contradictory... > > I needed to run pg_resetxlog -f in order to start the database. It > started, but upon logging in I found the system catalog and some data to be > corrupt. > Once you did this, fixing the data is really on you. Postgres has no way to know what any of the data mean, nor how to decide what to keep and what to toss on those conflicting rows with duplicate keys. What I'd personally do is take your 1/5 backup, then merge in rows for tickets and affiliated data from whatever you can recover in the current database copy you have. Once that's done, run jira's built-in integrity checker then do a full export to XML backup format. Finally re-import that into a fresh jira so you know what's in there is consistent. You'll probably also have to cross-reference the attachments directory for missing tickets and clean up those files (or synthesize tickets for them). If your jira is configured to send email somewhere on ticket updates, gathering those (even if it is in multiple people's mailboxes) and recreating ticket info from them would also move you along. You will lose some of your data because not all of it was written to disk.
Re: Jira database won't start after disk filled up
On Fri, Mar 2, 2018 at 5:13 PM, Paul Costello wrote: > My hope is that I can get the db back to 1/10 and maybe we can, with > Atlassian's help, somehow sync the lucene files back to the db. I don't > think I will have any postgres data to work with beyond 1/10. > > Does this still sound do-able with that kind of data gap? > > I'm not sure how the incremental updates to the lucene indexes work with Jira. If they are parallel to writing to the DB maybe you can recover some info there; if they are trickled out asynchronously after writing to the DB by an index process that reads back the DB, then I'd expect there to be no additional info there. Perhaps the best you can do is get Jira to run its integrity checker on the current data and fix whatever it tells you to fix. I think Atlassian will know best.
Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Hi Keith, Not sure if this will help but a couple of years ago I migrated from an SQL_ASCII encoding to UTF8. The data was primarily ASCII with some Windows garbage, and a little bit of UTF8 from customers filling out forms that had no declared encoding. I wrote a utility that in-place scans and updates the tables in your SQL_ASCII-encoded database and ensures that everything is 100% UTF8 NFC at the end. For us, there were some characters in some bizarre local encodings, and we had to either toss them or make educated guesses. After the cleaning, you dump with client encoding UTF8, then restore into the final database with UTF8 encoding. You can find it on my github along with documentation and tests to verify it works: https://github.com/khera/utf8-inline-cleaner On Mon, Apr 16, 2018 at 11:16 AM, Keith Fiske wrote: > Running into an issue with helping a client upgrade from 8.3 to 10 (yes, I > know, please keep the out of support comments to a minimum, thanks :). > > The old database was in SQL_ASCII and it needs to stay that way for now > unfortunately. The dump and restore itself works fine, but we're now > running into issues with some data returning encoding errors unless we > specifically set the client_encoding value to SQL_ASCII. > > Looking at the 8.3 database, it has the client_encoding value set to UTF8 > and queries seem to work fine. Is this just a bug in the old 8.3 not > enforcing encoding properly? > > The other thing I noticed on the 10 instance was that, while the LOCALE > was set to SQL_ASCII, the COLLATE and CTYPE values for the restored > databases were en_US.UTF-8. Could this be having an effect? Is there any > way to see what these values were on the old 8.3 database? The pg_database > catalog did not store these values back then. > > -- > Keith Fiske > Senior Database Engineer > Crunchy Data - http://crunchydata.com >
Re: Can PostgreSQL create new WAL files instead of reusing old ones?
On Wed, Apr 18, 2018 at 12:35 AM, Michael Paquier wrote: > > That looks like a rather difficult problem to solve in PostgreSQL > itself, as the operator running the cluster is in charge of setting up > the FS options which would control the COW behavior, so it seems to me > You cannot turn off CoW on ZFS. What other behavior are you referring to here? I suppose one could make a dedicated dataset for the WAL and have ZFS make a reservation for about 2x the total expected WAL size. It would require careful attention to detail if you increase the WAL segment settings, though, or if you had any kind of hiccup with streaming replication that caused the segments to stick around longer than expected (but that's no different from any other file system).
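A rough sketch of that dedicated dataset (the pool name, mount point, and 32G reservation are made-up placeholders; size it to roughly 2x your expected WAL volume):

# dedicated dataset for the WAL directory with a guaranteed space reservation
zfs create -o mountpoint=/var/db/postgres/data/pg_wal -o reservation=32G tank/pgwal

The reservation only guarantees the space exists; every write is still copy-on-write.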
Re: Doubts about replication..
You want the replication for backup purposes. What is the use case for your backups: Is it disaster recovery? Is it archiving historical data? Is it failover? Is it off-site backup? If you outline your needs then the proper solution can be offered. There are other methods than just using the built-in binary file replication. Based on your use of the words "as simple backup" it seems to me you would be better off just using pg_dump periodically to copy the database to the backup system. I would use the same version of pg_dump as the database for maximal compatibility on restoring to that version. As for your versions, all of the 9.4.x should be upgraded to the latest 9.4.x release there is. This is a simple upgrade and restart, and very safe to do. You will get many fixed bugs, some of which could cause data loss. Similarly for the 9.5.x release. On Thu, Apr 19, 2018 at 1:57 PM, Edmundo Robles wrote: > > > I have several versions of postgres 9.4.5, 9.4.4, 9.4.15 (3), 9.5.3 > in different versions of Debian 7.6, 7.8, 7.11, 8.5 and 8.6. > > I need to replicate the databases and I have clear that I must update all > to one version. > My main question is, Do you recommended me update to 9.6 or better update > to 10?. > > Actually, is not the goal have high availability . I will use replication > as simple backup. > For reasons of $$$ I can only have 1 server in which I will replicate the > 6 databases. > > Do you recommend using a postgres service for the 6 databases?, or better, > I think, I must run a postgres service in different ports, for each > database?. > > thanks in advance. > regards! > -- > >
Re: A couple of pg_dump questions
On Thu, Apr 19, 2018 at 6:39 PM, Ron wrote: > > $ pg_dump --host=farawaysrvr -Fc $REMOTEDB > /local/disk/backups/$REMOTEDB. > dump > > Is the data compressed on the remote server (thus minimizing traffic on > the wire), or locally? (I'd test this myself, but the company has really > strict firewall rules in place.) > > The compression is done locally, so the data crosses the wire uncompressed. If your network is slower than the compression, you could tunnel the connection through SSH with compression enabled. As for your parallel dump, you cannot do that to stdout with the "c" format. You have to give pg_dump a directory name to write to, as the directory format is the only one that supports parallel dumps.
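For example (host names and the -j 4 job count are placeholders; pick whatever the server can handle):

# compress on the wire by tunneling the connection through ssh
ssh -C -N -L 5433:localhost:5432 farawaysrvr &
pg_dump --host=localhost --port=5433 -Fc $REMOTEDB > /local/disk/backups/$REMOTEDB.dump

# parallel dump requires the directory format
pg_dump --host=farawaysrvr -Fd -j 4 -f /local/disk/backups/${REMOTEDB}.dir $REMOTEDB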
Re: Postgresql database encryption
On Fri, Apr 20, 2018 at 11:24 AM, Vikas Sharma wrote: > Hello Guys, > > Could someone throw light on the postgresql instance wide or database wide > encryption please? Is this possible in postgresql and been in use in > production?. > For anyone to offer a proper solution, you need to say what purpose your encryption will serve. Does the data need to be encrypted at rest? Does it need to be encrypted in memory? Does it need to be encrypted at the database level or at the application level? Do you need to be able to query the data? There are all sorts of scenarios and use cases, and you need to be more specific. For me, using whole-disk encryption solved my need, which was to ensure that the data on disk cannot be read once removed from the server. For certain fields in one table, I use application-level encryption so only the application itself can see the original data. Anyone else querying that table sees the encrypted blob, and it is not searchable.
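(My column encryption is done in the application, so the key never reaches the database. If you are willing to let the server see the key, pgcrypto is one way to sketch the same idea inside the database; the table and key below are hypothetical.)

CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE customer_secrets (id int PRIMARY KEY, secret bytea);
-- encrypt on the way in
INSERT INTO customer_secrets VALUES (1, pgp_sym_encrypt('sensitive text', 'not-a-real-key'));
-- decrypt on the way out; anyone without the key just sees the bytea blob
SELECT pgp_sym_decrypt(secret, 'not-a-real-key') FROM customer_secrets WHERE id = 1;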
Re: Backup Strategy Advise
On Tue, Apr 24, 2018 at 10:50 AM, David Gauthier wrote: > Typically, I would think doing a weekly full backup, daily incremental > backups and turn on journaling to capture what goes on since the last > backup. > This is almost the whole concept of the streaming replication built into postgres, except you are not applying the stream but archiving it. If you have atomic file system snapshots, you can implement this strategy by telling the database that a binary base backup is starting, snapshotting the file system, and then copying that snapshot off to another system (locally or off-site); meanwhile, you accumulate the WAL files just as you would for streaming replication. Once the copy is done, you can release the file system snapshot and continue to archive the WAL, much as you would ship it to a remote system to be applied. You just don't apply it until you need to do the recovery. Or just set up streaming replication to a hot-standby, because that's the right thing to do. For over a decade I did this with twin servers and slony1 replication. The cost of the duplicate hardware was nothing compared to not having downtime.
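A bare-bones sketch of that weekly base-backup step (ZFS dataset names and the backup host are placeholders, and this assumes archive_command is already set up to keep the WAL; pg_start_backup/pg_stop_backup are the pre-v15 names, renamed pg_backup_start/pg_backup_stop in newer releases):

psql -U postgres -c "SELECT pg_start_backup('weekly', true);"
zfs snapshot tank/pgdata@weekly-$(date +%F)
psql -U postgres -c "SELECT pg_stop_backup();"
zfs send tank/pgdata@weekly-$(date +%F) | ssh backuphost zfs recv backup/pgdata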
Re: Known Bugs on Postgres 9.5
If you're using Postgres for Jira and Confluence, you should upgrade to 9.6. It is the newest version that is known to work. Version 9.5 will also work but you will get better performance and longer life out of the 9.6 version.
Re: Known Bugs on Postgres 9.5
On Sat, May 5, 2018 at 7:49 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > If you want an opinion, opting for more than one release behind current on > fear grounds is an ultra-conservative position - you might as well minimize > the pain and go recent and keep it longer. At this point I'd be asking > myself whether 9.6 or 10 is the better choice. Though with a third-party > application that runs on top of it I'd probably limit myself to the highest > release they purport to support. > > Precisely why I suggest 9.6
Re: Domain based on TIMEZONE WITH TIME ZONE
On Thu, May 10, 2018 at 7:31 AM, Ben Hood wrote: > Or are we saying that domains are one way of achieving the timestamp > hygiene, but equally, you can get the same result as described above? > The *only* way to have timestamp hygiene is to require them to have time zones at all times, even if that time zone is UTC. Any other representation of a time is ambiguous without context.
Re: When use triggers?
On Wed, May 16, 2018 at 6:19 PM, hmidi slim wrote: > HI, > > I'm working on a microservice application and I avoid using triggers > because they will not be easy to maintain and need an experienced person > in database administration to manage them. So I prefer to manage the work > in the application using ORM and javascript. > However I want to get some opinions and advice about using triggers: when > should I use them? How to manage them when there are some problems? > I have used triggers to keep audit logs of changes to certain columns in a table. For example, I want to know when a customer went "overdue" and then back to "active". The best place to create that log is in the database itself, since that also captures any manually updated rows (i.e., those actions not initiated by the application code itself). I have also used triggers to ensure data consistency and enforce state diagram transition rules for status columns in a table. These help capture logic errors in application code. For example, if your state diagram allows A -> B <-> C, then the trigger would disallow a transition from B or C to A, disallow A -> C, but allow C -> B and B -> C and A -> B. To manage them, we treat them like all DDL changes: everything is done via SQL script, and those are tracked using our version control software, go through developer testing then staging testing, then finally production.
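A minimal sketch of that state-transition trigger for the A -> B <-> C example (the table, column, and state names are invented for illustration):

CREATE TABLE orders (id serial PRIMARY KEY, status text NOT NULL DEFAULT 'A');

CREATE FUNCTION enforce_status_transition() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- allowed: A -> B, B -> C, C -> B; everything else is rejected
  IF NOT ( (OLD.status = 'A' AND NEW.status = 'B')
        OR (OLD.status = 'B' AND NEW.status = 'C')
        OR (OLD.status = 'C' AND NEW.status = 'B') ) THEN
    RAISE EXCEPTION 'illegal status transition % -> %', OLD.status, NEW.status;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER orders_status_transition
  BEFORE UPDATE OF status ON orders
  FOR EACH ROW
  WHEN (OLD.status IS DISTINCT FROM NEW.status)
  EXECUTE PROCEDURE enforce_status_transition();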
Re: Multiple PostgreSQL instances on one machine
If they are just test environments, why a whole dedicated cluster per instance? Just give each a unique name for the database and run it all on one cluster. I'd also go back and reconsider why these are separate machines in the first place and make sure you're not violating any assumptions that were made. On Fri, Jun 8, 2018 at 4:29 PM, Tony Sullivan wrote: > I am trying to consolidate some machines in my server room particularly in > the testing environment and I was hoping someone could point me in the > right direction. > > I currently have three machines running PostgreSQL for testing purposes. > Each week a backup is made of the production database and this is deployed > onto these three machines. None of these machines is any where near > capacity. I would like to host all three instances on one machine. > > I know that "initdb" can be used to create additional installations, but > the part I am unsure about is the tablespace. The production database has > a tablespace defined that resides on its SSD. When I use the production > backup, I have to create a mount point on the test machine with the same > name as the one on the production database. I am not certain how I would > handle this situation if I am serving three separate instances of that > database from one computer. > > >
Re: Append only replication over intermittent links (with local only delete?)
On Thu, Jun 14, 2018 at 8:04 AM, Uri Braun wrote: > To be clear, the car device will surely add data -- append rows -- and may > very occasionally add a new table. I would expect the only case where a > delete may occur -- other than culling old data -- is during recovery of a > partial write or transaction rollbacks. The time requirements are loose, > but I would like replication consistency within hours (once connected). > > I'm wondering what replication scheme is appropriate for this use case and > how to configure it appropriately. > > I would recommend a trigger based approach where every table has an INSERT trigger on it that adds the new data to a log table to store the data that needs to be pushed up to the central server. When the system detects a network connection, it sends the data and deletes them from the log table in a transaction (begin, select, send, delete, get ack, commit).
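A minimal sketch of such a capture trigger for one table (the table name, column names, and the jsonb row capture are my own illustration; the sender process that does the begin/select/send/delete/commit loop is not shown):

CREATE TABLE replication_log (
  log_id     bigserial PRIMARY KEY,
  table_name text NOT NULL,
  row_data   jsonb NOT NULL,
  logged_at  timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION log_for_upload() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- queue the freshly inserted row for the next upload
  INSERT INTO replication_log (table_name, row_data)
  VALUES (TG_TABLE_NAME, to_jsonb(NEW));
  RETURN NEW;
END;
$$;

CREATE TRIGGER sensor_readings_capture
  AFTER INSERT ON sensor_readings
  FOR EACH ROW EXECUTE PROCEDURE log_for_upload();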
Re: Can PostgreSQL create new WAL files instead of reusing old ones?
On Wed, Jun 20, 2018 at 1:35 PM, Jerry Jelinek wrote: > As Dave described in his original email on this topic, we'd like to avoid > recycling WAL files since that can cause performance issues when we have a > read-modify-write on a file that has dropped out of the cache. > > I have implemented a small change to allow WAL recycling to be disabled. > It is visible at: > https://cr.joyent.us/#/c/4263/ > > I'd appreciate getting any feedback on this. > > This looks so simple, yet so beneficial. Thanks for making it. Is there some other mechanism that already cleans out the old unneeded WAL files? I recall there is something that does it when you start up after changing the number of files to keep, but I don't recall whether that check also runs periodically. Is there some way to make it auto-detect when it should be enabled? If not, please document that it should be used on ZFS and any other file system with CoW properties on files.
Re: Database name with semicolon
On Wed, Jun 27, 2018 at 10:25 AM, Pavel Stehule wrote: > Hi > > > modern Postgresql has not any problems with special chars > > I think the issue is that he cannot create a proper connection string to pass to the ODBC driver, since semi-colon is used as a delimiter for the fields.
Re: dumping only table definitions
On Fri, Jun 29, 2018 at 6:39 PM, Alvaro Herrera wrote: > On 2018-Jun-29, Kevin Brannen wrote: > > > I'm trying to figure out how to dump only the table definitions, well > those and things they need directly, like sequences & types. What I do NOT > want are all the millions (not literally but it feels like it :)) of > functions we have. Triggers would be all right if I must, as we only have a > few of those. > > Try "pg_dump -Fc" followed by pg_restore -l. You can edit the list > emitted there, then use it with pg_restore -L. > From my experience, this is the right solution.
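In concrete terms, that workflow looks something like this (file names and the grep pattern are illustrative; you can also just edit the list by hand):

pg_dump -Fc -s mydb > mydb.schema.dump          # -s: definitions only, no data
pg_restore -l mydb.schema.dump > objects.list   # list every object in the archive
grep -v ' FUNCTION ' objects.list > wanted.list # drop the function entries
pg_restore -L wanted.list -f schema.sql mydb.schema.dump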
Re: Disabling/Enabling index before bulk loading
On Tue, Jul 10, 2018 at 1:13 PM, Ravi Krishna wrote:
> > Did you include the time to CREATE INDEX after the COPY or is the 1:14
> > only for the COPY stage?
>
> Yes.
>
> Time taken to load 47 mil rows with all 16 indexes intact: 14+ hrs
> Time taken to load the same after dropping index and then loading and
> finally creating 16 indexes: 1 hr 40 min
>
https://fle.github.io/temporarily-disable-all-indexes-of-a-postgresql-table.html
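The drop-and-recreate approach described above can be scripted roughly like this (table, index, and file names are placeholders; save the real definitions from pg_indexes before dropping anything):

-- save the definitions first
SELECT indexdef FROM pg_indexes WHERE tablename = 'big_table';

DROP INDEX big_table_col1_idx;   -- repeat for each non-PK index
COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);
CREATE INDEX big_table_col1_idx ON big_table (col1);  -- re-run each saved definition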
Re: User documentation vs Official Docs
On Mon, Jul 16, 2018 at 5:44 PM, Joshua D. Drake wrote: > On 07/16/2018 02:22 PM, Joshua D. Drake wrote: > >> On 07/16/2018 01:59 PM, Stephen Frost wrote: >> >>> >>> We have a place for this to go, in the official docs, already split out >>> by version, and it starts here: >>> >>> https://www.postgresql.org/docs/10/static/tutorial-start.html >>> >>> Adding more to that certainly sounds good to me. >>> >> >> I didn't know that existed. I will take a look. > > > Well now that I see it is just the "tutorial" in the official docs, I > disagree that is the correct place to start. At least not if it is going to > ship with the 1000+ pages of documentation we already have. What I am > envisioning is something with a strong SEO that gives pointed and direct > information about solving a specific problem. A tutorial book could > certainly do that as could (what I am really talking about) is Postgres > recipes or something like that. > > I didn't know it existed either, mostly because I know how to ask google to do things, and the things I need to know are not covered here (yet). This does seem to me to be the ideal place to add more how-to documentation to augment all the reference docs we have. As for "strong SEO", the official manual is already the top hit for almost everything postgres-related I search for, so it seems to have good SEO already. The only big improvement would be to somehow get google to show only the newest version of the manual for a given page, not all of the older ones too.
Re: PG backup check
On Mon, Jul 16, 2018 at 8:35 PM, Ravi Krishna wrote: > Not sure I am following this. Did Google release this because PG backups > are not 100% reliable or the data corruption can occur due to hardware > failure. > > http://www.eweek.com/cloud/google-releases-open-source-tool-that-checks-postgres-backup-integrity?utm_medium=email&utm_campaign=EWK_NL_EP_20180713_STR5L2&dni=450493554&rni=24844166 > The rule of thumb is you should verify that your backups can be restored before you consider them reliable. You don't want to end up in a disaster recovery situation and find that your backups have been corrupted in any way, shape, or form. Many things can break your backups, especially if they are automated and have many steps involved. One way to do this is to actually restore the data on a different system and compare. This tool appears to do the comparison for you directly against the dump, increasing confidence that it is reliable.
Re: Order in which tables are dumped
On Wed, Jul 25, 2018 at 11:15 AM, Ron wrote: > Hi, > > v8.4 if it matters. > > It looked like the tables were being backed up in alphanumeric order, but > now I see that table "docformat" is being dumped *after* "doc_image". > > Are there some other rules besides alphabetical sorting? > Is there some concern about the order? Lowercase "f" comes after "_" in ASCII. The only time the order could possibly matter is on restore when there are foreign key references, but on restore the foreign key constraints are all created after the data is loaded.
Re: Settings for fast restores
On Wed, Aug 1, 2018 at 2:03 AM, Ron wrote:
> Hi,
>
> http://www.databasesoup.com/2014/09/settings-for-fast-pgrestore.html
>
> shared_buffers = 1/2 of what you'd usually set
> maintenance_work_mem = 1GB-2GB
> wal_level = minimal
> full_page_writes = off
> wal_buffers = 64MB
> checkpoint_segments = 256 or higher
> max_wal_senders = 0
> wal_keep_segments = 0
>
> How many of these 4 year old setting recommendations are still valid for
> 9.6?
>
They all still look valid to me, except that checkpoint_segments is gone as of 9.5; set max_wal_size generously instead. I personally also set fsync=off since I can always start over if the machine crashes and corrupts the data.
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 5:19 PM, Don Seiler wrote: > On Mon, Aug 13, 2018 at 4:15 PM, Alvaro Herrera > wrote: > >> >> Maybe you had a cursor that was not fully scanned before the session was >> left idle -- as I recall, those can leave buffers pinned. >> > > I don't quite follow this. What circumstances would lead to this situation? >

BEGIN WORK;
DECLARE CURSOR ... ;
FETCH ...;  -- for some number of fetches, which does not reach the end of the cursor

then just sit there idle, without having closed the cursor or fetching anything more.
Re: Vacuum process waiting on BufferPin
On Tue, Aug 14, 2018 at 9:21 AM, Don Seiler wrote: > On Mon, Aug 13, 2018 at 5:55 PM, Vick Khera wrote: > >> On Mon, Aug 13, 2018 at 5:19 PM, Don Seiler wrote: >> >>> >>> I don't quite follow this. What circumstances would lead to this >>> situation? >>> >> >> BEGIN WORK; >> DECLARE CURSOR ... ; >> FETCH ...; -- for some number of fetches, which does not reach the end >> of the cursor. >> >> then just sit there idle, without having closed the cursor or fetching >> anything more. >> > > So the fix in that case would be to ensure that they CLOSE the cursors > when done with them? > > The general fix is to never sit idle in a transaction, but in this specific case closing the cursor should also do it.
How to schema-qualify "IS DISTINCT FROM" on trigger definition? (I created a db that cannot be dump/restored)
I've created a database which my vendor (Supabase) cannot dump/restore/upgrade. Ultimately, it comes down to this trigger statement, and the fact that the underlying operators needed to perform the `IS DISTINCT FROM` comparison in the WHEN clause need to be found in the `public` schema. During the restore, the search path is empty, so it fails. Full example file is below.

The trigger:

CREATE TRIGGER record_content_update
  BEFORE UPDATE OF content, embedding ON t1
  FOR EACH ROW
  WHEN (((new.content <> old.content) OR (new.embedding IS DISTINCT FROM old.embedding)))
  EXECUTE FUNCTION t1_content_update_handler();

The content field is a JSONB, and the embedding field is a vector from the pg_vector extension.

I make a backup using pg_dump, and upon restore it errors out with this:

psql:dump1.sql:122: ERROR:  operator does not exist: public.vector = public.vector
LINE 1: ... (((new.content <> old.content) OR (new.embedding IS DISTINC...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

The ^ is under the "IS DISTINCT" in case the formatting makes it unclear.

If I make the operator just `<>` then pg_dump properly schema-qualifies it in the dump as

  new.embedding OPERATOR(public.<>) old.embedding

but I need to account for the NULLs. I cannot find a way to schema-qualify the `IS DISTINCT FROM` comparison.

How do I make this trigger definition survive pg_dump/pg_restore? I cannot alter the dump file between the steps.

I'm running version: psql (PostgreSQL) 15.5. For my tests I'm on FreeBSD 14, but Supabase runs whatever version of linux they do and Pg version 15.1.

Full reproduction steps: save the file below as create.sql then run these commands:

createdb -U postgres t1
psql -U postgres -f create.sql t1
pg_dump -U postgres t1 > dump.sql
createdb -U postgres t2
psql -U postgres -f dump.sql t2

On the last step, the above referenced error will occur.

Is there a way to fix this, or is it a "don't do that" situation? The only workaround I can think of is to move the IS DISTINCT FROM test to be inside my trigger function.

--- create.sql file ---
CREATE EXTENSION IF NOT EXISTS "vector";

CREATE TABLE t1 (
  id SERIAL PRIMARY KEY,
  content JSONB DEFAULT '{}'::JSONB NOT NULL,
  embedding vector
);

CREATE FUNCTION t1_content_update_handler() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
  RAISE INFO '% trigger called for id=%', TG_TABLE_NAME, OLD.id;
  RETURN NEW;
END;
$$;

CREATE TRIGGER record_content_update
  BEFORE UPDATE OF content, embedding ON t1
  FOR EACH ROW
  WHEN (((new.content <> old.content) OR (new.embedding IS DISTINCT FROM old.embedding)))
  EXECUTE FUNCTION t1_content_update_handler();
--- end ---
Re: How to schema-qualify "IS DISTINCT FROM" on trigger definition? (I created a db that cannot be dump/restored)
On Wed, Feb 21, 2024 at 4:27 PM Tom Lane wrote: > For the moment, I think the only feasible solution is for your trigger > function to set the search path it needs by adding a "SET search_path > = whatever" clause to the function's CREATE command. The error is not in the function, it is in the WHEN clause of the trigger. There's no way to set a search path on the trigger as far as I can see. The only option I see is to remove the WHEN clause on the trigger and wrap my function with an IF with those same conditions. I hope this will not result in any noticeable difference in speed. It is rather unfortunate that one can end up with a schema whose pg_dump backup cannot be restored. Feel free to keep my example for regression testing when postgres does grow the ability to schema-qualify such operators.
Re: Performance issue debugging
On Thu, Feb 22, 2024 at 4:03 PM veem v wrote: > Hi All, > As i understand we have pg_stat_activity which shows the real time > activity of sessions currently running in the database. And > pg_stat_statements provides the aggregated information of the historical > execution of all the queries in the database. But I don't see any sampling > or timing information in those views. For example at a certain point in > time in the past, what queries were getting executed in the database and > overall wait events etc. > > So is there any other view which provides such information to dig into the > past to diagnose any historical performance issues ? or should we create > our own table and flush the information from the pg_stat_activity view to > that with the current timestamp and that would be helpful in analyzing > performance issues or any extension available for such? > > Look at the auto_explain module. Taking a random interval snapshot of running queries likely will not teach you anything useful. > Also even the explain analyze can only provide the exact run time stats of > a completed query. If we want to see what's going on for a long running > query and at what step in the execution path the query is spending most > resources and time when it keeps running in the database, is there any > available option in postgres database? for e.g. in a SELECT query index > access path if taking most of the time OR in an INSERT query INDEX block is > causing contention while inserting data into the table , how would we be > able to find that for a currently running query or for a historical query? > You can see locking contention in the pg_locks view. In my experience I rarely ever saw anything in there even when I was cranking tens of millions of inserts and updates per day. I don't think there's anything for historical queries or to probe anything more about a currently running query's progress. Take some time to think about your queries and how you can reduce any locking they need. If you do need some locking, consider using the FOR UPDATE clause in SELECT to limit what you do lock.
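To capture plans for slow statements after the fact, auto_explain can be loaded and tuned with something like this in postgresql.conf (the thresholds are arbitrary examples; shared_preload_libraries needs a restart, and log_analyze adds some overhead to every query it times):

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '5s'   # log the plan of anything slower than 5 seconds
auto_explain.log_analyze = on          # include actual row counts and timings
auto_explain.log_buffers = on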
Re: How to schema-qualify "IS DISTINCT FROM" on trigger definition? (I created a db that cannot be dump/restored)
On Thu, Feb 22, 2024 at 5:06 PM Erik Wienhold wrote: > On 2024-02-22 22:14 +0100, Vick Khera wrote: > > On Wed, Feb 21, 2024 at 4:27 PM Tom Lane wrote: > > > > > For the moment, I think the only feasible solution is for your trigger > > > function to set the search path it needs by adding a "SET search_path > > > = whatever" clause to the function's CREATE command. > > > > > > The error is not in the function, it is the WHEN clause of the trigger. > > There's no way to set a search path on the trigger as far as I see. > > > > The only option I see is to remove the WHEN clause on the trigger and > wrap > > my function with an IF with those same conditions. I hope this will not > > result in any noticeable difference in speed. > > You may also try the equivalent CASE expression in the WHEN clause. > > > https://wiki.postgresql.org/wiki/Is_distinct_from#Writing_with_CASE_statements > Nice. It makes for a big ugly trigger statement, but probably my better choice. I was considering doing this but didn't want to risk making my own interpretation.
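For the record, a sketch of what that big ugly CASE-based WHEN clause could look like for the trigger in question (untested; the plain <> operators should get schema-qualified by pg_dump as OPERATOR(public.<>), the same way the earlier test showed):

CREATE TRIGGER record_content_update
  BEFORE UPDATE OF content, embedding ON t1
  FOR EACH ROW
  WHEN (new.content <> old.content OR
        CASE WHEN new.embedding IS NULL AND old.embedding IS NULL THEN false
             WHEN new.embedding IS NULL OR old.embedding IS NULL THEN true
             ELSE new.embedding <> old.embedding
        END)
  EXECUTE FUNCTION t1_content_update_handler();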
Re: To all who wish to unsubscribe
Did the list software change? All my messages from here are not being properly auto-filed by the filter I have set up. > On Nov 20, 2017, at 13:48, Martin Fernau wrote: > > PLS unsubscribe via https://lists.postgresql.org/manage/ > > Today I received dozens of unsubscribe mails, spamming my mail account :-) > > thx >
Re: To all who wish to unsubscribe
On Mon, Nov 20, 2017 at 2:11 PM, Tom Lane wrote: > Vick Khera writes: > > Did the list software change? All my messages from here are not being > properly auto-files by the filter I have set up. > > Yes - did you not see either of the "migration to pglister" messages? > There's a summary of the changes at > Thanks, Tom. I see it now :) I wonder why there was such a rush of unsubscribe requests following the change... maybe people were filtering it instead of unsubscribing before.
Re: migrations (was Re: To all who wish to unsubscribe)
On Wed, Nov 22, 2017 at 9:07 AM, Stephen Frost wrote: > Greetings everyone, > > * Mike Sofen (mso...@runbox.com) wrote: > > Confirmed, I did not get a reset email either. Also, I had subscribed > to the digest version and now I get individual emails - which is why I > needed to login, but my password no longer worked. Sigh. Mike > > I've replied to Mike directly off-list, but wanted to let others know > who might be concerned about the reset link- it *does* work, but only if > you have an account already in the system. > > It's possible to be subscribed to the PostgreSQL mailing lists without > having a community account. If you find that you don't get an email > after going through the 'reset password' link, try creating a new > account instead. > This sounds to me like you're going to run afoul of CAN-SPAM unsubscribe rules. You should re-think this workflow.
Re: a back up question
On Wed, Dec 6, 2017 at 7:52 AM, Martin Mueller < martinmuel...@northwestern.edu> wrote: > > The objective is to create a backup from which I can restore any or all > tables in the event of a crash. In my case, I use Postgres for my own > scholarly purposes. Publications of whatever kind are not directly made > public via the database. I am my only customer, and a service interruption, > while a nuisance to me, does not create a crisis for others. I don’t want > to lose my work, but a service interruption of a day or a week is no big > deal. > I'd stick with pg_dump for sure. Two main choices depending on how big your database is and how fast your disks are: 1) "c" format into a single flat compressed file from which you can restore; 2) "d" format which you would then subsequently need to compress and tar for easy tracking and off-site copying. The only real advantage to "d" format is that you can parallelize the dumps if you have enough spare I/O bandwidth. For my backups on a production database serving thousands of customers per day (mostly in the US) on a web app, I just did a "c" format pg_dump nightly around 3am US Eastern time. It was our low time, and the impact on the database server was not significant since it had more RAM than the size of the database on disk (256GB RAM vs 100GB disk size including indexes). The backups are on a different machine which connects via LAN to the DB server and writes to its own local disk then copied that to an off-site server. Before I had such beefy hardware, I would do the dump from a replica which was updated using Slony1 software. The pg_dump backups were for disaster recovery and customer error recovery, so I kept about 2 weeks' worth of them. Since you have no other consumers of your data, just use a simple "c" format dump however often you like, then copy those off-site. Easy peasy.
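A nightly "c" format dump of that sort is a one-liner in cron (the database name and path are placeholders, the 3am slot matches the schedule described above, and the \% escaping is required inside crontab):

0 3 * * *  pg_dump -Fc -U postgres mydb > /backups/mydb-$(date +\%F).dump

Then rsync or scp the dump files off-site and expire anything older than your retention window.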
Re: psql format result as markdown tables
How does this work for you? I use this to get tables to insert into my wiki, which are basically the format you want. I just delete the extra lines I don't want at the end.

vk=> SELECT * FROM (values(1,2),(3,4)) as t;
 column1 | column2
---------+---------
       1 |       2
       3 |       4
(2 rows)

Time: 37.888 ms
vk=> \pset border 2
Border style is 2.
vk=> SELECT * FROM (values(1,2),(3,4)) as t;
+---------+---------+
| column1 | column2 |
+---------+---------+
|       1 |       2 |
|       3 |       4 |
+---------+---------+
(2 rows)

For your case it looks like you just need to change the "+" to "|" and delete the first and last lines. I don't know if you can change that with some other \pset setting.

On Sat, Jan 13, 2018 at 4:50 AM, Nicolas Paris wrote:
> Hello
>
> I wonder if someone knows how to configure psql to output results as
> markdown tables.
> Then instead of :
>
> SELECT * FROM (values(1,2),(3,4)) as t;
>  column1 | column2
> ---------+---------
>        1 |       2
>        3 |       4
>
> Get the result as :
> SELECT * FROM (values(1,2),(3,4)) as t;
> | column1 | column2 |
> |---------|---------|
> | 1       | 2       |
> | 3       | 4       |
>
> Thanks in advance
Re: why SSD is slower than HDD SAS 15K ?
Try random_page_cost = 1.1. Way back when I started using SSDs we had a discussion here and came to the conclusion that it should be ever so slightly higher than seq_page_cost. It is very hard to read your query plans (maybe gmail is wrapping them funny, or you need to use a fixed-width font on them, or share them from https://explain.depesz.com), but they do look substantially different. My guess is that with random_page_cost equal to seq_page_cost you are tricking Pg into using more sequential scans than index scans.
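For example, to try it cluster-wide without hand-editing postgresql.conf (needs a superuser; the tablespace name in the second variant is a placeholder, useful if only some data lives on the SSD):

ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- or scope it to a tablespace that lives on the SSD
ALTER TABLESPACE ssd_space SET (random_page_cost = 1.1);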
Re: shared_buffers 8GB maximum
On Sun, Feb 18, 2018 at 7:41 AM, Vitaliy Garnashevich <vgarnashev...@gmail.com> wrote: > > In the case when shared_buffers cover most of RAM, most of writes should > happen by checkpointer, and cache hit ratio should be high. So a > hypothetical question: Could shared_buffers=200GB on a 250 GB RAM server > ever be a reasonable setting? (assuming there are no other applications > running except postgres, and 50GB is enough for allocating > work_mem/maintenance_work_mem and for serving queries) That amount of shared buffers is not sensible. I found on a 256 GB box that anything over about 50-100GB was counterproductive. That was a FreeBSD system where I ran the database on top of ZFS, so there was a lot of data also stored in the ARC (memory cache). There is a setting in postgres (effective_cache_size) to tell it how much RAM your system is using for the disk cache, so set that to a fair estimate of how much your system will use. I set mine to 50% of RAM. I did not limit the cache at the OS level since it is good about giving up that memory for the needs of the running processes.
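As a rough postgresql.conf sketch for a 256 GB box like the one described (numbers are illustrative, not a recommendation):

shared_buffers = 50GB          # well below total RAM; let the OS/ARC cache do the rest
effective_cache_size = 128GB   # roughly 50% of RAM, per the estimate above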