Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Tasos
Hi to all,

 

Since Oct 2018, the Greek stemming algorithm contributed by Oleg Smirnov has been included in the Snowball project.

 

I believe it could be included in Release 12 of Postgres.

 

Best Regards,

Bard




Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Achilleas Mantzios

On 7/6/19 11:31 a.m., Tasos wrote:

Hi to all,
Since Oct 2018, the Greek stemming algorithm contributed by Oleg Smirnov has been 
included in the Snowball project.


Hello Tasos, can you explain what Snowball is?


I believe it could be included in Release 12 of Postgres
Best Regards,
Bard



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Tasos
Good evening Achilleas, and thanks for the reply. Snowball -> https://snowballstem.org/ in its own words:

 

"Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval."
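To make the idea concrete, here is a deliberately tiny, illustrative suffix-stripping sketch in Python. It is not the Snowball language itself and certainly not the contributed Greek algorithm; it just shows the general shape of what a generated stemmer does:

```python
# Toy illustration of suffix stripping; real Snowball stemmers apply
# ordered rule sets with per-language conditions on the remaining stem.
SUFFIXES = ["ing", "ed", "es", "s"]  # longer candidates checked first

def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

for w in ("indexes", "stemming", "query"):
    print(w, "->", toy_stem(w))
```

A real Snowball stemmer is compiled from a rule description into C (or other targets), which is what the server links against.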

 

A question: I thought this was an international list; do we speak English or Greek?

 

Tasos

 
 

Sent: Friday, June 07, 2019 at 11:44 AM
From: "Achilleas Mantzios" 
To: pgsql-general@lists.postgresql.org
Subject: Re: Snowball Stemming Greek support not included in Release 12 beta



On 7/6/19 11:31 a.m., Tasos wrote:



Hi to all,

 

Since Oct 2018, the Greek stemming algorithm contributed by Oleg Smirnov has been included in the Snowball project.

 



Hello Tasos, can you explain what Snowball is?
 


I believe it could be included in Release 12 of Postgres

 

Best Regards,

Bard


 

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt









Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Achilleas Mantzios

On 7/6/19 11:48 a.m., Tasos wrote:

Good evening Achilleas, and thanks for the reply. Snowball -> 
https://snowballstem.org/ in its own words:
"*Snowball* is a small string processing language designed for creating stemming 
algorithms for use in Information Retrieval."
A question: I thought this was an international list; do we speak English or Greek?


Oops, I just thought the Greek list had been resurrected; sorry, and thanks anyway.


Tasos
*Sent:* Friday, June 07, 2019 at 11:44 AM
*From:* "Achilleas Mantzios" 
*To:* pgsql-general@lists.postgresql.org
*Subject:* Re: Snowball Stemming Greek support not included in Release 12 beta
On 7/6/19 11:31 a.m., Tasos wrote:

Hi to all,
Since Oct 2018, the Greek stemming algorithm contributed by Oleg Smirnov has been 
included in the Snowball project.


Hello Tasos, can you explain what Snowball is?

I believe it could be included in Release 12 of Postgres
Best Regards,
Bard

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Kostis Michail
The Greek stemming algorithm has been available on the snowballstem site since
Oct 2018.

Let's hope the integration happens before the next release.

On Fri, Jun 7, 2019 at 11:49 AM Tasos  wrote:

> Good evening Achilleas, and thanks for the reply. Snowball ->
> https://snowballstem.org/ in its own words:
>
> "*Snowball* is a small string processing language designed for creating
> stemming algorithms for use in Information Retrieval."
>
> A question: I thought this was an international list; do we speak English or Greek?
>
> Tasos
>
>
> *Sent:* Friday, June 07, 2019 at 11:44 AM
> *From:* "Achilleas Mantzios" 
> *To:* pgsql-general@lists.postgresql.org
> *Subject:* Re: Snowball Stemming Greek support not included in Release 12
> beta
> On 7/6/19 11:31 a.m., Tasos wrote:
>
> Hi to all,
>
> Since Oct 2018, the Greek stemming algorithm contributed by Oleg Smirnov has
> been included in the Snowball project.
>
>
>
> Hello Tasos, can you explain what Snowball is?
>
>
> I believe it could be included in Release 12 of Postgres
>
> Best Regards,
> Bard
>
>
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>


delimiters psql /CSV

2019-06-07 Thread paul.malm
Hi, I have a problem with psql and CSV.

C:\Users\an\bin>C:\Tmp\psql -h 10.211.43.22 -p 5432 -U postgres -d aatest -w  
-c  "\copy public.""Bayern"" FROM 'C:\Users\an\test\Bayern.csv' WITH DELIMITER 
';' CSV"
Error:  extra data after expected last column
CONTEXT:  COPY Bayern, row 1: 
"48.456;11.8008333;Anglberg/Amper;Industrial 
plant;722;220;220;2133;0;0;Undefined..."

I have a table (Bayern) with all columns in the right format. There is a 
column value which has a '/' in the string (Anglberg/Amper). I think that '/' 
is taken for a delimiter, since when I replace / with a white space it works.
I use ';' as the delimiter. I have UTF-8 encoding in the db. Is there a way around 
this, or do I have to go through all the csv files and change '/' to white space?

Kind regards,
Paul
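For what it is worth, a '/' inside a field is ordinary text to any CSV parser when ';' is the delimiter. A quick stdlib sketch using the visible part of the row from the error context:

```python
import csv
import io

# The row from the COPY error context (only the visible, untruncated part).
line = "48.456;11.8008333;Anglberg/Amper;Industrial plant;722;220;220;2133;0;0;Undefined"
row = next(csv.reader(io.StringIO(line), delimiter=";"))
print(len(row), "fields")
print(row[2])  # the field containing the slash stays intact
```

Since the slash splits nothing here, the "extra data after expected last column" error is more likely caused by the file itself (encoding, line endings, or an extra trailing field) than by the '/' character.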


Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Adrien Nayrat
On 6/7/19 10:31 AM, Tasos wrote:
> Hi to all,
>  
> Since Oct 2018, the Greek
> (https://www.postgresql.org/message-id/E1g4YQa-0003tf-Gr%40gemulon.postgresql.org)
> stemming algorithm contributed by Oleg Smirnov has been
> included in the Snowball project.
>  
> I believe it can be included in the Release 12 of postgres 
>  
> Best Regards,
> Bard

Hi,

Postgres' stemmer has been synced with Snowball upstream, but it was before 
October:

https://www.postgresql.org/message-id/E1g4YQa-0003tf-Gr%40gemulon.postgresql.org

As mentioned, this will add Arabic, Indonesian, Irish, Lithuanian,
Nepali, and Tamil.

We are now at 12 beta 1 and well past feature freeze, so I am afraid your
request to add Greek could only be considered for v13. Sorry.

Regards,

-- 
Adrien





Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Tasks
😢 Is there a solution to that?
Is it possible for us to add it ourselves?

Has anyone worked with this extension -> 
https://github.com/postgrespro/snowball_ext

Tassos

On 7 Jun 2019, at 1:12 PM, Adrien Nayrat wrote:

>> On 6/7/19 10:31 AM, Tasos wrote:
>> Hi to all,
>>  
>> Since Oct 2018, the Greek
>> (https://www.postgresql.org/message-id/E1g4YQa-0003tf-Gr%40gemulon.postgresql.org)
>> stemming algorithm contributed by Oleg Smirnov has been
>> included in the Snowball project.
>>  
>> I believe it could be included in Release 12 of Postgres
>>  
>> Best Regards,
>> Bard
> 
> Hi,
> 
> Postgres' stemmer has been synced with Snowball upstream, but it was before 
> October:
> 
> https://www.postgresql.org/message-id/E1g4YQa-0003tf-Gr%40gemulon.postgresql.org
> 
> As mentioned, this will add Arabic, Indonesian, Irish, Lithuanian,
> Nepali, and Tamil.
> 
> We are now at 12 beta 1 and well past feature freeze, so I am afraid your
> request to add Greek could only be considered for v13. Sorry.
> 
> Regards,
> 
> -- 
> Adrien
> 


Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Adrien Nayrat
On 6/7/19 12:19 PM, Tasks wrote:
> 😢 Is there a solution to that?
> Is it possible for us to add it ourselves?
> 
> Has anyone worked with this extension -> 
> https://github.com/postgrespro/snowball_ext
> 
> Tassos
> 


I have not played with this extension; maybe you can add Greek to
https://github.com/postgrespro/snowball_ext/tree/master/libstemmer

You should ask the authors of this extension ;)
https://github.com/postgrespro/snowball_ext#authors

-- 
Adrien
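For reference, if a Greek stemmer does become available in the server (or via an extension such as snowball_ext), wiring it into full-text search would presumably follow the usual snowball dictionary pattern. `LANGUAGE = greek` below is hypothetical until the stemmer actually ships:

```sql
-- Hypothetical sketch: assumes a greek snowball stemmer is compiled in.
CREATE TEXT SEARCH DICTIONARY greek_stem (
    TEMPLATE = snowball,
    LANGUAGE = greek
);

CREATE TEXT SEARCH CONFIGURATION greek (COPY = simple);

ALTER TEXT SEARCH CONFIGURATION greek
    ALTER MAPPING FOR word, asciiword WITH greek_stem;
```

This is how the built-in snowball languages (english, french, etc.) are exposed today, so a contributed Greek stemmer would most likely slot into the same template.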





Re: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Francisco Olarte
Tom:

On Fri, Jun 7, 2019 at 2:19 AM Tom Lane  wrote:
> Francisco Olarte  writes:
> > On Thu, Jun 6, 2019 at 6:57 PM Kelly, Kevin  wrote:

> > and is running, notify means it does systemd integration via socket,
> > with IIRC postgres does not.
>
> I might be confusing this with some other issue, but I think PG does
> support systemd notification if you build it with the --with-systemd
> configuration option.  This is recommended if you're using systemd
> to launch PG, because systemd's other options for figuring out the
> service state work substantially less well.

Oops, totally my fault. Obviously I did not recall correctly.

Maybe then he should check whether the program has been built this way (as
it's optional, from what I see).

Francisco Olarte.




RE: Copyright and bundling the installer

2019-06-07 Thread Kyle Lee
As I haven't heard back regarding the runtime error, I'd like to ask question #2 
again.

Does anyone know whether including the PostgreSQL setup exe and pgAdmin setup exe 
in my installer violates the copyright/licensing policy?


Re: delimiters psql /CSV

2019-06-07 Thread Ron

On 6/7/19 5:01 AM, paul.m...@lfv.se wrote:


Hi, I have a problem with psql and CSV.

C:\Users\an\bin>C:\Tmp\psql -h 10.211.43.22 -p 5432 -U postgres -d aatest 
-w  -c  "\copy public.""Bayern"" FROM 'C:\Users\an\test\Bayern.csv' WITH 
DELIMITER ';' CSV"


Error:  extra data after expected last column

CONTEXT:  COPY Bayern, row 1: 
"48.456;11.8008333;Anglberg/Amper;Industrial 
plant;722;220;220;2133;0;0;Undefined..."


I have a table (Bayern) with all columns in the right format. There is a 
column value which has a '/' in the string (Anglberg/Amper). I think that 
'/' is taken for a delimiter, since when I replace / with a white space it 
works.


I use ';' as the delimiter. I have UTF-8 encoding in the db. Is there a way 
around this or do I have to go through all the csv files and change '/' to 
white space?




What if you escape the slashes with backslashes?

--
Angular momentum makes the world go 'round.


Strange connection limit problem

2019-06-07 Thread Ict Security
Dear guys,

I have two identical PostgreSQL 9.6.1 installations and, to test an
application, I need to reach more than 1024 connections.

I set 2400 as max_connections
I raised file-max kernel limit to 65535
I increased shared_buffers
I also increased the value in semaphores in both
And I added this in limits.conf:
* - nofile 99

But, and it is really strange, on machine nr. 1 I can start
thousands of connections, while on machine nr. 2 Postgres does not
accept anything beyond 1024 connections.

Configurations are the same on the two machines:

ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 63462
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 99
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 99
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

ipcs -l
-- Messages Limits 
max queues system wide = 8192
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
-- Shared Memory Limits 
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1
-- Semaphore Limits 
max number of arrays = 8192
max semaphores per array = 4096
max semaphores system wide = 2560
max ops per semop call = 64
semaphore max value = 32767


ONLY here can I notice a difference:
- machine nr. 1
cat /proc/14994/limits (where 14994 is Postgres' PID):
Max open files1024 4096 files

- machine nr. 2
cat /proc/8459/limits (where 8459 is Postgres' PID)
Max open files99   99   files

Maybe '99' is an out-of-range value and is not being honored?
In that case, how can I solve this?

Thank you!
Frank




Re: delimiters psql /CSV

2019-06-07 Thread Daniel Verite
Paul Malm wrote:

> C:\Users\an\bin>C:\Tmp\psql -h 10.211.43.22 -p 5432 -U postgres -d aatest -w
> -c  "\copy public.""Bayern"" FROM 'C:\Users\an\test\Bayern.csv' WITH
> DELIMITER ';' CSV"
> Error:  extra data after expected last column
> CONTEXT:  COPY Bayern, row 1:
> "48.456;11.8008333;Anglberg/Amper;Industrial
> plant;722;220;220;2133;0;0;Undefined..."
> 
> I have a table (Bayern) with all columns in the right format. There is a
> column-value which has a '/' in the string (Anglberg/Amper). I think that
> '/' is taken for a delimeter, since when I replace / with 'white space' it
> works.

There's no reason for '/' to be taken as a delimiter.
Can you share the table definition and the offending line as an
attachment, so that someone can try to reproduce this with the
exact same data?
Also mention your PostgreSQL version and the shell the command
is invoked from (cmd.exe, powershell, something else?) just in case
it matters.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




SV: delimiters psql /CSV

2019-06-07 Thread paul.malm
Sorry, I realized that it was not '/' that caused the problem. I'm doing this 
in a Java program: if I pause the program before running psql.exe, open the 
csv file in Windows with Excel and close it (without doing anything else), 
and then let the program run again, then it works!
Could it have something to do with encoding? I have no idea.
Kind regards,
Paul
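Encoding is a plausible suspect: Excel on Windows commonly rewrites CSV files with (or without) a byte-order mark, and a UTF-16 file fed to COPY expecting UTF-8 fails in confusing ways. A small stdlib sketch for checking the first bytes of a file (the sample bytes here are synthetic, not taken from Bayern.csv):

```python
import codecs

def sniff_bom(raw: bytes) -> str:
    """Guess an encoding from a leading byte-order mark, if present."""
    marks = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in marks:
        if raw.startswith(bom):
            return name
    return "no BOM"

# Synthetic sample standing in for the first bytes of the CSV file:
sample = codecs.BOM_UTF8 + "48.456;11.8008333;Anglberg/Amper".encode("utf-8")
print(sniff_bom(sample))
```

Running this on the file before and after the Excel open/close round-trip would show whether Excel is changing the leading bytes.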

-----Original Message-----
From: Daniel Verite [mailto:dan...@manitou-mail.org] 
Sent: 7 June 2019 14:03
To: Malm, Paul (Operations AIM)
Cc: pgsql-general@lists.postgresql.org
Subject: Re: delimiters psql /CSV

Paul Malm wrote:

> C:\Users\an\bin>C:\Tmp\psql -h 10.211.43.22 -p 5432 -U postgres -d aatest -w
> -c  "\copy public.""Bayern"" FROM 'C:\Users\an\test\Bayern.csv' WITH
> DELIMITER ';' CSV"
> Error:  extra data after expected last column
> CONTEXT:  COPY Bayern, row 1:
> "48.456;11.8008333;Anglberg/Amper;Industrial
> plant;722;220;220;2133;0;0;Undefined..."
> 
> I have a table (Bayern) with all columns in the right format. There is a
> column-value which has a '/' in the string (Anglberg/Amper). I think that
> '/' is taken for a delimeter, since when I replace / with 'white space' it
> works.

There's no reason for '/' to be taken as a delimiter.
Can you share the table definition and the offending line as an
attachment, so that someone can try to reproduce this with the
exact same data?
Also mention your PostgreSQL version and the shell the command
is invoked from (cmd.exe, powershell, something else?) just in case
it matters.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




RE: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Lu, Dan
Thank you so much for pointing that out.

I downloaded the sources from https://ftp.postgresql.org/pub/source/v10.7

After download, I did the following.

 mkdir /hostname/pg/PostgreSQL-10.7_2

./configure --prefix=/hostname/pg/PostgreSQL-10.7_2

make
make install

I now included the option mentioned “./configure 
--prefix=/hostname/pg/PostgreSQL-10.7_2 --with-systemd”.  I also saw that in 
the documentation now “Using Type=notify requires that the server binary was 
built with configure --with-systemd.”.

We will reply back if we run into further issue.

Dan
From: Christopher Browne [mailto:cbbro...@gmail.com]
Sent: Thursday, June 06, 2019 9:19 PM
To: Tom Lane 
Cc: Francisco Olarte ; Kelly, Kevin 
; pgsql-gene...@postgresql.org; Lu, Dan 

Subject: Re: Postgres 10.7 Systemd Startup Issue


On Thu, Jun 6, 2019, 8:19 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
Francisco Olarte <fola...@peoplecall.com> writes:
> On Thu, Jun 6, 2019 at 6:57 PM Kelly, Kevin <kevin.ke...@sig.com> wrote:
>> We’re attempting to launch postgres via systemd and noticing that when 
>> invoking via systemctl start postgres.service the prompt never returns. If 
>> we switch to another tty and check the status it shows as:
>> Active: activating (start) since Thu 2019-06-06 09:36:32 EDT; 12min ago
>> If we change the type from notify to forking we see the same behavior. The 
>> instance seems to be up and running just fine, we just never see the active 
>> (running) status as we have come to expect.

> Are you sure you have the postgres.service correctly configured? ( or
> you could post the ExecStart/Type config, or the whole service file ).

> The type tells systemd how to know the service has finished starting
> and is running, notify means it does systemd integration via socket,
> with IIRC postgres does not.

I might be confusing this with some other issue, but I think PG does
support systemd notification if you build it with the --with-systemd
configuration option.  This is recommended if you're using systemd
to launch PG, because systemd's other options for figuring out the
service state work substantially less well.

Yeah, I was one of the reviewers of the patch for this feature.

If the configuration parameter is turned on then Postgres reports in to SystemD 
once it completes any crash recovery work to indicate that the database service 
is up.

That way, if there was a bunch of WAL needing to be processed, services 
depending on Postgres will properly get deferred.

You need to set up the service type in the SystemD service file to
   Type=notify
in order to get this behavior
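Put together, a unit file for a server built with --with-systemd might look roughly like this sketch (paths, user, and data directory are placeholders, not taken from the thread):

```ini
# Sketch only; adjust paths, user, and data directory to your installation.
[Unit]
Description=PostgreSQL database server
After=network.target

[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
TimeoutSec=0

[Install]
WantedBy=multi-user.target
```

With Type=notify, systemd keeps the unit in "activating" until the server signals readiness, which is exactly the behavior described above when the binary lacks --with-systemd support.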



IMPORTANT: The information contained in this email and/or its attachments is 
confidential. If you are not the intended recipient, please notify the sender 
immediately by reply and immediately delete this message and all its 
attachments. Any review, use, reproduction, disclosure or dissemination of this 
message or any attachment by an unintended recipient is strictly prohibited. 
Neither this message nor any attachment is intended as or should be construed 
as an offer, solicitation or recommendation to buy or sell any security or 
other financial instrument. Neither the sender, his or her employer nor any of 
their respective affiliates makes any warranties as to the completeness or 
accuracy of any of the information contained herein or that this message or any 
of its attachments is free of viruses.


Re: Strange connection limit problem

2019-06-07 Thread Laurenz Albe
Ict Security wrote:
> i have two identical Postgresql 9.6.1 and i need, to test an
> application, to reach connection above 1024.
> 
> I setup 2400 as max_connections

Ugh.  Don't.  Use a connection pool.
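A pooler such as pgbouncer lets thousands of client connections share a much smaller number of real server connections. A minimal sketch of a pgbouncer.ini (database name, addresses, and auth file are placeholders):

```ini
; Sketch: clients connect to pgbouncer on port 6432, which multiplexes
; them onto a small pool of real server connections.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 2400
default_pool_size = 50
```

This keeps max_connections on the server modest while still serving 2400 clients.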

> I raised file-max kernel limit to 65535
> I increased shared_buffers
> I also increased the value in semaphores in both
> And i added this in limits.conf:
> * - nofile 99
> 
> But - and it is really strange - in machine nr. 1 i can start
> thousands of connections while in machine nr. 2 after 1024 connections
> Postgres does not accept anything else.

What is the error message in the log?

What do you see in /proc/12345/limits and /proc/12345/cgroup
(assuming that your postmaster PID is 12345)?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
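A quick way to gather what Laurenz asks for (using the current shell's PID as a stand-in; substitute the postmaster PID on each machine):

```shell
# Inspect the limits and cgroup actually applied to a running process.
# $$ is this shell's PID; replace it with the postmaster PID, e.g. the
# first line of $PGDATA/postmaster.pid.
PID=$$
grep -E 'Max open files|Max processes' "/proc/$PID/limits"
head -n 3 "/proc/$PID/cgroup"
```

Comparing this output between the two machines should show immediately which one is still running with the default 1024 file-descriptor limit.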





Re: Strange connection limit problem

2019-06-07 Thread Giuseppe Sacco
Hi Frank,

On Fri, 07/06/2019 at 13.58 +0200, Ict Security wrote:
> Dear guys,
> 
> i have two identical Postgresql 9.6.1 and i need, to test an
> application, to reach connection above 1024.
> 
> I setup 2400 as max_connections
> I raised file-max kernel limit to 65535
> I increased shared_buffers
> I also increased the value in semaphores in both
> And i added this in limits.conf:
> * - nofile 99
[...]

Is PostgreSQL run via systemd? If this is the case, limits.conf is
ignored and you should put the limits in the systemd unit file.

Bye,
Giuseppe
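Concretely, since systemd services do not read limits.conf, the file-descriptor limit would go into a drop-in override for the unit (the value below is just an example, not the thread's setting):

```ini
# Created via `systemctl edit <service-name>`, which writes a drop-in
# override under /etc/systemd/system/<service-name>.service.d/:
[Service]
LimitNOFILE=1048576
```

After `systemctl daemon-reload` and a service restart, /proc/<postmaster-pid>/limits should reflect the new "Max open files" value.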





Re: Snowball Stemming Greek support not included in Release 12 beta

2019-06-07 Thread Tom Lane
Adrien Nayrat  writes:
> Postgres' stemmer has been synced with Snowball upstream, but it was before 
> October:
> https://www.postgresql.org/message-id/E1g4YQa-0003tf-Gr%40gemulon.postgresql.org

> We are now in 12 beta 1 and far after feature freeze, so I am afraid your
> request to add greek could only be considered for v13. Sorry.

Yeah, it's too late to consider this for v12.

Also, while I don't think anybody would object to absorbing the upstream
Greek support, there's nonzero work that has to be done to make it happen.
There's nobody here who's particularly tracking what the Snowball project
does, so someone who actually cares about some new feature would need to
prepare and submit a resynchronizing patch.  The above commit and the
referenced discussion threads should give a good example of what has
to be done.  (Hopefully, a resync patch right now would be a good deal
smaller than that one, though ;-))

regards, tom lane




RE: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Lu, Dan
Even with the added option “--with-systemd”, it is still not working.

Status remains “   Active: activating (start) since Fri 2019-06-07 09:40:06 
EDT; 13s ago”.

To outline what I did.

1)  Stop the PostgreSQL instance running with the “/hostname/pg/PostgreSQL-10.7” 
configuration, which was compiled without the “--with-systemd” option



2)  Configure PostgreSQL to new location


mkdir /hostname/pg/PostgreSQL-10.7_2

./configure --prefix=/hostname/pg/PostgreSQL-10.7_2

make
make install





mv /hostname/pg/PostgreSQL-10.7 /hostname/pg/PostgreSQL-10.7.OLD



mv /hostname/pg/PostgreSQL-10.7_2 /hostname/pg/PostgreSQL-10.7



3)  Start PostgreSQL instance

Any more suggestion?

Dan

From: Lu, Dan
Sent: Friday, June 07, 2019 8:46 AM
To: 'Christopher Browne' ; Tom Lane 
Cc: Francisco Olarte ; Kelly, Kevin 
; pgsql-gene...@postgresql.org
Subject: RE: Postgres 10.7 Systemd Startup Issue

Thank you so much for pointing that out.

I downloaded the sources from https://ftp.postgresql.org/pub/source/v10.7

After download, I did the following.

 mkdir /hostname/pg/PostgreSQL-10.7_2

./configure --prefix=/hostname/pg/PostgreSQL-10.7_2

make
make install

I now included the option mentioned “./configure 
--prefix=/hostname/pg/PostgreSQL-10.7_2 --with-systemd”.  I also saw that in 
the documentation now “Using Type=notify requires that the server binary was 
built with configure --with-systemd.”.

We will reply back if we run into further issue.

Dan
From: Christopher Browne [mailto:cbbro...@gmail.com]
Sent: Thursday, June 06, 2019 9:19 PM
To: Tom Lane <t...@sss.pgh.pa.us>
Cc: Francisco Olarte <fola...@peoplecall.com>; 
Kelly, Kevin <kevin.ke...@msx.bala.susq.com>; 
pgsql-gene...@postgresql.org; Lu, Dan <dan...@msx.bala.susq.com>
Subject: Re: Postgres 10.7 Systemd Startup Issue


On Thu, Jun 6, 2019, 8:19 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
Francisco Olarte <fola...@peoplecall.com> writes:
> On Thu, Jun 6, 2019 at 6:57 PM Kelly, Kevin <kevin.ke...@sig.com> wrote:
>> We’re attempting to launch postgres via systemd and noticing that when 
>> invoking via systemctl start postgres.service the prompt never returns. If 
>> we switch to another tty and check the status it shows as:
>> Active: activating (start) since Thu 2019-06-06 09:36:32 EDT; 12min ago
>> If we change the type from notify to forking we see the same behavior. The 
>> instance seems to be up and running just fine, we just never see the active 
>> (running) status as we have come to expect.

> Are you sure you have the postgres.service correctly configured? ( or
> you could post the ExecStart/Type config, or the whole service file ).

> The type tells systemd how to know the service has finished starting
> and is running, notify means it does systemd integration via socket,
> with IIRC postgres does not.

I might be confusing this with some other issue, but I think PG does
support systemd notification if you build it with the --with-systemd
configuration option.  This is recommended if you're using systemd
to launch PG, because systemd's other options for figuring out the
service state work substantially less well.

Yeah, I was one of the reviewers of the patch for this feature.

If the configuration parameter is turned on then Postgres reports in to SystemD 
once it completes any crash recovery work to indicate that the database service 
is up.

That way, if there was a bunch of WAL needing to be processed, services 
depending on Postgres will properly get deferred.

You need to set up the service type in the SystemD service file to
   Type=notify
in order to get this behavior





Re: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Tom Lane
"Lu, Dan"  writes:
> Even with the added option “--with-systemd”, it is still not working.

Hmmm 

> To outline what I did.

> 2)  Configure PostgreSQL to new location
> mkdir /hostname/pg/PostgreSQL-10.7_2
> ./configure --prefix=/hostname/pg/PostgreSQL-10.7_2

Is it just a copy-and-paste mistake that you don't actually have
a --with-systemd option here?

Also, it seems fairly unusual for anyone to build a production
installation with absolutely no configuration options --- it'd
work, but you'd get a really plain-vanilla server.  You might
consider reviewing

https://www.postgresql.org/docs/10/install-procedure.html

to see what other things might make sense to turn on.

regards, tom lane




RE: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Lu, Dan
I apologize, it was a copy/paste error.  I did include the option 
"--with-systemd".

This is our first install of PostgreSQL.  I am in the process of getting 
standards in place for future deployment.  We don't have a large footprint of 
PostgreSQL here.  We run mostly Oracle, SQL Server and MySQL in production so 
far.

Dan

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Friday, June 07, 2019 10:04 AM
To: Lu, Dan 
Cc: Christopher Browne ; Francisco Olarte 
; Kelly, Kevin ; 
pgsql-gene...@postgresql.org
Subject: Re: Postgres 10.7 Systemd Startup Issue

"Lu, Dan"  writes:
> Even with the added option “--with-systemd”, it is still not working.

Hmmm 

> To outline what I did.

> 2)  Configure PostgreSQL to new location mkdir
> /hostname/pg/PostgreSQL-10.7_2 ./configure
> --prefix=/hostname/pg/PostgreSQL-10.7_2

Is it just a copy-and-paste mistake that you don't actually have a 
--with-systemd option here?

Also, it seems fairly unusual for anyone to build a production installation 
with absolutely no configuration options --- it'd work, but you'd get a really 
plain-vanilla server.  You might consider reviewing

https://www.postgresql.org/docs/10/install-procedure.html

to see what other things might make sense to turn on.

regards, tom lane





Re: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Francisco Olarte
Lu:

On Fri, Jun 7, 2019 at 3:48 PM Lu, Dan  wrote:
> Even with the added option “--with-systemd”, it is still not working.
> Status remains “   Active: activating (start) since Fri 2019-06-07 09:40:06 
> EDT; 13s ago”.
> To outline what I did.
> 1)  Stop PostgreSQL instance running with “/hostname/pg/PostgreSQL-10.7” 
> configuration which was compiled without the “--with-systemd” option
> 2)  Configure PostgreSQL to new location

I assume you also modified and reload the service file to match the
new configuration. ( type notify? )

Francisco Olarte.




Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE

2019-06-07 Thread Adrian Klaver

On 6/6/19 6:50 AM, Karsten Hilbert wrote:

Now that it is established that CREATE DATABASE does not
verify checksums on the template I have a followup question.

The current canonical solution (?) for verifying checksums in
an existing database is, to my understanding, to pg_dump it
(to /dev/null, perhaps):

pg_dump --username=... --dbname=... --compress=0 --no-sync 
--format=custom --file=/dev/null

as that will read and verify all blocks related to the dump
of that database.



The question I have is:

The above works with the existing cluster, but would you not also want 
to verify that the blocks written on the new cluster are also good?



One will be tempted to include options to speed up the
process, say:

 --data-only

which would not output schema definitions. I wonder, however,
whether doing so would allow pg_dump to skip some reads of
the catalog tables, thereby perhaps not detecting some
corruption in those?

This question would apply to the following list of options as
far as I can see:

#--no-acl
#--no-comments
#--no-publications
#--no-subscriptions
#--no-security-label

Is my assumption wrong ?


Not sure, though it would seem to me that including the above is a relatively 
small incremental cost to the overall dump, assuming a data set of any 
size greater than small.




I don't really expect to just be handed a full answer (unless
someone easily knows offhand) - however, I don't really know
where to look for it. Pointers would be helpful.

Is the only way to know to read the source, or suitable
server logs, and compare queries between runs with/without
said options?

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE

2019-06-07 Thread Tom Lane
On 6/6/19 6:50 AM, Karsten Hilbert wrote:
>> The current canonical solution (?) for verifying checksums in
>> an existing database is, to may understanding, to pg_dump it
>> (to /dev/null, perhaps):
>> as that will read and verify all blocks related to the dump
>> of that database.

FWIW, that doesn't seem especially canonical from here.  In particular,
pg_dump will not normally result in any examination at all of indexes
on user-defined tables --- it'll just be doing seqscans of the tables
proper.  You could hope for reasonably complete coverage of the system
catalogs along with user tables, but missing out user indexes seems
like a pretty big gap.

The actual solution for this as of v11 is pg_verify_checksums
(renamed to just pg_checksums for v12).  I don't think there's
any really convincing answer before v11.

regards, tom lane




Re: Table partition with primary key in 11.3

2019-06-07 Thread Peter Geoghegan
On Thu, Jun 6, 2019 at 3:00 PM David Rowley
 wrote:
> You may already be aware, but another use case for such variable-width
> identifiers was with indirect indexes as discussed in [1]

Right. I went with global indexes because indirect indexes are
probably a lot more risky as a project. I'd be particularly concerned
about the complexity of VACUUM there, whereas that doesn't seem all
that bad in the case of global indexes.

-- 
Peter Geoghegan




Re: Table partition with primary key in 11.3

2019-06-07 Thread Alvaro Herrera
On 2019-Jun-07, Peter Geoghegan wrote:

> On Thu, Jun 6, 2019 at 3:00 PM David Rowley
>  wrote:
> > You may already be aware, but another use case for such variable-width
> > identifiers was with indirect indexes as discussed in [1]
> 
> Right. I went with global indexes because indirect indexes are
> probably a lot more risky as a project. I'd be particularly concerned
> about the complexity of VACUUM there, whereas that doesn't seem all
> that bad in the case of global indexes.

I think vacuuming for global indexes is somewhat challenging as well :-)
Maybe not as much as for indirect indexes, that's true.

In order for it to be sustainable, I think you'll want to reuse
partition identifiers when the partitions are dropped/detached, which
means that you need a way to ensure that index entries to those
partitions are removed from all indexes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Table partition with primary key in 11.3

2019-06-07 Thread Peter Geoghegan
On Fri, Jun 7, 2019 at 9:10 AM Alvaro Herrera  wrote:
> I think vacuuming for global indexes is somewhat challenging as well :-)
> Maybe not as much as for indirect indexes, that's true.
>
> In order for it to be sustainable, I think you'll want to reuse
> partition identifiers when the partitions are dropped/detached, which
> means that you need a way to ensure that index entries to those
> partitions are removed from all indexes.

I'm not so sure about that. I see your point, but I think that you can
also make the opposite argument. That is, you can make a good case for
asynchronously cleaning up the dead entries that point to a dropped
partition (probably within VACUUM). Perhaps we should offer *both* as
options.

--
Peter Geoghegan




Re: Table partition with primary key in 11.3

2019-06-07 Thread Alvaro Herrera
On 2019-Jun-07, Peter Geoghegan wrote:

> On Fri, Jun 7, 2019 at 9:10 AM Alvaro Herrera  
> wrote:
> > I think vacuuming for global indexes is somewhat challenging as well :-)
> > Maybe not as much as for indirect indexes, that's true.
> >
> > In order for it to be sustainable, I think you'll want to reuse
> > partition identifiers when the partitions are dropped/detached, which
> > means that you need a way to ensure that index entries to those
> > partitions are removed from all indexes.
> 
> I'm not so sure about that. I see your point, but I think that you can
> also make the opposite argument. That is, you can make a good case for
> asynchronously cleaning up the dead entries that point to a dropped
> partition (probably within VACUUM). Perhaps we should offer *both* as
> options.

I was thinking of asynchronously cleaning it up rather than blocking
DROP/DETACH ... which means you need to keep state somewhere.  I don't
think blocking DROP/DETACH is valuable -- a global index that blocks
DROP/DETACH until the index is clean serves no useful purpose.  (You
could think of a multi-step approach with internal transaction commits,
similar to CIC, but you still need a plan to clean that up in case the
server crashes during that operation.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Postgres 10.7 Systemd Startup Issue

2019-06-07 Thread Lu, Dan
Hello All,

I ended up removing all configuration, re-unzipped the binary tar files we 
downloaded, and re-configured with the option "--with-systemd", and now it is 
working.

We really appreciate all your help!

Dan

-Original Message-
From: Lu, Dan
Sent: Friday, June 07, 2019 10:11 AM
To: 'Tom Lane' 
Cc: Christopher Browne ; Francisco Olarte 
; Kelly, Kevin ; 
pgsql-gene...@postgresql.org
Subject: RE: Postgres 10.7 Systemd Startup Issue

I apologize, it was a copy/paste error.  I did include the option 
"--with-systemd".

This is our first install of postgresql.  I am in the process of getting 
standards in place for future deployment.  We don't have any large footprint of 
postgresql here.  We run mostly Oracle, SQL Server and MySQL in production so 
far.

Dan

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Friday, June 07, 2019 10:04 AM
To: Lu, Dan 
Cc: Christopher Browne ; Francisco Olarte 
; Kelly, Kevin ; 
pgsql-gene...@postgresql.org
Subject: Re: Postgres 10.7 Systemd Startup Issue

"Lu, Dan"  writes:
> Even with the added option "--with-systemd", it is still not working.

Hmmm 

> To outline what I did.

> 2)  Configure PostgreSQL to new location:
>     mkdir /hostname/pg/PostgreSQL-10.7_2
>     ./configure --prefix=/hostname/pg/PostgreSQL-10.7_2

Is it just a copy-and-paste mistake that you don't actually have a 
--with-systemd option here?

Also, it seems fairly unusual for anyone to build a production installation 
with absolutely no configuration options --- it'd work, but you'd get a really 
plain-vanilla server.  You might consider reviewing

https://www.postgresql.org/docs/10/install-procedure.html

to see what other things might make sense to turn on.
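(Editor's note: with a --with-systemd build, a unit file along these lines is
what lets the server report readiness via Type=notify. This is only a sketch:
the binary path follows the --prefix used earlier in this thread, and the data
directory is an assumed placeholder, not something stated in the thread.)

```ini
[Unit]
Description=PostgreSQL database server
After=network.target

[Service]
Type=notify
User=postgres
# Placeholder paths: adjust to the actual --prefix and data directory.
ExecStart=/hostname/pg/PostgreSQL-10.7_2/bin/postgres -D /hostname/pg/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0

[Install]
WantedBy=multi-user.target
```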

regards, tom lane





Re: Table partition with primary key in 11.3

2019-06-07 Thread Ravi Krishna
> 
> I was thinking of asynchronously cleaning it up rather than blocking
> DROP/DETACH ... which means you need to keep state somewhere.  I don't
> think blocking DROP/DETACH is valuable -- a global index that blocks
> DROP/DETACH until the index is clean serves no useful purpose.  (You
> could think of a multi-step approach with internal transaction commits,
> similar to CIC, but you still need a plan to clean that up in case the
> server crashes during that operation.)

In Oracle, if you have a global unique index and a partition is dropped, the 
index is marked invalid and needs to be rebuilt. 
IOW, an outage.

DB2's approach is better. When the partition is dropped, the index entries are 
marked for deletion and it starts an async
process of cleaning them up, which can run for several days if the dropped 
partition is large. But at least the table is online.





Re: Table partition with primary key in 11.3

2019-06-07 Thread Peter Geoghegan
On Fri, Jun 7, 2019 at 12:18 PM Alvaro Herrera  wrote:
> I was thinking of asynchronously cleaning it up rather than blocking
> DROP/DETACH ... which means you need to keep state somewhere.  I don't
> think blocking DROP/DETACH is valuable -- a global index that blocks
> DROP/DETACH until the index is clean serves no useful purpose.  (You
> could think of a multi-step approach with internal transaction commits,
> similar to CIC, but you still need a plan to clean that up in case the
> server crashes during that operation.)

The advantages of synchronous clean-up of global indexes when DROP'ing
a partition are that you can recycle the partition number (or whatever
we end up calling it) immediately and predictably, and you can reuse
the space in indexes occupied by keys from the dropped partition
immediately and predictably. That seems less useful than asynchronous
processing on average, certainly, but those are still real advantages.
You seemed to be particularly concerned about quickly recycling
partition numbers when we drop a partition.

I hope that we can come up with a very efficient on-disk
representation for global index tuples, where only the bare minimum
amount of space is used for partition numbers. Maybe it won't matter
that much if partition numbers cannot be recycled due to this
asynchronous processing.

-- 
Peter Geoghegan




Re: Table partition with primary key in 11.3

2019-06-07 Thread Alvaro Herrera
On 2019-Jun-07, Peter Geoghegan wrote:

> On Fri, Jun 7, 2019 at 12:18 PM Alvaro Herrera  
> wrote:
> > I was thinking of asynchronously cleaning it up rather than blocking
> > DROP/DETACH ... which means you need to keep state somewhere.  I don't
> > think blocking DROP/DETACH is valuable -- a global index that blocks
> > DROP/DETACH until the index is clean serves no useful purpose.  (You
> > could think of a multi-step approach with internal transaction commits,
> > similar to CIC, but you still need a plan to clean that up in case the
> > server crashes during that operation.)
> 
> The advantage of synchronous clean-up of global indexes when DROP'ing
> a partition are that you can recycle the partition number (or whatever
> we end up calling it) immediately and predictably, and you can reuse
> the space in indexes occupied by keys from the dropped partition
> immediately and predictably. That seems less useful than asynchronous
> processing on average, certainly, but those are still real advantages.
> You seemed to be particularly concerned about quickly recycling
> partition numbers when we drop a partition.

Well, "quickly" might mean within a week.  If it takes that long to
fully remove a monthly partition to make that partition ID available to
some future month's partition, that seems acceptable.  Blocking
DROP/DETACH for one hour is certainly not acceptable.

If this scheme means that you can keep the partition identifiers stored
in the index to, for instance, 10 bits (allowing for 1024 partitions to
exist at any one time, including those in the process of being cleaned
up) instead of having to expand to (say) 24 because that covers a couple
of years of operation before having to recreate the index, it seems
worthwhile.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Table partition with primary key in 11.3

2019-06-07 Thread Peter Geoghegan
On Fri, Jun 7, 2019 at 12:43 PM Alvaro Herrera  wrote:
> Well, "quickly" might mean within a week.  If it takes that long to
> fully remove a monthly partition to make that partition ID available to
> some future month's partition, that seems acceptable.  Blocking
> DROP/DETACH for one hour is certainly not acceptable.

I agree that synchronous clean-up of global indexes wouldn't make
sense there, and might not be very compelling in practice.

It occurs to me that we could add a code path to nbtree page splits,
that considered removing dropped partition tuples to avert a page
split. This would be a bit like the LP_DEAD/kill_prior_tuple thing.
Technically the space used by index tuples that point to a dropped
partition wouldn't become reclaimable immediately, but it might not
matter with this optimization.

> If this scheme means that you can keep the partition identifiers stored
> in the index to, for instance, 10 bits (allowing for 1024 partitions to
> exist at any one time, including those in the process of being cleaned
> up) instead of having to expand to (say) 24 because that covers a couple
> of years of operation before having to recreate the index, it seems
> worthwhile.

I think that we should have no inherent limit on the number of
partitions available at once, on general principle. Limiting the
number of partitions is a design that probably has a lot of sharp
edges.

The nbtree heap TID column and partition number column should probably
be a single varwidth column (not two separate columns), that is often
no wider than 6 bytes, but can be wider when there are many partitions
and/or very large partitions. That will be challenging, but it seems
like the right place to solve the problem. I think that I could make
that happen. Maybe this same representation could be used for all
nbtree indexes, not just global nbtree indexes.

-- 
Peter Geoghegan




Re: Table partition with primary key in 11.3

2019-06-07 Thread Alvaro Herrera
Somehow we ended up discussing this topic in a rather mistitled thread
... oh well :-)  (Nowadays I hesitate to change threads' subject lines,
because gmail).

On 2019-Jun-07, Peter Geoghegan wrote:

> On Fri, Jun 7, 2019 at 12:43 PM Alvaro Herrera  
> wrote:
> > Well, "quickly" might mean within a week.  If it takes that long to
> > fully remove a monthly partition to make that partition ID available to
> > some future month's partition, that seems acceptable.  Blocking
> > DROP/DETACH for one hour is certainly not acceptable.
> 
> I agree that synchronous clean-up of global indexes wouldn't make
> sense there, and might not be very compelling in practice.
> 
> It occurs to me that we could add a code path to nbtree page splits,
> that considered removing dropped partition tuples to avert a page
> split. This would be a bit like the LP_DEAD/kill_prior_tuple thing.
> Technically the space used by index tuples that point to a dropped
> partition wouldn't become reclaimable immediately, but it might not
> matter with this optimization.

This seems useful on the surface: you drop a partition, and slowly and
incrementally any index items that point to it are removed by processes
scanning the index.  You can't rely solely on this, though: as pointed
out by Robert in the indirect index thread, doing this only means that
non-scanned parts of the index retain entries for arbitrarily long,
which is bad.  Also, this adds latency to client-connected processes.

Because you can't rely on that exclusively, and you want to reuse the
partition ID eventually, you still need a cleanup process that removes
those remaining index entries.  This cleanup process is a background
process, so it doesn't affect latency.  I think it's not a good idea to
add latency to clients in order to optimize a background process.

> > If this scheme means that you can keep the partition identifiers stored
> > in the index to, for instance, 10 bits (allowing for 1024 partitions to
> > exist at any one time, including those in the process of being cleaned
> > up) instead of having to expand to (say) 24 because that covers a couple
> > of years of operation before having to recreate the index, it seems
> > worthwhile.
> 
> I think that we should have no inherent limit on the number of
> partitions available at once, on general principle. Limiting the
> number of partitions is a design that probably has a lot of sharp
> edges.

Yeah, I misspoke.  The way I see this working is this: when a new
partition is created/attached, we scan the set of partitions for that
partitioned table to determine the lowest unused one; use that as
partition ID for the new partition.  Index entries for that partition
will use the smallest possible representation for that partition ID.

When a partition is dropped, a vestigial catalog entry for it remains,
until all global index entries for it have been removed.  This prevents
reuse of the partition ID until it no longer causes harm.

This way, when a partition is dropped, we have to take the time to scan
all global indexes; when they've been scanned we can remove the catalog
entry, and at that point the partition ID becomes available to future
partitions.
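A sketch of the allocation rule just described (hypothetical Python, purely to
pin down the idea; nothing here mirrors actual catalog code): the vestigial
entries keep dropped IDs out of circulation until cleanup finishes.

```python
def lowest_unused_partition_id(in_use: set) -> int:
    """Return the smallest partition ID not currently taken.

    `in_use` must include vestigial entries for dropped partitions whose
    global-index entries have not yet been removed, so their IDs are not
    handed out prematurely.
    """
    pid = 0
    while pid in in_use:
        pid += 1
    return pid

# Partitions 0, 1 and 3 exist (2 was dropped and fully cleaned up),
# so a newly attached partition gets ID 2.
print(lowest_unused_partition_id({0, 1, 3}))  # prints 2
```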

> The nbtree heap TID column and partition number column should probably
> be a single varwidth column (not two separate columns), that is often
> no wider than 6 bytes, but can be wider when there are many partitions
> and/or very large partitions. That will be challenging, but it seems
> like the right place to solve the problem. I think that I could make
> that happen. Maybe this same representation could be used for all
> nbtree indexes, not just global nbtree indexes.

Maybe local nbtree indexes would have a partition ID of length 0, since
that many bits are necessary to identify which table is pointed to by
each index item.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Table partition with primary key in 11.3

2019-06-07 Thread Peter Geoghegan
On Fri, Jun 7, 2019 at 1:22 PM Alvaro Herrera  wrote:
> Somehow we ended up discussing this topic in a rather mistitled thread
> ... oh well :-)  (Nowadays I hesitate to change threads' subject lines,
> because gmail).

You can blame me for that, I think.

> > It occurs to me that we could add a code path to nbtree page splits,
> > that considered removing dropped partition tuples to avert a page
> > split. This would be a bit like the LP_DEAD/kill_prior_tuple thing.
> > Technically the space used by index tuples that point to a dropped
> > partition wouldn't become reclaimable immediately, but it might not
> > matter with this optimization.
>
> This seems useful on the surface: you drop a partition, and slowly and
> incrementally any index items that point to it are removed by processes
> scanning the index.  You can't rely solely on this, though: as pointed
> out by Robert in the indirect index thread, doing this only means that
> non-scanned parts of the index retain entries for arbitrarily long,
> which is bad.  Also, this adds latency to client-connected processes.

Well, we don't have to rely on index scans to set the LP_DEAD bit in
this case. We probably wouldn't do that at all. Rather, we'd have the
page split code refer to a list of dropped partition numbers as
targets for killing immediately. Maybe we'd hint the number of
distinct partitions represented on the page, to make it a bit faster.

> Because you can't rely on that exclusively, and you want to reuse the
> partition ID eventually, you still need a cleanup process that removes
> those remaining index entries.  This cleanup process is a background
> process, so it doesn't affect latency.  I think it's not a good idea to
> add latency to clients in order to optimize a background process.

Ordinarily I would agree, but we're talking about something that takes
place at the point that we're just about to split the page, that will
probably make the page split unnecessary when we can reclaim as few as
one or two tuples. A page split is already a very expensive thing by
any measure, and something that can rarely be "undone", so avoiding
them entirely is very compelling. Delaying a split will often prevent
it altogether. We're already doing foreground processing, just by
having page splits at all.

Other DB systems that don't do much foreground processing will still
do a certain amount of it if that avoids a split in some cases --
"Modern B-Tree techniques" mentions this, and suggests quite a number
of ways that a split might be averted.
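As a toy model of that idea (hypothetical Python; tuple sizes and page
structure are invented and do not mirror nbtree internals): before splitting a
full page, first discard tuples pointing into dropped partitions, and split
only if the incoming tuple still does not fit.

```python
from collections import namedtuple

# Each index tuple carries the partition number it points to plus its size.
IndexTuple = namedtuple("IndexTuple", ["partition", "key", "size"])

def insert_or_split(page, incoming, dropped, capacity):
    """Try to make room for `incoming` on `page`; return (page, did_split).

    `dropped` stands in for the list of dropped partition numbers that
    the split code would consult before giving up and splitting.
    """
    used = sum(t.size for t in page)
    if used + incoming.size > capacity:
        # Reclaim entries that point into dropped partitions first.
        page = [t for t in page if t.partition not in dropped]
        used = sum(t.size for t in page)
    if used + incoming.size > capacity:
        return page, True             # still full: a real split is needed
    return page + [incoming], False   # split averted

page = [IndexTuple(0, "a", 40), IndexTuple(7, "b", 40), IndexTuple(7, "c", 40)]
new_page, split = insert_or_split(page, IndexTuple(0, "d", 40),
                                  dropped={7}, capacity=120)
print(split)  # False: removing partition 7's tuples averted the split
```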

> This way, when a partition is dropped, we have to take the time to scan
> all global indexes; when they've been scanned we can remove the catalog
> entry, and at that point the partition ID becomes available to future
> partitions.

It seems worth recycling partition IDs, but it should be possible to
delay that for a very long time if necessary. Ideally, users wouldn't
have to bother with it when they have really huge global indexes.

> > The nbtree heap TID column and partition number column should probably
> > be a single varwidth column (not two separate columns), that is often
> > no wider than 6 bytes, but can be wider when there are many partitions
> > and/or very large partitions. That will be challenging, but it seems
> > like the right place to solve the problem. I think that I could make
> > that happen. Maybe this same representation could be used for all
> > nbtree indexes, not just global nbtree indexes.
>
> Maybe local nbtree indexes would have a partition ID of length 0, since
> that many bits are necessary to identify which table is pointed to by
> each index item.

Right -- special cases are best avoided here. In general, we'd push as
much of the new complexity as we can into this new TID-like table
identifier, while requiring it to work with our existing requirements
for TIDs, plus certain new requirements for global indexes (and maybe
other new requirements, such as relations that are larger than 35GB).
If the complexity is well-encapsulated, then it probably won't be too
bad. Access methods would have to be okay with varwidth table
identifiers, which is a big change, but they at least shouldn't have
to worry about anything else breaking. They'd probably have a
pg_attribute entry for the varwidth table identifier column, too (it
would be the last column in every nbtree index).

We'd expect a space efficient representation with real world
relations, that at least matches what we get with heap TIDs today.
This isn't quite as hard as it sounds. You don't have to be Claude
Shannon to realize that it's kind of silly to reserve 16 bits for the
offset number component of a TID/ItemPointer. We need to continue to
support offset numbers that go that high, but the implementation would
optimize for the common case where offset numbers are less than 512
(or maybe less than 1024).
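To make the space claim concrete, here is a hypothetical varint-style encoding
(plain Python, purely illustrative, not a proposal for the actual on-disk
format): partition number, block and offset are each emitted in 7-bit groups,
so a small partition number plus a mid-sized block plus a typical offset comes
out at 6 bytes or fewer.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative int in 7-bit groups, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int):
    shift = value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def encode_tid(partition: int, block: int, offset: int) -> bytes:
    return varint(partition) + varint(block) + varint(offset)

def decode_tid(buf: bytes):
    partition, pos = decode_varint(buf, 0)
    block, pos = decode_varint(buf, pos)
    offset, _ = decode_varint(buf, pos)
    return partition, block, offset

# Partition 3, block 100000, offset 42 round-trips in 5 bytes,
# versus 6 bytes for today's fixed-width heap TID alone.
tid = encode_tid(3, 100000, 42)
print(len(tid), decode_tid(tid))  # prints: 5 (3, 100000, 42)
```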

--
Peter Geoghegan




Re: Table partition with primary key in 11.3

2019-06-07 Thread Alvaro Herrera
On 2019-Jun-07, Peter Geoghegan wrote:

> On Fri, Jun 7, 2019 at 1:22 PM Alvaro Herrera  
> wrote:

> > Because you can't rely on that exclusively, and you want to reuse the
> > partition ID eventually, you still need a cleanup process that removes
> > those remaining index entries.  This cleanup process is a background
> > process, so it doesn't affect latency.  I think it's not a good idea to
> > add latency to clients in order to optimize a background process.
> 
> Ordinarily I would agree, but we're talking about something that takes
> place at the point that we're just about to split the page, that will
> probably make the page split unnecessary when we can reclaim as few as
> one or two tuples. A page split is already a very expensive thing by
> any measure, and something that can rarely be "undone", so avoiding
> them entirely is very compelling.

Sorry, I confused your argumentation with mine.  I agree that removing
entries to try and prevent a page split is worth doing.

> > This way, when a partition is dropped, we have to take the time to scan
> > all global indexes; when they've been scanned we can remove the catalog
> > entry, and at that point the partition ID becomes available to future
> > partitions.
> 
> It seems worth recycling partition IDs, but it should be possible to
> delay that for a very long time if necessary. Ideally, users wouldn't
> have to bother with it when they have really huge global indexes.

I envision this happening automatically -- you drop the partition, a
persistent work item is registered, autovacuum takes care of it
whenever.  The user doesn't have to do anything about it.
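Modeled naively (hypothetical Python, just to pin down the lifecycle being
described; names and structures are invented): dropping a partition enqueues
one cleanup item per global index, and the partition ID only becomes reusable
once every item for it is done.

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    index_oid: int      # which global index still holds entries
    partition_id: int   # the dropped partition they point to
    done: bool = False

class CleanupQueue:
    def __init__(self):
        self.items = []

    def register_drop(self, partition_id, global_index_oids):
        # DROP/DETACH returns immediately; cleanup happens later,
        # in the background (autovacuum, in the discussion above).
        for oid in global_index_oids:
            self.items.append(WorkItem(oid, partition_id))

    def id_reusable(self, partition_id):
        # The vestigial catalog entry (and thus the ID) is freed only
        # after every global index has been scanned clean.
        return all(i.done for i in self.items
                   if i.partition_id == partition_id)

q = CleanupQueue()
q.register_drop(7, global_index_oids=[1001, 1002])
print(q.id_reusable(7))   # False: both indexes still hold entries
q.items[0].done = True
print(q.id_reusable(7))   # False: one index remains
q.items[1].done = True
print(q.id_reusable(7))   # True: ID 7 may be handed out again
```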

> You don't have to be Claude Shannon to realize that it's kind of silly
> to reserve 16 bits for the offset number component of a
> TID/ItemPointer. We need to continue to support offset numbers that go
> that high, but the implementation would optimize for the common case
> where offset numbers are less than 512 (or maybe less than 1024).

(In many actual cases offset numbers require less than 7 bits in typical
pages, even).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Table partition with primary key in 11.3

2019-06-07 Thread Peter Geoghegan
On Fri, Jun 7, 2019 at 2:35 PM Alvaro Herrera  wrote:
> I envision this happening automatically -- you drop the partition, a
> persistent work item is registered, autovacuum takes care of it
> whenever.  The user doesn't have to do anything about it.

We don't have to agree on anything now, but I think that it's possible
that the page split thing will be very effective. Perhaps even so
effective that it won't make much sense to vacuum global indexes just
because there is a pending dropped partition.

--
Peter Geoghegan