How to connect Toad Edge to PostgreSQL running in a Docker container?

2019-06-14 Thread Daulat Ram
Hello team,

Please suggest how to connect Toad Edge to PostgreSQL running in a Docker
container.

Regards,
Daulat



How to return ARRAY from SQL function?

2019-06-14 Thread Alexander Farber
Hello,

in PostgreSQL 10.8 the following works -

words_ru=> SELECT ARRAY[
words_ru-> '*', '*', 'А', 'А', 'А', 'А', 'А', 'А', 'А', 'А',
words_ru-> 'Б', 'Б', 'В', 'В', 'В', 'В', 'Г', 'Г', 'Д', 'Д',
words_ru-> 'Д', 'Д', 'Е', 'Е', 'Е', 'Е', 'Е', 'Е', 'Е', 'Е',
words_ru-> 'Е', 'Ж', 'З', 'З', 'И', 'И', 'И', 'И', 'И', 'Й',
words_ru-> 'К', 'К', 'К', 'К', 'Л', 'Л', 'Л', 'Л', 'М', 'М',
words_ru-> 'М', 'Н', 'Н', 'Н', 'Н', 'Н', 'О', 'О', 'О', 'О',
words_ru-> 'О', 'О', 'О', 'О', 'О', 'О', 'П', 'П', 'П', 'П',
words_ru-> 'Р', 'Р', 'Р', 'Р', 'Р', 'С', 'С', 'С', 'С', 'С',
words_ru-> 'Т', 'Т', 'Т', 'Т', 'Т', 'У', 'У', 'У', 'У', 'Ф',
words_ru-> 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ы', 'Ь', 'Ь',
words_ru-> 'Э', 'Ю', 'Я', 'Я'
words_ru-> ];

                                   array
--------------------------------------------------------------------------------
 {*,*,А,А,А,А,А,А,А,А,Б,Б,В,В,В,В,Г,Г,Д,Д,Д,Д,Е,Е,Е,Е,Е,Е,Е,Е,Е,Ж,З,З,И,И,И,И,И,Й,К,К,К,К,Л,Л,Л,Л,М,М,М,Н,Н,Н,Н,Н,О,О,О,О,О,О,О,О,О,О,П,П,П,П,Р,Р,Р,Р,Р,С,С,С,С,С,Т,Т,Т,Т,Т,У,У,У,У,Ф,Х,Ц,Ч,Ш,Щ,Ъ,Ы,Ы,Ь,Ь,Э,Ю,Я,Я}
(1 row)

But creating an SQL function fails -

words_ru=> CREATE OR REPLACE FUNCTION words_all_letters()
words_ru-> RETURNS array AS
words_ru-> $func$
words_ru$> SELECT ARRAY[
words_ru$> '*', '*', 'А', 'А', 'А', 'А', 'А', 'А', 'А', 'А',
words_ru$> 'Б', 'Б', 'В', 'В', 'В', 'В', 'Г', 'Г', 'Д', 'Д',
words_ru$> 'Д', 'Д', 'Е', 'Е', 'Е', 'Е', 'Е', 'Е', 'Е', 'Е',
words_ru$> 'Е', 'Ж', 'З', 'З', 'И', 'И', 'И', 'И', 'И', 'Й',
words_ru$> 'К', 'К', 'К', 'К', 'Л', 'Л', 'Л', 'Л', 'М', 'М',
words_ru$> 'М', 'Н', 'Н', 'Н', 'Н', 'Н', 'О', 'О', 'О', 'О',
words_ru$> 'О', 'О', 'О', 'О', 'О', 'О', 'П', 'П', 'П', 'П',
words_ru$> 'Р', 'Р', 'Р', 'Р', 'Р', 'С', 'С', 'С', 'С', 'С',
words_ru$> 'Т', 'Т', 'Т', 'Т', 'Т', 'У', 'У', 'У', 'У', 'Ф',
words_ru$> 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ы', 'Ь', 'Ь',
words_ru$> 'Э', 'Ю', 'Я', 'Я'
words_ru$> ];
words_ru$> $func$ LANGUAGE sql IMMUTABLE;
ERROR:  42601: syntax error at or near "array"
LINE 2: RETURNS array AS
^
LOCATION:  scanner_yyerror, scan.l:1128

Is it possible to create and return an array in an SQL custom function?

Thank you
Alex


Re: How to return ARRAY from SQL function?

2019-06-14 Thread Laurenz Albe
Alexander Farber wrote:
> But creating an SQL function fails -
> 
> words_ru=> CREATE OR REPLACE FUNCTION words_all_letters()
> words_ru-> RETURNS array AS
> words_ru-> $func$
> words_ru$> SELECT ARRAY[...

"array" is not an existing data type.

You'll have to specify what type of array you want, probably

 ... RETURNS text[]

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: How to return ARRAY from SQL function?

2019-06-14 Thread Tom Lane
Laurenz Albe  writes:
> Alexander Farber wrote:
>> But creating an SQL function fails -
>> 
>> words_ru=> CREATE OR REPLACE FUNCTION words_all_letters()
>> words_ru-> RETURNS array AS
>> words_ru-> $func$
>> words_ru$> SELECT ARRAY[...

> "array" is not an existing data type.

> You'll have to specify an array of which type you want, probably
>  ... RETURNS text[]

Right.  Also, I don't recall the exact rules in this area, but I think
that SQL functions are pickier about their return types than ordinary
query contexts, meaning you might also need an explicit cast:

   SELECT ARRAY[
'*', '*', 'А', 'А', 'А', 'А', 'А', 'А', 'А', 'А',
...
   ]::text[];

Try it without first, but if it moans about the query returning the
wrong type, that's how to fix it.
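
Putting both suggestions together, a minimal sketch (with the letter list
abbreviated) would be:

    CREATE OR REPLACE FUNCTION words_all_letters()
    RETURNS text[] AS
    $func$
        SELECT ARRAY[
            '*', '*', 'А', 'А'  -- ... rest of the letter list as in the original query
        ]::text[];
    $func$ LANGUAGE sql IMMUTABLE;

    SELECT words_all_letters();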

regards, tom lane




Re: How to connect Toad Edge to PostgreSQL running in a Docker container?

2019-06-14 Thread Adrian Klaver

On 6/14/19 12:38 AM, Daulat Ram wrote:

Hello team,

Please suggest how to connect Toad Edge to PostgreSQL running in a Docker
container.


Use the below and fill in appropriate settings?:

https://support.quest.com/technical-documents/toad-edge/2.0.9/user-guide/3#TOPIC-1189203



Regards,

Daulat




--
Adrian Klaver
adrian.kla...@aklaver.com




checkpoints taking much longer than expected

2019-06-14 Thread Tiemen Ruiten
Hello,

I set up a new 3-node cluster with the following specifications:

2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores)
128 GB RAM
8x Crucial MX500 1TB SSDs

FS is ZFS, the dataset with the PGDATA directory on it has the following
properties (only non-default listed):

NAME        PROPERTY       VALUE  SOURCE
tank/pgsql  used           234G   -
tank/pgsql  available      3.28T  -
tank/pgsql  referenced     234G   -
tank/pgsql  compressratio  2.68x  -
tank/pgsql  compression    lz4    inherited from tank
tank/pgsql  atime          off    inherited from tank
tank/pgsql  canmount       on     local
tank/pgsql  xattr          sa     inherited from tank

My postgresql.conf (only changed from default listed):

hba_file = '/var/lib/pgsql/pg_hba.conf'
listen_addresses = '*'
max_connections = 800
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 60min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500 # range 1-10000
idle_in_transaction_session_timeout = 30min # in milliseconds, 0 is disabled
shared_preload_libraries = 'timescaledb, pg_cron'
max_locks_per_transaction = 512
timescaledb.max_background_workers = 8

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

2019-06-14 15:21:10.351 CEST [23657] LOG:  checkpoint starting: immediate
force wait
2019-06-14 15:25:57.655 CEST [23657] LOG:  checkpoint complete: wrote
139831 buffers (13.3%); 0 WAL file(s) added, 148 removed, 40 recycled;
write=284.751 s, sync=0.532 s, total=287.304 s; sync files=537,
longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
2019-06-14 15:26:01.988 CEST [23657] LOG:  checkpoint starting: immediate
force wait
2019-06-14 15:30:30.430 CEST [23657] LOG:  checkpoint complete: wrote
238322 buffers (22.7%); 0 WAL file(s) added, 0 removed, 172 recycled;
write=264.794 s, sync=0.415 s, total=268.441 s; sync files=378,
longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
2019-06-14 15:30:44.097 CEST [23657] LOG:  checkpoint starting: immediate
force wait
2019-06-14 15:37:01.438 CEST [23657] LOG:  checkpoint complete: wrote
132286 buffers (12.6%); 0 WAL file(s) added, 54 removed, 96 recycled;
write=366.614 s, sync=2.975 s, total=377.341 s; sync files=467,
longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB

What is going on? It doesn't seem like normal behaviour?


Re: checkpoints taking much longer than expected

2019-06-14 Thread Ravi Krishna

On 6/14/19 10:01 AM, Tiemen Ruiten wrote:

LOG:  checkpoint starting: immediate force wait


Does it mean that the DB is blocked until the completion of the checkpoint?
Years ago Informix used to have this issue, until they fixed it around 2006.




Re: checkpoints taking much longer than expected

2019-06-14 Thread Stephen Frost
Greetings,

* Ravi Krishna (ravikris...@mail.com) wrote:
> On 6/14/19 10:01 AM, Tiemen Ruiten wrote:
> >LOG:  checkpoint starting: immediate force wait
> 
> Does it mean that the DB is blocked until the completion of the checkpoint?
> Years ago Informix used to have this issue, until they fixed it around 2006.

No.

Thanks,

Stephen




Re: checkpoints taking much longer than expected

2019-06-14 Thread Stephen Frost
Greetings,

* Tiemen Ruiten (t.rui...@tech-lab.io) wrote:
> checkpoint_timeout = 60min

That seems like a pretty long timeout.

> My problem is that checkpoints are taking a long time. Even when I run a
> few manual checkpoints one after the other, they keep taking very long, up
> to 10 minutes:

You haven't said *why* this is an issue...  Why are you concerned with
how long it takes to do a checkpoint?

> 2019-06-14 15:21:10.351 CEST [23657] LOG:  checkpoint starting: immediate
> force wait
> 2019-06-14 15:25:57.655 CEST [23657] LOG:  checkpoint complete: wrote
> 139831 buffers (13.3%); 0 WAL file(s) added, 148 removed, 40 recycled;
> write=284.751 s, sync=0.532 s, total=287.304 s; sync files=537,
> longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
> 2019-06-14 15:26:01.988 CEST [23657] LOG:  checkpoint starting: immediate
> force wait
> 2019-06-14 15:30:30.430 CEST [23657] LOG:  checkpoint complete: wrote
> 238322 buffers (22.7%); 0 WAL file(s) added, 0 removed, 172 recycled;
> write=264.794 s, sync=0.415 s, total=268.441 s; sync files=378,
> longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
> 2019-06-14 15:30:44.097 CEST [23657] LOG:  checkpoint starting: immediate
> force wait
> 2019-06-14 15:37:01.438 CEST [23657] LOG:  checkpoint complete: wrote
> 132286 buffers (12.6%); 0 WAL file(s) added, 54 removed, 96 recycled;
> write=366.614 s, sync=2.975 s, total=377.341 s; sync files=467,
> longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB
> 
> What is going on? It doesn't seem like normal behaviour?

The time information is all there and it tells you what it's doing and
how much had to be done... If you're unhappy with how long it takes to
write out gigabytes of data and fsync hundreds of files, talk to your
storage people...
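
If you want to watch the same numbers accumulate over time, a quick sketch
against the cumulative statistics view:

    -- cumulative since the last stats reset; sample it periodically
    SELECT checkpoints_timed,
           checkpoints_req,
           buffers_checkpoint,
           checkpoint_write_time,  -- total milliseconds spent writing buffers
           checkpoint_sync_time    -- total milliseconds spent in fsync
    FROM pg_stat_bgwriter;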

Thanks,

Stephen




Re: checkpoints taking much longer than expected

2019-06-14 Thread Peter J. Holzer
On 2019-06-14 16:01:40 +0200, Tiemen Ruiten wrote:
> FS is ZFS, the dataset with the PGDATA directory on it has the following
> properties (only non-default listed):
[...]
> My problem is that checkpoints are taking a long time. Even when I run a few
> manual checkpoints one after the other, they keep taking very long, up to 10
> minutes:

There was a discussion about ZFS' COW behaviour and PostgreSQL reusing
WAL files not being a good combination about a year ago:
https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com

Maybe you have the same problem?
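
If it does turn out to be the same problem, note that PostgreSQL 12 adds a
wal_recycle setting aimed at copy-on-write filesystems; this is just a sketch
assuming a new enough server (the version here isn't stated):

    -- PostgreSQL 12+ only
    ALTER SYSTEM SET wal_recycle = off;
    SELECT pg_reload_conf();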

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




PostgreSQL 10 and OpenSSL 1.1

2019-06-14 Thread Shane Duan
Is the default PostgreSQL installer built with OpenSSL 1.0 or 1.1? I
checked on Windows, and it seems it was built with OpenSSL 1.0.2g. If that is
true, is there any plan to pre-build the next minor release (10.x) with the
latest OpenSSL 1.1? OpenSSL 1.0 will be deprecated by the end of 2019...

Thanks,
Shane


Re: PostgreSQL 10 and OpenSSL 1.1

2019-06-14 Thread Adrian Klaver

On 6/14/19 9:35 AM, Shane Duan wrote:
Is the default PostgreSQL installer built with OpenSSL 1.0 or 1.1? I
checked on Windows, and it seems it was built with OpenSSL 1.0.2g. If that is
true, is there any plan to pre-build the next minor release (10.x) with the
latest OpenSSL 1.1? OpenSSL 1.0 will be deprecated by the end of 2019...


There is no default installer. There are packages built for various 
platforms by various organizations. You will need to be specific about 
the package you are using and where you got it. In the end you will 
probably need to contact the package maintainers to see what their plans 
are.




Thanks,
Shane



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: PostgreSQL 10 and OpenSSL 1.1

2019-06-14 Thread Shane Duan
Thanks, Adrian.

I am using prebuilt packages for Redhat 7 (using yum install,
https://www.postgresql.org/download/linux/redhat/) and Ubuntu 18.04 (using
apt-get install, https://www.postgresql.org/download/linux/ubuntu/) on
Linux. For Windows, I am downloading the pre-built package from EnterpriseDB,
following the link on https://www.postgresql.org/download/windows/

-Shane

On Fri, Jun 14, 2019 at 11:05 AM Adrian Klaver 
wrote:

> On 6/14/19 9:35 AM, Shane Duan wrote:
> > Is the default PostgreSQL installer built with openSSL 1.0 and or 1.1? I
> > checked on Windows, it seems it was built with openSSL 1.02g. If it
> > true, is there any plan to pre-build next minor release(10.x) with
> > latest openSSL1.1? OpenSSL 1.0 will be deprecated by the end of 2019...
>
> There is no default installer. There are packages built for various
> platforms by various organizations. You will need to be specific about
> the package you are using and where you got it. In the end you will
> probably need to contact the package maintainers to see what their plans
> are.
>
> >
> > Thanks,
> > Shane
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: PostgreSQL 10 and OpenSSL 1.1

2019-06-14 Thread Adrian Klaver

On 6/14/19 11:23 AM, Shane Duan wrote:

Thanks, Adrian.

I am using prebuilt packages for Redhat 7 (using yum install,
https://www.postgresql.org/download/linux/redhat/) and Ubuntu 18.04 (using
apt-get install, https://www.postgresql.org/download/linux/ubuntu/) on
Linux. For Windows, I am downloading the pre-built package from EnterpriseDB,
following the link on https://www.postgresql.org/download/windows/


I'm going to go out on a limb and say the Linux packages are pegged to 
whatever is the default openSSL version for the OS version.


If you want to know for sure then contact the maintainers at:

https://yum.postgresql.org/
https://wiki.postgresql.org/wiki/Apt#

For EDB I would say start here:

https://postgresrocks.enterprisedb.com/t5/EDB-Postgres/bd-p/EDBPostgres



-Shane

On Fri, Jun 14, 2019 at 11:05 AM Adrian Klaver wrote:


On 6/14/19 9:35 AM, Shane Duan wrote:
 > Is the default PostgreSQL installer built with openSSL 1.0 and or
1.1? I
 > checked on Windows, it seems it was built with openSSL 1.02g. If it
 > true, is there any plan to pre-build next minor release(10.x) with
 > latest openSSL1.1? OpenSSL 1.0 will be deprecated by the end of
2019...

There is no default installer. There are packages built for various
platforms by various organizations. You will need to be specific about
the package you are using and where you got it. In the end you will
probably need to contact the package maintainers to see what their
plans
are.

 >
 > Thanks,
 > Shane


-- 
Adrian Klaver

adrian.kla...@aklaver.com 




--
Adrian Klaver
adrian.kla...@aklaver.com




vacuum/reindex

2019-06-14 Thread Rob Sargent

Is reindex table redundant after vacuum(analyse,verbose)?





Re: vacuum/reindex

2019-06-14 Thread Ron

On 6/14/19 2:55 PM, Rob Sargent wrote:

Is reindex table redundant after vacuum(analyse,verbose)?


Instead of "redundant", I'd call it "backwards", since doing a 
vacuum(analyse,verbose) on a freshly reindexed table seems more fruitful.
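
For a hypothetical table, that order would be something like:

    REINDEX TABLE mytable;              -- rebuild the indexes first
    VACUUM (ANALYSE, VERBOSE) mytable;  -- then clear dead rows and refresh statistics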


--
Angular momentum makes the world go 'round.




Re: vacuum/reindex

2019-06-14 Thread Rob Sargent



On 6/14/19 2:13 PM, Ron wrote:

On 6/14/19 2:55 PM, Rob Sargent wrote:

Is reindex table redundant after vacuum(analyse,verbose)?


Instead of "redundant", I'd call it "backwards", since doing a 
vacuum(analyse,verbose) on a freshly reindexed table seems more fruitful.


Does reindex remove (dead) row versions as vacuum does (in the 
indices)?  Maybe I don't need reindex at all as I do NOT suspect any 
corruption - was just rebuilding after having removed 15M data rows in 
vacuum







Copy Bulk Ignore Duplicated

2019-06-14 Thread Leandro Guimarães
Hi,

   I have a scenario with a large table and I'm trying to insert data into it
via a COPY command with a CSV file.

   Everything works, but sometimes my source .csv file has data that already
exists in the previously filled table. If I add a check constraint and try to
run the COPY command, I get an error that stops the whole insertion.

  I've tried to put the data in a tmp table and fill the main table using
DISTINCT this way (the fields and names are just examples):

INSERT INTO final_table (name, document)
   SELECT DISTINCT name, document
   FROM tmp_TABLE t1
   WHERE NOT EXISTS (
   SELECT 1 FROM final_table t2
   WHERE (t2.name, t2.document)
   IS NOT DISTINCT FROM (t1.name, t1.document))

The problem is that my final_table is a large (and partitioned) table and
this query is taking a long time to execute.

Does anyone have an idea (really guys, anything would be great) how to solve
this situation? I need to ignore duplicates instead of getting an error.

I'm using PostgreSQL 9.4, so I can't use "ON CONFLICT" and an upgrade is not
an option.

Thanks and Kind Regards!


Leandro Guimarães


Re: Copy Bulk Ignore Duplicated

2019-06-14 Thread Adrian Klaver

On 6/14/19 2:04 PM, Leandro Guimarães wrote:

Hi,
    I have a scenario with a large table and I'm trying to insert it via 
a COPY command with a csv file.


    Everything works, but sometimes my source .csv file has duplicated 
data in the previously fulfilled table. If I add a check constraint and 
try to run the COPY command I have an error that stops the whole insertion.


   I've tried to put the data in a tmp table and fill the main using 
distinct this way (the fields and names are just examples):


INSERT INTO final_table values (name, document)
    SELECT DISTINCT name, document
    FROM tmp_TABLE t1
    WHERE NOT EXISTS (
    SELECT 1 FROM final_table t2
    WHERE (t2.name , t2.document)
    IS NOT DISTINCT FROM (t1.name , t1.document))

The problem is that my final_table is a large (and partitioned) table 
and this query is taking a long time to execute.


Someone have any idea (really guys anything would be great) how to solve 
this situation? I need to ignore duplicates instead to have some error.


pg_bulkload?:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html
"PARSE_ERRORS = n
The number of ignored tuples that throw errors during parsing, 
encoding checks, encoding conversion, FILTER function, CHECK constraint 
checks, NOT NULL checks, or data type conversion. Invalid input tuples 
are not loaded and recorded in the PARSE BADFILE. The default is 0. If 
there are equal or more parse errors than the value, already loaded data 
is committed and the remaining tuples are not loaded. 0 means to allow 
no errors, and -1 and INFINITE mean to ignore all errors. "




I'm using*PostgreSQL 9.4* so I can't use "ON CONFLICT" and upgrade is 
not an option.


Thanks and Kind Regards!


Leandro Guimarães




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Copy Bulk Ignore Duplicated

2019-06-14 Thread Adrian Klaver

On 6/14/19 2:04 PM, Leandro Guimarães wrote:

Hi,
    I have a scenario with a large table and I'm trying to insert it via 
a COPY command with a csv file.


    Everything works, but sometimes my source .csv file has duplicated 
data in the previously fulfilled table. If I add a check constraint and 
try to run the COPY command I have an error that stops the whole insertion.


   I've tried to put the data in a tmp table and fill the main using 
distinct this way (the fields and names are just examples):


INSERT INTO final_table values (name, document)
    SELECT DISTINCT name, document
    FROM tmp_TABLE t1
    WHERE NOT EXISTS (
    SELECT 1 FROM final_table t2
    WHERE (t2.name , t2.document)
    IS NOT DISTINCT FROM (t1.name , t1.document))


Hit enter too soon on the previous post.
An alternative to the above query (rough code):

DELETE FROM tmp_TABLE WHERE ...;
INSERT INTO final_table SELECT * FROM tmp_TABLE;



The problem is that my final_table is a large (and partitioned) table 
and this query is taking a long time to execute.


Someone have any idea (really guys anything would be great) how to solve 
this situation? I need to ignore duplicates instead to have some error.


I'm using*PostgreSQL 9.4* so I can't use "ON CONFLICT" and upgrade is 
not an option.


Thanks and Kind Regards!


Leandro Guimarães




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Copy Bulk Ignore Duplicated

2019-06-14 Thread Tim Cross


Leandro Guimarães  writes:

> Hi,
>
>I have a scenario with a large table and I'm trying to insert it via a
> COPY command with a csv file.
>
>Everything works, but sometimes my source .csv file has duplicated data
> in the previously fulfilled table. If I add a check constraint and try to
> run the COPY command I have an error that stops the whole insertion.
>
>   I've tried to put the data in a tmp table and fill the main using
> distinct this way (the fields and names are just examples):
>
> INSERT INTO final_table values (name, document)
>SELECT DISTINCT name, document
>FROM tmp_TABLE t1
>WHERE NOT EXISTS (
>SELECT 1 FROM final_table t2
>WHERE (t2.name, t2.document)
>IS NOT DISTINCT FROM (t1.name, t1.document))
>
> The problem is that my final_table is a large (and partitioned) table and
> this query is taking a long time to execute.
>
> Someone have any idea (really guys anything would be great) how to solve
> this situation? I need to ignore duplicates instead to have some error.
>
> I'm using* PostgreSQL 9.4* so I can't use "ON CONFLICT" and upgrade is not
> an option.
>

Explain plan would probably shed some light, but I suspect your
performance is being heavily hit by the sub query. Distinct is an
expensive operation and you are performing it once for every distinct row
in your temp table.

It isn't clear what the primary key is for your final table - name +
document seems suspicious given these seem to be the only two columns
you're inserting as well. You don't indicate what the data types are
either - if document is something like 'text' then using it in a
distinct clause is likely to have a huge performance impact. 

The first thing I'd do is to eliminate duplicates from your temp table
as a separate statement or by pre-filtering the CSV before import. I
would then try something like an outer join to identify rows in your
temp table which don't exist in your final table and select from there
to insert into the final table. You don't really need the distinct in
the sub query as all you really need to know is if (name, document)
exists - it doesn't matter if more than one exists (for this test). 
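
A sketch of that, sticking with the placeholder names from the original post
(tmp_TABLE / final_table, name / document):

    -- 1. De-duplicate the staging data once, as a separate step.
    CREATE TEMP TABLE tmp_dedup AS
    SELECT DISTINCT name, document
    FROM tmp_TABLE;

    -- 2. Insert only the rows not already in the target; no DISTINCT needed here.
    INSERT INTO final_table (name, document)
    SELECT t1.name, t1.document
    FROM tmp_dedup t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM final_table t2
        WHERE (t2.name, t2.document) IS NOT DISTINCT FROM (t1.name, t1.document)
    );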

If you really don't have something more specific for a primary key,
depending on what data type 'document' is and how large it is, you may
find adding a column which is a checksum of your 'document' field a
useful addition. I have done this in the past where I had an application
where name was not unique and we only wanted distinct instances of
'document' (document was a fairly large XML document in this case). 
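
A sketch of the checksum idea, assuming document is text-like and using the
built-in md5() function (column and table names are only illustrative):

    ALTER TABLE final_table ADD COLUMN document_md5 text;
    UPDATE final_table SET document_md5 = md5(document);
    -- comparisons (and an index) can then work on the short hash
    -- instead of the full value:
    CREATE INDEX ON final_table (name, document_md5);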

--
Tim Cross




Re: Copy Bulk Ignore Duplicated

2019-06-14 Thread Leandro Guimarães
Hi Tim, thanks for you answer!

The columns were just examples, but let me explain the database structure;
the fields in *bold* are the keys:

*customer_id integer*
*date_time timestamp*
*indicator_id integer*
*element_id integer*
indicator_value double precision

The table is partitioned by day and customer_id (it works great); the
problem is just the duplicate-key situation that I'm really worried about.

I populate the database via a Java Application with JDBC.

Maybe this info helps to shed some light!
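
With those keys, the staging-table approach from earlier in the thread might
look something like this (final_table and staging_table are placeholders, and
it assumes the four key columns are NOT NULL):

    INSERT INTO final_table (customer_id, date_time, indicator_id, element_id, indicator_value)
    SELECT DISTINCT ON (s.customer_id, s.date_time, s.indicator_id, s.element_id)
           s.customer_id, s.date_time, s.indicator_id, s.element_id, s.indicator_value
    FROM staging_table s
    WHERE NOT EXISTS (
        SELECT 1
        FROM final_table f
        WHERE f.customer_id  = s.customer_id
          AND f.date_time    = s.date_time
          AND f.indicator_id = s.indicator_id
          AND f.element_id   = s.element_id
    );

The DISTINCT ON takes care of duplicates inside the CSV batch itself; if a
load only covers a known date range, adding a constant date_time filter on
final_table inside the NOT EXISTS should also let constraint exclusion skip
unrelated partitions.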

Thanks Again!

Leandro Guimarães



On Fri, Jun 14, 2019 at 7:39 PM Tim Cross  wrote:

>
> Leandro Guimarães  writes:
>
> > Hi,
> >
> >I have a scenario with a large table and I'm trying to insert it via a
> > COPY command with a csv file.
> >
> >Everything works, but sometimes my source .csv file has duplicated
> data
> > in the previously fulfilled table. If I add a check constraint and try to
> > run the COPY command I have an error that stops the whole insertion.
> >
> >   I've tried to put the data in a tmp table and fill the main using
> > distinct this way (the fields and names are just examples):
> >
> > INSERT INTO final_table values (name, document)
> >SELECT DISTINCT name, document
> >FROM tmp_TABLE t1
> >WHERE NOT EXISTS (
> >SELECT 1 FROM final_table t2
> >WHERE (t2.name, t2.document)
> >IS NOT DISTINCT FROM (t1.name, t1.document))
> >
> > The problem is that my final_table is a large (and partitioned) table and
> > this query is taking a long time to execute.
> >
> > Someone have any idea (really guys anything would be great) how to solve
> > this situation? I need to ignore duplicates instead to have some error.
> >
> > I'm using* PostgreSQL 9.4* so I can't use "ON CONFLICT" and upgrade is
> not
> > an option.
> >
>
> Explain plan would probably shed some light, but I suspect your
> performance is being heavily hit by the sub query. Distinct is an
> expensive operation and you are performing it once for every distinct row
> in your temp table.
>
> It isn't clear what the primary key is for your final table - name +
> document seems suspicious given these seem to be the only two columns
> your inserting as well. You don't indicate what the data types are
> either - it document is something like 'text' then using it in a
> distinct clause is likely to have huge performance impact.
>
> The first thing I'd do is to eliminate duplicates from your temp table
> as a separate statement or by pre-filtering the CSV before import. I
> would then try something like an outer join to identify rows in your
> temp table which don't exist in your final table and select from there
> to insert into the final table. You don't really need the distinct in
> the sub query as all you really need to know is if (name, document)
> exists - it doesn't matter if more than one exists (for this test).
>
> If you really don't have something more specific for a primary key,
> depending on what data type 'document' is and how large it is, you may
> find adding a column which is a checksum of your 'document' field a
> useful addition. I have done this in the past where I had an application
> where name was not unique and we only wanted distinct instances of
> 'document' (document was a fairly large XML document in this case).
>
> --
> Tim Cross
>
>
>


Re: Copy Bulk Ignore Duplicated

2019-06-14 Thread Leandro Guimarães
Hi Adrian,
  I'll take a look at pg_bulkload, but I populate the database via a
Java application with JDBC.

  I'll try the query you kindly sent to me!

Thanks!
Leandro Guimarães



On Fri, Jun 14, 2019 at 6:59 PM Adrian Klaver 
wrote:

> On 6/14/19 2:04 PM, Leandro Guimarães wrote:
> > Hi,
> > I have a scenario with a large table and I'm trying to insert it via
> > a COPY command with a csv file.
> >
> > Everything works, but sometimes my source .csv file has duplicated
> > data in the previously fulfilled table. If I add a check constraint and
> > try to run the COPY command I have an error that stops the whole
> insertion.
> >
> >I've tried to put the data in a tmp table and fill the main using
> > distinct this way (the fields and names are just examples):
> >
> > INSERT INTO final_table values (name, document)
> > SELECT DISTINCT name, document
> > FROM tmp_TABLE t1
> > WHERE NOT EXISTS (
> > SELECT 1 FROM final_table t2
> > WHERE (t2.name , t2.document)
> > IS NOT DISTINCT FROM (t1.name , t1.document))
> >
> > The problem is that my final_table is a large (and partitioned) table
> > and this query is taking a long time to execute.
> >
> > Someone have any idea (really guys anything would be great) how to solve
> > this situation? I need to ignore duplicates instead to have some error.
>
> pg_bulkload?:
>
> https://ossc-db.github.io/pg_bulkload/pg_bulkload.html
> "PARSE_ERRORS = n
>  The number of ingored tuples that throw errors during parsing,
> encoding checks, encoding conversion, FILTER function, CHECK constraint
> checks, NOT NULL checks, or data type conversion. Invalid input tuples
> are not loaded and recorded in the PARSE BADFILE. The default is 0. If
> there are equal or more parse errors than the value, already loaded data
> is committed and the remaining tuples are not loaded. 0 means to allow
> no errors, and -1 and INFINITE mean to ignore all errors. "
>
> >
> > I'm using*PostgreSQL 9.4* so I can't use "ON CONFLICT" and upgrade is
> > not an option.
> >
> > Thanks and Kind Regards!
> >
> >
> > Leandro Guimarães
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


how to concat/concat_ws all fields without braces

2019-06-14 Thread Jean Louis
Hello,

I have tried doing something like:

SELECT concat_ws(' ', table.*) FROM table;

and if I do it that way, it is essentially the same as 

SELECT concat(table.*) FROM table;

and I get the items in braces like (1,something).

Why do I get it in braces?

Is there a way, without specifying specific fields,
to get all items concatenated without braces?

I would prefer the concat_ws option.

Jean




arrays of composite types, and client drivers like JDBC

2019-06-14 Thread Rob Nikander
Hi,

I'm experimenting with Java client libraries (the usual JDBC and some other
async projects, eg [1]). So far, I'm not finding ways to select/read
composite types without ugly string parsing. The simple cases are okay, but
if I have a column that is an array of composites, the client library might
give me a string for a column value like the following, with no builtin way
to parse it.

{"(10,\"abc \"\" \"\" \"\"\",\"2019-06-14
18:16:48.067969\",t)","(11,foo,\"2019-06-14 18:16:48.067969\",f)"}

Maybe I'm missing the part of the JDBC API that I can use here.

If not, then I'm wondering: is there something inherent in the underlying
PG protocol that makes this difficult for all these client/driver
libraries? In other words, maybe the protocol is sending strings meant for
display, not for parsing as data?

I was hoping I'd find an API like...

create type foo as (age int, color text);
create table t (a foo, b foo[]);

    var resultSet = conn.prepareStatement("select a,b from t").executeQuery()
var foos = resultSet.getArray("b")
var foo = foos.getElement(0)
var age = foo.getInt(1)
var color = foo.getString(2)
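
One workaround at the SQL level, rather than in the driver, is to flatten the
composite array before it leaves the server, or to hand it over as JSON that
any client library can parse. A sketch using the foo/t definitions above:

    -- flatten: the client only sees ordinary scalar columns
    -- (rows with an empty or NULL b disappear here;
    --  use LEFT JOIN LATERAL ... ON true to keep them)
    SELECT (t.a).age   AS a_age,
           (t.a).color AS a_color,
           elem.age,
           elem.color
    FROM t
    CROSS JOIN LATERAL unnest(t.b) AS elem;

    -- or ship JSON instead of the composite text format
    SELECT to_json(t.a) AS a, to_json(t.b) AS b
    FROM t;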

thanks,
Rob

[1]: https://github.com/cretz/pgnio