Re: Fwd: increase insert into local table from remote oracle table performance

2018-08-15 Thread Mariel Cherkassky
Inserting directly into the partition didn't help; the performance is just the
same. I tried to increase the prefetch value to 1000 (alter foreign
table hist_oracle options (add prefetch '1000')), but still no change: 15
minutes for one partition (6GB).

On the Oracle side the plan is a full scan of the partition (I'm copying the
entire partition into a PostgreSQL partition).
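
For reference, a quick way to sanity-check that the option really took effect
(a sketch; note that OPTIONS (SET ...) replaces an existing option, while ADD
only works when the option is not set yet):

    -- show the options currently stored for the foreign table
    SELECT ftoptions FROM pg_foreign_table
    WHERE ftrelid = 'hist_oracle'::regclass;

    -- use SET rather than ADD if prefetch is already present
    ALTER FOREIGN TABLE hist_oracle OPTIONS (SET prefetch '1000');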

2018-08-15 1:28 GMT+03:00 legrand legrand :

> main ideas are:
>
> - inserting directly into the right partition:
>   perform as many inserts as there are pg partitions in main_table_hist, like
>   INSERT INTO "14/08/2018_value1" select * from remote_oracle_hist where
> day=to_date('14/08/2018','DD/MM/YYYY') and value='value1'
>
> please check execution plan (in Oracle db) using EXPLAIN ANALYZE
>
> - all those inserts should be executed in parallel (with 4 or 8 SQL scripts)
>
> - wal archiving should be disabled during hist data recovery only (not
> during day to day operations)
>
> - for prefetch see
>
> https://github.com/laurenz/oracle_fdw
>
> prefetch (optional, defaults to "200")
>
> Sets the number of rows that will be fetched with a single round-trip
> between PostgreSQL and Oracle during a foreign table scan. This is
> implemented using Oracle row prefetching. The value must be between 0 and
> 10240, where a value of zero disables prefetching.
>
> Higher values can speed up performance, but will use more memory on the
> PostgreSQL server.
>
>
> Regards
> PAscal


Re: increase insert into local table from remote oracle table performance

2018-08-15 Thread Daniel Blanch Bataller
The Postgres command of choice to load bulk data is COPY
(https://www.postgresql.org/docs/current/static/sql-copy.html); it is much
faster than anything else.

It's likely that the slowest part is Oracle exporting its data. Try to
use sqlplus to export the data and see how long it takes; you won't be able
to make the process faster than Oracle can export its data.
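
For example, something along these lines to time the raw export on its own
(just a sketch; credentials, columns and object names are placeholders):

    # time how fast Oracle alone can dump one partition to a flat file
    time sqlplus -s user/pass@orcl > /tmp/hist_20180814.csv <<'EOF'
    set heading off feedback off pagesize 0 linesize 32767 trimspool on
    SELECT day || ',' || value || ',' || some_col
    FROM hist_table PARTITION (p20180814);
    EOF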

If it's fast enough, format the resulting file in a format suitable for
Postgres's 'COPY FROM' command.
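
For instance (a sketch with placeholder names; the file must be readable by
the server process, otherwise use \copy from psql):

    -- server-side bulk load of the exported CSV into the target partition
    COPY main_table_hist_20180814 FROM '/tmp/hist_20180814.csv' WITH (FORMAT csv);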

Finally, you can pipe the Oracle export command into the Postgres COPY FROM
command, so the two steps run concurrently and the process can be up to twice
as fast.

You can make it even faster if you split the exported data by some criterion
and run those export | import scripts in parallel.
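
Putting the two ideas together, a hedged sketch with placeholder names:

    # stream straight from Oracle into Postgres, no intermediate file
    sqlplus -s user/pass@orcl @export_day.sql |
      psql -d mydb -c "\copy main_table_hist_20180814 FROM STDIN WITH (FORMAT csv)"

    # and several streams in parallel, e.g. one per day
    ./export_import.sh 2018-08-13 &
    ./export_import.sh 2018-08-14 &
    wait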





> On 15 Aug 2018, at 10:43, Mariel Cherkassky  wrote:
> 
> [...]



RE: Calculating how much redo log space has been used

2018-08-15 Thread Reza Taheri
> -Original Message-
> From: Andres Freund [mailto:[email protected]]
> Sent: Tuesday, August 14, 2018 12:31 PM
> To: Reza Taheri 
> Cc: [email protected]
> Subject: Re: Calculating how much redo log space has been used
> 
> Hi,
> 
> On 2018-08-14 18:51:34 +, Reza Taheri wrote:
> > Also, it looks like the full_page_writes parameter is the only thing
> > that can help reduce the log usage size
> 
> There's also wal_compression.
> 
> 
> > Another requirement is a very short, 6-minute checkpoint time, which
> > means we will likely write the full page very often. Yes, my hands are
> > tied!
> 
> Why is that a requirement / how is it specifically phrased? Is it a bounded
> recovery time?
> 
> Greetings,
> 
> Andres Freund

Hi Andres,
Good to know about wal_compression. It gives us a good reason to upgrade to 9.5 
to get that feature.
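
A minimal sketch of what we would flip in postgresql.conf once on 9.5 or later:

    wal_compression = on     # compress full-page images written to WAL (9.5+)
    full_page_writes = on    # stays on, given our recovery requirements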

The need for a 6-minute checkpoint came from this requirement in the benchmark 
specification:

the database contents (excluding the transaction log) stored on Durable Media 
cannot be more than 12 minutes older than any Committed state of the database.
Comment: This may mean that Database Management Systems implementing 
traditional checkpoint algorithms may need to perform checkpoints twice as 
frequently (i.e. every 6 minutes) in order to guarantee that the 12-minute 
requirement is met.

But in any case, I now realize that I was going into the weeds, looking at the
wrong thing. My original issue was figuring out how quickly we churn through
checkpoint segment files, and I had been looking at the checkpoint stats in
pgstatspack to figure that out. But that's the wrong place to look; I don't
think there is anything in the pgstatspack output that can give me that
information. I can tell by looking at the timestamps of the checkpoint segment
files, but I was hoping to find something that gets logged in
pg_log/postgresql-*.log and tells me when we switch to a new log file.
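
In the meantime, one way to measure the churn directly is to sample the WAL
insert position twice and take the difference (a sketch using the pre-10
function names; the literal below is a placeholder for the first sample's
result):

    SELECT pg_current_xlog_location();               -- sample at t0
    -- wait a known interval, then:
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), '5A/E10B7C8');
    -- returns bytes of WAL generated since t0

Setting log_checkpoints = on also makes each checkpoint show up in the server
log with timing details, which helps correlate.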

Thanks,
Reza



Re: Fwd: increase insert into local table from remote oracle table performance

2018-08-15 Thread legrand legrand
This is not so bad: you were at 10h for 200GB (20GB/h),
and now you are at 24GB/h, which makes a 20% increase ;0)

Could you tell us the results with parallel executions
(before switching to an unload/reload strategy)?

Regards
PAscal






Re: dsa_allocate() failure

2018-08-15 Thread Sand Stone
Just as a follow-up: I tried the parallel execution again (in a stress-test
environment). Now the crash seems to be gone. I will keep an eye on
this for the next few weeks.

My theory is that the Citus cluster created and shut down a lot of TCP
connections between the coordinator and workers. If running on untuned
Linux machines, the TCP ports might run out.
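
If that theory holds, the usual knobs on an untuned box would be something
like the following sketch (verify against your kernel documentation):

    # widen the ephemeral port range and allow reuse of TIME_WAIT sockets
    sysctl -w net.ipv4.ip_local_port_range="10240 65535"
    sysctl -w net.ipv4.tcp_tw_reuse=1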

Of course, I am using "newer" PG10 bits and Citus 7.5 this time.
On Wed, May 23, 2018 at 7:06 AM Sand Stone  wrote:
>
> >> At which commit ID?
> 83fcc615020647268bb129cbf86f7661feee6412 (5/6)
>
> >>do you mean that these were separate PostgreSQL clusters, and they were all 
> >>running the same query and they all crashed like this?
> A few worker nodes; a table is hash-partitioned by "aTable.did" by
> Citus, and further partitioned by PG10 by time range on field "ts". As
> far as I could tell, Citus just does a query rewrite and executes the
> same type of queries on all nodes.
>
> >>so this happened at the same time or at different times?
> At the same time. The queries are simple count and sum queries, here
> is the relevant part from one of the worker nodes:
> 2018-05-23 01:24:01.492 UTC [130536] ERROR:  dsa_allocate could not
> find 7 free pages
> 2018-05-23 01:24:01.492 UTC [130536] CONTEXT:  parallel worker
> STATEMENT:  COPY (SELECT count(1) AS count, sum(worker_column_1) AS
> sum FROM (SELECT subquery.avg AS worker_column_1 FROM (SELECT
> aTable.did, avg((aTable.sum OPERATOR(pg_catalog./)
> (aTable.count)::double precision)) AS avg FROM public.aTable_102117
> aTable WHERE ((aTable.ts OPERATOR(pg_catalog.>=) '2018-04-25
> 00:00:00+00'::timestamp with time zone) AND (aTable.ts
> OPERATOR(pg_catalog.<=) '2018-04-30 00:00:00+00'::timestamp with time
> zone) AND (aTable.v OPERATOR(pg_catalog.=) 12345)) GROUP BY
> aTable.did) subquery) worker_subquery) TO STDOUT WITH (FORMAT binary)
>
>
> >> a parallel worker process
> I think this is more of a PG10 parallel bg worker issue, not Citus;
> Citus just lets each worker PG server do its own planning.
>
> I will try to do more experiments on this and see if there is any
> specific query that causes the parallel query execution to fail. As far
> as I can tell, the level of concurrency triggered this issue: that is,
> executing tens of queries as shown on the worker nodes; depending on
> the stats, the PG10 core may or may not spawn more bg workers.
>
> Thanks for your time!
>
>
>
>
>
> On Tue, May 22, 2018 at 9:44 PM, Thomas Munro
>  wrote:
> > On Wed, May 23, 2018 at 4:10 PM, Sand Stone  wrote:
> >> dsa_allocate could not find 7 free pages
> >> I just saw this error message again on all of my worker nodes (I am using
> >> Citus 7.4 rel). The PG core is my own build of release_10_stable
> >> (10.4) out of GitHub on Ubuntu.
> >
> > At which commit ID?
> >
> > All of your worker nodes... so this happened at the same time or at
> > different times?  I don't know much about Citus -- do you mean that
> > these were separate PostgreSQL clusters, and they were all running the
> > same query and they all crashed like this?
> >
> >> What's the best way to debug this? I am running pre-production tests
> >> for the next few days, so I could gather info. if necessary (I cannot
> >> pinpoint a query to repro this yet, as we have 10K queries running
> >> concurrently).
> >
> > Any chance of an EXPLAIN plan for the query that crashed like this?
> > Do you know if it's using multiple Gather[Merge] nodes and parallel
> > bitmap heap scans?  Was it a regular backend process or a parallel
> > worker process (or a Citus worker process, if that is a thing?) that
> > raised the error?
> >
> > --
> > Thomas Munro
> > http://www.enterprisedb.com



Re: dsa_allocate() failure

2018-08-15 Thread Thomas Munro
On Thu, Aug 16, 2018 at 8:32 AM, Sand Stone  wrote:
> Just as a follow up. I tried the parallel execution again (in a stress
> test environment). Now the crash seems gone. I will keep an eye on
> this for the next few weeks.

Thanks for the report.  That's great news, but it'd be good to
understand why it was happening.

> My theory is that the Citus cluster created and shut down a lot of TCP
> connections between coordinator and workers. If running on untuned
> Linux machines, the TCP ports might run out.

I'm not sure how that's relevant, unless perhaps it causes executor
nodes to be invoked in a strange sequence that commit fd7c0fa7 didn't
fix?  I wonder if there could be something different about the control
flow with custom scans, or something about the way Citus worker nodes
invoke plan fragments, or some error path that I failed to consider...
It's a clue that all of your worker nodes reliably crashed at the same
time on the same/similar queries (presumably distributed query
fragments for different shards), making it seem more like a
common-or-garden bug rather than some kind of timing-based heisenbug.
If you ever manage to reproduce it, an explain plan and a back trace
would be very useful.
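
Since it is raised as an ERROR rather than a hard crash, one way to grab a
backtrace is to attach gdb to the backend (or parallel worker) and break on
errfinish, roughly like this sketch:

    gdb -p <backend_pid>
    (gdb) break errfinish
    (gdb) continue
    # rerun the failing query; when the breakpoint fires:
    (gdb) bt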

> Of course, I am using "newer" PG10 bits and Citus7.5 this time.

Hmm.  There weren't any relevant commits to REL_10_STABLE that I can
think of.  And (with the proviso that I know next to nothing about
Citus) I just cloned https://github.com/citusdata/citus.git and
skimmed through "git diff origin/release-7.4..origin/release-7.5", and
nothing is jumping out at me.  Can you still see the problem with
Citus 7.4?

-- 
Thomas Munro
http://www.enterprisedb.com