Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-07 Thread Tim Cross


padusuma  writes:

> I am working on adding support for PostgreSQL database for our application.
> In a lot of our use-cases, data is inserted into temporary tables using
> INSERT INTO statements with bind parameters, and subsequently queries are
> run by joining to these temp tables. Following is some of the data for these
> INSERT statements:
>
> Table definition: CREATE TEMPORARY TABLE Table1( auid varchar(15) ) ON
> COMMIT DELETE ROWS;
>
> SQL statement: INSERT INTO Table1 (uidcol) VALUES (:1);
>
> Time taken to insert 24428 rows: 10.077 sec
> Time taken to insert 32512 rows: 16.026 sec
> Time taken to insert 32512 rows: 15.821 sec
> Time taken to insert  6107 rows: 1.514 sec
>
> I am looking for suggestions to improve the performance of these INSERT
> statements into temporary tables. Database is located on a Linux VM and the
> version is "PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit". The application is running on a
> windows platform and connecting to the database using psqlODBC driver
> version 10.03.
>

We are inserting large numbers (millions) of rows into a PostgreSQL
database from a JavaScript application and found that using the COPY
command was much, much faster than doing regular inserts (even with
multi-row inserts per commit). If you can do this with the driver you
are using, it will give you the largest performance boost.
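In case psqlODBC does not expose COPY directly, two sketches worth testing (table and column names follow the example above; the CSV filename is hypothetical):

```sql
-- From psql, \copy streams a client-side file through COPY FROM STDIN,
-- so the server process never needs to be able to read the file itself.
\copy Table1 (auid) FROM 'uids.csv' WITH (FORMAT csv)

-- Multi-row VALUES is a middle ground that does work over ODBC:
INSERT INTO Table1 (auid) VALUES ('a1'), ('a2'), ('a3');
```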


-- 
Tim Cross



Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-09 Thread Tim Cross


padusuma  writes:

>>We are inserting large numbers (millions) of rows into a postgres
>>database from a Javascript application and found using the COPY command
>>was much, much faster than doing regular inserts (even with multi-insert
>>commit). If you can do this using the driver you are using, that will
>>give you the largest performance boost.
>
> The data to be inserted into temporary tables is obtained from one or more
> queries run earlier and the data is available as a vector of strings. If I
> need to use COPY FROM command, then the application would need to create a
> file with the data to be inserted and the file needs to be readable by the
> user running database server process, which may not be always possible
> unless the application is running on the same host. I think this approach
> may not be feasible for our application.
>

OK, that does make a difference. If your data is already in the
database, COPY is not going to help you much.

> I have increased the value for /temp_buffers/ server parameter from the
> default 8 MB to 128 MB. However, this change did not affect the INSERT time
> for temporary tables.

It isn't clear why you extract the data into vectors of strings rather
than just populating the temp tables server-side with INSERT INTO
... SELECT or something similar.
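For example, assuming the source rows come from a query in the same database (source table, column names and the filter condition here are purely illustrative), the temp table can be filled in one server-side statement:

```sql
-- One statement, no per-row round trips or client-side bind parameters.
INSERT INTO Table1 (auid)
SELECT uid
FROM   source_table
WHERE  created_at > now() - interval '1 day';
```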

There are no 'quick fixes' which can be applied without real analysis of
the system. However, based on the limited information available, you may
want to consider -

- Increase work_mem to reduce the use of temp files. A rule of thumb is
  2 to 3 times the largest temp file (but use common sense: work_mem is
  allocated per sort/hash operation, not per connection).

- Tweak WAL checkpoint parameters so that checkpoints do not occur too
  frequently. Note that there is a trade-off here between checkpoint
  frequency and recovery time after a crash: fewer WAL checkpoints will
  usually improve performance, but recovery takes longer.

- Verify that the inserts into the temporary tables are the bottleneck
  and not the select over the existing data (explain plans etc., and
  adjust indexes accordingly).

How much you can reduce insert times will depend on the memory and CPU
profile of the system. More memory means less use of temp files and a
faster system, so spend a bit of time making sure your system is
configured to squeeze as much out of that RAM as you can!
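As a rough sketch only (the values are illustrative, not recommendations, and must be sized to your workload and RAM), the relevant postgresql.conf knobs are:

```
work_mem = 256MB                   # per sort/hash operation, not per connection
temp_buffers = 256MB               # per-session buffers for temporary tables
checkpoint_timeout = 30min
max_wal_size = 4GB                 # raise to avoid frequent forced checkpoints
checkpoint_completion_target = 0.9
log_temp_files = 0                 # log every temp file created, to verify usage
```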

--
Tim Cross



Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-13 Thread Tim Cross


padusuma  writes:

> Hello Tim,
>
> I have tried the suggestions provided to the best of my knowledge, but I did
> not see any improvement in the INSERT performance for temporary tables. The
> Linux host on which PostgreSQL database is installed has 32 GB RAM.
> Following are current settings I have in postgresql.conf file:
> shared_buffers = 8GB
> temp_buffers = 256MB
> work_mem = 256MB
> maintenance_work_mem = 256MB
> wal_buffers = 256MB
>
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.75
> max_wal_size = 1GB
>
> effective_cache_size = 16GB
>
>>>- Increase work_mem to reduce use of temp files. Need it to be 2 to 3
>>>  times largest temp file (but use common sense)
>
>>I have already increased the work_mem and maintenance_work_mem to 256MB. I
>>will check on the temp file sizes and adjust the work_mem parameter as you
>>suggested.
>
>>- Tweak wal checkpoint parameters to prevent wal checkpoints occurring
>>  too frequently. Note that there is a play off here between frequency
>>  of checkpoints and boot time after a crash. Fewer wal checkpoints will
>>  usually improve performance, but recovery time is longer.
>
>>How effectively you can increase insert times will depend on what the
>>memory and cpu profile of the system is. More memory, less use of temp
>>files, faster system, so spend a bit of time to make sure your system is
>>configured to squeeze as much out of that RAM as you can!
>
> Please let me know if there are any other suggestions that I can try.

How are you gathering metrics to determine if performance has improved
or not?

Have you seen any change in your explain (analyze, buffers) plans?
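One way to capture that, assuming the insert is fed by a query (table and column names illustrative):

```sql
-- The "Buffers:" lines show shared/local block hits and reads; any
-- "temp read/written" entries indicate spill to temp files.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO Table1 (auid)
SELECT uid FROM source_table;
```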

Make sure your table statistics are all up to date before performing
each benchmark test. I often turn off autovacuum when doing this sort of
testing so that I know exactly when tables get vacuumed and statistics
get updated (just remember to turn it back on when you're finished!).
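Note that autovacuum never processes temporary tables in any case, so refresh their statistics by hand between runs:

```sql
-- Run after loading the temp table, before the join queries.
ANALYZE Table1;
```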

Are the wal checkpoints being triggered every 30 mins or more
frequently?

Are you still seeing the system use lots of temp files?
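Both questions can be answered from the statistics views:

```sql
-- Requested checkpoints firing often relative to timed ones suggests
-- max_wal_size is too small.
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

-- Cumulative temp-file usage for the current database:
SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM   pg_stat_database
WHERE  datname = current_database();
```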

Do you have any indexes on the tables you're inserting into?

As mentioned previously, there are no simple/quick fixes here - you
cannot just change a setting and see performance improve. It will be
necessary to do a lot of experimentation, gathering statistics and
investigating how Postgres is using buffers, disk IO etc. All of these
parameters interact with each other, so it is critical you have good
metrics to see exactly what your changes do. It is complex and time
consuming. I highly recommend PostgreSQL High Performance (Ahmed &
Smith) and Mastering PostgreSQL (Schönig) for valuable background and
tips - there really is just far too much to communicate effectively via
email.

Tim


--
Tim Cross



Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-15 Thread Tim Cross
Even 3 sets per day is a possibility now.

We could have spent a lot of time tuning and re-speccing hardware etc. to
get to 1 set per day, and we would have succeeded, but that would have
been the absolute upper limit. I suspect it would have cost about the
same as the re-implementation, but with a much lower ceiling.

Re-implementation of a solution is often a hard case to sell, but it
might be the only way to get the performance you want. The big positive
to a re-implementation is that you usually get a better solution because
you are implementing with more knowledge and experience about the
problem domain. Design is often cleaner and, as a result, easier to
maintain. It usually takes a lot less time than the original
implementation as well and can be the more economical solution compared
to fighting a system which has fundamental design limitations that
restrict performance.

good luck,

Tim
--
Tim Cross



Re: Legal disclaimers on emails to this group

2019-12-06 Thread Tim Cross


Craig James  writes:

> (I've changed the original subject, "autovacuum locking question", of the
> sender's email so as not to hijack that thread.)
>
> On Thu, Dec 5, 2019 at 2:26 PM Mike Schanne  wrote:
>
>> Hi,
>>
>> I am investigating a performance problem...
>> ... This email is non-binding, is subject to contract, and neither Kulicke
>> and Soffa Industries, Inc. nor its subsidiaries (each and collectively
>> “K&S”) shall have any obligation to you to consummate the transactions
>> herein or to enter into any agreement, other than in accordance with the
>> terms and conditions of a definitive agreement if and when negotiated,
>> finalized and executed between the parties. This email and all its contents
>> are protected by International and United States copyright laws. Any
>> reproduction or use of all or any part of this email without the express
>> written consent of K&S is prohibited.
>>
>
> Sorry to be off topic, but this bugs me. Language is important. This isn't
> directed at you specifically, but I see these disclaimers all the time. How
> can you post to a public newsgroup that automatically reproduces your email
> to thousands of subscribers, and additionally publishes it on
> publicly accessible archives, in direct conflict with your company's policy
> appended to your email? And why on Earth do your company's lawyers think
> this sort of disclaimer is helpful and even legally useful? Not to mention,
> do they realize it's vaguely offensive to every customer and colleague who
> receives it?
>
> Craig

Oh how I hear you!

This is what I was using as my email signature (but not for groups). I
feel for the OP, who probably has little choice (other than working for a
different employer, which is a very valid choice given the 'organisational
culture' exhibited by policies requiring such nonsense):

Notice to all senders:

If you send me a message, on receipt of that message I consider it to be
my property and I will copy, share and disseminate it as I see fit. I will
provide attribution when appropriate and I will endeavour to comply with
all reasonable requests. However, I reject all threats or implied threats
of legal action arising from an error or mistake on your part. It is your
responsibility to manage your communications appropriately, not mine.

-- 
Tim Cross