Backend doesn't catch the next command, after SIGUSR2

Patrick Samson Tue, 09 Mar 2004 07:30:11 -0800

If I run a test script enough times, it eventually
freezes in this deadlock situation:


The client sends a command to a backend and waits
for an answer. It will wait forever, because the
backend is not aware that the request has arrived
and is itself waiting for the next command.

What happens in the loop is:
 SIInsertDataEntry: table is 70% full, signaling postmaster

 In reaction, the postmaster sends to its children:
 SignalChildren: sending signal 31 to process <pid>

Most of the time, it works. But at an unpredictable
iteration, it freezes.

This problem first appeared in a replication
setup, so I reduced the number of components
involved to get a simpler test case:
A pgtcl script, running a loop with:
 create table from another-table
 copy table to file
 drop table
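
For readers without the attachment, the loop described
above might look roughly like the sketch below. It is
not the attached test.tcl. It assumes the script runs
under pgtclsh (so the pg_* commands are already
loaded), that "create table from another-table" means
CREATE TABLE ... AS SELECT, and that the connecting
user is allowed to COPY to a server-side file. The
names tmptable, outfile, maxruns and run_sql are
hypothetical.

```tcl
# Hypothetical sketch of the test loop, NOT the attached test.tcl.
# Assumes pgtclsh, so pg_connect/pg_exec/pg_result are available.
set srctable pgr_qryengine_log
set dbname   euronetUsers
set tmptable looptest_tmp        ;# hypothetical scratch table name
set outfile  /tmp/looptest.out   ;# hypothetical server-side path
set maxruns  10000

# Run one statement; raise an error if the backend reports a failure.
proc run_sql {conn sql} {
    set res    [pg_exec $conn $sql]
    set status [pg_result $res -status]
    if {![string equal $status "PGRES_COMMAND_OK"]} {
        set msg [pg_result $res -error]
        pg_result $res -clear
        error "$sql failed ($status): $msg"
    }
    pg_result $res -clear
}

# One connection is reused, so every run talks to the same backend.
set conn [pg_connect $dbname]
for {set i 1} {$i <= $maxruns} {incr i} {
    run_sql $conn "CREATE TABLE $tmptable AS SELECT * FROM $srctable"
    run_sql $conn "COPY $tmptable TO '$outfile'"
    run_sql $conn "DROP TABLE $tmptable"
    puts "run $i ok"
}
pg_disconnect $conn
```

With a loop of this shape, a freeze would show up as
the last "run N ok" line never being followed by the
next one, since pg_exec blocks until the backend
answers.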

The 'create table' regularly fires the '70% full'
event, and at some point, the 'copy' never gets
answered.

I attached these files:
- test.tcl: the script to run.
  Change these values to match your environment:

 set srctable pgr_qryengine_log
 set dbname euronetUsers

  The source table can be any empty table.
  In my case, it's:
CREATE TABLE public.pgr_qryengine_log
(
  pgr_sid int4 NOT NULL,
  tablename varchar(50),
  pgr_gfid int8 NOT NULL,
  pgr_grid int8 NOT NULL,
  pgr_optype varchar(2),
  pgr_when timestamp,
  pgr_username varchar(30),
  qry_result text
) WITH OIDS;

- postmaster-ok.log
 The traces of a successful iteration.
- postmaster-ko.log
 The traces of the forever waiting iteration.
 The EOF in this log comes from a Ctrl-C on the client side.

Comparison of the traces shows that the signals
are processed, but the backend doesn't start a
StartTransactionCommand for the expected 'copy'.

I don't know the exact conditions under which the
freeze arises. I just noticed that the chances are
higher when many postgres.exe processes are alive.
With no extra backends, I could complete 10000 runs
without a freeze. So I opened a pgAdmin III session to
get many connections (to multiple databases, with
different accounts). With 7 to 10 processes, I reached
the freeze at 3392, 2027, 6729, 272, 1871 runs.

I tried to strace the postmaster, but never managed
to reproduce the problem that way. I guess strace
slows the system down too much.
I only have a strace of a correct iteration.

Done on:
- postgres 7.3.5, W2000 SP2, cygwin 1.5.5-1
- postgres 7.3.5, NT SP6, cygwin 1.5.7-1

I can't tell whether the source of the problem is in
cygwin or in postgres, so I am posting to both lists.

It would be helpful if anybody could reproduce the
problem, or offer advice on how to make progress
with the debugging.

Patrick





Attachments:
- test.tcl
- postmaster-ok.log
- postmaster-ko.log
