Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

2018-05-03 Thread George Neuner
On Wed, 2 May 2018 16:01:01 -0700, Adrian Klaver
 wrote:

>On 05/02/2018 02:29 PM, Jim Michaels wrote:
>
>> 
>> - the microsoft patented CSV would be required for implementation. it
>> handles special data with commas and double-quotes in them
>
>Huh?:
>https://en.wikipedia.org/wiki/Comma-separated_values#History


Disclaimer ... I haven't investigated the claim.

However, I would not discount the possibility that Microsoft really
has patented some variation of CSV.  They absolutely did *try* to
copyright the use of + and - symbols for specifying addition and
subtraction operations in VisualBASIC.

It's possible that they slipped something past the examiners.  But
more likely the use of a CSV-like format was specified to be part of a
larger process.  In that case the format itself might not be claimed,
but rather only the *use* of the format for some specific purpose.

IANAL,
George




Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

2018-05-04 Thread George Neuner
On Thu, 3 May 2018 11:02:00 -0700, Adrian Klaver
 wrote:

>On 05/03/2018 09:47 AM, George Neuner wrote:
>> 
>> ..., I would not discount the possibility that Microsoft really
>> has patented some variation of CSV.  They absolutely did *try* to
>> copyright the use of + and - symbols for specifying addition and
>> subtraction operations in VisualBASIC.
>
>Not seeing it:
>
>http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=microsoft&FIELD1=AANM&co1=AND&TERM2=csv&FIELD2=&d=PTXT


That's the patent database.  Microsoft tried to get a *copyright*.  I
don't recall whether it was granted [I don't believe it was], and this
would have been circa ~1990, so it's hard to search for in any case.
Unlike the patent database, the copyright database does not contain
the protected material - it only gives archival references to it.

It generated quite a bit of negative press coverage at the time.  The
basis of Microsoft's argument was that "x + y" was a unique and
protectable expression of the addition concept because it could have
been done in other ways, e.g., by "add(x,y)".


George




Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

2018-05-06 Thread George Neuner
On Sun, 6 May 2018 15:26:22 -0700, Ken Tanzer 
wrote:

>On Fri, May 4, 2018 at 1:03 PM, George Neuner  wrote:
>
>> On Thu, 3 May 2018 11:02:00 -0700, Adrian Klaver
>>  wrote:
>>
>> >On 05/03/2018 09:47 AM, George Neuner wrote:
>> >>
>> >> ..., I would not discount the possibility that Microsoft really
>> >> has patented some variation of CSV.  They absolutely did *try* to
>> >> copyright the use of + and - symbols for specifying addition and
>> >> subtraction operations in VisualBASIC.
>> >
>> >Not seeing it:
>> >
>> >http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=microsoft&FIELD1=AANM&co1=AND&TERM2=csv&FIELD2=&d=PTXT
>>
>>
>> That's the patent database.  Microsoft tried to get a *copyright*.  I
>> don't recall whether it was granted [I don't believe it was], and this
>> would have been circa ~1990, so it's hard to search for in any case.
>> Unlike the patent database, the copyright database does not contain
>> the protected material - it only gives archival references to it.
>>
>> It generated quite a bit of negative press coverage at the time.  The
>> basis of Microsoft's argument was that "x + y" was a unique and
>> protectable expression of the addition concept because it could have
>> been done in other ways, e.g., by "add(x,y)".
>>
>>
>>
>I don't think in general you can copyright a file format. 

CSV is deliberately public domain, but I'm not at all certain that it
could not be copyrighted otherwise.  And I _am_ certain that a more
restricted derivative could be copyrighted. [more below]


You can copyright damn near anything.  Under the Berne convention, the
simple fact of authorship conveys certain rights even if the "work" is
a verbatim copy of something else.

The only rules - at least in the US - are that your expression of the
idea is not known to be common usage, and that it is neither a
verbatim copy, nor a derivative of a previously *registered* work.  

There are 3 kinds of copyrights: registered, explicit, and implicit.
The US recognizes registered and explicit copyrights, but protection
is available only for registered copyrights.
The US does not follow the Berne convention wrt implicit or explicit
copyrights.  Explicit copyrights carry no legal weight in the US, but
they *may* influence the court in case of a dispute because they do
represent a claim of the work.  The US does not recognize implicit
copyrights at all.


Checking that something is a verbatim copy of an existing work is not
that easy ... every registered work is archived in the Library of
Congress (LoC), but much of the LoC *still* is not electronically
searchable.  Determining whether something is or is not derivative of
something else is not even within the US Copyright Office's mandate
... they leave that to the courts.




>You can copyright things you create, 

Including data formats ... 

>and you can try to keep secret the information about how they work.  

Trade secret has nothing whatsoever to do with IP law.  Secrets convey
no legal protection, and a secret may still be an infringement on a
protected, publicly available work.

It might be hard to figure out that a secret is infringing ...


>People can't steal your code to create CSV files, but
>you can't tell people they can't string a bunch of values together with
>commas in between if they can figure out how to do so all by themselves.

>Plus it's hard to see how "fair use" wouldn't protect something as short as
>"x+y", or ",".

"Fair Use" has no effect on whether the work is copyrightable.  It
applies only after the fact as a possible defense against
infringement.

It doesn't even apply here.  Fair Use is not a general exception to
infringement under the law - it is in fact specifically limited to
educational and historical use.

Fair Use does not prescribe any minimum length "sampling" of the work.
It only describes a maximum length "sampling" that may be permitted to
be copied verbatim into a new work.  But, even there, the
allowable sample size is not fixed under the law: it is ad hoc, per
work, and decided after the fact by a court in the event of a dispute.

Note that some countries do not recognize the concept of Fair Use.


Punctuation alone is not copyrightable by law - the "work" is required
to be intelligible [for some definition - computer binaries can be
copyrighted].  The issue for a legitimate work would be whether it is
either in the public domain, or otherwise is a common usage.

Re: Pgagent is not reading pgpass file either in Windows or Linux.

2018-05-30 Thread George Neuner
On Tue, 29 May 2018 13:32:46 -0700, Adrian Klaver
 wrote:

>On 05/29/2018 12:14 PM, nageswara Bandla wrote:
>
>> As per the link- 
>> (https://www.postgresql.org/docs/9.6/static/libpq-pgpass.html) I set 
>> PGPASSFILE environment variable to point to pgpass.conf location. Even 
>> then, it’s throwing same above error message. I have found out that 
>> pgagent is not reading pgpass.conf file when configured under 
>> LocalSystem account.
>> 
>> When I change the properties of the pgagent service to run under my 
>> login user account. Then, it’s reading pgpass.conf file under 
>> %APPDATA%/postgresql/pgpass.conf.
>> 
>> I am clueless, why pgagent is not honoring PGPASSFILE env variable.
>
>My guess because the LocalSystem user does not have permissions on your:
>
>%APPDATA%/postgresql/pgpass.conf
>
>file. This seems to be confirmed by it working when you run pgagent as 
>the login user.

LocalSystem has administrator permissions to virtually everything.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684190(v=vs.85).aspx

It should be able to read files belonging to any user.  

But the LocalSystem account can see only *global* environment
variables ... it can't see any user specific ones.  I would check if
the PGPASSFILE variable is set globally or only in the user account.


I don't know anything specifically about running pgagent on Windows,
so I can't say why it is giving an error if the docs say it should
not.

George


>passfile
>
> Specifies the name of the file used to store passwords (see Section 
>33.15). Defaults to ~/.pgpass, or %APPDATA%\postgresql\pgpass.conf on 
>Microsoft Windows. (No error is reported if this file does not exist.)
> ^^




Re: Pgagent is not reading pgpass file either in Windows or Linux.

2018-05-31 Thread George Neuner
On Thu, 31 May 2018 11:24:18 -0500, nageswara Bandla
 wrote:

>On Wed, May 30, 2018 at 7:45 PM, George Neuner  wrote:
>
>> LocalSystem has administrator permissions to virtually everything.
>> https://msdn.microsoft.com/en-us/library/windows/desktop/
>> ms684190(v=vs.85).aspx
>>
>> It should be able to read files belonging to any user.
>>
>> But the LocalSystem account can see only *global* environment
>> variables ... it can't see any user specific ones.  I would check if
>> the PGPASSFILE variable is set globally or only in the user account.
>>
>>
>> I don't know anything specifically about running pgagent on Windows,
>> so I can't say why it is giving an error if the docs say it should
>> not.
>>
>
>
>I am setting the PGPASSFILE in system environment variables. I am not
>setting it in user specific environmental variables.


It just occurred to me that you said PGPASSFILE was set to

%APPDATA%/postgresql/pgpass.conf


The problem may be that when LocalSystem expands %APPDATA%, it is
finding its own directory, which might be any of:

   C:\Windows\ServiceProfiles\LocalService\appdata
   C:\Windows\System32\config\systemprofile\AppData 
   C:\Windows\SysWOW64\config\systemprofile\AppData

depending on your Windows version, policies (if any), and whether the
executable is 32 or 64 bit.


I wouldn't try messing with any of these directories. Instead try
setting PGPASSFILE to the full path to your file.


George




Re: Pgagent is not reading pgpass file either in Windows or Linux.

2018-05-31 Thread George Neuner
On Thu, 31 May 2018 15:40:21 -0500, nageswara Bandla
 wrote:

>On Thu, May 31, 2018 at 12:57 PM, George Neuner 
>wrote:
>
>> It just occurred to me that you said PGPASSFILE was set to
>>
>> %APPDATA%/postgresql/pgpass.conf
>>
>>
>> The problem may be that when LocalSystem expands %APPDATA%, it is
>> finding its own directory, which might be any of:
>>
>>C:\Windows\ServiceProfiles\LocalService\appdata
>>C:\Windows\System32\config\systemprofile\AppData
>>C:\Windows\SysWOW64\config\systemprofile\AppData
>>
>> depending on your Windows version, policies (if any), and whether the
>> executable is 32 or 64 bit.
>>
>>
>> I wouldn't try messing with any of these directories. Instead try
>> setting PGPASSFILE to the full path to your file.
>>
>>
>I have tried all of them, pgagent is not recognizing any of the above
>locations. In fact, I have tried both options
>
> #1. By defining PGPASSFILE to the above locations one after the other.
> #2. By copying pgpass.conf to all the three locations by creating
>Roaming/postgresql directories.
>
>And also I have defined PGPASSFILE=C:\pgpass.conf; I think this should be
>accessible to any system account. This also is not working.


One more stupid question and then I'm out of ideas ...


Have you rebooted after changing the environment variable?  

Global environment changes normally don't take effect until the user
logs out/in again.  LocalSystem is not an interactive user - you have
to restart the system to let it see environment changes.  PITA.


George




Re: Code of Conduct plan

2018-06-03 Thread George Neuner
On Sun, 03 Jun 2018 17:47:58 -0400, Tom Lane 
wrote:

>Benjamin Scherrey  writes:
>
>> Another more specific factual question - have there been incidents within
>> the active Postgresql community where behaviour by individuals who are
>> participants in the community have conducted themselves in a manner that
>> brought on the actual need for such a code of conduct to exist in the first
>> place?
>
>I believe there were a couple of unfortunate incidents at conferences.
>Now, conferences are generally expected to have their own CoCs and enforce
>them themselves; this CoC is meant more to cover on-line interactions.
>You could argue that we shouldn't create such a CoC until something bad
>happens on-line; but I'd prefer to think that having a CoC might prevent
>that from ever happening at all, which is surely better.

Unfortunately, conduct codes generally aren't worth the paper they are
written on.  People who are inclined to behave badly towards others in
the 1st place will do so regardless of any code or any consequences of
violating the code.

The only thing a conduct code really accomplishes is to make some
subset of the signers feel good about themselves.  Actions are more
important than words.

YMMV.


>In any case, we went over all these sorts of arguments at excruciating
>length in 2016.  It's quite clear to the core team that a majority of
>the community wants a CoC.  I don't think any useful purpose will be
>served by re-litigating that point.
>
>   regards, tom lane

I remember that thread, but I don't remember any vote being taken. And
the participants in the thread were self-selected for interest in the
topic, so any consensus there is not necessarily reflective of the
community at large.


I am completely in favor of civil discourse and behavior, but I am not
in favor of unenforcible red tape.


Just my 2 cents.
George




Re: Pgagent is not reading pgpass file either in Windows or Linux.

2018-06-04 Thread George Neuner


On 6/4/2018 6:31 PM, nageswara Bandla wrote:

I have figured out the issue with pgAgent both in Windows and Linux.

PgAgent seems to ignore pgpass.conf/.pgpass whenever it has a 127.0.0.1 
entry (127.0.0.1:5432:*:postgres:postgres), and throws an error:


DEBUG: Creating DB connection: user=postgres port=5432 hostaddr=127.0.0.1 dbname=linuxpostgresdb

WARNING: Couldn't create the primary connection (attempt 1): fe_sendauth: no password supplied

The solution could be to update .pgpass to have 
(localhost:5432:*:postgres:postgres), and then pgagent works fine 
without issues.



I think pgagent is not in line with libpq.dll when passing the host 
address parameter. I have raised this concern on the pgagent GitHub, 
pointing out exactly where they need to change the code for pgagent to 
behave like the psql program.



https://github.com/postgres/pgagent/issues/14


Wow !!!

I don't use pgpass much at all - usually I want the security of 
providing the password manually.  Since we started with the idea that 
the file wasn't being read properly, I was mainly trying to figure out 
how Windows could be screwing that up. 


There is a difference between localhost and 127.0.0.1:  localhost is DNS 
resolved - usually from the local hosts file - and thus works with 
either IPv4 or IPv6 addressing.  But it never would have occurred to me 
that using one vs the other, on an otherwise properly configured system, 
should cause an error.


Great work finding that.
George



Re: pg_dump out of memory

2018-07-03 Thread George Neuner
On Tue, 3 Jul 2018 21:43:38 -0500, Andy Colson 
wrote:

>Hi All,
>
>I moved a physical box to a VM, and set its memory to 1Gig.  Everything
>runs fine except one backup:
>
>
>/pub/backup# pg_dump -Fc -U postgres -f wildfire.backup wildfirep
>
>g_dump: Dumping the contents of table "ofrrds" failed: PQgetResult() failed.
>pg_dump: Error message from server: ERROR:  out of memory
>DETAIL:  Failed on request of size 1073741823.
^^

pg_dump is trying to allocate 1GB.  Obviously it can't if 1GB is all
you have.


>pg_dump: The command was: COPY public.ofrrds (id, updateddate, bytes) TO
>stdout;
>
>wildfire=# \dt+ ofrrds
>List of relations
>  Schema |  Name  | Type  | Owner | Size  | Description
>++---+---+---+-
>  public | ofrrds | table | andy  | 15 MB |
>
>
>wildfire=# \d ofrrds
>   Table "public.ofrrds"
>Column|  Type  | Modifiers
>-++---
>  id  | character varying(100) | not null
>  updateddate | bigint | not null
>  bytes   | bytea  |
>Indexes:
> "ofrrds_pk" PRIMARY KEY, btree (id)
>

There must be a heck of a lot of data in that bytea column.
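A hedged way to confirm that from psql, using the column names from the
\d output above (pg_column_size reports the stored size without
detoasting the values):

   SELECT id, pg_column_size(bytes) AS stored_bytes
     FROM ofrrds
    ORDER BY 2 DESC
    LIMIT 5;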


>I'm not sure how to get this backup to run.  Any hints would be appreciated.

As Adrian mentioned already, you're going to have to give it more
memory somehow.  Either more RAM or a big swap file.

George




Re: User documentation vs Official Docs

2018-07-20 Thread George Neuner
On Thu, 19 Jul 2018 21:02:16 -0400, Melvin Davidson
 wrote:

>As universities DO NOT ALLOW software to be installed on shared computers,
>and this is the case especially in a library, it implies the user has 
>their own computer. 

Many (most?) universities do allow students to install and run
software locally under their own accounts.  Of course, that doesn't
help a visitor using a lab or library computer.


>As libraries allow users/citizens to request books be purchased at no 
>cost to the user/citizen, the argument that someone cannot afford a book
>is now a moot point.

Libraries can't afford to purchase everything anyone might want.

However, libraries lend to one another as well as to their patrons.
You always can ask your library to borrow the book from some other
library that does have it.  Most libraries belong to one or more
inter-library lending networks.

[Of course, there is no telling how long it will take to get a book
that way - you may wait months for something that's hard to find, or
if you happen to be far from a decent sized city.]

Another nice thing: often when the book needs to be ordered from
another library, you can keep it checked out longer than if it came
from the local collection.  

E.g., my local library allows (most) books to be checked out for up to
2 weeks. Specially ordered books, however, can be checked out for up
to 5 weeks.  These times may not be typical, but wherever I have been,
the libraries have allowed for keeping specially ordered books longer.

YMMV,
George




Re: Question on postgresql.conf

2018-07-31 Thread George Neuner
On Mon, 30 Jul 2018 18:35:59 +, "Lu, Dan"  wrote:

>I am going through the PostgreSQL manual and came across a question 
>hoping you can help me.  It appears the "postgresql.conf" file is 
>needed by default to start Postgres.  Since we have standard with 
>other RDBMS to store the configuration file on a shared location for
>easy comparison, I would like to use a different name for this file
>like .conf.

Is there some reason that postgresql.conf cannot be a link to your
file?

George




Re: Question on postgresql.conf

2018-07-31 Thread George Neuner
On Tue, 31 Jul 2018 12:59:03 -0500, Dimitri Maziuk
 wrote:

>On 07/31/2018 12:33 PM, George Neuner wrote:
>
>> Is there some reason that postgresql.conf cannot be a link to your
>> file?
>
>It's six of one, half a dozen of the other. The big problem is when the
>network share is unavailable at host boot and all of your databases are
>belong to bitbucket. If you want to do it right, you need chef/puppet as
>already suggested.

I understand the problem with network shares - but the OP mentioned
only a "shared location", which could be just another directory on the
same device.   Because the OP said also that there were other DBMS
being used in the same shop, I considered that there might be a
respectable SAN involved.

George




Re: Web GUI for PG table ?

2019-09-13 Thread George Neuner
On Thu, 12 Sep 2019 10:56:21 -0400, David Gauthier
 wrote:

>Many good visualization options but I need one that runs on the web AND
>allows insert/update/delete records.

Most browser GUI toolkits have some kind of spreadsheet-like "grid"
widget that allows editing the displayed data.  Some frameworks also
have "database" access widgets that can connect to the DBMS and
execute queries there without needing additional middleware on the
server.

But understand that there likely will be some significant Javascript
(or compatible) programming on the frontend to make it work as you
desire.

I've dabbled a bit in frontend web programming and found it to be a
PITA ... I much prefer the server side.  I've done a fair amount of
complex desktop GUI programming over 20+ years (on Macintosh and on
Windows, some of it before the introduction of the GUI frameworks),
and personally I find browser application development to be quite
unfriendly. YMMV.

If you want to go the web route and you don't already have someone
that is pretty well versed in frontend web coding, you might want to
hire a professional.

George





Re: Cascade Trigger Not Firing

2019-09-16 Thread George Neuner
On Sat, 14 Sep 2019 10:00:18 -0500, Ron 
wrote:

>On 9/14/19 9:54 AM, Tom Lane wrote:
>[snip
>> The only mention of this feature in 38.1 "Overview of Trigger Behavior"
>> is
>>
>>  UPDATE triggers *can* moreover be set to fire only if certain columns
>>  are mentioned in the SET clause of the UPDATE statement.
>>
>> which seems to me to be plenty specific enough --- it is carefully
>> *not* saying that the trigger will fire if the column changes value.
>> The CREATE TRIGGER man page never says that, either.
>
>Given that the UPDATE "*can* ... be set to fire only if certain columns are 
>mentioned in the SET clause of the UPDATE statement", it logically follows 
>that the default behavior is something else (for example, if the field value 
>changes for whatever reason).

But the default could be "any column mentioned", not necessarily any
value changed.
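
FWIW, the opt-in column form under discussion looks something like this
(table, column, and function names are hypothetical):

   CREATE TRIGGER audit_price
     AFTER UPDATE OF price ON products
     FOR EACH ROW EXECUTE PROCEDURE audit_price_change();

That fires when "price" appears in the SET clause of the UPDATE,
whether or not its value actually changed.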

George





Re: Is this a bug ?

2019-10-24 Thread George Neuner
On Wed, 23 Oct 2019 11:27:12 -0500, Ron 
wrote:

>On 10/23/19 11:20 AM, Geoff Winkless wrote:
>> On Wed, 23 Oct 2019 at 17:09, Ron  wrote:
>>
>>> As much as I hate to say it, MSFT was right to ignore this bug in the 
>>> standard.
>> Standards are standards for a reason. It is almost never correct to
>> deliberately ignore them. If you don't like them, then revise the
>> standard.
>>
>> Historically Microsoft ignored standards either because they
>> misunderstood them or because they wanted to lock in their customers,
>> not for any reasons of altruism.
>>
>> For what it's worth, I can see a value to having
>>
>> SELECT 'this is quite a long string'
>> 'which I've joined together '
>> 'across multiple lines';
>>
>> although the advantage of it vs using a concat operator is slim.
>
>There is no advantage to using it vs using a concat operator, and all 
>disadvantage.

It WAS an advantage querying interactively on 80 character text mode
screens, and when SQL was embedded (directly) into programs written in
other languages.

Regardless of how recent the latest standard - SQL still has many ...
warts really, but I will be charitable and call them "vestiges" ... of
its roots as a 1970s language.

YMMV,
George





Re: Composite type storage overhead

2019-10-25 Thread George Neuner
On Wed, 23 Oct 2019 21:24:58 +, Laiszner Tamás
 wrote:

>Rob Sargent  wrote:
>> Why not use UUID type?

> 1.
> Although it does not enforce, but the UUID type kind of suggests a
> specific interpretation of the data. Of course the documentation says
> you are free to use any algorithm to generate the values, but there
> are quite a few standard UUID types and we are not planning to use
> any of them.

The usual interpretation suggests an IP address and a timestamp, but
there is nothing that /requires/ that interpretation.  The value is
just a large integer and you can treat it that way if you want.

Postgresql doesn't check UUID data for any particular format.  You can
store arbitrary integer values into UUID columns - with the caveat
that, as UUIDs, you can only compare them and can't (easily) do any
arithmetic.
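
For example (a hedged sketch; uuid input accepts 32 hex digits with or
without hyphens):

   -- store an arbitrary 128-bit value in a uuid column
   SELECT lpad(to_hex(42), 32, '0')::uuid;
   -- -> 00000000-0000-0000-0000-00000000002a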


> 2.
> The serialization format is different than needed by the application
> and, while once again this is not a hard technical barrier, that
> might cause slight additional complexity and confusion.

Not sure what the issue is here.  Admittedly I don't know how libpq
passes binary UUIDs[*], but certainly they can be sent as strings if
necessary.

[*] Off the top of my head, I would guess a struct or array of four
32-bit values, but truly I haven't looked.


>  3.
> The value is logically defined as a 128-bit integer, that is in itself
> a compound value split into a few "bit groups". Extracting these
> parts can be done by simple (and supposedly efficient) bitwise
> operators when stored as integer, but becomes much more cumbersome
> with UUID, I guess.

The groupings are only for display / printing, to make it easier for
humans to read.  



WRT mixing logic into the key (sharding, etc.), all UUIDs except type
4 can provide you with a reasonable static sharding key.  And even
type 4 works if you shard by time.

Also, keep in mind that a UUID can be generated by a client or a key
server and thus have no relationship to the DBMS server that stores
it. Depending on how you choose to generate and use it, a UUID doesn't
necessarily carry or reveal any exploitable information.


YMMV,
George





Re: select view definition from pg_views feature request

2019-11-04 Thread George Neuner
On Sun, 03 Nov 2019 16:15:23 -0500, Tom Lane 
wrote:

>Michael Shapiro  writes:
>> It seems that the definition of a view from pg_catalog.pg_views does not
>> qualify the tables used in the view if the tables are in the current search
>> path.
>
>> Is it possible to either have the definition always qualify all tables
>> independent of the search_path (or else provide a new column that does
>> that)?
>
>Why don't you just change the search path to empty before selecting?
>
>   regards, tom lane


I'm guessing this is fine inside a transaction, but what if you're
using a client that commits by statement?  In that case, wouldn't the
change to the path affect the other backends?

George





Re: select view definition from pg_views feature request

2019-11-05 Thread George Neuner
On Tue, 5 Nov 2019 14:29:00 +1300, David Rowley
 wrote:


>See https://www.postgresql.org/docs/current/sql-set.html
>
>"SET only affects the value used by the current session."
>
>Also:
>
>"The effects of SET LOCAL last only till the end of the current transaction"
>
>Neither affects other sessions.


Ok, so you need to "ALTER DATABASE ..." for the change to affect
everyone.  Thanks for the clarification.
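
For reference, a hedged sketch of both forms ("my_view" and "mydb" are
placeholders):

   BEGIN;
   SET LOCAL search_path TO '';   -- affects only the current transaction
   SELECT definition FROM pg_views WHERE viewname = 'my_view';
   COMMIT;

   ALTER DATABASE mydb SET search_path TO '';  -- default for new sessions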

George





Re: status of CURSORs after DISCONNECT

2019-11-28 Thread George Neuner
On Thu, 28 Nov 2019 07:27:15 +0100, Matthias Apitz 
wrote:

>When an ESQL/C written process issues a
>
>EXEC SQL DISCONNECT [connection];
>
>do the opened CURSOR(s) still survive? We run into the problem that the
>father process issues DISCONNECT before forking children, the forked child
>CONNECTs to the same server and database again and "thinks" it has to CLOSE the
>CURSOR (perhaps a long standing bug in our application we port now from
>Sybase(...) to PostgreSQL).
>
>Thanks
>
>   matthias

Cursors are session specific so if the PG backend exits due to the
disconnect, any cursors it is managing will be closed.  

However, if you are using a connection pooler to keep PG backend
processes alive for reuse, then disconnecting does not necessarily end
the session.  In this case, you need to make sure any cursors are
closed because the next connection to the same backend process may
know nothing about them.

In any event, it's good practice always to close cursors when you are
done with them - if for no reason other than to release resources they
are holding.
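
As a hedged illustration, a client reusing a pooled session can check
for and clean up leftovers explicitly:

   SELECT name FROM pg_cursors;   -- cursors still open in this session
   CLOSE ALL;                     -- or CLOSE some_cursor for a specific one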

George





Re: status of CURSORs after DISCONNECT

2019-11-28 Thread George Neuner
On Thu, 28 Nov 2019 09:58:50 -0500, Tom Lane 
wrote:

>Matthias Apitz  writes:
>> When an ESQL/C written process issues a
>> EXEC SQL DISCONNECT [connection];
>> do the opened CURSOR(s) still survive?
>
>No.  Cursors are purely session-local objects in Postgres.
>I'm a bit surprised to hear it might be different in Sybase.
>
>   regards, tom lane
>

Sybase has a different notion of "session" that permits multiple
connections, and certain per session objects such as temp tables can
(optionally) be shared among all connections to the same session.

I'm not sure whether cursors similarly can be shared. Sybase allows
cursors and temp tables to be statically declared in the schema DDL.
Declared cursors and temp tables do not need to be "created"
explicitly - if they do not already exist, they can be instantiated
simply by mention in a query.

I have never tried sharing a cursor (or even using a declared cursor).
Queries in Sybase are *connection* specific, so I would think it would
be an error for multiple queries to try to reference the same cursor
name simultaneously.  But I haven't worked with Sybase for some years
so I am not up to date on the current software.

George





Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-15 Thread George Neuner
On Tue, 10 Dec 2019 18:00:02 -0600, Ron 
wrote:

>On 12/10/19 3:11 PM, Erwin Brandstetter wrote:
>> I am looking for stable hash functions producing 8-byte or 4-byte hashes 
>> from long text values in Postgres 10 or later.
>>
>> There is md5(), the result of which can be cast to uuid. This reliably 
>> produces practically unique, stable 16-byte values. I have usecases where 
>> an 8-byte or even 4-byte hash would be good enough to make collisions 
>> reasonably unlikely. (I can recheck on the full string) - and expression 
>> indexes substantially smaller. I could truncate md5 and cast back and 
>> forth, but that seems like a lot of wasted computation. Are there 
>> suggestions for text hash functions that are
>> - fast
>> - keep collisions to a minimum
>> - stable across major Postgres versions (so expression indexes don't break)
>> - cryptographic aspect is not needed (acceptable, but no benefit)
>
>What about a CRC32 function?  It's fast, and an SSE4 instruction has been in 
>Intel CPUs for about 10 years.

On long text CRC will not be as discriminating as a real cryptohash,
but it may be considerably faster to compute depending on the length
of the text.  Given that the OP can check the actual string in the
event of a collision, it may work well enough.

One way to make it more discriminating is to compute a pair of CRCs
using different polynomials.  Unfortunately the SSE4.2 CRC32
instruction uses a fixed polynomial.  You can compute it twice using
different initial values, but the result won't be as good as actually
using different polynomials.


I used to create 4-byte hashes by concatenating two 16-bit CRCs - one
using the X-modem polynomial and the other using the CCITT polynomial.
Since these gave very different results, it worked quite well for my
use case where all the strings were guaranteed < 16KB.
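
As an aside: AFAIK there is no SQL-level CRC32 built into stock
PostgreSQL, so a rough in-database alternative for the OP's expression
index is to carve 32 bits out of md5 instead - not a CRC at all, but it
meets the stated requirements.  A hedged sketch (table and column names
are made up):

   CREATE INDEX my_table_text_hash_idx
     ON my_table ((('x' || substr(md5(long_text), 1, 8))::bit(32)::int));

   -- lookups recheck the full string to weed out collisions
   SELECT * FROM my_table
    WHERE ('x' || substr(md5(long_text), 1, 8))::bit(32)::int
        = ('x' || substr(md5('needle'), 1, 8))::bit(32)::int
      AND long_text = 'needle';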

YMMV,
George





Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-17 Thread George Neuner
On Sun, 15 Dec 2019 20:23:25 -0600, Ron 
wrote:

>On 12/15/19 3:59 PM, George Neuner wrote:
>
>> On long text CRC will not be as discriminating as a real cryptohash,
>
>When specifying a 4 byte hash, something must be sacrificed...

Obviously.  But the main point is that CRC never was designed to
uniquely fingerprint data - it was designed to detect corruption of
the data, which is a much weaker guarantee than the cryptodigest
hashes.

Despite being the same length as an MD5 hash, a 128-bit CRC still
might not be as discriminating ... it depends greatly on the CRC
polynomial used and on the length of the input.

George





Re: Changing default ../data/ directory

2020-01-04 Thread George Neuner
On Sat, 4 Jan 2020 07:20:44 -0800 (PST), Rich Shepard
 wrote:

>As this is all new to me I want to learn how to:
>
>1) Define a postgresl.conf and learn where it should be located.
>
>2) Move all current databased in /var/lib/pgsql/11/data to /data/pgsql/11.
>
>3) Create all new databases in /data/pgsql/11 by default.
>
>I'm sure it's in the manual yet I'm unsure just where to start reading.
>
>Rich

Since the new drive is local you can just move the data directory to
its new location and link to it from the default (/var) location.  No
configuration changes needed.

Won't help you learn necessarily, but solves the problem.

YMMV,
George





Re: WaitForMultipleObjects in C Extension

2020-01-14 Thread George Neuner
On Tue, 14 Jan 2020 10:35:09 +0300, İlyas Derse 
wrote:

> Hi guys, I need your experiences. I made an extension in C. When I stop the
>query on PostgreSQL, I want to control my extension's result with use cases
>in C. So I think I should use WaitForMultipleObjects.

Not really understanding this.


>Is there a like a SignalObjectHandle() ? By the way I'm using Windows.

Not exactly.  Windows does have some signaling objects like Unix/Linux,
but they exist only in the filesystem and network stack, and they are
signaled only by events within those subsystems.  IOW, they can't be
signaled manually.

For general communication/synchronization purposes Windows uses an
event based notification system.  See:
https://docs.microsoft.com/en-us/windows/win32/sync/using-synchronization
https://docs.microsoft.com/en-us/windows/win32/sync/using-event-objects

George





testing post through gmane.io

2020-01-19 Thread George Neuner


gmane.org is now gmane.io
hope this works





Re: Watching for view changes

2018-12-20 Thread George Neuner
On Thu, 20 Dec 2018 12:12:14 -0800, Mitar  wrote:


>On Thu, Dec 20, 2018 at 8:11 AM Mike Rylander  wrote:
>> If, as I assume, the FROM clause of the SELECT is essentially static,
>> just with varying WHERE conditions, you could just use an AFTER
>> INSERT/UPDATE/DELETE trigger on each of the tables involved to send a
>> NOTIFY whenever data that is exposed by the SELECT is changed.
>
>You mean that for every SQL query I am making to a database, I would
>manually determine which tables are involved and then setup triggers
>with NOTIFY that the SELECT might have changed?

You can just put an update trigger on every table.  You aren't forced
to listen for notifications.


>I am trying to see if this could be something I could abstract out
>that it would be done automatically for any query. I have looked into
>using EXPLAIN to get a list of tables involved in a query, but the
>issue is that it look non-trivial to determine what has really changed
>as a consequence of those tables changing. I would have to cache
>myself what was the result of a query and then myself do a diff? Are
>there any ways to do this inside PostgreSQL without having to cache
>results on the client and do it there?

You don't need to cache anything on the client.  An update trigger on
a table can notify a listening client when data is changed.  
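
For illustration, a minimal sketch of such a trigger (table, channel,
and function names are made up; the client side just does LISTEN
table_changed):

   CREATE OR REPLACE FUNCTION notify_table_change() RETURNS trigger AS $$
   BEGIN
     -- payload is just "table:operation"; the client decides what to re-query
     PERFORM pg_notify('table_changed', TG_TABLE_NAME || ':' || TG_OP);
     RETURN NULL;   -- return value is ignored for AFTER triggers
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER my_table_notify
     AFTER INSERT OR UPDATE OR DELETE ON my_table
     FOR EACH ROW EXECUTE PROCEDURE notify_table_change();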

The difficulty is that views are not guaranteed to be updateable.  As
a technical matter, you can put a trigger on a view, but it may never
fire.

AFAIK, update and delete triggers do work on materialized views,
because they really are physical tables.  Depending on your use case,
materialized views may or may not be of help to you.



>> You can deliver a payload, such as the table name, primary key value and,
>> with a little work, even a list of fields that were modified as a JSON
>> blob, and let the application do whatever needs to be done to react
>> the the changes -- issue other queries, etc.
>
>But this would mean that I would have to know how changes on involved
>tables influence query results. 

The trigger function itself is agnostic WRT the format of the table -
the old and new row data are provided generically as records, and you
can to convert the record data, e.g., to JSON or XML, without knowing
its format.

AFAIHS, you really only need to know the table format to inspect or
modify the row data.
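
E.g., inside such a trigger function, a hedged one-liner that needs no
knowledge of the table layout (the channel here is just the table name):

   PERFORM pg_notify(TG_TABLE_NAME::text, row_to_json(NEW)::text);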


>I would like to not have to do SQL query parsing and understanding
>on the client. So ideally, I would get information directly from 
>PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on
>FOR EACH ROW on a view would be perfect. In that trigger I could get
>information which rows of the view changed and then use NOTIFY to 
>inform the client. 

You might want to use BEFORE UPDATE so you get both the old and new
row data.


YMMV,
George




Re: Watching for view changes

2018-12-21 Thread George Neuner
On Fri, 21 Dec 2018 20:49:23 -0800, Mitar  wrote:

>On Thu, Dec 20, 2018 at 6:28 PM George Neuner  wrote:
>
>> The difficulty is that views are not guaranteed to be updateable.
>
>What you mean by that? I mean, just to be clear. I care only about
>read-only views. The changes to views I care about change view because
>underlying tables are updated. I do not care about UPDATE queries
>against views themselves.
>
>So you mean here that it is not guaranteed that you can make an UPDATE
>query against a view? Yes, that I know. But if underlying tables
>change, view is always updated, no? I mean, this is the main purpose
>of a view. :-)

Terminology.   UPDATE vs update.

I'm certain you know this already, but to be clear:


A normal view really is just a SELECT whose results reflect the
current data in the underlying tables.  That is, a normal view is
*code*, NOT data, and its result is an anonymous *virtual* table that
exists only in the transaction that applied the view.  Thus the
results can't be monitored for changes without employing some kind of
deliberate caching.

A materialized view IS exactly such a deliberate cache of results from
applying a view.  It is a real table that can be monitored for changes
using INSERT, UPDATE and/or DELETE triggers.  


>> As a technical matter, you can put a trigger on a view, but it may never 
>> fire.
>
>If fires only for UPDATE queries against views themselves (and only if
>INSTEAD OF trigger does not invalidate the update). But it never fires
>for updates which happen because of changes to the underlying tables.

Right.  A trigger on a (normal) view actually is on the virtual result
table - but that table doesn't exist outside of the transaction that
applied the view.

I'm not certain offhand, but I *believe* that - like a SELECT - view
results will change on the fly during a transaction if the underlying
tables change and the isolation settings allow you to see it.

But even if a view will update while open, to use this you'd have to
hold the result set open (with a cursor) while also keeping the
transaction open.  Once the result set is closed, the view results are
gone.


With a materialized view, you must apply the view code again (call
REFRESH) to see changes to the underlying tables - it doesn't happen
automagically.  But when refreshed, triggers on the cache table would
react to changes.


>I would like to know:
>
>a) Do people agree/think that would be a good API for my use case?
>b) How hard would it be to implement/develop something like that? Is
>this something PostgreSQL already knows internally and it is just a
>question of exposing it?

It probably would be possible to take a trigger set on a view and
transitively trigger on the base tables underlying it.  

But to what end?  The results (of a normal view) don't persist beyond
the current transaction, and allowing them to do so would, AFAICS, be
a violation of the SQL standard.  Postgresql is not going to do that
(certainly not for a new feature, only for backward compatibility).

>c) Is there some better way to achieve this?
>  :
>Yes, materialized views are too heavy for me. But having update and
>delete triggers only memory-only temporary views would be perfect.

Unfortunately, materialized views are the only reasonable server side
solution that I can think of.


>Also, materialized views have to be manually refreshed, no? So it is
>not really true that they get updated automatically (and that update
>triggers would run) as soon as underling tables are modified?

Yes.  But you can do that with triggers on the base tables.  Or timers
if "most current" information is not critical.  

Perhaps you can use synchronized normal and materialized views.  Use
the normal view for current information in transactions, and the
materialized view to asynchronously update passive LISTEN clients with
delta information.
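
A rough sketch of that arrangement with made-up names (REFRESH ...
CONCURRENTLY needs a unique index on the materialized view):

   CREATE MATERIALIZED VIEW my_view_cache AS
     SELECT * FROM my_view;                     -- my_view is the normal view

   CREATE UNIQUE INDEX ON my_view_cache (id);   -- "id" assumed unique

   -- run periodically, or from triggers on the base tables
   REFRESH MATERIALIZED VIEW CONCURRENTLY my_view_cache;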


>> >I would like to not have to do SQL query parsing and understanding
>> >on the client.

That I understand.   I'm a compiler and language hobbyist ... 
even just parsing modern SQL can be painful.


>> So ideally, I would get information directly from
>> >PostgreSQL. For me, an API where I could do AFTER UPDATE trigger on
>> >FOR EACH ROW on a view would be perfect. In that trigger I could get
>> >information which rows of the view changed and then use NOTIFY to
>> >inform the client.
>>
>> You might want to use BEFORE UPDATE so you get both the old and new
>> row data.
>
>Not sure how this helps. 

If you know specifically what columns have changed, only they would
need be communicated to a LISTEN client.  Not necessary to send the
whole row (which may be much larger).

AFTER shows you only the new row.  BEFORE shows you both the old and
new rows so you can determine which columns actually changed.

Re: Watching for view changes

2018-12-23 Thread George Neuner
On Fri, 21 Dec 2018 23:41:16 -0800, Mitar  wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner  wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view.  It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between previous
>version and new. But if you want to just know new value, then I could
>imagine that (a simple implementation would) on every change to any
>underlying table check if this change matches selectors of the query
>and if such apply its operations/projections and produce the new
>value.

Yes, that could be done.  But it isn't.  In effect you are asking the
DBMS also to be a spreadsheet: i.e. change this cell and everything
that depends on it gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a
DBMS, but the combination would be an order of magnitude (or more)
harder.  Even object graph databases don't offer spreadsheet
functionality, and it would be a lot easier to do there than in a
table relational system.


>So yes, you need caching if you want to decrease CPU use, but you
>could also see it as new values being computed again and again through
>query. Would such caching you are mentioning really improve
>performance, I do not know, so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take
thousands of cycles [or millions if the data is on disk] to get those
two numbers into the multiplier.

There always are exceptions, but the general rule is that whenever the
result requires:
 - significant computation, 
 - significant resources, or
 - significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if
they overflow the work buffer).  A fetch of a table or index not in
memory is simple but takes a lot of time - as well as maybe pushing
something else out (increasing the complexity of a competing query).


>If we do go down the cache path, then I agree, materialized views
>seems nice, but I would prefer temporary materialized views: they
>should be cleaned up at the end of the session. Moreover, they should
>be ideally just in memory, not really on disk. Materialized views are
>currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables.
Only anonymous tables of query results can exist entirely in memory
[and even they can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and
have enough extra that you never run out of work buffers and never
overflow a work buffer.  But that is the best you can achieve with PG.

George




Re: Watching for view changes

2018-12-23 Thread George Neuner
On Sun, 23 Dec 2018 23:06:51 -0800, Mitar  wrote:

>On Fri, Dec 21, 2018 at 11:10 PM George Neuner  wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view.  It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Are you sure one can use triggers on a materialized view? I am getting:
>
>"my_materialized_view" is not a table or view
>
>as an error when I am trying to create a trigger on materialized view.

IIRC the cache table's name is generated.  I don't know the proper
incantations to get it from the catalogs, but an easy way to find it
is to put the materialized view into its own tablespace, then search
pg_tables for objects in that space.

George




Re: Watching for view changes

2018-12-24 Thread George Neuner
On Sun, 23 Dec 2018 10:10:50 -0800, Mitar  wrote:

>I can then wire triggers on underlying tables to REFRESH materialized
>views automatically. Is there some easy way to debounce those
>refreshes? If I just blindly trigger REFRESH in every row trigger,
>this could do a lot of work. I would prefer to combine all changes for
>example into 100 ms batches and refresh only once per 100 ms. So in
>some way I would like to be able to control the level of real-time I
>would need. I have found a blog post [1] about this, but it seems
>pretty tricky and requires an outside cron job. For 100 ms batching
>time this feels like something better done inside PostgreSQL itself.

see the pg_cron extension.

But cron doesn't allow sub-second scheduling. Why do you need it so
frequently?  
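
As a hedged sketch, pg_cron's smallest granularity is one minute (the
extension must be installed and preloaded; the view name is made up):

   CREATE EXTENSION pg_cron;

   SELECT cron.schedule('* * * * *',
          $$REFRESH MATERIALIZED VIEW CONCURRENTLY my_view_cache$$);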

PG is not a real time platform, or even an all-in-memory database.  If
you try to use it that way, you are guaranteed to have problems.


>The last question is how do I get changes in materialized views
>streamed to the client. It seems one option is a trigger on the
>materialized view which uses NOTIFY to tell the client about the
>change. But NOTIFY has limit on the payload size, so I cannot just
>send the change to the client. 

If you trigger by row, the NOTIFY payload is just that one row.  If
the data can't fit into the 8K payload, then you need to send some
kind of row id and have the client read the changes explicitly.
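
E.g., a hedged fragment for the trigger body, assuming a primary key
column named "id" (for DELETE use OLD instead of NEW):

   PERFORM pg_notify('my_view_changed', TG_OP || ':' || NEW.id::text);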

Also remember that you may have to deal with DELETEd rows. If you
can't send row data by NOTIFY, then the client HAS to cache the whole
view anyway to see what's been deleted.


>It seems I would need additional table
>to store the change and then client would get notification, read from
>that table, and remove the rows read. So in a way I would need my own
>queue for changes. 

There's no reason to do that. The client has to interpret the view
changes and incorporate them into its own local data structures.
Segregating new/modified rows separately on the server side seems to
me to be a waste of effort.

If it makes sense, have the client collect some number of
notifications and read all the indicated rows in one query.


George




Re: Determine in a trigger if UPDATE query really changed anything

2018-12-24 Thread George Neuner
On Sun, 23 Dec 2018 20:21:22 -0800, Mitar  wrote:

>Currently I am doing:
>
>CREATE TRIGGER some_trigger AFTER UPDATE ON my_table REFERENCING NEW
>TABLE AS new_table OLD TABLE AS old_table FOR EACH STATEMENT EXECUTE
>FUNCTION my_trigger();
>
>In my trigger I do:
>
>PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table) UNION ALL
>(TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
>IF FOUND THEN
>  ...
>
>But I wonder if there is an easier way. I would just like to know if
>an UPDATE really changed anything.
>
>For DELETE I do "PERFORM * FROM old_table LIMIT 1"  and for INSERT I
>do "PERFORM * FROM new_table LIMIT 1" and I think this is reasonable.
>Still, not sure why I have to store the whole relation just to know if
>statement really changed anything.

You're still thinking in terms of deltas for the whole table.  Think
individual rows instead.  

With a BY ROW trigger, the difference between the NEW and OLD
variables lets you see the changes to the particular row.
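
For completeness, a hedged sketch of the row-level form (names are
hypothetical); the WHEN clause keeps the trigger from firing at all for
no-op updates:

   CREATE TRIGGER my_table_really_changed
     AFTER UPDATE ON my_table
     FOR EACH ROW
     WHEN (OLD.* IS DISTINCT FROM NEW.*)
     EXECUTE FUNCTION my_row_trigger();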

George




Re: Watching for view changes

2018-12-24 Thread George Neuner
On Mon, 24 Dec 2018 01:16:53 -0800, Mitar  wrote:

>
>I am unable to find cache table's name. :-(
>

Did you execute the view or simply define it?  I think the table is
created the 1st time the view is executed.

George




Re: Varlena with recursive data structures?

2019-01-16 Thread George Neuner
On Wed, 16 Jan 2019 22:08:35 +, Sam Patterson
 wrote:

>Hi all,
>
>I've recently started developing an extension for Postgres for which I'll
>need to create a new variable-length base type. The type will require a
>tree-like structure in order to parse sufficiently, which of course
>probably means having some sort of recursive data structure, like a struct
>that has members which are pointers to itself for child nodes. After doing
>some research, specifically looking at how other variable-length data types
>store their data, it seems almost all of them store the data in a binary
>representation, using bit masks and offsets etc in order to store/access
>the data whilst having an in-memory representation that's used to
>manipulate the data.
>
>I presume the purpose for using this approach is because all the data in a
>varlena type has to be contiguous, and the moment you start using pointers
>this is no longer possible. So my question is, given a structure that looks
>something like this,
>
>typedef struct Node
>{
>char *data;
>Node *left;
>Node *right;
>} Node;
>
>am I right in saying that I wouldn't be able to store that representation
>on-disk, but instead I'd have to transform it into some binary
>representation and back again when writing/reading respectively, are there
>any alternatives?
>
>Regards,
>
>Karl


You might want to consider using a linear tree representation that can
be searched directly.  See:
https://en.wikipedia.org/w/index.php?title=Binary_tree&section=9#Arrays


E.g.,

Make the tree nodes:  { unsigned data, left, right }
Store the nodes contiguously as an array, and represent the left/right
branches using simple array indexing.

Write the node data contiguously into a separate array/stream, keeping
an offset [or index] to each data element in its correponding tree
node.

And then write the node array and the data array/stream contiguously
onto the disk.


For reading/searching this structure can be used directly: read in the
homogenous node array, and suck the variable length data stream into a
separate buffer.

If it isn't important that the tree be optimally ordered, then insert
is simple: just append a new node to the array, append its data to the
data stream, and update branch indexes to link the new node at the
correct spot in the tree.

If you want the data to be as small as possible on disk, then update
or delete of a node requires repacking the data and adjusting the data
offsets of the "unchanged" nodes.
https://en.wikipedia.org/wiki/Mark-compact_algorithm

[Obviously, you can delete without repacking by just unlinking the
tree node and rewriting the node array.  The "deleted" node data is
left in place on disk but it becomes unreachable.]

If you can afford to (temporarily) sort the tree nodes by their data
offset, repacking the data can be done in the same buffer by sliding
each element forward to be contiguous with the previous.  If you need
to take the nodes in whatever order they occur, then to pack the data
you need to copy it to a new buffer.


Hope this helps.
George




Re: Future Non-server Windows support???

2019-03-03 Thread George Neuner
On Sun, 03 Mar 2019 08:30:48 +0100, Laurenz Albe
 wrote:

>Bill Haught wrote:
>> My main concern is that Microsoft has Enterprise versions of Windows and 
>> versions for everything else which makes me wonder if at some point 
>> Windows versions for desktop use may not have features needed by some 
>> database applications or differences between the versions may be enough 
>> to necessitate slight tweaks to code and compiling additional versions.
>
>Speaking as a semi-ignorant, I had the impressions that all Windows versions
>are pretty similar under the hood (with occasional annoying behavior changes),
>and most of the differences are on the GUI level, while the C API is pretty
>much the same.
>
>Yours,
>Laurenz Albe

Server versions are optimized for running background services rather
than interactive programs.  I don't know all the differences offhand,
but I do know the servers use different algorithms (not just settings)
for scheduling, memory management, and file caching.  

It isn't possible to tweak a Windows desktop into a server with just
settings - the server actually is running (at least some) different
code.

George




Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

2017-12-12 Thread George Neuner
On Tue, 12 Dec 2017 07:18:10 -0600, John McKown
 wrote:

>This is a guess on my part, based on many years on an EBCDIC system. But
>I'll bet that they are doing a conversion off of the EBCDIC system (maybe
>Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
>want to be able to compare the output from the existing system to the
>output on the new system. 

[he asks naively:]

If a port off the EBCDIC system really is the case, wouldn't it be
easier just to write an external compare tool?   Or maybe a PG
function to convert UTF strings to EBCDIC so as to compare string
column dumps?

Either of the above seems more rational than putting a whole lot of
effort into creating an EBCDIC collation.   

[Not that it is terribly hard to create a new collation.  It's more a
matter of "why bother?" when you are trying to move away from that
system in the 1st place.]

YMMV,
George




Re: data-checksums

2018-01-09 Thread George Neuner
"On Tue, 9 Jan 2018 20:02:37 +0100, "Rakesh Kumar"
 wrote:

>Hello Mr. Pedantic,
>
>> noticeable != huge.
>
>and noticeable != small/negligible either, at least from English
>point of view.

small != negligible. 

The word "noticable" does not imply any particular magnitude of event.
It means only that  can be observed.

There is no canon technical usage.  In layman's use, "noticable" often
*does* imply that an effect is small, and that one might not see it if
not looking specifically for it.


Unfortunately, English is a slippery language.  Perhaps technical
documents should be written in Sumerian.

YMMV,
George




Re: SSD filesystem aligned to DBMS

2018-01-16 Thread George Neuner
On Tue, 16 Jan 2018 16:50:28 +, Michael Loftis 
wrote:

>Alignment definitely makes a difference for writes. It can also make a
>difference for random reads as well since the underlying read may not line
>up to the hardware add in a read ahead (at drive or OS Level) and you’re
>reading far more data in the drive than the OS asks for.

Best performance will be when the filesystem block size matches the
SSD's writeable *data* block size.  The SSD also has a separate erase
sector size which is some (large) multiple of the data block size.



Recall that an SSD doesn't overwrite existing data blocks.  When you
update a file, the updates are written out to *new* "clean" data
blocks, and the file's block index is updated to reflect the new
structure.  

The old data blocks are marked "free+dirty".  They must be erased
(become "free+clean") before reuse.  Depending on the drive size, the
SSD's erase sectors may be anywhere from 64MB..512MB in size, and so a
single erase sector will hold many individually writeable data blocks.

When an erase sector is cleaned, ALL the data blocks it contains are
erased.  If any still contain good data, they must be relocated before
the erase can be done.



You don't want your filesystem block to be smaller than the SSD data
block, because then you are subject to *unnecessary* write
amplification: the drive controller has to read/modify/write a whole
data block to change any part of it.

But, conversely, filesystem blocks that are larger than the SSD write
block typically are not a problem because ... unless you do something
really stupid [with really low level code] ... the large filesystem
blocks will end up be an exact multiple of data blocks.


Much of the literature re: alignment actually is related to the erase
sectors rather than the data blocks and is targeted at embedded
systems that are not using conventional filesystems but rather are
accessing the raw SSD.

You do want your partitions to start on erase sector boundaries, but
that usually is trivial to do.


>Stupidly a lot of this isn’t published by a lot of SSD manufacturers, but
>through benchmarks it shows up.

Yes.  The advice to match your filesystem to the data block size is
not often given.


>Another potential difference here with SAS vs SATA is the maximum queue
>depth supported by the protocol and drive.

Yes. The interface, and how it is configured, matters greatly.


>SSD drives also do internal housekeeping tasks for wear leveling on writing.

The biggest of which is always writing to a new location.  Enterprise-
grade SSDs sometimes do perform erases ahead of time during idle
periods, but cheap drives often wait until the free+dirty space is to
be reused.


>I’ve seen SSD drives benchmark with 80-90MB sequential read or write,
>change the alignment, and you’ll get 400+ on the same drive with sequential
>reads (changing nothing else)
>
>A specific example
>https://www.servethehome.com/ssd-alignment-quickly-benchmark-ssd/

I believe you have seen it, but if the read performance changed that
drastically, then the controller/driver was doing something awfully
stupid ... e.g., re-reading the same data block for each filesystem
block it contains.


YMMV.
George




Re: Regex Replace with 2 conditions

2018-02-06 Thread George Neuner
On Mon, 5 Feb 2018 18:22:02 +0100, Francisco Olarte
 wrote:


>I repeat for the last time. YOU ARE NOT USING ASCII. ASCII IS A SEVEN
>BIT CODE, 0-128. "?" IS NOT IN THE ASCII CHARACTER SET.

What ASCII table are you reading?  The question mark symbol is #63. It
lies between the numbers and the capital letter set.
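
Trivially checkable (Python shown, but any language will do):

print(ord('?'))              # 63
print(ord('9'), ord('A'))    # 57 65 - '?' sits between the digits and capitals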

George




Re: Regex Replace with 2 conditions

2018-02-06 Thread George Neuner


On 2/6/2018 10:52 AM, David G. Johnston wrote:
On Tue, Feb 6, 2018 at 8:46 AM, George Neuner  wrote:


On Mon, 5 Feb 2018 18:22:02 +0100, Francisco Olarte
 wrote:


>I repeat for the last time. YOU ARE NOT USING ASCII. ASCII IS A SEVEN
>BIT CODE, 0-128. "?" IS NOT IN THE ASCII CHARACTER SET.

What ASCII table are you reading?  The question mark symbol is #63. It
lies between the numbers and the capital letter set.


Your mail client mangled that sentence - the "?" you are seeing is a
placeholder for the non-ASCII character "'a' with a bowl on top of it"...


Thanks David.   Apologies to everyone for the noise.

George



Re: Regex Replace with 2 conditions

2018-02-06 Thread George Neuner
On Tue, 6 Feb 2018 17:57:33 +0100, Francisco Olarte
 wrote:

>So, no question mark sent, I suspect your mail chain may be playing
>tricks on you, or may be you are translating to 7 bits on purpose
>since your mail came with the headers:
>

>Content-Type: text/plain; charset=us-ascii
>Content-Transfer-Encoding: 7bit
><<<
>
>I'll suggest you fix that before participating in threads with unicode
>content. Also, many programs use ? as a placeholder for something not
>in its charset, so always suspect you are not seeing the right char
>when you encounter one of this things.

Usually I do see unicode characters correctly.

My news client defaults to *sending* in ISO-8859-1 (Latin-1), but it
displays incoming messages in UTF-8, and in HTML if applicable ... so
I'm not sure why I'm not seeing whatever it was that you actually
typed.  It does keep coming through as a question mark in all the
responses.
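
For what it's worth, a 7-bit re-encode somewhere along the way would do
exactly that.  In Python terms (the character below is only a guess at
the one you actually typed):

s = 'a\u0103'                               # 'a' plus 'ă' (a with breve, U+0103)
print(s.encode('ascii', errors='replace'))  # b'a?' - the non-ASCII char becomes '?'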

I read this group through the Gmane mail->news reflector ... maybe
that has something to do with it?

George




Re: Connection loosing at some places - caused by firewall

2018-02-13 Thread George Neuner
On Tue, 14 Nov 2017 12:09:31 +0100, Durumdara 
wrote:


>*I disabled my firewall at home - the [keepalive] problem vanished!!!*

What firewall are you using?  Windows' own firewall doesn't interfere
with keepalive packets.  Most commercial SOHO firewalls won't either.

George




Re: shared_buffers 8GB maximum

2018-02-16 Thread George Neuner
On Sat, 17 Feb 2018 00:36:57 +0200, Vitaliy Garnashevich
 wrote:


>- I'm not a huge Linux expert, but I've heard someone saying that 
>reading from the filesystem cache requires a context switch. 

Yes.

>I suspect >that such reads are slightly more expensive now after the 
>Meltdown/Spectre patch in the kernel.

Not necessarily - it depends on exactly what was changed ... which
unfortunately I don't know for certain.

Any filesystem call is a kernel transition.  That's a Meltdown issue. 
Meltdown can be avoided by using trampoline functions to call the
(real) kernel functions and isolating each trampoline so that no other
code immediately follows it.  This wastes some memory but there is
very little added time cost.


Spectre is about snooping within the user space of a single process -
it has nothing to do with kernel calls.  The issues with Spectre are
things like untrusted code breaking out of "sandboxes", snooping on
password handling or encryption, etc.

Fixing Spectre requires purposefully limiting speculative execution of
code and can significantly affect performance.  But the effects are
situation dependent.


>Could that be a reason for increasing the value of shared_buffers?
>
>- Could shared_buffers=128GB or more on a 250 GB RAM server be a 
>reasonable setting? What downsides could there be?

It depends.  8GB is pretty small for such a large server, but taking
1/2 the RAM is not necessarily the right thing either.

The size of shared buffers affects log size and the time to complete
checkpoints.  If a large(ish) percentage of your workload is writes,
having a very large shared space could be bad for performance, or bad
for space on the log device.

Another reason may be that the server is not dedicated to PG but does
other things as well.  Dramatically increasing PG's memory use may
negatively impact something else.


>PS. Some background. We had shared_buffers=8GB initially. In 
>pg_stat_bgwriter we saw that dirty buffers were written to disk more 
>frequently by backends than during checkpoints (buffers_clean > 
>buffers_checkpoint, and buffers_backend > buffers_checkpoint). According 
>to pg_buffercache extension, there was very small percentage of dirty 
>pages in shared buffers. The percentage of pages with usagecount >= 3 
>was also low. Some of our more frequently used tables and indexes are 
>more than 10 GB in size. This all suggested that probably the bigger 
>tables and indexes, whenever scanned, are constantly flushing pages from 
>the shared buffers area. After increasing shared_buffers to 32GB, the 
>picture started looking healthier. There were 1GB+ of dirty pages in 
>shared buffers (vs 40-200MB before), 30-60% of pages with usagecount >= 
>3 (vs 10-40% before), buffers_checkpoint started to grow faster than 
>buffers_clean or buffers_backend. There is still not all frequently used 
>data fits in shared_buffers, so we're considering to increase the 
>parameter more. I wanted to have some idea about how big it could 
>reasonably be.

So now you know that 32GB is better for your workload than 8GB.  But
that is not necessarily a reason immediately to go crazy with it.  Try
increasing it gradually - e.g., adding 16GB at a time - and see if the
additional shared space provides any real benefit.
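
A rough way to watch the effect of each step (untested sketch; it
assumes psycopg2 and the pg_stat_bgwriter columns you already listed;
the connection string is made up):

import time
import psycopg2

QUERY = """SELECT buffers_checkpoint, buffers_clean, buffers_backend
           FROM pg_stat_bgwriter"""

def sample(dsn):
    # Snapshot the background writer counters.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        return dict(zip(('checkpoint', 'clean', 'backend'), cur.fetchone()))

if __name__ == '__main__':
    dsn = "dbname=mydb user=postgres"   # hypothetical connection string
    before = sample(dsn)
    time.sleep(3600)                    # an hour of normal workload
    after = sample(dsn)
    for name in before:
        # Healthy sign after a bump: the checkpoint delta grows fastest.
        print(name, after[name] - before[name])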


>PPS. I know any possible answer might be just a recommendation, and 
>parameter values should be tested for each specific case, but still 
>wanted to hear your opinion. Thanks.
>
>Regards,
>Vitaliy

George




Re: Database health check/auditing

2018-02-16 Thread George Neuner
On Sat, 17 Feb 2018 07:23:32 +1100, Tim Cross 
wrote:

>Thomas Kellerer  writes:
>
>> Plus: trimming the original content, so that not the whole email
>> thread is repeated in the quote.
>
>While I'm happy to comply, I disagree with trimming/editing the
>thread. Certainly made sense when networks were slower and MUAs were
>less sophisticated, but these days I prefer using a MUA which is able to
>fold the original content and keep all the content in the message. This
>reduces confusion, misunderstanding and incorrect attribution arising
>from poor editing and makes it easy to reference the full discussion in
>one spot rather than having to go back through messages putting it all
>back together. YMMV.
>
>Tim

The problem is that conversations rarely remain confined to a single
topic [this thread for example 8-)].

The list servers do have message size limits (independent of any email
server limits).  It isn't a problem for simple text messages, but a
lot of people now are sending MIME messages that include all the text
twice (or 3 times) plus embedded HTML+CSS formatting.

On some lists, I have seen MIME messages (mostly sent from Google
Groups users) that contain - not kidding - 5,000+ lines of hidden
HTML+CSS surrounding a few lines of comment.


I agree with you that messages should stand alone with respect to the
subject they address.  It's just that, too often, messages that contain
a lot of quoted material have wandered considerably and so are mostly
noise.

YMMV,
George




Re: Connection loosing at some places - caused by firewall

2018-02-19 Thread George Neuner
On Mon, 19 Feb 2018 13:02:30 +0100, Durumdara 
wrote:

>2018-02-13 21:21 GMT+01:00 George Neuner :
>
>> On Tue, 14 Nov 2017 12:09:31 +0100, Durumdara 
>> wrote:
>>
>> >*I disabled my firewall at home - the [keepalive] problem vanished!!!*
>>
>> What firewall are you using?  Windows own firewall doesn't interfere
>> with keepalive packets.  Most commercial SOHO firewalls won't either.
>>
>Normal Windows Firewall (Defender). An ASUS test notebook works. The DELL
>isn't.
>So something is different in the FW on DELL. With manually defined (faster)
>keepalive it's working.

Keepalive is part of the TCP protocol - it's an empty ACK segment that
carries an already-used sequence number (one below the next value
expected).  Under normal circumstances the sequence number advances
(modulo rollover) with each new packet, so the receiver recognizes a
keepalive as a deliberate and particular repetition rather than new
data.

Windows firewall has no settings pertaining to keepalive separate from
the operating system.  Once you have permitted TCP connections to the
port or program, if any keepalive packets are sent, the firewall must
allow them through.

[Of course, both sides must agree on the keepalive settings for it to
work, but that is a different issue.]
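
Incidentally, the faster keepalive you defined manually can also be
requested per connection: the keepalives_* settings below are standard
libpq connection parameters.  A sketch through psycopg2 (the host and
database names are made up; keepalives_count may be ignored on
Windows):

import psycopg2

conn = psycopg2.connect(
    "host=db.example.com dbname=app user=app"
    " keepalives=1"              # enable TCP keepalive for this connection
    " keepalives_idle=60"        # seconds of idle time before the first probe
    " keepalives_interval=10"    # seconds between unanswered probes
    " keepalives_count=3"        # lost probes before the connection is dropped
)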


It may be that your firewall is f'd up.  The firewall does perform
"stateful inspection" of open TCP connections - essentially looking
for protocol mistakes that may indicate intrusion hacks or replayed
connections.  It could be that the SI module is losing synchronization
when the connection goes idle.
[Back in the day, there were a number of SOHO NAT routers that had
this and other problems with their stateful inspection code.]


You can try running a system file check to see if something has gotten
corrupted:
https://support.microsoft.com/en-us/help/929833/use-the-system-file-checker-tool-to-repair-missing-or-corrupted-system

You also can try resetting the firewall rules to default from the
control panel.  Of course then you'll have to reauthorize every
program that requires a connection.  


For more information on how the firewall works, see:
https://technet.microsoft.com/en-us/library/cc779199(v=ws.10).aspx


Hope this helps,
George