Re: dump/restore problem due to CVE-2018-1058 (9.5.12)

2018-04-08 Thread Rory Campbell-Lange
On 07/04/18, Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> > (I'm aware that the reasons behind the change made to the dump format
> > due to CVE-2018-1058 are set out here:
> > https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_Protect_Your_Search_Path)
> > 
> 
> > Additionally we sometimes use search_path manipulations +
> > temporary_schema.function to test functions in production environments.
> > Having to qualify the schema of objects seems a retrogressive step, but
> > perhaps our usage is peculiar in this way.
> 
> AFAIK you can still do that or did I miss something?

Yes, you can still do this. However, if my function in schema x can
still mask the function in schema y, I suggest the security issue still
exists (as the CVE doesn't refer, at least in its title, to only the
'public' schema):

https://nvd.nist.gov/vuln/detail/CVE-2018-1058#vulnCurrentDescriptionTitle
A flaw was found in the way PostgreSQL allowed a user to modify the
behavior of a query for other users. An attacker with a user account
could use this flaw to execute code with the permissions of superuser in
the database. Versions 9.3 through 10 are affected.

So if in my database the default search path is x, y, z this "flaw"
still exists.
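The masking itself is easy to demonstrate; a minimal sketch (schema and
function names are illustrative, and a schema listed in search_path that
doesn't exist is simply skipped):

```sql
CREATE SCHEMA x;
CREATE SCHEMA y;
CREATE FUNCTION y.f() RETURNS text AS $$ SELECT 'from y' $$ LANGUAGE sql;
CREATE FUNCTION x.f() RETURNS text AS $$ SELECT 'from x' $$ LANGUAGE sql;

SET search_path = x, y, z;  -- z need not exist; missing schemas are skipped

SELECT f();  -- resolves to x.f(), masking y.f()
```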

> > Also, in a coding environment where object.attribute masking is a
> > feature of the language, as it is in python, this change seems obtuse.
> > My function in schema x can still mask a function in another schema y,
> > so the problem of function masking (if it is a problem) still exists.
> 
Are you talking Python external or internal to Postgres?

I'm talking about how schema.function works in general in postgresql,
how useful that is, and how that is similar to other languages (like
python).

My further suggestion, admittedly from a naive perspective, is that the
solution to this problem is inadequate and partial, and that other
techniques should be used to solve it, such as making the masking of
functions in pg_catalog a new user permission or changing the default
search path of postgres superusers to prepend pg_catalog.

It still isn't clear to me why the output from pg_dump has a search_path
set to ''. That seems to be security through obscurity.

Thanks very much for your comments
Rory



Re: dump/restore problem due to CVE-2018-1058 (9.5.12)

2018-04-08 Thread Adrian Klaver

On 04/08/2018 03:40 AM, Rory Campbell-Lange wrote:

On 07/04/18, Adrian Klaver (adrian.kla...@aklaver.com) wrote:

(I'm aware that the reasons behind the change made to the dump format
due to CVE-2018-1058 are set out here:
https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_Protect_Your_Search_Path)




Additionally we sometimes use search_path manipulations +
temporary_schema.function to test functions in production environments.
Having to qualify the schema of objects seems a retrogressive step, but
perhaps our usage is peculiar in this way.


AFAIK you can still do that or did I miss something?


Yes, you can still do this. However, if my function in schema x can
still mask the function in schema y, I suggest the security issue still
exists (as the CVE doesn't refer, at least in its title, to only the
'public' schema):

 https://nvd.nist.gov/vuln/detail/CVE-2018-1058#vulnCurrentDescriptionTitle
 A flaw was found in the way PostgreSQL allowed a user to modify the
 behavior of a query for other users. An attacker with a user account
 could use this flaw to execute code with the permissions of superuser in
 the database. Versions 9.3 through 10 are affected.

So if in my database the default search path is x, y, z this "flaw"
still exists.


The above refers to the general case and is correct in that regard. The 
Wiki link you shared in your first post details the case that puts you 
at most risk:


https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_Protect_Your_Search_Path 



Background: What is CVE-2018-1058?

"
The PostgreSQL 7.3 release introduced "schemas," which allowed users to 
create objects (e.g. tables, functions, etc.) in separate namespaces. 
When a user creates a database, by default, PostgreSQL creates a schema 
called public where, by default, all new objects (e.g. tables, 
functions) are created.


...

Without adjusting the configuration or access control settings, any user 
that can connect to a database can also create objects in the public 
schema for that database. A PostgreSQL administrator can grant and 
revoke permissions for a user to both use and/or create objects within a 
particular schema.


Schemas allow users to namespace objects, so objects of the same name 
can exist in different schemas in the same database. If there are 
objects with the same name in different schemas and the specific 
schema/object pair is not specified (i.e. schema.object), PostgreSQL 
decides which object to use based on the search_path setting. The 
search_path setting specifies the order the schemas are searched when 
looking for an object. The default value for search_path is $user,public 
where $user refers to the name of the user connected (which can be 
determined by executing SELECT SESSION_USER;).

"

The Problem: CVE-2018-1058

"
The problem described in CVE-2018-1058 centers around the default 
"public" schema and how PostgreSQL uses the search_path setting.

"

Boiled down it means that out of the box Postgres allows any user to 
create objects in the 'public' schema and the default search_path 
includes that schema. This allows an unprivileged user to create a 
masking function, though the ability to mask is dependent on other 
factors as shown in the example in 'The Problem' section above.
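A minimal sketch of that trojan-horse pattern, along the lines of the
Wiki's example (names are illustrative):

```sql
-- As an unprivileged user who can create objects in public:
CREATE FUNCTION public.lower(varchar) RETURNS text AS $$
  SELECT 'trojan ' || pg_catalog.lower($1::text);
$$ LANGUAGE sql;

-- Another user whose search_path includes public and who calls
-- lower() on a varchar value may now get the masked version, since
-- the varchar signature is a closer match than pg_catalog.lower(text):
SELECT lower('X'::varchar);
```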


You are correct in that this can happen between other schemas as well. 
The difference is that setting up those schemas is done by the DBA not 
the project and so it is up to the DBA to enforce security. What it 
comes down to is that the release was not a fix as much as a heads up:


"The purpose of the release was to address CVE-2018-1058, which 
describes how a user can create like-named objects in different schemas 
that can change the behavior of other users' queries and cause 
unexpected or malicious behavior, also known as a "trojan-horse" attack. 
Most of this release centered around added documentation that described 
the issue and how to take steps to mitigate the impact on PostgreSQL 
databases. "


More comments below.




Also, in a coding environment where object.attribute masking is a
feature of the language, as it is in python, this change seems obtuse.
My function in schema x can still mask a function in another schema y,
so the problem of function masking (if it is a problem) still exists.


Are you talking Python external or internal to Postgres?


I'm talking about how schema.function works in general in postgresql,
how useful that is, and how that is similar to other languages (like
python).

My further suggestion, admittedly from a naive perspective, is that the
solution to this problem is inadequate and partial, and that other
techniques should be used to solve it, such as making the masking of
functions in pg_catalog a new user permission or changing the default
search path of postgres superusers to prepend pg_catalog.

It still isn't clear to me why the output from pg_dump has a search_path
set to ''. That seems to be security through obscurity.



Sum of written buffers bigger than allocation?

2018-04-08 Thread pinker
Hi All!

This is the first time I have seen the sum of written buffers:
3765929+878326121 = 882092050 = select
pg_size_pretty(882092050::numeric*8192) = 6730 GB

is bigger than buffers_alloc value in pg_stat_bgwriter view:
buffers_alloc | 20426161 = 156 GB


buffers_checkpoint| 878599316
buffers_clean | 0
maxwritten_clean  | 0
buffers_backend   | 3766409
buffers_backend_fsync | 0
buffers_alloc | 20426161


It's kinda mysterious... Any explanations?
Does it mean that the same buffers were written over and over again?








Re: Sum of written buffers bigger than allocation?

2018-04-08 Thread Jeff Janes
On Sun, Apr 8, 2018 at 11:28 AM, pinker  wrote:

>
>
> It's kinda mysterious... Any explanations?
> Does it mean that the same buffers were written over and over again?
>
>
Yeah, checkpoints will write all dirty buffers, but don't evict them.
Next time the page is needed, it doesn't need to be re-read as it is still
there.
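The numbers from the original post can be recomputed directly; a sketch
(the values depend on when the statistics were last reset):

```sql
SELECT pg_size_pretty((buffers_checkpoint + buffers_backend) * 8192::bigint)
         AS written,
       pg_size_pretty(buffers_alloc * 8192::bigint) AS allocated
FROM pg_stat_bgwriter;
```

With the posted values this gives roughly 6730 GB written against 156 GB
allocated, which is consistent with the same buffers being written
repeatedly across checkpoints.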

Cheers,

Jeff


Re: Conflict between JSON_AGG and COPY

2018-04-08 Thread Adrian Klaver

On 04/07/2018 03:44 AM, Đỗ Ngọc Trí Cường wrote:

Dear all,

I've found one case. I don't know whether this is a bug or whether I have
configured/queried something incorrectly.


Let me describe it. I have a table with this structure and data:

  id | username |   fullname
 ----+----------+---------------
   1 | john     | John
   2 | anna     | Anna
   3 | sussi    | Sussi
   4 | david    | David Beckham
   5 | wayne    | Wayne Rooney

I want to export it to a file in JSON format so I run the query as below:

COPY (SELECT row_to_json(t) FROM (SELECT json_agg(t1) AS "RECORDS"
FROM test_table t1) t) TO '/home/postgres/test1.json';

But the result I get includes "\n" characters:

{"RECORDS":[{"id":1,"username":"john","fullname":"John"}, \n
{"id":2,"username":"anna","fullname":"Anna"}, \n
{"id":3,"username":"sussi","fullname":"Sussi"}, \n
{"id":4,"username":"david","fullname":"David Beckham"}, \n
{"id":5,"username":"wayne","fullname":"Wayne Rooney"}]}

Then, I try to get the same data in the other way:

COPY (WITH t2 AS (select row_to_json(t) as js from test_table t),
         t1 AS (SELECT ARRAY_TO_JSON(array_agg(js)) as "RECORDS"
FROM t2)
     SELECT row_to_json(t1) FROM t1)
     TO '/home/postgres/test2.json';


CREATE TABLE test_table (id integer, username varchar, fullname varchar);

INSERT INTO
test_table
VALUES
(1, 'john', 'John'),
(2, 'anna',  'Anna'),
(3, 'sussi',  'Sussi'),
(4, 'david', 'David Beckham'),
(5, 'wayne', 'Wayne Rooney');


This can be shortened to:

COPY
(select array_to_json(array_agg(row_to_json(t))) FROM test_table AS t) 
TO '/home/postgres/test2.json';




And the result I got quite matches what I expect.


{"RECORDS":[{"id":1,"username":"john","fullname":"John"},{"id":2,"username":"anna","fullname":"Anna"},{"id":3,"username":"sussi","fullname":"Sussi"},{"id":4,"username":"david","fullname":"David
Beckham"},{"id":5,"username":"wayne","fullname":"Wayne Rooney"}]}


I think the COPY command does not strip the `\n` characters that the
`json_agg` command adds for pretty-printing.


Well, in the first case you start by concatenating the 5 rows in the
table into a single row, with the table rows separated by newlines:


SELECT json_agg(t1) AS "RECORDS" FROM test_table t1;
  RECORDS
---
 [{"id":1,"username":"john","fullname":"John"},   +
  {"id":2,"username":"anna","fullname":"Anna"},   +
  {"id":3,"username":"sussi","fullname":"Sussi"}, +
  {"id":4,"username":"david","fullname":"David Beckham"}, +
  {"id":5,"username":"wayne","fullname":"Wayne Rooney"}]
(1 row)


In the second case you start by maintaining the separate table rows:

select row_to_json(t) as js from test_table t;
                           js
--------------------------------------------------------
 {"id":1,"username":"john","fullname":"John"}
 {"id":2,"username":"anna","fullname":"Anna"}
 {"id":3,"username":"sussi","fullname":"Sussi"}
 {"id":4,"username":"david","fullname":"David Beckham"}
 {"id":5,"username":"wayne","fullname":"Wayne Rooney"}
(5 rows)

and then aggregating those rows into an array of JSON values:

select array_agg(row_to_json(t)) from test_table t;


{"{\"id\":1,\"username\":\"john\",\"fullname\":\"John\"}","{\"id\":2,\"username\":\"anna\",\"fullname\":\"Anna\"}","{\"id\":3,\"username\":\"sussi\",\"fullname\":\"Sussi\"}","{\"id\":4,\"username\":\"david\",\"fullname\":\"David 
Beckham\"}","{\"id\":5,\"username\":\"wayne\",\"fullname\":\"Wayne 
Rooney\"}"}

(1 row)

which then gets turned back into JSON:

select ARRAY_TO_JSON(array_agg(row_to_json(t))) from test_table t;

[{"id":1,"username":"john","fullname":"John"},{"id":2,"username":"anna","fullname":"Anna"},{"id":3,"username":"sussi","fullname":"Sussi"},{"id":4,"username":"david","fullname":"David 
Beckham"},{"id":5,"username":"wayne","fullname":"Wayne Rooney"}]





Please give me your thoughts. Am I doing something wrong, or is this really a bug?

Thank you and best regards,

Đỗ Ngọc Trí Cường (Mr.)

Software Development Dept.



Mobile: +84 9 3800 3394 

Phone: +84 28 3715 6322 

Email: dntcu...@digi-texx.vn

DIGI-TEXX | a global BPO provider

Address: Anna Building, Quang Trung Software City,

District. 12, Ho Chi Minh City, Vietnam

Website: www.digi-texx.vn 






--
Adrian Klaver
adrian.kla...@aklaver.com



Re: dump/restore problem due to CVE-2018-1058 (9.5.12)

2018-04-08 Thread Rory Campbell-Lange
Thanks for your comprehensive response, Adrian.

On 08/04/18, Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 04/08/2018 03:40 AM, Rory Campbell-Lange wrote:
> > On 07/04/18, Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> > > > (I'm aware that the reasons behind the change made to the dump format
> > > > due to CVE-2018-1058 are set out here:
> > > > https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_Protect_Your_Search_Path)

> > So if in my database the default search path is x, y, z this "flaw"
> > still exists.
> 
> The above refers to the general case and is correct in that regard. The Wiki
> link you shared in your first post details the case that puts you at most
> risk:
> 
> https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_Protect_Your_Search_Path

...

> Without adjusting the configuration or access control settings, any user
> that can connect to a database can also create objects in the public schema
> for that database.

...

> The Problem: CVE-2018-1058
> 
> "
> The problem described in CVE-2018-1058 centers around the default "public"
> schema and how PostgreSQL uses the search_path setting.
> "
> 
> Boiled down it means that out of the box Postgres allows any user to create
> objects in the 'public' schema and the default search_path includes that
> schema. This allows an unprivileged user to create a masking function,
> though the ability to mask is dependent on other factors as shown in the
> example in 'The Problem' section above.
> 
> You are correct in that this can happen between other schemas as well. The
> difference is that setting up those schemas is done by the DBA not the
> project and so it is up to the DBA to enforce security. What it comes down
> to is that the release was not a fix as much as a heads up:
> 
> "The purpose of the release was to address CVE-2018-1058, which describes
> how a user can create like-named objects in different schemas that can
> change the behavior of other users' queries and cause unexpected or
> malicious behavior, also known as a "trojan-horse" attack. Most of this
> release centered around added documentation that described the issue and how
> to take steps to mitigate the impact on PostgreSQL databases. "

Thank you for setting out the rationale behind the changes so clearly.

> > It still isn't clear to me why the output from pg_dump has a search_path
> > set to ''. That seems to be security through obscurity.
> 
> I see it more as a way to flag those instances that fail the recommendations
> in the Wiki article e.g. :
> 
> "As a result, pg_restore now fails because we have some table constraints
> that use functions which do not use public schema qualified table/column
> references. "

Fair enough. It is however a tedious problem to resolve in a large
code base and it would be cool to have a new "--set-search-path"
option to pg_dump to override it.
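For reference, the per-function mitigation the wiki suggests amounts to
something like this for every affected function (the function name here
is illustrative), which is what makes it tedious at scale:

```sql
-- Pin the function's own search_path so it no longer depends on
-- the caller's setting:
ALTER FUNCTION my_schema.check_balance(integer)
  SET search_path = my_schema, pg_catalog;
```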

Thanks again
Rory



Re: dump/restore problem due to CVE-2018-1058 (9.5.12)

2018-04-08 Thread Adrian Klaver

On 04/08/2018 11:01 AM, Rory Campbell-Lange wrote:

Thanks for your comprehensive response, Adrian.




Fair enough. It is however a tedious problem to resolve in a large
code base and it would be cool to have a new "--set-search-path"
option to pg_dump to override it.


From other posts that covered similar ground, I would say other people 
would agree. I am guessing the response you will get from those who can 
make the changes is that it would just be delaying the inevitable. You can 
always ask though :)




Thanks again
Rory




--
Adrian Klaver
adrian.kla...@aklaver.com



Rationale for aversion to the central database?

2018-04-08 Thread Guyren Howe
I am a Rails developer at a medium-large size company. I’ve mostly worked at 
smaller companies. I’ve some exposure to other web development communities.

When it comes to databases, I have universally encountered the attitude that 
one should treat the database as a dumb data bucket. There is a *very* strong 
aversion to putting much of any business logic in the database. I encounter 
substantial aversion to having multiple applications access one database, or even 
the reverse: all abstraction should be at the application layer.

My best theory is that these communities developed at a time when Windows was 
more dominant, and just generally it was *significantly* easier to use MySQL 
than Postgres for many, particularly new, developers. And it is pretty 
reasonable to adopt an aversion to sophisticated use of the database in that 
case.

This attitude has just continued to today, even as many of them have switched 
to Postgres.

This is only a hypothesis. I am now officially researching the issue. I would 
be grateful for any wisdom from this community.


Aside: it is rare to find a situation in life or anywhere where one widely 
adopted thing is worse in *every way* than another thing, but this certainly 
was and largely still continues to be the case when one compares MySQL and 
Postgres. So why do folks continue to use MySQL? I find this mystifying.


Re: Rationale for aversion to the central database?

2018-04-08 Thread Alvaro Aguayo Garcia-Rada
Let's see. There are two major issues with writing business logic in the 
database:

1. Portability. Being tied to a single database engine is not always a good 
idea. When you write business logic in the database, you have to write and 
maintain your stored procedures for every database engine you want to support. 
That can be really complicated, and will surely take a lot of time, as the 
programming languages for different databases are very different from each 
other. And it's permanent: every time you make a change to a stored procedure, 
you must make that change for every supported database.

2. Performance. I still don't have it clear, but, presumably, PL/pgSQL (among 
others) may not have the same performance as Java, Ruby, Python, C++, or any 
other programming language. Also, when your application runs outside of the 
database server, having business logic in the database will make your app use 
more resources from the database server, slowing it down for all other 
operations. This goes against separating database and application.

However, there are some cases when you may want or need to put business logic 
in the database: when you need to fetch large volumes of data to produce some 
report. This is the case for some accounting reports in complex ERPs. The one 
advantage stored procedures have is that they run INSIDE the database, so 
there's no TCP/IP overhead and no network latency when the stored procedure 
makes a large query. Even running on the same host, fetching large volumes of 
data will always be faster from a stored procedure.

Despite this considerable advantage of stored procedures, I still try to avoid 
business logic programming in the database. In the very specific cases when I 
need to take advantage of it, I try to keep it as simple as possible, closer to 
data collection than business logic, so the application receives processed or 
summarized data and processes it as needed.

Regards,

Alvaro Aguayo
Operations Manager
Open Comb Systems E.I.R.L.

Office: (+51-1) 3377813 | Mobile: (+51) 995540103 | (+51) 954183248
Web: www.ocs.pe

- Original Message -
From: "Guyren Howe" 
To: "PostgreSql-general" 
Sent: Sunday, 8 April, 2018 16:39:49
Subject: Rationale for aversion to the central database?

I am a Rails developer at a medium-large size company. I’ve mostly worked at 
smaller companies. I’ve some exposure to other web development communities.

When it comes to databases, I have universally encountered the attitude that 
one should treat the database as a dumb data bucket. There is a *very* strong 
aversion to putting much of any business logic in the database. I encounter 
substantial aversion to have multiple applications access one database, or even 
the reverse: all abstraction should be at the application layer.

My best theory is that these communities developed at a time when Windows was 
more dominant, and just generally it was *significantly* easier to use MySQL 
than Postgres for many, particularly new, developers. And it is pretty 
reasonable to adopt an aversion to sophisticated use of the database in that 
case.

This attitude has just continued to today, even as many of them have switched 
to Postgres.

This is only a hypothesis. I am now officially researching the issue. I would 
be grateful for any wisdom from this community.


Aside: it is rare to find a situation in life or anywhere where one widely 
adopted thing is worse in *every way* than another thing, but this certainly 
was and largely still continues to be the case when one compares MySQL and 
Postgres. So why do folks continue to use MySQL? I find this mystifying.



Re: Rationale for aversion to the central database?

2018-04-08 Thread Ravi Krishna
1. With a microservice-based architecture these days, it is difficult to
justify putting all logic in a central database, as a database can only scale
up. Business logic in things like Spark can make a claim to a scale-out
solution.
2. All RDBMSs have a non-portable stored-procedure language, making migration a
pain.


Re: Rationale for aversion to the central database?

2018-04-08 Thread Adrian Klaver

On 04/08/2018 02:39 PM, Guyren Howe wrote:
I am a Rails developer at a medium-large size company. I’ve mostly 
worked at smaller companies. I’ve some exposure to other web development 
communities.


When it comes to databases, I have universally encountered the attitude 
that one should treat the database as a dumb data bucket. There is a 
*very* strong aversion to putting much of any business logic in the 
database. I encounter substantial aversion to have multiple applications 
access one database, or even the reverse: all abstraction should be at 
the application layer.


My best theory is that these communities developed at a time when 
Windows was more dominant, and just generally it was *significantly* 
easier to use MySQL than Postgres for many, particularly new, 
developers. And it is pretty reasonable to adopt an aversion to 
sophisticated use of the database in that case.


This attitude has just continued to today, even as many of them have 
switched to Postgres.


This is only a hypothesis. I am now officially researching the issue. I 
would be grateful for any wisdom from this community.



Aside: it is rare to find a situation in life or anywhere where one 
widely adopted thing is worse in *every way* than another thing, but 
this certainly was and largely still continues to be the case when one 
compares MySQL and Postgres. So why do folks continue to use MySQL? I 
find this mystifying.


In general I see it as a case of people working where they are 
comfortable. So folks that understand and like SQL do the heavy lifting 
there and use application logic to just work with the output of the 
database business logic. Folks that are comfortable with a language 
other than SQL use that language to do the business logic and see the 
database as just the dumb data store you refer to. The rise of 
frameworks over databases has also contributed to this in my opinion. 
Mostly because they encourage the notion that there is such a thing as 
universal SQL that operates independent of the underlying database. 
While it is possible it usually leads to a very simple SQL model that 
can work over multiple database engines. So again you end up with the 
database as a data bucket.



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Rationale for aversion to the central database?

2018-04-08 Thread g...@luxsci.net

 
 
On April 8, 2018 02:40:46 pm PDT, "Guyren Howe"  wrote:
 
I am a Rails developer at a medium-large size company. I’ve mostly 
worked at smaller companies. I’ve some exposure to other web 
development communities.


When it comes to databases, I have universally encountered the attitude 
that one should treat the database as a dumb data bucket. There is a 
*very* strong aversion to putting much of any business logic in the 
database. I encounter substantial aversion to have multiple 
applications access one database, or even the reverse: all abstraction 
should be at the application layer.


My best theory is that these communities developed at a time when 
Windows was more dominant, and just generally it was *significantly* 
easier to use MySQL than Postgres for many, particularly new, 
developers. And it is pretty reasonable to adopt an aversion to 
sophisticated use of the database in that case.


This attitude has just continued to today, even as many of them have 
switched to Postgres.


This is only a hypothesis. I am now officially researching the issue. I 
would be grateful for any wisdom from this community.


Aside: it is rare to find a situation in life or anywhere where one 
widely adopted thing is worse in *every way* than another thing, but 
this certainly was and largely still continues to be the case when one 
compares MySQL and Postgres. So why do folks continue to use MySQL? I 
find this mystifying.


===

Hi there. This issue is close to my heart and I'm with you.  I am 
however very comfortable with using psql and PL/pgSQL and I am very 
opinionated.
I feel *very* strongly that a database that actually matters and where 
RI is critical, i.e., any PG db I handle, should make sense on its own 
and be *highly* usable on its own. It should not be dependent on some 
particular external application code to use it or make sense of things. 
It follows that I think nonintuitive exceptions/gotchas should be 
*clear* at a db level, likely using functions to encapsulate that 
information.


Sure, PL/pgSQL may possibly be slow at some things like doing lots of 
bigint math, but I would probably use C and ECPG for the appropriate 
cases.
Not a large percentage of programmers these days know how fast db tasks 
can be, because they are used to working with relatively slow tools and 
frameworks. (Yes, typical Python.)


I am also highly mystified by the dumb-store approach and frankly, I 
think that folks should KNOW their tools better. Not knowing how to use 
your database effectively typically results in unnecessary and often 
very convoluted application code, in my experience.


I keep hearing about db portability but I have yet to see cases where 
db logic was an issue. But to be honest, I haven't seen many migrations 
at all. Why? Because I think it rarely ever happens. If I had to 
do one, I sure as heck hope the db was "clean" and understandable 
without having to review some probably awful app code. Why would anyone 
migrate *away* from PG anyway? :)


One advantage to using logic and functions in  the db is that you can 
fix things immediately without having to make new application builds. 
That in itself is a huge advantage, IMO.


Cheers,
-g



Re: Rationale for aversion to the central database?

2018-04-08 Thread Ravi Krishna
> I am however very comfortable with using psql and PL/pgSQL and I am
> very opinionated.

Nothing wrong with this approach and it may very well work 90% of the
time. Until ... a day comes when you need to migrate out of PG to
another RDBMS. Good luck at that time.




Re: Rationale for aversion to the central database?

2018-04-08 Thread Stephen Frost
Greetings,

* Alvaro Aguayo Garcia-Rada (aagu...@opensysperu.com) wrote:
> 1. Portability. Being tied to a single database engine is not always a good 
> idea. When you write business logic in the database, you have to write and 
> maintain your stored procedures for every database engine you want to support. 
> That can be really complicated, and will surely take a lot of time, as the 
> programming languages for different databases are very different from each 
> other. And it's permanent: every time you make a change to a stored procedure, 
> you must make that change for every supported database.

The portability claim tends to be both a false one and often, when
realized, results in a solution where you aren't using the database for
anything complicated and you'd be better off with a much simpler data
store.  You also don't actually offer any justification for the claim
that being tied to a single database engine is not always a good idea-
why is that?  With commercial databases it tends to be because you are
at the behest of some very large commercial company- but that isn't an
issue with PostgreSQL.  The next argument may be that the project might
go away and force a move to another database, but PG's track record is
hard to beat in this area given the history and the number of people
working to make it better and keep it maintained year after year.

Ultimately, you really should be thinking of the database as the
language of your data.  You wouldn't write your app in multiple
different programming languages, would you?  What if Python suddenly
goes away as a language, or Java does?  Would you write every piece of
software twice, so that you can flip over to using Python instead of
Java on a whim?

> 2. Performance. I still don't have it clear, but, presumably, PL/pgSQL (among 
> others) may not have the same performance as Java, Ruby, Python, C++, or any 
> other programming language. Also, when your application runs outside of the 
> database server, having business logic in the database will make your app use 
> more resources from the database server, slowing it down for all other 
> operations. This goes against separating database and application.

No, plpgsql doesn't have the same performance characteristics as Java,
Ruby, Python, C++, or many other languages- but that's why it isn't the
only language which is supported in PostgreSQL.  You can write back-end
functions in another of those languages, plus quite a few others
including Javascript, R, Perl.  As for the question about if it'll
actually result in the database server being more taxed or not- that
really depends.  Aggregates are a great example- is it going to be
cheaper for the database to run 'sum()' across a data set and give you
the result, or for it to take every row from that table and ship it to
a client?  There are certainly examples which can go the other way too,
of course, but it's really something to think about on an individual
basis, not something to make a broad-stroke decision about. In
particular, when you argue that you'll get better performance by moving
the code away from the data, that tends not to be the case.

> However, there are some cases when you may want or need to put business logic 
> in the database: when you need to fetch large volumes of data to produce some 
> report. This is the case for some accounting reports in complex ERPs. The one 
> advantage stored procedures have is that they run INSIDE the database, so 
> there's no TCP/IP overhead and no network latency when the stored procedure 
> makes a large query. Even running on the same host, fetching large volumes of 
> data will always be faster from a stored procedure.

This is what I'm getting at above, but I would caution that looping over
a table in a stored procedure is generally much less performant than
finding a way to express what you want in SQL.
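As a sketch of that caution, compare a row-by-row PL/pgSQL loop with its set-based equivalent (the `orders` table is hypothetical, invented for illustration):

```sql
-- Row-by-row processing in PL/pgSQL: generally the slower pattern.
CREATE OR REPLACE FUNCTION total_by_loop() RETURNS numeric AS $$
DECLARE
    r record;
    total numeric := 0;
BEGIN
    FOR r IN SELECT amount FROM orders LOOP
        total := total + r.amount;
    END LOOP;
    RETURN total;
END;
$$ LANGUAGE plpgsql;

-- The set-based equivalent, which the planner can optimize as a whole:
SELECT sum(amount) FROM orders;
```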

I'm afraid that the other advantages of doing more in the database
aren't really being considered in your arguments above either- things
like having constraints all checked in one place, regardless of the
application, and the ability to have interfaces defined which multiple
applications could operate against and know that they're all going to be
getting back the same, consistent, results from the database since it's
the same code underneath.  Some of that can be done by sharing code
between the applications, of course, but there's a great deal more risk
there (what about when the applications need to change something that's
done in that shared code, so they essentially fork it..?  Or the
applications have to be upgraded at different times, or a variety of
other situations which could lead to that common code diverging, or even
when the applications aren't written in the same language...).
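One way such a shared, database-side interface might look; all object names here are invented for illustration, not taken from the discussion:

```sql
-- A single view every application queries, so the definition of
-- "active customer" lives in exactly one place:
CREATE VIEW active_customers AS
    SELECT id, name, email
    FROM customers
    WHERE deleted_at IS NULL;

-- Or a function with a stable signature that applications call,
-- all getting the same, consistent result:
CREATE OR REPLACE FUNCTION customer_balance(cust_id integer)
RETURNS numeric AS $$
    SELECT coalesce(sum(amount), 0)
    FROM ledger
    WHERE customer_id = cust_id;
$$ LANGUAGE sql STABLE;
```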

Just some food for thought.

Thanks!

Stephen




Re: Rationale for aversion to the central database?

2018-04-08 Thread Stephen Frost
Greetings,

* Ravi Krishna (sravikrish...@gmail.com) wrote:
> >> I am however very comfortable with using psql and PL/pgSQL and I am very
> >opinionated.
> >
> Nothing wrong with this approach and it may very well work 90% of the time.
> Until ... a day comes when
> you need to migrate out of PG to another RDBMS.  Good luck at that time.

Oh, don't worry, people do that.  What I particularly enjoy are the
stories (of which there are a number now...) where people moved away
from PG for some reason or another, then to another database, to
another, and another, and finally back to PG again, much the wiser for
it but also rather battle-worn. :)

Thanks!

Stephen




Re: Rationale for aversion to the central database?

2018-04-08 Thread Tim Cross
On 9 April 2018 at 07:39, Guyren Howe  wrote:

> I am a Rails developer at a medium-large size company. I’ve mostly worked
> at smaller companies. I’ve some exposure to other web development
> communities.
>
> When it comes to databases, I have universally encountered the attitude
> that one should treat the database as a dumb data bucket. There is a *very*
> strong aversion to putting much of any business logic in the database. I
> encounter substantial aversion to have multiple applications access one
> database, or even the reverse: all abstraction should be at the application
> layer.
>
> My best theory is that these communities developed at a time when Windows
> was more dominant, and just generally it was *significantly* easier to use
> MySQL than Postgres for many, particularly new, developers. And it is
> pretty reasonable to adopt an aversion to sophisticated use of the database
> in that case.
>
> This attitude has just continued to today, even as many of them have
> switched to Postgres.
>
> This is only a hypothesis. I am now officially researching the issue. I
> would be grateful for any wisdom from this community.
>
>
> Aside: it is rare to find a situation in life or anywhere where one widely
> adopted thing is worse in *every way* than another thing, but this
> certainly was and largely still continues to be the case when one compares
> MySQL and Postgres. So why do folks continue to use MySQL? I find this
> mystifying.
>

It is interesting looking at many of the responses to this thread. I see a
lot at each extreme - either put lots of stuff in the database, or use the
database as just a 'dumb' store and put everything in the application code.

I think the real solution is somewhere in the middle. I've lost count of
the number of applications where the application code is jumping through
all sorts of hoops to do basic data operations which would be far better
handled in the database and can easily be done using just ANSI SQL (so it
is portable). It drives me crazy when people tell me the database is slow
when they are doing 'select * from table' and then filtering and sorting
the data in their application. Applications should take advantage of what
the database does well. Unfortunately, I see far too many developers who
are uncomfortable with SQL and don't know how to structure their queries
efficiently (lots of nested subqueries, Cartesian joins, etc.).
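For instance, rather than fetching everything and filtering client-side, the same work can be pushed into a standard SQL query (table and column names here are illustrative only):

```sql
-- Filter, sort, and limit in the database instead of the application:
SELECT id, customer_id, amount
FROM orders
WHERE status = 'open'
ORDER BY created_at DESC
FETCH FIRST 50 ROWS ONLY;  -- standard SQL row limiting
```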

At the other extreme are those who tend to put almost everything in the
database, including business policy and business 'rules' which are
probably better categorised as current business strategy. First, I think it
is nearly always a mistake to try to enforce business policy with
technology. Policies change too often and should be dealt with via
administrative measures. Technology can certainly be used to raise alerts
regarding policy breaches, but should not be used to enforce policies.
Likewise, some business rules are more akin to strategies than actual
static rules and can change with little notice, rhyme or reason. These
probably should not be 'hard coded' into the database. Other rules are
more stable, unlikely to ever change, and are likely good candidates for
being encoded in the database as either functions or constraints.
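A minimal sketch of encoding one such stable rule as a constraint, enforced once for every application that touches the table (the table is hypothetical):

```sql
-- A rule very unlikely to ever change, checked in one place:
ALTER TABLE orders
    ADD CONSTRAINT orders_amount_positive CHECK (amount >= 0);
```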

I do feel that often the big problem is with management who fail to
understand the time and effort needed to develop a good data model.
Developers are put under pressure to deliver functionality and as long as
it looks correct at the interface level, all is good. Little thought is
really put into long term maintenance or performance.  From a developer
perspective, time put into becoming an expert in React, Angular, Node,
Python etc is probably going to earn them more bonus points than time spent
on developing skills in defining good data models or understanding of the
power/functionality of the underlying database engine. Of course, this does
tend to be short-sighted, as a good data model will tend to make it easier
to add/enhance an application, and understanding your database system will
make changes and enhancements less daunting.

For me, the sign of a good developer is one who is able to get the balance
right. They understand the strengths and weaknesses of ALL the components
involved and are able to select the technology mix which suits the problem
domain and are able to get the right balance between business
responsiveness to change and long term maintenance/viability.
Unfortunately, such developers are rare, so it will usually mean there are
a team of people with different skills and what will matter is how well
they are able to work together as a team and come up with an architecture
which satisfies the business requirements.

-- 
regards,

Tim

--
Tim Cross


Re: Rationale for aversion to the central database?

2018-04-08 Thread Peter Klipstein
Tim, I'm sorry if I sound like a cheerleader, but boy did you nail this. I
would basically say exactly the same thing, just not as well.



On Sun, Apr 8, 2018 at 9:37 PM, Tim Cross  wrote:

>
>
> On 9 April 2018 at 07:39, Guyren Howe  wrote:
>
>> I am a Rails developer at a medium-large size company. I’ve mostly worked
>> at smaller companies. I’ve some exposure to other web development
>> communities.
>>
>> When it comes to databases, I have universally encountered the attitude
>> that one should treat the database as a dumb data bucket. There is a *very*
>> strong aversion to putting much of any business logic in the database. I
>> encounter substantial aversion to have multiple applications access one
>> database, or even the reverse: all abstraction should be at the application
>> layer.
>>
>> My best theory is that these communities developed at a time when Windows
>> was more dominant, and just generally it was *significantly* easier to use
>> MySQL than Postgres for many, particularly new, developers. And it is
>> pretty reasonable to adopt an aversion to sophisticated use of the database
>> in that case.
>>
>> This attitude has just continued to today, even as many of them have
>> switched to Postgres.
>>
>> This is only a hypothesis. I am now officially researching the issue. I
>> would be grateful for any wisdom from this community.
>>
>>
>> Aside: it is rare to find a situation in life or anywhere where one
>> widely adopted thing is worse in *every way* than another thing, but this
>> certainly was and largely still continues to be the case when one compares
>> MySQL and Postgres. So why do folks continue to use MySQL? I find this
>> mystifying.
>>
>
> It is interesting looking at many of the responses to this thread. I see a
> lot at each extreme - either put lots of stuff in the database or use the
> database as just a 'dumb' store and put everything in the application code.
>
> I think the real solution is somewhere in the middle. I've lost count of
> the number of applications where the application code is jumping through
> all sorts of hoops to do basic data operations which would be far better
> handled in the database and can easily be done using just ANSI SQL (so is
> portable). It drives me crazy when people tell me the database is slow when
> they are doing 'select * from table' and then filtering and sorting the
> data in their application. Applications should take advantage of what the
> database does well. Unfortunately, I see far too many developers who are
> uncomfortable with SQL, don't know how to structure their queries
> efficiently (lots of nested sub queries etc, cartesian joins etc).
>
> At the other extreme is those who tend to put almost everything in the
> database - including business policy and business 'rules' which are
> probably better categorised as current business strategy. First, I think it
> is nearly always a mistake to try and enforce business policy with
> technology. Policies change too often and should be dealt with via
> administrative measures. Technology can certainly be used to raise alerts
> regarding policy breaches, but should not be used to enforce policies.
> Likewise, some business rules are more akin to strategies than being actual
> static rules and can change with little notice, rhyme or reason. These
> probably should not be 'hard coded' into the database. Other rules are more
> stable and unlikely to ever change and are likely good candidates for being
> encoded in the database as either functions or constraints.
>
> I do feel that often the big problem is with management who fail to
> understand the time and effort needed to develop a good data model.
> Developers are put under pressure to deliver functionality and as long as
> it looks correct at the interface level, all is good. Little thought is
> really put into long term maintenance or performance.  From a developer
> perspective, time put into becoming an expert in React, Angular, Node,
> Python etc is probably going to earn them more bonus points than time spent
> on developing skills in defining good data models or understanding of the
> power/functionality of the underlying database engine. Of course, this does
> tend to be short sighted as a good data model will tend to make it easier
> to add/enhance an application and understanding your database system will
> make changes and enhancements less daunting.
>
> For me, the sign of a good developer is one who is able to get the balance
> right. They understand the strengths and weaknesses of ALL the components
> involved and are able to select the technology mix which suits the problem
> domain and are able to get the right balance between business
> responsiveness to change and long term maintenance/viability.
> Unfortunately, such developers are rare, so it will usually mean there are
> a team of people with different skills and what will matter is how well
> they are able to work together as a team and come up with an architectur