Default ordering option

2019-07-23 Thread Cyril Champier
Hi,

In this documentation
, it is said:

> If sorting is not chosen, the rows will be returned in an unspecified
> order. The actual order in that case will depend on the scan and join plan
> types and the order on disk, but it must not be relied on.


I would like to know if there is any way to change that to have a "real"
random behaviour.

My use case:
At Doctolib, we do a lot of automatic tests.
Sometimes, people forgot to order their queries. Most of the time, there is
no real problem on production. Let say, we display a user list without
order.
When a developer writes a test for this feature, he will create 2 users A
and B, then assert that they are displayed "[A, B]".
99% of the time the test will be ok, but sometimes, the displayed list will
be "[B,A]", and the test will fail.

One solution could be to ensure random order with an even distribution, so
that such failing test would be detected quicker.

Is that possible? Maybe with a plugin?

Thanks,
Cyril


Re: Default ordering option

2019-07-24 Thread Cyril Champier
Thanks for your answers.
Unfortunately the update trick only seems to work under certain conditions.

I do this to shuffle my patients table:
UPDATE "patients"
SET "updated_at" = NOW()
WHERE "patients"."id" = (SELECT "patients"."id" FROM "patients" ORDER BY
random() LIMIT 1)

Then indeed, this query returns different order:
SELECT *
FROM "patients"

But this one (because it use an index?) always returns values in the same
order:
SELECT "id"
FROM "patients"



And for the other suggestion, I cannot blindly add 'ORDER BY random()' to
every select,
because of the incompatibility with distinct and union, and the way we use
our orm.


On Wed, Jul 24, 2019 at 3:54 AM Ian Barwick 
wrote:

> On 7/24/19 2:23 AM, Adrian Klaver wrote:
> > On 7/23/19 8:43 AM, Cyril Champier wrote:
> >> Hi,
> >>
> >> In this documentation <
> https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:
> >>
> >> If sorting is not chosen, the rows will be returned in an
> >> unspecified order. The actual order in that case will depend on the
> >> scan and join plan types and the order on disk, but it must not be
> >> relied on.
> >>
> >>
> >> I would like to know if there is any way to change that to have a
> "real" random behaviour.
> >>
> >> My use case:
> >> At Doctolib, we do a lot of automatic tests.
> >> Sometimes, people forgot to order their queries. Most of the time,
> there is no real problem on production. Let say, we display a user list
> without order.
> >> When a developer writes a test for this feature, he will create 2 users
> A and B, then assert that they are displayed "[A, B]".
> >> 99% of the time the test will be ok, but sometimes, the displayed list
> will be "[B,A]", and the test will fail.
> >>
> >> One solution could be to ensure random order with an even distribution,
> so that such failing test would be detected quicker.
> >>
> >> Is that possible? Maybe with a plugin?
> >
> > Not that I know of.
> >
> > A possible solution given below:
> >
> > test_(postgres)> insert into t1 values (1, 'dog'), (2, 'cat'), (3,
> 'fish');
> > INSERT 0 3
> >
> > test_(postgres)> select * from t1 ;
> >   a |  b
> > ---+--
> >   1 | dog
> >   2 | cat
> >   3 | fish
> > (3 rows)
> >
> > test_(postgres)> update  t1 set b = 'dogfish' where  a =1;
> > UPDATE 1
> >
> > test_(postgres)> select * from t1 ;
> >   a |b
> > ---+-
> >   2 | cat
> >   3 | fish
> >   1 | dogfish
> > (3 rows)
> >
> > An UPDATE reorders the rows. Maybe throw an UPDATE into the test after
> creating the users to force an 'out of order' result?
>
> An UPDATE without changing any values should have the same effect, e.g. :
>
>  UPDATE t1 SET b = b WHERE a = 1;
>
> Something like this
>
>  WITH x AS (SELECT * FROM t1 ORDER BY a DESC)
>  UPDATE t1 t
> SET a = t.a
>FROM x
>   WHERE t.a = x.a
>
> would shuffle the rows into reverse order, which might be enough to catch
> out any missing ORDER BY (this assumes nothing else will touch the table
> and reorder it before the test is run).
>
> You could also order by RANDOM() but there would be a chance the rows would
> end up in sequential order.
>
> Regards
>
>
> Ian Barwick
>
> --
>   Ian Barwick   https://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services
>


Re: Default ordering option

2019-07-24 Thread Cyril Champier
Indeed, you are right, I do my test in pure sql and via ruby ActiveRecord,
and I must had been confused,
the behaviour is correct in sql, it must have been a cache thing in
ActiveRecord that prevented the reordering.
But meanwhile, I tested on our whole CI, and it took twice the normal time
with updates to shuffle DB :(

For the union, I speak about production code like this:
"select count(*) from (#{directory_doctors_query_sql} union all
#{profiles_query_sql}) as doctors"
In the to_sql, we cannot detect that we will be injected into a union.
So I cannot blindly add the random in the to_sql method.







On Wed, Jul 24, 2019 at 4:48 PM Adrian Klaver 
wrote:

> On 7/24/19 1:45 AM, Cyril Champier wrote:
> > Thanks for your answers.
> > Unfortunately the update trick only seems to work under certain
> conditions.
> >
> > I do this to shuffle my patients table:
> > UPDATE "patients"
> > SET "updated_at" = NOW()
> > WHERE "patients"."id" = (SELECT "patients"."id" FROM "patients" ORDER BY
> > random() LIMIT 1)
> >
> > Then indeed, this query returns different order:
> > SELECT *
> > FROM "patients"
> >
> > But this one (because it use an index?) always returns values in the
> > same order:
> > SELECT "id"
> > FROM "patients"
>
> Hmm, I don't see that:
>
> test=# \d t1
>Table "public.t1"
>   Column |   Type| Collation | Nullable | Default
> +---+---+--+-
>   a  | integer   |   | not null |
>   b  | character varying |   |  |
> Indexes:
>  "t1_pkey" PRIMARY KEY, btree (a)
>
>
> test=# select * from t1;
>   a |b
> ---+-
>   2 | cat
>   3 | fish
>   1 | dogfish
> (3 rows)
>
> test=# select a from t1;
>   a
> ---
>   2
>   3
>   1
> (3 rows)
>
> Are you sure there is nothing going on between the first and second
> queries e.g. ROLLBACK?
>
> >
> >
> >
> > And for the other suggestion, I cannot blindly add 'ORDER BY random()'
> > to every select,
> > because of the incompatibility with distinct and union, and the way we
> > use our orm.
> >
> Are you talking about the production or test queries above?
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Default ordering option

2019-07-25 Thread Cyril Champier
*Adrian*:
>
> If order is not an issue in the production code why test for it in the
> test code?


In many cases, it would not be a problem in tests if we had an unordered
array comparison helper.
But in other cases, it is a production issue.
In ruby ActiveRecord for exemple, you can do `Patient.find_by(last_name:
'champier')`,
which translates to `SELECT  "patients".* FROM "patients" WHERE
"patients"."last_name" = 'champier' LIMIT 1`.
If last_name is not unique, the returned record will be random.

So yes, everything as to be randomized, because the sources are multiples
and the consequences can vary to a dramatic production bug, a failed CI 1%
of the time, or to a useless test assertion.


*Peter*:

> It might be an interesting exercise to implement this as a post-parsing
> hook.


I known nothing about that, but that sounds interesting, do you have any
documentation pointer to help me implement that?



On Wed, Jul 24, 2019 at 10:36 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-07-23 17:43, Cyril Champier wrote:
> > In this documentation
> > <https://www.postgresql.org/docs/9.1/queries-order.html>, it is said:
> >
> > If sorting is not chosen, the rows will be returned in an
> > unspecified order. The actual order in that case will depend on the
> > scan and join plan types and the order on disk, but it must not be
> > relied on.
> >
> >
> > I would like to know if there is any way to change that to have a "real"
> > random behaviour.
>
> It might be an interesting exercise to implement this as a post-parsing
> hook.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Default ordering option

2019-07-25 Thread Cyril Champier
Peter:
So I would need to create a pg extension encapsulating this hook callback?
If this is the case, it seems it will be much more complicated than
expected, and I wont be able to do it :(
But thanks for the suggestion anyway.

On Thu, Jul 25, 2019 at 12:21 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-07-25 09:43, Cyril Champier wrote:
> > It might be an interesting exercise to implement this as a
> post-parsing
> > hook.
> >
> >
> > I known nothing about that, but that sounds interesting, do you have any
> > documentation pointer to help me implement that?
>
> Look for post_parse_analyze_hook.  Walk the parsed query tree, look for
> queries without ordering clause and manufacture one.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Default ordering option

2019-07-26 Thread Cyril Champier
Adrian:

Are you really looking for a pseudo-random name?


No, the code I pasted was an existing production bug: the last_name should
have been unique, so the selected patient would always be the same.
This should have been detected in tests, but since the order was "almost
always the same", our test was green 99% of the time, so we discarded it as
flaky.

Fuzzy testing could be an option, but this would go too far, as for Peter
extension suggestion.
We have huge existing codebase with more than 10K tests, and I do not want
to modify our whole testing strategy.

Meanwhile, I went for an ORM patch (ActiveRecord) and forbid usages that
can workaround it.
If you are interested, here is a gist:
https://gist.github.com/cyrilchampier/fdb945e8a09f93d50c7e89305c2f53f0

I wish there was a simple flag to activate in PostgreSQL to do that!

Thanks for your ideas!




On Thu, Jul 25, 2019 at 4:55 PM Adrian Klaver 
wrote:

> On 7/25/19 12:43 AM, Cyril Champier wrote:
> > *Adrian*:
> >
> > If order is not an issue in the production code why test for it in
> the
> > test code?
> >
> >
> > In many cases, it would not be a problem in tests if we had an unordered
> > array comparison helper.
> > But in other cases, it is a production issue.
> > In ruby ActiveRecord for exemple, you can do `Patient.find_by(last_name:
> > 'champier')`,
> > which translates to `SELECT  "patients".* FROM "patients" WHERE
> > "patients"."last_name" = 'champier' LIMIT 1`.
> > If last_name is not unique, the returned record will be random.
>
> Are you really looking for a pseudo-random name?
>
> If so would not(warning not a Ruby developer, so below is tentative):
>
> Patient.where(["last_name = :last_name", {last_name:
> "champier"}]).order('RANDOM()').first
>
> work better?
>
> If not why not use something that returns all possible matches?
>
> >
> > So yes, everything as to be randomized, because the sources are
> > multiples and the consequences can vary to a dramatic production bug, a
> > failed CI 1% of the time, or to a useless test assertion.
>
> One way I can think of doing this is write a script that walks through
> your tables in the test db and does an UPDATE across the rows. It is
> going to add time to your tests, but then I believe that is going to be
> the case for anything you do. Or you could look at something I have
> never tried, fuzzy testing. As a starting point:
>
> https://www.guru99.com/fuzz-testing.html
>
> Maybe other folks have suggestions on tools you could use for fuzzy
> testing.
>
>
> >
> >
> > *Peter*:
> >
> > It might be an interesting exercise to implement this as a
> post-parsing
> > hook.
> >
> > I known nothing about that, but that sounds interesting, do you have any
> > documentation pointer to help me implement that?
> >
> >
> >
> > On Wed, Jul 24, 2019 at 10:36 PM Peter Eisentraut
> >  > <mailto:peter.eisentr...@2ndquadrant.com>> wrote:
> >
> > On 2019-07-23 17:43, Cyril Champier wrote:
> >  > In this documentation
> >  > <https://www.postgresql.org/docs/9.1/queries-order.html>, it is
> said:
> >  >
> >  > If sorting is not chosen, the rows will be returned in an
> >  > unspecified order. The actual order in that case will depend
> > on the
> >  > scan and join plan types and the order on disk, but it must
> > not be
> >  > relied on.
> >  >
> >  >
> >  > I would like to know if there is any way to change that to have a
> > "real"
> >  > random behaviour.
> >
> > It might be an interesting exercise to implement this as a
> post-parsing
> > hook.
> >
> > --
> > Peter Eisentraut http://www.2ndQuadrant.com/
> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Default ordering option

2019-07-26 Thread Cyril Champier
Julien,

Because it's production code generated by our ORM for this command:
`Patient.find_by(last_name: 'champier')`.
Of course this was not intended by the developer that though the last_name
was unique.



On Fri, Jul 26, 2019 at 10:10 AM Julien Rouhaud  wrote:

> On Fri, Jul 26, 2019 at 9:53 AM Cyril Champier
>  wrote:
> >
> > Adrian:
> >
> >> Are you really looking for a pseudo-random name?
> >
> >
> > No, the code I pasted was an existing production bug: the last_name
> should have been unique, so the selected patient would always be the same.
> > This should have been detected in tests, but since the order was "almost
> always the same", our test was green 99% of the time, so we discarded it as
> flaky.
>
> If the filter should return at most 1 row, why put a LIMIT in the
> first place?  Even with a forced random() you won't get a failure
> every time, while asserting there's at most 1 row returned is
> guaranteed to fail?
>