Random function

2020-03-24 Thread Luis Roberto Weck

Hi,

I am trying to generate some random data using the random() function.

However, I am getting the same result over mulitiple rows. This is a 
sample of the SQL I am using:


select (select string_agg(random()::text,';')
          from pg_catalog.generate_series(1,3,1) )
  from generate_series(1,10,1)

And I am getting something like:

|string_agg |
+--+
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|
|0.243969671428203583;0.69257879443434;0.291524752043187618|

If this is the expected output, is there a way to always generate random 
numbers?






Re: Random function

2020-03-24 Thread Tom Lane
Luis Roberto Weck  writes:
> I am trying to generate some random data using the random() function.

> However, I am getting the same result over mulitiple rows. This is a 
> sample of the SQL I am using:

> select (select string_agg(random()::text,';')
> from pg_catalog.generate_series(1,3,1) )
>   from generate_series(1,10,1)

The sub-select is independent of the outer select so it's only computed
once, and then you get ten copies of that result.  Restructuring the
query, or inserting an artificial dependency on the outer select's data,
would help.

regards, tom lane




Re: Random function

2020-03-24 Thread David G. Johnston
How is this a performance related question?

On Tue, Mar 24, 2020 at 11:10 AM Luis Roberto Weck <
[email protected]> wrote:

> However, I am getting the same result over mulitiple rows. This is a
> sample of the SQL I am using:
>
> select (select string_agg(random()::text,';')
>from pg_catalog.generate_series(1,3,1) )
>from generate_series(1,10,1)
>
> And I am getting something like:
>
> |string_agg |
> +--+
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
>
> If this is the expected output,


Yes, you've asked it to compute a value, assign it to a column, then
generate 10 rows of that value.

is there a way to always generate random
> numbers?
>

Don't use a scalar subquery in the main target list.

One possible answer:

select format('%s;%s;%s', random(), random(), random()) from
generate_series(1, 10)

David J.


Re: Slow planning time when public schema included (12 vs. 9.4)

2020-03-24 Thread Anders Steinlein
On Sat, Mar 21, 2020 at 11:55 PM Tom Lane  wrote:

> Anders Steinlein  writes:
> > This they most definitely are not. 9.4 was running on an old box, Ubuntu
> > 12.04, while 12 is on an up-to-date Ubuntu 18.04 LTS. AFAICS, 2.15 on the
> > 9.4 box and 2.27 on the 12 box.
>
> I'm suspicious that the root issue has to do with libc differences,
> but I haven't any hard data to back that up with.
>
> Another possibility perhaps is that v12's ANALYZE is collecting a lot
> more "common" values than 9.4 did.  Whether it is or not, the advice
> you already got to ratchet down the stats target would likely be
> helpful to reduce the planning time.
>

Yes, indeed, lowering the statistics target to the default 100 decreased
the planning time a lot -- to sub-10m! Thanks for the guidance, although
the excessive difference between the two boxes/libc versions are
disappointing, to say the least.

Do you have any insight into how the Postgres 12 nondeterministic collation
feature (with ICU) compares performance-wise in general? Although having a
much lower statistics target "fixed" this, I'm concerned joins and sorting
is slower in general after having uncovered this (we haven't dug into that
performance numbers yet), since email (citext) are PKs in a lot of our
tables. Would changing our email domain using citext to instead be a domain
over text using a case-insensitive collation be a better choice?

Thanks again,
-- a.