date:20220715

Proposal to introduce a shuffle function to intarray extension

2022-07-15 Thread Martin Kalcher


Dear list,

i am dealing with an application that processes fairly large arrays of 
integers. It makes heavy use of the intarray extension, which works 
great in most cases. However, there are two requirements that cannot be 
addressed by the extension and are rather slow with plain SQL. Both can 
be met with shuffling:


- Taking n random members from an integer array
- Splitting an array into n chunks, where each member is assigned to a 
random chunk


Shuffling is currently implemented by unnesting the array, ordering the 
members by random() and aggregating them again.



  create table numbers (arr int[]);

  insert into numbers (arr)
  select array_agg(i)
  from generate_series(1, 400) i;


  select arr[1:3]::text || ' ... ' || arr[398:400]::text
  from (
select array_agg(n order by random()) arr
from (
  select unnest(arr) n from numbers
) plain
  ) shuffled;

  -
   {2717290,3093757,2426384} ... {3011871,1402540,1613647}

  Time: 2348.961 ms (00:02.349)


I wrote a small extension (see source code below) to see how much we can 
gain, when the shuffling is implemented in C and the results speak for 
themselves:



  select arr[1:3]::text || ' ... ' || arr[398:400]::text
  from (
select shuffle(arr) arr from numbers
  ) shuffled;

  
   {1313971,3593627,86630} ... {50764,430410,3901128}

  Time: 132.151 ms


I would like to see a function like this inside the intarray extension. 
Is there any way to get to this point? How is the process to deal with 
such proposals?


Best regards,
Martin Kalcher


Source code of extension mentioned above:


#include "postgres.h"
#include "port.h"
#include "utils/array.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(shuffle);

void _shuffle(int32 *a, int len);

Datum
shuffle(PG_FUNCTION_ARGS)
{
  ArrayType  *a = PG_GETARG_ARRAYTYPE_P_COPY(0);

  int len;

  if (array_contains_nulls(a))
ereport(ERROR,
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 errmsg("array must not contain nulls")));

  len = ArrayGetNItems(ARR_NDIM(a), ARR_DIMS(a));

  if (len > 1)
_shuffle((int32 *) ARR_DATA_PTR(a), len);

  PG_RETURN_POINTER(a);
}

void
_shuffle(int32 *a, int len) {
  int i, j;
  int32 tmp;

  for (i = len - 1; i > 0; i--) {
j = random() % (i + 1);
tmp = a[i];
a[i] = a[j];
a[j] = tmp;
  }
}

Re: About revoking large number of privileges; And the PUBLIC role.

2022-07-15 Thread Stephen Frost

Greetings,

* Dominique Devienne (ddevie...@gmail.com) wrote:
> So my goal is to delete all those "db specific" ROLEs, then the DB
> with all its schemas.

If you want to drop the database anyway.. then why not simply do that
first?  Nothing can be connected to a DB that's being dropped and we
don't actually try to lock all the objects in a to-be-dropped DB.

Thanks,

Stephen

signature.asc
Description: PGP signature

could not link file in wal restore lines

2022-07-15 Thread Zsolt Ero

Hi,

I'm testing pgbackrest restore.

It works well, but in the pg logs I get the following lines:

could not link file "pg_wal/000101560098" to
"pg_wal/00010157006E": File exists

In total, I get 18 lines of "could not link file" and 932 lines of
"restored log file" lines.

At the end it finishes with:

redo done at 15A/A001710
last completed transaction was at log time 2022-07-15 19:41:05.175573+00
restored log file "0003015A000A" from archive
selected new timeline ID: 4
archive recovery complete
restored log file "0003.history" from archive
database system is ready to accept connections

Before going in production, I wanted to ask, if this is correct like this?
I mean are those "could not link file" lines anything to worry about?

Thanks and regards,
Zsolt

Re:Re: equivalent thing of mtr in mysql

2022-07-15 Thread merryok

I'm sorry, first time to post a thread. mtr is short for mini-transaction.
About mtr in mysql, we can refer to 
https://dev.mysql.com/doc/refman/8.0/en/glossary.html or
 
https://dev.mysql.com/blog-archive/mysql-8-0-new-lock-free-scalable-wal-design/ 




So an insert in mysql,  wrapped in a user transaction, may result in multiple 
mini-transactions, one is responsible for writing redo for undo, the other one 
for writing redo for clustered index/secondary index, and so on. Each mtr may 
modify multiple pages,  whose redo logs are guaranteed by mtr to be written 
into disk or none of them are written.











At 2022-07-13 02:57:28, "Peter J. Holzer"  wrote:
>On 2022-07-12 22:39:31 +0800, merryok wrote:
>> Hi, guys. I'm new here.
>> 
>> I'm eager to figure out what is the equivalent thing of mtr in mysql in PG.
>
>What is MTR? A search for "mtr mysql" yields "mysql test run" and
>"multi-threaded replication", neither of which seems to be what you are
>talking about. When referring to concepts from other databases, please
>include a link to the relevant documentation.
>
>> When a dml operation occurs, it may modify multiple pages and gererate 
>> multiple
>> redo log records. mtr can make those logs atomically be transferred to log
>> buffer and written to disk. 
>> I think same situation exists in PG, but I can't find something like mtr, 
>> why ?
>
>Transactions in PostgreSQL are always atomic (That should also be the
>case with MySQL, unless you use MyISAM tables). Ensuring that these
>changes also result in an atomic disk write seems to be both pointless
>and impossible (that might be many gigabytes of data).
>
>hp
>
>-- 
>   _  | Peter J. Holzer| Story must make more sense than reality.
>|_|_) ||
>| |   | h...@hjp.at |-- Charles Stross, "Creative writing
>__/   | http://www.hjp.at/ |   challenge!"

Proposal to introduce a shuffle function to intarray extension

Re: About revoking large number of privileges; And the PUBLIC role.

could not link file in wal restore lines

Re:Re: equivalent thing of mtr in mysql

4 matches

Site Navigation

Mail list logo

Footer information