Re: Question about querying distributed databases

2025-03-05 Thread Ron Johnson
On Wed, Mar 5, 2025 at 9:44 PM me nefcanto  wrote:

> I once worked with a monolithic SQL Server database with more than 10
> billion records and about 8 Terabytes of data. A single backup took us more
> than 21 days. It was a nightmare.
>

25 years ago (meaning *much* slower hardware), I managed a 1TB database.
Backups took about 4 hours.  Could have gotten it down to two hours if I'd
wanted to use more tape drives.

Right now, I manage a 5TB database.  Backups take 110 minutes, and that's
when using one channel for all IO, writing to a NAS that is not the
fastest, and with other 3+TB databases backing up to it at the same time.


> Almost everybody knows that scaling up has a ceiling
>

And that ceiling is much, much higher than you think it is.


> , but scaling out has no boundaries.
>

Except for complexity and fragility. I bet I could get good scaled-up
performance out of the amount of hardware you're using to scale out.



Re: Question about querying distributed databases

2025-03-05 Thread me nefcanto
I appreciate your time guys. Thank you very much.

> Having 1 table per database per server is too ugly.

Our databases are not one table per database. They are mapped to DDD's
bounded contexts, usually with one table per domain entity.
For example, we have these databases:


   - Contacts
   - Courses
   - Seo
   - Payment
   - Forms
   - Geo
   - Sales
   - Media
   - Taxonomy
   - ...

These are the tables we have in the Contacts database:


   - Addresses
   - AddressTypes
   - Attributes
   - BankAccounts
   - ContactContents
   - Contacts
   - Emails
   - Genders
   - JobTitles
   - JuridicalPersons
   - NaturalPersonRelations
   - NaturalPersons
   - Persons
   - Phones
   - PhoneTypes
   - Relations
   - RelationTypes
   - SocialNetworks
   - SocialProfiles
   - Titles

And these are the tables we have in the Geo database:


   - AdministrativeDivisions
   - AdministrativeDivisionTypes
   - Cities
   - CityDivisions
   - Countries
   - Locations
   - SpatialDataItems
   - TelephonePrefixes
   - TimeZones

But we also have databases that contain only one table. The number of
tables is not our criterion for splitting them; the business semantics is.

> Cross-database on MSSQL is identical to the cross schema on Postgres.

Cross-database queries in SQL Server are not equivalent to cross-schema
queries in Postgres, because SQL Server also has the concept of schemas. In
other words, both SQL Server and Postgres let you create databases, create
schemas inside them, and create tables inside those schemas. So it is SQL
Server's cross-schema query that equals Postgres's cross-schema query.

> If you truly need cross server support (versus say beefier hardware) how
> did you come to choose postgres?

We chose Postgres for these reasons that we did R&D about:


   - Native support for array columns
   - A single storage engine, instead of MariaDB's multiple engines to get
   confused about
   - Support for expressions in unique constraints (see the sketch after
   this list)
   - It's usually considered one of the best when it comes to performance,
   especially for GIS, which we intend to develop further upon
   - As it claims on its website, it's the most advanced open-source
   database engine (though, to be honest, we saw many serious drawbacks to
   that claim)
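
As a sketch of the first and third points (table and column names are
hypothetical):

-- a native array column
CREATE TABLE learners (
    id bigint PRIMARY KEY,
    known_word_ids int[] NOT NULL DEFAULT '{}'
);

-- a unique index over an expression
CREATE UNIQUE INDEX emails_address_unique ON emails (lower(address));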

But here's the deal. We don't have one project only. We don't need
*cross-server queries* for all of our projects. But we tend to keep our
architecture the same across projects as much as we can. We chose Postgres
because we had experience with SQL Server and MariaDB and assumed that
cross-database queries on the same server were something natural. Both of
them support that, and both are very performant at it. On MariaDB all you
have to do is use `db_name.table_name`, and on SQL Server all you have to
do is use `database_name.schema_name.table_name`. So we thought that, for
projects that do not need more than one server, we would keep the databases
on the same server. When a project needed more resources, we would start by
moving heavy databases onto their own servers and then implement table
partitioning on them.

But we have seen some amazing improvements in our initial tests, too. For
example, creating all databases, tables, and other database objects takes
more than 400 seconds on MariaDB, while the same took 80 seconds on
Postgres. Amazing DDL performance. Also, bulk-inserting 1 million records
takes about one-sixth to one-fourth of the time it takes on MariaDB. These
are valuable numbers. They encouraged us to keep digging as much as we can
to see whether we can pull off this migration.

Regards
Saeed

On Thu, Mar 6, 2025 at 7:14 AM Rob Sargent  wrote:

>
>
> On Mar 5, 2025, at 8:03 PM, Igor Korot wrote:
> [...] it worked great for SQL Server. If you're small, we host them all
> on one server. If you get bigger, we can put heavy databases on separate
> machines.
>
>
>> However, I don't have experience working with other types of database
>> scaling. I have used table partitioning, but I have never used sharding.
>>
>> Anyway, that's why I asked you guys. However, encouraging me to go back
>> to monolith without giving solutions on how to scale, is not helping. To be
>> honest, I'm somehow disappointed by how the most advanced open source
>> database does not support cross-database querying just like how SQL Server
>> does. But if it doesn't, it doesn't. Our team should either drop it as a
>> choice or find a way (by asking the experts who built it or use it) how to
>> design based on its features. That's why I'm asking.
>>
>>
> Cross-database on MSSQL is identical to cross schema on postgres. If you
> truly need cross server support (versus say beefier hardware) how did you
> come to choose postgres?  The numbers you present are impressive but not
> unheard of on this list.
>
>


Re: Question about querying distributed databases

2025-03-05 Thread me nefcanto
I once worked with a monolithic SQL Server database with more than 10
billion records and about 8 Terabytes of data. A single backup took us more
than 21 days. It was a nightmare. Almost everybody knows that scaling up
has a ceiling, but scaling out has no boundaries.

Therefore I will never choose a monolithic database design unless it's a
small project. But my examples are just examples. We predict 100 million
records per year. So we have to design accordingly. And it's not just sales
records. Many applications have requirements that are cheap data but vast
in multitude. Consider a language-learning app that wants to store the
known words of any learner. 10 thousand learners each knowing 2 thousand
words means 20 million records. Convert that to 100 thousand learners each
knowing 7 thousand words and now you almost have a billion records. Cheap,
but necessary. Let's not dive into telemetry or time-series data.

We initially chose to break the database into smaller databases, because it
seemed natural for our modularized monolith architecture. And it worked
great for SQL Server. If you're small, we host them all on one server. If
you get bigger, we can put heavy databases on separate machines.

However, I don't have experience working with other types of database
scaling. I have used table partitioning, but I have never used sharding.

Anyway, that's why I asked you guys. However, encouraging me to go back to
a monolith without giving solutions on how to scale is not helping. To be
honest, I'm somewhat disappointed that the most advanced open-source
database does not support cross-database querying the way SQL Server
does. But if it doesn't, it doesn't. Our team should either drop it as a
choice or find a way (by asking the experts who built or use it) to
design based on its features. That's why I'm asking.

One thing that comes to my mind is to use custom types. Instead of storing
data in ItemCategories and ItemAttributes, store them as arrays in the
relevant tables in the same database. But then it seems to me that in this
case Mongo would become a better choice, because I lose the relational
nature and normalization somewhat. What drawbacks have you experienced in
that sense?
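
For what it's worth, a sketch of that array variant (table and column
names are hypothetical): an int[] column with a GIN index still gives
indexed containment queries.

ALTER TABLE products ADD COLUMN category_ids int[] NOT NULL DEFAULT '{}';
CREATE INDEX products_category_ids_gin ON products USING gin (category_ids);

-- find products in category 42
SELECT id FROM products WHERE category_ids @> ARRAY[42];

The trade-off is exactly the one above: you lose the foreign-key
guarantees a join table would give you.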

Regards
Saeed

On Wed, Mar 5, 2025 at 7:38 PM Adrian Klaver 
wrote:

> On 3/5/25 04:15, me nefcanto wrote:
> > Dear Laurenz, the point is that I think if we put all databases into one
> > database, then we have blocked our growth in the future.
>
> How?
>
> > A monolith database can be scaled only vertically. We have had huge
> > headaches in the past with SQL Server on Windows and a single database.
> > But when you divide bounded contexts into different databases, then you
> > have the chance to deploy each database on a separate physical machine.
> > That means a lot in terms of performance. Please correct me if I am
> wrong.
>
> And you add the complexity of talking across machines, as well as
> maintaining separate machines.
>
> >
> > Let's put this physical restriction on ourselves that we have different
> > databases. What options do we have? One option that comes to my mind, is
> > to store the ID of the categories in the Products table. This means that
> > I don't need FDW anymore. And databases can be on separate machines. I
> > first query the categories database first, get the category IDs, and
> > then add a where clause to limit the product search. That could be an
> > option. Array data type in Postgres is something that I think other
> > RDBMSs do not have. Will that work? And how about attributes? Because
> > attributes are more than a single ID. I should store the attribute key,
> > alongside its value. It's a key-value pair. What can I do for that?
>
> You seem to be going out of the way to make your life more complicated.
>
> The only way you are going to find an answer is set up test cases and
> experiment. My bet is a single server with a single database and
> multiple schemas is where you end up, after all that is where you are
> starting from.
>
>
> >
> > Thank you for sharing your time. I really appreciate it.
> > Saeed
> >
> >
> >
> >
> >
> > On Wed, Mar 5, 2025 at 3:18 PM Laurenz Albe wrote:
> >
> > On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote:
> >  > That means a solid monolith database. We lose many goodies with
> that.
> >  > As a real-world example, right now we can import a single database
> >  > from the production to the development to test and troubleshoot
> data.
> >
> > Well, can't you import a single schema then?
> >
> >  > What if we host all databases on the same server and use FDW. What
> >  > happens in that case? Does it return 100 thousand records and join
> >  > in the memory?
> >
> > It will do just the same thing.  The performance could be better
> > because of the reduced latency.
> >
> >  > Because in SQL Server, when you perform a cross-database query
> >  > (not cross-server

Re: Question about querying distributed databases

2025-03-05 Thread Rob Sargent


> On Mar 5, 2025, at 8:03 PM, Igor Korot wrote:
> [...] it worked great for SQL Server. If you're small, we host them all on 
> one server. If you get bigger, we can put heavy databases on separate 
> machines.
>> 
>> However, I don't have experience working with other types of database 
>> scaling. I have used table partitioning, but I have never used sharding.
>> 
>> Anyway, that's why I asked you guys. However, encouraging me to go back to 
>> monolith without giving solutions on how to scale, is not helping. To be 
>> honest, I'm somehow disappointed by how the most advanced open source 
>> database does not support cross-database querying just like how SQL Server 
>> does. But if it doesn't, it doesn't. Our team should either drop it as a 
>> choice or find a way (by asking the experts who built it or use it) how to 
>> design based on its features. That's why I'm asking.
>> 

Cross-database on MSSQL is identical to cross schema on postgres. If you truly 
need cross server support (versus say beefier hardware) how did you come to 
choose postgres?  The numbers you present are impressive but not unheard of on 
this list. 



Re: Error on query execution

2025-03-05 Thread Igor Korot
Tom,

On Tue, Mar 4, 2025, 8:37 PM Tom Lane  wrote:

> Igor Korot  writes:
> > binaryIntVal = htonl( (uint32_t) type );
> > paramValues[0] = (char *) &binaryIntVal;
> > paramLengths[0] = sizeof( binaryIntVal );
> > paramFormats[0] = 1;
>
> You're apparently trying to pass this parameter as an int4 ...
>

I thought only binary or text is allowed as parameters.

Guess I was wrong...

Thank you.


> > res = PQexecParams( m_db, "SELECT * FROM abcatfmt WHERE abf_type =
> > $1", 1, nullptr, paramValues, paramLengths, paramFormats, 1 );
>
> ... but given that you didn't specify any data type, I think the
> parser will fall back to assuming that $1 is the same type as
> "abf_type", whatever that is.  Passing data in binary is not at all
> forgiving about getting the data type right.
>
> regards, tom lane
>


Re: Question about querying distributed databases

2025-03-05 Thread Igor Korot
Hi,

On Wed, Mar 5, 2025, 8:44 PM me nefcanto  wrote:

> I once worked with a monolithic SQL Server database with more than 10
> billion records and about 8 Terabytes of data. A single backup took us more
> than 21 days. It was a nightmare. Almost everybody knows that scaling up
> has a ceiling, but scaling out has no boundaries.
>

But then you did the backup incrementally, correct?

That should not take the same amount of time...



> Therefore I will never choose a monolithic database design unless it's a
> small project. But my examples are just examples. We predict 100 million
> records per year. So we have to design accordingly. And it's not just sales
> records. Many applications have requirements that are cheap data but vast
> in multitude. Consider a language-learning app that wants to store the
> known words of any learner. 10 thousand learners each knowing 2 thousand
> words means 20 million records. Convert that to 100 thousand learners each
> knowing 7 thousand words and now you almost have a billion records. Cheap,
> but necessary. Let's not dive into telemetry or time-series data.
>

Can you try and see if 1 server with 3 different databases will do?

Having 1 table per database per server is too ugly.

Also please understand - every database is different, and so it works and
operates differently. What works well in one may not work well in another...

Thank you.


> We initially chose to break the database into smaller databases, because
> it seemed natural for our modularized monolith architecture. And it worked
> great for SQL Server. If you're small, we host them all on one server. If
> you get bigger, we can put heavy databases on separate machines.
>
> However, I don't have experience working with other types of database
> scaling. I have used table partitioning, but I have never used sharding.
>
> Anyway, that's why I asked you guys. However, encouraging me to go back to
> monolith without giving solutions on how to scale, is not helping. To be
> honest, I'm somehow disappointed by how the most advanced open source
> database does not support cross-database querying just like how SQL Server
> does. But if it doesn't, it doesn't. Our team should either drop it as a
> choice or find a way (by asking the experts who built it or use it) how to
> design based on its features. That's why I'm asking.
>
> One thing that comes to my mind, is to use custom types. Instead of
> storing data in ItemCategories and ItemAttributes, store them as arrays in
> the relevant tables in the same database. But then it seems to me that in
> this case, Mongo would become a better choice because I lose the relational
> nature and normalization somehow. What drawbacks have you experienced in
> that sense?
>
> Regards
> Saeed
>
> On Wed, Mar 5, 2025 at 7:38 PM Adrian Klaver 
> wrote:
>
>> On 3/5/25 04:15, me nefcanto wrote:
>> > Dear Laurenz, the point is that I think if we put all databases into
>> one
>> > database, then we have blocked our growth in the future.
>>
>> How?
>>
>> > A monolith database can be scaled only vertically. We have had huge
>> > headaches in the past with SQL Server on Windows and a single database.
>> > But when you divide bounded contexts into different databases, then you
>> > have the chance to deploy each database on a separate physical machine.
>> > That means a lot in terms of performance. Please correct me if I am
>> wrong.
>>
>> And you add the complexity of talking across machines, as well as
>> maintaining separate machines.
>>
>> >
>> > Let's put this physical restriction on ourselves that we have different
>> > databases. What options do we have? One option that comes to my mind,
>> is
>> > to store the ID of the categories in the Products table. This means
>> that
>> > I don't need FDW anymore. And databases can be on separate machines. I
>> > first query the categories database first, get the category IDs, and
>> > then add a where clause to limit the product search. That could be an
>> > option. Array data type in Postgres is something that I think other
>> > RDBMSs do not have. Will that work? And how about attributes? Because
>> > attributes are more than a single ID. I should store the attribute key,
>> > alongside its value. It's a key-value pair. What can I do for that?
>>
>> You seem to be going out of the way to make your life more complicated.
>>
>> The only way you are going to find an answer is set up test cases and
>> experiment. My bet is a single server with a single database and
>> multiple schemas is where you end up, after all that is where you are
>> starting from.
>>
>>
>> >
>> > Thank you for sharing your time. I really appreciate it.
>> > Saeed
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Mar 5, 2025 at 3:18 PM Laurenz Albe wrote:
>> >
>> > On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote:
>> >  > That means a solid monolith database. We lose many goodies with
>> that.
>> >  > As a real-world example, right 

Re: Question about querying distributed databases

2025-03-05 Thread Laurenz Albe
On Wed, 2025-03-05 at 12:57 +0330, me nefcanto wrote:
> Right now this data is in MariaDB, on separate databases (schema) but on one
> server. The solution in this situation is to have a cross-database query.
> (this is the status quo of our application)
> 
> Now our team has decided to migrate to Postgres. However, we realized that
> Postgres does not support cross-database queries. And if we want to do so,
> we should use FDW. So, we thought we might as well put databases on separate
> servers for scalability if we have to write more code. That's the reason
> behind this question.

In MySQL, the terms "database" and "schema" are used for the same thing.
Not so in PostgreSQL.  I think you should migrate the data into different
schemas in a single database, pretty much like you had it in MySQL.
Then you don't need a foreign data wrapper, and I bet the query can
perform as well as it did on MySQL.
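
Concretely, a minimal sketch using two of the database names mentioned
elsewhere in the thread as schemas (the column names are hypothetical):

CREATE SCHEMA contacts;
CREATE SCHEMA geo;

-- a cross-schema join inside one database needs no FDW
SELECT a.street, c.name
FROM contacts.addresses a
JOIN geo.cities c ON c.id = a.city_id;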

Yours,
Laurenz Albe





Re: Question about querying distributed databases

2025-03-05 Thread Laurenz Albe
On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote:
> That means a solid monolith database. We lose many goodies with that.
> As a real-world example, right now we can import a single database
> from the production to the development to test and troubleshoot data.

Well, can't you import a single schema then?
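
For example, a single schema can be dumped and restored on its own; a
sketch, with hypothetical database and schema names:

pg_dump -n contacts production_db > contacts.sql
psql dev_db < contacts.sql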

> What if we host all databases on the same server and use FDW. What
> happens in that case? Does it return 100 thousand records and join
> in the memory?

It will do just the same thing.  The performance could be better
because of the reduced latency.

> Because in SQL Server, when you perform a cross-database query
> (not cross-server) the performance is extremely good, proving that
> it does not return 100 thousand ItemId from Taxonomy.ItemCategories
> to join with ProductId.
> 
> Is that the same case with Postgres too, If databases are located
> on one server?

No, you cannot perform cross-database queries without a foreign
data wrapper.  I don't see a reason why the statement shouldn't
perform as well as in SQL Server if you use schemas instead of
databases.
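
For completeness, a minimal postgres_fdw setup for the same-server case; a
sketch in which the database, schema, and role names are hypothetical:

CREATE EXTENSION postgres_fdw;
CREATE SERVER taxonomy_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', dbname 'taxonomy');
CREATE USER MAPPING FOR CURRENT_USER SERVER taxonomy_srv
    OPTIONS (user 'app', password 'secret');
CREATE SCHEMA taxonomy;
IMPORT FOREIGN SCHEMA public FROM SERVER taxonomy_srv INTO taxonomy;

Joins against the imported foreign tables then go through the FDW
machinery, with the overhead discussed above.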

Yours,
Laurenz Albe




Re: Question about querying distributed databases

2025-03-05 Thread me nefcanto
Dear Laurenz,

That means a solid monolith database. We lose many goodies with that. As a
real-world example, right now we can import a single database from the
production to the development to test and troubleshoot data.

What if we host all databases on the same server and use FDW? What happens
in that case? Does it return 100 thousand records and join in memory?

Because in SQL Server, when you perform a cross-database query (not
cross-server) the performance is extremely good, proving that it does not
return 100 thousand ItemId from Taxonomy.ItemCategories to join with
ProductId.

Is that the same case with Postgres too, if databases are located on one
server?

Regards
Saeed


Re: Question about querying distributed databases

2025-03-05 Thread me nefcanto
Laurenz Albe, thanks for your answer.

Right now this data is in MariaDB, on separate databases (schema) but on
one server. The solution in this situation is to have a cross-database
query. (this is the status quo of our application)

Now our team has decided to migrate to Postgres. However, we realized that
Postgres does not support cross-database queries. And if we want to do so,
we should use FDW. So, we thought we might as well put databases on
separate servers for scalability if we have to write more code. That's the
reason behind this question.

But we're stuck at performance. In SQL Server and MariaDB, cross-database
queries allow for neat separation of data while delivering good performance
in the orchestration layer. You have separate databases, which allows for
fine-grained management (different backup schedules, index recalculation,
deployment, etc.) but at the same time you can write a query in your app,
or in an orchestrator database (let's call it All) that is fast enough for
millions of records.

However, we're stuck on this in Postgres. What solutions exist for this
problem?

Regards
Saeed


On Wed, Mar 5, 2025 at 11:09 AM Laurenz Albe 
wrote:

> On Wed, 2025-03-05 at 10:12 +0330, me nefcanto wrote:
> > Adrian Klaver, thank you for the link. I asked the AI to create a query
> for me using FDW.
> >
> > The problem here is that it collects all of the product_id values from
> the ItemCategories table [...]
> >
> > That's not scalable. Is there a workaround for this?
>
> Without having scrutinized the case in detail: if your data are organized
> in an
> entity-attribute-value design and distributed across three databases, you
> cannot
> expect to end up with efficient queries.
>
> Perhaps you can extract the data and load them into a reasonably organized
> single database.  Such an ETL process might make the task much easier.
>
> Yours,
> Laurenz Albe
>


Re: Error on query execution

2025-03-05 Thread Igor Korot
Hi, Tom,

On Tue, Mar 4, 2025 at 8:37 PM Tom Lane  wrote:
>
> Igor Korot  writes:
> > binaryIntVal = htonl( (uint32_t) type );
> > paramValues[0] = (char *) &binaryIntVal;
> > paramLengths[0] = sizeof( binaryIntVal );
> > paramFormats[0] = 1;
>
> You're apparently trying to pass this parameter as an int4 ...
>
> > res = PQexecParams( m_db, "SELECT * FROM abcatfmt WHERE abf_type =
> > $1", 1, nullptr, paramValues, paramLengths, paramFormats, 1 );
>
> ... but given that you didn't specify any data type, I think the
> parser will fall back to assuming that $1 is the same type as
> "abf_type", whatever that is.  Passing data in binary is not at all
> forgiving about getting the data type right.

abf_type is declared as smallint:

queries.push_back( L"CREATE TABLE IF NOT EXISTS abcatfmt(abf_name
char(30) NOT NULL, abf_frmt char(254), abf_type smallint, abf_cntr
integer, PRIMARY KEY( abf_name ));" );
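
One way out, as a sketch: either pass a paramTypes array to PQexecParams
instead of the nullptr, or cast the parameter in the query text so the
parser knows a 4-byte integer is coming:

SELECT * FROM abcatfmt WHERE abf_type = $1::int4

With the explicit cast, the binary value built with htonl() is read as
int4 and compared to the smallint column through the usual cross-type
operator.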

Thank you.

>
> regards, tom lane




Re: Duplicate Key Values

2025-03-05 Thread Ron Johnson
On Wed, Mar 5, 2025 at 12:36 PM mark bradley 
wrote:

> Although I did not enter them, somehow duplicate primary key values have
> appeared in one of my tables.
>

Is it a text/varchar column?  Has the distro been upgraded "recently", or
maybe streamed from an older Linux system to a newer Linux system?


> I can't remove the offending rows because the key is a foreign key in
> another table.
>
> What to do?
>

Are the records completely (i.e., all columns) identical?



Re: Question about querying distributed databases

2025-03-05 Thread Adrian Klaver

On 3/5/25 04:15, me nefcanto wrote:
> Dear Laurenz, the point is that I think if we put all databases into one
> database, then we have blocked our growth in the future.

How?

> A monolith database can be scaled only vertically. We have had huge
> headaches in the past with SQL Server on Windows and a single database.
> But when you divide bounded contexts into different databases, then you
> have the chance to deploy each database on a separate physical machine.
> That means a lot in terms of performance. Please correct me if I am wrong.

And you add the complexity of talking across machines, as well as
maintaining separate machines.

> Let's put this physical restriction on ourselves that we have different
> databases. What options do we have? One option that comes to my mind, is
> to store the ID of the categories in the Products table. This means that
> I don't need FDW anymore. And databases can be on separate machines. I
> first query the categories database, get the category IDs, and
> then add a where clause to limit the product search. That could be an
> option. Array data type in Postgres is something that I think other
> RDBMSs do not have. Will that work? And how about attributes? Because
> attributes are more than a single ID. I should store the attribute key,
> alongside its value. It's a key-value pair. What can I do for that?

You seem to be going out of the way to make your life more complicated.

The only way you are going to find an answer is set up test cases and
experiment. My bet is a single server with a single database and
multiple schemas is where you end up, after all that is where you are
starting from.

> Thank you for sharing your time. I really appreciate it.
> Saeed
>
> On Wed, Mar 5, 2025 at 3:18 PM Laurenz Albe wrote:
>
> On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote:
>  > That means a solid monolith database. We lose many goodies with that.
>  > As a real-world example, right now we can import a single database
>  > from the production to the development to test and troubleshoot data.
>
> Well, can't you import a single schema then?
>
>  > What if we host all databases on the same server and use FDW. What
>  > happens in that case? Does it return 100 thousand records and join
>  > in the memory?
>
> It will do just the same thing.  The performance could be better
> because of the reduced latency.
>
>  > Because in SQL Server, when you perform a cross-database query
>  > (not cross-server) the performance is extremely good, proving that
>  > it does not return 100 thousand ItemId from Taxonomy.ItemCategories
>  > to join with ProductId.
>  >
>  > Is that the same case with Postgres too, If databases are located
>  > on one server?
>
> No, you cannot perform cross-database queries without a foreign
> data wrapper.  I don't see a reason why the statement shouldn't
> perform as well as in SQL Server if you use schemas instead of
> databases.
>
> Yours,
> Laurenz Albe



--
Adrian Klaver
adrian.kla...@aklaver.com





Duplicate Key Values

2025-03-05 Thread mark bradley
Although I did not enter them, somehow duplicate primary key values have 
appeared in one of my tables.  I can't remove the offending rows because the 
key is a foreign key in another table.

What to do?

Mark Brady,
amazon.com/author/markjbrady


Re: end of COPY

2025-03-05 Thread Greg Sabino Mullane
On Wed, Mar 5, 2025 at 10:22 AM Marc Millas  wrote:

> Then the flow contains a single line: \. to my understanding this means
> end of the copy
>
> but, Postgres generates an error : invalid input syntax for type numeric
> "\."
>

This can happen when you send TWO backslashes and a dot, rather than a
single backslash and a dot.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: end of COPY

2025-03-05 Thread Adrian Klaver

On 3/5/25 07:22, Marc Millas wrote:
> Hi,
> on a Postgres 16 DB running on a RedHat 8.5 x86 machine, I want to input
> some data using COPY FROM STDIN.
>
> so.. more than 700,000 lines go well.
> Then the flow contains a single line: \.
> To my understanding this means end of the copy.

Provide the lines at the end of the data.

> but, Postgres generates an error:
> invalid input syntax for type numeric "\."
> then the 2 lines stating the COPY command that was executed.

Provide the complete error message.

> what am I missing ?
>
> thanks
>
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: end of COPY

2025-03-05 Thread Ron Johnson
On Wed, Mar 5, 2025 at 10:53 AM Greg Sabino Mullane 
wrote:

> On Wed, Mar 5, 2025 at 10:22 AM Marc Millas 
> wrote:
>
>> Then the flow contains a single line: \. to my understanding this means
>> end of the copy
>>
>> but, Postgres generates an error : invalid input syntax for type numeric
>> "\."
>>
>
> This can happen when you send TWO backslashes and a dot, rather than a
> single backslash and a dot.
>

I hate proportional fonts. 😠



Re: Question about querying distributed databases

2025-03-05 Thread Greg Sabino Mullane
On Wed, Mar 5, 2025 at 7:15 AM me nefcanto  wrote:

> I think if we put all databases into one database, then we have blocked
> our growth in the future.
>

I think this is premature optimization. Your products table has 100,000
rows. That's very tiny for the year 2025. Try putting everything on one box
with good indexes and you might be surprised at the performance.


> A monolith database can be scaled only vertically.
>

Postgres scales well vertically. Plus, you can have streaming replicas to
distribute the read queries (like the one given here) across many machines.


> We have had huge headaches in the past with SQL Server on Windows and a
> single database.
> But when you divide bounded contexts into different databases, then you
> have the chance to deploy each database on a separate physical machine.
> That means a lot in terms of performance.
>

I get your concern, but if the data is inter-related, it really is best to
have them on the same server (and same database, and same schema). Then
Postgres can devise a really efficient plan. You can also use Citus to
start sharding things across multiple physical servers if your database
gets very large.

> Let's put this physical restriction on ourselves that we have different
> databases. What options do we have?
>

Your main option is FDW, which will never perform as well as a single
server. Plus, you have the additional headache of trying to coordinate data
updates atomically across different servers. The other option is to have
the application do the work, e.g. pull a list of things from one server,
use that to build a query against another one, etc. Definitely not ideal.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


end of COPY

2025-03-05 Thread Marc Millas
Hi,
on a Postgres 16 DB running on a RedHat 8.5 x86 machine, I want to input
some data using COPY FROM STDIN.
So... more than 700,000 lines go well.
Then the flow contains a single line: \.
To my understanding this means end of the copy.

But Postgres generates an error:
invalid input syntax for type numeric "\."
followed by the two lines stating the COPY command that was executed.

What am I missing?

thanks


Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com


Virtual patching software for PostgreSQL

2025-03-05 Thread Mike Yeap
Hi all, I'm looking for alternative software that does virtual patching
for PostgreSQL DB servers, to replace the Trellix Database Security
(previously named McAfee Database Security) that I'm currently using.

I'd appreciate it if anyone using something like that could share some info.

Thanks!

Regards,
Mike Yeap


Re: Question about querying distributed databases

2025-03-05 Thread me nefcanto
Dear Laurenz, the point is that I think if we put all databases into one
database, then we have blocked our growth in the future.
A monolith database can be scaled only vertically. We have had huge
headaches in the past with SQL Server on Windows and a single database.
But when you divide bounded contexts into different databases, then you
have the chance to deploy each database on a separate physical machine.
That means a lot in terms of performance. Please correct me if I am wrong.

Let's put this physical restriction on ourselves that we have different
databases. What options do we have? One option that comes to my mind is to
store the IDs of the categories in the Products table. This means that I
don't need FDW anymore, and databases can be on separate machines. I first
query the categories database, get the category IDs, and then add a
where clause to limit the product search. That could be an option. Array
data type in Postgres is something that I think other RDBMSs do not have.
Will that work? And how about attributes? Because attributes are more than
a single ID. I should store the attribute key, alongside its value. It's a
key-value pair. What can I do for that?

Thank you for sharing your time. I really appreciate it.
Saeed





On Wed, Mar 5, 2025 at 3:18 PM Laurenz Albe 
wrote:

> On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote:
> > That means a solid monolith database. We lose many goodies with that.
> > As a real-world example, right now we can import a single database
> > from the production to the development to test and troubleshoot data.
>
> Well, can't you import a single schema then?
>
> > What if we host all databases on the same server and use FDW. What
> > happens in that case? Does it return 100 thousand records and join
> > in the memory?
>
> It will do just the same thing.  The performance could be better
> because of the reduced latency.
>
> > Because in SQL Server, when you perform a cross-database query
> > (not cross-server) the performance is extremely good, proving that
> > it does not return 100 thousand ItemId from Taxonomy.ItemCategories
> > to join with ProductId.
> >
> > Is that the same case with Postgres too, If databases are located
> > on one server?
>
> No, you cannot perform cross-database queries without a foreign
> data wrapper.  I don't see a reason why the statement shouldn't
> perform as well as in SQL Server if you use schemas instead of
> databases.
>
> Yours,
> Laurenz Albe
>


Re: Question about querying distributed databases

2025-03-05 Thread Laurenz Albe
On Wed, 2025-03-05 at 15:45 +0330, me nefcanto wrote:
> Dear Laurenz, the point is that I think if we put all databases into one 
> database,
> then we have blocked our growth in the future.

Hard to say.
If you want to shard for horizontal scaling, that usually only works well
if there are few interconnections between the different shards.  If you end
up joining data from different shards, you usually lose.

> A monolith database can be scaled only vertically. We have had huge headaches
> in the past with SQL Server on Windows and a single database.

Without knowing what the headaches were, it is hard to answer something here.

> But when you divide bounded contexts into different databases, then you have
> the chance to deploy each database on a separate physical machine. That means
> a lot in terms of performance. Please correct me if I am wrong.

I don't know if you are wrong.  But it seems like it is not working well,
is it?  Perhaps you can explain how splitting up the data might result in
better performance.  Is that just a guess or do you have a reason to think so?

> Let's put this physical restriction on ourselves that we have different
> databases. What options do we have?

I don't know.  Based on what you showed: perhaps a complete re-design?

Yours,
Laurenz Albe




Re: end of COPY

2025-03-05 Thread Ron Johnson
On Wed, Mar 5, 2025 at 10:22 AM Marc Millas  wrote:

> Hi,
> on a Postgres 16 DB running on a redhat 8.5 x86 machineI want to input
> some data using copy from stdin
> so.. more that 700 000 lines goes well.
> Then the flow contains a single line: \.
> to my understanding this means end of the copy
>

I just tested this on PG 16.8.  No backslash.
$ psql TAP -Xc "copy public.job_notification TO STDOUT;"

https://www.postgresql.org/docs/16/sql-copy.html
"End of data *can be* represented by a single line containing just
backslash-period (\.)."
"the end-of-data marker (\.) or the null string (\N by default)."


> but, Postgres generates an error :
> invalid input syntax for type numeric "\."
> then the 2 lines stating the copy command that was executed.
>
> what am I missing ?
>

COPY is expecting something after the backslash.
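
For reference, a correctly terminated text-format stream looks like this
(a minimal sketch with a hypothetical two-column numeric table):

COPY prices FROM STDIN;
1	19.99
2	42.50
\.

If the sending program escapes backslashes, the server receives \\. and
tries to parse \. as a data row, which matches the error above.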



Re: end of COPY

2025-03-05 Thread Marc Millas
Greg... 👍

works fine now.

thanks !


Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Wed, Mar 5, 2025 at 4:53 PM Greg Sabino Mullane 
wrote:

> On Wed, Mar 5, 2025 at 10:22 AM Marc Millas 
> wrote:
>
>> Then the flow contains a single line: \. to my understanding this means
>> end of the copy
>>
>> but, Postgres generates an error : invalid input syntax for type numeric
>> "\."
>>
>
> This can happen when you send TWO backslashes and a dot, rather than a
> single backslash and a dot.
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support
>
>


Re: Duplicate Key Values

2025-03-05 Thread Ron Johnson
What does "and unique identifier" mean?

(Creating a Primary Key constraint automatically creates a unique
no-nulls-allowed index on the column(s), so no other index is needed.)

Can you show the table definition?

Anyway, you *should* be able to delete one of the duplicate node_id records
by specifying another column with a different value.
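
Something along these lines; table and column names are hypothetical:

DELETE FROM nodes WHERE node_id = 42 AND node_name IS NULL;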

On Wed, Mar 5, 2025 at 6:14 PM mark bradley  wrote:

> Hi Ron,
>
> The key is an integer.  I'm using pgAdmin 4 and recently updated to the
> latest version.
>
> The records are not all identical, some have NULL values of the non-key
> for some unknown reason.  Here is a screenshot.  The key is node_id, which
> I have specified as a key and unique identifier.
>
>
>
> Mark Brady,
> amazon.com/author/markjbrady
> --
> *From:* Ron Johnson 
> *Sent:* Wednesday, March 5, 2025 12:50 PM
> *To:* pgsql-general@lists.postgresql.org <
> pgsql-general@lists.postgresql.org>
> *Subject:* Re: Duplicate Key Values
>
> On Wed, Mar 5, 2025 at 12:36 PM mark bradley 
> wrote:
>
> Although I did not enter them, somehow duplicate primary key values have
> appeared in one of my tables.
>
>
> Is it a text/varchar column?  Has the distro been upgraded "recently", or
> maybe streamed from an older Linux system to a newer Linux system?
>
>
> I can't remove the offending rows because the key is a foreign key in
> another table.
>
> What to do?
>
>
> Are the records completely (i.e., all columns) identical?
>




Re: Duplicate Key Values

2025-03-05 Thread Adrian Klaver

On 3/5/25 15:37, Ron Johnson wrote:
> What does "and unique identifier" mean?
>
> (Creating a Primary Key constraint automatically creates a unique
> no-nulls-allowed index on the column(s), so no other index is needed.)
>
> Can you show the table definition?
>
> Anyway, you *should* be able to delete one of the duplicate node_id
> records by specifying another column with a different value.
>
> On Wed, Mar 5, 2025 at 6:14 PM mark bradley wrote:
>
>  > Hi Ron,
>  >
>  > The key is an integer.  I'm using pgAdmin 4 and recently updated to
>  > the latest version.

1) Did you not see duplicates with the old version of pgAdmin4?

2) What do you see if you use psql?

3) Did you upgrade/move the Postgres server or the underlying OS?

4) Have you tried reindexing the node_id field?

>  > The records are not all identical, some have NULL values of the
>  > non-key for some unknown reason.  Here is a screenshot.  The key is
>  > node_id, which I have specified as a key and unique identifier.
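
For question 2, a sketch of what to check from psql (table and index
names are hypothetical):

-- list duplicate key values
SELECT node_id, count(*) FROM nodes GROUP BY node_id HAVING count(*) > 1;

-- rebuild the primary key index (question 4)
REINDEX INDEX nodes_pkey;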






--
Adrian Klaver
adrian.kla...@aklaver.com