Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-09 Thread Jehan-Guillaume de Rorthais
On Sun, 5 Feb 2023 17:14:44 -0800 Peter Geoghegan wrote:
...
> The OP should see the Postgres ICU docs for hints on how to use these
> facilities to make a custom collation that matches whatever their
> requirements are:
>
> https://www.postgresql.org/docs/current/collation.html#COLLATION-MANAGI

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-06 Thread Peter J. Holzer
On 2023-02-05 18:57:13 -0600, Ron wrote:
> Why are you specifying the collation to be "C" when the default db encoding
> is UTF8, and UTF-8 has Greek, Chinese and English encodings?

C is equally bad for Greek, Chinese and English ;-)

hp

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Tom Lane
Dionisis Kontominas writes:
> 1. Regarding the different languages in the same column, that is normal
> if the column is a UTF-8 one, i.e. should be able to hold for example
> English, Greek and Chinese characters. In this case what is the best
> approach to define the collation and l

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Peter Geoghegan
On Sun, Feb 5, 2023 at 4:19 PM Tom Lane wrote:
> If there's a predominant language in the data, selecting a collation
> matching that seems like your best bet. Otherwise, maybe you should
> just shrug your shoulders and stick with C collation. It's likely
> to be faster than any alternative.

FW

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Dionisis Kontominas
Because if I don't specify the collation/lctype it seems to get the default from the OS, which in my case is: English_Netherlands.1252 (database encoding UTF8). That might not be best for truly Unicode content columns, so I investigated the "C" option, which also seems not to work; might be worse

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Ron
Why are you specifying the collation to be "C" when the default db encoding is UTF8, and UTF-8 has Greek, Chinese and English encodings?

On 2/5/23 17:08, Dionisis Kontominas wrote:
Hello all, I have a question regarding the definition of the type of a character field in a table and more spe

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Dionisis Kontominas
Hi Tom,

1. Regarding the different languages in the same column, that is normal if the column is a UTF-8 one, i.e. should be able to hold for example English, Greek and Chinese characters. In this case what is the best approach to define the collation and lctype of the column? Either

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Tom Lane
Dionisis Kontominas writes:
> I suppose that affects the outcome of ORDER BY clauses on the field,
> along with the content of the indexes. Is this right?

Yeah.

> Assuming that the requirement exists, to store UTF-8 characters on a
> field that can be from multiple languages, and the datab

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Dionisis Kontominas
Hello Tom,

Thank you for your response. I suppose that affects the outcome of ORDER BY clauses on the field, along with the content of the indexes. Is this right?

Assuming that the requirement exists, to store UTF-8 characters on a field that can be from multiple languages, and the data

Re: Question regarding UTF-8 data and "C" collation on definition of field of table

2023-02-05 Thread Tom Lane
Dionisis Kontominas writes:
> Let's say that the definition is for example as follows:
> name character varying(8) COLLATE pg_catalog."C" NOT NULL
> and also assume that the database default encoding is UTF8 and also the
> Collate and Ctype is "C". I plan to store strings of various languag
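
As a rough illustration of what the "C" collation would mean for a mixed-language column like the one discussed in this thread: under "C", strings are compared byte by byte, and for UTF-8 data that works out to plain code point order. The Python sketch below only imitates that ordering outside the database. The sample strings and the helper name c_collation_key are made up for illustration and are not taken from the thread.

```python
# Imitates the ordering of ORDER BY under the "C" collation in a UTF8 database
# by comparing the encoded bytes of each string. This is a sketch of the
# comparison rule only; it does not talk to PostgreSQL.

# Hypothetical sample values: English, Greek and Chinese in one column.
names = ["zeta", "Alpha", "alpha", "Ωμέγα", "αλφα", "中文", "Zeta"]


def c_collation_key(value: str) -> bytes:
    """Return the UTF-8 bytes of value, used as a byte-wise sort key."""
    return value.encode("utf-8")


c_sorted = sorted(names, key=c_collation_key)

for name in c_sorted:
    print(name)
```

With this key, all uppercase ASCII letters sort before all lowercase ones, Greek sorts after every Latin string, and Chinese comes last. That is stable and cheap to compute, but it is not what a reader of any of those languages would call alphabetical order, which is the trade-off the replies in this thread point at.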