Re: Facing issue in using special characters

2019-03-17 Thread Warner, Gary, Jr
Many of us have faced character encoding issues because we are not in control 
of our input sources and made the common assumption that UTF-8 covers 
everything.

In my lab, as an example, some of our social media posts have included the ZawGyi 
Burmese character set rather than Unicode Burmese.  (Because Myanmar developed its 
technology in an environment largely closed off from the rest of the world, it ended 
up with its own non-standard character set, which is still very common on mobile 
phones.)  We had fully tested the app with Unicode Burmese, but honestly didn’t know 
ZawGyi was even a thing that we would see in our dataset.  We’ve also had problems 
with non-Unicode word separators in Arabic.

What we’ve found helpful is to view the offending data in a hex editor 
and determine which non-standard characters may be causing the problem.

It may be that some data conversion is necessary before insertion. But the first 
step is knowing WHICH characters are causing the issue.
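
When the data is already inside a script, something along these lines can stand in 
for the hex editor.  A minimal Python sketch (the helper name and the sample string 
are mine, not from any real pipeline):

    # Dump the raw bytes and Unicode code points of a suspect string so you
    # can see exactly which characters are present before converting anything.
    import unicodedata

    def inspect(text, encoding="utf-8"):
        raw = text.encode(encoding, errors="replace")
        print("bytes :", raw.hex(" "))   # hex-editor style view of the bytes
        for ch in text:
            name = unicodedata.name(ch, "<unnamed / non-standard>")
            print(f"U+{ord(ch):04X} {name}")

    # Example: ASCII mixed with code points from the Myanmar block
    inspect("post: \u1000\u103B")

Anything that shows up as unnamed or in an unexpected block is a good candidate for 
the conversion step.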



Re: Distributing data over "spindles" even on AWS EBS, (followup to the work queue saga)

2019-03-17 Thread Gunther

On 3/14/2019 11:11, Jeremy Schneider wrote:

On 3/14/19 07:53, Gunther wrote:

  2. build a low-level "spreading" scheme: take the partial files 4653828
 and 4653828.1, .2, _fsm, etc., move each to another device, and then
 symlink it back to that directory (I come back to this!)

...

Regarding 2., I find that it would be a nice feature of PostgreSQL if we could
just use symlinks and a symlink rule: for example, when PostgreSQL finds
that 4653828 is in fact a symlink to /otherdisk/PG/16284/4653828, then
it would

  * by default also create 4653828.1 as a symlink and place the actual
    data file on /otherdisk/PG/16284/4653828.1

How about if we could just specify multiple tablespaces for an object,
and then PostgreSQL would round-robin new segments across the presently
configured tablespaces?  This seems like a simple and elegant solution
to me.


Very good idea! I agree.

It would also be very important to dust off the existing patch someone had 
contributed to allow TOAST tables to be assigned to different tablespaces.
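
For completeness, the low-level "spreading" scheme from point 2 above boils down to 
something like this.  A rough Python sketch; the paths and the helper name are made 
up, and the cluster would have to be shut down while files are moved:

    # Move one relation segment to another device and leave a symlink behind,
    # i.e. the manual per-segment spreading described in point 2.
    import os
    import shutil

    def spread_segment(seg_path, target_dir):
        dest = os.path.join(target_dir, os.path.basename(seg_path))
        shutil.move(seg_path, dest)  # copies across devices, then removes the original
        os.symlink(dest, seg_path)   # old path now points at the relocated file

    # spread_segment("/pgdata/base/16284/4653828.1", "/otherdisk/PG/16284")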



  4. maybe I can configure AWS EBS to reserve more IOPS -- but why
     would I pay for more IOPS if my cost is by volume size? I could just
     make another volume. Or does AWS play a similar trick on us with
     IOPS being limited by some "credit" system?

Not credits, but if you're using gp2 volumes then pay close attention to
how burst balance works. A single large volume is the same price as two
striped volumes at half size -- but the striped volumes will have double
the burst speed and take twice as long to refill the burst balance.


Yes, I learned that too. It seems like a very interesting "bug" in the Amazon 
gp2 IOPS allocation scheme.  They say it's 3 IOPS per GiB, so if I have 
100 GiB I get 300 IOPS. But there is also a minimum of 100 IOPS per volume. 
That means if I have 10 volumes of 10 GiB each, I get 1000 IOPS minimum 
between them all, but if I have it all on one 100 GiB volume I only get 300 IOPS.
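
Spelling the arithmetic out (a toy Python sketch using just the 3 IOPS/GiB rule and 
the 100 IOPS per-volume minimum quoted above):

    # Baseline gp2 IOPS: 3 per GiB, with a 100 IOPS floor per volume.
    def gp2_baseline_iops(size_gib):
        return max(100, 3 * size_gib)

    one_big   = gp2_baseline_iops(100)        # 300 IOPS
    ten_small = 10 * gp2_baseline_iops(10)    # 10 * 100 = 1000 IOPS
    print(one_big, ten_small)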


I wonder if Amazon is aware of this. I hope they are and think that's 
just fine. Because I like it.


It is also a clear sign to me that I want to use block sizes > 4k for the 
file system. I have tried on Amazon Linux to create the XFS volume with an 
8k block size, but I cannot mount it, since Linux says it can currently 
only deal with 4k blocks. This is another reason I am considering switching 
the database server(s) to FreeBSD.  OTOH, who knows, maybe this 4k is a 
limit of the AWS EBS infrastructure. After all, if I am already scraping 
against the 300 or 1000 IOPS limit and I can suddenly double my block size 
per I/O, I double my I/O throughput.
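
A back-of-the-envelope check of that last point (my own toy numbers, assuming the 
IOPS cap is the binding limit):

    # At a fixed IOPS ceiling, sequential throughput scales with the I/O size.
    def throughput_mib_s(iops, block_bytes):
        return iops * block_bytes / (1024 * 1024)

    print(throughput_mib_s(1000, 4096))   # ~3.9 MiB/s with 4k blocks
    print(throughput_mib_s(1000, 8192))   # ~7.8 MiB/s with 8k blocks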


regards,
-Gunther