Re: Converting UTF-8 email text to PDF

2025-02-14 Thread Linux-Fan
Loris Bennett writes: Hi, I am using Emacs' Gnus to display a buffer containing an email. I want to convert this email to a PDF file. [...] Does anyone have a better suggestion? If you can get your e-mail in .eml format, you could use a tool specifically targeted at converting e-mails t

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Ralph Katz
On 2/13/25 03:32, Loris Bennett wrote: Hi, I am using Emacs' Gnus to display a buffer containing an email. I want to convert this email to a PDF file. If I save the mail to a file I get: $ file einladung.txt einladung.txt: news or mail, Unicode text, UTF-8 text Does anyone h

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Richmond
On 13/02/2025 10:32, Loris Bennett wrote: > Hi, > > I am using Emacs' Gnus to display a buffer containing an email. > I want to convert this email to a PDF file. > > You can print the file into a pdf from a web browser. Put file:/// into the location bar and navigate to the text file, select it and

Re: Converting UTF-8 email text to PDF (addendum)

2025-02-13 Thread Hans
I got this information from here: https://askubuntu.com/questions/27097/how-to-print-a-regular-file-to-pdf-from-command-line where other ways are shown, too. Best Hans

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Hans
Hi Loris, you wrote > $ file einladung.txt > einladung.txt: news or mail, Unicode text, UTF-8 text So you already have a simple text file. Did you try the following? enscript einladung.txt -o - | ps2pdf - einladung.pdf Please note, "enscript" is a package which can easily be

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Loris Bennett
13. Februar 2025, 11:32:09 CET schrieb Loris Bennett: >> Hi, >> >> I am using Emacs' Gnus to display a buffer containing an email. >> I want to convert this email to a PDF file. >> >> If I save the mail to a file I get: >> >> $ file einlad

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Loris Bennett
Rand Pritelrohm writes: > On 2025-02-13 11:32:09, Loris Bennett wrote: > >> Hi, >> > [snip] >> >> Does anyone have a better suggestion? >> > [snip] > > Hello, > > I use with success 'paps' > https://github.com/dov/paps > > Maybe you can find it in repos. 'paps' is indeed available for bookwo

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Rand Pritelrohm
On 2025-02-13 11:32:09, Loris Bennett wrote: > Hi, > [snip] > > Does anyone have a better suggestion? > [snip] Hello, I use 'paps' with success: https://github.com/dov/paps Maybe you can find it in the repos. Regards, Rand

Re: Converting UTF-8 email text to PDF

2025-02-13 Thread Hans
acs' Gnus to display a buffer containing an email. > I want to convert this email to a PDF file. > > If I save the mail to a file I get: > > $ file einladung.txt > einladung.txt: news or mail, Unicode text, UTF-8 text > > I have tried the following approaches to co

Converting UTF-8 email text to PDF

2025-02-13 Thread Loris Bennett
Hi, I am using Emacs' Gnus to display a buffer containing an email. I want to convert this email to a PDF file. If I save the mail to a file I get: $ file einladung.txt einladung.txt: news or mail, Unicode text, UTF-8 text I have tried the following approaches to converting to PD

UTF-8 Everywhere -- Re: how u mine 4 utf8 [was Re: Using .XCompose]

2020-07-17 Thread Zenaan Harkness
Just in case anyone missed the memo, essential reading [significantly beautified since last I looked, looks like some of the structure intent from the link at bottom has been usefully incorporated]: UTF-8 Everywhere Manifesto http://utf8everywhere.org/ Found in this dark and dingy

Wordpress UTF-8 problem after PHP upgrade (from 5.6 to 7.3)

2020-06-23 Thread Christoph K.
Hello, I've just upgraded from PHP 5.6 to PHP 7.3 (and reverted for now). With PHP 7.3 there is a problem in Wordpress displaying German umlauts and other UTF-8 related characters. I guess it's just some locale-related setting somewhere. Any suggestions? Thanks, Christoph

Re: en_DK.UTF-8 UTF-8

2019-06-06 Thread Andrei POPESCU
On Sun, 24 Mar 19, 00:12:50, info wrote: > Hi debian > > Why can I not find "en_DK.UTF-8 UTF-8" when I install a normal OS from you > (it is not a server version)?? Try installing in "expert" mode; it should show you the full list of locales to choose from

Re: en_DK.UTF-8 UTF-8

2019-03-24 Thread Urs Thuermann
info writes: > Why can I not find "en_DK.UTF-8 UTF-8" when I install a normal OS from > you (it is not a server version)?? > > I know that I can set it after the installation, but it is not the same!! You have to generate all locales you want to use. Run dpkg-re
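Whether a locale such as en_DK.UTF-8 has actually been generated on a system can be checked from a script. The following is only a rough sketch and not part of the thread; the helper name locale_available is made up for illustration, and it relies on Python's standard locale module reporting an error for locales the C library does not know.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Typically means the locale has not been generated on this system.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


if __name__ == "__main__":
    for candidate in ("C", "en_DK.UTF-8"):
        print(candidate, locale_available(candidate))
```

On a system where en_DK.UTF-8 has not been generated yet, the second line would be expected to report False until the locale is added.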

en_DK.UTF-8 UTF-8

2019-03-23 Thread info
Hi debian Why can I not find "en_DK.UTF-8 UTF-8" when I install a normal OS from you (it is not a server version)?? I know that I can set it after the installation, but it is not the same!! /allan

Re: utf

2018-04-06 Thread tomas
On Fri, Apr 06, 2018 at 02:12:22PM +1200, Ben Caradoc-Davies wrote: > On 06/04/18 09:33, deloptes wrote: > >Stefan Monnier wrote: > >>UUIC that's partly why it's finally losing popularity and being replaced > >>with json for that use.  I'm not familiar

Re: utf

2018-04-05 Thread tomas
On Thu, Apr 05, 2018 at 09:14:51PM -0400, Stefan Monnier wrote: > >> UUIC that's partly why it's finally losing popularity and being replaced > >> with json for that use.  I'm not familiar enough with json to know if > >> it's really a good replacement

Re: utf

2018-04-05 Thread Ben Caradoc-Davies
On 06/04/18 09:33, deloptes wrote: Stefan Monnier wrote: UUIC that's partly why it's finally losing popularity and being replaced with json for that use.  I'm not familiar enough with json to know if it's really a good replacement, but it does look like an improvement. that is simply not true.

Re: utf

2018-04-05 Thread Stefan Monnier
>> UUIC that's partly why it's finally losing popularity and being replaced >> with json for that use.  I'm not familiar enough with json to know if >> it's really a good replacement, but it does look like an improvement. > that is simply not true. Did you read the text to which I was responding?

Re: utf

2018-04-05 Thread deloptes
Stefan Monnier wrote: > UUIC that's partly why it's finally losing popularity and being replaced > with json for that use.  I'm not familiar enough with json to know if > it's really a good replacement, but it does look like an improvement. that is simply not true. JSON might be more simple, and

Re: utf

2018-04-05 Thread Stefan Monnier
> But (mis-)using it as a data serialization language must be one > of the worst (and ugliest) misunderstandings IT has had the last > 20 years. UUIC that's partly why it's finally losing popularity and being replaced with json for that use. I'm not familiar enough with json to know if it's reall

Re: utf

2018-04-05 Thread deloptes
to...@tuxteam.de wrote: > On Thu, Apr 05, 2018 at 02:56:47PM +0200, Nicolas George wrote: >> to...@tuxteam.de (2018-04-05): >> > But then when I see people proposing XML as structured data >> > representation, I suddenly grow very sad... >> >> Isn't it? > > For the last 6 years I've seen it done

Re: utf

2018-04-05 Thread rhkramer
On Thursday, April 05, 2018 08:42:39 AM rhkra...@gmail.com wrote: > I'm laughing (at myself)--I just checked my mail directory, I have at least > 4 mbox files (and then I stopped looking) greater than 175 MB. One of > them, my inbox, is 2.8 GB--no problems. > > I do need to compact my inbox, and

Re: utf

2018-04-05 Thread Stefan Monnier
> Actually people saying mbox is a bad database are in principle right > (I never liked maildir either: dumping metadata into file names seemed > to me a bit disgusting too, but I digress). But there's something > special about mail databases which eases that a bit: records (i.e. > mails) are *mo

Re: utf

2018-04-05 Thread tomas
On Thu, Apr 05, 2018 at 02:56:47PM +0200, Nicolas George wrote: > to...@tuxteam.de (2018-04-05): > > But then when I see people proposing XML as structured data > > representation, I suddenly grow very sad... > > Isn't it? For the last 6 years I've s

Re: utf

2018-04-05 Thread Nicolas George
to...@tuxteam.de (2018-04-05): > But then when I see people proposing XML as structured data > representation, I suddenly grow very sad... Isn't it? Regards, -- Nicolas George

Re: utf

2018-04-05 Thread tomas
On Thu, Apr 05, 2018 at 08:42:39AM -0400, rhkra...@gmail.com wrote: > On Thursday, April 05, 2018 02:26:01 AM to...@tuxteam.de wrote: [...] > > Increase that by 2-3 orders of magnitude [...] > I'm laughing (at myself)--I just checked my mail directo

Re: utf

2018-04-05 Thread rhkramer
On Thursday, April 05, 2018 02:26:01 AM to...@tuxteam.de wrote: > On Wed, Apr 04, 2018 at 11:33:13PM +0200, deloptes wrote: > > [...] > > > other formats). I wouldn't store my mail in mbox anyway. For local > > system/user mails as a simple default storage perhaps yes - it might be > > OK, but fo

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-05 Thread rhkramer
On Wednesday, April 04, 2018 02:45:49 PM Don Armstrong wrote: > On Wed, 04 Apr 2018, rhkra...@gmail.com wrote: > > I've considered maildir--it meets some of my requirements (that is, to > > make something close to an askSam workalike), but one drawback is that > > it is essentially one email (i.e.,

Re: utf

2018-04-05 Thread Nicolas George
Richard Hector (2018-04-05): > >> What if the question is "Find all the English words that have an E > >> in the 5th position and a U in the 7th"? > > Yes, what? Who would ever ask such a question? What is the point of such > > a question? > Solving a crossword puzzle? This is a good example, than

Re: utf

2018-04-04 Thread tomas
On Wed, Apr 04, 2018 at 11:33:13PM +0200, deloptes wrote: [...] > other formats). I wouldn't store my mail in mbox anyway. For local > system/user mails as a simple default storage perhaps yes - it might be OK, > but for public mail, where you have 10

Re: utf

2018-04-04 Thread Richard Hector
On 05/04/18 05:53, Nicolas George wrote: >> What if the question is "Find all the English words that have an E >> in the 5th position and a U in the 7th"? > > Yes, what? Who would ever ask such a question? What is the point of such > a question? Solving a crossword puzzle? Richard

Re: Invalid UTF-8 byte?

2018-04-04 Thread Michael Stone
On Thu, Apr 05, 2018 at 09:42:19AM +1200, Ben Caradoc-Davies wrote: On 05/04/18 02:09, to...@tuxteam.de wrote: Try UTF-16, what Microsoft (and a couple of years ago Apple) love to call "Unicode": in more "Western" contexts every second byte is NULL! The Java platform us

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread deloptes
rhkra...@gmail.com wrote: > I'll probably look into notmuch, just for kicks. > > I've considered maildir--it meets some of my requirements (that is, to > make something close to an askSam workalike), but one drawback is that it > is essentially one email (i.e., my "record").  One of the desirable

Re: utf

2018-04-04 Thread Stefan Monnier
Most of those calls to strlen have nothing to do with char-length but are more interested in display-width or byte-length. In the context of Unicode, using utf-8 doesn't make byte-length any harder than with ASCII. And in the context of Unicode, display-width is a lot more complex than strlen
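The distinction drawn here between byte length, character count and display width can be made concrete. What follows is a rough sketch, not anything posted in the thread: the helper names are hypothetical, and the width estimate only accounts for combining marks and East Asian wide characters, which is far short of real text layout.

```python
import unicodedata


def utf8_byte_length(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def rough_display_width(s: str) -> int:
    """Very rough terminal column estimate (assumption: no control characters)."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on top of the previous character
        # Wide and fullwidth characters usually take two terminal columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


samples = ["abc", "e\u0301", "\u65e5\u672c"]  # plain ASCII, e + combining acute, two CJK characters

if __name__ == "__main__":
    for s in samples:
        print(repr(s), len(s), utf8_byte_length(s), rough_display_width(s))
```

The three numbers differ for every sample except the plain ASCII one, which is why a single strlen-style count cannot serve all three purposes.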

Re: Invalid UTF-8 byte?

2018-04-04 Thread Ben Caradoc-Davies
On 05/04/18 02:09, to...@tuxteam.de wrote: Try UTF-16, what Microsoft (and a couple of years ago Apple) love to call "Unicode": in more "Western" contexts every second byte is NULL! The Java platform uses UTF-16 internally: "The char data type (and therefore the valu
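The remark that every second byte is NULL for "Western" text in UTF-16 is easy to check. This is a small sketch for illustration only; the function name and the sample string are assumptions, not taken from the thread.

```python
def zero_byte_fraction(text: str, encoding: str) -> float:
    """Fraction of zero bytes in `text` when encoded with `encoding`."""
    raw = text.encode(encoding)
    if not raw:
        return 0.0
    return raw.count(0) / len(raw)


sample = "Hello, Debian"  # pure ASCII sample text

if __name__ == "__main__":
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(enc, zero_byte_fraction(sample, enc))
```

For ASCII-only input, half of the UTF-16 bytes and three quarters of the UTF-32 bytes are zero, while the UTF-8 form contains none.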

Re: utf

2018-04-04 Thread deloptes
oxFormat/mbox https://wiki2.dovecot.org/MailboxFormat/Maildir I have worked on cloud mail solution using dovecot with mysql backend for 18mil customers. Another company was using dbmail with mysql with very good results. But this goes somehow off topic in regards of original UTF The only advantage I se

Re: utf

2018-04-04 Thread deloptes
Nicolas George wrote: >> What if the question is "Find all the English words that have an E >> in the 5th position and a U in the 7th"? > > Yes, what? Who would ever ask such a question? What is the point of such > a question? > > The point of such a question is only to try and disprove my point

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 01:36:15 PM Don Armstrong wrote: > On Tue, 03 Apr 2018, rhkra...@gmail.com wrote: > > I am building (have built several iterations of) a free format > > database to work something like askSam. It is a mashup of several > > applications, things like recoll, kmail, nail, k

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
ould use msort (which depends on a 1 byte > > record separator to --separate the records ;-) while sorting. Some of > > the files already include UTF-8, and, in the future, I anticipate all > > will be in UTF-8. > > Note that ISO 646, hence ISO 8859, hence ISO 10646, h

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread tomas
On Wed, Apr 04, 2018 at 03:44:23PM -0300, Henrique de Moraes Holschuh wrote: [...] > That said, it is always safe to break valid "modified UTF-8" into > records using zeroes, as long as you don't expect the result to be valid >

Re: utf

2018-04-04 Thread Joel Roth
On Wed, Apr 04, 2018 at 02:20:17PM -0400, rhkra...@gmail.com wrote: > On Wednesday, April 04, 2018 12:58:57 PM deloptes wrote: > > And regarding the mbox thing, well mbox was deprecated for many reasons. I > > guess if it was that good it wouldn't be deprecated. > Oh, I wasn't aware that mbox wa

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Jonathan de Boyne Pollard
Henrique de Moraes Holschuh: Also, a text file MAY contain NULs (the character), it is just considered bad practice (nowadays?). Don't assume you won't see any. For example, received e-mail is *more* likely to have NULs in it than normal text due to the quality of some mail agents out there.

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Don Armstrong
On Wed, 04 Apr 2018, rhkra...@gmail.com wrote: > I've considered maildir--it meets some of my requirements (that is, to > make something close to an askSam workalike), but one drawback is that > it is essentially one email (i.e., my "record"). One of the desirable > features of askSam is that you d

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Jonathan de Boyne Pollard
rhkramer: Where were you in 2000 when I started the project? I cannot speak for anyone else, but I was probably once again giving a frequently given answer that I eventually put up on a WWW page. http://jdebp.eu./FGA/mail-mbox-formats.html

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Henrique de Moraes Holschuh
t contain any null byte; many text editors even refuse to open such a > > > > Depends on the encoding. For ASCII, ISO-8859-* and UTF-8 (and any other > > modern encoding AFAIK, other than modified UTF-8), any zero bytes map > > one-to-one to the NUL character/code point.
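The exception mentioned for modified UTF-8 comes from the fact that it writes the NUL code point as the two bytes C0 80, so no zero byte appears in the encoded data. The sketch below imitates only that one aspect; it is a simplification (the real format used by Java also treats supplementary characters differently), and both function names are made up for illustration.

```python
def encode_nul_free(text: str) -> bytes:
    """UTF-8 encode, but write U+0000 as C0 80 so the output has no zero byte."""
    # Safe because in valid UTF-8 a 0x00 byte can only be the NUL character.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_free(raw: bytes) -> str:
    """Inverse of encode_nul_free."""
    # Safe because the byte 0xC0 never occurs in valid UTF-8.
    return raw.replace(b"\xc0\x80", b"\x00").decode("utf-8")


if __name__ == "__main__":
    encoded = encode_nul_free("a\x00b")
    print(encoded.hex(" "))
    print(decode_nul_free(encoded) == "a\x00b")
```

Data encoded this way can be split on zero bytes without cutting a record, but a strict UTF-8 decoder will reject the C0 80 sequence.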

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 01:36:15 PM Don Armstrong wrote: > On Tue, 03 Apr 2018, rhkra...@gmail.com wrote: > > I am building (have built several iterations of) a free format > > database to work something like askSam. It is a mashup of several > > applications, things like recoll, kmail, nail, k

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Jonathan de Boyne Pollard
. Some of the files already include UTF-8, and, in the future, I anticipate all will be in UTF-8. Note that ISO 646, hence ISO 8859, hence ISO 10646, has had a single-byte Record Separator character since the 1960s. (-:
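The Record Separator mentioned here is the control character 1E (U+001E). In UTF-8 it is encoded as the single byte 1E, and that byte never occurs inside a multi-byte sequence, so it can delimit UTF-8 records as long as no record contains the character itself. A minimal sketch of that idea, with hypothetical helper names:

```python
RS = b"\x1e"  # Record Separator: a single byte in ASCII and in UTF-8


def join_records(records):
    """Encode each record as UTF-8 and join them with the RS byte."""
    encoded = [r.encode("utf-8") for r in records]
    for e in encoded:
        if RS in e:
            raise ValueError("record already contains the separator character")
    return RS.join(encoded)


def split_records(blob: bytes):
    """Split on the RS byte and decode each record (expects a non-empty blob)."""
    return [part.decode("utf-8") for part in blob.split(RS)]


if __name__ == "__main__":
    blob = join_records(["Gr\u00fc\u00dfe", "\u65e5\u672c\u8a9e", "plain ascii"])
    print(split_records(blob))
```

The check in join_records matters: the scheme only works if the separator is guaranteed to be absent from the data.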

Re: utf

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 12:58:57 PM deloptes wrote: > And regarding the mbox thing, well mbox was deprecated for many reasons. I > guess if it was that good it wouldn't be deprecated. Oh, I wasn't aware that mbox was deprecated--can you shed more light on that? AFAIK, it is not defined in

Re: mbox vs maildir vs better formats [Re: Invalid UTF-8 byte? (was: Re: utf)]

2018-04-04 Thread Nicolas George
Don Armstrong (2018-04-04): > There are definitely better formats than Maildir, like Dovecot's > multi-dbox.[1] > > These issues are why almost everyone who uses Maildir just uses it as > the backing message store and uses the index on top to do avoid ever > reading all of the messages in the Mail

mbox vs maildir vs better formats [Re: Invalid UTF-8 byte? (was: Re: utf)]

2018-04-04 Thread Don Armstrong
On Wed, 04 Apr 2018, Nicolas George wrote: > Don Armstrong (2018-04-04): > > You should consider looking at using Maildir with notmuch and using > > things which integrate notmuch.[1] > > Maildir is not that much better than mbox. Sure, it eliminates most of > its worse flaws, but it brings flaws

Re: utf

2018-04-04 Thread Nicolas George
Greg Wooledge (2018-04-04): > The problem is, you reject every single example that everyone gives > you. I do not reject them, I refute them. > I don't know what you expect from us. Acknowledge that I am right once I have refuted all your examples and you have eventually understood my point. At

Re: utf

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 07:35:37PM +0200, Nicolas George wrote: > I am not sure exactly what is your example, but you got its flaw right: > n is not out of the blue, it was obtained by previously walking the > string. And in that case, you have all freedom to express n as a more > convenient entity

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
Don Armstrong (2018-04-04): > You should consider looking at using Maildir with notmuch and using > things which integrate notmuch.[1] Maildir is not that much better than mbox. Sure, it eliminates most of its worse flaws, but it brings flaws of its own, like trashing the inode and dentries caches

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Don Armstrong
On Tue, 03 Apr 2018, rhkra...@gmail.com wrote: > I am building (have built several iterations of) a free format > database to work something like askSam. It is a mashup of several > applications, things like recoll, kmail, nail, kate and the data is > stored in mbox formatted files. > > Each record

Re: utf

2018-04-04 Thread Nicolas George
deloptes (2018-04-04): > ok, thanks. I understood the part above, but not sure if I understand this > part. A standard text editing operation is find and replace, where you get > the start and end point in the string. Of course it is not "n completely > out of the blue". I am not sure exactly what

Re: utf

2018-04-04 Thread deloptes
Nicolas George wrote: > Find me a case where you need to access the n-th char of a string, with > n completely out of the blue, and I will explain how somebody botched > their design. ok, thanks. I understood the part above, but not sure if I understand this part. A standard text editing operatio

Re: utf

2018-04-04 Thread Nicolas George
Greg Wooledge (2018-04-04): > Does it count if we want the 1st char, then the 2nd char, then the 3rd > char, then the 4th char, and so on? Or is that not blue enough? It is not out of the blue, it is in sequence. > How about the last char? Or the last two chars? Ditto. >

Re: utf

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 07:07:01PM +0200, Nicolas George wrote: > Find me a case where you need to access the n-th char of a string, with > n completely out of the blue, and I will explain how somebody botched > their design. Does it count if we want the 1st char, then the 2nd char, then the 3rd c

Re: utf

2018-04-04 Thread Nicolas George
deloptes (2018-04-04): > @Nicolas, I think OP does not understand you - perhaps it is not worth the > effort. My impression is that you refer to a string (properly) as a sequence > of bytes and others refer to it as a number of chars, which is not consistent > with UTF. Not at all, I am w

Re: utf

2018-04-04 Thread deloptes
to it as a number of chars, which is not consistent with UTF. From my work with UTF, it is possible but not satisfying to guess the encoding. I wonder why no one suggested a kind of markup (XML) instead of a byte delimiter. And regarding the mbox thing, well mbox was deprecated for many reasons. I gu

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 10:24:06 AM Greg Wooledge wrote: > On Wed, Apr 04, 2018 at 04:15:48PM +0200, Andre Majorel wrote: > > On 2018-04-04 14:55 +0200, Nicolas George wrote: > > > I have given you advice (for free), you are not taking it. Too bad for > > > you. Good day. > > > > Is advice th

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 10:15:48 AM Andre Majorel wrote: > On 2018-04-04 14:55 +0200, Nicolas George wrote: > > I have given you advice (for free), you are not taking it. Too bad for > > you. Good day. > > Is advice that comes with condescension truly free ? Thank you!

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 04:15:48PM +0200, Andre Majorel wrote: > On 2018-04-04 14:55 +0200, Nicolas George wrote: > > > I have given you advice (for free), you are not taking it. Too bad for > > you. Good day. > > Is advice that comes with condescension truly free ? Any advice that stops the OP

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Andre Majorel
On 2018-04-04 14:55 +0200, Nicolas George wrote: > I have given you advice (for free), you are not taking it. Too bad for > you. Good day. Is advice that comes with condescension truly free ? -- André Majorel I trust bugs.debian.org to not publish my email addr

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread tomas
l byte; many text editors even refuse to open such a > > Depends on the encoding. For ASCII, ISO-8859-* and UTF-8 (and any other > modern encoding AFAIK, other than modified UTF-8), any zero bytes map > one-to-one to the NUL character/code point. I don't recall how it is on > o

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 08:26:41 AM Greg Wooledge wrote: > On Wed, Apr 04, 2018 at 01:23:25PM +0200, Nicolas George wrote: > > rhkra...@gmail.com (2018-04-03): > > > and the data is stored in mbox formatted files. > > > > DO NOT DO THAT. > > > > This is the only goo

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
rhkra...@gmail.com (2018-04-04): > I'll convert the file format after you convert the programs to work with the > different file format. Those programs include kmail, nail, (essentially all > email programs that use mbox as the file format), recoll (conversion should > not > be difficult), var

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
I'll convert the file format after you convert the programs to work with the different file format. Those programs include kmail, nail, (essentially all email programs that use mbox as the file format), recoll (conversion should not be difficult), various editors (nedit, kate, for which I've wr

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 01:23:25PM +0200, Nicolas George wrote: > rhkra...@gmail.com (2018-04-03): > > and the data is stored in mbox formatted files. > > DO NOT DO THAT. > > This is the only good advice you can have for that project. Store your > data in a decent form

Re: utf

2018-04-04 Thread Henrique de Moraes Holschuh
On Tue, 03 Apr 2018, Darac Marjal wrote: > If these things matter to you, it's better to convert from UTF-8 to Unicode, UTF-8 *is* Unicode :p What you mean is either UCS-4 or UTF-32 (which are just another encoding for Unicode). But all of them are Unicode. UTF-* are only used for
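The point that UTF-8 and UTF-32 are both just encodings of the same Unicode code points can be shown with one character. This is a short sketch for illustration; the choice of the euro sign and the dictionary name are arbitrary.

```python
text = "\u20ac"  # EURO SIGN, code point U+20AC

# The same code point serialized by three different Unicode encodings.
encodings = {name: text.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

if __name__ == "__main__":
    for name, raw in encodings.items():
        # Every byte sequence decodes back to the identical code point.
        print(f"{name:10} {raw.hex(' ')}  ->  U+{ord(raw.decode(name)):04X}")
```

Only the byte layout changes between the three rows; the code point that comes back out is the same each time.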

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
rhkra...@gmail.com (2018-04-04): > Sorry, I already have 300 MB plus stored in that format. Then convert. Small extra work now. Many fewer headaches later. Regards, -- Nicolas George

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
Sorry, I already have 300 MB plus stored in that format. Where were you in 2000 when I started the project? On Wednesday, April 04, 2018 07:23:25 AM Nicolas George wrote: > rhkra...@gmail.com (2018-04-03): > > and the data is stored in mbox formatted files. > > DO NO

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
rhkra...@gmail.com (2018-04-03): > and the data is stored in mbox formatted files. DO NOT DO THAT. This is the only good advice you can have for that project. Store your data in a decent format. Regards, -- Nicolas George

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Henrique de Moraes Holschuh
On Tue, 03 Apr 2018, Michael Lange wrote: > I believe (please anyone correct me if I am wrong) that "text" files > won't contain any null byte; many text editors even refuse to open such a Depends on the encoding. For ASCII, ISO-8859-* and UTF-8 (and any other modern encod

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread rhkramer
On Tuesday, April 03, 2018 08:30:04 AM Greg Wooledge wrote: > WHAT ARE YOU TRYING TO DO? I am building (have built several iterations of) a free format database to work something like askSam. It is a mashup of several applications, things like recoll, kmail, nail, kate and the data is stored in

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread rhkramer
On Tuesday, April 03, 2018 08:30:04 AM Greg Wooledge wrote: > > Addendum: iirc (again please correct me if I am wrong) unix file names > > may contain (at least in theory) any byte except 2F (the slash) and the > > null byte. So if your text files might contain arbitrary file names there > > may be

Re: utf

2018-04-03 Thread Stefan Monnier
>> > What is the length of a string? >> When is that relevant? > When you're trying to display one on a screen, or print one on paper. To display a string you don't just need its length, you need the actual bitmap representation, and getting info such as length is trivial once you've rendered the

Re: utf

2018-04-03 Thread Nicolas George
Greg Wooledge (2018-04-03): > When you're trying to display one on a screen, or print one on paper. With just the length? You will not get anything done. To properly display a string, you need to handle ligatures, right-to-left, kerning, etc. The length of the string is barely relevant. > When yo

Re: utf

2018-04-03 Thread Greg Wooledge
On Tue, Apr 03, 2018 at 10:51:43PM +0200, Nicolas George wrote: > Ben Caradoc-Davies (2018-04-04): > > What is the length of a > > string? > > When is that relevant? When you're trying to display one on a screen, or print one on paper. When you've

Re: utf

2018-04-03 Thread Nicolas George
Ben Caradoc-Davies (2018-04-04): >What is the length of a > string? When is that relevant? > Are you trying to count the number of glyphs? What for? > I do not think that > you can

Re: utf

2018-04-03 Thread Ben Caradoc-Davies
On 03/04/18 20:55, Darac Marjal wrote: If these things matter to you, it's better to convert from UTF-8 to Unicode, first. Fixed length encodings like UTF-32 will not fix broken assumptions about some relationship between byte length and number of characters because Unicode contains t

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Michael Lange
On Tue, 3 Apr 2018 15:47:57 -0400 Greg Wooledge wrote: > On Tue, Apr 03, 2018 at 09:36:42PM +0200, Michael Lange wrote: > > From what I have understood I think the OP should certainly at least, > > whatever the files they want to include exactly look like and > > whichever byte they choose as de

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Greg Wooledge
On Tue, Apr 03, 2018 at 09:36:42PM +0200, Michael Lange wrote: > From what I have understood I think the OP should certainly at least, > whatever the files they want to include exactly look like and whichever > byte they choose as delimiter, scan the file first for such a byte and if > it is actua
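The suggestion to scan a file for the chosen delimiter byte before relying on it could look roughly like this. It is a sketch under assumptions, not code from the thread: the function name and the chunk size are arbitrary, and it handles only single-byte delimiters (which is what makes chunked reading safe here).

```python
def first_offset_of(path, delimiter: bytes, chunk_size: int = 1 << 16) -> int:
    """Return the offset of the first occurrence of a one-byte delimiter, or -1."""
    if len(delimiter) != 1:
        raise ValueError("this sketch only handles single-byte delimiters")
    offset = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            pos = chunk.find(delimiter)
            if pos != -1:
                return offset + pos
            offset += len(chunk)
    return -1


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        # Report whether the NUL byte already occurs in each file.
        print(name, first_offset_of(name, b"\x00"))
```

A result of -1 means the byte does not occur in the file at the moment; it says nothing about data added later.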

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Michael Lange
Hi, On Tue, 3 Apr 2018 14:32:08 +0200 wrote: > > > Probably it is the same with some other control characters like 04 > > > (End of Transmission). When I look at > > > https://en.wikipedia.org/wiki/ASCII it seems like 1C (File > > > Separator) or 1E (Record Separator) might be appropriate choice

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread tomas
On Tue, Apr 03, 2018 at 02:14:07PM +0200, Michael Lange wrote: > On Tue, 3 Apr 2018 13:58:33 +0200 > Michael Lange wrote: > > > I believe (please anyone correct me if I am wrong) that "text" files > > won't contain any null byte; many text editors ev

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Greg Wooledge
> Addendum: iirc (again please correct me if I am wrong) unix file names > may contain (at least in theory) any byte except 2F (the slash) and the > null byte. So if your text files might contain arbitrary file names there > may be (at least in theory) a (admittedly very small) chance that such a >

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread rhkramer
miter inside the text. You have to put your structure > outside the text. > > It is very typical of the "problems" people have with UTF-8: the problem > resides not in the properties of UTF-8 but in the unwritten assumptions > about the way they should be implementing things. > > Regards,

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Michael Lange
On Tue, 3 Apr 2018 13:58:33 +0200 Michael Lange wrote: > I believe (please anyone correct me if I am wrong) that "text" files > won't contain any null byte; many text editors even refuse to open such > a file, I guess since they assume it is a "binary" file. > Probably it is the same with some ot

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Michael Lange
Hi, On Tue, 3 Apr 2018 07:43:02 -0400 rhkra...@gmail.com wrote: > > maybe you could use the null byte? > > Thanks! > > Surprisingly (to me), this (and maybe several other of the control > characters might work--I did a search of one of the files, and there > are no null bytes. I believe (pleas

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread Nicolas George
he "problems" people have with UTF-8: the problem resides not in the properties of UTF-8 but in the unwritten assumptions about the way they should be implementing things. Regards, -- Nicolas George

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-03 Thread rhkramer
On Monday, April 02, 2018 06:43:28 PM Michael Lange wrote: > On Mon, 2 Apr 2018 08:37:54 -0400 > > rhkra...@gmail.com wrote: > > A few weeks ago, I was looking for a byte that, in UTF-8, would be a > > totally invalid byte (not an invalid sequence of bytes). At the t

Re: utf

2018-04-03 Thread Nicolas George
> On Mon, Apr 02, 2018 at 09:39:05AM +0200, Andre Majorel wrote: > >I wouldn't say that. UTF-8 breaks a number of assumptions. For > >instance, > >1) every character has the same size, > >2) every byte sequence is a valid character, > >3) the equality or ineq

Re: utf

2018-04-03 Thread tomas
On Tue, Apr 03, 2018 at 09:14:22PM +1200, Richard Hector wrote: > On 03/04/18 20:55, Darac Marjal wrote: > > If these things matter to you, it's better to convert from UTF-8 to > > Unicode, first. I tend to think of Unicode as

Re: utf

2018-04-03 Thread Richard Hector
On 03/04/18 20:55, Darac Marjal wrote: > If these things matter to you, it's better to convert from UTF-8 to > Unicode, first. I tend to think of Unicode as an arbitrarily large code > page. Each character maps to a number, but that number could be 1, 1000 > or 500_000 (Unicode se

Re: utf

2018-04-03 Thread Darac Marjal
On Mon, Apr 02, 2018 at 09:39:05AM +0200, Andre Majorel wrote: On 2018-04-02 08:00 +1200, Ben Caradoc-Davies wrote: On 02/04/18 02:05, mess-mate wrote: >howto change the system utf to eu character set ? Why? UTF (especially UTF-8) is vastly superior for all purposes: I wouldn't say t

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-02 Thread Michael Lange
Hi, On Mon, 2 Apr 2018 08:37:54 -0400 rhkra...@gmail.com wrote: > A few weeks ago, I was looking for a byte that, in UTF-8, would be a > totally invalid byte (not an invalid sequence of bytes). At the time, > I tried some googling, but it looked rather hopeless (maybe it was my > g
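On the question of a byte that is invalid on its own in UTF-8: the encoding rules leave a few byte values unused in every valid sequence. The sketch below is not from the thread; it simply encodes every Unicode scalar value with Python's built-in codec and reports which byte values never appear. The function name is made up for illustration, and the loop takes a moment because it walks all code points.

```python
def bytes_never_used_by_utf8():
    """Return the sorted byte values that appear in no valid UTF-8 encoding."""
    used = set()
    for cp in range(0x110000):
        # Surrogates (U+D800..U+DFFF) are not scalar values and cannot be encoded.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        used.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - used)


if __name__ == "__main__":
    print(" ".join(f"{b:02X}" for b in bytes_never_used_by_utf8()))
```

With a standard Python 3 this lists C0, C1 and F5 through FF. A file that uses one of these bytes as a delimiter is, as a whole, no longer valid UTF-8, so it has to be split before the pieces are decoded.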

Re: utf

2018-04-02 Thread Ben Caradoc-Davies
On 02/04/18 19:39, Andre Majorel wrote: On 2018-04-02 08:00 +1200, Ben Caradoc-Davies wrote: Why? UTF (especially UTF-8) is vastly superior for all purposes: I wouldn't say that. UTF-8 breaks a number of assumptions. For instance, 1) every character has the same size, 2) every byte sequen

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-02 Thread rhkramer
Thanks, again, to Henrique and tomas for the followups! On Monday, April 02, 2018 02:40:55 PM to...@tuxteam.de wrote: > On Mon, Apr 02, 2018 at 03:18:38PM -0300, Henrique de Moraes Holschuh wrote:

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-02 Thread tomas
interesting, in a quick skim, I learned > > > some > > > interesting things about UTF-8, especially the property of self- > > > synchronization. > > > > Yes, UTF-8 is a brilliant design. > > Possibly relevant, definitely entertaining, Rob Pike's
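The self-synchronization property mentioned here means that, starting from an arbitrary byte offset, the beginning of the current character can be found by looking at the bytes alone: continuation bytes always have the bit pattern 10xxxxxx, and no other byte does. A minimal sketch, assuming valid UTF-8 input and using a hypothetical function name:

```python
def char_start(data: bytes, i: int) -> int:
    """Return the index of the first byte of the character that contains data[i]."""
    # Step backwards over continuation bytes (bit pattern 10xxxxxx).
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "a\u00e9\u20ac".encode("utf-8")  # 'a' (1 byte), e-acute (2 bytes), euro sign (3 bytes)

if __name__ == "__main__":
    for offset in range(len(data)):
        print(offset, "->", char_start(data, offset))
```

At most three backward steps are ever needed for valid input, since a UTF-8 character is at most four bytes long.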
