Hello Nils,

On 11-07-2020 at 21:56, Nils König wrote:
> when editing a UTF8 file in nano that contains a BOM (efbbbf) and inserting a
> character at the beginning, the BOM bytes will move after the inserted
> character. This can lead to breakages when such a file is being parsed by a
> program

Ideally, a UTF-8 file should not contain a Byte Order Mark.  What if
I concatenate several files together?  Then the result might contain
BOMs embedded in the text.

As far as I know, BOMs are only a problem with files that come from
Windows and Google.  I do not know of any tool on Unix that adds a BOM
to a UTF-8 file.

> a BOM should, if at all present, only occur at the very beginning
> of the file.

Ideally, yes.  But as shown above, if a file contains a BOM, the BOM
is bound to appear in other places too.  And the Unicode standard
does not forbid the BOM from occurring elsewhere -- in that case it
should be treated as a Zero Width No-Break Space.

> Ideally nano should detect the presence of BOM and not have it be
> editable/moveable.

I could mitigate the problem by placing the cursor after the BOM when
a file is opened.  (See attached patch.)  But you can still delete the
BOM with <Backspace>, or put the cursor on it with <Left> or <Home>.

For nano, every character is just a group of bytes that can be added,
deleted, restored, searched, and saved.  If I made the BOM uneditable
and unmovable, people could no longer use nano to get rid of a BOM in
a file.

  https://bugs.launchpad.net/ubuntu/+source/nano/+bug/1045062

Benno

diff --git a/src/files.c b/src/files.c
index 04476c44..aad58b78 100644
--- a/src/files.c
+++ b/src/files.c
@@ -459,7 +459,10 @@ bool open_buffer(const char *filename, bool new_one)
 		openfile->lock_filename = thelocksname;
 #endif
 		openfile->current = openfile->filetop;
-		openfile->current_x = 0;
+		if (strcmp(openfile->filetop->data, "\xEF\xBB\xBF") == 0)
+			openfile->current_x = 3;
+		else
+			openfile->current_x = 0;
 		openfile->placewewant = 0;
 	}
 
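
A note on the comparison in the patch: strcmp() returns zero only when
the two strings are identical, so the test succeeds only when the first
line consists of nothing but the three BOM bytes.  To also catch a BOM
that is followed by text, a prefix comparison would be needed.  Below is
a minimal standalone sketch of such a check.  The names offset_past_bom()
and utf8_bom are made up for this illustration and do not exist in nano.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three bytes that encode U+FEFF in UTF-8. */
static const char utf8_bom[] = "\xEF\xBB\xBF";

/* Return the byte offset just past a leading BOM, or 0 when the line
 * does not start with one.  Hypothetical helper, not part of nano. */
size_t offset_past_bom(const char *line)
{
    assert(line != NULL);

    /* strncmp() compares at most three bytes and stops at a NUL,
     * so lines shorter than the BOM are handled safely. */
    if (strncmp(line, utf8_bom, 3) == 0)
        return 3;

    return 0;
}
```

In open_buffer() the result could then be assigned to
openfile->current_x in place of the if/else, though this is untested
against the nano tree.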