Re: utf-8 bom frequency of bytes

2012-01-31 Thread John Little
On Jan 22, 11:09 pm, Dominique Pellé wrote: > > I'm curious to see your patch when ready. > > Reading by buffer of 200 bytes makes f_readfile() complicated. > I think we're better off reading byte by byte using fgetc(fd)... Googling "fgetc performance" indicates that on some platforms at least th

Re: utf-8 bom frequency of bytes

2012-01-22 Thread John Little
On Jan 22, 11:09 pm, Dominique Pellé wrote: > > My patch also did not address the buggy removal of BOM as > you indicated... > > The BOM bug can be reproduced ... Perhaps more relevant in practice, CR before NL removal has the same problem and can be demonstrated with a line of 199 characters end

Re: utf-8 bom frequency of bytes

2012-01-22 Thread Dominique Pellé
John Little wrote: > I can't help thinking that your linear times are not guaranteed with > the vagaries of heap fragmentation and memory allocator > implementation, and that calling into memory allocator code every 200 > bytes of megabytes of data is to be avoided.  Such intuitions are > infamous

Re: utf-8 bom frequency of bytes

2012-01-21 Thread John Little
On Jan 21, 11:50 pm, Dominique Pellé wrote: > > And graph after patch is here: > > http://dominique.pelle.free.fr/time-readfile.png Thank you, an interesting tutorial. Next time I want to graph something I'll be searching for your post rather than firing up a spreadsheet. I can't help thinking

Re: utf-8 bom frequency of bytes

2012-01-21 Thread Dominique Pellé
John Little wrote: > On Jan 21, 9:56 am, Bram Moolenaar wrote: > >> Sounds good.  I'll add this in the todo list. > > There is already an item about readfile() and realloc in the todo > list: > > 8   When editing a file with extremely long lines (e.g., an > executable), the >    "linerest" in rea

Re: utf-8 bom frequency of bytes

2012-01-20 Thread John Little
On Jan 21, 9:56 am, Bram Moolenaar wrote: > Sounds good.  I'll add this in the todo list. There is already an item about readfile() and realloc in the todo list: 8 When editing a file with extremely long lines (e.g., an executable), the "linerest" in readfile() is allocated twice to be

Re: utf-8 bom frequency of bytes

2012-01-20 Thread Bram Moolenaar
Dominique Pelle wrote: > John Little wrote: > > > Hi all > > > > I'm revising the function f_readfile in eval.c, to speed it up when > > processing very long lines. (It presently grows a string every 200 > > bytes by allocating a new one 200 bytes longer, copying the old to the > > new, and deal

Re: utf-8 bom frequency of bytes

2012-01-20 Thread Dominique Pellé
John Little wrote: > Hi all > > I'm revising the function f_readfile in eval.c, to speed it up when > processing very long lines. (It presently grows a string every 200 > bytes by allocating a new one 200 bytes longer, copying the old to the > new, and deallocating the old.  For example, for a 1 MB line

Re: utf-8 bom frequency of bytes

2012-01-20 Thread John Little
On Jan 20, 5:42 pm, "Benjamin R. Haskell" wrote: > > I don't know the background of 'f_readfile', but why would the BOM be > removed in positions other than at the start of the string?  Isn't it > only meaningful as an encoding detection when it's the first thing being > read?  Anywhere else U+FE

Re: utf-8 bom frequency of bytes

2012-01-19 Thread Benjamin R. Haskell
On Thu, 19 Jan 2012, John Little wrote: Hi all I'm revising the function f_readfile in eval.c, to speed it up when processing very long lines. (It presently grows a string every 200 bytes by allocating a new one 200 bytes longer, copying the old to the new, and deallocating the old. For example,

utf-8 bom frequency of bytes

2012-01-19 Thread John Little
Hi all I'm revising the function f_readfile in eval.c, to speed it up when processing very long lines. (It presently grows a string every 200 bytes by allocating a new one 200 bytes longer, copying the old to the new, and deallocating the old. For example, for a 1 MB line, such as may be used by the ya
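
For readers following the thread, here is a rough sketch of the cost issue the original post describes. It is not the patch discussed above and not Vim's code: linebuf_T, linebuf_append() and copied_fixed_step() are hypothetical names made up for illustration. copied_fixed_step() models the "grow by a fixed 200 bytes and copy" scheme and counts the bytes copied, while linebuf_append() shows one common alternative, doubling the capacity through realloc(), which keeps the total copying roughly proportional to the line length.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; NOT a structure from Vim's eval.c. */
typedef struct {
    char   *data;
    size_t  len;   /* bytes in use */
    size_t  cap;   /* bytes allocated */
} linebuf_T;

/*
 * Append n bytes, doubling the capacity whenever the buffer is too small.
 * Returns 0 on success, -1 if the allocation fails or the size overflows.
 */
static int linebuf_append(linebuf_T *b, const char *src, size_t n)
{
    if (n > b->cap - b->len) {
        size_t newcap = b->cap ? b->cap : 200;  /* 200 echoes the chunk size in the post */
        char  *p;

        while (newcap - b->len < n) {
            if (newcap > (size_t)-1 / 2)
                return -1;                      /* doubling would overflow */
            newcap *= 2;
        }
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;                          /* old block is still valid */
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/*
 * Model of the fixed-step scheme: each time the line grows by `step` bytes,
 * the whole old string is copied into a freshly allocated block.
 * Returns the total number of bytes copied to reach `total` bytes.
 */
static unsigned long long copied_fixed_step(size_t total, size_t step)
{
    unsigned long long copied = 0;
    size_t len;

    for (len = step; len < total; len += step)
        copied += len;  /* old contents moved into the larger block */
    return copied;
}
```

With a fixed step the copied total grows with the square of the line length, which is the behaviour the post is trying to remove; with doubling, the number of reallocations grows only logarithmically. Whether that matters in practice depends on the allocator, as the later replies in the thread point out.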