>>>>> "MM" == Martin Maechler <maech...@stat.math.ethz.ch> >>>>> on Thu, 23 Apr 2009 12:40:22 +0200 (CEST) writes:
>>>>> "vQ" == Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no> >>>>> on Thu, 23 Apr 2009 11:49:54 +0200 writes: vQ> maech...@stat.math.ethz.ch wrote: >>> vQ> sprintf has a documented limit on strings included in the output using the vQ> format '%s'. It appears that there is a limit on the length of strings included vQ> with, e.g., the format '%d' beyond which surprising things happen (output vQ> modified for conciseness): >>> vQ> ... and this limit is *not* documented. MM> well, it is basically (+ a few bytes ?) MM> the same 8192 limit that *is* documented. indeed, I was right with that.. vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 >>> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 >>> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 >>> vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 10602 11403 12204 >>> vQ> ... >>> vQ> Note that not only more than one '1' is included in the vQ> output, but also that the same functional expression (no vQ> side effects used beyond the interface) gives different vQ> results on each execution. Analogous behaviour can be vQ> observed with '%nd' where n > 8200. vQ> The actual output above is consistent across separate sessions. >>> vQ> With sufficiently large field width values, R segfaults: >>> vQ> sprintf('%*d', 10^5, 1) vQ> # *** caught segfault *** vQ> # address 0xbfcfc000, cause 'memory not mapped' vQ> # Segmentation fault >>> >>> >>> Thank you, Wacek. >>> That's all ``interesting'' ... unfortunately, >>> >>> my version of 'man 3 sprintf' contains >>> >>> >>>>> BUGS >>>>> Because sprintf() and vsprintf() assume an arbitrarily >>>>> long string, callers must be careful not to overflow the >>>>> actual space; this is often impossible to assure. Note >>>>> that the length of the strings produced is >>>>> locale-dependent and difficult to predict. Use >>>>> snprintf() and vsnprintf() instead (or asprintf() and vasprintf). vQ> yes, but this is c documentation, not r documentation. MM> Of course! ... and I *do* apply it to R's C code [sprintf.c] MM> and hence am even concurring with you .. vQ> it's applicable vQ> to a degree, since ?sprintf does say that sprintf is "a wrapper for the vQ> C function 'sprintf'". however, in c you use a buffer and you usually vQ> have control over it's capacity, while in r this is a hidden vQ> implementational detail, which should not be visible to the user, or vQ> should cause an attempt to overflow the buffer to fail more gracefully vQ> than with a segfault. vQ> in r, sprintf('%9000d', 1) will produce a confused output with a count vQ> of 1's variable (!) across runs (while sprintf('%*d', 9000, 1) seems to vQ> do fine): vQ> gregexpr('1', sprintf('%*d', 9000, 1)) vQ> # [1] 9000 vQ> gregexpr('1', sprintf('%9000d', 1)) vQ> # [1] 9000 9801 ..., variable across executions vQ> on one execution in a series i actually got this: vQ> Warning message: vQ> In gregexpr("1", sprintf("%9000d", 1)) : vQ> input string 1 is invalid in this locale vQ> while the very next execution, still in the same session, gave vQ> # [1] 9000 9801 10602 vQ> with sprintf('%*d', 10000, 1) i got segfaults on some executions but vQ> correct output on others, while sprintf('%10000d', 1) is confused again. >>> (note the "impossible" part above) >>> vQ> yes, but it does also say "must be careful", and it seems that someone vQ> has not been careful enough. >>> and we haven't used snprintf() yet, probably because it >>> requires the C99 C standard, and AFAIK, we have only relatively >>> recently started to more or less rely on C99 in the R sources. >>> vQ> while snprintf would help avoid buffer overflow, it may not be a vQ> solution to the issue of confused output. MM> I think it would / will. We would be able to give warnings and MM> errors, by checking the snprintf() return codes. My current working code gives an error for all the above examples, e.g., > sprintf('%9999d', 1) Error in sprintf("%9999d", 1) : required resulting string length 9999 is > maximal 8191 it passes 'make check-devel' and I am inclined to commit that code to R-devel (e.g. tomorrow). Yes, the documentation will also have to be amended, but apart from that, would people see a big problem with the "8192" limit which now is suddenly of greater importance {{as I said all along; hence my question to Wacek (and the R-develers) if anybody found that limit too low}} Of course, one could realloc()ate longer strings when needed inside R's sprintf() code, but I reluctant to do that (render the code yet more complicated ...) if nobody sees a need. Martin >>> More precisely, I see that some windows-only code relies on >>> snprintf() being available whereas in at least on non-Windows >>> section, I read /* we cannot assume snprintf here */ >>> >>> Now such platform dependency issues and corresponding configure >>> settings I do typically leave to other R-corers with a much >>> wider overview about platforms and their compilers and C libraries. >>> vQ> it looks like src/main/sprintf.c is just buggy, and it's plausible that vQ> the bug could be repaired in a platform-independent manner. MM> definitely. MM> In the mean time, I've actually found that what I first said on MM> the usability of snprintf() in R's code base was only partly correct. MM> There are other parts of R code where we use snprintf() for all MM> platforms, hence we rely on its presence (and correct MM> implementation!) and so we can and I think should use it in MM> place of sprintf() in quite a few places inside R's sprintf.c >>> BTW, >>> 1) sprintf("%n %g", 1,1) also seg.faults >>> vQ> as do vQ> sprintf('%n%g', 1, 1) vQ> sprintf('%n%') vQ> etc., while vQ> sprintf('%q%g', 1, 1) vQ> sprintf('%q%') vQ> work just fine. strange, because per ?sprintf 'n' is not recognized as vQ> a format specifier, so the output from the first two above should be as vQ> from the last two above, respectively. (and likewise in the %S case, vQ> discussed and bug-reported earlier.) MM> I have now fixed these bugs at least; MM> the more subtle "%<too_large_n>d" ones are different, and MM> as I said, I'm convinced that a nice & clean fix for those will MM> start using snprintf(). >>> 2) Did you have a true use case where the 8192 limit was an >>> undesirable limit? vQ> how does it matter? MM> well, we could increase it, if it did matter. MM> {{ you *could* have been more polite here, no? MM> it *was* after all a serious question that I asked! }} vQ> if you set a limit, be sure to consistently enforce vQ> it and warn the user on attempts to exceed it. or write clearly in the vQ> docs that such attempts will cause the output to be silently truncated. MM> Sure, I'm not at all disagreeing on that, and if you read this into my MM> posting, you misunderstand. MM> Martin [..........................] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel