Ersek, Laszlo writes:
 > Sorry for the delayed answer.
 >
 > On Mon, 10 Oct 2011, Jean-Francois Dockes wrote:
 >
 > > It would seem that there is some file in your document set which is
 > > crashing recoll. We need to determine which it is, get it out of the
 > > indexed set so that you can begin to use recoll again, and if at all
 > > possible, I would very much like to get a copy to fix the bug (if this
 > > is confidential data, we'll try other ways to get details about the
 > > issue).
 > >
 > > For a beginning, we need to have a look at the log file before the point
 > > where recoll crashes.
 >
 > I rebuilt the package with noopt,nostrip,debug. I debugged it down to
 > recoll-1.13.04/common/unacpp.cpp, function unacmaybefold(). It is called
 > with dofold = true. unacfold_string() returns with -1, errno set to 12
 > (ENOMEM). Then unacmaybefold() goes on to format an error message:

Thanks a lot for taking the time to debug this!

[...]

 > I identified the file that caused this huge number of conversion errors --
 > it's a Maildir file with a zip and a rar attachment. Both compressed files
 > have the same contents: two latin2 encoded text files (tables, actually),
 > 1.3 and 1.4 MB in size. In total 5.4 MB of latin2 encoded text that caused
 > 90,228 conversion failures (and presumably leaked the same number of conv
 > descs).

[...]

I'd guess that you have a UTF-8 locale. Recoll has no good way to identify
the character set when it is not told (by a file header, the locale, or a
configuration option).

The next version will be less verbose about conversion errors and will
probably have an error percentage threshold after which it will just stop
indexing the file.

The iconv_close() leak was fixed in recoll 1.16 (incidentally to another
change), but it is present in 1.13, 1.14, and 1.15 (same unac.c file).

 > The following patch fixed my problem. VSZ peaks around 160 MB.

The original patch needed --ignore-whitespace to apply. I am attaching one
with the same changes that should apply cleanly (if Kartik can make use of
it).

Again, many thanks.

Regards,
jf

patch-unac-icclose.diff
Description: Binary data
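
For readers without the attachment, the kind of leak discussed above can be illustrated with a minimal sketch. It is not the attached patch and not the unac.c code: `convert_once` is a hypothetical helper, and it assumes the leak came from a conversion descriptor left open when iconv() failed. The point is only that every successful iconv_open() is paired with an iconv_close(), including on the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not recoll's or unac's actual code.
 * Converts in_len bytes of `in` from encoding `from` to encoding `to`,
 * writing at most out_cap bytes into `out`.
 * Returns the number of bytes written, or -1 on failure with errno set. */
static long convert_once(const char *from, const char *to,
                         const char *in, size_t in_len,
                         char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                /* nothing was opened, nothing to close */

    char *src = (char *)in;       /* iconv() takes a non-const char ** */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;
    long result;

    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1)
        result = -1;              /* conversion failed; still fall through */
    else
        result = (long)(out_cap - dst_left);

    /* Close the descriptor on both paths, keeping iconv()'s errno intact. */
    int saved_errno = errno;
    iconv_close(cd);
    errno = saved_errno;
    return result;
}
```

With a single exit path, a file that triggers tens of thousands of conversion failures costs tens of thousands of open/close pairs, but no descriptor outlives the call.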