Bug#1003089: man-db has become prohibitively slow

Steinar H. Gunderson Thu, 20 Jan 2022 11:27:17 -0800

On Mon, Jan 17, 2022 at 10:46:29PM +0000, Colin Watson wrote:
> test_manfile (which despite the name is not a test function) calls
> find_name with file!="-" and encoding=NULL; that causes find_name to
> call get_page_encoding, which always returns something non-NULL
> ("ISO-8859-1" for English pages), and then call add_manconv from that to
> UTF-8.


I think there's a potential bug here; from attaching gdb and breaking on
iconv_open, it seems there are a lot of conversions from UTF-8 to UTF-8,
which should be no-ops (except that they might do some additional
well-formedness checking). Is that intentional?
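
As a rough sketch only (this is not man-db's actual code, and
conversion_needed is a hypothetical name), such no-op conversions could be
skipped by comparing the two encoding names before setting up a conversion
stage at all:

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical helper, not part of man-db: returns true when a
 * conversion stage from `from` to `to` could actually change the data. */
static bool conversion_needed(const char *from, const char *to)
{
    if (from == NULL || to == NULL)
        return true;  /* unknown encoding: be conservative, keep the stage */

    /* Encoding names are compared case-insensitively; aliases such as
     * "UTF8" vs. "UTF-8" are NOT recognised by this simple check. */
    return strcasecmp(from, to) != 0;
}
```

A caller would only set up the conversion when conversion_needed() returns
true. Note that skipping a UTF-8 to UTF-8 pass also drops whatever
well-formedness checking that pass might have performed.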

Apart from that, I've given your patch a quick run, and it seems to cut out
nearly all of the unneeded overhead. So, short of doing strange things like
multithreading or simplifying the lexer, what we're potentially left with
is:

 - Get rid of the unneeded conversions (~15–20% overhead, it seems).
 - Launch in the background, potentially.

Does that make sense? In any case, what we have here is already a huge
improvement.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
