Re: completion very slow with gigantic list

alex xmb sw ratchev Thu, 11 Jan 2024 15:46:10 -0800

On Thu, Jan 11, 2024, 23:21 Chet Ramey <chet.ra...@case.edu> wrote:

> On 1/10/24 12:28 AM, Eric Wong wrote:
> > Hi, I noticed bash struggles with gigantic completion lists
> > (100k items of ~70 chars each)
> >
> > It's reproducible with both LANG+LC_ALL set to en_US.UTF-8 and C,
> > so it's not just locales slowing things down.
> >
> > This happens on the up-to-date `devel' branch
> > (commit 584a2b4c9e11bd713030916d9d832602891733d7),
> > but I first noticed this on Debian oldstable (5.1.4)
> >
> > strcoll and strlen seem to be at the top of profiles, and
> > mregister_free when building devel with default options...
> > ltrace reveals it's doing strlen repeatedly on the entire
> > (100k items * 70 chars each = ~7MB)
>
> OK. Let's look at what happens. Let's say the text you're trying to
> complete is "a" (or "").
>
> You generate a huge list of strings and store that list into a string,
> which the shell has to split into individual words (there's only one) and
> run through word expansion, as part of running compgen -W on it.
>


dunno : this list as flat text , and the matches as regex ..

Since you have to return just words from the list, you need to run
> strcmp/strcoll against each member of the list to figure out which ones
> to output to store in COMPREPLY. At least that's all done in a subshell.
>
> So you end up with every substring in $wordlist as part of $COMPREPLY.
>
> Then you hand that list to readline, which runs through it to find the
> longest common substring of all the matches. This is where you get a
> ton of strlen calls, since mbrtowc needs to know the maximum number of
> bytes to examine when converting strings to potentially multibyte
> characters. Room for improvement here, but strlen is pretty cheap.
> Still, it would reduce the number of calls.
> (lib/readline/complete.c:compute_lcd_of_matches()).
>
> You end up with a list of matches that readline wants: LCD in matches[0],
> the matches in the rest of the match list, NULL terminated. Now you have
> to postprocess it.
>
> You have to remove duplicate matches, since you didn't tell readline to
> keep them. To make that easier, you have to sort them (since you didn't
> tell readline not to). That's where you get the calls to strcmp/strcoll --
> you can't avoid them. You can't count on qsort removing duplicates for
> you, so you have to run through the array again, comparing each element
> against the next until they don't match, marking the ones that do for
> removal -- more strcmp/strcoll calls.
>
> Now you have the list of possible matches, and readline will either insert
> the longest common substring or display the match list. If you're going to
> display the match list, you have to run through the array again to
> determine the longest match, do any required processing for the
> completion-prefix-display-length and completion-display-width variables,
> possibly color the common prefix, then print the matches, which will
> call strlen() on each match to determine how much screen space the match
> will take.
>
> A lot of this is caused by readline's passing around string vectors
> (char **) instead of some struct that held a string and its length. But
> that's the public API, and I have higher-priority things to do than
> redo readline's internal completion architecture.
>
> You're welcome to take your shot, and make improvements where there are
> improvements to be made. I'd be glad to take them.
>
> Chet
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>

Re: completion very slow with gigantic list

Reply via email to