On Sat, Jun 20, 2020 at 12:20:41PM +0200, Albretch Mueller wrote:
> _X=".\(html\|txt\)"
> _SDIR="$(pwd)"
> 
> _AR_TERMS=(
> Kant
> "Gilbert Ryle"
> Hegel
> )
> 
> for iZ in ${!_AR_TERMS[@]}; do
>  find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il
> "${_AR_TERMS[$iZ]}" {} \;
> done # iZ: terms search/grep'ped inside text files;  echo "~";
> 
> 
> # this would be much faster
> 
> find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il
> "Kant\|Gilbert Ryle\|Hegel" {} \;
> 
> but how do I know which match happened in order to save it into separate 
> files?

Hm. The first approach goes through your files three times, once
for each term. The second goes through them once, with a combined
regular expression.

So no wonder the second approach is faster.

But to actually attack the problem, you should be aware that the
second method does *something different* from the first one:
"grep -l" stops reading a file at the first hit, so even if you
could ask grep which of the alternatives it found, it would miss
Hegel in a file where Kant appears first. Is that what you want?
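To see that concretely (a made-up two-line test file):

  printf 'Kant\nHegel\n' > both.txt
  grep -il 'kant\|hegel' both.txt

prints just "both.txt": one line of output, with no hint of which
term matched, let alone that both are present.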

Once you have answered that question, you'll be able to proceed.
One possibility is postprocessing your output: grep outputs the
hit line, and you can match that against the individual terms;
you'd have to drop the "-l" for that, making things somewhat slower.
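A rough, untested sketch of that (assumes GNU grep for -o, bash 4
for ${var,,}, and file names without colons or newlines; the
matches_*.txt output names are just an example):

  find "${_SDIR}" -type f -iregex .*"${_X}" \
      -exec grep -ioH "Kant\|Gilbert Ryle\|Hegel" {} + |
  while IFS=: read -r _file _term; do
      _term=${_term,,}              # normalize case for the file name
      printf '%s\n' "${_file}" >> "matches_${_term// /_}.txt"
  done

With -o grep prints each match on its own line and -H prefixes the
file name, so the loop can sort the hits into one list per term; a
file can appear several times in a list, so run the results through
"sort -u" if that matters.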

Another possibility is to keep the "-l" and to re-grep the files
found against all the individual patterns.
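Again untested, reusing the _AR_TERMS array from your script (the
matches_*.txt names are my invention; -F treats each term as a
literal string rather than a regex):

  find "${_SDIR}" -type f -iregex .*"${_X}" \
      -exec grep -il "Kant\|Gilbert Ryle\|Hegel" {} + |
  while IFS= read -r _file; do
      for _term in "${_AR_TERMS[@]}"; do
          _t=${_term,,}; _t=${_t// /_}   # term -> output file name
          grep -iqF "${_term}" "${_file}" &&
              printf '%s\n' "${_file}" >> "matches_${_t}.txt"
      done
  done

The combined pattern narrows the file list in one pass; the inner
loop then only has to sort out which term(s) each surviving file
contains.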

Cheers
-- t
