On Sat, Jun 20, 2020 at 12:20:41PM +0200, Albretch Mueller wrote:
> _X=".\(html\|txt\)"
> _SDIR="$(pwd)"
>
> _AR_TERMS=(
>   Kant
>   "Gilbert Ryle"
>   Hegel
> )
>
> for iZ in ${!_AR_TERMS[@]}; do
>   find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il "${_AR_TERMS[$iZ]}" {} \;
> done # iZ: terms search/grep'ped inside text files; echo "~";
>
> # this would be much faster
>
> find "${_SDIR}" -type f -iregex .*"${_X}" -exec grep -il "Kant\|Gilbert Ryle\|Hegel" {} \;
>
> but how do I know which match happened in order to save it into
> separate files?
Hm. The first approach goes three times through your files, once for
each term. The second goes once, for a combined regular expression.
So no wonder the second approach is faster.

But to actually attack the problem you should be aware that the second
method is doing *something different* from the first one: "grep -l"
will stop at the first hit, so even if you could ask grep which one of
the alternatives it found, it'll miss Hegel in a file where Kant
figures first. Is that what you want? Once you have answered that
question, you'll be able to proceed.

One possibility is post-processing your output: grep outputs the hit
line, and you can match that against the individual terms; you'd have
to drop the "-l" for that, making things somewhat slower. See the
first sketch below.

Another possibility is to keep the "-l" and to re-grep the files found
against all the individual patterns; see the second sketch.
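An untested sketch of the first possibility. The hits_*.txt names are
made up, and it assumes GNU grep and file names containing no ':' and
no newlines:

  _X=".\(html\|txt\)"
  _SDIR="$(pwd)"
  _AR_TERMS=( Kant "Gilbert Ryle" Hegel )

  # One pass, without -l: grep prints "file:matching line" for every
  # hit (the /dev/null forces the file name prefix even when find
  # hands grep a single file).
  find "${_SDIR}" -type f -iregex .*"${_X}" \
      -exec grep -i "Kant\|Gilbert Ryle\|Hegel" /dev/null {} + |
  while IFS=: read -r _file _line; do
      # Match the hit line against each individual term and append
      # the file name to that term's list.
      for _term in "${_AR_TERMS[@]}"; do
          if printf '%s\n' "${_line}" | grep -qi "${_term}"; then
              printf '%s\n' "${_file}" >> "hits_${_term// /_}.txt"
          fi
      done
  done

  # A file with several hits shows up several times; deduplicate.
  for _term in "${_AR_TERMS[@]}"; do
      _f="hits_${_term// /_}.txt"
      [ -f "${_f}" ] && sort -u -o "${_f}" "${_f}"
  done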
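And a sketch of the second possibility, keeping "-l" for a cheap first
pass and re-grepping only the candidates (again untested; -r and
-d '\n' are GNU xargs extensions, and candidates.txt is just a scratch
name; variables as above):

  # First pass: every file that contains at least one of the terms.
  find "${_SDIR}" -type f -iregex .*"${_X}" \
      -exec grep -il "Kant\|Gilbert Ryle\|Hegel" {} + > candidates.txt

  # Second pass: re-read only those files, once per term; grep -l
  # still stops at the first hit inside each file.
  for _term in "${_AR_TERMS[@]}"; do
      xargs -r -d '\n' grep -il "${_term}" < candidates.txt \
          > "hits_${_term// /_}.txt"
  done

Which of the two is faster probably depends on how many of your files
match at all: if most of them do, the post-processing variant at least
avoids reading each of them three times over.

Cheers
-- t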