URL: <http://savannah.gnu.org/bugs/?48055>
Summary: Regex ranges and locales in gnu-awk regextype
Project: findutils
Submitted by: piotrjurkiewicz
Submitted on: Mon 30 May 2016 08:12:40 AM CEST
Category: find
Severity: 3 - Normal
Item Group: Wrong result
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 4.6.0
Fixed Release: None

_______________________________________________________

Details:

Starting with gawk 4.0, the traditional behaviour of regex ranges has been
restored. This means that [a-z] matches only lowercase letters and [A-Z]
matches only uppercase letters, regardless of the locale and collation
settings.

See more:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

This can be tested with the following command:

    $ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk pre-4.0
    ABC

    $ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk 4.0+
    [nothing]

Findutils, however, still emulates the old gawk behaviour in gnu-awk mode.
That is, in certain locales, the [a-z] and [A-Z] ranges each match both
lowercase and uppercase letters.

Test:

Prepare:

    mkdir test
    cd test
    touch a.lower
    touch b.UPPER

Then both commands:

    LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[a-z]{5}$'
    LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[A-Z]{5}$'

return:

    ./a.lower
    ./b.UPPER

instead of just the one file whose extension has the matching case.
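
For anyone trying to reproduce this, the steps above can be collected into one script. This is only a sketch and makes some assumptions: GNU find is on the PATH, the pl_PL.utf8 locale is installed (check with `locale -a`), and the variable names and temporary directory are made up for illustration. The C-locale run is added as a control; there, ranges follow byte order, so each pattern should select exactly one file. The last command tries the POSIX character class [[:lower:]] as a possible workaround, which has not been verified against the affected release.

```shell
#!/bin/sh
# Reproduction sketch for the report above. All variable names and the
# temporary directory are illustrative, not part of findutils.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch a.lower b.UPPER

# Control: in the C locale, ranges follow byte order.
lower_c=$(LC_ALL=C find . -regextype gnu-awk -regex '.*[a-z]{5}$')
upper_c=$(LC_ALL=C find . -regextype gnu-awk -regex '.*[A-Z]{5}$')
printf 'C locale  [a-z]: %s\n' "$lower_c"
printf 'C locale  [A-Z]: %s\n' "$upper_c"

# Locale under test. LC_ALL is unset in a subshell because it would
# override LC_COLLATE. Assumes pl_PL.utf8 is installed; if it is not,
# the C locale is used instead and the bug will not show.
lower_pl=$(unset LC_ALL; LC_COLLATE=pl_PL.utf8 \
    find . -regextype gnu-awk -regex '.*[a-z]{5}$' | sort)
upper_pl=$(unset LC_ALL; LC_COLLATE=pl_PL.utf8 \
    find . -regextype gnu-awk -regex '.*[A-Z]{5}$' | sort)
printf 'pl_PL     [a-z]: %s\n' "$lower_pl"
printf 'pl_PL     [A-Z]: %s\n' "$upper_pl"

# Possible workaround (unverified on 4.6.0): a POSIX character class
# instead of a range.
class_pl=$(unset LC_ALL; LC_COLLATE=pl_PL.utf8 \
    find . -regextype gnu-awk -regex '.*[[:lower:]]{5}$' | sort)
printf 'pl_PL [[:lower:]]: %s\n' "$class_pl"
```

If the pl_PL lines list both files while the C-locale lines list one each, the behaviour described in this report is present.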