On Fri, Mar 26, 2021 at 10:46:23AM -0500, David Wright wrote:
> On Fri 26 Mar 2021 at 19:11:24 (+0530), Susmita/Rajib wrote:
> > It is clearly noticed that wide applications of tricks with wildcards,
> > regex and redirections aren't simply available in the man pages.

Nor should they be. The man page should document how the program
functions, but should not tell you *every* possible way you can use
the program.

You seem to be focusing on the shell, so take a look at the bash man
page. The man page for bash 5.1 as distributed in bullseye is almost
6400 lines long (using standard 80-character lines). If you wanted to
include a tutorial, or a programmer's guide book, inside of this man
page, it would be even larger. And it's already a tremendously large
document.

People have written whole books on shell programming. (Whether those
books are any good is a separate question.) It's a huge topic. It's
not something you can just tack on to the end of a man page.

> Correct. The man pages document fact that are specific to particular
> commands. Wildcards, regex and redirections are features of the shell
> that invokes them, and so are documented there.

This is only partly true.

"Wildcards" (shell matching patterns, traditionally known as "globs")
are indeed implemented at the shell level, and are documented in the
shell's manual. However, these globs were so well received that they
were also implemented outside of the shell.

There are two C library functions -- fnmatch(3) and glob(3) -- which
provide the C library's implementation of pattern matching and
filename expansion, respectively. The libc implementation is a little
bit different from bash's implementation, which in turn is a little
bit different from dash's implementation. But the most basic features
are the same.

Many programs use fnmatch(3) or glob(3) or both, in order to maintain
some level of compatibility with how the shell does pattern matching
or pathname expansion.

Regular expressions have a completely different lineage. They were
developed as part of computer science theory back in the 1950s, but
the ones we know and love were originally implemented in various Unix
tools such as ed(1), grep(1) and awk(1).
Back in the 1970s, each of these tools had its own separate regular
expression engine, so they all had slightly different feature sets and
syntax.

Around the 1990s, people decided it would be a lot more sensible to
share and standardize the regex engine across the various tools, so
that for example grep(1) and sed(1) would both support the same
expressions. But some of the tools were a little too different from
each other for there to be just one regex engine.

Eventually a compromise was reached, and Unix (POSIX) standardized on
two regex languages: BRE (Basic Regular Expressions) and ERE (Extended
Regular Expressions).

sed(1), grep(1), ed(1) and some other programs use BRE. Or at least
they're supposed to.

awk(1), egrep(1) a.k.a. grep -E, and some other programs use ERE.

The engine that supports these two types of regular expression is
implemented in the C library, and is documented in regex(7) and
regex(3).

Bash uses ERE, but only in one place: the =~ operator in the [[
command. Bash uses the C library's implementation for this, rather
than trying to write its own engine. Pretty much everything else that
bash does uses globs.

The GNU implementations of sed and grep, which are supposed to use
Basic Regular Expressions, actually use their own special regex engine
with their own special extensions. The effect of this is that people
who only learned Linux, not Unix, often write scripts that use the GNU
extensions, and therefore do not work on any other Unix-type systems.
You'll want to watch out for that.

There are several other regular expression engines, which go beyond
the two flavors standardized by POSIX. The most common of these is
undoubtedly perl's engine. It implements a great number of extensions
to the regular expression language, and has been around for decades.
A mostly-compatible clone of it called PCRE (Perl-Compatible Regular
Expressions) was spun off and is implemented as a C library. Some
programs use it.

Tcl has its own extended regex language, which it calls ARE (Advanced
Regular Expressions). It's not as popular as perl's, but it does have
some of the same features.

I'm sure there are a bunch of other flavors floating around out there
as well.
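
To make the glob/BRE/ERE distinction a bit more concrete, here is a
minimal sketch in plain POSIX sh. The function names (glob_match,
bre_match, ere_match) are made up for this example and are not part of
any tool. It assumes a POSIX-conforming grep on the PATH, and it is
only an illustration, not a recommended way to structure a script.

```sh
#!/bin/sh
# Illustration only: glob_match, bre_match and ere_match are
# hypothetical helper names invented for this sketch.

# Glob match done by the shell itself (no external program).
# $1 = string, $2 = glob pattern (left unquoted so it acts as a pattern)
glob_match() {
    case $1 in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

# BRE match: plain grep. $1 = string, $2 = basic regular expression
bre_match() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then echo yes; else echo no; fi
}

# ERE match: grep -E. $1 = string, $2 = extended regular expression
ere_match() {
    if printf '%s\n' "$1" | grep -Eq -- "$2"; then echo yes; else echo no; fi
}

glob_match 'file.txt' '*.txt'   # yes: in a glob, * matches any string
glob_match 'filetxt'  '*.txt'   # no:  in a glob, . is a literal dot
ere_match  'filetxt'  '.txt$'   # yes: in a regex, . matches any character

bre_match  'aaa' '^a\{1,\}$'    # yes: portable BRE for "one or more a"
bre_match  'aaa' '^a+$'         # no:  in BRE, + is an ordinary character
bre_match  'a+'  '^a+$'         # yes: ... so it matches a literal +
ere_match  'aaa' '^a+$'         # yes: in ERE, + means "one or more"
```

GNU grep also accepts \+ in a BRE as "one or more", but that is one of
the GNU extensions mentioned above. The \{1,\} form is the one that
should work on other Unix-type systems too.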