Re: [Pan-users] Select Headers with RE in the Subject/Author Entryfield?

Heinz Mezera Tue, 09 Jun 2015 08:18:12 -0700

Hello Duncan,

your answer appears to be a short explanation of regex, but it does not
work as expected.


On Monday, 08.06.2015, at 11:31 +0000, Duncan wrote:
> Heinz Mezera posted on Mon, 08 Jun 2015 09:16:22 +0200 as excerpted:
> 
> > I'd like to select headers in the header pane with a regular expression
> > in the Subject/Author field and need your help. Is this possible, and how
> > do I do it?
> > 
> > I want to select all headers
> > - starting with three alphabetic characters
> > - followed by an underscore
> > - two digits after the underscore
> > - and any number of characters afterwards.
> > 
> > PAN Info:
> > Pan 0.139 Sexual Chocolate (GIT bf56508 git://git.gnome.org/pan2;
> > i686-pc-linux-gnu)
> 
> ** Note that after changing the search expression, you may have to toggle 
> to something else (say subject), then back to regex, in order to get it 
> to "take".  I noticed it would dynamically refilter part of the time, but 
> would sometimes appear to stall out and not update without the toggle.  
> Given that hint, and the caveat that I tested the components separately 
> but not together, as I didn't have posts handy that matched that specific 
> pattern...
> 
> One way to do it:
> 
> ^[[:alpha:]]{3}_[[:digit:]]{2}.*$

No matter what expression I enter into the search field, the header list
is completely empty. To make sure it's not an error in the regex, I
tried

^.*$

I think the above expression should show all headers, but the header
list is still completely empty.
Enclosing the expression in single or double quotes or brackets has no
effect.
What am I doing wrong?
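
One way to rule out the expression itself is to try it outside Pan. The following is only a sketch in Python: the subject lines are made up, and Python's re module is a stand-in that is not guaranteed to behave like Pan's regex engine.

```python
import re

# Hypothetical subject lines, invented for this illustration.
subjects = ["abc_12 holiday pictures", "Re: question about filters", ""]

# ^.*$ : start anchor, zero or more of any character, end anchor.
match_all = re.compile(r"^.*$")

# Collect every subject the pattern fails to match.
unmatched = [s for s in subjects if match_all.search(s) is None]
print(unmatched)  # → []
```

Nothing is left unmatched here, not even the empty string, so an empty header list would point at how the filter is applied rather than at the expression.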

> 
> ^ = zero-width match at the beginning/left
> $ = same at the end/right
> 
> Non-special characters match themselves.  Letters, digits, _, etc, are 
> non-special.  
> 
> . matches exactly one occurrence of any character (and *, mentioned again 
> below, is any number including zero, so .* is a full wildcard, including 
> matching nothing). 
> 
> [] encloses a "character class".  Such character classes can include 
> ranges of characters [a-z], individual lists [123], and/or category 
> classes (I seem to have forgotten the proper term ATM) like the above, 
> enclosed in further [:xxx:] marks, thus the nesting.
> 
> So [[:alpha:][:digit:]] and [a-zA-Z0-9] would both match alphanumeric 
> characters in ASCII, tho pan's regex is case insensitive so both a-z and 
> A-Z wouldn't be needed for pan, only one or the other.  You can also do 
> things like [[:digit:]abc._], to match digits, abc, and the individual 
> characters . and _.  The significance of the [:xxx:] matches, however, is 
> that they work across character sets, so [:alpha:] matches letters that 
> would be skipped in character-sets where a-z doesn't include all letters 
> due to strange ordering or something.
> 
> To match a - in a character-class, put it at the beginning so it can't 
> specify a range.  The \ char is the escape char, both inside and outside 
> a character-class, so you can use \] to match a literal ] for instance, 
> and of course \\ to match a literal \.
> 
> Additionally, you can specify a /negative/ character-class with ^ as the 
> first character (outside a character-class, it means match the beginning, 
> inside, as the first character of the class, it negates the class, inside 
> as anything other than the first char, it matches itself normally).  So 
> [^abc] means any character /but/ abc.
> 
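
The character-class rules above can be tried out with a short Python sketch. Two assumptions: Python's re module stands in for Pan's regex engine, and because re has no [:digit:]-style classes, the range 0-9 is used instead. The helper chars_matching is a made-up name for this illustration.

```python
import re

def chars_matching(char_class, text):
    """Return the characters of text that the one-character class matches."""
    return "".join(ch for ch in text if re.fullmatch(char_class, ch))

# Digits, the letters abc, and the literal characters . and _
print(chars_matching(r"[0-9abc._]", "a1.z_-"))  # → a1._

# Negated class: anything but a, b, or c
print(chars_matching(r"[^abc]", "abcdef"))      # → def

# A leading - is a literal hyphen, while a-c is still a range
print(chars_matching(r"[-a-c]", "a-d"))         # → a-

# Backslash escapes ] and \ inside the class
print(chars_matching(r"[\]\\]", "a]\\b"))       # → ]\
```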
> Significantly, character classes normally only match *ONE* character.  To 
> match more than one you can repeat, [a-z][a-z] will match TWO letters, or 
> use frequency specifiers inside of {} as I did, above.  {1,3} would be 
> one, two, or three matches, {1,} would be at least one match.
> 
> In addition to the {}-delimited frequency range specifiers, there's:
> 
> * = zero or more (*NOT* one or more, it doesn't have to be there!)
> ? = zero or one (may or may not be there, but matches only once)
> + = 1 or more
> 
> Again in case it didn't sink in above, \ is the escape char, so to match 
> a literal *, you'd use \*
> 
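
The frequency specifiers can be checked the same way. This is a sketch in Python's re module, assumed here to treat {m,n}, *, ?, and + as described above. The helper name full is hypothetical.

```python
import re

def full(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(full(r"[a-z]{1,3}", "ab"))    # → True  (one to three letters)
print(full(r"[a-z]{1,3}", "abcd"))  # → False (four letters is too many)
print(full(r"ab*", "a"))            # → True  (* allows zero occurrences)
print(full(r"ab+", "a"))            # → False (+ needs at least one)
print(full(r"ab?", "abb"))          # → False (? allows at most one)
print(full(r"a\*", "a*"))           # → True  (\* is a literal asterisk)
```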
> () are the grouping characters, and | indicates alternatives (or).  So 
> ((cat)|(horse)) will match "cat" or "horse" but will NOT match "cah", for 
> instance.  Note that the alternatives do NOT need to be the same length, 
> and that the inner grouping helps clarify the scope of the match but 
> aren't absolutely required, so (cat|horse) should have the same effect.  
> So there are two ways to match a "cat" that may or may not be there:
> 
> (cat)?
> (cat|)
> 
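
A small Python sketch of the grouping and alternation examples. Again, Python's re is only a stand-in for Pan's engine, and the test strings ("my catfood" and so on) are invented for the illustration.

```python
import re

words = ["cat", "horse", "cah"]

# With and without the inner groups; both should select the same words.
nested = re.compile(r"^((cat)|(horse))$")
flat = re.compile(r"^(cat|horse)$")
print([w for w in words if nested.search(w)])  # → ['cat', 'horse']
print([w for w in words if flat.search(w)])    # → ['cat', 'horse']

# Two ways to make "cat" optional.
optional_q = re.compile(r"^my (cat)?food$")
optional_alt = re.compile(r"^my (cat|)food$")
for text in ["my catfood", "my food"]:
    print(bool(optional_q.search(text)), bool(optional_alt.search(text)))
```

Both optional forms accept both strings, so the last loop prints "True True" twice.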
> That's the basics.  FWIW for non-pan usage, some regex uses make things 
> like {} special characters, so {3} is a frequency and \{3\} are the 
> literal characters, while others don't unless they're escaped, so {3} 
> would be the literal characters and the backslash-escaped version would 
> be frequency.  And of course the shell has its own special chars and \ 
> escape char, so sometimes you need to play with the number of \\\ a bit 
> in order to get it to work like you want, but once you understand the 
> basics, even /just/ the basics, regex can really be quite powerful.
> 
> Of course there's far FAR more.  Just a couple quick examples.  First, () 
> not only groups, but stores for later use.  So if for instance you are 
> trying to match quotes but don't know if it's single-quotes or double-
> quotes, you can use (['"]) for the first match (possibly as (['"])? or 
> ('|"|) if you don't know if it'll be quoted or not), and \1 or possibly 
> $1 to automatically match the same thing at the other end of the quote.  
> Second, there's what's called look-ahead and look-behind matching, which 
> can be positive or negative.  So for instance if you want to match "pro" 
> but not "gopro", there's a way to say "look behind (to the left of) the 
> pro and don't match if the preceding letters are 'go'".  I don't use them 
> enough to be sure of my memory, however, so generally have to look that 
> sort of advanced stuff up, if I need it.  And for this advanced stuff, 
> you usually have to either look up or test whether whatever you're trying 
> to work with actually supports it or not.  I'm not sure whether pan does, 
> for instance, tho it wouldn't surprise me if it did.
> 
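
For reference, this is how the two advanced features look in Python's re module, which uses \1 for the back-reference and (?<!...) for a negative look-behind. Whether Pan accepts the same syntax is not confirmed here, so treat this purely as a sketch. The sample strings are made up.

```python
import re

# (['"]) remembers which quote opened; \1 demands the same one to close.
quoted = re.compile(r"""(['"])(.*?)\1""")
print(quoted.search('say "hello" now').group(2))  # → hello
print(quoted.search("say 'hello' now").group(2))  # → hello
print(quoted.search("\"mismatched'"))             # → None

# Negative look-behind: match "pro" unless "go" comes directly before it.
pro_not_gopro = re.compile(r"(?<!go)pro")
print(bool(pro_not_gopro.search("a pro tip")))    # → True
print(bool(pro_not_gopro.search("gopro")))        # → False
```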
> So back to the specific case in point:
> 
> ^[[:alpha:]]{3}_[[:digit:]]{2}.*$
> 
> Given the above, we can parse that as:
> 
> ^ Left anchor (begin the line with what follows):
> 
> [[:alpha:]] one alphabetic character
> 
> {3} match the previous exactly three times
> 
> _ (matches itself)
> 
> [[:digit:]] one digit
> 
> {2} match the previous exactly twice
> 
> . any character
> 
> * match the previous any number (including none) of times
> 
> $ right anchor (end of line)
> 
> 
> Of course the .*$ aren't actually needed, since without them the match is 
> simply left-anchored only, but I like the explicit "the rest of the line 
> doesn't matter for the match" that .*$ provides.  And in non-pan usages 
> where you're matching to delete or replace the match, it COULD matter, as 
> failing to include the .*$ would leave any other junk on the line still 
> there, while including it would match and thus delete/replace the entire 
> line.
> 
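
The point about the trailing .*$ can be sketched in Python. Assumptions: Python's re stands in for Pan's engine, the ASCII ranges a-z and 0-9 replace [[:alpha:]] and [[:digit:]] (which re does not support), re.IGNORECASE mimics the case-insensitive matching described above, and the subject lines are invented.

```python
import re

# The same pattern without and with the explicit .*$ tail.
left_anchored = re.compile(r"^[a-z]{3}_[0-9]{2}", re.IGNORECASE)
whole_line = re.compile(r"^[a-z]{3}_[0-9]{2}.*$", re.IGNORECASE)

# Hypothetical subject lines.
subjects = [
    "abc_12 holiday pictures",
    "XYZ_07",
    "ab_12 too few letters",
    "abcd_12 too many letters",
    "abc_1x only one digit",
]

# For selecting, the tail makes no difference.
selected_left = [s for s in subjects if left_anchored.search(s)]
selected_whole = [s for s in subjects if whole_line.search(s)]
print(selected_left)                    # → ['abc_12 holiday pictures', 'XYZ_07']
print(selected_left == selected_whole)  # → True

# For deleting the match, it does.
line = "abc_12 holiday pictures"
print(repr(left_anchored.sub("", line)))  # → ' holiday pictures'
print(repr(whole_line.sub("", line)))     # → ''
```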
kr Heinz


_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users
