Regular expressions - Ick!

This isn’t really important in the grand scheme of things. But it has
become a personal challenge of mine, and one at which I have
completely failed so far.

It involves (extended) regular expressions. I’ve used them, but I’ve
never really been good at them. I use tin to read my news articles.
Tin has a feature where it tries to show the four most recent posters in
four different colors, which makes it easier to tell who wrote what. It
uses some rather complex regular expressions to accomplish its goal.

To further my goal of making it easier to tell who wrote what, I just
turned on an option to put the person’s initials at the beginning of
lines that they wrote. So a line that I wrote might look like:

BC > this is what I wrote.

Unfortunately, those initials broke tin’s regular-expression detection
of who wrote what. Here are the three regular expressions that it uses:

Regex used to show quoted lines :
^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)
Regex used to show twice quoted l. :
^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){2}(?!-[})>])
Regex used to show >= 3 times q.l. :
^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){3}

I wanted to optionally allow for up to 4 leading characters at the
beginning of the line. And that could optionally be followed by a
single space character. So I came up with the string:

[[:alpha:]]{0,4}\ ?

I just couldn’t figure out the correct place to insert this in the
default regular expressions that tin uses.

Can anyone see a problem with my string?
If not, where is the correct place to insert it in the expressions
that tin uses?

Does anyone else find the rules of regular expressions much like the
rules of Cranko? (You have to be an old M*A*S*H fan.)

Bill Caroselli wrote:

Regex used to show quoted lines :
^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)
^ => at the beginning of the line.
\s{0,3} => zero to three whitespace chars.
( … ) => group; remember what matched.
[ … ] => one of these chars. Note that a “]” placed first inside the
brackets means a literal “]”.
| => “or”.
\w{1,3}[>|] => 1 to 3 “word” chars followed by a “>” or “|”. A “word”
char is a letter, digit or “_”.
(?!-) => but only if a “-” does not follow.
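
One way to see why the initials break this first pattern is to try it outside tin. The sketch below uses Python's `re` module as a stand-in, on the assumption that its syntax behaves the same as tin's for this particular pattern. The helper name `is_quoted` and the sample lines are made up for the illustration.

```python
import re

# First pattern from the post, copied unchanged.
QUOTED = re.compile(r'^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)')

def is_quoted(line):
    """Return True if the line starts with a quote marker per the pattern."""
    return QUOTED.match(line) is not None

samples = [
    "> plain quote",               # bare quote character
    "BC> initials, no space",      # word chars directly followed by ">"
    "BC > this is what I wrote.",  # the space before ">" defeats \w{1,3}[>|]
    ">-- looks like an arrow",     # rejected by the (?!-) lookahead
]
for sample in samples:
    print(repr(sample), is_quoted(sample))
```

The third sample is the interesting one: `\w{1,3}` consumes `BC`, but the next character is a space rather than `>` or `|`, so nothing in the pattern can match.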

Regex used to show twice quoted l. :
^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){2}(?!-[})>])
\s* => followed by any number of whitespace chars (or none).
{2} => the preceding group must match twice in a row (the two matches
need not be the same text).
(?!-[})>]) => but only if the match is not followed by a “-” and then
one of the chars inside “[ … ]”.
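
To make the `{2}` and `{3}` repetition concrete, here is a small Python sketch that classifies a line by quote depth using the three patterns as posted. Trying the deepest pattern first is an assumption made for the example, not something tin is known to do, and `quote_depth` is a hypothetical helper name.

```python
import re

# The three patterns from the post, copied unchanged, deepest level first.
LEVELS = [
    (3, re.compile(r'^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){3}')),
    (2, re.compile(r'^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){2}(?!-[})>])')),
    (1, re.compile(r'^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)')),
]

def quote_depth(line):
    """Return 3, 2, 1 or 0 depending on which pattern matches first."""
    for depth, pattern in LEVELS:
        if pattern.match(line):
            return depth
    return 0

for sample in ["text", "> text", "> > text", "RK> BC> text", ">>> text"]:
    print(repr(sample), quote_depth(sample))
```

Each repetition of the group may match a different marker, which is why `RK> BC> text` counts as two levels even though the two markers differ.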

Regex used to show >= 3 times q.l. :
^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){3}

I wanted to optionally allow for up to 4 leading characters at the
beginning of the line. And that could optionally be followed by a
single space character. So I came up with the string:

[[:alpha:]]{0,4}\ ?

I’m not sure what you’re trying to do, so no comment on this.

Decent Perl books go into a fair bit of detail on regular expressions. I use
“Perl in a Nutshell” by Ellen Siever et al. from O’Reilly and “Perl Black
Book” by Steven Holzner from Coriolis.

Richard

Richard Kramer <rrkramer@kramer-smilko.com> wrote:
RK > Bill Caroselli wrote:

Unfortunately, those initials broke tin’s regular-expression detection
of who wrote what. Here are the three regular expressions that it uses:

Regex used to show quoted lines :
^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)
RK > ^ => at the beginning of the line.
RK > \s{0,3} => zero to three whitespace chars.
RK > ( … ) => group; remember what matched.
RK > [ … ] => one of these chars. Note that a “]” placed first inside the
RK > brackets means a literal “]”.
RK > | => “or”.
RK > \w{1,3}[>|] => 1 to 3 “word” chars followed by a “>” or “|”. A “word”
RK > char is a letter, digit or “_”.
RK > (?!-) => but only if a “-” does not follow.

Regex used to show twice quoted l. :
^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){2}(?!-[})>])
RK > \s* => followed by any number of whitespace chars (or none).
RK > {2} => the preceding group must match twice in a row (the two matches
RK > need not be the same text).
RK > (?!-[})>]) => but only if the match is not followed by a “-” and then
RK > one of the chars inside “[ … ]”.

Regex used to show >= 3 times q.l. :
^\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){3}

I wanted to optionally allow for up to 4 leading characters at the
beginning of the line. And that could optionally be followed by a
single space character. So I came up with the string:

[[:alpha:]]{0,4}\ ?

RK > I’m not sure what you’re trying to do, so no comment on this.
RK >
RK > Decent Perl books go into a fair bit of detail on regular expressions. I use
RK > “Perl in a Nutshell” by Ellen Siever et al. from O’Reilly and “Perl Black
RK > Book” by Steven Holzner from Coriolis.

RK > Richard

Thanks. This explains a lot.

I was using “Sed & Awk” by Dale Dougherty (O’Reilly & Assoc.), chapter
3, as my regular expression guide. But you explained several things
that are NOT in that chapter, e.g. \s == whitespace and \w == word
character. I’m still mulling over the “remember what matched” phrase.
But at least I can much more intelligently try things that may work.
I’ll also add “Perl in a Nutshell” to my bookshelf.

Thanks again.
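
For anyone else puzzling over “remember what matched”: the parentheses form a group whose matched text can be read back after the match succeeds. Below is a minimal sketch in Python's `re` module, assuming it behaves like tin's engine for this pattern. The helper name `marker` is invented for the example.

```python
import re

# First pattern from the thread; the parentheses form group 1.
QUOTED = re.compile(r'^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)')

def marker(line):
    """Return the text remembered by group 1, or None if the line is not quoted."""
    m = QUOTED.match(line)
    return m.group(1) if m else None

m = QUOTED.match("  RK> some quoted text")
print(repr(m.group(0)))  # the whole match, leading whitespace included
print(repr(m.group(1)))  # only the part inside the parentheses
print(marker("no quote here"))
```

The leading whitespace belongs to the overall match but not to the group, so the group holds only the quote marker itself.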

I simply added \w{0,4} after the leading caret (^).
All works well now.
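
As a rough check of that change, the sketch below compares the original first pattern with a version that has `\w{0,4}` inserted directly after `^`. It uses Python's `re` module as a stand-in for tin's engine, and it assumes the same insertion is made in the twice-quoted pattern. The names `ORIGINAL`, `PATCHED` and `PATCHED_TWICE` are made up for the example.

```python
import re

# Original first pattern, then the same pattern with \w{0,4} added after ^.
ORIGINAL = re.compile(r'^\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)')
PATCHED = re.compile(r'^\w{0,4}\s{0,3}([]{}>|:)]|\w{1,3}[>|])(?!-)')

# Assumed: the twice-quoted pattern receives the same insertion.
PATCHED_TWICE = re.compile(
    r'^\w{0,4}\s{0,3}(([]{}>|:)]|\w{1,3}[>|])\s*){2}(?!-[})>])'
)

line = "BC > this is what I wrote."
print(bool(ORIGINAL.match(line)))  # initials plus space defeat the original
print(bool(PATCHED.match(line)))   # \w{0,4} absorbs "BC", \s{0,3} the space

# Lines without initials still match, since \w{0,4} may match nothing.
print(bool(PATCHED.match("> plain quote")))

# Nested markers: only the first one gets the extra allowance.
print(bool(PATCHED_TWICE.match("BC > > text")))
print(bool(PATCHED_TWICE.match("BC > RK > text")))
```

In this sketch a line like `BC > RK > text` does not count as twice quoted, because the second marker `RK >` still has a space before the `>`. Whether that matters depends on how tin prefixes nested quotes.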

Bill Caroselli – Q-TPS Consulting
1-(626) 824-7983
qtps@earthlink.net