How about something like this? It's my first try, but it seems to
work:
#############################
use strict;
use warnings;

my @domains = qw(www.x.com x.com www.sandisk.com network.tv funny.co.jp
                 johnson.pictures.geography.info);

foreach (sort @domains) {
    # Capture 1 is the host part (may be empty), capture 2 the domain,
    # capture 3 the suffix.
    if ($_ =~ /([a-zA-Z0-9\-.]*?)([a-zA-Z0-9\-]+\.(co\.\w{2}|com|net|edu|gov|info|tv))$/) {
        my $host   = $1;
        my $domain = $2;
        $host = "No host " unless $host;
        print "$host => $domain\n";
    } else {
        print "Error: domain name format incorrect!\n";
    }
}
#############################
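One possible variation, offered only as a sketch: the same pattern written with the /x flag so it can be spread over several lines and commented. The helper name split_domain, the leading ^ anchor, the non-capturing suffix group, and the stripping of the trailing dot from the host are my own assumptions, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the script above): returns (host, domain),
# or an empty list when the name does not end in a recognised suffix.
sub split_domain {
    my ($name) = @_;
    return unless $name =~ m{
        ^                            # assumption: anchor at the start too
        ( [a-zA-Z0-9\-.]*? )         # host part, as short as possible
        (                            # domain plus suffix
            [a-zA-Z0-9\-]+ \.
            (?: co\.\w{2} | com | net | edu | gov | info | tv )
        )
        $                            # end of string
    }x;
    my ($host, $domain) = ($1, $2);
    $host =~ s/\.$//;                # "www." becomes "www"
    return ($host, $domain);
}

for my $name (qw(www.sandisk.com funny.co.jp not-a-domain)) {
    my ($host, $domain) = split_domain($name);
    if (defined $domain) {
        printf "%s => %s\n", ($host eq '' ? 'No host' : $host), $domain;
    } else {
        print "Error: domain name format incorrect: $name\n";
    }
}
```

With the ^ anchor in place, a name with stray characters in front of an otherwise valid domain is rejected rather than matched from the middle, which may or may not be what you want.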
The regex gets a little convoluted, so I used YAPE::Regex::Explain to
sort it out:
The regular expression:
(?-imsx:/([a-zA-Z0-9\-.]*?)([a-zA-Z0-9\-]+\.(co\.\w{2}|com|net|edu|gov|info|tv))$/)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  /                      '/'
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    [a-zA-Z0-9\-.]*?     any character of: 'a' to 'z', 'A' to
                         'Z', '0' to '9', '\-', '.' (0 or more
                         times (matching the least amount
                         possible))
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
  (                      group and capture to \2:
----------------------------------------------------------------------
    [a-zA-Z0-9\-]+       any character of: 'a' to 'z', 'A' to
                         'Z', '0' to '9', '\-' (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
    \.                   '.'
----------------------------------------------------------------------
    (                    group and capture to \3:
----------------------------------------------------------------------
      co                 'co'
----------------------------------------------------------------------
      \.                 '.'
----------------------------------------------------------------------
      \w{2}              word characters (a-z, A-Z, 0-9, _) (2
                         times)
----------------------------------------------------------------------
     |                   OR
----------------------------------------------------------------------
      com                'com'
----------------------------------------------------------------------
     |                   OR
----------------------------------------------------------------------
      net                'net'
----------------------------------------------------------------------
     |                   OR
----------------------------------------------------------------------
      edu                'edu'
----------------------------------------------------------------------
     |                   OR
----------------------------------------------------------------------
      gov                'gov'
----------------------------------------------------------------------
     |                   OR
----------------------------------------------------------------------
      info               'info'
----------------------------------------------------------------------
     |                   OR
----------------------------------------------------------------------
      tv                 'tv'
----------------------------------------------------------------------
    )                    end of \3
----------------------------------------------------------------------
  )                      end of \2
----------------------------------------------------------------------
  $                      before an optional \n, and the end of the
                         string
----------------------------------------------------------------------
  /                      '/'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------