On Sat, Aug 29, 2009 at 11:44, Peter Daum wrote:
> I'm struggling with a tricky quoting problem:
>
> I need to split lines divided by some delimiter;
>
> - The delimiter usually will be '|',
> but can be changed, so it needs to be variable.
> - Furthermore, the columns may also contain the delimiter,
> in which case it is quoted by a backslash
>
snip
The quotemeta function is the right way to go. In the code below I
invoke it through the \Q and \E special escapes:
#!/usr/bin/perl
use strict;
use warnings;
my $delim = "|";
my $re = qr/(?<!\\)\Q$delim\E/;
while (<DATA>) {
    chomp;
    print join(",", split $re), "\n";
}
__DATA__
foo|bar\|bar|baz|quux\|
foo|bar\\|bar|baz|quux\|
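However, it still has a bug (as demonstrated by the second data line):
an escaped backslash (\\) directly before the delimiter is not handled
correctly, because the lookbehind only sees that the delimiter is
preceded by a backslash. If you need that functionality, you will
probably want to write a simple parser: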
#!/usr/bin/perl
use strict;
use warnings;
my $delim = "|";
my $tokenizer = qr{
    \\\\             | # literal backslash
    \\\Q$delim\E     | # escaped delimiter
    \Q$delim\E       | # delimiter
    [^\Q$delim\E\\]+   # anything else (delimiter quoted so ], ^ and - are safe)
}x;
my %map = (
    "\\\\"     => "\\",
    "\\$delim" => $delim,
);
while (<DATA>) {
    chomp;
    my @tokens = /($tokenizer)/g;
    my @rec = ("");
    for my $token (@tokens) {
        if ($token eq $delim) {
            push @rec, "";
            next;
        }
        $rec[-1] .= exists $map{$token} ? $map{$token} : $token;
    }
    print join(",", @rec), "\n";
}
__DATA__
foo|bar\|bar|baz|quux\|
foo|bar\\|bar|baz|quux\|
Warning: this code interprets the escapes, whereas the code you had
did not. Depending on what you want, you may want to remove %map and
change the token concatenation to:
$rec[-1] .= $token;
The difference for "foo|bar\\|bar|baz|quux\|" is
without interpretation: "foo,bar\\,bar,baz,quux\|"
with interpretation: "foo,bar\,bar,baz,quux|"
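
If you want both behaviours available, the parser can be wrapped in a
function. This is only a rough sketch, not tested beyond the examples
shown: the name split_escaped and its $interpret flag are made up for
illustration, it assumes a single-character delimiter, and it keeps a
stray backslash (one not followed by a backslash or the delimiter)
verbatim, which is my own choice rather than something you asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split $line on a single-character $delim,
# honoring backslash escapes. With a true $interpret, \\ becomes \
# and \<delim> becomes <delim>; otherwise escapes are kept as-is.
sub split_escaped {
    my ($line, $delim, $interpret) = @_;
    my $q = quotemeta $delim;    # same effect as \Q$delim\E
    my $tokenizer = qr{
        \\\\      | # literal backslash
        \\$q      | # escaped delimiter
        $q        | # delimiter
        [^$q\\]+  | # anything else
        \\          # stray backslash (assumption: keep it verbatim)
    }x;

    my @rec = ("");
    for my $token ($line =~ /($tokenizer)/g) {
        if ($token eq $delim) {
            push @rec, "";                   # start a new column
        }
        elsif ($interpret
            && ($token eq "\\\\" || $token eq "\\$delim")) {
            $rec[-1] .= substr $token, 1;    # drop the escaping backslash
        }
        else {
            $rec[-1] .= $token;
        }
    }
    return @rec;
}

# A single-quoted heredoc leaves the backslashes exactly as typed.
chomp(my $line = <<'END');
foo|bar\\|bar|baz|quux\|
END

print join(",", split_escaped($line, "|", 0)), "\n";  # foo,bar\\,bar,baz,quux\|
print join(",", split_escaped($line, "|", 1)), "\n";  # foo,bar\,bar,baz,quux|
```

Unlike split, this keeps trailing empty columns, so "a|b|" comes back
as three fields.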
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.