Thanks to a lot of help from the list, hours of reading up on Unicode and the Encode module, and many posts on PerlMonks, I've come up with a working but hideous solution for processing text files in different character encodings.

Can someone please explain why this first block of code works when decoding .txt files in different character encodings:

#!/usr/bin/perl
use strict;
use warnings;
use Encode::Guess;

print "\nPlease specify the file path: ";
my $datapath = <STDIN>;
$datapath =~ s/^\s+//;
$datapath =~ s/\s+$//;

open(my $filehndl, "<", $datapath)
    or die("Can't open .txt file $datapath. Exiting program.\n\n");
binmode($filehndl);

# Read the first 500 raw bytes so Encode::Guess has something to inspect.
if (read($filehndl, my $filestrt, 500))
{
    my $enc = guess_encoding($filestrt);
    if (ref($enc))
    {
        my $enc_name = $enc->name;
        #my $encoding = find_encoding("$enc_name");

        # Reopen the file with an :encoding() layer so PerlIO decodes it.
        open(my $filehdl2, "<:encoding($enc_name)", $datapath)
            or die("Can't reopen $datapath as $enc_name: $!\n");
        while (my $line = <$filehdl2>)
        {
            #my $line = $encoding->decode($string);
            #my $line = decode("$enc_name", $string);
            chomp $line;
            my @words = split / /, $line;
            my $nr_words = @words;
            print "\n$line\n";
            print "The line above has $nr_words occurrences of something.\n";
        }
        close($filehdl2);
    }
}
close($filehndl);

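A likely reason the first block works, offered as an explanation rather than something verified on your files: the `:encoding($enc_name)` layer turns the byte stream into characters before `<$filehdl2>` looks for line endings. A UTF-16 byte-order mark (BOM) is therefore consumed once at the start of the stream, and "\n" is matched as a character rather than as a lone 0x0A byte. Decoding the whole buffer with a single decode() call behaves roughly the same way. The sketch below illustrates this on made-up sample text; the strings are assumptions, not data from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Made-up sample: two lines as UTF-16 bytes. With no BE/LE suffix,
# encode('UTF-16', ...) writes a big-endian BOM followed by BE data.
my $bytes = encode('UTF-16', "one two\nthree\n");

# Decode the whole buffer once, so the BOM is seen exactly once...
my $text = decode('UTF-16', $bytes);

# ...then split into lines on characters, not on raw bytes.
my @lines = split /\n/, $text;

for my $line (@lines) {
    my @words = split / /, $line;
    print "$line: ", scalar(@words), " word(s)\n";
}
```

Slurping like this is also a simple alternative to the two-handle approach, at the cost of holding the whole file in memory.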
But this second block generates the following error:

UTF16: Unrecognised BOM 6100 at /usr/lib/perl/5.10//Encode.pm line 162, <$filehndl> line 1.

#!/usr/bin/perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

print "\nPlease specify the file path: ";
my $datapath = <STDIN>;
$datapath =~ s/^\s+//;
$datapath =~ s/\s+$//;

open(my $filehndl, "<", $datapath)
    or die("Can't open .txt file $datapath. Exiting program.\n\n");
binmode($filehndl);

# Read the first 500 raw bytes so Encode::Guess has something to inspect.
if (read($filehndl, my $filestrt, 500))
{
    my $enc = guess_encoding($filestrt);
    if (ref($enc))
    {
        my $enc_name = $enc->name;

        # Read raw lines from the same handle and decode each one separately.
        while (my $line = decode("$enc_name", <$filehndl>))
        {
            chomp $line;
            my @words = split / /, $line;
            my $nr_words = @words;
            print "\n$line\n";
            print "The line above has $nr_words occurrences of something.\n";
        }
    }
}
close($filehndl);

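A plausible explanation for the error, inferred from the message rather than confirmed on your files: in the second block, `<$filehndl>` reads from a handle that is still in raw (byte) mode and already positioned 500 bytes in. Each "line" is therefore a run of raw bytes ending at the first 0x0A byte. For UTF-16 that goes wrong in two ways. First, a newline is two bytes (0A 00 in little-endian), so every line after the first starts with a stray byte. Second, decode('UTF-16', ...) expects every string it is given to begin with a BOM. The value 6100 is consistent with the bytes 61 00 (the letter "a" in UTF-16LE) sitting where a BOM was expected. The `while` condition also ends the loop early on an empty or "0" line. The sketch below shows the first problem, using an in-memory filehandle and made-up sample text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up sample: two lines in UTF-16LE (no BOM), as raw bytes.
my $bytes = encode('UTF-16LE', "ab\ncd\n");

# Read it line by line WITHOUT an :encoding layer, like the second block.
open(my $raw, '<', \$bytes) or die "open: $!";
binmode($raw);
my @raw_lines = <$raw>;
close($raw);

# Show each raw "line" as hex bytes.
for my $chunk (@raw_lines) {
    print join(' ', map { sprintf('%02x', ord($_)) } split //, $chunk), "\n";
}
```

This should print three chunks, `61 00 62 00 0a`, `00 63 00 64 00 0a` and `00`. The 00 half of each newline spills into the next "line", so no chunk after the first is valid UTF-16 on its own.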
Alternatively, can someone suggest a more elegant way to do this? It doesn't seem like I should have to open the file twice, as I do in the first block, but I can't figure out a way around that.

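One way to avoid the second open() is sketched here. It assumes the file is seekable, meaning a regular file rather than a pipe or STDIN. After guessing from the first 500 bytes, seek back to the start and push the `:encoding()` layer onto the same handle with binmode(). perlfunc's binmode entry notes that `:encoding` is a layer that can be pushed mid-stream. The helper name `read_guessed_lines` and the sample text are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;
use File::Temp qw(tempfile);

# Hypothetical helper: open once, guess from the first 500 bytes,
# rewind, then decode via an :encoding layer on the same handle.
sub read_guessed_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    binmode($fh);                              # raw bytes for guessing
    read($fh, my $head, 500) or die "Can't read from $path\n";
    my $enc = guess_encoding($head);
    ref($enc) or die "Can't guess encoding of $path: $enc\n";
    seek($fh, 0, 0) or die "Can't rewind $path: $!\n";
    binmode($fh, ':encoding(' . $enc->name . ')') or die "binmode: $!\n";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close($fh);
    return ($enc->name, @lines);
}

# Demo on a made-up UTF-16 file with a BOM.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} encode('UTF-16', "one two\nthree\n");
close($tmp_fh) or die "close: $!";

my ($name, @lines) = read_guessed_lines($tmp_name);
print "$name: ", scalar(@lines), " lines\n";
```

The seek() is what makes this work. Without it, the layer would start decoding at byte 500 and the BOM would never be seen.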
Thanks for any help!
-Doug.
===
Douglas Cacialli, M.A. - Doctoral candidate
Clinical Psychology Training Program
University of Nebraska-Lincoln
Lincoln, Nebraska 68588-0308
===