On Wed, Aug 17, 2011, 2:59 PM, "ERIC KRAUSE" <[email protected]>
scribbled:

> Hello all,
> I am beating my head against the wall, any help would be appreciated.
> 
> I have a file:
>   /  /    /      /   m  /  cvfbcbf/ A123/  / / /// ////
>   /  /    /      /   m  /  cvfbcbf/ A234/  / / /// ////
>   /  /    /      /   m  /  cvfbcbf/ B123/  / / /// ////
> 
> There are spaces at the beginning and the end of each line, and the lines
> are very similar. I'm trying to count how many unique A#'s and B#'s there
> are, as well as the total number of A#'s and B#'s.

A hash would be suitable for that task.
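
As a rough sketch of the hash approach, assuming the IDs always look like
an "A" or "B" followed by digits (the pattern and the count_ids name are my
own guesses from your sample lines, so adjust them to the real data):

```perl
use strict;
use warnings;

# Count total and unique IDs such as A123 or B123 in a list of records.
# The pattern /\b([AB])(\d+)\b/ is an assumption based on the sample lines.
sub count_ids {
    my @records = @_;
    my (%total, %seen);
    for my $record (@records) {
        while ($record =~ /\b([AB])(\d+)\b/g) {
            my ($prefix, $id) = ($1, "$1$2");
            $total{$prefix}++;          # every occurrence
            $seen{$prefix}{$id}++;      # one hash key per distinct ID
        }
    }
    # The number of keys in each inner hash is the number of unique IDs.
    my %unique = map { $_ => scalar keys %{ $seen{$_} } } keys %seen;
    return (\%total, \%unique);
}

# Sample records modelled on the lines in the question, with one repeat.
my @sample = (
    '  /  /    /      /   m  /  cvfbcbf/ A123/  / / /// ////',
    '  /  /    /      /   m  /  cvfbcbf/ A234/  / / /// ////',
    '  /  /    /      /   m  /  cvfbcbf/ B123/  / / /// ////',
    '  /  /    /      /   m  /  cvfbcbf/ A123/  / / /// ////',
);

my ($total, $unique) = count_ids(@sample);
for my $prefix (sort keys %$total) {
    print "$prefix: total $total->{$prefix}, unique $unique->{$prefix}\n";
}
```

Because hash keys are unique, counting the keys gives the number of
distinct IDs, while the separate counter tracks every occurrence.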

> 
> The problem for me is the line endings I think. When I open the file and read
> in one line, I get the whole file. I think the line endings are ^p (MS
> paragraph markers), but I can't open the file to view them. The files are
> huge, 150M or bigger. MS Word chokes on them.

Try Wordpad or Notepad to open the file. It sounds like the file is not a
regular text file with normal line endings such as "\r\n" (Windows), "\n"
(Unix), or "\r" (classic Mac OS). Where did the file come from?

> 
> Each line does end with 30 spaces.
> 
> Is there a way for me to search the entire 150M single line and get the
> metrics I'm looking for, or is it possible to open the file, search for the 30
> spaces and replace with \n?

Yes:

  $file_contents =~ s/\s{30,}/\n/g;

which replaces every run of 30 or more consecutive whitespace characters
with a single newline. Note that \s matches tabs, carriage returns, and
newlines as well as spaces.

You can also split the file on the 30 spaces:

  my @lines = split(/\s{30,}/,$file_contents);

If you can figure out how the paragraph markers are stored in the file, you
can split on those instead. The split above will likely leave those markers
at the beginning of each line, except possibly the first.
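
If holding the whole 150M file in one string is a problem, another option
is to let Perl read one record at a time by setting the input record
separator $/ to the 30 spaces. This is only a sketch: it assumes the
separator really is 30 plain spaces, and for_each_record is a name I made
up for the example.

```perl
use strict;
use warnings;

# Call $callback once per record, where records are separated by a run of
# exactly 30 spaces (an assumption taken from the description of the file).
sub for_each_record {
    my ($fh, $callback) = @_;
    local $/ = ' ' x 30;               # input record separator
    while (my $record = <$fh>) {
        chomp $record;                 # strips the trailing 30-space separator
        $record =~ s/^\s+|\s+$//g;     # trim leftover whitespace at both ends
        $callback->($record) if length $record;
    }
    return;
}

# Demonstration on an in-memory filehandle rather than the real file.
my $data = join '',
    map { "  /  m  /  cvfbcbf/ $_/  / ////" . ' ' x 30 } qw(A123 A234 B123);

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
for_each_record($fh, sub { print "$_[0]\n" });
close $fh;
```

For the real file, open it by name instead of using the in-memory string.
If the markers turn out to be something other than whitespace, the trim
will not remove them, so check the first few records by eye.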

You can use substr to print parts of the file:

  print substr($file_contents,0,80), "\n";

to see what you really have.
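
If the markers are not printable, the plain print will not show them. One
possible way to make them visible is to escape every byte outside printable
ASCII before printing. The show_invisible helper below is hypothetical, and
the sample string just stands in for a piece of $file_contents:

```perl
use strict;
use warnings;

# Return a copy of $text in which every character outside printable ASCII,
# plus the backslash itself, is shown as a \xHH escape.
sub show_invisible {
    my ($text) = @_;
    (my $visible = $text) =~
        s/([^\x20-\x5B\x5D-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $visible;
}

# Stand-in for substr($file_contents, 0, 80); the "\r\n" is invented
# purely to show what an escaped line ending looks like.
my $chunk = "cvfbcbf/ A123/ //\r\n  /  m";
print show_invisible($chunk), "\n";
```

Running this on the first few hundred characters of the file, and on a
stretch that spans one of the 30-space runs, should reveal whatever sits
between the records.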


