> On 1/19/2010 12:09 AM, Perl Noob wrote:
>> I have a data file with thousands of records. The problem is that
>> each record in the data file spans two lines. I want to write a Perl
>> script that puts each record on a single line. The file looks like
>> this:
>>
>> RECORD1FIELD1 RECORD1FIELD2 RECORD1FIELD3 RECORD1FIELD3
>> RECORD1FIELD4 RECORD1FIELD5
>>
>> RECORD2FIELD1 RECORD2FIELD2 RECORD2FIELD3 RECORD2FIELD3
>> RECORD2FIELD4 RECORD2FIELD5
>>
>> . . .
>>
>> What I want is this:
>>
>> RECORD1FIELD1 . . .RECORD1FIELD5
>> RECORD2FIELD1 . . .RECORD2FIELD5
>>
>>
>> The second line of each record actually has a bunch of spaces before
>> the first field. I thought I could exploit this with:
>>
>> s/\n //gi;
>>
>> What I thought would happen is that the script would look for a
>> newline followed by a bunch of spaces and delete only those. But
>> that didn't work.
>>
>> Using a hex editor I saw that each new line was 0D 0A. I then tried:
>>
>> s/\x0D\x0A//gi;
>>
>> That didn't work either.
>>
>> I just want to move the second line of each record to the end of the
>> first. It seems so simple, but I am exhausted from trying different
>> things.
>>
>>
>>
>>
>
> I see a couple of choices. Your example data seems to have an
> extra newline between logical records. If that's true, then
> you can read them as paragraphs, e.g.,
>
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> $/ = "\n\n"; # one of the paragraph modes
>
> while( <DATA> ) {
>     my @fields = split;
>     print "@fields\n";
> }
>
> __DATA__
> RECORD1FIELD1 RECORD1FIELD2 RECORD1FIELD3 RECORD1FIELD3
> RECORD1FIELD4 RECORD1FIELD5
>
> RECORD2FIELD1 RECORD2FIELD2 RECORD2FIELD3 RECORD2FIELD3
> RECORD2FIELD4 RECORD2FIELD5
>
> If the apparent extra newline was not intentional, then
> you could simply read two lines at a time, e.g.,
>
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> while( <DATA> ) {
>     $_ .= <DATA>;
>     my @fields = split;
>     print "@fields\n";
> }
>
> __DATA__
> RECORD1FIELD1 RECORD1FIELD2 RECORD1FIELD3 RECORD1FIELD3
> RECORD1FIELD4 RECORD1FIELD5
> RECORD2FIELD1 RECORD2FIELD2 RECORD2FIELD3 RECORD2FIELD3
> RECORD2FIELD4 RECORD2FIELD5
>
>
> --
> Brad
I am AMAZED at the help available in this forum. It is an awesome
resource. I can see, though, that my situation needs to be stated
more clearly.
The file is not uniform from start to finish, so I can't simply join
every other line. I WISH it were that simple. The records I need do
follow a consistent pattern, though, even if they are not neatly on
every other line: the first line of each record ends with an
end-of-line marker, and the following line begins with 65 spaces.
Here is a better sample of what I described:
_______BEGIN SAMPLE DATA FILE_________________
RandomJunkNothingImportantMoreJunk
StuffthatdoesntmatterWhocaresaboutthis
RECORD1FIELD1(3 spaces)RECORD1FIELD2(3 spaces)RECORD1FIELD3(newline)
(65 spaces)RECORD1FIELD4(12 spaces)RECORD1FIELD5
RECORD2FIELD1(3 spaces)RECORD2FIELD2(3 spaces)RECORD2FIELD3(newline)
(65 spaces)RECORD2FIELD4(12 spaces)RECORD2FIELD5
RandomJunkNothingImportantMoreJunk
StuffthatdoesntmatterWhocaresaboutthis
MoreJunkThatDoesntmatterStuffIdontwantWhocaresaboutthis
RECORD3FIELD1(3 spaces)RECORD3FIELD2(3 spaces)RECORD3FIELD3(newline)
(65 spaces)RECORD3FIELD4(12 spaces)RECORD3FIELD5
RECORD4FIELD1(3 spaces)RECORD4FIELD2(3 spaces)RECORD4FIELD3(newline)
(65 spaces)RECORD4FIELD4(12 spaces)RECORD4FIELD5
RECORD5FIELD1(3 spaces)RECORD5FIELD2(3 spaces)RECORD5FIELD3(newline)
(65 spaces)RECORD5FIELD4(12 spaces)RECORD5FIELD5
RECORD6FIELD1(3 spaces)RECORD6FIELD2(3 spaces)RECORD6FIELD3(newline)
(65 spaces)RECORD6FIELD4(12 spaces)RECORD6FIELD5
___________END SAMPLE DATA FILE ____________________
You will notice in the sample above that the only thing the usable
records have in common is the (newline) followed by (65 spaces).
So what I need is a search and replace along the lines of

s/(newline)(65spaces)//gi;

which would find each (newline) followed by (65 spaces) and delete
it. I am just not sure how to write that. My brain hurts.
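
A minimal sketch of one way to write the (newline)(65 spaces) replacement described above. It is only a sketch under assumptions: that the continuation lines start with 65 spaces, and that line endings are either CRLF (the 0D 0A seen in the hex editor) or plain LF. The helper name join_continuations and the $path placeholder are made up for illustration. A likely reason the earlier attempts failed is that a substitution run inside a line-by-line loop never sees a newline followed by the next line's spaces, because each read stops at the newline; the pattern can only match once the whole file is in one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# join_continuations() is a hypothetical helper name, not from the thread.
# Assumptions: a continuation line starts with 65 spaces, and line
# endings may be CRLF (0D 0A, as seen in the hex editor) or plain LF.
sub join_continuations {
    my ($text) = @_;
    # Replace "line break + 65 spaces" with one space so that FIELD3 and
    # FIELD4 stay separated; use an empty replacement to delete outright.
    $text =~ s/\r?\n {65}/ /g;
    return $text;
}

# Small stand-in for the sample file, built with real CRLF line endings.
my $pad    = ' ' x 65;
my $gap    = ' ' x 12;
my $sample = join "\r\n",
    'RandomJunkNothingImportantMoreJunk',
    'RECORD1FIELD1   RECORD1FIELD2   RECORD1FIELD3',
    "${pad}RECORD1FIELD4${gap}RECORD1FIELD5",
    'StuffthatdoesntmatterWhocaresaboutthis',
    '';

print join_continuations($sample);

# For a real file, slurp it first so the pattern can see across lines
# ($path is a placeholder for the actual file name):
#   open my $in, '<:raw', $path or die "Cannot open $path: $!";
#   my $text = do { local $/; <$in> };
```

The junk lines pass through unchanged, since they are not followed by the 65-space padding; dropping them would be a separate step. If the padding turns out not to be exactly 65 spaces everywhere, the quantifier in the pattern is the part to loosen.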