Rob Dixon wrote:
> Saravana Kumar wrote:
> > John W. Krahn wrote:
> >
> >>Saravana Kumar wrote:
> >>
> >>>I am new to the list and newbie in perl.
> >>>
> >>>I have a big flat file (100G). The file was supposed to be a single
> >>>line but is split into many records (as it has ^M). There are also ^@
> >>>and tabs in between.
> >>>
> >>>I want to first replace the control characters and tabs with space.
> >>>
> >>>I tried this s/[[:cntrl:]\t]/ /g.
> >>
> >>The [:cntrl:] character class includes the "\t" character.
> >>
> >>>After replacing the above said characters
> >>>with space i have to insert \n after each 1000th character.
> >>>
> >>>But the program hangs after reading about 24G( 1/4th of the file).
> >>>
> >>>I thought of reading the file character by character, check if the
> >>>character is ^M||^@||\t. If true replace with the space and write the
> >>>output, else simply write the output. I have to keep track of the count of
> >>>characters so as to insert \n after each 1000th character.
> >>>
> >>>Will the above work or is there any other(simple) way to do this?( or
> >>>should i just move on to C?)
> >>>
> >>>I am not sure why my first program hung (I ran the program on a machine
> >>>with 2G RAM).
> >>
> >>You can do what you want if you set the Input Record Separator to read
> >>1000 bytes at a time:
> >>
> >>$/ = \1000;
> >>while ( <FILE> ) {
> >> s/[[:cntrl:]]/ /g;
> >> print "$_\n";
> >> }
> >
> > Thanks John. That did the trick. I ran the above script with my input
> > file and redirected the output to another file. Since it is creating a
> > new file i was wondering whether i can do the changes in the same file
> > ie., read 1000 characters, do the replacement and write the output to
> > the same file. This will reduce the disk space used(since the file i
> > have is 100G).
>
> That is like preparing an apple pie while it is in the oven to save on
> kitchen space. You can't easily do it because each of your new records is
> one character longer than the original record and you would be overwriting
> data you hadn't processed yet. It is possible, in the sense that you could
> make sure that all the data is read from the file and held elsewhere (in
> memory or in a temporary file) before it is overwritten, but it wouldn't
> be a simple piece of code to get working correctly. In any case it is a
> bad idea because if you have a failure of any sort part-way through
> processing then your original data is then lost and you have no second
> chance. If the people you are working for expect to have files of this
> size and haven't allowed for storage space for several of them at once
> then you need to have a word with them about storage planning. You need a
> new disk drive: $100 will buy you around 300GB these days and that doesn't
> buy enough of your time to write clever software to cope with the lack of
> disk space.
>
> Cheers,
>
> Rob
>
I have enough space on the HDD to store more files; this "idea" just came to
me as a thought. I missed the point that each added "\n" makes the output one
character longer, so writing in place would overwrite the first character of
the next record, which I haven't read yet. I am going ahead with the same
method (redirecting the output to a new file) to save coding time, not to
mention that I can't afford to lose any data in that file.
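
A minimal sketch of that approach, based on John's snippet above, might look
like the following. The sub name reflow, the lexical filehandles, the :raw
layers and the two command-line file names are placeholders chosen for
illustration and are not from the earlier posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read fixed-size records from $in, replace control characters with
# spaces, and write each record to $out followed by a newline.
sub reflow {
    my ($in, $out, $size) = @_;
    local $/ = \$size;    # a reference to an integer selects fixed-size reads
    while ( my $record = <$in> ) {
        $record =~ s/[[:cntrl:]]/ /g;    # [:cntrl:] covers \t, ^M and ^@
        print {$out} $record, "\n";
    }
    return;
}

# Assumed usage: script.pl INPUT OUTPUT (hypothetical names).
# The output always goes to a new file, never over the input.
if ( @ARGV == 2 ) {
    my ($src, $dst) = @ARGV;
    open my $in,  '<:raw', $src or die "Cannot read $src: $!";
    open my $out, '>:raw', $dst or die "Cannot write $dst: $!";
    reflow($in, $out, 1000);
    close $out or die "Cannot close $dst: $!";
}
```

Only one record of 1000 bytes is held in memory at a time, so memory use
should stay flat regardless of the file size, and the original file is left
untouched if the run fails part-way through.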
Thanks to all who replied to my queries, and thanks for the time spent.
Regards,
SK