> From: Nathalie Conte [mailto:[email protected]]
> Sent: Friday, September 30, 2011 9:38 AM
> To: [email protected]
> Subject: parsing script removing some lines help please
>
>
>
> Hi,
> I am lost in my script, and would need to basic help please.
> I have got a file , separated by tabs, and the first column contain a
> chromosome number, then several other column with different infos.
> Basically I am trying to created a script that would take a file(see
> example), parse line by line, and when the first column start by any of
> the chromosomes I don't want (6,8,14,16,18,Y), go the next line, and if
> it doesn't start by the bad chromosomes , print all the line to a new
> output file.
> the script below, just reprint the same original file :(
> thanks for any clues
> Nat
>
>
>
> #!/software/bin/perl
> use warnings;
> use strict;
> open(IN, "<example.txt") or die( $! );
> open(OUT, ">>removed.txt") or die( $! );
> my @bad_chromosome=(6,8,14,16,18,Y);
> while(<IN>){
> chomp;
> my @column=split /\t/;
> foreach my $chr_no(@bad_chromosome){
> if ($column[0]==$chr_no){
> next;
> }
> }
> print OUT
> $column[0],"\t",$column[1],"\t",$column[2],"/",$column[3],"\t",
> $column[4],"\t",$column[5],"\t",$column[6],"\t",$column[7],"\t",
> $column[8],"\t",$column[9],"\t",$column[10],"\t",$column[11],"\t",
> $column[12],"\t",$column[13],"\t",$column[14],"\n";
> }
>
>
>
> close IN; close OUT;
>
John has provided good advice on this problem, but I wanted to add a couple
of things.
To avoid explicitly coding the foreach loop for @bad_chromosome, you could
use the 'grep' function.
Also, if you are just reprinting the input line, print $_.
unless ( grep { $column[0] eq $_ } @bad_chromosome ) {
    print OUT "$_\n";   # or print $OUT if declared as John suggested
}
In scalar (here, boolean) context, the grep call returns the number of
elements of @bad_chromosome that $column[0] matched.
Thus, if there is a match, the grep call evaluates to 'true'. Otherwise,
it evaluates to 'false'.
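
Putting the pieces together, here is a minimal sketch of how the whole
filter might look with grep. The sub names keep_line and filter_lines, and
taking the file names from the command line, are my own assumptions for
illustration and not something from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chromosomes to drop. qw() quotes every value, so Y is a plain string
# and not a bareword that 'use strict' would reject.
my @bad_chromosome = qw(6 8 14 16 18 Y);

# Hypothetical helper: true when the first tab-separated field of $line
# is not one of the unwanted chromosomes.
sub keep_line {
    my ($line) = @_;
    my ($chr) = split /\t/, $line;
    return 1 unless defined $chr;            # keep empty lines as they are
    my $hits = grep { $chr eq $_ } @bad_chromosome;   # number of matches
    return $hits == 0;
}

# Hypothetical helper: copy the wanted lines from one filehandle to another.
sub filter_lines {
    my ($in, $out) = @_;
    while ( my $line = <$in> ) {
        chomp $line;
        print {$out} "$line\n" if keep_line($line);
    }
    return;
}

# Assumed usage: perl filter.pl example.txt removed.txt
if ( @ARGV == 2 ) {
    my ( $in_path, $out_path ) = @ARGV;
    open( my $in,  '<', $in_path )  or die "Cannot read $in_path: $!";
    # '>' overwrites the output; use '>>' to append as the original did.
    open( my $out, '>', $out_path ) or die "Cannot write $out_path: $!";
    filter_lines( $in, $out );
    close $in;
    close $out or die "Cannot close $out_path: $!";
}
```

Note the string comparison 'eq': with '==' the value Y would be compared
numerically and trigger a warning.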
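Using grep does have a drawback (though it matters little unless you have a
lot of values in @bad_chromosome): it checks every value of @bad_chromosome,
even after a match has been found. An explicit 'if ... next' loop stops
looking as soon as a match is found, provided the 'next' targets the outer
while loop (for example via a loop label) and not just the inner foreach.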
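
For comparison, here is a rough sketch of the early-exit loop using a loop
label. The label LINE and the sub name filter_with_early_exit are made up
for this example, and it works on a list of lines only to keep it short.

```perl
use strict;
use warnings;

my @bad_chromosome = qw(6 8 14 16 18 Y);

# Hypothetical helper: return the lines whose first tab-separated field
# is not in @bad_chromosome.
sub filter_with_early_exit {
    my @lines = @_;
    my @kept;
    LINE: for my $line (@lines) {
        my ($chr) = split /\t/, $line;
        for my $bad (@bad_chromosome) {
            # 'next LINE' leaves the inner loop at the first match and
            # moves on to the next input line; a bare 'next' would only
            # advance the inner loop.
            next LINE if defined $chr && $chr eq $bad;
        }
        push @kept, $line;
    }
    return @kept;
}

my @kept = filter_with_early_exit( "1\ta", "6\tb", "X\tc", "18\td" );
print "$_\n" for @kept;
```

With only six values to check the difference from grep is negligible, so I
would pick whichever reads better to you.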
If you wonder about the use of $_ in the grep function: inside the block, $_
is locally aliased to each element of @bad_chromosome in turn. Once grep
returns, $_ again holds the data read from the file, so that data is not
affected.
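
If you want to convince yourself, a small test along these lines should
show it. The sub name topic_after_grep and the sample line are invented for
the demonstration.

```perl
use strict;
use warnings;

my @bad_chromosome = qw(6 8 14 16 18 Y);

# Hypothetical helper: returns the value of $_ after grep has run,
# together with the number of matches grep found.
sub topic_after_grep {
    my ($line) = @_;
    my @result;
    for ($line) {                  # $_ now holds the input line
        my ($chr) = split /\t/;    # split works on $_ by default
        # Inside the block, $_ is each element of @bad_chromosome.
        my $hits = grep { $chr eq $_ } @bad_chromosome;
        # Back outside the block, $_ is the input line again.
        @result = ( $_, $hits );
    }
    return @result;
}

my ( $topic, $hits ) = topic_after_grep("8\tgeneA");
print "line still intact\n" if $topic eq "8\tgeneA";
print "matches: $hits\n";
```

One caveat: because $_ is an alias and not a copy, assigning to $_ inside the
grep block would change the elements of @bad_chromosome themselves.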
If you are using Perl 5.10 or higher, you can use the 'smart match'
operator (~~) instead of grep, though note that Perl 5.18 and later flag it
as experimental.
HTH, Ken