Hello Tiago,
On Thu, Jan 26, 2012 at 11:08 AM, Tiago Hori <[email protected]> wrote:
> Hi All,
>
> I need some help to get started on a script.
>
> I have these huge data files with 16K rows and several columns. I need to parse
> the rows into a subset of these 16K rows. Each row has an identifier made
> up of 2 letters and 6 numbers and the ones I want have specific letter,
> they start with either C or D. So I know I can use regex, but I have been
> trying to figure out the rest and I don't know where to start. This is the
> first time I am trying to do something from scratch so any suggestions
> would be appreciated. I am not asking for the script but just some help on
> how to go about it.
>
> So, what I want to be able to do is retrieve all the rows that have
> identifiers starting with C or D. Should I use arrays? Can I store each row
> as one item (a set of information separated by tabs) in an array?
>
Yes, I would split each row into an array of fields and then use a regex
on the identifier field to pick out the rows that match your criteria.
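One detail worth getting right is how to write "starts with C or D" as a
regex. In Perl, alternation (|) binds more loosely than the ^ anchor, so
/^C|D/ means "starts with C" OR "contains a D anywhere". A character class
such as /^[CD]/ keeps both letters tied to the start. The IDs below are made
up, just to show the difference:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up identifiers: two letters followed by six digits.
for my $id ( 'CA123456', 'DB654321', 'AD123456', 'AB123456' ) {
    # /^C|D/ parses as (^C) or (D anywhere), so 'AD123456' matches too
    my $loose  = $id =~ /^C|D/  ? 'match' : 'no';
    # /^[CD]/ requires the first character to be C or D
    my $strict = $id =~ /^[CD]/ ? 'match' : 'no';
    printf "%-10s /^C|D/: %-5s /^[CD]/: %s\n", $id, $loose, $strict;
}
```

Only 'AD123456' differs between the two columns; /^(?:C|D)/ would also
work if you prefer to keep the alternation.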
I put together a little sample program using fictitious data. You
should be able to apply the same concept to your needs.
***tested***
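#!/usr/bin/perl
use warnings;
use strict;

while ( <DATA> ) {
    chomp;
    # split with no arguments splits $_ on whitespace (tabs or spaces)
    my @array  = split;
    # in this sample data the identifier is the 7th field (index 6)
    my $GeneID = $array[6];
    # ^[CD] anchors both letters to the start; /^C|D/ would also
    # match a D anywhere in the field
    if ( $GeneID =~ /^[CD]/ ) {
        print $_, "\n";
    }
}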
__DATA__
Line1 c 2 3 4 5 C 7 8 9
Line2 1 2 3 4 5 6 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line4 1 2 3 4 5 6 7 8 9
Line5 1 2 3 4 5 D 7 8 9
Line6 1 2 3 4 5 6 7 8 9
***output***
Line1 c 2 3 4 5 C 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line5 1 2 3 4 5 D 7 8 9
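To run the same idea on your real files, the filter can be pulled into a
subroutine that reads tab-separated lines and keeps the matching rows in an
array. This is only a sketch under assumptions: the identifier is taken to
be in the first column (index 0), columns are separated by single tabs, and
the identifier is two letters plus six digits with an uppercase C or D
first. The names select_rows and filter.pl are hypothetical; adjust the
column index and pattern to your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the rows whose identifier column is
# C or D, then one more letter, then six digits (assumed format).
sub select_rows {
    my ( $col, @lines ) = @_;
    my @kept;
    for my $line (@lines) {
        chomp( my $row = $line );
        my @fields = split /\t/, $row;    # assumes tab-separated columns
        next unless defined $fields[$col];
        push @kept, $row if $fields[$col] =~ /^[CD][A-Za-z]\d{6}$/;
    }
    return @kept;
}

# Only read input when file names are given, e.g. perl filter.pl data.txt
if (@ARGV) {
    my @rows = select_rows( 0, <> );      # assumed: identifier in column 0
    print "$_\n" for @rows;
    print STDERR scalar(@rows), " matching rows\n";
}
```

Usage would be something like: perl filter.pl data.txt > subset.txt
Each matching row is stored whole as one element of @rows, which is what
you asked about; 16K rows fit comfortably in memory. If you never need the
rows again, printing inside the loop as the sample program does avoids
storing them at all.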
--
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
http://learn.perl.org/