Dear all,
I am a Perl newbie struggling to accomplish a conceptually simple
bioinformatics task. I have a single large file containing about
200,000 DNA probe sequences (from an Affymetrix microarray), each of
which is accompanied by a header, like so (this is in FASTA format):
>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; Antisense;
CTACTCTCGTGGTGCACAAGGAGTG
>probe:HG_U95Av2:1156_at:528:483; Interrogation_Position=2054;
Antisense;
TGCAGGTGGCAGATCTGCAGTCCAT
>probe:HG_U95Av2:1102_s_at:541:589; Interrogation_Position=4316;
Antisense;
GTGAAGGTTGCTGAGGCTCTGACCC
.........etc.
What I would like to do is extract from this file a subset of ~130,800
probes (both the header and the sequence) and output this subset into a
new text file in the same (FASTA) format. These 130,800 probes
correspond to 8,175 probe set IDs ("1138_at" is an example of a probe
set ID in the header listed above). I have these 8,175 IDs listed in a
separate file called "ID.txt" and the 200,000 probe sequences in a file
called "HG_U95Av2_probe_fasta.txt". The script below is missing
something because the output file ("probe_subset.txt") is blank. This
is also the case if I replace the file "ID.txt" with a file consisting
of a single probe set ID (e.g. 1138_at). Does anyone know what I am
missing? I am running this script in Cygwin on Windows XP. I
appreciate any suggestions!
~ Mike O.
#!/usr/bin/perl -w
use strict;
my $IDs = 'ID.txt';
unless (open(IDFILE, $IDs)) {
print "Could not open file $IDs!\n";
}
my $probes = 'HG_U95Av2_probe_fasta.txt';
unless (open(PROBES, $probes)) {
print "Could not open file $probes!\n";
}
open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
my @ID = <IDFILE>;
chomp @ID;
while (<PROBES>) {
foreach (@ID) {
if(/^>probe:\w+:(\w+):/) {
print OUT;
print OUT scalar(<PROBES>);
}
}
}
exit;
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>