here is the script. give it a name, say, seqComp.pl. usage: perl seqComp.pl
<input_FASTA_file>.
HTH,
Anjan
#! /usr/bin/perl -w
use strict;
open (S, "$ARGV[0]") || die "cannot open FASTA file to read: $!";
my %s;# a hash of arrays, to hold each line of sequence
my %seq; #a hash to hold the AA sequences.
my $key;
while (<S>){ #Read the FASTA file.
chomp;
if (/>/){
s/>//;
$key= $_;
}else{
push (@{$s{$key}}, $_);
}
}
foreach my $a (keys %s){
my $s= join("", @{$s{$a}});
$seq{$a}=$s;
#print("$a\t$s\n");
}
my @aa= qw(A R N D C Q E G H I L K M F P S T W Y V);
foreach my $k (keys %seq){
my %count; # a hash to hold the count for each amino acid in the
protein.
my @seq= split(//, $seq{$k});
foreach my $r(@seq){
$count{$r}++;
}
print("$k\n");
foreach my $a (@aa){
$count{$a}||=0;
$count{$a}= sprintf("%0.2f", $count{$a}/length($seq{$k}));
print("$a\t$count{$a}\n");
}
}
On Wed, Dec 1, 2010 at 2:31 PM, Rob Dixon <[email protected]> wrote:
> On 01/12/2010 08:44, Changrong Ge wrote:
>
>>
>> I am quite new to this perl language-I am from biochemistry field.
>> Now trying to write a script for my current work but could not make
>> it. The idea is to calculate the composition (percentage) of amino
>> acids in a protein sequence.
>>
>> Input is a series of fasta format (protein sequence) output is a tab
>> delimited format like below: >
>>
>> Name A T C D N Q E .......
>> protein1 0.23 0.40 0.20 ...
>> protein2 0.52 0.01 ....
>> protein3
>> ......
>> Could somebody help me with this? I tried reading some books like
>> perl for bioinformatics, but still not into it.
>>
>
> Hi Changrong
>
> The problem doesn't seem difficult, but I'm afraid we don't have much
> knowledge of bioinformatics between us. If you post a sample of input
> data and the corresponding output you desire then I am sure we can help.
>
> Regards,
>
> Rob
>
>
> --
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> http://learn.perl.org/
>
>
>
--
===================================
anjan purkayastha, phd.
research associate
fas center for systems biology,
harvard university
52 oxford street
cambridge ma 02138
phone-703.740.6939
===================================