Your code can be simplified quite a bit if I correctly understand what you were actually trying to do. I have taken a stab at it but had to guess at your intent with the layout of the fields. Let's clear up the field layout. Your data have 9 fields (at least to the untrained eye), separated by the | character. Yet your "Fields:" line has 12 field names. Looking at this as 9 fields, do you wish to return fields 1, 6, 7, 8, and some portion of 9?
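To see the 9-field layout concretely, here is a small sketch that splits two of your sample rows on the | character and counts the space-separated items left in the 9th field. The two rows are copied from your message; `field_summary` is just an illustrative helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two rows copied from the sample data quoted at the end of this reply.
my @rows = (
    'gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19',
    'gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19',
);

# Return the number of |-separated fields and the number of
# whitespace-separated items in the last (9th) field.
sub field_summary {
    my ($line) = @_;
    my @fields = split /\|/, $line;
    my @items  = split ' ', $fields[-1];    # ' ' also skips leading whitespace
    return (scalar @fields, scalar @items);
}

for my $row (@rows) {
    my ($nfields, $nitems) = field_summary($row);
    print "fields: $nfields, items in field 9: $nitems\n";
}
```

Both rows split into 9 fields, but the first row's 9th field starts with the accession suffix "BTU63637" (the text after the last |), while the second row's 9th field starts directly with "95.24". That extra leading item accounts for the differing item counts in the 9th field.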
It's the 9th field that is troublesome. If you look at the last field of your REQUIRED output, you want *6* space-separated items in rows 1 and 4 but only *5* items in rows 2 and 3. Is this a mistake? Please count the fields in both your raw data and your desired output, and make sure you are certain of what you have and what you need. Also, where is the "gene n chromosome info"?
Assumptions I made in order to solve your problem: since the 4th space-delimited item of the 9th field is always zero, I assume that you do not need it. I also assume that you need 6 items from the 9th field. Therefore, from the 9th field I keep items 1-3 and 5-7. Hopefully, this will solve your problem with the data.
I also fetch the web page for each gi number. Please note that you are not keeping track of the gi numbers in any way that would make it easy to see whether you have already looked one up (this should be done with a hash, not an array). Rows 2 and 3 of your data have the same gi number, and because this routine submits one gi per line, there may be repeated lookups.
I save each web page to "gi".htm (substitute the actual gi number for "gi"). I then extract the DNA sequence text (everything between the ORIGIN line and the closing //) to "gi".txt, and push the file name "gi".txt onto the end of the array for that record. Hopefully this will give you enough to complete the rest yourself. If you do not need a printout of the final array, you can delete the two lines containing "Dumper".
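Here is a minimal sketch of the hash-based tracking suggested above, so each gi number is looked up only once even when rows repeat it. It assumes nothing about NCBI: `fetch_gi` is a hypothetical stub standing in for the LWP::UserAgent request and file-writing code in the script below, so the sketch runs without network access. `lookup_once` and `%seen` are likewise illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real web lookup; it only counts how
# many times it is called so the effect of the cache is visible.
my $fetch_count = 0;
sub fetch_gi {
    my ($gi) = @_;
    $fetch_count++;
    return "$gi.txt";    # the real code would save "$gi.htm" and "$gi.txt"
}

# %seen maps a gi number to the result of its first lookup.
my %seen;
sub lookup_once {
    my ($gi) = @_;
    $seen{$gi} = fetch_gi($gi) unless exists $seen{$gi};
    return $seen{$gi};
}

# Subject gi numbers from the four sample rows; rows 2 and 3 repeat.
my @gis   = qw(28592069 14318385 14318385 7385112);
my @files = map { lookup_once($_) } @gis;

print "rows: ", scalar @gis, ", web lookups: $fetch_count\n";
print join(' ', @files), "\n";
```

In the real script you could call something like `lookup_once($gi)` in place of the direct `$ua->get` call, so each gi page is downloaded and parsed only once while every row still gets its "gi".txt file name.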
-------BEGIN CODE-------
#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;
use Data::Dumper;

my @data;
my $ua = LWP::UserAgent->new() or die "Could not create UserAgent: $!\n";
my $noeol = '[^\n\r\x0A\x0D]';

while (<DATA>) {
    next if /^#/ or /^\s*$/;
    chomp;
    # Keep fields 1, 6, 7, 8 and 9 of the |-separated line.
    push @data, [ ( split /\|/, $_ )[0,5..8] ];
    # From the 9th field, keep space-separated items 1-3 and 5-7.
    $data[-1]->[-1] = join ' ', ( split ' ', $data[-1]->[-1] )[0..2,4..6];
    my $gi = $data[-1]->[1];
    # Save the Entrez page for this gi number to "$gi.htm".
    my $response = $ua->get(
        "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=$gi",
        ':content_file' => "$gi.htm");
    if ($response->is_success) {
        local $/ = undef;    # slurp the whole file at once
        open my $htm, '<', "$gi.htm" or die "Cannot open $gi.htm for reading: $!\n";
        # Keep only the text between the ORIGIN line and the closing //.
        (my $text = <$htm>) =~ s/.+ORIGIN$noeol+.(.+?)\/\/.+/$1/s;
        close $htm;
        open my $txt, '>', "$gi.txt" or die "Cannot open $gi.txt for writing: $!\n";
        print $txt $text;
        close $txt;
        push @{$data[-1]}, "$gi.txt";
    }
    else {
        print $response->as_string;
    }
}

print Dumper \@data;

__DATA__
# BLASTN 2.2.9 [May-01-2004]
# (the rest of your data from below goes here)
-------END CODE-------

"Aditi gupta" wrote:
> hi to all,
>
> i had a file which contained following data:
>
> # BLASTN 2.2.9 [May-01-2004]
> # Query: gi|37182815|gb|AY358849.1| Homo sapiens clone DNA180287 ALTE (UNQ6508) mRNA, complete cds
> # Database: nr
> # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
> gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234 1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624 1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967 4.2 32.21
> gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226 1.1 34.19
>
> but i required only some of the fields, and with the help of members of this maillist, i succeeded and obtained following output:
>
> gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 552 568
> gi|14318385|gb|AC089993.2| 95.24 21 1 435 455
> gi|14318385|gb|AC089993.2| 100.00 16 0 260 275
> gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 345 361
> [code snipped]
>
> but i also have to feed the gi number(the first field) into ncbi entrez nucleotide site:
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
> and retreive the gene and chromosome name, if available from the resulting web page ........
> is it possible to get the gene n chromosome info in the output with other fields?what changes in code are required?
