Jon Hans wrote:
>
> #!/usr/bin/perl
> #######################################################
>
> I am trying to find all of the reoccurring sequences
> excluding the sub sequences.
>
> Maybe I am missing the obvious, but having a little
> perl exposure and not being an expert perl programmer
> I have hacked together some code that does some of
> what I would like to do, but I know that there must be
> a much better way of doing this. I just don't have any
> ideas right now, having only had a couple hours sleep
> in the last couple of days. :+( am I looking at this
> all wrong? There should be some regular expression(s)
> that would make this more maintainable and elegant.
> :-)
>
> I have used an array of items called @datalist and a
> hash called %frequency that has a count of how often
> each item occurs in the data list. I used tr to clean
> the data of special characters if any and split on
> white space into the @datalist array.
>
> I would appreciate some help with this. Thanks
I can't tell exactly what you are trying to do. Do you
have any examples of the original data and what you want
the modified data to look like?
> #######################################################
>
> # find frequency of all sequences of the given size
> my $count = $first = $currentseq = 0;
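my applies only to $count here; $first and $currentseq become package
variables (and will not compile under "use strict"). Declare each one:

my $count = my $first = my $currentseq = 0;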
> # size of sequence to look for
> my $sizeof = 10;
>
> while ($first + $sizeof < $#datalist) {
>
> #ugly
> if ( defined $frequency{$datalist[$first]} &&
> defined $frequency{$datalist[$first+1]} &&
> $frequency{$datalist[$first+2]} &&
> $frequency{$datalist[$first+3]} &&
> $frequency{$datalist[$first+4]} &&
> $frequency{$datalist[$first+5]} &&
> $frequency{$datalist[$first+6]} &&
> $frequency{$datalist[$first+7]} &&
> $frequency{$datalist[$first+8]} &&
> $frequency{$datalist[$first+9]} ) {
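grep in scalar context returns a count, so the whole test can be written
in terms of $sizeof:

if ( $sizeof == grep defined $frequency{ $_ }, @datalist[ $first .. $first + $sizeof - 1 ] ) {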
> # put a sequence together with a space separating items
> $currentseq .= $datalist[ $first ] ;
You initialized $currentseq with "0" earlier. Did you
really want it to start with "0"? It is also never reset,
so each new sequence gets appended to the previous one.
> for (my $count = 1; $count < $sizeof; ++$count)
> {
> $currentseq .= " " . $datalist[ $first + $count ] ;
> }
The assignment and the loop can be replaced with one join on an array slice:

$currentseq = join ' ', @datalist[ $first .. $first + $sizeof - 1 ];
> # increment count of sequence for the current one
> ++$current{ $currentseq };
> }
> # next position in the data list
> ++$first;
> }
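Here is a minimal sketch of the whole windowing loop using the grep and
join idioms from my comments above. The sample data and the smaller
$sizeof are made up for illustration, and I have assumed %frequency is
built by counting every item in @datalist. Note that your while condition
appears to stop two windows short of the end of the list; the range below
visits all of them.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real @datalist and %frequency
# come from your own script.
my @datalist = qw(a b c a b c a b);
my %frequency;
$frequency{$_}++ for @datalist;

my $sizeof = 3;    # size of sequence to look for
my %current;

# Visit every window of $sizeof consecutive items, including the last one.
for my $first ( 0 .. @datalist - $sizeof ) {
    my @window = @datalist[ $first .. $first + $sizeof - 1 ];

    # grep in scalar context returns the number of matching items
    next unless $sizeof == grep { defined $frequency{$_} } @window;

    # count this sequence, keyed by its items joined with spaces
    ++$current{ join ' ', @window };
}

print "$_ => $current{$_}\n" for sort keys %current;
```

With this sample list each of the three distinct windows turns up twice.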
>
> foreach ( keys ( %current ) ) {
for my $currentsequence ( keys %current ) {
> # if no multiples remove sequence
> if ( $current{ $_ } < 2 ) {
> delete $current{ $_ } ;
> }
delete $current{ $currentsequence } if $current{ $currentsequence } < 2;
>
> my $currentsequence = $_ ;
> my $numberof = $current{ $_ } ;
>
> foreach ( keys ( %lastseq ) ) {
> # if the number of times the smaller sequence occurs is
> # the same, then the shorter sequence is not needed
> if ( grep($_,$currentsequence) && $lastseq{ $_ } == $numberof ) {
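       ^^^^^^^^^^^^^^^^^^^^^^^^^
grep() operates on a LIST, not a scalar. With a one-element list,
grep($_,$currentsequence) only tests whether $currentsequence is true,
so as written the condition is equivalent to:

if ( $currentsequence && $lastseq{ $_ } == $current{ $currentsequence } ) {

If you meant to test whether the shorter sequence occurs inside
$currentsequence, index() does that:

if ( index( $currentsequence, $_ ) >= 0 && $lastseq{ $_ } == $current{ $currentsequence } ) {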
> delete $lastseq{ $_ } ;
> }
> }
> }
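If I have guessed your intent correctly (drop a shorter sequence when a
longer one contains it and occurs the same number of times), this part
could be sketched as below. The contents of %lastseq and %current are
invented for the example, and padding both strings with spaces is my own
assumption so that whole items are compared rather than partial words.

```perl
use strict;
use warnings;

# Hypothetical counts: %lastseq from a shorter window size,
# %current from a longer one.
my %lastseq = ( 'a b' => 2, 'b c' => 2, 'c d' => 3 );
my %current = ( 'a b c' => 2, 'b c d' => 1 );

# Remove sequences that occur only once, using a hash slice.
delete @current{ grep { $current{$_} < 2 } keys %current };

for my $long ( keys %current ) {
    for my $short ( keys %lastseq ) {
        # index() returns -1 when $short does not occur in $long
        next if index( " $long ", " $short " ) < 0;

        # same count: the shorter sequence adds no information
        delete $lastseq{$short} if $lastseq{$short} == $current{$long};
    }
}

print "$_ => $lastseq{$_}\n" for sort keys %lastseq;
```

Deleting inside the loops is safe here because keys builds its list before
the loop starts.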
John
--
use Perl;
program
fulfillment