Hello John: Well I could use a packed type. The only letters that will be found in the string are "ATCG" so yeah I don't need unicode and those things.
Will try out with vector or ByteString. Thanks! :) On Mon, Apr 19, 2010 at 2:37 PM, John Lato <[email protected]> wrote: > > Subject: Re: [Haskell-cafe] hamming distance allocation > > > > Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller: > >> Hello all: > >> > >> I want to generate some hamming distance statistics about a set of > >> strings. As explained in another e-mail in this list, I used the > >> following code to call the > >> functions: > >> (exampl holds the list of strings of size w) > >> filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl, > >> ys <- exampl] > >> > >> I have two hamming functions: > >> -- hamming distance for variable length strings > >> hamming :: String -> String -> Int > >> hamming x y = hamming' x y 0 > >> where > >> hamming' [] _ !c = c > >> hamming' _ [] !c = c > >> hamming' (x:xs) (y:ys) !c > >> | x == y = hamming' xs ys c > >> | otherwise = hamming' xs ys (c + 1) > >> > >> -- function posted in this mailing list > >> hamming2 :: String -> String -> Int > >> hamming2 xs ys = length (filter not (zipWith (==) xs ys)) > >> > >> I am executing these functions millions of times and the bottleneck of > >> my program is in them as explained by running in profiling mode with > >> +RTS -K400M -p -RTS > >> > >> The costlier function is the hamming distance > >> COST CENTRE MODULE %time %alloc > >> > >> hamming Distances 66.6 41.9 > >> > >> It says that it is performing 41% of the allocations. In the case of > >> hamming2 the allocations go as far as 52%. > > > > Allocations are cheap, so that's not necessarily a problem. More > important > > is, what's the maximum residency and how much is copied during GC? > > Are you compiling with -O2 ? > > > >> I could understand that > >> there are allocations in "hamming2" because we are creating pairs, but > >> in the case of "hamming" there should be no allocation. > > > > Why not? I don't know how GHC counts allocations, but everytime you go > from > > (x:xs) to xs, you need a new pointer to the tail. If that counts as > > allocation, hamming must allocate a lot, too. > > Is it really necessary to use Strings? I think a packed type, e.g. > Vector or ByteString, would be much more efficient here. Of course > this is only likely to be a benefit if you can move away from String > entirely. > > I suspect that "hamming2" would perform better then. > > John > _______________________________________________ > Haskell-Cafe mailing list > [email protected] > http://www.haskell.org/mailman/listinfo/haskell-cafe >
_______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
