Hello all: I found my leak after adding some bang patterns in a different part of the program. The compiler was generating all the combinations of the list comprehensions and therefore the performance dropped very badly.
BTW, hamming is 2 times faster than hamming2. Thank you as always! Arnoldo On Mon, Apr 19, 2010 at 5:53 PM, Arnoldo Muller <[email protected]>wrote: > Hello Daniel: > > My % GC time is : 75.0% (81.4% elapsed) and I am compiling with -O2. > Thank you for clarifying about the pointers. > > Slowly my memory grows up and eventually it explodes. I would expect that > the list comprehension is lazily evaluated and therefore at any given time I > am only executing one hamming distance. The result of the hamming distance > is stored into a small statistics datatype I built (only stores sums and sum > of squares and the counts). This datatype is updated using a foldr. > > I have no idea where the leak is. What do you see in a .prof file to find a > leak (hamming distance has the largest amount of time and %alloc) ? From my > .prof file where would you start looking at? > > > Best Regards, > > Arnoldo Muller > > > > > On Mon, Apr 19, 2010 at 3:18 AM, Daniel Fischer > <[email protected]>wrote: > >> Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller: >> > Hello all: >> > >> > I want to generate some hamming distance statistics about a set of >> > strings. As explained in another e-mail in this list, I used the >> > following code to call the >> > functions: >> > (exampl holds the list of strings of size w) >> > filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl, >> > ys <- exampl] >> > >> > I have two hamming functions: >> > -- hamming distance for variable length strings >> > hamming :: String -> String -> Int >> > hamming x y = hamming' x y 0 >> > where >> > hamming' [] _ !c = c >> > hamming' _ [] !c = c >> > hamming' (x:xs) (y:ys) !c >> > | x == y = hamming' xs ys c >> > | otherwise = hamming' xs ys (c + 1) >> > >> > -- function posted in this mailing list >> > hamming2 :: String -> String -> Int >> > hamming2 xs ys = length (filter not (zipWith (==) xs ys)) >> > >> > I am executing these functions millions of times and the bottleneck of >> > my program is in them as explained by running in profiling mode with >> > +RTS -K400M -p -RTS >> > >> > The costlier function is the hamming distance >> > COST CENTRE MODULE %time %alloc >> > >> > hamming Distances 66.6 41.9 >> > >> > It says that it is performing 41% of the allocations. In the case of >> > hamming2 the allocations go as far as 52%. >> >> Allocations are cheap, so that's not necessarily a problem. More important >> is, what's the maximum residency and how much is copied during GC? >> Are you compiling with -O2 ? >> >> > I could understand that >> > there are allocations in "hamming2" because we are creating pairs, but >> > in the case of "hamming" there should be no allocation. >> >> Why not? I don't know how GHC counts allocations, but everytime you go >> from >> (x:xs) to xs, you need a new pointer to the tail. If that counts as >> allocation, hamming must allocate a lot, too. >> >> > >> > How can I execute my hamming functions without allocating memory? >> > >> > Best regards, >> > >> > Arnoldo Muller >> >> >
_______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
