Hello.
I am exploring haskell features for parallel and cocurrent programming
and see something difficult to explain.
In brief - asking RTS to use more threads results in awfull drop of
performance. And according to 'top' test programm consumes up to N CPUs
power.
Am I doing something wrong? I attached the code, but I am just issuing
thousands of HTTP GET requests in 1-4 forkIO threads. And since it looks
like local apache is faster than haskell program (which is a pity) I
expected that using more OS threads should improve performance.
Just in case:
ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.1
import Data.List
import System.IO
import qualified System.IO.UTF8
import System.Environment (getArgs)
import Network.HTTP
import Network.URI
import System.Time
import System.IO.Unsafe
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2) =
fromInteger (psecs2 - psecs1) / 1e12 + fromInteger (secs2 - secs1)
-- single get
get :: Int -> IO(String)
get id = do
res <- simpleHTTP $ getRequest "http://127.0.0.1"
case res of
Left err -> return(show err)
Right rsp -> return(show $ rspCode rsp)
-- perform GET per each list element using c threads
doList :: [Int] -> Int -> IO()
doList ids 0 =
return()
doList [] c =
return()
doList ids c = do
forkChild $ forM_ todo get
doList later (c-1)
where (todo, later) = splitAt (length ids `div` c) ids
{-
Copied from
http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Control-Concurrent.html#11
Terminating the program
-}
children :: MVar [MVar ()]
children = unsafePerformIO (newMVar [])
waitForChildren :: IO ()
waitForChildren = do
cs <- takeMVar children
case cs of
[] -> return ()
m:ms -> do
putMVar children ms
takeMVar m
waitForChildren
forkChild :: IO () -> IO ThreadId
forkChild io = do
mvar <- newEmptyMVar
childs <- takeMVar children
putMVar children (mvar:childs)
forkIO (io `finally` putMVar mvar ())
-- end of copied code
main = do
[c', n'] <- getArgs
let
c = read c' :: Int
n = read n' :: Int
start <- getClockTime
doList [1..n] c
waitForChildren
end <- getClockTime
putStrLn $ show(c) ++ " " ++ show(secDiff start end) ++ "s"
20:31 sa...@loft4633:/tmp 21> ghc --make -threaded get.hs
[1 of 1] Compiling Main ( get.hs, get.o )
Linking get ...
20:31 sa...@loft4633:/tmp 22> ./get 1 10000
1 3.242352s
20:31 sa...@loft4633:/tmp 23> ./get 2 10000
2 3.08306s
20:31 sa...@loft4633:/tmp 24> ./get 2 10000 +RTS -N2
2 6.898871s
20:32 sa...@loft4633:/tmp 25> ./get 3 10000
3 2.950677s
20:32 sa...@loft4633:/tmp 26> ./get 3 10000 +RTS -N2
3 7.381678s
20:32 sa...@loft4633:/tmp 27> ./get 3 10000 +RTS -N3
3 14.683548s
20:32 sa...@loft4633:/tmp 28> ./get 4 10000
4 3.332165s
20:33 sa...@loft4633:/tmp 29> ./get 4 10000 +RTS -N4 -s
./get 4 10000 +RTS -N4 -s
4 57.17923s
2,147,969,912 bytes allocated in the heap
49,059,288 bytes copied during GC
736,656 bytes maximum residency (98 sample(s))
486,744 bytes maximum slop
5 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 949 collections, 948 parallel, 76.73s, 25.67s elapsed
Generation 1: 98 collections, 98 parallel, 7.70s, 2.56s elapsed
Parallel GC work balance: 2.17 (6115428 / 2822692, ideal 4)
MUT time (elapsed) GC time (elapsed)
Task 0 (worker) : 1.43s ( 27.76s) 6.31s ( 2.12s)
Task 1 (worker) : 0.00s ( 28.13s) 10.62s ( 3.56s)
Task 2 (worker) : 0.37s ( 28.63s) 11.06s ( 3.69s)
Task 3 (worker) : 0.00s ( 28.95s) 6.29s ( 2.10s)
Task 4 (worker) : 20.73s ( 28.95s) 9.68s ( 3.24s)
Task 5 (worker) : 0.00s ( 28.95s) 0.60s ( 0.20s)
Task 6 (worker) : 21.81s ( 28.95s) 11.91s ( 3.97s)
Task 7 (worker) : 18.59s ( 28.95s) 13.04s ( 4.36s)
Task 8 (worker) : 17.24s ( 28.96s) 14.92s ( 4.99s)
SPARKS: 0 (0 converted, 0 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 79.23s ( 28.95s elapsed)
GC time 84.43s ( 28.23s elapsed)
EXIT time 0.00s ( 0.01s elapsed)
Total time 162.49s ( 57.19s elapsed)
%GC time 52.0% (49.4% elapsed)
Alloc rate 27,513,782 bytes per MUT second
Productivity 48.0% of total user, 136.5% of total elapsed
gc_alloc_block_sync: 15006
whitehole_spin: 0
gen[0].steps[0].sync_large_objects: 7617
gen[0].steps[1].sync_large_objects: 35
gen[1].steps[0].sync_large_objects: 1400
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe