Re: Performance of small allocations via prim ops

Ben Gamari Thu, 06 Apr 2023 15:02:36 -0700

Harendra Kumar <[email protected]> writes:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# . The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much overhead
> a ccall adds? I guess it may have to save and restore registers. Would it
> be better to do the fast path case of allocating small objects from the
> nursery using cmm code like in stg_gc_noregs?
>
GHC's operational model is designed in such a way that foreign calls are
fairly cheap (e.g. we don't need to switch stacks, which can be quite
costly). Judging by the assembler produced for newByteArray# in one
random x86-64 tree that I have lying around, the overhead around the
call itself is only a couple of data-movement instructions, an %eax
clear, and a stack pop:


      36:       48 89 ce                mov    %rcx,%rsi
      39:       48 89 c7                mov    %rax,%rdi
      3c:       31 c0                   xor    %eax,%eax
      3e:       e8 00 00 00 00          call   43 <stg_newByteArrayzh+0x43>
      43:       48 83 c4 08             add    $0x8,%rsp

The data movement operations in particular are quite cheap on most
microarchitectures where GHC would run, due to register renaming. I doubt
that this overhead would be noticeable in anything but a synthetic
benchmark. However, it never hurts to measure.
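
For anyone who does want to measure, here is a rough sketch of a
synthetic benchmark, not a rigorous one. The names allocOnce and
timeAllocs, the sizes, and the iteration count are all made up for
illustration. It times the whole newByteArray# primop (heap check,
ccall, header setup, plus loop and GC overhead), so it only gives an
upper bound on what the ccall could be costing:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Control.Monad (forM_, replicateM_)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import GHC.Exts (Int (I#), newByteArray#)
import GHC.IO (IO (IO))

-- Hypothetical helper: allocate one unpinned byte array of n bytes
-- via the primop and drop it immediately.
allocOnce :: Int -> IO ()
allocOnce (I# n) = IO $ \s ->
  case newByteArray# n s of
    (# s', _ #) -> (# s', () #)

-- Hypothetical helper: run `iters` allocations of `size` bytes and
-- return the elapsed wall-clock time in nanoseconds.
timeAllocs :: Int -> Int -> IO Word64
timeAllocs iters size = do
  start <- getMonotonicTimeNSec
  replicateM_ iters (allocOnce size)
  end <- getMonotonicTimeNSec
  pure (end - start)

main :: IO ()
main = do
  let iters = 10000000 :: Int  -- arbitrary; adjust to taste
  forM_ [8, 64, 512] $ \size -> do
    ns <- timeAllocs iters size
    putStrLn $ show size ++ " bytes: "
      ++ show (fromIntegral ns / fromIntegral iters :: Double)
      ++ " ns/alloc (includes loop and GC overhead)"
```

Build with -O2 and compare against a patched RTS to see whether a
Cmm fast path makes any visible difference.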

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[email protected]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs