[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-10 Thread Tim Peters
[Inada Naoki]
>> So I tried to use LIKELY/UNLIKELY macro to teach compiler hot part.
>> But I need to use "static inline" for pymalloc_alloc and pymalloc_free yet [1].

[Neil Schemenauer]
> I think LIKELY/UNLIKELY is not helpful if you compile with LTO/PGO enabled.

I like adding those regardl

[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-10 Thread Inada Naoki
> Mean +- std dev: [python-master] 199 ms +- 1 ms -> [python] 182 ms +- 4 ms: 1.10x faster (-9%)
...
> I will try to split pymalloc_alloc and pymalloc_free to smaller functions.

I did it and pymalloc is now as fast as mimalloc.

$ ./python bm_spectral_norm.py --compare-to=./python-master
python-

[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-10 Thread Inada Naoki
On Wed, Jul 10, 2019 at 5:18 PM Neil Schemenauer wrote:
>
> On 2019-07-09, Inada Naoki wrote:
> > PyObject_Malloc inlines pymalloc_alloc, and PyObject_Free inlines
> > pymalloc_free.
> > But compiler doesn't know which is the hot part in pymalloc_alloc and
> > pymalloc_free.
>
> Hello Inada,
>
>

[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-10 Thread Neil Schemenauer
On 2019-07-09, Inada Naoki wrote:
> PyObject_Malloc inlines pymalloc_alloc, and PyObject_Free inlines
> pymalloc_free.
> But compiler doesn't know which is the hot part in pymalloc_alloc and
> pymalloc_free.

Hello Inada,

I don't see this on my PC. I'm using GCC 8.3.0. I have configured the bui

[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-09 Thread Neil Schemenauer
On 2019-07-09, Inada Naoki wrote:
> So I tried to use LIKELY/UNLIKELY macro to teach compiler hot part.
> But I need to use "static inline" for pymalloc_alloc and pymalloc_free yet [1].

I think LIKELY/UNLIKELY is not helpful if you compile with LTO/PGO enabled. So, I would try that first. Also

[Python-Dev] Re: Optimizing pymalloc (was obmalloc

2019-07-09 Thread Tim Peters
[Inada Naoki, looking into why mimalloc did so much better on spectral_norm]
> I compared "perf" output of mimalloc and pymalloc, and I succeeded to
> optimize pymalloc!
>
> $ ./python bm_spectral_norm.py --compare-to ./python-master
> python-master: . 199 ms +- 1 ms
> python:
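
For readers unfamiliar with the technique debated in this thread, the following is a minimal sketch of LIKELY/UNLIKELY branch hints combined with a hot/cold split into smaller functions. It is not CPython's obmalloc code: the toy free-list allocator and every `toy_*` / `TOY_*` name are hypothetical, and the macros assume the GCC/Clang builtin `__builtin_expect` (they fall back to no-ops on other compilers).

```c
#include <stddef.h>

/* Branch-hint macros in the spirit of the LIKELY/UNLIKELY discussed above.
   Assumption: __builtin_expect is available (GCC/Clang); otherwise no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

#define TOY_BLOCK_SIZE 32   /* hypothetical fixed block size */
#define TOY_NBLOCKS    8    /* hypothetical arena capacity */

/* A free block stores the link to the next free block in its own memory. */
typedef union toy_block {
    union toy_block *next;
    unsigned char payload[TOY_BLOCK_SIZE];
} toy_block;

static toy_block toy_storage[TOY_NBLOCKS];
static toy_block *toy_freelist = NULL;
static size_t toy_next_unused = 0;
static size_t toy_slow_calls = 0;   /* counts trips through the cold path */

/* Cold path: carve a never-used block out of the static arena.
   It lives in its own function so the hot path stays small. */
static void *toy_alloc_slow(void)
{
    toy_slow_calls++;
    if (toy_next_unused >= TOY_NBLOCKS) {
        return NULL;                /* arena exhausted */
    }
    return &toy_storage[toy_next_unused++];
}

/* Hot path: pop a recycled block from the free list when one exists. */
static inline void *toy_alloc(void)
{
    toy_block *b = toy_freelist;
    if (LIKELY(b != NULL)) {
        toy_freelist = b->next;
        return b;
    }
    return toy_alloc_slow();
}

/* Push a block back onto the free list; freeing NULL is a no-op. */
static inline void toy_free(void *p)
{
    toy_block *b = p;
    if (UNLIKELY(b == NULL)) {
        return;
    }
    b->next = toy_freelist;
    toy_freelist = b;
}
```

Whether such hints help in practice depends on the build: as noted in the thread, a compiler with profile data from PGO may already know which branch is hot.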