[Numpy-discussion] Shape detection
Hello,

I am not sure that this is the correct group for my problem, but I hope someone can help me :)

I am trying to analyze a picture of a porous material (https://python.neocast.eu/disc.png). I can calculate the total quantity of pores, but I would also like to calculate the porosity. To do that I need the size of the sample, or the total number of pixels that belong to the sample. How can I do it?

Thanks in advance,
Pavel
[Numpy-discussion] Re: Shape detection
Hi Pawel,

I would recommend scikit-image for these types of analysis. Here's a start:

---
from skimage import io, measure
import numpy as np

# Load the image and segment the pores with a simple intensity threshold
image = io.imread('disc.png')
thresholded = image > 10

# Label connected components in the thresholded image and measure them
labels = measure.label(thresholded)
regions = measure.regionprops(labels)

# Keep only the small regions; larger ones are background
regions_small = [r for r in regions if r.area < 10]

mu = np.mean([r.area for r in regions_small])
M = np.max([r.area for r in regions_small])
print(f"Mean area: {mu}")
print(f"Max area: {M}")
---

That's a pretty crude way of rejecting the background areas, and can be improved in various ways.

Feel free to also post to the scikit-image user forum at https://forum.image.sc/tag/scikit-image

Best regards,
Stéfan

On Sat, Jan 22, 2022, at 10:45, pawel.dar...@gmail.com wrote:
> Hello,
>
> I am not sure that this is the correct group for my problem, but I hope
> someone can help me :)
>
> I am trying to analyze a picture of a porous material
> (https://python.neocast.eu/disc.png). I can calculate the total quantity
> of pores, but I would also like to calculate the porosity. To do that I
> need the size of the sample, or the total number of pixels that belong to
> the sample. How can I do it?
>
> Thanks in advance,
> Pavel
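To get from there to the porosity itself, one option is to treat the filled outline of the disc as the sample and divide the pore pixels by the sample pixels. Below is a minimal sketch along those lines; the threshold of 10 and the assumption that bright pixels are solid material (and that the image is single-channel) come from the snippet above, not from inspecting the actual image, so adjust as needed:

---
import numpy as np
from skimage import io
from scipy import ndimage

# Assumed segmentation: bright pixels are solid, dark pixels are pores or
# background (threshold and single-channel image are assumptions)
image = io.imread('disc.png')
solid = image > 10

# Fill internal holes so the mask covers the whole sample (disc) area
sample = ndimage.binary_fill_holes(solid)

# Pores are dark pixels that lie inside the filled sample
pores = sample & ~solid

porosity = pores.sum() / sample.sum()
print(f"Sample pixels: {sample.sum()}")
print(f"Pore pixels:   {pores.sum()}")
print(f"Porosity:      {porosity:.3f}")
---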
[Numpy-discussion] Re: Proposal for new function to determine if a float contains an integer
Have any of the numpy devs weighed in on this? If an efficient version of this were available in numpy, there is a lot of pandas code I would enjoy ripping out.

On Sun, Jan 2, 2022 at 11:16 AM Joseph Fox-Rabinovitz <jfoxrabinov...@gmail.com> wrote:

> Is there a guide on how to package non-ufunc functions with multiple
> loops? Something like sort? It looks like there is no way of adding
> additional arguments to a ufunc as of yet.
>
> On a related note, would it be more useful to have a function that returns
> the number of bits required to store a number, or -1 if it has a fractional
> part? Then you could just test something like ``(k := integer_bits(a)) < 64
> & k > 0``.
>
> - Joe
>
> On Sat, Jan 1, 2022 at 5:55 AM Joseph Fox-Rabinovitz <jfoxrabinov...@gmail.com> wrote:
>
>> Stefano,
>>
>> That is an excellent point. Just to make sure I understand, would an
>> interface like `is_integer(a, int_dtype=None)` be satisfactory? That way,
>> there are no bounds by default (call it Python integer bounds), but the
>> user can specify a limited type at will. An alternative would be something
>> like `is_integer(a, bits=None, unsigned=False)`. This would have the
>> advantage of testing against hypothetical types, which might be useful
>> sometimes, or just annoying. I could always allow a two-element tuple as
>> an argument to the first version.
>>
>> While I completely agree with the idea behind adding this test, one big
>> question remains: can I add arbitrary arguments to a ufunc?
>>
>> - Joe
>>
>> On Sat, Jan 1, 2022 at 5:41 AM Stefano Miccoli wrote:
>>
>>> I would rather suggest an .is_integer(integer_dtype) signature, because
>>> knowing that 1e300 is an integer is not very useful in the numpy world,
>>> since this integer number is not representable as a numpy integer dtype.
>>>
>>> Note that in python
>>>
>>> assert not f.is_integer() or int(f) == f
>>>
>>> never fails, because integers have unlimited precision, but this would
>>> not map onto
>>>
>>> assert ( ~f_arr.is_integer() | (np.int64(f_arr) == f_arr) ).all()
>>>
>>> because of possible OverflowErrors.
>>>
>>> Stefano
>>>
>>> On 31 Dec 2021, at 04:46, numpy-discussion-requ...@python.org wrote:
>>>
>>> Is adding arbitrary optional parameters a thing with ufuncs? I could
>>> easily add upper and lower bounds checks.
>>>
>>> On Thu, Dec 30, 2021, 20:56 Brock Mendel wrote:
>>>
>>>> At least some of the commenters on that StackOverflow page need a
>>>> slightly stronger check: not only is_integer(x), but also
>>>> "np.iinfo(dtype).min <= x <= np.iinfo(dtype).max" for some particular
>>>> dtype, i.e. "Can I losslessly set these values into the array I
>>>> already have?"
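For reference, a rough version of the check being discussed can already be assembled from existing NumPy primitives. This is only a plain-Python sketch of the `is_integer(a, int_dtype=None)` interface floated above, not the proposed ufunc; the dtype-bounds handling follows Stefano's and Brock's comments:

---
import numpy as np

def is_integer(a, int_dtype=None):
    """Elementwise: is each float value a whole number?

    If int_dtype is given, additionally require that the value fits in
    that dtype's range, so a cast would be lossless.  This is a sketch
    using existing NumPy calls, not the ufunc proposed in this thread.
    """
    a = np.asarray(a, dtype=np.float64)
    whole = np.isfinite(a) & (np.floor(a) == a)
    if int_dtype is None:
        return whole
    info = np.iinfo(int_dtype)
    # Note: very close to the dtype bounds, float rounding can still let
    # borderline values through; a real implementation would need more care.
    return whole & (a >= info.min) & (a <= info.max)

x = np.array([1.0, 2.5, 1e300, np.nan, -3.0])
print(is_integer(x))            # [ True False  True False  True]
print(is_integer(x, np.int64))  # [ True False False False  True]
---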
[Numpy-discussion] Upcoming: 2nd edition of "Machine learning with scikit-learn MOOC"
Hi everyone,

The team at Inria, with the help of the Inria learning lab, will soon be opening the 2nd edition of the "Machine Learning with scikit-learn" MOOC:
https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/

The content of the MOOC is visible here (we are still polishing details, this is not final):
https://inria.github.io/scikit-learn-mooc/

As you can see, it touches on all the basics of machine learning, introduced with scikit-learn, and teaches much more than the API of the library. We have put a lot of effort into being didactic.

Anna Kondratenko, one of last year's participants, said of last year's edition:

"I did a #ScikitLearnMooc course as part of a #100DaysOfCode challenge and I just loved it. Scikit-learn creators managed to make it practice-focused and entertaining at the same time. Also, it is perfect for beginners since it starts from the basics going to more advanced level."
https://twitter.com/anacoding/status/1484949583629369344

This year's edition should be significantly more didactic!

One of the values of participating in the MOOC, compared to just reading the material that we provide on the web, is that it is full of coding exercises that are meant to teach an understanding of machine learning as well as coding skills.

The MOOC is absolutely free, and all the materials are open (in the spirit of scikit-learn).

While people on this list may already know the contents of this MOOC (though we have inserted many useful reflections), you might know people who could benefit from this course to learn machine learning. Please help us spread the word.

Pythonly yours,

Gaël

--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux
[Numpy-discussion] Re: Performance mystery
On 1/20/22, Francesc Alted wrote:
> On Wed, Jan 19, 2022 at 7:48 PM Francesc Alted wrote:
>
>> On Wed, Jan 19, 2022 at 6:58 PM Stanley Seibert wrote:
>>
>>> Given that this seems to be Linux only, is this related to how glibc
>>> does large allocations (>128kB) using mmap()?
>>>
>>> https://stackoverflow.com/a/33131385
>>>
>>
>> That's a good point. As MMAP_THRESHOLD is 128 KB, and the size of `z` is
>> almost 4 MB, the mmap machinery is probably getting involved here. Also,
>> as pages acquired via anonymous mmap are not actually allocated until you
>> access them the first time, that would explain why the first access is
>> slow. What puzzles me is that the timeit loops access the `z` data 3*1
>> times, which is plenty of time for doing the allocation (it should
>> require just a single iteration).
>>
>
> I think I have more evidence that what is happening here has to do with how
> the malloc mechanism works in Linux. I find the next explanation to be
> really good:
>
> https://sourceware.org/glibc/wiki/MallocInternals
>
> In addition, this excerpt of the mallopt manpage
> (https://man7.org/linux/man-pages/man3/mallopt.3.html) is very significant:
>
>     Note: Nowadays, glibc uses a dynamic mmap threshold by
>     default. The initial value of the threshold is 128*1024,
>     but when blocks larger than the current threshold and less
>     than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the
>     threshold is adjusted upward to the size of the freed
>     block. When dynamic mmap thresholding is in effect, the
>     threshold for trimming the heap is also dynamically
>     adjusted to be twice the dynamic mmap threshold. Dynamic
>     adjustment of the mmap threshold is disabled if any of the
>     M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or
>     M_MMAP_MAX parameters is set.
>
> This description matches closely what is happening here: after `z` is freed
> (replaced by another random array in the second part of the calculation),
> the dynamic mmap threshold kicks in and the threshold is increased to 2x
> the size of the freed block (~4 MB in this case). So, for the second part,
> the program break (i.e. where the heap ends) is increased instead, which is
> faster because this memory does not need to be zeroed before use.
>
> Interestingly, the M_MMAP_THRESHOLD for the system malloc can be set with
> the MALLOC_MMAP_THRESHOLD_ environment variable. For example, the original
> times are:
>
> $ python mmap-numpy.py
> numpy version 1.20.3
>
> 635.4752 microseconds
> 635.8906 microseconds
> 636.0661 microseconds
>
> 144.7238 microseconds
> 143.9147 microseconds
> 144.0621 microseconds
>
> but if we force mmap to always be used:
>
> $ MALLOC_MMAP_THRESHOLD_=0 python mmap-numpy.py
> numpy version 1.20.3
>
> 628.8890 microseconds
> 628.0965 microseconds
> 628.7590 microseconds
>
> 640.9369 microseconds
> 641.5104 microseconds
> 642.4027 microseconds
>
> so the first and second parts execute at the same (slow) speed. And, if we
> set the threshold to be exactly 4 MB:
>
> $ MALLOC_MMAP_THRESHOLD_=4194304 python mmap-numpy.py
> numpy version 1.20.3
>
> 630.7381 microseconds
> 631.3634 microseconds
> 632.2200 microseconds
>
> 382.6925 microseconds
> 380.1790 microseconds
> 380.0340 microseconds
>
> we see how performance is improved for the second part (although not as
> much as without specifying the threshold manually; this manual setting
> probably prevents other optimizations from kicking in).
>
> As a final check, if we use other malloc systems, like the excellent
> mimalloc (https://github.com/microsoft/mimalloc), we can get really good
> performance for both parts:
>
> $ LD_PRELOAD=/usr/local/lib/libmimalloc.so python mmap-numpy.py
> numpy version 1.20.3
>
> 147.5968 microseconds
> 146.9028 microseconds
> 147.1794 microseconds
>
> 148.0905 microseconds
> 147.7667 microseconds
> 147.5180 microseconds
>
> However, as this avoids the mmap() calls, this approach probably uses more
> memory, especially when large arrays need to be handled.
>
> All in all, this is a testament to how much memory handling can affect
> performance on modern computers. Perhaps it is time to test different
> memory allocation strategies in NumPy and come up with suggestions for
> users.
>
> Francesc
>
>>
>>> On Wed, Jan 19, 2022 at 9:06 AM Sebastian Berg <sebast...@sipsolutions.net> wrote:
>>>
>>>> On Wed, 2022-01-19 at 11:49 +0100, Francesc Alted wrote:
>>>> > On Wed, Jan 19, 2022 at 7:33 AM Stefan van der Walt wrote:
>>>> >
>>>> > > On Tue, Jan 18, 2022, at 21:55, Warren Weckesser wrote:
>>>> > > > expr = 'z.real**2 + z.imag**2'
>>>> > > >
>>>> > > > z = generate_sample(n, rng)
>>>> > >
>>>> > > 🤔 If I duplicate the `z = ...` line, I get the fast result
>>>> > > throughout.
>>>> > > If, however, I use `generate_sample(1, rng)` (or any ot
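The mmap-numpy.py script itself is not included in this excerpt. The sketch below is only a rough reconstruction of that kind of benchmark, based on the quoted snippets (`z.real**2 + z.imag**2` and `generate_sample(n, rng)`); the sample generator, array size, and timing layout are all assumptions:

---
# Rough reconstruction of the benchmark discussed above; mmap-numpy.py is not
# shown in this thread, so generate_sample, n, and the timing loop are guesses.
import timeit
import numpy as np

def generate_sample(n, rng):
    # Complex samples with normally distributed real and imaginary parts
    return rng.normal(size=n) + 1j * rng.normal(size=n)

print("numpy version", np.__version__)

rng = np.random.default_rng()
n = 250_000                      # ~4 MB of complex128 data, as mentioned above
expr = 'z.real**2 + z.imag**2'

# First part: time the expression on a freshly allocated z
z = generate_sample(n, rng)
for _ in range(3):
    t = timeit.timeit(expr, globals=globals(), number=100) / 100
    print(f"{t * 1e6:9.4f} microseconds")
print()

# Second part: replace z with another random array and time again
z = generate_sample(n, rng)
for _ in range(3):
    t = timeit.timeit(expr, globals=globals(), number=100) / 100
    print(f"{t * 1e6:9.4f} microseconds")
---

Running such a script plainly, with MALLOC_MMAP_THRESHOLD_ set, or under LD_PRELOAD with an alternative allocator, as in the quoted messages, is how the different allocator behaviours were compared.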