Re: [Collections] Suppliers, Iterables, and Producers
Gary and Alex,

Any thoughts on this?

Claude

On Wed, May 1, 2024 at 7:55 AM Claude Warren wrote:

> Good suggestions.
>
>> short-circuit. We could make this distinction by including it in the name:
>> forEachUntil(Predicate ...), forEachUnless, ...
>
> We need the unit name in the method name. All Bloom filters implement
> IndexProducer and BitmapProducer and since they use Predicate method
> parameters they will conflict.
>
> I have opened a ticket [1] with the list of tasks, which I think is now:
>
> - Be clear that producers are like interruptible iterators with
>   predicate tests acting as a switch to short-circuit the iteration.
> - Rename classes:
>   - CellConsumer to CellPredicate (?)
>   - Rename BitMap to BitMaps.
> - Rename methods:
>   - Producer forEachX() to forEachUntil()
> - The semantic nomenclature:
>   - Bitmaps are arrays of bits not a BitMaps object.
>   - Indexes are ints and not an instance of a Collection object.
>   - Cells are pairs of ints representing an index and a value. They
>     are not Pair<> objects.
>   - Producers iterate over collections of the object (Bitmap, Index,
>     Cell) applying a predicate to do work and stop the iteration early if
>     necessary. They are carriers/transporters of Bloom filter enabled bits.
>     They allow us to query the contents of the Bloom filter in an
>     implementation agnostic way.
>
> In thinking about the term Producer, other terms could be used
> Interrogator (sounds like you can add a query), Extractor might work. But
> it has also come to mind that there is a "compute" series of methods in the
> ConcurrentMap class. Perhaps the term we want is not "forEach", but
> "process". The current form of usage is something like:
>
> IndexProducer ip =
> ip.forEachIndex(idx -> someIntPredicate)
>
> We could change the name from XProducer to XProcessor, or XExtractor; and
> the method to processXs. So the above code would look like:
>
> IndexExtractor ix =
> ix.processIndexs(idx -> someIntPredicate)
>
> another example
>
> BitMapExtractor bx = .
> bx.processBitMaps(bitmap -> someBitMapPredicate)
>
> Claude
>
> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854
>
> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory wrote:
>
>> On 2024/04/30 14:33:47 Alex Herbert wrote:
>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory wrote:
>> >
>> > > Hi Claude,
>> > >
>> > > Thank you for the detailed reply :-) A few comments below.
>> > >
>> > > On 2024/04/30 06:29:38 Claude Warren wrote:
>> > > > I will see if I can clarify the javadocs and make things clearer.
>> > > >
>> > > > What I think I specifically heard is:
>> > > >
>> > > > - Be clear that producers are fast fail iterators with predicate
>> > > >   tests.
>> > > > - Rename CellConsumer to CellPredicate (?)
>> > >
>> > > Agreed (as suggested by Albert)
>> > >
>> > > > - The semantic nomenclature:
>> > > >   - Bitmaps are arrays of bits not a BitMap object.
>> > > >   - Indexes are ints and not an instance of a Collection object.
>> > > >   - Cells are pairs of ints representing an index and a value. They
>> > > >     are not Pair<> objects.
>> > > >   - Producers iterate over collections of the object (Bitmap, Index,
>> > > >     Cell) applying a predicate to do work and stop the iteration early
>> > > >     if necessary. They are carriers/transporters of Bloom filter
>> > > >     enabled bits. They allow us to query the contents of the Bloom
>> > > >     filter in an implementation agnostic way.
>> > >
>> > > As you say naming is hard. The above is a great example and a good
>> > > exercise I've gone through at work and in other FOSS projects: "Producers
>> > > iterate over collections of the object...". In general when I see or write
>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run", you
>> > > get the idea ;-) I know that either the class (or method) name is bad or
>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the sense that
>> > > it's confusing (to me).
>> > >
>> > > I am not advocating for a specific change ATM but I want to discuss the
>> > > option because it is possible the current name is not as good as it could
>> > > be. It could end up as an acceptable compromise if we cannot use more Java
>> > > friendly terms though.
>> > >
>> > > Whenever I see a class that implements a "forEach"-kind of method, I think
>> > > "Iterable".
>> >
>> > Here we should think "Collection", or generally more than 1. In the Java
>> > sense an Iterable is something you can walk through to the
>> > end, possibly removing elements as you go using the Iterator interface. We
>> > would not require supporting removal, and we want to control a
>> > short-circuit. We could make this distinction by including it in the name:
>> > forEachUntil(Predicate ...), forEachUnle
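
For readers following the naming discussion, here is a minimal sketch of the short-circuit behaviour being described: a producer hands each index to a predicate and stops as soon as the predicate returns false. IndexSource, of, and takeBelow are hypothetical names invented for this illustration; they are not the Commons Collections API, and the return-value convention shown is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class ShortCircuitSketch {

    /** Hypothetical stand-in for an index producer; not the Commons Collections interface. */
    @FunctionalInterface
    interface IndexSource {
        /**
         * Passes each index to the predicate and stops at the first false result.
         * Returns true if every index was accepted, false if iteration was cut short.
         */
        boolean forEachIndex(IntPredicate predicate);

        /** Builds a source over a fixed set of indices (illustration only). */
        static IndexSource of(int... indices) {
            return predicate -> {
                for (int idx : indices) {
                    if (!predicate.test(idx)) {
                        return false; // the predicate acts as the stop switch
                    }
                }
                return true;
            };
        }
    }

    /** Collects indices until one is greater than or equal to limit. */
    static List<Integer> takeBelow(IndexSource source, int limit) {
        List<Integer> seen = new ArrayList<>();
        source.forEachIndex(idx -> {
            if (idx >= limit) {
                return false; // stop the iteration early
            }
            seen.add(idx); // do the work for this index
            return true;   // keep going
        });
        return seen;
    }

    public static void main(String[] args) {
        IndexSource source = IndexSource.of(1, 5, 9, 12, 3);
        // 12 fails the test, so 3 is never visited.
        System.out.println(takeBelow(source, 10));
    }
}
```

If the index and bitmap variants shared a single method name, and assuming they take IntPredicate and LongPredicate respectively, a call with an implicitly typed lambda such as x -> x > 0 would be ambiguous to the compiler. That is one reading of the "conflict" mentioned in the thread, and of why the unit name stays in the method name.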
Re: [Collections] Suppliers, Iterables, and Producers
LGTM.

Maybe the current PR (LGTM) should be merged first, Alex, how does that PR look to you?

Gary

On Fri, May 3, 2024, 11:44 AM Claude Warren wrote:

> Gary and Alex,
>
> Any thoughts on this?
>
> Claude
Re: Modularization of components
Apache Commons VFS is already broken up into a multi-module project, so I don't know what you're talking about; see https://search.maven.org/search?q=g:org.apache.commons%20AND%20a:commons-vfs2* The next release will be further modularized; see git master,

It's a multi-module project, sure, but the modules are split along technical boundaries rather than functional. I didn't explain this well enough in my original message, so let me try that again.

I thought VFS was an appropriate example because it contains *a lot* of functionality. This is by design, of course, and it's a useful thing. But most people who use VFS don't use all of the file system types (called Providers in VFS). There's FTP, SFTP, HTTP, and a bunch of others. My hypothetical suggestion was that if each of those providers were their own module, the dependency footprint would go down for many projects which use some but not all VFS Providers. IMO this would be a good thing for a variety of reasons.

I don't know whether VFS is an appropriate example from a technical/feasibility perspective, and sure, backwards compatibility is a concern. But this was intended as an example to start a discussion about modularization within commons.

(1) It's painful to build Apache Commons releases with Maven multi-module projects. It's NOT just building a jar file or set of jars. In comparison, building a mono-module is "simple".

Is this a fundamental maven issue which is hard to solve? I haven't had too many issues with multi-module maven projects in the past, but I admit that my builds are a lot less complex than commons projects.

(2) Always, always, always keep compatibility in mind

How is this related? Any set of functionalities should be amenable to a modular design, unless there are cyclic dependencies (that signal bad design).

I imagine that some (many?) projects aren't designed with modularization or pluginification in mind, and they end up doing something like Providers.register(FTP.class, HTTP.class, SFTP.class) to register all known implementations. Inverting that relationship isn't always easy to do after the fact. So I understand that this isn't necessarily a quick and easy project.

Supporting JPMS is orthogonal to a modular (Maven) project (see [RNG], for example).

True. I think in the long term both are desirable. One to reduce size & dependencies & build times; the other to better isolate components & implementation details. But if I had to choose one or the other, maven modularization would certainly be first on the list.

In summary, IMO modularization should be a feature (and a default goal) of any new major release. I know that it is a lot of work (of course, cf. [Math] history) , but we should encourage contributions towards that goal.

Thanks for the +1 on that, Gilles. I'm certainly not expecting any overnight changes on this. My goal was merely to start a discussion and see whether there's any interest for this in the community.

Commons components are used incredibly widely. Which is obviously a great thing. But I see WARs getting fatter and fatter with transitive dependencies, and lots of classes remaining unused at runtime. In the age of continuous deployments and fast container startup times, making it easier to keep things slim seems like a useful goal.

Best,
Elric
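
To make the "inverting that relationship" point concrete, here is a rough sketch of one common way to do it in Java: the core module defines only a provider contract and discovers implementations with java.util.ServiceLoader, so it never has to name FTP, HTTP, or SFTP classes itself. FileProvider, ProviderRegistry, and their methods are hypothetical names made up for this illustration; this is not how VFS registers its providers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

public class ProviderDiscoverySketch {

    /** Hypothetical provider contract owned by the core module. */
    public interface FileProvider {
        /** URI scheme this provider handles, for example "ftp". */
        String scheme();
    }

    /** Registry in the core module: it knows the contract, not the implementations. */
    static final class ProviderRegistry {
        private final Map<String, FileProvider> byScheme = new LinkedHashMap<>();

        /** Registers every provider declared in a META-INF/services entry on the classpath. */
        static ProviderRegistry discover() {
            ProviderRegistry registry = new ProviderRegistry();
            for (FileProvider provider : ServiceLoader.load(FileProvider.class)) {
                registry.register(provider);
            }
            return registry;
        }

        /** Explicit registration stays available for callers that need it. */
        void register(FileProvider provider) {
            byScheme.put(provider.scheme(), provider);
        }

        Optional<FileProvider> find(String scheme) {
            return Optional.ofNullable(byScheme.get(scheme));
        }
    }

    public static void main(String[] args) {
        ProviderRegistry registry = ProviderRegistry.discover();
        // This sketch ships no service files, so discovery finds nothing here;
        // a provider is registered by hand to show the lookup.
        registry.register(() -> "ftp");
        System.out.println("ftp available: " + registry.find("ftp").isPresent());
        System.out.println("sftp available: " + registry.find("sftp").isPresent());
    }
}
```

Under this arrangement each provider jar would carry its own service declaration, so a project that only needs FTP depends on the core module plus the FTP module and nothing else.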