Re: [commons-io] question: file content merge sort and binary search
Hi

Sure, I will support my code.
I have a lot of other opensource projects, not so much free time.
But this code will have the highest priority as Commons is used by
thousands of developers.
My other projects are used by hundreds of people.

On Wed, Jul 19, 2023 at 8:28 PM Gilles Sadowski wrote:

> Hi.
>
> On Tue, Jul 18, 2023 at 19:06, ssz wrote:
> >
> > [...]
> >
> > We use this library as a second-level cache when parsing CIMXML RDF, this
> > file-based cache contains triples, and also subject-type pairs (RDF nodes).
> > It is not csv.
> > Also, I'm thinking about RDF-Graph implementation backed by fs.
>
> This is where the discussion, about whether "Commons" is the
> right place, could start because...
>
> > So, I think we can always find ways to use this functionality.
> > Placing it in some common place would save other developers time.
>
> ... placing it here implies that there will be people willing to stay
> around and maintain ...
>
> > Implementation of file-sorting and searching is not so simple as it sounds.
>
> ... this "not so simple" functionality.
>
> That's why we ask for use-cases: People who have a direct
> interest in maintaining the functionality are more likely to help
> fix it when the need arises.
> IOW, I'd expect the contributors of a major functionality of
> which they are the only known users to stay around in order
> to support it.
>
> Regards,
> Gilles
>
> > [...]
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
Re: [commons-io] question: file content merge sort and binary search
Hi.

[Disclaimer: I'm neither a user nor a developer of "Commons IO", so
I'm not the most suitable person for entertaining this conversation and,
surely, I shouldn't be the only one...]

On Thu, Jul 20, 2023 at 10:33, ssz wrote:
>
> Hi
> Sure, I will support my code.
> I have a lot of other opensource projects, not so much free time.

I have to point out that the two sentences seem to neutralize
each other...

> But this code will have the highest priority as Commons is used by
> thousands of developers.

That's what I've heard, but I have not seen much proof: We have no
reliable way to know where "Commons" code is used. [This was a
feature of open-source, but new regulations might make it a liability...]
More importantly, if true, only a very tiny fraction of those users share
their experience here, so that a quite small number of "regular"
developers end up deciding what is useful. Almost inevitably, the
selection is biased...

> My other projects are used by hundreds of people.

That's great, but it would not convince (based on the lack of feedback)
a committer here who is not among those users.

The general problem is:
 1. The active team is not getting bigger.
 2. Those "regular" developers find they have already too much to
    handle.
 3. Hence they tend to not easily accept contributions that are (or
    seem) likely to require time which they don't have.
 4. This puts off would-be contributors that could have become part
    of the active team.
 1. The active team is not getting bigger...

So I'm trying to find other arguments...
Which projects (ASF?) depend on your proposed contribution?

Regards,
Gilles

>>> [...]
Re: [commons-io] question: file content merge sort and binary search
Note that Apache Ant already provides similar functionality:
https://ant.apache.org/manual/api/org/apache/tools/ant/filters/SortFilter.html

Gary

On Thu, Jul 20, 2023, 07:38 Gilles Sadowski wrote:

> [...]
> So I'm trying to find other arguments...
> Which projects (ASF?) depend on your proposed contribution?
> [...]
Re: [commons-io] question: file content merge sort and binary search
That's great!
- But Ant is quite an ancient system, and it is now relatively unknown.
- And it is relatively heavy. Maybe it's better to have a single function in
  a dedicated library, or in a well-known library with other useful features.
- It uses in-memory sorting:
  https://github.com/apache/ant/blob/master/src/main/org/apache/tools/ant/filters/SortFilter.java#L352
- What about binary search?

On Thu, Jul 20, 2023 at 2:56 PM Gary Gregory wrote:

> Note that Apache Ant already provides similar functionality:
> https://ant.apache.org/manual/api/org/apache/tools/ant/filters/SortFilter.html
>
> Gary
>
> [...]
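For readers unfamiliar with the distinction drawn above between in-memory and file-based sorting, here is a minimal sketch of an external merge sort over text lines. It is not the code proposed in this thread, nor anything from Ant or Commons IO; the class name ExternalLineSort, the method names and the maxLinesInMemory parameter are made up for illustration. It assumes UTF-8 text, one record per line, and natural String ordering.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: sorts the lines of a file without loading all of them at once. */
final class ExternalLineSort {

    /** One sorted temporary file ("run") plus the line currently at its head. */
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    static void sort(Path source, Path target, int maxLinesInMemory) throws IOException {
        if (maxLinesInMemory < 1) {
            throw new IllegalArgumentException("maxLinesInMemory must be >= 1");
        }
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: cut the input into chunks that fit in memory, sort each, spill to disk.
            try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
                List<String> chunk = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == maxLinesInMemory) {
                        runs.add(writeRun(chunk));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) {
                    runs.add(writeRun(chunk));
                }
            }
            // Phase 2: k-way merge of the sorted runs.
            merge(runs, target);
        } finally {
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    private static void merge(List<Path> runs, Path target) throws IOException {
        // The heap always exposes the run whose head line is smallest.
        PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path path : runs) {
                BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
                readers.add(reader);
                Run run = new Run(reader);
                if (run.head != null) {
                    heap.add(run);
                }
            }
            while (!heap.isEmpty()) {
                Run smallest = heap.poll();
                out.write(smallest.head);
                out.newLine();
                smallest.head = smallest.reader.readLine();
                if (smallest.head != null) {
                    heap.add(smallest);
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

Memory use is bounded by one chunk during the split phase and by one line per run during the merge. A real implementation would also have to limit the number of simultaneously open runs and let the caller supply a comparator and a charset.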
Re: [commons-io] question: file content merge sort and binary search
For sure, you can sort in memory; no problem there.
I think we need to find a well-known library that offers non-in-memory
sorting and binary searching, preferably a relatively new one (Java 8+).

On Thu, Jul 20, 2023 at 3:08 PM ssz wrote:

> That's great!
> - But ANT is quite an ancient system, and it is now relatively unknown.
> [...]
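To make "binary searching" on a file concrete, here is a minimal sketch of a binary search over the byte offsets of an already sorted text file. It is only an illustration, not the proposed library: the names SortedFileSearch and find are hypothetical, and it assumes ASCII content, lines sorted in String.compareTo order, and that returning the byte offset of a matching line is useful to the caller.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical sketch: looks up a line in a sorted text file without reading it all. */
final class SortedFileSearch {

    /**
     * Returns the byte offset at which a line equal to {@code key} starts,
     * or -1 if there is no such line.
     */
    static long find(Path sortedFile, String key) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(sortedFile.toFile(), "r")) {
            long lo = 0;             // a matching line, if any, starts in [lo, hi)
            long hi = file.length();
            while (lo < hi) {
                long mid = lo + (hi - lo) / 2;
                // Move to the first line start at or after mid. Reading from mid - 1
                // ensures that a line starting exactly at mid is not skipped.
                if (mid == 0) {
                    file.seek(0);
                } else {
                    file.seek(mid - 1);
                    file.readLine();
                }
                long start = file.getFilePointer();
                if (start >= hi) {
                    hi = mid;        // no line starts in [mid, hi)
                    continue;
                }
                String line = file.readLine();
                int cmp = line.compareTo(key);
                if (cmp == 0) {
                    return start;
                }
                if (cmp < 0) {
                    lo = file.getFilePointer(); // continue after this line
                } else {
                    hi = mid;                   // continue before mid
                }
            }
            return -1;
        }
    }
}
```

RandomAccessFile.readLine is unbuffered and maps each byte to one character, which is why the sketch is restricted to ASCII; buffering, other encodings and duplicate keys are the parts that make a production version "not so simple".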
Re: [commons-io] question: file content merge sort and binary search
> Which projects (ASF?) depend on your proposed contribution?

Currently only business projects, not opensource.
I'm thinking about an RDF-Graph backed by the FS. If I implement this solution I
will raise an issue with the Apache Jena team.
The original library will probably support multiple platforms, in which case
the JVM part can use Commons.

+ I'm thinking about more use-cases. Sorting & searching might be useful
for working with logs.
+ I have asked colleagues; maybe we will find more examples.

On Thu, Jul 20, 2023 at 2:37 PM Gilles Sadowski wrote:

> Hi.
>
> [...]
Re: [commons-io] question: file content merge sort and binary search
I already commented elsewhere (can't recall if it was on this list or
GitHub) on why this is not a good fit for IO. Too much like a database
operation, IO is a lower-level library, and so on.

IO is not a kitchen sink for anything related to IO. Like Lang, it was
initially conceived as a library for low-level operations that could be
imagined to be in the JDK. It's actually perfectly fine that the JDK does
not contain such operations, as it should not be a kitchen sink either, but
only provide primitive operations. IO also does not contain CSV operations;
that's in Commons CSV. IO also does not contain high-level operations;
projects like Apache Tika, Lucene, and Solr do that.

This still feels like a component that provides one narrow purpose that
should live in its own project, which it already does (yours), and that also
happens to already exist within Apache in Ant and maybe elsewhere (Tika,
Lucene, Solr, Spark?).

So I think you are going about this backward: Instead of keeping on arguing
to shove your library in IO, I would survey other projects (see above) to
see if common functionality could be extracted and, more importantly, if
these projects would then be interested in relying on a new library
(wherever it may reside) instead of maintaining their own code. It does not
matter if the common code is derived from your library or existing projects
(assuming proper licensing); what matters is improving the Apache ecosystem,
and FOSS in general.

If you are interested in I/O code and this interest matches the Commons IO
component of the Commons project, then great, there are some recent and not
so recent Jira tickets that could use some attention.

Gary

On Thu, Jul 20, 2023, 08:09 ssz wrote:

> That's great!
> - But ANT is quite an ancient system, and it is now relatively unknown.
> [...]
Re: [commons-io] question: file content merge sort and binary search
On Thu, Jul 20, 2023 at 15:18, Gary Gregory wrote:
>
> [...] Instead of keeping on arguing to shove your library in [...]

If we could stop the brutal language... (?)
The OP asked politely, and was ready to wait indefinitely
(unsubscribing from this ML) for an answer; I just wanted to make
sure that there were no missed opportunities (on both ends).

Regards,
Gilles

> [...]
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
Hello.

On Wed, Jul 19, 2023 at 12:59, Dimitrios Efthymiou wrote:
>
> [...]
> 1-- [...]
> 2--As for the atomic refactoring and feature branch, well,
> unless someone moves the Variance class (you said that someone
> is doing it now) and the distance package and whatever other
> dependencies exist within the ml.clustering package,
> there can be no moving of the remaining clustering classes
> to the new clustering module, right?
> 3-- [...]
> 4--I don't know how to continue with the clustering modularisation
> given all those dependencies. Maybe I shouldn't have started this,
> because now I am stuck

You aren't.

The dependencies found in "o.a.c.math4.legacy.ml.clustering" are
 (1) "MatrixUtils" and "RealMatrix" (from the "linear" package)
 (2) "Variance" and "VectorialMean" (from the "stat" package)

(1) creates the coupling for a single method ("getMembershipMatrix")
that isn't called from anywhere (not even the unit tests). Remove the
method and the dependency towards "linear" vanishes.

(2) "Variance" can be replaced with a temporary implementation like
(untested copy/paste from "SecondMoment" and "FirstMoment"):
---CUT---
class Variance {
    private int n = 0;       // number of values seen so far
    private double dev = 0;  // deviation of the latest value from the previous mean
    private double nDev = 0; // dev / n
    private double m2 = 0;   // running sum of squared deviations from the mean
    private double m1 = 0;   // running mean

    void increment(final double d) {
        ++n;
        dev = d - m1;
        nDev = dev / n;
        m1 += nDev;
        m2 += ((double) n - 1) * dev * nDev;
    }

    // Returns the second moment (sum of squared deviations), not yet divided by (n - 1).
    double get() {
        return m2;
    }
}
---CUT---
Then, creating a private copy of class "VectorialMean" (replacing,
in the copy, CM exceptions with JDK ones) would complete the
removal of the dependency towards the "stat" package.

And so on.

Regards,
Gilles

> [...]
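As an illustration of the two temporary replacements suggested above, here is a self-contained sketch. It is not the Commons Math code: the class names RunningVariance and RunningVectorMean are hypothetical, the variance is assumed to be the unbiased sample variance (second moment divided by n - 1), and the only exception type used is the JDK's IllegalArgumentException.

```java
/** Hypothetical stand-in for "Variance": running sample variance, updated one value at a time. */
final class RunningVariance {
    private long n;      // number of values seen so far
    private double mean; // running mean (first moment)
    private double m2;   // running sum of squared deviations from the mean

    void increment(final double x) {
        ++n;
        final double dev = x - mean;
        final double nDev = dev / n;
        mean += nDev;
        m2 += (n - 1) * dev * nDev;
    }

    /** Unbiased sample variance: NaN without data, 0 for a single value. */
    double get() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0d : m2 / (n - 1);
    }
}

/** Hypothetical stand-in for "VectorialMean": component-wise mean of fixed-size vectors. */
final class RunningVectorMean {
    private final double[] mean;
    private long n;

    RunningVectorMean(final int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        mean = new double[dimension];
    }

    void increment(final double[] v) {
        if (v.length != mean.length) {
            // JDK exception instead of a library-specific dimension-mismatch exception.
            throw new IllegalArgumentException(
                "expected dimension " + mean.length + " but got " + v.length);
        }
        ++n;
        for (int i = 0; i < mean.length; i++) {
            mean[i] += (v[i] - mean[i]) / n;
        }
    }

    /** Returns a copy so that callers cannot modify the internal state. */
    double[] getResult() {
        return mean.clone();
    }
}
```

Keeping both classes package-private in the new module would make it easy to swap them later for the refactored "stat" functionality without touching the public API.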
Re: [commons-io] question: file content merge sort and binary search
> I already commented elsewhere on why this is not a good fit for IO

My bad, I did not realise that the discussion was closed with a final
resolution.
I read "what do others think?" and thought that it was about continuing the
discussion.

> Instead of keeping on arguing to shove your library in IO

I thought of this discussion as a discussion; sorry if I am annoying you.
I always think that being open to discussion, looking for solutions and
talking to other people is a good strategy: the probability of finding a
good solution is higher that way.
And sorry, but, honestly, I still don't see any good arguments (or only a few).
The last one, about ANT, seems to me (just my opinion, maybe wrong) not a
very good argument.
I tried to explain why, maybe not so clearly; since you mention ANT
again, my arguments were definitely not clear enough.
Apache ANT is great, but in my opinion it has no such functionality.
Also, if I remember correctly, I tried to explain why Commons CSV is not
what my solution is doing.
If you believe that in-memory sort or CSV parsing is, roughly speaking,
what my solution is doing (or that these solutions can be compared), then we
have different points of view, and there will be no consensus.
And I think this is also my fault.

> I would survey other projects (see above) to see if common functionality
> could be extracted and more importantly if these projects would then be
> interested in relying on a new library (where it may reside) instead of
> maintaining their own code.
> ...
> and maybe elsewhere (Tika, Lucene, Solr, Spark?)

Other Apache libraries - great! I will think about it. Thank you for that
point. It's really great.
I already said above that I am not insisting on Commons IO.
I am just looking for alternatives and want to hear other people's opinions.
Maybe some other Commons or non-Commons library; I am not familiar with
the whole ecosystem.
Maybe someone else could give me a hint.
And you actually did, twice about ANT and several times about Commons
CSV; I really appreciate it.
That was one of the reasons why I wrote here.

The JDK itself supports files and IO operations; it also supports sorting
and binary search.
The proposed functionality is out of the JDK's scope, for sure, but it seemed
to me that it is close to what the JDK offers.
Obviously I was wrong.

I think this discussion can be finished.
If someone else has something to add, please feel free to email me directly.
Again, sorry for disturbing. I didn't want to bother anyone, and didn't
realise that this is what I was doing.
Thanks for taking the time and for trying to explain to me where I'm wrong.
Have a nice day!

One more thing:

> Too much like a database operation, IO is a lower level library, and so on.

One of my colleagues also thinks that Commons IO is not quite a suitable
place for this proposal.
So I'm totally sure that this is really not a suitable place.

On Thu, Jul 20, 2023 at 4:16 PM Gary Gregory wrote:

> I already commented elsewhere (can't recall if it was on this list or
> github) on why this is not a good fit for IO.
> [...]
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
Are you saying that in order to move the ml.clustering classes
to the new clustering module, I can take all the classes they depend on
and their transitive dependencies, copy them to the new clustering module,
but only keep in them the minimum code required for the new module to
operate?

On Thu, 20 Jul 2023 at 15:27, Gilles Sadowski wrote:

> Hello.
>
> [...]
[ANNOUNCE] Apache Commons FileUpload 2.0.0-M1
The Apache Commons FileUpload Parent team is pleased to announce the
release of Apache Commons FileUpload Parent 2.0.0-M1.

The Apache Commons FileUpload component provides a simple yet flexible
means of adding support for multipart file upload functionality to
servlets and web applications. This version requires Java 8 or later
and supports both the jakarta and javax namespaces.

For complete information on Apache Commons FileUpload, including
instructions on how to submit bug reports, patches, or suggestions for
improvement, see the Apache Commons FileUpload Parent website:

https://commons.apache.org/proper/commons-fileupload/

Download it from:

https://commons.apache.org/proper/commons-fileupload/download_fileupload.cgi

Enjoy,
Gary Gregory
Apache Commons, PMC
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
On Thu, Jul 20, 2023 at 21:19, Dimitrios Efthymiou wrote:
>
> are you saying that in order to move the ml.clustering classes
> to the new clustering module, I can take all the dependencies to classes
> and their transitive dependencies, copy them to the new clustering module,
> but only keep in them the minimum required code for the new module to
> operate?

To some extent, yes. But the main point is the refactoring. For example,
you wouldn't copy the code from "linear" after seeing that one can do
without it. But also note in this case that exposing a "double[][]" instead
may not be the best choice for a long-term API. As was mentioned, it
would be worth looking at how other libraries provide similar functionality.
The module should solve all issues mentioned in JIRA; it's not just copying
the classes and removing dependencies.

Gilles

> [...]
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
Unfortunately, I just tried a simple move, but there are dependencies on 3
distance classes and about 12 stats classes, because there are transitive
dependencies. Not to mention the respective test classes.

On Thu, 20 Jul 2023, 22:22 Gilles Sadowski wrote:

> [...]
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
On Thu, 20 Jul 2023 at 23:28, Dimitrios Efthymiou wrote:
>
> Unfortunately, I just tried a simple move, but there are dependencies on 3
> distance classes

But... those classes are only used by the "clustering" package; they
are not external dependencies; they would go into the new module
as first-class citizens.
[A follow-up issue would be whether those distance classes are
worth sharing with the other machine learning utility in the module
"commons-math-neuralnet".]

> and about 12 stats classes,

Which ones?

> because there are transitive
> dependencies. Not to mention the respective test classes.

Well, of course there is work to do to fix all aspects of the move...

Gilles

> [...]
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
I am not home now, but these are the ones I remember from looking at the
GitHub repo:
AbstractStorelessUnivariateStatistic.java
AbstractUnivariateStatistic.java
WeightedEvaluation.java
Sum.java
FirstMoment.java
Mean.java
SecondMoment.java
StandardDeviation.java
Variance.java
VectorialMean.java

On Thu, 20 Jul 2023, 22:40 Gilles Sadowski wrote:

> [...]
Re: [math] JIRA ticket for modularisation of all 14 legacy packages
On Thu, 20 Jul 2023 at 23:46, Dimitrios Efthymiou wrote:
>
> I am not home now, but these are the ones I remember from looking at the
> GitHub repo:
> AbstractStorelessUnivariateStatistic.java
> AbstractUnivariateStatistic.java
> WeightedEvaluation.java
> Sum.java
> FirstMoment.java
> Mean.java
> SecondMoment.java
> StandardDeviation.java
> Variance.java
> VectorialMean.java

Please try what I suggested in my previous message (about 30 lines of
code that could be copied into an "internal" package to get the same
functionality as the last 2 classes above).
Then we can continue discussing (on JIRA) how to get around
roadblocks actually encountered. [You can create a JIRA "sub-task"
for each specific problem.]

Gilles

> [...]
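As a rough idea of what the private copy of "VectorialMean" (with CM exceptions replaced by JDK ones) could look like, here is a minimal sketch. It is not the Commons Math source: the class name "VectorMean", the method names and the choice of "IllegalArgumentException" are all assumptions made for illustration.

```java
/**
 * Hypothetical package-private helper computing a component-wise running mean.
 * Dimension errors are reported with a JDK exception instead of a CM one.
 */
final class VectorMean {
    /** Running mean of each component. */
    private final double[] means;
    /** Number of vectors seen so far. */
    private long n;

    VectorMean(final int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        means = new double[dimension];
    }

    /** Adds one vector; its length must match the configured dimension. */
    void increment(final double[] v) {
        if (v.length != means.length) {
            throw new IllegalArgumentException(
                "expected dimension " + means.length + " but was " + v.length);
        }
        ++n;
        for (int i = 0; i < v.length; i++) {
            // Incremental mean update, one component at a time.
            means[i] += (v[i] - means[i]) / n;
        }
    }

    /** Returns a copy of the current mean vector. */
    double[] getResult() {
        return means.clone();
    }

    /** Returns the number of vectors added so far. */
    long getN() {
        return n;
    }
}
```

Keeping such a class package-private in an "internal" package means it can be dropped later without an API change, once the "stat" functionality is available from a proper module.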
Re: [commons-pool] branch POOL_2_X updated: Add Duration named APIs and deprecate old APIs.
Hi Phil,

Code like the following can be ambiguous to me (unless you know the API
inside and out and use it daily):

somePoolThing.getFooTime()

Some of our methods return a Duration and others an Instant, so there, I
think the type in the method name makes sense. Then, for a bit of symmetry,
it's nice if the setter and getter names are the same (minus the set/get
prefix, obviously).

Gary

On Tue, Jul 18, 2023, 17:38 Phil Steitz wrote:

> I like changing the arguments to be Duration, but that has already been
> done. What I am talking about is the method names, which is a second
> change that I don't think is necessary. For example, unless I am missing
> something, before this commit, we had setTimeBetweenEvictionRuns(Duration)
> and that is being deprecated and changed to
> setDurationBetweenEvictionRuns(Duration). I actually think the first name
> is better. It is natural English and it is not a good practice to put type
> names in method names, IMO. But more importantly, anyone who started using
> this will have to change - in my mind needlessly. Am I misreading the
> diff?
>
> I would move to at least Java 11 for 3.0 and I would not keep JMX, but it
> would be good to ask on the user list if anyone is depending on it / would
> be seriously harmed if it goes away. I think the Tomcat JMX may depend on
> it, but I am not sure what the state of play there is vis a vis JMX.
>
> Phil
>
> On Tue, Jul 18, 2023 at 2:16 PM Gary Gregory wrote:
>
> > This will make it smoother to port to 3.0 where there will be no long time
> > APIs, the Duration type is used throughout (unless Instant is appropriate).
> > I have most of the deprecated methods removed locally and will push in a
> > day or two.
> >
> > What remains:
> > - do we want to keep the JMX code?
> > - should 3.0 use Java 11 or 17?
> >
> > Gary
> >
> > On Tue, Jul 18, 2023, 16:41 Phil Steitz wrote:
> >
> > > Why exactly do we need to s/Time/Duration in all of the method names?
> > > Duration is a measure of time. I don't get why this is necessary and it
> > > will force people to change (eventually). I was +1 to get rid of the
> > > "millis" in the names, but this change seems needless to me. Also, there
> > > are still quite a few places where the text of the javadoc refers to
> > > milliseconds. Was this discussed before and I missed it?
> > >
> > > Phil
> > >
> > > On Tue, Jul 18, 2023 at 7:25 AM wrote:
> > >
> > > > This is an automated email from the ASF dual-hosted git repository.
> > > >
> > > > ggregory pushed a commit to branch POOL_2_X
> > > > in repository https://gitbox.apache.org/repos/asf/commons-pool.git
> > > >
> > > >
> > > > The following commit(s) were added to refs/heads/POOL_2_X by this push:
> > > >      new 9d2f4af1 Add Duration named APIs and deprecate old APIs.
> > > > 9d2f4af1 is described below
> > > >
> > > > commit 9d2f4af14dde121271c1bb862d4b1f236072eb2a
> > > > Author: Gary Gregory
> > > > AuthorDate: Tue Jul 18 10:25:03 2023 -0400
> > > >
> > > >     Add Duration named APIs and deprecate old APIs.
> > > >
> > > >     Eases migration to 3.0.0
> > > > ---
> > > >  .../commons/pool2/impl/BaseGenericObjectPool.java  | 102 +
> > > >  .../commons/pool2/impl/BaseObjectPoolConfig.java   |  32 ++-
> > > >  .../commons/pool2/impl/GenericKeyedObjectPool.java |   2 +-
> > > >  .../commons/pool2/impl/GenericObjectPool.java      |   2 +-
> > > >  .../apache/commons/pool2/ObjectPoolIssue326.java   |   2 +
> > > >  .../java/org/apache/commons/pool2/PoolTest.java    |   2 +-
> > > >  .../pool2/impl/TestAbandonedKeyedObjectPool.java   |   2 +-
> > > >  .../pool2/impl/TestAbandonedObjectPool.java        |   2 +-
> > > >  .../pool2/impl/TestGenericKeyedObjectPool.java     |   5 +-
> > > >  .../commons/pool2/impl/TestGenericObjectPool.java  |   6 +-
> > > >  .../impl/TestGenericObjectPoolClassLoaders.java    |   4 +-
> > > >  .../TestGenericObjectPoolFactoryCreateFailure.java |   3 +-
> > > >  12 files changed, 131 insertions(+), 33 deletions(-)
> > > >
> > > > diff --git a/src/main/java/org/apache/commons/pool2/impl/BaseGenericObjectPool.java b/src/main/java/org/apache/commons/pool2/impl/BaseGenericObjectPool.java
> > > > index 4277ce86..fc95ba32 100644
> > > > --- a/src/main/java/org/apache/commons/pool2/impl/BaseGenericObjectPool.java
> > > > +++ b/src/main/java/org/apache/commons/pool2/impl/BaseGenericObjectPool.java
> > > > @@ -867,7 +867,7 @@ public abstract class BaseGenericObjectPool extends BaseObject implements Aut
> > > >      /**
> > > >       * Gets the minimum amount of time an object may sit idle in the pool
> > > >       * before it is eligible for eviction by the idle object evictor (if any -
> > > > -     * see {@link #setTimeBetweenEvictionRuns(Duration)}). When non-positive,
> > > > +     * see {@link #setDurationBetweenEvictionRuns(Duration)}). When
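For readers who have not looked at the diff, the pattern being debated (a new Duration-named accessor pair, with the previous names deprecated and delegating to it) can be sketched as follows. This is a simplified illustration, not the actual commons-pool code: the class "EvictionConfigSketch" and the "Duration.ZERO" placeholder default are hypothetical; only the four method names come from the thread.

```java
import java.time.Duration;
import java.util.Objects;

/** Hypothetical holder for one setting, showing old and new method names side by side. */
final class EvictionConfigSketch {
    /** Placeholder default chosen for the sketch only. */
    private Duration durationBetweenEvictionRuns = Duration.ZERO;

    /** New-style getter: the name says that a Duration (not an Instant) is returned. */
    Duration getDurationBetweenEvictionRuns() {
        return durationBetweenEvictionRuns;
    }

    /** New-style setter, named symmetrically with the getter. */
    void setDurationBetweenEvictionRuns(final Duration duration) {
        this.durationBetweenEvictionRuns = Objects.requireNonNull(duration, "duration");
    }

    /** Old name kept for source compatibility; simply delegates. */
    @Deprecated
    Duration getTimeBetweenEvictionRuns() {
        return getDurationBetweenEvictionRuns();
    }

    /** Old name kept for source compatibility; simply delegates. */
    @Deprecated
    void setTimeBetweenEvictionRuns(final Duration duration) {
        setDurationBetweenEvictionRuns(duration);
    }
}
```

The cost raised in the thread is visible here: every renamed setting doubles its accessors in 2.x, and callers who already moved to the Duration overload get a second deprecation warning.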
Re: [commons-pool] branch POOL_2_X updated: Add Duration named APIs and deprecate old APIs.
OK, I get it now. I am just worried about making people change things twice
- first to a method that takes Duration rather than long and then to
another one that does the same thing but now has a different name. How
about leaving the 2.x names as is, not adding new methods with the same
signatures but different names, keeping the old deprecated millis ones
marked deprecated, and making the change to more easily disambiguated names
in 3.x? Otherwise, the 2.x code gets very cluttered and it also forces us
to finalize the 3.0 names now. We made a bunch of name changes in 2.0 and
I think it is fair to do some of that in 3.0 as well.

On Thu, Jul 20, 2023 at 4:51 PM Gary Gregory wrote:

> [...]
[pool] Another source compatibility break in 2.x
We have a minor source compat break still in 2.x. The change to have
BaseGenericObjectPool implement AutoCloseable forced addition of an
abstract close method. Technically, that could break subclass
implementations that don't implement close. I see three options here.
Maybe someone else has a better idea.

0) Ignore the problem. Unlikely to actually impact anyone.
1) Add a default implementation that
   a) throws UnsupportedOperationException
   b) No-Ops
   c) does
2) Add "implements AutoCloseable" to the subclasses (GOP, GKOP, ...) instead
3) Revert the change for 2.x

I am leaning toward 1a but I would also be OK with 0. I don't much like 2
and I really don't like 3. Option 2 could be remediated in pool 3, so the
ugliness would be temporary. Any better ideas?

Phil
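To make option 1a concrete, here is a minimal sketch of a base class that implements AutoCloseable with a non-abstract close() throwing UnsupportedOperationException. The class names are hypothetical stand-ins, not the real pool classes, and the exception message is an assumption.

```java
/** Hypothetical stand-in for the abstract base pool class. */
abstract class BasePoolSketch implements AutoCloseable {
    /**
     * Option 1a: a concrete default, so that existing subclasses
     * without their own close() still compile.
     */
    @Override
    public void close() {
        throw new UnsupportedOperationException(
            "close() is not implemented by " + getClass().getName());
    }
}

/** A pre-existing subclass that never declared close(); compiles unchanged. */
final class LegacyPoolSketch extends BasePoolSketch {
}

/** A subclass that overrides close() and therefore works with try-with-resources. */
final class ClosingPoolSketch extends BasePoolSketch {
    private boolean closed;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

The trade-off with 1a is that the failure moves from compile time to run time: a subclass that never overrides close() compiles, but throws if it is used in a try-with-resources block. Variant 1b would swap the throw for an empty body.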