[ https://issues.apache.org/jira/browse/MNG-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923272#comment-17923272 ]
Guillaume Nodet commented on MNG-7592:
--------------------------------------

This should now be fairly easy to implement. The {{DefaultModelBuilder}} reads the POM XML using an {{XmlReaderRequest}}, which supports a {{transform}} function. Passing a function that interns the required strings would work; the function receives a {{context}} parameter indicating which element is being parsed.
https://github.com/apache/maven/blob/a5ddd7a56379b972972fe8fdd2d31e801d9a54f8/impl/maven-impl/src/main/java/org/apache/maven/impl/model/DefaultModelBuilder.java#L1269-L1288

> String deduplication in model building
> --------------------------------------
>
>                 Key: MNG-7592
>                 URL: https://issues.apache.org/jira/browse/MNG-7592
>             Project: Maven
>          Issue Type: Improvement
>            Reporter: Christoph Läubrich
>            Priority: Major
>             Fix For: 4.x / Backlog
>
>
> I am currently investigating how to reduce memory consumption in m2eclipse (the
> Maven IDE extension) and noticed that one problem is that the Maven model does not
> seem to deduplicate strings. For large projects (I used Apache Camel as an
> example), there are a lot of duplicate strings hanging around; for example, I see
> 12,000 instances of "org.apache.maven.plugins" and around 10,000 of
> "org.apache.camel" (note that probably not all of them are related to Maven).
> The graph of incoming references shows, for example, that these
> come from the Model/Artifact groupId.
> I know that string deduplication in general is hard and even controversial,
> but maybe it could be considered at least for the "hotspots": groupId,
> artifactId, and version, or even managementKeys, seem like good candidates
> since they are used all over the place.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
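As a rough illustration of the interning transform suggested in the comment, here is a minimal Java sketch. It deliberately does not use Maven's real {{XmlReaderRequest}} API. The transform hook is modeled as a plain java.util.function.BiFunction<String, String, String> that takes the parsed value and the element name, standing in for the {{context}} parameter. The class name InterningTransformerSketch, the HOTSPOTS set (groupId, artifactId, version, taken from the issue text), and the map-based pool are all hypothetical. The real hook's type and the exact context value it receives may differ.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

public class InterningTransformerSketch {

    // Hypothetical: element names treated as deduplication "hotspots",
    // taken from the examples in the issue description.
    static final Set<String> HOTSPOTS = Set.of("groupId", "artifactId", "version");

    // Pool of canonical String instances keyed by content (grows without bound).
    static final Map<String, String> POOL = new ConcurrentHashMap<>();

    // Stand-in for the reader's transform hook: (parsed value, element name) -> value to store.
    static final BiFunction<String, String, String> INTERNING = (value, context) -> {
        // Set.of(...).contains(null) throws, so guard against a null context too
        if (value == null || context == null || !HOTSPOTS.contains(context)) {
            return value; // other elements are passed through unchanged
        }
        // putIfAbsent returns the instance already pooled, or null if 'value' was just added
        String existing = POOL.putIfAbsent(value, value);
        return existing != null ? existing : value;
    };

    public static void main(String[] args) {
        // new String(...) yields distinct objects with equal content, as a parser would
        String a = INTERNING.apply(new String("org.apache.camel"), "groupId");
        String b = INTERNING.apply(new String("org.apache.camel"), "groupId");
        String c = INTERNING.apply(new String("org.apache.camel"), "description");
        System.out.println(a == b); // true: both refer to the pooled instance
        System.out.println(a == c); // false: "description" is not a hotspot
    }
}
```

This sketch uses a ConcurrentHashMap pool so the canonical instances stay under the caller's control, for example so the pool can be cleared. Calling String.intern() at the same point would be a simpler alternative. Restricting pooling to a few element names keeps rarely repeated values such as descriptions or URLs out of the pool.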