Hey there,

I am running some stress tests on indexing with DIH. I am indexing a MySQL database with approximately 1,400,000 rows, and I am also using the deduplication patch. I am running Tomcat with JVM limits of -Xms2000M -Xmx2000M.

I have run the full-import command three times, without restarting Tomcat or reloading the core between runs. I used jmap and jhat to inspect heap memory at several points during indexing. Below I show the top of each histogram (I omit the lower part because the instance counts there are completely stable).

I have noticed that the number of Term, TermInfo and TermQuery instances grows from one indexing run to the next. Is that normal?
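Comparing the snapshots by eye is error-prone, so here is a minimal sketch of how two of them could be diffed. It assumes each snapshot is available as text made of jhat-style entries ("N instances of class X"). The names parse_histogram and growth are hypothetical helpers written for this illustration, not part of jmap, jhat or Solr. The sample values are the Term, TermInfo and TermQuery counts from the two "COMPLETED" snapshots in this post.

```python
import re
from collections import Counter

# Matches jhat-style histogram entries such as
# "268290 instances of class org.apache.lucene.index.Term".
ENTRY = re.compile(r"(\d+) instances? of class (\S+)")


def parse_histogram(text):
    """Return a Counter mapping class name -> instance count."""
    counts = Counter()
    for n, cls in ENTRY.findall(text):
        counts[cls] += int(n)
    return counts


def growth(before, after):
    """Return (class, delta) pairs, largest increase first."""
    # A Counter returns 0 for a class missing from one snapshot.
    classes = set(before) | set(after)
    deltas = {cls: after[cls] - before[cls] for cls in classes}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Counts after the first and second completed full-import (from this post).
first_run = parse_histogram("""
552522 instances of class org.apache.lucene.index.Term
505835 instances of class org.apache.lucene.index.TermInfo
48645 instances of class org.apache.lucene.search.TermQuery
""")
second_run = parse_histogram("""
999753 instances of class org.apache.lucene.index.Term
808190 instances of class org.apache.lucene.index.TermInfo
156511 instances of class org.apache.lucene.search.TermQuery
""")

for cls, delta in growth(first_run, second_run):
    print(f"{delta:+10d}  {cls}")
```

Classes whose delta stays near zero across runs (for example the com.sun.tools.javac entries) can be ignored; the ones that keep growing are the candidates for a leak.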
FIRST TIME I INDEX, WITH APPROX. A MILLION DOCS INDEXED (INDEXING PROCESS STILL RUNNING)

268290 instances of class org.apache.lucene.index.Term
215943 instances of class org.apache.lucene.index.TermInfo
129649 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
51537 instances of class org.apache.lucene.search.TermQuery
25457 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1120 instances of class org.apache.lucene.index.FieldInfo
919 instances of class org.apache.catalina.loader.ResourceEntry

FIRST TIME I INDEX, COMPLETED (1.4 MILLION DOCS INDEXED)

552522 instances of class org.apache.lucene.index.Term
505835 instances of class org.apache.lucene.index.TermInfo
128937 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
48645 instances of class org.apache.lucene.search.TermQuery
24065 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1470 instances of class org.apache.lucene.index.FieldInfo
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

SECOND TIME I INDEX, WITH 500000 DOCS INDEXED (INDEXING PROCESS STILL RUNNING)

264617 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
262496 instances of class org.apache.lucene.index.Term
116078 instances of class org.apache.lucene.index.TermInfo
53383 instances of class org.apache.lucene.search.TermQuery
42274 instances of class org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
30230 instances of class org.apache.lucene.search.TermQuery$TermWeight
26044 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
15115 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
15115 instances of class org.apache.lucene.search.ReqExclScorer
7325 instances of class org.apache.lucene.search.ConjunctionScorer$1
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1279 instances of class org.apache.lucene.index.FieldInfo
923 instances of class org.apache.catalina.loader.ResourceEntry

SECOND TIME I INDEX, WITH 1200000 DOCS INDEXED (INDEXING PROCESS STILL RUNNING)

574603 instances of class org.apache.lucene.index.Term
423558 instances of class org.apache.lucene.index.TermInfo
141394 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
106729 instances of class org.apache.lucene.search.TermQuery
54858 instances of class org.apache.lucene.index.BufferedDeletes$Num
25347 instances of class org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
11587 instances of class org.apache.lucene.search.TermQuery$TermWeight
5793 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
5793 instances of class org.apache.lucene.search.ReqExclScorer
2922 instances of class org.apache.lucene.search.ConjunctionScorer$1
2170 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

SECOND TIME I INDEX, COMPLETED (1.4 MILLION DOCS INDEXED)

999753 instances of class org.apache.lucene.index.Term
808190 instances of class org.apache.lucene.index.TermInfo
156511 instances of class org.apache.lucene.search.TermQuery
128975 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
104396 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
15401 instances of class org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
14896 instances of class org.apache.lucene.search.TermQuery$TermWeight
7447 instances of class org.apache.lucene.search.BooleanScorer2$Coordinator
7447 instances of class org.apache.lucene.search.ReqExclScorer
3025 instances of class org.apache.lucene.search.ConjunctionScorer$1
2660 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

THIRD TIME I INDEX, WITH 200000 DOCS INDEXED (INDEXING PROCESS STILL RUNNING)

591510 instances of class org.apache.lucene.index.Term
384132 instances of class org.apache.lucene.index.TermInfo
264655 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
261909 instances of class org.apache.lucene.search.TermQuery
149021 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
9456 instances of class org.apache.solr.update.processor.TextProfileSignature$Token
5802 instances of class org.apache.lucene.document.Field
5313 instances of class org.apache.solr.common.SolrInputField
5034 instances of class org.apache.solr.common.SolrInputField$1
2642 instances of class org.apache.lucene.index.FieldInfo
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1040 instances of class org.apache.lucene.analysis.CharArraySet
1040 instances of class org.apache.lucene.analysis.tokenattributes.TermAttribute
1038 instances of class org.apache.lucene.analysis.standard.StandardTokenizer
1038 instances of class org.apache.lucene.analysis.standard.StandardTokenizerImpl
1038 instances of class org.apache.lucene.analysis.tokenattributes.TypeAttribute
1035 instances of class org.apache.lucene.analysis.StopFilter
1035 instances of class org.apache.solr.analysis.RemoveDuplicatesTokenFilter
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

THIRD TIME I INDEX, WITH 700000 DOCS INDEXED (INDEXING PROCESS STILL RUNNING)

613746 instances of class org.apache.lucene.index.Term
480070 instances of class org.apache.lucene.index.TermInfo
137789 instances of class org.apache.lucene.search.TermQuery
130575 instances of class org.apache.lucene.index.FreqProxTermsWriter$PostingList
89024 instances of class org.apache.lucene.index.BufferedDeletes$Num
23233 instances of class com.sun.tools.javac.zip.ZipFileIndexEntry
13341 instances of class org.apache.solr.update.processor.TextProfileSignature$Token
9557 instances of class org.apache.lucene.document.Field
9118 instances of class org.apache.solr.common.SolrInputField
8927 instances of class org.apache.solr.common.SolrInputField$1
2870 instances of class org.apache.lucene.index.FieldInfo
2211 instances of class org.apache.lucene.analysis.tokenattributes.TermAttribute
2209 instances of class org.apache.solr.analysis.RemoveDuplicatesTokenFilter
1618 instances of class org.apache.lucene.analysis.CharArraySet
1613 instances of class org.apache.lucene.analysis.StopFilter
1613 instances of class org.apache.lucene.analysis.standard.StandardTokenizer
1613 instances of class org.apache.lucene.analysis.standard.StandardTokenizerImpl
1613 instances of class org.apache.lucene.analysis.tokenattributes.TypeAttribute
1569 instances of class com.sun.tools.javac.zip.ZipFileIndex$DirectoryEntry
1292 instances of class org.apache.solr.update.processor.TextProfileSignature$TokenComparator
923 instances of class org.apache.catalina.loader.ResourceEntry
858 instances of class com.sun.tools.javac.util.List

If I keep running full-import from a cron job, I will eventually end up with an OutOfMemoryError (Java heap space), although it will take many indexing runs to happen.

--
View this message in context: http://www.nabble.com/stress-tests-to-DIH-and-deduplication-patch-tp23295926p23295926.html
Sent from the Solr - User mailing list archive at Nabble.com.