The problem: Not all of the documents that I expect to be indexed are showing up in the index.
The background: I start off with an empty index based on a schema with a single field named 'query', marked as unique, that uses the following analyzer:

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>

My input is a UTF-8 encoded file with one sentence per line; its total size is about 60MB. I would like each line of the file to correspond to a single document in the Solr index. If I count the unique lines in the file (using cat | sort | uniq | wc -l), I get a little over 2M. The total number of lines in the file is around 2.7M.

I use the following command to start indexing:

    curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, numDocs is approximately 470k (which is what I find strange) and maxDoc is approximately 890k (which is fine, since I know I have around 700k duplicates). Even more confusing: if I run this exact command a second time without performing any other operation, numDocs goes up to around 610k, and a third run brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input file the first time, and why it would be able to index new documents on the second and third runs?
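For reference, a schema.xml matching the setup described above might look roughly like the following. This is a minimal sketch in Solr 1.4-era syntax, not the actual file: the schema name, the fieldType name `text_query`, and the indexed/stored/required attributes are assumptions, and "marked as unique" is read here as declaring the field as the `<uniqueKey>`.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Hypothetical minimal schema.xml for the setup described in the post. -->
<schema name="querystage" version="1.2">
  <types>
    <!-- "text_query" is a made-up type name; the analyzer chain is the one from the post. -->
    <fieldType name="text_query" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <!-- indexed/stored/required are assumed values; the post only says the field is unique. -->
    <field name="query" type="text_query" indexed="true" stored="true" required="true"/>
  </fields>
  <!-- Assumption: "marked as unique" means the field is the schema's uniqueKey. -->
  <uniqueKey>query</uniqueKey>
</schema>
```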
I also have this line in solrconfig.xml, if it matters:

    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480000" />

Thanks,
Dan

-- 
View this message in context: http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
Sent from the Solr - User mailing list archive at Nabble.com.
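In a Solr 1.4-style solrconfig.xml, the requestParsers element sits inside requestDispatcher, and the /update/csv path used by the curl command is normally mapped to solr.CSVRequestHandler. The sketch below shows only those parts and assumes the stock example configuration; everything else in the file is omitted.

```xml
<!-- Sketch of the relevant parts of a Solr 1.4-style solrconfig.xml; other elements omitted. -->
<config>
  <requestDispatcher>
    <!-- enableRemoteStreaming="true" is what permits the stream.file parameter used in the curl command.
         multipartUploadLimitInKB limits multipart (file-upload) POST bodies. -->
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480000" />
  </requestDispatcher>
  <!-- Handler behind the /update/csv URL (as registered in the stock example config). -->
  <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
</config>
```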