I think you need to work out which files are missing from the index and see whether they share a common pattern. Since these are PDF files, please also make sure each file's security settings allow content extraction.
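One way to find the missing files is to compare the file names on disk with the ids that actually made it into the index. The following is only a rough sketch under some assumptions: that the ids have been exported from Solr into a plain text file with one id per line (for example by querying all documents with fl=id), and that the id equals the file name, as in the data-config quoted below. The function names and the ids file are hypothetical, not part of Solr.

```python
import os


def list_pdfs(base_dir):
    """Return the names of all files under base_dir ending in .pdf (any case)."""
    names = []
    for _root, _dirs, files in os.walk(base_dir):
        for name in files:
            if name.lower().endswith(".pdf"):
                names.append(name)
    return names


def missing_from_index(file_names, indexed_ids):
    """Return the file names that have no matching id in the index, sorted."""
    indexed = set(indexed_ids)
    return sorted(name for name in file_names if name not in indexed)


def report_missing(base_dir, ids_path):
    """Print every PDF under base_dir whose name is absent from the ids file.

    ids_path is a hypothetical export: one indexed id per line.
    """
    with open(ids_path, encoding="utf-8") as handle:
        indexed_ids = [line.strip() for line in handle if line.strip()]
    for name in missing_from_index(list_pdfs(base_dir), indexed_ids):
        print(name)


# Example call (paths are assumptions, adjust to your setup):
# report_missing("H:/pdf/cls_1_16800_OCRed/1", "indexed_ids.txt")
```

With the list of missing names in hand, it should be easier to check whether those files have something in common, such as restricted security settings or content that cannot be extracted.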
Regards,
Vivek

-----Original Message-----
From: 荣康 [mailto:whuiss_cs2...@163.com]
Sent: Wednesday, February 08, 2012 11:30 PM
To: solr-user@lucene.apache.org
Subject: Help:Solr can't put all pdf files into index

Hey,

I am using Solr as my search engine to search my PDF files. I have 18219 files (all with different file names), and they are all in the same directory. But when I import the files into the index using the DataImport method, Solr reports that it imported only 17233 files. It's very strange. This problem has stopped our project for a few days and I can't resolve it. Please help me!

Schema.xml

<fields>
  <field name="text" type="text" indexed="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
  <field name="filename" type="filenametext" indexed="true" required="true" termVectors="true" termPositions="true" termOffsets="true"/>
  <field name="id" type="string" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<copyField source="filename" dest="text"/>

and

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" recursive="true" rootEntity="false" dataSource="null" baseDir="H:/pdf/cls_1_16800_OCRed/1" fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" onError="skip">
      <entity name="tika-test" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
        <field column="text" name="text"/>
      </entity>
      <field column="file" name="id"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>

Sincerely,
Rong Kang