Hi,

Can you explain this problem to me? I have indexed data from multiple files using the Tika libraries, and I have also indexed data over HTTP, but only from a single file (e.g. http://myweb/filename.pdf). Now I have files in many formats under one HTTP path (e.g. http://myweb/files/). I tried to index data from that HTTP path, but it does not work. Here is my data-config:

<dataConfig>
  <dataSource type="BinURLDataSource" name="bin" encoding="utf-8"/>
  <document>
    <entity name="sd"
            processor="FileListEntityProcessor"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)"
            baseDir="http://www.lc.unsw.edu.au/onlib/pdf/"
            recursive="true"
            rootEntity="false"
            transformer="DateFormatTransformer">
      <entity name="tika-test"
              processor="TikaEntityProcessor"
              url="${sd.fileAbsolutePath}"
              format="text"
              dataSource="bin">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>

Error:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory Processing Document # 1
    at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)

Thanks for your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3331651.html
Sent from the Solr - User mailing list archive at Nabble.com.