On second thought, the curl command doesn't index the file content on this system either. It worked on my home machine (Mac) but no longer works on my work machine (Windows); could that have anything to do with the issue? I'm not getting any error messages: the file identifier is indexed properly, but the text fields are all empty, regardless of the file type used (pdf, docx, txt).
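For reference, this is roughly how the symptom shows up when querying back (a minimal sketch; "tutorial" is just a word assumed to occur in the test file, and the URL matches the webapp path in the logs below):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class EmptyTextCheck {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8080/solr-server");
            // The literal id is findable...
            System.out.println("by id:   " + server.query(
                    new SolrQuery("id:RUNNING.txt")).getResults().getNumFound());
            // ...but a word that occurs in the file itself returns no hits,
            // suggesting the mapped "text" field was indexed empty.
            System.out.println("by text: " + server.query(
                    new SolrQuery("text:tutorial")).getResults().getNumFound());
        }
    }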
My Solr config looks as follows:

    <config>
      <dataDir>solr/data</dataDir>
      <requestHandler name="/update/extract"
                      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
        <lst name="defaults">
          <str name="fmap.content">text</str>
          <str name="fmap.Last-Modified">last_modified</str>
          <str name="uprefix">ignored_</str>
        </lst>
      </requestHandler>
      ...
    </config>

And the logs:

    [INFO] Started Jetty Server
    28-04-2010 14:27:54.643 WARN 31889293-0 org.apache.solr.update.SolrIndexWriter:120 No lockType configured for solr/data\index/ assuming 'simple'
    28-04-2010 14:27:54.674 INFO 31889293-0 org.apache.solr.core.SolrCore:114 SolrDeletionPolicy.onInit: commits:num=1
      commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_6,version=1272456431379,generation=6,filenames=[_0.cfs, _3.cfs, _0_1.del, _1.cfs, _2.cfs, segments_6, _4.cfs]
    28-04-2010 14:27:54.674 INFO 31889293-0 org.apache.solr.core.SolrCore:136 newest commit = 1272456431379
    28-04-2010 14:27:54.689 INFO 31889293-0 org.apache.solr.update.UpdateHandler:399 start commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false)
    28-04-2010 14:27:54.721 INFO 31889293-0 org.apache.solr.core.SolrCore:122 SolrDeletionPolicy.onCommit: commits:num=2
      commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_6,version=1272456431379,generation=6,filenames=[_0.cfs, _3.cfs, _0_1.del, _1.cfs, _2.cfs, segments_6, _4.cfs]
      commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_7,version=1272456431380,generation=7,filenames=[_0.cfs, segments_7, _3.cfs, _0_1.del, _4_1.del, _5.cfs, _1.cfs, _2.cfs, _4.cfs]
    28-04-2010 14:27:54.721 INFO 31889293-0 org.apache.solr.core.SolrCore:136 newest commit = 1272456431380
    28-04-2010 14:27:54.721 INFO 31889293-0 org.apache.solr.search.SolrIndexSearcher:135 Opening searc...@1da1a93 main
    28-04-2010 14:27:54.721 INFO 31889293-0 org.apache.solr.update.UpdateHandler:423 end_commit_flush
    28-04-2010 14:27:54.721 INFO 1-thread-1 org.apache.solr.search.SolrIndexSearcher:1480 autowarming searc...@1da1a93 main from searc...@1c44a6d main
      fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    28-04-2010 14:27:54.736 INFO 1-thread-1 org.apache.solr.search.SolrIndexSearcher:1482 autowarming result for searc...@1da1a93 main
      fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    28-04-2010 14:27:54.736 INFO 1-thread-1 org.apache.solr.core.SolrCore:1276 [] Registered new searcher searc...@1da1a93 main
    28-04-2010 14:27:54.736 INFO 1-thread-1 org.apache.solr.search.SolrIndexSearcher:225 Closing searc...@1c44a6d main
      fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    *28-04-2010 14:27:54.736 INFO 31889293-0 .update.processor.UpdateRequestProcessor:171 {add=[RUNNING.txt],commit=} 0 140
    28-04-2010 14:27:54.736 INFO 31889293-0 org.apache.solr.core.SolrCore:1324 [] webapp=/solr-server path=/update/extract params={commit=true&literal.id=RUNNING.txt} status=0 QTime=140*
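To narrow this down, the standard Solr Cell parameter extractOnly=true should show whether Tika extracts any text at all before indexing comes into play: the handler then returns the extracted content in the response instead of adding it to the index. A minimal sketch (the file name is just an example; URL as in the logs above):

    import java.io.File;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.util.NamedList;

    public class ExtractOnlyCheck {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8080/solr-server");
            ContentStreamUpdateRequest up =
                    new ContentStreamUpdateRequest("/update/extract");
            up.addFile(new File("example.docx")); // any of the pdf/docx/txt test files
            up.setParam("extractOnly", "true");   // extract only, do not index
            NamedList<Object> response = server.request(up);
            // An empty extracted body here would point at Tika/extraction
            // itself rather than at the fmap.* field mapping.
            System.out.println(response);
        }
    }

If the response does contain the document text, the fmap.content mapping and the schema would be the next things to look at.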
On Wed, Apr 28, 2010 at 2:24 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> What error are you getting? Does
> http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/
> work for you?
>
> On Apr 28, 2010, at 6:44 AM, Jeroen van Schagen wrote:
>
> > Dear solr-user,
> >
> > Using a Quartz scheduler, I want to index all documents inside a specific
> > folder with Solr(J). To perform the actual indexing I selected the
> > org.apache.solr.handler.extraction.ExtractingRequestHandler. The request
> > handler functions perfectly when the request is sent by curl:
> >
> >     curl "http://localhost:8080/solr/update/extract?literal.id=doc2&commit=true" -F "tutori...@example.docx"
> >
> > But, for some reason, the file is not indexed when using SolrJ. My
> > indexing method looks as follows:
> >
> >     private static final String EXTRACT_REQUEST_MAPPING = "/update/extract";
> >     private File baseFolder;
> >     private boolean recursive = false;
> >     private CommonsHttpSolrServer server;
> >
> >     public void index(File folder) {
> >         if (!folder.isDirectory()) {
> >             throw new IllegalArgumentException(folder.getAbsolutePath() + " is not a directory.");
> >         }
> >         logger.info("Indexing documents inside folder [{}]", folder.getAbsolutePath());
> >         for (File file : folder.listFiles()) {
> >             if (file.isFile()) {
> >                 ContentStreamUpdateRequest up = new ContentStreamUpdateRequest(EXTRACT_REQUEST_MAPPING);
> >                 try {
> >                     up.addFile(file);
> >                     up.setParam("literal.id", file.getName());
> >                     up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> >                     server.request(up);
> >                 } catch (SolrServerException e) {
> >                     logger.error("Could not connect to server.", e);
> >                 } catch (IOException e) {
> >                     logger.error("Could not upload file to server.", e);
> >                 }
> >             } else if (recursive && file.isDirectory()) {
> >                 index(file); // Index sub-directory as well
> >             }
> >         }
> >     }
> >
> > Is there something I'm doing wrong here?
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>