Fwd: problem using ExtractingRequestHandler with solrj

Jeroen van Schagen Wed, 28 Apr 2010 06:11:18 -0700

On second thought, the curl command doesn't index the file content on this
system either. It worked on my home system (Mac) but isn't working on my
work system (Windows); could this have anything to do with the issue?
I'm not getting any error messages: the file identifier is properly indexed,
but the text fields are all empty, regardless of the file type used (pdf,
docx, txt). My Solr config looks as follows:


<config>
    <dataDir>solr/data</dataDir>
    <requestHandler name="/update/extract"
                    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
        <lst name="defaults">
            <str name="fmap.content">text</str>
            <str name="fmap.Last-Modified">last_modified</str>
            <str name="uprefix">ignored_</str>
        </lst>
    </requestHandler>
    ...
</config>
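The `fmap.*` and `uprefix` defaults above rename extracted fields before they reach the schema. Below is a minimal JDK-only sketch of that renaming. It assumes the order described in the Solr Cell documentation: `fmap` is applied first, then any field the schema does not declare gets the prefix. Dynamic fields are ignored for simplicity. `targetField` and the `schemaFields` set are hypothetical, and this is not Solr's own code.

```java
import java.util.Map;
import java.util.Set;

// Simplified model of Solr Cell's field renaming, NOT Solr's actual implementation.
// The values mirror the <lst name="defaults"> entries in the config above.
class FieldMappingSketch {

    // fmap.content=text and fmap.Last-Modified=last_modified
    static final Map<String, String> FMAP = Map.of(
            "content", "text",
            "Last-Modified", "last_modified");

    // uprefix=ignored_
    static final String UPREFIX = "ignored_";

    // Returns the field an extracted value would be written to.
    // schemaFields is a hypothetical stand-in for the fields declared in schema.xml
    // (dynamic fields are not modelled).
    static String targetField(String extractedName, Set<String> schemaFields) {
        // Step 1: apply the fmap.* rename, if there is one.
        String mapped = FMAP.getOrDefault(extractedName, extractedName);
        // Step 2: names the schema does not know get the uprefix.
        return schemaFields.contains(mapped) ? mapped : UPREFIX + mapped;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("id", "text", "last_modified");
        System.out.println(targetField("content", schema));       // text
        System.out.println(targetField("Author", schema));        // ignored_Author
        System.out.println(targetField("content", Set.of("id"))); // ignored_text
    }
}
```

In this model, extracted body text only lands in `text` if the schema declares a `text` field. Otherwise it becomes `ignored_text`, which the example schema's `ignored_*` dynamic field neither indexes nor stores. When the id is indexed but the content looks empty, it may be worth checking both that mapping and whether `text` is declared `stored="true"`.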

And the logs:

[INFO] Started Jetty Server
28-04-2010 14:27:54.643 WARN  31889293-0 org.apache.solr.update.SolrIndexWriter:120 No lockType configured for solr/data\index/ assuming 'simple'
28-04-2010 14:27:54.674 INFO  31889293-0 org.apache.solr.core.SolrCore:114 SolrDeletionPolicy.onInit: commits:num=1
    commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_6,version=1272456431379,generation=6,filenames=[_0.cfs, _3.cfs, _0_1.del, _1.cfs, _2.cfs, segments_6, _4.cfs]
28-04-2010 14:27:54.674 INFO  31889293-0 org.apache.solr.core.SolrCore:136 newest commit = 1272456431379
28-04-2010 14:27:54.689 INFO  31889293-0 org.apache.solr.update.UpdateHandler:399 start commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false)
28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.core.SolrCore:122 SolrDeletionPolicy.onCommit: commits:num=2
    commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_6,version=1272456431379,generation=6,filenames=[_0.cfs, _3.cfs, _0_1.del, _1.cfs, _2.cfs, segments_6, _4.cfs]
    commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_7,version=1272456431380,generation=7,filenames=[_0.cfs, segments_7, _3.cfs, _0_1.del, _4_1.del, _5.cfs, _1.cfs, _2.cfs, _4.cfs]
28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.core.SolrCore:136 newest commit = 1272456431380
28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.search.SolrIndexSearcher:135 Opening searc...@1da1a93 main
28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.update.UpdateHandler:423 end_commit_flush
28-04-2010 14:27:54.721 INFO  1-thread-1 org.apache.solr.search.SolrIndexSearcher:1480 autowarming searc...@1da1a93 main from searc...@1c44a6d main
    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
28-04-2010 14:27:54.736 INFO  1-thread-1 org.apache.solr.search.SolrIndexSearcher:1482 autowarming result for searc...@1da1a93 main
    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
28-04-2010 14:27:54.736 INFO  1-thread-1 org.apache.solr.core.SolrCore:1276 [] Registered new searcher searc...@1da1a93 main
28-04-2010 14:27:54.736 INFO  1-thread-1 org.apache.solr.search.SolrIndexSearcher:225 Closing searc...@1c44a6d main
    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
*28-04-2010 14:27:54.736 INFO  31889293-0 .update.processor.UpdateRequestProcessor:171 {add=[RUNNING.txt],commit=} 0 140
28-04-2010 14:27:54.736 INFO  31889293-0 org.apache.solr.core.SolrCore:1324 [] webapp=/solr-server path=/update/extract params={commit=true&literal.id=RUNNING.txt} status=0 QTime=140*
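The last log line is the one that matters for the SolrJ request. It records which handler was hit, with which parameters, and the response status. Below is a small JDK-only sketch for pulling those fields out of such lines, for example to compare SolrJ requests against curl ones. The regular expression is an assumption fitted to this log format only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts path, request params, status and QTime from a SolrCore request log line.
// The pattern is an assumption fitted to the log format shown above only.
class RequestLogSketch {

    private static final Pattern REQUEST =
            Pattern.compile("path=(\\S+) params=\\{([^}]*)\\} status=(\\d+) QTime=(\\d+)");

    static Map<String, String> parse(String logLine) {
        Matcher m = REQUEST.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a request log line: " + logLine);
        }
        Map<String, String> out = new LinkedHashMap<>();
        out.put("path", m.group(1));
        // params are logged as an &-separated query string; values are not URL-decoded here
        for (String pair : m.group(2).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                out.put(pair, "");
            } else {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        out.put("status", m.group(3));
        out.put("QTime", m.group(4));
        return out;
    }

    public static void main(String[] args) {
        String line = "28-04-2010 14:27:54.736 INFO  31889293-0 org.apache.solr.core.SolrCore:1324 [] "
                + "webapp=/solr-server path=/update/extract "
                + "params={commit=true&literal.id=RUNNING.txt} status=0 QTime=140";
        System.out.println(parse(line));
    }
}
```

For the line above, this yields `path=/update/extract`, `commit=true`, `literal.id=RUNNING.txt` and `status=0`. In other words, the request reached the extracting handler and succeeded, which matches the observation that no error is reported.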

On Wed, Apr 28, 2010 at 2:24 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> What error are you getting? Does
> http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/
> work for you?
>
> On Apr 28, 2010, at 6:44 AM, Jeroen van Schagen wrote:
>
> > Dear solr-user,
> >
> > Using a Quartz scheduler, I want to index all documents inside a specific
> > folder with Solr(J). To perform the actual indexing I selected the
> > org.apache.solr.handler.extraction.ExtractingRequestHandler. The request
> > handler works perfectly when the request is sent by curl:
> >
> > curl "http://localhost:8080/solr/update/extract?literal.id=doc2&commit=true" -F "tutori...@example.docx"
> >
> > But, for some reason, the file is not indexed when using SolrJ. My indexing
> > method looks as follows:
> > private static final String EXTRACT_REQUEST_MAPPING = "/update/extract";
> > private File baseFolder;
> > private boolean recursive = false;
> > private CommonsHttpSolrServer server;
> >
> > public void index(File folder) {
> >     if (!folder.isDirectory()) {
> >         throw new IllegalArgumentException(folder.getAbsolutePath() + " is not a directory.");
> >     }
> >     logger.info("Indexing documents inside folder [{}]", folder.getAbsolutePath());
> >     for (File file : folder.listFiles()) {
> >         if (file.isFile()) {
> >             ContentStreamUpdateRequest up = new ContentStreamUpdateRequest(EXTRACT_REQUEST_MAPPING);
> >             try {
> >                 up.addFile(file);
> >                 up.setParam("literal.id", file.getName());
> >                 up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> >                 server.request(up);
> >             } catch (SolrServerException e) {
> >                 logger.error("Could not connect to server.", e);
> >             } catch (IOException e) {
> >                 logger.error("Could not upload file to server.", e);
> >             }
> >         } else if (recursive && file.isDirectory()) {
> >             index(file); // Index sub-directory as well
> >         }
> >     }
> > }
> >
> > Is there something I'm doing wrong here?
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
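The traversal part of `index(File)` can be separated from the SolrJ upload and checked on its own. Below is a minimal sketch that uses only `java.io.File`, as the original does. `filesToIndex` is a hypothetical helper, and each returned file would still be sent with the `ContentStreamUpdateRequest` code quoted above. The sketch also guards against `listFiles()` returning `null`. `listFiles()` does that on an I/O error, and the original `for` loop would then throw a `NullPointerException`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Traversal half of index(File), without the SolrJ upload.
class FolderWalkSketch {

    // Hypothetical helper: returns every regular file under folder,
    // descending into sub-directories only when recursive is true.
    static List<File> filesToIndex(File folder, boolean recursive) {
        if (!folder.isDirectory()) {
            throw new IllegalArgumentException(folder.getAbsolutePath() + " is not a directory.");
        }
        List<File> result = new ArrayList<>();
        File[] children = folder.listFiles();
        if (children == null) {
            // listFiles() returns null on an I/O error; skip instead of throwing an NPE
            return result;
        }
        for (File file : children) {
            if (file.isFile()) {
                result.add(file);
            } else if (recursive && file.isDirectory()) {
                result.addAll(filesToIndex(file, true)); // index sub-directory as well
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File f : filesToIndex(folder, true)) {
            System.out.println(f.getPath());
        }
    }
}
```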
