I'm using a combination of Tika and custom code (via SolrJ) to extract text from files. I was looking at the number of files I had in my index and noticed many of them were missing. Then I went to the Solr admin panel and saw this in the log files:
SEVERE SolrCore java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
SEVERE SolrDispatchFilter null:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
SEVERE CommitTracker auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

After this, all uploads through Tika fail with an internal server error (HTTP 500). This is the code I use to upload files with Tika:

    import java.io.File;
    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    SolrServer solr;
    ...

    public void IndexFile(File fileToIndex) throws IOException, SolrServerException {
        // Send the raw file to the extracting request handler (Solr Cell / Tika)
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(fileToIndex, "application/octet-stream");
        up.setParam("literal.filename", fileToIndex.getName());
        // Commit right away, waiting for flush and for a new searcher
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up);
    }

Is there a way to skip the file that caused the OutOfMemoryError and then *continue extracting/indexing*? I don't know how to do this in SolrJ; a sketch of what I have in mind is at the end of this mail. All the files I uploaded manually kept working, because I index each page of a PDF separately using PDFBox (also sketched below). Only the files that went through Tika threw exceptions and didn't commit. I know I could increase the memory settings, but some Excel files fail to extract even with 16 GB of memory assigned; I've tested this with the Tika library directly.
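What I have in mind is roughly the sketch below: catch the failure per file and move on to the next one. This is only an illustration, sitting in the same class as IndexFile above; indexAllFiles and the error logging are made up, not real SolrJ API. And the catch alone doesn't actually solve it, because once the writer has hit the OutOfMemoryError every subsequent request fails with the same IllegalStateException:

    import java.util.List;

    // Hypothetical driver loop in the same class as IndexFile above
    public void indexAllFiles(List<File> files) {
        for (File f : files) {
            try {
                IndexFile(f);
            } catch (IOException | SolrServerException e) {
                // Skip the file that blew up and keep going with the rest
                System.err.println("Skipping " + f.getName() + ": " + e.getMessage());
            }
        }
    }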
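For reference, the manual PDF path extracts one page at a time with PDFBox, roughly like this (a simplified sketch, not my exact code; the method name and the Solr field names are illustrative):

    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrInputDocument;

    // In the same class, reusing the solr field from above
    public void indexPdfPerPage(File pdf) throws IOException, SolrServerException {
        PDDocument doc = PDDocument.load(pdf);
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            for (int page = 1; page <= doc.getNumberOfPages(); page++) {
                // Extract one page at a time instead of the whole document at once
                stripper.setStartPage(page);
                stripper.setEndPage(page);
                String text = stripper.getText(doc);

                SolrInputDocument sdoc = new SolrInputDocument();
                sdoc.addField("id", pdf.getName() + "_" + page); // illustrative field names
                sdoc.addField("filename", pdf.getName());
                sdoc.addField("page", page);
                sdoc.addField("content", text);
                solr.add(sdoc);
            }
            solr.commit();
        } finally {
            doc.close();
        }
    }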