Re: Problem retaining PDF text

2016-04-13 Thread Alexandre Rafalovitch
An atomic update has to reload the content of all _other_ fields to reconstruct the full document before putting it back into the Lucene index. That's because Lucene does not support 'update': every update actually deletes the original document and recreates it. The problem is that your PDF text is probably n
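
To illustrate the delete-and-recreate behaviour described above, here is a toy model in Python. It is not Solr or Lucene code, only a sketch under one assumption: that an index can return the original value only for fields marked as stored, while still keeping searchable terms for the rest. The class ToyIndex and the field names "title" and "content" are hypothetical.

```python
# Toy model only (not Solr/Lucene code). Assumption: the index keeps
# searchable terms for every field, but can only give back the original
# value of fields that were marked as stored.
class ToyIndex:
    def __init__(self, stored_fields):
        # The id is always kept so a document can be found again.
        self.stored_fields = set(stored_fields) | {"id"}
        self.stored = {}  # doc id -> {field: original value}, stored fields only
        self.terms = {}   # doc id -> {field: set of lowercased terms}

    def add(self, doc):
        doc_id = doc["id"]
        self.stored[doc_id] = {
            f: v for f, v in doc.items() if f in self.stored_fields
        }
        self.terms[doc_id] = {
            f: set(str(v).lower().split()) for f, v in doc.items()
        }

    def search(self, field, term):
        return sorted(
            doc_id
            for doc_id, fields in self.terms.items()
            if term.lower() in fields.get(field, set())
        )

    def atomic_update(self, doc_id, changes):
        # Rebuild the document from whatever can be read back (stored fields),
        # apply the changes, then delete the old document and add the new one.
        rebuilt = dict(self.stored[doc_id])
        rebuilt.update(changes)
        del self.stored[doc_id]
        del self.terms[doc_id]
        self.add(rebuilt)


# "content" (hypothetical name for the extracted text) is indexed but not stored.
index = ToyIndex(stored_fields=["id", "title"])
index.add({"id": "101", "title": "report", "content": "extracted pdf text"})
before = index.search("content", "pdf")

index.atomic_update("101", {"title": "annual report"})
after = index.search("content", "pdf")

print(before, after)
```

In this sketch the document matches a search on "content" before the update but not after it, because the rebuilt document contains only the fields that could be read back.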

Problem retaining PDF text

2016-04-13 Thread Alan G Quan
I am indexing PDF documents in Solr 5.3.0 like this: curl "http://localhost:8983/solr/mycore1/update/extract?literal.id=101&commit=true" -F "myfile=@101.pdf". This works fine: I can search for keywords in the PDF text and Solr finds the document correctly. But when I make any subseque