Hi Erick, I think there has been a misunderstanding.
I want to index the text of the docs in Solr (only indexed, NOT stored). But I want the extracted text (the Tika output) back so I can:
* reindex faster later (some extraction steps, like OCR, take a really long time)
* use the text for other processing

The original doc is NOT stored in Solr either.

So my question was whether I can index the original doc via ExtractingRequestHandler AND get the extracted text back, in a single call. AFAIK I can only do it in 2 calls:
1) ExtractingRequestHandler?ext.extract.only=true -> text
2) index the text from 1) in Solr

Thx

> Yes, you can. But... Generally, storing the raw input in Solr is
> not the best approach. The problem here is that pretty soon
> you get a huge index that contains *everything*. Solr was not
> intended to be a data store.
>
> Besides, you then need to store the binary form of the file. Solr
> only deals with text, not markup.
>
> Most people index the text in Solr, and enough information
> so the application knows where to go to fetch the original
> document when the user drills down (e.g. file path, database
> PK, etc.). Would that work for your situation?
>
> Best
> Erick
>
> On Sat, Mar 31, 2012 at 3:55 PM, <spr...@gmx.eu> wrote:
> > Hi,
> >
> > I want to index various file types in Solr; this can easily be done with
> > ExtractingRequestHandler. But I also need the extracted content back.
> > I know ext.extract.only, but then nothing gets indexed, right?
> >
> > Can I index the document AND get the content back, as with ext.extract.only?
> > In a single request?
> >
> > Thank you
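For reference, the two-call workaround described above could look roughly like this. It is a minimal sketch in stdlib-only Python, not a tested client, and it rests on several assumptions: Solr 4 or later (where /update accepts JSON documents; Solr 3.x used /update/json), a hypothetical core named `docs`, and hypothetical schema fields `id` and `text`, with `text` declared indexed="true" stored="false". It uses `extractOnly`, the parameter name given in the Solr Cell documentation; check which name your Solr version expects.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: adjust to your own Solr install and schema.
SOLR_CORE_URL = "http://localhost:8983/solr/docs"  # assumed core name "docs"
TEXT_FIELD = "text"  # assumed field: indexed="true" stored="false"


def extract_only_url(core_url: str) -> str:
    """Call 1: ask Solr Cell to run Tika and return plain text without indexing."""
    params = {"extractOnly": "true", "extractFormat": "text", "wt": "json"}
    return core_url + "/update/extract?" + urllib.parse.urlencode(params)


def text_from_extract_response(resp: dict) -> str:
    """Pick the extracted text out of an extractOnly JSON response.

    The text is keyed by the stream name (often the file name), next to a
    '<name>_metadata' entry and the usual 'responseHeader'.
    """
    for key, value in resp.items():
        if key == "responseHeader" or key.endswith("_metadata"):
            continue
        if isinstance(value, str):
            return value
    raise ValueError("no extracted text found in response")


def update_payload(doc_id: str, text: str) -> bytes:
    """Call 2: a JSON update body that indexes text we already extracted."""
    return json.dumps([{"id": doc_id, TEXT_FIELD: text}]).encode("utf-8")


def extract_then_index(path: str, doc_id: str) -> str:
    """Run both calls and return the text so the caller can keep a copy."""
    with open(path, "rb") as f:
        raw = f.read()

    # Call 1: extraction only; nothing is indexed yet.
    req = urllib.request.Request(
        extract_only_url(SOLR_CORE_URL),
        data=raw,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as r:
        text = text_from_extract_response(json.load(r))

    # Call 2: index the extracted text (the original file is not sent again).
    req = urllib.request.Request(
        SOLR_CORE_URL + "/update?commit=true",
        data=update_payload(doc_id, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req):
        pass
    return text
```

The returned text can be written to disk or a database, so a later reindex only has to repeat the second call and can skip the slow OCR step entirely.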