Hi Erick,

Thanks for pointing out the main problem of my system.
Trung.

On Fri, Jul 10, 2015 at 11:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> In a word, no. If you don't store the data it is completely gone
> with no chance of retrieval.
>
> There are a couple of things to think about though
>
> 1> The original doc must exist somewhere. Store some kind
> of URI in Solr that you can use to retrieve the original doc
> on demand.
>
> 2> Go ahead and store the data. Disk space is cheap, and the
> stored data goes in special files (*.fdt) that have very little impact
> on either search speed or memory requirements. And the memory
> requirements can be controlled somewhat with the documentCache
> assuming you don't have gigantic docs.
>
> This kind of sidesteps the question of re-extracting the document
> on Solr on demand and returning the text (which I think is what
> you're asking). I would definitely avoid doing this even if I knew how.
> The problem here is that you're making Solr do quite intensive
> work (Tika extraction) while at the same time serving queries,
> which has negative performance implications. If it turns out that you
> have to do this, consider running Tika in the app layer and
> doing the extraction on demand there. It's not very hard, see:
> https://lucidworks.com/blog/indexing-with-solrj/
> and ignore the db bits.
>
> Best,
> Erick
>
> On Thu, Jul 9, 2015 at 7:53 PM, trung.ht <trung...@anlab.vn> wrote:
> > Hi everyone,
> >
> > I use solr to index and search in office files (docx, pptx, ...). To reduce
> > the size of solr index, I do not store the content of the file on solr,
> > however now my customer wants to preview the content of the file.
> >
> > I have read the document of ExtractingRequestHandler, but it seems that to
> > return content in the response from solr, the only option is to
> > set extractOnly=true, but in that case, solr would not index the file.
> >
> > My question is: is there any way for solr to extract the content from tika,
> > index the content (without storing it) and then give me the content in the
> > response?
> >
> > Thanks in advance and sorry because my explanation is confusing.
> >
> > Trung.
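
For the archive, a minimal sketch of option 1 from Erick's reply (store a URI in Solr and fetch the original doc on demand). This is only an illustration under stated assumptions: the field names `content` and `source_uri` and both helper functions are hypothetical, the originals are assumed to live on a local filesystem, and nothing here talks to Solr itself.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def build_solr_doc(doc_id, extracted_text, source_path):
    """Build a dict that could be sent to Solr as a document.

    Assumption: the schema defines `content` as indexed but not stored,
    and `source_uri` as a small stored field. Both names are hypothetical.
    """
    return {
        "id": doc_id,
        "content": extracted_text,
        # Pointer back to the original file, kept instead of the full text.
        "source_uri": Path(source_path).resolve().as_uri(),
    }


def resolve_source(solr_hit):
    """Map the stored URI of a search hit back to a local file path."""
    parsed = urlparse(solr_hit["source_uri"])
    if parsed.scheme != "file":
        raise ValueError("only file:// URIs are handled in this sketch")
    return Path(url2pathname(parsed.path))
```

The app layer would call something like `resolve_source(hit)` when the customer asks for a preview, then open that file and extract the text there, so the index stays small and Solr does no extraction work at query time.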