Hi Erick,

Thanks for pointing out the main problem with my system.

Trung.

On Fri, Jul 10, 2015 at 11:47 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> In a word, no. If you don't store the data it is completely gone
> with no chance of retrieval.
>
> There are a couple of things to think about, though:
>
> 1> The original doc must exist somewhere. Store some kind
> of URI in Solr that you can use to retrieve the original doc
> on demand.
>
> 2> Go ahead and store the data. Disk space is cheap, and the
> stored data goes in special files (*.fdt) that have very little impact
> on either search speed or memory requirements. And the memory
> requirements can be controlled somewhat with the documentCache
> assuming you don't have gigantic docs.
>
> This kind of sidesteps the question of re-extracting the document
> in Solr on demand and returning the text (which I think is what
> you're asking). I would definitely avoid doing this even if I knew how.
> The problem here is that you're making Solr do quite intensive
> work (Tika extraction) while at the same time serving queries,
> which has negative performance implications. If it turns out that you
> have to do this, consider running Tika in the app layer and
> doing the extraction on demand there. It's not very hard, see:
> https://lucidworks.com/blog/indexing-with-solrj/
> and ignore the db bits.
>
> Best,
> Erick
>
> On Thu, Jul 9, 2015 at 7:53 PM, trung.ht <trung...@anlab.vn> wrote:
> > Hi everyone,
> >
> > I use Solr to index and search office files (docx, pptx, ...). To reduce
> > the size of the Solr index, I do not store the content of the files in Solr;
> > however, my customer now wants to preview the content of a file.
> >
> > I have read the documentation for ExtractingRequestHandler, but it seems that
> > the only way to return content in the response from Solr is to
> > set extractOnly=true, and in that case Solr would not index the file.
> >
> > My question is: is there any way for Solr to extract the content with Tika,
> > index the content (without storing it), and then give me the content in the
> > response?
> >
> > Thanks in advance, and sorry if my explanation is confusing.
> >
> > Trung.
>

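For anyone landing on this thread later, here is a rough sketch of how suggestion 1 and the app-layer Tika idea from the quoted reply might fit together: keep only a URI in Solr, fetch the original file on demand, and extract the text outside Solr. It is not from the thread itself. The core name "docs", the stored field "source_uri", and the helper names are all hypothetical, and it assumes a standalone Tika Server listening on its default port 9998 rather than the SolrJ approach in the linked article.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints; adjust to your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/docs/select"  # hypothetical core "docs"
TIKA_SERVER = "http://localhost:9998/tika"              # Tika Server default port


def solr_uri_query(doc_id, uri_field="source_uri"):
    """Build a select URL that returns only the stored URI of one document."""
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": 'id:"%s"' % escaped,
        "fl": uri_field,   # only the small stored field, not the content
        "wt": "json",
        "rows": 1,
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def parse_uri(solr_json, uri_field="source_uri"):
    """Pull the URI out of a Solr JSON response (assumes a single-valued field)."""
    docs = json.loads(solr_json)["response"]["docs"]
    return docs[0].get(uri_field) if docs else None


def tika_request(file_bytes):
    """Build a PUT request asking Tika Server for plain-text extraction."""
    return urllib.request.Request(
        TIKA_SERVER,
        data=file_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def preview(doc_id, max_chars=500):
    """Look up the URI in Solr, fetch the original, extract text in the app layer."""
    with urllib.request.urlopen(solr_uri_query(doc_id)) as resp:
        uri = parse_uri(resp.read().decode("utf-8"))
    if uri is None:
        return None
    with urllib.request.urlopen(uri) as resp:
        original = resp.read()
    with urllib.request.urlopen(tika_request(original)) as resp:
        return resp.read().decode("utf-8", errors="replace")[:max_chars]
```

Calling preview("some-id") would then return the first few hundred characters of the extracted text, with the extraction cost paid by the application (and the Tika Server process) instead of the Solr node that is serving queries.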