Re: ExtractRequestHandler and Tika. Get only plain text

2018-11-14 Thread Erick Erickson
bq. Does your post mean that the functionality for indexing documents in Solr using ExtractRequestHandler doesn't provide the option of indexing plain data? Frankly, I don't know. It's just that if you plan to eventually offload the Tika parsing onto a client (or use a service), does it make sense to spe

Re: ExtractRequestHandler and Tika. Get only plain text

2018-11-14 Thread Sergio García Maroto
Thanks, Erick. I already use this strategy for indexing data from the DB, and it is very flexible for me. I work at a company where .NET is the main development platform, so separating things is even more important. Does your post mean that the functionality for indexing documents in Solr using ExtractRequestHandler doesn't

Re: ExtractRequestHandler and Tika. Get only plain text

2018-11-14 Thread Erick Erickson
While ERH (ExtractRequestHandler) is fine for getting started, as you move toward production you'll want to consider parsing the data outside of Solr, for the reasons (and with the example) outlined here: https://lucidworks.com/2012/02/14/indexing-with-solrj/ Best, Erick On Wed, Nov 14, 2018 at 6:46 AM Sergio García Maroto wrote: >
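A minimal sketch of the client-side approach Erick describes, assuming the plain text has already been extracted on the client (for example with Tika) and that the `document` core's schema has `id` and `content` fields (both field names are assumptions). It only builds a JSON update request for Solr's `/update` endpoint; the actual send is left commented out.

```python
import json
import urllib.request

# Core name "document" comes from the URLs in this thread; the field names
# "id" and "content" are assumptions about the schema.
SOLR_UPDATE_URL = "http://localhost:8983/solr/document/update?commit=true"


def build_update_request(doc_id: str, plain_text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON update request for one document whose
    plain text was already extracted outside Solr."""
    docs = [{"id": doc_id, "content": plain_text}]  # a JSON array of documents
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request("doc-1", "Only the plain text, no metadata.")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to send to a running Solr
```

Because Solr only ever sees the text you chose to send, no Tika metadata reaches the index unless you add it as fields yourself.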

Re: ExtractRequestHandler and Tika. Get only plain text

2018-11-14 Thread Sergio García Maroto
Thanks a lot, Jan. That works very well. I am now trying to index the document in Solr by removing the extractOnly parameter, and I can't find any similar option to get the data indexed as plain text. I am getting the metadata as well. This is my request: http://localhost:8983/solr/document/update/extract?it
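If the unwanted metadata arrives as separate fields when indexing (rather than inside the content field), one option is to route Tika's body text to a chosen field with `fmap.content` and send fields not defined in the schema to a throwaway prefix with `uprefix`, both ExtractingRequestHandler parameters. This is a sketch under assumptions: the target field `content` is hypothetical, and `ignored_` only works if the schema defines an `ignored_*` dynamic field of an ignored type. Metadata keys that happen to match real schema fields would still be indexed.

```python
from urllib.parse import urlencode

# Base URL from this thread. "content" and the "ignored_" prefix are
# assumptions about the schema (an "ignored_*" dynamic field must exist).
BASE = "http://localhost:8983/solr/document/update/extract"


def build_extract_index_url(doc_id: str) -> str:
    """Return an indexing URL that keeps the body text and parks unknown
    metadata fields under the ignored_ prefix."""
    params = {
        "literal.id": doc_id,       # unique key value for the indexed document
        "fmap.content": "content",  # map Tika's extracted body text to this field
        "uprefix": "ignored_",      # prefix for fields not defined in the schema
        "commit": "true",
    }
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_index_url("doc-1"))
```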

Re: ExtractRequestHandler and Tika. Get only plain text

2018-11-14 Thread Jan Høydahl
Have you tried specifying &extractFormat=text? -- Jan Høydahl, search solution architect, Cominvent AS - www.cominvent.com > On 14 Nov 2018, at 12:09, marotosg wrote: > > Hi all, > > I am currently trying to index documents of different kinds with Solr > and Tika. It's working fine but when s
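A minimal sketch of the request Jan suggests: with `extractOnly=true`, Solr's ExtractingRequestHandler returns the extracted content instead of indexing it, and `extractFormat=text` asks for plain text instead of the default XHTML (`xml`). As far as the Solr documentation goes, `extractFormat` only applies when `extractOnly` is true, which would explain why it has no effect when indexing. Host, port, and core name are taken from the thread.

```python
from urllib.parse import urlencode

# Core URL taken from this thread; adjust host, port, and core name as needed.
BASE = "http://localhost:8983/solr/document/update/extract"


def build_extract_only_url() -> str:
    """Return a URL that only extracts (no indexing) and asks Solr to
    serialize the extracted content as plain text."""
    params = {
        "extractOnly": "true",    # return extracted content instead of indexing it
        "extractFormat": "text",  # plain text instead of the default XHTML ("xml")
        "wt": "json",             # JSON response writer
    }
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # The file itself is sent in the request body, e.g. with curl:
    #   curl "<url>" -F "myfile=@file.pdf"
    print(build_extract_only_url())
```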

ExtractRequestHandler and Tika. Get only plain text

2018-11-14 Thread marotosg
Hi all, I am currently trying to index documents of different kinds with Solr and Tika. It's working fine, but when Solr returns the content of the document, it doesn't return plain text; the content comes back with some metadata as well. For instance, this is my request: http://localhost:8983/solr/document
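When `extractOnly=true` is used with `wt=json`, the response, as far as I know, stores the extracted content under a key named after the uploaded stream (file name) and the Tika metadata under the same name with a `_metadata` suffix. Treat that layout as an assumption and check it against your Solr version. The sketch below separates the two on the client; the sample response in the usage block is hypothetical.

```python
import json


def split_extract_only_response(raw_json: str) -> tuple[dict, dict]:
    """Split an extractOnly JSON response into ({name: text}, {name: metadata}).

    Assumes content is keyed by stream name and metadata by "<name>_metadata".
    """
    data = json.loads(raw_json)
    texts, metadata = {}, {}
    for key, value in data.items():
        if key == "responseHeader":
            continue  # status and timing, not document data
        if key.endswith("_metadata"):
            metadata[key[: -len("_metadata")]] = value
        else:
            texts[key] = value
    return texts, metadata


if __name__ == "__main__":
    # Hypothetical response, for illustration only.
    sample = (
        '{"responseHeader": {"status": 0, "QTime": 5},'
        ' "doc.pdf": "Hello world",'
        ' "doc.pdf_metadata": ["Content-Type", ["application/pdf"]]}'
    )
    texts, meta = split_extract_only_response(sample)
    print(texts)
```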