I'm using the Lucid Imagination installation kit for Solr (the latest one, with Solr
1.4).
I would like to use stopwords, so I installed the Italian version of the file in
LucidWorks/lucidworks/solr/conf/stopwords.txt.
Moreover, the field from which I want to remove stopwords is declared in schema.xml
as
I'm replying to myself because I found the mistake. The Italian stopwords file
that I found on the Apache site has a shell-style comment on the same line as
each stopword. The stopword filter is probably basic and doesn't accept
comments on the same line as a stopword. I dropped them and now
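If you'd rather not strip the comments by hand, a small script can do it. This is a minimal sketch, not part of Solr: it assumes each stopword is the first token on its line and that comments start with '#' or '|' (Snowball-style lists use '|'). The helper names and the UTF-8 encoding are my assumptions.

```python
def clean_stopwords(lines, comment_chars="#|"):
    """Return bare stopwords, dropping blank lines and inline comments.

    Assumes each stopword is the first whitespace-separated token on its
    line and that everything after a comment marker is a comment.
    """
    words = []
    for line in lines:
        # Truncate at every comment marker found; the result is the text
        # before the earliest marker.
        for ch in comment_chars:
            idx = line.find(ch)
            if idx != -1:
                line = line[:idx]
        token = line.strip()
        if token:
            # Keep only the first token, in case stray text remains.
            words.append(token.split()[0])
    return words


def rewrite_stopwords(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, one word per line (UTF-8)."""
    with open(src_path, encoding="utf-8") as src:
        words = clean_stopwords(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

For example, rewrite_stopwords("stopwords_it_orig.txt", "stopwords.txt") would write a comment-free copy (both file names are hypothetical). Solr needs to reload the core before it sees the new list.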
> There's a Tika example in solr/trunk/example/exampleDIH in the current
> Solr trunk. (I don't remember if it's in the Solr 1.4 release.) With
> this you can save the PDF binary in one field and save the extracted
> text in another field. I'm doing this now with HTML.
>
OK, I'm moving forward (maybe :).
I tried another curl command to send the file from a remote machine:
http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf
and the behaviour changed: now I get an error in the Solr log file:
HTTP St
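For reference, the extract URL above can be built programmatically. This is only a sketch. localhost:8983 is a placeholder, because the original URL leaves out the port, and extract_url is a hypothetical helper. As far as I know, stream.file is read from the filesystem of the machine running Solr, not the client. It also only works when remote streaming is enabled (enableRemoteStreaming="true" on requestParsers in solrconfig.xml).

```python
from urllib.parse import urlencode


def extract_url(base, doc_id, server_path, content_type="application/pdf"):
    """Build a Solr Cell /update/extract URL that asks Solr to read a file
    from its own filesystem via stream.file."""
    params = {
        "literal.id": doc_id,               # value for the uniqueKey field
        "stream.file": server_path,         # path as seen by the Solr server
        "stream.contentType": content_type,
    }
    # urlencode percent-escapes '/' in values; Solr decodes them again.
    return base.rstrip("/") + "/update/extract?" + urlencode(params)
```

Using urlencode instead of pasting the string by hand avoids escaping mistakes when file names contain spaces or '&'.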
I understand that Tika is able to index PDF content: is that true? I tried to
post a PDF locally, and the solr/admin schema browser showed one more
document. But when I search, only the document id is available; the content
doesn't seem to be indexed. Do I need other products to index PDF content?
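On the last question: Solr Cell (the /update/extract handler) already uses Tika to extract PDF text, so no other product should be needed. One possible cause of seeing only the id is that the extracted text goes to a field that is indexed but not stored. Another is that no commit was issued. This is an assumption, not confirmed in this thread. The sketch below prepares a POST of the raw PDF bytes but does not send it. It sets fmap.content, which maps Tika's extracted body to a field, and commit=true. The field name "text" is hypothetical and must match a stored field in your schema.xml.

```python
from urllib.parse import urlencode
from urllib.request import Request


def extract_request(base, doc_id, pdf_bytes, content_field="text"):
    """Prepare (but do not send) a POST of raw PDF bytes to /update/extract."""
    params = {
        "literal.id": doc_id,
        # Map Tika's extracted body ("content") onto a schema field.
        # "text" is a placeholder; it must be stored to show in results.
        "fmap.content": content_field,
        # Commit so the document becomes searchable right away.
        "commit": "true",
    }
    url = base.rstrip("/") + "/update/extract?" + urlencode(params)
    return Request(url, data=pdf_bytes,
                   headers={"Content-Type": "application/pdf"},
                   method="POST")
```

To send the request, pass it to urllib.request.urlopen. Posting raw bytes with a Content-Type header is an alternative to curl's multipart -F upload, and both should reach the same handler.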