stopwords file configuration

2010-11-16 Thread alendo
I'm using the Lucid Imagination installation kit for Solr (the latest one, with Solr 1.4). I would like to use stopwords, so I installed the Italian version of the file at LucidWorks/lucidworks/solr/conf/stopwords.txt. The field where I want to remove stopwords is declared in schema.xml as

Re: stopwords file configuration

2010-11-16 Thread alendo
I'm replying to myself because I found the mistake. The Italian stopwords file that I found on the Apache site has a shell-style comment on the same line as each stopword; the stopwords tokenizer is probably basic and doesn't accept comments on the same line as a stopword. I dropped them and now
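For anyone hitting the same problem, here is a minimal sketch of how the inline comments could be stripped before the file is copied into solr/conf. The function name and the default comment marker are assumptions made for illustration: the message only says the comments are "shell style", so the marker is a parameter in case the file you downloaded uses a different character.

```python
def strip_inline_comments(lines, comment_marker="#"):
    """Return only the stopwords, dropping comments and blank lines.

    comment_marker is an assumption ("shell style" suggests "#");
    pass another marker if your stopwords file uses one.
    """
    cleaned = []
    for line in lines:
        # Keep only the text before the first comment marker, if any.
        word = line.split(comment_marker, 1)[0].strip()
        if word:
            cleaned.append(word)
    return cleaned


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real Italian file.
    sample = ["ad      # a (to) before vowel", "# full-line comment", "", "al"]
    for word in strip_inline_comments(sample):
        print(word)
```

The result is one bare stopword per line, which is the form that reportedly worked once the comments were removed by hand.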

Re: Posting pdf file and posting from remote

2010-02-11 Thread alendo
> There's a tika example in solr/trunk/example/exampleDIH in the current solr trunk. (I don't remember if it's in the solr 1.4 release.) With this you can save the pdf binary in one field and save the extracted text in another field. I'm doing this now with html.

Re: Posting pdf file and posting from remote

2010-02-09 Thread alendo
OK, I'm making progress (maybe :)). I tried another curl command to send the file remotely: http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf and the behaviour changed: now I get an error in the Solr log file: HTTP St
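A request URL like the one above is easy to get wrong when typed by hand, so here is a small sketch that assembles it from its parts. The function name and the host used in the example call are hypothetical; only the /update/extract path and the three parameters (literal.id, stream.file, stream.contentType) come from the message. As far as I know, stream.file names a path that the Solr server itself reads, so the file has to exist on the server side and not on the machine sending the request.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, stream_file,
                      content_type="application/pdf"):
    """Build an /update/extract URL from a caller-supplied Solr base URL.

    base_url is whatever your installation uses; nothing here guesses
    the host or port.
    """
    params = {
        "literal.id": doc_id,
        "stream.file": stream_file,
        "stream.contentType": content_type,
    }
    # safe="/" keeps slashes readable in the path and MIME type values.
    query = urlencode(params, safe="/")
    return base_url.rstrip("/") + "/update/extract?" + query


if __name__ == "__main__":
    # Placeholder host, chosen only for illustration.
    print(build_extract_url("http://example.invalid/solr", "8514",
                            "files/attach-8514.pdf"))
```

Building the query with urlencode also takes care of escaping if the document id or file name contains spaces or other reserved characters.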

Posting pdf file and posting from remote

2010-02-09 Thread alendo
I understand that Tika is able to index PDF content: is that true? I tried to post a PDF from my local machine and I saw another document in the solr/admin schema browser, but when I search, only the document id is available; the document doesn't seem to be indexed. Do I need other products to index PDF content?