nutch in solr

2012-02-05 Thread alessio crisantemi
Hi all, I have some problems integrating Nutch with Solr and Tomcat. I followed the Nutch tutorial for the integration and can now crawl a website: everything works. But when I try the Solr integration, I can't index into Solr. Here is the Nutch output after the command: bin/nutch crawl urls -solr htt

Re: indexing data on solr

2012-02-05 Thread O. Klein
Read http://wiki.apache.org/solr/DataImportHandler for a better method. The FileListEntityProcessor is what you are looking for.
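As a rough illustration of what FileListEntityProcessor does, here is a plain-Python sketch that walks a base directory and yields one row per file whose name matches a regex, optionally recursing. The names `list_files`, `file`, `path` and `size` are hypothetical, and requiring the regex to match the whole file name is an assumption of this sketch; it only loosely mirrors the processor's baseDir / fileName / recursive attributes and is not Solr code.

```python
import os
import re
from typing import Iterator


def list_files(base_dir: str, file_name_regex: str, recursive: bool = False) -> Iterator[dict]:
    """Yield one row (dict) per file whose name matches file_name_regex.

    Loosely mirrors DIH's FileListEntityProcessor (baseDir, fileName,
    recursive); the row keys here are made up for illustration.
    """
    pattern = re.compile(file_name_regex)
    for dirpath, dirnames, filenames in os.walk(base_dir):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if pattern.fullmatch(name):  # assumption: regex must match the whole name
                path = os.path.join(dirpath, name)
                yield {
                    "file": name,
                    "path": os.path.abspath(path),
                    "size": os.path.getsize(path),
                }
        if not recursive:
            break  # stop after the top-level directory
```

In DIH itself, each such row would typically feed a nested entity that reads the file's contents.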

Re: Multi word synonyms

2012-02-05 Thread O. Klein
Your query analyzer will tokenize "simple syrup" into "simple" and "syrup", so it won't match the multi-word entry "simple syrup" in synonyms.txt. So you have to change the query analyzer to use KeywordTokenizerFactory as well. It might be an idea to make a field for synonyms only, with this tokenizer, and another field t
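A toy model of the tokenization point: once the text is split on whitespace, the lookup never sees the phrase "simple syrup" as one unit, whereas keyword tokenization keeps the whole input as a single token that can hit the synonym entry. All names here are hypothetical, and the toy filter only looks up single tokens; Solr's real SynonymFilter can match multi-token sequences, so this only illustrates the claim in the reply, not Solr's implementation.

```python
# One line of a synonyms.txt-style file: equivalent phrases, comma-separated.
SYNONYMS = [["simple syrup", "sugar syrup", "stock syrup"]]


def build_synonym_map(groups):
    """Map each phrase to its whole group of equivalent phrases."""
    mapping = {}
    for group in groups:
        for phrase in group:
            mapping[phrase] = group
    return mapping


def whitespace_tokenize(text):
    """Split into lowercased words, like a whitespace-style tokenizer."""
    return text.lower().split()


def keyword_tokenize(text):
    """Keep the whole input as a single lowercased token."""
    return [text.lower()]


def expand(tokens, mapping):
    """Toy single-token synonym lookup: replace a token by its group if known."""
    out = []
    for tok in tokens:
        out.extend(mapping.get(tok, [tok]))
    return out
```

With whitespace tokenization "simple syrup" stays as ["simple", "syrup"]; with keyword tokenization it expands to all three phrases, but only when the whole field value is exactly the phrase, which matches the trade-off described later in this thread.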

triedate vs date

2012-02-05 Thread Jamie Johnson
I was recently walking through schema.xml and noticed triedate vs date and a note that triedate should be considered instead. I believe I understand the basic principle behind triedate, but is there an existing analysis that shows how much bigger an index would be if triedate were used instead of date?

Re: using pre-core properties in dih config

2012-02-05 Thread Esteban Donato
The way you described it is how DIH works with variable replacement. Alternatively, you can define the per-core properties in the SOLR_HOME//conf/solrcore.properties file as a list of key=value pairs. For the global variable, NUM_CORES, you can define it as a JVM system property, like -DNUM_CORES=3
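A minimal sketch of the idea: parse key=value lines from a solrcore.properties-style file, then substitute ${name} (or ${name:default}) placeholders from those per-core values, falling back to JVM-style system properties such as NUM_CORES. The placeholder syntax follows the form used in solrconfig.xml; the precedence (core properties before system properties), the property name `db.url`, and the helper names are all assumptions for illustration, not Solr's resolver.

```python
import re

# ${name} or ${name:default}
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def parse_properties(text):
    """Parse simple key=value lines; skips blanks and #/! comments.

    No support for line continuations, escapes, or ':' separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve(template, core_props, system_props):
    """Substitute placeholders; core props win over system props (assumed order)."""
    def sub(match):
        name, default = match.group(1), match.group(2)
        if name in core_props:
            return core_props[name]
        if name in system_props:
            return system_props[name]
        if default is not None:
            return default
        raise KeyError(name)  # unresolved placeholder with no default

    return _PLACEHOLDER.sub(sub, template)
```

Here `system_props` stands in for what the JVM would see after starting with -DNUM_CORES=3.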

Re: ReversedWildcardFilterFactory and PorterStemFilterFactory

2012-02-05 Thread Erick Erickson
The analysis chains are exactly that, chains. They really don't have a way to skip things in a context-sensitive way. So if you stem first, the stemmed stuff gets reversed and vice-versa. Are you sure that stemming then reversing is a bad solution? What's the use case that this is bad for? Other
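A toy illustration of Erick's point that a chain applies its filters strictly in order. A crude suffix stripper stands in for PorterStemFilterFactory (it is not the Porter algorithm) and plain string reversal stands in for ReversedWildcardFilterFactory (the real filter can also keep the original token and prefixes reversed tokens with a marker character, which this ignores). Swapping the order yields different indexed terms.

```python
def toy_stem(token):
    """Crude suffix stripper standing in for a stemmer (NOT Porter)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def reverse(token):
    """Stand-in for reversing tokens to support leading wildcards."""
    return token[::-1]


def run_chain(tokens, filters):
    """Apply each filter to every token, in order, like an analysis chain."""
    for f in filters:
        tokens = [f(t) for t in tokens]
    return tokens
```

Stemming first turns "running" into "runn" and then "nnur"; reversing first gives "gninnur", which the stemmer no longer recognizes, so it is indexed unstemmed.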

Re: term positions offsets and frequency

2012-02-05 Thread Erick Erickson
Well, you haven't told us anything about your setup, like how big the corpus is, how big your index is, what "large increase" means (1G? 16G?). Have a look at: http://lucene.apache.org/java/3_5_0/fileformats.html#file-names and watch the file size changes with and without the information in order
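One way to do the before/after comparison Erick suggests is to total the index directory's file sizes by extension, then diff two snapshots. In Lucene 3.x, positions live in .prx files and term vectors in the .tvx/.tvd/.tvf files, so those groups are the ones to watch. The helper names are hypothetical, and the path you pass in (your core's data/index directory) depends on your setup.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Sum file sizes in one index directory, grouped by extension.

    Files without an extension (e.g. segments_N) are keyed by full name.
    """
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1] or name
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def diff_sizes(before, after):
    """Per-extension byte difference (after - before)."""
    keys = sorted(set(before) | set(after))
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}
```

Take one snapshot after indexing without termPositions/termOffsets, reindex with them enabled, take another, and diff_sizes shows which files grew.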

Re: nutch in solr

2012-02-05 Thread Matthew Parker
Doesn't Tomcat run on port 8080, not port 8983? Or did you change Tomcat's default port to 8983? On Feb 5, 2012 5:17 AM, "alessio crisantemi" wrote: > Hi All, > I have some problems with integration of Nutch in Solr and Tomcat. > > I follo Nutch tutorial for integration and now, I can cra

Re: Multi word synonyms

2012-02-05 Thread Erick Erickson
I'm not quite sure what you're trying to do with KeywordTokenizerFactory in your SynonymFilter definition, but if I use the defaults, then the all-phrase form works just fine. So the question is "what problem are you trying to address by using KeywordTokenizerFactory?" Best Erick On Sun, Feb 5,

Re: triedate vs date

2012-02-05 Thread Erick Erickson
I'd just try it and see. Changing this should be pretty simple, so you can do a before/after comparison with your data and see whether it's acceptable in your situation... Best Erick On Sun, Feb 5, 2012 at 8:32 AM, Jamie Johnson wrote: > I was recently walking through schema.xml and noticed triedate vs date > and a note
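Before running that test, a back-of-the-envelope figure can help: a trie-encoded 64-bit value is indexed once per shift 0, p, 2p, ... below 64, where p is the precisionStep, so each value produces ceil(64/p) terms instead of one. The sketch assumes a plain single-term-per-value encoding for comparison and a precisionStep such as 6 for the trie type; both are assumptions to check against your schema. Term count is only a proxy: real index growth depends on how your dates are distributed, which is why the before/after test is the real answer.

```python
import math


def trie_terms_per_value(precision_step, bits=64):
    """Indexed terms per value: one for each shift 0, p, 2p, ... < bits."""
    if precision_step <= 0 or precision_step >= bits:
        return 1  # a single full-precision term
    return math.ceil(bits / precision_step)


def postings_estimate(num_docs, precision_step, bits=64):
    """Rough count of (term, doc) postings for one single-valued field."""
    return num_docs * trie_terms_per_value(precision_step, bits)
```

For example, precisionStep 6 gives 11 terms per date and precisionStep 8 gives 8, versus 1 for a single-term encoding.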

Re: nutch in solr

2012-02-05 Thread tamanjit.bin...@yahoo.co.in
alessio crisantemi-2, I think you've got it. Check the jars in the Nutch lib directory and see whether the Solr and SolrJ jars are the same version... That could be the issue.

Re: nutch in solr

2012-02-05 Thread alessio crisantemi
No, everything runs on port 8983. .. 2012/2/5 Matthew Parker > Doesn't tomcat run on port 8080, and not port 8983? Or did you change the > tomcat's default port to 8983? > On Feb 5, 2012 5:17 AM, "alessio crisantemi" > > wrote: > > > Hi All, > > I have some problems with integration of Nutch in Solr an

Re: nutch in solr

2012-02-05 Thread Matthew Parker
No, they don't all run on 8983. Tomcat's default port is 8080. If you're using the embedded server in Solr, you are using Jetty, which runs on port 8983. On Sun, Feb 5, 2012 at 11:54 AM, alessio crisantemi < alessio.crisant...@gmail.com> wrote: > no, all run on port 8983. > .. > > 2012/2/5 Matt
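A quick way to settle the port question is to check which port actually accepts connections, since the -solr URL passed to Nutch must point at the container that hosts Solr. This is a minimal sketch using only the standard socket module; `port_is_open` and `solr_base_url` are hypothetical helpers, and the /solr context path is the usual default but an assumption about your deployment.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def solr_base_url(host, port, context="solr"):
    """Build a base URL; the 'solr' context path is an assumption."""
    return f"http://{host}:{port}/{context}"
```

Checking `port_is_open("localhost", 8080)` and `port_is_open("localhost", 8983)` tells you whether Tomcat (8080 by default) or the example Jetty (8983) is listening.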

Re: nutch in solr

2012-02-05 Thread Geek Gamer
It looks like the SolrJ version on the Nutch classpath is different from the Solr version on the server. Can you post the versions of both Nutch and Solr? On Sun, Feb 5, 2012 at 10:24 PM, alessio crisantemi wrote: > no, all run on port 8983. > .. > > 2012/2/5 Matthew Parker > >> Doesn't tomcat run on port 8

Re: nutch in solr

2012-02-05 Thread alessio crisantemi
Looking at the Solr and Nutch libs, I found apache-solr-solrj-1.4.1.jar on Solr and solr-solrj-3.4.0.jar. These are the only jar files with the word 'solrj'. Is that the problem?! 2012/2/5 Geek Gamer > looks like solrj version in nutch classpath is different that the solr > version on server, > can

Re: nutch in solr

2012-02-05 Thread Geek Gamer
SolrJ is the Solr Java client library, so there seem to be two versions, 1.4.1 and 3.4.0, which are incompatible. You can do the following: refer to https://github.com/geek4377/nutch/commit/c66bf35ff4f86393413621b3b889b1c78281df4d to see how to upgrade the Solr version in Nutch. The above exam
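To spot this kind of mismatch without eyeballing lib directories, a small sketch can pull the version out of SolrJ jar names like the two found in this thread and compare them. The regex covers only the two naming patterns seen here, and treating a different major version as the warning sign is a heuristic assumption; the thread itself only establishes that 1.4.1 and 3.4.0 are incompatible.

```python
import re

# Matches e.g. "apache-solr-solrj-1.4.1.jar" and "solr-solrj-3.4.0.jar".
_SOLRJ = re.compile(r"^(?:apache-)?solr-solrj-(\d+(?:\.\d+)*)\.jar$")


def solrj_version(jar_name):
    """Return the version string of a SolrJ jar name, or None."""
    m = _SOLRJ.match(jar_name)
    return m.group(1) if m else None


def find_solrj_versions(jar_names):
    """Distinct SolrJ versions among the given file names, sorted."""
    return sorted({v for v in map(solrj_version, jar_names) if v})


def major(version):
    """Major version number, e.g. '3.4.0' -> 3."""
    return int(version.split(".")[0])
```

Feeding it the combined `os.listdir(...)` output of the Solr and Nutch lib directories (paths depend on your installation) and getting more than one version back is the situation described above.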

Re: nutch in solr

2012-02-05 Thread alessio crisantemi
Thanks, I'll try it and post the result asap. a. 2012/2/5 Geek Gamer > solj is the solr java client library, > > so there seem to be two versions 1.4.1 and 3.4.0, which are > incompatible, so you can do the following, > > refer : > https://github.com/geek4377/nutch/commit/c66bf35ff4f86393413621b3b889b1c7

Solr with Scala

2012-02-05 Thread deniz
Hi all, I have a question about Scala and Solr... I am curious whether we can use Solr with Scala (plugins etc.) to improve performance. Has anybody used Scala with Solr? Could you share your opinions? - Smart, but doesn't work... If he worked, he'd manage...

RE: Multi word synonyms

2012-02-05 Thread Zac Smith
Thanks for the response. This almost worked: I created a new field using KeywordTokenizerFactory as you suggested. The only problem was that searches found documents only when quotes were used. E.g., with synonyms.txt set up like this: simple syrup,sugar syrup,stock syrup I indexed a document wit

RE: Multi word synonyms

2012-02-05 Thread Zac Smith
Thanks for your response. When I don't include KeywordTokenizerFactory in the SynonymFilter definition, I get additional term values that I don't want. E.g., synonyms.txt looks like: simple syrup,sugar syrup,stock syrup A document with a value containing 'simple syrup' can now be found when

effect of continuous deletes on index's read performance

2012-02-05 Thread prasenjit mukherjee
I have a use case where documents are continuously added at 20 docs/sec (each doc add also does a commit) and docs are continuously deleted at the same rate. So the searchable index size stays constant at ~400K docs (the docs for the last 6 hours: 20*3600*6 = 432,000). Will it have pauses when deletes
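The sizing in this question is simple arithmetic, sketched below: with adds and deletes at the same rate over a sliding window, the steady-state count is rate × seconds in the window, and one commit per add means as many commits as documents. The function names are hypothetical, and `docs_per_commit` is an illustrative knob for batching, not a Solr setting.

```python
def steady_state_docs(rate_per_sec, retention_hours):
    """Docs retained when adds and deletes run at the same rate over a window."""
    return rate_per_sec * 3600 * retention_hours


def commits_per_hour(rate_per_sec, docs_per_commit=1):
    """Commits issued per hour if one commit is sent every docs_per_commit adds."""
    return rate_per_sec * 3600 // docs_per_commit
```

At 20 docs/sec over 6 hours that is 432,000 docs, and committing after every add means 72,000 commits per hour; batching 200 adds per commit would drop that to 360.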

Re: effect of continuous deletes on index's read performance

2012-02-05 Thread Otis Gospodnetic
Hi Prasenjit, It sounds like at this point your main enemy might be those per-doc-add commits. Don't commit until you need to see your new docs in results. And if you need NRT, then use the softCommit option with Solr trunk (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWith
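Instead of committing on every add, the XML update message can carry a commitWithin attribute on the <add> command, asking Solr to make the documents visible within that many milliseconds. This sketch only builds the message with the standard library; the helper name and the field names are hypothetical, and posting it to your core's /update handler is left out.

```python
import xml.etree.ElementTree as ET


def add_with_commit_within(docs, commit_within_ms):
    """Build an XML <add> message with a commitWithin attribute (in ms)."""
    add = ET.Element("add", commitWithin=str(commit_within_ms))
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes special characters
    return ET.tostring(add, encoding="unicode")
```

Sending batches like this with commitWithin=10000 replaces the per-document commits with at most one commit per roughly 10 seconds.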

Re: solr.VelocityResponseWriter error in version 3.5.0

2012-02-05 Thread Rocky Wang
I also just came across this issue. Here are some tips:
1. create a "lib" dir under your own $SOLR_HOME dir
2. copy .../dist/apache-solr-velocity-*-SNAPSHOT.jar to that dir
3. create a "contrib" dir under $SOLR_HOME
4. copy all files under .../solr/contrib in the Solr source dir
5. update your own solrconfig
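Steps 1 to 4 of these tips can be scripted, as in the sketch below. The source paths are elided in the tips, so `solr_dist_dir` and `solr_contrib_dir` are placeholders you must point at your own build, and copying the contrib files into $SOLR_HOME/contrib is an assumption about step 4's destination. Step 5 (the solrconfig change) is not covered. Requires Python 3.8+ for `dirs_exist_ok`.

```python
import glob
import os
import shutil


def install_velocity(solr_home, solr_dist_dir, solr_contrib_dir):
    """Sketch of steps 1-4: lib/ with the velocity jar, contrib/ with contrib files.

    solr_dist_dir / solr_contrib_dir are placeholders for the elided
    ".../dist" and ".../solr/contrib" paths.
    """
    lib_dir = os.path.join(solr_home, "lib")
    os.makedirs(lib_dir, exist_ok=True)  # step 1
    jars = glob.glob(os.path.join(solr_dist_dir, "apache-solr-velocity-*-SNAPSHOT.jar"))
    for jar in jars:  # step 2
        shutil.copy2(jar, lib_dir)
    contrib_dir = os.path.join(solr_home, "contrib")
    # steps 3-4: create contrib/ and copy the contrib tree into it (assumed destination)
    shutil.copytree(solr_contrib_dir, contrib_dir, dirs_exist_ok=True)
    return lib_dir, contrib_dir, [os.path.basename(j) for j in jars]
```

An empty jar list in the result means the glob found nothing, which usually points at the wrong dist directory.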

Re: effect of continuous deletes on index's read performance

2012-02-05 Thread prasenjit mukherjee
Thanks Otis. commitWithin will definitely work for me (I am currently using version 3.4, which doesn't have NRT yet). Assuming I use commitWithin=10secs, are you saying that the continuous deletes (without commit) won't have any effect on performance? I was under the impression that de