Hi All,
I have some problems with integration of Nutch in Solr and Tomcat.
I followed the Nutch tutorial for integration and now I can crawl a website: all
works right.
But if I try the Solr integration, I can't index into Solr.
Below is the Nutch output after the command:
bin/nutch crawl urls -solr htt
Read http://wiki.apache.org/solr/DataImportHandler for better method. The
FileListEntityProcessor is what you are looking for.
--
View this message in context:
http://lucene.472066.n3.nabble.com/indexing-data-on-solr-tp3717111p3717208.html
Sent from the Solr - User mailing list archive at Nabble.com.
Your query analyzer will tokenize "simple sirup" into "simple" and "sirup"
and won't match "simple syrup" in the synonyms.txt.
So you have to change the query analyzer to KeywordTokenizerFactory as
well.
It might be an idea to make a field for synonyms only, with this tokenizer, and
another field t
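As a sketch of that idea (the field type name and exact filter chain here are assumptions, not from the original message), a synonyms-only field type in schema.xml could look like:

```xml
<!-- Hypothetical "text_syn" type: KeywordTokenizerFactory keeps the whole
     value as a single token, so multi-word entries from synonyms.txt such
     as "simple syrup" are matched intact on both index and query side. -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A copyField from the main text field into a field of this type would then give you both the tokenized and the whole-phrase behavior.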
I was recently walking through schema.xml and noticed triedate vs date
and a note that triedate should be considered instead. I believe I
understand the basic principle behind triedate but is there an
analysis that exists which shows how much bigger an index would be if
triedate were used vs date?
The way you described it is how DIH works with variable replacement.
Alternatively, you can define the per-core properties in the
SOLR_HOME/<core>/conf/solrcore.properties file as a list of
key=value pairs. For the global variable, NUM_CORES, you can define
it as a JVM system property, like -DNUM_CORES=3
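A minimal sketch of that setup (all property names other than NUM_CORES are invented for illustration):

```properties
# SOLR_HOME/<core>/conf/solrcore.properties -- per-core values
dataDir=/var/data/solr/core0
shardId=core0
```

The global value is passed as a JVM system property when starting the servlet container, e.g. `java -DNUM_CORES=3 -jar start.jar`; both kinds of property can then be referenced in the config files as `${dataDir}` or `${NUM_CORES}`.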
The analysis chains are exactly that, chains. They really don't
have a way to skip things in a context-sensitive way.
So if you stem first, the stemmed stuff gets reversed and
vice-versa.
Are you sure that stemming then reversing is a bad solution?
What's the use case that this is bad for?
Other
Well, you haven't told us anything about your setup, like how big the
corpus is, how big your index is, what "large increase" means (1G?
16G?).
Have a look at:
http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
and watch the file size changes with and without the information
in order
Doesn't tomcat run on port 8080, and not port 8983? Or did you change the
tomcat's default port to 8983?
On Feb 5, 2012 5:17 AM, "alessio crisantemi"
wrote:
> Hi All,
> I have some problems with integration of Nutch in Solr and Tomcat.
>
> I follo Nutch tutorial for integration and now, I can cra
I'm not quite sure what you're trying to do with KeywordTokenizerFactory in
your SynonymFilter definition, but if I use the defaults, then the
all-phrase form works just fine.
So the question is "what problem are you trying to address by using
KeywordTokenizerFactory?"
Best
Erick
On Sun, Feb 5,
I'd just try it and see, changing this should be pretty
simple to do a before/after with your data and see
if it's acceptable in your situation...
Best
Erick
On Sun, Feb 5, 2012 at 8:32 AM, Jamie Johnson wrote:
> I was recently walking through schema.xml and noticed triedate vs date
> and a note
alessio crisantemi-2,
I think you got it. Check the jars in the Nutch lib and see if the Solr and
SolrJ jars are the same version... That could be the issue.
--
View this message in context:
http://lucene.472066.n3.nabble.com/nutch-in-solr-tp3716969p3717542.html
Sent from the Solr - User mailing list archive at Nabble.com.
No, everything runs on port 8983.
..
2012/2/5 Matthew Parker
> Doesn't tomcat run on port 8080, and not port 8983? Or did you change the
> tomcat's default port to 8983?
> On Feb 5, 2012 5:17 AM, "alessio crisantemi" >
> wrote:
>
> > Hi All,
> > I have some problems with integration of Nutch in Solr an
No, they don't all run on 8983.
Tomcat's default port is 8080.
If you're using the embedded server that ships with Solr, you are using Jetty,
which runs on port 8983.
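For reference, Tomcat's HTTP port is set on the Connector element in conf/server.xml; a sketch of changing it to match Jetty's default:

```xml
<!-- conf/server.xml: Tomcat's default port is 8080; set it to 8983
     if you want Tomcat to answer on the port the Solr tutorials assume -->
<Connector port="8983" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"/>
```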
On Sun, Feb 5, 2012 at 11:54 AM, alessio crisantemi <
alessio.crisant...@gmail.com> wrote:
> no, all run on port 8983.
> ..
>
> 2012/2/5 Matt
It looks like the SolrJ version on the Nutch classpath is different from the
Solr version on the server.
Can you post the versions for both Nutch and Solr?
On Sun, Feb 5, 2012 at 10:24 PM, alessio crisantemi
wrote:
> no, all run on port 8983.
> ..
>
> 2012/2/5 Matthew Parker
>
>> Doesn't tomcat run on port 8
If I look at the Solr and Nutch libs I find:
apache-solr-solrj-1.4.1.jar in Solr
and
solr-solrj-3.4.0.jar
These are the only jar files with the word 'solrj'.
Is that the problem?!
2012/2/5 Geek Gamer
> looks like solrj version in nutch classpath is different that the solr
> version on server,
> can
SolrJ is the Solr Java client library,
so there seem to be two versions, 1.4.1 and 3.4.0, which are
incompatible. You can do the following:
refer to:
https://github.com/geek4377/nutch/commit/c66bf35ff4f86393413621b3b889b1c78281df4d
to see how to upgrade the Solr version in Nutch. The above exam
Thanks, I'll try it and write back with the result ASAP.
a.
2012/2/5 Geek Gamer
> solj is the solr java client library,
>
> so there seem to be two versions 1.4.1 and 3.4.0, which are
> incompatible, so you can do the following,
>
> refer :
> https://github.com/geek4377/nutch/commit/c66bf35ff4f86393413621b3b889b1c7
Hi all,
I have a question about Scala and Solr... I am curious whether we can use Solr
with Scala (plugins etc.) to improve performance.
Has anybody used Scala with Solr? Could you tell me your opinions about it?
-
Zeki ama calismiyor... Calissa yapar... (Turkish: "Smart, but he doesn't
work... If he worked, he'd manage it...")
Thanks for the response. This almost worked: I created a new field using the
KeywordTokenizerFactory as you suggested. The only problem was that searches
only found documents when quotes were used.
E.g.
synonyms.txt setup like this:
simple syrup,sugar syrup,stock syrup
I indexed a document wit
Thanks for your response. When I don't include the KeywordTokenizerFactory in
the SynonymFilter definition, I get additional term values that I don't want.
e.g. synonyms.txt looks like:
simple syrup,sugar syrup,stock syrup
A document with a value containing 'simple syrup' can now be found when
I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate. So the searchable index size remains the
same: ~400K docs (docs for the last 6 hours ≈ 20*3600*6).
Will it have pauses when deletes
Hi Prasenjit,
It sounds like at this point your main enemy might be those per-doc-add
commits. Don't commit until you need to see your new docs in results. And if
you need NRT then use softCommit option with Solr trunk
(http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWith
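As a sketch, commitWithin can be set on the XML update message itself (the field names here are made up, and the 10-second value is just an example):

```xml
<!-- POSTed to /update: Solr promises a commit within 10000 ms,
     so no explicit commit per document is needed -->
<add commitWithin="10000">
  <doc>
    <field name="id">doc-1</field>
    <field name="text">example body</field>
  </doc>
</add>
```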
I also just came across this issue. Here are some tips:
1. create a "lib" dir under your own $SOLR_HOME dir
2. copy .../dist/apache-solr-velocity-*-SNAPSHOT.jar to that dir
3. create a "contrib" dir under $SOLR_HOME
4. copy all files under .../solr/contrib in the Solr source dir
5. update your own solrconfig
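For step 5, that usually means adding <lib> directives to solrconfig.xml pointing at the directories created in steps 1-4 (the exact paths here are assumptions):

```xml
<!-- solrconfig.xml: load the velocity jar from the core's lib dir
     and its dependencies from the copied contrib tree -->
<lib dir="./lib" />
<lib dir="./contrib/velocity/lib" />
```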
Thanks Otis. commitWithin will definitely work for me (as I am
currently using version 3.4, which doesn't have NRT yet).
Assuming that I use commitWithin=10secs, are you saying that the
continuous deletes (without commit) won't have any effect on
performance?
I was under the impression that de