Re: Suggester with multi terms
Blocky, shingles should be the way to go. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-with-multi-terms-tp2859547p2860419.html Sent from the Solr - User mailing list archive at Nabble.com.
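For reference, a shingle-producing field type can be sketched in schema.xml roughly as below; the type name and parameter values are invented for illustration, so check them against your Solr version:

```xml
<!-- Illustrative sketch: emits word n-grams ("shingles") so a suggester
     built on this field can complete multi-term phrases, not just words. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```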
Different Cluster Results on Different Servers, with same SOLR setup
Hi, I have the same Solr 1.4 setup on two different servers, one for production and one for staging. The production server gives proper clusters, but the staging server gives wrong clusters. The problem is with "date"-related clusters only. I have checked all the configuration and setup; everything seems fine. I am creating the index through DIH. P.S. My application and Solr setup are identical on staging and production. Please suggest a solution. -- Thanks, Pawan Darira
Re: Query regarding solr plugin.
Looking at things more carefully, it may be one of your dependent classes that's not being found. A couple of things to try. 1> when you do a 'jar -tfv <your jar>', you should see output like: 1183 Sun Jun 06 01:31:14 EDT 2010 org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class and your statement may need the whole path, as in this example... (note, this is just an example of the pathing; this class has nothing to do with your filter)... 2> But I'm guessing your path is actually OK, because otherwise I'd expect to be seeing a "class not found" error. So my guess is that your class depends on other jars that aren't packaged up in your jar, and if you find which ones they are and copy them to your lib directory you'll be OK. Or your code is throwing an error on load. Or something like that... 3> to try to understand what's up, I'd back up a step. Make a really stupid class that doesn't do anything except derive from BaseTokenFilterFactory and see if you can load that. If you can, then your process is OK and you need to find out what classes your new filter depends on. If you still can't, then we can see what else we can come up with. Best Erick On Mon, Apr 25, 2011 at 2:34 AM, rajini maski wrote: > Erick, > > Thanks. It was actually a copy mistake. Anyway, I did a redo of all the > below mentioned steps. I had given the class name as > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > I did it again now following a few different steps from this link: > http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm > > > 1) Created a new package in the src folder, *org.apache.pointcross.synonym*. This > package contains the class Synonym.java > > 2) Did a right click on the same package and selected export option -> Java > tab -> JAR File -> selected the path for the package -> finish > > 3) This created the jar file in the specified location. Then ran jar tfv on it in > cmd; the following was the output:
> > :\Apps\Rajani Eclipse\Solr141_jar>jar - > tfv org.apache.pointcross.synonym.Synonym.jar > 25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF > 383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project > 2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath > 1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc > > 4) Now placed same jar file in solr home/lib folder .Solrconfig.xml > enabled and in schema synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > 5) Restart tomcat : http://localhost:8097/finding1 > > Error SEVERE: org.apache.solr.common.SolrException: Error loading class > 'pointcross.synonym.Synonym' > at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373) > at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388) > at > org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141) > at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:835) > at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58) > > > I am basically trying to enable this jar functionality to solr. Please let > me know the mistake here. > > Rajani > > > > > On Fri, Apr 22, 2011 at 6:29 PM, Erick Erickson > wrote: > >> First I appreciate your writeup of the problem, it's very helpful when >> people >> take the time to put in the details >> >> I can't reconcile these two things: >> >> {{{> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> >> >> as org.apache.solr.common.SolrException: Error loading class >> 'pointcross.orchSynonymFilterFactory' at}}} >> >> This seems to indicate that your config file is really looking for >> "pointcross.orchSynonymFilterFactory" rather than >> "org.apachepco.search.orchSynonymFilterFactory". >> >> Do you perhaps have another definition in your config >> "pointcross.orchSynonymFilterFactory"? 
>> >> Try running "jar -tfv <your jar>" to see what classes >> are actually defined in the file in the solr lib directory. Perhaps >> it's not what you expect (Perhaps Eclipse did something >> unexpected). >> >> Given the anomaly above (the error reported doesn't correspond to >> the class you defined) I'd also look to see if you have any old >> jars lying around that you somehow get to first. >> >> Finally, is there any chance that your >> "pointcross.orchSynonymFilterFactory" >> is a dependency of "org.apachepco.search.orchSynonymFilterFactory"? In >> which case Solr may be finding >> "org.apachepco.search.orchSynonymFilterFactory" >> but failing to load a dependency (that would have to be put in the lib >> or the jar). >> >> Hope that helps >> Erick >> >> >> >> On Fri, Apr 22, 2011 at 3:00 AM, rajini maski >> wrote: >> > One doubt regarding adding the solr plugin. >> > >> > >> > I have a new java file created that includes few changes in >> > SynonymFilterFactory.java. I want this java file to be added to solr >> > inst
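Incidentally, the jar listing quoted above shows only META-INF/MANIFEST.MF, .project, .classpath and jarLog.jardesc, with no .class files at all, which by itself would explain the "Error loading class". Since a jar is just a zip archive, the same check can be scripted; the sketch below is illustrative and not part of Solr:

```python
import io
import zipfile

def classes_in_jar(jar_bytes):
    """Return the .class entries packaged in a jar (a jar is a zip archive).
    Equivalent to scanning the output of `jar -tfv myjar.jar`."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [name for name in jar.namelist() if name.endswith(".class")]

# Build a toy jar in memory resembling the one from the thread:
# Eclipse project metadata, but no compiled classes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    z.writestr(".project", "")
    z.writestr(".classpath", "")

print(classes_in_jar(buf.getvalue()))  # [] -> the factory class was never packaged
```

An empty list means the export step produced a jar without the compiled class, so Solr could never have loaded it regardless of the classpath.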
Re: Different Cluster Results on Different Servers, with same SOLR setup
There's not much information to go on here. You haven't stated the problem so people unfamiliar with your setup can understand it. What is the error you're getting? Show us the configurations, please. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Apr 25, 2011 at 4:56 AM, Pawan Darira wrote: > Hi > > I have same Solr 1.4 setup on two different servers, One for production & > One for Staging. My production server gives proper cluster & Staging server > give wrong cluster. The problem is for "date" related cluster only > > I have checked all the configuration & setup. everything seems fine. i am > creating index through "DIH" > > p.s. my application & solr setup is similar on staging & production > > please suggest any solution. > > -- > Thanks, > Pawan Darira >
Re: Unable to load EntityProcessor implementation for entity:16865747177753
Thanks firdous_kind86, I replaced TikaEntityProcessor with XPathEntityProcessor and it works fine.
how to concatenate two nodes of xml with xpathentityprocessor
Hello, I am using XPathEntityProcessor to index XML files. Below is my XML file CustomerA ThisB AnyC Now I want to concatenate in the index so that when I search it gives the result below: CData with id attribute --- like CustomerAThisB or something like that. Is it possible with RegexTransformer or TemplateTransformer? I googled a little for both but could not get an exact/useful solution. Thanks, Vishal Parekh
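If the two values come through as separate DIH columns, TemplateTransformer can join them. A rough data-config sketch follows; the entity name, URL, and xpaths are invented for illustration, since they depend on the actual XML:

```xml
<entity name="rec" processor="XPathEntityProcessor"
        url="data.xml" forEach="/records/record"
        transformer="TemplateTransformer">
  <field column="part_a" xpath="/records/record/nodeA"/>
  <field column="part_b" xpath="/records/record/nodeB"/>
  <!-- TemplateTransformer builds a new column from the two extracted ones -->
  <field column="combined" template="${rec.part_a}${rec.part_b}"/>
</entity>
```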
Re: MoreLikeThis
It finds something under "match" but just nothing under "response". I tried turning on debugQuery=on but I did not see anything that jumped out at me as a bug or anything. Is there some kind of threshold setting that I can tinker with to see if that is the problem? On Sun, Apr 24, 2011 at 2:37 AM, Grant Ingersoll wrote: > > On Apr 21, 2011, at 8:46 PM, Brian Lamb wrote: > > > Hi all, > > > > I have an mlt search set up on my site with over 2 million records in the > > index. Normally, my results look like: > > > > > > > >0 > >204 > > > > > > > > Some result. > > > > > > > > > > A similar result > > > >... > > > > > > > > And there are 100 results under response. However, in some cases, there > are > > no results under "response". Why is this the case and is there anything I > > can do about it? > > Is it because it couldn't find anything? Or are you thinking there is a > bug? You might try adding &debugQuery=true and see what gets parsed, etc. > and then try running that query. > > > > > > Here is my mlt configuration: > > > > > > > >title,score > >1 > >100 > >*,score > > > > > > > > And here is the URL I use to get results: > > http://localhost:8983/solr/mlt/?q=title:Some random title > > > > Any help on this matter would be greatly appreciated. Thanks! > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem docs using Solr/Lucene: > http://www.lucidimagination.com/search > >
RE: term position question from analyzer stack for WordDelimiterFilterFactory
Sorry, that was supposed to be just another way to say the same thing... OK look here is my current situation. Even with preserveOriginal and concatAll set, I am still getting an even odder result. I set up sku=218078624 with title=" Beanbag AppleTV Friction Dash Mount for GPS " and index it in dev. The search and index analyzer stack are the same. When I do this search in the solr admin page I get zero results " sku:218078624 title:AppleTV " but when I do this search I get one result " sku:218078624 title:appletv ". This is the opposite of what was happening before I added the preserve original setting. In the analysis page I plug in that title and term, and it looks to me like it should match... which is why I started asking about term positions and such. I don't understand why I don't get a hit in both cases. It is so weird. -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, April 22, 2011 5:55 PM To: Robert Petersen Cc: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen wrote: > I can repeatedly demonstrate this in my dev environment, where I get > entirely different results searching for AppleTV vs. appletv You originally said "I cannot get a match between AppleTV on the indexing side and appletv on the search side". Getting different numbers of results or different results is slightly different. For example, if there were a document with "Apple TV" in it, then a query of "AppleTV" would match that doc, but a query of "appletv" would not. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: term position question from analyzer stack for WordDelimiterFilterFactory
On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
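For comparison, the WDF configuration in the Solr example schema looks roughly like this: catenation enabled at index time, disabled at query time (tokenizer and other filters omitted here, so treat this as a sketch rather than the complete field type):

```xml
<analyzer type="index">
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1"/>
</analyzer>
<analyzer type="query">
  <!-- same settings, but no catenation at query time -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="1"/>
</analyzer>
```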
Good protwords.txt ?
Hi, Are there any good / comprehensive examples of protwords.txt for English? Or good stemdict.txt examples that work with StemmerOverrideFilterFactory? Would be good to have a good example to include in Solr distribution... Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
RE: Solr - Multi Term highlighting issue
Hi Robert, Thanks for your help. This looks much closer to my issue (maybe not). Unfortunately, I can't switch to Solr version 3.1 yet. I hope to revisit and update this post when I do. Thanks & regards, Rajesh Ramana Enterprise Applications, Turner Broadcasting System, Inc. 404.878.7474 -Original Message- From: Ramanathapuram, Rajesh [mailto:rajesh.ramanathapu...@turner.com] Sent: Sunday, April 24, 2011 1:58 AM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Solr - Multi Term highlighting issue I think I am using ver 1.4, I'll try to review the link you provided later today. Rajesh Ramana On Apr 24, 2011, at 12:52 AM, "Robert Muir" wrote: > On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh > wrote: >> What is really weird is if I search for srchterm1 and srchterm2 >> separately, the results come up fine. If I search for multiple terms, >> this issue seems to happen when the terms are separated by html tags >> and special characters like ') / \' etc... >> > > What version of Solr are you using? Because you are saying the issue > only happens when terms involve special characters, it's possible it > could be this bug: https://issues.apache.org/jira/browse/LUCENE-2874, > with the overlapping terms being created by the WordDelimiterFilter. > > This is fixed in 3.1.
Re: Good protwords.txt ?
On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic wrote: > Hi, > > Are there any good / comprehensive examples of protwords.txt for English? > Or good stemdict.txt examples that work with StemmerOverrideFilterFactory? > > Would be good to have a good example to include in Solr distribution... > I brought this up a while ago (as I am probably more than 50-60% done with all of this via 2+2lemma.txt) and there was no interest: http://www.lucidimagination.com/search/document/180c90276e589d68/solr_example_synonyms_file
Automatic synonyms for multiple variations of a word
Hi, How do people handle cases where synonyms are used and there are multiple versions of the original word that really need to point to the same set of synonyms? For example: Consider singular and plural of the word "responsibility". One might have synonyms defined like this: responsibility, obligation, duty But the plural "responsibilities" is not in there, and thus it will not get expanded to the synonyms above! That's a problem. Sure, one could change the synonyms file to look like this: responsibility, responsibilities, obligation, duty But that means somebody needs to think of all variations of the word! Is there something one can do to get all variations of the word to map to the same synonyms without having to explicitly specify all variations of the word? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
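One workaround (an assumption on my part, not a built-in feature) is to run the stemmer before the synonym filter and keep the synonyms file in stemmed form, so every inflection collapses to one entry before lookup. Verify the actual stems on the analysis page first, since they depend on the stemmer you use:

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- stem first, so "responsibility" and "responsibilities"
       reach the synonym filter as the same token -->
  <filter class="solr.PorterStemFilterFactory"/>
  <!-- synonyms.txt then lists stemmed forms: one line covering
       all inflections instead of one line per variant -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>
```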
RE: term position question from analyzer stack for WordDelimiterFilterFactory
Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely removing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenate all plus the preserve original settings, reindex, and see if I still get a match huh? (PS thank you very much for the help!!!) -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Lucene Rev Stump the Chump
Hey everyone, As you no doubt by now know, Lucene Revolution, the second annual Lucene/Solr conference sponsored by Lucid Imagination, is happening out in San Francisco at the end of May. There are a lot of really great talks and speakers from across the spectrum (check out lucenerevolution.org if you haven't already) on how people tackled and solved tough problems across the Lucene/Solr space. Now, it's time for _your_ toughest, most challenging Solr/Lucene questions. Back by popular demand at this year's Revolution conference, I'll be on the hot seat for "Stump The Chump!" -- where I'll spontaneously field Solr/Lucene questions I've never seen before in front of hundreds of people. But in order to be a success, we need your questions/problems/challenges. Please email a description of your Lucene/Solr problem to i...@lucenerevolution.org (don't reply here, as I don't want to see it ahead of time) You can read more details online at http://bit.ly/stump-grant Even if you won't be able to make it to San Francisco, please send in any good questions you would be interested to see me tackle under the spotlight. We'll record the session on video and post it online shortly after the conference (we're exploring a webcast -- still TBD). Grant
Re: Multi-word Solr Synonym issue
: Subject: Multi-word Solr Synonym issue : In-Reply-To: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Negative OR in fq field not working as expected
I have a field 'type' that has several values. If it's type 'foo' then it also has a field 'restriction_id'. What I want is a filter query which says "either it's not a 'foo' or if it is then it has the restriction '1'" I expect two matches - one of type 'bar' and one of type 'foo' Neither fq=(-type:foo OR restriction_id:1) fq={!dismax q.op=OR}-type:foo restriction_id:1 produce any results. fq=restriction_id:1 gets the 'foo' typed result. fq=type:bar get the 'bar' typed result. Either of these fq=type:[* TO *] OR (type:foo AND restriction_id:1) fq=type:(bar OR quux OR fleeg) OR restriction_id:1 do work but are very, very slow to the point of unusability (our indexes are pretty large). Searching round it seems like other people have experienced similar issues and the answer has been "Lucene just doesn't work like that" "When dealing with Lucene people are strongly encouraged to think in terms of MUST, MUST_NOT and SHOULD (which are represented in the query parser as the prefixes "+", "-" and the default) instead of in terms of AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's QueryParser) is not a strict Boolean Logic system, so it's best not to try and think of it like one." http://wiki.apache.org/lucene-java/BooleanQuerySyntax Am I just out of luck? Might edismax help here? Simon
Re: Negative OR in fq field not working as expected
The solr 'lucene' query parser (that's being used there, in an fq) sometimes has trouble with "pure negative" clauses in an OR. Even though it can handle "pure negative" queries like "-type:foo", it has trouble with pure negative in an OR like you are doing. At least in 1.4.1, don't know if it's been improved in 3.1. I _think_ you may have a case it has trouble with. This is what I do instead, to rewrite the query to mean the same thing but not give the lucene query parser trouble: fq=( (*:* AND -type:foo) OR restriction_id:1) "*:*" means "everything", so (*:* AND -type:foo) means the same thing as just "-type:foo", but can get around the lucene query parsers troubles. So that might work for you. Dismax has even WORSE problems with "pure negative", with no easy way to get around em, so switching to dismax is probably not helpful there. On 4/25/2011 4:27 PM, Simon Wistow wrote: I have a field 'type' that has several values. If it's type 'foo' then it also has a field 'restriction_id'. What I want is a filter query which says "either it's not a 'foo' or if it is then it has the restriction '1'" I expect two matches - one of type 'bar' and one of type 'foo' Neither fq=(-type:foo OR restriction_id:1) fq={!dismax q.op=OR}-type:foo restriction_id:1 produce any results. fq=restriction_id:1 gets the 'foo' typed result. fq=type:bar get the 'bar' typed result. Either of these fq=type:[* TO *] OR (type:foo AND restriction_id:1) fq=type:(bar OR quux OR fleeg) OR restriction_id:1 do work but are very, very slow to the point of unusability (our indexes are pretty large). Searching round it seems like other people have experienced similar issues and the answer has been "Lucene just doesn't work like that" "When dealing with Lucene people are strongly encouraged to think in terms of MUST, MUST_NOT and SHOULD (which are represented in the query parser as the prefixes "+", "-" and the default) instead of in terms of AND, OR, and NOT ... 
Lucene's Boolean Queries (and thus Lucene's QueryParser) is not a strict Boolean Logic system, so it's best not to try and think of it like one." http://wiki.apache.org/lucene-java/BooleanQuerySyntax Am I just out of luck? Might edismax help here? Simon
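The filter being asked for is plain set logic, which is easy to sanity-check outside Solr; the toy documents below are invented for illustration:

```python
# Toy corpus: one 'bar' doc, one 'foo' doc with restriction 1,
# and one 'foo' doc with a different restriction.
docs = [
    {"id": 1, "type": "bar"},
    {"id": 2, "type": "foo", "restriction_id": 1},
    {"id": 3, "type": "foo", "restriction_id": 2},
]

everything = {d["id"] for d in docs}                          # *:*
foo = {d["id"] for d in docs if d["type"] == "foo"}           # type:foo
restricted = {d["id"] for d in docs if d.get("restriction_id") == 1}

# "either it's not a 'foo', or it has restriction 1"
matches = (everything - foo) | restricted
print(sorted(matches))  # [1, 2]
```

Both expected docs match, which is exactly what the fq should return; the difficulty is only in expressing the pure-negative clause in Lucene syntax.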
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: > This is what I do instead, to rewrite the query to mean the same thing but > not give the lucene query parser trouble: > > fq=( (*:* AND -type:foo) OR restriction_id:1) > > "*:*" means "everything", so (*:* AND -type:foo) means the same thing as > just "-type:foo", but can get around the lucene query parsers troubles. > > So that might work for you. Thanks for confirming my suspicions. Unfortunately I've tried that as well and, whilst it works it's also unbelievably slow (~30s query time). Would writing my own Query Parser help here? Simon
Re: normalizing the score
: All I found was: http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score : : where Hoss suggests to normalize depending on the maxScore. to be clear, i do not (nor have i ever) suggested that someone normalize based on maxScore. my point there was that when people *insist* on providing some sort of normalization, the maxScore is always available if they want to use it : I am not comfortable with that since, at least, I want that a search for : "the wombats" in a directory of mathematical concepts, and display that : all scores are pretty bad and not display 1.0 for matches that are only : on the word "the". the crux of the problem is in deciding what you want to normalize relative to -- the "ideal" solution is to normalize relative to the maximum *possible* score for *any* query against your corpus, but that's not something that's generally feasible to do (and based on experiments i tried once, it didn't seem like it would be very useful anyway) : It seems that the strategy would be to normalize by maxScore if the maxScore is bigger than 1.0. : Can you confirm that? : Isn't there going to be similar edge cases as above? : : I remember a time where Lucene results' score were always normalized. : That seems to be not in SOLR, or? once upon a time, lucene's most "beginner friendly" api did provide normalized scores, using the approach you described (divide by max score if max score greater than 1.0) and it had all of the problems you might expect -- but some people liked it because they had an irrational dislike for scores greater than 1. Solr has never supported those pseudo-normalized scores, and lucene's java API eventually got rid of them. -Hoss
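For what it's worth, the pseudo-normalization under discussion is trivial to write down, and doing so shows the problem Hoss describes: a query matching only on "the" still reports a "perfect" top score. This is an illustrative sketch only:

```python
def pseudo_normalize(scores):
    """Old Lucene-style normalization: divide by the max score when it
    exceeds 1.0. Cosmetic only; it says nothing about match quality."""
    if not scores:
        return []
    top = max(scores)
    return [s / top for s in scores] if top > 1.0 else list(scores)

weak = pseudo_normalize([1.7, 0.3])    # matched only on "the"
strong = pseudo_normalize([9.2, 8.1])  # genuinely good matches
print(weak[0], strong[0])  # 1.0 1.0 -- indistinguishable after normalizing
```

The top hit of a bad query and the top hit of a good query both come out at 1.0, which is why normalizing by maxScore cannot drive a meaningful score bar.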
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow wrote: > On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: >> This is what I do instead, to rewrite the query to mean the same thing but >> not give the lucene query parser trouble: >> >> fq=( (*:* AND -type:foo) OR restriction_id:1) >> >> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as >> just "-type:foo", but can get around the lucene query parsers troubles. >> >> So that might work for you. > > Thanks for confirming my suspicions. > > Unfortunately I've tried that as well and, whilst it works > it's also unbelievably slow (~30s query time). It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo? bq. Would writing my own Query Parser help here? Nope. That's just syntax. If filters of the form ( (*:* AND -type:foo) OR restriction_id:1) are much slower (to the point where it causes you problems) and filters of the form (-type:foo OR restriction_id:1) are fast, then you could index the negation of the type field as well (if you know all the types) For instance, in a doc, index two type fields: type:bar type_not:foo Or if "type" is multi-valued, you could index both foo and NOT_foo in the same field. Then you could express the filter as type_not:foo OR restriction_id:1 or type:NOT_foo OR restriction_id:1 -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
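Yonik's "index the negation" idea can be sketched as a small index-time transformation; the field and type names follow his example, but the helper itself is hypothetical:

```python
ALL_TYPES = {"foo", "bar", "quux", "fleeg"}

def add_negated_types(doc):
    """For every type the doc does NOT have, populate a 'type_not' field,
    so the negative clause -type:foo becomes the positive type_not:foo."""
    doc["type_not"] = sorted(ALL_TYPES - set(doc["type"]))
    return doc

doc = add_negated_types({"id": 42, "type": ["bar"]})
print(doc["type_not"])  # ['fleeg', 'foo', 'quux']
```

The filter then becomes type_not:foo OR restriction_id:1, with no pure-negative clause for the query parser to trip over. The cost is that every document must be reindexed whenever the set of known types changes.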
Re: normalizing the score
Thanks for the precision Hoss, that is a helpful explanation. I am still unsure how it is ever possible to display score-bars, for which you need some normalization... but that's for another day. I feel indicating match quality is still somehow a science that has not blossomed yet. Sorting by score is, however, in very good shape. paul On 25 Apr 2011, at 22:53, Chris Hostetter wrote: > > > : All I found was: > http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score > : > : where Hoss suggests to normalize depending on the maxScore. > > to be clear, i do not (nor have i ever) suggested that someone normalize > based on maxScore. > > my point there was that when people *insist* on providing some sort of > normalization, the maxScore is always available if they want to use it > > : I am not comfortable with that since, at least, I want that a search for > : "the wombats" in a directory of mathematical concepts, and display that > : all scores are pretty bad and not display 1.0 for matches that are only > : on the word "the". > > the crux of the problem is in deciding what you want to normalize relative > to -- the "ideal" solution is to normalize relative to the maximum *possible* > score for *any* query against your corpus, but that's not something that's > generally feasible to do (and based on experiments i tried once, it didn't > seem like it would be very useful anyway) > > : It seems that the strategy would be to normalize by maxScore if the > maxScore is bigger than 1.0. > : Can you confirm that? > : Isn't there going to be similar edge cases as above? > : > : I remember a time where Lucene results' score were always normalized. > : That seems to be not in SOLR, or?
> > once upon a time, lucene's most "beginner friendly" api did provide > normalized scores, using the approach you described (divide by max score > if max score greater than 1.0) and it had all of the problems you might > expect -- but some people liked it because they had an irrational dislike > for scores greater than 1. > > Solr has never supported those pseudo-normalized scores, and lucene's java > API eventually got rid of them. > > -Hoss
Re: Negative OR in fq field not working as expected
Yeah, I do the (*:* AND -type:foo) OR something:else thing on my own pretty big index, and it's not slow at all. At least no slower than doing any other "X OR Y" where X and Y both include lots of results. Pre-warming the field cache for, in this case, the 'type' field may help. Same as it would if 'X' were just "type:bar" (not negated) where "type:bar" matched about the same number of documents as "-type:foo" does in your case. In general, there's nothing special that should make that slow, it's a pretty ordinary query, really. Just using weird syntax to get around lucene query parser issues. [Obligatory mention: This may have nothing to do with your issue, but I have found occasions where not having enough RAM allocated to Solr 1.4.1 can make things terribly slow, even though there is no OutOfMemory error or other error in the logs. Especially if you are doing faceting and/or StatsComponent. Exacerbated if you are using the default JVM GC strategies instead of picking some of the concurrent strategies.] On 4/25/2011 5:02 PM, Yonik Seeley wrote: On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow wrote: On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: This is what I do instead, to rewrite the query to mean the same thing but not give the lucene query parser trouble: fq=( (*:* AND -type:foo) OR restriction_id:1) "*:*" means "everything", so (*:* AND -type:foo) means the same thing as just "-type:foo", but can get around the lucene query parsers troubles. So that might work for you. Thanks for confirming my suspicions. Unfortunately I've tried that as well and, whilst it works it's also unbelievably slow (~30s query time). It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo? bq. Would writing my own Query Parser help here? Nope. That's just syntax.
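On the last point, switching collectors and fixing the heap size is typically a one-line change to the servlet container's JVM options; the values below are illustrative only and need tuning for your index:

```shell
# Hypothetical example for a Tomcat/Jetty startup script (Solr 1.4-era JVMs):
# a fixed 2 GB heap plus the concurrent mark-sweep collector.
JAVA_OPTS="$JAVA_OPTS -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
```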
If filters of the form ( (*:* AND -type:foo) OR restriction_id:1) are much slower (to the point where it causes you problems) and filters of the form type:foo) OR restriction_id:1 are fast, then you could index the negation of the type field as well (if you know all the types) For instance, in a doc, index two type fields: type:bar type_not:foo Or if "type" is multi-valued, you could index both foo and NOT_foo in the same field. Then you could express the filter as type_not:foo OR restriction_id:1 or type:NOT_foo OR restriction_id:1 -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said: > It really shouldn't be that slow... how many documents are in your > index, and how many match -type:foo? Total number of docs is 161,000,000 type:foo 39,000,000 -type:foo 122,200,000 type:bar 90,000,000 We're aware it's large and we're in the process of splitting the index up but I was just hoping that there was a workaround I could use in order to reclaim some performance.
Re: Reloading synonyms.txt without downtime
: Apparently, when one RELOADs a core, the synonyms file is not reloaded. Is this : : the expected behaviour? Is it the desired behaviour? this is not expected, nor is it desired (by me) nor can i reproduce the problem you are talking about. steps i attempted to reproduce: 1) started the example (on trunk) 2) loaded the analysis.jsp page, changed the field pulldown to "type" and entered "text" for the type name. entered "bbbfoo" in the "Field value (Query)" box, and hit the button. 3) verified that synonym filter produced "ar" as a query time synonym. 4) edited the example synonyms.txt file to add bbbxxx to the list of synonyms for bbbfoo 5) hit this url: http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1 6) went back to the analysis.jsp page and hit the button again. 7) verified that the results changed, and now bbbxxx was produced as well. If you are seeing situations where after a core reload you do *not* see changes to the synonyms.txt file, then either there is an edge case bug, or perhaps you aren't changing what you think? providing more details about your setup and steps to reproduce would be helpful. : Issue https://issues.apache.org/jira/browse/SOLR-1307 mentions this a bit, but : doesn't go in a lot of depth. I don't understand this sentence ... that issue is a feature request for a (new) general way for plugins to re-init themselves (or some aspect of their config) without requiring an entire core reload; i don't see any comments in that issue (other than the one where you mention this thread) suggesting that a core reload doesn't currently cause synonyms to reload ... if you can be specific about what you mean that would be helpful. -Hoss
Problems with Spellchecker in 3.1
Oops. Sorry. I'm hijacking my own thread to put a real Subject in place... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com > -Original Message- > From: Bob Sandiford > Sent: Monday, April 25, 2011 5:34 PM > To: solr-user@lucene.apache.org > Subject: > > Hi, all. > > We're having some troubles with the Solr Spellcheck Response. We're > running version 3.1. > > Overview: If we search for something really ugly like: " > kljhklsdjahfkljsdhf book rck" > > then when we get back the response, there's a suggestions list for > 'rck', but no suggestions list for the other two words. For 'book', > that's fine, because it is 'spelled correctly' (i.e. we got hits on the > word) and there shouldn't be any suggestions. For the ugly thing, > though, there aren't any hits. > > The problem is that when we're handling the result, we can't tell the > difference between no suggestions for a 'correctly spelled' term, and > no suggestions for something that's odd like this. > > (Now - this is happening with searches that aren't as obviously garbage > - this was just to illustrate the point). > > Our setup: > We're running multiple shards, which may be part of the issue. For > example, 'book' might be found in one of the shards, but not another. > > I don't *think* this has anything to do with our schema, since it's > really how the Search Suggestions are being returned to us. > > What we'd really like to see is the response coming back with an > indication that a word wasn't found / had no suggestions. We've hacked > around in the code a little bit to do this, but were wondering if > anyone has come across this, and what approaches you've taken. 
> Here's the xml we're getting back from the search. The response header
> shows status 0 and QTime 56, and echoes our params: spellcheck on, sort
> "score desc, RELEVANCE_SORT_nsort desc", the spellcheckedStandard handler,
> rows=1000, an fl list (ELECTRONIC_ACCESS_display ISBN_display TITLE_boost
> FORMAT_display score MEDIA_TYPE_display AUTHOR_boost LOCALURL_display
> UPC_display id DOC_ID_display CHILD_SITE_display DS_EC
> PRIMARY_AUTHOR_boost PRIMARY_TITLE_boost DS_ID TOPIC_display
> ASSET_NAME_display OCLC_display), the shards param
> (localhost:8983/solr/SD_ILS/,localhost:8983/solr/SD_ASSET/), the facet
> fields (AUTHOR_facet, FORMAT_facet, LANGUAGE_facet, PUBDATE_nfacet,
> SUBJECT_facet, ABCDEF_cfacet), the fq clauses (ACCESS_LEVEL_nfacet:"0",
> CLEARANCE_nfacet:"0", NEED_TO_KNOWS_facet:"@@EMPTY@@",
> CITIZENSHIPS_facet:"@@EMPTY@@", RESTRICTIONS_facet:"@@EMPTY@@"), and the
> query:
>
> TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^200.0
> OR PRIMARY_AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^100.0 OR
> DOC_TEXT:"kljhklsdjahfkljsdhf book rck"~100^2 OR
> PRIMARY_TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^1000.0 OR
> AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^20.0 OR
> textFuzzy:kljhklsdjahfkljsdhf~0.7 AND textFuzzy:book~0.7 AND
> textFuzzy:rck~0.7
>
> The spellcheck section has a single entry, for "rck":
>
> <lst name="rck">
>   <int name="numFound">5</int>
>   <int name="startOffset">362</int>
>   <int name="endOffset">365</int>
>   <int name="origFreq">0</int>
>   <arr name="suggestion">
>     <lst><str name="word">rock</str><int name="freq">24000</int></lst>
>     <lst><str name="word">rick</str><int name="freq">6048</int></lst>
>     <lst><str name="word">rack</str><int name="freq">84</int></lst>
>     <lst><str name="word">reck</str><int name="freq">78</int></lst>
>     <lst><str name="word">ruck</str><int name="freq">30</int></lst>
>   </arr>
> </lst>
> <bool name="correctlySpelled">false</bool>
>
> Thanks!
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
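If spellcheck.extendedResults is enabled (an assumption here; it adds an origFreq count to each per-term spellcheck entry), the frequency gives one way to make the distinction asked for above: no suggestions plus origFreq > 0 means the term exists in the index, while no suggestions plus origFreq == 0 means the term is simply unknown. A sketch over data shaped like Solr's per-term spellcheck output:

```python
def classify(term, entry):
    """Classify one query term from its spellcheck block (or None).

    `entry` mirrors one per-term section of Solr's extendedResults
    output, e.g. {"origFreq": 0, "suggestion": [{"word": "rock",
    "freq": 24000}]}. Whether every term gets a block at all can vary
    across versions and sharded setups.
    """
    if entry is None:
        # No spellcheck block at all: on its own this is ambiguous,
        # which is exactly the problem described in the message above.
        return "no-info"
    if entry.get("origFreq", 0) > 0:
        return "in-index"      # the term itself occurs in the index
    if entry.get("suggestion"):
        return "misspelled"    # unknown term, alternatives available
    return "unknown"           # unknown term, nothing to suggest
```

With sharding, origFreq would also need to be aggregated across shards before classifying, since a term may occur in one shard but not another.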
Scaling Search with Big Data/Hadoop and Solr now available at Lucene Revolution
I've worked with a lot of different Solr implementations, and one area that is emerging more and more is using Solr in combination with other "big data" solutions. My company, Lucid Imagination, has added a two-day course, "Scaling Search with Big Data and Solr", covering Hadoop and Solr. It will be presented on May 23-24 at the Lucene Revolution conference in San Francisco (the conference itself runs May 25-26 -- see lucenerevolution.org).

Description: "The class covers Hadoop from the ground up, including MapReduce, the Hadoop Distributed File System (HDFS), cluster management, etc., before continuing on to connect it to Solr. Students will study common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale. We will explore in-depth an example of processing 1 billion records to create a faceted Solr search solution."

Details here: http://lucenerevolution.org/training#solr-scaling

I've been asked by a lot of Solr users whether Lucid offers anything like this, so I know there is a lot of interest out there.

-Jay
solr sorting on multiple conditions, please help
Hi Folks, I got a problem with solr sorting:

sort=query({!v="area_id: 78153"}) desc, score desc

What I want to achieve is to sort first by whether a document matches area_id, then by the actual score. The problem is that area_id is a multi-valued field, and the results do not come back sorted by the actual score even though they all match area_id 78153. I am getting results like this:

Area 2, score 0.21
Area 3, score 0.38
Area 4, score 0.23

but the result should be like this:

Area 3, score 0.38
Area 4, score 0.23
Area 2, score 0.21

Thanks heaps in advance. Regards James
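One thing worth trying (an assumption, not a confirmed fix for the multi-valued case): query() accepts an optional second argument, the value used for documents where the embedded query does not match, so non-matching documents sort on an explicit 0 instead of a missing value:

```
sort=query({!v="area_id:78153"},0) desc, score desc
```

If the odd ordering persists even with the default set, the interaction between the multi-valued field and the function score is the thing to dig into.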
Re: Good protwords.txt ?
Hi Robert,

That's some old thread from 1969 - that's before my time! :)

I'm not sure what 2+2lemma.txt is... aha, I see it on http://wordlist.sourceforge.net/12dicts-readme-r5.html -- a headword + N related words. I don't think this will help me tame the overly aggressive Porter stemmer, although your sample "stemmer corrections for textTight, the plural-only stemmer (via StemmerOverrideFilter)" looks good and like something that *would* help me tame Porter:

errata erratum
news news
radii radius
cavalrymen cavalryman
...

Is the full dictionary you've built available anywhere for download?

Thanks,
Otis

P.S. I saw that thread at http://search-lucene.com/m/jeWPi1X3FVw started a debate over what to include by default, concerns over performance, etc. -- I'd say it's better to include things like the above and comment it out (if we are afraid of poor performance out of the box or some such) than not providing it at all.

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
> From: Robert Muir
> To: solr-user@lucene.apache.org
> Sent: Mon, April 25, 2011 2:20:45 PM
> Subject: Re: Good protwords.txt ?
>
> On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic
> wrote:
> > Hi,
> >
> > Are there any good / comprehensive examples of protwords.txt for English?
> > Or good stemdict.txt examples that work with StemmerOverrideFilterFactory?
> >
> > Would be good to have a good example to include in Solr distribution...
>
> I brought this up a while ago (as I am probably more than 50-60% done
> with all of this via 2+2lemma.txt) and there was no interest:
>
> http://www.lucidimagination.com/search/document/180c90276e589d68/solr_example_synonyms_file
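A sketch of how such a corrections list plugs in, assuming Solr's StemmerOverrideFilterFactory (the field type name and file name are illustrative): the dictionary file holds tab-separated word/stem pairs like the sample above, and the filter must sit before the stemmer so the overrides win.

```xml
<!-- Illustrative analyzer chain: entries in stemdict.txt override Porter -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StemmerOverrideFilterFactory"
            dictionary="stemdict.txt" ignoreCase="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Tokens matched by the dictionary are marked as keywords with their override stem, so the Porter filter leaves them alone.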
Re: Automatic synonyms for multiple variations of a word
Hi Otis & Robert,

- Original Message
>
> How do people handle cases where synonyms are used and there are multiple
> versions of the original word that really need to point to the same set of
> synonyms?
>
> For example:
> Consider singular and plural of the word "responsibility". One might have
> synonyms defined like this:
>
> responsibility, obligation, duty
>
> But the plural "responsibilities" is not in there, and thus it will not get
> expanded to the synonyms above! That's a problem.
>
> Sure, one could change the synonyms file to look like this:
>
> responsibility, responsibilities, obligation, duty
>
> But that means somebody needs to think of all variations of the word!

Yes, that seems to be the case now, as it was in 2008:
http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited
http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that suggestion doesn't actually work)

> Is there something one can do to get all variations of the word to map to the
> same synonyms without having to explicitly specify all variations of the word?

I think this is where Robert's 2+2lemma pointer may help, because the 2+2lemma list contains "records" where a headword is followed by a list of other variations of the word. The way I think this would help is by simply taking that list and turning it into the synonyms file format, and then merging in the actual synonyms.

For example, if I have the word "responsibility", then from 2+2lemma I should be able to get that "responsibilities" is one of the variants of "responsibility". I should then be able to take those 2 words and stick them in the synonyms file like this:

responsibility, responsibilities

And then append actual synonyms to that:

responsibility, responsibilities, obligation, duty

But I may then need to actually expand the synonyms themselves, too (again using data from 2+2lemma):

responsibility, responsibilities, obligation, obligations, duty, duties

I haven't tried this yet.
Just theorizing and hoping for feedback. Does this sound about right? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
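The merging described above can be sketched as follows (the variants mapping is illustrative, standing in for data parsed out of a lemma list such as 2+2lemma.txt; the parsing itself is not shown):

```python
def expand_synonyms(groups, variants):
    """Expand each synonym group with known inflected variants.

    groups:   list of synonym groups, one per synonyms.txt line.
    variants: headword -> list of its other forms (stand-in for data
              from a lemma list such as 2+2lemma.txt).
    """
    expanded = []
    for group in groups:
        out = []
        for word in group:
            out.append(word)
            out.extend(variants.get(word, []))  # keep word, add variants
        expanded.append(out)
    return expanded

# Each expanded group serializes back to one synonyms.txt line:
lines = [", ".join(g) for g in expand_synonyms(
    [["responsibility", "obligation", "duty"]],
    {"responsibility": ["responsibilities"],
     "obligation": ["obligations"],
     "duty": ["duties"]})]
print(lines[0])
# -> responsibility, responsibilities, obligation, obligations, duty, duties
```

The printed line matches the final expanded example in the message above.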
Re: Automatic synonyms for multiple variations of a word
This has come up with stemming: you can stem your synonym list with the FieldAnalyzer Solr http call, then save the final chewed-up terms as a new synonym file. You then use that one in the analyzer stack below the stemmer filter. On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic wrote: > Hi Otis & Robert, > > - Original Message > >> >> How do people handle cases where synonyms are used and there are multiple >> version of the original word that really need to point to the same set of >> synonyms? >> >> For example: >> Consider singular and plural of the word "responsibility". One might have >> synonyms defined like this: >> >> responsibility, obligation, duty >> >> But the plural "responsibilities" is not in there, and thus it will not get >> expanded to the synonyms above! That's a problem. >> >> Sure, one could change the synonyms file to look like this: >> >> responsibility, responsibilities, obligation, duty >> >> But that means somebody needs to think of all variations of the word! > > Yes, that seems to be the case now, as it was in 2008: > http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited > http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that > suggestion doesn't actually work) > >> Is there a something one can do to get all variations of the word to map to >>the >> >> same synonyms without having to explicitly specify all variations of the > word? > > I think this is where Robert's 2+2lemma pointer may help because the 2+lemma > list contains "records" where a headword is followed by a list of other > variations of the word. The way I think this would help is by simply taking > that list and turning it into the synonyms file format, and then merging in > the > actual synonyms. > > For example, if I have the word "responsibility", then from 2+2lemma I should > be > able to get that "responsibilities" is one of the variants of > "responsibility". 
> I should then be able to take those 2 words and stick them in synonyms file > like > this: > > responsibility, responsibilities > > And then append actual synonyms to that: > > responsibility, responsibilities, obligation, duty > > But I may then need to actually expand synonyms themselves, too (again using > data from 2+2lemma): > > responsibility, responsibilities, obligation, obligations, duty, duties > > > I haven't tried this yet. Just theorizing and hoping for feedback. > > Does this sound about right? > > Thanks, > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > -- Lance Norskog goks...@gmail.com
Re: Automatic synonyms for multiple variations of a word
Right, instead of this in synonyms file: responsibility, obligation, duty I could stem each of the above words/synonyms and have something like this in synonyms file: respons, oblig, duti But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). I've never seen anyone actually use such a synonyms file in production, have you? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Lance Norskog > To: solr-user@lucene.apache.org > Sent: Tue, April 26, 2011 12:20:05 AM > Subject: Re: Automatic synonyms for multiple variations of a word > > This has come up with stemming: you can stem your synonym list with > the FieldAnalyzer Solr http call, then save the final chewed-up terms > as a new synonym file. You then use that one in the analyzer stack > below the stemmer filter. > > On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic > wrote: > > Hi Otis & Robert, > > > > - Original Message > > > >> > >> How do people handle cases where synonyms are used and there are multiple > >> version of the original word that really need to point to the same set of > >> synonyms? > >> > >> For example: > >> Consider singular and plural of the word "responsibility". One might have > >> synonyms defined like this: > >> > >> responsibility, obligation, duty > >> > >> But the plural "responsibilities" is not in there, and thus it will not >get > >> expanded to the synonyms above! That's a problem. > >> > >> Sure, one could change the synonyms file to look like this: > >> > >> responsibility, responsibilities, obligation, duty > >> > >> But that means somebody needs to think of all variations of the word! 
> > > > Yes, that seems to be the case now, as it was in 2008: > > >http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited > > http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that > > suggestion doesn't actually work) > > > >> Is there a something one can do to get all variations of the word to map > >> >to > >>the > >> > >> same synonyms without having to explicitly specify all variations of the > > word? > > > > I think this is where Robert's 2+2lemma pointer may help because the 2+lemma > > list contains "records" where a headword is followed by a list of other > > variations of the word. The way I think this would help is by simply >taking > > that list and turning it into the synonyms file format, and then merging > > in >the > > actual synonyms. > > > > For example, if I have the word "responsibility", then from 2+2lemma I >should be > > able to get that "responsibilities" is one of the variants of >"responsibility". > > I should then be able to take those 2 words and stick them in synonyms > > file >like > > this: > > > > responsibility, responsibilities > > > > And then append actual synonyms to that: > > > > responsibility, responsibilities, obligation, duty > > > > But I may then need to actually expand synonyms themselves, too (again using > > data from 2+2lemma): > > > > responsibility, responsibilities, obligation, obligations, duty, duties > > > > > > I haven't tried this yet. Just theorizing and hoping for feedback. > > > > Does this sound about right? > > > > Thanks, > > Otis > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > > Lucene ecosystem search :: http://search-lucene.com/ > > > > > > > > -- > Lance Norskog > goks...@gmail.com >
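The transformation discussed above (stemming every term of each synonyms line before use) can be sketched with the stemmer abstracted behind a function; the word-to-stem mapping in the example is illustrative, standing in for what Solr's own analyzer (e.g. via the field analysis page) would report for each term:

```python
def stem_synonym_line(line, stem):
    """Apply `stem` (any word -> stem function) to every term of one
    comma-separated synonyms.txt line, de-duplicating stems that
    collide while preserving order."""
    seen, out = set(), []
    for term in (t.strip() for t in line.split(",")):
        s = stem(term)
        if s not in seen:
            seen.add(s)
            out.append(s)
    return ", ".join(out)

# Illustrative stems, roughly what Porter produces for these words:
stems = {"responsibility": "respons", "responsibilities": "respons",
         "obligation": "oblig", "duty": "duti"}
print(stem_synonym_line("responsibility, responsibilities, obligation, duty",
                        lambda w: stems[w]))
# -> respons, oblig, duti
```

Using Solr's actual analysis output rather than a hand-maintained mapping avoids the editing problem Otis raises: the file is regenerated mechanically whenever the human-readable synonyms change.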
Re: Query regarding solr plugin.
Thanks Erick. I have added my replies to the points you mentioned. I am going wrong somewhere. Do I need to club both the jars together or something? If yes, how do I do that? I don't have much idea about java and jar files. Please guide me here.

1> when you do a 'jar -tfv', you should see output like:
1183 Sun Jun 06 01:31:14 EDT 2010 org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class
and your statement may need the whole path, in this example... (note, this is just an example of the pathing, this class has nothing to do with your filter)...

I could see this output.

2> But I'm guessing your path is actually OK, because I'd expect to be seeing a "class not found" error. So my guess is that your class depends on other jars that aren't packaged up in your jar, and if you find which ones they are and copy them to your lib directory you'll be OK. Or your code is throwing an error on load. Or something like that...

There is a jar - "apache-solr-core-1.4.1.jar" - which has the BaseTokenFilterFactory class and the SynonymFilterFactory class. I made my changes in the second class file and created it as a new class. Then I created a jar of that java file and placed it in solr home/lib, and also placed the "apache-solr-core-1.4.1.jar" file in the lib folder of solr home. [solr home - c:\orch\search\solr, lib path - c:\orch\search\solr\lib]

3> to try to understand what's up, I'd back up a step. Make a really stupid class that doesn't do anything except derive from BaseTokenFilterFactory and see if you can load that. If you can, then your process is OK and you need to find out what classes your new filter depends on. If you still can't, then we can see what else we can come up with...

I am perhaps doing the same. In the SynonymFilterFactory class there is a parse-rules function which takes the delimiter as one of its input parameters. I changed the comma ',' to the tilde '~' symbol, and that's it.
Regards, Rajani On Mon, Apr 25, 2011 at 6:26 PM, Erick Erickson wrote: > Looking at things more carefully, it may be one of your dependent classes > that's not being found. > > A couple of things to try. > > 1> when you do a 'jar -tfv ", you should see > output like: > 1183 Sun Jun 06 01:31:14 EDT 2010 > org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class > and your statement may need the whole path, in this example... > (note, > this > is just an example of the pathing, this class has nothing to do with > your filter)... > > 2> But I'm guessing your path is actually OK, because I'd expect to be > seeing a > "class not found" error. So my guess is that your class depends on > other jars that > aren't packaged up in your jar and if you find which ones they are and copy > them > to your lib directory you'll be OK. Or your code is throwing an error > on load. Or > something like that... > > 3> to try to understand what's up, I'd back up a step. Make a really > stupid class > that doesn't do anything except derive from BaseTokenFilterFacotry and see > if > you can load that. If you can, then your process is OK and you need to > find out what classes your new filter depend on. If you still can't, then > we can > see what else we can come up with.. > > Best > Erick > > On Mon, Apr 25, 2011 at 2:34 AM, rajini maski > wrote: > > Erick , > > * > > * > > * Thanks.* It was actually a copy mistake. Anyways i did a redo of all > the > > below mentioned steps. I had given class name as > > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > > > I did it again now following few different steps following this link : > > > http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm > > > > > > 1 ) Created new package in src folder . 
> *org.apache.pointcross.synonym*. This
> > is having class Synonym.java
> >
> > 2) Now did a right click on same package and selected export option -> Java
> > tab -> JAR File -> Selected the path for package -> finish
> >
> > 3) This created jar file in specified location. Now followed in cmd: jar
> > tfv org.apache.pointcross.synonym. the following was desc in cmd.
> >
> > :\Apps\Rajani Eclipse\Solr141_jar>jar -tfv org.apache.pointcross.synonym.Synonym.jar
> > 25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF
> > 383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project
> > 2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath
> > 1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc
> >
> > 4) Now placed same jar file in solr home/lib folder. Solrconfig.xml
> > enabled and in schema: <filter class="synonym.Synonym"
> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >
> > 5) Restart tomcat : http://localhost:8097/finding1
> >
> > Error SEVERE: org.apache.solr.common.SolrException: Error loading class
> > 'pointcross.synonym.Synonym'
> > at
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
> > at
> > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
> >