Re: Suggester with multi terms
Blocky, shingles should be the way to go. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-with-multi-terms-tp2859547p2860419.html Sent from the Solr - User mailing list archive at Nabble.com.
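For reference, a shingle-producing field type can be sketched in schema.xml roughly as below; the type name and parameter values are invented for illustration, so check them against your Solr version:

```xml
<!-- Illustrative sketch: emits word n-grams ("shingles") so a suggester
     built on this field can complete multi-term phrases, not just words. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```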
Different Cluster Results on Different Servers, with same SOLR setup
Hi, I have the same Solr 1.4 setup on two different servers, one for production and one for staging. The production server gives proper clusters, but the staging server gives wrong clusters. The problem is with "date"-related clusters only. I have checked all the configuration and setup; everything seems fine. I am creating the index through DIH. P.S. My application and Solr setup are identical on staging and production. Please suggest a solution. -- Thanks, Pawan Darira
Re: Query regarding solr plugin.
Looking at things more carefully, it may be one of your dependent classes that's not being found. A couple of things to try. 1> when you do a 'jar -tfv <your jar>', you should see output like: 1183 Sun Jun 06 01:31:14 EDT 2010 org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class and your statement may need the whole path, as in this example... (note, this is just an example of the pathing; this class has nothing to do with your filter)... 2> But I'm guessing your path is actually OK, because otherwise I'd expect to be seeing a "class not found" error. So my guess is that your class depends on other jars that aren't packaged up in your jar, and if you find which ones they are and copy them to your lib directory you'll be OK. Or your code is throwing an error on load. Or something like that... 3> to try to understand what's up, I'd back up a step. Make a really stupid class that doesn't do anything except derive from BaseTokenFilterFactory and see if you can load that. If you can, then your process is OK and you need to find out what classes your new filter depends on. If you still can't, then we can see what else we can come up with. Best Erick On Mon, Apr 25, 2011 at 2:34 AM, rajini maski wrote: > Erick, > > Thanks. It was actually a copy mistake. Anyway, I did a redo of all the > below mentioned steps. I had given the class name as > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > I did it again now following a few different steps from this link: > http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm > > > 1) Created a new package in the src folder, *org.apache.pointcross.synonym*. This > package contains the class Synonym.java > > 2) Did a right click on the same package and selected export option -> Java > tab -> JAR File -> selected the path for the package -> finish > > 3) This created the jar file in the specified location. Then ran jar tfv on it in > cmd; the following was the output:
> > :\Apps\Rajani Eclipse\Solr141_jar>jar - > tfv org.apache.pointcross.synonym.Synonym.jar > 25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF > 383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project > 2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath > 1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc > > 4) Now placed same jar file in solr home/lib folder .Solrconfig.xml > enabled and in schema synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > 5) Restart tomcat : http://localhost:8097/finding1 > > Error SEVERE: org.apache.solr.common.SolrException: Error loading class > 'pointcross.synonym.Synonym' > at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373) > at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388) > at > org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141) > at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:835) > at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58) > > > I am basically trying to enable this jar functionality to solr. Please let > me know the mistake here. > > Rajani > > > > > On Fri, Apr 22, 2011 at 6:29 PM, Erick Erickson > wrote: > >> First I appreciate your writeup of the problem, it's very helpful when >> people >> take the time to put in the details >> >> I can't reconcile these two things: >> >> {{{> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> >> >> as org.apache.solr.common.SolrException: Error loading class >> 'pointcross.orchSynonymFilterFactory' at}}} >> >> This seems to indicate that your config file is really looking for >> "pointcross.orchSynonymFilterFactory" rather than >> "org.apachepco.search.orchSynonymFilterFactory". >> >> Do you perhaps have another definition in your config >> "pointcross.orchSynonymFilterFactory"? 
>> >> Try running "jar -tfv <your jar>" to see what classes >> are actually defined in the file in the solr lib directory. Perhaps >> it's not what you expect (Perhaps Eclipse did something >> unexpected). >> >> Given the anomaly above (the error reported doesn't correspond to >> the class you defined) I'd also look to see if you have any old >> jars lying around that you somehow get to first. >> >> Finally, is there any chance that your >> "pointcross.orchSynonymFilterFactory" >> is a dependency of "org.apachepco.search.orchSynonymFilterFactory"? In >> which case Solr may be finding >> "org.apachepco.search.orchSynonymFilterFactory" >> but failing to load a dependency (that would have to be put in the lib >> or the jar). >> >> Hope that helps >> Erick >> >> >> >> On Fri, Apr 22, 2011 at 3:00 AM, rajini maski >> wrote: >> > One doubt regarding adding the solr plugin. >> > >> > >> > I have a new java file created that includes few changes in >> > SynonymFilterFactory.java. I want this java file to be added to solr >> > inst
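Incidentally, the jar listing quoted above shows only META-INF/MANIFEST.MF, .project, .classpath and jarLog.jardesc, with no .class files at all, which by itself would explain the "Error loading class". Since a jar is just a zip archive, the same check can be scripted; the sketch below is illustrative and not part of Solr:

```python
import io
import zipfile

def classes_in_jar(jar_bytes):
    """Return the .class entries packaged in a jar (a jar is a zip archive).
    Equivalent to scanning the output of `jar -tfv myjar.jar`."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [name for name in jar.namelist() if name.endswith(".class")]

# Build a toy jar in memory resembling the one from the thread:
# Eclipse project metadata, but no compiled classes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    z.writestr(".project", "")
    z.writestr(".classpath", "")

print(classes_in_jar(buf.getvalue()))  # [] -> the factory class was never packaged
```

An empty list means the export step produced a jar without the compiled class, so Solr could never have loaded it regardless of the classpath.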
Re: Different Cluster Results on Different Servers, with same SOLR setup
There's not much information to go on here. You haven't stated the problem so people unfamiliar with your setup can understand it. What is the error you're getting? Show us the configurations, please. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Apr 25, 2011 at 4:56 AM, Pawan Darira wrote: > Hi > > I have same Solr 1.4 setup on two different servers, One for production & > One for Staging. My production server gives proper cluster & Staging server > give wrong cluster. The problem is for "date" related cluster only > > I have checked all the configuration & setup. everything seems fine. i am > creating index through "DIH" > > p.s. my application & solr setup is similar on staging & production > > please suggest any solution. > > -- > Thanks, > Pawan Darira >
Re: Unable to load EntityProcessor implementation for entity:16865747177753
Thanks firdous_kind86, I replaced TikaEntityProcessor with XPathEntityProcessor and it works fine.
how to concatenate two nodes of xml with xpathentityprocessor
Hello, I am using XPathEntityProcessor to index XML files. Below is my XML file CustomerA ThisB AnyC Now I want to concatenate in the index so that when I search it gives the result below: CData with id attribute --- like CustomerAThisB or something like that. Is it possible with RegexTransformer or TemplateTransformer? I googled a little for both but could not get an exact/useful solution. Thanks, Vishal Parekh
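If the two values come through as separate DIH columns, TemplateTransformer can join them. A rough data-config sketch follows; the entity name, URL, and xpaths are invented for illustration, since they depend on the actual XML:

```xml
<entity name="rec" processor="XPathEntityProcessor"
        url="data.xml" forEach="/records/record"
        transformer="TemplateTransformer">
  <field column="part_a" xpath="/records/record/nodeA"/>
  <field column="part_b" xpath="/records/record/nodeB"/>
  <!-- TemplateTransformer builds a new column from the two extracted ones -->
  <field column="combined" template="${rec.part_a}${rec.part_b}"/>
</entity>
```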
Re: MoreLikeThis
It finds something under "match" but just nothing under "response". I tried turning on debugQuery=on but I did not see anything that jumped out at me as a bug or anything. Is there some kind of threshold setting that I can tinker with to see if that is the problem? On Sun, Apr 24, 2011 at 2:37 AM, Grant Ingersoll wrote: > > On Apr 21, 2011, at 8:46 PM, Brian Lamb wrote: > > > Hi all, > > > > I have an mlt search set up on my site with over 2 million records in the > > index. Normally, my results look like: > > > > > > > >0 > >204 > > > > > > > > Some result. > > > > > > > > > > A similar result > > > >... > > > > > > > > And there are 100 results under response. However, in some cases, there > are > > no results under "response". Why is this the case and is there anything I > > can do about it? > > Is it because it couldn't find anything? Or are you thinking there is a > bug? You might try adding &debugQuery=true and see what gets parsed, etc. > and then try running that query. > > > > > > Here is my mlt configuration: > > > > > > > >title,score > >1 > >100 > >*,score > > > > > > > > And here is the URL I use to get results: > > http://localhost:8983/solr/mlt/?q=title:Some random title > > > > Any help on this matter would be greatly appreciated. Thanks! > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem docs using Solr/Lucene: > http://www.lucidimagination.com/search > >
RE: term position question from analyzer stack for WordDelimiterFilterFactory
Sorry, that was supposed to be just another way to say the same thing... OK look here is my current situation. Even with preserveOriginal and concatAll set, I am still getting an even odder result. I set up sku=218078624 with title=" Beanbag AppleTV Friction Dash Mount for GPS " and index it in dev. The search and index analyzer stack are the same. When I do this search in the solr admin page I get zero results " sku:218078624 title:AppleTV " but when I do this search I get one result " sku:218078624 title:appletv ". This is the opposite of what was happening before I added the preserve original setting. In the analysis page I plug in that title and term, and it looks to me like it should match... which is why I started asking about term positions and such. I don't understand why I don't get a hit in both cases. It is so weird. -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, April 22, 2011 5:55 PM To: Robert Petersen Cc: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen wrote: > I can repeatedly demonstrate this in my dev environment, where I get > entirely different results searching for AppleTV vs. appletv You originally said "I cannot get a match between AppleTV on the indexing side and appletv on the search side". Getting different numbers of results or different results is slightly different. For example, if there were a document with "Apple TV" in it, then a query of "AppleTV" would match that doc, but a query of "appletv" would not. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: term position question from analyzer stack for WordDelimiterFilterFactory
On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
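For comparison, the WDF configuration in the Solr example schema looks roughly like this: catenation enabled at index time, disabled at query time (tokenizer and other filters omitted here, so treat this as a sketch rather than the complete field type):

```xml
<analyzer type="index">
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1"/>
</analyzer>
<analyzer type="query">
  <!-- same settings, but no catenation at query time -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="1"/>
</analyzer>
```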
Good protwords.txt ?
Hi, Are there any good / comprehensive examples of protwords.txt for English? Or good stemdict.txt examples that work with StemmerOverrideFilterFactory? Would be good to have a good example to include in Solr distribution... Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
RE: Solr - Multi Term highlighting issue
Hi Robert, Thanks for your help. This looks much closer to my issue (maybe not). Unfortunately, I can't switch to Solr version 3.1 yet. I hope to revisit and update this post when I do. Thanks & regards, Rajesh Ramana Enterprise Applications, Turner Broadcasting System, Inc. 404.878.7474 -Original Message- From: Ramanathapuram, Rajesh [mailto:rajesh.ramanathapu...@turner.com] Sent: Sunday, April 24, 2011 1:58 AM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Solr - Multi Term highlighting issue I think I am using ver 1.4, I'll try to review the link you provided later today. Rajesh Ramana On Apr 24, 2011, at 12:52 AM, "Robert Muir" wrote: > On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh > wrote: >> What is really weird is if I search for srchterm1 and srchterm2 >> separately, the results come up fine. If I search for multiple terms, >> this issue seems to happen when the terms are separated by html tags >> and special characters like ') / \' etc... >> > > What version of Solr are you using? Because you are saying the issue > only happens when terms involve special characters, it's possible it > could be this bug: https://issues.apache.org/jira/browse/LUCENE-2874, > with the overlapping terms being created by the WordDelimiterFilter. > > This is fixed in 3.1.
Re: Good protwords.txt ?
On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic wrote: > Hi, > > Are there any good / comprehensive examples of protwords.txt for English? > Or good stemdict.txt examples that work with StemmerOverrideFilterFactory? > > Would be good to have a good example to include in Solr distribution... > I brought this up a while ago (as I am probably more than 50-60% done with all of this via 2+2lemma.txt) and there was no interest: http://www.lucidimagination.com/search/document/180c90276e589d68/solr_example_synonyms_file
Automatic synonyms for multiple variations of a word
Hi, How do people handle cases where synonyms are used and there are multiple versions of the original word that really need to point to the same set of synonyms? For example: Consider singular and plural of the word "responsibility". One might have synonyms defined like this: responsibility, obligation, duty But the plural "responsibilities" is not in there, and thus it will not get expanded to the synonyms above! That's a problem. Sure, one could change the synonyms file to look like this: responsibility, responsibilities, obligation, duty But that means somebody needs to think of all variations of the word! Is there something one can do to get all variations of the word to map to the same synonyms without having to explicitly specify all variations of the word? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
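One workaround (an assumption on my part, not a built-in feature) is to run the stemmer before the synonym filter and keep the synonyms file in stemmed form, so every inflection collapses to one entry before lookup. Verify the actual stems on the analysis page first, since they depend on the stemmer you use:

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- stem first, so "responsibility" and "responsibilities"
       reach the synonym filter as the same token -->
  <filter class="solr.PorterStemFilterFactory"/>
  <!-- synonyms.txt then lists stemmed forms: one line covering
       all inflections instead of one line per variant -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>
```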
RE: term position question from analyzer stack for WordDelimiterFilterFactory
Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely removing WDF from the query side analyzer stack but that didn't work. So anyway I suppose I should turn off the catenate all plus the preserve original settings, reindex, and see if I still get a match huh? (PS thank you very much for the help!!!) -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen wrote: > The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory I'd recommend going back to the WDF settings in the solr example server as a starting point. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Lucene Rev Stump the Chump
Hey everyone, As you no doubt by now know, Lucene Revolution, the second annual Lucene/Solr conference sponsored by Lucid Imagination, is happening out in San Francisco at the end of May. There are a lot of really great talks and speakers from across the spectrum (check out lucenerevolution.org if you haven't already) on how people tackled and solved tough problems across the Lucene/Solr space. Now, it's time for _your_ toughest, most challenging Solr/Lucene questions. Back by popular demand at this year's Revolution conference, I'll be on the hot seat for "Stump The Chump!" -- where I'll spontaneously field Solr/Lucene questions I've never seen before in front of hundreds of people. But in order to be a success, we need your questions/problems/challenges. Please email a description of your Lucene/Solr problem to i...@lucenerevolution.org (don't reply here, as I don't want to see it ahead of time) You can read more details online at http://bit.ly/stump-grant Even if you won't be able to make it to San Francisco, please send in any good questions you would be interested to see me tackle under the spotlight. We'll record the session on video and post it online shortly after the conference (we're exploring a webcast -- still TBD). Grant
Re: Multi-word Solr Synonym issue
: Subject: Multi-word Solr Synonym issue : In-Reply-To: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Negative OR in fq field not working as expected
I have a field 'type' that has several values. If it's type 'foo' then it also has a field 'restriction_id'. What I want is a filter query which says "either it's not a 'foo' or if it is then it has the restriction '1'" I expect two matches - one of type 'bar' and one of type 'foo' Neither fq=(-type:foo OR restriction_id:1) fq={!dismax q.op=OR}-type:foo restriction_id:1 produce any results. fq=restriction_id:1 gets the 'foo' typed result. fq=type:bar get the 'bar' typed result. Either of these fq=type:[* TO *] OR (type:foo AND restriction_id:1) fq=type:(bar OR quux OR fleeg) OR restriction_id:1 do work but are very, very slow to the point of unusability (our indexes are pretty large). Searching round it seems like other people have experienced similar issues and the answer has been "Lucene just doesn't work like that" "When dealing with Lucene people are strongly encouraged to think in terms of MUST, MUST_NOT and SHOULD (which are represented in the query parser as the prefixes "+", "-" and the default) instead of in terms of AND, OR, and NOT ... Lucene's Boolean Queries (and thus Lucene's QueryParser) is not a strict Boolean Logic system, so it's best not to try and think of it like one." http://wiki.apache.org/lucene-java/BooleanQuerySyntax Am I just out of luck? Might edismax help here? Simon
Re: Negative OR in fq field not working as expected
The solr 'lucene' query parser (that's being used there, in an fq) sometimes has trouble with "pure negative" clauses in an OR. Even though it can handle "pure negative" queries like "-type:foo", it has trouble with pure negative in an OR like you are doing. At least in 1.4.1, don't know if it's been improved in 3.1. I _think_ you may have a case it has trouble with. This is what I do instead, to rewrite the query to mean the same thing but not give the lucene query parser trouble: fq=( (*:* AND -type:foo) OR restriction_id:1) "*:*" means "everything", so (*:* AND -type:foo) means the same thing as just "-type:foo", but can get around the lucene query parsers troubles. So that might work for you. Dismax has even WORSE problems with "pure negative", with no easy way to get around em, so switching to dismax is probably not helpful there. On 4/25/2011 4:27 PM, Simon Wistow wrote: I have a field 'type' that has several values. If it's type 'foo' then it also has a field 'restriction_id'. What I want is a filter query which says "either it's not a 'foo' or if it is then it has the restriction '1'" I expect two matches - one of type 'bar' and one of type 'foo' Neither fq=(-type:foo OR restriction_id:1) fq={!dismax q.op=OR}-type:foo restriction_id:1 produce any results. fq=restriction_id:1 gets the 'foo' typed result. fq=type:bar get the 'bar' typed result. Either of these fq=type:[* TO *] OR (type:foo AND restriction_id:1) fq=type:(bar OR quux OR fleeg) OR restriction_id:1 do work but are very, very slow to the point of unusability (our indexes are pretty large). Searching round it seems like other people have experienced similar issues and the answer has been "Lucene just doesn't work like that" "When dealing with Lucene people are strongly encouraged to think in terms of MUST, MUST_NOT and SHOULD (which are represented in the query parser as the prefixes "+", "-" and the default) instead of in terms of AND, OR, and NOT ... 
Lucene's Boolean Queries (and thus Lucene's QueryParser) is not a strict Boolean Logic system, so it's best not to try and think of it like one." http://wiki.apache.org/lucene-java/BooleanQuerySyntax Am I just out of luck? Might edismax help here? Simon
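The filter being asked for is plain set logic, which is easy to sanity-check outside Solr; the toy documents below are invented for illustration:

```python
# Toy corpus: one 'bar' doc, one 'foo' doc with restriction 1,
# and one 'foo' doc with a different restriction.
docs = [
    {"id": 1, "type": "bar"},
    {"id": 2, "type": "foo", "restriction_id": 1},
    {"id": 3, "type": "foo", "restriction_id": 2},
]

everything = {d["id"] for d in docs}                          # *:*
foo = {d["id"] for d in docs if d["type"] == "foo"}           # type:foo
restricted = {d["id"] for d in docs if d.get("restriction_id") == 1}

# "either it's not a 'foo', or it has restriction 1"
matches = (everything - foo) | restricted
print(sorted(matches))  # [1, 2]
```

Both expected docs match, which is exactly what the fq should return; the difficulty is only in expressing the pure-negative clause in Lucene syntax.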
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: > This is what I do instead, to rewrite the query to mean the same thing but > not give the lucene query parser trouble: > > fq=( (*:* AND -type:foo) OR restriction_id:1) > > "*:*" means "everything", so (*:* AND -type:foo) means the same thing as > just "-type:foo", but can get around the lucene query parsers troubles. > > So that might work for you. Thanks for confirming my suspicions. Unfortunately I've tried that as well and, whilst it works it's also unbelievably slow (~30s query time). Would writing my own Query Parser help here? Simon
Re: normalizing the score
: All I found was: http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score : : where Hoss suggests to normalize depending on the maxScore. to be clear, i do not (nor have i ever) suggested that someone normalize based on maxScore. my point there was that when people *insist* on providing some sort of normalization, the maxScore is always available if they want to use it : I am not comfortable with that since, at least, I want that a search for : "the wombats" in a directory of mathematical concepts, and display that : all scores are pretty bad and not display 1.0 for matches that are only : on the word "the". the crux of the problem is in deciding what you want to normalize relative to -- the "ideal" solution is to normalize relative to the maximum *possible* score for *any* query against your corpus, but that's not something that's generally feasible to do (and based on experiments i tried once, it didn't seem like it would be very useful anyway) : It seems that the strategy would be to normalize by maxScore if the maxScore is bigger than 1.0. : Can you confirm that? : Isn't there going to be similar edge cases as above? : : I remember a time where Lucene results' score were always normalized. : That seems to be not in SOLR, or? once upon a time, lucene's most "beginner friendly" api did provide normalized scores, using the approach you described (divide by max score if max score greater than 1.0) and it had all of the problems you might expect -- but some people liked it because they had an irrational dislike for scores greater than 1. Solr has never supported those pseudo-normalized scores, and lucene's java API eventually got rid of them. -Hoss
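For what it's worth, the pseudo-normalization under discussion is trivial to write down, and doing so shows the problem Hoss describes: a query matching only on "the" still reports a "perfect" top score. This is an illustrative sketch only:

```python
def pseudo_normalize(scores):
    """Old Lucene-style normalization: divide by the max score when it
    exceeds 1.0. Cosmetic only; it says nothing about match quality."""
    if not scores:
        return []
    top = max(scores)
    return [s / top for s in scores] if top > 1.0 else list(scores)

weak = pseudo_normalize([1.7, 0.3])    # matched only on "the"
strong = pseudo_normalize([9.2, 8.1])  # genuinely good matches
print(weak[0], strong[0])  # 1.0 1.0 -- indistinguishable after normalizing
```

The top hit of a bad query and the top hit of a good query both come out at 1.0, which is why normalizing by maxScore cannot drive a meaningful score bar.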
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow wrote: > On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: >> This is what I do instead, to rewrite the query to mean the same thing but >> not give the lucene query parser trouble: >> >> fq=( (*:* AND -type:foo) OR restriction_id:1) >> >> "*:*" means "everything", so (*:* AND -type:foo) means the same thing as >> just "-type:foo", but can get around the lucene query parsers troubles. >> >> So that might work for you. > > Thanks for confirming my suspicions. > > Unfortunately I've tried that as well and, whilst it works > it's also unbelievably slow (~30s query time). It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo? bq. Would writing my own Query Parser help here? Nope. That's just syntax. If filters of the form ( (*:* AND -type:foo) OR restriction_id:1) are much slower (to the point where it causes you problems) and filters of the form (-type:foo OR restriction_id:1) are fast, then you could index the negation of the type field as well (if you know all the types) For instance, in a doc, index two type fields: type:bar type_not:foo Or if "type" is multi-valued, you could index both foo and NOT_foo in the same field. Then you could express the filter as type_not:foo OR restriction_id:1 or type:NOT_foo OR restriction_id:1 -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
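Yonik's "index the negation" idea can be sketched as a small index-time transformation; the field and type names follow his example, but the helper itself is hypothetical:

```python
ALL_TYPES = {"foo", "bar", "quux", "fleeg"}

def add_negated_types(doc):
    """For every type the doc does NOT have, populate a 'type_not' field,
    so the negative clause -type:foo becomes the positive type_not:foo."""
    doc["type_not"] = sorted(ALL_TYPES - set(doc["type"]))
    return doc

doc = add_negated_types({"id": 42, "type": ["bar"]})
print(doc["type_not"])  # ['fleeg', 'foo', 'quux']
```

The filter then becomes type_not:foo OR restriction_id:1, with no pure-negative clause for the query parser to trip over. The cost is that every document must be reindexed whenever the set of known types changes.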
Re: normalizing the score
Thanks for the precision Hoss, that is a helpful explanation. I am still unsure how it is ever possible to display score-bars, for which you need some normalization... but that's for another day. I feel indicating match quality is still somehow a science that has not blossomed yet. Sorting by score is, however, in very good shape. paul On 25 Apr 2011, at 22:53, Chris Hostetter wrote: > > > : All I found was: > http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score > : > : where Hoss suggests to normalize depending on the maxScore. > > to be clear, i do not (nor have i ever) suggested that someone normalize > based on maxScore. > > my point there was that when people *insist* on providing some sort of > normalization, the maxScore is always available if they want to use it > > : I am not comfortable with that since, at least, I want that a search for > : "the wombats" in a directory of mathematical concepts, and display that > : all scores are pretty bad and not display 1.0 for matches that are only > : on the word "the". > > the crux of the problem is in deciding what you want to normalize relative > to -- the "ideal" solution is to normalize relative to the maximum *possible* > score for *any* query against your corpus, but that's not something that's > generally feasible to do (and based on experiments i tried once, it didn't > seem like it would be very useful anyway) > > : It seems that the strategy would be to normalize by maxScore if the > maxScore is bigger than 1.0. > : Can you confirm that? > : Isn't there going to be similar edge cases as above? > : > : I remember a time where Lucene results' score were always normalized. > : That seems to be not in SOLR, or?
> > once upon a time, lucene's most "beginner friendly" api did provide > normalized scores, using the approach you described (divide by max score > if max score greater than 1.0) and it had all of the problems you might > expect -- but some people liked it because they had an irrational dislike > for scores greater than 1. > > Solr has never supported those pseudo-normalized scores, and lucene's java > API eventually got rid of them. > > -Hoss
Re: Negative OR in fq field not working as expected
Yeah, I do the (*:* AND -type:foo) OR something:else thing on my own pretty big index, and it's not slow at all. At least no slower than doing any other "X OR Y" where X and Y both include lots of results. Pre-warming the field cache for, in this case, the 'type' field may help. Same as it would if 'X' were just "type:bar" (not negated) where "type:bar" matched about the same number of documents as "-type:foo" does in your case. In general, there's nothing special that should make that slow, it's a pretty ordinary query, really. Just using weird syntax to get around lucene query parser issues. [Obligatory mention: This may have nothing to do with your issue, but I have found occasions where not having enough RAM allocated to Solr 1.4.1 can make things terribly slow, even though there is no OutOfMemory error or other error in the logs. Especially if you are doing faceting and/or StatsComponent. Exacerbated if you are using the default JVM GC strategies instead of picking some of the concurrent strategies.] On 4/25/2011 5:02 PM, Yonik Seeley wrote: On Mon, Apr 25, 2011 at 4:49 PM, Simon Wistow wrote: On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: This is what I do instead, to rewrite the query to mean the same thing but not give the lucene query parser trouble: fq=( (*:* AND -type:foo) OR restriction_id:1) "*:*" means "everything", so (*:* AND -type:foo) means the same thing as just "-type:foo", but can get around the lucene query parsers troubles. So that might work for you. Thanks for confirming my suspicions. Unfortunately I've tried that as well and, whilst it works it's also unbelievably slow (~30s query time). It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo? bq. Would writing my own Query Parser help here? Nope. That's just syntax.
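On the last point, switching collectors and fixing the heap size is typically a one-line change to the servlet container's JVM options; the values below are illustrative only and need tuning for your index:

```shell
# Hypothetical example for a Tomcat/Jetty startup script (Solr 1.4-era JVMs):
# a fixed 2 GB heap plus the concurrent mark-sweep collector.
JAVA_OPTS="$JAVA_OPTS -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
```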
If filters of the form ( (*:* AND -type:foo) OR restriction_id:1) are much slower (to the point where it causes you problems) and filters of the form type:foo) OR restriction_id:1 are fast, then you could index the negation of the type field as well (if you know all the types) For instance, in a doc, index two type fields: type:bar type_not:foo Or if "type" is multi-valued, you could index both foo and NOT_foo in the same field. Then you could express the filter as type_not:foo OR restriction_id:1 or type:NOT_foo OR restriction_id:1 -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Negative OR in fq field not working as expected
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said: > It really shouldn't be that slow... how many documents are in your > index, and how many match -type:foo? Total number of docs is 161,000,000 type:foo 39,000,000 -type:foo 122,200,000 type:bar 90,000,000 We're aware it's large and we're in the process of splitting the index up but I was just hoping that there was a workaround I could use in order to reclaim some performance.
Re: Reloading synonyms.txt without downtime
: Apparently, when one RELOADs a core, the synonyms file is not reloaded. Is this : : the expected behaviour? Is it the desired behaviour? this is not expected, nor is it desired (by me) nor can i reproduce the problem you are talking about. steps i attempted to reproduce: 1) started the example (on trunk) 2) loaded the analysis.jsp page, changed the field pulldown to "type" and entered "text" for the type name. entered "bbbfoo" in the "Field value (Query)" box, and hit the button. 3) verified that synonym filter produced "ar" as a query time synonym. 4) edited the example synonyms.txt file to add bbbxxx to the list of synonyms for bbbfoo 5) hit this url: http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1 6) went back to the analysis.jsp page and hit the button again. 7) verified that the results changed, and now bbbxxx was produced as well. If you are seeing situations where after a core reload you do *not* see changes to the synonyms.txt file, then either there is an edge case bug, or perhaps you aren't changing what you think? providing more details about your setup and steps to reproduce would be helpful. : Issue https://issues.apache.org/jira/browse/SOLR-1307 mentions this a bit, but : doesn't go in a lot of depth. I don't understand this sentence ... that issue is a feature request for a (new) general way for plugins to re-init themselves (or some aspect of their config) without requiring an entire core reload; i don't see any comments in that issue (other than the one where you mention this thread) suggesting that a core reload doesn't currently cause synonyms to reload ... if you can be specific about what you mean that would be helpful. -Hoss
Problems with Spellchecker in 3.1
Oops. Sorry. I'm hijacking my own thread to put a real Subject in place... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com > -Original Message- > From: Bob Sandiford > Sent: Monday, April 25, 2011 5:34 PM > To: solr-user@lucene.apache.org > Subject: > > Hi, all. > > We're having some troubles with the Solr Spellcheck Response. We're > running version 3.1. > > Overview: If we search for something really ugly like: " > kljhklsdjahfkljsdhf book rck" > > then when we get back the response, there's a suggestions list for > 'rck', but no suggestions list for the other two words. For 'book', > that's fine, because it is 'spelled correctly' (i.e. we got hits on the > word) and there shouldn't be any suggestions. For the ugly thing, > though, there aren't any hits. > > The problem is that when we're handling the result, we can't tell the > difference between no suggestions for a 'correctly spelled' term, and > no suggestions for something that's odd like this. > > (Now - this is happening with searches that aren't as obviously garbage > - this was just to illustrate the point). > > Our setup: > We're running multiple shards, which may be part of the issue. For > example, 'book' might be found in one of the shards, but not another. > > I don't *think* this has anything to do with our schema, since it's > really how the Search Suggestions are being returned to us. > > What we'd really like to see is the response coming back with an > indication that a word wasn't found / had no suggestions. We've hacked > around in the code a little bit to do this, but were wondering if > anyone has come across this, and what approaches you've taken. 
> Here's the xml we're getting back from the search. The response header
> shows status 0 and QTime 56, and echoes our params: spellcheck on, sort
> "score desc, RELEVANCE_SORT_nsort desc", the spellcheckedStandard handler,
> rows=1000, an fl list (ELECTRONIC_ACCESS_display ISBN_display TITLE_boost
> FORMAT_display score MEDIA_TYPE_display AUTHOR_boost LOCALURL_display
> UPC_display id DOC_ID_display CHILD_SITE_display DS_EC
> PRIMARY_AUTHOR_boost PRIMARY_TITLE_boost DS_ID TOPIC_display
> ASSET_NAME_display OCLC_display), the shards param
> (localhost:8983/solr/SD_ILS/,localhost:8983/solr/SD_ASSET/), the facet
> fields (AUTHOR_facet, FORMAT_facet, LANGUAGE_facet, PUBDATE_nfacet,
> SUBJECT_facet, ABCDEF_cfacet), the fq clauses (ACCESS_LEVEL_nfacet:"0",
> CLEARANCE_nfacet:"0", NEED_TO_KNOWS_facet:"@@EMPTY@@",
> CITIZENSHIPS_facet:"@@EMPTY@@", RESTRICTIONS_facet:"@@EMPTY@@"), and the
> query:
>
> TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^200.0
> OR PRIMARY_AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^100.0 OR
> DOC_TEXT:"kljhklsdjahfkljsdhf book rck"~100^2 OR
> PRIMARY_TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^1000.0 OR
> AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^20.0 OR
> textFuzzy:kljhklsdjahfkljsdhf~0.7 AND textFuzzy:book~0.7 AND
> textFuzzy:rck~0.7
>
> The spellcheck section has a single entry, for "rck":
>
> <lst name="rck">
>   <int name="numFound">5</int>
>   <int name="startOffset">362</int>
>   <int name="endOffset">365</int>
>   <int name="origFreq">0</int>
>   <arr name="suggestion">
>     <lst><str name="word">rock</str><int name="freq">24000</int></lst>
>     <lst><str name="word">rick</str><int name="freq">6048</int></lst>
>     <lst><str name="word">rack</str><int name="freq">84</int></lst>
>     <lst><str name="word">reck</str><int name="freq">78</int></lst>
>     <lst><str name="word">ruck</str><int name="freq">30</int></lst>
>   </arr>
> </lst>
> <bool name="correctlySpelled">false</bool>
>
> Thanks!
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
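If spellcheck.extendedResults is enabled (an assumption here; it adds an origFreq count to each per-term spellcheck entry), the frequency gives one way to make the distinction asked for above: no suggestions plus origFreq > 0 means the term exists in the index, while no suggestions plus origFreq == 0 means the term is simply unknown. A sketch over data shaped like Solr's per-term spellcheck output:

```python
def classify(term, entry):
    """Classify one query term from its spellcheck block (or None).

    `entry` mirrors one per-term section of Solr's extendedResults
    output, e.g. {"origFreq": 0, "suggestion": [{"word": "rock",
    "freq": 24000}]}. Whether every term gets a block at all can vary
    across versions and sharded setups.
    """
    if entry is None:
        # No spellcheck block at all: on its own this is ambiguous,
        # which is exactly the problem described in the message above.
        return "no-info"
    if entry.get("origFreq", 0) > 0:
        return "in-index"      # the term itself occurs in the index
    if entry.get("suggestion"):
        return "misspelled"    # unknown term, alternatives available
    return "unknown"           # unknown term, nothing to suggest
```

With sharding, origFreq would also need to be aggregated across shards before classifying, since a term may occur in one shard but not another.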
Scaling Search with Big Data/Hadoop and Solr now available at Lucene Revolution
I've worked with a lot of different Solr implementations, and one area that is emerging more and more is using Solr in combination with other "big data" solutions. My company, Lucid Imagination, has added a two-day course, "Scaling Search with Big Data and Solr", covering Hadoop and Solr. It will be presented on May 23-24 at the Lucene Revolution conference in San Francisco (the conference itself runs May 25-26 -- see lucenerevolution.org).

Description: "The class covers Hadoop from the ground up, including MapReduce, the Hadoop Distributed File System (HDFS), cluster management, etc., before continuing on to connect it to Solr. Students will study common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale. We will explore in-depth an example of processing 1 billion records to create a faceted Solr search solution."

Details here: http://lucenerevolution.org/training#solr-scaling

I've been asked by a lot of Solr users whether Lucid offers anything like this, so I know there is a lot of interest out there.

-Jay
solr sorting on multiple conditions, please help
Hi Folks, I got a problem with solr sorting:

sort=query({!v="area_id: 78153"}) desc, score desc

What I want to achieve is to sort first by whether a document matches area_id, then by the actual score. The problem is that area_id is a multi-valued field, and the results do not come back sorted by the actual score even though they all match area_id 78153. I am getting results like this:

Area 2, score 0.21
Area 3, score 0.38
Area 4, score 0.23

but the result should be like this:

Area 3, score 0.38
Area 4, score 0.23
Area 2, score 0.21

Thanks heaps in advance. Regards James
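One thing worth trying (an assumption, not a confirmed fix for the multi-valued case): query() accepts an optional second argument, the value used for documents where the embedded query does not match, so non-matching documents sort on an explicit 0 instead of a missing value:

```
sort=query({!v="area_id:78153"},0) desc, score desc
```

If the odd ordering persists even with the default set, the interaction between the multi-valued field and the function score is the thing to dig into.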
Re: Good protwords.txt ?
Hi Robert,

That's some old thread from 1969 - that's before my time! :)

I'm not sure what 2+2lemma.txt is... aha, I see it on http://wordlist.sourceforge.net/12dicts-readme-r5.html -- a headword + N related words. I don't think this will help me tame the overly aggressive Porter stemmer, although your sample "stemmer corrections for textTight, the plural-only stemmer (via StemmerOverrideFilter)" looks good and like something that *would* help me tame Porter:

errata erratum
news news
radii radius
cavalrymen cavalryman
...

Is the full dictionary you've built available anywhere for download?

Thanks,
Otis

P.S. I saw that thread at http://search-lucene.com/m/jeWPi1X3FVw started a debate over what to include by default, concerns over performance, etc. -- I'd say it's better to include things like the above and comment it out (if we are afraid of poor performance out of the box or some such) than not providing it at all.

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
> From: Robert Muir
> To: solr-user@lucene.apache.org
> Sent: Mon, April 25, 2011 2:20:45 PM
> Subject: Re: Good protwords.txt ?
>
> On Mon, Apr 25, 2011 at 2:05 PM, Otis Gospodnetic
> wrote:
> > Hi,
> >
> > Are there any good / comprehensive examples of protwords.txt for English?
> > Or good stemdict.txt examples that work with StemmerOverrideFilterFactory?
> >
> > Would be good to have a good example to include in Solr distribution...
>
> I brought this up a while ago (as I am probably more than 50-60% done
> with all of this via 2+2lemma.txt) and there was no interest:
>
> http://www.lucidimagination.com/search/document/180c90276e589d68/solr_example_synonyms_file
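A sketch of how such a corrections list plugs in, assuming Solr's StemmerOverrideFilterFactory (the field type name and file name are illustrative): the dictionary file holds tab-separated word/stem pairs like the sample above, and the filter must sit before the stemmer so the overrides win.

```xml
<!-- Illustrative analyzer chain: entries in stemdict.txt override Porter -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StemmerOverrideFilterFactory"
            dictionary="stemdict.txt" ignoreCase="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Tokens matched by the dictionary are marked as keywords with their override stem, so the Porter filter leaves them alone.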
Re: Automatic synonyms for multiple variations of a word
Hi Otis & Robert,

- Original Message
>
> How do people handle cases where synonyms are used and there are multiple
> versions of the original word that really need to point to the same set of
> synonyms?
>
> For example:
> Consider singular and plural of the word "responsibility". One might have
> synonyms defined like this:
>
> responsibility, obligation, duty
>
> But the plural "responsibilities" is not in there, and thus it will not get
> expanded to the synonyms above! That's a problem.
>
> Sure, one could change the synonyms file to look like this:
>
> responsibility, responsibilities, obligation, duty
>
> But that means somebody needs to think of all variations of the word!

Yes, that seems to be the case now, as it was in 2008:
http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited
http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that suggestion doesn't actually work)

> Is there something one can do to get all variations of the word to map to the
> same synonyms without having to explicitly specify all variations of the word?

I think this is where Robert's 2+2lemma pointer may help, because the 2+2lemma list contains "records" where a headword is followed by a list of other variations of the word. The way I think this would help is by simply taking that list and turning it into the synonyms file format, and then merging in the actual synonyms.

For example, if I have the word "responsibility", then from 2+2lemma I should be able to get that "responsibilities" is one of the variants of "responsibility". I should then be able to take those 2 words and stick them in the synonyms file like this:

responsibility, responsibilities

And then append actual synonyms to that:

responsibility, responsibilities, obligation, duty

But I may then need to actually expand the synonyms themselves, too (again using data from 2+2lemma):

responsibility, responsibilities, obligation, obligations, duty, duties

I haven't tried this yet.
Just theorizing and hoping for feedback. Does this sound about right? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
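The merging described above can be sketched as follows (the variants mapping is illustrative, standing in for data parsed out of a lemma list such as 2+2lemma.txt; the parsing itself is not shown):

```python
def expand_synonyms(groups, variants):
    """Expand each synonym group with known inflected variants.

    groups:   list of synonym groups, one per synonyms.txt line.
    variants: headword -> list of its other forms (stand-in for data
              from a lemma list such as 2+2lemma.txt).
    """
    expanded = []
    for group in groups:
        out = []
        for word in group:
            out.append(word)
            out.extend(variants.get(word, []))  # keep word, add variants
        expanded.append(out)
    return expanded

# Each expanded group serializes back to one synonyms.txt line:
lines = [", ".join(g) for g in expand_synonyms(
    [["responsibility", "obligation", "duty"]],
    {"responsibility": ["responsibilities"],
     "obligation": ["obligations"],
     "duty": ["duties"]})]
print(lines[0])
# -> responsibility, responsibilities, obligation, obligations, duty, duties
```

The printed line matches the final expanded example in the message above.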
Re: Automatic synonyms for multiple variations of a word
This has come up with stemming: you can stem your synonym list with the FieldAnalyzer Solr http call, then save the final chewed-up terms as a new synonym file. You then use that one in the analyzer stack below the stemmer filter. On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic wrote: > Hi Otis & Robert, > > - Original Message > >> >> How do people handle cases where synonyms are used and there are multiple >> version of the original word that really need to point to the same set of >> synonyms? >> >> For example: >> Consider singular and plural of the word "responsibility". One might have >> synonyms defined like this: >> >> responsibility, obligation, duty >> >> But the plural "responsibilities" is not in there, and thus it will not get >> expanded to the synonyms above! That's a problem. >> >> Sure, one could change the synonyms file to look like this: >> >> responsibility, responsibilities, obligation, duty >> >> But that means somebody needs to think of all variations of the word! > > Yes, that seems to be the case now, as it was in 2008: > http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited > http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that > suggestion doesn't actually work) > >> Is there a something one can do to get all variations of the word to map to >>the >> >> same synonyms without having to explicitly specify all variations of the > word? > > I think this is where Robert's 2+2lemma pointer may help because the 2+lemma > list contains "records" where a headword is followed by a list of other > variations of the word. The way I think this would help is by simply taking > that list and turning it into the synonyms file format, and then merging in > the > actual synonyms. > > For example, if I have the word "responsibility", then from 2+2lemma I should > be > able to get that "responsibilities" is one of the variants of > "responsibility". 
> I should then be able to take those 2 words and stick them in synonyms file > like > this: > > responsibility, responsibilities > > And then append actual synonyms to that: > > responsibility, responsibilities, obligation, duty > > But I may then need to actually expand synonyms themselves, too (again using > data from 2+2lemma): > > responsibility, responsibilities, obligation, obligations, duty, duties > > > I haven't tried this yet. Just theorizing and hoping for feedback. > > Does this sound about right? > > Thanks, > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > -- Lance Norskog goks...@gmail.com
Re: Automatic synonyms for multiple variations of a word
Right, instead of this in synonyms file: responsibility, obligation, duty I could stem each of the above words/synonyms and have something like this in synonyms file: respons, oblig, duti But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file). I've never seen anyone actually use such a synonyms file in production, have you? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Lance Norskog > To: solr-user@lucene.apache.org > Sent: Tue, April 26, 2011 12:20:05 AM > Subject: Re: Automatic synonyms for multiple variations of a word > > This has come up with stemming: you can stem your synonym list with > the FieldAnalyzer Solr http call, then save the final chewed-up terms > as a new synonym file. You then use that one in the analyzer stack > below the stemmer filter. > > On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic > wrote: > > Hi Otis & Robert, > > > > - Original Message > > > >> > >> How do people handle cases where synonyms are used and there are multiple > >> version of the original word that really need to point to the same set of > >> synonyms? > >> > >> For example: > >> Consider singular and plural of the word "responsibility". One might have > >> synonyms defined like this: > >> > >> responsibility, obligation, duty > >> > >> But the plural "responsibilities" is not in there, and thus it will not >get > >> expanded to the synonyms above! That's a problem. > >> > >> Sure, one could change the synonyms file to look like this: > >> > >> responsibility, responsibilities, obligation, duty > >> > >> But that means somebody needs to think of all variations of the word! 
> > > > Yes, that seems to be the case now, as it was in 2008: > > >http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited > > http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that > > suggestion doesn't actually work) > > > >> Is there a something one can do to get all variations of the word to map > >> >to > >>the > >> > >> same synonyms without having to explicitly specify all variations of the > > word? > > > > I think this is where Robert's 2+2lemma pointer may help because the 2+lemma > > list contains "records" where a headword is followed by a list of other > > variations of the word. The way I think this would help is by simply >taking > > that list and turning it into the synonyms file format, and then merging > > in >the > > actual synonyms. > > > > For example, if I have the word "responsibility", then from 2+2lemma I >should be > > able to get that "responsibilities" is one of the variants of >"responsibility". > > I should then be able to take those 2 words and stick them in synonyms > > file >like > > this: > > > > responsibility, responsibilities > > > > And then append actual synonyms to that: > > > > responsibility, responsibilities, obligation, duty > > > > But I may then need to actually expand synonyms themselves, too (again using > > data from 2+2lemma): > > > > responsibility, responsibilities, obligation, obligations, duty, duties > > > > > > I haven't tried this yet. Just theorizing and hoping for feedback. > > > > Does this sound about right? > > > > Thanks, > > Otis > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > > Lucene ecosystem search :: http://search-lucene.com/ > > > > > > > > -- > Lance Norskog > goks...@gmail.com >
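The transformation discussed above (stemming every term of each synonyms line before use) can be sketched with the stemmer abstracted behind a function; the word-to-stem mapping in the example is illustrative, standing in for what Solr's own analyzer (e.g. via the field analysis page) would report for each term:

```python
def stem_synonym_line(line, stem):
    """Apply `stem` (any word -> stem function) to every term of one
    comma-separated synonyms.txt line, de-duplicating stems that
    collide while preserving order."""
    seen, out = set(), []
    for term in (t.strip() for t in line.split(",")):
        s = stem(term)
        if s not in seen:
            seen.add(s)
            out.append(s)
    return ", ".join(out)

# Illustrative stems, roughly what Porter produces for these words:
stems = {"responsibility": "respons", "responsibilities": "respons",
         "obligation": "oblig", "duty": "duti"}
print(stem_synonym_line("responsibility, responsibilities, obligation, duty",
                        lambda w: stems[w]))
# -> respons, oblig, duti
```

Using Solr's actual analysis output rather than a hand-maintained mapping avoids the editing problem Otis raises: the file is regenerated mechanically whenever the human-readable synonyms change.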
Re: Query regarding solr plugin.
Thanks Erick. I have added my replies to the points you mentioned. I am going wrong somewhere. Do I need to club both the jars together or something? If yes, how do I do that? I don't have much idea about java and jar files. Please guide me here.

1> when you do a 'jar -tfv', you should see output like:
1183 Sun Jun 06 01:31:14 EDT 2010 org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class
and your statement may need the whole path, in this example... (note, this is just an example of the pathing, this class has nothing to do with your filter)...

I could see this output.

2> But I'm guessing your path is actually OK, because I'd expect to be seeing a "class not found" error. So my guess is that your class depends on other jars that aren't packaged up in your jar, and if you find which ones they are and copy them to your lib directory you'll be OK. Or your code is throwing an error on load. Or something like that...

There is a jar - "apache-solr-core-1.4.1.jar" - which has the BaseTokenFilterFactory class and the SynonymFilterFactory class. I made my changes in the second class file and created it as a new class. Then I created a jar of that java file and placed it in solr home/lib, and also placed the "apache-solr-core-1.4.1.jar" file in the lib folder of solr home. [solr home - c:\orch\search\solr, lib path - c:\orch\search\solr\lib]

3> to try to understand what's up, I'd back up a step. Make a really stupid class that doesn't do anything except derive from BaseTokenFilterFactory and see if you can load that. If you can, then your process is OK and you need to find out what classes your new filter depends on. If you still can't, then we can see what else we can come up with...

I am perhaps doing the same. In the SynonymFilterFactory class there is a parse-rules function which takes the delimiter as one of its input parameters. I changed the comma ',' to the tilde '~' symbol, and that's it.
Regards, Rajani On Mon, Apr 25, 2011 at 6:26 PM, Erick Erickson wrote: > Looking at things more carefully, it may be one of your dependent classes > that's not being found. > > A couple of things to try. > > 1> when you do a 'jar -tfv ", you should see > output like: > 1183 Sun Jun 06 01:31:14 EDT 2010 > org/apache/lucene/analysis/sinks/TokenTypeSinkTokenizer.class > and your statement may need the whole path, in this example... > (note, > this > is just an example of the pathing, this class has nothing to do with > your filter)... > > 2> But I'm guessing your path is actually OK, because I'd expect to be > seeing a > "class not found" error. So my guess is that your class depends on > other jars that > aren't packaged up in your jar and if you find which ones they are and copy > them > to your lib directory you'll be OK. Or your code is throwing an error > on load. Or > something like that... > > 3> to try to understand what's up, I'd back up a step. Make a really > stupid class > that doesn't do anything except derive from BaseTokenFilterFacotry and see > if > you can load that. If you can, then your process is OK and you need to > find out what classes your new filter depend on. If you still can't, then > we can > see what else we can come up with.. > > Best > Erick > > On Mon, Apr 25, 2011 at 2:34 AM, rajini maski > wrote: > > Erick , > > * > > * > > * Thanks.* It was actually a copy mistake. Anyways i did a redo of all > the > > below mentioned steps. I had given class name as > > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > > > > I did it again now following few different steps following this link : > > > http://help.eclipse.org/helios/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-32.htm > > > > > > 1 ) Created new package in src folder . 
> *org.apache.pointcross.synonym*. This
> > is having class Synonym.java
> >
> > 2) Now did a right click on same package and selected export option -> Java
> > tab -> JAR File -> Selected the path for package -> finish
> >
> > 3) This created jar file in specified location. Now followed in cmd: jar
> > tfv org.apache.pointcross.synonym. the following was desc in cmd.
> >
> > :\Apps\Rajani Eclipse\Solr141_jar>jar -tfv org.apache.pointcross.synonym.Synonym.jar
> > 25 Mon Apr 25 11:32:12 GMT+05:30 2011 META-INF/MANIFEST.MF
> > 383 Thu Apr 14 16:36:00 GMT+05:30 2011 .project
> > 2261 Fri Apr 22 16:26:12 GMT+05:30 2011 .classpath
> > 1017 Thu Apr 21 16:34:20 GMT+05:30 2011 jarLog.jardesc
> >
> > 4) Now placed same jar file in solr home/lib folder. Solrconfig.xml
> > enabled and in schema: <filter class="synonym.Synonym"
> > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >
> > 5) Restart tomcat : http://localhost:8097/finding1
> >
> > Error SEVERE: org.apache.solr.common.SolrException: Error loading class
> > 'pointcross.synonym.Synonym'
> > at
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
> > at
> > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
> >