Closing holes in the index - optimize
Hi community, I understand that optimize builds a NEW index from the live one and then swaps them, and that it is best to schedule optimize with care. Can we still update and query the live index while optimize is in progress? Or is the live index locked during optimize? Thanks, shlomiJ
Merging results from Shards - relevancy and performance
Hello, 1) When distributing search across several shards, does the merged result reflect the overall ranking across shards? I'm talking about things like "document frequency". I guess it does, otherwise distributed search wouldn't have overhead. Speaking of overhead, 2) is there a known ratio for the overhead of using shards versus a single core, and what is the performance impact of adding the (N+1)th shard to the distributed index? Thanks for any knowledge/thoughts. ShlomiJ
Question about SOLR custom sort order
Hi, I use Solr 1.4 and I have a question about Solr sort order. Requirement: sort names (e.g. location names like Dallas (city), Las Vegas (city), Texas (state), India (country), Canada (country), etc.) based on category (e.g. CITY, STATE, COUNTRY, etc.). How do I sort the Solr results in this custom order? Expected result:
1. Dallas
2. Las Vegas
3. Texas
4. Canada
5. India
Thanks & Regards, Gupta
Re: Interpreting solr response time from log
Thanks Gora for clarifying. So if my understanding is correct, the total response time is not logged in the Solr logs and I need to rely on the QTime in the response.
Re: Search Query (Should I use fq)
In addition to Ahmet's comment, the other rule of thumb is that fqs do NOT influence a document's score; they are strictly an include/exclude decision. The new cache=false capability lets you keep one-off fqs from taking up entries in your filter cache, BTW... Best Erick On Fri, Dec 30, 2011 at 3:45 AM, reeuv wrote: > Thanks for your help iorixxx. > > If you can help me solve one of my other questions as well that would be > great: > > http://lucene.472066.n3.nabble.com/Getting-results-in-reverse-order-they-were-indexed-td3620577.html
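To make Erick's two points concrete, here is a minimal SolrJ sketch of a scoring q plus a cached fq and a one-off fq (untested; the field names, the example query and the localhost URL are placeholders, and the {!cache=false} local param needs a sufficiently recent 3.x release):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
  public static void main(String[] args) throws Exception {
    // On older releases the client class is CommonsHttpSolrServer instead.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // q produces the scores; the fq clauses only include/exclude documents
    // and never influence the score.
    SolrQuery query = new SolrQuery("ipod");
    query.addFilterQuery("category:electronics");            // reusable, cached filter
    query.addFilterQuery("{!cache=false}price:[10 TO 20]");  // one-off filter kept out of the filterCache

    QueryResponse rsp = server.query(query);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}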
Re: Interpreting solr response time from log
On Sun, Jan 1, 2012 at 8:58 PM, Jithin wrote: > Thanks Gora for clarifying. So if my understanding is correct, the total > response time is not logged in the Solr logs and I need to rely on the QTime > in the response. If your log level is set to at least INFO, as it should be by default, Solr does log the response time, though to a different log. E.g., I have:
INFO: [] webapp=/solr path=/select/ params={indent=on&start=0&q=*:*&version=2.2&rows=10} hits=22 status=0 QTime=40
where the QTime is 40ms, as also reflected in the HTTP response. You were looking at the request logs in your example. This information is logged to standard output (usually the terminal) by the Jetty embedded in Solr (i.e., if you are running "java -jar start.jar"), or to catalina.out if Solr is used with Tomcat. Regards, Gora
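If it helps, the same comparison can be made from the client: QTime comes back in the response header, and you can time the full round trip yourself. A minimal SolrJ sketch (untested; assumes the HttpSolrServer client and the example URL):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ResponseTimeExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    SolrQuery query = new SolrQuery("*:*");
    query.setRows(10);

    long start = System.currentTimeMillis();
    QueryResponse rsp = server.query(query);
    long roundTrip = System.currentTimeMillis() - start;

    // QTime is the time Solr spent executing the query (the number in the log
    // line above); the gap to roundTrip is serialization, network and client overhead.
    System.out.println("QTime: " + rsp.getQTime() + " ms, round trip: " + roundTrip + " ms");
  }
}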
Re: Solr, SQL Server's LIKE
Chantal: bq: The problem with the wildcard searches is that the input is not analyzed. As of 3.6/4.0, this is no longer entirely true. Some analysis is performed for wildcard searches by default, and you can specify almost anything you want if you really need to; see: https://issues.apache.org/jira/browse/SOLR-2438 and http://wiki.apache.org/solr/MultitermQueryAnalysis Best Erick On Fri, Dec 30, 2011 at 4:33 PM, Devon Baumgarten wrote: > Hoss, > > Thanks. You've answered my question. To clarify, what I should have asked for > instead of 'exact' was 'not fuzzy'. For some reason it didn't occur to me > that I didn't need n-grams to use the wildcard. You asking me to clarify > what I meant made me realize that the n-grams are the source of all my > current problems. :) > > Thanks! > > Devon Baumgarten > > > -Original Message- > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > Sent: Thursday, December 29, 2011 7:00 PM > To: solr-user@lucene.apache.org > Subject: RE: Solr, SQL Server's LIKE > > > : Thanks. I know I'll be able to utilize some of Solr's free text > : searching capabilities in other search types in this project. The > : product manager wants this particular search to exactly mimic LIKE%. > ... > : Ex: If I search "Albatross" I want "Albert" to be excluded completely, > : rather than having a low score. > > Please be specific about the types of queries you want, i.e. we need more > than one example of the type of input you want to provide, the type of > matches you want to see for that input, and the type of matches you do not > want to get back. > > In your first message you said you need to match company titles "pretty > exactly" but then seem to contradict yourself by saying that SQL's LIKE > command fits the bill -- even though the SQL LIKE command exists > specifically for inexact matches on field values. > > Based on your one example above of Albatross, you don't need anything > special: don't use ngrams, don't use stemming, don't use fuzzy anything -- > just search for "Albatross" and it will match "Albatross" but not > "Albert". If you want "Albatross" to match "Albatross Road", use some > basic tokenization. > > If all you really care about is prefix searching (which seems suggested by > your "LIKE%" comment above, which I'm guessing is shorthand for something > similar to "LIKE 'ABC%'"), so that queries like "abc" and "abcd" both > match "abcdef" and "abcd" but neither of them matches "xabcd", > then just use prefix queries (i.e. "abcd*") -- they should be plenty > efficient for your purposes. You only need to worry about ngrams when you > want to efficiently match in the middle of a string (i.e. "TITLE LIKE > '%ABC%'"). > > > -Hoss
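To make the prefix-query suggestion concrete, a minimal SolrJ sketch (untested; company_name and the URL are made up, and it assumes the field is lowercased at index time, hence the client-side toLowerCase since wildcard terms are not analyzed before 3.6/4.0):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PrefixSearchExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    String userInput = "Alba";
    // Wildcard/prefix terms bypass the analyzer on older releases, so
    // lowercase on the client to line up with a lowercased index.
    String prefix = userInput.toLowerCase();

    // Roughly the equivalent of SQL's  company_name LIKE 'alba%'
    SolrQuery query = new SolrQuery("company_name:" + prefix + "*");
    QueryResponse rsp = server.query(query);
    System.out.println(rsp.getResults().getNumFound() + " matches");
  }
}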
Re: Highlighting with prefix queries and maxBooleanClause
This may be the impetus for Hoss creating SOLR-2996. I suspect this will go away if you use the correct match-all-docs syntax, i.e. q=*:* rather than q=* Hoss' suggestion in 2996 is to "do the right thing" with q=*, but for now you need to use the right syntax. But I'm not sure what highlighting will do when there's nothing to highlight on (ie, no query terms to match against your text field). FWIW Erick On Fri, Dec 30, 2011 at 6:00 PM, Michael Lissner wrote: > This question has come up a few times, but I've yet to see a good solution. > > Basically, if I have highlighting turned on and do a query for q=*, I get an > error that maxBooleanClauses has been exceeded. Granted, this is a silly > query, but a user might do something similar. My expectation is that queries > that work when highlighting is OFF should continue working when it is ON. > > What's the best solution for queries like this? Is it simply to catch the > error and then up maxBooleanClauses? Or to turn off highlighting when this > error occurs? > > Or am I doing something altogether wrong? > > This is the query I'm using to cause the error: > http://localhost:8983/solr/select/?q=*&start=0&rows=20&hl=true&hl.fl=text > > Changing hl to false makes the query go through. > > I'm using Solr 4.0.0-dev > > The traceback is: > > SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount > is set to 1024 > at > org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:68) > at > org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:159) > at > org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:81) > at > org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:114) > at > org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312) > at > org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:155) > at > org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:144) > at > org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:384) > at > org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) > at > org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) > at > org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205) > at > org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:511) > at > org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:402) > at > org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:121) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > > Thanks, > > Mike
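For what it's worth, Erick's suggested match-all-docs form with highlighting on looks like this in SolrJ (untested sketch; the highlight field "text" and the URL are taken from Mike's example query):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightAllDocsExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // q=*:* is the MatchAllDocsQuery form; q=* is a wildcard query, and the
    // highlighter's rewrite of it is what trips maxBooleanClauses in the
    // traceback above.
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(20);
    query.setHighlight(true);
    query.addHighlightField("text");

    QueryResponse rsp = server.query(query);
    // With *:* there are no query terms, so expect empty highlight snippets.
    System.out.println(rsp.getHighlighting());
  }
}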
Re: Closing holes in the index - optimize
There's no index locking while optimizing that I know of. But optimize is usually not all that necessary unless you have a pretty static index; in fact, under the covers it's been renamed to forceMerge or something, just to keep it from sounding so necessary. The resource reclamation etc. happens automagically when segments are merged now, so you might not want to optimize at all! Best Erick On Sun, Jan 1, 2012 at 3:28 AM, shlomi java wrote: > Hi community, > > I understand that optimize builds a NEW index from the live one and then > swaps them, and that it is best to schedule optimize with care. > > Can we still update and query the live index while optimize is in progress? > Or is the live index locked during optimize? > > Thanks, > shlomiJ
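As a small illustration of the point, the explicit optimize is a single SolrJ call, and updates and queries against the same index can keep flowing while it runs (untested sketch; the URL and the id value are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OptimizeWhileLiveExample {
  public static void main(String[] args) throws Exception {
    final HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // Kick off the explicit optimize (forceMerge in newer releases) from a
    // background thread; the call itself blocks until the merge finishes.
    Thread optimizer = new Thread(new Runnable() {
      public void run() {
        try {
          server.optimize();
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    });
    optimizer.start();

    // Meanwhile the index stays live: updates and queries are still accepted,
    // and searchers keep serving the last committed view.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    server.add(doc);
    server.commit();
    server.query(new SolrQuery("*:*"));

    optimizer.join();
  }
}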
Re: Merging results from Shards - relevancy and performance
1> Yes. Note that distributed tf/idf is an issue, although it's changing. That is, if your documents are statistically very different across shards, the scores aren't really comparable. This is changing, but I don't think it's committed yet. 2> Well, you're mixing apples and oranges, I think. The general recommendation is to use a single core and *replicate* it across as many machines as necessary, until your index gets too big to fit on your machines (i.e. you cannot get decent query times at all). This is NOT distributed searching, as each request is wholly serviced by a single slave searcher. Once you cross the threshold of what fits on your hardware, you really have no choice except to shard and use distributed searching. There is certainly some overhead, but since you have no choice but to pay it, you just cope. At very large scale (i.e. lots of shards on lots of machines), you run into the "laggard problem". That is, as the number of shards increases, so does the chance that at least one of them will, for whatever reason, take an anomalously long time to complete, which will slow your final results. FWIW Erick On Sun, Jan 1, 2012 at 4:34 AM, shlomi java wrote: > Hello, > > 1) When distributing search across several shards, does the merged result > reflect the overall ranking across shards? > I'm talking about things like "document frequency". > I guess it does, otherwise distributed search wouldn't have overhead. > > Speaking of overhead, > 2) is there a known ratio for the overhead of using shards versus a single > core, and what is the performance impact of adding the (N+1)th shard to the > distributed index? > > Thanks for any knowledge/thoughts. > ShlomiJ
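For reference, a distributed request is just the normal query plus a shards parameter listing the cores to fan out to. A minimal SolrJ sketch (untested; the shard host names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchExample {
  public static void main(String[] args) throws Exception {
    // Any of the shards can act as the aggregator that merges per-shard results.
    HttpSolrServer server = new HttpSolrServer("http://shard1:8983/solr");

    SolrQuery query = new SolrQuery("*:*");
    // Fan the request out; the aggregator merges and re-sorts the top docs,
    // but (as Erick notes) document frequencies are still per shard for now.
    query.set("shards", "shard1:8983/solr,shard2:8983/solr");
    query.setRows(10);

    QueryResponse rsp = server.query(query);
    System.out.println("merged hits: " + rsp.getResults().getNumFound());
  }
}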
Re: Question about SOLR custom sort order
There's no good way of enforcing this as far as I know, given how you've outlined the problem. You can easily specify multiple sort criteria, where ties in the first criterion are broken by the second criterion and so on. So, if your records have *no* city value, you can do what you want by setting sortMissingLast on the city field and then specifying state as your second criterion. So you can probably do what you want by breaking up your location field into fields that are specific for sorting (probably use copyField here?) and specifying as above. Best Erick On Sun, Jan 1, 2012 at 3:19 AM, Gupta, Veeranjaneya wrote: > Hi, > > I use Solr 1.4 and I have a question about Solr sort order. > Requirement: sort names (e.g. location names like Dallas (city), Las > Vegas (city), Texas (state), India (country), Canada (country), etc.) based > on category (e.g. CITY, STATE, COUNTRY, etc.). > How do I sort the Solr results in this custom order? > Expected result: > > 1. Dallas > > 2. Las Vegas > > 3. Texas > > 4. Canada > > 5. India > > Thanks & Regards, Gupta
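A sketch of the multiple-sort-criteria idea in SolrJ terms (untested; city_sort, state_sort and country_sort are invented fields that would be filled via copyField, with sortMissingLast="true" on them in schema.xml):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MultiSortExample {
  public static void main(String[] args) throws Exception {
    // Solr 1.4-era client; later releases call this HttpSolrServer.
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery query = new SolrQuery("*:*");
    // Ties on one field fall through to the next; with sortMissingLast,
    // documents without a city sort after the cities, then states, then
    // countries, which matches the expected Dallas..India ordering.
    query.addSortField("city_sort", ORDER.asc);
    query.addSortField("state_sort", ORDER.asc);
    query.addSortField("country_sort", ORDER.asc);

    server.query(query);
  }
}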
Re: Question about SOLR custom sort order
We fulfilled a similar requirement by creating a new field that is populated at the client level (a standalone app that converts binary data into Solr input documents). Andrea On 1/1/12, Erick Erickson wrote: > There's no good way of enforcing this as far as I know, > given how you've outlined the problem. You can easily specify > multiple sort criteria, where ties in the first criterion > are broken by the second criterion and so on. > > So, if your records have *no* city value, you can do what > you want by setting sortMissingLast on the city field > and then specifying state as your second criterion. > > So you can probably do what you want by breaking > up your location field into fields that are specific > for sorting (probably use copyField here?) and specifying > as above. > > > Best > Erick > > On Sun, Jan 1, 2012 at 3:19 AM, Gupta, Veeranjaneya > wrote: >> Hi, >> >> I use Solr 1.4 and I have a question about Solr sort order. >> Requirement: sort names (e.g. location names like Dallas (city), Las >> Vegas (city), Texas (state), India (country), Canada (country), etc.) based >> on category (e.g. CITY, STATE, COUNTRY, etc.). >> How do I sort the Solr results in this custom order? >> Expected result: >> >> 1. Dallas >> >> 2. Las Vegas >> >> 3. Texas >> >> 4. Canada >> >> 5. India >> >> Thanks & Regards, Gupta >
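A sketch of that index-time approach (untested; category_rank and name_sort are invented field names, and the rank values are just one possible mapping, CITY=1, STATE=2, COUNTRY=3):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CategoryRankExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    // The client decides the custom order and writes it into each document.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "loc-dallas");
    doc.addField("name", "Dallas");
    doc.addField("category_rank", 1);   // CITY=1, STATE=2, COUNTRY=3
    server.add(doc);
    server.commit();

    // Query side: primary sort on the rank, alphabetical within a category.
    SolrQuery query = new SolrQuery("*:*");
    query.addSortField("category_rank", ORDER.asc);
    query.addSortField("name_sort", ORDER.asc);
    server.query(query);
  }
}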
Re: Highlighting with prefix queries and maxBooleanClause
On 01/01/2012 07:48 AM, Erick Erickson wrote: > This may be the impetus for Hoss creating SOLR-2996. Yep, it is indeed, though I believe this problem can also happen when a user searches for something like q=a* in a big index. I need a bigger index to know for sure about that, but from what I've read so far, I'm fairly certain that this problem is bigger than just the q=* search. I think my solution when this error is thrown is going to be to bump the size of maxBooleanClauses and retry the query. Failing that, I'll have to retry the query with highlighting off. > I suspect this will go away if you use the correct match-all-docs syntax, i.e. q=*:* rather than q=* It does, yes. > But I'm not sure what highlighting will do when there's nothing to highlight on (i.e., no query terms to match against your text field). I believe it does nothing, thankfully. Mike
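A rough sketch of that fallback in SolrJ terms (untested; it simply retries without highlighting on any server-side failure, since how the TooManyClauses error surfaces to the client may vary by version):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightFallbackExample {

  // Try the query with highlighting; if the server rejects it (e.g. the
  // maxBooleanClauses limit seen in the traceback above), retry without it.
  static QueryResponse queryWithFallback(SolrServer server, String q) throws Exception {
    SolrQuery query = new SolrQuery(q);
    query.setHighlight(true);
    query.addHighlightField("text");
    try {
      return server.query(query);
    } catch (Exception e) {
      query.setHighlight(false);
      return server.query(query);
    }
  }
}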