Closing holes in the index - optimize

2012-01-01 Thread shlomi java
hi community,

I understand that optimize builds a NEW index from the live one and then
swaps them.
And that it is best to schedule optimize with care.

Can we still update and query the live index while optimize is on the go?
Or is the live index being locked during optimize?

10X
shlomiJ


Merging results from Shards - relevancy and performance

2012-01-01 Thread shlomi java
hola,

1) When distributing search across several shards, does the merged result
reflect the overall ranking across shards?
I'm talking about stuff like "document frequency".
I guess it does, otherwise distributed search wouldn't have overhead.

Talking about overhead:
2) is there a known ratio for the overhead of using shards versus a
single core, and for the performance impact of adding the N+1th shard to the
distributed index?

thanks for any knowledge/thought.
ShlomiJ


Question about SOLR custom sort order

2012-01-01 Thread Gupta, Veeranjaneya
Hi,

I use Solr version 1.4 and I have a question about SOLR sort order.
Requirement: sort names (e.g. location names like Dallas (City), Las
Vegas (City), Texas (State), India (Country), Canada (Country), etc.) based on
category (e.g. CITY, STATE, COUNTRY, etc.).
How do I sort the SOLR results in a custom order?
Expected result:

1. Dallas
2. Las Vegas
3. Texas
4. Canada
5. India

Thanks & Regards, Gupta


Re: Interpreting solr response time from log

2012-01-01 Thread Jithin
Thanks Gora for clarifying. So if my understanding is correct then the total
response time is not logged in solr logs and I need to rely on the QTime in
the response.



Re: Search Query (Should I use fq)

2012-01-01 Thread Erick Erickson
In addition to Ahmet's comment, the other rule of thumb is that
fqs do NOT influence a document's score; they're strictly an
include/exclude decision.

The new cache=false capability also lets you keep one-off
fqs from taking up entries in your filter cache, BTW...
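
For example (field and value names below are illustrative), a request that
scores on the user's terms, filters by category, and keeps a one-off filter
out of the filter cache with the local param mentioned above:

  q=ipod&fq=category:electronics&fq={!cache=false}price:[100 TO 200]

The q clause contributes to the score; the two fq clauses only include or
exclude documents.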

Best
Erick

On Fri, Dec 30, 2011 at 3:45 AM, reeuv  wrote:
> Thanks for your help iorixxx.
>
> If you can help me solve one of my other questions as well, that would be
> great:
>
> http://lucene.472066.n3.nabble.com/Getting-results-in-reverse-order-they-were-indexed-td3620577.html
>


Re: Interpreting solr response time from log

2012-01-01 Thread Gora Mohanty
On Sun, Jan 1, 2012 at 8:58 PM, Jithin  wrote:
> Thanks Gora for clarifying. So if my understanding is correct then the total
> response time is not logged in solr logs and I need to rely on the QTime in
> the response.

If your log level is set at least to INFO, as it should be by default, Solr does
log response time to a different file. E.g., I have
INFO: [] webapp=/solr path=/select/
params={indent=on&start=0&q=*:*&version=2.2&rows=10} hits=22 status=0
QTime=40
where the QTime is 40ms, as also reflected in the HTTP response. You
were looking at the request logs in your example. This information is
logged to standard output (usually the terminal) by the Jetty embedded in
Solr (i.e., if you are running "java -jar start.jar"), or to catalina.out if Solr
is used with Tomcat.
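
If you want the end-to-end time as a client sees it, a quick sanity check is
to time the request from the client side and compare it with QTime (the URL
below assumes the stock example setup):

  time curl "http://localhost:8983/solr/select?q=*:*&rows=10&indent=on"

The wall-clock time will normally be a bit larger than QTime, since QTime
only covers Solr's internal query processing, not response writing or the
network.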

Regards,
Gora


Re: Solr, SQL Server's LIKE

2012-01-01 Thread Erick Erickson
Chantal:

bq: The problem with the wildcard searches is that the input is not
analyzed.

As of 3.6/4.0, this is no longer entirely true. Some analysis is
performed for wildcard searches by default, and you can
specify most anything you want if you really need to; see:
https://issues.apache.org/jira/browse/SOLR-2438
and
http://wiki.apache.org/solr/MultitermQueryAnalysis
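
For example, a minimal sketch (3.6/4.0 syntax, field type name illustrative)
that runs wildcard terms through keyword tokenization plus lowercasing:

  <fieldType name="text_wild" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With that in place, a query like name:Alba* is lowercased to alba* before the
wildcard is expanded.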

Best
Erick

On Fri, Dec 30, 2011 at 4:33 PM, Devon Baumgarten
 wrote:
> Hoss,
>
> Thanks. You've answered my question. To clarify, what I should have asked for 
> instead of 'exact' was 'not fuzzy'. For some reason it didn't occur to me 
> that I didn't need n-grams to use the wildcard. Your asking me to clarify
> what I meant made me realize that the n-grams are the source of all my
> current problems. :)
>
> Thanks!
>
> Devon Baumgarten
>
>
> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Thursday, December 29, 2011 7:00 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr, SQL Server's LIKE
>
>
> : Thanks. I know I'll be able to utilize some of Solr's free text
> : searching capabilities in other search types in this project. The
> : product manager wants this particular search to exactly mimic LIKE%.
>        ...
> : Ex: If I search "Albatross" I want "Albert" to be excluded completely,
> : rather than having a low score.
>
> please be specific about the types of queries you want. ie: we need more
> than one example of the type of input you want to provide, the type of
> matches you want to see for that input, and the type of matches you want
> to get back.
>
> in your first message you said you need to match company titles "pretty
> exactly" but then seem to contradict yourself by saying that SQL's LIKE
> command fits the bill -- even though the SQL LIKE command exists
> specifically for inexact matches on field values.
>
> Based on your one example above of Albatross, you don't need anything
> special: don't use ngrams, don't use stemming, don't use fuzzy anything --
> just search for "Albatross" and it will match "Albatross" but not
> "Albert".  if you want "Albatross" to match "Albatross Road" use some
> basic tokenization.
>
> If all you really care about is prefix searching (which seems suggested by
> your "LIKE%" comment above, which I'm guessing is shorthand for something
> similar to "LIKE 'ABC%'"), so that queries like "abc" and "abcd" both
> match "abcdef" and "abcd" but neither matches a value that doesn't start
> with that prefix, then just use prefix queries (ie: "abcd*") -- they should
> be plenty efficient for your purposes.  You only need to worry about ngrams
> when you want to efficiently match in the middle of a string (ie: "TITLE
> LIKE '%ABC%'").
>
>
> -Hoss


Re: Highlighting with prefix queries and maxBooleanClause

2012-01-01 Thread Erick Erickson
This may be the impetus for Hoss creating SOLR-2996.

I suspect this will go away if you use the correct
match-all-docs syntax, i.e. q=*:* rather than q=*

Hoss' suggestion in 2996 is to "do the right thing" with
q=*, but for now you need to use the right syntax.
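
For instance, the query from the original post should go through with
highlighting on once it is rewritten as:

  http://localhost:8983/solr/select/?q=*:*&start=0&rows=20&hl=true&hl.fl=text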

But I'm not sure what highlighting will do when there's
nothing to highlight on (ie, no query terms to match
against your text field).

FWIW
Erick

On Fri, Dec 30, 2011 at 6:00 PM, Michael Lissner
 wrote:
> This question has come up a few times, but I've yet to see a good solution.
>
> Basically, if I have highlighting turned on and do a query for q=*, I get an
> error that maxBooleanClauses has been exceeded. Granted, this is a silly
> query, but a user might do something similar. My expectation is that queries
> that work when highlighting is OFF should continue working when it is ON.
>
> What's the best solution for queries like this? Is it simply to catch the
> error and then up maxBooleanClauses? Or to turn off highlighting when this
> error occurs?
>
> Or am I doing something altogether wrong?
>
> This is the query I'm using to cause the error:
>    http://localhost:8983/solr/select/?q=*&start=0&rows=20&hl=true&hl.fl=text
>
> Changing hl to false makes the query go through.
>
> I'm using Solr 4.0.0-dev
>
> The traceback is:
>
> SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount
> is set to 1024
>    at
> org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:68)
>    at
> org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:159)
>    at
> org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:81)
>    at
> org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:114)
>    at
> org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312)
>    at
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:155)
>    at
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:144)
>    at
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:384)
>    at
> org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
>    at
> org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
>    at
> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
>    at
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:511)
>    at
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:402)
>    at
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:121)
>    at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>    at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>    at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>    at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>    at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>    at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>    at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>    at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>    at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>    at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>    at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>    at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>    at org.mortbay.jetty.Server.handle(Server.java:326)
>    at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>    at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>    at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>    at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> Thanks,
>
> Mike


Re: Closing holes in the index - optimize

2012-01-01 Thread Erick Erickson
There's no index locking while optimizing that I know of.
But optimize is usually not all that necessary unless
you have a pretty static index; in fact, under the
covers it's been renamed to forceMerge, partly
to keep it from sounding so necessary. The
resource reclamation etc. happens automagically
when segments are merged now, so you might not want
to optimize at all!
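
If you do still want to force a merge, it's just an update request; a
minimal example, assuming the stock Jetty setup on port 8983:

  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'

and, as far as I know, the index keeps serving queries and accepting
updates while it runs.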

Best
Erick

On Sun, Jan 1, 2012 at 3:28 AM, shlomi java  wrote:
> hi community,
>
> I understand that optimize builds a NEW index from the live one and then
> swaps them.
> And that it is best to schedule optimize with care.
>
> Can we still update and query the live index while optimize is on the go?
> Or is the live index being locked during optimize?
>
> 10X
> shlomiJ


Re: Merging results from Shards - relevancy and performance

2012-01-01 Thread Erick Erickson
1> Yes. Note that distributed tf/idf is an issue: if your documents are
 statistically very different across shards, the scores aren't really
 comparable. This is changing, but I don't think it's committed yet.
2> Well, you're mixing apples and oranges I think. The general
 recommendation is to use a single core and *replicate* it
 across as many machines as necessary until your index gets
 too big to fit on your machines (i.e. you cannot get decent
 query times at all). This is NOT distributed searching as
 each request is wholly serviced by a single slave searcher.

 Once you cross the threshold of what fits on your hardware,
 you really have no choice except to shard and use distributed
 searching.
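
 For reference, a distributed request is just an ordinary query with a
 shards parameter listing the cores to fan out to (host names below are
 illustrative):

   http://host1:8983/solr/select?q=ipod&shards=host1:8983/solr,host2:8983/solr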

 There is certainly some overhead, but since you have no
 choice but to pay it, you just cope.

 At very large scale (i.e. lots of shards on lots of machines),
 you run into the "laggard problem". That is, as the number
 of shards increases, so does the chance that at least one
 of them will, for whatever reason, take an anomalously long
 time to complete, which will slow down your final results.

FWIW
Erick

On Sun, Jan 1, 2012 at 4:34 AM, shlomi java  wrote:
> hola,
>
> 1) When distributing search across several shards, does the merged result
> reflect the overall ranking across shards?
> I'm talking about stuff like "document frequency".
> I guess it does, otherwise distributed search wouldn't have overhead.
>
> Talking about overhead:
> 2) is there a known ratio for the overhead of using shards versus a
> single core, and for the performance impact of adding the N+1th shard to the
> distributed index?
>
> thanks for any knowledge/thought.
> ShlomiJ


Re: Question about SOLR custom sort order

2012-01-01 Thread Erick Erickson
There's no good way of enforcing this, as far as I know,
as you've outlined the problem. You can easily specify
multiple sort criteria, where ties in the first criterion
are broken by the second criterion, and so on.

So, if your records have *no* city value, you can do what
you want by setting sortMissingLast on the city field and
then specifying state as your second criterion.

So you can probably do what you want by breaking
up your location field into fields that exist specifically
for sorting (probably use copyField here?) and sorting
as above.
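
A minimal sketch of that, for 1.4 (field and type names are illustrative):

  <fieldType name="string_sml" class="solr.StrField" sortMissingLast="true"/>
  <field name="city_sort" type="string_sml" indexed="true" stored="false"/>
  <field name="state_sort" type="string_sml" indexed="true" stored="false"/>
  <copyField source="city" dest="city_sort"/>
  <copyField source="state" dest="state_sort"/>

and then sort with something like:

  sort=city_sort asc,state_sort asc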


Best
Erick

On Sun, Jan 1, 2012 at 3:19 AM, Gupta, Veeranjaneya
 wrote:
> Hi,
>
> I use Solr version 1.4 and I have a question about SOLR sort order.
> Requirement: sort names (e.g. location names like Dallas (City), Las
> Vegas (City), Texas (State), India (Country), Canada (Country), etc.) based on
> category (e.g. CITY, STATE, COUNTRY, etc.).
> How do I sort the SOLR results in a custom order?
> Expected result:
>
> 1. Dallas
> 2. Las Vegas
> 3. Texas
> 4. Canada
> 5. India
>
> Thanks & Regards, Gupta


Re: Question about SOLR custom sort order

2012-01-01 Thread Andrea Gazzarini
We fulfilled a similar requirement by creating a new field that is
populated at client level (a standalone app that converts binary data
into Solr input documents).
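
For example (names are illustrative), with an integer rank written by the
client at index time -- say CITY=1, STATE=2, COUNTRY=3 -- and a sortable int
field in the schema:

  <field name="category_order" type="tint" indexed="true" stored="true"/>

the expected ordering falls out of a plain sort:

  sort=category_order asc,name asc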

Andrea

On 1/1/12, Erick Erickson  wrote:
> There's no good way of enforcing this, as far as I know,
> as you've outlined the problem. You can easily specify
> multiple sort criteria, where ties in the first criterion
> are broken by the second criterion, and so on.
>
> So, if your records have *no* city value, you can do what
> you want by setting sortMissingLast on the city field and
> then specifying state as your second criterion.
>
> So you can probably do what you want by breaking
> up your location field into fields that exist specifically
> for sorting (probably use copyField here?) and sorting
> as above.
>
>
> Best
> Erick
>
> On Sun, Jan 1, 2012 at 3:19 AM, Gupta, Veeranjaneya
>  wrote:
>> Hi,
>>
>> I use Solr version 1.4 and I have a question about SOLR sort order.
>> Requirement: sort names (e.g. location names like Dallas (City), Las
>> Vegas (City), Texas (State), India (Country), Canada (Country), etc.) based
>> on category (e.g. CITY, STATE, COUNTRY, etc.).
>> How do I sort the SOLR results in a custom order?
>> Expected result:
>>
>> 1. Dallas
>> 2. Las Vegas
>> 3. Texas
>> 4. Canada
>> 5. India
>>
>> Thanks & Regards, Gupta
>


Re: Highlighting with prefix queries and maxBooleanClause

2012-01-01 Thread Michael Lissner

On 01/01/2012 07:48 AM, Erick Erickson wrote:

> This may be the impetus for Hoss creating SOLR-2996.

Yep, it is indeed, though I believe this problem can also happen when a 
user searches for something like q=a* in a big index. I need a bigger 
index to know for sure about that, but from what I've read so far, I'm 
fairly certain that this problem is bigger than just the q=* search.


I think my solution when this error is thrown is going to be to bump
maxBooleanClauses and retry the query. Failing that, I'll have to retry
the query with highlighting off.
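
(For reference, that limit lives in solrconfig.xml under the <query> section;
the value below is just an example:)

  <maxBooleanClauses>4096</maxBooleanClauses>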

> I suspect this will go away if you use the correct
> match-all-docs syntax, i.e. q=*:* rather than q=*

It does, yes.

> But I'm not sure what highlighting will do when there's
> nothing to highlight on (ie, no query terms to match
> against your text field).

I believe it does nothing, thankfully.

Mike