Re: Commit in solr 1.3 can take up to 5 minutes

2008-10-05 Thread Uwe Klosa
It's a live server with many search queries. I will set up a test server
next week or the week after and index the same amount of documents. I will
get back with the results.

Uwe

On Sat, Oct 4, 2008 at 8:18 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Sat, Oct 4, 2008 at 11:55 AM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
> > A "Opening Server" is always happening directly after "start commit" with
> no
> > delay.
>
> Ah, so it doesn't look like it's the close of the IndexWriter then!
> When do you see the "end_commit_flush"?
> Could you post everything in your log between when the commit begins
> and when it ends?
> Is this a live server (is query traffic continuing to come in while
> the commit is happening?)  If so, it would be interesting to see (and
> easier to debug) if it happened on a server with no query traffic.
>
> > But I can see many {commit=} with QTime around 280.000 (4 and a half
> > minutes)
>
> > One difference I could see to your logging is that I have waitFlush=true.
> > Could that have this impact?
>
> These parameters (waitFlush/waitSearcher) won't affect how long it
> takes to get the new searcher registered, but they do affect at what
> point control is returned to the caller (and hence when you see the
> response).  If waitSearcher==false, then you see the response before
> searcher warming, otherwise it blocks until after.  waitFlush==false
> is not currently supported (it will always act as true), so your
> change of that doesn't matter.
>
> -Yonik
>
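
To make the waitFlush/waitSearcher point above concrete, here is a minimal sketch that only builds the XML commit message with those two attributes. The helper name build_commit is made up for illustration, and actually posting the message to the update handler is left out because the URL depends on the installation.

```python
import xml.etree.ElementTree as ET


def build_commit(wait_flush=True, wait_searcher=True):
    """Build a Solr <commit/> update message as a string.

    build_commit is a hypothetical helper, not part of any Solr client.
    """
    commit = ET.Element("commit", {
        "waitFlush": str(wait_flush).lower(),
        "waitSearcher": str(wait_searcher).lower(),
    })
    return ET.tostring(commit, encoding="unicode")


# With waitSearcher="false" the response should come back before the new
# searcher has finished warming; the warming itself still takes as long.
message = build_commit(wait_searcher=False)
print(message)
```

Going by the explanation quoted above, changing these attributes would change when the caller sees the response, not how long it takes to register the new searcher.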


How stop properly solr to modify solrconfig or ... files

2008-10-05 Thread sunnyfr

Hi everybody,
still me :)
A quick question: how can I stop Solr properly? Let me explain my problem.
I did a full import, with the adaptive parameter set for MySQL to avoid an OOM error.
Once the full import was done, I wanted to add postCommit and postOptimize
snapshooter listeners to solrconfig, so I stopped Tomcat.
But after modifying solrconfig, when I tried to restart it,
I got an error: it couldn't find segment_7oe .. and indeed, when I checked the
index directory, it wasn't there.
So I don't know what happened, and I don't know what to do when a segment
file is lost or damaged.
Most importantly, how can I stop Solr properly when I need to modify my XML
configuration files?

Thanks a lot,

-- 
View this message in context: 
http://www.nabble.com/How-stop-properly-solr-to-modify-solrconfig-or-...-files-tp19826679p19826679.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dismax and long phrases

2008-10-05 Thread Yonik Seeley
Hmmm, tricky.  I think you've uncovered an algorithmic flaw in DisMax.

Consider 2 fields, f1, f2 and 2 terms foo and bar.  For illustration
purposes, here is a query that's structurally equivalent (assuming
mm=100% of terms must match):

+(f1:foo OR f2:foo) +(f1:bar OR f2:bar)

OK, so it says that "foo" must appear in either field and "bar" must
appear in either field.  So far so good.

Now consider what happens if "bar" is a stopword for f1... the query becomes

+(f1:foo OR f2:foo) +(f2:bar)

Oops, now this query is saying that bar *must* appear in f2... it's
more restrictive than the first query.  It appears that dismax is a
bit broken when some of the fields have stopwords and some don't.
Offhand, I don't see an easy fix for this problem.

-Yonik
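
The f1/f2 example above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: build_clauses and to_string are made-up helpers, "analysis" is reduced to dropping stopwords per field, and mm is treated as 100% so every clause is required. It is not the DisMax code.

```python
def build_clauses(terms, field_stopwords):
    """Return one required clause per term, listing the fields that keep it.

    field_stopwords maps a field name to that field's stopword set
    (a stand-in for the field's analyzer).
    """
    clauses = []
    for term in terms:
        alternatives = [
            f"{field}:{term}"
            for field, stopwords in field_stopwords.items()
            if term not in stopwords
        ]
        if alternatives:  # a term dropped by every field yields no clause
            clauses.append(alternatives)
    return clauses


def to_string(clauses):
    """Render the clauses in the +(a OR b) notation used above."""
    return " ".join("+(" + " OR ".join(alts) + ")" for alts in clauses)


no_stopwords = {"f1": set(), "f2": set()}
bar_stopped_in_f1 = {"f1": {"bar"}, "f2": set()}

print(to_string(build_clauses(["foo", "bar"], no_stopwords)))
print(to_string(build_clauses(["foo", "bar"], bar_stopped_in_f1)))
```

The second call leaves f2 as the only alternative for "bar", which is the extra restriction described above.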



On Fri, Oct 3, 2008 at 5:44 PM, Jon Drukman <[EMAIL PROTECTED]> wrote:
> i have a document with the following field
>
> Saying goodbye to Norman
>
> if i search for "saying goodbye to norman" with the standard query, it works
> fine.  if i specify dismax, however, it does not match.  here's the output
> of debugQuery, which I don't understand at all:
>
> saying goodbye to norman
> saying goodbye to norman
> +((DisjunctionMaxQuery((user_name:saying^0.4 |
> description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 |
> location:saying^0.6 | name:say^1.5)~0.01)
> DisjunctionMaxQuery((user_name:goodbye^0.4 | description:goodby |
> tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 |
> location:goodbye^0.6 | name:goodby^1.5)~0.01)
> DisjunctionMaxQuery((user_name:to^0.4 | location:to^0.6)~0.01)
> DisjunctionMaxQuery((user_name:norman^0.4 | description:norman |
> tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 |
> location:norman^0.6 | name:norman^1.5)~0.01))~4)
> DisjunctionMaxQuery((description:"say goodby norman"~100 | group_name:"say
> goodby norman"~100^1.5 | name:"say goodby norman"~100^1.5)~0.01)
> +(((user_name:saying^0.4 | description:say
> | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | location:saying^0.6 |
> name:say^1.5)~0.01 (user_name:goodbye^0.4 | description:goodby |
> tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 |
> location:goodbye^0.6 | name:goodby^1.5)~0.01 (user_name:to^0.4 |
> location:to^0.6)~0.01 (user_name:norman^0.4 | description:norman |
> tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 |
> location:norman^0.6 | name:norman^1.5)~0.01)~4) (description:"say goodby
> norman"~100 | group_name:"say goodby norman"~100^1.5 | name:"say goodby
> norman"~100^1.5)~0.01
>
>
>
> it works fine if I search for "say goodbye" or "saying goodbye" or "saying
> goodbye norman".  how can i get it to do exact matches (which should score
> very high)?
>
>
> -jsd-
>
>


Re: dismax and long phrases

2008-10-05 Thread Chris Hostetter

: Hmmm, tricky.  I think you've uncovered an algorithmic flaw in DisMax.

I would call it a deficiency, not a flaw :)

: more restrictive than the first query.  It appears that dismax is a
: bit broken when some of the fields have stopwords and some don't.
: Offhand, I don't see an easy fix for this problem.

you are correct, this has been discussed in the past...

http://www.nabble.com/Making-stop-words-optional-with-DisMax--to16307924.html#a16307924
http://www.nabble.com/DisMax-request-handler-doesn%27t-work-with-stopwords--to11015905.html#a11015905

It's not a bug in the implementation, it's a side effect of the basic 
tenet of how dismax works: since it inverts the input and creates a 
DisjunctionMaxQuery for each "word" in the input, any word that is valid 
in at least one of the "qf" fields generates a "should" clause that 
contributes to the MM count.  

One idea that was suggested at one point (possibly by me) is to make "qf" 
a multivalue param and use each one to construct a separate boolean query 
and wrap those in another boolean (or dismax) query so that only one is 
required ... then you could put all of your fields w/o stopwords in one qf 
with a bunch of big boosts, and all of your fields w/stopwords in another 
qf with smaller boosts.

But i don't think anyone ever submitted a patch, and i haven't thought it 
through enough to be confident it would work out well.


-Hoss
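
A toy model of the multi-valued "qf" idea above, purely as a sketch: no such option exists in the thread, the helper names are made up, analysis is reduced to lowercasing plus stopword removal (no stemming), boosts are ignored, and mm is treated as 100%. Each qf group builds its own set of required clauses, and a document matches if any one group is satisfied.

```python
def tokens(text, stopwords):
    """Lowercase, split on whitespace and drop the field's stopwords."""
    return {w for w in text.lower().split() if w not in stopwords}


def group_matches(doc, query, group):
    """group maps field name -> stopword set; every surviving term must match."""
    required = []
    for term in query.lower().split():
        fields = [f for f, stopwords in group.items() if term not in stopwords]
        if fields:  # a term dropped by every field in the group is not required
            required.append((term, fields))
    if not required:
        return False
    return all(
        any(term in tokens(doc.get(f, ""), group[f]) for f in fields)
        for term, fields in required
    )


def matches(doc, query, groups):
    """Only one of the qf groups has to match."""
    return any(group_matches(doc, query, g) for g in groups)


doc = {"name": "Saying goodbye to Norman", "user_name": "jsd"}
stop = {"to"}
single_qf = [{"user_name": set(), "name": stop}]  # mixed fields in one qf
split_qf = [{"user_name": set()}, {"name": stop}]  # one qf per kind of field

print(matches(doc, "saying goodbye to norman", single_qf))
print(matches(doc, "saying goodbye to norman", split_qf))
```

With a single mixed qf the word "to" becomes a required clause that only user_name can satisfy, so the document is missed; with the fields split into two groups, the group with stopwords never requires "to" and the document matches. Whether this would work out well in real scoring is exactly the open question raised above.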



Re: Using Filters with SolrIndexSearcher

2008-10-05 Thread Chris Hostetter

: I have a lucene CustomScoreQuery which I am wanting to execute with a lucene
: Filter.  However the getDocListAndSet API provided by the SolrIndexSearcher
: doesn't seem to allow Filters to be used along with Queries.  Instead it
: seems that the Filters must be first converted to a DocSet.  Internally, the
: SolrIndexSearcher then seems to call Searcher.search(Query, HitCollector),
: which results in my CustomScoreQuery being called for every document
: resulting from the query (which in my case is every document in the index),
: instead of those resulting from the Query and the Filter.  Ideally
: SolrIndexSearcher would call something like Searcher.search(Query, Filter,

this statement confuses me ... independent of anything in Solr, if you are 
using Lucene directly then a CustomScoreQuery that matches every doc in 
your index should wind up iterating over every document regardless of 
whether a filter is applied.  this shouldn't work any differently in Solr 
when dealing with DocSets instead of Filters.

If you only care about docs that match a certain filter, couldn't you use 
that Filter to create the subQuery for your ConstantScoreQuery?  (instead of a 
MatchAllDocs query which is what it sounds like you're using right now)

If i'm misunderstanding your question, posting some code so we can see 
what you are trying would be helpful.

-Hoss
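
A toy model of the suggestion above, not Lucene or Solr code. It assumes, as the reply describes, that the custom score is computed for every document the sub-query matches and that a filter only discards hits afterwards. The function names and the scoring formula are made up for illustration.

```python
def search(sub_query_docs, custom_score, post_filter=None):
    """Score every doc the sub-query matches; apply the filter only afterwards."""
    hits, score_calls = {}, 0
    for doc_id in sub_query_docs:
        score = custom_score(doc_id)
        score_calls += 1
        if post_filter is None or doc_id in post_filter:
            hits[doc_id] = score
    return hits, score_calls


def custom_score(doc_id):
    """Stand-in for an expensive custom scoring function."""
    return doc_id * 0.5


all_docs = range(1000)   # what a match-all sub-query would return
wanted = {3, 14, 15}     # the docs the filter lets through

# Match-all sub-query, filter applied after scoring.
hits_a, calls_a = search(all_docs, custom_score, post_filter=wanted)
# Filter used as the sub-query itself.
hits_b, calls_b = search(sorted(wanted), custom_score)

print(calls_a, calls_b)
```

Both variants return the same hits in this model, but only the second limits the number of custom score computations to the filtered documents.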



Re: positionIncrementGap in schema.xml

2008-10-05 Thread sanraj25

Hi,
 Thanks, Erik. That is clear. But when I tested with multiValued=true on a
single field and set
 positionIncrementGap=100, it still did not behave as expected. For example:

author: John Doe
author: Bob Smith

A phrase query of "doe bob" matched even though I specified 
positionIncrementGap=100. I then changed it to 
positionIncrementGap=0, and "doe bob" still matched. Should I set the value
to 200 or 300? Please give some suggestions.

Thanks
-sanraj


Erik Hatcher wrote:
> 
> 
> On Oct 3, 2008, at 5:10 AM, sanraj25 wrote:
>> what is the purpose of  positionIncrementGap attribute in  
>> field
>> type tag of schema.xml. The value specified for that
>> positionIncrementGap=100. If we change the value  what will happen?
> 
> Suppose a document has a multi-valued "author" field.   Like this:
> 
> author: John Doe
> author: Bob Smith
> 
> With a position increment gap of 0, a phrase query of "doe bob" would  
> be a match.  But often it is undesirable for that kind of match across  
> different field values.  A position increment gap controls the virtual  
> space between the last token of one field instance and the first token  
> of the next instance.  With a gap of 100, this prevents phrase queries  
> (even with a modest slop factor) from matching across instances.
> 
>   Erik
> 
> 
> 
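
The effect of the gap described above can be sketched with a small position model. This is a simplification, not Lucene's implementation: index_positions and phrase_matches are made-up helpers, tokens are split on whitespace and lowercased, and the phrase check only handles two terms in order.

```python
def index_positions(values, gap):
    """Assign a position to every token of a multi-valued field."""
    positions = {}
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            next_pos += gap  # virtual space between field instances
        for token in value.lower().split():
            positions.setdefault(token, []).append(next_pos)
            next_pos += 1
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


authors = ["John Doe", "Bob Smith"]
gap0 = index_positions(authors, gap=0)
gap100 = index_positions(authors, gap=100)

print(phrase_matches(gap0, "doe", "bob"))             # adjacent positions
print(phrase_matches(gap100, "doe", "bob"))           # 100 positions apart
print(phrase_matches(gap100, "doe", "bob", slop=10))  # modest slop
print(phrase_matches(gap100, "john", "doe"))          # same instance
```

In this sketch the gap is applied when positions are assigned, that is, at index time, so a changed gap only shows up for documents indexed after the change.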

-- 
View this message in context: 
http://www.nabble.com/positionIncrementGap-in-schema.xml-tp19794338p19832472.html