Re: facet performance tips

2009-08-13 Thread Jérôme Etévé
Thanks everyone for your advice.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a
500 000-entry filterCache would work.
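
For reference, the cache being discussed is the filterCache entry in
solrconfig.xml; a minimal sketch, with sizes that are purely illustrative
(they would need to roughly match the number of distinct facet values):

    <filterCache
      class="solr.LRUCache"
      size="500000"
      initialSize="500000"
      autowarmCount="0"/>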

So I guess my best move would be to upgrade to the soon-to-be-released
1.4 version of Solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea about
when 1.4 will be an official release?
As well, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr :
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr
> 1.4.
> -Stephen
>
> On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher  wrote:
>
>> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>>
>> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
>> performance.
>>
>>Erik
>>
>>
>> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>>
>>  Hi everyone,
>>>
>>>  I'm using some faceting on a solr index containing ~ 160K documents.
>>> I perform facets on multivalued string fields. The number of possible
>>> different values is quite large.
>>>
>>> Enabling facets degrades the performance by a factor of 3.
>>>
>>> Because I'm using solr 1.3, I guess the faceting makes use of the
>>> filter cache to work. My filterCache is set
>>> to a size of 2048. I also noticed in my solr stats a very small ratio
>>> of cache hit (~ 0.01%).
>>>
>>> Can it be the reason why the faceting is slow? Does it make sense to
>>> increase the filterCache size so it matches more or less the number
>>> of different possible values for the faceted fields? Would that not
>>> make the memory usage explode?
>>>
>>> Thanks for your help !
>>>
>>> --
>>> Jerome Eteve.
>>>
>>> Chat with me live at http://www.eteve.net
>>>
>>> jer...@eteve.net
>>>
>>
>>
>
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg

A couple of questions about the DIH XPath syntax...

The docs say it supports:

   xpath="/a/b/subje...@qualifier='fullTitle']"
   xpath="/a/b/subject/@qualifier"
   xpath="/a/b/c"

Does the second one mean "select the value of the attribute called qualifier
in the /a/b/subject element"?

e.g. For this document:

<a>
 <b>
  <subject qualifier="some text" />
 </b>
</a>

... I would get the result "some text"?

Also... Can I select a non-leaf node and get *ALL* the text underneath it?
e.g. /a/b in this example?

Thanks!

Andrew.

-- 
View this message in context: 
http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954223.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg



Andrew Clegg wrote:
> 
>   
> 

Sorry, Nabble swallowed my XML example. That was supposed to be

[a]
 [b]
  [subject qualifier="some text" /]
 [/b]
[/a]

... but in XML.

Andrew.

-- 
View this message in context: 
http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954263.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Questions about XPath in data import handler

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg wrote:
>
> A couple of questions about the DIH XPath syntax...
>
> The docs say it supports:
>
>   xpath="/a/b/subje...@qualifier='fullTitle']"
>   xpath="/a/b/subject/@qualifier"
>   xpath="/a/b/c"
>
> Does the second one mean "select the value of the attribute called qualifier
> in the /a/b/subject element"?
>
> e.g. For this document:
>
> <a>
>  <b>
>   <subject qualifier="some text" />
>  </b>
> </a>
>
> ... I would get the result "some text"?
Yes, you are right. Isn't that the semantics of standard XPath syntax?
>
> Also... Can I select a non-leaf node and get *ALL* the text underneath it?
> e.g. /a/b in this example?
>
> Thanks!
>
> Andrew.
>
> --
> View this message in context: 
> http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954223.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Query with no cache without editing solrconfig?

2009-08-13 Thread Koji Sekiguchi

Jason Rutherglen wrote:

Is there a way to do this via a URL?

  


I think - no there isn't.

Koji



Re: Distributed query returns time consumed by each Solr shard?

2009-08-13 Thread Grant Ingersoll
Not that I am aware of.  I think there is a patch for timing out
shards and returning partial results if a shard takes too long.  I
believe it is slated for 1.4, but it doesn't have any unit tests at
the moment.



On Aug 12, 2009, at 7:12 PM, Jason Rutherglen wrote:


Is there a way to do this currently? If a shard takes an
inordinate amount of time compared to the other shards, it's useful
to see the various qtimes per shard, with the aggregated results.





Re: Questions about XPath in data import handler

2009-08-13 Thread Andrew Clegg



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg
> wrote:
> 
>> Does the second one mean "select the value of the attribute called
>> qualifier
>> in the /a/b/subject element"?
> 
> yes you are right. Isn't that the semantics of standard xpath syntax?
> 

Yes, just checking since the DIH XPath engine is a little different.

Do you know what I would get in this case?

> > Also... Can I select a non-leaf node and get *ALL* the text underneath
> it?
> > e.g. /a/b in this example?

Cheers,

Andrew.

-- 
View this message in context: 
http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954869.html
Sent from the Solr - User mailing list archive at Nabble.com.



I think this is a "bug"

2009-08-13 Thread Paul Tomblin
I don't want to join yet another mailing list or register for JIRA,
but I just noticed that the Javadoc for
SolrInputDocument.addField(String name, Object value, float boost) is
incredibly wrong - it looks like it was copied from a "deleteAll"
method.


-- 
http://www.linkedin.com/in/paultomblin


Re: I think this is a "bug"

2009-08-13 Thread Chris Male
Hi Paul,

Yes the comment does look very wrong.  I'll open a JIRA issue and include a
fix.

On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin  wrote:

> I don't want to join yet another mailing list or register for JIRA,
> but I just noticed that the Javadoc for
> SolrInputDocument.addField(String name, Object value, float boost) is
> incredibly wrong - it looks like it was copied from a "deleteAll"
> method.
>
>
> --
> http://www.linkedin.com/in/paultomblin
>


Curl error 26 failed creating formpost data

2009-08-13 Thread Kevin Miller
I am trying to use the curl command located on the Extracting Request
Handler on the Solr Wiki.  I am using the command in the following way:

curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr&map.content=attr_content&commit=true" -F "myfile=@../../BadNews.doc"

echo12 is the server where Solr is located and the BadNews.doc is
located in the exampledocs directory.

When I execute this command I get the following error message: curl:
(26) failed creating formpost data.

Can someone please direct me to where I can find the way to correct this
error?


Kevin Miller
Web Services


RE: Using Lucene's payload in Solr

2009-08-13 Thread Ensdorf Ken
> > It looks like things have changed a bit since this subject was last
> > brought
> > up here.  I see that there are support in Solr/Lucene for indexing
> > payload
> > data (DelimitedPayloadTokenFilterFactory and
> > DelimitedPayloadTokenFilter).
> > Overriding the Similarity class is straightforward.  So the last
> > piece of the puzzle is to use a BoostingTermQuery when searching.
> > Solr's LuceneQParserPlugin uses SolrQueryParser under the cover, so
> > I think all I need to do is to write my own query parser plugin
> > that uses a custom query parser, with the only difference being in
> > the getFieldQuery() method where a BoostingTermQuery is used instead
> > of a TermQuery.
>
> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery,
> which gives some more flexibility in terms of how the spans in a
> single document are scored.
>
> >
> > Am I on the right track?
>
> Yes.
>
> > Has anyone done something like this already?
>

I wrote a QParserPlugin that seems to do the trick.  This is minimally tested - 
we're not actually using it at the moment, but should get you going.  Also, as 
Grant suggested, you may want to sub BFTQ for BTQ below:

package com.zoominfo.solr.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.solr.common.params.*;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.*;

public class BoostingTermQParserPlugin extends QParserPlugin {
  public static String NAME = "zoom";

  public void init(NamedList args) {
  }

  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    System.out.print("BoostingTermQParserPlugin::createParser\n");
    return new BoostingTermQParser(qstr, localParams, params, req);
  }
}

// Subclass of the Lucene QueryParser that emits a payload-aware
// BoostingTermQuery wherever a plain TermQuery would normally be built.
class BoostingTermQueryParser extends QueryParser {

  public BoostingTermQueryParser(String f, Analyzer a) {
    super(f, a);
    System.out.print("BoostingTermQueryParser::BoostingTermQueryParser\n");
  }

  @Override
  protected Query newTermQuery(Term term) {
    System.out.print("BoostingTermQueryParser::newTermQuery\n");
    return new BoostingTermQuery(term);
  }
}

class BoostingTermQParser extends QParser {
  String sortStr;
  QueryParser lparser;

  public BoostingTermQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
    System.out.print("BoostingTermQParser::BoostingTermQParser\n");
  }

  public Query parse() throws ParseException {
    System.out.print("BoostingTermQParser::parse\n");
    String qstr = getString();

    // Fall back to the schema's default search field if no df param is given.
    String defaultField = getParam(CommonParams.DF);
    if (defaultField == null) {
      defaultField = getReq().getSchema().getSolrQueryParser(null).getField();
    }

    lparser = new BoostingTermQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer());

    // these could either be checked & set here, or in the SolrQueryParser constructor
    String opParam = getParam(QueryParsing.OP);
    if (opParam != null) {
      lparser.setDefaultOperator("AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
    } else {
      // try to get default operator from schema
      lparser.setDefaultOperator(getReq().getSchema().getSolrQueryParser(null).getDefaultOperator());
    }

    return lparser.parse(qstr);
  }

  public String[] getDefaultHighlightFields() {
    return new String[]{lparser.getField()};
  }
}
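
A sketch of how a plugin like this would be wired up, assuming the class is
on Solr's classpath (the name matches the NAME constant above): register it
in solrconfig.xml with

    <queryParser name="zoom" class="com.zoominfo.solr.analysis.BoostingTermQParserPlugin"/>

and then select it per request with defType=zoom (or a {!zoom} local param).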


RE: [OT] Solr Webinar

2009-08-13 Thread Chenini, Mohamed
I also registered to attend, but I am not going to because a last-minute
meeting has been scheduled here at work at the same time.

Is it possible in the future to schedule such webinars starting 5-6 PM
ET?

Thanks,
Mohamed

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Wednesday, August 12, 2009 6:22 PM
To: solr-user@lucene.apache.org
Subject: Re: [OT] Solr Webinar

I believe it will be, but am not sure of the procedure for  
distributing.  I think if you register, but don't show, you will get a  
notification.

-Grant

On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:

> Hello Grant,
> Will the webinar be recorded and available to download later  
> someplace?
> Unfortunately, I can't watch this time.
>
> Thanks,
>
> []s,
>
> Lucas Frare Teixeira .*.
> - lucas...@gmail.com
> - blog.lucastex.com
> - twitter.com/lucastex
>
>
> On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll  
> wrote:
>
>> I will be giving a free one hour webinar on getting started with  
>> Apache
>> Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT
>>
>> You can sign up @
>> http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP
>>
>> I will present and demo:
>> * Getting started with LucidWorks for Solr
>> * Getting better, faster results using Solr's findability and  
>> relevance
>> improvement tools
>> * Deploying Solr in production, including monitoring performance  
>> and trends
>> with the LucidGaze for Solr performance profiler
>>
>> -Grant





Re: Using Lucene's payload in Solr

2009-08-13 Thread Bill Au
Thanks for the tip on BFTQ.  I have been using a nightly build from before
that was committed.  I have upgraded to the latest nightly build and will use
that instead of BTQ.

I got DelimitedPayloadTokenFilter to work and see that the terms and payload
of the field are correct, but the delimiter and payload are stored, so they
appear in the response also.  Here is an example:

XML for indexing:
Solr|2.0 In|2.0 Action|2.0


XML response:

Solr|2.0 In|2.0 Action|2.0


I want to set payload on a field that has a variable number of words.  So I
guess I can use a copy field with a PatternTokenizerFactory to filter out
the delimiter and payload.
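
For illustration, a minimal sketch of the indexing-side analyzer for a
payload field in schema.xml (the type name is made up; delimiter and encoder
match the "word|2.0" examples above):

    <fieldType name="payloads" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory"
                delimiter="|" encoder="float"/>
      </analyzer>
    </fieldType>

Note the filter only affects the indexed tokens; the stored value is the raw
input, which is why the delimiters and payloads still show up in responses.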

I am thinking maybe I can do this instead when indexing:

XML for indexing:
Solr In Action

This will simplify indexing as I don't have to repeat the payload for each
word in the field.  I do have to write a payload aware update handler.  It
looks like I can use Lucene's NumericPayloadTokenFilter in my custom update
handler to

Any thoughts/comments/suggestions?

Bill


On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll wrote:

>
> On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
>
>  It looks like things have changed a bit since this subject was last
>> brought
>> up here.  I see that there are support in Solr/Lucene for indexing payload
>> data (DelimitedPayloadTokenFilterFactory and DelimitedPayloadTokenFilter).
>> Overriding the Similarity class is straight forward.  So the last piece of
>> the puzzle is to use a BoostingTermQuery when searching.  I think all I
>> need
>> to do is to subclass Solr's LuceneQParserPlugin uses SolrQueryParser under
>> the cover.  I think all I need to do is to write my own query parser
>> plugin
>> that uses a custom query parser, with the only difference being in the
>> getFieldQuery() method where a BoostingTermQuery is used instead of a
>> TermQuery.
>>
>
> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which
> gives some more flexibility in terms of how the spans in a single document
> are scored.
>
>
>> Am I on the right track?
>>
>
> Yes.
>
>  Has anyone done something like this already?
>>
>
> I intend to, but haven't started.
>
>  Since Solr already has indexing support for payload, I was hoping that
>> query
>> support is already in the works if not available already.  If not, I am
>> willing to contribute but will probably need some guidance since my
>> knowledge in Solr query parser is weak.
>>
>
>
> https://issues.apache.org/jira/browse/SOLR-1337
>


Boosting relevance as terms get nearer to each other

2009-08-13 Thread Michael _
Hello,
I'd like to score documents higher that have the user's search terms nearer
each other.  For example, if a user searches for

  a AND b AND c

the standard query handler should return all documents with [a] [b] and [c]
in them, but documents matching the phrase "a b c" should get a boost over
those with "a x b c" over those with "b x y c z a", etc.

To accomplish this, I thought I might replace the user's query with

  "a b c"~10

hoping that the slop term gets a higher and higher score the closer together
[a] [b] and [c] appear.  This doesn't seem to be the case in my experiments;
when I debug the query, there's no component of the score based on how close
together [a] [b] and [c] are.  And I'm suspicious that this would make my
queries a whole lot slower -- in reality my users' queries get expanded
quite a bit already, and I'd thus need to add many slop terms.

Perhaps instead I could modify the Standard query handler to examine the
distance between all ANDed tokens, and boost proportionally to the inverse
of their average distance apart.  I've never modified a query handler before
so I have no idea if this is possible.

Any suggestions on what approach I should take?  The less I have to modify
Solr, the better -- I'd prefer a query-side solution over writing a plugin
over forking the standard query handler.
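
One query-side sketch that may be worth testing before modifying anything
(the terms are placeholders): keep the boolean query mandatory and add the
sloppy phrase as a purely optional clause, so proximity only adds score:

    q=+(a AND b AND c) "a b c"~10

Lucene does score sloppy phrase matches higher the closer the terms are (the
slop factor is roughly 1/(edit distance + 1)), so if debugQuery shows no
proximity component at all, it may mean the phrase clause never matched.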

Thanks in advance!
Michael


Re: Using Lucene's payload in Solr

2009-08-13 Thread Grant Ingersoll


On Aug 13, 2009, at 11:58 AM, Bill Au wrote:

> Thanks for the tip on BFTQ.  I have been using a nightly build before
> that was committed.  I have upgraded to the latest nightly build and
> will use that instead of BTQ.
>
> I got DelimitedPayloadTokenFilter to work and see that the terms and
> payload of the field are correct but the delimiter and payload are
> stored so they appear in the response also.  Here is an example:
>
> XML for indexing:
> Solr|2.0 In|2.0 Action|2.0
>
> XML response:
> Solr|2.0 In|2.0 Action|2.0

Correct.

> I want to set payload on a field that has a variable number of words.
> So I guess I can use a copy field with a PatternTokenizerFactory to
> filter out the delimiter and payload.
>
> I am thinking maybe I can do this instead when indexing:
>
> XML for indexing:
> Solr In Action

Hmmm, interesting, what's your motivation vs. boosting the field?

> This will simplify indexing as I don't have to repeat the payload for
> each word in the field.  I do have to write a payload aware update
> handler.  It looks like I can use Lucene's NumericPayloadTokenFilter
> in my custom update handler to
>
> Any thoughts/comments/suggestions?
>
> Bill
>
> On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll wrote:
>
>> On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
>>
>>> It looks like things have changed a bit since this subject was last
>>> brought up here.  I see that there is support in Solr/Lucene for
>>> indexing payload data (DelimitedPayloadTokenFilterFactory and
>>> DelimitedPayloadTokenFilter).  Overriding the Similarity class is
>>> straightforward.  So the last piece of the puzzle is to use a
>>> BoostingTermQuery when searching.  Solr's LuceneQParserPlugin uses
>>> SolrQueryParser under the cover, so I think all I need to do is to
>>> write my own query parser plugin that uses a custom query parser,
>>> with the only difference being in the getFieldQuery() method where a
>>> BoostingTermQuery is used instead of a TermQuery.
>>
>> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery,
>> which gives some more flexibility in terms of how the spans in a
>> single document are scored.
>>
>>> Am I on the right track?
>>
>> Yes.
>>
>>> Has anyone done something like this already?
>>
>> I intend to, but haven't started.
>>
>>> Since Solr already has indexing support for payload, I was hoping
>>> that query support is already in the works if not available already.
>>> If not, I am willing to contribute but will probably need some
>>> guidance since my knowledge in Solr query parser is weak.
>>
>> https://issues.apache.org/jira/browse/SOLR-1337



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Issue with Collection & Distribution

2009-08-13 Thread william pink
Hello,

I am having a few problems with the snapinstaller/commit on the slave. I
have a pull_from_master script, which is the following:

#!/bin/bash
cd /opt/solr/solr/bin -v
./snappuller -v -P 18983
./snapinstaller -v


I have been executing snapshooter manually on the master then running the
above script to test but I am getting the following

2009/08/13 17:18:16 notifing Solr to open a new Searcher
2009/08/13 17:18:16 failed to connect to Solr server
2009/08/13 17:18:17 snapshot installed but Solr server has not open a new
Searcher

Commit logs

2009/08/13 17:18:16 started by user
2009/08/13 17:18:16 command: /opt/solr/solr/bin/commit
2009/08/13 17:18:16 commit request to Solr at
http://slave-server:8983/solr/update failed:
2009/08/13 17:18:16   <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">28</int></lst></response>
2009/08/13 17:18:16 failed (elapsed time: 0 sec)

Snappinstaller logs

2009/08/13 17:18:16 started by user
2009/08/13 17:18:16 command: ./snapinstaller -v
2009/08/13 17:18:16 installing snapshot
/opt/solr/solr/data/snapshot.20090813171835
2009/08/13 17:18:16 notifing Solr to open a new Searcher
2009/08/13 17:18:16 failed to connect to Solr server
2009/08/13 17:18:17 snapshot installed but Solr server has not open a new
Searcher
2009/08/13 17:18:17 failed (elapsed time: 1 sec)


Is there a way of telling why it is failing?

Many Thanks,
Will


Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Stanislaw Osinski
Hi,

On Tue, Aug 11, 2009 at 22:19, Mark Bennett  wrote:

> Carrot2 has several pluggable algorithms to choose from, though I have no
> evidence that they're "better" than Lucene's.  Where TF/IDF is sort of a
> one step algebraic calculation, some clustering algorithms use iterative
> approaches, etc.


I'm not sure if I completely follow the way in which you'd like to use
Carrot2 for scoring -- would you cluster the whole index? Carrot2 was
designed to be a post-retrieval clustering algorithm and optimized to
cluster small sets of documents (up to ~1000) in real time. All processing
is performed in-memory, which limits Carrot2's applicability to really large
sets of documents.

S.


RE: facet performance tips

2009-08-13 Thread Fuad Efendi
I took 1.4 from trunk three days ago; it seems OK for production (at least for
my Master instance, which is doing writes-only). I use the same config files.

500 000 terms are OK too; I am using several million with pre-1.3 SOLR taken
from trunk.

However, do not try to "facet" (probably an outdated term after SOLR-475) on
generic queries such as [* TO *] (with a huge resultset). For smaller query
results (100,000 instead of 100,000,000), "counting terms" is fast enough (a
few milliseconds at http://www.tokenizer.org)

 

-Original Message-
From: Jérôme Etévé [mailto:jerome.et...@gmail.com] 
Sent: August-13-09 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Thanks everyone for your advice.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a
500 000-entry filterCache would work.

So I guess my best move would be to upgrade to the soon-to-be-released
1.4 version of Solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea about
when 1.4 will be an official release?
As well, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr :
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr
> 1.4.
> -Stephen
>
> On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher  wrote:
>
>> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>>
>> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
>> performance.
>>
>>Erik
>>
>>
>> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>>
>>  Hi everyone,
>>>
>>>  I'm using some faceting on a solr index containing ~ 160K documents.
>>> I perform facets on multivalued string fields. The number of possible
>>> different values is quite large.
>>>
>>> Enabling facets degrades the performance by a factor of 3.
>>>
>>> Because I'm using solr 1.3, I guess the faceting makes use of the
>>> filter cache to work. My filterCache is set
>>> to a size of 2048. I also noticed in my solr stats a very small ratio
>>> of cache hit (~ 0.01%).
>>>
>>> Can it be the reason why the faceting is slow? Does it make sense to
>>> increase the filterCache size so it matches more or less the number
>>> of different possible values for the faceted fields? Would that not
>>> make the memory usage explode?
>>>
>>> Thanks for your help !
>>>
>>> --
>>> Jerome Eteve.
>>>
>>> Chat with me live at http://www.eteve.net
>>>
>>> jer...@eteve.net
>>>
>>
>>
>
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net




RE: facet performance tips

2009-08-13 Thread Fuad Efendi
It seems BOBO-Browse is an alternate faceting engine; it would be interesting
to compare performance with SOLR... Distributed?


-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.






RE: facet performance tips

2009-08-13 Thread Fuad Efendi
Interesting, it has "BoboRequestHandler implements SolrRequestHandler"
- easy to try; and it has shards support



[Fuad Efendi] It seems BOBO-Browse is alternate faceting engine; would be
interesting to
compare performance with SOLR... Distributed?


[Jason Rutherglen] For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.








Re: Using Lucene's payload in Solr

2009-08-13 Thread Bill Au
I need to boost a field differently according to the content of the field.
Here is an example (three docs, each with a multivalued category field):

  Solr:   information retrieval, webapp, xml
  Tomcat: webapp
  XMLSpy: xml, ide

A search on category:webapp should return Tomcat before Solr.  A search on
category:xml should return XMLSpy before Solr.

Bill

On Thu, Aug 13, 2009 at 12:13 PM, Grant Ingersoll wrote:

>
> On Aug 13, 2009, at 11:58 AM, Bill Au wrote:
>
>  Thanks for the tip on BFTQ.  I have been using a nightly build before that
>> was committed.  I have upgrade to the latest nightly build and will use
>> that
>> instead of BTQ.
>>
>> I got DelimitedPayloadTokenFilter to work and see that the terms and
>> payload
>> of the field are correct but the delimiter and payload are stored so they
>> appear in the response also.  Here is an example:
>>
>> XML for indexing:
>> Solr|2.0 In|2.0 Action|2.0
>>
>>
>> XML response:
>> 
>> Solr|2.0 In|2.0 Action|2.0
>> 
>>
>
>
> Correct.
>
>
>>
>>>  I want to set payload on a field that has a variable number of words.
>>  So I
>> guess I can use a copy field with a PatternTokenizerFactory to filter out
>> the delimiter and payload.
>>
>> I am thinking maybe I can do this instead when indexing:
>>
>> XML for indexing:
>> Solr In Action
>>
>
> Hmmm, interesting, what's your motivation vs. boosting the field?
>
>
>
>
>> This will simplify indexing as I don't have to repeat the payload for each
>> word in the field.  I do have to write a payload aware update handler.  It
>> looks like I can use Lucene's NumericPayloadTokenFilter in my custom
>> update
>> handler to
>>
>> Any thoughts/comments/suggestions?
>>
>>
>
>  Bill
>>
>>
>> On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll wrote:
>>
>>
>>> On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
>>>
>>> It looks like things have changed a bit since this subject was last
>>>
 brought
 up here.  I see that there are support in Solr/Lucene for indexing
 payload
 data (DelimitedPayloadTokenFilterFactory and
 DelimitedPayloadTokenFilter).
 Overriding the Similarity class is straight forward.  So the last piece
 of
 the puzzle is to use a BoostingTermQuery when searching.  I think all I
 need
 to do is to subclass Solr's LuceneQParserPlugin uses SolrQueryParser
 under
 the cover.  I think all I need to do is to write my own query parser
 plugin
 that uses a custom query parser, with the only difference being in the
 getFieldQuery() method where a BoostingTermQuery is used instead of a
 TermQuery.


>>> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery,
>>> which
>>> gives some more flexibility in terms of how the spans in a single
>>> document
>>> are scored.
>>>
>>>
>>>  Am I on the right track?


>>> Yes.
>>>
>>> Has anyone done something like this already?
>>>


>>> I intend to, but haven't started.
>>>
>>> Since Solr already has indexing support for payload, I was hoping that
>>>
 query
 support is already in the works if not available already.  If not, I am
 willing to contribute but will probably need some guidance since my
 knowledge in Solr query parser is weak.


>>>
>>> https://issues.apache.org/jira/browse/SOLR-1337
>>>
>>>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Yeah we need a performance comparison; I haven't had time to put
one together. If/when I do, I'll compare Bobo performance against
Solr bitset-intersection-based facets and compare memory
consumption.

For near realtime Solr needs to cache and merge bitsets at the
SegmentReader level, and Bobo needs to be upgraded to work with
Lucene 2.9's searching at the segment level (currently it uses a
MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is alternate faceting engine; would be interesting to
> compare performance with SOLR... Distributed?
>
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to try Bobo
> http://code.google.com/p/bobo-browse/ which could work well with your
> case.
>
>
>
>
>


Re: Issue with Collection & Distribution

2009-08-13 Thread Bill Au
Have you checked the solr log on the slave to see if there was any commit
done?  It looks to me like you are still using an older version of the commit
script that is not compatible with the newer Solr response format.  If
that's the case, the commit was actually performed.  It is just that the
script failed to handle the Solr response.  See

https://issues.apache.org/jira/browse/SOLR-463
https://issues.apache.org/jira/browse/SOLR-426

Bill

On Thu, Aug 13, 2009 at 12:28 PM, william pink  wrote:

> Hello,
>
> I am having a few problems with the snapinstaller/commit on the slave, I
> have a pull_from_master script which is the following
>
> #!/bin/bash
> cd /opt/solr/solr/bin -v
> ./snappuller -v -P 18983
> ./snapinstaller -v
>
>
> I have been executing snapshooter manually on the master then running the
> above script to test but I am getting the following
>
> 2009/08/13 17:18:16 notifing Solr to open a new Searcher
> 2009/08/13 17:18:16 failed to connect to Solr server
> 2009/08/13 17:18:17 snapshot installed but Solr server has not open a new
> Searcher
>
> Commit logs
>
> 2009/08/13 17:18:16 started by user
> 2009/08/13 17:18:16 command: /opt/solr/solr/bin/commit
> 2009/08/13 17:18:16 commit request to Solr at
> http://slave-server:8983/solr/update failed:
> 2009/08/13 17:18:16   <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">28</int></lst></response>
> 2009/08/13 17:18:16 failed (elapsed time: 0 sec)
>
> Snappinstaller logs
>
> 2009/08/13 17:18:16 started by user
> 2009/08/13 17:18:16 command: ./snapinstaller -v
> 2009/08/13 17:18:16 installing snapshot
> /opt/solr/solr/data/snapshot.20090813171835
> 2009/08/13 17:18:16 notifing Solr to open a new Searcher
> 2009/08/13 17:18:16 failed to connect to Solr server
> 2009/08/13 17:18:17 snapshot installed but Solr server has not open a new
> Searcher
> 2009/08/13 17:18:17 failed (elapsed time: 1 sec)
>
>
> Is there a way of telling why it is failing?
>
> Many Thanks,
> Will
>


RE: JVM Heap utilization & Memory leaks with Solr

2009-08-13 Thread Fuad Efendi
Most OutOfMemoryExceptions (if not 100%) happening with SOLR are because of
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/FieldCache.html
- it is used internally in Lucene to cache field values and document IDs.

My very long-term observation: SOLR can run without any problems for
days/months, and an unpredictable OOM happens just because someone tried a
sorted search, which will populate an array with the IDs of ALL documents in
the index.

The only solution: calculate exactly the amount of RAM needed for the
FieldCache... For instance, for 100,000,000 documents a single instance of
FieldCache may require 8*100,000,000 bytes (8 bytes per document ID?), which
is almost 1Gb (at least!)


I didn't notice any memory leaks after I started to use 16Gb RAM for the SOLR
instance (almost a year without any restart!)




-Original Message-
From: Rahul R [mailto:rahul.s...@gmail.com] 
Sent: August-13-09 1:25 AM
To: solr-user@lucene.apache.org
Subject: Re: JVM Heap utilization & Memory leaks with Solr

*You should try to generate heap dumps and analyze the heap using a tool
like the Eclipse Memory Analyzer. Maybe it helps spotting a group of
objects holding a large amount of memory*

The tool that I used also allows capturing heap snapshots. Eclipse had a
lot of pre-requisites: you need to apply some three or five patches before
you can start using it. My observations with this tool were that some
HashMaps were taking up a lot of space, although I could not pin it down to
the exact HashMap. These would be either weblogic's or Solr's. I will
anyway give eclipse's a try and see how it goes. Thanks for your input.

Rahul

On Wed, Aug 12, 2009 at 2:15 PM, Gunnar Wagenknecht
wrote:

> Rahul R schrieb:
> > I tried using a profiling tool - Yourkit. The trial version was free for
> 15
> > days. But I couldn't find anything of significance.
>
> You should try to generate heap dumps and analyze the heap using a tool
> like the Eclipse Memory Analyzer. Maybe it helps spotting a group of
> objects holding a large amount of memory.
>
> -Gunnar
>
> --
> Gunnar Wagenknecht
> gun...@wagenknecht.org
> http://wagenknecht.org/
>
>




Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Mark Bennett
* mlb: comments

On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski wrote:

> Hi,
>
> On Tue, Aug 11, 2009 at 22:19, Mark Bennett  wrote:
>
> > Carrot2 has several pluggable algorithms to choose from, though I have no
> > evidence that they're "better" than Lucene's.  Where TF/IDF is sort of a
> > one step algebraic calculation, some clustering algorithms use iterative
> > approaches, etc.
>
>
> I'm not sure if I completely follow the way in which you'd like to use
> Carrot2 for scoring -- would you cluster the whole index? Carrot2 was
> designed to be a post-retrieval clustering algorithm and optimized to
> cluster small sets of documents (up to ~1000) in real time. All processing
> is performed in-memory, which limits Carrot2's applicability to really
> large
> sets of documents.
>
> S.
>

* mlb: I agree with all of your assertions, but...

There are comments in the Solr materials about having an option to cluster
based on the entire document set, and some warning about this being atypical
and possibly slow.  And from what you're saying, for a big enough docset, it
might go from "slow" to "impossible", I'm not sure.

And so my question was, *if* you were willing to spend that much time and
effort to cluster all the text of all the documents (and if it were even
possible), would the result perform better than the standard TF/IDF
techniques?

In the application I'm considering, the queries tend to be longer than
average, more like full sentences or more.  And they tend to be of a
question and answer nature.  I've seen references in several search engines
that QandA search sometimes benefits from alternative search techniques.
And, from a separate email, the IDF part of the standard similarity may be
causing a problem, so I'm casting a wide net for other ideas.  Just
brainstorming here... :-)

So, given that, did you have any thoughts on it Stanislaw?
Mark


RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Fuad Efendi
UPDATE:

I have 100,000,000 new documents in 24 hours, including possible updates OR
possibly adding the same document several times. I have two segments now (30Gb
total), and the network is overloaded (I use a web crawler to generate documents).
I never had more than 25,000,000 within a month before...

I read that high mergeFactor improves performance of updates; however, it
didn't work (it delays all merges... commit/optimize took similar timing).
High ramBufferSizeMB does the job.


[Fuad Efendi] >Looks like I temporarily solved the problem with
not-so-obvious settings:
[Fuad Efendi] >ramBufferSizeMB=8192
[Fuad Efendi] >mergeFactor=10
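
For reference, a rough sketch of where these settings live in solrconfig.xml
(the values are the ones quoted above):

    <indexDefaults>
      <ramBufferSizeMB>8192</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>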



> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with segment merge;
>
> During segment merge 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches 100-1000 docs) only 5-15% CPU
>
> constant rate 5:1 is very suspicious...
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document updates.





RE: facet performance tips

2009-08-13 Thread Fuad Efendi
SOLR-1.4-trunk uses term counting instead of bitset intersections (or so it
seems); check this
http://issues.apache.org/jira/browse/SOLR-475
(and probably http://issues.apache.org/jira/browse/SOLR-711)

-Original Message-
From: Jason Rutherglen 

Yeah we need a performance comparison, I haven't had time to put
one together. If/when I do I'll compare Bobo performance against
Solr bitset intersection based facets, compare memory
consumption.

For near realtime Solr needs to cache and merge bitsets at the
SegmentReader level, and Bobo needs to be upgraded to work with
Lucene 2.9's searching at the segment level (currently it uses a
MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is alternate faceting engine; would be interesting to
> compare performance with SOLR... Distributed?
>
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to try Bobo
> http://code.google.com/p/bobo-browse/ which could work well with your
> case.
>
>
>
>
>




Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Right, I haven't used SOLR-475 yet and am more familiar with
Bobo. I believe there are differences but I haven't gone into
them yet. As I'm using Solr 1.4 now, maybe I'll test the
UnInvertedField modality.

Feel free to report back results as I don't think I've seen much
yet?

On Thu, Aug 13, 2009 at 10:51 AM, Fuad Efendi wrote:
> SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to
> be); check this
> http://issues.apache.org/jira/browse/SOLR-475
> (and probably http://issues.apache.org/jira/browse/SOLR-711)
>
> -Original Message-
> From: Jason Rutherglen
>
> Yeah we need a performance comparison, I haven't had time to put
> one together. If/when I do I'll compare Bobo performance against
> Solr bitset intersection based facets, compare memory
> consumption.
>
> For near realtime Solr needs to cache and merge bitsets at the
> SegmentReader level, and Bobo needs to be upgraded to work with
> Lucene 2.9's searching at the segment level (currently it uses a
> MultiSearcher).
>
> Distributed search on either should be fairly straightforward?
>
> On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
>> It seems BOBO-Browse is alternate faceting engine; would be interesting to
>> compare performance with SOLR... Distributed?
>>
>>
>> -Original Message-
>> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
>> Sent: August-12-09 6:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: facet performance tips
>>
>> For your fields with many terms you may want to try Bobo
>> http://code.google.com/p/bobo-browse/ which could work well with your
>> case.
>>
>>
>>
>>
>>
>
>
>


Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Grant Ingersoll

BTW, what version of Solr are you on?

On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote:

> UPDATE:
>
> I have 100,000,000 new documents in 24 hours, including possible updates OR
> possibly adding the same document several times. I have two segments now
> (30Gb total), and network is overloaded (I use web crawler to generate
> documents).  I never had more than 25,000,000 within a month before...
>
> I read that high mergeFactor improves performance of updates; however, it
> didn't work (it delays all merges... commit/optimize took similar timing).
> High ramBufferSizeMB does the job.
>
> [Fuad Efendi] >Looks like I temporarily solved the problem with
> not-so-obvious settings:
> [Fuad Efendi] >ramBufferSizeMB=8192
> [Fuad Efendi] >mergeFactor=10
>
>> Never tried profiling;
>> 3000-5000 docs per second if SOLR is not busy with segment merge;
>>
>> During segment merge 99% CPU, no disk swap; I can't suspect I/O...
>>
>> During document updates (small batches 100-1000 docs) only 5-15% CPU
>>
>> constant rate 5:1 is very suspicious...
>>
>>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>>> updates.






--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Solr 1.4 Clustering / mlt AS search?

2009-08-13 Thread Grant Ingersoll


On Aug 13, 2009, at 1:29 PM, Mark Bennett wrote:

> * mlb: comments
>
> On Thu, Aug 13, 2009 at 9:39 AM, Stanislaw Osinski wrote:
>
>> Hi,
>>
>> On Tue, Aug 11, 2009 at 22:19, Mark Bennett wrote:
>>
>>> Carrot2 has several pluggable algorithms to choose from, though I have no
>>> evidence that they're "better" than Lucene's.  Where TF/IDF is sort of a
>>> one step algebraic calculation, some clustering algorithms use iterative
>>> approaches, etc.
>>
>> I'm not sure if I completely follow the way in which you'd like to use
>> Carrot2 for scoring -- would you cluster the whole index? Carrot2 was
>> designed to be a post-retrieval clustering algorithm and optimized to
>> cluster small sets of documents (up to ~1000) in real time. All processing
>> is performed in-memory, which limits Carrot2's applicability to really
>> large sets of documents.
>>
>> S.
>
> * mlb: I agree with all of your assertions, but...
>
> There are comments in the Solr materials about having an option to cluster
> based on the entire document set, and some warning about this being atypical
> and possibly slow.  And from what you're saying, for a big enough docset, it
> might go from "slow" to "impossible", I'm not sure.

Those comments are referring to a yet unimplemented feature that will
allow for pluggable background clustering using something like Mahout
to cluster the whole collection and then return back the results later
upon request.

> And so my question was, *if* you were willing to spend that much time and
> effort to cluster all the text of all the documents (and if it were even
> possible), would the result perform better than the standard TF/IDF
> techniques?
>
> In the application I'm considering, the queries tend to be longer than
> average, more like full sentences or more.  And they tend to be of a
> question and answer nature.  I've seen references in several search engines
> that QandA search sometimes benefits from alternative search techniques.
> And, from a separate email, the IDF part of the standard similarity may be
> causing a problem, so I'm casting a wide net for other ideas.  Just
> brainstorming here... :-)

QA has a lot of factors at play, but I can't recall anyone using
clustering as a way of doing the initial passage retrieval, but it's
been a few years since I kept up with that literature.

You of course can turn off or downplay IDF if that is an issue.  I
think payloads can also play a useful hand in QA (or Lucene's new
Attribute capabilities, but I won't quite go there yet) because you
could store term level information (often POS plays a role in helping
QA, as well as parsing information).



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



RE: Curl error 26 failed creating formpost data

2009-08-13 Thread Kevin Miller
I figured out what was causing this error.  I was pointing the myfile
argument at the wrong directory.
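
For anyone hitting the same thing, curl's -F "name=@file" argument is
resolved against the current working directory, so a sketch like this
(paths are hypothetical) avoids error 26:

    cd /path/to/solr/example/exampledocs
    curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr&map.content=attr_content&commit=true" -F "myfile=@BadNews.doc"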


Kevin Miller
Web Services

-Original Message-
From: Kevin Miller [mailto:kevin.mil...@oktax.state.ok.us] 
Sent: Thursday, August 13, 2009 10:08 AM
To: solr-user@lucene.apache.org
Subject: Curl error 26 failed creating formpost data

I am trying to use the curl command located on the Extracting Request
Handler on the Solr Wiki.  I am using the command in the following way:

curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr&map.content=attr_content&commit=true" -F "myfile=@../../BadNews.doc"

echo12 is the server where Solr is located and the BadNews.doc is
located in the exampledocs directory.

When I execute this command I get the following error message: curl:
(26) failed creating formpost data.

Can someone please direct me to where I can find the way to correct this
error?


Kevin Miller
Web Services



HTTP ERROR: 500 No default field name specified

2009-08-13 Thread Kevin Miller
I have a different error once I direct the curl to look in the correct
folder for the file.  I am getting an HTTP ERROR: 500 No default field
name specified.

I am using a test word document in the exampledocs folder.  I am issuing
the curl command from the exampledocs folder.  Following is the command
I am using:

c:\curl\bin\curl "http://echo12:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true" -F "myfile=@BadNews.doc"

curl is installed on my machine at c:\curl and the .exe file is located
at c:\curl\bin

Can someone please direct me to where I can look to find out how to set
a default field name?


Kevin Miller
Oklahoma Tax Commission
Web Services


RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-13 Thread Fuad Efendi
I upgraded "master" to 1.4-dev from trunk 3 days ago

BTW, such performance broke my "commodity hardware", most probably the network
card... I can't SSH in to check stats; I need to check onsite what happened...


-Original Message-
From: Grant Ingersoll 
Sent: August-13-09 4:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

BTW, what version of Solr are you on?

On Aug 13, 2009, at 1:43 PM, Fuad Efendi wrote:

> UPDATE:
>
> I have 100,000,000 new documents in 24 hours, including possible  
> updates OR
> possibly adding same document several times. I have two segments now  
> (30Gb
> total), and network is overloaded (I use web crawler to generate  
> documents).
> I never had more than 25,000,000 within a month before...
>
> I read that high mergeFactor improves performance of updates;  
> however, it
> didn't work (it delays all merges... commit/optimize took similar  
> timing).
> High ramBufferSizeMB does the job.
>
>
> [Fuad Efendi] >Looks like I temporarily solved the problem with
> not-so-obvious settings:
> [Fuad Efendi] >ramBufferSizeMB=8192
> [Fuad Efendi] >mergeFactor=10
>
>
>
>> Never tried profiling;
>> 3000-5000 docs per second if SOLR is not busy with segment merge;
>>
>> During segment merge 99% CPU, no disk swap; I can't suspect I/O...
>>
>> During document updates (small batches 100-1000 docs) only 5-15% CPU
>>
>> constant rate 5:1 is very suspicious...
>>
>>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>>> updates.
>
>
>

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search





Re: Facets with an IDF concept

2009-08-13 Thread wojtekpia

Hi Asif,

Did you end up implementing this as a custom sort order for facets? I'm
facing a similar problem, but not related to time. Given 2 terms:
A: appears twice in half the search results
B: appears once in every search result
I think term A is more "interesting". Using facets sorted by frequency, term
B is more important (since it shows up first). To me, terms that appear in
all documents aren't really that interesting. I'm thinking of using a
combination of document count (in the result set, not globally) and term
frequency (in the result set, not globally) to come up with a facet sort
order.
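
As a quick worked check with N matching documents: term A has df = N/2 and
total tf = 2 * (N/2) = N, while term B has df = N and total tf = N. Sorting
by document frequency ranks B first, while a tf/df ratio ranks A (2.0) ahead
of B (1.0).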

Wojtek
-- 
View this message in context: 
http://www.nabble.com/Facets-with-an-IDF-concept-tp24071160p24959192.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Lock timed out 2 worker running

2009-08-13 Thread renz052496
Yes, I misunderstood your question (re: the crash). Solr did not crash, but
we shut down the JVM (tomcat) gracefully after we killed all our workers. But
upon restarting, solr just throws the error.
Regards,
/Renz

2009/8/11 Chris Hostetter 

>
> : > 5) are these errors appearing after Solr crashes and you restart it?
> :
> :
> : Yep, I can't find the logs but it's something like can't obtain lock for
> : .lck Need to delete that file in order to start the solr
> properly
>
> wait ... either you misunderstood my question, or you just explained
> what's happening.
>
> If you are using SimpleFSLock, and solr crashes (OOM, kill -9, yank the
> power cord) then it's possible the lock file will get left around, in
> which case this is the expected behavior.  there's a config option you
> can set to tell solr that on start up you want it to clean up any old lock
> files, but if you switch to the "single" lock manager mode your life gets
> a lot easier anyway.
>
> But you never mentioned anything about the server crashing in your
> original message, so i'm wondering if you really meant to answer "yep" when
> i asked "are these errors appearing *after* Solr crashes"
>
>
> -Hoss
>
>


Solr 1.4 Replication scheme

2009-08-13 Thread KaktuChakarabati

Hello,
I've recently switched over to solr1.4 (recent nightly build) and have been
using the new replication.
Some questions come to mind:

In the old replication, I could snappull with multiple slaves asynchronously
but perform the snapinstall on each at the same time (+- epsilon seconds), so
that production load-balanced query serving would always be consistent.

With the new system it seems that I have no control over syncing them; rather,
each slave polls every few minutes and then decides the next cycle based on
the last time it *finished* updating, so in any case I lose control over the
synchronization of snap installation across multiple slaves.

Also, I noticed the default poll interval is 60 seconds. It would seem that
for such a rapid interval, what I mentioned above is a non-issue; however, I
am not clear how this works vis-a-vis the new searcher warmup. For a
considerable index size (20 million docs+) the warmup itself is an expensive
and somewhat lengthy process, and if a new searcher opens and warms up every
minute, I am not at all sure I'll be able to serve queries with reasonable
QTimes.
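
For reference, the polling is configured per slave in solrconfig.xml; a
minimal sketch (host and interval are placeholders):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>

Lengthening pollInterval at least bounds how often a new searcher warms up,
though it does not by itself synchronize installs across slaves.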

Has anyone else come across these issues? Any advice/comments will be
appreciated!

Thanks,
-Chak

-- 
View this message in context: 
http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24965590.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: [OT] Solr Webinar

2009-08-13 Thread Lukáš Vlček
Hello,
they [Lucid Imagination guys] said it should be published on their blog.
I hope I understood it correctly.

Regards,
Lukas

http://blog.lukas-vlcek.com/


On Fri, Aug 14, 2009 at 7:52 AM, Mani Kumar wrote:

> if anyone has any pointer to this webinar, please share it.
> thanks!
> mani
>
> On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed wrote:
>
> > I also registered to attend but I am not going to because here at work a
> > last minute meeting has been scheduled at the same time.
> >
> > Is it possible in the future to schedule such webinars starting 5-6 PM
> > ET?
> >
> > Thanks,
> > Mohamed
> >
> > -Original Message-
> > From: Grant Ingersoll [mailto:gsing...@apache.org]
> > Sent: Wednesday, August 12, 2009 6:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: [OT] Solr Webinar
> >
> > I believe it will be, but am not sure of the procedure for
> > distributing.  I think if you register, but don't show, you will get a
> > notification.
> >
> > -Grant
> >
> > On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:
> >
> > > Hello Grant,
> > > Will the webinar be recorded and available to download later
> > > someplace?
> > > Unfortunately, I can't watch this time.
> > >
> > > Thanks,
> > >
> > > []s,
> > >
> > > Lucas Frare Teixeira .*.
> > > - lucas...@gmail.com
> > > - blog.lucastex.com
> > > - twitter.com/lucastex
> > >
> > >
> > > On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll
> > > wrote:
> > >
> > >> I will be giving a free one hour webinar on getting started with
> > >> Apache
> > >> Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT
> > >>
> > >> You can sign up @
> > >> http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP
> > >>
> > >> I will present and demo:
> > >> * Getting started with LucidWorks for Solr
> > >> * Getting better, faster results using Solr's findability and
> > >> relevance
> > >> improvement tools
> > >> * Deploying Solr in production, including monitoring performance
> > >> and trends
> > >> with the LucidGaze for Solr performance profiler
> > >>
> > >> -Grant
> >
> >
> >
>


Re: defaultOperator="AND" and queries with "("

2009-08-13 Thread Shalin Shekhar Mangar
On Thu, Aug 13, 2009 at 5:31 AM, Subbacharya, Madhu <
madhu.subbacha...@corp.aol.com> wrote:

>
> Hello,
>
>   We have Solr running with the defaultOperator set to "AND".  Am not able
> to get any results for queries like   q=( Ferrari AND ( "599 GTB Fiorano" OR
> "612 Scaglietti" OR F430 )) , which contain "(" for grouping. Anyone have
> any ideas for a workaround ?
>
>
Can you try adding debugQuery=on to the request and post the details here?
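
Something along these lines (the host is a placeholder, and the q value
needs URL-encoding in practice):

    http://localhost:8983/solr/select?q=(Ferrari AND ("599 GTB Fiorano" OR "612 Scaglietti" OR F430))&debugQuery=on

The parsedquery section of the debug output shows how the grouped clauses
were combined under the AND default operator.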

-- 
Regards,
Shalin Shekhar Mangar.


Re: [OT] Solr Webinar

2009-08-13 Thread Mani Kumar
if anyone has any pointer to this webinar, please share it.
thanks!
mani

On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed wrote:

> I also registered to attend but I am not going to because here at work a
> last minute meeting has been scheduled at the same time.
>
> Is it possible in the future to schedule such webinars starting 5-6 PM
> ET?
>
> Thanks,
> Mohamed
>
> -Original Message-
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> Sent: Wednesday, August 12, 2009 6:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [OT] Solr Webinar
>
> I believe it will be, but am not sure of the procedure for
> distributing.  I think if you register, but don't show, you will get a
> notification.
>
> -Grant
>
> On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:
>
> > Hello Grant,
> > Will the webinar be recorded and available to download later
> > someplace?
> > Unfortunately, I can't watch this time.
> >
> > Thanks,
> >
> > []s,
> >
> > Lucas Frare Teixeira .*.
> > - lucas...@gmail.com
> > - blog.lucastex.com
> > - twitter.com/lucastex
> >
> >
> > On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll
> > wrote:
> >
> >> I will be giving a free one hour webinar on getting started with
> >> Apache
> >> Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT
> >>
> >> You can sign up @
> >> http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP
> >>
> >> I will present and demo:
> >> * Getting started with LucidWorks for Solr
> >> * Getting better, faster results using Solr's findability and
> >> relevance
> >> improvement tools
> >> * Deploying Solr in production, including monitoring performance
> >> and trends
> >> with the LucidGaze for Solr performance profiler
> >>
> >> -Grant
>
>
>


Re: Questions about XPath in data import handler

2009-08-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yes. Look at the 'flatten' attribute on the field. It should give you
all the text (not attributes) under a given node.
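
For illustration, a sketch of the two data-config.xml field entries being
discussed (column names are made up):

    <field column="qualifier" xpath="/a/b/subject/@qualifier"/>
    <field column="alltext"   xpath="/a/b" flatten="true"/>

The first pulls the attribute value; the second, with flatten="true",
returns all the text under /a/b.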



On Thu, Aug 13, 2009 at 8:02 PM, Andrew Clegg wrote:
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg
>> wrote:
>>
>>> Does the second one mean "select the value of the attribute called
>>> qualifier
>>> in the /a/b/subject element"?
>>
>> yes you are right. Isn't that the semantics of standard xpath syntax?
>>
>
> Yes, just checking since the DIH XPath engine is a little different.
>
> Do you know what I would get in this case?
>
>> > Also... Can I select a non-leaf node and get *ALL* the text underneath
>> it?
>> > e.g. /a/b in this example?
>
> Cheers,
>
> Andrew.
>
> --
> View this message in context: 
> http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954869.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com