Re: does solr support distributed index storage?

2009-10-12 Thread Shalin Shekhar Mangar
On Mon, Oct 12, 2009 at 10:27 AM, Pravin Karne <
pravin_ka...@persistent.co.in> wrote:

> How do I set up master/slave for Solr?
>
>
Index documents only on the master. Put the slaves behind a load balancer
and query only the slaves. Set up replication between the master and the slaves.
See http://wiki.apache.org/solr/SolrReplication
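
For example, a minimal SolrJ sketch of that split (the host names and the
document are made up for illustration, not from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MasterSlaveSketch {
  public static void main(String[] args) throws Exception {
    // hypothetical hosts: the master takes all writes, a load balancer
    // in front of the slaves takes all reads
    SolrServer master = new CommonsHttpSolrServer("http://master:8983/solr");
    SolrServer slaves = new CommonsHttpSolrServer("http://slave-lb:8983/solr");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    master.add(doc);   // index only on the master
    master.commit();

    // query only the slaves; replication ships the index to them
    System.out.println(slaves.query(new SolrQuery("*:*")).getResults().getNumFound());
  }
}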

-- 
Regards,
Shalin Shekhar Mangar.


Re: Facet query help

2009-10-12 Thread Shalin Shekhar Mangar
On Mon, Oct 12, 2009 at 6:07 AM, Tommy Chheng wrote:

> The dummy data set is composed of 6 docs.
>
> My query is set for 'tommy' with the facet query of Memory_s:1+GB
>
> http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&facet.query=Memory_s:1+GB&q=tommy&indent=on
>
> However, in the response (http://pastie.org/650932), I get two docs: one
> which has the correct field Memory_s:1 GB and the second document which has
> a Memory_s:3+GB. Why did the second document match if I set the facet.query
> to just 1+GB?
>
>
facet.query does not limit documents. It is used for finding the number of
documents matching the given query. To filter the result set you should use a
filter query, e.g. fq=Memory_s:"1 GB"
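
A small SolrJ sketch of the difference, reusing the field names from the
query above (the class name is mine):

import org.apache.solr.client.solrj.SolrQuery;

public class FilterVsFacetQuery {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("tommy");
    q.setFacet(true);
    q.addFacetField("CPU_s", "Memory_s");
    // facet.query: reports how many docs match, without restricting results
    q.addFacetQuery("Memory_s:\"1 GB\"");
    // fq: actually filters the result set down to the 1 GB machines
    q.addFilterQuery("Memory_s:\"1 GB\"");
    return q;
  }
}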

-- 
Regards,
Shalin Shekhar Mangar.


Re: Is negative boost possible?

2009-10-12 Thread Andrzej Bialecki

Yonik Seeley wrote:

On Sun, Oct 11, 2009 at 6:04 PM, Lance Norskog  wrote:

And the other important
thing to know about boost values is that the dynamic range is about
6-8 bits


That's an index-time boost - an 8 bit float with 5 bits of mantissa
and 3 bits of exponent.
Query time boosts are normal 32 bit floats.


To be more specific: index-time float encoding does not permit negative 
numbers (see SmallFloat), but query-time boosts can be negative, and 
they DO affect the score - see below. BTW, standard Collectors collect 
only results with positive scores, so if you want to collect results 
with negative scores as well then you need to use a custom Collector.
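
A sketch of such a collector against the Lucene 2.9 Collector API (the
class is hypothetical; it simply records every hit, whatever its score):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class AllScoresCollector extends Collector {
  private Scorer scorer;
  private int docBase;
  // each entry is {top-level docid, score}
  public final List<float[]> hits = new ArrayList<float[]>();

  public void setScorer(Scorer scorer) { this.scorer = scorer; }

  public void setNextReader(IndexReader reader, int docBase) {
    this.docBase = docBase;
  }

  public void collect(int doc) throws IOException {
    // no "score > 0" check, so negative-scoring matches survive
    hits.add(new float[] { docBase + doc, scorer.score() });
  }

  public boolean acceptsDocsOutOfOrder() { return true; }
}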


---
BeanShell 2.0b4 - by Pat Niemeyer (p...@pat.net)
bsh % import org.apache.lucene.search.*;
bsh % import org.apache.lucene.index.*;
bsh % import org.apache.lucene.store.*;
bsh % import org.apache.lucene.document.*;
bsh % import org.apache.lucene.analysis.*;
bsh % tq = new TermQuery(new Term("a", "b"));
bsh % print(tq);
a:b
bsh % tq.setBoost(-1);
bsh % print(tq);
a:b^-1.0
bsh % q = new BooleanQuery();
bsh % tq1 = new TermQuery(new Term("a", "c"));
bsh % tq1.setBoost(10);
bsh % q.add(tq1, BooleanClause.Occur.SHOULD);
bsh % q.add(tq, BooleanClause.Occur.SHOULD);
bsh % print(q);
a:c^10.0 a:b^-1.0
bsh % dir = new RAMDirectory();
bsh % w = new IndexWriter(dir, new WhitespaceAnalyzer());
bsh % doc = new Document();
bsh % doc.add(new Field("a", "b c d", Field.Store.YES, 
Field.Index.ANALYZED));

bsh % w.addDocument(doc);
bsh % w.close();
bsh % r = IndexReader.open(dir);
bsh % is = new IndexSearcher(r);
bsh % td = is.search(q, 10);
bsh % sd = td.scoreDocs;
bsh % print(sd.length);
1
bsh % print(is.explain(q, 0));
0.1373985 = (MATCH) sum of:
  0.15266499 = (MATCH) weight(a:c^10.0 in 0), product of:
0.99503726 = queryWeight(a:c^10.0), product of:
  10.0 = boost
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.32427183 = queryNorm
0.15342641 = (MATCH) fieldWeight(a:c in 0), product of:
  1.0 = tf(termFreq(a:c)=1)
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.5 = fieldNorm(field=a, doc=0)
  -0.0152664995 = (MATCH) weight(a:b^-1.0 in 0), product of:
-0.099503726 = queryWeight(a:b^-1.0), product of:
  -1.0 = boost
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.32427183 = queryNorm
0.15342641 = (MATCH) fieldWeight(a:b in 0), product of:
  1.0 = tf(termFreq(a:b)=1)
  0.30685282 = idf(docFreq=1, numDocs=1)
  0.5 = fieldNorm(field=a, doc=0)

bsh %


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: rollback and cumulative_add

2009-10-12 Thread Koji Sekiguchi
Koji Sekiguchi wrote:
> Hello,
>
> I found that rollback resets adds and docsPending count,
> but doesn't reset cumulative_adds.
>
> $ cd example/exampledocs
> # comment out the commit line in post.sh so that it doesn't commit
> $ ./post.sh *.xml
> => docsPending=19, adds=19, cumulative_adds=19
>
> # do rollback
> $ curl http://localhost:8983/solr/update?rollback=true
> => rollbacks=1, docsPending=0, adds=0, cumulative_adds=19
>
> Is this correct behavior?
>
> Koji
>
>   
(forwarded dev list)

I think this is a bug that I introduced when I contributed the first patch
for rollback, and it was inherited by the subsequent patches. I'll reopen
SOLR-670 and attach a fix soon:

https://issues.apache.org/jira/browse/SOLR-670

Koji
-- 

http://www.rondhuit.com/




Re: Is negative boost possible?

2009-10-12 Thread Yonik Seeley
On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki  wrote:
> BTW, standard Collectors collect only results
> with positive scores, so if you want to collect results with negative scores
> as well then you need to use a custom Collector.

Solr never discarded non-positive hits, and now Lucene 2.9 no longer
does either.

-Yonik


two facet.prefix on one facet field in a single query

2009-10-12 Thread Bill Au
Is it possible to have two different facet.prefix values on the same facet
field in a single query? I want to get facet counts for two prefixes, "xx" and
"yy". I tried using two facet.prefix parameters (i.e.
&facet.prefix=xx&facet.prefix=yy) but the second one seems to have no effect.

Bill


Re: Facet query help

2009-10-12 Thread Tommy Chheng
OK, so fq != facet.query. I thought it was an alias. I'm trying your
suggestion fq=Memory_s:"1 GB" and now it's returning zero documents, even
though there is one document that has "tommy" and "Memory_s:1 GB", as
seen in the original pastie (http://pastie.org/650932). I tried the fq
query body with quotes and without quotes.


http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&fq=%22Memory_s:1+GB%22&q=tommy&indent=on

Any thoughts?

thanks,
tommy

On 10/12/09 1:00 AM, Shalin Shekhar Mangar wrote:

On Mon, Oct 12, 2009 at 6:07 AM, Tommy Chhengwrote:

   

The dummy data set is composed of 6 docs.

My query is set for 'tommy' with the facet query of Memory_s:1+GB

http://lh:8983/solr/select/?facet=true&facet.field=CPU_s&facet.field=Memory_s&facet.field=Video+Card_s&wt=ruby&facet.query=Memory_s:1+GB&q=tommy&indent=on

However, in the response (http://pastie.org/650932), I get two docs: one
which has the correct field Memory_s:1 GB and the second document which has
a Memory_s:3+GB. Why did the second document match if I set the facet.query
to just 1+GB?


 

facet.query does not limit documents. It is used for finding the number of
documents matching the given query. To filter the result set you should use a
filter query, e.g. fq=Memory_s:"1 GB"

   


Re: format of sort parameter in Solr::Request::Standard

2009-10-12 Thread Paul Rosen
I did an experiment that worked. In Solr::Request::Standard, in the 
to_hash() method, I changed the commented line below to the two lines 
following it.


sort = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]

# START OF CHANGES
#hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
hash[:q] = @params[:query]
hash[:sort] = sort if sort != nil
# END OF CHANGES

hash["q.op"] = @params[:operator]
hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of the 
solr-ruby gem?


Paul Rosen wrote:

Hi all,

I'm using solr-ruby 0.0.7 and am having trouble getting Sort to work.

I have the following statement:

req = Solr::Request::Standard.new(:start => start, :rows => max,
 :sort => [ :title_sort => :ascending ],
 :query => query, :filter_queries => filter_queries,
 :field_list => @field_list,
 :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, 
:limit => -1},
 :highlighting => {:field_list => ['text'], :fragment_size => 600}, 
:shards => @cores)


That produces no results, but removing the :sort parameter does give
results.


Here is the output from solr:

INFO: [merged] webapp=/solr path=/select 
params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged} 
status=0 QTime=19


It looks to me like the string should have "&sort=title_sort+asc"
instead of ";title_sort+asc" tacked on to the query, but I'm not sure
about that.


Any clues what I'm doing wrong?

Thanks,
Paul




Re: format of sort parameter in Solr::Request::Standard

2009-10-12 Thread Erik Hatcher

Paul-

Trunk solr-ruby has this instead:

hash[:sort] = @params[:sort].collect do |sort|
  key = sort.keys[0]
  "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
end.join(',') if @params[:sort]

The ";sort..." stuff is now deprecated with Solr itself

I suppose the 0.0.8 gem needs to be pushed to RubyForge, eh?

Erik


On Oct 12, 2009, at 11:03 AM, Paul Rosen wrote:

I did an experiment that worked. In Solr::Request::Standard, in the  
to_hash() method, I changed the commented line below to the two  
lines following it.


   sort = @params[:sort].collect do |sort|
 key = sort.keys[0]
 "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
   end.join(',') if @params[:sort]

# START OF CHANGES
   #hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
   hash[:q] = @params[:query]
   hash[:sort] = sort if sort != nil
# END OF CHANGES

   hash["q.op"] = @params[:operator]
   hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of  
the solr-ruby gem?


Paul Rosen wrote:

Hi all,
I'm using solr-ruby 0.0.7 and am having trouble getting Sort to work.
I have the following statement:
req = Solr::Request::Standard.new(:start => start, :rows => max,
:sort => [ :title_sort => :ascending ],
:query => query, :filter_queries => filter_queries,
:field_list => @field_list,
:facets => {:fields => @facet_fields, :mincount => 1, :missing =>  
true, :limit => -1},
:highlighting => {:field_list => ['text'], :fragment_size =>  
600}, :shards => @cores)
That produces no results, but removing the :sort parameter does
give results.

Here is the output from solr:
INFO: [merged] webapp=/solr path=/select
params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged} status=0 QTime=19
It looks to me like the string should have "&sort=title_sort+asc"
instead of ";title_sort+asc" tacked on to the query, but I'm not
sure about that.

Any clues what I'm doing wrong?
Thanks,
Paul






Solr over DRBD

2009-10-12 Thread Pieter Steyn
Hi there,

I have a 2-node cluster running Apache and Solr over a shared
partition on top of DRBD. Think of it like a SAN.

I'm curious as to how I should do load balancing / sharing with Solr in
this setup. I'm already using DNS round robin for Apache.

My Solr installation is on /cluster/Solr.  I've been starting an
instance of Solr on each server out of the same installation / working
directory.
Is this safe?  I haven't noticed any problems so far.

Does this mean they'll share the same index?  Is there a better way to
do this?  Should I perhaps only do commits on one of the servers (and
setup heartbeat to determine which server to run the commit on)?

I'm running Solr 1.3, but I'm not against upgrading if that provides
me with a better way of load balancing.

Kind regards,
Pieter


capitalization and delimiters

2009-10-12 Thread Audrey Foo


In my search docs, I have content such as 'powershot' and 'powerShot'.
I would expect 'powerShot' to be searched as 'power', 'shot' and
'powershot', so that results for all of these are returned. Instead, only
results for 'power' and 'shot' are returned.
Any suggestions?
In the schema, index analyzer: (stripped by the list archive)
In the schema, query analyzer: (stripped by the list archive)
Thanks,
Audrey
_
New! Open Messenger faster on the MSN homepage
http://go.microsoft.com/?linkid=9677405

Re: Default query parameter for one core

2009-10-12 Thread Michael
Thanks for your input, Shalin.

On Sun, Oct 11, 2009 at 12:30 AM, Shalin Shekhar Mangar
 wrote:
>> - I can't use a variable like ${shardsParam} in a single shared
>> solrconfig.xml, because the line
>>    ${shardsParam}
>>  has to be in there, and that forces a (possibly empty) &shards
>> parameter onto cores that *don't* need one, causing a
>> NullPointerException.
>>
>>
> Well, we can fix the NPE :)  Please raise an issue.

The NPE may be the "correct" behavior -- I'm causing an empty &shards=
parameter, which doesn't have a defined behavior AFAIK.  The
deficiency I was pointing out was that using ${shardsParam} doesn't
help me achieve my real goal, which is to have the entire  tag
disappear for some shards.

>> So I think my best bet is to make two mostly-identical
>> solrconfig.xmls, and point core0 to the one specifying a &shards=
>> parameter:
>>    
>>
>> I don't like the duplication of config, but at least it accomplishes my
>> goal!
>>
>>
> There is another way too. Each plugin in Solr now supports a configuration
> attribute named "enable" which can be true or false. You can control the
> value (true/false) through a variable. So you can duplicate just the handler
> instead of the complete solrconfig.xml

I had looked into this, but thought it doesn't help because I'm not
disabling an entire plugin -- just a  tag specifying a default
parameter to a .  Individual  tags don't have an
"enable" flag for me to conditionally set to false.  Maybe I'm
misunderstanding what you're suggesting?

Thanks again,
Michael


Re: Is negative boost possible?

2009-10-12 Thread Andrzej Bialecki

Yonik Seeley wrote:

On Mon, Oct 12, 2009 at 5:58 AM, Andrzej Bialecki  wrote:

BTW, standard Collectors collect only results
with positive scores, so if you want to collect results with negative scores
as well then you need to use a custom Collector.


Solr never discarded non-positive hits, and now Lucene 2.9 no longer
does either.


Hmm ... The code that I pasted in my previous email uses 
Searcher.search(Query, int), which in turn uses search(Query, Filter, 
int), and it doesn't return any results if only the first clause is 
present (the one with negative boost) even though it's a matching clause.


I think this is related to the fact that in TopScoreDocCollector:48 the
pqTop.score is initialized to 0, and then all results that have a lower
score than this are discarded. Perhaps this should be initialized to
Float.MIN_VALUE?



--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Scoring for specific field queries

2009-10-12 Thread R. Tan
Avlesh,

I got it, finally, by doing an OR between the two fields, one with an
exact-match keyword and the other grouped.

q=suggestion:"formula xxx" OR tokenized_suggestion:(formula )

Thanks for all your help!

Rih


On Fri, Oct 9, 2009 at 4:26 PM, R. Tan  wrote:

> I ended up with the same set of results earlier, but I don't get results such
> as "the champion", I think because of the EdgeNGram filter.
>
> With NGram, I'm back to the same problem:
>
> Result for q=ca
>
> 
> 0.8717008
> Blu Jazz Cafe
> 
>
> 
> 0.8717008
> Café in the Pond
> 
>
>


Letters with accent in query

2009-10-12 Thread R. Tan
Hi,
I'm querying with an accented keyword such as "café" but the debug info
shows that it is only searching for "caf". I'm using the ISOLatin1Accent
filter as well.

Query:
http://localhost:8983/solr/select?q=%E9&debugQuery=true

Params return shows this:


true


What am I missing here?

Rih


Re: Default query parameter for one core

2009-10-12 Thread Michael
OK, a hacky but working solution for making one core shard to all the
others: have the default parameter *name* vary, so that one core gets
"&shards=foo" and all other cores get "&dummy=foo".

# solr.xml




  


  
  
   ...



# solrconfig.xml

  
${shardsValue}
...

Michael

On Mon, Oct 12, 2009 at 12:00 PM, Michael  wrote:
> Thanks for your input, Shalin.
>
> On Sun, Oct 11, 2009 at 12:30 AM, Shalin Shekhar Mangar
>  wrote:
>>> - I can't use a variable like ${shardsParam} in a single shared
>>> solrconfig.xml, because the line
>>>    ${shardsParam}
>>>  has to be in there, and that forces a (possibly empty) &shards
>>> parameter onto cores that *don't* need one, causing a
>>> NullPointerException.
>>>
>>>
>> Well, we can fix the NPE :)  Please raise an issue.
>
> The NPE may be the "correct" behavior -- I'm causing an empty &shards=
> parameter, which doesn't have a defined behavior AFAIK.  The
> deficiency I was pointing out was that using ${shardsParam} doesn't
> help me achieve my real goal, which is to have the entire  tag
> disappear for some shards.
>
>>> So I think my best bet is to make two mostly-identical
>>> solrconfig.xmls, and point core0 to the one specifying a &shards=
>>> parameter:
>>>    
>>>
>>> I don't like the duplication of config, but at least it accomplishes my
>>> goal!
>>>
>>>
>> There is another way too. Each plugin in Solr now supports a configuration
>> attribute named "enable" which can be true or false. You can control the
>> value (true/false) through a variable. So you can duplicate just the handler
>> instead of the complete solrconfig.xml
>
> I had looked into this, but thought it doesn't help because I'm not
> disabling an entire plugin -- just a  tag specifying a default
> parameter to a .  Individual  tags don't have an
> "enable" flag for me to conditionally set to false.  Maybe I'm
> misunderstanding what you're suggesting?
>
> Thanks again,
> Michael
>


Re: Letters with accent in query

2009-10-12 Thread Michael
What tokenizer and filters are you using in what order?  See schema.xml.

Also, you may wish to use ASCIIFoldingFilter, which covers more cases
than ISOLatin1AccentFilter.
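
In the schema this is wired in via its filter factory in the analyzer
chain; just to see the folding by itself, here is a small check against
the Lucene 2.9 analysis API (a sketch):

import java.io.StringReader;
import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class FoldingCheck {
  public static void main(String[] args) throws Exception {
    // fold "café" down to ASCII and print the resulting token
    TokenStream ts = new ASCIIFoldingFilter(
        new WhitespaceTokenizer(new StringReader("café")));
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(term.term()); // prints "cafe"
    }
  }
}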

Michael

On Mon, Oct 12, 2009 at 12:42 PM, R. Tan  wrote:
> Hi,
> I'm querying with an accented keyword such as "café" but the debug info
> shows that it is only searching for "caf". I'm using the ISOLatin1Accent
> filter as well.
>
> Query:
> http://localhost:8983/solr/select?q=%E9&debugQuery=true
>
> Params return shows this:
> 
> 
> true
> 
>
> What am I missing here?
>
> Rih
>


Search results order

2009-10-12 Thread bhaskar chandrasekar
Hi,
 
I have indexed my xml which contains the following data.
 


  http://www.yahoo.com 
  yahoomail
  yahoo has various links and gives in detail about 
the all the links in it


  http://www.rediff.com
  It is a good website
  Rediff has a interesting homepage


  http://www.ndtv.com
  Ndtv has a variety of good links
  The homepage of Ndtv is very good


 
 
In my Solr home page, when I search for “good”,
 
it displays the docs with the highest occurrences of “good” first by default.
 
The output comes as follows.

  http://www.ndtv.com
  Ndtv has a variety of good links
  The homepage of Ndtv is very good


  http://www.rediff.com
  It is a good website
  Rediff has a interesting homepage

 
I need to display the doc with the fewest occurrences of “good” as the
first result.
 
What changes should I make in the solrconfig file to achieve this?
Any suggestions would be helpful.
 
 
For me the output should come as below.
 

  http://www.rediff.com
  It is a good website
  Rediff has a interesting homepage


  http://www.ndtv.com
  Ndtv has a variety of good links
  The homepage of Ndtv is very good

 
Regards
Bhaskar


  

Re: does solr support distributed index storage?

2009-10-12 Thread Chaitali Gupta
Hi, 

How should we set up master and slaves in Solr? Which configuration files and
parameters do we need to change, and how?

Thanks, 
Chaitali 

--- On Mon, 10/12/09, Shalin Shekhar Mangar  wrote:

From: Shalin Shekhar Mangar 
Subject: Re: does solr support distributed index storage?
To: solr-user@lucene.apache.org
Date: Monday, October 12, 2009, 3:17 AM

On Mon, Oct 12, 2009 at 10:27 AM, Pravin Karne <
pravin_ka...@persistent.co.in> wrote:

> How do I set up master/slave for Solr?
>
>
Index documents only on the master. Put the slaves behind a load balancer
and query only the slaves. Set up replication between the master and the slaves.
See http://wiki.apache.org/solr/SolrReplication

-- 
Regards,
Shalin Shekhar Mangar.



  

Conditional copyField

2009-10-12 Thread David Stuart

Hi,
I am pushing data to Solr from two different sources: Nutch and a CMS.
I have a data clash, in that for Nutch a copyField is required to push
the url field to the id field, as it is used as the primary lookup in
the Nutch-Solr integration update. The other CMS also uses the url
field but populates the id field with a different value. Now I
can't really change either source definition, so is there a way in
solrconfig or the schema to check whether id is empty and only copy if so,
or is there a better way via the update processor?


Thanks for your help in advance
Regards

David


Re: format of sort parameter in Solr::Request::Standard

2009-10-12 Thread Erik Hatcher
I've just pushed a new 0.0.8 gem to Rubyforge that includes the fix I  
described for the sort parameter.


Erik


On Oct 12, 2009, at 11:03 AM, Paul Rosen wrote:

I did an experiment that worked. In Solr::Request::Standard, in the  
to_hash() method, I changed the commented line below to the two  
lines following it.


   sort = @params[:sort].collect do |sort|
 key = sort.keys[0]
 "#{key.to_s} #{sort[key] == :descending ? 'desc' : 'asc'}"
   end.join(',') if @params[:sort]

# START OF CHANGES
   #hash[:q] = sort ? "#{@params[:query]};#{sort}" : @params[:query]
   hash[:q] = @params[:query]
   hash[:sort] = sort if sort != nil
# END OF CHANGES

   hash["q.op"] = @params[:operator]
   hash[:df] = @params[:default_field]

Does this make sense? Should this be changed in the next version of  
the solr-ruby gem?


Paul Rosen wrote:

Hi all,
I'm using solr-ruby 0.0.7 and am having trouble getting Sort to work.
I have the following statement:
req = Solr::Request::Standard.new(:start => start, :rows => max,
:sort => [ :title_sort => :ascending ],
:query => query, :filter_queries => filter_queries,
:field_list => @field_list,
:facets => {:fields => @facet_fields, :mincount => 1, :missing =>  
true, :limit => -1},
:highlighting => {:field_list => ['text'], :fragment_size =>  
600}, :shards => @cores)
That produces no results, but removing the :sort parameter does
give results.

Here is the output from solr:
INFO: [merged] webapp=/solr path=/select
params={wt=ruby&facet.limit=-1&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti);title_sort+asc&fl=archive,date_label,genre,role_ART,role_AUT,role_EDT,role_PBL,role_TRL,source,image,thumbnail,text_url,title,alternative,uri,url,exhibit_type,license,title_sort,author_sort&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/merged} status=0 QTime=19
It looks to me like the string should have "&sort=title_sort+asc"
instead of ";title_sort+asc" tacked on to the query, but I'm not
sure about that.

Any clues what I'm doing wrong?
Thanks,
Paul






Re: does solr support distributed index storage?

2009-10-12 Thread Dan Trainor

On 10/12/2009 10:49 AM, Chaitali Gupta wrote:

Hi,

How should we set up master and slaves in Solr? Which configuration files and
parameters do we need to change, and how?

Thanks,
Chaitali


Hi -

I think Shalin was pretty clear on that; it is documented very well at
http://wiki.apache.org/solr/SolrReplication .


I am responding, however, to explain something that took me a bit of 
time to wrap my brain around in the hopes that it helps you and perhaps 
some others.


Solr in itself does not replicate. Instead, Solr relies on an
underlying rsync setup to keep these indices synced throughout the
collective. When you break it down, it's simply rsync with a
configuration file making all the nodes "aware" that they participate in
this configuration. Wrap a cron around this between all the nodes, and
they simply replicate raw data from one "master" to one or more slaves.


I would suggest reading up on how snapshots are performed and how the
log files are created and what they do. Of course it would benefit you to
know the ins and outs of all the elements that help Solr replicate, but
it's been my experience that most of it has to do with those particular
items.


Thanks
-dant



Re: does solr support distributed index storage?

2009-10-12 Thread Pieter Steyn
Sorry for the hijack, but is replication necessary when using a cluster
file system such as GFS2, where the files are the same for any
instance of Solr?


On Mon, Oct 12, 2009 at 8:36 PM, Dan Trainor  wrote:
> On 10/12/2009 10:49 AM, Chaitali Gupta wrote:
>>
>> Hi,
>>
>> How should we set up master and slaves in Solr? Which configuration files
>> and parameters do we need to change, and how?
>>
>> Thanks,
>> Chaitali
>
> Hi -
>
> I think Shalin was pretty clear on that, it is documented very well at
> http://wiki.apache.org/solr/SolrReplication .
>
> I am responding, however, to explain something that took me a bit of time to
> wrap my brain around in the hopes that it helps you and perhaps some others.
>
> Solr in itself does not replicate.  Instead, Solr relies on an underlying
> rsync setup to keep these indices synced throughout the collective.  When
> you break it down, it's simply rsync with a configuration file making all the
> nodes "aware" that they participate in this configuration.  Wrap a cron
> around this between all the nodes, and they simply replicate raw data from
> one "master" to one or more slaves.
>
> I would suggest reading up on how snapshots are performed and how the log
> files are created and what they do.  Of course it would benefit you to know the
> ins and outs of all the elements that help Solr replicate, but it's been my
> experience that most of it has to do with those particular items.
>
> Thanks
> -dant
>
>


Re: Search results order

2009-10-12 Thread Nicholas Clark
You can reverse the sort order. In this case, you want score ascending:

sort=score+asc

If you just want documents without that keyword, then try using the minus
sign:

q=-good

http://wiki.apache.org/solr/CommonQueryParameters
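
With SolrJ the equivalent looks roughly like this (a sketch; the class
name is mine):

import org.apache.solr.client.solrj.SolrQuery;

public class LeastRelevantFirst {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("good");
    // sort=score+asc: least relevant documents come back first
    q.addSortField("score", SolrQuery.ORDER.asc);
    return q;
  }
}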

-Nick


On Mon, Oct 12, 2009 at 1:19 PM, bhaskar chandrasekar
wrote:

> Hi,
>
> I have indexed my xml which contains the following data.
>
> 
> 
>   http://www.yahoo.com 
>   yahoomail
>   yahoo has various links and gives in detail
> about the all the links in it
> 
> 
>   http://www.rediff.com
>   It is a good website
>   Rediff has a interesting homepage
> 
> 
>   http://www.ndtv.com
>   Ndtv has a variety of good links
>   The homepage of Ndtv is very good
> 
> 
>
>
> In my Solr home page, when I search for “good”,
>
> it displays the docs with the highest occurrences of “good” first by default.
>
> The output comes as follows.
> 
>   http://www.ndtv.com
>   Ndtv has a variety of good links
>   The homepage of Ndtv is very good
> 
> 
>   http://www.rediff.com
>   It is a good website
>   Rediff has a interesting homepage
> 
>
> I need to display the doc with the fewest occurrences of “good” as the
> first result.
>
> What changes should I make in the solrconfig file to achieve this?
> Any suggestions would be helpful.
>
>
> For me the output should come as below.
>
> 
>   http://www.rediff.com
>   It is a good website
>   Rediff has a interesting homepage
> 
> 
>   http://www.ndtv.com
>   Ndtv has a variety of good links
>   The homepage of Ndtv is very good
> 
>
> Regards
> Bhaskar
>
>
>


Re: Boosting of words

2009-10-12 Thread Nicholas Clark
The easiest way to boost your query is to modify your query string.

q=product:red color:red^10

In the above example, I have boosted the color field. If "red" is found in
that field, it will get a boost of 10. If it is only found in the product
field, then there will be no boost.

Here's more information:

http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms

Once you're comfortable with that, I suggest that you look into using the
DisMax request handler. It will allow you to easily search across multiple
fields with custom boost values.

http://wiki.apache.org/solr/DisMaxRequestHandler
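
For example, with SolrJ against a dismax handler (a sketch; the handler
registration and the field weights are assumptions):

import org.apache.solr.client.solrj.SolrQuery;

public class BoostedFieldsQuery {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("red");
    q.setQueryType("dismax");        // assumes a dismax handler is configured
    q.set("qf", "product color^10"); // a match in "color" counts 10x more
    return q;
  }
}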

-Nick


On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar  wrote:

> Hi,
>
> I would like to know how can i give boosting to search input in Solr.
> Where exactly should i make the changes?.
>
> Regards
> Bhaskar
>
>
>


Re: Is negative boost possible?

2009-10-12 Thread Yonik Seeley
On Mon, Oct 12, 2009 at 12:03 PM, Andrzej Bialecki  wrote:
>> Solr never discarded non-positive hits, and now Lucene 2.9 no longer
>> does either.
>
> Hmm ... The code that I pasted in my previous email uses
> Searcher.search(Query, int), which in turn uses search(Query, Filter, int),
> and it doesn't return any results if only the first clause is present (the
> one with negative boost) even though it's a matching clause.
>
> I think this is related to the fact that in TopScoreDocCollector:48 the
> pqTop.score is initialized to 0, and then all results that have a lower score
> than this are discarded. Perhaps this should be initialized to
> Float.MIN_VALUE?

Hmmm, You're actually seeing this with Lucene 2.9?
The HitQueue (subclass of PriorityQueue) is pre-populated with
sentinel objects with scores of -Inf, not zero.

-Yonik
http://www.lucidimagination.com


Re: Conditional copyField

2009-10-12 Thread AHMET ARSLAN
> Hi,
> I am pushing data to solr from two different sources nutch
> and a cms. I have a data clash in that in nutch a copyField
> is required to push the url field to the id field as it is
> used as  the primary lookup in the nutch solr
> integration update. The other cms also uses the url field
> but also populates the id field with a different value. Now
> I can't really change either source definition so is there a
> way in solrconfig or schema to check if id is empty and only
> copy if true or is there a better way via the
> updateprocessor?

copyField declaration has three attributes: source, dest and maxChars.
Therefore it can be concluded that there is no way to do it in schema.xml

Luckily, Wiki [1] has a quick example that implements a conditional copyField.

[1] http://wiki.apache.org/solr/UpdateRequestProcessor
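
Along the lines of that wiki example, a sketch of such a processor (the
class name is mine; it copies url into id only when the document arrives
without an id):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConditionalCopyFactory extends UpdateRequestProcessorFactory {
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // only the nutch documents come in without an id
        if (doc.getFieldValue("id") == null) {
          doc.setField("id", doc.getFieldValue("url"));
        }
        super.processAdd(cmd);
      }
    };
  }
}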





Re: Solr 1.4 Release Party

2009-10-12 Thread Michael Masters
Where does the quote come from :)

On Sat, Oct 10, 2009 at 6:38 AM, Israel Ekpo  wrote:
> I can't wait...
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
>


doing searches from within an UpdateRequestProcessor

2009-10-12 Thread Bill Au
Is it possible to do searches from within an UpdateRequestProcessor?  The
documents in my index reference each other.  When a document is deleted, I
would like to update all documents containing a reference to the deleted
document.  My initial idea is to use a custom UpdateRequestProcessor.  Is
there a better way to do this?
Bill


Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
Hi,
I'm attempting to optimize a pretty large index, and even though the optimize 
request timed out, I watched it using a profiler and saw that the optimize 
thread continued executing. Eventually it completed, but in the background I 
still see a thread performing a merge:

Lucene Merge Thread #0 [RUNNABLE, IN_NATIVE] CPU time: 17:51
java.io.RandomAccessFile.readBytes(byte[], int, int)
java.io.RandomAccessFile.read(byte[], int, int)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
 int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentMergeInfo.next()
org.apache.lucene.index.SegmentMerger.mergeTermInfos(FormatPostingsFieldsConsumer)
org.apache.lucene.index.SegmentMerger.mergeTerms()
org.apache.lucene.index.SegmentMerger.merge(boolean)
org.apache.lucene.index.IndexWriter.mergeMiddle(MergePolicy$OneMerge)
org.apache.lucene.index.IndexWriter.merge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run()


This has taken quite a while, and hasn't really been fully utilizing the 
machine's resources. After looking at the Lucene source, I noticed that you can 
set a MaxThreadCount parameter in this class. Is this parameter exposed by Solr 
somehow? I see the class mentioned, commented out, in my solrconfig.xml, but 
I'm not sure of the correct way to specify the parameter:






Also, if I can specify this parameter, is it safe to just start/stop my servlet 
server (Tomcat) mid-merge?

Thanks in advance,
Gio.


Re: Lucene Merge Threads

2009-10-12 Thread Jason Rutherglen
Try this in solrconfig.xml:


  1


Yes you can stop the process mid-merge.  The partially merged files
will be deleted on restart.

We need to update the wiki?

On Mon, Oct 12, 2009 at 4:05 PM, Giovanni Fernandez-Kincade
 wrote:
> Hi,
> I'm attempting to optimize a pretty large index, and even though the optimize 
> request timed out, I watched it using a profiler and saw that the optimize 
> thread continued executing. Eventually it completed, but in the background I 
> still see a thread performing a merge:
>
> Lucene Merge Thread #0 [RUNNABLE, IN_NATIVE] CPU time: 17:51
> java.io.RandomAccessFile.readBytes(byte[], int, int)
> java.io.RandomAccessFile.read(byte[], int, int)
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
>  int, int)
> org.apache.lucene.store.BufferedIndexInput.refill()
> org.apache.lucene.store.BufferedIndexInput.readByte()
> org.apache.lucene.store.IndexInput.readVInt()
> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
> org.apache.lucene.index.SegmentTermEnum.next()
> org.apache.lucene.index.SegmentMergeInfo.next()
> org.apache.lucene.index.SegmentMerger.mergeTermInfos(FormatPostingsFieldsConsumer)
> org.apache.lucene.index.SegmentMerger.mergeTerms()
> org.apache.lucene.index.SegmentMerger.merge(boolean)
> org.apache.lucene.index.IndexWriter.mergeMiddle(MergePolicy$OneMerge)
> org.apache.lucene.index.IndexWriter.merge(MergePolicy$OneMerge)
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge)
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run()
>
>
> This has taken quite a while, and hasn't really been fully utilizing the 
> machine's resources. After looking at the Lucene source, I noticed that you 
> can set a MaxThreadCount parameter in this class. Is this parameter exposed 
> by Solr somehow? I see the class mentioned, commented out, in my 
> solrconfig.xml, but I'm not sure of the correct way to specify the parameter:
>
> 
>    
> 
>
>
> Also, if I can specify this parameter, is it safe to just start/stop my 
> servlet server (Tomcat) mid-merge?
>
> Thanks in advance,
> Gio.
>


RE: Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
Do you have to make a new call to optimize to make it start the merge again?

-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: Monday, October 12, 2009 7:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene Merge Threads

Try this in solrconfig.xml:


  1


Yes you can stop the process mid-merge.  The partially merged files
will be deleted on restart.

We need to update the wiki?

On Mon, Oct 12, 2009 at 4:05 PM, Giovanni Fernandez-Kincade
 wrote:
> Hi,
> I'm attempting to optimize a pretty large index, and even though the optimize 
> request timed out, I watched it using a profiler and saw that the optimize 
> thread continued executing. Eventually it completed, but in the background I 
> still see a thread performing a merge:
>
> Lucene Merge Thread #0 [RUNNABLE, IN_NATIVE] CPU time: 17:51
> java.io.RandomAccessFile.readBytes(byte[], int, int)
> java.io.RandomAccessFile.read(byte[], int, int)
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
>  int, int)
> org.apache.lucene.store.BufferedIndexInput.refill()
> org.apache.lucene.store.BufferedIndexInput.readByte()
> org.apache.lucene.store.IndexInput.readVInt()
> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
> org.apache.lucene.index.SegmentTermEnum.next()
> org.apache.lucene.index.SegmentMergeInfo.next()
> org.apache.lucene.index.SegmentMerger.mergeTermInfos(FormatPostingsFieldsConsumer)
> org.apache.lucene.index.SegmentMerger.mergeTerms()
> org.apache.lucene.index.SegmentMerger.merge(boolean)
> org.apache.lucene.index.IndexWriter.mergeMiddle(MergePolicy$OneMerge)
> org.apache.lucene.index.IndexWriter.merge(MergePolicy$OneMerge)
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge)
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run()
>
>
> This has taken quite a while, and hasn't really been fully utilizing the 
> machine's resources. After looking at the Lucene source, I noticed that you 
> can set a MaxThreadCount parameter in this class. Is this parameter exposed 
> by Solr somehow? I see the class mentioned, commented out, in my 
> solrconfig.xml, but I'm not sure of the correct way to specify the parameter:
>
> 
>    
> 
>
>
> Also, if I can specify this parameter, is it safe to just start/stop my 
> servlet server (Tomcat) mid-merge?
>
> Thanks in advance,
> Gio.
>


Re: two facet.prefix on one facet field in a single query

2009-10-12 Thread Bill Au
It looks like there is a JIRA covering this:

https://issues.apache.org/jira/browse/SOLR-1387

On Mon, Oct 12, 2009 at 11:00 AM, Bill Au  wrote:

> Is it possible to have two different facet.prefix values on the same facet field
> in a single query? I want to get facet counts for two prefixes, "xx" and
> "yy". I tried using two facet.prefix parameters (i.e. &facet.prefix=xx&facet.prefix=yy)
> but the second one seems to have no effect.
>
> Bill
>


XSLT Response for multivalue fields

2009-10-12 Thread blholmes

I am having trouble generating the XSL file for multivalue entries. I'm not
sure if I'm missing something, or if this is how it is supposed to function. I
have two authors and I'd like to have separate ByLine nodes in my
translation.
Here is what Solr returns normally:
...

Crista  Souza
Darrell  Dunn


Here is my XSL (the stylesheet itself was stripped by the list archive):


And here is what it is returning:
Crista  SouzaDarrell  Dunn

I was expecting it to return 
Crista  Souza
Darrell  Dunn

I've tried other variations and using templates instead but it keeps
displaying the same thing, one ByLine field with things mushed together.

Any clues whether this is an issue with my XSLT code, the XSLT response
writer, Xalan, or Solr? I have no idea where to go from here. Any pointers
in the right direction are appreciated.



Re: Solr 1.4 Release Party

2009-10-12 Thread Israel Ekpo
It is my email signature.

It is a sort of hybrid/mashup from different sources.

On Mon, Oct 12, 2009 at 6:49 PM, Michael Masters  wrote:

> Where does the quote come from :)
>
> On Sat, Oct 10, 2009 at 6:38 AM, Israel Ekpo  wrote:
> > I can't wait...
> >
> > --
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> >
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Boosting of words

2009-10-12 Thread bhaskar chandrasekar
 
Hi Nicholas,
 
Thanks for your input. Where exactly should the query
 
q=product:red color:red^10

be used and defined?
Help me out.
 
Regards
Bhaskar

--- On Mon, 10/12/09, Nicholas Clark  wrote:


From: Nicholas Clark 
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Monday, October 12, 2009, 2:13 PM


The easiest way to boost your query is to modify your query string.

q=product:red color:red^10

In the above example, I have boosted the color field. If "red" is found in
that field, it will get a boost of 10. If it is only found in the product
field, then there will be no boost.

Here's more information:

http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms

Once you're comfortable with that, I suggest that you look into using the
DisMax request handler. It will allow you to easily search across multiple
fields with custom boost values.

http://wiki.apache.org/solr/DisMaxRequestHandler

-Nick


On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar  wrote:

> Hi,
>
> I would like to know how can i give boosting to search input in Solr.
> Where exactly should i make the changes?.
>
> Regards
> Bhaskar
>
>
>



  

RE: Lucene Merge Threads

2009-10-12 Thread Giovanni Fernandez-Kincade
This didn't end up working. I got the following error when I tried to commit:

Oct 12, 2009 8:36:42 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class '
5
'
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:81)
at 
org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:178)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: 
5

at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
at java.security.AccessController.doPrivileged(Unknown Source)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.$$YJP$$forName0(Native Method)
at java.lang.Class.forName0(Unknown Source)
at java.lang.Class.forName(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
... 28 more


I believe it's because maxThreadCount is not a public property of the
ConcurrentMergeScheduler class. You have to call this method to set it:

public void setMaxThreadCount(int count) {
  if (count < 1)
    throw new IllegalArgumentException("count should be at least 1");
  maxThreadCount = count;
}

Is that possible through the solrconfig?

Thanks,
Gio.

-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] 
Sent: Monday, October 12, 2009 7:53 PM
To: solr-user@lucene.apache.org
Subject: RE: Lucene Merge Threads

Do you have to make a new call to optimize to make it start the merge again?

-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: Monday, October 12, 2009 7:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Lucene Merge Threads

Try this in solrconfig.xml:


  1


Yes you can stop the process mid-merge.  The partially merged files
will be deleted on restart.

We need to update the wiki?

On Mon, Oct 12, 2009 at 4:05 PM, Giovanni Fernandez-Kincade
 wrote:
> Hi,
> I'm attempting to optimize a pretty large index, and even though the optimize 
> request timed out, I watched it using a pro

SpellCheck Index not building

2009-10-12 Thread Varun Gupta
Hi,

I am using Solr 1.3 for spell checking. I am facing a strange problem of
the spell check index not being generated. When I have fewer documents
(less than 1,000) indexed, the spell check index builds, but when there are
more documents (around 40K), the index for spell checking does not build. I
can see the spell check directory was created, and there are two files
under it: segments_3 & segments.gen

I am using the following query to build the spell checking index:
/select
params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2

In the logs I see:
INFO: [] webapp=/solr path=/select
params={spellcheck=true&start=0&qt=contentsearch&wt=xml&rows=0&spellcheck.build=true&version=2.2}
hits=37467 status=0 QTime=44

Please help me solve this problem.

Here is my configuration:
*schema.xml:*

  




  

   
   
   

*solrconfig.xml:*
  

 dismax

  false
  false
  5
  true
  jarowinkler


spellcheck

  

  
textSpell

  a_spell
  a_spell
  ./spellchecker_a_spell
  0.7


  jarowinkler
  a_spell
  
  org.apache.lucene.search.spell.JaroWinklerDistance
  ./spellchecker_a_spell
  0.7

  

--
Thanks
Varun Gupta


Re: SpellCheck Index not building

2009-10-12 Thread Shalin Shekhar Mangar
On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta  wrote:

> Hi,
>
> I am using Solr 1.3 for spell checking. I am facing a strange problem of
> the spell check index not being generated. When I have fewer documents
> (less than 1,000) indexed, the spell check index builds, but when there are
> more documents (around 40K), the index for spell checking
> does not build. I can see the spell check directory was created, and there
> are two files under it: segments_3 & segments.gen
>
>
It seems that you might be running out of memory with a larger index. Can
you check the logs to see if it has any exceptions recorded?

-- 
Regards,
Shalin Shekhar Mangar.


Re: SpellCheck Index not building

2009-10-12 Thread Varun Gupta
No, there are no exceptions in the logs.

--
Thanks
Varun Gupta

On Tue, Oct 13, 2009 at 8:46 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta 
> wrote:
>
> > Hi,
> >
> > I am using Solr 1.3 for spell checking. I am facing a strange problem of
> > the spell check index not being generated. When I have fewer
> > documents (less than 1,000) indexed, the spell check index builds, but
> > when there are more documents (around 40K), the index for spell checking
> > does not build. I can see the spell check directory was created, and there
> > are two files under it: segments_3 & segments.gen
> >
> >
> It seems that you might be running out of memory with a larger index. Can
> you check the logs to see if it has any exceptions recorded?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: doing searches from within an UpdateRequestProcessor

2009-10-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
A custom UpdateRequestProcessor is the solution. You can access the
searcher in a UpdateRequestProcessor.
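
A sketch of what that can look like (the names are mine; the factory's
getInstance() hands you the SolrQueryRequest, which you can keep):

import java.io.IOException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.DeleteUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class ReferenceCleanupProcessor extends UpdateRequestProcessor {
  private final SolrQueryRequest req;

  public ReferenceCleanupProcessor(SolrQueryRequest req,
      UpdateRequestProcessor next) {
    super(next);
    this.req = req;
  }

  public void processDelete(DeleteUpdateCommand cmd) throws IOException {
    SolrIndexSearcher searcher = req.getSearcher();
    // hypothetical: use searcher.search(...) / searcher.getDocList(...)
    // to find documents whose reference field points at cmd.id, then
    // queue updates for them before letting the delete proceed
    super.processDelete(cmd);
  }
}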

On Tue, Oct 13, 2009 at 4:20 AM, Bill Au  wrote:
> Is it possible to do searches from within an UpdateRequestProcessor?  The
> documents in my index reference each other.  When a document is deleted, I
> would like to update all documents containing a reference to the deleted
> document.  My initial idea is to use a custom UpdateRequestProcessor.  Is
> there a better way to do this?
> Bill
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: search by some functionality

2009-10-12 Thread Chris Hostetter

: Maybe I'm missing something, but function queries aren't involved in
: determining whether a document matches or not, only its score.  How is a
: custom function / value-source going to filter?

It's not ... I didn't realize that was the context of the question; I was
just answering the specific question about how to create custom functions.



-Hoss



Re: Weird Facet and KeywordTokenizerFactory Issue

2009-10-12 Thread Chris Hostetter

: I had to be brief as my facets are in the order of 100K over 800K documents
: and also if I give the complete schema.xml I was afraid nobody would read my
: long message :-) ... Hence I showed only relevant pieces of the result showing
: different fields having the same problem

relevant is good, but you have to provide a consistent picture from start 
to finish ... you don't need to show 1,000 lines of facet field output, 
but you at least need to show the field names.

: 
:   
: 
: 
: 
: 
: 
: 
:   

...have you used analysis.jsp to see what terms that analyzer produces
based on the strings you are indexing for your documents?  Because
combined with synonyms like this...

: New York, N.Y., NY => New York

...it doesn't surprise me that you're getting "New" as an indexed term.
By default SynonymFilter uses whitespace to delimit tokens in multi-token
synonyms, so for input like "NY" you should see it produce the tokens
"New" and "York".

you can use the tokenizerFactory attribute on SynonymFilterFactory to 
specify a TokenizerFactory class to use when parsing synonyms.txt



-Hoss



Re: Question about PatternReplace filter and automatic Synonym generation

2009-10-12 Thread Chris Hostetter

:  There is a solr.PatternTokenizerFactory class which likely fits the bill in
: this case. The related question I have is this - is it possible to have
: multiple Tokenizers in your analysis chain?

No .. Tokenizers consume CharReaders and produce a TokenStream ... what's
needed here is a TokenFilter that consumes a TokenStream and produces a
TokenStream.





-Hoss



Re: De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process

2009-10-12 Thread Chris Hostetter
: In the code I'm working with, I generate a cache of calculated values as a
: by-product within a Filter.getDocidSet implementation (and within a Query-ized
: version of the filter and its Scorer method) . These values are keyed off the
: IndexReader's docID values, since that's all that's accessible at that level.
: Ultimately, however, I need to be able to access these values much higher up
: in the stack (Solr's QueryComponent.process method), so that I can inject the

my suggestion would be to change your Filter to use the FieldCache to
look up the uniqueKey for your docid, and base your cache off that ... then
other uses of your cache (higher up the chain) will have an id that
makes sense outside the context of a segment reader.
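
A sketch of that lookup against the Lucene 2.9 FieldCache API (assuming
the uniqueKey field is a single-token indexed field named "id"):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class DocidToKey {
  // translate a docid into the uniqueKey value it belongs to
  public static String uniqueKeyFor(IndexReader reader, int docid)
      throws IOException {
    String[] ids = FieldCache.DEFAULT.getStrings(reader, "id");
    return ids[docid];
  }
}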




-Hoss



Re: DIH and EmbeddedSolr

2009-10-12 Thread rohan rai
Hey,
Any reason why this may be happening?

Regards
Rohan

On Sun, Oct 11, 2009 at 9:25 PM, rohan rai  wrote:

>
> Small data set..
> 
> 
> 
> 11
> 11
> 11
> 
> 
> 22
> 22
> 22
> 
> 
> 33
> 33
> 33
> 
> 
>
> data-config
> 
> 
> 
>  forEach="/root/test/"
> url="/home/test/test_data.xml"
> >
> 
> 
> 
> 
> 
> 
>
> schema
> 
> 
>   
> omitNorms="true"/>
>   
>
>  
>multiValued="false" required="true"/>
>multiValued="false" />
>multiValued="false" />
>  
>
>  id
>
>  name
>
>  
> 
>
> Sometimes it creates the index, sometimes it gives a thread pool exception. It
> does not consistently create the index.
>
> Regards
> Rohan
>
>
> On Sun, Oct 11, 2009 at 3:56 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> On Sat, Oct 10, 2009 at 7:44 PM, rohan rai  wrote:
>>
>> > This is pretty unstable... anyone have any clue? Sometimes it even creates
>> > the index, sometimes it does not.
>> >
>> >
>> Most DataImportHandler tests run Solr in an embedded-like mode and they
>> run
>> fine. Can you tell us which version of Solr are you using? Also, any data
>> which can help us reproduce the problem would be nice.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: Lucene Merge Threads

2009-10-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
Which version of Solr are you using? That syntax was only added recently.

On Tue, Oct 13, 2009 at 8:08 AM, Giovanni Fernandez-Kincade
 wrote:
> This didn't end up working. I got the following error when I tried to commit:
>
> Oct 12, 2009 8:36:42 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error loading class '
>                5
>        '
>        at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
>        at 
> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
>        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:81)
>        at 
> org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:178)
>        at 
> org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>        at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>        at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>        at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>        at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>        at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>        at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>        at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>        at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>        at 
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>        at 
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>        at 
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>        at 
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>        at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.ClassNotFoundException:
>                5
>
>        at java.net.URLClassLoader$1.run(Unknown Source)
>        at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
>        at java.security.AccessController.doPrivileged(Unknown Source)
>        at java.net.URLClassLoader.findClass(Unknown Source)
>        at java.lang.ClassLoader.loadClass(Unknown Source)
>        at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
>        at java.lang.ClassLoader.loadClass(Unknown Source)
>        at java.lang.ClassLoader.loadClassInternal(Unknown Source)
>        at java.lang.Class.$$YJP$$forName0(Native Method)
>        at java.lang.Class.forName0(Unknown Source)
>        at java.lang.Class.forName(Unknown Source)
>        at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
>        ... 28 more
>
>
> I believe it's because the MaxThreadCount is not a public property of the 
> ConcurrentMergeSchedulerClass. You have to call this method to set it:
>
> public void setMaxThreadCount(int count) {
>    if (count < 1)
>      throw new IllegalArgumentException("count should be at least 1");
>    maxThreadCount = count;
>  }
>
> Is that possible through the solrconfig?
>
> Thanks,
> Gio.
>
> -Original Message-
> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
> Sent: Monday, October 12, 2009 7:53 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Lucene Merge Threads
>
> Do you have to make a new call to optimize to make it start the merge again?
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Monday, October 12, 2009 7:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene Merge Threads
>
> Try this in solrconfig.xml:
>
> 
>  1
> 
>
> Yes you can stop the process mid-mer