Phrase Search Issue

2009-05-21 Thread dabboo

Hi,

I am facing one issue in phrase query. I am entering 'Top of the world' as
my search criteria. I am expecting it to return all the records in which
one field contains all of these words, in any order.

But it is treating as OR and returning all the records, which are having
either of these words. I am doing this using dismax request. 

I would appreciate it if somebody could provide me some pointers.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Phrase-Search-Issue-tp23648813p23648813.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: best way to cache "base" queries (before application of filters)

2009-05-21 Thread Kent Fitch
Thanks for your reply, Yonik:

On Thu, May 21, 2009 at 2:43 AM, Yonik Seeley
 wrote:
>
> Some thoughts:
>
> #1) This is sort of already implemented in some form... see this
> section of solrconfig.xml and try uncommenting it:
> ...

> > On Wed, May 20, 2009 at 12:43 PM, Yonik Seeley
> > >  wrote:
> > > <useFilterForSortedQuery>true</useFilterForSortedQuery>

> > Of course the examples you gave used the default sort (by score) so
> > this wouldn't help if you do actually need to sort by score.

Right - we need to sort by relevance

> #2) Your problem might be able to be solved with field collapsing on
> the "category" field in the future (but it's not in Solr yet).

Sorry - I didn't understand this

> #3) Current work I'm doing right now will push Filters down a level
> and check them in tandem with the query instead of after.  This should
> speed things up by at least a factor of 2 in your case.
> https://issues.apache.org/jira/browse/SOLR-1165
>
> I'm trying to get SOLR-1165 finished this week, and I'd love to see
> how it affects your performance.
> In the meantime, try useFilterForSortedQuery and let us know if it
> still works (it's been turned off for a long time) ;-)

OK - so this looks like something to make all queries much faster by
only bothering to score results matching a filter?  If so, that's
really great, but I'm not sure it particularly helps our use-case
(other than making all filtered results faster) because:

- we've got one query we want filtered 5 ways to find the top scoring
results matching the query and each filter

- the filtering basically divides that query result set into 5 non
overlapping sets

- the query part is often complicated and expensive - we want to avoid
running it 5 times because our sloppy phrase requirement and often
millions of hits make finding and scoring expensive

- all documents in the query part will be scored eventually, even with
SOLR-1165, because they'll be part of one of the 5 filters

It is tempting to pass back to a custom query component lots of
results - enough so that the 'n' top-scoring documents that satisfy
each filter appear - but we may need to pass up to the query component
millions of hits to find, say, the top 5 ranked results for "maps".

It is tempting to apply the filters one by one in our own query
component on a scored document list retrieved by SolrIndexSearcher -
I'm not sure - maybe I haven't understood SOLR-1165?

Thanks also Walter for your suggestions.  Our users have a requirement
for the index to be continuously updated (well, every 10 minutes or
so), and our queries are extremely diverse/"long tail"ish, so an HTTP
cache will probably not help us.

Kent Fitch


Re: Phrase Search Issue

2009-05-21 Thread dabboo

This problem is related to the default operator in dismax. Currently OR is
the default operator and it is behaving perfectly fine. I have changed the
default operator in schema.xml to AND, and I have also changed the minimum
match to 100%.

But it seems like AND as the default operator doesn't work with dismax.
Please suggest.
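
For reference, mm (minimum match) can also be set per request rather than in
the handler defaults. A minimal SolrJ sketch - the handler name 'dismax' and
the server URL here are assumptions, not something from this setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DismaxMmExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("Top of the world");
    q.setQueryType("dismax"); // route to the dismax request handler
    q.set("mm", "100%");      // require all clauses to match, in any order
    QueryResponse rsp = server.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}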

Thanks,
Amit Garg



dabboo wrote:
> 
> Hi,
> 
> I am facing one issue in phrase query. I am entering 'Top of the world' as
> my search criteria. I am expecting it to return all the records in which
> one field contains all of these words, in any order.
> 
> But it is treating as OR and returning all the records, which are having
> either of these words. I am doing this using dismax request. 
> 
> I would appreciate it if somebody could provide me some pointers.
> 
> Thanks,
> Amit Garg
> 

-- 
View this message in context: 
http://www.nabble.com/Phrase-Search-Issue-tp23648813p23649189.html
Sent from the Solr - User mailing list archive at Nabble.com.



what does the version parameter in the query mean?

2009-05-21 Thread Anshuman Manur
Hello all,

I'm using Solr 1.3.0, and when I query my index for "solr" using the admin
page, the query string in the address bar of my browser reads like this:

http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on

Now, I don't know what version=2.2 means, and neither the wiki nor the docs
tell me. Could someone enlighten me?

Thank You
Anshuman Manur


Re: How to change the weight of the fields ?

2009-05-21 Thread Vincent Pérès

It seems I can only search on the field 'text'. With the following url :
http://localhost:8983/solr/select/?q=novel&qt=dismax&fl=title_s,id&version=2.2&start=0&rows=10&indent=on&debugQuery=on

I get answers, but in the debug area it seems it's only searching on the
'text' field (with or without 'qt' the results are displayed in the same
order):


novel
novel

+DisjunctionMaxQuery((text:novel^0.5 | title_s:novel^5.0 |
id:novel^10.0)~0.01) ()

+(text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01 ()

0.014641666 = (MATCH) sum of:
  0.014641666 = (MATCH) max plus 0.01 times others of:
0.014641666 = (MATCH) weight(text:novel^0.5 in 114927), product of:
  0.01362607 = queryWeight(text:novel^0.5), product of:
0.5 = boost
3.4734163 = idf(docFreq=10634, numDocs=43213)
0.007845918 = queryNorm
  1.0745333 = (MATCH) fieldWeight(text:novel in 114927), product of:
1.4142135 = tf(termFreq(text:novel)=2)
3.4734163 = idf(docFreq=10634, numDocs=43213)
0.21875 = fieldNorm(field=text, doc=114927)

etc.

Shouldn't the debug output below also show the term being searched in
'title_s' and 'id'?

Thanks for your answers !
Vincent
-- 
View this message in context: 
http://www.nabble.com/How-to-change-the-weight-of-the-fields---tp23619971p23649624.html
Sent from the Solr - User mailing list archive at Nabble.com.



Strange Phrase Query Issue with Dismax

2009-05-21 Thread dabboo

Hi,

I am facing a very strange issue in Solr, not sure if it is already a bug.

If I am searching for 'Top 500' then it returns all the records which
contain either of these words anywhere, which is fine.

But if I search for 'Top 500 Companies', it gives me all those records
which contain these 3 words in any one of the fields, irrespective of
sequence. In this case it is not returning the records which contain
either of these words (which actually is my requirement).

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Strange-Phrase-Query-Issue-with-Dismax-tp23650114p23650114.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
On Wed, May 20, 2009 at 11:18 AM, James X
 wrote:
> Hi Mike, thanks for the quick response:
>
> $ java -version
> java version "1.6.0_11"
> Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
>
> I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
> hitting that yet!

The issue didn't spell this out very well -- I've added a comment.

> The exception always reports 0 length, but the number of docs varies,
> heavily weighted towards one or two docs. Of the last 130 or so exceptions:
>     89 1 docs vs 0 length
>     20 2 docs vs 0 length
>      9 3 docs vs 0 length
>      1 4 docs vs 0 length
>      3 5 docs vs 0 length
>      2 6 docs vs 0 length
>      1 7 docs vs 0 length
>      1 9 docs vs 0 length
>      1 10 docs vs 0 length

Hmm... odd that it's always 0 file length.  What filesystem & IO
devices is the index being written to?

> The only unusual thing I can think of that we're doing with Solr is
> aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
> pattern between core admin operations and these exceptions, however...

I think from Lucene's standpoint this just means creating & closing
lots of IndexWriters?  (Which should be just fine).

What are your documents like?  Ie, how many and what type of fields?
Are you adding docs from multiple threads?  (Solr would do so, I
believe, so I guess: is your client that's submitting docs to a given
core, doing so with multiple threads?).

Mike


Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
Another question: are there any other exceptions in your logs?  Eg
problems adding certain documents, or anything?

Mike

On Wed, May 20, 2009 at 11:18 AM, James X
 wrote:
> Hi Mike, thanks for the quick response:
>
> $ java -version
> java version "1.6.0_11"
> Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
>
> I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
> hitting that yet!
>
> The exception always reports 0 length, but the number of docs varies,
> heavily weighted towards one or two docs. Of the last 130 or so exceptions:
>     89 1 docs vs 0 length
>     20 2 docs vs 0 length
>      9 3 docs vs 0 length
>      1 4 docs vs 0 length
>      3 5 docs vs 0 length
>      2 6 docs vs 0 length
>      1 7 docs vs 0 length
>      1 9 docs vs 0 length
>      1 10 docs vs 0 length
>
> The only unusual thing I can think of that we're doing with Solr is
> aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
> pattern between core admin operations and these exceptions, however...
>
> James
>
> On Wed, May 20, 2009 at 2:37 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hmm... somehow Lucene is flushing a new segment on closing the
>> IndexWriter, and thinks 1 doc had been added to the stored fields
>> file, yet the fdx file is the wrong size (0 bytes).  This check (&
>> exception) are designed to prevent corruption from entering the index,
>> so it's at least good to see CheckIndex passes after this.
>>
>> I don't think you're hitting LUCENE-1521: that issue only happens if a
>> single segment has more than ~268 million docs.
>>
>> Which exact JRE version are you using?
>>
>> When you hit this exception, is it always "1 docs vs 0 length in bytes"?
>>
>> Mike
>>
>> On Wed, May 20, 2009 at 3:19 AM, James X
>>  wrote:
>> > Hello all, I'm running Solr 1.3 in a multi-core environment. There are
>> > up to 2000 active cores in each Solr webapp instance at any given time.
>> >
>> > I've noticed occasional errors such as:
>> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx
>> >        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
>> >        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>> >        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>> >        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>> >        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
>> >        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
>> >        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>> >        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>> >        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>> >        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>> >        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
>> >
>> > during commit / optimise operations.
>> >
>> > These errors then cause cascading errors during updates on the offending cores:
>> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
>> >        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
>> >        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
>> >        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:924)
>> >        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
>> >        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
>> >
>> > This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
>> > when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
>> >
>> > CheckIndex doesn't find any problems with the index, and problems
>> > disappear after an (inconvenient, for me) restart of Solr.
>> >
>> > Firstly, as the symptoms are so close to those in 1521, can I check
>> > that my Lucene upgrade method should work:
>> > - unzip the Solr 1.3 war
>> > - remove the Lucene 2.4dev jars
>> > (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
>> > lucene-memory,lucene-highlighter, lucene-analyzers)
>> > - move in the Lucene 2.4.1 jars
>> > - rezip the directory structures as solr.war.
>> >
>> > I think this has worked, as solr/default/admin/registry.jsp shows:
>> >   Lucene Specification Version: 2.4.1
>> >   Lucene Implementation Version: 2.4.1 750176 - 2009-03-04 21:56:52
>> >
>> > Secondly, if this Lucene fix isn't the right solution to this problem,
>> > can anyone suggest an alternative approach? The only problems I've had
>> > up to now are to do with the number of allowed file handles, which 

Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
If you're able to run a patched version of Lucene, can you apply the
attached patch, run it, get the issue to happen again, and post back
the resulting exception?

It only adds further diagnostics to that RuntimeException you're hitting.

Another thing to try is turning on assertions (the -ea JVM switch), which
may very well catch the issue sooner.

Mike

On Wed, May 20, 2009 at 11:18 AM, James X
 wrote:
> Hi Mike, thanks for the quick response:
>
> $ java -version
> java version "1.6.0_11"
> Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
>
> I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
> hitting that yet!
>
> The exception always reports 0 length, but the number of docs varies,
> heavily weighted towards one or two docs. Of the last 130 or so exceptions:
>     89 1 docs vs 0 length
>     20 2 docs vs 0 length
>      9 3 docs vs 0 length
>      1 4 docs vs 0 length
>      3 5 docs vs 0 length
>      2 6 docs vs 0 length
>      1 7 docs vs 0 length
>      1 9 docs vs 0 length
>      1 10 docs vs 0 length
>
> The only unusual thing I can think of that we're doing with Solr is
> aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
> pattern between core admin operations and these exceptions, however...
>
> James
>
> On Wed, May 20, 2009 at 2:37 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hmm... somehow Lucene is flushing a new segment on closing the
>> IndexWriter, and thinks 1 doc had been added to the stored fields
>> file, yet the fdx file is the wrong size (0 bytes).  This check (&
>> exception) are designed to prevent corruption from entering the index,
>> so it's at least good to see CheckIndex passes after this.
>>
>> I don't think you're hitting LUCENE-1521: that issue only happens if a
>> single segment has more than ~268 million docs.
>>
>> Which exact JRE version are you using?
>>
>> When you hit this exception, is it always "1 docs vs 0 length in bytes"?
>>
>> Mike
>>
>> On Wed, May 20, 2009 at 3:19 AM, James X
>>  wrote:
>> > Hello all, I'm running Solr 1.3 in a multi-core environment. There are
>> > up to 2000 active cores in each Solr webapp instance at any given time.
>> >
>> > I've noticed occasional errors such as:
>> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _h.fdx
>> >        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
>> >        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>> >        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>> >        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>> >        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
>> >        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
>> >        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
>> >        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
>> >        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
>> >        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
>> >        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
>> >
>> > during commit / optimise operations.
>> >
>> > These errors then cause cascading errors during updates on the offending cores:
>> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
>> >        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
>> >        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
>> >        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:924)
>> >        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
>> >        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
>> >
>> > This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
>> > when I upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
>> >
>> > CheckIndex doesn't find any problems with the index, and problems
>> > disappear after an (inconvenient, for me) restart of Solr.
>> >
>> > Firstly, as the symptoms are so close to those in 1521, can I check
>> > that my Lucene upgrade method should work:
>> > - unzip the Solr 1.3 war
>> > - remove the Lucene 2.4dev jars
>> > (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
>> > lucene-memory,lucene-highlighter, lucene-analyzers)
>> > - move in the Lucene 2.4.1 jars
>> > - rezip the directory structures as solr.war.
>> >
>> > I think this has worked, as solr/default/admin/registry.jsp shows:
>> >   Lucene Specification Version: 2.4.1
>> >   Lucene Implementation Version: 2.4.1 750176 - 2009-03-04 21:56:52
>> >
>> > Secondly,

Re: best way to cache "base" queries (before application of filters)

2009-05-21 Thread Yonik Seeley
On Thu, May 21, 2009 at 3:30 AM, Kent Fitch  wrote:
> > #2) Your problem might be able to be solved with field collapsing on
> > the "category" field in the future (but it's not in Solr yet).
> Sorry - I didn't understand this

A single relevancy search, but group or collapse results based on the
value of the category field such that you don't get more than 10
results for each value of category.

but it's not in Solr yet...
http://issues.apache.org/jira/browse/SOLR-236

> - we've got one query we want filtered 5 ways to find the top scoring
> results matching the query and each filter

The problem is that "caching the base query" involves caching not only
all of the matching documents, but the score for each document.
That's expensive.

You could also write your own HitCollector that filtered the results
of the base query 5 different ways simultaneously.
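
A rough sketch of that idea against Lucene 2.4's HitCollector API - the
OpenBitSet-per-category filters and the top-N wiring here are assumptions
for illustration, not tested code:

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.PriorityQueue;

/** Keeps the top-N ScoreDocs; the lowest score sits at the head. */
class TopNQueue extends PriorityQueue {
  TopNQueue(int n) { initialize(n); }
  protected boolean lessThan(Object a, Object b) {
    return ((ScoreDoc) a).score < ((ScoreDoc) b).score;
  }
}

/** Scores the base query once, routing each hit into the single
 *  (non-overlapping) category filter that contains it. */
public class MultiFilterCollector extends HitCollector {
  private final OpenBitSet[] filters; // one bit set per category
  private final TopNQueue[] queues;   // top-N hits per category

  public MultiFilterCollector(OpenBitSet[] filters, int topN) {
    this.filters = filters;
    this.queues = new TopNQueue[filters.length];
    for (int i = 0; i < filters.length; i++) {
      queues[i] = new TopNQueue(topN);
    }
  }

  public void collect(int doc, float score) {
    for (int i = 0; i < filters.length; i++) {
      if (filters[i].get(doc)) {
        queues[i].insert(new ScoreDoc(doc, score));
        break; // the five sets don't overlap, so stop at the first match
      }
    }
  }

  public TopNQueue getQueue(int category) { return queues[category]; }
}

A single searcher.search(query, new MultiFilterCollector(categoryBits, 10))
would then fill all five top-10 lists in one scoring pass.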

-Yonik
http://www.lucidimagination.com


Re: master/slave failure scenario

2009-05-21 Thread nk 11
Just curious. What would be the disadvantages of a no-replication /
multi-master (no slave) setup?
The client code should do the updates for every master of course, but if one
machine failed then I could immediately continue the indexing process, and
I could also query the index on any machine for a valid result.
I might be missing something...
On Thu, May 14, 2009 at 4:19 PM, nk 11  wrote:

> wow! that was just a couple of days old!
> thanks as lot!
>   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> yeah there is a hack
>>
>> https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316
>>
>> On Thu, May 14, 2009 at 6:07 PM, nk 11  wrote:
>> > sorry for the mail. I wanted to hit reply :(
>> >
>> > On Thu, May 14, 2009 at 3:37 PM, nk 11  wrote:
>> >>
>> >> oh, so the configuration must be manually changed?
>> >> Can't something be passed at (re)start time?
>> >>
>> >> 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>> >>>
>> >>> On Thu, May 14, 2009 at 4:07 PM, nk 11 
>> wrote:
>> >>> > Ok so the VIP will point to the new master. but what makes a slave
>> >>> > promoted
>> >>> > to a master? Only the fact that it will receive add/update requests?
>> >>> > And I suppose that this "hot" promotion is possible only if the
>> slave
>> >>> > is
>> >>> > configured as master also...
>> >>> right.. By default you can setup all slaves to be master also. It does
>> >>> not cost anything if it is not serving any requests.
>> >>>
>> >>> so , if you have such a setting you will have to disable that slave to
>> >>> be a slave and restart it and you will have to make the VIP point to
>> >>> this new slave as master.
>> >>>
>> >>> so hot promotion is still not possible.
>> >>> >
>> >>> > 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>> >>> >>
>> >>> >> ideally , we don't do that.
>> >>> >> you can just keep the master host behind a VIP so if you wish to
>> >>> >> change the master make the VIP point to the new host
>> >>> >>
>> >>> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 
>> >>> >> wrote:
>> >>> >> > This is more interesting. Such a procedure would involve taking
>> down
>> >>> >> > and
>> >>> >> > reconfiguring the slave?
>> >>> >> >
>> >>> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> >> Or ...
>> >>> >> >>
>> >>> >> >> 1. Promote existing slave to new master
>> >>> >> >> 2. Add new slave to cluster
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> -Bryan
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
>> >>> >> >>
>> >>> >> >>  - Migrate configuration files from old master (or backup) to
>> new
>> >>> >> >> master.
>> >>> >> >>> - Replicate from a slave to the new master.
>> >>> >> >>> - Resume indexing to new master.
>> >>> >> >>>
>> >>> >> >>> -Jay
>> >>> >> >>>
>> >>> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 > >
>> >>> >> >>> wrote:
>> >>> >> >>>
>> >>> >> >>>  Nice.
>> >>> >>  What if the master fails permanently (like a disk crash...)
>> and
>> >>> >>  the
>> >>> >>  new
>> >>> >>  master is a clean machine?
>> >>> >>  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
>> >>> >> 
>> >>> >>   On Wed, May 13, 2009 at 12:10 PM, nk 11 <
>> nick.cass...@gmail.com>
>> >>> >>  wrote:
>> >>> >> >
>> >>> >> >> Hello
>> >>> >> >>
>> >>> >> >> I'm kind of new to Solr and I've read about replication, and
>> >>> >> >> the
>> >>> >> >> fact
>> >>> >> >>
>> >>> >> > that a
>> >>> >> >
>> >>> >> >> node can act as both master and slave.
>> >>> >> >> If a replica fails and then comes back on line I suppose that
>> it
>> >>> >> >> will
>> >>> >> >>
>> >>> >> > resyncs
>> >>> >> >
>> >>> >> >> with the master.
>> >>> >> >>
>> >>> >> > right
>> >>> >> >
>> >>> >> >>
>> >>> >> >> But what happens if the master fails? A slave that is
>> >>> >> >> configured as
>> >>> >> >>
>> >>> >> > master
>> >>> >> >
>> >>> >> >> will kick in? What if that slave is not yet fully sync'ed
>> with
>> >>> >> >> the
>> >>> >> >>
>> >>> >> > failed
>> >>> >> 
>> >>> >> > master and has old data?
>> >>> >> >>
>> >>> >> > if the master fails you can't index the data. but the slaves
>> >>> >> > will
>> >>> >> > continue serving the requests with the last index. You can
>> bring
>> >>> >> > back
>> >>> >> > the master up and resume indexing.
>> >>> >> >
>> >>> >> >
>> >>> >> >> What happens when the original master comes back on line? He
>> >>> >> >> will
>> >>> >> >>
>> >>> >> > remain
>> >>> >> 
>> >>> >> > a
>> >>> >> >
>> >>> >> >> slave because there is another node with the master role?
>> >>> >> >>
>> >>> >> >> Thank you!
>> >>> >> >>
>> >>> >> >>
>> >>> >> >
>> >>> >> >
>> >>> >> > --
>> >>> >> > 

Customizing SOLR-236 field collapsing

2009-05-21 Thread Marc Sturlese

Hey there,
I have been testing the latest adjacent field collapsing patch in trunk and
it seems to work perfectly. I am trying to modify its behaviour but don't
know exactly how to do it. What I would like to do is, instead of collapsing
the results, send them to the end of the results queue.
Apparently it is not possible to do that due to the way it is implemented. I
have noticed that you get a DocSet of the ids that "survived" the collapsing
and that match the query and filters (collapseFilterDocSet =
collapseFilter.getDocSet(); you get it in CollapseComponent.java).
Once that is done the search is executed again, this time with the DocSet
obtained before passed as a filter:

DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
    collapseFilterDocSet == null ? rb.getFilters() : null,
    collapseFilterDocSet,
    rb.getSortSpec().getSort(),
    rb.getSortSpec().getOffset(),
    rb.getSortSpec().getCount(),
    rb.getFieldFlags());

The result of this search gives you the final result (with the correct
offset and start).
I have thought of saving the collapsed docs in another DocSet and then doing
something with them... but I don't know how to manage it.
Any clue about how I could reach the goal?
Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/Customizing-SOLR-236-field-collapsing-tp23653220p23653220.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to index large set data

2009-05-21 Thread Erick Erickson
This isn't much data to go on. Do you have any idea what your throughput is?
How many documents are you indexing? One 45G doc or 4.5 billion 10-character
docs?
Have you looked at any profiling data to see how much memory is being
consumed?
Are you IO bound or CPU bound?

Best
Erick

On Thu, May 21, 2009 at 2:18 AM, Jianbin Dai  wrote:

>
> Hi,
>
> I have about 45GB of xml files to be indexed. I am using DataImportHandler. I
> started the full import 4 hours ago, and it's still running.
> My computer has 4GB memory. Any suggestions on solutions?
> Thanks!
>
> JB
>
>
>
>
>


Re: Plugin Not Found

2009-05-21 Thread Jeff Newburn
Nothing else is in the lib directory but this one jar.

Additionally, the logs seem to say that it finds the lib as shown below
INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
classloader

However as soon as it tries the component it cannot find the class.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Reply-To: 
> Date: Thu, 21 May 2009 10:19:19 +0530
> To: 
> Subject: Re: Plugin Not Found
> 
> what else is there in the solr.home/lib other than this component?
> 
> On Wed, May 20, 2009 at 9:08 PM, Jeff Newburn  wrote:
>> I tried to change the package name to com.zappos.solr.
>> 
>> When I declared the search component with:
>> <searchComponent name="facetcube" class="com.zappos.solr.FacetCubeComponent"/>
>> 
>> I get:
>> SEVERE: org.apache.solr.common.SolrException: Unknown Search Component:
>> facetcube
>>    at org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:874)
>>    at
>> org.apache.solr.handler.component.SearchHandler.inform(SearchHandler.java:12
>> 7)
>>    at
>> 
>> 
>> When I declare the component with solr.FacetCubeComponent I get the same
>> error message.
>> 
>> When we turned on trace we got the same exception plus
>> Caused by: java.lang.ClassNotFoundException:
>> com.zappos.solr.FacetCubeComponent
>>    at
>> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.jav
>> a:1360)
>>    at
>> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.jav
>> a:1206)
>>    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>>    at java.lang.Class.forName0(Native Method)
>>    at java.lang.Class.forName(Class.java:247)
>>    at
>> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:29
>> 4)
>>    ... 27 more
>> 
>> 
>> 
>> --
>> Jeff Newburn
>> Software Engineer, Zappos.com
>> jnewb...@zappos.com - 702-943-7562
>> 
>> 
>>> From: Grant Ingersoll 
>>> Reply-To: 
>>> Date: Wed, 20 May 2009 10:38:30 -0400
>>> To: 
>>> Subject: Re: Plugin Not Found
>>> 
>>> Just a wild guess here, but...
>>> 
>>> Try doing one of two things:
>>> 1. change the package name to be something other than o.a.s
>>> 2. Change your config to use solr.FacetCubeComponent
>>> 
>>> You might also try turning on trace level logging for the
>>> SolrResourceLoader and report back the output.
>>> 
>>> -Grant
>>> 
>>> On May 20, 2009, at 10:20 AM, Jeff Newburn wrote:
>>> 
 Error is below. This error does not appear when I manually copy the jar
 file into the tomcat webapp directory, only when I try to put it in the
 solr.home lib directory.

 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.component.FacetCubeComponent'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
    at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
    at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:841)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:528)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:350)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:227)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:107)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:829)
    at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:718)
    at org.apache.catalina.startup.HostConfig.deployApps(Host

Re: master/slave failure scenario

2009-05-21 Thread Bryan Talbot
Indexing is usually much more expensive than replication, so it won't
scale well as you add more servers. Also, what would a client do if
it was able to send the update to only some of the servers because
others were down (for maintenance, etc)?




-Bryan




On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:

Just curious. What would be the disadvantages of a no-replication /
multi-master (no slave) setup?
The client code should do the updates for every master of course, but if one
machine failed then I could immediately continue the indexing process, and
I could also query the index on any machine for a valid result.
I might be missing something...



Re: Customizing SOLR-236 field collapsing

2009-05-21 Thread Thomas Traeger
Is adding QueryComponent to your SearchComponents an option? When
combined with the CollapseComponent this approach would return both the
collapsed and the complete result set.

i.e.:

<arr name="components">
  <str>collapse</str>
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
</arr>

Thomas

Marc Sturlese schrieb:

Hey there,
I have been testing the latest adjacent field collapsing patch in trunk and
it seems to work perfectly. I am trying to modify its behaviour but don't
know exactly how to do it. What I would like to do is, instead of collapsing
the results, send them to the end of the results queue.
Apparently it is not possible to do that due to the way it is implemented. I
have noticed that you get a DocSet of the ids that "survived" the collapsing
and that match the query and filters (collapseFilterDocSet =
collapseFilter.getDocSet(); you get it in CollapseComponent.java).
Once that is done the search is executed again, this time with the DocSet
obtained before passed as a filter:

DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
    collapseFilterDocSet == null ? rb.getFilters() : null,
    collapseFilterDocSet,
    rb.getSortSpec().getSort(),
    rb.getSortSpec().getOffset(),
    rb.getSortSpec().getCount(),
    rb.getFieldFlags());

The result of this search gives you the final result (with the correct
offset and start).
I have thought of saving the collapsed docs in another DocSet and then doing
something with them... but I don't know how to manage it.
Any clue about how I could reach the goal?
Thanks in advance
  




RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar
concern. It looks like any serious customization work requires developing a
custom SearchComponent, but it's not clear to me how the Solr designers
intended this to be done. I would feel more confident either doing it at the
Lucene level, or staying on the client side and using something like
multi-core (as discussed here http://wiki.apache.org/solr/MultipleIndexes).
Nick's prepare()-based idea in the quoted thread below might look roughly
like the sketch that follows.
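
A minimal sketch against the Solr 1.3 component API - the component name and
the hard-coded shard list are illustrative assumptions, not code from this
thread:

import java.io.IOException;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Registered as a first-component so its prepare() runs before the
// standard components look at the shards parameter, e.g.:
//   <searchComponent name="shardInjector" class="ShardInjectorComponent"/>
//   <arr name="first-components"><str>shardInjector</str></arr>
public class ShardInjectorComponent extends SearchComponent {
  public void prepare(ResponseBuilder rb) throws IOException {
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    // A real component would compute this list; this is a placeholder.
    params.set(ShardParams.SHARDS, "host1:8983/solr,host2:8983/solr");
    rb.req.setParams(params);
  }

  public void process(ResponseBuilder rb) throws IOException { }

  public String getDescription() { return "injects a shards parameter"; }
  public String getSource() { return "$Source$"; }
  public String getSourceId() { return "$Id$"; }
  public String getVersion() { return "1.0"; }
}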


 
> Date: Wed, 20 May 2009 13:47:20 -0400
> Subject: RE: Creating a distributed search in a searchComponent
> From: nicholas.bai...@rackspace.com
> To: solr-user@lucene.apache.org
> 
> It seems I sent this out a bit too soon. After looking at the source, it
> seems there are two separate paths for distributed and regular queries;
> however, the prepare method for all components is run before the shards
> parameter is checked. So I can build the shards portion using the prepare
> method of my own search component.
> 
> However I'm not sure if this is the greatest idea in case Solr changes at
> some point.
> 
> -Nick
> 
> -Original Message-
> From: "Nick Bailey" 
> Sent: Wednesday, May 20, 2009 1:29pm
> To: solr-user@lucene.apache.org
> Subject: Creating a distributed search in a searchComponent
> 
> Hi,
> 
> I am wondering if it is possible to basically add the distributed portion of 
> a search query inside of a searchComponent.
> 
> I am hoping to build my own component and add it as a first-component to the 
> StandardRequestHandler. Then hopefully I will be able to use this component 
> to build the "shards" parameter of the query and have the Handler then treat 
> the query as a distributed search. Anyone have any experience or know if this 
> is possible?
> 
> Thanks,
> Nick
> 
> 
> 


Re: Plugin Not Found

2009-05-21 Thread Mark Miller

Jeff Newburn wrote:

Nothing else is in the lib directory but this one jar.

Additionally, the logs seem to say that it finds the lib as shown below
INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
classloader

However as soon as it tries the component it cannot find the class.

  
Something must be wacky. I just did a quick custom component with 1.3
and trunk, and it loaded with no problem in both cases.


Anything odd about your component? You're sure it extends SearchComponent?
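
For reference, a bare-bones component along these lines is all it should
take - the package and class names below are just placeholders matching this
thread, declared with <searchComponent name="facetcube"
class="com.zappos.solr.FacetCubeComponent"/>:

package com.zappos.solr;

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class FacetCubeComponent extends SearchComponent {
  public void prepare(ResponseBuilder rb) throws IOException { }

  public void process(ResponseBuilder rb) throws IOException {
    // Minimal smoke test: stick a value in the response.
    rb.rsp.add("facetcube", "loaded");
  }

  public String getDescription() { return "facet cube component"; }
  public String getSource() { return "$Source$"; }
  public String getSourceId() { return "$Id$"; }
  public String getVersion() { return "1.0"; }
}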

As Noble mentioned, you will not be able to find other classes/jars in 
the solr.home/lib directory from a class/jar in the solr.home/lib 
directory. But this, oddly, doesn't appear to be the issue you're facing.


Do share if you have anything else you can add.

--
- Mark

http://www.lucidimagination.com





Re: Creating a distributed search in a searchComponent

2009-05-21 Thread Shalin Shekhar Mangar
On Wed, May 20, 2009 at 10:59 PM, Nick Bailey  wrote:

> Hi,
>
> I am wondering if it is possible to basically add the distributed portion
> of a search query inside of a searchComponent.
>
> I am hoping to build my own component and add it as a first-component to
> the StandardRequestHandler.  Then hopefully I will be able to use this
> component to build the "shards" parameter of the query and have the Handler
> then treat the query as a distributed search.  Anyone have any experience or
> know if this is possible?
>

You can also add a ServletFilter before SolrDispatchFilter and add the
parameters before Solr processes the query.
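
A minimal sketch of such a filter - mapped in web.xml ahead of
SolrDispatchFilter; the class name and the hard-coded shard list are
illustrative assumptions:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

// Wraps each request so Solr sees an extra shards parameter.
public class AddShardsFilter implements Filter {
  private static final String SHARDS = "host1:8983/solr,host2:8983/solr";

  public void init(FilterConfig cfg) { }
  public void destroy() { }

  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest http = (HttpServletRequest) req;
    chain.doFilter(new HttpServletRequestWrapper(http) {
      public String getParameter(String name) {
        return "shards".equals(name) ? SHARDS : super.getParameter(name);
      }
      public String[] getParameterValues(String name) {
        return "shards".equals(name) ? new String[] { SHARDS }
                                     : super.getParameterValues(name);
      }
      public Map getParameterMap() {
        Map m = new HashMap(super.getParameterMap());
        m.put("shards", new String[] { SHARDS });
        return m;
      }
    }, res);
  }
}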

-- 
Regards,
Shalin Shekhar Mangar.


Re: Creating a distributed search in a searchComponent

2009-05-21 Thread Shalin Shekhar Mangar
Also look at SOLR-565 and see if that helps you.

https://issues.apache.org/jira/browse/SOLR-565

On Thu, May 21, 2009 at 9:58 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

>
> On Wed, May 20, 2009 at 10:59 PM, Nick Bailey <
> nicholas.bai...@rackspace.com> wrote:
>
>> Hi,
>>
>> I am wondering if it is possible to basically add the distributed portion
>> of a search query inside of a searchComponent.
>>
>> I am hoping to build my own component and add it as a first-component to
>> the StandardRequestHandler.  Then hopefully I will be able to use this
>> component to build the "shards" parameter of the query and have the Handler
>> then treat the query as a distributed search.  Anyone have any experience or
>> know if this is possible?
>>
>
> You can also add a ServletFilter before SolrDispatchFilter and add the
> parameters before Solr processes the query.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: what does the version parameter in the query mean?

2009-05-21 Thread Jay Hill
I was interested in this recently and also couldn't find anything on the
wiki. I found this in the list archive:

The version parameter determines the XML protocol used in the response.
Clients are strongly encouraged to ''always'' specify the protocol version,
so as to ensure that the format of the response they receive does not change
unexpectedly if/when the Solr server is upgraded.

Here is a link to the archive:
http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg00518.html
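
With SolrJ, pinning the version explicitly could look like this minimal
sketch (the class is illustrative; CommonParams.VERSION is just the
"version" key):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.CommonParams;

public class PinnedVersionQuery {
  public static SolrQuery make(String userQuery) {
    SolrQuery q = new SolrQuery(userQuery);
    // Pin the response format so a server upgrade can't change it under us.
    q.set(CommonParams.VERSION, "2.2");
    return q;
  }
}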

-Jay


On Thu, May 21, 2009 at 1:06 AM, Anshuman Manur  wrote:

> Hello all,
>
> I'm using Solr 1.3.0, and when I query my index for "solr" using the admin
> page, the query string in the address bar of my browser reads like this:
>
>
> http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on
>
> Now, I don't know what version=2.2 means, and the wiki or the docs don't
> tell me. Could someone enlighten me?
>
> Thank You
> Anshuman Manur
>


No sanity checks before replicating files?

2009-05-21 Thread Damien Tournoud
Hi list,

We have deployed an experimental Solr 1.4 cluster (a master/slave
setup, with automatic promotion of the slave as a master in case of
failure) on drupal.org, to manage our medium size index (3GB, about
400K documents).

One of the problems we are facing is that there seem to be no sanity
checks before downloading files. Take the following scenario:

 - initial situation: s1 is master, s2 is slave
 - s1 fails, the virtual IP falls back to s2
 - some updates happen on s2
 - suppose now that s1 gets back online: s2 tries to replicate from
s1, but after replicating all the files (3GB) the commit fails
because the local index has been locally updated; the replication
fails, and the process restarts at the next poll (re-download all the
index files, fail again...) and so on

We are considering configuring each server to replicate from the
virtual IP, which should solve that issue for us, but couldn't the
slave do some sanity checks before trying to download all the files
from the master?

Thanks in advance for any help you could provide,

Damien Tournoud


Re: master/slave failure scenario

2009-05-21 Thread nk 11
You are right... I just don't like the idea of stopping the indexing process
if the master fails until a new one is started (more or less by hand).

On Thu, May 21, 2009 at 6:49 PM, Bryan Talbot wrote:

> Indexing is usually much more expensive than replication, so it won't scale
> well as you add more servers.  Also, what would a client do if it was able
> to send the update to only some of the servers because others were down (for
> maintenance, etc)?
>
>
>
> -Bryan
>
>
>
>
>
> On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:
>
>  Just curious. What would be the disadvantages of a no replication / multi
>> master (no slave) setup?
>> The client code should do the updates for evey master ofc, but if one
>> machine would fail then I can imediatly continue the indexing process and
>> also I can query the index on any machine for a valid result.
>> I might be missing something...
>> On Thu, May 14, 2009 at 4:19 PM, nk 11  wrote:
>>
>>  wow! that was just a couple of days old!
>>> thanks as lot!
>>>  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>>>
>>>  yeah there is a hack


 https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel
 #action_12708316

 On Thu, May 14, 2009 at 6:07 PM, nk 11  wrote:

> sorry for the mail. I wanted to hit reply :(
>
> On Thu, May 14, 2009 at 3:37 PM, nk 11  wrote:
>
>>
>> oh, so the configuration must be manualy changed?
>> Can't something be passed at (re)start time?
>>
>> 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>>
>>>
>>> On Thu, May 14, 2009 at 4:07 PM, nk 11 
>>>
>> wrote:

> Ok so the VIP will point to the new master. but what makes a slave
 promoted
 to a master? Only the fact that it will receive add/update requests?
 And I suppose that this "hot" promotion is possible only if the

>>> slave

> is
 convigured as master also...

>>> right.. By default you can setup all slaves to be master also. It
>>> does
>>> not cost anything if it is not serving any requests.
>>>
>>> so , if you have such a setting you will have to disable that slave
>>> to
>>> be a slave and restart it and you will have to make the VIP point to
>>> this new slave as master.
>>>
>>> so hot promotion is still not possible.
>>>

 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 

>
> ideally , we don't do that.
> you can just keep the master host behind a VIP so if you wish to
> change the master make the VIP point to the new host
>
> On Wed, May 13, 2009 at 10:52 PM, nk 11 
> wrote:
>
>> This is more interesting.Such a procedure would involve taking
>>
> down

> and
>> reconfiguring the slave?
>>
>> On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
>> wrote:
>>
>>  Or ...
>>>
>>> 1. Promote existing slave to new master
>>> 2. Add new slave to cluster
>>>
>>>
>>>
>>>
>>> -Bryan
>>>
>>>
>>>
>>>
>>>
>>> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
>>>
>>> - Migrate configuration files from old master (or backup) to
>>>
>> new

> master.
>>>
 - Replicate from a slave to the new master.
 - Resume indexing to new master.

 -Jay

 On Wed, May 13, 2009 at 4:26 AM, nk 11 >>>
>>>
>  wrote:

 Nice.

> What if the master fails permanently (like a disk crash...)
>
 and

> the
> new
> master is a clean machine?
> 2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
>
> On Wed, May 13, 2009 at 12:10 PM, nk 11 <
>
 nick.cass...@gmail.com>

> wrote:
>
>>
>>  Hello
>>>
>>> I'm kind of new to Solr and I've read about replication, and
>>> the
>>> fact
>>>
>>>  that a
>>
>>  node can act as both master and slave.
>>> I a replica fails and then comes back on line I suppose that
>>>
>> it

> will
>>>
>>>  resyncs
>>
>>  with the master.
>>>
>>>  right
>>
>>
>>> But what happnes if the master fails? A slave that is
>>> configured as
>>>
>>>  master
>>
>>  will kick in? What if that slave is not yes fully sync'ed
>>>
>> with

> the
>>

Re: Customizing SOLR-236 field collapsing

2009-05-21 Thread Marc Sturlese

Yes, I have tried it but I see a couple of problems doing that.

I will have to do more searches so response time will increase. 

The second thing is that, imagine I show the results collapsed on page one
and put a button to see the non-collapsed results. If results for the
second page are later requested, some results from the non-collapsed request
would be the same as results that appeared on the first page with
collapsing:

collapsing page 1 shows docs:
1-2-3-6-7

non collapsing results page 1 shows docs:
1-2-3-4-5

collapsing results page 2 shows docs:
8-9-10-11-12

non collapsing results page 2 show docs:
6-7-8-9-10

I want to avoid that and make the response as fast as possible. That is the
reason why I want to send the collapsed docs to the end of the queue...

Thanks



Thomas Traeger-2 wrote:
> 
> Is adding QueryComponent to your SearchComponents an option? When 
> combined with the CollapseComponent this approach would return both the 
> collapsed and the complete result set.
> 
> i.e.:
> 
> 
>   collapse
>   query
>   facet
>   mlt
>   highlight
> 
> 
> Thomas
> 
> Marc Sturlese schrieb:
>> Hey there,
>> I have been testing the latest adjacent field collapsing patch in trunk and
>> it seems to work perfectly. I am trying to modify its behaviour but don't
>> know exactly how to do it. What I would like to do is, instead of collapsing
>> the results, send them to the end of the results queue.
>> Apparently it is not possible to do that due to the way it is implemented. I
>> have noticed that you get a DocSet of the ids that "survived" the collapsing
>> and that match the query and filters (collapseFilterDocSet =
>> collapseFilter.getDocSet(); you get it in CollapseComponent.java).
>> Once that is done the search is executed again, this time with the DocSet
>> obtained before passed as a filter:
>>
>> DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
>>     collapseFilterDocSet == null ? rb.getFilters() : null,
>>     collapseFilterDocSet,
>>     rb.getSortSpec().getSort(),
>>     rb.getSortSpec().getOffset(),
>>     rb.getSortSpec().getCount(),
>>     rb.getFieldFlags());
>>
>> The result of this search gives you the final result (with the correct
>> offset and start).
>> I have thought of saving the collapsed docs in another DocSet and then doing
>> something with them... but I don't know how to manage it.
>> Any clue about how I could reach the goal?
>> Thanks in advance
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Customizing-SOLR-236-field-collapsing-tp23653220p23656522.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: master/slave failure scenario

2009-05-21 Thread Otis Gospodnetic

Hi,

You should be able to do the following.
Put masters behind a load balancer (LB).
Create a LB VIP and a pool with 2 masters, masterA & masterB, with a rule that 
all requests always go to A unless A is down. If A is down they go to B.
Bring up master instances A and B on 2 servers and make them point to the 
shared storage.

masterA \
   \--> shared storage
   /
masterB /

Your indexing client doesn't talk to the servers directly. It talks through the 
VIP you created in LB.
At any one time only one of the masters is active.
If A goes down, LB detects it and makes B active.
Your indexer may have to reconnect if it detects a failure, maybe it would need 
to reindex some number of documents if they didn't make it to disk before A 
died, maybe even some lock file cleanup might be needed, but the above should 
be doable with little effort.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: nk 11 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 21, 2009 12:44:55 PM
> Subject: Re: master/slave failure scenario
> 
> You are right... I just don't like the idea of stopping the indexing process
> if the master fails until a new one is started (more or less by hand).
> 
> On Thu, May 21, 2009 at 6:49 PM, Bryan Talbot wrote:
> 
> > Indexing is usually much more expensive that replication so it won't scale
> > well as you add more servers.  Also, what would a client do if it was able
> > to send the update to only some of the servers because others were down (for
> > maintenance, etc)?
> >
> >
> >
> > -Bryan
> >
> >
> >
> >
> >
> > On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:
> >
> >  Just curious. What would be the disadvantages of a no replication / multi
> >> master (no slave) setup?
> >> The client code should do the updates for evey master ofc, but if one
> >> machine would fail then I can imediatly continue the indexing process and
> >> also I can query the index on any machine for a valid result.
> >> I might be missing something...
> >> On Thu, May 14, 2009 at 4:19 PM, nk 11 wrote:
> >>
> >>  wow! that was just a couple of days old!
> >>> thanks as lot!
> >>>  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
> >>>
> >>>  yeah there is a hack
> 
> 
>  
> https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel
>  #action_12708316
> 
>  On Thu, May 14, 2009 at 6:07 PM, nk 11 wrote:
> 
> > sorry for the mail. I wanted to hit reply :(
> >
> > On Thu, May 14, 2009 at 3:37 PM, nk 11 wrote:
> >
> >>
> >> oh, so the configuration must be manualy changed?
> >> Can't something be passed at (re)start time?
> >>
> >> 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
> >>
> >>>
> >>> On Thu, May 14, 2009 at 4:07 PM, nk 11 
> >>>
> >> wrote:
> 
> > Ok so the VIP will point to the new master. but what makes a slave
>  promoted
>  to a master? Only the fact that it will receive add/update requests?
>  And I suppose that this "hot" promotion is possible only if the
> 
> >>> slave
> 
> > is
>  convigured as master also...
> 
> >>> right.. By default you can setup all slaves to be master also. It
> >>> does
> >>> not cost anything if it is not serving any requests.
> >>>
> >>> so , if you have such a setting you will have to disable that slave
> >>> to
> >>> be a slave and restart it and you will have to make the VIP point to
> >>> this new slave as master.
> >>>
> >>> so hot promotion is still not possible.
> >>>
> 
>  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
> 
> >
> > ideally , we don't do that.
> > you can just keep the master host behind a VIP so if you wish to
> > change the master make the VIP point to the new host
> >
> > On Wed, May 13, 2009 at 10:52 PM, nk 11 
> > wrote:
> >
> >> This is more interesting.Such a procedure would involve taking
> >>
> > down
> 
> > and
> >> reconfiguring the slave?
> >>
> >> On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
> >> wrote:
> >>
> >>  Or ...
> >>>
> >>> 1. Promote existing slave to new master
> >>> 2. Add new slave to cluster
> >>>
> >>>
> >>>
> >>>
> >>> -Bryan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
> >>>
> >>> - Migrate configuration files from old master (or backup) to
> >>>
> >> new
> 
> > master.
> >>>
>  - Replicate from a slave to the new master.
>  - Resume indexing to new master.
> 
>  -Jay

Re: No sanity checks before replicating files?

2009-05-21 Thread Otis Gospodnetic

Hi Damien,

Interesting, this is similar to my suggestion to another person I just replied 
to here on solr-user.
Have you actually run into this problem?  I haven't tried it, but I'd think the 
next replication (copying the index from s1 to s2) would not necessarily fail, 
but would simply overwrite any changes that were made on s2 while it was serving 
as the master.  Is that not what happens?  If that's what happens, then I think 
what you'd simply have to do is:

1) bring s1 back up, but don't make it a master immediately
2) take away the master role from s2
3) make s1 copy the index from s2, since s2 might have a more up to date index 
now
4) make s1 the master


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Damien Tournoud 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 21, 2009 12:37:10 PM
> Subject: No sanity checks before replicating files?
> 
> Hi list,
> 
> We have deployed an experimental Solr 1.4 cluster (a master/slave
> setup, with automatic promotion of the slave as a master in case of
> failure) on drupal.org, to manage our medium size index (3GB, about
> 400K documents).
> 
> One of the problem we are facing is that there seems to be no sanity
> checks before downloading files. Take the following scenario:
> 
> - initial situation: s1 is master, s2 is slave
> - s1 fails, the virtual IP falls back to s2
> - some updates happen on s2
> - suppose now that s1 gets back online: s2 tries to replicate from
> s1, but after downloading all the files (3GB) the commit fails
> because the local index has been updated locally; the replication
> fails, and the process restarts at the next poll (re-download all the
> index files, fail again...) and so on
> 
> We are considering configuring each server to replicate from the
> virtual IP, which should solve that issue for us, but couldn't the
> slave do some sanity checks before trying to download all the files
> from the master?
> 
> Thanks in advance for any help you could provide,
> 
> Damien Tournoud



clustering SOLR-769

2009-05-21 Thread Allahbaksh Asadullah
Hi,
I built Solr from SVN this morning. I am using the clustering example. I
have added my own schema.xml.

The problem is that even though I change the carrot.snippet field from
features to filecontent, the clustering results do not change a bit.
Please note the features field is also present in my document.

   <str name="carrot.title">name</str>
   <str name="carrot.snippet">features</str>
   <str name="carrot.url">id</str>

Why do I get the same clusters even though I have changed
carrot.snippet? Is there some problem with my understanding?

Regards,
allahbaksh


Re: No sanity checks before replicating files?

2009-05-21 Thread Damien Tournoud
Hi Otis,

Thanks for your answer.

On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
 wrote:
> Interesting, this is similar to my suggestion to another person I just 
> replied to here on solr-user.
> Have you actually run into this problem?  I haven't tried it, but I'd think 
> the first next replication (copying index from s1 to s2) would not 
> necessarily fail, but would simply overwrite any changes that were made on s2 
> while it was serving as the master.  Is that not what happens?

No, it doesn't. For some reason, Solr downloads all the files of the
index, but fails to commit the changes locally. At the next poll, the
process restarts. Not only does this clog the network, but it also
unnecessarily uses resources on the newly promoted slave, until we
change its configuration.

> If that's what happens, then I think what you'd simply have to do is to:
>
> 1) bring s1 back up, but don't make it a master immediately
> 2) take away the master role from s2
> 3) make s1 copy the index from s2, since s2 might have a more up to date 
> index now
> 4) make s1 the master

Once s2 is the master, we want it to stay this way. We will reassign
s1 as the slave at a later stage, when resources allow. What worries
me is that strange behavior of Solr 1.4 replication when the "slave"
index is fresher than the "master" one.

Damien


Re: How to change the weight of the fields ?

2009-05-21 Thread Otis Gospodnetic

Hi,

I'm not sure why the rest of the scoring explanation is not shown, but your 
query *was* expanded to search on the text, title_s, and id fields, so I think 
the expanded/rewritten query is what went to the index.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Vincent Pérès 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 21, 2009 4:34:00 AM
> Subject: Re: How to change the weight of the fields ?
> 
> 
> It seems I can only search on the field 'text'. With the following URL:
> http://localhost:8983/solr/select/?q=novel&qt=dismax&fl=title_s,id&version=2.2&start=0&rows=10&indent=on&debugQuery=on
> 
> I get answers, but in the debug area it seems it's only searching on the
> 'text' field (with or without 'qt', the results are displayed in the same
> order):
> 
> 
> novel
> novel
> 
> +DisjunctionMaxQuery((text:novel^0.5 | title_s:novel^5.0 |
> id:novel^10.0)~0.01) ()
> 
> 
> +(text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01 ()
> 
> 
> 
> 0.014641666 = (MATCH) sum of:
>   0.014641666 = (MATCH) max plus 0.01 times others of:
> 0.014641666 = (MATCH) weight(text:novel^0.5 in 114927), product of:
>   0.01362607 = queryWeight(text:novel^0.5), product of:
> 0.5 = boost
> 3.4734163 = idf(docFreq=10634, numDocs=43213)
> 0.007845918 = queryNorm
>   1.0745333 = (MATCH) fieldWeight(text:novel in 114927), product of:
> 1.4142135 = tf(termFreq(text:novel)=2)
> 3.4734163 = idf(docFreq=10634, numDocs=43213)
> 0.21875 = fieldNorm(field=text, doc=114927)
> 
> etc.
> 
> Shouldn't the debug output below also show the term being searched in
> 'title_s' and 'id'?
> 
> Thanks for your answers !
> Vincent
> -- 
> View this message in context: 
> http://www.nabble.com/How-to-change-the-weight-of-the-fields---tp23619971p23649624.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Phrase Search Issue

2009-05-21 Thread Otis Gospodnetic

Amit,

Append &debugQuery=true to the search request URL and you'll see how your query 
string was interpreted.
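
For example, a request along these lines (the host, port, and mm value are 
purely illustrative):

http://localhost:8983/solr/select?q=Top+of+the+world&qt=dismax&mm=100%25&debugQuery=true

The parsedquery element in the debug output shows exactly which clauses 
dismax built; note that dismax ignores the schema's defaultOperator, and the 
mm (minimum match) parameter is what controls how many of the query terms 
must match.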

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: dabboo 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 21, 2009 3:48:45 AM
> Subject: Re: Phrase Search Issue
> 
> 
> This problem is related to the default operator in dismax. Currently OR is
> the default operator and it is behaving perfectly fine. I have changed the
> default operator in schema.xml to AND, and I also have changed the minimum
> match to 100%.
> 
> But it seems like AND as the default operator doesn't work with dismax.
> Please suggest.
> 
> Thanks,
> Amit Garg
> 
> 
> 
> dabboo wrote:
> > 
> > Hi,
> > 
> > I am facing one issue in phrase query. I am entering 'Top of the world' as
> > my search criteria. I am expecting it to return all the records in which
> > one field has all these words, in any order.
> > 
> > But it is treating the query as OR and returning all the records which
> > have either of these words. I am doing this using the dismax request.
> > 
> > I would appreciate if somebody can provide me some pointers.
> > 
> > Thanks,
> > Amit Garg
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Phrase-Search-Issue-tp23648813p23649189.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: No sanity checks before replicating files?

2009-05-21 Thread Otis Gospodnetic

Aha, I see.  Perhaps you can post the error message/stack trace?

As for the sanity check, I bet a call to 
http://host:port/solr/replication?command=indexversion could be used to ensure 
only newer versions of the index are being pulled.  We'll see what Paul says 
when he wakes up. :)
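
A minimal client-side sketch of such a check, assuming the 1.4 
ReplicationHandler is mounted at /replication (the host name and the crude 
response handling are only for illustration, not Solr's actual slave-side 
code):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class IndexVersionProbe {
    public static void main(String[] args) throws Exception {
        // Ask the master for its index version before pulling any files.
        URL url = new URL("http://master:8983/solr/replication?command=indexversion&wt=json");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        StringBuilder body = new StringBuilder();
        for (String line; (line = in.readLine()) != null;) {
            body.append(line);
        }
        in.close();
        // The response carries "indexversion" and "generation"; a slave could
        // refuse to replicate unless the master's values are strictly newer.
        System.out.println(body);
    }
}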

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Damien Tournoud 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 21, 2009 1:26:30 PM
> Subject: Re: No sanity checks before replicating files?
> 
> Hi Otis,
> 
> Thanks for your answer.
> 
> On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
> wrote:
> > Interesting, this is similar to my suggestion to another person I just 
> > replied 
> to here on solr-user.
> > Have you actually run into this problem?  I haven't tried it, but I'd think 
> the first next replication (copying index from s1 to s2) would not 
> necessarily 
> fail, but would simply overwrite any changes that were made on s2 while it 
> was 
> serving as the master.  Is that not what happens?
> 
> No, it doesn't. For some reason, Solr downloads all the files of the
> index, but fails to commit the changes locally. At the next poll, the
> process restarts. Not only does this clog the network, but it also
> unnecessarily uses resources on the newly promoted slave, until we
> change its configuration.
> 
> > If that's what happens, then I think what you'd simply have to do is to:
> >
> > 1) bring s1 back up, but don't make it a master immediately
> > 2) take away the master role from s2
> > 3) make s1 copy the index from s2, since s2 might have a more up to date 
> > index 
> now
> > 4) make s1 the master
> 
> Once s2 is the master, we want it to stay this way. We will reassign
> s1 as the slave at a later stage, when resources allow. What worries
> me is that strange behavior of Solr 1.4 replication when the "slave"
> index is fresher than the "master" one.
> 
> Damien



Re: Plugin Not Found

2009-05-21 Thread Jeff Newburn
One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it
won't load, since it obviously works fine if inserted directly into the
WEB-INF directory.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


> From: Mark Miller 
> Reply-To: 
> Date: Thu, 21 May 2009 12:19:47 -0400
> To: 
> Subject: Re: Plugin Not Found
> 
> Jeff Newburn wrote:
>> Nothing else is in the lib directory but this one jar.
>> 
>> Additionally, the logs seem to say that it finds the lib as shown below
>> INFO: Solr home set to '/home/zetasolr/'
>> May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
>> createClassLoader
>> INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
>> classloader
>> 
>> However as soon as it tries the component it cannot find the class.
>> 
>>   
> Something must be wacky. I just did a quick custom component with 1.3
> and trunk, and it loaded no problem in both cases.
> 
> Anything odd about your Component? You're sure it extends SearchComponent?
> 
> As Noble mentioned, you will not be able to find other classes/jars in
> the solr.home/lib directory from a class/jar in the solr.home/lib
> directory. But this, oddly, doesn't appear to be the issue you're facing.
> 
> Do share if you have anything else you can add.
> 
> -- 
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 



Regarding Delta-Import Query in DIH

2009-05-21 Thread jayakeerthi s
Hi All,

I understand from the details provided under
http://wiki.apache.org/solr/DataImportHandler regarding Delta-import that
there should be an additional column *last_modified* of timestamp type in
the table.

Is there any other way/method the same can be achieved without creating the
additional column *last_modified* in the tables?? please advise.


Thanks in advance


Re: Plugin Not Found

2009-05-21 Thread Grant Ingersoll
Can you share your full log (at least through startup) as well as the  
config for both the component and the ReqHandler that is using it?


-Grant

On May 21, 2009, at 3:37 PM, Jeff Newburn wrote:

One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it
won't load, since it obviously works fine if inserted directly into the
WEB-INF directory.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



From: Mark Miller 
Reply-To: 
Date: Thu, 21 May 2009 12:19:47 -0400
To: 
Subject: Re: Plugin Not Found

Jeff Newburn wrote:

Nothing else is in the lib directory but this one jar.

Additionally, the logs seem to say that it finds the lib as shown below

INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr classloader

However as soon as it tries the component it cannot find the class.



Something must be wacky. I just did a quick custom component with 1.3
and trunk, and it loaded no problem in both cases.

Anything odd about your Component? You're sure it extends SearchComponent?


As Noble mentioned, you will not be able to find other classes/jars in
the solr.home/lib directory from a class/jar in the solr.home/lib
directory. But this, oddly, doesn't appear to be the issue you're facing.


Do share if you have anything else you can add.

--
- Mark

http://www.lucidimagination.com







--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: clustering SOLR-769

2009-05-21 Thread Stanislaw Osinski
Hi.


> I built Solr from SVN this morning. I am using the clustering example. I
> have added my own schema.xml.
>
> The problem is that even though I change the carrot.snippet field from
> features to filecontent, the clustering results do not change a bit.
> Please note the features field is also present in my document.
>
>   <str name="carrot.title">name</str>
>   <str name="carrot.snippet">features</str>
>   <str name="carrot.url">id</str>
>
> Why do I get the same clusters even though I have changed
> carrot.snippet? Is there some problem with my understanding?


If you go back to the clustering dir in examples and change

<str name="carrot.snippet">features</str>

to

<str name="carrot.snippet">manu</str>

do you see any change in clusters?

Cheers,

Staszek

--
http://carrot2.org


Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread James X
Hi Mike, the documents are web pages: about 20 fields, mostly strings, a
couple of integers, booleans, and one HTML field (for the document body
content).

I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
that would mean I have several active Solr worker threads.

The only exceptions I have are the RuntimeException flush errors, followed
by a handful (normally 10-20) of LockObtainFailedExceptions, which I
presumed were being caused by the faulty threads dying and failing to
release locks.

Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
then:
SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
((CTRL-CHAR, code 8))
 at [row,col {unknown-source}]: [1,26070]
at
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
at
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
at
com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at
org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)

I presumed these were caused by character encoding issues, but haven't
looked into them at all yet.
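
One common workaround, sketched here on the assumption that the bad bytes 
come from the client's own documents (the class and regex are illustrative, 
not Solr code): strip the control characters that XML 1.0 forbids before 
posting.

import java.util.regex.Pattern;

public class XmlCleaner {
    // XML 1.0 allows tab (0x09), LF (0x0A) and CR (0x0D) but no other
    // characters below 0x20; CTRL-CHAR code 8 (backspace) is one of those.
    private static final Pattern ILLEGAL_XML_CHARS =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    public static String clean(String raw) {
        return ILLEGAL_XML_CHARS.matcher(raw).replaceAll("");
    }
}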

Thanks again for your help! I'll make some time this afternoon to build some
patched Lucene jars and get the results.


On Thu, May 21, 2009 at 5:06 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Another question: are there any other exceptions in your logs?  Eg
> problems adding certain documents, or anything?
>
> Mike
>
> On Wed, May 20, 2009 at 11:18 AM, James X
>  wrote:
> > Hi Mike, thanks for the quick response:
> >
> > $ java -version
> > java version "1.6.0_11"
> > Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
> > Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
> >
> > I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
> > hitting that yet!
> >
> > The exception always reports 0 length, but the number of docs varies,
> > heavily weighted towards one or two docs. Of the last 130 or so exceptions:
> > 89 1 docs vs 0 length
> > 20 2 docs vs 0 length
> >  9 3 docs vs 0 length
> >  1 4 docs vs 0 length
> >  3 5 docs vs 0 length
> >  2 6 docs vs 0 length
> >  1 7 docs vs 0 length
> >  1 9 docs vs 0 length
> >  1 10 docs vs 0 length
> >
> > The only unusual thing I can think of that we're doing with Solr is
> > aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot
> a
> > pattern between core admin operations and these exceptions, however...
> >
> > James
> >
> > On Wed, May 20, 2009 at 2:37 AM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> Hmm... somehow Lucene is flushing a new segment on closing the
> >> IndexWriter, and thinks 1 doc had been added to the stored fields
> >> file, yet the fdx file is the wrong size (0 bytes).  This check (&
> >> exception) are designed to prevent corruption from entering the index,
> >> so it's at least good to see CheckIndex passes after this.
> >>
> >> I don't think you're hitting LUCENE-1521: that issue only happens if a
> >> single segment has more than ~268 million docs.
> >>
> >> Which exact JRE version are you using?
> >>
> >> When you hit this exception, is it always "1 docs vs 0 length in bytes"?
> >>
> >> Mike
> >>
> >> On Wed, May 20, 2009 at 3:19 AM, James X
> >>  wrote:
> >> > Hello all, I'm running Solr 1.3 in a multi-core environment. There are
> >> > up to 2000 active cores in each Solr webapp instance at any given time.
> >> >
> >> > I've noticed occasional errors such as:
> >> > SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1
> >> docs
> >> > vs 0 length in bytes of _h.fdx
> >> >at
> >> >
> >>
> org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
> >> >at
> >> >
> >>
> org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
> >> >at
> >> >
> >>
> org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
> >> >at
> >> >
> >>
> org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
> >> >at
> >> >
> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
> >> >at
> >> > org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
> >> >at
> >> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
> >> >at
> >> >
> org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
> >> >at
> >> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
> >> >at
> >> org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
> >> >

Re: Solr statistics of top searches and results returned

2009-05-21 Thread Grant Ingersoll


On May 20, 2009, at 4:33 AM, Shalin Shekhar Mangar wrote:


On Wed, May 20, 2009 at 1:31 PM, Plaatje, Patrick <
patrick.plaa...@getronics.com> wrote:



At the moment Solr does not have such functionality. I have written a
plugin for Solr, though, which uses a second Solr core to store/index the
searches. If you're interested, send me an email and I'll get you the
source for the plugin.


Patrick, this will be a useful addition. However, instead of doing this
with another core, we can keep running statistics which can be shown on
the statistics page itself. What do you think?


I think you will want some type of persistence mechanism, otherwise you
will end up consuming a lot of resources keeping track of all the query
strings, unless I'm missing something. Either a Lucene index (Solr core)
or the option of embedding a DB. Ideally, it would be pluggable such that
people could choose their storage mechanism. Most people do this kind of
thing offline via log analysis, as logs can grow quite large quite
quickly.





A related approach for showing slow queries was discussed recently. There's
an issue open which has more details:

https://issues.apache.org/jira/browse/SOLR-1101

--
Regards,
Shalin Shekhar Mangar.


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: clustering SOLR-769

2009-05-21 Thread Allahbaksh Asadullah
Hi,
I will try this, because when I tried it with a field I declared there was
no change. I will check this out and let you know.
Is it possible to specify more than one snippet field, or should I use
copyField to copy two or three fields into a single field and specify that
as the snippet field? (See the copyField sketch below.)
Regards,
Allahbaksh
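
For the copyField route, a minimal schema.xml sketch (the clusterText field
name is invented; copyField itself is standard Solr schema syntax, and the
destination must be stored for carrot.snippet to read it):

<field name="clusterText" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="features" dest="clusterText"/>
<copyField source="filecontent" dest="clusterText"/>

Then point carrot.snippet at clusterText.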

On Fri, May 22, 2009 at 2:24 AM, Stanislaw Osinski
wrote:

> Hi.
>
>
>> I built Solr from SVN this morning. I am using the clustering example. I
>> have added my own schema.xml.
>>
>> The problem is that even though I change the carrot.snippet field from
>> features to filecontent, the clustering results do not change a bit.
>> Please note the features field is also present in my document.
>>
>>   <str name="carrot.title">name</str>
>>   <str name="carrot.snippet">features</str>
>>   <str name="carrot.url">id</str>
>>
>> Why do I get the same clusters even though I have changed
>> carrot.snippet? Is there some problem with my understanding?
>
>
> If you go back to the clustering dir in examples and change
>
> <str name="carrot.snippet">features</str>
>
> to
>
> <str name="carrot.snippet">manu</str>
>
> do you see any change in clusters?
>
> Cheers,
>
> Staszek
>
> --
> http://carrot2.org
>



-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering & Technology Labs,
Infosys Technolgies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.


getting all rows from SOLRJ client using setRows method

2009-05-21 Thread darniz

Hello 
Is there a way to get all the results back from Solr when querying via the
SolrJ client?

My gut feeling was that this might work:
query.setRows(-1)

One way is to change the configuration XML file, but that is like
hard-coding the configuration, and there I also have to set some valid
number; I can't say 'return all rows'.

Is there a way to do this through the query?

Thanks
rashid


-- 
View this message in context: 
http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: getting all rows from SOLRJ client using setRows method

2009-05-21 Thread Ryan McKinley


Careful what you ask for... what if you have a million docs? Will you get
an OOM?


Maybe a better solution is to run a loop where you grab a bunch of docs and
then increase the "start" value.
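
A minimal SolrJ sketch of that loop (the URL, query, and batch size are
assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class FetchAllDocs {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        int batch = 500;               // a modest page size keeps memory bounded
        query.setRows(batch);
        long numFound = Long.MAX_VALUE;
        for (int start = 0; start < numFound; start += batch) {
            query.setStart(start);
            QueryResponse rsp = server.query(query);
            SolrDocumentList page = rsp.getResults();
            numFound = page.getNumFound(); // total matches, refreshed each page
            // ... process the documents in "page" here ...
        }
    }
}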


But you can always use:
query.setRows( Integer.MAX_VALUE )

ryan


On May 21, 2009, at 8:37 PM, darniz wrote:



Hello
Is there a way to get all the results back from Solr when querying via the
SolrJ client?

my gut feeling was that this might work
query.setRows(-1)

One way is to change the configuration XML file, but that is like
hard-coding the configuration, and there I also have to set some valid
number; I can't say 'return all rows'.

Is there a way to do this through the query?

Thanks
rashid


--
View this message in context: 
http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: what does the version parameter in the query mean?

2009-05-21 Thread Anshuman Manur
Ah, I see! Thank you so much for the response!

I'm using SolrJ, so I probably don't need to set the XML version, since the
wiki tells me that it uses binary as the default!

On Thu, May 21, 2009 at 10:00 PM, Jay Hill  wrote:

> I was interested in this recently and also couldn't find anything on the
> wiki. I found this in the list archive:
>
> The version parameter determines the XML protocol used in the response.
> Clients are strongly encouraged to ''always'' specify the protocol version,
> so as to ensure that the format of the response they receive does not
> change
> unexpectedly if/when the Solr server is upgraded.
>
> Here is a link to the archive:
> http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg00518.html
>
> -Jay
>
>
> On Thu, May 21, 2009 at 1:06 AM, Anshuman Manur <
> anshuman_ma...@stragure.com
> > wrote:
>
> > Hello all,
> >
> > I'm using Solr 1.3.0, and when I query my index for "solr" using the
> admin
> > page, the query string in the address bar of my browser reads like this:
> >
> >
> >
> http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on
> >
> > Now, I don't know what version=2.2 means, and the wiki or the docs don't
> > tell me. Could someone enlighten me?
> >
> > Thank You
> > Anshuman Manur
> >
>


lock problem

2009-05-21 Thread Ashish P

Hi, 
The scenario is: I have 2 different Solr instances running at different
locations concurrently. The data location for both instances is the same:
\\hostname\FileServer\CoreTeam\Research\data.
Both instances use EmbeddedSolrServer, and the lockType at both instances is
'single'.

I am getting the following exception:
Cannot overwrite: \\hostname\FileServer\CoreTeam\Research\data\index\_1.fdt
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:440)
at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:64)
at org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:73)

I also tried the 'simple' lockType, but it shows a timeout exception when
writing to the index.
Please help me out.
Thanks,
Ashish


-- 
View this message in context: 
http://www.nabble.com/lock-problem-tp23663558p23663558.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Regarding Delta-Import Query in DIH

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
The last_modified column is just one way. The query has to be
intelligent enough to detect the delta; it doesn't matter how you do
it.
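
For example, one sketch that avoids touching the main table (the table and
column names here are invented): keep a separate change-log table, populated
by a trigger or by the application, and key the deltaQuery off it.
dataimporter.last_index_time is the standard DIH variable; only the schema
trick around it is an assumption.

<entity name="item" pk="id"
        query="select * from item"
        deltaQuery="select item_id as id from item_changes
                    where changed_at > '${dataimporter.last_index_time}'"/>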

On Fri, May 22, 2009 at 1:32 AM, jayakeerthi s  wrote:
> Hi All,
>
> I understand from the details provided under
> http://wiki.apache.org/solr/DataImportHandler regarding Delta-import that
> there should be an additional column *last_modified* of timestamp type in
> the table.
>
> Is there any other way/method the same can be achieved without creating the
> additional column *last_modified* in the tables?? please advise.
>
>
> Thanks in advance
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: No sanity checks before replicating files?

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Let us see what the desired behavior is.

When s1 comes back up online, s2 must download a fresh copy of the index
from s1, because s2 is the slave, even though s2 has a newer version of the
index than s1.

Are you suggesting that s2 downloads the index files and then the commit
fails? The code is written as follows:

boolean freshDownloadNeeded = myIndexGeneration >= mastersIndexGeneration;

If that is what is happening, then it is a problem.

Can you post the stack trace?

On Thu, May 21, 2009 at 11:45 PM, Otis Gospodnetic
 wrote:
>
> Aha, I see.  Perhaps you can post the error message/stack trace?
>
> As for the sanity check, I bet a call to 
> http://host:port/solr/replication?command=indexversion could be used to ensure 
> only newer versions of the index are being pulled.  We'll see what Paul says 
> when he wakes up. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Damien Tournoud 
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, May 21, 2009 1:26:30 PM
>> Subject: Re: No sanity checks before replicating files?
>>
>> Hi Otis,
>>
>> Thanks for your answer.
>>
>> On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
>> wrote:
>> > Interesting, this is similar to my suggestion to another person I just 
>> > replied
>> to here on solr-user.
>> > Have you actually run into this problem?  I haven't tried it, but I'd think
>> the first next replication (copying index from s1 to s2) would not 
>> necessarily
>> fail, but would simply overwrite any changes that were made on s2 while it 
>> was
>> serving as the master.  Is that not what happens?
>>
>> No, it doesn't. For some reason, Solr downloads all the files of the
>> index, but fails to commit the changes locally. At the next poll, the
>> process restarts. Not only does this clog the network, but it also
>> unnecessarily uses resources on the newly promoted slave, until we
>> change its configuration.
>>
>> > If that's what happens, then I think what you'd simply have to do is to:
>> >
>> > 1) bring s1 back up, but don't make it a master immediately
>> > 2) take away the master role from s2
>> > 3) make s1 copy the index from s2, since s2 might have a more up to date 
>> > index
>> now
>> > 4) make s1 the master
>>
>> Once s2 is the master, we want it to stay this way. We will reassign
>> s1 as the slave at a later stage, when resources allow. What worries
>> me is that strange behavior of Solr 1.4 replication when the "slave"
>> index is fresher than the "master" one.
>>
>> Damien
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Check the status page of DIH and see if it is working properly, and
if yes, what is the rate of indexing?

On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai  wrote:
>
> Hi,
>
> I have about 45GB of XML files to be indexed. I am using DataImportHandler. I 
> started the full import 4 hours ago, and it's still running.
> My computer has 4GB of memory. Any suggestions on solutions?
> Thanks!
>
> JB
>
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-21 Thread Jianbin Dai

Hi Paul,

Thank you so much for answering my questions. It really helped.
After some adjustment, basically setting mergeFactor to 1000 from the default 
value of 10, I finished the whole job in 2.5 hours. I checked that while it 
was running only around 18% of memory was being used, and VIRT was always 
1418m. I am thinking it may be restricted by the JVM memory setting. But I run 
the data import command through the web, i.e.,
http://:/solr/dataimport?command=full-import, so how can I set the 
memory allocation for the JVM?
Thanks again!

JB
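
The heap is set where the servlet container is launched, not on the
dataimport request itself. With the example Jetty setup that ships with
Solr, something like this (the sizes here are arbitrary) raises the limit:

java -Xms512m -Xmx2048m -jar start.jar

Other containers (Tomcat, etc.) take the same -Xmx flag through their own
startup scripts, e.g. via JAVA_OPTS.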

--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 9:57 PM
> check the status page of DIH and see
> if it is working properly. and
> if, yes what is the rate of indexing
> 
> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai 
> wrote:
> >
> > Hi,
> >
> > I have about 45GB xml files to be indexed. I am using
> DataImportHandler. I started the full import 4 hours ago,
> and it's still running
> > My computer has 4GB memory. Any suggestion on the
> solutions?
> > Thanks!
> >
> > JB
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
What is the total no. of docs created? I guess it may not be memory
bound; indexing is mostly an IO-bound operation. You may be able to
get better perf if an SSD (solid state disk) is used.

On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai  wrote:
>
> Hi Paul,
>
> Thank you so much for answering my questions. It really helped.
> After some adjustment, basically setting mergeFactor to 1000 from the default 
> value of 10, I finished the whole job in 2.5 hours. I checked that while it 
> was running only around 18% of memory was being used, and VIRT was always 
> 1418m. I am thinking it may be restricted by the JVM memory setting. But I run 
> the data import command through the web, i.e.,
> http://:/solr/dataimport?command=full-import, so how can I set the 
> memory allocation for the JVM?
> Thanks again!
>
> JB
>
> --- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ्  
> wrote:
>
>> From: Noble Paul നോബിള്‍  नोब्ळ् 
>> Subject: Re: How to index large set data
>> To: solr-user@lucene.apache.org
>> Date: Thursday, May 21, 2009, 9:57 PM
>> check the status page of DIH and see
>> if it is working properly. and
>> if, yes what is the rate of indexing
>>
>> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai 
>> wrote:
>> >
>> > Hi,
>> >
>> > I have about 45GB xml files to be indexed. I am using
>> DataImportHandler. I started the full import 4 hours ago,
>> and it's still running
>> > My computer has 4GB memory. Any suggestion on the
>> solutions?
>> > Thanks!
>> >
>> > JB
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr statistics of top searches and results returned

2009-05-21 Thread Shalin Shekhar Mangar
On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote:

>
> I think you will want some type of persistence mechanism otherwise you will
> end up consuming a lot of resources keeping track of all the query strings,
> unless I'm missing something.  Either a Lucene index (Solr core) or the
> option of embedding a DB.  Ideally, it would be pluggable such that people
> could choose their storage mechanism.  Most people do this kind of thing
> offline via log analysis as logs can grow quite large quite quickly.
>

For a general case, yes. But I was thinking more of top 'n' queries as a
running statistic.
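
A toy sketch of such a running counter (this is not Solr code; the class and
names are invented for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class QueryCounter {
    private final Map<String, AtomicLong> counts =
            new ConcurrentHashMap<String, AtomicLong>();

    // Bump the count for one query string, creating the counter on first use.
    public void record(String q) {
        AtomicLong c = counts.get(q);
        if (c == null) {
            AtomicLong fresh = new AtomicLong();
            AtomicLong prev = counts.putIfAbsent(q, fresh);
            c = (prev == null) ? fresh : prev;
        }
        c.incrementAndGet();
    }
}

Left unbounded, the map grows with every distinct query string, which is
exactly the resource concern raised above; a real version would prune or
persist periodically.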

-- 
Regards,
Shalin Shekhar Mangar.