Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread KK
I want to know the maximum number of cores supported by Solr. 1000s, or maybe
millions, all under one Solr instance?
Also I want to know how to redirect a particular query to a particular core.
Actually I'm querying Solr from Ajax, so I think there must be some request
parameter that says which core we want to query, right? Can someone tell me
how to do this? Any good pointers on the same will be helpful as well.
Thank you.

--kk


Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread Shishir Jain
http://wiki.apache.org/solr/CoreAdmin
Best regards,
Shishir

On Thu, May 14, 2009 at 1:58 PM, KK  wrote:

> I want to know the maximum no of cores supported by Solr. 1000s or may be
> millions all under one solr instance ?
> Also I want to know how to redirect a particular query to a particular
> core.
> Actually I'm querying solr from Ajax, so I think there must be some request
> parameter that says which core we want to query, right? Can some one tell
> me
> how to do this, any good pointers on the same will be helpful as well.
> Thank you.
>
> --kk
>


Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
there is no hard limit on the no:of cores. it is limited by your
system's ability to open files and the resources.
the queries are automatically sent to the appropriate core if your url is

http://host:port/<corename>/select
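
for example, a minimal SolrJ sketch of querying one specific core (the core
name "core0", host and port here are placeholders, not anything special):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CoreQuery {
  public static void main(String[] args) throws Exception {
    // each core has its own base URL, so the URL alone routes the query
    SolrServer core0 = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
    QueryResponse rsp = core0.query(new SolrQuery("*:*"));
    System.out.println("hits in core0: " + rsp.getResults().getNumFound());
  }
}

from Ajax it is the same idea: point the request at that core's /select URL;
no extra parameter is needed.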

On Thu, May 14, 2009 at 1:58 PM, KK  wrote:
> I want to know the maximum no of cores supported by Solr. 1000s or may be
> millions all under one solr instance ?
> Also I want to know how to redirect a particular query to a particular core.
> Actually I'm querying solr from Ajax, so I think there must be some request
> parameter that says which core we want to query, right? Can some one tell me
> how to do this, any good pointers on the same will be helpful as well.
> Thank you.
>
> --kk
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Delete documents from index with dataimport

2009-05-14 Thread Andrew McCombe
Hi

Yes I'd like the document deleted from Solr and yes, there is a unique
document id field in Solr.

Regards
Andrew

Andrew

2009/5/13 Fergus McMenemie :
>>Hi
>>
>>Is it possible, through dataimport handler to remove an existing
>>document from the Solr index?
>>
>>I import/update from my database where the active field is true.
>>However, if the client then set's active to false, the document stays
>>in the Solr index and doesn't get removed.
>>
>>Regards
>>Andrew
>
> Yes but only in the latest trunk. If your "active" field is false
> do you want to see the document deleted? Do you have another field
> which is a unique ID for the document?
>
> Fergus
> --
>
> ===
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===
>


UK Solr users meeting?

2009-05-14 Thread Colin Hammond
I was wondering if there is an interest in a UK (South East) solr user 
group meeting


Please let me know if you are interested.  I am happy to organize.

Regards,

Colin


Re: UK Solr users meeting?

2009-05-14 Thread Fergus McMenemie
>I was wondering if there is an interest in a UK (South East) solr user 
>group meeting
>
>Please let me know if you are interested.  I am happy to organize.
>
>Regards,
>
>Colin

Yes Very interested. I am in lincolnshire.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Delete documents from index with dataimport

2009-05-14 Thread Fergus McMenemie
>Hi
>
>Yes I'd like the document deleted from Solr and yes, there is a unique
>document id field in Solr.
>

In that case try the following. Create a field in the entity:-

 <field column="$deleteDocById" regex="false" replaceWith="${jc.id}" sourceColName="active"/>

When a row's "active" column holds the literal string "false", the transformer
rewrites it to that row's id, and DIH deletes the matching document.

Notes.
1) the entity is assumed to have name="jc".
2) the uniqueKey field is assumed to be called "id".
3) the entity needs to have transformer="RegexTransformer"


>
>2009/5/13 Fergus McMenemie :
>>>Hi
>>>
>>>Is it possible, through dataimport handler to remove an existing
>>>document from the Solr index?
>>>
>>>I import/update from my database where the active field is true.
>>>However, if the client then set's active to false, the document stays
>>>in the Solr index and doesn't get removed.
>>>
>>>Regards
>>>Andrew
>>
>> Yes but only in the latest trunk. If your "active" field is false
>> do you want to see the document deleted? Do you have another field
>> which is a unique ID for the document?
>>
>> Fergus

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread KK
Thank you very much. Got the point.
One off-the-track question: can we automate the creation of new cores? [It
requires manually editing the solr.xml file as far as I know, and what about the
location of the core index directory, do we need to point to that manually as
well.]
After going through the wiki what I found is we have to mention the names of
cores in solr.xml. I want to automate the process in such a way that when a
user registers [on, say, my site for the service], we'll create a corresponding
core for the same user with a specific core id [unique for this user
only] so that the user will be given a search interface that will redirect
all searches for this user to http://host:port/<coreid-for-this-user>/select
Will appreciate any ideas on this.

Thanks,
KK.

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 

> there is no hard limit on the no:of cores. it is limited by your
> system's ability to open files and the resources.
> the queries are automatically sent to appropriate core if your url is
>
> http://host:port/<corename>/select
>
> On Thu, May 14, 2009 at 1:58 PM, KK  wrote:
> > I want to know the maximum no of cores supported by Solr. 1000s or may be
> > millions all under one solr instance ?
> > Also I want to know how to redirect a particular query to a particular
> core.
> > Actually I'm querying solr from Ajax, so I think there must be some
> request
> > parameter that says which core we want to query, right? Can some one tell
> me
> > how to do this, any good pointers on the same will be helpful as well.
> > Thank you.
> >
> > --kk
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


RE: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread Gargate, Siddharth
Hi all,
	I am also facing the same issue where autocommit blocks all
other requests. I have around 100,000 documents with an average size of
100K each. It took more than 20 hours to index.
I have currently set autocommit maxtime to 7 seconds and mergeFactor to 25.
Do I need more configuration changes?
Also I see that memory usage goes to the peak level of the heap specified (6 GB
in my case). Looks like Solr spends most of the time in GC.
According to my understanding, the fix for SOLR-1155 would be that commit
will run in the background and new documents will be queued in memory.
But I am afraid of the memory consumption by this queue if commit takes
much longer to complete.

Thanks,
Siddharth

-Original Message-
From: jayson.minard [mailto:jayson.min...@gmail.com] 
Sent: Saturday, May 09, 2009 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Autocommit blocking adds? AutoCommit Speedup?


First cut of updated handler now in:
https://issues.apache.org/jira/browse/SOLR-1155

Needs review from those that know Lucene better, and double check for
errors
in locking or other areas of the code.  Thanks.

--j


jayson.minard wrote:
> 
> Can we move this to patch files within the JIRA issue please.  Will make
> it easier to review and help out as a patch to current trunk.
> 
> --j
> 
> 
> Jim Murphy wrote:
>> 
>> 
>> 
>> Yonik Seeley-2 wrote:
>>> 
>>> ...your code snippit elided and edited below ...
>>> 
>> 
>> 
>> 
>> Don't take this code as correct (or even compiling) but is this the
>> essence?  I moved shared access to the writer inside the read lock and
>> kept the other non-commit bits to the write lock.  I'd need to rethink
>> the locking in a more fundamental way but is this close to the idea?
>> 
>> 
>> 
>>  public void commit(CommitUpdateCommand cmd) throws IOException {
>> 
>> if (cmd.optimize) {
>>   optimizeCommands.incrementAndGet();
>> } else {
>>   commitCommands.incrementAndGet();
>> }
>> 
>> Future[] waitSearcher = null;
>> if (cmd.waitSearcher) {
>>   waitSearcher = new Future[1];
>> }
>> 
>> boolean error=true;
>> iwCommit.lock();
>> try {
>>   log.info("start "+cmd);
>> 
>>   if (cmd.optimize) {
>> closeSearcher();
>> openWriter();
>> writer.optimize(cmd.maxOptimizeSegments);
>>   }
>> } finally {
>>   iwCommit.unlock();
>>  }
>> 
>> 
>>   iwAccess.lock(); 
>>   try
>>  {
>>   writer.commit();
>>  }
>>  finally
>>  {
>>   iwAccess.unlock(); 
>>  }
>> 
>>   iwCommit.lock(); 
>>   try
>>  {
>>   callPostCommitCallbacks();
>>   if (cmd.optimize) {
>> callPostOptimizeCallbacks();
>>   }
>>   // open a new searcher in the sync block to avoid opening it
>>   // after a deleteByQuery changed the index, or in between deletes
>>   // and adds of another commit being done.
>>   core.getSearcher(true,false,waitSearcher);
>> 
>>   // reset commit tracking
>>   tracker.didCommit();
>> 
>>   log.info("end_commit_flush");
>> 
>>   error=false;
>> }
>> finally {
>>   iwCommit.unlock();
>>   addCommands.set(0);
>>   deleteByIdCommands.set(0);
>>   deleteByQueryCommands.set(0);
>>   numErrors.set(error ? 1 : 0);
>> }
>> 
>> // if we are supposed to wait for the searcher to be registered, then
>> // we should do it outside of the synchronized block so that other update
>> // operations can proceed.
>> if (waitSearcher!=null && waitSearcher[0] != null) {
>>try {
>> waitSearcher[0].get();
>>   } catch (InterruptedException e) {
>> SolrException.log(log,e);
>>   } catch (ExecutionException e) {
>> SolrException.log(log,e);
>>   }
>> }
>>   }
>> 
>> 
>> 
>> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp2
3435224p23457422.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: master/slave failure scenario

2009-05-14 Thread nk 11
Ok so the VIP will point to the new master. But what makes a slave promoted
to a master? Only the fact that it will receive add/update requests?
And I suppose that this "hot" promotion is possible only if the slave is
configured as master also...

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 

> ideally , we don't do that.
> you can just keep the master host behind a VIP so if you wish to
> change the master make the VIP point to the new host
>
> On Wed, May 13, 2009 at 10:52 PM, nk 11  wrote:
> > This is more interesting.Such a procedure would involve taking down and
> > reconfiguring the slave?
> >
> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot  >wrote:
> >
> >> Or ...
> >>
> >> 1. Promote existing slave to new master
> >> 2. Add new slave to cluster
> >>
> >>
> >>
> >>
> >> -Bryan
> >>
> >>
> >>
> >>
> >>
> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
> >>
> >>  - Migrate configuration files from old master (or backup) to new
> master.
> >>> - Replicate from a slave to the new master.
> >>> - Resume indexing to new master.
> >>>
> >>> -Jay
> >>>
> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11  wrote:
> >>>
> >>>  Nice.
>  What if the master fails permanently (like a disk crash...) and the
> new
>  master is a clean machine?
>  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
> 
>   On Wed, May 13, 2009 at 12:10 PM, nk 11 
> wrote:
> >
> >> Hello
> >>
> >> I'm kind of new to Solr and I've read about replication, and the
> fact
> >>
> > that a
> >
> >> node can act as both master and slave.
> >> I a replica fails and then comes back on line I suppose that it will
> >>
> > resyncs
> >
> >> with the master.
> >>
> > right
> >
> >>
> >> But what happnes if the master fails? A slave that is configured as
> >>
> > master
> >
> >> will kick in? What if that slave is not yes fully sync'ed with the
> >>
> > failed
> 
> > master and has old data?
> >>
> > if the master fails you can't index the data. but the slaves will
> > continue serving the requests with the last index. You an bring back
> > the master up and resume indexing.
> >
> >
> >> What happens when the original master comes back on line? He will
> >>
> > remain
> 
> > a
> >
> >> slave because there is another node with the master role?
> >>
> >> Thank you!
> >>
> >>
> >
> >
> > --
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >
> >
> 
> >>
> >
>
>
>
> --
>  -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: Solr vs Sphinx

2009-05-14 Thread Michael McCandless
On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll  wrote:
> I've contacted
> others in the past who have done "comparisons" and after one round of
> emailing it was almost always clear that they didn't know what best
> practices are for any given product and thus were doing things
> sub-optimally.

While I agree, one should properly match & tune all apps they are
testing (for a fair comparison), we in turn must set out-of-the-box
defaults (in Lucene and Solr) that get you as close to the "best
practices" as possible.

We don't always do that, and I think we should do better.

My most recent example of this is BooleanQuery's performance.  It
turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
performance gain (27% on my most recent test) for OR queries.

So why haven't we enabled this by default, already?  (As far as I can
tell it's functionally equivalent, as long as the Collector can accept
out-of-order docs, which our core collectors can).

We can't expect the "other camp" to discover that this obscure setting
must be set, to maximize Lucene's OR query performance.

Mike


Re: Solr vs Sphinx

2009-05-14 Thread Andrey Klochkov
>
>
> My most recent example of this is BooleanQuery's performance.  It
> turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
> performance gain (27% on my most recent test) for OR queries.
>

Mike,

Can you please point me to some information concerning allowDocsOutOfOrder?
What is it, exactly?


-- 
Andrew Klochkov


Query syntax

2009-05-14 Thread Radha C.
Hello List,

 

I need to search for multiple values in the same field. I have the
following syntax options.

I am thinking of the first option. Can anyone tell me which one is the correct
syntax?

 

 Q=+title:=test +site_id:="22 3000676 566644"

 Q=+title:=test +site_id:=22 3000676 566644

 Q=+title:=test +site_id:=22 +site_id=:3000676

 

Thanks,

Radha.C

 



Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread Jack Godwin
20+ hours? I index 3 million records in 3 hours.  Is your autocommit
causing a snapshot?  What do you have listed in the events?

Jack

On 5/14/09, Gargate, Siddharth  wrote:
> Hi all,
>   I am also facing the same issue where autocommit blocks all
> other requests. I having around 1,00,000 documents with average size of
> 100K each. It took more than 20 hours to index.
> I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
> Do I need more configuration changes?
> Also I see that memory usage goes to peak level of heap specified(6 GB
> in my case). Looks like Solr spends most of the time in GC.
> According to my understanding, fix for Solr-1155 would be that commit
> will run in background and new documents will be queued in the memory.
> But I am afraid of the memory consumption by this queue if commit takes
> much longer to complete.
>
> Thanks,
> Siddharth
>
> -Original Message-
> From: jayson.minard [mailto:jayson.min...@gmail.com]
> Sent: Saturday, May 09, 2009 10:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Autocommit blocking adds? AutoCommit Speedup?
>
>
> First cut of updated handler now in:
> https://issues.apache.org/jira/browse/SOLR-1155
>
> Needs review from those that know Lucene better, and double check for
> errors
> in locking or other areas of the code.  Thanks.
>
> --j
>
>
> jayson.minard wrote:
>>
>> Can we move this to patch files within the JIRA issue please.  Will make
>> it easier to review and help out as a patch to current trunk.
>>
>> --j
>>
>>
>> Jim Murphy wrote:
>>>
>>>
>>>
>>> Yonik Seeley-2 wrote:

 ...your code snippit elided and edited below ...

>>>
>>>
>>>
>>> Don't take this code as correct (or even compiling) but is this the
>>> essence?  I moved shared access to the writer inside the read lock and
>>> kept the other non-commit bits to the write lock.  I'd need to rethink
>>> the locking in a more fundamental way but is this close to the idea?
>>>
>>>
>>>
>>>  public void commit(CommitUpdateCommand cmd) throws IOException {
>>>
>>> if (cmd.optimize) {
>>>   optimizeCommands.incrementAndGet();
>>> } else {
>>>   commitCommands.incrementAndGet();
>>> }
>>>
>>> Future[] waitSearcher = null;
>>> if (cmd.waitSearcher) {
>>>   waitSearcher = new Future[1];
>>> }
>>>
>>> boolean error=true;
>>> iwCommit.lock();
>>> try {
>>>   log.info("start "+cmd);
>>>
>>>   if (cmd.optimize) {
>>> closeSearcher();
>>> openWriter();
>>> writer.optimize(cmd.maxOptimizeSegments);
>>>   }
>>> } finally {
>>>   iwCommit.unlock();
>>>  }
>>>
>>>
>>>   iwAccess.lock();
>>>   try
>>>  {
>>>   writer.commit();
>>>  }
>>>  finally
>>>  {
>>>   iwAccess.unlock();
>>>  }
>>>
>>>   iwCommit.lock();
>>>   try
>>>  {
>>>   callPostCommitCallbacks();
>>>   if (cmd.optimize) {
>>> callPostOptimizeCallbacks();
>>>   }
>>>   // open a new searcher in the sync block to avoid opening it
>>>   // after a deleteByQuery changed the index, or in between deletes
>>>   // and adds of another commit being done.
>>>   core.getSearcher(true,false,waitSearcher);
>>>
>>>   // reset commit tracking
>>>   tracker.didCommit();
>>>
>>>   log.info("end_commit_flush");
>>>
>>>   error=false;
>>> }
>>> finally {
>>>   iwCommit.unlock();
>>>   addCommands.set(0);
>>>   deleteByIdCommands.set(0);
>>>   deleteByQueryCommands.set(0);
>>>   numErrors.set(error ? 1 : 0);
>>> }
>>>
>>> // if we are supposed to wait for the searcher to be registered, then
>>> // we should do it outside of the synchronized block so that other update
>>> // operations can proceed.
>>> if (waitSearcher!=null && waitSearcher[0] != null) {
>>>try {
>>> waitSearcher[0].get();
>>>   } catch (InterruptedException e) {
>>> SolrException.log(log,e);
>>>   } catch (ExecutionException e) {
>>> SolrException.log(log,e);
>>>   }
>>> }
>>>   }
>>>
>>>
>>>
>>>
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp2
> 3435224p23457422.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

-- 
Sent from my mobile device


Date field

2009-05-14 Thread Jack Godwin
Does anyone know if there is still a bug in date fields?  I'm having a
problem boosting documents by date in solr 1.3

Thanks,
Jack

-- 
Sent from my mobile device


Re: Query syntax

2009-05-14 Thread Shalin Shekhar Mangar
On Thu, May 14, 2009 at 5:20 PM, Radha C.  wrote:

> I need to search the multiple values from the same field. I am having the
> following syntax
>
> I am thinking of the first option. Can anyone tell me which one is correct
> syntax?
>
>  Q=+title:=test +site_id:="22 3000676 566644"
>
>  Q=+title:=test +site_id:=22 3000676 566644
>
>  Q=+title:=test +site_id:=22 +site_id=:3000676
>
>
None of the above. That ":=" is not a valid syntax. The request parameter
should be a lower cased "q". The "+" character signifies "must occur"
similar to a boolean AND.

Must title:test match? Should all of "22", "3000676" etc. be
present in site_id, or is just one match alright?
-- 
Regards,
Shalin Shekhar Mangar.


RE: Query syntax

2009-05-14 Thread Radha C.
Thanks for your reply. 

 

Yes, by mistake I added ":=" in place of ":". The title should match, and the
site_id should match any of these: 23243455, 245, 3457676.

 

 

 

  _  

From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Thursday, May 14, 2009 5:43 PM
To: solr-user@lucene.apache.org; cra...@ceiindia.com
Subject: Re: Query syntax

 

On Thu, May 14, 2009 at 5:20 PM, Radha C.  wrote:

I need to search the multiple values from the same field. I am having the
following syntax

I am thinking of the first option. Can anyone tell me which one is correct
syntax?

 Q=+title:=test +site_id:="22 3000676 566644"

 Q=+title:=test +site_id:=22 3000676 566644

 Q=+title:=test +site_id:=22 +site_id=:3000676




None of the above. That ":=" is not a valid syntax. The request parameter
should be a lower cased "q". The "+" character signifies "must occur"
similar to a boolean AND.

Should title:test must match? Should all of "22", "3000676" etc be
present in site_id or just one match is alright?
-- 
Regards,
Shalin Shekhar Mangar.



Re: Query syntax

2009-05-14 Thread Shalin Shekhar Mangar
In that case, the following will work:

q=+title:test +site_id:(23243455 245 3457676)
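
One caveat: in a raw URL a literal "+" decodes to a space, so escape it as
%2B there, e.g. q=%2Btitle:test+%2Bsite_id:(23243455+245+3457676).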

On Thu, May 14, 2009 at 5:35 PM, Radha C.  wrote:

> Thanks for your reply.
>
>
>
> Yes by mistaken I added := in place of ":" . The title should match and the
> site_id should match any of these 23243455 , 245, 3457676 .
>
>
>
>
>
>
>
>  _
>
> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
> Sent: Thursday, May 14, 2009 5:43 PM
> To: solr-user@lucene.apache.org; cra...@ceiindia.com
> Subject: Re: Query syntax
>
>
>
> On Thu, May 14, 2009 at 5:20 PM, Radha C.  wrote:
>
> I need to search the multiple values from the same field. I am having the
> following syntax
>
> I am thinking of the first option. Can anyone tell me which one is correct
> syntax?
>
>  Q=+title:=test +site_id:="22 3000676 566644"
>
>  Q=+title:=test +site_id:=22 3000676 566644
>
>  Q=+title:=test +site_id:=22 +site_id=:3000676
>
>
>
>
> None of the above. That ":=" is not a valid syntax. The request parameter
> should be a lower cased "q". The "+" character signifies "must occur"
> similar to a boolean AND.
>
> Should title:test must match? Should all of "22", "3000676" etc be
> present in site_id or just one match is alright?
> --
> Regards,
> Shalin Shekhar Mangar.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: master/slave failure scenario

2009-05-14 Thread nk 11
Oh, so the configuration must be manually changed?
Can't something be passed at (re)start time?

>
>   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> On Thu, May 14, 2009 at 4:07 PM, nk 11  wrote:
>> > Ok so the VIP will point to the new master. but what makes a slave
>> promoted
>> > to a master? Only the fact that it will receive add/update requests?
>> > And I suppose that this "hot" promotion is possible only if the slave is
>> > convigured as master also...
>> right.. By default you can setup all slaves to be master also. It does
>> not cost anything if it is not serving any requests.
>>
>> so , if you have such a setting you will have to disable that slave to
>> be a slave and restart it and you will have to make the VIP point to
>> this new slave as master.
>>
>> so hot promotion is still not possible.
>>  >
>> > 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>> >>
>> >> ideally , we don't do that.
>> >> you can just keep the master host behind a VIP so if you wish to
>> >> change the master make the VIP point to the new host
>> >>
>> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 
>> wrote:
>> >> > This is more interesting.Such a procedure would involve taking down
>> and
>> >> > reconfiguring the slave?
>> >> >
>> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
>> >> > wrote:
>> >> >
>> >> >> Or ...
>> >> >>
>> >> >> 1. Promote existing slave to new master
>> >> >> 2. Add new slave to cluster
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> -Bryan
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
>> >> >>
>> >> >>  - Migrate configuration files from old master (or backup) to new
>> >> >> master.
>> >> >>> - Replicate from a slave to the new master.
>> >> >>> - Resume indexing to new master.
>> >> >>>
>> >> >>> -Jay
>> >> >>>
>> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 
>> wrote:
>> >> >>>
>> >> >>>  Nice.
>> >>  What if the master fails permanently (like a disk crash...) and
>> the
>> >>  new
>> >>  master is a clean machine?
>> >>  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
>> >> 
>> >>   On Wed, May 13, 2009 at 12:10 PM, nk 11 
>> >>  wrote:
>> >> >
>> >> >> Hello
>> >> >>
>> >> >> I'm kind of new to Solr and I've read about replication, and the
>> >> >> fact
>> >> >>
>> >> > that a
>> >> >
>> >> >> node can act as both master and slave.
>> >> >> I a replica fails and then comes back on line I suppose that it
>> >> >> will
>> >> >>
>> >> > resyncs
>> >> >
>> >> >> with the master.
>> >> >>
>> >> > right
>> >> >
>> >> >>
>> >> >> But what happnes if the master fails? A slave that is configured
>> as
>> >> >>
>> >> > master
>> >> >
>> >> >> will kick in? What if that slave is not yes fully sync'ed with
>> the
>> >> >>
>> >> > failed
>> >> 
>> >> > master and has old data?
>> >> >>
>> >> > if the master fails you can't index the data. but the slaves will
>> >> > continue serving the requests with the last index. You an bring
>> back
>> >> > the master up and resume indexing.
>> >> >
>> >> >
>> >> >> What happens when the original master comes back on line? He
>> will
>> >> >>
>> >> > remain
>> >> 
>> >> > a
>> >> >
>> >> >> slave because there is another node with the master role?
>> >> >>
>> >> >> Thank you!
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > -
>> >> > Noble Paul | Principal Engineer| AOL | http://aol.com
>> >> >
>> >> >
>> >> 
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -
>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >
>> >
>>
>>
>>
>> --
>>  -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>
>


Re: Solr vs Sphinx

2009-05-14 Thread Grant Ingersoll
Totally agree on optimizing the out-of-the-box experience, it's just never
a one-size-fits-all thing.  And we have to be very careful about
micro-benchmarks driving these settings.  Currently, many of us use
Wikipedia, but that's just one doc set and I'd venture to say most  
Solr users do not have docs that look anything like Wikipedia.  One of  
the things the Open Relevance project (http://wiki.apache.org/lucene-java/OpenRelevance 
, see the discussion on gene...@lucene.a.o) should aim to do is bring  
in a variety of test collections, from lots of different genres.  This  
will help both with relevance and with speed testing.


-Grant

On May 14, 2009, at 6:47 AM, Michael McCandless wrote:

On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll  
 wrote:

I've contacted
others in the past who have done "comparisons" and after one round of
emailing it was almost always clear that they didn't know what best
practices are for any given product and thus were doing things
sub-optimally.


While I agree, one should properly match & tune all apps they are
testing (for a fair comparison), we in turn must set out-of-the-box
defaults (in Lucene and Solr) that get you as close to the "best
practices" as possible.

We don't always do that, and I think we should do better.

My most recent example of this is BooleanQuery's performance.  It
turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
performance gain (27% on my most recent test) for OR queries.

So why haven't we enabled this by default, already?  (As far as I can
tell it's functionally equivalent, as long as the Collector can accept
out-of-order docs, which our core collectors can).

We can't expect the "other camp" to discover that this obscure setting
must be set, to maximize Lucene's OR query performance.

Mike


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: master/slave failure scenario

2009-05-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
yeah there is a hack
https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316

On Thu, May 14, 2009 at 6:07 PM, nk 11  wrote:
> sorry for the mail. I wanted to hit reply :(
>
> On Thu, May 14, 2009 at 3:37 PM, nk 11  wrote:
>>
>> oh, so the configuration must be manualy changed?
>> Can't something be passed at (re)start time?
>>
>> 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>>>
>>> On Thu, May 14, 2009 at 4:07 PM, nk 11  wrote:
>>> > Ok so the VIP will point to the new master. but what makes a slave
>>> > promoted
>>> > to a master? Only the fact that it will receive add/update requests?
>>> > And I suppose that this "hot" promotion is possible only if the slave
>>> > is
>>> > convigured as master also...
>>> right.. By default you can setup all slaves to be master also. It does
>>> not cost anything if it is not serving any requests.
>>>
>>> so , if you have such a setting you will have to disable that slave to
>>> be a slave and restart it and you will have to make the VIP point to
>>> this new slave as master.
>>>
>>> so hot promotion is still not possible.
>>> >
>>> > 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>>> >>
>>> >> ideally , we don't do that.
>>> >> you can just keep the master host behind a VIP so if you wish to
>>> >> change the master make the VIP point to the new host
>>> >>
>>> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 
>>> >> wrote:
>>> >> > This is more interesting.Such a procedure would involve taking down
>>> >> > and
>>> >> > reconfiguring the slave?
>>> >> >
>>> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
>>> >> > wrote:
>>> >> >
>>> >> >> Or ...
>>> >> >>
>>> >> >> 1. Promote existing slave to new master
>>> >> >> 2. Add new slave to cluster
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> -Bryan
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
>>> >> >>
>>> >> >>  - Migrate configuration files from old master (or backup) to new
>>> >> >> master.
>>> >> >>> - Replicate from a slave to the new master.
>>> >> >>> - Resume indexing to new master.
>>> >> >>>
>>> >> >>> -Jay
>>> >> >>>
>>> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 
>>> >> >>> wrote:
>>> >> >>>
>>> >> >>>  Nice.
>>> >>  What if the master fails permanently (like a disk crash...) and
>>> >>  the
>>> >>  new
>>> >>  master is a clean machine?
>>> >>  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
>>> >> 
>>> >>   On Wed, May 13, 2009 at 12:10 PM, nk 11 
>>> >>  wrote:
>>> >> >
>>> >> >> Hello
>>> >> >>
>>> >> >> I'm kind of new to Solr and I've read about replication, and
>>> >> >> the
>>> >> >> fact
>>> >> >>
>>> >> > that a
>>> >> >
>>> >> >> node can act as both master and slave.
>>> >> >> I a replica fails and then comes back on line I suppose that it
>>> >> >> will
>>> >> >>
>>> >> > resyncs
>>> >> >
>>> >> >> with the master.
>>> >> >>
>>> >> > right
>>> >> >
>>> >> >>
>>> >> >> But what happnes if the master fails? A slave that is
>>> >> >> configured as
>>> >> >>
>>> >> > master
>>> >> >
>>> >> >> will kick in? What if that slave is not yes fully sync'ed with
>>> >> >> the
>>> >> >>
>>> >> > failed
>>> >> 
>>> >> > master and has old data?
>>> >> >>
>>> >> > if the master fails you can't index the data. but the slaves
>>> >> > will
>>> >> > continue serving the requests with the last index. You an bring
>>> >> > back
>>> >> > the master up and resume indexing.
>>> >> >
>>> >> >
>>> >> >> What happens when the original master comes back on line? He
>>> >> >> will
>>> >> >>
>>> >> > remain
>>> >> 
>>> >> > a
>>> >> >
>>> >> >> slave because there is another node with the master role?
>>> >> >>
>>> >> >> Thank you!
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >> > --
>>> >> > -
>>> >> > Noble Paul | Principal Engineer| AOL | http://aol.com
>>> >> >
>>> >> >
>>> >> 
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> -
>>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Custom Servlet Filter, Where to put filter-mappings

2009-05-14 Thread Erik Hatcher

I like Grant's suggestion as the simplest solution.

As for XML merging and XSLT, I really wouldn't want to go that route  
personally, but one solution that comes close to that is to template  
web.xml with some substitution tags and use Ant's ability to replace  
tokens.  So we could put in @FILTER@ and @FILTER_MAPPING@ placeholders  
in web.xml and pull in the replacements from fragment files.  But even  
with all of these fancy options available, I'd still just use the  
alternate web.xml technique that Grant proposed.
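
(For the record, the token replacement itself would be stock Ant: something
like <copy file="web.template.xml" tofile="web.xml"><filterset><filter
token="FILTER" value="${filter.fragment}"/></filterset></copy>, with @FILTER@
markers in the template. The file names and property here are placeholders,
not a worked recipe.)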


Erik


On May 13, 2009, at 10:55 PM, Jacob Singh wrote:


HI Grant,

That's not a bad idea... I could try that.  I was also looking at  
cactus:

http://jakarta.apache.org/cactus/integration/ant/index.html

It has an ant task to merge XML.  Could this be a contrib-crawl add-on?


Alternately, do you know of any xslt templates built for this?  Could
write one, but that's a fair bit of work to support everything.
Perhaps an xslt task combined with a contrib-crawl would do the trick?

Best,
-J

On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll  
 wrote:
Hmmm, maybe we need to think about some way to hook this into the build
process or make it easier to just drop it into the conf or lib dirs.  I'm no
web.xml expert, but I'm sure you're not the first one to want to do this
kind of thing.

The easiest way _might_ be to patch build.xml to take a property for the
location of the web.xml, defaulting to the current Solr one.  Then, people
who want to use their own version could just pass in
-Dweb.xml=<path to my web.xml>.  The downside to this is that it may cause
problems for us devs when users ask questions about strange behavior and it
turns out they have mucked up the web.xml

FYI: dist-war is in build.xml, not common-build.xml.

-Grant

On May 12, 2009, at 5:52 AM, Jacob Singh wrote:


Hi folks,

I just wrote a Servlet Filter to handle authentication for our
service.  Here's what I did:

1. Created a dir in contrib
2. Put my project in there, I took the dataimporthandler build.xml as
an example and modified it to suit my needs.  Worked great!
3. ant dist now builds my jar and includes it

I now need to modify web.xml to add my filter-mapping, init params,
etc.  How can I do this cleanly?  Or do I need to manually open up the
archive and edit it and then re-war it?

In common-build I don't see a target for dist-war, so don't see how it
is possible...

Thanks!
Jacob

--

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using

Solr/Lucene:
http://www.lucidimagination.com/search






--

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com




Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
Solr already supports this .
please refer this
http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08

ensure that your solr.xml is persistent
http://wiki.apache.org/solr/CoreAdmin#head-7508c24c6e2dadad2dfea39b2fba045062481da8
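
for example, a minimal SolrJ sketch of creating a core per user (this uses
CoreAdminRequest from the SolrJ trunk; the core name, paths and port are
placeholders, and the instanceDir must already contain a conf/ with
schema.xml and solrconfig.xml):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateUserCore {
  public static void main(String[] args) throws Exception {
    // CoreAdmin requests go to the base solr URL, not to a per-core URL
    SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
    String coreName = "user42";
    // equivalent to /admin/cores?action=CREATE&name=user42&instanceDir=...
    CoreAdminRequest.createCore(coreName, "/data/solr/" + coreName, admin);
  }
}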

On Thu, May 14, 2009 at 3:43 PM, KK  wrote:
> Thank you very much. Got the point.
> One off the track question, can we automate the creation of new cores[it
> requires manually editing the solr.xml file as I know, and what about the
> location of core index directory, do we need to point that manually as
> well].
> After going through the wiki what I found is we've to mention the names of
> cores in solr.xml. I want to automate the process in such a way that when a
> user registers[ on say my site for the service], we'll create a coresponding
> core for the same user and with a specific core id[unique for this user
> only] so that the user will be given a search interface that will redirect
> all searches for this user to http://host:port/<coreid-for-this-user>/select
> Will apprecite any ideas on this.
>
> Thanks,
> KK.
>
> 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> there is no hard limit on the no:of cores. it is limited by your
>> system's ability to open files and the resources.
>> the queries are automatically sent to appropriate core if your url is
>>
>> http://host:port/<corename>/select
>>
>> On Thu, May 14, 2009 at 1:58 PM, KK  wrote:
>> > I want to know the maximum no of cores supported by Solr. 1000s or may be
>> > millions all under one solr instance ?
>> > Also I want to know how to redirect a particular query to a particular
>> core.
>> > Actually I'm querying solr from Ajax, so I think there must be some
>> request
>> > parameter that says which core we want to query, right? Can some one tell
>> me
>> > how to do this, any good pointers on the same will be helpful as well.
>> > Thank you.
>> >
>> > --kk
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr vs Sphinx

2009-05-14 Thread Marvin Humphrey
On Thu, May 14, 2009 at 06:47:01AM -0400, Michael McCandless wrote:
> While I agree, one should properly match & tune all apps they are
> testing (for a fair comparison), we in turn must set out-of-the-box
> defaults (in Lucene and Solr) that get you as close to the "best
> practices" as possible.

So, should Lucene use the non-compound file format by default because some
idiot's sloppy benchmarks might run a smidge faster, even though that will
cause many users to run out of file descriptors?

Anyone doing comparative benchmarking who doesn't submit their code to the
support list for the software under review is either a dolt or a propagandist.

Good benchmarking is extremely difficult, like all experimental science.  If
there isn't ample evidence that the benchmarker appreciates that, their tests
aren't worth a second thought.  If you don't avail yourself of the help of
experts when assembling your experiment, you are unserious.

Richard Feynman:

"...if you're doing an experiment, you should report everything that you
think might make it invalid - not only what you think is right about it:
other causes that could possibly explain your results; and things you
thought of that you've eliminated by some other experiment, and how they
worked - to make sure the other fellow can tell they have been eliminated."

Marvin Humphrey



Re: master/slave failure scenario

2009-05-14 Thread nk 11
wow! that was just a couple of days old!
thanks a lot!
2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 

> yeah there is a hack
>
> https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316
>
> On Thu, May 14, 2009 at 6:07 PM, nk 11  wrote:
> > sorry for the mail. I wanted to hit reply :(
> >
> > On Thu, May 14, 2009 at 3:37 PM, nk 11  wrote:
> >>
> >> oh, so the configuration must be manualy changed?
> >> Can't something be passed at (re)start time?
> >>
> >> 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
> >>>
> >>> On Thu, May 14, 2009 at 4:07 PM, nk 11  wrote:
> >>> > Ok so the VIP will point to the new master. but what makes a slave
> >>> > promoted
> >>> > to a master? Only the fact that it will receive add/update requests?
> >>> > And I suppose that this "hot" promotion is possible only if the slave
> >>> > is
> >>> > convigured as master also...
> >>> right.. By default you can setup all slaves to be master also. It does
> >>> not cost anything if it is not serving any requests.
> >>>
> >>> so , if you have such a setting you will have to disable that slave to
> >>> be a slave and restart it and you will have to make the VIP point to
> >>> this new slave as master.
> >>>
> >>> so hot promotion is still not possible.
> >>> >
> >>> > 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
> >>> >>
> >>> >> ideally , we don't do that.
> >>> >> you can just keep the master host behind a VIP so if you wish to
> >>> >> change the master make the VIP point to the new host
> >>> >>
> >>> >> On Wed, May 13, 2009 at 10:52 PM, nk 11 
> >>> >> wrote:
> >>> >> > This is more interesting.Such a procedure would involve taking
> down
> >>> >> > and
> >>> >> > reconfiguring the slave?
> >>> >> >
> >>> >> > On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
> >>> >> > wrote:
> >>> >> >
> >>> >> >> Or ...
> >>> >> >>
> >>> >> >> 1. Promote existing slave to new master
> >>> >> >> 2. Add new slave to cluster
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> -Bryan
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
> >>> >> >>
> >>> >> >>  - Migrate configuration files from old master (or backup) to new
> >>> >> >> master.
> >>> >> >>> - Replicate from a slave to the new master.
> >>> >> >>> - Resume indexing to new master.
> >>> >> >>>
> >>> >> >>> -Jay
> >>> >> >>>
> >>> >> >>> On Wed, May 13, 2009 at 4:26 AM, nk 11 
> >>> >> >>> wrote:
> >>> >> >>>
> >>> >> >>>  Nice.
> >>> >>  What if the master fails permanently (like a disk crash...) and
> >>> >>  the
> >>> >>  new
> >>> >>  master is a clean machine?
> >>> >>  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
> >>> >> 
> >>> >>   On Wed, May 13, 2009 at 12:10 PM, nk 11 <
> nick.cass...@gmail.com>
> >>> >>  wrote:
> >>> >> >
> >>> >> >> Hello
> >>> >> >>
> >>> >> >> I'm kind of new to Solr and I've read about replication, and
> >>> >> >> the
> >>> >> >> fact
> >>> >> >>
> >>> >> > that a
> >>> >> >
> >>> >> >> node can act as both master and slave.
> >>> >> >> I a replica fails and then comes back on line I suppose that
> it
> >>> >> >> will
> >>> >> >>
> >>> >> > resyncs
> >>> >> >
> >>> >> >> with the master.
> >>> >> >>
> >>> >> > right
> >>> >> >
> >>> >> >>
> >>> >> >> But what happnes if the master fails? A slave that is
> >>> >> >> configured as
> >>> >> >>
> >>> >> > master
> >>> >> >
> >>> >> >> will kick in? What if that slave is not yes fully sync'ed
> with
> >>> >> >> the
> >>> >> >>
> >>> >> > failed
> >>> >> 
> >>> >> > master and has old data?
> >>> >> >>
> >>> >> > if the master fails you can't index the data. but the slaves
> >>> >> > will
> >>> >> > continue serving the requests with the last index. You an
> bring
> >>> >> > back
> >>> >> > the master up and resume indexing.
> >>> >> >
> >>> >> >
> >>> >> >> What happens when the original master comes back on line? He
> >>> >> >> will
> >>> >> >>
> >>> >> > remain
> >>> >> 
> >>> >> > a
> >>> >> >
> >>> >> >> slave because there is another node with the master role?
> >>> >> >>
> >>> >> >> Thank you!
> >>> >> >>
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >> > --
> >>> >> > -
> >>> >> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >>> >> >
> >>> >> >
> >>> >> 
> >>> >> >>
> >>> >> >
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> -
> >>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> -
> >>> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
> >
>
>
>
> --
>  

RE: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread jayson.minard

Siddharth,

The settings you have in your solrconfig for ramBufferSizeMB and
maxBufferedDocs control how much memory may be used during indexing besides
any overhead with the documents being "in-flight" at a given moment
(deserialized into memory but not yet handed to lucene).  There are
streaming versions of the client/server that help with that as well by
trying to process them as they arrive.
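
(Those two knobs live in the <indexDefaults> section of solrconfig.xml, e.g.
<ramBufferSizeMB>32</ramBufferSizeMB>; 32 here is only an illustration, not a
recommendation.)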

The patch SOLR-1155 does not add more memory use, but rather lets the
threads proceed through to Lucene without blocking within Solr as often.  So
instead of a stuck thread holding the documents in memory, you get moving
threads doing the same.

So the buffer sizes mentioned above along with the amount of documents you
send at a time will push your memory footprint.  Send smaller batches (less
efficient) or stream; or make sure you have enough memory for the amount of
docs you send at a time.  

For indexing I slow my commits down if there is no need for the documents to
become available for query right away.  For pure indexing, a long autoCommit
time and a large max document count before auto-committing helps.  Committing
isn't what flushes them out of memory, it is what makes the on-disk version
part of the overall index.  Over-committing will slow you way down,
especially if you have any listeners on the commits doing a lot of work
(i.e. Solr distribution).

Also, if you are querying on the indexer that can eat memory and compete
with the memory you are trying to reserve for indexing.  So a split model of
indexing and querying on different instances lets you tune each the best;
but then you have a gap in time from indexing to querying as the trade-off.

It is hard to say what is going on with GC without knowing what garbage
collection settings you are passing to the VM, and what version of the Java
VM you are using.  Which garbage collector are you using and what tuning
parameters?

I tend to use Parallel GC on my indexers with GC Overhead limit turned off
allowing for some pauses (which users don't see on a back-end indexer) but
good GC with lower heap fragmentation.  I tend to use concurrent mark and
sweep GC on my query slaves with tuned incremental mode and pacing which is
a low pause collector taking advantage of the cores on my servers and can
incrementally keep up with the needs of a query slave.
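
For reference, the two flag sets I'm describing look roughly like this (Sun
JDK 6 HotSpot options; treat the exact combination as a starting sketch to
tune, not a recipe):

  indexer:      -XX:+UseParallelGC -XX:-UseGCOverheadLimit
  query slave:  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing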

-- Jayson


Gargate, Siddharth wrote:
> 
> Hi all,
>   I am also facing the same issue where autocommit blocks all
> other requests. I having around 1,00,000 documents with average size of
> 100K each. It took more than 20 hours to index. 
> I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
> Do I need more configuration changes?
> Also I see that memory usage goes to peak level of heap specified(6 GB
> in my case). Looks like Solr spends most of the time in GC. 
> According to my understanding, fix for Solr-1155 would be that commit
> will run in background and new documents will be queued in the memory.
> But I am afraid of the memory consumption by this queue if commit takes
> much longer to complete.
> 
> Thanks,
> Siddharth
> 
> 
-- 
View this message in context: 
http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23540569.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread jayson.minard

Indexing speed comes down to a lot of factors.  The settings as talked about
above, VM settings, the size of the documents, how many are sent at a time,
how active you can keep the indexer (i.e. one thread sending documents lets
the indexer relax whereas N threads keeps pressure on the indexer), how
often you commit and of course the hardware you are running on.  Disk I/O is
a big factor along with having enough cores and memory to buffer and process
the documents.

Comparing two sets of numbers is tough.  We have indexes that range from
indexing a few million an hour up through 18-20M per hour in an indexing
cluster for distributed search.

--j


Jack Godwin wrote:
> 
> 20+ hours? I index 3 million records in 3 hours.  Is your auto commit
> causing a snapshot?  What do you have listed in the events.
> 
> Jack
> 
> On 5/14/09, Gargate, Siddharth  wrote:
>> Hi all,
>>  I am also facing the same issue where autocommit blocks all
>> other requests. I having around 1,00,000 documents with average size of
>> 100K each. It took more than 20 hours to index.
>> I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
>> Do I need more configuration changes?
>> Also I see that memory usage goes to peak level of heap specified(6 GB
>> in my case). Looks like Solr spends most of the time in GC.
>> According to my understanding, fix for Solr-1155 would be that commit
>> will run in background and new documents will be queued in the memory.
>> But I am afraid of the memory consumption by this queue if commit takes
>> much longer to complete.
>>
>> Thanks,
>> Siddharth
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23540643.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr vs Sphinx

2009-05-14 Thread Michael McCandless
On Thu, May 14, 2009 at 6:51 AM, Andrey Klochkov
 wrote:

> Can you please point me to some information concerning allowDocsOutOfOrder?
> What's this at all?

There is this cryptic static setter (in Lucene):

  BooleanQuery.setAllowDocsOutOfOrder(boolean)

It defaults to false, which means BooleanScorer2 will always be used
to compute hits for a BooleanQuery.  When set to true, BooleanScorer
will instead be used, when possible.  BooleanScorer gets better
performance, but it collects docs out of order, which for some
external collectors might cause a problem.

All of Lucene's core collectors work fine with out-of-order collection
(but I'm not sure about Solr's collectors).
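
A minimal sketch of flipping it on (it's a static, JVM-global switch):

  // assumes Lucene 2.4-era API; affects every BooleanQuery in the process
  BooleanQuery.setAllowDocsOutOfOrder(true);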

If you experiment with this, please post back with your results!

Mike


Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this field?

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread KK
Thank you very much. LOL, it's in the same wiki I was told to go through.
I've a question regarding creating Solr cores on the fly. The wiki says,

"Creates a new core and registers it. If persistence is enabled
(persist=true), the configuration for this new core will be saved in
'solr.xml'. If a core with the same name exists, while the "new" created
core is initializing, the "old" one will continue to accept requests. Once
it has finished, all new requests will go to the "new" core, and the "old"
core will be unloaded."

So I've to wait for some time [say a couple of secs, maybe less than that]
before I start adding pages to that core. I think this is the way to handle
it, otherwise some content which should have been indexed by the new core
will get indexed by the existing core [as the wiki says], which I don't want
to happen. Any other ideas for handling the same?
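What I have in mind for now is to poll the CoreAdmin STATUS action until the
new core shows up, and only then start posting documents. A rough SolrJ sketch
(the helper methods here are my assumption, I still need to verify them
against the SolrJ version I'm on):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class WaitForCore {
  public static void main(String[] args) throws Exception {
    SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // keep asking /admin/cores?action=STATUS&core=user42 until it is registered
    while (CoreAdminRequest.getStatus("user42", admin)
                           .getCoreStatus("user42") == null) {
      Thread.sleep(500);
    }
    // safe to start indexing into http://localhost:8983/solr/user42 now
  }
}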


Thanks,
KK.

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 

> Solr already supports this .
> please refer this
>
> http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08
>
> ensure that your solr.xml is persistent
>
> http://wiki.apache.org/solr/CoreAdmin#head-7508c24c6e2dadad2dfea39b2fba045062481da8
>
> On Thu, May 14, 2009 at 3:43 PM, KK  wrote:
> > Thank you very much. Got the point.
> > One off the track question, can we automate the creation of new cores[it
> > requires manually editing the solr.xml file as I know, and what about the
> > location of core index directory, do we need to point that manually as
> > well].
> > After going through the wiki what I found is we've to mention the names of
> > cores in solr.xml. I want to automate the process in such a way that when a
> > user registers[ on say my site for the service], we'll create a coresponding
> > core for the same user and with a specific core id[unique for this user
> > only] so that the user will be given a search interface that will redirect
> > all searches for this user to http://host:port/<coreid-for-this-user>/select
> > Will apprecite any ideas on this.
> >
> > Thanks,
> > KK.
> >
> > 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
> >
> >> there is no hard limit on the no:of cores. it is limited by your
> >> system's ability to open files and the resources.
> >> the queries are automatically sent to appropriate core if your url is
> >>
> >> http://host:port/<corename>/select
> >>
> >> On Thu, May 14, 2009 at 1:58 PM, KK  wrote:
> >> > I want to know the maximum no of cores supported by Solr. 1000s or may
> be
> >> > millions all under one solr instance ?
> >> > Also I want to know how to redirect a particular query to a particular
> >> core.
> >> > Actually I'm querying solr from Ajax, so I think there must be some
> >> request
> >> > parameter that says which core we want to query, right? Can some one
> tell
> >> me
> >> > how to do this, any good pointers on the same will be helpful as well.
> >> > Thank you.
> >> >
> >> > --kk
> >> >
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: Custom Servlet Filter, Where to put filter-mappings

2009-05-14 Thread Jacob Singh
I found a very elegant (I think) solution to this.

I'll post a patch today or tomorrow.

Best,
-Jacob

On Thu, May 14, 2009 at 6:22 PM, Erik Hatcher
 wrote:
> I like Grant's suggestion as the simplest solution.
>
> As for XML merging and XSLT, I really wouldn't want to go that route
> personally, but one solution that comes close to that is to template web.xml
> with some substitution tags and use Ant's ability to replace tokens.  So we
> could put in @FILTER@ and @FILTER_MAPPING@ placeholders in web.xml and pull
> in the replacements from fragment files.  But even with all of these fancy
> options available, I'd still just use the alternate web.xml technique that
> Grant proposed.
>
>        Erik
>
>
> On May 13, 2009, at 10:55 PM, Jacob Singh wrote:
>
>> HI Grant,
>>
>> That's not a bad idea... I could try that.  I was also looking at cactus:
>> http://jakarta.apache.org/cactus/integration/ant/index.html
>>
>> It has an ant task to merge XML.  Could this be a contrib-crawl add-on?
>>
>> Alternately, do you know of any xslt templates built for this?  Could
>> write one, but that's a fair bit of work to support everything.
>> Perhaps an xslt task combined with a contrib-crawl would do the trick?
>>
>> Best,
>> -J
>>
>> On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll 
>> wrote:
>>>
>>> Hmmm, maybe we need to think about someway to hook this into the build
>>> process or make it easier to just drop it into the conf or lib dirs.  I'm
>>> no
>>> web.xml expert, but I'm sure you're not the first one to want to do this
>>> kind of thing.
>>>
>>> The easiest way _might_ be to patch build.xml to take a property for the
>>> location of the web.xml, defaulting to the current Solr one.  Then,
>>> people
>>> who want to use their own version could just pass in -Dweb.xml=<path to my
>>> web.xml>.  The downside to this is that it may cause problems for us devs
>>> when users ask questions about strange behavior and it turns out they
>>> have
>>> mucked up the web.xml
>>>
>>> FYI: dist-war is in build.xml, not common-build.xml.
>>>
>>> -Grant
>>>
>>> On May 12, 2009, at 5:52 AM, Jacob Singh wrote:
>>>
 Hi folks,

 I just wrote a Servlet Filter to handle authentication for our
 service.  Here's what I did:

 1. Created a dir in contrib
 2. Put my project in there, I took the dataimporthandler build.xml as
 an example and modified it to suit my needs.  Worked great!
 3. ant dist now builds my jar and includes it

 I now need to modify web.xml to add my filter-mapping, init params,
 etc.  How can I do this cleanly?  Or do I need to manually open up the
 archive and edit it and then re-war it?

 In common-build I don't see a target for dist-war, so don't see how it
 is possible...

 Thanks!
 Jacob

 --

 +1 510 277-0891 (o)
 +91  33 7458 (m)

 web: http://pajamadesign.com

 Skype: pajamadesign
 Yahoo: jacobsingh
 AIM: jacobsingh
 gTalk: jacobsi...@gmail.com
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>
>>
>>
>> --
>>
>> +1 510 277-0891 (o)
>> +91  33 7458 (m)
>>
>> web: http://pajamadesign.com
>>
>> Skype: pajamadesign
>> Yahoo: jacobsingh
>> AIM: jacobsingh
>> gTalk: jacobsi...@gmail.com
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Additional metadata when using Solr Cell

2009-05-14 Thread Grant Ingersoll

what does /admin/luke show for fields and terms in the fields?

On May 14, 2009, at 10:03 AM, rossputin wrote:



Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody


Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}


I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=


I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this  
field?


Thanks in advance for your help,

-- Ross
--
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

There is no reference to the author field I am trying to set... I am using the
latest nightly download.

 -- Ross


Grant Ingersoll-6 wrote:
> 
> what does /admin/luke show for fields and terms in the fields?
> 
> On May 14, 2009, at 10:03 AM, rossputin wrote:
> 
>>
>> Hi.
>>
>> I am indexing a PDF document with the ExtractingRequestHandler.  My  
>> curl
>> post has a URL like:
>>
>> ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody
>>
>> Sure enough I see in the server logs:
>>
>> params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}
>>
>> I am trying to get my field back in the results from a query:
>>
>> ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
>>
>> I see the score in the results 'doc' but no reference to author.
>>
>> Can anyone advise on what I am forgetting to do, to get hold of this  
>> field?
>>
>> Thanks in advance for your help,
>>
>> -- Ross
>> -- 
>> View this message in context:
>> http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Additional metadata when using Solr Cell

2009-05-14 Thread Grant Ingersoll

Do you have an author field in your schema?

On May 14, 2009, at 10:31 AM, rossputin wrote:



There is no reference to the author field I am trying to set... I am
using the latest nightly download.

-- Ross


Grant Ingersoll-6 wrote:


what does /admin/luke show for fields and terms in the fields?

On May 14, 2009, at 10:03 AM, rossputin wrote:



Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My
curl post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}


I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this
field?

Thanks in advance for your help,

-- Ross
--
View this message in context:
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search





--
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

There is now, thanks for your help.  On the same topic: is there a best
practice for modifying the schema in a future-proof way?

 -- Ross



Grant Ingersoll-6 wrote:
> 
> Do you have an author field in your schema?
> 
> On May 14, 2009, at 10:31 AM, rossputin wrote:
> 
>>
>> There is no reference to the author field I am trying to set.. I am  
>> using the
>> latest nightly download.
>>
>> -- Ross
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>> what does /admin/luke show for fields and terms in the fields?
>>>
>>> On May 14, 2009, at 10:03 AM, rossputin wrote:
>>>

 Hi.

 I am indexing a PDF document with the ExtractingRequestHandler.  My
 curl
 post has a URL like:

 ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

 Sure enough I see in the server logs:

 params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

 I am trying to get my field back in the results from a query:

 ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

 I see the score in the results 'doc' but no reference to author.

 Can anyone advise on what I am forgetting to do, to get hold of this
 field?

 Thanks in advance for your help,

 -- Ross
 -- 
 View this message in context:
 http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
 Sent from the Solr - User mailing list archive at Nabble.com.

>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23542620.html
Sent from the Solr - User mailing list archive at Nabble.com.



AW: AW: Geographical search based on latitude and longitude

2009-05-14 Thread Norman Leutner
Hi Grant, thanks for the reply.

Is the logic for a function query that calculates distances that Yonik 
mentioned (gdist(position,101.2,234.3))
already implemented?

This could otherwise be either very inaccurate or load-intensive.

If the logic isn't done yet, maybe I can prepare it.

Norman
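
For reference, the distance math such a function query needs can be
sketched in a few lines of Java.  This is a plain haversine on a
spherical earth rather than the GRS80-based formula in the SQL quoted
below, so treat it as an approximation (typically within ~0.5%); the
class and method names are illustrative:

  public class GreatCircle {
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine great-circle distance in km between two lat/lon points.
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
      double dLat = Math.toRadians(lat2 - lat1);
      double dLon = Math.toRadians(lon2 - lon1);
      double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
               + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
               * Math.sin(dLon / 2) * Math.sin(dLon / 2);
      return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
  }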


-Ursprüngliche Nachricht-
Von: Grant Ingersoll [mailto:gsing...@apache.org] 
Gesendet: Dienstag, 12. Mai 2009 19:43
An: solr-user@lucene.apache.org
Betreff: Re: AW: Geographical search based on latitude and longitude

Yes, that is part of it, but there is more to it.  See Yonik's comment  
about needs further down.


On May 12, 2009, at 7:36 AM, Norman Leutner wrote:

> So are you using boundary box to find results within a given range(km)
> like mentioned here: 
> http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html 
>  ?
>
>
> Best regards
>
> Norman Leutner
> all2e GmbH
>
> -Ursprüngliche Nachricht-
> Von: Grant Ingersoll [mailto:gsing...@apache.org]
> Gesendet: Dienstag, 12. Mai 2009 13:18
> An: solr-user@lucene.apache.org
> Betreff: Re: Geographical search based on latitude and longitude
>
> See https://issues.apache.org/jira/browse/SOLR-773.  In other words,
> we're working on it and would love some help!
>
> -Grant
>
> On May 12, 2009, at 7:12 AM, Norman Leutner wrote:
>
>> Hi together,
>>
>> I'm new to Solr and want to port a geographical range search from
>> MySQL to Solr.
>>
>> Currently I'm using some mathematical functions (based on GRS80
>> modell) directly within MySQL to calculate
>> the actual distance from the locations within the database to a
>> current location (lat and long are known):
>>
>> $query=SELECT street, zip, city, state, country, ".
>> $radius."*ACOS(cos(RADIANS(latitude))*cos(".
>> $theta.")*(sin(RADIANS(longitude))*sin(".$phi.")
>> +cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".
>> $theta.")) AS Distance FROM ezgis_position WHERE ".
>> $radius."*ACOS(cos(RADIANS(latitude))*cos(".
>> $theta.")*(sin(RADIANS(longitude))*sin(".$phi.")
>> +cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".
>> $theta.")) <= ".$range." ORDER BY Distance";
>>
>> This works pretty fine and fast. Due to we want to include this
>> within our Solr search result I would like to have a attribute like
>> "actual_distance" within the result. Is there a way to use those
>> functions like (radians, sin, acos,...) directly within Solr?
>>
>> Thanks in advance for any feedback
>> Norman Leutner
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Solr vs Sphinx

2009-05-14 Thread gdeconto



Yonik Seeley-2 wrote:
> 
> It's probably the case that every search engine out there is faster
> than Solr at one thing or another, and that Solr is faster or better
> at some other things.
> 
> I prefer to spend my time improving Solr rather than engage in
> benchmarking wars... and Solr 1.4 will have a ton of speed
> improvements over Solr 1.3.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 

Solr is very fast even with 1.3 and the developers have done an incredible
job.

However, maybe the next Solr improvement should be the creation of a
configuration manager and/or automated tuning tool.  I know that optimizing
Solr performance can be time-consuming and sometimes frustrating.


-- 
View this message in context: 
http://www.nabble.com/Solr-vs-Sphinx-tp23524676p23544492.html
Sent from the Solr - User mailing list archive at Nabble.com.



CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread sachin78

What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer?
Which is the preferred server to use?

In some blog I read that EmbeddedSolrServer is 50% faster than
CommonsHttpSolrServer, so then why do we need to use CommonsHttpSolrServer?

Can anyone please guide me down the right path, so that I pick the right
implementation?

Thanks in advance.

--Sachin
-- 
View this message in context: 
http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replication master+slave

2009-05-14 Thread Bryan Talbot

https://issues.apache.org/jira/browse/SOLR-1167



-Bryan




On May 13, 2009, at May 13, 7:20 PM, Otis Gospodnetic wrote:



Bryan, maybe it's time to stick this in JIRA?
http://wiki.apache.org/solr/HowToContribute

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot 
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 10:11:21 PM
Subject: Re: Replication master+slave

I think the patch I included earlier covers solr core, but it looks
like at least some other extensions (DIH) create and use their own XML
parser.  So, if this functionality is to extend to all XML files, those
will need similar patches.

Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (working copy)
@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {

     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));



The only downside I can see to this is it doesn't offer very expressive
conditional inclusion: the file is included if it's present; otherwise
fallback inclusions can be used.  It's also specific to XML files and
obviously won't work for other types of configuration files.  However,
it is simple and effective.


-Bryan




On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:



Coincidentally, from
http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ 
 :


"Hadoop configuration files now support XInclude elements for  
including
portions of another configuration file (HADOOP-4944). This  
mechanism allows you

to make configuration files more modular and reusable."


So "others are doing it, too".

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Noble's final comment in SOLR-1154 is that config files need
to be able to include snippets from external files.  In my limited
testing, a simple patch to enable XInclude support seems to work.



--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
if (lis == null) {
  lis = loader.openConfig(name);
}
-  javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-  doc = builder.parse(lis);
+  javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+  dbf.setNamespaceAware(true);
+  dbf.setXIncludeAware(true);
+  doc = dbf.newDocumentBuilder().parse(lis);

  DOMUtil.substituteProperties(doc, loader.getCoreProperties());
} catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of
replication.xml if it exists.  If it's not found an exception will be
thrown.



href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml 
"

   xmlns:xi="http://www.w3.org/2001/XInclude";>



If the file is optional and no exception should be thrown if the file
is missing, simply include a fallback action: in this case the fallback
is empty and does nothing.



href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml 
"

   xmlns:xi="http://www.w3.org/2001/XInclude";>




-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with  
Noble. You can

use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are  
largely the same

other than differences for replication?

I don't think it's a "good thing" to maintain two copies of the  
same file
and I'd like to avoid that.  Maybe enabling the XInclude  
feature in
DocumentBuilders would make it possible to modularize  
configuration files

to

make this possible?






http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)



-Bryan





On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar  
wrote:


On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot

wrote:


For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node  
can be both

the master and a slave:

A node can act as both master and slave. In that case both  

Powered by Solr

2009-05-14 Thread Terence Gannon
I was intending to make an entry to the 'Powered by Solr' page, so I
created a Wiki account and logged in.  When I go to that page, it
shows it as being 'immutable', which I take as meaning I can't edit
it.  Is there someone I can send the information to who can do the
edit?  Or perhaps there is some sort of trick to editing that page?
Thanks for your help, and apologies in advance if this is a silly
question...

Terence


Re: Powered by Solr

2009-05-14 Thread Yonik Seeley
On Thu, May 14, 2009 at 1:54 PM, Terence Gannon  wrote:
> I was intending to make an entry to the 'Powered by Solr' page, so I
> created a Wiki account and logged in.  When I go to that page, it
> shows it as being 'immutable', which I take as meaning I can't edit
> it.

Did you try hitting refresh on your browser after you logged in?

-Yonik
http://www.lucidimagination.com


Re: Solr vs Sphinx

2009-05-14 Thread Michael McCandless
On Thu, May 14, 2009 at 9:07 AM, Marvin Humphrey  wrote:
> Richard Feynman:
>
>"...if you're doing an experiment, you should report everything that you
>think might make it invalid - not only what you think is right about it:
>other causes that could possibly explain your results; and things you
>thought of that you've eliminated by some other experiment, and how they
>worked - to make sure the other fellow can tell they have been eliminated."

Excellent quote!

> So, should Lucene use the non-compound file format by default because some
> idiot's sloppy benchmarks might run a smidge faster, even though that will
> cause many users to run out of file descriptors?

No, I don't think we should change that default.

Nor (for example) can we switch to SweetSpotSimilarity by default,
even though it seems to improve relevance, because it requires
app-dependent configuration.

Nor should we set IndexWriter's RAM buffer to 1 GB.  Etc.

But when there is a choice that has near zero downside and improves
performance (like my example), we should make the switch.

Making IndexReader.open return a readOnly reader is another example
(... which we plan to do in 3.0).

Every time Lucene or Solr has a default built-in setting, we should
think carefully about how to set it.

> Anyone doing comparative benchmarking who doesn't submit their code to the
> support list for the software under review is either a dolt or a propagandist.
>
> Good benchmarking is extremely difficult, like all experimental science.  If
> there isn't ample evidence that the benchmarker appreciates that, their tests
> aren't worth a second thought.  If you don't avail yourself of the help of
> experts when assembling your experiment, you are unserious.

Agreed.

Mike


Re: Solr memory requirements?

2009-05-14 Thread vivek sar
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need of any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson  wrote:
> Warning: I'm wy out of my competency range when I comment
> on SOLR, but I've seen the statement that string fields are NOT
> tokenized while text fields are, and I notice that almost all of your fields
> are string type.
>
> Would someone more knowledgeable than me care to comment on whether
> this is at all relevant? Offered in the spirit that sometimes there are
> things
> so basic that only an amateur can see them 
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:
>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> I've configured anything wrong.
>>
>> I got total of 25 fields (15 are indexed and stored, other 10 are just
>> stored). All my fields are basic data type - which I thought are not
>> sorted. My id field is unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>>  > required="true" omitNorms="true" compressed="false"/>
>>
>>   > compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > default="NOW/HOUR"  compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > compressed="false"/>
>>   > compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > compressed="false"/>
>>   > compressed="false"/>
>>   > compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > compressed="false"/>
>>   > default="NOW/HOUR" omitNorms="true"/>
>>
>>
>>   
>>   > omitNorms="true" multiValued="true"/>
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>  wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) .tii files in the Lucene index.  When you sort, all distinct values
>> for the field(s) used for sorting.  Similarly for facet fields.  Solr
>> caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
>> consume during indexing.  There is no need to commit every 50K docs unless
>> you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
>> going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: vivek sar 
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>> >> complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When solr starts up what does it loads up in the memory? Let's say
>> >> I've 4 cores with each core 50G in size. When Solr comes up how much
>> >> of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing
>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>> >> I need to give to Solr.
>> >>
>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> >> size index? Is there any benchmark on this?
>> >>
>> >> Here are some of my configuration from solrconfig.xml,
>> >>
>> >> 1) 64
>> >> 2) All the caches (under query tag) are commented out
>> >> 3) Few others,
>> >>       a)  true    ==>
>> >> would this require memory?
>> >>       b)  50
>> >>       c) 200
>> >>       d)
>> >>       e) false
>> >>       f)  2
>> >>
>> >> The problem we are having is following,
>> >>
>> >> I've given Solr RAM of 6G. As the total index size (all cores
>> >> combined) start growing the Solr memory consumption  goes up. With 800
>> >> million documents, I see Solr already taking up all the memory at
>> >> startup. After that the commits, searches everything become slow. We
>> >> will be having distributed setup with multiple Solr instances (around
>> >> 8) on four boxes, but our requirement is to have each Solr instance at
>> >> least maintain around 1.5 billion documents.
>> >>
>> >> We are trying to see if we can somehow reduce the Solr memory
>> >> footprint. If someone can provide a pointer on what parameters affect
>> >> memory and what effects it has we can then decide whether we want that
>> >> parameter or not. I'm not sure if there is any minimum Solr
>> >> requirement for it to be able mainta

Re: Powered by Solr

2009-05-14 Thread Terence Gannon
> Did you try hitting refresh on your browser after you logged in?

Wow, I really should have known that...thank you for your patient reply, Yonik.

Regards...Terence


replication of lucene-write.lock file

2009-05-14 Thread Bryan Talbot


When using solr 1.4 replication, I see that the lucene-write.lock file  
is being replicated to slaves.  I'm importing data from a db every 5  
minutes using cron to trigger a DIH delta-import.  Replication polls  
every 60 seconds and the master is configured to take a snapshot on
(replicateAfter) commit.


Why should the lock file be replicated to slaves?

The lock file isn't stale on the master and is absent unless the delta- 
import is in process.  I've not tried it yet, but with the lock file  
replicated, it seems like promotion of a slave to a master in a  
failure recovery scenario requires the manual removal of the lock file.




-Bryan






Re: CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread Eric Pugh
CommonsHttpSolrServer is how you access Solr from a Java client via  
HTTP.  You can connect to a Solr running anywhere   
EmbeddedSolrServer starts up Solr internally, and connects directly,  
all in a single JVM...  Embedded may be faster, the jury is out, but  
you have to have your Solr server and your Solr client on the same  
box...   Unless you really need it, I would start with  
CommonsHttpSolrServer, it's easier to configure and get going with and  
more flexible.


Eric


On May 14, 2009, at 1:30 PM, sachin78 wrote:



What is the difference between EmbeddedSolrServer and  
CommonsHttpSolrServer.

Which is the preferred server to use?

In some blog i read that EmbeddedSolrServer  is 50% faster than
CommonsHttpSolrServer,then why do we need to use  
CommonsHttpSolrServer.


Can anyone please guide me the right path/way.So that i pick the right
implementation.

Thanks in advance.

--Sachin
--
View this message in context: 
http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html
Sent from the Solr - User mailing list archive at Nabble.com.



-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread Ryan McKinley
right -- which one you pick will depend more on your runtime
environment than anything else.


If you need to hit a server (on a different machine)  
CommonsHttpSolrServer is your only option.


If you are running an embedded application -- where your custom code  
lives in the same JVM as solr -- you can use EmbeddedSolrServer.  The  
nice thing is that since they are the same interface, you can change  
later.


The performance comments on the wiki can be a bit misleading -- yes,  
in some cases embedded could be faster, but that may depend on how you  
are sending things -- are you sending 1000s of single document  
requests really fast?  If so, try sending a bunch of documents  
together in one request.


Also consider using the StreamingHttpSolrServer (https://issues.apache.org/jira/browse/SOLR-906 
) -- it has a few quirks, but can be much faster.


In any case, as long as you program against the SolrServer interface,  
then you could swap the implementation as needed.
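
A minimal sketch of that, using the SolrJ 1.4-era API (the host URL and
query string are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SolrClientExample {
    public static void main(String[] args) throws Exception {
      // Declared as the SolrServer interface: an EmbeddedSolrServer (or
      // StreamingHttpSolrServer) could be substituted here without
      // touching the code below.
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      QueryResponse rsp = server.query(new SolrQuery("hello"));
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }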


ryan


On May 14, 2009, at 3:35 PM, Eric Pugh wrote:

CommonsHttpSolrServer is how you access Solr from a Java client via  
HTTP.  You can connect to a Solr running anywhere   
EmbeddedSolrServer starts up Solr internally, and connects directly,  
all in a single JVM...  Embedded may be faster, the jury is out, but  
you have to have your Solr server and your Solr client on the same  
box...   Unless you really need it, I would start with  
CommonsHttpSolrServer, it's easier to configure and get going with  
and more flexible.


Eric


On May 14, 2009, at 1:30 PM, sachin78 wrote:



What is the difference between EmbeddedSolrServer and  
CommonsHttpSolrServer.

Which is the preferred server to use?

In some blog i read that EmbeddedSolrServer  is 50% faster than
CommonsHttpSolrServer,then why do we need to use  
CommonsHttpSolrServer.


Can anyone please guide me the right path/way.So that i pick the  
right

implementation.

Thanks in advance.

--Sachin
--
View this message in context: 
http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html
Sent from the Solr - User mailing list archive at Nabble.com.



-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal








Re: Solr vs Sphinx

2009-05-14 Thread Mike Klaas


On 14-May-09, at 9:46 AM, gdeconto wrote:


Solr is very fast even with 1.3 and the developers have done an  
incredible

job.

However, maybe the next Solr improvement should be the creation of a
configuration manager and/or automated tuning tool.  I know that  
optimizing

Solr performance can be time consuming and sometimes frustrating.


"Making Solr more self-service" has been a theme we have had and  
should strive to move toward.  In some respects, extreme  
configurability is a liability, if considerable tweaking and  
experimentation is needed to achieve optimum results.  You can't  
expect everyone to put in the investment to develop the expertise.


That said, it is very difficult to come up with appropriate auto- 
tuning heuristics that don't fail.  It almost calls for a layer higher  
than Solr to which you could hint what you want to do with a field  
(sort, facet, etc.), and which would make the field definitions  
appropriately.  The problem with such abstractions is that they are  
invariably leaky, and thus diagnosing problems requires similar  
expertise as omitting the abstraction step in the first place.


Getting this trade-off right is one of the central problems of  
computer science.


-Mike


Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increase the heap
usage even more until it reaches the max (8G), after which everything
(indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400 million
matches and returns 100 rows.

Does the Solr searcher hold on to references to objects in memory? I
couldn't find any settings that would tell me it does, but every
search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num #instances #bytes  class name
--
   1:   3890855  222608992  [C
   2:   3891673  155666920  java.lang.String
   3:   3284341  131373640  org.apache.lucene.index.TermInfo
   4:   3334198  106694336  org.apache.lucene.index.Term
   5:   271   26286496  [J
   6:16   26273936  [Lorg.apache.lucene.index.Term;
   7:16   26273936  [Lorg.apache.lucene.index.TermInfo;
   8:320512   15384576
org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9: 10335   11554136  [I

I'm not sure what the first one ([C) is. I couldn't profile it to know
what all the Strings are being allocated by - any ideas?

Any ideas on what Searcher might be holding on and how can we change
that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar  wrote:
> I don't know if field type has any impact on the memory usage - does it?
>
> Our use cases require complete matches, thus there is no need of any
> analysis in most cases - does it matter in terms of memory usage?
>
> Also, is there any default caching used by Solr if I comment out all
> the caches under query in solrconfig.xml? I also don't have any
> auto-warming queries.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson  
> wrote:
>> Warning: I'm wy out of my competency range when I comment
>> on SOLR, but I've seen the statement that string fields are NOT
>> tokenized while text fields are, and I notice that almost all of your fields
>> are string type.
>>
>> Would someone more knowledgeable than me care to comment on whether
>> this is at all relevant? Offered in the spirit that sometimes there are
>> things
>> so basic that only an amateur can see them 
>>
>> Best
>> Erick
>>
>> On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:
>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>>> I've configured anything wrong.
>>>
>>> I got total of 25 fields (15 are indexed and stored, other 10 are just
>>> stored). All my fields are basic data type - which I thought are not
>>> sorted. My id field is unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>>  >> required="true" omitNorms="true" compressed="false"/>
>>>
>>>   >> compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> default="NOW/HOUR"  compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> default="NOW/HOUR" omitNorms="true"/>
>>>
>>>
>>>   
>>>   >> omitNorms="true" multiValued="true"/>
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>>  wrote:
>>> >
>>> > Hi,
>>> > Some answers:
>>> > 1) .tii files in the Lucene index.  When you sort, all distinct values
>>> for the field(s) used for sorting.  Similarly for facet fields.  Solr
>>> caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
>>> consume during indexing.  There is no need to commit every 50K docs unless
>>> you want to trigger snapshot creation.
>>> > 3) see 1) above
>>> >
>>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
>>> going to fly. :)
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> >
>>> > - Original Messag

Re: Solr vs Sphinx

2009-05-14 Thread Mark Miller

Michael McCandless wrote:
So why haven't we enabled this by default, already? 

Why isn't Lucene done already :)

- Mark




Search Query Questions

2009-05-14 Thread Chris Miller

I have two questions:

1) How do I search for ALL items? For example, I provide a sort query  
parameter of "updated" and a rows query parameter of 10 to limit the  
query results. I still have to provide a search query, of course. What  
if I want to provide a list of ALL results that match this? Or, in  
this case, the most recent 10 updated documents?


2) How do I search for all documents with a field that has data? For  
example, I have a field "foo" that is optional and multi-valued. How  
do I search for documents that have this field set to anything?


Thanks,

Chris Miller
ServerMotion
www.servermotion.com





Re: Search Query Questions

2009-05-14 Thread Chris Miller

Oh, one more question

3) Is there a way to effectively do a GROUP BY? For example, if I have  
a document that has a photoID attached to it, is there a way to return  
a set of results that does not duplicate the photoID field?


Thanks,

Chris Miller
ServerMotion
www.servermotion.com



On May 14, 2009, at 7:46 PM, Chris Miller wrote:


I have two questions:

1) How do I search for ALL items? For example, I provide a sort  
query parameter of "updated" and a rows query parameter of 10 to  
limit the query results. I still have to provide a search query, of  
course. What if I want to provide a list of ALL results that match  
this? Or, in this case, the most recent 10 updated documents?


2) How do I search for all documents with a field that has data? For  
example, I have a field "foo" that is optional and multi-valued. How  
do I search for documents that have this field set to anything.


Thanks,

Chris Miller
ServerMotion
www.servermotion.com







Re: Additional metadata when using Solr Cell

2009-05-14 Thread Mark Miller

rossputin wrote:

Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this field?

Thanks in advance for your help,

 -- Ross
  
Have you added author to the schema? If not, and if you are using the 
example config (that uses ext.ignore.und.fl=true), the field could just 
be ignored. Define it and it should be filled.
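
The same request can also be made through SolrJ.  A sketch, keeping the
ext.literal.* parameter style of the nightly build discussed here (later
builds shortened the prefix to literal.*); the file name and field
values are placeholders:

  import java.io.File;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  public class ExtractExample {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      ContentStreamUpdateRequest req =
          new ContentStreamUpdateRequest("/update/extract");
      req.addFile(new File("doc.pdf"));        // the PDF to extract
      req.setParam("ext.literal.id", "123");   // literal fields must exist in schema.xml
      req.setParam("ext.literal.author", "Somebody");
      server.request(req);
      server.commit();
    }
  }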


--
- Mark

http://www.lucidimagination.com





Re: Solr memory requirements?

2009-05-14 Thread Mark Miller

800 million docs is on the high side for modern hardware.

If even one field has norms on, you're talking almost 800 MB right there.
And then if another Searcher is brought up while the old one is serving
(which happens when you update)? Doubled.

Your best bet is to distribute across a couple machines.

To minimize, you would want to turn off or down caching, don't facet,
don't sort, turn off all norms, and possibly get at the Lucene term
interval and raise it. Drop the on-deck searchers setting. Even then, 800
million...time to distribute I'd think.
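
On the term interval point: at the raw Lucene level (2.9-era API) the
knob looks like the sketch below; whether and where Solr exposes it in
solrconfig.xml depends on the version, so check before relying on it.
The index path is a placeholder:

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class TermIntervalExample {
    public static void main(String[] args) throws Exception {
      IndexWriter writer = new IndexWriter(
          FSDirectory.open(new File("/path/to/index")),
          new StandardAnalyzer(Version.LUCENE_29),
          IndexWriter.MaxFieldLength.UNLIMITED);
      // Default interval is 128; raising it shrinks the in-RAM term
      // index (.tii) proportionally, at some cost in term-seek speed.
      writer.setTermIndexInterval(256);
      writer.close();
    }
  }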


vivek sar wrote:

Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increases the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching becomes slow).

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400million
matches and returns 100 rows.

Does the Solr searcher holds up the reference to objects in memory? I
couldn't find any settings that would tell me it does, but every
search causing heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num #instances #bytes  class name
--
   1:   3890855  222608992  [C
   2:   3891673  155666920  java.lang.String
   3:   3284341  131373640  org.apache.lucene.index.TermInfo
   4:   3334198  106694336  org.apache.lucene.index.Term
   5:   271   26286496  [J
   6:16   26273936  [Lorg.apache.lucene.index.Term;
   7:16   26273936  [Lorg.apache.lucene.index.TermInfo;
   8:320512   15384576
org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9: 10335   11554136  [I

I'm not sure what's the first one (C)? I couldn't profile it to know
what all the Strings are being allocated by - any ideas?

Any ideas on what Searcher might be holding on and how can we change
that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar  wrote:
  

I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need of any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson  wrote:


Warning: I'm wy out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your fields
are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things
so basic that only an amateur can see them 

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:

  

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I got total of 25 fields (15 are indexed and stored, other 10 are just
stored). All my fields are basic data type - which I thought are not
sorted. My id field is unique key.

Is there any field here that might be getting sorted?

 

  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


  
  

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:


Hi,
Some answers:
1) .tii files in the Lucene index.  When you sort, all distinct values
  

for the field(s) used for sorting.  Similarly for facet fields.  Solr
caches.


2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
  

consume during indexing.  There is no need to commit every 50K docs unless
you want to trigger snapshot creation.


3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
  

going to fly. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: vivek sar 
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 3:04:46 PM
Subject: Solr memory requirements?

Hi,

  I'm pretty sure this has been asked before, but I couldn't find a
complete answer in the forum archive. Here are my questions,

1) When solr starts up what does it loads up in the memory? Let's say
I've 4 cores with each core 50G in size. When Solr comes up how much
of it would be loaded in memory?

2

Re: Search Query Questions

2009-05-14 Thread Matt Weber
I think you will want to look at the Field Collapsing patch for this:
http://issues.apache.org/jira/browse/SOLR-236
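
For the first two questions, stock query syntax already covers both:
"*:*" matches every document, and an open range like foo:[* TO *]
matches any document where the field has a value.  A small SolrJ
sketch, assuming the sort field is named "updated":

  import org.apache.solr.client.solrj.SolrQuery;

  public class QuerySketch {
    public static void main(String[] args) {
      // 1) Match all documents, most recently updated first, top 10.
      SolrQuery latest = new SolrQuery("*:*");
      latest.addSortField("updated", SolrQuery.ORDER.desc);
      latest.setRows(10);

      // 2) Match only documents where the optional field "foo" is set.
      SolrQuery hasFoo = new SolrQuery("foo:[* TO *]");

      System.out.println(latest + "  /  " + hasFoo);
    }
  }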


Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 14, 2009, at 5:52 PM, Chris Miller wrote:


Oh, one more question

3) Is there a way to effectively do a GROUP BY? For example, if I  
have a document that has a photoID attached to it, is there a way to  
return a set of results that does not duplicate the photoID field?


Thanks,

Chris Miller
ServerMotion
www.servermotion.com



On May 14, 2009, at 7:46 PM, Chris Miller wrote:


I have two questions:

1) How do I search for ALL items? For example, I provide a sort  
query parameter of "updated" and a rows query parameter of 10 to  
limit the query results. I still have to provide a search query, of  
course. What if I want to provide a list of ALL results that match  
this? Or, in this case, the most recent 10 updated documents?


2) How do I search for all documents with a field that has data?  
For example, I have a field "foo" that is optional and multi- 
valued. How do I search for documents that have this field set to  
anything.


Thanks,

Chris Miller
ServerMotion
www.servermotion.com









Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields, I
guess, don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but
that doesn't help much
3) I've tried commenting out the first and new searcher listeners
settings in the solrconfig.xml - the only way it helps is that at
startup time the memory usage doesn't spike up - that's only because
there is no auto-warmer query to run. But, I noticed commenting out
searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the "Lucene term interval" from Solr -
is there a way to do that?

I've been playing around with this memory thing the whole day and have
found that it's the search that's hogging the memory. Any time there
is a search on all the records (800 million) the heap consumption
jumps by 5G. This makes me think there has to be some configuration in
Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has
been able to pinpoint what configuration might be causing this. If
someone is interested I can attach the solrconfig and schema files as
well. Here are the settings again under Query tag,


  1024
  true
  50
  200
   
  false
  2
 

and schema,

 

  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  

  
  

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller  wrote:
> 800 million docs is on the high side for modern hardware.
>
> If even one field has norms on, your talking almost 800 MB right there. And
> then if another Searcher is brought up well the old one is serving (which
> happens when you update)? Doubled.
>
> Your best bet is to distribute across a couple machines.
>
> To minimize you would want to turn off or down caching, don't facet, don't
> sort, turn off all norms, possibly get at the Lucene term interval and raise
> it. Drop on deck searchers setting. Even then, 800 million...time to
> distribute I'd think.
>
> vivek sar wrote:
>>
>> Some update on this issue,
>>
>> 1) I attached jconsole to my app and monitored the memory usage.
>> During indexing the memory usage goes up and down, which I think is
>> normal. The memory remains around the min heap size (4 G) for
>> indexing, but as soon as I run a search the tenured heap usage jumps
>> up to 6G and remains there. Subsequent searches increases the heap
>> usage even more until it reaches the max (8G) - after which everything
>> (indexing and searching becomes slow).
>>
>> The search query is a very generic one in this case which goes through
>> all the cores (4 of them - 800 million records), finds 400million
>> matches and returns 100 rows.
>>
>> Does the Solr searcher holds up the reference to objects in memory? I
>> couldn't find any settings that would tell me it does, but every
>> search causing heap to go up is definitely suspicious.
>>
>> 2) I ran the jmap histo to get the top objects (this is on a smaller
>> instance with 2 G memory, this is before running search - after
>> running search I wasn't able to run jmap),
>>
>>  num     #instances         #bytes  class name
>> --
>>   1:       3890855      222608992  [C
>>   2:       3891673      155666920  java.lang.String
>>   3:       3284341      131373640  org.apache.lucene.index.TermInfo
>>   4:       3334198      106694336  org.apache.lucene.index.Term
>>   5:           271       26286496  [J
>>   6:            16       26273936  [Lorg.apache.lucene.index.Term;
>>   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
>>   8:        320512       15384576
>> org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>   9:         10335       11554136  [I
>>
>> I'm not sure what's the first one (C)? I couldn't profile it to know
>> what all the Strings are being allocated by - any ideas?
>>
>> Any ideas on what Searcher might be holding on and how can we change
>> that behavior?
>>
>> Thanks,
>> -vivek
>>
>>
>> On Thu, May 14, 2009 at 11:33 AM, vivek sar  wrote:
>>
>>>
>>> I don't know if field type has any impact on the memory usage - does it?
>>>
>>> Our use cases require complete matches, thus there is no need of any
>>> analysis in most cases - does it matter in terms of memory usage?
>>>
>>> Also, is there any default caching used by Solr if I comment out all
>>> the caches under query in solrconfig.xml? I also don't have any
>>> auto-warming queries.
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson 
>>> wrote:
>>>

 Warning: I'm wy out of my competency range when I comment
 on SOLR, but I've seen the statement that string fields are NOT
 tokenized while text fields are, and I notice that almost all of your
 fields
 are string type.

 Would someone more knowledgeable than me care to comment on whether
 t