facet.missing=true returns null records with zero count also
All,

We had a requirement in our Solr-powered application where customers want to see all the documents that have a blank value for a field. So when they facet on a field, if the field has null values, they should be able to select that facet value and see all documents. I thought facet.missing=true was the answer. When I set facet.missing=true in solrconfig.xml, I expected to get facet values that are null along with their count. However, when there is no null value, I do not want the null to be returned along with a count of zero, which is what is happening now.

Background information: Using SolrJ with Solr 3.4 and jdk7

Sample program:

    SolrQuery facquery = new SolrQuery();
    facquery.setQuery("*:*");
    facquery.addFilterQuery("Field2:\"ISC\"");
    facquery.setRows(0);
    facquery.setFacet(true);
    facquery.setFacetMinCount(1);
    facquery.setFacetLimit(2);
    String[] orderedFacetList = new String[] {"Field1", "Field2", "Field3"};
    for (int i = 0; i < orderedFacetList.length; i++) {
        facquery.addFacetField(orderedFacetList[i]);
    }
    try {
        facResponse = server.query(facquery);
    } catch (SolrServerException ex) {
    }
    FacetField ff1 = facResponse.getFacetField("Field2");
    int count = ff1.getValueCount(); // This gives a count of 2
    List flist = ff1.getValues();    // The values are [ISC (1077), null (0)]

In the above program, I am applying a filter on the field Field2 with a value ISC, so the results will be only documents that have ISC for Field2. My expectation is that flist in the above program should only return [ISC (1077)].

Appreciate any pointers on this. Thank you

- Rahul
Re: facet.missing=true returns null records with zero count also
Hoss,

We rely heavily on facet.mincount because once a user has selected a facet, it doesn't make sense for us to show that facet field to him and let him filter again with the same facet. Also, when a facet has only one value, it doesn't make sense to show it to the user, since searching with that facet is just going to give the same result set again. So facet.missing not working with facet.mincount is a bit of a hassle for us. We will handle it in our program. Thank you for the clarification.

- Rahul

On Wed, Jun 5, 2013 at 12:32 AM, Chris Hostetter wrote:
>
> : that facet value and see all documents. I thought facet.missing=true was
> : the answer.
>       ...
> : facquery.setFacetMinCount(1);
>
> Hmm, yeah -- it looks like facet.missing doesn't take facet.mincount into
> consideration.
>
> I don't remember if that was intentional or not, but as a special-case
> one-off count it seems like a toss-up as to whether it would be more or
> less surprising to hide it if it's below the mincount. (It's very similar
> to doing a one-off facet.query for example, and those are always included in
> the response and don't consider the facet.mincount either.)
>
> In general, this seems like a low-impact thing though, correct? I mean:
> the main advantage of facet.mincount is to keep what could be a very
> large amount of useless data from being streamed from the server to the
> client, particularly in the case of using facet.sort, where you really need
> the constraints eliminated server-side in order to get the sort and limit
> applied correctly.
>
> But with the facet.missing value, it's just a single value per field that
> can easily be ignored by the client if it's not desired because of the
> mincount. Or to put it another way: the amount of work needed to ignore
> this on the client is less than the amount of work to make it
> configurable to ignore it on the server.
>
>
> -Hoss
>
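A minimal SolrJ sketch of the client-side handling Hoss describes — dropping the facet.missing bucket (it comes back with a null name) when its count is below the mincount. The helper name is illustrative and the API assumed is SolrJ 3.x:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.response.FacetField;

    // Strip the "missing" entry, e.g. null (0), when it falls below
    // the mincount we asked for; everything else passes through.
    static List<FacetField.Count> applyMinCount(FacetField ff, long minCount) {
        List<FacetField.Count> keep = new ArrayList<FacetField.Count>();
        for (FacetField.Count c : ff.getValues()) {
            if (c.getName() == null && c.getCount() < minCount) {
                continue;
            }
            keep.add(c);
        }
        return keep;
    }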
OR query with null value and non-null value(s)
I have recently enabled facet.missing=true in solrconfig.xml, which gives null facet values also. As I understand it, the syntax to do a faceted search on a null value is something like this:
&fq=-price:[* TO *]
So when I want to search on a particular value (for example: 4) OR the null value, I would expect the syntax to be something like this:
&fq=(price:4+OR+(-price:[* TO *]))
But this does not work. After searching around some more, I read somewhere that the right way to achieve this would be:
fq=-(-price:4+AND+price:[*+TO+*])
Now this does work, but it seems like a very roundabout way. Is there a better way to achieve this? I use SolrJ with Solr 3.4. Thank you.

- Rahul
Re: OR query with null value and non-null value(s)
Thank you Shawn. This does work. To help me understand better, why do we need the *:* ? Shouldn't it be implicit? Shouldn't
fq=(price:4+OR+(-price:[* TO *]))      // does not work
mean the same as
fq=(price:4+OR+(*:* -price:[* TO *]))  // works
Why does Solr need the *:* there?

On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey wrote:
> On 6/6/2013 12:28 PM, Rahul R wrote:
>> I have recently enabled facet.missing=true in solrconfig.xml which gives
>> null facet values also. As I understand it, the syntax to do a faceted
>> search on a null value is something like this:
>> &fq=-price:[* TO *]
>> So when I want to search on a particular value (for example : 4) OR null
>> value, I would expect the syntax to be something like this:
>> &fq=(price:4+OR+(-price:[* TO *]))
>> But this does not work. After searching around for more, read somewhere
>> that the right way to achieve this would be:
>> fq=-(-price:4+AND+price:[*+TO+*])
>> Now this does work but seems like a very roundabout way. Is there a better
>> way to achieve this ?
>
> Pure negative queries don't work -- you have to have results in the query
> before you can subtract. For some top-level queries, Solr is able to
> detect this situation and fix it internally, but on inner queries you must
> explicitly state your intentions. It is best if you always use '*:*
> -query' syntax, just to be safe.
>
> fq=(price:4+OR+(*:* -price:[* TO *]))
>
> Thanks,
> Shawn
>
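A short SolrJ sketch of the working form Shawn gives — the *:* hands the negative clause a full document set to subtract from, which is exactly what an inner pure-negative query is missing (field name and value carried over from the example above):

    import org.apache.solr.client.solrj.SolrQuery;

    // Match price:4 OR documents that have no price at all.
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("price:4 OR (*:* -price:[* TO *])");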
Re: OR query with null value and non-null value(s)
Thank you for the clarification Shawn.

On Fri, Jun 7, 2013 at 7:34 PM, Jack Krupansky wrote:
> Yes, it SHOULD! And in the LucidWorks Search query parser it does. Why
> doesn't it in Solr? Ask Yonik to explain that!
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rahul R
> Sent: Friday, June 07, 2013 1:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: OR query with null value and non-null value(s)
>
> Thank you Shawn. This does work. To help me understand better, why do
> we need the *:* ? Shouldn't it be implicit ? Shouldn't
> fq=(price:4+OR+(-price:[* TO *]))      // does not work
> mean the same as
> fq=(price:4+OR+(*:* -price:[* TO *]))  // works
>
> Why does Solr need the *:* there ?
License Info
Hello,

Since Apache Solr is governed by the Apache License 2.0 - does it mean that all jar files bundled within Solr are also governed by the same license? Do I have to worry about checking the license information of all bundled jar files in my commercial Solr-powered application? Even if I use them independently of Solr, will the same license apply? Some of the jar files - slf4j-api-1.6.1.jar, jcl-over-slf4j-1.6.1.jar, etc. - do not have any license file inside the jar.

Regards
Rahul
Lucene FieldCache - Out of memory exception
Hello,

I am using Solr 1.3 with jdk 1.5.0_14 and the Weblogic 10MP1 application server on Solaris. I use the embedded Solr server. More details:
Number of docs in solr index : 1.4 million
Physical size of index : 640MB
Total number of fields in the index : 700 (99% of these are dynamic fields)
Total number of fields enabled for faceting : 440
Avg number of facet fields participating in a faceted query : 50-70
Total RAM allocated to weblogic appserver : 3GB (max possible)

In a multi-user environment with 3 users using this application for a period of around 40 minutes, the application runs out of memory. Analysis of the heap dump shows that almost 85% of the memory is retained by the FieldCache. Now I understand that the field cache is out of our control, but I would appreciate some suggestions on how to handle this issue.

Some questions on this front:
- Some mail threads on this forum seem to indicate that there could be some connection between having dynamic fields and usage of the FieldCache. Is this true? Most of the fields in my index are dynamic fields.
- As mentioned above, most of my faceted queries could have around 50-70 facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields per query). Could this be the source of the problem? Is this too high for Solr to support?
- Initially, I had a facet.sort defined in solrconfig.xml. Since the FieldCache builds up on sorting, I even removed the facet.sort and tried, but no respite: the behavior is the same as before.
- The document id that I have for each document is quite big (around 50 characters on average). Can this be a problem? I reduced this to around 15 characters and tried, but still there is no improvement.
- Can the size of the data be a problem? But on this forum, I see many users talking of more than 100 million documents in their index. I have only 1.4 million with a physical size of 640MB. The physical server on which this application is running has sufficient RAM and CPU.
- What gets stored in the FieldCache? Is it the entire document or just the document id?

Any help is much appreciated. Thank you.

regards
Rahul
Re: get a total count
Hello,

A related question on this topic. How do I programmatically find the total number of documents across many shards? For EmbeddedSolrServer, I use the following command to get the total count:
solrSearcher.getStatistics().get("numDocs")
With distributed search, how do I get the count of all records in all shards? Apart from doing a *:* query, is there a way to get the total count? I am not able to use the same command above because I am not able to get a handle to the SolrIndexSearcher object with distributed search. The conf and data directories of my index reside directly under a folder called solr (no core) under the weblogic domain. I don't have a SolrCore object. With EmbeddedSolrServer, I used to get the SolrIndexSearcher object using the following call:
solrSearcher = (SolrIndexSearcher)SolrCoreObject.getSearcher().get();

Stack information:
OS : Solaris
jdk : 1.5.0_14 32 bit
Solr : 1.3
App Server : Weblogic 10MP1

Thank you.
- Rahul

On Tue, Nov 15, 2011 at 10:49 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> I'm assuming the question was about how MANY documents have been indexed
> across all shards.
>
> Answer #1:
> Look at the Solr Admin Stats page on each of your Solr instances and add
> up the numDocs numbers you see there
>
> Answer #2:
> Use Sematext's free Performance Monitoring tool for Solr
> On the Index report choose "all, sum" in the Solr Host selector and that will
> show you the total # of docs across the cluster, total # of deleted docs,
> total segments, total size on disk, etc.
> URL: http://www.sematext.com/spm/solr-performance-monitoring/index.html
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> > From: U Anonym
> > To: solr-user@lucene.apache.org
> > Sent: Monday, November 14, 2011 11:50 AM
> > Subject: get a total count
> >
> > Hello everyone,
> >
> > A newbie question: how do I find out how documents have been indexed
> > across all shards?
> >
> > Thanks much!
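For what it's worth, a hedged SolrJ sketch of the *:* approach mentioned above — rows=0 keeps the response tiny, and numFound is the merged count across all shards, so no SolrIndexSearcher handle is needed (shard URLs are illustrative; exception handling omitted):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    // Ask any one instance for zero rows across all shards and read the
    // merged hit count from the response.
    SolrServer server = new CommonsHttpSolrServer("http://host1:7001/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);
    q.set("shards", "host1:7001/solr,host2:7001/solr");
    long total = server.query(q).getResults().getNumFound();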
Re: Lucene FieldCache - Out of memory exception
Here is one sample query that I picked up from the log file:

q=*%3A*&fq=Category%3A%223__107%22&fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S_C1503120369&facet.field=S_P1406389942&facet.field=S_P1430116878&facet.field=S_P1430116881&facet.field=S_P1406453552&facet.field=S_P1406451296&facet.field=S_P1406452465&facet.field=S_C2968809156&facet.field=S_P1406389980&facet.field=S_P1540477699&facet.field=S_P1406389982&facet.field=S_P1406389984&facet.field=S_P1406451284&facet.field=S_P1406389926&facet.field=S_P1424886581&facet.field=S_P2017662632&facet.field=F_P1946367021&facet.field=S_P1430116884&facet.field=S_P2017662620&facet.field=F_P1406451304&facet.field=F_P1406451306&facet.field=F_P1406451308&facet.field=S_P1500901421&facet.field=S_P1507138990&facet.field=I_P1406452433&facet.field=I_P1406453565&facet.field=I_P1406452463&facet.field=I_P1406453573&facet.field=I_P1406451324&facet.field=I_P1406451288&facet.field=S_P1406451282&facet.field=S_P1406452471&facet.field=S_P1424886605&facet.field=S_P1946367015&facet.field=S_P1424886598&facet.field=S_P1946367018&facet.field=S_P1406453556&facet.field=S_P1406389932&facet.field=S_P2017662623&facet.field=S_P1406450978&facet.field=F_P1406452455&facet.field=S_P1406389972&facet.field=S_P1406389974&facet.field=S_P1406389986&facet.field=F_P1946367027&facet.field=F_P1406451294&facet.field=F_P1406451286&facet.field=F_P1406451328&facet.field=S_P1424886593&facet.field=S_P1406453567&facet.field=S_P2017662629&facet.field=S_P1406453571&facet.field=F_P1946367030&facet.field=S_P1406453569&facet.field=S_P2017662626&facet.field=S_P1406389978&facet.field=F_P1946367024

My primary question here is: can Solr handle these kinds of queries with so many facet fields? I have tried using both enum and fc for facet.method and there is no improvement with either. Appreciate any help on this. Thank you.

- Rahul
Re: Lucene FieldCache - Out of memory exception
Jack,

Yes, the queries work fine till I hit the OOM. The fields that start with S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field definitions from schema.xml :

*Each FieldCache will be an array with maxdoc entries (your total number of documents - 1.4 million) times the size of the field value or whatever a string reference is in your JVM*
So if I understand correctly - every field (dynamic or normal) will have its own field cache. The size of the field cache for any field will be (maxDocs * sizeOfField)? If the field has only 100 unique values, will it occupy (100 * sizeOfField) or will it still be (maxDocs * sizeOfField)?

*Roughly what is the typical or average length of one of your facet field values? And, on average, how many unique terms are there within a typical faceted field?*
Each field value may vary from 10 - 30 characters. Average of 20 maybe. The number of unique terms within a faceted field will vary from 100 - 1000. Average of 300. How will the number of unique terms affect performance?

*3 GB sounds like it might not be enough for such heavy use of faceting. It is probably not the 50-70 number, but the 440 or accumulated number across many queries that pushes the memory usage up*
I am using jdk1.5.0_14 - 32 bit. With a 32-bit jdk, I think there is a limitation that more RAM cannot be allocated.

*When you hit OOM, what does the Solr admin stats display say for FieldCache?*
I don't have Solr deployed as a separate web app. All Solr jar files are present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So is there a way I can get this information that the admin would show?

Thank you for your time.

-Rahul

On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky wrote:
> The FieldCache gets populated the first time a given field is referenced
> as a facet and then will stay around forever. So, as additional queries get
> executed with different facet fields, the number of FieldCache entries will
> grow.
>
> If I understand what you have said, these faceted queries do work
> initially, but after awhile they stop working with OOM, correct?
>
> The size of a single FieldCache depends on the field type. Since you are
> using dynamic fields, it depends on your "dynamicField" types - which you
> have not told us about. From your query I see that your fields start with
> "S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
> Are they strings, integers, floats, or what?
>
> Each FieldCache will be an array with maxdoc entries (your total number of
> documents - 1.4 million) times the size of the field value or whatever a
> string reference is in your JVM.
>
> String fields will take more space than numeric fields for the FieldCache,
> since a separate table is maintained for the unique terms in that field.
> Roughly what is the typical or average length of one of your facet field
> values? And, on average, how many unique terms are there within a typical
> faceted field?
>
> If you can convert many of these faceted fields to simple integers the
> size should go down dramatically, but that depends on your application.
>
> 3 GB sounds like it might not be enough for such heavy use of faceting. It
> is probably not the 50-70 number, but the 440 or accumulated number across
> many queries that pushes the memory usage up.
>
> When you hit OOM, what does the Solr admin stats display say for
> FieldCache?
>
> -- Jack Krupansky
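Plugging this thread's numbers into Jack's description gives a rough sense of why a 3 GB heap eventually runs out. The per-entry byte costs below are assumptions for a 32-bit JVM, not measured values; the document and field counts come from the thread:

    // Rough FieldCache estimate: one maxdoc-sized array per faceted
    // string field, plus that field's unique-term table.
    long maxDoc = 1400000L;      // documents in the index
    int refBytes = 4;            // assumed per-doc string reference (32-bit JVM)
    int uniqueTerms = 300;       // average unique values per field
    int bytesPerTerm = 80;       // ~20 chars per value, with String overhead (assumed)
    long perField = maxDoc * refBytes + (long) uniqueTerms * bytesPerTerm;
    // ~5.4 MB per string field; across all 440 facet-enabled fields:
    long worstCase = perField * 440;   // ~2.3 GB once every field has been hit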
Re: Lucene FieldCache - Out of memory exception
Jack,

Sorry for the delayed response:
Total memory allocated : 3GB
Free memory on startup of application server : 2.85GB (95%)
Free memory after first request by first user (1 request involves 3 queries) : 2.7GB (90%)
Free memory after a few requests by same user : 2.52GB (84%)

All values above were recorded after 2 forced GCs to identify the free memory.

The progression of memory usage looks quite high with the above numbers. As the number of searches widens, the speed of memory consumption decreases. But at some point it does hit OOM.

- Rahul

On Thu, May 3, 2012 at 8:37 PM, Jack Krupansky wrote:
> Just for a baseline, how much memory is available in the JVM (using
> jconsole or something similar) before you do your first query, and then
> after your first query (that has these 50-70 facets), and then after a few
> different queries (different facets.) Just to see how close you are to "the
> edge" even before a volume of queries start coming in.
>
> -- Jack Krupansky
Re: Lucene FieldCache - Out of memory exception
An update on the things I tried today. Since multiValued fields do not use the FieldCache, I changed my schema to define all my fields as multiValued fields. Although these fields need to be only single-valued, I made this change, recreated the index and tested with it.

Observations:
- A forced GC always results in freeing up most of the heap, i.e. the FieldCache doesn't seem to be created. So the OOM issue does not occur.
- Response time is terribly slow for faceting queries. The application is almost unusable, and system monitoring shows high CPU usage.
- Using the Solr caches - documentCache, filterCache & queryResultsCache - does not seem to improve performance. Cache sizes are documentCache - 100K, filterCache - 10K, queryResultsCache - 10K.

I don't think I can use this as a solution because response times are very poor. But a few questions:
- The Solr documentation indicates that the FieldCache gets built up on sorting and function queries only. When I use single-valued fields, I don't do any explicit sorting or use any functions. Could there be some setting that results in automatic sorting happening on the result set (although I don't want a sort)?
- Is there a way I can improve faceting performance with all my fields as multiValued fields?

Appreciate any help on this. Thank you.

- Rahul
Solr Caches
Hello,

I am trying to understand how I can size the caches for my Solr-powered application. Some details on the index and application:
Solr Version : 1.3
JDK : 1.5.0_14 32 bit
OS : Solaris 10
App Server : Weblogic 10 MP1
Number of documents : 1 million
Total number of fields : 1000 (750 strings, 225 int/float/double/long, 25 boolean)
Number of fields on which faceting and filtering can be done : 400
Physical size of index : 600MB
Number of unique values for a field : ranges from 5 - 1000, average of 150
-Xms and -Xmx vals for jvm : 3G
Expected number of concurrent users : 15
No sorting planned for now

Now I want to set appropriate values for the caches. I have put below some of my understanding and questions about the caches. Please correct and answer accordingly.

FilterCache: As per the Solr wiki, this is used to store an unordered list of ids of matching documents for an fq param. So if a query contains two fq params, it will create two separate entries for each of these fq params. The value of each entry is the list of ids of all documents across the index that match the corresponding fq param. Each entry is independent of any other entry. A minimum size for the filterCache could be (total number of fields * avg number of unique values per field)? Is this correct? I have not enabled . The max physical size of the filterCache would be (size * avg byte size of a document id * avg number of docs returned per fq param)?

QueryResultsCache: Used to store an ordered list of ids of the documents that match the most commonly used searches. So if my query is something like q=Status:Active&fq=Org:Apache&fq=Version:13, it will create one entry that contains the list of ids of documents that match this full query. Is this correct? How can I size my queryResultsCache? Some entries from solrconfig.xml : 50 200. The max physical size of the queryResultsCache would be (size * avg byte size of a document id * avg number of docs per query). Is this correct?

documentCache: Stores the documents that are stored in the index. Say I do two searches that return three documents each, with 1 document being common between both result sets. This will result in 5 entries in the documentCache for the 5 unique documents that have been returned for the two queries? Is this correct? For sizing, the Solr wiki states that "The size for the documentCache should always be greater than <max_results> * <max_concurrent_queries>". Why do we need the max_concurrent_queries parameter here? Is it when max_results is much smaller than numDocs? In my case, a q=*:* search is done the first time the index is loaded. So, will setting the documentCache size to numDocs be correct? Can this be like the max that I need to allocate? The max physical size of the documentCache would be (size * avg byte size of a document in the index). Is this correct?

Thank you
-Rahul
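A short sketch of the sizing arithmetic laid out above, using this post's own numbers; the rows-per-page figure is an assumption, and the wiki rule referenced is size > max_results * max_concurrent_queries:

    // filterCache: at minimum, one entry per (field, unique value) pair
    // that can ever appear in an fq.
    int facetFields = 400;       // fields faceted/filtered on
    int avgUniqueValues = 150;   // average unique values per field
    int filterCacheMin = facetFields * avgUniqueValues;  // 60,000 entries

    // documentCache: the wiki rule of thumb, so an in-flight query never
    // has its documents evicted underneath it.
    int maxResults = 50;              // rows per page (assumed)
    int maxConcurrentQueries = 15;    // expected concurrent users
    int documentCacheMin = maxResults * maxConcurrentQueries;  // 750 entries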
Limiting facets for huge data - setting indexed=false in schema.xml
Hello,

We are trying to get Solr to work for a really huge parts database. Details of the database:
- 55 million parts
- 3700 properties (facets) in total. But each record will not have a value for every property.
- Most of these facets are defined as dynamic fields within the Solr index

We were getting really unacceptable timings while doing faceting/searches on an index created with this database. With only one user using the system, query times are in excess of 1 minute. With more users concurrently using the system, the response times are even higher. We thought that by limiting the number of properties that are available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only these properties. All other properties, which are defined as dynamic properties, had indexed=false. The observations after this change:
- Index size reduced by a meagre 5% only
- Performance did not improve. In fact, during the PSR run we observed that it degraded.

My questions:
- Will reducing the number of facets improve faceting and search performance?
- Is there a better way to reduce the number of facets?
- Will having a large number of properties defined as dynamic fields reduce performance?

Thank you.

Regards
Rahul
Re: Limiting facets for huge data - setting indexed=false in schema.xml
Erik,

I understand that caching is going to improve performance. In fact, we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative, because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virtually no evictions. This is not a practical case.

My hardware (in the PSR environment where I am testing) is pretty good - 12 CPU, 24 GB RAM, UltraSPARC III 1.2 GHz processors, Solaris 10. We have allocated 3.2 GB RAM for Weblogic (the JVM). This is the maximum that I am able to allocate for one JVM.

I think I need to go back and check whether I am not using all the fields in the query. I understand that setting indexed=false alone will not ensure that all fields don't participate in the query.

Thanks a lot for your response.

Regards
Rahul

On Fri, Jul 31, 2009 at 3:33 PM, Erik Hatcher wrote:
>
> On Jul 31, 2009, at 2:35 AM, Rahul R wrote:
>
>> Hello,
>> We are trying to get Solr to work for a really huge parts database. Details
>> of the database
>> - 55 million parts
>> - Totally 3700 properties (facets). But each record will not have value for
>> all properties.
>> - Most of these facets are defined as dynamic fields within the Solr Index
>>
>> We were getting really unacceptable timing while doing faceting/searches on
>> an index created with this database.
>
> Were you accounting for cache warming? Were your caches sized
> appropriately? What kind of hardware and RAM were you using? What were the
> JVM settings?
>
> And certainly not least important - what version of Solr are you running?
> The difference in faceting performance and scalability between Solr 1.3 and
> what will be Solr 1.4 is quite dramatic.
>
>> We thought that by limiting the number of properties that are available for
>> faceting, the performance can be improved. To test this, we enabled only 6
>> properties for faceting by setting indexed=true (in schema.xml) for only
>> these properties. All other properties which are defined as dynamic
>> properties had indexed=false.
>
> These settings won't matter - what matters in this case is what facets you
> request, not what is actually in the index.
>
>> My questions:
>> - Will reducing the number of facets improve faceting and search
>> performance ?
>
> Reducing what fields you request will, of course. But what you actually
> index has no effect on performance until you request it.
>
>> - Is there a better way to reduce the number of facets ?
>
> Hard to say without doing a deeper analysis of your needs.
>
>> - Will having a large number of properties defined as dynamic fields, reduce
>> performance ?
>
> Dynamic fields versus statically named fields have no effect on
> performance.
>
>    Erik
>
Re: Limiting facets for huge data - setting indexed=false in schema.xml
In a production environment, having the caches enabled makes a lot of sense, and most definitely we will be enabling them. However, the primary idea of this exercise is to verify whether limiting the number of facets will actually improve the performance.

An update on this. I did verify, and it looks like although I set indexed=false for most of the properties, I had not blocked them from participating in the query. I have now enabled only 7 properties for faceting, so at any given time only a maximum of 7 facets will participate in the query. Performance has now improved from an erstwhile 60 seconds to around 10 seconds.

This really helped. Thanks a lot!

Regards
Rahul

On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher wrote:
>
> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>
>> Erik,
>> I understand that caching is going to improve performance. In fact we did a
>> PSR run with caches enabled and we got awesome results. But these wouldn't
>> be really representative because the PSR scripts will be doing the same
>> searches again and again. These would be cached and there would be virtually
>> no evictions. This is not a practical case.
>
> I don't understand how this is not practical. Why wouldn't having the
> caches warmed and filled with the facets be practical for your needs?
>
>    Erik
>
Re: Limiting facets for huge data - setting indexed=false in schema.xml
We are using 1.3.0. Thanks for the suggestion. Will see if I can try one of the nightly builds.

On Fri, Jul 31, 2009 at 7:49 PM, Erik Hatcher wrote:
> What version of Solr? Try a nightly build if you're at Solr 1.3 or
> earlier and you'll be amazed at the difference.
>
>    Erik
JVM Heap utilization & Memory leaks with Solr
I am trying to track memory utilization with my application that uses Solr. Details of the setup:
- 3rd party software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0
- Hardware : 12 CPU, 24 GB RAM

For testing during PSR I am using a smaller subset of the actual data that I want to work with. Details of this smaller sub-set:
- 5 million records, 4.5 GB index size

Observations during PSR:
A) I have allocated 3.2 GB for the JVM(s) that I used. After all users log out and after a forced GC, only 60% of the heap is reclaimed. As part of the logout process I am invalidating the HttpSession and doing a close() on CoreContainer. From my application's side, I don't believe I am holding on to any resource. I wanted to know if there are known issues surrounding memory leaks with Solr?
B) To further test this, I tried deploying with shards. 3.2 GB was allocated to each JVM. All JVMs had 96% free heap space after start up. I got varying results with this.
Case 1 : Used 6 weblogic domains. My application was deployed on 1 domain. I split the 5 million index into 5 parts of 1 million each and used them as shards. After multiple users used the system and after a forced GC, around 94-96% of heap was reclaimed in all the JVMs.
Case 2 : Used 2 weblogic domains. My application was deployed on 1 domain. On the other, I deployed the entire 5 million document index as one shard. After multiple users used the system and after a forced GC, around 76% of the heap was reclaimed in the shard JVM, and 96% was reclaimed in the JVM where my application was running. This result further convinces me that my application can be absolved of holding on to memory resources.

I am not sure how to interpret these results. For searching, I am using:
Without shards : EmbeddedSolrServer
With shards : CommonsHttpSolrServer
In terms of Solr objects, this is what differs in my code between normal search and shards search (distributed search).

After looking at Case 1, I thought that the CommonsHttpSolrServer was more memory efficient, but Case 2 proved me wrong. Or could there still be memory leaks in my application? Any thoughts or suggestions would be welcome.

Regards
Rahul
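For reference, a small sketch of the measurement procedure described above — System.gc() is only a request to the JVM, which is why it is common to issue it more than once before sampling:

    // Force GC, then report how much heap is still retained.
    Runtime rt = Runtime.getRuntime();
    System.gc();
    System.gc();
    long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    System.out.println("Heap retained after forced GC: " + usedMb + " MB");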
Re: Rotating the primary shard in /solr/select
Philip,

I cannot answer your question, but I do have a question for you. Does aggregation happen at the primary shard? For example, if I have three JVMs:
JVM 1 : My application powered by Solr
JVM 2 : Shard 1
JVM 3 : Shard 2

I initialize my SolrServer like this:
SolrServer _solrServer = new CommonsHttpSolrServer(shard1);

Does aggregation now happen at JVM 2? Is there any other reason for initializing the SolrServer with one of the shard URLs?

On Wed, Jul 29, 2009 at 2:57 AM, Phillip Farber wrote:
>
> Is there any value in a round-robin scheme to cycle through the Solr
> instances supporting a multi-shard index over several machines when sending
> queries, or is it better to just pick one instance and stick with it? I'm
> assuming all machines in the cluster have the same hardware specs.
>
> So scenario A (round-robin):
>
> query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard2
> query 2: /solr-shard-2/select?q=dog... shards=shard-1,shard2
> query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard2
> etc.
>
> or scenario B (fixed):
>
> query 1: /solr-shard-1/select?q=dog... shards=shard-1,shard2
> query 2: /solr-shard-1/select?q=dog... shards=shard-1,shard2
> query 3: /solr-shard-1/select?q=dog... shards=shard-1,shard2
> etc.
>
> Is there evidence that distributing the overhead of result merging over
> more machines (A) gives a performance boost?
>
> Thanks,
>
> Phil
>
Re: Rotating the primary shard in /solr/select
*The SolrServer is initialized to the server to which you want to send the request. It has nothing to do with distributed search by itself.*

But isn't the request sent to all the shards? We set all the shard URLs in the 'shards' parameter of our HttpRequest. Or is it something like: the request is first sent to the server (with which SolrServer is initialized) and from there it is sent to all the other shards?

Regards
Rahul

On Tue, Aug 4, 2009 at 2:29 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Tue, Aug 4, 2009 at 11:26 AM, Rahul R wrote:
>
> > Philip,
> > I cannot answer your question, but I do have a question for you. Does
> > aggregation happen at the primary shard ? For eg : if I have three JVMs
> > JVM 1 : My application powered by Solr
> > JVM 2 : Shard 1
> > JVM 3 : Shard 2
> >
> > I initialize my SolrServer like this
> > SolrServer _solrServer = new CommonsHttpSolrServer(shard1);
> >
> > Does aggregation now happen at JVM 2 ?
>
> Yes.
>
> > Is there any other reason for
> > initializing the SolrServer with one of the shard URLs ?
>
> The SolrServer is initialized to the server to which you want to send the
> request. It has nothing to do with distributed search by itself.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Rotating the primary shard in /solr/select
Shalin, thank you for the clarification. Philip, I just realized that I have diverted the original topic of the thread. My apologies. Regards Rahul On Tue, Aug 4, 2009 at 3:35 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Tue, Aug 4, 2009 at 2:37 PM, Rahul R wrote: > > > *The SolrServer is initialized to the server to which you want to send > the > > request. It has nothing to do with distributed search by itself.* > > > > But isn't the request sent to all the shards ? We set all the shard urls > in > > the 'shards' parameter of our HttpRequest.Or is it something like the > > request is first sent to the server (with which SolrServer is > initialized) > > and from there it is sent to all the other shards ? > > > > The request is sent to the server with which SolrServer is initialized. > That > server makes use of the shards parameter, queries other servers, merges the > responses and sends it back to the client. > > -- > Regards, > Shalin Shekhar Mangar. >
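Tying the two threads together, a hedged SolrJ sketch of Phillip's scenario A given Shalin's explanation: every request carries the same shards list, but the instance doing the merge rotates. Host names and the counter are illustrative, and exception handling is omitted:

    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    String[] shards = {"host1:7001/solr", "host2:7001/solr"};
    AtomicInteger turn = new AtomicInteger();

    // Round-robin choice of aggregator: merge work is spread across the
    // cluster instead of always landing on shard 1.
    int i = turn.getAndIncrement() % shards.length;
    SolrServer server = new CommonsHttpSolrServer("http://" + shards[i]);
    SolrQuery q = new SolrQuery("dog");
    q.set("shards", shards[0] + "," + shards[1]);
    server.query(q);  // the chosen instance queries both shards and merges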
Re: JVM Heap utilization & Memory leaks with Solr
Otis,

Thank you for your response. I know there are a few variables here, but the difference in memory utilization with and without shards somehow leads me to believe that the leak could be within Solr.

I tried using a profiling tool - YourKit. The trial version was free for 15 days, but I couldn't find anything of significance.

Regards
Rahul

On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic wrote:
> Hi Rahul,
>
> A) There are no known (to me) memory leaks.
> I think there are too many variables for a person to tell you what exactly
> is happening, plus you are dealing with the JVM here. :)
>
> Try jmap -histo:live PID-HERE | less and see what's using your memory.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
Re: JVM Heap utilization & Memory leaks with Solr
*You should try to generate heap dumps and analyze the heap using a tool like the Eclipse Memory Analyzer. Maybe it helps spotting a group of objects holding a large amount of memory*

The tool that I used also allows capturing heap snapshots. Eclipse had a lot of pre-requisites: you need to apply some three or five patches before you can start using it. My observations with this tool were that some HashMaps were taking up a lot of space, although I could not pin it down to the exact HashMap. These would either be Weblogic's or Solr's. I will anyway give Eclipse's a try and see how it goes. Thanks for your input.

Rahul

On Wed, Aug 12, 2009 at 2:15 PM, Gunnar Wagenknecht wrote:
> Rahul R schrieb:
> > I tried using a profiling tool - YourKit. The trial version was free for 15
> > days. But I couldn't find anything of significance.
>
> You should try to generate heap dumps and analyze the heap using a tool
> like the Eclipse Memory Analyzer. Maybe it helps spotting a group of
> objects holding a large amount of memory.
>
> -Gunnar
>
> --
> Gunnar Wagenknecht
> gun...@wagenknecht.org
> http://wagenknecht.org/
>
Re: JVM Heap utilization & Memory leaks with Solr
My primary issue is not an Out of Memory error at run time. It is memory leaks: heap space not being released even after doing a force GC. So after some time, as progressively more heap gets utilized, I start running out of memory. The verdict however seems unanimous that there are no known memory leak issues within Solr. I am still looking at my application to analyse the problem. Thank you.

On Thu, Aug 13, 2009 at 10:58 PM, Fuad Efendi wrote:
> Most OutOfMemoryExceptions (if not 100%) happening with SOLR are because of
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/FieldCache.html
> - it is used internally in Lucene to cache field values and document IDs.
>
> My very long-term observations: SOLR can run without any problems for a few
> days/months, and an unpredictable OOM happens just because someone tried a
> sorted search which will populate an array with the IDs of ALL documents in
> the index.
>
> The only solution: calculate exactly the amount of RAM needed for the
> FieldCache... For instance, for 100,000,000 documents a single instance of
> FieldCache may require 8*100,000,000 bytes (8 bytes per document ID?) which
> is almost 1Gb (at least!)
>
> I didn't notice any memory leaks after I started to use 16Gb RAM for the SOLR
> instance (almost a year without any restart!)
Re: JVM Heap utilization & Memory leaks with Solr
Fuad, We have around 5 million documents and around 3700 fields. All documents will not have values for all the fields JRockit is not approved for use within my organization. But thanks for the info anyway. Regards Rahul On Tue, Aug 18, 2009 at 9:41 AM, Funtick wrote: > > BTW, you should really prefer JRockit which really rocks!!! > > "Mission Control" has necessary toolongs; and JRockit produces _nice_ > exception stacktrace (explaining almost everything) in case of even OOM > which SUN JVN still fails to produce. > > > SolrServlet still catches "Throwable": > >} catch (Throwable e) { > SolrException.log(log,e); > sendErr(500, SolrException.toStr(e), request, response); >} finally { > > > > > > Rahul R wrote: > > > > Otis, > > Thank you for your response. I know there are a few variables here but > the > > difference in memory utilization with and without shards somehow leads me > > to > > believe that the leak could be within Solr. > > > > I tried using a profiling tool - Yourkit. The trial version was free for > > 15 > > days. But I couldn't find anything of significance. > > > > Regards > > Rahul > > > > > > On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic > > >> wrote: > > > >> Hi Rahul, > >> > >> A) There are no known (to me) memory leaks. > >> I think there are too many variables for a person to tell you what > >> exactly > >> is happening, plus you are dealing with the JVM here. :) > >> > >> Try jmap -histo:live PID-HERE | less and see what's using your memory. > >> > >> Otis > >> -- > >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls > >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > >> > >> > >> > >> - Original Message > >> > From: Rahul R > >> > To: solr-user@lucene.apache.org > >> > Sent: Tuesday, August 4, 2009 1:09:06 AM > >> > Subject: JVM Heap utilization & Memory leaks with Solr > >> > > >> > I am trying to track memory utilization with my Application that uses > >> Solr. > >> > Details of the setup : > >> > -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr 1.3.0 > >> > - Hardware : 12 CPU, 24 GB RAM > >> > > >> > For testing during PSR I am using a smaller subset of the actual data > >> that I > >> > want to work with. Details of this smaller sub-set : > >> > - 5 million records, 4.5 GB index size > >> > > >> > Observations during PSR: > >> > A) I have allocated 3.2 GB for the JVM(s) that I used. After all users > >> > logout and doing a force GC, only 60 % of the heap is reclaimed. As > >> part > >> of > >> > the logout process I am invalidating the HttpSession and doing a > >> close() > >> on > >> > CoreContainer. From my application's side, I don't believe I am > holding > >> on > >> > to any resource. I wanted to know if there are known issues > surrounding > >> > memory leaks with Solr ? > >> > B) To further test this, I tried deploying with shards. 3.2 GB was > >> allocated > >> > to each JVM. All JVMs had 96 % free heap space after start up. I got > >> varying > >> > results with this. > >> > Case 1 : Used 6 weblogic domains. My application was deployed one 1 > >> domain. > >> > I split the 5 million index into 5 parts of 1 million each and used > >> them > >> as > >> > shards. After multiple users used the system and doing a force GC, > >> around > >> 94 > >> > - 96 % of heap was reclaimed in all the JVMs. > >> > Case 2: Used 2 weblogic domains. My application was deployed on 1 > >> domain. > >> On > >> > the other, I deployed the entire 5 million part index as one shard. 
> >> After > >> > multiple users used the system and doing a force GC, around 76 % of > the >> heap > >> > was reclaimed in the shard JVM. And 96 % was reclaimed in the JVM > where >> my > >> > application was running. This result further convinces me that my > >> > application can be absolved of holding on to memory resources. > >> > > >> > I am not sure how to interpret these results ? For searching, I am > >> using > >> > Without Shards : EmbeddedSolrServer > >> > With Shards : CommonsHttpSolrServer > >> > In terms of Solr objects this is what differs in my code between > normal > >> > search and shards search (distributed search) > >> > > >> > After looking at Case 1, I thought that the CommonsHttpSolrServer was > >> more > >> > memory efficient but Case 2 proved me wrong. Or could there still be > >> memory > >> > leaks in my application ? Any thoughts, suggestions would be welcome. > >> > > >> > Regards > >> > Rahul > >> > >> > > > > > > -- > View this message in context: > http://www.nabble.com/JVM-Heap-utilization---Memory-leaks-with-Solr-tp24802380p25018165.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
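A minimal sketch of the kind of before/after measurement described above, assuming only plain JDK APIs; the helper class and its name are illustrative, and System.gc() is only a hint to the JVM, not a guaranteed full collection:

public class HeapProbe {
    // Log how much heap is still in use before and after requesting a GC,
    // e.g. right after invalidating sessions and closing the CoreContainer.
    public static void logReclaim() {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();
        System.gc(); // a hint only; the JVM may ignore it
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Heap used before GC: " + (before >> 20)
                + " MB, after GC: " + (after >> 20) + " MB");
    }
}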
Re: JVM Heap utilization & Memory leaks with Solr
All these 3700 fields are single-valued non-boolean fields. Thanks Regards Rahul On Wed, Aug 19, 2009 at 8:33 PM, Fuad Efendi wrote: > > Hi Rahul, > > JRockit could be used at least in a test environment to monitor JVM (and > troubleshoot SOLR, licensed for-free for developers!); they have even > Eclipse plugin now, and it is licensed by Oracle (BEA)... But, of course, > in > large companies test environment is in hands of testers :) > > > But... 3700 fields will create (over time) 3700 arrays each of size > 5,000,000!!! Even if most of fields are empty for most of documents... > Applicable to non-tokenized single-valued non-boolean fields only, Lucene > internals, FieldCache... and it won't be GC-collected after user log-off... > prefer dedicated box for SOLR. > > -Fuad > > > -Original Message- > From: Rahul R [mailto:rahul.s...@gmail.com] > Sent: August-19-09 6:19 AM > To: solr-user@lucene.apache.org > Subject: Re: JVM Heap utilization & Memory leaks with Solr > > Fuad, > We have around 5 million documents and around 3700 fields. All documents > will not have values for all the fields. JRockit is not approved for use > within my organization. But thanks for the info anyway. > > Regards > Rahul > > On Tue, Aug 18, 2009 at 9:41 AM, Funtick wrote: > > > > > BTW, you should really prefer JRockit which really rocks!!! > > > > "Mission Control" has the necessary tooling; and JRockit produces _nice_ > > exception stacktrace (explaining almost everything) in case of even OOM > > which the SUN JVM still fails to produce. > > > > > > SolrServlet still catches "Throwable": > > > >} catch (Throwable e) { > > SolrException.log(log,e); > > sendErr(500, SolrException.toStr(e), request, response); > >} finally { > > > > > > > > > > > > Rahul R wrote: > > > > > > Otis, > > > Thank you for your response. I know there are a few variables here but > > the > > > difference in memory utilization with and without shards somehow leads > me > > > to > > > believe that the leak could be within Solr. > > > > > > I tried using a profiling tool - Yourkit. The trial version was free > for > > > 15 > > > days. But I couldn't find anything of significance. > > > > > > Regards > > > Rahul > > > > > > > > > On Tue, Aug 4, 2009 at 7:35 PM, Otis Gospodnetic > > > > >> wrote: > > > > > >> Hi Rahul, > > >> > > >> A) There are no known (to me) memory leaks. > > >> I think there are too many variables for a person to tell you what > > >> exactly > > >> is happening, plus you are dealing with the JVM here. :) > > >> > > >> Try jmap -histo:live PID-HERE | less and see what's using your memory. > > >> > > >> Otis > > >> -- > > >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls > > >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > >> > > >> > > >> > > >> - Original Message > > >> > From: Rahul R > > >> > To: solr-user@lucene.apache.org > > >> > Sent: Tuesday, August 4, 2009 1:09:06 AM > > >> > Subject: JVM Heap utilization & Memory leaks with Solr > > >> > > > >> > I am trying to track memory utilization with my Application that > uses > > >> Solr. > > >> > Details of the setup : > > >> > -3rd party Software : Solaris 10, Weblogic 10, jdk_150_14, Solr > 1.3.0 > > >> > - Hardware : 12 CPU, 24 GB RAM > > >> > > > >> > For testing during PSR I am using a smaller subset of the actual > data > > >> that I > > >> > want to work with. 
Details of this smaller sub-set : > > >> > - 5 million records, 4.5 GB index size > > >> > > > >> > Observations during PSR: > > >> > A) I have allocated 3.2 GB for the JVM(s) that I used. After all > users > > >> > logout and doing a force GC, only 60 % of the heap is reclaimed. As > > >> part > > >> of > > >> > the logout process I am invalidating the HttpSession and doing a > > >> close() > > >> on > > >> > CoreContainer. From my application's side, I don't believe I am > > holding > > >> on > >
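To make Fuad's warning concrete, a back-of-the-envelope sketch; the per-entry size is an assumption (it varies by field type), and the field and document counts are the ones quoted in this thread:

public class FieldCacheEstimate {
    public static void main(String[] args) {
        long maxDoc = 5000000L;  // documents in the index (from this thread)
        int fields = 3700;       // fields that could each populate a cache array
        int bytesPerEntry = 4;   // assumed cost per document per field
        // Worst case: one array of maxDoc entries per field ever sorted or
        // faceted on, even if most documents have no value for that field.
        long bytes = maxDoc * fields * (long) bytesPerEntry;
        System.out.println("Worst-case FieldCache footprint: ~" + (bytes >> 30) + " GB");
    }
}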
Implementing a logout
Hello, Can somebody give me some pointers on the Solr objects I need to clean up/release while doing a logout on a Solr Application. I find that only the SolrCore object has a close() method. I typically do a lot of faceting queries on a large dataset with my application. I am using Solr 1.3.0. Regards Rahul
Re: Implementing a logout
Just clarifying : My query was more specific to Solr. I wanted to check if there are any Solr resources that are session-specific that we need to release. *>> I can't understand: do you use several web applications in a same >> container? >> Are you trying to close shared SolrCore when one of many users (of another >> application) logs off?* I have only one application that is built on top of Solr. I mentioned SolrCore in my mail only because that was the only object that I noticed which had a close() method. I don't intend to do a close() on SolrCore when a particular user logs out (closes his session) *> There is no 'logout'. There is no permanent state in Solr beyond the Lucene > index. There are caches, but these do not require any termination. The > Lucene API has very solid self-protection for the indexes and Solr uses the > API in the right way.* I understand Solr application does not have a logout. Please correct me if I am wrong but you seem to be stating that there is no explicit action required from our side to release any Solr resources when a user terminates his/her session of a Solr based application. If that is the case, then my query is answered. Thank you all. Regards Rahul On Sun, Aug 23, 2009 at 7:16 AM, Lance Norskog wrote: > Sorry, hit 'send' too soon. You can kill the servlet process, but it is > much > better to use the servlet container's shutdown protocol. > > On Sat, Aug 22, 2009 at 6:46 PM, Lance Norskog wrote: > > > There is no 'logout'. There is no permanent state in Solr beyond the > Lucene > > index. There are caches, but these do not require any termination. The > > Lucene API has very solid self-protection for the indexes and Solr uses > the > > API in the right way. > > > > If you run a Solr distribution in a standard servlet container, you can > > just use the servlet's shutdown protocol. If you call a commit with > > waitFlush=true, then do not index any records, you can kill the servlet > > process. > > > > On Fri, Aug 21, 2009 at 7:45 AM, Fuad Efendi wrote: > > > >> I can't understand: do you use several web applications in a same > >> container? > >> Are you trying to close shared SolrCore when one of many users (of > another > >> application) logs off? > >> > >> Usually one needs to clean up only user-session specific objects (such > as > >> non-persistent shopping cart)... > >> > >> > >> -Original Message- > >> From: Rahul R [mailto:rahul.s...@gmail.com] > >> Sent: August-21-09 1:20 AM > >> To: solr-user@lucene.apache.org > >> Subject: Implementing a logout > >> > >> Hello, > >> Can somebody give me some pointers on the Solr objects I need to clean > >> up/release while doing a logout on a Solr Application. I find that only > >> the > >> SolrCore object has a close() method. I typically do a lot of faceting > >> queries on a large dataset with my application. I am using Solr 1.3.0. > >> > >> Regards > >> Rahul > >> > >> > >> > > > > > > -- > > Lance Norskog > > goks...@gmail.com > > > > > > > -- > Lance Norskog > goks...@gmail.com >
Re: Implementing a logout
*"release any SOLR resources" - no need.* My query is answered. Thank you. Regards Rahul On Mon, Aug 24, 2009 at 12:32 AM, Fuad Efendi wrote: > Truly correct: > > - SOLR does not create HttpSession for user access to Admin screens (do we > have any other screens of UI?) > - SolrCore is shared object; closing it and reopening for each user session > is extremely expensive; this object requires gigabytes of RAM in even > simplest scenario > > User doesn't have any session with SOLR based application. User may have > session with different application, and this one may use resources provided > by SOLR. > > > "release any SOLR resources" - no need. > > Of course, user session may store pointers to thousands documents retrieved > via SOLR query, - just close the session object. > > > >I understand Solr application does not have a logout. Please correct me if > I > >am wrong but you seem to be stating that there is no explicit action > >required from our side to release any Solr resources when a user > terminates > >his/her session of a Solr based application. If that is the case, then my > >query is answered. > > > >
Monitoring split time for fq queries when filter cache is used
Hello, I am trying to measure the benefit that I am getting out of using the filter cache. As I understand, there are two major parts to an fq query. Please correct me if I am wrong : - doing full index queries of each of the fq params (if filter cache is used, this result will be retrieved from the cache) - set intersection of above results (Will be done again even with filter cache enabled) Is there any flag/setting that I can enable to monitor how much time the above operations take separately i.e. the querying and the set-intersection ? Regards Rahul
Re: Monitoring split time for fq queries when filter cache is used
Thank you Martijn. On Tue, Sep 1, 2009 at 8:07 PM, Martijn v Groningen < martijn.is.h...@gmail.com> wrote: > Hi Rahul, > > Yes, your understanding is correct, but it is not possible to > monitor these actions separately with Solr. > > Martijn > > 2009/9/1 Rahul R : > > Hello, > > I am trying to measure the benefit that I am getting out of using the > filter > > cache. As I understand, there are two major parts to an fq query. Please > > correct me if I am wrong : > > - doing full index queries of each of the fq params (if filter cache is > > used, this result will be retrieved from the cache) > > - set intersection of above results (Will be done again even with filter > > cache enabled) > > > > Is there any flag/setting that I can enable to monitor how much time the > > above operations take separately i.e. the querying and the > set-intersection > > ? > > > > Regards > > Rahul > > > > > > -- > Met vriendelijke groet, > > Martijn van Groningen >
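Since Solr does not break the timing down, a rough client-side approximation is to issue the same filter twice and compare the reported QTime: the first request pays for the index query, the second should be served from the filterCache. A sketch, assuming an already initialized SolrServer and an illustrative field name; note this folds the set intersection into both measurements, so it only approximates the split:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterCacheProbe {
    public static void compare(SolrServer server) throws SolrServerException {
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("inStock:true"); // illustrative field name
        q.setRows(0);
        QueryResponse cold = server.query(q); // filter computed against the index
        QueryResponse warm = server.query(q); // filter served from the filterCache
        System.out.println("cold QTime=" + cold.getQTime()
                + " ms, warm QTime=" + warm.getQTime() + " ms");
    }
}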
Questions on copyField
Hello, I have a few questions regarding the copyField directive in schema.xml 1. Does the destination field store a reference or the actual data ? If I have something like this: <copyField source="name" dest="text"/> then will the values in the 'name' field get copied into the 'text' field or will the 'text' field only store a reference to the 'name' field ? To put it more simply, if I later delete the 'name' field from the index will I lose the corresponding data in the 'text' field ? 2. Is there any inbuilt API which I can use to do the copyField action programmatically ? 3. Can I do a copyField from the schema as well as programmatically for the same destination field Suppose I want the 'text' field to contain values for name, age and location. In my index only 'name' and 'age' are defined as fields. So I can add directives like <copyField source="name" dest="text"/> and <copyField source="age" dest="text"/>. The location however, I want to add it to the 'text' field programmatically. I don't want to store the location as a separate field in the index. Can I do this ? Thank you. Regards Rahul
Re: Questions on copyField
Would appreciate any help on this. Thanks Rahul On Mon, Sep 14, 2009 at 5:12 PM, Rahul R wrote: > Hello, > I have a few questions regarding the copyField directive in schema.xml > > 1. Does the destination field store a reference or the actual data ? > If I have something like this: <copyField source="name" dest="text"/> > > then will the values in the 'name' field get copied into the 'text' field > or will the 'text' field only store a reference to the 'name' field ? To put > it more simply, if I later delete the 'name' field from the index will I > lose the corresponding data in the 'text' field ? > > 2. Is there any inbuilt API which I can use to do the copyField action > programmatically ? > > 3. Can I do a copyField from the schema as well as programmatically for the > same destination field > Suppose I want the 'text' field to contain values for name, age and > location. In my index only 'name' and 'age' are defined as fields. So I can > add directives like <copyField source="name" dest="text"/> and > <copyField source="age" dest="text"/>. > > The location however, I want to add it to the 'text' field > programmatically. I don't want to store the location as a separate field in > the index. Can I do this ? > > Thank you. > > Regards > Rahul >
Re: Questions on copyField
Shalin, Can you please elaborate a little more on the third response *You can send the location's value directly as the value of the text field.* I don't follow. I am adding 'name' and 'age' to the 'text' field through the schema. If I add the 'location' from the program, will either one copy (schema or program) not over-write the other ? *Also note, that you don't really need to index/store the source field. You can make the location field's type as ignored in the schema.* Understood. Thank you for your response. Regards Rahul On Wed, Sep 16, 2009 at 1:56 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Mon, Sep 14, 2009 at 5:12 PM, Rahul R wrote: > > > Hello, > > I have a few questions regarding the copyField directive in schema.xml > > > > 1. Does the destination field store a reference or the actual data ? > > > > It makes a copy. Storing or indexing of the field depends on the field > configuration. > > > > If I have something like this: <copyField source="name" dest="text"/> > > > > then will the values in the 'name' field get copied into the 'text' field > > or > > will the 'text' field only store a reference to the 'name' field ? To put > > it > > more simply, if I later delete the 'name' field from the index will I > lose > > the corresponding data in the 'text' field ? > > > > > The values will get copied. If you delete all values from the 'name' field > from the index, the data in "text" field remain as-is. > > > > > 2. Is there any inbuilt API which I can use to do the copyField action > > programmatically ? > > > > > No. But you can always copy explicitly before sending or you can use a > custom UpdateRequestProcessor to copy values from one field to another > during indexing. > > > > 3. Can I do a copyField from the schema as well as programmatically for > the > > same destination field > > Suppose I want the 'text' field to contain values for name, age and > > location. In my index only 'name' and 'age' are defined as fields. So I > can > > add directives like <copyField source="name" dest="text"/> and > > <copyField source="age" dest="text"/>. > > > > The location however, I want to add it to the 'text' field > > programmatically. > > I don't want to store the location as a separate field in the index. Can > I > > do this ? > > > > > You can send the location's value directly as the value of the text field. > Also note, that you don't really need to index/store the source field. You > can make the location field's type as ignored in the schema. > > -- > Regards, > Shalin Shekhar Mangar. >
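A sketch of Shalin's third suggestion at indexing time; the field names come from this thread, the values are made up, and server is assumed to be an existing SolrServer handle:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithLocation {
    public static void add(SolrServer server) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("name", "Samuel"); // copied into 'text' by schema copyField
        doc.addField("age", "30");      // copied into 'text' by schema copyField
        doc.addField("text", "Texas");  // location value sent straight into 'text';
                                        // it is appended to, not overwriting, the copies
        server.add(doc);
    }
}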
Re: Questions on copyField
Thank you Shalin. Regards Rahul On Thu, Sep 17, 2009 at 11:49 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Thu, Sep 17, 2009 at 11:19 AM, Rahul R wrote: > > > Shalin, > > Can you please elaborate a little more on the third response > > *You can send the location's value directly as the value of the text > > field.* > > I dont follow. I am adding 'name' and 'age' to the 'text' field through > the > > schema. If I add the 'location' from the program, will either one copy > > (schema or program) not over-write the other ? > > > > No, it will not overwrite, it will just append values of name and age to > the > values already sent as the text field. > -- > Regards, > Shalin Shekhar Mangar. >
Question on omitNorms definition
Hello, A rather trivial question on the omitNorms parameter in schema.xml. The out-of-the-box schema.xml uses this parameter both within the <fieldType> tag and within the <field> tag. If we define omitNorms during the fieldType definition, will it hold good for all fields that are defined using the same fieldType ? For eg: if a fieldType is defined with omitNorms="true" and dynamic fields are declared with that type (see the sketch below), will these dynamic fields have omitNorms=true for it ? I have read about significant RAM usage when omitNorms is not set to true. Hence would like to ensure that it is set to true for most of my fields. Regards Rahul
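A hedged reconstruction of the kind of definitions in question (the names are illustrative): attributes such as omitNorms set on a <fieldType> act as defaults for every field or dynamic field declared with that type, unless overridden on the individual <field>:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
...
<dynamicField name="S*" type="string" indexed="true" stored="true"/>
<!-- the S* dynamic fields inherit omitNorms="true" from the "string" fieldType -->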
Measuring timing with debugQuery=true
Hello, I am trying to measure why some of my queries take a long time. I am using EmbeddedSolrServer and with logging statements before and after the EmbeddedSolrServer.query(SolrQuery) function, I have found the time to be around 16s. I added debugQuery=true and the timing component for this reads as follows:

timing: time=2438.0
  prepare: time=0.0
    org.apache.solr.handler.component.QueryComponent: time=0.0
    org.apache.solr.handler.component.FacetComponent: time=0.0
    org.apache.solr.handler.component.MoreLikeThisComponent: time=0.0
    org.apache.solr.handler.component.HighlightComponent: time=0.0
    org.apache.solr.handler.component.DebugComponent: time=0.0
  process: time=2438.0
    org.apache.solr.handler.component.QueryComponent: time=2438.0
    org.apache.solr.handler.component.FacetComponent: time=0.0
    org.apache.solr.handler.component.MoreLikeThisComponent: time=0.0
    org.apache.solr.handler.component.HighlightComponent: time=0.0
    org.apache.solr.handler.component.DebugComponent: time=0.0

As you can see, this shows only 2.4s being used by the query. I can't seem to figure out where the rest of the time is being spent. This is within my office intranet and I don't think the request-response time over the wire will cause significant overhead. So my question : is the timing information presented here comprehensive or are there more time consuming operations that are not represented here ? I guess GC pause times could be one answer (I hope not !) Also, the above result was for a faceted query. I can't understand why the FacetComponent would be zero. Any thoughts ? Rahul
Re: Measuring timing with debugQuery=true
Yonik, I understand that the network can be a bottle-neck but I am pretty sure that it is not. I am operating on a 100 MBPS intranet... How do I ensure that stored fields are cached by the OS ? Only the Solr caches within the JVM are under my control.. The result set has around 10K documents of which I am retrieving only 10..I am displaying a max of only 3 fields per document in my result set. Can the reading time for these stored fields be so long ? I have totally around 1 million documents in my index Any thoughts on why the FacetComponent does not take any time while the QueryComponent takes around 2.4s. I am doing a faceted and keyword query ie I have both 'q' and 'fq' params in my query Thank you for your response. Regards Rahul On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley wrote: > The response times in a Solr request don't include the time to read > stored fields (since the response is streamed) and doesn't include the > time to transfer/read the response (which can be increased by a > slow/congested network link, or a slow client that doesn't read the > response immediately). > > How many documents are you retrieving? Reading stored fields for > documents can be slow if they aren't cached by the OS since it's often > a disk seek per document read for a large index. > > -Yonik > http://www.lucidimagination.com > > > > On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: > > Hello, > > I am trying to measure why some of my queries take a long time. I am > using > > EmbeddedSolrServer and with logging statements before and > > after the EmbeddedSolrServer.query(SolrQuery) function, I have found the > > time to be around 16s. I added the debugQuery=true and the timing > component > > for this reads as following: > > > > * > > > timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.component.QueryComponent={time=0.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}},process={time=2438.0,org.apache.solr.handler.component.QueryComponent={time=2438.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}}} > > * > > > > As you can see, this shows only 2.4s being used by the query. I can't > seem > > to figure out where the rest of the time is being spent. This is within > my > > office intranet and I don't think the request-response time over the wire > > will cause significant overhead. So my question : is the timing > information > > presented here comprehensive or are there more time consuming operations > > that are not represented here ? I guess GC pause times could be one > answer > > (I hope not !) Also, the above result was for a faceted query. I > can't > > understand why the FacetComponent would be zero. Any thoughts ? > > > > Rahul > > >
Re: Measuring timing with debugQuery=true
Sorry for the delayed response ** *How big are your documents?* I have totally 1 million documents. I have totally 1950 fields in the index. Every document would probably have values for around 20 - 50 fields. *What is the total size of the index?* 1 GB *What's the amount of RAM on your box? How big is the JVM heap (and how much free memory is left on your system)?* I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a Windows box, I am able to allocate only 1 GB to the JVM. No other applications are running on the system. So the entire 4GB is at the disposal of the application. I am simulating load using a load tool (15 users) *Can you show what this slow query looks like (the whole request)?* q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true Other information Solr 1.3, JDK 1.5.0_14 regards Rahul On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley wrote: > On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: > > Yonik, > > I understand that the network can be a bottle-neck but I am pretty sure > that > > it is not. I am operating on a 100 MBPS intranet... How do I ensure > that > > stored fields are cached by the OS ? Only the Solr caches within the JVM > are > > under my control.. The result set has around 10K documents of which I > am > > retrieving only 10..I am displaying a max of only 3 fields per > document > > in my result set. Can the reading time for these stored fields be so long > ? > > It could be a seek per document if the index is too big to fit in the > OS cache - but that still wouldn't be as slow as you report. > Something is fishy here. > > How big are your documents? > What is the total size of the index? > What's the amount of RAM on your box? > How big is the JVM heap (and how much free memory is left on your system)? > Can you show what this slow query looks like (the whole request)? > > > I have totally around 1 million documents in my index Any > thoughts > > on why the FacetComponent does not take any time while the QueryComponent > > takes around 2.4s. > > It could be a field that has very few unique values and faceting just > completes quickly. > Make sure you're actually getting faceting data back (that it's > correctly turned on). > > -Yonik > http://www.lucidimagination.com > > > I am doing a faceted and keyword query ie I have both 'q' > > and 'fq' params in my query Thank you for your response. > > > > Regards > > Rahul > > > > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley < > yo...@lucidimagination.com> > > wrote: > >> > >> The response times in a Solr request don't include the time to read > >> stored fields (since the response is streamed) and doesn't include the > >> time to transfer/read the response (which can be increased by a > >> slow/congested network link, or a slow client that doesn't read the > >> response immediately). > >> > >> How many documents are you retrieving? Reading stored fields for > >> documents can be slow if they aren't cached by the OS since it's often > >> a disk seek per document read for a large index. 
> >> > >> -Yonik > >> http://www.lucidimagination.com > >> > >> > >> > >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: > >> > Hello, > >> > I am trying to measure why some of my queries take a long time. I am > >> > using > >> > EmbeddedSolrServer and with logging statements before and > >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found > the > >> > time to be around 16s. I added the debugQuery=true and the timing > >> > component > >> > for this reads as following: > >> > > >> > * > >> > > >> > > timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.component.QueryComponent={time=0.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.
Re: Measuring timing with debugQuery=true
I just want to clarify here that I understand my memory allocation might be less given the load on the system. The response times were only slightly better when we ran the test on a Solaris box with 12 CPU, 24G RAM and with 3.2 GB allocated for the JVM. I know that I have a performance problem. My main concern is to identify the reasons for the inconsistency between the timing information shown in the debugQuery output (2.4s) and the entire time taken by the EmbeddedSolrServer.query(SolrQuery) function (16s). I feel that if I can find out where the remaining 13.6s gets used, then I can look to improve accordingly. Thank you. Regards Rahul On Tue, Sep 29, 2009 at 7:12 PM, Rahul R wrote: > Sorry for the delayed response > ** > *How big are your documents?* > I have totally 1 million documents. I have totally 1950 fields in the > index. Every document would probably have values for around 20 - 50 fields. > *What is the total size of the index?* > 1 GB > > *What's the amount of RAM on your box? How big is the JVM heap (and how > much free memory is left on your system)?* > I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a Windows box, > I am able to allocate only 1 GB to the JVM. No other applications are > running on the system. So the entire 4GB is at the disposal of the > application. I am simulating load using a load tool (15 users) > > *Can you show what this slow query looks like (the whole request)?* > > q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > > q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > Other information > Solr 1.3, JDK 1.5.0_14 > > regards > Rahul > > On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley < > yo...@lucidimagination.com> wrote: > >> On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: >> > Yonik, >> > I understand that the network can be a bottle-neck but I am pretty sure >> that >> > it is not. I am operating on a 100 MBPS intranet... How do I ensure >> that >> > stored fields are cached by the OS ? Only the Solr caches within the JVM >> are >> > under my control.. The result set has around 10K documents of which >> I am >> > retrieving only 10..I am displaying a max of only 3 fields per >> document >> > in my result set. Can the reading time for these stored fields be so >> long ? >> >> It could be a seek per document if the index is too big to fit in the >> OS cache - but that still wouldn't be as slow as you report. >> Something is fishy here. >> >> How big are your documents? >> What is the total size of the index? >> What's the amount of RAM on your box? >> How big is the JVM heap (and how much free memory is left on your system)? >> Can you show what this slow query looks like (the whole request)? >> >> > I have totally around 1 million documents in my index Any >> thoughts >> > on why the FacetComponent does not take any time while the >> QueryComponent >> > takes around 2.4s. >> >> It could be a field that has very few unique values and faceting just >> completes quickly. >> Make sure you're actually getting faceting data back (that it's >> correctly turned on). 
>> >> -Yonik >> http://www.lucidimagination.com >> >> > I am doing a faceted and keyword query ie I have both 'q' >> > and 'fq' params in my query Thank you for your response. >> > >> > Regards >> > Rahul >> > >> > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley < >> yo...@lucidimagination.com> >> > wrote: >> >> >> >> The response times in a Solr request don't include the time to read >> >> stored fields (since the response is streamed) and doesn't include the >> >> time to transfer/read the response (which can be increased by a >> >> slow/congested network link, or a slow client that doesn't read the >> >> response immediately). >> >> >> >> How many documents are you retrieving? Reading stored fields fo
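One way to pin down where the remaining ~13.6s goes is to bracket the call and compare wall-clock time against what QTime attributes to query execution; whatever is left over is everything QTime excludes. A sketch, assuming the existing EmbeddedSolrServer handle and query object:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryTimer {
    public static void time(SolrServer server, SolrQuery solrQuery) throws Exception {
        long t0 = System.nanoTime();
        QueryResponse rsp = server.query(solrQuery);
        long elapsedMs = (System.nanoTime() - t0) / 1000000L;
        // The gap between the two numbers covers what QTime leaves out:
        // response writing/streaming, stored-field reads, GC pauses, etc.
        System.out.println("client elapsed=" + elapsedMs
                + " ms, solr QTime=" + rsp.getQTime() + " ms");
    }
}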
Trouble Configuring WordDelimiterFilterFactory
Hello, In our application we have a catch-all field (the 'text' field) which is configured as the default search field. Now this field will have a combination of numbers, alphabets, special characters etc. I have a requirement wherein the WordDelimiterFilterFactory does not work on numbers, especially those with decimal points. Accuracy of results with relevance to numerical data is quite important. So if the text field of a document has data like "Bridge-Diode 3.55 Volts", I want to make sure that a search for "355" or "35.5" does not retrieve this document. So I found the following setting for the WordDelimiterFilterFactory to work for me (for most parts): <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/> I am using the same setting for both index and query. Now the only problem is, if I have data like ".355". With the above setting, the analysis jsp shows me that WordDelimiterFilterFactory is creating term texts as both ".355" and "355". So a search for ".355" retrieves documents containing both ".355" and "355". A search for "355" also has the same effect. I noticed that when the entry for the WordDelimiterFilterFactory was completely removed (both index and query), then the above problem was resolved. But this seems too harsh a measure. Is there a way by which I can prevent the WordDelimiterFilterFactory from totally acting on numerical data ? Regards Rahul
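For context, a sketch of where this filter sits inside a text fieldType; the tokenizer and surrounding filters here are assumptions, and generateWordParts="1" is inferred from the follow-up mails rather than stated:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer> <!-- same chain used for both index and query -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="1" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>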
Re: Trouble Configuring WordDelimiterFilterFactory
Hello, Would really appreciate any inputs/suggestions on this. Thank you. On Tue, Nov 24, 2009 at 10:59 PM, Rahul R wrote: > Hello, > In our application we have a catch-all field (the 'text' field) which is > configured as the default search field. Now this field will have a > combination of numbers, alphabets, special characters etc. I have a > requirement wherein the WordDelimiterFilterFactory does not work on numbers, > especially those with decimal points. Accuracy of results with relevance to > numerical data is quite important. So if the text field of a document has > data like "Bridge-Diode 3.55 Volts", I want to make sure that a search for > "355" or "35.5" does not retrieve this document. So I found the following > setting for the WordDelimiterFilterFactory to work for me (for most parts): > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" > generateNumberParts="0" catenateWords="1" catenateNumbers="0" > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" > preserveOriginal="1"/> > > I am using the same setting for both index and query. > > Now the only problem is, if I have data like ".355". With the above > setting, the analysis jsp shows me that WordDelimiterFilterFactory is > creating term texts as both ".355" and "355". So a search for ".355" > retrieves documents containing both ".355" and "355". A search for "355" > also has the same effect. I noticed that when the entry for the > WordDelimiterFilterFactory was completely removed (both index and query), > then the above problem was resolved. But this seems too harsh a measure. > > Is there a way by which I can prevent the WordDelimiterFilterFactory from > totally acting on numerical data ? > > Regards > Rahul >
Re: Trouble Configuring WordDelimiterFilterFactory
Steve, My settings for both index and query are : <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/> Let me give an example. Suppose I have the following 2 documents: Document 1(Text Field): Bridge-Diode .355 Volts Document 2(Text Field): Bridge-Diode 355 Volts Requirement : Search for ".355" should retrieve only document 1 (Not happening now) Requirement: Search for "Bridge" should retrieve both documents (Works as expected) The reason why a search for ".355" is retrieving both documents is that term texts for .355 in the document are created as .355 and 355. Even if I set generateWordParts and catenateWords to "0", the way term texts are created for ".355" does not change. Thank you for your time. Regards Rahul On Sun, Nov 29, 2009 at 1:07 AM, Steven A Rowe wrote: > Hi Rahul, > > On 11/26/2009 at 12:53 AM, Rahul R wrote: > > Is there a way by which I can prevent the WordDelimiterFilterFactory > > from totally acting on numerical data ? > > "prevent ... from totally acting on" is pretty vague, and nowhere AFAICT do > you say precisely what it is you want. > > It would help if you could give example text and the terms you think should > be the result of analysis of the text. If you want different index/query > time behavior, please provide this info for both. > > Steve > >
IndexSearcher and Caches
Hello all, I have a few questions w.r.t the caches and the IndexSearcher available in solr. I am using solr 1.3. - The solr wiki states that the caches are per IndexSearcher object i.e. if I set my filterCache size to 1000 it means that 1000 entries can be assigned for every IndexSearcher object. Is this true for queryResultsCache, filterCache and documentCache ? For the document cache, the wiki states that the value should be greater than (number of records) * (max number of queries). If the document cache is also sized per IndexSearcher object, then why do we need the (max number of queries) parameter in the formula ? - In a web application, where multiple users may log into the system and query concurrently, should we assign a new IndexSearcher object for every user ? I tried sharing the IndexSearcher object but noticed that the search criteria and filters of one user get carried over to another. Or is there some way to get around that ? - Combining the above two, if the caches are per IndexSearcher objects, and if we have to assign a new IndexSearcher for every new user (in a web application), will the total cache size not explode ? Apologies if these seem really basic. Thank you. Regards Rahul
Re: IndexSearcher and Caches
Mitch, Thank you for your response. A few follow up questions for clarification : <<one IndexSearcher + its caches got a lifetime of one commit. After every commit, there will be a new one created.>> In my case, I have an index which will not be modified after creation. Does this mean that in a multi-user scenario, I can have a static IndexSearcher object that can be shared by multiple users ? <<The IndexSearcher is threadsafe. So there is no problem with concurrent usage.>> If the IndexSearcher object is threadsafe, then only issues related to concurrency are addressed. What about the case where the IndexSearcher is static? User 1 logs in to the system, queries with the static IndexSearcher, logs out; and then User 2 logs in to the system, queries with the same static IndexSearcher, logs out. In this case, the users 1 and 2 are not querying concurrently but one after another. Will the query information (filters or any other data) of User 1 be retained when User 2 uses this ? Understand your point about the filter cache but appreciate if you could throw some light on how these caches are tied to the IndexSearcher object. Pasting my initial question here : The solr wiki states that the caches are per IndexSearcher object i.e. if I set my filterCache size to 1000 it means that 1000 entries can be assigned for every IndexSearcher object. Is this true for queryResultsCache, filterCache and documentCache ? For the document cache, the wiki states that the value should be greater than (number of records) * (max number of queries). If the document cache is also sized per IndexSearcher object, then why do we need the (max number of queries) parameter in the formula ? Thank you. Regards Rahul On Fri, May 21, 2010 at 3:03 PM, MitchK wrote: > > Rahul, > > the IndexSearcher of Solr gets shared with every request within two > commits. > That means one IndexSearcher + its caches got a lifetime of one commit. > After every commit, there will be a new one created. > > The cache does not mean, that they are applied automatically. They mean, > that a filter from a query will be cached and whenever a user-query > requires the same filtering-criteria, they will use the cached filter > instead of creating a new one on the fly. > > I.e: fq=inStock:true > The result of this filtering-criteria gets cached one time. If another user > asks again for a query with fq=inStock:true, Solr reuses the already > existing filter. > Since such filters are cached as byteVectors, they are not large. > In this case it does not care for what the user is querying in his q-param. > > BTW: The IndexSearcher is threadsafe. So there is no problem with > concurrent > usage. > > Hope this helps??? > > Kind regards > - Mitch > -- > View this message in context: > http://lucene.472066.n3.nabble.com/IndexSearcher-and-Caches-tp833567p833841.html > Sent from the Solr - User mailing list archive at Nabble.com. >
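Mitch's point about filter reuse, as a sketch in SolrJ: two users issue different q parameters but the same fq, so the second request reuses the cached filter (server is assumed to be the shared SolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;

public class SharedFilterExample {
    public static void run(SolrServer server) throws Exception {
        SolrQuery q1 = new SolrQuery("somethingIWantToKnow");
        q1.addFilterQuery("name:Samuel"); // filter computed once and cached
        SolrQuery q2 = new SolrQuery("whatIReallyWantToKnow");
        q2.addFilterQuery("name:Samuel"); // served from the filterCache
        server.query(q1);
        server.query(q2);
    }
}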
Re: IndexSearcher and Caches
<<I am not sure, what you mean with "multi-user"-scenario. Can you tell me what you got in mind?>> I have an application deployed on an application server (Weblogic). This application uses solr to query an index. Users (sessions) will log in to the application, query and then log out. This login and logout has nothing to do with solr but the application manages them separately. I am using EmbeddedSolrServer here. I think I know where my mistake is. From what you say, it looks to me as though I should not create a new SolrIndexSearcher object because Solr will do this automatically. In my current implementation, I am explicitly creating a new SolrIndexSearcher object for every new user who logs into the application. Let me provide a code snippet to explain further. This is how I initialize the solr handles required for searching. I am using EmbeddedSolrServer.

// Done once, at application startup
SolrConfig solrConfig = new SolrConfig(configHome + "/solrconfig.xml");
IndexSchema indexSchema = new IndexSchema(solrConfig, configHome + "/schema.xml", null);
File corefile = new File(coreHome, "solr.xml");
CoreContainer coreContainer = new CoreContainer(coreHome, corefile);
CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, coreName,
        solrConfig.getResourceLoader().getInstanceDir());
coreDescriptor.setConfigName(solrConfig.getResourceName());
coreDescriptor.setSchemaName(indexSchema.getResourceName());
SolrCore solrCore = new SolrCore(coreName, indexHome, solrConfig, indexSchema, coreDescriptor);
coreContainer.register(coreName, solrCore, false);
SolrServer solrServer = new EmbeddedSolrServer(coreContainer, coreName);

// Next two lines executed for every user
SolrIndexSearcher solrSearcher = solrCore.newSearcher("s1");
SolrRequestParsers solrRequestParsers = new SolrRequestParsers(solrConfig);

Many thanks for the response(s). Regards Rahul On Mon, May 24, 2010 at 1:55 AM, MitchK wrote: > > > > > In my case, I have an index which will not be modified after creation. > > Does > > this mean that in a multi-user scenario, I can have a static > IndexSearcher > > object that can be shared by multiple users ? > > > I am not sure, what you mean with "multi-user"-scenario. Can you tell me > what you got in mind? > If your index never changes, your IndexSearcher won't change. > > > > > > If the IndexSearcher object is threadsafe, then only issues related to > > concurrency are addressed. What about the case where the IndexSearcher is > > static? User 1 logs in to the system, queries with the static > > IndexSearcher, > > logs out; and then User 2 logs in to the system, queries with the same > > static IndexSearcher, logs out. In this case, the users 1 and 2 are not > > querying concurrently but one after another. Will the query information > > (filters or any other data) of User 1 be retained when User 2 uses this ? > > > I am not sure about the benefit of a static IndexSearcher. What do you > hope??? > > If user 1 uses a filter like "fq=name:Samuel&q=somethingIWantToKnow" and > user 2 queries for "fq=name:Samuel&q=whatIReallyWantToKnow" then they use > the same cached filter-object, retrieved from Solr's internal cache (of > course you need to have a cache-size that allows caching). > > > > > The solr wiki states that the caches are per IndexSearcher object i.e if > I > > set my filterCache size to 1000 it means that 1000 entries can be > assigned > > for every IndexSearcher object. > > > Yes. If a new searcher is created then the new Cache is built on the old > one. > > > > > Is this true for queryResultsCache, > > filterCache and documentCache ? > > > For FilterCache it's true. 
For queryResultsCache (if I understand the wiki > right), too. > Please note, that the documentCache's behaviour is different from the > already mentioned ones. > The wiki says: > > > > Note: This cache cannot be used as a source for autowarming because > > document IDs will change when anything in the index changes so they can't > > be used by a new searcher. > > > > The wiki says that the number of the document cache should not be bigger > than the number of _results_ * number of _concurrent_ queries. > I never worked with the document cache, so maybe someone else can throw > some > light into the dark. > But from what I have understood it means the following: > > If you show 10 results per request and you think of up to 500 concurrent > queries: > 10 * 500 => 5000 > > But I want to emphasize, that this is only a guess. I actually don't exactly > know more about this topic. > > Kind regards > - Mitch > -- > View this message in context: > http://lucene.472066.n3.nabble.com/IndexSearcher-and-Caches-tp833567p838367.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: IndexSearcher and Caches
Thank you I found the API to get the existing SolrIndexSearcher to be present in SolrCore: SolrCore.getSearcher().get() So if now the Index changes (a commit is done) in between, will I automatically get the new SolrIndexSearcher from this call ? Regards Rahul On Mon, May 24, 2010 at 11:25 PM, MitchK wrote: > > Ahh, now I understand. > > No, you need no second IndexSearcher as long as the Server is alive. > You can reuse your searcher for every user. > > The only commands you are executing per user are those to create a > search-query. > > Kind regards, > - Mitch > -- > View this message in context: > http://lucene.472066.n3.nabble.com/IndexSearcher-and-Caches-tp833567p840228.html > Sent from the Solr - User mailing list archive at Nabble.com. >
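One caveat worth adding for anyone going down this road: SolrCore.getSearcher() hands out a reference-counted wrapper, and each get must be balanced with a decref() or the superseded searcher can never be closed after a commit. A sketch:

import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class SearcherBorrower {
    public static void useSearcher(SolrCore solrCore) {
        // After a commit, a later getSearcher() call returns the newly
        // opened searcher; the old one is closed once its refcount drops.
        RefCounted<SolrIndexSearcher> ref = solrCore.getSearcher();
        try {
            SolrIndexSearcher searcher = ref.get();
            // ... use the searcher ...
        } finally {
            ref.decref(); // required for the old searcher to be released
        }
    }
}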
Re: IndexSearcher and Caches
Chris, I am using SolrIndexSearcher to get a handle to the total number of records in the index. I am doing it like this : int num = Integer.parseInt(solrSearcher.getStatistics().get("numDocs").toString()); Please let me know if there is a better way to do this. Mark, I can tell you what I do in my application. We provide a tool to do the index update and assume that the user will always use it to create/update the index. Whenever an update happens, we notify the querying application and it creates a new instance of SolrCore, SolrServer etc. These continue to be shared across multiple users (as statics) till the next update happens. Thank you. Regards Rahul On Tue, May 25, 2010 at 4:18 AM, Chris Hostetter wrote: > > : Thank you I found the API to get the existing SolrIndexSearcher to be > : present in SolrCore: > : SolrCore.getSearcher().get() > > I think perhaps you need to take 5 big steps back and explain what your > goal is. 99.999% of all solr users should never care about that method -- > even the 99.9% of the folks writing java code and using "EmbeddedSolr" > should never ever have a need to call those -- so what exactly is it you > are doing, and how did you get along the path you find yourself on? > > this thread started with some fairly innocuous questions about how caches > worked in regards to new searchers -- which is all fine and dandy, those > concepts that solr users should be aware of ... in the abstract. you > should almost never be instantiating those IndexSearchers or Caches > yourself. > > Stick with the SolrServer abstraction provided by SolrJ... > > http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer > > http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html > > > -Hoss > >
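Following Hoss's advice to stay behind the SolrServer abstraction, the same count can be read without touching SolrIndexSearcher at all; a sketch, assuming the shared SolrServer handle:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NumDocs {
    public static long count(SolrServer server) throws Exception {
        SolrQuery q = new SolrQuery("*:*"); // match every document
        q.setRows(0);                       // we only need the count
        QueryResponse rsp = server.query(q);
        return rsp.getResults().getNumFound();
    }
}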