Solr/Lucene search term stats

2008-07-22 Thread Sunil
Hi All,

I am working on a module using Solr, where I want to get the stats of
each keyword found in each field.

If my search term is: (title:("web2.0" OR "ajax") OR
description:("web2.0" OR "ajax"))

Then I want to know how many times web2.0/ajax were found in title or
description.

Any suggestions on how to get this information (apart from the &hl=true
parameter)?


Thanks,
Sunil




Alphabetical search on solr

2008-07-22 Thread Adrian M Bell

Ok this might be a simple one, or more likely, my understanding of solr is
shot to bits

We have a catalogue of documents that we have a solr index on.  We need to
provide an alphabetical search, so that a user can list all documents with a
title beginning A, B and so on...

So how do we do this?

Currently we have built up the following query:

/solr/select/?q=titleLong(a*)&rows=50

Whilst this is fine, it returns ALL documents that have an 'A' anywhere in
the title and as you can imagine, there are quite a few of these!  So
obviously we then strip out the results that don't begin with A.  This seems
to be incredibly wasteful to me.

Any one got any ideas?
-- 
View this message in context: 
http://www.nabble.com/Alphabetical-search-on-solr-tp18585264p18585264.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Alphabetical search on solr

2008-07-22 Thread Erik Hatcher


On Jul 22, 2008, at 5:08 AM, Adrian M Bell wrote:

> We have a catalogue of documents that we have a solr index on.  We need to
> provide an alphabetical search, so that a user can list all documents with a
> title beginning A, B and so on...
>
> So how do we do this?
>
> Currently we have built up the following query:
>
> /solr/select/?q=titleLong(a*)&rows=50

I'm assuming that is a typo and the query should be titleLong:a* (with a
colon).

> Whilst this is fine, it returns ALL documents that have an 'A' anywhere in
> the title

This depends on what analyzer you have set up for titleLong, but it
probably is only returning back documents that have words in the title
that begin with "a", correct?

> and as you can imagine, there are quite a few of these!  So
> obviously we then strip out the results that don't begin with A.  This seems
> to be incredibly wasteful to me.
>
> Any one got any ideas?

When presented with a situation where you want to look something up by
part of a field, rather than contorting the query, consider it an
indexing effort.  Index a firstLetterOfTitle field, such that the
only values of that field would be A-Z (normalize on case too, so they
are all either all uppercase or all lowercase).  The indexing client
could extract that field, or you could use a copyField for the
longTitle into this new field associated with an analyzer that outputs
only the first letter.  There is probably a way to configure the built-in
KeywordTokenizerFactory->PatternReplaceFilterFactory to accomplish this.
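
For example, something along these lines in schema.xml might do it (an
untested sketch; the type and field names are just illustrative):

   <fieldType name="firstLetter" class="solr.TextField" sortMissingLast="true" omitNorms="true">
     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <!-- keep only the first character of the lowercased title -->
       <filter class="solr.PatternReplaceFilterFactory" pattern="^(.).*$" replacement="$1" replace="all"/>
     </analyzer>
   </fieldType>

   <field name="firstLetterOfTitle" type="firstLetter" indexed="true" stored="false"/>
   <copyField source="titleLong" dest="firstLetterOfTitle"/>

Queries then become q=firstLetterOfTitle:a, q=firstLetterOfTitle:b, and so on.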


Erik



Re: Solr/Lucene search term stats

2008-07-22 Thread Preetam Rao
hi,

try using faceted search,
http://wiki.apache.org/solr/SimpleFacetParameters

something like facet=true&facet.query=title:("web2.0" OR "ajax")

facet.query - gives the number of matching documents for a query.
You can run the examples in the above link and see how it works..

You can also try using facet.field, which enumerates all the terms found in
a given field and also tells how many documents contained each term.

For both of the above, the set of documents it acts on is the result set of q. So
if you want to get the facets for all documents, try q=*:*
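
For your case, one facet.query per keyword/field pair should give a separate
count for each (a sketch, using your field names; wrapped here for readability):

   q=*:*&facet=true&facet.query=title:"web2.0"&facet.query=title:ajax
        &facet.query=description:"web2.0"&facet.query=description:ajax

Note that these are counts of matching documents, not the number of occurrences
of each term within a document.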

On Tue, Jul 22, 2008 at 1:43 PM, Sunil <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I am working on a module using Solr, where I want to get the stats of
> each keyword found in each field.
>
> If my search term is: (title:("web2.0" OR "ajax") OR
> description:("web2.0" OR "ajax"))
>
> Then I want to know how many times web2.0/ajax were found in title or
> description.
>
> Any suggestion on how to get this information (apart from & hl=true
> variable).
>
>
> Thanks,
> Sunil
>
>
>


facets and filter query

2008-07-22 Thread Stefan Oestreicher
Hi,

I have a category field in my index which I'd like to use as a facet.
However my search frontend only allows you to search in one category at a
time for which I'm using a filter query. Unfortunately the filter query
restricts the facets as well.

My query looks like this:
?q=content:foo&fq=cat:default&fl=title,content&facet=true&facet.field=cat

What I'd like is to search only in the "default" category but get the result
count of that query for all categories. I thought maybe I can use the
facet.query parameter but this doesn't seem to do what I want, because the
result is the same.

Is there any way to accomplish this with only one request?

I'm using version 1.3 from trunk.

TIA,
 
Stefan Oestreicher



Re: Vote on a new solr logo

2008-07-22 Thread Shalin Shekhar Mangar
28 votes so far and counting!

When should we close this poll?

On Tue, Jul 22, 2008 at 1:18 AM, Mark Miller <[EMAIL PROTECTED]> wrote:

> Perfect! Thank you Shalin. Much appreciated, and a dead simple system. My
> vote is in.
>
> - Mark
>
> Shalin Shekhar Mangar wrote:
>
>> Will this do? A 1-5 for each image is difficult but I guess this goes 80%
>> with 20% effort :)
>>
>> http://people.apache.org/~shalin/poll.html
>>
>> On Tue, Jul 22, 2008 at 12:23 AM, Mike Klaas <[EMAIL PROTECTED]> wrote:
>>
>>> How about a form that has a "1 to 5" radio button option beside each logo
>>> and a box to enter your name?
>>>
>>> -Mike
>>>
>>> On 21-Jul-08, at 11:43 AM, Mark Miller wrote:
>>>
>>>> I looked for a long time for an easy free image polling system, and
>>>> couldn't find much beyond shareware. I am still looking for something
>>>> though...
>>>>
>>>> Ryan McKinley wrote:
>>>>
>>>>> nor does http://selectricity.org/
>>>>>
>>>>> On Jul 21, 2008, at 2:28 PM, Shalin Shekhar Mangar wrote:
>>>>>
>>>>>> Too bad the polls created with Google docs don't support images in them
>>>>>> (or atleast i couldn't figure out how to do it)
>>>>>>
>>>>>> On Mon, Jul 21, 2008 at 11:52 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> I can't figure how to use the poll either...
>>>>>>>
>>>>>>> here are a few others to check out:
>>>>>>> http://lapnap.net/solr/
>>>>>>> perhaps "a" and "f" could live together, you use 'a' if you need a
>>>>>>> background other then white
>>>>>>>
>>>>>>> On Jul 21, 2008, at 2:14 PM, Mike Klaas wrote:
>>>>>>>
>>>>>>>> On 20-Jul-08, at 6:19 PM, Mark Miller wrote:
>>>>>>>>
>>>>>>>>> From the dev list:
>>>>>>>>>
>>>>>>>>> Shalin Shekhar Mangar:
>>>>>>>>>> +1 for a new logo. It's a new release, let's have a new logo too!
>>>>>>>>>> First step is to decide which one of these is more Solr-ish.
>>>>>>>>>
>>>>>>>>> I'm looking to improve the look of solr, so I am going to do my best to
>>>>>>>>> push this process along.
>>>>>>>>> Not to keep shoving polls down everyones throat, but if you could,
>>>>>>>>> please go to the following site
>>>>>>>>> and rate the solr logos that you love or hate:
>>>>>>>>> http://solrlogo.myhardshadow.com/solr-logo-vote/
>>>>>>>>
>>>>>>>> I don't really understand how to use the poll.  I click on a logo, and
>>>>>>>> am then taken to a page on which the stars are unclickable.  Which stars
>>>>>>>> should be clicked on?
>>>>>>>>
>>>>>>>> -Mike


-- 
Regards,
Shalin Shekhar Mangar.


Re: Vote on a new solr logo

2008-07-22 Thread Mark Miller
My opinion: if it's already a runaway, we might as well not prolong
things. If not though, we should probably give some time for any
possible laggards. The 'admin look' poll received its first 19-20 votes
in the first night / morning, and has only gotten 2 or 3 since then, so
probably no use going too long.


- Mark

Shalin Shekhar Mangar wrote:

28 votes so far and counting!

When should we close this poll?


Re: facets and filter query

2008-07-22 Thread Jon Baer

This is *exactly* my issue ... very nicely worded :-)

I would have thought facet.query=*:* would have been the solution but
it does not seem to work.  I'm interested in getting these *total*
counts for UI display.


- Jon

On Jul 22, 2008, at 6:05 AM, Stefan Oestreicher wrote:


Hi,

I have a category field in my index which I'd like to use as a facet.
However my search frontend only allows you to search in one category  
at a
time for which I'm using a filter query. Unfortunately the filter  
query

restricts the facets as well.

My query looks like this:
?q=content:foo&fq=cat:default&fl=title,content&facet=true&facet.field=cat


What I'd like is to search only in the "default" category but get  
the result

count of that query for all categories. I thought maybe I can use the
facet.query parameter but this doesn't seem to do what I want,  
because the

result is the same.

Is there any way to accomplish this with only one request?

I'm using version 1.3 from trunk.

TIA,

Stefan Oestreicher





Re: facets and filter query

2008-07-22 Thread Erik Hatcher
All facet counts currently returned are _within_ the set of documents  
constrained by query (q) and filter query (fq) parameters - just to  
clarify what it does.  Why?  That's the general use case.  Returning  
back counts from differently constrained sets requires some custom  
coding - perhaps as a custom facet plugin? (not sure if that'd do the  
trick or not), but certainly a custom request handler would be able to  
manage this.


Worst case, make two requests to Solr, of course.  Less than ideal,  
but pragmatic, and not necessarily a bad performer, especially with  
HTTP caching in the mix.
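
With Stefan's example that could be something like (a sketch): one request for
the actual results, restricted to the category,

   ?q=content:foo&fq=cat:default&fl=title,content

and a second, facet-only request without the filter,

   ?q=content:foo&rows=0&facet=true&facet.field=cat

where rows=0 keeps the second request cheap since only the facet counts are
needed.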


[back in the day, the Collex @ NINES site made 3 requests to Solr for
every page view just to keep things simple architecturally for a while:
one to get the search results and facets, one to render a tag cloud
given different criteria, and one simply to get the total count of
objects in the entire system to show at the bottom of the page for
hype/marketing purposes]


Erik

On Jul 22, 2008, at 9:39 AM, Jon Baer wrote:


This is *exactly* my issue ... very nicely worded :-)

I would have thought facet.query=*:* would have been the solution  
but it does not seem to work.  Im interested in getting these  
*total* counts for UI display.


- Jon

On Jul 22, 2008, at 6:05 AM, Stefan Oestreicher wrote:


Hi,

I have a category field in my index which I'd like to use as a facet.
However my search frontend only allows you to search in one  
category at a
time for which I'm using a filter query. Unfortunately the filter  
query

restricts the facets as well.

My query looks like this:
?q=content:foo&fq=cat:default&fl=title,content&facet=true&facet.field=cat


What I'd like is to search only in the "default" category but get  
the result

count of that query for all categories. I thought maybe I can use the
facet.query parameter but this doesn't seem to do what I want,  
because the

result is the same.

Is there any way to accomplish this with only one request?

I'm using version 1.3 from trunk.

TIA,

Stefan Oestreicher





Restricting spellchecker for certain words

2008-07-22 Thread Jon Baer
It seems that the spellchecker works great, except that all the "7 words you
can't say on TV" resolve to very important people. Is there a way to
exclude certain words so they don't resolve?


Thanks.

- Jon


Solr cache statistics explanation

2008-07-22 Thread Marshall Gunter
Can someone point me to an in depth explanation of the Solr cache 
statistics? I'm having a hard time finding it online. Specifically, I'm 
interested in these fields that are listed on the Solr admin statistics 
pages in the cache section:


lookups
hits
hitratio
inserts
evictions
size
cumulative_lookups
cumulative_hits
cumulative_hitratio
cumulative_inserts
cumulative_evictions

--
Marshall Gunter



Re: spellchecker problems (bugs)

2008-07-22 Thread Geoffrey Young



Shalin Shekhar Mangar wrote:

The problems you described in the spellchecker are noted in
https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue to
synchronize spellcheck.build so that the index is not corrupted.


I'd like to discuss this a little...

I'm not sure that I want to rebuild the spelling index each time the 
underlying data index changes - the process takes very long and my 
updates are frequent changes to non-spelling related data.


what I'd really like is for a change to my index to not cause an 
exception.  IIRC the "old" way of using a spellchecker didn't work like 
this at all - I could completely rm data/index and leave data/spell in 
place, add new data, not issue cmd=build and the spelling parts still 
worked just fine (albeit with old data).


not to say that SOLR-622 isn't a good idea (it is) but I don't really 
think the entire solution is keeping the spellcheck index in sync.  do 
they need to be kept in sync for things not to implode on me?


--Geoff


Query for an exact match

2008-07-22 Thread Ian Connor
How can I require an exact field match in a query. For instance, if a
title field contains "Nature" or "Nature Cell Biology", when I search
title:Nature I only want "Nature" and not "Nature Cell Biology". Is
that something I do as a query or do I need to re index it with the
field defined in a certain way?

I have this definition now - but it returns all titles that contain
"Nature" rather than just the ones that equal it exactly.

   [field definition stripped by the list archive]

-- 
Regards,

Ian Connor


Re: spellchecker problems (bugs)

2008-07-22 Thread Yonik Seeley
On Tue, Jul 22, 2008 at 11:07 AM, Geoffrey Young
<[EMAIL PROTECTED]> wrote:
> Shalin Shekhar Mangar wrote:
>>
>> The problems you described in the spellchecker are noted in
>> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue
>> to
>> synchronize spellcheck.build so that the index is not corrupted.
>
> I'd like to discuss this a little...
>
> I'm not sure that I want to rebuild the spelling index each time the
> underlying data index changes - the process takes very long and my updates
> are frequent changes to non-spelling related data.
>
> what I'd really like is for a change to my index to not cause an exception.
>  IIRC the "old" way of using a spellchecker didn't work like this at all - I
> could completely rm data/index and leave data/spell in place, add new data,
> not issue cmd=build and the spelling parts still worked just fine (albeit
> with old data).
>
> not to say that SOLR-622 isn't a good idea (it is) but I don't really think
> the entire solution is keeping the spellcheck index in sync.  do they need
> to be kept in sync for things not to implode on me?

Agree... spell check indexes should not have to be in sync, and
anything to keep them in sync automatically should be optional (and
probably disabled by default).

-Yonik


Re: spellchecker problems (bugs)

2008-07-22 Thread Shalin Shekhar Mangar
On Tue, Jul 22, 2008 at 8:37 PM, Geoffrey Young <[EMAIL PROTECTED]>
wrote:

>
>
> Shalin Shekhar Mangar wrote:
>
>> The problems you described in the spellchecker are noted in
>> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue
>> to
>> synchronize spellcheck.build so that the index is not corrupted.
>>
>
> I'd like to discuss this a little...
>
> I'm not sure that I want to rebuild the spelling index each time the
> underlying data index changes - the process takes very long and my updates
> are frequent changes to non-spelling related data.
>
> what I'd really like is for a change to my index to not cause an exception.
>  IIRC the "old" way of using a spellchecker didn't work like this at all - I
> could completely rm data/index and leave data/spell in place, add new data,
> not issue cmd=build and the spelling parts still worked just fine (albeit
> with old data).
>
> not to say that SOLR-622 isn't a good idea (it is) but I don't really think
> the entire solution is keeping the spellcheck index in sync.  do they need
> to be kept in sync for things not to implode on me?
>
> --Geoff
>

Sure, the intent of SOLR-622 is not to keep the two indices in sync, though
it can certainly be used to do that. The intention is to avoid having to
manually issue a build/reload command every time on startup. All the options
introduced by SOLR-622 will be entirely optional and configurable so you
will have the flexibility you want.

Note that the SpellCheckComponent is similar to the old request handler in
the way it manages the index. The spell checker index is separate from the
main index and they do not need to be kept in sync, though I think most users
will probably use it that way.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Query for an exact match

2008-07-22 Thread Yonik Seeley
On Tue, Jul 22, 2008 at 11:08 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> How can I require an exact field match in a query. For instance, if a
> title field contains "Nature" or "Nature Cell Biology", when I search
> title:Nature I only want "Nature" and not "Nature Cell Biology". Is
> that something I do as a query or do I need to re index it with the
> field defined in a certain way?
>
> I have this definition now - but it returns all titles that contain
> "Nature" rather than just the ones that equals it exactly.
>
>
>omitNorms="true"/>

That field definition should do it.
Try title:Nature; it may be that you have a default search field that
has a different analyzer configured.
If that doesn't work, make sure you have reindexed after your schema changes.
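
For reference, the stock "string" type and a title field declared with it look
roughly like this in the example schema (illustrative; the actual definitions
in this thread were stripped by the list archive):

   <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
   <field name="title" type="string" indexed="true" stored="true"/>

A StrField indexes the whole value verbatim, which is what makes the exact
match work.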

-Yonik


java.io.IOException: read past EOF

2008-07-22 Thread Rohan
Hi Guys,

This is my first post. We are running solr with multiple indexes, 20
indexes. I'm facing a problem with the 5th one. I'm not able to run optimize on
that index. I'm getting the following error. Your help is really appreciated.


java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
        at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:70)
        at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:66)
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:388)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:320)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:292)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:256)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:214)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

maximum length of string that Solr can index

2008-07-22 Thread Tom Lord
Hi, we've looked for info about this issue online and in the code and are
none the wiser - help would be much appreciated.

We are indexing the full text of journals using Solr. We currently pass
in the journal text, up to maybe 130 pages, and index it in one go.

We are seeing Solr stop indexing after ~30 pages or so. That is, when we
look at the indexed text field using Luke, we can see where it gives up
collecting information from the text.

What is the maximum size that we can index on? Is this a known issue or
standard behaviour, or is something else amiss? 

If this is standard behaviour, what is the approved way of avoiding this
issue? Should we index on a per-page basis rather than trying to do 130
pages as a single document?

thanks in advance,
Tom.

-- 
Tom Lord | ([EMAIL PROTECTED])

Aptivate | http://www.aptivate.org | Phone: +44 1223 760887 
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales 
with company number 04980791.



Re: Query for an exact match

2008-07-22 Thread Ian Connor
At the moment for "string", I have:

   [fieldType definition stripped by the list archive]

is there an example type so that it will do exact matches?

Would "alphaOnlySort" do the trick? It looks like it might.
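
For reference, the alphaOnlySort type in the example schema is roughly as
follows (reconstructed here since the pasted XML did not survive the archive):

   <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.TrimFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
     </analyzer>
   </fieldType>

It keeps the whole value as a single token, lowercases it, and strips
non-letter characters.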

On Tue, Jul 22, 2008 at 11:20 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 22, 2008 at 11:08 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
>> How can I require an exact field match in a query. For instance, if a
>> title field contains "Nature" or "Nature Cell Biology", when I search
>> title:Nature I only want "Nature" and not "Nature Cell Biology". Is
>> that something I do as a query or do I need to re index it with the
>> field defined in a certain way?
>>
>> I have this definition now - but it returns all titles that contain
>> "Nature" rather than just the ones that equals it exactly.
>>
>>
>>   > omitNorms="true"/>
>
> That field definition should do it.
> Try title:Nature  it may be that you have a default search field that
> has a different analyzer configured.
> If that doesn't work, make sure you have reindexed after your schema changes.
>
> -Yonik
>



-- 
Regards,

Ian Connor
82 Fellsway W #2
Somerville, MA 02145
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: Query for an exact match

2008-07-22 Thread Yonik Seeley
On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
>  omitNorms="true"/>

This will give you an exact match.  As I said, if it's not, then you
didn't restart and reindex, or you are querying the wrong field.

-Yonik


Re: maximum length of string that Solr can index

2008-07-22 Thread Yonik Seeley
Lucene has a maxFieldLength (the number of tokens to index for a given
field name).
It can be configured in solrconfig.xml via the <maxFieldLength> element.
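
For example (a sketch; the default at the time was 10000 tokens, which is
consistent with indexing stopping partway through a long document), raising it
in the indexDefaults/mainIndex section of solrconfig.xml:

   <maxFieldLength>2147483647</maxFieldLength>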

-Yonik

On Tue, Jul 22, 2008 at 11:38 AM, Tom Lord <[EMAIL PROTECTED]> wrote:
> Hi, we've looked for info about this issue online and in the code and am
> none the wiser - help would be much appreciated.
>
> We are indexing the full text of journals using Solr. We currently pass
> in the journal text, up to maybe 130 pages, and index it in one go.
>
> We are seeing Solr stop indexing after ~30 pages or so. That is, when we
> look at the indexed text field using Luke, we can see where it gives up
> collecting information from the text.
>
> What is the maximum size that we can index on? Is this a known issue or
> standard behaviour, or is something else amiss?
>
> If this is standard behaviour, what is the approved way of avoiding this
> issue? Should we index on a per-page basis rather than trying to do 130
> pages as a single document?
>
> thanks in advance,
> Tom.
>
> --
> Tom Lord | ([EMAIL PROTECTED])
>
> Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
> The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
>
> Aptivate is a not-for-profit company registered in England and Wales
> with company number 04980791.
>
>


Re: java.io.IOException: read past EOF

2008-07-22 Thread Fuad Efendi

Lucene index corrupted... which harddrive do you use?

Quoting Rohan <[EMAIL PROTECTED]>:


Hi Guys,

This is my first post. We are running solr with multiple Indexes, 20
Indexes. I'm facing problem with 5 one. I'm not able to run optimized on
that index. I'm getting following error. Your help is really appreciated.


java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89






Re: Query for an exact match

2008-07-22 Thread Ian Connor
Indeed - one of my shards had it listed as "text"  doh!

thanks for the assurance that led me to find my bug

On Tue, Jul 22, 2008 at 11:43 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
>> > omitNorms="true"/>
>
> This will give you an exact match.  As I said, if it's not, then you
> didn't restart and reindex, or you are querying the wrong field.
>
> -Yonik
>



-- 
Regards,

Ian Connor


Re: Solr cache statistics explanation

2008-07-22 Thread Koji Sekiguchi

lookups : how many times the cache is referenced
hits : how many times the cache hits
hitratio : hits/lookups
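
For example (illustrative numbers), a cache showing lookups=1000 and hits=850
has hitratio = 850/1000 = 0.85.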

and for other items, see my previous mail at:
http://www.nabble.com/about-cache-to10192953.html

Koji

Marshall Gunter wrote:
Can someone point me to an in depth explanation of the Solr cache 
statistics? I'm having a hard time finding it online. Specifically, 
I'm interested in these fields that are listed on the Solr admin 
statistics pages in the cache section:


lookups
hits
hitratio
inserts
evictions
size
cumulative_lookups
cumulative_hits
cumulative_hitratio
cumulative_inserts
cumulative_evictions





Re: Specifying explicit FacetQuery w/ a normal query?

2008-07-22 Thread Mike Klaas
I'm somewhat perplexed, under what circumstances would you be able to  
send one query to Solr but not two?


-Mike

On 21-Jul-08, at 8:37 PM, Jon Baer wrote:


Well that's my problem ... I can't :-)

When you put an fq=doctype:news in there you can't get an explicit
facet.query; it will only let you deal w/ the stuff you have already
filtered out.  I think what I want is possible, just need to dig in
the code more.


- Jon

On Jul 21, 2008, at 9:14 PM, Mike Klaas wrote:



On 17-Jul-08, at 6:27 AM, Jon Baer wrote:

Ive gone from a complex multicore setup back to a single  
solrconfig setup and using a doctype field (since the index is  
pretty small), however there are a few spots where items are laid  
out in tabs and each tab has a count of docs associated, ie:


News (123) | Images (345) | Video (678) | Blogs (901)

Unfortunately the tab controlling is server side and Im trying to  
grab a facet count on doctype w/ a filter query and can't seem to  
do it w/o having to send the small facet query (for the counts on  
all items) and the filter query itself.  Is there any way to do  
this in a single request w/ any params Im missing?  (Using SolrJ  
if that helps).


No, there isn't, but does this really bother you?  It doesn't seem  
that the advantages to combining everything in one request are huge.


-Mike










Re: Vote on a new solr logo

2008-07-22 Thread Chris Hostetter

: http://people.apache.org/~shalin/poll.html

Except the existing Solr logo isn't on that list. 
i smell election tampering :)

Seriously though: I realized a long time ago that there was too much email 
to reply to, too many features to work on, too many patches to review, 
and too few hours in the day for me to really care what the Solr Admin 
screens or the Solr Logo looked like.

As long as the admin screens are functional and readable, and as long as 
the logo contains the word "Solr" I'm happy -- You crazy kids, with all 
your energy and enthusiasm, should feel free to go nuts.




-Hoss



Re: Vote on a new solr logo

2008-07-22 Thread Mark Miller

Chris Hostetter wrote:

> : http://people.apache.org/~shalin/poll.html
>
> Except the existing Solr logo isn't on that list.
> i smell election tampering :)

I had put it in my poll :) I actually considered bringing that up to
Shalin as well, but couldn't bring myself to be so fair I suppose

> Seriously though: I realized a long time ago that there was too much email
> to reply to, too many features to work on, too many patches to review,
> and too few hours in the day for me to really care what the Solr Admin
> screens or the Solr Logo looked like.

In general, I couldn't agree more. Talented guys like yourself should be
kept busy on important non visual stuff for sure. The work on what solr
actually does should come first, no question! I am not trying to pull
any busy developer into the admin gui.

> As long as the admin screens are functional and readable, and as long as
> the logo contains the word "Solr" I'm happy -- You crazy kids, with all
> your energy and enthusiasm, should feel free to go nuts.
>
> -Hoss

Good, cause all that above said, I still want improvement! In some ways,
it's not the best bang for your buck, but some volunteers might be into
the design thing. The open source user is usually well aware that admin
GUIs come last, and doesn't make any judgments of code/product quality
on such things - if you did you'd be making the wrong judgments!

But those outside of the open source circle often associate superficial
looks with quality (and vice versa) - which is why commercial stuff
'tends' (not always for sure) to look better than open source stuff.
They pay a designer to make something that looks pleasing.

But I don't care about the possibly small value of it. I have energy to
spare, and I want solr to look fantastic. Don't ask me why - personal
itch. So no work necessary from anyone else, and I'll try to stay out of
the way - I have a personal itch to upgrade the look - the current look
just rubs me wrong :)

It's not that important, but I want to help improve it anyway - I think
that's part of the beauty of open source - no budgets. I am fully
prepared to get nowhere as well - no use not trying though. Already
there may be an improved logo

- Mark




OOM on Solr Sort

2008-07-22 Thread sundar shankar
Hi,

We are developing a product in an agile manner and the current
implementation has data of size just about 800 megs in dev. The memory
allocated to solr on dev (Dual core Linux box) is 128-512.

My config
=========
   [solrconfig snippet stripped by the list archive]

My Field
========
   [field definitions stripped by the list archive]

Problem
=======
I execute a query that returns 24 rows of result. I pick 10 out of it. I
have no problem when I execute this. But when I sort it by a String field
that is fetched from this result, I get an OOM. I am able to execute
several other queries with no problem. Just having a sort asc clause added
to the query throws an OOM. Why is that? What should I have ideally done?
My config on QA is pretty similar to the dev box and probably has more
data than on dev. It didn't throw any OOM during the integration test. The
Autocomplete is a new field we added recently.

Another point is that the indexing is done with a field of type string and
the autocomplete field is a copy field. The sorting is done based on the
string field. Please do lemme know what mistake am I doing?

Regards
Sundar

P.S: The stack trace of the exception is
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
query at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86) 
at 
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101) 
at 
com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
 ... 105 moreCaused by: org.apache.solr.common.SolrException: Java heap space  
java.lang.OutOfMemoryError: Java heap space  at 
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) 
 at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352) 
 at 
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
  at 
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
  at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
  at 
org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56)
  at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
  at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
  at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)  at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
  at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)  at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
  at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
  at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
  at 
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
  at 
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74) 
 at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)  
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)  
at 
org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
  at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
  at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)  at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
_
Chose your Life Partner? Join MSN Matrimony
http://www.shaadi.com/msn/matrimony.php 

pf nixes fl

2008-07-22 Thread Jason Rennie
Just tried adding a pf field to my request handler.  When I did this, solr
returned all document fields for each doc (no "score") instead of returning
the fields specified in fl.  Bug?  Feature?  Anyone know what the reason for
this behavior is?  I'm using solr 1.2.

Thanks,

Jason


Re: pf nixes fl

2008-07-22 Thread Mike Klaas


On 22-Jul-08, at 11:53 AM, Jason Rennie wrote:

Just tried adding a pf field to my request handler.  When I did  
this, solr
returned all document fields for each doc (no "score") instead of  
returning
the fields specified in fl.  Bug?  Feature?  Anyone know what the  
reason for

this behavior is?  I'm using solr 1.2.


What exact url did you send to Solr?  I bet there is a missing '&'.

-Mike


RE: OOM on Solr Sort

2008-07-22 Thread sundar shankar

Sorry for that. I didn't realise how my mail had finally arrived. Sorry!!!

From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: OOM on Solr Sort
Date: Tue, 22 Jul 2008 18:33:43 +

Hi,
We are developing a product in a agile manner and the current 
implementation has a data of size just about a 800 megs in dev. The memory 
allocated to solr on dev (Dual core Linux box) is 128-512.
===

My config
===


   

===







true

===

My Field
===

   
   



  








===

Problem
===


 I execute a query that returns 24 rows of result. 
I pick 10 out of it. I have no problem when I execute this. But When I do sort 
it by a String field that is fetched from this result. I get an OOM. I am able 
to execute several other queries with no problem. Just having a sort asc clause 
added to the query throws an OOM. Why is that. What should I have ideally done. 
My config on QA is pretty similar to the dev box and probably has more data 
than on dev. It didnt throw any OOM during the integration test. The 
Autocomplete is a new field we added recently.

Another point is that the indexing is done with a 
field of type string
  
   and the autocomplete field is a copy field.

The sorting is done based on string field. Please do lemme know what mistake am 
I doing?

Regards
Sundar

P.S: The stack trace of the exception is


Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
query
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
 at 
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
 at 
com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
 ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space  
java.lang.OutOfMemoryError: Java heap space  at 
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) 
 at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352) 
 at 
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
  at 
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
  at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
  at org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56) 
 at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
  at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
  at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)  at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
  at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)  at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
  at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
  at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.j

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
I'm using solrj and all I did was add a pf entry to solrconfig.xml.  I don't
think it could be an ampersand issue...

Here's an example query:

wt=xml&rows=10&start=0&q=urban+outfitters&qt=recsKeyword&version=2.2

Here's qt config:

  
 0.06
 
name^1.5 tags description^0.6 vendorname^0.3
manufacturer^0.3 category
 
 name description
 id score
 0
 status:0
  

The above query returns all document fields, no "score" field.

Jason

On Tue, Jul 22, 2008 at 2:55 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

>
> On 22-Jul-08, at 11:53 AM, Jason Rennie wrote:
>
>  Just tried adding a pf field to my request handler.  When I did this, solr
>> returned all document fields for each doc (no "score") instead of
>> returning
>> the fields specified in fl.  Bug?  Feature?  Anyone know what the reason
>> for
>> this behavior is?  I'm using solr 1.2.
>>
>
> What exact url did you send to Solr?  I bet there is a missing '&'.
>
> -Mike
>



-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/


Out of memory on Solr sorting

2008-07-22 Thread sundar shankar

Hi,
Sorry again fellows. I am not sure what's happening. The day with solr is bad for
me I guess. EZMLM didn't let me send any mails this morning. Asked me to confirm
subscription and when I did, it said I was already a member. Now my mails are
all coming out bad. Sorry for troubling y'all this bad. I hope this mail comes
out right.



We are developing a product in an agile manner and the current 
implementation has data of size just about 800 megs in dev. 
The memory allocated to solr on dev (Dual core Linux box) is 128-512.

My config
=

   







true


My Field
===

   
   



  










Problem
==

I execute a query that returns 24 rows of result. I pick 10 out of it. I have 
no problem when I execute this.
But When I do sort it by a String field that is fetched from this result. I get 
an OOM. I am able to execute several
other queries with no problem. Just having a sort asc clause added to the query 
throws an OOM. Why is that.
What should I have ideally done. My config on QA is pretty similar to the dev 
box and probably has more data than on dev. 
It didnt throw any OOM during the integration test. The Autocomplete is a new 
field we added recently.

Another point is that the indexing is done with a field of type string
 

and the autocomplete field is a copy field.

The sorting is done based on string field.

Please do lemme know what mistake am I doing?

Regards
Sundar

P.S: The stack trace of the exception is


Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
query
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
 at 
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
 at 
com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
 ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space  
java.lang.OutOfMemoryError: Java heap space 
at 
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) 
 
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352) 
 
at 
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
  
at 
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
  
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
  
at org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56)  
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
  
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
  
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)  
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
  
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
  
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)  
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  
at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
  
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
  
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
  
at 
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
  
at 
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74) 
 
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)  
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) 
at 
org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
  
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
  
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)  
at org.apache.coyote.http11.Http11Processor.process

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar



> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Out of memory on Solr sorting
> Date: Tue, 22 Jul 2008 19:11:02 +
> 
> 
> Hi,
> Sorry again fellos. I am not sure whats happening. The day with solr is bad 
> for me I guess. EZMLM didnt let me send any mails this morning. Asked me to 
> confirm subscription and when I did, it said I was already a member. Now my 
> mails are all coming out bad. Sorry for troubling y'all this bad. I hope this 
> mail comes out right.


Hi,
We are developing a product in a agile manner and the current 
implementation has a data of size just about a 800 megs in dev. 
The memory allocated to solr on dev (Dual core Linux box) is 128-512.

My config
=

   







true


My Field
===

   
   



  










Problem
==

I execute a query that returns 24 rows of result. I pick 10 out of it. I have 
no problem when I execute this.
But When I do sort it by a String field that is fetched from this result. I get 
an OOM. I am able to execute several
other queries with no problem. Just having a sort asc clause added to the query 
throws an OOM. Why is that.
What should I have ideally done. My config on QA is pretty similar to the dev 
box and probably has more data than on dev. 
It didnt throw any OOM during the integration test. The Autocomplete is a new 
field we added recently.

Another point is that the indexing is done with a field of type string
 

and the autocomplete field is a copy field.

The sorting is done based on string field.

Please do lemme know what mistake am I doing?

Regards
Sundar

P.S: The stack trace of the exception is


Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
query
 at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
 at 
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
 at 
com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
 ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space  
java.lang.OutOfMemoryError: Java heap space 
at 
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) 
 
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352) 
 
at 
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
  
at 
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
  
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
at 
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
  
at 
org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56)
  
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
  
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
  
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)  
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
  
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
  
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)  
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  
at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
  
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
  
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
  
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
  
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
  
at 
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
  
at 
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74) 
 
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)  
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) 
at 
org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
  
at 
org.apache.catalina.core.StandardEngineValve.invoke

Incremental indexing of database

2008-07-22 Thread anshuljohri

Hi,

In my project I have to index a whole database which contains text data only.
So if I follow the incremental indexing approach, then my problem is how I will
pick the delta data from the database. Is there any utility in Solr to keep track
of the last indexed record? Or is there any other approach to solve this
problem.

Thanks,
Anshul Johri
-- 
View this message in context: 
http://www.nabble.com/Incremental-indexing-of-database-tp18596613p18596613.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: pf nixes fl

2008-07-22 Thread Jason Rennie
Doh!  I mistakenly changed the request handler from dismax to standard.
Ignore me...

Jason

On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie <[EMAIL PROTECTED]> wrote:

> I'm using solrj and all I did was add a pf entry to solrconfig.xml.  I
> don't think it could be an ampersand issue...
>
> Here's an example query:
>
> wt=xml&rows=10&start=0&q=urban+outfitters&qt=recsKeyword&version=2.2
>
> Here's qt config:
>
>   
>  0.06
>  
> name^1.5 tags description^0.6 vendorname^0.3
> manufacturer^0.3 category
>  
>  name description
>  id score
>  0
>  status:0
>   
>
> The above query returns all document fields, no "score" field.
>
> Jason
>
>
> On Tue, Jul 22, 2008 at 2:55 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
>>
>> On 22-Jul-08, at 11:53 AM, Jason Rennie wrote:
>>
>>  Just tried adding a pf field to my request handler.  When I did this,
>>> solr
>>> returned all document fields for each doc (no "score") instead of
>>> returning
>>> the fields specified in fl.  Bug?  Feature?  Anyone know what the reason
>>> for
>>> this behavior is?  I'm using solr 1.2.
>>>
>>
>> What exact url did you send to Solr?  I bet there is a missing '&'.
>>
>> -Mike
>>
>
>
>
> --
> Jason Rennie
> Head of Machine Learning Technologies, StyleFeeder
> http://www.stylefeeder.com/
> Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
>



-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/


RE: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi

org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)

- this piece of code does not request a 100M array (as I have seen with
Lucene); it asks for only a few bytes/KB for a field...



Probably 128 - 512 is not enough; it is also advisable to use equal sizes,
-Xms1024M -Xmx1024M
(it minimizes GC frequency, and it ensures that 1024M is available at startup).

OOM also happens with fragmented memory, when the application requests a big
contiguous fragment and GC is unable to compact; it looks like your
application requests a little and the memory is not available...
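
For example, with the JBoss/Tomcat setup visible in the stack traces, the heap
is usually set through JAVA_OPTS in the container's startup script (run.conf or
catalina.sh; the exact file depends on the deployment), along the lines of:

   JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx1024m"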



Quoting sundar shankar <[EMAIL PROTECTED]>:






From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Out of memory on Solr sorting
Date: Tue, 22 Jul 2008 19:11:02 +


Hi,
Sorry again fellos. I am not sure whats happening. The day with   
solr is bad for me I guess. EZMLM didnt let me send any mails this   
morning. Asked me to confirm subscription and when I did, it said I  
 was already a member. Now my mails are all coming out bad. Sorry   
for troubling y'all this bad. I hope this mail comes out right.



Hi,
We are developing a product in a agile manner and the current   
implementation has a data of size just about a 800 megs in dev.

The memory allocated to solr on dev (Dual core Linux box) is 128-512.

My config
=

   







true


My Field
===

   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
   </analyzer>


Problem
==

I execute a query that returns 24 rows of results and pick 10 out of
it. I have no problem when I execute this.
But when I sort it by a String field that is fetched from this
result, I get an OOM. I am able to execute several
other queries with no problem. Just having a sort asc clause added
to the query throws an OOM. Why is that?
What should I ideally have done? My config on QA is pretty similar
to the dev box and probably has more data than on dev.
It didn't throw any OOM during the integration test. The autocomplete
is a new field we added recently.


Another point is that the indexing is done with a field of type string
(declared with termVectors="true"),

and the autocomplete field is a copy field.

The sorting is done on the string field.

Please let me know what mistake I am making.

Regards
Sundar

P.S: The stack trace of the exception is


Caused by: org.apache.solr.client.solrj.SolrServerException: Error   
executing query
 at   
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
 at   
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
 at   
com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)

 ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space
java.lang.OutOfMemoryError: Java heap space
at   
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)

at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at   
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
at   
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
at   
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)

at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at   
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
at   
org.apache.lucene.search.FieldSortedHitQueue.(FieldSortedHitQueue.java:56)
at   
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
at   
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
at   
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
at   
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at   
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
at   
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
at   
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at   
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
at   
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at   
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at   
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at   
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicat

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Thanks Fuad.
  But why does just sorting produce an OOM? I executed the
query without the sort clause and it executed perfectly. In fact I even
tried removing the maxrows=10 and executing; it came out fine. Queries with bigger
results seem to come out fine too. So why does just the sort blow up, and that too on just 10 rows??
 
-Sundar




Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller
Because to sort efficiently, Solr loads the term to sort on for each doc
in the index into an array. For ints, longs, etc. it's just an array the
size of the number of docs in your index (I believe, deleted or not). For
a String it's an array to hold each unique string and an array of ints
indexing into the String array.


So if you do a sort, and search for something that only gets 1 doc as a
hit...you're still loading up that field cache for every single doc in
your index on the first search. With Solr, this happens in the
background as it warms up the searcher. The end story is, you most likely need more
RAM to accommodate the sort...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (rough, and it depends, but to give an idea) per field you're
sorting on.


- Mark
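
To make the structure Mark describes concrete, here is a rough sketch in Java
(not Lucene's actual code); the unique-term count, average term length and
per-String overhead used in the estimate are assumptions, not measurements:

   // Shape of the per-field cache used for sorting on a string field.
   public class StringSortCacheSketch {

       String[] lookup; // one entry per unique value of the sort field
       int[] order;     // one entry per document: an index into lookup[]

       // Back-of-the-envelope size estimate for that structure.
       static long estimateBytes(long maxDoc, long uniqueTerms, long avgTermChars) {
           long ordArray = maxDoc * 4;              // one int per document
           long perString = 2 * avgTermChars + 40;  // ~2 bytes/char plus assumed object overhead
           return ordArray + uniqueTerms * perString;
       }

       public static void main(String[] args) {
           // Mark's example: ~2 million docs; assume 500k unique values of ~20 chars.
           long bytes = estimateBytes(2000000L, 500000L, 20L);
           System.out.println("rough field cache size: " + (bytes / (1024 * 1024)) + " MB");
       }
   }

With those assumed numbers the estimate lands around 45 MB, in the same
ballpark as the 40-50 MB figure above.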

sundar shankar wrote:

Thanks Fuad.
  But why does just sorting provide an OOM. I executed the 
query without adding the sort clause it executed perfectly. In fact I even 
tried remove the maxrows=10 and executed. it came out fine. Queries with bigger 
results seems to come out fine too. But why just sort of that too just 10 rows??
 
-Sundar




  

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
I've even seen exceptions (posted here) where "sort"-type queries
caused Lucene to allocate 100MB arrays; here is what happened to me:


SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
at  
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)





- it no longer happens after I increased the heap from 4096M to 8192M (JRockit
R27; a more intelligent stack trace, isn't it?)


Thanks Mark; I didn't know that it happens only once (on warming up a  
searcher).




Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need more
RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark

sundar shankar wrote:

Thanks Fuad.
 But why does just sorting provide an OOM. I   
executed the query without adding the sort clause it executed   
perfectly. In fact I even tried remove the maxrows=10 and executed.  
 it came out fine. Queries with bigger results seems to come out   
fine too. But why just sort of that too just 10 rows??

-Sundar





Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979



I just noticed, this is exactly the number of documents in the index: 25191979

(http://www.tokenizer.org/, you can sort - click the headers Id, [Country,
Site, Price] in the table; experimental)



If the array is allocated ONLY when a new searcher warms up, I am _extremely_
happy... I have had constant OOMs during the past month (Sun Java 5).




Quoting Fuad Efendi <[EMAIL PROTECTED]>:


I've even seen exceptions (posted here) when "sort"-type queries caused
Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)




- it does not happen after I increased from 4096M to 8192M (JRockit
R27; more intelligent stacktrace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need more
RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark

sundar shankar wrote:

Thanks Fuad.
But why does just sorting provide an OOM. I
executed the query without adding the sort clause it executed
perfectly. In fact I even tried remove the maxrows=10 and   
executed.  it came out fine. Queries with bigger results seems to   
come out  fine too. But why just sort of that too just 10 rows??

-Sundar





RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar

Thanks for the explanation, Mark. The reason I had it as 512 max was because earlier
the data file was just about 30 megs, and it increased to this much because of the
usage of EdgeNGramFilterFactory for 2 fields. That's great to know it just
happens for the first search. But this exception has been occurring for me for
the whole of today. Should I fiddle around with the warmer settings too? I have
also instructed an increase in heap to 1024. Will keep you posted on the
turnaround.

Thanks
-Sundar


> Date: Tue, 22 Jul 2008 15:46:04 -0400
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Out of memory on Solr sorting
> 
> Because to sort efficiently, Solr loads the term to sort on for each doc 
> in the index into an array. For ints,longs, etc its just an array the 
> size of the number of docs in your index (i believe deleted or not). For 
> a String its an array to hold each unique string and an array of ints 
> indexing into the String array.
> 
> So if you do a sort, and search for something that only gets 1 doc as a 
> hit...your still loading up that field cache for every single doc in 
> your index on the first search. With solr, this happens in the 
> background as it warms up the searcher. The end story is, you need more 
> RAM to accommodate the sort most likely...have you upped your xmx 
> setting? I think you can roughly say a 2 million doc index would need 
> 40-50 MB (depending and rough, but to give an idea) per field your 
> sorting on.
> 
> - Mark
> 
> sundar shankar wrote:
> > Thanks Fuad.
> >   But why does just sorting provide an OOM. I executed the 
> > query without adding the sort clause it executed perfectly. In fact I even 
> > tried remove the maxrows=10 and executed. it came out fine. Queries with 
> > bigger results seems to come out fine too. But why just sort of that too 
> > just 10 rows??
> >  
> > -Sundar
> >
> >
> >
> >   

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Sorry, Not 30, but 300 :)


Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller

Fuad Efendi wrote:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979



I just noticed, this is an exact number of documents in index: 25191979

(http://www.tokenizer.org/, you can sort - click headers Id, [COuntry, 
Site, Price] in a table; experimental)



If array is allocated ONLY on new searcher warming up I am _extremely_ 
happy... I had constant OOMs during past month (SUN Java 5).

It is only on warmup - I believe it's lazily loaded, so the first time a
search is done (Solr does the search as part of warmup, I believe) the
fieldcache is loaded. The underlying IndexReader is the key to the
fieldcache, so until you get a new IndexReader (SolrSearcher in Solr
world?) the field cache will stay good. Keep in mind that as a searcher is
warming, the other searcher is still serving, so that will raise the RAM
requirements...and since I think you can have >1 searchers on deck...you
get the idea.


As far as the number I gave, that's from memory, made months and months
ago, so go with what you see.
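
Since the cache is rebuilt for every new IndexReader/searcher, one way to keep
that cost out of user-facing requests is to pay it during warmup with a
newSearcher event listener in solrconfig.xml. A minimal sketch (not from this
thread; the field name is an assumption, and on Solr 1.2 the sort may need to be
written into q as "*:*;autocomplete asc" rather than a separate sort parameter):

   <listener event="newSearcher" class="solr.QuerySenderListener">
     <arr name="queries">
       <lst>
         <str name="q">*:*</str>
         <str name="sort">autocomplete asc</str>
         <str name="rows">1</str>
       </lst>
     </arr>
   </listener>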




Quoting Fuad Efendi <[EMAIL PROTECTED]>:


I've even seen exceptions (posted here) when "sort"-type queries caused
Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360) 


at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72) 






- it does not happen after I increased from 4096M to 8192M (JRockit
R27; more intelligent stacktrace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need more
RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark

sundar shankar wrote:

Thanks Fuad.
But why does just sorting provide an OOM. I   
executed the query without adding the sort clause it executed   
perfectly. In fact I even tried remove the maxrows=10 and  
executed.  it came out fine. Queries with bigger results seems to  
come out  fine too. But why just sort of that too just 10 rows??

-Sundar





Re: Incremental indexing of database

2008-07-22 Thread Ravish Bhagdev
Can't you write triggers for the database tables you want to index?
That way you can keep track of all kinds of changes and updates, and
not just the addition of new records.

Ravish
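
A minimal sketch of the other common approach (nothing built into Solr itself):
keep a high-water mark of the last indexed modification time and only re-read
newer rows on each run. The table, column and Solr field names, the JDBC URL and
the use of SolrJ are assumptions about the application, not requirements;
triggers, as suggested above, are a way to also capture deletes and updates that
a plain timestamp column would miss.

   import java.sql.*;
   import org.apache.solr.client.solrj.SolrServer;
   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
   import org.apache.solr.common.SolrInputDocument;

   public class DeltaIndexer {
       public static void main(String[] args) throws Exception {
           SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
           Connection db = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");

           Timestamp lastRun = Timestamp.valueOf(args[0]);   // high-water mark from the previous run
           Timestamp newest = lastRun;

           PreparedStatement ps = db.prepareStatement(
               "SELECT id, title, body, last_modified FROM documents WHERE last_modified > ?");
           ps.setTimestamp(1, lastRun);
           ResultSet rs = ps.executeQuery();
           while (rs.next()) {
               SolrInputDocument doc = new SolrInputDocument();
               doc.addField("id", rs.getString("id"));
               doc.addField("title", rs.getString("title"));
               doc.addField("body", rs.getString("body"));
               solr.add(doc);                                // re-adding the same id replaces the old copy
               Timestamp t = rs.getTimestamp("last_modified");
               if (t.after(newest)) newest = t;
           }
           solr.commit();
           System.out.println("indexed up to " + newest);    // persist this as the next run's start point
       }
   }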

On Tue, Jul 22, 2008 at 8:15 PM, anshuljohri <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> In my project i have to index whole database which contains text data only.
> So if i follow incremental indexing approch than my problem is that how will
> I pick delta data from database. Is there any utility in solr to keep track
> the last indexed record. Or is there any other approch to solve this
> problem.
>
> Thanks,
> Anshul Johri
> --
> View this message in context: 
> http://www.nabble.com/Incremental-indexing-of-database-tp18596613p18596613.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi

Mark,

Question: how much memory do I need for 25,000,000 docs if I do a sort by a
field of 256 bytes? 6.4Gb?



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need more
RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark







Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi

Thank you very much Mark,

it explains a lot to me;

I am guessing: for 1,000,000 documents with a [string] field of
average size 1024 bytes I need 1Gb for a single IndexSearcher instance;
the field-level cache is used internally by Lucene (can Lucene manage the
size of it?); we can't have 1G of such documents without having 1Tb of
RAM...




Quoting Mark Miller <[EMAIL PROTECTED]>:


Fuad Efendi wrote:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979



I just noticed, this is an exact number of documents in index: 25191979

(http://www.tokenizer.org/, you can sort - click headers Id,   
[COuntry, Site, Price] in a table; experimental)



If array is allocated ONLY on new searcher warming up I am   
_extremely_ happy... I had constant OOMs during past month (SUN   
Java 5).

It is only on warmup - I believe its lazy loaded, so the first time a
search is done (solr does the search as part of warmup I believe) the
fieldcache is loaded. The underlying IndexReader is the key to the
fieldcache, so until you get a new IndexReader (SolrSearcher in solr
world?) the field cache will be good. Keep in mind that as a searcher
is warming, the other search is still serving, so that will up the ram
requirements...and since I think you can have >1 searchers on
deck...you get the idea.

As far as the number I gave, thats from a memory made months and months
ago, so go with what you see.




Quoting Fuad Efendi <[EMAIL PROTECTED]>:


I've even seen exceptions (posted here) when "sort"-type queries caused
Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
   at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)   
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)   
- it does not happen after I increased from 4096M to 8192M (JRockit

R27; more intelligent stacktrace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need more
RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark

sundar shankar wrote:

Thanks Fuad.
   But why does just sorting provide an OOM. I 
executed the query without adding the sort clause it executed 
perfectly. In fact I even tried remove the maxrows=10 and
executed.  it came out fine. Queries with bigger results seems   
to  come out  fine too. But why just sort of that too just 10   
rows??

-Sundar





Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller

Hmmm...I think it's 32 bits per integer, with an index entry for each doc, so


   25,000,000 x 32 bits = 95.3674316 megabytes

Then you have the string array that contains each unique term from your 
index...you can guess that based on the number of terms in your index 
and an avg length guess.


There is some other overhead beyond the sort cache as well, but that's
the bulk of what it will add. I think my memory may be bad with my
original estimate :)
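
Applying the same arithmetic to the 25,191,979-document index from the OOM
trace above gives a rough picture (the unique-value counts below are
assumptions, only there to show the range):

   order array:  25,191,979 docs x 4 bytes                ~  96 MB
                 (essentially the 100,767,936-byte allocation in the trace)
   term array:   unique values x (2 x chars + ~40 B overhead)
                 25,000,000 unique ~256-char values        ~ 13 GB
                 50,000 unique values of ~256 chars        ~  28 MB

So the answer to "how much memory for 25M docs sorted on a 256-byte field"
depends almost entirely on how many distinct values that field has.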


Fuad Efendi wrote:

Thank you very much Mark,

it explains me a lot;

I am guessing: for 1,000,000 documents with a [string] field of 
average size 1024 bytes I need 1Gb for single IndexSearcher instance; 
field-level cache it is used internally by Lucene (can Lucene manage 
size if it?); we can't have 1G of such documents without having 1Tb 
RAM...




Quoting Mark Miller <[EMAIL PROTECTED]>:


Fuad Efendi wrote:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979



I just noticed, this is an exact number of documents in index: 25191979

(http://www.tokenizer.org/, you can sort - click headers Id,  
[COuntry, Site, Price] in a table; experimental)



If array is allocated ONLY on new searcher warming up I am  
_extremely_ happy... I had constant OOMs during past month (SUN  
Java 5).

It is only on warmup - I believe its lazy loaded, so the first time a
search is done (solr does the search as part of warmup I believe) the
fieldcache is loaded. The underlying IndexReader is the key to the
fieldcache, so until you get a new IndexReader (SolrSearcher in solr
world?) the field cache will be good. Keep in mind that as a searcher
is warming, the other search is still serving, so that will up the ram
requirements...and since I think you can have >1 searchers on
deck...you get the idea.

As far as the number I gave, thats from a memory made months and months
ago, so go with what you see.




Quoting Fuad Efendi <[EMAIL PROTECTED]>:

I've even seen exceptions (posted here) when "sort"-type queries 
caused

Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
   at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)  
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  
- it does not happen after I increased from 4096M to 8192M (JRockit

R27; more intelligent stacktrace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an 
array

of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc 
as a

hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need 
more

RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark

sundar shankar wrote:

Thanks Fuad.
   But why does just sorting provide an OOM. I
executed the query without adding the sort clause it executed
perfectly. In fact I even tried remove the maxrows=10 and   
executed.  it came out fine. Queries with bigger results seems  
to  come out  fine too. But why just sort of that too just 10  
rows??

-Sundar





RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Hi Mark,
I am still getting an OOM even after increasing the heap to 1024. 
The docset I have is 
 
numDocs : 1138976 maxDoc : 1180554 
 
Not sure how much more I would need. Is there any other way out of this? I
noticed another interesting behavior. I have a Solr setup on a personal box
where I try out a lot of different configurations and stuff before I even roll
the changes out to dev. This server has been running with similar indexed
data for a lot longer than the dev box, and it seems to have fetched the results
out properly.
That box is a Windows dual-core machine with just about a gig of memory, and the
whole 1024 megs have been allocated to the heap. The dev is a Linux box with over 2
gigs of memory and 1024 allocated to the heap now. :S
 
-Sundar




Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller
Someone else is going to have to take over, Sundar - I am new to Solr
myself. I will say this though - 25 million docs is pushing the limits
of a single machine - especially with only 2 gigs of RAM, and especially with
any sort fields. You are at the edge, I believe.


But perhaps you can get by. Have you checked out all the Solr stats on
the admin page? Maybe you are trying to load up too many searchers at a
time. I think there is a setting to limit the number of searchers that
can be on deck...
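
The setting Mark is referring to is most likely maxWarmingSearchers in the
<query> section of solrconfig.xml (shown here as a sketch; the value is just an
example):

   <query>
     ...
     <!-- refuse to open more than this many warming searchers at once; extra
          commits fail fast instead of piling up warming searchers, each of
          which would hold its own sort caches -->
     <maxWarmingSearchers>2</maxWarmingSearchers>
   </query>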


sundar shankar wrote:

Hi Mark,
I am still getting an OOM even after increasing the heap to 1024. The docset I have is 
 
numDocs : 1138976 maxDoc : 1180554 
 
Not sure how much more I would need. Is there any other way out of this. I noticed another interesting behavior. I have a Solr setup on a personal Box where I try out a lot of different configuration and stuff before I even roll the changes out to dev. This server has been running with a similar indexed data for a lot longer than the dev box and it seems to have fetched the results out properly. 
This box is a windows 2 core processor with just about a gig of memory and the whole 1024 megs have been allocated to heap. The dev is a linux with over 2 Gigs of memory and 1024 allocated to heap now. :S
 
-Sundar




  
  





RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Thanks for your help, Mark. Let me explore a little more and see if someone else
can help me out too. :)


Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi


OK, what is confusing me is the implicit assumption that FieldCache holds  
the whole "field" and that Lucene sorts in memory instead of using the  
file-system "index"...


Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte  
integers) to documents in the index.


org.apache.lucene.search.FieldCacheImpl$10.createValue
...
357: protected Object createValue(IndexReader reader, Object fieldKey)
358:   throws IOException {
359:   String field = ((String) fieldKey).intern();
360:   final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
...
408:   StringIndex value = new StringIndex (retArray, mterms);
409:   return value;
410: }
...

It's very confusing, I don't know such internals...
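As a rough, assumption-laden sketch of what that createValue() allocates per
sort field (maxDoc is taken from the thread; the unique-term count and
per-term cost are guesses, and 4-byte references are assumed):

public class SortCacheEstimate {
    public static void main(String[] args) {
        long maxDoc       = 25191979L;  // reader.maxDoc(), from the OOM above
        long uniqueTerms  = 2000000L;   // distinct values in the sort field (guess)
        long avgTermBytes = 64L;        // per-term String cost incl. overhead (guess)

        long ordArray  = maxDoc * 4L;        // int[maxDoc]; note 4 * 25191979 is within
                                             // a few bytes of the 100767936-byte
                                             // allocation reported in the stack trace
        long refsArray = (maxDoc + 1L) * 4L; // String[maxDoc+1] references
        long termBytes = uniqueTerms * avgTermBytes;

        long totalMb = (ordArray + refsArray + termBytes) / (1024L * 1024L);
        System.out.println("~" + totalMb + " MB per sort field, per live searcher");
    }
}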


<field name="..." type="string" indexed="true" stored="true" termVectors="true"/>

 The sorting is done based on a string field.



I think Sundar should not use [termVectors="true"]...



Quoting Mark Miller <[EMAIL PROTECTED]>:


Hmmm...I think it's 32 bits per integer, with an index entry for each doc, so


   **25 000 000 x 32 bits = 95.3674316 megabytes**

Then you have the string array that contains each unique term from your
index...you can guess that based on the number of terms in your index
and an avg length guess.

There is some other overhead beyond the sort cache as well, but that's
the bulk of what it will add. I think my memory may be bad with my
original estimate :)

Fuad Efendi wrote:

Thank you very much Mark,

it explains a lot to me;

I am guessing: for 1,000,000 documents with a [string] field of   
average size 1024 bytes I need 1Gb for a single IndexSearcher   
instance; the field-level cache is used internally by Lucene (can   
Lucene manage its size?); we can't have 1G of such documents   
without having 1Tb of RAM...




Quoting Mark Miller <[EMAIL PROTECTED]>:


Fuad Efendi wrote:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979



I just noticed, this is an exact number of documents in index: 25191979

(http://www.tokenizer.org/, you can sort - click headers Id,
[COuntry, Site, Price] in a table; experimental)



If array is allocated ONLY on new searcher warming up I am
_extremely_ happy... I had constant OOMs during past month (SUN
Java 5).

It is only on warmup - I believe its lazy loaded, so the first time a
search is done (solr does the search as part of warmup I believe) the
fieldcache is loaded. The underlying IndexReader is the key to the
fieldcache, so until you get a new IndexReader (SolrSearcher in solr
world?) the field cache will be good. Keep in mind that as a searcher
is warming, the other search is still serving, so that will up the ram
requirements...and since I think you can have >1 searchers on
deck...you get the idea.

As far as the number I gave, thats from a memory made months and months
ago, so go with what you see.




Quoting Fuad Efendi <[EMAIL PROTECTED]>:


I've even seen exceptions (posted here) when "sort"-type queries caused
Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
  at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)  - it does not happen after I increased from 4096M to 8192M   
(JRockit

R27; more intelligent stacktrace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
background as it warms up the searcher. The end story is, you need more
RAM to accommodate the sort most likely...have you upped your xmx
setting? I think you can roughly say a 2 million doc index would need
40-50 MB (depending and rough, but to give an idea) per field your
sorting on.

- Mark

sundar shankar wrote:

Thanks Fuad.
But why does just sorting cause an OOM? I executed the query without 
adding the sort clause and it executed perfectly. In fact I even tried 
removing maxrows=10 and it came out fine. Queries with bigger results 
seem to come out fine too. So why does just the sort blow up, and for 
only 10 rows??

-Sundar





Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi


Ok, after some analysis of FieldCacheImpl:

- it is assumed that the (sorted) Enumeration of "terms" is smaller than  
the total number of documents
(which is why SOLR uses a specific field type for sorted searches:  
solr.StrField with omitNorms="true")


It creates an int[reader.maxDoc()] array, walks the (sorted) Enumeration of  
"terms" (untokenized solr.StrField), and populates the array with document  
IDs.



- it also creates array of String
 String[] mterms = new String[reader.maxDoc()+1];

Why do we need that? For 1G documents with an average term/StrField size  
of 100 bytes (which could be unique text!!!) it would create a huge  
~100Gb cache which is not really needed...

  StringIndex value = new StringIndex (retArray, mterms);

If I understand correctly... StringIndex _must_ be a file in a  
filesystem for such a case... We create StringIndex, and retrieve top  
10 documents, huge overhead.
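To make the structure concrete, a small self-contained sketch of what the
StringIndex pair (order[] per doc, lookup[] per unique term) buys once it is
built: per-hit comparisons become int lookups. The data below is invented;
this is not Lucene code.

import java.util.Arrays;
import java.util.Comparator;

public class StringIndexSketch {
    public static void main(String[] args) {
        // lookup: ord -> term, sorted, as FieldCacheImpl builds it
        String[] lookup = { null, "ajax", "lucene", "solr", "web2.0" };
        // order: docId -> ord of that doc's term
        int[] order = { 3, 1, 4, 2, 1 };              // 5 documents

        Integer[] hits = { 0, 1, 2, 3, 4 };            // pretend all 5 docs matched
        Arrays.sort(hits, Comparator.comparingInt(d -> order[d]));

        for (int doc : hits) {
            System.out.println("doc " + doc + " -> " + lookup[order[doc]]);
        }
    }
}

The expensive part is building order[] and lookup[] for every document in the
index, which is exactly the overhead being objected to when only 10 rows come back.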






Quoting Fuad Efendi <[EMAIL PROTECTED]>:



Ok, what is confusing me is implicit guess that FieldCache contains
"field" and Lucene uses in-memory sort instead of using file-system
"index"...

Array syze: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
integers) to documents in index.

org.apache.lucene.search.FieldCacheImpl$10.createValue
...
357: protected Object createValue(IndexReader reader, Object fieldKey)
358:   throws IOException {
359:   String field = ((String) fieldKey).intern();
360:   final int[] retArray = new int[reader.maxDoc()]; //   
OutOfMemoryError!!!

...
408:   StringIndex value = new StringIndex (retArray, mterms);
409:   return value;
410: }
...

It's very confusing, I don't know such internals...


termVectors="true"/>

The sorting is done based on string field.



I think Sundar should not use [termVectors="true"]...



Quoting Mark Miller <[EMAIL PROTECTED]>:


Hmmm...I think its 32bits an integer with an index entry for each doc, so


  **25 000 000 x 32 bits = 95.3674316 megabytes**

Then you have the string array that contains each unique term from your
index...you can guess that based on the number of terms in your index
and an avg length guess.

There is some other overhead beyond the sort cache as well, but thats
the bulk of what it will add. I think my memory may be bad with my
original estimate :)

Fuad Efendi wrote:

Thank you very much Mark,

it explains me a lot;

I am guessing: for 1,000,000 documents with a [string] field of
average size 1024 bytes I need 1Gb for single IndexSearcher
instance; field-level cache it is used internally by Lucene (can
Lucene manage size if it?); we can't have 1G of such documents
without having 1Tb RAM...




Quoting Mark Miller <[EMAIL PROTECTED]>:


Fuad Efendi wrote:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979



I just noticed, this is an exact number of documents in index: 25191979

(http://www.tokenizer.org/, you can sort - click headers Id, 
[COuntry, Site, Price] in a table; experimental)



If array is allocated ONLY on new searcher warming up I am 
_extremely_ happy... I had constant OOMs during past month (SUN   
  Java 5).

It is only on warmup - I believe its lazy loaded, so the first time a
search is done (solr does the search as part of warmup I believe) the
fieldcache is loaded. The underlying IndexReader is the key to the
fieldcache, so until you get a new IndexReader (SolrSearcher in solr
world?) the field cache will be good. Keep in mind that as a searcher
is warming, the other search is still serving, so that will up the ram
requirements...and since I think you can have >1 searchers on
deck...you get the idea.

As far as the number I gave, thats from a memory made months and months
ago, so go with what you see.




Quoting Fuad Efendi <[EMAIL PROTECTED]>:


I've even seen exceptions (posted here) when "sort"-type queries caused
Lucene to allocate 100Mb arrays, here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
 at
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360) 
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72) - it does not happen after I increased from 4096M to 8192M
(JRockit

R27; more intelligent stacktrace, isn't it?)

Thanks Mark; I didn't know that it happens only once (on warming up a
searcher).



Quoting Mark Miller <[EMAIL PROTECTED]>:


Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints,longs, etc its just an array
the size of the number of docs in your index (i believe deleted or
not). For a String its an array to hold each unique string and an array
of ints indexing into the String array.

So if you do a sort, and search for something that only gets 1 doc as a
hit...your still loading up that field cache for every single doc in
your index on the first search. With solr, this happens in the
bac

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
I haven't seen the source code before, but I don't know why the sorting isn't 
done after the fetch. Wouldn't that be faster, at least in the case of 
field-level sorting? I could be wrong and the implementation is probably 
better than I think, but I don't know why all of the field values have to be loaded.
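A hedged sketch of why the sort cannot simply be applied "after the fetch":
to keep only the top 10 of potentially millions of hits, the collector has to
compare the sort value of every matching document as it streams by, which is
what the per-document FieldCache array is for. The data below is invented.

import java.util.PriorityQueue;

public class TopNSketch {
    public static void main(String[] args) {
        int[] order = new int[1000];                    // docId -> sort ord, FieldCache-style
        for (int d = 0; d < order.length; d++) order[d] = (d * 31) % 997;

        int topN = 10;
        // max-heap on ord: the head is the current worst of the kept 10
        PriorityQueue<Integer> top = new PriorityQueue<>((a, b) -> order[b] - order[a]);
        for (int doc = 0; doc < order.length; doc++) {  // every hit is consulted once
            top.offer(doc);
            if (top.size() > topN) top.poll();          // evict the worst
        }
        System.out.println("kept " + top.size() + " of " + order.length + " hits");
    }
}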
 
 




RE: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
I am hoping [new StringIndex (retArray, mterms)] is called only once  
per sort field and cached somewhere in Lucene;


theoretically you need to multiply the number of documents by the size  
of the field (supposing that the field contains unique text); you need  
not tokenize this field; you need not store a TermVector.


for 2,000,000 documents with a simple untokenized text field such as a  
book title (256 bytes) you probably need 512,000,000 bytes per  
Searcher, and as Mark mentioned you should limit the number of  
searchers in SOLR.


So Xmx512M is definitely not enough even for simple cases...
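A hedged schema.xml sketch of that advice - give the sort its own small,
untokenized, norm-free field instead of sorting on the stored/term-vectored
one; the field names are invented:

<!-- schema.xml -->
<field name="title_sort" type="string" indexed="true" stored="false" omitNorms="true"/>
<copyField source="title" dest="title_sort"/>

Queries would then sort with &sort=title_sort asc while still displaying the
original title field.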


Quoting sundar shankar <[EMAIL PROTECTED]>:

I haven't seen the source code before, But I don't know why the   
sorting isn't done after the fetch is done. Wouldn't that make it   
more faster. at least in case of field level sorting? I could be   
wrong too and the implementation might probably be better. But don't  
 know why all of the fields have had to be loaded.







RE: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
Yes, it is a cache: it stores an array of document IDs sorted by the  
sort field, together with the sorted field values; query results can be  
intersected with it and reordered accordingly.


But the memory requirements should be well documented.

Internally it uses a WeakHashMap, which is not good(!!!) - a lot of  
"underground" warming up of caches that SOLR is not aware of...  
Could be.


I think Lucene-SOLR developers should join this discussion:


/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 *
..

  // inherit javadocs
  public StringIndex getStringIndex(IndexReader reader, String field)
  throws IOException {
return (StringIndex) stringsIndexCache.get(reader, field);
  }

  Cache stringsIndexCache = new Cache() {

protected Object createValue(IndexReader reader, Object fieldKey)
throws IOException {
  String field = ((String) fieldKey).intern();
  final int[] retArray = new int[reader.maxDoc()];
  String[] mterms = new String[reader.maxDoc()+1];
  TermDocs termDocs = reader.termDocs();
  TermEnum termEnum = reader.terms (new Term (field, ""));
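A self-contained sketch of the WeakHashMap lifetime behaviour being flagged
here; this is not Lucene's Cache class, the reader objects are stand-ins, and
the arrays are scaled down so it runs in a small heap:

import java.util.WeakHashMap;

public class WeakCacheSketch {
    public static void main(String[] args) throws Exception {
        WeakHashMap<Object, int[]> cache = new WeakHashMap<Object, int[]>();

        Object oldReader = new Object();             // stands in for the old IndexReader
        Object newReader = new Object();             // the warming searcher's reader

        cache.put(oldReader, new int[1000000]);      // real case: int[25191979], ~100 MB
        cache.put(newReader, new int[1000000]);      // both live during the warmup overlap

        System.out.println("entries while both readers are referenced: " + cache.size());

        oldReader = null;                            // old searcher closed / dereferenced
        System.gc();                                 // GC timing is not guaranteed, so the
        Thread.sleep(200);                           // second line usually, not always, prints 1
        System.out.println("entries after the old reader is unreachable: " + cache.size());
    }
}

The entry disappears only when nothing references the reader any more, so the
size of this cache is bounded by the number of live readers and sort fields,
not by any SOLR cache setting.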






Quoting Fuad Efendi <[EMAIL PROTECTED]>:


I am hoping [new StringIndex (retArray, mterms)] is called only once
per-sort-field and cached somewhere at Lucene;

theoretically you need multiply number of documents on size of field
(supposing that field contains unique text); you need not tokenize this
field; you need not store TermVector.

for 2 000 000 documents with simple untokenized text field such as
title of book (256 bytes) you need probably 512 000 000 bytes per
Searcher, and as Mark mentioned you should limit number of searchers in
SOLR.

So that Xmx512M is definitely not enough even for simple cases...


Quoting sundar shankar <[EMAIL PROTECTED]>:

I haven't seen the source code before, But I don't know why the
sorting isn't done after the fetch is done. Wouldn't that make it
more faster. at least in case of field level sorting? I could be
wrong too and the implementation might probably be better. But   
don't  know why all of the fields have had to be loaded.







Re: Vote on a new solr logo

2008-07-22 Thread Chris Harris
How about releasing the preliminary results so we can see if a run-off
is in order!

On Tue, Jul 22, 2008 at 6:37 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> My opinion: if it's already a runaway, we might as well not prolong things.
> If not though, we should probably give some time for any possible laggards.
> The 'admin look' poll received its first 19-20 votes in the first night /
> morning, and has only gotten 2 or 3 since then, so probably no use going too
> long.
>
> - Mark
>
> Shalin Shekhar Mangar wrote:
>>
>> 28 votes so far and counting!
>>
>> When should we close this poll?


Seeking Anecdotes: Solr Plugins

2008-07-22 Thread Chris Hostetter


Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the Box" 
at ApacheCon this year, which will focus on the how/when/why of 
writing Solr Plugins...


http://us.apachecon.com/c/acus2008/sessions/10

I've got several use cases I can refer to for examples, both from my day 
job and from public projects (DIH, Local Solr, etc.) but I'm hoping to be 
able to provide even broader examples of the types of "niche" plugins 
people have written for specific purposes, and what their motivations 
were for doing so (as opposed to doing alternate logic in their client).


If people would like to reply to this thread to share a little bit of 
information about their experiences, that would be awesome -- in addition 
to helping me write my slides, it would help other Solr users be aware of 
what's possible (and help Solr developers be more aware of how our users 
are taking advantage of the APIs).


If there are reasons why you can't publicly disclose the details of 
how you are using Solr, but you wouldn't mind sharing some of the info 
with me directly on the grounds that I keep you (and your company) 
anonymous you can also feel free to email me directly.




-Hoss



Re: Seeking Anecdotes: Solr Plugins

2008-07-22 Thread Mike Klaas


On 22-Jul-08, at 4:34 PM, Chris Hostetter wrote:



Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the  
Box" at ApacheCon this year, which will focus on the how/when/why of  
writing Solr Plugins...


http://us.apachecon.com/c/acus2008/sessions/10

I've got several use cases I can refer to for examples, both from my  
day job and from public projects (DIH, Local Solr, etc.) but I'm  
hoping to be able to provide even broader examples of the types of  
"niche" plugins people have written for specific purposes, and what  
their motivations where for doing so (as opposed to doing alternate  
logic in their client).


If people would like to reply to this thread to share a little bit  
of information about their experiences, that would be awesome -- in  
addition to helping me write my slides, it would help other Solr  
users be aware of what's possible (and help Solr developers be more  
aware of how our users are taking advantage of the APIs).



analysis:
 - a WDF that handles some special-case tokens (e.g., '.NET') and  
that doesn't split tokens that are a series of small components (e.g.,  
r2d2) for performance

 - analyzers to incorporate payloads
 - custom token splitters

highlighting:
 - custom scorer, fragmenter
 - request handler for highlighting specific docs without querying

custom queries/scoring:
 - a "proximity" query that can handle cases where only a part of a  
query is proximate
 - a query similar to CustomScoreQuery that allows the arbitrary  
combination of query scorers (e.g., product), but allows any Query  
subclass (not just a ValueSourceQuery).
 - a heavy modified version of dismax that incorporates the above  
query types along with payloads and other features like query-injected  
filter queries.  This type of extension is largely obsolete with  
QueryComponents


Let me know if you want more detail--most of this is relative to a  
somewhat older version of Solr, so it might not all apply.


cheers,
-Mike


Re: Incremental indexing of database

2008-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you take a look at DataImportHandler?

On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev
<[EMAIL PROTECTED]> wrote:
> Can't you write triggers for your database/tables you want to index?
> That way you can keep track of all kinds of changes and updates and
> not just addition of a new record.
>
> Ravish
>
> On Tue, Jul 22, 2008 at 8:15 PM, anshuljohri <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> In my project I have to index the whole database, which contains text data only.
>> So if I follow the incremental indexing approach, then my problem is how will
>> I pick the delta data from the database. Is there any utility in solr to keep track
>> of the last indexed record? Or is there any other approach to solve this
>> problem?
>>
>> Thanks,
>> Anshul Johri
>> --
>> View this message in context: 
>> http://www.nabble.com/Incremental-indexing-of-database-tp18596613p18596613.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>



-- 
--Noble Paul
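For anyone finding this thread later, a hedged sketch of the DataImportHandler
delta-import configuration being pointed at; the driver, table, and column
names are invented, and the ${dataimporter.*} variables are the ones DIH
substitutes at run time:

<!-- data-config.xml -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
  <document>
    <entity name="article" pk="id"
            query="SELECT id, title, body FROM article"
            deltaQuery="SELECT id FROM article
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title, body FROM article
                              WHERE id = '${dataimporter.delta.id}'">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>

A full rebuild is then /dataimport?command=full-import and the incremental pass
is /dataimport?command=delta-import; DIH stores last_index_time in
dataimport.properties after each run, which answers the "how do I keep track of
the last indexed record" part of the question.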


Re: Incremental indexing of database

2008-07-22 Thread anshuljohri

Thanks Paul, this is what I was looking for :)

-Anshul Johri


Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> Did you take a look at DataImportHandler?
> 
> On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev
> <[EMAIL PROTECTED]> wrote:
>> Can't you write triggers for your database/tables you want to index?
>> That way you can keep track of all kinds of changes and updates and
>> not just addition of a new record.
>>
>> Ravish
>>
>> On Tue, Jul 22, 2008 at 8:15 PM, anshuljohri <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hi,
>>>
>>> In my project I have to index the whole database, which contains text data
>>> only.
>>> So if I follow the incremental indexing approach, then my problem is how
>>> will
>>> I pick the delta data from the database. Is there any utility in solr to
>>> keep track
>>> of the last indexed record? Or is there any other approach to solve this
>>> problem?
>>>
>>> Thanks,
>>> Anshul Johri
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Incremental-indexing-of-database-tp18596613p18596613.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Incremental-indexing-of-database-tp18596613p18604146.html
Sent from the Solr - User mailing list archive at Nabble.com.