How to avoid the unexpected character error?

2012-03-14 Thread neosky
I use XML to index the data. One field might contain characters
like '' <=>
It seems that this produces the error.
I changed that field so it is not indexed, but that doesn't help. I need to store the
field, even if it is not indexed.
Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3824726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Too many open files - lots of sockets

2012-03-14 Thread Colin Howe
Hello,

We keep hitting the too many open files exception. Looking at lsof we have
a lot (several thousand) of entries like this:

java  19339 root 1619u  sock  0,7  0t0  682291383 can't identify protocol


However, netstat -a doesn't show any of these.

Can anyone suggest a way to diagnose what these socket entries are? Happy
to post any more information as needed.


Cheers,
Colin


-- 
Colin Howe
@colinhowe

VP of Engineering
Conversocial Ltd
conversocial.com


Re: Too many open files - lots of sockets

2012-03-14 Thread Markus Jelsma
Are you running trunk and have auto-commit enabled? Then disable 
auto-commit. Even if you increase ulimits it will continue to swallow 
all available file descriptors.


On Wed, 14 Mar 2012 10:13:55 +, Colin Howe  
wrote:

Hello,

We keep hitting the too many open files exception. Looking at lsof we have
a lot (several thousand) of entries like this:

java  19339 root 1619u  sock  0,7  0t0  682291383 can't identify protocol


However, netstat -a doesn't show any of these.

Can anyone suggest a way to diagnose what these socket entries are? Happy
to post any more information as needed.


Cheers,
Colin


--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Too many open files - lots of sockets

2012-03-14 Thread Colin Howe
Currently using 3.4.0. We have autocommit enabled but we manually do
commits every 100 documents anyway... I can turn it off if you think that
might help.


Cheers,
Colin


On Wed, Mar 14, 2012 at 10:24 AM, Markus Jelsma
wrote:

> Are you running trunk and have auto-commit enabled? Then disable
> auto-commit. Even if you increase ulimits it will continue to swallow all
> available file descriptors.
>
>
> On Wed, 14 Mar 2012 10:13:55 +, Colin Howe 
> wrote:
>
>> Hello,
>>
>> We keep hitting the too many open files exception. Looking at lsof we have
>> a lot (several thousand) of entries like this:
>>
>> java  19339 root 1619u  sock  0,7  0t0  682291383 can't identify protocol
>>
>>
>> However, netstat -a doesn't show any of these.
>>
>> Can anyone suggest a way to diagnose what these socket entries are? Happy
>> to post any more information as needed.
>>
>>
>> Cheers,
>> Colin
>>
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
>



-- 
Colin Howe
@colinhowe

VP of Engineering
Conversocial Ltd
conversocial.com


Re: Too many open files - lots of sockets

2012-03-14 Thread Michael Kuhlmann

I had the same problem, without auto-commit.

I never really found out what exactly the reason was, but I think it was 
because commits were triggered before a previous commit had the chance 
to finish.


We now commit after every minute or 1000 (quite large) documents, 
whichever comes first. And we never optimize. We haven't had these 
exceptions for months now.


Good luck!
-Kuli

Am 14.03.2012 11:22, schrieb Colin Howe:

Currently using 3.4.0. We have autocommit enabled but we manually do
commits every 100 documents anyway... I can turn it off if you think that
might help.


Cheers,
Colin


On Wed, Mar 14, 2012 at 10:24 AM, Markus Jelsma
wrote:


Are you running trunk and have auto-commit enabled? Then disable
auto-commit. Even if you increase ulimits it will continue to swallow all
available file descriptors.


On Wed, 14 Mar 2012 10:13:55 +, Colin Howe
wrote:


Hello,

We keep hitting the too many open files exception. Looking at lsof we have
a lot (several thousand) of entries like this:

java  19339 root 1619u  sock  0,7  0t0  682291383 can't identify protocol


However, netstat -a doesn't show any of these.

Can anyone suggest a way to diagnose what these socket entries are? Happy
to post any more information as needed.


Cheers,
Colin



--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350









Sorting on non-stored field

2012-03-14 Thread Finotti Simone
I was wondering: is it possible to sort a Solr result-set on a non-stored value?

Thank you

Re: Sorting on non-stored field

2012-03-14 Thread Michael Kuhlmann

Am 14.03.2012 11:43, schrieb Finotti Simone:

I was wondering: is it possible to sort a Solr result-set on a non-stored value?


Yes, it is. It must be indexed, though.

-Kuli


Re: Sorting on non-stored field

2012-03-14 Thread Li Li
It should be indexed but not analyzed; it doesn't need to be stored.
Reading field values from stored fields is extremely slow,
so Lucene uses a StringIndex (the field cache) to read field values for sorting. So if you want to
sort by some field, you should index that field and not analyze it.
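
As a minimal schema.xml sketch (the field and type names here are made up for
illustration, not from the original question), a sortable field would typically
look like:

<!-- indexed for sorting, not analyzed (string type), not stored -->
<field name="title_sort" type="string" indexed="true" stored="false"/>

A tokenized "title" field can then be copied into it with a copyField if the same
text also needs to be searchable.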

On Wed, Mar 14, 2012 at 6:43 PM, Finotti Simone  wrote:

> I was wondering: is it possible to sort a Solr result-set on a non-stored
> value?
>
> Thank you


Re: How to avoid the unexpected character error?

2012-03-14 Thread Li Li
There is a class org.apache.solr.common.util.XML in Solr;
you can use this wrapper:

public static String escapeXml(String s) throws IOException {
    StringWriter sw = new StringWriter();
    XML.escapeCharData(s, sw);
    return sw.getBuffer().toString();
}
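
For illustration, a self-contained sketch of using that wrapper before building
the update XML (the class name and field name below are hypothetical):

import java.io.IOException;
import java.io.StringWriter;

import org.apache.solr.common.util.XML;

public class EscapeExample {

    // Same wrapper as above, repeated so this sketch compiles on its own.
    public static String escapeXml(String s) throws IOException {
        StringWriter sw = new StringWriter();
        XML.escapeCharData(s, sw);
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        // A raw value containing characters that would otherwise break the update XML.
        String raw = "size '' <=> 10";
        // Escaped, it can be embedded safely inside a <field> element of an <add> document.
        System.out.println("<field name=\"description\">" + escapeXml(raw) + "</field>");
    }
}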

On Wed, Mar 14, 2012 at 4:34 PM, neosky  wrote:

> I use XML to index the data. One field might contain characters
> like '' <=>
> It seems that this produces the error.
> I changed that field so it is not indexed, but that doesn't help. I need to store the
> field, even if it is not indexed.
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3824726.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Too many open files - lots of sockets

2012-03-14 Thread Colin Howe
After some more digging around I discovered that there was a bug reported
in jetty 6:  https://jira.codehaus.org/browse/JETTY-1458

This prompted me to upgrade to Jetty 7 and things look a bit more stable
now :)



On Wed, Mar 14, 2012 at 10:26 AM, Michael Kuhlmann  wrote:

> I had the same problem, without auto-commit.
>
> I never really found out what exactly the reason was, but I think it was
> because commits were triggered before a previous commit had the chance to
> finish.
>
> We now commit after every minute or 1000 (quite large) documents, whatever
> comes first. And we never optimize. We haven't had this exceptions for
> months now.
>
> Good luck!
> -Kuli
>
> Am 14.03.2012 11:22, schrieb Colin Howe:
>
>> Currently using 3.4.0. We have autocommit enabled but we manually do
>> commits every 100 documents anyway... I can turn it off if you think that
>> might help.
>>
>>
>> Cheers,
>> Colin
>>
>>
>> On Wed, Mar 14, 2012 at 10:24 AM, Markus Jelsma
>> wrote:
>>
>>  Are you running trunk and have auto-commit enabled? Then disable
>>> auto-commit. Even if you increase ulimits it will continue to swallow all
>>> available file descriptors.
>>>
>>>
>>> On Wed, 14 Mar 2012 10:13:55 +, Colin Howe
>>> wrote:
>>>
>>>  Hello,

 We keep hitting the too many open files exception. Looking at lsof we
 have
 a lot (several thousand) of entries like this:

 java  19339 root 1619u  sock  0,7  0t0  682291383 can't identify protocol


 However, netstat -a doesn't show any of these.

 Can anyone suggest a way to diagnose what these socket entries are?
 Happy
 to post any more information as needed.


 Cheers,
 Colin


>>> --
>>> Markus Jelsma - CTO - Openindex
>>> http://www.linkedin.com/in/markus17
>>> 050-8536600 / 06-50258350
>>>
>>>
>>
>>
>>
>


-- 
Colin Howe
@colinhowe

VP of Engineering
Conversocial Ltd
conversocial.com


Re: Trouble indexing word documents

2012-03-14 Thread Tomás Fernández Löbbe
Well, this is another error. Looks like you are using cores and you are not
adding the core name to the URL. Make sure you do it:

http://localhost:8585/solr/[CORENAME]/update/extract?literal.id=1&commit=true

The core name is the one you defined in solr.xml and should always be used
in the URL. If you enter through the browser you must add it too.
http://localhost:8585/solr/[CORENAME]/admin/ for example.
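
Assuming the document is being posted with curl -F as in the quoted attempt below,
the full request with the core name in place would look roughly like this (core
name still to be filled in):

curl "http://localhost:8585/solr/[CORENAME]/update/extract?literal.id=1&commit=true" -F "myfile=@troubleshooting_performance.doc"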

On Tue, Mar 13, 2012 at 9:12 AM, rdancy  wrote:

> Yes, I do have that one, as well as a bunch of other jars. I moved the
> lucidworks-solr-cell-3.2.0_01.jar to my classpath, I also placed it in
> /contrib/extraction. I restarted Solr and tried to index the document again
> and this is the result:
>
> "myfile=@troubleshooting_performance.doc"
> Apache Tomcat/6.0.14 - Error report
>
> HTTP Status 400 - Missing solr core name in path
> type: Status report
> message: Missing solr core name in path
> description: The request sent by the client was syntactically incorrect
> (Missing solr core name in path).
>
> Apache Tomcat/6.0.14
> bash-3.2#
>
> I went to my browser and got the following:
>
> HTTP Status 404 - missing core name in path
>
> type: Status report
>
> message: missing core name in path
>
> description: The requested resource (missing core name in path) is not
> available.
>
> Apache Tomcat/6.0.14
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Trouble-indexing-word-documents-tp3819949p3822049.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Too many open files - lots of sockets

2012-03-14 Thread Michael Kuhlmann

Ah, good to know! Thank you!

I already had Jetty under suspicion, but we had this failure quite often 
in October and November, when the bug was not yet reported.


-Kuli

Am 14.03.2012 12:08, schrieb Colin Howe:

After some more digging around I discovered that there was a bug reported
in jetty 6:  https://jira.codehaus.org/browse/JETTY-1458

This prompted me to upgrade to Jetty 7 and things look a bit more stable
now :)


Dynamically changing facet hierarchies and facet values

2012-03-14 Thread Sphene Software
Hello,

I have a use case where the facet hierarchies as well as facet names change
very frequently.

For example:
(Smartphones >> Android) may become
Smartphones >> GSM >> Android.

OR
   "Smartphone" could be renamed to "Smart Phone"

If I use traditional hierarchical faceting, then every change would mean a
re-index of a large number of documents.

Just curious to know how others have solved this problem in the past.


Appreciate your help!
Sphene


RE: solr 3.5 and indexing performance

2012-03-14 Thread Agnieszka Kukałowicz
Bug ticket created:
https://issues.apache.org/jira/browse/SOLR-3245

I also ran the test you asked about with an English dictionary.
The results are in the ticket.

Agnieszka

> -Original Message-
> From: Jan Høydahl [mailto:jan@cominvent.com]
> Sent: Wednesday, March 14, 2012 12:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr 3.5 and indexing performance
>
> Hi,
>
> Thanks a lot for your detailed problem description. It definitely is an
> error. Would you be so kind to register it as a bug ticket, including
> your descriptions from this email?
> http://wiki.apache.org/solr/HowToContribute#JIRA_tips_.28our_issue.2BAC8-bug_tracker.29
> Also please attach to the issue your Polish hunspell
> dictionaries. Then we'll try to reproduce the error.
>
> I wonder if this performance decrease is also seen for English
> dictionaries?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 13. mars 2012, at 16:42, Agnieszka Kukałowicz wrote:
>
> > Hi,
> >
> > I did some more tests for Hunspell in solr 3.4, 4.0:
> >
> > Solr 3.4, full import 489017 documents:
> >
> > StempelPolishStemFilterFactory -  2908 seconds, 168 docs/sec
> > HunspellStemFilterFactory - 3922 seconds, 125 docs/sec
> >
> > Solr 4.0, full import 489017 documents:
> >
> > StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec
> > HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11
> docs/sec
> >
> > Server specification and Java settings are the same as before.
> >
> > Cheers
> > Agnieszka
> >
> >
> >> -Original Message-
> >> From: Agnieszka Kukałowicz [mailto:agnieszka.kukalow...@usable.pl]
> >> Sent: Tuesday, March 13, 2012 10:39 AM
> >> To: 'solr-user@lucene.apache.org'
> >> Subject: RE: solr 3.5 and indexing performance
> >>
> >> Hi,
> >>
> >> Yes, I confirmed that without Hunspell indexing has normal speed.
> >> I did tests in solr 4.0 with Hunspell and PolishStemmer.
> >> With StempelPolishStemFilterFactory the speed is normal.
> >>
> >> My schema is quite easy. For Hunspell I have one text field that I copy 14
> >> text fields to:
> >>
> >> <field name="text" type="text_pl_hunspell" indexed="true"
> >> stored="false" multiValued="true"/>
> >>
> >> <copyField source="field1" dest="text"/> <copyField source="field2"
> >> dest="text"/> ... <copyField source="field14" dest="text"/>
> >>
> >> The "text_pl_hunspell" configuration:
> >>
> >> <fieldType name="text_pl_hunspell" class="solr.TextField"
> >> positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="..."/>
> >>     <filter class="solr.StopFilterFactory"
> >>             ignoreCase="true"
> >>             words="dict/stopwords_pl.txt"
> >>             enablePositionIncrements="true"
> >>             />
> >>     <filter class="solr.HunspellStemFilterFactory"
> >>             dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="..."/>
> >>     <filter class="solr.SynonymFilterFactory"
> >>             synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
> >>     <filter class="solr.StopFilterFactory"
> >>             ignoreCase="true"
> >>             words="dict/stopwords_pl.txt"
> >>             enablePositionIncrements="true"
> >>             />
> >>     <filter class="solr.HunspellStemFilterFactory"
> >>             dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
> >>     <filter class="..." protected="dict/protwords_pl.txt"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> I use a Polish dictionary (the files stopwords_pl.txt, protwords_pl.txt and
> >> synonyms_pl.txt are empty) - pl_PL.dic, pl_PL.aff. These are the same
> >> files I used in the 3.4 version.
> >>
> >> For the Polish Stemmer the difference is only in the definition of the text field:
> >>
> >> <field name="text" type="..." indexed="true" stored="false"
> >> multiValued="true"/>
> >>
> >> <fieldType name="..." class="solr.TextField"
> >> positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="..."/>
> >>     <filter class="solr.StopFilterFactory"
> >>             ignoreCase="true"
> >>             words="dict/stopwords_pl.txt"
> >>             enablePositionIncrements="true"
> >>             />
> >>     <filter class="solr.StempelPolishStemFilterFactory"/>
> >>     <filter class="..." protected="dict/protwords_pl.txt"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="..."/>
> >>     <filter class="solr.SynonymFilterFactory"
> >>             synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
> >>     <filter class="solr.StopFilterFactory"
> >>             ignoreCase="true"
> >>             words="dict/stopwords_pl.txt"
> >>             enablePositionIncrements="true"
> >>             />
> >>     <filter class="solr.StempelPolishStemFilterFactory"/>
> >>     <filter class="..." protected="dict/protwords_pl.txt"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >>
> >> One document has 23 fields:
> >> - 14 text fields copied to the one text field (above), which is only indexed
> >> - 8 other indexed fields (2 strings, 2 tdates, 3 tint, 1 tfloat). The
> >> size of one document is 3-4 kB.
> >> So, I think this is not a very complicated schema.
> >>
> >> My environment is:
> >> - Linux, RedHat 6.2, kernel 2.6.32
> >> - 2 physical CPU Xeon 5606 (4 cores each)
> >> - 32 GB RAM
> >> - 2 SSD disks in RAID 0
> >> - java version:
> >>
> >> java -version
> >> java version "1.6.0_26"
> >> Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM)
> >> 64-Bit Server VM (build 20.1-b02, mixed mode)
> >>
> >> - java is running with -server -Xms4096M -Xmx4096M

Re: Too many open files - lots of sockets

2012-03-14 Thread Erick Erickson
Colin:

FYI, you might consider just setting up the autocommit (or commitWithin if
you're using SolrJ) for some reasonable interval (I often use 10 minutes or so).
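
In solrconfig.xml that is roughly the following sketch (600000 ms is just the
ten-minute interval mentioned above):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- auto-commit within ten minutes of the first pending update -->
    <maxTime>600000</maxTime>
  </autoCommit>
</updateHandler>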

Even though you've figured out it is a Jetty issue, each
commit causes searcher re-opens, perhaps replication in a master/slave
setup, increased merges, etc. It works, but it's also resource-intensive...

FWIW
Erick

On Wed, Mar 14, 2012 at 6:40 AM, Michael Kuhlmann  wrote:
> Ah, good to know! Thank you!
>
> I already had Jetty under suspicion, but we had this failure quite often in
> October and November, when the bug was not yet reported.
>
> -Kuli
>
> Am 14.03.2012 12:08, schrieb Colin Howe:
>
>> After some more digging around I discovered that there was a bug reported
>> in jetty 6:  https://jira.codehaus.org/browse/JETTY-1458
>>
>> This prompted me to upgrade to Jetty 7 and things look a bit more stable
>> now :)


read only slaves and write only master

2012-03-14 Thread Mike Austin
Is there a way to mark a master as write-only and the slaves as read-only?  I
guess I could just remove those handlers from the config?

Is there a benefit to doing this, for performance or anything else?

Thanks,
Mike


Re: SolrCloud Force replication without restarting

2012-03-14 Thread Mark Miller

On Mar 14, 2012, at 12:03 PM, Jamie Johnson wrote:

> Is there a way to force Solr to do a replication without restarting
> when in SolrCloud?


You mean force a recovery? If so, then yes: there is a CoreAdminCommand 
(http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler) called 
REQUESTRECOVERY. The only param is core= and the name of the SolrCore.

- Mark Miller
lucidimagination.com













Re: How to avoid the unexpected character error?

2012-03-14 Thread neosky
Thanks!
Does the schema.xml support this parameter? I am using the example post.jar
to index my file.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3825959.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Force replication without restarting

2012-03-14 Thread Jamie Johnson
Great, so to be clear I would execute the following correct?

http://localhost:8983/solr/admin/cores?action=REQUESTRECOVERY&core=slice1_shard2

On Wed, Mar 14, 2012 at 12:18 PM, Mark Miller  wrote:
>
> On Mar 14, 2012, at 12:03 PM, Jamie Johnson wrote:
>
>> Is there a way to force Solr to do a replication without restarting
>> when in SolrCloud?
>
>
> You mean force a recovery? If so, then yes: there is a CoreAdminCommand 
> (http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler) called 
> REQUESTRECOVERY. The only param is core= and the name of the SolrCore.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Force replication without restarting

2012-03-14 Thread Mark Miller
Yeah, that looks right to me.

On Mar 14, 2012, at 12:54 PM, Jamie Johnson wrote:

> Great, so to be clear I would execute the following correct?
> 
> http://localhost:8983/solr/admin/cores?action=REQUESTRECOVERY&core=slice1_shard2
> 
> On Wed, Mar 14, 2012 at 12:18 PM, Mark Miller  wrote:
>> 
>> On Mar 14, 2012, at 12:03 PM, Jamie Johnson wrote:
>> 
>>> Is there a way to force Solr to do a replication without restarting
>>> when in SolrCloud?
>> 
>> 
>> You mean force a recovery? If so, then yes: there is a CoreAdminCommand 
>> (http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler) called 
>> REQUESTRECOVERY. The only param is core= and the name of the SolrCore.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com













Re: Using two repeater to rapidly switching Master and Slave (Replication)?

2012-03-14 Thread stockii
Did your configuration work?

I have the same issue and I don't know if it works...

I have 2 servers, each with 2 Solr instances (one for updates, the other for
searching). Now I need replication from solr1 to solr2. But what the hell does
Solr do if the master crashes???



-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 
1 Core with 45 Million Documents other Cores < 200.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-two-repeater-to-rapidly-switching-Master-and-Slave-Replication-tp3089653p3826234.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Memory Usage

2012-03-14 Thread Mike Austin
I'm looking at the solr admin interface site.  On the dashboard right
panel, I see three sections with size numbers like 227MB(light),
124MB(darker), and 14MB(darkest).

I'm on a windows server.

Couple questions about what I see in the solr app admin interface:

- In the top right section of the dashboard, does the lightest color of the
three with a number of 227MB come from the Xmx256 heap max setting for java
that I set?  Is this the max limit for all my solr apps running on this
instance of tomcat?
- Is the 124MB in the middle slightly darker gray the Xms128 setting I set?
- Is the 47MB of the darkest section the memory of the current solr app
that I'm viewing details on?
- Do the 227MB and 124MB apply to all Solr apps? For example, if I go look
under solrapp1, should it have the same 227MB and 124MB numbers but a
different darker-section number for the memory of the current Solr app that
I'm viewing?
- If the Xms and Xmx numbers are the same for all Solr apps on this
Tomcat instance, but the bottom, darker memory number is specific to the
app admin page that I'm viewing, how do I see the total usage of all Solr apps?
Is it under /manager/status, in the JVM section with "Free memory: 110.16 MB
Total memory: 124.31 MB Max memory: 227.56 MB"?
- If Xmx is the max memory allocated to the JVM for Tomcat, what is
Xms used for? Is it to reserve memory so it isn't allocated repeatedly, and what
happens if you go over that number?
- Also, on Windows, do the Xmx/Xms settings and memory usage for these Solr
apps show up in Task Manager only under the memory usage of the Tomcat
application?

Sorry for the many questions, but after Google searches and research by a
non-Java expert, I have yet to get clear answers to these questions.

Thanks,
Mike


Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread KeesSchepers
Hello everybody,

I am designing a new Solr architecture for one of my clients. This Solr
architecture is for a high-traffic website with millions of visitors, but I am
facing some design problems where I hope you guys could help me out.

In my situation there are 4 Solr servers running: 1 server is master and 3
are slaves. They are running Solr version 1.4.

I use two cores, 'live' and 'rebuild', and I use the Solr DIH to rebuild a core,
which goes like this:

1. I wipe the rebuild core
2. I run the DIH over the complete dataset (4 million documents) in pieces of
20,000 records (to prevent very long MySQL locks)
3. After the DIH is finished (2 hours) we also have to update the
rebuild core with the changes from the last two hours; this is a problem
4. After updating is done and the core is no more than a few seconds behind,
we want to SWAP the cores.
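
For reference, the swap in step 4 is the standard CoreAdmin SWAP call, something
like this (host and core names are illustrative):

http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild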

Everything goes well except for step 3. The rebuild and the core swap are all
okay.

Because the website is undergoing changes every minute, we cannot pause the
delta-import on the live core and fall behind for 2 hours. The problem is that I
can't figure out a clean setup that does not delay the live core too long
and still uses the DIH instead of writing a lot of code.

Did anyone face this problem before, or could you give me some tips?

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread Young, Cody
We have a very similar system. In our case we have a row version field in our 
sql database. When we run the full import we keep track of the latest row 
version at the time that the full import started. Once the full import is done 
we run an optimize and then run a delta import (actually this is a full import 
in a different DIH file, same mappings just with some different parameters to 
the stored procedure that we call). This catches the offline core up to live 
and then we swap.

Why can't you run the delta import on the rebuild core? Is there some specific 
issue that is preventing you?

Cody
-Original Message-
From: KeesSchepers [mailto:k...@keesschepers.nl] 
Sent: Wednesday, March 14, 2012 11:59 AM
To: solr-user@lucene.apache.org
Subject: Solr core swap after rebuild in HA-setup / High-traffic

Hello everybody,

I am designing a new Solr architecture for one of my clients. This sorl 
architecture is for a high-traffic website with million of visitors but I am 
facing some design problems were I hope you guys could help me out.

In my situation there are 4 Solr servers running, 1 server is master and 3 are 
slave. They are running Solr version 1.4.

I use two cores 'live' and 'rebuild' and I use Solr DIH to rebuild a core which 
goes like this:

1. I wipe the reindex core
2. I run the DIH to the complete dataset (4 million documents) in peices of
20.000 records (to prevent very long mysql locks) 3. After the DIH is finished 
(2 hours) we have to also have to update the rebuild core with changes from the 
last two hours, this is a problem 4. After updating is done and the core is not 
more then some seconds behind we want to SWAP the cores.

Everything goes well except for step 3. The rebuild and the core swap is all 
okay. 

Because the website is undergoing changes every minute we cannot pauze the 
delta-import on the live and walk behind for 2 hours. The problem is that I 
can't figure out a closing system with not delaying the live core to long and 
use the DIH instead of writing a lot of code.

Did anyone face this problem before or could give me some tips?

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html
Sent from the Solr - User mailing list archive at Nabble.com.


problems with DisjunctionMaxQuery and early-termination

2012-03-14 Thread Carlos Gonzalez-Cadenas
Hello all,

We have a SOLR index filled with user queries and we want to retrieve the
ones that are more similar to a given query entered by an end-user. It is
kind of a "related queries" system.

The index is pretty big and we're using early-termination of queries (with
the index sorted so that the "more popular" queries have lower docids and
therefore the termination yields higher-quality results)

Clearly, when the user enters a user-level query into the search box, i.e.
"cheap hotels barcelona offers", we don't know whether there exists a
document (query) in the index that contains these four words or not.
 Therefore, when we're building the SOLR query, the first intuition would
be to do a query like this "cheap OR hotels OR barcelona OR offers".

If all the documents in the index were evaluated, the results of this
query would be good. For example, if there is no query in the index with
these four words but there's a query in the index with the text "cheap
hotels barcelona", it will probably be one of the top results, which is
precisely what we want.

The problem is that we're doing early termination and therefore this query
will exhaust the top-K result limit very quickly (our custom collector limits
the number of evaluated documents), given that queries like "hotels in
madrid" or "hotels in NYC" will match the OR expression described above
(because they all match "hotels").

Our next step was to think in a DisjunctionMaxQuery, trying to write a
query like this:

DisjunctionMaxQuery:
 1) +cheap +hotels +barcelona +offers
 2) +cheap +hotels +barcelona
 3) +cheap +hotels
 4) +hotels
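
In Lucene API terms, such a query might be assembled roughly like the sketch
below (Lucene 3.x style; the field name and tie-breaker value are illustrative,
not taken from the setup described here):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.TermQuery;

public class DismaxSketch {

    // Builds "+t1 +t2 ... +tn" over the assumed field.
    static BooleanQuery all(String field, String... terms) {
        BooleanQuery bq = new BooleanQuery();
        for (String t : terms) {
            bq.add(new TermQuery(new Term(field, t)), Occur.MUST);
        }
        return bq;
    }

    public static void main(String[] args) {
        String f = "query_text"; // assumed field name
        // Tie-breaker 0.0f: a document's score is the max of its sub-query scores.
        DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.0f);
        dmq.add(all(f, "cheap", "hotels", "barcelona", "offers"));
        dmq.add(all(f, "cheap", "hotels", "barcelona"));
        dmq.add(all(f, "cheap", "hotels"));
        dmq.add(all(f, "hotels"));
        System.out.println(dmq);
    }
}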

We were thinking that perhaps the sub-queries within the
DisjunctionMaxQuery were going to get evaluated in "parallel" given that
they're separate queries, but in fact, from a runtime perspective, it
behaves in a similar way to the OR query that we described above.

Our desired behavior is to match documents with each subquery within
the DisjunctionMaxQuery (up to a per-subquery limit that we set) and then
score them and return them all together (therefore we don't want all the
matches being produced by a single sub-query, as is happening now).

Clearly, we could create a script external to SOLR that just runs the
several sub-queries as standalone queries and then joins all the results
together, but before going for this we'd like to know if you have any ideas
on how to solve this problem within SOLR. We do have our own QParser, and
therefore we'd be able to implement any arbitrary query construction that
you can come up with, or even create a new Query type if it's needed.

Thanks a lot for your help,
Carlos


Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


Re: index size with replication

2012-03-14 Thread Mike Austin
The odd thing is that if I optimize the index it doubles in size. If I
then add one more document to the index, it goes back down to half size.

Is there a way to force this without needing to wait until another document
is added? Or do you have more information on what you think is going on?
I'm using a trunk version of Solr 4 from 9/12/2011, with a master and two
slaves setup.  Everything besides this is working great!

Thanks,
Mike

On Tue, Mar 13, 2012 at 9:32 PM, Li Li  wrote:

>  optimize will generate new segments and delete old ones. if your master
> also provides searching service during indexing, the old files may be
> opened by old SolrIndexSearcher. they will be deleted later. So when
> indexing, the index size may double. But a moment later, old indexes will
> be deleted.
>
> On Wed, Mar 14, 2012 at 7:06 AM, Mike Austin 
> wrote:
>
> > I have a master with two slaves.  For some reason on the master if I do
> an
> > optimize after indexing on the master it double in size from 42meg to 90
> > meg.. however,  when the slaves replicate they get the 42meg index..
> >
> > Should the master and slaves always be the same size?
> >
> > Thanks,
> > Mike
> >
>


Re: index size with replication

2012-03-14 Thread Mike Austin
Another note: if I reload the Solr app, it goes back down in size.

Here are my replication settings on the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">startup</str>
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">optimize</str>
     <str name="numberToKeep">1</str>
     <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
     <str name="commitReserveDuration">00:00:30</str>
   </lst>
</requestHandler>

On Wed, Mar 14, 2012 at 3:54 PM, Mike Austin  wrote:

> The odd thing is that if I optimize the index it doubles in size.. If I
> then, add one more document to the index it goes back down to half size?
>
> Is there a way to force this without needing to wait until another
> document is added? Or do you have more information on what you think is
> going on?  I'm using a trunk version of solr4 from 9/12/2011 with a master
> with two slaves setup.  Everything besides this is working great!
>
> Thanks,
> Mike
>
>
> On Tue, Mar 13, 2012 at 9:32 PM, Li Li  wrote:
>
>>  optimize will generate new segments and delete old ones. if your master
>> also provides searching service during indexing, the old files may be
>> opened by old SolrIndexSearcher. they will be deleted later. So when
>> indexing, the index size may double. But a moment later, old indexes will
>> be deleted.
>>
>> On Wed, Mar 14, 2012 at 7:06 AM, Mike Austin 
>> wrote:
>>
>> > I have a master with two slaves.  For some reason on the master if I do
>> an
>> > optimize after indexing on the master it double in size from 42meg to 90
>> > meg.. however,  when the slaves replicate they get the 42meg index..
>> >
>> > Should the master and slaves always be the same size?
>> >
>> > Thanks,
>> > Mike
>> >
>>
>
>


Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread KeesSchepers
Well, the point is as follows.

We have a MySQL table where all the changes are tracked, something very
similar to your situation. The first problem is that the delta-import on
the live core needs to update this table to mark a record as done. I do
this very awkwardly now within a script transformer; of course DIH isn't
designed for this.

The second thing is that while the rebuild is running on the rebuild core, we
want to do a delta-import on this new core to keep it from falling too far behind the
live core, but while the rebuilding process is ongoing the
delta-import on the live core also runs every minute.

The second problem is that the delta-import on the live core already sets
these rows to status 'processed', so the delta-update after the rebuild
wouldn't pick up these updates anymore.

There are some solutions but I can't figure out a clean way to solve this
architecture problem. Maybe there isn't a clean solution..

I am curious how other developers are handling this..

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826835.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: index size with replication

2012-03-14 Thread Ahmet Arslan

> Another note.. if I reload solr app
> it goes back down in size.
> 
> here is my replication settings on the master:
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>    <lst name="master">
>      <str name="replicateAfter">startup</str>
>      <str name="replicateAfter">commit</str>
>      <str name="replicateAfter">optimize</str>
>      <str name="numberToKeep">1</str>
>      <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
>      <str name="commitReserveDuration">00:00:30</str>
>    </lst>
> </requestHandler>

Could it be https://issues.apache.org/jira/browse/SOLR-3033 ?




RE: index size with replication

2012-03-14 Thread Dyer, James
SOLR-3033 is related to ReplcationHandler's ability to do backups.  It allows 
you to specify how many backups you want to keep.  You don't seem to have any 
backups configured here so it is not an applicable parameter (note that 
SOLR-3033 was committed to trunk recently but the config param was made 
"maxNumberOfBackups" ... see http://wiki.apache.org/solr/SolrReplication#Master 
)

I can only take a wild guess why you have the temporary increase in index size. 
 Could it be that something is locking the old segment files so they do not get 
deleted on optimize?  Then maybe they are subsequently getting cleaned up at 
your next commit and restart ?

Finally, keep in mind that optimizes aren't generally recommended 
anymore.  Everyone's situation is different, but if you have good settings for 
"mergeFactor" and "ramBufferSizeMB", then an optimize is (probably) not going to 
do anything helpful. 
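
For reference, those settings live in the index configuration section of
solrconfig.xml, for example (values illustrative, not a recommendation):

<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>128</ramBufferSizeMB>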

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, March 14, 2012 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: index size with replication


> Another note.. if I reload solr app
> it goes back down in size.
> 
> here is my replication settings on the master:
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>    <lst name="master">
>      <str name="replicateAfter">startup</str>
>      <str name="replicateAfter">commit</str>
>      <str name="replicateAfter">optimize</str>
>      <str name="numberToKeep">1</str>
>      <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
>      <str name="commitReserveDuration">00:00:30</str>
>    </lst>
> </requestHandler>

Could it be https://issues.apache.org/jira/browse/SOLR-3033 ?




Re: index size with replication

2012-03-14 Thread Mike Austin
Thanks.  I might just remove the optimize.  I had it planned for once a
week but maybe I'll just do it and restart the app if performance slows.


On Wed, Mar 14, 2012 at 4:37 PM, Dyer, James wrote:

> SOLR-3033 is related to ReplcationHandler's ability to do backups.  It
> allows you to specify how many backups you want to keep.  You don't seem to
> have any backups configured here so it is not an applicable parameter (note
> that SOLR-3033 was committed to trunk recently but the config param was
> made "maxNumberOfBackups" ... see
> http://wiki.apache.org/solr/SolrReplication#Master )
>
> I can only take a wild guess why you have the temporary increase in index
> size.  Could it be that something is locking the old segment files so they
> do not get deleted on optimize?  Then maybe they are subsequently getting
> cleaned up at your next commit and restart ?
>
> Finally, keep in mind that doing optimizes aren't generally recommended
> anymore.  Everyone's situation is different, but if you have good settings
> for "mergeFactor" and "ramBufferSizeMB", then optimize is (probably) not
> going to do anything helpful.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Wednesday, March 14, 2012 4:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: index size with replication
>
>
> > Another note.. if I reload solr app
> > it goes back down in size.
> >
> > here is my replication settings on the master:
> >
> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >    <lst name="master">
> >      <str name="replicateAfter">startup</str>
> >      <str name="replicateAfter">commit</str>
> >      <str name="replicateAfter">optimize</str>
> >      <str name="numberToKeep">1</str>
> >      <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
> >      <str name="commitReserveDuration">00:00:30</str>
> >    </lst>
> > </requestHandler>
>
> Could it be https://issues.apache.org/jira/browse/SOLR-3033 ?
>
>
>


Re: index size with replication

2012-03-14 Thread Shawn Heisey

On 3/14/2012 2:54 PM, Mike Austin wrote:

The odd thing is that if I optimize the index it doubles in size.. If I
then, add one more document to the index it goes back down to half size?

Is there a way to force this without needing to wait until another document
is added? Or do you have more information on what you think is going on?
I'm using a trunk version of solr4 from 9/12/2011 with a master with two
slaves setup.  Everything besides this is working great!


The not-very-helpful-but-true answer: Don't run on Windows.  I checked 
your prior messages to the list to verify that this is your 
environment.  If you can control index updates so they don't happen at 
the same time as your optimizes, you can also get around this problem by 
doing the optimize twice.  You would have to be absolutely sure that no 
changes are made to the index between the two optimizes, so the second 
one basically doesn't do anything except take care of the deletes.


Nuts and bolts of why this happens: Solr keeps the old files open so the 
existing reader can continue to serve queries.  That reader will not be 
closed until the last query completes, which may not happen until well 
after the time the new reader is completely online and ready.  I assume 
that the delete attempt occurs as soon as the new index segments are 
completely online, before the old reader begins to close.  I've not read 
the source code to find out.


On Linux and other UNIX-like environments, you can delete files while 
they are open by a process.  They continue to exist as in-memory links 
and take up space until those processes close them, at which point they 
are truly gone.  On Windows, an attempt to delete an open file will 
fail, even if it's open read-only.


There are probably a number of ways that this problem could be solved 
for Windows platforms.  The simplest that I can think of, assuming it's 
even possible, would be to wait until the old reader is closed before 
attempting the segment deletion.  That may not be possible - the 
information may not be available to the portion of code that does the 
deletion.  There are a few things standing in the way of me fixing this 
problem myself: 1) I'm a beginning Java programmer.  2) I'm not familiar 
with the Solr code at all. 3) My interest level is low because I run on 
Linux, not Windows.


Thanks,
Shawn



Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread Shawn Heisey

On 3/14/2012 12:58 PM, KeesSchepers wrote:

1. I wipe the reindex core
2. I run the DIH to the complete dataset (4 million documents) in peices of
20.000 records (to prevent very long mysql locks)
3. After the DIH is finished (2 hours) we have to also have to update the
rebuild core with changes from the last two hours, this is a problem
4. After updating is done and the core is not more then some seconds behind
we want to SWAP the cores.

Everything goes well except for step 3. The rebuild and the core swap is all
okay.

Because the website is undergoing changes every minute we cannot pauze the
delta-import on the live and walk behind for 2 hours. The problem is that I
can't figure out a closing system with not delaying the live core to long
and use the DIH instead of writing a lot of code.


I solve this problem by tracking the current position outside of the 
database and Solr, in my build system.  The primary key on the mysql 
table is an autoincrement BIGINT field - I just keep track of the last 
value that was added.  When a rebuild happens, I continue to track it 
for the live cores, then when it comes time to swap, I restore the 
tracked value to what it was when the rebuild started.  You can put 
arbitrary parameters on your DIH url and have DIH construct your query 
with them.
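
As a hedged sketch of what that looks like (the parameter name and query here are
illustrative, not Shawn's actual setup), the entity query in data-config.xml can
reference a request parameter:

<entity name="item"
        query="SELECT * FROM item WHERE id &gt; ${dataimporter.request.minId}"/>

and the import is then kicked off with that parameter on the URL:

http://localhost:8983/solr/live/dataimport?command=full-import&clean=false&minId=123456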


I used to have a build system written in Perl that did *all* index 
activity with DIH.  Now I have a Java build system that only uses DIH 
for full index rebuilds.  It uses SolrJ for everything else.


Thanks,
Shawn



Responding to Requests with Chunks/Streaming

2012-03-14 Thread Nicholas Ball
Hello all,

I've been working on a plugin with a custom component and a few handlers for a 
research project. Its aim is to do some interesting distributed work; however, 
I seem to have come to a road block when trying to respond to a client's request 
in multiple steps. Not even sure if this is possible with Solr but after no 
luck on the IRC channel, thought I'd ask here.

What I'd like to achieve is to be able to have the requestHandler return 
results to a user as soon as it has data available, then continue processing or 
performing other distributed calls, and then return some more data, all on the 
same single client request.

Now my understanding is that solr does some kind of streaming. Not sure how 
it's technically done over http in Solr so any information would be useful. I 
believe something like this would work well but again not sure:

http://en.m.wikipedia.org/wiki/Chunked_transfer_encoding

I also came across this issue/feature request in JIRA but not completely sure 
what the conclusion was or how someone might do/use this. Is it even relevant 
to what I'm looking for?

https://issues.apache.org/jira/browse/SOLR-578

Thank you very much for any help and time you can spare!

Nicholas (incunix)




Re: How to avoid the unexpected character error?

2012-03-14 Thread Li Li
No, it has nothing to do with schema.xml.
post.jar just posts a file; it doesn't parse it.
Solr will use an XML parser to parse the file. If you don't escape special
characters, it's not a valid XML file and Solr will throw exceptions.

On Thu, Mar 15, 2012 at 12:33 AM, neosky  wrote:

> Thanks!
> Does the schema.xml support this parameter? I am using the example post.jar
> to index my file.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3825959.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


create on multicore

2012-03-14 Thread Warren H. Prince
Every night I dump my mySql db and load it into a development db.  I have also 
configured solr as multicore with production and development as the cores.  In 
order to keep my index on development current, I figured I could do a create to 
a new core, transition, every night, and then swap development with transition.

I tried to create transition today, but it overwrote production.  IOW, it seems 
to be equivalent to a swap.  So, my create command must be wrong.  Can someone 
help me out?  My command was:

http://myServer:8983/solr/admin/cores?action=CREATE&name=transition&instanceDir=production&dataDir=transition/data&config=solrconfig.xml&schema=schema.xml

I'm assuming name provides the name of the new core and instanceDir tells it 
which core to copy ?

Thanks in advance

Re: index size with replication

2012-03-14 Thread Mike Austin
Shawn,

Thanks for the detailed answer! I will play around with this information in
hand.  Maybe a second optimize or just a dummy commit after the optimize
will help get me past this.  Both not the best options, but maybe it's a do
it because it's running on windows work-around. If it is indeed a file
locking issue, I think I can probably work around this since my indexing is
scheduled at certain times and not "live" so I could try the optimize again
soon after or do a single commit that seems to fix the issue also.  Or just
not optimize..

Thanks,
Mike

On Wed, Mar 14, 2012 at 6:34 PM, Shawn Heisey  wrote:

> On 3/14/2012 2:54 PM, Mike Austin wrote:
>
>> The odd thing is that if I optimize the index it doubles in size.. If I
>> then, add one more document to the index it goes back down to half size?
>>
>> Is there a way to force this without needing to wait until another
>> document
>> is added? Or do you have more information on what you think is going on?
>> I'm using a trunk version of solr4 from 9/12/2011 with a master with two
>> slaves setup.  Everything besides this is working great!
>>
>
> The not-very-helpful-but-true answer: Don't run on Windows.  I checked
> your prior messages to the list to verify that this is your environment.
>  If you can control index updates so they don't happen at the same time as
> your optimizes, you can also get around this problem by doing the optimize
> twice.  You would have to be absolutely sure that no changes are made to
> the index between the two optimizes, so the second one basically doesn't do
> anything except take care of the deletes.
>
> Nuts and bolts of why this happens: Solr keeps the old files open so the
> existing reader can continue to serve queries.  That reader will not be
> closed until the last query completes, which may not happen until well
> after the time the new reader is completely online and ready.  I assume
> that the delete attempt occurs as soon as the new index segments are
> completely online, before the old reader begins to close.  I've not read
> the source code to find out.
>
> On Linux and other UNIX-like environments, you can delete files while they
> are open by a process.  They continue to exist as in-memory links and take
> up space until those processes close them, at which point they are truly
> gone.  On Windows, an attempt to delete an open file will fail, even if
> it's open read-only.
>
> There are probably a number of ways that this problem could be solved for
> Windows platforms.  The simplest that I can think of, assuming it's even
> possible, would be to wait until the old reader is closed before attempting
> the segment deletion.  That may not be possible - the information may not
> be available to the portion of code that does the deletion.  There are a
> few things standing in the way of me fixing this problem myself: 1) I'm a
> beginning Java programmer.  2) I'm not familiar with the Solr code at all.
> 3) My interest level is low because I run on Linux, not Windows.
>
> Thanks,
> Shawn
>
>


Re: Sort by bayesian function for 5 star rating

2012-03-14 Thread Mike Austin
Why don't you just use that formula and calculate the weighted rating for
each movie and index that value? sort=wrating desc

Maybe I didn't understand your question.

mike

On Mon, Mar 12, 2012 at 1:38 PM, Zac Smith  wrote:

> Does anyone have an example formula that can be used to sort by a 5 star
> rating in SOLR?
> I am looking at an example on IMDB's top 250 movie list:
>
> The formula for calculating the Top Rated 250 Titles gives a true Bayesian
> estimate:
>  weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
> where:
>    R = average for the movie (mean) = (Rating)
>    v = number of votes for the movie = (votes)
>    m = minimum votes required to be listed in the Top 250 (currently 3000)
>    C = the mean vote across the whole report (currently 6.9)
>
>
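
Putting Mike's suggestion together with the quoted formula, a minimal index-time
sketch (class, field and variable names are illustrative):

public class WeightedRating {

    // Bayesian estimate from the formula above:
    // WR = (v / (v + m)) * R + (m / (v + m)) * C
    static double weightedRating(double r, long v, long m, double c) {
        return (v / (double) (v + m)) * r + (m / (double) (v + m)) * c;
    }

    public static void main(String[] args) {
        // Illustrative numbers: a movie rated 8.2 by 12000 voters, m = 3000, C = 6.9.
        double wrating = weightedRating(8.2, 12000, 3000, 6.9);
        System.out.println(wrating); // ~7.94 -- index this as "wrating" and sort=wrating desc
    }
}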


Re: Solr out of memory exception

2012-03-14 Thread Li Li
How much memory is allocated to the JVM?

On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar  wrote:

> Solr is giving an out of memory exception. Full indexing completed fine.
> Later, while searching, maybe when it tries to load the results in memory, it
> starts giving this exception. However, with the same memory allocated to
> Tomcat and exactly the same Solr replica on another server it is working
> perfectly fine. I am working with 64-bit software, including Java & Tomcat
> on Windows.
> Any help would be appreciated.
>
> Here are the logs:
>
> The server encountered an internal error (Severe errors in solr
> configuration. Check your log files for more detailed information on what
> may be wrong. If you want solr to continue after configuration errors,
> change: <abortOnConfigurationError>false</abortOnConfigurationError> in
> null -
> java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at
> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
> org.apache.solr.core.SolrCore.(SolrCore.java:579) at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at
> org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at
> org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at
> org.apache.catalina.core.StandardService.start(StandardService.java:525) at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at
> org.apache.catalina.startup.Catalina.start(Catalina.java:595) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
> java.lang.reflect.Method.invoke(Unknown Source) at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
> org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
> java.lang.OutOfMemoryError: Java heap space at
> org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180)
> at org.apache.lucene.index.TermInfosReader.(TermInfosReader.java:91)
> at
> org.apache.lucene.index.SegmentReader$CoreReaders.(SegmentReader.java:122)
> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) at
> org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at
> org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:104) at
> org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:27)
> at
> org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) at
> org.apache.solr.core.SolrCore.(SolrCore.java:579) at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
> at
> org.apache.catalina.core.StandardContext.fil

RE: Solr out of memory exception

2012-03-14 Thread Husain, Yavar
Thanks for helping me out.

I have allocated Xms=2.0GB, Xmx=2.0GB.

However, I see Tomcat is still using much less memory, not 2.0GB.

Total memory on my Windows machine = 4GB.

With a smaller index size it works perfectly fine. I was thinking of 
increasing the system RAM & the Tomcat heap space, but then how come on a 
different server with exactly the same system, Solr configuration & memory it is 
working fine?


-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: Thursday, March 15, 2012 11:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr out of memory exception

how many memory are allocated to JVM?

On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar  wrote:

> Solr is giving out of memory exception. Full Indexing was completed fine.
> Later while searching maybe when it tries to load the results in memory it
> starts giving this exception. Though with the same memory allocated to
> Tomcat and exactly same solr replica on another server it is working
> perfectly fine. I am working on 64 bit software's including Java & Tomcat
> on Windows.
> Any help would be appreciated.
>
> Here are the logs:
>
> The server encountered an internal error (Severe errors in solr
> configuration. Check your log files for more detailed information on what
> may be wrong. If you want solr to continue after configuration errors,
> change: false in
> null -
> java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at
> org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
> org.apache.solr.core.SolrCore.(SolrCore.java:579) at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at
> org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at
> org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at
> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at
> org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at
> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at
> org.apache.catalina.core.StandardService.start(StandardService.java:525) at
> org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at
> org.apache.catalina.startup.Catalina.start(Catalina.java:595) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
> java.lang.reflect.Method.invoke(Unknown Source) at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
> org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
> java.lang.OutOfMemoryError: Java heap space at
> org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180)
> at org.apache.lucene.index.TermInfosReader.(TermInfosReader.java:91)
> at
> org.apache.lucene.index.SegmentReader$CoreReaders.(SegmentReader.java:122)
> at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) at
> org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at
> org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:104) at
> org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:27)
> at
> org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
> at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
> at org.apache.solr.core

Re: Solr out of memory exception

2012-03-14 Thread Li Li
It seems you are using a 64-bit JVM (a 32-bit JVM can only allocate about 1.5GB).
You should enable pointer compression with -XX:+UseCompressedOops.
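
For example (heap values illustrative), the Tomcat JVM options would then look
something like:

-server -Xms2048m -Xmx2048m -XX:+UseCompressedOops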

On Thu, Mar 15, 2012 at 1:58 PM, Husain, Yavar  wrote:

> Thanks for helping me out.
>
> I have allocated Xms-2.0GB Xmx-2.0GB
>
> However i see Tomcat is still using pretty less memory and not 2.0G
>
> Total Memory on my Windows Machine = 4GB.
>
> With smaller index size it is working perfectly fine. I was thinking of
> increasing the system RAM & tomcat heap space allocated but then how come
> on a different server with exactly same system and solr configuration &
> memory it is working fine?
>
>
> -Original Message-
> From: Li Li [mailto:fancye...@gmail.com]
> Sent: Thursday, March 15, 2012 11:11 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr out of memory exception
>
> how many memory are allocated to JVM?
>
> On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar 
> wrote:
>
> > Solr is giving out of memory exception. Full Indexing was completed fine.
> > Later while searching maybe when it tries to load the results in memory
> it
> > starts giving this exception. Though with the same memory allocated to
> > Tomcat and exactly same solr replica on another server it is working
> > perfectly fine. I am working on 64 bit software's including Java & Tomcat
> > on Windows.
> > Any help would be appreciated.
> >
> > Here are the logs:
> >
> > The server encountered an internal error (Severe errors in solr
> > configuration. Check your log files for more detailed information on what
> > may be wrong. If you want solr to continue after configuration errors,
> > change: false in
> > null -
> > java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> at
> > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
> > org.apache.solr.core.SolrCore.(SolrCore.java:579) at
> >
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> > at
> >
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> > at
> >
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> > at
> >
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
> > at
> >
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> > at
> > org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> > at
> >
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> > at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> > at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
> at
> > org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at
> > org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at
> > org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at
> > org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
> >
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> > at
> >
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> > at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
> at
> > org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
> > org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at
> > org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at
> > org.apache.catalina.core.StandardService.start(StandardService.java:525)
> at
> > org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at
> > org.apache.catalina.startup.Catalina.start(Catalina.java:595) at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
> > java.lang.reflect.Method.invoke(Unknown Source) at
> > org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
> > org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
> > java.lang.OutOfMemoryError: Java heap space at
> >
> org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180)
> > at
> org.apache.lucene.index.TermInfosReader.(TermInfosReader.java:91)
> > at
> >
> org.apache.lucene.index.SegmentReader$CoreReaders.(SegmentReader.java:122)
> > at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) at
> > org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at
> > org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:104)
> at
> >
> org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:27)
> > at
> > org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
> > at
> >
> org.a