Re: SolrJ dependencies
Done, see: https://issues.apache.org/jira/browse/SOLR-3541

On 12-6-2012 18:39, Sami Siren wrote:
> On Tue, Jun 12, 2012 at 4:22 PM, Thijs wrote:
>> Hi
>>
>> I just checked out and built solr & lucene from branches/lucene_4x. I wanted
>> to upgrade my custom client to this new version (using solrj). So I copied
>> lucene/solr/dist/apache-solr-solrj-4.0-SNAPSHOT.jar &
>> lucene/solr/dist/apache-solr-core-4.0-SNAPSHOT.jar to my project, and I
>> updated the other libs from the libs in /solr/dist/solrj-lib.
>>
>> However, when I wanted to run my client I got exceptions indicating that I
>> was missing the HTTPClient jars (httpclient, httpcore, httpmime). Shouldn't
>> those go into lucene/solr/dist/solrj-lib as well?
>
> Yes they should.
>
>> Do I need to create a ticket for this?
>
> Please do so.
>
> --
>  Sami Siren
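A minimal SolrJ smoke test makes the missing jars obvious - a sketch assuming the 4.x SolrJ API and a local server (HttpSolrServer is backed by Apache HttpComponents, which is why httpclient, httpcore, and httpmime must be on the classpath at runtime):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SolrJSmokeTest {
        public static void main(String[] args) throws Exception {
            // fails with NoClassDefFoundError for HttpClient classes if the
            // httpclient/httpcore/httpmime jars are missing
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }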
Re: Solr PHP highload search
How much memory are you giving the JVM? Have you put a performance monitor on the running process to see what resources have been exhausted (i.e. are you I/O bound? CPU bound?)

Best
Erick

On Tue, Jun 12, 2012 at 3:40 AM, Alexandr Bocharov wrote:
> Hi, all.
>
> I need advice on configuring Solr search for use in high-load production.
>
> I've written a user search engine (a PHP class) that uses over 70 parameters
> for searching users.
> The user database is over 30 million records.
> Total index size is 6.4G when I use 1 node and 3.2G when I use 2 nodes.
> The previous search engine could handle 700,000 queries per day for searching
> users - that is ~8 queries/sec (4 MySQL servers with manual sharding via
> Gearman).
>
> An example query:
>
> [responseHeader] => SolrObject Object
>     (
>         [status] => 0
>         [QTime] => 517
>         [params] => SolrObject Object
>             (
>                 [bq] => Array
>                     (
>                         [0] => bool_field1:1^30
>                         [1] => str_field1:str_value1^15
>                         [2] => tint_field1:tint_field1^5
>                         [3] => bool_field2:1^6
>                         [4] => date_field1:[NOW-14DAYS TO NOW]^20
>                         [5] => date_field2:[NOW-14DAYS TO NOW]^5
>                     )
>
>                 [indent] => on
>                 [start] => 0
>                 [q.alt] => *:*
>                 [wt] => xml
>                 [fq] => Array
>                     (
>                         [0] => tint_field2:[tint_value2 TO tint_value22]
>                         [1] => str_field1:str_value1
>                         [2] => str_field2:str_value2
>                         [3] => tint_field3:(tint_value3 OR tint_value32 OR tint_value33 OR tint_value34 OR tint_value5)
>                         [4] => tint_field4:tint_value4
>                         [5] => -bool_field1:[* TO *]
>                     )
>
>                 [version] => 2.2
>                 [defType] => dismax
>                 [rows] => 10
>             )
>     )
>
> I tested my PHP search API and found that concurrent random queries - for
> example, 10 queries at one time - increase QTime from an average of 500 ms
> to 3000 ms on 2 nodes.
>
> 1. How can I tweak my queries, parameters, or Solr's config to decrease
> QTime?
> 2. What if I put my index data in an emulated RAM directory - can that
> greatly increase performance?
> 3. Sorting by boost queries has a great influence on QTime; how can I
> optimize boost queries?
> 4. If I split my 2 nodes on 2 machines into 6 nodes on 2 machines (3 nodes
> per machine), will it increase performance?
> 5. What is a "multi-core query", how can I configure it, and will it
> increase performance?
>
> Thank you!
Re: Solr PHP highload search
Thank you for the help :)

I'm giving the JVM 2048M for each node.
CPU load jumps between 70% and 90%.
Memory usage increases to the max during testing (probably the cache is filling).
I/O I didn't monitor.

I'd still like to see answers to my other questions.

2012/6/13 Erick Erickson
> How much memory are you giving the JVM? Have you put a performance
> monitor on the running process to see what resources have been
> exhausted (i.e. are you I/O bound? CPU bound?)
>
> Best
> Erick
>
> [...]
Re: Exception when optimizing index
On Thu, Jun 7, 2012 at 5:50 AM, Rok Rejc wrote:
> - java.runtime.name: OpenJDK Runtime Environment
> - java.runtime.version: 1.6.0_22-b22
...
>
> As far as I see from the JIRA issue, I have the patch attached (as mentioned,
> I have a trunk version from May 12). Any ideas?

it's not guaranteed that the patch will work around all hotspot bugs related to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921

Since you can reproduce, is it possible for you to re-test the scenario with a newer JVM (e.g. 1.7.0_04), just to rule that out?

--
lucidimagination.com
Re: Solr PHP highload search
Consider just looking at it with jconsole (it ships with your Java release) to get a sense of the memory usage/collection. How much physical memory do you have overall? Because this is not what I'd expect. Your CPU load is actually reasonably high, so it doesn't look like you're swapping.

By and large, trying to use RAMDirectories isn't a good solution; between the OS and Solr, the necessary parts of your index are read into memory and used from there.

Best
Erick

On Wed, Jun 13, 2012 at 7:13 AM, Alexandr Bocharov wrote:
> Thank you for the help :)
>
> I'm giving the JVM 2048M for each node.
> CPU load jumps between 70% and 90%.
> Memory usage increases to the max during testing (probably the cache is filling).
> I/O I didn't monitor.
>
> I'd still like to see answers to my other questions.
>
> [...]
Re: Sharding in SolrCloud
Mark Miller schrieb am 12.06.2012 19:19:01:
> On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:
> > Hello,
> >
> > we tested SolrCloud in a setup with one collection, two shards and one
> > replica per shard, and it works quite fine with some example data.
> > Now we plan to set up our own collection and determine how many shards
> > we should divide it into.
> > We can estimate quite exactly the size of the collection, but we don't
> > know what the best approach for sharding is,
> > even if we know the size and the amount of queries and updates.
> > Is there any documentation or a kind of design guideline for sharding a
> > collection in SolrCloud?
> >
> > Thanks & regards,
> > Norman Lenzner
>
> It's hard to tell - I think you want to start with an idea of how
> many docs you can fit on a single node. This can vary wildly
> depending on many factors. Generally you have to do some testing
> with your particular config and data. You can search the mailing
> lists and perhaps dig up a little info, but there is really no
> replacement for running some tests with real data.
>
> Then you have to plan in your growth rate - resharding is naturally
> a relatively expensive operation. Once you have an idea of how many
> docs per machine you think seems comfortable, figure out how many
> machines you need given your estimated doc growth rate and perhaps
> some padding. You might not get it right, but if you expect the
> possibility of a lot of growth, erring on the more-shards side is
> obviously better.
>
> - Mark Miller
> lucidimagination.com

Hello and thanks for your reply,

We will run some tests to determine the size of our collection, but I think there
won't be the need for a second shard at all. The problem is not the size or the
growth of the docs, but there will be a quite high update frequency. So, if we
have many bulk updates, is it reasonable to distribute the update load over
multiple shards?

Thanks & regards,
Norman Lenzner
Re: Different sort for each facet
Hmm, it seems that if I leave off the initial "facet.sort=index" then it will sort each by index by default, and I can use the "f.people.facet.sort=count" as expected. I thought I tried that yesterday, but I suppose it slipped my mind in my sleep-deprived state. Thanks Jack! -- Chris On Tue, Jun 12, 2012 at 10:58 PM, Jack Krupansky wrote: > f.people.facet.sort=count should work. > > Make sure you don't have a conflicting setting for that same field and > attribute. > > Does the "people" facet sort by count correctly with f.sort=index? > > What are the attributes and field type for the "people" field? > > -- Jack Krupansky > > -Original Message- From: Christopher Gross > Sent: Tuesday, June 12, 2012 11:05 AM > To: solr-user > Subject: Different sort for each facet > > > In Solr 3.4, is there a way I can sort two facets differently in the same > query? > > If I have: > > http://mysolrsrvr/solr/select?q=*:*&facet=true&facet.field=people&facet.field=category > > is there a way that I can sort people by the count and category by the > name all in one query? Or do I need to do that in separate queries? > I tried using "f.people.facet.sort=count" while also having > "facet.sort=index" but both came back in alphabetical order. > > Doing more queries is OK, I'm just trying to avoid having to do too many. > > -- Chris
LockObtainFailedException after trying to create cores on second SolrCloud instance
Hi,

I am struggling with creating multiple collections on a 4-instance SolrCloud setup:

I have 4 virtual OpenVZ instances, where I have installed SolrCloud on each, and on one a standalone Zookeeper is also running.

Loading the Solr configuration into ZK works fine.

Then I start up the 4 instances and everything is also running smoothly.

After that I am adding one core with the name e.g. '123'.

This core is correctly visible on the instance I have used for creating it.

It maps like:

'123' > shard1 -> virtual-instance-1

After that I am creating a core with the same name '123' on the second instance, and it creates it, but an exception is thrown after a while and the cluster state of the newly created core goes to 'recovering':

"123":{"shard1":{
    "virtual-instance-1:8983_solr_123":{
      "shard":"shard1",
      "roles":null,
      "leader":"true",
      "state":"active",
      "core":"123",
      "collection":"123",
      "node_name":"virtual-instance-1:8983_solr",
      "base_url":"http://virtual-instance-1:8983/solr"},
    "virtual-instance-2:8983_solr_123":{
      "shard":"shard1",
      "roles":null,
      "state":"recovering",
      "core":"123",
      "collection":"123",
      "node_name":"virtual-instance-2:8983_solr",
      "base_url":"http://virtual-instance-2:8983/solr"}}},

The exception thrown is on the first virtual instance:

Jun 13, 2012 2:18:40 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/myuser/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:607)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:58)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:112)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:52)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:364)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketC
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
BTW: I am running the Solr instances using -Xms512M -Xmx1024M,

so not so little memory.

Daniel

On Wed, Jun 13, 2012 at 4:28 PM, Daniel Brügge <daniel.brue...@googlemail.com> wrote:
> Hi,
>
> I am struggling with creating multiple collections on a 4-instance
> SolrCloud setup:
>
> I have 4 virtual OpenVZ instances, where I have installed SolrCloud on
> each, and on one a standalone Zookeeper is also running.
>
> [...]
Re: Different sort for each facet
I'm glad that you have something working, but you shouldn't have to remove that facet.sort=index. I tried the following and it works with the Solr 3.6 example after I indexed with exampledocs/books.json: http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=name&facet.field=genre_s&facet.sort=index&f.name.facet.sort=count I see the name field sorted by count and the genre_s field sorted by lexical order (note: "IT" comes before "fantasy" because upper case comes before lower case - it would be nice to have a case-neutral sort.) Could you try it, just to see if maybe we are not communicating about what exactly is not working for you? What release of Solr are you using? I am not aware of any fixes/changes that would make this behave differently as of 3.6. BTW, the default sort is "index" IFF facet.limit <= 0. The default for facet.limit is 100, so sort should default to "count". I presume you have facet.limit set to -1 or 0. You might also check to see what facet parameters might be set in your request handler as opposed to on the actual query request. -- Jack Krupansky -Original Message- From: Christopher Gross Sent: Wednesday, June 13, 2012 9:19 AM To: solr-user@lucene.apache.org Subject: Re: Different sort for each facet Hmm, it seems that if I leave off the initial "facet.sort=index" then it will sort each by index by default, and I can use the "f.people.facet.sort=count" as expected. I thought I tried that yesterday, but I suppose it slipped my mind in my sleep-deprived state. Thanks Jack! -- Chris On Tue, Jun 12, 2012 at 10:58 PM, Jack Krupansky wrote: f.people.facet.sort=count should work. Make sure you don't have a conflicting setting for that same field and attribute. Does the "people" facet sort by count correctly with f.sort=index? What are the attributes and field type for the "people" field? -- Jack Krupansky -Original Message- From: Christopher Gross Sent: Tuesday, June 12, 2012 11:05 AM To: solr-user Subject: Different sort for each facet In Solr 3.4, is there a way I can sort two facets differently in the same query? If I have: http://mysolrsrvr/solr/select?q=*:*&facet=true&facet.field=people&facet.field=category is there a way that I can sort people by the count and category by the name all in one query? Or do I need to do that in separate queries? I tried using "f.people.facet.sort=count" while also having "facet.sort=index" but both came back in alphabetical order. Doing more queries is OK, I'm just trying to avoid having to do too many. -- Chris
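The same per-field override in SolrJ, for completeness (a sketch against the 3.x SolrJ API; host and field names follow the thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetSortExample {
        public static void main(String[] args) throws Exception {
            SolrQuery query = new SolrQuery("*:*");
            query.setFacet(true);
            query.addFacetField("people", "category"); // facet on both fields
            query.set("facet.sort", "index");          // global default: lexical order
            query.set("f.people.facet.sort", "count"); // override for "people" only
            QueryResponse rsp =
                new CommonsHttpSolrServer("http://mysolrsrvr/solr").query(query);
            System.out.println(rsp.getFacetFields());
        }
    }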
Re: [DIH] Multiple repeat XPath stmts
TNX. A lifesaver...
Re: Getting maximum / minimum field value - slow query
What is more, I tried to get the maximum value using a stats query. This time the response time was about 30 seconds and the server ate 1.5 GB of memory when calculating the response. But there were no statistics in the response:

  status = 0, QTime = 27578
  params: q = *.*, stats = true, stats.field = Id
  numFound = 0

What's wrong here?
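For comparison, the usual shape of a stats request over this field (host assumed; note that the match-all query is *:* with a colon - the q echoed above is *.*, which is not the match-all query and may explain both the zero hits and the empty stats section):

    http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=Id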
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock

Where are the other data dirs located? Are you sharing one drive or something? It looks like something already has a writer lock - are you sure another Solr instance is not running somehow?

On Wed, Jun 13, 2012 at 11:11 AM, Daniel Brügge <daniel.brue...@googlemail.com> wrote:
> BTW: I am running the Solr instances using -Xms512M -Xmx1024M,
>
> so not so little memory.
>
> Daniel
>
> [...]
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
What command are you using to create the cores? I had this sort of problem, and it was because I'd accidentally created two cores with the same instanceDir within the same Solr process. Make sure you don't have that kind of collision. The easiest way is to specify an explicit instanceDir and dataDir.

Best,
Casey Callendrello

On 6/13/12 7:28 AM, Daniel Brügge wrote:
> Hi,
>
> I am struggling with creating multiple collections on a 4-instance
> SolrCloud setup:
>
> [...]
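A core-creation call with explicit per-core directories, along the lines Casey suggests, might look like this (a sketch - the host, paths, and core name are illustrative):

    http://virtual-instance-2:8983/solr/admin/cores?action=CREATE&name=123&collection=123&instanceDir=/var/solr/cores/123&dataDir=/var/solr/cores/123/data

With a distinct dataDir per core, two cores can never contend for the same write.lock.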
Re: Getting maximum / minimum field value - slow query
Try the query without the sort to get the number of rows, then do a second query using a "start" equal to the number of rows. That should get you the last row/document.

-- Jack Krupansky

-Original Message- From: rafal.gwizd...@gmail.com
Sent: Wednesday, June 13, 2012 3:07 PM
To: solr-user@lucene.apache.org
Subject: Getting maximum / minimum field value - slow query

Hi, I have an index with about 9 million documents. Every document has an
integer 'Id' field (it's not the SOLR document identifier), and I want to get
the maximum value of that field.
Therefore I'm doing a search with the following parameters:
query=*.*, sort=Id desc, rows=1

  status = 0, QTime = 2672
  params: q = *:*, rows = 1, sort = Id desc
  returned doc: CRQIncident#45165891

The problem is that it takes quite a long time to get the response (2-10
seconds). Why is it so slow - isn't it a simple index lookup?

Best regards
RG
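Spelled out, the two-step approach is (a sketch; host and counts are illustrative, and since start is zero-based the last document sits at numFound - 1; note that without a sort, "last" means last in index order, which only equals the highest Id if documents were indexed in Id order):

    1) http://localhost:8983/solr/select?q=*:*&rows=0
       -> read numFound from the response, e.g. 9000000
    2) http://localhost:8983/solr/select?q=*:*&start=8999999&rows=1&fl=Id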
Solr1.4 and threads ....
We've got a tokenizer which is quite explicitly coded on the assumption that it will only be called from one thread at a time. After all, what would it mean for two threads to make interleaved calls to the hasNext() function?

Yet a customer of ours with a gigantic instance of Solr 1.4 reports incidents in which we throw an exception that indicates (we think) that two different threads made interleaved calls.

Does this suggest anything to anyone? Other than that we've misanalyzed the logic in the tokenizer and there's a way to make it burp on one thread?
Re: Sharding in SolrCloud
Hmmm, are you sure SolrCloud fits your needs? You say that you think everything will fit on one shard and are worried about bulk updates. In that case I should think regular Solr master/slave (rather than cloud) might be a better fit. Using Cloud and all that goes with it for a single shard is certainly possible, but I question whether it's your best option here.

Of course, if NRT is a requirement, then SolrCloud is a much better option.

With typical master/slave setups, since your bulk updates are happening on a separate machine, having multiple slaves that poll at a given interval seems like it would work, but you'd have to be able to stand, say, 5-10 minute latency...

Best
Erick

On Wed, Jun 13, 2012 at 7:47 AM, wrote:
> [...]
>
> Hello and thanks for your reply,
>
> We will run some tests to determine the size of our collection, but I
> think there won't be the need for a second shard at all. The problem is not
> the size or the growth of the docs, but there will be a quite high update
> frequency. So, if we have many bulk updates, is it reasonable to distribute
> the update load over multiple shards?
>
> Thanks & regards,
> Norman Lenzner
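For reference, the polling master/slave setup Erick describes is configured through the ReplicationHandler in solrconfig.xml; a minimal sketch (the hostname and the 5-minute pollInterval are illustrative, not taken from the thread):

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
    </requestHandler>

    <!-- on each slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>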
Re: FilterCache - maximum size of document set
Hmmm, I think you may be looking at the wrong thing here. Generally, a filterCache entry will be maxDocs/8 (plus some overhead), so in your case they really shouldn't be all that large, on the order of 3M/filter. That shouldn't vary based on the number of docs that match the fq; it's just a bitset. To see if that makes any sense, take a look at the admin page and the number of evictions in your filterCache. If that is > 0, you're probably using all the memory you're going to in the filterCache during the day.

But you haven't indicated what version of Solr you're using; I'm going from a relatively recent 3.x knowledge base.

Have you put a memory analyzer against your Solr instance to see where the memory is being used?

Best
Erick

On Wed, Jun 13, 2012 at 1:05 PM, Pawel wrote:
> Hi,
> I have a Solr index with about 25M documents. I optimized the FilterCache size
> to reach the best performance (considering the traffic characteristics that my
> Solr handles). I see that the only way to limit the size of a FilterCache is to
> set the number of document sets that Solr can cache. There is no way to set a
> memory limit (e.g. 2GB, 4GB or something like that). When I process standard
> traffic (during the day) everything is fine. But when Solr handles night traffic
> (and the characteristics of requests change) some problems appear. There is a
> JVM out-of-memory error. I know what the reason is. Some filters on some
> fields are quite poor filters. They return 15M documents or even more.
> You could say 'Just put that into q'. I tried to put those filters into the
> "Query" part, but then the statistics of request processing time (during the
> day) became much worse. Reducing the FilterCache maxSize is also not a good
> solution, because during the day cached filters are very, very helpful.
> You may be interested in the type of filters that I use. These are range
> filters (I tried standard range filters and frange) - e.g. price:[* TO
> 1]. Some fq with price can return a few thousand results (e.g.
> price:[40 TO 50]), but some (e.g. price:[* TO 1]) can return millions of
> documents. I'd also like to avoid a solution which introduces strict
> ranges that the user can choose from.
> Have you any suggestions what I can do? Is there any way to limit, for
> example, the maximum size of a docSet which is cached in the FilterCache?
>
> --
> Pawel
Re: Solr1.4 and threads ....
On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies wrote:
> Does this suggest anything to anyone? Other than that we've
> misanalyzed the logic in the tokenizer and there's a way to make it
> burp on one thread?

it might suggest the different tokenstream instances refer to some shared object that is not thread safe: we had bugs like this before (e.g. sharing a JDK collator is ok, but ICU ones are not thread-safe, so you must clone them).

Because of this we beefed up our base analysis class (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java) to find thread safety bugs like this.

I recommend just grabbing the test-framework.jar (we release it as an artifact), extending that class, and writing a test like:

  public void testRandomStrings() throws Exception {
    checkRandomData(random, analyzer, 10);
  }

(or use the one in the branch, it's even been improved since 3.6)

--
lucidimagination.com
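Made self-contained, that test could look like this (a sketch; CustomAnalyzer is a hypothetical stand-in for whatever analyzer wraps the suspect tokenizer):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.BaseTokenStreamTestCase;

    public class CustomTokenizerThreadSafetyTest extends BaseTokenStreamTestCase {
        public void testRandomStrings() throws Exception {
            Analyzer analyzer = new CustomAnalyzer(); // hypothetical: wraps the tokenizer under test
            // feeds random text through the analyzer and fails on
            // inconsistent or thread-unsafe behavior
            checkRandomData(random, analyzer, 10000);
        }
    }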
Re: Getting maximum / minimum field value - slow query
A large start value is probably worse performing than the sort (see SOLR-1726). Once the sort field is cached, it'll be quick from then on. Put a warming query in solrconfig for newSearcher and/or firstSearcher that does this sort, and the cache will be built in advance of queries, at least.

	Erik

On Jun 13, 2012, at 16:09, Jack Krupansky wrote:
> Try the query without the sort to get the number of rows, then do a second
> query using a "start" equal to the number of rows. That should get you the
> last row/document.
>
> -- Jack Krupansky
>
> [...]
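A sketch of such a warming entry in solrconfig.xml (the field name Id follows the thread; an identical listener can be registered for the firstSearcher event):

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="sort">Id desc</str>
          <str name="rows">1</str>
        </lst>
      </arr>
    </listener>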
Re: FilterCache - maximum size of document set
Thanks for your response. Yes, maybe you are right. I thought that filters could be larger than 3M. Do all kinds of filters use a BitSet?
Moreover, maxSize of the filterCache is set to 16000 in my case. There are evictions during day traffic, but not during night traffic.
The version of Solr which I use is 3.5.
I haven't used a memory analyzer yet. Could you write more details about it?

--
Regards,
Pawel

On Wed, Jun 13, 2012 at 10:55 PM, Erick Erickson wrote:
> Hmmm, I think you may be looking at the wrong thing here. Generally, a
> filterCache entry will be maxDocs/8 (plus some overhead), so in your case
> they really shouldn't be all that large, on the order of 3M/filter. That
> shouldn't vary based on the number of docs that match the fq; it's just a
> bitset.
>
> [...]
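For a rough sense of scale, a back-of-the-envelope worst case from the numbers in this thread (assuming every cached entry is a full bitset; Solr stores small result sets more compactly, so real usage sits below this ceiling):

    25,000,000 docs / 8 bits per byte ≈ 3.1 MB per bitset entry
    16,000 entries × 3.1 MB ≈ 50 GB worst case

So a cache that fills with large-result filters - like the night-time price:[* TO ...] ranges - can exhaust the heap even though each individual entry is only a few megabytes.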
Regarding number of documents
Hi,

I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assume that if I run a full-import, I should get the same number of documents added to the index, but I see that it's not the case, and the number of documents added to the index is less than what I see from the MySQL query result. Can anyone tell me if my assumption is correct and why the number of documents would be off?

Thanks,
Swetha
Re: Regarding number of documents
Note: I don't see any errors in the logs when I run the import.

On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy wrote:
> Hi,
>
> I have a data config file that contains the data import query. If I just
> run the import query against MySQL, I get a certain number of results. I
> assume that if I run a full-import, I should get the same number of
> documents added to the index, but I see that it's not the case, and the
> number of documents added to the index is less than what I see from the
> MySQL query result. Can anyone tell me if my assumption is correct and why
> the number of documents would be off?
>
> Thanks,
> Swetha
Re: Regarding number of documents
Could it be that you are getting records that are not unique? If so, then Solr would just overwrite the non-unique documents.

Thanks
Afroz

On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy wrote:
> Note: I don't see any errors in the logs when I run the import.
>
> [...]
Re: Regarding number of documents
That makes sense. But I added a new entry that showed up in the MySQL results and not in the Solr search results. The count of documents also did not increase after the addition. How can a new entry show up in MySQL results and not as a new document?

On Wed, Jun 13, 2012 at 6:26 PM, Afroz Ahmad wrote:
> Could it be that you are getting records that are not unique? If so, then
> Solr would just overwrite the non-unique documents.
>
> Thanks
> Afroz
>
> [...]
Re: Regarding number of documents
Check the ID for that latest record and try to query it in Solr.

One way you can get multiple records in an RDBMS query is via a join. In that case, each of the records could have the same value in the column(s) that you are using for your unique key field in Solr.

-- Jack Krupansky

-Original Message- From: Swetha Shenoy
Sent: Wednesday, June 13, 2012 7:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Regarding number of documents

That makes sense. But I added a new entry that showed up in the MySQL results and not in the Solr search results. The count of documents also did not increase after the addition. How can a new entry show up in MySQL results and not as a new document?

[...]
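A quick check along the lines Jack suggests (a sketch - the host and the uniqueKey field name "id" are assumptions, as is the example value):

    http://localhost:8983/solr/select?q=id:12345

If the document exists but carries data from a different MySQL row, the import is collapsing several rows onto one unique key.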
Re: Regarding number of documents
On 14 June 2012 04:51, Swetha Shenoy wrote:
> That makes sense. But I added a new entry that showed up in the MySQL
> results and not in the Solr search results. The count of documents also
> did not increase after the addition. How can a new entry show up in MySQL
> results and not as a new document?

Sorry, but this is not very clear: Are you running a full-import, or a delta-import, after adding the new entry in MySQL? By any chance, does the new entry have an ID that already exists in the Solr index? What is the number of records that DIH reports after an import is completed?

Regards,
Gora
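The count Gora asks about can be read from DIH's status response (a sketch; the host and the default /dataimport handler path are assumptions):

    http://localhost:8983/solr/dataimport?command=status

Comparing "Total Rows Fetched" with "Total Documents Processed" there shows whether rows came back from MySQL but collapsed onto existing unique keys.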
Re: Unexpected DIH behavior for onError attribute
On 13 June 2012 10:45, Pranav Prakash wrote: > My DIH Config file goes as follows. We have two db hosts, one of which > contains blocks of content and the other contain transcripts of those > content blocks. The makeDynamicTranscript function is used to create row > names like transcript_en, transcript_es and so on, which are dynamic fields > in Solr with appropriate tokenizers. [...] This looks fine. Have you looked in the Solr logs for more information? Is it possible that the error is causing some connection issue? What is the error exactly, and is it happening on the SELECT in the inner entity, or on the outer one? Regards, Gora
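For reference, onError is set per entity in the DIH config; a minimal sketch of the shape described in the thread (the table and column names and the script body are placeholders, not the poster's actual config):

    <dataConfig>
      <script><![CDATA[
        // placeholder: builds dynamic keys such as transcript_en, transcript_es, ...
        function makeDynamicTranscript(row) { return row; }
      ]]></script>
      <document>
        <entity name="block" query="SELECT id, content FROM blocks">
          <entity name="transcript"
                  onError="continue"
                  transformer="script:makeDynamicTranscript"
                  query="SELECT lang, text FROM transcripts WHERE block_id = '${block.id}'"/>
        </entity>
      </document>
    </dataConfig>

With onError="continue" (the other values are "abort", the default, and "skip"), a failing row in the inner entity is logged and the import carries on.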
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
Will check later using different data dirs for the core on each instance. But because each Solr sits in its own OpenVZ instance (virtual server, respectively), they should be totally separated - at least from my understanding of virtualization. Will check and get back here... Thanks.

On Wed, Jun 13, 2012 at 8:10 PM, Mark Miller wrote:
> That's an interesting data dir location: NativeFSLock@/home/myuser/data/index/write.lock
>
> Where are the other data dirs located? Are you sharing one drive or
> something? It looks like something already has a writer lock - are you
> sure another Solr instance is not running somehow?
>
> [...]