Re: PriceJunkie.com using solr!

2007-11-14 Thread Tim
Great job Mike. I'm working on a solr-based engine for our newspaper  
website and I am wondering what your server specs are for this site?


Ram? Processor? Dedicated vs. Shared? Tomcat, Jetty, other?

Any info like that would be useful to help gauge server requirements.


Maybe an idea of how many indexed records too? I realize project  
scopes vary, but at least we can ballpark it.


In my case I'm trying to figure out if I need a dedicated server for this
project.


Tim

Bangordailynews.com New Media

On Nov 14, 2007, at 8:22 PM, "Nick Jenkin" <[EMAIL PROTECTED]> wrote:


Hi
This is faceting, http://wiki.apache.org/solr/SolrFacetingOverview
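For a rough idea of what that looks like in practice (not how PriceJunkie is
actually configured, just an illustrative request against a hypothetical
products index):

   http://localhost:8983/solr/select?q=camera&facet=true&facet.field=brand&facet.query=price:[0+TO+100]&facet.query=price:[100+TO+300]&facet.query=price:[300+TO+*]

facet.field returns a count per brand value, and each facet.query returns a
count for that price range.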
-Nick

On Nov 14, 2007 1:21 AM, William Silva <[EMAIL PROTECTED]> wrote:

Hi Mike,
I'm checking out www.pricejunkie.com and I would like to know how do
you group the products and find the price range. Is it a batch process?
Are you using MoreLikeThis to do it?
Thanks,
William.




aha, it is wonderful.



2007/5/24, Mike Austin <[EMAIL PROTECTED]>:



Just one.



-Original Message-
From: James liu [EMAIL PROTECTED]
Sent: Wednesday, May 16, 2007 10:30 PM
To: solr-user@lucene.apache.org
Subject: Re: PriceJunkie.com using solr!




how many solr instances?




2007/5/17, Yonik Seeley <[EMAIL PROTECTED]>:

Congrats, very nice job!
It's fast too.

-Yonik

On 5/16/07, Mike Austin <[EMAIL PROTECTED]> wrote:

I just wanted to say thanks to everyone for the creation of solr.  I've
been using it for a while now and I have recently brought one of my side
projects online.  I have several other projects that will be using solr for
its search and facets.

Please check out www.pricejunkie.com and let us know what you think.  You
can give feedback and/or sign up on the mailing list for future updates.
The site is very basic right now and many new and useful features plus
merchants and product categories will be coming soon!  I thought it would be
a good idea to at least have a few people use it to get some feedback
early and often.

Some of the nice things behind the scenes that we did with solr:
- created custom request handlers that have category to facet to attribute
  caching built in
- category to facet management
   - ability to manage facet groups (attributes within a set facet) and
     assign them to categories
   - ability to create any category structure and share facet groups
- facet inheritance for any category (a facet group can be defined on a
  parent category and pushed down to all children)
- ability to create sub-categories as facets instead of normal
  sub-categories
- simple xml configuration for the final outputted category configuration
  file

I'm sure there are more cool things but that is all for now.  Join the
mailing list to see more improvements in the future.

Also.. how do I get added to the Using Solr wiki page?


Thanks,
Mike Austin






Re: Exception in SOLR when querying for fields of type string

2007-11-15 Thread Tim
In schema.xml in your example/solr/conf folder look for the
defaultSearchField element.


The field it names is the field that will be used as the default field.

You'll have to restart Solr to make the change take effect.
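
For example, in the stock example schema it looks like this (assuming you
want the field named text to be the default):

   <defaultSearchField>text</defaultSearchField>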


Bangordailynews.com New Media

On Nov 14, 2007, at 4:29 PM, Kasi Sankaralingam <[EMAIL PROTECTED]>  
wrote:


No I do not have a default field defined, how would you do that?  
Thanks a lot, kasi


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of  
Yonik Seeley

Sent: Tuesday, November 13, 2007 5:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception in SOLR when querying for fields of type string

On Nov 13, 2007 6:23 PM, Kasi Sankaralingam <[EMAIL PROTECTED]>  
wrote:

It is not tokenized, it is a string field, so will it still match
"photo" for field 'title_s' and "book" for the default field?


Yes, because the query parser splits up things by whitespace before
analyzers are even applied.
Do you have a default field defined?

-Yonik


[CDCR]Unable to locate core

2019-01-30 Thread Tim
I'm trying to set up CDCR, but I'm running into an issue where one or two of
the six shards/replicas will not be replicated while the rest will.
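
For context, the source cluster uses the usual CDCR request handler setup in
solrconfig.xml, roughly like the sketch below (the zkHost and collection names
here are placeholders, not the real values):

   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
     <lst name="replica">
       <str name="zkHost">target-zk1:2181,target-zk2:2181/solr</str>
       <str name="source">mycollection</str>
       <str name="target">mycollection</str>
     </lst>
     <lst name="replicator">
       <str name="threadPoolSize">2</str>
       <str name="schedule">1000</str>
       <str name="batchSize">128</str>
     </lst>
     <lst name="updateLogSynchronizer">
       <str name="schedule">1000</str>
     </lst>
   </requestHandler>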

The only error that appears in the logs is: "Unable to locate core". 

Occasionally restarting the instance will fix this but then the issue will
repeat itself next time there is an update to the source collection. But it
will not necessarily happen to the same core again.

Has anyone run into an error such as this before? 






Re: [CDCR]Unable to locate core

2019-02-01 Thread Tim
After some more investigation it seems that we're running into the same bug
found here.

However, if my understanding is correct, that bug was patched in 7.3.
Unfortunately we're running into the same behavior in 7.5.

CDCR is replicating successfully to the leader node but is not replicating
to the followers.





Re: [CDCR]Unable to locate core

2019-02-02 Thread Tim
Thank you for the reply. Sorry I did not include more information in the
first post. 

Maybe there's some confusion here on my end. Both the target and source
clusters are running in cloud mode, so I think you're correct that it is a
different issue. It looks like source-leader-to-target-leader replication is
successful, but the target leader is then unsuccessful in replicating to its
followers.

The "unable to locate core" message is originally coming from the target
cluster. 
*Here are the logs being generated from the source for reference:*
2019-02-02 20:10:19.551 INFO 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager CDCR
bootstrap successful in 3 seconds
2019-02-02 20:10:19.564 INFO 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Create
new update log reader for target testcollection with checkpoint
1624389130873995265 @ testcollection:shard3
2019-02-02 20:10:19.568 ERROR
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Unable to
bootstrap the target collection testcollection shard: shard3
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://targethost001.com:30100/solr: Unable to locate core
testcollection_shard2_replica_n4
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_192]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_192]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_192]
  

Re: [CDCR]Unable to locate core

2019-02-07 Thread Tim
So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

I've messed around with this for a while, and every time the leader-to-leader
replication portion works fine, but the recovery portion (implemented as
part of the fix above) fails.

I've run a few tests and every time the recovery portion kicks off, it sends
the recovery command to the node which has the leader for a given replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes, with the same results each time. The code seems to assume
that the follower is on the same Solr node as the leader.
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is being sent to node1 instead of node2.







Geospatial search question - document with multiple locations

2015-12-24 Thread Tim Hearn
Hi everyone,

Suppose I have the following fields in my schema:





And I index multiple latlon coordinates to a document.
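
(The field definitions above were stripped by the archive; since the older
LatLonType doesn't support multiValued, the question presumably assumes a
multi-valued RPT-based field along these lines:)

   <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
              geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>
   <field name="latlon" type="location_rpt" indexed="true" stored="true" multiValued="true"/>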

Then I do a geofilt search against my index.  When I do that geofilt
search, will ALL locations associated with that document have to be within
the 'circle' produced by geofilt, or will only ONE?

Example:

doc1: {
...
latlon:-20,30
latlon:140,-70
...
}

Will the query fq={!geofilt pt=-20.1,30.1 sfield=latlon d=150} match doc1
because the point -20,30 is within the query?  Or will it not because even
though the point -20,30 is within the query, the point 140,-70 is not?

Thanks much!
Tim


Re: mlt and document boost

2015-12-24 Thread Tim Hearn
One workaround is to use the 'important terms' feature to grab the query
generated by the MLT handler, then parse that list into your own solr query
to use through a standard search handler.  That way, you can get the same
results as if you used the MLT handler, and you can also use filter
querying, highlighting, etc.

Note:  I am currently running a Solr 5.0.0 Single-Core installation
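
Roughly, the two steps look like this (handler names and fields here are just
an illustration, adjust to your schema):

Step 1, hit the MLT handler only to collect the interesting terms and boosts:

   /mlt?q=id:SOMEDOC&mlt.fl=body_mlt&mlt.interestingTerms=details&mlt.boost=true

Step 2, build a normal query from the returned term/boost pairs and send it to
/select, where fq, hl, faceting, etc. all work as usual:

   /select?defType=edismax&qf=body_mlt&q=term1^1.5 term2^1.2 term3^1.0&fq=storeid:12345&hl=true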

On Thu, Dec 24, 2015 at 11:57 AM, Upayavira  wrote:

> Which morelikethis are you using? Handler, SearchComponent or
> QueryParser?
>
> You should be able to wrap the mlt query parser with the boost query
> parser with no problem.
>
> Upayavira
>
> On Thu, Dec 24, 2015, at 05:18 AM, Binoy Dalal wrote:
> > Have you tried applying the boosts to individual fields with mlt.qf?
> > Optionally, you could get the patch that is on jira and integrate it into
> > your code if you're so inclined.
> >
> > On Thu, 24 Dec 2015, 03:17 CrazyDiamond  wrote:
> >
> > > So no way to apply boost to mlt or any other way to change the order of
> > > documents in the mlt result? Also, maybe there is a way to make two mlt
> > > queries at once and merge them.
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> http://lucene.472066.n3.nabble.com/mlt-and-document-boost-tp4246522p4247154.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> > --
> > Regards,
> > Binoy Dalal
>


Re: Running Lucene/SOR on Hadoop

2016-01-04 Thread Tim Williams
Apache Blur (Incubating) has several approaches (hive, spark, m/r)
that could probably help with this ranging from very experimental to
stable.  If you're interested, you can ask over on
blur-u...@incubator.apache.org ...

Thanks,
--tim

On Fri, Dec 25, 2015 at 4:28 AM, Dino Chopins  wrote:
> Hi Erick,
>
> Thank you for your response and pointer. What I mean by running Lucene/SOLR
> on Hadoop is to have Lucene/SOLR index available to be queried using
> mapreduce or any best practice recommended.
>
> I need to have this mechanism to do large scale row deduplication. Let me
> elaborate why I need this:
>
>1. I have two data sources with 35 and 40 million records of customer
>profile - the data come from two systems (SAP and MS CRM)
>2. Need to index and compare row by row of the two data sources using
>name, address, birth date, phone and email field. For birth date and email
>it will use exact comparison, but for the other fields will use
>probabilistic comparison. Btw, the data has been normalized before they are
>being indexed.
>3. Each finding will be categorized under same person, and will be
>deduplicated automatically or under user intervention depending on the
>score.
>
> I usually use it using Lucene index on local filesystem and use term
> vector, but since this will be repeated task and then challenged by
> management to do this on top of Hadoop cluster I need to have a framework
> or best practice to do this.
>
> I understand that to have Lucene index on HDFS is not very appropriate
> since HDFS is designed for large block operation. With that understanding,
> I use SOLR and hope to query it using http call from mapreduce job.  The
> snippet code is below.
>
> url = new URL(SOLR-Query-URL);
>
> HttpURLConnection connection = (HttpURLConnection)
> url.openConnection();
> connection.setRequestMethod("GET");
>
> The latter method turns out to perform very badly. The simple mapreduce job
> that only reads the data sources and writes to hdfs takes 15 minutes, but
> once I do the http request it takes three hours and is still ongoing.
>
> What went wrong? And what will be solution to my problem?
>
> Thanks,
>
> Dino
>
> On Mon, Dec 14, 2015 at 12:30 AM, Erick Erickson 
> wrote:
>
>> First, what do you mean "run Lucene/Solr on Hadoop"?
>>
>> You can use the HdfsDirectoryFactory to store Solr/Lucene
>> indexes on Hadoop, at that point the actual filesystem
>> that holds the index is transparent to the end user, you just
>> use Solr as you would if it was using indexes on the local
>> file system. See:
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>>
>> If you want to use Map-Reduce to _build_ indexes, see the
>> MapReduceIndexerTool in the Solr contrib area.
>>
>> Best,
>> Erick
>>
>
>
>
>
> --
> Regards,
>
> Dino


NPE when executing clustering query search

2016-03-22 Thread Tim Hearn
Hi everyone,

I am trying to execute a clustering query to my single-core master-slave
solr setup and it is returning a NullPointerException.  I checked the line
in the source code where it is being thrown, and it looks like the null
object is some sort of 'filt' object, which doesn't make sense.  Below is
the query, my schema, solrconfig, and the exception.  If anyone could
please help that would be great!

Thank you!

QUERY:

1510649 [qtp1855032000-20] INFO  org.apache.solr.core.SolrCore  -
[collection1] webapp=/solr
path=/clustering
params={
mlt.minwl=3&
mlt.boost=true&
mlt.fl=textpropertymlt&
sort=score+desc&
carrot.snippet=impnoteplain&
mlt.mintf=1&
qf=concept_name&
mlt.interestingTerms=details&
wt=javabin&
clustering.engine=lingo&
version=2&
rows=500&
mlt.mindf=2&
debugQuery=true&
fl=id,concept_name,impnoteplain&
start=0&
q=id:567065dc658089be9f5c2c0d5670653d658089be9f5c2ae2&
carrot.title=concept_name&
clustering.results=true&
qt=/clustering&
fq=storeid:5670653d658089be9f5c2ae2&
fq={!edismax+v%3D''+qf%3D'textpropertymlt'+mm%3D'2<40%25'}&carrot.url=id&clustering=true}
status=500 QTime=217

ERROR:

1510697 [qtp1855032000-20] ERROR org.apache.solr.servlet.SolrDispatchFilter
 - null:java.lang.NullPointerException
at
org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:53)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1416)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:586)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:511)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:235)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)


SCHEMA.XML:

   [field type, field, and copyField definitions stripped by the mailing list archive]


SOLR CONFIG.XML:

   [clustering component and /clustering request handler configuration stripped by the mailing list archive]
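For reference, the stock clustering wiring from the Solr example
solrconfig.xml is roughly the following; this is a sketch of the usual setup,
not the exact configuration that was stripped above:

   <searchComponent name="clustering" enable="true"
                    class="solr.clustering.ClusteringComponent">
     <lst name="engine">
       <str name="name">lingo</str>
       <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
     </lst>
   </searchComponent>

   <requestHandler name="/clustering" class="solr.SearchHandler">
     <lst name="defaults">
       <bool name="clustering">true</bool>
       <str name="clustering.engine">lingo</str>
       <str name="carrot.title">concept_name</str>
       <str name="carrot.snippet">impnoteplain</str>
     </lst>
     <arr name="last-components">
       <str>clustering</str>
     </arr>
   </requestHandler>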


Re: Indexing 700 docs per second

2016-04-19 Thread Tim Robertson
Hi Mark,

We were putting in and updating docs of around 20-25 indexed fields (mainly
INTs, but some Strings and multivalue fields) at >1000/sec on far lesser
hardware and a total of 600 million docs (batch updates of course) while
also serving live queries for a website which had about 30 concurrent users
steady state (not all hitting SOLR though).

It seems realistic with that kind of hardware in my experience, but you
didn't mention what else was going on that might affect it (e.g. reads).

HTH,
Tim


On Tue, Apr 19, 2016 at 7:12 PM, Erick Erickson 
wrote:

> Make very sure you batch updates though.
> Here's a benchmark I ran:
> https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
>
> NOTE: it's not entirely clear that you want to
> put 122M docs on a single shard. Depending on the queries
> you'll run you may want 2 or more shards, but that depends
> on the query pattern and your SLAs. Here's the long version
> of "you really have to load test this":
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Tue, Apr 19, 2016 at 6:48 AM, Susheel Kumar 
> wrote:
> >  It sounds achievable with your machine configuration and i would suggest
> > to try out atomic update.  Use SolrJ with multi-threaded indexing for
> > higher indexing rate.
> >
> > Thanks,
> > Susheel
> >
> >
> >
> > On Tue, Apr 19, 2016 at 9:27 AM, Tom Evans 
> wrote:
> >
> >> On Tue, Apr 19, 2016 at 10:25 AM, Mark Robinson <
> mark123lea...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I have a requirement to index (mainly updation) 700 docs per second.
> >> > Suppose I have a 128GB RAM, 32 CPU machine, with each doc size around
> 260
> >> > bytes (6 fields out of which only 2 will undergo updation at the above
> >> > rate). This collection has around 122Million docs and that count is
> >> pretty
> >> > much a constant.
> >> >
> >> > 1. Can I manage this updation rate with a non-sharded ie single Solr
> >> > instance set up?
> >> > 2. Also is atomic update or a full update (the whole doc) of the
> changed
> >> > records the better approach in this case.
> >> >
> >> > Could some one please share their views/ experience?
> >>
> >> Try it and see - everyone's data/schemas are different and can affect
> >> indexing speed. It certainly sounds achievable enough - presumably you
> >> can at least produce the documents at that rate?
> >>
> >> Cheers
> >>
> >> Tom
> >>
>


solr errors integrating with drupal

2015-09-09 Thread Tim Dunphy
Hey guys,

 I've set up a slightly older version of solr (4.10) with apache tomcat
7.0.64. And set up some drupal configurations according to this guide:

http://duntuk.com/how-install-apache-solr-46-apache-tomcat-7-use-drupal


Everything seemed to work after I copied the log4j libraries to the correct
location, which this tutorial leaves out.

But I find that I am now getting these errors in the solr logs when I load
up solr in the web browser:

Time (Local) | Level | Logger | Message

9/9/2015, 5:19:41 PM  WARN  SolrResourceLoader  Can't find (or read) directory
to add to classloader: ../../../contrib/extraction/lib (resolved as:
/usr/local/tomcat/solr/drupal/../../../contrib/extraction/lib).
9/9/2015, 5:19:41 PM  WARN  SolrResourceLoader  Can't find (or read) directory
to add to classloader: ../../../contrib/clustering/lib/ (resolved as:
/usr/local/tomcat/solr/drupal/../../../contrib/clustering/lib).
9/9/2015, 5:19:42 PM  WARN  SolrResourceLoader  Solr loaded a deprecated
plugin/analysis class [solr.FloatField]. Please consult documentation how to
replace it accordingly.
9/9/2015, 5:19:42 PM  WARN  SolrResourceLoader  Solr loaded a deprecated
plugin/analysis class [solr.DateField]. Please consult documentation how to
replace it accordingly.
9/9/2015, 5:19:42 PM  WARN  SolrCore  [drupal] Solr index directory
'/usr/local/apache-tomcat-7.0.64/solr/drupal/data/index' doesn't exist.
Creating new index...
9/9/2015, 5:19:42 PM  WARN  RequestHandlers  Multiple requestHandler registered
to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler
9/9/2015, 5:19:42 PM  WARN  RequestHandlers  Multiple requestHandler registered
to the same name: /update/csv ignoring: org.apache.solr.handler.UpdateRequestHandler
9/9/2015, 5:19:42 PM  WARN  RequestHandlers  Multiple requestHandler registered
to the same name: /update/json ignoring: org.apache.solr.handler.UpdateRequestHandler
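
(The first two warnings refer to <lib dir=.../> entries in solrconfig.xml that
point at Solr contrib directories using relative paths; in the stock config
they look roughly like the lines below, and they can be pointed at real
absolute paths or removed if the extraction/clustering contribs aren't needed.
This is a guess at the relevant lines, since my config isn't shown here.)

   <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
   <lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />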

How can I correct these errors? I'll be able to show you whatever config
files you think may lead to a solution. I'll just need to know which ones
to show you, as I am still new to solr.

Thanks!
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: solr training

2015-09-13 Thread Tim Dunphy
Cool, I'll check it out. Thanks!

On Sun, Sep 13, 2015 at 9:53 PM, Otis Gospodnetić <
otis.gospodne...@gmail.com> wrote:

> Hi Tim,
>
> A slightly delayed reply ;)
> We are running Solr training in NYC next month -
> http://sematext.com/training/solr-training.html - 2nd seat is 50% off.
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Fri, May 1, 2015 at 2:18 PM, Tim Dunphy  wrote:
>
> > Hey guys,
> >
> >  My company has a training budget that it wants me to use. So what I'd
> like
> > to find out is if there are any instructor-led courses in the NY/NJ area,
> > or courses online that are instructor-led that you could recommend?
> >
> > Thanks,
> > Tim
> >
> > --
> > GPG me!!
> >
> > gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> >
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: solr training

2015-09-17 Thread Tim Dunphy
>
> How about in Denver?


Nah dude. I'm in Jersey. Denver's like a half a country away!

On Thu, Sep 17, 2015 at 12:18 AM, William Bell  wrote:

> How about in Denver?
>
> On Sun, Sep 13, 2015 at 7:53 PM, Otis Gospodnetić <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi Tim,
> >
> > A slightly delayed reply ;)
> > We are running Solr training in NYC next month -
> > http://sematext.com/training/solr-training.html - 2nd seat is 50% off.
> >
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Fri, May 1, 2015 at 2:18 PM, Tim Dunphy  wrote:
> >
> > > Hey guys,
> > >
> > >  My company has a training budget that it wants me to use. So what I'd
> > like
> > > to find out is if there are any instructor-led courses in the NY/NJ
> > > area, or courses online that are instructor-led that you could recommend?
> > >
> > > Thanks,
> > > Tim
> > >
> > > --
> > > GPG me!!
> > >
> > > gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> > >
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Query to count matching terms and disable 'coord' multiplication

2015-10-06 Thread Tim Hearn
Hello everyone,

I have two questions

1) Is there a way to query solr to rank results based purely on the number
of terms in the query which are contained in the document?
Example:
doc1: 'foo bar poo car foo'
q1: 'foo, car, two, start'
score(doc1, q1) = 2 (since foo and car both occur in doc1 - never mind
that foo occurs twice)

This is also the numerator in the coord query
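
One way to get something close to (1), sketched here with an assumed field
name of text, is to sort by a function query that counts term presence:

   q=foo car two start&
   defType=edismax&
   sort=sum(min(termfreq(text,'foo'),1),min(termfreq(text,'car'),1),min(termfreq(text,'two'),1),min(termfreq(text,'start'),1)) desc

termfreq() returns the raw term frequency, and capping it with min(...,1)
turns it into a 0/1 presence flag, so the sum is exactly the number of
distinct query terms that appear in the document.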

2) Is there a way to disable the 'coord' and 'query norm' multiplication of
query results altogether?


solr training

2015-05-02 Thread Tim Dunphy
Hey guys,

 My company has a training budget that it wants me to use. So what I'd like
to find out is if there are any instructor-led courses in the NY/NJ area,
or courses online that are instructor-led that you could recommend?

Thanks,
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
Hey all,

I need to run solr 5.1.0 on port 80 with some basic apache authentication.
Normally, under earlier versions of solr I would set it up to run under
tomcat, then connect it to apache web server using mod_jk.

However 5.1.0 seems totally different. I see that tomcat support has been
removed from the latest versions. So how do I set this up in front of
apache web server? I need to get this running on port 443 with SSL and
something at least equivalent to basic apache auth.
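
What I'm picturing is a plain reverse proxy in front of Solr's Jetty, roughly
along these lines (an untested sketch; paths, port, and the password file are
assumptions):

   # in an SSL-enabled vhost, with mod_proxy/mod_proxy_http loaded
   ProxyPass        /solr http://localhost:8983/solr
   ProxyPassReverse /solr http://localhost:8983/solr

   <Location /solr>
     AuthType Basic
     AuthName "Solr"
     AuthUserFile /etc/httpd/conf/solr.htpasswd
     Require valid-user
   </Location>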

I really wish this hadn't changed because I could set this up under the old
method rather easily and quickly. Sigh..

But thank you for your advice!

Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
>
> The container in the default 5.x install is a completely unmodified
> Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config.
> The config for Jetty is similar to tomcat, you just need to figure out
> how to make it work with Apache like you would with Tomcat.
>
> Incidentially, at least for right now, you CAN still take the .war file
> out of the jetty install and put it in Tomcat just like you would have
> with a 4.3 or later version.  We are planning on making that impossible
> in a later 5.x version, but for right now, it is still possible.



Hmm well of the two options you present the second one sounds a little
easier and more attractive. However, when I tried doing just that like so:

[root@aoadbld00032la ~]# cp -v solr-5.1.0/server/webapps/solr.war
/usr/local/tomcat/webapps/
`solr-5.1.0/server/webapps/solr.war' -> `/usr/local/tomcat/webapps/solr.war'

And then start tomcat up... I can't get to the solr interface :(


HTTP Status 503 - Server is shutting down or failed to initialize

*type* Status report

*message* *Server is shutting down or failed to initialize*

*description* *The requested service is not currently available.*
--
Apache Tomcat/8.0.21

Not seeing anything telling in the logs, unfortunately:

[root@aoadbld00032la ~]# tail /usr/local/tomcat/logs/catalina.out
04-May-2015 15:48:26.945 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /usr/local/apache-tomcat-8.0.21/webapps/ROOT has
finished in 32 ms
04-May-2015 15:48:26.946 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /usr/local/apache-tomcat-8.0.21/webapps/host-manager
04-May-2015 15:48:26.979 INFO [localhost-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
04-May-2015 15:48:26.983 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /usr/local/apache-tomcat-8.0.21/webapps/host-manager
has finished in 36 ms
04-May-2015 15:48:26.983 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory /usr/local/apache-tomcat-8.0.21/webapps/examples
04-May-2015 15:48:27.195 INFO [localhost-startStop-1]
org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned
for TLDs yet contained no TLDs. Enable debug logging for this logger for a
complete list of JARs that were scanned but no TLDs were found in them.
Skipping unneeded JARs during scanning can improve startup time and JSP
compilation time.
04-May-2015 15:48:27.245 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web
application directory /usr/local/apache-tomcat-8.0.21/webapps/examples has
finished in 262 ms
04-May-2015 15:48:27.248 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
["http-nio-8080"]
04-May-2015 15:48:27.257 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
["ajp-nio-8009"]
04-May-2015 15:48:27.258 INFO [main]
org.apache.catalina.startup.Catalina.start Server startup in 3350 ms

However it sounds like you're sure it's supposed to work this way. Can I
get some advice on this error?

Thanks
Tim

On Mon, May 4, 2015 at 3:12 PM, Shawn Heisey  wrote:

> On 5/4/2015 1:04 PM, Tim Dunphy wrote:
> > I need to run solr 5.1.0 on port 80 with some basic apache
> authentication.
> > Normally, under earlier versions of solr I would set it up to run under
> > tomcat, then connect it to apache web server using mod_jk.
> >
> > However 5.1.0 seems totally different. I see that tomcat support has been
> > removed from the latest versions. So how do I set this up in front of
> > apache web server? I need to get this running on port 443 with SSL and
> > something at least equivalent to basic apache auth.
> >
> > I really wish this hadn't changed because I could set this up under the
> old
> > method rather easily and quickly. Sigh..
> >
> > But thank you for your advice!
>
> The container in the default 5.x install is a completely unmodified
> Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config.
> The config for Jetty is similar to tomcat, you just need to figure out
> how to make it work with Apache like you would with Tomcat.
>
> Incidentially, at least for right now, you CAN still take the .war file
> out of the jetty install and put it in Tomcat just like you would have
> with a 4.3 or later version.  We are planning on making that impossible
> in a later 5.x version, but for right now, it is still possible.

solr 3.6.2 under tomcat 8 missing corename in path

2015-05-06 Thread Tim Dunphy
I'm trying to set up an old version of Solr for one of our drupal
developers. Apparently only versions 1.x or 3.x will work with the current
version of drupal.

I'm setting up solr 3.6.2 under tomcat.

And I'm getting this error when I start tomcat and surf to the /solr/admin
URL:

 HTTP Status 404 - missing core name in path

type Status report

message missing core name in path

description The requested resource is not available.

I have solr living in /opt:

# ls -ld /opt/solr
lrwxrwxrwx. 1 root root 17 May  6 12:48 /opt/solr -> apache-solr-3.6.2

And I have my cores located here:

# ls -ld /opt/solr/admin/cores
drwxr-xr-x. 3 root root 4096 May  6 14:37 /opt/solr/admin/cores

Just one core so far, until I can get this working.

# ls -l /opt/solr/admin/cores/
total 4
drwxr-xr-x. 5 root root 4096 May  6 14:08 collection1

I have this as my solr.xml file:

   [solr.xml contents stripped by the mailing list archive]

Which is located in these two places:

# ls -l /opt/solr/solr.xml /usr/local/tomcat/conf/Catalina/solr.xml
-rw-r--r--. 1 root root 169 May  6 14:38 /opt/solr/solr.xml
-rw-r--r--. 1 root root 169 May  6 14:38
/usr/local/tomcat/conf/Catalina/solr.xml

These are the contents of my /opt/solr directory

# ls -l  /opt/solr/
total 436
drwxr-xr-x.  3 root root   4096 May  6 14:37 admin
-rw-r--r--.  1 root root 176647 Dec 18  2012 CHANGES.txt
drwxr-xr-x.  3 root root   4096 May  6 12:48 client
drwxr-xr-x.  9 root root   4096 Dec 18  2012 contrib
drwxr-xr-x.  3 root root   4096 May  6 12:48 dist
drwxr-xr-x.  3 root root   4096 May  6 12:48 docs
-rw-r--r--.  1 root root   1274 May  6 13:28 elevate.xml
drwxr-xr-x. 11 root root   4096 May  6 12:48 example
-rw-r--r--.  1 root root  81331 Dec 18  2012 LICENSE.txt
-rw-r--r--.  1 root root  20828 Dec 18  2012 NOTICE.txt
-rw-r--r--.  1 root root   5270 Dec 18  2012 README.txt
-rw-r--r--.  1 root root  55644 May  6 13:27 schema.xml
-rw-r--r--.  1 root root  60884 May  6 13:27 solrconfig.xml
-rw-r--r--.  1 root root    169 May  6 14:38 solr.xml


Yet, when I bounce tomcat, this is the result that I get:

HTTP Status 404 - missing core name in path

type Status report

message missing core name in path

description The requested resource is not available.

Can anyone tell me what I'm doing wrong?


Thanks!!
Tim


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: solr 3.6.2 under tomcat 8 missing corename in path

2015-05-07 Thread Tim Dunphy
Hi Shawn,


> The URL must include the core name.  Your defaultCoreName is
> collection1, and I'm guessing you don't have a core named collection1.
> Try browsing to just /solr instead of /solr/admin ... you should get a
> list of links for valid cores, each of which will take you to the admin
> page for that core.
> Probably what you will find is that when you click on one of those
> links, you will end up on /solr/corename/admin.jsp as the URL in your
> browser.


When I browse to /solr I see a link that points me to /solr/admin. And when
I click on that link is when I see the error:

*missing core name in path*


I think my problem is that I am not listing the cores correctly in
the solr.xml file.

This is what I have in my solr.xml file:

   [solr.xml contents stripped by the mailing list archive]

So what I did was create a directory at solr/admin/cores and put
collection1 there:

[root@aoadbld00032la solr]# ls -ld admin/cores/collection1
drwxr-xr-x. 5 root root 4096 May  6 17:29 admin/cores/collection1

So, if I assume correctly, that the way I reference the collection1
directory is the problem, how can I express this differently in my solr.xml
file so that it works?
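
For reference, my understanding is that a 3.x solr.xml normally names each
core explicitly, something like the following (a sketch, not a tested config
for this layout):

   <solr persistent="true">
     <cores adminPath="/admin/cores" defaultCoreName="collection1">
       <core name="collection1" instanceDir="admin/cores/collection1" />
     </cores>
   </solr>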

Thanks,
Tim



On Wed, May 6, 2015 at 8:00 PM, Shawn Heisey  wrote:

> On 5/6/2015 2:29 PM, Tim Dunphy wrote:
> > I'm trying to setup an old version of Solr for one of our drupal
> > developers. Apparently only versions 1.x or 3.x will work with the
> current
> > version of drupal.
> >
> > I'm setting up solr 3.4.2 under tomcat.
> >
> > And I'm getting this error when I start tomcat and surf to the
> /solr/admin
> > URL:
> >
> >  HTTP Status 404 - missing core name in path
> >
> > type Status report
> >
> > message missing core name in path
> >
> > description The requested resource is not available.
>
> The URL must include the core name.  Your defaultCoreName is
> collection1, and I'm guessing you don't have a core named collection1.
>
> Try browsing to just /solr instead of /solr/admin ... you should get a
> list of links for valid cores, each of which will take you to the admin
> page for that core.
>
> Probably what you will find is that when you click on one of those
> links, you will end up on /solr/corename/admin.jsp as the URL in your
> browser.
>
> Thanks,
> Shawn
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


NPE when Faceting with MoreLikeThis handler in Solr 5.1.0

2015-05-14 Thread Tim Hearn
Hi everyone,

Recently I upgraded to solr 5.1.0.  When trying to generate facets using
the more like this handler, I now get a NullPointerException.  I never
got this exception while using Solr 4.10.0.  Details are below:

Stack Trace:
at
org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Query:
qt=/mlt&
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7&
mlt.mindf=5&
mlt.mintf=1&
mlt.minwl=3&
mlt.boost=true&
fq=storeid:546dcdcab54cf2d074e5a2f7&
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt&
mlt.interestingTerms=details&
fl=conceptid,score
&sort=score desc&
start=0&
rows=2&
facet=true&
facet.field=tags&
facet.field=locations&
facet.mincount=1&
facet.method=enum&
facet.limit=-1&
facet.sort=count

Schema.xml(relevant parts):
   [field definitions stripped by the mailing list archive]


solrconfig.xml(relevant parts):
   [request handler definitions stripped by the mailing list archive]


NPE when Faceting with MoreLikeThis handler in Solr 5.1.0

2015-05-15 Thread Tim Hearn
Hi everyone,

Recently I upgraded to solr 5.1.0.  When trying to generate facets using
the more like this handler, I now get a NullPointerException.  I never
got this exception while using Solr 4.10.0.  Details are below:

Stack Trace:
at
org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Query:
qt=/mlt&
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7&
mlt.mindf=5&
mlt.mintf=1&
mlt.minwl=3&
mlt.boost=true&
fq=storeid:546dcdcab54cf2d074e5a2f7&
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt&
mlt.interestingTerms=details&
fl=conceptid,score
&sort=score desc&
start=0&
rows=2&
facet=true&
facet.field=tags&
facet.field=locations&
facet.mincount=1&
facet.method=enum&
facet.limit=-1&
facet.sort=count

Schema.xml(relevant parts):
   [field definitions stripped by the mailing list archive]


solrconfig.xml(relevant parts):
   [request handler definitions stripped by the mailing list archive]


NPE with faceting query on MoreLikeThis handler

2015-05-18 Thread Tim Hearn
Hi everyone,

Recently I upgraded to solr 5.1.0.  When trying to generate facets using
the more like this handler, I now get a NullPointerException.  I never
got this exception while using Solr 4.10.0.  Details are below:

Stack Trace:
at
org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Query:
qt=/mlt&
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7&
mlt.mindf=5&
mlt.mintf=1&
mlt.minwl=3&
mlt.boost=true&
fq=storeid:546dcdcab54cf2d074e5a2f7&
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt&
mlt.interestingTerms=details&
fl=conceptid,score
&sort=score desc&
start=0&
rows=2&
facet=true&
facet.field=tags&
facet.field=locations&
facet.mincount=1&
facet.method=enum&
facet.limit=-1&
facet.sort=count

Schema.xml(relevant parts):
   [field definitions stripped by the mailing list archive]


solrconfig.xml(relevant parts):
   [request handler definitions stripped by the mailing list archive]


NPE when faceting with MLT Query from upgrade to Solr 5.1.0

2015-05-18 Thread Tim H
Hi everyone,

Recently I upgraded to solr 5.1.0.  When trying to generate facets using
the more like this handler, I now get a NullPointerException.  I never
got this exception while using Solr 4.10.0.  Details are below:

Stack Trace:
at
org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Query:
qt=/mlt&
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7&
mlt.mindf=5&
mlt.mintf=1&
mlt.minwl=3&
mlt.boost=true&
fq=storeid:546dcdcab54cf2d074e5a2f7&
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt&
mlt.interestingTerms=details&
fl=conceptid,score
&sort=score desc&
start=0&
rows=2&
facet=true&
facet.field=tags&
facet.field=locations&
facet.mincount=1&
facet.method=enum&
facet.limit=-1&
facet.sort=count

Schema.xml(relevant parts):
   [field definitions stripped by the mailing list archive]


solrconfig.xml(relevant parts):
   [request handler definitions stripped by the mailing list archive]


Solr Cloud 2nd Server Recover Stuck

2016-06-29 Thread Tim Chen
Hi,

I need some help please.

I am running Solr Cloud 4.10.4, with ensemble ZooKeeper.

Server A running Solr Cloud + ZooKeeper
Server B running Solr Cloud + ZooKeeper
Server C running ZooKeeper only.

For some reason Server B crashed and all data was lost. I have cleaned it up,
deleted all existing collection index files and started the Solr service up fresh.

For a Collection that has only 1 shard, Server B has managed to create and
replicate from Server A:
   SolrCore [collection1] Solr index directory
'/collection1/data/index' doesn't exist. Creating new index...

For a Collection that has 2 shards, Server B doesn't seem to be doing anything.
The Collection was originally configured with 2 shards and a replication factor of 2.

Here is the Clusterstate.json from ZooKeeper.

Collection1 has only 1 shard.
Collection cr_dev has 2 shards, one is on server A, one was on server B.
Server A: 10.1.11.70
Server B: 10.2.11.244

Is it because "autoCreated" is missing from collection cr_dev? How do I set 
this? API call?

"collection1":{
"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
  "core_node1":{
"state":"active",
"core":"collection1",
"node_name":"10.1.11.70:8983_solr",
"base_url":"http://10.1.11.70:8983/solr";,
"leader":"true"},
  "core_node2":{
"state":"active",
"core":"collection1",
"node_name":"10.2.11.244:8983_solr",
"base_url":"http://10.2.11.244:8983/solr",
"maxShardsPerNode":"1",
"router":{"name":"compositeId"},
"replicationFactor":"1",
"autoAddReplicas":"false",
"autoCreated":"true"},
  "cr_dev":{
"shards":{
  "shard1":{
"range":"8000-",
"state":"active",
"replicas":{
  "core_node1":{
"state":"active",
"core":"cr_dev_shard1_replica1",
"node_name":"10.1.11.70:8983_solr",
"base_url":"http://10.1.11.70:8983/solr";,
"leader":"true"},
  "core_node4":{
"state":"down",
"core":"cr_dev_shard1_replica2",
"node_name":"10.2.11.244:8983_solr",
"base_url":"http://10.2.11.244:8983/solr"}}},
  "shard2":{
"range":"0-7fff",
"state":"active",
"replicas":{
  "core_node2":{
"state":"active",
"core":"cr_dev_shard2_replica1",
"node_name":"10.1.11.70:8983_solr",
"base_url":"http://10.1.11.70:8983/solr";,
"leader":"true"},
  "core_node3":{
"state":"down",
"core":"cr_dev_shard2_replica2",
"node_name":"10.2.11.244:8983_solr",
"base_url":"http://10.2.11.244:8983/solr",
"maxShardsPerNode":"2",
"router":{"name":"compositeId"},
"replicationFactor":"2",
"autoAddReplicas":"false"},

Many thanks,
Tim




RE: Solr Cloud 2nd Server Recover Stuck

2016-06-29 Thread Tim Chen
Hi Erick,

I have followed your instructions to add a new replica and delete the old
replica - works great!

Everything back to normal now.

Thanks mate!

Cheers,
Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 30 June 2016 1:49 AM
To: solr-user
Subject: Re: Solr Cloud 2nd Server Recover Stuck

I'm assuming that 10.1.11.79 is server A here.

What this _looks_ like is that you deleted the entire
directory here:
cr_dev_shard1_replica2
cr_dev_shard2_replica2

but not
collection1

on server B. This is a little inconsistent, but I think the collection1
core naming was a little weird with the default collection in 4.10...

Anyway, if this is true then there'll be no
core.properties
file in cr_dev_blah blah.

So, Zookeeper has a record of there being
such a thing, but it's not present on your server B.
To Zookeeper, since the replica hasn't registered
itself it still looks like the machine is just down.

So here's what I'd try:
Well, first I'd back up server A's index directories...

Use the Collections API ADDREPLICA command to
add a replica on Server B for each shard, use the "node"
parameter.

That should churn for a while but eventually create a replica
and sync it with the leader. Once that's done, use the DELETEREPLICA
to force Zookeeper to remove the traces of the original replicas on
server B.
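
For example, something like this against the Collections API (the node and
replica names here are taken from your clusterstate above, so treat them as
assumptions and adjust to your cluster):

http://10.1.11.70:8983/solr/admin/collections?action=ADDREPLICA&collection=cr_dev&shard=shard1&node=10.2.11.244:8983_solr
http://10.1.11.70:8983/solr/admin/collections?action=ADDREPLICA&collection=cr_dev&shard=shard2&node=10.2.11.244:8983_solr

and once the new replicas are active:

http://10.1.11.70:8983/solr/admin/collections?action=DELETEREPLICA&collection=cr_dev&shard=shard1&replica=core_node4
http://10.1.11.70:8983/solr/admin/collections?action=DELETEREPLICA&collection=cr_dev&shard=shard2&replica=core_node3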

Best,
Erick

On Wed, Jun 29, 2016 at 12:05 AM, Tim Chen  wrote:
> Hi,
>
> I need some help please.
>
> I am running Solr Cloud 4.10.4, with ensemble ZooKeeper.
>
> Server A running Solr Cloud + ZooKeeper
> Server B running Solr Cloud + ZooKeeper
> Server C running ZooKeeper only.
>
> For some reason Server B is crashed and all data lost. I have cleaned it up, 
> deleted all existing collection index files and start up the Solr service 
> fresh.
>
> If a Collection that has only 1 shard, Server B has managed to create and 
> replicate from Server A:
>SolrCore [collection1] Solr index directory 
> '/collection1/data/index' doesn't exist. Creating new index...
>
> If a Collection that has 2 shards, Server B doesn't seem to be doing 
> anything. The Collection was configured 2 shards and 2 replication originally.
>
> Here is the Clusterstate.json from ZooKeeper.
>
> Collection1 has only 1 shard.
> Collection cr_dev has 2 shards, one is on server A, one was on server B.
> Server A: 10.1.11.70
> Server B: 10.2.11.244
>
> Is it because "autoCreated" is missing from collection cr_dev? How do I set 
> this? API call?
>
> "collection1":{
> "shards":{"shard1":{
> "range":"8000-7fff",
> "state":"active",
> "replicas":{
>   "core_node1":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.1.11.70:8983_solr",
> "base_url":"http://10.1.11.70:8983/solr";,
> "leader":"true"},
>   "core_node2":{
> "state":"active",
> "core":"collection1",
> "node_name":"10.2.11.244:8983_solr",
> "base_url":"http://10.2.11.244:8983/solr",
> "maxShardsPerNode":"1",
> "router":{"name":"compositeId"},
> "replicationFactor":"1",
> "autoAddReplicas":"false",
> "autoCreated":"true"},
>   "cr_dev":{
> "shards":{
>   "shard1":{
> "range":"8000-",
> "state":"active",
> "replicas":{
>   "core_node1":{
> "state":"active",
> "core":"cr_dev_shard1_replica1",
> "node_name":"10.1.11.70:8983_solr",
> "base_url":"http://10.1.11.70:8983/solr";,
> "leader":"true"},
>   "core_node4":{
> "state":"down",
> "core":"cr_dev_shard1_replica2",
> "node_name":"10.2.11.244:8983_solr",
> "base_url":"http://10.2.11.244:8983/solr"}}},
>   "shard2":{
> "range":"0-7fff",
> "state":"active",
> "replicas":{
>   "core_node2":{
> "state":"active",
> "core":"cr_dev_shard2_replica1",
> "node_name":"10.1.11.70:8983_solr",
> "base_url":"http://10.1.11.70:8983/solr";,
> "leader":"true"},
>   "core_node3":{
> "state":"down",
> "core":"cr_dev_shard2_replica2",
> "node_name":"10.2.11.244:8983_solr",
> "base_url":"http://10.2.11.244:8983/solr",
> "maxShardsPerNode":"2",
> "router":{"name":"compositeId"},
> "replicationFactor":"2",
> "autoAddReplicas":"false"},
>
> Many thanks,
> Tim
>
>




Is it possible to force a Shard Leader change?

2016-07-26 Thread Tim Chen
Hi Guys,

I am running a Solr Cloud 4.10, with 4 Solr servers and 5 Zookeeper setup.

Solr servers:
solr01, solr02, solr03, solr04

I have around 20 collections in Solr Cloud, and there are 4 shards for each 
collection. Each shard has 4 replicas, one sitting on each Solr server, with one 
of them being the shard leader.

The issue I am having right now is that all the shard leaders are on the same 
server, e.g. solr01. When there are document updates, they are all pushed to that 
leader. I really want to distribute the shard leaders across all 4 Solr 
servers.

I noticed Solr 6 has a "REBALANCELEADERS" command to do that, but it is not available 
in Solr 4.
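
(For reference, the Solr 6 approach appears to be to mark a preferredLeader
replica property and then rebalance - roughly:

http://solr01:8983/solr/admin/collections?action=ADDREPLICAPROP&collection=mycollection&shard=shard1&replica=core_node1&property=preferredLeader&property.value=true
http://solr01:8983/solr/admin/collections?action=REBALANCELEADERS&collection=mycollection

I have not tried this, and the collection/replica names above are just placeholders.)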

Questions:

1, Is my setup OK, with 4 shards for each collection and 4 replicas for each 
shard? Each Solr server has a full set of documents.
2, To distribute the shard leaders to different Solr servers, can I somehow 
shut down a single replica that is currently a shard leader and force Solr to 
elect a different replica as the new shard leader?

Thanks guys!

Regards,
Tim




RE: Is it possible to force a Shard Leader change?

2016-07-28 Thread Tim Chen
Thanks Erick. You made a good point - the CPU usage is not high on my leader 
server, so I guess I will leave it.

And if I want to, as you suggested, I guess I could remove some replicas from 
certain collections and add them back in to break the election order, then take 
the JVM down to force an election.

Thanks again.

Cheers,
Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 28 July 2016 12:10 AM
To: solr-user
Subject: Re: Is it possible to force a Shard Leader change?

The REBALANCELEADERS stuff was put in to deal with 100s of leaders winding up 
on a single machine in a case where extremely high throughput was required. 
Until you get into pretty high scale the additional "work" on a leader is 
minimal. So unless your CPU usage is consistently significantly higher on the 
machine with all the leaders, I wouldn't worry about it.

Otherwise there isn't much you can do, I'm afraid. If you have asymmetric 
replica placement, leaders will tend toward different machines.
You could try to take the JVM down on the machine with all the leaders and let 
leader election redistribute them, but that's not a long-term solution.

Best,
Erick

On Tue, Jul 26, 2016 at 9:27 PM, Tim Chen  wrote:
> Hi Guys,
>
> I am running a Solr Cloud 4.10, with 4 Solr servers and 5 Zookeeper setup.
>
> Solr servers:
> solr01, solr02, solr03, solr04
>
> I have around 20 collections in Solr cloud, and there are 4 Shards for each 
> Collection. For each Shard, I have 4 Replicas, and sitting on each Solr 
> server, with one of them is the Shard Leader.
>
> The issue I am having right now is all the Shard Leader are pointing to the 
> same server, eg: solr01.  When there are documents update, they are all 
> pushed to the Leader. I really want to distribute the Shard Leader across all 
> 4 Solr servers.
>
> I noticed Solr 6 has a "REBALANCELEADERS" command to do that, but not 
> available in Solr 4.
>
> Questions:
>
> 1, Is my setup OK? with 4 Shards for each Collection and 4 Replicas for each 
> Shard. Each Solr server has full set of documents.
> 2, To distribute the Shard Leader to different Solr servers, can I somehow 
> shutdown a single Replica that is currently a Shard Leader and force Solr to 
> elect a different replica to be new Shard Leader?
>
> Thanks guys!
>
> Regards,
> Tim
>
>




Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-04 Thread Tim Chen
Hi Guys,

Me again. :)

We have 5 Solr servers:
01 -04 running Solr version 4.10 and ZooKeeper service
05 running ZooKeeper only.

JVM Max Memory set to 10G.

We have around 20 collections; for each collection there are 4 shards, and for 
each shard there are 4 replicas sitting across the 4 Solr servers.

Unfortunately, most of the time all the shards have the same leader (e.g. Solr 
server 01).

Now, if we are adding a lot of documents to Solr, eventually Solr 01 (the 
leader of all shards) throws an out-of-memory error in the Tomcat log and the service goes down (but 
port 8983 still responds to telnet).
At that moment I went to look at the logs on Solr02, Solr03 and Solr04, and there are a 
lot of "Connection timed out" errors; within another 2 minutes, all three of these Solr 
servers' services go down too!

My feeling is that when there are a lot of documents being pushed in, the leader will 
be busy with indexing, and will also be asking the other (non-leader) servers to do the 
indexing as well. All the non-leader servers are relying on the leader to finish 
indexing the new documents. At a certain point the Solr01 (leader) server has no more 
memory and gives up, but the other (non-leader) servers are still waiting for the 
leader to respond. The whole Solr Cloud cluster breaks from here - no more 
requests are served.

A couple of thoughts:
1, If the leader goes down, it should just go down, like dead down, so the other 
servers can run an election and choose a new leader. This at least avoids 
bringing down the whole cluster. Am I right?
2, Apparently we should not push too many documents to Solr at once - how do you guys 
handle this? Set a limit somewhere?

Thanks,
Tim






RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Tim Chen
Thanks Guys. Very very helpful.

I will probably look at consolidating the 4 Solr servers into 2 bigger/better servers 
- that gives more memory, and it cuts down the replicas the leader needs to manage.

Also, I may look into writing a script to monitor the Tomcat log and, if there is 
an OOM, kill Tomcat, then restart it. A bit dirty, but it may work for the short term.

I don't know too much about how documents are indexed, and how to save memory on 
that side. I will probably work with a developer on this as well.

Many Thanks guys.

Cheers,
Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 5 August 2016 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

On 8/4/2016 8:14 PM, Tim Chen wrote:
> Couple of thoughts: 1, If Leader goes down, it should just go down,
> like dead down, so other servers can do the election and choose the
> new leader. This at least avoids bringing down the whole cluster. Am I
> right?

Supplementing what Erick told you:

When a typical Java program throws OutOfMemoryError, program behavior is 
completely unpredictable.  There are programming techniques that can be used so 
that behavior IS predictable, but writing that code can be challenging.

Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
option to execute a script when OutOfMemoryError happens.  This script kills 
Solr completely.  We are working on adding this capability when running on 
Windows.
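
If you want something similar on 4.10 under Tomcat, the JVM itself can be told to 
run a command on OutOfMemoryError.  A rough sketch (the script path and contents 
are assumptions, not something Solr ships for Tomcat):

  CATALINA_OPTS="$CATALINA_OPTS -XX:OnOutOfMemoryError=/opt/tomcat/bin/oom_kill.sh"

where oom_kill.sh simply does a kill -9 on the Tomcat process so that whatever 
supervises it (init script, monit, etc.) can restart it cleanly.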

> 2, Apparently we should not pushing too many documents to Solr, how do
> you guys handle this? Set a limit somewhere?

There are exactly two ways to deal with OOME problems: Increase the heap or 
reduce Solr's memory requirements.  The number of documents you push to Solr is 
unlikely to have a large effect on the amount of memory that Solr requires.  
Here's some information on this topic:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn





RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Tim Chen
Hi Erick, Shawn,

Thanks for following this up.

1,
For some reason, ramBufferSizeMB in our solrconfig.xml is not set to 100MB, but 
32MB.

In that case, considering we have 10G for the JVM, my understanding is that we should 
not run out of memory due to a large number of documents being added to Solr.

Just to make sure I understand it correctly: the documents added to Solr will 
be stored in an internal queue in Solr, and Solr will only use that 32MB (or 
99% of 32MB plus one extra document's memory) for indexing documents. The documents 
in the queue will be indexed one by one.
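
For reference, this is the setting in question in our solrconfig.xml, under the 
indexConfig section:

  <indexConfig>
    <ramBufferSizeMB>32</ramBufferSizeMB>
  </indexConfig>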

2,
Based on our Tomcat (Solr) access_log and the website's peak hours, the cluster 
failure is not likely to be because of _searching_ traffic. E.g. we can see 
many more Solr requests with the 'update' keyword, but the usual number of requests 
with the 'select' keyword.

3,
Now, this leads me to the only reason I can think of (you mentioned this 
earlier as well):
since each shard has 4 replicas in our setup, when a large number of 
documents are being added, the leader will create a lot of threads to send the 
documents to the other replica servers. All these threads are what consumed all 
the memory on the leader server, and that leads to the OOM.

If my assumption is right, the ways to try to fix this issue are to:
a) still limit the number of documents being added to Solr at a time
b) change to 2 replicas for each shard (a loss of data reliability, but..)
c) bump up server memory.

Am I going the right way? Any advice and suggestions are much appreciated!!

Also attached part of catalina.out OOM log for reference:

Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8983-exec-6861" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8983-exec-6671" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)

Many thanks,
Tim


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 6 August 2016 2:31 AM
To: solr-user
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of 
RAM consumed, when adding a doc if that limit is exceeded then the buffer is 
flushed.

So you can reduce that number, but it's default is 100M and if you're running 
that close to your limits I suspect you'd get, at best, a bit more runway 
before you hit the problem again.

NOTE: that number isn't an absolute limit, IIUC the algorithm is
> index a doc to the in-memory structures check if the limit is exceeded
> and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a 
ginormous doc your in-memory stuff might be significantly bigger.

Searching usually is the bigger RAM consumer, so when I say "a bit more runway" 
what I'm thinking about is that when you start _searching_ the data your memory 
requirements will continue to grow and you'll be back where you started.

RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-07 Thread Tim Chen
Sorry Erick, forgot to answer your question:

No, I didn't increase the maxWarmingSearchers. It is set to 2. I read somewhere that 
increasing this is a risk.

Just to make sure, you didn't mean the "autowarmCount" in the cache configuration?








-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 6 August 2016 2:31 AM
To: solr-user
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of 
RAM consumed, when adding a doc if that limit is exceeded then the buffer is 
flushed.

So you can reduce that number, but it's default is 100M and if you're running 
that close to your limits I suspect you'd get, at best, a bit more runway 
before you hit the problem again.

NOTE: that number isn't an absolute limit, IIUC the algorithm is
> index a doc to the in-memory structures check if the limit is exceeded
> and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a 
ginormous doc your in-memory stuff might be significantly bigger.

Searching usually is the bigger RAM consumer, so when I say "a bit more runway" 
what I'm thinking about is that when you start _searching_ the data your memory 
requirements will continue to grow and you'll be back where you started.

And just as a sanity check: You didn't perchance increase the 
maxWarmingSearchers parameter in solrconfig.xml, did you? If so, that's really 
a red flag.

Best,
Erick

On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
> Thanks Guys. Very very helpful.
>
> I will probably look at consolidate 4 Solr servers into 2 bigger/better 
> server - it gives more memory, and it cut down the replica the Leader needs 
> to manage.
>
> Also, I may look into write a script to monitor the tomcat log and if there 
> is OOM, kill tomcat, then restart it. A bit dirty, but may work for a short 
> term.
>
> I don't know too much about how documents indexed, and how to save memory 
> from that. Will probably work with a developer on this as well.
>
> Many Thanks guys.
>
> Cheers,
> Tim
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, 5 August 2016 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader
> out of memory
>
> On 8/4/2016 8:14 PM, Tim Chen wrote:
>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>> like dead down, so other servers can do the election and choose the
>> new leader. This at least avoids bringing down the whole cluster. Am
>> I right?
>
> Supplementing what Erick told you:
>
> When a typical Java program throws OutOfMemoryError, program behavior is 
> completely unpredictable.  There are programming techniques that can be used 
> so that behavior IS predictable, but writing that code can be challenging.
>
> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
> option to execute a script when OutOfMemoryError happens.  This script kills 
> Solr completely.  We are working on adding this capability when running on 
> Windows.
>
>> 2, Apparently we should not pushing too many documents to Solr, how
>> do you guys handle this? Set a limit somewhere?
>
> There are exactly two ways to deal with OOME problems: Increase the heap or 
> reduce Solr's memory requirements.  The number of documents you push to Solr 
> is unlikely to have a large effect on the amount of memory that Solr 
> requires.  Here's some information on this topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn
>
>
>




RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-09 Thread Tim Chen
Guys, (@Erick & @Shawn),

Thanks for the great suggestions!

I have increased Tomcat maxThreads from 200 to 10000 on our staging 
environment. So far so good.
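
For reference, the change was just the maxThreads attribute on the connector in 
Tomcat's conf/server.xml (the other attributes below are placeholders - keep 
whatever your server.xml already has):

  <Connector port="8983" protocol="HTTP/1.1" maxThreads="10000" connectionTimeout="20000" />

I still need to check the OS per-user process limit (ulimit -u) as well.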

I will perform some more indexing test and see how it goes.

Many thanks,
Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, 8 August 2016 11:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

On 8/7/2016 6:53 PM, Tim Chen wrote:
> Exception in thread "http-bio-8983-exec-6571" java.lang.OutOfMemoryError: 
> unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
> at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at 
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> at java.lang.Thread.run(Thread.java:745)

I find myself chasing Erick once again. :)  Supplementing what he told you:

There are two things that might be happening here.

1) The Tomcat setting "maxThreads" may be limiting the number of threads.
This defaults to 200, and should be increased to 10000.  The specific error 
doesn't sound like an application limit, though -- it acts more like Java 
itself can't create the thread.  If you have already adjusted maxThreads, then 
it's more likely to be the second option:

2) The operating system may be imposing a limit on the number of 
processes/threads a user is allowed to start.  On Linux systems, this is 
typically 1024.  For other operating systems, I am not sure what the default 
limit is.

Thanks,
Shawn





Solr 4.10 Joins: Slow performance with millions of documents

2016-08-14 Thread Tim Frey
Hi there.  I'm trying to fix a performance problem I have with queries that
use Solr's Join feature.  The query is intended to find all Job
Applications that have an Interview in a particular state.  There are 20
million Job Applications and around 7 million Interviews, with 1 million
Interviews in the state I'm looking for.  With all other filters applied,
the total result set is around 5000 documents.  The query takes around 10
seconds.

After reading up on how Joins are essentially just subqueries, I understand
why my original approach would be slow.  However, when I add another
restriction for the "inner query" to a single Job Application the entire
query still takes around 5 seconds.  In this case, the inner query matches
2 documents and the total result set size is 1 document (as expected.)
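
For context, the query is shaped roughly like this (the field names are guesses
standing in for our Sunspot-style dynamic fields, not the literal ones):

  q=type_s:JobApplication
  fq={!join from=job_application_id_i to=id_i}(+type_s:Interview +interview_state_s:scheduled)

i.e. the {!join} has to walk the ~7 million Interview docs to produce the set of
application ids, which is where I assume most of the time goes.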

Here's the debug output:
https://gist.github.com/tfrey7/50cd92c98e767ec612cc98bf430b9931

I'm using Solr 4.10.  All documents are in the same index.  The ID columns
are dynamic integer fields (because we're using the Sunspot ruby library,
exactly like:
https://github.com/sunspot/sunspot/blob/master/sunspot_solr/solr/solr/configsets/sunspot/conf/schema.xml#L179
)

Is there something obviously wrong with the query that I'm making?  Can
query-time Joins ever work for a scenario like this?

Thanks!


Configuration options/concerns for multiple Solr versions

2016-10-14 Thread Tim Parker
We have a ColdFusion-based CMS product which can interface with Solr for 
search functionality.  ColdFusion ships with an ancient version of Solr 
(old enough that it crashes when the search criteria includes a leading 
wildcard), so to get current Solr functionality... we have to interface 
directly to Solr.  We don't bundle Solr with our product, so it's 
important to be able to interface to as wide a variety of Solr releases 
as possible.


Our initial implementation used Solr 4.10.2, and includes customizations 
in schema.xml and solrconfig.xml.  We have since made some minor changes 
for compatibility with Solr 5.x, but our goal is to minimize the 
version-specific information so we can create new cores without having 
to worry about what Solr version is in play.


Aside from keeping the luceneMatchVersion setting in agreement with the 
actual Solr release in use, what other configuration and/or schema 
changes do we need to worry about?
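
For reference, the setting in question sits near the top of solrconfig.xml, e.g.:

  <luceneMatchVersion>4.10.2</luceneMatchVersion>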


[enhancement request: set luceneMatchVersion to the running version if 
it's not found in solrconfig.xml - and/or allow a setting of 'current' 
so this doesn't have to be touched without specific reason to do so - 
any thoughts?  Am I missing something?]


--
Tim Parker
Senior Engineer
PaperThin, Inc.
300 Congress Street, Suite 303
Quincy, MA 02169
Ph: 617.471.4440 x203
CommonSpot helps organizations improve engagement across the web, mobile 
devices, and social media outlets to achieve better marketing results.  Find 
out what's new in CommonSpot at www.paperthin.com.



child doc filter

2016-11-03 Thread Tim Williams
I'm using the BlockJoinQuery to query child docs and return the
parent.  I'd like to have the equivalent of a filter that applies to
child docs and I don't see a way to do that with the BlockJoin stuffs.
It looks like I could modify it to accept some childFilter param and
add a QueryWrapperFilter right after the child query is created[1] but
before I did that, I wanted to see if there's a built-in way to
achieve the same behavior?
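
For what it's worth, the closest I can get today seems to be pushing the
restriction into the child query itself, e.g. (field names are made up):

  q={!parent which="doc_type_s:parent"}(+child_text_t:foo +child_status_s:active)

which behaves like a child filter, but the "filter" part can't be cached
independently of the query - hence the question.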

Thanks,
--tim

[1] - 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java#L69


Re: Spatial Search based on the amount of docs, not the distance

2017-06-22 Thread Tim Casey
deniz,

I was going to add something here.  The reason what you want is probably
hard to do is that you are asking Solr, which stores documents, to
return documents using an attribute of document pairs.  As only a thought
exercise, if you stored record pairs as a single document, you could
probably query it directly.  That is, if you have d1 and d2 and you are
querying around d1 and ordering by distance, then you could get this
directly from a document representing a record pair.  I don't think this is
practical, because it is an n^2 store.

Since the n^2 problem is there, people are going to suggest some heuristic
which avoids this problem.  What Erick is suggesting is down this path.
Query around a point and sort by distance taking the top K results.  The
result is taking a linear slice of the n^2 distance attribute.
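
Concretely, that heuristic is just the stock geofilt-plus-geodist pattern,
something like (the field name and point are placeholders):

  q=*:*&sfield=location_p&pt=45.15,-93.85&fq={!geofilt d=5}&sort=geodist() asc&rows=100

and then grow d if you get back fewer than the K rows you wanted.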

tim



On Wed, Jun 21, 2017 at 7:50 PM, Erick Erickson 
wrote:

> Would it serve to sort by distance? True, if you matched a zillion
> documents within a 1km radius you'd still perform the distance calcs, but
> the result would be a manageable number.
>
> I have to ask "Why to you care?". Is this an efficiency question (i.e. you
> want to keep Solr from having to do expensive work) or is it a question of
> having to get hits at all? It's at least possible that the solution for one
> is not the solution for the other.
>
> Best,
> Erick
>
> On Wed, Jun 21, 2017 at 5:32 PM, deniz  wrote:
>
> > it is for sure possible to use d value for limiting the distance,
> however,
> > it
> > might not be very efficient, as some of the coords may not have any docs
> > around for a large value of d... so it is hard to determine a default
> value
> > for d.
> >
> > though it sounds like havinga default d and gradual increments on its
> value
> > might be a work around for top K results...
> >
> >
> >
> >
> >
> > -
> > Zeki ama calismiyor... Calissa yapar...
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Spatial-Search-based-on-the-amount-of-docs-not-the-distance-
> > tp4342108p4342258.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Arabic words search in solr

2017-08-02 Thread Tim Casey
There should be a way to use a phrasal query for the specific names.
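
For example, requiring both terms or a phrase (using the field name from the
thread below):

  q=bizNameAr:"شرطة ازكي"~2
  or
  q=bizNameAr:(شرطة AND ازكي)

should only match documents containing both words, rather than either one.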

On Wed, Aug 2, 2017 at 2:15 PM, Phil Scadden  wrote:

> Hopefully changing to default AND solves your problem. If so, I would be
> quite interested in what your index config looks like in the end. I also
> have upcoming need to index Arabic words.
>
> -Original Message-
> From: mohanmca01 [mailto:mohanmc...@gmail.com]
> Sent: Thursday, 3 August 2017 12:58 a.m.
> To: solr-user@lucene.apache.org
> Subject: RE: Arabic words search in solr
>
> Hi Phil Scadden,
>
>  Thank you for your reply,
>
> we tried your suggested solution by removing hyphen while indexing, but it
> was getting wrong results. i was searching for "شرطة ازكي" and it was
> showing me the result that am looking for, plus irrelevant result which
> either have the first or second word that i have typed while searching.
>
> First word: شرطة
> Second Word: ازكي
>
> results that we are getting:
>
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 3,
> "params": {
>   "indent": "true",
>   "q": "bizNameAr:(شرطة ازكي)",
>   "_": "1501678260335",
>   "wt": "json"
> }
>   },
>   "response": {
> "numFound": 444,
> "start": 0,
> "docs": [
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -
> - مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>   {
> "id": "13937",
> "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
> "_version_": 157462113219720
>   },
>   {
> "id": "15914",
> "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
> "_version_": 1574621132344000500
>   },
>   {
> "id": "20639",
> "bizNameAr": "سحائب ازكي للتجارة",
> "_version_": 1574621132574687200
>   },
>   {
> "id": "25108",
> "bizNameAr": "المستشفيات -  - مستشفى إزكي",
> "_version_": 1574621132737216500
>   },
>   {
> "id": "27629",
> "bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
> "_version_": 1574621132833685500
>   },
>   {
> "id": "36351",
> "bizNameAr": "طوارئ الكهرباء - إزكي",
> "_version_": 157462113318391
>   },
>   {
> "id": "61235",
> "bizNameAr": "اضواء ازكي للتجارة",
> "_version_": 1574621133785792500
>   },
>   {
> "id": "66821",
> "bizNameAr": "أطلال إزكي للتجارة",
> "_version_": 1574621133915816000
>   },
>   {
> "id": "67011",
> "bizNameAr": "بنك ظفار - فرع ازكي",
> "_version_": 1574621133920010200
>   }
> ]
>   }
> }
>
> Actually  we expecting the below results only since it has both the words
> that we typed while searching:
>
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -
> - مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>
>
> Configuration:
>
> In schema.xml we configured as below:
>
> 
>
>
>  positionIncrementGap="100">
>   
> 
>  words="lang/stopwords_ar.txt" />
> 
> 
> 
> 
>  replacement="ئ"/>
>  replacement=""/>
>   
> 
>
>
> Thanks,
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Arabic-words-search-in-solr-tp4317733p4348774.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr query help

2017-08-18 Thread Tim Casey
You can add a ~3 to the query to allow the order to be reversed, but you
will get extra hits.  Maybe it is a ~4, I can never remember with phrases and
reversals.  I usually just try it.

Alternatively, you can create a custom query field for what you need from
dates.  For example, if you want to search by queries like "fourth
tuesday", you need to have "tuesday" in a query, and it is better to have " 4
tuesday " as part of the field.

Instead of a phrase query, you do +2017 +(04 03) +(01 02 03 04 05 06 07 08
09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31),
which does all the days in march and apr.  A more complicated nested query
would do more complicated date ranges.
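
If the field can be (re)indexed as a real date type instead of text, the range
part becomes a plain filter and the display format is purely a client-side
concern - a sketch with a made-up TrieDateField called event_date_dt:

  fq=event_date_dt:[2017-03-15T00:00:00Z TO 2017-05-15T23:59:59Z]

and then reformat the stored value to 03/15/2017 when rendering results.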

I don't know if there is a way to get repeating date range queries, like
the fourth tuesday for all months in a year.  The date support is usually
about querying a specified range at a time.

tim

On Fri, Aug 18, 2017 at 11:19 AM, Webster Homer 
wrote:

> What field types are you using for your dates?
> Have a look at:
> https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>
> On Thu, Aug 17, 2017 at 10:08 AM, Nawab Zada Asad Iqbal 
> wrote:
>
> > Hi Krishna
> >
> > I haven't used date range queries myself. But if Solr only supports a
> > particular date format, you can write a thin client for queries, which
> will
> > convert the date to solr's format and query solr.
> >
> > Nawab
> >
> > On Thu, Aug 17, 2017 at 7:36 AM, chiru s  wrote:
> >
> > > Hello guys
> > >
> > > I am working on Apache solr and I am stuck with a use case.
> > >
> > >
> > > The input data will be in the documents like 2017/03/15 in 1st
> document,
> > >
> > > 2017/04/15 in 2nd doc,
> > >
> > > 2017/05/15 in 3rd doc,
> > >
> > > 2017/06/15 in 4th doc so on
> > >
> > > But while fetching the data it should fetch like 03/15/2017 for the
> first
> > > doc and so on.
> > >
> > > My requirement is like this ..
> > >
> > >
> > > The data is like above and when I do an fq with name:[2017/03/15 TO
> > > 2017/05/15] it fetches me the 1st three documents.. but the need the
> data
> > > as 03/15/2017 instead of 2017/03/15.
> > >
> > >
> > > I tried solr.pattetnReplaceCharFilterFactory but it doesn't seem
> > working..
> > >
> > > Can you please help on the above.
> > >
> > >
> > > Thanks in advance
> > >
> > >
> > > Krishna...
> > >
> >
>
> --
>
>
>


Re: Java profiler?

2017-12-06 Thread Tim Casey
I really like Profiler.  It takes a little bit of set up, but it works.

tim

On Wed, Dec 6, 2017 at 2:04 AM, Peter Sturge  wrote:

> Hi,
> We'be been using JPRofiler (www.ej-technologies.com) for years now.
> Without a doubt, the most comprehensive and useful profiler for java.
> Works very well, supports remote profiling and includes some very neat heap
> walking/gc profiling.
> Peter
>
>
> On Tue, Dec 5, 2017 at 3:21 PM, Walter Underwood 
> wrote:
>
> > Anybody have a favorite profiler to use with Solr? I’ve been asked to
> look
> > at why out queries are slow on a detail level.
> >
> > Personally, I think they are slow because they are so long, up to 40
> terms.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >
>


Re: Howto search for § character

2017-12-07 Thread Tim Casey
At my last company we ended up writing a custom analyzer to handle
punctuation.  But this was for Lucene 2 or 3.  That analyzer was carried
forward as we upgraded and was used for all human-derived text.

Although there are now way better analyzers and way better ways to hook
them up, as noted above by Erick, we really cared about how this was done
and all of the work put into the analyzer paid off.

I would expect there to be an analyzer which would maintain punctuation
tokens for search.  One of the issues which comes up is if you want
multiple-runs of punctuation to be a single token or separate tokens.  So
what happens to "§!"  or "§?" or "?§", and in the case of things like
text/email what happens to "§".
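
As a starting point, even a simple whitespace-based chain will keep § in the
token stream (a minimal untested sketch, not what we actually shipped):

  <fieldType name="text_keep_punct" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Queries like _text_:§45 or _text_:§* then stop losing the character, at the cost
of all the other cleanup a standard chain does.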

In any event, my 2 pence worth

tim

On Thu, Dec 7, 2017 at 10:00 AM, Shawn Heisey  wrote:

> On 12/7/2017 9:37 AM, Bernd Schmidt wrote:
> > Indeed, I saw in the analysis tab of the solr admin that the § char will
> be removed when using type text_general.
> > But in this use case we want to make a full text search like
> "_text_:§45" or "_text_:§*" to find words starting with §.
> > We need a text field here, not a string field!
> > What is your recommended way to deal with it?
> > Is it possible to remove the word break behaviour for the  § char?
> > Or is the best way to encode all § chars when indexing and searching?
>
> This character is classified by Unicode as punctuation:
>
> http://www.fileformat.info/info/unicode/char/00a7/index.htm
>
> Almost any example field type for full-text search that you're likely to
> encounter is going to be designed to split on punctuation and remove it
> from the token stream.  That's one of the most common things that
> full-text search engines do.
>
> You're going to need to design a new analysis chain that *doesn't* do
> this, apply the fieldType containing that analysis to your field,
> restart/reload, and reindex.
>
> Designing analysis chains is an art form, and tends to be one of the
> hardest parts of setting up a production Solr install.  It took me at
> least a month of almost constant work to settle on the schema design for
> the indexes that I maintain.  All of the "solr.TextField" types in my
> schema are completely custom -- none of the analysis chains in Solr
> examples are in that schema.
>
> Thanks,
> Shawn
>
>


Re: Question about best way to architect a Solr application with many data sources

2017-02-22 Thread Tim Casey
I would possibly extend this a bit further.  There is the source, then the
'normalized' version of the data, then the indexed version.
Sometimes you realize you missed something in the normalized view and you
have to go back to the actual source.

This becomes more likely as the number of data sources grows.  I would
expect the "DB" version of the data to be the normalized view.
It is also possible that the DB holds the raw bytes of the source, which are
then transformed into a normalized view.  Indexing always happens from
the normalized view.  In this scheme, there is frequently a way to mark
what failed normalization so you can go back and recapture the data for a
re-index.

Also, if you are dealing with timely data, being able to reindex helps
remove stale information from the search index.  In the pipeline of
captured source -> normalized -> analyzed -> information, where "analyzed" is
the indexed version here, what you do with the data over a year or more becomes part of
the thinking.
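
As a concrete (entirely made-up) example of a normalized record, one JSON object
per listing works well and is trivial to replay into Solr later:

  {"id":"craigslist-12345","source_s":"craigslist","make_s":"Toyota","model_s":"Corolla","year_i":2012,"price_f":8500.0,"url_s":"http://example.org/listing/12345","fetched_dt":"2017-02-21T00:00:00Z"}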



On Tue, Feb 21, 2017 at 8:24 PM, Walter Underwood 
wrote:

> Reindexing is exactly why you want the Single Source of Truth to be in a
> repository outside of Solr.
>
> For our slowly-changing data sets, we have an intermediate JSONL batch.
> That is created from the source repositories and saved in Amazon S3. Then
> we load it into Solr nightly. That allows us to reload whenever we need to,
> like loading prod data in test or moving search to a different Amazon
> region.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Feb 21, 2017, at 7:34 PM, Erick Erickson 
> wrote:
> >
> > Dave:
> >
> > Oh, I agree that a DB is a perfectly valid place to store the data and
> > you're absolutely right that it allows better interaction than flat
> > files; you can ask questions of an RDBMS that you can't easily ask the
> > disk ;). Storing to disk is an alternative if you're unwilling to deal
> > with a DB is all.
> >
> > But the main point is you'll change your schema sometime and have to
> > re-index. Having the data you're indexing stored locally in whatever
> > form will allow much faster turn-around rather than re-crawling. Of
> > course it'll result in out of date data so you'll have to refresh
> > somehow sometime.
> >
> > Erick
> >
> > On Tue, Feb 21, 2017 at 6:07 PM, Dave 
> wrote:
> >> Ha I think I went to one of your training seminars in NYC maybe 4 years
> ago Eric. I'm going to have to respectfully disagree about the rdbms.  It's
> such a well know data format that you could hire a high school programmer
> to help with the db end if you knew how to flatten it to solr. Besides it's
> easy to visualize and interact with the data before it goes to solr. A
> Json/Nosql format would work just as well, but I really think a database
> has its place in a scenario like this
> >>
> >>> On Feb 21, 2017, at 8:20 PM, Erick Erickson 
> wrote:
> >>>
> >>> I'll add that I _guarantee_ you'll want to re-index the data as you
> >>> change your schema
> >>> and the like. You'll be able to do that much more quickly if the data
> >>> is stored locally somehow.
> >>>
> >>> A RDBMS is not necessary however. You could simply store the data on
> >>> disk in some format
> >>> you could re-read and send to Solr.
> >>>
> >>> Best,
> >>> Erick
> >>>
>  On Tue, Feb 21, 2017 at 5:17 PM, Dave 
> wrote:
>  B is a better option long term. Solr is meant for retrieving flat
> data, fast, not hierarchical. That's what a database is for and trust me
> you would rather have a real database on the end point.  Each tool has a
> purpose, solr can never replace a relational database, and a relational
> database could not replace solr. Start with the slow model (database) for
> control/display and enhance with the fast model (solr) for retrieval/search
> 
> 
> 
> > On Feb 21, 2017, at 7:57 PM, Robert Hume  wrote:
> >
> > To learn how to properly use Solr, I'm building a little experimental
> > project with it to search for used car listings.
> >
> > Car listings appear on a variety of different places ... central
> places
> > Craigslist and also many many individual Used Car dealership
> websites.
> >
> > I am wondering, should I:
> >
> > (a) deploy a Solr search engine and build individual indexers for
> every
> > type of web site I want to find listings on?
> >
> > or
> >
> > (b) build my own database to store car listings, and then build
> services
> > that scrape data from different sites and feed entries into the
> database;
> > then point my Solr search to my database, one simple source of
> listings?
> >
> > My concerns are:
> >
> > With (a) ... I have to be smart enough to understand all those
> different
> > data sources and remove/update listings when they change; while this
> be
> > harder to do with custom Solr indexers than writing something from
> scratch?
> >
> > With (b) ... 

Re: query rewriting

2017-03-07 Thread Tim Casey
Hendrik,

I would recommend sticking as close as possible to the query syntax as it is in
Lucene.

However, if you do your own query parse build up, you can use a Lucene
Query object.  I don't know where this bolts into solr, exactly.  But I
have done this extensively with lucene.  The reason was to combine two
distinct portions of content into one unified query language.  Also, we did
some remapping of field names into a normalized user experience.  This
meant the field names could be exposed in the UI, independent of the
metadata of the underlying content.  For what I did, the source content
could be vastly different from one index to another.  Usually this is not
the case.

You end up building OR/AND query phrases, then passing them off to the query
engine.  If you do this, you can also optimize and add boost terms under
specific circumstances.  If there is a set of required terms/phrases, then
you can add terms to boost, or remove non-required terms, without any loss to
the overall result set.  This changes the order in which items are
returned, so it may impact the user's perception of recall, but is possible
for specific reasons.
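
As a rough illustration of the field-remapping part, written against the pre-6.x
Lucene Query API (6.x would use BooleanQuery.Builder; this is a sketch, not the
code I shipped):

  import java.util.List;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;

  // Recursively copy a query, renaming one field; other Query types pass through untouched.
  public static Query remapField(Query q, String from, String to) {
    if (q instanceof TermQuery) {
      Term t = ((TermQuery) q).getTerm();
      return t.field().equals(from) ? new TermQuery(new Term(to, t.text())) : q;
    }
    if (q instanceof BooleanQuery) {
      BooleanQuery out = new BooleanQuery();
      for (BooleanClause c : ((BooleanQuery) q).clauses()) {
        out.add(remapField(c.getQuery(), from, to), c.getOccur());
      }
      return out;
    }
    return q;
  }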

tim

On Sun, Mar 5, 2017 at 11:40 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> I would like to dynamically modify a query, for example by replacing a
> field name with a different one. Given how complex the query parsing is it
> does look error prone to duplicate that so I would like to work on the
> Lucene Query object model instead. The subclasses of Query look relatively
> simple and easy to rewrite on the Lucene side but on the Solr side this
> does not seem to be the case. Any suggestions on how this could be done?
>
> thanks,
> Hendrik
>


Re: model building

2017-03-21 Thread Tim Casey
Joe,

To do this correctly, soundly, you will need to sample the data and mark
them as threatening or neutral.  You can probably expand on this quite a
bit, but that would be a good start.  You can then draw another set of
samples and see how you did.  You use one to train and one to validate.

What you are doing is probably just noise, from a model point of view, and
it will probably not make too much difference how you index/query/model
through the noise.

I don't mean this critically, just plainly.  Effectively the less
mathematically correctly you do this process, the more anecdotal the result.

tim


On Mon, Mar 20, 2017 at 4:42 PM, Joel Bernstein  wrote:

> I've only tested with the training data in it's own collection, but it was
> designed for multiple training sets in the same collection.
>
> I suspect you're training set is too small to get a reliable model from.
> The training sets we tested with were considerably larger.
>
> All the idfs_ds values being the same seems odd though. The idfs_ds in
> particular were designed to be accurate when there are multiple training
> sets in the same collection.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Mar 20, 2017 at 5:41 PM, Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
> > If I put the training data into its own collection and use q="*:*", then
> > it works correctly.  Is that a requirement?
> > Thank you.
> >
> > -Joe
> >
> >
> >
> > On 3/20/2017 3:47 PM, Joe Obernberger wrote:
> >
> >> I'm trying to build a model using tweets.  I've manually tagged 30
> tweets
> >> as threatening, and 50 random tweets as non-threatening.  When I build
> the
> >> mode with:
> >>
> >> update(models2, batchSize="50",
> >>  train(UNCLASS,
> >>   features(UNCLASS,
> >>  q="ProfileID:PROFCLUST1",
> >>  featureSet="threatFeatures3",
> >>  field="ClusterText",
> >>  outcome="out_i",
> >>  positiveLabel=1,
> >>  numTerms=250),
> >>   q="ProfileID:PROFCLUST1",
> >>   name="threatModel3",
> >>   field="ClusterText",
> >>   outcome="out_i",
> >>   maxIterations="100"))
> >>
> >> It appears to work, but all the idfs_ds values are identical. The
> >> terms_ss values look reasonable, but nearly all the weights_ds are 1.0.
> >> For out_i it is either -1 for non-threatening tweets, and +1 for
> >> threatening tweets.  I'm trying to follow along with Joel Bernstein's
> >> excellent post here:
> >> http://joelsolr.blogspot.com/2017/01/deploying-ai-alerting-s
> >> ystem-with-solrs.html
> >>
> >> Tips?
> >>
> >> Thank you!
> >>
> >> -Joe
> >>
> >>
> >
>


RE: Limit on # of collections -SolrCloud

2014-03-21 Thread Tim Potter
Hi Chris,

Thanks for the link to Patrick's github (looks like some good stuff in there).

One thing to try (and this isn't the final word on this, but is helpful) is to 
go into the tree view in the Cloud panel and find out which node is hosting the 
Overseer (/overseer_elect/leader). When restarting your cluster, make sure you 
restart this node last. We've seen instances where you end up restarting the 
overseer node each time as you restart the cluster, which causes all kinds of 
craziness. I'll bet you'll see better results by doing this, but let us know 
either way.

Also, at 600 cores per machine, I had to reduce the JVM's thread stack size to 
-Xss256k as there are an extreme number of threads allocated when starting up 
that many cores.
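
For example (host names and paths are assumptions - adjust to your install):

  # see which node currently holds the Overseer
  zkCli.sh -server zkhost:2181 get /overseer_elect/leader

  # in Tomcat's bin/setenv.sh, shrink the per-thread stack
  CATALINA_OPTS="$CATALINA_OPTS -Xss256k"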

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Chris W 
Sent: Thursday, March 20, 2014 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Limit on # of collections -SolrCloud

The replication factor is two. I have equally sharded all collections
across all nodes. We have a 6 node cluster setup. 300* 6 shards and 2
replicas per shard. I have almost 600 cores per machine

Also one fact is that my zk timeout is in the order of 2-3 minutes. I see
zk responses very slow and a lot of outstanding requests (found that out
thanks to https://github.com/phunt/)




On Thu, Mar 20, 2014 at 2:53 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hours sounds too long indeed.  We recently had a client with several
> thousand collections, but restart wasn't taking hours...
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Mar 20, 2014 5:49 PM, "Erick Erickson"  wrote:
>
> > How many total replicas are we talking here?
> > As in how many shards and, for each shard,
> > how many replicas? I'm not asking for a long list
> > here, just if you have a bazillion replicas in aggregate.
> >
> > Hours is surprising.
> >
> > Best,
> > Erick
> >
> > On Thu, Mar 20, 2014 at 2:17 PM, Chris W 
> wrote:
> > > Thanks, Shalin. Making clusterstate.json on a collection basis sounds
> > > awesome.
> > >
> > >  I am not having problems with #2 . #3 is a major time hog in my
> > > environment. I have over 300 +collections and restarting the entire
> > cluster
> > > takes in the order of hours.  (2-3 hour). Can you explain more about
> the
> > > leaderVoteWait setting?
> > >
> > >
> > >
> > >
> > > On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar <
> > > shalinman...@gmail.com> wrote:
> > >
> > >> There are no arbitrary limits on the number of collections but yes
> > >> there are practical limits. For example, the cluster state can become
> > >> a bottleneck. There is a lot of work happening on finding and
> > >> addressing these problems. See
> > >> https://issues.apache.org/jira/browse/SOLR-5381
> > >>
> > >> Boot up time is because of:
> > >> 1) Core discovery, schema/config parsing etc
> > >> 2) Transaction log replay on startup
> > >> 3) Wait time for enough replicas to become available before leader
> > >> election happens
> > >>
> > >> You can't do much about 1 right now I think. For #2, you can keep your
> > >> transaction logs smaller by a hard commit before shutdown. For #3
> > >> there is a leaderVoteWait settings but I'd rather not touch that
> > >> unless it becomes a problem.
> > >>
> > >> On Fri, Mar 21, 2014 at 1:39 AM, Chris W 
> > wrote:
> > >> > Hi there
> > >> >
> > >> >  Is there a limit on the # of collections solrcloud can support? Can
> > >> > zk/solrcloud handle 1000s of collections?
> > >> >
> > >> > Also i see that the bootup time of solrcloud increases with increase
> > in #
> > >> > of cores. I do not have any expensive warm up queries. How do i
> > speedup
> > >> > solr startup?
> > >> >
> > >> > --
> > >> > Best
> > >> > --
> > >> > C
> > >>
> > >>
> > >>
> > >> --
> > >> Regards,
> > >> Shalin Shekhar Mangar.
> > >>
> > >
> > >
> > >
> > > --
> > > Best
> > > --
> > > C
> >
>



--
Best
--
C

Solr Cloud Shards and Replica not reviving after restarting

2014-05-20 Thread Tim Burner
Hi Everyone,

I have installed Solr Cloud 4.6.2 with external ZooKeeper and Tomcat,
having 3 shards with 2 replicas each. I tried indexing some documents, which
went fine.

After that I restarted my Tomcat, and now the shards are not coming up;
it fails with a bunch of exceptions. The first exception was "no servers
hosting shard:"

All the replicas and the leader are down and not responding; it's even giving:

RecoveryStrategy Error while trying to recover.
core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
Server refused connection at: http://192.168.2.183:9090/solr

It would be great if you could help me solve this issue. Expert advice
needed.

Thanks in Advance!


Vague Behavior while setting Solr Cloud

2014-05-20 Thread Tim Burner
Hi Everyone,

I am trying to setup Solr Cloud referring to the blog
http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html

If I complete the setup in one go, then it seems to go fine.

When the setup is complete and I try to restart Solr by restarting the
Tomcat instance, it does not deploy, and moreover the shards and replicas
do not come up.

Urgent call, let me know if you know anything!

Thanks in Advance!


Re: Solr Cloud Shards and Replica not reviving after restarting

2014-05-20 Thread Tim Burner
Thanks Erick,

I much appreciate your help. I got it fixed - actually there were some
background processes already running for Tomcat which hadn't been stopped by the
time I faced these issues.

Thanks again!


On Wed, May 21, 2014 at 8:25 AM, Erick Erickson wrote:

> First thing I'd look at is the log on the server. It's possible that
> you've changed the configuration such that Solr can't start. Shot in
> the dark, but that's where I'd start looking.
>
> Best,
> Erick
>
> On Tue, May 20, 2014 at 4:45 AM, Tim Burner  wrote:
> > Hi Everyone,
> >
> > I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat,
> > having 3 shards with 2 replica each. I tried indexing some documents
> which
> > went easy.
> >
> > After which I restarted my Tomcat, and now the Shards are not getting up,
> > its coming up with bunch of Exceptions. First exception was "*no servers
> > hosting shard:"*
> >
> > All the replica and leader are down and not responding, its even giving
> >
> > RecoveryStrategy Error while trying to recover.
> >
> core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
> > Server refused connection at: http://192.168.2.183:9090/solr
> >
> > It would be great if you can help me out solving this issue. Expert
> advice
> > needed.
> >
> > Thanks in Advance!
>


Re: Vague Behavior while setting Solr Cloud

2014-05-20 Thread Tim Burner
Thanks Shawn,

I much appreciate your help. I got it fixed - actually there were some
background processes already running for Tomcat which hadn't been stopped by the
time I faced these issues.

Thanks again!
Tim


On Tue, May 20, 2014 at 11:33 PM, Shawn Heisey  wrote:

> On 5/20/2014 7:10 AM, Tim Burner wrote:
> > I am trying to setup Solr Cloud referring to the blog
> > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
> >
> > if I complete the set in one go, then it seems to be going fine.
> >
> > when the setup is complete and I am trying to restart Solr by restarted
> > Tomcat instance, it does not deploy and moreover the shards and replicas
> > are not up.
>
> You've given us nearly zero information about what the problem is.  All
> we know right now is that you restart tomcat and Solr doesn't deploy.
> See this wiki page:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> Getting specific, we'll need tomcat logs, Solr logs, versions of
> everything.  We might also need your config and schema, depending on
> what the other information reveals.
>
> Thanks,
> Shawn
>
>


Indexing getting failed after some millions of documents

2014-05-20 Thread Tim Burner
Hi Everyone,

I have installed Solr-4.6 Cloud with external Zookeeper-3.4.5 and Tomcat-7,
the configuration is as mentioned below.

A single-machine cluster setup with 3 shards and 2 replicas per shard, deployed on 3
Tomcats with 3 ZooKeepers.

Everything is installed fine. I start indexing, and when I reach some
millions of documents (~1.6M) the indexing stops with "#503 Service
Unavailable", and the Cloud dashboard log says:

*"ERROR DistributedUpdateProcessor ClusterState says we are the leader,​
but locally we don't think so"*


*"ERROR SolrCore org.apache.solr.common.SolrException: ClusterState says we
are the leader (http://host:port1/solr/recollection_shard1_replica1),​ but
locally we don't think so. Request came from
http://host:port2/solr/recollection_shard2_replica1/"*


*"ERROR ZkController Error registering
SolrCore:org.apache.solr.common.SolrException: Error getting leader from zk
for shard shard2"*

Any suggestions/advice would be appreciated!

Thanks!
Tim


Re: Indexing getting failed after some millions of documents

2014-05-22 Thread Tim Burner
Hi Eric,

I am running the code from Eclipse with the default heap size of 384MB and
indexing using the Solr SimplePostTool, posting XML files through HTTP requests.
I don't feel this is a heap concern, otherwise the program would just have made my
process slow; rather, this is observed every 2-3 hours of indexing,
after which some of the nodes go down.

I personally feel this may be due to leader re-election, because of the
last exception traced in my Cloud UI log (mentioned below). A couple of
questions strike me:

1) Is the leader election not getting over in the zookeeper alloted time.
2) Do I need to increase the zookeerTimeOut param with some greater value
from what its been set currently.
3) Can't we manually elect the Leader and let the election happens if it
goes down.
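
A minimal sketch of where such a timeout lives, assuming the legacy (pre-5.0)
solr.xml format; the 30-second value is only an illustration, not something
recommended in this thread:

<solr persistent="true">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${hostPort:8080}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <!-- core entries are managed here by Solr -->
  </cores>
</solr>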

ERROR StreamingSolrServers error

org.apache.solr.common.SolrException: Service Unavailable



request: http://host:port2
/solr/recollection_shard3_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fhost%3Aport1%2Fsolr%2Frecollection_shard1_replica1%2F&wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

14:01:19 ERROR SolrCore org.apache.solr.common.SolrException: No registered
leader was found, collection:recollection slice:shard3

org.apache.solr.common.SolrException: No registered leader was found,
collection:recollection slice:shard3
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
at
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
at
org.apache.solr.update.SolrCmdDistributor$RetryNode.checkRetry(SolrCmdDistributor.java:351)
at
org.apache.solr.update.SolrCmdDistributor.doRetriesIfNeeded(SolrCmdDistributor.java:78)
at
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:61)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:499)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1288)
at
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1002)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Looking forward for your response.

Thanks,
Tim


On Wed, May 21, 2014 at 8:34 PM, Erick Erickson wrote:

> How much memory have you allocated the JVMs? Also, what's does the
> Solr log show on the machine that isn't coming up? Sounds like the
> node went down and perhaps went into recovery
>
> And how are you indexing?
>
> Best,
> Erick
>
> On Tue, May 20, 2014 at 11:54 PM, Tim Burner 
> wrote:
> > Hi Everyone,
> >
> > I have installed Solr-4.6 Cloud with external Zookeeper-3.4.5 and
> Tomcat-7,
> > the configuration is as mentioned below.
> >
> > Single Machine Cluster Setup with 3 shards and 2 Replica deployed on 3
> > Tomcats with 3 Zookeeper.
> >
> > Everything is installed good and fine, I start with the index 

fq caching question

2013-10-14 Thread Tim Vaillancourt

Hey guys,

Sorry for such a simple question, but I am curious as to the differences 
in caching between a "combined" filter query, and many separate filter 
queries.


Here are 2 example queries, one with combined fq, one separate:

1) "/select?q=*:*&fq=type:bid&fq=user_id:3"
2) "/select?q=*:*&fq=(type:bid%20AND%20user_id:3)"

For query #1: am I correct that the first query will keep 2 independent 
entries in the filterCache for type:bid and user_id:3?
For query #2: is it correct that the 2nd query will keep 1 entry in the 
filterCache that satisfies all conditions?


Lastly, is it a fair statement that under general query patterns, many 
separate filter queries are more cacheable than 1 combined one? Eg, if I 
performed query #2 (now in the filterCache) and then changed the user_id, 
nothing about my new query is cacheable, correct (but if I had used 2 
separate filter queries then 1 of the 2 would still be cached)?
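
To illustrate what I mean (the changed user_id value is hypothetical):

/select?q=*:*&fq=type:bid&fq=user_id:4          <- fq=type:bid can still be served from the filterCache
/select?q=*:*&fq=(type:bid%20AND%20user_id:4)   <- the whole combined fq is a brand-new cache entry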


Cheers,

Tim Vaillancourt


Re: fq caching question

2013-10-14 Thread Tim Vaillancourt

Thanks Koji!

Cheers,

Tim

On 14/10/13 03:56 PM, Koji Sekiguchi wrote:

Hi Tim,

(13/10/15 5:22), Tim Vaillancourt wrote:

Hey guys,

Sorry for such a simple question, but I am curious as to the 
differences in caching between a

"combined" filter query, and many separate filter queries.

Here are 2 example queries, one with combined fq, one separate:

1) "/select?q=*:*&fq=type:bid&fq=user_id:3"
2) "/select?q=*:*&fq=(type:bid%20AND%20user_id:3)"

For query #1: am I correct that the first query will keep 2 
independent entries in the filterCache

for type:bid and user_id:3?\


Correct.

For query #2: is it correct that the 2nd query will keep 1 entry in 
the filterCache that satisfies

all conditions?


Correct.

Lastly, is it a fair statement that under general query patterns, 
many separate filter queries are
more-cacheable than 1 combined one? Eg, if I performed query #2 (in 
the filterCache) and then
changed the user_id, nothing about my new query is cache able, 
correct (but if I used 2 separate

filter queries than 1 of 2 is still cached)?


Yes, it is.

koji


Skipping caches on a /select

2013-10-16 Thread Tim Vaillancourt
Hey guys,

I am debugging some /select queries on my Solr tier and would like to see
if there is a way to tell Solr to skip the caches on a given /select query
if it happens to ALREADY be in the cache. Live queries are being inserted
and read from the caches, but I want my debug queries to bypass the cache
entirely.

I do know about the "cache=false" param (that causes the results of a
select to not be INSERTED into the cache), but what I am looking for
instead is a way to tell Solr to not read the cache at all, even if there
actually is a cached result for my query.

Is there a way to do this (without disabling my caches in solrconfig.xml),
or is this a feature request?

Thanks!

Tim Vaillancourt


Re: SolrCloud on SSL

2013-10-16 Thread Tim Vaillancourt
Not important, but I'm also curious why you would want SSL on Solr (adds
overhead, complexity, harder-to-troubleshoot, etc)?

To avoid the overhead, could you put Solr on a separate VLAN (with ACLs to
client servers)?

Cheers,

Tim


On 12 October 2013 17:30, Shawn Heisey  wrote:

> On 10/11/2013 9:38 AM, Christopher Gross wrote:
> > On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey 
> wrote:
> >
> >> On 10/11/2013 8:17 AM, Christopher Gross wrote: 
> >>> Is there a spot in a Solr configuration that I can set this up to use
> >> HTTPS?
> >>
> >> From what I can tell, not yet.
> >>
> >> https://issues.apache.org/jira/browse/SOLR-3854
> >> https://issues.apache.org/jira/browse/SOLR-4407
> >> https://issues.apache.org/jira/browse/SOLR-4470
> >>
> >>
> > Dang.
>
> Christopher,
>
> I was just looking through Solr source code for a completely different
> issue, and it seems that there *IS* a way to do this in your configuration.
>
> If you were to use "https://hostname" or "https://ipaddress" as the
> "host" parameter in your solr.xml file on each machine, it should do
> what you want.  The parameter is described here, but not the behavior
> that I have discovered:
>
> http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
>
> Boring details: In the org.apache.solr.cloud package, there is a
> ZkController class.  The getHostAddress method is where I discovered
> that you can do this.
>
> If you could try this out and confirm that it works, I will get the wiki
> page updated and look into the Solr reference guide as well.
>
> Thanks,
> Shawn
>
>


Re: Skipping caches on a /select

2013-10-17 Thread Tim Vaillancourt

Thanks Yonik,

Does "cache=false" apply to all caches? The docs make it sound like it 
is for filterCache only, but I could be misunderstanding.


When I force a commit and perform a /select query many times with 
"cache=false", I notice my query still gets cached, my guess is in the 
queryResultCache. At first the query takes 500ms+, then all subsequent 
requests take 0-1ms. I'll confirm this queryResultCache assumption today.


Cheers,

Tim

On 16/10/13 06:33 PM, Yonik Seeley wrote:

On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt  wrote:

I am debugging some /select queries on my Solr tier and would like to see
if there is a way to tell Solr to skip the caches on a given /select query
if it happens to ALREADY be in the cache. Live queries are being inserted
and read from the caches, but I want my debug queries to bypass the cache
entirely.

I do know about the "cache=false" param (that causes the results of a
select to not be INSERTED in to the cache), but what I am looking for
instead is a way to tell Solr to not read the cache at all, even if there
actually is a cached result for my query.

Yeah, cache=false for "q" or "fq" should already not use the cache at
all (read or write).

-Yonik


Re: Skipping caches on a /select

2013-10-17 Thread Tim Vaillancourt


  
  
Awesome, this make a lot of sense now. Thanks a lot guys.

Currently the only mention of this setting in the docs is under
filterQuery on the "SolrCaching" page as:

" Solr3.4 Adding the
localParam flag of {!cache=false} to a query will prevent
the filterCache from being consulted for that query. "

I will update the docs sometime soon to reflect that this can apply
to any query (q or fq).

    Cheers,

Tim

On 17/10/13 01:44 PM, Chris Hostetter wrote:

  

: Does "cache=false" apply to all caches? The docs make it sound like it is for
: filterCache only, but I could be misunderstanding.

it's per *query* -- not per cache, or per request...

 /select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz

...should cause 1 lookup/insert in the filterCache (baz) and 1 
lookup/insert into the queryResultCache (for the main query with its 
associated filters & pagination)



-Hoss


  



Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
I agree with Jonathan (and Shawn on the Jetty explanation), I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details after, possibly regretting it.

I'd be glad to modify the docs but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE "recommended" container
for Solr, or should a recommendation be avoided, and maybe just a list of
pros/cons?

Cheers,

Tim


Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
Hmm, that's an interesting move. I'm on the fence on that one but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta  wrote:

> Thought you may want to have a look at this:
>
> https://issues.apache.org/jira/browse/SOLR-4792
>
> P.S: There are no timelines for 5.0 for now, but it's the future
> nevertheless.
>
>
>
> On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt  >wrote:
>
> > I agree with Jonathan (and Shawn on the Jetty explanation), I think the
> > docs should make this a bit more clear - I notice many people choosing
> > Tomcat and then learning these details after, possibly regretting it.
> >
> > I'd be glad to modify the docs but I want to be careful how it is worded.
> > Is it fair to go as far as saying Jetty is 100% THE "recommended"
> container
> > for Solr, or should a recommendation be avoided, and maybe just a list of
> > pros/cons?
> >
> > Cheers,
> >
> > Tim
> >
>
>
>
> --
>
> Anshum Gupta
> http://www.anshumgupta.net
>


Re: difference between apache tomcat vs Jetty

2013-10-25 Thread Tim Vaillancourt
I (jokingly) propose we take it a step further and drop Java :)! I'm 
getting tired of trying to scale GC'ing JVMs!


Tim

On 25/10/13 09:02 AM, Mark Miller wrote:

Just to add to the “use jetty for Solr” argument - Solr 5.0 will no longer 
consider itself a webapp and will consider the fact that Jetty is used as an 
implementation detail.

We won’t necessarily make it impossible to use a different container, but the 
project won’t condone it or support it and may do some things that assume 
Jetty. Solr is taking over this layer in 5.0.

- Mark

On Oct 25, 2013, at 11:18 AM, Cassandra Targett  wrote:


In terms of adding or fixing documentation, the "Installing Solr" page
(https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
includes a yellow box that says:

"Solr ships with a working Jetty server, with optimized settings for
Solr, inside the example directory. It is recommended that you use the
provided Jetty server for optimal performance. If you absolutely must
use a different servlet container then continue to the next section on
how to install Solr."

So, it's stated, but maybe not in a way that makes it clear to most
users. And maybe it needs to be repeated in another section.
Suggestions?

I did find this page,
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
which pretty much contradicts the previous text. I'll fix that now.

Other recommendations for where doc could be more clear are welcome.

On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt  wrote:

Hmm, thats an interesting move. I'm on the fence on that one but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta  wrote:


Thought you may want to have a look at this:

https://issues.apache.org/jira/browse/SOLR-4792

P.S: There are no timelines for 5.0 for now, but it's the future
nevertheless.



On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt
wrote:
I agree with Jonathan (and Shawn on the Jetty explanation), I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details after, possibly regretting it.

I'd be glad to modify the docs but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE "recommended"

container

for Solr, or should a recommendation be avoided, and maybe just a list of
pros/cons?

Cheers,

Tim




--

Anshum Gupta
http://www.anshumgupta.net



Re: Chegg is looking for a search engineer

2013-11-18 Thread Tim Casey
I have been chasing the Chegg recruiters. I expect to hear back from Glenn
sometime tomorrow.

tim


On Mon, Nov 18, 2013 at 6:37 PM, Walter Underwood wrote:

> I work at Chegg.com and I really like it, but we have more search work
> than I can do by myself, so we are hiring a senior software engineer for
> search. The search services include: textbooks (rental and purchase),
> user-generated homework Q&A, expert-written textbook solutions, search
> within e-books, customer support FAQ, and schools and scholarships for
> Zinch.com. Most of our search services are on Solr.
>
> http://www.chegg.com/jobs/listings/?jvi=oAQGXfwN,Job
>
> If you'd like to know a lot more about Chegg's business, you can read the
> S1 that we filed recently in preparation for our IPO or you can follow us
> as CHGG on the New York Stock Exchange.
>
> wunder
> --
> Walter Underwood
> wun...@wunderwood.org
> Search Guy
> chegg.com
>
>


RE: Questions about commits and OOE

2013-12-04 Thread Tim Potter
Hi Metin,

I think removing the softCommit=true parameter on the client side will 
definitely help as NRT wasn't designed to re-open searchers after every 
document. Try soft-committing every 1 second (or even every few seconds); I doubt your users 
will notice. To get an idea of what threads are running in your JVM process, 
you can use jstack.
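
A minimal sketch of the corresponding solrconfig.xml settings, assuming you drop 
softCommit=true from the client; the 15-second hard commit matches your current 
autocommit, while the 1-second soft commit is only an example value:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>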

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: OSMAN Metin 
Sent: Wednesday, December 04, 2013 7:36 AM
To: solr-user@lucene.apache.org
Subject: Questions about commits and OOE

Hi all,

let me first explain our situation :

We have


-   two virtual servers, each with:

4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has -Xms2048m 
-Xmx2048m -XX:MaxPermSize=384m
1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
CentOS 6.4
Sun JDK 1.6.0-31
16 GB of RAM
4 vCPU


-   only one core and one shard

-   ~25 docs and 50-100 MB of index size

-   two load balancers (apache + mod_cluster) which are both connected to the 
8 SolR nodes

-   1 VIP pointing to these two LB

The commit configuration is

-   every update request does a soft commit (i.e. the softCommit=true param in 
the HTTP request)

-   autosoftcommit disabled

-   autocommit enabled every 15 seconds

The client application is a java app with SolRj client using the previous VIP 
as an endpoint.
We need NearRealTime modifications to be visible to end users.
During the day, the client uses SolR with about 80% of select requests and 20% 
of update requests.
Every morning, the client is sending a massive bunch of updates (about 1 in 
a few minutes).

During this massive update, we sometimes have a peak of active threads 
exceeding the limit of 8192 processes allowed for the user running the Tomcat 
and ZooKeeper processes.
When this happens, every hardCommit is failing with an "OutOfMemory : unable to 
create native thread" message.


Now, I have some questions :

-   Why are there so many threads created? Is it the softCommit on every 
update that opens a new thread?

-   Once an OOE occurs, every hardcommit will be broken, even if the number 
of threads open on the system is low. Is there any way to "free" the JVM? 
The only solution we have found is to restart all the JVMs.

-   When the OOE occurs, the SolR cloud console shows the leader node as 
active and the others as recovering

o   is the replication working at that moment ?

o   since all the hardcommits are failing but the softcommits are not, can I be sure 
that I will not lose some updates when restarting all the nodes?

By the way, we are planning to

-   disable the softCommit parameter on the client side and to enable the 
autosoftcommit instead.

-   create another server and form a 3-node ZooKeeper quorum instead of a single 
ZooKeeper master.

-   skip the use of load balancers and let zookeeper decide which node will 
respond to the requests

Any help would be appreciated !

Metin OSMAN


RE: Setting routerField/shardKey on specific collection?

2013-12-04 Thread Tim Potter
Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case 
collection1). Try using the collections API to create a new collection and pass 
the router.field parameter. Grep'ing over the code, the parameter is named: 
router.field (not routerField or routeField).
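
A hedged example of such a call (the collection, config and field names below are 
placeholders, not taken from your setup):

curl "http://localhost:8080/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=myconf&router.name=compositeId&router.field=customer_id"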

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Daniel Bryant 
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a
specific collection so that all documents with the same value in the
specified field end up in the same shard.

However, I can't find an example of how to do this via the solr.xml? I
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there
is a mention of a routeField property.

Should the solr.xml contain the following?


 


Any help would be greatly appreciated! I've been yak shaving all
afternoon reading various Jira tickets and wikis trying to get this to
work :-)

Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk
*
daniel.bry...@tai-dev.co.uk   |  +44
(0) 7799406399  |  Twitter: @taidevcouk 


Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of the 
SolrCloud collection on each instance, only to notice the same problem - 
the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of a 
SolrCloud collection, the SolrCloud routing is bypassed and I am talking 
directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are no 
deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core "directly"?


An interesting observation is when I do an /admin/cores call to see the 
docCount of the core's index, it does not fluctuate, only the query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg: 
"q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is 
remaining "state: active" in my /clusterstate.json - something is really 
wrong with this cloud! Would a Zookeeper issue explain my varied results 
when querying a core directly?


Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of 
the SolrCloud collection on each instance, only to notice the same 
problem - the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of 
a SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are 
no deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core "directly"?


An interesting observation is when I do an /admin/cores call to see 
the docCount of the core's index, it does not fluctuate, only the 
query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Thanks Markus,

I'm not sure if I'm encountering the same issue. That JIRA mentions differences 
of tens of docs; I'm seeing differences in the multi-millions of 
docs, and even more strangely it very predictably flaps between a 123M 
value and an 87M value, a 30M+ doc difference.


Secondly, I'm not comparing values from 2 instances (Leader to Replica), 
I'm currently performing the same curl call to the same core directly 
and am seeing flapping results each time I perform the query, so this is 
currently happening within a single instance/core unless I am 
misunderstanding how to directly query a core.


Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:

https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-

From:Tim Vaillancourt
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg:
"q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is
remaining "state: active" in my /clusterstate.json - something is really
wrong with this cloud! Would a Zookeeper issue explain my varied results
when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when
performing the same simple /select query many times to our collection.
Almost every other query the numFound count (and the returned data)
jumps between two very different values.

Initially I suspected a replica in a shard of the collection was
inconsistent (and every other request hit that node) and started
performing the same /select query direct to the individual cores of
the SolrCloud collection on each instance, only to notice the same
problem - the count jumps between two very different values!

I may be incorrect here, but I assumed when querying a single core of
a SolrCloud collection, the SolrCloud routing is bypassed and I am
talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection
fluctuates wildly, and is only receiving updates and no deletes to
explain the jumps:

"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a
single core would return inconsistent, flapping results if there are
no deletes issued in my app to cause such jumps? Am I incorrect in my
assumption that I am querying the core "directly"?

An interesting observation is when I do an /admin/cores call to see
the docCount of the core's index, it does not fluctuate, only the
query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to the 
thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add "distrib=false" to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
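
A hedged example of the parameter in use (reusing the host and core names from 
earlier in this thread):

curl 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&distrib=false'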



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey all,

Now that I am getting correct results with "distrib=false", I've 
identified that 1 of my nodes has just 1/3rd of the total data set, which 
totally explains the flapping in results. The fix for this is obvious 
(rebuild replica) but the cause is less obvious.


There is definitely more than one issue going on with this SolrCloud 
(but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that 
/clusterstate.json doesn't seem to get updated when nodes are brought 
down/up is the reason why this replica remained in the distributed 
request chain without recovering/re-replicating from leader.


I imagine my Zookeeper ensemble is having some problems unrelated to 
Solr that is the real root cause.


Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to 
the thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core 
of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking

: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add "distrib=false" to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
Very good point. I've seen this issue occur once before when I was playing
with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
news - we are just behind.

For anyone that is curious, on my earlier mention that
Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
Zookeeper has no issues taking set/creates to clusterstate.json (or any
znode), just this one node seemed to stay stuck as "state: active" while it
was very inconsistent for reasons unknown, potentially just bugs.

The good news is this will be resolved today with a create/destroy of the
bad replica.

Thanks all!

Tim


On 4 December 2013 16:50, Mark Miller  wrote:

> Keep in mind, there have been a *lot* of bug fixes since 4.3.1.
>
> - Mark
>
> On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt  wrote:
>
> > Hey all,
> >
> > Now that I am getting correct results with "distrib=false", I've
> identified that 1 of my nodes has just 1/3rd of the total data set and
> totally explains the flapping in results. The fix for this is obvious
> (rebuild replica) but the cause is less obvious.
> >
> > There is definately more than one issue going on with this SolrCloud
> (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
> /clusterstate.json doesn't seem to get updated when nodes are brought
> down/up is the reason why this replica remained in the distributed request
> chain without recovering/re-replicating from leader.
> >
> > I imagine my Zookeeper ensemble is having some problems unrelated to
> Solr that is the real root cause.
> >
> > Thanks!
> >
> > Tim
> >
> > On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
> >> Chris, this is extremely helpful and it's silly I didn't think of this
> sooner! Thanks a lot, this makes the situation make much more sense.
> >>
> >> I will gather some proper data with your suggestion and get back to the
> thread shortly.
> >>
> >> Thanks!!
> >>
> >> Tim
> >>
> >> On 04/12/13 02:57 PM, Chris Hostetter wrote:
> >>> :
> >>> : I may be incorrect here, but I assumed when querying a single core
> of a
> >>> : SolrCloud collection, the SolrCloud routing is bypassed and I am
> talking
> >>> : directly to a plain/non-SolrCloud core.
> >>>
> >>> No ... every query received from a client by solr is handled by a
> single
> >>> core -- if that core knows it's part of a SolrCloud collection then it
> >>> will do a distributed search across a random replica from each shard in
> >>> that collection.
> >>>
> >>> If you want to bypass the distribute search logic, you have to say so
> >>> explicitly...
> >>>
> >>> To ask an arbitrary replica to only search itself add "distrib=false"
> to
> >>> the request.
> >>>
> >>> Alternatively: you can ask that only certain shard names (or certain
> >>> explicit replicas) be included in a distribute request..
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
> >>>
> >>>
> >>>
> >>> -Hoss
> >>> http://www.lucidworks.com/
>
>


RE: starting up solr automatically

2013-12-05 Thread Tim Potter
Apologies for chiming in late on this one ... just wanted to mention what I've 
used with good success in the past is supervisord (http://supervisord.org/). 
It's easy to install and configure and has the benefit of restarting nodes if 
they crash (such as due to an OOM). I'll also mention that you should consider 
configuring the OOM killer for your JVM when using SolrCloud as an OOM'd 
process is like a zombie in your cluster, causing all kinds of malice.

-XX:OnOutOfMemoryError="/home/solr/oom_killer.sh $x %p"
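
A minimal sketch of what such a script might do (the path, log file and kill 
behavior are placeholder assumptions, not my actual script):

#!/bin/bash
# oom_killer.sh <marker> <pid>: invoked by the JVM via -XX:OnOutOfMemoryError.
# Log the event and kill the OOM'd JVM so a watchdog (e.g. supervisord) can
# start a clean replacement instead of leaving a zombie node in the cluster.
echo "$(date) OOM ($1), killing Solr pid $2" >> /var/log/solr/oom_killer.log
kill -9 "$2"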

But, whether you use that or not, definitely take a look at supervisord if 
you're on Linux as it has been a great way to run SolrCloud in a good sized 
cluster for me.
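
A hedged sketch of a supervisord program entry for a SolrCloud node (paths, 
ports, user and zkHost below are placeholders):

; /etc/supervisord.d/solr.ini
[program:solr]
command=java -Xmx8g -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Djetty.port=8983 -jar start.jar
directory=/opt/solr/example
user=solr
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/solr/solr.log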

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Greg Walters 
Sent: Thursday, December 05, 2013 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: starting up solr automatically

Eric,

Sorry about that, the entire OPTIONS= part can be dropped. That's there to 
support a war file that we deploy next to solr.

Greg

On Dec 5, 2013, at 1:51 PM, Eric Palmer  wrote:

> some progress but getting this error now
> sudo service jetty start
> Starting Jetty: -bash: line 1: cd: /var/lib/answers/atlascloud/solr45: No
> such file or directory
> STARTED Jetty Thu Dec  5 19:50:09 UTC 2013
> [ec2-user@ip-10-50-203-92 ~]$ java.lang.IllegalArgumentException: No such
> OPTIONS: jsp
> at org.eclipse.jetty.start.Config.getCombinedClasspath(Config.java:411)
> at org.eclipse.jetty.start.Config.getActiveClasspath(Config.java:388)
> at org.eclipse.jetty.start.Main.start(Main.java:509)
> at org.eclipse.jetty.start.Main.main(Main.java:96)
>
>
> On Thu, Dec 5, 2013 at 2:28 PM, Greg Walters wrote:
>
>> Eric,
>>
>> If you're using the script from the gist I posted make sure you're
>> sourcing the jetty file at line 140.
>>
>> Thanks,
>> Greg
>>
>> On Dec 5, 2013, at 1:21 PM, Eric Palmer  wrote:
>>
>>> Greg or anyone that can help, when I try to start jetty as a service
>>> sudo service jetty start
>>>
>>> I get this error
>>> ** ERROR: JETTY_HOME not set, you need to set it or install in a standard
>>> location
>>>
>>> same for
>>> sudo service jetty stop
>>> sudo service jetty check
>>> etc
>>>
>>> I have a file here and the permissions look right
>>> ls -al /etc/default/
>>> total 20
>>> drwxr-xr-x  2 root root 4096 Dec  5 19:18 .
>>> drwxr-xr-x 68 root root 4096 Dec  5 19:03 ..
>>> -rwxr-xr-x  1 root root  317 Dec  5 19:18 jetty
>>>
>>> the contents if the jetty file is
>>> JAVA_HOME=/usr/lib/jvm/jre
>>> JETTY_HOME=/home/ec2-user/solr/solr-4.5.1/example/
>>> JETTY_USER=ec2-user
>>> JETTY_LOGS=/home/ec2-user/solr/solr-4.5.1/example/logs
>>> JAVA_OPTIONS="\
>>> -Dsolr.solr.home=/home/ec2-user/solr/solr-4.5.1/example/solr/ \
>>> -Xms1g \
>>> -Djetty.port=8983 \
>>> -Dcollection.configName=collection1 \
>>> $JAVA_OPTIONS"
>>>
>>> Any ideas what I should check?
>>>
>>> Eric P
>>>
>>> thanks in advance
>>>
>>>
>>>
>>> On Thu, Dec 5, 2013 at 11:28 AM, Greg Walters >> wrote:
>>>
 Alan,

 Yes, that's intentional. There's two reasons for this:

 1: We make schema changes frequently (more frequently than I like)
 2: So far as I've noticed, it doesn't hurt anything and covers my butt
 when I've got to clear out all the solr related data from ZK while
>> testing

 Thanks,
 Greg

 On Dec 5, 2013, at 5:53 AM, Alan Woodward  wrote:

> Hi Greg,
>
> It looks as though your script below will bootstrap a collection
 configuration every time Solr is restarted, which probably isn't what
>> you
 want to do?  You only need to upload the config once.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 4 Dec 2013, at 21:26, Greg Walters wrote:
>
>> I almost forgot, you'll need a file to setup the environment a bit
>> too:
>>
>> **
>> JAVA_HOME=/usr/java/default
>> JAVA_OPTIONS="-Xmx15g \
>> -Xms15g \
>> -XX:+PrintGCApplicationStoppedTime \
>> -XX:+PrintGCDateStamps \
>> -XX:+PrintGCDetails \
>> -XX:+UseConcMarkSweepGC \
>> -XX:+UseParNewGC \
>> -XX:+UseTLAB \
>> -XX:+CMSParallelRemarkEnabled \
>> -XX:+CMSScavengeBeforeRemark \
>> -XX:+UseCMSInitiatingOccupancyOnly \
>> -XX:CMSInitiatingOccupancyFraction=50 \
>> -XX:CMSWaitDuration=30 \
>> -XX:GCTimeRatio=40 \
>> -Xloggc:/tmp/solr45_gc.log \
>> -Dbootstrap_conf=true \
>>

>> -Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/
 \
>> -Dcollection.configName=wa-en-collection \
>> -DzkHost= \
>> -DnumShards= \
>> -Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
>>

>> -Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties
 \
>> -Djetty.port=9101 \
>> $JAVA_OPTIONS"
>> JETTY_HOME=/var/lib/answers/atlascloud/solr45/
>> JETTY_USER=tomcat
>> JETTY_LOGS=/

Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
I spoke too soon, my plan for fixing this didn't quite work.

I've moved this issue into a new thread/topic: "No /clusterstate.json
updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE".

Thanks all for the help on this one!

Tim


On 5 December 2013 11:37, Tim Vaillancourt  wrote:

> Very good point. I've seen this issue occur once before when I was playing
> with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
> news - we are just behind.
>
> For anyone that is curious, on my earlier mention that
> Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
> Zookeeper has no issues taking set/creates to clusterstate.json (or any
> znode), just this one node seemed to stay stuck as "state: active" while it
> was very inconsistent for reasons unknown, potentially just bugs.
>
> The good news is this will be resolved today with a create/destroy of the
> bad replica.
>
> Thanks all!
>
> Tim
>
>
> On 4 December 2013 16:50, Mark Miller  wrote:
>
>> Keep in mind, there have been a *lot* of bug fixes since 4.3.1.
>>
>> - Mark
>>
>> On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt 
>> wrote:
>>
>> > Hey all,
>> >
>> > Now that I am getting correct results with "distrib=false", I've
>> identified that 1 of my nodes has just 1/3rd of the total data set and
>> totally explains the flapping in results. The fix for this is obvious
>> (rebuild replica) but the cause is less obvious.
>> >
>> > There is definately more than one issue going on with this SolrCloud
>> (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
>> /clusterstate.json doesn't seem to get updated when nodes are brought
>> down/up is the reason why this replica remained in the distributed request
>> chain without recovering/re-replicating from leader.
>> >
>> > I imagine my Zookeeper ensemble is having some problems unrelated to
>> Solr that is the real root cause.
>> >
>> > Thanks!
>> >
>> > Tim
>> >
>> > On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
>> >> Chris, this is extremely helpful and it's silly I didn't think of this
>> sooner! Thanks a lot, this makes the situation make much more sense.
>> >>
>> >> I will gather some proper data with your suggestion and get back to
>> the thread shortly.
>> >>
>> >> Thanks!!
>> >>
>> >> Tim
>> >>
>> >> On 04/12/13 02:57 PM, Chris Hostetter wrote:
>> >>> :
>> >>> : I may be incorrect here, but I assumed when querying a single core
>> of a
>> >>> : SolrCloud collection, the SolrCloud routing is bypassed and I am
>> talking
>> >>> : directly to a plain/non-SolrCloud core.
>> >>>
>> >>> No ... every query received from a client by solr is handled by a
>> single
>> >>> core -- if that core knows it's part of a SolrCloud collection then it
>> >>> will do a distributed search across a random replica from each shard
>> in
>> >>> that collection.
>> >>>
>> >>> If you want to bypass the distribute search logic, you have to say so
>> >>> explicitly...
>> >>>
>> >>> To ask an arbitrary replica to only search itself add "distrib=false"
>> to
>> >>> the request.
>> >>>
>> >>> Alternatively: you can ask that only certain shard names (or certain
>> >>> explicit replicas) be included in a distribute request..
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
>> >>>
>> >>>
>> >>>
>> >>> -Hoss
>> >>> http://www.lucidworks.com/
>>
>>
>


No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE

2013-12-05 Thread Tim Vaillancourt
Hey guys,

I've been having an issue with 1 of my 4 replicas being inconsistent,
and have been trying to fix it. At the core of this issue, I've
noticed /clusterstate.json doesn't seem to be receiving updates when cores
get unhealthy, or even added/removed.

Today I decided I would remove the "bad" replica from the SolrCloud and
force a sync of a new clean replica, so I ran a
'/admin/cores?command=UNLOAD&name=name' to drop it. After this, on the
instance with the "bad" replica, the core was removed from solr.xml but
strangely NOT the /clusterstate.json in Zookeeper - it remained in
Zookeeper unchanged, still with "state: active" :(.

So, I then manually edited the clusterstate.json with a Perl script,
removing the json data for the "bad" replica. I checked all nodes saw the
change themselves, things looked good. Then I brought the node up/down to
check that it was properly adding/removing itself from /live_nodes znode in
Zookeeper. That all worked perfectly, too.

Here is the really odd part: when I created a new replica on this node (to
replace the "bad" replica), the core was created on the node, and NO update
was made to /clusterstate.json. At this point this node had no cores, no
cores with state in /clusterstate.json, and all data dirs deleted, so this
is quite confusing.

Upon checking ACLs on /clusterstate.json, it is world/anyone accessible:

"[zk: localhost:2181(CONNECTED) 18] getAcl /clusterstate.json
'world,'anyone
: cdrwa"

Also, keep in mind my external Perl script had no issue updating
/clusterstate.json. Can anyone make any suggestions why /clusterstate.json
isn't getting updated when I create this new core?

One other thing I checked was the health of the Zookeeper ensemble, and all
3 Zookeepers have the same mZxid, ctime, mtime, etc for /clusterstate.json
and receive updates no problem, just this node isn't updating Zookeeper
somehow.

Any thoughts are much appreciated!

Thanks!

Tim


RE: Cloud graph gone after manually editing clusterstate.json

2013-12-11 Thread Tim Potter
Hi Michael,

Can you /get clusterstate.json again to see the contents? Also, maybe just a 
typo but you have `cate clusterstate.json` vs. `cat ..`

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: michael.boom 
Sent: Wednesday, December 11, 2013 6:37 AM
To: solr-user@lucene.apache.org
Subject: Cloud graph gone after manually editing clusterstate.json

HI,

Today I changed my ZK config, removing one instance in the quorum and then
restarted all ZKs and all Solr instances.
After this operation I noticed that one of the shards in one collection was
missing the range ("range":null). Router for that collection was
compositeId.

So, I proceeded adding the missing range manually by editing
clusterstate.json
$ zkCli.sh -server zk1:9983 get /clusterstate.json > clusterstate.json
i did my edits, and then:
$ zkCli.sh -server zk1:9983 set /clusterstate.json "`cate
clusterstate.json`"

Everything went fine; I checked in the Admin UI and the clusterstate.json was updated,
but now when I try to see the graph view or radial graph I can't see
anything. Just white space.

Any idea why?
Thanks!





-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Cloud graph gone after manually editing clusterstate.json

2013-12-11 Thread Tim Potter
I'm not sure at this point as what you're describing seems fine to me ... I'm 
not too familiar with Solr's UI implementation, but I suspect the cloud graph 
stuff may be client side, so are you seeing any JavaScript errors in the dev 
console in your browser?

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: michael.boom 
Sent: Wednesday, December 11, 2013 8:21 AM
To: solr-user@lucene.apache.org
Subject: RE: Cloud graph gone after manually editing clusterstate.json

Thanks for the reply Tim,

Yes, that was just a typo, i used "cat" not "cate".
As for the checks, everything looks fine. My edits were:
1. updated the shard range
2. removed the header, which looked like log information, as below:
*<<<< removed header start here*
Connecting to solr3:9983
2013-12-11 16:15:05,372 [myid:] - INFO  [main:Environment@100] - Client
environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-12-11 16:15:05,376 [myid:] - INFO  [main:Environment@100] - Client
environment:host.name=solr3.internal
2013-12-11 16:15:05,377 [myid:] - INFO  [main:Environment@100] - Client
environment:java.version=1.7.0_25
2013-12-11 16:15:05,377 [myid:] - INFO  [main:Environment@100] - Client
environment:java.vendor=Oracle Corporation
2013-12-11 16:15:05,378 [myid:] - INFO  [main:Environment@100] - Client
environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
2013-12-11 16:15:05,378 [myid:] - INFO  [main:Environment@100] - Client
environment:java.class.path=/opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/z$
2013-12-11 16:15:05,378 [myid:] - INFO  [main:Environment@100] - Client
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/jni:/lib:/usr/lib
2013-12-11 16:15:05,379 [myid:] - INFO  [main:Environment@100] - Client
environment:java.io.tmpdir=/tmp
2013-12-11 16:15:05,379 [myid:] - INFO  [main:Environment@100] - Client
environment:java.compiler=
2013-12-11 16:15:05,380 [myid:] - INFO  [main:Environment@100] - Client
environment:os.name=Linux
2013-12-11 16:15:05,380 [myid:] - INFO  [main:Environment@100] - Client
environment:os.arch=amd64
2013-12-11 16:15:05,381 [myid:] - INFO  [main:Environment@100] - Client
environment:os.version=3.2.0-4-amd64
2013-12-11 16:15:05,381 [myid:] - INFO  [main:Environment@100] - Client
environment:user.name=solr
2013-12-11 16:15:05,382 [myid:] - INFO  [main:Environment@100] - Client
environment:user.home=/home/solr
2013-12-11 16:15:05,382 [myid:] - INFO  [main:Environment@100] - Client
environment:user.dir=/opt/zookeeper
2013-12-11 16:15:05,384 [myid:] - INFO  [main:ZooKeeper@438] - Initiating
client connection, connectString=solr3:9983 sessionTimeout=3
watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@58a5f543
2013-12-11 16:15:05,412 [myid:] - INFO
[main-SendThread(solr3.productdb.internal:9983):ClientCnxn$SendThread@966] -
Opening socket connection to server solr3.internal/10.33.182.78:9983. Will
not attempt to authenticate $
2013-12-11 16:15:05,419 [myid:] - INFO
[main-SendThread(solr3.productdb.internal:9983):ClientCnxn$SendThread@849] -
Socket connection established to solr3.internal/10.33.182.78:9983,
initiating session
2013-12-11 16:15:05,427 [myid:] - INFO
[main-SendThread(solr3.productdb.internal:9983):ClientCnxn$SendThread@1207]
- Session establishment complete on server solr3.internal/10.33.182.78:9983,
sessionid = 0x142e187355000$

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
*<< I removed the above until here*
{
  "offers_collection_GB":{
"shards":{
  "shard1":{
"range":"8000-bfff",
"state":"active",
"replicas":{
.. and so on


Could this be the problem?



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cloud-graph-gone-after-manually-editing-clusterstate-json-tp4106142p4106161.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Load existing HDFS files into solr?

2013-12-12 Thread Tim Potter
Hi Chen,

I'm not aware of any direct integration between the two at this time. You might 
ping the Hive user list with this question too. That said, I've been thinking 
whether it makes sense to build a Hive StorageHandler for Solr? That at least 
seems like a quick way to go. However, it might also be possible to just plug a 
Hive InputFormat into Mark's MapReduce/Solr stuff?

See: https://github.com/markrmiller/solr-map-reduce-example 

Cheers,

Timothy Potter
www.lucidworks.com


From: cynosure 
Sent: Thursday, December 12, 2013 12:11 AM
To: solr-user@lucene.apache.org
Subject: Load existing HDFS files into solr?

Folks,
Our current data is stored in Hive tables. Is there a way to tell Solr
to index the existing HDFS files directly, or do I have to import each Hive
table into Solr?
Can anyone point me to some reference?
Thank you very much!
Chen

RE: How can you move a shard from one SolrCloud node to another?

2013-12-16 Thread Tim Potter
Hi Chris,

The easiest approach is to just create a new core on the new machine that 
references the collection and shard you want to migrate. For example, say you 
split shard1 of a collection named "cloud", which results in having: shard1_0 
and shard1_1. Now let's say you want to migrate shard1_0 over to the new 
machine. 

First, fire off a q=*:*&distrib=false query to the shard you're migrating so 
that you know how many docs it has (which will be used to verify the migration 
was clean below).

Next, bring up the new machine in cloud mode (-zkHost=?) and then go to the 
admin console on that server. Nav to the core admin page and create a new core, 
specifying the collection and shard1_0 in the form; note: the form leads you to 
believe you need to create the directory on the local system but you actually 
don't need to worry about doing that as the config will get pulled from ZK and 
the directory will get created on the fly (at least that's what happened in my 
env using branch_4x). 

When the new core initializes, it will use good ol' snapshot replication to 
pull the index from the leader. Verify the new core is happy by executing the 
q=*:*&distrib=false  query again. Once you're satisfied, you can unload the 
core you migrated.

Btw ... you can do all this with the core admin API instead of the Web UI if 
you want to script it.
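
For reference, a hedged sketch of the equivalent CoreAdmin calls (the host names 
and replica core names are placeholders; the CREATE form matches the example in 
the follow-up reply below):

curl "http://newhost:8986/solr/admin/cores?action=CREATE&collection=cloud&shard=shard1_0&name=cloud_shard1_0_replica3"
curl "http://oldhost:8983/solr/admin/cores?action=UNLOAD&core=cloud_shard1_0_replica1"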

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: cwhi 
Sent: Sunday, December 15, 2013 3:43 PM
To: solr-user@lucene.apache.org
Subject: How can you move a shard from one SolrCloud node to another?

Let's say I want to rebalance a SolrCloud collection.  I call SPLITSHARD to
split an existing shard, and then I'd like to move one of the subshards to a
new machine so the index is more balanced.  Can this be done?  If not, how
do you rebalance an existing SolrCloud collection?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-you-move-a-shard-from-one-SolrCloud-node-to-another-tp4106815.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How can you move a shard from one SolrCloud node to another?

2013-12-16 Thread Tim Potter
Hi Yago,

When you create a new core (via API or Web UI), you specify the collection name 
and shard id, in my example "cloud" and "shard1_0". When the core initializes 
in SolrCloud mode, it recognizes that the collection exists and adds itself as 
a replica to the shard. Then the main replica recovery process kicks in; try 
PeerSync, realize too far out of date, try snapshot replication from leader. 
The following core API command led to the same result as using the UI:

curl -v 
"http://localhost:8986/solr/admin/cores?action=CREATE&collection=cloud&shard=shard1_0&name=cloud_shard1_0_replica3";

The only trick here is you need to set the name of the core, which from what I 
can tell can be arbitrary but I chose to use the same naming standard as the 
other cores.

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Yago Riveiro 
Sent: Monday, December 16, 2013 9:32 AM
To: solr-user@lucene.apache.org
Subject: Re: How can you move a shard from one SolrCloud node to another?

Tim,

Can you explain how the replication snapshot is done using the coreAdminAPI?

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, December 16, 2013 at 4:23 PM, Tim Potter wrote:

> Hi Chris,
>
> The easiest approach is to just create a new core on the new machine that 
> references the collection and shard you want to migrate. For example, say you 
> split shard1 of a collection named "cloud", which results in having: shard1_0 
> and shard1_1. Now let's say you want to migrate shard 1_0 over to the new 
> machine.
>
> First, fire off a q=*:*&distrib=false query to the shard you're migrating so 
> that you know how many docs it has (which will be used to verify the 
> migration was clean below).
>
> Next, bring up the new machine in cloud mode (-zkHost=?) and then go to the 
> admin console on that server. Nav to the core admin page and create a new 
> core, specifying the collection and shard1_0 in the form; note: the form 
> leads you to believe you need to create the directory on the local system but 
> you actually don't need to worry about doing that as the config will get 
> pulled from ZK and the directory will get created on the fly (at least that's 
> what happened in my env using branch_4x).
>
> When the new core initializes, it will use good ol' snapshot replication to 
> pull the index from the leader. Verify the new core is happy by executing the 
> q=*:*&distrib=false query again. Once you're satisfied, you can unload the 
> core you migrated.
>
> Btw ... you can do all this with the core admin API instead of the Web UI if 
> you want to script it.
>
> Cheers,
>
> Timothy Potter
> Sr. Software Engineer, LucidWorks
> www.lucidworks.com (http://www.lucidworks.com)
>
> 
> From: cwhi mailto:chris.whi...@gmail.com)>
> Sent: Sunday, December 15, 2013 3:43 PM
> To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
> Subject: How can you move a shard from one SolrCloud node to another?
>
> Let's say I want to rebalance a SolrCloud collection. I call SPLITSHARD to
> split an existing shard, and then I'd like to move one of the subshards to a
> new machine so the index is more balanced. Can this be done? If not, how
> do you rebalance an existing SolrCloud collection?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-can-you-move-a-shard-from-one-SolrCloud-node-to-another-tp4106815.html
> Sent from the Solr - User mailing list archive at Nabble.com 
> (http://Nabble.com).
>
>



RE: SolrCloud Suggester java ClassNotFoundException: org.apache.solr.suggest.tst.TSTLookup

2013-12-16 Thread Tim Potter
There have been some recent refactorings in this area of the code. The 
following class name should work:

org.apache.solr.spelling.suggest.tst.TSTLookupFactory
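
i.e. in the suggest searchComponent, something along these lines (just a
sketch -- the field name and buildOnCommit setting are placeholders, adjust to
your own config):

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">suggest_field</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>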

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Trevor Handley 
Sent: Monday, December 16, 2013 11:27 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud Suggester java ClassNotFoundException: 
org.apache.solr.suggest.tst.TSTLookup

Also, I'm aware that there's two typos in my schema.xml attached. I forgot to 
remove the linebreak \ character from the two splitOnCaseChange sections.
This typo does not exist in the official schema.xml that solr is using.

-Original Message-
From: Trevor Handley [mailto:hand...@civicplus.com]
Sent: Monday, December 16, 2013 12:24 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud Suggester java ClassNotFoundException: 
org.apache.solr.suggest.tst.TSTLookup

Hello, I'm working with SolrCloud and trying to integrate the Suggester 
functionality http://wiki.apache.org/solr/Suggester.

I've configured the requestHandler and searchComponent sections of 
solrconfig.xml, and added a new fieldtype and field to schema.xml. These 
documents are attached to this message.

Background: This is a change I'm trying to make to a currently working/stable 
version of Solr 4.6 that has nearly 1.5 million unique documents in the index. 
The whole architecture is a single SolrCloud collection with 2 core shards that 
are replicated for a total of 4 cores. The shard1_slice1 core and shard2_slice2 
core exist on one physical server, and shard1_slice2 core and shard2_slice1 
core exist on a separate physical server.

When I try to restart solr with suggester enabled then I get java error 
"java.lang.ClassNotFoundException: org.apache.solr.suggest.tst.TSTLookup"

I tried a few different suggester classes but they all fail to load with the 
same message here.
I verified that my .jar files do contain that class and are in the correct lib 
directory using a script that searches .jar files for a class name:

[solr@Searchnode-001 ~]$ ./findclass.sh /opt/solrcloud/lib/ TSTLookup 
/opt/solrcloud/lib/solr-core-4.6.0.jar:org/apache/solr/spelling/suggest/tst/TSTLookupFactory.class
/opt/solrcloud/lib/lucene-suggest-4.6-SNAPSHOT.jar:org/apache/lucene/search/suggest/tst/TSTLookup.class

And here's a listing of the jar files in my lib directory:
activation-1.1.jar
AlchemyAPIAnnotator-2.3.1.jar
apache-mime4j-core-0.7.2.jar
apache-mime4j-dom-0.7.2.jar
attributes-binder-1.2.0.jar
bcmail-jdk15-1.45.jar
bcprov-jdk15-1.45.jar
boilerpipe-1.1.0.jar
carrot2-mini-3.8.0.jar
commons-beanutils-1.7.0.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-digester-2.0.jar
dom4j-1.6.1.jar
fontbox-1.8.1.jar
hppc-0.5.2.jar
icu4j-49.1.jar
isoparser-1.0-RC-1.jar
jackson-core-asl-1.7.4.jar
jackson-mapper-asl-1.7.4.jar
jdom-1.0.jar
jempbox-1.8.1.jar
jetty-continuation-8.1.10.v20130312.jar
jetty-deploy-8.1.10.v20130312.jar
jetty-http-8.1.10.v20130312.jar
jetty-io-8.1.10.v20130312.jar
jetty-jmx-8.1.10.v20130312.jar
jetty-security-8.1.10.v20130312.jar
jetty-server-8.1.10.v20130312.jar
jetty-servlet-8.1.10.v20130312.jar
jetty-util-8.1.10.v20130312.jar
jetty-webapp-8.1.10.v20130312.jar
jetty-xml-8.1.10.v20130312.jar
jsonic-1.2.7.jar
juniversalchardet-1.0.3.jar
langdetect-1.1-20120112.jar
lucene-analyzers-common-4.6-SNAPSHOT.jar
lucene-analyzers-kuromoji-4.6-SNAPSHOT.jar
lucene-analyzers-phonetic-4.6-SNAPSHOT.jar
lucene-codecs-4.6-SNAPSHOT.jar
lucene-core-4.6-SNAPSHOT.jar
lucene-grouping-4.6-SNAPSHOT.jar
lucene-highlighter-4.6-SNAPSHOT.jar
lucene-join-4.6-SNAPSHOT.jar
lucene-memory-4.6-SNAPSHOT.jar
lucene-misc-4.6-SNAPSHOT.jar
lucene-queries-4.6-SNAPSHOT.jar
lucene-queryparser-4.6-SNAPSHOT.jar
lucene-spatial-4.6-SNAPSHOT.jar
lucene-suggest-4.6-SNAPSHOT.jar
mahout-collections-1.0.jar
mahout-math-0.6.jar
mail-1.4.1.jar
metadata-extractor-2.6.2.jar
morfologik-fsa-1.7.1.jar
morfologik-polish-1.7.1.jar
morfologik-stemming-1.7.1.jar
netcdf-4.2-min.jar
OpenCalaisAnnotator-2.3.1.jar
pdfbox-1.8.1.jar
poi-3.9.jar
poi-ooxml-3.9.jar
poi-ooxml-schemas-3.9.jar
poi-scratchpad-3.9.jar
rome-0.9.jar
servlet-api-3.0.jar
simple-xml-2.7.jar
solr-analysis-extras-4.6.0.jar
solr-cell-4.6.0.jar
solr-clustering-4.6.0.jar
solr-core-4.6.0.jar
solr-dataimporthandler-4.6.0.jar
solr-dataimporthandler-extras-4.6.0.jar
solr-langid-4.6.0.jar
solr-solrj-4.6.0.jar
solr-test-framework-4.6.0.jar
solr-uima-4.6.0.jar
solr-velocity-4.6.0.jar
Tagger-2.3.1.jar
tagsoup-1.2.1.jar
tika-core-1.4.jar
tika-parsers-1.4.jar
uimaj-core-2.3.1.jar
velocity-1.7.jar
velocity-tools-2.0.jar
vorbis-java-core-0.1.jar
vorbis-java-tika-0.1.jar
WhitespaceTokenizer-2.3.1.jar
xercesImpl-2.9.1.jar
xmlbeans-2.3.0.jar
xz-1.0.jar

This is how I start solr with jetty:
java -Dbootstrap_confdir=/opt/solrcloud/zkBootstrapConfigs/ 
-Dcollection.configName=CP_Search 
-DzkHost=zookeeper-001:2181,zookeeper-002:2181,zookeeper-003:2181 
-Dcom.sun.manage

RE: SolrCloud Suggester java ClassNotFoundException: org.apache.solr.suggest.tst.TSTLookup

2013-12-16 Thread Tim Potter
Awesome ... I'll update the Wiki to reflect the new class names.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Trevor Handley 
Sent: Monday, December 16, 2013 11:44 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud Suggester java ClassNotFoundException: 
org.apache.solr.suggest.tst.TSTLookup

Brilliant, thanks Timothy!

Changing the solrconfig.xml lookupImpl (not className) to the 
org.apache.solr.spelling.suggest.tst.TSTLookupFactory fixed this issue for me.

Thanks, Trevor

-Original Message-
From: Tim Potter [mailto:tim.pot...@lucidworks.com]
Sent: Monday, December 16, 2013 12:32 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud Suggester java ClassNotFoundException: 
org.apache.solr.suggest.tst.TSTLookup

There have been some recent refactorings in this area of the code. The 
following class name should work:

org.apache.solr.spelling.suggest.tst.TSTLookupFactory

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Trevor Handley 
Sent: Monday, December 16, 2013 11:27 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud Suggester java ClassNotFoundException: 
org.apache.solr.suggest.tst.TSTLookup

Also, I'm aware that there's two typos in my schema.xml attached. I forgot to 
remove the linebreak \ character from the two splitOnCaseChange sections.
This typo does not exist in the official schema.xml that solr is using.

-Original Message-
From: Trevor Handley [mailto:hand...@civicplus.com]
Sent: Monday, December 16, 2013 12:24 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud Suggester java ClassNotFoundException: 
org.apache.solr.suggest.tst.TSTLookup

Hello, I'm working with SolrCloud and trying to integrate the Suggester 
functionality http://wiki.apache.org/solr/Suggester.

I've configured the requestHandler and searchComponent sections of 
solrconfig.xml, and added a new fieldtype and field to schema.xml. These 
documents are attached to this message.

Background: This is a change I'm trying to make to a currently working/stable 
version of Solr 4.6 that has nearly 1.5 million unique documents in the index. 
The whole architecture is a single SolrCloud collection with 2 core shards that 
are replicated for a total of 4 cores. The shard1_slice1 core and shard2_slice2 
core exist on one physical server, and shard1_slice2 core and shard2_slice1 
core exist on a separate physical server.

When I try to restart solr with suggester enabled then I get java error 
"java.lang.ClassNotFoundException: org.apache.solr.suggest.tst.TSTLookup"

I tried a few different suggester classes but they all fail to load with the 
same message here.
I verified that my .jar files do contain that class and are in the correct lib 
directory using a script that searches .jar files for a class name:

[solr@Searchnode-001 ~]$ ./findclass.sh /opt/solrcloud/lib/ TSTLookup 
/opt/solrcloud/lib/solr-core-4.6.0.jar:org/apache/solr/spelling/suggest/tst/TSTLookupFactory.class
/opt/solrcloud/lib/lucene-suggest-4.6-SNAPSHOT.jar:org/apache/lucene/search/suggest/tst/TSTLookup.class

And here's a listing of the jar files in my lib directory:
activation-1.1.jar
AlchemyAPIAnnotator-2.3.1.jar
apache-mime4j-core-0.7.2.jar
apache-mime4j-dom-0.7.2.jar
attributes-binder-1.2.0.jar
bcmail-jdk15-1.45.jar
bcprov-jdk15-1.45.jar
boilerpipe-1.1.0.jar
carrot2-mini-3.8.0.jar
commons-beanutils-1.7.0.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-digester-2.0.jar
dom4j-1.6.1.jar
fontbox-1.8.1.jar
hppc-0.5.2.jar
icu4j-49.1.jar
isoparser-1.0-RC-1.jar
jackson-core-asl-1.7.4.jar
jackson-mapper-asl-1.7.4.jar
jdom-1.0.jar
jempbox-1.8.1.jar
jetty-continuation-8.1.10.v20130312.jar
jetty-deploy-8.1.10.v20130312.jar
jetty-http-8.1.10.v20130312.jar
jetty-io-8.1.10.v20130312.jar
jetty-jmx-8.1.10.v20130312.jar
jetty-security-8.1.10.v20130312.jar
jetty-server-8.1.10.v20130312.jar
jetty-servlet-8.1.10.v20130312.jar
jetty-util-8.1.10.v20130312.jar
jetty-webapp-8.1.10.v20130312.jar
jetty-xml-8.1.10.v20130312.jar
jsonic-1.2.7.jar
juniversalchardet-1.0.3.jar
langdetect-1.1-20120112.jar
lucene-analyzers-common-4.6-SNAPSHOT.jar
lucene-analyzers-kuromoji-4.6-SNAPSHOT.jar
lucene-analyzers-phonetic-4.6-SNAPSHOT.jar
lucene-codecs-4.6-SNAPSHOT.jar
lucene-core-4.6-SNAPSHOT.jar
lucene-grouping-4.6-SNAPSHOT.jar
lucene-highlighter-4.6-SNAPSHOT.jar
lucene-join-4.6-SNAPSHOT.jar
lucene-memory-4.6-SNAPSHOT.jar
lucene-misc-4.6-SNAPSHOT.jar
lucene-queries-4.6-SNAPSHOT.jar
lucene-queryparser-4.6-SNAPSHOT.jar
lucene-spatial-4.6-SNAPSHOT.jar
lucene-suggest-4.6-SNAPSHOT.jar
mahout-collections-1.0.jar
mahout-math-0.6.jar
mail-1.4.1.jar
metadata-extractor-2.6.2.jar
morfologik-fsa-1.7.1.jar
morfologik-polish-1.7.1.jar
morfologik-stemming-1.7.1.jar
netcdf-4.2-min.jar
OpenCalaisAnnotator-2.3.1.jar
pdfbox-1.8.1.jar
poi-3.9.jar
poi-ooxml-3.9.jar
poi-ooxm

RE: solr cloud - deleting and adding the same doc

2013-12-17 Thread Tim Potter
Yes, SolrCloud uses a transaction log to keep track of ordered updates to a 
document. The latest update will be immediately visible from the real-time get 
handler /get?id=X even without a commit.
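
For example (a sketch; host, collection name and id are placeholders):

curl "http://localhost:8983/solr/collection1/get?id=X&wt=json"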

Cheers,
Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: adfel70 
Sent: Tuesday, December 17, 2013 7:54 AM
To: solr-user@lucene.apache.org
Subject: solr cloud - deleting and adding the same doc

Hi
in SolrCloud, if I send 2 different requests to solr - one with delete
action of doc with id X and another with add action of doc with the same id
- is it guaranteed that the delete action will occur before the add action?

Is it guaranteed that after all actions are done, the index will have doc X
with its most updated state?

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-cloud-deleting-and-adding-the-same-doc-tp4107111.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr failure results in misreplication?

2013-12-18 Thread Tim Potter
Any chance you still have the logs from the servers hosting 1 & 2? I would open 
a JIRA ticket for this one as it sounds like something went terribly wrong on 
restart. 

You can update the /clusterstate.json to fix this situation.
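
For example, with the zkcli script that ships with Solr you can pull the file
down, hand-edit it, and push it back (a sketch; the zk address is a placeholder,
and check the exact command names your zkcli version supports). It's probably
safest to do this while the affected Solr nodes are stopped:

# dump the current cluster state to a local file
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd get /clusterstate.json > clusterstate.json

# edit clusterstate.json locally, then push it back
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd putfile /clusterstate.json clusterstate.json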

Lastly, it's recommended to use an OOM killer script with SolrCloud so that you 
don't end up with zombie nodes hanging around in your cluster. I use something 
like: -XX:OnOutOfMemoryError="$SCRIPT_DIR/oom_solr.sh $x %p"

$x in start script is the port # and %p is the process ID ... My oom_solr.sh 
script is something like this:

#!/bin/bash
SOLR_PORT=$1
SOLR_PID=$2
NOW=$(date +"%F%T")
(
echo "Running OOM killer script for process $SOLR_PID for Solr on port 
89$SOLR_PORT"
kill -9 $SOLR_PID
echo "Killed process $SOLR_PID"
) | tee oom_killer-89$SOLR_PORT-$NOW.log

I use supervisord to handle the restart after the process gets killed by the 
OOM killer, which is why you don't see the restart in this script ;-)
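
A minimal supervisord program entry might look something like this (sketch
only; paths, ports and flags are placeholders for your own install):

[program:solr8983]
command=java -DzkHost=zk1:2181 -Djetty.port=8983 -jar start.jar
directory=/opt/solr/example
autostart=true
autorestart=true
stdout_logfile=/var/log/solr/solr-8983-stdout.log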

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: youknow...@heroicefforts.net 
Sent: Tuesday, December 17, 2013 10:31 PM
To: solr-user@lucene.apache.org
Subject: Solr failure results in misreplication?

My client has a test cluster Solr 4.6 with three instances 1, 2, and 3 hosting 
shards 1, 2, and 3, respectively.  There is no replication in this cluster.  We 
started receiving OOME during indexing; likely the batches were too large.  The 
cluster was rebooted to restore the system.  However, upon reboot, instance 2 
now shows as a replica of shard 1 and its shard2 is down with a null range.  
Instance 2 is queryable shards.tolerant=true&distribute=false and returns a 
different set of records than instance 1 (as would be expected during normal 
operations).  Clusterstate.json is similar to the following:

mycollection:{
shard1:{
range:800-d554,
state:active,
replicas:{
instance1state:active...,
instance2state:active...
}
},
shard3:{state:active.},
shard2:{
range:null,
state:active,
replicas:{
instance2{state:down}
}
},
maxShardsPerNode:1,
replicationFactor:1
}

Any ideas on how this would come to pass?  Would manually correcting the 
clusterstate.json in Zk correct this situation?

RE: monitoring solr logs

2013-12-30 Thread Tim Potter
I'm using logstash4solr (http://logstash4solr.org) for something similar ...

I setup my Solr to use Log4J by passing the following on the command-line when 
starting Solr: 
-Dlog4j.configuration=file:///$SCRIPT_DIR/log4j.properties

Then I use a custom Log4J appender that writes to RabbitMQ: 

https://github.com/plant42/rabbitmq-log4j-appender

You can then configure a RabbitMQ input for logstash - 
http://logstash.net/docs/1.3.2/inputs/rabbitmq

This decouples the log writes from log indexing in logstash4solr, which scales 
better for active Solr installations.

Btw ... I just log everything from Solr using this approach but you can use 
standard Log4J configuration settings to limit which classes / log levels to 
send to the RabbitMQ appender.
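
For reference, the log4j.properties wiring ends up looking roughly like this
(a sketch; the appender class and its connection properties come from the
rabbitmq-log4j-appender project, so check its README for the exact names):

# 'file' is the usual local rolling-file appender, defined separately
log4j.rootLogger=INFO, file, rabbitmq

# only ship WARN and above to RabbitMQ
log4j.appender.rabbitmq=com.plant42.log4j.appenders.RabbitMQAppender
log4j.appender.rabbitmq.Threshold=WARN
log4j.appender.rabbitmq.host=rabbit-host
log4j.appender.rabbitmq.port=5672
log4j.appender.rabbitmq.exchange=solr-logs
log4j.appender.rabbitmq.layout=org.apache.log4j.PatternLayout
log4j.appender.rabbitmq.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{3} %x - %m%n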

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: adfel70 
Sent: Monday, December 30, 2013 8:15 AM
To: solr-user@lucene.apache.org
Subject: monitoring solr logs

hi
i'm trying to figure out which solr and zookeeper logs i should monitor and
collect.
All the logs will be written to a file but I want to collect some of them
with logstash in order to be able to analyze them efficiently.
any inputs on logs of which classes i should collect?

thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/monitoring-solr-logs-tp4108721.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: monitoring solr logs

2013-12-30 Thread Tim Potter
We (LucidWorks) are actively developing logstash4solr, so if you have 
issues, let us know. So far, so good for me, but I upgraded to logstash 1.3.2; 
even though the logstash4solr distribution includes 1.2.2, you can use the newer one. 
I'm not quite in production with my logstash4solr <- rabbit-mq <- log4j <- Solr 
solution yet though ;-)

Yeah, 50GB is too much logging for only 150K docs. Maybe start by filtering by 
log level (WARN and more severe). If a server crashes, you're likely to see 
some errors in the logstash side but sometimes you may have to SSH to the 
specific box and look at the local log (so definitely append all messages to 
the local Solr log too), I'm using something like the following for local 
logging:

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.File=logs/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{3} %x - 
%m%n


Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: adfel70 
Sent: Monday, December 30, 2013 9:34 AM
To: solr-user@lucene.apache.org
Subject: RE: monitoring solr logs

Actually I was considering using logstash4solr, but it didn't seem mature
enough.
does it work fine? any known bugs?

are you collecting the logs in the same solr cluster you use for the
production systems?
if so, what will you do if for some reason solr is down and you would like
to analyze the logs to see what happened?

btw, i started a new solr cluster with 7 shards, replicationFactor=3 and ran an
indexing job of 400K docs,
it got stuck on 150K because I used SocketAppender directly to write to
logstash and logstash disk got full.

that's why I moved to using AsyncAppender, and I plan on moving to using
rabbit.
but this is also why I wanted to filter some of the logs. indexing 150K docs
produced 50GB of logs.
this seemed too much.




Tim Potter wrote
> I'm using logstash4solr (http://logstash4solr.org) for something similar
> ...
>
> I setup my Solr to use Log4J by passing the following on the command-line
> when starting Solr:
> -Dlog4j.configuration=file:///$SCRIPT_DIR/log4j.properties
>
> Then I use a custom Log4J appender that writes to RabbitMQ:
>
> https://github.com/plant42/rabbitmq-log4j-appender
>
> You can then configure a RabbitMQ input for logstash -
> http://logstash.net/docs/1.3.2/inputs/rabbitmq
>
> This decouples the log writes from log indexing in logstash4solr, which
> scales better for active Solr installations.
>
> Btw ... I just log everything from Solr using this approach but you can
> use standard Log4J configuration settings to limit which classes / log
> levels to send to the RabbitMQ appender.
>
> Cheers,
>
> Timothy Potter
> Sr. Software Engineer, LucidWorks
> www.lucidworks.com
>
> 
> From: adfel70 <

> adfel70@

> >
> Sent: Monday, December 30, 2013 8:15 AM
> To:

> solr-user@.apache

> Subject: monitoring solr logs
>
> hi
> i'm trying to figure out which solr and zookeeper logs i should monitor
> and
> collect.
> All the logs will be written to a file but I want to collect some of them
> with logstash in order to be able to analyze them efficiently.
> any inputs on logs of which classes i should collect?
>
> thanks.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/monitoring-solr-logs-tp4108721.html
> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/monitoring-solr-logs-tp4108721p4108737.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: monitoring solr logs

2013-12-30 Thread Tim Potter
I've just been using the Solr query form so far :P but have plans to try out 
Kibana too. Let me know how that goes for you and I'll do the same.


From: adfel70 
Sent: Monday, December 30, 2013 10:06 AM
To: solr-user@lucene.apache.org
Subject: RE: monitoring solr logs

And are you using any tool like kibana as a dashboard for the logs?



Tim Potter wrote
> We're (LucidWorks) are actively developing on logstash4solr so if you have
> issues, let us know. So far, so good for me but I upgraded to logstash
> 1.3.2 even though the logstash4solr version includes 1.2.2 you can use the
> newer one. I'm not quite in production with my logstash4solr <- rabbit-mq
> <- log4j <- Solr solution yet though ;-)
>
> Yeah, 50GB is too much logging for only 150K docs. Maybe start by
> filtering by log level (WARN and more severe). If a server crashes, you're
> likely to see some errors in the logstash side but sometimes you may have
> to SSH to the specific box and look at the local log (so definitely append
> all messages to the local Solr log too), I'm using something like the
> following for local logging:
>
> log4j.rootLogger=INFO, file
> log4j.appender.file=org.apache.log4j.RollingFileAppender
> log4j.appender.file.MaxFileSize=50MB
> log4j.appender.file.MaxBackupIndex=10
> log4j.appender.file.File=logs/solr.log
> log4j.appender.file.layout=org.apache.log4j.PatternLayout
> log4j.appender.file.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{3}
> %x - %m%n
>
>
> Timothy Potter
> Sr. Software Engineer, LucidWorks
> www.lucidworks.com
>
> 
> From: adfel70 <

> adfel70@

> >
> Sent: Monday, December 30, 2013 9:34 AM
> To:

> solr-user@.apache

> Subject: RE: monitoring solr logs
>
> Actually I was considering using logstash4solr, but it didn't seem mature
> enough.
> does it work fine? any known bugs?
>
> are you collecting the logs in the same solr cluster you use for the
> production systems?
> if so, what will you do if for some reason solr is down and you would like
> to analyze the logs to see what happend?
>
> btw, i started a new solr cluster with 7 shards, replicationfactor=3 and
> run
> indexing job of 400K docs,
> it got stuck on 150K because I used Socketappender directly to write to
> logstash and logstash disk got full.
>
> that's why I moved to using AsyncAppender, and I plan on moving to using
> rabbit.
> but this is also why I wanted to filter some of the logs. indexing 150K
> docs
> prodcued 50GB of logs.
> this seemed too much.
>
>
>
>
> Tim Potter wrote
>> I'm using logstash4solr (http://logstash4solr.org) for something similar
>> ...
>>
>> I setup my Solr to use Log4J by passing the following on the command-line
>> when starting Solr:
>> -Dlog4j.configuration=file:///$SCRIPT_DIR/log4j.properties
>>
>> Then I use a custom Log4J appender that writes to RabbitMQ:
>>
>> https://github.com/plant42/rabbitmq-log4j-appender
>>
>> You can then configure a RabbitMQ input for logstash -
>> http://logstash.net/docs/1.3.2/inputs/rabbitmq
>>
>> This decouples the log writes from log indexing in logstash4solr, which
>> scales better for active Solr installations.
>>
>> Btw ... I just log everything from Solr using this approach but you can
>> use standard Log4J configuration settings to limit which classes / log
>> levels to send to the RabbitMQ appender.
>>
>> Cheers,
>>
>> Timothy Potter
>> Sr. Software Engineer, LucidWorks
>> www.lucidworks.com
>>
>> 
>> From: adfel70 <
>
>> adfel70@
>
>> >
>> Sent: Monday, December 30, 2013 8:15 AM
>> To:
>
>> solr-user@.apache
>
>> Subject: monitoring solr logs
>>
>> hi
>> i'm trying to figure out which solr and zookeeper logs i should monitor
>> and
>> collect.
>> All the logs will be written to a file but I want to collect some of them
>> with logstash in order to be able to analyze them efficiently.
>> any inputs on logs of which classes i should collect?
>>
>> thanks.
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/monitoring-solr-logs-tp4108721.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/monitoring-solr-logs-tp4108721p4108737.html
> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/monitoring-solr-logs-tp4108721p4108744.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: side logging requests

2013-12-30 Thread Tim Potter
You can wire-in a custom UpdateRequestProcessor - 
http://wiki.apache.org/solr/UpdateRequestProcessor
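
A bare-bones processor that copies each incoming document to a side log could
look roughly like this (untested sketch; class and logger names are made up,
and you'd register the factory in an updateRequestProcessorChain in
solrconfig.xml):

import java.io.IOException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SideLoggingUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new SideLoggingUpdateProcessor(next);
  }

  static class SideLoggingUpdateProcessor extends UpdateRequestProcessor {
    // separate logger so the side log can be routed to its own file/appender
    private static final Logger sideLog = LoggerFactory.getLogger("sidelog.updates");

    SideLoggingUpdateProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      // log (or manipulate) the document before it continues down the chain
      sideLog.info(cmd.getSolrInputDocument().toString());
      super.processAdd(cmd);
    }
  }
}

Queries could be handled in a similar way with a custom SearchComponent
appended to the /select handler's component list.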

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: elmerfudd 
Sent: Monday, December 30, 2013 10:26 AM
To: solr-user@lucene.apache.org
Subject: side logging requests

Hi all,
currently there are 2 things I want to accomplish.
I want that, on demand, every doc (xml) that is sent to be indexed in Solr
will be copied to a big log file (I want to control when to activate this
feature and when to deactivate it),
and the same for queries.
Also, I may need to manipulate the data before it's written.

Is there any way of achieving this without changing the Solr source code? (So it
won't be affected by updates.)

I thought of a possible way:
I posted before about making a "transparent" request handler. Is it possible?
If so, how?


thank you!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/side-logging-requests-tp4108752.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Redis as Solr Cache

2014-01-02 Thread Tim Vaillancourt
This is a neat idea, but could be too close to lucene/etc.

You could jump up one level in the stack and use Redis/memcache as a
distributed HTTP cache in conjunction with Solr's HTTP caching and a proxy.
I tried doing this myself with Nginx, but I forgot what issue I hit - I
think "misses" needed logic outside of nginx but I didn't spend too much
time on it.

Tim


On 2 January 2014 07:51, Alexander Ramos Jardim <
alexander.ramos.jar...@gmail.com> wrote:

> You touched an interesting point. I am really assuming if a quick win
> scenario is even possible. But what would be the advantage of using Redis
> to keep Solr Cache if each node would keep it's own Redis cache?
>
>
> 2013/12/29 Upayavira 
>
> > On Sun, Dec 29, 2013, at 02:35 PM, Alexander Ramos Jardim wrote:
> > > While researching for Solr Caching options and interesting cases, I
> > > bumped
> > > on this https://github.com/dfdeshom/solr-redis-cache. Does anyone has
> > any
> > > experience with this setup? Using Redis as Solr Cache.
> > >
> > > I see a lot of advantage in having a distributed cache for solr. One
> solr
> > > node benefiting from the cache generated on another one would be
> > > beautiful.
> > >
> > > I see problems too. Performance wise, I don't know if it would be
> viable
> > > for Solr to write it's cache through the network on Redis Master node.
> > >
> > > And what about if I have Solr nodes with different index version
> looking
> > > at
> > > the same cache?
> > >
> > > IMO as long as Redis is useful, if it isn't to have a distributed
> cache,
> > > I
> > > think it's not possible to get better performance using it.
> >
> > This idea makes assumptions about how a Solr/Lucene index operates.
> > Certainly, in a SolrCloud setup, each node is responsible for its own
> > committing, and its caches exist for the timespan between commits. Thus,
> > the cache one node will need will not necessarily be the same as the one
> > that is needed by another node, which might have a commit interval
> > slightly out of sync with the first.
> >
> > So, whilst this may be possible, and may give some benefits, I'd reckon
> > that it would be a rather substantial engineering exercise, rather than
> > the quick win you seem to be assuming it might be.
> >
> > Upayavira
> >
>
>
>
> --
> Alexander Ramos Jardim
>


RE: Solr Cloud Query Scaling

2014-01-09 Thread Tim Potter
Absolutely adding replicas helps you scale query load. Queries do not need to 
be routed to leaders; they can be handled by any replica in a shard. Leaders 
are only needed for handling update requests.

In general, a distributed query has two phases, driven by a controller node 
(what you called collator below). The controller is the Solr that received the 
query request from the client. In Phase 1, the controller distributes the query 
to one of the replicas for all shards and receives back the list of matching 
document IDs from each replica (only a page worth btw). 

The controller merges the results and sorts them to generate a final page of 
results to be returned to the client. In Phase 2, the controller collects all 
the fields from the documents to generate the final result set by querying the 
replicas involved in Phase 1.
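
You can actually see the two phases by adding debug=track to a distributed
query (a sketch; host and collection are placeholders) -- the track section of
the response shows the EXECUTE_QUERY and GET_FIELDS stages per shard:

curl "http://localhost:8983/solr/collection1/select?q=*:*&rows=10&debug=track&shards.info=true&wt=json"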

The controller uses SolrJ's LBSolrServer to query the shards in Phase 1 so you 
get some basic load-balancing amongst replicas for a shard. I've not done any 
research to see how balanced that selection process is in production but I 
suspect if you have 3 replicas in a shard, then roughly 1/3 of the queries go 
to each.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Sir Gilligan 
Sent: Thursday, January 09, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Solr Cloud Query Scaling

Question: Does adding replicas help with query load?

Scenario: 3 Physical Machines. 3 Shards
Query any machine, get results. Standard Solr Cloud stuff.

Update Scenario: 6 Physical Machines. 3 Shards.
M = Machine, S = Shard, -L = Leader
M1S1-L
M2S2
M3S3
M4S1
M5S2-L
M6S3-L

Incoming Query to M2S2. How will Solr Cloud (4.6.0) distribute the query?
Will M2S2 handle the query for shard 2? Or, will it send it to the
leader of S2 which is M5S2?
When the query is distributed, will it send it to the other leaders? OR,
will it send it to any shard?
Specifically:
Query sent to M2S2. Solr Cloud distributes the query. Could it possibly
send the query on to M3S3 and M4S1? Some kind of query load balance
functionality (maybe like a round robin to the shard members).
OR will M2S2 just be the collator, and send the query to the leaders?
OR something different that I have not described?

If queries do not have to be processed by leaders then we could add
three more physical machines (now total 9 machines) and handle more
query load.

Thank you.


Re: Perl Client for SolrCloud

2014-01-10 Thread Tim Vaillancourt
I'm pretty interested in taking a stab at a Perl CPAN for SolrCloud that 
is Zookeeper-aware; it's the least I can do for Solr as a non-Java 
developer. :)


A quick question though: how would I write the shard logic to behave 
similar to Java's Zookeeper-aware client? I'm able to get the hash/hex 
needed for each shard from clusterstate.json, but how do I know which 
field to hash on?


I'm guessing I also need to read the collection's schema.xml from 
Zookeeper to get uniqueKey, and then use that for sharding, or does the 
Java client take the sharding field as input? Looking for ideas here.


Thanks!

Tim

On 08/01/14 09:35 AM, Chris Hostetter wrote:

:>  I couldn't find anyone which can connect to SolrCloud similar to SolrJ's
:>  CloudSolrServer.
:
: Since I have a load balancer in front of 8 nodes, WebService::Solr[1] still
: works fine.

Right -- just because SolrJ is ZooKeeper aware doesn't mean you can *only*
talk to SolrCloud with SolrJ -- you can still use any HTTP client of your
choice to connect to your Solr nodes in a round robin fashion (or via a
load blancer) if you wish -- just like with a non SolrCloud deployment
using something like master/slave.

What you might want to consider, is taking a look at something like
Net::ZooKeeper to have a ZK aware perl client layer that could wrap
WebService::Solr.


-Hoss
http://www.lucidworks.com/


RE: Trying to config solr cloud

2014-01-21 Thread Tim Potter
Hi Svante,

It seems like the TermVectorComponent is in the search component chain of your 
/select search handler but you haven't indexed docs with term vectors enabled 
(at least from what's in the schema you provided). Admittedly, the NamedList 
code could be a little more paranoid but I think the key is to check the 
component chain of your /select handler to make sure tvComponent isn't included 
(or re-index with term vectors enabled).
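
If you do want term vectors, the fields the component uses need to be indexed
with them, e.g. (a sketch; the field name and type are placeholders):

<field name="includes" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>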

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: saka.csi...@gmail.com  on behalf of svante 
karlsson 
Sent: Tuesday, January 21, 2014 4:20 PM
To: solr-user@lucene.apache.org
Subject: Trying to config solr cloud

I've been playing around with solr 4.6.0 for some weeks and I'm trying to
get a solrcloud configuration running.

I've installed two physical machines and I'm trying to set up 4 shards on
each.

I installled a zookeeper on each host as well

I uploaded a config to zookeeper with
/opt/solr-4.6.0/example/cloud-scripts/zkcli.sh -cmd upconfig -zkhost
192.168.0.93:2181 -confdir /opt/solr/om5/conf/ -confname om5

The /opt/solr/om5 was where I kept my normal solr and I'm trying to reuse
that config.


now I start two hosts (one on each server)
java -DzkHost=192.168.0.93:2181,192.168.0.94:2181 -Dhost=192.168.0.93 -jar
start.jar
java -DzkHost=192.168.0.93:2181,192.168.0.94:2181 -Dhost=192.168.0.94 -jar
start.jar

and finally I'll run
curl '
http://192.168.0.93:8983/solr/admin/collections?action=CREATE&name=om5&numShards=8&replicationFactor=1&maxShardsPerNode=4
'

This gets me 8 shards in the web gui
http://192.168.0.94:8983/solr/#/~cloud

Now I add documents to this and that seems to work. I pushed 97 million
docs during the night. (Each shard reports an 8th of the documents.)

But all queries return HTTP 500 with variants of the result below. I get
correct data in the body but always an error trace after that...

http://192.168.0.93:8983/solr/om5/select?q=*:*&rows=1&fl=id

returns



<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">32</int>
  </lst>
  <result name="response" numFound="..." start="0">
    <doc>
      <str name="id">b1e5865c-3b01---0471b12d16ac</str>
    </doc>
  </result>
  <lst name="error">
    <str name="trace">
java.lang.NullPointerException at
org.apache.solr.common.util.NamedList.nameValueMapToList(NamedList.java:114)
at org.apache.solr.common.util.NamedList.(NamedList.java:80) at
org.apache.solr.handler.component.TermVectorComponent.finishStage(TermVectorComponent.java:453)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:317)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)

    </str>
    <int name="code">500</int>
  </lst>
</response>



So I must be doing something wrong

4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-05 Thread Tim Vaillancourt
Hey guys,

I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
shards over 4 Solr instances, (which results in 1 core per Solr instance).

After some time in Production without issues, we are seeing errors related
to the IndexWriter all over our logs and an infinite loop of failing
replication from Leader on our 2 replicas.

We see a flood of: "org.apache.lucene.store.AlreadyClosedException: this
IndexWriter is closed" stacktraces, then the Solr replica tries to
replicate/recover, then fails replication and then the following 2 errors
show up:

1) "SolrIndexWriter was not closed prior to finalize(), indicates a bug --
POSSIBLE RESOURCE LEAK!!!"
2) "Error closing IndexWriter, trying rollback" (which results in a
null-pointer exception).

I'm guessing the best way forward would be to upgrade to latest, but that
is an undertaking that will take significant time/testing. In the meantime,
is there anything I can do to mitigate or understand the issue more?

Does anyone know what the IndexWriter errors refer to?

Below is a URL to a .txt file with summarized portions of my solr.log. Any
help is really appreciated as always!!

http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt

Thanks all,

Tim


Re: 4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-06 Thread Tim Vaillancourt
Some more info to provide:

-Replication almost never completes following the "this IndexWriter is
closed" stacktraces.
-When the replication begins after "this IndexWriter is closed" error, over
a few hours the replica eventually fills the disk to 100% with index files
under data/. There are so many files in the data directory it can't be
listed and takes a very long time to delete. It seems the frequent
replications are filling the disk with new files whose sum is roughly 3
times larger than the real index. Is it leaking filehandles or forgetting
it has downloaded something?

Is this a better question for the lucene list? It seems (see below) that
this stacktrace is occurring in the lucene layer vs solr, but maybe someone
could confirm?

"ERROR [2014-01-27 18:28:49.368] [org.apache.solr.common.SolrException]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at
org.apache.lucene.index.DocumentsWriter.ensureOpen(DocumentsWriter.java:199)
at
org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:338)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:419)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1508)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:519)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:655)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
... "

Thanks!

Tim


On 5 February 2014 13:04, Tim Vaillancourt  wrote:

> Hey guys,
>
> I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
> shards over 4 Solr instances, (which results in 1 core per Solr instance).
>
> After some time in Production without issues, we are seeing errors related
> to the IndexWriter all over our logs and an infinite loop of failing
> replication from Leader on our 2 replicas.
>
> We see a flood of: "org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed" stacktraces, then the Solr replica tries to
> replicate/recover, then fails replication and then the following 2 errors
> show up:
>
> 1) "SolrIndexWriter was not closed prior to finalize(), indicates a bug --
> POSSIBLE RESOURCE LEAK!!!"
> 2) "Error closing IndexWriter, trying rollback" (which results in a
> null-pointer exception).
>
> I'm guessing the best way forward would be to upgrade to latest, but that
> is an undertaking that will take significant time/testing. In the meantime,
> is there anything I can do to mitigate or understand the issue more?
>
> Does anyone know what the IndexWriter errors refer to?
>
> Below is a URL to a .txt file with summarized portions of my solr.log. Any
> help is really appreciated as always!!
>
> http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt
>
> Thanks all,
>
> Tim
>


RE: Solr Permgen Exceptions when creating/removing cores

2014-02-26 Thread Tim Potter
Hi Josh,

Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM versions, 
permgen collection was disabled by default.

Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may be 
too small.


Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Josh 
Sent: Wednesday, February 26, 2014 12:27 PM
To: solr-user@lucene.apache.org
Subject: Solr Permgen Exceptions when creating/removing cores

We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows
installation with 64bit Java 1.7U51 and we are seeing consistent issues
with PermGen exceptions. We have the permgen configured to be 512MB.
Bitnami ships with a 32bit version of Java for windows and we are replacing
it with a 64bit version.

Passed in Java Options:

-XX:MaxPermSize=64M
-Xms3072M
-Xmx6144M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+CMSClassUnloadingEnabled
-XX:NewRatio=3

-XX:MaxTenuringThreshold=8

This is our use case:

We have what we call a database core which remains fairly static and
contains the imported contents of a table from SQL server. We then have
user cores which contain the record ids of results from a text search
outside of Solr. We then query for the data we want from the database core
and limit the results to the content of the user core. This allows us to
combine facet data from Solr with the search results from another engine.
We are creating the user cores on demand and removing them when the user
logs out.

Our issue is the constant creation and removal of user cores combined with
the constant importing seems to push us over our PermGen limit. The user
cores are removed at the end of every session and as a test I made an
application that would loop creating the user core, import a set of data to
it, query the database core using it as a limiter and then remove the user
core. My expectation was in this scenario that all the permgen associated
with that user cores would be freed upon it's unload and allow permgen to
reclaim that memory during a garbage collection. This was not the case, it
would constantly go up until the application would exhaust the memory.

I also investigated whether the there was a connection between the two
cores left behind because I was joining them together in a query but even
unloading the database core after unloading all the user cores won't
prevent the limit from being hit or any memory to be garbage collected from
Solr.

Is this a known issue with creating and unloading a large number of cores?
Could it be configuration based for the core? Is there something other than
unloading that needs to happen to free the references?

Thanks

Notes: I've tried using tools to determine if it's a leak within Solr such
as Plumbr and my activities turned up nothing.

RE: Replicating Between Solr Clouds

2014-03-04 Thread Tim Potter
Unfortunately, there is no out-of-the-box solution for this at the moment. 

In the past, I solved this using a couple of different approaches, which 
weren't all that elegant but served the purpose and were simple enough to allow 
the ops folks to setup monitors and alerts if things didn't work.

1) use DIH's Solr entity processor to pull data from one Solr to another, see: 
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

This only works if you store all fields, which in my use case was OK because I 
also did lots of partial document updates, which also required me to store all 
fields
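
The data-config.xml for that looks roughly like this (a sketch; the source
URL, rows and fl values are placeholders):

<dataConfig>
  <document>
    <entity name="sourceSolr"
            processor="SolrEntityProcessor"
            url="http://source-dc-solr:8983/solr/collection1"
            query="*:*"
            rows="500"
            fl="*"
            wt="javabin"/>
  </document>
</dataConfig>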

2) use the replication handler's snapshot support to create snapshots on a 
regular basis and then move the files over the network

This one works but required the use of read and write aliases and two 
collections on the remote (slave) data center so that I could rebuild my write 
collection from the snapshots and then update the aliases to point the reads at 
the updated collection. Work on an automated backup/restore solution is 
planned, see https://issues.apache.org/jira/browse/SOLR-5750, but if you need 
something sooner, you can write a backup driver using SolrJ that uses 
CloudSolrServer to get the address of all the shard leaders, initiate the 
backup command on each leader, poll the replication details handler for 
snapshot completion on each shard, and then ship the files across the network. 
Obviously, this isn't a solution for NRT multi-homing ;-)
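
The per-shard snapshot calls are just HTTP requests against each shard leader,
e.g. (a sketch; host, core name and location are placeholders):

# trigger a snapshot on one shard leader
curl "http://leader1:8983/solr/collection1_shard1_replica1/replication?command=backup&location=/backups/shard1"

# poll the replication details for snapshot completion
curl "http://leader1:8983/solr/collection1_shard1_replica1/replication?command=details&wt=json"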

Lastly, these aren't the only ways to go about this, just wanted to share some 
high-level details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: perdurabo 
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to setup a highly available failover site across a WAN for our
SolrCloud instance.  The main production instance is at colo center A and
consists of a 3-node ZooKeeper ensemble managing configs for a 4-node
SolrCloud running Solr 4.6.1.  We only have one collection among the 4 cores
and there are two shards in the collection, one master node and one replica
node for each shard.  Our search and indexing services address the Solr
cloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single node Solr
collections, but I do not see an obvious way for replicating a SolrCloud's
indices over a WAN to another SolrCloud.  I need for a SolrCloud in another
data center to be able to replicate both shards of the collection in the
other data center over a WAN.  It needs to be able to replicate from a load
balancer VIP, not a single named server of the SolrCloud, which round robins
across all four nodes/2 shards for high availability.

I've searched high and low for a white paper or some discussion of how to do
this and haven't found anything.  Any ideas?

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: SOLRJ and SOLR compatibility

2014-03-04 Thread Tim Potter
Just my 2 cents on this while I wait for a build ... I think we have to ensure 
that an older client will work with a newer server or newer client will work 
with older server to support hot rolling upgrades. It's not unheard of these 
days for an org to have 10's (or even 100's) of Solr cloud servers. As Solr is 
a mission-critical technology, sometimes it can't just be taken off-line, so 
most need to upgrade servers one-by-one. This implies that during a hot rolling 
upgrade, there's going to be a mix of Solr server versions talking to each 
other and clients talking to different versions of servers. You can't take it 
out of the LB either since eventually, you'll have no nodes in your LB. 

I think ops folks will accept either solution (old client -> new server or new 
client -> old server), but we as a community need to pick one and build out the 
test suites that ensure SolrJ compatibility with different versions.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Michael Sokolov 
Sent: Tuesday, March 04, 2014 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLRJ and SOLR compatibility

Does that mean newer clients work with older servers (I think so, from
reading this thread), or the other way round?  If so, I guess the advice
would be --  upgrade all your clients first?

-Mike


On 03/04/2014 10:00 AM, Mark Miller wrote:
> Yeah, sorry :(  the fix applied is only for compatibility in one direction. 
> Older code won’t know what this type 19 is.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Mar 4, 2014, at 2:42 AM, Thomas Scheffler  
> wrote:
>
>> Am 04.03.2014 07:21, schrieb Thomas Scheffler:
>>> Am 27.02.2014 09:15, schrieb Shawn Heisey:
 On 2/27/2014 12:49 AM, Thomas Scheffler wrote:
>> What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
>> that I'm completely ignorant here, but I have not heard of any.
> Actually bug reports arrive me that sound like
>
> "Unknown type 19"
 Aha!  I found it!  It was caused by the change applied for SOLR-5658,
 fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
 a bug bad enough to contradict what I told you.

 https://issues.apache.org/jira/browse/SOLR-5658
 https://issues.apache.org/jira/browse/SOLR-5762

 I've added a comment that will help users find SOLR-5762 with a search
 for "Unknown type 19".

 If you use SolrJ 4.7.0, compatibility should be better.
>>> Hi,
>>>
>>> I am sorry to inform you that SolrJ 4.7.0 face the same issue with SOLR
>>> 4.5.1. I received a client stack trace this morning and still waiting
>>> for a Log-Output from the Server:
>> Here we go for the server side (4.5.1):
>>
>> Mrz 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
>> Information: [clausthal_test] webapp=/solr path=/select
>> params={fl=*,score&sort=mods.dateIssued+desc&q=%2BobjectType:"mods"+%2Bcategory:"clausthal_status\:published"&wt=javabin&version=2&rows=3}
>> hits=186 status=0 QTime=2
>> Mrz 03, 2014 2:39:38 PM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> Information: [clausthal_test] webapp=/solr path=/update
>> params={wt=javabin&version=2} {} 0 0
>> Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
>> Schwerwiegend: java.lang.RuntimeException: Unknown type 19
>> at
>> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
>> at
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
>> at
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
>> at
>> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>> at
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
>> at
>> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>> at
>> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>> at
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
>> at
>> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
>> at
>> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
>> at
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>> at
>> org.apache.solr.servlet.Sol

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-14 Thread Tim Potter
Try adding shards.info=true and debug=track to your queries ... these will
give more detailed information about what's going on behind the scenes.
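
For example, reusing the query from below (a sketch; substitute your own host,
collection and filter) -- the shards.info section shows which replica answered
for each shard and how many docs it found, which should reveal whether one
replica is missing the document:

curl "http://server1.mydomain.com:8081/solr/dyCollection1/select?q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&wt=json&shards.info=true&debug=track"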

On Mon, Oct 13, 2014 at 11:11 PM, S.L  wrote:

> Erick,
>
> I have upgraded to SolrCloud 4.10.1 with the same toplogy , 3 shards and 2
> replication factor with six cores altogether.
>
> Unfortunately , I still see the issue of intermittently no results being
> returned.I am not able to figure out whats going on here, I have included
> the logging information below.
>
> *Here's the query that I run.*
>
>
> http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true
>
>
>
> *Scenario 1: No result returned.*
>
> *Log Information for Scenario #1 .*
> 92860314 [http-bio-8081-exec-103] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> null
> 92860315 [http-bio-8081-exec-103] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> null
> 92860315 [http-bio-8081-exec-103] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> null
> 92860315 [http-bio-8081-exec-103] INFO  org.apache.solr.core.SolrCore  –
> [dyCollection1_shard2_replica1] webapp=/solr path=/select/
>
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> hits=0 status=0 QTime=5
>
> *Scenario #2 : I get result back*
>
>
>
> *Log information for scenario #2.*92881911 [http-bio-8081-exec-177] INFO
> org.apache.solr.core.SolrCore  – [dyCollection1_shard2_replica1]
> webapp=/solr path=/select
>
> params={spellcheck=true&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&fl=productURL,score&df=suggestAggregate&start=0&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fsv=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> }
> hits=1 status=0 QTime=1
> 92881913 [http-bio-8081-exec-177] INFO  org.apache.solr.core.SolrCore  –
> [dyCollection1_shard2_replica1] webapp=/solr path=/select
>
> params={spellcheck=false&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&ids=
>
> http://www.searcheddomain.com/p/ironwork-8-piece-comforter-set/-/A-15273248&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&df=suggestAggregate&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> }
> status=0 QTime=0
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> null
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> null
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> null
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> null
> 92881915 [http-bio-8081-exec-169] INFO  org.apache.solr.core.SolrCore  –
> [dyCollection1_shard2_replica1] webapp=/solr path=/select/
>
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> hits=1 status=0 QTime=7
>
>
> *Autocommit and Soft commit settings.*
>
>  <autoSoftCommit>
>    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>  </autoSoftCommit>
>
>  <autoCommit>
>    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>    <openSearcher>true</openSearcher>
>  </autoCommit>
>
>
> On Tue, Oct 7, 2014 a

Re: Recovering from Out of Mem

2014-10-14 Thread Tim Potter
jfyi - the bin/solr script does the following:

-XX:OnOutOfMemoryError="$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT" where
$SOLR_PORT is the port Solr is bound to, e.g. 8983

The oom_solr.sh script looks like:

SOLR_PORT=$1

SOLR_PID=`ps waux | grep start.jar | grep $SOLR_PORT | grep -v grep | awk '{print $2}' | sort -r`

if [ "$SOLR_PID" == "" ]; then

  echo "Couldn't find Solr process running on port $SOLR_PORT!"

  exit

fi

NOW=$(date +"%F%T")

(

echo "Running OOM killer script for process $SOLR_PID for Solr on port
$SOLR_PORT"

kill -9 $SOLR_PID

echo "Killed process $SOLR_PID"

) | tee solr_oom_killer-$SOLR_PORT-$NOW.log


I usually run Solr behind a supervisor type process (supervisord or
upstart) that will restart it if the process dies.
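
For reference, a minimal supervisord program entry for this kind of setup
might look like the following -- the paths, port and user here are
illustrative, not taken from any particular install:

[program:solr]
command=/opt/solr/bin/solr start -f -p 8983
directory=/opt/solr
user=solr
autostart=true
autorestart=true
stopsignal=TERM

Running Solr in the foreground (-f) is what lets supervisord notice that the
process has died (e.g. after the OOM killer script fires) and start it again.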


On Tue, Oct 14, 2014 at 8:09 AM, Markus Jelsma  wrote:

> This will do:
> kill -9 `ps aux | grep -v grep | grep tomcat6 | awk '{print $2}'`
>
> pkill should also work
>
> On Tuesday 14 October 2014 07:02:03 Yago Riveiro wrote:
> > Boogie,
> >
> >
> >
> >
> > Any example for java_error.sh script?
> >
> >
> > —
> > /Yago Riveiro
> >
> > On Tue, Oct 14, 2014 at 2:48 PM, Boogie Shafer <
> boogie.sha...@proquest.com>
> >
> > wrote:
> > > a really simple approach is to have the OOM generate an email
> > > e.g.
> > > 1) create a simple script (call it java_oom.sh) and drop it in your
> tomcat
> > > bin dir echo `date` | mail -s "Java Error: OutOfMemory - $HOSTNAME"
> > > not...@domain.com 2) configure your java options (in setenv.sh or
> > > similar) to trigger heap dump and the email script when OOM occurs #
> > > config error behaviors
> > > CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError
> > > -XX:HeapDumpPath=$TOMCAT_DIR/temp/tomcat-dump.hprof
> > > -XX:OnError=$TOMCAT_DIR/bin/java_error.sh
> > > -XX:OnOutOfMemoryError=$TOMCAT_DIR/bin/java_oom.sh
> > > -XX:ErrorFile=$TOMCAT_DIR/temp/java_error%p.log"
> > > 
> > > From: Mark Miller 
> > > Sent: Tuesday, October 14, 2014 06:30
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Recovering from Out of Mem
> > > Best is to pass the Java cmd line option that kills the process on OOM
> and
> > > setup a supervisor on the process to restart it.  You need a somewhat
> > > recent release for this to work properly though. - Mark
> > >
> > >> On Oct 14, 2014, at 9:06 AM, Salman Akram
> > >>  wrote:
> > >>
> > >> I know there are some suggestions to avoid OOM issue e.g. setting
> > >> appropriate Max Heap size etc. However, what's the best way to recover
> > >> from
> > >> it as it goes into non-responding state? We are using Tomcat on back
> end.
> > >>
> > >> The scenario is that once we face OOM issue it keeps on taking queries
> > >> (doesn't give any error) but they just time out. So even though we
> have a
> > >> fail over system implemented but we don't have a way to distinguish if
> > >> these are real time out queries OR due to OOM.
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Salman Akram
>
>


Re: Frequent recovery of nodes in SolrCloud

2014-10-17 Thread Tim Potter
A couple of things to check:

1) How many znodes are under the /overseer/queue (which you can see in the
Cloud Tree panel in the Admin UI)
2) How often are you committing? The general advice is that your indexing
client(s) should not send commits and instead rely on auto-commit settings
in solrconfig.xml. I usually start with a hard auto-commit every 60secs (see
the sample stanza after this list)
3) Anything in the logs telling you why a replica thinks it needs to
recover? Specifically, I'd search for ZooKeeper session expiration log
messages (grep expired solr.log)
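
For item 1, the CLI that ships with ZooKeeper can list the queue directly
(host/port illustrative):

./zkCli.sh -server localhost:2181 ls /overseer/queue

A queue with thousands of entries usually means the Overseer cannot keep up.
For item 2, a hard auto-commit every 60 seconds -- sketched here after the
stock solrconfig.xml stanza, not copied from the poster's config -- would be:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

With openSearcher=false the hard commit only flushes segments to disk and
rolls over the transaction log; document visibility is governed separately by
autoSoftCommit (or explicit soft commits).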



On Thu, Oct 16, 2014 at 10:01 PM, Sachin Kale  wrote:

> Also, the PingRequestHandler is configured as:
>
> [requestHandler definition stripped by the list archive; the only surviving
> value is the healthcheck file name, server-enabled.txt]
>
>
> On Fri, Oct 17, 2014 at 9:07 AM, Sachin Kale 
> wrote:
>
> > From ZooKeeper side, we have following configuration:
> > tickTime=2000
> > dataDir=/var/lib/zookeeper
> > clientPort=2181
> > initLimit=5
> > syncLimit=2
> > server.1=192.168.70.27:2888:3888
> > server.2=192.168.70.64:2889:3889
> > server.3=192.168.70.26:2889:3889
> >
> > Also, in solr.xml, we have zkClientTimeout set to 3.
> >
> > On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson  >
> > wrote:
> >
> >> And what is your zookeeper timeout? When it's too short that can lead
> >> to this behavior.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Oct 16, 2014 at 4:34 PM, "Jürgen Wagner (DVT)"
> >>  wrote:
> >> > Hello,
> >> >   you have one shard and 11 replicas? Hmm...
> >> >
> >> > - Why you have to keep two nodes on some machines?
> >> > - Physical hardware or virtual machines?
> >> > - What is the size of this index?
> >> > - Is this all on a local network or are there links with potential
> >> outages
> >> > or failures in between?
> >> > - What is the query load?
> >> > - Have you had a look at garbage collection?
> >> > - Do you use the internal Zookeeper?
> >> > - How many nodes?
> >> > - Any observers?
> >> > - What kind of load does Zookeeper show?
> >> > - How much RAM do these nodes have available?
> >> > - Do some servers get into swapping?
> >> > - ...
> >> >
> >> > How about some more details in terms of sizing and topology?
> >> >
> >> > Cheers,
> >> > --Jürgen
> >> >
> >> >
> >> > On 16.10.2014 18:41, sachinpkale wrote:
> >> >
> >> > Hi,
> >> >
> >> > Recently we have shifted to SolrCloud (4.10.1) from a traditional
> >> > Master-Slave configuration. We have only one collection and it has only
> >> > one shard. The cloud cluster contains 12 nodes in total (on 8 machines;
> >> > on 4 of the machines we run two instances each), out of which one is
> >> > the leader.
> >> >
> >> > Whenever I see the cluster status using http://<host>:<port>/solr/#/~cloud,
> >> > it shows at least one node (sometimes 2-3) with status "recovering". We
> >> > are using the HAProxy load balancer and there too, many times, it shows
> >> > nodes as recovering. This is happening for all nodes in the cluster.
> >> >
> >> > What would be the problem here? How do I check this in logs?
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >> >
> >>
> http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
> >> > уважением
> >> > i.A. Jürgen Wagner
> >> > Head of Competence Center "Intelligence"
> >> > & Senior Cloud Consultant
> >> >
> >> > Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
> >> > Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864
> >> 1543
> >> > E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
> >> >
> >> > 
> >> > Managing Board: Jürgen Hatzipantelis (CEO)
> >> > Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
> >> > Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
> >> >
> >> >
> >>
> >
> >
>


Re: Recovering from Out of Mem

2014-10-17 Thread Tim Potter
You'd still want to kill it ... so you'll need to register a cmd script
with the JVM using -XX:OnOutOfMemoryError=kill.cmd and then you could
either

1) trap the PID at startup using something like:

title SolrCloud

for /F "tokens=2 delims= " %%A in ('TASKLIST /FI ^"WINDOWTITLE eq SolrCloud^" /NH') do (
  set /A SOLR_PID=%%A
  echo !SOLR_PID!>solr.pid
)


or


2) if you keep track of the port (which all my Windows scripts do), then
you can do:


For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find
":%SOLR_PORT%"') do (

  taskkill /t /f /pid %%j > nul 2>&1

)
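
Tying it together, a minimal kill.cmd for the -XX:OnOutOfMemoryError hook
could look like this (illustrative sketch; it assumes the PID was written to
solr.pid as in option 1 above, and note that option 1 as written relies on
delayed expansion -- setlocal EnableDelayedExpansion -- for the !SOLR_PID!
echo):

@echo off
rem Kill the Solr JVM whose PID was saved at startup
set /p SOLR_PID=<solr.pid
if "%SOLR_PID%"=="" (
  echo Could not read solr.pid
  exit /b 1
)
taskkill /t /f /pid %SOLR_PID% > nul 2>&1
echo %date% %time% killed Solr process %SOLR_PID% >> solr_oom_killer.log

As on Linux, something else (a Windows service wrapper such as NSSM, or a
scheduled task that probes the port) then has to restart Solr after the kill.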


On Fri, Oct 17, 2014 at 1:11 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> I know this might sound weird, but is there an easy way to do it on Windows?
>
> On Tue, Oct 14, 2014 at 7:51 PM, Boogie Shafer  >
> wrote:
>
> > yago,
> >
> > you can put more complex restart logic as shown in the examples below or
> > just do something similar to the java_oom.sh i posted earlier where you
> > just spit out an email alert and deal with service restarts and
> > troubleshooting manually
> >
> >
> > e.g. something like the following for a java_error.sh will drop an email
> > with a timestamp
> >
> >
> >
> > echo `date` | mail -s "Java Error: General - $HOSTNAME"
> not...@domain.com
> >
> >
> > 
> > From: Tim Potter 
> > Sent: Tuesday, October 14, 2014 07:35
> > To: solr-user@lucene.apache.org
> > Subject: Re: Recovering from Out of Mem
> >
> > jfyi - the bin/solr script does the following:
> >
> > -XX:OnOutOfMemoryError="$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT" where
> > $SOLR_PORT is the port Solr is bound to, e.g. 8983
> >
> > The oom_solr.sh script looks like:
> >
> > SOLR_PORT=$1
> >
> > SOLR_PID=`ps waux | grep start.jar | grep $SOLR_PORT | grep -v grep | awk
> > '{print $2}' | sort -r`
> >
> > if [ "$SOLR_PID" == "" ]; then
> >
> >   echo "Couldn't find Solr process running on port $SOLR_PORT!"
> >
> >   exit
> >
> > fi
> >
> > NOW=$(date +"%F%T")
> >
> > (
> >
> > echo "Running OOM killer script for process $SOLR_PID for Solr on port
> > $SOLR_PORT"
> >
> > kill -9 $SOLR_PID
> >
> > echo "Killed process $SOLR_PID"
> >
> > ) | tee solr_oom_killer-$SOLR_PORT-$NOW.log
> >
> >
> > I usually run Solr behind a supervisor type process (supervisord or
> > upstart) that will restart it if the process dies.
> >
> >
> > On Tue, Oct 14, 2014 at 8:09 AM, Markus Jelsma 
> > wrote:
> >
> > > This will do:
> > > kill -9 `ps aux | grep -v grep | grep tomcat6 | awk '{print $2}'`
> > >
> > > pkill should also work
> > >
> > > On Tuesday 14 October 2014 07:02:03 Yago Riveiro wrote:
> > > > Boogie,
> > > >
> > > >
> > > >
> > > >
> > > > Any example for java_error.sh script?
> > > >
> > > >
> > > > —
> > > > /Yago Riveiro
> > > >
> > > > On Tue, Oct 14, 2014 at 2:48 PM, Boogie Shafer <
> > > boogie.sha...@proquest.com>
> > > >
> > > > wrote:
> > > > > a really simple approach is to have the OOM generate an email
> > > > > e.g.
> > > > > 1) create a simple script (call it java_oom.sh) and drop it in your
> > > tomcat
> > > > > bin dir echo `date` | mail -s "Java Error: OutOfMemory - $HOSTNAME"
> > > > > not...@domain.com 2) configure your java options (in setenv.sh or
> > > > > similar) to trigger heap dump and the email script when OOM occurs
> #
> > > > > config error behaviors
> > > > > CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError
> > > > > -XX:HeapDumpPath=$TOMCAT_DIR/temp/tomcat-dump.hprof
> > > > > -XX:OnError=$TOMCAT_DIR/bin/java_error.sh
> > > > > -XX:OnOutOfMemoryError=$TOMCAT_DIR/bin/java_oom.sh
> > > > > -XX:ErrorFile=$TOMCAT_DIR/temp/java_error%p.log"
> > > > > 
> > > > > From: Mark Miller 
> > > > > Sent: Tuesday, October 14, 2014 06:30
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Recovering from Out of Mem
> > > > > Best is to pass the Java cmd line option that kills the process on
> > OOM
> > > and
> > > > > setup a supervisor on the process to restart it.  You need a
> somewhat
> > > > > recent release for this to work properly though. - Mark
> > > > >
> > > > >> On Oct 14, 2014, at 9:06 AM, Salman Akram
> > > > >>  wrote:
> > > > >>
> > > > >> I know there are some suggestions to avoid OOM issue e.g. setting
> > > > >> appropriate Max Heap size etc. However, what's the best way to
> > recover
> > > > >> from
> > > > >> it as it goes into non-responding state? We are using Tomcat on
> back
> > > end.
> > > > >>
> > > > >> The scenario is that once we face OOM issue it keeps on taking
> > queries
> > > > >> (doesn't give any error) but they just time out. So even though we
> > > have a
> > > > >> fail over system implemented but we don't have a way to
> distinguish
> > if
> > > > >> these are real time out queries OR due to OOM.
> > > > >>
> > > > >> --
> > > > >> Regards,
> > > > >>
> > > > >> Salman Akram
> > >
> > >
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>


Solr error : sorry, no dataimport-handler defined!

2014-11-02 Thread Tim Dunphy
Hey guys,

 I'm real new at working with Solr. But I need to get up to speed and I
appreciate your bearing with me.

 I've installed solr 4 and am running it under tomcat 7. The install went
perfectly fine and everything seems to work, up to a point. I've even
automated the installation with puppet which gets everything up and running
perfectly as well.

 However my problem is that I need to be able to import some data from a
mysql database.

 I've followed this tutorial to try and do this:


http://www.beingjavaguys.com/2013/01/how-to-use-solr-data-import-handler-to.html


I've added a file called data-config.xml to the following location under my
solr root:

[root@solr1:/opt/solr/collection1/conf] #cat data-config.xml

[The XML markup of data-config.xml did not survive the list archive. From the
fragments that remain (and the copy quoted in the reply further down), it
defines a JdbcDataSource with url="jdbc:mysql://web1.mydomain.com:3306/mydomain",
user="admin" and batchSize="1", plus a list of field mappings that includes
one named "user_activation_key".]

And added the following section to my /opt/solr/collection1/conf/solrconfig.xml:

[Also stripped by the archive; what survives shows a requestHandler named
"/dataimport" of class org.apache.solr.handler.dataimport.DataImportHandler,
with a "defaults" list whose "config" entry is data-config.xml.]
Then restart tomcat. I then navigate to collection1 -> data import in the
solr admin interface and see the following response:

sorry, no dataimport-handler defined!
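
For reference, since the XML above did not survive the archive, here is a
minimal sketch of what such a data-config.xml and the matching solrconfig.xml
entry usually look like -- the entity name, SQL query and column/field names
below are illustrative, not the originals:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://web1.mydomain.com:3306/mydomain"
              user="admin" password="secret" batchSize="1"/>
  <document>
    <!-- illustrative entity/query; replace with your own table and columns -->
    <entity name="users"
            query="SELECT id, user_login, user_email, user_activation_key FROM users">
      <field column="id" name="id"/>
      <field column="user_login" name="user_login"/>
      <field column="user_email" name="user_email"/>
      <field column="user_activation_key" name="user_activation_key"/>
    </entity>
  </document>
</dataConfig>

and, inside the <config> element of solrconfig.xml:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

Note that the DataImportHandler classes are not on Solr's classpath by
default: the stock example solrconfig.xml pulls them in with a <lib> directive
such as <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />,
and the MySQL JDBC driver jar needs to be available too. The "sorry, no
dataimport-handler defined!" message comes from the admin UI when no handler
of this class is registered in the core, so it is worth confirming that the
definition really ends up in the solrconfig.xml the running core loads.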

I am ok with parsing XML with my eyes. I've worked in some big
environments, where I've had to read it until my eyes bled! hah.. but I am
not sure if I am placing the section in solrconfig.xml where it needs to be.
I'm probably missing something obvious since I'm so new at using solr. I'm
hoping someone with more experience can point me in the right direction.

I'm enclosing my solrconfig.xml files and data-config.xml in case someone
wants to get a sense of the context that I'm working with.

Thanks!
-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

[Attachments: data-config.xml and solrconfig.xml. The XML markup was stripped
by the list archive, so only scattered element values survive -- among them
luceneMatchVersion 4.10.1, the stock ${solr.data.dir:}, ${solr.lock.type:native}
and ${solr.ulog.dir:} settings, autoCommit maxTime 15000 with openSearcher
false, autoSoftCommit maxTime -1, the /select, /query, /export and /browse
(Solritas/edismax) handlers from the example config, the /update/extract
defaults, and a /dataimport requestHandler whose defaults point at
data-config.xml.]

Re: Solr error : sorry, no dataimport-handler defined!

2014-11-02 Thread Tim Dunphy
I tried that, but I still don't have the ability to import data
from mysql. :(

Any other ideas?

Thanks,

Tim




On Sun, Nov 2, 2014 at 8:56 PM, Alexandre Rafalovitch 
wrote:

> Well,
>
> I thought the "" and the ending span were a broken-email thing, but they
> seem to be in the solrconfig.xml file as well. I would start by removing
> those and leaving just the actual definition.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 2 November 2014 20:50, Tim Dunphy  wrote:
> >
> > Hey guys,
> >
> >  I'm real new at working with Solr. But I need to get up to speed and I
> > appreciate your bearing with me.
> >
> >  I've installed solr 4 and am running it under tomcat 7. The install went
> > perfectly fine and everything seems to work, up to a point. I've even
> > automated the installation with puppet which gets everything up and
> running
> > perfectly as well.
> >
> >  However my problem is that I need to be able to import some data from a
> > mysql database.
> >
> >  I've followed this tutorial to try and do this:
> >
> >
> >
> http://www.beingjavaguys.com/2013/01/how-to-use-solr-data-import-handler-to.html
> >
> >
> > I've added a file called data-config.xml to the following location under
> my
> > solr root:
> >
> > [root@solr1:/opt/solr/collection1/conf] #cat data-config.xml
> >
> > [data-config.xml markup stripped by the archive; surviving fragments:
> > url="jdbc:mysql://web1.mydomain.com:3306/mydomain" user="admin"
> > password="secret" batchSize="1" ... name="user_activation_key" ...]
> >
> > And added the following section to my
> > /opt/solr/collection1/conf/solrconfig.xml
> >
> > [stripped; surviving fragments: name="/dataimport"
> > class="org.apache.solr.handler.dataimport.DataImportHandler" ...
> > data-config.xml]
> >
> >
> >
> > Then restart tomcat. I then navigate to collection1 -> data import in the
> > solr admin interface and see the following response:
> >
> > sorry, no dataimport-handler defined!
> >
> > I am ok with parsing XML with my eyes. I've worked in some big
> environments,
> > where I've had to read it until my eyes bled! hah.. but I am not sure if
> I
> > am placing the section in solrconfig.xml that it needs to be. I'm
> probably
> > missing something obvious since I'm so new at using solr. I'm hoping
> someone
> > with more experience can point me in the right direction.
> >
> > I'm enclosing my solrconfig.xml files and data-config.xml in case someone
> > wants to get a sense of the context that I'm working with.
> >
> > Thanks!
> >
> > --
> > GPG me!!
> >
> > gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> >
>



-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

