Re: Specifying a different txn log directory

2016-01-09 Thread Mark Miller
dataDir and tlog dir cannot be changed with a core reload.

- Mark

On Sat, Jan 9, 2016 at 1:20 PM Erick Erickson 
wrote:

> Please show us exactly what you did, and exactly
> what you saw that makes you say it "does not seem to work".
>
> Best,
> Erick
>
> On Fri, Jan 8, 2016 at 7:47 PM, KNitin  wrote:
> > Hi,
> >
> > How do I specify a different directory for transaction logs? I tried
> using
> > the updatelog entry in solrconfig.xml and reloaded the collection but
> that
> > does not seem to work.
> >
> > Is there another setting I need to change?
> >
> > Thanks
> > Nitin
>
-- 
- Mark
about.me/markrmiller


Re: Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable)

2016-01-11 Thread Mark Miller
Not sure I'm onboard with the first proposed solution, but yes, I'd open a
JIRA issue to discuss.

- Mark

On Mon, Jan 11, 2016 at 4:01 AM Konstantin Hollerith 
wrote:

> Hi,
>
> I'm using SLF4J MDC to log additional Information in my WebApp. Some of my
> MDC-Parameters even include Line-Breaks.
> It seems, that Solr takes _all_ MDC parameters and puts them into the
> Thread-Name, see
>
> org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable).
>
> When there is some logging of Solr, the log gets cluttered:
>
> [11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169
> [zkCallback-14-thread-1-processing-My
> Custom
> MDC
> Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN
> common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh]
> [user=SANDHO]: zkClient received AuthFailed
>
> (some of my MDC-Parameters are only active in Email-Logs and are not
> included in the file-log)
>
> I think this is a Bug. Solr should only put its own MDC-Parameter into the
> Thread-Name.
>
> Possible Solution: Since all (as far as I can check) invocations of MDC.put
> in Solr use a prefix like "ConcurrentUpdateSolrClient" or
> "CloudSolrClient" etc., it would be possible to put a check into
> MDCAwareThreadPoolExecutor.execute(Runnable) that processes only those
> prefixes.
>
> Should i open a Jira-Issue for this?
>
> Thanks,
>
> Konstantin
>
> Environment: JSF-Based App with WebSphrere 8.5, Solr 5.3.0, slf4j-1.7.12,
> all jars are in WEB-INF/lib.
>
-- 
- Mark
about.me/markrmiller


Re: Solr has multiple log lines for single search

2016-01-11 Thread Mark Miller
Two of them are sub requests. They have params isShard=true and
distrib=false. The top level user query will not have distrib or isShard
because they default the other way.
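
For what it's worth, a small illustrative sketch (plain Java, a hypothetical
helper, not part of Solr) of pulling the user-entered q= out of the top-level
/select log lines while skipping the isShard=true sub-requests:

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopLevelQueryExtractor {
  private static final Pattern Q_PARAM = Pattern.compile("[{&?]q=([^&}]*)");

  // Only top-level /select requests are considered; the per-shard
  // sub-requests are skipped because they carry isShard=true.
  static Optional<String> userQuery(String logLine) {
    if (!logLine.contains("path=/select") || logLine.contains("isShard=true")) {
      return Optional.empty();
    }
    Matcher m = Q_PARAM.matcher(logLine);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }
}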

- Mark

On Mon, Jan 11, 2016 at 6:30 AM Syed Mudasseer 
wrote:

> Hi,
> I have solr configured on cloud with the following details:
> Every collection has 3 shards and each shard consists of 3 replicas.
> Whenever I search for any field in solr, having faceting and highlighting
> query checked, then I get more than 2 search logs stored in the log file.
> (sometimes, it goes up to 8 log lines).
> I am trying to get the search terms entered by user, but due to duplicate
> records I am not able to decide which query is more appropriate to parse.
> Here is an example of log lines(field search with faceting) gives me 3
> results in the log,
> INFO  - 2016-01-11 11:07:09.321; org.apache.solr.core.SolrCore;
> [mycollection_shard2_replica1] webapp=/solr path=/select
> params={f.ab_model.facet.limit=160&lowercaseOperators=true&facet=true&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=false&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=
> http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&fl=id&fl=score&df=search&start=0&q=MySearchTerm&f.ab_model.facet.mincount=0&_=9652510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true&fsv=true}
> hits=753 status=0 QTime=1
> INFO  - 2016-01-11 11:07:09.349; org.apache.solr.core.SolrCore;
> [mycollection_shard2_replica1] webapp=/solr path=/select
> params={lowercaseOperators=true&facet=false&ids=2547891056_HDR,3618199460_HDR,3618192453_HDR,3618277839_HDR,3618186992_HDR,3618081995_HDR,3618074192_HDR,3618189660_HDR,3618073929_HDR,3618078287_HDR,3618084580_HDR,3618075438_HDR,3618170375_HDR,3618195949_HDR,3618074030_HDR,3618085730_HDR,3618078288_HDR,3618072500_HDR,3618086961_HDR,3618170928_HDR,3618077108_HDR,3618074090_HDR,3618181279_HDR,3618188058_HDR,3618181018_HDR,3618199309_HDR,3618195610_HDR,3618281575_HDR,3618195568_HDR,3618080877_HDR,3618199114_HDR,3618199132_HDR,3618084030_HDR,3618280868_HDR,3618193086_HDR,3618275194_HDR,3618074917_HDR,3618195102_HDR,3618086958_HDR,3618084870_HDR,3618174630_HDR,3618075776_HDR,3618190529_HDR,3618192993_HDR,3618084217_HDR,3618176677_HDR,3618183612_HDR&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=true&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=
> http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&df=search&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true}
> status=0 QTime=15
> INFO  - 2016-01-11 11:07:09.352; org.apache.solr.core.SolrCore;
> [mycollection_shard1_replica1] webapp=/solr path=/select
> params={lowercaseOperators=true&facet=true&indent=true&qf=description&hl.simple.pre=&wt=json&hl=true&defType=edismax&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&stopwords=true}
> hits=2276 status=0 QTime=35
> If I have highlighted query checked, then I get more than 3 logs.
> So my question is: which line is more appropriate to get the search query
> entered by the user? Or should I consider all of the log lines?
>

-- 
- Mark
about.me/markrmiller


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-03 Thread Mark Miller
You get this when the Overseer is either bogged down or not processing
events generally.

The Overseer is way, way faster at processing events in 5x.

If you search your logs for .Overseer you can see what it's doing: either
nothing at the time, or, more likely, bogged down processing state updates.

Along with 5x Overseer processing being much more efficient, SOLR-7281 is
going to take out a lot of state publishing on shutdown that can end up
getting processed on the next startup.

- Mark

On Wed, Feb 3, 2016 at 6:39 PM hawk  wrote:

> Here are more details around the event.
>
> 160201 11:57:22.272 http-bio-8082-exec-18 [] webapp=/solr path=/update
> params={waitSearcher=true&distrib.from=http://x:x
> /solr//&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
> {commit=} 0 134
>
> 160201 11:57:25.993 RecoveryThread Error while trying to recover.
> core=x
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr//
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at
>
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
> at
>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr//
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> 160201 11:57:25.993 RecoveryThread Recovery failed - trying again... (7)
> core=
>
> 160201 11:57:25.994 RecoveryThread Wait 256.0 seconds before trying to
> recover again (8)
>
> 160201 11:57:30.370 http-bio-8082-exec-3
> org.apache.solr.common.SolrException: no servers hosting shard:
> at
>
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
> at
>
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255073.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-04 Thread Mark Miller
Only INFO level, so I suspect not bad...

If that Overseer closed, another node should have picked up where it left
off. See that in another log?

Generally an Overseer close means a node or cluster restart.

This can cause a lot of DOWN state publishing. If it's a cluster restart, a
lot of those DOWN publishes are not processed until the cluster is started
back up - which can lead to the Overseer being overwhelmed and things not
responding fast enough. You should be able to see an active Overseer
working on publishing those states though (it shows that at INFO logging
level).

If the Overseer is simply down and another did not take over, that is just
some kind of bug. If it's overwhelmed, 5x is much much faster,
and SOLR-7281 should also help, but that is no real help for 4.x at this
point.

Anyway, the key is: what is the active Overseer doing? Is there no active
Overseer, or is it busy trying to push through a backlog of operations?

- Mark

On Wed, Feb 3, 2016 at 8:46 PM hawk  wrote:

> Thanks Mark.
>
> I was able to search "Overseer" in the solr logs around the time frame of
> the condition. This particular message was from the leader node of the
> shard.
>
> 160201 11:26:36.380 localhost-startStop-1 Overseer (id=null) closing
>
> Also I found this message in the zookeeper logs.
>
> 11:26:35,218 [myid:02] - INFO [ProcessThread(sid:2
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
> processing sessionid:0x15297c0fe2e3f2d type:create cxid:0x3
> zxid:0xf0001be48
> txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode =
> NodeExists for /overseer
>
> Any thoughts what these messages suggest?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255105.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-09 Thread Mark Miller
Perhaps there is something preventing clean shutdown. Shutdown makes a best
effort attempt to publish DOWN for all the local cores.

Otherwise, yes, it's a little bit annoying, but full state is a combination
of the state entry and whether the live node for that replica exists or not.
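
A small SolrJ-style sketch of that combination (written against the 5.x cloud
classes as I understand them; the collection and core-node names are up to the
caller):

import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.ZkStateReader;

public class EffectiveReplicaState {
  // A replica is only effectively "up" if state.json says ACTIVE *and* its
  // node still has an entry under live_nodes.
  static boolean isEffectivelyActive(ZkStateReader reader, String collection,
                                     String coreNodeName) {
    ClusterState cs = reader.getClusterState();
    Replica replica = cs.getCollection(collection).getReplica(coreNodeName);
    return replica != null
        && replica.getState() == Replica.State.ACTIVE
        && cs.getLiveNodes().contains(replica.getNodeName());
  }
}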

- Mark

On Wed, Sep 9, 2015 at 1:50 AM Arcadius Ahouansou 
wrote:

> Thank you Tomás for pointing to the JavaDoc
>
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.State.html#ACTIVE
>
> The Javadoc is quite clear. So this stale state.json is not an issue after
> all.
>
> However, it's very confusing that when a node goes down, state.json may be
> updated for 1 collection while it remains stale in the other collection.
> Also in our case, the node did not crash as per the JavaDoc... it was a
> normal server stop/shut-down.
> We may need to review our shut-down process and see whether things change.
>
> Thank you very much Erick and Tomás for your valuable help... very
> appreciated.
>
> Arcadius.
>
>
> On 8 September 2015 at 18:28, Erick Erickson 
> wrote:
>
> > bq: You were probably referring to state.json
> >
> > yep, I'm never sure whether people are on the old or new ZK versions.
> >
> > OK, With Tomás' comment, I think it's explained... although confusing.
> >
> > WDYT?
> >
> >
> > On Tue, Sep 8, 2015 at 10:03 AM, Arcadius Ahouansou
> >  wrote:
> > > Hello Erick.
> > >
> > > Yes,
> > >
> > > 1> liveNodes has N nodes listed (correctly): Correct, liveNodes is
> always
> > > right.
> > >
> > > 2> clusterstate.json has N+M nodes listed as "active":
> clusterstate.json
> > is
> > > always empty as it's no longer being "used" in 5.3. You were
> > > probably referring to state.json which is in individual collections.
> Yes,
> > > that one reflects the wrong value i.e N+M
> > >
> > > 3> using the collection API to get CLUSTERSTATUS always return the
> > correct
> > > value N
> > >
> > > 4> The Front-end code in code in cloud.js displays the right colour
> when
> > > nodes go down because it checks for the live node
> > >
> > > The problem is only with state.json under certain circumstances.
> > >
> > > Thanks.
> > >
> > > On 8 September 2015 at 17:51, Erick Erickson 
> > > wrote:
> > >
> > >> Arcadius:
> > >>
> > >> Hmmm. It may take a while for the cluster state to change, but I'm
> > >> assuming that this state persists for minutes/hours/days.
> > >>
> > >> So to recap: If dump the entire ZK node from the root, you have
> > >> 1> liveNodes has N nodes listed (correctly)
> > >> 2> clusterstate.json has N+M nodes listed as "active"
> > >>
> > >> Doesn't sound right to me, but I'll have to let people who are deep
> > >> into that code speculate from here.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou <
> > arcad...@menelic.com>
> > >> wrote:
> > >> > On Sep 8, 2015 6:25 AM, "Erick Erickson" 
> > >> wrote:
> > >> >>
> > >> >> Perhaps the browser cache? What happens if you, say, use
> > >> >> Zookeeper client tools to bring down the the cluster state in
> > >> >> question? Or perhaps just refresh the admin UI when showing
> > >> >> the cluster status
> > >> >>
> > >> >
> > >> > Hello Erick.
> > >> >
> > >> > Thank you very much for answering.
> > >> > I did use the ZooInspector tool to check the state.json in all 5 zk
> > nodes
> > >> > and they are all out of date and identical to what I get through the
> > tree
> > >> > view in the Solr admin UI.
> > >> >
> > >> > Looking at the source code cloud.js that correctly display nodes as
> > >> "gone"
> > >> > in the graph view, it calls the end point /zookeeper?wt=json and
> > relies
> > >> on
> > >> > the live nodes to mark a node as down instead of status.json.
> > >> >
> > >> > Thanks.
> > >> >
> > >> >> Shot in the dark,
> > >> >> Erick
> > >> >>
> > >> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <
> > >> arcad...@menelic.com>
> > >> > wrote:
> > >> >> > We are running the latest Solr 5.3.0
> > >> >> >
> > >> >> > Thanks.
> > >>
> > >
> > >
> > >
> > > --
> > > Arcadius Ahouansou
> > > Menelic Ltd | Information is Power
> > > M: 07908761999
> > > W: www.menelic.com
> > > ---
> >
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---
>
-- 
- Mark
about.me/markrmiller


Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Mark Miller
Have you used jconsole or visualvm to see what it is actually hanging on to
there? Perhaps it is lock files that are not cleaned up or something else?

You might try: find ~/.ivy2 -name "*.lck" -type f -exec rm {} \;

- Mark

On Wed, Sep 16, 2015 at 9:50 AM Susheel Kumar  wrote:

> Hi,
>
> Sending it to Solr group in addition to Ivy group.
>
>
> I have been building Solr trunk (
> http://svn.apache.org/repos/asf/lucene/dev/trunk/) using "ant eclipse"
> for
> quite some time, but this week I am on a job where things are behind the
> firewall and a proxy is used.
>
> Issue: When not on the company network the build works fine, but when inside
> the company network Ivy gets stuck during resolve when downloading
> https://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.jar (see
> below). I have set ANT_OPTS=-Dhttp.proxyHost=myproxyhost
> -Dhttp.proxyPort=8080 -Dhttp.proxyUser=myproxyusername
> -Dhttp.proxyPassword=myproxypassword but that doesn't help. I ran into a
> similar issue with SVN but I was able to specify proxy & auth in the
> .subversion/servers file and it worked. With Ant Ivy I have no idea what's
> going wrong. I also tried -autoproxy on the ant command line but no luck.
> In the meantime the .ivy2 folder which got populated outside the network
> helps me proceed temporarily.
>
> Machine : mac 10.10.3
> Ant : 1.9.6
> Ivy : 2.4.0
>
> Attach build.xml & ivysettings.xml
>
> kumar$ ant eclipse
>
> Buildfile: /Users/kumar/sourcecode/trunk/build.xml
>
> resolve:
>
> resolve:
>
> ivy-availability-check:
>
> ivy-fail:
>
> ivy-configure:
>
> [ivy:configure] :: Apache Ivy 2.4.0 - 20141213170938 ::
> http://ant.apache.org/ivy/ ::
>
> [ivy:configure] :: loading settings :: file =
> /Users/kumar/sourcecode/trunk/lucene/ivy-settings.xml
>
>
> resolve:
>
-- 
- Mark
about.me/markrmiller


Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Mark Miller
I mention the same thing in
https://issues.apache.org/jira/browse/LUCENE-6743

They claim to have addressed this with Java delete on close stuff, but it
still happens even with 2.4.0.

Locally, I now use the nio strategy and never hit it.

- Mark

On Wed, Sep 16, 2015 at 12:17 PM Shawn Heisey  wrote:

> On 9/16/2015 9:32 AM, Mark Miller wrote:
> > Have you used jconsole or visualvm to see what it is actually hanging on
> to
> > there? Perhaps it is lock files that are not cleaned up or something
> else?
> >
> > You might try: find ~/.ivy2 -name "*.lck" -type f -exec rm {} \;
>
> If that does turn out to be the problem and deleting lockfiles fixes it,
> then you may be running into what I believe is a bug.  It is a bug that
> was (in theory) fixed in IVY-1388.
>
> https://issues.apache.org/jira/browse/IVY-1388
>
> I have seen the same problem even in version 2.3.0 which contains a fix
> for IVY-1388, so I filed a new issue:
>
> https://issues.apache.org/jira/browse/IVY-1489
>
> Thanks,
> Shawn
>
> --
- Mark
about.me/markrmiller


Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Mark Miller
You should be able to easily see where the task is hanging in ivy code.

- Mark

On Wed, Sep 16, 2015 at 1:36 PM Susheel Kumar  wrote:

> Not really. There are no lock files & even after cleaning up lock files (to
> be sure) the problem still persists. It works outside the company network but
> inside it gets stuck. Let me try to see if jconsole can show something
> meaningful.
>
> Thanks,
> Susheel
>
> On Wed, Sep 16, 2015 at 12:17 PM, Shawn Heisey 
> wrote:
>
> > On 9/16/2015 9:32 AM, Mark Miller wrote:
> > > Have you used jconsole or visualvm to see what it is actually hanging
> on
> > to
> > > there? Perhaps it is lock files that are not cleaned up or something
> > else?
> > >
> > > You might try: find ~/.ivy2 -name "*.lck" -type f -exec rm {} \;
> >
> > If that does turn out to be the problem and deleting lockfiles fixes it,
> > then you may be running into what I believe is a bug.  It is a bug that
> > was (in theory) fixed in IVY-1388.
> >
> > https://issues.apache.org/jira/browse/IVY-1388
> >
> > I have seen the same problem even in version 2.3.0 which contains a fix
> > for IVY-1388, so I filed a new issue:
> >
> > https://issues.apache.org/jira/browse/IVY-1489
> >
> > Thanks,
> > Shawn
> >
> >
>
-- 
- Mark
about.me/markrmiller


Re: Cloud Deployment Strategy... In the Cloud

2015-10-01 Thread Mark Miller
On Wed, Sep 30, 2015 at 10:36 AM Steve Davids  wrote:

> Our project built a custom "admin" webapp that we use for various O&M
> activities so I went ahead and added the ability to upload a Zip
> distribution which then uses SolrJ to forward the extracted contents to ZK.
> This package is built and uploaded via a Gradle build task which makes life
> easy on us by allowing us to jam stuff into ZK which is sitting in a
> private network (local VPC) without necessarily needing to be on a ZK
> machine. We then moved on to creating collection (trivial), and
> adding/removing replicas. As for adding replicas I am rather confused as to
> why I would need to specify a specific shard for replica placement; before,
> when I threw down a core.properties file, the machine would automatically
> come up and figure out which shard it should join based on reasonable
> assumptions - why wouldn't the same logic apply here?


I'd file a JIRA issue for the functionality.


> I then saw that
> a Rule-based
> Replica Placement
> <
> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
> >
> feature was added which I thought would be reasonable but after looking at
> the tests  it appears to
> still require a shard parameter for adding a replica which seems to defeat
> the entire purpose.


I was not involved in the addReplica command, but the predefined stuff
worked that way just to make bootstrapping up a cluster really simple. I
don't see why addReplica couldn't follow the same logic if no shard was
specified.


> So after getting bummed out about that, I took a look
> at the delete replica request. Since we have machines come and go we need
> to start dropping them, and I found that delete replica requires a
> collection, shard, and replica name. If I have the name of the machine,
> it appears the only way to figure out what to remove is by walking the
> clusterstate tree for all collections and determining which replicas are
> candidates for removal, which seems unnecessarily complicated.
>

You should not need the shard for this call. The collection and replica
core node name will be unique. Another JIRA issue?
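
As a rough sketch of the cluster-state walk being described (again assuming the
5.x SolrJ cloud classes; method names from memory, so double check them),
finding the replicas hosted on a given node across all collections looks
roughly like:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;

public class ReplicasOnNode {
  // Collects "collection/shard/coreNodeName" for every replica hosted on
  // nodeName - the pieces needed to issue DELETEREPLICA when a machine goes away.
  static List<String> find(ZkStateReader reader, String nodeName) {
    List<String> result = new ArrayList<>();
    ClusterState cs = reader.getClusterState();
    for (String collection : cs.getCollections()) {
      for (Slice slice : cs.getCollection(collection).getSlices()) {
        for (Replica replica : slice.getReplicas()) {
          if (nodeName.equals(replica.getNodeName())) {
            result.add(collection + "/" + slice.getName() + "/" + replica.getName());
          }
        }
      }
    }
    return result;
  }
}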


>
> Hopefully I don't come off as complaining, but rather looking at it from a
> client perspective, the Collections API doesn't seem simple to use and
> really the only reason I am messing around with it now is because there is
> repeated threats to make "zk as truth" the default in the 5.x branch at
> some point in the future. I would personally advocate that something like
> the autoManageReplicas 
> be
> introduced to make life much simpler on clients as this appears to be the
> thing I am trying to implement externally.
>
> If anyone has happened to to build a system to orchestrate Solr for cloud
> infrastructure and have some pointers it would be greatly appreciated.
>
> Thanks,
>
> -Steve
>
>
> --
- Mark
about.me/markrmiller


Re: Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Mark Miller
If it's always when using https as in your examples, perhaps it's SOLR-5776.

- mark

On Mon, Oct 5, 2015 at 10:36 AM Markus Jelsma 
wrote:

> Hmmm, i tried that just now but i sometimes get tons of Connection reset
> errors. The tests then end with "There are still nodes recoverying - waited
> for 30 seconds".
>
> [RecoveryThread-collection1] ERROR org.apache.solr.cloud.RecoveryStrategy
> - Error while trying to recover.:java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: https://127.0.0.1:49146
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> occured when talking to server at: https://127.0.0.1:49146
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:209)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> at sun.security.ssl.InputRecord.read(InputRecord.java:503)
> at
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
> at
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
> at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
> at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
> at
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
> at
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:465)
> ... 7 more
>
> [RecoveryThread-collection1] ERROR org.apache.solr.cloud.RecoveryStrategy
> - Recovery failed - trying again... (1)
> [RecoveryThread-collection1] INFO org.apache.solr.cloud.RecoveryStrategy -
> Wait 4.0 seconds before trying to recover again (2)
>
>
>
> -Original message-
> > From:Erick Erickson 
> > Sent: Monday 5th October 2015 15:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Implementing AbstractFullDistribZkTestBase
> >
> > Right, I'm assuming you're creating a cluster somewhere.
> > Try calling (from memory) waitForRecoveriesToFinish in
> > AbstractDistribZkTestBase after creating the collection
> > to insure that the nodes are up and running before you
> > index to them.
> >
> > Shot in the dark
> > Erick
> >
> > On Mon, Oct 5, 2015 at 1:36 AM, Markus Jelsma
> >  wrote:
> > > Hello,
> > >
> > > I have several implementations of AbstractFullDistribZkTestBase of
> Solr 5.3.0. Sometimes a test fails with either "There are still nodes
> recoverying - waited for 30 seconds" or "IOException occured when talking
> to server at: https://127.0.0.1:44474/collection1";, so usually at least
> one of all test fails. These are very simple implementations such as :
> > >
> > >   @Test
> > >   @ShardsFixed(num = 2)
> > >   pu

Re: Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Mark Miller
Not sure what that means :)

SOLR-5776 would not happen all the time, but too frequently. It also
wouldn't matter how powerful the CPU is or how many cores or how much RAM
you have :)

Whether you see failures without https is what you want to check.

- mark

On Mon, Oct 5, 2015 at 2:16 PM Markus Jelsma 
wrote:

> Hi - no, I don't think so, it doesn't happen all the time, but too
> frequently. The machine running the tests has a high-powered CPU, plenty of
> cores and RAM.
>
> Markus
>
>
>
> -Original message-
> > From:Mark Miller 
> > Sent: Monday 5th October 2015 19:52
> > To: solr-user@lucene.apache.org
> > Subject: Re: Implementing AbstractFullDistribZkTestBase
> >
> > If it's always when using https as in your examples, perhaps it's
> SOLR-5776.
> >
> > - mark
> >
> > On Mon, Oct 5, 2015 at 10:36 AM Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hmmm, i tried that just now but i sometimes get tons of Connection
> reset
> > > errors. The tests then end with "There are still nodes recoverying -
> waited
> > > for 30 seconds".
> > >
> > > [RecoveryThread-collection1] ERROR
> org.apache.solr.cloud.RecoveryStrategy
> > > - Error while trying to
> recover.:java.util.concurrent.ExecutionException:
> > > org.apache.solr.client.solrj.SolrServerException: IOException occured
> when
> > > talking to server at: https://127.0.0.1:49146
> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > at
> > >
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> > > at
> > >
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> > > at
> > > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> > > Caused by: org.apache.solr.client.solrj.SolrServerException:
> IOException
> > > occured when talking to server at: https://127.0.0.1:49146
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > at
> > >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> > > at
> > >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > at
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.net.SocketException: Connection reset
> > > at java.net.SocketInputStream.read(SocketInputStream.java:209)
> > > at java.net.SocketInputStream.read(SocketInputStream.java:141)
> > > at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> > > at sun.security.ssl.InputRecord.read(InputRecord.java:503)
> > > at
> > > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
> > > at
> > >
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
> > > at
> > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
> > > at
> > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
> > > at
> > >
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> > > at
> > >
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> > > at
> > >
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
> > > at
> > >
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
> > > at
> > >
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
> > > at
> > >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
> > > at
> > >
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> > > at
> > >
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> > > at
> > >
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
> > > at
> > >
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:465)
> > > ... 7 more
> > >
> > > [RecoveryThread-collection1] ERROR
> org.apache.solr.cloud.RecoveryStrategy
> > > - Recovery failed - trying again... (1)
> > > [RecoveryThread-collection1] INFO
> org.apache.solr.cloud.RecoveryStrategy -
> > > Wait 4.0 se

Re: Recovery Thread Blocked

2015-10-05 Thread Mark Miller
I'd make two guesses:

Looks like you are using JRockit? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace of
SolrCloud, you are dealing with something fairly ancient and so it will be
harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:

> Any takers on this? Any kinda clue would help. Thanks.
>
> On 10/4/15 10:14 AM, Rallavagu wrote:
> > As there were no responses so far, I assume that this is not a very
> > common issue that folks come across. So, I went into source (4.6.1) to
> > see if I can figure out what could be the cause.
> >
> >
> > The thread that is locking is in this block of code
> >
> > synchronized (recoveryLock) {
> >// to be air tight we must also check after lock
> >if (cc.isShutDown()) {
> >  log.warn("Skipping recovery because Solr is shutdown");
> >  return;
> >}
> >log.info("Running recovery - first canceling any ongoing
> recovery");
> >cancelRecovery();
> >
> >while (recoveryRunning) {
> >  try {
> >recoveryLock.wait(1000);
> >  } catch (InterruptedException e) {
> >
> >  }
> >  // check again for those that were waiting
> >  if (cc.isShutDown()) {
> >log.warn("Skipping recovery because Solr is shutdown");
> >return;
> >  }
> >  if (closed) return;
> >}
> >
> > Subsequently, the thread will get into cancelRecovery method as below,
> >
> > public void cancelRecovery() {
> >  synchronized (recoveryLock) {
> >if (recoveryStrat != null && recoveryRunning) {
> >  recoveryStrat.close();
> >  while (true) {
> >try {
> >  recoveryStrat.join();
> >} catch (InterruptedException e) {
> >  // not interruptible - keep waiting
> >  continue;
> >}
> >break;
> >  }
> >
> >  recoveryRunning = false;
> >  recoveryLock.notifyAll();
> >}
> >  }
> >}
> >
> > As per the stack trace "recoveryStrat.join()" is where things are
> > holding up.
> >
> > I wonder why/how cancelRecovery would take so much time that around 870
> > threads would be waiting on it. Is it possible that ZK is not responding or
> > something else like Operating System resources could cause this? Thanks.
> >
> >
> > On 10/2/15 4:17 PM, Rallavagu wrote:
> >> Here is the stack trace of the thread that is holding the lock.
> >>
> >>
> >> "Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
> >> native_blocked, daemon
> >>  -- Waiting for notification on:
> >> org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
> >>  at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>  at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>  at
> >> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>  at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>  at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>  at
> >>
> RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a
> >>
> >>
> >>  at
> >> jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
> >> Method)
> >>  at java/lang/Object.wait(J)V(Native Method)
> >>  at java/lang/Thread.join(Thread.java:1206)
> >>  ^-- Lock released while waiting:
> >> org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
> >>  at java/lang/Thread.join(Thread.java:1259)
> >>  at
> >>
> org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331)
> >>
> >>
> >>  ^-- Holding lock: java/lang/Object@0x114d8dd00[recursive]
> >>  at
> >>
> org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297)
> >>
> >>
> >>  ^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock]
> >>  at
> >>
> org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)
> >>
> >>
> >>  at jrockit/vm/RNI.c2java(J)V(Native Method)
> >>
> >>
> >> Stack trace of one of the 870 threads that is waiting for the lock to be
> >> released.
> >>
> >> "Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked,
> >> native_blocked, daemon
> >>  -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat
> >> lock]
> >>  at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>  at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>  at
> >> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>  at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>  at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>  at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
> >>  at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
> >>  

Re: Solr Log Analysis

2015-10-05 Thread Mark Miller
Best tool for this job really depends on your needs, but one option:

I have a dev tool for Solr log analysis:
https://github.com/markrmiller/SolrLogReader

If you use the -o option, it will spill out just the queries to a file with
qtimes.

- Mark

On Wed, Sep 23, 2015 at 8:16 PM Tarala, Magesh  wrote:

> I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3
> replicas for the collection.
>
> I want to analyze the logs to extract the queries and query times. Is
> there a tool or script someone has created already for this?
>
> Thanks,
> Magesh
>
-- 
- Mark
about.me/markrmiller


Re: Recovery Thread Blocked

2015-10-06 Thread Mark Miller
That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening with
the heap.

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:

> Mark - currently 5.3 is being evaluated for upgrade purposes and
> hopefully get there sooner. Meanwhile, following exception is noted from
> logs during updates
>
> ERROR org.apache.solr.update.CommitTracker  – auto commit
> error...:java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
>  at
>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
>  at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
>  at
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
>  at
> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
>  at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>  at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>  at java.lang.Thread.run(Thread.java:682)
>
> Considering the fact that the machine is configured with 48G (24G for
> JVM which will be reduced in future) wondering how would it still go out
> of memory. For memory mapped index files the remaining 24G or what is
> available off of it should be available. Looking at the lsof output the
> memory mapped files were around 10G.
>
> Thanks.
>
>
> On 10/5/15 5:41 PM, Mark Miller wrote:
> > I'd make two guess:
> >
> > Looks like you are using Jrocket? I don't think that is common or well
> > tested at this point.
> >
> > There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> of
> > SolrCloud, you are dealing with something fairly ancient and so it will
> be
> > harder to find help with older issues most likely.
> >
> > - Mark
> >
> > On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:
> >
> >> Any takers on this? Any kinda clue would help. Thanks.
> >>
> >> On 10/4/15 10:14 AM, Rallavagu wrote:
> >>> As there were no responses so far, I assume that this is not a very
> >>> common issue that folks come across. So, I went into source (4.6.1) to
> >>> see if I can figure out what could be the cause.
> >>>
> >>>
> >>> The thread that is locking is in this block of code
> >>>
> >>> synchronized (recoveryLock) {
> >>> // to be air tight we must also check after lock
> >>> if (cc.isShutDown()) {
> >>>   log.warn("Skipping recovery because Solr is shutdown");
> >>>   return;
> >>> }
> >>> log.info("Running recovery - first canceling any ongoing
> >> recovery");
> >>> cancelRecovery();
> >>>
> >>> while (recoveryRunning) {
> >>>   try {
> >>> recoveryLock.wait(1000);
> >>>   } catch (InterruptedException e) {
> >>>
> >>>   }
> >>>   // check again for those that were waiting
> >>>   if (cc.isShutDown()) {
> >>> log.warn("Skipping recovery because Solr is shutdown");
> >>> return;
> >>>   }
> >>>   if (closed) return;
> >>> }
> >>>
> >>> Subsequently, the thread will get into cancelRecovery method as below,
> >>>
> >>> public void cancelRecovery() {
> >>>   synchronized (recoveryLock) {
> >>> if (recoveryStrat != null && recoveryRunning) {
> >>>   recoveryStrat.close();
> >>>   while (true) {
> >>> try {
> >>>   recoveryStrat.join();
> >>> } catch (InterruptedException e) {
> >>>   // not interruptible - keep waiting
> >>>   continue;
> >>> }
> >>> break;
> >>>   }
> >>>
> >>&

Re: Recovery Thread Blocked

2015-10-06 Thread Mark Miller
If it's a thread and you have plenty of RAM and the heap is fine, have you
checked raising OS thread limits?
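
As a quick sanity check, the JVM's own thread counters (the same numbers
jconsole/visualvm surface when attached to the Solr process) will tell you
whether you are pressing against a thread ceiling; a minimal sketch of the
relevant MXBean calls:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountCheck {
  public static void main(String[] args) {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    // If live/peak counts sit near the OS per-user limit, "unable to create
    // new native thread" style OutOfMemoryErrors become likely.
    System.out.println("live threads:  " + threads.getThreadCount());
    System.out.println("peak threads:  " + threads.getPeakThreadCount());
    System.out.println("total started: " + threads.getTotalStartedThreadCount());
  }
}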

- Mark

On Tue, Oct 6, 2015 at 4:54 PM Rallavagu  wrote:

> GC logging looks normal. The "OutOfMemoryError" appears to pertain
> to a thread, not to the JVM heap.
>
> On 10/6/15 1:07 PM, Mark Miller wrote:
> > That amount of RAM can easily be eaten up depending on your sorting,
> > faceting, data.
> >
> > Do you have gc logging enabled? That should describe what is happening
> with
> > the heap.
> >
> > - Mark
> >
> > On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:
> >
> >> Mark - currently 5.3 is being evaluated for upgrade purposes and
> >> hopefully get there sooner. Meanwhile, following exception is noted from
> >> logs during updates
> >>
> >> ERROR org.apache.solr.update.CommitTracker  – auto commit
> >> error...:java.lang.IllegalStateException: this writer hit an
> >> OutOfMemoryError; cannot commit
> >>   at
> >>
> >>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
> >>   at
> >>
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
> >>   at
> >>
> >>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
> >>   at
> >> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >>   at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
> >>   at
> >>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> >>   at
> >>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> >>   at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
> >>   at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> >>   at java.lang.Thread.run(Thread.java:682)
> >>
> >> Considering the fact that the machine is configured with 48G (24G for
> >> JVM which will be reduced in future) wondering how would it still go out
> >> of memory. For memory mapped index files the remaining 24G or what is
> >> available off of it should be available. Looking at the lsof output the
> >> memory mapped files were around 10G.
> >>
> >> Thanks.
> >>
> >>
> >> On 10/5/15 5:41 PM, Mark Miller wrote:
> >>> I'd make two guess:
> >>>
> >>> Looks like you are using Jrocket? I don't think that is common or well
> >>> tested at this point.
> >>>
> >>> There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> >> of
> >>> SolrCloud, you are dealing with something fairly ancient and so it will
> >> be
> >>> harder to find help with older issues most likely.
> >>>
> >>> - Mark
> >>>
> >>> On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:
> >>>
> >>>> Any takers on this? Any kinda clue would help. Thanks.
> >>>>
> >>>> On 10/4/15 10:14 AM, Rallavagu wrote:
> >>>>> As there were no responses so far, I assume that this is not a very
> >>>>> common issue that folks come across. So, I went into source (4.6.1)
> to
> >>>>> see if I can figure out what could be the cause.
> >>>>>
> >>>>>
> >>>>> The thread that is locking is in this block of code
> >>>>>
> >>>>> synchronized (recoveryLock) {
> >>>>>  // to be air tight we must also check after lock
> >>>>>  if (cc.isShutDown()) {
> >>>>>log.warn("Skipping recovery because Solr is shutdown");
> >>>>>return;
> >>>>>  }
> >>>>>  log.info("Running recovery - first canceling any ongoing
> >>>> recovery");
> >>>>>  cancelRecovery();
> >>>>>
> >>>>>  while (recoveryRunning) {
> >>>>>try {
> >>>>>  recoveryLock.wait(1000);
> >>>>>} catch (InterruptedExceptio

Re: No live SolrServers available to handle this request

2015-10-08 Thread Mark Miller
Your Lucene and Solr versions must match.

On Thu, Oct 8, 2015 at 4:02 PM Steve  wrote:

> I've loaded the Films data into a 4 node cluster.  Indexing went well, but
> when I issue a query, I get this:
>
> "error": {
> "msg": "org.apache.solr.client.solrj.SolrServerException: No live
> SolrServers available to handle this request:
> [
>
> http://host-192-168-0-63.openstacklocal:8081/solr/CollectionFilms_shard1_replica2
> ,
>
>
> http://host-192-168-0-62.openstacklocal:8081/solr/CollectionFilms_shard2_replica2
> ,
>
>
> http://host-192-168-0-60.openstacklocal:8081/solr/CollectionFilms_shard2_replica1
> ]",
> ...
>
> and further down in the stacktrace:
>
> Server Error
> Caused by:
> java.lang.NoSuchMethodError:
>
> org.apache.lucene.index.TermsEnum.postings(Lorg/apache/lucene/index/PostingsEnum;I)Lorg/apache/lucene/index/PostingsEnum;\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.getFirstMatch(SolrIndexSearcher.java:802)\n\tat
>
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:333)\n\tat
> ...
>
>
> I'm using:
>
> solr version 5.3.1
>
> lucene 5.2.1
>
> zookeeper version 3.4.6
>
> indexing with:
>
>cd /opt/solr/example/films;
>
> /opt/solr/bin/post -c CollectionFilms -port 8081  films.json
>
>
>
> thx,
> .strick
>
-- 
- Mark
about.me/markrmiller


Re: Explicit commit with openSearcher=false

2015-11-11 Thread Mark Miller
openSearcher is a valid param for a commit whatever the api you are using
to issue it.

- Mark

On Wed, Nov 11, 2015 at 12:32 PM Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Does waitSearcher=false works like you need?
>
> On Wed, Nov 11, 2015 at 1:34 PM, Sathyakumar Seshachalam <
> sathyakumar_seshacha...@trimble.com> wrote:
>
> > Hi,
> >
> > I have a Search system based on Solr that relies on autoCommit
> > configuration (with openSearcher=false). I now have a use-case that
> > requires me to disable autoCommit and issue explicit commit commands, but
> > as I understand it an explicit commit command "always" opens a searcher. Is
> > this correct? Is there any way to work around this? I really do not want
> > to open a searcher every time I hard commit (I rely on autoSoftCommit for
> this).
> >
> > Regards,
> > Sathya
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>
-- 
- Mark
about.me/markrmiller


Re: Explicit commit with openSearcher=false

2015-11-12 Thread Mark Miller
You can pass arbitrary params with Solrj. The API usage is just a little
more arcane.
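
A minimal sketch of that (SolrJ class names from the 4.x/5.x API as I recall
them, so verify against your version; the client type is SolrServer in 4.x and
SolrClient in 5.x):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;

public class CommitWithoutNewSearcher {
  // Hard commit that does not open a new searcher: openSearcher is not an
  // argument of commit(), so it is set as a raw request param instead.
  static UpdateResponse commitNoOpenSearcher(SolrClient client) throws Exception {
    UpdateRequest req = new UpdateRequest();
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, false); // waitFlush, waitSearcher
    req.setParam("openSearcher", "false");
    return req.process(client); // on 4.x, process(SolrServer) instead
  }
}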

- Mark

On Wed, Nov 11, 2015 at 11:33 PM Sathyakumar Seshachalam <
sathyakumar_seshacha...@trimble.com> wrote:

> I intend to use SolrJ. I only saw the below overloaded commit method in
> documentation (http://lucene.apache.org/solr/4_10_3/solr-solrj/index.html)
> of class "org.apache.solr.client.solrj.SolrServer"
>
> public UpdateResponse commit(boolean waitFlush, boolean waitSearcher,
> boolean softCommit).
>
>
> And I assumed waitSearcher is not the same as openSearcher.  (From the
> documentation at least it would seem that waitSearcher when false only does
> not block the call, but a searcher is still opened).
> None of the add methods take a openSearcher param either.
>
> Regards
> Sathya
>
>
> On 11/11/15, 11:58 PM, "Chris Hostetter"  wrote:
>
> >
> >: I saw mention of openSearcher for SolrJ, so I looked in the source of
> >: the UpdateRequestHandler, and there is no mention of openSearcher in
> >: there that I can see, for XML, JSON or SolrJ requests.
> >:
> >: So my take is that this isn't possible right now :-(
> >
> >It's handled by the Loaders - all of which (i think?) delegate to
> >RequestHandlerUtils.handleCommit to generate the CommitUpdateCommand
> >according to the relevant UpdateParams.
> >
> >Most of the constants you see in UpdateRequestHandler look like dead code
> >that should be removed.
> >
> >
> >-Hoss
> >http://www.lucidworks.com/
>
> --
- Mark
about.me/markrmiller


Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Mark Miller
If you see "WARNING: too many searchers on deck" or something like that in
the logs, that could cause this behavior and would indicate you are opening
searchers faster than Solr can keep up.

- Mark

On Tue, Nov 17, 2015 at 2:05 PM Erick Erickson 
wrote:

> That's what was behind my earlier comment about perhaps
> the call is timing out, thus the commit call is returning
> _before_ the actual searcher is opened. But the call
> coming back is not a return from commit, but from Jetty
> even though the commit hasn't really returned.
>
> Just a guess however.
>
> Best,
> Erick
>
> On Tue, Nov 17, 2015 at 12:11 AM, adfel70  wrote:
> > Thanks Eric,
> > I'll try to play with the autowarm config.
> >
> > But I have a more direct question - why does the commit return without
> > waiting till the searchers are fully refreshed?
> >
> > Could it be that the parameter waitSearcher=true doesn't really work?
> > or maybe I don't understand something here...
> >
> > Thanks,
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Mark Miller
It looks like he has waitSearcher as false, so all the time should be in the
commit itself. So that amount of time does sound odd.

I would certainly change those commit settings though. I would not use
maxDocs, that is an ugly way to control this. And one second is much too
aggressive as Erick says.

If you want to attempt that kind of visibility, you should use the
softAutoCommit. The regular autoCommit should be at least 15 or 20 seconds.

- Mark

On Fri, Dec 11, 2015 at 1:22 PM Erick Erickson 
wrote:

> First of all, your autocommit settings are _very_ aggressive. Committing
> every second is far too frequent IMO.
>
> As an aside, I generally prefer to omit the maxDocs as it's not all
> that predictable,
> but that's a personal preference and really doesn't bear on your problem..
>
> My _guess_ is that you are doing a lot of autowarming. The number of docs
> doesn't really matter if your autowarming is taking forever. Your Solr logs
> should report the autowarm times at INFO level; have you checked those?
>
> The commit settings shouldn't be a problem in terms of your server dying,
> the indexing process flushes docs to the tlog independent of committing so
> upon restart they should be recovered. Here's a blog on the subject:
>
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards
> and
> > 15 replicas.
> > There is a solrj application that feeds the collection, updating a few
> > documents every hour. I don't understand why, at the end of the process, the
> > hard commit takes about 8-10 minutes.
> >
> > Even if there are only few hundreds of documents.
> >
> > This is the autocommit configuration:
> >
> > <autoCommit>
> >   <maxDocs>1</maxDocs>
> >   <maxTime>1000</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > In your experience why hard commit takes so long even for so few
> documents?
> >
> > Now I'm changing the code to softcommit, calling commit (waitFlush =
> > false, waitSearcher
> > = false, softCommit = true);
> >
> > solrServer.commit(false, false, true);.
> >
> > I have configured NRTCachingDirectoryFactory, but I'm a little bit
> worried
> > if a server goes down (something like: kill -9, SolrCloud crashes, out of
> > memory, etc.), and if, using this strategy
> softcommit+NRTCachingDirectory,
> > SolrCloud instance could not recover a replica.
> >
> > Should I worry about this new configuration? I was thinking to take a
> > snapshot of everything every day, in order to recover immediately the
> > index. Could this be considered a best practice?
> >
> > Thanks in advance for your time,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>
-- 
- Mark
about.me/markrmiller


Re: Collections API - HTTP verbs

2015-02-18 Thread Mark Miller
Perhaps try quotes around the URL you are providing to curl. It's not
complaining about the HTTP method - Solr has historically always taken
simple GETs for HTTP - for good or bad, you pretty much only POST
documents / updates.

It's saying the name param is required and was not found, and since you
are trying to specify the name, I'm guessing something about the command is
not working. You might try just shoving it into a browser URL bar as well.
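
If the shell quoting keeps getting in the way, here is a sketch of the same
call through SolrJ (a generic request against /admin/collections, with the
param names copied from the curl attempt above; SolrClient is the 5.x type,
SolrServer on 4.x), which sidesteps the quoting problem entirely:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SetClusterProp {
  // Sends action=CLUSTERPROP&name=urlScheme&val=https to the Collections API.
  static void setUrlScheme(SolrClient client) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CLUSTERPROP");
    params.set("name", "urlScheme");
    params.set("val", "https");
    QueryRequest request = new QueryRequest(params);
    request.setPath("/admin/collections");
    client.request(request);
  }
}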

- Mark

On Wed Feb 18 2015 at 8:56:26 PM Hrishikesh Gadre 
wrote:

> Hi,
>
> Can we please document which HTTP method is supposed to be used with each
> of these APIs?
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> I am trying to invoke following API
>
> curl http://
> :8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&
> val=https
>
> This request is failing due to following error,
>
> 2015-02-18 17:29:39,965 INFO org.apache.solr.servlet.SolrDispatchFilter:
> [admin] webapp=null path=/admin/collections params={action=CLUSTERPROP}
> status=400 QTime=20
>
> org.apache.solr.core.SolrCore: org.apache.solr.common.SolrException:
> Missing required parameter: name
>
> at
> org.apache.solr.common.params.RequiredSolrParams.get(
> RequiredSolrParams.java:49)
>
> at
> org.apache.solr.common.params.RequiredSolrParams.check(
> RequiredSolrParams.java:153)
>
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleProp(
> CollectionsHandler.java:238)
>
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(
> CollectionsHandler.java:200)
>
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:135)
>
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(
> SolrDispatchFilter.java:770)
>
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:271)
>
> I am using Solr 4.10.3 version.
>
> Thanks
>
> Hrishikesh
>


Re: New leader/replica solution for HDFS

2015-02-26 Thread Mark Miller
I’ll be working on this at some point: 
https://issues.apache.org/jira/browse/SOLR-6237

- Mark

http://about.me/markrmiller

> On Feb 25, 2015, at 2:12 AM, longsan  wrote:
> 
> We use HDFS as our Solr index storage and we have a really heavy update
> load. We have met many problems with the current leader/replica solution. There
> is duplicate index computation on the replica side, and the data sync between
> leader/replica is always a problem.
> 
> As HDFS already provides data replication on data layer, could Solr provide
> just service layer replication?
> 
> My thought is that the leader and the replica would all bind to the same
> index data directory. The leader would build up the index for new requests, and
> the replica would just keep updating its index version from the leader (such as
> via a periodic soft commit?). If the leader is lost then the replica would take
> over immediately.
> 
> Thanks for any suggestion of this idea.
> 
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solrcloud Index corruption

2015-03-05 Thread Mark Miller
If you google "replication can cause index corruption" there are two JIRA issues 
that are the most likely cause of corruption in a SolrCloud env. 

- Mark

> On Mar 5, 2015, at 2:20 PM, Garth Grimm  
> wrote:
> 
> For updates, the document will always get routed to the leader of the 
> appropriate shard, no matter what server first receives the request.
> 
> -Original Message-
> From: Martin de Vries [mailto:mar...@downnotifier.com] 
> Sent: Thursday, March 05, 2015 4:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solrcloud Index corruption
> 
> Hi Erick,
> 
> Thank you for your detailed reply.
> 
> You say in our case some docs didn't make it to the node, but that's not 
> really true: the docs can be found on the corrupted nodes when I search on 
> ID. The docs are also complete. The problem is that the docs do not appear 
> when I filter on certain fields (however the fields are in the doc and have 
> the right value when I search on ID). So something seems to be corrupt in the 
> filter index. We will try the checkindex, hopefully it is able to identify 
> the problematic cores.
> 
> I understand there is not a "master" in SolrCloud. In our case we use haproxy 
> as a load balancer for every request. So when indexing every document will be 
> sent to a different solr server, immediately after each other. Maybe 
> SolrCloud is not able to handle that correctly?
> 
> 
> Thanks,
> 
> Martin
> 
> 
> 
> 
> Erick Erickson schreef op 05.03.2015 19:00:
> 
>> Wait up. There's no "master" index in SolrCloud. Raw documents are 
>> forwarded to each replica, indexed and put in the local tlog. If a 
>> replica falls too far out of synch (say you take it offline), then the 
>> entire index _can_ be replicated from the leader and, if the leader's 
>> index was incomplete then that might propagate the error.
>> 
>> The practical consequence of this is that if _any_ replica has a 
>> complete index, you can recover. Before going there though, the 
>> brute-force approach is to just re-index everything from scratch.
>> That's likely easier, especially on indexes this size.
>> 
>> Here's what I'd do.
>> 
>> Assuming you have the Collections API calls for ADDREPLICA and 
>> DELETEREPLICA, then:
>> 0> Identify the complete replicas. If you're lucky you have at least
>> one for each shard.
>> 1> Copy 1 good index from each shard somewhere just to have a backup.
>> 2> DELETEREPLICA on all the incomplete replicas
>> 2.5> I might shut down all the nodes at this point and check that all 
>> the cores I'd deleted were gone. If any remnants exist, 'rm -rf 
>> deleted_core_dir'.
>> 3> ADDREPLICA to get the ones removed in back.
>> 
>> should copy the entire index from the leader for each replica. As you 
>> do the leadership will change and after you've deleted all the 
>> incomplete replicas, one of the complete ones will be the leader and 
>> you should be OK.
>> 
>> If you don't want to/can't use the Collections API, then
>> 0> Identify the complete replicas. If you're lucky you have at least
>> one for each shard.
>> 1> Shut 'em all down.
>> 2> Copy the good index somewhere just to have a backup.
>> 3> 'rm -rf data' for all the incomplete cores.
>> 4> Bring up the good cores.
>> 5> Bring up the cores that you deleted the data dirs from.
>> 
>> What should do is replicate the entire index from the leader. When you 
>> restart the good cores (step 4 above), they'll _become_ the leader.
>> 
>> bq: Is it possible to make Solrcloud invulnerable for network problems 
>> I'm a little surprised that this is happening. It sounds like the 
>> network problems were such that some nodes weren't out of touch long 
>> enough for Zookeeper to sense that they were down and put them into 
>> recovery. Not sure there's any way to secure against that.
>> 
>> bq: Is it possible to see if a core is corrupt?
>> There's "CheckIndex", here's at least one link:
>> http://java.dzone.com/news/lucene-and-solrs-checkindex
>> What you're describing, though, is that docs just didn't make it to 
>> the node, _not_ that the index has unexpected bits, bad disk sectors 
>> and the like so CheckIndex can't detect that. How would it know what 
>> _should_ have been in the index?
>> 
>> bq: I noticed a difference in the "Gen" column on Overview - 
>> Replication. Does this mean there is something wrong?
>> You cannot infer anything from this. In particular, the merging will 
>> be significantly different between a single full-reindex and what the 
>> state of segment merges is in an incrementally built index.
>> 
>> The admin UI screen is rooted in the pre-cloud days, the Master/Slave 
>> thing is entirely misleading. In SolrCloud, since all the raw data is 
>> forwarded to all replicas, and any auto commits that happen may very 
>> well be slightly out of sync, the index size, number of segments, 
>> generations, and all that are pretty safely ignored.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries 
>> 
>> wrote:
>> 
>

Re: 4.10.4 - nodes up, shard without leader

2015-03-08 Thread Mark Miller
Interesting bug.

First there is the already closed transaction log. That by itself deserves
a look. I'm not even positive we should be replaying the log when
reconnecting after a ZK disconnect, but even if we do, this should never
happen.

Beyond that there seems to be some race. Because of the log trouble, we try
and cancel the election - but we don't find the ephemeral election node yet
for some reason and so just assume it's fine, no node there to remove
(well, we WARN, because it is a little unexpected). Then that ephemeral
node materializes I guess, and the new leader doesn't register because the
old leader won't give up the throne. We don't try and force the new leader
because that may just hide bugs and cause data loss, so no leader is
elected.

I'd guess there are two JIRA issues to resolve here.

- Mark

On Sun, Mar 8, 2015 at 8:37 AM Markus Jelsma 
wrote:

> Hello - i stumbled upon an issue i've never seen earlier, a shard with all
> nodes up and running but no leader. This is on 4.10.4. One of the two nodes
> emits the following error log entry:
>
> 2015-03-08 05:25:49,095 WARN [solr.cloud.ElectionContext] - [Thread-136] -
> : cancelElection did not find election node to remove
> /overseer_elect/election/93434598784958483-178.21.116.
> 225:8080_solr-n_000246
> 2015-03-08 05:25:49,121 WARN [solr.cloud.ElectionContext] - [Thread-136] -
> : cancelElection did not find election node to remove
> /collections/oi/leader_elect/shard3/election/93434598784958483-178.21.116.
> 225:8080_solr_oi_h-n_43
> 2015-03-08 05:25:49,220 ERROR [solr.update.UpdateLog] - [Thread-136] - :
> Error inspecting tlog 
> tlog{file=/opt/solr/cores/oi_c/data/tlog/tlog.0001394
> refcount=2}
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
> at org.apache.solr.update.ChannelFastInputStream.
> readWrappedStream(TransactionLog.java:784)
> at org.apache.solr.common.util.FastInputStream.refill(
> FastInputStream.java:89)
> at org.apache.solr.common.util.FastInputStream.read(
> FastInputStream.java:125)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.solr.update.TransactionLog.endsWithCommit(
> TransactionLog.java:218)
> at org.apache.solr.update.UpdateLog.recoverFromLog(
> UpdateLog.java:800)
> at org.apache.solr.cloud.ZkController.register(
> ZkController.java:841)
> at org.apache.solr.cloud.ZkController$1.command(
> ZkController.java:277)
> at org.apache.solr.common.cloud.ConnectionManager$1$1.run(
> ConnectionManager.java:166)
> 2015-03-08 05:25:49,225 ERROR [solr.update.UpdateLog] - [Thread-136] - :
> Error inspecting tlog 
> tlog{file=/opt/solr/cores/oi_c/data/tlog/tlog.0001471
> refcount=2}
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
> at org.apache.solr.update.ChannelFastInputStream.
> readWrappedStream(TransactionLog.java:784)
> at org.apache.solr.common.util.FastInputStream.refill(
> FastInputStream.java:89)
> at org.apache.solr.common.util.FastInputStream.read(
> FastInputStream.java:125)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.solr.update.TransactionLog.endsWithCommit(
> TransactionLog.java:218)
> at org.apache.solr.update.UpdateLog.recoverFromLog(
> UpdateLog.java:800)
> at org.apache.solr.cloud.ZkController.register(
> ZkController.java:841)
> at org.apache.solr.cloud.ZkController$1.command(
> ZkController.java:277)
> at org.apache.solr.common.cloud.ConnectionManager$1$1.run(
> ConnectionManager.java:166)
> 2015-03-08 12:21:04,438 WARN [solr.cloud.RecoveryStrategy] -
> [zkCallback-2-thread-28] - : Stopping recovery for core=oi_h coreNodeName=
> 178.21.116.225:8080_solr_oi_h
>
> The other node makes a mess in the logs:
>
> 2015-03-08 05:25:46,020 WARN [solr.cloud.RecoveryStrategy] -
> [zkCallback-2-thread-20] - : Stopping recovery for core=oi_c coreNodeName=
> 194.145.201.190:
> 8080_solr_oi_c
> 2015-03-08 05:26:08,670 ERROR [solr.cloud.ShardLeaderElectionContext] -
> [zkCallback-2-thread-19] - : There was a problem trying to register as the
> leader:org.
> apache.solr.common.SolrException: Could not register as the leader
> because creating the ephemeral registration node in ZooKeeper failed
> at org.apache.solr.cloud.ShardLeaderElectionContextBase
> .runLeaderProcess(ElectionContext.java:146)
> at org.apache.solr.cloud.ShardLeaderElectionContext.
> runLeaderProcess(ElectionContext.java:317)
> at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(
> LeaderElector.java:163)
> at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(
> LeaderElector.java:125)
> at org.apache.solr

Re: How to use ConcurrentUpdateSolrServer for Secured Solr?

2015-03-23 Thread Mark Miller
Doesn't ConcurrentUpdateSolrServer take an HttpClient in one of its
constructors?
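
Something along these lines ought to do it - an untested sketch where the
URL, credentials and queue/thread sizes are just placeholders:

    import org.apache.http.client.HttpClient;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.client.solrj.impl.HttpClientUtil;
    import org.apache.solr.common.params.ModifiableSolrParams;

    // preconfigure an HttpClient with basic auth, then hand it to the
    // ConcurrentUpdateSolrServer constructor
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "user");
    params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "password");
    HttpClient httpClient = HttpClientUtil.createClient(params);

    ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://host:8983/solr/collection1", httpClient,
        10 /* queueSize */, 4 /* threadCount */);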

- Mark

On Sun, Mar 22, 2015 at 3:40 PM Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> Not a direct answer, but Anshum just created this..
>
> https://issues.apache.org/jira/browse/SOLR-7275
>  On 20 Mar 2015 23:21, "Furkan KAMACI"  wrote:
>
> > Is there anyway to use ConcurrentUpdateSolrServer for secured Solr as
> like
> > CloudSolrServer:
> >
> > HttpClientUtil.setBasicAuth(cloudSolrServer.getLbServer().
> getHttpClient(),
> > , );
> >
> > I see that there is no way to access HTTPClient for
> > ConcurrentUpdateSolrServer?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
>


Re: Solr 5.0.0 and HDFS

2015-03-28 Thread Mark Miller
Hmm...can you file a JIRA issue with this info?

- Mark

On Fri, Mar 27, 2015 at 6:09 PM Joseph Obernberger 
wrote:

> I just started up a two shard cluster on two machines using HDFS. When I
> started to index documents, the log shows errors like this. They repeat
> when I execute searches.  All seems well - searches and indexing appear
> to be working.
> Possibly a configuration issue?
> My HDFS config:
>   class="solr.HdfsDirectoryFactory">
>  true
>  160
>   name="solr.hdfs.blockcache.direct.memory.allocation">true
>  16384
>  true
>  false
>  true
>  64
>  512
>  hdfs://nameservice1:8020/solr5
>  /etc/hadoop/conf.cloudera.hdfs1 str>
>  
> Thank you!
>
> -Joe
> 
>
> java.lang.IllegalStateException: file:
> BlockDirectory(HdfsDirectory@799d5a0e
> lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@49838b82) appears
> both in delegate and in cache: cache=[_25.fnm, _2d.si, _2e.nvd, _2b.si,
> _28.tvx, _2c.tvx, _1t.si, _27.nvd, _2b.tvd, _2d_Lucene50_0.pos, _23.nvd,
> _28_Lucene50_0.doc, _28_Lucene50_0.dvd, _2d.fdt, _2c_Lucene50_0.pos,
> _23.fdx, _2b_Lucene50_0.doc, _2d.nvm, _28.nvd, _23.fnm,
> _2b_Lucene50_0.tim, _2e.fdt, _2d_Lucene50_0.doc, _2b_Lucene50_0.dvd,
> _2d_Lucene50_0.dvd, _2b.nvd, _2g.tvx, _28_Lucene50_0.dvm,
> _1v_Lucene50_0.tip, _2e_Lucene50_0.dvm, _2e_Lucene50_0.pos, _2g.fdx,
> _2e.nvm, _2f.fdx, _1s.tvd, _23.nvm, _27.nvm, _1s_Lucene50_0.tip,
> _2c.fnm, _2b.fdt, _2d.fdx, _2c.fdx, _2c.nvm, _2e.fnm,
> _2d_Lucene50_0.dvm, _28.nvm, _28.fnm, _2b_Lucene50_0.tip,
> _2e_Lucene50_0.dvd, _2c.si, _2f.fdt, _2b.fnm, _2e_Lucene50_0.tip,
> _28.si, _28_Lucene50_0.tip, _2f.tvd, _2d_Lucene50_0.tim, _2f.tvx,
> _2b_Lucene50_0.pos, _2e.fdx, _28.fdx, _2c_Lucene50_0.dvd, _2g.tvd,
> _2c_Lucene50_0.tim, _2b.nvm, _23.fdt, _1s_Lucene50_0.tim,
> _28_Lucene50_0.tim, _2c_Lucene50_0.doc, _28.tvd, _2b.tvx, _2c.nvd,
> _2b.fdx, _2c_Lucene50_0.tip, _2e_Lucene50_0.doc, _2e_Lucene50_0.tim,
> _2c.fdt, _27.tvd, _2d.tvd, _2d.tvx, _28_Lucene50_0.pos,
> _2b_Lucene50_0.dvm, _2e.si, _2e.tvd, _2d.fnm, _2c.tvd, _2g.fdt, _2e.tvx,
> _28.fdt, _2d_Lucene50_0.tip, _2c_Lucene50_0.dvm,
> _2d.nvd],delegate=[_10.fdt, _10.fdx, _10.fnm, _10.nvd, _10.nvm, _10.si,
> _10.tvd, _10.tvx, _10_Lucene50_0.doc, _10_Lucene50_0.dvd,
> _10_Lucene50_0.dvm, _10_Lucene50_0.pos, _10_Lucene50_0.tim,
> _10_Lucene50_0.tip, _11.fdt, _11.fdx, _11.fnm, _11.nvd, _11.nvm, _11.si,
> _11.tvd, _11.tvx, _11_Lucene50_0.doc, _11_Lucene50_0.dvd,
> _11_Lucene50_0.dvm, _11_Lucene50_0.pos, _11_Lucene50_0.tim,
> _11_Lucene50_0.tip, _12.fdt, _12.fdx, _12.fnm, _12.nvd, _12.nvm, _12.si,
> _12.tvd, _12.tvx, _12_Lucene50_0.doc, _12_Lucene50_0.dvd,
> _12_Lucene50_0.dvm, _12_Lucene50_0.pos, _12_Lucene50_0.tim,
> _12_Lucene50_0.tip, _13.fdt, _13.fdx, _13.fnm, _13.nvd, _13.nvm, _13.si,
> _13.tvd, _13.tvx, _13_Lucene50_0.doc, _13_Lucene50_0.dvd,
> _13_Lucene50_0.dvm, _13_Lucene50_0.pos, _13_Lucene50_0.tim,
> _13_Lucene50_0.tip, _14.fdt, _14.fdx, _14.fnm, _14.nvd, _14.nvm, _14.si,
> _14.tvd, _14.tvx, _14_Lucene50_0.doc, _14_Lucene50_0.dvd,
> _14_Lucene50_0.dvm, _14_Lucene50_0.pos, _14_Lucene50_0.tim,
> _14_Lucene50_0.tip, _15.fdt, _15.fdx, _15.fnm, _15.nvd, _15.nvm, _15.si,
> _15.tvd, _15.tvx, _15_Lucene50_0.doc, _15_Lucene50_0.dvd,
> _15_Lucene50_0.dvm, _15_Lucene50_0.pos, _15_Lucene50_0.tim,
> _15_Lucene50_0.tip, _1f.fdt, _1f.fdx, _1f.fnm, _1f.nvd, _1f.nvm, _1f.si,
> _1f.tvd, _1f.tvx, _1f_Lucene50_0.doc, _1f_Lucene50_0.dvd,
> _1f_Lucene50_0.dvm, _1f_Lucene50_0.pos, _1f_Lucene50_0.tim,
> _1f_Lucene50_0.tip, _1g.fdt, _1g.fdx, _1g.fnm, _1g.nvd, _1g.nvm, _1g.si,
> _1g.tvd, _1g.tvx, _1g_Lucene50_0.doc, _1g_Lucene50_0.dvd,
> _1g_Lucene50_0.dvm, _1g_Lucene50_0.pos, _1g_Lucene50_0.tim,
> _1g_Lucene50_0.tip, _1h.fdt, _1h.fdx, _1h.fnm, _1h.nvd, _1h.nvm, _1h.si,
> _1h.tvd, _1h.tvx, _1h_Lucene50_0.doc, _1h_Lucene50_0.dvd,
> _1h_Lucene50_0.dvm, _1h_Lucene50_0.pos, _1h_Lucene50_0.tim,
> _1h_Lucene50_0.tip, _1i.fdt, _1i.fdx, _1i.fnm, _1i.nvd, _1i.nvm, _1i.si,
> _1i.tvd, _1i.tvx, _1i_Lucene50_0.doc, _1i_Lucene50_0.dvd,
> _1i_Lucene50_0.dvm, _1i_Lucene50_0.pos, _1i_Lucene50_0.tim,
> _1i_Lucene50_0.tip, _1j.fdt, _1j.fdx, _1j.fnm, _1j.nvd, _1j.nvm, _1j.si,
> _1j.tvd, _1j.tvx, _1j_Lucene50_0.doc, _1j_Lucene50_0.dvd,
> _1j_Lucene50_0.dvm, _1j_Lucene50_0.pos, _1j_Lucene50_0.tim,
> _1j_Lucene50_0.tip, _1k.fdt, _1k.fdx, _1k.fnm, _1k.nvd, _1k.nvm, _1k.si,
> _1k.tvd, _1k.tvx, _1k_Lucene50_0.doc, _1k_Lucene50_0.dvd,
> _1k_Lucene50_0.dvm, _1k_Lucene50_0.pos, _1k_Lucene50_0.tim,
> _1k_Lucene50_0.tip, _1l.fdt, _1l.fdx, _1l.fnm, _1l.nvd, _1l.nvm, _1l.si,
> _1l.tvd, _1l.tvx, _1l_Lucene50_0.doc, _1l_Lucene50_0.dvd,
> _1l_Lucene50_0.dvm, _1l_Lucene50_0.pos, _1l_Lucene50_0.tim,
> _1l_Lucene50_0.tip, _1m.fdt, _1m.fdx, _1m.fnm, _1m.nvd, _1m.nvm, _1m.si,
> _1m.tvd, _1m.tvx, _1m_Lucene50_0.doc, _1m_Lucene50_0.dvd,
> _1m_Lucene50_0.dvm, _1m_Lucene50_0.pos, _1m_Lucene50_0.tim,
>

Re: Multiple index.timestamp directories using up disk space

2015-04-28 Thread Mark Miller
If copies of the index are not eventually cleaned up, I'd file a JIRA to
address the issue. Those directories should be removed over time. At times
there will have to be a couple around at the same time, and others may take
a while to clean up.

- Mark

On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> SolrCloud does need up to twice the amount of disk space as your usual
> index size during replication. Amongst other things, this ensures you have
> a full copy of the index at any point. There's no way around this, I would
> suggest you provision the additional disk space needed.
> On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:
>
> > Hi All,
> >
> > We are seeing this problem with solr 4.6 and solr 4.10.3.
> > For some reason, solr cloud tries to recover and creates a new index
> > directory - (ex:index.20150420181214550), while keeping the older index
> as
> > is. This creates an issues where the disk space fills up and the shard
> > never ends up recovering.
> > Usually this requires a manual intervention of  bouncing the instance and
> > wiping the disk clean to allow for a clean recovery.
> >
> > Any ideas on how to prevent solr from creating multiple copies of index
> > directory.
> >
> > Thanks,
> > Rishi.
> >
>


Re: Solr 5.1.0 Cloud and Zookeeper

2015-05-05 Thread Mark Miller
A bug fix version difference probably won't matter. It's best to use the
same version everyone else uses and the one our tests use, but it's very
likely 3.4.5 will work without a hitch.

- Mark

On Tue, May 5, 2015 at 9:09 AM shacky  wrote:

> Hi.
>
> I read on
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> that Solr needs to use the same ZooKeeper version it owns (at the
> moment 3.4.6).
> Debian Jessie has ZooKeeper 3.4.5
> (https://packages.debian.org/jessie/zookeeper).
>
> Are you sure that this version won't work with Solr 5.1.0?
>
> Thank you very much for your help!
> Bye
>


Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
File a JIRA issue please. That OOM Exception is getting wrapped in a
RuntimeException it looks. Bug.

- Mark

On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV 
wrote:

> Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available
> for Solr.
>
> I am seeing the following OOMs:
> ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
> org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
> at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
> at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
> org.eclipse.jetty.servlet.ServletHandler; Error for
> /solr/customer-1-de_CH_1/suggest_phrase
> java.lang.OutOfMemoryError: Java heap space
>
> The full commandline is
> /usr/local/java/bin/java -server -Xss256k -Xms16G
> -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr
> -Dlog4j.configuration=file:/opt/solr/log4j.properties
> -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983
> /opt/solr/logs OPTIONS=default,rewrite
>
> So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this
> does not seem to "happen". What am I missing? Is it ok to pull a heapdump
> from Solr before killing/rebooting in oom_solr.sh?
>
> Also I would like to know what query parameters were sent to
> /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the
> OOM ...
>
>
> --
- Mark
about.me/markrmiller


Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
We will have to find a way to deal with this long term. Browsing the code
I can see a variety of places where problematic exception handling has been
introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller  wrote:

> File a JIRA issue please. That OOM Exception is getting wrapped in a
> RuntimeException it looks. Bug.
>
> - Mark
>
>
> On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV 
> wrote:
>
>> Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available
>> for Solr.
>>
>> I am seeing the following OOMs:
>> ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
>> org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> at
>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> at
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> at
>> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>> at
>> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
>> at
>> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>> WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
>> org.eclipse.jetty.servlet.ServletHandler; Error for
>> /solr/customer-1-de_CH_1/suggest_phrase
>> java.lang.OutOfMemoryError: Java heap space
>>
>> The full commandline is
>> /usr/local/java/bin/java -server -Xss256k -Xms16G
>> -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
>> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
>> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
>> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
>> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 

Re: Please help test the new Angular JS Admin UI

2015-06-15 Thread Mark Miller
I didn't really follow this issue - what was the motivation for the rewrite?

Is it entirely under: "new code should be quite a bit easier to work on for
programmer
types" or are there other reasons as well?

- Mark

On Mon, Jun 15, 2015 at 10:40 AM Erick Erickson 
wrote:

> Gaaah, that'll teach me to type URLs late on Sunday!
>
> Thanks Upayavira!
>
> You'll notice that 5.2.1 just had the release announcement posted,
> so let the fun begin!
>
> Erick
>
> On Mon, Jun 15, 2015 at 4:12 AM, Upayavira  wrote:
> > Slight correction, the url, if running locally, would be:
> >
> > Http://localhost:8983/solr/index.html
> >
> > The reason we need your help: there is so much to the admin UI that I
> > cannot possibly have created the test setups to have tested it all. If
> > there are aspects of the UI you rely upon, please try them out on 5.2.1
> > - any bugs we don't find could persist long enough to be annoying and
> > inconvenient.
> >
> > Likewise, the sooner we can finish testing, the sooner we can do some
> > fun things:
> > * revamp the UI to be cloud friendly, e.g. Create and manage collections
> > * update schema browser to allow you to update your schema
> > * improve query tab to be able to prettily display your search results,
> > e.g.
> >- graphical explains viewer
> >- parsed query debugger
> > * and much more
> >
> > If enough people engage with testing, I will publish a zip file you can
> > unpack on top of your 5.2.1 zip to clear up any bugs that have been
> > found so far.
> >
> > Keep the bug reports coming!!
> >
> > Upayavira
> >
> > On Mon, Jun 15, 2015, at 01:53 AM, Erick Erickson wrote:
> >> And anyone who, you know, really likes working with UI code please
> >> help making it better!
> >>
> >> As of Solr 5.2, there is a new version of the Admin UI available, and
> >> several improvements are already in 5.2.1 (release imminent). The old
> >> admin UI is still the default, the new one is available at
> >>
> >> /admin/index.html
> >>
> >> Currently, you will see very little difference at first glance; the
> >> goal for this release was to have as much of the current functionality
> >> as possible ported to establish the framework. Upayavira has done
> >> almost all of the work getting this in place, thanks for taking that
> >> initiative Upayavira!
> >>
> >> Anyway, the plan is several fold:
> >> > Get as much testing on this as possible over the 5.2 time frame.
> >> > Make the new Angular JS-based code the default in 5.3
> >> > Make improvements/bug fixes to the admin UI on the new code line,
> particularly SolrCloud functionality.
> >> > Deprecate the current code and remove it eventually.
> >>
> >> The new code should be quite a bit easier to work on for programmer
> >> types, and there are Big Plans Afoot for making the admin UI more
> >> SolrCloud-friendly. Now that the framework is in place, it should be
> >> easier for anyone who wants to volunteer to contribute, please do!
> >>
> >> So please give it a whirl. I'm sure there will be things that crop up,
> >> and any help addressing them will be appreciated. There's already an
> >> umbrella JIRA for this work, see:
> >> https://issues.apache.org/jira/browse/SOLR-7666. Please link any new
> >> issues to this JIRA so we can keep track of it all as well as
> >> coordinate efforts. If all goes well, this JIRA can be used to see
> >> what's already been reported too.
> >>
> >> Note that things may be moving pretty quickly, so trunk and 5x will
> >> always be the most current. That said looking at 5.2.1 will be much
> >> appreciated.
> >>
> >> Erick
>
-- 
- Mark
about.me/markrmiller


Re: Please help test the new Angular JS Admin UI

2015-06-15 Thread Mark Miller
Sure, just curious. Wasn't sure if there were other motivations around what
could be done, or the overall look and feel that could be achieved, or
anything beyond it just being easier for devs to work on and maintain (which is
always good when it comes to JavaScript - I still wish it was all GWT :) ).

- Mark

On Mon, Jun 15, 2015 at 11:35 AM Upayavira  wrote:

> The current UI was written before tools like AngularJS were widespread,
> and before decent separation of concerns was easy to achieve in
> Javascript.
>
> In a sense, your paraphrase of the justification was as you described -
> to make it easier for programmer types - partly by using a tool that is
> closer to their working model, but also this rewrite has more than
> halved the number of lines of code in the UI, which will make it vastly
> more maintainable and extensible.
>
> As a case in point, I've got a working patch that I'll release at some
> point soon that gives us a "collections" version of the "core admin"
> pane. I'd love to add HDFS support to the UI if there were APIs worth
> exposing (I haven't dug into HDFS support yet).
>
> Make sense?
>
> Upayavira
>
> On Mon, Jun 15, 2015, at 07:49 AM, Mark Miller wrote:
> > I didn't really follow this issue - what was the motivation for the
> > rewrite?
> >
> > Is it entirely under: "new code should be quite a bit easier to work on
> > for
> > programmer
> > types" or are there other reasons as well?
> >
> > - Mark
> >
> > On Mon, Jun 15, 2015 at 10:40 AM Erick Erickson  >
> > wrote:
> >
> > > Gaaah, that'll teach me to type URLs late on Sunday!
> > >
> > > Thanks Upayavira!
> > >
> > > You'll notice that 5.2.1 just had the release announcement posted,
> > > so let the fun begin!
> > >
> > > Erick
> > >
> > > On Mon, Jun 15, 2015 at 4:12 AM, Upayavira  wrote:
> > > > Slight correction, the url, if running locally, would be:
> > > >
> > > > Http://localhost:8983/solr/index.html
> > > >
> > > > The reason we need your help: there is so much to the admin UI that I
> > > > cannot possibly have created the test setups to have tested it all.
> If
> > > > there are aspects of the UI you rely upon, please try them out on
> 5.2.1
> > > > - any bugs we don't find could persist long enough to be annoying and
> > > > inconvenient.
> > > >
> > > > Likewise, the sooner we can finish testing, the sooner we can do some
> > > > fun things:
> > > > * revamp the UI to be cloud friendly, e.g. Create and manage
> collections
> > > > * update schema browser to allow you to update your schema
> > > > * improve query tab to be able to prettily display your search
> results,
> > > > e.g.
> > > >- graphical explains viewer
> > > >- parsed query debugger
> > > > * and much more
> > > >
> > > > If enough people engage with testing, I will publish a zip file you
> can
> > > > unpack on top of your 5.2.1 zip to clear up any bugs that have been
> > > > found so far.
> > > >
> > > > Keep the bug reports coming!!
> > > >
> > > > Upayavira
> > > >
> > > > On Mon, Jun 15, 2015, at 01:53 AM, Erick Erickson wrote:
> > > >> And anyone who, you know, really likes working with UI code please
> > > >> help making it better!
> > > >>
> > > >> As of Solr 5.2, there is a new version of the Admin UI available,
> and
> > > >> several improvements are already in 5.2.1 (release imminent). The
> old
> > > >> admin UI is still the default, the new one is available at
> > > >>
> > > >> /admin/index.html
> > > >>
> > > >> Currently, you will see very little difference at first glance; the
> > > >> goal for this release was to have as much of the current
> functionality
> > > >> as possible ported to establish the framework. Upayavira has done
> > > >> almost all of the work getting this in place, thanks for taking that
> > > >> initiative Upayavira!
> > > >>
> > > >> Anyway, the plan is several fold:
> > > >> > Get as much testing on this as possible over the 5.2 time frame.
> > > >> > Make the new Angular JS-based code the default in 5.3
> > > >> > Make improvements/bug fixes to t

Re: Deletion Policy in Solr Cloud

2015-06-15 Thread Mark Miller
SolrCloud does not really support any form of rollback.

On Mon, Jun 15, 2015 at 5:05 PM Aurélien MAZOYER <
aurelien.mazo...@francelabs.com> wrote:

> Hi all,
>
> Is DeletionPolicy customization still available in Solr Cloud? Is there
> a way to rollback to a previous commit point in Solr Cloud thanks to a
> specific deletion policy?
>
> Thanks,
>
>   Aurélien
>
-- 
- Mark
about.me/markrmiller


Re: mapreduce job using soirj 5

2015-06-17 Thread Mark Miller
I think there are some better classpath isolation options in the works for
Hadoop. As it is, there is some harmonization that has to be done depending
on versions used, and it can get tricky.

- Mark

On Wed, Jun 17, 2015 at 9:52 AM Erick Erickson 
wrote:

> For sure there are a few rough edges here
>
> On Wed, Jun 17, 2015 at 12:28 AM, adfel70  wrote:
> > We cannot downgrade httpclient in solrj5 because its using new features
> and
> > we dont want to start altering solr code, anyway we thought about
> upgrading
> > httpclient in hadoop but as Erick said its sounds more work than just put
> > the jar in the data nodes.
> >
> > About that flag we tried it, hadoop even has an environment variable
> > HADOOP_USER_CLASSPATH_FIRST but all our tests with that flag failed.
> >
> > We thought this is an issue that is more likely that solr users will
> > encounter rather than cloudera users, so we will be glad for a more
> elegant
> > solution or workaround than to replace the httpclient jar in the data
> nodes
> >
> > Thank you all for your responses
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: ClassNotFoundException with Custom ZkACLProvider

2016-11-15 Thread Mark Miller
Could you file a JIRA issue so that this report does not get lost?

- Mark

On Tue, Nov 15, 2016 at 10:49 AM Solr User  wrote:

> For those interested, I ended up bundling the customized ACL provider with
> the solr.war.  I could not stomach looking at the stack trace in the logs.
>
> On Mon, Nov 7, 2016 at 4:47 PM, Solr User  wrote:
>
> > This is mostly just an FYI regarding future work on issues like
> SOLR-8792.
> >
> > I wanted admin update but world read on ZK since I do not have anything
> > sensitive from a read perspective in the Solr data and did not want to
> > force all SolrCloud clients to implement authentication just for read.
> So,
> > I extended DefaultZkACLProvider and implemented a replacement for
> > VMParamsAllAndReadonlyDigestZkACLProvider.
> >
> > My custom code is loaded from the sharedLib in solr.xml.  However, there
> > is a temporary ZK lookup to read solr.xml (and chroot) which is obviously
> > done before loading sharedLib.  Therefore, I am faced with a
> > ClassNotFoundException.  This has no negative effect on the ACL
> > functionalityjust the annoying stack trace in the logs.  I do not
> want
> > to package this custom code with the Solr code and do not want to package
> > this along with Solr dependencies in the Jetty lib/ext.
> >
> > So, I am planning to live with the stack trace and just wanted to share
> > this for any future work on the dynamic solr.xml and chroot lookups or in
> > case I am missing some work-around.
> >
> > Thanks!
> >
> >
>
-- 
- Mark
about.me/markrmiller


Re: autoAddReplicas:true not working

2016-11-15 Thread Mark Miller
Look at the Overseer host and see if there are any relevant logs for
autoAddReplicas.

- Mark

On Mon, Oct 24, 2016 at 3:01 PM Chetas Joshi  wrote:

> Hello,
>
> I have the following configuration for the Solr cloud and a Solr collection
> This is Solr on HDFS and Solr version I am using is 5.5.0
>
> No. of hosts: 52 (Solr Cloud)
>
> shard count:   50
> replicationFactor:   1
> MaxShardsPerNode: 1
> autoAddReplicas:   true
>
> Now, one of my shards is down. Although there are two hosts which are
> available in my cloud on which a new replica could be created, it just does
> not create a replica. All 52 hosts are healthy. What could be the reason
> for this?
>
> Thanks,
>
> Chetas.
>
-- 
- Mark
about.me/markrmiller


Re: solr shutdown

2016-11-15 Thread Mark Miller
That is probably partly because of hdfs cache key unmapping. I think I
improved that in some issue at some point.

We really want to wait by default for a long time though - even 10 minutes
or more. If you have tons of SolrCores, each of them has to be torn down,
each of them might commit on close, custom code and resources can be in use
and need to be released, and a lot of time can be spent legitimately. Given
that these long shutdowns will normally be legitimate and not some hang, I
think we want to be willing to wait a long time. A user that finds this too
long can always kill the process themselves, or lower the wait. But most of
the time you will pay for that with a non-clean shutdown, except in
exceptional situations.

- Mark

On Fri, Oct 21, 2016 at 12:10 PM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thanks Shawn - We've had to increase this to 300 seconds when using a
> large cache size with HDFS, and a fairly heavily loaded index routine (3
> million docs per day).  I don't know if that's why it takes a long time
> to shutdown, but it can take a while for solr cloud to shutdown
> gracefully.  If it does not, you end up with write.lock files for some
> (if not all) of the shards, and have to delete them manually before
> restarting.
>
> -Joe
>
>
> On 10/21/2016 9:01 AM, Shawn Heisey wrote:
> > On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
> >> I'm running solrcloud in foreground mode (-f). Does it make a
> >> difference for Solr if I stop it by pressing ctrl-c, sending it a
> >> SIGTERM or using "solr stop"?
> > All of those should produce the same result in the end -- Solr's
> > shutdown hook will be called and a graceful shutdown will commence.
> >
> > Note that in the case of the "bin/solr stop" command, the default is to
> > only wait five seconds for graceful shutdown before proceeding to a
> > forced kill, which for a typical install, means that forced kills become
> > the norm rather than the exception.  We have an issue to increase the
> > max timeout, but it hasn't been done yet.
> >
> > I strongly recommend anyone going into production should edit the script
> > to increase the timeout.  For the shell script I would do at least 60
> > seconds.  The Windows script just does a pause, not an intelligent wait,
> > so going that high probably isn't advisable on Windows.
> >
> > Thanks,
> > Shawn
> >
>
> --
- Mark
about.me/markrmiller


Re: Solr node not found in ZK live_nodes

2016-12-07 Thread Mark Miller
That already happens. The ZK client itself will reconnect when it can and
trigger everything to be set up like when the cluster first starts up,
including a live node and leader election, etc.

You may have hit a bug or something else missing from this conversation,
but reconnecting after losing the ZK connection is a basic feature from day
one.

Mark
On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada 
wrote:

> Thanks Erick! Should I create a JIRA issue for the same?
>
> Regarding the logs, I have changed the log level to WARN. That may be the
> reason, I couldn't get anything from it.
>
> Thanks,
> Manohar
>
> On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson 
> wrote:
>
> > Most likely reason is that the Solr node in question,
> > was not reachable thus it was removed from
> > live_nodes. Perhaps due to temporary network
> > glitch, long GC pause or the like. If you're rolling
> > your logs over it's quite possible that any illuminating
> > messages were lost. The default 4M size for each
> > log is quite lo at INFO level...
> >
> > It does seem possible for a Solr node to periodically
> > check its status and re-insert itself into live_nodes,
> > go through recovery and all that. So far most of that
> > registration logic is baked into startup code. What
> > do others think? Worth a JIRA?
> >
> > Erick
> >
> > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada 
> > wrote:
> > > We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper (3.4.6).
> > >
> > > All the Solr nodes were registered to Zookeeper (ls /live_nodes) when
> > setup
> > > was done 3 months back. Suddenly, few days back our search started
> > failing
> > > because one of the solr node(consider s16) was not seen in Zookeeper,
> > i.e.,
> > > when we checked for *"ls /live_nodes"*, *s16 *solr node was not found.
> > > However, the corresponding Solr process was up and running.
> > >
> > > To my surprise, I couldn't find any errors or warnings in solr or
> > zookeeper
> > > logs related to this. I have few questions -
> > >
> > > 1. Is there any reason why this registration to ZK was lost? I know
> logs
> > > should provide some information, but, it didn't. Did anyone encountered
> > > similar issue, if so, what can be the root cause?
> > > 2. Shouldn't Solr be clever enough to detect that the registration to
> ZK
> > > was lost (for some reason) and should try to re-register again?
> > >
> > > PS: The issue is resolved by restarting the Solr node. However, I am
> > > curious to know why it happened in the first place.
> > >
> > > Thanks
> >
>
-- 
- Mark
about.me/markrmiller


Re: hung threads and CLOSE_WAIT sockets

2014-03-07 Thread Mark Miller

On Mar 7, 2014, at 3:11 AM, Avishai Ish-Shalom  wrote:

> SOLR-5216 


Yes, that is the one.

- Mark

http://about.me/markrmiller



Re: More Maintenance Releases?

2014-03-12 Thread Mark Miller
+1 to the idea, I love bug fix releases (which is why I volunteered to do the 
last couple).

The main limiting factor is a volunteer to do it. Users requesting a specific 
bug fix release is probably a good way to prompt volunteers though.

-- 
Mark Miller
about.me/markrmiller

On March 12, 2014 at 9:14:50 AM, Doug Turnbull 
(dturnb...@opensourceconnections.com) wrote:

Hello Solr community,  

We have been using Solr to great effect at OpenSource Connections.  
Occasionally though, we'll hit a bug in say 4.5.1, that gets fixed in  
4.6.0. Unfortunately, as 4.6.0 is a release sporting several new features,  
there's invariably new bugs that get introduced. So while my bug in 4.5.1  
is fixed, a new bug related to new features in 4.6.0 means 4.6.0 might be a  
showstopper.  

This is more a question for the PMC than anything (with comments from  
others welcome). Would it be possible to do more minor bug-fix releases? I  
realize this could be a burden, so maybe it would be good to pick a  
version and decide this will be a "long term support" release. We will  
backport bug fixes and do several additional bug-fix releases for 4-6  
months? Then we'd pick another version to be a "long term support" release?  

This would help with the overall stability of Solr and help in the decision  
about how/when to upgrade Solr.  

Cheers,  
--  
Doug Turnbull  
Search & Big Data Architect  
OpenSource Connections <http://o19s.com>  


Re: Change replication factor

2014-03-12 Thread Mark Miller
You can simply create a new SolrCore with the same collection and shard id as 
the collection and shard you want to add a replica to.

There is also an addReplica command coming to the collections API. Or perhaps 
it's in 4.7, I don't know; this JIRA issue is a little confusing as it's still 
open, though it looks like stuff has been committed: 
https://issues.apache.org/jira/browse/SOLR-5130
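
For the "just create a SolrCore" route, a rough (untested) SolrJ sketch -
the host, core, collection and shard names are all placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    // run against the node that should host the new replica
    // (inside a method that declares throws Exception)
    HttpSolrServer node = new HttpSolrServer("http://newnode:8983/solr");
    CoreAdminRequest.Create create = new CoreAdminRequest.Create();
    create.setCoreName("mycollection_shard1_replica3");
    create.setCollection("mycollection"); // existing collection
    create.setShardId("shard1");          // shard that gets the extra replica
    create.process(node);
    node.shutdown();

The new core registers itself with ZooKeeper as another replica of that
shard and then syncs its index from the shard leader.
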
-- 
Mark Miller
about.me/markrmiller

On March 12, 2014 at 10:40:15 AM, Mike Hugo (m...@piragua.com) wrote:

After a collection has been created in SolrCloud, is there a way to modify  
the Replication Factor?  

Say I start with a few nodes in the cluster, and have a replication factor  
of 2. Over time, the index grows and we add more nodes to the cluster, can  
I increase the replication factor to 3?  

Thanks!  

Mike  


Re: Bootstrapping SolrCloud cluster with multiple collections in differene sharding/replication setup

2014-03-20 Thread Mark Miller
Honestly, the best approach is to start with no collections defined and use the 
collections api.

If you want to preconfigure (which has its warts and will likely go away as 
an option), it’s tricky to do it with different numShards, as that is a global 
property per node.

You would basically set -DnumShards=1 and start your cluster with Foo defined. 
Then you stop the cluster and define Bar and start with -DnumShards=3.

The ability to preconfigure and bootstrap like this was kind of a transitional 
system meant to help people that knew Solr pre SolrCloud get something up 
quickly back before we had a collections api.

The collections API is much better if you want multiple collections and it’s 
the future.
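
For example, once the two configs are in ZK, a rough (untested) sketch of
the two CREATE calls - collection names, config names, ZK hosts and the
shard/replica counts are just illustrative:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    // inside a method that declares throws Exception
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

    // Foo: 1 shard, replicated widely
    ModifiableSolrParams foo = new ModifiableSolrParams();
    foo.set("action", "CREATE");
    foo.set("name", "Foo");
    foo.set("numShards", "1");
    foo.set("replicationFactor", "4");
    foo.set("collection.configName", "fooConf");
    QueryRequest fooReq = new QueryRequest(foo);
    fooReq.setPath("/admin/collections");
    server.request(fooReq);

    // Bar: 3 shards, also replicated
    ModifiableSolrParams bar = new ModifiableSolrParams();
    bar.set("action", "CREATE");
    bar.set("name", "Bar");
    bar.set("numShards", "3");
    bar.set("replicationFactor", "2");
    bar.set("collection.configName", "barConf");
    QueryRequest barReq = new QueryRequest(bar);
    barReq.setPath("/admin/collections");
    server.request(barReq);

    server.shutdown();

The same two calls can of course be made as plain HTTP requests against
/solr/admin/collections on any node.
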
-- 
Mark Miller
about.me/markrmiller

On March 20, 2014 at 10:24:18 AM, Ugo Matrangolo (ugo.matrang...@gmail.com) 
wrote:

Hi,  

I would like some advice about the best way to bootstrap from scratch a  
SolrCloud cluster housing at least two collections with different  
sharding/replication setup.  

Going through the docs/'Solr In Action' book what I have sees so far is  
that there is a way to bootstrap a SolrCloud cluster with sharding  
configuration using the:  

-DnumShards=2  

but this (afaik) works only for a single collection. What I need is a way  
to deploy from scratch a SolrCloud cluster housing (e.g.) two collections  
Foo and Bar where Foo has only one shard and is replicated everywhere while  
Bar has three shards and ,again, is replicated.  

I can't find a config file where to put this sharding plan and I'm starting  
to think that the only way to do this is after the deploy using the  
Collections API.  

Is there a best approach way to do this ?  

Ugo  


Re: solr cloud distributed optimize() becomes serialized

2014-03-21 Thread Mark Miller
Recently fixed in Lucene - should be able to find the issue if you dig a little.
-- 
Mark Miller
about.me/markrmiller

On March 21, 2014 at 10:25:56 AM, Greg Walters (greg.walt...@answers.com) wrote:

I've seen this on 4.6.  

Thanks,  
Greg  

On Mar 20, 2014, at 11:58 PM, Shalin Shekhar Mangar  
wrote:  

> That's not right. Which Solr versions are you on (question for both  
> William and Chris)?  
>  
> On Fri, Mar 21, 2014 at 8:07 AM, William Bell  wrote:  
>> Yeah. optimize() also used to come back immediately if the index was  
>> already indexed. It just reopened the index.  
>>  
>> We used to use that for cleaning up the old directories quickly. But now it  
>> does another optimize() even though the index is already optimized.  
>>  
>> Very strange.  
>>  
>>  
>> On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu  wrote:  
>>  
>>> I wonder whether this is a known bug. In previous SOLR cloud versions, 4.4  
>>> or maybe 4.5, an explicit optimize(), without any parameters, it usually  
>>> took 2 minutes for a 32 core cluster.  
>>>  
>>> However, in 4.6.1, the same call took about 1 hour. Checking the index  
>>> modification time for each core shows 2 minutes gap if sorted.  
>>>  
>>> We are using a solrj client connecting to zookeeper. I found it is talking  
>>> to a specific solr server A, and that server A is distributing the calls to 
>>>  
>>> all other solr servers. Here is the thread dump for this server A:  
>>>  
>>> at  
>>>  
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
>>>   
>>> at  
>>>  
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>   
>>> at  
>>>  
>>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
>>>   
>>> at  
>>>  
>>> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226)
>>>   
>>> at  
>>>  
>>> org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195)
>>>   
>>> at  
>>>  
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250)
>>>   
>>> at  
>>>  
>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>>>   
>>>  
>>  
>>  
>>  
>> --  
>> Bill Bell  
>> billnb...@gmail.com  
>> cell 720-256-8076  
>  
>  
>  
> --  
> Regards,  
> Shalin Shekhar Mangar.  



RE: Limit on # of collections -SolrCloud

2014-03-21 Thread Mark Miller

On March 21, 2014 at 1:46:13 PM, Tim Potter (tim.pot...@lucidworks.com) wrote:

We've seen instances where you end up restarting the overseer node each time as 
you restart the cluster, which causes all kinds of craziness. 


That would be a great test to add to the suite.

-- 
Mark Miller
about.me/markrmiller



Re: MergingSolrIndexes not supported by SolrCloud?why?

2014-03-26 Thread Mark Miller
FWIW, you can use merge like this if you run on HDFS rather than local 
filesystem.
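
A rough, untested SolrJ sketch of that core-admin merge call when both cores
keep their data in the same HDFS filesystem - the host, core name and HDFS
path below are purely illustrative assumptions:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    // inside a method that declares throws Exception
    ModifiableSolrParams p = new ModifiableSolrParams();
    p.set("action", "mergeindexes");
    p.set("core", "collection_cms2");
    // assumption: with HdfsDirectoryFactory the source index dir is an HDFS path
    p.set("indexDir", "hdfs://nameservice1:8020/solr/other_core/data/index");
    QueryRequest req = new QueryRequest(p);
    req.setPath("/admin/cores");

    HttpSolrServer server = new HttpSolrServer("http://host:8080/solr");
    server.request(req);
    server.shutdown();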

-- 
Mark 

On March 26, 2014 at 12:34:39 PM, Shawn Heisey (s...@elyograg.org) wrote:

On 3/26/2014 3:14 AM, rulinma wrote:  
> MergingSolrIndexes:  
>  
> http://192.168.22.32:8080/solr/admin/cores?action=mergeindexes&core=collection_cms2&indexDir=/home/solr/mrl/data/index
>   
>  
> I use it in local that supported well, but in solrCloud(3 machine) that not  
> work well.  
> can anyone give me some advice. thanks.  

There are two reasons that it doesn't work in SolrCloud:  

1) The indexes that you are merging must be on the same host. With  
SolrCloud this is not guaranteed, and normally will NOT be the case.  

2) Once the indexes are merged, the SolrCloud clusterstate in zookeeper  
will be wrong.  

I don't know whether the first problem can be overcome without copying  
one of the shards before merging.  

Assuming you can overcome the first problem, or it's not a problem  
because they are already on the same host, overcoming the second would  
require a merge option on the Collections API. I do not see any  
evidence in the reference guide that this exists right now.  

Thanks,  
Shawn  



Re: [ANN] Solr in Action book release (Solr 4.7)

2014-03-27 Thread Mark Miller
Nice, Congrats!
-- 
Mark Miller
about.me/markrmiller

On March 27, 2014 at 11:17:49 AM, Trey Grainger (solrt...@gmail.com) wrote:

I'm excited to announce the final print release of *Solr in Action*, the  
newest Solr book by Manning publications covering through Solr 4.7 (the  
current version). The book is available for immediate purchase in print and  
ebook formats, and the *outline*, some *free chapters* as well as the *full  
source code are also available* at http://solrinaction.com.  

I would love it if you would check the book out, and I would also  
appreciate your feedback on it, especially if you find the book to be a  
useful guide as you are working with Solr! Timothy Potter and I (Trey  
Grainger) worked tirelessly on the book for nearly 2 years to bring you a  
thorough (664 pg.) and fantastic example-driven guide to the best Solr has  
to offer.  

*Solr in Action* is intentionally designed to be a learning guide as  
opposed to a reference manual. It builds from an initial introduction to  
Solr all the way to advanced topics such as implementing a predictive  
search experience, writing your own Solr plugins for function queries and  
multilingual text analysis, using Solr for big data analytics, and even  
building your own Solr-based recommendation engine. The book uses fun  
real-world examples, including analyzing the text of tweets, searching and  
faceting on restaurants, grouping similar items in an ecommerce  
application, highlighting interesting keywords in UFO sighting reports, and  
even building a personalized job search experience.  

For a more detailed write-up about the book and it's contents, you can also  
visit the Solr homepage at  
https://lucene.apache.org/solr/books.html#solr-in-action. Thanks in advance  
for checking it out, and I really hope many of you find the book to be  
personally useful!  

All the best,  

Trey Grainger  
Co-author,  
*Solr in Action*Director of Engineering, Search & Analytics @CareerBuilder  


Re: SolrCloud 4.6.1 hanging

2014-03-28 Thread Mark Miller
I'm looking into a hang as well - not sure if it involves searching as well, 
but it may. Can you file a JIRA issue - let's track it down. 

- Mark

> On Mar 28, 2014, at 8:07 PM, Rafał Kuć  wrote:
> 
> Hello!
> 
> I have an issue with one of the SolrCloud deployments and I wanted to
> ask maybe someone had a similar issue. Six machines, a collection with
> 6 shards with a replication factor of 3. It all runs on 6 physical
> servers, each with 24 cores. We've indexed about 32 milion documents
> and everything was fine until that point.
> 
> Now, during performance tests, we run into an issue - SolrCloud hangs
> when querying and indexing is run at the same time. First we see a
> normal load on the machines, than the load starts to drop and thread
> dump shown numerous threads like this:
> 
> Thread 12624: (state = BLOCKED)
> - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may 
> be imprecise)
> - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=186 (Compiled frame)
> - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() 
> @bci=42, line=2043 (Compiled frame)
> - org.apache.http.pool.PoolEntryFuture.await(java.util.Date) @bci=50, 
> line=131 (Compiled frame)
> - 
> org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(java.lang.Object, 
> java.lang.Object, long, java.util.concurrent.TimeUnit, 
> org.apache.http.pool.PoolEntryFuture) @bci=431, line=281 (Compiled frame)
> - 
> org.apache.http.pool.AbstractConnPool.access$000(org.apache.http.pool.AbstractConnPool,
>  java.lang.Object, java.lang.Object, long, java.util.concurrent.TimeUnit, 
> org.apache.http.pool.PoolEntryFuture) @bci=8, line=62 (Compiled frame)
> - org.apache.http.pool.AbstractConnPool$2.getPoolEntry(long, 
> java.util.concurrent.TimeUnit) @bci=15, line=176 (Compiled frame)
> - org.apache.http.pool.AbstractConnPool$2.getPoolEntry(long, 
> java.util.concurrent.TimeUnit) @bci=3, line=169 (Compiled frame)
> - org.apache.http.pool.PoolEntryFuture.get(long, 
> java.util.concurrent.TimeUnit) @bci=38, line=100 (Compiled frame)
> - 
> org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(java.util.concurrent.Future,
>  long, java.util.concurrent.TimeUnit) @bci=4, line=212 (Compiled frame)
> - 
> org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(long,
>  java.util.concurrent.TimeUnit) @bci=10, line=199 (Compiled frame)
> - 
> org.apache.http.impl.client.DefaultRequestDirector.execute(org.apache.http.HttpHost,
>  org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) @bci=259, 
> line=456 (Compiled frame)
> - 
> org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost,
>  org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) @bci=344, 
> line=906 (Compiled frame)
> - 
> org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest,
>  org.apache.http.protocol.HttpContext) @bci=21, line=805 (Compiled frame)
> - 
> org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest)
>  @bci=6, line=784 (Compiled frame)
> - 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest,
>  org.apache.solr.client.solrj.ResponseParser) @bci=1175, line=395 
> (Interpreted frame)
> - 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest)
>  @bci=17, line=199 (Compiled frame)
> - 
> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(org.apache.solr.client.solrj.impl.LBHttpSolrServer$Req)
>  @bci=132, line=285 (Interpreted frame)
> - 
> org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(org.apache.solr.client.solrj.request.QueryRequest,
>  java.util.List) @bci=13, line=214 (Compiled frame)
> - org.apache.solr.handler.component.HttpShardHandler$1.call() @bci=246, 
> line=161 (Compiled frame)
> - org.apache.solr.handler.component.HttpShardHandler$1.call() @bci=1, 
> line=118 (Interpreted frame)
> - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334 
> (Interpreted frame)
> - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Compiled frame)
> - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471 
> (Interpreted frame)
> - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334 
> (Interpreted frame)
> - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Compiled frame)
> - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1145 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 
> (Interpreted frame)
> - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)
> 
> I've checked I/O statistics, GC working, memory usage, networking and
> all of that - those resources are not exhausted during the test.
> 
> Hard autocommit is set to 15 

Re: zookeeper reconnect failure

2014-03-30 Thread Mark Miller
We don’t currently retry, but I don’t think it would hurt much if we did - at 
least briefly.

If you want to file a JIRA issue, that would be the best way to get it in a 
future release.

-- 
Mark Miller
about.me/markrmiller

On March 28, 2014 at 5:40:47 PM, Michael Della Bitta 
(michael.della.bi...@appinions.com) wrote:

Hi, Jessica,  

We've had a similar problem when DNS resolution of our Hadoop task nodes  
has failed. They tend to take a dirt nap until you fix the problem  
manually. Are you experiencing this in AWS as well?  

I'd say the two things to do are to poll the node state via HTTP using a  
monitoring tool so you get an immediate notification of the problem, and to  
install some sort of caching server like nscd if you expect to have DNS  
resolution failures regularly.  



Michael Della Bitta  

Applications Developer  

o: +1 646 532 3062  

appinions inc.  

"The Science of Influence Marketing"  

18 East 41st Street  

New York, NY 10017  

t: @appinions <https://twitter.com/Appinions> | g+:  
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
  
w: appinions.com <http://www.appinions.com/>  


On Fri, Mar 28, 2014 at 4:27 PM, Jessica Mallet wrote:  

> Hi,  
>  
> First off, I'd like to give a disclaimer that this probably is a very edge  
> case issue. However, since it happened to us, I would like to get some  
> advice on how to best handle this failure scenario.  
>  
> Basically, we had some network issue where we temporarily lost connection  
> and DNS. The zookeeper client properly triggered the watcher. However, when  
> trying to reconnect, this following Exception is thrown:  
>  
> 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line  
> 121) :java.net.UnknownHostException: : Name or  
> service not known  
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)  
> at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)  
> at  
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)  
> at java.net.InetAddress.getAllByName0(InetAddress.java:1211)  
> at java.net.InetAddress.getAllByName(InetAddress.java:1127)  
> at java.net.InetAddress.getAllByName(InetAddress.java:1063)  
> at  
>  
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>   
> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)  
> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)  
> at  
> org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)  
> at  
>  
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
>   
> at  
>  
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
>   
> at  
>  
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) 
>  
> at  
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)  
>  
> I tried to look at the code and it seems that there'd be no further retries  
> to connect to Zookeeper, and the node is basically left in a bad state and  
> will not recover on its own. (Please correct me if I'm reading this wrong.)  
> Thinking about it, this is probably fair, since normally you wouldn't  
> expect retries to fix an "unknown host" issue--even though in our case it  
> would have--but I'm wondering what we should do to handle this situation if  
> it happens again in the future.  
>  
> Any advice is appreciated.  
>  
> Thanks,  
> Jessica  
>  


Re: Race condition in Leader Election

2014-04-15 Thread Mark Miller
We have to fix that then.

-- 
Mark Miller
about.me/markrmiller

On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote:

I see something similar where, given ~1000 shards, both nodes spend a LOT of 
time sorting through the leader election process. Roughly 30 minutes.  

I too am wondering - if I force all leaders onto one node, then shut down both, 
then start up the node with all of the leaders on it first, then start up the 
other node, then I think I would have a much faster startup sequence.  

Does that sound reasonable? And if so, is there a way to trigger the leader 
election process without taking the time to unload and recreate the shards?  

> Hi  
>  
> When restarting a node in solrcloud, i run into scenarios where both the  
> replicas for a shard get into "recovering" state and never come up causing  
> the error "No servers hosting this shard". To fix this, I either unload one  
> core or restart one of the nodes again so that one of them becomes the  
> leader.  
>  
> Is there a way to "force" leader election for a shard for solrcloud? Is  
> there a way to break ties automatically (without restarting nodes) to make  
> a node as the leader for the shard?  
>  
>  
> Thanks  
> Nitin  


Re: Distributed commits in CloudSolrServer

2014-04-15 Thread Mark Miller
Inline responses below.
-- 
Mark Miller
about.me/markrmiller

On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com) wrote:

I have a SolrCloud index, 1 shard, with a leader and one replica, and 3 
ZKs. The Solr indexes are behind a load balancer. There is one 
CloudSolrServer client updating the indexes. The index schema includes 3 
ExternalFileFields. When the CloudSolrServer client issues a hard commit, I 
observe that the commits occur sequentially, not in parallel, on the leader 
and replica. The duration of each commit is about a minute. Most of this 
time is spent reloading the 3 ExternalFileField files. Because of the 
sequential commits, there is a period of time (1 minute+) when the index 
searchers will return different results, which can cause a bad user 
experience. This will get worse as replicas are added to handle 
auto-scaling. The goal is to keep all replicas in sync w.r.t. the user 
queries. 

My questions: 

1. Is there a reason that the distributed commits are done in sequence, not 
in parallel? Is there a way to change this behavior? 


The reason is that updates are currently done this way - it’s the only safe way 
to do it without solving some more problems. I don’t think you can easily 
change this. I think we should probably file a JIRA issue to track a better 
solution for commit handling. I think there are some complications because of 
how commits can be added on update requests, but it's something we probably want 
to try and solve before tackling *all* updates to replicas in parallel with the 
leader.



2. If instead, the commits were done in parallel by a separate client via a 
GET to each Solr instance, how would this client get the host/port values 
for each Solr instance from zookeeper? Are there any downsides to doing 
commits this way? 

Not really, other than the extra management.
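
For what it's worth, the host/port of every replica can be pulled from ZooKeeper 
through SolrJ's ZkStateReader, and a commit can then be sent to each core directly. 
A minimal sketch, assuming SolrJ 4.x; the zkHost string and collection name are 
placeholders, and error handling plus the parallel executor are left out:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;

public class ParallelCommitSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    cloud.connect();
    ClusterState state = cloud.getZkStateReader().getClusterState();
    for (Slice slice : state.getSlices("collection1")) {
      for (Replica replica : slice.getReplicas()) {
        // base_url plus the core name gives the URL of this replica's core
        String coreUrl = replica.getStr(ZkStateReader.BASE_URL_PROP)
            + "/" + replica.getStr(ZkStateReader.CORE_NAME_PROP);
        HttpSolrServer core = new HttpSolrServer(coreUrl);
        core.commit();   // submit these to an ExecutorService to run them in parallel
        core.shutdown();
      }
    }
    cloud.shutdown();
  }
}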





Thanks, 
Peter 


Re: clusterstate.json does not reflect current state of down versus active

2014-04-16 Thread Mark Miller
bq.  before any of Solr gets to do its shutdown sequence
Yeah, this is kind of an open issue. There might be a JIRA for it, but I cannot 
remember. What we really need is an explicit shutdown call that can be made 
before stopping jetty so that it’s done gracefully.

-- 
Mark Miller
about.me/markrmiller

On April 16, 2014 at 2:54:15 PM, Daniel Collins (danwcoll...@gmail.com) wrote:

We actually have a similar scenario, we have 64 cores per machine, and even  
that sometimes has issues when we shutdown all cores at once. We did start  
to write a "force election for Shard X" tool but it was harder than we  
expected, it's still on our to-do list.  

Some context, we run 256 shards spread over 4 machines, and several Solr  
instances per machine (16 cores per instance, 4 instances per machine).  
Our machines regularly go down for maintenance, and shutting down the Solr  
core closes the HTTP interface (at Jetty level) before any of Solr gets to  
do its shutdown sequence: publishing as down, election, etc. Since we run  
an NRT system, that causes all kinds of backlogs in the indexing pipeline  
whilst Solr queues up indexing requests waiting for a valid leader...  
Hence the need for an API to move leadership off the instance, *before* we  
begin shutdown.  

Any insight would be appreciated, we are happy to contribute this back if  
we can get it working!  


On 16 April 2014 15:49, Shawn Heisey  wrote:  

> On 4/16/2014 8:02 AM, Rich Mayfield wrote:  
> > However there doesn’t appear to be a way to force leadership to/from a  
> > particular replica.  
>  
> I would have expected that doing a core reload on the current leader  
> would force an election and move the leader, but on my 4.2.1 SolrCloud  
> (the only version I have running at the moment) that does not appear to  
> be happening. IMHO we need a way to force a leader change on a shard.  
> An API for "move all leaders currently on this Solr instance" would  
> actually be a very useful feature.  
>  
> I can envision two issues for you to file in Jira. The first would be  
> an Improvement issue, the second would be a Bug:  
>  
> * SolrCloud: Add API to move leader off a Solr instance  
> * SolrCloud: LotsOfCollections takes a long time to stabilize  
>  
> If we can get a dev who specializes in SolrCloud to respond, perhaps  
> they'll have a recommendation about whether these are sensible issues,  
> and if not, what they'd recommend.  
>  
> Thanks,  
> Shawn  
>  
>  


Re: waitForLeaderToSeeDownState when leader is down

2014-04-16 Thread Mark Miller
What version are you testing? Thought we had addressed this.
-- 
Mark Miller
about.me/markrmiller

On April 16, 2014 at 6:02:09 PM, Jessica Mallet (mewmewb...@gmail.com) wrote:

Hi Furkan,  

Thanks for the reply. I understand the intent. However, in the case I  
described, the follower is blocked on looking for a leader (throws the  
pasted exception because it can't find the leader) before it participates  
in election; therefore, it will never come up while the leader waits for it  
to come up (they're deadlocked waiting for each other). What I'm suggesting  
is that maybe the follower should just skip waitForLeaderToSeeDownState  
when there's no leader (instead of failing with the pasted stacktrace) and  
go ahead and start participating in election. That way the leader will see  
more replicas come up, and they can sync with each other and move on.  

Thanks,  
Jessica  


On Sat, Apr 12, 2014 at 4:14 PM, Furkan KAMACI wrote:  

> Hi;  
>  
> There is an explanation as follows: "This is meant to protect the case  
> where you stop a shard or it fails and then the first node to get started  
> back up has stale data - you don't want it to just become the leader. So we  
> wait to see everyone we know about in the shard up to 3 or 5 min by  
> default. Then we know all the shards participate in the leader election and  
> the leader will end up with all updates it should have." You can check it  
> from here:  
>  
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccajt9wng_yykcxggentgcxguhhcjhidear-jygpgrnkaedrz...@mail.gmail.com%3E
>   
>  
> Thanks;  
> Furkan KAMACI  
>  
>  
> 2014-04-08 23:51 GMT+03:00 Jessica Mallet :  
>  
> > To clarify, when I said "leader" and "follower" I meant the old leader  
> and  
> > follower before the zookeeper session expiration. When they're recovering  
> > there's no leader.  
> >  
> >  
> > On Tue, Apr 8, 2014 at 1:49 PM, Jessica Mallet   
> > wrote:  
> >  
> > > I'm playing with dropping the cluster's connections to zookeeper and  
> then  
> > > reconnecting them, and during recovery, I always see this on the  
> leader's  
> > > logs:  
> > >  
> > > ElectionContext.java (line 361) Waiting until we see more replicas up  
> for  
> > > shard shard1: total=2 found=1 timeoutin=139902  
> > >  
> > > and then on the follower, I see:  
> > > SolrException.java (line 121) There was a problem finding the leader in  
> > > zk:org.apache.solr.common.SolrException: Could not get leader props  
> > > at  
> > >  
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)  
> > > at  
> > >  
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)  
> > > at  
> > >  
> >  
> org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
>   
> > > at  
> > >  
> >  
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
>   
> > > at  
> > > org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)  
> > > at  
> > > org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
>   
> > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException:  
> > > KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1  
> > > at  
> > > org.apache.zookeeper.KeeperException.create(KeeperException.java:111)  
> > > at  
> > > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)  
> > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>   
> > > at  
> > >  
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)  
> > > at  
> > >  
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)  
> > > ... 6 more  
> > >  
> > > They block each other's progress until leader decides to give up and  
> not  
> > > wait for more replicas to come up:  
> > >  
> > > ElectionContext.java (line 368) Was waiting for replicas to come up,  
> but  
> > > they are taking too long - assuming they won't come back till later  
> > >  
> > > and then recovery moves forward again.  
> > >  
> > > Should waitForLeaderToSeeDownState move on if there's no leader at the  
> > > moment?  
> > > Thanks,  
> > > Jessica  
> > >  
> >  
>  


Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-17 Thread Mark Miller
Odd - might be helpful if you can share your solrconfig.xml being used.

-- 
Mark Miller
about.me/markrmiller

On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com) wrote:

I'm doing HDFS input and output in my job, with the following:  

hadoop jar /mnt/faas-solr.jar \  
-D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \  
--update-conflict-resolver com.massrel.faassolr.SolrConflictResolver  
\  
--morphline-file /mnt/morphline-ignore.conf \  
--zk-host $ZKHOST \  
--output-dir hdfs://$MASTERIP:9000/output/ \  
--collection $COLLECTION \  
--go-live \  
--verbose \  
hdfs://$MASTERIP:9000/input/  

Index creation works,  

$ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0  
drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data  
drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index  
-rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.fdt  
-rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.fdx  
-rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.fnm  
-rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.si  
-rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc  
-rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos  
-rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim  
-rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip  
-rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd  
-rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm  
-rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/segments_1  
-rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/segments_2  
drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/tlog  
-rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/tlog/tlog.000  

But the go-live step fails, it's trying to use the HDFS path as the remote  
index path?  

14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into  
Solr cluster...  
14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://  
10.98.33.114:9000/output/results/part-0 into  
http://discover8-test-1d.i.massrel.com:8983/solr  
14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command  
java.util.concurrent.ExecutionException:  
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:  
directory '/mnt/solr_8983/home/hdfs:/  
10.98.33.114:9000/output/results/part-0/data/index' does not exist  
at java.util.concurrent.FutureTask.report(FutureTask.java:122)  
at java.util.concurrent.FutureTask.get(FutureTask.java:188)  
at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)  
at  
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)  
at  
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)  
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)  
at  
org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596) 
 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
at  
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
at  
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
at java.lang.reflect.Method.invoke(Method.java:606)  
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)  
Caused by:  
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:  
directory '/mnt/solr_8983/home/hdfs:/  
10.98.33.114:9000/output/results/part-0/data/index' does not exist  
at  
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
  
at  
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
  
at  
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
  
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)  
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)  
at java.util.concurrent.FutureTask

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-23 Thread Mark Miller
Currently, go-live is only supported when you are running Solr on HDFS.

bq. The indexes must exist on the disk of the Solr host

This does not apply when you are running Solr on HDFS. It’s a shared 
filesystem, so local does not matter here.

"no writes should be allowed on either core until the merge is complete. If 
writes are allowed, corruption may occur on the merged index.”
Doesn’t sound right to me at all.
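
If the indexes are first copied out of HDFS onto a Solr host's local disk, the 
merge step itself can be driven through the same CoreAdmin MERGEINDEXES call that 
go-live uses. A rough sketch with SolrJ; the host, core name and index path below 
are made up:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ManualMergeSketch {
  public static void main(String[] args) throws Exception {
    // Admin endpoint of the node hosting the target core (hypothetical names)
    SolrServer admin = new HttpSolrServer("http://solr-host:8983/solr");
    CoreAdminRequest.mergeIndexes(
        "collection1_shard1_replica1",                            // target core
        new String[] { "/data/mr-output/part-00000/data/index" }, // local copy of the MR index
        new String[0],                                            // no source cores
        admin);
    // commit on the target core so the merged segments become visible
    HttpSolrServer core =
        new HttpSolrServer("http://solr-host:8983/solr/collection1_shard1_replica1");
    core.commit();
    core.shutdown();
    admin.shutdown();
  }
}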

-- 
Mark Miller
about.me/markrmiller

On April 22, 2014 at 10:38:08 AM, Brett Hoerner (br...@bretthoerner.com) wrote:

I think I'm just misunderstanding the use of go-live. From mergeindexes  
docs: "The indexes must exist on the disk of the Solr host, which may make  
using this in a distributed environment cumbersome."  

I'm guessing I'll have to write some sort of tool that pulls each completed  
index out of HDFS and onto the respective SolrCloud machines and manually  
do some kind of merge? I don't want to (can't) be running my Hadoop jobs on  
the same nodes that SolrCloud is running on...  

Also confusing to me: "no writes should be allowed on either core until the  
merge is complete. If writes are allowed, corruption may occur on the  
merged index." Is that saying that Solr will block writes, or is that  
saying the end user has to ensure no writes are happening against the  
collection during a merge? That seems... risky?  


On Tue, Apr 22, 2014 at 9:29 AM, Brett Hoerner wrote:  

> Anyone have any thoughts on this?  
>  
> In general, am I expected to be able to go-live from an unrelated cluster  
> of Hadoop machines to a SolrCloud that isn't running off of HDFS?  
>  
> intput: HDFS  
> output: HDFS  
> go-live cluster: SolrCloud cluster on different machines running on plain  
> MMapDirectory  
>  
> I'm back to looking at the code but holy hell is debugging Hadoop hard. :)  
>  
>  
> On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner 
> wrote:  
>  
>> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b  
>>  
>>  
>> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:  
>>  
>>> Odd - might be helpful if you can share your solrconfig.xml being used.  
>>>  
>>> --  
>>> Mark Miller  
>>> about.me/markrmiller  
>>>  
>>> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com)  
>>> wrote:  
>>>  
>>> I'm doing HDFS input and output in my job, with the following:  
>>>  
>>> hadoop jar /mnt/faas-solr.jar \  
>>> -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \  
>>> --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver  
>>> \  
>>> --morphline-file /mnt/morphline-ignore.conf \  
>>> --zk-host $ZKHOST \  
>>> --output-dir hdfs://$MASTERIP:9000/output/ \  
>>> --collection $COLLECTION \  
>>> --go-live \  
>>> --verbose \  
>>> hdfs://$MASTERIP:9000/input/  
>>>  
>>> Index creation works,  
>>>  
>>> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index  
>>> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdt  
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdx  
>>> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fnm  
>>> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.si  
>>> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc  
>>> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos  
>>> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim  
>>> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip  
>>> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd  
>>> -rwxr-xr-x 

Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to 
count elapsed time, you don’t want to use a method that can jump around with 
the results.
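
In code the difference is just which clock you read when timing something. A 
minimal sketch of the elapsed-time pattern (the nanoTime delta stays valid even if 
the wall clock is stepped while the work runs):

import java.util.concurrent.TimeUnit;

public class ElapsedTimeExample {
  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();            // monotonic, safe for elapsed time
    Thread.sleep(250);                         // stand-in for the work being timed
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("elapsed ~" + elapsedMs + " ms");

    // By contrast, a currentTimeMillis() delta can jump or even go negative
    // if the system clock is adjusted (NTP step, manual date change, etc.)
    // while the work is running.
    long wallStart = System.currentTimeMillis();
    Thread.sleep(250);
    long wallElapsedMs = System.currentTimeMillis() - wallStart;
    System.out.println("wall-clock delta ~" + wallElapsedMs + " ms");
  }
}
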
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) wrote:

Hi Rafał Kuć  
I got it; the point is that many operating systems measure time in units of  
tens of milliseconds, and System.currentTimeMillis() is based on the  
operating system clock.  
In my case, I just run DIH from a crontab. Is there any possibility of  
running into that trouble? I really cannot picture what situation could lead  
to the problem.  


Thanks very much.  


2014-04-26 20:49 GMT+08:00 YouPeng Yang :  

> Hi Mark Miller  
> Sorry to get you in these discussion .  
> I notice that Mark Miller report this issure in  
> https://issues.apache.org/jira/browse/SOLR-5734 according to  
> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
> the zookeeper.  
> If I just do DIH with JDBCDataSource ,I do not think it will get the  
> problem.  
> Please give some hints  
>  
> >> Bonus,just post the last mail I send about the problem:  
>  
> I have just compare the difference between the version 4.6.0 and 4.7.1.  
> Notice that the time in the getConnection function is declared with the  
> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
> Curious about the resson for the change.the benefit of it .Is it  
> neccessory?  
> I have read the SOLR-5734 ,  
> https://issues.apache.org/jira/browse/SOLR-5734  
> Do some google about the difference of currentTimeMillis and nano,but  
> still can not figure out it.  
>  
> Thank you very much.  
>  
>  
> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>  
> Hi  
>> I have just compare the difference between the version 4.6.0 and  
>> 4.7.1. Notice that the time in the getConnection function is declared  
>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>> Curious about the resson for the change.the benefit of it .Is it  
>> neccessory?  
>> I have read the SOLR-5734 ,  
>> https://issues.apache.org/jira/browse/SOLR-5734  
>> Do some google about the difference of currentTimeMillis and nano,but  
>> still can not figure out it.  
>>  
>>  
>>  
>>  
>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
>>  
>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
>>>  
>>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH  
>>>> process that we are using takes 4x as long to complete. The only odd  
>>>> thing I notice is when I enable debug logging for the dataimporthandler  
>>>> process, it appears that in the new version each sql query is resulting  
>>>> in  
>>>> a new connection opened through jdbcdatasource (log:  
>>>> http://pastebin.com/JKh4gpmu). Were there any changes that would  
>>>> affect  
>>>> the speed of running a full import?  
>>>>  
>>>  
>>> This is most likely the problem you are experiencing:  
>>>  
>>> https://issues.apache.org/jira/browse/SOLR-5954  
>>>  
>>> The fix will be in the new 4.8 version. The release process for 4.8 is  
>>> underway right now. A second release candidate was required yesterday. If  
>>> no further problems are encountered, the release should be made around the  
>>> middle of next week. If problems are encountered, the release will be  
>>> delayed.  
>>>  
>>> Here's something very important that has been mentioned before: Solr  
>>> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the  
>>> current release from Oracle as I write this) is recommended as a minimum.  
>>>  
>>> If a 4.7.3 version is built, this is a fix that we should backport.  
>>>  
>>> Thanks,  
>>> Shawn  
>>>  
>>>  
>>  
>  


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
My answer remains the same. I guess if you want more precise terminology, 
nanoTime will generally be monotonic and currentTimeMillis will not be, due to 
things like NTP, etc. You want monotonicity for measuring elapsed times.
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) 
wrote:

NTP should slew the clock rather than jump it. I haven't checked recently, but 
that is how it worked in the 90's when I was organizing the NTP hierarchy at 
HP.  

It only does step changes if the clock is really wrong. That is most likely at 
reboot, when other demons aren't running yet.  

wunder  

On Apr 26, 2014, at 7:30 AM, Mark Miller  wrote:  

> System.currentTimeMillis can jump around due to NTP, etc. If you are trying 
> to count elapsed time, you don’t want to use a method that can jump around 
> with the results.  
> --  
> Mark Miller  
> about.me/markrmiller  
>  
> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) 
> wrote:  
>  
> Hi Rafał Kuć  
> I got it,the point is many operating systems measure time in units of  
> tens of milliseconds,and the System.currentTimeMillis() is just base on  
> operating system.  
> In my case,I just do DIH with a crontable, Is there any possiblity to get  
> in that trouble?I am really can not picture what the situation may lead to  
> the problem.  
>  
>  
> Thanks very much.  
>  
>  
> 2014-04-26 20:49 GMT+08:00 YouPeng Yang :  
>  
>> Hi Mark Miller  
>> Sorry to get you in these discussion .  
>> I notice that Mark Miller report this issure in  
>> https://issues.apache.org/jira/browse/SOLR-5734 according to  
>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
>> the zookeeper.  
>> If I just do DIH with JDBCDataSource ,I do not think it will get the  
>> problem.  
>> Please give some hints  
>>  
>>>> Bonus,just post the last mail I send about the problem:  
>>  
>> I have just compare the difference between the version 4.6.0 and 4.7.1.  
>> Notice that the time in the getConnection function is declared with the  
>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>> Curious about the resson for the change.the benefit of it .Is it  
>> neccessory?  
>> I have read the SOLR-5734 ,  
>> https://issues.apache.org/jira/browse/SOLR-5734  
>> Do some google about the difference of currentTimeMillis and nano,but  
>> still can not figure out it.  
>>  
>> Thank you very much.  
>>  
>>  
>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>>  
>> Hi  
>>> I have just compare the difference between the version 4.6.0 and  
>>> 4.7.1. Notice that the time in the getConnection function is declared  
>>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>> Curious about the resson for the change.the benefit of it .Is it  
>>> neccessory?  
>>> I have read the SOLR-5734 ,  
>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>> Do some google about the difference of currentTimeMillis and nano,but  
>>> still can not figure out it.  
>>>  
>>>  
>>>  
>>>  
>>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
>>>  
>>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
>>>>  
>>>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH  
>>>>> process that we are using takes 4x as long to complete. The only odd  
>>>>> thing I notice is when I enable debug logging for the dataimporthandler  
>>>>> process, it appears that in the new version each sql query is resulting  
>>>>> in  
>>>>> a new connection opened through jdbcdatasource (log:  
>>>>> http://pastebin.com/JKh4gpmu). Were there any changes that would  
>>>>> affect  
>>>>> the speed of running a full import?  
>>>>>  
>>>>  
>>>> This is most likely the problem you are experiencing:  
>>>>  
>>>> https://issues.apache.org/jira/browse/SOLR-5954  
>>>>  
>>>> The fix will be in the new 4.8 version. The release process for 4.8 is  
>>>> underway right now. A second release candidate was required yesterday. If  
>>>> no further problems are encountered, the release should be made around the 
>>>>  
>>>> middle of next week. If problems are encountered, the release will be  
>>>> delayed.  
>>>>  
>>>> Here's something very important that has been mentioned before: Solr  
>>>> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the  
>>>> current release from Oracle as I write this) is recommended as a minimum.  
>>>>  
>>>> If a 4.7.3 version is built, this is a fix that we should backport.  
>>>>  
>>>> Thanks,  
>>>> Shawn  
>>>>  
>>>>  
>>>  
>>  

--  
Walter Underwood  
wun...@wunderwood.org  





Re: zkCli zkhost parameter

2014-04-26 Thread Mark Miller
Have you tried a comma-separated list or are you going by documentation? It 
should work. 
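
For what it's worth, the script's -zkhost flag accepts the same comma-separated 
ensemble string as Solr's zkHost. For example (script location and config name 
here are made up):

sh zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd upconfig -confdir ./conf -confname myconf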
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 1:03:25 PM, Scott Stults 
(sstu...@opensourceconnections.com) wrote:

It looks like this only takes a single host as its value, whereas the  
zkHost environment variable for Solr takes a comma-separated list.  
Shouldn't the client also take a comma-separated list?  

k/r,  
Scott  


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
bq. due to things like NTP, etc.

The full sentence is very important. NTP is not the only way for this to happen 
- you also have leap seconds, daylight savings time, internet clock sync, a 
whole host of things that affect currentTimeMillis and not nanoTime. It is 
without question the way to go to even hope for monotonicity.
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 1:11:14 PM, Walter Underwood (wun...@wunderwood.org) wrote:

NTP works very hard to keep the clock positive monotonic. But nanoTime is 
intended for elapsed time measurement anyway, so it is the right choice.  

You can get some pretty fun clock behavior by running on virtual machines, like 
in AWS. And some system real time clocks don't tick during a leap second. And 
Windows system clocks are probably still hopeless.  

If you want to run the clock backwards, we don't need NTP, we can set it with 
"date".  

wunder  

On Apr 26, 2014, at 9:10 AM, Mark Miller  wrote:  

> My answer remains the same. I guess if you want more precise terminology, 
> nanoTime will generally be monotonic and currentTimeMillis will not be, due 
> to things like NTP, etc. You want monotonicity for measuring elapsed times.  
> --  
> Mark Miller  
> about.me/markrmiller  
>  
> On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) 
> wrote:  
>  
> NTP should slew the clock rather than jump it. I haven't checked recently, 
> but that is how it worked in the 90's when I was organizing the NTP hierarchy 
> at HP.  
>  
> It only does step changes if the clocks is really wrong. That is most likely 
> at reboot, when other demons aren't running yet.  
>  
> wunder  
>  
> On Apr 26, 2014, at 7:30 AM, Mark Miller  wrote:  
>  
>> System.currentTimeMillis can jump around due to NTP, etc. If you are trying 
>> to count elapsed time, you don’t want to use a method that can jump around 
>> with the results.  
>> --  
>> Mark Miller  
>> about.me/markrmiller  
>>  
>> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) 
>> wrote:  
>>  
>> Hi Rafał Kuć  
>> I got it,the point is many operating systems measure time in units of  
>> tens of milliseconds,and the System.currentTimeMillis() is just base on  
>> operating system.  
>> In my case,I just do DIH with a crontable, Is there any possiblity to get  
>> in that trouble?I am really can not picture what the situation may lead to  
>> the problem.  
>>  
>>  
>> Thanks very much.  
>>  
>>  
>> 2014-04-26 20:49 GMT+08:00 YouPeng Yang :  
>>  
>>> Hi Mark Miller  
>>> Sorry to get you in these discussion .  
>>> I notice that Mark Miller report this issure in  
>>> https://issues.apache.org/jira/browse/SOLR-5734 according to  
>>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
>>> the zookeeper.  
>>> If I just do DIH with JDBCDataSource ,I do not think it will get the  
>>> problem.  
>>> Please give some hints  
>>>  
>>>>> Bonus,just post the last mail I send about the problem:  
>>>  
>>> I have just compare the difference between the version 4.6.0 and 4.7.1.  
>>> Notice that the time in the getConnection function is declared with the  
>>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>> Curious about the resson for the change.the benefit of it .Is it  
>>> neccessory?  
>>> I have read the SOLR-5734 ,  
>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>> Do some google about the difference of currentTimeMillis and nano,but  
>>> still can not figure out it.  
>>>  
>>> Thank you very much.  
>>>  
>>>  
>>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>>>  
>>> Hi  
>>>> I have just compare the difference between the version 4.6.0 and  
>>>> 4.7.1. Notice that the time in the getConnection function is declared  
>>>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>>> Curious about the resson for the change.the benefit of it .Is it  
>>>> neccessory?  
>>>> I have read the SOLR-5734 ,  
>>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>>> Do some google about the difference of currentTimeMillis and nano,but  
>>>> still can not figure out it.  
>>>>  
>>>>  
>>>>  
>>>>  
>>>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
>>>>  
>>>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
>>>>&

Re: overseer queue clogged

2014-05-01 Thread Mark Miller
What version are you running? This was fixed in a recent release. It can happen 
if you hit add core with the defaults on the admin page in older versions.

-- 
Mark Miller
about.me/markrmiller

On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:

I saw an overseer queue clogged as well due to a bad message in the queue.  
Unfortunately this went unnoticed for a while until there were 130K messages  
in the overseer queue. Since it was a production system we were not able to  
simply stop everything and delete all Zookeeper data, so we manually deleted  
messages by issuing commands directly through the zkCli.sh tool. After all  
the messages had been cleared, some nodes were in the wrong state (e.g.  
'down' when should have been 'active'). Restarting the 'down' or 'recovery  
failed' nodes brought the whole cluster back to a stable and healthy state.  

Since it can take some digging to determine backlog in the overseer queue,  
some of the symptoms we saw were:  
Overseer throwing an exception like "Path must not end with / character"  
Random nodes throwing an exception like "ClusterState says we are the  
leader, but locally we don't think so"  
Bringing up new replicas time out when attempting to fetch shard id  



--  
View this message in context: 
http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
  
Sent from the Solr - User mailing list archive at Nabble.com.  


Re: Replica active during warming

2014-05-13 Thread Mark Miller
If you are sure about this, can you file a JIRA issue?
-- 
Mark Miller
about.me/markrmiller

On May 12, 2014 at 8:50:42 PM, lboutros (boutr...@gmail.com) wrote:

Dear All,  

we just finished the migration of a cluster from Solr 4.3.1 to Solr 4.6.1.  
With solr 4.3.1 a node was not considered as active before the end of the  
warming process.  

Now, with solr 4.6.1 a replica is considered as active during the warming  
process.  
This means that if you restart a replica or create a new one, queries will  
be sent to this replica and the query will hang until the end of the warming  
process (We do not use cold searchers).  

We have quite long warming queries and this is a big issue.  
Is there a parameter I do not know that could control this behavior ?  

thanks,  

Ludovic.  



-  
Jouve  
France.  
--  
View this message in context: 
http://lucene.472066.n3.nabble.com/Replica-active-during-warming-tp4135274.html 
 
Sent from the Solr - User mailing list archive at Nabble.com.  


Re: SolrCloud: AliasAPI-Maximum number of collections

2014-06-06 Thread Mark Miller
The main limit is the 1mb zk node limit. But even that can be raised. 
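
Raising it means setting the jute.maxbuffer system property (for example 
-Djute.maxbuffer=4194304 for a 4MB limit) on the ZooKeeper server JVMs and on 
every client JVM that has to read the node, Solr included; if the client side is 
left at the default, reads of an oversized node will fail.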

- Mark

> On Jun 6, 2014, at 6:21 AM, Shalin Shekhar Mangar  
> wrote:
> 
> No, there's no theoretical limit.
> 
> 
>> On Fri, Jun 6, 2014 at 11:20 AM, ku3ia  wrote:
>> 
>> Hi all!
>> The question is how many collections I can put to one alias, using
>> SolrCloud
>> alias collection API
>> 
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
>> to process distributed requests? Is it limited?
>> 
>> Thanks.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrCloud-AliasAPI-Maximum-number-of-collections-tp4140305.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: SolrCloud multiple data center support

2014-06-23 Thread Mark Miller
We have been waiting for that issue to be finished before thinking too hard 
about how it can improve things. There have been a couple ideas (I’ve mostly 
wanted it for improving the internal zk mode situation), but no JIRAs yet that 
I know of.
-- 
Mark Miller
about.me/markrmiller

On June 23, 2014 at 10:37:27 AM, Arcadius Ahouansou (arcad...@menelic.com) 
wrote:

On 3 February 2014 22:16, Daniel Collins  wrote:  

>  
> One other option is in ZK trunk (but not yet in a release) is the ability  
> to dynamically reconfigure ZK ensembles (  
> https://issues.apache.org/jira/browse/ZOOKEEPER-107). That would give the  
> ability to create new ZK instances in the event of a DC failure, and  
> reconfigure the Solr Cloud without having to reload everything. That would  
> help to some extent.  
>  


ZOOKEEPER-107 has now been implemented.  
I checked the Solr Jira and it seems there is nothing for multi-data-center  
support.  

Do we need to create a ticket or is there already one?  

Thanks.  

Arcadius.  


Re: Question about solrcloud recovery process

2014-07-03 Thread Mark Miller
I don’t know offhand about the num docs issue - are you doing NRT?

As far as being able to query the replica, I’m not sure anyone ever got to 
making that fail if you directly query a node that is not active. It certainly 
came up, but I have no memory of anyone tackling it. Of course in many other 
cases, information is being pulled from zookeeper and recovering nodes are 
ignored. If this is the issue I think it is, it should only be an issue when 
you directly query a recovering node.

The CloudSolrServer client works around this issue as well.
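
Concretely, that just means pointing the client at ZooKeeper rather than at one 
node, so only replicas marked active get queried. A minimal sketch, assuming 
SolrJ 4.x with placeholder zkHost and collection names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQuerySketch {
  public static void main(String[] args) throws Exception {
    CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    solr.setDefaultCollection("collection1");
    // Requests are load balanced across live, active replicas only,
    // so a core that is still recovering is skipped.
    QueryResponse rsp = solr.query(new SolrQuery("*:*"));
    System.out.println("numFound=" + rsp.getResults().getNumFound());
    solr.shutdown();
  }
}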

-- 
Mark Miller
about.me/markrmiller

On July 3, 2014 at 8:42:48 AM, Peter Keegan (peterlkee...@gmail.com) wrote:

I bring up a new Solr node with no index and watch the index being  
replicated from the leader. The index size is 12G and the replication takes  
about 6 minutes, according to the replica log (from 'Starting recovery  
process' to 'Finished recovery process). However, shortly after the  
replication begins, while the index files are being copied, I am able to  
query the index on the replica and see q=*:* find all of the documents.  
But, from the core admin screen, numDocs = 0, and in the cloud screen the  
replica is in 'recovering' mode. How can this be?  

Peter  


Re: Slow inserts when using Solr Cloud

2014-07-08 Thread Mark Miller
Updates are currently done locally before concurrently being sent to all 
replicas - so on a single update, you can expect 2x just from that.

As for your results, it sounds like perhaps there is more overhead than we 
would like in the code that sends to replicas and forwards updates? Someone 
would have to dig in to really know I think. I would doubt it’s a configuration 
issue, but you never know.
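
One way to take the scenario 3 forwarding hop out of the measurement is to let the 
client do the routing: newer 4.x CloudSolrServer releases (4.5 onwards, if I 
remember right) hash the composite id on the client side and send each update 
straight to the leader of the owning shard. A minimal sketch, with made-up zkHost, 
collection and field values:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoutedUpdateSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    solr.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "tenantA!12345");       // composite id prefix picks the shard
    doc.addField("title_t", "example document");
    solr.add(doc);                             // goes directly to that shard's leader
    solr.commit();
    solr.shutdown();
  }
}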

-- 
Mark Miller
about.me/markrmiller

On July 8, 2014 at 9:18:28 AM, Ian Williams (NWIS - Applications Design) 
(ian.willi...@wales.nhs.uk) wrote:

Hi  

I'm encountering a surprisingly high increase in response times when I insert 
new documents into a SolrCloud, compared with a standalone Solr instance.  

I have a SolrCloud set up for test and evaluation purposes. I have four shards, 
each with a leader and a replica, distributed over four Windows virtual 
servers. I have zookeeper running on three of the four servers. There are not 
many documents in my SolrCloud (just a few hundred). I am using composite id 
routing, specifying a prefix to my document ids which is then used by Solr to 
determine which shard the document should be stored on.  

I determine in advance which shard a document with a given id prefix will end 
up in, by trying it out in advance. I then try the following scenarios, using 
inserts without commits. E.g. I use:  
curl http://servername:port/solr/update -H "Content-Type: text/xml" 
--data-binary @test.txt  

1. Insert a document, sending it to the server hosting the correct shard, with 
replicas turned off (response time <20ms)  
I find that if I 'switch off' the replicas for my shard (by shutting down Solr 
for the replicas), and then I send the new document to the server hosting the 
leader for the correct shard, then I get a very fast response, i.e. under 10ms, 
which is similar to the performance I get when not using SolrCloud. This is 
expected, as I've removed any overhead to do with replicas or routing to the 
correct shard.  

2. Insert a document, sending it to the server hosting the correct shard, but 
with replicas turned on (response time approx 250ms)  
If I switch on the replica for that shard, then my average response time for an 
insert increases from <10ms to around 250ms. Now I expect an overhead, because 
the leader has to find out where the replica is (from Zookeeper?) and then 
forward the request to that replica, then wait for a reply - but an increase 
from <20ms to 250ms seems very high?  

3. Insert a document, sending it to a server hosting the incorrect shard, with 
replicas turned on (response time approx 500ms)  
If I do the same thing again but this time send to the server hosting a 
different shard to the shard my document will end up in, the average response 
times increase again to around 500ms. Again, I'd expect an increase because of 
the extra step of needing to forward to the correct shard, but the increase 
seems very high?  


Should I expect this much of an overhead for shard routing and replicas, or 
might this indicate a problem in my configuration?  

Many thanks  
Ian  

---  
The information included in this email and any attachments is confidential. If 
received in error, please notify the sender and delete it immediately. 
Disclosure to any party other than the addressee, whether unintentional or 
otherwise, is not intended to waive confidentiality. The NHS Wales Informatics 
Service (NWIS) may monitor and record all emails for viruses and inappropriate 
use. This e-mail and any subsequent replies or attachments may be subject to 
the Freedom of Information Act. The views expressed in this email are those of 
the sender and not necessarily of NWIS.  
---  


Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Mark Miller
I think that’s pretty much a search-time param, though it might end up being used 
on the update side as well. In any case, I know it doesn’t affect commit or 
optimize.

Also, to my knowledge, SolrCloud optimize support was never explicitly added or 
tested.

--  
Mark Miller
about.me/markrmiller

On July 9, 2014 at 12:00:27 PM, Shawn Heisey (s...@elyograg.org) wrote:
> > I thought a bug had been filed on the distrib=false problem,  



Re: SolrCloud replica dies under high throughput

2014-07-21 Thread Mark Miller
Looks like you probably have to raise the http client connection pool limits to 
handle that kind of load currently.

They are specified as top level config in solr.xml:

maxUpdateConnections
maxUpdateConnectionsPerHost
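
From memory, in the new-style (4.4+) solr.xml these are plain int properties 
directly under the <solr> element; a sketch with arbitrary values (the exact 
placement is worth double-checking against the solr.xml parsing in your release):

<solr>
  <!-- arbitrary example values; verify the element placement for your version -->
  <int name="maxUpdateConnections">100000</int>
  <int name="maxUpdateConnectionsPerHost">100</int>
  <!-- existing solrcloud / shardHandlerFactory sections stay as they were -->
</solr>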

--  
Mark Miller
about.me/markrmiller

On July 21, 2014 at 7:14:59 PM, Darren Lee (d...@amplience.com) wrote:
> Hi,
>  
> I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out 
> exactly how  
> much throughput my cluster can handle.
>  
> Consistently in my test I see a replica go into recovering state forever 
> caused by what  
> looks like a timeout during replication. I can understand the timeout and 
> failure (I  
> am hitting it fairly hard) but what seems odd to me is that when I stop the 
> heavy load it still  
> does not recover the next time it tries, it seems broken forever until I 
> manually go in,  
> clear the index and let it do a full resync.
>  
> Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2 
> shards, 2 replicas)  
> (AWS m3.2xlarge). I am indexing with ~800 concurrent connections and a 10 sec 
> soft commit.  
> I consistently get this problem with a throughput of around 1.5 million 
> documents per  
> hour.
>  
> Thanks all,
> Darren
>  
>  
> Stack Traces & Messages:
>  
> [qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter – 
> null:org.apache.http.conn.ConnectionPoolTimeoutException:  
> Timeout waiting for connection from pool
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
>   
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
>   
> at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)
>   
> at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   
> at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
>   
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   
> at java.lang.Thread.run(Thread.java:724)
>  
> Error while trying to recover. 
> core=assets_shard2_replica1:java.util.concurrent.ExecutionException:  
> org.apache.solr.client.solrj.SolrServerException: IOException occured when  
> talking to server at: http://xxx.xxx.15.171:8080/solr
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at 
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
>   
> at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)  
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)  
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException 
> occured  
> when talking to server at: http://xxx.xxx.15.171:8080/solr
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
>   
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
>   
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
>   
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.net.SocketException: Socket closed
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at 
> org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>   
> at 
> org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>   
> at 
> org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>   
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:

Re: Are there any performance impact of using a non-standard length UUID as the unique key of Solr?

2014-07-24 Thread Mark Miller
Some good info on unique id’s for Lucene / Solr can be found here: 
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
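
The short version of that post is that ids sharing a common prefix and increasing 
sequentially are much cheaper for Lucene's terms dictionary than fully random ones, 
so the shared-hash-plus-counter scheme described above points in a reasonable 
direction. A rough sketch of that scheme, assuming commons-codec's DigestUtils for 
the 32-character hash (all names here are illustrative):

import java.util.UUID;
import org.apache.commons.codec.digest.DigestUtils;

public class BatchIdExample {
  public static void main(String[] args) {
    // One 32-character hex hash per batch, shared by all docs in the batch.
    String batchPrefix = DigestUtils.md5Hex(UUID.randomUUID().toString());
    for (int i = 0; i < 100; i++) {
      // Zero-padded counter keeps the batch's ids sequential and fixed-length.
      String id = batchPrefix + String.format("%06d", i);
      System.out.println(id);   // would go into the SolrInputDocument's id field
    }
  }
}
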
-- 
Mark Miller
about.me/markrmiller

On July 24, 2014 at 9:51:28 PM, He haobo (haob...@gmail.com) wrote:

Hi,  

In our Solr collection (Solr 4.8), we have the following unique key  
definition.  

<uniqueKey>id</uniqueKey>


In our external Java program, we will generate a UUID with  
UUID.randomUUID().toString() first. Then, we will use a cryptographic hash to  
generate a 32-byte text and finally use it as the id.  

For now, we might need to post more than 20k Solr docs per second, and  
UUID.randomUUID() or the cryptographic hash step might take too much time. We  
might have a simple workaround: share one hash value across many Solr docs by  
appending a sequence to it, such as 9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY00,  
9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY01,  
9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY02, etc.  


What we want to know is: if we use a 38-byte id, is there any performance  
impact on Solr inserts or queries? Or, if we use Solr's default automatically  
generated id implementation, would it be more efficient?  



Thanks,  
Eternal  
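
A minimal sketch of the shared-hash-plus-sequence idea described in the question
above (an illustration only, not from the original thread; the MD5 digest, the
hex encoding, and the zero-padded counter width are all assumptions):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.UUID;

    public class SharedPrefixIdGenerator {
        private final String prefix;   // one hash computed per batch of documents
        private long sequence = 0;

        public SharedPrefixIdGenerator() throws Exception {
            // Hash a random UUID once and reuse the result as a shared prefix.
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02X", b & 0xFF));
            }
            prefix = sb.toString();        // 32 hex characters, as in the question
        }

        // Returns the shared prefix plus a zero-padded counter, e.g. ...BDFY000001.
        public synchronized String nextId() {
            return prefix + String.format("%06d", sequence++);
        }
    }

Whether a 38-byte id is slower than the default generated id mostly comes down to
the terms-dictionary behavior discussed in the blog post linked above; sequential
suffixes on a shared prefix are generally friendlier to lookups than fully random
UUIDs.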


Re: Disabling transaction logs

2014-08-13 Thread Mark Miller
That is good testing :) We should track down what is going on with that 30%. It 
might be worth opening a JIRA issue with some logs.

It can help if you restart the overseer node last.

There are likely some improvements around this post 4.6.

-- 
Mark Miller
about.me/markrmiller

On August 13, 2014 at 12:05:27 PM, KNitin (nitin.t...@gmail.com) wrote:
> Thank u all! Yes I want to disable it for testing purposes
> 
> The main issue is that rolling restart of solrcloud for 1000 collections is
> extremely unreliable and slow. More than 30% of the collections fail to
> recover.
> 
> What are some good guidelines to follow while restarting a massive cluster
> like this ?
> 
> Are there any new improvements (post 4.6) in solr that helps restarts to be
> more robust ?
> 
> Thanks
> 
> On Sunday, August 10, 2014, rulinma wrote:
> 
> > good.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Disabling-transaction-logs-tp4151721p415.html
> >  
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> 



Re: Timeout Errors while using Collections API

2013-10-17 Thread Mark Miller
There was a reload bug in SolrCloud that was fixed in 4.4 - 
https://issues.apache.org/jira/browse/SOLR-4805

Mark

On Oct 17, 2013, at 7:18 AM, Grzegorz Sobczyk  wrote:

> Sorry for previous spam (something eat my message)
> 
> I have the same problem but with reload action
> ENV:
> - 3x Solr 4.2.1 with 4 cores each
> - ZK
> 
> Before error I have:
> - 14, 2013 5:25:36 AM CollectionsHandler handleReloadAction INFO: Reloading
> Collection : name=products&action=RELOAD
> - hundreds of (with the same timestamp) 14, 2013 5:25:36 AM
> DistributedQueue$LatchChildWatcher process INFO: Watcher fired on path:
> /overseer/collection-queue-work state: SyncConnected type
> NodeChildrenChanged
> - 13 times (from 2013 5:25:39 to 5:25:45):
> -- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin]
> webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0
> QTime=2
> -- 14, 2013 5:25:39 AM SolrDispatchFilter handleAdminRequest INFO: [admin]
> webapp=null path=/admin/cores params={action=STATUS&wt=ruby} status=0
> QTime=1
> -- 14, 2013 5:25:39 AM SolrCore execute INFO: [forum] webapp=/solr
> path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2
> -- 14, 2013 5:25:39 AM SolrCore execute INFO: [knowledge] webapp=/solr
> path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2
> -- 14, 2013 5:25:39 AM SolrCore execute INFO: [products] webapp=/solr
> path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=2
> -- 14, 2013 5:25:39 AM SolrCore execute INFO: [shops] webapp=/solr
> path=/admin/mbeans params={stats=true&wt=ruby} status=0 QTime=1
> - 14, 2013 5:26:21 AM SolrCore execute INFO: [products] webapp=/solr
> path=/select/ params={q=solrpingquery} hits=0 status=0 QTime=0
> - 14, 2013 5:26:36 AM DistributedQueue$LatchChildWatcher process INFO:
> Watcher fired on path: /overseer/collection-queue-work/qnr-000806
> state: SyncConnected type NodeDeleted
> - 14, 2013 5:26:36 AM SolrException log SEVERE:
> org.apache.solr.common.SolrException: reloadcollection the collection time
> out:60s
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:162)
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleReloadAction(CollectionsHandler.java:184)
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:120)
> 
> What are possilibities of such behaviour? When this error is thrown?
> Does anybody has the same issue?
> 
> 
> On 17 October 2013 13:08, Grzegorz Sobczyk  wrote:
> 
>> 
>> 
>> On 16 October 2013 11:48, RadhaJayalakshmi <
>> rlakshminaraya...@inautix.co.in> wrote:
>> 
>>> Hi,
>>> My setup is
>>> Zookeeper ensemble - running with 3 nodes
>>> Tomcats - 9 Tomcat instances are brought up, by registereing with
>>> zookeeper.
>>> 
>>> Steps :
>>> 1) I uploaded the solr configuration like db_data_config, solrconfig,
>>> schema
>>> xmls into zookeeoper
>>> 2)  Now, i am trying to create a collection with the collection API like
>>> below:
>>> 
>>> 
>>> http://miadevuser001.albridge.com:7021/solr/admin/collections?action=CREATE&name=Schwab_InvACC_Coll&numShards=1&replicationFactor=2&createNodeSet=localhost:7034_solr,localhost:7036_solr&collection.configName=InvestorAccountDomainConfig
>>> 
>>> Now, when i execute this command, i am getting the following error:
>>> 500>> name="QTime">60015>> name="msg">createcollection the collection time out:60s>> name="trace">org.apache.solr.common.SolrException: createcollection the
>>> collection time out:60s
>>>at
>>> 
>>> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:175)
>>>at
>>> 
>>> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:156)
>>>at
>>> 
>>> org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:290)
>>>at
>>> 
>>> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:112)
>>>at
>>> 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>at
>>> 
>>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
>>>at
>>> 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
>>>at
>>> 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
>>>at
>>> 
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>>at
>>> 
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>>at
>>> 
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>>>at
>>> 
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>>>at
>>> 
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>>>at
>>> 
>>>

Re: Solr 4.6.0 latest build

2013-10-22 Thread Mark Miller
I would try the 4.6 builds and report back your results.

I don't know that Chris is seeing the same thing that has come up in the
past.

In my testing, I'm not having issues with the latest 4.6. The more people
that try it out, the more we will know.

- Mark


On Tue, Oct 22, 2013 at 6:31 AM, michael.boom  wrote:

> Thanks Chris & Rafal!
>
> So the problem actually persists in 4.6.
> I'll then watch this issue and cheer for Mark's fix.
>
>
>
> -
> Thanks,
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-6-0-latest-build-tp4096960p4096992.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
- Mark


Re: Minor bug with CloudSolrServer and collection-alias.

2013-10-23 Thread Mark Miller
I filed https://issues.apache.org/jira/browse/SOLR-5380 and just committed a 
fix.

- Mark

On Oct 23, 2013, at 11:15 AM, Shawn Heisey  wrote:

> On 10/23/2013 3:59 AM, Thomas Egense wrote:
>> Using cloudSolrServer.setDefaultCollection(collectionId) does not work as
>> intended for an alias spanning more than 1 collection.
>> The virtual collection-alias collectionID is recoqnized as a existing
>> collection, but it does only query one of the collections it is mapped to.
>> 
>> You can confirm this easily in AliasIntegrationTest.
>> 
>> The test-class AliasIntegrationTest creates to cores with 2 and 3 different
>> documents. And then creates an alias pointing to both of them.
>> 
>> Line 153:
>>// search with new cloud client
>>CloudSolrServer cloudSolrServer = new
>> CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
>>cloudSolrServer.setParallelUpdates(random().nextBoolean());
>>query = new SolrQuery("*:*");
>>query.set("collection", "testalias");
>>res = cloudSolrServer.query(query);
>>cloudSolrServer.shutdown();
>>assertEquals(5, res.getResults().getNumFound());
>> 
>> No unit-test bug here, however if you change it from setting the
>> collectionid on the query but on CloudSolrServer instead,it will produce
>> the bug:
>> 
>>// search with new cloud client
>>CloudSolrServer cloudSolrServer = new
>>CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
>>cloudSolrServer.setDefaultCollection("testalias");
>>cloudSolrServer.setParallelUpdates(random().nextBoolean());
>>query = new SolrQuery("*:*");
>>//query.set("collection", "testalias");
>>res = cloudSolrServer.query(query);
>>cloudSolrServer.shutdown();
>>assertEquals(5, res.getResults().getNumFound());  <-- Assertion failure
>> 
>> Should I create a Jira issue for this?
> 
> Thomas,
> 
> I have confirmed this with the following test patch, which adds to the
> test rather than changing what's already there:
> 
> http://apaste.info/9ke5
> 
> I'm about to head off to the train station to start my commute, so I
> will be unavailable for a little while.  If you haven't gotten the jira
> filed by the time I get to another computer, I will create it.
> 
> Thanks,
> Shawn
> 



[ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Mark Miller
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

October 2013, Apache Solr™ 4.5.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic clustering,
database integration, rich document (e.g., Word, PDF) handling, and
geospatial search. Solr is highly scalable, providing fault tolerant
distributed search and indexing, and powers the search and navigation
features of many of the world's largest internet sites.

Solr 4.5.1 includes 16 bug fixes as well as Lucene 4.5.1 and its bug
fixes. The release is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html


See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using
may not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.

Happy searching,

Lucene/Solr developers
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSaUdSAAoJED+/0YJ4eWrI90UP/RGSmLBdvrc/5NZEb7LSCSjW
z4D3wJ2i4a0rLpiW2qA547y/NZ5KZcmrDSzJu0itf8Q/0q+tm7/d30uPg/cdRlgl
wGERcxsyfPfTqBjzdSNNGgNm++tnkkqRJbYEfsG5ApWrKicitU7cPb82m8oCdlnn
4wnhYt6tfu/EPCglt9ixF7Ukv5o7txMnwWGmkGTbUt8ugp9oOMN/FfGHex/FVxcF
xHhWBLymIJy24APEEF/Mq3UW12hQT+aRof66xBch0fEPVlbDitBa9wNuRNQ98M90
ZpTl8o0ITMUKjTKNkxZJCO5LQeNwhYaOcM5nIykGadWrXBZo5Ob611ZKeYPZBWCW
Ei88dwJQkXaDcVNLZ/HVcAePjmcALHd3nc4uNfcJB8zvgZOPagMpXW2rRSXFACHM
FdaRezTdH8Uh5zp2n3hsqYCbpDreRoXGXaiOgVZ+8EekVMGYUnMFKdqNlqhVnF6r
tzp+aaCBhGDUD5xUw2w2fb5c9Jh1oIQ9f7fsVH78kgsHShySnte3NbfoFWUClPMX
PwrfWuZpmu9In2ZiJVYSOD6MBqmJ+z3N1bnf1kqsitv7MonkvQkOoDIafW835vG9
3aajknE1vazOATSGHIxCtJfqzTEqeqFqVbjG/qS72XIhMey8tVAwjrjcgFnayk9Z
xrG1W1o2sjrYkioJ7nZK
=8++G
-END PGP SIGNATURE-


Re: difference between apache tomcat vs Jetty

2013-10-25 Thread Mark Miller
Just to add to the “use Jetty for Solr” argument - Solr 5.0 will no longer 
consider itself a webapp and will treat the fact that it uses Jetty as an 
implementation detail.

We won’t necessarily make it impossible to use a different container, but the 
project won’t condone it or support it and may do some things that assume 
Jetty. Solr is taking over this layer in 5.0.

- Mark

On Oct 25, 2013, at 11:18 AM, Cassandra Targett  wrote:

> In terms of adding or fixing documentation, the "Installing Solr" page
> (https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
> includes a yellow box that says:
> 
> "Solr ships with a working Jetty server, with optimized settings for
> Solr, inside the example directory. It is recommended that you use the
> provided Jetty server for optimal performance. If you absolutely must
> use a different servlet container then continue to the next section on
> how to install Solr."
> 
> So, it's stated, but maybe not in a way that makes it clear to most
> users. And maybe it needs to be repeated in another section.
> Suggestions?
> 
> I did find this page,
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
> which pretty much contradicts the previous text. I'll fix that now.
> 
> Other recommendations for where doc could be more clear are welcome.
> 
> On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt  
> wrote:
>> Hmm, thats an interesting move. I'm on the fence on that one but it surely
>> simplifies some things. Good info, thanks!
>> 
>> Tim
>> 
>> 
>> On 24 October 2013 16:46, Anshum Gupta  wrote:
>> 
>>> Thought you may want to have a look at this:
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-4792
>>> 
>>> P.S: There are no timelines for 5.0 for now, but it's the future
>>> nevertheless.
>>> 
>>> 
>>> 
>>> On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt >>> wrote:
>>> 
 I agree with Jonathan (and Shawn on the Jetty explanation), I think the
 docs should make this a bit more clear - I notice many people choosing
 Tomcat and then learning these details after, possibly regretting it.
 
 I'd be glad to modify the docs but I want to be careful how it is worded.
 Is it fair to go as far as saying Jetty is 100% THE "recommended"
>>> container
 for Solr, or should a recommendation be avoided, and maybe just a list of
 pros/cons?
 
 Cheers,
 
 Tim
 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Anshum Gupta
>>> http://www.anshumgupta.net
>>> 



Re: Global User defined properties - solr.xml from Solr 4.4 to Solr 4.5

2013-10-25 Thread Mark Miller
Can you file a JIRA issue?

- Mark

On Oct 25, 2013, at 12:52 PM, marotosg  wrote:

> Right, but what if you have many properties being shared across multiple
> cores.
> That means you have to copy  same properties in each individual
> core.properties.
> 
> Is not this redundant data.
> 
> My main problem is I would like to keep several properties at solr level not
> to core level.
> 
> Thanka a lot
> Sergio
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Global-User-defined-properties-solr-xml-from-Solr-4-4-to-Solr-4-5-tp4097740p4097789.html
> Sent from the Solr - User mailing list archive at Nabble.com.
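
Until something like that exists, one common workaround (a sketch; the property
name and value below are placeholders) is to define shared values once as JVM
system properties when starting each node, and reference them from any core's
config with the usual ${...} substitution:

    # Start the example Jetty with a property that is visible to every core
    java -Dshared.data.root=/var/solr/data -jar start.jar

In solrconfig.xml or schema.xml the value can then be referenced as
${shared.data.root}, optionally with a default such as ${shared.data.root:/tmp/solr},
without repeating it in every core.properties file.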



Re: Lucene/Solr 4.5.1 svn tag

2013-10-25 Thread Mark Miller
I’ll look into it. I ran the command to create the tag, but perhaps it did not 
‘take’ :)

- Mark

On Oct 25, 2013, at 3:56 PM, André Widhani  wrote:

> Hi,
> 
> shouldn't there be a tag for the 4.5.1 release under 
> http://svn.apache.org/repos/asf/lucene/dev/tags/ ?
> 
> Or am I looking at the wrong place?
> 
> Regards,
> André
> 



Re: Lucene/Solr 4.5.1 svn tag

2013-10-25 Thread Mark Miller
I had created it in a ‘retired’ location. The tag should be in the correct 
spot now.

Thanks!

- Mark

On Oct 25, 2013, at 4:04 PM, Mark Miller  wrote:

> I’ll look into it. I ran the command to create the tag, but perhaps it did 
> not ‘take’ :)
> 
> - Mark
> 
> On Oct 25, 2013, at 3:56 PM, André Widhani  wrote:
> 
>> Hi,
>> 
>> shouldn't there be a tag for the 4.5.1 release under 
>> http://svn.apache.org/repos/asf/lucene/dev/tags/ ?
>> 
>> Or am I looking at the wrong place?
>> 
>> Regards,
>> André
>> 
> 



Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

2013-10-25 Thread Mark Miller

On Oct 24, 2013, at 6:37 AM, michael.boom  wrote:

> Any idea what is happening and why the core on which i wanted the
> optimization to happen, got no optimization and instead another shard got
> optimized, on both servers?

Sounds like a bug we should fix. If you don’t specify distrib=false, it should 
optimize your whole collection. I’ve never really looked into this though - I’m 
sure we need some tests.

- Mark
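
Based on the behaviour described above, the two cases would look roughly like
this (a sketch; host, port, and core names are placeholders):

    # Optimize only this one core; distrib=false keeps the request local
    curl 'http://host1:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false'

    # Without distrib=false, the optimize should be distributed to the whole collection
    curl 'http://host1:8983/solr/collection1_shard1_replica1/update?optimize=true'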

Re: difference between apache tomcat vs Jetty

2013-10-25 Thread Mark Miller
Things have actually improved quite a bit in that area. Many optimizations, and 
additional ways to create large data structures off-heap, have been added in 
recent releases.

Someday G1 might even help a bit.

- Mark

On Oct 25, 2013, at 7:20 PM, Tim Vaillancourt  wrote:

> I (jokingly) propose we take it a step further and drop Java :)! I'm getting 
> tired of trying to scale GC'ing JVMs!
> 
> Tim
> 
> On 25/10/13 09:02 AM, Mark Miller wrote:
>> Just to add to the “use Jetty for Solr” argument - Solr 5.0 will no longer 
>> consider itself a webapp and will treat the fact that it uses Jetty as an 
>> implementation detail.
>> 
>> We won’t necessarily make it impossible to use a different container, but 
>> the project won’t condone it or support it and may do some things that 
>> assume Jetty. Solr is taking over this layer in 5.0.
>> 
>> - Mark
>> 
>> On Oct 25, 2013, at 11:18 AM, Cassandra Targett  
>> wrote:
>> 
>>> In terms of adding or fixing documentation, the "Installing Solr" page
>>> (https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
>>> includes a yellow box that says:
>>> 
>>> "Solr ships with a working Jetty server, with optimized settings for
>>> Solr, inside the example directory. It is recommended that you use the
>>> provided Jetty server for optimal performance. If you absolutely must
>>> use a different servlet container then continue to the next section on
>>> how to install Solr."
>>> 
>>> So, it's stated, but maybe not in a way that makes it clear to most
>>> users. And maybe it needs to be repeated in another section.
>>> Suggestions?
>>> 
>>> I did find this page,
>>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
>>> which pretty much contradicts the previous text. I'll fix that now.
>>> 
>>> Other recommendations for where doc could be more clear are welcome.
>>> 
>>> On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt  
>>> wrote:
>>>> Hmm, thats an interesting move. I'm on the fence on that one but it surely
>>>> simplifies some things. Good info, thanks!
>>>> 
>>>> Tim
>>>> 
>>>> 
>>>> On 24 October 2013 16:46, Anshum Gupta  wrote:
>>>> 
>>>>> Thought you may want to have a look at this:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/SOLR-4792
>>>>> 
>>>>> P.S: There are no timelines for 5.0 for now, but it's the future
>>>>> nevertheless.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt>>>>> wrote:
>>>>>> I agree with Jonathan (and Shawn on the Jetty explanation), I think the
>>>>>> docs should make this a bit more clear - I notice many people choosing
>>>>>> Tomcat and then learning these details after, possibly regretting it.
>>>>>> 
>>>>>> I'd be glad to modify the docs but I want to be careful how it is worded.
>>>>>> Is it fair to go as far as saying Jetty is 100% THE "recommended"
>>>>> container
>>>>>> for Solr, or should a recommendation be avoided, and maybe just a list of
>>>>>> pros/cons?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Tim
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Anshum Gupta
>>>>> http://www.anshumgupta.net
>>>>> 



Re: Solr 4.5.1 replication Bug? "Illegal to have multiple roots (start tag in epilog?)."

2013-10-29 Thread Mark Miller
Has someone filed a JIRA issue with the current known info yet?

- Mark

> On Oct 29, 2013, at 12:36 AM, Sai Gadde  wrote:
> 
> Hi Michael,
> 
> I downgraded to Solr 4.4.0 and this issue is gone. No additional settings
> or tweaks are done.
> 
> This is not a fix or solution I guess but, in our case we wanted something
> working and we were running out of time.
> 
> I will watch this thread if there are any suggestions but, possibly we will
> stay with 4.4.0 for sometime.
> 
> Regards
> Sai
> 
> 
>> On Tue, Oct 29, 2013 at 4:36 AM, Michael Tracey  wrote:
>> 
>> Hey, this is Michael, who was having the exact error on the Jetty side
>> with an update.  I've upgraded jetty from the 4.5.1 embedded version (in
>> the example directory) to version 9.0.6, which means I had to upgrade my
>> OpenJDK from 1.6 to 1.7.0_45.  Also, I added the suggested (very large)
>> settings to my solrconfig.xml:
>> <requestParsers formdataUploadLimitInKB="2048000" multipartUploadLimitInKB="2048000" />
>> 
>> but I am still getting the errors when I put a second server in the cloud.
>> Single servers (external zookeeper, but no cloud partner) works just fine.
>> 
>> I suppose my next step is to try Tomcat, but according to your post, it
>> will not help!
>> 
>> Any help is appreciated,
>> 
>> M.
>> 
>> - Original Message -
>> From: "Sai Gadde" 
>> To: solr-user@lucene.apache.org
>> Sent: Monday, October 28, 2013 7:10:41 AM
>> Subject: Solr 4.5.1 replication Bug? "Illegal to have multiple roots
>> (start tag in epilog?)."
>> 
>> we have a similar error as this thread.
>> 
>> http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html
>> 
>> Tried tomcat setting from this post. We used exact setting sepecified
>> here. we merge 500 documents at a time. I am creating a new thread
>> because Michael is using Jetty where as we use Tomcat.
>> 
>> 
>> formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to very
>> high value 2GB. As suggested in the following thread.
>> https://issues.apache.org/jira/browse/SOLR-5331
>> 
>> 
>> We use out of the box Solr 4.5.1 no customization done. If we merge
>> documents via SolrJ to a single server it is perfectly working fine.
>> 
>> 
>> But as soon as we add another node to the cloud we are getting
>> following while merging documents.
>> 
>> 
>> 
>> This is the error we are getting on the server (10.10.10.116 - IP is
>> irrelavent just for clarity)where merging is happening. 10.10.10.119
>> is the new node here. This server gets RemoteSolrException
>> 
>> 
>> shard update error StdNode:
>> 
>> http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException
>> :
>> Illegal to have multiple roots (start tag in epilog?).
>> at [row,col {unknown-source}]: [1,12468]
>>at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
>>at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
>>at
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
>>at
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
>>at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>>at java.util.concurrent.FutureTask.run(Unknown Source)
>>at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
>> Source)
>>at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>>at java.util.concurrent.FutureTask.run(Unknown Source)
>>at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source)
>>at java.lang.Thread.run(Unknown Source)
>> 
>> 
>> 
>> 
>> 
>> On the other server 10.10.10.119 we get following error
>> 
>> 
>> org.apache.solr.common.SolrException: Illegal to have multiple roots
>> (start tag in epilog?).
>> at [row,col {unknown-source}]: [1,12468]
>>at
>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
>>at
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>>at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>at
>> org.apache.catalina.core

Re: SolrCloud never fully recovers after slow disks

2013-11-10 Thread Mark Miller
Which version of Solr are you using? Regardless of your environment, this is a 
fail-safe that you should not hit. 

- Mark

> On Nov 5, 2013, at 8:33 AM, Henrik Ossipoff Hansen 
>  wrote:
> 
> I previously made a post on this, but have since narrowed down the issue and 
> am now giving this another try, with another spin to it.
> 
> We are running a 4 node setup (over Tomcat7) with a 3-ensemble external 
> ZooKeeper. This is running on a total of 7 (4+3) different VMs, and each VM 
> is using our Storage system (NFS share in VMWare).
> 
> Now I do realize and have heard, that NFS is not the greatest way to run Solr 
> on, but we have never had this issue on non-SolrCloud setups.
> 
> Basically, each night when we run our backup jobs, our storage becomes a bit 
> slow in response - this is obviously something we’re trying to solve, but 
> bottom line is, that all our other systems somehow stays alive or recovers 
> gracefully when bandwidth exists again.
> SolrCloud - not so much. Typically after a session like this, 3-5 nodes will 
> either go into a Down state or a Recovering state - and stay that way. 
> Sometimes such node will even be marked as leader. A such node will have 
> something like this in the log:
> 
> ERROR - 2013-11-05 08:57:45.764; 
> org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState 
> says we are the leader, but locally we don't think so
> ERROR - 2013-11-05 08:57:45.768; org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: ClusterState says we are the leader 
> (http://solr04.cd-et.com:8080/solr/products_fi_shard1_replica2), but locally 
> we don't think so. Request came from 
> http://solr01.cd-et.com:8080/solr/products_fi_shard2_replica1/
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
>at 
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
>at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
>at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>at 
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
>at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>at 
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
>at 
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
>at 
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:724)
> 
> On the other nodes, an error similar to this will be in the log:
> 
> 09:27:34 - ERROR - SolrCmdDistributor shard update error RetryNode: 
> http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>  Server at http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2 
> returned non ok status:503,​ message:Service Unavailable
> 09:27:34 -ERROR - SolrCmdDistributor forwarding update to 
> http://solr04.cd-et.com:8080/solr/products_dk_shard1_replica2/ failed - 
> retrying ...
> 
> Does anyone have any ideas or leads towards a so

Re: SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-11-10 Thread Mark Miller
Can you isolate any exceptions that happened just before that exception 
started repeating?

- Mark

> On Nov 7, 2013, at 9:09 AM, Eric Bus  wrote:
> 
> Hi,
> 
> I'm having a problem with one of my shards. Since yesterday, SOLR keeps 
> repeating the same exception over and over for this shard.
> The webinterface for this SOLR instance is also not working (it hangs on the 
> Loading indicator).
> 
> Nov 7, 2013 9:08:12 AM org.apache.solr.update.processor.LogUpdateProcessor 
> finish
> INFO: [website1_shard1_replica3] webapp=/solr path=/update 
> params={update.distrib=TOLEADER&wt=javabin&version=2} {} 0 0
> Nov 7, 2013 9:08:12 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: SolrCoreState already closed
>at 
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:79)
>at 
> org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:276)
>at 
> org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:77)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:460)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1036)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:721)
>at 
> org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
>at 
> org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
>at 
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
>at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
>at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
>at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>at java.lang.Thread.run(Thread.java:662)
> 
> I have about 3GB of logfiles for this single message. Reloading the 
> collection does not work. Reloading the specific shard core returns the same 
> exception. The only option seems to be to restart the server. But because 
> it's the leader for a lot of collections, I want to know why this is 
> happening. I've seen this problem before, and I haven't figured out what is 
> causing it.
> 
> I've reported a different problem a few days ago with 'hanging' deleted 
> logfiles. Could this be related? Could the hanging logfiles prevent a new 
> Searcher from opening? I've updated two of my three hosts to 4.5.1 but after 
> only 2 days uptime, I'm still seeing about 11.000 deleted logfiles in the 
> lsof output.
> 
> Best regards,
> Eric Bus
> 
> 


Re: SolrCloud never fully recovers after slow disks

2013-11-11 Thread Mark Miller
> ERROR Overseer Could not create Overseer node
> 03:06:47 WARN LeaderElector
> 03:06:47 WARN ZkStateReader ZooKeeper watch triggered,​ but Solr cannot talk 
> to ZK
> 03:07:41 WARN RecoveryStrategy Stopping recovery for 
> zkNodeName=solr04.cd-et.com:8080_solr_auto_suggest_shard1_replica2core=auto_suggest_shard1_replica2
> 
> After this, the cluster state seems to be fine, and I'm not being spammed 
> with errors in the log files.
> 
> Bottom line is that the issues are fixed for now it seems, but I still find 
> it weird that Solr was not able to fully receover.
> 
> // Henrik Ossipoff
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: 10. november 2013 19:27
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud never fully recovers after slow disks
> 
> Which version of solr are you using? Regardless of your env, this is a fail 
> safe that you should not hit.
> 
> - Mark
> 
>> On Nov 5, 2013, at 8:33 AM, Henrik Ossipoff Hansen 
>>  wrote:
>> 
>> I previously made a post on this, but have since narrowed down the issue and 
>> am now giving this another try, with another spin to it.
>> 
>> We are running a 4 node setup (over Tomcat7) with a 3-ensemble external 
>> ZooKeeper. This is running on a total of 7 (4+3) different VMs, and each VM 
>> is using our Storage system (NFS share in VMWare).
>> 
>> Now I do realize and have heard, that NFS is not the greatest way to run 
>> Solr on, but we have never had this issue on non-SolrCloud setups.
>> 
>> Basically, each night when we run our backup jobs, our storage becomes a bit 
>> slow in response - this is obviously something we’re trying to solve, but 
>> bottom line is, that all our other systems somehow stays alive or recovers 
>> gracefully when bandwidth exists again.
>> SolrCloud - not so much. Typically after a session like this, 3-5 nodes will 
>> either go into a Down state or a Recovering state - and stay that way. 
>> Sometimes such node will even be marked as leader. A such node will have 
>> something like this in the log:
>> 
>> ERROR - 2013-11-05 08:57:45.764;
>> org.apache.solr.update.processor.DistributedUpdateProcessor; ClusterState 
>> says we are the leader, but locally we don't think so ERROR - 2013-11-05 
>> 08:57:45.768; org.apache.solr.common.SolrException; 
>> org.apache.solr.common.SolrException: ClusterState says we are the leader 
>> (http://solr04.cd-et.com:8080/solr/products_fi_shard1_replica2), but locally 
>> we don't think so. Request came from 
>> http://solr01.cd-et.com:8080/solr/products_fi_shard2_replica1/
>> at 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:381)
>> at 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:243)
>> at 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
>> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>> at 
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> at 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>> at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>> at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>> at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
>> at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
>> at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>> at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
>> at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>> at 
>> org.apache.catalina

Re: Why do people want to deploy to Tomcat?

2013-11-13 Thread Mark Miller
RE: the example folder

It’s something I’ve been pushing to move away from for a long time - see 
https://issues.apache.org/jira/browse/SOLR-3619 "Rename 'example' dir to 
'server' and pull examples into an 'examples' directory".

It’s part of a push I’ve been on to own the container level (people are now on board 
with that for 5.0), add start scripts, and add other niceties that we should have 
but don’t yet.

Even our config files should move away from being an “example” and become more 
like a default starting template. As with a database, it should be simple to 
create a collection without needing to deal with config - you want to deal with 
the config when you need to, not face it all up front every time you 
create a new collection.

IMO, the name “example” is historical - most people already use it this way, and the 
name just confuses matters.

- Mark


On Nov 13, 2013, at 12:30 PM, Shawn Heisey  wrote:

> On 11/13/2013 5:29 AM, Dmitry Kan wrote:
>> Reading that people have considered deploying "example" folder is slightly
>> strange to me. No wonder they are confused and confuse their ops.
> 
> I do use the stripped jetty included in the example, but my setup is not a 
> straight copy of the example directory. I removed a lot of it and changed how 
> jars get loaded.  I built my own init script from scratch, tailored for my 
> setup.
> 
> I'll start a new thread with my init script and some info about how I 
> installed Solr.
> 
> Thanks,
> Shawn
> 



Re: collections API error

2013-11-13 Thread Mark Miller
Try Solr 4.5.1.

https://issues.apache.org/jira/browse/SOLR-5306  Extra collection creation 
parameters like collection.configName are not being respected.

- Mark

On Nov 13, 2013, at 2:24 PM, Christopher Gross  wrote:

> Running Apache Solr 4.5 on Tomcat 7.0.29, Java 1.6_30.  3 SolrCloud nodes
> running.  5 ZK nodes (v 3.4.5), one on each SolrCloud server, and on 2
> other servers.
> 
> I want to create a collection on all 3 nodes.  I only need 1 shard.  The
> config is in Zookeeper (another collection is using it)
> 
> http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3&collection.configName=test
> 
> I get this error (3 times, though for a different replica #)
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> CREATEing SolrCore 'newtest_shard1_replica2': Unable to create core:
> newtest_shard1_replica2
> 
> The SolrCloud Admin logs give this as the root error:
> 
> Caused by: org.apache.solr.common.cloud.ZooKeeperException: Specified
> config does not exist in ZooKeeper:newtest
> 
> You can see from my call that I don't want it to be called "test" (already
> have one) but I want to make a new instance of the "test" collection.
> 
> This seems  pretty straightforward -- what am I missing?  Did the
> parameters change and the wiki not get updated?
> [
> http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
> ]
> 
> Thanks.
> 
> -- Chris
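
If upgrading to 4.5.1 is not an option right away, a workaround sketch for 4.5.0
(the ZooKeeper address and config path are placeholders): since the create call
ends up looking for a config set named after the collection, upload the same
config again under that name before creating it:

    # Upload the existing "test" config directory again, under the new collection's name
    ./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd upconfig \
      -confdir /path/to/test/conf -confname newtest

    # The CREATE call from above should then find its config
    curl 'http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3'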



Re: SolrCloud question

2013-11-15 Thread Mark Miller
We are moving away from predefining SolrCores for SolrCloud. The correct 
approach is to use the Collections API - then it is quite simple to 
choose the number of shards for each collection you create.

Hopefully our examples will move to doing this before long.

- Mark

On Nov 15, 2013, at 11:47 AM, Beale, Jim (US-KOP)  wrote:

> Hello all,
> 
> I am trying to set up a SolrCloud deployment consisting of 5 boxes each of 
> which is running Solr under jetty.  A zookeeper ensemble is running 
> separately on 3 of the boxes.
> 
> Each Solr instance has 2 cores, one of which is sharded across the five boxes 
> and the other not sharded at all because it is a much smaller index.  
> numShards is set to 5 in the command to start jetty, -DnumShards=5.
> 
> It turns out that getting this configuration to work is not as easy as I had 
> hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core 
> setup, you currently have to settle for the same
> numShards for every core."  Unfortunately that JIRA was closed without any 
> implementation.
> 
> Is this limitation still in effect?  Does the new core discovery mode offer 
> anything in this regard?
> 
> Is there any way at all to deploy two cores with different numShards?
> 
> How hard would it be to implement this?  Is it compatible with the 
> architecture of Solr 5?
> 
> Thanks,
> Jim Beale
> 
> 
> The information contained in this email message, including any attachments, 
> is intended solely for use by the individual or entity named above and may be 
> confidential. If the reader of this message is not the intended recipient, 
> you are hereby notified that you must not read, use, disclose, distribute or 
> copy any part of this communication. If you have received this communication 
> in error, please immediately notify me by email and destroy the original 
> message, including any attachments. Thank you.



Re: distributed search is significantly slower than direct search

2013-11-17 Thread Mark Miller
You are asking for 5000 docs, right? That forces us to look up 5000 
external-to-internal ids. I think this has always had a cost, but it’s obviously 
worse if you ask for a ton of results. I don’t think a single node has to do 
this. And if we had something like searcher leases (we will eventually), I 
think we could avoid it and just use internal ids.

- Mark

On Nov 17, 2013, at 12:44 PM, Yuval Dotan  wrote:

> Hi Tomás
> This is just a test environment meant only to reproduce the issue I am
> currently investigating.
> The number of documents should grow substantially (billions of docs).
> 
> 
> 
> On Sun, Nov 17, 2013 at 7:12 PM, Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
> 
>> Hi Yuval, quick question. You say that your code has 750k docs and around
>> 400mb? Is this some kind of test dataset and you expect it to grow
>> significantly? For an index of this size, I wouldn't use distributed
>> search, single shard should be fine.
>> 
>> 
>> Tomás
>> 
>> 
>> On Sun, Nov 17, 2013 at 6:50 AM, Yuval Dotan  wrote:
>> 
>>> Hi,
>>> 
>>> I isolated the case
>>> 
>>> Installed on a new machine (2 x Xeon E5410 2.33GHz)
>>> 
>>> I have an environment with 12Gb of memory.
>>> 
>>> I assigned 6gb of memory to Solr and I’m not running any other memory
>>> consuming process so no memory issues should arise.
>>> 
>>> Removed all indexes apart from two:
>>> 
>>> emptyCore – empty – used for routing
>>> 
>>> core1 – holds the stored data – has ~750,000 docs and size of 400Mb
>>> 
>>> Again this is a single machine that holds both indexes.
>>> 
>>> The query
>>> 
>>> 
>>> http://localhost:8210/solr/emptyCore/select?rows=5000&q=*:*&shards=127.0.0.1:8210/solr/core1&wt=json
>>> QTime takes ~3 seconds
>>> 
>>> and direct query
>>> http://localhost:8210/solr/core1/select?rows=5000&q=*:*&wt=json Qtime
>>> takes
>>> ~15 ms - a magnitude difference.
>>> 
>>> I ran the long query several times and got an improvement of about a sec
>>> (33%) but that’s it.
>>> 
>>> I need to better understand why this is happening.
>>> 
>>> I tried looking at Solr code and debugging the issue but with no success.
>>> 
>>> The one thing I did notice is that the getFirstMatch method which
>> receives
>>> the doc id, searches the term dict and returns the internal id takes most
>>> of the time for some reason.
>>> 
>>> I am pretty stuck and would appreciate any ideas
>>> 
>>> My only solution for the moment is to bypass the distributed query,
>>> implement code in my own app that directly queries the relevant cores and
>>> handles the sorting etc..
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> 
>>> On Sat, Nov 16, 2013 at 2:39 PM, Michael Sokolov <
>>> msoko...@safaribooksonline.com> wrote:
>>> 
 Did you say what the memory profile of your machine is?  How much
>> memory,
 and how large are the shards? This is just a random guess, but it might
>>> be
 that if you are memory-constrained, there is a lot of thrashing caused
>> by
 paging (swapping?) in and out the sharded indexes while a single index
>>> can
 be scanned linearly, even if it does need to be paged in.
 
 -Mike
 
 
 On 11/14/2013 8:10 AM, Elran Dvir wrote:
 
> Hi,
> 
> We tried returning just the id field and got exactly the same
>>> performance.
> Our system is distributed but all shards are in a single machine so
> network issues are not a factor.
> The code we found where Solr is spending its time is on the shard and
>>> not
> on the routing core, again all shards are local.
> We investigated the getFirstMatch() method and noticed that the
> MultiTermEnum.reset (inside MultiTerm.iterator) and
>> MultiTerm.seekExact
> take 99% of the time.
> Inside these methods, the call to BlockTreeTermsReader$
> FieldReader$SegmentTermsEnum$Frame.loadBlock  takes most of the time.
> Out of the 7 seconds  run these methods take ~5 and
> BinaryResponseWriter.write takes the rest(~ 2 seconds).
> 
> We tried increasing cache sizes and got hits, but it only improved the
> query time by a second (~6), so no major effect.
> We are not indexing during our tests. The performance is similar.
> (How do we measure doc size? Is it important due to the fact that the
> performance is the same when returning only id field?)
> 
> We still don't completely understand why the query takes this much
>>> longer
> although the cores are on the same machine.
> 
> Is there a way to improve the performance (code, configuration,
>> query)?
> 
> -Original Message-
> From: idokis...@gmail.com [mailto:idokis...@gmail.com] On Behalf Of
> Manuel Le Normand
> Sent: Thursday, November 14, 2013 1:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: distributed search is significantly slower than direct
>>> search
> 
> It's surprising such a query takes a long time, I would assume that
>>> after
> trying consistently q=*:* you should be gettin

Re: [ANNOUNCE] Apache Solr Reference Guide 4.5 Available

2013-11-18 Thread Mark Miller
There appear to be plugins to do this, but since Apache hosts the wiki infra 
for us, we don’t get to toss in any plugins we want unfortunately.

- Mark

On Nov 18, 2013, at 8:16 AM, Uwe Reh  wrote:

> I'd like to read the guide as e-paper. Is there a way to obtain the document 
> in the Format epub or odt.
> Trying to convert the PDF with Calibre,  wasn't very satisfyingly. :-(
> 
> Uwe
> 
> 
> Am 05.10.2013 14:19, schrieb Steve Rowe:
>> The Lucene PMC is pleased to announce the release of the Apache Solr 
>> Reference Guide for Solr 4.5.
>> 
>> This 338 page PDF serves as the definitive users manual for Solr 4.5.
>> 
>> The Solr Reference Guide is available for download from the Apache mirror 
>> network:
>> 
>> https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
>> 
>> 
>> Steve
>> 
> 



Re: Listing Collections in Solr Cloud

2013-11-18 Thread Mark Miller
We should have a list command in the Collections API. I can help if someone 
wants to open a JIRA issue.

- Mark

On Nov 18, 2013, at 2:11 PM, Anirudha Jadhav  wrote:

> you can use the following 2 ways
> 
> 1. ZK client API
>you could just do a get_children on the zk node
> /collections/ to get all collections.
> 
> or
> without ZK client API point this url at your solrCloud install
> http://host:port/solr/zookeeper?detail=true&path=%2Fcollections&_=1384801522456
> 
> you should be looking for children under the collections node.
> 
> I dont know if there is any other easier api, but this should work
> 
> enjoy!
> -Ani
> 
> 
> On Mon, Nov 18, 2013 at 12:48 PM, Dave Seltzer  wrote:
> 
>> Hello,
>> 
>> I have a need to dynamically add collections to Solr cloud. The collections
>> API makes this quite simple.
>> 
>> However, when I'm done adding the new collection I need to update an Alias
>> to include the new collection. To do this I need a list of current
>> collections, but I don't see a way to do this.
>> 
>> Does anyone know if there's a way to list all the collections hosted on a
>> Solr server?
>> 
>> Thanks!
>> 
>> -Dave
>> 
> 
> 
> 
> -- 
> Anirudha P. Jadhav
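
A minimal SolrJ/ZooKeeper sketch of the first approach above (the ZooKeeper
connect string is a placeholder):

    import java.util.List;
    import org.apache.solr.common.cloud.SolrZkClient;

    public class ListCollections {
        public static void main(String[] args) throws Exception {
            // Read the children of /collections directly from ZooKeeper.
            SolrZkClient zkClient = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 15000);
            try {
                List<String> collections = zkClient.getChildren("/collections", null, true);
                for (String collection : collections) {
                    System.out.println(collection);
                }
            } finally {
                zkClient.close();
            }
        }
    }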



Re: SolrCloud question

2013-11-18 Thread Mark Miller
You shouldn’t be configuring the replication handler if you are using SolrCloud.

- Mark

On Nov 18, 2013, at 3:51 PM, Beale, Jim (US-KOP)  wrote:

> Thanks Michael,
> 
> I am having a terrible time getting this non-sharded index up.  Everything I 
> try leads to a dead-end.
> 
> http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5
> 
> it uses the solrconfig.xml from another core.  That solrconfig.xml is 
> deployed in conjunction with a solrcore.properties and the replication 
> handler is configured with properties from that core's solrcore.properties 
> file.  The CREATE action uses the solrconfig.xml but not the properties so it 
> fails.
> 
> I tried to upload a different solrconfig.xml to zookeeper using the zkcli 
> script -cmd upconfig and then to specify that config in the creation of the 
> TP core like so
> 
> http://10.0.15.44:8511/solr/admin/collections?action=CREATE&name=tp&numShards=1&replicationFactor=5&collection.configName=solrconfigTP.xml
> 
> However, how can replication masters and slaves be configured with a single 
> solrconfig.xml file unless each node is allowed to have its own config?
> 
> This is a royal PITA. I may be wrong, but I think it is broken.  Without a 
> way to specify numShards per core in solr.xml, it seems impossible to have 
> one sharded core and one non-sharded core.
> 
> To be honest, I don't even care about replication.  Why can't I specify a 
> core that is non-sharded, non-replicated and have the exact same core on all 
> five of my boxes?
> 
> 
> 
> Thanks,
> Jim
> 
> 
> -Original Message-
> From: michael.boom [mailto:my_sky...@yahoo.com]
> Sent: Monday, November 18, 2013 7:14 AM
> To: solr-user@lucene.apache.org
> Subject: RE: SolrCloud question
> 
> Hi,
> 
> The CollectionAPI provides some more options that will prove to be very
> usefull to you:
> /admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname
> 
> Have a look at:
> https://cwiki.apache.org/confluence/display/solr/Collections+API
> 
> Regarding your observations:
> 1. Completely normal, that's standard naming
> 2. When you created the collection you did not specify a configuration so
> the new collection will use the conf already stored in ZK. If you have more
> than one not sure which one will be picked as default.
> 3. You should be able to create replicas, by adding new cores on the other
> machines, and specifying the collection name and shard id. The data will
> then be replicated automatically to the new node. If you already tried that
> and get errors/problems while doing it provide some more details.
> 
> As far as i know you should be able to move/replace the index data, as long
> as the source collection has the same config as the target collection.
> Afterwards you'll have to reload your core / restart the Solr instance - not
> sure which one will do it - most likely the latter.
> But it will be easier if you use the method described at point 3 above.
> Please someone correct me, if i'm wrong.
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-question-tp4101266p4101675.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> The information contained in this email message, including any attachments, 
> is intended solely for use by the individual or entity named above and may be 
> confidential. If the reader of this message is not the intended recipient, 
> you are hereby notified that you must not read, use, disclose, distribute or 
> copy any part of this communication. If you have received this communication 
> in error, please immediately notify me by email and destroy the original 
> message, including any attachments. Thank you.
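
One way to address the "uses the solrconfig.xml but not the properties so it
fails" part (a sketch; the property names are placeholders, not from the
original config): give every ${...} substitution in the shared solrconfig.xml a
default value so the file loads even when no solrcore.properties supplies it,
and leave the replication handler unconfigured as noted above:

    <!-- The text after the colon is the default used when the property is not set -->
    <dataDir>${solr.data.dir:}</dataDir>

    <!-- Under SolrCloud, do not add master/slave sections here; the handler is
         used internally for recovery and needs no explicit configuration. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler" />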



Re: Problems bulk adding documents to Solr Cloud in 4.5.1

2013-11-19 Thread Mark Miller
4.6 no longer uses XML to send requests between nodes. It’s probably worth 
trying it and seeing if there is still a problem. Here is the RC we are voting 
on today: 
http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC4-rev1543363/

Otherwise, I do plan on looking into this issue soon.

- Mark

On Nov 19, 2013, at 10:11 AM, Michael Tracey  wrote:

> Dave, that's the exact symptoms we all have had in SOLR-5402.  After many 
> attempted fixes (including upgrading jetty, switching to tomcat, messing with 
> buffer settings) my solution was to fall back to 4.4 and await a fix.
> 
> - Original Message -
> From: "Dave Seltzer" 
> To: solr-user@lucene.apache.org
> Sent: Monday, November 18, 2013 9:48:46 PM
> Subject: Problems bulk adding documents to Solr Cloud in 4.5.1
> 
> Hello,
> 
> I'm having quite a bit of trouble indexing content in Solr Cloud. I built a
> content indexer on top of the REST API designed to index my data quickly.
> It was working very well indexing about 100 documents per "<add>"
> instruction.
> 
> After some tweaking of the schema I switched on a few more servers. Set up
> a few shards and started indexing data. Everything was working perfectly,
> but as soon as I switched to "Cloud" I started getting
> RemoteServerExceptions "Illegal to have multiple roots."
> 
> I'm using the stock Jetty container on both servers.
> 
> To get things working I reduced the number of documents per add until it
> worked. Unfortunately that has limited me to adding a single document per
> add - which is quite slow.
> 
> I'm fairly sure it's not the size of the HTTP post because things were
> working just fine until I moved over to Solr Cloud.
> 
> Does anyone have any information about this problem? It sounds a lot like
> Sai Gadde's https://issues.apache.org/jira/browse/SOLR-5402
> 
> Thanks so much!
> 
> -Dave
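
For reference, a minimal SolrJ sketch of batching adds with a tunable batch size
(the ZooKeeper address, collection name, and fields are placeholders). It makes
it easy to experiment with the batch size that triggers the problem, and SolrJ
also avoids hand-building the XML add payloads:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("mycollection");

            int batchSize = 100;  // tune this down if a batch starts failing
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(batchSize);
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("title", "document " + i);
                batch.add(doc);
                if (batch.size() == batchSize) {
                    server.add(batch);  // one update request per batch
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);
            }
            server.commit();
            server.shutdown();
        }
    }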



Re: Option to enforce a majority quorum approach to accepting updates in SolrCloud?

2013-11-19 Thread Mark Miller
Yeah, this is one of many little features that we have just not gotten to yet. 
I’ve always planned for a param that lets you say how many replicas an update 
must be verified on before responding with success. It seems to make sense to 
fail that type of request early if you notice there are not enough replicas up 
to satisfy the param to begin with.

I don’t think there is a JIRA issue yet, fire away if you want.

- Mark

On Nov 19, 2013, at 12:14 PM, Timothy Potter  wrote:

> I've been thinking about how SolrCloud deals with write-availability using
> in-sync replica sets, in which writes will continue to be accepted so long
> as there is at least one healthy node per shard.
> 
> For a little background (and to verify my understanding of the process is
> correct), SolrCloud only considers active/healthy replicas when
> acknowledging a write. Specifically, when a shard leader accepts an update
> request, it forwards the request to all active/healthy replicas and only
> considers the write successful if all active/healthy replicas ack the
> write. Any down / gone replicas are not considered and will sync up with
> the leader when they come back online using peer sync or snapshot
> replication. For instance, if a shard has 3 nodes, A, B, C with A being the
> current leader, then writes to the shard will continue to succeed even if B
> & C are down.
> 
> The issue is that if a shard leader continues to accept updates even if it
> loses all of its replicas, then we have acknowledged updates on only 1
> node. If that node, call it A, then fails and one of the previous replicas,
> call it B, comes back online before A does, then any writes that A accepted
> while the other replicas were offline are at risk to being lost.
> 
> SolrCloud does provide a safe-guard mechanism for this problem with the
> leaderVoteWait setting, which puts any replicas that come back online
> before node A into a temporary wait state. If A comes back online within
> the wait period, then all is well as it will become the leader again and no
> writes will be lost. As a side note, sys admins definitely need to be made
> more aware of this situation as when I first encountered it in my cluster,
> I had no idea what it meant.
> 
> My question is whether we want to consider an approach where SolrCloud will
> not accept writes unless there is a majority of replicas available to
> accept the write? For my example, under this approach, we wouldn't accept
> writes if both B&C failed, but would if only C did, leaving A & B online.
> Admittedly, this lowers the write-availability of the system, so may be
> something that should be tunable? Just wanted to put this out there as
> something I've been thinking about lately ...
> 
> Cheers,
> Tim



Re: Question regarding possibility of data loss

2013-11-19 Thread Mark Miller
I’d recommend you start with the upcoming 4.6 release. Should be out this week 
or next.
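
On not losing updates silently: if the client gets a success back for an update, it's in the system; if something goes wrong you should see an exception or a non-zero status on the response, so the main thing is to actually check for those and re-queue/retry. A minimal SolrJ sketch (the ZK address and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class SafeIndexer {
  public static void main(String[] args) throws Exception {
    // ZK ensemble address and collection name are placeholders.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");

    try {
      UpdateResponse rsp = server.add(doc);
      if (rsp.getStatus() != 0) {
        // Treat a non-zero status as a failure: log it and re-queue the doc.
        System.err.println("Update rejected, status=" + rsp.getStatus());
      }
    } catch (Exception e) {
      // The add did not succeed - re-queue and retry rather than dropping it.
      System.err.println("Update failed: " + e);
    }
    server.shutdown();
  }
}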

- Mark

On Nov 19, 2013, at 8:18 AM, adfel70  wrote:

> Hi, we plan to establish an ensemble of Solr with ZooKeeper. 
> We're going to have 6 Solr servers with 2 instances on each server; we'll also
> have 6 shards with replication factor 2, and in addition we'll have 3
> ZooKeepers. 
> 
> Our concern is that we will send documents to index, Solr won't index
> them but won't return any error message, and we will suffer data loss.
> 
> 1. Is there any situation that can cause this kind of problem? 
> 2. Can it happen if some of the ZKs are down? Or some of the Solr instances? 
> 3. How can we monitor them? Can we do something to prevent these kinds of
> errors? 
> 
> Thanks in advance 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Question-regarding-possibility-of-data-loss-tp4101915.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Option to enforce a majority quorum approach to accepting updates in SolrCloud?

2013-11-19 Thread Mark Miller
Mostly a lot of other systems already offer these types of things, so they were 
hard not to think about while building :) Just hard to get back to a lot of 
those things, even though a lot of them are fairly low hanging fruit. Hardening 
takes the priority :(

- Mark

On Nov 19, 2013, at 12:42 PM, Timothy Potter  wrote:

> You're thinking is always one-step ahead of me! I'll file the JIRA
> 
> Thanks.
> Tim
> 
> 
> On Tue, Nov 19, 2013 at 10:38 AM, Mark Miller  wrote:
> 
>> Yeah, this is kind of like one of many little features that we have just
>> not gotten to yet. I’ve always planned for a param that lets you say how
>> many replicas an update must be verified on before responding success.
>> Seems to make sense to fail that type of request early if you notice there
>> are not enough replicas up to satisfy the param to begin with.
>> 
>> I don’t think there is a JIRA issue yet, fire away if you want.
>> 
>> - Mark
>> 
>> On Nov 19, 2013, at 12:14 PM, Timothy Potter  wrote:
>> 
>>> I've been thinking about how SolrCloud deals with write-availability
>> using
>>> in-sync replica sets, in which writes will continue to be accepted so
>> long
>>> as there is at least one healthy node per shard.
>>> 
>>> For a little background (and to verify my understanding of the process is
>>> correct), SolrCloud only considers active/healthy replicas when
>>> acknowledging a write. Specifically, when a shard leader accepts an
>> update
>>> request, it forwards the request to all active/healthy replicas and only
>>> considers the write successful if all active/healthy replicas ack the
>>> write. Any down / gone replicas are not considered and will sync up with
>>> the leader when they come back online using peer sync or snapshot
>>> replication. For instance, if a shard has 3 nodes, A, B, C with A being
>> the
>>> current leader, then writes to the shard will continue to succeed even
>> if B
>>> & C are down.
>>> 
>>> The issue is that if a shard leader continues to accept updates even if
>> it
>>> loses all of its replicas, then we have acknowledged updates on only 1
>>> node. If that node, call it A, then fails and one of the previous
>> replicas,
>>> call it B, comes back online before A does, then any writes that A
>> accepted
>>> while the other replicas were offline are at risk to being lost.
>>> 
>>> SolrCloud does provide a safe-guard mechanism for this problem with the
>>> leaderVoteWait setting, which puts any replicas that come back online
>>> before node A into a temporary wait state. If A comes back online within
>>> the wait period, then all is well as it will become the leader again and
>> no
>>> writes will be lost. As a side note, sys admins definitely need to be
>> made
>>> more aware of this situation as when I first encountered it in my
>> cluster,
>>> I had no idea what it meant.
>>> 
>>> My question is whether we want to consider an approach where SolrCloud
>> will
>>> not accept writes unless there is a majority of replicas available to
>>> accept the write? For my example, under this approach, we wouldn't accept
>>> writes if both B&C failed, but would if only C did, leaving A & B online.
>>> Admittedly, this lowers the write-availability of the system, so may be
>>> something that should be tunable? Just wanted to put this out there as
>>> something I've been thinking about lately ...
>>> 
>>> Cheers,
>>> Tim
>> 
>> 



Re: Zookeeper down question

2013-11-19 Thread Mark Miller

On Nov 19, 2013, at 2:24 PM, Timothy Potter  wrote:

> Good questions ... From my understanding, queries will work if Zk goes down
> but writes do not work w/o Zookeeper. This works because the clusterstate
> is cached on each node so Zookeeper doesn't participate directly in queries
> and indexing requests. Solr has to decide not to allow writes if it loses
> its connection to Zookeeper, which is a safe guard mechanism. In other
> words, Solr assumes it's pretty safe to allow reads if the cluster doesn't
> have a healthy coordinator, but chooses to not allow writes to be safe.

Right - we currently stop accepting writes when Solr cannot talk to ZooKeeper - 
this is because we can no longer count on knowing about any changes to the 
cluster and no new leaders can be elected, etc. It gets tricky fast if you 
consider allowing updates without ZooKeeper connectivity for very long.

> 
> If a Solr nodes goes down while ZK is not available, since Solr no longer
> accepts writes, leader / replica doesn't really matter. I'd venture to
> guess there is some failover logic built in when executing distributed
> queries but I'm not as familiar with that part of the code (I'll brush up
> on it though as I'm now curious as well).

Right - query requests will fail over to other replicas - this is important in 
general because the cluster state a Solr instance has can be a bit stale - so a 
request might hit something that has gone down and another replica in the shard 
can be tried. We use the load balancing solrj client for these internal 
requests. CloudSolrServer handles failover for the user (or non internal) 
requests. Or you can use your own external load balancer.
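
On the client side, CloudSolrServer is the simplest way to get that failover, since it watches the cluster state itself - a sketch (the ZK addresses and collection name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryExample {
  public static void main(String[] args) throws Exception {
    // ZK ensemble and collection name are placeholders. Note that this client
    // talks to ZooKeeper itself, so the ensemble must be reachable to bootstrap.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    // The client watches the cluster state and routes requests to live replicas,
    // so a node it knows is down simply won't be tried.
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    System.out.println("Found " + rsp.getResults().getNumFound() + " docs");

    server.shutdown();
  }
}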

- Mark

> 
> Cheers,
> Tim
> 
> 
> On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm <
> garthgr...@averyranchconsulting.com> wrote:
> 
>> Given a 4 solr node instance (i.e. 2 shards, 2 replicas per shard), and a
>> standalone zookeeper.
>> 
>> Correct me if any of my understanding is incorrect on the following:
>> If ZK goes down, most normal operations will still function, since my
>> understanding is that ZK isn't involved on a transaction by transaction
>> basis for each of these.
>> Document adds, updates, and deletes on existing collection will still work
>> as expected.
>> Queries will still get processed as expected.
>> Is the above correct?
>> 
>> But adding new collections, changing configs, etc., will all fail while ZK
>> is down (or at least, place things in an inconsistent state?)
>> Is that correct?
>> 
>> If, while ZK is down, one of the 4 solr nodes also goes down, will all
>> normal operations fail?  Will they all continue to succeed?  I.e. will each
>> of the nodes realize which node is down and route indexing and query
>> requests around them, or is that impossible while ZK is down?  Will some
>> queries succeed (because they were lucky enough to get routed to the one
>> replica on the one shard that is still functional) while other queries fail
>> (they aren't so lucky and get routed to the one replica that is down on the
>> one shard)?
>> 
>> Thanks,
>> Garth Grimm
>> 
>> 
>> 



Re: Multiple data/index.YYYYMMDD.... dirs == bug?

2013-11-20 Thread Mark Miller
There might be a JIRA issue out there about replication not cleaning up on all 
fails - e.g. on startup or something - kind of rings a bell…if so, it will be 
addressed eventually.

Otherwise, you might have two for a bit just due to multiple searchers being 
around at once for a while or something - but it should not be something that 
lasts a long time.

- Mark

On Nov 20, 2013, at 11:50 AM, Daniel Collins  wrote:

> In our experience (with SolrCloud), if you trigger a full replication (e.g.
> new replica), you get the "timestamp" directory; it never renames back to
> just "index".  Since index.properties gives you the name of the real
> directory, we had never considered that a problem/bug.  Why bother with the
> rename afterwards? It just seems unnecessary.
> 
> So to answer your questions:
> 
> 1) Not in normal circumstances, but if replication crashes or stops, it
> might leave it hanging.
> 2) No, as long as there is an index.properties file.
> 
> Not official answers, but that's our experience.
> 
> 
> On 20 November 2013 15:55, michael.boom  wrote:
> 
>> I encountered this problem often when I restarted a Solr instance more than
>> once before replication had finished.
>> I would then have multiple timestamped directories and the index directory.
>> However, the index.properties points to the active index directory.
>> 
>> The moment replication succeeds, the temp dir is renamed to "index"
>> and the index.properties file is gone.
>> 
>> As for the situation where the index directory is missing, I'm not sure. Maybe
>> this happens when the replica is too old and an old-school replication is done.
>> 
>> 
>> 
>> -
>> Thanks,
>> Michael
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Multiple-data-index-MMDD-dirs-bug-tp4102163p4102168.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 



Re: Extensibility of HttpSolrServer

2013-11-20 Thread Mark Miller
Feel free to file a JIRA issue with the changes you think make sense.
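
A small sketch of what you're running into helps when filing - e.g. a hypothetical subclass like this currently has to shadow state because maxRetries only has a setter:

import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Hypothetical subclass: it has to shadow maxRetries just to be able to
// read it back, because HttpSolrServer only exposes a setter.
public class InstrumentedSolrServer extends HttpSolrServer {

  private int maxRetries;

  public InstrumentedSolrServer(String baseURL) {
    super(baseURL);
  }

  @Override
  public void setMaxRetries(int maxRetries) {
    this.maxRetries = maxRetries;    // remember the value ourselves...
    super.setMaxRetries(maxRetries); // ...since the superclass won't give it back
  }

  public int getMaxRetries() {
    return maxRetries;
  }
}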

- Mark

On Nov 20, 2013, at 4:21 PM, Eugen Paraschiv  wrote:

> Hi,
> Quick question about the HttpSolrServer implementation - I would like to
> extend some of the functionality of this class - but when I extend it - I'm
> having issues with how extensible it is.
> For example - some of the details are not visible externally - setters
> exist for maxRetries and followRedirects but no getters.
> It would really help to make this class a bit more extensible - I'm sure it's
> usually enough, but when it does need to be extended - it would make sense
> to allow that rather than having the client implement an alternative version
> of it via copy-paste (which looks like the only option available right now).
> Hope this makes sense.
> Cheers,
> Eugen.



Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Mark Miller
Yes, more details…

Solr version, which garbage collector, how does heap usage look, cpu, etc.
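
It also helps to see where the time goes - Solr-side QTime vs. total elapsed time at the client. With SolrJ you can log both (a sketch; the URL and query are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingCheck {
  public static void main(String[] args) throws Exception {
    // URL and query are placeholders.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    QueryResponse rsp = server.query(new SolrQuery("*:*"));

    // QTime is the time Solr itself reports for the request; elapsed also
    // includes network and client overhead. A large gap between the two
    // points away from Solr as the bottleneck.
    System.out.println("QTime=" + rsp.getQTime() + "ms, elapsed=" + rsp.getElapsedTime() + "ms");
    server.shutdown();
  }
}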

- Mark

On Nov 21, 2013, at 6:46 PM, Erick Erickson  wrote:

> How real time is NRT? In particular, what are you commit settings?
> 
> And can you characterize "periodic slowness"? Queries that usually
> take 500ms now take 10s? Or 1s? How often? How are you measuring?
> 
> Details matter, a lot...
> 
> Best,
> Erick
> 
> 
> 
> 
> On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer  wrote:
> 
>> I'm doing some performance testing against an 8-node Solr cloud cluster,
>> and I'm noticing some periodic slowness.
>> 
>> 
>> http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
>> 
>> I'm doing random test searches against an Alias Collection made up of four
>> smaller (monthly) collections. Like this:
>> 
>> MasterCollection
>> |- Collection201308
>> |- Collection201309
>> |- Collection201310
>> |- Collection201311
>> 
>> The last collection is constantly updated. New documents are being added at
>> the rate of about 3 documents per second.
>> 
>> I believe the slowness may be due to NRT, but I'm not sure. How should I
>> investigate this?
>> 
>> If the slowness is related to NRT, how can I alleviate the issue without
>> disabling NRT?
>> 
>> Thanks Much!
>> 
>> -Dave
>> 



Re: Commit behaviour in SolrCloud

2013-11-24 Thread Mark Miller
SolrCloud does not use commits for update acceptance promises.

The idea is, if you get a success from the update, it’s in the system, commit 
or not.

Soft Commits are used for visibility only.

Standard Hard Commits are used essentially for internal purposes and should be 
done via auto commit generally.

To your question though - it is fine to send a commit while updates are coming 
in from another source - it’s just not generally necessary to do that anyway.
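
If you do want visibility control from the indexing client rather than via autoSoftCommit, commitWithin on the add or an explicit soft commit both work - a SolrJ sketch (the ZK address, collection and field names are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class VisibilityExample {
  public static void main(String[] args) throws Exception {
    // ZK address, collection and field names are placeholders.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");

    // Option 1: ask Solr to make the doc visible within ~5 seconds of the add.
    server.add(doc, 5000);

    // Option 2: an explicit soft commit - visibility only, not needed for durability.
    server.commit(true, true, true); // waitFlush, waitSearcher, softCommit

    server.shutdown();
  }
}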

- Mark

On Nov 24, 2013, at 1:01 PM, adfel70  wrote:

> Hi everyone,
> 
> I am wondering how commit operation works in SolrCloud:
> Say I have 2 parallel indexing processes. What if one process sends big
> update request (an add command with a lot of docs), and the other one just
> happens to send a commit command while the update request is being
> processed. 
> Is it possible that only part of the documents will be commited? 
> What will happen with the other docs? Is Solr transactional and promise that
> there will be no partial results?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Commit behaviour in SolrCloud

2013-11-24 Thread Mark Miller
If you want this promise and complete control, you pretty much need to do a doc 
per request and many parallel requests for speed.
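
Something along these lines - one doc per request, fanned out over a small thread pool, so any failure maps back to exactly one document (a sketch; the ZK address, pool size and field names are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PerDocIndexer {
  public static void main(String[] args) throws Exception {
    // ZK address, collection, pool size and field names are all placeholders.
    final CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 10000; i++) {
      final String id = "doc-" + i;
      pool.submit(new Runnable() {
        public void run() {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", id);
          try {
            server.add(doc); // one doc per request
          } catch (Exception e) {
            // the failure maps to exactly one document, so it can be re-queued
            System.err.println("Failed to index " + id + ": " + e);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    server.shutdown();
  }
}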

The bulk and streaming methods of adding documents do not have a good fine-grained 
error reporting strategy yet. It’s okay for certain use cases, and especially batch 
loading, and you will know when an update is rejected - it just might not be easy 
to know which one in the batch / stream.

Documents that come in batches are added as they come / are processed - not in 
some atomic unit.

What controls how soon you will see documents or whether you will see them as 
they are still loading is simply when you soft commit and how many docs have 
been indexed when the soft commit happens.

- Mark

On Nov 25, 2013, at 1:03 AM, adfel70  wrote:

> Hi Mark, Thanks for the answer.
> 
> One more question though: You say that if I get a success from the update,
> it’s in the system, commit or not. But when exactly do I get this feedback -
> Is it one acknowledgment for the whole request, or one per add inside the request?
> I will give an example to clarify my question: Say I have a new empty index, and
> I repeatedly send indexing requests - every request adds 500 new documents
> to the index. Is it possible that at some point during this process I could
> query the index and get a total of 1,030 docs? (Let's assume there were
> no indexing errors returned from Solr)
> 
> Thanks again.
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Commit-behaviour-in-SolrCloud-tp4102879p4102996.html
> Sent from the Solr - User mailing list archive at Nabble.com.


