Confusion when using go-live and MapReduceIndexerTool

2014-04-17 Thread Brett Hoerner
I'm doing HDFS input and output in my job, with the following: hadoop jar /mnt/faas-solr.jar \ -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \ --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver \ --morphline-file /mnt/morphline-ignore.conf \

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-17 Thread Brett Hoerner
https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote: > Odd - might be helpful if you can share your solrconfig.xml being used. > > -- > Mark Miller > about.me/markrmiller > > On April 17, 2014 at 12:18:37

Re: index merge question

2014-04-17 Thread Brett Hoerner
Sorry to bump this, I have the same issue and was curious about the sanity of trying to work around it. * I have a constant stream of realtime documents I need to continually index. Sometimes they even overwrite very old documents (by using the same unique ID). * I also have a *huge* backlog of do

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-22 Thread Brett Hoerner
I'm back to looking at the code but holy hell is debugging Hadoop hard. :) On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner wrote: > https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b > > > On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote: > >> Odd - might b

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-22 Thread Brett Hoerner
merge is complete. If writes are allowed, corruption may occur on the merged index." Is that saying that Solr will block writes, or is that saying the end user has to ensure no writes are happening against the collection during a merge? That seems... risky? On Tue, Apr 22, 2014 at 9:29 AM, Brett

Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
If I run a query like this, fq=text:lol fq=created_at_tdid:[1400544000 TO 1400630400] It takes about 6 seconds. Following queries take only 50ms or less, as expected because my fqs are cached. However, if I change the query to not cache my big range query: fq=text:lol fq={!cache=false}created_a
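For readers who want to reproduce the comparison, here is a minimal Python sketch of how the two request variants might be built. The filter strings come from the message; `q=*:*` is taken from the later follow-up in this thread; the helper name `build_params` is hypothetical, and nothing here actually contacts a Solr server.

```python
from urllib.parse import urlencode

# Range filter exactly as written in the message.
RANGE_FQ = "created_at_tdid:[1400544000 TO 1400630400]"


def build_params(cache_range: bool) -> str:
    """Return an encoded query string with the two fq clauses.

    When cache_range is False, the range filter is prefixed with the
    {!cache=false} local param so Solr does not store it in the filterCache.
    """
    range_fq = RANGE_FQ if cache_range else "{!cache=false}" + RANGE_FQ
    # A list of tuples keeps both fq parameters instead of overwriting one.
    params = [("q", "*:*"), ("fq", "text:lol"), ("fq", range_fq)]
    return urlencode(params)


cached = build_params(cache_range=True)
uncached = build_params(cache_range=False)
```

Timing each variant on a cold cache and again on a warm one would separate the cost of computing the filter from the cost of storing it.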

Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
act of storing the work after it's done (it has to be done in either case) is taking 4 whole seconds? On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey wrote: > On 6/3/2014 2:44 PM, Brett Hoerner wrote: > > If I run a query like this, > > > > fq=text:lol > > fq=created_a

Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
, but that seems... surprising to me. On Tue, Jun 3, 2014 at 4:02 PM, Brett Hoerner wrote: > In this case, I have >400 million documents, so I understand it taking a > while. > > That said, I'm still not sure I understand why it would take *more* time. > In your exampl

Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
Yonik, I'm familiar with your blog posts -- and thanks very much for them. :) Though I'm not sure what you're trying to show me with the q=*:* part? I was of course using q=*:* in my queries, but I assume you mean to leave off the text:lol bit? I've done some Cluster changes, so these are my basel

"Fake" cached join query much faster than cached fq?

2014-06-04 Thread Brett Hoerner
The following two queries are doing the same thing, one using a "normal" fq range query and another using a parent query. The cache is warm (these are both hits) but the "normal" one takes ~6 to 7.5sec while the parent query hack takes ~1.2sec. Is this expected? Is there anything "wrong" with my

Re: "Fake" cached join query much faster than cached fq?

2014-06-05 Thread Brett Hoerner
lso if you tell the overall number of docs > in the index, and cardinality of both filters, it might allow to guess > something. Anyway, jvisualvm sampling can give an exact answer. Giving > responses, it's enough to profile one of the slave nodes. > > > On Wed, Jun 4, 2014

Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Can anyone explain the difference between these two queries? text:(+"happy") AND -user:("123456789") = numFound 2912224 But text:(+"happy") AND user:(-"123456789") = numFound 0 Now, you may just say "then just put - in front of your field, duh!" Well, text:(+"happy") = numFound 2912224

Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
". For example: > > text:(+"happy") AND user:(*:* -"123456789") > > -- Jack Krupansky > > -Original Message- From: Brett Hoerner > Sent: Tuesday, July 1, 2014 2:51 PM > To: solr-user@lucene.apache.org > Subject: Confusion about location of
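The workaround quoted above pairs the negated term with `*:*`, so the parenthesized clause has a set of documents to subtract from; a purely negative sub-clause matches nothing on its own. A small Python sketch of a helper that builds such a clause follows. The name `exclude_value` is hypothetical, and the escaping shown covers only backslashes and double quotes inside the phrase.

```python
def exclude_value(field: str, value: str) -> str:
    """Build a field clause that matches every document except `value`.

    The clause is written as (*:* -"value") so the negation is applied
    to the match-all set rather than standing alone.
    """
    # Escape characters that would otherwise terminate the quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:(*:* -"{escaped}")'


query = 'text:(+"happy") AND ' + exclude_value("user", "123456789")
```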

Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Also, does anyone have the Solr or Lucene bug # for this? On Tue, Jul 1, 2014 at 3:06 PM, Brett Hoerner wrote: > Interesting, is there a performance impact to sending the *:*? > > > On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky > wrote: > >> Yeah, there's a k

Trouble with manually routed collection after upgrade to 4.6

2013-11-25 Thread Brett Hoerner
Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues. First: this collection is, I guess, implicitly routed. I do this for every document insert using SolrJ: document.addField("_route_", shardId) After upgrading the servers to
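As an illustration of the routing approach described in this message, here is a Python sketch that builds a JSON update payload in which every document carries a `_route_` field, mirroring the SolrJ `addField` call. The unique-key name `id`, the `text` field, and the helper `routed_doc` are assumptions for the example; the payload is only constructed, not sent.

```python
import json


def routed_doc(doc_id: str, shard_id: str, **fields) -> dict:
    """Return a document dict carrying an explicit _route_ value."""
    # "id" as the unique key is an assumption; use the schema's own key.
    doc = {"id": doc_id, "_route_": shard_id}
    doc.update(fields)
    return doc


# One document routed to a shard named "shard1" (a hypothetical shard name).
payload = json.dumps([routed_doc("event-1", "shard1", text="hello")])
```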

Re: Trouble with manually routed collection after upgrade to 4.6

2013-11-25 Thread Brett Hoerner
Here's my clusterstate.json: https://gist.github.com/bretthoerner/a8120a8d89c93f773d70 On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner wrote: > Hi, I've been using a collection on Solr 4.5.X for a few weeks and just > did an upgrade to 4.6 and am having some issues.

Re: Trouble with manually routed collection after upgrade to 4.6

2013-11-25 Thread Brett Hoerner
; (is there a tool for this? I've always done it manually), started the cluster up again and it's all good now. On Mon, Nov 25, 2013 at 10:38 AM, Brett Hoerner wrote: > Here's my clusterstate.json: > > https://gist.github.com/bretthoerner/a8120a8d89c93f773d70 > >

After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Brett Hoerner
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned "No live servers for shard" but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents. 21:35:21.508 [qtp14184

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Brett Hoerner
On Fri, Feb 7, 2014 at 6:15 PM, Mark Miller wrote: > You have to update the other nodes to 4.6.1 as well. > I'm not sure I follow, all of the Solr instances in the cluster are 4.6.1 to my knowledge? Thanks, Brett

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
; > - Mark > > http://about.me/markrmiller > > > > On Feb 7, 2014, 7:01:24 PM, Brett Hoerner wrote: > I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ > 4.6.1 and indexing ceased (indexer returned "No live servers for shard" but > th

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
not 4.6.1. That code couldn’t have been 4.6.1 it seems. > > - Mark > > http://about.me/markrmiller > > On Feb 8, 2014, at 11:12 AM, Brett Hoerner wrote: > > > Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I > > verified 4.6.1 i

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Mark, you were correct. I realized I was still running a prerelease of 4.6.1 (by a handful of commits). Bounced them with proper 4.6.1 and we're all good, sorry for the spam. :) On Sat, Feb 8, 2014 at 10:29 AM, Brett Hoerner wrote: > Oh, I was talking about my indexer. That stack is

Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-16 Thread Brett Hoerner
I have a very weird problem that I'm going to try to describe here to see if anyone has any "ah-ha" moments or clues. I haven't created a small reproducible project for this but I guess I will have to try in the future if I can't figure it out. (Or I'll need to bisect by running long Hadoop jobs...

Re: Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-23 Thread Brett Hoerner
(StandardDirectoryReader.java:277) at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476) ... 25 more On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner wrote: > I have a very weird prob

Re: Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-23 Thread Brett Hoerner
To be clear, those exceptions are during the "main" mapred job that is creating the many small indexes. If these errors above occur (they don't fail the job), I am 99% sure that is when the MTree job later hangs. On Tue, Sep 23, 2014 at 1:02 PM, Brett Hoerner wrote: > I

Solr mapred MTree merge stage ~6x slower in 4.10

2014-09-25 Thread Brett Hoerner
ing take a long time. I haven't tried to see if the issue shows on smaller jobs yet (does 1 minute become 6 minutes?). Brett On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner wrote: > I have a very weird problem that I'm going to try to describe here to see > if anyone has any

Advice for using Solr 4.5 custom sharding to handle rolling time-oriented event data

2013-10-01 Thread Brett Hoerner
I'm interested in using the new custom sharding features in the collections API to search a rolling window of event data. I'd appreciate a spot/sanity check of my plan/understanding. Say I only care about the last 7 days of events and I have thousands per second (billions per week). Am I correct

Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Brett Hoerner
It seems that changes in 4.5 collection configuration now require users to set a maxShardsPerNode (or it defaults to 1). Maybe this was the case before, but with the new CREATESHARD API it seems very restrictive. I've just created a very simple test collection on 3 machines where I set maxShards

Re: Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Brett Hoerner
would create 1 new shard with 1 replica on any server in 4.5? Thanks! On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner wrote: > It seems that changes in 4.5 collection configuration now require users to > set a maxShardsPerNode (or it defaults to 1). > > Maybe this was the case before

Re: Problems with maxShardsPerNode in 4.5

2013-10-02 Thread Brett Hoerner
which will create > only one replica even if maxShardsPerNode=1000 at collection level. > > I'll open an issue. > > > On Wed, Oct 2, 2013 at 7:25 AM, Brett Hoerner >wrote: > > > Related, 1 more try: > > > > Created collection starting with 4 shards on 1 box

What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
I'm curious what the later "shard-local" bits do, if anything? I have a very large cluster (256 shards) and I'm sending most of my data with a single "composite", e.g. 1234!, but I'm noticing the data is being split among many of the shards. My guess right now is that since I'm only using the def
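Some back-of-the-envelope arithmetic may help with the question above. The sketch below assumes, as I understand the compositeId router, that the top `prefix_bits` bits of the 32-bit hash come from the part before the `!` (16 by default) and that shards split the hash space into equal, aligned ranges. The function name `shards_spanned` is hypothetical and this is not the router's actual code.

```python
HASH_BITS = 32  # assumed width of the routing hash


def shards_spanned(num_shards: int, prefix_bits: int = 16) -> int:
    """Estimate how many shards one composite prefix can land on.

    Assumes num_shards is a power of two and each shard owns an equal,
    aligned slice of the hash space.
    """
    if num_shards <= 0 or num_shards & (num_shards - 1):
        raise ValueError("num_shards must be a positive power of two")
    if not 0 <= prefix_bits <= HASH_BITS:
        raise ValueError("prefix_bits must be between 0 and 32")
    # Number of hash values that share the same top prefix_bits.
    prefix_block = 1 << (HASH_BITS - prefix_bits)
    # Number of hash values owned by each shard.
    shard_width = (1 << HASH_BITS) // num_shards
    return max(1, prefix_block // shard_width)
```

Under these assumptions, a 256-shard collection with the default 16 prefix bits keeps one prefix on a single shard, so data spreading across many shards would point at the routing values being sent rather than at the bits setting.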

Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
zeable amount of data (68M and 128M) and the rest are very small as expected. The fact that two are receiving so much makes me think my data is being split into two shards. I'm trying to debug more now. On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at 6:

Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
y hour and it's been running for 2). There *is* a little old data in my stream, but not that much (like <5%). What's confusing to me is that 5 of them are rather large, when I'd expect 2 of them to be. On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at

Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
e, Oct 8, 2013 at 7:31 PM, Brett Hoerner > wrote: > > This is my clusterstate.json: > > https://gist.github.com/bretthoerner/0098f741f48f9bb51433 > > > > And these are my core sizes (note large ones are sorted to the end): > > https://gist.github.com/bretthoerner/f5b5e0

Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
Ignore me, I forgot about shards= from the wiki. On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner wrote: > I have a silly question, how do I query a single shard in SolrCloud? When > I hit solr/foo_shard1_replica1/select it always seems to do a full cluster > query. > > I can

Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-11 Thread Brett Hoerner
Thanks folks, As an update for future readers --- the problem was on my side (my logic in picking the _route_ was flawed) as expected. :) On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey wrote: > > There is also the "distrib=false" parameter t
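For future readers checking per-core document counts, a Python sketch of a request URL that uses the `distrib=false` parameter mentioned in the reply, so the core answers from its local index only. The base URL and the helper name `single_core_url` are assumptions; the core name is the one used earlier in the thread.

```python
from urllib.parse import urlencode


def single_core_url(base_url: str, core: str, query: str = "*:*") -> str:
    """Build a select URL that stays on one core instead of fanning out."""
    # rows=0 asks only for numFound, which is enough to compare core sizes.
    params = urlencode({"q": query, "distrib": "false", "rows": 0})
    return f"{base_url}/{core}/select?{params}"


url = single_core_url("http://localhost:8983/solr", "foo_shard1_replica1")
```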

SolrCloud facet query repeatably fails with "No live SolrServers" for some terms, not all

2013-05-01 Thread Brett Hoerner
An example: https://gist.github.com/bretthoerner/2ffc362450bcd4c2487a I'll note that all shards and replicas show as "Up" (green) in the Admin UI. Does anyone know how this could happen? I can repeat this over and over with the same terms. It was my understanding that something like a facet query

SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-04 Thread Brett Hoerner
Hi, I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection, which I called "default" and haven't used since. I'm using an external ZK ensemble that was completely empty before I started this cloud. Once I had all 4 nodes in the cloud I used the collection API to create the real

Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-07 Thread Brett Hoerner
CREATE or DELETE actually did anything, though. (Again, HTTP 200 OK) Still stuck here, any ideas? Brett On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner wrote: > Hi, > > I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection, > which I called "default" and ha

Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-09 Thread Brett Hoerner
stuff off to the overseer, you will always > get back a 200 - there is a JIRA issue that addresses this though > (collection API responses) and I hope to get it committed soon. > > - Mark > > On Dec 7, 2012, at 7:26 AM, Brett Hoerner wrote: > > > For what it's wort

Have the SolrCloud collection REST endpoints move or changed for 4.1?

2013-01-19 Thread Brett Hoerner
I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm trying out 4.1 RC1 right now but the update URL I used to use is returning HTTP 404. For example, I would post my document updates to, http://localhost:8983/solr/collection1 But that is 404ing now (collection1 exists according

Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?

2013-01-19 Thread Brett Hoerner
on it reports 404 sometimes. What's odd is that I can use curl to post a JSON document to the same URL and it will return 200. When I log every request I make from my indexer process (using SolrJ) it's about 50/50 between 404 and 200... On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner wrot

Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?

2013-01-20 Thread Brett Hoerner
So the ticket I created wasn't related, there is a working patch for that now but my original issue remains, I get 404 when trying to post updates to a URL that worked fine in Solr 4.0. On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote: > I'm actually wondering if this other is

Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?

2013-01-20 Thread Brett Hoerner
Sorry, I take it back. It looks like fixing https://issues.apache.org/jira/browse/SOLR-4321 fixed my issue after all. On Sun, Jan 20, 2013 at 2:21 PM, Brett Hoerner wrote: > So the ticket I created wasn't related, there is a working patch for that > now but my original issue remains

Problem querying collection in Solr 4.1

2013-01-21 Thread Brett Hoerner
I have a collection in Solr 4.1 RC1 and doing a simple query like text:"puppy dog" is causing an exception. Oddly enough, I CAN query for text:puppy or text:"puppy", but adding the space breaks everything. Schema and config: https://gist.github.com/f49da15e39e5609b75b1 This happens whether I quer

Re: Problem querying collection in Solr 4.1

2013-01-22 Thread Brett Hoerner
set to use Lucene version 4.0 > index format but you mention you are using it 4.1 > > LUCENE_40 > > > > On Mon, Jan 21, 2013 at 4:26 PM, Brett Hoerner >wrote: > > > I have a collection in Solr 4.1 RC1 and doing a simple query like > > text:"pup

Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-02 Thread Brett Hoerner
Hi, I have a 5 server cluster running 1 collection with 20 shards, replication factor of 2. Earlier this week I had to do a rolling restart across the cluster, this worked great and the cluster stayed up the whole time. The problem is that the last node I restarted is now the leader of 0 shards,
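To see how leadership is distributed after a rolling restart, one option is to count leaders per node from a parsed clusterstate.json. The Python sketch below assumes the 4.x layout (collection, then `shards`, then `replicas`, with `node_name` and a `leader` value of `"true"` on the leading replica); the sample data and the name `leaders_per_node` are made up for illustration.

```python
from collections import Counter


def leaders_per_node(clusterstate: dict, collection: str) -> Counter:
    """Count how many shards each node currently leads."""
    counts = Counter()
    for shard in clusterstate[collection]["shards"].values():
        for replica in shard["replicas"].values():
            # Adding a bool keeps nodes that lead nothing in the result as 0.
            counts[replica["node_name"]] += replica.get("leader") == "true"
    return counts


# Hypothetical two-shard, two-node cluster state.
sample = {"events": {"shards": {
    "shard1": {"replicas": {
        "core_node1": {"node_name": "a:8983_solr", "leader": "true"},
        "core_node2": {"node_name": "b:8983_solr"}}},
    "shard2": {"replicas": {
        "core_node3": {"node_name": "a:8983_solr", "leader": "true"},
        "core_node4": {"node_name": "b:8983_solr"}}}}}}

counts = leaders_per_node(sample, "events")
```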

Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-02 Thread Brett Hoerner
very busy, indexing 5k+ small documents per second, but the nodes were all fine until I had to restart them and they had to re-sync. Here is the log since reboot: https://gist.github.com/396af4b217ce8f536db6 Any ideas? On Sat, Feb 2, 2013 at 10:27 AM, Brett Hoerner wrote: > Hi, > > I have

Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-03 Thread Brett Hoerner
ores?action=unload&name=core1. This removes the core/shard from > bob, giving the other servers a chance to grab leader props. > > -Joey > > On Feb 2, 2013, at 11:27 AM, Brett Hoerner wrote: > > > Hi, > > > > I have a 5 server cluster running 1 collection with

Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, replication factor of 2) that I've been using for over a month now in production. Suddenly, the hourly cron I run that dispatches a delete by query completely halts all indexing. Select queries still run (and quickly), there is n

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
4.1, I'll induce it again and run jstack. On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller wrote: > Which version of Solr? > > Can you use jconsole, visualvm, or jstack to get some stack traces and see > where things are halting? > > - Mark > > On Mar 6, 2013, at 1

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
of Solr? > > Can you use jconsole, visualvm, or jstack to get some stack traces and see > where things are halting? > > - Mark > > On Mar 6, 2013, at 11:45 AM, Brett Hoerner wrote: > > > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, > > repli

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
ture that keeps events from happening all at > > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > > > > On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller > wrote: > > > >> Thanks Brett, good stuff (though not a good problem).

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
replica as well? (also when > it's locked up of course). > > - Mark > > On Mar 6, 2013, at 3:34 PM, Brett Hoerner wrote: > > > If there's anything I can try, let me know. Interestingly, I think I have > > noticed that if I stop my indexer, do my delete, and r

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
As a side note, do you think that was a poor idea? I figured it's better to spread the master "load" around? On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller wrote: > > On Mar 7, 2013, at 9:03 AM, Brett Hoerner wrote: > > > To be clear, neither is really "the

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
Thu, Mar 7, 2013 at 11:03 AM, Brett Hoerner wrote: > Here is the other server when it's locked: > https://gist.github.com/3529b7b6415756ead413 > > To be clear, neither is really "the replica", I have 32 shards and each > physical server is the leader for 16, and the rep