I'm doing HDFS input and output in my job, with the following:
hadoop jar /mnt/faas-solr.jar \
  -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
  --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver \
  --morphline-file /mnt/morphline-ignore.conf \
https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b
On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:
> Odd - might be helpful if you can share your solrconfig.xml being used.
>
> --
> Mark Miller
> about.me/markrmiller
>
> On April 17, 2014 at 12:18:37
Sorry to bump this, I have the same issue and was curious about the sanity
of trying to work around it.
* I have a constant stream of realtime documents I need to continually
index. Sometimes they even overwrite very old documents (by using the same
unique ID).
* I also have a *huge* backlog of do
I'm back to looking at the code but holy hell is debugging Hadoop hard. :)
On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner wrote:
> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b
>
>
> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:
>
>> Odd - might b
merge is complete. If writes are allowed, corruption may occur on the
merged index." Is that saying that Solr will block writes, or is that
saying the end user has to ensure no writes are happening against the
collection during a merge? That seems... risky?
On Tue, Apr 22, 2014 at 9:29 AM, Brett
If I run a query like this,
fq=text:lol
fq=created_at_tdid:[1400544000 TO 1400630400]
It takes about 6 seconds. Following queries take only 50ms or less, as
expected because my fqs are cached.
However, if I change the query to not cache my big range query:
fq=text:lol
fq={!cache=false}created_a
act of storing the work after
it's done (it has to be done in either case) is taking 4 whole seconds?
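For later readers, the shape of the two requests being compared, as a sketch; the host, collection, and `q` are illustrative, only the fq clauses come from the thread:

```python
from urllib.parse import urlencode

# Request 1: both filters go through the filterCache (the default).
cached = urlencode([
    ("q", "*:*"),
    ("fq", "text:lol"),
    ("fq", "created_at_tdid:[1400544000 TO 1400630400]"),  # cached after first hit
])

# Request 2: {!cache=false} asks Solr to skip the filterCache for the range clause.
uncached = urlencode([
    ("q", "*:*"),
    ("fq", "text:lol"),
    ("fq", "{!cache=false}created_at_tdid:[1400544000 TO 1400630400]"),
])

print("/solr/collection1/select?" + cached)
print("/solr/collection1/select?" + uncached)
```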
On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey wrote:
> On 6/3/2014 2:44 PM, Brett Hoerner wrote:
> > If I run a query like this,
> >
> > fq=text:lol
> > fq=created_a
, but that seems...
surprising to me.
On Tue, Jun 3, 2014 at 4:02 PM, Brett Hoerner
wrote:
> In this case, I have >400 million documents, so I understand it taking a
> while.
>
> That said, I'm still not sure I understand why it would take *more* time.
> In your exampl
Yonik, I'm familiar with your blog posts -- and thanks very much for them.
:) Though I'm not sure what you're trying to show me with the q=*:* part? I
was of course using q=*:* in my queries, but I assume you mean to leave off
the text:lol bit?
I've done some Cluster changes, so these are my basel
The following two queries are doing the same thing, one using a "normal" fq
range query and another using a parent query. The cache is warm (these are
both hits) but the "normal" ones takes ~6 to 7.5sec while the parent query
hack takes ~1.2sec.
Is this expected? Is there anything "wrong" with my
> Also if you tell the overall number of docs
> in the index, and cardinality of both filters, it might allow to guess
> something. Anyway, jvisualvm sampling can give an exact answer. Giving
> responses, it's enough to profile one of the slave nodes.
>
>
> On Wed, Jun 4, 2014
Can anyone explain the difference between these two queries?
text:(+"happy") AND -user:("123456789") = numFound 2912224
But
text:(+"happy") AND user:(-"123456789") = numFound 0
Now, you may just say "then just put - in front of your field, duh!" Well,
text:(+"happy") = numFound 2912224
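The difference comes down to where the negation sits: Lucene evaluates a purely negative sub-query against nothing, so it matches no documents unless a positive clause anchors it. A sketch of the three forms (counts are from the message above, not re-verified):

```
text:(+"happy") AND -user:("123456789")
    negation at top level, anchored by the positive text clause: matches

text:(+"happy") AND user:(-"123456789")
    pure-negative sub-query inside the field: matches nothing

text:(+"happy") AND user:(*:* -"123456789")
    *:* supplies a positive clause to subtract from: matches
```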
". For example:
>
> text:(+"happy") AND user:(*:* -"123456789")
>
> -- Jack Krupansky
>
> -Original Message- From: Brett Hoerner
> Sent: Tuesday, July 1, 2014 2:51 PM
> To: solr-user@lucene.apache.org
> Subject: Confusion about location of
Also, does anyone have the Solr or Lucene bug # for this?
On Tue, Jul 1, 2014 at 3:06 PM, Brett Hoerner
wrote:
> Interesting, is there a performance impact to sending the *:*?
>
>
> On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky
> wrote:
>
>> Yeah, there's a k
Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did
an upgrade to 4.6 and am having some issues.
First: this collection is, I guess, implicitly routed. I do this for every
document insert using SolrJ:
document.addField("_route_", shardId)
After upgrading the servers to
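Conceptually, the SolrJ call above just attaches a `_route_` pseudo-field to each document; the equivalent raw JSON update body might look like this (the id, shard name, and field values are hypothetical):

```python
import json

# Sketch of a raw JSON update carrying a _route_ field for an implicitly
# routed collection; all values here are made up for illustration.
doc = {
    "id": "event-1",
    "text": "hello",
    "_route_": "shard-2013-11-25",  # tells the implicit router which shard gets the doc
}
payload = json.dumps([doc])
print(payload)  # POST to /solr/<collection>/update with Content-Type: application/json
```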
Here's my clusterstate.json:
https://gist.github.com/bretthoerner/a8120a8d89c93f773d70
On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner wrote:
> Hi, I've been using a collection on Solr 4.5.X for a few weeks and just
> did an upgrade to 4.6 and am having some issues.
(is there
a tool for this? I've always done it manually), started the cluster up
again and it's all good now.
On Mon, Nov 25, 2013 at 10:38 AM, Brett Hoerner wrote:
> Here's my clusterstate.json:
>
> https://gist.github.com/bretthoerner/a8120a8d89c93f773d70
>
>
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
4.6.1 and indexing ceased (indexer returned "No live servers for shard" but
the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
fine for the query side, just not adding documents.
21:35:21.508 [qtp14184
On Fri, Feb 7, 2014 at 6:15 PM, Mark Miller wrote:
> You have to update the other nodes to 4.6.1 as well.
>
I'm not sure I follow, all of the Solr instances in the cluster are 4.6.1
to my knowledge?
Thanks,
Brett
> - Mark
>
> http://about.me/markrmiller
>
>
>
> On Feb 7, 2014, 7:01:24 PM, Brett Hoerner wrote:
> I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
> 4.6.1 and indexing ceased (indexer returned "No live servers for shard" but
> th
not 4.6.1. That code couldn’t have been 4.6.1 it seems.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 8, 2014, at 11:12 AM, Brett Hoerner wrote:
>
> > Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I
> > verified 4.6.1 i
Mark, you were correct. I realized I was still running a prerelease of
4.6.1 (by a handful of commits). Bounced them with proper 4.6.1 and we're
all good, sorry for the spam. :)
On Sat, Feb 8, 2014 at 10:29 AM, Brett Hoerner wrote:
> Oh, I was talking about my indexer. That stack is
I have a very weird problem that I'm going to try to describe here to see
if anyone has any "ah-ha" moments or clues. I haven't created a small
reproducible project for this but I guess I will have to try in the future
if I can't figure it out. (Or I'll need to bisect by running long Hadoop
jobs...
(StandardDirectoryReader.java:277)
at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
... 25 more
On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner
wrote:
> I have a very weird prob
To be clear, those exceptions are during the "main" mapred job that is
creating the many small indexes. If these errors above occur (they don't
fail the job), I am 99% sure that is when the MTree job later hangs.
On Tue, Sep 23, 2014 at 1:02 PM, Brett Hoerner
wrote:
> I
ing take a long time. I haven't tried to
see if the issue shows on smaller jobs yet (does 1 minute become 6
minutes?).
Brett
On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner
wrote:
> I have a very weird problem that I'm going to try to describe here to see
> if anyone has any "
I'm interested in using the new custom sharding features in the
collections API to search a rolling window of event data. I'd appreciate a
spot/sanity check of my plan/understanding.
Say I only care about the last 7 days of events and I have thousands per
second (billions per week).
Am I correct
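One way the rolling window could be organized, sketched under the assumption of one shard per day, named by date, created ahead of time and dropped once it ages out of the 7-day window (the naming scheme is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical shard-naming scheme for a 7-day rolling window of event data.
def shard_for(ts: datetime) -> str:
    return "events_" + ts.strftime("%Y%m%d")

now = datetime(2013, 10, 1, tzinfo=timezone.utc)

# The set of shards a query over "the last 7 days" would fan out to;
# each day, one new shard is created and the oldest is deleted.
window = [shard_for(now - timedelta(days=d)) for d in range(7)]
print(window)
```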
It seems that changes in 4.5 collection configuration now require users to
set a maxShardsPerNode (or it defaults to 1).
Maybe this was the case before, but with the new CREATESHARD API it seems
very restrictive. I've just created a very simple test collection on 3
machines where I set maxShards
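For reference, the kind of Collections API calls in play here; the collection name, shard names, and counts are illustrative:

```
# create an implicitly routed collection (CREATESHARD requires the implicit router)
/admin/collections?action=CREATE&name=test&router.name=implicit&shards=s1,s2,s3&maxShardsPerNode=4

# later, grow the collection with an extra shard
/admin/collections?action=CREATESHARD&collection=test&shard=s4
```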
would create 1 new shard with 1 replica on any
server in 4.5?
Thanks!
On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner wrote:
> It seems that changes in 4.5 collection configuration now require users to
> set a maxShardsPerNode (or it defaults to 1).
>
> Maybe this was the case before
which will create
> only one replica even if maxShardsPerNode=1000 at collection level.
>
> I'll open an issue.
>
>
> On Wed, Oct 2, 2013 at 7:25 AM, Brett Hoerner >wrote:
>
> > Related, 1 more try:
> >
> > Created collection starting with 4 shards on 1 box
I'm curious what the later "shard-local" bits do, if anything?
I have a very large cluster (256 shards) and I'm sending most of my data
with a single "composite", e.g. 1234!, but I'm noticing the data
is being split among many of the shards.
My guess right now is that since I'm only using the def
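A sketch of why a single prefix can straddle two shards: the composite router takes the top 16 bits of the hash from the prefix and the bottom 16 from the rest of the id, so each prefix owns a 2**16-wide slice of the 32-bit hash ring, and that slice can cross a shard boundary when the ring is cut into many ranges. The code below uses crc32 as a stand-in hash purely for illustration; Solr actually uses MurmurHash3, but the bit-combining scheme is the point:

```python
from zlib import crc32

# crc32 stands in for Solr's MurmurHash3; only the bit-combining matters here.
def composite_hash(doc_id: str) -> int:
    prefix, rest = doc_id.split("!", 1)
    top = crc32(prefix.encode()) & 0xFFFF0000     # prefix fixes the high 16 bits
    bottom = crc32(rest.encode()) & 0x0000FFFF    # rest of the id varies the low 16 bits
    return top | bottom

# Every id sharing the "1234!" prefix lands in one 2**16-wide slice of the ring.
# With 256 shards each hash range is 2**24 wide, so the slice usually fits inside
# one shard's range, but it can straddle a boundary -- then two shards get data.
hashes = {composite_hash(f"1234!doc{i}") for i in range(1000)}
print(max(hashes) - min(hashes) <= 0xFFFF)
```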
zeable amount of data (68M and 128M) and the rest are very
small as expected.
The fact that two are receiving so much makes me think my data is being
split into two shards. I'm trying to debug more now.
On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley wrote:
> On Tue, Oct 8, 2013 at 6:
y hour and it's
been running for 2). There *is* a little old data in my stream, but not
that much (like <5%). What's confusing to me is that 5 of them are rather
large, when I'd expect 2 of them to be.
On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley wrote:
> On Tue, Oct 8, 2013 at
e, Oct 8, 2013 at 7:31 PM, Brett Hoerner
> wrote:
> > This is my clusterstate.json:
> > https://gist.github.com/bretthoerner/0098f741f48f9bb51433
> >
> > And these are my core sizes (note large ones are sorted to the end):
> > https://gist.github.com/bretthoerner/f5b5e0
Ignore me I forgot about shards= from the wiki.
On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner wrote:
> I have a silly question, how do I query a single shard in SolrCloud? When
> I hit solr/foo_shard1_replica1/select it always seems to do a full cluster
> query.
>
> I can&
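For later readers, the two ways to restrict a query to a single shard; the core and host names here are illustrative:

```
# name the shard(s) to query explicitly
/solr/foo_shard1_replica1/select?q=*:*&shards=localhost:8983/solr/foo_shard1_replica1

# or query the core you hit directly, with no distributed fan-out
/solr/foo_shard1_replica1/select?q=*:*&distrib=false
```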
Thanks folks,
As an update for future readers --- the problem was on my side (my logic in
picking the _route_ was flawed) as expected. :)
On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley wrote:
> On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey wrote:
> > There is also the "distrib=false" parameter t
An example:
https://gist.github.com/bretthoerner/2ffc362450bcd4c2487a
I'll note that all shards and replicas show as "Up" (green) in the Admin UI.
Does anyone know how this could happen? I can repeat this over and over
with the same terms. It was my understanding that something like a facet
query
Hi,
I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection,
which I called "default" and haven't used since. I'm using an external ZK
ensemble that was completely empty before I started this cloud.
Once I had all 4 nodes in the cloud I used the collection API to create the
real
CREATE or DELETE actually did anything, though. (Again, HTTP
200 OK)
Still stuck here, any ideas?
Brett
On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner wrote:
> Hi,
>
> I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection,
> which I called "default" and ha
stuff off to the overseer, you will always
> get back a 200 - there is a JIRA issue that addresses this though
> (collection API responses) and I hope to get it committed soon.
>
> - Mark
>
> On Dec 7, 2012, at 7:26 AM, Brett Hoerner wrote:
>
> > For what it's wort
I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm
trying out 4.1 RC1 right now but the update URL I used to use is returning
HTTP 404.
For example, I would post my document updates to,
http://localhost:8983/solr/collection1
But that is 404ing now (collection1 exists according
on it reports 404
sometimes. What's odd is that I can use curl to post a JSON document to the
same URL and it will return 200.
When I log every request I make from my indexer process (using SolrJ) it's
about 50/50 between 404 and 200...
On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner wrot
So the ticket I created wasn't related, there is a working patch for that
now but my original issue remains, I get 404 when trying to post updates to
a URL that worked fine in Solr 4.0.
On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote:
> I'm actually wondering if this other is
Sorry, I take it back. It looks like fixing
https://issues.apache.org/jira/browse/SOLR-4321 fixed my issue after all.
On Sun, Jan 20, 2013 at 2:21 PM, Brett Hoerner wrote:
> So the ticket I created wasn't related, there is a working patch for that
> now but my original issue remains
I have a collection in Solr 4.1 RC1 and doing a simple query like
text:"puppy dog" is causing an exception. Oddly enough, I CAN query for
text:puppy or text:"puppy", but adding the space breaks everything.
Schema and config: https://gist.github.com/f49da15e39e5609b75b1
This happens whether I quer
set to use Lucene version 4.0
> index format but you mention you are using 4.1
>
> LUCENE_40
>
>
>
> On Mon, Jan 21, 2013 at 4:26 PM, Brett Hoerner >wrote:
>
> > I have a collection in Solr 4.1 RC1 and doing a simple query like
> > text:"pup
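If the luceneMatchVersion mismatch suggested above is the culprit, the fix would be a one-line change in solrconfig.xml (followed by a reindex, since analysis behavior can change between versions):

```
<luceneMatchVersion>LUCENE_41</luceneMatchVersion>
```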
Hi,
I have a 5 server cluster running 1 collection with 20 shards, replication
factor of 2.
Earlier this week I had to do a rolling restart across the cluster, this
worked great and the cluster stayed up the whole time. The problem is that
the last node I restarted is now the leader of 0 shards,
very busy, indexing 5k+ small documents
per second, but the nodes were all fine until I had to restart them and
they had to re-sync.
Here is the log since reboot: https://gist.github.com/396af4b217ce8f536db6
Any ideas?
On Sat, Feb 2, 2013 at 10:27 AM, Brett Hoerner wrote:
> Hi,
>
> I have
ores?action=unload&name=core1. This removes the core/shard from
> bob, giving the other servers a chance to grab leader props.
>
> -Joey
>
> On Feb 2, 2013, at 11:27 AM, Brett Hoerner wrote:
>
> > Hi,
> >
> > I have a 5 server cluster running 1 collection with
I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
replication factor of 2) that I've been using for over a month now in
production.
Suddenly, the hourly cron I run that dispatches a delete by query
completely halts all indexing. Select queries still run (and quickly),
there is n
4.1, I'll induce it again and run jstack.
On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller wrote:
> Which version of Solr?
>
> Can you use jconsole, visualvm, or jstack to get some stack traces and see
> where things are halting?
>
> - Mark
>
> On Mar 6, 2013, at 1
of Solr?
>
> Can you use jconsole, visualvm, or jstack to get some stack traces and see
> where things are halting?
>
> - Mark
>
> On Mar 6, 2013, at 11:45 AM, Brett Hoerner wrote:
>
> > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
> > repli
ture that keeps events from happening all at
> > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >
> >
> > On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller
> wrote:
> >
> >> Thanks Brett, good stuff (though not a good problem).
replica as well? (also when
> it's locked up of course).
>
> - Mark
>
> On Mar 6, 2013, at 3:34 PM, Brett Hoerner wrote:
>
> > If there's anything I can try, let me know. Interestingly, I think I have
> > noticed that if I stop my indexer, do my delete, and r
As a side note, do you think that was a poor idea? I figured it's better to
spread the master "load" around?
On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller wrote:
>
> On Mar 7, 2013, at 9:03 AM, Brett Hoerner wrote:
>
> > To be clear, neither is really "the
Thu, Mar 7, 2013 at 11:03 AM, Brett Hoerner wrote:
> Here is the other server when it's locked:
> https://gist.github.com/3529b7b6415756ead413
>
> To be clear, neither is really "the replica", I have 32 shards and each
> physical server is the leader for 16, and the rep