Confusion when using go-live and MapReduceIndexerTool

2014-04-17 Thread Brett Hoerner
I'm doing HDFS input and output in my job, with the following:

hadoop jar /mnt/faas-solr.jar \
   -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
   --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver
\
   --morphline-file /mnt/morphline-ignore.conf \
   --zk-host $ZKHOST \
   --output-dir hdfs://$MASTERIP:9000/output/ \
   --collection $COLLECTION \
   --go-live \
   --verbose \
   hdfs://$MASTERIP:9000/input/

Index creation works,

$ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0
drwxr-xr-x   - hadoop supergroup  0 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data
drwxr-xr-x   - hadoop supergroup  0 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index
-rwxr-xr-x   1 hadoop supergroup 61 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0.fdt
-rwxr-xr-x   1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0.fdx
-rwxr-xr-x   1 hadoop supergroup   1681 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0.fnm
-rwxr-xr-x   1 hadoop supergroup    396 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0.si
-rwxr-xr-x   1 hadoop supergroup 67 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc
-rwxr-xr-x   1 hadoop supergroup 37 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos
-rwxr-xr-x   1 hadoop supergroup    508 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim
-rwxr-xr-x   1 hadoop supergroup    305 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip
-rwxr-xr-x   1 hadoop supergroup    120 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd
-rwxr-xr-x   1 hadoop supergroup    351 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm
-rwxr-xr-x   1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/segments_1
-rwxr-xr-x   1 hadoop supergroup    110 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/index/segments_2
drwxr-xr-x   - hadoop supergroup  0 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/tlog
-rw-r--r--   1 hadoop supergroup    333 2014-04-17 16:00 hdfs://
10.98.33.114:9000/output/results/part-0/data/tlog/tlog.000

But the go-live step fails; it's trying to use the HDFS path as the remote
index path?

14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into
Solr cluster...
14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://
10.98.33.114:9000/output/results/part-0 into
http://discover8-test-1d.i.massrel.com:8983/solr
14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
directory '/mnt/solr_8983/home/hdfs:/
10.98.33.114:9000/output/results/part-0/data/index' does not exist
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
at
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
at
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at
org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
directory '/mnt/solr_8983/home/hdfs:/
10.98.33.114:9000/output/results/part-0/data/index' does not exist
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$Runn
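
For reference, the stack trace shows GoLive driving a CoreAdminRequest per output shard; it is effectively a CoreAdmin MERGEINDEXES call whose indexDir is the HDFS output path. A minimal sketch of that request, with a made-up host and core name:

  curl "http://SOLR_HOST:8983/solr/admin/cores?action=MERGEINDEXES&core=COLLECTION_shard1_replica1&indexDir=hdfs://10.98.33.114:9000/output/results/part-0/data/index"

A target core that is not using an HDFS directory factory resolves that indexDir against its local filesystem, which is consistent with the '/mnt/solr_8983/home/hdfs:/...' error above.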

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-17 Thread Brett Hoerner
https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b


On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller  wrote:

> Odd - might be helpful if you can share your solrconfig.xml being used.
>
> --
> Mark Miller
> about.me/markrmiller
>
> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com)
> wrote:
>
> I'm doing HDFS input and output in my job, with the following:
>
> hadoop jar /mnt/faas-solr.jar \
> -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
> --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver
> \
> --morphline-file /mnt/morphline-ignore.conf \
> --zk-host $ZKHOST \
> --output-dir hdfs://$MASTERIP:9000/output/ \
> --collection $COLLECTION \
> --go-live \
> --verbose \
> hdfs://$MASTERIP:9000/input/
>
> Index creation works,
>
> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0
> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data
> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index
> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdt
> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdx
> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0.fnm
> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0.si
> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc
> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos
> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim
> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip
> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd
> -rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm
> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/segments_1
> -rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/index/segments_2
> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
> 10.98.33.114:9000/output/results/part-0/data/tlog
> -rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://
>
> 10.98.33.114:9000/output/results/part-0/data/tlog/tlog.000
>
> But the go-live step fails, it's trying to use the HDFS path as the remote
> index path?
>
> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into
> Solr cluster...
> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://
> 10.98.33.114:9000/output/results/part-0 into
> http://discover8-test-1d.i.massrel.com:8983/solr
> 14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> directory '/mnt/solr_8983/home/hdfs:/
> 10.98.33.114:9000/output/results/part-0/data/index' does not exist
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
> at
>
> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
> at
>
> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
>
> org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> directory '/mnt/solr_8983/home/hdfs:/
> 10.98.33.114:9000/output/results/part-0/data/index' does not exist
> at
>
>

Re: index merge question

2014-04-17 Thread Brett Hoerner
Sorry to bump this, I have the same issue and was curious about the sanity
of trying to work around it.

* I have a constant stream of realtime documents I need to continually
index. Sometimes they even overwrite very old documents (by using the same
unique ID).
* I also have a *huge* backlog of documents I'd like to get into a
SolrCloud cluster via Hadoop.

I understand that the MERGEINDEXES operation expects me to have unique
documents, but is it reasonable at all for me to be able to change that? In
a plain Solr instance I can add doc1, then add doc1 again with new fields
and the new update "wins" and I assume during segment merges the old update
is eventually removed. Does that mean it's possible for me to somehow
override a merge policy (or something like that?) to effectively do exactly
what my Hadoop conflict-resolver does? I already have logic there that
knows how to (1) decide which of 2 duplicate documents to keep and (2)
respect and "keep" deletes over anything else.
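
As a quick illustration of that plain-Solr behavior, a sketch against a toy core (URL and field names are placeholders, assuming the schema's uniqueKey is "id"):

  # add doc1 twice with the same uniqueKey; the second add wins
  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' -d '[{"id":"doc1","text":"old version"}]'
  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' -d '[{"id":"doc1","text":"new version"}]'
  # q=id:doc1 now returns a single document (the newer one); the older one only
  # disappears physically later, during segment merging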

I'd love some pointers at what Solr/Lucene classes to look at if I wanted
to try my hand at this. I'm down in Lucene SegmentMerger right now but it
seems too low level to understand whatever Solr "knows" about enforcing a
single unique ID at merge (and search...? or update...?) time.

Thanks!



On Tue, Jun 11, 2013 at 11:10 AM, Mark Miller  wrote:

> Right - but that sounds a little different than what we were talking about.
>
> You had brought up the core admin merge cmd that let's you merge an index
> into a running Solr cluster.
>
> We are calling that the golive option in the map reduce indexing code. It
> has the limitations we have discussed.
>
> However, if you are only using map reduce to build indexes, there are
> facilities for dealing with duplicate id's - as you see in the
> documentation. The merges involved in that are different though - these are
> merges that happen as the final index is being constructed by the map
> reduce job. The final step is the golive step, where the indexes will be
> deployed to the running Solr cluster - this is what uses the core admin
> merge command, and if you are doing updates or adds outside of map reduce,
> you will face the issues we have discussed.
>
>
> - Mark
>
> On Jun 11, 2013, at 11:57 AM, James Thomas  wrote:
>
> > FWIW, the Solr included with Cloudera Search, by default, "ignores all
> but the most recent document version" during merges.
> > The conflict resolution is configurable however.  See the documentation
> for details.
> >
> http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html
> > -- see the user guide pdf, " update-conflict-resolver" parameter
> >
> > James
> >
> > -Original Message-
> > From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of
> Anirudha Jadhav
> > Sent: Tuesday, June 11, 2013 10:47 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: index merge question
> >
> > From my experience the lucene mergeTool and the one invoked by coreAdmin
> is a pure lucene implementation and does not understand the concepts of a
> unique Key(solr land concept)
> >
> >  http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note
> at the end
> >
> > we do frequent index merges for which we externally run map/reduce (
> java code using lucene api's) jobs to merge & validate merged indices with
> sources.
> > -Ani
> >
> > On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller 
> wrote:
> >> Yeah, you have to carefully manage things if you are map/reduce
> building indexes *and* updating documents in other ways.
> >>
> >> If your 'source' data for MR index building is the 'truth', you also
> have the option of not doing incremental index merging, and you could
> simply rebuild the whole thing every time - of course, depending your
> cluster size, that could be quite expensive.
> >
> >>
> >> - Mark
> >>
> >> On Jun 10, 2013, at 8:36 PM, Jamie Johnson  wrote:
> >>
> >>> Thanks Mark.  My question is stemming from the new cloudera search
> stuff.
> >>> My concern its that if while rebuilding the index someone updates a
> >>> doc that update could be lost from a solr perspective.  I guess what
> >>> would need to happen to ensure the correct information was indexed
> >>> would be to record the start time and reindex the information that
> changed since then?
> >>> On Jun 8, 2013 2:37 PM, "Mark Miller"  wrote:
> >>>
> 
>  On Jun 8, 2013, at 12:52 PM, Jamie Johnson  wrote:
> 
> > When merging through the core admin (
> > http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy
> > for conflicts during the merge?  So for instance if I am merging
> > core 1 and core 2 into core 0 (first example), what happens if core
> > 1 and core 2
>  both
> > have a document with the same key, say core 1 has a newer version
> > of core 2?  Does the merge fail, does the newer document remain?
> 
>  You end up with both documents, both with that ID

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-22 Thread Brett Hoerner
Anyone have any thoughts on this?

In general, am I expected to be able to go-live from an unrelated cluster
of Hadoop machines to a SolrCloud that isn't running off of HDFS?

input: HDFS
output: HDFS
go-live cluster: SolrCloud cluster on different machines running on plain
MMapDirectory

I'm back to looking at the code but holy hell is debugging Hadoop hard. :)


On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner wrote:

> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b
>
>
> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:
>
>> Odd - might be helpful if you can share your solrconfig.xml being used.
>>
>> --
>> Mark Miller
>> about.me/markrmiller
>>
>> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com)
>> wrote:
>>
>> I'm doing HDFS input and output in my job, with the following:
>>
>> hadoop jar /mnt/faas-solr.jar \
>> -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
>> --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver
>> \
>> --morphline-file /mnt/morphline-ignore.conf \
>> --zk-host $ZKHOST \
>> --output-dir hdfs://$MASTERIP:9000/output/ \
>> --collection $COLLECTION \
>> --go-live \
>> --verbose \
>> hdfs://$MASTERIP:9000/input/
>>
>> Index creation works,
>>
>> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0
>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data
>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index
>> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdt
>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdx
>> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fnm
>> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0.si
>> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc
>> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos
>> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim
>> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip
>> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd
>> -rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm
>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/segments_1
>> -rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/index/segments_2
>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
>> 10.98.33.114:9000/output/results/part-0/data/tlog
>> -rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://
>>
>> 10.98.33.114:9000/output/results/part-0/data/tlog/tlog.000
>>
>> But the go-live step fails, it's trying to use the HDFS path as the remote
>> index path?
>>
>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into
>> Solr cluster...
>> 14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://
>> 10.98.33.114:9000/output/results/part-0 into
>> http://discover8-test-1d.i.massrel.com:8983/solr
>> 14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command
>> java.util.concurrent.ExecutionException:
>> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> directory '/mnt/solr_8983/home/hdfs:/
>> 10.98.33.114:9000/output/results/part-0/data/index' does not exist
>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>> at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
>> at
>>
>> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
>> at
>>
>> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-22 Thread Brett Hoerner
I think I'm just misunderstanding the use of go-live. From mergeindexes
docs: "The indexes must exist on the disk of the Solr host, which may make
using this in a distributed environment cumbersome."

I'm guessing I'll have to write some sort of tool that pulls each completed
index out of HDFS and onto the respective SolrCloud machines and manually
do some kind of merge? I don't want to (can't) be running my Hadoop jobs on
the same nodes that SolrCloud is running on...
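
A rough sketch of that kind of tool, assuming you can run commands on each SolrCloud node and using placeholder host/core names:

  # on the node hosting shard1: copy the built index out of HDFS...
  hadoop fs -copyToLocal \
    hdfs://10.98.33.114:9000/output/results/part-0/data/index /tmp/shard1-index
  # ...merge it into the local core, then commit
  curl "http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=COLLECTION_shard1_replica1&indexDir=/tmp/shard1-index"
  curl "http://localhost:8983/solr/COLLECTION_shard1_replica1/update?commit=true"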

Also confusing to me: "no writes should be allowed on either core until the
merge is complete. If writes are allowed, corruption may occur on the
merged index." Is that saying that Solr will block writes, or is that
saying the end user has to ensure no writes are happening against the
collection during a merge? That seems... risky?


On Tue, Apr 22, 2014 at 9:29 AM, Brett Hoerner wrote:

> Anyone have any thoughts on this?
>
> In general, am I expected to be able to go-live from an unrelated cluster
> of Hadoop machines to a SolrCloud that isn't running off of HDFS?
>
> input: HDFS
> output: HDFS
> go-live cluster: SolrCloud cluster on different machines running on plain
> MMapDirectory
>
> I'm back to looking at the code but holy hell is debugging Hadoop hard. :)
>
>
> On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner wrote:
>
>> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b
>>
>>
>> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:
>>
>>> Odd - might be helpful if you can share your solrconfig.xml being used.
>>>
>>> --
>>> Mark Miller
>>> about.me/markrmiller
>>>
>>> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com)
>>> wrote:
>>>
>>> I'm doing HDFS input and output in my job, with the following:
>>>
>>> hadoop jar /mnt/faas-solr.jar \
>>> -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
>>> --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver
>>> \
>>> --morphline-file /mnt/morphline-ignore.conf \
>>> --zk-host $ZKHOST \
>>> --output-dir hdfs://$MASTERIP:9000/output/ \
>>> --collection $COLLECTION \
>>> --go-live \
>>> --verbose \
>>> hdfs://$MASTERIP:9000/input/
>>>
>>> Index creation works,
>>>
>>> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index
>>> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdt
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdx
>>> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fnm
>>> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.si
>>> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc
>>> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos
>>> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim
>>> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip
>>> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd
>>> -rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/segments_1
>>> -rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/index/segments_2
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://
>>> 10.98.33.114:9000/output/results/part-0/data/tlog
>>> -rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://
>>>
>>> 10.98.33.114:9000/

Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
If I run a query like this,

fq=text:lol
fq=created_at_tdid:[1400544000 TO 1400630400]

It takes about 6 seconds. Following queries take only 50ms or less, as
expected because my fqs are cached.

However, if I change the query to not cache my big range query:

fq=text:lol
fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]

It takes 2 seconds every time, which is a much better experience for my
"first query for that range."

What's odd to me is that I would expect both of these (first) queries to
have to do the same amount of work, except the first one stuffs the
resulting bitset into a map at the end... which seems to have a 4 second
overhead?

Here's my filterCache from solrconfig:

  <filterCache size="64"
               initialSize="64"
               autowarmCount="32"/>

Thanks.


Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
In this case, I have >400 million documents, so I understand it taking a
while.

That said, I'm still not sure I understand why it would take *more* time.
In your example above, wouldn't it have to create an 11.92MB bitset even if
I *don't* cache the bitset? It seems the mere act of storing the work after
it's done (it has to be done in either case) is taking 4 whole seconds?
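
Scaling that bitset arithmetic to this index (assuming the ~400 million documents sit in a single core; the filterCache is per core, so per-shard cores would be proportionally smaller):

  400,000,000 docs / 8 bits per doc = 50,000,000 bytes, roughly 48MB per cached fq entry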



On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey  wrote:

> On 6/3/2014 2:44 PM, Brett Hoerner wrote:
> > If I run a query like this,
> >
> > fq=text:lol
> > fq=created_at_tdid:[1400544000 TO 1400630400]
> >
> > It takes about 6 seconds. Following queries take only 50ms or less, as
> > expected because my fqs are cached.
> >
> > However, if I change the query to not cache my big range query:
> >
> > fq=text:lol
> > fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
> >
> > It takes 2 seconds every time, which is a much better experience for my
> > "first query for that range."
> >
> > What's odd to me is that I would expect both of these (first) queries to
> > have to do the same amount of work, expect the first one stuffs the
> > resulting bitset into a map at the end... which seems to have a 4 second
> > overhead?
> >
> > Here's my filterCache from solrconfig:
> >
> >  >  size="64"
> >  initialSize="64"
> >  autowarmCount="32"/>
>
> I think that probably depends on how many documents you have in the
> single index/shard.  If you have one hundred million documents stored in
> the Lucene index, then each filter entry is 1250 bytes (11.92MB) in
> size - it is a bitset representing every document and whether it is
> included or excluded.  That data would need to be gathered and copied
> into the cache.  I suspect that it's the gathering that takes the most
> time ... several megabytes of memory is not very much for a modern
> processor to copy.
>
> As for how long this takes, I actually have no idea.  You have two
> filters here, so it would need to do everything twice.
>
> Thanks,
> Shawn
>
>


Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
This is seemingly where it checks whether to use the cache or not; the extra
work is really just a get (miss) and a put:


https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1216

I suppose it's possible the put is taking 4 seconds, but that seems...
surprising to me.


On Tue, Jun 3, 2014 at 4:02 PM, Brett Hoerner 
wrote:

> In this case, I have >400 million documents, so I understand it taking a
> while.
>
> That said, I'm still not sure I understand why it would take *more* time.
> In your example above, wouldn't it have to create an 11.92MB bitset even if
> I *don't* cache the bitset? It seems the mere act of storing the work after
> it's done (it has to be done in either case) is taking 4 whole seconds?
>
>
>
> On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey  wrote:
>
>> On 6/3/2014 2:44 PM, Brett Hoerner wrote:
>> > If I run a query like this,
>> >
>> > fq=text:lol
>> > fq=created_at_tdid:[1400544000 TO 1400630400]
>> >
>> > It takes about 6 seconds. Following queries take only 50ms or less, as
>> > expected because my fqs are cached.
>> >
>> > However, if I change the query to not cache my big range query:
>> >
>> > fq=text:lol
>> > fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
>> >
>> > It takes 2 seconds every time, which is a much better experience for my
>> > "first query for that range."
>> >
>> > What's odd to me is that I would expect both of these (first) queries to
>> > have to do the same amount of work, expect the first one stuffs the
>> > resulting bitset into a map at the end... which seems to have a 4 second
>> > overhead?
>> >
>> > Here's my filterCache from solrconfig:
>> >
>> > > >  size="64"
>> >  initialSize="64"
>> >  autowarmCount="32"/>
>>
>> I think that probably depends on how many documents you have in the
>> single index/shard.  If you have one hundred million documents stored in
>> the Lucene index, then each filter entry is 12500000 bytes (11.92MB) in
>> size - it is a bitset representing every document and whether it is
>> included or excluded.  That data would need to be gathered and copied
>> into the cache.  I suspect that it's the gathering that takes the most
>> time ... several megabytes of memory is not very much for a modern
>> processor to copy.
>>
>> As for how long this takes, I actually have no idea.  You have two
>> filters here, so it would need to do everything twice.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)

2014-06-03 Thread Brett Hoerner
Yonik, I'm familiar with your blog posts -- and thanks very much for them.
:) Though I'm not sure what you're trying to show me with the q=*:* part? I
was of course using q=*:* in my queries, but I assume you mean to leave off
the text:lol bit?

I've done some Cluster changes, so these are my baselines:

q=*:*
fq=created_at_tdid:[1392768004 TO 1393944400] (uncached at this point)
~7.5 seconds

q=*:*
fq={!cache=false}created_at_tdid:[1392768005 TO 1393944400]
~7.5 seconds (I guess this is what you were trying to show me?)

The thing is, my queries are always more "specific" than that, so given a
string:

q=*:*
fq=text:basketball
fq={!cache=false}created_at_tdid:[1392768007 TO 1393944400]
~5.2 seconds

q=*:*
fq=text:basketball
fq={!cache=false}created_at_tdid:[1392768005 TO 1393944400]
~1.6 seconds

Is there no hope for my first-time fq searches being as fast as non-cached
fqs? It's a shame to have to choose either (1) super fast queries once cached
XOR (2) more responsive first-time queries (by a large margin).

Thanks!



On Tue, Jun 3, 2014 at 4:20 PM, Yonik Seeley  wrote:

> On Tue, Jun 3, 2014 at 5:19 PM, Yonik Seeley 
> wrote:
> > So try:
> >   q=*:*
> >   fq=created_at_tdid:[1400544000 TO 1400630400]
>
> vs
>
> So try:
>   q=*:*
>   fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
>
>
> -Yonik
> http://heliosearch.org - facet functions, subfacets, off-heap
> filters&fieldcache
>


"Fake" cached join query much faster than cached fq?

2014-06-04 Thread Brett Hoerner
The following two queries are doing the same thing, one using a "normal" fq
range query and another using a parent query. The cache is warm (these are
both hits) but the "normal" one takes ~6 to 7.5sec while the parent query
hack takes ~1.2sec.

Is this expected? Is there anything "wrong" with my "normal fq" query? My
plan is to increase the size of my perSegFilter cache so I can use the hack
for faster queries... any thoughts here?

  {
    "responseHeader": {
      "status": 0,
      "QTime": 7657,
      "params": {
        "q": "*:*",
        "facet.field": "terms_smnd",
        "debug": "true",
        "indent": "true",
        "fq": ["created_at_tdid:[1392768001 TO 1393954400]", "text:coffee"],
        "rows": "0",
        "wt": "json",
        "facet": "true",
        "_": "1401906435914"
      }
    },
    "response": { "numFound": 2432754, "start": 0, "maxScore": 1, "docs": [] }
  }

Full response example:
https://gist.githubusercontent.com/bretthoerner/60418f08a88093c30220/raw/0a61f013f763e68985c15c5ed6cad6fa253182b9/gistfile1.txt

  {
    "responseHeader": {
      "status": 0,
      "QTime": 1210,
      "params": {
        "q": "*:*",
        "facet.field": "terms_smnd",
        "debug": "true",
        "indent": "true",
        "fq": ["{!cache=false}{!parent which='created_at_tdid:[1392768001 TO 1393954400]'}", "text:coffee"],
        "rows": "0",
        "wt": "json",
        "facet": "true",
        "_": "1401906444521"
      }
    },
    "response": { "numFound": 2432754, "start": 0, "maxScore": 1, "docs": [] }
  }

Full response example:
https://gist.githubusercontent.com/bretthoerner/9d82aa8fe59ffc7ff6ab/raw/560a395a0933870a5d2ac736b58805d8fab7f758/gistfile1.txt

Full response example:
https://gist.githubusercontent.com/bretthoerner/9d82aa8fe59ffc7ff6ab/raw/560a395a0933870a5d2ac736b58805d8fab7f758/gistfile1.txt


Re: "Fake" cached join query much faster than cached fq?

2014-06-05 Thread Brett Hoerner
Thanks Mikhail, I'll try to profile it soon.

As for cardinality, on a single core:

created_at_tdid:[1392768001 TO 1393954400] = 241657215
text:coffee = 117593

Oddly enough, I just tried the query with &distrib=false and both return in
about 50ms... hmm.
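
For reference, that kind of single-core sanity check looks something like this (core name is a placeholder); distrib=false skips the distributed fan-out and queries only the core you hit:

  curl -g "http://localhost:8983/solr/COLLECTION_shard1_replica1/select?q=*:*&fq=text:coffee&fq=created_at_tdid:[1392768001+TO+1393954400]&rows=0&distrib=false"

(-g turns off curl's URL globbing so the [ ] in the range query are sent literally.)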




On Thu, Jun 5, 2014 at 5:09 AM, Mikhail Khludnev  wrote:

> Brett,
>
> It's really interesting observation. I can only speculate. It's worth to
> check cache hit stats  and cache content via
> http://wiki.apache.org/solr/SolrCaching#showItems (the key question what
> are cached doc sets classes). Also if you tell the overall number of docs
> in the index, and cardinality of both filters, it might allow to guess
> something. Anyway, jvisualvm sampling can give an exact answer. Giving
> responses, it's enough to profile one of the slave nodes.
>
>
> On Wed, Jun 4, 2014 at 10:32 PM, Brett Hoerner 
> wrote:
>
> > The following two queries are doing the same thing, one using a "normal"
> fq
> > range query and another using a parent query. The cache is warm (these
> are
> > both hits) but the "normal" ones takes ~6 to 7.5sec while the parent
> query
> > hack takes ~1.2sec.
> >
> > Is this expected? Is there anything "wrong" with my "normal fq" query? My
> > plan is to increase the size of my perSegFilter cache so I can use the
> hack
> > for faster queries... any thoughts here?
> >
> > "responseHeader": { "status": 0, "QTime": 7657, "params": { "q": "*:*", "
> > facet.field": "terms_smnd", "debug": "true", "indent": "true", "fq": [
> > "created_at_tdid:[1392768001
> > TO 1393954400]", "text:coffee" ], "rows": "0", "wt": "json", "facet":
> > "true",
> > "_": "1401906435914" } }, "response": { "numFound": 2432754, "start": 0,
> "
> > maxScore": 1, "docs": [] }
> >
> > Full response example:
> >
> >
> https://gist.githubusercontent.com/bretthoerner/60418f08a88093c30220/raw/0a61f013f763e68985c15c5ed6cad6fa253182b9/gistfile1.txt
> >
> >  "responseHeader": { "status": 0, "QTime": 1210, "params": { "q": "*:*",
> "
> > facet.field": "terms_smnd", "debug": "true", "indent": "true", "fq": [
> > "{!cache=false}{!parent
> > which='created_at_tdid:[1392768001 TO 1393954400]'}", "text:coffee" ],
> > "rows":
> > "0", "wt": "json", "facet": "true", "_": "1401906444521" } },
> "response": {
> > "numFound": 2432754, "start": 0, "maxScore": 1, "docs": [] }
> >
> > Full response example:
> >
> >
> https://gist.githubusercontent.com/bretthoerner/9d82aa8fe59ffc7ff6ab/raw/560a395a0933870a5d2ac736b58805d8fab7f758/gistfile1.txt
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  
>


Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Can anyone explain the difference between these two queries?

  text:(+"happy") AND -user:("123456789") = numFound 2912224

But

  text:(+"happy") AND user:(-"123456789") = numFound 0

Now, you may just say "then just put - in front of your field, duh!" Well,

  text:(+"happy") = numFound 2912224
  user:(-"123456789") = numFound 465998192

(FWIW there is no user named 123456789 in my index)

As you can see, the queries work alone, but when combined with an AND I
always get 0 results. If I move the - before the field in my query, it
works. What am I missing here?

Thanks.


Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Interesting, is there a performance impact to sending the *:*?


On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky 
wrote:

> Yeah, there's a known bug that a negative-only query within parentheses
> doesn't match properly - you need to add a non-negative term, such as
> "*:*". For example:
>
>  text:(+"happy") AND user:(*:* -"123456789")
>
> -- Jack Krupansky
>
> -Original Message- From: Brett Hoerner
> Sent: Tuesday, July 1, 2014 2:51 PM
> To: solr-user@lucene.apache.org
> Subject: Confusion about location of + and - ?
>
>
> Can anyone explain the difference between these two queries?
>
>  text:(+"happy") AND -user:("123456789") = numFound 2912224
>
> But
>
>  text:(+"happy") AND user:(-"123456789") = numFound 0
>
> Now, you may just say "then just put - infront of your field, duh!" Well,
>
>  text:(+"happy") = numFound 2912224
>  user:(-"123456789") = numFound 465998192
>
> (FWIW there is no user named 123456789 in my index)
>
> As you can see, the queries work alone, but when combined with an AND I
> always get 0 results. If I move the - before the field in my query, it
> works. What am I missing here?
>
> Thanks.
>


Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Also, does anyone have the Solr or Lucene bug # for this?


On Tue, Jul 1, 2014 at 3:06 PM, Brett Hoerner 
wrote:

> Interesting, is there a performance impact to sending the *:*?
>
>
> On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky 
> wrote:
>
>> Yeah, there's a known bug that a negative-only query within parentheses
>> doesn't match properly - you need to add a non-negative term, such as
>> "*:*". For example:
>>
>>  text:(+"happy") AND user:(*:* -"123456789")
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Brett Hoerner
>> Sent: Tuesday, July 1, 2014 2:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: Confusion about location of + and - ?
>>
>>
>> Can anyone explain the difference between these two queries?
>>
>>  text:(+"happy") AND -user:("123456789") = numFound 2912224
>>
>> But
>>
>>  text:(+"happy") AND user:(-"123456789") = numFound 0
>>
>> Now, you may just say "then just put - infront of your field, duh!" Well,
>>
>>  text:(+"happy") = numFound 2912224
>>  user:(-"123456789") = numFound 465998192
>>
>> (FWIW there is no user named 123456789 in my index)
>>
>> As you can see, the queries work alone, but when combined with an AND I
>> always get 0 results. If I move the - before the field in my query, it
>> works. What am I missing here?
>>
>> Thanks.
>>
>
>


Trouble with manually routed collection after upgrade to 4.6

2013-11-25 Thread Brett Hoerner
Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did
an upgrade to 4.6 and am having some issues.

First: this collection is, I guess, implicitly routed. I do this for every
document insert using SolrJ:

  document.addField("_route_", shardId)

After upgrading the servers to 4.6 I now get the following on every
insert/delete when using either SolrJ 4.5.1 or 4.6:

  org.apache.solr.common.SolrException: No active slice servicing hash code
17b9dff6 in DocCollection

In the clusterstate *none* of my shards have a range set (they're all
null), but I thought this would be expected since I do routing myself.

Did the upgrade change something here? I didn't see anything related to
this in the upgrade notes.

Thanks,
Brett


Re: Trouble with manually routed collection after upgrade to 4.6

2013-11-25 Thread Brett Hoerner
Here's my clusterstate.json:

  https://gist.github.com/bretthoerner/a8120a8d89c93f773d70


On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner wrote:

> Hi, I've been using a collection on Solr 4.5.X for a few weeks and just
> did an upgrade to 4.6 and am having some issues.
>
> First: this collection is, I guess, implicitly routed. I do this for every
> document insert using SolrJ:
>
>   document.addField("_route_", shardId)
>
> After upgrading the servers to 4.6 I now get the following on every
> insert/delete when using either SolrJ 4.5.1 or 4.6:
>
>   org.apache.solr.common.SolrException: No active slice servicing hash
> code 17b9dff6 in DocCollection
>
> In the clusterstate *none* of my shards have a range set (they're all
> null), but I thought this would be expected since I do routing myself.
>
> Did the upgrade change something here? I didn't see anything related to
> this in the upgrade notes.
>
> Thanks,
> Brett
>


Re: Trouble with manually routed collection after upgrade to 4.6

2013-11-25 Thread Brett Hoerner
Think I got it. For some reason this was in my clusterstate.json after the
upgrade (note that I was using 4.5.X just fine previously...):

 "router": {
   "name": "compositeId"
 },

I stopped all my nodes and manually edited this to be "implicit" (is there
a tool for this? I've always done it manually), started the cluster up
again and it's all good now.
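
For new collections, the Collections API can request the implicit router up front (it doesn't help an already-created collection like this one, which is what the manual clusterstate edit addressed); a sketch with placeholder names:

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&router.name=implicit&shards=shard1,shard2,shard3&collection.configName=myconfig&maxShardsPerNode=3"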



On Mon, Nov 25, 2013 at 10:38 AM, Brett Hoerner wrote:

> Here's my clusterstate.json:
>
>   https://gist.github.com/bretthoerner/a8120a8d89c93f773d70
>
>
> On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner wrote:
>
>> Hi, I've been using a collection on Solr 4.5.X for a few weeks and just
>> did an upgrade to 4.6 and am having some issues.
>>
>> First: this collection is, I guess, implicitly routed. I do this for
>> every document insert using SolrJ:
>>
>>   document.addField("_route_", shardId)
>>
>> After upgrading the servers to 4.6 I now get the following on every
>> insert/delete when using either SolrJ 4.5.1 or 4.6:
>>
>>   org.apache.solr.common.SolrException: No active slice servicing hash
>> code 17b9dff6 in DocCollection
>>
>> In the clusterstate *none* of my shards have a range set (they're all
>> null), but I thought this would be expected since I do routing myself.
>>
>> Did the upgrade change something here? I didn't see anything related to
>> this in the upgrade notes.
>>
>> Thanks,
>> Brett
>>
>
>


After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Brett Hoerner
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
4.6.1 and indexing ceased (indexer returned "No live servers for shard" but
the real root cause from the Solr servers is below). Note that SolrJ 4.6.1 is
fine for the query side, just not adding documents.



21:35:21.508 [qtp1418442930-22296231] ERROR
o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)


Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Brett Hoerner
On Fri, Feb 7, 2014 at 6:15 PM, Mark Miller  wrote:

> You have to update the other nodes to 4.6.1 as well.
>

I'm not sure I follow, all of the Solr instances in the cluster are 4.6.1
to my knowledge?

Thanks,
Brett


Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I
verified 4.6.1 is definitely winning and included alone when it breaks.


On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller  wrote:

> If that is the case we really have to dig in. Given the error, the first
> thing I would assume is that you have an old solrj jar or something before
> 4.6.1 involved with a 4.6.1 solrj jar or install.
>
> - Mark
>
> http://about.me/markrmiller
>
>
>
> On Feb 7, 2014, 7:15:24 PM, Mark Miller  wrote:
> Hey, yeah, blew it on this one. Someone just reported it the other day -
> the way that a bug was fixed was not back and forward compatible. The first
> implementation was wrong.
>
> You have to update the other nodes to 4.6.1 as well.
>
> I’m going to look at some scripting test that can help check for this type
> of thing.
>
> - Mark
>
> http://about.me/markrmiller
>
>
>
> On Feb 7, 2014, 7:01:24 PM, Brett Hoerner  wrote:
> I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
> 4.6.1 and indexing ceased (indexer returned "No live servers for shard" but
> the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
> fine for the query side, just not adding documents.
>
>
>
> 21:35:21.508 [qtp1418442930-22296231] ERROR
> o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
> Unknown type 19
> at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
> at
>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
> at
>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
> at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
> at
>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
> at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
> at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
> at
>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
> at
>
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
> at
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
> at
>
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection$R

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Oh, I was talking about my indexer. That stack is from my Solr servers, which
is very weird since it's *supposed* to be 4.6.1. I'll dig in more, thanks.
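
One quick way to confirm what each running node actually is: the system info handler reports the exact solr-spec-version (host is a placeholder):

  curl "http://SOLR_HOST:8983/solr/admin/info/system?wt=json"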


On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller  wrote:

> If you look at the stack trace, the line numbers match 4.6.0 in the src,
> but not 4.6.1. That code couldn’t have been 4.6.1 it seems.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 8, 2014, at 11:12 AM, Brett Hoerner  wrote:
>
> > Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I
> > verified 4.6.1 is definitely winning and included alone when it breaks.
> >
> >
> > On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller 
> wrote:
> >
> >> If that is the case we really have to dig in. Given the error, the first
> >> thing I would assume is that you have an old solrj jar or something
> before
> >> 4.6.1 involved with a 4.6.1 solrj jar or install.
> >>
> >> - Mark
> >>
> >> http://about.me/markrmiller
> >>
> >>
> >>
> >> On Feb 7, 2014, 7:15:24 PM, Mark Miller  wrote:
> >> Hey, yeah, blew it on this one. Someone just reported it the other day -
> >> the way that a bug was fixed was not back and forward compatible. The
> first
> >> implementation was wrong.
> >>
> >> You have to update the other nodes to 4.6.1 as well.
> >>
> >> I’m going to look at some scripting test that can help check for this
> type
> >> of thing.
> >>
> >> - Mark
> >>
> >> http://about.me/markrmiller
> >>
> >>
> >>
> >> On Feb 7, 2014, 7:01:24 PM, Brett Hoerner 
> wrote:
> >> I have Solr 4.6.1 on the server and just upgraded my indexer app to
> SolrJ
> >> 4.6.1 and indexing ceased (indexer returned "No live servers for shard"
> but
> >> the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
> >> fine for the query side, just not adding documents.
> >>
> >>
> >>
> >> 21:35:21.508 [qtp1418442930-22296231] ERROR
> >> o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
> >> Unknown type 19
> >> at
> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
> >> at
> >>
> >>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
> >> at
> >>
> >>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
> >> at
> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
> >> at
> >>
> >>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
> >> at
> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
> >> at
> >>
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
> >> at
> >>
> >>
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
> >> at
> >>
> >>
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
> >> at
> >> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
> >> at
> >>
> >>
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >> at
> >>
> >>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >> at
> >>
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> >> at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
> >> at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
> >> at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
> >> at
> >>
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >> at
> >>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >> at
> >>
> >>
> org.eclipse.jetty.server.handler.ScopedHan

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Mark, you were correct. I realized I was still running a prerelease of
4.6.1 (by a handful of commits). Bounced them with proper 4.6.1 and we're
all good, sorry for the spam. :)


On Sat, Feb 8, 2014 at 10:29 AM, Brett Hoerner wrote:

> Oh, I was talking about my indexer. That stack is from my Solr servers,
> very weird since it's *supposed* to be 4.6.1 . I'll dig in more, thanks.
>
>
> On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller wrote:
>
>> If you look at the stack trace, the line numbers match 4.6.0 in the src,
>> but not 4.6.1. That code couldn’t have been 4.6.1 it seems.
>>
>> - Mark
>>
>> http://about.me/markrmiller
>>
>> On Feb 8, 2014, at 11:12 AM, Brett Hoerner 
>> wrote:
>>
>> > Hmmm, I'm assembling into an uberjar that forces uniqueness of classes.
>> I
>> > verified 4.6.1 is definitely winning and included alone when it breaks.
>> >
>> >
>> > On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller 
>> wrote:
>> >
>> >> If that is the case we really have to dig in. Given the error, the
>> first
>> >> thing I would assume is that you have an old solrj jar or something
>> before
>> >> 4.6.1 involved with a 4.6.1 solrj jar or install.
>> >>
>> >> - Mark
>> >>
>> >> http://about.me/markrmiller
>> >>
>> >>
>> >>
>> >> On Feb 7, 2014, 7:15:24 PM, Mark Miller  wrote:
>> >> Hey, yeah, blew it on this one. Someone just reported it the other day
>> -
>> >> the way that a bug was fixed was not back and forward compatible. The
>> first
>> >> implementation was wrong.
>> >>
>> >> You have to update the other nodes to 4.6.1 as well.
>> >>
>> >> I’m going to look at some scripting test that can help check for this
>> type
>> >> of thing.
>> >>
>> >> - Mark
>> >>
>> >> http://about.me/markrmiller
>> >>
>> >>
>> >>
>> >> On Feb 7, 2014, 7:01:24 PM, Brett Hoerner 
>> wrote:
>> >> I have Solr 4.6.1 on the server and just upgraded my indexer app to
>> SolrJ
>> >> 4.6.1 and indexing ceased (indexer returned "No live servers for
>> shard" but
>> >> the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
>> >> fine for the query side, just not adding documents.
>> >>
>> >>
>> >>
>> >> 21:35:21.508 [qtp1418442930-22296231] ERROR
>> >> o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
>> >> Unknown type 19
>> >> at
>> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
>> >> at
>> >>
>> >>
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
>> >> at
>> >>
>> >>
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
>> >> at
>> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
>> >> at
>> >>
>> >>
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
>> >> at
>> >> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
>> >> at
>> >>
>> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
>> >> at
>> >>
>> >>
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
>> >> at
>> >>
>> >>
>> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
>> >> at
>> >>
>> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
>> >> at
>> >>
>> >>
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> >> at
>> >>
>> >>
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> >> at
>> >>
>> >>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
>> >> at
>> >>
>> >>
&

Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-16 Thread Brett Hoerner
I have a very weird problem that I'm going to try to describe here to see
if anyone has any "ah-ha" moments or clues. I haven't created a small
reproducible project for this but I guess I will have to try in the future
if I can't figure it out. (Or I'll need to bisect by running long Hadoop
jobs...)

So, the facts:

* Have been successfully using Solr mapred to build very large Solr
clusters for months
* As of Solr 4.10, *some* job sizes repeatably hang in the MTree merge phase
* Those same jobs (same input, output, and Hadoop cluster itself) succeed
if I only change my Solr deps to 4.9
* The job *does succeed* in 4.10 if I use the same data to create more, but
smaller shards (e.g. 12x as many shards each 1/12th the size of the job
that fails)
* Creating my "normal size" shards (the size I want, that works in 4.9) the
job hangs with 2 mappers running, 0 reducers in the MTree merge phase
* There are no errors or warnings in the syslog/stderr of the MTree mappers,
no errors ever echoed back to the "interactive run" of the job (mapper says
100%, reduce says 0%, will stay forever)
* No CPU being used on the boxes running the merge, no GC happening, JVM
waiting on a futex, all threads blocked on various queues
* No disk usage problems, nothing else obviously wrong with any box in the
cluster

I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred
contrib, mostly some test stuff. I didn't see any transitive dependency
changes in Solr/Lucene that look like they would affect me.


Re: Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-23 Thread Brett Hoerner
)
... 12 more




[...snip...] another similar failure:




14/09/23 17:52:55 INFO mapreduce.Job: Task Id :
attempt_1411487144915_0006_r_46_0, Status : FAILED
Error: java.io.IOException: org.apache.solr.common.SolrException: Error
opening new searcher
at
org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:307)
at
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1421)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:615)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1648)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1625)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at
org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.hadoop.BatchWriter.close(BatchWriter.java:200)
at
org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:295)
... 8 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed
(hardware problem?) : expected=d9019857 actual=632aa4e2
(resource=BufferedChecksumIndexInput(_1i_Lucene41_0.tip))
at
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211)
at
org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:268)
at
org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:125)
at
org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:441)
at
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:197)
at
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:254)
at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
at
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:143)
at
org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:237)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:104)
at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:426)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:277)
at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
... 25 more


On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner 
wrote:

> I have a very weird problem that I'm going to try to describe here to see
> if anyone has any "ah-ha" moments or clues. I haven't created a small
> reproducible project for this but I guess I will have to try in the future
> if I can't figure it out. (Or I'll need to bisect by running long Hadoop
> jobs...)
>
> So, the facts:
>
> * Have been successfully using Solr mapred

Re: Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-23 Thread Brett Hoerner
To be clear, those exceptions are during the "main" mapred job that is
creating the many small indexes. If these errors above occur (they don't
fail the job), I am 99% sure that is when the MTree job later hangs.

On Tue, Sep 23, 2014 at 1:02 PM, Brett Hoerner 
wrote:

> I believe these are related (they are new to me), anyone seen anything
> like this in Solr mapred?
>
>
>
> Error: java.io.IOException:
> org.apache.solr.client.solrj.SolrServerException:
> org.apache.solr.client.solrj.SolrServerException:
> org.apache.lucene.index.CorruptIndexException: checksum failed (hardware
> problem?) : expected=5fb8f6da actual=8b048ec4
> (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip))
> at
> org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:307)
> at
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
> at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: org.apache.solr.client.solrj.SolrServerException:
> org.apache.solr.client.solrj.SolrServerException:
> org.apache.lucene.index.CorruptIndexException: checksum failed (hardware
> problem?) : expected=5fb8f6da actual=8b048ec4
> (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip))
> at
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
> at
> org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
> at org.apache.solr.hadoop.BatchWriter.close(BatchWriter.java:200)
> at
> org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:295)
> ... 8 more
> Caused by: org.apache.solr.client.solrj.SolrServerException:
> org.apache.lucene.index.CorruptIndexException: checksum failed (hardware
> problem?) : expected=5fb8f6da actual=8b048ec4
> (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip))
> at
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
> ... 12 more
> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed
> (hardware problem?) : expected=5fb8f6da actual=8b048ec4
> (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip))
> at
> org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211)
> at
> org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:268)
> at
> org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:125)
> at
> org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:441)
> at
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:197)
> at
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:254)
> at
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
> at
> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
> at
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:143)
> at
> org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
> at
> org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3315)
> at
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3306)
> at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3020)
> at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3169)
> at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3136)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:582)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateP

Solr mapred MTree merge stage ~6x slower in 4.10

2014-09-25 Thread Brett Hoerner
As an update to this thread, it seems my MTree merge wasn't completely
hanging; it was just much slower in 4.10.

If I replace 4.9.0 with 4.10 in my jar the MTree merge stage is 6x (or
more) slower (in my case, 20 min becomes 2 hours). I hope to bisect this in
the future, but the jobs I'm running take a long time. I haven't tried to
see if the issue shows on smaller jobs yet (does 1 minute become 6
minutes?).

Brett




On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner 
wrote:

> I have a very weird problem that I'm going to try to describe here to see
> if anyone has any "ah-ha" moments or clues. I haven't created a small
> reproducible project for this but I guess I will have to try in the future
> if I can't figure it out. (Or I'll need to bisect by running long Hadoop
> jobs...)
>
> So, the facts:
>
> * Have been successfully using Solr mapred to build very large Solr
> clusters for months
> * As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge
> phase in 4.10
> * Those same jobs (same input, output, and Hadoop cluster itself) succeed
> if I only change my Solr deps to 4.9
> * The job *does succeed* in 4.10 if I use the same data to create more,
> but smaller shards (e.g. 12x as many shards each 1/12th the size of the job
> that fails)
> * Creating my "normal size" shards (the size I want, that works in 4.9)
> the job hangs with 2 mappers running, 0 reducers in the MTree merge phase
> * There are no errors or warning in the syslog/stderr of the MTree
> mappers, no errors ever echo'd back to the "interactive run" of the job
> (mapper says 100%, reduce says 0%, will stay forever)
> * No CPU being used on the boxes running the merge, no GC happening, JVM
> waiting on a futex, all threads blocked on various queues
> * No disk usage problems, nothing else obviously wrong with any box in the
> cluster
>
> I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred
> contrib, mostly some test stuff. I didn't see any transitive dependency
> changes in Solr/Lucene that look like they would affect me.
>


Advice for using Solr 4.5 custom sharding to handle rolling time-oriented event data

2013-10-01 Thread Brett Hoerner
I'm interested in using the new custom sharding features in the
collections API to search a rolling window of event data. I'd appreciate a
spot/sanity check of my plan/understanding.

Say I only care about the last 7 days of events and I have thousands per
second (billions per week).

Am I correct that I could create a new shard for each hour, and send events
that happen in that hour with an ID (uniqueKey) of
`new_event_hour!event_id` so that each hour block of events goes into one
shard?
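
As a sketch, an update with that ID scheme might look like the following
(collection name, field names, and values here are made up; the hour prefix
before the "!" is the route key):

curl "http://localhost:8983/solr/events/update" \
   -H 'Content-Type: application/json' \
   -d '[{"id": "2013100112!evt-1", "event_time_i": 1380589200},
        {"id": "2013100112!evt-2", "event_time_i": 1380589230}]'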

I *always* query these events by the time in which they occurred, which is
another TrieInt field that I index with every document. So at query time I
would need to calculate the range the user cared about and send something
like _route_=hour1&_route_=hour2 if I wanted to only query those two
shards. (I *can* set multiple _route_ arguments in one query, right? And
Solr will handle merging results like it would with any other cores?)
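
As a sketch of the query side, a single-bucket query with _route_ might look
like this (again, names and values are made up); whether several _route_
values can be combined in one request is exactly what I'm asking above:

curl -g "http://localhost:8983/solr/events/select?q=event_time_i:[1380589200+TO+1380592800]&_route_=2013100112!"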

Some scheduled task would drop and delete shards after they were more than
7 days old.
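
The cleanup step would presumably be a DELETESHARD call per expired bucket,
something along these lines (names made up, and assuming the custom-sharding
router so individual shards can be dropped):

curl "http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=events&shard=2013100105"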

Does all of that make sense? Do you see a smarter way to do large
"time-oriented" search in SolrCloud?

Thanks!


Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Brett Hoerner
It seems that changes in 4.5 collection configuration now require users to
set a maxShardsPerNode (or it defaults to 1).

Maybe this was the case before, but with the new CREATESHARD API it seems
very restrictive. I've just created a very simple test collection on 3
machines where I set maxShardsPerNode at collection creation time to 1, and
I made 3 shards. Everything is good.

Now I want a 4th shard, it seems impossible to create because the cluster
"knows" I should only have 1 shard per node. Yet my problem doesn't require
more hardware; I just want my new shard to exist on one of the existing servers.

So I try again -- I create a collection with 3 shards and set
maxShardsPerNode to 1000 (just as a silly test). Everything is good.

Now I add shard4 and it immediately tries to add 1000 replicas of shard4...

You can see my earlier email today about time-oriented data in 4.5 to see
what I'm trying to do. I was hoping to have 1 shard per hour/day with the
ability to easily add/drop them as I move the time window (say, a week of
data, 1 per day).

Am I missing something?

Thanks!


Re: Problems with maxShardsPerNode in 4.5

2013-10-01 Thread Brett Hoerner
Related, 1 more try:

Created collection starting with 4 shards on 1 box. Had to set
maxShardsPerNode to 4 to do this.

Now I want to "roll over" my time window, so to attempt to deal with the
problems noted above I delete the oldest shard first. That works fine.

Now I try to add my new shard, which works, but again it defaults to
"maxShardsPerNode" # of replicas, so I'm left with:

* [deleted by me] hour0
* hour1 - 1 replica
* hour2 - 1 replica
* hour3 - 1 replica
* hour4 - 4 replicas [ << the one I created after deleting hour0]

Still at a loss as to how I would create 1 new shard with 1 replica on any
server in 4.5?

Thanks!


On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner wrote:

> It seems that changes in 4.5 collection configuration now require users to
> set a maxShardsPerNode (or it defaults to 1).
>
> Maybe this was the case before, but with the new CREATESHARD API it seems
> a very restrictive. I've just created a very simple test collection on 3
> machines where I set maxShardsPerNode at collection creation time to 1, and
> I made 3 shards. Everything is good.
>
> Now I want a 4th shard, it seems impossible to create because the cluster
> "knows" I should only have 1 shard per node. Yet my problem doesn't require
> more hardware, I just my new shard to exist on one of the existing servers.
>
> So I try again -- I create a collection with 3 shards and set
> maxShardsPerNode to 1000 (just as a silly test). Everything is good.
>
> Now I add shard4 and it immediately tries to add 1000 replicas of shard4...
>
> You can see my earlier email today about time-oriented data in 4.5 to see
> what I'm trying to do. I was hoping to have 1 shard per hour/day with the
> ability to easily add/drop them as I move the time window (say, a week of
> data, 1 per day).
>
> Am I missing something?
>
> Thanks!
>


Re: Problems with maxShardsPerNode in 4.5

2013-10-02 Thread Brett Hoerner
Shalin,

Thanks for the fix. There's still part of the underlying issue that I
consider a bug or a documentation problem: how do I adjust maxShardsPerNode
after my collection has been created, and/or how can I disable it being
checked/used at all? It seems odd to me that I have to set it to an odd
number like 1000 just to get around this?

Thanks!


On Wed, Oct 2, 2013 at 12:04 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Thanks for reporting this Brett. This is indeed a bug. A workaround is to
> specify replicationFactor=1 with the createShard command which will create
> only one replica even if maxShardsPerNode=1000 at collection level.
>
> I'll open an issue.
>
>
> On Wed, Oct 2, 2013 at 7:25 AM, Brett Hoerner  >wrote:
>
> > Related, 1 more try:
> >
> > Created collection starting with 4 shards on 1 box. Had to set
> > maxShardsPerNode to 4 to do this.
> >
> > Now I want to "roll over" my time window, so to attempt to deal with the
> > problems noted above I delete the oldest shard first. That works fine.
> >
> > Now I try to add my new shard, which works, but again it defaults to
> > "maxShardsPerNode" # of replicas, so I'm left with:
> >
> > * [deleted by me] hour0
> > * hour1 - 1 replica
> > * hour2 - 1 replica
> > * hour3 - 1 replica
> > * hour4 - 4 replicas [ << the one I created after deleting hour0]
> >
> > Still at a loss as to how I would create 1 new shard with 1 replica on
> any
> > server in 4.5?
> >
> > Thanks!
> >
> >
> > On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner  > >wrote:
> >
> > > It seems that changes in 4.5 collection configuration now require users
> > to
> > > set a maxShardsPerNode (or it defaults to 1).
> > >
> > > Maybe this was the case before, but with the new CREATESHARD API it
> seems
> > > a very restrictive. I've just created a very simple test collection on
> 3
> > > machines where I set maxShardsPerNode at collection creation time to 1,
> > and
> > > I made 3 shards. Everything is good.
> > >
> > > Now I want a 4th shard, it seems impossible to create because the
> cluster
> > > "knows" I should only have 1 shard per node. Yet my problem doesn't
> > require
> > > more hardware, I just my new shard to exist on one of the existing
> > servers.
> > >
> > > So I try again -- I create a collection with 3 shards and set
> > > maxShardsPerNode to 1000 (just as a silly test). Everything is good.
> > >
> > > Now I add shard4 and it immediately tries to add 1000 replicas of
> > shard4...
> > >
> > > You can see my earlier email today about time-oriented data in 4.5 to
> see
> > > what I'm trying to do. I was hoping to have 1 shard per hour/day with
> the
> > > ability to easily add/drop them as I move the time window (say, a week
> of
> > > data, 1 per day).
> > >
> > > Am I missing something?
> > >
> > > Thanks!
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
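
For future readers, a minimal sketch of the createShard workaround described
above (collection and shard names here are made up):

curl "http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=events&shard=hour4&replicationFactor=1"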


What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
I'm curious what the later "shard-local" bits do, if anything?

I have a very large cluster (256 shards) and I'm sending most of my data
with a single "composite", e.g. 1234!, but I'm noticing the data
is being split among many of the shards.

My guess right now is that since I'm only using the default 16 bits my data
is being split across multiple shards (because of my high # of shards).
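
For concreteness, the two ID forms look roughly like this (key and document
id are made up; this assumes the 4.x compositeId syntax where "/bits"
controls how many bits of the hash come from the route key):

   1234!doc-98765      default: 16 bits of the hash from the route key
   1234/8!doc-98765    only 8 bits from the key, so the key's documents can
                       spread across more shards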

Thanks,
Brett


Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
Router is definitely compositeId.

To be clear, data isn't being spread evenly... it's like it's *almost*
working. It's just odd to me that I'm slamming in data that's 99% of one
_route_ key yet after a few minutes (from a fresh empty index) I have 2
shards with a sizeable amount of data (68M and 128M) and the rest are very
small as expected.

The fact that two are receiving so much makes me think my data is being
split into two shards. I'm trying to debug more now.


On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley  wrote:

> On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner 
> wrote:
> > I'm curious what the later "shard-local" bits do, if anything?
> >
> > I have a very large cluster (256 shards) and I'm sending most of my data
> > with a single "composite", e.g. 1234!, but I'm noticing the
> data
> > is being split among many of the shards.
>
> That shouldn't be the case.  All of your shards should have a lower
> hash value with all 0 bits and an upper hash value of all 1s (i.e.
> 0x to 0x)
> So you see any shards where that's not true?
>
> Also, is the router set to compositeId?
>
> -Yonik
>
> > My guess right now is that since I'm only using the default 16 bits my
> data
> > is being split across multiple shards (because of my high # of shards).
> >
> > Thanks,
> > Brett
>


Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
This is my clusterstate.json:
https://gist.github.com/bretthoerner/0098f741f48f9bb51433

And these are my core sizes (note large ones are sorted to the end):
https://gist.github.com/bretthoerner/f5b5e099212194b5dff6

I've only "heavily sent" 2 shards by now (I'm sharding by hour and it's
been running for 2). There *is* a little old data in my stream, but not
that much (like <5%). What's confusing to me is that 5 of them are rather
large, when I'd expect 2 of them to be.


On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley  wrote:

> On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner 
> wrote:
> > I'm curious what the later "shard-local" bits do, if anything?
> >
> > I have a very large cluster (256 shards) and I'm sending most of my data
> > with a single "composite", e.g. 1234!, but I'm noticing the
> data
> > is being split among many of the shards.
>
> That shouldn't be the case.  All of your shards should have a lower
> hash value with all 0 bits and an upper hash value of all 1s (i.e.
> 0x to 0x)
> So you see any shards where that's not true?
>
> Also, is the router set to compositeId?
>
> -Yonik
>
> > My guess right now is that since I'm only using the default 16 bits my
> data
> > is being split across multiple shards (because of my high # of shards).
> >
> > Thanks,
> > Brett
>


Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
I have a silly question, how do I query a single shard in SolrCloud? When I
hit solr/foo_shard1_replica1/select it always seems to do a full cluster
query.

I can't (easily) do a _route_ query before I know what each shard holds.


On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley  wrote:

> On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner 
> wrote:
> > This is my clusterstate.json:
> > https://gist.github.com/bretthoerner/0098f741f48f9bb51433
> >
> > And these are my core sizes (note large ones are sorted to the end):
> > https://gist.github.com/bretthoerner/f5b5e099212194b5dff6
> >
> > I've only "heavily sent" 2 shards by now (I'm sharding by hour and it's
> > been running for 2). There *is* a little old data in my stream, but not
> > that much (like <5%). What's confusing to me is that 5 of them are rather
> > large, when I'd expect 2 of them to be.
>
> The cluster state looks fine at first glance... and each route key
> should map to a single shard.
> You could try a query to each of the big shards and see what IDs are in
> them.
>
> -Yonik
>


Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-08 Thread Brett Hoerner
Ignore me I forgot about shards= from the wiki.
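
For future readers, the shards= form looks roughly like this (host and core
name are placeholders):

curl "http://localhost:8983/solr/foo/select?q=*:*&shards=localhost:8983/solr/foo_shard1_replica1"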


On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner wrote:

> I have a silly question, how do I query a single shard in SolrCloud? When
> I hit solr/foo_shard1_replica1/select it always seems to do a full cluster
> query.
>
> I can't (easily) do a _route_ query before I know what each have.
>
>
> On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley  wrote:
>
>> On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner 
>> wrote:
>> > This is my clusterstate.json:
>> > https://gist.github.com/bretthoerner/0098f741f48f9bb51433
>> >
>> > And these are my core sizes (note large ones are sorted to the end):
>> > https://gist.github.com/bretthoerner/f5b5e099212194b5dff6
>> >
>> > I've only "heavily sent" 2 shards by now (I'm sharding by hour and it's
>> > been running for 2). There *is* a little old data in my stream, but not
>> > that much (like <5%). What's confusing to me is that 5 of them are
>> rather
>> > large, when I'd expect 2 of them to be.
>>
>> The cluster state looks fine at first glance... and each route key
>> should map to a single shard.
>> You could try a query to each of the big shards and see what IDs are in
>> them.
>>
>> -Yonik
>>
>
>


Re: What's the purpose of the bits option in compositeId (Solr 4.5)?

2013-10-11 Thread Brett Hoerner
Thanks folks,

As an update for future readers --- the problem was on my side (my logic in
picking the _route_ was flawed) as expected. :)


On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley  wrote:

> On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey  wrote:
> > There is also the "distrib=false" parameter that will cause the request
> to
> > be handled directly by the core it is sent to rather than being
> > distributed/balanced by SolrCloud.
>
> Right - this is probably the best option for diagnosing what is in what
> index.
>
> -Yonik
>
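
For future readers, a concrete form of the distrib=false suggestion above,
hitting one core directly (host and core name are placeholders):

curl "http://localhost:8983/solr/foo_shard1_replica1/select?q=*:*&fl=id&rows=10&distrib=false"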


SolrCloud facet query repeatably fails with "No live SolrServers" for some terms, not all

2013-05-01 Thread Brett Hoerner
An example:
https://gist.github.com/bretthoerner/2ffc362450bcd4c2487a

I'll note that all shards and replicas show as "Up" (green) in the Admin UI.

Does anyone know how this could happen? I can repeat this over and over
with the same terms. It was my understanding that something like a facet
query would need to go to *all* shards for any query (I'm using the default
SolrCloud sharding mechanism, nothing special).

How could a text field search for 'happy' always work while 'austin' always
returns an error? Shouldn't that "down server" be hit for a 'happy' query
as well?

Thanks,
Brett


SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-04 Thread Brett Hoerner
Hi,

I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection,
which I called "default" and haven't used since. I'm using an external ZK
ensemble that was completely empty before I started this cloud.

Once I had all 4 nodes in the cloud I used the collection API to create the
real collections I wanted. I also tested that deleting works.

For example,

# this worked
curl "
http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4
"

# this worked
curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15678";

Next, I started my indexer service which happily sent many, many updates to
the cloud. Queries against the collections also work just fine.

Finally, a few hours later, I tried doing a create and a delete. Both
operations did nothing, although Solr replied with a "200 OK".

$ curl -i "
http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4
"
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
</response>

There is nothing in the stdout/stderr logs, nor the Java logs (I have it
set to WARN).

I have tried bouncing the nodes and it doesn't change anything.

Any ideas? How can I further debug this or what else can I provide?


Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-07 Thread Brett Hoerner
For what it's worth this is the log output with DEBUG on,

Dec 07, 2012 2:00:48 PM org.apache.solr.handler.admin.CollectionsHandler
handleCreateAction
INFO: Creating Collection : action=CREATE&name=foo&numShards=4
Dec 07, 2012 2:01:03 PM org.apache.solr.core.SolrCore execute
INFO: [15671] webapp=/solr path=/admin/system params={wt=json} status=0
QTime=5
Dec 07, 2012 2:01:15 PM org.apache.solr.handler.admin.CollectionsHandler
handleDeleteAction
INFO: Deleting Collection : action=DELETE&name=default
Dec 07, 2012 2:01:20 PM org.apache.solr.core.SolrCore execute

Neither the CREATE or DELETE actually did anything, though. (Again, HTTP
200 OK)

Still stuck here, any ideas?

Brett


On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner wrote:

> Hi,
>
> I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection,
> which I called "default" and haven't used since. I'm using an external ZK
> ensemble that was completely empty before I started this cloud.
>
> Once I had all 4 nodes in the cloud I used the collection API to create
> the real collections I wanted. I also tested that deleting works.
>
> For example,
>
> # this worked
> curl "
> http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4
> "
>
> # this worked
> curl "
> http://localhost:8984/solr/admin/collections?action=DELETE&name=15678";
>
> Next, I started my indexer service which happily sent many, many updates
> to the cloud. Queries against the collections also work just fine.
>
> Finally, a few hours later, I tried doing a create and a delete. Both
> operations did nothing, although Solr replied with a "200 OK".
>
> $ curl -i "
> http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4
> "
> HTTP/1.1 200 OK
> Content-Type: application/xml; charset=UTF-8
> Transfer-Encoding: chunked
>
> 
> 
> 0 name="QTime">3
>
> There is nothing in the stdout/stderr logs, nor the Java logs (I have it
> set to WARN).
>
> I have tried bouncing the nodes and it doesn't change anything.
>
> Any ideas? How can I further debug this or what else can I provide?
>


Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-09 Thread Brett Hoerner
Thanks,

It looks like my cluster is in a wedged state after I tried to delete a
collection that didn't exist. There are about 80 items in the queue after
the delete op (that it can't get by). Is that a known bug?

I guess for now I'll just check that a collection exists before sending any
deletes. :)
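
For anyone doing the same, that guard boils down to something like this
(host, port, and collection name are placeholders):

# only send the DELETE if the collection actually answers queries
if curl -sf "http://localhost:8984/solr/15678/select?q=*:*&rows=0" > /dev/null; then
  curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"
fi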

Brett


On Fri, Dec 7, 2012 at 10:50 AM, Mark Miller  wrote:

> Anything in any of the other logs (the other nodes)? The key is getting
> the logs from the node designated as the overseer - it should hopefully
> have the error.
>
> Right now because you pass this stuff off to the overseer, you will always
> get back a 200 - there is a JIRA issue that addresses this though
> (collection API responses) and I hope to get it committed soon.
>
> - Mark
>
> On Dec 7, 2012, at 7:26 AM, Brett Hoerner  wrote:
>
> > For what it's worth this is the log output with DEBUG on,
> >
> > Dec 07, 2012 2:00:48 PM org.apache.solr.handler.admin.CollectionsHandler
> > handleCreateAction
> > INFO: Creating Collection : action=CREATE&name=foo&numShards=4
> > Dec 07, 2012 2:01:03 PM org.apache.solr.core.SolrCore execute
> > INFO: [15671] webapp=/solr path=/admin/system params={wt=json} status=0
> > QTime=5
> > Dec 07, 2012 2:01:15 PM org.apache.solr.handler.admin.CollectionsHandler
> > handleDeleteAction
> > INFO: Deleting Collection : action=DELETE&name=default
> > Dec 07, 2012 2:01:20 PM org.apache.solr.core.SolrCore execute
> >
> > Neither the CREATE or DELETE actually did anything, though. (Again, HTTP
> > 200 OK)
> >
> > Still stuck here, any ideas?
> >
> > Brett
> >
> >
> > On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner  >wrote:
> >
> >> Hi,
> >>
> >> I have a Cloud setup of 4 machines. I bootstrapped them with 1
> collection,
> >> which I called "default" and haven't used since. I'm using an external
> ZK
> >> ensemble that was completely empty before I started this cloud.
> >>
> >> Once I had all 4 nodes in the cloud I used the collection API to create
> >> the real collections I wanted. I also tested that deleting works.
> >>
> >> For example,
> >>
> >> # this worked
> >> curl "
> >>
> http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4
> >> "
> >>
> >> # this worked
> >> curl "
> >> http://localhost:8984/solr/admin/collections?action=DELETE&name=15678";
> >>
> >> Next, I started my indexer service which happily sent many, many updates
> >> to the cloud. Queries against the collections also work just fine.
> >>
> >> Finally, a few hours later, I tried doing a create and a delete. Both
> >> operations did nothing, although Solr replied with a "200 OK".
> >>
> >> $ curl -i "
> >>
> http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4
> >> "
> >> HTTP/1.1 200 OK
> >> Content-Type: application/xml; charset=UTF-8
> >> Transfer-Encoding: chunked
> >>
> >> 
> >> 
> >> 0 >> name="QTime">3
> >>
> >> There is nothing in the stdout/stderr logs, nor the Java logs (I have it
> >> set to WARN).
> >>
> >> I have tried bouncing the nodes and it doesn't change anything.
> >>
> >> Any ideas? How can I further debug this or what else can I provide?
> >>
>
>


Have the SolrCloud collection REST endpoints moved or changed for 4.1?

2013-01-19 Thread Brett Hoerner
I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm
trying out 4.1 RC1 right now but the update URL I used to use is returning
HTTP 404.

For example, I would post my document updates to,

http://localhost:8983/solr/collection1

But that is 404ing now (collection1 exists according to the admin UI, all
shards are green and happy, and data dirs exist on the nodes).

I also tried the following,

http://localhost:8983/solr/collection1/update

And also received a 404 there.

A specific example from the Java client:

22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend -
Error while flushing to Solr.
org.apache.solr.common.SolrException: Server at
http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok
status:404, message:Not Found
 at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
 at
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438)
~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]

But I can hit that URL with a GET,

$ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update


<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">2</int></lst>
<lst name="error"><str name="msg">missing content stream</str><int name="code">400</int></lst>
</response>


Thoughts?

Thanks.


Re: Have the SolrCloud collection REST endpoints moved or changed for 4.1?

2013-01-19 Thread Brett Hoerner
I'm actually wondering if this other issue I've been having is a problem:

https://issues.apache.org/jira/browse/SOLR-4321

The fact that some nodes don't "get" pieces of a collection could explain
the 404.

That said, even when a node has "parts" of a collection it reports 404
sometimes. What's odd is that I can use curl to post a JSON document to the
same URL and it will return 200.
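
For reference, that manual check is just something like the following (the
probe document is a throwaway; a real document would need whatever fields the
schema requires):

curl -i "http://backfill-2d.i.massrel.com:8983/solr/15724/update?commit=true" \
   -H 'Content-Type: application/json' \
   -d '[{"id": "probe-1"}]'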

When I log every request I make from my indexer process (using SolrJ) it's
about 50/50 between 404 and 200...


On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner wrote:

> I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm
> trying out 4.1 RC1 right now but the update URL I used to use is returning
> HTTP 404.
>
> For example, I would post my document updates to,
>
> http://localhost:8983/solr/collection1
>
> But that is 404ing now (collection1 exists according to the admin UI, all
> shards are green and happy, and data dirs exist on the nodes).
>
> I also tried the following,
>
> http://localhost:8983/solr/collection1/update
>
> And also received a 404 there.
>
> A specific example from the Java client:
>
> 22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend -
> Error while flushing to Solr.
> org.apache.solr.common.SolrException: Server at
> http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok
> status:404, message:Not Found
>  at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>  at
> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438)
> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>
> But I can hit that URL with a GET,
>
> $ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update
> 
> 
> 400 name="QTime">2missing content
> stream400
> 
>
> Thoughts?
>
> Thanks.
>


Re: Have the SolrCloud collection REST endpoints moved or changed for 4.1?

2013-01-20 Thread Brett Hoerner
So the ticket I created wasn't related; there is a working patch for that
now, but my original issue remains: I get 404 when trying to post updates to
a URL that worked fine in Solr 4.0.


On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote:

> I'm actually wondering if this other issue I've been having is a problem:
>
> https://issues.apache.org/jira/browse/SOLR-4321
>
> The fact that some nodes don't "get" pieces of a collection could explain
> the 404.
>
> That said, even when a node has "parts" of a collection it reports 404
> sometimes. What's odd is that I can use curl to post a JSON document to the
> same URL and it will return 200.
>
> When I log every request I make from my indexer process (using solr4j)
> it's about 50/50 between 404 and 200...
>
>
>  On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner wrote:
>
>> I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm
>> trying out 4.1 RC1 right now but the update URL I used to use is returning
>> HTTP 404.
>>
>> For example, I would post my document updates to,
>>
>> http://localhost:8983/solr/collection1
>>
>> But that is 404ing now (collection1 exists according to the admin UI, all
>> shards are green and happy, and data dirs exist on the nodes).
>>
>> I also tried the following,
>>
>> http://localhost:8983/solr/collection1/update
>>
>> And also received a 404 there.
>>
>> A specific example from the Java client:
>>
>> 22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend -
>> Error while flushing to Solr.
>> org.apache.solr.common.SolrException: Server at
>> http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok
>> status:404, message:Not Found
>>  at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>>  at
>> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438)
>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>> at
>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>>
>> But I can hit that URL with a GET,
>>
>> $ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update
>> 
>> 
>> 400> name="QTime">2missing content
>> stream400
>> 
>>
>> Thoughts?
>>
>> Thanks.
>>
>
>


Re: Have the SolrCloud collection REST endpoints moved or changed for 4.1?

2013-01-20 Thread Brett Hoerner
Sorry, I take it back. It looks like fixing
https://issues.apache.org/jira/browse/SOLR-4321 fixed my issue after all.


On Sun, Jan 20, 2013 at 2:21 PM, Brett Hoerner wrote:

> So the ticket I created wasn't related, there is a working patch for that
> now but my original issue remains, I get 404 when trying to post updates to
> a URL that worked fine in Solr 4.0.
>
>
> On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote:
>
>> I'm actually wondering if this other issue I've been having is a problem:
>>
>> https://issues.apache.org/jira/browse/SOLR-4321
>>
>> The fact that some nodes don't "get" pieces of a collection could explain
>> the 404.
>>
>> That said, even when a node has "parts" of a collection it reports 404
>> sometimes. What's odd is that I can use curl to post a JSON document to the
>> same URL and it will return 200.
>>
>> When I log every request I make from my indexer process (using solr4j)
>> it's about 50/50 between 404 and 200...
>>
>>
>>  On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner 
>> wrote:
>>
>>> I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm
>>> trying out 4.1 RC1 right now but the update URL I used to use is returning
>>> HTTP 404.
>>>
>>> For example, I would post my document updates to,
>>>
>>> http://localhost:8983/solr/collection1
>>>
>>> But that is 404ing now (collection1 exists according to the admin UI,
>>> all shards are green and happy, and data dirs exist on the nodes).
>>>
>>> I also tried the following,
>>>
>>> http://localhost:8983/solr/collection1/update
>>>
>>> And also received a 404 there.
>>>
>>> A specific example from the Java client:
>>>
>>> 22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend -
>>> Error while flushing to Solr.
>>> org.apache.solr.common.SolrException: Server at
>>> http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok
>>> status:404, message:Not Found
>>>  at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
>>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>>> at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>>>  at
>>> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438)
>>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>>> at
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
>>>
>>> But I can hit that URL with a GET,
>>>
>>> $ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update
>>> 
>>> 
>>> 400>> name="QTime">2missing content
>>> stream400
>>> 
>>>
>>> Thoughts?
>>>
>>> Thanks.
>>>
>>
>>
>


Problem querying collection in Solr 4.1

2013-01-21 Thread Brett Hoerner
I have a collection in Solr 4.1 RC1 and doing a simple query like
text:"puppy dog" is causing an exception. Oddly enough, I CAN query for
text:puppy or text:"puppy", but adding the space breaks everything.

Schema and config: https://gist.github.com/f49da15e39e5609b75b1

This happens whether I query the whole collection or a single direct core.
I haven't tested whether this would happen outside of SolrCloud.

http://localhost:8984/solr/timeline/select?q=text%3A%22puppy+dog%22&wt=xml

http://localhost:8984/solr/timeline_shard4_replica1/select?q=text%3A%22puppy+dog%22&wt=xml

Jan 22, 2013 12:07:24 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request:[
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2,
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1,
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2,
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2]
 at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
 at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
 at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
 at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:365)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
 at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
 at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.client.solrj.SolrServerException: No live
SolrServers available to handle this request:[
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2,
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1,
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2,
http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1,
http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2]
 at
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:325)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:171)
 at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.

Re: Problem querying collection in Solr 4.1

2013-01-22 Thread Brett Hoerner
Thanks, I'll check that out.

Turns out our problem was we had omitTermFreqAndPositions true but were
running queries like "puppy dog" which, I would imagine, require position.
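
For future readers, the fix was to re-enable positions on the field used in
phrase queries and reindex; in schema.xml that looks something like the line
below (the field type and stored flag here are placeholders, only the field
name matches the query above):

   <field name="text" type="text_general" indexed="true" stored="false"
          omitTermFreqAndPositions="false"/>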


On Mon, Jan 21, 2013 at 9:22 PM, Gopal Patwa  wrote:

> one thing I noticed in solrconfig.xml is that it's set to use the Lucene 4.0
> index format, but you mention you are using 4.1
>
>   <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
>
>
>
> On Mon, Jan 21, 2013 at 4:26 PM, Brett Hoerner  >wrote:
>
> > I have a collection in Solr 4.1 RC1 and doing a simple query like
> > text:"puppy dog" is causing an exception. Oddly enough, I CAN query for
> > text:puppy or text:"puppy", but adding the space breaks everything.
> >
> > Schema and config: https://gist.github.com/f49da15e39e5609b75b1
> >
> > This happens whether I query the whole collection or a single direct
> core.
> > I haven't tested whether this would happen outside of SolrCloud.
> >
> >
> http://localhost:8984/solr/timeline/select?q=text%3A%22puppy+dog%22&wt=xml
> >
> >
> >
> http://localhost:8984/solr/timeline_shard4_replica1/select?q=text%3A%22puppy+dog%22&wt=xml
> >
> > Jan 22, 2013 12:07:24 AM org.apache.solr.common.SolrException log
> > SEVERE: null:org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> > available to handle this request:[
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1,
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2,
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2,
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1,
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1,
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2,
> >
> http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1,
> >
> http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2]
> >  at
> >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
> > at
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
> >  at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
> > at
> >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> >  at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >  at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> > at
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >  at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> >  at
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> >  at
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> > at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >  at
> >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> > at
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >  at org.eclipse.jetty.server.Server.handle(Server.java:365)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
> >  at
> >
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
> >  at
> >
> >
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
> > at org.eclipse.jetty.http.HttpP

Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-02 Thread Brett Hoerner
Hi,

I have a 5 server cluster running 1 collection with 20 shards, replication
factor of 2.

Earlier this week I had to do a rolling restart across the cluster; this
worked great and the cluster stayed up the whole time. The problem is that
the last node I restarted is now the leader of 0 shards, and is just
holding replicas.

I've noticed this node has abnormally high load average, while the other
nodes (who have the same number of shards, but more leaders on average) are
fine.

First, I'm wondering if that load could be related to being a 5x replica
and 0x leader?

Second, I was wondering if I could somehow flag single shards to re-elect a
leader (or force a leader) so that I could more evenly distribute how many
leader shards each physical server has running?

Thanks.


Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-02 Thread Brett Hoerner
As an update, it looks like the heavy load is in part because the node
never "catches back up" with the other nodes. In the SolrCloud UI it was yellow
for a long time, then eventually grey, then back to yellow and orange. It
never recovers as green.

I should note this collection is very busy, indexing 5k+ small documents
per second, but the nodes were all fine until I had to restart them and
they had to re-sync.

Here is the log since reboot: https://gist.github.com/396af4b217ce8f536db6

Any ideas?


On Sat, Feb 2, 2013 at 10:27 AM, Brett Hoerner wrote:

> Hi,
>
> I have a 5 server cluster running 1 collection with 20 shards, replication
> factor of 2.
>
> Earlier this week I had to do a rolling restart across the cluster, this
> worked great and the cluster stayed up the whole time. The problem is that
> the last node I restarted is now the leader of 0 shards, and is just
> holding replicas.
>
> I've noticed this node has abnormally high load average, while the other
> nodes (who have the same number of shards, but more leaders on average) are
> fine.
>
> First, I'm wondering if that loud could be related to being a 5x replica
> and 0x leader?
>
> Second, I was wondering if I could somehow flag single shards to re-elect
> a leader (or force a leader) so that I could more evenly distribute how
> many leader shards each physical server has running?
>
> Thanks.
>


Re: Is it possible to manually select a shard leader in a running SolrCloud?

2013-02-03 Thread Brett Hoerner
What is the inverse command I'd use to re-create/load a core on another machine
while making sure it's also "known" to SolrCloud as a shard?
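
Roughly, the unload plus its inverse re-create might look like the following against the Core Admin API; this is only a sketch, with placeholder host, core, collection, and shard names, and note that UNLOAD may want the core name passed as "core" rather than "name":

# On bob (the current leader), unload the core so another replica can take
# over leadership; hostname/port and core name are placeholders.
curl "http://bob:8983/solr/admin/cores?action=UNLOAD&core=core1"

# The inverse: re-create the core elsewhere and attach it to the same
# collection and shard so SolrCloud registers it as a replica again.
# collection1/shard1 stand in for the real collection and shard id.
curl "http://otherhost:8983/solr/admin/cores?action=CREATE&name=core1&collection=collection1&shard=shard1"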


On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale  wrote:

>
> To be more clear, let's say bob is the leader of core1. On bob, do a
> /admin/cores?action=unload&name=core1. This removes the core/shard from
> bob, giving the other servers a chance to grab leadership.
>
> -Joey
>
> On Feb 2, 2013, at 11:27 AM, Brett Hoerner  wrote:
>
> > Hi,
> >
> > I have a 5 server cluster running 1 collection with 20 shards,
> replication
> > factor of 2.
> >
> > Earlier this week I had to do a rolling restart across the cluster, this
> > worked great and the cluster stayed up the whole time. The problem is
> that
> > the last node I restarted is now the leader of 0 shards, and is just
> > holding replicas.
> >
> > I've noticed this node has abnormally high load average, while the other
> > nodes (who have the same number of shards, but more leaders on average)
> are
> > fine.
> >
> > > First, I'm wondering if that load could be related to being a 5x replica
> > and 0x leader?
> >
> > Second, I was wondering if I could somehow flag single shards to
> re-elect a
> > leader (or force a leader) so that I could more evenly distribute how
> many
> > leader shards each physical server has running?
> >
> > Thanks.
>
>


Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
replication factor of 2) that I've been using for over a month now in
production.

Suddenly, the hourly cron I run that dispatches a delete by query
completely halts all indexing. Select queries still run (and quickly),
there is no CPU or disk I/O happening, but suddenly my indexer (which runs
at ~400 doc/sec steady) pauses, and everything blocks indefinitely.

To clarify some on the schema, this is a moving window of data (imagine
messages that don't matter after a 24 hour period) which are regularly
"chopped" off by my hourly cron (deleting messages over 24 hours old) to
keep the index size reasonable.
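
The hourly "chop" is just a delete-by-query against the update handler; a rough sketch of what that cron job might look like, where the timestamp field (created_at) and collection name are placeholders, not my actual schema:

# Hourly cron: delete documents older than 24 hours and commit.
# The collection name and timestamp field (created_at) are placeholders.
curl "http://localhost:8983/solr/collection1/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>created_at:[* TO NOW-24HOURS]</query></delete>'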

There are no errors (log level warn) in the logs. I'm not sure what to look
into. As I've said this has been running (delete included) for about a
month.

I'll also note that I have another cluster much like this one where I do
the very same thing... it has 4 machines, and indexes 10x the documents per
second, with more indexes... and yet I delete on a cron without issue...

Any ideas on where to start, or other information I could provide?

Thanks much.


Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
Solr 4.1. I'll induce it again and run jstack.
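
For anyone following along, capturing the dumps is just a matter of pointing jstack at the Solr JVM's pid on a timer; a small sketch, assuming a single Solr process per box started with java -jar start.jar:

# Find the Solr JVM (assumes it was started with "java -jar start.jar") and
# take a thread dump every 30 seconds while reproducing the hang.
SOLR_PID=$(jps -l | grep start.jar | awk '{print $1}')
for i in $(seq 1 20); do
  jstack "$SOLR_PID" > "threads-$(date +%s).txt"
  sleep 30
done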


On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller  wrote:

> Which version of Solr?
>
> Can you use jconsole, visualvm, or jstack to get some stack traces and see
> where things are halting?
>
> - Mark
>
> On Mar 6, 2013, at 11:45 AM, Brett Hoerner  wrote:
>
> > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
> > replication factor of 2) that I've been using for over a month now in
> > production.
> >
> > Suddenly, the hourly cron I run that dispatches a delete by query
> > completely halts all indexing. Select queries still run (and quickly),
> > there is no CPU or disk I/O happening, but suddenly my indexer (which
> runs
> > at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
> >
> > To clarify some on the schema, this is a moving window of data (imagine
> > messages that don't matter after a 24 hour period) which are regularly
> > "chopped" off by my hourly cron (deleting messages over 24 hours old) to
> > keep the index size reasonable.
> >
> > There are no errors (log level warn) in the logs. I'm not sure what to
> look
> > into. As I've said this has been running (delete included) for about a
> > month.
> >
> > I'll also note that I have another cluster much like this one where I do
> > the very same thing... it has 4 machines, and indexes 10x the documents
> per
> > second, with more indexes... and yet I delete on a cron without issue...
> >
> > Any ideas on where to start, or other information I could provide?
> >
> > Thanks much.
>
>


Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
Here is a dump after the delete, indexing has been stopped:
https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e

An interesting hint that I forgot to mention: it doesn't always happen on
the first delete. I manually ran the delete cron, and the server continued
to work. I waited about 5 minutes and ran it again and it stalled the
indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png

Another thing I forgot to mention. To bring the cluster back to life I:

1) stop my indexer
2) stop server1, start server1
3) stop server2, start server2
4) manually rebalance half of the shards to be mastered on server2
(unload/create on server1)
5) restart indexer

And it works again until a delete eventually kills it.

To be clear again, select queries continue to work indefinitely.

Thanks,
Brett


On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller  wrote:

> Which version of Solr?
>
> Can you use jconsole, visualvm, or jstack to get some stack traces and see
> where things are halting?
>
> - Mark
>
> On Mar 6, 2013, at 11:45 AM, Brett Hoerner  wrote:
>
> > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
> > replication factor of 2) that I've been using for over a month now in
> > production.
> >
> > Suddenly, the hourly cron I run that dispatches a delete by query
> > completely halts all indexing. Select queries still run (and quickly),
> > there is no CPU or disk I/O happening, but suddenly my indexer (which
> runs
> > at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
> >
> > To clarify some on the schema, this is a moving window of data (imagine
> > messages that don't matter after a 24 hour period) which are regularly
> > "chopped" off by my hourly cron (deleting messages over 24 hours old) to
> > keep the index size reasonable.
> >
> > There are no errors (log level warn) in the logs. I'm not sure what to
> look
> > into. As I've said this has been running (delete included) for about a
> > month.
> >
> > I'll also note that I have another cluster much like this one where I do
> > the very same thing... it has 4 machines, and indexes 10x the documents
> per
> > second, with more indexes... and yet I delete on a cron without issue...
> >
> > Any ideas on where to start, or other information I could provide?
> >
> > Thanks much.
>
>


Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
If there's anything I can try, let me know. Interestingly, I think I have
noticed that if I stop my indexer, do my delete, and restart the indexer
then I'm fine. Which goes along with the update thread contention theory.


On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller  wrote:

> This is what I see:
>
> We currently limit the number of outstanding update requests at one time
> to avoid a crazy number of threads being used.
>
> It looks like a bunch of update requests are stuck in socket reads and are
> taking up the available threads. It looks like the deletes are hanging out
> waiting for a free thread.
>
> It seems the question is, why are the requests stuck in socket reads. I
> don't have an answer at the moment.
>
> We should probably get this into a JIRA issue though.
>
> - Mark
>
>
> On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch 
> wrote:
>
> > It does not look like a deadlock, though it could be a distributed one.
> Or
> > it could be a livelock, though that's less likely.
> >
> > Here is what we used to recommend in similar situations for large Java
> > systems (BEA Weblogic):
> > 1) Do thread dump of both systems before anything. As simultaneous as you
> > can make it.
> > 2) Do the first delete. Do a thread dump every 2 minutes on both servers
> > (so, say 3 dumps in that 5 minute wait)
> > 3) Do the second delete and do thread dumps every 30 seconds on both
> > servers from just before and then during. Preferably all the way until
> the
> > problem shows itself. Every 5 seconds if the problem shows itself really
> > quick.
> >
> > That gives you a LOT of thread dumps. But it also gives you something
> that
> > allows to compare thread state before and after the problem starting
> > showing itself and to identify moving (or unnaturally still) threads. I
> > even wrote a tool long time ago that parsed those thread dumps
> > automatically and generated pretty deadlock graphs of those.
> >
> >
> > Regards,
> >   Alex.
> >
> >
> >
> >
> >
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller 
> wrote:
> >
> >> Thanks Brett, good stuff (though not a good problem).
> >>
> >> We def need to look into this.
> >>
> >> - Mark
> >>
> >> On Mar 6, 2013, at 1:53 PM, Brett Hoerner 
> wrote:
> >>
> >>> Here is a dump after the delete, indexing has been stopped:
> >>> https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
> >>>
> >>> An interesting hint that I forgot to mention: it doesn't always happen
> on
> >>> the first delete. I manually ran the delete cron, and the server
> >> continued
> >>> to work. I waited about 5 minutes and ran it again and it stalled the
> >>> indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png
> >>>
> >>> Another thing I forgot to mention. To bring the cluster back to life I:
> >>>
> >>> 1) stop my indexer
> >>> 2) stop server1, start server1
> >>> 3) stop server2, start server2
> >>> 4) manually rebalance half of the shards to be mastered on server2
> >>> (unload/create on server1)
> >>> 5) restart indexer
> >>>
> >>> And it works again until a delete eventually kills it.
> >>>
> >>> To be clear again, select queries continue to work indefinitely.
> >>>
> >>> Thanks,
> >>> Brett
> >>>
> >>>
> >>> On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller 
> >> wrote:
> >>>
> >>>> Which version of Solr?
> >>>>
> >>>> Can you use jconsole, visualvm, or jstack to get some stack traces and
> >> see
> >>>> where things are halting?
> >>>>
> >>>> - Mark
> >>>>
> >>>> On Mar 6, 2013, at 11:45 AM, Brett Hoerner 
> >> wrote:
> >>>>
> >>>>> I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
> >>>>> replication factor of 2) that I've been using for over a month now in
> >>>>> production.
> >>>>>
> >>>>> Suddenly, the hourly cron I run t

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
Here is the other server when it's locked:
https://gist.github.com/3529b7b6415756ead413

To be clear, neither is really "the replica", I have 32 shards and each
physical server is the leader for 16, and the replica for 16.

Also, related to the max threads hunch: my working cluster has many, many
fewer shards per Solr instance. I'm going to do some migration dancing on
this cluster today to have more Solr JVMs each with fewer cores, and see
how it affects the deletes.


On Wed, Mar 6, 2013 at 5:40 PM, Mark Miller  wrote:

> Any chance you can grab the stack trace of a replica as well? (also when
> it's locked up of course).
>
> - Mark
>
> On Mar 6, 2013, at 3:34 PM, Brett Hoerner  wrote:
>
> > If there's anything I can try, let me know. Interestingly, I think I have
> > noticed that if I stop my indexer, do my delete, and restart the indexer
> > then I'm fine. Which goes along with the update thread contention theory.
> >
> >
> > On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller 
> wrote:
> >
> >> This is what I see:
> >>
> >> We currently limit the number of outstanding update requests at one time
> >> to avoid a crazy number of threads being used.
> >>
> >> It looks like a bunch of update requests are stuck in socket reads and
> are
> >> taking up the available threads. It looks like the deletes are hanging
> out
> >> waiting for a free thread.
> >>
> >> It seems the question is, why are the requests stuck in socket reads. I
> >> don't have an answer at the moment.
> >>
> >> We should probably get this into a JIRA issue though.
> >>
> >> - Mark
> >>
> >>
> >> On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch 
> >> wrote:
> >>
> >>> It does not look like a deadlock, though it could be a distributed one.
> >> Or
> >>> it could be a livelock, though that's less likely.
> >>>
> >>> Here is what we used to recommend in similar situations for large Java
> >>> systems (BEA Weblogic):
> >>> 1) Do thread dump of both systems before anything. As simultaneous as
> you
> >>> can make it.
> >>> 2) Do the first delete. Do a thread dump every 2 minutes on both
> servers
> >>> (so, say 3 dumps in that 5 minute wait)
> >>> 3) Do the second delete and do thread dumps every 30 seconds on both
> >>> servers from just before and then during. Preferably all the way until
> >> the
> >>> problem shows itself. Every 5 seconds if the problem shows itself
> really
> >>> quick.
> >>>
> >>> That gives you a LOT of thread dumps. But it also gives you something
> >> that
> >>> allows to compare thread state before and after the problem starting
> >>> showing itself and to identify moving (or unnaturally still) threads. I
> >>> even wrote a tool long time ago that parsed those thread dumps
> >>> automatically and generated pretty deadlock graphs of those.
> >>>
> >>>
> >>> Regards,
> >>>  Alex.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Personal blog: http://blog.outerthoughts.com/
> >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >>> - Time is the quality of nature that keeps events from happening all at
> >>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> >>>
> >>>
> >>> On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller 
> >> wrote:
> >>>
> >>>> Thanks Brett, good stuff (though not a good problem).
> >>>>
> >>>> We def need to look into this.
> >>>>
> >>>> - Mark
> >>>>
> >>>> On Mar 6, 2013, at 1:53 PM, Brett Hoerner 
> >> wrote:
> >>>>
> >>>>> Here is a dump after the delete, indexing has been stopped:
> >>>>> https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
> >>>>>
> >>>>> An interesting hint that I forgot to mention: it doesn't always
> happen
> >> on
> >>>>> the first delete. I manually ran the delete cron, and the server
> >>>> continued
> >>>>> to work. I waited about 5 minutes and ran it again and it stalled the
> >>>>> indexer (as seen from indexer process):
> http://i.imgur.com/1Tt35u0.png
> >>>>>
> >>>

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
As a side note, do you think that was a poor idea? I figured it's better to
spread the master "load" around?


On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller  wrote:

>
> On Mar 7, 2013, at 9:03 AM, Brett Hoerner  wrote:
>
> > To be clear, neither is really "the replica", I have 32 shards and each
> > physical server is the leader for 16, and the replica for 16.
>
> Ah, interesting. That actually could be part of the issue - some brain
> cells are firing. I'm away from home till this weekend, but I can try and
> duplicate this when I get to my home base setup.
>
> - Mark


Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
As an update to this, I did my SolrCloud dance and made it 2xJVMs per
machine (2 machines still, the same ones) and spread the load around. Each
Solr instance now has 16 total shards (master for 8, replica for 8).

*drum roll* ... I can repeatedly run my delete script and nothing breaks. :)


On Thu, Mar 7, 2013 at 11:03 AM, Brett Hoerner wrote:

> Here is the other server when it's locked:
> https://gist.github.com/3529b7b6415756ead413
>
> To be clear, neither is really "the replica", I have 32 shards and each
> physical server is the leader for 16, and the replica for 16.
>
> Also, related to the max threads hunch: my working cluster has many, many
> fewer shards per Solr instance. I'm going to do some migration dancing on
> this cluster today to have more Solr JVMs each with fewer cores, and see
> how it affects the deletes.
>
>
> On Wed, Mar 6, 2013 at 5:40 PM, Mark Miller  wrote:
>
>> Any chance you can grab the stack trace of a replica as well? (also when
>> it's locked up of course).
>>
>> - Mark
>>
>> On Mar 6, 2013, at 3:34 PM, Brett Hoerner  wrote:
>>
>> > If there's anything I can try, let me know. Interestingly, I think I
>> have
>> > noticed that if I stop my indexer, do my delete, and restart the indexer
>> > then I'm fine. Which goes along with the update thread contention
>> theory.
>> >
>> >
>> > On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller 
>> wrote:
>> >
>> >> This is what I see:
>> >>
>> >> We currently limit the number of outstanding update requests at one
>> time
>> >> to avoid a crazy number of threads being used.
>> >>
>> >> It looks like a bunch of update requests are stuck in socket reads and
>> are
>> >> taking up the available threads. It looks like the deletes are hanging
>> out
>> >> waiting for a free thread.
>> >>
>> >> It seems the question is, why are the requests stuck in socket reads. I
>> >> don't have an answer at the moment.
>> >>
>> >> We should probably get this into a JIRA issue though.
>> >>
>> >> - Mark
>> >>
>> >>
>> >> On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch 
>> >> wrote:
>> >>
>> >>> It does not look like a deadlock, though it could be a distributed
>> one.
>> >> Or
>> >>> it could be a livelock, though that's less likely.
>> >>>
>> >>> Here is what we used to recommend in similar situations for large Java
>> >>> systems (BEA Weblogic):
>> >>> 1) Do thread dump of both systems before anything. As simultaneous as
>> you
>> >>> can make it.
>> >>> 2) Do the first delete. Do a thread dump every 2 minutes on both
>> servers
>> >>> (so, say 3 dumps in that 5 minute wait)
>> >>> 3) Do the second delete and do thread dumps every 30 seconds on both
>> >>> servers from just before and then during. Preferably all the way until
>> >> the
>> >>> problem shows itself. Every 5 seconds if the problem shows itself
>> really
>> >>> quick.
>> >>>
>> >>> That gives you a LOT of thread dumps. But it also gives you something
>> >> that
>> >>> allows to compare thread state before and after the problem starting
>> >>> showing itself and to identify moving (or unnaturally still) threads.
>> I
>> >>> even wrote a tool long time ago that parsed those thread dumps
>> >>> automatically and generated pretty deadlock graphs of those.
>> >>>
>> >>>
>> >>> Regards,
>> >>>  Alex.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Personal blog: http://blog.outerthoughts.com/
>> >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> >>> - Time is the quality of nature that keeps events from happening all
>> at
>> >>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>> >>>
>> >>>
>> >>> On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller 
>> >> wrote:
>> >>>
> >> >>>> Thanks Brett, good stuff (though not a good problem).
>> >>>>
>> >>>> We def need to look into this.
>> >>>>
>> >>>> - Ma