Confusion when using go-live and MapReduceIndexerTool
I'm doing HDFS input and output in my job, with the following:

hadoop jar /mnt/faas-solr.jar \
  -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \
  --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver \
  --morphline-file /mnt/morphline-ignore.conf \
  --zk-host $ZKHOST \
  --output-dir hdfs://$MASTERIP:9000/output/ \
  --collection $COLLECTION \
  --go-live \
  --verbose \
  hdfs://$MASTERIP:9000/input/

Index creation works:

$ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0
drwxr-xr-x   - hadoop supergroup     0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data
drwxr-xr-x   - hadoop supergroup     0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index
-rwxr-xr-x   1 hadoop supergroup    61 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0.fdt
-rwxr-xr-x   1 hadoop supergroup    45 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0.fdx
-rwxr-xr-x   1 hadoop supergroup  1681 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0.fnm
-rwxr-xr-x   1 hadoop supergroup   396 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0.si
-rwxr-xr-x   1 hadoop supergroup    67 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc
-rwxr-xr-x   1 hadoop supergroup    37 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos
-rwxr-xr-x   1 hadoop supergroup   508 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim
-rwxr-xr-x   1 hadoop supergroup   305 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip
-rwxr-xr-x   1 hadoop supergroup   120 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd
-rwxr-xr-x   1 hadoop supergroup   351 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm
-rwxr-xr-x   1 hadoop supergroup    45 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/segments_1
-rwxr-xr-x   1 hadoop supergroup   110 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/index/segments_2
drwxr-xr-x   - hadoop supergroup     0 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/tlog
-rw-r--r--   1 hadoop supergroup   333 2014-04-17 16:00 hdfs://10.98.33.114:9000/output/results/part-0/data/tlog/tlog.000

But the go-live step fails; it's trying to use the HDFS path as the remote index path?

14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into Solr cluster...
14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://10.98.33.114:9000/output/results/part-0 into http://discover8-test-1d.i.massrel.com:8983/solr
14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/mnt/solr_8983/home/hdfs:/10.98.33.114:9000/output/results/part-0/data/index' does not exist
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
        at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
        at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/mnt/solr_8983/home/hdfs:/10.98.33.114:9000/output/results/part-0/data/index' does not exist
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
        at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
        at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
        at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$Runn
Re: Confusion when using go-live and MapReduceIndexerTool
https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b

On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:
> Odd - might be helpful if you can share your solrconfig.xml being used.
>
> --
> Mark Miller
> about.me/markrmiller
Re: index merge question
Sorry to bump this, I have the same issue and was curious about the sanity of trying to work around it.

* I have a constant stream of realtime documents I need to continually index. Sometimes they even overwrite very old documents (by using the same unique ID).
* I also have a *huge* backlog of documents I'd like to get into a SolrCloud cluster via Hadoop.

I understand that the MERGEINDEXES operation expects me to have unique documents, but is it reasonable at all for me to be able to change that? In a plain Solr instance I can add doc1, then add doc1 again with new fields and the new update "wins", and I assume during segment merges the old update is eventually removed.

Does that mean it's possible for me to somehow override a merge policy (or something like that?) to effectively do exactly what my Hadoop conflict-resolver does? I already have logic there that knows how to (1) decide which of 2 duplicate documents to keep and (2) respect and "keep" deletes over anything else.

I'd love some pointers at what Solr/Lucene classes to look at if I wanted to try my hand at this. I'm down in Lucene SegmentMerger right now but it seems too low level to understand whatever Solr "knows" about enforcing a single unique ID at merge (and search...? or update...?) time.

Thanks!

On Tue, Jun 11, 2013 at 11:10 AM, Mark Miller wrote:
> Right - but that sounds a little different than what we were talking about.
>
> You had brought up the core admin merge cmd that lets you merge an index into a running Solr cluster.
>
> We are calling that the golive option in the map reduce indexing code. It has the limitations we have discussed.
>
> However, if you are only using map reduce to build indexes, there are facilities for dealing with duplicate ids - as you see in the documentation. The merges involved in that are different though - these are merges that happen as the final index is being constructed by the map reduce job. The final step is the golive step, where the indexes will be deployed to the running Solr cluster - this is what uses the core admin merge command, and if you are doing updates or adds outside of map reduce, you will face the issues we have discussed.
>
> - Mark
>
> On Jun 11, 2013, at 11:57 AM, James Thomas wrote:
>
> > FWIW, the Solr included with Cloudera Search, by default, "ignores all but the most recent document version" during merges.
> > The conflict resolution is configurable however. See the documentation for details.
> > http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html
> > -- see the user guide pdf, "update-conflict-resolver" parameter
> >
> > James
> >
> > -Original Message-
> > From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of Anirudha Jadhav
> > Sent: Tuesday, June 11, 2013 10:47 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: index merge question
> >
> > From my experience the lucene mergeTool and the one invoked by coreAdmin is a pure lucene implementation and does not understand the concept of a unique key (a Solr land concept).
> >
> > http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note at the end.
> >
> > we do frequent index merges for which we externally run map/reduce (java code using lucene api's) jobs to merge & validate merged indices with sources.
> > -Ani
> >
> > On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller wrote:
> >> Yeah, you have to carefully manage things if you are map/reduce building indexes *and* updating documents in other ways.
> >>
> >> If your 'source' data for MR index building is the 'truth', you also have the option of not doing incremental index merging, and you could simply rebuild the whole thing every time - of course, depending on your cluster size, that could be quite expensive.
> >>
> >> - Mark
> >>
> >> On Jun 10, 2013, at 8:36 PM, Jamie Johnson wrote:
> >>
> >>> Thanks Mark. My question is stemming from the new cloudera search stuff.
> >>> My concern is that if while rebuilding the index someone updates a doc, that update could be lost from a solr perspective. I guess what would need to happen to ensure the correct information was indexed would be to record the start time and reindex the information that changed since then?
> >>> On Jun 8, 2013 2:37 PM, "Mark Miller" wrote:
> >>>
> >>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson wrote:
> >>>>
> >>>>> When merging through the core admin (http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for conflicts during the merge? So for instance if I am merging core 1 and core 2 into core 0 (first example), what happens if core 1 and core 2 both have a document with the same key, say core 1 has a newer version of core 2? Does the merge fail, does the newer document remain?
> >>>>
> >>>> You end up with both documents, both with that ID
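As a side note, here is a minimal SolrJ sketch of the ordinary "add the same ID twice and the newer update wins" behavior mentioned at the top of this thread, as opposed to the core admin merge, which keeps both copies. The URL, core name, and field names are placeholders, and the core is assumed to use "id" as its uniqueKey:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LastAddWins {
  public static void main(String[] args) throws Exception {
    // placeholder URL; point at any plain Solr core whose uniqueKey is "id"
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument first = new SolrInputDocument();
    first.addField("id", "doc1");
    first.addField("text", "first version");
    server.add(first);

    // re-adding the same uniqueKey replaces the earlier version; after the
    // commit only "second version" is searchable, and later segment merges
    // physically drop the deleted older copy
    SolrInputDocument second = new SolrInputDocument();
    second.addField("id", "doc1");
    second.addField("text", "second version");
    server.add(second);

    server.commit();
    server.shutdown();
  }
}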
Re: Confusion when using go-live and MapReduceIndexerTool
Anyone have any thoughts on this?

In general, am I expected to be able to go-live from an unrelated cluster of Hadoop machines to a SolrCloud that isn't running off of HDFS?

input: HDFS
output: HDFS
go-live cluster: SolrCloud cluster on different machines running on plain MMapDirectory

I'm back to looking at the code but holy hell is debugging Hadoop hard. :)

On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner wrote:
> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b
Re: Confusion when using go-live and MapReduceIndexerTool
I think I'm just misunderstanding the use of go-live. From the mergeindexes docs: "The indexes must exist on the disk of the Solr host, which may make using this in a distributed environment cumbersome."

I'm guessing I'll have to write some sort of tool that pulls each completed index out of HDFS and onto the respective SolrCloud machines and manually do some kind of merge? I don't want to (can't) be running my Hadoop jobs on the same nodes that SolrCloud is running on...

Also confusing to me: "no writes should be allowed on either core until the merge is complete. If writes are allowed, corruption may occur on the merged index." Is that saying that Solr will block writes, or is that saying the end user has to ensure no writes are happening against the collection during a merge? That seems... risky?

On Tue, Apr 22, 2014 at 9:29 AM, Brett Hoerner wrote:
> Anyone have any thoughts on this?
>
> In general, am I expected to be able to go-live from an unrelated cluster of Hadoop machines to a SolrCloud that isn't running off of HDFS?
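A rough sketch of the kind of pull-and-merge tool described above, for one shard: copy the finished index out of HDFS onto the machine hosting the target core, then issue the core admin merge against it. The local paths, host, and core name are placeholders, and this assumes the documented caveat that nothing is writing to the core during the merge:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class PullAndMerge {
  public static void main(String[] args) throws Exception {
    // 1) copy the MR-built index for one shard out of HDFS onto local disk
    //    (run on, or copy to, the box that hosts the target replica)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path src = new Path("hdfs://10.98.33.114:9000/output/results/part-0/data/index");
    Path dst = new Path("/mnt/solr_8983/staging/part-0-index");
    fs.copyToLocalFile(src, dst);

    // 2) ask the local Solr to merge that directory into the live core;
    //    the host and core name below are placeholders
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest.MergeIndexes merge = new CoreAdminRequest.MergeIndexes();
    merge.setCoreName("collection1_shard1_replica1");
    merge.setIndexDirs(Arrays.asList("/mnt/solr_8983/staging/part-0-index"));
    merge.process(solr);
    solr.shutdown();
  }
}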
Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)
If I run a query like this,

fq=text:lol
fq=created_at_tdid:[1400544000 TO 1400630400]

It takes about 6 seconds. Following queries take only 50ms or less, as expected because my fqs are cached.

However, if I change the query to not cache my big range query:

fq=text:lol
fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]

It takes 2 seconds every time, which is a much better experience for my "first query for that range."

What's odd to me is that I would expect both of these (first) queries to have to do the same amount of work, except the first one stuffs the resulting bitset into a map at the end... which seems to have a 4 second overhead?

Here's my filterCache from solrconfig:

<filterCache size="64" initialSize="64" autowarmCount="32"/>

Thanks.
Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)
In this case, I have >400 million documents, so I understand it taking a while.

That said, I'm still not sure I understand why it would take *more* time. In your example above, wouldn't it have to create an 11.92MB bitset even if I *don't* cache the bitset? It seems the mere act of storing the work after it's done (it has to be done in either case) is taking 4 whole seconds?

On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey wrote:
> On 6/3/2014 2:44 PM, Brett Hoerner wrote:
> > If I run a query like this,
> >
> > fq=text:lol
> > fq=created_at_tdid:[1400544000 TO 1400630400]
> >
> > It takes about 6 seconds. Following queries take only 50ms or less, as
> > expected because my fqs are cached.
> >
> > However, if I change the query to not cache my big range query:
> >
> > fq=text:lol
> > fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
> >
> > It takes 2 seconds every time, which is a much better experience for my
> > "first query for that range."
> >
> > What's odd to me is that I would expect both of these (first) queries to
> > have to do the same amount of work, except the first one stuffs the
> > resulting bitset into a map at the end... which seems to have a 4 second
> > overhead?
> >
> > Here's my filterCache from solrconfig:
> >
> > <filterCache size="64" initialSize="64" autowarmCount="32"/>
>
> I think that probably depends on how many documents you have in the
> single index/shard. If you have one hundred million documents stored in
> the Lucene index, then each filter entry is 12,500,000 bytes (11.92MB) in
> size - it is a bitset representing every document and whether it is
> included or excluded. That data would need to be gathered and copied
> into the cache. I suspect that it's the gathering that takes the most
> time ... several megabytes of memory is not very much for a modern
> processor to copy.
>
> As for how long this takes, I actually have no idea. You have two
> filters here, so it would need to do everything twice.
>
> Thanks,
> Shawn
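Rough back-of-the-envelope arithmetic, extrapolating Shawn's one-bit-per-document figure to this index (my own numbers, not from his reply): 400,000,000 docs / 8 bits per byte = 50,000,000 bytes, so each cached filter entry here would be roughly 48 MB, versus the ~12 MB in his 100-million-document example.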
Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)
This is seemingly where it checks whether to use cache or not, the extra work is really just a get (miss) and a put:

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1216

I suppose it's possible the put is taking 4 seconds, but that seems... surprising to me.

On Tue, Jun 3, 2014 at 4:02 PM, Brett Hoerner wrote:
> In this case, I have >400 million documents, so I understand it taking a while.
>
> That said, I'm still not sure I understand why it would take *more* time. In your example above, wouldn't it have to create an 11.92MB bitset even if I *don't* cache the bitset? It seems the mere act of storing the work after it's done (it has to be done in either case) is taking 4 whole seconds?
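To make that "get (miss) and a put" point concrete, here is a tiny stand-alone sketch of the compute-on-miss pattern being described; it is only an illustration of the shape of the logic, not the actual SolrIndexSearcher code. The expensive computation runs on the first request whether or not a cache is involved; caching only adds the lookup that misses and the put:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CacheOnMiss<K, V> {
  private final Map<K, V> cache = new HashMap<>();
  private final Function<K, V> compute;

  public CacheOnMiss(Function<K, V> compute) {
    this.compute = compute;
  }

  public V get(K key) {
    V cached = cache.get(key);     // first call for a key: a miss
    if (cached != null) {
      return cached;               // later calls: cheap cache hit
    }
    V value = compute.apply(key);  // same work as the uncached path
    cache.put(key, value);         // the only extra step when caching
    return value;
  }
}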
Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)
Yonik, I'm familiar with your blog posts -- and thanks very much for them. :)

Though I'm not sure what you're trying to show me with the q=*:* part? I was of course using q=*:* in my queries, but I assume you mean to leave off the text:lol bit?

I've done some cluster changes, so these are my baselines:

q=*:*
fq=created_at_tdid:[1392768004 TO 1393944400]
(uncached at this point)
~7.5 seconds

q=*:*
fq={!cache=false}created_at_tdid:[1392768005 TO 1393944400]
~7.5 seconds

(I guess this is what you were trying to show me?)

The thing is, my queries are always more "specific" than that, so given a string:

q=*:*
fq=text:basketball
fq={!cache=false}created_at_tdid:[1392768007 TO 1393944400]
~5.2 seconds

q=*:*
fq=text:basketball
fq={!cache=false}created_at_tdid:[1392768005 TO 1393944400]
~1.6 seconds

Is there no hope for my first-time fq searches being as fast as non-cached fqs? It's a shame to have to choose either (1) super fast queries after cached XOR (2) more responsive first-time queries (by a large margin).

Thanks!

On Tue, Jun 3, 2014 at 4:20 PM, Yonik Seeley wrote:
> On Tue, Jun 3, 2014 at 5:19 PM, Yonik Seeley wrote:
> > So try:
> > q=*:*
> > fq=created_at_tdid:[1400544000 TO 1400630400]
> > vs
>
> So try:
> q=*:*
> fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
>
> -Yonik
> http://heliosearch.org - facet functions, subfacets, off-heap filters & fieldcache
"Fake" cached join query much faster than cached fq?
The following two queries are doing the same thing, one using a "normal" fq range query and another using a parent query. The cache is warm (these are both hits) but the "normal" one takes ~6 to 7.5sec while the parent query hack takes ~1.2sec.

Is this expected? Is there anything "wrong" with my "normal fq" query? My plan is to increase the size of my perSegFilter cache so I can use the hack for faster queries... any thoughts here?

"responseHeader": {
  "status": 0,
  "QTime": 7657,
  "params": {
    "q": "*:*",
    "facet.field": "terms_smnd",
    "debug": "true",
    "indent": "true",
    "fq": [
      "created_at_tdid:[1392768001 TO 1393954400]",
      "text:coffee"
    ],
    "rows": "0",
    "wt": "json",
    "facet": "true",
    "_": "1401906435914"
  }
},
"response": {
  "numFound": 2432754,
  "start": 0,
  "maxScore": 1,
  "docs": []
}

Full response example:
https://gist.githubusercontent.com/bretthoerner/60418f08a88093c30220/raw/0a61f013f763e68985c15c5ed6cad6fa253182b9/gistfile1.txt

"responseHeader": {
  "status": 0,
  "QTime": 1210,
  "params": {
    "q": "*:*",
    "facet.field": "terms_smnd",
    "debug": "true",
    "indent": "true",
    "fq": [
      "{!cache=false}{!parent which='created_at_tdid:[1392768001 TO 1393954400]'}",
      "text:coffee"
    ],
    "rows": "0",
    "wt": "json",
    "facet": "true",
    "_": "1401906444521"
  }
},
"response": {
  "numFound": 2432754,
  "start": 0,
  "maxScore": 1,
  "docs": []
}

Full response example:
https://gist.githubusercontent.com/bretthoerner/9d82aa8fe59ffc7ff6ab/raw/560a395a0933870a5d2ac736b58805d8fab7f758/gistfile1.txt
Re: "Fake" cached join query much faster than cached fq?
Thanks Mikhail, I'll try to profile it soon. As for cardinality, on a single core:

created_at_tdid:[1392768001 TO 1393954400] = 241657215
text:coffee = 117593

Oddly enough, I just tried the query with &distrib=false and both return in about 50ms... hmm.

On Thu, Jun 5, 2014 at 5:09 AM, Mikhail Khludnev wrote:
> Brett,
>
> It's really interesting observation. I can only speculate. It's worth to check cache hit stats and cache content via http://wiki.apache.org/solr/SolrCaching#showItems (the key question what are cached doc sets classes). Also if you tell the overall number of docs in the index, and cardinality of both filters, it might allow to guess something. Anyway, jvisualvm sampling can give an exact answer. Giving responses, it's enough to profile one of the slave nodes.
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
Confusion about location of + and - ?
Can anyone explain the difference between these two queries?

text:(+"happy") AND -user:("123456789") = numFound 2912224

But

text:(+"happy") AND user:(-"123456789") = numFound 0

Now, you may just say "then just put - in front of your field, duh!" Well,

text:(+"happy") = numFound 2912224
user:(-"123456789") = numFound 465998192

(FWIW there is no user named 123456789 in my index)

As you can see, the queries work alone, but when combined with an AND I always get 0 results. If I move the - before the field in my query, it works. What am I missing here?

Thanks.
Re: Confusion about location of + and - ?
Interesting, is there a performance impact to sending the *:*?

On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky wrote:
> Yeah, there's a known bug that a negative-only query within parentheses doesn't match properly - you need to add a non-negative term, such as "*:*". For example:
>
> text:(+"happy") AND user:(*:* -"123456789")
>
> -- Jack Krupansky
Re: Confusion about location of + and - ?
Also, does anyone have the Solr or Lucene bug # for this?

On Tue, Jul 1, 2014 at 3:06 PM, Brett Hoerner wrote:
> Interesting, is there a performance impact to sending the *:*?
Trouble with manually routed collection after upgrade to 4.6
Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues.

First: this collection is, I guess, implicitly routed. I do this for every document insert using SolrJ:

document.addField("_route_", shardId)

After upgrading the servers to 4.6 I now get the following on every insert/delete when using either SolrJ 4.5.1 or 4.6:

org.apache.solr.common.SolrException: No active slice servicing hash code 17b9dff6 in DocCollection

In the clusterstate *none* of my shards have a range set (they're all null), but I thought this would be expected since I do routing myself.

Did the upgrade change something here? I didn't see anything related to this in the upgrade notes.

Thanks,
Brett
Re: Trouble with manually routed collection after upgrade to 4.6
Here's my clusterstate.json:

https://gist.github.com/bretthoerner/a8120a8d89c93f773d70

On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner wrote:
> Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues.
Re: Trouble with manually routed collection after upgrade to 4.6
Think I got it. For some reason this was in my clusterstate.json after the upgrade (note that I was using 4.5.X just fine previously...):

"router": {
  "name": "compositeId"
},

I stopped all my nodes and manually edited this to be "implicit" (is there a tool for this? I've always done it manually), started the cluster up again and it's all good now.

On Mon, Nov 25, 2013 at 10:38 AM, Brett Hoerner wrote:
> Here's my clusterstate.json:
>
> https://gist.github.com/bretthoerner/a8120a8d89c93f773d70
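For what it's worth, a rough sketch of scripting that clusterstate.json edit instead of hand-editing it, using SolrJ's SolrZkClient (I believe the zkcli.sh cloud script's getfile/putfile commands can do the same download/upload). The zkHost string is a placeholder and the string-replace is a naive stand-in; a real tool should parse and rewrite the JSON, and as above this should only run while the Solr nodes are stopped:

import org.apache.solr.common.cloud.SolrZkClient;

public class SwitchRouter {
  public static void main(String[] args) throws Exception {
    // placeholder ZooKeeper ensemble string
    SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 30000);
    try {
      // read the current cluster state
      byte[] data = zk.getData("/clusterstate.json", null, null, true);
      // naive edit: swap the router name; parse the JSON in a real tool
      String fixed = new String(data, "UTF-8")
          .replace("\"compositeId\"", "\"implicit\"");
      // write it back (only with the Solr nodes stopped)
      zk.setData("/clusterstate.json", fixed.getBytes("UTF-8"), true);
    } finally {
      zk.close();
    }
  }
}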
After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned "No live servers for shard" but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents.

21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
        at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
        at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:724)
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
On Fri, Feb 7, 2014 at 6:15 PM, Mark Miller wrote:
> You have to update the other nodes to 4.6.1 as well.

I'm not sure I follow, all of the Solr instances in the cluster are 4.6.1 to my knowledge?

Thanks,
Brett
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I verified 4.6.1 is definitely winning and included alone when it breaks.

On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller wrote:
> If that is the case we really have to dig in. Given the error, the first thing I would assume is that you have an old solrj jar or something before 4.6.1 involved with a 4.6.1 solrj jar or install.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 7, 2014, 7:15:24 PM, Mark Miller wrote:
> Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not back and forward compatible. The first implementation was wrong.
>
> You have to update the other nodes to 4.6.1 as well.
>
> I'm going to look at some scripting test that can help check for this type of thing.
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 7, 2014, 7:01:24 PM, Brett Hoerner wrote:
> I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned "No live servers for shard" but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents.
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Oh, I was talking about my indexer. That stack is from my Solr servers, very weird since it's *supposed* to be 4.6.1. I'll dig in more, thanks.

On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller wrote:
> If you look at the stack trace, the line numbers match 4.6.0 in the src, but not 4.6.1. That code couldn't have been 4.6.1 it seems.
>
> - Mark
>
> http://about.me/markrmiller
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Mark, you were correct. I realized I was still running a prerelease of 4.6.1 (by a handful of commits). Bounced them with proper 4.6.1 and we're all good, sorry for the spam. :)

On Sat, Feb 8, 2014 at 10:29 AM, Brett Hoerner wrote:
> Oh, I was talking about my indexer. That stack is from my Solr servers, very weird since it's *supposed* to be 4.6.1. I'll dig in more, thanks.
Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)
I have a very weird problem that I'm going to try to describe here to see if anyone has any "ah-ha" moments or clues. I haven't created a small reproducible project for this but I guess I will have to try in the future if I can't figure it out. (Or I'll need to bisect by running long Hadoop jobs...)

So, the facts:

* Have been successfully using Solr mapred to build very large Solr clusters for months
* As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge phase
* Those same jobs (same input, output, and Hadoop cluster itself) succeed if I only change my Solr deps to 4.9
* The job *does succeed* in 4.10 if I use the same data to create more, but smaller shards (e.g. 12x as many shards, each 1/12th the size of the job that fails)
* When creating my "normal size" shards (the size I want, which works in 4.9), the job hangs with 2 mappers running, 0 reducers in the MTree merge phase
* There are no errors or warnings in the syslog/stderr of the MTree mappers, and no errors are ever echo'd back to the "interactive run" of the job (mapper says 100%, reduce says 0%, and it will stay that way forever)
* No CPU being used on the boxes running the merge, no GC happening, JVM waiting on a futex, all threads blocked on various queues
* No disk usage problems, nothing else obviously wrong with any box in the cluster

I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred contrib, mostly some test stuff. I didn't see any transitive dependency changes in Solr/Lucene that look like they would affect me.
Re: Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)
) ... 12 more [...snip...] another similar failure: 14/09/23 17:52:55 INFO mapreduce.Job: Task Id : attempt_1411487144915_0006_r_46_0, Status : FAILED Error: java.io.IOException: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:307) at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1421) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:615) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1648) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1625) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168) at org.apache.solr.hadoop.BatchWriter.close(BatchWriter.java:200) at org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:295) ... 8 more Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) 
: expected=d9019857 actual=632aa4e2 (resource=BufferedChecksumIndexInput(_1i_Lucene41_0.tip)) at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211) at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:268) at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.(BlockTreeTermsReader.java:125) at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:441) at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:197) at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:254) at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:120) at org.apache.lucene.index.SegmentReader.(SegmentReader.java:108) at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:143) at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:237) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:104) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:426) at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292) at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:277) at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476) ... 25 more On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner wrote: > I have a very weird problem that I'm going to try to describe here to see > if anyone has any "ah-ha" moments or clues. I haven't created a small > reproducible project for this but I guess I will have to try in the future > if I can't figure it out. (Or I'll need to bisect by running long Hadoop > jobs...) > > So, the facts: > > * Have been successfully using Solr mapred
Re: Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)
To be clear, those exceptions are during the "main" mapred job that is creating the many small indexes. If these errors above occur (they don't fail the job), I am 99% sure that is when the MTree job later hangs. On Tue, Sep 23, 2014 at 1:02 PM, Brett Hoerner wrote: > I believe these are related (they are new to me), anyone seen anything > like this in Solr mapred? > > > > Error: java.io.IOException: > org.apache.solr.client.solrj.SolrServerException: > org.apache.solr.client.solrj.SolrServerException: > org.apache.lucene.index.CorruptIndexException: checksum failed (hardware > problem?) : expected=5fb8f6da actual=8b048ec4 > (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip)) > at > org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:307) > at > org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558) > at > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) > Caused by: org.apache.solr.client.solrj.SolrServerException: > org.apache.solr.client.solrj.SolrServerException: > org.apache.lucene.index.CorruptIndexException: checksum failed (hardware > problem?) : expected=5fb8f6da actual=8b048ec4 > (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip)) > at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223) > at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) > at > org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168) > at org.apache.solr.hadoop.BatchWriter.close(BatchWriter.java:200) > at > org.apache.solr.hadoop.SolrRecordWriter.close(SolrRecordWriter.java:295) > ... 8 more > Caused by: org.apache.solr.client.solrj.SolrServerException: > org.apache.lucene.index.CorruptIndexException: checksum failed (hardware > problem?) : expected=5fb8f6da actual=8b048ec4 > (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip)) > at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155) > ... 12 more > Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed > (hardware problem?) 
: expected=5fb8f6da actual=8b048ec4 > (resource=BufferedChecksumIndexInput(_1e_Lucene41_0.tip)) > at > org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:211) > at > org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:268) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.(BlockTreeTermsReader.java:125) > at > org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:441) > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:197) > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:254) > at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:120) > at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:108) > at > org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:143) > at > org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282) > at > org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3315) > at > org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3306) > at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3020) > at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3169) > at > org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3136) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:582) > at > org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateP
Solr mapred MTree merge stage ~6x slower in 4.10
As an update to this thread, it seems my MTree wasn't completely hanging, it was just much slower in 4.10. If I replace 4.9.0 with 4.10 in my jar the MTree merge stage is 6x (or more) slower (in my case, 20 min becomes 2 hours). I hope to bisect this in the future, but the jobs I'm running take a long time. I haven't tried to see if the issue shows on smaller jobs yet (does 1 minute become 6 minutes?). Brett On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner wrote: > I have a very weird problem that I'm going to try to describe here to see > if anyone has any "ah-ha" moments or clues. I haven't created a small > reproducible project for this but I guess I will have to try in the future > if I can't figure it out. (Or I'll need to bisect by running long Hadoop > jobs...) > > So, the facts: > > * Have been successfully using Solr mapred to build very large Solr > clusters for months > * As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge > phase in 4.10 > * Those same jobs (same input, output, and Hadoop cluster itself) succeed > if I only change my Solr deps to 4.9 > * The job *does succeed* in 4.10 if I use the same data to create more, > but smaller shards (e.g. 12x as many shards each 1/12th the size of the job > that fails) > * Creating my "normal size" shards (the size I want, that works in 4.9) > the job hangs with 2 mappers running, 0 reducers in the MTree merge phase > * There are no errors or warning in the syslog/stderr of the MTree > mappers, no errors ever echo'd back to the "interactive run" of the job > (mapper says 100%, reduce says 0%, will stay forever) > * No CPU being used on the boxes running the merge, no GC happening, JVM > waiting on a futex, all threads blocked on various queues > * No disk usage problems, nothing else obviously wrong with any box in the > cluster > > I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred > contrib, mostly some test stuff. I didn't see any transitive dependency > changes in Solr/Lucene that look like they would affect me. >
Advice for using Solr 4.5 custom sharding to handle rolling time-oriented event data
I'm interested in using the new custom sharding features in the collections API to search a rolling window of event data. I'd appreciate a spot/sanity check of my plan/understanding.

Say I only care about the last 7 days of events and I have thousands per second (billions per week). Am I correct that I could create a new shard for each hour, and send events that happen in that hour with the ID (uniqueKey) of `new_event_hour!event_id` so that each hour block of events goes into one shard?

I *always* query these events by the time in which they occurred, which is another TrieInt field that I index with every document. So at query time I would need to calculate the range the user cared about and send something like _route_=hour1&_route_=hour2 if I wanted to only query those two shards. (I *can* set multiple _route_ arguments in one query, right? And Solr will handle merging results like it would with any other cores?)

Some scheduled task would drop and delete shards after they were more than 7 days old.

Does all of that make sense? Do you see a smarter way to do large "time-oriented" search in SolrCloud? Thanks!
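For illustration, a minimal sketch of the requests this plan implies, assuming a compositeId-routed collection; the collection name ("events"), field names, and hour keys below are hypothetical, and whether multiple _route_ parameters behave this way is exactly the question above:

# Index an event with its hour bucket as the route-key prefix of the uniqueKey,
# so every event from that hour hashes to the same shard (all names hypothetical).
curl "http://localhost:8983/solr/events/update?commit=true" \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id": "2013-10-01T05!event_12345", "created_at": 1380603600}]'

# Query only the shards that hold the hours the user cares about, still
# filtering on the time field; repeating _route_ is the proposal above.
curl -g "http://localhost:8983/solr/events/select?q=*:*&fq=created_at:[1380603600+TO+1380610800]&_route_=2013-10-01T05!&_route_=2013-10-01T06!"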
Problems with maxShardsPerNode in 4.5
It seems that changes in 4.5 collection configuration now require users to set a maxShardsPerNode (or it defaults to 1).

Maybe this was the case before, but with the new CREATESHARD API it seems very restrictive. I've just created a very simple test collection on 3 machines where I set maxShardsPerNode at collection creation time to 1, and I made 3 shards. Everything is good.

Now I want a 4th shard, but it seems impossible to create because the cluster "knows" I should only have 1 shard per node. Yet my problem doesn't require more hardware, I just want my new shard to exist on one of the existing servers.

So I try again -- I create a collection with 3 shards and set maxShardsPerNode to 1000 (just as a silly test). Everything is good. Now I add shard4 and it immediately tries to add 1000 replicas of shard4...

You can see my earlier email today about time-oriented data in 4.5 to see what I'm trying to do. I was hoping to have 1 shard per hour/day with the ability to easily add/drop them as I move the time window (say, a week of data, 1 per day).

Am I missing something?

Thanks!
Re: Problems with maxShardsPerNode in 4.5
Related, 1 more try:

Created collection starting with 4 shards on 1 box. Had to set maxShardsPerNode to 4 to do this.

Now I want to "roll over" my time window, so to attempt to deal with the problems noted above I delete the oldest shard first. That works fine.

Now I try to add my new shard, which works, but again it defaults to "maxShardsPerNode" # of replicas, so I'm left with:

* [deleted by me] hour0
* hour1 - 1 replica
* hour2 - 1 replica
* hour3 - 1 replica
* hour4 - 4 replicas [ << the one I created after deleting hour0]

Still at a loss as to how I would create 1 new shard with 1 replica on any server in 4.5?

Thanks!

On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner wrote: > It seems that changes in 4.5 collection configuration now require users to > set a maxShardsPerNode (or it defaults to 1). > > Maybe this was the case before, but with the new CREATESHARD API it seems > a very restrictive. I've just created a very simple test collection on 3 > machines where I set maxShardsPerNode at collection creation time to 1, and > I made 3 shards. Everything is good. > > Now I want a 4th shard, it seems impossible to create because the cluster > "knows" I should only have 1 shard per node. Yet my problem doesn't require > more hardware, I just my new shard to exist on one of the existing servers. > > So I try again -- I create a collection with 3 shards and set > maxShardsPerNode to 1000 (just as a silly test). Everything is good. > > Now I add shard4 and it immediately tries to add 1000 replicas of shard4... > > You can see my earlier email today about time-oriented data in 4.5 to see > what I'm trying to do. I was hoping to have 1 shard per hour/day with the > ability to easily add/drop them as I move the time window (say, a week of > data, 1 per day). > > Am I missing something? > > Thanks! >
Re: Problems with maxShardsPerNode in 4.5
Shalin, Thanks for the fix. There's still part of the underlying issue that I consider a bug or a documentation problem: how do I adjust maxShardsPerNode after my collection has been created, and/or how can I disable it being checked/used at all? It seems odd to me that I have to set it to an odd number like 1000 just to get around this? Thanks! On Wed, Oct 2, 2013 at 12:04 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Thanks for reporting this Brett. This is indeed a bug. A workaround is to > specify replicationFactor=1 with the createShard command which will create > only one replica even if maxShardsPerNode=1000 at collection level. > > I'll open an issue. > > > On Wed, Oct 2, 2013 at 7:25 AM, Brett Hoerner >wrote: > > > Related, 1 more try: > > > > Created collection starting with 4 shards on 1 box. Had to set > > maxShardsPerNode to 4 to do this. > > > > Now I want to "roll over" my time window, so to attempt to deal with the > > problems noted above I delete the oldest shard first. That works fine. > > > > Now I try to add my new shard, which works, but again it defaults to > > "maxShardsPerNode" # of replicas, so I'm left with: > > > > * [deleted by me] hour0 > > * hour1 - 1 replica > > * hour2 - 1 replica > > * hour3 - 1 replica > > * hour4 - 4 replicas [ << the one I created after deleting hour0] > > > > Still at a loss as to how I would create 1 new shard with 1 replica on > any > > server in 4.5? > > > > Thanks! > > > > > > On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner > >wrote: > > > > > It seems that changes in 4.5 collection configuration now require users > > to > > > set a maxShardsPerNode (or it defaults to 1). > > > > > > Maybe this was the case before, but with the new CREATESHARD API it > seems > > > a very restrictive. I've just created a very simple test collection on > 3 > > > machines where I set maxShardsPerNode at collection creation time to 1, > > and > > > I made 3 shards. Everything is good. > > > > > > Now I want a 4th shard, it seems impossible to create because the > cluster > > > "knows" I should only have 1 shard per node. Yet my problem doesn't > > require > > > more hardware, I just my new shard to exist on one of the existing > > servers. > > > > > > So I try again -- I create a collection with 3 shards and set > > > maxShardsPerNode to 1000 (just as a silly test). Everything is good. > > > > > > Now I add shard4 and it immediately tries to add 1000 replicas of > > shard4... > > > > > > You can see my earlier email today about time-oriented data in 4.5 to > see > > > what I'm trying to do. I was hoping to have 1 shard per hour/day with > the > > > ability to easily add/drop them as I move the time window (say, a week > of > > > data, 1 per day). > > > > > > Am I missing something? > > > > > > Thanks! > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
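For future readers, a minimal sketch of the workaround Shalin describes above, creating one new shard with a single replica regardless of the collection-level maxShardsPerNode; the host, collection, and shard names are hypothetical:

# Create one shard with exactly one replica, sidestepping the
# maxShardsPerNode-derived replica count.
curl "http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=events&shard=hour5&replicationFactor=1"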
What's the purpose of the bits option in compositeId (Solr 4.5)?
I'm curious what the later "shard-local" bits do, if anything? I have a very large cluster (256 shards) and I'm sending most of my data with a single "composite", e.g. 1234!, but I'm noticing the data is being split among many of the shards. My guess right now is that since I'm only using the default 16 bits my data is being split across multiple shards (because of my high # of shards). Thanks, Brett
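For context, the "bits" here are the optional /N suffix on the route key, which sets how many bits of the 32-bit composite hash come from the route key's hash (16 by default) versus from the rest of the document id; a sketch of the syntax, with hypothetical collection and ids:

# Default split: 16 bits from the route key ("1234"), 16 from the doc id.
curl "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id": "1234!doc1"}]'

# Explicit split: 24 bits from the route key, 8 from the doc id, which
# confines documents for "1234" to a narrower slice of the hash ring.
curl "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id": "1234/24!doc1"}]'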
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Router is definitely compositeId. To be clear, data isn't being spread evenly... it's like it's *almost* working. It's just odd to me that I'm slamming in data that's 99% of one _route_ key yet after a few minutes (from a fresh empty index) I have 2 shards with a sizeable amount of data (68M and 128M) and the rest are very small as expected. The fact that two are receiving so much makes me think my data is being split into two shards. I'm trying to debug more now. On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner > wrote: > > I'm curious what the later "shard-local" bits do, if anything? > > > > I have a very large cluster (256 shards) and I'm sending most of my data > > with a single "composite", e.g. 1234!, but I'm noticing the > data > > is being split among many of the shards. > > That shouldn't be the case. All of your shards should have a lower > hash value with all 0 bits and an upper hash value of all 1s (i.e. > 0x to 0x) > So you see any shards where that's not true? > > Also, is the router set to compositeId? > > -Yonik > > > My guess right now is that since I'm only using the default 16 bits my > data > > is being split across multiple shards (because of my high # of shards). > > > > Thanks, > > Brett >
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
This is my clusterstate.json: https://gist.github.com/bretthoerner/0098f741f48f9bb51433 And these are my core sizes (note large ones are sorted to the end): https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 I've only "heavily sent" 2 shards by now (I'm sharding by hour and it's been running for 2). There *is* a little old data in my stream, but not that much (like <5%). What's confusing to me is that 5 of them are rather large, when I'd expect 2 of them to be. On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner > wrote: > > I'm curious what the later "shard-local" bits do, if anything? > > > > I have a very large cluster (256 shards) and I'm sending most of my data > > with a single "composite", e.g. 1234!, but I'm noticing the > data > > is being split among many of the shards. > > That shouldn't be the case. All of your shards should have a lower > hash value with all 0 bits and an upper hash value of all 1s (i.e. > 0x to 0x) > So you see any shards where that's not true? > > Also, is the router set to compositeId? > > -Yonik > > > My guess right now is that since I'm only using the default 16 bits my > data > > is being split across multiple shards (because of my high # of shards). > > > > Thanks, > > Brett >
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
I have a silly question, how do I query a single shard in SolrCloud? When I hit solr/foo_shard1_replica1/select it always seems to do a full cluster query. I can't (easily) do a _route_ query before I know what each have. On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner > wrote: > > This is my clusterstate.json: > > https://gist.github.com/bretthoerner/0098f741f48f9bb51433 > > > > And these are my core sizes (note large ones are sorted to the end): > > https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 > > > > I've only "heavily sent" 2 shards by now (I'm sharding by hour and it's > > been running for 2). There *is* a little old data in my stream, but not > > that much (like <5%). What's confusing to me is that 5 of them are rather > > large, when I'd expect 2 of them to be. > > The cluster state looks fine at first glance... and each route key > should map to a single shard. > You could try a query to each of the big shards and see what IDs are in > them. > > -Yonik >
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Ignore me I forgot about shards= from the wiki. On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner wrote: > I have a silly question, how do I query a single shard in SolrCloud? When > I hit solr/foo_shard1_replica1/select it always seems to do a full cluster > query. > > I can't (easily) do a _route_ query before I know what each have. > > > On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley wrote: > >> On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner >> wrote: >> > This is my clusterstate.json: >> > https://gist.github.com/bretthoerner/0098f741f48f9bb51433 >> > >> > And these are my core sizes (note large ones are sorted to the end): >> > https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 >> > >> > I've only "heavily sent" 2 shards by now (I'm sharding by hour and it's >> > been running for 2). There *is* a little old data in my stream, but not >> > that much (like <5%). What's confusing to me is that 5 of them are >> rather >> > large, when I'd expect 2 of them to be. >> >> The cluster state looks fine at first glance... and each route key >> should map to a single shard. >> You could try a query to each of the big shards and see what IDs are in >> them. >> >> -Yonik >> > >
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Thanks folks, As an update for future readers --- the problem was on my side (my logic in picking the _route_ was flawed) as expected. :) On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley wrote: > On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey wrote: > > There is also the "distrib=false" parameter that will cause the request > to > > be handled directly by the core it is sent to rather than being > > distributed/balanced by SolrCloud. > > Right - this is probably the best option for diagnosing what is in what > index. > > -Yonik >
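For future readers, a sketch of the two ways mentioned above to look at a single core's contents directly; the host and core names are hypothetical:

# Send the request to one core and keep it there, instead of letting
# SolrCloud distribute it across the collection.
curl "http://localhost:8983/solr/foo_shard1_replica1/select?q=*:*&rows=10&fl=id&distrib=false"

# Or spell out exactly which shard(s) a distributed request may touch.
curl "http://localhost:8983/solr/foo_shard1_replica1/select?q=*:*&rows=10&fl=id&shards=localhost:8983/solr/foo_shard1_replica1"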
SolrCloud facet query repeatably fails with "No live SolrServers" for some terms, not all
An example: https://gist.github.com/bretthoerner/2ffc362450bcd4c2487a I'll note that all shards and replicas show as "Up" (green) in the Admin UI. Does anyone know how this could happen? I can repeat this over and over with the same terms. It was my understanding that something like a facet query would need to go to *all* shards for any query (I'm using the default SolrCloud sharding mechanism, nothing special). How could a text field search for 'happy' always work and 'austin' always return an error, shouldn't that "down server" be hit for a 'happy' query also? Thanks, Brett
SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)
Hi,

I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection, which I called "default" and haven't used since. I'm using an external ZK ensemble that was completely empty before I started this cloud.

Once I had all 4 nodes in the cloud I used the collection API to create the real collections I wanted. I also tested that deleting works. For example,

# this worked
curl "http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4"

# this worked
curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"

Next, I started my indexer service which happily sent many, many updates to the cloud. Queries against the collections also work just fine.

Finally, a few hours later, I tried doing a create and a delete. Both operations did nothing, although Solr replied with a "200 OK".

$ curl -i "http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4"
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
</response>

There is nothing in the stdout/stderr logs, nor the Java logs (I have it set to WARN).

I have tried bouncing the nodes and it doesn't change anything.

Any ideas? How can I further debug this or what else can I provide?
Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)
For what it's worth this is the log output with DEBUG on, Dec 07, 2012 2:00:48 PM org.apache.solr.handler.admin.CollectionsHandler handleCreateAction INFO: Creating Collection : action=CREATE&name=foo&numShards=4 Dec 07, 2012 2:01:03 PM org.apache.solr.core.SolrCore execute INFO: [15671] webapp=/solr path=/admin/system params={wt=json} status=0 QTime=5 Dec 07, 2012 2:01:15 PM org.apache.solr.handler.admin.CollectionsHandler handleDeleteAction INFO: Deleting Collection : action=DELETE&name=default Dec 07, 2012 2:01:20 PM org.apache.solr.core.SolrCore execute Neither the CREATE or DELETE actually did anything, though. (Again, HTTP 200 OK) Still stuck here, any ideas? Brett On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner wrote: > Hi, > > I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection, > which I called "default" and haven't used since. I'm using an external ZK > ensemble that was completely empty before I started this cloud. > > Once I had all 4 nodes in the cloud I used the collection API to create > the real collections I wanted. I also tested that deleting works. > > For example, > > # this worked > curl " > http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4 > " > > # this worked > curl " > http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"; > > Next, I started my indexer service which happily sent many, many updates > to the cloud. Queries against the collections also work just fine. > > Finally, a few hours later, I tried doing a create and a delete. Both > operations did nothing, although Solr replied with a "200 OK". > > $ curl -i " > http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4 > " > HTTP/1.1 200 OK > Content-Type: application/xml; charset=UTF-8 > Transfer-Encoding: chunked > > > > 0 name="QTime">3 > > There is nothing in the stdout/stderr logs, nor the Java logs (I have it > set to WARN). > > I have tried bouncing the nodes and it doesn't change anything. > > Any ideas? How can I further debug this or what else can I provide? >
Re: SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)
Thanks, It looks like my cluster is in a wedged state after I tried to delete a collection that didn't exist. There are about 80 items in the queue after the delete op (that it can't get by). Is that a known bug? I guess for now I'll just check that a collection exists before sending any deletes. :) Brett On Fri, Dec 7, 2012 at 10:50 AM, Mark Miller wrote: > Anything in any of the other logs (the other nodes)? The key is getting > the logs from the node designated as the overseer - it should hopefully > have the error. > > Right now because you pass this stuff off to the overseer, you will always > get back a 200 - there is a JIRA issue that addresses this though > (collection API responses) and I hope to get it committed soon. > > - Mark > > On Dec 7, 2012, at 7:26 AM, Brett Hoerner wrote: > > > For what it's worth this is the log output with DEBUG on, > > > > Dec 07, 2012 2:00:48 PM org.apache.solr.handler.admin.CollectionsHandler > > handleCreateAction > > INFO: Creating Collection : action=CREATE&name=foo&numShards=4 > > Dec 07, 2012 2:01:03 PM org.apache.solr.core.SolrCore execute > > INFO: [15671] webapp=/solr path=/admin/system params={wt=json} status=0 > > QTime=5 > > Dec 07, 2012 2:01:15 PM org.apache.solr.handler.admin.CollectionsHandler > > handleDeleteAction > > INFO: Deleting Collection : action=DELETE&name=default > > Dec 07, 2012 2:01:20 PM org.apache.solr.core.SolrCore execute > > > > Neither the CREATE or DELETE actually did anything, though. (Again, HTTP > > 200 OK) > > > > Still stuck here, any ideas? > > > > Brett > > > > > > On Tue, Dec 4, 2012 at 7:19 PM, Brett Hoerner >wrote: > > > >> Hi, > >> > >> I have a Cloud setup of 4 machines. I bootstrapped them with 1 > collection, > >> which I called "default" and haven't used since. I'm using an external > ZK > >> ensemble that was completely empty before I started this cloud. > >> > >> Once I had all 4 nodes in the cloud I used the collection API to create > >> the real collections I wanted. I also tested that deleting works. > >> > >> For example, > >> > >> # this worked > >> curl " > >> > http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4 > >> " > >> > >> # this worked > >> curl " > >> http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"; > >> > >> Next, I started my indexer service which happily sent many, many updates > >> to the cloud. Queries against the collections also work just fine. > >> > >> Finally, a few hours later, I tried doing a create and a delete. Both > >> operations did nothing, although Solr replied with a "200 OK". > >> > >> $ curl -i " > >> > http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4 > >> " > >> HTTP/1.1 200 OK > >> Content-Type: application/xml; charset=UTF-8 > >> Transfer-Encoding: chunked > >> > >> > >> > >> 0 >> name="QTime">3 > >> > >> There is nothing in the stdout/stderr logs, nor the Java logs (I have it > >> set to WARN). > >> > >> I have tried bouncing the nodes and it doesn't change anything. > >> > >> Any ideas? How can I further debug this or what else can I provide? > >> > >
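A rough sketch of that existence check, assuming shell access to the external ZooKeeper ensemble mentioned above; the ZK host, Solr host, and collection name are hypothetical, and in this Solr version collections appear as top-level keys in /clusterstate.json:

# Only send the DELETE if the collection actually shows up in cluster state.
if zkCli.sh -server zk1:2181 get /clusterstate.json | grep -q '"15679"'; then
  curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15679"
else
  echo "collection 15679 not found, skipping delete"
fi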
Have the SolrCloud collection REST endpoints move or changed for 4.1?
I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm trying out 4.1 RC1 right now but the update URL I used to use is returning HTTP 404.

For example, I would post my document updates to,

http://localhost:8983/solr/collection1

But that is 404ing now (collection1 exists according to the admin UI, all shards are green and happy, and data dirs exist on the nodes).

I also tried the following,

http://localhost:8983/solr/collection1/update

And also received a 404 there.

A specific example from the Java client:

22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend - Error while flushing to Solr.
org.apache.solr.common.SolrException: Server at http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok status:404, message:Not Found
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438) ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44]

But I can hit that URL with a GET,

$ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">2</int></lst>
  <lst name="error"><str name="msg">missing content stream</str><int name="code">400</int></lst>
</response>

Thoughts?

Thanks.
Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?
I'm actually wondering if this other issue I've been having is a problem: https://issues.apache.org/jira/browse/SOLR-4321 The fact that some nodes don't "get" pieces of a collection could explain the 404. That said, even when a node has "parts" of a collection it reports 404 sometimes. What's odd is that I can use curl to post a JSON document to the same URL and it will return 200. When I log every request I make from my indexer process (using solr4j) it's about 50/50 between 404 and 200... On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner wrote: > I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm > trying out 4.1 RC1 right now but the update URL I used to use is returning > HTTP 404. > > For example, I would post my document updates to, > > http://localhost:8983/solr/collection1 > > But that is 404ing now (collection1 exists according to the admin UI, all > shards are green and happy, and data dirs exist on the nodes). > > I also tried the following, > > http://localhost:8983/solr/collection1/update > > And also received a 404 there. > > A specific example from the Java client: > > 22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend - > Error while flushing to Solr. > org.apache.solr.common.SolrException: Server at > http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok > status:404, message:Not Found > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) > ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) > ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] > at > org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438) > ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] > at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) > ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] > > But I can hit that URL with a GET, > > $ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update > > > 400 name="QTime">2missing content > stream400 > > > Thoughts? > > Thanks. >
Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?
So the ticket I created wasn't related, there is a working patch for that now but my original issue remains, I get 404 when trying to post updates to a URL that worked fine in Solr 4.0. On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote: > I'm actually wondering if this other issue I've been having is a problem: > > https://issues.apache.org/jira/browse/SOLR-4321 > > The fact that some nodes don't "get" pieces of a collection could explain > the 404. > > That said, even when a node has "parts" of a collection it reports 404 > sometimes. What's odd is that I can use curl to post a JSON document to the > same URL and it will return 200. > > When I log every request I make from my indexer process (using solr4j) > it's about 50/50 between 404 and 200... > > > On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner wrote: > >> I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm >> trying out 4.1 RC1 right now but the update URL I used to use is returning >> HTTP 404. >> >> For example, I would post my document updates to, >> >> http://localhost:8983/solr/collection1 >> >> But that is 404ing now (collection1 exists according to the admin UI, all >> shards are green and happy, and data dirs exist on the nodes). >> >> I also tried the following, >> >> http://localhost:8983/solr/collection1/update >> >> And also received a 404 there. >> >> A specific example from the Java client: >> >> 22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend - >> Error while flushing to Solr. >> org.apache.solr.common.SolrException: Server at >> http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok >> status:404, message:Not Found >> at >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) >> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >> at >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) >> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >> at >> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438) >> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >> at >> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) >> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >> >> But I can hit that URL with a GET, >> >> $ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update >> >> >> 400> name="QTime">2missing content >> stream400 >> >> >> Thoughts? >> >> Thanks. >> > >
Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?
Sorry, I take it back. It looks like fixing https://issues.apache.org/jira/browse/SOLR-4321 fixed my issue after all. On Sun, Jan 20, 2013 at 2:21 PM, Brett Hoerner wrote: > So the ticket I created wasn't related, there is a working patch for that > now but my original issue remains, I get 404 when trying to post updates to > a URL that worked fine in Solr 4.0. > > > On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote: > >> I'm actually wondering if this other issue I've been having is a problem: >> >> https://issues.apache.org/jira/browse/SOLR-4321 >> >> The fact that some nodes don't "get" pieces of a collection could explain >> the 404. >> >> That said, even when a node has "parts" of a collection it reports 404 >> sometimes. What's odd is that I can use curl to post a JSON document to the >> same URL and it will return 200. >> >> When I log every request I make from my indexer process (using solr4j) >> it's about 50/50 between 404 and 200... >> >> >> On Sat, Jan 19, 2013 at 5:22 PM, Brett Hoerner >> wrote: >> >>> I was using Solr 4.0 but ran into a few problems using SolrCloud. I'm >>> trying out 4.1 RC1 right now but the update URL I used to use is returning >>> HTTP 404. >>> >>> For example, I would post my document updates to, >>> >>> http://localhost:8983/solr/collection1 >>> >>> But that is 404ing now (collection1 exists according to the admin UI, >>> all shards are green and happy, and data dirs exist on the nodes). >>> >>> I also tried the following, >>> >>> http://localhost:8983/solr/collection1/update >>> >>> And also received a 404 there. >>> >>> A specific example from the Java client: >>> >>> 22:38:12.474 [pool-7-thread-14] ERROR com.massrel.faassolr.SolrBackend - >>> Error while flushing to Solr. >>> org.apache.solr.common.SolrException: Server at >>> http://backfill-2d.i.massrel.com:8983/solr/15724/update returned non ok >>> status:404, message:Not Found >>> at >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) >>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >>> at >>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) >>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >>> at >>> org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:438) >>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >>> at >>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) >>> ~[solr-solrj-4.0.0.jar:4.0.0 1394950 - rmuir - 2012-10-06 03:05:44] >>> >>> But I can hit that URL with a GET, >>> >>> $ curl http://backfill-1d.i.massrel.com:8983/solr/15724/update >>> >>> >>> 400>> name="QTime">2missing content >>> stream400 >>> >>> >>> Thoughts? >>> >>> Thanks. >>> >> >> >
Problem querying collection in Solr 4.1
I have a collection in Solr 4.1 RC1 and doing a simple query like text:"puppy dog" is causing an exception. Oddly enough, I CAN query for text:puppy or text:"puppy", but adding the space breaks everything. Schema and config: https://gist.github.com/f49da15e39e5609b75b1 This happens whether I query the whole collection or a single direct core. I haven't tested whether this would happen outside of SolrCloud. http://localhost:8984/solr/timeline/select?q=text%3A%22puppy+dog%22&wt=xml http://localhost:8984/solr/timeline_shard4_replica1/select?q=text%3A%22puppy+dog%22&wt=xml Jan 22, 2013 12:07:24 AM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[ http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2, http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1, http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2, http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[ http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2, http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1, http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2, http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1, http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2] at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:325) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:171) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.
Re: Problem querying collection in Solr 4.1
Thanks, I'll check that out. Turns out our problem was we had omitTermFreqAndPositions true but were running queries like "puppy dog" which, I would imagine, require position. On Mon, Jan 21, 2013 at 9:22 PM, Gopal Patwa wrote: > one thing I noticed in solrconfig xml that it set to use Lucene version 4.0 > index format but you mention you are using it 4.1 > > LUCENE_40 > > > > On Mon, Jan 21, 2013 at 4:26 PM, Brett Hoerner >wrote: > > > I have a collection in Solr 4.1 RC1 and doing a simple query like > > text:"puppy dog" is causing an exception. Oddly enough, I CAN query for > > text:puppy or text:"puppy", but adding the space breaks everything. > > > > Schema and config: https://gist.github.com/f49da15e39e5609b75b1 > > > > This happens whether I query the whole collection or a single direct > core. > > I haven't tested whether this would happen outside of SolrCloud. > > > > > http://localhost:8984/solr/timeline/select?q=text%3A%22puppy+dog%22&wt=xml > > > > > > > http://localhost:8984/solr/timeline_shard4_replica1/select?q=text%3A%22puppy+dog%22&wt=xml > > > > Jan 22, 2013 12:07:24 AM org.apache.solr.common.SolrException log > > SEVERE: null:org.apache.solr.common.SolrException: > > org.apache.solr.client.solrj.SolrServerException: No live SolrServers > > available to handle this request:[ > > > http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard2_replica1, > > > http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard1_replica2, > > > http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard3_replica2, > > > http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard4_replica1, > > > http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard1_replica1, > > > http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard2_replica2, > > > http://timelinesearch-1d.i.massrel.com:8983/solr/timeline_shard3_replica1, > > > http://timelinesearch-2d.i.massrel.com:8983/solr/timeline_shard4_replica2] > > at > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302) > > at > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) > > at > > > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448) > > at > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269) > > at > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) > > at > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > > at > > > > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) > > at > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) > > at > > > > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > > at > > > > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > > at > > > > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > > at > > > > > 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > > at > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > > at org.eclipse.jetty.server.Server.handle(Server.java:365) > > at > > > > > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) > > at > > > > > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > > at > > > > > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) > > at > > > > > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) > > at org.eclipse.jetty.http.HttpP
Is it possible to manually select a shard leader in a running SolrCloud?
Hi,

I have a 5 server cluster running 1 collection with 20 shards, replication factor of 2.

Earlier this week I had to do a rolling restart across the cluster, this worked great and the cluster stayed up the whole time. The problem is that the last node I restarted is now the leader of 0 shards, and is just holding replicas.

I've noticed this node has abnormally high load average, while the other nodes (who have the same number of shards, but more leaders on average) are fine.

First, I'm wondering if that load could be related to being a 5x replica and 0x leader?

Second, I was wondering if I could somehow flag single shards to re-elect a leader (or force a leader) so that I could more evenly distribute how many leader shards each physical server has running?

Thanks.
Re: Is it possible to manually select a shard leader in a running SolrCloud?
As an update, it looks like the heavy load is in part because the node never "catches back up" with the other nodes. In SolrCloud UI it was yellow for a long time, then eventually grey, then back to yellow and orange. It never recovers as green. I should note this collection is very busy, indexing 5k+ small documents per second, but the nodes were all fine until I had to restart them and they had to re-sync. Here is the log since reboot: https://gist.github.com/396af4b217ce8f536db6 Any ideas? On Sat, Feb 2, 2013 at 10:27 AM, Brett Hoerner wrote: > Hi, > > I have a 5 server cluster running 1 collection with 20 shards, replication > factor of 2. > > Earlier this week I had to do a rolling restart across the cluster, this > worked great and the cluster stayed up the whole time. The problem is that > the last node I restarted is now the leader of 0 shards, and is just > holding replicas. > > I've noticed this node has abnormally high load average, while the other > nodes (who have the same number of shards, but more leaders on average) are > fine. > > First, I'm wondering if that loud could be related to being a 5x replica > and 0x leader? > > Second, I was wondering if I could somehow flag single shards to re-elect > a leader (or force a leader) so that I could more evenly distribute how > many leader shards each physical server has running? > > Thanks. >
Re: Is it possible to manually select a shard leader in a running SolrCloud?
What is the inverse I'd use to re-create/load a core on another machine but make sure it's also "known" to SolrCloud/as a shard? On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale wrote: > > To be more clear lets say bob it the leader of core 1. On bob do a > /admin/cores?action=unload&name=core1. This removes the core/shard from > bob, giving the other servers a chance to grab leader props. > > -Joey > > On Feb 2, 2013, at 11:27 AM, Brett Hoerner wrote: > > > Hi, > > > > I have a 5 server cluster running 1 collection with 20 shards, > replication > > factor of 2. > > > > Earlier this week I had to do a rolling restart across the cluster, this > > worked great and the cluster stayed up the whole time. The problem is > that > > the last node I restarted is now the leader of 0 shards, and is just > > holding replicas. > > > > I've noticed this node has abnormally high load average, while the other > > nodes (who have the same number of shards, but more leaders on average) > are > > fine. > > > > First, I'm wondering if that loud could be related to being a 5x replica > > and 0x leader? > > > > Second, I was wondering if I could somehow flag single shards to > re-elect a > > leader (or force a leader) so that I could more evenly distribute how > many > > leader shards each physical server has running? > > > > Thanks. > >
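For future readers, the usual inverse is the CoreAdmin CREATE action, pointing the new core at the collection and shard it should join; a sketch with hypothetical names, run against the node that should host the replica:

# Register a new core on this node as a replica of an existing shard.
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&collection=mycollection&shard=shard1"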
Delete By Query suddenly halts indexing on SolrCloud cluster
I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, replication factor of 2) that I've been using for over a month now in production. Suddenly, the hourly cron I run that dispatches a delete by query completely halts all indexing. Select queries still run (and quickly), there is no CPU or disk I/O happening, but suddenly my indexer (which runs at ~400 doc/sec steady) pauses, and everything blocks indefinitely. To clarify some on the schema, this is a moving window of data (imagine messages that don't matter after a 24 hour period) which are regularly "chopped" off by my hourly cron (deleting messages over 24 hours old) to keep the index size reasonable. There are no errors (log level warn) in the logs. I'm not sure what to look into. As I've said this has been running (delete included) for about a month. I'll also note that I have another cluster much like this one where I do the very same thing... it has 4 machines, and indexes 10x the documents per second, with more indexes... and yet I delete on a cron without issue... Any ideas on where to start, or other information I could provide? Thanks much.
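For reference, a sketch of the kind of delete-by-query the hourly cron sends; the collection name, timestamp field, and cutoff value are hypothetical (the real cron would compute the cutoff for "24 hours ago"):

# Chop off all documents whose timestamp is older than the computed cutoff.
curl "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>created_at:[* TO 1362555600]</query></delete>'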
Re: Delete By Query suddenly halts indexing on SolrCloud cluster
4.1, I'll induce it again and run jstack. On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller wrote: > Which version of Solr? > > Can you use jconsole, visualvm, or jstack to get some stack traces and see > where things are halting? > > - Mark > > On Mar 6, 2013, at 11:45 AM, Brett Hoerner wrote: > > > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, > > replication factor of 2) that I've been using for over a month now in > > production. > > > > Suddenly, the hourly cron I run that dispatches a delete by query > > completely halts all indexing. Select queries still run (and quickly), > > there is no CPU or disk I/O happening, but suddenly my indexer (which > runs > > at ~400 doc/sec steady) pauses, and everything blocks indefinitely. > > > > To clarify some on the schema, this is a moving window of data (imagine > > messages that don't matter after a 24 hour period) which are regularly > > "chopped" off by my hourly cron (deleting messages over 24 hours old) to > > keep the index size reasonable. > > > > There are no errors (log level warn) in the logs. I'm not sure what to > look > > into. As I've said this has been running (delete included) for about a > > month. > > > > I'll also note that I have another cluster much like this one where I do > > the very same thing... it has 4 machines, and indexes 10x the documents > per > > second, with more indexes... and yet I delete on a cron without issue... > > > > Any ideas on where to start, or other information I could provide? > > > > Thanks much. > >
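Grabbing the traces is a one-liner once you have the Solr JVM's pid; a sketch, assuming Solr was launched from the stock example start.jar (the pgrep pattern is an assumption, adjust it if more than one JVM matches):

  # dump all thread stacks, including lock info, to a timestamped file
  jstack -l $(pgrep -f start.jar) > solr-threads-$(date +%Y%m%d-%H%M%S).txt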
Re: Delete By Query suddenly halts indexing on SolrCloud cluster
Here is a dump after the delete, indexing has been stopped: https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e An interesting hint that I forgot to mention: it doesn't always happen on the first delete. I manually ran the delete cron, and the server continued to work. I waited about 5 minutes and ran it again and it stalled the indexer (as seen from the indexer process): http://i.imgur.com/1Tt35u0.png Another thing I forgot to mention. To bring the cluster back to life I: 1) stop my indexer 2) stop server1, start server1 3) stop server2, start server2 4) manually rebalance half of the shards to be mastered on server2 (unload/create on server1) 5) restart indexer And it works again until a delete eventually kills it. To be clear again, select queries continue to work indefinitely. Thanks, Brett On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller wrote: > Which version of Solr? > > Can you use jconsole, visualvm, or jstack to get some stack traces and see > where things are halting? > > - Mark > > On Mar 6, 2013, at 11:45 AM, Brett Hoerner wrote: > > > I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, > > replication factor of 2) that I've been using for over a month now in > > production. > > > > Suddenly, the hourly cron I run that dispatches a delete by query > > completely halts all indexing. Select queries still run (and quickly), > > there is no CPU or disk I/O happening, but suddenly my indexer (which > runs > > at ~400 doc/sec steady) pauses, and everything blocks indefinitely. > > > > To clarify some on the schema, this is a moving window of data (imagine > > messages that don't matter after a 24 hour period) which are regularly > > "chopped" off by my hourly cron (deleting messages over 24 hours old) to > > keep the index size reasonable. > > > > There are no errors (log level warn) in the logs. I'm not sure what to > look > > into. As I've said this has been running (delete included) for about a > > month. > > > > I'll also note that I have another cluster much like this one where I do > > the very same thing... it has 4 machines, and indexes 10x the documents > per > > second, with more indexes... and yet I delete on a cron without issue... > > > > Any ideas on where to start, or other information I could provide? > > > > Thanks much. > >
Re: Delete By Query suddenly halts indexing on SolrCloud cluster
If there's anything I can try, let me know. Interestingly, I think I have noticed that if I stop my indexer, do my delete, and restart the indexer then I'm fine. Which goes along with the update thread contention theory. On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller wrote: > This is what I see: > > We currently limit the number of outstanding update requests at one time > to avoid a crazy number of threads being used. > > It looks like a bunch of update requests are stuck in socket reads and are > taking up the available threads. It looks like the deletes are hanging out > waiting for a free thread. > > It seems the question is, why are the requests stuck in socket reads. I > don't have an answer at the moment. > > We should probably get this into a JIRA issue though. > > - Mark > > > On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch > wrote: > > > It does not look like a deadlock, though it could be a distributed one. > Or > > it could be a livelock, though that's less likely. > > > > Here is what we used to recommend in similar situations for large Java > > systems (BEA Weblogic): > > 1) Do thread dump of both systems before anything. As simultaneous as you > > can make it. > > 2) Do the first delete. Do a thread dump every 2 minutes on both servers > > (so, say 3 dumps in that 5 minute wait) > > 3) Do the second delete and do thread dumps every 30 seconds on both > > servers from just before and then during. Preferably all the way until > the > > problem shows itself. Every 5 seconds if the problem shows itself really > > quick. > > > > That gives you a LOT of thread dumps. But it also gives you something > that > > allows to compare thread state before and after the problem starting > > showing itself and to identify moving (or unnaturally still) threads. I > > even wrote a tool long time ago that parsed those thread dumps > > automatically and generated pretty deadlock graphs of those. > > > > > > Regards, > > Alex. > > > > > > > > > > > > Personal blog: http://blog.outerthoughts.com/ > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > > - Time is the quality of nature that keeps events from happening all at > > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > > > > On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller > wrote: > > > >> Thans Brett, good stuff (though not a good problem). > >> > >> We def need to look into this. > >> > >> - Mark > >> > >> On Mar 6, 2013, at 1:53 PM, Brett Hoerner > wrote: > >> > >>> Here is a dump after the delete, indexing has been stopped: > >>> https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e > >>> > >>> An interesting hint that I forgot to mention: it doesn't always happen > on > >>> the first delete. I manually ran the delete cron, and the server > >> continued > >>> to work. I waited about 5 minutes and ran it again and it stalled the > >>> indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png > >>> > >>> Another thing I forgot to mention. To bring the cluster back to life I: > >>> > >>> 1) stop my indexer > >>> 2) stop server1, start server1 > >>> 3) stop server2, start start2 > >>> 4) manually rebalance half of the shards to be mastered on server2 > >>> (unload/create on server1) > >>> 5) restart indexer > >>> > >>> And it works again until a delete eventually kills it. > >>> > >>> To be clear again, select queries continue to work indefinitely. > >>> > >>> Thanks, > >>> Brett > >>> > >>> > >>> On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller > >> wrote: > >>> > >>>> Which version of Solr? 
> >>>> > >>>> Can you use jconsole, visualvm, or jstack to get some stack traces and > >> see > >>>> where things are halting? > >>>> > >>>> - Mark > >>>> > >>>> On Mar 6, 2013, at 11:45 AM, Brett Hoerner > >> wrote: > >>>> > >>>>> I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards, > >>>>> replication factor of 2) that I've been using for over a month now in > >>>>> production. > >>>>> > >>>>> Suddenly, the hourly cron I run t
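The dump-every-N-seconds routine Alexandre suggests is easy to script on each server; a sketch, again assuming the Solr JVM is found via start.jar (the interval and count are placeholders to match his cadence):

  # take a thread dump every 30 seconds while reproducing the delete, one file per dump
  for i in $(seq 1 20); do
    jstack -l $(pgrep -f start.jar) > threads-$(hostname)-$(date +%H%M%S)-$i.txt
    sleep 30
  done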
Re: Delete By Query suddenly halts indexing on SolrCloud cluster
Here is the other server when it's locked: https://gist.github.com/3529b7b6415756ead413 To be clear, neither is really "the replica", I have 32 shards and each physical server is the leader for 16, and the replica for 16. Also, related to the max threads hunch: my working cluster has many, many fewer shards per Solr instance. I'm going to do some migration dancing on this cluster today to have more Solr JVMs each with fewer cores, and see how it affects the deletes. On Wed, Mar 6, 2013 at 5:40 PM, Mark Miller wrote: > Any chance you can grab the stack trace of a replica as well? (also when > it's locked up of course). > > - Mark > > On Mar 6, 2013, at 3:34 PM, Brett Hoerner wrote: > > > If there's anything I can try, let me know. Interestingly, I think I have > > noticed that if I stop my indexer, do my delete, and restart the indexer > > then I'm fine. Which goes along with the update thread contention theory. > > > > > > On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller > wrote: > > > >> This is what I see: > >> > >> We currently limit the number of outstanding update requests at one time > >> to avoid a crazy number of threads being used. > >> > >> It looks like a bunch of update requests are stuck in socket reads and > are > >> taking up the available threads. It looks like the deletes are hanging > out > >> waiting for a free thread. > >> > >> It seems the question is, why are the requests stuck in socket reads. I > >> don't have an answer at the moment. > >> > >> We should probably get this into a JIRA issue though. > >> > >> - Mark > >> > >> > >> On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch > >> wrote: > >> > >>> It does not look like a deadlock, though it could be a distributed one. > >> Or > >>> it could be a livelock, though that's less likely. > >>> > >>> Here is what we used to recommend in similar situations for large Java > >>> systems (BEA Weblogic): > >>> 1) Do thread dump of both systems before anything. As simultaneous as > you > >>> can make it. > >>> 2) Do the first delete. Do a thread dump every 2 minutes on both > servers > >>> (so, say 3 dumps in that 5 minute wait) > >>> 3) Do the second delete and do thread dumps every 30 seconds on both > >>> servers from just before and then during. Preferably all the way until > >> the > >>> problem shows itself. Every 5 seconds if the problem shows itself > really > >>> quick. > >>> > >>> That gives you a LOT of thread dumps. But it also gives you something > >> that > >>> allows to compare thread state before and after the problem starting > >>> showing itself and to identify moving (or unnaturally still) threads. I > >>> even wrote a tool long time ago that parsed those thread dumps > >>> automatically and generated pretty deadlock graphs of those. > >>> > >>> > >>> Regards, > >>> Alex. > >>> > >>> > >>> > >>> > >>> > >>> Personal blog: http://blog.outerthoughts.com/ > >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > >>> - Time is the quality of nature that keeps events from happening all at > >>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > >>> > >>> > >>> On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller > >> wrote: > >>> > >>>> Thans Brett, good stuff (though not a good problem). > >>>> > >>>> We def need to look into this. 
> >>>> > >>>> - Mark > >>>> > >>>> On Mar 6, 2013, at 1:53 PM, Brett Hoerner > >> wrote: > >>>> > >>>>> Here is a dump after the delete, indexing has been stopped: > >>>>> https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e > >>>>> > >>>>> An interesting hint that I forgot to mention: it doesn't always > happen > >> on > >>>>> the first delete. I manually ran the delete cron, and the server > >>>> continued > >>>>> to work. I waited about 5 minutes and ran it again and it stalled the > >>>>> indexer (as seen from indexer process): > http://i.imgur.com/1Tt35u0.png > >>>>> > >>>&g
Re: Delete By Query suddenly halts indexing on SolrCloud cluster
As a side note, do you think that was a poor idea? I figured it's better to spread the master "load" around? On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller wrote: > > On Mar 7, 2013, at 9:03 AM, Brett Hoerner wrote: > > > To be clear, neither is really "the replica", I have 32 shards and each > > physical server is the leader for 16, and the replica for 16. > > Ah, interesting. That actually could be part of the issue - some brain > cells are firing. I'm away from home till this weekend, but I can try and > duplicate this when I get to my home base setup. > > - Mark
Re: Delete By Query suddenly halts indexing on SolrCloud cluster
As an update to this, I did my SolrCloud dance and made it 2xJVMs per machine (2 machines still, the same ones) and spread the load around. Each Solr instance now has 16 total shards (master for 8, replica for 8). *drum roll* ... I can repeatedly run my delete script and nothing breaks. :) On Thu, Mar 7, 2013 at 11:03 AM, Brett Hoerner wrote: > Here is the other server when it's locked: > https://gist.github.com/3529b7b6415756ead413 > > To be clear, neither is really "the replica", I have 32 shards and each > physical server is the leader for 16, and the replica for 16. > > Also, related to the max threads hunch: my working cluster has many, many > fewer shards per Solr instance. I'm going to do some migration dancing on > this cluster today to have more Solr JVMs each with fewer cores, and see > how it affects the deletes. > > > On Wed, Mar 6, 2013 at 5:40 PM, Mark Miller wrote: > >> Any chance you can grab the stack trace of a replica as well? (also when >> it's locked up of course). >> >> - Mark >> >> On Mar 6, 2013, at 3:34 PM, Brett Hoerner wrote: >> >> > If there's anything I can try, let me know. Interestingly, I think I >> have >> > noticed that if I stop my indexer, do my delete, and restart the indexer >> > then I'm fine. Which goes along with the update thread contention >> theory. >> > >> > >> > On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller >> wrote: >> > >> >> This is what I see: >> >> >> >> We currently limit the number of outstanding update requests at one >> time >> >> to avoid a crazy number of threads being used. >> >> >> >> It looks like a bunch of update requests are stuck in socket reads and >> are >> >> taking up the available threads. It looks like the deletes are hanging >> out >> >> waiting for a free thread. >> >> >> >> It seems the question is, why are the requests stuck in socket reads. I >> >> don't have an answer at the moment. >> >> >> >> We should probably get this into a JIRA issue though. >> >> >> >> - Mark >> >> >> >> >> >> On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch >> >> wrote: >> >> >> >>> It does not look like a deadlock, though it could be a distributed >> one. >> >> Or >> >>> it could be a livelock, though that's less likely. >> >>> >> >>> Here is what we used to recommend in similar situations for large Java >> >>> systems (BEA Weblogic): >> >>> 1) Do thread dump of both systems before anything. As simultaneous as >> you >> >>> can make it. >> >>> 2) Do the first delete. Do a thread dump every 2 minutes on both >> servers >> >>> (so, say 3 dumps in that 5 minute wait) >> >>> 3) Do the second delete and do thread dumps every 30 seconds on both >> >>> servers from just before and then during. Preferably all the way until >> >> the >> >>> problem shows itself. Every 5 seconds if the problem shows itself >> really >> >>> quick. >> >>> >> >>> That gives you a LOT of thread dumps. But it also gives you something >> >> that >> >>> allows to compare thread state before and after the problem starting >> >>> showing itself and to identify moving (or unnaturally still) threads. >> I >> >>> even wrote a tool long time ago that parsed those thread dumps >> >>> automatically and generated pretty deadlock graphs of those. >> >>> >> >>> >> >>> Regards, >> >>> Alex. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> Personal blog: http://blog.outerthoughts.com/ >> >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >> >>> - Time is the quality of nature that keeps events from happening all >> at >> >>> once. Lately, it doesn't seem to be working. 
(Anonymous - via GTD >> book) >> >>> >> >>> >> >>> On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller >> >> wrote: >> >>> >> >>>> Thans Brett, good stuff (though not a good problem). >> >>>> >> >>>> We def need to look into this. >> >>>> >> >>>> - Ma
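For what it's worth, running two smaller Solr 4.x JVMs per machine in the stock example layout just means a second Jetty instance on another port with its own solr home; a rough sketch with placeholder paths, ports, and ZooKeeper addresses (not the actual deployment):

  # first instance
  cd /opt/solr/node1/example && java -Djetty.port=8983 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar &
  # second instance on the same box, separate port and separate solr home/index directories
  cd /opt/solr/node2/example && java -Djetty.port=8984 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar &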