I tried running SOLR Cloud with the default number of shards (i.e. 1), and I get the same results.
On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <mpar...@apogeeintegration.com> wrote:

> Mark,
>
> Nothing appears to be wrong in the logs. I wiped the indexes and imported
> 37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
> has issues with the results being inconsistent.
>
> Let me run my setup by you, and see whether that is the issue?
>
> On one machine, I have three zookeeper instances, four solr instances, and
> a data directory for solr and zookeeper config data.
>
> Step 1. I modified each zoo.cfg configuration file to have:
>
> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
> ================
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk1_data
> clientPort=2181
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
> contents:
> ==============================================================
> 1
>
> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
> ==============
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk2_data
> clientPort=2182
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
> contents:
> ==============================================================
> 2
>
> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
> ================
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk3_data
> clientPort=2183
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
> contents:
> ====================================================
> 3
>
> Step 2 - SOLR Build
> ===============
>
> I pulled the latest SOLR trunk down.
> I built it with the following commands:
>
> ant example dist
>
> I modified the solr.war files and added the solr cell and extraction
> libraries to WEB-INF/lib. I couldn't get the extraction to work
> any other way. Will zookeeper pick up jar files stored with the rest of the
> configuration files in Zookeeper?
>
> I copied the contents of the example directory to each of my SOLR
> directories.
>
> Step 3 - Starting Zookeeper instances
> ===========================
>
> I ran the following commands to start the zookeeper instances:
>
> start .\zookeeper1\bin\zkServer.cmd
> start .\zookeeper2\bin\zkServer.cmd
> start .\zookeeper3\bin\zkServer.cmd
>
> Step 4 - Start Main SOLR instance
> ==========================
> I ran the following command to start the main SOLR instance:
>
> java -Djetty.port=8081 -Dhostport=8081
> -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> Starts up fine.
>
> Step 5 - Start the Remaining 3 SOLR Instances
> ==================================
> I ran the following commands to start the other 3 instances from their
> home directories:
>
> java -Djetty.port=8082 -Dhostport=8082
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> java -Djetty.port=8083 -Dhostport=8083
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> java -Djetty.port=8084 -Dhostport=8084
> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> All start up without issue.
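Once the four instances are up, one way to narrow down the inconsistency is to ask each node for its own document count with distrib=false (the parameter Mark mentions elsewhere in this thread) and compare the replicas of each shard. The following is only a rough sketch: the ports and the shard layout come from this thread, while the host name `localhost`, the `/solr/select` path, the `wt=json` response format and the helper names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Shard layout as described in this thread:
# 8081/8083 serve shard1, 8082/8084 serve shard2.
SHARDS = {"shard1": [8081, 8083], "shard2": [8082, 8084]}


def local_query_url(port, host="localhost", path="/solr/select"):
    """Build a q=*:* query answered by this node only (distrib=false).

    host and path are assumptions; adjust them to the actual install.
    """
    params = {"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"}
    return "http://%s:%d%s?%s" % (host, port, path, urlencode(params))


def local_count(port):
    """Return numFound as reported by the node on the given port alone."""
    with urlopen(local_query_url(port), timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]


def mismatched_shards(counts, shards=SHARDS):
    """Return the shards whose replicas report different document counts."""
    return {
        name: [counts[port] for port in ports]
        for name, ports in shards.items()
        if len({counts[port] for port in ports}) > 1
    }


def report():
    """Query every node and print per-node counts plus any mismatches."""
    counts = {port: local_count(port) for ports in SHARDS.values() for port in ports}
    print("per-node counts:", counts)
    print("shards with differing replicas:", mismatched_shards(counts))
```

Calling `report()` while the cluster is running prints one count per port. If the two nodes of a shard disagree, that would point at replication or a missing commit rather than at the query side.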
>
> Step 6 - Modified solrconfig.xml to have a custom request handler
> ===============================================
>
> <requestHandler name="/update/sharepoint" startup="lazy"
>     class="solr.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">sharepoint-pipeline</str>
>     <str name="fmap.content">text</str>
>     <str name="lowernames">true</str>
>     <str name="uprefix">ignored</str>
>     <str name="caputreAttr">true</str>
>     <str name="fmap.a">links</str>
>     <str name="fmap.div">ignored</str>
>   </lst>
> </requestHandler>
>
> <updateRequestProcessorChain name="sharepoint-pipeline">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">id</str>
>     <bool name="owerrightDupes">true</bool>
>     <str name="fields">url</str>
>     <str name="signatureClass">solr.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
>
>
> Hopefully this will shed some light on why my configuration is having
> issues.
>
> Thanks for your help.
>
> Matt
>
>
> On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> Hmm...this is very strange - there is nothing interesting in any of the
>> logs?
>>
>> In clusterstate.json, all of the shards have an active state?
>>
>>
>> There are quite a few of us doing exactly this setup recently, so there
>> must be something we are missing here...
>>
>> Any info you can offer might help.
>>
>> - Mark
>>
>> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>>
>> > Mark,
>> >
>> > I got the codebase from the 2/26/2012, and I got the same inconsistent
>> > results.
>> >
>> > I have solr running on four ports 8081-8084
>> >
>> > 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
>> >
>> > 8083 - is assigned to shard 1
>> > 8084 - is assigned to shard 2
>> >
>> > queries come in and sometimes it seems the windows from 8081 and 8083 move
>> > responding to the query but there are no results.
>> >
>> > if the queries run on 8081/8082 or 8081/8084 then results come back ok.
>> >
>> > The query is nothing more than: q=*:*
>> >
>> > Regards,
>> >
>> > Matt
>> >
>> >
>> > On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker <
>> > mpar...@apogeeintegration.com> wrote:
>> >
>> >> I'll have to check on the commit situation. We have been pushing data from
>> >> SharePoint the last week or so. Would that somehow block the documents
>> >> moving between the solr instances?
>> >>
>> >> I'll try another version tomorrow. Thanks for the suggestions.
>> >>
>> >> On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >>
>> >>> Hmmm...all of that looks pretty normal...
>> >>>
>> >>> Did a commit somehow fail on the other machine? When you view the stats
>> >>> for the update handler, are there a lot of pending adds for one of the
>> >>> nodes? Do the commit counts match across nodes?
>> >>>
>> >>> You can also query an individual node with distrib=false to check that.
>> >>>
>> >>> If your build is a month old, I'd honestly recommend you try upgrading as
>> >>> well.
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
>> >>>
>> >>>> Here is most of the cluster state:
>> >>>>
>> >>>> Connected to Zookeeper
>> >>>> localhost:2181, localhost: 2182, localhost:2183
>> >>>>
>> >>>> /(v=0 children=7) ""
>> >>>> /CONFIGS(v=0, children=1)
>> >>>>   /CONFIGURATION(v=0 children=25)
>> >>>>     <<<<< all the configuration files, velocity info, xslt, etc.
>> >>>>     >>>>
>> >>>> /NODE_STATES(v=0 children=4)
>> >>>>   MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1", "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>>   MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2", "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>>   MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1", "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>>   MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2", "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>> /ZOOKEEPER (v-0 children=1)
>> >>>>   QUOTA(v=0)
>> >>>>
>> >>>> /CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
>> >>>> /LIVE_NODES (v=0 children=4)
>> >>>>   MACHINE1:8083_SOLR(ephemeral v=0)
>> >>>>   MACHINE1:8082_SOLR(ephemeral v=0)
>> >>>>   MACHINE1:8081_SOLR(ephemeral v=0)
>> >>>>   MACHINE1:8084_SOLR(ephemeral v=0)
>> >>>> /COLLECTIONS (v=1 children=1)
>> >>>>   COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
>> >>>>     LEADER_ELECT(v=0 children=2)
>> >>>>       SHARD1(V=0 children=1)
>> >>>>         ELECTION(v=0 children=2)
>> >>>>           87186203314552835-MACHINE1:8081_SOLR_-N_0000000096(ephemeral v=0)
>> >>>>           87186203314552836-MACHINE1:8083_SOLR_-N_0000000084(ephemeral v=0)
>> >>>>       SHARD2(v=0 children=1)
>> >>>>         ELECTION(v=0 children=2)
>> >>>>           231301391392833539-MACHINE1:8084_SOLR_-N_0000000085(ephemeral v=0)
>> >>>>           159243797356740611-MACHINE1:8082_SOLR_-N_0000000084(ephemeral v=0)
>> >>>>     LEADERS (v=0 children=2)
>> >>>>       SHARD1 (ephemeral v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"http://MACHINE1:8081/solr"}"
>> >>>>       SHARD2 (ephemeral v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"http://MACHINE1:8082/solr"}"
>> >>>> /OVERSEER_ELECT (v=0 children=2)
>> >>>>   ELECTION (v=0 children=4)
>> >>>>     231301391392833539-MACHINE1:8084_SOLR_-N_0000000251(ephemeral v=0)
>> >>>>     87186203314552835-MACHINE1:8081_SOLR_-N_0000000248(ephemeral v=0)
>> >>>>     159243797356740611-MACHINE1:8082_SOLR_-N_0000000250(ephemeral v=0)
>> >>>>     87186203314552836-MACHINE1:8083_SOLR_-N_0000000249(ephemeral v=0)
>> >>>>   LEADER (emphemeral v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_000000248"}"
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>> >>>>>
>> >>>>>> Thanks for your reply Mark.
>> >>>>>>
>> >>>>>> I believe the build was towards the beginning of the month. The
>> >>>>>> solr.spec.version is 4.0.0.2012.01.10.38.09
>> >>>>>>
>> >>>>>> I cannot access the clusterstate.json contents. I clicked on it a couple of
>> >>>>>> times, but nothing happens. Is that stored on disk somewhere?
>> >>>>>
>> >>>>> Are you using the new admin UI? That has recently been updated to work
>> >>>>> better with cloud - it had some troubles not too long ago. If you are, you
>> >>>>> should try using the old admin UI's zookeeper page - that should show
>> >>>>> the cluster state.
>> >>>>>
>> >>>>> That being said, there have been a lot of bug fixes over the past month -
>> >>>>> so you may just want to update to a recent version.
>> >>>>>
>> >>>>>>
>> >>>>>> I configured a custom request handler to calculate a unique document id
>> >>>>>> based on the file's url.
>> >>>>>>
>> >>>>>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hey Matt - is your build recent?
>> >>>>>>>
>> >>>>>>> Can you visit the cloud/zookeeper page in the admin and send the contents
>> >>>>>>> of the clusterstate.json node?
>> >>>>>>>
>> >>>>>>> Are you using a custom index chain or anything out of the ordinary?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> - Mark
>> >>>>>>>
>> >>>>>>> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
>> >>>>>>>
>> >>>>>>>> TWIMC:
>> >>>>>>>>
>> >>>>>>>> Environment
>> >>>>>>>> =========
>> >>>>>>>> Apache SOLR rev-1236154
>> >>>>>>>> Apache Zookeeper 3.3.4
>> >>>>>>>> Windows 7
>> >>>>>>>> JDK 1.6.0_23.b05
>> >>>>>>>>
>> >>>>>>>> I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty
>> >>>>>>>> servers.
>> >>>>>>>>
>> >>>>>>>> I created a 3 node zookeeper ensemble to manage the solr configuration
>> >>>>>>>> data.
>> >>>>>>>>
>> >>>>>>>> All the instances run on one server so I've had to move ports around for
>> >>>>>>>> the various applications.
>> >>>>>>>>
>> >>>>>>>> I start the 3 zookeeper nodes.
>> >>>>>>>>
>> >>>>>>>> I started the first instance of solr cloud with the parameter to have two
>> >>>>>>>> shards.
>> >>>>>>>>
>> >>>>>>>> Then I started the remaining 3 solr nodes.
>> >>>>>>>>
>> >>>>>>>> The system comes up fine. No errors thrown.
>> >>>>>>>>
>> >>>>>>>> I can view the solr cloud console and I can see the SOLR configuration
>> >>>>>>>> files managed by ZooKeeper.
>> >>>>>>>>
>> >>>>>>>> I published data into the SOLR Cloud instances from SharePoint using Apache
>> >>>>>>>> Manifold 0.4-incubating. Manifold is set up to publish the data into
>> >>>>>>>> collection1, which is the only collection defined in the cluster.
>> >>>>>>>>
>> >>>>>>>> When I query the data from collection1 as per the solr wiki, the results
>> >>>>>>>> are inconsistent. Sometimes all the results are there, other times nothing
>> >>>>>>>> comes back at all.
>> >>>>>>>>
>> >>>>>>>> It seems to be having an issue auto replicating the data across the
>> >>>>>>>> cloud.
>> >>>>>>>>
>> >>>>>>>> Is there some specific setting I might have missed?
>> >>>>>>>> Based upon what I read, I thought that SOLR cloud would take care of
>> >>>>>>>> distributing and replicating the data automatically. Do you have to tell
>> >>>>>>>> it what shard to publish the data into as well?
>> >>>>>>>>
>> >>>>>>>> Any help would be appreciated.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>>
>> >>>>>>>> Matt
>> >>>>>>>>
>> >>>>>>>> ------------------------------
>> >>>>>>>> This e-mail and any files transmitted with it may be proprietary.
>> >>>>>>> Please note that any views or opinions presented in this e-mail are solely
>> >>>>>>> those of the author and do not necessarily represent those of Apogee
>> >>>>>>> Integration.
>> >>>>>>>
>> >>>>>>> - Mark Miller
>> >>>>>>> lucidimagination.com
>> >>>>>>
>> >>>>>> Matt