Mark/Sami

I ran the system with 3 zookeeper nodes, 2 solr cloud nodes, and left
numShards set to its default value (i.e. 1)

It looks like it finally synced with the other node after quite a while, but
it's throwing lots of errors like the following:

org.apache.solr.common.SolrException: missing _version_ on update from leader
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:712)
....
....
....
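
One thing I still need to rule out on my end is whether my config has
everything SolrCloud expects for versioning. From what I understand, the
schema needs a _version_ field defined and the update log enabled in
solrconfig.xml, roughly:

<field name="_version_" type="long" indexed="true" stored="true"/>

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>  <!-- this goes inside the <updateHandler> section -->

If either of those is missing from my configs, that could explain the error
above.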

Is it normal for a node to sync long after the documents were sent for indexing?

I'll have to check whether the 4-node, 2-shard Solr setup works after waiting
for the system to sync.

Regards,

Matt

On Wed, Feb 29, 2012 at 12:03 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> I also took out my requestHandler and used the standard /update/extract
> handler. Same result.
>
> On Wed, Feb 29, 2012 at 11:47 AM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
>
>> I tried running SOLR Cloud with the default number of shards (i.e. 1),
>> and I get the same results.
>>
>> On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
>> mpar...@apogeeintegration.com> wrote:
>>
>>> Mark,
>>>
>>> Nothing appears to be wrong in the logs. I wiped the indexes and
>>> imported 37 files from SharePoint using Manifold. All 37 make it in, but
>>> SOLR query results are still inconsistent.
>>>
>>> Let me run my setup by you and see whether that is the issue.
>>>
>>> On one machine, I have three zookeeper instances, four solr instances,
>>> and a data directory for solr and zookeeper config data.
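>>>
>>> Roughly, the working directory is laid out as below (the solr1-solr4 names
>>> are just what I call my copies of the example app, nothing Solr requires):
>>>
>>> .\zookeeper1\   .\zookeeper2\   .\zookeeper3\
>>> .\solr1\   .\solr2\   .\solr3\   .\solr4\
>>> .\[DATA_DIRECTORY]\   (zk1_data, zk2_data, zk3_data, solr config data)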
>>>
>>> Step 1 - Configure the ZooKeeper instances
>>> ================
>>>
>>> I modified each zoo.cfg configuration file as follows:
>>>
>>> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
>>> ================
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=[DATA_DIRECTORY]/zk1_data
>>> clientPort=2181
>>> server.1=localhost:2888:3888
>>> server.2=localhost:2889:3889
>>> server.3=localhost:2890:3890
>>>
>>> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
>>> contents:
>>> ==============================================================
>>> 1
>>>
>>> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
>>> ==============
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=[DATA_DIRECTORY]/zk2_data
>>> clientPort=2182
>>> server.1=localhost:2888:3888
>>> server.2=localhost:2889:3889
>>> server.3=localhost:2890:3890
>>>
>>> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
>>> contents:
>>> ==============================================================
>>> 2
>>>
>>> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
>>> ================
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=[DATA_DIRECTORY]/zk3_data
>>> clientPort=2183
>>> server.1=localhost:2888:3888
>>> server.2=localhost:2889:3889
>>> server.3=localhost:2890:3890
>>>
>>> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
>>> contents:
>>> ====================================================
>>> 3
>>>
>>> Step 2 - SOLR Build
>>> ===============
>>>
>>> I pulled down the latest SOLR trunk and built it with the following
>>> command:
>>>
>>>            ant example dist
>>>
>>> I modified the solr.war files and added the Solr Cell and extraction
>>> libraries to WEB-INF/lib. I couldn't get the extraction to work any other
>>> way. Will ZooKeeper pick up jar files stored with the rest of the
>>> configuration files in ZooKeeper?
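>>>
>>> As an alternative to repacking the war, I assume I could reference the
>>> extraction jars from solrconfig.xml with <lib> directives the way the
>>> stock example config does (the relative paths here assume the stock
>>> example layout):
>>>
>>> <lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
>>> <lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
>>>
>>> but I haven't verified that against my directory layout yet.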
>>>
>>> I copied the contents of the example directory to each of my SOLR
>>> directories.
>>>
>>> Step 3 - Starting Zookeeper instances
>>> ===========================
>>>
>>> I ran the following commands to start the zookeeper instances:
>>>
>>> start .\zookeeper1\bin\zkServer.cmd
>>> start .\zookeeper2\bin\zkServer.cmd
>>> start .\zookeeper3\bin\zkServer.cmd
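>>>
>>> Before starting Solr, the ensemble can be sanity-checked by connecting
>>> with the CLI client that ships with ZooKeeper, e.g.:
>>>
>>> .\zookeeper1\bin\zkCli.cmd -server localhost:2181
>>>
>>> and confirming it connects without errors.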
>>>
>>> Step 4 - Start Main SOLR instance
>>> ==========================
>>> I ran the following command to start the main SOLR instance:
>>>
>>> java -Djetty.port=8081 -Dhostport=8081
>>> -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> Starts up fine.
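>>>
>>> One thing I still want to double-check against the SolrCloud wiki: the
>>> examples there spell these properties -Dbootstrap_confdir and -DnumShards,
>>> and Java system properties are case-sensitive, so I need to confirm mine
>>> are actually being picked up, i.e. something like:
>>>
>>> java -Djetty.port=8081 -Dhostport=8081
>>> -Dbootstrap_confdir=[DATA_DIRECTORY]/solr/conf -DnumShards=2
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar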
>>>
>>> Step 5 - Start the Remaining 3 SOLR Instances
>>> ==================================
>>> I ran the following commands to start the other 3 instances from their
>>> home directories:
>>>
>>> java -Djetty.port=8082 -Dhostport=8082
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> java -Djetty.port=8083 -Dhostport=8083
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> java -Djetty.port=8084 -Dhostport=8084
>>> -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>>
>>> All start up without issue.
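>>>
>>> To see what each core actually holds, independent of distributed search,
>>> I can also hit each instance directly with distrib=false, e.g.:
>>>
>>> http://localhost:8081/solr/select?q=*:*&distrib=false&rows=0
>>> http://localhost:8082/solr/select?q=*:*&distrib=false&rows=0
>>>
>>> and compare numFound across the four ports.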
>>>
>>> Step 6 - Modified solrconfig.xml to have a custom request handler
>>> ===============================================
>>>
>>> <requestHandler name="/update/sharepoint" startup="lazy"
>>> class="solr.extraction.ExtractingRequestHandler">
>>>   <lst name="defaults">
>>>      <str name="update.chain">sharepoint-pipeline</str>
>>>      <str name="fmap.content">text</str>
>>>      <str name="lowernames">true</str>
>>>      <str name="uprefix">ignored</str>
>>>      <str name="captureAttr">true</str>
>>>      <str name="fmap.a">links</str>
>>>      <str name="fmap.div">ignored</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>> <updateRequestProcessorChain name="sharepoint-pipeline">
>>>    <processor class="solr.processor.SignatureUpdateProcessorFactory">
>>>       <bool name="enabled">true</bool>
>>>       <str name="signatureField">id</str>
>>>       <bool name="overwriteDupes">true</bool>
>>>       <str name="fields">url</str>
>>>       <str name="signatureClass">solr.processor.Lookup3Signature</str>
>>>    </processor>
>>>    <processor class="solr.LogUpdateProcessorFactory"/>
>>>    <processor class="solr.RunUpdateProcessorFactory"/>
>>> </updateRequestProcessorChain>
>>>
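>>> If it helps, the handler can be exercised by hand (outside Manifold) with
>>> something like the following; the file name and the url literal are just
>>> placeholders:
>>>
>>> curl "http://localhost:8081/solr/update/sharepoint?literal.url=http://sp.example/doc1&commit=true"
>>>      -F "myfile=@testdoc.docx"
>>>
>>> which should push the document through the sharepoint-pipeline chain above.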
>>>
>>> Hopefully this will shed some light on why my configuration is having
>>> issues.
>>>
>>> Thanks for your help.
>>>
>>> Matt
>>>
>>>
>>>
>>> On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller <markrmil...@gmail.com>wrote:
>>>
>>>> Hmm...this is very strange - there is nothing interesting in any of the
>>>> logs?
>>>>
>>>> In clusterstate.json, all of the shards have an active state?
>>>>
>>>>
>>>> There are quite a few of us doing exactly this setup recently, so there
>>>> must be something we are missing here...
>>>>
>>>> Any info you can offer might help.
>>>>
>>>> - Mark
>>>>
>>>> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>>>>
>>>> > Mark,
>>>> >
>>>> > I got the codebase from the 2/26/2012, and I got the same inconsistent
>>>> > results.
>>>> >
>>>> > I have solr running on four ports 8081-8084
>>>> >
>>>> > 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
>>>> >
>>>> > 8083 - is assigned to shard 1
>>>> > 8084 - is assigned to shard 2
>>>> >
>>>> > Queries come in, and sometimes the console windows for 8081 and 8083 show
>>>> > them responding to the query, but there are no results.
>>>> >
>>>> > If the queries run on 8081/8082 or 8081/8084, then results come back OK.
>>>> >
>>>> > The query is nothing more than: q=*:*
>>>> >
>>>> > Regards,
>>>> >
>>>> > Matt
>>>> >
>>>> >
>>>> > On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker <
>>>> > mpar...@apogeeintegration.com> wrote:
>>>> >
>>>> >> I'll have to check on the commit situation. We have been pushing
>>>> data from
>>>> >> SharePoint the last week or so. Would that somehow block the
>>>> documents
>>>> >> moving between the solr instances?
>>>> >>
>>>> >> I'll try another version tomorrow. Thanks for the suggestions.
>>>> >>
>>>> >> On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller <markrmil...@gmail.com
>>>> >wrote:
>>>> >>
>>>> >>> Hmmm...all of that looks pretty normal...
>>>> >>>
>>>> >>> Did a commit somehow fail on the other machine? When you view the
>>>> stats
>>>> >>> for the update handler, are there a lot of pending adds for one of
>>>> the
>>>> >>> nodes? Do the commit counts match across nodes?
>>>> >>>
>>>> >>> You can also query an individual node with distrib=false to check
>>>> that.
>>>> >>>
>>>> >>> If your build is a month old, I'd honestly recommend you try
>>>> upgrading as
>>>> >>> well.
>>>> >>>
>>>> >>> - Mark
>>>> >>>
>>>> >>> On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
>>>> >>>
>>>> >>>> Here is most of the cluster state:
>>>> >>>>
>>>> >>>> Connected to Zookeeper
>>>> >>>> localhost:2181, localhost:2182, localhost:2183
>>>> >>>>
>>>> >>>> /(v=0 children=7) ""
>>>> >>>>  /CONFIGS(v=0, children=1)
>>>> >>>>     /CONFIGURATION(v=0 children=25)
>>>> >>>>            <<<<< all the configuration files, velocity info, xslt,
>>>> etc.
>>>> >>>>>>>>
>>>> >>>> /NODE_STATES(v=0 children=4)
>>>> >>>>    MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1",
>>>> >>>>
>>>> "state":"active","core":"","collection":"collection1","node_name:"..."
>>>> >>>>    MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2",
>>>> >>>>
>>>> "state":"active","core":"","collection":"collection1","node_name:"..."
>>>> >>>>    MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1",
>>>> >>>>
>>>> "state":"active","core":"","collection":"collection1","node_name:"..."
>>>> >>>>    MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2",
>>>> >>>>
>>>> "state":"active","core":"","collection":"collection1","node_name:"..."
>>>> >>>> /ZOOKEEPER (v=0 children=1)
>>>> >>>>    QUOTA(v=0)
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> /CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
>>>> >>>> /LIVE_NODES (v=0 children=4)
>>>> >>>>    MACHINE1:8083_SOLR(ephemeral v=0)
>>>> >>>>    MACHINE1:8082_SOLR(ephemeral v=0)
>>>> >>>>    MACHINE1:8081_SOLR(ephemeral v=0)
>>>> >>>>    MACHINE1:8084_SOLR(ephemeral v=0)
>>>> >>>> /COLLECTIONS (v=1 children=1)
>>>> >>>>    COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
>>>> >>>>        LEADER_ELECT(v=0 children=2)
>>>> >>>>            SHARD1(V=0 children=1)
>>>> >>>>                ELECTION(v=0 children=2)
>>>> >>>>
>>>> >>>> 87186203314552835-MACHINE1:8081_SOLR_-N_0000000096(ephemeral v=0)
>>>> >>>>
>>>> >>>> 87186203314552836-MACHINE1:8083_SOLR_-N_0000000084(ephemeral v=0)
>>>> >>>>            SHARD2(v=0 children=1)
>>>> >>>>                ELECTION(v=0 children=2)
>>>> >>>>
>>>> >>>> 231301391392833539-MACHINE1:8084_SOLR_-N_0000000085(ephemeral v=0)
>>>> >>>>
>>>> >>>> 159243797356740611-MACHINE1:8082_SOLR_-N_0000000084(ephemeral v=0)
>>>> >>>>        LEADERS (v=0 children=2)
>>>> >>>>            SHARD1 (ephemeral
>>>> >>>> v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"
>>>> >>>> http://MACHINE1:8081/solr"}";
>>>> >>>>            SHARD2 (ephemeral
>>>> >>>> v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"
>>>> >>>> http://MACHINE1:8082/solr"}";
>>>> >>>> /OVERSEER_ELECT (v=0 children=2)
>>>> >>>>    ELECTION (v=0 children=4)
>>>> >>>>
>>>>  231301391392833539-MACHINE1:8084_SOLR_-N_0000000251(ephemeral
>>>> >>> v=0)
>>>> >>>>        87186203314552835-MACHINE1:8081_SOLR_-N_0000000248(ephemeral
>>>> >>> v=0)
>>>> >>>>
>>>>  159243797356740611-MACHINE1:8082_SOLR_-N_0000000250(ephemeral
>>>> >>> v=0)
>>>> >>>>        87186203314552836-MACHINE1:8083_SOLR_-N_0000000249(ephemeral
>>>> >>> v=0)
>>>> >>>>    LEADER (ephemeral
>>>> >>>> v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_000000248"}"
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller <
>>>> markrmil...@gmail.com>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>>>
>>>> >>>>> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>>>> >>>>>
>>>> >>>>>> Thanks for your reply Mark.
>>>> >>>>>>
>>>> >>>>>> I believe the build was towards the beginning of the month. The
>>>> >>>>>> solr.spec.version is 4.0.0.2012.01.10.38.09
>>>> >>>>>>
>>>> >>>>>> I cannot access the clusterstate.json contents. I clicked on it a
>>>> >>> couple
>>>> >>>>> of
>>>> >>>>>> times, but nothing happens. Is that stored on disk somewhere?
>>>> >>>>>
>>>> >>>>> Are you using the new admin UI? That has recently been updated to
>>>> work
>>>> >>>>> better with cloud - it had some troubles not too long ago. If you
>>>> are,
>>>> >>> you
>>>> >>>>> should try using the old admin UI's zookeeper page - that
>>>> should
>>>> >>> show
>>>> >>>>> the cluster state.
>>>> >>>>>
>>>> >>>>> That being said, there has been a lot of bug fixes over the past
>>>> month
>>>> >>> -
>>>> >>>>> so you may just want to update to a recent version.
>>>> >>>>>
>>>> >>>>>>
>>>> >>>>>> I configured a custom request handler to calculate an unique
>>>> document
>>>> >>> id
>>>> >>>>>> based on the file's url.
>>>> >>>>>>
>>>> >>>>>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller <
>>>> markrmil...@gmail.com>
>>>> >>>>> wrote:
>>>> >>>>>>
>>>> >>>>>>> Hey Matt - is your build recent?
>>>> >>>>>>>
>>>> >>>>>>> Can you visit the cloud/zookeeper page in the admin and send the
>>>> >>>>> contents
>>>> >>>>>>> of the clusterstate.json node?
>>>> >>>>>>>
>>>> >>>>>>> Are you using a custom index chain or anything out of the
>>>> ordinary?
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> - Mark
>>>> >>>>>>>
>>>> >>>>>>> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
>>>> >>>>>>>
>>>> >>>>>>>> TWIMC:
>>>> >>>>>>>>
>>>> >>>>>>>> Environment
>>>> >>>>>>>> =========
>>>> >>>>>>>> Apache SOLR rev-1236154
>>>> >>>>>>>> Apache Zookeeper 3.3.4
>>>> >>>>>>>> Windows 7
>>>> >>>>>>>> JDK 1.6.0_23.b05
>>>> >>>>>>>>
>>>> >>>>>>>> I have built a SOLR Cloud instance with 4 nodes using the
>>>> embedded
>>>> >>> Jetty
>>>> >>>>>>>> servers.
>>>> >>>>>>>>
>>>> >>>>>>>> I created a 3 node zookeeper ensemble to manage the solr
>>>> >>> configuration
>>>> >>>>>>> data.
>>>> >>>>>>>>
>>>> >>>>>>>> All the instances run on one server so I've had to move ports
>>>> around
>>>> >>>>> for
>>>> >>>>>>>> the various applications.
>>>> >>>>>>>>
>>>> >>>>>>>> I started the 3 zookeeper nodes.
>>>> >>>>>>>>
>>>> >>>>>>>> I started the first instance of solr cloud with the parameter
>>>> to
>>>> >>> have
>>>> >>>>> two
>>>> >>>>>>>> shards.
>>>> >>>>>>>>
>>>> >>>>>>>> I then started the remaining 3 solr nodes.
>>>> >>>>>>>>
>>>> >>>>>>>> The system comes up fine. No errors thrown.
>>>> >>>>>>>>
>>>> >>>>>>>> I can view the solr cloud console and I can see the SOLR
>>>> >>> configuration
>>>> >>>>>>>> files managed by ZooKeeper.
>>>> >>>>>>>>
>>>> >>>>>>>> I published data into the SOLR Cloud instances from SharePoint
>>>> using
>>>> >>>>>>> Apache
>>>> >>>>>>>> Manifold 0.4-incubating. Manifold is set up to publish the data
>>>> into
>>>> >>>>>>>> collection1, which is the only collection defined in the
>>>> cluster.
>>>> >>>>>>>>
>>>> >>>>>>>> When I query the data from collection1 as per the solr wiki,
>>>> the
>>>> >>>>> results
>>>> >>>>>>>> are inconsistent. Sometimes all the results are there, other
>>>> times
>>>> >>>>>>> nothing
>>>> >>>>>>>> comes back at all.
>>>> >>>>>>>>
>>>> >>>>>>>> It seems to be having an issue auto replicating the data
>>>> across the
>>>> >>>>>>> cloud.
>>>> >>>>>>>>
>>>> >>>>>>>> Is there some specific setting I might have missed? Based upon
>>>> what
>>>> >>> I
>>>> >>>>>>> read,
>>>> >>>>>>>> I thought that SOLR cloud would take care of distributing and
>>>> >>>>> replicating
>>>> >>>>>>>> the data automatically. Do you have to tell it what shard to
>>>> publish
>>>> >>>>> the
>>>> >>>>>>>> data into as well?
>>>> >>>>>>>>
>>>> >>>>>>>> Any help would be appreciated.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks,
>>>> >>>>>>>>
>>>> >>>>>>>> Matt
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> - Mark Miller
>>>> >>>>>>> lucidimagination.com
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>> Matt
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>> - Mark Miller
>>>> >>>>> lucidimagination.com
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>> - Mark Miller
>>>> >>> lucidimagination.com
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>> - Mark Miller
>>>> lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

