I tried running SOLR Cloud with the default number of shards (i.e. 1), and
I get the same results.

On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> Mark,
>
> Nothing appears to be wrong in the logs. I wiped the indexes and imported
> 37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
> has issues with the results being inconsistent.
>
> Let me run my setup by you to see whether that is the issue.
>
> On one machine, I have three zookeeper instances, four solr instances, and
> a data directory for solr and zookeeper config data.
>
> Step 1. I modified each zoo.cfg configuration file to have:
>
> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
> ================
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk1_data
> clientPort=2181
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
> contents:
> ==============================================================
> 1
>
> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
> ==============
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk2_data
> clientPort=2182
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
> contents:
> ==============================================================
> 2
>
> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
> ================
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=[DATA_DIRECTORY]/zk3_data
> clientPort=2183
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
> contents:
> ====================================================
> 3
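
The three zoo.cfg files above differ only in dataDir and clientPort, and each myid file must name one of the server.N entries. A quick sanity check of that invariant (a hypothetical helper for illustration, not part of ZooKeeper) might look like:

```python
# Sanity-check a ZooKeeper ensemble config: every zoo.cfg must list the
# same server.N lines, and each node's myid must match one of those N's.
def check_ensemble(cfgs, myids):
    """cfgs: list of zoo.cfg file contents; myids: list of myid contents."""
    server_sets = []
    for cfg in cfgs:
        servers = {line.split("=")[0].split(".")[1]
                   for line in cfg.splitlines()
                   if line.startswith("server.")}
        server_sets.append(servers)
    # All nodes must agree on the ensemble membership.
    assert all(s == server_sets[0] for s in server_sets), "server lists differ"
    # Each myid must name one declared server, with no duplicates.
    ids = [m.strip() for m in myids]
    assert len(set(ids)) == len(ids), "duplicate myid"
    assert all(i in server_sets[0] for i in ids), "myid not in server list"
    return True

cfg = """tickTime=2000
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
"""
print(check_ensemble([cfg, cfg, cfg], ["1", "2", "3"]))  # True
```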
>
> Step 2 - SOLR Build
> ===============
>
> I pulled the latest SOLR trunk down. I built it with the following
> commands:
>
>            ant example dist
>
> I modified the solr.war files and added the solr cell and extraction
> libraries to WEB-INF/lib. I couldn't get the extraction to work
> any other way. Will ZooKeeper pick up jar files stored with the rest of the
> configuration files in ZooKeeper?
>
> I copied the contents of the example directory to each of my SOLR
> directories.
>
> Step 3 - Starting Zookeeper instances
> ===========================
>
> I ran the following commands to start the zookeeper instances:
>
> start .\zookeeper1\bin\zkServer.cmd
> start .\zookeeper2\bin\zkServer.cmd
> start .\zookeeper3\bin\zkServer.cmd
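
The zkHost value used in the next step is just the comma-joined host:clientPort list of the ensemble. A trivial helper (hypothetical, for illustration only) to derive it from the zoo.cfg files:

```python
# Derive the -DzkHost connect string from the ensemble's zoo.cfg files:
# one host:clientPort pair per ZooKeeper node, comma-separated.
def zk_host_string(cfgs, host="localhost"):
    ports = []
    for cfg in cfgs:
        for line in cfg.splitlines():
            if line.startswith("clientPort="):
                ports.append(line.split("=", 1)[1].strip())
    return ",".join("%s:%s" % (host, p) for p in ports)

cfgs = ["clientPort=2181", "clientPort=2182", "clientPort=2183"]
print(zk_host_string(cfgs))  # localhost:2181,localhost:2182,localhost:2183
```

Listing all three members means a Solr node can still reach ZooKeeper if any single ensemble member is down.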
>
> Step 4 - Start Main SOLR instance
> ==========================
> I ran the following command to start the main SOLR instance
>
> java -Djetty.port=8081 -DhostPort=8081
> -Dbootstrap_confdir=[DATA_DIRECTORY]/solr/conf -DnumShards=2
> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> Starts up fine.
>
> Step 5 - Start the Remaining 3 SOLR Instances
> ==================================
> I ran the following commands to start the other 3 instances from their
> home directories:
>
> java -Djetty.port=8082 -DhostPort=8082
> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> java -Djetty.port=8083 -DhostPort=8083
> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> java -Djetty.port=8084 -DhostPort=8084
> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>
> All start up without issue.
>
> Step 6 - Modified solrconfig.xml to have a custom request handler
> ===============================================
>
> <requestHandler name="/update/sharepoint" startup="lazy"
> class="solr.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>      <str name="update.chain">sharepoint-pipeline</str>
>      <str name="fmap.content">text</str>
>      <str name="lowernames">true</str>
>      <str name="uprefix">ignored</str>
>      <str name="captureAttr">true</str>
>      <str name="fmap.a">links</str>
>      <str name="fmap.div">ignored</str>
>   </lst>
> </requestHandler>
>
> <updateRequestProcessorChain name="sharepoint-pipeline">
>    <processor class="solr.processor.SignatureUpdateProcessorFactory">
>       <bool name="enabled">true</bool>
>       <str name="signatureField">id</str>
>       <bool name="overwriteDupes">true</bool>
>       <str name="fields">url</str>
>       <str name="signatureClass">solr.processor.Lookup3Signature</str>
>    </processor>
>    <processor class="solr.LogUpdateProcessorFactory"/>
>    <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
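
For reference, the dedup chain above hashes the listed fields (here url) into the signatureField (here id), so a re-crawled document overwrites its earlier copy instead of duplicating it. A rough Python sketch of the idea, with plain SHA-1 standing in for Solr's Lookup3Signature (which is a different hash):

```python
import hashlib

# Rough sketch of signature-based dedup: documents whose signature fields
# hash to the same value overwrite each other (overwriteDupes=true).
def signature(doc, fields=("url",)):
    h = hashlib.sha1()
    for f in fields:
        h.update(str(doc.get(f, "")).encode("utf-8"))
    return h.hexdigest()

index = {}
for doc in [{"url": "http://sp/a.doc", "rev": 1},
            {"url": "http://sp/b.doc", "rev": 1},
            {"url": "http://sp/a.doc", "rev": 2}]:  # same url: overwrites
    index[signature(doc)] = doc

print(len(index))  # 2 - the duplicate url collapsed to one document
```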
>
>
> Hopefully this will shed some light on why my configuration is having
> issues.
>
> Thanks for your help.
>
> Matt
>
>
>
> On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> Hmm...this is very strange - there is nothing interesting in any of the
>> logs?
>>
>> In clusterstate.json, all of the shards have an active state?
>>
>>
>> There are quite a few of us doing exactly this setup recently, so there
>> must be something we are missing here...
>>
>> Any info you can offer might help.
>>
>> - Mark
>>
>> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>>
>> > Mark,
>> >
>> > I got the codebase from the 2/26/2012, and I got the same inconsistent
>> > results.
>> >
>> > I have solr running on four ports 8081-8084
>> >
>> > 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
>> >
>> > 8083 - is assigned to shard 1
>> > 8084 - is assigned to shard 2
>> >
>> > queries come in, and sometimes the console windows for 8081 and 8083 show
>> > them responding to the query, but there are no results.
>> >
>> > If the queries run on 8081/8082 or 8081/8084, then results come back OK.
>> >
>> > The query is nothing more than: q=*:*
>> >
>> > Regards,
>> >
>> > Matt
>> >
>> >
>> > On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker <
>> > mpar...@apogeeintegration.com> wrote:
>> >
>> >> I'll have to check on the commit situation. We have been pushing data
>> from
>> >> SharePoint the last week or so. Would that somehow block the documents
>> >> moving between the solr instances?
>> >>
>> >> I'll try another version tomorrow. Thanks for the suggestions.
>> >>
>> >> On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller <markrmil...@gmail.com
>> >wrote:
>> >>
>> >>> Hmmm...all of that looks pretty normal...
>> >>>
>> >>> Did a commit somehow fail on the other machine? When you view the
>> stats
>> >>> for the update handler, are there a lot of pending adds for one of the
>> >>> nodes? Do the commit counts match across nodes?
>> >>>
>> >>> You can also query an individual node with distrib=false to check
>> that.
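
A per-node check along those lines, using the ports from earlier in this thread (the URLs are only built here, not fetched):

```python
# Query each core directly with distrib=false so it reports only its own
# documents; if a replica's numFound lags its shard leader's, a failed
# commit or replication problem is the likely cause.
nodes = [8081, 8082, 8083, 8084]
urls = ["http://localhost:%d/solr/select?q=*:*&distrib=false&rows=0" % p
        for p in nodes]
for u in urls:
    print(u)  # fetch each and compare numFound across the nodes
```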
>> >>>
>> >>> If your build is a month old, I'd honestly recommend you try upgrading
>> as
>> >>> well.
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
>> >>>
>> >>>> Here is most of the cluster state:
>> >>>>
>> >>>> Connected to Zookeeper
>> >>>> localhost:2181, localhost:2182, localhost:2183
>> >>>>
>> >>>> /(v=0 children=7) ""
>> >>>>  /CONFIGS(v=0, children=1)
>> >>>>     /CONFIGURATION(v=0 children=25)
>> >>>>            <<<<< all the configuration files, velocity info, xslt,
>> etc.
>> >>>>>>>>
>> >>>> /NODE_STATES(v=0 children=4)
>> >>>>    MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1",
>> >>>>
>> "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>>    MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2",
>> >>>>
>> "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>>    MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1",
>> >>>>
>> "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>>    MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2",
>> >>>>
>> "state":"active","core":"","collection":"collection1","node_name:"..."
>> >>>> /ZOOKEEPER (v=0 children=1)
>> >>>>    QUOTA(v=0)
>> >>>>
>> >>>>
>> >>>
>> /CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
>> >>>> /LIVE_NODES (v=0 children=4)
>> >>>>    MACHINE1:8083_SOLR(ephemeral v=0)
>> >>>>    MACHINE1:8082_SOLR(ephemeral v=0)
>> >>>>    MACHINE1:8081_SOLR(ephemeral v=0)
>> >>>>    MACHINE1:8084_SOLR(ephemeral v=0)
>> >>>> /COLLECTIONS (v=1 children=1)
>> >>>>    COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
>> >>>>        LEADER_ELECT(v=0 children=2)
>> >>>>            SHARD1(V=0 children=1)
>> >>>>                ELECTION(v=0 children=2)
>> >>>>
>> >>>> 87186203314552835-MACHINE1:8081_SOLR_-N_0000000096(ephemeral v=0)
>> >>>>
>> >>>> 87186203314552836-MACHINE1:8083_SOLR_-N_0000000084(ephemeral v=0)
>> >>>>            SHARD2(v=0 children=1)
>> >>>>                ELECTION(v=0 children=2)
>> >>>>
>> >>>> 231301391392833539-MACHINE1:8084_SOLR_-N_0000000085(ephemeral v=0)
>> >>>>
>> >>>> 159243797356740611-MACHINE1:8082_SOLR_-N_0000000084(ephemeral v=0)
>> >>>>        LEADERS (v=0 children=2)
>> >>>>            SHARD1 (ephemeral
>> >>>> v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"
>> >>>> http://MACHINE1:8081/solr"}";
>> >>>>            SHARD2 (ephemeral
>> >>>> v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"
>> >>>> http://MACHINE1:8082/solr"}";
>> >>>> /OVERSEER_ELECT (v=0 children=2)
>> >>>>    ELECTION (v=0 children=4)
>> >>>>        231301391392833539-MACHINE1:8084_SOLR_-N_0000000251(ephemeral
>> >>> v=0)
>> >>>>        87186203314552835-MACHINE1:8081_SOLR_-N_0000000248(ephemeral
>> >>> v=0)
>> >>>>        159243797356740611-MACHINE1:8082_SOLR_-N_0000000250(ephemeral
>> >>> v=0)
>> >>>>        87186203314552836-MACHINE1:8083_SOLR_-N_0000000249(ephemeral
>> >>> v=0)
>> >>>>    LEADER (ephemeral
>> >>>> v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_000000248"}"
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller <markrmil...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>> >>>>>
>> >>>>>> Thanks for your reply Mark.
>> >>>>>>
>> >>>>>> I believe the build was towards the beginning of the month. The
>> >>>>>> solr.spec.version is 4.0.0.2012.01.10.38.09
>> >>>>>>
>> >>>>>> I cannot access the clusterstate.json contents. I clicked on it a
>> >>> couple
>> >>>>> of
>> >>>>>> times, but nothing happens. Is that stored on disk somewhere?
>> >>>>>
>> >>>>> Are you using the new admin UI? That has recently been updated to
>> work
>> >>>>> better with cloud - it had some troubles not too long ago. If you
>> are,
>> >>> you
>> >>>>> should try using the old admin UI's zookeeper page - that should
>> >>> show
>> >>>>> the cluster state.
>> >>>>>
>> >>>>> That being said, there have been a lot of bug fixes over the past
>> month
>> >>> -
>> >>>>> so you may just want to update to a recent version.
>> >>>>>
>> >>>>>>
>> >>>>>> I configured a custom request handler to calculate a unique
>> document
>> >>> id
>> >>>>>> based on the file's url.
>> >>>>>>
>> >>>>>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller <
>> markrmil...@gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>> Hey Matt - is your build recent?
>> >>>>>>>
>> >>>>>>> Can you visit the cloud/zookeeper page in the admin and send the
>> >>>>> contents
>> >>>>>>> of the clusterstate.json node?
>> >>>>>>>
>> >>>>>>> Are you using a custom index chain or anything out of the
>> ordinary?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> - Mark
>> >>>>>>>
>> >>>>>>> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
>> >>>>>>>
>> >>>>>>>> TWIMC:
>> >>>>>>>>
>> >>>>>>>> Environment
>> >>>>>>>> =========
>> >>>>>>>> Apache SOLR rev-1236154
>> >>>>>>>> Apache Zookeeper 3.3.4
>> >>>>>>>> Windows 7
>> >>>>>>>> JDK 1.6.0_23.b05
>> >>>>>>>>
>> >>>>>>>> I have built a SOLR Cloud instance with 4 nodes using the embedded
>> >>> Jetty
>> >>>>>>>> servers.
>> >>>>>>>>
>> >>>>>>>> I created a 3 node zookeeper ensemble to manage the solr
>> >>> configuration
>> >>>>>>> data.
>> >>>>>>>>
>> >>>>>>>> All the instances run on one server so I've had to move ports
>> around
>> >>>>> for
>> >>>>>>>> the various applications.
>> >>>>>>>>
>> >>>>>>>> I start the 3 zookeeper nodes.
>> >>>>>>>>
>> >>>>>>>> I started the first instance of solr cloud with the parameter to
>> >>> have
>> >>>>> two
>> >>>>>>>> shards.
>> >>>>>>>>
>> >>>>>>>> Then I start the remaining 3 solr nodes.
>> >>>>>>>>
>> >>>>>>>> The system comes up fine. No errors thrown.
>> >>>>>>>>
>> >>>>>>>> I can view the solr cloud console and I can see the SOLR
>> >>> configuration
>> >>>>>>>> files managed by ZooKeeper.
>> >>>>>>>>
>> >>>>>>>> I published data into the SOLR Cloud instances from SharePoint
>> using
>> >>>>>>> Apache
>> >>>>>>>> Manifold 0.4-incubating. Manifold is setup to publish the data
>> into
>> >>>>>>>> collection1, which is the only collection defined in the cluster.
>> >>>>>>>>
>> >>>>>>>> When I query the data from collection1 as per the solr wiki, the
>> >>>>> results
>> >>>>>>>> are inconsistent. Sometimes all the results are there, other
>> times
>> >>>>>>> nothing
>> >>>>>>>> comes back at all.
>> >>>>>>>>
>> >>>>>>>> It seems to be having an issue auto replicating the data across
>> the
>> >>>>>>> cloud.
>> >>>>>>>>
>> >>>>>>>> Is there some specific setting I might have missed? Based upon
>> what
>> >>> I
>> >>>>>>> read,
>> >>>>>>>> I thought that SOLR cloud would take care of distributing and
>> >>>>> replicating
>> >>>>>>>> the data automatically. Do you have to tell it what shard to
>> publish
>> >>>>> the
>> >>>>>>>> data into as well?
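
For background on that last question: SolrCloud routes each document to a shard by hashing its uniqueKey, so the indexer does not choose a shard. A simplified sketch of the idea, with CRC32 standing in for Solr's actual hash function:

```python
import zlib

# Sketch: a stable hash of the document's uniqueKey picks the shard, so
# the client never specifies one. (Solr's real hash function differs.)
def shard_for(doc_id, num_shards=2):
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

print(shard_for("http://sp/a.doc"))  # the same id always maps to one shard
```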
>> >>>>>>>>
>> >>>>>>>> Any help would be appreciated.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>>
>> >>>>>>>> Matt
>> >>>>>>>>
>> >>>>>>>> ------------------------------
>> >>>>>>>> This e-mail and any files transmitted with it may be proprietary.
>> >>>>>>> Please note that any views or opinions presented in this e-mail
>> are
>> >>>>> solely
>> >>>>>>> those of the author and do not necessarily represent those of
>> Apogee
>> >>>>>>> Integration.
>> >>>>>>>
>> >>>>>>> - Mark Miller
>> >>>>>>> lucidimagination.com
>> >>>>>>>
>> >>>>>>
>> >>>>>> Matt
>> >>>>>>
>> >>>>>
>> >>>>> - Mark Miller
>> >>>>> lucidimagination.com
>> >>>>>
>> >>>>
>> >>>
>> >>> - Mark Miller
>> >>> lucidimagination.com
>> >>>
>> >>
>> >
>>
>> - Mark Miller
>> lucidimagination.com
>>
>

