I also took out my requestHandler and used the standard /update/extract
handler. Same result.
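
For reference, the request I've been testing with looks roughly like this
(the doc id and file name are just examples):

   curl "http://localhost:8081/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@testdoc.docx"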

On Wed, Feb 29, 2012 at 11:47 AM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> I tried running SOLR Cloud with the default number of shards (i.e. 1), and
> I get the same results.
>
> On Wed, Feb 29, 2012 at 10:46 AM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
>
>> Mark,
>>
>> Nothing appears to be wrong in the logs. I wiped the indexes and imported
>> 37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
>> has issues with the results being inconsistent.
>>
>> Let me run my setup by you to see whether that is the issue.
>>
>> On one machine, I have three zookeeper instances, four solr instances,
>> and a data directory for solr and zookeeper config data.
>>
>> Step 1. I modified each zoo.cfg configuration file to have:
>>
>> Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg
>> ================
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> dataDir=[DATA_DIRECTORY]/zk1_data
>> clientPort=2181
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>>
>> Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
>> contents:
>> ==============================================================
>> 1
>>
>> Zookeeper 2 - Create /zookeeper2/conf/zoo.cfg
>> ==============
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> dataDir=[DATA_DIRECTORY]/zk2_data
>> clientPort=2182
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>>
>> Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
>> contents:
>> ==============================================================
>> 2
>>
>> Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg
>> ================
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> dataDir=[DATA_DIRECTORY]/zk3_data
>> clientPort=2183
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>>
>> Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
>> contents:
>> ====================================================
>> 3
>>
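>> (Each myid above is just a one-line file; I created the three with
>> something like the following.)
>>
>>    echo 1 > [DATA_DIRECTORY]/zk1_data/myid
>>    echo 2 > [DATA_DIRECTORY]/zk2_data/myid
>>    echo 3 > [DATA_DIRECTORY]/zk3_data/myid
>>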
>> Step 2 - SOLR Build
>> ===============
>>
>> I pulled the latest SOLR trunk down. I built it with the following
>> command:
>>
>>            ant example dist
>>
>> I modified the solr.war files and added the solr cell and extraction
>> libraries to WEB-INF/lib. I couldn't get the extraction to work any
>> other way. Will ZooKeeper pick up jar files stored with the rest of the
>> configuration files in ZooKeeper?
>>
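>> In case the details matter, the war update was roughly this (jar names
>> and paths are from my checkout and may differ):
>>
>>    mkdir WEB-INF\lib
>>    copy dist\apache-solr-cell-*.jar WEB-INF\lib
>>    copy contrib\extraction\lib\*.jar WEB-INF\lib
>>    jar uf example\webapps\solr.war WEB-INF
>>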
>> I copied the contents of the example directory to each of my SOLR
>> directories.
>>
>> Step 3 - Starting Zookeeper instances
>> ===========================
>>
>> I ran the following commands to start the zookeeper instances:
>>
>> start .\zookeeper1\bin\zkServer.cmd
>> start .\zookeeper2\bin\zkServer.cmd
>> start .\zookeeper3\bin\zkServer.cmd
>>
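>> As a sanity check on the ensemble, ZooKeeper answers its four-letter-word
>> commands on each client port (shown here with nc, which isn't on Windows
>> by default - telnet to the port and typing the command works too):
>>
>>    echo ruok | nc localhost 2181      (expect: imok)
>>    echo stat | nc localhost 2182      (the Mode: line shows leader/follower)
>>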
>> Step 4 - Start Main SOLR instance
>> ==========================
>> I ran the following command to start the main SOLR instance:
>>
>> java -Djetty.port=8081 -DhostPort=8081
>> -Dbootstrap_confdir=[DATA_DIRECTORY]/solr/conf -DnumShards=2
>> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> Starts up fine.
>>
>> Step 5 - Start the Remaining 3 SOLR Instances
>> ==================================
>> I ran the following commands to start the other 3 instances from their
>> home directories:
>>
>> java -Djetty.port=8082 -DhostPort=8082
>> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> java -Djetty.port=8083 -DhostPort=8083
>> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> java -Djetty.port=8084 -DhostPort=8084
>> -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
>>
>> All start up without issue.
>>
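>> As a quick check at this point, connecting with the ZooKeeper CLI should
>> show all four nodes registered as live:
>>
>>    .\zookeeper1\bin\zkCli.cmd -server localhost:2181
>>    ls /live_nodes
>>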
>> Step 6 - Modified solrconfig.xml to have a custom request handler
>> ===============================================
>>
>> <requestHandler name="/update/sharepoint" startup="lazy"
>> class="solr.extraction.ExtractingRequestHandler">
>>   <lst name="defaults">
>>      <str name="update.chain">sharepoint-pipeline</str>
>>      <str name="fmap.content">text</str>
>>      <str name="lowernames">true</str>
>>      <str name="uprefix">ignored</str>
>>      <str name="caputreAttr">true</str>
>>      <str name="fmap.a">links</str>
>>      <str name="fmap.div">ignored</str>
>>   </lst>
>> </requestHandler>
>>
>> <updateRequestProcessorChain name="sharepoint-pipeline">
>>    <processor class="solr.processor.SignatureUpdateProcessorFactory">
>>       <bool name="enabled">true</bool>
>>       <str name="signatureField">id</str>
>>       <bool name="owerrightDupes">true</bool>
>>       <str name="fields">url</str>
>>       <str name="signatureClass">solr.processor.Lookup3Signature</str>
>>    </processor>
>>    <processor class="solr.LogUpdateProcessorFactory"/>
>>    <processor class="solr.RunUpdateProcessorFactory"/>
>> </updateRequestProcessorChain>
>>
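>> For reference, documents go to that handler with the url supplied as a
>> literal field, along these lines (the file name and url here are just
>> examples; normally Manifold does the posting):
>>
>>    curl "http://localhost:8081/solr/update/sharepoint?literal.url=http://sharepoint/docs/spec.docx&commit=true" -F "myfile=@spec.docx"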
>>
>> Hopefully this will shed some light on why my configuration is having
>> issues.
>>
>> Thanks for your help.
>>
>> Matt
>>
>>
>>
>> On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>>> Hmm...this is very strange - there is nothing interesting in any of the
>>> logs?
>>>
>>> In clusterstate.json, all of the shards have an active state?
>>>
>>>
>>> There are quite a few of us doing exactly this setup recently, so there
>>> must be something we are missing here...
>>>
>>> Any info you can offer might help.
>>>
>>> - Mark
>>>
>>> On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
>>>
>>> > Mark,
>>> >
>>> > I got the codebase from 2/26/2012, and I got the same inconsistent
>>> > results.
>>> >
>>> > I have solr running on four ports 8081-8084
>>> >
>>> > 8081 and 8082 are the leaders for shard 1 and shard 2, respectively.
>>> >
>>> > 8083 - is assigned to shard 1
>>> > 8084 - is assigned to shard 2
>>> >
>>> > Queries come in, and sometimes it seems the windows for 8081 and 8083
>>> > show them responding to the query, but there are no results.
>>> >
>>> > If the queries run on 8081/8082 or 8081/8084, then results come back OK.
>>> >
>>> > The query is nothing more than: q=*:*
>>> >
>>> > Regards,
>>> >
>>> > Matt
>>> >
>>> >
>>> > On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker <
>>> > mpar...@apogeeintegration.com> wrote:
>>> >
>>> >> I'll have to check on the commit situation. We have been pushing data
>>> >> from SharePoint for the last week or so. Would that somehow block the
>>> >> documents moving between the solr instances?
>>> >>
>>> >> I'll try another version tomorrow. Thanks for the suggestions.
>>> >>
>>> >> On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller <markrmil...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Hmmm...all of that looks pretty normal...
>>> >>>
>>> >>> Did a commit somehow fail on the other machine? When you view the
>>> >>> stats for the update handler, are there a lot of pending adds for one
>>> >>> of the nodes? Do the commit counts match across nodes?
>>> >>>
>>> >>> You can also query an individual node with distrib=false to check
>>> >>> that.
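>>> >>>
>>> >>> For example, something like this against two nodes assigned to the
>>> >>> same shard (rows=0 is just to compare counts):
>>> >>>
>>> >>>    http://localhost:8081/solr/select?q=*:*&distrib=false&rows=0
>>> >>>    http://localhost:8083/solr/select?q=*:*&distrib=false&rows=0
>>> >>>
>>> >>> If numFound differs between a leader and its replica, documents or a
>>> >>> commit didn't make it to one of them.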
>>> >>>
>>> >>> If your build is a month old, I'd honestly recommend you try
>>> >>> upgrading as well.
>>> >>>
>>> >>> - Mark
>>> >>>
>>> >>> On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
>>> >>>
>>> >>>> Here is most of the cluster state:
>>> >>>>
>>> >>>> Connected to Zookeeper
>>> >>>> localhost:2181, localhost:2182, localhost:2183
>>> >>>>
>>> >>>> /(v=0 children=7) ""
>>> >>>>   /CONFIGS(v=0 children=1)
>>> >>>>     /CONFIGURATION(v=0 children=25)
>>> >>>>       <<<<< all the configuration files, velocity info, xslt, etc. >>>>>
>>> >>>>   /NODE_STATES(v=0 children=4)
>>> >>>>     MACHINE1:8083_SOLR (v=121) "[{"shard_id":"shard1","state":"active","core":"","collection":"collection1","node_name":"..."
>>> >>>>     MACHINE1:8082_SOLR (v=101) "[{"shard_id":"shard2","state":"active","core":"","collection":"collection1","node_name":"..."
>>> >>>>     MACHINE1:8081_SOLR (v=92) "[{"shard_id":"shard1","state":"active","core":"","collection":"collection1","node_name":"..."
>>> >>>>     MACHINE1:8084_SOLR (v=73) "[{"shard_id":"shard2","state":"active","core":"","collection":"collection1","node_name":"..."
>>> >>>>   /ZOOKEEPER (v=0 children=1)
>>> >>>>     QUOTA(v=0)
>>> >>>>   /CLUSTERSTATE.JSON (v=272) "{"collection1":{"shard1":{"MACHINE1:8081_solr_":{"shard_id":"shard1","leader":"true","..."
>>> >>>>   /LIVE_NODES (v=0 children=4)
>>> >>>>     MACHINE1:8083_SOLR(ephemeral v=0)
>>> >>>>     MACHINE1:8082_SOLR(ephemeral v=0)
>>> >>>>     MACHINE1:8081_SOLR(ephemeral v=0)
>>> >>>>     MACHINE1:8084_SOLR(ephemeral v=0)
>>> >>>>   /COLLECTIONS (v=1 children=1)
>>> >>>>     COLLECTION1(v=0 children=2) "{"configName":"configuration1"}"
>>> >>>>       LEADER_ELECT(v=0 children=2)
>>> >>>>         SHARD1(v=0 children=1)
>>> >>>>           ELECTION(v=0 children=2)
>>> >>>>             87186203314552835-MACHINE1:8081_SOLR_-N_0000000096(ephemeral v=0)
>>> >>>>             87186203314552836-MACHINE1:8083_SOLR_-N_0000000084(ephemeral v=0)
>>> >>>>         SHARD2(v=0 children=1)
>>> >>>>           ELECTION(v=0 children=2)
>>> >>>>             231301391392833539-MACHINE1:8084_SOLR_-N_0000000085(ephemeral v=0)
>>> >>>>             159243797356740611-MACHINE1:8082_SOLR_-N_0000000084(ephemeral v=0)
>>> >>>>       LEADERS (v=0 children=2)
>>> >>>>         SHARD1 (ephemeral v=0) "{"core":"","node_name":"MACHINE1:8081_solr","base_url":"http://MACHINE1:8081/solr"}"
>>> >>>>         SHARD2 (ephemeral v=0) "{"core":"","node_name":"MACHINE1:8082_solr","base_url":"http://MACHINE1:8082/solr"}"
>>> >>>>   /OVERSEER_ELECT (v=0 children=2)
>>> >>>>     ELECTION (v=0 children=4)
>>> >>>>       231301391392833539-MACHINE1:8084_SOLR_-N_0000000251(ephemeral v=0)
>>> >>>>       87186203314552835-MACHINE1:8081_SOLR_-N_0000000248(ephemeral v=0)
>>> >>>>       159243797356740611-MACHINE1:8082_SOLR_-N_0000000250(ephemeral v=0)
>>> >>>>       87186203314552836-MACHINE1:8083_SOLR_-N_0000000249(ephemeral v=0)
>>> >>>>     LEADER (ephemeral v=0) "{"id":"87186203314552835-MACHINE1:8081_solr-n_000000248"}"
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller <markrmil...@gmail.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>>
>>> >>>>> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>>> >>>>>
>>> >>>>>> Thanks for your reply Mark.
>>> >>>>>>
>>> >>>>>> I believe the build was towards the beginning of the month. The
>>> >>>>>> solr.spec.version is 4.0.0.2012.01.10.38.09
>>> >>>>>>
>>> >>>>>> I cannot access the clusterstate.json contents. I clicked on it a
>>> >>>>>> couple of times, but nothing happens. Is that stored on disk
>>> >>>>>> somewhere?
>>> >>>>>
>>> >>>>> Are you using the new admin UI? That has recently been updated to
>>> >>>>> work better with cloud - it had some troubles not too long ago. If
>>> >>>>> you are, you should try using the old admin UI's zookeeper page -
>>> >>>>> that should show the cluster state.
>>> >>>>>
>>> >>>>> That being said, there have been a lot of bug fixes over the past
>>> >>>>> month - so you may just want to update to a recent version.
>>> >>>>>
>>> >>>>>>
>>> >>>>>> I configured a custom request handler to calculate a unique
>>> >>>>>> document id based on the file's url.
>>> >>>>>>
>>> >>>>>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller <markrmil...@gmail.com>
>>> >>>>>> wrote:
>>> >>>>>>
>>> >>>>>>> Hey Matt - is your build recent?
>>> >>>>>>>
>>> >>>>>>> Can you visit the cloud/zookeeper page in the admin and send the
>>> >>>>>>> contents of the clusterstate.json node?
>>> >>>>>>>
>>> >>>>>>> Are you using a custom index chain or anything out of the
>>> >>>>>>> ordinary?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> - Mark
>>> >>>>>>>
>>> >>>>>>> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
>>> >>>>>>>
>>> >>>>>>>> TWIMC:
>>> >>>>>>>>
>>> >>>>>>>> Environment
>>> >>>>>>>> =========
>>> >>>>>>>> Apache SOLR rev-1236154
>>> >>>>>>>> Apache Zookeeper 3.3.4
>>> >>>>>>>> Windows 7
>>> >>>>>>>> JDK 1.6.0_23.b05
>>> >>>>>>>>
>>> >>>>>>>> I have built a SOLR Cloud instance with 4 nodes using the
>>> >>>>>>>> embedded Jetty servers.
>>> >>>>>>>>
>>> >>>>>>>> I created a 3 node zookeeper ensemble to manage the solr
>>> >>>>>>>> configuration data.
>>> >>>>>>>>
>>> >>>>>>>> All the instances run on one server so I've had to move ports
>>> >>>>>>>> around for the various applications.
>>> >>>>>>>>
>>> >>>>>>>> I started the 3 zookeeper nodes.
>>> >>>>>>>>
>>> >>>>>>>> I started the first instance of solr cloud with the parameter to
>>> >>>>>>>> have two shards.
>>> >>>>>>>>
>>> >>>>>>>> Then I started the remaining 3 solr nodes.
>>> >>>>>>>>
>>> >>>>>>>> The system comes up fine. No errors thrown.
>>> >>>>>>>>
>>> >>>>>>>> I can view the solr cloud console and I can see the SOLR
>>> >>>>>>>> configuration files managed by ZooKeeper.
>>> >>>>>>>>
>>> >>>>>>>> I published data into the SOLR Cloud instances from SharePoint
>>> >>>>>>>> using Apache Manifold 0.4-incubating. Manifold is set up to
>>> >>>>>>>> publish the data into collection1, which is the only collection
>>> >>>>>>>> defined in the cluster.
>>> >>>>>>>>
>>> >>>>>>>> When I query the data from collection1 as per the solr wiki, the
>>> >>>>>>>> results are inconsistent. Sometimes all the results are there,
>>> >>>>>>>> other times nothing comes back at all.
>>> >>>>>>>>
>>> >>>>>>>> It seems to be having an issue auto-replicating the data across
>>> >>>>>>>> the cloud.
>>> >>>>>>>>
>>> >>>>>>>> Is there some specific setting I might have missed? Based upon
>>> >>>>>>>> what I read, I thought that SOLR cloud would take care of
>>> >>>>>>>> distributing and replicating the data automatically. Do you have
>>> >>>>>>>> to tell it what shard to publish the data into as well?
>>> >>>>>>>>
>>> >>>>>>>> Any help would be appreciated.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>>
>>> >>>>>>>> Matt
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> - Mark Miller
>>> >>>>>>> lucidimagination.com
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> Matt
>>> >>>>>>
>>> >>>>>
>>> >>>>> - Mark Miller
>>> >>>>> lucidimagination.com
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>> - Mark Miller
>>> >>> lucidimagination.com
>>> >>>
>>> >>
>>> >
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>
>

------------------------------
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.
