Thanks, I also noticed that the mandatory _version_ field is uniquely
generated for every document in the collection. Can this be used as a
unique value instead of generating a hash code for the urlField?

I want to avoid creating a custom unique field if the _version_ field,
which schema.xml mandates, already does that for me.



On Thu, Nov 13, 2014 at 8:07 AM, Garth Grimm
<garthgr...@averyranchconsulting.com> wrote:
> OK.  So it sounds like doctorURL is a good key, but you don’t like the 
> special characters.  I’ve used MD5 hashes of URLs before as a way to convert 
> unique URLs into unique alphanumeric strings in a repeatable way.  I think 
> most programming languages contain libraries for doing that as you feed the 
> data to Solr (Java certainly does).  Other hashing or encoding mechanisms 
> could be used if you wanted to be able to programmatically convert from the 
> doctorURL to the string you want to use and back again.
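A minimal Java sketch of the MD5 approach described above, assuming the key is computed on the client before the document is sent to Solr. The class and method names (UrlKey, md5Hex) are hypothetical; only the JDK's MessageDigest is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a repeatable, alphanumeric document key from a URL.
class UrlKey {
    // Returns the 32-character lowercase hex MD5 digest of the URL's UTF-8 bytes.
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x formats the byte as two unsigned hex digits
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://example.com/doctors?id=42&name=Smith"; // hypothetical URL
        // The same URL always yields the same key, so a re-feed overwrites the old document.
        System.out.println(md5Hex(url));
    }
}
```

Because the output is always 32 hex characters, the key carries none of the URL's special characters, and it can be recomputed from the URL at any time.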
>
> Anyway, the point is that you have a repeatable unique key that is derived
> directly from the data you’re storing, not a random ID value that will be
> different every time you feed the same thing in.
>
> BTW, you can certainly use a custom field type to do the hashing work, but 
> I’d suggest you do that before feeding the data to SolrCloud.  If you do it 
> outside of SolrCloud, then SolrCloud can use it for routing to the correct 
> shard.  If you try to do it solely in a field type, the field type output 
> won’t be available until the indexing is actually occurring, which is too 
> late for routing purposes.  And that means you can’t ensure that subsequent 
> re-feeds of the same thing will overwrite the old values since you can’t make 
> sure they get routed to the same shard.
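To illustrate why the key has to exist before indexing, here is a deliberately simplified router in Java. It is an assumption for illustration only: it uses String.hashCode() and a modulo, whereas Solr's default compositeId router hashes the uniqueKey with MurmurHash3 and assigns a hash range to each shard. The point holds in both cases: the shard is chosen from the key alone, so a key derived from the data routes to the same shard on every re-feed.

```java
import java.util.UUID;

// Simplified, hypothetical router: NOT Solr's CompositeIdRouter (which uses MurmurHash3
// and per-shard hash ranges). It only shows that routing depends on the key alone.
class RoutingSketch {
    // Picks a shard number in [0, numShards) from the key, before any indexing happens.
    static int shardFor(String uniqueKey, int numShards) {
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 4;
        String derivedKey = "example-derived-key"; // hypothetical key computed from the data
        // A derived key lands on the same shard every time it is fed.
        System.out.println(shardFor(derivedKey, numShards) == shardFor(derivedKey, numShards)); // true
        // A fresh random key per feed can land on a different shard each time.
        System.out.println(shardFor(UUID.randomUUID().toString(), numShards));
    }
}
```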
>
>> On Nov 12, 2014, at 7:50 PM, Meraj A. Khan <mera...@gmail.com> wrote:
>>
>> Sorry, it's actually doctorUrl. I don't want to use doctorUrl as a lookup
>> mechanism because URLs can have special characters that can cause issues
>> with Solr lookups.
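If the URL itself is used for lookups, the characters that have meaning in the Lucene/Solr query syntax can be backslash-escaped before the query is built. A minimal Java sketch follows; the class name QueryEscape is hypothetical, but the character set is modeled on SolrJ's ClientUtils.escapeQueryChars, which SolrJ users can call directly instead.

```java
// Hypothetical helper modeled on SolrJ's ClientUtils.escapeQueryChars:
// backslash-escapes characters that are special in Lucene/Solr query syntax.
class QueryEscape {
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Query-syntax characters plus any whitespace get a leading backslash.
            boolean special = "\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0
                    || Character.isWhitespace(c);
            if (special) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://example.com/doctors?id=42"; // hypothetical URL
        // Prints: doctorUrl:http\:\/\/example.com\/doctors\?id=42
        System.out.println("doctorUrl:" + escapeQueryChars(url));
    }
}
```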
>>
>> I guess I should rephrase my question: how do I auto-generate unique keys
>> in the id field when using SolrCloud?
>> On Nov 12, 2014 7:28 PM, "Garth Grimm" <garthgr...@averyranchconsulting.com>
>> wrote:
>>
>>> You mention you already have a unique key identified for the data you’re
>>> storing in Solr:
>>>
>>>> <uniqueKey>doctorId</uniqueKey>
>>>
>>> If that’s the field you’re using to uniquely identify each thing you’re
>>> storing in the solr index, why do you want to have an id field that is
>>> populated with some random value?  You’ll be using the doctorId field as
>>> the key, and the id field will have no real meaning in your Data Model.
>>>
>>> If doctorId actually isn’t unique to each item you plan on storing in
>>> Solr, is there any other field that is?  If so, use that field as your
>>> unique key.
>>>
>>> Remember, these uniqueKeys are usually used for routing documents to shards
>>> in SolrCloud, and are used to ensure that later updates of the same “thing”
>>> overwrite the old one, rather than generating multiple copies.  So the keys
>>> really should be something derived from the data you’re storing.  I’m not
>>> sure I understand why you would want to have the key randomly generated.
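The overwrite behavior described above can be modeled with a plain Java map, where putting a document under an existing key replaces the previous entry. This is a toy model for illustration only (OverwriteSketch and feedTwice are hypothetical names), not how Solr stores documents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model of uniqueKey semantics: adding a document whose key already exists replaces it.
class OverwriteSketch {
    // Feeds the same document twice and returns how many documents end up "indexed".
    static int feedTwice(boolean randomKey) {
        Map<String, String> index = new HashMap<>();
        String doctorUrl = "http://example.com/doctors/42"; // hypothetical document
        for (int feed = 0; feed < 2; feed++) {
            // Derived key: the URL itself. Random key: a new UUID on every feed.
            String key = randomKey ? UUID.randomUUID().toString() : doctorUrl;
            index.put(key, doctorUrl); // replaces any existing entry with the same key
        }
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println(feedTwice(false)); // 1: the second feed overwrote the first
        System.out.println(feedTwice(true));  // 2: random keys produced a duplicate
    }
}
```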
>>>
>>>> On Nov 12, 2014, at 6:39 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>
>>>> I just tried adding <uniqueKey>id</uniqueKey> while keeping the id
>>>> type="string", and only blank ids are being generated. It looks like the
>>>> id is auto-generated only if the id is set to type uuid, but in SolrCloud
>>>> that id will be unique per replica.
>>>>
>>>> Is there a way to generate a unique id in SolrCloud without using the uuid
>>>> type, or at least without getting a different id on each replica?
>>>>
>>>> The uuid type in question is defined as:
>>>>
>>>> <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
>>>>
>>>>
>>>> On Wed, Nov 12, 2014 at 6:20 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>
>>>>> Thanks.
>>>>>
>>>>> So the issue here is that I already have <uniqueKey>doctorId</uniqueKey>
>>>>> defined in my schema.xml.
>>>>>
>>>>> If, along with that, I also want the id field to be automatically
>>>>> generated for each document, do I have to declare it as the <uniqueKey>
>>>>> as well? I just tried the following settings without the uniqueKey for
>>>>> id, and it's only generating blank ids for me.
>>>>>
>>>>> *schema.xml*
>>>>>
>>>>>       <field name="id" type="string" indexed="true" stored="true"
>>>>>           required="true" multiValued="false" />
>>>>>
>>>>> *solrconfig.xml*
>>>>>
>>>>>     <updateRequestProcessorChain name="uuid">
>>>>>       <processor class="solr.UUIDUpdateProcessorFactory">
>>>>>         <str name="fieldName">id</str>
>>>>>       </processor>
>>>>>       <processor class="solr.RunUpdateProcessorFactory" />
>>>>>     </updateRequestProcessorChain>
>>>>>
>>>>>
>>>>> On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm <
>>>>> garthgr...@averyranchconsulting.com> wrote:
>>>>>
>>>>>> Looking a little deeper, I did find this about UUIDField:
>>>>>>
>>>>>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
>>>>>>
>>>>>> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
>>>>>> not advisable for most users when using SolrCloud (and not possible if the
>>>>>> UUID value is configured as the unique key field) since the result will be
>>>>>> that each replica of each document will get a unique UUID value. Using
>>>>>> UUIDUpdateProcessorFactory
>>>>>> <http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
>>>>>> to generate UUID values when documents are added is recommended instead."
>>>>>>
>>>>>> That might describe the behavior you saw.  And the use of
>>>>>> UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well
>>>>>> here:
>>>>>>
>>>>>> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
>>>>>>
>>>>>> Though I’ve not actually tried that process before.
>>>>>>
>>>>>> On Nov 11, 2014, at 7:39 PM, Garth Grimm <garthgr...@averyranchconsulting.com> wrote:
>>>>>>
>>>>>> “uuid” isn’t an out-of-the-box field type that I’m familiar with.
>>>>>>
>>>>>> Generally, I’d stick with the out-of-the-box advice in the example
>>>>>> schema.xml file, which includes things like…
>>>>>>
>>>>>> <!-- Only remove the "id" field if you have a very good reason to. While not
>>>>>>      strictly required, it is highly recommended. A <uniqueKey> is present in
>>>>>>      almost all Solr installations. See the <uniqueKey> declaration below
>>>>>>      where <uniqueKey> is set to "id".
>>>>>> -->
>>>>>> <field name="id" type="string" indexed="true" stored="true"
>>>>>>        required="true" multiValued="false" />
>>>>>>
>>>>>> and…
>>>>>>
>>>>>> <!-- Field to use to determine and enforce document uniqueness.
>>>>>>      Unless this field is marked with required="false", it will be a
>>>>>>      required field
>>>>>> -->
>>>>>> <uniqueKey>id</uniqueKey>
>>>>>>
>>>>>> If you’re creating some key/value pair with uuid as the key as you feed
>>>>>> documents in, and you know that the uuid values you’re creating are
>>>>>> unique, just change the field name and unique key name from ‘id’ to
>>>>>> ‘uuid’, or change the key name you send in from ‘uuid’ to ‘id’.
>>>>>>
>>>>>> On Nov 11, 2014, at 7:18 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am seeing interesting behavior on the replicas. I have a single
>>>>>> shard and 6 replicas on SolrCloud 4.10.1. I only have a small
>>>>>> number of documents (~375) that are replicated across the six replicas.
>>>>>>
>>>>>> The interesting thing is that the same document has a different id in
>>>>>> each one of those replicas.
>>>>>>
>>>>>> This is causing fq=id:xyz type queries to fail, depending on
>>>>>> which replica the query goes to.
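The symptom described above can be reproduced with a toy model in Java. It is illustrative only (ReplicaIdSketch and its methods are hypothetical names, not Solr code): one method lets each replica generate its own random UUID for the same document, which is the per-replica behavior the UUIDField javadoc quoted above warns about; the other generates the UUID once and copies the same value to every replica.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Toy model (not Solr code) of why per-replica id generation breaks fq=id:... lookups.
class ReplicaIdSketch {
    // Each replica independently generates an id for the same incoming document.
    static List<String> idsGeneratedPerReplica(int replicas) {
        List<String> ids = new ArrayList<>();
        for (int r = 0; r < replicas; r++) {
            ids.add(UUID.randomUUID().toString());
        }
        return ids;
    }

    // The id is generated once, then the same document is copied to every replica.
    static List<String> idsGeneratedOnce(int replicas) {
        String id = UUID.randomUUID().toString();
        List<String> ids = new ArrayList<>();
        for (int r = 0; r < replicas; r++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> perReplica = idsGeneratedPerReplica(6);
        // An id read from replica 0 matches only on replica 0, so a query for it
        // succeeds only when it happens to hit that replica.
        System.out.println(perReplica.stream().filter(perReplica.get(0)::equals).count()); // 1
        List<String> once = idsGeneratedOnce(6);
        // With a single generation step, every replica holds the same id.
        System.out.println(once.stream().filter(once.get(0)::equals).count()); // 6
    }
}
```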
>>>>>>
>>>>>> I have specified the id field in the following manner in schema.xml.
>>>>>> Is this the right way to specify an auto-generated id in SolrCloud?
>>>>>>
>>>>>>     <field name="id" type="uuid" indexed="true" stored="true"
>>>>>>         required="true" multiValued="false" />
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>
