bq: can this be used as a unique value instead of generating the
hashcode for the urlField

Don't do this. The _version_ field is used internally for optimistic
locking etc. I'd be _very_ cautious about co-opting this for anything else.

Best,
Erick

On Thu, Nov 13, 2014 at 8:14 AM, Meraj A. Khan <mera...@gmail.com> wrote:
> Thanks. I also noticed that the mandatory _version_ field is also
> uniquely generated for every document in the collection. Can this be
> used as a unique value instead of generating the hashcode for the
> urlField?
>
> I want to avoid creating a custom unique field if the _version_ field,
> which is mandated for schema.xml, actually does that for me.
>
>
>
> On Thu, Nov 13, 2014 at 8:07 AM, Garth Grimm
> <garthgr...@averyranchconsulting.com> wrote:
>> OK.  So it sounds like doctorURL is a good key, but you don’t like the 
>> special characters.  I’ve used MD5 hashes of URLs before as a way to convert 
>> unique URLs into unique alphanumeric strings in a repeatable way.  I think 
>> most programming languages contain libraries for doing that as you feed the 
>> data to Solr (Java certainly does).  Other hashing or encoding mechanisms 
>> could be used if you wanted to be able to programmatically convert from the 
>> doctorURL to the string you want to use and back again.
>>
>> Anyway, the point there being that you have a repeatable unique key that is 
>> derived directly from the data you’re storing.  Not a random ID value that 
>> will be different every time you feed the same thing in.
>>
>> BTW, you can certainly use a custom field type to do the hashing work, but 
>> I’d suggest you do that before feeding the data to SolrCloud.  If you do it 
>> outside of SolrCloud, then SolrCloud can use it for routing to the correct 
>> shard.  If you try to do it solely in a field type, the field type output 
>> won’t be available until the indexing is actually occurring, which is too 
>> late for routing purposes.  And that means you can’t ensure that subsequent 
>> re-feeds of the same thing will overwrite the old values since you can’t 
>> make sure they get routed to the same shard.
>>
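A minimal sketch of the client-side hashing Garth describes, assuming you build the key in Java before sending the document to Solr. The class name UrlKey, the method names md5Hex and base64Url, and the sample URL are illustrative only and do not come from this thread. md5Hex gives a repeatable 32-character hex key; base64Url is one possible reversible encoding for the case where you want to get the URL back from the key.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class UrlKey {
    /** One-way: the same URL always yields the same 32-character lowercase hex string. */
    public static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so the key is always 32 characters long.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name on the Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Reversible: URL-safe Base64 without padding (output uses A-Z, a-z, 0-9, '-' and '_'). */
    public static String base64Url(String url) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    /** Inverse of base64Url. */
    public static String fromBase64Url(String key) {
        return new String(Base64.getUrlDecoder().decode(key), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical URL, for illustration only.
        String doctorUrl = "http://example.com/doctors?name=O'Brien&city=New York";
        System.out.println(md5Hex(doctorUrl));
        System.out.println(base64Url(doctorUrl));
    }
}
```

Because the key is computed before the document reaches SolrCloud, a re-feed of the same URL produces the same key, which is what lets it route to the same shard and overwrite the earlier copy.
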
>>> On Nov 12, 2014, at 7:50 PM, Meraj A. Khan <mera...@gmail.com> wrote:
>>>
>>> Sorry, it's actually doctorUrl, so I don't want to use doctorUrl as a lookup
>>> mechanism because URLs can have special characters that can cause issues
>>> with Solr lookup.
>>>
>>> I guess I should rephrase my question to: how do I auto-generate the unique
>>> keys in the id field when using SolrCloud?
>>> On Nov 12, 2014 7:28 PM, "Garth Grimm" <garthgr...@averyranchconsulting.com>
>>> wrote:
>>>
>>>> You mention you already have a unique Key identified for the data you’re
>>>> storing in Solr:
>>>>
>>>>> <uniqueKey>doctorId</uniqueKey>
>>>>
>>>> If that’s the field you’re using to uniquely identify each thing you’re
>>>> storing in the solr index, why do you want to have an id field that is
>>>> populated with some random value?  You’ll be using the doctorId field as
>>>> the key, and the id field will have no real meaning in your Data Model.
>>>>
>>>> If doctorId actually isn’t unique to each item you plan on storing in
>>>> Solr, is there any other field that is?  If so, use that field as your
>>>> unique key.
>>>>
>>>> Remember, these uniqueKeys are usually used for routing documents to shards
>>>> in SolrCloud, and are used to ensure that later updates of the same “thing”
>>>> overwrite the old one, rather than generating multiple copies.  So the keys
>>>> really should be something derived from the data you’re storing.  I’m not
>>>> sure if I understand why you would want to have the key randomly generated.
>>>>
>>>>> On Nov 12, 2014, at 6:39 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>
>>>>> I just tried adding <uniqueKey>id</uniqueKey> while keeping the id type as
>>>>> "string", and only blank ids are being generated. It looks like the id is
>>>>> auto-generated only if the id is set to type uuid, but in the case of
>>>>> SolrCloud this id will be unique per replica.
>>>>>
>>>>> Is there a way to generate a unique id in SolrCloud without using the
>>>>> uuid type, or without ending up with a per-replica unique id?
>>>>>
>>>>> The uuid in question is of this type:
>>>>>
>>>>> <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
>>>>>
>>>>>
>>>>> On Wed, Nov 12, 2014 at 6:20 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> So the issue here is I already have a <uniqueKey>doctorId</uniqueKey>
>>>>>> defined in my schema.xml.
>>>>>>
>>>>>> If, along with that, I also want the id field to be automatically
>>>>>> generated for each document, do I have to declare it as a <uniqueKey> as
>>>>>> well? I just tried the following setting without the uniqueKey for id,
>>>>>> and it's only generating blank ids for me.
>>>>>>
>>>>>> *schema.xml*
>>>>>>
>>>>>>       <field name="id" type="string" indexed="true" stored="true"
>>>>>>           required="true" multiValued="false" />
>>>>>>
>>>>>> *solrconfig.xml*
>>>>>>
>>>>>>     <updateRequestProcessorChain name="uuid">
>>>>>>       <processor class="solr.UUIDUpdateProcessorFactory">
>>>>>>         <str name="fieldName">id</str>
>>>>>>       </processor>
>>>>>>       <processor class="solr.RunUpdateProcessorFactory" />
>>>>>>     </updateRequestProcessorChain>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm <
>>>>>> garthgr...@averyranchconsulting.com> wrote:
>>>>>>
>>>>>>> Looking a little deeper, I did find this about UUIDField:
>>>>>>>
>>>>>>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
>>>>>>>
>>>>>>> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
>>>>>>> not advisable for most users when using SolrCloud (and not possible if the
>>>>>>> UUID value is configured as the unique key field) since the result will be
>>>>>>> that each replica of each document will get a unique UUID value. Using
>>>>>>> UUIDUpdateProcessorFactory
>>>>>>> <http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
>>>>>>> to generate UUID values when documents are added is recommended instead."
>>>>>>>
>>>>>>> That might describe the behavior you saw.  And the use of
>>>>>>> UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well
>>>>>>> here:
>>>>>>>
>>>>>>> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
>>>>>>>
>>>>>>> Though I’ve not actually tried that process before.
>>>>>>>
>>>>>>> On Nov 11, 2014, at 7:39 PM, Garth Grimm
>>>>>>> <garthgr...@averyranchconsulting.com> wrote:
>>>>>>>
>>>>>>> “uuid” isn’t an out of the box field type that I’m familiar with.
>>>>>>>
>>>>>>> Generally, I’d stick with the out of the box advice of the schema.xml
>>>>>>> file, which includes things like….
>>>>>>>
>>>>>>> <!-- Only remove the "id" field if you have a very good reason to.
>>>>>>> While not strictly
>>>>>>>   required, it is highly recommended. A <uniqueKey> is present in
>>>>>>> almost all Solr
>>>>>>>   installations. See the <uniqueKey> declaration below where
>>>>>>> <uniqueKey> is set to "id".
>>>>>>> -->
>>>>>>> <field name="id" type="string" indexed="true" stored="true"
>>>>>>> required="true" multiValued="false" />
>>>>>>>
>>>>>>> and…
>>>>>>>
>>>>>>> <!-- Field to use to determine and enforce document uniqueness.
>>>>>>>    Unless this field is marked with required="false", it will be a
>>>>>>> required field
>>>>>>> -->
>>>>>>> <uniqueKey>id</uniqueKey>
>>>>>>>
>>>>>>> If you’re creating some key/value pair with uuid as the key as you feed
>>>>>>> documents in, and you know that the uuid values you’re creating are unique,
>>>>>>> just change the field name and unique key name from ‘id’ to ‘uuid’.  Or
>>>>>>> change the key name you send in from ‘uuid’ to ‘id’.
>>>>>>>
>>>>>>> On Nov 11, 2014, at 7:18 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am seeing interesting behavior on the replicas. I have a single
>>>>>>> shard and 6 replicas on SolrCloud 4.10.1. I only have a small
>>>>>>> number of documents (~375) that are replicated across the six replicas.
>>>>>>>
>>>>>>> The interesting thing is that the same document has a different id in
>>>>>>> each one of those replicas.
>>>>>>>
>>>>>>> This is causing the fq(id:xyz) type queries to fail, depending on
>>>>>>> which replica the query goes to.
>>>>>>>
>>>>>>> I have specified the id field in the following manner in schema.xml.
>>>>>>> Is this the right way to specify an auto-generated id in SolrCloud?
>>>>>>>
>>>>>>>     <field name="id" type="uuid" indexed="true" stored="true"
>>>>>>>         required="true" multiValued="false" />
>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>
