Thanks. I also noticed that the mandatory _version_ field is uniquely generated for every document in the collection. Can it be used as a unique value instead of generating a hash code for the urlField?
I want to avoid creating a custom unique field if the _version_ field, which schema.xml already mandates, actually does that for me.

On Thu, Nov 13, 2014 at 8:07 AM, Garth Grimm <garthgr...@averyranchconsulting.com> wrote:

> OK. So it sounds like doctorURL is a good key, but you don’t like the
> special characters. I’ve used MD5 hashes of URLs before as a way to convert
> unique URLs into unique alphanumeric strings in a repeatable way. I think
> most programming languages contain libraries for doing that as you feed the
> data to Solr (Java certainly does). Other hashing or encoding mechanisms
> could be used if you wanted to be able to programmatically convert from the
> doctorURL to the string you want to use and back again.
>
> Anyway, the point there being that you have a repeatable unique key that is
> derived directly from the data you’re storing. Not a random ID value that
> will be different every time you feed the same thing in.
>
> BTW, you can certainly use a custom field type to do the hashing work, but
> I’d suggest you do that before feeding the data to SolrCloud. If you do it
> outside of SolrCloud, then SolrCloud can use it for routing to the correct
> shard. If you try to do it solely in a field type, the field type output
> won’t be available until the indexing is actually occurring, which is too
> late for routing purposes. And that means you can’t ensure that subsequent
> re-feeds of the same thing will overwrite the old values since you can’t make
> sure they get routed to the same shard.
>
>> On Nov 12, 2014, at 7:50 PM, Meraj A. Khan <mera...@gmail.com> wrote:
>>
>> Sorry, it's actually doctorUrl, so I don't want to use doctorUrl as a lookup
>> mechanism because URLs can have special characters that can cause issues
>> with Solr lookup.
>>
>> I guess I should rephrase my question to: how to auto generate the unique
>> keys in the id field when using SolrCloud?
>>
>> On Nov 12, 2014 7:28 PM, "Garth Grimm" <garthgr...@averyranchconsulting.com>
>> wrote:
>>
>>> You mention you already have a unique key identified for the data you’re
>>> storing in Solr:
>>>
>>>> <uniqueKey>doctorId</uniqueKey>
>>>
>>> If that’s the field you’re using to uniquely identify each thing you’re
>>> storing in the Solr index, why do you want to have an id field that is
>>> populated with some random value? You’ll be using the doctorId field as
>>> the key, and the id field will have no real meaning in your data model.
>>>
>>> If doctorId actually isn’t unique to each item you plan on storing in
>>> Solr, is there any other field that is? If so, use that field as your
>>> unique key.
>>>
>>> Remember, these uniqueKeys are usually used for routing documents to shards
>>> in SolrCloud, and are used to ensure that later updates of the same “thing”
>>> overwrite the old one, rather than generating multiple copies. So the keys
>>> really should be something derived from the data you’re storing. I’m not
>>> sure I understand why you would want to have the key randomly generated.
>>>
>>>> On Nov 12, 2014, at 6:39 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>
>>>> Just tried adding <uniqueKey>id</uniqueKey> while keeping id type=
>>>> "string"; only blank ids are being generated. It looks like the id is
>>>> auto generated only if the id is set to type uuid, but in the case of
>>>> SolrCloud this id will be unique per replica.
>>>>
>>>> Is there a way to generate a unique id in SolrCloud without using the
>>>> uuid type, or without ending up with a per-replica unique id?
>>>>
>>>> The uuid in question is of this type:
>>>>
>>>> <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
>>>>
>>>>
>>>> On Wed, Nov 12, 2014 at 6:20 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>
>>>>> Thanks.
>>>>>
>>>>> So the issue here is I already have a <uniqueKey>doctorId</uniqueKey>
>>>>> defined in my schema.xml.
>>>>>
>>>>> If along with that I also want the <id></id> field to be automatically
>>>>> generated for each document, do I have to declare it as a <uniqueKey> as
>>>>> well? I just tried the following setting without the uniqueKey for
>>>>> id, and it's only generating blank ids for me.
>>>>>
>>>>> *schema.xml*
>>>>>
>>>>> <field name="id" type="string" indexed="true" stored="true"
>>>>>        required="true" multiValued="false" />
>>>>>
>>>>> *solrconfig.xml*
>>>>>
>>>>> <updateRequestProcessorChain name="uuid">
>>>>>   <processor class="solr.UUIDUpdateProcessorFactory">
>>>>>     <str name="fieldName">id</str>
>>>>>   </processor>
>>>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>>>> </updateRequestProcessorChain>
>>>>>
>>>>>
>>>>> On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm
>>>>> <garthgr...@averyranchconsulting.com> wrote:
>>>>>
>>>>>> Looking a little deeper, I did find this about UUIDField:
>>>>>>
>>>>>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
>>>>>>
>>>>>> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
>>>>>> not advisable for most users when using SolrCloud (and not possible if the
>>>>>> UUID value is configured as the unique key field) since the result will be
>>>>>> that each replica of each document will get a unique UUID value. Using
>>>>>> UUIDUpdateProcessorFactory
>>>>>> <http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
>>>>>> to generate UUID values when documents are added is recommended instead."
>>>>>>
>>>>>> That might describe the behavior you saw. And the use of
>>>>>> UUIDUpdateProcessorFactory to auto generate IDs seems to be covered well
>>>>>> here:
>>>>>>
>>>>>> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
>>>>>>
>>>>>> Though I’ve not actually tried that process before.
>>>>>>
>>>>>> On Nov 11, 2014, at 7:39 PM, Garth Grimm
>>>>>> <garthgr...@averyranchconsulting.com> wrote:
>>>>>>
>>>>>> “uuid” isn’t an out of the box field type that I’m familiar with.
>>>>>>
>>>>>> Generally, I’d stick with the out of the box advice of the schema.xml
>>>>>> file, which includes things like:
>>>>>>
>>>>>> <!-- Only remove the "id" field if you have a very good reason to.
>>>>>>      While not strictly required, it is highly recommended. A <uniqueKey>
>>>>>>      is present in almost all Solr installations. See the <uniqueKey>
>>>>>>      declaration below where <uniqueKey> is set to "id".
>>>>>> -->
>>>>>> <field name="id" type="string" indexed="true" stored="true"
>>>>>>        required="true" multiValued="false" />
>>>>>>
>>>>>> and:
>>>>>>
>>>>>> <!-- Field to use to determine and enforce document uniqueness.
>>>>>>      Unless this field is marked with required="false", it will be a
>>>>>>      required field
>>>>>> -->
>>>>>> <uniqueKey>id</uniqueKey>
>>>>>>
>>>>>> If you’re creating some key/value pair with uuid as the key as you feed
>>>>>> documents in, and you know that the uuid values you’re creating are unique,
>>>>>> just change the field name and unique key name from ‘id’ to ‘uuid’. Or
>>>>>> change the key name you send in from ‘uuid’ to ‘id’.
>>>>>>
>>>>>> On Nov 11, 2014, at 7:18 PM, S.L <simpleliving...@gmail.com> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am seeing interesting behavior on the replicas. I have a single
>>>>>> shard and 6 replicas on SolrCloud 4.10.1. I only have a small
>>>>>> number of documents (~375) that are replicated across the six replicas.
>>>>>>
>>>>>> The interesting thing is that the same document has a different id in
>>>>>> each one of those replicas.
>>>>>>
>>>>>> This is causing the fq(id:xyz) type queries to fail, depending on
>>>>>> which replica the query goes to.
>>>>>>
>>>>>> I have specified the id field in the following manner in schema.xml.
>>>>>> Is it the right way to specify an auto generated id in SolrCloud?
>>>>>>
>>>>>> <field name="id" type="uuid" indexed="true" stored="true"
>>>>>>        required="true" multiValued="false" />
>>>>>>
>>>>>>
>>>>>> Thanks.
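
For reference, the client-side hashing that Garth describes (turning a URL into a repeatable key before the document is sent to SolrCloud) could look roughly like the Java sketch below. This is only an illustration under assumptions, not code from the thread: the class name UrlKeys, its method names, and the example URL are all made up. md5Hex gives a repeatable 32-character hex string; encode/decode show one possible reversible alternative (URL-safe Base64), whose output may contain '-' and '_' and so is not strictly alphanumeric.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: derives a repeatable key from a URL on the client side.
final class UrlKeys {
    private UrlKeys() {}

    // One-way: the same URL always yields the same 32-character lowercase hex string.
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name on the Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Reversible: URL-safe Base64 without padding (alphabet includes '-' and '_').
    static String encode(String url) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    // Inverse of encode(): recovers the original URL from the key.
    static String decode(String key) {
        return new String(Base64.getUrlDecoder().decode(key), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example URL is an assumption, chosen to include special characters.
        String doctorUrl = "http://example.com/doctors?id=42&name=a b";
        System.out.println(md5Hex(doctorUrl));
        System.out.println(encode(doctorUrl));
        System.out.println(decode(encode(doctorUrl)));
    }
}
```

The idea would be to compute the key while building each document and send it as the unique key field value, so that it is already present when SolrCloud routes the document to a shard and re-feeds of the same URL overwrite the earlier copy.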