Why am I getting this email? Who are all BCC-ed?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, March 7, 2019 10:33 AM
To: solr-user@lucene.apache.org
Subject: Re: TrieDate field in UpdateRequestProcessorChain

bq.  should be fine as update chain process is pre-process of document indexing.

You may be confusing document _routing_ with document _indexing_.

When a document comes in to Solr, the first thing that happens is the doc
is examined and the ID is used to route the raw document to the appropriate
shard. If your update processor is in the right place and modifies the
<uniqueKey> before this, then the changes are included in the hash.

If you change your <uniqueKey> _after_ the hash is computed, then the
routing is just on the original value.

In both cases the actual ID put in the _index_ is the modified one.
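
To make the two cases concrete, here is a minimal sketch of hash-based
routing. It is not Solr's implementation: shard_for, index_doc, NUM_SHARDS
and the CRC32-modulo scheme are all hypothetical stand-ins chosen only to
show which value feeds the hash in each case. The id "1" and the suffix
"3_4_18" are taken from the example earlier in this thread.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size, not taken from the thread


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stand-in router: CRC32 of the id modulo the shard count.

    This only illustrates the idea of routing on a hash of the id;
    it is not the hash function Solr uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_doc(original_id: str, suffix: str, modify_before_routing: bool):
    """Return (shard the doc is sent to, id that ends up in the index)."""
    indexed_id = f"{original_id}_{suffix}"
    # The id the router sees depends on where the modification happens.
    routing_id = indexed_id if modify_before_routing else original_id
    return shard_for(routing_id), indexed_id


early_shard, early_id = index_doc("1", "3_4_18", modify_before_routing=True)
late_shard, late_id = index_doc("1", "3_4_18", modify_before_routing=False)

# A later delete-by-id only knows the indexed id, so it hashes that value.
delete_shard = shard_for(late_id)

print("modified before routing: doc on shard", early_shard)
print("modified after routing:  doc on shard", late_shard)
print("delete-by-id is sent to shard", delete_shard)
```

When the id is modified before routing, the delete always goes to the
shard that holds the doc. When it is modified afterwards, the two shards
agree only by coincidence.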

The Solr admin UI is displaying the _stored_ value, which is the
original input. The actual value in the index is just a numeric
timestamp (milliseconds since the Unix epoch).
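
As a rough illustration, the two renderings seen in this thread can be
produced from one and the same instant. This is a sketch in plain Python,
not Solr code. EPOCH_MILLIS, iso_form and to_string_form are names
introduced here for illustration. The second layout matches what Java's
Date.toString() prints for a UTC default time zone; that the cloned id
value was produced that way is an assumption consistent with the id shown
in the thread, not something confirmed here.

```python
from datetime import datetime, timezone

# 2019-01-03T12:00:00Z expressed as milliseconds since the Unix epoch.
EPOCH_MILLIS = 1546516800000

instant = datetime.fromtimestamp(EPOCH_MILLIS / 1000, tz=timezone.utc)

# ISO 8601 layout, as shown for window_time.
iso_form = instant.strftime("%Y-%m-%dT%H:%M:%SZ")

# Layout seen inside the id (assumes the default C/English locale
# for the day and month abbreviations).
to_string_form = instant.strftime("%a %b %d %H:%M:%S UTC %Y")

print(iso_form)
print(to_string_form)
```

Both strings describe the same instant; only the formatting differs.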

Best,
Erick


> On Mar 7, 2019, at 9:47 AM, Anil <anilk...@gmail.com> wrote:
> 
> Hi Erick,
> 
> Thanks for your response.
> 
> Yes, we are using SolrCloud. I understood your point, and it was
> considered while designing the unique id. The unique id (i.e. Id) is set
> using the update processor chain only, and I believe this should be fine
> as update chain process is pre-process of document indexing.
> 
> Could you please point me to resources to understand the TrieDateField
> conversion to the 2019-01-03T12:00:00Z format which is displayed in the
> Solr Admin UI? Thanks.
> 
> 
> Regards,
> Anil
> 
> On Thu, 7 Mar 2019 at 22:49, Erick Erickson <erickerick...@gmail.com> wrote:
> 
>> I’d probably go with a StatelessScriptUpdateProcessorFactory. It allows
>> you to manipulate the incoming doc in whatever scripting language you have
>> access to, javascript, groovy, etc.
>> 
>> Do be aware though that if you're using SolrCloud and the id field is also
>> the <uniqueKey> you have to be very careful to do this transformation
>> _before_ the document is routed to the proper shard. Since routing is based
>> on a hash of the <uniqueKey>, any function that tries to send the doc to
>> the correct shard will not be reliable.
>> 
>> For instance, say your id is “1” and your date is “3_4_18”. The shard
>> would be the hash of “1”. However you’ve changed the id _after_ the doc has
>> been routed to the shard, and it’s now “1_3_4_18”. Next, say you try to
>> delete by ID. Solr will route the delete request to the hash of 1_3_4_18
>> which will very likely be the wrong shard.
>> 
>> Best,
>> Erick
>> 
>>> On Mar 6, 2019, at 9:33 PM, Anil <anilk...@gmail.com> wrote:
>>> 
>>> Hi Team,
>>> 
>>> I am using Solr 6.6.2 and my schema includes a date field 'window_time'
>>> of type TrieDate. window_time is added to the doc id of the Solr
>>> document using CloneFieldUpdateProcessorFactory and
>>> TruncateFieldUpdateProcessorFactory.
>>> 
>>> I noticed different date formats in window_time and doc id
>>> 
>>> "window_time" : *"2019-01-03T12:00:00Z"*
>>> and
>>> "id": 123445-products-*Thu Jan 03 12:00:00 UTC 2019*
>>> 
>>> 
>>> I have checked the CloneFieldUpdateProcessorFactory and
>>> TruncateFieldUpdateProcessorFactory source code and didn't find much
>>> customization there.
>>> 
>>> Is there any way to keep the date format of window_time and the format
>>> of its value in the ID the same?
>>> 
>>> Thanks in advance.
>>> 
>>> Regards,
>>> Anil
>> 
>> 


