Thanks, Alex. I’ll experiment with it. -R
On 3/22/17, 4:38 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:

You could provide the URP chain name (or individual URPs) when you index a particular document type, but that requires you to send the documents of the types that need a signature together, separately from the others. Or you could have a custom URP that skips the other ones (they are chained), though that's messier.

And I think you actually want overwriteDupes set to "false"; otherwise the URP will delete the previous matching document.

Regards,
Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 22 March 2017 at 15:46, Ronald Wood <rw...@smarsh.com> wrote:
> Thanks. I had seen that page but had passed it over since I don’t want to do de-duping (text fields with the exact same text are possible and are not cause for de-dupe).
>
> If I just want to store the signature, it looks like I define the signatureField in the configuration and set overwriteDupes to true (since I don’t actually regard them as dupes).
>
> I guess the one downside to this is that the processor will run regardless of the document type (we have 6 types and only 3 need hashes on text). Or maybe empty values for fields stop the processor? No signature is needed when the text fields are not provided.
>
> -R
>
> On 3/22/17, 3:20 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
>
> You'd use CloneField URP
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
>
> Then you apply your custom algorithm. Or, as I just remembered, use one of the hash ones described in the dedupe section:
> https://cwiki.apache.org/confluence/display/solr/De-Duplication (which don't seem to require CloneField anyway).
>
> Regards,
> Alex.
>
> On 22 March 2017 at 14:55, Ronald Wood <rw...@smarsh.com> wrote:
> > I suppose it could be, but the flexibility of using copy directives is appealing for handling multiple fields as defined in the schema.
> >
> > Since I have rarely looked at the UpdateRequestProcessor, I guess I don’t know whether it could take multiple fields to hash, and if so, how that would be expressed.
> >
> > -R
> >
> > On 3/22/17, 2:21 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
> >
> > Can this be done at the UpdateRequestProcessor stage?
> >
> > Regards,
> > Alex
> >
> > On 22 Mar 2017 1:48 PM, "Ronald Wood" <rw...@smarsh.com> wrote:
> >
> > I have been mulling over the usefulness of a new Hash field type for being able to validate data that is indexed but not stored. Basically, I’d use copy directives to copy all fields to be hashed to the new hash field and store a SHA-256 hash as a string. I’m still not sure how valuable it would be for us. Maybe someone has already done something similar?
> >
> > However, I was wondering in general about how one would go about implementing and integrating a new FieldType.
> >
> > Looking at UUIDField <https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/UUIDField.java> as an example, the work seems moderate. But then the question is, how would I integrate it? Just drop in a new jar with the class, or does it have to be integrated into Solr as a proper commit?
> >
> > If it were valuable for others, I would love to contribute it, should we go ahead with it. But I have already had trouble getting our Legal Dept. to give the go-ahead to contribute the code that worked for re-indexing docValues in place (SOLR-9437). ☹
> >
> > -Ronald S. Wood
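The thread's core idea, hashing several fields into one SHA-256 string so indexed-but-unstored data can be validated later, and skipping the hash when none of the text fields is present, can be sketched independently of Solr. The snippet below is a minimal, hypothetical Python sketch: the field names in HASHED_FIELDS and the helpers field_signature and _add_chunk are invented for illustration, and the length-prefixed encoding is one possible choice, not the scheme used by Solr's SignatureUpdateProcessorFactory. A client could use something like it to recompute a hash from source data and compare it with the value stored in Solr, provided whatever writes the stored hash uses the same encoding.

```python
import hashlib
from typing import Mapping, Optional, Sequence

# Hypothetical field list: in the proposal these would be the fields that
# copy directives send to the hash field. Names are illustrative only.
HASHED_FIELDS: Sequence[str] = ("subject", "body", "attachment_text")


def _add_chunk(digest, text: str) -> None:
    """Feed one length-prefixed UTF-8 chunk so adjacent values cannot merge."""
    data = text.encode("utf-8")
    digest.update(len(data).to_bytes(8, "big"))
    digest.update(data)


def field_signature(doc: Mapping[str, object],
                    fields: Sequence[str] = HASHED_FIELDS) -> Optional[str]:
    """Return a hex SHA-256 over `fields` of `doc`, or None if none has a value.

    Fields are hashed in the fixed order of `fields` (not dict order), and
    fields outside that list are ignored.
    """
    digest = hashlib.sha256()
    found = False
    for name in fields:
        raw = doc.get(name)
        if raw is None:
            continue
        # Treat single values and multi-valued fields uniformly.
        items = raw if isinstance(raw, (list, tuple)) else [raw]
        values = [str(v) for v in items if v is not None and v != ""]
        if not values:
            continue  # an empty field contributes nothing
        found = True
        _add_chunk(digest, name)  # include the field name
        digest.update(len(values).to_bytes(8, "big"))  # value count
        for value in values:
            _add_chunk(digest, value)
    # No signature when none of the hashed fields has a value.
    return digest.hexdigest() if found else None


if __name__ == "__main__":
    doc = {"id": "42", "subject": "Quarterly report", "body": "See attached."}
    print(field_signature(doc))           # a 64-character hex string
    print(field_signature({"id": "43"}))  # None: no hashed fields present
```

Fixing the field order and length-prefixing every name and value means the same document always yields the same hash, and values such as "ab" + "c" and "a" + "bc" cannot collide by running together.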