Thanks, Alex. I’ll experiment with it.

-R

On 3/22/17, 4:38 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:

    You could provide the URP chain name (or individual URPs) when you
    index a particular document type, but that requires you to send the
    documents of the types that need a signature together, in their own
    update requests.
    
    Or you could have a custom URP that skips other ones (they are
    chained), though that's messier.
    
    And I think you want overwriteDupes as "false" actually, otherwise URP
    will delete the previous matching document.
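
A minimal client-side sketch of the first option above: batch documents by whether their type needs a signature, and add Solr's `update.chain` request parameter only for those batches. The chain name `signature`, the `type` field, the type names, and the core name in the usage note are all hypothetical and would need to match your own configuration.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration; none of these names come from the thread.
SIGNATURE_CHAIN = "signature"
TYPES_NEEDING_SIGNATURE = {"email", "chat", "note"}


def update_url(base_url, collection, doc_type):
    """Build an update URL, selecting the signature chain only for types that need it."""
    params = {}
    if doc_type in TYPES_NEEDING_SIGNATURE:
        params["update.chain"] = SIGNATURE_CHAIN
    url = f"{base_url}/{collection}/update"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def batch_by_chain(docs):
    """Group documents so each update request carries only one kind of chain."""
    batches = {}
    for doc in docs:
        needs_sig = doc.get("type") in TYPES_NEEDING_SIGNATURE
        chain = SIGNATURE_CHAIN if needs_sig else None
        batches.setdefault(chain, []).append(doc)
    return batches
```

For example, `update_url("http://localhost:8983/solr", "mycore", "email")` yields a URL ending in `/update?update.chain=signature`, while a type outside the set gets the plain `/update` URL and so runs through the default chain.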
    
    Regards,
       Alex.
    ----
    http://www.solr-start.com/ - Resources for Solr users, new and experienced
    
    
    On 22 March 2017 at 15:46, Ronald Wood <rw...@smarsh.com> wrote:
    > Thanks. I had seen that page but had passed it over since I don’t want to
    > do de-duping (text fields with the exact same text are possible and not cause
    > for de-dupe).
    >
    > If I want just to store the signature, it looks like I define the
    > signatureField in the configuration and set overwriteDupes to true (since I
    > don’t actually regard them as dupes).
    >
    > I guess the one downside to this is that the processor will run
    > regardless of the document type (we have 6 types and only 3 need hashes on
    > text). Or maybe empty values for fields stops the processor? No signature is
    > needed when the text fields are not provided.
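
To make the "hash several text fields, but skip documents without them" idea concrete, here is a small client-side sketch. It is only an illustration of the intent: it is not how Solr's signature classes compute their values, and it does not answer whether the Solr processor itself skips such documents. The field names in the usage note are hypothetical.

```python
import hashlib


def text_signature(doc, fields):
    """Return a SHA-256 hex digest over the given fields, or None if none has a value."""
    digest = hashlib.sha256()
    hashed_any = False
    for name in fields:
        value = doc.get(name)
        if not value:
            continue  # missing or empty field: contributes nothing
        hashed_any = True
        data = value.encode("utf-8")
        # Include the field name and a length prefix so that
        # ("ab", "c") and ("a", "bc") produce different digests.
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest() if hashed_any else None
```

Calling `text_signature(doc, ["subject", "body"])` on a document with neither field returns `None`, so the caller can simply leave the signature field unset for the document types that carry no text.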
    >
    > -R
    >
    > On 3/22/17, 3:20 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
    >
    >     You'd use CloneField URP
    >     http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
    >
    >     Then you do your custom algorithm. Or - as I just remembered - use one
    >     of the hash ones described in dedupe section:
    >     https://cwiki.apache.org/confluence/display/solr/De-Duplication (which
    >     don't seem to require CloneField anyway).
    >
    >     Regards,
    >        Alex.
    >     ----
    >     http://www.solr-start.com/ - Resources for Solr users, new and experienced
    >
    >
    >     On 22 March 2017 at 14:55, Ronald Wood <rw...@smarsh.com> wrote:
    >     > I suppose it could be, but the flexibility of using copy directives is
    >     > appealing for handling multiple fields as defined in the schema.
    >     >
    >     > Since I have rarely looked at the UpdateRequestProcessor, I guess I don’t
    >     > know if it could take multiple fields to hash, and if so how that would be
    >     > expressed.
    >     >
    >     > -R
    >     >
    >     > On 3/22/17, 2:21 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote:
    >     >
    >     >     Can this be done at the UpdateRequestProcessor stage?
    >     >
    >     >     Regards,
    >     >         Alex
    >     >
    >     >
    >     >     On 22 Mar 2017 1:48 PM, "Ronald Wood" <rw...@smarsh.com> wrote:
    >     >
    >     >     I have been mulling over the usefulness of a new Hash field type for
    >     >     being able to validate data that is indexed but not stored. Basically,
    >     >     I’d use copy directives to copy all fields to be hashed to the new hash
    >     >     field and store a SHA-256 hash as a string. I’m still not sure how
    >     >     valuable it would be for us. Maybe someone has already done something
    >     >     similar?
    >     >
    >     >     However, I was wondering in general about how one would go about
    >     >     implementing and integrating a new FieldType.
    >     >
    >     >     Looking at UUIDField
    >     >     <https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/UUIDField.java>
    >     >     as an example, the work seems moderate. But then the question is, how
    >     >     would I integrate it? Just drop in a new jar with the class or does it
    >     >     have to be integrated into Solr as a proper commit?
    >     >
    >     >     If it were valuable for others, I would love to contribute it, should we
    >     >     go ahead with it. But I already have had trouble getting our Legal Dept.
    >     >     to give the go-ahead to contribute the code that worked for re-indexing
    >     >     docValues in place (SOLR-9437). ☹
    >     >
    >     >     -Ronald S. Wood
    >     >
    >     >
    >
    >
    
