Well, if I remember correctly (I have no way to test this at hand), WordDelimiterFilter preserves payloads on the sub-terms it emits. So if you use a KeywordTokenizer, give it input such as 'some text^PAYLOAD', and add a DelimitedPayloadTokenFilter, the entire string gets the payload. You can then split that string up again into individual tokens. It should be possible to abuse WordDelimiterFilter for this, because it has a types parameter that you can use to make it split on whitespace, as long as its input is not trimmed. Otherwise you can use any other character instead of a space as the separator in your input.
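To make the intended flow concrete, here is a rough plain-Java sketch. It does not use Lucene or Solr at all: the class, the Token type and the method names are made up for illustration, and it only simulates the two steps (payload taken from the whole string, then splitting with the payload copied onto each piece). Whether WordDelimiterFilter really carries the payload through like this is exactly the untested assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the idea; it does not call Lucene or Solr.
class PayloadSplitSketch {

    // Hypothetical stand-in for a token that carries a payload.
    static final class Token {
        final String term;
        final byte[] payload;

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    // Step 1: treat the whole field value as one token and move everything
    // after the first delimiter into the payload (raw bytes, no float encoding).
    static Token wholeFieldToken(String input, char delimiter) {
        int i = input.indexOf(delimiter);
        if (i < 0) {
            return new Token(input, null); // no delimiter, so no payload
        }
        byte[] payload = input.substring(i + 1).getBytes(StandardCharsets.UTF_8);
        return new Token(input.substring(0, i), payload);
    }

    // Step 2: split the single token on whitespace and copy the same payload
    // onto every resulting sub-token.
    static List<Token> splitKeepingPayload(Token whole) {
        List<Token> out = new ArrayList<>();
        for (String part : whole.term.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                out.add(new Token(part, whole.payload));
            }
        }
        return out;
    }

    static List<Token> analyze(String input, char delimiter) {
        return splitKeepingPayload(wholeFieldToken(input, delimiter));
    }

    public static void main(String[] args) {
        for (Token t : analyze("this is a test^Foo", '^')) {
            String p = t.payload == null
                    ? "(none)"
                    : new String(t.payload, StandardCharsets.UTF_8);
            System.out.println(t.term + " -> " + p);
        }
    }
}
```

For 'this is a test^Foo' this gives the four tokens "this", "is", "a" and "test", each carrying Foo, which is the result the real analysis chain would have to reproduce.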
This is a crazy idea, but it might work.

-----Original message-----
> From:Jamie Johnson <jej2...@gmail.com>
> Sent: Tuesday 25th August 2015 19:37
> To: solr-user@lucene.apache.org
> Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory
>
> To be clear, we are using payloads as a way to attach authorizations to
> individual tokens within Solr. The payloads are normal Solr Payloads
> though we are not using floats, we are using the identity payload encoder
> (org.apache.lucene.analysis.payloads.IdentityEncoder) which allows for
> storing a byte[] of our choosing into the payload field.
>
> This works great for text, but now that I'm indexing more than just text I
> need a way to specify the payload on the other field types. Does that make
> more sense?
>
> On Tue, Aug 25, 2015 at 12:52 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > This really sounds like an XY problem. Or when you use
> > "payload" it's not the Solr payload.
> >
> > So Solr Payloads are a float value that you can attach to
> > individual terms to influence the scoring. Attaching the
> > _same_ payload to all terms in a field is much the same
> > thing as boosting on any matches in the field at query time
> > or boosting on the field at index time (this latter assuming
> > that different docs would have different boosts).
> >
> > So can you back up a bit and tell us what you're trying to
> > accomplish maybe we can be sure we're both talking about
> > the same thing ;)
> >
> > Best,
> > Erick
> >
> > On Tue, Aug 25, 2015 at 9:09 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> > > I would like to specify a particular payload for all tokens emitted from a
> > > tokenizer, but don't see a clear way to do this. Ideally I could specify
> > > that something like the DelimitedPayloadTokenFilter be run on the entire
> > > field and then standard analysis be done on the rest of the field, so in
> > > the case that I had the following text
> > >
> > > this is a test\Foo
> > >
> > > I would like to create tokens "this", "is", "a", "test" each with a payload
> > > of Foo. From what I'm seeing though only test get's the payload. Is there
> > > anyway to accomplish this or will I need to implement a custom tokenizer?