Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-26 Thread Jamie Johnson
…given that Lucene/Solr supports payloads on these field types, they just aren't exposed. As always I appreciate any ideas if I'm barking up the wrong tree here. …

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
…2015 at 2:52 PM, Markus Jelsma <markus.jel...@openindex.io> wrote: Well, if I remember correctly (I have no testing facility at hand), WordDelimiterFilter maintains payloads on emitted sub-terms. So if you use…

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
…correctly (I have no testing facility at hand), WordDelimiterFilter maintains payloads on emitted sub-terms. So if you use a KeywordTokenizer, input 'some text^PAYLOAD', and have a DelimitedPayloadFilter, the entire string gets a payload. You can…
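
For concreteness, here is a minimal, untested Java sketch of the chain being described, using the plain Lucene 5.x analysis API rather than a schema.xml fieldType; the class name, the field name "f", and the printed output are illustrative assumptions, not something taken from the thread:

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
    import org.apache.lucene.analysis.payloads.IdentityEncoder;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;

    public class WholeStringPayloadSketch {
      public static void main(String[] args) throws IOException {
        // KeywordTokenizer emits the whole input as a single token, so the
        // DelimitedPayloadTokenFilter attaches the payload to the entire string.
        Analyzer analyzer = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new KeywordTokenizer();
            TokenStream sink =
                new DelimitedPayloadTokenFilter(source, '^', new IdentityEncoder());
            return new TokenStreamComponents(source, sink);
          }
        };

        try (TokenStream ts = analyzer.tokenStream("f", "some text^PAYLOAD")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            // Expect one token, "some text", carrying the raw bytes of "PAYLOAD".
            System.out.println(term + " -> " + payload.getPayload());
          }
          ts.end();
        }
      }
    }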

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
…abuse WordDelimiterFilter for it because it has a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy…

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
…a types parameter that you can use to split it on whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. …
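
A rough sketch of Markus's "crazy idea" as a raw Lucene 5.x analyzer. It is untested and relies on the default word-delimiter character table (which already treats a space as a delimiter) instead of the factory-level types file he mentions; whether the payload really survives onto each emitted sub-term is exactly the open question in the thread.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
    import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
    import org.apache.lucene.analysis.payloads.IdentityEncoder;
    import org.apache.lucene.analysis.util.CharArraySet;

    public class SplitAfterPayloadSketch extends Analyzer {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        // 1. Keep the whole line as one token so the payload applies to all of it.
        Tokenizer source = new KeywordTokenizer();
        // 2. Strip "^PAYLOAD" and attach it as the token's payload.
        TokenStream sink =
            new DelimitedPayloadTokenFilter(source, '^', new IdentityEncoder());
        // 3. Split the remaining "some text" on non-alphanumerics, spaces included.
        //    Whether each sub-term keeps the payload needs to be verified.
        sink = new WordDelimiterFilter(sink,
            WordDelimiterFilter.GENERATE_WORD_PARTS
                | WordDelimiterFilter.GENERATE_NUMBER_PARTS,
            CharArraySet.EMPTY_SET);
        return new TokenStreamComponents(source, sink);
      }
    }

Since Markus has no testing facility at hand: the Solr admin UI's Analysis screen with verbose output enabled should be the quickest way to check whether a payload shows up on each sub-term for a field type built this way.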

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
…whitespace if its input is not trimmed. Otherwise you can use any other character instead of a space as your input. This is a crazy idea, but it might work. -----Original message----- From: Jamie Johnson; Sent: Tuesday 25th August 2015 19:37; To: sol…

RE: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Markus Jelsma
…but it might work. -----Original message----- From: Jamie Johnson; Sent: Tuesday 25th August 2015 19:37; To: solr-user@lucene.apache.org; Subject: Re: Tokenizers and DelimitedPayloadTokenFilterFactory. To be clear, we are using payloads as a way to attach authorizations t…

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
Oh My. What fun! bq: "I need a way to specify the payload on the other field types." Not to my knowledge. The payload mechanism is built on the capability of having a filter in the analysis chain, and there's no analysis chain with primitive types (string, numeric and the like). Hmmm. Totally off t…

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Jamie Johnson
To be clear, we are using payloads as a way to attach authorizations to individual tokens within Solr. The payloads are normal Solr Payloads, though we are not using floats; we are using the identity payload encoder (org.apache.lucene.analysis.payloads.IdentityEncoder), which allows for storing a by…
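
As a hedged illustration of that encoder (the "U//FOUO" marking and the class name are invented for the example, not from the thread), the round trip with stock Lucene classes looks roughly like this:

    import java.nio.charset.StandardCharsets;

    import org.apache.lucene.analysis.payloads.IdentityEncoder;
    import org.apache.lucene.util.BytesRef;

    public class AuthorizationPayloadSketch {
      public static void main(String[] args) {
        // IdentityEncoder passes the delimited payload text through as raw bytes
        // instead of interpreting it as a float.
        IdentityEncoder encoder = new IdentityEncoder(StandardCharsets.UTF_8);

        // Hypothetical authorization marking; the real value is whatever the
        // indexing side appends after the delimiter, e.g. "token^U//FOUO".
        char[] marking = "U//FOUO".toCharArray();
        BytesRef payload = encoder.encode(marking, 0, marking.length);

        // At search time the stored bytes can be turned back into the string.
        String decoded = new String(payload.bytes, payload.offset, payload.length,
            StandardCharsets.UTF_8);
        System.out.println(decoded); // U//FOUO
      }
    }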

Re: Tokenizers and DelimitedPayloadTokenFilterFactory

2015-08-25 Thread Erick Erickson
This really sounds like an XY problem. Or, when you use "payload", it's not the Solr payload. So: Solr Payloads are a float value that you can attach to individual terms to influence the scoring. Attaching the _same_ payload to all terms in a field is much the same thing as boosting on any matches in…
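
For contrast with the identity-encoded case above, a small sketch of the float payload Erick is describing, using Lucene's stock FloatEncoder and PayloadHelper; the 2.5 boost value and the class name are illustrative assumptions only:

    import org.apache.lucene.analysis.payloads.FloatEncoder;
    import org.apache.lucene.analysis.payloads.PayloadHelper;
    import org.apache.lucene.util.BytesRef;

    public class FloatPayloadSketch {
      public static void main(String[] args) {
        // With a FloatEncoder, the text after the delimiter (e.g. "term^2.5")
        // is stored as a 4-byte float that a payload-aware query can fold into
        // that term's score.
        char[] boost = "2.5".toCharArray();
        BytesRef payload = new FloatEncoder().encode(boost, 0, boost.length);

        // PayloadHelper decodes the bytes back into the float at scoring time.
        float value = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
        System.out.println(value); // 2.5
      }
    }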