Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
rom: Jack Krupansky To: solr-user@lucene.apache.org; Mike L. Sent: Sunday, April 5, 2015 8:23 AM Subject: Re: WordDelimiterFilterFactory - tokenizer question You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did te

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Jack Krupansky
You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did tell it to preserve the original, unfiltered token though, which is fine. -- Jack Krupansky On Sun, Apr 5, 2015 at 3:39 AM, Mike L. wrote: > Solr User Group, > I have a
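
The point above can be sketched with a toy model in Python. This is not Solr's WordDelimiterFilter; the function `word_delimiter` and its keyword arguments are hypothetical stand-ins for the filter's `generateWordParts`, `generateNumberParts` and `preserveOriginal` options, and the splitting rule (break on non-alphanumerics and on letter/digit transitions) is a simplifying assumption.

```python
import re


def word_delimiter(token, generate_word_parts=False,
                   generate_number_parts=False, preserve_original=False):
    """Toy model of a word-delimiter filter (NOT Solr's implementation).

    Splits on non-alphanumeric characters and on letter/digit transitions,
    then emits only the kinds of sub-tokens the caller asked for.
    """
    # Runs of letters or runs of digits; everything else acts as a delimiter.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    if preserve_original:
        out.append(token)
    for part in parts:
        if preserve_original and part == token:
            continue  # already emitted as the original token
        if part.isdigit():
            if generate_number_parts:
                out.append(part)
        elif generate_word_parts:
            out.append(part)
    return out


# Only the original is preserved: no word or number parts were requested.
only_original = word_delimiter("US100-D", preserve_original=True)

# Asking for word and number parts produces the split sub-tokens as well.
with_parts = word_delimiter("US100-D", generate_word_parts=True,
                            generate_number_parts=True,
                            preserve_original=True)
```

In this sketch `only_original` holds just the unfiltered token, while `with_parts` also holds the letter and digit runs. Lowercasing would be a separate step after the split, as with a LowerCaseFilterFactory placed later in the chain.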

WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
Solr User Group, I have a non-multivalued field which contains stored values similar to this: US100AUS100BUS100CUS100-DUS100BBA My assumption is - If I tokenized with the below fieldType definition, specifically the WDF -splitOnNumbers and the LowerCaseFilterFactory would have provided

Re: simple tokenizer question

2013-12-08 Thread Josh Lincoln
AM, Upayavira wrote: > > > > > Have you tried a WhitespaceTokenizerFactory followed by the > > > WordDelimiterFilterFactory? The latter is perhaps more configurable at > > > what it does. Alternatively, you could use a RegexFilterFactory to > > > remove extran

Re: simple tokenizer question

2013-12-08 Thread Upayavira
avira > > > > On Sat, Dec 7, 2013, at 06:15 PM, Vulcanoid Developer wrote: > > > Hi, > > > > > > I am new to solr and I guess this is a basic tokenizer question so please > > > bear with me. > > > > > > I am trying to use SOLR to in

Re: simple tokenizer question

2013-12-08 Thread Vulcanoid Developer
> On Sat, Dec 7, 2013, at 06:15 PM, Vulcanoid Developer wrote: > > Hi, > > > > I am new to solr and I guess this is a basic tokenizer question so please > > bear with me. > > > > I am trying to use SOLR to index a few (Indian) legal judgments in text > >

Re: simple tokenizer question

2013-12-07 Thread Upayavira
n Sat, Dec 7, 2013, at 06:15 PM, Vulcanoid Developer wrote: > Hi, > > I am new to solr and I guess this is a basic tokenizer question so please > bear with me. > > I am trying to use SOLR to index a few (Indian) legal judgments in text > form and search against them. One of

simple tokenizer question

2013-12-07 Thread Vulcanoid Developer
Hi, I am new to solr and I guess this is a basic tokenizer question so please bear with me. I am trying to use SOLR to index a few (Indian) legal judgments in text form and search against them. One of the key points with these documents is that the sections/provisions of law usually have

Re: Tokenizer question

2012-10-30 Thread Jack Krupansky
M To: solr-user@lucene.apache.org Subject: Tokenizer question I could not find a solution to that in the documentation or the mailing list, so here's my question. I have files following the pattern: firstname_lastname_employeenumber.jpg I'm able to search for the single terms firstnam

Tokenizer question

2012-10-30 Thread RL
View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-question-tp4016932.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Tokenizer Question

2011-07-20 Thread Jamie Johnson
Thanks, I'll try that now, I'm assuming I need to add the position increment and offset attributes? On Wed, Jul 20, 2011 at 3:44 PM, Chris Hostetter wrote: > > When the QueryParser gives hunks of text to an analyzer, and that analyzer > produces multiple terms, the query parser has to decide how

Re: Tokenizer Question

2011-07-20 Thread Chris Hostetter
When the QueryParser gives hunks of text to an analyzer, and that analyzer produces multiple terms, the query parser has to decide how to build a query out of it. If the terms have identical position information, then it always builds an "OR" query (this is the typical synonym situation). If

Re: Tokenizer Question

2011-07-20 Thread Jamie Johnson
My use case really isn't names, I just used that as a simplification. I did look at the Synonym filter to see if I could implement a similar filter (if that was a more appropriate place to do so) but even after doing that I ended up with the same result. On Wed, Jul 20, 2011 at 12:07 PM, Kyle Lee

Re: Tokenizer Question

2011-07-20 Thread Kyle Lee
I'm not sure how to accomplish what you're asking, but have you considered using a synonyms file? This would also allow you to catch ostensibly unrelated name substitutes such as Robert -> Bob and Richard -> Dick. On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson wrote: > I have a query which star

Tokenizer Question

2011-07-20 Thread Jamie Johnson
I have a query which starts out with something like name:"john", and I need to expand this to something like name:("john" "johnny"). I've implemented a custom tokenizer which gets close, but isn't quite right: it outputs name:"john johnny". Is there a simple example of doing what I'm attempting?

Re: Tokenizer question

2010-01-11 Thread rswart
Crystal clear. Thanks for your response and time! -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27123281.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Tokenizer question

2010-01-11 Thread Avlesh Singh
> > If the analyzer produces multiple Tokens, but they all have the same > position then the QueryParser produces a BooleanQuery with all SHOULD > clauses. -- This is what allows simple synonyms to work. > You rock Hoss!!! This is exactly the explanation I was looking for .. it is as simple as it

Re: Tokenizer question

2010-01-11 Thread Chris Hostetter
: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43) : : the resulting parsed query contains a phrase query: : : +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43") This stems from some fairly fundamental behavior in the QueryParser ... each "chunk" of input that isn't deemed "markup" (ie:
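
The behavior discussed in this thread can be sketched as a toy model in Python: the terms an analyzer returns for one chunk of input become an OR of optional clauses when they all share a position, and a phrase query when they sit at successive positions. This is not Lucene's QueryParser code. The function `build_query` and its output strings are hypothetical, and the case where some positions are shared and others differ is deliberately left out.

```python
def build_query(field, tokens):
    """Toy model of query construction for one analyzed chunk.

    `tokens` is a list of (term, position) pairs, as an analyzer might emit.
    Returns a string that loosely mimics a parsed-query dump.
    """
    if not tokens:
        return ""
    if len(tokens) == 1:
        return f"{field}:{tokens[0][0]}"
    positions = {pos for _, pos in tokens}
    if len(positions) == 1:
        # Same position for every term: optional (SHOULD) clauses, i.e. an OR.
        return "(" + " ".join(f"{field}:{term}" for term, _ in tokens) + ")"
    # Distinct positions: the terms are treated as a phrase, in position order.
    ordered = sorted(tokens, key=lambda pair: pair[1])
    phrase = " ".join(term for term, _ in ordered)
    return f'PhraseQuery({field}:"{phrase}")'


# "39-43" split into two tokens at successive positions becomes a phrase.
house_number = build_query("HouseNumber", [("39", 0), ("43", 1)])

# Two tokens stacked at the same position (the synonym case) become an OR.
name = build_query("name", [("john", 0), ("johnny", 0)])
```

Under these assumptions `house_number` reproduces the phrase query quoted above, and `name` shows why emitting an expansion at the same position as the original term avoids the phrase.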

Re: Tokenizer question

2010-01-11 Thread rswart
y: >>> >>> 1. WordDelimiterFilterFactory with generateNumberParts=1 but this >>> results in >>> a phrase query >>> 2. PatternTokenizerFactory that splits on (\s+|-). >>> >>> But both options don't work. >>> >>

Re: Tokenizer question

2010-01-11 Thread Grant Ingersoll
>> >> 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in >> a phrase query >> 2. PatternTokenizerFactory that splits on (\s+|-). >> >> But both options don't work. >> >> Any suggestions

Re: Tokenizer question

2010-01-11 Thread Grant Ingersoll
suggestions on how to get rid of the phrase query? > > Thanks, > > Richard > -- > View this message in context: > http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

Tokenizer question

2010-01-10 Thread rswart
suggestions on how to get rid of the phrase query? Thanks, Richard -- View this message in context: http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Field tokenizer question

2009-03-24 Thread Chris Hostetter
: as far as I know solr.StrField is not analyzed but it is indexed as is : (verbatim). correct ... but there is definitely a bug here if the analysis.jsp is implying that an analyzer is being used... https://issues.apache.org/jira/browse/SOLR-1086 -Hoss

Re: Field tokenizer question

2009-03-23 Thread Giovanni De Stefano
ld" in your > > fieldType definition. > > Then reindex and commit. > > > > Koji > > > > > > > > -- > View this message in context: > http://www.nabble.com/Field-tokenizer-question-tp22594575p22653356.html > Sent from the Solr - User mailing list archive at Nabble.com. > >

Re: Field tokenizer question

2009-03-22 Thread Ashish P
lass="solr.TextField" instead of class="solr.StrField" in your > fieldType definition. > Then reindex and commit. > > Koji > > > -- View this message in context: http://www.nabble.com/Field-tokenizer-question-tp22594575p22653356.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Field tokenizer question

2009-03-19 Thread Koji Sekiguchi
Ashish P wrote: I have created a field, Set class="solr.TextField" instead of class="solr.StrField" in your fieldType definition. Then reindex and commit. Koji

Field tokenizer question

2009-03-18 Thread Ashish P
committed. Am I missing something here? -- View this message in context: http://www.nabble.com/Field-tokenizer-question-tp22594575p22594575.html Sent from the Solr - User mailing list archive at Nabble.com.