I have an autophrase configured for 'wheel chair'. If I run analysis on 'super wheel chair awesome', which should index as 'super wheelchair awesome', this is how mine behaves: http://i.imgur.com/iR4IgGp.png
When I did the implementation, that is how I thought the positioning should work. Do you think it should be different?

On Fri, Mar 20, 2015 at 11:10 AM, trhodesg <trhodes...@gmail.com> wrote:

> Sorry, i can see my post is munged. This seems to display it legibly:
> http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-td4173808.html
>
> I'm new to all this, so i hesitate to say the indexing isn't correct. But my understanding is that the query "republic of china" will only match the indexing republic(n) of(n+1) china(n+2). Since the original APTF indexes this as republic(n) of(n+3) china(n+7), that query will fail. Wouldn't it be more logical to leave the original token numbering unchanged and just add the phrase token with the same number as the last word in the matched series?
>
> BTW, i looked at your code re this. It is quite informative to a newbie. Thanks!
>
> On 3/19/2015 11:38 AM, James Strassburg [via Lucene] wrote:
> > Sorry, I've been a bit unfocused from this list for a bit. When I was working with the APTF code I rewrote a big chunk of it and didn't include the original tokens, as I didn't need them at the time. That feature could easily be added back in. I will see if I can find a bit of time for that.
> >
> > As for the other part of your message, are you suggesting that the token indexes are not correct? There is a bit of a formatting issue with the text and I'm not sure what you're getting at. Can you explain further please?
> >
> > On Sun, Feb 8, 2015 at 3:04 PM, trhodesg < [hidden email] > wrote:
> > > Thanks to everyone for the thought, time and effort put into AutoPhrasingTokenFilter (APTF)! It's a real lifesaver.
> > >
> > > While trying to add APTF to my indexing, i discovered that the original (TS) version throws an exception while indexing a 100MB PDF. The error is
> > > Exception writing document to the index; possible analysis error
> > > The modified (JS) version runs without error, but it removes the tokens used to create the phrase. They are needed.
> > >
> > > Before looking into this i have a question. Solr would normally tokenize the phrase
> > > the peoples republic of china is
> > > as
> > > the(1) peoples(2) republic(3) of(4) china(5) is(6)
> > >
> > > Defining the APTF phrase file as
> > > the Solr admin analysis page reports that the APTF indexer tokenizes the phrase as
> > > Would it be possible for someone to explain the reasoning behind the discontinuous token numbering? As it is now, phrase queries such as "republic of china" will fail. And i can't get proximity queries like "republic of"~10 to work either (though it seems they should). Wouldn't it be more flexible to return the following tokenization
> > > This allows spurious matches such as "peoples peoplesrepublic" but it seems like this type of event would be very rare. It has the advantage of allowing phrase queries to continue working the way most users think.
> > >
> > > Thank you for supporting more than one entity definition per phrase (ie peoplesrepublic and peoplesrepublicofchina). This type of contraction is common in longer documents, especially when the first used phrase ends with a preposition. It helps support robust matching.
> > >
> > > --
> > > View this message in context: http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4184888.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
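To make the position question in the quoted thread concrete, here is a small Python sketch. It is not Lucene or APTF code: `phrase_matches` is a deliberately simplified stand-in for an exact phrase query (terms must sit at consecutive positions), and `autophrase_keep_originals` is a hypothetical helper that implements only the scheme proposed in the thread, namely keeping the original token positions and adding each phrase token at the position of the last word it covers. The gapped positions use n = 1 with the republic(n) of(n+3) china(n+7) numbering reported above.

```python
# Simplified model: a token stream is a list of (term, position) pairs.
# This only mimics the "consecutive positions" rule of an exact phrase
# query; it is an illustration, not how Lucene is implemented.

def phrase_matches(tokens, phrase):
    """Return True if the words of `phrase` occur at consecutive positions."""
    words = phrase.split()
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    # Try every position of the first word as a possible phrase start.
    for start in positions.get(words[0], ()):
        if all(start + i in positions.get(w, ()) for i, w in enumerate(words)):
            return True
    return False


def autophrase_keep_originals(words, phrases):
    """Hypothetical scheme from the thread: keep the original positions and
    add each phrase token at the position of the last word it covers."""
    tokens = [(w, i + 1) for i, w in enumerate(words)]
    for phrase in phrases:
        parts = phrase.split()
        n = len(parts)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == parts:
                # i is 0-based, positions are 1-based, so the last covered
                # word sits at position i + n.
                tokens.append(("".join(parts), i + n))
    return tokens


words = "the peoples republic of china is".split()
phrases = ["peoples republic", "peoples republic of china"]
proposed = autophrase_keep_originals(words, phrases)

# Discontinuous numbering as described in the thread, with n = 1.
gapped = [("republic", 1), ("of", 4), ("china", 8)]

print(phrase_matches(proposed, "republic of china"))
print(phrase_matches(gapped, "republic of china"))
print(phrase_matches(proposed, "peoples peoplesrepublic"))
```

Under this model the proposed numbering lets "republic of china" match while the gapped numbering does not, and it also admits the "peoples peoplesrepublic" match that the original post calls spurious, since peoplesrepublic lands on position 3 right after peoples at position 2.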