Re: term frequency with stemming

2015-07-27 Thread Aki Balogh
Hi Alessandro, I'm counting word frequencies on a site. All I want to do is count "running" and "run" as the same topic. I don't believe that is really fuzzy matching -- i.e. I wouldn't want to match "running" and "sprinting". I think stemming is all I need, and it seems to work fine now. TY, A

Re: term frequency with stemming

2015-07-27 Thread Alessandro Benedetti
Apart from the funny "crypted" message by Darin xD, I would like to focus on the initial user requirement: "get term frequencies with fuzzy matching". Solr/Lucene support fuzzy queries independently of the way you token-filter your terms at analysis time. You can run fuzzy queries with

Re: term frequency with stemming

2015-07-25 Thread Aki Balogh
I believe I found a solution: use a third-party stemmer to stem the term first, then pass it to termfreq. The only trick is, each term in a phrase has to be stemmed separately (i.e. "end-user experience" has to be broken down into "end-user" -> "end-us" and "experience" -> "experi") before being p
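A minimal sketch of the approach described above, in Python: split the phrase into terms, stem each term separately, and wrap each stem in Solr's termfreq(field,term) function query. The field name "text", the helper names, and the lookup-table stemmer are assumptions made for illustration; the table only holds the two stems quoted in the message, and a real setup would substitute whatever third-party Porter stemmer is in use.

```python
from typing import Callable, List


def termfreq_functions(field: str, phrase: str,
                       stem: Callable[[str], str]) -> List[str]:
    """Stem each whitespace-separated term, then wrap it in termfreq()."""
    stems = [stem(token.lower()) for token in phrase.split()]
    return ["termfreq({},'{}')".format(field, s) for s in stems]


# Stand-in for a real stemmer (assumption): it only knows the two
# examples quoted in the message and returns other tokens unchanged.
EXAMPLE_STEMS = {"end-user": "end-us", "experience": "experi"}


def lookup_stem(token: str) -> str:
    return EXAMPLE_STEMS.get(token, token)


# "text" is a hypothetical field name; use the stemmed field in your schema.
fl_parts = termfreq_functions("text", "end-user experience", lookup_stem)
print(",".join(fl_parts))
```

The joined string could then be appended to the fl parameter of a query, so that each document comes back with one frequency per stemmed term.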

Re: term frequency with stemming

2015-07-24 Thread Darin Amos
Hi Dale, I would think the coffee shop is better, I have in-laws visiting at home. Thanks Darin > On Jul 24, 2015, at 12:04 PM, Aki Balogh wrote: > > Hi All, > > I'm using TermVectorComponent and stemming (Porter) in order to get term > frequencies with fuzzy matching. I'm stemming at index

term frequency with stemming

2015-07-24 Thread Aki Balogh
Hi All, I'm using TermVectorComponent and stemming (Porter) in order to get term frequencies with fuzzy matching. I'm stemming at index and query time. Is there a way to get term frequency from the index?
* termfreq doesn't support stemming or wildcards
* terms component doesn't allow additional