WordDelimiterFilter combined with PositionFilter

2010-09-24 Thread Mathias Walter
Hi, I've combined the WordDelimiterFilter with the PositionFilter to prevent the creation of expensive PhraseQueries and MultiPhraseQueries. But if I now parse an escaped string consisting of two terms, the analyser returns a BooleanQuery. That's not what I would expect. If a string is escaped, I would
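
PositionFilter (with its default increment of 0) moves every token after the first onto the same position as the first, and, as far as I know, a Lucene 2.9/3.x-era QueryParser turns analyzed tokens that all share one position into a BooleanQuery of SHOULD clauses, quoted or not. The toy model below sketches that decision in plain Java. It does not use Lucene; the class and method names are hypothetical, and it leaves out details such as slop and enablePositionIncrements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a QueryParser choosing a query type. Not Lucene code. */
public class QueryTypeSketch {

    /** One analyzed token: its text and its position increment. */
    public static final class Token {
        public final String text;
        public final int positionIncrement;

        public Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Picks a query type from the tokens' position increments. */
    public static String chooseQueryType(List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "none";
        }
        if (tokens.size() == 1) {
            return "TermQuery";
        }
        int positionCount = 0;
        boolean severalTokensAtSamePosition = false;
        for (Token t : tokens) {
            if (t.positionIncrement != 0) {
                positionCount += t.positionIncrement;
            } else {
                severalTokensAtSamePosition = true;
            }
        }
        if (!severalTokensAtSamePosition) {
            return "PhraseQuery"; // every token on its own position
        }
        // Stacked tokens: if they all share one position, no phrase is left, so OR them.
        return positionCount == 1 ? "BooleanQuery" : "MultiPhraseQuery";
    }

    /** Mimics PositionFilter's default: every token after the first gets increment 0. */
    public static List<Token> applyPositionFilter(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            out.add(new Token(t.text, i == 0 ? t.positionIncrement : 0));
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. a word delimiter splitting "wi-fi" into two tokens on consecutive positions
        List<Token> split = List.of(new Token("wi", 1), new Token("fi", 1));
        System.out.println(chooseQueryType(split));                      // PhraseQuery
        System.out.println(chooseQueryType(applyPositionFilter(split))); // BooleanQuery
    }
}
```

In this model, the two-term input only becomes a phrase when the tokens keep distinct positions, which is exactly what PositionFilter removes.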

RE: WordDelimiterFilter combined with PositionFilter

2010-09-29 Thread Mathias Walter
Hi Robert, > On Fri, Sep 24, 2010 at 3:54 AM, Mathias Walter wrote: > > > Hi, > > > > I've combined the WordDelimiterFilter with the PositionFilter to prevent the > > creation of expensive Phrase and MultiPhraseQueries. But > > if I now parse an es

RE: Using Solr Analyzers in Lucene

2010-10-05 Thread Mathias Walter
Hi Max, why don't you use WordDelimiterFilterFactory directly? I'm doing the same thing inside my own analyzer: final Map<String, String> args = new HashMap<String, String>(); args.put("generateWordParts", "1"); args.put("generateNumberParts", "1"); args.put("catenateWords", "0"); args.put("catenateNumbers", "0"); args.put("ca
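
Spelled out with generics, the options map from the snippet might look like the sketch below. It only builds the map from the four options visible above (the rest of the original list is cut off and is not guessed here), and the class name is hypothetical. Passing the map to the factory's init(Map<String, String>) and then calling create(TokenStream) reflects Solr 1.4/3.x-era factories as an assumption, so that step is left as a comment rather than executed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds WordDelimiterFilterFactory options as strings. */
public class WordDelimiterArgs {

    public static Map<String, String> build() {
        // Solr factories read their options as strings; "1" enables an option, "0" disables it.
        Map<String, String> args = new HashMap<String, String>();
        args.put("generateWordParts", "1");   // "Wi-Fi"  -> "Wi", "Fi"
        args.put("generateNumberParts", "1"); // "500-42" -> "500", "42"
        args.put("catenateWords", "0");       // do not also emit "WiFi"
        args.put("catenateNumbers", "0");     // do not also emit "50042"
        return args;
    }

    public static void main(String[] unused) {
        Map<String, String> args = build();
        // Assumed next step (requires Solr, not run here):
        //   factory.init(args); TokenStream ts = factory.create(upstreamTokenStream);
        System.out.println(args.get("generateWordParts")); // 1
    }
}
```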

FieldCache

2010-10-21 Thread Mathias Walter
Hi, does a field that should be cached need to be indexed? I have a binary field which is just stored. Retrieving it via FieldCache.DEFAULT.getTerms returns empty BytesRefs. Then I found the following post: http://www.mail-archive.com/d...@lucene.apache.org/msg05403.html How can I use the Fi
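
FieldCache builds its per-document arrays by walking the indexed terms of a field (uninverting the term-to-documents postings), not by reading stored values, so a field that is only stored has nothing to uninvert. The toy model below illustrates that idea without Lucene; the names are hypothetical, a single-valued field is assumed, and null here stands in for the empty value the real cache would hand back.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of uninverting one field's postings into a per-document array. */
public class UninvertSketch {

    /** postings: term -> ids of the documents containing it (single-valued field assumed). */
    public static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc]; // null = no indexed term for this doc
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                valueByDoc[doc] = e.getKey();
            }
        }
        return valueByDoc;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> indexed = new HashMap<>();
        indexed.put("alpha", List.of(0, 2));
        indexed.put("beta", List.of(1));
        Map<String, List<Integer>> storedOnly = new HashMap<>(); // stored, never indexed: no postings
        System.out.println(Arrays.toString(uninvert(indexed, 3)));    // [alpha, beta, alpha]
        System.out.println(Arrays.toString(uninvert(storedOnly, 3))); // [null, null, null]
    }
}
```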

RE: FieldCache

2010-10-25 Thread Mathias Walter
ving > it is usually a rare enough operation that caching is irrelevant. > > This smells like an XY problem, see: > http://people.apache.org/~hossman/#xyproblem > > If this seems like gibberish, could you explain your problem > a little more? > > Best > Erick > >

RE: FieldCache

2010-10-25 Thread Mathias Walter
Hi, > On Mon, Oct 25, 2010 at 3:41 AM, Mathias Walter > wrote: > > I indexed about 90 million sentences and the PAS (predicate argument > structures) they consist of (about 500 million). Then > > I try to do NER (named entity recognition) by searching about 5 mi

IndexableBinaryStringTools (was FieldCache)

2010-11-02 Thread Mathias Walter
Hi, > > [...] I tried to use IndexableBinaryStringTools to re-encode my 11-byte > > array. The size was increased to 7 characters (= 14 bytes) > > which is still a gain of more than 50 percent compared to the UTF-8 > > encoding. BTW: I found no example of how to use the > > IndexableBinaryStringTools c
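
The 7-character figure is consistent with packing 15 bits into each char plus one trailing bookkeeping char: 11 bytes are 88 bits, ceil(88 / 15) = 6 data chars, plus 1. The codec below is a simplified, hypothetical sketch of that packing idea in plain Java. It is not Lucene's IndexableBinaryStringTools and its output is not compatible with it; here the trailing char simply stores the original byte length, which limits the sketch to 32767 bytes.

```java
import java.util.Arrays;

/** Hypothetical 15-bits-per-char codec; NOT Lucene's IndexableBinaryStringTools format. */
public class BinaryCharCodec {

    /** Chars needed: ceil(8 * n / 15) data chars plus one trailing length char. */
    public static int encodedLength(int byteLength) {
        return (int) ((8L * byteLength + 14L) / 15L) + 1;
    }

    public static String encode(byte[] input) {
        if (input.length > 0x7FFF) {
            throw new IllegalArgumentException("sketch supports at most 32767 bytes");
        }
        StringBuilder out = new StringBuilder(encodedLength(input.length));
        int buffer = 0; // pending bits, right-aligned
        int bits = 0;   // number of pending bits (always < 15 between bytes)
        for (byte b : input) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            if (bits >= 15) {
                bits -= 15;
                out.append((char) ((buffer >>> bits) & 0x7FFF)); // emit top 15 pending bits
                buffer &= (1 << bits) - 1;                        // keep only the leftover bits
            }
        }
        if (bits > 0) {
            out.append((char) ((buffer << (15 - bits)) & 0x7FFF)); // left-align leftover bits
        }
        out.append((char) input.length); // trailing length char; values stay below surrogates
        return out.toString();
    }

    public static byte[] decode(String encoded) {
        int length = encoded.charAt(encoded.length() - 1);
        byte[] out = new byte[length];
        int buffer = 0, bits = 0, pos = 0;
        for (int i = 0; i < encoded.length() - 1 && pos < length; i++) {
            buffer = (buffer << 15) | encoded.charAt(i);
            bits += 15;
            while (bits >= 8 && pos < length) { // padding bits in the last char are ignored
                bits -= 8;
                out[pos++] = (byte) (buffer >>> bits);
                buffer &= (1 << bits) - 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[11];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 23 - 100);
        }
        String encoded = encode(data);
        System.out.println(encoded.length());                       // 7
        System.out.println(Arrays.equals(data, decode(encoded)));   // true
    }
}
```

The chars are kept in 0..0x7FFF so that no surrogate code units appear, which is the main constraint any binary-to-char scheme for indexing has to respect.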