Next Word - Any Suggestions?

2010-10-26 Thread Christopher Ball
I am about to implement a custom query that is a sort of mash-up of Facets, Highlighting, and SpanQuery, but thought I'd see if anyone has done anything similar. In simple words, I need to facet on the next word given a target word. For example, if my index only had the following 5 documents (co
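The counting behind "facet on the next word" can be illustrated outside Solr. The sketch below is plain Java, not a Solr or Lucene API. It assumes the documents are available as strings and uses a naive lowercase tokenizer (a hypothetical stand-in for the field's analyzer). For each occurrence of the target word it counts the word that immediately follows, much as a facet counts field values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NextWordFacet {
    // Naive tokenizer (assumption): lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // For every occurrence of target, count the token that immediately follows it.
    static Map<String, Integer> nextWordCounts(List<String> docs, String target) {
        String t = target.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>(); // sorted by next word for stable output
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            for (int i = 0; i + 1 < tokens.size(); i++) {
                if (tokens.get(i).equals(t)) {
                    counts.merge(tokens.get(i + 1), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The green bird", "The green frog", "The red bird");
        System.out.println(nextWordCounts(docs, "the")); // {green=2, red=1}
    }
}
```

Against a real index, the following word would more likely come from stored term positions than from re-tokenizing stored text. The sketch only shows the counting logic.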

RE: Index an entire Phrase and not its constituent parts?

2010-03-13 Thread Christopher Ball
Thank you for the idea, Mitch, but it just doesn't seem right that I should have to resort to scoring when what I really need seems so fundamental. Logically, what I want is a "phrase filter factory" that would match on phrases listed in a file, like stopwords, but in this case index the match and
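The "phrase filter factory" described here can be sketched as a greedy longest match over the token stream. This is a plain-Java illustration, not a Lucene TokenFilter. The phrase set stands in for the hypothetical phrases file. Each matched phrase is emitted as one token joined with a space, and its constituent words are not emitted separately. The `keepUnmatched` flag is a hypothetical choice for words outside any phrase, since the message above is cut off before saying what should happen to them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class PhraseJoiner {
    // Greedy longest match of known multi-word phrases (space-separated) over a token list.
    // keepUnmatched (hypothetical): whether tokens outside any phrase are passed through.
    static List<String> joinPhrases(List<String> tokens, Set<String> phrases, boolean keepUnmatched) {
        int maxLen = 1;
        for (String p : phrases) {
            maxLen = Math.max(maxLen, p.split(" ").length);
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate first, so "as much as" wins over a shorter listed phrase.
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 2; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (phrases.contains(candidate)) {
                    out.add(candidate);
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                i += matched; // skip the constituent words of the matched phrase
            } else {
                if (keepUnmatched) {
                    out.add(tokens.get(i));
                }
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("i", "ate", "as", "much", "as", "possible");
        Set<String> phrases = Set.of("as much as");
        System.out.println(joinPhrases(tokens, phrases, true));  // [i, ate, as much as, possible]
        System.out.println(joinPhrases(tokens, phrases, false)); // [as much as]
    }
}
```

Greedy longest match is only one policy. Overlapping listed phrases (for example "as much" and "much as") would need an explicit rule for which one wins.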

RE: Index an entire Phrase and not its constituent parts?

2010-03-13 Thread Christopher Ball
... without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See also: http://www.perlmonks.org/index.pl?node_id=542341 Erick. On Tue, Mar 9, 2010 at 6:

RE: Index an entire Phrase and not its constituent parts?

2010-03-09 Thread Christopher Ball
... HTH: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Erick. On Thu, Mar 4, 2010 at 2:31 PM, Christopher Ball <christopher.b...@metaheuristica.com> wrote: > How can I index an entire phrase and not its constituent parts?

Count Sum of Term Occurrences?

2010-03-04 Thread Christopher Ball
How can I count the total number of occurrences of a specific term? That is, how can you get the total number of occurrences of a term across all documents (e.g. the sum of the number of occurrences of a specific term in each doc)? For example, I have 3 documents, document #1 has "The green bird is flyin
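The question distinguishes document frequency (how many documents contain the term) from the total number of occurrences summed over all documents. A minimal plain-Java sketch of both quantities, assuming naive lowercase whitespace tokenization and a lowercase query term. It does not use any Lucene or Solr API.

```java
import java.util.List;
import java.util.Locale;

class TermOccurrences {
    // Occurrences of term in one document (naive whitespace tokenizer; term assumed lowercase).
    static int termFreq(String doc, String term) {
        int n = 0;
        for (String tok : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.equals(term)) {
                n++;
            }
        }
        return n;
    }

    // Sum of the per-document counts: the "total term frequency".
    static long totalTermFreq(List<String> docs, String term) {
        long total = 0;
        for (String d : docs) {
            total += termFreq(d, term);
        }
        return total;
    }

    // Number of documents containing the term at least once: the "document frequency".
    static int docFreq(List<String> docs, String term) {
        int df = 0;
        for (String d : docs) {
            if (termFreq(d, term) > 0) {
                df++;
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "the green bird is flying",
                "the bird sees the other bird",
                "no birds here");
        System.out.println("df(bird)  = " + docFreq(docs, "bird"));       // 2
        System.out.println("ttf(bird) = " + totalTermFreq(docs, "bird")); // 3
    }
}
```

In a real index the per-document counts would come from the postings (stored term frequencies), so nothing would need to be re-tokenized.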

Index an entire Phrase and not its constituent parts?

2010-03-04 Thread Christopher Ball
How can I index an entire phrase and not its constituent parts? I want to index collocations as a single term in the index, and not as the multiple terms that comprise the phrase. For example, I want to index "as much as" but not the independent parts "as", "much", "as". Any guidance appr

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Christopher Ball
I think I am making some progress: the key suggestion was to look at analysis.jsp, which I had foolishly forgotten =(. I think it is actually a bug in the ShingleFilterFactory when it is used after another filter that removes tokens, e.g. StopFilterFactory or WordDelimiterFilterFactory.
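One way an underscore can appear without being in the input: a filter such as a stop filter removes a token and leaves a gap in token positions, and a shingler that fills such gaps with a placeholder then emits that placeholder inside shingles. The toy model below assumes the placeholder is "_" and builds only bigrams. It is plain Java imitating the behavior described in this thread, not the actual ShingleFilter code.

```java
import java.util.ArrayList;
import java.util.List;

class FillerShingles {
    static final String FILLER = "_"; // assumed placeholder for a removed position

    // A token from an upstream filter; posInc > 1 means earlier tokens were removed.
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Expand each position gap into FILLER entries, then emit adjacent-pair (bigram) shingles.
    static List<String> bigramShingles(List<Token> tokens) {
        List<String> positions = new ArrayList<>();
        for (Token t : tokens) {
            for (int missing = 1; missing < t.posInc; missing++) {
                positions.add(FILLER);
            }
            positions.add(t.term);
        }
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            shingles.add(positions.get(i) + " " + positions.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        // "table of contents" after a stop filter dropped "of": "contents" arrives with posInc 2.
        List<Token> tokens = List.of(new Token("table", 1), new Token("contents", 2));
        System.out.println(bigramShingles(tokens)); // [table _, _ contents]
    }
}
```

Under this model, no char filter or stopword list applied before shingling can remove the underscore, because the shingling step itself creates it.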

RE: The Riddle of the Underscore and the Dollar Sign . . .

2010-02-11 Thread Christopher Ball
Unfortunately, the underscore is being quite resilient =( I tried the solr.MappingCharFilterFactory, and I know the mapping is working, since "c" => "q" is applied just fine. But the underscore refuses to go! I am baffled . . . -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.co

The Riddle of the Underscore and the Dollar Sign . . .

2010-02-10 Thread Christopher Ball
I am perplexed by the behavior of the Solr analyzers and filters with regard to underscores. I am trying to get rid of underscores ('_') when shingling, but seem unable to do so with a stopwords filter. And yet, without my even trying, underscores are being removed by the WordDelimi

The Riddle of the Underscore and the Dollar Sign

2010-02-03 Thread Christopher Ball
I am perplexed by the behavior of the Solr analyzers and filters with regard to underscores. 1) I am trying to get rid of them when shingling, but seem unable to do so with a stopwords filter. And yet, without my even trying, they are being removed by the WordDelimiter Filter

Querying for multi-term phrases only . . .

2010-01-28 Thread Christopher Ball
I am curious how I can query for multi-term phrases using the TermsComponent. The field I am searching has been shingled, so it contains 2- and 3-word phrases. For example, in the sample results below I want to get back only multi-word phrases such as "table of contents" and "under the" but no
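Since the shingled field holds single words alongside 2- and 3-word shingles, one client-side option is to keep only terms that contain the shingle separator. A minimal plain-Java sketch, assuming the separator is a single space (as in "table of contents" and "under the") and that the term list has already been extracted from the TermsComponent response. Fetching and parsing that response is not shown.

```java
import java.util.List;
import java.util.stream.Collectors;

class MultiWordTerms {
    // Keep only terms made of at least minWords space-separated words (separator assumed " ").
    static List<String> multiWordOnly(List<String> terms, int minWords) {
        return terms.stream()
                .filter(t -> t.trim().split(" +").length >= minWords)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("table", "table of contents", "under", "under the", "the");
        System.out.println(multiWordOnly(terms, 2)); // [table of contents, under the]
    }
}
```

Doing the filtering on the server, for example with a regular-expression restriction on terms, would avoid transferring single-word terms. Whether the TermsComponent version in use supports that should be checked rather than assumed.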

RE: How to Implement SpanQuery in Solr . . ?

2010-01-28 Thread Christopher Ball
query must be span queries, and most query parsers generate non-span queries. I think there is code in the highlighter that uses spans that can do this conversion. -Yonik http://www.lucidimagination.com On Wed, Jan 27, 2010 at 12:24 PM, Christopher Ball wrote: > I am about to at

How to Implement SpanQuery in Solr . . ?

2010-01-27 Thread Christopher Ball
I am about to attempt to implement SpanQuery in Solr 1.4. I noticed there is a JIRA issue to add it in 1.5: * https://issues.apache.org/jira/browse/SOLR-1337 I also noticed a couple of email threads from Grant and Yonik about trying to implement it, such as: * http://
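What an ordered "span near" match computes can be modeled on position lists: for each position of the first term, look for a later position of the second term with at most `slop` positions in between. This is a simplified two-term model in plain Java, not Lucene's SpanNearQuery implementation. The position lists are assumed to be sorted and to come from a single document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrderedSpanNear {
    // Simplified two-term, ordered "span near"; both position arrays must be sorted ascending.
    // A span [p1, p2 + 1) matches when the second term follows the first with at most
    // `slop` positions in between. Only the nearest match per start position is kept.
    static List<int[]> spans(int[] firstPositions, int[] secondPositions, int slop) {
        List<int[]> out = new ArrayList<>();
        for (int p1 : firstPositions) {
            for (int p2 : secondPositions) {
                int between = p2 - p1 - 1; // positions strictly between the two terms
                if (between >= 0 && between <= slop) {
                    out.add(new int[] {p1, p2 + 1});
                    break;
                }
                if (between > slop) {
                    break; // sorted input: later positions are even farther away
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "the green bird is a bird": green at 1, bird at 2 and 5
        for (int[] s : spans(new int[] {1}, new int[] {2, 5}, 0)) {
            System.out.println(Arrays.toString(s)); // [1, 3]
        }
    }
}
```

The model hints at why, as Yonik's reply notes, span queries combine only with other span queries: they operate on position spans, whereas ordinary queries report only which documents match.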

Re: Using IDF to find Collocations and SIPs . . ?

2010-01-05 Thread Christopher Ball
Hoss, thanks for your reply. As you pointed out, the TermsComponent alone, with terms.maxcount, did the trick for single terms, and the ShingleFilter did the trick for phrases. I have not ventured into Hadoop just yet; are there any examples of simple map/reduce jobs you could point me to?
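The shape of a simple map/reduce job for rare word pairs can be shown without Hadoop. The map step emits each adjacent word pair of a document. The reduce step counts, per pair, the number of documents containing it, then keeps pairs at or below a maximum. This is analogous to terms.maxcount, which applies to per-term document counts. The code uses plain Java streams, not the Hadoop API. The whitespace tokenizer and the threshold value are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BigramMapReduce {
    // "Map": emit each adjacent word pair of one document (naive whitespace tokenizer).
    static List<String> mapBigrams(String doc) {
        String[] w = doc.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < w.length; i++) {
            pairs.add(w[i] + " " + w[i + 1]);
        }
        return pairs;
    }

    // "Reduce": count documents containing each pair (distinct per document),
    // then keep only pairs whose document count is at most maxCount.
    static Map<String, Long> rareBigrams(List<String> docs, long maxCount) {
        Map<String, Long> counts = docs.stream()
                .flatMap(d -> mapBigrams(d).stream().distinct())
                .collect(Collectors.groupingBy(p -> p, TreeMap::new, Collectors.counting()));
        counts.values().removeIf(c -> c > maxCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat sat on the mat", "the cat ran");
        System.out.println(rareBigrams(docs, 1));
        // {cat ran=1, cat sat=1, on the=1, sat on=1, the mat=1}
    }
}
```

In an actual Hadoop job the same two functions would run as a Mapper and a Reducer over many machines. The logic per record stays the same.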

RE: Listing Terms by Ascending IDF value . . ?

2010-01-05 Thread Christopher Ball
Re: Listing Terms by Ascending IDF value . . ? On Tue, Jan 5, 2010 at 9:15 AM, Christopher Ball <christopher.b...@metaheuristica.com> wrote: > Hello, I am trying to get a list of highly unusual terms or phrases (for example a TF of 1 or 2) within an entire index (essenti

Listing Terms by Ascending IDF value . . ?

2010-01-04 Thread Christopher Ball
Hello, I am trying to get a list of highly unusual terms or phrases (for example a TF of 1 or 2) within an entire index (essentially this would be the inverse of how Luke gives 'top terms' on the 'Overview' tab). I see how I can do this within a specific query using the Term Vector Componen
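Listing the rarest terms first amounts to sorting by document frequency ascending, or equivalently by IDF descending. This is the inverse of a "top terms" view. A minimal plain-Java sketch, assuming per-term document frequencies have already been gathered, for example from a TermsComponent response. The IDF formula log(N / df) is the textbook form, chosen for illustration, and is not claimed to be the exact formula Lucene's scoring uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RareTerms {
    // Textbook IDF, log(N / df); real scoring formulas may differ (assumption).
    static double idf(long numDocs, long docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // Terms whose document frequency is at most maxDf, rarest first; ties sorted alphabetically.
    static List<String> rarestFirst(Map<String, Long> docFreqs, long maxDf) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Long> e : docFreqs.entrySet()) {
            if (e.getValue() <= maxDf) {
                terms.add(e.getKey());
            }
        }
        terms.sort((a, b) -> {
            int byDf = Long.compare(docFreqs.get(a), docFreqs.get(b));
            return byDf != 0 ? byDf : a.compareTo(b);
        });
        return terms;
    }

    public static void main(String[] args) {
        long numDocs = 100; // hypothetical index size
        Map<String, Long> docFreqs = new TreeMap<>();
        docFreqs.put("the", 95L);
        docFreqs.put("bird", 10L);
        docFreqs.put("quokka", 2L);
        docFreqs.put("zephyr", 1L);
        docFreqs.put("aardvark", 1L);
        for (String t : rarestFirst(docFreqs, 2)) {
            System.out.printf("%s df=%d idf=%.3f%n", t, docFreqs.get(t), idf(numDocs, docFreqs.get(t)));
        }
    }
}
```

Because IDF decreases monotonically with document frequency for a fixed N, sorting by df ascending and sorting by IDF descending give the same order. The IDF values are only for display.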