: currently the solr query language is not enough for our needs.
: I understand it is possible to add our own customized query parse to the
: system, but I was wondering if anybody have done that and if there is any
: idea to share how and from where to start.

In Solr 1.2, you'd need to write your own custom RequestHandler and put
whatever parsing logic you want in it (but you'd need to cut/paste
everything else from one of the existing RequestHandlers to do anything
useful once you parse the query).

With the solr trunk, you can just write a QParserPlugin and configure it in
solrconfig.xml. QParserPlugin isn't very well documented yet, but you can
start by looking at the LuceneQParserPlugin as an example.

As for the type of query you are describing...

: paragraphs proximity i.e. (termsgroup1) near/n (termgroup2) termsgroup1
: n paragraph apart from termgroup2
: finding terms for number of times i.e. atleast/n abcd in text abcd
: should show up atleast n times

...near/n can be implemented using Lucene's SpanNearQuery (where your
termgroups are just SpanOrQueries).

atleast/n should be doable using SpanNotQuery, which makes sure that two
span queries don't "overlap" ... so if you make n SpanTermQuery objects for
abcd, you can nest them all in a big chain of SpanNotQueries. (Although
there's probably a much more efficient way to accomplish the same thing if
you wrote your own custom subclass of TermQuery and made it only match if
the term freq is at least n.)

-Hoss
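
To pin down what the two operators discussed above are expected to match, here is a minimal plain-Java sketch of their semantics over an in-memory token list. It is not Lucene or Solr code and calls none of their APIs. The class name QuerySemanticsSketch and the methods atLeast and near are hypothetical, made up for illustration. It also assumes that distance is measured in token positions; for paragraph proximity the same check would apply to paragraph indices instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of atleast/n and near/n; not Lucene code. */
final class QuerySemanticsSketch {

    /** atleast/n: true when term occurs at least n times in tokens. */
    static boolean atLeast(List<String> tokens, String term, int n) {
        int freq = 0;
        for (String t : tokens) {
            if (t.equals(term)) {
                freq++;
            }
        }
        return freq >= n;
    }

    /**
     * near/n: true when some term from group1 and some term from group2
     * occur at different positions that are at most n positions apart,
     * in either order. (Assumption: distance is counted in token positions.)
     */
    static boolean near(List<String> tokens, Set<String> group1,
                        Set<String> group2, int n) {
        List<Integer> pos1 = positionsOf(tokens, group1);
        List<Integer> pos2 = positionsOf(tokens, group2);
        for (int p : pos1) {
            for (int q : pos2) {
                if (p != q && Math.abs(p - q) <= n) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Collects the positions of every token that belongs to group. */
    private static List<Integer> positionsOf(List<String> tokens, Set<String> group) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (group.contains(tokens.get(i))) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

A reference check like this could serve as a test oracle when comparing the results of a real span-query or custom-TermQuery implementation on small documents.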