Re: Improving performance for use-case where large (200) number of phrase queries are used?

Robert Muir Wed, 24 Oct 2012 09:27:28 -0700

On Wed, Oct 24, 2012 at 11:09 AM, Aaron Daubman <daub...@gmail.com> wrote:
> Greetings,
>
> We have a solr instance in use that gets some perhaps atypical queries
> and suffers from poor (>2 second) QTimes.
>
> Documents (~2,350,000) in this instance are mainly comprised of
> various "descriptive fields", such as multi-word (phrase) tags - an
> average document contains 200-400 phrases like this across several
> different multi-valued field types.
>
> A custom QueryComponent has been built that functions somewhat like a
> very specific MoreLikeThis. A seed document is specified via the
> incoming query, its terms are retrieved, boosted both by query
> parameters as well as fields within the document that specify term
> weighting, sorted by this custom boosting, and then a second query is
> crafted by taking the top 200 (sorted by the custom boosting)
> resulting field values paired with their fields and searching for
> documents matching these 200 values.


a few more ideas:
* use shingles e.g. to turn two-word phrases into single terms (how
long is your average phrase?).
* in addition to the above, for phrases with > 2 terms, consider
just a boolean conjunction of the shingled phrases instead of a "real"
phrase query: e.g. "more like this" -> (more_like AND like_this). This
would have some false positives, since a document can contain both
shingles without containing the full phrase.
* use a more aggressive stopwords list for your "MorePhrasesLikeThis".
* reduce the number of phrases from 200, and instead work harder to
pick out which phrases are the "most descriptive" in the seed document,
e.g. based on heuristics like their frequency or location within that
seed document, so your query isn't so massive.
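
To make the first two ideas concrete, here is a rough sketch in plain
Python (not Lucene/Solr code) of how a phrase could be rewritten into
two-word shingles and then into a boolean conjunction. The function
names, the "tags" field name, and the "_" separator are assumptions for
illustration only; the separator just mirrors the more_like / like_this
example above. It also assumes the field is indexed with a matching
shingle analysis so the same tokens exist in the index.

```python
def shingles(phrase, size=2, sep="_"):
    """Return the word n-grams ("shingles") of a phrase, joined by sep.

    A phrase shorter than `size` words comes back as a single token.
    """
    words = phrase.lower().split()
    if not words:
        return []
    if len(words) < size:
        return [sep.join(words)]
    return [sep.join(words[i:i + size]) for i in range(len(words) - size + 1)]


def conjunction_query(field, phrase):
    """Build a query string that ANDs the shingles of a phrase.

    A two-word phrase becomes one term query; a longer phrase becomes a
    conjunction of its shingles, which approximates a phrase query.
    """
    terms = shingles(phrase)
    if not terms:
        return ""
    if len(terms) == 1:
        return f"{field}:{terms[0]}"
    return f"{field}:(" + " AND ".join(terms) + ")"


if __name__ == "__main__":
    # "tags" is a hypothetical field name.
    print(conjunction_query("tags", "hard rock"))
    print(conjunction_query("tags", "more like this"))
```

With this rewriting, a two-word phrase costs a single term lookup and a
longer phrase costs a conjunction of term lookups, with no position
checks. The trade-off is the one noted above: a document containing
"more like" and "like this" in separate places would still match.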
