Re: access matched token ids in the FacetComponent?

Mikhail Khludnev Tue, 22 Jan 2013 11:18:37 -0800

Dmitry,

Solr faceting is really fast because it works in memory (with a few
notable exceptions), so a span-based approach should be slower. Reading term
positions/payloads always has a noticeable cost. You can estimate it by
comparing the time for a phrase query "foo bar" with that of a plain
conjunction +foo +bar.
It is worth mentioning that our SpansFacetComponent performed well enough,
even for a public site. You can find my comment about the performance numbers:
"64K docs with 5-20 span positions each. Search result length 100-2000
docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter
box."
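
To make the phrase-vs-conjunction comparison concrete, here is a minimal
sketch over a toy in-memory positional index. It is not Lucene code, and
build_index, conjunction and phrase are hypothetical names used only for
illustration. The point is that the conjunction touches doc ids only, while
the phrase query also has to read positions for every candidate doc.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [positions]} for whitespace-tokenized docs."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def conjunction(index, a, b):
    """+a +b: intersect the doc id sets; positions are never read."""
    return sorted(index.get(a, {}).keys() & index.get(b, {}).keys())


def phrase(index, a, b):
    """Phrase 'a b': same intersection, plus a position scan per candidate."""
    hits = []
    for doc_id in conjunction(index, a, b):
        b_positions = set(index[b][doc_id])
        # The extra work: b must occur right after some occurrence of a.
        if any(p + 1 in b_positions for p in index[a][doc_id]):
            hits.append(doc_id)
    return hits


docs = ["foo bar baz", "bar foo", "foo qux bar", "nothing here"]
index = build_index(docs)
print(conjunction(index, "foo", "bar"))
print(phrase(index, "foo", "bar"))
```

Timing the two functions over a large index gives a rough feel for the
overhead of reading positions, which is the cost a span-based facet
component pays on every matched doc.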



On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Mikhail,
>
> Thanks for the guidance! This indeed sounds challenging, especially given the
> bonus of fighting with Solr 3.x over disjunction queries. That said,
> moving to Solr 4.0 should be OK if it makes life easier.
>
> But even before getting one's hands dirty, it would be good to know whether
> this is going to fly performance-wise. Has your span-based implementation
> been fast enough? Did it come close to native Solr faceting in terms
> of performance?
>
> On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Dmitry,
> >
> > First of all, FacetComponent is Solr's out-of-the-box functionality. It
> > runs after the search is done and accesses the bitSet of the found documents,
> > i.e. there are no spans (matched term positions) there at all.
> >
> > StandardFacetsAccumulator sounds like the "brand new" Lucene faceting
> > library; see http://shaierera.blogspot.com/. I don't think spans are
> > accessible there either, but I don't know for sure.
> >
> > Some time ago my team successfully prototyped a facet component backed by
> > spans:
> > blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
> > but I don't suggest you go this way.
> > I can suggest you start from the following:
> > - supply a PostFilter/DelegatingCollector
> > http://yonik.com/posts/advanced-filter-caching-in-solr/
> > - the DelegatingCollector will accept the scorer instance
> > - if this scorer is a BooleanScorer2 (but not BooleanScorer!), you can access
> > the SpanQueryScorer in one of the legs and try to access the matched spans
> > - if you are on 3.x you'll have a problem with disjunction queries.
> >
> > It seems challenging, doesn't it?
> >
> > On 18.01.2013 at 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> >
> > > Mikhail,
> > >
> > > Are you saying that it is not possible to access the matched term positions
> > > in the FacetComponent? If that were possible (somewhere in the
> > > StandardFacetsAccumulator class, where docids are available), then by
> > > knowing the matched term positions I could do some simple school math to
> > > calculate the sentence counts per doc id.
> > >
> > > Dmitry
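
The "simple school math" mentioned above could look roughly like the sketch
below. It assumes that, for each doc, the token positions where sentences
begin are already known; the thread does not say how they would be obtained,
so sentence_starts, hits_by_doc and starts_by_doc are hypothetical inputs
used only for illustration.

```python
from bisect import bisect_right


def sentences_with_hits(sentence_starts, matched_positions):
    """Count distinct sentences that contain at least one matched position.

    sentence_starts: sorted token positions where each sentence begins,
    starting with 0 (assumed input, e.g. [0, 5, 12]).
    matched_positions: token positions of the matched terms in the same doc.
    """
    # bisect_right(...) - 1 is the index of the sentence containing position p.
    return len({bisect_right(sentence_starts, p) - 1 for p in matched_positions})


def sentence_counts(hits_by_doc, starts_by_doc):
    """Map doc id -> number of sentences with at least one hit."""
    return {
        doc_id: sentences_with_hits(starts_by_doc[doc_id], positions)
        for doc_id, positions in hits_by_doc.items()
    }


starts_by_doc = {1: [0, 5, 12], 2: [0, 4]}
hits_by_doc = {1: [1, 3, 13], 2: [6]}
print(sentence_counts(hits_by_doc, starts_by_doc))
```

Two hits in the same sentence count once, which matches the requirement of
counting sentences rather than term occurrences.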
> > >
> > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > Dmitry,
> > > >
> > > > It definitely seems like postprocessing the highlighter's output. Another
> > > > approach is:
> > > > - limit the number of occurrences of a word in a sentence to 1
> > > > - play with the facet-by-function patch
> > > > https://issues.apache.org/jira/browse/SOLR-1581 accompanied by the tf()
> > > > function.
> > > >
> > > > It doesn't seem like much help.
> > > >
> > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> > > >
> > > > > that we actually require the count of the sentences inside
> > > > > each document where the hits were found.
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > Principal Engineer,
> > > > Grid Dynamics
> > > >
> > > > <http://www.griddynamics.com>
> > > >  <mkhlud...@griddynamics.com>
> > > >
> > >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
