Dmitry, Solr's native faceting is really fast because it works in memory (with a few notable exceptions), so a span-based approach will be slower. Reading term positions/payloads always carries a noticeable cost. You can estimate it by comparing the time for a phrase query "foo bar" against the plain conjunction +foo +bar. It is worth mentioning that our SpansFacetComponent performed well enough, even for a public site. You can find my comment about the performance numbers: "64K docs with 5-20 span positions each. Search result length 100-2000 docs with 3-5 facet fields. It shows 100 q/sec on an average datacenter box."
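A rough sketch of that comparison, in case it helps. It is only an illustration: the endpoint URL, the field name "text", and the helper names are assumptions of mine, not something from our code. It times the phrase query and the conjunction with rows=0, so mostly matching is measured. Note that Solr's queryResultCache will serve repeated identical queries from cache, so disable it or vary the terms for a fair measurement.

```python
import statistics
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and field name; adjust both to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"
FIELD = "text"


def build_url(query, base=SOLR_SELECT):
    """Build a select URL that returns no rows, so fetching docs is not timed."""
    return base + "?" + urlencode({"q": query, "rows": 0, "wt": "json"})


def http_fetch(url):
    """Default fetcher: plain HTTP GET, response body is read and returned."""
    with urlopen(url) as resp:
        return resp.read()


def median_seconds(query, fetch=http_fetch, repeats=20):
    """Median wall-clock time over `repeats` runs of a single query."""
    url = build_url(query)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(fetch=http_fetch, repeats=20):
    """Time a phrase query (reads positions) against a plain conjunction."""
    phrase = median_seconds(f'{FIELD}:"foo bar"', fetch, repeats)
    conjunction = median_seconds(f"+{FIELD}:foo +{FIELD}:bar", fetch, repeats)
    ratio = phrase / conjunction if conjunction else float("inf")
    return {"phrase": phrase, "conjunction": conjunction, "ratio": ratio}
```

Calling compare() against a running instance returns the two medians and their ratio; the ratio gives a first idea of what reading positions costs on your index.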
On Mon, Jan 21, 2013 at 5:23 PM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Mikhail,
>
> Thanks for the guidance! This indeed sounds challenging, esp. given the bonus of fighting with solr 3.x in light of disjunction queries. Although, moving to solr 4.0 if this makes life easier should be ok.
>
> But even before getting one's hands dirty, it would be good to know, if this is going to fly performance wise. Has your span based implementation been fast enough? Did it stand close to the native solr's faceting in terms of performance?
>
> On Mon, Jan 21, 2013 at 2:33 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>
> > Dmitry,
> >
> > First of all, FacetComponent is the Solr's out-of-the-box functionality. It runs after search is done and accesses the bitSet of the found document, i.e. there is no spans (matched terms positions) there at all.
> >
> > StandardFacetsAccumulator sounds like the "brand new" lucene faceting library. see http://shaierera.blogspot.com/. I don't think but don't exactly know whether they are accessible there too.
> >
> > Some time ago my team successfully prototyped facet component backed on spans
> > blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
> > but I don't suggest you go this way.
> > I can suggest you start from the following:
> > - supply PostFilter/DelegatingCollector http://yonik.com/posts/advanced-filter-caching-in-solr/
> > - the DelegatingCollector will accept the scorer instance
> > - if this scorer is BooleanScorer2 (but not BooleanScorer!), you can access the SpanQueryScorer in one of the legs and try to access the matched spans
> > - if you are in 3.x you'll have a problem with disjunction queries.
> >
> > it seems challenging, doesn't it?
> >
> > On 18.01.2013 17:40, "Dmitry Kan" <solrexp...@gmail.com> wrote:
> >
> > > Mikhail,
> > >
> > > Do you say that it is not possible to access the matched term positions in the FacetComponent? If that would be possible (somewhere in the StandardFacetsAccumulator class, where docids are available), then by knowing the matched term positions I can do some simple school math to calculate the sentence counts per doc id.
> > >
> > > Dmitry
> > >
> > > On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> > >
> > > > Dmitry,
> > > >
> > > > It definitely seems like postprocessing the highlighter's output. Another approach is:
> > > > - limit number of occurrences of a word in a sentence to 1
> > > > - play with the facet by function patch https://issues.apache.org/jira/browse/SOLR-1581 accompanied by the tf() function.
> > > >
> > > > It doesn't seem like much help.
> > > >
> > > > On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> > > >
> > > > > that we actually require the count of the sentences inside
> > > > > each document where the hits were found.
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > Principal Engineer,
> > > > Grid Dynamics
> > > >
> > > > <http://www.griddynamics.com>
> > > > <mkhlud...@griddynamics.com>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>