edismax - ignore/override qf field?

2013-12-17 Thread ses
With the edismax query handler, is it possible to provide a query parameter
that names a field (already defined in the qf parameter in
solrconfig.xml) to ignore? In other words, can you tell Solr not to search a
specific field?

An example may be where in solrconfig.xml qf is configured to search: 
title, author, description, fulltext

In some circumstances, such as a basic search, you may want to ignore the
field 'fulltext', or perhaps several fields, such as both 'description' and
'fulltext'.

In the situation I'm thinking of there are many more qf fields than this
simple example, so defining a new query handler or specifying all the fields
to search at query time would be a less than ideal solution.
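For illustration only, here is a minimal client-side sketch of the "specify
the fields at query time" fallback mentioned above. It assumes the application
keeps its own copy of the default qf string; the helper name qf_without and
the ^2 boost are hypothetical and not part of Solr. It relies on a qf sent
with the request overriding a qf configured under the handler's defaults.

```python
def qf_without(default_qf, ignored):
    """Return a qf string with the ignored fields removed, keeping boosts."""
    kept = []
    for entry in default_qf.split():
        # Each qf entry is either "field" or "field^boost".
        field = entry.split("^", 1)[0]
        if field not in ignored:
            kept.append(entry)
    return " ".join(kept)


# Hypothetical copy of the handler's default qf (the boost is made up).
DEFAULT_QF = "title^2 author description fulltext"

# Request parameters for a "basic" search that skips two fields.
basic_params = {
    "q": "knights of arabia",
    "defType": "edismax",
    "qf": qf_without(DEFAULT_QF, {"description", "fulltext"}),
}
```

The drawback described above still applies: the field list has to be
maintained outside solrconfig.xml and sent with each request.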



--
View this message in context: 
http://lucene.472066.n3.nabble.com/edismax-ignore-override-qf-field-tp4107058.html
Sent from the Solr - User mailing list archive at Nabble.com.


PostingsHighlighter returning fields which don't match

2013-08-14 Thread ses
We are trying out the new PostingsHighlighter with Solr 4.2.1 and finding
that the highlighting section of the response includes self-closing tags for
all the fields in hl.fl (by default for edismax it is all fields in qf),
even where there are no highlighting matches. In contrast, the same query on
Solr 4.0.0 without PostingsHighlighter returns only the fields containing
highlighting matches.

Here is a simplified example of the highlighting response for a document
with no matches in the fields specified by hl.fl.
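with PostingsHighlighter:

  ...
  [XML markup lost in the archive: the document's highlighting entry
  contains one self-closing tag per field in hl.fl]
  ...

without PostingsHighlighter:

  ...
  [XML markup lost in the archive: the document's highlighting entry
  contains no field tags]
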

This is a big problem for us because we have a large number of fields in a
dynamic field, and we believe every highlighted response includes a very
large number of self-closing tags, which bloats the response to an
unreasonable size (in some cases 100MB+).

We have tried using hl.requireFieldMatch=true but this seems to make no
difference.

Is there anything we can specify in the query (or solrconfig) to avoid
returning these empty tags? Or could this be a known bug?

We are considering looking at the source and modifying PostingsHighlighter
or associated classes, so any pointers on where to look would also be handy.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PostingsHighlighter-returning-fields-which-don-t-match-tp4084495.html


Re: PostingsHighlighter returning fields which don't match

2013-08-15 Thread ses
Thanks, we tried modifying the source as suggested but found in our case
PostingsHighlighter was returning no highlighting at all once we removed the
self-closing tags. I think perhaps we were not using it in the correct way.


Robert Muir wrote
> Do you want to open a JIRA issue to just change the behavior?

Yes, I think it would be useful to have it as an optional feature which can
be triggered by a parameter as suggested. This is how we implemented it, and
if it were returning highlighting we would happily contribute this back, but
as it stands it's not properly tested. I will create a JIRA ticket to cover
this desired functionality though.


Robert Muir wrote
> Unrelated: If your queries actually go against a large number of fields,
> I'm not sure how efficient this highlighter will be. Thats because at some
> number of N fields, it will be much more efficient to use a
> document-oriented term vector approach (e.g. standard
> highlighter/fast-vector-highlighter).

Yes, unfortunately it is not any faster. Our original problem was
highlighting performance, and in our case PostingsHighlighter performs
similarly to the default highlighter.

We are now trying a solution which involves running one query to obtain the
field names in the N documents retrieved (where N=rows) and then a separate
query that lists those fields in the 'hl.fl' parameter. This relies on those
two separate queries running much faster than one query with
hl.fl=my_dynamic_field_*.
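As a rough sketch of that two-step approach (not our actual code), the
following assumes the first query's documents are available as a list of
dictionaries, for example parsed from a wt=json response, and that the
dynamic fields are stored so they come back with the documents. The function
names are hypothetical.

```python
def matching_fields(docs, prefix):
    """Collect the names of fields starting with prefix that occur in docs."""
    names = set()
    for doc in docs:
        names.update(name for name in doc if name.startswith(prefix))
    return sorted(names)


def highlight_params(query, docs, prefix):
    """Build parameters for the second query, highlighting only fields seen in docs."""
    fields = matching_fields(docs, prefix)
    if not fields:
        # Nothing to highlight, so switch highlighting off entirely.
        return {"q": query, "hl": "false"}
    return {"q": query, "hl": "true", "hl.fl": ",".join(fields)}


# Made-up documents standing in for the first query's results.
first_page = [
    {"id": "1", "my_dynamic_field_a": "...", "my_dynamic_field_c": "..."},
    {"id": "2", "my_dynamic_field_b": "..."},
]
second_query = highlight_params("knights of arabia", first_page, "my_dynamic_field_")
```

The second request then carries an explicit, short hl.fl list instead of the
my_dynamic_field_* wildcard.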

Thanks for your detailed responses.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/PostingsHighlighter-returning-fields-which-don-t-match-tp4084495p4084774.html


Total number of hits within all documents

2012-11-28 Thread ses
I'm trying to find a way to retrieve from a Solr query the total number of
hits for a query across all documents.

I'm using an edismax query handler which searches across several fields
(specified in the schema.xml).

I have tried:
/solr/my_core/keyword?q=knights of arabia&fl=ttf:totaltermfreq(html,'knights of arabia')
but the totaltermfreq function only works on individual terms.

I have also tried
/solr/my_core/keyword?q=knights of arabia&facet=true&facet.query={!edismax} knights of arabia
which returns only the number of documents that contain the search terms
(the same as numFound).

What I want is the total number of times the search terms appear in all
documents. For a standard disjunctive query like this it would total all
occurrences of 'knights', 'of' and 'arabia'. For a query like q="knights of
arabia", it would only count all occurrences of the entire phrase, and for
q=knights AND of AND arabia the number would be the total number of times
each term appears across all documents (but results would be fewer than
q=knights of arabia as documents must have all three of these terms in them
by the nature of the query).
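To make the number I am after concrete, here is a toy in-memory sketch. It is
not how Solr or Lucene would compute anything; the tokenizer, function names,
and sample documents are all made up, and it reflects my reading that for the
AND query only documents containing every term contribute to the total.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_terms(docs, terms, require_all=False):
    """Total occurrences of the terms within the documents that match.

    require_all=False models q=knights of arabia (any term matches);
    require_all=True models q=knights AND of AND arabia.
    """
    wanted = set(terms)
    total = 0
    for text in docs:
        toks = tokens(text)
        present = wanted & set(toks)
        matches = (present == wanted) if require_all else bool(present)
        if matches:
            total += sum(1 for t in toks if t in wanted)
    return total


def count_phrase(docs, phrase):
    """Total occurrences of the exact phrase, as for q="knights of arabia"."""
    words = tokens(phrase)
    n = len(words)
    total = 0
    for text in docs:
        toks = tokens(text)
        total += sum(1 for i in range(len(toks) - n + 1) if toks[i:i + n] == words)
    return total


docs = [
    "Knights of Arabia: the knights ride at dawn",
    "A history of Arabia",
    "Chess knights",
]
terms = ["knights", "of", "arabia"]

any_term_hits = count_terms(docs, terms)                     # 7
all_terms_hits = count_terms(docs, terms, require_all=True)  # 4
phrase_hits = count_phrase(docs, "knights of arabia")        # 1
```

With these three sample documents, the disjunctive query counts 7 hits, the
AND query counts 4 (only the first document has all three terms), and the
phrase query counts 1.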

I hope this makes sense and that there is some way to do this that I am
missing. I would also (begrudgingly) be happy if the answer is that, due to
the way searching works, this is not possible and Solr/Lucene cannot easily
be modified to do this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Total-number-of-hits-within-all-documents-tp4022895.html


Re: Total number of hits within all documents

2012-11-28 Thread ses
Unfortunately, a vague specification is all I have, because I am trying to
replicate the functionality of a closed-source legacy search product. I
suspect no-one at the company knows precisely how this works.

The purpose is ultimately to display to the user the total number of 'hits'
found in all documents, where a hit is any place in the text of the fields
searched (defined as 'qf' in the edismax search handler) where the search
terms appear. Essentially it should be like counting the number of
highlighted hits in a search with highlighting turned on. I could easily do
this for just the number of documents returned, specified by the 'rows'
parameter, by turning highlighting on and counting the snippets returned.
But I want this value for the entire dataset, which I have a feeling will be
too slow if I specify rows = total numFound.
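As a rough illustration of the per-page counting described above, the sketch
below counts highlight markers in a highlighting section parsed from a
wt=json response. It assumes the usual {document id: {field: [snippets]}}
layout and the default <em> pre-tag; the function name and the sample data
are made up.

```python
def count_highlighted_hits(highlighting, pre_tag="<em>"):
    """Count highlight markers across all documents, fields, and snippets."""
    total = 0
    for fields in highlighting.values():
        for snippets in fields.values():
            total += sum(snippet.count(pre_tag) for snippet in snippets)
    return total


# Made-up highlighting section for two documents.
highlighting = {
    "doc1": {
        "title": ["<em>Knights</em> <em>of</em> <em>Arabia</em>"],
        "fulltext": ["the <em>knights</em> ride", "east <em>of</em> Eden"],
    },
    "doc2": {"title": []},
}

page_hits = count_highlighted_hits(highlighting)
```

The count only covers the snippets actually returned, so it will undercount
unless hl.snippets and the fragment size are set high enough to cover every
match in each field.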

I just want it to count this number for the fields specified in 'qf'. If it
could count all occurrences of terms that match wildcard queries, that would
be good but not essential. Fuzzy/span queries aren't used.

I would be fine with an approximation; for all I know, this is how the old
search product works.

I hope this clarifies things a little. I realize it is a strange requirement
that the user is unlikely to even understand, but nevertheless apparently
the user must see something along the lines of 'X documents found, Y hits
found'.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Total-number-of-hits-within-all-documents-tp4022895p4022920.html