edismax - ignore/override qf field?
With the edismax query handler, is it possible to pass a query parameter that names a field (already defined in the qf parameter in solrconfig.xml) to ignore? In other words, can you tell Solr not to search a specific field?

For example, suppose qf in solrconfig.xml is configured to search title, author, description and fulltext. In some circumstances, such as a basic search, you might want to ignore the field 'fulltext', or even multiple fields such as both 'description' and 'fulltext'.

In the situation I'm thinking of there are many more qf fields than in this simple example, so defining a new query handler or specifying all the fields to search at query time would be a less than ideal solution.

--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-ignore-override-qf-field-tp4107058.html
Sent from the Solr - User mailing list archive at Nabble.com.
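A possible workaround for the question above, sketched under assumptions the message does not state: request parameters override a handler's `defaults` in solrconfig.xml (unless the parameter is listed under `invariants`), so a client can send its own `qf` with the unwanted fields removed. The helpers `build_qf` and `build_params` and the `DEFAULT_QF` list are hypothetical. The client has to keep its own copy of the field list, which is the duplication the poster hoped to avoid.

```python
from urllib.parse import urlencode

# Hypothetical client-side copy of the qf default from solrconfig.xml
# (field names taken from the example in the message above).
DEFAULT_QF = ["title", "author", "description", "fulltext"]


def build_qf(default_fields, ignore):
    """Return a qf value with every default field except those in `ignore`."""
    ignored = set(ignore)
    return " ".join(f for f in default_fields if f not in ignored)


def build_params(query, ignore=()):
    """Build request parameters that override the handler's default qf."""
    return {
        "q": query,
        "defType": "edismax",
        "qf": build_qf(DEFAULT_QF, ignore),
    }


params = build_params("solr", ignore=["description", "fulltext"])
query_string = urlencode(params)  # ready to append to the handler URL
```

Any per-field boosts (for example `title^2`) would need to be carried in the list as well; this sketch assumes plain field names.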
PostingsHighlighter returning fields which don't match
We are trying out the new PostingsHighlighter with Solr 4.2.1 and finding that the highlighting section of the response includes self-closing tags for all the fields in hl.fl (for edismax this defaults to all fields in qf) where there are no highlighting matches. In contrast, the same query on Solr 4.0.0 without PostingsHighlighter returns only the fields containing highlighting matches.

Here is a simplified example of the highlighting response for a document with no matches in the fields specified by hl.fl:

with PostingsHighlighter:
... ...

without PostingsHighlighter:
...

This is a big problem for us, as we have a large number of fields in a dynamic field. We believe that every highlighted response is sending us a very large number of self-closing tags, which bloats the response to an unreasonable size (in some cases 100MB+).

We have tried using hl.requireFieldMatch=true, but this seems to make no difference.

Is there anything we can specify in the query (or solrconfig) to avoid returning these empty tags? Or could this be a known bug? We are considering looking at the source and modifying PostingsHighlighter or associated classes, so any pointers on where to look would also be handy.

--
View this message in context: http://lucene.472066.n3.nabble.com/PostingsHighlighter-returning-fields-which-don-t-match-tp4084495.html
Re: PostingsHighlighter returning fields which don't match
Thanks, we tried modifying the source as suggested but found that, in our case, PostingsHighlighter was returning no highlighting at all once we removed the self-closing tags. I think perhaps we were not using it in the correct way.

Robert Muir wrote
> Do you want to open a JIRA issue to just change the behavior?

Yes, I think it would be useful to have it as an optional feature which can be triggered by a parameter, as suggested. This is how we implemented it, and if it were returning highlighting we would happily contribute this back, but as it stands it's not properly tested. I will create a JIRA ticket to cover this desired functionality though.

Robert Muir wrote
> Unrelated: If your queries actually go against a large number of fields,
> I'm not sure how efficient this highlighter will be. Thats because at some
> number of N fields, it will be much more efficient to use a
> document-oriented term vector approach (e.g. standard
> highlighter/fast-vector-highlighter).

Yes, unfortunately it is not any faster. Our original problem was highlighting performance, and in our case PostingsHighlighter is performing similarly to the default highlighter.

We are now trying a solution which involves running one query to obtain the field names in the N documents retrieved (where N=rows) and then a separate query which specifies those fields in the 'hl.fl' parameter. This works on the basis that those two separate queries run much faster than one query with hl.fl=my_dynamic_field_*

Thanks for your detailed responses.

--
View this message in context: http://lucene.472066.n3.nabble.com/PostingsHighlighter-returning-fields-which-don-t-match-tp4084495p4084774.html
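The two-query approach described above could look roughly like the following sketch. It only shows the step between the two queries: collecting the dynamic field names present in the first result page and turning them into an explicit `hl.fl` value. The function names and sample documents are hypothetical, and the sketch assumes the first query returns the dynamic fields in its `fl` so that their names are visible on each document.

```python
from fnmatch import fnmatchcase


def collect_highlight_fields(docs, pattern):
    """Return sorted, de-duplicated field names in `docs` matching the glob `pattern`."""
    names = set()
    for doc in docs:
        for field in doc:
            if fnmatchcase(field, pattern):
                names.add(field)
    return sorted(names)


def second_pass_params(query, docs, pattern):
    """Build parameters for the second query, listing highlight fields explicitly."""
    fields = collect_highlight_fields(docs, pattern)
    return {"q": query, "hl": "true", "hl.fl": ",".join(fields)}


# Hypothetical documents as they might come back from the first query.
first_pass_docs = [
    {"id": "1", "title": "t", "my_dynamic_field_a": "x"},
    {"id": "2", "my_dynamic_field_b": "y", "my_dynamic_field_a": "z"},
]

params = second_pass_params("example", first_pass_docs, "my_dynamic_field_*")
```

With the sample documents, `hl.fl` names only the two dynamic fields that actually occur in the page of results, rather than the whole `my_dynamic_field_*` family.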
Total number of hits within all documents
I'm trying to find a way to retrieve from a Solr query the total number of hits for a query across all documents. I'm using an edismax query handler which searches across several fields (specified in the schema.xml).

I have tried:

/solr/my_core/keyword?q=knights of arabia&fl=ttf:totaltermfreq(html,'knights of arabia')

but the totaltermfreq function only works on individual terms.

I have also tried:

/solr/my_core/keyword?q=knights of arabia&facet=true&facet.query={!edismax} knights of arabia

which retrieves the total number of documents containing the search terms (the same as numFound).

What I want is the total number of times the search terms appear in all documents. For a standard disjunctive query like this one, it would total all occurrences of 'knights', 'of' and 'arabia'. For a query like q="knights of arabia", it would count only occurrences of the entire phrase. For q=knights AND of AND arabia, the number would be the total number of times each term appears across all matching documents (but there would be fewer results than for q=knights of arabia, as documents must contain all three terms by the nature of the query).

I hope this makes sense and that there is some way to do this that I am missing. I would also (begrudgingly) be happy if the answer is that, due to the way searching works, this is not possible and Solr/Lucene cannot easily be modified to do it.

--
View this message in context: http://lucene.472066.n3.nabble.com/Total-number-of-hits-within-all-documents-tp4022895.html
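Since totaltermfreq accepts only a single term, one rough approximation for the disjunctive case is to request one totaltermfreq pseudo-field per term and sum the values. The sketch below is an assumption-laden illustration, not an answer from the thread: the helper names and sample values are hypothetical, it covers only the single field 'html', and totaltermfreq counts occurrences in the whole index rather than only in documents matching the query. The terms must also be given in their indexed form (for example lowercased), and terms containing quotes are not escaped here.

```python
def ttf_field_list(field, terms):
    """Build an fl value with one aliased totaltermfreq() pseudo-field per term."""
    return ",".join(
        "ttf_{i}:totaltermfreq({f},'{t}')".format(i=i, f=field, t=term)
        for i, term in enumerate(terms)
    )


def sum_ttf(doc):
    """Sum every ttf_* pseudo-field found on one returned document."""
    return sum(value for key, value in doc.items() if key.startswith("ttf_"))


terms = "knights of arabia".split()
fl = ttf_field_list("html", terms)

# Hypothetical values as they might appear on any one returned document;
# totaltermfreq is index-wide, so every document carries the same numbers.
sample_doc = {"id": "1", "ttf_0": 12, "ttf_1": 340, "ttf_2": 3}
total = sum_ttf(sample_doc)
```

This does not address the phrase or AND cases, where the count has to be restricted to matching documents or to whole-phrase occurrences.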
Re: Total number of hits within all documents
Unfortunately a vague specification is all I have, because I am trying to replicate the functionality of a closed-source legacy search product. I suspect no-one at the company knows precisely how this works.

The purpose is ultimately to display to the user the total number of 'hits' found in all documents, where a hit is any place in the text of the fields searched (defined as 'qf' in the edismax search handler) where the search terms appear. Essentially it should be like counting the number of highlighted hits in a search with highlighting turned on.

I could easily do this for just the documents returned, as specified by the 'rows' parameter, by turning highlighting on and counting the snippets returned. But I want this value for the entire dataset, which I have a feeling will be too slow if I specify rows = total numFound.

I just want it to count this number for the fields specified in 'qf'. If it could count all occurrences of terms that match wildcard queries, that would be good but not essential. Fuzzy/span queries aren't used. I would be fine with an approximation; for all I know that is how the old search product works.

I hope this clarifies things a little. I realize it is a strange requirement that the user is unlikely to even understand, but nevertheless the user apparently must see something along the lines of 'X documents found, Y hits found'.

--
View this message in context: http://lucene.472066.n3.nabble.com/Total-number-of-hits-within-all-documents-tp4022895p4022920.html
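The per-page counting mentioned above (highlighting on, then counting what comes back) might be sketched as follows. This assumes the JSON response format, where the highlighting section maps document id to field name to a list of snippets, and the default opening tag `<em>`. The function name and sample data are hypothetical. It counts highlighted terms rather than snippets, and it only covers the rows returned.

```python
def count_highlight_hits(highlighting, pre_tag="<em>"):
    """Count occurrences of the highlight opening tag across all snippets."""
    total = 0
    for fields in highlighting.values():
        for snippets in fields.values():
            for snippet in snippets:
                total += snippet.count(pre_tag)
    return total


# Hypothetical "highlighting" section of a parsed JSON response.
sample = {
    "doc1": {"title": ["<em>Knights</em> <em>of</em> <em>Arabia</em>"]},
    "doc2": {"fulltext": ["the <em>knights</em> rode", "to <em>Arabia</em>"]},
    "doc3": {},
}

hits = count_highlight_hits(sample)
```

The count is bounded by hl.snippets and hl.fragsize, so those would need to be raised for the figure to approach the true number of occurrences in each document.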