highlighting phrasal hits
Hello, I'm doing phrasal searches and am not happy with how highlighting is done by default. If I search for something like "w1 w2 w3", then, correctly, only fields that match the phrase exactly are found. However, when I specify highlighting with hl=true&hl.fl=myfield, two things don't work according to (my) expectations:

1) "w1 w2 w3" is not highlighted as a whole; instead, the pieces are highlighted individually, e.g. <em>w1</em> <em>w2</em> <em>w3</em>. Really, the whole phrase should be contained within a single element.
2) Relatedly, and presumably for the same reason, all instances of "w1", "w2" and "w3" in myfield are highlighted, even when they don't occur together.

I can't see any reason for things to work this way, but perhaps SOLR is just following Lucene here. Any thoughts appreciated,
Edward

P.S. I haven't actually tested the above against indexed English data, so it's possible that it's an artifact of the data and analysis procedures I am using.

--
Edward Garrett
Visiting Fellow (2006-07)
Endangered Languages Academic Programme
School of Oriental and African Studies
London, UK
0207 898 4536
Assistant Professor, Linguistics Program
Eastern Michigan University
612 Pray-Harrold Building
Ypsilanti, MI, USA
New SOLR installation problems
I installed the 12-8 snapshot of Solr on my 64-bit RH AS server, and whenever I go to the admin page I get the following error:

SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrCore

Any ideas as to what is causing this?

Thanks
Andrew
Re: New SOLR installation problems
On 12/11/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:
> I installed the 12-8 snapshot of solr on my 64bit RH AS server and whenever I go to the admin page I get the following error: SEVERE: Servlet.service() for servlet jsp threw exception java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrCore Any ideas as to what is causing this?

Look through the logs of whatever servlet container you are using for the first exception thrown. It most likely has something to do with not being able to find the solr config files.

-Yonik
Re: New SOLR installation problems
Thanks, Yonik, for the reply. I am using Tomcat, and there is nothing in the catalina.out file. The access log just reports the same NoClassDefFoundError that I see in the browser. I am starting Tomcat from my Solr directory, which has the data, bin and conf directories as subdirectories, so the CWD should be correct. I have this same setup on another server that I have been working on with no problem. I'm kind of lost with this one. Is there a setting in the solrconfig.xml file that I should be looking at?

Andrew
Re: New SOLR installation problems
Never mind, I got it working now. I had the paths set up incorrectly. Dumb++

Andrew
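Because this problem came down to paths and Yonik pointed at the config files, a small check of where Solr will look can save time. The following is a minimal diagnostic sketch in Java and is not part of Solr. It reports whether conf/solrconfig.xml and conf/schema.xml exist under the directory a Solr 1.x-era webapp would probably treat as its home. The sketch assumes a lookup order of the JNDI name java:comp/env/solr/home, then the solr.solr.home system property, then solr/ relative to the servlet container's working directory. It skips the JNDI step entirely. The class name SolrHomeCheck is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic (not a Solr class). ASSUMPTION: Solr home is taken
 * from the solr.solr.home system property, else "solr/" under the current
 * working directory. The JNDI java:comp/env/solr/home lookup is NOT checked.
 */
final class SolrHomeCheck {

    /** Resolves the directory this sketch assumes Solr would use as its home. */
    static Path resolveSolrHome() {
        String prop = System.getProperty("solr.solr.home");
        // Fall back to "solr" relative to the working directory.
        String home = (prop != null && !prop.isEmpty()) ? prop : "solr";
        return Paths.get(home).toAbsolutePath().normalize();
    }

    /** True if both solrconfig.xml and schema.xml exist under home/conf. */
    static boolean hasConfig(Path home) {
        Path conf = home.resolve("conf");
        return Files.isRegularFile(conf.resolve("solrconfig.xml"))
            && Files.isRegularFile(conf.resolve("schema.xml"));
    }

    /** One-line human-readable report for the resolved home directory. */
    static String report() {
        Path home = resolveSolrHome();
        return (hasConfig(home)
                ? "OK: "
                : "MISSING conf/solrconfig.xml or conf/schema.xml under ")
            + home;
    }
}
```

To use it, call SolrHomeCheck.report() from the same working directory that Tomcat is started from, with the same -D options. If the assumed default holds, the working directory should contain a solr/ subdirectory rather than be the Solr home itself.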
multiple collections
I was wondering how I might create multiple collections that have different field sets under Solr. Would I have to have multiple implementations of Solr running, or can I have more than one schema.xml file, one per "collection"?

Thanks
Andrew
Top Searches
I'm looking into creating something to track the top 10-20 searches that run through Solr for a given period. I could just create a counter object with an internal TreeMap or something similar that keeps count of the various terms. However, it could grow very large very fast, and I'm not yet sure what implications this would have on memory usage. Also, storing it in memory means it would be wiped out during a restart, so it's not ideal. Other ideas I had were storing the counts in a database table or in a separate Solr instance. Each method has its own advantages and drawbacks.

Has anyone looked into or had any experience doing something like this? Any info or advice would be appreciated.

-Sangraal A.
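If the counts do stay in memory, a bounded "heavy hitters" structure is one way to address the concern that a plain TreeMap could grow very large. The sketch below uses the Space-Saving algorithm, a technique swapped in here rather than anything Solr provides. It tracks at most `capacity` distinct queries. When the table is full, it evicts the smallest counter, and the new query inherits that count plus one. Counts can therefore be overestimated, but frequently repeated queries stay in the table. The class name TopQueryCounter is hypothetical, and the sketch does nothing about losing counts on restart.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical bounded top-query counter using the Space-Saving algorithm.
 * Memory is capped at `capacity` entries; estimated counts are upper bounds.
 */
final class TopQueryCounter {
    private final int capacity;
    private final Map<String, Long> counts = new HashMap<>();

    TopQueryCounter(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
        this.capacity = capacity;
    }

    /** Records one occurrence of a query string. */
    synchronized void offer(String query) {
        Long current = counts.get(query);
        if (current != null) {
            counts.put(query, current + 1);          // already tracked
        } else if (counts.size() < capacity) {
            counts.put(query, 1L);                   // free slot available
        } else {
            // Table full: find the smallest counter (linear scan; fine for small k).
            Map.Entry<String, Long> min = null;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (min == null || e.getValue() < min.getValue()) min = e;
            }
            long inherited = min.getValue();
            counts.remove(min.getKey());
            counts.put(query, inherited + 1);        // newcomer inherits min + 1
        }
    }

    /** Up to n tracked queries, highest estimated count first. */
    synchronized List<String> top(int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    /** Estimated count (an upper bound), or 0 if the query is not tracked. */
    synchronized long estimatedCount(String query) {
        return counts.getOrDefault(query, 0L);
    }
}
```

A capacity of a few hundred would leave headroom for a top-20 report while keeping memory fixed. The whole table is small enough to dump periodically to a file or database table if the counts need to survive restarts.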
Re: Top Searches
On 12/11/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> I'm looking into creating something to track the top 10 - 20 searches that
> run through Solr for a given period.

For offline processing, using log files is the simplest thing... the code remains separate, you can do historical processing if you keep the logs, and it doesn't affect live queries. It depends on how fresh the info needs to be and how it will be used.

-Yonik
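Yonik's log-file approach can be sketched as a small offline job that reads request log lines, extracts the q parameter, and counts it. This minimal Java sketch rests on an explicit assumption: that each logged request line carries the URL-encoded request parameters, as in the hypothetical line `INFO: /select q=ipod+nano&start=0&rows=10 0 15`. The regex and the class name LogTopSearches are illustrative only. The real pattern depends on the Solr version and servlet container logging in use. URLDecoder.decode(String, Charset) requires Java 10 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical offline "top searches" report. ASSUMPTION: log lines contain
 * URL-encoded parameters such as "q=ipod+nano&start=0"; adapt QUERY_PARAM
 * to the actual log format.
 */
final class LogTopSearches {
    // q=<value> where "q" starts the line or follows '?', '&' or whitespace,
    // so parameters like fq= are not mistaken for the main query.
    private static final Pattern QUERY_PARAM = Pattern.compile("(?:^|[?&\\s])q=([^&\\s]*)");

    /** Counts decoded q values, one per matching log line. */
    static Map<String, Integer> countQueries(Reader log) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        BufferedReader in = new BufferedReader(log);
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = QUERY_PARAM.matcher(line);
            if (!m.find()) continue;                  // not a query request
            String query;
            try {
                query = URLDecoder.decode(m.group(1), StandardCharsets.UTF_8).trim();
            } catch (IllegalArgumentException badEscape) {
                continue;                             // skip malformed %-escapes
            }
            if (!query.isEmpty()) counts.merge(query, 1, Integer::sum);
        }
        return counts;
    }

    /** Up to n queries, highest count first; ties broken alphabetically. */
    static List<String> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) result.add(entries.get(i).getKey());
        return result;
    }
}
```

A job like this could run from cron over rotated logs. That matches Yonik's point that freshness is the main trade-off against live, in-process counting.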
Re: Top Searches
That's a great idea, thanks Yonik.

-Sangraal

On 12/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 12/11/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > I'm looking into creating something to track the top 10 - 20 searches that
> > run through Solr for a given period.
>
> For offline processing, using log files is the simplest thing... the code
> remains separated, you can do historical processing if you keep the logs,
> and it doesn't affect live queries. It depends on how fresh the info needs
> to be and how it will be used.
>
> -Yonik
Re: multiple collections
: different field sets under solr. Would I have to have multiple
: implementations of solr running, or can I have more than one schema.xml
: file per "collection" ?

Currently the only supported way to do this is to run multiple instances of the solr.war ... if you look at the various container-specific wiki pages, they each have information on how to run "Multiple Solr Webapps" in a single container instance...

http://wiki.apache.org/solr/?action=fullsearch&value=Multiple+Solr+Webapps&fullsearch=Text

-Hoss
How to query a parent child relationship returning result set of parents?
We are currently using Solr to index various types of content in our system, several of which users can comment on. We would like to issue a query on the top-level content that also searches the attached comments but returns only unique top-level documents as results. At the same time, we want to keep the option of searching and returning comments as an alternative type of search for the user.

The simplest example would probably be a blog. The blog could be indexed as follows:

id: blog_intId
title: blog title
content: blog content

And the associated comments:

id: comment_intId
title: comment title
content: comment content
parentId: blog_intId

Given this layout, how would I go about querying and returning a list of blogs that match the query text in either the blog content or any of the comments' content? The only solutions I can come up with would be to:

1) Aggregate comment content into the blog's indexed content, allowing me to query directly on the blog. However, we expect the site to generate many comments, on the order of hundreds and possibly thousands. This also has the downside of requiring duplicate content in the index if we still want to let users search on and return comments.

2) Use facets to get a list of parent items, and issue an additional query (or hit the database) to pull in the parent content. Again, this isn't an ideal solution. We would have to page the results ourselves, since Solr's facet parameters don't support an offset, and this possibly negates any optimizations Solr may have for paging regular queries. It also forces a second round trip to either Solr or the database to get summary content to display in the search results list. Finally, it seems like a poor use case for the facet functionality in general.

3) Plug into the Solr code and implement a custom request handler, HitCollector, or ...?
I've spent some time digging into the Solr code, and I don't see any obvious place to plug this type of functionality in. Performance is also a major concern of mine, so I want to make sure I can get at and modify the results before Solr loads any unnecessary content into memory.

Any thoughts on this are much appreciated. Any kind of kick start, pointer, or places to dig into would be very helpful.

--
eric
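Option 2, collapsing hits to their parents and paging client-side, can be sketched as follows. This is a minimal Java sketch, not a Solr API. It assumes the client already has a relevance-ordered list of hits covering both blogs and comments, where each comment carries the parentId field from the layout above. The Hit type and the class name ParentCollapser are hypothetical. As a design choice, each parent is ranked by its best-ranked hit. One limitation is that the sketch can only page over the hits it was given, so enough rows must be fetched to fill offset + limit distinct parents.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical client-side collapse of blog/comment hits to unique parent ids.
 * Mirrors the example layout above (id, parentId); not part of Solr.
 */
final class ParentCollapser {

    /** A search hit; parentId is null for top-level (blog) documents. */
    static final class Hit {
        final String id;
        final String parentId;
        Hit(String id, String parentId) { this.id = id; this.parentId = parentId; }
    }

    /**
     * Returns one page of parent ids, ordered by each parent's best-ranked hit.
     * Skips `offset` parents and returns at most `limit`.
     */
    static List<String> pageOfParents(List<Hit> rankedHits, int offset, int limit) {
        Set<String> parents = new LinkedHashSet<>();   // insertion order = best rank
        for (Hit h : rankedHits) {
            parents.add(h.parentId != null ? h.parentId : h.id);
            if (parents.size() >= offset + limit) break; // page is already covered
        }
        List<String> ordered = new ArrayList<>(parents);
        int from = Math.min(offset, ordered.size());
        int to = Math.min(offset + limit, ordered.size());
        return new ArrayList<>(ordered.subList(from, to));
    }
}
```

The resulting parent ids would then feed the second round trip Eric describes, for example a query on id for that page of blogs.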
Re: highlighting phrasal hits
On 12/11/06, Edward Garrett <[EMAIL PROTECTED]> wrote:
> hello, i'm doing phrasal searches, and am not happy with how highlighting is done by default. if i search for something, like "w1 w2 w3", then correctly, only fields that match perfectly will be found. however, when i specify highlighting with hl=true&hl.fl=myfield, then two things don't work according to (my) expectations:
> 1) "w1 w2 w3" is not highlighted as a whole, but rather the pieces are highlighted. e.g. w1 w2 w3. really, the whole thing should be contained within a single element.
> 2) relatedly, and presumably for the same reason, all instances of "w1", "w2" and "w3" in myfield are highlighted, even when they don't occur together.
> i can't see any possible reason for things working this way, but perhaps SOLR is just following lucene here.

Solr uses Lucene's built-in highlighter, which has the deficiencies you mention. Improved highlighting approaches have been proposed; see http://issues.apache.org/jira/browse/LUCENE-663 and http://issues.apache.org/jira/browse/LUCENE-644. Improving Solr's highlighting is something I am quite interested in personally. Unfortunately, this is an extremely busy time for me at work, and I doubt that I'll have time to work on this in the near future.

-Mike
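For illustration only, here is a toy Java highlighter with the behavior Edward expects. It wraps each complete phrase occurrence in a single <em> element and leaves isolated phrase words unmarked. This is not Lucene's Highlighter API and not the approach taken in LUCENE-663 or LUCENE-644. It splits on whitespace and lowercases with Locale.ROOT instead of running the field's analyzer, and it collapses runs of whitespace in its output. The class name PhraseHighlighter is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical phrase-aware highlighter: one <em> element per full phrase
 * match; words outside a full match are left unhighlighted. Whitespace
 * tokenization and case folding stand in for a real analyzer.
 */
final class PhraseHighlighter {

    static String highlight(String text, String phrase) {
        String[] tokens = text.trim().split("\\s+");
        String[] want = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (matchesAt(tokens, i, want)) {
                // Emit the whole phrase, original casing kept, inside one element.
                String hit = String.join(" ", Arrays.copyOfRange(tokens, i, i + want.length));
                out.add("<em>" + hit + "</em>");
                i += want.length;
            } else {
                out.add(tokens[i]);   // isolated phrase words stay unmarked
                i++;
            }
        }
        return String.join(" ", out);
    }

    /** True if the phrase tokens occur consecutively starting at `start`. */
    private static boolean matchesAt(String[] tokens, int start, String[] want) {
        if (start + want.length > tokens.length) return false;
        for (int k = 0; k < want.length; k++) {
            if (!tokens[start + k].toLowerCase(Locale.ROOT).equals(want[k])) return false;
        }
        return true;
    }
}
```

For example, highlight("w1 x w1 w2 w3 y w3", "w1 w2 w3") yields "w1 x <em>w1 w2 w3</em> y w3". This addresses both of Edward's points: the phrase is highlighted as one element, and the stray w1 and w3 are left alone.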
Re: Suggestion for solr.war
: application/xml
:
: That causes the MIME type to get set explicitly and Firefox renders
: the page properly, with or without IE Tab.

Thanks for the suggestion. I went ahead and added this to the Solr web.xml, but I used "application/xslt+xml" instead, since it seems to work just as well and has been in the W3C XSLT 2.0 docs since the May 2003 draft...

http://www.w3.org/TR/xslt20/

...I'm not really an XSLT or MIME type expert, however, so if anyone knows a more generally compatible approach we can take, please chime in.

-Hoss
try setting useFilterForSortedQuery to false
People may want to consider changing the useFilterForSortedQuery option from true to false (or commenting it out) in solrconfig.xml. I believe it should result in a speedup for the average query that sorts on something other than score.

Generating and using filters for base queries used to be a win because of:
1) a set of complex queries that were almost never sorted by score, but whose sort changed
2) bugs/slowness in Lucene when dealing with multi-segment indices

The deficiencies in Lucene have been corrected, and the introduction of fq filter parameters that are cached separately goes a long way toward mitigating #1. The downside to this optimization is extra work when the same base query isn't re-sorted often, and pollution of the filterCache.

I've changed the default to false in the example solrconfig.xml.

-Yonik