Re: Simple Faceted Searching out of the box
Hoss, what is "faceted browsing"? Maybe an example of a site interface that uses it would be good. Dumb question, I know.

On 9/8/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Hey everybody, I just wanted to officially announce that as of the solr-2006-09-08.zip nightly build, Solr supports some simple Faceted Searching options right out of the box.
>
> Both the StandardRequestHandler and DisMaxRequestHandler now support some query params for specifying simple queries to use as facet constraints, or fields in your index you wish to use as facets - generating a constraint count for each term in the field. All of these params can be configured as "defaults" when registering the RequestHandler in your solrconfig.xml.
>
> Information on what the new facet parameters are, how to use them, and what types of results they generate can be found in the wiki...
>
> http://wiki.apache.org/solr/SimpleFacetParameters
> http://wiki.apache.org/solr/StandardRequestHandler
> http://wiki.apache.org/solr/DisMaxRequestHandler
>
> ...as always: feedback, comments, suggestions and general discussion is strongly encouraged :)
>
> -Hoss
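For readers who want to see what such a request could look like, here is a minimal sketch that builds a query URL using the facet, facet.field and facet.query parameters described on the SimpleFacetParameters wiki page. The base URL, the field name "category" and the range query on "price" are assumptions made up for illustration; they depend entirely on your own install and schema.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, query, facet_fields=(), facet_queries=()):
    """Build a select URL that asks for facet counts alongside the results."""
    # A list of tuples keeps parameter order and allows repeated keys.
    params = [("q", query), ("facet", "true")]
    # One facet.field per field whose terms should each get a count.
    params += [("facet.field", field) for field in facet_fields]
    # One facet.query per arbitrary query that should get a count.
    params += [("facet.query", fq) for fq in facet_queries]
    return base_url + "?" + urlencode(params)


# Hypothetical host, field and query, for illustration only.
url = build_facet_url(
    "http://localhost:8983/solr/select",
    "ipod",
    facet_fields=["category"],
    facet_queries=["price:[0 TO 100]"],
)
print(url)
```

The same parameters could instead be registered as "defaults" for the request handler in solrconfig.xml, as the announcement mentions, so that clients do not have to send them on every request.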
Re: Simple Faceted Searching out of the box
On Sep 9, 2006, at 8:15 AM, Tim Archambault wrote:
> What is "faceted browsing"? Maybe an example of a site interface that is using it would be good. Dumb question, I know.

Faceted browsing is like this: http://shopper.cnet.com/ and http://www.nines.org/collex

In Collex, the items in the "constrain further" box are the facets. Clicking on them adds them to "your constraints". The idea is to divide the documents in the index into distinct buckets (or sets) and show the counts of how many results are in each set.

Erik
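The bucket-and-count idea can be sketched in a few lines of plain Python. This is only an illustration of the concept on an in-memory list, not how Solr computes facets internally; the documents and the field names "section" and "year" are invented for the example.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents fall into each bucket (value) of a field."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return counts


def constrain(docs, field, value):
    """Keep only the documents in one bucket, like clicking on a facet."""
    return [doc for doc in docs if doc.get(field) == value]


# Hypothetical result set for some search.
docs = [
    {"id": 1, "section": "sports", "year": 2006},
    {"id": 2, "section": "sports", "year": 2005},
    {"id": 3, "section": "news", "year": 2006},
]

print(facet_counts(docs, "section"))
# After the user picks "sports", the remaining facets are recounted.
print(facet_counts(constrain(docs, "section", "sports"), "year"))
```

Each click narrows the result set and the counts for the remaining facets are recomputed on what is left, which is what makes the interface feel like browsing and searching at the same time.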
Re: Simple Faceted Searching out of the box
I need to understand this then. Thanks. I want to use Solr for our newspaper website, and this would be a great way to break out content. It kind of blurs the line between searching and browsing categories, which is a great thing actually. Thanks for the help.

Tim
Re: Simple Faceted Searching out of the box
Good. Thank you, Hoss.
IIS web server and Solr integration
I'm looking to use Solr in a dedicated Windows environment. Can anyone give me some input on how this can be done? If it can, do you recommend Jetty, Tomcat, or something else? Should it run on a separate port from IIS, or be integrated using an ISAPI plug-in? Any help is greatly appreciated. Our site doesn't have huge traffic: about 4 million page views per month (GA) and about 20,000 visitors per day.

Thanks,
Tim
Got it working! And some questions
First of all, in reference to http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html, I got it working! The problem(s) came from SolPHP; the implementation in the wiki isn't really working, to be honest, at least for me. I had to modify it significantly in multiple places to get it working. My setup is Tomcat 5.5, WAMP and Windows XP.

The main problem was that addIndex was sending 1 doc at a time to Solr; it would cause a problem after a few thousand docs because I was running out of resources. I modified solr_update.php to handle batch queries, and I'm now sending batches of 1000 docs at a time. Great indexing speed.

I had a slight problem with the curl function of solr_update.php: the custom HTTP header wasn't recognized. I now use curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string); - much simpler, and now everything works!

So far I have indexed 15,000,000 documents (my whole collection, basically) and the performance I'm getting is INCREDIBLE (sub-100 ms query time without warmup and no optimization at all on a 7 GB index - and with the cache, it gets stupid fast)! Seriously, Solr amazes me every time I use it. I increased the HashDocSet maxSize to 75000 and will continue to tune this value - it helped a great deal. I will try the DisMax handler soon too; right now the standard one is great. And I will index with a better stopword file; the default one could really use improvements.

Some questions (couldn't find the answers in the docs):

- Is the Solr PHP code in the wiki working out of the box for anyone? Otherwise we could modify the wiki...
- What is the loadFactor variable of HashDocSet? Should I tune it too?
- What are the units of the size value of the caches? Megabytes, number of queries, kilobytes? Not described anywhere.
- Is there any way to programmatically change the OR/AND preference of the query parser? I set it to AND by default for user queries, but I'd like to set it to OR for some server-side queries I must do (find related articles, order by score).
- What's the difference between the 2 commit types, blocking and non-blocking? I didn't see any difference at all, and I tried both.
- Every time I do an command, I get the following in my catalina logs - should I do anything about it?

  9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
  SEVERE: Exception during commit/optimize:java.io.EOFException: no more data available - expected end tag to close start tag from line 1, parser stopped on START_TAG seen ... @1:10

- Are there any benefits to setting the allowed memory for Tomcat higher? Right now I'm allocating 384 MB.

Can't wait to try the new Faceted Queries... seriously, Solr is really, really awesome so far. Thanks for all your work, and sorry for all the questions!

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
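The batching change described in this message can be sketched as follows. This is not the SolPHP code and not the poster's modified solr_update.php; it is a small Python illustration, under assumptions, of grouping documents into batches of 1000 and building one Solr <add> XML message per batch, so that each HTTP POST carries many documents instead of one. The helper names and the sample fields "id" and "title" are hypothetical, and the actual POST to the update URL is deliberately left out.

```python
from xml.sax.saxutils import escape, quoteattr


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_add_xml(docs):
    """Build a single <add> message containing every document in the batch."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape markup characters so field text cannot break the XML.
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


# Hypothetical documents; a real collection would come from a database.
docs = [{"id": i, "title": "Article %d & more" % i} for i in range(2500)]

# One payload per batch of 1000: three POST bodies instead of 2500.
payloads = [build_add_xml(batch) for batch in chunked(docs, 1000)]
print(len(payloads))
```

Each payload would then be sent as the body of one POST, followed by a commit once the batches are in, which keeps the number of HTTP connections small.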
RE: Got it working! And some questions
Hi Michael, I apologize for the lack of testing on the SolPHP. I had to "strip" it down significantly to turn it into a general class that would be usable and the version up there has not been extensively tested yet (I'm almost ready to get back to that and "revise" it), plus much of my coding is done in Rails at the moment. However... If you have a new version, could you send it over my way or just upload it to the wiki? I'd like to take a look at the changes and throw your revised version up there or integrate both versions into a cleaner revision of the version already there. With respect to batch queries, it's already designed to do that (that's why you see "array($array)" in the example, because it accepts an array of updates) but I'd definitely like to see how you revised it. Thanks, Brian
Re: Re: SolrCore as Singleton?
In regard to the comment about the lack of an interface, I view this as a benefit of the tool. Whether I'm developing with Python, PHP, ColdFusion, .NET, Java, etc., I can create my own customizable interface. As a ColdFusion programmer with moderate programming capabilities, I find this tool perfect for my needs.

On 9/8/06, Andrew May <[EMAIL PROTECTED]> wrote:
> Chris Hostetter wrote:
> > : Nice. Is the same doable under Jetty? (never had to deal with JNDI
> > : under Jetty)
> >
> > I haven't tried it personally, but according to Yoav "reading" JNDI options is part of the Servlet Spec, and billa found a reference to using "" to do so...
> >
> > http://www.nabble.com/Re%3A-multiple-solr-webapps-p3991310.html
> >
> > ...where exactly that option goes in Jetty's configuration isn't something I'm clear on.
>
> values go in web.xml, so it would mean having modified versions of solr.war for each collection. is an optional part of the Servlet spec for standalone servlet implementations. The basic version of Jetty does not have any JNDI support; you need to use JettyPlus (http://jetty.mortbay.org/jetty5/plus/index.html) for that.
>
> -Andrew
Re: SolrCore as Singleton?
Tim Archambault wrote:
> In regard to the comment about lack of an interface, I view this as a benefit of the tool. Whether I'm developing with Python, PHP, Coldfusion, .NET, Java, etc. I can create my own customizable interface. As a coldfusion programmer with moderate programming capabilities, this tool is perfect for my needs.

That's good to hear. I never meant that a GUI should replace anything at all. Did it come out that way?

As the product evolves, it is only natural to add capabilities and features. Some of these should be available from different interfaces, including GUI(s). However, one should be able to interface with the application at different levels. As Solr gets more complex over time, care must be taken so that it does not become complicated. A more complex product might have many more points of entry. It is necessary to keep things simple as well as to provide centralized configuration possibilities. Following this philosophy, Solr users will be able to choose their level of interaction. (As a metaphor: some people prefer using GNU/Linux just by installing a distro; others compile and become best friends with the command line.)

Eivind
Re: Got it working! And some questions
> - Is the solr php in the wiki working out of the box for anyone?

Show your php.ini. Did you performance-tune your PHP?
search terms submitted
Just wondering what others do with the search terms people type into their Solr search boxes. Does CNet use this information for "Popular Searches"? Just curious.

FYI: Solr is up and running on my Windows 2003 IIS machine. Thanks for everyone's feedback.
Re: Doc add limit problem, old issue
Fixed my problem: the implementation of SolPHP was faulty. It was sending one doc at a time (one curl call per doc) and the system quickly ran out of resources. I have now modified it to send docs in batches (1000 at a time) and everything is #1!

Michael Imbeault wrote:
> Old issue (see http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html), but I'm experiencing the exact same thing on Windows XP with the latest Tomcat. I noticed that the Tomcat process gobbles memory (10 megs a second maybe) and then jams at 125 megs. Can't find a fix yet. I'm using a PHP interface and curl to post my XML, one document at a time, and commit every 100 documents. Indexing 3 docs, it hangs at maybe 5000. Anyone got an idea on this one? It would be helpful. I may try to switch to Jetty tomorrow if nothing works :(

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212