Re: Simple Faceted Searching out of the box
On Sep 9, 2006, at 9:09 AM, Tim Archambault wrote:
> I need to understand this then. Thanks. I want to use Solr for our newspaper
> website and this would be a great way to break out content. Kind of greys
> the lines between what is search and what is browsing categories, which is a
> great thing actually. Thanks for the help.
greys the lines indeed. there isn't any difference between search and browse in my view now. let's just call it "findability" :)
(by the way, "Ambient Findability" is a fantastic book)
Erik
Re: Simple Faceted Searching out of the box
For those using PHP to interface with Solr, can you explain to me how your PHP code interacts with it? Does PHP create a query string manually and request a URL like this:
http://localhost:8983/solr/select?q=vertical%3Ajobs+accounting&version=2.1&start=0&rows=10&fl=&qt=standard&stylesheet=&indent=on&explainOther=&hl.fl=
and then use some PHP command to read the resulting page and parse it? I'm not much of a programmer, but I do know Coldfusion, so I'm trying to apply the PHP principles to CF. Thanks for any and all help.
Tim
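For readers with the same question, here is a minimal sketch of the "build a query string, then request the URL" approach described above. It is written in Python rather than PHP or Coldfusion, the function name build_select_url is hypothetical, and only a few of the parameters from the example URL are included; treat it as an illustration under those assumptions, not as the way any particular Solr client does it.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, query: str, start: int = 0, rows: int = 10) -> str:
    """Build a GET URL for a Solr select request (hypothetical helper).

    urlencode percent-encodes reserved characters such as ":" and turns
    spaces into "+", so the query never has to be escaped by hand.
    """
    params = [
        ("q", query),
        ("version", "2.1"),
        ("start", str(start)),
        ("rows", str(rows)),
        ("indent", "on"),
    ]
    return base_url + "?" + urlencode(params)


# Same host, path and query as the URL quoted in the message above.
url = build_select_url("http://localhost:8983/solr/select", "vertical:jobs accounting")
print(url)
```

The resulting URL could then be fetched with urllib.request.urlopen and the XML response handed to a parser; that step is left out here because it needs a running Solr instance.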
Re: Simple Faceted Searching out of the box
: > > What is "faceted browsing"? Maybe an example of a site interface
Whoops! ... sorry about that, i tend to get ahead of myself.
The examples Erik pointed out are very representative, but there are more subtle ways faceted searching can come into play -- for example, if you look at these two search results...
http://shopper-search.cnet.com/search?q=gta
http://shopper-search.cnet.com/search?q=ipod
...the categories in the left nav change based on what you search on, because we treat "category" as a facet, and the individual categories as possible "constraints" ... we don't show the user the exact count of how many products match in each category but we use that information to determine the order of the categories (or whether we should include a category in the list at all)
: website and this would be a great way to break out content. Kind of greys
: the lines between what is search and what is browsing categories, which is a
: great thing actually. Thanks for the help.
Even without facets, "browsing" a set of documents is just a search for "all" documents (or depending on who you talk to: "searching" is just browsing with a special user entered constraint on the "text" facet)
-Hoss
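The idea of using per-category counts to order the left nav, and to leave a category out entirely, can be sketched roughly as follows. This is only an illustration: the function name order_constraints, the min_count threshold and the sample counts are all made up, and nothing here reflects how the CNET site actually computes its navigation.

```python
from typing import Dict, List


def order_constraints(facet_counts: Dict[str, int], min_count: int = 1) -> List[str]:
    """Return category names sorted by descending match count.

    Categories with fewer than min_count matches are dropped, and ties are
    broken alphabetically so the ordering is deterministic.
    """
    kept = [(name, count) for name, count in facet_counts.items() if count >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in kept]


# Hypothetical per-category match counts for a single search.
counts = {"Video Games": 120, "MP3 Players": 0, "Accessories": 45, "Books": 45}
left_nav = order_constraints(counts)
print(left_nav)
```

The counts themselves never reach the user in this sketch; they only decide which categories appear and in what order.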
Re: IIS web server and Solr integration
: Should it run on a separate port than IIS or integrated using ISAPI plug-in?
I can't make any specific recommendations about Windows or IIS, but i personally wouldn't run Solr in the same webserver/appserver that your users hit -- from a security standpoint, i would protect your solr instance the same way you would protect a database: let the applications running in your webserver connect to it and run queries against it, but don't expose it to the outside world directly.
-Hoss
Re: Got it working! And some questions
: - What is the loadFactor variable of HashDocSet? Should I optimize it too?
this is the same as the loadFactor in a HashMap constructor -- but i don't think it has much effect on performance since the HashDocSets never "grow". I personally have never tuned the loadFactor :)
: - What's the units on the size value of the caches? Megs, number of
: queries, kilobytes? Not described anywhere.
"entries" ... the number of items allowed in the cache.
: - Any way to programatically change the OR/AND preference of the query
: parser? I set it to AND by default for user queries, but i'd like to set
: it to OR for some server-side queries I must do (find related articles,
: order by score).
you mean using StandardRequestHandler? ... not that i can think of off the top of my head, but typically it makes sense to just configure what you want for your "users" in the schema, and then make any machine generated queries be explicit.
: - Whats the difference between the 2 commits type? Blocking and
: non-blocking. Didn't see any differences at all, tried both.
do you mean the waitFlush and waitSearcher options? if either of those is true, you shouldn't get a response back from the server until they have finished. if they are false, then the server should respond instantly even if it takes several seconds (or maybe even minutes) to complete the operation (optimizes can take a while in some cases -- as can opening newSearchers if you have a lot of cache warming configured)
: - Every time I do an optimize command, I get the following in my
: catalina logs - should I do anything about it?
the optimize command needs to be well formed XML, try "" instead of just ""
: - Any benefits of setting the allowed memory for Tomcat higher? Right
: now im allocating 384 megs.
the more memory you've got, the more caching you can support .. but if your index changes so frequently compared to the rate of *unique* queries you get that your caches never fill up, it may not matter.
-Hoss
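One way to avoid malformed update commands is to let an XML library serialize them. The sketch below builds commit and optimize messages carrying the waitFlush and waitSearcher attributes mentioned above. The helper name update_command and its defaults are assumptions for illustration, and because the literal XML in the reply was lost in the archive, the self-closing form produced here is a guess at what was meant.

```python
import xml.etree.ElementTree as ET


def update_command(name: str, wait_flush: bool = True, wait_searcher: bool = True) -> str:
    """Serialize an update command as a single well-formed XML element.

    The defaults are this sketch's own choice, not documented Solr defaults.
    """
    element = ET.Element(name)
    element.set("waitFlush", "true" if wait_flush else "false")
    element.set("waitSearcher", "true" if wait_searcher else "false")
    # An element with no children is written in self-closing form.
    return ET.tostring(element, encoding="unicode")


# Non-blocking commit: the server may answer before the work has finished.
non_blocking_commit = update_command("commit", wait_flush=False, wait_searcher=False)
# Blocking optimize: the response is expected only once the operation completes.
blocking_optimize = update_command("optimize")
print(non_blocking_commit)
print(blocking_optimize)
```

The resulting string would be sent as the body of a POST to the update handler; the exact URL depends on the installation and is not shown here.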
Re: Re: IIS web server and Solr integration
Good news. The rookie did just that. Thanks Chris. Just having a difficult time figuring out how to send my query parameters to the engine from Coldfusion [intelligently]. I'm going to download the PHP app and see if I can figure it out. Having lots of fun with this for sure.
Tim
Re: Re: IIS web server and Solr integration
Tim -
If you can help it, I would suggest running Solr under Tomcat under Linux. Speaking from experience in a mixed mode environment, the Linux/Tomcat/Solr implementation just works. We're not newbies under Linux, but we're also a native Windows shop. The memory management and system availability are just outstanding in that stack.
If you must run Windows, Tomcat does integrate with IIS, but be prepared to jump through a few hoops. Spend time on making that combination work, and you'll be 90% there.
Hope this helps.
-- j
Re: Re: Re: IIS web server and Solr integration
Thanks Jeff,
I am going to run Solr for our beta site, mobile.bangordailynews.net, the mobile device version of our site. I'm just running it on Jetty right now as a completely separate web app under a different port. The Jetty port is not available on the web. I'm using Coldfusion to "get" the results. This will give me a chance to play with it a little. If things work like I think they will, I will probably buy a Linux-based "VPS" account with our hosting provider, run Tomcat and Solr on it, and send the requests over to my dedicated server. I'm not much of a programmer nor do I know anything about Linux, but I think you are right about this.
Tim
Re: Got it working! And some questions
First of all, it seems the mailing list is having some troubles? Some of my posts end up in the wrong thread (even new threads I post), I don't receive them in my mail, and they're present only in the 'date archive' of http://www.mail-archive.com, and not in the 'thread' one? I don't receive some of the other people's posts in my mail either; the problems started last week, I think.
Secondly, Chris, thanks for all the useful answers, everything is much clearer now. This info should be added to the wiki I think; should I do it? I'm still a little disappointed that I can't change the OR/AND parsing by just changing some parameter (like I can do for the number of results returned, for example); adding an OR between each word in the text I want to compare sounds suboptimal, but I'll probably do it that way; it's a very minor nitpick, Solr is awesome, as I said before.
@ Brian Lucas: Don't worry, solrPHP was still 99.9% functional, great work; part of it sending a doc at a time was my fault; I was following the exact sequence (add to array, submit) displayed in the docs. The only thing that could be added is a big "//TODO: change this code" before sections you have to change to make it work for a particular schema. I'm pretty sure the custom header curl submit works for everyone but me; I'm on a Windows test box with WAMP on it, so it may be caused by that. I'll send you the changes I made to the code tomorrow anyway; as I said, nothing major.
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Re: Got it working! And some questions
: First of all, it seems the mailing list is having some troubles? Some of
: my posts end up in the wrong thread (even new threads I post), I don't
: receive them in my mail, and they're present only in the 'date archive'
: of http://www.mail-archive.com, and not in the 'thread' one? I don't
: receive some of the other peoples post in my mail too, problems started
: last week I think.
i haven't noticed any problems with mail not making it through - some mail clients (gmail for example) seem to suppress messages they can tell you sent, maybe that's what's happening on your end?
As for threads you start not showing up on the "thread" list ... according to my mailbox, all but one message i've received from you included a "References:" header (if not an In-Reply-To header) which causes some mail archivers to assume it's part of an existing thread (this thread for instance is considered part of the "Double Solr Installation on Single Tomcat (or Double Index)" thread) ... you may want to experiment with your mail client (off list) to see if you can figure out when/why this is happening.
: Secondly, Chris, thanks for all the useful answers, everything is much
: clearer now. This info should be added to the wiki I think; should I do
feel free ... that's why it's a wiki.
: it? I'm still a little disappointed that I can't change the OR/AND
: parsing by just changing some parameter (like I can do for the number of
: results returned, for example); adding a OR between each word in the
: text i want to compare sounds suboptimal, but i'll probably do it that
: way; its a very minor nitpick, solr is awesome, as I said before.
it would be a fairly simple option to add just like changing the default field (patches welcome!) but as i said -- typically if you don't want the default behavior you are programmatically generating the query anyway, and already adding some markup, a little more doesn't make it less optimal.
-Hoss
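For a machine generated query, making the operator explicit can be as small as the sketch below. The function name build_or_query is hypothetical, and the sketch assumes the input is plain words: it does not escape query-syntax characters or handle quoted phrases, which a real caller would need to consider.

```python
def build_or_query(text: str) -> str:
    """Join whitespace-separated words with an explicit OR operator."""
    terms = text.split()
    return " OR ".join(terms)


# Words taken from a related-articles lookup, for example.
print(build_or_query("solr faceted search"))
```

Because the operator is spelled out in the query string itself, the result does not depend on whichever default operator is configured for user queries.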
Re: search terms submitted
: Just wondering what others do with the search terms people type into your
: solr search boxes?
you mean besides return results that match :)
: Does CNet use this information for "Popular Searches?"
yep... http://shopper-search.cnet.com/top
-Hoss