Re: Simple Faceted Searching out of the box

2006-09-10 Thread Erik Hatcher


On Sep 9, 2006, at 9:09 AM, Tim Archambault wrote:
I need to understand this then. Thanks. I want to use Solr for our newspaper
website and this would be a great way to break out content. Kind of greys
the lines between what is search and what is browsing categories, which is a
great thing actually. Thanks for the help.


greys the lines indeed.  there isn't any difference between search  
and browse in my view now.  let's just call it "findability" :)  (by  
the way, "Ambient Findability" is a fantastic book)


Erik



Re: Simple Faceted Searching out of the box

2006-09-10 Thread Tim Archambault

For those using PHP to interface with Solr, can you explain to me how your PHP
code interacts with it? Does PHP build a query string manually and request
a URL like this:
http://localhost:8983/solr/select?q=vertical%3Ajobs+accounting&version=2.1&start=0&rows=10&fl=&qt=standard&stylesheet=&indent=on&explainOther=&hl.fl=
and then use some PHP command to read the resulting page and parse it?

I'm not much of a programmer, but I do know Coldfusion so I'm trying to
apply the PHP principles to CF.

Thanks for any and all help.

Tim
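
The round trip asked about above (build a query string, fetch the URL, parse
what comes back) can be sketched in a few lines. This is only an illustration
in Python rather than PHP or ColdFusion; the function names are made up for
the example, and it assumes Solr answers on the localhost URL quoted above
with its XML response format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumption: Solr is listening on the host and port from the example URL.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, start=0, rows=10, base=SOLR_SELECT):
    """Build a select URL; urlencode does the escaping (':' -> %3A, ' ' -> '+')."""
    params = {"q": query, "version": "2.1", "start": start, "rows": rows, "indent": "on"}
    return base + "?" + urlencode(params)


def parse_docs(xml_text):
    """Turn an XML response into a list of {field name: text} dicts.

    Multi-valued fields (<arr> elements) are not unpacked in this sketch.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        docs.append({field.get("name"): field.text for field in doc})
    return docs


def search(query, **kwargs):
    """Fetch the URL and parse the body (needs a running Solr instance)."""
    with urlopen(build_select_url(query, **kwargs)) as response:
        return parse_docs(response.read().decode("utf-8"))
```

Calling build_select_url("vertical:jobs accounting") yields the same
q=vertical%3Ajobs+accounting parameter as the URL above, so the escaping does
not have to be done by hand.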






Re: Simple Faceted Searching out of the box

2006-09-10 Thread Chris Hostetter

: > > What is "faceted browsing"? Maybe an example of a site interface

Whoops! ... sorry about that, i tend to get ahead of myself.

The examples Erik pointed out are very representative, but there are more
subtle ways faceted searching can come into play -- for example, if you
look at these two search results...

   http://shopper-search.cnet.com/search?q=gta
   http://shopper-search.cnet.com/search?q=ipod

...the categories in the left nav change based on what you search on,
because we treat "category" as a facet, and the individual categories as
possible "constraints" ... we don't show the user the exact count of how
many products match in each category but we use that information to
determine the order of the categories (or whether we should include a
category in the list at all)
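
As a rough illustration of that idea (not the actual CNET logic, which isn't
shown in this thread), here is a minimal Python sketch that takes per-category
match counts, drops categories below a threshold, and orders the rest by
count. The function name, the threshold, and the sample numbers are all
invented for the example.

```python
def order_constraints(counts, min_count=1, limit=None):
    """Return category names sorted by descending match count.

    counts: dict mapping a category name to the number of matching products.
    Categories with fewer than min_count matches are left out entirely.
    """
    kept = [(name, n) for name, n in counts.items() if n >= min_count]
    # Sort by count (largest first), then by name so ties come out in a fixed order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    names = [name for name, _ in kept]
    return names if limit is None else names[:limit]


# Hypothetical counts for a search on "ipod"; the numbers are made up.
ipod_counts = {"MP3 players": 120, "Headphones": 45, "Video games": 0, "Cables": 45}
print(order_constraints(ipod_counts))
```

The counts never reach the user; they only decide which categories appear in
the left nav and in what order.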

: website and this would be a great way to break out content. Kind of greys
: the lines between what is search and what is browsing categories, which is a
: great thing actually. Thanks for the help.

Even without facets, "browsing" a set of documents is just a search for
"all" documents (or depending on who you talk to: "searching" is just
browsing with a special user entered constraint on the "text" facet)




-Hoss



Re: IIS web server and Solr integration

2006-09-10 Thread Chris Hostetter

: Should it run on a separate port than IIS or integrated using ISAPI plug-in?

I can't make any specific recommendations about Windows or IIS, but i
personally wouldn't run Solr in the same webserver/appserver that your
users hit -- from a security standpoint, i would protect your solr
instance the same way you would protect a database, let the applications
running in your webserver connect to it and run queries against it, but
don't expose it to the outside world directly.
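
One way to picture this: the web application accepts the user's request,
keeps only the parameters it is willing to pass on, and builds the URL for a
Solr instance that is reachable from the application only. The Python sketch
below is an assumption-laden illustration; the internal address, the allowed
parameter list, and the row cap are all hypothetical choices.

```python
from urllib.parse import urlencode

# Assumption: Solr listens on an address that only the web application can reach.
INTERNAL_SOLR = "http://localhost:8983/solr/select"
ALLOWED = {"q", "start", "rows"}  # hypothetical whitelist
MAX_ROWS = 50                     # hypothetical cap on page size


def internal_solr_url(user_params):
    """Build the internal Solr URL from untrusted request parameters."""
    safe = {k: v for k, v in user_params.items() if k in ALLOWED}
    # Cap the page size so a single request cannot ask for the whole index.
    try:
        rows = int(safe.get("rows", 10))
    except (TypeError, ValueError):
        rows = 10
    safe["rows"] = max(1, min(rows, MAX_ROWS))
    return INTERNAL_SOLR + "?" + urlencode(safe)
```

Anything not on the whitelist (a different request handler, for instance) is
silently dropped before the query reaches Solr.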


-Hoss



Re: Got it working! And some questions

2006-09-10 Thread Chris Hostetter

: - What is the loadFactor variable of HashDocSet? Should I optimize it too?

this is the same as the loadFactor in a HashMap constructor -- but i don't
think it has much effect on performance since the HashDocSets never
"grow".

I personally have never tuned the loadFactor :)

: - What's the units on the size value of the caches? Megs, number of
: queries, kilobytes? Not described anywhere.

"entries" ... the number of items allowed in the cache.

: - Any way to programmatically change the OR/AND preference of the query
: parser? I set it to AND by default for user queries, but i'd like to set
: it to OR for some server-side queries I must do (find related articles,
: order by score).

you mean using StandardRequestHandler? ... not that i can think of off the
top of my head, but typically it makes sense to just configure what you
want for your "users" in the schema, and then make any machine generated
queries be explicit.
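
For a machine generated "related articles" query, making the operator
explicit can be as small as joining the words with OR. The Python sketch below
is one possible way to do it; the function name and the tokenizing rule are
assumptions for illustration, and the lowercasing may not suit fields that are
indexed without analysis.

```python
import re


def explicit_or_query(text, field=None):
    """Join the words of `text` with an explicit OR operator."""
    # \w+ keeps only letters, digits and underscores, which drops characters the
    # query parser treats as syntax (quotes, colons, parentheses, and so on).
    # Lowercasing also keeps the words "AND", "OR" and "NOT" from acting as operators.
    words = re.findall(r"\w+", text.lower())
    # Drop duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(words))
    if not unique:
        return ""
    clause = " OR ".join(unique)
    return f"{field}:({clause})" if field else clause
```

The default operator configured for user queries stays untouched, since the
generated query never relies on it.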

: - What's the difference between the two commit types? Blocking and
: non-blocking. Didn't see any differences at all, tried both.

do you mean the waitFlush and waitSearcher options?
if either of those is true, you shouldn't get a response back from the
server until they have finished.  if they are false, then the server
should respond instantly even if it takes several seconds (or maybe even
minutes) to complete the operation (optimizes can take a while in some
cases -- as can opening newSearchers if you have a lot of cache warming
configured)
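
To make the two options concrete, here is a small Python sketch that builds a
commit message with waitFlush and waitSearcher as attributes and posts it to
the update handler. The update URL and the helper names are assumptions for a
default local install; only the two attribute names come from the discussion
above.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# Assumption: the update handler lives at this URL on a default local install.
SOLR_UPDATE = "http://localhost:8983/solr/update"


def commit_message(wait_flush=True, wait_searcher=True):
    """Build a commit message with the two wait options as XML attributes."""
    flags = {
        "waitFlush": str(wait_flush).lower(),
        "waitSearcher": str(wait_searcher).lower(),
    }
    return ET.tostring(ET.Element("commit", flags), encoding="unicode")


def send_update(xml_message, url=SOLR_UPDATE):
    """POST an update message and return the response body (needs a running server)."""
    request = Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With both flags false, send_update(commit_message(False, False)) should come
back right away while the server finishes the work in the background; with
either flag true the call blocks until that step is done.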

: - Every time I do an <optimize> command, I get the following in my
: catalina logs - should I do anything about it?

the optimize command needs to be well formed XML, try "<optimize/>"
instead of just "<optimize>"
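
A quick way to catch this before sending anything is to run the message
through an XML parser first. The Python sketch below assumes the two messages
being compared are a self-closed optimize element and an unclosed one; the
helper name is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<optimize/>"))  # self-closed element
print(is_well_formed("<optimize>"))   # opening tag with no close
```

The first call reports a well formed document and the second does not, because
the opening tag is never closed.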

: - Any benefits of setting the allowed memory for Tomcat higher? Right
: now I'm allocating 384 megs.

the more memory you've got, the more caching you can support .. but if
your index changes so frequently compared to the rate of *unique*
queries you get that your caches never fill up, it may not matter.




-Hoss



Re: Re: IIS web server and Solr integration

2006-09-10 Thread Tim Archambault

Good news. The rookie did just that. Thanks Chris. Just having a
difficult time figuring out how to send my query parameters to the engine from
Coldfusion [intelligently]. I'm going to download the PHP app and see
if I can figure it out. Having lots of fun with this for sure.

Tim





Re: Re: IIS web server and Solr integration

2006-09-10 Thread Jeff Rodenburg

Tim -

If you can help it, I would suggest running Solr under Tomcat under Linux.
Speaking from experience in a mixed mode environment, the Linux/Tomcat/Solr
implementation just works.  We're not newbies under Linux, but we're also a
native Windows shop.  The memory management and system availability are just
outstanding in that stack.

If you must run Windows, Tomcat does integrate with IIS, but be prepared to
jump through a few hoops.  Spend time on making that combination work, and
you'll be 90% there.

Hope this helps.

-- j




Re: Re: Re: IIS web server and Solr integration

2006-09-10 Thread Tim Archambault

Thanks Jeff,

I am going to run Solr for our beta site, mobile.bangordailynews.net,
the mobile device version of our site. I'm just running it on Jetty
right now as a completely separate web app under a different port. The
Jetty port is not available on the web. I'm using Coldfusion to "get"
the results.

This will give me a chance to play with it a little. If things work
like I think they will, I will probably buy a Linux-based "VPS"
account and run Tomcat and Solr on it with our hosting provider and
send the requests over to my dedicated server. I'm not much of a
programmer nor do I know anything about Linux, but I think you are
right about this.

Tim






Re: Got it working! And some questions

2006-09-10 Thread Michael Imbeault
First of all, it seems the mailing list is having some troubles? Some of 
my posts end up in the wrong thread (even new threads I post), I don't 
receive them in my mail, and they're present only in the 'date archive' 
of http://www.mail-archive.com, and not in the 'thread' one? I don't 
receive some of the other people's posts in my mail either; the problems started 
last week I think.


Secondly, Chris, thanks for all the useful answers, everything is much 
clearer now. This info should be added to the wiki I think; should I do 
it? I'm still a little disappointed that I can't change the OR/AND 
parsing by just changing some parameter (like I can do for the number of 
results returned, for example); adding an OR between each word in the 
text I want to compare sounds suboptimal, but I'll probably do it that 
way; it's a very minor nitpick, Solr is awesome, as I said before.


@ Brian Lucas: Don't worry, solrPHP was still 99.9% functional, great 
work; part of it sending a doc at a time was my fault; I was following 
the exact sequence (add to array, submit) displayed in the docs. The 
only thing that could be added is a big "//TODO: change this code" 
before sections you have to change to make it work for a particular 
schema. I'm pretty sure the custom header curl submit works for everyone 
but me; I'm on a Windows test box with WAMP on it, so it may be 
caused by that. I'll send you the changes I made to the code tomorrow 
anyway; as I said, nothing major.


  

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Got it working! And some questions

2006-09-10 Thread Chris Hostetter

: First of all, it seems the mailing list is having some troubles? Some of
: my posts end up in the wrong thread (even new threads I post), I don't
: receive them in my mail, and they're present only in the 'date archive'
: of http://www.mail-archive.com, and not in the 'thread' one? I don't
: receive some of the other people's posts in my mail too, problems started
: last week I think.

i haven't noticed any problems with mail not making it through - some mail
clients (gmail for example) seem to suppress messages they can tell you
sent, maybe that's what's happening on your end?  As for
threads you start not showing up on the "thread" list ... according to
my mailbox, all but one message i've received from you included a
"References:" header (if not an In-Reply-To header) which causes some mail
archivers to assume it's part of an existing thread (this thread for
instance is considered part of the "Double Solr Installation on Single
Tomcat (or Double Index)" thread) ... you may want to experiment with
your mail client (off list) to see if you can figure out when/why this
is happening.
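
For experimenting off list, a message that should start a fresh thread is
simply one that carries neither of those two headers. The Python sketch below
uses the standard email package to check for them and to build a message
without them; the helper names and the example addresses in the usage note are
placeholders, not anything from this list.

```python
from email.message import EmailMessage


def looks_like_reply(message):
    """True if the message carries a header that archivers use for threading."""
    return message["In-Reply-To"] is not None or message["References"] is not None


def new_thread_message(sender, recipient, subject, body):
    """Build a message with neither threading header, so it starts a fresh thread."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message
```

A message built with new_thread_message("me@example.com", "list@example.com",
"New topic", "hello") has no References or In-Reply-To header. Composing a
post by replying to an old one and changing the subject typically keeps those
headers, which is one way a "new" thread can end up filed under an old one.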

: Secondly, Chris, thanks for all the useful answers, everything is much
: clearer now. This info should be added to the wiki I think; should I do

feel free ... that's why it's a wiki.

: it? I'm still a little disappointed that I can't change the OR/AND
: parsing by just changing some parameter (like I can do for the number of
: results returned, for example); adding an OR between each word in the
: text I want to compare sounds suboptimal, but I'll probably do it that
: way; it's a very minor nitpick, Solr is awesome, as I said before.

it would be a fairly simple option to add just like changing the
default field (patches welcome!) but as i said -- typically if you don't
want the default behavior you are programmatically generating the query
anyway, and already adding some markup, a little more doesn't make it less
optimal.





-Hoss



Re: search terms submitted

2006-09-10 Thread Chris Hostetter

: Just wondering what others do with the search terms people type into your
: solr search boxes?

you mean besides return results that match :)

: Does CNet use this information for "Popular Searches?"

yep...

http://shopper-search.cnet.com/top




-Hoss