Re: Simple Faceted Searching out of the box

2006-09-09 Thread Tim Archambault

Hoss,

What is "faceted browsing"? Maybe an example of a site interface that is
using it would be good. Dumb question, I know.


On 9/8/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:



Hey everybody, I just wanted to officially announce that as of the
solr-2006-09-08.zip nightly build, Solr supports some simple Faceted
Searching options right out of the box.

Both the StandardRequestHandler and DisMaxRequestHandler now support some
query params for specifying simple queries to use as facet constraints, or
fields in your index you wish to use as facets - generating a constraint
count for each term in the field.  All of these params can be configured
as "defaults" when registering the RequestHandler in your solrconfig.xml.
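
As a rough illustration of the parameters described above, the following Python sketch builds a request URL that asks for facet counts. The parameter names (facet, facet.field, facet.query) are the ones I understand the SimpleFacetParameters wiki page to document; the host, port, handler path, the field name "category", and the query strings are hypothetical placeholders, not taken from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_facet_url(base_url, query, facet_fields=(), facet_queries=()):
    """Build a select URL that asks for facet counts alongside the results."""
    params = [("q", query), ("facet", "true")]
    # One facet.field per field whose terms should each get a constraint count.
    params += [("facet.field", field) for field in facet_fields]
    # One facet.query per arbitrary query to be counted as a constraint.
    params += [("facet.query", fq) for fq in facet_queries]
    return base_url + "?" + urlencode(params)

# Hypothetical host, path, field and queries, for illustration only.
url = build_facet_url(
    "http://localhost:8983/solr/select",
    "ipod",
    facet_fields=["category"],
    facet_queries=["price:[0 TO 100]"],
)
print(url)
# Decode the query string again to check what was actually encoded.
print(parse_qs(urlsplit(url).query))
```

The same keys could presumably be set as handler defaults in solrconfig.xml instead of being sent on every request, as the announcement says.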

Information on what the new facet parameters are, how to use them, and
what types of results they generate can be found in the wiki...

http://wiki.apache.org/solr/SimpleFacetParameters
http://wiki.apache.org/solr/StandardRequestHandler
http://wiki.apache.org/solr/DisMaxRequestHandler

...as always: feedback, comments, suggestions and general discussion are
strongly encouraged :)


-Hoss




Re: Simple Faceted Searching out of the box

2006-09-09 Thread Erik Hatcher


On Sep 9, 2006, at 8:15 AM, Tim Archambault wrote:
> What is "faceted browsing"? Maybe an example of a site interface that is
> using it would be good. Dumb question, I know.


Faceted browsing is like this: http://shopper.cnet.com/ and
http://www.nines.org/collex


In Collex, the items in the "constrain further" box are the facets. Clicking on
them adds them to "your constraints". The idea is to divide the
documents in the index into distinct buckets (or sets) and show the
counts of how many results are in each set.
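
The bucket-and-count idea above can be sketched in a few lines of Python. This is only a toy model over an in-memory list, not how Solr computes facets internally; the documents and the field names "brand" and "type" are made up for illustration.

```python
from collections import Counter

def facet_counts(docs, field):
    """Count how many documents fall into each value (bucket) of a field."""
    return Counter(doc[field] for doc in docs if field in doc)

def constrain(docs, field, value):
    """Keep only the documents in the chosen bucket, as clicking a facet would."""
    return [doc for doc in docs if doc.get(field) == value]

# Made-up documents standing in for search results.
docs = [
    {"title": "Laptop A", "brand": "Acme", "type": "laptop"},
    {"title": "Laptop B", "brand": "Bolt", "type": "laptop"},
    {"title": "Phone C", "brand": "Acme", "type": "phone"},
]

counts = facet_counts(docs, "brand")         # counts shown next to each brand
narrowed = constrain(docs, "brand", "Acme")  # user clicks the "Acme" facet
type_counts = facet_counts(narrowed, "type") # counts within the narrowed set
```

Each click narrows the result set, and the counts for the remaining facets are recomputed over the narrowed set.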


Erik



Re: Simple Faceted Searching out of the box

2006-09-09 Thread Tim Archambault

I need to understand this then. Thanks. I want to use Solr for our newspaper
website and this would be a great way to break out content. It kind of blurs
the line between what is search and what is browsing categories, which is a
great thing actually. Thanks for the help.

Tim






Re: Simple Faceted Searching out of the box

2006-09-09 Thread James liu

Good. Thank you, Hoss.





IIS web server and Solr integration

2006-09-09 Thread Tim Archambault

Looking to use Solr, but in Dedicated Windows environment.

Can anyone provide me some input as to how this can be done?

If it can, do you recommend Jetty, Tomcat, other?

Should it run on a separate port from IIS, or be integrated using an ISAPI plug-in?

Any help is greatly appreciated. Our site doesn't have huge traffic: 4 million page
views per month (GA) and about 20,000 visitors per day.

Thanks,

Tim


Got it working! And some questions

2006-09-09 Thread Michael Imbeault
First of all, in reference to 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html , 
I got it working! The problem(s) was coming from solPHP; the 
implementation in the wiki isn't really working, to be honest, at least 
for me. I had to modify it significantly at multiple places to get it 
working. Tomcat 5.5, WAMP and Windows XP.


The main problem was that addIndex was sending 1 doc at a time to Solr; 
it would cause a problem after a few thousand docs because I was running 
out of resources. I modified solr_update.php to handle batch queries, 
and I'm now sending batches of 1000 docs at a time. Great indexing speed.
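
For readers who want to see the batching idea in isolation, here is a minimal Python sketch (not the modified solr_update.php, which is not shown in this thread). It groups documents into add messages of up to 1000 docs each, using the add/doc/field XML layout of Solr update messages; the function names and the sample documents are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

def doc_to_xml(doc):
    """Render one document (a dict of field name -> value) as a <doc> element."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<doc>%s</doc>" % fields

def batched_add_messages(docs, batch_size=1000):
    """Yield one <add> message per batch instead of one message per document."""
    batch = []
    for doc in docs:
        batch.append(doc_to_xml(doc))
        if len(batch) == batch_size:
            yield "<add>%s</add>" % "".join(batch)
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield "<add>%s</add>" % "".join(batch)

# Made-up documents: 2500 docs become 3 messages (1000, 1000, 500).
docs = ({"id": i, "title": "Doc & %d" % i} for i in range(2500))
messages = list(batched_add_messages(docs, batch_size=1000))
```

Each message would then be sent in a single HTTP POST, so 2500 documents cost 3 requests rather than 2500.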


Had a slight problem with the curl function of solr_update.php; the 
custom HTTP header wasn't recognized; I now use curl_setopt($ch, 
CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string); - 
much simpler, and now everything works!
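
As a point of comparison with the curl_setopt calls above, this is roughly what the equivalent plain POST looks like using Python's standard library. It is a sketch under assumptions: the update URL (host, port and path) is a placeholder for a local Tomcat install, and the request is only constructed, not sent, so the example runs without a server.

```python
import urllib.request

def make_update_request(update_url, xml_message):
    """Build a plain HTTP POST carrying an XML update message as the body."""
    data = xml_message.encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=data,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

# Hypothetical URL; adjust to wherever the Solr webapp is deployed.
req = make_update_request("http://localhost:8080/solr/update", "<commit/>")
# urllib.request.urlopen(req) would actually send it; omitted so this runs offline.
```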


So far I have indexed 15,000,000 documents (my whole collection, 
basically) and the performance I'm getting is INCREDIBLE (sub-100ms 
query time without warmup and no optimization at all on a 7 GB index - 
and with the cache, it gets stupid fast)! Seriously, Solr amazes me every 
time I use it. I increased the HashDocSet maxSize to 75000 and will continue to 
tune this value - it helped a great deal. I will try the DisMaxRequestHandler 
soon too; right now the standard one is great. And I will index with a 
better stopword file; the default one could really use improvements.


Some questions (couldn't find the answer in the docs):

- Is the solr php in the wiki working out of the box for anyone? Else we 
could modify the wiki...


- What is the loadFactor variable of HashDocSet? Should I optimize it too?

- What are the units of the size value of the caches? Megs, number of 
queries, kilobytes? Not described anywhere.


- Any way to programmatically change the OR/AND preference of the query 
parser? I set it to AND by default for user queries, but I'd like to set 
it to OR for some server-side queries I must do (find related articles, 
order by score).


- What's the difference between the two commit types, blocking and 
non-blocking? I tried both and didn't see any difference at all.


- Every time I do an  command, I get the following in my 
catalina logs - should I do anything about it?


9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.io.EOFException: no more 
data available - expected end tag  to close start tag 
 from line 1, parser stopped on START_TAG seen ... @1:10


- Any benefit to setting the allowed memory for Tomcat higher? Right 
now I'm allocating 384 megs.


Can't wait to try the new Faceted Queries... seriously, Solr has been really, 
really awesome so far. Thanks for all your work, and sorry for all 
the questions!


--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



RE: Got it working! And some questions

2006-09-09 Thread Brian Lucas
Hi Michael,

I apologize for the lack of testing on the SolPHP.  I had to "strip" it down
significantly to turn it into a general class that would be usable and the
version up there has not been extensively tested yet (I'm almost ready to
get back to that and "revise" it), plus much of my coding is done in Rails
at the moment.  However...

If you have a new version, could you send it over my way or just upload it
to the wiki?  I'd like to take a look at the changes and throw your revised
version up there or integrate both versions into a cleaner revision of the
version already there.

With respect to batch queries, it's already designed to do that (that's why
you see "array($array)" in the example, because it accepts an array of
updates) but I'd definitely like to see how you revised it.

Thanks,
Brian





Re: Re: SolrCore as Singleton?

2006-09-09 Thread Tim Archambault

In regard to the comment about lack of an interface, I view this as a
benefit of the tool.

Whether I'm developing with Python, PHP, Coldfusion, .NET, Java, etc.
I can create my own customizable interface. As a coldfusion programmer
with moderate programming capabilities, this tool is perfect for my
needs.



On 9/8/06, Andrew May <[EMAIL PROTECTED]> wrote:

Chris Hostetter wrote:
> : Nice.  Is the same doable under Jetty? (never had to deal with JNDI
> : under Jetty)
>
> i haven't tried it personally, but according to Yoav "reading" JNDI
> options is part of the Servlet Spec, and billa found a reference to
> using "" to do so...
>
> http://www.nabble.com/Re%3A-multiple-solr-webapps-p3991310.html
>
> ...where exactly that option goes in Jetty's configuration isn't something
> i'm clear on.
>

 values go in web.xml, so it would mean having modified versions of solr.war
for each collection.

 is an optional part of the Servlet spec for standalone servlet
implementations. The basic version of Jetty does not have any JNDI support; you
need to use JettyPlus (http://jetty.mortbay.org/jetty5/plus/index.html) for that.

-Andrew



Re: SolrCore as Singleton?

2006-09-09 Thread Eivind Hasle Amundsen

Tim Archambault wrote:

In regard to the comment about lack of an interface, I view this as a
benefit of the tool.

Whether I'm developing with Python, PHP, Coldfusion, .NET, Java, etc.
I can create my own customizable interface. As a coldfusion programmer
with moderate programming capabilities, this tool is perfect for my
needs.


That's good to hear. I never meant that a GUI should replace anything at 
all. Did it come out that way?


As the product evolves, it is only natural to add capabilities and 
features. Some of these should be available from different interfaces, 
including GUI(s). However one should be able to interface with the 
application at different levels. When Solr gets more complex over time, 
care must be taken so it does not get complicated. There might be 
numerous more points of entry into a more complex product. It is 
necessary to keep things simple as well as providing centralized 
configuration possibilities. Following this philosophy, Solr users will 
be able to choose their level of interaction.


(In a metaphor, some people prefer using GNU/Linux just by installing a 
distro; others compile and become best friends with the command line.)


Eivind


Re: Got it working! And some questions

2006-09-09 Thread James liu

> - Is the solr php in the wiki working out of the box for anyone?
Can you show your php.ini? Did you tune your PHP for performance?








search terms submitted

2006-09-09 Thread Tim Archambault

Just wondering what others do with the search terms people type into your
solr search boxes?

Does CNet use this information for "Popular Searches?"

Just curious.

FYI: Solr is up and running on my Windows 2003 IIS machine. Thanks for
everyone's feedback.


Re: Doc add limit problem, old issue

2006-09-09 Thread Michael Imbeault
Fixed my problem: the implementation of solPHP was faulty. It was 
sending one doc at a time (one curl call per doc) and the system quickly ran 
out of resources. I have now modified it to send in batches (1000 at a time) 
and everything is #1!


Michael Imbeault wrote:
> Old issue (see 
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html), 
> but I'm experiencing the same exact thing on windows xp, latest 
> tomcat. I noticed that the tomcat process gobbles memory (10 megs a 
> second maybe) and then jams at 125 megs. Can't find a fix yet. I'm 
> using a php interface and curl to post my xml, one document at a time, 
> and commit every 100 documents. Indexing 3 docs, it hangs at maybe 
> 5000. Anyone got an idea on this one? It would be helpful. I may try 
> to switch to jetty tomorrow if nothing works :(


