Re: highlight and wildcards ?

2007-06-07 Thread Mike Klaas
On 7-Jun-07, at 5:27 PM, Frédéric Glorieux wrote: Hoss, Thanks for all your information and pointers. I know that my problems are not mainstream. Have you tried commenting out getPrefixQuery in solr.search.SolrQueryParser? It should then revert to a "regular" lucene prefix query. -Mi

Re: highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux
Hoss, Thanks for all your information and pointers. I know that my problems are not mainstream.

ConstantScoreQuery @author yonik

    public void extractTerms(Set terms) {
      // OK to not add any terms when used for MultiSearcher,
      // but may not be OK for highlighting
    }

ConstantScoreRangeQ

Re: What logging facility should I use in my Solr plugin?

2007-06-07 Thread Ryan McKinley
Teruhiko Kurosaka wrote: I see Solr uses the JDK java.util.logging.Logger. I should also be using this Logger when I write a plugin, correct? You can use whichever logging you like ;) Solr uses JDK logging. If you want to contribute the plugin back to Solr, it will need to use JDK logging
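
For readers who have not used JDK logging before, here is a minimal sketch of what a plugin class using java.util.logging might look like. The class name `ExamplePlugin` and its `normalize` method are hypothetical and exist only for illustration; they are not part of Solr.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical plugin class; the name and method are illustrative only.
final class ExamplePlugin {
    // One logger per class, named after the class, is the usual JDK convention.
    private static final Logger LOG = Logger.getLogger(ExamplePlugin.class.getName());

    // Trims and lowercases the input, logging what it does along the way.
    static String normalize(String input) {
        if (input == null) {
            LOG.warning("normalize() called with null input; returning empty string");
            return "";
        }
        String result = input.trim().toLowerCase(Locale.ROOT);
        // Guard the message construction so it costs nothing when FINE is disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("normalized '" + input + "' to '" + result + "'");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("  WilliAm "));
    }
}
```

Because the logger is obtained by class name, its output can be configured through the same JDK logging configuration that the servlet container already applies to Solr itself.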

What logging facility should I use in my Solr plugin?

2007-06-07 Thread Teruhiko Kurosaka
I see Solr uses the JDK java.util.logging.Logger. I should also be using this Logger when I write a plugin, correct? I am asking only because I see commons-logging.jar in apache-solr-1.1.0-incubating/example/ext What is this for? -kuro

Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Casey Durfee
Thank you! That makes sense. --Casey >>> Mike Klaas <[EMAIL PROTECTED]> 6/7/2007 2:35 PM >>> On 7-Jun-07, at 1:41 PM, Casey Durfee wrote: > It appears that if your search terms include stopwords and you use > the DisMax request handler, you get no results whereas the same > search with the

Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Mike Klaas
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote: It appears that if your search terms include stopwords and you use the DisMax request handler, you get no results whereas the same search with the standard request handler does give you results. Is this a bug or by design? There is a subtlety

Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Casey Durfee
Sure thing. I downloaded the latest version of Solr, started up the example server, and indexed the ipod_other.xml file. The following URLs give a result: http://localhost:8983/solr/select/?q=ipod http://localhost:8983/solr/select/?q=the+ipod http://localhost:8983/solr/select/?q=ipod&qt=dism

Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Chris Hostetter
: It appears that if your search terms include stopwords and you use the : DisMax request handler, you get no results whereas the same search with : the standard request handler does give you results. Is this a bug or by : design? dismax works just fine with stop words ... can you give a specifi

Re: solr+hadoop = next solr

2007-06-07 Thread Rafael Rossini
Hi, Jeff and Mike. Would you mind telling us about the architecture of your solutions a little bit? Mike, you said that you implemented a highly-distributed search engine using Solr as indexing nodes. What does that mean? You guys implemented a master, multi-slave solution for replication? Or t

Re: TextField case sensitivity

2007-06-07 Thread Mike Klaas
On 7-Jun-07, at 1:04 PM, Xuesong Luo wrote: Ryan, you are right, that's the problem. WilliAM is treated as two words by the WordDelimiterFilterFactory. I have found this behaviour a little too aggressive for my needs, so I added an option to disable it. Patch is here: http://issues.apach

DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Casey Durfee
It appears that if your search terms include stopwords and you use the DisMax request handler, you get no results whereas the same search with the standard request handler does give you results. Is this a bug or by design? Thanks, --Casey

Re: solr+hadoop = next solr

2007-06-07 Thread Jeff Rodenburg
Mike - thanks for the comments. Some responses added below. On 6/7/07, Mike Klaas <[EMAIL PROTECTED]> wrote: I've implemented a highly-distributed search engine using Solr (200m docs and growing, 60+ servers). It is not a Solr-based solution in the vein of FederatedSearch--it is a higher-le

RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo
Ryan, you are right, that's the problem. WilliAM is treated as two words by the WordDelimiterFilterFactory. Thanks Xuesong -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Thursday, June 07, 2007 11:30 AM To: solr-user@lucene.apache.org Subject: Re: TextField case s

RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo
I have WordDelimiterFilter defined in the schema, I didn't include it in my original email because I thought it doesn't matter. It seems it matters. Looks like WilliAm is treated as two words. That's why it didn't find a match. Thanks Xuesong -Original Message- From: [EMAIL PROTECTED] [ma

Re: filter query speed

2007-06-07 Thread Yonik Seeley
On 6/7/07, Michael Thessel <[EMAIL PROTECTED]> wrote: Is there a general speed problem with range searches in Solr? It seems a bit strange to me that a query for a term takes 5 ms, while adding a filter to the same result set takes 80 s. It's completely dependent on the number of terms in the

Re: filter query speed

2007-06-07 Thread Michael Thessel
Hey Yonik, thanks a lot for your quick reply. > I suspect that the endpoint to your dateline filter changes often, > hence caching is doing no good. Is the endpoint (1181237598) derived > from the current time? Yes, it is. > If so, there are some things you can do: > 1) make it faster to gener
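
The suggestions in this message are cut off in the archive, so the following is not a reconstruction of them. It is only a sketch of one common way to make a time-derived filter cacheable, under the assumption that the filter is an upper bound on the `dateline` field: round the endpoint up to a coarser granularity so that many consecutive requests produce the identical filter string. The class name, the helper names, and the five-minute granularity are all hypothetical.

```java
// Hypothetical helper; not part of Solr.
final class FilterEndpointSketch {
    // Rounds epochSeconds up to the next multiple of granularitySeconds.
    static long roundUp(long epochSeconds, long granularitySeconds) {
        if (granularitySeconds <= 0) {
            throw new IllegalArgumentException("granularity must be positive");
        }
        return Math.floorDiv(epochSeconds + granularitySeconds - 1, granularitySeconds)
                * granularitySeconds;
    }

    // Builds a range filter in Lucene query syntax with an open lower bound.
    // Assumption: the filter restricts documents to those at or before "now".
    static String filter(String field, long epochSeconds, long granularitySeconds) {
        return field + ":[* TO " + roundUp(epochSeconds, granularitySeconds) + "]";
    }

    public static void main(String[] args) {
        // Every request within the same five-minute window yields the same filter string.
        System.out.println(filter("dateline", 1181237598L, 300L));
        System.out.println(filter("dateline", 1181237650L, 300L));
    }
}
```

The trade-off is precision: with a five-minute granularity the filter may admit documents up to five minutes newer than the actual request time.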

Re: solr+hadoop = next solr

2007-06-07 Thread Mike Klaas
On 6-Jun-07, at 7:44 PM, Jeff Rodenburg wrote: I've been exploring distributed search, as of late. I don't know about the "next solr" but I could certainly see a "distributed solr" grow out of such an expansion. I've implemented a highly-distributed search engine using Solr (200m docs a

Re: filter query speed

2007-06-07 Thread Yonik Seeley
On 6/7/07, Michael Thessel <[EMAIL PROTECTED]> wrote: I've got a problem with filtered queries. I have an index with about 8 million documents. I save a timestamp (not the time of indexing) for each document as an integer field. Querying the index is pretty fast. But when I filter on the timestam

Re: TextField case sensitivity

2007-06-07 Thread Ryan McKinley
Have you taken a look at the output from admin/analysis? http://localhost:8983/solr/admin/analysis.jsp?highlight=on This lets you see what tokens are generated for index/query. From your description, I'm suspicious that the generated tokens are actually: willi am Also, if you want the same
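
To make the "willi am" result concrete, here is a deliberately simplified sketch of splitting a word at lowercase-to-uppercase transitions and then lowercasing the parts. It is not the actual WordDelimiterFilter implementation, which handles many more cases; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified illustration only; the real filter does considerably more.
final class CaseChangeSplitSketch {
    // Splits at each lowercase-to-uppercase boundary and lowercases every part.
    static List<String> tokens(String word) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < word.length(); i++) {
            boolean lowerToUpper = Character.isLowerCase(word.charAt(i - 1))
                    && Character.isUpperCase(word.charAt(i));
            if (lowerToUpper) {
                out.add(word.substring(start, i).toLowerCase(Locale.ROOT));
                start = i;
            }
        }
        if (start < word.length()) {
            out.add(word.substring(start).toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("William"));  // no lower-to-upper boundary
        System.out.println(tokens("WILLiam"));  // only an upper-to-lower boundary
        System.out.println(tokens("WilliAm"));  // boundary between "i" and "A"
    }
}
```

Under this simplified rule, William and WILLiam each stay a single token, while WilliAm becomes two, which matches the behaviour reported in this thread.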

Re: highlight and wildcards ?

2007-06-07 Thread Chris Hostetter
: With "a?*" I get the documented lucene error : maxClauseCount is set to 1024 Which is why Solr converts PrefixQueries to ConstantScorePrefixQueries that don't have that problem --the trade off being that they can't be highlighted, and we're right back where we started. It's a question of prior

Re: TextField case sensitivity

2007-06-07 Thread Yonik Seeley
On 6/7/07, Xuesong Luo <[EMAIL PROTECTED]> wrote: I ran into a problem when searching on a TextField. When I pass q=William or q=WILLiam, Solr is able to find records whose default search field value is William; however, if I pass q=WilliAm, Solr does not return anything. Sounds like WordDelimiterFil

TextField case sensitivity

2007-06-07 Thread Xuesong Luo
I ran into a problem when searching on a TextField. When I pass q=William or q=WILLiam, Solr is able to find records whose default search field value is William; however, if I pass q=WilliAm, Solr does not return anything. I searched the archive; Yonik mentioned the LowerCaseFilterFactory doesn't work

Re: Logging errors from multiple solr instances

2007-06-07 Thread Chris Hostetter
: Is this addressed in 1.2 or is running multiple instances of indexes : such a Bad Idea that supporting this would be leading a fool further astray? I still haven't had a chance to try it myself using Tomcat, but here's what I found the last time someone asked about this... http://www.nabble.co

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Mike Klaas
On 7-Jun-07, at 1:04 AM, Manoharam Reddy wrote: Some musing:- (I have used Nutch before and one thing I observed there was that if I delete the crawl folder when Nutch is running, users can still search and obtain proper results. It seems Nutch caches all the indexes in the memory when it starts

Re: highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux
Same in my project. Chris does mention we can put a ? before the *, so instead of domin*, you can use domin?*, however that requires at least one char following your search string. Right, it works well, and one char is a detail. With "a?*" I get the documented lucene error maxClauseCount is s

filter query speed

2007-06-07 Thread Michael Thessel
Hello UG, I've got a problem with filtered queries. I have an index with about 8 million documents. I save a timestamp (not the time of indexing) for each document as an integer field. Querying the index is pretty fast. But when I filter on the timestamp the queries are extremely slow, even if the

Re: Multi-language indexing and searching

2007-06-07 Thread Walter Underwood
I'm not sure what sort of "field" you mean for defining the language. If you plan to use a single search UI regardless of language, we used to do this in Ultraseek, but it doesn't really work. Queries are too short for reliable language ID (is "die" in German, English, or Latin?), and language-spe

Re: highlight and wildcards ?

2007-06-07 Thread Walter Underwood
Implementing a stemmer for Latin might be easier for you and for your users. It will probably provide better results, too. http://informationr.net/ir/2-1/paper10.html wunder On 6/7/07 10:36 AM, "Frédéric Glorieux" <[EMAIL PROTECTED]> wrote: > Thanks a lot for your answer, sorry to have not scan

Multi-language indexing and searching

2007-06-07 Thread Daniel Alheiros
Hi, I'm just starting to use Solr and so far, it has been a very interesting learning process. I wasn't a Lucene user, so I'm learning a lot about both. My problem is: I have to index and search content in several languages. My scenario is a bit different from others that I've already read in th

RE: highlight and wildcards ?

2007-06-07 Thread Xuesong Luo
Same in my project. Chris does mention we can put a ? before the *, so instead of domin*, you can use domin?*, however that requires at least one char following your search string. -Original Message- From: Frédéric Glorieux [mailto:[EMAIL PROTECTED] Sent: Thursday, June 07, 2007 10:37
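
The workaround described here can be applied mechanically on the client side before the query is sent. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and it ignores escaped characters and multi-term queries.

```java
// Hypothetical client-side helper; not part of Solr or Lucene.
final class WildcardRewriteSketch {
    // Turns a trailing "*" into "?*" so the term is parsed as a wildcard query
    // rather than a prefix query. Terms that do not end in "*", that already
    // end in "?*", or that consist of "*" alone are returned unchanged.
    static String forceWildcard(String term) {
        boolean trailingStar = term.length() > 1 && term.endsWith("*");
        if (trailingStar && !term.endsWith("?*")) {
            return term.substring(0, term.length() - 1) + "?*";
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(forceWildcard("domin*"));
        System.out.println(forceWildcard("dominus"));
    }
}
```

As noted elsewhere in this thread, the rewritten form needs at least one character after the prefix, and very short prefixes such as "a?*" can run into the maxClauseCount limit.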

Re: highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux
Xuesong (?), Thanks a lot for your answer, and sorry for not having scanned the archives first. This is a really good and understandable reason, but sad for my project. Prefix queries will be the main activity of my users (they need to search Latin texts, so that domin* is enough to match "dominus" or

RE: highlight and wildcards ?

2007-06-07 Thread Xuesong Luo
Frédéric, I asked a similar question several days ago; it seems we don't have a perfect solution when using a prefix wildcard with highlighting. Here is what Chris said: in Solr 1.1, highlighting used the info from the raw query to do highlighting, hence in your query for consult* it would highl

Re: Is this solr 1.2 a final version?

2007-06-07 Thread Yonik Seeley
On 6/7/07, Thierry Collogne <[EMAIL PROTECTED]> wrote: I was just downloading solr and noticed that there is a 1.2 version available. Is this the final 1.2 version? Is this the version that is to be used? Yes. A release is typically available a day before an announcement because it takes a whi

Solr 1.2 released

2007-06-07 Thread Yonik Seeley
Solr 1.2 is now available for download! This is the first release since Solr graduated from the Incubator, and includes many improvements, including CSV/delimited-text data loading, time based auto-commit, faster faceting, negative filters, a spell-check handler, sounds-like word filters, regex te

Re: Wildcards / Binary searches

2007-06-07 Thread Frédéric Glorieux
Sorry to jump on a "Side note" of the thread, but the topic touches on one of my current needs. Side Note: It's my opinion that "type ahead" or "auto complete" style functionality is best addressed by customized logic (most likely using specially built fields containing all of the pref

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Walter Underwood
Solr is not designed to be a general enterprise search engine. It is a back end search server. If you are going to crawl your intranet, you will need a good crawler that is easy to manage, and the ability to parse lots of kinds of documents. Unfortunately, Solr really doesn't have those. Commerci

Re: Logging errors from multiple solr instances

2007-06-07 Thread Clay Webster
Perhaps not the most elegant, but running each index on a different container & port works pretty well. And we can tune the jvm (and of course caches) differently. --cw

host logging options (was Re: Schema validator/debugger)

2007-06-07 Thread Walter Lewis
Andrew Nagy wrote: Yonik Seeley wrote: I dropped your schema.xml directly into the Solr example (using Jetty), fired it up, and everything works fine!? Okay, I switched over to Jetty and now I get a different error: SEVERE: org.apache.solr.core.SolrException: undefined field text As someone wh

Logging errors from multiple solr instances

2007-06-07 Thread Walter Lewis
I'm running solr 1.1 under Tomcat 5.5. On the development machine there are a modest number of instances of solr indexes (six). In the logs currently the only way to distinguish them is to compare the [EMAIL PROTECTED], where the someIdentifier changes each time Tomcat is restarted (depressin

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Bertrand Delacretaz
On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote: It's called XSLT. Most modern browsers can do the transform on the client side. Otherwise there are some server-side tools (Cocoon, I think, does this) to do the transform on the server before sending it out. Solr also does server-side XSLT,

highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux
Hi all, I'm talking about Solr from Subversion, the Jetty example, default documents, as in the tutorial. I tried to highlight queries with wildcards. Documents are found as expected, but I haven't seen the terms highlighted. It seems to work with fuzzy search, so I thought it was a supported feature. A

Re: Highlight in a response writer, bad practice ?

2007-06-07 Thread Frédéric Glorieux
Simplicity. The best answer :o) The memory usage for highlight fields in normal responses is not an issue. If it becomes an issue for you, then you're roughly taking the right approach. However, rather than write your own response writer to solve your issue, you might consider just your

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy
Pardon me if I am taking too much of your time. It would be really great if you could please highlight a few advantages of caching and maintenance over nutch. Some musing:- (I have used Nutch before and one thing I observed there was that if I delete the crawl folder when Nutch is running, users

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Ian Holsman
Manoharam Reddy wrote: Thanks for your quick response. This brings me to another question. As far as I know Nutch can take care of crawling as well as indexing. Then why go through the hassle of crawling through Nutch and integrating it into Solr? I found Solr's caching and maintenance easier

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy
Thanks for your quick response. This brings me to another question. As far as I know Nutch can take care of crawling as well as indexing. Then why go through the hassle of crawling through Nutch and integrating it into Solr? Another question I have, Solr provides the search results in XML format

Re: solr+hadoop = next solr

2007-06-07 Thread Ian Holsman
Yonik Seeley wrote: On 6/6/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote: In terms of the FederatedSearch wiki entry (updated last year), has there been any progress made this year on this topic, at least something worthy of being added or updated to the wiki page? Priorities shifted, and I d

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Ian Holsman
Hi Manoharam. We use Nutch to do the crawl, and have used Sami's patch of Nutch (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html ) to have it integrate with Solr. It works quite well for our needs. If you are concerned with the speed, Solr also has a CSV upload

how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy
I have just begun using Solr. I see that we have to insert documents by posting XML to solr/update. I would like to know how Solr is used as a search engine in enterprises. How do you crawl your intranet and pass the information as XML to solr/update? Isn't this going to be slow?
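
As a rough illustration of the XML in question, the sketch below builds a single-document add message from a map of field names to values. The class and method names are hypothetical, and the field names used in `main` are assumptions; real field names must match the schema of the target index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds the message body; it does not send it.
final class UpdateXmlSketch {
    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds an <add> message containing one <doc> with one <field> per map entry.
    static String addDocument(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");                 // assumed field name
        doc.put("title", "Crawling & Solr"); // assumed field name
        System.out.println(addDocument(doc));
    }
}
```

A crawler would typically POST a body like this to solr/update and send a separate commit message once a batch of documents has been added, rather than committing after every document.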

Is this solr 1.2 a final version?

2007-06-07 Thread Thierry Collogne
Hello, I was just downloading solr and noticed that there is a 1.2 version available. Is this the final 1.2 version? Is this the version that is to be used? Thank you, Thierry