Re: spellcheck.onlyMorePopular

2009-02-16 Thread Marcus Stratmann
Shalin Shekhar Mangar wrote: The implementation is a bit more complicated. 1. Read all tokens from the specified field in the solr index. 2. Create n-grams of the terms read in #1 and index them into a separate Lucene index (spellcheck index). 3. When asked for suggestions, create n-grams of the
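
A rough sketch of the n-gram idea described in the steps above. This is not Lucene's actual SpellChecker code; the snippet is cut off in step 3, so the lookup shown here (ranking terms by the number of shared n-grams) is only an assumption, and the function names and the fixed gram size of 2 are made up for illustration.

```python
from collections import defaultdict


def ngrams(term, n):
    """Return all contiguous character n-grams of term."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def build_gram_index(terms, n=2):
    """Map each n-gram to the set of terms containing it (a toy 'spellcheck index')."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def candidates(word, index, n=2):
    """Rank indexed terms by how many n-grams they share with word (assumed lookup)."""
    scores = defaultdict(int)
    for gram in ngrams(word, n):
        for term in index.get(gram, ()):
            scores[term] += 1
    # Highest overlap first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical terms standing in for the tokens read from the Solr field.
index = build_gram_index(["grand", "gran", "turismo"])
print(candidates("gran", index))
```

A real implementation would typically re-rank the candidates by a string distance afterwards; the sketch stops at candidate retrieval.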

Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann
Shalin Shekhar Mangar wrote: If onlyMorePopular=true, then the algorithm finds tokens which have greater frequency than the searched term. Among these terms, the one which is closest (by edit distance) is returned. Okay, this is a bit weird, but I think I got it now. Let me try to explain it u
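
The selection rule quoted above (keep only tokens more frequent than the searched term, then return the one closest by edit distance) could be sketched as follows. This is an illustration of the described behaviour, not Solr's code; the function names are invented, plain Levenshtein distance is assumed as the distance measure, and the frequency table is hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_more_popular(word, freqs):
    """Return the closest term that is strictly more frequent than word, or None."""
    word_freq = freqs.get(word, 0)
    more_popular = [t for t, f in freqs.items() if f > word_freq]
    if not more_popular:
        return None
    return min(more_popular, key=lambda t: (edit_distance(word, t), t))


# Hypothetical term frequencies, for illustration only.
freqs = {"gran": 13, "grand": 32, "grant": 5, "turismo": 13}
print(suggest_more_popular("gran", freqs))
```

With this rule a correctly spelled but rare word still gets a suggestion whenever a more frequent neighbour exists, which is the effect discussed in this thread.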

Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann
Shalin Shekhar Mangar wrote: And to come back to my last question: There seems to be no case in which "onlyMorePopular=false" makes sense (provided Grant's assumption is correct). Do you see one? Here's a use-case -- you provide a mis-spelled word and you want the closest suggestion by edit dis

Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann
Shalin Shekhar Mangar wrote: The end goal is to give spelling suggestions. Even if it gave less frequently occurring spelling suggestions, what would you do with it? To give you an example: We have an index for computer games. One title is "gran turismo". The word "gran" is less frequent in the

Re: spellcheck.onlyMorePopular

2009-02-13 Thread Marcus Stratmann
Grant Ingersoll wrote: I believe the reason is b/c when onlyMP is false, if the word itself is already in the index, it short circuits out. When onlyMP is true, it checks to see if there are more frequently occurring variations. This would mean that onlyMorePopular=false isn't useful at all. If

Re: Differences in output of spell checkers

2009-02-12 Thread Marcus Stratmann
Hi Grant, thanks for your help. I have just one more question: BTW, one workaround is to simply create an index from your file and then use the IndexBasedSpellChecker. Each line equals one document. You could even assign weights that way. In the solrconfig.xml there is a line field Can I u

spellcheck.onlyMorePopular

2009-02-12 Thread Marcus Stratmann
Hello, I have another question concerning the spell checking mechanism. Setting onlyMorePopular=true and using the parameters spellcheck=true&spellcheck.q=gran&q=gran&spellcheck.onlyMorePopular=true I get the result 1 0 4 13 32 grand true which is oka
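
For reference, the request parameters from the message can be assembled into a URL as in the sketch below. The parameter names and values come from the message; the base URL (host, port and the /select path) is an assumption and depends on which request handler the spellcheck component is attached to in solrconfig.xml.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url, term, only_more_popular):
    """Build a spellcheck request URL using the parameters quoted in the message."""
    params = [
        ("spellcheck", "true"),
        ("spellcheck.q", term),
        ("q", term),
        ("spellcheck.onlyMorePopular", "true" if only_more_popular else "false"),
    ]
    return base_url + "?" + urlencode(params)


# Assumed base URL; adjust to the handler configured for spell checking.
print(spellcheck_url("http://localhost:8983/solr/select", "gran", True))
```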

Re: Differences in output of spell checkers

2009-02-05 Thread Marcus Stratmann
Hello, Are you sending in the same query to both? Frequency and word only get printed when extendedResults == true. correctlySpelled only gets printed when there is Index frequency information. For the FileBasedSpellChecker, there is no Frequency information, so it isn't returned. Yes, I

Differences in output of spell checkers

2009-02-04 Thread Marcus Stratmann
Hello, I'm trying to learn how to use the spell checkers of solr (1.3). I found out that FileBasedSpellChecker and IndexBasedSpellChecker produce different outputs. IndexBasedSpellChecker says 1 0

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Marcus Stratmann
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png https://issues.apache.org/jira/secure/attachment/12393936/logo_remake.jpg https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_

Re: Distribution without SSH?

2007-11-30 Thread Marcus Stratmann
Justin Knoll wrote: We plan to attempt to rewrite the snappuller (and possibly other distribution scripts, as required) to eliminate this dependency on SSH. I thought I'd ask the list in case anyone has experience with this same situation or any insights into the reasoning behind requiring SSH ac

Re: sort problem

2007-09-03 Thread Marcus Stratmann
If you could live with a cap of 2B on message id, switching to type "int" would decrease the memory usage to 4 bytes per doc (presumably you don't need range queries?) I haven't found exact definitions of the fieldTypes anywhere. Does "integer" span the common range from -2^31 to 2^31-1? And t
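
The arithmetic behind the "cap of 2B" and "4 bytes per doc" figures can be checked with a short sketch. It assumes a signed 32-bit integer; the document count used below is only an illustrative number, and real memory use includes overhead beyond the raw values.

```python
INT_BYTES = 4  # a 32-bit int occupies 4 bytes


def int_range(bits=32):
    """Smallest and largest value of a signed two's-complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def sort_cache_bytes(num_docs, bytes_per_doc=INT_BYTES):
    """Raw size of one sort value per document, ignoring any overhead."""
    return num_docs * bytes_per_doc


low, high = int_range()
print(low, high)                     # -2^31 and 2^31 - 1
print(sort_cache_bytes(10_000_000))  # raw bytes for 10 million documents (illustrative count)
```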

Re: Solr 1.1 HTTP server stops responding

2007-07-30 Thread Marcus Stratmann
Hi David, We're running Solr 1.1 and we're seeing intermittent cases where Solr stops responding to HTTP requests. It seems like the listener on port 8983 just doesn't respond. When we started using solr we encountered the same problem. We are currently running solr 1.0 (!) with tomcat 5.5 o

Re: who uses Solr?

2006-06-19 Thread Marcus Stratmann
Our Solr system has been up for a few days now. You can find it at http://www.booklooker.de/ I'm sorry we have a German user interface only, but if you want to try out our system you can just fill out some fields in our search form and press "suchen" on the right side. We are "book brokers" an

Re: OutOfMemory error while sorting

2006-06-19 Thread Marcus Stratmann
Hi, Chris Hostetter wrote: This is a fairly typical Lucene issue (ie: not specific to Solr)... Ah, I see. I should really pay more attention to Lucene. But when working with Solr I sometimes forget about the underlying technology. Sorting on a field requires building a FieldCache for every d

OutOfMemory error while sorting

2006-06-14 Thread Marcus Stratmann
Hello, I have a new problem with OutOfMemory errors. As I reported before, we have an index with more than 10 million documents and 23 fields. Recently I added a new field which we will only use for sorting purposes (by "adding" I mean building a new index). But it turned out that every query

Re: solrconfig environment variable

2006-05-24 Thread Marcus Stratmann
Talking about configuration and system properties: is it possible to set the log level of Solr's logger from a system property? Or is there any other way to change this level during the start of the servlet container? Thanks, Marcus

Re: One big XML file vs. many HTTP requests

2006-05-21 Thread Marcus Stratmann
Erik Hatcher wrote: I believe that Solr indexes one document at a time; each document requires a separate HTTP POST. Actually adding multiple documents per POST is possible But deleting multiple documents with just one POST is not possible, right? Is there a special reason for that or is it be
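
As an illustration of "multiple documents per POST", the sketch below builds a single Solr XML update message containing several documents. The <add>/<doc>/<field> structure is Solr's XML update format; the helper name and the field names in the sample documents are hypothetical, and sending the message to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts into one <add> message with one <doc> per dict."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical documents; field names must match the schema in use.
message = build_add_message([
    {"id": "1", "title": "first book"},
    {"id": "2", "title": "second book"},
])
print(message)
```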

Re: Separate config and index per webapp

2006-05-17 Thread Marcus Stratmann
Chris Hostetter wrote: correct .. we thought we can implement something that looked at the war file name easily ... but then we were set straight -- there is no portable way to do that, hence we came up with the current JNDI plan which isn't quite as "out of the box" as we had hoped, but it has t

Re: Separate config and index per webapp

2006-05-17 Thread Marcus Stratmann
Yonik Seeley wrote: I am hoping I can change the default location for each webapp. Thanks! It's not yet possible, but see this thread: http://www.mail-archive.com/solr-dev@lucene.apache.org/msg00298.html If I see it right, if I just rename the webapp to, say, "solrfoo" then it still uses the s

Re: Java heap space

2006-05-15 Thread Marcus Stratmann
On 5/4/06, I wrote: > From my point of view it looks like this: Revision 393957 works while > the latest revision causes problems. I don't know what part of the > distribution causes the problems but I will try to find out. I think a > good start would be to find out which was the first revision no

Re: solr setup

2006-05-05 Thread Marcus Stratmann
Yonik Seeley wrote: > If you start from a normal tomcat distribution, we will be able to > eliminate that difference. Yes, I finally got Solr working with Tomcat. But there are still two minor problems. The first appears when I try to get the statistics page. I'm getting this error message: org.a

Re: Java heap space

2006-05-04 Thread Marcus Stratmann
Chris Hostetter wrote: This is because building a full Solr distribution from scratch requires that you have JUnit. But it is not required to run Solr. Ah, I see. That was a very valuable hint for me. I was now able to compile an older revision (393957). Testing this revision I was able to dele

Re: Java heap space

2006-05-03 Thread Marcus Stratmann
Yonik Seeley wrote: Is your problem reproducible with a test case you can share? Well, you can get the configuration files. If you ask for the data, this could be a problem, since this is "real" data from our production database. The amount of data needed could be another problem. You could al

Re: Java heap space

2006-05-03 Thread Marcus Stratmann
Hello, deleting or updating documents is still not possible for me, so now I tried to build a completely new index. Unfortunately this didn't work either. Now I'm getting OOM after inserting slightly more than 20,000 documents into the new index. To me this looks as if a bug has been introduced

Re: Java heap space

2006-05-01 Thread Marcus Stratmann
Chris Hostetter wrote: > this is off the subject of the heap space issue ... but if the id changes, > then maybe it shouldn't be the uniqueId of your index? .. your code must > have some way of recognizing that article B with id 222 is a changed > version of article A with id 111 (otherwise how woul

Re: Java heap space

2006-05-01 Thread Marcus Stratmann
Yonik Seeley wrote: Yes, on a delete operation. I'm not doing any commits until the end of all delete operations. I assume this is a delete-by-id and not a delete-by-query? They work very differently. Yes, all queries are delete-by-id. If you are first deleting so you can re-add a newer ve

Re: Java heap space

2006-04-29 Thread Marcus Stratmann
Chris Hostetter wrote: > interesting .. are you getting the OutOfMemory on an actual delete > operation or when doing a commit after executing some deletes? Yes, on a delete operation. I'm not doing any commits until the end of all delete operations. After reading this I was curious if using commi

Re: Java heap space

2006-04-28 Thread Marcus Stratmann
Chris Hostetter wrote: How big is your physical index directory on disk? It's about 2.9G now. Is there a direct connection between size of index and usage of ram? Your best bet is to allocate as much ram to the server as you can. Depending on how full your caches are, and what hitratios you ar

Re: Synchronizing commit and optimize

2006-04-28 Thread Marcus Stratmann
Yonik Seeley wrote: >I think you are probably right about Jetty timing out the request. >Solr doesn't implement timeouts for requests, and I haven't seen this >behavior with Solr running on Resin. > >You could try another app server like Tomcat, or perhaps figure out if >the Jetty timeout is config

Synchronizing commit and optimize

2006-04-24 Thread Marcus Stratmann
Hello, when doing a commit or optimize the operation takes quite long (in my test case at least some minutes). When I submit the command via curl, I get the response "curl: (52) Empty reply from server" though solr is still working (as I can see from the process list and the admin interface). I tr

Re: Deleting documents

2006-04-15 Thread Marcus Stratmann
Yonik Seeley wrote: > OK, I think I fixed this bug. Haven't added a test case yet... In our test case everything works properly now. Thanks for the quick bugfix! Marcus

Re: Deleting documents

2006-04-12 Thread Marcus Stratmann
> Yes, I believe the Wiki has an example like this (a uniqueKey field > not named "id") Right, I should have looked there, too. > > But after a I found the number of documents unchanged > > in the stats. > What stat? maxDoc may be unchanged since it doesn't reflect deleted > documents that haven

Deleting documents

2006-04-11 Thread Marcus Stratmann
Hello, I have a problem deleting documents from the index. In the tutorial "<delete><id>SP2514N</id></delete>" is used as an example for deleting. I was wondering if "<id>" is some kind of keyword or the name of a field (in the example, a unique field named "id" is used). In my config I have the line <uniqueKey>bookID</uniqueKey> making bookID (ty
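
For context on the question above: in Solr's XML update format a delete-by-id message uses an <id> element whose text is matched against whatever field the schema declares as the unique key, so the element name stays "id" even when the key field is called something else. A minimal sketch that builds such a message (the helper name is made up, and posting the message to the update handler is left out):

```python
import xml.etree.ElementTree as ET


def build_delete_by_id(unique_key_value):
    """Build a delete-by-id message; <id> is the element name, not a schema field name."""
    delete = ET.Element("delete")
    id_el = ET.SubElement(delete, "id")
    id_el.text = str(unique_key_value)
    return ET.tostring(delete, encoding="unicode")


# The tutorial's example value; with a uniqueKey named bookID the value would be a bookID.
print(build_delete_by_id("SP2514N"))
```

The deletion only becomes visible to searches after a commit.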

Re: solr setup

2006-03-28 Thread Marcus Stratmann
> Solr looks in the current working directory for the solrconf > directory, so it depends where that ends up when tomcat is started. Meanwhile I found out that tomcat is located in /usr/share/tomcat5 and that there is a bin-directory in it, which I was searching for. A handful of links are pointin

Re: solr setup

2006-03-28 Thread Marcus Stratmann
Hi, I have a tomcat5 running under linux (debian). I think that my configuration may be wrong, because I don't get solr running. Yonik Seeley wrote: >the layout should look something like this: > >tomcat/webapps/solr.war >tomcat/solrconf/solrconfig.xml, schema.xml, etc >tomcat/bin/startup.sh > >t