Re: question about fl=score

2008-03-20 Thread j . L
2008/3/20 李银松 <[EMAIL PROTECTED]>: > 1. When I set fl=score, Solr returns the same as fl=*,score, not just the scores. > Is it a bug or is it on purpose? You can set fl=id,score; Solr does not support the style fl=score. > My customer wants to get the 1st-10010th added docs > So I have to sort by t
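The suggestion in the reply can be sketched as a request URL. This is only an illustration: the host, port, and /solr/select path are assumptions based on a default local example setup, and build_select_url is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr example instance (hypothetical).
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_select_url(query, fields):
    """Return a select URL that asks Solr for the given fields."""
    params = {"q": query, "fl": ",".join(fields)}
    return SOLR_SELECT + "?" + urlencode(params)

# Request the id together with the score, as suggested in the reply.
url = build_select_url("*:*", ["id", "score"])
print(url)
```

Listing the fields explicitly keeps the response small when only the id and score are needed.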

Re: Help Requested

2008-03-20 Thread Norberto Meijome
On Wed, 19 Mar 2008 21:22:42 -0700 (PDT) Raghav Kapoor <[EMAIL PROTECTED]> wrote: > I am new to Solr and I am facing a question if solr can be helpful in a > project that I'm working on. welcome :) > The project is a client/server app that requires a client app to index the > documents and sen

Re: RAM Based Index for Solr

2008-03-20 Thread Norberto Meijome
On Wed, 19 Mar 2008 17:04:34 -0700 (PDT) swarag <[EMAIL PROTECTED]> wrote: > In Lucene there is a Ram Based Index > "org.apache.lucene.store.RAMDirectory". > Is there a way to setup my index in solr to use a RAMDirectory? create a mountpoint on a ramdrive (tmpfs in linux, i think), and put your i

RAM size

2008-03-20 Thread Geert Van Huychem
Hi all, is there a way (or formula) to determine the required amount of RAM memory, e.g. by number of documents, document size? I need to index about 15.000.000 documents, each document is 1 to 3Kb big, only the id of the document will be stored. I've just implemented a testcase on one of o
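As a starting point only, the raw size of the input can be worked out from the figures in the question. This sketch assumes "Kb" means kilobytes of 1024 bytes; it says nothing about the resulting index size or the RAM Solr would need, which depend on the schema and on caching.

```python
# Raw input size only; this is NOT an estimate of index size or RAM.
DOC_COUNT = 15_000_000   # number of documents, from the question
KIB = 1024               # assumption: "Kb" means kilobytes of 1024 bytes

def raw_corpus_gib(doc_count, doc_kib):
    """Total size of the source documents in GiB."""
    return doc_count * doc_kib * KIB / 1024**3

low = raw_corpus_gib(DOC_COUNT, 1)    # every document at 1 KB
high = raw_corpus_gib(DOC_COUNT, 3)   # every document at 3 KB
print(f"raw text: {low:.1f} to {high:.1f} GiB")
```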

RE: Language support

2008-03-20 Thread nicolas . dessaigne
You may be interested in a recent discussion that took place on a similar subject: http://www.mail-archive.com/solr-user@lucene.apache.org/msg09332.html Nicolas -Original Message- From: David King [mailto:[EMAIL PROTECTED] Sent: Wednesday, 19 March 2008 20:07 To: solr-user@lucene.apache.

Re: RAM Based Index for Solr

2008-03-20 Thread Jeryl Cook
There is currently no way to use RAMDirectory instead of FSDirectory in Solr; however, there is a feature request to implement this. I personally think this would be great because we could use Terracotta to handle the clustering. Jeryl Cook On Thu, Mar 20, 2008 at 1:07 AM, Norberto Meijome <[E

Re: what's up with: java -Ddata=args -jar post.jar ""

2008-03-20 Thread Bill Au
What messages do you see in your log file? Bill On Wed, Mar 19, 2008 at 3:15 PM, <[EMAIL PROTECTED]> wrote: > > Hi, > > > > I'm a new Solr user. I figured my way around Solr just fine (I think) ... > I can index and search etc. And so far I have indexed over 300k documents. > > > > What I can't

Re: Faceting Problem

2008-03-20 Thread Erik Hatcher
When faced with these sorts of issues, it is worthwhile to step back and experiment with Solr's analysis page. http://localhost:8983/solr/admin/analysis.jsp Select your field type either by name of field or by type, put in some text, and see what happens to it at both indexing and querying

Re: what's up with: java -Ddata=args -jar post.jar ""

2008-03-20 Thread John
Thanks Bill!! Here is the content of the log file (I restarted Solr so we have a clean log): 127.0.0.1 - - [20/03/2008:13:38:09 +] "GET /solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on HTTP/1.1" 200 2538 127.0.0.1 - - [20/03/2008:13:38:31 +] "GET /solr/admin/logging.jsp

Re: what's up with: java -Ddata=args -jar post.jar ""

2008-03-20 Thread Yonik Seeley
On Wed, Mar 19, 2008 at 3:15 PM, <[EMAIL PROTECTED]> wrote: > What I'm finding is that I have to do it twice in order for the files to be > "optimized" ... i.e.: the > first post takes 3-4 minutes but leaves the file count as is at 44 ... the > second post takes 2-3 > seconds but shrinks the fil

Re: what's up with: java -Ddata=args -jar post.jar ""

2008-03-20 Thread John
Thanks Yonik!! Yep, I'm on Windows ... so if it can't delete the old files, shouldn't a restart of Solr do the trick?? i.e. the files are no longer locked by Windows ... so they can now be deleted when Solr exits ... I tried it and didn't see any change. Who is keeping those files around / loc

Re: what's up with: java -Ddata=args -jar post.jar ""

2008-03-20 Thread Yonik Seeley
On Thu, Mar 20, 2008 at 10:55 AM, John <[EMAIL PROTECTED]> wrote: > Yep, I'm on Windows ... so if it can't delete the old files, shouldn't a > restart of Solr do the trick?? i.e. the files are no longer locked by Windows > ... so they can now be deleted when Solr exits ... I tried it and didn't

Re: Help Requested

2008-03-20 Thread Raghav Kapoor
Thanks Norberto ! > Any particular reason why need the server in this > situation? pretty much > everything you are doing can be done locally. > Except, probably, cross linking > between client's documents. I have no idea in what > kind of environment this app > is supposed to run (home? office LA

Re: Language support

2008-03-20 Thread David King
You may be interested in a recent discussion that took place on a similar subject: http://www.mail-archive.com/solr-user@lucene.apache.org/msg09332.html Interesting, yes. But since it doesn't actually exist, it's not much help. I guess what I'm asking is, if my approach seems convoluted, I

Re: Language support

2008-03-20 Thread Benson Margulies
Unless you can come up with language-neutral tokenization and stemming, you need to: a) know the language of each document. b) run a different analyzer depending on the language. c) force the user to tell you the language of the query. d) run the query through the same analyzer. On Thu, Mar 20,
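Steps a) to d) can be sketched as a small dispatch table. The two analyzers here are toy stand-ins written for illustration, not Lucene or Solr analyzers, and the language codes are assumed to be supplied by the caller.

```python
# Toy analyzers, purely illustrative; real stemming is far more involved.
def analyze_en(text):
    # Lowercase, split on whitespace, strip a trailing plural "s".
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t
            for t in text.lower().split()]

def analyze_de(text):
    # Lowercase, split on whitespace, strip a trailing "en".
    return [t[:-2] if t.endswith("en") and len(t) > 4 else t
            for t in text.lower().split()]

# One analyzer per known language code (step b).
ANALYZERS = {"en": analyze_en, "de": analyze_de}

def analyze(text, lang):
    """Run the analyzer registered for lang; fail loudly if unknown."""
    try:
        return ANALYZERS[lang](text)
    except KeyError:
        raise ValueError(f"no analyzer configured for language {lang!r}")

# The same analyzer handles both the document and the query (step d).
doc_terms = analyze("Laser printers", "en")
query_terms = analyze("printer", "en")
print(set(query_terms) <= set(doc_terms))
```

Because both sides pass through the same function, a query for "printer" lines up with a document that says "printers".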

Re: Language support

2008-03-20 Thread David King
Unless you can come up with language-neutral tokenization and stemming, you need to: a) know the language of each document. b) run a different analyzer depending on the language. c) force the user to tell you the language of the query. d) run the query through the same analyzer. I can do all o

Re: Language support

2008-03-20 Thread Walter Underwood
Nice list. You may still need to mark the language of each document. There are plenty of cross-language collisions: "die" and "boot" have different meanings in German and English. Proper nouns ("Laserjet") may be the same in all languages, a different problem if you are trying to get answers in on

Re: Language support

2008-03-20 Thread Benson Margulies
You can store in one field if you manage to hide a language code with the text. XML is overkill but effective for this. At one point, we'd investigated how to allow a Lucene analyzer to see more than one field (the language code as well as the text) but I don't think we came up with anything. On

Re: Language support

2008-03-20 Thread Benson Margulies
Token/by/token seems a bit extreme. Are you concerned with macaronic documents? On Thu, Mar 20, 2008 at 12:42 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > Nice list. > > You may still need to mark the language of each document. There are > plenty of cross-language collisions: "die" and "boot

Re: Language support

2008-03-20 Thread Walter Underwood
Extreme, but guaranteed to work and it avoids bad IDF when there are inter-language collisions. In Ultraseek, we only stored the hash, so the size of the source token didn't matter. Trademarks are a bad source of collisions and anomalous IDF. If you have LaserJet support docs in 20 languages, the
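The effect on document frequency can be sketched as follows, assuming the token-by-token approach means tagging every term with its language code. The two-document collection and the tag_tokens helper are hypothetical.

```python
from collections import Counter

def tag_tokens(text, lang):
    """Prefix each whitespace token with its language code."""
    return [f"{lang}:{tok}" for tok in text.lower().split()]

# Hypothetical two-document collection with a cross-language collision.
docs = [("en", "the die is cast"), ("de", "die Katze schläft")]

plain_df = Counter()    # document frequency without language tags
tagged_df = Counter()   # document frequency with language tags
for lang, text in docs:
    plain_df.update(set(text.lower().split()))
    tagged_df.update(set(tag_tokens(text, lang)))

# Untagged, "die" is counted across both languages; tagged, each
# language keeps its own count.
print(plain_df["die"], tagged_df["en:die"], tagged_df["de:die"])
```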

Re: Language support

2008-03-20 Thread Benson Margulies
Oh, Walter! Hello! I thought that name was familiar. Greetings from Basis. All that makes sense. On Thu, Mar 20, 2008 at 1:00 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > Extreme, but guaranteed to work and it avoids bad IDF when there are > inter-language collisions. In Ultraseek, we only s

Re: FunctionQuery in a custom request handler

2008-03-20 Thread evol__
Hi again, digging this one up. This is the code I've used in my handler. ReciprocalFloatFunction tb_valuesource; tb_valuesource = new ReciprocalFloatFunction(new ReverseOrdFieldSource(TIMEBIAS_FIELD), m, a, b); FunctionQuery timebias = new FunctionQuery(tb_valuesource);
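For reference, ReciprocalFloatFunction computes a / (m * x + b), where x is the value of the wrapped source. The shape of that curve can be sketched as below; the constants are hypothetical because the message does not show the values of m, a, and b, and treating reverse ordinal 1 as the newest document assumes the time field sorts chronologically.

```python
def recip(x, m, a, b):
    """Reciprocal function a / (m * x + b)."""
    return a / (m * x + b)

# Hypothetical constants; the original message does not show m, a, b.
M, A, B = 1.0, 1000.0, 1000.0

# x stands in for the reverse ordinal of the time field
# (assumed: 1 = newest document).
for rord in (1, 1000, 100000):
    print(rord, recip(rord, M, A, B))
```

With a equal to b, the value stays close to 1 for the newest documents and falls off as the reverse ordinal grows.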

Re: Preferential boosting

2008-03-20 Thread Yonik Seeley
On Thu, Mar 20, 2008 at 3:13 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > Suppose I have a schema with an integer field called 'duration'. I want to > find all records, but if the duration is 3 I want those records to be > boosted. > > The index has 10 records, with duration between 2 and 4.

Re: Quoted searches

2008-03-20 Thread Chris Hostetter
: > When I issue a search in quotes, like "tay sachs" : > lucene is returning results as if it were written: tay OR sachs : : If you are using the standard request handler, the default operator is : OR (I assume you didn't use quotes in your query). You can switch the But Justin said "When

Preferential boosting

2008-03-20 Thread Lance Norskog
Suppose I have a schema with an integer field called 'duration'. I want to find all records, but if the duration is 3 I want those records to be boosted. The index has 10 records, with duration between 2 and 4. What is the query that will find all of the records and place the records with durati

Re: Does emty fields affect index size?

2008-03-20 Thread Yonik Seeley
Make sure you omit norms for those fields if possible. If you do that, the index should only be marginally bigger. -Yonik On Thu, Mar 20, 2008 at 3:20 PM, Evgeniy Strokin <[EMAIL PROTECTED]> wrote: > Hello, lets say I have 10 fields and usually some 5 of them are present in > each document. An

RE: Preferential boosting

2008-03-20 Thread Lance Norskog
I was doing something wrong. Bisecting the result set does not work. Using a much larger boost and ORing with the entire index does work. Thanks. *:* OR duration:3^20.0 works -duration:3 OR duration:3^20 gives empty result set Now we come to another question
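The working query can be assembled and URL-encoded along these lines. The helper name boosted_match_all is hypothetical, and the /solr/select path is an assumption based on the default example setup.

```python
from urllib.parse import urlencode

def boosted_match_all(field, value, boost):
    """Match every document, boosting those where field equals value."""
    return f"*:* OR {field}:{value}^{boost}"

# Reproduce the query reported to work in the message.
q = boosted_match_all("duration", 3, 20.0)
print(q)

# Encode it for use as the q parameter of a select request.
print("/solr/select?" + urlencode({"q": q}))
```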

Re: sorting on a multivalued field

2008-03-20 Thread Chris Hostetter
: the custom Sort object seems a bit more direct. : : I'm not very familiar with the solr source. Can you give me some idea : of how to get started -- maybe this is now a better discussion for : solr-dev . . . solr-user is fine ... writing plugins is a user-level discussion topic (although ma

Re: highlighting pt2: returning tokens out of order from PhraseQuery

2008-03-20 Thread Erik Hatcher
On Mar 19, 2008, at 10:26 AM, Brian Whitman wrote: Can we somehow force the highlighter to not return snips that do not exactly match the query? Unfortunately not with the current highlighter. But there has been a great deal of work towards fixing this here: http://issues.apache.org/jir

Re: highlighting pt2: returning tokens out of order from PhraseQuery

2008-03-20 Thread Brian Whitman
Unfortunately not with the current highlighter. But there has been a great deal of work towards fixing this here: http://issues.apache.org/jira/browse/LUCENE-794 ah, thanks Eric, didn't think to check w/ the lucene folks. I see they have somewhat working patches -- does this kind of stuff

Re: Does emty fields affect index size?

2008-03-20 Thread Evgeniy Strokin
Thanks for the info. But what about cache? Will it take more memory for 100 fields schema with the same amount of data? - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Thursday, March 20, 2008 3:48:28 PM Subject: Re: Does emty fields affec

Re: Does emty fields affect index size?

2008-03-20 Thread Yonik Seeley
On Thu, Mar 20, 2008 at 4:23 PM, Evgeniy Strokin <[EMAIL PROTECTED]> wrote: > Thanks for the info. But what about cache? Will it take more memory for 100 > fields schema with the same amount of data? For normal searches, not really. -Yonik > - Original Message > From: Yonik Seeley <

Re: Does emty fields affect index size?

2008-03-20 Thread Evgeniy Strokin
This is what I found in the docs: Omitting norms is useful for saving memory on Fields that do not affect scoring, such as those used for calculating facets. I don't really understand the statement, but does it mean I cannot use those fields as facet fields, because this is exactly why I need those 10

Re: Does emty fields affect index size?

2008-03-20 Thread Yonik Seeley
On Thu, Mar 20, 2008 at 4:46 PM, Evgeniy Strokin <[EMAIL PROTECTED]> wrote: > This is I found in docs: > > Omitting norms is useful for saving memory on Fields that do not affect > scoring, such as those used for calculating facets. > > I don't really understand the statement, but does it mean I

Re: highlighting pt2: returning tokens out of order from PhraseQuery

2008-03-20 Thread Erik Hatcher
On Mar 20, 2008, at 4:13 PM, Brian Whitman wrote: Unfortunately not with the current highlighter. But there has been a great deal of work towards fixing this here: http://issues.apache.org/jira/browse/LUCENE-794 ah, thanks Eric, didn't think to check w/ the lucene folks. I see they hav

Re: what's up with: java -Ddata=args -jar post.jar ""

2008-03-20 Thread John
Thanks Yonik. Now that I understand it ... I'm not worried about it. :) -JM -Original Message- From: Yonik Seeley <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Thu, 20 Mar 2008 11:19 am Subject: Re: what's up with: java -Ddata=args -jar post.jar "" On Thu, Mar 20, 20

cannot start solr after adding Analyzer, ClassCastException error

2008-03-20 Thread xunzhang huang
Hi, everyone. After I added an Analyzer to Solr, there is a ClassCastException error and Solr cannot be started. The details are: environment: Solr 1.2, JDK 1.6.03, Ubuntu Linux 7.10, and a Chinese analyzer. I added some lines in schema.xml: I tried some different analyzer,