date:20070622

Re: Conceptual Question

2007-06-22 Thread Frédéric Glorieux


Hi Yonik,

Sorry to jump on an old post


There is a change interface in JIRA, as long as all of the fields
originally sent are stored.


Do you remember the JIRA issue, or a keyword to find it? It sounds useful
in some cases, for example when you are working on analyzers. That
could well be my situation in the future.


--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique

Re: add CJKTokenizer to solr

2007-06-22 Thread Otis Gospodnetic

I'm jumping into the middle of the thread here.
CJK = Chinese, Japanese, Korean
German = etwas ganz anderes (something completely different)
Why are you trying to use CJKAnalyzer+Tokenizer for German?  Have you tried
the GermanAnalyzer from Lucene contrib?

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Xuesong Luo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, June 22, 2007 8:54:37 AM
Subject: RE: add CJKTokenizer to solr

Thanks, Toru and Chris,
I tried both the CJKTokenizer and the CJKAnalyzer. Both return some unexpected
highlight results when I tested with German. The field value I searched is
"Ein Mann beißt den Hund", and the search term is beißt.

When using CJKAnalyzer, beißt is treated as 2 separate terms (bei and ß), and the
highlight result is:
Ein Mann beißt den Hund 

When using CJKTokenizer, beißt is treated as 3 separate terms, and the result is:
Ein Mann beißt den Hund

When using the standard tokenizer, beißt is treated as one word, and the result is:
Ein Mann beißt den Hund


I understand why the standard tokenizer treats beißt as a word, but I don't know
how CJKTokenizer and CJKAnalyzer work. Could anyone explain a little?


Thanks
Xuesong

-Original Message-
From: Toru Matsuzawa [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 18, 2007 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: add CJKTokenizer to solr

Sorry, the attachment did not come through, so I am sending it again
inline.

> > I got the error below after adding CJKTokenizer to schema.xml. I
> > checked the constructor of CJKTokenizer; it requires a Reader parameter,
> > and I guess that's why I get this error. I searched the email archive, and it
> > seems to work for other users. Does anyone know what the problem is?
> 
> 
> CJKTokenizerFactory that I am using is appended.
> 
--
package org.apache.solr.analysis.ja;

import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKTokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;

/**
 * CJKTokenizer for Solr
 * @see org.apache.lucene.analysis.cjk.CJKTokenizer
 * @author matsu
 */
public class CJKTokenizerFactory extends BaseTokenizerFactory {

  /**
   * Wraps the supplied Reader in a new CJKTokenizer. A fresh instance is
   * returned on every call because a Tokenizer holds per-document state.
   * @see org.apache.solr.analysis.TokenizerFactory#create(Reader)
   */
  public TokenStream create(Reader input) {
    return new CJKTokenizer(input);
  }

}


-- 
Toru Matsuzawa

Re: add CJKTokenizer to solr

2007-06-22 Thread Daniel Alheiros

Hi Hoss.

I've done a few tests using reflection to instantiate a simple object, and
the results vary a lot depending on the JVM. Since the JVM optimizes code
as it is executed, they also vary with usage, but I think we have
something to consider:

I've done 1,000 samples (5 clean runs x a loop of 200), with each sample creating
100,000 objects, and the results were:

With reflection:
- Average  : 0.0005418
- Worst (first clean execution): 0.0007760

Without reflection:
- Average  : 0.0000469
- Worst (first clean execution): 0.0002140

So comparing these numbers, I can see that in the average case, using
reflection costs about 10 times as much as creating the object without reflection.
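For anyone who wants to repeat a measurement like this, here is a minimal sketch (not Daniel's actual test code, which wasn't posted). It assumes java.lang.StringBuilder as the "simple object", looks the Constructor up once outside the timed loop, and uses naive System.nanoTime() timing, so JIT warm-up and other effects can skew the numbers; treat the output as indicative only.

```java
import java.lang.reflect.Constructor;

public class ReflectionCostSketch {
    // Written after every allocation so the JIT cannot drop the loop as dead code.
    static volatile Object sink;

    /** Nanoseconds to create n objects with a plain constructor call. */
    static long timeDirect(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink = new StringBuilder();
        }
        return System.nanoTime() - start;
    }

    /** Nanoseconds to create n objects through a Constructor looked up once, before timing. */
    static long timeReflective(int n) {
        try {
            Constructor<StringBuilder> ctor = StringBuilder.class.getConstructor();
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                sink = ctor.newInstance();
            }
            return System.nanoTime() - start;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int objectsPerSample = 100_000;  // same batch size as in the test described above
        for (int sample = 1; sample <= 5; sample++) {
            long direct = timeDirect(objectsPerSample);
            long reflective = timeReflective(objectsPerSample);
            System.out.printf("sample %d: direct %d ns, reflective %d ns%n", sample, direct, reflective);
        }
    }
}
```

Keeping getConstructor() out of the loop matters: it models a factory that resolves the class once and only pays newInstance() per object.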

But my question is: do we need to create factories that frequently, or are they
just created once and reused (are they thread-safe)? The term Factory made
me think of a class that is responsible for building instances of other classes, so
usually they can be singletons... If they don't need to be created all the
time, the cost won't really matter, and it will give extra flexibility for
incorporating new Tokenizers (it would make it easier to keep Solr and Lucene
versions less coupled).

Environment:
java version "1.5.0_07"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
Heap size: 256M
Running on a PowerPC - Mac OS/X 10.4.9 with 1.5Gb RAM

Regards,
Daniel

On 21/6/07 20:39, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> 
> : Why don't we instead create an UberFactory that takes the Tokenizer
> : class as a parameter and instantiates the proper Tokenizer?
> 
> The idea has come up before ... and there's really no reason why it
> wouldn't be okay to include a reflection-based factory like this in Solr
> -- it just hasn't been done yet.
> 
> One of the reasons is that there are some performance costs associated
> with the reflection, so we wouldn't want to completely replace the existing
> "configuration via factory name" model with a "configure via class name
> and an uber factory does the reflection quietly in the background" model
> because it's the kind of approach that would really only make sense for
> simple prototypes -- in any system where you are really concerned about
> performance, reflection on every analyzer call would probably be pretty
> expensive.  (although i'd love to see benchmarks prove me wrong)
> 
> Another question in my mind is "why doesn't solr provide an optional jar
> with factories for every tokenizer/tokenfilter in the lucene contribs?"
> ... the only answer to that is that no one has bothered to crank out a
> patch that does it.
> 
> http://www.nabble.com/Re%3A-making-schema.xml-nicer-to-read-use-p5939980.html
> http://www.nabble.com/foo-tf1737025.html#a4720545
> 
> 
> -Hoss
> 

http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

Re: add CJKTokenizer to solr

2007-06-22 Thread Daniel Alheiros

Sorry, I've confused things a bit... Thread safety only has to be
considered for the Tokenizers, not the factories. So, are the
Tokenizers thread-safe?

Regards,
Daniel



Re: add CJKTokenizer to solr

2007-06-22 Thread Otis Gospodnetic

Tokenizers are not thread-safe (I made a mistake yesterday saying they are; I
don't know what I was thinking).
This is why:

public abstract class Tokenizer extends TokenStream {
  /** The text source for this Tokenizer. */
  protected Reader input;   // <-- oops :(
  ...

public abstract class CharTokenizer extends Tokenizer {
  public CharTokenizer(Reader input) {
    super(input);
  }
  ...

Otis
 
--
Lucene Consulting -- http://lucene-consulting.com/


- Original Message 
From: Daniel Alheiros <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, June 22, 2007 12:43:50 PM
Subject: Re: add CJKTokenizer to solr

Sorry, I've confused things a bit... Thread safety only has to be
considered for the Tokenizers, not the factories. So, are the
Tokenizers thread-safe?

Regards,
Daniel



RE: page rank

2007-06-22 Thread David Xiao

I have a few more questions based on your kind replies to my first question.

1. My Solr instance has already indexed hundreds of thousands of documents, so how
can I update these documents to add the new field "numberField"?

2. At runtime, my application might need to update the value of "numberField" very
frequently. How can I achieve that with Solr? Is performance a concern if many
documents need to be updated?

3. Even though I have checked the FunctionQuery wiki page below, I still don't
understand these quoted words:
"
> In terms of score which RequestHandler are you planning to use?
> If using dismax you can define a boost function:
> recip(rord(numberField),1,1000,1000)
"
With it, how do I get Solr to take this numberField (a kind of
popularity factor) into account?
Could you give me an example, please?


Best Regards,
David




-Original Message-
From: Nick Jenkin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 21, 2007 6:30 AM
To: solr-user@lucene.apache.org
Subject: Re: page rank

Also, if you are using the standard request handler, you can use the "_val_" hack:

foo:"bar" _val_:"recip(rord(numberField),1,1000,1000)"

You can find more info about this here:
http://wiki.apache.org/solr/FunctionQuery
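To see how recip(rord(numberField),1,1000,1000) behaves, here is a small sketch that mimics the arithmetic outside Solr. It assumes recip(x,m,a,b) = a/(m*x+b) and that rord() ranks the distinct field values in descending order, so the largest value gets 1 (ties share a rank). The class name and sample values are hypothetical; this illustrates the math, not Solr's implementation. With dismax the same expression would go in the bf (boost function) parameter; with the standard handler, in _val_ as shown above.

```java
import java.util.Collections;
import java.util.TreeSet;

public class RecipRordSketch {
    /** recip(x, m, a, b) = a / (m*x + b): largest for small x, decaying towards 0. */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Reverse ordinal of each value: 1 for the largest distinct value, 2 for the next, and so on. */
    static int[] rord(long[] values) {
        TreeSet<Long> distinctDesc = new TreeSet<Long>(Collections.<Long>reverseOrder());
        for (long v : values) {
            distinctDesc.add(v);
        }
        int[] ranks = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // headSet(v) holds the values ordered before v, i.e. the strictly larger ones
            ranks[i] = distinctDesc.headSet(values[i]).size() + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        long[] numberField = {10, 500, 500, 3};  // hypothetical popularity values of four documents
        int[] ranks = rord(numberField);
        for (int i = 0; i < numberField.length; i++) {
            double boost = recip(ranks[i], 1, 1000, 1000);
            System.out.printf("numberField=%d rord=%d boost=%.4f%n", numberField[i], ranks[i], boost);
        }
    }
}
```

With m=1 and a=b=1000, the top-ranked document gets a boost of 1000/1001 (about 0.999) and the document ranked 1000th gets 0.5, so the popularity effect fades gradually rather than dominating the text score.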

-Nick

On 6/21/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
> Hi David.
>
> Yes you can.
>
> Just define a field as a slong type field:
>
> 
>
> It can be used to sort (&sort=numberField desc) or to boost your score (it
> will depend on the RequestHandler you are going to use).
>
> In terms of score which RequestHandler are you planning to use?
> If using dismax you can define a boost function:
> recip(rord(numberField),1,1000,1000)
>
> I hope it helps.
>
> Regards,
> Daniel Alheiros
>
> On 20/6/07 16:47, "David Xiao" <[EMAIL PROTECTED]> wrote:
>
> > Hello folks,
> >
> >
> >
> > I am using Solr to index web content. I want to know whether it is possible to
> > tell Solr about the rank information of the content.
> >
> > For example, I give each piece of content an integer number.
> >
> >
> >
> > And I hope Solr takes this number into consideration when it generates search
> > results (larger number, higher priority).
> >
> >
> >
> > Best Regards,
> >
> > David
> >
>

RE: add CJKTokenizer to solr

2007-06-22 Thread Xuesong Luo

Thanks, Otis. I didn't know CJK is only for Asian languages. I'll try the
GermanAnalyzer.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 22, 2007 3:18 AM
To: solr-user@lucene.apache.org
Subject: Re: add CJKTokenizer to solr


Re: add CJKTokenizer to solr

2007-06-22 Thread Chris Hostetter


: Sorry, I've confused things a bit... Thread safety only has to be
: considered for the Tokenizers, not the factories. So, are the
: Tokenizers thread-safe?

nope ... they are constructed using Readers and maintain state about the
text they are processing ... the only API is a "next()" method.
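A minimal sketch of what that implies, using hypothetical stand-in classes (WhitespaceTokenizerSketch, WhitespaceTokenizerFactorySketch) rather than Lucene's real Tokenizer API: the tokenizer keeps its Reader and read position as instance state, so each document, and therefore each thread, needs its own instance, while the factory holds no per-document state and can be shared.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerThreadTokenizerDemo {
    /** Hypothetical stand-in for a Tokenizer: its Reader is mutable per-document state. */
    static class WhitespaceTokenizerSketch {
        private final Reader input;

        WhitespaceTokenizerSketch(Reader input) {
            this.input = input;
        }

        /** Returns the next whitespace-delimited token, or null when the input is exhausted. */
        String next() throws IOException {
            StringBuilder token = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (token.length() > 0) {
                        return token.toString();
                    }
                } else {
                    token.append((char) c);
                }
            }
            return token.length() > 0 ? token.toString() : null;
        }
    }

    /** Hypothetical factory: keeps no per-document state, so one instance can be shared. */
    static class WhitespaceTokenizerFactorySketch {
        WhitespaceTokenizerSketch create(Reader input) {
            return new WhitespaceTokenizerSketch(input);  // fresh tokenizer for every Reader
        }
    }

    /** Tokenizes one document with its own tokenizer instance. */
    static List<String> tokenize(WhitespaceTokenizerFactorySketch factory, String text) {
        WhitespaceTokenizerSketch tokenizer = factory.create(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) throws InterruptedException {
        final WhitespaceTokenizerFactorySketch sharedFactory = new WhitespaceTokenizerFactorySketch();
        final String[] docs = {"Ein Mann beisst den Hund", "foo bar"};
        final List<?>[] results = new List<?>[docs.length];
        Thread[] threads = new Thread[docs.length];
        for (int i = 0; i < docs.length; i++) {
            final int idx = i;
            // Every thread shares the factory but calls create() to get its own tokenizer.
            threads[i] = new Thread(() -> results[idx] = tokenize(sharedFactory, docs[idx]));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(Arrays.toString(results));  // prints [[Ein, Mann, beisst, den, Hund], [foo, bar]]
    }
}
```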

: > But my question is: do we need to create factories that frequently, or are they
: > just created once and reused (are they thread-safe)? The term Factory made
: > me think of a class that is responsible for building instances of other classes, so
: > usually they can be singletons... If they don't need to be created all the

just to be clear, the Factories are reused, but if we wanted one
"UberFactory" class to be able to return any arbitrary Tokenizer specified
in the config, the reflection would have to be for the Tokenizer classes.

the factories aren't singletons, because you might want to use them for
multiple fields with different configurations.
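A rough sketch of such an "UberFactory", under loud assumptions: the class name ReflectiveFactorySketch is hypothetical and this is not Solr's factory API. So that it runs without Lucene on the classpath, JDK Reader subclasses with a public (Reader) constructor (java.io.BufferedReader, java.io.LineNumberReader) stand in for Tokenizer classes. The constructor lookup happens once, when the factory is configured; only newInstance() is paid per created object.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.lang.reflect.Constructor;

public class ReflectiveFactorySketch<T> {
    // Resolved once, at configuration time, and reused for every create() call.
    private final Constructor<? extends T> ctor;

    public ReflectiveFactorySketch(String className, Class<T> baseType) {
        try {
            Class<? extends T> cls = Class.forName(className).asSubclass(baseType);
            this.ctor = cls.getConstructor(Reader.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot use " + className, e);
        }
    }

    /** Creates a new instance wrapping the given Reader (the per-call reflective cost). */
    public T create(Reader input) {
        try {
            return ctor.newInstance(input);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + ctor.getDeclaringClass().getName(), e);
        }
    }

    public static void main(String[] args) throws IOException {
        // In a real schema the class name would be a Tokenizer; BufferedReader is only a stand-in.
        ReflectiveFactorySketch<Reader> factory =
                new ReflectiveFactorySketch<>("java.io.BufferedReader", Reader.class);
        Reader r = factory.create(new StringReader("Ein Mann beisst den Hund"));
        System.out.println(r.getClass().getName() + " -> " + ((BufferedReader) r).readLine());
    }
}
```

This mirrors the split described above: the factory is built once per field configuration and reused, while every create() call returns a new object, since the objects it builds carry per-document state.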



-Hoss

Re: add CJKTokenizer to solr

2007-06-22 Thread Mike Klaas


On 21-Jun-07, at 10:22 PM, Chris Hostetter wrote:



> like i said though: i'm in favor of factories like this ... i just don't
> think we should do anything to hide their use and make referring to
> Tokenizer or TokenFilter class names directly use reflection magically.


What would be the best way to not hide their use?

Re: add CJKTokenizer to solr

2007-06-22 Thread Chris Hostetter


: What would be the best way to not hide their use?
:
: 

How about just...

 



-Hoss

RE: Multi-language Tokenizers / Filters recommended?

2007-06-22 Thread Teruhiko Kurosaka

Hi Daniel,
As you know, Chinese and Japanese do not use
spaces or any other delimiters between words.
To overcome this problem, CJKTokenizer uses a method
called bigram indexing, in which each run of ideographic (i.e. Chinese)
characters is turned into tokens of two neighboring
characters.  So a run of five characters ABCDE
results in four tokens: AB, BC, CD, and DE.

So a search for "BC" will hit this text,
even if AB is one word and CD is another.
That is, it adds noise to the hits.
I don't know how much of a real problem this is
for Chinese, but for Japanese, my native language,
it is a problem. Because of this, search results
for Kyoto will include false hits on documents
that include Tokyoto, i.e. Tokyo prefecture.
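The bigram idea and the Kyoto/Tokyoto false hit can be reproduced in a few lines. This is a simplified sketch of the technique only, not Lucene's CJKTokenizer source: BigramSketch is a hypothetical name, it handles a single run of ideographic characters, emits a lone character as its own token, works on Java chars (ignoring supplementary characters), and skips the tokenizer's separate handling of Latin text. The strings use Unicode escapes for Tokyo-to and Kyoto.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    /** Splits a run of ideographic characters into overlapping two-character tokens. */
    static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            tokens.add(run);  // a lone character becomes a token by itself
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("ABCDE"));  // prints [AB, BC, CD, DE]

        String tokyoto = "\u6771\u4eac\u90fd";  // Tokyo-to (Tokyo prefecture)
        String kyoto = "\u4eac\u90fd";          // Kyoto
        // Tokyo-to is indexed as the bigrams Tokyo + Kyoto, so a query for Kyoto matches it.
        System.out.println(bigrams(tokyoto).contains(kyoto));  // prints true
    }
}
```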

There is another method called morphological
analysis, which uses dictionaries and grammar
rules to break text down into real words.  You
might want to consider this method. 

-kuro

Use Windows 1252 encoding...

2007-06-22 Thread escher2k


Is it possible to use Windows-1252 encoding instead of UTF-8 for Solr? The
application runs on Linux/JDK 1.5. We are using PHP for the front end. The
problem we are having is that some characters are displayed incorrectly
because of the encoding.

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Use-Windows-1252-encoding...-tf3967676.html#a11262259
Sent from the Solr - User mailing list archive at Nabble.com.

Re: page rank

2007-06-22 Thread Nick Jenkin


Hi David

1)  you will have to re-add the documents, solr does not support an
update operation (only add/del)

2) same as above, solr does not support an update operation, you will
need to re-add the document with the updated numberField, if its any
help I have a popularity field in my index (3 million documents) which
gets updated daily with no performance issues.

3) What query handler are you using, dismax or standard?
dismax is when you send keywords and a lucene query is generated
standard is when you create your own lucene query

-Nick


Re: Use Windows 1252 encoding...

2007-06-22 Thread Nick Jenkin


Have you tried using the PHP functions utf8_decode/utf8_encode?

As far as I understand, only UTF-8 is supported (but I could be wrong on that!)
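If the data really is Windows-1252, one workaround is to transcode it to UTF-8 before it reaches Solr. A minimal sketch of that in Java, assuming the input arrives as raw Windows-1252 bytes (Cp1252ToUtf8Sketch is a hypothetical name). One caveat worth knowing: PHP's utf8_encode assumes ISO-8859-1, which differs from Windows-1252 in the 0x80-0x9F range (0x80 is the euro sign in Windows-1252), a common source of "weird" characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252ToUtf8Sketch {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Decodes Windows-1252 bytes and re-encodes the same text as UTF-8. */
    static byte[] toUtf8(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, WINDOWS_1252);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0x80, (byte) 0xE9};  // Windows-1252 for the euro sign and e-acute
        for (byte b : toUtf8(input)) {
            System.out.printf("%02X ", b & 0xFF);   // output: E2 82 AC C3 A9
        }
        System.out.println();
    }
}
```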
-Nick

Re: Use Windows 1252 encoding...

2007-06-22 Thread Chris Hostetter


: Is it possible to use Windows 1252 encoding instead of UTF-8 for Solr ? The

not at the moment...

https://issues.apache.org/jira/browse/SOLR-96



-Hoss

Re: Conceptual Question

2007-06-22 Thread Chris Hostetter


: > There is a change interface in JIRA, as long as all of the fields
: > originally sent are stored.
:
: Do you remember the JIRA issue, or a keyword to find it? It sounds useful
: in some cases, for example when you are working on analyzers. That
: could well be my situation in the future.

https://issues.apache.org/jira/browse/SOLR-139


-Hoss
