multi-language searching with Solr
Hello folks,

Let me start by saying that I am new to Lucene and Solr.

I am in the process of designing a search back-end for a system that receives 20k documents a day and needs to keep them available for 30 days. The documents should be searchable on a free-text field and on about 8 other fields.

One of my requirements is to index and search documents in multiple languages. I would like the ability to stem and to provide the advanced search features that are based on it. This only affects the free-text field, because the rest of the fields are in English. I can find out the language of each document before indexing, and I might be able to provide the language to search on. I also need the ability to search across all indexed languages (there will be 20 in total).

Given these requirements, do you think this is doable with Solr? A major limiting factor is that I need to stick to the 1.2 GA version and cannot use the multi-core features in the 1.3 trunk.

I considered writing my own analyzer that would call the appropriate Lucene analyzer for the given language, but I did not see any way for it to access the field that specifies the language of the document.

Thanks,

Eli

p.s. I am looking for an experienced Lucene/Solr consultant to help with the design of this system.
Re[2]: definition of field types?
Thanks Otis. The schema.xml actually explains it very well!

> A good place to look is the Wiki. Look for "Analyzer" substring on the main
> Solr wiki page.
>
>> I must be overlooking ... where can I find definitions of
>> the built-in types such as textTight, text_ws, etc?
custom queries via plugins?
I am currently using Lucene directly to build custom queries. Can I write a plugin to build these custom BooleanQueries, RangeQueries, etc.?

As a simple example, we have documents that represent coupons, events and activities. Some searches may only be for coupons and events. Currently, I programmatically build up a boolean query for this. I wanted to know if I could still do this with Solr.

I just wanted to get a little bit of validation before investing a few hours into actually trying to use Solr. I have been reading the tutorials and docs, but while I suspect that Solr exposes the Lucene query via plugins, I have not seen this spelled out (but I'm a bad speller ;)

Thank you for your time.

Phillip
RE: dismax query handler ignoring qf entirely!
I think the problem is that 'cat' is of type 'string' and we're querying as though it were type 'text'. We get expected results only when we quote the query string; otherwise the query string goes through stemming and, after that, no longer quite matches the literal string in the 'cat' field. Is that a possible bug in the filter logic (shouldn't both the original query string and the stemmed version get through), or is it a feature, and we must supply quotes on a string to bypass stemming?

Thanks, Ezra

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 01, 2008 5:27 PM
To: solr-user@lucene.apache.org
Subject: Re: dismax query handler ignoring qf entirely!

Unless I'm not understanding what you are saying, then no, this is not expected behaviour - DisMax doesn't rely on one copying the actual field data to a "text" field.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Ezra Epstein <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 2, 2008 1:06:52 AM
> Subject: dismax query handler ignoring qf entirely!
>
> It appears as though the DisMax query handler is ignoring our qf
> settings and only searching the "text" field as defined in the
> element of the schema.xml file. Thus if a field
> exists and is indexed, it is not being searched unless its contents were
> copied to the "text" field. Is that correct/expected behavior?
>
> I can provide config details and sample query results if that's helpful.
>
> Thanks,
>
> Ezra Epstein
RE: multi-language searching with Solr
I think you would have to declare a separate field for each language (freetext_en, freetext_fr, etc.), each with its own appropriate stemming. Your ingestion process would have to assign the free text content for each document to the appropriate field; so, for each document, only one of the freetext fields would be populated. At search time, you would either search against the appropriate field if you know the search language, or search across them with "freetext_fr:query OR freetext_en:query OR ...". That way your query will be interpreted by each language field using that language's stemming rules.

Other options for combining indexes, such as copyField or dynamic fields (see http://wiki.apache.org/solr/SchemaXml), would lead to a single field type and therefore a single type of stemming. You could always use copyField to create an unstemmed common index, if you don't care about stemming when you search across languages (since you're likely to get odd results when a query in one language is stemmed according to the rules of another language).

Peter
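The per-language field layout Peter describes might look like this in schema.xml. This is a sketch only: the field names and the type names (text_en, text_fr, text_ws) are illustrative assumptions, and the stemming fieldtypes would need to be defined elsewhere in the schema.

```xml
<!-- Sketch: one stemmed free-text field per language; each document
     populates exactly one of them. -->
<field name="freetext_en" type="text_en" indexed="true" stored="true"/>
<field name="freetext_fr" type="text_fr" indexed="true" stored="true"/>

<!-- Optional: an unstemmed catch-all for cross-language search, as Peter
     suggests for the copyField variant. -->
<field name="freetext_all" type="text_ws" indexed="true" stored="false"/>
<copyField source="freetext_en" dest="freetext_all"/>
<copyField source="freetext_fr" dest="freetext_all"/>
```

Searching a known language then targets one field (freetext_fr:query), while cross-language search uses either the OR expansion or the unstemmed freetext_all field.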
Re: multi-language searching with Solr
Wouldn't this impact both indexing and search performance, and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?

Thanks,

Eli
Re: multi-language searching with Solr
You might want to bounce over to the Lucene users list and search for "language". This topic has arisen many times and there's some good discussion. And have you searched the Solr users list for "language"? I know it's turned up here as well.

Best
Erick
RE: multi-language searching with Solr
It won't make much difference to the index size, since you'll only be populating one of the language fields for each document, and empty fields cost nothing. The performance may suffer a bit, but Lucene may surprise you with how good it is with that kind of boolean query.

I agree that as the number of fields and languages increases, this is going to become a lot to manage. But you're up against some basic problems when you try to model this in Solr: for each token, you care about not just its value (which is all Lucene cares about) but also its language and its stem; the stem for a given token depends on the language (different stemming rules); and at query time you may not know the language. I don't think you're going to get a solution without some redundancy, but solving problems by adding redundant fields is a common method in Solr.

Peter
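The OR expansion across per-language fields can be generated client-side. A minimal sketch in plain Java; the freetext_&lt;lang&gt; naming follows Peter's example and is an assumption, and real code would also need to escape Lucene query syntax in the user input:

```java
import java.util.List;
import java.util.StringJoiner;

public class CrossLangQuery {
    // Expand a user query into an OR query across per-language fields,
    // e.g. "freetext_en:(q) OR freetext_fr:(q) OR ...".
    public static String build(String query, List<String> langs) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String lang : langs) {
            // Parentheses keep a multi-word query inside a single clause.
            joiner.add("freetext_" + lang + ":(" + query + ")");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("economie", List.of("en", "fr")));
        // freetext_en:(economie) OR freetext_fr:(economie)
    }
}
```

With 20 languages this produces a 20-clause boolean query, which, as Peter notes, Lucene handles better than one might expect.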
Re: multi-language searching with Solr
I searched the Solr list but not as much the Lucene list. I will look again to see if there is something there that might work with Solr. I would rather leverage Solr, but if I have no choice I will do this using Lucene only.

Thanks,

Eli
Re: Help optimizing
On 3-May-08, at 10:06 AM, Daniel Andersson wrote:

> Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kb. When performing a search, the results vary between 1.5 seconds up to 60 seconds. I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just.. well, scary.

That is too long, and shouldn't be happening.

> How do I optimize Solr to better use all the RAM? I'm using java6, 64bit version, and start Solr using: java -Xmx7500M -Xms4096M -jar start.jar But according to top it only seems to be using 7.7% of the memory (around 600 MB).

Don't try to give Solr _all_ the memory on the system. Solr depends on the index existing in the OS's disk cache (this is "cached" in top). You should have at least 2 GB of memory free for a 3.5 GB index, depending on how much of the index is stored (best is of course to have 3.5 GB available so it can be cached completely). Solr requires a wide distribution of queries to "warm up" (get the index into the OS disk cache). This automatically prioritizes the "hot spots" in the index. If you want to load the whole thing, 'cd datadir; cat * > /dev/null' works, but I don't recommend relying on that.

> Most queries are for make_id + model_id or city + state and almost all of the queries are ordered by datetime_found (newest -> oldest).

How many documents match, typically? How many documents are returned, typically? How often do you commit() [I suspect frequently, based on the problems you are having]?

-Mike
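One way to make warm-up predictable rather than waiting for organic traffic is Solr's QuerySenderListener. A sketch for solrconfig.xml; the field names (make_id, datetime_found) are taken from Daniel's description, and the exact queries would need tuning:

```xml
<!-- Sketch: fire representative queries after each commit so the new
     searcher (and the OS disk cache) is warm before serving traffic. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">make_id:[* TO *]</str>
      <str name="sort">datetime_found desc</str>
    </lst>
  </arr>
</listener>
```

Committing frequently throws these warmed searchers away, which is one reason Mike asks about commit frequency.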
Re: Tokenize integers?
Just use fieldType="string", and send them to Solr in a multivalued fashion:

<field name="blah">1</field>
<field name="blah">133</field>
<field name="blah">999</field>

Search:

blah:133
+blah:999 +blah:1   [both must match]

Just treat the numbers as untokenized text.

-Mike

On 4-May-08, at 2:30 AM, [EMAIL PROTECTED] wrote:

Ok, thanks. However I am still a bit confused. Since I know that these are only integers, can't I somehow make Solr use solr.IntField or solr.SortableIntField, but still tokenize like this? I tried the configuration below but changed TextField to IntField and indexed the document again, but then the search didn't work... This is what I use now (after your suggestion):

This works great when searching. But when I get the document back, I see that the stored value is still the comma-separated values, i.e.: ... 3,5 ... I would have liked it like this instead: ... 3 5 ... Is this possible with Solr by some configuration? Am I really the only one that would like this behavior?

/Jimi

Quoting Otis Gospodnetic <[EMAIL PROTECTED]>:

I think you are after http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, May 3, 2008 11:57:37 PM
Subject: Tokenize integers?

Hi, What is the recommended way to configure a fieldtype for a field that looks like this in the source system? categoryIds=1,325,488 The order of these ids is not important. I want to be able to fetch all the ids, separately, i.e. I want them stored as multivalued, I guess... And I also want to be able to search on the individual ids, or combinations (for example search for all articles with category id 1 and 488). I know I can index this as multiple categoryId fields (and have them as int or sint type), but that means I need to write preprocessing on the "client" side. I would prefer a server-side fix, so that the client can send the xml like this: ... 1,325,488 ... And then the server (i.e. Solr) will transform this into a multivalued int/sint field, using tokenizing or whatever it is called (or is tokenizing not performed on the stored value?). What are your suggestions? Maybe this is already documented in the wiki or someplace else? I have searched for this, but not found anything that helps.

Regards
/Jimi
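For what it's worth, the client-side preprocessing Jimi wants to avoid is small. A minimal sketch in plain Java; the field name categoryId and the XML shape are assumptions based on the thread, not a fixed Solr API:

```java
public class CategoryIdSplitter {
    // Turn "1,325,488" into one <field> element per id, ready for a Solr
    // <add><doc> update message with a multivalued int/sint field.
    public static String toFields(String csv) {
        StringBuilder sb = new StringBuilder();
        for (String id : csv.split(",")) {
            sb.append("<field name=\"categoryId\">")
              .append(id.trim())
              .append("</field>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFields("1,325,488"));
    }
}
```

Each id then lands as a separate stored value, so the response returns 1, 325 and 488 as distinct elements instead of one comma-separated string.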
Re: Re[2]: startsWith?
On 3-May-08, at 10:44 PM, JLIST wrote:

> Hello Otis, Do you mean that if I index the URL as a "text" field, I'll be able to do * for a given prefix because the text will be tokenized at the "/" and should suffice for my need?

I'm not sure what your needs are, but I use the following to index urls: (in which is stored the _reversed domain_. That is, "com.example.www") I also store the url as a textTight (see example schema).

If you want to do prefix matching on the url, I recommend storing it untokenized in another field (or with minimal tokenization, like lowercasing). If, like me, you want to restrict documents to a certain domain and its subdomains, you have to be careful with your query:

reverse_domain:com.example reverse_domain:com.example.*

If you just do reverse_domain:com.example*, you will also match www.foo-example.com, which you don't want.

-Mike
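The reversed-domain value is easy to compute at index time. A minimal sketch, assuming a plain hostname with no port, scheme, or trailing dot:

```java
public class ReverseDomain {
    // Reverse the dot-separated labels of a hostname,
    // e.g. "www.example.com" -> "com.example.www", so that prefix queries
    // like reverse_domain:com.example.* match a domain and its subdomains.
    public static String reverse(String host) {
        String[] labels = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) sb.append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("www.example.com")); // com.example.www
    }
}
```

Because the labels are reversed, "same domain or subdomain" becomes a simple left-prefix relationship, which is exactly what Mike's two-clause query exploits.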
Re: custom queries via plugins?
I'm not sure if you are after a custom query parsing component, but if that is what you are after, start by looking at these:

$ ff \*QParser\*java
./src/test/org/apache/solr/search/FooQParserPlugin.java
./src/java/org/apache/solr/search/LuceneQParserPlugin.java    <== here
./src/java/org/apache/solr/search/DisMaxQParserPlugin.java
./src/java/org/apache/solr/search/PrefixQParserPlugin.java
./src/java/org/apache/solr/search/QParser.java    <=== here
./src/java/org/apache/solr/search/RawQParserPlugin.java
./src/java/org/apache/solr/search/FieldQParserPlugin.java
./src/java/org/apache/solr/search/BoostQParserPlugin.java
./src/java/org/apache/solr/search/NestedQParserPlugin.java
./src/java/org/apache/solr/search/FunctionQParser.java
./src/java/org/apache/solr/search/QParserPlugin.java    <== here
./src/java/org/apache/solr/search/FunctionQParserPlugin.java
./src/java/org/apache/solr/search/OldLuceneQParserPlugin.java

+ SolrQueryParser.java

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
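If the QParserPlugin route fits (note: these classes live in the 1.3 trunk, not in 1.2), wiring a custom parser into solrconfig.xml is a one-line registration. A sketch; the parser name "couponEvents" and the class name are hypothetical:

```xml
<!-- Sketch: register a custom query parser (Solr 1.3 trunk).
     The plugin class would subclass QParserPlugin and build the
     BooleanQuery/RangeQuery combinations programmatically. -->
<queryParser name="couponEvents"
             class="com.example.CouponEventsQParserPlugin"/>
```

A request would then select it with defType=couponEvents (or an equivalent per-query override), keeping the hand-built boolean logic inside Solr rather than in the client.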
Multiple SpellCheckRequestHandlers
Hi all,

Is it possible in Solr to have multiple SpellCheckRequestHandlers? In my application I have got two different spell-check indexes. I want the spell checker to check for a spelling suggestion in the first index, and only if it fails to get any suggestion from the first index should it try to get a suggestion from the second index.

Is it possible to have a separate SpellCheckRequestHandler for each index?

Solr-User
--
View this message in context: http://www.nabble.com/Multiple-SpellCheckRequestHandlers-tp17071568p17071568.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenize integers?
: Just use fieldType="string", and send them to solr in a multivalued fashion:
:
: <field name="blah">1</field>
: <field name="blah">133</field>
: <field name="blah">999</field>

But as the OP said: that requires preprocessing -- it would be nice if Solr would make this easier for you.

I've had some ideas in the back of my mind for a while now that:

1) schema.xml should support something analyzer-chain-esque for processing the "stored" value of a field.

2) it should be easy to make #1 either apply just to the stored value independent of the indexed value, or be applied prior to the "index" analyzer as well.

3) we should change IndexSchema to respect <analyzer> for all the fieldtypes, not just TextField.

...then people could configure all sorts of interesting behavior like "i want fieldtypeA to be a SortableInt, but if someone indexes a comma separated list of numbers, do the right thing".

I *think* #2 could probably be achieved really easily using the TeeFilter and the SinkTokenizer (but i haven't actually played with them to be sure)

(too many ideas, too little time)

-Hoss
Re: Multiple SpellCheckRequestHandlers
Yes, just define two instances (with two distinct names) in solrconfig.xml and point each of them to a different index.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
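A sketch of what Otis describes, for the Solr 1.2 SpellCheckerRequestHandler; the handler names, source field, and index directories are illustrative assumptions:

```xml
<!-- Sketch: two spell-check handlers, each backed by its own index. -->
<requestHandler name="/spellcheck_primary"
                class="solr.SpellCheckerRequestHandler">
  <str name="spellcheckerIndexDir">spell_primary</str>
  <str name="termSourceField">word</str>
</requestHandler>

<requestHandler name="/spellcheck_fallback"
                class="solr.SpellCheckerRequestHandler">
  <str name="spellcheckerIndexDir">spell_fallback</str>
  <str name="termSourceField">word</str>
</requestHandler>
```

The try-first-then-fall-back logic would live in the client: query /spellcheck_primary, and only if it returns no suggestion, query /spellcheck_fallback.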
Re: Tokenize integers?
On 5-May-08, at 9:19 PM, Chris Hostetter wrote:

: Just use fieldType="string", and send them to solr in a multivalued fashion:
:
: <field name="blah">1</field>
: <field name="blah">133</field>
: <field name="blah">999</field>

But as the OP said: that requires preprocessing -- it would be nice if Solr would make this easier for you.

Oh I see, I misinterpreted "multiple categoryId fields". I agree that it would be nice to have a Solr stored-field processor. While it is usually possible to do arbitrary transformations before getting to Solr, it is nice to be able to encode as much information as possible in the Solr config.

-Mike
Your valuable suggestion on autocomplete
Hi Group,

I have already got some valuable suggestions from the group. Based on that, I have come up with the following process to implement an autocomplete-like feature in my system:

1- Index the whole documents.

2- Extract all terms using IndexReader's terms() method. I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand. But I would like to get absolute terms, i.e. vlanand. The field definition in Solr is ... Would appreciate your input on how to get absolute terms?

3- For each term, extract the documents containing that term using the termDocs() method.

4- Create one more index with the fields term, frequency and docNo. This index will be used for the autocomplete feature.

5- For any letter typed by the user in the search field, use an Ajax script (like Scriptaculous or jQuery) to extract all matching terms using a prefix query.

6- Based on the search term selected by the user, keep track of the document nos in which this term appears.

7- For the next search-term selection, use the document nos to select all terms, excluding the currently selected term.

This somehow works. As someone new to Solr and also to Lucene, I would like to know if it can be improved?

- RB
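Step 5's prefix lookup over a sorted term list can be illustrated with plain Java. This is a toy stand-in for walking IndexReader.terms() from a starting term: since terms are enumerated in sorted order, you seek to the prefix and stop at the first term that no longer matches.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixLookup {
    // Collect all terms sharing a prefix from a sorted term set, mirroring
    // the seek-then-scan pattern of Lucene's sorted term enumeration.
    public static SortedSet<String> withPrefix(TreeSet<String> terms, String prefix) {
        SortedSet<String> out = new TreeSet<>();
        // tailSet jumps to the first term >= prefix ...
        for (String t : terms.tailSet(prefix)) {
            // ... and sorted order guarantees matches are contiguous.
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
            new TreeSet<>(java.util.List.of("virtual", "vlan", "vlanand", "voice"));
        System.out.println(withPrefix(terms, "vlan")); // [vlan, vlanand]
    }
}
```

The early break is what keeps this cheap: only the matching slice of the term dictionary is scanned, not the whole index.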