multi-language searching with Solr
Hello folks,

Let me start by saying that I am new to Lucene and Solr.

I am in the process of designing a search back-end for a system that receives 20k documents a day and needs to keep them available for 30 days. The documents should be searchable on a free-text field and on about 8 other fields.

One of my requirements is to index and search documents in multiple languages. I would like the ability to stem and to provide the advanced search features that are based on it. This only affects the free-text field, because the rest of the fields are in English. I can find out the language of each document before indexing, and I might be able to provide the language to search on. I also need the ability to search across all indexed languages (there will be 20 in total).

Given these requirements, do you think this is doable with Solr? A major limiting factor is that I need to stick to the 1.2 GA version and cannot use the multi-core features in the 1.3 trunk.

I considered writing my own analyzer that would call the appropriate Lucene analyzer for the given language, but I did not see any way for it to access the field that specifies the language of the document.

Thanks,

Eli

p.s. I am looking for an experienced Lucene/Solr consultant to help with the design of this system.
Re[2]: definition of field types?
Thanks Otis. The schema.xml actually explains it very well!

> A good place to look is the Wiki. Look for "Analyzer" substring on the main
> Solr wiki page.
>
>> I must be overlooking ... where can I find definitions of
>> the built-in types such as textTight, text_ws, etc?
custom queries via plugins?
I am currently using Lucene directly to build custom queries. Can I write a plugin to build these custom BooleanQueries, RangeQueries, etc.?

As a simple example, we have documents that represent coupons, events and activities. Some searches may only be for coupons and events. Currently, I programmatically build up a boolean query for this. I wanted to know if I could still do this with Solr.

I just wanted to get a little bit of validation before investing a few hours into actually trying to use Solr. I have been reading the tutorials and docs, but while I suspect that Solr exposes the Lucene query via plugins, I have not seen this spelled out (but I'm a bad speller ;)

Thank you for your time.

Phillip
RE: dismax query handler ignoring qf entirely!
I think the problem is that 'cat' is of type 'string' and we're querying as though it were type 'text'. We get expected results only when we quote the query string; otherwise the query string goes through stemming and, after that, no longer quite matches the literal string in the 'cat' field. Is that a possible bug in the filter logic (shouldn't both the original query string and the stemmed version get through), or is it a feature, and we must supply quotes on a string to bypass stemming?

Thanks, Ezra

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 01, 2008 5:27 PM
To: solr-user@lucene.apache.org
Subject: Re: dismax query handler ignoring qf entirely!

Unless I'm not understanding what you are saying, then no, this is not expected behaviour - DisMax doesn't rely on one copying the actual field data to a "text" field.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Ezra Epstein <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 2, 2008 1:06:52 AM
> Subject: dismax query handler ignoring qf entirely!
>
> It appears as though the DisMax query handler is ignoring our qf
> settings and only searching the "text" field as defined in the
> element of the schema.xml file. Thus if a field
> exists and is indexed, it is not being searched unless its contents were
> copied to the "text" field. Is that correct/expected behavior?
>
> I can provide config details and sample query results if that's helpful.
>
> Thanks,
>
> Ezra Epstein
RE: multi-language searching with Solr
I think you would have to declare a separate field for each language (freetext_en, freetext_fr, etc.), each with its own appropriate stemming. Your ingestion process would have to assign the free text content for each document to the appropriate field; so, for each document, only one of the freetext fields would be populated. At search time, you would either search against the appropriate field if you know the search language, or search across them with "freetext_fr:query OR freetext_en:query OR ...". That way your query will be interpreted by each language field using that language's stemming rules.

Other options for combining indexes, such as copyField or dynamic fields (see http://wiki.apache.org/solr/SchemaXml), would lead to a single field type and therefore a single type of stemming. You could always use copyField to create an unstemmed common index, if you don't care about stemming when you search across languages (since you're likely to get odd results when a query in one language is stemmed according to the rules of another language).

Peter
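The per-language field layout Peter describes might look like this in schema.xml. This is a sketch only: the field names and the type names (text_en, text_fr, text_ws) are illustrative assumptions, and the stemming fieldtypes would need to be defined elsewhere in the schema.

```xml
<!-- Sketch: one stemmed free-text field per language; each document
     populates exactly one of them. -->
<field name="freetext_en" type="text_en" indexed="true" stored="true"/>
<field name="freetext_fr" type="text_fr" indexed="true" stored="true"/>

<!-- Optional: an unstemmed catch-all for cross-language search, as Peter
     suggests for the copyField variant. -->
<field name="freetext_all" type="text_ws" indexed="true" stored="false"/>
<copyField source="freetext_en" dest="freetext_all"/>
<copyField source="freetext_fr" dest="freetext_all"/>
```

Searching a known language then targets one field (freetext_fr:query), while cross-language search uses either the OR expansion or the unstemmed freetext_all field.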
Re: multi-language searching with Solr
Wouldn't this impact both indexing and search performance, and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?

Thanks,

Eli
Re: multi-language searching with Solr
You might want to bounce over to the Lucene users list and search for "language". This topic has arisen many times and there's some good discussion. And have you searched the Solr users list for "language"? I know it's turned up here as well.

Best
Erick
RE: multi-language searching with Solr
It won't make much difference to the index size, since you'll only be populating one of the language fields for each document, and empty fields cost nothing. The performance may suffer a bit, but Lucene may surprise you with how good it is with that kind of boolean query.

I agree that as the number of fields and languages increases, this is going to become a lot to manage. But you're up against some basic problems when you try to model this in Solr: for each token, you care about not just its value (which is all Lucene cares about) but also its language and its stem; the stem for a given token depends on the language (different stemming rules); and at query time you may not know the language. I don't think you're going to get a solution without some redundancy, but solving problems by adding redundant fields is a common method in Solr.

Peter
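The OR expansion across per-language fields can be generated client-side. A minimal sketch in plain Java; the freetext_&lt;lang&gt; naming follows Peter's example and is an assumption, and real code would also need to escape Lucene query syntax in the user input:

```java
import java.util.List;
import java.util.StringJoiner;

public class CrossLangQuery {
    // Expand a user query into an OR query across per-language fields,
    // e.g. "freetext_en:(q) OR freetext_fr:(q) OR ...".
    public static String build(String query, List<String> langs) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String lang : langs) {
            // Parentheses keep a multi-word query inside a single clause.
            joiner.add("freetext_" + lang + ":(" + query + ")");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("economie", List.of("en", "fr")));
        // freetext_en:(economie) OR freetext_fr:(economie)
    }
}
```

With 20 languages this produces a 20-clause boolean query, which, as Peter notes, Lucene handles better than one might expect.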
Re: multi-language searching with Solr
I searched the Solr list but not as much the Lucene list. I will look again to see if there is something there that might work with Solr. I would rather leverage Solr, but if I have no choice I will do this using Lucene only.

Thanks,

Eli
Re: Help optimizing
On 3-May-08, at 10:06 AM, Daniel Andersson wrote:

> Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kb. When performing a search, the results vary between 1.5 seconds up to 60 seconds. I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just.. well, scary.

That is too long, and shouldn't be happening.

> How do I optimize Solr to better use all the RAM? I'm using java6, 64bit version, and start Solr using: java -Xmx7500M -Xms4096M -jar start.jar But according to top it only seems to be using 7.7% of the memory (around 600 MB).

Don't try to give Solr _all_ the memory on the system. Solr depends on the index existing in the OS's disk cache (this is "cached" in top). You should have at least 2 GB of memory free for a 3.5 GB index, depending on how much of the index is stored (best is of course to have 3.5 GB available so it can be cached completely). Solr requires a wide distribution of queries to "warm up" (get the index into the OS disk cache). This automatically prioritizes the "hot spots" in the index. If you want to load the whole thing, 'cd datadir; cat * > /dev/null' works, but I don't recommend relying on that.

> Most queries are for make_id + model_id or city + state and almost all of the queries are ordered by datetime_found (newest -> oldest).

How many documents match, typically? How many documents are returned, typically? How often do you commit() [I suspect frequently, based on the problems you are having]?

-Mike
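One way to make warm-up predictable rather than waiting for organic traffic is Solr's QuerySenderListener. A sketch for solrconfig.xml; the field names (make_id, datetime_found) are taken from Daniel's description, and the exact queries would need tuning:

```xml
<!-- Sketch: fire representative queries after each commit so the new
     searcher (and the OS disk cache) is warm before serving traffic. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">make_id:[* TO *]</str>
      <str name="sort">datetime_found desc</str>
    </lst>
  </arr>
</listener>
```

Committing frequently throws these warmed searchers away, which is one reason Mike asks about commit frequency.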
Re: Tokenize integers?
Just use fieldType="string", and send them to Solr in a multivalued fashion:

<field name="blah">1</field>
<field name="blah">133</field>
<field name="blah">999</field>

Search:

blah:133
+blah:999 +blah:1   [both must match]

Just treat the numbers as untokenized text.

-Mike

On 4-May-08, at 2:30 AM, [EMAIL PROTECTED] wrote:

Ok, thanks. However I am still a bit confused. Since I know that these are only integers, can't I somehow make Solr use solr.IntField or solr.SortableIntField, but still tokenize like this? I tried the configuration below but changed TextField to IntField and indexed the document again, but then the search didn't work... This is what I use now (after your suggestion):

This works great when searching. But when I get the document back, I see that the stored value is still the comma-separated values, i.e.: ... 3,5 ... I would have liked it like this instead: ... 3 5 ... Is this possible with Solr by some configuration? Am I really the only one that would like this behavior?

/Jimi

Quoting Otis Gospodnetic <[EMAIL PROTECTED]>:

I think you are after http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, May 3, 2008 11:57:37 PM
Subject: Tokenize integers?

Hi, What is the recommended way to configure a fieldtype for a field that looks like this in the source system? categoryIds=1,325,488 The order of these ids is not important. I want to be able to fetch all the ids, separately, i.e. I want them stored as multivalued, I guess... And I also want to be able to search on the individual ids, or combinations (for example search for all articles with category id 1 and 488). I know I can index this as multiple categoryId fields (and have them as int or sint type), but that means I need to write preprocessing on the "client" side. I would prefer a server-side fix, so that the client can send the xml like this: ... 1,325,488 ... And then the server (i.e. Solr) will transform this into a multivalued int/sint field, using tokenizing or whatever it is called (or is tokenizing not performed on the stored value?). What are your suggestions? Maybe this is already documented in the wiki or someplace else? I have searched for this, but not found anything that helps.

Regards
/Jimi
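For what it's worth, the client-side preprocessing Jimi wants to avoid is small. A minimal sketch in plain Java; the field name categoryId and the XML shape are assumptions based on the thread, not a fixed Solr API:

```java
public class CategoryIdSplitter {
    // Turn "1,325,488" into one <field> element per id, ready for a Solr
    // <add><doc> update message with a multivalued int/sint field.
    public static String toFields(String csv) {
        StringBuilder sb = new StringBuilder();
        for (String id : csv.split(",")) {
            sb.append("<field name=\"categoryId\">")
              .append(id.trim())
              .append("</field>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFields("1,325,488"));
    }
}
```

Each id then lands as a separate stored value, so the response returns 1, 325 and 488 as distinct elements instead of one comma-separated string.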
Re: Re[2]: startsWith?
On 3-May-08, at 10:44 PM, JLIST wrote:

> Hello Otis, Do you mean that if I index the URL as a "text" field, I'll be able to do * for a given prefix because the text will be tokenized at the "/" and should suffice for my need?

I'm not sure what your needs are, but I use the following to index urls: (in which is stored the _reversed domain_. That is, "com.example.www") I also store the url as a textTight (see example schema).

If you want to do prefix matching on the url, I recommend storing it untokenized in another field (or with minimal tokenization, like lowercasing). If, like me, you want to restrict documents to a certain domain and its subdomains, you have to be careful with your query:

reverse_domain:com.example reverse_domain:com.example.*

If you just do reverse_domain:com.example*, you will also match www.foo-example.com, which you don't want.

-Mike
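The reversed-domain value is easy to compute at index time. A minimal sketch, assuming a plain hostname with no port, scheme, or trailing dot:

```java
public class ReverseDomain {
    // Reverse the dot-separated labels of a hostname,
    // e.g. "www.example.com" -> "com.example.www", so that prefix queries
    // like reverse_domain:com.example.* match a domain and its subdomains.
    public static String reverse(String host) {
        String[] labels = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) sb.append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("www.example.com")); // com.example.www
    }
}
```

Because the labels are reversed, "same domain or subdomain" becomes a simple left-prefix relationship, which is exactly what Mike's two-clause query exploits.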
Re: custom queries via plugins?
I'm not sure if you are after a custom query parsing component, but if that is what you are after, start by looking at these:

$ ff \*QParser\*java
./src/test/org/apache/solr/search/FooQParserPlugin.java
./src/java/org/apache/solr/search/LuceneQParserPlugin.java    <== here
./src/java/org/apache/solr/search/DisMaxQParserPlugin.java
./src/java/org/apache/solr/search/PrefixQParserPlugin.java
./src/java/org/apache/solr/search/QParser.java    <=== here
./src/java/org/apache/solr/search/RawQParserPlugin.java
./src/java/org/apache/solr/search/FieldQParserPlugin.java
./src/java/org/apache/solr/search/BoostQParserPlugin.java
./src/java/org/apache/solr/search/NestedQParserPlugin.java
./src/java/org/apache/solr/search/FunctionQParser.java
./src/java/org/apache/solr/search/QParserPlugin.java    <== here
./src/java/org/apache/solr/search/FunctionQParserPlugin.java
./src/java/org/apache/solr/search/OldLuceneQParserPlugin.java

+ SolrQueryParser.java

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
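If the QParserPlugin route fits (note: these classes live in the 1.3 trunk, not in 1.2), wiring a custom parser into solrconfig.xml is a one-line registration. A sketch; the parser name "couponEvents" and the class name are hypothetical:

```xml
<!-- Sketch: register a custom query parser (Solr 1.3 trunk).
     The plugin class would subclass QParserPlugin and build the
     BooleanQuery/RangeQuery combinations programmatically. -->
<queryParser name="couponEvents"
             class="com.example.CouponEventsQParserPlugin"/>
```

A request would then select it with defType=couponEvents (or an equivalent per-query override), keeping the hand-built boolean logic inside Solr rather than in the client.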
Multiple SpellCheckRequestHandlers
Hi all,

Is it possible in Solr to have multiple SpellCheckRequestHandlers? In my application I have got two different spell-check indexes. I want the spell checker to check for a spelling suggestion in the first index, and only if it fails to get any suggestion from the first index should it try to get a suggestion from the second index.

Is it possible to have a separate SpellCheckRequestHandler for each index?

Solr-User
--
View this message in context: http://www.nabble.com/Multiple-SpellCheckRequestHandlers-tp17071568p17071568.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenize integers?
: Just use fieldType="string", and send them to solr in a multivalued fashion:
:
: <field name="blah">1</field>
: <field name="blah">133</field>
: <field name="blah">999</field>

But as the OP said: that requires preprocessing -- it would be nice if Solr would make this easier for you.

I've had some ideas in the back of my mind for a while now that:

1) schema.xml should support something analyzer-chain-esque for processing the "stored" value of a field.

2) it should be easy to make #1 either apply just to the stored value independent of the indexed value, or be applied prior to the "index" analyzer as well.

3) we should change IndexSchema to respect <analyzer> for all the fieldtypes, not just TextField.

...then people could configure all sorts of interesting behavior like "i want fieldtypeA to be a SortableInt, but if someone indexes a comma separated list of numbers, do the right thing".

I *think* #2 could probably be achieved really easily using the TeeFilter and the SinkTokenizer (but i haven't actually played with them to be sure)

(too many ideas, too little time)

-Hoss
Re: Multiple SpellCheckRequestHandlers
Yes, just define two instances (with two distinct names) in solrconfig.xml and point each of them to a different index.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
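A sketch of what Otis describes, for the Solr 1.2 SpellCheckerRequestHandler; the handler names, source field, and index directories are illustrative assumptions:

```xml
<!-- Sketch: two spell-check handlers, each backed by its own index. -->
<requestHandler name="/spellcheck_primary"
                class="solr.SpellCheckerRequestHandler">
  <str name="spellcheckerIndexDir">spell_primary</str>
  <str name="termSourceField">word</str>
</requestHandler>

<requestHandler name="/spellcheck_fallback"
                class="solr.SpellCheckerRequestHandler">
  <str name="spellcheckerIndexDir">spell_fallback</str>
  <str name="termSourceField">word</str>
</requestHandler>
```

The try-first-then-fall-back logic would live in the client: query /spellcheck_primary, and only if it returns no suggestion, query /spellcheck_fallback.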
Re: Tokenize integers?
On 5-May-08, at 9:19 PM, Chris Hostetter wrote:

: Just use fieldType="string", and send them to solr in a multivalued fashion:
:
: <field name="blah">1</field>
: <field name="blah">133</field>
: <field name="blah">999</field>

But as the OP said: that requires preprocessing -- it would be nice if Solr would make this easier for you.

Oh I see, I misinterpreted "multiple categoryId fields". I agree that it would be nice to have a Solr stored-field processor. While it is usually possible to do arbitrary transformations before getting to Solr, it is nice to be able to encode as much information as possible in the Solr config.

-Mike
Your valuable suggestion on autocomplete
Hi Group,

I have already got some valuable suggestions from the group. Based on that, I have come up with the following process to implement an autocomplete-like feature in my system:

1- Index the whole documents.

2- Extract all terms using IndexReader's terms() method. I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand. But I would like to get absolute terms, i.e. vlanand. The field definition in Solr is ... Would appreciate your input on how to get absolute terms?

3- For each term, extract the documents containing that term using the termDocs() method.

4- Create one more index with the fields term, frequency and docNo. This index will be used for the autocomplete feature.

5- For any letter typed by the user in the search field, use an Ajax script (like Scriptaculous or jQuery) to extract all matching terms using a prefix query.

6- Based on the search term selected by the user, keep track of the document nos in which this term appears.

7- For the next search-term selection, use the document nos to select all terms, excluding the currently selected term.

This somehow works. As someone new to Solr and also to Lucene, I would like to know if it can be improved?

- RB
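Step 5's prefix lookup over a sorted term list can be illustrated with plain Java. This is a toy stand-in for walking IndexReader.terms() from a starting term: since terms are enumerated in sorted order, you seek to the prefix and stop at the first term that no longer matches.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixLookup {
    // Collect all terms sharing a prefix from a sorted term set, mirroring
    // the seek-then-scan pattern of Lucene's sorted term enumeration.
    public static SortedSet<String> withPrefix(TreeSet<String> terms, String prefix) {
        SortedSet<String> out = new TreeSet<>();
        // tailSet jumps to the first term >= prefix ...
        for (String t : terms.tailSet(prefix)) {
            // ... and sorted order guarantees matches are contiguous.
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
            new TreeSet<>(java.util.List.of("virtual", "vlan", "vlanand", "voice"));
        System.out.println(withPrefix(terms, "vlan")); // [vlan, vlanand]
    }
}
```

The early break is what keeps this cheap: only the matching slice of the term dictionary is scanned, not the whole index.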