Re: ranking on Multivalued fields
What you probably want is to display only the docs in a certain category (maybe filtered), ordered by descending score in the context of exactly that category, right?

Well, you could overcome this by creating a category-specific score field for every category, following the schema "cat-X-score", where X is the identifier of each of your categories. Then, when receiving a request for a category, you have to programmatically build the sort-by condition for the field "cat-Y-score", where Y is the id of the category the request was for.

*tobi*

Umar Shah wrote:
> Hi Otis,
> thanks for the reply,
>
> consider a multivalued field name cat --other fields val 1 score2 val 3
>
>> Umar,
>> I'm not sure what you mean by a "subfield", can you explain please?
>> As for your second question, just add category:X to your query and you'll get matches ordered/ranked by score by default.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> - Original Message
>> From: Umar Shah <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, March 7, 2008 1:17:35 AM
>> Subject: ranking on Multivalued fields
>>
>> Hi,
>> I have a problem where I want to rank multivalued fields.
>> Suppose a multivalued field "category" having an associated subfield "score".
>> First: is it possible to have a subfield in the multivalued field?
>> Second: I want to get the documents ranked with the highest score, say for category:X.
>>
>> thanks
>> Umar Shah
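A minimal sketch of the per-category sort field idea, in Python. The field naming follows the "cat-X-score" schema from the message; the helper name `category_sort_params`, the `category` filter and the use of a filter query (`fq`) are assumptions made for illustration, not something the thread confirms.

```python
from urllib.parse import urlencode

def category_sort_params(category_id, user_query="*:*"):
    """Build request parameters that restrict results to one category and
    sort by that category's own score field (hypothetical naming)."""
    sort_field = f"cat-{category_id}-score"
    return {
        "q": user_query,
        "fq": f"category:{category_id}",   # assumed filter on the category field
        "sort": f"{sort_field} desc",      # highest per-category score first
    }

params = category_sort_params("42")
print(urlencode(params))
```

The sort field is derived from the requested category at request time, so each category needs its own score field populated at index time.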
Re: What is default Date time format in Solr
Thanks Chris,

My index creation was wrong ;) (I was using the 12-hour format).

Thanks for your support
-kmu

On Sat, Mar 8, 2008 at 1:35 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : I heard Solr Date time format is 24 hours.
>
> that is correct.
>
> : emf.artist:[2007-12-31T22:20:00Z TO 2007-12-31T22:39:00Z]
> :
> : I am not able to get the content what I expected.
> :
> : But, I tried with following query:-
> :
> : emf.artist:[2007-12-31T10:20:00Z TO 2007-12-31T10:39:00Z]
>
> Is your emf.artist field stored?
> If so what value do you see in the field when you do that second query and
> get the results you are looking for? if they don't match what you think
> they should be, then the code you have reading dates from your index and
> writing them to Solr isn't doing what you think it's doing.
>
> -Hoss
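The 12-hour versus 24-hour mix-up can be illustrated with a short Python sketch. The input format string and the helper name `to_solr_date` are assumptions; the thread does not show the actual indexing code. The point is only that `%I` with `%p` reads a 12-hour clock while `%H` writes the 24-hour form used in the queries above.

```python
from datetime import datetime

# %H is the 24-hour clock; the trailing Z marks UTC.
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def to_solr_date(text, source_format="%Y-%m-%d %I:%M:%S %p"):
    """Parse a 12-hour timestamp (assumed to be in UTC already) and render
    it in the 24-hour form used for Solr date fields."""
    return datetime.strptime(text, source_format).strftime(SOLR_DATE_FORMAT)

print(to_solr_date("2007-12-31 10:20:00 PM"))
```

Writing the hour with `%I` and no AM/PM marker would turn 22:20 into 10:20, which matches the symptom described in the thread.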
Re: Accented search
I'm not sure about a way to boost scores in this case, but you can achieve the basic matching by applying a filter to the index and the queries. The ISOLatin1Accent Filter seems like it may work for you, though I'm not entirely certain if that will cover all the accent characters you need.

My approach has been to write new filters, one to normalize the unicode into the "decomposed" version, then one to manually strip out all of the "add-on" characters (with decimal codepoint greater than 256). I don't know if this will always work, but it's worked well for me so far.

I would test out adding the ISOLatin1Accent filter to your analyzer. It might do the trick. Once again, with this approach I'm not sure how to boost either score, so someone else may have better ideas. I'm pretty new to all of this stuff.

Peter

climbingrose wrote:
> Hi guys,
>
> I'm running into some problems with accented (UTF-8) languages. I'd love to hear some ideas about how to use Solr with those languages. Basically, I want to achieve what Google did with UTF-8 languages.
>
> My requirements include:
> 1) Accent insensitive search and proper highlighting:
> For example, we have 2 documents:
>
> Doc A (title:Lập Trình Viên)
> Doc B (title:Lap Trinh Vien)
>
> if the user enters "Lập Trình Viên", then Doc B is also matched and "Lập Trình Viên" is highlighted.
> On the other hand, if the query is "Lap Trinh Vien", Doc A is also matched.
> 2) Assign proper scores to accented or non-accented searches:
> if the user enters "Lập Trình Viên", then Doc A should be given higher score than DOC B.
> if the query is "Lap Trinh Vien", Doc A should be given higher score.
>
> Any ideas guys? Thanks in advance!
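The decompose-then-strip approach described in the reply can be sketched in a few lines of Python. This is an illustration of the idea only, not Peter's actual filters: the function name `strip_accents` and the `max_codepoint` parameter are assumptions, and the threshold of 256 is taken from the message.

```python
import unicodedata

def strip_accents(text, max_codepoint=256):
    """Normalize to the decomposed form (NFD), then drop every character
    whose code point is above max_codepoint (the 'add-on' marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ord(ch) <= max_codepoint)

print(strip_accents("Lập Trình Viên"))
```

One limitation of the code point rule: a letter that has no decomposition and sits above the threshold, such as the Vietnamese "đ" (U+0111), is dropped entirely rather than mapped to "d", so it would need separate handling.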
RE: Accented search
We've done this in a pre-Solr Lucene context by using the position increment: when a token contains accented characters, you add a stripped version of that token with a zero increment, so that for matching purposes the original and the stripped version are at the same position. Accents are not stripped from queries. The effect is that an accented search matches your Doc A, and an unaccented search matches Docs A and B. We do that after lower-casing the token.

There are some limitations: users might start to expect that they can freely add accents to restrict their search to accented hits, but if they don't match the accents exactly they won't get any hits: e.g. if a word contains two accented characters and the user only accents one of them in their query, they won't match the accented or the unaccented version.

Peter

Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: [EMAIL PROTECTED]

~ The code is willing, but the data is weak. ~
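The zero-increment technique described in this message can be simulated outside Lucene. The sketch below is plain Python, not the Lucene TokenFilter API: tokens are modeled as (term, position_increment) pairs, and the accent stripping via `unicodedata.combining` is an assumed stand-in for whatever stripping the original code used.

```python
import unicodedata

def strip_accents(token):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def with_stripped_variants(tokens):
    """Yield (term, position_increment) pairs. An accented token is followed
    by its stripped form with increment 0, so both share one position."""
    for token in tokens:
        token = token.lower()          # lower-case first, as in the message
        yield token, 1
        stripped = strip_accents(token)
        if stripped != token:
            yield stripped, 0          # same position as the original

for term, increment in with_stripped_variants(["Lập", "Trình", "Viên"]):
    print(term, increment)
```

Because the stripped form sits at the same position as the original, phrase queries keep working for both the accented and the unaccented spelling.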
schema help
hi :)

I'm trying to work out a schema for our widgets. more than "just coming up with something" I'd like something idiomatic in solr terms. any help is much appreciated. here's a similar problem space to what I'm working with...

let's say we're talking books. books are written by authors and held in libraries. a sister company is using lucene+compass and they seem to have completely different collections (or whatever the technical term is :)

  authors
  books
  libraries

so that a search for authors hits only the authors dataset.

all of the solr examples I can find don't seem to address this kind of data disparity. what is the standard and idiomatic approach for solr?

for my particular data I'd want to display something like this

  author
    book in library
    book in library

on the same result page, but using a completely flat, single schema doesn't seem to scale very well.

collective wisdom most welcome :)

--Geoff
RE: Accented search
Peter:

Very interesting. To take care of the issue you mention, could you add multiple "synonyms" with progressively fewer accents?

E.g. you'd index "préférence" as 4 tokens:
préférence (unchanged)
preférence (stripped one accent)
préference (stripped the other accent)
preference (stripped both accents)

Or does it yield too many tokens to be useful?

And how does this take care of scoring? Do you get a higher score with a closer match?
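To get a feel for how many tokens the "progressively fewer accents" idea produces, here is a small Python sketch. The function names are assumptions for illustration. Each accented character can either be kept or stripped, so a token with n accented characters yields 2^n variants.

```python
import itertools
import unicodedata

def strip_accents(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_variants(token):
    """Return every spelling of token with any subset of accents removed."""
    token = unicodedata.normalize("NFC", token)   # one code point per letter
    options = []
    for ch in token:
        bare = strip_accents(ch)
        # accented letters contribute two choices, plain letters one
        options.append((ch, bare) if bare != ch else (ch,))
    return ["".join(combo) for combo in itertools.product(*options)]

for variant in accent_variants("préférence"):
    print(variant)
```

For "préférence" this gives the four tokens listed in the message. The count doubles with each additional accented character, which is the cost to weigh against the matching benefit.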
Re: Accented search
Generally, the accented version will have a higher IDF, so it will score higher.

wunder

On 3/11/08 8:44 AM, "Renaud Waldura" <[EMAIL PROTECTED]> wrote:
> Peter:
>
> Very interesting. To take care of the issue you mention, could you add
> multiple "synonyms" with progressively less accents?
>
> E.g. you'd index "préférence" as 4 tokens:
> préférence (unchanged)
> preférence (stripped one accent)
> préference (stripped the other accent)
> preference (stripped both accents)
>
> Or does it yield too many tokens to be useful?
>
> And how does this take care of scoring? Do you get a higher score with a
> closer match?
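The IDF remark can be made concrete with a small calculation. The sketch below assumes Lucene's classic similarity, where, as far as I know, idf = 1 + ln(numDocs / (docFreq + 1)); the document counts are invented purely for illustration.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as in Lucene's classic similarity
    (assumed formula: 1 + ln(numDocs / (docFreq + 1)))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

num_docs = 1000           # hypothetical index size
accented_doc_freq = 10    # docs containing the accented term
stripped_doc_freq = 50    # docs containing the stripped term (accented docs included)

print(classic_idf(accented_doc_freq, num_docs))
print(classic_idf(stripped_doc_freq, num_docs))
```

Since the stripped form is indexed for both accented and unaccented documents, its document frequency is at least as high as that of the accented form, so the accented term never gets the lower IDF of the two.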
Re: schema help
Geoff,

I'm not sure if I understood your problem correctly, but it sounds like you want your search to be restricted to authors, but then you want to list all of his/her books when displaying results.

The easiest thing to do would be to create an index where each "row"/Document has the author name, the book title, etc. For each author-matching Document you'd pull his/her books out of the result set. Yes, this means the author name would be denormalized in RDBMS-speak.

Another option is not to index/store book titles, but rather have only an author index to search against. The book data (mapped to author identities) would then be pulled from an external source (e.g. RDBMS: select title from books where author_id in (1,2,3)) at search results display time.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
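The first option, one denormalized document per author/book pair, leaves the grouping to the client. A minimal Python sketch of that step follows; the field names `author` and `title` and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_books_by_author(docs):
    """Collapse flat author/book documents into author -> list of titles."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["author"]].append(doc["title"])
    return dict(grouped)

# Hypothetical result set: the author name repeats on every row.
results = [
    {"author": "Jane Doe", "title": "How To Garden"},
    {"author": "Jane Doe", "title": "Fields Of Maine"},
    {"author": "John Roe", "title": "Rivers"},
]
print(group_books_by_author(results))
```

The repeated author name is the denormalization mentioned in the message; the display layer pays for it with one grouping pass.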
Re: ranking on Multivalued fields
Umar,

The notion of "subfield" does not exist in Solr (or am I living under a rock?). Thus, val 1

http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: Umar Shah <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, March 8, 2008 7:03:32 AM
Subject: Re: ranking on Multivalued fields

Hi Otis,
thanks for the reply,

consider a multivalued field name cat --other fields val 1 score2 val 3
Re: schema help
Otis Gospodnetic wrote:
> Geoff,
>
> I'm not sure if I understood your problem correctly, but it sounds like you want your search to be restricted to authors, but then you want to list all of his/her books when displaying results.

that's about right. add that I may also want to search on libraries and show all the books (and authors) stored there.

in real life, it's not books or authors, of course, but the parallels are close enough :) in fact, the library example is a good one for me... or at least a network of public libraries linked together.

> The easiest thing to do would be to create an index where each "row"/Document has the author name, the book title, etc. For each author-matching Document you'd pull his/her books out of the result set. Yes, this means the author name would be denormalized in RDBMS-speak.

I think I can live with the denormalization - it seems lucene is flat and very different conceptually than a database :)

the trouble I'm having is one of dimension. an author has many, many attributes (name, birthdate, biography in $language, etc). as does each book (title in $language, summary in $language, genre, etc). as does each library (name, address, directions in $language, etc). so an author with N books doesn't seem to scale very well in the flat representations I'm finding in all the lucene/solr docs and examples... at least not in some way I can wrap my head around.

part of what seemed really appealing about lucene in general was that you could stuff all this (unindexed) information into a document and retrieve it all based on some search criteria. but it's seeming very difficult for me to wrap my head around the data I need to represent.

> Another option is not to index/store book titles, but rather have only an author index to search against. The book data (mapped to author identities) would then be pulled from an external source (e.g. RDBMS: select title from books where author_id in (1,2,3)) at search results display time.

eew :) seriously, though, that's what we have now - all rdbms driven. if solr could only conceptually handle the initial lookup there wouldn't be much point.

maybe I'm thinking about this all wrong (as is to be expected :), but I just can't believe that nobody is using solr to represent data a bit more complex than the examples out there.

thanks for the feedback.

--Geoff
Re: schema help
Our Solr use consists of several rather different data types, some of which have one-to-many relationships with other types. We don't need to do any searching of quite the kind you describe, but I have an idea about it, depending on what you need to do with the book data. It is rather hacky, but maybe you can improve it.

If you only need to present a list of books, possibly with links to fuller data, you could do this:

* store only Authors in solr
* create a field, stored but not indexed (I may be using slightly wrong terms here), which contains the short text representation of all their books
* search on authors however you want, make sure you return this field, and just display it as is

For example, if Jane Doe has written 2 books, How To Garden and Fields Of Maine, your special field might contain this:

Fields of Maine published on DATE. A brief overview of Maine's woods and fields with special attention to wildflowers

If your 'authors' 'write' 'books' with great frequency, you'd need to update a lot...

Another possibility is to do two searches, with this kind of structure, which sort of mimics an RDBMS:

* everything in Solr has a field, type (book, author, library, etc). these can be filtered on a search-by-search basis
* books have a field, authorId, uniquely referencing the author
* your first search will be restricted to just authors, from which you will extract the IDs
* your second search will be restricted to just books whose authorId field is exactly one of the IDs from the first search

As you have noticed, Lucene is not an RDBMS. Searching through all the text of all the books is more the use it was designed around; of course the analogy might not be THAT strong with your need!

Rachel
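The two-search approach from this message could be sketched as follows. The `type` and `authorId` field names come from the message; the `id` field, the use of `fq` for the type restriction and the helper names are assumptions, and no request is actually sent here.

```python
def author_search_params(user_query):
    """First search: authors only, returning just their ids (assumed 'id' field)."""
    return {"q": user_query, "fq": "type:author", "fl": "id"}

def book_search_params(author_ids):
    """Second search: books whose authorId is one of the ids found above."""
    if not author_ids:
        raise ValueError("no author ids to look up")
    id_clause = " OR ".join(str(author_id) for author_id in author_ids)
    return {"q": f"authorId:({id_clause})", "fq": "type:book"}

print(author_search_params("name:doe"))
print(book_search_params([1, 2, 3]))
```

The second query grows with the number of matching authors, so this fits best when the first search is paged or otherwise returns a small set of ids.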
Re: Unparseable date
I indexed my docs with field : 1995-12-31T23:59:59.000Z

But when I try to search on that field (order_dt:1995-12-31T23:59:59.000Z), I get an exception:

Mar 11, 2008 4:13:55 PM org.apache.solr.core.SolrException log
SEVERE: org.apache.solr.core.SolrException: Invalid Date String:'1995-12-31T23'
    at org.apache.solr.schema.DateField.toInternal(DateField.java:108)
    at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:298)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:437)
    at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:78)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1092)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:979)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:907)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:896)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:146)

Am I missing anything?

Thanks,
Monica.

Daniel Andersson-5 wrote:
> On Mar 5, 2008, at 11:08 PM, Chris Hostetter wrote:
>
>> It's ".000" not ":00" ... "2008-02-12T15:02:06.000Z"
>>
>> but like i said: that stack trace is odd, the time doesn't seem like it
>> actually comes from any query params, it looks like it's coming from a
>> previously indexed doc. To work around this you may need to reindex
>> all of your docs with those optional milliseconds.
>
> Ah, re-indexing now. Thanks for your help!
>
> / d

--
View this message in context: http://www.nabble.com/Unparseable-date-tp15854401p15994506.html
Sent from the Solr - User mailing list archive at Nabble.com.
Query Level Boosting
Hello. I was wondering if anyone knew a way to do query-level boosting with SolrJ. On the HTTP client I could just do something like sku:123^2.3, which would boost the sku query by 2.3.

--
View this message in context: http://www.nabble.com/Query-Level-Boosting-tp15995005p15995005.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Out of memory in analysis
: I pasted a modest blob of text into the analysis debug slot on the admin
: app, and am rewarded with this, even with -Xmx1g.

what was the text? what was the field/fieldtype? what did the analyzers for that fieldtype look like in your schema.xml?

-Hoss
Re: return only sorted Field, but with a different Field Name
: For example, say I want to sort by the field '162_sortable_s' then I add a
: parameter like so 'sort=162_sortable_s.' I need to change the settings so
: that when the result set is returned from solr, it takes the values of
: '162_sortable_s' and inserts them into a separate field called 'SortedField'
: so that the return doc looks like this:

there is nothing like this in solr right now, and it doesn't seem like something that should be done in solr, as it would be a simple translation that could be done via an XSLT or some client layer code.

: How or where do I change that setting? Do I have to rewrite some part of the
: RequestHandler?

assuming you didn't want to just use an XSLT, writing your own response writer that subclasses XmlResponseWriter would probably be the simplest way to accomplish this.

-Hoss
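The "client layer code" route mentioned in the reply could look roughly like this in Python. The documents are modeled as plain dictionaries, as a client might hold them after parsing a response; the helper name `add_sorted_field` and the sample values are assumptions.

```python
def add_sorted_field(docs, sort_field, alias="SortedField"):
    """Return copies of the docs with the sort field's value duplicated
    under a fixed alias, leaving the original field in place."""
    return [{**doc, alias: doc.get(sort_field)} for doc in docs]

# Hypothetical documents already parsed from a response.
docs = [
    {"id": "1", "162_sortable_s": "apple"},
    {"id": "2", "162_sortable_s": "banana"},
]
print(add_sorted_field(docs, "162_sortable_s"))
```

The client already knows which field it sorted on, so the alias can be filled in without any change on the Solr side.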
Re: How to get incrementPositionGap value from IndexSchema ?
: I am looking for a way to access the incrementPositionGap value defined for a
: field type in the schema.xml.

I think you mean "positionIncrementGap".

It's a property of the <fieldType> declaration in schema.xml, but internally it's passed to SolrAnalyzer.setPositionIncrementGap. if you want to programmatically know what the "positionIncrementGap" is for any analyzer of any field or fieldtype, regardless of whether or not it's a SolrAnalyzer, just use Analyzer.getPositionIncrementGap(String fieldName)

ie: myFieldType.getAnalyzer().getPositionIncrementGap(myFieldName)

If you don't mind me asking: why do you want/need this information in your custom code?

-Hoss
Re: Result based sorting for KWIC?
: I am investigating using solr for a project that requires presentation of
: search results in a KWIC display, sorted according to either the string
: following the matches or the (reverse) of the characters previous to the
: matches. Can this be done with Solr? How would I go about implementing this?

1) if you've got full text search, why would you even want KWIC?

2) your description of how you'd want the results ordered is extremely confusing to me ... can you give a simple concrete example of some documents / queries / result-doclists that you would want to see?

-Hoss
Re: Unparseable date
: I indexed my docs with field : 1995-12-31T23:59:59.000Z
: But when i try to search on that field : order_dt:1995-12-31T23:59:59.000Z ,
: I get an exception :
:
: Mar 11, 2008 4:13:55 PM org.apache.solr.core.SolrException log
: SEVERE: org.apache.solr.core.SolrException: Invalid Date
: String:'1995-12-31T23'

":" is a special character for the query parser, so it either needs to be escaped or the date needs to be quoted...

  order_dt:"1995-12-31T23:59:59.000Z"

this isn't something most people typically need to worry about, because dates are typically only queried using ranges...

  order_dt:[1995-12-31T23:59:59.000Z TO *]

-Hoss
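The two fixes named in the reply, quoting the value or escaping the colons, are easy to apply when the query string is built in code. A short Python sketch, with hypothetical helper names, using the `order_dt` field from the thread:

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def escape_colons(value):
    """Backslash-escape each ':' so the query parser treats it literally."""
    return value.replace(":", "\\:")

date = "1995-12-31T23:59:59.000Z"
print(f"order_dt:{quote_term(date)}")      # quoted form
print(f"order_dt:{escape_colons(date)}")   # escaped form
print(f"order_dt:[{date} TO *]")           # range form, as in the reply
```

Without one of these, the parser splits the term at the first unescaped colon after the field name, which is why the exception reports the truncated string '1995-12-31T23'.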
Re: Accented search
Hi Peter,

It looks like a very promising approach for us. I'm going to implement a custom Tokeniser based on your suggestions and see how it goes. Thank you all for your comments!

Cheers

--
Regards,

Cuong Hoang
Re: Accented search
: It looks like a very promising approach for us. I'm going to implement
: a custom Tokeniser based on your suggestions and see how it goes. Thank
: you all for your comments!

you don't really need a custom tokenizer -- just a buffered TokenFilter that clones the original token if it contains accent chars, mutates the clone, and then emits it next with a positionIncrement of 0.

i'm kind of surprised ISOLatin1AccentFilter doesn't have an option to do this already -- it would certainly be a worthy patch to commit if someone wants to submit it back to lucene-java.

: > don't match the accents exactly they won't get any hits: e.g. if a word
: > contains two accented characters and the user only accents one of them in
: > their query, they won't match the accented or the unaccented version.

this could be accounted for by generating all of the permutations of unaccented characters when indexing -- it wouldn't solve the problem of a source term containing only one accent and the user querying with only one accent but on a different character ... you could work around this by putting all of the permutations in at index time, but querying on the exact term and the no-accent term at query time.

-Hoss
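
For readers following along, the zero-increment idea from this thread can be sketched outside Lucene. This is only a minimal illustration in plain Python, not Lucene's TokenFilter API: the function names (`strip_accents`, `index_tokens`, `positions`, `matches`, `accent_variants`) are made up for the example, tokenizing is a naive whitespace split, and accent stripping is done by Unicode decomposition, which is an assumption rather than what ISOLatin1AccentFilter does.

```python
import unicodedata
from itertools import product


def strip_accents(text):
    """Drop combining marks after NFD decomposition (e.g. 'ậ' -> 'a').

    Letters with no canonical decomposition, such as 'đ', are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_tokens(text):
    """Yield (term, position_increment) pairs as they would be indexed.

    Each lower-cased word is emitted with increment 1; if it contains
    accents, the stripped form follows with increment 0, so both terms
    share one position.
    """
    for word in unicodedata.normalize("NFC", text).lower().split():
        yield word, 1
        folded = strip_accents(word)
        if folded != word:
            yield folded, 0


def positions(tokens):
    """Convert increments into absolute positions: {term: {positions}}."""
    pos = -1
    index = {}
    for term, increment in tokens:
        pos += increment
        index.setdefault(term, set()).add(pos)
    return index


def matches(query, index):
    """Accents are NOT stripped from the query; every term must be indexed."""
    terms = unicodedata.normalize("NFC", query).lower().split()
    return all(term in index for term in terms)


def accent_variants(word):
    """All forms of a word with each accented character kept or stripped,
    i.e. the 'permutations' that could be added at index time."""
    choices = []
    for ch in unicodedata.normalize("NFC", word):
        folded = strip_accents(ch)
        choices.append((ch,) if folded == ch else (ch, folded))
    return {"".join(combo) for combo in product(*choices)}


doc_a = positions(index_tokens("Lập Trình Viên"))
doc_b = positions(index_tokens("Lap Trinh Vien"))
print(matches("Lap Trinh Vien", doc_a), matches("Lập Trình Viên", doc_b))
```

With this layout an unaccented query finds both documents, while an accented query finds only the accented one, which is the behaviour Peter describes. `accent_variants` grows exponentially with the number of accented characters in a word, so indexing every variant is a trade-off worth measuring before adopting it.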
Cannot start solr
I followed the tutorial on the wiki, but when I go to http://server_address/solr/admin I get a Tomcat error message: HTTP 404

Then I checked in the Tomcat manager and saw that the application is not started. When I attempt to start it, I get this error message:

FAIL - Application at context path /solr could not be started

I am using tomcat 5.5 on debian, and I am placing the war file outside the /webapps; I also copied everything under /example/solr to the path I pointed to... I checked the file is here.

What did I do wrong?

--
View this message in context: http://www.nabble.com/Cannot-start-solr-tp15997140p15997140.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cannot start solr
Additional Information:

2008/3/12 11:10:54 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: Using JNDI solr.home: /var/webapps/solr
2008/3/12 11:10:54 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: looking for multicore.xml: /var/webapps/solr/multicore.xml
2008/3/12 11:10:54 AM org.apache.solr.servlet.SolrDispatchFilter init
FATAL: Could not start SOLR. Check solr/home property
java.lang.ExceptionInInitializerError
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:104)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1206)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:293)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
    at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurati...
2008/3/12 11:10:54 AM org.apache.catalina.core.StandardContext start
FATAL: Error filterStart
2008/3/12 11:10:54 AM org.apache.catalina.core.StandardContext start
FATAL: Context [/solr] startup failed due to previous errors

Related config:

Solr is located in /var/webapps/solr

tree:

/var/webapps/solr/
|-- README.txt
|-- bin
|   |-- abc
|   |-- abo
|   |-- backup
|   |-- backupcleaner
|   |-- commit
|   |-- optimize
|   |-- readercycle
|   |-- rsyncd-disable
|   |-- rsyncd-enable
|   |-- rsyncd-start
|   |-- rsyncd-stop
|   |-- scripts-util
|   |-- snapcleaner
|   |-- snapinstaller
|   |-- snappuller
|   |-- snappuller-disable
|   |-- snappuller-enable
|   `-- snapshooter
`-- conf
    |-- admin-extra.html
    |-- elevate.xml
    |-- protwords.txt
    |-- schema.xml
    |-- scripts.conf
    |-- solrconfig.xml
    |-- stopwords.txt
    |-- synonyms.txt
    `-- xslt
        |-- example.xsl
        |-- example_atom.xsl
        |-- example_rss.xsl
        `-- luke.xsl

solr.xml:

Can anybody help me? I am not so familiar with tomcat...

Vinci wrote:
>
> I followed the tutorial on the wiki, but when I go to
> http://server_address/solr/admin
> I get a Tomcat error message:
> HTTP 404
>
> Then I checked in the Tomcat manager and saw that the application is not
> started. When I attempt to start it, I get this error message:
>
> FAIL - Application at context path /solr could not be started
>
> I am using tomcat 5.5 on debian, and I am placing the war file outside the
> /webapps; I also copied everything under /example/solr to the path I
> pointed to... I checked the file is here.
>
> What did I do wrong?
> -- View this message in context: http://www.nabble.com/Cannot-start-solr-tp15997140p15997330.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: schema help
Geoff, some comments inlined.

----- Original Message ----
From: Geoffrey Young <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 11, 2008 4:55:15 PM
Subject: Re: schema help

Otis Gospodnetic wrote:
> Geoff,
>
> I'm not sure if I understood your problem correctly, but it sounds
> like you want your search to be restricted to authors, but then you
> want to list all of his/her books when displaying results.

that's about right. add that I may also want to search on libraries and show all the books (and authors) stored there.

OG: That's fine. One page (of results) at a time, I imagine.

in real life, it's not books or authors, of course, but the parallels are close enough :) in fact, the library example is a good one for me... or at least a network of public libraries linked together.

> The
> easiest thing to do would be to create an index where each
> "row"/Document has the author name, the book title, etc. For each
> author-matching Document you'd pull his/her books out of the result
> set. Yes, this means the author name would be denormalized in
> RDBMS-speak.

I think I can live with the denormalization - it seems lucene is flat and very different conceptually than a database :)

OG: Right, it is. :)

the trouble I'm having is one of dimension. an author has many, many attributes (name, birthdate, biography in $language, etc). as does each book (title in $language, summary in $language, genre, etc). as does each library (name, address, directions in $language, etc). so an author with N books doesn't seem to scale very well in the flat representations I'm finding in all the lucene/solr docs and examples... at least not in some way I can wrap my head around.

OG: I'm not sure why the number of attributes worries you. Imagine it as a wide RDBMS table, if it helps. Indices with dozens of fields are not uncommon.

part of what seemed really appealing about lucene in general was that you could stuff all this (unindexed) information into a document and retrieve it all based on some search criteria. but it's seeming very difficult for me to wrap my head around the data I need to represent.

OG: You certainly can do that. I'm not sure I understand where the hard part is. You seem to know what attributes each entity has. Maybe you are confused by how to handle N different types of entities in a single index? (I'm assuming a single index is what you currently have in mind)

> Another option is not to index/store book titles, but
> rather have only an author index to search against. The book data
> (mapped to author identities) would then be pulled from an external
> source (e.g. RDBMS: select title from books where author_id in
> (1,2,3)) at search results display time.

eew :) seriously, though, that's what we have now - all rdbms driven. if solr could only conceptually handle the initial lookup there wouldn't be much point.

OG: Well, there might or might not be, depending on how much data you have, how flexible and fast your RDBMS-powered (full-text?) search is, and so on. The Lucene/Solr for full-text search + RDBMS/BDB for display data is a common combination.

maybe I'm thinking about this all wrong (as is to be expected :), but I just can't believe that nobody is using solr to represent data a bit more complex than the examples out there.

OG: Oh, lots of people are, it's just that examples are simple, so people new to Solr, Lucene, etc. have an easier time learning.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

>
> Otis
>
> -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ---- From: Geoffrey Young
> <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent:
> Tuesday, March 11, 2008 12:17:32 PM Subject: schema help
>
> hi :)
>
> I'm trying to work out a schema for our widgets. more than "just
> coming up with something" I'd like something idiomatic in solr terms.
> any help is much appreciated. here's a similar problem space to what
> I'm working with...
>
> let's say we're talking books. books are written by authors and held
> in libraries. a sister company is using lucene+compass and they seem
> to have completely different collections (or whatever the technical
> term is :)
>
> authors books libraries
>
> so that a search for authors hits only the authors dataset.
>
> all of the solr examples I can find don't seem to address this kind
> of data disparity. what is the standard and idiomatic approach for
> solr?
>
> for my particular data I'd want to display something like this
>
> author book in library book in library
>
> on the same result page, but using a completely flat, single schema
> doesn't seem to scale very well.
>
> collective wisdom most welcome :)
>
> --Geoff
>
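
To make the "one flat Document per row" suggestion in this thread concrete, here is a rough sketch in plain Python. It does not call Solr at all; the field names (`author_name`, `book_title`, `library_name`), the `id` scheme and both function names are invented for the example, and a real schema would carry many more fields per entity.

```python
def flatten(authors):
    """Denormalize nested author -> books -> libraries data into flat
    documents, one per (author, book, library) combination."""
    docs = []
    for author in authors:
        for book in author["books"]:
            for library in book["libraries"]:
                docs.append({
                    # Hypothetical unique key built from the three ids.
                    "id": "%s/%s/%s" % (author["id"], book["id"], library["id"]),
                    "author_name": author["name"],
                    "book_title": book["title"],
                    "library_name": library["name"],
                })
    return docs


def group_for_display(docs):
    """Regroup flat hits as author -> [(book, library), ...] for a result page."""
    grouped = {}
    for doc in docs:
        grouped.setdefault(doc["author_name"], []).append(
            (doc["book_title"], doc["library_name"])
        )
    return grouped


authors = [{
    "id": "a1",
    "name": "Example Author",
    "books": [
        {"id": "b1", "title": "First Book",
         "libraries": [{"id": "l1", "name": "Central"},
                       {"id": "l2", "name": "North"}]},
        {"id": "b2", "title": "Second Book",
         "libraries": [{"id": "l1", "name": "Central"}]},
    ],
}]

for name, rows in group_for_display(flatten(authors)).items():
    print(name, rows)
```

The author's attributes are repeated on every row, which is the denormalization Otis mentions; the regrouping step at display time recovers the "author / book in library" layout Geoff asked about.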
Solr nightly build and the multicore mode
Hi all,

after tracing the log, I found that the Tomcat problem with the nightly build is the multicore.xml handling: if multicore.xml doesn't exist, Tomcat won't run the application the way Jetty does (Jetty runs in single-core mode if the file doesn't exist).

Q1. I don't know how to set the path... WHERE should I put the core1 and core0 folders? Somewhere in solr/home or somewhere in webapps? And how do I make the admin panel work?

Q2. How can I disable the multicore function when multicore.xml exists? Just remove the second core?

Thank you for any reply

--
View this message in context: http://www.nabble.com/Solr-nightly-build-and-the-multicore-mode-tp15997822p15997822.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Result based sorting for KWIC?
Chris Hostetter wrote:
> 1) if you've got full text search, why would you even want KWIC?

Well, KWIC is a way to present the full text search results so that they can be easily read.

> 2) your description of how you'd want the results ordered is extremely
> confusing to me ... can you give a simple concrete example of some
> documents / queries / result-doclists that you would want to see?

If you go to http://tkb.mydns.jp:8899/exist/rest/db/new/tkb.xq you will see what I currently have. Just click search to search for the example, or maybe delete the last character so that you get more results (this is not released yet, so don't be surprised if it breaks...). You will see the search term highlighted in the middle; context is available from the blue arrow to the right.

The display would be much more useful for the users if it could be sorted on the characters following the hit (ignoring punctuation). Another option would be to sort on the characters preceding the hit. But in this case, the sorting has to be reversed, so that if I have:

ABCDFGHI

the sort-key would be constructed as DCBA for this case.

I know that this can be done by post-processing the results on the client (which is what Erik suggested offline), but if I get thousands of hits, that would be very slow, so I am looking for other ways. Erik also said that down the road there might be a sort function that could be called, which is what I would need here.

Cheers,
Christian

--
Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
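
The two sort keys described in this message can be sketched as follows. This is the client-side post-processing variant only (the one Christian worries is slow for thousands of hits), written in plain Python; the hit layout with `left`, `match` and `right` entries and the function names are assumptions made for the example, and "punctuation" is taken to mean any character in a Unicode punctuation or separator category.

```python
import unicodedata


def _significant(text):
    """Keep only characters that are not punctuation (P*) or separators (Z*)."""
    return "".join(
        ch for ch in text if unicodedata.category(ch)[0] not in ("P", "Z")
    )


def right_key(hit):
    """Sort key on the characters following the hit, ignoring punctuation."""
    return _significant(hit["right"])


def left_key(hit):
    """Sort key on the characters preceding the hit, read outwards from the
    hit, so the left context 'ABCD' yields the key 'DCBA'."""
    return _significant(hit["left"])[::-1]


hits = [
    {"left": "ABCD", "match": "X", "right": "FGHI"},
    {"left": "QRSA", "match": "X", "right": ", BCD."},
    {"left": "MNOB", "match": "X", "right": "FG, AA"},
]

by_right = sorted(hits, key=right_key)
by_left = sorted(hits, key=left_key)
print([h["right"] for h in by_right])
print([h["left"] for h in by_left])
```

Sorting on these keys uses plain code-point order; a collation suited to the language of the texts would be a separate decision.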
Re: Out of memory in analysis
This turned out to be a side-effect of the since-fixed use of GET in analysis.jsp, coupled with a mistake in one of my filters. On Tue, Mar 11, 2008 at 8:31 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : I pasted a modest blob of text into the analysis debug slot on the admin > : app, and am rewarded with this, even with -Xmx1g. > > what was the text? what was the field/fieldtype? what did the > analyzers for that fieldtype look like in your schema.xml? > > > -Hoss > >
[Update] Solr can be started from jetty but not tomcat
Hi all,

after several hours I got Solr working a little bit: the jetty version works, but the tomcat version doesn't.

Environment: JRE 1.6, tomcat 5.5, ubuntu 7.10. Solr nightly (8 Mar 08)

It looks like the multicore.xml causes the problem... does Solr die at the time of Config?

In the localhost log:

org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:114)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:448)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)

Catalina log:

org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: Using JNDI solr.home: /var/webapps/solr
org.apache.solr.servlet.SolrDispatchFilter init
INFO: looking for multicore.xml: /var/webapps/solr/multicore.xml
org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
java.lang.ExceptionInInitializerError
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:104)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
Re: ranking on Multivalued fields
Thanks Otis,

I am a first-time user of SOLR. I understand that my problem calls for a redesign of the document structure. However, using CatX and Cat-X-Score is not simple because cat is not a fixed set; the number of values X can take is not predetermined. However, I think dynamic fields might be helpful. If you have any insights please share.

thanks again.
umar

On 3/12/08, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Umar,
>
> The notion of "subfield" does not exist in Solr (or am I living under a
> rock?).
> Thus, val 1 http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Umar Shah <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
>
> Sent: Saturday, March 8, 2008 7:03:32 AM
> Subject: Re: ranking on Multivalued fields
>
> Hi Otis,
>
> thanks for the reply,
>
> consider a multivalued field name cat
>
> --other fields
>
> val 1 score2
> val 3
>
> > As for your second question, just add category:X to your query and
> you'll
> > get matches ordered/ranked by score by default.
> >
> > Otis
> >
> >
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> > ----- Original Message ----
> > From: Umar Shah <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Friday, March 7, 2008 1:17:35 AM
> > Subject: ranking on Multivalued fields
> >
> > Hi,
> >
> > I have a problem where i want to rank multivalued fields
> >
> > suppose a multivalued field "category" having associated subfield
> "score".
> > First Is it possible to have a subfield in the multivalued field?
> > Second I want to get the documents ranked with the highest score say for
> > the
> > category:X
> >
> > thanks
> > Umar Shah
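
One way to picture the per-category score field together with dynamic fields is the sketch below, in plain Python. It only builds field names and request parameters and never talks to a Solr server. The underscore naming `cat_<id>_score`, the idea that a dynamic field pattern such as `*_score` would be declared in schema.xml to cover it, and all function names are assumptions made for this example, not something confirmed in the thread. Category ids containing spaces or query syntax characters would need escaping, which is not handled here.

```python
from urllib.parse import urlencode


def score_field(category_id):
    """Hypothetical per-category score field name, e.g. 'cat_3_score'."""
    return "cat_%s_score" % category_id


def document_for(doc_id, scores):
    """Build one flat document: the multivalued 'cat' field lists the
    categories, and each category gets its own score field."""
    doc = {"id": doc_id, "cat": sorted(scores)}
    for category_id, score in scores.items():
        doc[score_field(category_id)] = score
    return doc


def category_query_params(category_id, rows=10):
    """Restrict to one category and sort by that category's score field."""
    return {
        "q": "cat:%s" % category_id,
        "sort": "%s desc" % score_field(category_id),
        "rows": rows,
    }


doc = document_for("d1", {"1": 2, "3": 5})
print(doc)
print(urlencode(category_query_params("3")))
```

Because the field name is computed from the category id at request time, the set of categories does not have to be known when the schema is written, which is the concern Umar raises.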