Question: Pagination with multi index box
When using multiple index boxes, how do I paginate correctly with results sorted by score? For example, I want to query "search" across 60 index boxes and sort by score. I don't know the number found on each index box, since each one holds different content. To guarantee 10 correctly score-sorted pages (10 results per page), I think each Solr request needs start=0 and rows=100. That gives 60*100=6000 results, which I sort, keeping the top 100 in a cache. It is very slow, although it does guarantee 10 correctly sorted pages. Any idea how to fix it so that it is both fast and correct? -- regards jl
NumberFormat exception when trying to use recip function query
Hi, I am getting the following exception when I try to run any query:

java.lang.NumberFormatException: empty String
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
  at java.lang.Float.parseFloat(Float.java:394)
  at org.apache.solr.search.QueryParsing$StrParser.getFloat(QueryParsing.java:478)
  at org.apache.solr.search.QueryParsing.parseValSource(QueryParsing.java:526)
  at org.apache.solr.search.QueryParsing.parseFunction(QueryParsing.java:579)
  at org.apache.solr.util.SolrPluginUtils.parseFuncs(SolrPluginUtils.java:519)
  at org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:321)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)

The function query field is -

recip(popularityRank, 1, 1000, 1000)^0.5recip(rord(creationDate),1,1000,1000)^ 0.3

relevant field definitions from schema are:

I have checked that there is no document that has popularityRank as empty or null. (I ran an update query to set it to a large number when popularityRank was empty or null.) If I change the function to rord(popularityRank), the queries start working. Any clue what else I could do to debug this?

Thanks, mekin -- My company - http://ugenie.com My Blog - http://mekin.livejournal.com/ My linkedin URL - http://www.linkedin.com/in/mekin
Feature Request: Multiple default search fields
The default search field is really handy. It helps simplify the query, and thus simplify the application using solr. My understanding is that solr only allows one default search field. It would be useful to allow multiple default fields, and maybe also to specify a global field boost in the schema file, as opposed to on a per-document basis at post time. For example, article title can be given a higher boost factor than article content. -- Best regards, Jack
Re: Feature Request: Multiple default search fields
On May 14, 2007, at 12:38 PM, Jack L wrote:
> The default search field is really handy. It helps simplify the query, and thus simplify the application using solr. My understanding is that solr only allows one default search field. It would be useful to allow multiple default fields, and maybe also to specify a global field boost in the schema file, as opposed to on a per-document basis at post time. For example, article title can be given a higher boost factor than article content.

For your first issue, use a copyField to copy all the text you want as default to a default search field.

For the second, have you looked at http://wiki.apache.org/solr/DisMaxRequestHandler ? You set up boosts per field in solrconfig.xml that way.
RE: Requests per second/minute monitor?
I've needed similar logged information recently and I looked at the code and had a few questions: Why does SolrCore.setResponseHeaderValues(...) set the QTime (and other response header options) instead of having it as a function of RequestHandlerBase? If things were tracked in the RequestHandlers you could add timing information there: avg query time, etc. I know some people have argued that you can do that with logs but being able to pull that info live via JMX/stats.jsp would make monitoring much cleaner in environments with multiple machines on different networks. If things are tracked in the handlers then people can add more statistics easily to both query response headers and overall via custom handlers. I'm happy to make the changes and supply a patch to move the logic as well as adding a few simple metrics unless enough people on this thread really feel that it's always better to do it with log files and postmortem math. - will
Re[3]: Multiple fq fields in URL
: q=samsung+camera
:
: And if samsung is mandatory, the query will be like this: (or not:)
:
: q=+samsung+camera
:
: And the first + will be interpreted as mandatory flag?

No. bottom line, forget all about URLs and URL escaping.

step #1: understand the Lucene query syntax...
http://lucene.apache.org/java/docs/queryparsersyntax.html
in that syntax, this says samsung is mandatory and camera is optional...
+samsung camera

step #2: use the admin form in Solr to type in queries, and check the debug enable option to see exactly what query structures you are getting at the bottom of your results...
http://localhost:8983/solr/admin/form.jsp

step #3: only after you are sure you understand the syntax, and what result you are getting as a result, should you look at the URL to see how the Lucene query syntax is being URL escaped.

Solr doesn't do anything magic with the URL, it doesn't do any special Solr specific parsing ... the URL must be legal, and it must be valid, it will be parsed/unescaped just like any other CGI/form style URL .. and then the args will be interpreted.

I've updated the wiki page that started this thread to try and eliminate any ambiguity about URL escaping...
http://wiki.apache.org/solr/CommonQueryParameters#fq

-Hoss
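To make step #3 concrete, here is a small sketch using only Python's standard library (not anything Solr-specific). It shows that the Lucene query `+samsung camera` has to be URL escaped before it goes into the `q` parameter, and that a hand-written `q=+samsung+camera` decodes to a leading space rather than a mandatory flag.

```python
from urllib.parse import urlencode, parse_qs

# The Lucene query we actually mean: samsung mandatory, camera optional.
lucene_query = "+samsung camera"

# Standard form-style escaping: '+' becomes %2B, the space becomes '+'.
query_string = urlencode({"q": lucene_query})

# What a CGI/form-style parser gets back from the escaped URL.
decoded = parse_qs(query_string)["q"][0]

# What the same parser gets from the hand-written, unescaped version.
naive_decoded = parse_qs("q=+samsung+camera")["q"][0]

print(query_string)
print(repr(decoded))
print(repr(naive_decoded))
```

The unescaped form loses the `+` entirely, which is why it is easier to get the query right in the admin form first and only then look at how it is escaped.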
Re: Question: Pagination with multi index box
On 14-May-07, at 1:35 AM, James liu wrote:
> if use multi index box, how to pagination with sort by score correctly? for example, i wanna query "search" with 60 index box and sort by score. i don't know the num found from every index box which have different content. if promise 10 page with sort score correctly, i think solr 's start is 0, and rows is 100.(10 result per page) 60*100=6000, sort it and get top 100 to cache. it is very slow although it promise 10 page with sort score correctly.

With few index partitions, it is sufficient to ask for startAt + numNeeded docs from each partition and sort globally. Normally, if you wanted 10 for the first page, you would ask for 10 from each server and cache the remainder. It is better to ask for more later if the user asks for page ten.

When you get up to 60 partitions, you should make it a multi-stage process. Assuming your partitions are disjoint and evenly distributed, estimate the number of documents that will appear in the final result from each. Double or triple that (and put a minimum threshold), try to assemble the number of documents you require, and if one partition "runs out" of docs before it is done, request a new round.

-Mike
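The single-stage approach for a small number of partitions can be sketched as follows. This is only an illustration: `merge_page` and the `(score, doc_id)` tuple layout are made-up names, and it assumes each partition has already returned its top `page * per_page` hits (or all of its hits, if it has fewer), sorted by descending score.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical layout for this sketch


def merge_page(partition_hits: List[List[Hit]], page: int, per_page: int = 10) -> List[Hit]:
    """Merge per-partition hits globally and return one page of results.

    Each inner list must contain that partition's top page*per_page hits
    (startAt + numNeeded), or every hit it has if it has fewer.
    """
    needed = page * per_page
    all_hits = (hit for hits in partition_hits for hit in hits)
    # nlargest returns the best `needed` hits in descending score order.
    merged = heapq.nlargest(needed, all_hits)
    return merged[(page - 1) * per_page:needed]


partitions = [
    [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")],
    [(0.8, "b1"), (0.7, "b2")],
    [(0.6, "c1")],
]
print(merge_page(partitions, page=1, per_page=2))
print(merge_page(partitions, page=2, per_page=2))
```

With 60 partitions this fetches far more documents than are ever shown, which is the motivation for the multi-stage process described above.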
Re: solr for corpus?
Hi matej,

since I didn't see anyone answering your question yet, I'll have a go at it, but I'm not one of the Solr developers, I've just used it so far and am very happy with it. I use it for searching literary texts, storing information from a SQL database in the Solr documents as metadata for the texts.

[EMAIL PROTECTED] wrote:
> i test solr as one of potential tools for the purpose of building a linguistic corpus. i'd like to have your opinion, to which extent it would be a good choice. the specifics, which i find deviate from the typical use of solr, are:
>
> 1. basic unit is a text (of a book, of a newspaper-article, etc.), with a bibliographic header. looking at the examples of the solr-tutorial and the central concept of the "field", i am a bit confused how to map these on one another, ie would the whole text be one field "text" and the bibheader-items individual fields?

Yes, you could do that. What I did was: add the text as a whole in one field, add each chapter in its own field, and add metadata fields from a SQL database for each title (e.g. year=1966, author.name=Some one, author.placeofbirth=Somewhere). Basically, everything you want to explicitly search for/in you put in a separate field.

> 2. real full-text (no stop-words, really every word is indexed, possibly even all the punctuation)

Shouldn't be a problem I think.

> 3. Lemma and PoS-tag (Part-of-Speech tag) or generally additional keys on the word-level, ie for every word we also have its lemma-value and its PoS. In dedicated systems, this is implemented either as verticale (each word in one line): word lemma pos ... or in newer systems with xml-attributes: trees
> Important is, that it has to be possible to mix this various layers in one query, eg: "word(some) lemma(nice) pos(Noun)"
> This seems to me to be the biggest challenge for solr.

I'm not 100% sure what you are trying to do here, sorry.

> 4. indexing/searching-ratio: corpus is very static: the selection of texts changes perhaps once a year (in production environment), so it doesn't really matter how long the indexing takes. Regarding the speed the emphasis is on the searches, which have to be "fast", exact and the results have to be further processable (kwic-view, thinning the solution, sorting, export, etc.).

kwic-view: possible (though it cuts off searching the text for keywords after 50Kb. Actually, Lucene does that and it is configurable, but it can be annoying, so you might have to hack that if you find that Solr doesn't return a kwic-index for a hit. But maybe I'm not using Solr the right way ;-) ). Sorting: possible. Export: possible. Not sure what you mean here, but Solr just returns an XML document that you can process any way you like.

> "Fast" is important also for more complex queries (ranges, boolean operators and prefixes mixed) and i say 10-15 seconds is the upper limit, which should be rather an exception to the rule of ~ 1 second.
>
> 5. also to regard the size: we are talking of multiples of 100 millions of tokens. the canonic example British National Corpus is 100 million, there are corpora with 2 billions tokens

That's a lot of text. I find Solr performs very well, but I can't guarantee you that Solr will work in your case; other more knowledgeable people might be able to, though.

Good luck with your decision making!

Kind regards, Huib Verweij.
RE: solr for corpus?
Regarding the Lemma and PoS-tag requirement: you might handle this by inserting each word as its own document, with "lemma", "pos", and "word" fields, thereby allowing you lots of search flexibility. You could also include ID fields for the item and (if necessary) part (chapter etc.) and use these as facets, allowing you to group results by the items that contain them. Your application would have to know how to use the item ID value to retrieve the full item-level record.

These word-level records could live in a separate index or in the main index (since there are no required fields in Solr, you can have entirely different record structures in a single index; you just have to structure your queries accordingly). The problem will be that because your word-level entries are separate from your item-level entries, you'll have to include in the word-level entries any item-level fields that you want to be able to use in word-level queries (e.g. if you wanted to be able to limit a lemma search by date).

The alternative would be to insert the lemma/pos/word entries in a multivalued string field and come up with more complex wildcard query structures to get at them. Apparently you can now get queries with leading and trailing wildcards to work, so you should be able to do everything you need, but I don't know how the performance will be.

All the best, Peter

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 12, 2007 11:28 AM
To: solr-user@lucene.apache.org
Subject: solr for corpus?

i test solr as one of potential tools for the purpose of building a linguistic corpus. i'd like to have your opinion, to which extent it would be a good choice. the specifics, which i find deviate from the typical use of solr, are:

1. basic unit is a text (of a book, of a newspaper-article, etc.), with a bibliographic header. looking at the examples of the solr-tutorial and the central concept of the "field", i am a bit confused how to map these on one another, ie would the whole text be one field "text" and the bibheader-items individual fields?

2. real full-text (no stop-words, really every word is indexed, possibly even all the punctuation)

3. Lemma and PoS-tag (Part-of-Speech tag) or generally additional keys on the word-level, ie for every word we also have its lemma-value and its PoS. In dedicated systems, this is implemented either as verticale (each word in one line): word lemma pos ... or in newer systems with xml-attributes: trees
Important is, that it has to be possible to mix this various layers in one query, eg: "word(some) lemma(nice) pos(Noun)"
This seems to me to be the biggest challenge for solr.

4. indexing/searching-ratio: corpus is very static: the selection of texts changes perhaps once a year (in production environment), so it doesn't really matter how long the indexing takes. Regarding the speed the emphasis is on the searches, which have to be "fast", exact and the results have to be further processable (kwic-view, thinning the solution, sorting, export, etc.). "Fast" is important also for more complex queries (ranges, boolean operators and prefixes mixed) and i say 10-15 seconds is the upper limit, which should be rather an exception to the rule of ~ 1 second.

5. also to regard the size: we are talking of multiples of 100 millions of tokens. the canonic example British National Corpus is 100 million, there are corpora with 2 billions tokens

thank you in advance
regards matej
Re: NumberFormat exception when trying to use recip function query
: The function query field is -
:
: recip(popularityRank, 1, 1000,
: 1000)^0.5recip(rord(creationDate),1,1000,1000)^
: 0.3
:

off the top of my head, i'd suggest you:

1) verify there is some whitespace between the boost of the popularity recip function and the date recip function
2) eliminate the space inside the recip functions
3) verify that there is no space between either recip function and its boost

...and see if that works...

recip(popularityRank,1,1000,1000)^0.5 recip(rord(creationDate),1,1000,1000)^0.3

-Hoss
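The three suggestions above can be applied mechanically before the string is sent to Solr. The helper below is a hypothetical client-side sketch, not part of Solr: `normalize_boost_functions` is an invented name, and it only handles the simple case where every function carries a numeric `^boost`.

```python
import re


def normalize_boost_functions(raw: str) -> str:
    """Tidy a whitespace-sensitive list of boosted function queries.

    1. Drop all whitespace (inside the functions and around the '^').
    2. Re-insert a single space after each numeric boost that is
       immediately followed by the start of another function name.
    """
    compact = re.sub(r"\s+", "", raw)
    return re.sub(r"(\^\d+(?:\.\d+)?)(?=[A-Za-z_])", r"\1 ", compact)


broken = ("recip(popularityRank, 1, 1000, 1000)^0.5"
          "recip(rord(creationDate),1,1000,1000)^ 0.3")
print(normalize_boost_functions(broken))
```

Functions listed without a boost would be glued together by step 1, so a real helper would need a proper parser; the sketch only covers the shape of the string in this thread.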
Re: Null pointer exception
: I have tried indexing from the exampledocs which is just sitting in my
: user home directory but now I get a null pointer exception after
: running:

just to clarify: are you using solr 1.1 or a nightly build? did you check the log file to ensure that there are no exceptions when you start tomcat? are you using the example solrconfig.xml and schema.xml? have you tried doing a search first without indexing any docs to see if that executes and (correctly) returns 0 docs?

If i had to guess, i'd speculate that you aren't correctly using a system prop or JNDI to point Solr at your solr home dir, so it's not finding the configs; either that, or you've modified the configs and there is a syntax error -- either way there should be an exception when the server starts up, well before you update any docs.

-Hoss
Re: solr for corpus?
: 3. Lemma and PoS-tag (Part-of-Speech tag) or generally additional keys
: on the word-level,
: ie for every word we also have its lemma-value and its PoS.
: or in newer systems with xml-attributes:
: trees
:
: Important is, that it has to be possible to mix this various layers in
: one query, eg:
: "word(some) lemma(nice) pos(Noun)"

the best way to approach this would probably be to preprocess the data and use a custom analyzer ... send it to solr with all of the info encoded in each word, (ie: trees__tree_Noun) and then have a custom indexing analyzer create multiple tokens in each position with an easy way to distinguish whether a token is a word, the Lemma for a word, or the POS for a word (ie: the regular word plain, the Lemma prefixed by two underscores, and the POS prefixed by a single underscore)

then at query time if you know you are looking for the phrase "some nice trees" you would search for "some nice trees" but if you are looking for the word "some" followed by a word whose lemma is "nice" followed by any Noun, you would search for "some __nice _Noun"

: This seems to me to be the biggest challenge for solr.

yeah ... neither Solr nor Lucene really attempt to tackle complex query forms like this ... but Lucene has recently added a Token Payload mechanism in an attempt to make queries like this easier (allowing annotation of the actual terms that can be queried instead of needing to create artificial terms in identical positions)

: corpus is very static: the selection fo texts changes perhaps once a
: year (in production environment),
: so it doesnt really matter how long the indexing takes. Regarding the
: speed the emphasis is on the searches,
: which have to be "fast", exact and the results have to be further
: processable (kwic-view, thinning the solution, sorting, export, etc.).
: "Fast" is important also for more complex queries (ranges, boolean
: operators and prefixes mixed)

these things should all be decent, especially since your index will be fairly static so you don't have to worry about 'warming' FieldCaches for sorting etc

something you might want to consider if you find query speeds unacceptable on your full corpus with stop words left in would be to sacrifice disk for speed by creating another field where the stop words are removed and using it as much as possible (ie: anytime a query doesn't care about stop words). ... but i wouldn't worry about that unless you find it's actually a problem. i've yet to see a complaint from anyone that Solr isn't fast enough unless they are doing heavy faceting, or updating their index so frequently that the caches can't be used.

-Hoss
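The token-stacking idea can be illustrated without any Lucene code. The sketch below is plain Python, not an actual analyzer: `expand_tokens` and `phrase_matches` are invented names, the `Det`/`Adj` tags are made-up examples, and it assumes the word itself never contains a double underscore. It expands each `word__lemma_POS` item into three tokens sharing one position and then checks a phrase against those positions.

```python
from typing import List, Set


def expand_tokens(encoded_text: str) -> List[Set[str]]:
    """Turn 'trees__tree_Noun' style input into one token set per position."""
    positions = []
    for item in encoded_text.split():
        word, rest = item.split("__", 1)   # plain word, then 'lemma_POS'
        lemma, pos = rest.rsplit("_", 1)   # POS is the part after the last '_'
        positions.append({word, "__" + lemma, "_" + pos})
    return positions


def phrase_matches(positions: List[Set[str]], query: str) -> bool:
    """True if the query terms match consecutive positions, in order."""
    terms = query.split()
    for start in range(len(positions) - len(terms) + 1):
        if all(term in positions[start + i] for i, term in enumerate(terms)):
            return True
    return False


indexed = expand_tokens("some__some_Det nice__nice_Adj trees__tree_Noun")
print(phrase_matches(indexed, "some nice trees"))
print(phrase_matches(indexed, "some __nice _Noun"))
print(phrase_matches(indexed, "some __tree"))
```

Because all three layers occupy the same position, an ordinary phrase query can mix word, lemma, and POS terms freely, which is the property the prefix convention is meant to provide.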
RE: Null pointer exception
Thanks a lot for your reply, Chris.

I am running v1.1.0. If I do a search (from the admin page), it throws the following exception:

java.lang.RuntimeException: java.io.IOException: /var/www/html/solr/data/index not a directory

There are no exceptions on starting Tomcat, only one warning regarding the JMS client lib not being found (related to Cocoon). I have named a file solr.xml in my $TOMCAT_HOME/conf/Catalina/localhost directory containing the following:

I am using the example configs (unmodified).

Thanks again
Gary

Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946
RE: Null pointer exception
: I am running v1.1.0. If I do a search (from the admin page), it throws : the following exception: : : java.lang.RuntimeException: java.io.IOException: : /var/www/html/solr/data/index not a directory does /var/www/html/solr/data/ exist? ... if so does the effective userID for tomcat have permission to write to it? if not does the effective userID for tomcat have permission to write to /var/www/html/solr/ ? -Hoss
Re: Question: Pagination with multi index box
2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
> With few index partitions, it is sufficient to ask for startAt + numNeeded docs from each partition and sort globally. Normally if you wanted 10 for the first page, you would ask for 10 from each server and cache the remainder. It is better to ask for more later if the user asks for page ten.
>
> When you get up to 60 partitions, you should make it a multi stage process. Assuming your partitions are disjoint and evenly distributed, estimate the number of documents that will appear in the final result from each.

Yes, the partitions are distributed.

> Double or triple that (and put a minimum threshold), try to assemble the number of documents you require, and if one partition "runs out" of docs before it is done, request a new round.

I don't know what you mean by "runs out". One user request generates 60 partition requests, and they run in parallel, so I don't know each partition's status before they are all done.

To guarantee 10 pages of results correctly sorted by score, the only way seems to be to get 100 results (rows=100) from each partition, but that is very slow. Now I want to find a way to get results correctly sorted by score and still search fast.

> -Mike

Thanks, Mike, but that is not what I want.

-- regards jl
Re: Question: Pagination with multi index box
If I set rows=(page-1)*10, it will lose more of the results that match the query. How should I set start when paginating?

-- regards jl
RE: Null pointer exception
Hi Chris

The /var/www/html/solr/data/ directory did exist. I tried opening up permissions completely for testing but no luck (the tomcat user had write permissions).

I decided to trash the whole installation and start again. I downloaded last night's build and untarred it. Put the .war into $TOMCAT_HOME/webapps. Copied the example/solr directory as /var/www/html/solr. No JNDI file this time, just updated solrconfig to read /var/www/html/solr as my data.dir.

I can access the admin page, but when I try an index action from the command line, or a search from the admin page, I get something like:

"The requested resource (/solr/select/) is not available"

I have other apps running under tomcat okay; it seems like it can't find the lib .jars or can't access the classes within them?

Stuck...

Cheers
Gary

Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946
Re: Question: Pagination with multi index box
On 14-May-07, at 6:49 PM, James liu wrote:
> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>> When you get up to 60 partitions, you should make it a multi stage process. Assuming your partitions are disjoint and evenly distributed, estimate the number of documents that will appear in the final result from each.
>
> yes, partitions distributed.
>
>> Double or triple that (and put a minimum threshold), try to assemble the number of documents you require, and if one partition "runs out" of docs before it is done, request a new round.
>
> i don't know what u mean "runs out"

Say you request 5 docs from each of 60 partitions, and are interested in docs 1-10. If, sorted by score, the docs come from:

p1, p2, p1, p1, p3, p4, p1, p1

Then p1 has "run out" at n=8, and there is no way to be sure if the remaining two needed docs come from p1 or somewhere else. So you have to now request at least two additional documents from p1.

> one user request will generate 60 partitions request.
> they work in parallel.
> so i don't know every partition's status before they done.

Normally, you would wait for them to finish, and execute a subsequent request if more docs are needed.

-Mike
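The multi-round process with the "runs out" check can be sketched as below. This is an illustrative sketch only: `top_n_multistage`, the `fetch(partition, start, rows)` callback, and the `(score, doc_id)` tuples are all hypothetical, and partitions are queried sequentially here for brevity where a real client would issue each round's requests in parallel and wait for them to finish.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]                       # (score, doc_id), hypothetical
Fetch = Callable[[int, int, int], List[Hit]]  # (partition, start, rows) -> hits


def top_n_multistage(fetch: Fetch, num_partitions: int, n: int,
                     first_round_rows: int) -> List[Hit]:
    """Assemble the global top n, asking partitions for more only when needed."""
    fetched: Dict[int, List[Hit]] = {p: [] for p in range(num_partitions)}
    exhausted = set()   # partitions that returned fewer docs than requested
    to_query = {p: first_round_rows for p in range(num_partitions)}
    top: List[Hit] = []
    while to_query:
        for p, rows in to_query.items():
            hits = fetch(p, len(fetched[p]), rows)
            fetched[p].extend(hits)
            if len(hits) < rows:
                exhausted.add(p)
        everything = [h for hits in fetched.values() for h in hits]
        top = sorted(everything, reverse=True)[:n]
        to_query = {}
        for p in range(num_partitions):
            if p in exhausted or not fetched[p]:
                continue
            last = fetched[p][-1]
            if last in top:
                # Every doc we hold from p is in the top n: p has "run out".
                position = top.index(last) + 1
                if position < n:
                    # The slots after `position` might belong to unseen p docs.
                    to_query[p] = n - position
    return top


data = {
    0: [(0.99, "a1"), (0.98, "a2"), (0.97, "a3"), (0.96, "a4")],
    1: [(0.50, "b1"), (0.40, "b2")],
    2: [(0.95, "c1"), (0.30, "c2")],
}
calls = []


def fetch_from_memory(partition: int, start: int, rows: int) -> List[Hit]:
    calls.append((partition, start, rows))
    return data[partition][start:start + rows]


result = top_n_multistage(fetch_from_memory, num_partitions=3, n=4, first_round_rows=2)
print(result)
print(calls)
```

To show page 2 with 10 results per page, one would assemble the top 20 this way and display entries 11-20; only the partitions that ran out are asked for more, so most rounds touch a handful of partitions rather than all of them.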
Re: Question: Pagination with multi index box
On 14-May-07, at 7:15 PM, James liu wrote:
> if i set rows=(page-1)*10,,,it will lose more result which fits query.
> how to set start when pagination.

I'm not sure I understand the question. When combining results from partitions, you can't use startAt. You must always assemble the docs from 0 to N for each partition (whether through one request or multiple).

-Mike
Re: Question: Pagination with multi index box
Thanks for your detailed answer, but you ignore "sorted by score".

p1, p2, p1, p1, p3, p4, p1, p1

Maybe their max score is lower than that of docs from p19 or p20, so the result will not be sorted by score correctly. And if the user clicks page 2, how do I show the data? Does p1 start from 10, or do I query the other partitions?

-- regards jl
Documenting function queries [was Re: NumberFormat exception when trying to use recip function query]
> 2) eliminate the space inside the recip functions

This solved it :)

I would like to document this along with a little detail about function queries and, maybe if I get enough time, simple graphs that I created to help people choose the right values to use in the function queries.

I don't see a link from the wiki at the top level - http://wiki.apache.org/solr/

I do see a stub for - http://wiki.apache.org/solr/FunctionQuery - which I can start filling up. The other options are - http://wiki.apache.org/solr/SolrRelevancyCookbook and http://wiki.apache.org/solr/DisMaxRequestHandler

I am inclined to create the FunctionQuery page and add links to it from the other 2 pages. Let me know if you think of a more appropriate place to put this stuff.

Thanks, mekin
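As a starting point for choosing values, the reciprocal function can be tabulated outside Solr. The sketch below assumes the documented definition recip(x,m,a,b) = a/(m*x+b); the Python `recip` function is just a stand-in for illustration, and the sample x values are arbitrary.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Stand-in for Solr's recip(x,m,a,b), assumed to compute a / (m*x + b)."""
    return a / (m * x + b)


# Parameters from the thread: recip(popularityRank,1,1000,1000)
samples = [0, 100, 1000, 9000]
for x in samples:
    print(x, round(recip(x, 1, 1000, 1000), 3))
```

With m=1 and a=b=1000 the value starts at 1.0 for x=0, halves at x=1000, and decays slowly after that; raising a and b together stretches the curve so that larger ranks are still distinguished from one another.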
Re: Question: Pagination with multi index box
for example, i want to query "lucene"; its numFound is 234300, and the results should be sorted by score. in that case, how do you paginate and sort by score?

2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:

> On 14-May-07, at 7:15 PM, James liu wrote:
>
>> if i set rows=(page-1)*10, it will lose more results which fit the query.
>>
>> how to set start when paginating?
>
> I'm not sure I understand the question. When combining results from partitions, you can't use startAt.

if i don't use startAt, how do i choose rows so that the user can still find the results?

> You must always assemble the docs from 0 to N for each partition (whether through one request or multiple).

if rows is bigger it will be slow; if it is smaller it will lose data and the sort by score will not be correct.

> -Mike
>
>> 2007/5/15, James liu <[EMAIL PROTECTED]>:
>>
>>> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>>>
>>>> On 14-May-07, at 1:35 AM, James liu wrote:
>>>>
>>>>> if use multi index box, how to pagination with sort by score correctly?
>>>>>
>>>>> for example, i wanna query "search" with 60 index box and sort by score.
>>>>>
>>>>> i don't know the num found from every index box which have different content.
>>>>>
>>>>> if promise 10 page with sort score correctly, i think solr's start is 0, and rows is 100. (10 result per page)
>>>>>
>>>>> 60*100=6000, sort it and get top 100 to cache.
>>>>>
>>>>> it is very slow although it promise 10 page with sort score correctly.
>>>>
>>>> With few index partitions, it is sufficient to ask for startAt+numNeeded docs from each partition and sort globally. Normally if you wanted 10 for the first page, you would ask for 10 from each server and cache the remainder. It is better to ask for more later if the user asks for page ten.
>>>>
>>>> When you get up to 60 partitions, you should make it a multi stage process. Assuming your partitions are disjoint and evenly distributed, estimate the number of documents that will appear in the final result from each.
>>>
>>> yes, the partitions are distributed.
>>>
>>>> Double or triple that (and put a minimum threshold), try to assemble the number of documents you require, and if one partition "runs out" of docs before it is done, request a new round.
>>>
>>> i don't know what u mean by "runs out"
>>>
>>> one user request will generate 60 partition requests.
>>>
>>> they work in parallel.
>>>
>>> so i don't know every partition's status before they are done.
>>>
>>> To promise 10 pages of results sorted by score correctly, the only way seems to be to get 100 results (rows=100) from each partition. but it is very slow.
>>>
>>> now i wanna find a way to get results sorted by score correctly and search fast.
>>>
>>>> -Mike
>>>
>>> Thks Mike. But it is not what i want.
>>>
>>> --
>>> regards
>>> jl
>>
>> --
>> regards
>> jl

--
regards
jl
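The sizing rule quoted above (estimate each partition's share of the final result, double or triple it, and apply a minimum threshold) can be written down in a few lines. This is a rough sketch in Python, not anything Solr provides: the function name rows_per_partition and the default values safety_factor=3 and minimum=5 are assumptions chosen for illustration, and it relies on the stated assumption that the partitions are disjoint and evenly distributed.

```python
def rows_per_partition(needed: int, partitions: int,
                       safety_factor: int = 3, minimum: int = 5) -> int:
    """First-round rows to request from each partition.

    needed        -- docs wanted in the merged result (e.g. 100 for 10 pages)
    partitions    -- number of index boxes queried in parallel
    safety_factor -- "double or triple" the even-share estimate
    minimum       -- floor so that tiny estimates still fetch a few docs
    """
    scaled = safety_factor * needed
    estimate = (scaled + partitions - 1) // partitions  # ceiling division
    return max(minimum, estimate)


rows = rows_per_partition(needed=100, partitions=60)
print(rows)        # 5
print(rows * 60)   # 300 docs fetched in round one, instead of 60*100=6000
```

If some partition supplies all of its fetched docs before the merged result is complete, a second round is needed for that partition only.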
Re: Documenting function queries [was Re: NumberFormat exception when trying to use recip function query]
: I would like to document this along with a little detail about function
: queries & may be if I get enough time, simple graphs that I created to help
: people choose the right values for using in the function queries.

that would be *awesome*

: I do see a stub for - http://wiki.apache.org/solr/FunctionQuery

there's no actual stub article, but the wiki probably shows you a link there from somewhere that someone typed FunctionQuery (since java class names look like wikiwords), so there's no particular reason to fill in that page ... but it's pretty much the best possible name, so by all means start using it.

: I am inclined to creating the FunctionQuery page and adding links to it from
: the other 2 pages.

sounds like a good plan to me.

-Hoss
Re: Question: Pagination with multi index box
On 14-May-07, at 8:55 PM, James liu wrote:

> thks for your detail answer.
>
> but u ignore "sorted by score"
>
> p1, p2, p1, p1, p3, p4, p1, p1
>
> maybe their max score is lower than from p19, p20.

I'm not ignoring it: I'm implying that the above is the correct descending score-sorted order. You have to perform that sort manually.

> so it will not sorted by score correctly.
>
> and if user click page 2 to see, how to show data?
>
> p1 start from 10 or query other partitions?

Assemble results 1 through 20, then display 11-20 to the user.

-Mike

> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>
>> On 14-May-07, at 6:49 PM, James liu wrote:
>>
>>> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>>>
>>>> On 14-May-07, at 1:35 AM, James liu wrote:
>>>>
>>>> When you get up to 60 partitions, you should make it a multi stage process. Assuming your partitions are disjoint and evenly distributed, estimate the number of documents that will appear in the final result from each.
>>>
>>> yes, partitions distributed.
>>>
>>>> Double or triple that (and put a minimum threshold), try to assemble the number of documents you require, and if one partition "runs out" of docs before it is done, request a new round.
>>>
>>> i don't know what u mean "runs out"
>>
>> Say you request 5 docs from each of 60 partitions, and are interested in docs 1-10. If, sorted by score, the docs come from:
>>
>> p1, p2, p1, p1, p3, p4, p1, p1
>>
>> Then p1 has "run out" at n=8, and there is no way to be sure if the remaining two needed docs come from p1 or somewhere else. So you have to now request at least two additional documents from p1.
>>
>>> one user request will generate 60 partitions request.
>>>
>>> they work in parallel.
>>>
>>> so i don't know every partition's status before they done.
>>
>> Normally, you would wait for them to finish, and execute a subsequent request if more docs are needed.
>>
>> -Mike
>
> --
> regards
> jl
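The "run out" rule described above can be sketched as a merge that refills a partition only when its fetched docs are used up. This is a minimal Python sketch, not Solr code: the name top_n_across_partitions, the (score, doc id) tuple shape and the fetch(partition, start, rows) callback are all hypothetical stand-ins for one request to one index box, and the sketch assumes scores from different boxes are directly comparable. For simplicity it calls fetch one partition at a time; a real client would send each round in parallel.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Tuple[float, str]                       # (score, doc id), hypothetical shape
Fetch = Callable[[str, int, int], List[Doc]]  # fetch(partition, start, rows)


def top_n_across_partitions(fetch: Fetch, partitions: Sequence[str],
                            n: int, rows: int) -> List[Doc]:
    """Return the n best docs overall, best first.

    fetch(p, start, rows) must return up to `rows` hits of partition p
    beginning at offset `start`, already sorted by descending score.
    """
    buffers: Dict[str, List[Doc]] = {p: [] for p in partitions}
    fetched: Dict[str, int] = {p: 0 for p in partitions}
    exhausted: Dict[str, bool] = {p: False for p in partitions}

    def refill(p: str, want: int) -> None:
        docs = fetch(p, fetched[p], want)
        fetched[p] += len(docs)
        buffers[p].extend(docs)
        if len(docs) < want:          # fewer than asked for: nothing left in p
            exhausted[p] = True

    for p in partitions:              # round one: same small request everywhere
        refill(p, rows)

    result: List[Doc] = []
    while len(result) < n:
        # A partition with an empty buffer that may still hold docs has
        # "run out": its next doc could outrank everything still buffered.
        for p in partitions:
            if not buffers[p] and not exhausted[p]:
                refill(p, n - len(result))
        live = [p for p in partitions if buffers[p]]
        if not live:
            break                     # fewer than n matching docs in total
        best = max(live, key=lambda p: buffers[p][0][0])
        result.append(buffers[best].pop(0))
    return result


# In-memory stand-in for four index boxes, arranged so that the merged order
# starts p1, p2, p1, p1, p3, p4, p1, p1 as in the message above.
index = {
    "p1": [(0.99, "a1"), (0.97, "a2"), (0.96, "a3"), (0.93, "a4"),
           (0.92, "a5"), (0.91, "a6"), (0.90, "a7")],
    "p2": [(0.98, "b1"), (0.50, "b2")],
    "p3": [(0.95, "c1"), (0.40, "c2")],
    "p4": [(0.94, "d1"), (0.30, "d2")],
}


def fetch_from_memory(p: str, start: int, rows: int) -> List[Doc]:
    return index[p][start:start + rows]


page_one = top_n_across_partitions(fetch_from_memory, list(index), n=10, rows=5)
print([doc for _, doc in page_one])

# Page 2: assemble results 1 through 20, then show 11-20.
page_two = top_n_across_partitions(fetch_from_memory, list(index), n=20, rows=5)[10:20]
print([doc for _, doc in page_two])
```

With rows=5, p1 is used up after the eighth merged doc, so only p1 is asked for two more docs; the other boxes are not queried again.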
Re: Question: Pagination with multi index box
2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:

> On 14-May-07, at 8:55 PM, James liu wrote:
>
>> thks for your detail answer.
>>
>> but u ignore "sorted by score"
>>
>> p1, p2, p1, p1, p3, p4, p1, p1
>>
>> maybe their max score is lower than from p19, p20.
>
> I'm not ignoring it: I'm implying that the above is the correct descending score-sorted order. You have to perform that sort manually.

i mean merging the results (from 60 partitions) and sorting them, not solr's sort. the results from every box have already been sorted by score.

>> so it will not sorted by score correctly.
>>
>> and if user click page 2 to see, how to show data?
>>
>> p1 start from 10 or query other partitions?
>
> Assemble results 1 through 20, then display 11-20 to the user.

for example, i wanna query "solr":

p1 has 100 results whose scores are bigger than 80
p2 has 100 results whose scores are smaller than 20

so if i use rows=10, the scores are not correct.

if i want to promise 10 pages sorted by score correctly, i have to get 100 results (rows=100) from every box, merge the results, sort them, and finally take the top 100 results.

but it will be very slow.

i don't know how other search engines solve it? maybe they don't sort by score very correctly.

> -Mike

--
regards
jl
Re: Question: Pagination with multi index box
maybe a perfectly correct sort is not very important for full-text search.

2007/5/15, James liu <[EMAIL PROTECTED]>:

> if i wanna promise 10 pages which sort by score correctly, i have to get 100 (rows=100) results from every box, merge the results, sort them, and finally get the top 100 results.
>
> but it will be very slow.
>
> i don't know how other search engines solve it? maybe they don't sort by score very correctly.

--
regards
jl
Re: NumberFormat exception when trying to use recip function query
Done. Please check - http://wiki.apache.org/solr/FunctionQuery and send me your comments (or improve the wiki).

Right now it's more of an aggregation of all the relevant information. I hope people will be able to add notes such as what values to use, pitfalls to avoid, and behaviour in special cases as well. For example, I don't know how these functions deal with missing values.

-mekin

On 5/15/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: The function query field is -
:
: recip(popularityRank, 1, 1000, 1000)^0.5recip(rord(creationDate),1,1000,1000)^ 0.3

off the top of my head, i'd suggest you:

1) verify there is some whitespace between the boost of the popularity recip function and the date recip function
2) eliminate the space inside the recip functions
3) verify that there is no space between either recip function and its boost

...and see if that works...

recip(popularityRank,1,1000,1000)^0.5 recip(rord(creationDate),1,1000,1000)^0.3

-Hoss

--
My company - http://ugenie.com
My Blog - http://mekin.livejournal.com/
My linkedin URL - http://www.linkedin.com/in/mekin
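As a starting point for the "what values to use" notes, here is a small Python sketch that tabulates recip for a few inputs. It relies on the definition recip(x,m,a,b) = a/(m*x+b) from the Solr function query documentation; the Python function below is only a local re-implementation for exploring parameter choices, and the sample ranks are made up.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function as documented for Solr: a / (m*x + b)."""
    return a / (m * x + b)


# With m=1 and a=b=1000 (the values used in this thread) the result starts at
# 1.0 for x=0 and has dropped to half of that at x = b/m = 1000.
for rank in (0, 1000, 9000):
    print(rank, recip(rank, 1, 1000, 1000))
# 0 1.0
# 1000 0.5
# 9000 0.1
```

Raising b (with a kept equal to b) flattens the curve, so large ranks are penalised less; raising m steepens it. How recip behaves when the field has no value is not covered by this sketch.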