How to display search results of solr in to other application.
Hi, I am creating indexes using Solr, which is running on a Jetty server on port 8983, and my application is running on a Tomcat server on port 8080. My problem is that I want to display the search results in my application. I created an AJAX/JavaScript page for parsing the JSON object. Please suggest how I can send my request to the Solr server for a search and get the results back. (My sample HTML file where I parse the JSON data, a small page with a query box and a "Raw JSON String" output area, was stripped by the list.) I suppose I am making a mistake in xmlhttpPost("/solr/db/select").

Thanks and regards
Romi
tika and solr 3.1 integration
Hi, I am trying to integrate Solr 3.1 and Tika (the version that ships with it). When I try to index a few documents with a curl command, I get the error that the attr_meta field is unknown. I checked solrconfig.xml and it looks correct to me; can you please tell me what I am missing? I copied all the jars from contrib/extraction/lib to the solr/lib folder that is in the same place as conf. I am using the default /update/extract request handler that ships with Solr (the XML of its configuration was stripped by the list; it is the stock setup mapping content to text and unknown fields to the ignored_ prefix).

curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"

The response is: Apache Tomcat/6.0.18 - Error report. HTTP Status 400 - ERROR:unknown field 'attr_meta' (The request sent by the client was syntactically incorrect.)

Please note that I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a Windows machine, and calling the program through Solr Cell works fine without any configuration changes.

Thanks
Naveen
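For reference, uprefix=attr_ renames any Tika-extracted field that is missing from the schema to attr_<name> (attr_meta, for example), so the schema must define a dynamic field covering that prefix. A minimal sketch of the line the stock 3.1 example schema.xml carries for this (the type name may differ in a custom schema):

  <dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>

If no attr_* dynamic field exists (or whatever prefix uprefix names), the extract handler rejects the document with exactly this "unknown field" error.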
how to request for Json object
How do I parse JSON through AJAX when the AJAX page is on one server (Tomcat) and the JSON object comes from another server (the Solr server)? I mean, I have to make a request to another server; how can I do it?

- Thanks & Regards
Romi
Standard Request Handler Boosting
I want to know the difference between normal boosting and boosting using a FunctionQuery with the standard request handler. In the example below I want to boost field2 with a higher influence on the score.

Example: field1:... field2:...^boost
Example: field1:... AND _val_:"..."^boost

Regards
Sujatha
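For illustration, with hypothetical field names and values, the two styles look roughly like this:

  q=field1:solr field2:search^4
  q=field1:solr AND _val_:"popularity"^0.5

The first re-weights the text-match score of field2 by 4, so its contribution still depends on tf/idf of the matched terms. The second adds a clause whose score is simply the numeric value of the popularity field (scaled by 0.5), independent of term statistics. So a _val_/FunctionQuery boost lets a document's own field value feed the score, while a plain boost only rescales the relevance of a text match.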
Re: how to request for Json object
AJAX does not allow requests to another domain (the same-origin policy). The only way, unless you do the request server side, is to go through a proxy that hides the origin host, so that the AJAX request sees both servers as the same.

2011/6/2 Romi
> How do I parse JSON through AJAX when the AJAX page is on one server
> (Tomcat) and the JSON object comes from another server (the Solr server)?
> I mean, I have to make a request to another server; how can I do it?
>
> - Thanks & Regards
> Romi
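One rough sketch of such a proxy (assuming an Apache httpd sits in front of the Tomcat application, with mod_proxy and mod_proxy_http enabled; adjust host and port):

  ProxyPass        /solr http://localhost:8983/solr
  ProxyPassReverse /solr http://localhost:8983/solr

The AJAX code then requests /solr/db/select?... on the same origin as the page, and Apache forwards it to the Jetty/Solr instance.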
Re: how to request for Json object
Look at the uploaded file here: it makes a request from my local server for JSON to the server http://api.flickr.com. I just want the same thing: to request JSON from my local server to the Solr server.

http://lucene.472066.n3.nabble.com/file/n3014191/Jquery_Json.html Jquery_Json.html

- Thanks & Regards
Romi
Re: how to request for Json object
Sorry for the inconvenience; please look at this file:
http://lucene.472066.n3.nabble.com/file/n3014224/JsonJquery.text JsonJquery.text

- Thanks & Regards
Romi
Re: Solr memory consumption
> Hey Denis,
> * How big is your index in terms of number of documents and index size?

5 cores, average 250.000 documents, one with about 1 million (but without text, just int/float fields), one with about 10 million id/name documents, but with n-gram. Size: 4 databases about 1G (sum), 1 database (with n-gram) of 21G. I don't know any other way to search for product names except n-gram =\

> * Is it production system where you have many search requests?

Yes, it depends on the database, but not less than 100 req/sec.

> * Is there any pattern for OOM errors? I.e. right after you start your
> Solr app, after some search activity or specific Solr queries, etc?

No, Java's memory usage just keeps growing until it crashes.

> * What are 1) cache settings 2) facets and sort-by fields 3) commit
> frequency and warmup queries?

All settings are default (as given in trunk / example). Facets are used, sort-by is also used. Commits are divided into 2 groups:
- often but small (last-changed info)
- once per day, the whole database

> etc
> Generally you might want to connect to your jvm using jconsole tool
> and monitor your heap usage (and other JVM/Solr numbers)
> * http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
> * http://wiki.apache.org/solr/SolrJmx#Remote_Connection_to_Solr_JMX
> HTH,
> Alexey
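As a follow-up on the jconsole suggestion: for a remote box the JVM is typically started with the standard JMX system properties, roughly like this (the port is arbitrary, and disabling auth/SSL is only acceptable on a trusted network):

  java -Dcom.sun.management.jmxremote \
       -Dcom.sun.management.jmxremote.port=9010 \
       -Dcom.sun.management.jmxremote.authenticate=false \
       -Dcom.sun.management.jmxremote.ssl=false \
       -jar start.jar

Then point jconsole at host:9010 and watch the heap over time. A sawtooth that keeps trending upward until OOM usually points at caches (filterCache, or the field caches used for sorting and faceting) rather than a plain leak.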
Multilingual text analysis
Hello, some of the analyzers that can be applied to a text field depend on the language of the text being analyzed and can be configured for a concrete language. In my case, the text fields can be in many different languages, but each document also includes a field containing the language of the text fields. Is it possible to configure the analyzers to use the suitable language for each document, based on that language field?

Thanks,

Juan
Re: synonyms problem
On Thu, Jun 2, 2011 at 11:58 AM, deniz wrote:
> Hi all,
>
> here is a piece from my solrconfig:
[...]
> but somehow synonyms are not read... I mean there is no match when i use a
> word in the synonym file... any ideas?
[...]

Please provide further details, e.g.: is your field in schema.xml using this fieldType, one example line from the synonyms.txt file, how are you searching, what results you expect to get, and what the actual results are.

Also, while this is not the issue here, normally the fieldType "string" is a non-analyzed field, and one would normally use a different fieldType, e.g. "text", for data that are to be analyzed.

Regards,
Gora
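For reference, a minimal sketch of a schema.xml fieldType that actually applies synonyms, along the lines of the stock example schema (adjust the filter chain to taste):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

with synonyms.txt containing lines like "TV, television". Note that the type is solr.TextField, not solr.StrField, and that the SynonymFilterFactory sits inside an index-time analyzer; both points come up again later in this thread.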
Question about sorting by coordination factor
Hi, I am trying to solve a sorting problem using Solr. The sorting requirements are a bit complicated. I have to sort the documents by three different criteria: - First by number of keywords that match (coordination factor) - Then, within the documents that match the same number of keywords, sort first the documents that match a user value (country) and then the rest. - Then within those two blocks, sort by a document value (popularity). I have managed to make the second and third criteria to work, with a query like this: http://localhost:8983/solr/select/?q=description%3Afootball&version=2.2&start=0&rows=10&indent=on&qq=country_uk:true&sort=map%28query%28$qq,-1%29,0,999,1%29%20desc,popularity%20desc This gets with the query function a positive value for the documents that match the country, and a negative for the ones that don't, and then maps those ones to 1, so I have two blocks of documents with sorting value of 1 and -1, which works for me cause ties are then sorted by popularity. But as you see, this is only searching for 1 keyword. My problem comes with the first requirement when we search for more than one keyword, because as I understand, I would like to sort by the coordination factor, which is the number of query keywords that each document matches. The problem is that there's no Function Query I can use to get that value, so I don't know how to proceed. I was trying to understand if there was a way to split the regular score into sets which should mean that the same number of keywords was matched, but the score depends on different things, and the range of values can be arbitrary, so I'm not able to make such a function. Is there any solution to this? Thanks, Jesus.
Sorting algorithm
Hi,

I want a sorting function query similar to the way reddit handles its ranking. I have the date stored in a TrieDate field (precisionStep="6", positionIncrementGap="0"). I also have the number of Twitter and Facebook shares, and reads from our site, stored. Below is the pseudo-code I want to work out.

var t = (CreationDate - 1131428803) / 1000;
var x = FacebookCount + TwitterCount + VoteCount - DownVoteCount;
var y = 0;
if (x > 0) {
  y = 1;
} else if (x == 0) {
  y = 0;
} else if (x < 0) {
  y = -1;
}
var z = 1;
var absX = Math.abs(x);
if (absX >= 1) {
  z = absX;
}
var ranking = (Math.log(z) / Math.LN10) + ((y * t) / 45000);

I have no Java experience so I cannot re-write it as a custom function. This is the current query I am trying to use:

http://127.0.0.1:8983/solr/select?q.alt=*:*&fq=content_type:news&start=0&rows=10&wt=json&indent=on&omitHeader=true
&fl=id,name,excerpt,timestamp,domain,source,facebook,twitter,read,imageheight
&defType=dismax
&tt=div(sub(_val_:timestamp,1131428803),1000)
&xx=sub(sum(facebook,twitter,read),0)
&yy=map(query($xx),1,,1,map(query($xx),0,0,0,map(query($xx),-,-1,-1,0)))
&zz=map(abs(query($xx)),-9,0,1)
&sort=sum(div(log(query($zz)),ln(10)),div(product(query($yy),query($tt)),45000)) desc

Currently I am getting errors relating to my date field when trying to convert it from the TrieDate to a timestamp with _val_:MyDateField. I also wanted to know if there is another way to do this, and if my query is even correct.

Thanks in advance

Richard
Re: synonyms problem
Deniz,

it looks like you are missing an index analyzer, or have you removed that for brevity?

lee c

On 2 June 2011 10:41, Gora Mohanty wrote:
> On Thu, Jun 2, 2011 at 11:58 AM, deniz wrote:
>> Hi all,
>>
>> here is a piece from my solrconfig:
> [...]
>> but somehow synonyms are not read... I mean there is no match when i use a
>> word in the synonym file... any ideas?
> [...]
>
> Please provide further details, e.g.: is your field in schema.xml using
> this fieldType, one example line from the synonyms.txt file, how are
> you searching, what results you expect to get, and what the actual
> results are.
>
> Also, while this is not the issue here, normally the fieldType
> "string" is a non-analyzed field, and one would normally use
> a different fieldType, e.g. "text", for data that are to be analyzed.
>
> Regards,
> Gora
Re: synonyms problem
Oh, and it's a string field (class="solr.StrField"); change this to a text field if you need analysis.

lee c

On 2 June 2011 11:45, lee carroll wrote:
> Deniz,
>
> it looks like you are missing an index analyzer, or have you removed
> that for brevity?
>
> lee c
> [...]
Re: Multilingual text analysis
Juan

I don't think so.

You can try indexing fields like myfield_en, myfield_fr, myfield_xx if you know what language you are dealing with at index and query time.

You can also have separate cores for your documents, one per language, if you don't want to complicate your schema; again you will need to know the language at index and query time.

On 2 June 2011 08:57, Juan Antonio Farré Basurte wrote:
> Hello, some of the analyzers that can be applied to a text field depend on
> the language of the text being analyzed and can be configured for a
> concrete language. In my case, the text fields can be in many different
> languages, but each document also includes a field containing the language
> of the text fields. Is it possible to configure the analyzers to use the
> suitable language for each document, based on that language field?
> Thanks,
>
> Juan
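A minimal sketch of the per-language-fields approach described above (type and field names are made up): one fieldType per language, plus dynamic fields to route each document's text into the right one at index time:

  <fieldType name="text_en" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_fr" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>

  <dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
  <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true"/>

Since in this thread each document carries its language in a field, the indexing code picks body_en or body_fr per document, and the query side searches the field matching the user's chosen language.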
Re: synonyms problem
Are you sure solr.StrField is the way to go with this? solr.StrField stores the entire text verbatim and, I am pretty sure, skips any analysis. Perhaps you should use solr.TextField instead.

François

On Jun 2, 2011, at 2:28 AM, deniz wrote:
> Hi all,
>
> here is a piece from my solrconfig: [the fieldType XML was stripped by the
> list; it was a StrField-based type with a SynonymFilterFactory using
> ignoreCase="true" expand="true"]
>
> but somehow synonyms are not read... I mean there is no match when i use a
> word in the synonym file... any ideas?
>
> -
> Zeki ama calismiyor... Calissa yapar...
query routing with shards
Hello all, We have currently several pretty fat logically isolated shards with the same schema / solrconfig (indices are separate). We currently have one single front end SOLR (1.4) for the client code calls. Since a client code query usually hits only one shard, we are considering making a smart routing of queries to the shards they map to. Can you please give some pointers as to what would be an optimal way to achieve such a routing inside the front end solr? Is there a way to configure mapping inside the solrconfig? Thanks. -- Regards, Dmitry Kan
Re: How to display search results of solr in to other application.
This is from another post and could help. Can you use a JavaScript library which handles AJAX and JSON/JSONP? You will end up with much cleaner client code. For example, a jQuery implementation looks quite nice using Solr's neat JSONP support:

queryString = "*:*"
$.getJSON(
  "http://[server]:[port]/solr/select/?jsoncallback=?",
  {"q": queryString,
  "version": "2.2",
  "start": "0",
  "rows": "10",
  "indent": "on",
  "json.wrf": "callbackFunctionToDoSomethingWithOurData",
  "wt": "json",
  "fl": "field1"}
);

and the callback function:

function callbackFunctionToDoSomethingWithOurData(solrData) {
  // do stuff with your nice data
}

There is also a JavaScript client for Solr, but I've not used it.

On 2 June 2011 08:14, Romi wrote:
> Hi, I am creating indexes using Solr, which is running on a Jetty server on
> port 8983, and my application is running on a Tomcat server on port 8080.
> My problem is that I want to display the search results in my application.
> [...]
> I suppose I am making a mistake in xmlhttpPost("/solr/db/select").
>
> Thanks and regards
> Romi
Re: how to request for Json object
Use Solr's JSONP format.

On 2 June 2011 08:54, Romi wrote:
> Sorry for the inconvenience; please look at this file:
> http://lucene.472066.n3.nabble.com/file/n3014224/JsonJquery.text
> JsonJquery.text
>
> - Thanks & Regards
> Romi
Re: Result Grouping always returns grouped output
Hi Karel, group.main=true should do the trick. When that is set to true the group.format is always simple. Martijn On 27 May 2011 19:13, kare...@gmail.com wrote: > Hello, > > I am using the latest nightly build of Solr 4.0 and I would like to > use grouping/field collapsing while maintaining compatibility with my > current parser. I am using the regular webinterface to test it, the > same commands like in the wiki, just with the field names matching my > dataset. > > Grouping itself works, group=true and group.field return the expected > results, but neither group.main=true or group.format=simple seem to > change anything. > > Do I have to include something special in solrconconfig.xml or > scheme.xml to make the simple output work? > > Thanks for any hints, > K > -- Met vriendelijke groet, Martijn van Groningen
Re: how to request for Json object
This is not really an issue with Solr per se, and I have run into this before. You will need to read up on 'Access-Control-Allow-Origin', which needs to be set in the HTTP headers returned by the server you request the JSON from (here, the Solr server). Beware that not all browsers obey it, and Olivier is right when he suggested creating a proxy, which is what I did.

François

On Jun 2, 2011, at 3:27 AM, Romi wrote:
> How do I parse JSON through AJAX when the AJAX page is on one server
> (Tomcat) and the JSON object comes from another server (the Solr server)?
> I mean, I have to make a request to another server; how can I do it?
>
> - Thanks & Regards
> Romi
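For the record, the header itself is a single line in the HTTP response, e.g. (hypothetical origin):

  Access-Control-Allow-Origin: http://myapp.example.com:8080

As far as I know, Solr 3.x does not emit this header itself; it would have to be added by a servlet filter or a web server in front of Solr, which is one more reason the proxy route is usually simpler.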
Re: Question about sorting by coordination factor
Say you're trying to match terms A, B, C. Would something like

(A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND C)^100 OR A OR B OR C

work? It wouldn't be an absolute ordering, but it would tend to push the documents where all three terms matched toward the top. It would get really cumbersome if there were lots of terms, though.

Best
Erick

On Thu, Jun 2, 2011 at 6:21 AM, Jesus Gabriel y Galan wrote:
> Hi,
>
> I am trying to solve a sorting problem using Solr. The sorting requirements
> are a bit complicated. I have to sort the documents by three different
> criteria:
>
> - First by number of keywords that match (coordination factor)
> - Then, within the documents that match the same number of keywords, sort
> first the documents that match a user value (country) and then the rest.
> - Then within those two blocks, sort by a document value (popularity).
> [...]
> Is there any solution to this?
>
> Thanks,
>
> Jesus.
Function Query not getting picked up by Standard Query Parser
Hello,

I'm trying to find out why my Function Query isn't getting picked up by the Standard Parser. More specifically, I send the following set of HTTP params (I'm using the "_val_" syntax; the XML parameter listing was stripped by the list, leaving only the values "creationDate"^0.01, on, 225, allFields:(born to be wild), 5 — the function was passed as its own parameter rather than inside q), and turning on debugQuery yields the following calculation for the first result:

0.29684606 = (MATCH) product of:
  0.5936921 = (MATCH) sum of:
    0.5936921 = (MATCH) weight(allFields:wild in 13093), product of:
      0.64602524 = queryWeight(allFields:wild), product of:
        5.88155 = idf(docFreq=223, maxDocs=29531)
        0.10983928 = queryNorm
      0.91899216 = (MATCH) fieldWeight(allFields:wild in 13093), product of:
        1.0 = tf(termFreq(allFields:wild)=1)
        5.88155 = idf(docFreq=223, maxDocs=29531)
        0.15625 = fieldNorm(field=allFields, doc=13093)
  0.5 = coord(1/2)

but I don't see my Function Query affecting the score anywhere. Is there something else I should be setting? What am I missing?

Cheers,
Savvas
Re: how to request for Json object
Just to reiterate: JSONP gets round the AJAX same-origin policy.

2011/6/2 François Schiettecatte :
> This is not really an issue with Solr per se, and I have run into this
> before. You will need to read up on 'Access-Control-Allow-Origin', which
> needs to be set in the HTTP headers returned by the server you request the
> JSON from (here, the Solr server). Beware that not all browsers obey it,
> and Olivier is right when he suggested creating a proxy, which is what I did.
>
> François
> [...]
Re: how to request for Json object
I did this:

$(document).ready(function(){
  $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?",
    function(result){
      alert("hello" + result.response.docs[0].name);
    });
});

But I am not getting any result. What did I do wrong?

- Thanks & Regards
Romi
'deltaImportQuery' attribute is not specified for entity : user
Hi, I'm trying to build a delta index. I have an entity called 'user' in data-config.xml like '
Re: How to display search results of solr in to other application.
I did this:

$(document).ready(function(){
  $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?",
    function(result){
      alert("hello" + result.response.docs[0].name);
    });
});

But I am not getting any result. What did I do wrong?

- Thanks & Regards
Romi
Re: 'deltaImportQuery' attribute is not specified for entity : user
Take a look at the following URL; it might help you:
http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command

- Thanks & Regards
Romi
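The message in the subject line means DIH expects a deltaImportQuery attribute on the entity alongside deltaQuery. A minimal sketch (table and column names are made up):

  <entity name="user" pk="id"
          query="select id, name from users"
          deltaQuery="select id from users where last_modified &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="select id, name from users where id='${dataimporter.delta.id}'"/>

deltaQuery returns only the primary keys of rows changed since the last import; deltaImportQuery is then run once per returned key to fetch the full row.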
Re: Function Query not getting picked up by Standard Query Parser
For this to work, _val_:"" goes *in* the q parameter, not as a separate parameter.

See here for more details:
http://wiki.apache.org/solr/SolrQuerySyntax#Differences_From_Lucene_Query_Parser

Erik

On Jun 2, 2011, at 07:46 , Savvas-Andreas Moysidis wrote:
> Hello,
>
> I'm trying to find out why my Function Query isn't getting picked up by
> the Standard Parser. More specifically, I send the following set of HTTP
> params (I'm using the "_val_" syntax), with "creationDate"^0.01 passed as
> its own parameter alongside q=allFields:(born to be wild).
> [...]
> but I don't see my Function Query affecting the score anywhere.
> Is there something else I should be setting? What am I missing?
>
> Cheers,
> Savvas
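To make that concrete, a sketch of the corrected request using the values from the original mail:

  q=allFields:(born to be wild) _val_:"creationDate"^0.01

With debugQuery=on, the explain output should then show an additional FunctionQuery clause contributing to the score sum. (Boosting by a raw date value is of limited use by itself; a recency boost is more commonly written as something like recip(ms(NOW,creationDate),3.16e-11,1,1), but that is a separate topic.)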
Re: Sorting algorithm
Hi Richard,

All your data seem to be available at indexing time, am I correct? Why don't you do the math at index time and just index the result in a field, which you can then sort on at query time?

On Thu, Jun 2, 2011 at 7:26 AM, Richard Hodsdon wrote:
> Hi,
>
> I want a sorting function query similar to the way reddit handles its
> ranking. I have the date stored in a TrieDate field (precisionStep="6",
> positionIncrementGap="0"). I also have the number of Twitter and Facebook
> shares, and reads from our site, stored.
> [...]
> Currently I am getting errors relating to my date field when trying to
> convert it from the TrieDate to a timestamp with _val_:MyDateField.
>
> I also wanted to know if there is another way to do this, and if my query
> is even correct.
>
> Thanks in advance
>
> Richard
Re: how to request for Json object
$.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?";, function(result){ alert("hello" + result.response.docs[0].name); }); }); using this i got the result. But as you can see it is hard coded, i am passing a query in the url how can i make it as user choice. - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014928.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NRT facet search options comparison
Andy:

I did not actually measure and benchmark facet search performance with and without NRT. The screenshots do show the performance impact, though, if you look at the QTime parameter in Fig 1 and Fig 2. I did not notice any appreciable difference in performance when I tested faceting with NRT. The current implementation just recreates the UnInvertedField cache, as this was easier to implement; this needs to become dynamic rather than recreated.

Regards,
- NN

On 6/1/2011 8:53 PM, Andy wrote:
> Nagendra,
>
> Thanks. Can you comment on the performance impact of NRT on facet search?
> The pages you linked to don't really touch on that.
>
> My concern is that with NRT, the facet cache will be constantly
> invalidated. How will that impact the performance of faceting? Do you have
> any benchmark comparing the performance of facet search with and without
> NRT?
>
> Thanks
> Andy
>
> --- On Wed, 6/1/11, Nagendra Nagarajayya wrote:
>> Hi Andy:
>>
>> Here is a white paper that shows screenshots of faceting working with
>> Solr and RankingAlgorithm under NRT:
>> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search
>>
>> The implementation (src) is also available with the download and is
>> described in the document below:
>> http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
>>
>> The faceting test was done with the mbartists demo from the book "Solr
>> 1.4 Enterprise Search Server" and is approx. 390k docs.
>>
>> Regards,
>> - Nagendra Nagarajayya
>> http://solr-ra.tgels.com
>> http://rankingalgorithm.tgels.com
>>
>> On 6/1/2011 12:52 PM, Andy wrote:
>>> Hi,
>>>
>>> I need to provide NRT search with faceting. I've been looking at the
>>> options out there and wondered if anyone could clarify some questions I
>>> have and perhaps share your NRT experiences.
>>>
>>> The various NRT options:
>>>
>>> 1) Solr - Solr doesn't have NRT, yet. What is the expected time frame
>>> for NRT? Is it a few months or more like a year? How would Solr faceting
>>> work with NRT? My understanding is that faceting in Solr relies on
>>> caching, which doesn't go well with NRT updates. When NRT arrives, would
>>> facet performance take a huge drop because of this caching issue?
>>>
>>> 2) ElasticSearch - ES supports NRT, so that's great. Does anyone have
>>> experiences with ES that they could share? Does faceting work with NRT
>>> in ES? Any Solr features that are missing in ES?
>>>
>>> 3) Solr-RA - I read in this list about Solr-RA, which has NRT support.
>>> Has anyone used it? Can you share your experiences? Again, not sure if
>>> facets would work with Solr-RA NRT: Solr-RA is based on Solr, so
>>> faceting in Solr-RA relies on caching, I suppose. Does NRT affect facet
>>> performance?
>>>
>>> 4) Zoie plugin for Solr - Zoie is an NRT search library. I tried but
>>> couldn't get the Zoie plugin to work with Solr; I always got the error
>>> message of opening too many Searchers. Has anyone got this to work?
>>>
>>> Any other options?
>>>
>>> Thanks
>>> Andy
how to make getJson parameter dynamic
$.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?";, function(result){ alert("hello" + result.response.docs[0].name); }); }); using this i am parsing solr json response, but as you can see it is hard coded (q=diamond) how can i make it user's choice. i mean user can pass the query at run time for example using a text box. - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3014941.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting algorithm
Thanks for the response. You are correct, but my pseudo-code was not: the line

var t = (CreationDate - 1131428803) / 1000;

should be

var t = (CreationDate - now()) / 1000;

This will cause an item's ranking to decay over time.

Richard
Re: Question about sorting by coordination factor
On 02/06/11 13:32, Erick Erickson wrote:
> Say you're trying to match terms A, B, C. Would something like
> (A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND C)^100 OR A OR B OR C
> work? It wouldn't be an absolute ordering, but it would tend to push the
> documents where all three terms matched toward the top.

The problem with this is that it would give a better score to the documents with the most matches, but then I have to sort internally within those groups. So I'd need sort=score,xxx,yyy, and the score would not be equal for the documents which match the same number of keywords. I would need as many groups as keywords, and within each group all documents would need the same value for that sorting criterion (score or a function or whatever), so that they tie and move on to the next sorting criterion.

Thanks,

Jesus.
Re: Function Query not getting picked up by Standard Query Parser
Great, that did it! I can now see the Function Query part in the calculation.

Thanks very much Erik,
Savvas

On 2 June 2011 13:28, Erik Hatcher wrote:
> For this to work, _val_:"" goes *in* the q parameter, not as a separate
> parameter.
>
> See here for more details:
> http://wiki.apache.org/solr/SolrQuerySyntax#Differences_From_Lucene_Query_Parser
>
> Erik
> [...]
Re: Question about sorting by coordination factor
Ahhh, you're right. I know there's been some discussion in the past about how to find out the number of terms that matched, but I don't remember the outcome off-hand. You might try searching the mail archive for something like "number of matching terms" or some such.

Sorry I'm not more help
Erick

On Thu, Jun 2, 2011 at 8:48 AM, Jesus Gabriel y Galan wrote:
> On 02/06/11 13:32, Erick Erickson wrote:
>> Say you're trying to match terms A, B, C. Would something like
>> (A AND B AND C)^1000 OR (A AND B)^100 OR (A AND C)^100 OR (B AND C)^100 OR A OR B OR C
>> work? It wouldn't be an absolute ordering, but it would tend to
>> push the documents where all three terms matched toward the top.
>
> The problem with this is that it would give a better score to the documents
> with the most matches, but then I have to sort internally within those
> groups.
> [...]
>
> Thanks,
>
> Jesus.
Re: Sorting algorithm
OK, then (everything that's available at index time, I'll say it's constant):

(Math.log(z) / Math.LN10) is constant, I'll call it c1.
((y * t) / 45000) = (y/45000)*t, and y/45000 is constant, I'll call it c2.
c1 + (c2 * t) = c1 + (c2 * (CreationDate - now) / 1000), and c2/1000 is also constant, I'll call it c3.

Then your ranking formula is: c1 + (c3 * (creationDate - now)). In Solr, this will be:

  &sort=sum(c1,product(c3,ms(creationDate,NOW)))

I haven't tried it, but if my arithmetic is correct (I'm a little bit rusty with that), this should work and should be faster than doing the whole thing at query time. Of course, c1 and c3 must be indexed as fields.

Regards,
Tomás

On Thu, Jun 2, 2011 at 9:46 AM, Richard Hodsdon wrote:
> Thanks for the response. You are correct, but my pseudo-code was not:
> the line
>
> var t = (CreationDate - 1131428803) / 1000;
>
> should be
>
> var t = (CreationDate - now()) / 1000;
>
> This will cause an item's ranking to decay over time.
>
> Richard
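A worked sketch of that suggestion, with made-up numbers: for a post with x = 12 shares (so y = 1),

  c1 = log10(12) = approx. 1.079
  c3 = 1 / (45000 * 1000) = approx. 2.22e-8

Index c1 and c3 as float fields on the document, and the whole sort reduces to

  sort=sum(c1,product(c3,ms(creationDate,NOW))) desc

One thing to watch: ms() returns milliseconds, so the "/ 1000" seconds conversion from the pseudo-code has to be folded into c3 (as above), or the decay will run a thousand times too fast.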
Re: Multilingual text analysis
Juan,

An easy way in Solr, I think, is indeed to use different fields at index time and to expand the query over multiple fields at query time. I believe field-name wildcards allow you to specify a different analyzer per language when doing this.

There have been long discussions on the java-u...@lucene.apache.org mailing list about the best design for multilingual indexing and searching. One of the key arguments was whether you can reliably detect the language of a query, which is generally very hard.

It would make sense to start a page at the solr website...

paul

Le 2 juin 2011 à 12:52, lee carroll a écrit :
> Juan
>
> I don't think so.
>
> You can try indexing fields like myfield_en, myfield_fr, myfield_xx if you
> know what language you are dealing with at index and query time.
>
> You can also have separate cores for your documents, one per language, if
> you don't want to complicate your schema; again you will need to know the
> language at index and query time.
>
> On 2 June 2011 08:57, Juan Antonio Farré Basurte wrote:
> [...]
Re: Multilingual text analysis
Thank you both Paul and Lee for your answers. Luckily, in my case there's no problem knowing the language at index time, nor do we really have to worry about the language of the query, as users can specify the language they are interested in. So I guess our solution will be to use different optional fields, one for each language, and that should be good enough. I had just wondered whether it was possible to parametrize the analyzers based on one field's value; I think this would be a very elegant solution for many needs. Maybe it could be an improvement for a future version of Solr :)

Paul, what do you mean when you say it would make sense to start a page at the solr website?

Thanks again,

Juan

El 02/06/2011, a las 16:06, Paul Libbrecht escribió:
> Juan,
>
> An easy way in Solr, I think, is indeed to use different fields at index
> time and to expand the query over multiple fields at query time. I believe
> field-name wildcards allow you to specify a different analyzer per
> language when doing this.
>
> There have been long discussions on the java-u...@lucene.apache.org
> mailing list about the best design for multilingual indexing and
> searching. One of the key arguments was whether you can reliably detect
> the language of a query, which is generally very hard.
>
> It would make sense to start a page at the solr website...
>
> paul
> [...]
Re: Multilingual text analysis
Le 2 juin 2011 à 16:27, Juan Antonio Farré Basurte a écrit :
> Paul, what do you mean when you say it would make sense to start a page at
> the solr website?

I meant the Solr wiki.

> I had just wondered whether it was possible to parametrize the analyzers
> based on one field's value; I think this would be a very elegant solution
> for many needs. Maybe it could be an improvement for a future version of
> Solr :)

Honestly, I think it is of utmost importance for a CMS manager to know, more or less, "how much stemming" one wishes, so configuring which analyzer is used for which language is, I think, really useful, and the schema makes that easy to write. In one of my search projects, I have a series of unit tests that all fail because the analyzer, say, for Arabic or Hungarian, was not "good enough"... this always happens, and it's better to be aware of it.

paul
Re: How to display search results of solr in to other application.
Did you include the jQuery lib? And make sure you use the jsoncallback, i.e.

$.getJSON(
  "http://[server]:[port]/solr/select/?jsoncallback=?",
  {"q": queryString,
  "version": "2.2",
  "start": "0",
  "rows": "10",
  "indent": "on",
  "json.wrf": "callbackFunctionToDoSomethingWithOurData",
  "wt": "json",
  "fl": "field1"}
);

not what you have got.

On 2 June 2011 13:00, Romi wrote:
> I did this:
>
> $(document).ready(function(){
>   $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?",
>     function(result){
>       alert("hello" + result.response.docs[0].name);
>     });
> });
>
> But I am not getting any result. What did I do wrong?
>
> - Thanks & Regards
> Romi
Re: how to make getJson parameter dynamic
Hi Romi, this is the third thread you have created on this subject. That's not good, and it will get you ignored by many people who could help. The question relates to JS rather than Solr now: see any good JS manual or site for how to assign values to a variable and then concatenate them into a string. A sketch follows below.

lee c

On 2 June 2011 13:40, Romi wrote:
> $.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?",
>   function(result){
>     alert("hello" + result.response.docs[0].name);
>   });
>
> Using this I am parsing the Solr JSON response, but as you can see, the
> query is hard-coded (q=diamond). How can I make it the user's choice? I
> mean, the user should be able to pass the query at run time, for example
> via a text box.
>
> - Thanks & Regards
> Romi
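The sketch (plain jQuery; element ids are made up): read the user's text from an input and pass it as the q parameter, which jQuery URL-encodes for you:

$(document).ready(function(){
  $("#searchButton").click(function(){
    var userQuery = $("#searchBox").val();
    $.getJSON(
      "http://192.168.1.9:8983/solr/db/select/?wt=json&json.wrf=?",
      {"q": userQuery},  // values in this object are URL-encoded by jQuery
      function(result){
        alert("hello " + result.response.docs[0].name);
      });
  });
});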
Need Schema help
Hi)

What I need: index prices for products; each product has multiple prices, one per country and region. I tried a multiValued field of type "long", forming each value as "country code + region code + price" (1004000349601, for example), but it behaves strangely: price:[* TO 1004000349600] includes 1004000349601. Am I doing something wrong?

Possible data:
Country: 1-9
Region: 0-99
Price: 1-999
Re: Need Schema help
Denis, would dynamic fields help: field defined as *_price in schema at index time you index fields named like: [1-9]_[0-99]_price at query time you search the price field for a given country region 1_10_price:[10 TO 100] This may work for some use-cases i guess lee 2011/6/2 Denis Kuzmenok : > Hi) > > What i need: > Index prices to products, each product has multiple prices, to each > region, country, and price itself. > I tried to do with field type "long" multiple:true, and form > value as "country code + region code + price" (1004000349601, for > example), but it has strange behaviour.. price:[* TO 1004000349600] do > include 1004000349601.. I am doing something wrong? > > Possible data: > Country: 1-9 > Region: 0-99 > Price: 1-999 > >
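A sketch of what lee describes, with made-up names: in schema.xml,

  <dynamicField name="*_price" type="int" indexed="true" stored="true"/>

(any integer type from your schema works). A product priced 499 in country 1, region 10 is then indexed with a field 1_10_price=499, and "products between 10 and 100 in that region" becomes

  q=1_10_price:[10 TO 100]

The trade-off is one field per country/region combination (up to 9 x 100 here), which dynamic fields handle fine, whereas the packed-long scheme mixes country, region and price into one number, so a range query stops meaning "price range".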
Re: return unaltered complete multivalued fields with Highlighted results
Hi,

Here is the code for Solr 3.1 that will preserve all the text and will disable sorting.

This goes in the solrconfig.xml request handler defaults, or whichever way you pass params:

  <bool name="hl.preserveOrder">true</bool>

This line goes into the HighlightParams class:

  public static final String PRESERVE_ORDER = HIGHLIGHT + ".preserveOrder";

Replace this method, DefaultSolrHighlighter.doHighlightingByHighlighter (I only added 3 if blocks):

private void doHighlightingByHighlighter( Query query, SolrQueryRequest req, NamedList docSummaries,
    int docId, Document doc, String fieldName ) throws IOException {
  SolrParams params = req.getParams();
  String[] docTexts = doc.getValues(fieldName);
  // according to Document javadoc, doc.getValues() never returns null. check empty instead of null
  if (docTexts.length == 0) return;

  SolrIndexSearcher searcher = req.getSearcher();
  IndexSchema schema = searcher.getSchema();
  TokenStream tstream = null;
  int numFragments = getMaxSnippets(fieldName, params);
  boolean mergeContiguousFragments = isMergeContiguousFragments(fieldName, params);

  String[] summaries = null;
  List<TextFragment> frags = new ArrayList<TextFragment>();

  TermOffsetsTokenStream tots = null; // to be non-null iff we're using TermOffsets optimization
  try {
    TokenStream tvStream = TokenSources.getTokenStream(searcher.getReader(), docId, fieldName);
    if (tvStream != null) {
      tots = new TermOffsetsTokenStream(tvStream);
    }
  }
  catch (IllegalArgumentException e) {
    // No problem. But we can't use TermOffsets optimization.
  }

  for (int j = 0; j < docTexts.length; j++) {
    if( tots != null ) {
      // if we're using TermOffsets optimization, then get the next
      // field value's TokenStream (i.e. get field j's TokenStream) from tots:
      tstream = tots.getMultiValuedTokenStream( docTexts[j].length() );
    } else {
      // fall back to analyzer
      tstream = createAnalyzerTStream(schema, fieldName, docTexts[j]);
    }

    Highlighter highlighter;
    if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true"))) {
      // TODO: this is not always necessary - eventually we would like to avoid this wrap
      // when it is not needed.
      tstream = new CachingTokenFilter(tstream);

      // get highlighter
      highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream);

      // after highlighter initialization, reset tstream since construction of highlighter already used it
      tstream.reset();
    }
    else {
      // use "the old way"
      highlighter = getHighlighter(query, fieldName, req);
    }

    int maxCharsToAnalyze = params.getFieldInt(fieldName, HighlightParams.MAX_CHARS,
        Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE);
    if (maxCharsToAnalyze < 0) {
      highlighter.setMaxDocCharsToAnalyze(docTexts[j].length());
    } else {
      highlighter.setMaxDocCharsToAnalyze(maxCharsToAnalyze);
    }

    try {
      TextFragment[] bestTextFragments =
          highlighter.getBestTextFragments(tstream, docTexts[j], mergeContiguousFragments, numFragments);
      for (int k = 0; k < bestTextFragments.length; k++) {
        if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
          // added: keep every fragment, regardless of score, so no text is dropped
          if ((bestTextFragments[k] != null) ){//&& (bestTextFragments[k].getScore() > 0)) {
            frags.add(bestTextFragments[k]);
          }
        } else {
          if ((bestTextFragments[k] != null) && (bestTextFragments[k].getScore() > 0)) {
            frags.add(bestTextFragments[k]);
          }
        }
      }
    } catch (InvalidTokenOffsetsException e) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
    }
  }
  // sort such that the fragments with the highest score come first
  if (!params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
    Collections.sort(frags, new Comparator<TextFragment>() {
      public int compare(TextFragment arg0, TextFragment arg1) {
        return Math.round(arg1.getScore() - arg0.getScore());
      }
    });
  }

  // convert fragments back into text
  // TODO: we can include score and position information in output as snippet attributes
  if (frags.size() > 0) {
    ArrayList<String> fragTexts = new ArrayList<String>();
    for (TextFragment fragment: frags) {
      if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
        // added: keep every fragment in original order, up to numFragments
        if ((fragment != null) ){// && (fragment.getScore() > 0)) {
          fragTexts.add(fragment.toString());
        }
        if (fragTexts.size() >= numFragments) break;
      } else {
        if ((fragment != null) && (fragment.getScore() > 0)) {
          fragTexts.add(fragment.toString());
        }
RE: Spellcheck Phrases
Actually, someone just pointed out to me that a patch like this is unnecessary. The code works as-is if configured like this:

  <float name="thresholdTokenFrequency">.01</float>

(correct) instead of this:

  <str name="thresholdTokenFrequency">.01</str>

(incorrect). I tested this and it seems to work. I'm still trying to figure out whether using this parameter actually improves the quality of our spell suggestions, now that I know how to use it properly. Sorry about the mis-information earlier.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dyer, James
Sent: Wednesday, June 01, 2011 3:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Tanner,

I just entered SOLR-2571 to fix the float-parsing bug that breaks "thresholdTokenFrequency". It's just a 1-line code fix, so I also included a patch that should cleanly apply to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

This parameter appears to be absent from the wiki. And as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set to the minimum percentage of documents in which a term has to occur in order for it to appear in the spelling dictionary. For instance, in the config below, a term would have to occur in at least 1% of the documents to be part of the spelling dictionary. This might be a good setting for long fields, but for the short fields in my application I was thinking of setting this to something like 1/1000 of 1% ...

  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">Spelling_Dictionary</str>
    <str name="field">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="thresholdTokenFrequency">.01</float>
  </lst>

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Tanner Postert [mailto:tanner.post...@gmail.com]
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

are there any updates on this? any third party apps that can make this work as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James wrote:
> Tanner,
>
> Currently Solr will only make suggestions for words that are not in the
> dictionary, unless you specify "spellcheck.onlyMorePopular=true". However,
> if you do that, then it will try to "improve" every word in your query,
> even the ones that are spelled correctly (so while it might change "brake"
> to "break", it might also change "leg" to "log").
>
> You might be able to alleviate some of the pain by setting the
> "thresholdTokenFrequency" so as to remove misspelled and rarely-used words
> from your dictionary, although I personally haven't been able to get this
> parameter to work. It also doesn't seem to be documented on the wiki, but
> it is in the 1.4.1 source code, in class IndexBasedSpellChecker. It's also
> mentioned in Smiley & Pugh's book. I tried setting it like this, but got a
> ClassCastException on the float value:
>
>   <str name="queryAnalyzerFieldType">text_spelling</str>
>   <lst name="spellchecker">
>     <str name="name">Spelling_Dictionary</str>
>     <str name="field">text_spelling</str>
>     <str name="buildOnCommit">true</str>
>     <str name="thresholdTokenFrequency">.001</str>
>   </lst>
>
> I have it on my to-do list to look into this further but haven't yet. If
> you decide to try it and can get it to work, please let me know how you
> do it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Tanner Postert [mailto:tanner.post...@gmail.com]
> Sent: Wednesday, February 23, 2011 12:53 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck Phrases
>
> right now when I search for 'brake a leg', solr returns valid results with
> no indication of misspelling, which is understandable since all of those
> terms are valid words and are probably found in a few pieces of our
> content.
>
> My question is:
>
> is there any way for it to recognize that the phrase should be "break a
> leg" and not "brake a leg" and suggest the proper phrase?
>
Re: Need Schema help
Thursday, June 2, 2011, 6:29:23 PM, you wrote: Wow. This sounds nice. Will try this way. Thanks! > Denis, > would dynamic fields help: > field defined as *_price in schema > at index time you index fields named like: > [1-9]_[0-99]_price > at query time you search the price field for a given country region > 1_10_price:[10 TO 100] > This may work for some use-cases i guess > lee
SolrJ and Range Faceting
Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries?
Re: return unaltered complete multivalued fields with Highlighted results
Hmmm, I don't know a thing about the highlighter code, but if you can just make a patch and create a JIRA (https://issues.apache.org/jira/browse/SOLR) and attach it, it'll get "in the system". I suspect you've seen this page, but just in case: http://wiki.apache.org/solr/HowToContribute See, especially, "Yonik's Law of Patches" on that page...

Two questions:
1> after your changes, could you successfully run "ant test"?
2> can you supply any unit tests that illustrate the correct behavior here?

Even if both answers are "no", it's still probably a good idea to submit the patch. Although first it might be a good idea to discuss this on the dev list (d...@lucene.apache.org) before opening a JIRA; it's possible that there's something similar in the works already...

Best
Erick

On Thu, Jun 2, 2011 at 11:31 AM, alexei wrote:
> Hi,
>
> Here is the code for Solr 3.1 that will preserve all the text and will
> disable sorting.
> [...]
Re: Need Schema help
This range behavior doesn't make sense. Are you completely sure you're not dropping a digit out someplace? Best Erick 2011/6/2 Denis Kuzmenok : > Hi) > > What i need: > Index prices to products, each product has multiple prices, to each > region, country, and price itself. > I tried to do with field type "long" multiple:true, and form > value as "country code + region code + price" (1004000349601, for > example), but it has strange behaviour.. price:[* TO 1004000349600] do > include 1004000349601.. I am doing something wrong? > > Possible data: > Country: 1-9 > Region: 0-99 > Price: 1-999 > >
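For what it's worth, a composite numeric code only range-scans cleanly if every component has a fixed width. A sketch of what I assume the intended packing is (digit counts taken from the mail; note the example value in the original post has far more digits than this scheme allows, which fits the dropped-digit theory):

// 1-digit country (1-9), 2-digit region (00-99), 3-digit price (001-999)
long encode(int country, int region, int price) {
    return country * 100000L + region * 1000L + price;   // (1, 4, 349) -> 104349
}
// with fixed widths, price:[104000 TO 104349] stays inside country 1, region 04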
Is there a way to get all the hits and score them later?
Basically I don't want the hits and the scores at the same time. I want to get a list of hits but I want to score them myself externally (there is a dedicated server that will do the scoring given a list of id's). Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html Sent from the Solr - User mailing list archive at Nabble.com.
Large number of dynamic fields
Hello, I have a 7GB index with 2MM documents. Each document has about 400 fields, but the fields are dynamic, and in total I have ~200k distinct fields. We're using Solr 3.1 and Tomcat 5.5. We are seeing very slow start-up times (about 5 minutes from Tomcat startup to Solr being ready to answer queries). We have tried from 8 to 32GB of memory, with little difference. Would you say Solr is not suitable for such a large number of fields? Committing ~10k docs takes about 5 minutes as well. Thanks in advance, Santiago
Re: return unaltered complete multivalued fields with Highlighted results
I could use this feature too, encourage you to submit a patch in JIRA. I wouldn't call the param "preserveOrder" though -- what it's really doing is returning the whole entire field, with highlighting markers, not just "preserving order" of fragments. Not sure what to call it, but not "preserveOrder". On 6/2/2011 11:31 AM, alexei wrote: Hi, Here is the code for Solr 3.1 that will preserve all the text and will disable sorting. This goes in solrconfig.xml request handler config or which ever way you pass params: true This line goes into HighlightParams class: public static final String PRESERVE_ORDER = HIGHLIGHT + ".preserveOrder"; Replace this method DefaultSolrHighlighter.doHighlightingByHighlighter (I only added 3 if blocks): private void doHighlightingByHighlighter( Query query, SolrQueryRequest req, NamedList docSummaries, int docId, Document doc, String fieldName ) throws IOException { SolrParams params = req.getParams(); String[] docTexts = doc.getValues(fieldName); // according to Document javadoc, doc.getValues() never returns null. check empty instead of null if (docTexts.length == 0) return; SolrIndexSearcher searcher = req.getSearcher(); IndexSchema schema = searcher.getSchema(); TokenStream tstream = null; int numFragments = getMaxSnippets(fieldName, params); boolean mergeContiguousFragments = isMergeContiguousFragments(fieldName, params); String[] summaries = null; List frags = new ArrayList(); TermOffsetsTokenStream tots = null; // to be non-null iff we're using TermOffsets optimization try { TokenStream tvStream = TokenSources.getTokenStream(searcher.getReader(), docId, fieldName); if (tvStream != null) { tots = new TermOffsetsTokenStream(tvStream); } } catch (IllegalArgumentException e) { // No problem. But we can't use TermOffsets optimization. } for (int j = 0; j< docTexts.length; j++) { if( tots != null ) { // if we're using TermOffsets optimization, then get the next // field value's TokenStream (i.e. get field j's TokenStream) from tots: tstream = tots.getMultiValuedTokenStream( docTexts[j].length() ); } else { // fall back to analyzer tstream = createAnalyzerTStream(schema, fieldName, docTexts[j]); } Highlighter highlighter; if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true"))) { // TODO: this is not always necessary - eventually we would like to avoid this wrap // when it is not needed. 
tstream = new CachingTokenFilter(tstream); // get highlighter highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream); // after highlighter initialization, reset tstream since construction of highlighter already used it tstream.reset(); } else { // use "the old way" highlighter = getHighlighter(query, fieldName, req); } int maxCharsToAnalyze = params.getFieldInt(fieldName, HighlightParams.MAX_CHARS, Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE); if (maxCharsToAnalyze< 0) { highlighter.setMaxDocCharsToAnalyze(docTexts[j].length()); } else { highlighter.setMaxDocCharsToAnalyze(maxCharsToAnalyze); } try { TextFragment[] bestTextFragments = highlighter.getBestTextFragments(tstream, docTexts[j], mergeContiguousFragments, numFragments); for (int k = 0; k< bestTextFragments.length; k++) { if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) { if ((bestTextFragments[k] != null) ){//&& (bestTextFragments[k].getScore()> 0)) { frags.add(bestTextFragments[k]); } } else { if ((bestTextFragments[k] != null)&& (bestTextFragments[k].getScore()> 0)) { frags.add(bestTextFragments[k]); } } } } catch (InvalidTokenOffsetsException e) { throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e); } } // sort such that the fragments with the highest score come first if (!params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) { Collections.sort(frags, new Comparator() { public int compare(TextFragment arg0, TextFragment arg1) { return Math.round(arg1.getScore() - arg0.getScore()); } }); } // convert fragments back into text // TODO: we can include score and position information in output as snippet attributes if (frags.size()> 0) { ArrayList fragTexts = new ArrayList(); for (TextFragment fragment: frags) { if (params.getBool( HighlightParams.PRESERVE_ORDER, false ) ) {
Re: Is there a way to get all the hits and score them later?
To clarify. I want to do this all underneath solr. I don't want to get a bunch of hits from solr in my app and then go to my server and score them again. I'd like to score them myself underneath solr before I return the results to my app. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is there a way to get all the hits and score them later?
Well, you can get all the hits by setting "rows" to a very high value, say one more than the total number of docs you have in the database, so all hits will be returned. If there are a lot of them, it won't be quick. If you choose to sort by something other than 'score', I don't know if Solr will score anyway; I'm not sure there's a way to actually turn off scoring. But you can certainly ignore it. Not sure if this is really what you were asking, it is a pretty simple answer. On 6/2/2011 2:30 PM, arian487 wrote: Basically I don't want the hits and the scores at the same time. I want to get a list of hits but I want to score them myself externally (there is a dedicated server that will do the scoring given a list of id's). Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html Sent from the Solr - User mailing list archive at Nabble.com.
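In SolrJ terms the same approach looks like this (the query string and the row count are placeholders):

SolrQuery q = new SolrQuery("field:value");
q.setRows(1000001);   // one more than the number of docs in the index
q.setFields("id");    // fetch only ids; the score is simply never used
QueryResponse rsp = server.query(q);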
Re: Is there a way to get all the hits and score them later?
It sounds to me like maybe you want to implement a custom scoring algorithm in Solr? I have no experience with that, but maybe if you ask and/or google using those words, you'll have more luck. I know it's possible to implement a custom scoring algorithm, but I believe it's kind of tricky, and also of course has performance implications depending on implementation -- and definitely isn't designed for the use case of sending all results to an external server for scoring (not sure how you could do that in a performant way even if Solr's architecture supported it, which I'm not sure it does). On 6/2/2011 3:01 PM, arian487 wrote: To clarify. I want to do this all underneath solr. I don't want to get a bunch of hits from solr in my app and then go to my server and score them again. I'd like to score them myself underneath solr before I return the results to my app. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching using a PDF
I mean instead of typing http://localhost:8983/?q=mysearch, I would send a PDF file with the contents of "mysearch" and search based on that. I am leaning toward handling this before it hits solr however. Thanks, Brian Lamb On Wed, Jun 1, 2011 at 3:52 PM, Erick Erickson wrote: > I'm not quite sure what you mean by "regular search". When > you index a PDF (Presumably through Tika or Solr Cell) the text > is indexed into your index and you can certainly search that. Additionally, > there may be meta data indexed in specific fields (e.g. author, > date modified, etc). > > But what does "search based on a PDF file" mean in your context? > > Best > Erick > > On Wed, Jun 1, 2011 at 3:41 PM, Brian Lamb > wrote: > > Is it possible to do a search based on a PDF file? I know its possible to > > update the index with a PDF but can you do just a regular search with it? > > > > Thanks, > > > > Brian Lamb > > >
Re: Is there a way to get all the hits and score them later?
don't know if this is what you mean: you can add 'score' to the fl field list, and it will show you the score for each item. Upayavira On Thu, 02 Jun 2011 11:30 -0700, "arian487" wrote: > Basically I don't want the hits and the scores at the same time. I want > to > get a list of hits but I want to score them myself externally (there is a > dedicated server that will do the scoring given a list of id's). Thanks! > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html > Sent from the Solr - User mailing list archive at Nabble.com. > --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Searching using a PDF
Not that I know of, you'll probably have to handle this before it hits Solr. Best Erick On Thu, Jun 2, 2011 at 3:10 PM, Brian Lamb wrote: > I mean instead of typing http://localhost:8983/?q=mysearch, I would send a > PDF file with the contents of "mysearch" and search based on that. I am > leaning toward handling this before it hits solr however. > > Thanks, > > Brian Lamb > > On Wed, Jun 1, 2011 at 3:52 PM, Erick Erickson wrote: > >> I'm not quite sure what you mean by "regular search". When >> you index a PDF (Presumably through Tika or Solr Cell) the text >> is indexed into your index and you can certainly search that. Additionally, >> there may be meta data indexed in specific fields (e.g. author, >> date modified, etc). >> >> But what does "search based on a PDF file" mean in your context? >> >> Best >> Erick >> >> On Wed, Jun 1, 2011 at 3:41 PM, Brian Lamb >> wrote: >> > Is it possible to do a search based on a PDF file? I know its possible to >> > update the index with a PDF but can you do just a regular search with it? >> > >> > Thanks, >> > >> > Brian Lamb >> > >> >
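The usual way to handle it before it hits Solr is to run the PDF through Tika on the client and query with the extracted text. A rough sketch using the Tika facade from tika-core (treating the whole extracted text as one query is naive; picking out key terms would work better):

import java.io.File;
import org.apache.tika.Tika;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

Tika tika = new Tika();
String text = tika.parseToString(new File("mysearch.pdf"));   // extract plain text from the PDF
SolrQuery q = new SolrQuery(ClientUtils.escapeQueryChars(text));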
RE: Anyway to know changed documents?
...and it works really well!!! :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 01, 2011 5:37 AM To: solr-user@lucene.apache.org Subject: Re: Anyway to know changed documents? On 6/1/2011 6:12 AM, pravesh wrote: > SOLR wiki will provide help on this. You might be interested in pure Java > based replication too. I'm not sure,whether SOLR operational will have this > feature(synch'ing only changed segments). You might need to change > configuration in searchconfig.xml Yes, this feature is there in the Java/HTTP based replication since Solr 1.4
Re: tika and solr 3,1 integration
Hi Naveen, Check if there is a dynamic field named "attr_*" in the schema. The "uprefix=attr_" parameter means that if Solr can't find an extracted field in the schema, it'll add the prefix "attr_" and try again. *Juan* On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta wrote: > Hi > > I am trying to integrate solr 3.1 and tika (which comes default with the > version) > > and using curl command trying to index few of the documents, i am getting > this error. the error is attr_meta field is unknown. i checked the > solrconfig, it looks perfect to me. > > can you please tell me what i am missing. > > I copied all the jars from contrib/extraction/lib to solr/lib folder that > is > there in same place where conf is there > > > I am using the same request handler which is coming with default > > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > > text > true > ignored_ > > > true > links > ignored_ > > > > > > > > * curl " > > http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true > " > -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"* > > > Apache Tomcat/6.0.18 - Error report > HTTP Status 400 - ERROR:unknown field 'attr_meta' size="1" noshade="noshade">type Status > reportmessage > ERROR:unknown field 'attr_meta'description The > request sent by the client was syntactically incorrect (ERROR:unknown field > 'attr_meta').Apache > Tomcat/6.0.18root@weforpeople:/usr/share/solr1/lib# > > > Please note > > i integrated apacha tika 0.9 with apache-solr-1.4 locally on windows > machine > and using solr cell > > calling the program works fine without any changes in configuration. > > Thanks > Naveen >
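For reference, the catch-all dynamic field from the example schema that the uprefix mechanism depends on looks like this (the type just needs to exist in your schema):

<dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>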
Sorting
Hi, When using the following URL: http://localhost:8080/solr/StatReg/select?version=2.2&sort=path+asc&fl=path&start=0&q=paths%3A%222%2Froot%2FStatReg%2F--+C+--%22&hl=off&rows=500 I get the result in the following order: [...] /-- C --/Community Care Facility Act [RSBC 1996] c. 60/00_96060REP_01.xml /-- C --/Community Care and Assisted Living Act [SBC 2002] c. 75/00_02075_01.xml [...] However, the order is not right: "and Assisted" should come before "Facility Act". I'm using the following schema configuration: Thanks, Clécio
Re: Is there a way to get all the hits and score them later?
Actually I was thinking I wanted to do something before the sharding (like in the layer where faceting happens for example). I wanna hack a plugin in the middle to go to my server after I have a bunch of hits. Just not sure where to do this... Though I've decided I can do scoring from solr (like a preliminary scoring to narrow down some results) and then in the middle send those hits to my server for additional scoring. I can't hack it on in the end since the sharding has happened I think, I'm just not sure where to look right now. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3017401.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting
Hi Clécio, Your problem may be caused by case sensitiveness of string fields. Try using the "lowercase" field type that comes in the example. Regards, *Juan* On Thu, Jun 2, 2011 at 6:13 PM, Clecio Varjao wrote: > Hi, > > When using the following URL: > > http://localhost:8080/solr/StatReg/select?version=2.2&sort=path+asc&fl=path&start=0&q=paths%3A%222%2Froot%2FStatReg%2F--+C+--%22&hl=off&rows=500 > > I get the result in the following order: > > [...] > /-- C --/Community Care Facility Act [RSBC 1996] c. 60/00_96060REP_01.xml > /-- C --/Community Care and Assisted Living Act [SBC 2002] c. > 75/00_02075_01.xml > [...] > > However, the order is not right "and Assisted" should come before > "Facitity Act". > > I'm using the following schema configuration: > > omitNorms="true"/> > > multiValued="false" /> > > Thanks, > > Clécio >
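The stock example schema ships a type intended for exactly this; a sketch of a case-insensitive sort field fed by copyField (path is the field from the question, path_sort is made up):

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="path_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="path" dest="path_sort"/>

and then sort=path_sort+asc in the request.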
Indexes in ramdisk don't show performance improvement?
Hey everyone. Been doing some load testing over the past few days. I've been throwing a good bit of load at an instance of solr and have been measuring response time. We're running a variety of different keyword searches to keep solr's cache on its toes. I'm running two identical load-testing scenarios: one with indexes residing in /dev/shm and another from local disk. The indexes are about 4.5GB in size. In both tests the response times are the same. I wasn't expecting that. I do see the java heap size grow when indexes are served from disk (which is expected). When the indexes are served out of /dev/shm, the java heap stays small. So in general is this consistent behavior? I don't really see the advantage of serving indexes from /dev/shm. When the indexes are being served out of ramdisk, is the linux kernel or the memory mapper doing something tricky behind the scenes to use ramdisk in lieu of the java heap? For what it is worth, we are running x86_64 RHEL 5.4 on a 12 core 2.27GHz Xeon system with 48GB RAM. Thoughts? -Park
Re: Solr memory consumption
> Commits are divided into 2 groups: > - often but small (last changed > info) 1) Make sure that it's not too often and you don't have a commit overlapping problem. http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F 2) You may also try to limit cache sizes and check if it helps. 3) If it doesn't help then try to monitor your app using jconsole * try to hit the garbage collector and see if it frees some memory * browse the Solr JMX attributes and see if there are any hints regarding Solr cache usage, etc. 4) Try to run jmap -heap and jmap -histo and see if there are any hints there 5) If none of the above helps then you probably need to examine your memory usage using some kind of Java profiler tool (like YourKit profiler) > Size: 4 databases about 1G (sum), 1 database (with n-gram) for 21G.. > I don't know any other way to search for product names except n-gram > =\ Isn't a standard text field with solr.WordDelimiterFilterFactory and generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" during indexing good enough? You might want to limit the min and max ngram size, just to reduce your index size.
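Spelled out, the suggested alternative to n-grams is roughly this field type (the name is made up; the WordDelimiterFilter parameters are the ones listed above, and the LowerCaseFilter is an addition):

<fieldType name="text_products" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>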
Hitting the URI limit, how to get around this?
I have a master solr instance that I send my requests to; it hosts no documents, it just farms the request out to a large number of shards. All the other solr instances that host the data contain multiple cores. Therefore my search string looks like "http://host:port/solr/select?...&shards=nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03,..."; This shard list is pretty long and has finally hit "the limit". So my question is how to best avoid having to build such a long URI? Is there a way to have multiple tiers, where the master server has a list of servers (nodeA:1234,nodeB:1234,...) and each of those nodes queries the cores that they host (nodeA hosts core01, core02, core03, ...)? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3017837.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexes in ramdisk don't show performance improvement?
What I expect is happening is that the Solr caches are effectively making the two tests identical, using memory to hold the vital parts of the index in both cases (after disk warming on the instance using the local disk). I suspect if you measured the first few queries (assuming no auto-warming) you'd see the local disk version be slower. Were you running these tests for curiosity or is running from /dev/shm something you're considering for production? Best Erick On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson wrote: > > Hey everyone. > > Been doing some load testing over the past few days. I've been throwing a > good bit of load at an instance of solr and have been measuring response > time. We're running a variety of different keyword searches to keep > solr's cache on its toes. > > I'm running two exact same load testing scenarios: one with indexes > residing in /dev/shm and another from local disk. The indexes are about > 4.5GB in size. > > On both tests the response times are the same. I wasn't expecting that. > I do see the java heap size grow when indexes are served from disk (which > is expected). When the indexes are served out of /dev/shm, the java heap > stays small. > > So in general is this consistent behavior? I don't really see the > advantage of serving indexes from /dev/shm. When the indexes are being > served out of ramdisk, is the linux kernel or the memory mapper doing > something tricky behind the scenes to use ramdisk in lieu of the java heap? > > For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz Xeon > system with 48GB ram. > > Thoughts? > > -Park > > >
Re: Sorting algorithm
It hasn't been committed yet, but you may want to track this JIRA: https://issues.apache.org/jira/browse/SOLR-2136 I happened to notice it over on the dev list, it's about adding if () to function queries. Best Erick 2011/6/2 Tomás Fernández Löbbe : > OK, then (everything that's available at index time, I'll say it's > constant): > (Math.log(z) / Math.LN10) (not sure what you mean with Math.LN10) is > constant, I'll call it c1 > > ((y * t) / 45000) = (y/45000)*t --> y/45000 is constant, I'll call it c2. > > c1+(c2 * t) = c1 + (c2 * (CreationDate - now) / 1000) --> c2 / 1000 is also > constant, I'll call it c3. > > Then, your ranking formula is: c1 + (c3 * (creationDate - now)). > > In solr, this will be: &sort=sum(c1,product(c3,ms(creationDate, NOW))). > > I haven't tried it but if my arithmetic is correct (I'm a little bit rusty > with that), that should work and should be faster than doing the whole thing > at query time. Of course, "c1" and "c3" must be indexed as fields. > > Regards, > > Tomás > On Thu, Jun 2, 2011 at 9:46 AM, Richard Hodsdon > wrote: > >> Thanks for the response, >> >> You are correct, but my pseudo code was not. >> this line >> var t = (CreationDate - 1131428803) / 1000; >> should be >> var t = (CreationDate - now()) / 1000; >> >> This will cause the items ranking to depreciate over time. >> >> Richard >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3014961.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >
Better to have lots of smaller cores or one really big core?
I am trying to decide what the right approach would be: to have one big core or many smaller cores hosted by a solr instance. I think there may be trade-offs either way but wanted to see what others do. And by small I mean about 5-10 million documents; large may be 50 million. It seems like small cores are better because - If one server can host say 70 million documents (before memory issues) we can get really close with a bunch of small indexes, vs only being able to host one 50 million document index. And when a software update comes out that allows us to host 90 million then we could add a few more small indexes. - It takes less time to build ten 5 million document indexes than one 50 million document index. It seems like larger cores are better because - Each core returns its own result set, so if I want 1000 results and there are 100 cores, the network is transferring up to 100,000 candidate documents for that search, whereas if I had only 10 much larger cores only 10,000 would be sent over the network. - It would prolong my time until I hit URI length limits, since there would be fewer cores in my system. Any thoughts??? Other trade-offs??? How do you find what the right size for you is? -- View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3017973.html Sent from the Solr - User mailing list archive at Nabble.com.
DeltaImport records not commited using DIH.
Hi, 1) When I run full-import it works fine and commits all the records. The document count matches table and dataImport.properties is updated with last_index timestamp. 2) After some time I ran the delta import and it is giving enough information but it is not adding the new record into the index. I am including my config and log information. Could anyone help me to fix this. *Data-config* *DeltaImport Status message.* 0:0:3.391 15 77 0 0 2011-06-02 16:58:23 2011-06-02 16:58:23 2011-06-02 16:58:24 2011-06-02 16:58:24 77 *Log info* Jun 2, 2011 3:59:20 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:15.641 Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Jun 2, 2011 4:01:02 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={command=delta-import&clean=false&qt=/dataimport&commit=true} status=0 QTime=0 Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection. Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: cps_dataset Jun 2, 2011 4:01:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity cps_dataset with URL: jdbc:oracle:thin:@lnxdb-stg-abcd.com:1521:STG Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 484 Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: xxx_xxx_cps_dataset rows obtained : 77 Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: xxx_xxx_cps_dataset rows obtained : 0 Jun 2, 2011 4:01:03 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: xxx_xxx_dataset Jun 2, 2011 4:01:18 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully Jun 2, 2011 4:01:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 0 Jun 2, 2011 4:01:18 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:15.625
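The data-config didn't survive the mail, but the symptom (77 modified rows found by deltaQuery, then an add count of zero in "{} 0 0") is a common signature of a missing or broken deltaImportQuery: deltaQuery only collects the changed primary keys, and DIH needs a deltaImportQuery to fetch the actual rows for those keys. A sketch of the relevant entity attributes, with placeholder table and column names, Oracle-style to match the JDBC URL in the log:

<entity name="cps_dataset" pk="ID"
        query="select * from cps_dataset"
        deltaQuery="select ID from cps_dataset
                    where LAST_MODIFIED &gt; to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
        deltaImportQuery="select * from cps_dataset where ID = '${dataimporter.delta.ID}'"/>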
Re: Is there a way to get all the hits and score them later?
Hmm, looks like I can inherit the Similarity Class and do my own thing there. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3018001.html Sent from the Solr - User mailing list archive at Nabble.com.
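A note of caution before going down this road: a Similarity subclass only reshapes Lucene's per-term scoring factors (tf, idf, norms); it is called deep inside the search loop and is no place to call out to an external service. A minimal sketch of the kind of thing it can do (class name made up):

import org.apache.lucene.search.DefaultSimilarity;

public class FlatSimilarity extends DefaultSimilarity {
    // every matching doc gets the same term-frequency contribution
    @Override
    public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }

    // rare terms are not weighted above common ones
    @Override
    public float idf(int docFreq, int numDocs) { return 1.0f; }
}

registered in schema.xml via <similarity class="com.example.FlatSimilarity"/>.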
Re: Better to have lots of smaller cores or one really big core?
Take another approach? Cores are often used for isolation purposes. That is, the data in one core may have nothing to do with another core, the schemas don't have to match etc. They #may# be both logically and physically separate. I don't have measurements for this, so I'm guessing a little. But I expect that using multiple cores will actually use a few more resources than a single core (e.g. memory). Each core will be keeping a separate cache, duplicating terms etc. (I may be wrong on this one!). But if you have a single schema in a logically single core that just grows too big to serve queries acceptably, the usual approach is to go to shards, which are just cores, but Solr manages the query part over multiple shards via configuration, which is probably easier. So the answer in this case is to put stuff on a single machine in a single core until it grows too big, then go to sharding. So the question is really whether you consider the cores sub-parts of a single index or distinct units (say one core per customer). In the former, I'd use one core until it gets too big, then shard. In the latter, multiple cores are a good solution, largely for administrative/security reasons, but then you aren't manually constructing a huge URL... Hope that helps Erick On Thu, Jun 2, 2011 at 7:57 PM, JohnRodey wrote: > I am trying to decide what the right approach would be, to have one big core > and many smaller cores hosted by a solr instance. > > I think there may be trade offs either way but wanted to see what others do. > And by small I mean about 5-10 million documents, large may be 50 million. > > It seems like small cores are better because > - If one server can host say 70 million documents (before memory issues) we > can get really close with a bunch of small indexes, vs only being able to > host one 50 million document index. And when a software update comes out > that allows us to host 90 million then we could add a few more small > indexes. > - It takes less time to build ten 5 million document indexes than one 50 > million document index. > > It seems like larger cores are better because > - Each core returns their result set, so if I want 1000 results and their > are 100 cores the network is transferring 10 documents for that search. > Where if I had only 10 much larger cores only 1 documents would be sent > over the network. > - It would prolong my time until I hit uri length limits being that there > would be less cores in my system. > > Any thoughts??? Other trade-offs??? > > How do you find what the right size for you is? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3017973.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Indexes in ramdisk don't show performance improvement?
That's just the thing. Even the initial queries have similar response times to the later ones. WEIRD! I was considering running from /dev/shm in production, but for slaves only (master remains on disk). At this point though, I'm not seeing a benefit to ramdisk so I think I'm going back to traditional disk so the indexes stay intact after a power cycle. Has anyone else seen that indexes served from disk perform similarly to indexes served from ramdisk? -Park On 6/2/11 4:15 PM, "Erick Erickson" wrote: >What I expect is happening is that the Solr caches are effectively making >the >two tests identical, using memory to hold the vital parts of the code in >both >cases (after disk warming on the instance using the local disk). I >suspect if >you measured the first few queries (assuming no auto-warming) you'd see >the >local disk version be slower. > >Were you running these tests for curiosity or is running from /dev/shm >something >you're considering for production? > >Best >Erick > >On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson >wrote: >> >> Hey everyone. >> >> Been doing some load testing over the past few days. I've been throwing >>a >> good bit of load at an instance of solr and have been measuring response >> time. We're running a variety of different keyword searches to keep >> solr's cache on its toes. >> >> I'm running two exact same load testing scenarios: one with indexes >> residing in /dev/shm and another from local disk. The indexes are about >> 4.5GB in size. >> >> On both tests the response times are the same. I wasn't expecting that. >> I do see the java heap size grow when indexes are served from disk >>(which >> is expected). When the indexes are served out of /dev/shm, the java >>heap >> stays small. >> >> So in general is this consistent behavior? I don't really see the >> advantage of serving indexes from /dev/shm. When the indexes are being >> served out of ramdisk, is the linux kernel or the memory mapper doing >> something tricky behind the scenes to use ramdisk in lieu of the java >>heap? >> >> For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz >>Xeon >> system with 48GB ram. >> >> Thoughts? >> >> -Park >> >> >> >
Re: Indexes in ramdisk don't show performance improvement?
Linux will cache the open index files in RAM (in the filesystem cache) after their first read which makes the ram disk generally useless. Unless you're processing other files on the box with a size greater than your total unused ram (and thus need to micro-manage what stays in RAM), then I wouldn't recommend using a ramdisk - it's just more to manage. If you reboot the box and run a few searches, those first few will likely be slower until all the index files are cached in Memory. After that point, the performance should be comparable because all files are read out of RAM from that point forward. If solr caches are enabled and your queries are repetitive then that could also be contributing to the speed of repetitive queries. Note that the above advice assumes your total unused ram (not allocated to the JVM or any other processes) is greater than the size of your lucene index files, which should be a safe assumption considering you're trying to put the whole index in a ramdisk. -Trey On Thu, Jun 2, 2011 at 7:15 PM, Erick Erickson wrote: > What I expect is happening is that the Solr caches are effectively making the > two tests identical, using memory to hold the vital parts of the code in both > cases (after disk warming on the instance using the local disk). I suspect if > you measured the first few queries (assuming no auto-warming) you'd see the > local disk version be slower. > > Were you running these tests for curiosity or is running from /dev/shm > something > you're considering for production? > > Best > Erick > > On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson wrote: >> >> Hey everyone. >> >> Been doing some load testing over the past few days. I've been throwing a >> good bit of load at an instance of solr and have been measuring response >> time. We're running a variety of different keyword searches to keep >> solr's cache on its toes. >> >> I'm running two exact same load testing scenarios: one with indexes >> residing in /dev/shm and another from local disk. The indexes are about >> 4.5GB in size. >> >> On both tests the response times are the same. I wasn't expecting that. >> I do see the java heap size grow when indexes are served from disk (which >> is expected). When the indexes are served out of /dev/shm, the java heap >> stays small. >> >> So in general is this consistent behavior? I don't really see the >> advantage of serving indexes from /dev/shm. When the indexes are being >> served out of ramdisk, is the linux kernel or the memory mapper doing >> something tricky behind the scenes to use ramdisk in lieu of the java heap? >> >> For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz Xeon >> system with 48GB ram. >> >> Thoughts? >> >> -Park >> >> >> >
Re: synonyms problem
oh thank you for reminding me about the string and text issues... I will change it asap... and about the index analyzer, I just removed it for brevity... I will try again and if it fails will post here again... thank you so much - Zeki ("but it isn't working... if it worked, it would do it...") -- View this message in context: http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3018185.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Hitting the URI limit, how to get around this?
just a suggestion ... if the shards are known, you can add them as default params in the request handler so they are always added, and the URL would just need the qt parameter. The URI length limit is client-dependent: how are you querying Solr .. any client API? Through a browser? Is it hitting the max header length? Can you use POST instead? Regards, Jayendra On Thu, Jun 2, 2011 at 7:12 PM, JohnRodey wrote: > I have a master solr instance that I sent my request to, it hosts no > documents it just farms the request out to a large number of shards. All the > other solr instances that host the data contain multiple cores. > > Therefore my search string looks like > "http://host:port/solr/select?...&shards=nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03,..."; > This shard list is pretty long and has finally hit "the limit". > > So my question is how to best avoid having to build such a long uri? > > Is there a way to have mutiple tiers, where the master server has a list of > servers (nodeA:1234,nodeB:1234,...) and each of those nodes query the cores > that they host (nodeA hosts core01, core02, core03, ...)? > > Thanks! > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3017837.html > Sent from the Solr - User mailing list archive at Nabble.com. >
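Concretely, the shard list can move out of the URL and into the front-end instance's solrconfig.xml (handler name is made up):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03</str>
  </lst>
</requestHandler>

after which clients hit /solr/distrib?q=... and never carry the list themselves.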
Re: Debugging a Solr/Jetty Hung Process
If you have an SNMP infrastructure available (nagios or similar) you should be able to set up a polling monitor that will keep statistics on the number of threads in your jvm and even allow you to inspect their stacks remotely. You can set alarms so you will be notified if cpu thread count or other metrics exceed a configurable threshold and then take a look at the process before it goes off the deep end. It is a fair amount of work to set this up, but really useful if you need to support a critical system. -Mike On 6/1/2011 3:42 PM, Jonathan Rochkind wrote: First guess (and it really is just a guess) would be Java garbage collection taking over. There are some JVM parameters you can use to tune the GC process, especially if the machine is multi-core, making sure GC happens in a seperate thread is helpful. But figuring out exactly what's going on requires confusing JVM debugging of which I am no expert at either. On 6/1/2011 3:04 PM, Chris Cowan wrote: About once a day a Solr/Jetty process gets hung on my server consuming 100% of one of the CPU's. Once this happens the server no longer responds to requests. I've looked through the logs to try and see if anything stands out but so far I've found nothing out of the ordinary. My current remedy is to log in and just kill the single processes that's hung. Once that happens everything goes back to normal and I'm good for a day or so. I'm currently the running following: solr-jetty-1.4.0+ds1-1ubuntu1 which is comprised of Solr 1.4.0 Jetty 6.1.22 on Unbuntu 10.10 I'm pretty new to managing a Jetty/Solr instance so at this point I'm just looking for advice on how I should go about trouble shooting this problem. Chris
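Even without an SNMP setup, two commands at the moment of the hang will usually identify the spinning thread (assuming a Sun JDK and Linux):

# per-thread CPU for the Jetty JVM; note the TID of the 100% thread
top -H -p <jetty-pid>
# full stack dump; the hot thread's "nid" in the dump is that TID in hex
jstack <jetty-pid> > /tmp/stacks.txt
printf '%x\n' <tid>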
Re: Index vs. Query Time Aware Filters
It doesn't look like this is supported in any way that is at all straightforward. http://wiki.apache.org/solr/SolrPlugins talks about the easy ways to parameterize plugins, and they don't include what you're after. I think maybe you could extend the query parser you are currently using, wrap the parse() method, get a hold of your analyzer, which maybe is your own class with special knowledge of its filter chain and can inform the filter that it's being used in "query" mode; otherwise it would default to index mode. If you are letting Solr generate the Analyzer, or maybe in either case (?) you could call Analyzer.reusableTokenStream() to get the TokenStream, but from there things get murky. I don't think TokenStream provides any mechanism to walk the chain so you could find your special filter and inform it of its status. You'd probably have to add your own mechanism for tracking this, extending all TokenStreams, but I don't think this is actually feasible since these are required to be final! -Mike On 6/1/2011 12:23 PM, Mike Schultz wrote: I should have explained that the queryMode parameter is for our own custom filter. So the result is that we have 8 filters in our field definition. All the filter parameters (30 or so) of the query time and index time are identical EXCEPT for our one custom filter which needs to know if it's in query time or index time mode. If we could determine inside our custom code whether we're indexing or querying, then we could omit the query time definition entirely and save about 50 lines of configuration and be much less error prone. One possible solution would be if we could get at the SolrCore from within a filter. Then at init time we could iterate through the filter chains and determine when we find a factory == this. (I've done this in other places where it's useful to know the name of a ValueSourceParser for example) -- View this message in context: http://lucene.472066.n3.nabble.com/Index-vs-Query-Time-Aware-Filters-tp3009450p3011556.html Sent from the Solr - User mailing list archive at Nabble.com.
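For contrast, the configuration-heavy status quo the poster wants to shrink is itself simple to implement, which is partly why nothing nicer exists. A sketch of a factory reading a queryMode flag (MyFilter/MyFilterFactory are stand-ins for the custom filter being discussed):

import java.util.Map;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class MyFilterFactory extends BaseTokenFilterFactory {
    private boolean queryMode;

    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        // set queryMode="true" only in the <analyzer type="query"> chain
        queryMode = "true".equals(args.get("queryMode"));
    }

    public TokenStream create(TokenStream input) {
        return new MyFilter(input, queryMode);
    }
}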
Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping
Just keep one extra facet value hidden; ie request one more than you need to show the current page. If you get it, there are more (show the next button), otherwise there aren't. You can't page arbitrarily deep like this, but you can have a next button reliably enabled or disabled. On 6/1/2011 5:57 PM, Robert Petersen wrote: Yes that is exactly the issue... we're thinking just maybe always have a next button and if you go too far you just get zero results. User gets what the user asks for, and so user could simply back up if desired to where the facet still has values. Could also detect an empty facet results on the front end. You can also only expand one facet only to allow paging only the facet pane and not the whole page using an ajax call. -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 01, 2011 2:30 PM To: solr-user@lucene.apache.org Cc: Robert Petersen Subject: Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping How do you know whether to provide a 'next' button, or whether you are the end of your facet list? On 6/1/2011 4:47 PM, Robert Petersen wrote: I think facet.offset allows facet paging nicely by letting you index into the list of facet values. It is working for me... http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, June 01, 2011 12:41 PM To: solr-user@lucene.apache.org Subject: Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping There's no great way to do that. One approach would be using facets, but that will just get you the author names (as stored in fields), and not the documents under it. If you really only want to show the author names, facets could work. One issue with facets though is Solr won't tell you the total number of facet values for your query, so it's tricky to provide next/prev paging through them. There is also a 'field collapsing' feature that I think is not in a released Solr, but may be in the Solr repo. I'm not sure it will quite do what you want either though, although it's related and worth a look. http://wiki.apache.org/solr/FieldCollapsing Another vaguely related thing that is also not yet in a released Solr, is a 'join' function. That could possibly be used to do what you want, although it'd be tricky too. https://issues.apache.org/jira/browse/SOLR-2272 Jonathan On 6/1/2011 2:56 PM, beccax wrote: Apologize if this question has already been raised. I tried searching but couldn't find the relevant posts. We've indexed a bunch of documents by different authors. Then for search results, we'd like to show the authors that have 1 or more documents matching the search keywords. The problem is right now our solr search method first paginates results to 100 documents per page, then we take the results and group by authors. This results in different number of authors per page. (Some authors may only have one matching document and others 5 or 10.) How do we change it to somehow show the same number of authors (say 25) per page? I mean alternatively we could just show all the documents themselves ordered by author, but it's not the user experience we're looking for. Thanks so much. And please let me know if you need more details not provided here. 
B -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-question-how-to-deal-with-diff erent-of-search-results-per-page-due-to-pagination-then-grouping-tp30121 68p3012168.html Sent from the Solr - User mailing list archive at Nabble.com.
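In parameter terms, for 25 authors per page the one-extra trick looks like this (author as the facet field is an assumption from the thread):

# page 3: ask for 26 values starting at offset 50, render at most 25
...&facet=true&facet.field=author&facet.limit=26&facet.offset=50&facet.mincount=1
# 26 values back -> enable "next"; 25 or fewer -> this is the last page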
Re: tika and solr 3,1 integration
Hi This is fixed .. yes, schema.xml was the culprit and I fixed it by looking at the sample schema provided with the distribution. But on Windows, I am getting an slf4j IllegalAccessError, which looks like a jar problem. Looking at the fixes suggested in their FAQs, they suggest using version 1.5.5, which is already there in the lib folder .. I have been deploying a lot of jars .. I am afraid that may be causing the problem .. Has somebody experienced the same? Thanks Naveen On Fri, Jun 3, 2011 at 2:41 AM, Juan Grande wrote: > Hi Naveen, > > Check if there is a dynamic field named "attr_*" in the schema. The > "uprefix=attr_" parameter means that if Solr can't find an extracted field > in the schema, it'll add the prefix "attr_" and try again. > > *Juan* > > > > On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta wrote: > > > Hi > > > > I am trying to integrate solr 3.1 and tika (which comes default with the > > version) > > > > and using curl command trying to index few of the documents, i am getting > > this error. the error is attr_meta field is unknown. i checked the > > solrconfig, it looks perfect to me. > > > > can you please tell me what i am missing. > > > > I copied all the jars from contrib/extraction/lib to solr/lib folder that > > is > > there in same place where conf is there > > > > > > I am using the same request handler which is coming with default > > > > > startup="lazy" > > class="solr.extraction.ExtractingRequestHandler" > > > > > > > text > > true > > ignored_ > > > > > > true > > links > > ignored_ > > > > > > > > > > > > > > > > * curl " > > > > > http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true > > " > > -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"* > > > > > > Apache Tomcat/6.0.18 - Error > report > > HTTP Status 400 - ERROR:unknown field > 'attr_meta' > size="1" noshade="noshade">type Status > > reportmessage > > ERROR:unknown field 'attr_meta'description The > > request sent by the client was syntactically incorrect (ERROR:unknown > field > > 'attr_meta').Apache > > Tomcat/6.0.18root@weforpeople:/usr/share/solr1/lib# > > > > > > Please note > > > > i integrated apacha tika 0.9 with apache-solr-1.4 locally on windows > > machine > > and using solr cell > > > > calling the program works fine without any changes in configuration. > > > > Thanks > > Naveen > > >
Strategy --> Frequent updates in our application
Hi We have an application where every 10 minutes we index each user's document repository, and whenever a new thread is added to a particular discussion we need to index that thread again (please note we are not doing blind indexing each time; we have various rules to filter out which threads are new and therefore candidates for indexing, plus the newly arrived ones). So we are doing updates for each user's document repository, and the performance so far is not looking very good. In the future we are going to get hits in volume (1,000 to 10,000 hits per minute), so we are looking for a strategy to tune Solr to index the data in real time. And what about NRT: is it suitable for this kind of scenario? I read that Solr NRT performance is not very good, but I am not going to believe that, since Solr is one of the best open source projects, so this problem will surely be sorted out in the near future. But if any benchmark exists, kindly share it with me; we would like to analyze it against our requirements. Is there any way to add incremental indexes, as we generally find in other search engines like Endeca? I don't know much detail about Solr since I am a newbie, so can you please tell me if there are settings which can keep track of incremental indexing? Thanks Naveen
RE: solr Invalid Date in Date Math String/Invalid Date String
Hi Erick Here is the error message: Fieldtype: tdate (I use the default one in solr schema.xml) Field value(Index): 2006-12-22T13:52:13Z Field value(query): [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z] <<< with '[' and ']' And it generates the result below: ---Start--- HTTP ERROR: 500 org.apache.solr.common.SolrException: Invalid Date in Date Math String:'[2006-12-22T00:00:00Z TO 2006-12' org.apache.jasper.JasperException: org.apache.solr.common.SolrException: Invalid Date in Date Math String:'[2006-12-22T00:00:00Z TO 2006-12' at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:4 02) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 264) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler .java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerColl ection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:11 4) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java: 835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:22 6) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:4 42) Caused by: org.apache.solr.common.SolrException: Invalid Date in Date Math String:'[2006-12-22T00:00:00Z TO 2006-12' at org.apache.solr.schema.DateField.parseMath(DateField.java:158) at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:101) at org.apache.solr.analysis.TrieTokenizer.(TrieTokenizerFactory.java:73) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.ja va:51) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.ja va:41) at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain. 
java:69) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java: 74) at org.apache.jsp.admin.analysis_jsp._jspService(org.apache.jsp.admin.analysis_ jsp:685) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:3 73) ... 29 more Caused by: java.text.ParseException: Unparseable date: "[2006-12-22T00:00:00Z" at java.text.DateFormat.parse(Unknown Source) at org.apache.solr.schema.DateField.parseDate(DateField.java:254) at org.apache.solr.schema.DateField.parseMath(DateField.java:156) ... 39 more RequestURI=/solr/i-audience.com-contacts-test/admin/analysis.jsp Powered by Jetty:// --- End --- Can you tell me what is the problem? Thank you very much in advance. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 2011年5月31日 9:54 下午 To: solr-user@lucene.apache.org; elleryle...@be-o.com Subject: Re: solr Invalid Date in Date Math String/Invalid Date String Can we see the results of attaching &debugQuery=on to the query? That often points out the issue. I'd expect this form to work: [
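The stack trace actually answers this: org.apache.jsp.admin.analysis_jsp shows the string was entered on the admin Analysis page, whose field-value boxes analyze a single value, so the whole bracketed range is handed to the tdate parser and fails. Range syntax is query syntax; it needs to go through an actual query, along the lines of:

http://localhost:8983/solr/select?q=mydatefield:[2006-12-22T00:00:00Z%20TO%202006-12-22T23:59:59Z]

with mydatefield replaced by the real tdate field and the spaces URL-encoded as shown.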
Re: Strategy --> Frequent updates in our application
Naveen, Solr does support incremental indexing. Solr currently doesn't make use of Lucene's NRT support, but that is starting to change. If you provide more specifics about issues you are having and your architecture, data and query volume, we may be able to help better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Naveen Gupta > To: solr-user@lucene.apache.org > Sent: Thu, June 2, 2011 11:29:42 PM > Subject: Strategy --> Frequent updates in our application > > Hi > > We are having an application where every 10 mins, we are doing indexing of > users docs repository, and eventually, if some thread is being added in that > particular discussion, we need to index the thread again (please note we are > not doing blind indexing each time, we have various rules to filter out > which thread is new and thus that is a candidate for indexing plus new ones > which has arrived). > > So we are doing updates for each user docs repository .. the performance is > not looking so far very good. the future is that we are going to get hits in > volume(1000 to 10,000 hits per mins), so looking for strategy where we can > tune solr in order to index the data in real time > > and what about NRT, is it fine to apply in this case of scenario. i read > that solr NRT is not very good in performance, but i am not going to believe > it since it is one of the best open sources ..so it is going to have this > problem sorted in near future ..but if any benchmark is there, kindly share > with me ... we would like to analyze with our requirements. > > Is there any way to add incremental indexes which we generally find in other > search engine like endeca and etc? i don't know much in detail about solr... > since i am newbie, so can you please tell me if we can have some settings > which can keep track of incremental indexing? > > > Thanks > Naveen >
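On the commit-frequency side, solrconfig.xml's autoCommit is the usual starting point for batches arriving every few minutes; a sketch (the thresholds are arbitrary):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- or after 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>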
Re: Indexes in ramdisk don't show performance improvement?
Park, I think there is no way initial queries will be the same IF: * your index in ramfs is really in RAM * your index in regular FS is not already in RAM due to being previously cached (you *did* flush OS cache before the test, right?) Having said that, if you update your index infrequently and make use of warm up queries and cache warming, you are likely to be very fine with the index on disk. For example, we have a customer right now that we helped a bit with performance. They also have lots of RAM, 10M docs in the index, and replicate the whole optimized index nightly. They have 2 servers, each handling about 1000 requests per minute and their average response time is under 20 ms with pre-1.4.1 Solr and lots of facets and fqs (they use Solr not only for search, but also navigation). No ramfs involved, but they have zero disk reads because the whole index is cached in memory, so things are fast. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Parker Johnson > To: "solr-user@lucene.apache.org" > Sent: Thu, June 2, 2011 9:20:55 PM > Subject: Re: Indexes in ramdisk don't show performance improvement? > > > That¹s just the thing. Even the initial queries have similar response > times as the later ones. WEIRD! > > I was considering running from /dev/shm in production, but for slaves only > (master remains on disk). At this point though, I'm not seeing a benefit > to ramdisk so I think I'm going back to traditional disk so the indexes > stay intact after a power cycle. > > Has anyone else seen that indexes served from disk perform similarly as > indexes served from ramdisk? > > -Park > > On 6/2/11 4:15 PM, "Erick Erickson" wrote: > > >What I expect is happening is that the Solr caches are effectively making > >the > >two tests identical, using memory to hold the vital parts of the code in > >both > >cases (after disk warming on the instance using the local disk). I > >suspect if > >you measured the first few queries (assuming no auto-warming) you'd see > >the > >local disk version be slower. > > > >Were you running these tests for curiosity or is running from /dev/shm > >something > >you're considering for production? > > > >Best > >Erick > > > >On Thu, Jun 2, 2011 at 5:47 PM, Parker Johnson > >wrote: > >> > >> Hey everyone. > >> > >> Been doing some load testing over the past few days. I've been throwing > >>a > >> good bit of load at an instance of solr and have been measuring response > >> time. We're running a variety of different keyword searches to keep > >> solr's cache on its toes. > >> > >> I'm running two exact same load testing scenarios: one with indexes > >> residing in /dev/shm and another from local disk. The indexes are about > >> 4.5GB in size. > >> > >> On both tests the response times are the same. I wasn't expecting that. > >> I do see the java heap size grow when indexes are served from disk > >>(which > >> is expected). When the indexes are served out of /dev/shm, the java > >>heap > >> stays small. > >> > >> So in general is this consistent behavior? I don't really see the > >> advantage of serving indexes from /dev/shm. When the indexes are being > >> served out of ramdisk, is the linux kernel or the memory mapper doing > >> something tricky behind the scenes to use ramdisk in lieu of the java > >>heap? > >> > >> For what it is worth, we are running x_64 rh5.4 on a 12 core 2.27Ghz > >>Xeon > >> system with 48GB ram. > >> > >> Thoughts? > >> > >> -Park > >> > >> > >> > > > > >
Re: query routing with shards
Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to this list, too) It sounds like you have the knowledge of which query maps to which shard. If so, why not control/change the value of "shards" param in the request to your front-end Solr (aka distributed request dispatcher) within your app, which is the one calling Solr? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Dmitry Kan > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Thu, June 2, 2011 7:00:53 AM > Subject: query routing with shards > > Hello all, > > We have currently several pretty fat logically isolated shards with the same > schema / solrconfig (indices are separate). We currently have one single > front end SOLR (1.4) for the client code calls. Since a client code query > usually hits only one shard, we are considering making a smart routing of > queries to the shards they map to. Can you please give some pointers as to > what would be an optimal way to achieve such a routing inside the front end > solr? Is there a way to configure mapping inside the solrconfig? > > Thanks. > > -- > Regards, > > Dmitry Kan >
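For illustration, the kind of request Otis is suggesting, with the application setting the shards parameter per query (host names here are made up):

    http://frontend.example.com:8080/solr/select?q=some+query&shards=shard1.example.com:8983/solr,shard2.example.com:8983/solr

Each shards entry is the host:port/path of a core; the front-end Solr fans the query out only to the shards listed, so per-query routing reduces to the application choosing that list.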
java.io.IOException: The specified network name is no longer available
Hi, I am using Solr 1.4.1, and at the time of updating the index I get the following error:

2011-06-03 05:54:06,943 ERROR [org.apache.solr.core.SolrCore] (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is no longer available
 at java.io.RandomAccessFile.readBytes(Native Method)
 at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
 at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
 at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
 at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
 at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
 at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
 at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
 at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
 at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
 at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
 at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
 at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1103)
 at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:981)
 at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
 at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:640)
 at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
 at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
 at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181)
 at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285)
 at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261)
 at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88)
 at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951)
 at java.lang.Thread.run(Thread.java:619)

2011-06-03 05:54:06,943 INFO [org.apache.solr.core.SolrCore] (http-10.38.33.146-8080-4) [project_58787] webapp=/solr path=/select params={sort=revisionid_l+desc&start=0&q=type_s:IFCFileMaster+AND+modelversionid_l:(+8+7+)&wt=javabin&fq=reftable_s:IFCRELDEFINESBYPROPERTIES&version=1&rows=100} status=500 QTime=0

2011-06-03 05:54:06,990 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is no longer available
 at java.io.RandomAccessFile.readBytes(Native Method)
 at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
 at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132) a
different indexes for multitenant approach
Hi, I want to implement a per-tenant index strategy, where we keep an index for each tenant and maintain the indexes separately ...

first level of category -- company name
second level of category -- company name + fields to be indexed
further categories -- groups of different company names based on some heuristic (hashing), if it grows further

I want to do this in the same Solr instance. Is it possible? Thanks Naveen
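For what it's worth, the usual way to keep separate per-tenant indexes inside a single Solr instance is multicore; a minimal solr.xml sketch, with hypothetical core names mirroring the category levels above:

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <!-- first level: one core per company -->
        <core name="companyA" instanceDir="cores/companyA" />
        <!-- second level: company + field set -->
        <core name="companyA_orders" instanceDir="cores/companyA_orders" />
        <!-- further levels: hashed company groups -->
        <core name="group_0" instanceDir="cores/group_0" />
      </cores>
    </solr>

Each core gets its own conf/ and data/ under its instanceDir, so schemas and indexes stay fully isolated while sharing one JVM.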
Re: how to make getJson parameter dynamic
lee carroll: Sorry for this. I did it because I was not getting any response. Anyway, thanks for letting me know, and I have now found the solution to the above problem :) Now I am facing a very strange problem related to jQuery; can you please help me out?

$(document).ready(function(){
  $("#c2").click(function(){
    var q = getquerystring();
    $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&q=" + q + "&json.wrf=?", function(result){
      $.each(result.response.docs, function(i, item){
        alert(result.response.docs);
        alert(item.UID_PK);
      });
    });
  });
});

When I wrap the call in $("#c2").click(function(){...}), execution never enters $.getJSON(), but when I remove the click handler the code runs fine. Why is that? Please explain, because I want to get data from a text box on a click event and then display the response.

-
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3018732.html
Sent from the Solr - User mailing list archive at Nabble.com.
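One possible explanation, not confirmed in the thread: if #c2 is a submit button inside a form, the default form submission reloads the page before the asynchronous request returns, which would produce exactly this symptom. A sketch of the same pattern with the default action suppressed; getquerystring() and UID_PK come from the message above, and preventDefault() is the speculative fix:

    $(document).ready(function(){
      $("#c2").click(function(event){
        event.preventDefault();  // assumption: stop a submit button from reloading the page
        var q = getquerystring();  // helper from the original message
        $.getJSON(
          "http://192.168.1.9:8983/solr/db/select/?wt=json&q=" + encodeURIComponent(q) + "&json.wrf=?",
          function(result){
            $.each(result.response.docs, function(i, item){
              alert(item.UID_PK);  // field name from the original message
            });
          });
      });
    });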
Re: How to display search results of solr in to other application.
$.getJSON(
  "http://[server]:[port]/solr/select/?jsoncallback=?",
  {"q": queryString,
   "version": "2.2",
   "start": "0",
   "rows": "10",
   "indent": "on",
   "json.wrf": "callbackFunctionToDoSomethingWithOurData",
   "wt": "json",
   "fl": "field1"}
);

Would you please explain what queryString and "json.wrf": "callbackFunctionToDoSomethingWithOurData" are? And what if I want to change my query string each time?

-
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: query routing with shards
Hi Otis, I merely followed Gmail's suggestion to include other people in the recipients list, and Yonik was the first one :) I won't do it next time. Thanks for the rapid reply.

The reason for doing this query routing is that we abstract the distributed SOLR away from the client code, both for security reasons (that is, we don't want to expose the entire shard farm to the world, only the front-end SOLR) and for better decoupling. Is it possible to implement a plugin for SOLR that would map queries to shards? We have other choices too, but they'll take quite some time, which is why I decided to ask quickly whether I was missing something in the design and configuration of SOLR's main components.

Dmitry

On Fri, Jun 3, 2011 at 8:25 AM, Otis Gospodnetic wrote:
> Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to
> this list, too)
>
> It sounds like you have the knowledge of which query maps to which shard.
> If so, why not control/change the value of "shards" param in the request
> to your front-end Solr (aka distributed request dispatcher) within your
> app, which is the one calling Solr?
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> - Original Message
> > From: Dmitry Kan
> > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > Sent: Thu, June 2, 2011 7:00:53 AM
> > Subject: query routing with shards
> >
> > Hello all,
> >
> > We have currently several pretty fat logically isolated shards with the
> > same schema / solrconfig (indices are separate). We currently have one
> > single front end SOLR (1.4) for the client code calls. Since a client
> > code query usually hits only one shard, we are considering making a
> > smart routing of queries to the shards they map to. Can you please give
> > some pointers as to what would be an optimal way to achieve such a
> > routing inside the front end solr? Is there a way to configure mapping
> > inside the solrconfig?
> >
> > Thanks.
> >
> > --
> > Regards,
> >
> > Dmitry Kan
>

--
Regards,

Dmitry Kan
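One configuration-only sketch (an assumption, not a tested recipe): request handlers in solrconfig.xml accept default parameters, so the front-end Solr could expose one handler per shard group with a fixed shards default, leaving the client to pick the handler path; the handler name and host below are made up:

    <requestHandler name="/route-groupA" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- every query to /route-groupA fans out only to group A's shard -->
        <str name="shards">shard-a.example.com:8983/solr</str>
      </lst>
    </requestHandler>

This keeps the shard farm hidden behind the front end without any custom plugin code, at the cost of one handler per routing target.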
Re: How to display search results of solr in to other application.
Hi Romi

As I see it, you first need to understand how Ajax with jQuery works, then go for JSON, and then JSONP (if you are fetching from a different domain).

The query here is the dynamic query with which you hit Solr (it could be simple text, or a more advanced query string): http://wiki.apache.org/solr/CommonQueryParameters

The callback is the name of a method you define; after the response arrives, this method is called (the callback mechanism) with the response from Solr (in JSON format), and in it you show or analyze the response as your business need dictates.

Thanks
Naveen

On Fri, Jun 3, 2011 at 12:00 PM, Romi wrote:
> $.getJSON(
> "http://[server]:[port]/solr/select/?jsoncallback=?",
> {"q": queryString,
> "version": "2.2",
> "start": "0",
> "rows": "10",
> "indent": "on",
> "json.wrf": "callbackFunctionToDoSomethingWithOurData",
> "wt": "json",
> "fl": "field1"}
> );
>
> would you please explain what are queryString and "json.wrf":
> "callbackFunctionToDoSomethingWithOurData". and what if i want to change my
> query string each time.
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html
> Sent from the Solr - User mailing list archive at Nabble.com.
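Putting those pieces together, a minimal sketch of a dynamic JSONP request against Solr; the host, element IDs, and field name are placeholders, not from the thread:

    $(document).ready(function(){
      $("#searchButton").click(function(){
        var queryString = $("#queryBox").val();  // dynamic query read from a text box
        $.getJSON(
          "http://localhost:8983/solr/select/?json.wrf=?",  // jQuery replaces "?" with a generated callback name
          {"q": queryString, "wt": "json", "rows": "10"},
          function(result){
            // invoked with Solr's JSON response once the cross-domain request completes
            $.each(result.response.docs, function(i, doc){
              alert(doc.field1);
            });
          });
      });
    });

Because json.wrf names the wrapper function, Solr returns the JSON wrapped in a script call, which is what lets the browser fetch it from a different host than the one serving the page.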