Re: Solrj Javabin and JSON
There is no point converting javabin to JSON. Javabin is an intermediate format; it is converted to Java objects as soon as it arrives. You just need a means to convert those Java objects to JSON.

On Sat, Oct 24, 2009 at 12:10 PM, SGE0 wrote:
>
> Hi,
>
> did anyone write a Javabin to JSON converter and is willing to share this?
>
> In our servlet we use a CommonsHttpSolrServer instance to execute a query.
>
> The problem is that it returns Javabin format and we need to send the result
> back to the browser using JSON format.
>
> And no, the browser is not allowed to directly query Lucene with the wt=json
> format.
>
> Regards,
>
> S.

--
Noble Paul | Principal Engineer | AOL | http://aol.com
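A minimal sketch of such a converter, assuming a JSON library such as org.json is on the classpath (the library choice and class name are illustrative, not part of SolrJ):

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class JsonConverter {

    // Turn the SolrJ result list (already decoded from javabin) into a JSON string.
    public static String toJson(SolrDocumentList docs) throws JSONException {
        JSONArray out = new JSONArray();
        for (SolrDocument doc : docs) {
            JSONObject obj = new JSONObject();
            for (String field : doc.getFieldNames()) {
                // multi-valued fields come back as Collections and serialize as JSON arrays
                obj.put(field, doc.getFieldValue(field));
            }
            out.put(obj);
        }
        return out.toString();
    }
}

From the servlet, response.getResults() on the QueryResponse can be fed straight in and the returned string written to the HttpServletResponse.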
Re: Where the new replication pulls the files?
On Fri, Oct 23, 2009 at 11:46 PM, Jérôme Etévé wrote:
> Hi all,
> I'm wondering where a slave pulls the files from the master on replication.
>
> Is it directly to the index/ directory or is it somewhere else before
> it's completed and gets copied to index?

It is copied to a temp dir until all the files are downloaded.

> Cheers!
>
> Jerome.
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Solrj client API and response in XML format (Solr 1.4)
hi
You don't see the point. You really don't need to use SolrJ. All you need to do is make an HTTP request with wt=json, read the output into a buffer, and send it to your client.
--Noble

On Fri, Oct 23, 2009 at 9:40 PM, SGE0 wrote:
>
> Hi All,
>
> After a day of searching I'm quite confused.
>
> I use the solrj client as follows:
>
> CommonsHttpSolrServer solr = new
>     CommonsHttpSolrServer("http://127.0.0.1:8080/apache-solr-1.4-dev/test");
> solr.setRequestWriter(new BinaryRequestWriter());
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("qt", "dismax");
> params.set("indent", "on");
> params.set("version", "2.2");
> params.set("q", "test");
> params.set("start", "0");
> params.set("rows", "10");
> params.set("wt", "xml");
> params.set("hl", "on");
> QueryResponse response = solr.query(params);
>
> How can I get the query result (response) out in XML format?
>
> I know it sounds stupid but I can't seem to manage that.
>
> What do I need to do with the response object to get the response in XML
> format?
>
> I already understood I can't get the result in JSON, so my idea was to go
> from XML to JSON.
>
> Thx for your answer already!
>
> S.
>
> System.out.println("response = " + response);
> SolrDocumentList sdl = response.getResults();

--
Noble Paul | Principal Engineer | AOL | http://aol.com
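A minimal sketch of that approach (the Solr URL is adapted from the quoted code; the /select path and query are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class JsonFetch {
    public static void main(String[] args) throws Exception {
        // Ask Solr for JSON directly; no SolrJ and no XML->JSON conversion needed.
        String q = URLEncoder.encode("test", "UTF-8");
        URL url = new URL("http://127.0.0.1:8080/apache-solr-1.4-dev/test/select?q=" + q + "&wt=json");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        StringBuilder json = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) {
            json.append(line).append('\n');
        }
        in.close();
        System.out.println(json); // exactly what Solr would have sent to a browser
    }
}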
Re: Solrj Javabin and JSON
Hi Paul,

fair enough. Is this included in the Solrj package? Any examples of how to do this?

Stefan

Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> There is no point converting javabin to JSON. Javabin is an
> intermediate format; it is converted to Java objects as soon as
> it arrives. You just need a means to convert those Java objects to JSON.
>
> On Sat, Oct 24, 2009 at 12:10 PM, SGE0 wrote:
>>
>> Hi,
>>
>> did anyone write a Javabin to JSON converter and is willing to share this?
>>
>> In our servlet we use a CommonsHttpSolrServer instance to execute a query.
>>
>> The problem is that it returns Javabin format and we need to send the
>> result back to the browser using JSON format.
>>
>> And no, the browser is not allowed to directly query Lucene with the
>> wt=json format.
>>
>> Regards,
>>
>> S.
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Solrj client API and response in XML format (Solr 1.4)
Hi Paul,

thx again.

Can I use this technique from within a servlet?

Do I need an instance of the HttpClient to do that? I noticed I can instantiate the CommonsHttpSolrServer with an HttpClient client. I did not find any relevant examples of how to use this.

If you can help me out with this, much appreciated.

Stefan

Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> hi
> You don't see the point. You really don't need to use SolrJ. All
> you need to do is make an HTTP request with wt=json, read the output
> into a buffer, and send it to your client.
> --Noble
>
> On Fri, Oct 23, 2009 at 9:40 PM, SGE0 wrote:
>>
>> Hi All,
>>
>> After a day of searching I'm quite confused.
>>
>> I use the solrj client as follows:
>>
>> CommonsHttpSolrServer solr = new
>>     CommonsHttpSolrServer("http://127.0.0.1:8080/apache-solr-1.4-dev/test");
>> solr.setRequestWriter(new BinaryRequestWriter());
>>
>> ModifiableSolrParams params = new ModifiableSolrParams();
>> params.set("qt", "dismax");
>> params.set("indent", "on");
>> params.set("version", "2.2");
>> params.set("q", "test");
>> params.set("start", "0");
>> params.set("rows", "10");
>> params.set("wt", "xml");
>> params.set("hl", "on");
>> QueryResponse response = solr.query(params);
>>
>> How can I get the query result (response) out in XML format?
>>
>> I know it sounds stupid but I can't seem to manage that.
>>
>> What do I need to do with the response object to get the response in XML
>> format?
>>
>> I already understood I can't get the result in JSON, so my idea was to go
>> from XML to JSON.
>>
>> Thx for your answer already!
>>
>> S.
>>
>> System.out.println("response = " + response);
>> SolrDocumentList sdl = response.getResults();
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com
Date Facet Giving Count more than actual
hi guys,

I am indexing events in Solr, where every Event contains a startDate and an endDate. On the search page, I would like to have a date facet where users can quickly browse through the dates they are interested in.

I have a field daysForFilter in each document which stores timestamps from today till endDate as yyyy-MM-ddT00:00:01Z. The reason I have kept the 01 seconds is to avoid overlap between two dates when calculating facets. My application works in the IST time zone, thus the date 2009-10-24 00:00:00 is stored in Solr as 2009-10-23 18:30:00.

I am using date faceting on this field, and the date facet query is something like this:

q=&facet=true&facet.date=daysForFilter&facet.date.start=2009-10-23T18:30:01Z&facet.date.gap=%2B1DAY&facet.date.end=2009-10-28T18:30:01Z

Ideally I should get correct date facets with the count of events occurring on each date. But for some dates I get a count higher than what actually exists in the result. For example, I get 18 documents in total for my query, and the facet count for date 2009-10-23T18:30:01Z is 11, whereas there are only 5 documents containing this field value. I have verified this in the result. Also, when I query for daysForFilter:2009-10-23T18:30:01Z, it gives me 5 results.

I am really at a loss with this problem and do not understand why it is generating such wrong facets. It would be great if anyone can guide me further.

regards,
aakash
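For reference, a minimal SolrJ sketch of the same date-facet request (the Solr URL is illustrative; the field name and dates are taken from the post above), useful for comparing the facet buckets against direct queries:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DateFacetCheck {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ModifiableSolrParams p = new ModifiableSolrParams();
        p.set("q", "*:*"); // the post uses an empty q; *:* keeps the sketch self-contained
        p.set("facet", "true");
        p.set("facet.date", "daysForFilter");
        p.set("facet.date.start", "2009-10-23T18:30:01Z");
        p.set("facet.date.end", "2009-10-28T18:30:01Z");
        p.set("facet.date.gap", "+1DAY");
        QueryResponse rsp = solr.query(p);
        // Raw facet block; compare each bucket count against a direct
        // q=daysForFilter:<date> query to see where the counts diverge.
        System.out.println(rsp.getResponse().get("facet_counts"));
    }
}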
Solr under tomcat - UTF-8 issue
Hoping someone can help -

Problem:
Querying for non-English phrases such as Добавить does not return any results under Tomcat but does work when using the Jetty example.

Both Tomcat and Jetty are being queried by the same custom (Flash) client and both reference the same solr/data/index.

I'm using an HTTP POST rather than an HTTP GET to do the query to Solr. I believe the problem must be in how Tomcat is configured and had hoped that -Dfile.encoding=UTF-8 would solve it - but no luck. I've stopped and started Tomcat and deleted the work directory as well.

Results are the same in both IE6 and Firefox, and I've used both Firebug and Fiddler to view the HTTP requests/responses. It is consistent - Jetty works, Tomcat does not.

Environment:
Tomcat 6 as a service on WinXP Professional 2002 SP2
Tomcat Java properties -

-Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
-Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
-Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
-Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
-Dfile.encoding=UTF-8

Thanks in advance.
Tom Glock
Re: Solr under tomcat - UTF-8 issue
Hello

Have you set the URIEncoding attribute to UTF-8 in Tomcat's server.xml (on the connector element)? Like:

<Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" redirectPort="8443"/>

Hope this helps.

Best regards

czinkos

2009/10/24 Glock, Thomas :
>
> Hoping someone can help -
>
> Problem:
> Querying for non-English phrases such as Добавить does not return any
> results under Tomcat but does work when using the Jetty example.
>
> Both Tomcat and Jetty are being queried by the same custom (Flash)
> client and both reference the same solr/data/index.
>
> I'm using an HTTP POST rather than an HTTP GET to do the query to Solr.
> I believe the problem must be in how Tomcat is configured and had hoped the
> -Dfile.encoding=UTF-8 would solve it - but no luck. I've stopped and started
> Tomcat and deleted the work directory as well.
>
> Results are the same in both IE6 and Firefox, and I've used both
> Firebug and Fiddler to view the HTTP requests/responses. It is consistent -
> Jetty works, Tomcat does not.
>
> Environment:
> Tomcat 6 as a service on WinXP Professional 2002 SP2
> Tomcat Java properties -
>
> -Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
> -Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
> -Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
> -Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
> -Dfile.encoding=UTF-8
>
> Thanks in advance.
> Tom Glock
RE: Solr under tomcat - UTF-8 issue
Thanks, but it's not working...

I did have the URIEncoding in place and just again moved the URIEncoding attribute to be the first attribute - ensured I saved server.xml, shut down Tomcat, deleted logs and cache, and still no luck. It's probably something very simple and I'm just missing it.

Thanks for your help.

-----Original Message-----
From: Zsolt Czinkos [mailto:czin...@gmail.com]
Sent: Saturday, October 24, 2009 11:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Hello

Have you set the URIEncoding attribute to UTF-8 in Tomcat's server.xml (on the connector element)? Like:

<Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" redirectPort="8443"/>

Hope this helps.

Best regards

czinkos

2009/10/24 Glock, Thomas :
>
> Hoping someone can help -
>
> Problem:
> Querying for non-English phrases such as Добавить does not return any
> results under Tomcat but does work when using the Jetty example.
>
> Both Tomcat and Jetty are being queried by the same custom (Flash)
> client and both reference the same solr/data/index.
>
> I'm using an HTTP POST rather than an HTTP GET to do the query to Solr.
> I believe the problem must be in how Tomcat is configured and had hoped the
> -Dfile.encoding=UTF-8 would solve it - but no luck. I've stopped and started
> Tomcat and deleted the work directory as well.
>
> Results are the same in both IE6 and Firefox, and I've used both
> Firebug and Fiddler to view the HTTP requests/responses. It is consistent -
> Jetty works, Tomcat does not.
>
> Environment:
> Tomcat 6 as a service on WinXP Professional 2002 SP2
> Tomcat Java properties -
>
> -Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
> -Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
> -Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
> -Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
> -Dfile.encoding=UTF-8
>
> Thanks in advance.
> Tom Glock
RE: Too many open files
I had an extremely specific use case: about a 5000 documents-per-second update rate (small documents), where some documents can be repeatedly sent to SOLR with a different timestamp field (and the same unique document ID). Nothing breaks, just a great performance gain which was impossible with the 32MB buffer (it caused constant index merging, and 5 times more CPU on merging than on index updates).

Nothing breaks... with mergeFactor=10 I don't have ANY merge during 24 hours; segments are large (a few of 4GB-8GB, and one large "union"); I merge explicitly only at night, when I issue a "commit".

Of course, it depends on the use case; for applications such as a "Content Management System" we don't need a high ramBufferSizeMB (a few updates a day sent to SOLR)...

> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: October-23-09 5:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Too many open files
>
> 8 GB is much larger than is well supported. It's diminishing returns over
> 40-100 and mostly a waste of RAM. Too high and things can break. It
> should be well below 2 GB at most, but I'd still recommend 40-100.
>
> Fuad Efendi wrote:
> > The reason for having a big RAM buffer is lowering the frequency of IndexWriter
> > flushes and (subsequently) lowering the frequency of index merge events, and
> > (subsequently) merging of a few larger files takes less time... especially
> > if the RAM buffer is intelligent enough (and big enough) to deal with 100
> > concurrent updates of an existing document without flushing 100 document
> > versions to disk.
> >
> > I posted a related thread here; I had 1:5 timing for update:merge (5 minutes
> > merge, and 1 minute update) with default SOLR settings (32MB buffer). I
> > increased the buffer to 8GB on the master, and it triggered a significant
> > indexing performance boost...
> >
> > -Fuad
> > http://www.linkedin.com/in/liferay
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmil...@gmail.com]
> >> Sent: October-23-09 3:03 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Too many open files
> >>
> >> I wouldn't use a RAM buffer of a gig - 32-100 is generally a good number.
> >>
> >> Fuad Efendi wrote:
> >>> I was partially wrong; this is what Mike McCandless (Lucene in Action,
> >>> 2nd edition) explained at the Manning forum:
> >>>
> >>> mergeFactor of 1000 means you will have up to 1000 segments at each level.
> >>> A level 0 segment means it was flushed directly by IndexWriter.
> >>> After you have 1000 such segments, they are merged into a single level 1
> >>> segment. Once you have 1000 level 1 segments, they are merged into a
> >>> single level 2 segment, etc.
> >>> So, depending on how many docs you add to your index, you could have
> >>> 1000s of segments w/ mergeFactor=1000.
> >>>
> >>> http://www.manning-sandbox.com/thread.jspa?threadID=33784&tstart=0
> >>>
> >>> So, in case of mergeFactor=100 you may have (theoretically) 1000 segments,
> >>> 10-20 files each (depending on schema)...
> >>>
> >>> mergeFactor=10 is the default setting... ramBufferSizeMB=1024 means that you
> >>> need at least double the Java heap, but you have -Xmx1024m...
> >>>
> >>> -Fuad
> >>>
> >>>> I am getting the "too many open files" error.
> >>>>
> >>>> Usually I test on a server that has 4GB RAM and assigned 1GB for
> >>>> tomcat (set JAVA_OPTS=-Xms256m -Xmx1024m); ulimit -n is 256 for this
> >>>> server, and it has the following settings in solrconfig.xml:
> >>>>
> >>>> <useCompoundFile>true</useCompoundFile>
> >>>> <ramBufferSizeMB>1024</ramBufferSizeMB>
> >>>> <mergeFactor>100</mergeFactor>
> >>>> <maxMergeDocs>2147483647</maxMergeDocs>
> >>>> 1
> >>
> >> --
> >> - Mark
> >>
> >> http://www.lucidimagination.com
>
> --
> - Mark
>
> http://www.lucidimagination.com
Re: Solrj client API and response in XML format (Solr 1.4)
No need to use HttpClient. Use java.net.URL#openConnection(url), read the InputStream into a buffer, and that is it.

On Sat, Oct 24, 2009 at 1:53 PM, SGE0 wrote:
>
> Hi Paul,
>
> thx again.
>
> Can I use this technique from within a servlet?
>
> Do I need an instance of the HttpClient to do that?
>
> I noticed I can instantiate the CommonsHttpSolrServer with an HttpClient
> client. I did not find any relevant examples of how to use this.
>
> If you can help me out with this, much appreciated.
>
> Stefan
>
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>>
>> hi
>> You don't see the point. You really don't need to use SolrJ. All
>> you need to do is make an HTTP request with wt=json, read the output
>> into a buffer, and send it to your client.
>> --Noble
>>
>> On Fri, Oct 23, 2009 at 9:40 PM, SGE0 wrote:
>>>
>>> Hi All,
>>>
>>> After a day of searching I'm quite confused.
>>>
>>> I use the solrj client as follows:
>>>
>>> CommonsHttpSolrServer solr = new
>>>     CommonsHttpSolrServer("http://127.0.0.1:8080/apache-solr-1.4-dev/test");
>>> solr.setRequestWriter(new BinaryRequestWriter());
>>>
>>> ModifiableSolrParams params = new ModifiableSolrParams();
>>> params.set("qt", "dismax");
>>> params.set("indent", "on");
>>> params.set("version", "2.2");
>>> params.set("q", "test");
>>> params.set("start", "0");
>>> params.set("rows", "10");
>>> params.set("wt", "xml");
>>> params.set("hl", "on");
>>> QueryResponse response = solr.query(params);
>>>
>>> How can I get the query result (response) out in XML format?
>>>
>>> I know it sounds stupid but I can't seem to manage that.
>>>
>>> What do I need to do with the response object to get the response in XML
>>> format?
>>>
>>> I already understood I can't get the result in JSON, so my idea was to go
>>> from XML to JSON.
>>>
>>> Thx for your answer already!
>>>
>>> S.
>>>
>>> System.out.println("response = " + response);
>>> SolrDocumentList sdl = response.getResults();
>>
>> --
>> Noble Paul | Principal Engineer | AOL | http://aol.com

--
Noble Paul | Principal Engineer | AOL | http://aol.com
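A minimal sketch of that inside a servlet (the class name and Solr URL are illustrative; error handling is omitted):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLEncoder;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SearchProxyServlet extends HttpServlet {

    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String q = URLEncoder.encode(req.getParameter("q"), "UTF-8");
        URL solr = new URL("http://127.0.0.1:8080/apache-solr-1.4-dev/test/select?wt=json&q=" + q);
        resp.setContentType("application/json; charset=UTF-8");
        InputStream in = solr.openConnection().getInputStream();
        OutputStream out = resp.getOutputStream();
        byte[] buf = new byte[4096];
        // Stream Solr's JSON straight through to the browser.
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        in.close();
    }
}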
RE: Too many open files
Thanks for pointing to it, but it is so obvious:

1. The "buffer" is used as RAM storage for index updates.
2. An "int" has 2 x 2G different values (2^32).
3. We can have _up_to_ 2GB of _Documents_ (stored as key->value pairs, inverted index).

In case of the 5 fields which I have, I need 5 arrays (up to 2GB in size each) to store inverted pointers, so that there is no theoretical limit:

> Also, from the javadoc in IndexWriter:
>
>    * NOTE: because IndexWriter uses
>    * ints when managing its internal storage,
>    * the absolute maximum value for this setting is somewhat
>    * less than 2048 MB. The precise limit depends on
>    * various factors, such as how large your documents are,
>    * how many fields have norms, etc., so it's best to set
>    * this value comfortably under 2048.

Note also, I use norms etc...
RE: Too many open files
Mark,

I don't understand this; of course it is use case specific, but I haven't seen any terrible behaviour with 8GB... 32MB is extremely small for Nutch-SOLR-like applications, but it is acceptable for Liferay-SOLR...

Please note also, I have some documents with the same IDs updated many thousands of times a day, and I believe (I hope) IndexWriter flushes an "optimized" segment instead of thousands of "deletes" and a single "insert" across many small (32MB) files (especially with SOLR)...

> Hmm - came out worse than it looked. Here is a better attempt:
>
> MergeFactor: 10
>
> BUF     DOCS/S
> 32      37.40
> 80      39.91
> 120     40.74
> 512     38.25
>
> Mark Miller wrote:
> > Here is an example using the Lucene benchmark package. Indexing 64,000
> > wikipedia docs (sorry for the formatting):
> >
> > [java] Report sum by Prefix (MAddDocs) and Round (4 about 32 out of 256058)
> > [java] Operation      round  mrg  flush     runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem  avgTotalMem
> > [java] MAddDocs_8000      0   10   32.00MB       8        8000  37.40    1,711.22  124,612,472  182,689,792
> > [java] MAddDocs_8000 -    1   10   80.00MB -     8 -       8000 - 39.91 - 1,603.76 - 266,716,128 - 469,925,888
> > [java] MAddDocs_8000      2   10  120.00MB       8        8000  40.74    1,571.02  348,059,488  548,233,216
> > [java] MAddDocs_8000 -    3   10  512.00MB -     8 -       8000 - 38.25 - 1,673.05 - 746,087,808 - 926,089,216
> >
> > After about 32-40, you don't gain much, and it starts decreasing once
> > you get too high. 8GB is a terrible recommendation.
RE: Too many open files
This JavaDoc is incorrect, especially for SOLR, when you store a raw (non-tokenized, non-indexed) "text" value with a document (which almost everyone does). Try to store 1,000,000 documents with a 1000-byte non-tokenized field: you will need 1GB just for this array.

> -----Original Message-----
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: October-24-09 12:10 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Too many open files
>
> Thanks for pointing to it, but it is so obvious:
>
> 1. The "buffer" is used as RAM storage for index updates.
> 2. An "int" has 2 x 2G different values (2^32).
> 3. We can have _up_to_ 2GB of _Documents_ (stored as key->value pairs,
> inverted index).
>
> In case of the 5 fields which I have, I need 5 arrays (up to 2GB in size
> each) to store inverted pointers, so that there is no theoretical limit:
>
> > Also, from the javadoc in IndexWriter:
> >
> >    * NOTE: because IndexWriter uses
> >    * ints when managing its internal storage,
> >    * the absolute maximum value for this setting is somewhat
> >    * less than 2048 MB. The precise limit depends on
> >    * various factors, such as how large your documents are,
> >    * how many fields have norms, etc., so it's best to set
> >    * this value comfortably under 2048.
>
> Note also, I use norms etc...
Re: Too many open files
On Sat, Oct 24, 2009 at 12:18 PM, Fuad Efendi wrote:
>
> Mark, I don't understand this; of course it is use case specific, but I haven't
> seen any terrible behaviour with 8GB

If you had gone over 2GB of actual buffer *usage*, it would have broken... Guaranteed. We've now added a check in Lucene 2.9.1 that will throw an exception if you try to go over 2048MB. And as the javadoc says, to be on the safe side, you probably shouldn't go too near 2048 - perhaps 2000MB is a good practical limit.

-Yonik
http://www.lucidimagination.com
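For reference, a minimal Lucene 2.9 sketch of where that setting lives; the commented-out line shows the kind of value the new 2.9.1 check rejects (the demo class and the RAMDirectory are illustrative):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class RamBufferDemo {
    public static void main(String[] args) throws IOException {
        IndexWriter w = new IndexWriter(new RAMDirectory(),
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        w.setRAMBufferSizeMB(100);    // within the 32-100 range recommended earlier in the thread
        // w.setRAMBufferSizeMB(4096); // over the limit; rejected by the 2.9.1 check
        w.close();
    }
}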
Re: Solr under tomcat - UTF-8 issue
Try using example/exampledocs/test_utf8.sh to narrow down whether the charset problems you're hitting are due to servlet container configuration.

-Yonik
http://www.lucidimagination.com

2009/10/24 Glock, Thomas :
>
> Thanks, but it's not working...
>
> I did have the URIEncoding in place and just again moved the URIEncoding
> attribute to be the first attribute - ensured I saved server.xml, shut down
> Tomcat, deleted logs and cache, and still no luck. It's probably something
> very simple and I'm just missing it.
>
> Thanks for your help.
>
> -----Original Message-----
> From: Zsolt Czinkos [mailto:czin...@gmail.com]
> Sent: Saturday, October 24, 2009 11:36 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr under tomcat - UTF-8 issue
>
> Hello
>
> Have you set the URIEncoding attribute to UTF-8 in Tomcat's server.xml (on
> the connector element)? Like:
>
> <Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" redirectPort="8443"/>
>
> Hope this helps.
>
> Best regards
>
> czinkos
>
> 2009/10/24 Glock, Thomas :
>>
>> Hoping someone can help -
>>
>> Problem:
>> Querying for non-English phrases such as Добавить does not return any
>> results under Tomcat but does work when using the Jetty example.
>>
>> Both Tomcat and Jetty are being queried by the same custom (Flash)
>> client and both reference the same solr/data/index.
>>
>> I'm using an HTTP POST rather than an HTTP GET to do the query to Solr.
>> I believe the problem must be in how Tomcat is configured and had hoped the
>> -Dfile.encoding=UTF-8 would solve it - but no luck. I've stopped and started
>> Tomcat and deleted the work directory as well.
>>
>> Results are the same in both IE6 and Firefox, and I've used both
>> Firebug and Fiddler to view the HTTP requests/responses. It is consistent -
>> Jetty works, Tomcat does not.
>>
>> Environment:
>> Tomcat 6 as a service on WinXP Professional 2002 SP2
>> Tomcat Java properties -
>>
>> -Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
>> -Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
>> -Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
>> -Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
>> -Dfile.encoding=UTF-8
>>
>> Thanks in advance.
>> Tom Glock
Re: Too many open files
On Sat, Oct 24, 2009 at 12:25 PM, Fuad Efendi wrote:
> This JavaDoc is incorrect, especially for SOLR,

It looks correct to me... if you think it can be clarified, please propose how you would change it.

> when you store a raw (non-tokenized, non-indexed) "text" value with a
> document (which almost everyone does). Try to store 1,000,000 documents
> with a 1000-byte non-tokenized field: you will need 1GB just for this array.

Nope. You shouldn't even need 1GB of buffer space for that. The size specified is for all the things that the indexing process needs to temporarily keep in memory... stored fields are normally written to disk immediately.

-Yonik
http://www.lucidimagination.com
RE: Solr under tomcat - UTF-8 issue
Thanks - I now think it must be due to my client not sending enough (or correct) headers in the request.

Tomcat does work when using an HTTP GET but is failing the POST from my Flash client.

For example, putting this in both the Firefox and IE browser URL bars works correctly:

http://localhost:8080/hranswers/elevate?fl=*%20score&indent=on&start=0&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2&fq=language_cd:ru&rows=20

The POST information my client is sending looks like this, and it fails:

POST /hranswers/elevate HTTP/1.1
Accept: */*
Accept-Language: en-US
x-flash-version: 10,0,32,18
Content-Type: application/x-www-form-urlencoded
Content-Encoding: UTF-8
Content-Length: 209
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; UserABC123)
Host: localhost:8080
Connection: Keep-Alive
Pragma: no-cache

fq=language%5Fcd%3Aru&rows=20&start=0&fl=%2A%20score&indent=on&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2

I will keep digging - and let you know how it turns out.

Thanks!

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Saturday, October 24, 2009 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Try using example/exampledocs/test_utf8.sh to narrow down whether the charset problems you're hitting are due to servlet container configuration.

-Yonik
http://www.lucidimagination.com

2009/10/24 Glock, Thomas :
>
> Thanks, but it's not working...
>
> I did have the URIEncoding in place and just again moved the URIEncoding
> attribute to be the first attribute - ensured I saved server.xml, shut down
> Tomcat, deleted logs and cache, and still no luck. It's probably something
> very simple and I'm just missing it.
>
> Thanks for your help.
>
> -----Original Message-----
> From: Zsolt Czinkos [mailto:czin...@gmail.com]
> Sent: Saturday, October 24, 2009 11:36 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr under tomcat - UTF-8 issue
>
> Hello
>
> Have you set the URIEncoding attribute to UTF-8 in Tomcat's server.xml (on
> the connector element)? Like:
>
> <Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" redirectPort="8443"/>
>
> Hope this helps.
>
> Best regards
>
> czinkos
>
> 2009/10/24 Glock, Thomas :
>>
>> Hoping someone can help -
>>
>> Problem:
>> Querying for non-English phrases such as Добавить does not return any
>> results under Tomcat but does work when using the Jetty example.
>>
>> Both Tomcat and Jetty are being queried by the same custom (Flash)
>> client and both reference the same solr/data/index.
>>
>> I'm using an HTTP POST rather than an HTTP GET to do the query to Solr.
>> I believe the problem must be in how Tomcat is configured and had hoped the
>> -Dfile.encoding=UTF-8 would solve it - but no luck. I've stopped and started
>> Tomcat and deleted the work directory as well.
>>
>> Results are the same in both IE6 and Firefox, and I've used both
>> Firebug and Fiddler to view the HTTP requests/responses. It is consistent -
>> Jetty works, Tomcat does not.
>>
>> Environment:
>> Tomcat 6 as a service on WinXP Professional 2002 SP2
>> Tomcat Java properties -
>>
>> -Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
>> -Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
>> -Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
>> -Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
>> -Dfile.encoding=UTF-8
>>
>> Thanks in advance.
>> Tom Glock
RE: Too many open files
Hi Yonik,

I am still using pre-2.9 Lucene (taken from the SOLR trunk two months ago).

2048 is the limit for documents, not for the array of pointers to documents. And especially with the new "uninverted" SOLR features, plus non-tokenized stored fields, we need 1GB just to store 1M values of a simple field (field size: 1000 bytes).

Maybe it would break... frankly, I started with 8GB, then for some reason I set it to 2GB (a month ago), I don't remember why... I had hardware problems and I didn't want frequent loss of the RAM buffer...

But again: why would it break? Because "int" has 2048M different values?!! This is extremely strange. My understanding is that the "buffer" stores processed data such as "term -> document_id" values in per-field arrays(!), so that 2048M is the _absolute_maximum_ in the case where your SOLR schema consists of a _single_tokenized_field_only_. What about 10 fields? What about plain text stored with the document, term vectors, "uninverted" values??? What are the reasons for putting such a check in Lucene? Array overflow?

-Fuad
http://www.linkedin.com/in/liferay

> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: October-24-09 12:27 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Too many open files
>
> On Sat, Oct 24, 2009 at 12:18 PM, Fuad Efendi wrote:
> >
> > Mark, I don't understand this; of course it is use case specific, but I haven't
> > seen any terrible behaviour with 8GB
>
> If you had gone over 2GB of actual buffer *usage*, it would have
> broken... Guaranteed.
> We've now added a check in Lucene 2.9.1 that will throw an exception
> if you try to go over 2048MB.
> And as the javadoc says, to be on the safe side, you probably
> shouldn't go too near 2048 - perhaps 2000MB is a good practical limit.
>
> -Yonik
> http://www.lucidimagination.com
RE: Too many open files
> > when you store a raw (non-tokenized, non-indexed) "text" value with a
> > document (which almost everyone does). Try to store 1,000,000 documents
> > with a 1000-byte non-tokenized field: you will need 1GB just for this array.
>
> Nope. You shouldn't even need 1GB of buffer space for that.
> The size specified is for all the things that the indexing process needs
> to temporarily keep in memory... stored fields are normally
> written to disk immediately.
>
> -Yonik
> http://www.lucidimagination.com

Ok, thanks for the clarification!

What about term vectors? What about a non-trivial schema having 10 tokenized fields? The buffer will need 10 arrays (up to 2048M each) for that. My understanding is probably very naive...

-Fuad
http://www.linkedin.com/in/liferay
Re: Solr under tomcat - UTF-8 issue
Don't use POST. That is the wrong HTTP semantic for search results. Use GET. That will make it possible to cache the results, will make your HTTP logs useful, and all sorts of other good things.

wunder

On Oct 24, 2009, at 10:11 AM, Glock, Thomas wrote:

Thanks - I now think it must be due to my client not sending enough (or correct) headers in the request.

Tomcat does work when using an HTTP GET but is failing the POST from my Flash client.

For example, putting this in both the Firefox and IE browser URL bars works correctly:

http://localhost:8080/hranswers/elevate?fl=*%20score&indent=on&start=0&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2&fq=language_cd:ru&rows=20

The POST information my client is sending looks like this, and it fails:

POST /hranswers/elevate HTTP/1.1
Accept: */*
Accept-Language: en-US
x-flash-version: 10,0,32,18
Content-Type: application/x-www-form-urlencoded
Content-Encoding: UTF-8
Content-Length: 209
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; UserABC123)
Host: localhost:8080
Connection: Keep-Alive
Pragma: no-cache

fq=language%5Fcd%3Aru&rows=20&start=0&fl=%2A%20score&indent=on&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2

I will keep digging - and let you know how it turns out.

Thanks!

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Saturday, October 24, 2009 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Try using example/exampledocs/test_utf8.sh to narrow down whether the charset problems you're hitting are due to servlet container configuration.

-Yonik
http://www.lucidimagination.com

2009/10/24 Glock, Thomas :

Thanks, but it's not working...

I did have the URIEncoding in place and just again moved the URIEncoding attribute to be the first attribute - ensured I saved server.xml, shut down Tomcat, deleted logs and cache, and still no luck. It's probably something very simple and I'm just missing it.

Thanks for your help.

-----Original Message-----
From: Zsolt Czinkos [mailto:czin...@gmail.com]
Sent: Saturday, October 24, 2009 11:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Hello

Have you set the URIEncoding attribute to UTF-8 in Tomcat's server.xml (on the connector element)? Like:

<Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" redirectPort="8443"/>

Hope this helps.

Best regards

czinkos

2009/10/24 Glock, Thomas :

Hoping someone can help -

Problem:
Querying for non-English phrases such as Добавить does not return any results under Tomcat but does work when using the Jetty example.

Both Tomcat and Jetty are being queried by the same custom (Flash) client and both reference the same solr/data/index.

I'm using an HTTP POST rather than an HTTP GET to do the query to Solr. I believe the problem must be in how Tomcat is configured and had hoped that -Dfile.encoding=UTF-8 would solve it - but no luck. I've stopped and started Tomcat and deleted the work directory as well.

Results are the same in both IE6 and Firefox, and I've used both Firebug and Fiddler to view the HTTP requests/responses. It is consistent - Jetty works, Tomcat does not.

Environment:
Tomcat 6 as a service on WinXP Professional 2002 SP2
Tomcat Java properties -

-Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 6.0
-Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 6.0
-Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 6.0\endorsed
-Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 6.0\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\logging.properties
-Dfile.encoding=UTF-8

Thanks in advance.
Tom Glock
RE: Solr under tomcat - UTF-8 issue
Thanks - I agree. However, my application requires results to be trimmed to users based on roles. The roles are repeating values on the documents. Users have many different role combinations, as do documents. I recognize this is going to hamper caching - but using a GET will tend to limit the size of search phrases when combined with the boolean role clause, and I am concerned about hitting URL length limits.

At any rate, I solved it thanks to Yonik's recommendation. My Flex client HTTPService by default only sets the content-type request header to "application/x-www-form-urlencoded"; what it needed to do for Tomcat is set the content-type request header to

content-type = "application/x-www-form-urlencoded; charset=UTF-8";

If you have any suggestions regarding limiting results based on user and document role permutations - I'm all ears. I've been to the Search Summit in NYC and no vendor could even seem to grasp the concept. The problem case statement is this - I have users globally who need to search for content tailored to them. Users searching for 'Holiday' don't get any value from 1 documents having the word holiday. What they need are documents authored for that population. The documents have the associated role information as metadata, and therefore users will get only the documents they have access to and that are relevant to them. That's the plan anyway!

By chance I stumbled on Solr a month or so ago and I think it's awesome. I got the book two days ago too - fantastic!

Thanks again,
Tom

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Saturday, October 24, 2009 1:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr under tomcat - UTF-8 issue

Don't use POST. That is the wrong HTTP semantic for search results. Use GET. That will make it possible to cache the results, will make your HTTP logs useful, and all sorts of other good things.

wunder

On Oct 24, 2009, at 10:11 AM, Glock, Thomas wrote:

> Thanks - I now think it must be due to my client not sending enough (or
> correct) headers in the request.
>
> Tomcat does work when using an HTTP GET but is failing the POST from my
> Flash client.
>
> For example, putting this in both the Firefox and IE browser URL bars
> works correctly:
>
> http://localhost:8080/hranswers/elevate?fl=*%20score&indent=on&start=0&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2&fq=language_cd:ru&rows=20
>
> The POST information my client is sending looks like this, and it fails:
>
> POST /hranswers/elevate HTTP/1.1
> Accept: */*
> Accept-Language: en-US
> x-flash-version: 10,0,32,18
> Content-Type: application/x-www-form-urlencoded
> Content-Encoding: UTF-8
> Content-Length: 209
> Accept-Encoding: gzip, deflate
> User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; UserABC123)
> Host: localhost:8080
> Connection: Keep-Alive
> Pragma: no-cache
>
> fq=language%5Fcd%3Aru&rows=20&start=0&fl=%2A%20score&indent=on&q=%D0%94%D0%BE%D0%B1%D0%B0%D0%B2%D0%B8%D1%82%D1%8C%20%D0%BD%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BA%D0%B0%D0%BD%D0%B4%D0%B8%D0%B4%D0%B0%D1%82%D0%BE%D0%B2
>
> I will keep digging - and let you know how it turns out.
>
> Thanks!
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Saturday, October 24, 2009 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr under tomcat - UTF-8 issue
>
> Try using example/exampledocs/test_utf8.sh to narrow down whether the
> charset problems you're hitting are due to servlet container configuration.
>
> -Yonik
> http://www.lucidimagination.com
>
> 2009/10/24 Glock, Thomas :
>>
>> Thanks, but it's not working...
>>
>> I did have the URIEncoding in place and just again moved the URIEncoding
>> attribute to be the first attribute - ensured I saved server.xml, shut
>> down Tomcat, deleted logs and cache, and still no luck. It's probably
>> something very simple and I'm just missing it.
>>
>> Thanks for your help.
>>
>> -----Original Message-----
>> From: Zsolt Czinkos [mailto:czin...@gmail.com]
>> Sent: Saturday, October 24, 2009 11:36 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr under tomcat - UTF-8 issue
>>
>> Hello
>>
>> Have you set the URIEncoding attribute to UTF-8 in Tomcat's server.xml
>> (on the connector element)? Like:
>>
>> <Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" redirectPort="8443"/>
>>
>> Hope this helps.
>>
>> Best regards
>>
>> czinkos
>>
>> 2009/10/24 Glock, Thomas :
>>>
>>> Hoping someone can help -
>>>
>>> Problem:
>>> Querying for non-English phrases such as Добавить does not return any
>>> results under Tomcat but does work when using the Jetty example.
>>>
>>> Both Tomcat and Jetty are being queried by the same custom (Flash)
>>> client and both reference the same solr/data/index.
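For comparison, the equivalent fix in a plain java.net client - the charset parameter on the Content-Type header is the part that matters (the URL and parameters mirror the thread; the class itself is illustrative):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class PostEncodingDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/hranswers/elevate");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Without "; charset=UTF-8" the servlet container falls back to
        // ISO-8859-1 when decoding the form body.
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        String body = "fq=" + URLEncoder.encode("language_cd:ru", "UTF-8")
                + "&rows=20&q=" + URLEncoder.encode("Добавить", "UTF-8");
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}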
RE: Too many open files
> If you had gone over 2GB of actual buffer *usage*, it would have
> broken... Guaranteed.
> We've now added a check in Lucene 2.9.1 that will throw an exception
> if you try to go over 2048MB.
> And as the javadoc says, to be on the safe side, you probably
> shouldn't go too near 2048 - perhaps 2000MB is a good practical limit.

I browsed http://issues.apache.org/jira/browse/LUCENE-1995 and
http://search.lucidimagination.com/search/document/f29fc52348ab9b63/arrayindexoutofboundsexception_during_indexing
- it is not a proof of concept. It is a workaround. The problem still exists, and the scenario is unclear.

-Fuad
http://www.linkedin.com/in/liferay
RE: StreamingUpdateSolrServer - indexing process stops in a couple of hours
I am using Java 1.6.0_05.

To illustrate what is happening, I wrote this test program that has 10 threads adding collections of documents and one thread optimizing the index every 10 sec. I am seeing that after the first optimize there is only one thread that keeps adding documents. The other ones are locked. In the real code I ended up adding synchronized around add and optimize to avoid this.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.embedded.JettySolrRunner;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public static void main(String[] args) {
    final JettySolrRunner jetty = new JettySolrRunner("/solr", 8983);
    try {
        jetty.start();

        // setup the server...
        String url = "http://localhost:8983/solr";
        final StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(url, 2, 5) {
            @Override
            public void handleError(Throwable ex) {
                // do something...
            }
        };
        server.setConnectionTimeout(1000);
        server.setDefaultMaxConnectionsPerHost(100);
        server.setMaxTotalConnections(100);

        // 10 writer threads, each adding batches of 50 documents in a loop
        int i = 0;
        while (i++ < 10) {
            new Thread("add-thread" + i) {
                public void run() {
                    int j = 0;
                    while (true) {
                        try {
                            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
                            for (int n = 0; n < 50; n++) {
                                SolrInputDocument doc = new SolrInputDocument();
                                String docID = this.getName() + "_doc_" + j++;
                                doc.addField("id", docID);
                                doc.addField("content", "document_" + docID);
                                docs.add(doc);
                            }
                            server.add(docs);
                            System.out.println(this.getName() + " added " + docs.size() + " documents");
                            Thread.sleep(100);
                        } catch (Exception e) {
                            e.printStackTrace();
                            System.err.println(this.getName() + " " + e.getLocalizedMessage());
                            System.exit(0);
                        }
                    }
                }
            }.start();
        }

        // one thread optimizing the index every 10 seconds
        new Thread("optimizer-thread") {
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(10000);
                        server.optimize();
                        System.out.println(this.getName() + " optimized");
                    } catch (Exception e) {
                        e.printStackTrace();
                        System.err.println("optimizer " + e.getLocalizedMessage());
                        System.exit(0);
                    }
                }
            }
        }.start();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, October 13, 2009 8:59 PM
To: solr-user@lucene.apache.org
Subject: Re: StreamingUpdateSolrServer - indexing process stops in a couple of hours

Which Java release is this? There are known thread-blocking problems in Java 1.5.

Also, what sockets are used during this time? Try 'netstat -s | fgrep 8983' (or your Solr URL port #) and watch the active, TIME_WAIT, CLOSE_WAIT sockets build up. This may give a hint.

On Tue, Oct 13, 2009 at 8:47 AM, Dadasheva, Olga wrote:
> Hi,
>
> I am indexing documents using StreamingUpdateSolrServer. My 'setup'
> code is almost a copy of the junit test of the Solr trunk.
>
> try {
>     StreamingUpdateSolrServer streamingServer = new
>         StreamingUpdateSolrServer( url, 2, 5 ) {
>         @Override
>         public void handleError(Throwable ex) {
>             System.out.println(" new StreamingUpdateSolrServer error "+ex);
>             mail.send(new Date()+"StreamingUpdateSolrServer error. "+ex);
>         }
>     };
>     streamingServer.setConnectionTimeout(20*6