Re: ERROR on posting update request using CURL in php
Hi,

Basically I need to post something like this using curl in PHP. The example explained in the earlier thread was:

curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary 'testdoc'

Do we need to create a temp file and use a PUT, or can we do it directly with a POST?

Regards,
Naveen
FW: SolrCloud App Unit Testing
Hi,

I am writing a Solr application. Can anyone please let me know how to unit test it? I see there is a MiniSolrCloudCluster class available in Solr, but I am confused about how to use it for unit testing. How should I create an embedded server for unit testing?

Thanks,
Naveen
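A minimal sketch of what a JUnit setup could look like, assuming the Solr 5.x-era six-argument MiniSolrCloudCluster constructor (the class ships in the solr-test-framework artifact); the class name, paths and directory locations below are placeholders, not anything from the original post:

import java.io.File;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.cloud.MiniSolrCloudCluster;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class MySolrCloudTest {

    private MiniSolrCloudCluster cluster;
    private CloudSolrClient client;

    @Before
    public void setUp() throws Exception {
        File baseDir = new File("target/solr-test");             // scratch dir for the embedded cluster
        baseDir.mkdirs();
        File solrXml = new File("src/test/resources/solr.xml");  // the same solr.xml you would ship
        // 1 Jetty node, default host context, no extra servlets or filters
        cluster = new MiniSolrCloudCluster(1, null, baseDir, solrXml, null, null);
        // talk to the embedded cluster exactly like a real one, via its ZooKeeper address
        client = new CloudSolrClient(cluster.getZkServer().getZkAddress());
    }

    @Test
    public void testSomething() throws Exception {
        // upload a configset and create a collection here (e.g. via CollectionAdminRequest),
        // then index documents with 'client' and assert on query results
    }

    @After
    public void tearDown() throws Exception {
        client.close();
        cluster.shutdown();
    }
}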
Solr MiniSolrCloudCluster Issue
Hi,

I am using the MiniSolrCloudCluster class to write unit tests for our Solr application. It looks like there is an HttpClient library mismatch with the Solr version, and I am getting the error below:

java.lang.VerifyError: Bad return type
Exception Details:
  Location:
    org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient; @57: areturn
  Reason:
    Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttp

I am using Solr 5.3.1. I see a similar issue here https://issues.apache.org/jira/browse/SOLR-7948 but there doesn't seem to be a workaround for it. Can anyone please tell me how to fix this issue?

Below is the code snippet:

dataDir = tempFolder.newFolder();
File solrXml = new File("src/test/resources/solr.xml");
MiniSolrCloudCluster cluster = new MiniSolrCloudCluster(1, null, workingDir, solrXml, null, null);

Thanks.
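For anyone hitting the same VerifyError: it usually means an older Apache HttpClient (pre-4.3, where SystemDefaultHttpClient did not yet extend CloseableHttpClient) is winning on the test classpath. A hedged sketch of the kind of dependency pinning that addresses it, assuming a Maven build and the HttpClient line that Solr 5.3.x ships (4.4.x); check mvn dependency:tree first to see which transitive dependency is pulling in the old version:

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.4.1</version>
</dependency>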
Question about CloudSolrServer
Hi,

I am trying to migrate from HttpSolrServer to CloudSolrServer and am getting the following exception while adding docs using CloudSolrServer:

org.apache.solr.common.SolrException: Unknown document router '{name=compositeId}'
  at org.apache.solr.common.cloud.DocRouter.getDocRouter(DocRouter.java:46)

whereas my clusterstate JSON says:

"maxShardsPerNode":"1",
"router":{"name":"compositeId"},
"replicationFactor":"1"

Please advise.

PS: I'm using Solr 4.10.4.

Thanks,
Naveen.
Re: Question about CloudSolrServer
Thanks Shawn. I was using an older version of SolrJ; upgrading to a newer version worked. Thank you.

On Thu, Jun 9, 2016 at 11:41 AM, Shawn Heisey wrote:
> On 6/8/2016 11:44 PM, Naveen Pajjuri wrote:
> > Trying to migrate from HttpSolrServer to CloudSolrServer. getting the
> > following exception while adding docs using CloudSolrServer.
> >
> > org.apache.solr.common.SolrException: Unknown document router
> > '{name=compositeId}'
> >
> > at org.apache.solr.common.cloud.DocRouter.getDocRouter(DocRouter.java:46)
> >
> > whereas my clusterstate json says --
> >
> > "maxShardsPerNode":"1",
> > "router":{"name":"compositeId"},
> > "replicationFactor":"1".
>
> I am guessing that you are using a much older version of SolrJ than the
> Solr version it is talking to. The '{"name":"compositeId"}' structure
> appears to be the way that newer versions of Solr record the router in
> zookeeper, which is something that the older versions of SolrJ will not
> know how to handle.
>
> Mixing different versions of Solr and SolrJ will generally work very well,
> as long as you're not using the cloud client. That client is so tightly
> coupled to SolrCloud internals that it does not work well with a large
> version difference, especially if the client is older than the server.
>
> Most likely you'll need to upgrade your SolrJ version. At the same
> time, switching to CloudSolrClient is probably a good idea -- the class
> names that end in Server are deprecated in 5.x and gone in 6.x.
>
> Thanks,
> Shawn
Boosting exact match fields.
Hi,

I have documents with a field (the data type definition for that field is below) whose values are things like "ear phones", "sony ear phones", "philips ear phones". When I query for "earphones", "sony ear phones" is the top result, whereas I want "ear phones" as the top result. Please suggest how to boost exact matches.

PS: I have earphones => ear phones in my synonyms.txt, and the data type definition for that field ("keywords") is:

Regards,
Naveen
CloudSolrServer with multiple zookeeper cluster setup.
Hi,

In production we have a SolrCloud setup with a ZooKeeper ensemble. I want to move from HttpSolrServer to CloudSolrServer. Is there any way to specify all the IP addresses of the ZooKeeper machines while instantiating CloudSolrServer, so that I get an automatic fallback mechanism?

PS: Right now I'm instantiating CloudSolrServer with the IP of just one of the ZooKeeper machines in the ensemble, but if ZooKeeper on that machine dies, my production systems may break.

Thanks,
Naveen.
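For what it's worth, the zkHost argument to CloudSolrServer accepts a comma-separated list of ZooKeeper nodes (with an optional chroot appended after the last host), so the client keeps working as long as a quorum is reachable. A minimal sketch; the hostnames and collection name are placeholders:

// list every member of the ZooKeeper ensemble, not just one node
String zkHosts = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181";
CloudSolrServer server = new CloudSolrServer(zkHosts);
server.setDefaultCollection("mycollection");  // placeholder collection name
server.connect();                             // only fails if no ZooKeeper in the list is reachable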
Sorting in solr
Hi,

If I apply a sort order in Solr, when are the documents sorted?

1. Are the documents sorted after the results are fetched?
2. Or do we get the documents back already sorted?

Regards,
Naveen
CloudSolrServer instead of httpSolrServer
Hi,

While sending updates to SolrCloud, I currently send them directly to one randomly chosen node in my cloud using HttpSolrServer. If I use CloudSolrServer (passing the ZooKeeper addresses) instead of HttpSolrServer, can I expect any improvement in performance?

My basic question is how updates propagate when I send them directly to one of the nodes with HttpSolrServer in cloud mode:

- will the update be forwarded straight to the leader?
- or will it be sent from node to node until it finds the leader?

Thank you.
custom field types in solr 6.1.0
Hi,

I'm trying to move from 4.10.4 to 6.1.0. I want to define and use custom field types, but I read that it's not advisable to edit the managed-schema file by hand. How do I create custom field types?

Thanks in advance,
Naveen Reddy
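One route that avoids hand-editing managed-schema is the Schema API, which writes the change into the managed schema for you. A sketch of an add-field-type call; the field type name, analyzer chain and collection name here are only placeholders:

curl -X POST -H 'Content-type:application/json' http://localhost:8983/solr/mycollection/schema --data-binary '{
  "add-field-type": {
    "name": "text_custom",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "analyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.LowerCaseFilterFactory" }
      ]
    }
  }
}'

Alternatively, Solr 6 still supports switching the collection back to ClassicIndexSchemaFactory in solrconfig.xml and maintaining a hand-edited schema.xml, if that workflow is preferred.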
Issue faced while re-starting solr 6.1.0 after cleaning zk data.
Hi,

I'm trying to move to solr-6.1.0. It was working fine, then I cleaned up the ZooKeeper data (version folder) and restarted Solr and ZooKeeper. Now I am getting this error:

sample_shard1_replica1: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper: sample

Please let me know what I'm missing.

Regards,
Naveen Reddy.
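That error generally just means the configset named "sample" is no longer present in ZooKeeper after the wipe, so it has to be re-uploaded before the collection can load. A sketch of one way to do that with the bin/solr script shipped with 6.1; the ZooKeeper address and the path to the config directory are placeholders:

bin/solr zk upconfig -z localhost:2181 -n sample -d /path/to/sample/conf

# then restart Solr (or reload the collection) so sample_shard1_replica1 can find its config again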
Re: Issue faced while re-starting solr 6.1.0 after cleaning zk data.
Here "sample" is the name of my collection.

Thanks

On Sun, Aug 7, 2016 at 3:10 PM, Naveen Pajjuri wrote:
> Hi,
> I'm trying to move to solr-6.1.0. It was working fine, then I cleaned up the
> ZooKeeper data (version folder) and restarted Solr and ZooKeeper. I started
> getting this error:
>
> sample_shard1_replica1: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
> Specified config does not exist in ZooKeeper: sample.
>
> Please let me know what I'm missing.
>
> Regards,
> Naveen Reddy.
How to exclude stop words in spellcheck collations
Hi,

Is there any way I can exclude stop words from the collations and suggestions produced by the spellcheck component?

Regards,
Naveen Pajjuri.
tika and solr 3.1 integration
Hi,

I am trying to integrate Solr 3.1 and Tika (the version that ships with it), and while indexing a few documents with a curl command I am getting an error: the field attr_meta is unknown. I checked solrconfig.xml and it looks fine to me. Can you please tell me what I am missing?

I copied all the jars from contrib/extraction/lib to the solr/lib folder that sits next to conf.

I am using the request handler that comes with the default configuration:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <bool name="captureAttr">true</bool>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"

The response is a Tomcat error page:

HTTP Status 400 - ERROR:unknown field 'attr_meta'
description: The request sent by the client was syntactically incorrect (ERROR:unknown field 'attr_meta').
Apache Tomcat/6.0.18

Please note: I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a Windows machine using Solr Cell, and calling the program there works fine without any configuration changes.

Thanks
Naveen
Re: tika and solr 3.1 integration
Hi,

This is fixed. Yes, schema.xml was the culprit, and I fixed it by looking at the sample schema provided with the distribution.

But on Windows I am getting an slf4j IllegalAccessException, which looks like a jar problem. Looking at the fixes suggested in their FAQ, they recommend the 1.5.5 version, which is already in the lib folder. I have been adding a lot of jars to be deployed, and I am afraid that may be causing the problem. Has somebody experienced the same?

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:41 AM, Juan Grande wrote:
> Hi Naveen,
>
> Check if there is a dynamic field named "attr_*" in the schema. The
> "uprefix=attr_" parameter means that if Solr can't find an extracted field
> in the schema, it'll add the prefix "attr_" and try again.
>
> Juan
>
> On Thu, Jun 2, 2011 at 4:21 AM, Naveen Gupta wrote:
> > Hi
> >
> > I am trying to integrate solr 3.1 and tika (which comes default with the
> > version) and using a curl command trying to index a few of the documents, i am
> > getting this error. the error is attr_meta field is unknown. i checked the
> > solrconfig, it looks perfect to me.
> >
> > can you please tell me what i am missing.
> >
> > Thanks
> > Naveen
Strategy --> Frequent updates in our application
Hi,

We have an application where every 10 minutes we index each user's document repository, and if a new message is added to a particular discussion we need to index that thread again (please note we are not blindly re-indexing each time; we have various rules to work out which threads are new or changed and are therefore candidates for indexing). So we are doing updates for each user's document repository, and so far the performance is not looking very good.

Going forward we expect hits in volume (1,000 to 10,000 hits per minute), so we are looking for a strategy to tune Solr so that we can index the data in real time. What about NRT -- is it a good fit for this kind of scenario? I have read that Solr NRT performance is not very good, but I am not ready to believe that, since Solr is one of the best open source engines and this will surely get sorted out in the near future. Still, if any benchmark exists, kindly share it with me so we can analyze it against our requirements.

Is there any way to add incremental indexes, as we generally find in other search engines like Endeca? I don't know Solr in much detail yet, since I am a newbie, so can you please tell me if there are settings that can keep track of incremental indexing?

Thanks
Naveen
different indexes for multitenant approach
Hi,

I want to implement an indexing strategy where we keep and maintain separate indexes per tenant:

- first level of categorization -- company name
- second level -- company name + fields to be indexed
- further levels -- groups of different company names based on some heuristic (hashing), if it grows further

I want to do this in the same Solr instance. Is that possible?

Thanks
Naveen
Re: How to display search results of solr in to other application.
Hi Romi,

In my view, you first need to understand how AJAX with jQuery works, then JSON, and then JSONP (if you are fetching from a different domain).

queryString here is the dynamic query you will be sending to Solr (it could be simple text or a more advanced query string):

http://wiki.apache.org/solr/CommonQueryParameters

The callback is the method name you define; after the response comes back, this method is called (callback mechanism) with the response from Solr (in JSON format), and in it you show or analyze the response as per your business need.

Thanks
Naveen

On Fri, Jun 3, 2011 at 12:00 PM, Romi wrote:
> $.getJSON(
>   "http://[server]:[port]/solr/select/?jsoncallback=?",
>   {"q": queryString,
>    "version": "2.2",
>    "start": "0",
>    "rows": "10",
>    "indent": "on",
>    "json.wrf": "callbackFunctionToDoSomethingWithOurData",
>    "wt": "json",
>    "fl": "field1"}
> );
>
> would you please explain what are queryString and "json.wrf":
> "callbackFunctionToDoSomethingWithOurData". and what if i want to change my
> query string each time.
>
> -
> Thanks & Regards
> Romi
php library for extractrequest handler
Hi,

We want to post some files (rtf, doc, etc.) to the Solr server using PHP. One way is to POST using curl. Is there any PHP client comparable to the Java client for Solr Cell? URLs would also help.

Thanks
Naveen
Re: Strategy --> Frequent updates in our application
Hi Pravesh,

We don't have that setup right now. We are thinking of having one instance for writes and another for reads. If you have another design in mind, kindly share it.

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:50 PM, pravesh wrote:
> You can use DataImportHandler for your full/incremental indexing. Now NRT
> indexing could vary as per business requirements (i mean the delay could be
> 5 mins, 10 mins, 15 mins, OR 30 mins). Then it also depends on how much volume will
> be indexed incrementally.
> BTW, are you having a Master+Slave SOLR setup?
Re: php library for extractrequest handler
Yes, that is the one I used, and it is working fine. Thanks to Nabble.

Thanks
Naveen

On Fri, Jun 3, 2011 at 4:02 PM, Gora Mohanty wrote:
> On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta wrote:
> > Hi
> >
> > We want to post to solr server with some of the files (rtf,doc,etc) using
> > php .. one way is to post using curl
>
> I do not normally use PHP, and have not tried it myself.
> However, there is a PHP extension for Solr:
> http://wiki.apache.org/solr/SolPHP
> http://php.net/manual/en/book.solr.php
>
> Regards,
> Gora
TIKA INTEGRATION PERFORMANCE
Hi,

Since our application is in PHP, we are using SolPHP to make curl-based calls. My concern is that for each user we may have 20-40 attachments to be indexed each day, and there are many users; initially we are targeting around 500-1000 users daily.

Right now, this is what we do:

<?php
$ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => "@paper.pdf"));
$result = curl_exec($ch);
?>

We are also planning to add other fields that are to be indexed and stored.

There are a couple of questions here:

1. What would be the best strategy for commits? If we take all the documents in an array, iterate over them one by one firing curl each time, and only commit on the last doc, will that work, or do we need to commit for each doc?

2. We have several fields already defined in the schema, and a few of them were required earlier; for this purpose we don't want them. How can we have both requirements together in the same schema?

3. Since commits are frequent, how can we use Solr multicore to separate write and read operations?

Thanks
Naveen
Re: TIKA INTEGRATION PERFORMANCE
Hi Tomas,

1. Regarding SolrInputDocument: we are not using the Java client; we are using the PHP Solr client, so I am not sure how to wrap content in a SolrInputDocument from PHP. In that case we would need the Tika-related jars to get at the metadata such as content, and we certainly don't want to handle all of that in the PHP client.

Secondly, what I was asking about the commit strategy: suppose you have 100 docs, iterate over the first 99 and fire curl without commit in the URL, and for the 100th doc include commit -- will that also make the indexes for the previous 99 docs visible? Roughly (see the sketch below):

while (doc 1 .. 99) {
    curl_command = url without commit;
}
for doc 100, the url would include commit

I wanted to achieve something similar to an optimize-style batch operation. Why aren't these kinds of general-purpose use cases covered in the examples, especially for other languages? Java folks can easily do this using the API. I am basically a Java guy, so I can feel the problem.

Thanks
Naveen

2011/6/6 Tomás Fernández Löbbe
> 1. About the commit strategy, all the ExtractingRequestHandler (the request
> handler that uses Tika to extract content from the input file) will do is
> extract the content of your file and add it to a SolrInputDocument. The
> commit strategy should not change because of this, compared to other
> documents you might be indexing. It is usually not recommended to commit on
> every new / updated document.
>
> 2. Don't know if I understand the question. You can add all the static
> fields you want to the document by adding the "literal." prefix to the name
> of the fields when using ExtractingRequestHandler (as you are doing with
> "literal.id"). You can also leave fields empty if they are not marked as
> "required" in the schema.xml file. See:
> http://wiki.apache.org/solr/ExtractingRequestHandler#Literals
>
> 3. Solr cores can work almost as completely different Solr instances. You
> could tell one core to replicate from another core. I don't think this would
> be of any help here. If you want to separate the indexing operations from
> the query operations, you could probably use different machines, that's
> usually a better option. Configure the indexing box as master and the query
> box as slave. Here you have some more information about it:
> http://wiki.apache.org/solr/SolrReplication
>
> Were these the answers you were looking for, or did I misunderstand your
> questions?
>
> Tomás
>
> On Mon, Jun 6, 2011 at 2:54 AM, Naveen Gupta wrote:
> > Hi
> >
> > Since it is php, we are using solphp for calling curl based calls. What my
> > concern here is that for each user, we might be having 20-40 attachments
> > needed to be indexed each day, and there are various users. Daily we are
> > targeting around 500-1000 users ..
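A minimal sketch of the batch-then-commit flow being asked about above, assuming the stock /update/extract handler on the same host/port used earlier in the thread; the file paths are placeholders. Each document is posted without commit, and a single commit is issued at the end -- that commit makes all previously added documents searchable, so the first 99 do not need their own commits:

<?php
$docs = array('/path/to/paper1.pdf', '/path/to/paper2.pdf'); // placeholder list of attachments

foreach ($docs as $i => $file) {
    // no commit=true here -- just add the document
    $ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc' . $i);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@' . $file));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($ch);
    curl_close($ch);
}

// one explicit commit at the end covers everything added above
$ch = curl_init('http://localhost:8010/solr/update');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
curl_setopt($ch, CURLOPT_POSTFIELDS, '<commit/>');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_exec($ch);
curl_close($ch);
?>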
getting numberformat exception while using tika
Hi,

We are using the ExtractingRequestHandler and we are getting the following error when we give it a Microsoft .docx file for indexing.

I think this has something to do with the date field definition, but I am not very sure -- what field type should we use?

2. We are also trying to index .jpg files (when we search over the name of the jpg, it does not come back, though I am passing an id).

3. What about zip files or rar files -- does Tika with Solr handle these?

java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:412)
    at java.lang.Long.parseLong(Long.java:461)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Thanks
Naveen
tika integration exception and other related queries
Hi, can somebody answer this?

3. Can somebody give me an idea of how to do indexing for a zip file?

1. While sending a docx, we are getting the following error:

> java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>     at java.lang.Long.parseLong(Long.java:412)
>     at java.lang.Long.parseLong(Long.java:461)
>     at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
>     at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
>     at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>     at java.lang.Thread.run(Thread.java:619)

Thanks
Naveen

On Tue, Jun 7, 2011 at 3:33 PM, Naveen Gupta wrote:
> Hi
>
> We are using the ExtractingRequestHandler and we are getting the following error
> when we give it a Microsoft .docx file for indexing.
>
> I think this has something to do with the date field definition, but I am not very
> sure -- what field type should we use?
>
> 2. We are also trying to index .jpg files (when we search over the name of the jpg,
> it does not come back, though I am passing an id).
>
> 3. What about zip files or rar files -- does Tika with Solr handle these?
Re: tika integration exception and other related queries
Hi Gary,

It started working. I did not test zip files yet, but for rar files it is working fine.

The only thing I wanted was to index the metadata (text mapped to content), not store the data. Also, in the search results I want to filter the content out, and that has started working fine. I don't want to show the extracted content to the end user, since the way the information is extracted is not very readable; although we can apply a few analyzers and filters to remove the unnecessary tags, the information would still not be of much help. I am looking for your opinion: what did you do in order to filter out the content, or are you showing the extracted content to the end user?

Even in the case where we do show the text to the end user, how can I limit the number of characters while querying the search results? Is there any feature where we can achieve this -- the concept of a snippet?

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor wrote:
> Naveen,
>
> For indexing Zip files with Tika, take a look at the following thread:
>
> http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html
>
> I got it to work with the 3.1 source and a couple of patches.
>
> Hope this helps.
>
> Regards,
> Gary.
>
> On 08/06/2011 04:12, Naveen Gupta wrote:
> > Hi Can somebody answer this ...
> >
> > 3. can somebody tell me an idea how to do indexing for a zip file ?
> >
> > 1. while sending docx, we are getting following error.
Re: tika integration exception and other related queries
Hi Gary,

We are doing a similar thing, but we are not creating an XML doc; rather we are leaving Tika to extract the content and relying on dynamic fields. We are not storing the text either, though that may change in future.

What about Microsoft Office 2007 and later attachments (docx, etc.)? Is that working for you? We are always getting a NumberFormatException for those. I posted about it to the list as well, but so far no response has come.

Thanks
Naveen

On Thu, Jun 9, 2011 at 6:43 PM, Gary Taylor wrote:
> Naveen,
>
> Not sure our requirement matches yours, but one of the things we index is a
> "comment" item that can have one or more files attached to it. To index the
> whole thing as a single Solr document we create a zipfile containing a file
> with the comment details in it and any additional attached files. This is
> submitted to Solr as a TEXT field in an XML doc, along with other meta-data
> fields from the comment. In our schema the TEXT field is indexed but not
> stored, so when we search and get a match back it doesn't contain all of the
> contents from the attached files etc., only the stored fields in our schema.
> Admittedly, the user can therefore get back a "comment" match with no
> indication as to WHERE the match occurred (ie. was it in the meta-data or
> the contents of the attached files), but at the moment we're only interested
> in getting appropriate matches, not explaining where the match is.
>
> Hope that helps.
>
> Kind regards,
> Gary.
>
> On 09/06/2011 03:00, Naveen Gupta wrote:
> > Hi Gary
> >
> > It started working .. though i did not test for Zip files, but for rar
> > files, it is working fine ..
ERROR on posting update request using CURL in php
Hi,

This is my document, in PHP:

$xmldoc = 'F_14674gmail.com121sample.pptx';

$ch = curl_init("http://localhost:8080/solr/update");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml"));
curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);

$result = curl_exec($ch);
if (!curl_errno($ch)) {
    $info = curl_getinfo($ch);
    $header = substr($response, 0, $info['header_size']);
    echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];
} else {
    print_r('no idea');
}
println('result of query'.' '.' -> '.$result);

It is throwing this error (Tomcat error page):

HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
description: The request sent by the client was syntactically incorrect (Unexpected character ''' (code 39) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]).
Apache Tomcat/6.0.18

Thanks
Naveen
Re: ERROR on posting update request using CURL in php
Hi,

curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary 'testdoc'

Regards
Naveen

On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta wrote:
> Hi
>
> This is my document, in PHP:
>
> $xmldoc = 'F_146 name="userid">74gmail.com name="attachment_size">121 name="attachment_name">sample.pptx';
>
> $ch = curl_init("http://localhost:8080/solr/update");
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
> curl_setopt($ch, CURLOPT_POST, 1);
> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml"));
> curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);
>
> $result = curl_exec($ch);
> if (!curl_errno($ch)) {
>     $info = curl_getinfo($ch);
>     $header = substr($response, 0, $info['header_size']);
>     echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];
> } else {
>     print_r('no idea');
> }
> println('result of query'.' '.' -> '.$result);
>
> It is throwing this error (Tomcat error page):
>
> HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<'
> at [row,col {unknown-source}]: [1,1]
> description: The request sent by the client was syntactically incorrect (Unexpected character ''' (code 39)
> in prolog; expected '<' at [row,col {unknown-source}]: [1,1]).
> Apache Tomcat/6.0.18
>
> Thanks
> Naveen
relevant result for query with boost factor on parameters
Hi,

I am trying to achieve the following behaviour with three fields:

1. field1
2. field2
3. field3

field1 should have the maximum relevance, field2 the next, and field3 is last.

The term is entered by the end user (say "rock roll").

- I want to show first the results that contain both "rock" and "roll" in field1.
- Next, the results that contain both "rock" and "roll" in field2.
- All of this should be restricted to a given field3 value (x...@gmail.com).

Special attention: if field1 does not contain both the terms "rock" and "roll", then the field2 results should take priority (show the results that have both terms first, and then show the rest by boost factor or relevance). If neither field contains both terms together, show results as normal, with field1 having more relevance than field2.

Finally, the results should be joined on field3; that is, for a given field3 value, the above results should be filtered.

I am trying this, which gives satisfactory but not the best results:

field1:(rock roll)^20 field2:(rock roll)^4 field3:x...@gmail.com

I was thinking of field1 OR field2, AND field3, but that is not working. Can you help in this regard? What other configuration should I consider in this context?

Thanks
Naveen
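One possible way to structure this (a sketch, not necessarily the only approach; the field names are the ones above, and the e-mail value is a placeholder): keep the field3 restriction in a filter query so it only filters and never affects ranking, and give the "both terms present" clauses the highest boosts:

q=field1:(rock AND roll)^20 OR field2:(rock AND roll)^10 OR field1:(rock roll)^4 OR field2:(rock roll)
fq=field3:"user@example.com"

With the edismax parser, something similar can be expressed with qf=field1^20 field2^4 together with mm=2 to require both terms, relaxing mm if partial matches should still be returned.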
indexing taking very long time
Hi,

We have a requirement where we are indexing all the messages of a thread; a thread may have attachments too. We are adding them to Solr for indexing and searching, in order to apply a few business rules.

For a single user we have a large number of threads (around 100k), and each thread may have 10-20 messages.

What we are finding is that it takes 30 minutes to index the entire set of threads. When we run optimize, indexing becomes faster. The question is: how frequently should optimize be called, and when?

Please note that we follow a commit strategy of committing after every 10k threads; we are not calling commit after every doc.

Secondly, how can we use multithreading from the Solr perspective in order to improve JVM and other resource utilization?

Thanks
Naveen
Re: IMP: indexing taking very long time
Can somebody answer this? What should be the best strategy for optimize (when we are indexing millions of messages for a newly registered user)?

Thanks
Naveen

On Tue, Aug 2, 2011 at 5:36 PM, Naveen Gupta wrote:
> Hi
>
> We have a requirement where we are indexing all the messages of a thread;
> a thread may have attachments too. We are adding them to Solr for indexing
> and searching, in order to apply a few business rules.
>
> For a user, we have a large number of threads (100k), and each thread
> may have 10-20 messages.
>
> Now what we are finding is that it is taking 30 mins to index the entire
> threads.
>
> When we run optimize then it is taking faster time.
>
> The question here is that how frequently this optimize should be called and
> when ?
>
> Please note that we are following commit strategy (that is every after 10k
> threads, commit is called). we are not calling commit after every doc.
>
> Secondly how can we use multi threading from solr perspective in order to
> improve jvm and other utilization ?
>
> Thanks
> Naveen
merge factor performance
Hi,

We have a requirement where we have almost 100,000 documents to be indexed (at least 20 fields each). None of these fields is longer than about 10 KB.

We are also running searches against the same index in parallel.

We found that it is taking almost 3 minutes to index the documents.

Our current strategy:

- We make a commit after every 15,000 docs (sent as a single large XML doc).
- We have a merge factor of 10 as of now.

I am wondering whether increasing the merge factor to 25 or 50 would improve performance. Also, what about the RAM buffer size (the default is 32 MB)? Which other factors should we consider? When should we run optimize?

Any other deviation from the defaults that would help us reach the target would be welcome.

We are allocating a JVM max heap of 512 MB, and the default concurrent mark sweep collector is used for garbage collection.

Thanks
Naveen
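For reference, a sketch of where those two knobs live in solrconfig.xml on the 3.x line (they typically sit under <indexDefaults>, and can be overridden in <mainIndex>); the values below are only illustrative starting points, not recommendations:

<indexDefaults>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
</indexDefaults>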
Re: merge factor performance
Sorry, a correction: it is taking 3 mins for 15k docs.

On Thu, Aug 4, 2011 at 10:07 PM, Naveen Gupta wrote:
> Hi,
>
> We have a requirement where we have almost 100,000 documents to be indexed
> (at least 20 fields each). None of these fields is longer than about 10 KB.
>
> We are also running searches against the same index in parallel.
>
> We found that it is taking almost 3 minutes to index the documents.
>
> We make a commit after every 15,000 docs (sent as a single large XML doc),
> and we have a merge factor of 10 as of now.
>
> I am wondering whether increasing the merge factor to 25 or 50 would improve
> performance. Also, what about the RAM buffer size (the default is 32 MB)?
> Which other factors should we consider? When should we run optimize?
>
> We are allocating a JVM max heap of 512 MB, and the default concurrent
> mark sweep collector is used for garbage collection.
>
> Thanks
> Naveen
Re: indexing taking very long time
Hi Erick,

We have a requirement where we have almost 100,000 documents to be indexed (at least 20 fields each). None of these fields is longer than about 10 KB.

We are also running searches against the same index in parallel.

We found that it is taking almost 3 minutes to index the documents.

Our current strategy:

- We make a commit after every 15,000 docs (a single large XML doc, streamed as an update using curl from PHP).
- We have a merge factor of 10 as of now.

I am wondering whether increasing the merge factor to 25 or 50 would improve performance. Also, what about the RAM buffer size (the default is 32 MB)? Which other factors should we consider? When should we run optimize? Any other deviation from the defaults that would help us reach the target would be welcome.

We are allocating a JVM max heap of 512 MB, and the default concurrent mark sweep collector is used for garbage collection.

One more thing: CPU utilization is 20-25% across all 4 cores (using htop).

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson wrote:
> What version of Solr are you using? If it's a recent version, then
> optimizing is not that essential, you can do it during off hours, perhaps
> nightly or weekly.
>
> As far as indexing speed, have you profiled your application to see whether
> it's Solr or your indexing process that's the bottleneck? A quick check
> would be to monitor the CPU utilization on the server and see if it's high.
>
> As far as multithreading, one option is to simply have multiple clients
> indexing simultaneously. But you haven't indicated how the indexing is being
> done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
> provide those kinds of details to get meaningful help.
>
> Best
> Erick
> On Aug 2, 2011 8:06 AM, "Naveen Gupta" wrote:
> > Hi
> >
> > We have a requirement where we are indexing all the messages of a thread;
> > a thread may have attachments too. We are adding them to Solr for indexing
> > and searching, in order to apply a few business rules.
> >
> > For a user, we have a large number of threads (100k), and each thread may
> > be having 10-20 messages.
> >
> > Now what we are finding is that it is taking 30 mins to index the entire
> > threads.
> >
> > When we run optimize then it is taking faster time.
> >
> > The question here is that how frequently this optimize should be called and
> > when ?
> >
> > Please note that we are following commit strategy (that is every after 10k
> > threads, commit is called). we are not calling commit after every doc.
> >
> > Secondly how can we use multi threading from solr perspective in order to
> > improve jvm and other utilization ?
> >
> > Thanks
> > Naveen
Re: indexing taking very long time
Hi Erick,

Solr version: 3.0.

We are indexing the data with a curl call from a C program to the Solr server over REST. We merge 15,000 docs into a single XML doc, use curl to index the data, and then call commit (update).

For each client we create a new connection (a PHP script uses exec() to start a new C process for every user) and hit the Solr server.

We are using the default solrconfig, except for a few field changes in schema.xml.

Max JVM heap allocation is 512 MB (the Linux box has 512 MB of RAM as well).

Initially I increased the merge factor to 50 and the RAM buffer to 50 MB, but had to reduce them again since we were getting java.lang.OutOfMemoryError: Java heap space.

It is taking 3 mins to index 15,000 docs (a client can have 100,000 docs and we have many clients). We also run search queries from other clients against this index in parallel. The 3 mins is the time between the curl call being made and the response coming back.

When we commit, CPU usage goes up to 25% (not all the cores, but a few of them). The total number of cores is 4.

Can you please advise where to start from a tuning perspective? A blog I was going through claims it should take about 40 secs to index 100,000 docs (with 10-12 fields defined) -- I forget the link -- and it talked about increasing the merge factor.

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson wrote:
> What version of Solr are you using? If it's a recent version, then
> optimizing is not that essential, you can do it during off hours, perhaps
> nightly or weekly.
>
> As far as indexing speed, have you profiled your application to see whether
> it's Solr or your indexing process that's the bottleneck? A quick check
> would be to monitor the CPU utilization on the server and see if it's high.
>
> As far as multithreading, one option is to simply have multiple clients
> indexing simultaneously. But you haven't indicated how the indexing is being
> done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
> provide those kinds of details to get meaningful help.
>
> Best
> Erick
LockObtainFailedException
:59:56 PM org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

Kindly tell me where it is failing. We have increased the lock timeout, but it is still giving the same problem.

Thanks
Naveen
Re: LockObtainFailedException
Yes, this was happening because of the JVM heap size.

But the real issue is that as our index size grows (very large), the indexing time becomes very long (using streaming). Earlier, indexing 15,000 docs at a time (commit after 15,000 docs) was taking 3 mins 20 secs; after deleting the index data, it takes 9 secs.

What would be the approach to get better indexing performance while also keeping the index size in check at the same time?

The index size was around 4.5 GB.

Thanks
Naveen

On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge wrote:
> Hi,
>
> When you get this exception with no other error or explanation in
> the logs, this is almost always because the JVM has run out of memory.
> Have you checked/profiled your mem usage/GC during the stream operation?
>
> On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta wrote:
> > Hi,
> >
> > We are doing streaming updates to Solr for multiple users.
> >
> > We are getting:
> >
> > Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
> >
> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
> > out: NativeFSLock@/var/lib/solr/data/index/write.lock
> >     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
> >     at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1097)
> >     at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:83)
> >     at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
> >     at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
> >     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
> >     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> >     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
> >     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> >     at org.apache.tomcat.util.net.JIoEndpoint
> >
> > Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
> > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
> > out: NativeFSLock@/var/lib/solr/data/index/write.lock
Re: LockObtainFailedException
Hi Peter, I found the issue. We were getting this exception because of JVM heap space; I allocated -Xms512m and -Xmx1024m and also increased the write lock timeout to 20 secs. That made the lock errors stop, but it still did not help the indexing time. On closer analysis of the docs we were indexing, we found we were using commitWithin as 10 secs, which was the root cause of the indexing taking so long, because of so many segments being committed. Issuing a separate commit command using curl solved the issue. The performance improved from 3 mins to 1.5 secs :) Thanks a lot Naveen

On Thu, Aug 11, 2011 at 6:27 PM, Peter Sturge wrote:
> Optimizing indexing time is a very different question.
> I'm guessing your 3mins+ time you refer to is the commit time.
>
> There are a whole host of things to take into account regarding
> indexing, like: number of segments, schema, how many fields, storing
> fields, omitting norms, caching, autowarming, search activity etc. -
> the list goes on...
> The trouble is, you can look at 100 different Solr installations with
> slow indexing, and find 200 different reasons why each is slow.
>
> The best place to start is to get a full understanding of precisely
> how your data is being stored in the index, starting with adding docs,
> going through your schema, Lucene segments, solrconfig.xml etc,
> looking at caches, commit triggers etc. - really getting to know how
> each step is affecting performance.
> Once you really have a handle on all the indexing steps, you'll be
> able to spot the bottlenecks that relate to your particular
> environment.
>
> An index of 4.5GB isn't that big (but the number of documents tends to
> have more of an effect than the physical size), so the bottleneck(s)
> should be findable once you trace through the indexing operations.
>
> On Thu, Aug 11, 2011 at 1:02 PM, Naveen Gupta wrote:
> > Yes, this was happening because of the JVM heap size. But the real issue
> > is that as our index size grows (it is getting very high), indexing time
> > becomes very long (we are using streaming). Earlier, indexing 15,000 docs
> > at a time (commit after 15,000 docs) was taking 3 mins 20 secs; after
> > deleting the index data, it takes 9 secs. What would be the approach to
> > get better indexing performance while also keeping the index size under
> > control at the same time? The index size was around 4.5 GB.
> >
> > Thanks
> > Naveen
> >
> > On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge wrote:
> > > Hi,
> > >
> > > When you get this exception with no other error or explanation in
> > > the logs, this is almost always because the JVM has run out of memory.
> > > Have you checked/profiled your mem usage/GC during the stream operation?
> > >
> > > On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta wrote:
> > > > Hi,
> > > >
> > > > We are doing streaming updates to solr for multiple users,
> > > >
> > > > We are getting
> > > >
> > > > Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
> > > > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
> > > >        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
> > > >        at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1097)
> > > >        at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:83)
> > > >        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
> > > >        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
> > > >        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
> > > >        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> > > >        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
> > > >        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> > > >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
> > > >        at org.ap
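To make the fix described in this thread concrete, the pattern being discussed is roughly: post the documents with no commitWithin attribute, then issue one explicit commit at the end of the batch. A sketch; the host, core path and file name are placeholders, not taken from the thread:

  # stream a batch of documents; the XML contains plain <add><doc>...</doc></add> with no commitWithin
  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary @batch.xml

  # a single explicit commit once the whole batch has been posted
  curl "http://localhost:8983/solr/update?commit=true"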
exceeded limit of maxWarmingSearchers ERROR
Hi, Most of the settings are default. We have a single node (memory 1 GB, index size 4 GB). We have a requirement where we are doing very fast commits. This is kind of a real-time requirement where we are polling many threads from a third party and indexing them into our system. We want these results to be available soon. We are committing for each user (there may be 10k threads, and inside that 1 thread may have 10 messages). So overall documents per user will be around .1 million (10^5). Earlier we were using commitWithin as 10 milliseconds inside the document, but that was slowing the indexing, and we were not getting any error. As we removed the commitWithin, indexing became very fast, but after that we started experiencing the error below in the system. As I read many forums, everybody said that this is happening because of a very fast commit rate, but what is the solution for our problem? We are using curl to post the data and commit. Also, till now we are using the default solrconfig.

Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
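For context, the limit in that error comes from solrconfig.xml. A sketch of the relevant settings in a Solr 1.4/3.x config; the numbers are illustrative, and raising maxWarmingSearchers is usually the wrong fix compared to committing less often or letting autoCommit batch the commits:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- let Solr batch up commits instead of committing per user/request -->
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime> <!-- milliseconds -->
    </autoCommit>
  </updateHandler>

  <query>
    <!-- how many searchers may be warming concurrently; overlapping commits exceed this -->
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>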
Re: exceeded limit of maxWarmingSearchers ERROR
Hi Mark/Erick/Nagendra, I was not very confident about NRT at that point in time, when we started the project almost 1 year ago; I will definitely try NRT and see the performance. The current requirement was working fine while we were using commitWithin 10 millisecs in the XML document which we were posting to Solr, but because of that we were getting very poor performance (almost 3 mins for 15,000 docs) per user. There are many parallel users committing to our Solr. So we removed the commitWithin, and performance was much, much better. But then we are getting this maxWarmingSearchers error, because we are committing separately with a curl request once the entire doc set has been submitted for indexing. The question here is: what is the difference between commitWithin and commit (apart from the fact that commit takes memory and processing and additional hardware usage)? Why do we want it to be visible as soon as possible? Because we are applying many business rules on top of the results (older indexes as well as new ones) and applying different filters. Up to 5 mins is fine for us, but more than that and we need to think about other optimizations. We will definitely try NRT. But please tell me other options which we can apply in order to optimize. Thanks Naveen

On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson wrote:
> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>
> Erick
>
> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller wrote:
> >
> > On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
> >
> >> You either have to go to near real time (NRT), which is under
> >> development, but not committed to trunk yet
> >
> > NRT support is committed to trunk.
> >
> > - Mark Miller
> > lucidimagination.com
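On the commitWithin vs. commit question above: both end up opening (and warming) a new searcher; the practical difference is how often that happens. With commitWithin, every update carries its own deadline, so commits pile up under parallel users; with an explicit commit, the searcher is reopened once per indexing run. A sketch of the two forms, with placeholder payloads (a plain "id" field is assumed):

  # commitWithin: the deadline (in milliseconds) travels with each add
  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
       --data-binary '<add commitWithin="10000"><doc><field name="id">1</field></doc></add>'

  # explicit commit: sent once, as its own update message, after all adds
  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit/>'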
Re: exceeded limit of maxWarmingSearchers ERROR
Nagendra, You wrote: Naveen: *NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a document to become searchable*. Any document that you add through update becomes immediately searchable. So no need to commit from within your update client code. Since there is no commit, the cache does not have to be cleared or the old searchers closed or new searchers opened and warmed (the error that you are facing).

Looking at the link you mentioned, it is clearly what we wanted. But the real thing is that you wrote "RA does need a commit for a document to become searchable" (please take a look at the bold sentence). In future, for higher loads, can it cater to master/slave (replication) etc. to scale and perform better? If yes, we would like to go for NRT, and the performance described in the article is acceptable. We were expecting the same real-time performance for a single user. What about multiple users: should we wait for 1-2 secs before calling the curl request to make Solr perform better, or will it internally handle multiple requests (multithreaded etc.)? What doc size (10,000 docs?) would allow the JVM to perform better? Have you done any kind of benchmarking in terms of multithreaded and multi-user load for NRT, and also JVM tuning, in terms of Solr server performance? Any kind of performance analysis would help us decide quickly whether to switch over to NRT.

Questions in terms of switching over to NRT:
1. Should we upgrade to Solr 4.x?
2. Any benchmarking (10,000 docs/sec)? The question here is more specifically about the detail of an individual doc (fields, number of fields, field sizes, parameters affecting performance with faceting or w/o faceting).
3. What about multiple users? A user in real time might have a large doc count of .1 million. How to break this up and analyze which one is better (though it is our task to do); still, any kind of breakdown will help us. Imagine a user inbox.
4. JVM tuning and performance results in a multithreaded environment.
5. Machine details (RAM, CPU, and settings from a Solr perspective).

Hoping that you are getting my point. We want to benchmark the performance. If you can involve me in your group, that would be great. Thanks Naveen

2011/8/15 Nagendra Nagarajayya
> Bill:
>
> I did look at Mark's performance tests. Looks very interesting.
>
> Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance:
> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x
>
> Regards
>
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
> On 8/14/2011 7:47 PM, Bill Bell wrote:
>> I understand.
>>
>> Have you looked at Mark's patch? From his performance tests, it looks
>> pretty good.
>>
>> When would RA work better?
>>
>> Bill
>>
>> On 8/14/11 8:40 PM, "Nagendra Nagarajayya" wrote:
>>
>>> Bill:
>>>
>>> The technical details of the NRT implementation in Apache Solr with
>>> RankingAlgorithm (SOLR-RA) are available here:
>>>
>>> http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
>>>
>>> (Some changes for Solr 3.x, but for the most part it is as above)
>>>
>>> Regarding support for 4.0 trunk, should happen sometime soon.
>>>
>>> Regards
>>>
>>> - Nagendra Nagarajayya
>>> http://solr-ra.tgels.org
>>> http://rankingalgorithm.tgels.org
>>>
>>> On 8/14/2011 7:11 PM, Bill Bell wrote:
>>>
>>>> OK,
>>>>
>>>> I'll ask the elephant in the room…
>>>>
>>>> What is the difference between the new UpdateHandler from Mark and the
>>>> SOLR-RA?
>>>>
>>>> The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?
>>>>
>>>> Pros/Cons?
>>>>
>>>> On 8/14/11 8:10 PM, "Nagendra Nagarajayya" wrote:
>>>>
>>>>> Naveen:
>>>>>
>>>>> NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
>>>>> document to become searchable. Any document that you add through update
>>>>> becomes immediately searchable. So no need to commit from within your
>>>>> update client code. Since there is no commit, the cache does not have
>>>>>
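Since the questions above ask about moving to Solr 4.x: in 4.x/trunk, the NRT support Mark refers to is exposed through soft commits rather than a third-party handler. A hedged sketch of what that looks like in solrconfig.xml there; it does not apply to Solr 3.3, and the intervals are only placeholders:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit: flush to disk -->
      <openSearcher>false</openSearcher>  <!-- do not open a searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>             <!-- soft commit: new docs become visible roughly every second -->
    </autoSoftCommit>
  </updateHandler>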
Re: exceeded limit of maxWarmingSearchers ERROR
Hi Nagendra, Thanks a lot .. i will start working on NRT today.. meanwhile old settings (increased warmSearcher in Master) have not given me trouble till now .. but NRT will be more suitable to us ... Will work on that one and will analyze the performance and share with you. Thanks Naveen 2011/8/17 Nagendra Nagarajayya > Naveen: > > See below: > >> *NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a >> >> document to become searchable*. Any document that you add through update >> becomes immediately searchable. So no need to commit from within your >> update client code. Since there is no commit, the cache does not have to >> be >> cleared or the old searchers closed or new searchers opened, and warmed >> (error that you are facing). >> >> >> Looking at the link which you mentioned is clearly what we wanted. But the >> real thing is that you have "RA does need a commit for a document to >> become >> searchable" (please take a look at bold sentence) . >> >> > Yes, as said earlier you do not need a commit. A document becomes > searchable as soon as you add it. Below is an example of adding a document > with curl (this from the wiki at http://solr-ra.tgels.com/wiki/** > en/Near_Real_Time_Search_ver_**3.x<http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x> > ): > > curl "http://localhost:8983/solr/**update/csv?stream.file=/tmp/** > x1.csv&encapsulator=%1f<http://localhost:8983/solr/update/csv?stream.file=/tmp/x1.csv&encapsulator=%1f> > " > > > There is no commit included. The contents of the document become > immediately searchable. > > > In future, for more loads, can it cater to Master Slave (Replication) and >> etc to scale and perform better? If yes, we would like to go for NRT and >> looking at the performance described in the article is acceptable. We were >> expecting the same real time performance for a single user. >> >> > There are no changes to Master/Slave (replication) process. So any changes > you have currently will work as before or if you enable replication later, > it should still work as without NRT. > > > What about multiple users, should we wait for 1-2 secs before calling the >> curl request to make SOLR perform better. Or internally it will handle >> with >> multiple request (multithreaded and etc). >> > > Again for updating documents, you do not have to change your current > process or code. Everything remains the same, except that if you were > including commit, you do not include commit in your update statements. There > is no change to the existing update process so internally it will not queue > or multi-thread updates. It is as in existing Solr functionality, there no > changes to the existing setup. > > Regarding perform better, in the Wiki paper every update through curl adds > (streams) 500 documents. So you could take this approach. (this was > something that I chose randomly to test the performance but seems to be > good) > > > What would be doc size (10,000 docs) to allow JVM perform better? Have you >> done any kind of benchmarking in terms of multi threaded and multi user >> for >> NRT and also JVM tuning in terms of SOLR sever performance. Any kind of >> performance analysis would help us to decide quickly to switch over to >> NRT. >> >> > The performance discussed in the wiki paper uses the MBArtists index. The > MBArtists index is the index used as one of the examples in the book, Solr > 1.4 Enterprise Search Server. 
You can download and build this index if you > have the book or can also download the contents from musicbrainz.org. > Each doc maybe about 100 bytes and has about 7 fields. Performance with > wikipedia's xml dump, commenting out skipdoc field (include redirects) in > the dataconfig.xml [ dataimport handler ], the update performance is about > 15000 docs / sec (100 million docs), with the skipdoc enabled (does not skip > redirects), the performance is about 1350 docs / sec [ time spent mostly > converting validating/xml than actual update ] (about 11 million docs ). > Documents in wikipedia can be quite big, at least avg size of about > 2500-5000 bytes or more. > > I would suggest that you download and give NRT with Apache Solr 3.3 and > RankingAlgorithm a try and get a feel of it as this would be the best way to > see how your config works with it. > > > Questions in terms for switching over to NRT, >> >> >> 1.Should we upgrade to SOLR 4.x ? >> >> 2. Any benchmarking (10,000 docs/secs). The question here is more >> specific >> >> the detail of indi
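A rough illustration of the ~500-documents-per-update approach described above, assuming the data has already been split into CSV chunks readable by the Solr host and that remote streaming (stream.file) is enabled in solrconfig.xml; file names and paths are placeholders:

  # stream each pre-split CSV chunk; note that no commit parameter is sent
  for f in /tmp/chunks/chunk-*.csv; do
    curl "http://localhost:8983/solr/update/csv?stream.file=$f&encapsulator=%1f"
  done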
Disabling jvm properties from ui
Hi, Is there a way to disable the JVM properties screen in the Solr admin UI? It shows some information which we don't want to expose. Any pointers would be helpful. Thanks
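Older Solr versions do not offer an obvious switch for hiding that screen, so one common workaround is to block the admin info handlers at a fronting reverse proxy instead. A sketch with nginx, assuming Solr listens on 8983 and that the screen is served by a path like /solr/admin/info/properties -- verify the exact path for your version before relying on this:

  # deny the handlers that back the Java properties / system info screens
  location ~ ^/solr/(.+/)?admin/(info/)?(properties|system) {
      return 403;
  }

  # everything else passes through to Solr
  location /solr/ {
      proxy_pass http://127.0.0.1:8983;
  }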
Solr index writing to s3
Hi, My requirement is to write the index data to S3; we have Solr installed on AWS instances. Please let me know if there is any documentation on how to write the index data to S3. Thanks
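As far as I know there is no official S3 DirectoryFactory in Solr; the two approaches that usually come up are (a) keeping the live index on local/EBS disk and only shipping backups or snapshots to S3, or (b) pointing HdfsDirectoryFactory at Hadoop's s3a connector, which some people have experimented with but which is not a well-tested or supported setup. A sketch of (b) for solrconfig.xml, with the bucket name and Hadoop config directory as pure placeholders; it also requires the hadoop-aws/s3a jars and credentials on the Solr side, and performance and index-locking caveats apply:

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">s3a://my-solr-bucket/solr-index</str>
    <!-- directory containing core-site.xml/hdfs-site.xml with the s3a settings -->
    <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  </directoryFactory>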