Re: what does the version parameter in the query mean?
On Fri, May 22, 2009 at 7:40 AM, Anshuman Manur wrote:
> ah, I see! Thank you so much for the response!
>
> I'm using SolrJ, so I probably don't need to set the XML version, since the wiki
> tells me that it uses binary as a default!

SolrJ automatically adds the correct version parameter/value. You do not need to
add it yourself.

--
Regards,
Shalin Shekhar Mangar.
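For reference, a minimal SolrJ query looks something like this (the URL is a
placeholder for your own instance); the client sets the version and wire-format
parameters itself, so there is nothing to add:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class VersionParamExample {
        public static void main(String[] args) throws Exception {
            // placeholder URL; point this at your own Solr
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("*:*");
            QueryResponse rsp = server.query(query);   // no version parameter needed
            System.out.println("found " + rsp.getResults().getNumFound() + " docs");
        }
    }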
Re: No sanity checks before replicating files?
I think this problem might happen when there are uncommitted changes in S2 and the master S1 comes back online. In that case, slave's generation is still less than master's and installation of index diff from master may fail. However, I do not understand a few points. Damien, if S1 comes back online and S2 starts replicating from S1, any changes to S2 will be discarded when a successful replication happens. How do you intend to protect against that? A better way is to detect when S1 comes back online and make it a slave of S2. 2009/5/22 Noble Paul നോബിള് नोब्ळ् > Let us see what is the desired behavior. > > When s1 comes back up online , s2 must download a fresh copy of index > from s1 because s1 is the slave and s2 has a newer version of index > than s1. > > Are you suggesting that s2 downloads the index files and then commit > fails? The code is written as follows > > boolean freshDownloadneeded = myIndexGeneration >= mastersIndexgeneration; > > then it should be a problem > > can u post the stacktrace? > > On Thu, May 21, 2009 at 11:45 PM, Otis Gospodnetic > wrote: > > > > Aha, I see. Perhaps you can post the error message/stack trace? > > > > As for the sanity check, I bet a call to > > http://host:port/solr/replication?command=indexversion > could be used ensure only newer versions of the index are being pulled. > We'll see what Paul says when he wakes up. :) > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > - Original Message > >> From: Damien Tournoud > >> To: solr-user@lucene.apache.org > >> Sent: Thursday, May 21, 2009 1:26:30 PM > >> Subject: Re: No sanity checks before replicating files? > >> > >> Hi Otis, > >> > >> Thanks for your answer. > >> > >> On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic > >> wrote: > >> > Interesting, this is similar to my suggestion to another person I just > replied > >> to here on solr-user. > >> > Have you actually run into this problem? I haven't tried it, but I'd > think > >> the first next replication (copying index from s1 to s2) would not > necessarily > >> fail, but would simply overwrite any changes that were made on s2 while > it was > >> serving as the master. Is that not what happens? > >> > >> No it doesn't. For some reason, Solr download all the files of the > >> index, but fails to commit the changes locally. At the next poll, the > >> process restarts. Not only does this clogs the network, but it also > >> unnecessarily uses resources on the newly promoted slave, until we > >> change its configuration. > >> > >> > If that's what happens, then I think what you'd simply have to do is > to: > >> > > >> > 1) bring s1 back up, but don't make it a master immediately > >> > 2) take away the master role from s2 > >> > 3) make s1 copy the index from s2, since s2 might have a more up to > date index > >> now > >> > 4) make s1 the master > >> > >> Once s2 is the master, we want it to stay this way. We will reassign > >> s1 as the slave at a later stage, when resources allows. What worries > >> me is that strange behavior of Solr 1.4 replication when the "slave" > >> index is fresher then the "master" one. > >> > >> Damien > > > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > -- Regards, Shalin Shekhar Mangar.
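As a rough sketch of the sanity check being discussed (host names are
placeholders), the index version can be compared on both sides before pulling:

    curl 'http://master-host:8983/solr/replication?command=indexversion'
    curl 'http://slave-host:8983/solr/replication?command=indexversion'

If the slave already reports a version/generation greater than or equal to the
master's, the pull could be skipped instead of downloading files that will fail
to install.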
Re: How to index large set data
about 2.8 m total docs were created. only the first run finishes. In my 2nd try, it hangs there forever at the end of indexing, (I guess right before commit), with cpu usage of 100%. Total 5G (2050) index files are created. Now I have two problems: 1. why it hangs there and failed? 2. how can i speed up the indexing? Here is my solrconfig.xml false 3000 1000 2147483647 1 false --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् wrote: > From: Noble Paul നോബിള് नोब्ळ् > Subject: Re: How to index large set data > To: solr-user@lucene.apache.org > Date: Thursday, May 21, 2009, 10:39 PM > what is the total no:of docs created > ? I guess it may not be memory > bound. indexing is mostly amn IO bound operation. You may > be able to > get a better perf if a SSD is used (solid state disk) > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai > wrote: > > > > Hi Paul, > > > > Thank you so much for answering my questions. It > really helped. > > After some adjustment, basically setting mergeFactor > to 1000 from the default value of 10, I can finished the > whole job in 2.5 hours. I checked that during running time, > only around 18% of memory is being used, and VIRT is always > 1418m. I am thinking it may be restricted by JVM memory > setting. But I run the data import command through web, > i.e., > > > http://:/solr/dataimport?command=full-import, > how can I set the memory allocation for JVM? > > Thanks again! > > > > JB > > > > --- On Thu, 5/21/09, Noble Paul നോബിള് > नोब्ळ् > wrote: > > > >> From: Noble Paul നോബിള് > नोब्ळ् > >> Subject: Re: How to index large set data > >> To: solr-user@lucene.apache.org > >> Date: Thursday, May 21, 2009, 9:57 PM > >> check the status page of DIH and see > >> if it is working properly. and > >> if, yes what is the rate of indexing > >> > >> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai > > >> wrote: > >> > > >> > Hi, > >> > > >> > I have about 45GB xml files to be indexed. I > am using > >> DataImportHandler. I started the full import 4 > hours ago, > >> and it's still running > >> > My computer has 4GB memory. Any suggestion on > the > >> solutions? > >> > Thanks! > >> > > >> > JB > >> > > >> > > >> > > >> > > >> > > >> > >> > >> > >> -- > >> > - > >> Noble Paul | Principal Engineer| AOL | http://aol.com > >> > > > > > > > > > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
Re: Solr statistics of top searches and results returned
Hi, good feature to have, maintaining top N would also require storing all the search queries done so far and keep updating (or atleast in some time window). having pluggable persistent storage for all time search queries would be great. tell me how can I help? -umar On Fri, May 22, 2009 at 12:21 PM, Shalin Shekhar Mangar wrote: > On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote: > >> >> I think you will want some type of persistence mechanism otherwise you will >> end up consuming a lot of resources keeping track of all the query strings, >> unless I'm missing something. Either a Lucene index (Solr core) or the >> option of embedding a DB. Ideally, it would be pluggable such that people >> could choose their storage mechanism. Most people do this kind of thing >> offline via log analysis as logs can grow quite large quite quickly. >> > > For a general case, yes. But I was thinking more of a top 'n' queries as a > running statistic. > > -- > Regards, > Shalin Shekhar Mangar. >
Re: clustering SOLR-769
Hi there,

> Is it possible to specify more than one snippet field, or should I use copyField
> to copy two or three fields into a single field and specify that as the snippet
> field?

Currently, you can specify only one snippet field, so you'd need to use copyField.

Cheers,
S.
solr replication 1.3
I want to add a master/slave configuration for Solr. My current setup is: Solr 1.3
on Windows, using EmbeddedSolrServer. In this case, is it possible to set up a
master/slave configuration?

My second question: if I use Solr 1.4, which has Java-based replication, is it
possible to do replication with EmbeddedSolrServer on Windows?

Thanks,
Ashish
--
View this message in context: http://www.nabble.com/solr-replication-1.3-tp23667360p23667360.html
Sent from the Solr - User mailing list archive at Nabble.com.
Solr in cluster
Hi,

One of the problems I have with Lucene is the lock obtained by the IndexWriter. I
want to use one Solr instance running inside a cluster, behind the load balancer.
Are multiple webservers able to write and commit to Lucene through Solr without
locking issues? Is Solr the solution to this concurrency problem, or do I have to
put a JMS queue or something similar in front of the updates/commits? I can use
synchronization techniques to fix concurrency problems on one webserver, but not
across more than one webserver; I think you know what I mean.

Gr, Reza

--
Reza Safari
LUKKIEN
Copernicuslaan 15
6716 BM Ede
The Netherlands
-
http://www.lukkien.com
t: +31 (0) 318 698000

This message is for the designated recipient only and may contain privileged,
proprietary, or otherwise private information. If you have received it in error,
please notify the sender immediately and delete the original. Any other use of
the email by you is prohibited.
Re: clustering SOLR-769
On May 22, 2009, at 4:40 AM, Stanislaw Osinski wrote:

> Hi there,
>
>> Is it possible to specify more than one snippet field, or should I use copyField
>> to copy two or three fields into a single field and specify that as the snippet
>> field?
>
> Currently, you can specify only one snippet field, so you'd need to use copyField.

Do note, though, that nothing is set in stone on this stuff. What you have right
now is a first attempt. We are definitely open to suggestions on improvements.

-Grant
Re: solr replication 1.3
On Fri, May 22, 2009 at 3:12 PM, Ashish P wrote:
>
> I want to add a master/slave configuration for Solr. I am using Solr 1.3 on
> Windows, with EmbeddedSolrServer. In this case, is it possible to set up a
> master/slave configuration?
>
> My second question: if I use Solr 1.4, which has Java-based replication, is it
> still possible to do replication using EmbeddedSolrServer on Windows?

No. The replication in 1.4 relies on HTTP transport; for an EmbeddedSolrServer
there is no HTTP endpoint.

>
> Thanks,
> Ashish
> --
> View this message in context:
> http://www.nabble.com/solr-replication-1.3-tp23667360p23667360.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: How to index large set data
Can you parallelize this? I don't know that the DIH can handle it, but having multiple threads sending docs to Solr is the best performance wise, so maybe you need to look at alternatives to pulling with DIH and instead use a client to push into Solr. On May 22, 2009, at 3:42 AM, Jianbin Dai wrote: about 2.8 m total docs were created. only the first run finishes. In my 2nd try, it hangs there forever at the end of indexing, (I guess right before commit), with cpu usage of 100%. Total 5G (2050) index files are created. Now I have two problems: 1. why it hangs there and failed? 2. how can i speed up the indexing? Here is my solrconfig.xml false 3000 1000 2147483647 1 false --- On Thu, 5/21/09, Noble Paul നോബിള് नो ब्ळ् wrote: From: Noble Paul നോബിള് नोब्ळ् Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 10:39 PM what is the total no:of docs created ? I guess it may not be memory bound. indexing is mostly amn IO bound operation. You may be able to get a better perf if a SSD is used (solid state disk) On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai wrote: Hi Paul, Thank you so much for answering my questions. It really helped. After some adjustment, basically setting mergeFactor to 1000 from the default value of 10, I can finished the whole job in 2.5 hours. I checked that during running time, only around 18% of memory is being used, and VIRT is always 1418m. I am thinking it may be restricted by JVM memory setting. But I run the data import command through web, i.e., http://:/solr/dataimport?command=full-import, how can I set the memory allocation for JVM? Thanks again! JB --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् wrote: From: Noble Paul നോബിള് नोब्ळ् Subject: Re: How to index large set data To: solr-user@lucene.apache.org Date: Thursday, May 21, 2009, 9:57 PM check the status page of DIH and see if it is working properly. and if, yes what is the rate of indexing On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai wrote: Hi, I have about 45GB xml files to be indexed. I am using DataImportHandler. I started the full import 4 hours ago, and it's still running My computer has 4GB memory. Any suggestion on the solutions? Thanks! JB -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
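For what it's worth, the "push from a client" route sketched below uses plain
SolrJ; the URL, field names and batch size are placeholders, and the loop stands
in for whatever XML parsing you do yourself:

    import java.util.ArrayList;
    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class PushIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 100000; i++) {       // stand-in for your own XML parsing loop
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", String.valueOf(i));
                doc.addField("text", "parsed content goes here");
                batch.add(doc);
                if (batch.size() >= 1000) {          // send in batches, not one doc at a time
                    server.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) server.add(batch);
            server.commit();                         // commit once at the end
        }
    }

Running several of these loops in parallel against the same Solr instance is
usually where the throughput gain comes from.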
Re: How to index large set data
there is already an issue for writing to Solr in multiple threads SOLR-1089 On Fri, May 22, 2009 at 6:08 PM, Grant Ingersoll wrote: > Can you parallelize this? I don't know that the DIH can handle it, but > having multiple threads sending docs to Solr is the best performance wise, > so maybe you need to look at alternatives to pulling with DIH and instead > use a client to push into Solr. > > > On May 22, 2009, at 3:42 AM, Jianbin Dai wrote: > >> >> about 2.8 m total docs were created. only the first run finishes. In my >> 2nd try, it hangs there forever at the end of indexing, (I guess right >> before commit), with cpu usage of 100%. Total 5G (2050) index files are >> created. Now I have two problems: >> 1. why it hangs there and failed? >> 2. how can i speed up the indexing? >> >> >> Here is my solrconfig.xml >> >> false >> 3000 >> 1000 >> 2147483647 >> 1 >> false >> >> >> >> >> --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् >> wrote: >> >>> From: Noble Paul നോബിള് नोब्ळ् >>> Subject: Re: How to index large set data >>> To: solr-user@lucene.apache.org >>> Date: Thursday, May 21, 2009, 10:39 PM >>> what is the total no:of docs created >>> ? I guess it may not be memory >>> bound. indexing is mostly amn IO bound operation. You may >>> be able to >>> get a better perf if a SSD is used (solid state disk) >>> >>> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai >>> wrote: Hi Paul, Thank you so much for answering my questions. It >>> >>> really helped. After some adjustment, basically setting mergeFactor >>> >>> to 1000 from the default value of 10, I can finished the >>> whole job in 2.5 hours. I checked that during running time, >>> only around 18% of memory is being used, and VIRT is always >>> 1418m. I am thinking it may be restricted by JVM memory >>> setting. But I run the data import command through web, >>> i.e., >>> http://:/solr/dataimport?command=full-import, >>> how can I set the memory allocation for JVM? Thanks again! JB --- On Thu, 5/21/09, Noble Paul നോബിള് >>> >>> नोब्ळ् >>> wrote: > From: Noble Paul നോബിള് >>> >>> नोब्ळ् > > Subject: Re: How to index large set data > To: solr-user@lucene.apache.org > Date: Thursday, May 21, 2009, 9:57 PM > check the status page of DIH and see > if it is working properly. and > if, yes what is the rate of indexing > > On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai >>> >>> > > wrote: >> >> Hi, >> >> I have about 45GB xml files to be indexed. I >>> >>> am using > > DataImportHandler. I started the full import 4 >>> >>> hours ago, > > and it's still running >> >> My computer has 4GB memory. Any suggestion on >>> >>> the > > solutions? >> >> Thanks! >> >> JB >> >> >> >> >> > > > > -- > >>> - > > Noble Paul | Principal Engineer| AOL | http://aol.com > >>> >>> >>> >>> -- >>> - >>> Noble Paul | Principal Engineer| AOL | http://aol.com >>> >> >> >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Solr in cluster
Reza, You can't have multiple Solr instances write to the same index at the same time. But you can add documents to a single Solr instance in parallel (e.g. from multiple threads of one or more applications) and Solr will do the right thing without you having to put JMS or some other type of queue in front of Solr. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Reza Safari > To: solr-user@lucene.apache.org > Sent: Friday, May 22, 2009 6:17:56 AM > Subject: Solr in cluster > > Hi, > > One of the problems I have with Lucene is Lock obtained by the IndexWriter. I > want to use one Solr running inside a cluster behind the load balancer. Are > multiple webservers able to write and commit to Lucene using Solr with out > locking issues etc? Is Solr the solution for concurrency problem or do I have > to > use some JMS queue or something to update/commit? I can use synchronization > technics to fix concurrency problems on one webserver but on more than one > webserver, I think that you what I mean. > > Gr, Reza > > -- > Reza Safari > LUKKIEN > Copernicuslaan 15 > 6716 BM Ede > > The Netherlands > - > http://www.lukkien.com > t: +31 (0) 318 698000 > > This message is for the designated recipient only and may contain privileged, > proprietary, or otherwise private information. If you have received it in > error, > please notify the sender immediately and delete the original. Any other use > of > the email by you is prohibited.
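A minimal sketch of what that looks like in practice (URL and field values are
placeholders): several threads, or several webservers, sharing nothing but the
Solr URL, all calling add() concurrently, with the commit issued once from one
place:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ConcurrentAdds {
        public static void main(String[] args) throws Exception {
            // a single shared client; CommonsHttpSolrServer can be used from many threads
            final SolrServer solr = new CommonsHttpSolrServer("http://solr-host:8983/solr");
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int t = 0; t < 4; t++) {
                final int threadId = t;
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", "doc-from-thread-" + threadId);
                            solr.add(doc);          // concurrent adds are fine
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            solr.commit();                          // commit once, from one place
        }
    }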
Re: How to index large set data
Hi, Those settings are a little "crazy". Are you sure you want to give Solr/Lucene 3G to buffer documents before flushing them to disk? Are you sure you want to use the mergeFactor of 1000? Checking the logs to see if there are any errors. Look at the index directory to see if Solr is actually still writing to it? (file sizes are changing, number of files is changing). kill -QUIT the JVM pid to see where things are "stuck" if they are stuck... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jianbin Dai > To: solr-user@lucene.apache.org; noble.p...@gmail.com > Sent: Friday, May 22, 2009 3:42:04 AM > Subject: Re: How to index large set data > > > about 2.8 m total docs were created. only the first run finishes. In my 2nd > try, > it hangs there forever at the end of indexing, (I guess right before commit), > with cpu usage of 100%. Total 5G (2050) index files are created. Now I have > two > problems: > 1. why it hangs there and failed? > 2. how can i speed up the indexing? > > > Here is my solrconfig.xml > > false > 3000 > 1000 > 2147483647 > 1 > false > > > > > --- On Thu, 5/21/09, Noble Paul നോബിള് नोब्ळ् wrote: > > > From: Noble Paul നോബിള് नोब्ळ् > > Subject: Re: How to index large set data > > To: solr-user@lucene.apache.org > > Date: Thursday, May 21, 2009, 10:39 PM > > what is the total no:of docs created > > ? I guess it may not be memory > > bound. indexing is mostly amn IO bound operation. You may > > be able to > > get a better perf if a SSD is used (solid state disk) > > > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai > > wrote: > > > > > > Hi Paul, > > > > > > Thank you so much for answering my questions. It > > really helped. > > > After some adjustment, basically setting mergeFactor > > to 1000 from the default value of 10, I can finished the > > whole job in 2.5 hours. I checked that during running time, > > only around 18% of memory is being used, and VIRT is always > > 1418m. I am thinking it may be restricted by JVM memory > > setting. But I run the data import command through web, > > i.e., > > > > > http://:/solr/dataimport?command=full-import, > > how can I set the memory allocation for JVM? > > > Thanks again! > > > > > > JB > > > > > > --- On Thu, 5/21/09, Noble Paul നോബിള് > > नोब्ळ् > > wrote: > > > > > >> From: Noble Paul നോബിള് > > नोब्ळ् > > >> Subject: Re: How to index large set data > > >> To: solr-user@lucene.apache.org > > >> Date: Thursday, May 21, 2009, 9:57 PM > > >> check the status page of DIH and see > > >> if it is working properly. and > > >> if, yes what is the rate of indexing > > >> > > >> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai > > > > >> wrote: > > >> > > > >> > Hi, > > >> > > > >> > I have about 45GB xml files to be indexed. I > > am using > > >> DataImportHandler. I started the full import 4 > > hours ago, > > >> and it's still running > > >> > My computer has 4GB memory. Any suggestion on > > the > > >> solutions? > > >> > Thanks! > > >> > > > >> > JB > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > >> > > >> > > >> -- > > >> > > - > > >> Noble Paul | Principal Engineer| AOL | http://aol.com > > >> > > > > > > > > > > > > > > > > > > > > > > > -- > > - > > Noble Paul | Principal Engineer| AOL | http://aol.com > >
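For the "is it stuck or just slow" check, something along these lines usually
suffices (pid and paths are placeholders; the thread dump goes to the JVM's
stdout, e.g. catalina.out):

    jps -l                               # find the Solr JVM's pid
    kill -QUIT <pid>                     # ask it for a thread dump
    ls -l /path/to/solr/data/index       # run a few times: are file counts/sizes still changing?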
Re: Solr in cluster
Master work. This is exactly what I'm looking for. Now I'm happy :) Gr, Reza On May 22, 2009, at 4:23 PM, Otis Gospodnetic wrote: Reza, You can't have multiple Solr instances write to the same index at the same time. But you can add documents to a single Solr instance in parallel (e.g. from multiple threads of one or more applications) and Solr will do the right thing without you having to put JMS or some other type of queue in front of Solr. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Reza Safari To: solr-user@lucene.apache.org Sent: Friday, May 22, 2009 6:17:56 AM Subject: Solr in cluster Hi, One of the problems I have with Lucene is Lock obtained by the IndexWriter. I want to use one Solr running inside a cluster behind the load balancer. Are multiple webservers able to write and commit to Lucene using Solr with out locking issues etc? Is Solr the solution for concurrency problem or do I have to use some JMS queue or something to update/commit? I can use synchronization technics to fix concurrency problems on one webserver but on more than one webserver, I think that you what I mean. Gr, Reza -- Reza Safari LUKKIEN Copernicuslaan 15 6716 BM Ede The Netherlands - http://www.lukkien.com t: +31 (0) 318 698000 This message is for the designated recipient only and may contain privileged, proprietary, or otherwise private information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the email by you is prohibited. -- Reza Safari LUKKIEN Copernicuslaan 15 6716 BM Ede The Netherlands - http://www.lukkien.com t: +31 (0) 318 698000 This message is for the designated recipient only and may contain privileged, proprietary, or otherwise private information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the email by you is prohibited.
Re: Plugin Not Found
I have included the configuration and the log for the error on startup. I does appear it tries to load the lib but then simply can't referene it. explicit 0.01 productId^10.0 personality^15.0 subCategory^20.0 category^10.0 productType^8.0 brandName^10.0 realBrandName^9.5 productNameSearch^20 size^1.2 width^1.0 heelHeight^1.0 productDescription^5.0 color^6.0 price^1.0 expandedGender^0.5 brandName^5.0 productNameSearch^5.0 productDescription^5.0 personality^10.0 subCategory^20.0 category^10.0 productType^8.0 productId, productName, price, originalPrice, brandNameFacet, productRating, imageUrl, productUrl, isNew, onSale rord(popularity)^1 100% 1 5 *:* brandNameFacet,productTypeFacet,productName,categoryFacet,subC ategoryFacet,personalityFacet,colorFacet,heelHeight,expandedGender 1 1 spellcheck facetcube LOGS May 22, 2009 7:38:24 AM org.apache.catalina.startup.SetAllPropertiesRule begin WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'maxProcessors' to '500' did not find a matching property. May 22, 2009 7:38:24 AM org.apache.catalina.startup.SetAllPropertiesRule begin WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'maxProcessors' to '500' did not find a matching property. May 22, 2009 7:38:24 AM org.apache.catalina.core.AprLifecycleListener init INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/local/apr/lib May 22, 2009 7:38:24 AM org.apache.tomcat.util.net.NioSelectorPool getSharedSelector INFO: Using a shared selector for servlet write/read May 22, 2009 7:38:24 AM org.apache.coyote.http11.Http11NioProtocol init INFO: Initializing Coyote HTTP/1.1 on http-8080 May 22, 2009 7:38:24 AM org.apache.tomcat.util.net.NioSelectorPool getSharedSelector INFO: Using a shared selector for servlet write/read May 22, 2009 7:38:24 AM org.apache.coyote.http11.Http11NioProtocol init INFO: Initializing Coyote HTTP/1.1 on http-8443 May 22, 2009 7:38:24 AM org.apache.catalina.startup.Catalina load INFO: Initialization processed in 1011 ms May 22, 2009 7:38:24 AM org.apache.catalina.core.StandardService start INFO: Starting service Catalina May 22, 2009 7:38:24 AM org.apache.catalina.core.StandardEngine start INFO: Starting Servlet Engine: Apache Tomcat/6.0.16 May 22, 2009 7:38:24 AM org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web application archive solr.war May 22, 2009 7:38:25 AM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No /solr/home in JNDI May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: using system property solr.solr.home: /home/zetasolr May 22, 2009 7:38:25 AM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: /home/zetasolr/solr.xml May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader INFO: Solr home set to '/home/zetasolr/' May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr classloader May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader INFO: Solr home set to '/home/zetasolr/cores/zeta-main/' May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Reusing parent classloader May 22, 2009 7:38:25 AM org.apache.solr.core.SolrConfig INFO: Loaded SolrConfig: solrconfig.xml May 22, 
2009 7:38:25 AM org.apache.solr.schema.IndexSchema readSchema INFO: Reading Solr Schema May 22, 2009 7:38:25 AM org.apache.solr.schema.IndexSchema readSchema INFO: Schema name=Zappos Zeta (zeta-main) May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created string: org.apache.solr.schema.StrField May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created boolean: org.apache.solr.schema.BoolField May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created integer: org.apache.solr.schema.IntField May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created long: org.apache.solr.schema.LongField May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created float: org.apache.solr.schema.FloatField M
Re: How to index large set data
I dont know exactly what is this 3G Ram buffer used. But what I noticed was both index size and file number were keeping increasing, but stuck in the commit. --- On Fri, 5/22/09, Otis Gospodnetic wrote: > From: Otis Gospodnetic > Subject: Re: How to index large set data > To: solr-user@lucene.apache.org > Date: Friday, May 22, 2009, 7:26 AM > > Hi, > > Those settings are a little "crazy". Are you sure you > want to give Solr/Lucene 3G to buffer documents before > flushing them to disk? Are you sure you want to use > the mergeFactor of 1000? Checking the logs to see if > there are any errors. Look at the index directory to > see if Solr is actually still writing to it? (file sizes are > changing, number of files is changing). kill -QUIT the > JVM pid to see where things are "stuck" if they are > stuck... > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Jianbin Dai > > To: solr-user@lucene.apache.org; > noble.p...@gmail.com > > Sent: Friday, May 22, 2009 3:42:04 AM > > Subject: Re: How to index large set data > > > > > > about 2.8 m total docs were created. only the first > run finishes. In my 2nd try, > > it hangs there forever at the end of indexing, (I > guess right before commit), > > with cpu usage of 100%. Total 5G (2050) index files > are created. Now I have two > > problems: > > 1. why it hangs there and failed? > > 2. how can i speed up the indexing? > > > > > > Here is my solrconfig.xml > > > > false > > 3000 > > 1000 > > 2147483647 > > 1 > > false > > > > > > > > > > --- On Thu, 5/21/09, Noble Paul > നോബിള് नोब्ळ् wrote: > > > > > From: Noble Paul നോബിള് > नोब्ळ् > > > Subject: Re: How to index large set data > > > To: solr-user@lucene.apache.org > > > Date: Thursday, May 21, 2009, 10:39 PM > > > what is the total no:of docs created > > > ? I guess it may not be memory > > > bound. indexing is mostly amn IO bound operation. > You may > > > be able to > > > get a better perf if a SSD is used (solid state > disk) > > > > > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai > > > wrote: > > > > > > > > Hi Paul, > > > > > > > > Thank you so much for answering my > questions. It > > > really helped. > > > > After some adjustment, basically setting > mergeFactor > > > to 1000 from the default value of 10, I can > finished the > > > whole job in 2.5 hours. I checked that during > running time, > > > only around 18% of memory is being used, and VIRT > is always > > > 1418m. I am thinking it may be restricted by JVM > memory > > > setting. But I run the data import command > through web, > > > i.e., > > > > > > > http://:/solr/dataimport?command=full-import, > > > how can I set the memory allocation for JVM? > > > > Thanks again! > > > > > > > > JB > > > > > > > > --- On Thu, 5/21/09, Noble Paul > നോബിള് > > > नोब्ळ् > > > wrote: > > > > > > > >> From: Noble Paul നോബിള് > > > नोब्ळ् > > > >> Subject: Re: How to index large set > data > > > >> To: solr-user@lucene.apache.org > > > >> Date: Thursday, May 21, 2009, 9:57 PM > > > >> check the status page of DIH and see > > > >> if it is working properly. and > > > >> if, yes what is the rate of indexing > > > >> > > > >> On Thu, May 21, 2009 at 11:48 AM, > Jianbin Dai > > > > > > >> wrote: > > > >> > > > > >> > Hi, > > > >> > > > > >> > I have about 45GB xml files to be > indexed. I > > > am using > > > >> DataImportHandler. I started the full > import 4 > > > hours ago, > > > >> and it's still running. > > > >> > My computer has 4GB memory. 
Any > suggestion on > > > the > > > >> solutions? > > > >> > Thanks! > > > >> > > > > >> > JB > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > > >> -- > > > >> > > > > - > > > >> Noble Paul | Principal Engineer| AOL | > http://aol.com > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > - > > > Noble Paul | Principal Engineer| AOL | http://aol.com > > > > >
Re: How to index large set data
If I do the xml parsing by myself and use embedded client to do the push, would it be more efficient than DIH? --- On Fri, 5/22/09, Grant Ingersoll wrote: > From: Grant Ingersoll > Subject: Re: How to index large set data > To: solr-user@lucene.apache.org > Date: Friday, May 22, 2009, 5:38 AM > Can you parallelize this? I > don't know that the DIH can handle it, > but having multiple threads sending docs to Solr is the > best > performance wise, so maybe you need to look at alternatives > to pulling > with DIH and instead use a client to push into Solr. > > > On May 22, 2009, at 3:42 AM, Jianbin Dai wrote: > > > > > about 2.8 m total docs were created. only the first > run finishes. In > > my 2nd try, it hangs there forever at the end of > indexing, (I guess > > right before commit), with cpu usage of 100%. Total 5G > (2050) index > > files are created. Now I have two problems: > > 1. why it hangs there and failed? > > 2. how can i speed up the indexing? > > > > > > Here is my solrconfig.xml > > > > > false > > > 3000 > > > 1000 > > > 2147483647 > > > 1 > > > false > > > > > > > > > > --- On Thu, 5/21/09, Noble Paul > നോബിള് नो > > ब्ळ् > wrote: > > > >> From: Noble Paul നോബിള് > नोब्ळ् > >> > >> Subject: Re: How to index large set data > >> To: solr-user@lucene.apache.org > >> Date: Thursday, May 21, 2009, 10:39 PM > >> what is the total no:of docs created > >> ? I guess it may not be memory > >> bound. indexing is mostly amn IO bound operation. > You may > >> be able to > >> get a better perf if a SSD is used (solid state > disk) > >> > >> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai > > >> wrote: > >>> > >>> Hi Paul, > >>> > >>> Thank you so much for answering my questions. > It > >> really helped. > >>> After some adjustment, basically setting > mergeFactor > >> to 1000 from the default value of 10, I can > finished the > >> whole job in 2.5 hours. I checked that during > running time, > >> only around 18% of memory is being used, and VIRT > is always > >> 1418m. I am thinking it may be restricted by JVM > memory > >> setting. But I run the data import command through > web, > >> i.e., > >>> > >> > http://:/solr/dataimport?command=full-import, > >> how can I set the memory allocation for JVM? > >>> Thanks again! > >>> > >>> JB > >>> > >>> --- On Thu, 5/21/09, Noble Paul > നോബിള് > >> नोब्ळ् > >> wrote: > >>> > From: Noble Paul നോബിള് > >> नोब्ळ् > Subject: Re: How to index large set data > To: solr-user@lucene.apache.org > Date: Thursday, May 21, 2009, 9:57 PM > check the status page of DIH and see > if it is working properly. and > if, yes what is the rate of indexing > > On Thu, May 21, 2009 at 11:48 AM, Jianbin > Dai > >> > wrote: > > > > Hi, > > > > I have about 45GB xml files to be > indexed. I > >> am using > DataImportHandler. I started the full > import 4 > >> hours ago, > and it's still running > > My computer has 4GB memory. Any > suggestion on > >> the > solutions? > > Thanks! > > > > JB > > > > > > > > > > > > > > -- > > >> > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > >>> > >>> > >>> > >>> > >>> > >> > >> > >> > >> -- > >> > - > >> Noble Paul | Principal Engineer| AOL | http://aol.com > >> > > > > > > > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem > (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination..com/search > >
Filtering query terms
Hi, I am experiencing problems using filters. I'm using the following version of Solr: solr/nightly of 2009-04-12 The part of the schema.xml I'm using for setting filters is the following: and the field I'm querying is a field called "all" declared as follows: When I try testing the filter "solr.LowerCaseFilterFactory" I get different results calling the following urls: 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I get different results calling the following urls: 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on Is it the expected behavior or it is a (known) bug? I would like to apply some filter converting all searched words in the corresponding lowercase version without accents. Thanks for your help, Marco -- The information transmitted is intended for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.
RE: Filtering query terms
> When I try testing the filter "solr.LowerCaseFilterFactory" I get
> different results calling the following urls:
>
> 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

In this case, the WordDelimiterFilterFactory is kicking in on your second search,
so "PaPa" is split on the case change into "Pa" and "Pa". You can double-check
this by using the analysis tool in the admin UI -
http://localhost:8983/solr/admin/analysis.jsp

> Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I
> get different results calling the following urls:
>
> 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Not sure what is happening here, but again I would check it with the analysis tool.
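For the stated goal (matching lowercase, accent-free forms), a field type along
these lines, used for both index and query analysis, is the usual approach; treat
it as a sketch to adapt rather than a drop-in for the existing schema:

    <fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With the same analyzer on both sides, "papa", "PaPa" and "papà" all reduce to the
same term.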
Re: Multicore Solr not showing Cache Stats
Old email. Hoss, thanks for doing this. I had a closer look at my solrconfig.xml and found that I didn't put elements around the settings for caches. Solr didn't complain, so I didn't notice earlier... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Chris Hostetter > To: solr-user@lucene.apache.org > Sent: Tuesday, April 7, 2009 5:41:48 PM > Subject: Re: Multicore Solr not showing Cache Stats > > > : - Going to http://localhost:8983/core1/admin/stats.jsp#cache shows a > : nearly empty Cache section. The only cache that shows up there is > : fieldValueCache (which is really commented out in solrconfig.xml, but > : Solr creates it anyway, which is normal). All other caches are missing. > : > : Any ideas why cache stats might not be getting displayed or where I > : could look to figure out what's going on? > > Otis: I can't reporduce on the trunk... > > chr...@chrishmposxl:~/lucene/solr/example$ mkdir otis > chr...@chrishmposxl:~/lucene/solr/example$ cp multicore/solr.xml otis/ > chr...@chrishmposxl:~/lucene/solr/example$ cp -r solr otis/core0 > chr...@chrishmposxl:~/lucene/solr/example$ cp -r solr otis/core1 > chr...@chrishmposxl:~/lucene/solr/example$ java -Dsolr.solr.home=otis -jar > start.jar > > http://localhost:8983/solr/core1/admin/stats.jsp#cache > http://localhost:8983/solr/core0/admin/stats.jsp#cache > > ...both show full cache stats for all of the expected caches. > > > are you sure there isn't a bug in your configs? if you set > -Dsolr.solr.home=/data/solr_home/cores/core1 can you see the stats for > that core? > > > -Hoss
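For anyone hitting the same thing: the cache declarations belong inside the
<query> element of solrconfig.xml, e.g. (sizes here are just the stock example
values):

    <query>
      <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
      <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
      <documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    </query>

Declared outside <query>, Solr silently ignores them, which matches the symptom
described above.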
R: Filtering query terms
Thank you very much for the instantaneous support. I couldn't find the conflict
for hours :(

When I have an answer for the ISOLatin1AccentFilterFactory I will write it to the
mailing list.

Thanks again,

Marco

From: Ensdorf Ken [ensd...@zoominfo.com]
Sent: Friday, May 22, 2009, 6:16 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Filtering query terms

> When I try testing the filter "solr.LowerCaseFilterFactory" I get
> different results calling the following urls:
>
> 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

In this case, the WordDelimiterFilterFactory is kicking in on your second search,
so "PaPa" is split on the case change into "Pa" and "Pa". You can double-check
this by using the analysis tool in the admin UI -
http://localhost:8983/solr/admin/analysis.jsp

> Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I
> get different results calling the following urls:
>
> 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Not sure what is happening here, but again I would check it with the analysis tool.

--
The information transmitted is intended for the person or entity to which it is
addressed and may contain confidential and/or privileged material. Any review,
retransmission, dissemination or other use of, or taking of any action in
reliance upon, this information by persons or entities other than the intended
recipient is prohibited. If you received this in error, please contact the sender
and delete the material from any computer.
Re: How to index large set data
If the file numbers and index size was increasing, that means Solr was still working. It's possible it's taking extra long because of such high settings. Bring them both down and try. For example, don't go over 20 with mergeFactor, and try just 1GB for ramBufferSizeMB. Bona fortuna! Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jianbin Dai > To: solr-user@lucene.apache.org > Sent: Friday, May 22, 2009 11:05:27 AM > Subject: Re: How to index large set data > > > I dont know exactly what is this 3G Ram buffer used. But what I noticed was > both > index size and file number were keeping increasing, but stuck in the commit. > > --- On Fri, 5/22/09, Otis Gospodnetic wrote: > > > From: Otis Gospodnetic > > Subject: Re: How to index large set data > > To: solr-user@lucene.apache.org > > Date: Friday, May 22, 2009, 7:26 AM > > > > Hi, > > > > Those settings are a little "crazy". Are you sure you > > want to give Solr/Lucene 3G to buffer documents before > > flushing them to disk? Are you sure you want to use > > the mergeFactor of 1000? Checking the logs to see if > > there are any errors. Look at the index directory to > > see if Solr is actually still writing to it? (file sizes are > > changing, number of files is changing). kill -QUIT the > > JVM pid to see where things are "stuck" if they are > > stuck... > > > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > - Original Message > > > From: Jianbin Dai > > > To: solr-user@lucene.apache.org; > > noble.p...@gmail.com > > > Sent: Friday, May 22, 2009 3:42:04 AM > > > Subject: Re: How to index large set data > > > > > > > > > about 2.8 m total docs were created. only the first > > run finishes. In my 2nd try, > > > it hangs there forever at the end of indexing, (I > > guess right before commit), > > > with cpu usage of 100%. Total 5G (2050) index files > > are created. Now I have two > > > problems: > > > 1. why it hangs there and failed? > > > 2. how can i speed up the indexing? > > > > > > > > > Here is my solrconfig.xml > > > > > > false > > > 3000 > > > 1000 > > > 2147483647 > > > 1 > > > false > > > > > > > > > > > > > > > --- On Thu, 5/21/09, Noble Paul > > നോബിള് नोब्ळ् wrote: > > > > > > > From: Noble Paul നോബിള് > > नोब्ळ् > > > > Subject: Re: How to index large set data > > > > To: solr-user@lucene.apache.org > > > > Date: Thursday, May 21, 2009, 10:39 PM > > > > what is the total no:of docs created > > > > ? I guess it may not be memory > > > > bound. indexing is mostly amn IO bound operation. > > You may > > > > be able to > > > > get a better perf if a SSD is used (solid state > > disk) > > > > > > > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai > > > > wrote: > > > > > > > > > > Hi Paul, > > > > > > > > > > Thank you so much for answering my > > questions. It > > > > really helped. > > > > > After some adjustment, basically setting > > mergeFactor > > > > to 1000 from the default value of 10, I can > > finished the > > > > whole job in 2.5 hours. I checked that during > > running time, > > > > only around 18% of memory is being used, and VIRT > > is always > > > > 1418m. I am thinking it may be restricted by JVM > > memory > > > > setting. But I run the data import command > > through web, > > > > i.e., > > > > > > > > > http://:/solr/dataimport?command=full-import, > > > > how can I set the memory allocation for JVM? > > > > > Thanks again! 
> > > > > > > > > > JB > > > > > > > > > > --- On Thu, 5/21/09, Noble Paul > > നോബിള് > > > > नोब्ळ् > > > > wrote: > > > > > > > > > >> From: Noble Paul നോബിള് > > > > नोब्ळ् > > > > >> Subject: Re: How to index large set > > data > > > > >> To: solr-user@lucene.apache.org > > > > >> Date: Thursday, May 21, 2009, 9:57 PM > > > > >> check the status page of DIH and see > > > > >> if it is working properly. and > > > > >> if, yes what is the rate of indexing > > > > >> > > > > >> On Thu, May 21, 2009 at 11:48 AM, > > Jianbin Dai > > > > > > > > >> wrote: > > > > >> > > > > > >> > Hi, > > > > >> > > > > > >> > I have about 45GB xml files to be > > indexed. I > > > > am using > > > > >> DataImportHandler. I started the full > > import 4 > > > > hours ago, > > > > >> and it's still running. > > > > >> > My computer has 4GB memory. Any > > suggestion on > > > > the > > > > >> solutions? > > > > >> > Thanks! > > > > >> > > > > > >> > JB > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > > >> -- > > > > >> > > > > > > - > > > > >> Noble Paul | Principal Engineer| AOL | > > http://aol.com > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > - > > > > Noble Paul | Principal Engineer| AOL | http://aol.com >
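Concretely, the suggestion above (mergeFactor no higher than 20, ramBufferSizeMB
around 1GB) would look something like this in the indexing section of
solrconfig.xml; the numbers are illustrative, not tuned for this data set:

    <indexDefaults>
      <useCompoundFile>false</useCompoundFile>
      <ramBufferSizeMB>1024</ramBufferSizeMB>
      <mergeFactor>20</mergeFactor>
      <maxFieldLength>2147483647</maxFieldLength>
    </indexDefaults>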
Re: R: Filtering query terms
Marco, Open-source can be good like that. :) See http://www.jroller.com/otis/entry/lucene_solr_nutch_amazing_tech for a similar example Ciao, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Branca Marco > To: "solr-user@lucene.apache.org" > Sent: Friday, May 22, 2009 12:27:45 PM > Subject: R: Filtering query terms > > Thank you very much for the instantaneous support. > I couldn't find the conflict for hours :( > > When I have a response for the ISOLatin1AccentFilterFactory I will write it > on > the mailing-list. > > Thanks again, > > Marco > > Da: Ensdorf Ken [ensd...@zoominfo.com] > Inviato: venerdì 22 maggio 2009 18.16 > A: 'solr-user@lucene.apache.org' > Oggetto: RE: Filtering query terms > > > When I try testing the filter "solr.LowerCaseFilterFactory" I get > > different results calling the following urls: > > > > 1. http://[server-ip]:[server-port]/solr/[core- > > name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on > > 2. http://[server-ip]:[server-port]/solr/[core- > > name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on > > In this case, the WordDelimiterFilterFactory is kicking in on your second > search, so "APaPa" is split into "APa" and "Pa". You can double-check this > by > using the analysis tool in the admin UI - > http://localhost:8983/solr/admin/analysis.jsp > > > > > Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I > > get different results calling the following urls: > > > > 1. http://[server-ip]:[server-port]/solr/[core- > > name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on > > 2. http://[server-ip]:[server-port]/solr/[core- > > name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on > > Not sure what it happening here, but again I would check it with the analysi > tool > > -- > The information transmitted is intended for the person or entity to which it > is > addressed and may contain confidential and/or privileged material. Any > review, > retransmission, dissemination or other use of, or taking of any action in > reliance upon, this information by persons or entities other than the > intended > recipient is prohibited. If you received this in error, please contact the > sender and delete the material from any computer.
DIH uses == instead of = in SQL
I am getting this error:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an
error in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near '=='1433'' at line 1
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

during a select for a specific institution:

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute
query: select institution_id, name, acronym as i_acronym from institutions where
institution_id=='1433' Processing Document # 1
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)

I just switched to using the paired deltaImportQuery and deltaQuery approach. I am
using the latest from trunk. Any ideas?

Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal
Document Boosts don't seem to be having an effect
Greetings - first post here - hoping someone can direct me - grasping at straws. thank you in advance. Jodi I'm trying to tune the sort order using a combination of document and query time boosts. When searching for the term 'builder' with almost identical quantities of this term, and a much larger document boost for doc #, it seems to be the score should be much higher for doc #1. Doc 1 boost - 21.542363409468 Doc 1 scoring - 6.7017727 Doc 1 boost - 12.6390725007673 Doc 2 scoring - 8.00193 All fields being searched on are _t fields - all are: where text is defined as: positionIncrementGap="100"> words="stopwords.txt"/> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> protected="protwords.txt"/> synonyms="synonyms.txt" ignoreCase="true" expand="true"/> words="stopwords.txt"/> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/> protected="protwords.txt"/> omitNorms isn't indicated - I've tried adding it to the Text definition - but no change. To illustrate I have the following documents (I may be overly verbose): #1 Companyfield>211623Company: 211623-79.3761name='lat'>43.6496J. Roberts & Associates Interiorsboost='1.0'>J. Roberts & Associates Interiorsname='profile_search_s' boost='1.0'>J. ROBERTS & ASSOCIATES INTERIORS <br />-30 years construction experience <br />-Quality service, on time, on budget <br />-All sub-trades are licensed and certified - We are fully licensed, insured and covered by WSIB. <br />-References available from our satisfied clients.<br/> <br/>BUILDER <br />-Custom Homes , Additions and Major Renovations <br />-Project Management and Planning - Design / Build , Engineering , Permits <br />-Renovation Advisors for DIY homeowners <br />KITCHENS & INTERIORS <br />-Design and planning <br />-Custom kitchens and interior renovations <br />-Complete painting services <br />STRUCTURAL SERVICES <br />-Engineering , permits required, Foundations and underpinning <br />-Wall removal and beam installation<br/ > <br/>Maintenance and Repairs Services <br />-Masonry repairs and Stone work (in house staff ) <br />-Windows and doors <br />-Eave troughs and metal work<br/> <br/>J. ROBERTS & ASSOCIATES INTERIORS -30 years construction experience -Quality service, on time, on budget -All sub-trades are licensed and certified - We are fully licensed, insured and covered by WSIB. -References available from our satisfied clients. 
BUILDER -Custom Homes , Additions and Major Renovations -Project Management and Planning - Design / Build , Engineering , Permits -Renovation Advisors for DIY homeowners KITCHENS & INTERIORS -Design and planning -Custom kitchens and interior renovations -Complete painting services STRUCTURAL SERVICES -Engineering , permits required, Foundations and underpinning -Wall removal and beam installation Maintenance and Repairs Services -Masonry repairs and Stone work (in house staff ) -Windows and doors -Eave troughs and metal work Builders, Home Builders, Home Contractors, residential builders, residential contractors, home construction companies, design build companies, design build contractors, residential building contractor,Foundations, ,General Contractors, Residential General Contractor, Building Contractor, Additions, Remodeling Contractor, Renovation, Builder,Home Additions, General contractor, home improvement, building addition, home expansion, house expansion,Kitchen & Bathroom - Cabinets & Design, Kitchen Cabinet And Counter, Kitchen Cabinet Hardware, Bathroom Cabinet, Bathroom Wall Cabinet, Bathroom Sink Cabinet,Kitchen Planning & Renovation, Kitchen Planning And Design, Kitchen Cabinet Planning, Kitchen Design, Kitchen Remodeling,Masonry & Bricklaying, Masonry Supply, Masonry Contractor, Concrete Masonry, Stone Masonry, Brick Laying Technique, Brick Laying Pattern, building a fireplace, constructing a firplace, stone fence, stone wall, brick wall, masonry repair, brick repairs,Paint & Wallpaper Contractors, Paint Colors, Paint Store, Paint Brush, Paint Shop, Home Wallpaper, Home Decorating, Wallpaper, Paint colour advice, paint colour consultants, wallpapering,name='reviews_info_cache_t' boost='0.0'>name='position_rf' boost='0.0'>12.1115name='first_letter_of_name_t' boost='0.0'>Jname='country_t' boost='0.0'>CANADAboost='0.0'>9.8913boost='0.0'>66boost='0.0'>23boost='0.0'>232field>approvedname='category_name_facet'>Buildersname='category_name_facet'>Foundationsname='category_name_facet'>General Contractorsname='category_name_facet'>Home Additi
Re: DIH uses == instead of = in SQL
Eric, WHERE institution_id=1433 vs. WHERE institution_id==1433 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Eric Pugh > To: solr-user@lucene.apache.org > Sent: Friday, May 22, 2009 2:43:59 PM > Subject: DIH uses == instead of = in SQL > > I am getting this error: > > Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You > have > an error in your SQL syntax; check the manual that corresponds to your MySQL > server version for the right syntax to use near '=='1433'' at line 1 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > > during a select for a specific institution: > > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to > execute > query: select institution_id, name, acronym as i_acronym from institutions > where > institution_id=='1433' Processing Document # 1 > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:248) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) > > I just switched to using the paired deltaImportQuery and deltaQuery approach. > > I am using the latest from trunk. Any ideas? > > Eric > > - > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | > http://www.opensourceconnections.com > Free/Busy: http://tinyurl.com/eric-cal
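The '==' is almost certainly coming from the deltaImportQuery attribute in
data-config.xml. A sketch of the corrected entity is below; the deltaQuery and its
timestamp column are guesses added for illustration, the only real point is the
single '=':

    <entity name="institution" pk="institution_id"
            query="select institution_id, name, acronym as i_acronym from institutions"
            deltaQuery="select institution_id from institutions
                        where last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="select institution_id, name, acronym as i_acronym
                              from institutions
                              where institution_id = '${dataimporter.delta.institution_id}'"/>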
Data Import Handler - parentDeltaImport
I have the data-config.xml detailed below (stripped down a bit for simplicity).
When I run the delta import, the design_template deltaQuery is running and
modified rows are being returned. However, the parentDeltaQuery is never executed.
Any thoughts?

Thanks,
Michael
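Since the original config did not survive the archive, here is a generic sketch of
how parentDeltaQuery is meant to be wired (all table and column names are
invented): the child's deltaQuery finds changed rows, its parentDeltaQuery maps
each of those rows back to the parent's primary key, and the parent's
deltaImportQuery then re-indexes those parents. One thing worth checking is that
the child's deltaQuery actually returns the column the parentDeltaQuery refers to.

    <entity name="product" pk="id"
            query="select id, name from products"
            deltaQuery="select id from products
                        where updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="select id, name from products
                              where id = '${dataimporter.delta.id}'">
      <entity name="design_template" pk="template_id"
              query="select template_id, body from design_templates
                     where product_id = '${product.id}'"
              deltaQuery="select template_id, product_id from design_templates
                          where updated_at &gt; '${dataimporter.last_index_time}'"
              parentDeltaQuery="select id from products
                                where id = '${design_template.product_id}'"/>
    </entity>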
How to use DIH to index attributes in xml file
I have an xml file like this 301.46 In the data-config.xml, I use but how can I index "id", "mid"? Thanks.
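With XPathEntityProcessor, attributes are addressed with '@' in the xpath. The
element and data source names below are made up, since the sample XML did not come
through, but the pattern is:

    <entity name="item" processor="XPathEntityProcessor"
            dataSource="fileSource" url="/path/to/data.xml"
            forEach="/items/item">
      <field column="id"    xpath="/items/item/@id"/>
      <field column="mid"   xpath="/items/item/@mid"/>
      <field column="price" xpath="/items/item"/>
    </entity>

where "fileSource" refers to a FileDataSource declared elsewhere in the same
data-config.xml.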
solr machine freeze up during first replication after optimization
Hi all,

We recently started running into this Solr slave server freeze-up problem. After looking into the logs and the timing of these occurrences, it seems the problem always follows the first replication after an optimization. Once the server freezes up, we are unable to ssh into it, but ping still returns fine. The only way to recover is by rebooting the machine.

In our replication setup, the masters are optimized nightly because we have a fairly large index (~60GB per master) and are adding millions of documents every day. After the optimization, a snapshot is taken automatically. When replication kicks in, the corresponding slave server retrieves the snapshot using rsync.

Here is the snappuller.log capturing one of the failed pulls, with a successful pull before and after it:

2009/05/21 22:55:01 started by biz360
2009/05/21 22:55:01 command: /mnt/solr/bin/snappuller ...
2009/05/21 22:55:04 pulling snapshot snapshot.20090521221402
2009/05/21 22:55:11 ended (elapsed time: 10 sec)

# optimization completes sometime during this gap, and a new snapshot is created

2009/05/21 23:55:01 started by biz360
2009/05/21 23:55:01 command: /mnt/solr/bin/snappuller ...
2009/05/21 23:55:02 pulling snapshot snapshot.20090521233922

# slave freezes up, and machine has to be rebooted

2009/05/22 01:55:02 started by biz360
2009/05/22 01:55:02 command: /mnt/solr/bin/snappuller ...
2009/05/22 01:55:03 pulling snapshot snapshot.20090522014528
2009/05/22 02:56:12 ended (elapsed time: 3670 sec)

A more detailed debug log shows snappuller simply stopped at some point:

started by biz360
command: /mnt/solr/bin/snappuller ...
pulling snapshot snapshot.20090521233922
receiving file list ... done
deleting segments_16a deleting _cwu.tis deleting _cwu.tii deleting _cwu.prx deleting _cwu.nrm deleting _cwu.frq deleting _cwu.fnm deleting _cwt.tis deleting _cwt.tii deleting _cwt.prx deleting _cwt.nrm deleting _cwt.frq deleting _cwt.fnm deleting _cws.tis deleting _cws.tii deleting _cws.prx deleting _cws.nrm deleting _cws.frq deleting _cws.fnm deleting _cwr_1.del deleting _cwr.tis deleting _cwr.tii deleting _cwr.prx deleting _cwr.nrm deleting _cwr.frq deleting _cwr.fnm deleting _cwq.tis deleting _cwq.tii deleting _cwq.prx deleting _cwq.nrm deleting _cwq.frq deleting _cwq.fnm deleting _cwq.fdx deleting _cwq.fdt deleting _cwp.tis deleting _cwp.tii deleting _cwp.prx deleting _cwp.nrm deleting _cwp.frq deleting _cwq.fnm deleting _cwq.fdx deleting _cwq.fdt deleting _cwp.tis deleting _cwp.tii deleting _cwp.prx deleting _cwp.nrm deleting _cwp.frq deleting _cwp.fnm deleting _cwp.fdx deleting _cwp.fdt deleting _cwo_1.del deleting _cwo.tis deleting _cwo.tii deleting _cwo.prx deleting _cwo.nrm deleting _cwo.frq deleting _cwo.fnm deleting _cwo.fdx deleting _cwo.fdt deleting _cwe_1.del deleting _cwe.tis deleting _cwe.tii deleting _cwe.prx deleting _cwe.nrm deleting _cwe.frq deleting _cwe.fnm deleting _cwe.fdx deleting _cwe.fdt deleting _cw2_3.del deleting _cw2.tis deleting _cw2.tii deleting _cw2.prx deleting _cw2.nrm deleting _cw2.frq deleting _cw2.fnm deleting _cw2.fdx deleting _cw2.fdt deleting _cvs_4.del deleting _cvs.tis deleting _cvs.tii deleting _cvs.prx deleting _cvs.nrm deleting _cvs.frq deleting _cvs.fnm deleting _cvs.fdx deleting _cvs.fdt deleting _csp_h.del deleting _csp.tis deleting _csp.tii deleting _csp.prx deleting _csp.nrm deleting _csp.frq deleting _csp.fnm deleting _csp.fdx deleting _csp.fdt deleting _cpn_q.del deleting _cpn.tis deleting _cpn.tii deleting _cpn.prx deleting _cpn.nrm deleting _cpn.frq deleting _cpn.fnm deleting _cpn.fdx deleting _cpn.fdt deleting _cmk_x.del deleting _cmk.tis deleting _cmk.tii deleting _cmk.prx deleting _cmk.nrm deleting _cmk.frq deleting _cmk.fnm deleting _cmk.fdx deleting _cmk.fdt deleting _cjg_14.del deleting _cjg.tis deleting _cjg.tii deleting _cjg.prx deleting _cjg.nrm deleting _cjg.frq deleting _cjg.fnm deleting _cjg.fdx deleting _cjg.fdt deleting _cge_19.del deleting _cge.tis deleting _cge.tii deleting _cge.prx deleting _cge.nrm deleting _cge.frq deleting _cge.fnm deleting _cge.fdx deleting _cge.fdt deleting _cd9_1m.del deleting _cd9.tis deleting _cd9.tii deleting _cd9.prx deleting _cd9.nrm deleting _cd9.frq deleting _cd9.fnm deleting _cd9.fdx deleting _cd9.fdt
./ _cww.fdt

We have random Solr slaves failing in the exact same manner almost daily. Any help is appreciated!
Re: solr machine freeze up during first replication after optimization
Hm, are you sure this is not a network/switch/disk problem or something along those lines? Also, precisely because you have such a large index, I'd avoid optimizing the index and then replicating it. My wild guess is that simply rsyncing this much data over the network kills your machines. Have you tried doing the rsync manually and watching the machine/switches/NICs/disks to see what's going on? That's what I'd do.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Kyle Lau
> To: solr-user@lucene.apache.org
> Sent: Friday, May 22, 2009 7:54:53 PM
> Subject: solr machine freeze up during first replication after optimization
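For anyone wanting to try Otis's suggestion by hand before the next scheduled pull, a minimal sketch of a manual, throttled pull is below. This is only an illustration: the rsync module name, snapshot name, and local path are guesses based on the snappuller log above, and --bwlimit is in KB/s, so 10000 is roughly 10 MB/s.

  # Hypothetical manual pull of one snapshot, throttled so the slave's NIC and
  # disks are not saturated in a single burst. Watch vmstat/iostat on the slave
  # (and the switch counters) while it runs.
  rsync -avz --bwlimit=10000 \
      master-host::solr/snapshot.20090521233922/ \
      /mnt/solr/data/snapshot.20090521233922/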
questions about Clustering
I'm thinking of using the clustering (SOLR-769) feature for my project. I have a couple of questions:

1. If q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery" via attributes. Is that OK?

2. I'd like to use it in an environment other than English, e.g. Japanese. I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType) for this purpose. It worked well with the ClusteringDocumentList example, but it didn't work with CarrotClusteringEngine. What I did is insert the following lines (+) into CarrotClusteringEngine:

  attributes.put(AttributeNames.QUERY, query.toString());
+ attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
+     Carrot2JapaneseAnalyzer.class);

There are no runtime errors, but Carrot2 didn't use my analyzer; it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via debugger). Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib .

Thank you,

Koji
Re: DIH uses == instead of = in SQL
Are you using delta-import without a deltaImportQuery? Please paste the relevant portion of data-config.xml.

On Sat, May 23, 2009 at 12:13 AM, Eric Pugh wrote:
> I am getting this error:
>
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You
> have an error in your SQL syntax; check the manual that corresponds to your
> MySQL server version for the right syntax to use near '=='1433'' at line 1
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>
> during a select for a specific institution:
>
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select institution_id, name, acronym as i_acronym from
> institutions where institution_id=='1433' Processing Document # 1
>        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:248)
>        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205)
>        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
>
> I just switched to using the paired deltaImportQuery and deltaQuery
> approach. I am using the latest from trunk. Any ideas?
>
> Eric
>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
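As far as I understand, Noble's question points at the case where deltaImportQuery is missing or not picked up: DIH then derives a delta query from the main query itself, and malformed SQL like institution_id=='1433' is the sort of thing that can fall out of that. A hedged sketch of the entity with the pair spelled out explicitly is below -- the last_updated column and the exact delta variable syntax are assumptions (from memory of the 1.4 wiki), and only the query and table names come from the stack trace above.

  <!-- Sketch only: columns beyond those visible in the stack trace, the
       last_updated column, and the delta variable syntax are assumptions. -->
  <entity name="institution" pk="institution_id"
          query="select institution_id, name, acronym as i_acronym from institutions"
          deltaQuery="select institution_id from institutions
                      where last_updated &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="select institution_id, name, acronym as i_acronym
                            from institutions
                            where institution_id = '${dataimporter.delta.institution_id}'"/>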
Re: How to use DIH to index attributes in xml file
Wildcards are not supported. You must use the full xpath.

On Sat, May 23, 2009 at 4:55 AM, Jianbin Dai wrote:
>
> I have an xml file like this
>
>
>
> 301.46
>
>
> In the data-config.xml, I use
>
>
> but how can I index "id", "mid"?
>
> Thanks.

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: How to index large set data
no need to use embedded Solrserver. you can use SolrJ with streaming in multiple threads On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai wrote: > > If I do the xml parsing by myself and use embedded client to do the push, > would it be more efficient than DIH? > > > --- On Fri, 5/22/09, Grant Ingersoll wrote: > >> From: Grant Ingersoll >> Subject: Re: How to index large set data >> To: solr-user@lucene.apache.org >> Date: Friday, May 22, 2009, 5:38 AM >> Can you parallelize this? I >> don't know that the DIH can handle it, >> but having multiple threads sending docs to Solr is the >> best >> performance wise, so maybe you need to look at alternatives >> to pulling >> with DIH and instead use a client to push into Solr. >> >> >> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote: >> >> > >> > about 2.8 m total docs were created. only the first >> run finishes. In >> > my 2nd try, it hangs there forever at the end of >> indexing, (I guess >> > right before commit), with cpu usage of 100%. Total 5G >> (2050) index >> > files are created. Now I have two problems: >> > 1. why it hangs there and failed? >> > 2. how can i speed up the indexing? >> > >> > >> > Here is my solrconfig.xml >> > >> > >> false >> > >> 3000 >> > >> 1000 >> > >> 2147483647 >> > >> 1 >> > >> false >> > >> > >> > >> > >> > --- On Thu, 5/21/09, Noble Paul >> നോബിള് नो >> > ब्ळ् >> wrote: >> > >> >> From: Noble Paul നോബിള് >> नोब्ळ् >> >> >> >> Subject: Re: How to index large set data >> >> To: solr-user@lucene.apache.org >> >> Date: Thursday, May 21, 2009, 10:39 PM >> >> what is the total no:of docs created >> >> ? I guess it may not be memory >> >> bound. indexing is mostly amn IO bound operation. >> You may >> >> be able to >> >> get a better perf if a SSD is used (solid state >> disk) >> >> >> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai >> >> >> wrote: >> >>> >> >>> Hi Paul, >> >>> >> >>> Thank you so much for answering my questions. >> It >> >> really helped. >> >>> After some adjustment, basically setting >> mergeFactor >> >> to 1000 from the default value of 10, I can >> finished the >> >> whole job in 2.5 hours. I checked that during >> running time, >> >> only around 18% of memory is being used, and VIRT >> is always >> >> 1418m. I am thinking it may be restricted by JVM >> memory >> >> setting. But I run the data import command through >> web, >> >> i.e., >> >>> >> >> >> http://:/solr/dataimport?command=full-import, >> >> how can I set the memory allocation for JVM? >> >>> Thanks again! >> >>> >> >>> JB >> >>> >> >>> --- On Thu, 5/21/09, Noble Paul >> നോബിള് >> >> नोब्ळ् >> >> wrote: >> >>> >> From: Noble Paul നോബിള് >> >> नोब्ळ् >> Subject: Re: How to index large set data >> To: solr-user@lucene.apache.org >> Date: Thursday, May 21, 2009, 9:57 PM >> check the status page of DIH and see >> if it is working properly. and >> if, yes what is the rate of indexing >> >> On Thu, May 21, 2009 at 11:48 AM, Jianbin >> Dai >> >> >> wrote: >> > >> > Hi, >> > >> > I have about 45GB xml files to be >> indexed. I >> >> am using >> DataImportHandler. I started the full >> import 4 >> >> hours ago, >> and it's still running >> > My computer has 4GB memory. Any >> suggestion on >> >> the >> solutions? >> > Thanks! 
>> > >> > JB >> > >> > >> > >> > >> > >> >> >> >> -- >> >> >> >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> >>> >> >>> >> >>> >> >>> >> >>> >> >> >> >> >> >> >> >> -- >> >> >> - >> >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> >> > >> > >> > >> >> -- >> Grant Ingersoll >> http://www.lucidimagination.com/ >> >> Search the Lucene ecosystem >> (Lucene/Solr/Nutch/Mahout/Tika/Droids) >> using Solr/Lucene: >> http://www.lucidimagination..com/search >> >> > > > > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
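For reference, Noble's suggestion above -- SolrJ with streaming in multiple threads -- could look roughly like the sketch below on Solr 1.4. This is not Jianbin's actual setup: the Solr URL, the field names, and the loop standing in for the real parsing of the 45GB of XML are all placeholders.

  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class StreamingIndexer {
    public static void main(String[] args) throws Exception {
      // Queue up to 100 docs; 4 background threads stream them to Solr.
      StreamingUpdateSolrServer server =
          new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);

      for (int i = 0; i < 1000; i++) {          // stand-in for the real XML parsing loop
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);         // hypothetical schema fields
        doc.addField("text", "body of document " + i);
        server.add(doc);                        // queued and sent by the background threads
      }
      server.commit();                          // one commit at the end instead of per batch
    }
  }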
Re: Data Import Handler - parentDeltaImport
how do you know it is not being executed ?. use deltaImportQuery also if you are using Solr1.4 On Sat, May 23, 2009 at 4:29 AM, Michael Korthuis wrote: > I have the data-config.xml detailed below (Stripped down a bit for > simplicity) - > When I run the delta import, the design_template delta query is running and > modified rows are being returned. However, the parentDeltaQuery is never > executed. > > Any thoughts? > > Thanks, > > Micahael > > > > user="USER" password="PASSWORD"/> > > pk="catalog_item_id" > query="select catalog_item_id,catalog_item_code from catalog_item" > deltaQuery="select catalog_item_id from catalog_item where date_updated > > '${dataimporter.last_index_time}'" > deletedPkQuery="select catalog_item_id from catalog_item_delete where > date_deleted > '${dataimporter.last_index_time}'" >> > > > > id="design_template_id" pk="design_template_id" > query="select name from design_template where > design_template_id='${catalog_item.design_template_id_fk}'" > deltaQuery="select design_template_id from design_template where > date_updated > '${dataimporter.last_index_time}'" > parentDeltaQuery="select catalog_item_id from catalog_item where > design_template_id_fk = '${design_template.design_template_id}'" > > > > > > > > > > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
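For what it's worth, a cleaned-up sketch of the parent/child entity pair being discussed is below, reconstructed from the fragments visible in Michael's mail. The element nesting, the deltaImportQuery on the parent (Noble's suggestion), and the addition of design_template_id_fk to the parent query (so the child's ${catalog_item.design_template_id_fk} reference resolves) are assumptions, not his actual file.

  <!-- Hypothetical reconstruction; not Michael's actual data-config.xml. -->
  <entity name="catalog_item" pk="catalog_item_id"
          query="select catalog_item_id, catalog_item_code, design_template_id_fk from catalog_item"
          deltaQuery="select catalog_item_id from catalog_item
                      where date_updated &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="select catalog_item_id, catalog_item_code, design_template_id_fk
                            from catalog_item
                            where catalog_item_id = '${dataimporter.delta.catalog_item_id}'"
          deletedPkQuery="select catalog_item_id from catalog_item_delete
                          where date_deleted &gt; '${dataimporter.last_index_time}'">

    <entity name="design_template" pk="design_template_id"
            query="select name from design_template
                   where design_template_id = '${catalog_item.design_template_id_fk}'"
            deltaQuery="select design_template_id from design_template
                        where date_updated &gt; '${dataimporter.last_index_time}'"
            parentDeltaQuery="select catalog_item_id from catalog_item
                              where design_template_id_fk = '${design_template.design_template_id}'"/>
  </entity>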
Re: How to index large set data
Hi Paul, but in your previous post you said "there is already an issue for writing to Solr in multiple threads SOLR-1089". Do you think using SolrJ alone would be better than DIH?

Thanks and have a good weekend!

--- On Fri, 5/22/09, Noble Paul നോബിള് नोब्ळ् wrote:

> no need to use embedded Solrserver. you can use SolrJ with streaming
> in multiple threads
Re: How to use DIH to index attributes in xml file
Oh, I guess I didn't say it clearly in my post. I didn't use wildcards in the xpath. My question was how to index the attributes "id" and "mid" in the following xml file.

301.46

In the data-config.xml, I use

but what are the xpaths for "id" and "mid"?

Thanks again!

--- On Fri, 5/22/09, Noble Paul നോബിള് नोब्ळ् wrote:

> From: Noble Paul നോബിള് नोब्ळ्
> Subject: Re: How to use DIH to index attributes in xml file
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 9:03 PM
> wild cards are not supported . u must
> use full xpath
>
> On Sat, May 23, 2009 at 4:55 AM, Jianbin Dai
> wrote:
> >
> > I have an xml file like this
> >
> > type="stock-4" />
> > type="cond-0" />
> >
> > 301.46
> >
> >
> > In the data-config.xml, I use
> > xpath="/.../merchantProduct/price" />
> >
> > but how can I index "id", "mid"?
> >
> > Thanks.
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
Re: How to use DIH to index attributes in xml file
On Sat, May 23, 2009 at 10:31 AM, Jianbin Dai wrote:
>
> Oh, I guess I didn't say it clearly in my post.
> I didn't use wild cards in xpath. My question was how to index attributes
> "id" and "mid" in the following xml file.
>
>
> 301.46
>
>
> In the data-config.xml, I use
>
>
> but what are the xpath for "id" and "mid"?

That would be /merchantProduct/@id and /merchantProduct/@mid

--
Regards,
Shalin Shekhar Mangar.
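Putting Shalin's answer into a data-config fragment, it might look roughly like the sketch below. The file path, entity name, and forEach element are assumptions (the original mail's xpath "/.../merchantProduct/price" suggests merchantProduct may be nested under other elements, in which case the full path would be needed), and the entity would still need a FileDataSource or HttpDataSource defined elsewhere in the config.

  <!-- Hypothetical fragment; only the @id/@mid xpath idea comes from the thread. -->
  <entity name="merchantProduct"
          processor="XPathEntityProcessor"
          url="/path/to/products.xml"
          forEach="/merchantProduct">
    <field column="id"    xpath="/merchantProduct/@id" />
    <field column="mid"   xpath="/merchantProduct/@mid" />
    <field column="price" xpath="/merchantProduct/price" />
  </entity>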