Kinda-sorta realtime?
We're using Solr as the backbone for our shiny new helpdesk application, and by and large it's been a big win... especially in terms of search performance. But before I pat myself on the back because the Solr devs have done a great job, I had a question regarding commit frequency. While our app doesn't need truly realtime search, documents get updated and replaced somewhat frequently, and those changes need to be visible in the index within 500ms. At the moment, I'm using autocommit to satisfy this, but I've run across a few threads mentioning that frequent commits may cause some serious performance issues. Our average document size is quite small (less than 10k), and I'm expecting that we're going to have a maximum of around 100k documents per day on any given index; most of these will be replacing existing documents. So, rather than getting bitten by this down the road, I figure I may as well (a) ask if anybody else here is running a similar setup or has any input, and then (b) do some heavy load testing via a fake data generator. Thanks-in-advance!
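For concreteness, the autocommit setup I'm referring to lives in the update handler in solrconfig.xml; a minimal sketch, with illustrative thresholds rather than our real ones:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>1000</maxDocs>  <!-- commit after this many queued docs -->
        <maxTime>500</maxTime>   <!-- ...or after this many milliseconds -->
      </autoCommit>
    </updateHandler>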
problem querying date field
I have the following field in my schema:

[field definition stripped by the mail archive]

Querying for another field, I verify that the value is being set as expected:

http://localhost:8080/apache-solr-1.4.0/select/?q=url:www.vg.no
...
2010-04-16T10:47:25.282Z

However, querying on a date range that definitely includes this document, I get no documents returned:

http://localhost:8080/apache-solr-1.4.0/select/?q=indextime:NOW-1DAY%20TO%20NOW%20+1DAY

What is wrong with my query here?

-- jo
Re: Kinda-sorta realtime?
Hi Don,

We've got a similar requirement in our environment - here's what we've found. Every time you commit, you're doing a relatively disk-I/O-intensive task to insert the document(s) into the index. For very small indexes (say, <10,000 docs), the commit time is pretty short and you can get away with doing frequent commits. With large indexes, commits can take seconds to complete, and use a fair bit of CPU and disk resource along the way. This of course impacts search performance, and it won't get your docs searchable within your 500ms requirement.

The planned NRT (near real-time) feature (I believe scheduled for 1.5?) is probably what you need, where Lucene commits are done on a per-segment basis. You could also check out the Zoie plugin, but make sure you're not also committing to disk straightaway, and that you don't mind having to re-index some data if your server crashes (Zoie uses an in-memory lookup for new doc insertions).

HTH
Peter

On Fri, Apr 16, 2010 at 10:13 AM, Don Werve wrote:
> We're using Solr as the backbone for our shiny new helpdesk application, and by and large it's been a big win... especially in terms of search performance. [...]
Re: problem querying date field
You need to use brackets around range queries. See http://wiki.apache.org/solr/SolrQuerySyntax

Erik

On Apr 16, 2010, at 7:08 AM, Jan-Olav Eide wrote:
> I have the following field in my schema: ... default="NOW" multiValued="false"/>
> [...] however, querying on a date-range that definitely includes this document, I get no documents returned:
> http://localhost:8080/apache-solr-1.4.0/select/?q=indextime:NOW-1DAY%20TO%20NOW%20+1DAY
> What is wrong with my query here?
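In other words, the range needs to be bracketed (and the stray space before +1DAY dropped), along these lines:

    http://localhost:8080/apache-solr-1.4.0/select/?q=indextime:[NOW-1DAY%20TO%20NOW%2B1DAY]

(The + is written as %2B here because a literal + in a URL query string is decoded as a space.)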
Getting the length of a field?
Hi there,

I'm looking around to see if there's a function that will return the length of a string in a field, but not seeing one. This is a field whose data I store but don't generally query on; I want to be able to take its length into account. Is this possible? Any help much appreciated :)

—Oliver
Re: StreamingUpdateSolrServer hangs
Hi Yonik,

Yonik Seeley wrote:
> Stephen, were you running stock Solr 1.4, or did you apply any of the SolrJ patches?
> I'm trying to figure out if anyone still has any problems, or if this was fixed with SOLR-1711:

I'm using the latest trunk version (rev. 934846) and constantly running into the same problem. I'm using StreamingUpdateSolrServer with 3 threads and a queue size of 20 (not really knowing if this configuration is optimal). My multi-threaded application indexes 200k data items (bibliographic metadata in Dublin Core format) and constantly hangs after running for some time.

Below you can find the thread dump of one of my index threads (after the app hangs, all dumps are the same):

"thread 19" prio=10 tid=0x7fe8c0415800 nid=0x277d waiting on condition [0x42d05000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x7fe8cdcb7598> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
    at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:216)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
    at de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:29)
    at de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:10)
    at de.kobv.ked.index.AbstractIndexThread.addIndexDocument(AbstractIndexThread.java:59)
    at de.kobv.ked.rss.RssThread.indiziere(RssThread.java:30)
    at de.kobv.ked.rss.RssThread.run(RssThread.java:58)

and of the three SUSS threads:

"pool-1-thread-3" prio=10 tid=0x7fe8c7b7f000 nid=0x2780 in Object.wait() [0x409ac000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7fe8cdcb6f10> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
    - locked <0x7fe8cdcb6f10> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-2" and "pool-1-thread-1" show the identical trace, all waiting on the same ConnectionPool monitor <0x7fe8cdcb6f10>.
Re: StreamingUpdateSolrServer hangs
Thanks for the report Sascha. So after the hang, it never recovers? Some amount of hanging could be visible if there was a commit on the Solr server or something else causing the Solr requests to block for a while... but it should return to normal on its own.

Looking at the stack trace, it looks like threads are blocked waiting to get an http connection.

I'm traveling all next week, but I'll open a JIRA issue for this now. Anything that would help us reproduce this is much appreciated.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

On Fri, Apr 16, 2010 at 8:57 AM, Sascha Szott wrote:
> I'm using the latest trunk version (rev. 934846) and constantly running into the same problem. I'm using StreamingUpdateSolrServer with 3 threads and a queue size of 20 (not really knowing if this configuration is optimal). My multi-threaded application indexes 200k data items (bibliographic metadata in Dublin Core format) and constantly hangs after running for some time.
> [thread dumps trimmed]
Re: SOLR Exact match problem - Punctuations, double quotes etc.
Well, I think that's part of your problem. WhitespaceAnalyzer does exactly what it says: it splits on whitespace. So indexing "carbon" and searching "carbon." won't generate a hit. If KeywordAnalyzer doesn't work for you, you could consider using one of the Pattern* tokenizers/filters, or write your own.

HTH
Erick

On Fri, Apr 16, 2010 at 12:18 AM, Hid-Mubarmij wrote:
> Hi Erick,
>
> Thanks, I am using solr.WhitespaceTokenizerFactory and solr.LowerCaseFilterFactory for both index and query time. Following is the complete field I am using in schema.xml:
>
> [field type definition stripped by the mail archive]
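For illustration, an exact-match field type along the lines Erick suggests might look like the sketch below; the poster's real schema was stripped by the archive, and the punctuation pattern here is an assumption:

    <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- keep the whole value as one token instead of splitting on whitespace -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip punctuation such as '.' and '"' (illustrative pattern) -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="\p{Punct}+" replacement="" replace="all"/>
      </analyzer>
    </fieldType>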
RE: XSD for Solrv1.4
Thanks, I'm taking a look at SolrJ. Longer term I'd still like to have access to an XSD - then I can see us integrating this better in the Oracle Service Bus and writing less Java code in our webApp.

-----Original Message-----
From: hkmortensen [mailto:ko...@yahoo.com]
Sent: 15 April 2010 21:26
To: solr-user@lucene.apache.org
Subject: Re: XSD for Solrv1.4

Smaric-2 wrote:
> Are there any plans to release an xsd (& preferably a set of JAXB classes) so we can process the xml returned for a search request?

I do not know. I would recommend using solrj, the java client. I always do that myself; do you have a reason not to do that?
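For anyone weighing the same choice, a minimal SolrJ query sketch (server URL and field name are illustrative):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SolrJQueryExample {
        public static void main(String[] args) throws Exception {
            // SolrJ returns typed objects, so no XSD/JAXB layer is needed
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            for (SolrDocument doc : rsp.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }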
Solr Index Lock Issue
Hi All,

We are facing an issue with the Solr server in the DMOZ data migration. Solr has 0 records when the migration starts, and the data is added into Solr in batches of 2 records. Commit is called on Solr after 20k records are processed. While committing the data into Solr, a Lucene lock file is created in the /data/index folder, which is automatically released once the commit succeeds. But after 4-5 batches, the lock file remains there and Solr just hangs and does not add any new records. Sometimes the whole migration goes through without any errors.

Kindly let me know if any setting is required on the Solr side to ensure that until Solr commits the index, the next set of records is not added.

Thanks,
Param
Re: StreamingUpdateSolrServer hangs
Hi Yonik,

thanks for your fast reply.

Yonik Seeley wrote:
> Thanks for the report Sascha. So after the hang, it never recovers? Some amount of hanging could be visible if there was a commit on the Solr server or something else causing the Solr requests to block for a while... but it should return to normal on its own.

In my case the whole application hangs and never recovers (CPU utilization goes down to near 0%). Interestingly, the problem reproducibly occurs only if SUSS is created with *more than 2* threads.

> Looking at the stack trace, it looks like threads are blocked waiting to get an http connection.

I forgot to mention that my index app has exclusive access to the Solr instance, so concurrent searches against the same Solr instance while indexing are ruled out.

> I'm traveling all next week, but I'll open a JIRA issue for this now.

Thank you!

> Anything that would help us reproduce this is much appreciated.

Are there any others who have experienced the same problem?

-Sascha

[earlier quoted messages and thread dumps trimmed]
Re: Handling missing date fields in a date-oriented function query
I still like this approach, but I've discovered one wrinkle: my dataset has dates at the epoch (i.e. midnight Jan 1, 1970) as well as before it (e.g. midnight Jan 1, 1950). The docs dated *before* the epoch so far don't seem to be a problem; they end up with a negative numeric value, but that seems workable. The docs dated exactly *at* the epoch, though, are trouble, because I can't tell those docs apart from the undated docs in my function query. (Both end up with a numeric value of 0 in the function query code, so a missing date is the same as midnight Jan 1, 1970.)

So far, in my case, the best bet seems to be changing the time component of my dates. Rather than rounding dates to the nearest midnight (e.g. 1970-01-01T00:00:00Z), I could round them to the nearest, say, 1AM (e.g. 1970-01-01T01:00:00Z), with the goal of making sure that none of my legitimate date field values will evaluate to numeric value 0. Since I don't show the time component of dates to my users, I don't think this would cause any real trouble. It feels slightly unclean, though.

On Thu, Apr 8, 2010 at 1:05 PM, Chris Harris wrote:
> If anyone is curious, I've created a patch that creates a variant of map that can be used in the way indicated below. See http://issues.apache.org/jira/browse/SOLR-1871
>
> On Wed, Apr 7, 2010 at 3:41 PM, Chris Harris wrote:
>> Option 1. Use map
>>
>> The most obvious way to do this would be to wrap the reference to mydatefield inside a map, like this:
>>
>> recip(ms(NOW,map(mydatefield,0,0,ms(NOW))),3.16e-11,1,1)
>>
>> However, this throws an exception because map is hard-coded to take float constants, rather than arbitrary subqueries.
Re: Handling missing date fields in a date-oriented function query
On Fri, Apr 16, 2010 at 4:42 PM, Chris Harris wrote:
> The docs dated exactly *at* the epoch, though, are trouble, because I can't tell those docs apart from the undated docs in my function query.

Neither can Solr currently... it's a Lucene FieldCache limitation. The other thing we can't do because of this limitation is sortMissingFirst / sortMissingLast, like we can with string-based fields. I'm hopeful we'll be able to get this added to Lucene, and that will enable us to truly deprecate the string-based numeric fields. We'll also be able to add true defaults in function queries for documents without a value.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague
Re: Solr Index Lock Issue
Hi,

What you are doing sounds fine. You don't need to commit while indexing, though - just commit/optimize at the end. I'm not saying this will solve your problem, but give it a try.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: "Sethi, Parampreet"
> To: solr-user@lucene.apache.org
> Sent: Fri, April 16, 2010 1:13:57 PM
> Subject: Solr Index Lock Issue
>
> We are facing an issue with the Solr server in the DMOZ data migration. [...] While committing the data into Solr, a Lucene lock file is created in the /data/index folder, which is automatically released once the commit succeeds. But after 4-5 batches, the lock file remains there and Solr just hangs and does not add any new records. [...]
Sum of return fields
Is it possible to add or subtract a value and return that field from the index in Solr? Or do you have to do it programmatically afterwards?

Thanks!
Re: Getting the length of a field?
Hm, I don't follow what you are looking to do, Oliver. You want to take the field length into account when indexing? Or when searching? You want it to affect relevance? You can certainly get the length of the String (original value) in a field *after* you get your result set, but that's probably not what you are after, because that's just a length()-type call.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Oliver Beattie
> To: solr-user
> Sent: Fri, April 16, 2010 8:52:58 AM
> Subject: Getting the length of a field?
>
> Hi there, I'm looking around to see if there's a function that will return the length of a string in a field, but not seeing one. [...]
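Concretely, the after-the-result-set approach mentioned above is just this in SolrJ (field name is illustrative; rsp is a QueryResponse from a prior query):

    SolrDocument doc = rsp.getResults().get(0);
    String value = (String) doc.getFieldValue("description");
    int length = value.length(); // plain client-side length() call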
Re: Sum of return fields
Jim, like this: https://issues.apache.org/jira/browse/SOLR-1298 ?

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Jim Adams
> To: solr-user@lucene.apache.org
> Sent: Fri, April 16, 2010 6:22:00 PM
> Subject: Sum of return fields
>
> Is it possible to add or subtract a value and return that field from the index in Solr? Or do you have to do it programmatically afterwards? Thanks!
Re: run in background
Better yet, use screen: http://www.manpagez.com/man/1/screen/

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Walter Underwood
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 11:31:31 PM
> Subject: Re: run in background
>
> nohup my_command &
>
> That will run "my_command" in the background, and "nohup" ignores the SIGHUP signal sent when you log out. Or, originally, "hang up" the modem.
>
> wunder
>
> On Apr 15, 2010, at 8:27 PM, Dan Yamins wrote:
>> Hi,
>>
>> Normally I've been starting solr like so:
>>
>> java -jar start.jar
>>
>> However, I need to have this process executed over a remote ssh connection that cannot be blocking. I'd therefore like to execute the process "in the background", somehow in a forked process, so that the command returns while having set solr to run in the child process. Is there a simple way to do this?
>>
>> Thanks,
>> dan
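Combining the two suggestions for Solr specifically (paths illustrative):

    # one-shot: survives logout, output goes to a file
    nohup java -jar start.jar > solr.log 2>&1 &

    # or run it inside a detachable session
    screen -S solr
    java -jar start.jar
    # detach with Ctrl-a d, reattach later with: screen -r solr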
Re: DIH dataimport.properties with
Hm, why not just go to the MySQL master then?

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Michael Tibben
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 10:15:14 PM
> Subject: DIH dataimport.properties
>
> Hi, I am using the DIH to import data from a mysql slave. However, the slave sometimes runs behind the master. The delay is variable; most of the time it is in sync, but sometimes it can run behind by a few minutes. This is a problem, because DIH uses dataimport.properties to determine the last_index_time for delta updates. This last_index_time does not correspond to the position of the slave, and so documents are being missed.
>
> What I need to be able to do is tell DIH what the last_index_time should be. Or alternatively, be able to specify another property in dataimport.properties, perhaps called datasource_version or similar. Is this possible?
>
> I have thought of a sneaky way to hack around the issue. Just before the delta update is run, I will switch the system time to the mysql slave's replication time. The system is used for nothing but the solr master, so I think this should work OK. Any thoughts?
>
> Regards,
> Michael
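For context, the last_index_time Michael mentions is substituted into the delta query in data-config.xml; a minimal sketch (table and column names are illustrative):

    <entity name="item" pk="id"
            query="SELECT * FROM item"
            deltaImportQuery="SELECT * FROM item WHERE id='${dih.delta.id}'"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified > '${dataimporter.last_index_time}'"/>

Since that timestamp is written from the Solr host's clock rather than the slave's replication position, rows that arrive on the slave late fall outside the delta window - which is exactly the gap described above.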
Re: Sum of return fields
Yes, that's exactly it. Looks like it is going into 1.5... hmmm... guess I'll have to do something programmatically instead, as I'm not there yet.

On Fri, Apr 16, 2010 at 4:24 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Jim, like this: https://issues.apache.org/jira/browse/SOLR-1298 ?
> [...]
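A sketch of the programmatic fallback in SolrJ, doing the arithmetic after retrieval (field name and offset are illustrative; server is a configured SolrServer):

    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    for (SolrDocument doc : rsp.getResults()) {
        // stored value plus a constant, computed client-side
        float adjusted = ((Number) doc.getFieldValue("price")).floatValue() + 10f;
        System.out.println(doc.getFieldValue("id") + " -> " + adjusted);
    }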
Re: StreamingUpdateSolrServer hangs
I experienced the hang described with the Solr 1.4.0 build. Yonik - I also thought the streaming updater was blocking on commits, but updates never resumed. To be honest, I was in a bit of a rush to meet a deadline, so after spending a day or so tinkering I bailed out and just wrote a component by hand. I have not tried to reproduce this using the current trunk. I was using the 32-bit Sun JRE on a Red Hat EL 5 HP server.

I'm not sure if the following enriches this thread, but I'll include it anyways: write a document generator and start adding a ton of 'em to a Solr server instance using the streaming updater. You *should* experience the hang.

HTH,
Rich

On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott wrote:
> In my case the whole application hangs and never recovers (CPU utilization goes down to near 0%). Interestingly, the problem reproducibly occurs only if SUSS is created with *more than 2* threads.
> [...]
> [remainder of quoted thread and stack traces trimmed]
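A minimal generator along the lines Rich describes might look like this (URL is illustrative; queue size 20 and 3 threads match the configuration Sascha reported):

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SussHangRepro {
        public static void main(String[] args) throws Exception {
            // (url, queueSize, threadCount) per the SolrJ 1.4 constructor
            StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(
                "http://localhost:8983/solr", 20, 3);
            for (int i = 0; i < 200000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                doc.addField("title", "generated doc " + i);
                server.add(doc); // the reported hang shows up partway through
            }
            server.commit();
        }
    }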
admin-extra file in multicore
Hi,

It looks like I'm trying to do the same thing as this open JIRA: https://issues.apache.org/jira/browse/SOLR-975

I noticed index.jsp has a reference to:

    <%
      // a quick hack to get rid of get-file.jsp -- note this still spits out invalid HTML
      out.write( org.apache.solr.handler.admin.ShowFileRequestHandler.getFileContents( "admin-extra.html" ) );
    %>

instead of resolving with the core.getName() path. I was trying to avoid building a custom solr.war for this project - is there another quick hack to include content for the admin backend, or is patching the only way?

Thanks,
- Jon
Re: bug using distributed search, highlighting and q.alt
Marc - Mind creating a ticket in JIRA and attaching your patch?

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Marc Sturlese
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 1:30:22 PM
> Subject: bug using distributed search, highlighting and q.alt
>
> I have noticed that when using q.alt, highlights are not returned even if hl=true. When using distributed search, q.alt and hl, HighlightComponent.java finishStage expects the highlighting NamedList of each shard (if hl=true) but it will never be returned. It will end up with a NullPointerException. I have temporarily solved it by checking that the highlighting NamedList is always returned for each shard. If that's not the case, highlights are not added to the response:
>
> @Override
> public void finishStage(ResponseBuilder rb) {
>   boolean hasHighlighting = true;
>   if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
>     Map.Entry[] arr = new NamedList.NamedListEntry[rb.resultIds.size()];
>     // TODO: make a generic routine to do automatic merging of id keyed data
>     for (ShardRequest sreq : rb.finished) {
>       if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0) continue;
>       for (ShardResponse srsp : sreq.responses) {
>         NamedList hl = (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
>         if (hl != null) {
>           for (int i = 0; i < hl.size(); i++) {
>             String id = hl.getName(i);
>             ShardDoc sdoc = rb.resultIds.get(id);
>             int idx = sdoc.positionInResponse;
>             arr[idx] = new NamedList.NamedListEntry(id, hl.getVal(i));
>           }
>         } else {
>           hasHighlighting = false;
>         }
>       }
>     }
>     // remove nulls in case not all docs were able to be retrieved
>     if (hasHighlighting) {
>       rb.rsp.add("highlighting", removeNulls(new SimpleOrderedMap(arr)));
>     }
>   }
> }
Re: Supporting multiple index / query analyzer stacks
Gert,

You could:

* run 1 Solr instance with N cores, each core having a different flavour/stack of the otherwise same schema
* run 1 Solr instance with 1 core and in it N copies of each field, each copy with its own flavour/stack
* run N Solr instances, each with a different flavour/stack of the otherwise same schema

If I had to do this, I'd go with the first option - it's the least management, not super resource-hungry, and each stack would be cleanly and truly separate.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: "Villemos, Gert"
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 5:19:52 AM
> Subject: Supporting multiple index / query analyzer stacks
>
> Having developed a system based on SOLr, we are now optimizing the ranking of the search results to give the user a better search experience. We would like to create multiple index / query analyzer stacks in the SOLr configuration to test how this affects the results. We would index the same text field with all stacks and then at runtime allow the user to select the stack to be used to execute the search. He can thus perform the same search in, for example, 5 ways and tell us which search stack gave him the best set of results.
>
> How can we do this? We were thinking along these lines:
> * In schema.xml, define the different index / query stacks as different field types ("text_stack1", "text_stack2", "text_stack3", ...).
> * Create a field of each type (e.g. a field "text_s2" of type "text_stack2", ...).
> * Create copyField definitions, copying the same input text (e.g. "text_in") into the five different fields.
>
> Is this the smart way of doing it? Is there a better way?
>
> Thanks,
> Gert.
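A sketch of the schema.xml wiring Gert describes (analyzer internals omitted; field and type names follow his examples, the rest is assumption):

    <types>
      <fieldType name="text_stack1" class="solr.TextField" positionIncrementGap="100">
        <analyzer> <!-- stack 1 tokenizer/filters --> </analyzer>
      </fieldType>
      <fieldType name="text_stack2" class="solr.TextField" positionIncrementGap="100">
        <analyzer> <!-- stack 2 tokenizer/filters --> </analyzer>
      </fieldType>
    </types>

    <fields>
      <field name="text_in" type="string"      indexed="false" stored="true"/>
      <field name="text_s1" type="text_stack1" indexed="true"  stored="false"/>
      <field name="text_s2" type="text_stack2" indexed="true"  stored="false"/>
    </fields>

    <copyField source="text_in" dest="text_s1"/>
    <copyField source="text_in" dest="text_s2"/>

At query time the user's chosen stack then just maps to the matching field, e.g. q=text_s2:foo.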
Re: DIH
Oops, haven't checked that. The wiki page generally marks new stuff with a "Solr 1.5" marker.

On Wed, Apr 14, 2010 at 9:30 PM, Sandhya Agarwal wrote:
> Thanks a lot, Lance.
>
> So, are these part of the Solr 1.4 release?
>
> -----Original Message-----
> From: Lance Norskog [mailto:goks...@gmail.com]
> Sent: Thursday, April 15, 2010 9:53 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH
>
> FileListEntityProcessor -> BinFileDataSource -> TikaEntityProcessor (I think)
>
> FLEP walks the directory and supplies a separate record per file. BFDS pulls the file and supplies it to TikaEntityProcessor. BinFileDataSource is not documented, but you need it for binary data streams like PDF & Word. For text files, use FileDataSource.
>
> On 4/14/10, Sandhya Agarwal wrote:
>> Hello,
>>
>> We want to design a solution where we have one polling directory (data source directory) containing the xml files of all data that must be indexed. These XML files contain a reference to the content file. So, we need another datasource to be created for the content files. Could somebody please tell me the best way to get this working using the DIH / tika processor.
>>
>> Thanks,
>> Sandhya
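A sketch of the data-config.xml chain Lance outlines - paths, names, and the nested-entity wiring are assumptions (Lance himself hedges with "I think"):

    <dataConfig>
      <dataSource name="bin" type="BinFileDataSource"/>
      <document>
        <!-- FLEP walks the directory and emits one record per file -->
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/data/docs" fileName=".*\.(pdf|doc)" recursive="true"
                rootEntity="false" dataSource="null">
          <!-- BinFileDataSource feeds each file's bytes to Tika -->
          <entity name="content" processor="TikaEntityProcessor"
                  url="${files.fileAbsolutePath}" dataSource="bin" format="text">
            <field column="text" name="content"/>
          </entity>
        </entity>
      </document>
    </dataConfig>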
Re: SOLR Exact match problem - Punctuations, double quotes etc.
Thanks a lot Erick, I just used solr.PatternReplaceFilterFactory in my field and the problem is solved.

Thanks