Re: Re: Searching for empty fields possible?
>> I'm not sure; theoretically, fields with a null value (PHP-side) should end
>> up not having the field. But then again I don't think it's relevant just
>> yet. What bugs me is that if I add the -puid:[* TO *], all results for
>> puid:[0 TO *] disappear, even though I am using "OR".
>
> The - operator does not work with the OR operator the way you think.
> Your query can be re-written as (puid:[0 TO *] OR (*:* -puid:[* TO *]))
>
> Does this new query satisfy your needs? And more importantly, does type="integer" support correct numeric range queries? In Solr 1.4.0 range queries work correctly with type="tint".

Strangely enough, when I rewrote my query to ((puid:[0 TO *]) OR (-puid:[* TO *])) I did actually get results. Whether they were correct I currently cannot verify properly, since my index does not actually contain null values for the column. I will, however, check whether your query gets me any different results :)

Speaking of your query, I don't quite understand what the *:* does there and how it gets parsed.

Best,
Jan
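A note for anyone hitting the same wall: a boolean query made up only of negative clauses matches nothing by itself, so *:* (the match-all-documents query) gives the negation something to subtract from. A minimal sketch of the two forms, reusing the puid field from this thread (exact behaviour can vary by Solr version and query parser):

  puid:[0 TO *] OR (*:* -puid:[* TO *])    finds docs with puid of 0 or more, plus docs with no puid value at all
  -puid:[* TO *]                           purely negative; only works where the parser adds the match-all step for you

Solr handles a top-level pure-negative query by adding the match-all step implicitly, which is likely why the rewritten query above returned results, but nested pure-negative clauses are less predictable, hence the explicit *:*.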
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
2010/1/26 Jake Brownell : > I swapped our indexing process over to the streaming update server, but now > I'm seeing places where our indexing code adds several documents, but > eventually hangs. It hangs just before the completion message, which comes > directly after sending to solr. I found this issue in jira > > https://issues.apache.org/jira/browse/SOLR-1711 > > which may be what I'm seeing. If this is indeed what we're running up against > is there any best practice to work around it? I experience this too I think. My indexing script has been running all night and has accomplished nothing. I see lots of disk activity though, which is weird. To me it doesn't look like the patch is added to version control, so you need to apply it to your own svn checkout of solrj. /Tim
Re: Solr wiki link broken
All seems well now. The wiki does have its flakey moments though. Erik On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
Re: Solr wiki link broken
Hi, you might want to try the link called Frontpage on the generic wiki page. But well, this seems to be kind of broken for some locales. Regards, Sven --On Dienstag, 26. Januar 2010 01:23 -0500 Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
Hi Shah, I am assuming you are talking about the integration of SOLR-1358, i am very interested in this feature as well. Did you get it to work ? Is there a snapshot build available for this somewhere or do i have to build solr from source myself ? Thanks, Jorg On Mon, Jan 25, 2010 at 6:27 PM, Shah, Nirmal wrote: > Hi, > > > > I am fairly new to Solr and would like to use the DIH to pull rich text > files (pdfs, etc) from BLOB fields in my database. > > > > There was a suggestion made to use the FieldReaderDataSource with the > recently commited TikaEntityProcessor. Has anyone accomplished this? > > This is my configuration, and the resulting error - I'm not sure if I'm > using the FieldReaderDataSource correctly. If anyone could shed light > on whether I am going the right direction or not, it would be > appreciated. > > > > ---Data-config.xml: > > > > > >url="jdbc:oracle:thin:un/p...@host:1521:sid" /> > > > > > > dataField="attach.attachment" format="text"> > > > > > > > > > > > > > > > > -Debug error: > > > > > > 0 > > 203 > > > > > > > > testdb-data-config.xml > > > > > > full-import > > debug > > > > > > > > > > select id as name, attachment from testtable2 > > 0:0:0.32 > > --- row #1- > > java.math.BigDecimal:2 > > oracle.sql.BLOB:oracle.sql.b...@1c8e807 > > - > > > > > > org.apache.solr.handler.dataimport.DataImportHandlerException: No > dataSource :f1 available for entity :253433571801723 Processing Document > # 1 > >at > org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da > taImporter.java:279) > >at > org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl > .java:93) > >at > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit > yProcessor.java:97) > >at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity > ProcessorWrapper.java:237) > >at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:357) > >at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:383) > >at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java > :242) > >at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 > 0) > >at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte > r.java:331) > >at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java > :389) > >at > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D > ataImportHandler.java:203) > >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:131) > >at > org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > >at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:338) > >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:241) > >at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan > dler.java:1089) > >at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > >at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 > 16) > >at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > >at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > >at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > >at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler > Collection.java:211) > >at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav > 
a:114) > >at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > >at org.mortbay.jetty.Server.handle(Server.java:285) > >at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > >at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne > ction.java:821) > >at > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) > >at > org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > >at > org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > >at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.jav > a:226) > >at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.ja > va:442) > > > > Thanks, > > Nirmal > >
Need hardware recommendation
I am trying to do the following:

- Index 6 million database records (SQL Server 2008). Full index daily; differential every 15 minutes.
- Index 2 million rich documents. Full index weekly; differential every 15 minutes.
- Search queries: 1 per minute.
- 20 cores.

I am looking for hardware recommendations. Any advice/recommendation will be appreciated.

-Jayesh Wadhwani
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
<<< My indexing script has been running all night and has accomplished nothing. I see lots of disk activity though, which is weird.>>> One explanation would be that you're memory-starved and the disk activity you see is thrashing. How much memory do you allocate to your JVM? A further indication that this is where you should start looking would be if your CPU usage is very low at the same time. Erick 2010/1/26 Tim Terlegård > 2010/1/26 Jake Brownell : > > > I swapped our indexing process over to the streaming update server, but > now I'm seeing places where our indexing code adds several documents, but > eventually hangs. It hangs just before the completion message, which comes > directly after sending to solr. I found this issue in jira > > > > https://issues.apache.org/jira/browse/SOLR-1711 > > > > which may be what I'm seeing. If this is indeed what we're running up > against is there any best practice to work around it? > > I experience this too I think. My indexing script has been running all > night and has accomplished nothing. I see lots of disk activity > though, which is weird. > > To me it doesn't look like the patch is added to version control, so > you need to apply it to your own svn checkout of solrj. > > /Tim >
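For anyone checking Erick's question on their own install: heap size is set with the standard JVM flags on the command line that launches Solr. A minimal sketch assuming the bundled Jetty example (the numbers are placeholders, not a recommendation):

  java -Xms512m -Xmx1024m -jar start.jar

While the indexer appears hung, jconsole (or jmap -heap <pid> on Sun JDKs) will show how much of that heap is actually in use, which quickly confirms or rules out the memory-starvation theory.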
Re: Invalid CRLF - StreamingUpdateSolrServer ?
I've patched the solrj release(tag) 1.4 with SOLR-1595, it's online for about two weeks now and It's working just fine. Thanks a lot. Patrick. P.S.: It's a pity there is no plan for a 1.4.1 release Yonik Seeley a écrit : It could be this bug, fixed in trunk: * SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik) Could you try a recent nightly build (or build your own from trunk) and see if it fixes it? -Yonik http://www.lucidimagination.com On Thu, Dec 31, 2009 at 5:07 AM, Patrick Sauts wrote: I'm using solr 1.4 on tomcat 5.0.28, with client StreamingUpdateSolrServer with 10threads and xml communication via Post method. Is there a way to avoid this error (data lost)? And is StreamingUpdateSolrServer reliable ? GRAVE: org.apache.solr.common.SolrException: Invalid CRLF at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Thread.java:619) Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF
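For reference when reproducing this, the client side of the thread looks roughly like the following SolrJ sketch; the URL, queue size, and thread count are placeholder values, not recommendations:

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingUpdateExample {
    public static void main(String[] args) throws Exception {
        // Buffer up to 100 documents and stream them to Solr over 10 background threads.
        StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 10);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "UTF-8 test: héllo wörld");
        server.add(doc);     // queued; sent asynchronously by the runner threads
        server.commit();     // blocks until the queued documents are flushed and committed
    }
}

With the SOLR-1595 fix applied, the request body is encoded as UTF-8 to match the HTTP headers, so non-ASCII field values like the one above no longer trigger the encoding mismatch.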
RE: Solr vs. Compass
Ultimately... You're right, to some extent, the transaction synchronisation isn't ideal for sheer throughput if you many small transactions (as Lucene benefits from batching documents when you index...). However, the subindex feature gives you decidedly more throughput since the locking is at the subindex level. >> It is just blatant advertisement, trick; even JavaDocs remain unchanged... Such sneaky developers While I suspect its changed a bit since you last looked, I only ever used the local tx synch support, and not terribly interested in arguing the point... -N -Original Message- From: Funtick [mailto:f...@efendi.ca] Sent: 26 January 2010 02:44 To: solr-user@lucene.apache.org Subject: RE: Solr vs. Compass Minutello, Nick wrote: > > Maybe spend some time playing with Compass rather than speculating ;) > I spent few weeks by studying Compass source code, it was three years ago, and Compass docs (3 years ago) were saying the same as now: "Compass::Core provides support for two phase commits transactions (read_committed and serializable), implemented on top of Lucene index segmentations. The implementation provides fast commits (faster than Lucene), though they do require the concept of Optimizers that will keep the index at bay. Compass::Core comes with support for Local and JTA transactions, and Compass::Spring comes with Spring transaction synchronization. When only adding data to the index, Compass comes with the batch_insert transaction, which is the same IndexWriter operation with the same usual suspects for controlling performance and memory. " It is just blatant advertisement, trick; even JavaDocs remain unchanged... Clever guys from Compass can re-apply transaction log to Lucene in case of server crash (for instance, server was 'killed' _before_ Lucene flushed new segment to disk). Internally, it is implemented as a background thread. Nothing says in docs "lucene is part of transaction"; I studied source - it is just 'speculating'. Minutello, Nick wrote: > > If it helps, on the project where I last used compass, we had what I > consider to be a small dataset - just a few million documents. Nothing > related to indexing/searching took more than a second or 2 - mostly it > was 10's or 100's of milliseconds. That app has been live almost 3 > years. > I did the same, and I was happy with Compass: I got Lucene-powered search without any development. But I got performance problems after few weeks... I needed about 300 TPS, and Compass-based approach didn't work. With SOLR, I have 4000 index updates per second. -Fuad http://www.tokenizer.org -- View this message in context: http://old.nabble.com/Solr-vs.-Compass-tp27259766p27317213.html Sent from the Solr - User mailing list archive at Nabble.com. === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ===
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
2010/1/26 Erick Erickson : > > My indexing script has been running all > > night and has accomplished nothing. I see lots of disk activity > > though, which is weird. > > > One explanation would be that you're memory-starved and > the disk activity you see is thrashing. How much memory > do you allocate to your JVM? A further indication that > this is where you should start looking would be if your > CPU usage is very low at the same time. CPU usage was very low. There were lots of free memory. I immediately thought solr caused the disk activity, but I might have been wrong, because the disk activity stopped after a while and the indexing still showed no progress. Does this thread dump reveal anything? It doesn't look like solr is doing much? /Tim example 1.5.0_22-147 Java HotSpot(TM) Server VM 20 23 3 64 pool-8-thread-1 WAITING 10430,6650ms 8602,0210ms at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) at java.util.concurrent.DelayQueue.take(DelayQueue.java:131) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:533) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:526) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) 60 pool-7-thread-1 WAITING 132,0080ms 13,6950ms at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) 26 DestroyJavaVM RUNNABLE 1453,5030ms 1317,7670ms 25 Timer-2 TIMED_WAITING java.util.taskqu...@6a58c4 3,2500ms 0,8090ms at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:509) at java.util.TimerThread.run(Timer.java:462) 24 pool-1-thread-1 WAITING 41,5590ms 39,1740ms at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) 22 Timer-1 TIMED_WAITING java.util.taskqu...@e9d100 96,9640ms 74,5150ms at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:509) at java.util.TimerThread.run(Timer.java:462) 21 btpool0-9 - Acceptor0 SocketConnector @ 0.0.0.0:8983 RUNNABLE 26,0110ms 23,3400ms at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) at java.net.ServerSocket.implAccept(ServerSocket.java:450) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:97) at 
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) 20 btpool0-8 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@7a17 734105,5810ms 727677,9460ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 19 btpool0-7 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@7414c8 798010,4300ms 785039,2820ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 18 btpool0-6 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@e5c339 719254,0510ms 710319,6850ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 17 btpool0-5 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@d38976 243756,7410ms 240759,1390ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 16 btpool0-4 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@ad97f5 501531,8820ms 496494,6760ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482)
RE: Solr vs. Compass
Hi, Well, I thought I would jump here as the creator of Compass (up until this point, the discussion was great and very objective). Compass is here for about 5/6 years now (man, how time passes). Concentrating on the transactional implementation it provides, there have been changes to it along the years. The funny thing about that, by the way, is that the first implementation based on Lucene 1.9 was a combination of Lucene NRT and how it handles segments in the latest Lucene version. I will try and focus on the latest implementation, which uses latest Lucene IndexWriter features. Lucene IndexWriter provides the ability to prepare a commit point, and them commit it. The idea is that most of the heavy operations and things that might go wrong are done on the prepare phase, with the commit basically just updating the segments file. In its nature, its very close to what databases do with their 2 phase commit implementation (though, admittedly, the second phase probably has higher chances of 2 phase success). What Compass does, with its transactional integration with other transactional mechanisms, like JTA, is the ability to act as an XA Resource, and use the IndexWriter prepare and commit within the appropriate XA resource phases. Ultimately thought, even XA is not 100% safe, for example, what happens when you have 5 resources, all gone through the prepare phase, and the 4th failed in the commit phase ... (simplified example, but proves the point). Another point, is how Compass handles transactions. Basically, it has what I call transaction processors. The read committed one provides just that, a read committed transactional isolation level (you do changes, you see them while within the transaction, other see them when you commit the transaction). It does come with its overhead compared with other paradigms of how to use Lucene, but it gives you other things that a lot of people find good. There are other transaction processors that work differently, each with its own use case (heavy indexing, non real time search, async indexing, and so on). At the end, its really hard to compare Compass to Solr. One evident difference is the fact that Solr is more geared to be a Server solution, while Compass at being more embeddable. There are difference in features that each provides, and each comes with its own benefits. I think the rest of the mails on this thread have already covered that very objectively. In any case, you, the user, should use the right tool for the job, if it happens to be either Compass or Solr, I wish you all the best (and luck) at succeeding in it. Shay Minutello, Nick wrote: > > > > Ultimately... You're right, to some extent, the transaction > synchronisation isn't ideal for sheer throughput if you many small > transactions (as Lucene benefits from batching documents when you > index...). However, the subindex feature gives you decidedly more > throughput since the locking is at the subindex level. > >>> It is just blatant advertisement, trick; even JavaDocs remain > unchanged... > Such sneaky developers > While I suspect its changed a bit since you last looked, I only ever > used the local tx synch support, and not terribly interested in arguing > the point... > > -N > > > -Original Message- > From: Funtick [mailto:f...@efendi.ca] > Sent: 26 January 2010 02:44 > To: solr-user@lucene.apache.org > Subject: RE: Solr vs. 
Compass > > > > Minutello, Nick wrote: >> >> Maybe spend some time playing with Compass rather than speculating ;) >> > > I spent few weeks by studying Compass source code, it was three years > ago, and Compass docs (3 years ago) were saying the same as now: > "Compass::Core provides support for two phase commits transactions > (read_committed and serializable), implemented on top of Lucene index > segmentations. The implementation provides fast commits (faster than > Lucene), though they do require the concept of Optimizers that will keep > the index at bay. Compass::Core comes with support for Local and JTA > transactions, and Compass::Spring comes with Spring transaction > synchronization. When only adding data to the index, Compass comes with > the batch_insert transaction, which is the same IndexWriter operation > with the same usual suspects for controlling performance and memory. " > > It is just blatant advertisement, trick; even JavaDocs remain > unchanged... > > > Clever guys from Compass can re-apply transaction log to Lucene in case > of server crash (for instance, server was 'killed' _before_ Lucene > flushed new segment to disk). > > Internally, it is implemented as a background thread. Nothing says in > docs "lucene is part of transaction"; I studied source - it is just > 'speculating'. > > > > > Minutello, Nick wrote: >> >> If it helps, on the project where I last used compass, we had what I >> consider to be a small dataset - just a few million documents. Nothing > >> related to indexing/searching took mor
Re: determine which value produced a hit in multivalued field type
Hi, SIREn [1] could provide you such information (return the value index in the multi-valued field). But actually, only a Lucene extension is available, and you'll have to modified a little bit the SIREn query operator to returns you the value position in the query results. [1] http://siren.sindice.com/ -- Renaud Delbru On 22/01/10 22:52, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: Hi, If I have a multiValued field type of text, and I put values [cat,dog,green,blue] in it. Is there a way to tell when I execute a query against that field for dog, that it was in the 1st element position for that multiValued field? Thanks! Tim
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
I'll have to defer that one for now. 2010/1/26 Tim Terlegård > 2010/1/26 Erick Erickson : > > > My indexing script has been running all > > > night and has accomplished nothing. I see lots of disk activity > > > though, which is weird. > > > > > > One explanation would be that you're memory-starved and > > the disk activity you see is thrashing. How much memory > > do you allocate to your JVM? A further indication that > > this is where you should start looking would be if your > > CPU usage is very low at the same time. > > CPU usage was very low. There were lots of free memory. I immediately > thought solr caused the disk activity, but I might have been wrong, > because the disk activity stopped after a while and the indexing still > showed no progress. > > Does this thread dump reveal anything? It doesn't look like solr is doing > much? > > /Tim > > > > > example > > > 1.5.0_22-147 > Java HotSpot(TM) Server VM > > > 20 > 23 > 3 > > > > 64 > pool-8-thread-1 > WAITING > 10430,6650ms > 8602,0210ms > > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) > > at java.util.concurrent.DelayQueue.take(DelayQueue.java:131) > at > java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:533) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:526) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) > > at java.lang.Thread.run(Thread.java:613) > > > > 60 > pool-7-thread-1 > WAITING > 132,0080ms > 13,6950ms > > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) > > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) > > at java.lang.Thread.run(Thread.java:613) > > > > 26 > DestroyJavaVM > RUNNABLE > 1453,5030ms > 1317,7670ms > > > > 25 > Timer-2 > TIMED_WAITING > java.util.taskqu...@6a58c4 > 3,2500ms > 0,8090ms > > at java.lang.Object.wait(Native Method) > at java.util.TimerThread.mainLoop(Timer.java:509) > at java.util.TimerThread.run(Timer.java:462) > > > > 24 > pool-1-thread-1 > WAITING > 41,5590ms > 39,1740ms > > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) > > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) > > at java.lang.Thread.run(Thread.java:613) > > > > 22 > Timer-1 > TIMED_WAITING > java.util.taskqu...@e9d100 > 96,9640ms > 74,5150ms > > at java.lang.Object.wait(Native Method) > at java.util.TimerThread.mainLoop(Timer.java:509) > at java.util.TimerThread.run(Timer.java:462) > > > > 21 > btpool0-9 - Acceptor0 
SocketConnector @ 0.0.0.0:8983 > RUNNABLE > > 26,0110ms > 23,3400ms > > at java.net.PlainSocketImpl.socketAccept(Native Method) > at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) > at java.net.ServerSocket.implAccept(ServerSocket.java:450) > at java.net.ServerSocket.accept(ServerSocket.java:421) > at > org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:97) > > at > org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516) > > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > > > > 20 > btpool0-8 > TIMED_WAITING > org.mortbay.thread.boundedthreadpool$poolthr...@7a17 > 734105,5810ms > 727677,9460ms > > at java.lang.Object.wait(Native Method) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) > > > > > 19 > btpool0-7 > TIMED_WAITING > org.mortbay.thread.boundedthreadpool$poolthr...@7414c8 > 798010,4300ms > 785039,2820ms > > at java.lang.Object.wait(Native Method) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) > > > > > 18 > btpool0-6 > TIMED_WAITING > org.mortbay.thread.boundedthreadpool$poolthr...@e5c339 > 719254,0510ms > 710319,6850ms > > at java.lang.Object.wait(Native Method) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) > > > > >
Re: Lock problems: Lock obtain timed out
We traced one of the lock files, and it had been around for 3 hours. A restart removed it - but is 3 hours normal for one of these locks? Ian. On Mon, Jan 25, 2010 at 4:14 PM, mike anderson wrote: > I am getting this exception as well, but disk space is not my problem. What > else can I do to debug this? The solr log doesn't appear to lend any other > clues.. > > Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990 > Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain > timed > out: NativeFSLock@ > /solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock > at org.apache.lucene.store.Lock.obtain(Lock.java:85) > at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545) > at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1402) > at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:190) > at > > org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98) > at > > org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173) > at > > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220) > at > > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) > at > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) > at > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > at > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) > at > > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > at > > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) > at > > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > at org.mortbay.jetty.Server.handle(Server.java:285) > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > at > > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > at > > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) > at > > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > > Should I consider changing the lock timeout settings (currently set to > defaults)? If so, I'm not sure what to base these values on. 
> > Thanks in advance, > mike > > > On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog wrote: > > > This will not ever work reliably. You should have 2x total disk space > > for the index. Optimize, for one, requires this. > > > > On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé > > wrote: > > > Hi, > > > > > > It seems this situation is caused by some No space left on device > > exeptions: > > > SEVERE: java.io.IOException: No space left on device > > >at java.io.RandomAccessFile.writeBytes(Native Method) > > >at java.io.RandomAccessFile.write(RandomAccessFile.java:466) > > >at > > > org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192) > > >at > > > org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96) > > > > > > > > > I'd better try to set my maxMergeDocs and mergeFactor to more > > > adequates values for my app (I'm indexing ~15 Gb of data on 20Gb > > > device, so I guess there's problem when solr tries to merge the index > > > bits being build. > > > > > > At the moment, they are set to 100 and > > > 2147483647 > > > > > > Jerome. > > > > > > -- > > > Jerome Eteve. > > > http://www.eteve.net > > > jer...@eteve.net > > > > > > > > > > > -- > > Lance Norskog > > goks...@gmail.com > > >
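For reference, the lock-related knobs live in solrconfig.xml; whether they help depends on the root cause (in the quoted thread it was simply a full disk). A sketch using the stock example values, not recommendations:

  <indexDefaults>
    <!-- how long (in ms) an IndexWriter waits to acquire the write lock before failing -->
    <writeLockTimeout>1000</writeLockTimeout>
    <!-- native | simple | single -->
    <lockType>native</lockType>
  </indexDefaults>

With the native lock type the lock is held through OS-level file locking, so a lock file left on disk after a crash does not by itself mean the index is still locked; a lock that stays held for hours usually means an IndexWriter is still open somewhere (for example a second Solr instance or a stuck indexing process), which is worth ruling out before raising the timeout.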
Re: Solr wiki link broken
Hi Erik, one observation from me who is using the wiki from a browser living in a non-US locale: I usually get the standard wiki frontpage (in German) and not (!) the Solr-Frontpage I get, if I use a US locale (or click on the link FrontPage). B.t.w I know that this does not strictly belong to this list. Cheers, Sven --On Dienstag, 26. Januar 2010 04:05 -0500 Erik Hatcher wrote: All seems well now. The wiki does have its flakey moments though. Erik On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
solr1.5
Hi quick question: Is there any release date scheduled for solr 1.5 with all the wonderful patches (StreamingUpdateSolrServer etc ...)? Thank you !
Behaviour Indicative of Throttling
I've been working on benchmarking our Solr response times in relation to a variable number of concurrent queries. With maxThreads=150 I've tried running between 20 and 100 queries concurrently against our Solr instance, and have noted that for all n-way (>20) loads, throughput flatlines at 20-30 requests/second.

We've tried tuning caches, and while part of the poor performance is down to poor query formulation, the fact that I see neither improvement nor degradation as concurrency rises strikes me as indicative of some kind of throttling. I'm not sure if this is the case or not; as a novice in these realms I would appreciate some guidance as to what I should be looking at and where we might be able to tune/investigate. We've ruled out disk contention and network latency.

Useful metrics:
maxThreads: 150
filterCache size: 16384
queryResultCache size: 16384
documentCache size: 10502

I'm running Solr/Lucene version:
Solr Specification Version: 1.3.0.2009.08.19.15.54.27
Solr Implementation Version: 1.4-dev ${svnversion} - rafiq - 2009-08-19 15:54:27
Lucene Specification Version: 2.9-dev

Would be grateful for any pointers and can furnish more details.

--
Raf Gemmail
Senior Developer
www.tmdr.com
d: 0207 3489 912
Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS
Re: Behaviour Indicative of Throttling
Have you tried watching the threads in a monitoring program like VisualVM? We have found that at a certain point the solr software starts locking in the synchronous calls including logging. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From: Raf Gemmail > Reply-To: > Date: Tue, 26 Jan 2010 15:25:51 + > To: > Subject: Behaviour Indicitive of Throttling > > I've been working on benchmarking our solr response times in relation to > the a variable number of concurrent queries. With maxThreads=150 - I've > tried running between 20-100 queries concurrently against our solr > instance and have noted that for all n-way (>20) queries I'm finding > that performance flatlines at 20-30 requests/second. > > We've tried tuning caches and while part of the poor performance is > down to poor query formulation - I find the lack of seeing either > performance improvement or degradation as being indicative of some kind > of throttling. > > Not sure if this is the case or not however as a novice in these realms > I would appreciate some guidance as to what I should be looking at and > where we might be able to tune/investigate? > > We've ruled out disk contention and network latency. > > Useful metrics: > maxThreads:150 > filterCache Size; 16384 > queryResultCache size: 16384 > documentCache size: 10502 > > I'm running Solr/Lucene version: > Solr Specification Version: 1.3.0.2009.08.19.15.54.27Solr Implementation > Version: 1.4-dev ${svnversion} - rafiq - 2009-08-19 15:54:27Lucene > Specification Version: 2.9-dev > > Would be grateful for any pointers and can furnish more details. > > -- > Raf Gemmail > > Software Engineer > www.tmdr.com > 0207 3489 912 > Extension: 5112 > Raf Gemmail > Senior Developer > www.tmdr.com > d: 0207 3489 912 > t: 0845 468 0568 > f: 0845 468 0868 > m: > Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - > This message is sent in confidence for the addressee only. It may contain > privileged > information. The contents are not to be disclosed to anyone other than the > addressee. > Unauthorised recipients are requested to preserve this confidentiality and to > advise > us of any errors in transmission. Thank you. > Trinity Mirror Digital Recruitment ltd is registered in England & Wales. > Registered office: One Canada Square, Canary Wharf, London E14 5AP. > Registered No: 01904765.
replication setup
Hi,

I have set up replication following the wiki. I downloaded the latest apache-solr-1.4 release and exploded it in 2 different directories, and I modified the solrconfig.xml for both the master & the slave as described on the wiki page. In both directories, I started Solr from the example directory.

On the master:
java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8983 -DSTOP.PORT=8078 -DSTOP.KEY=stop.now -jar start.jar

And on the slave:
java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8982 -DSTOP.PORT=8077 -DSTOP.KEY=stop.now -jar start.jar

I can see core0 and core1 when I open the Solr URL. However, I don't see a replication link, and the <solr url>/replication URL returns a 404 error.

I must be doing something wrong. I would appreciate any help!

Thanks a lot,
matt
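In case it helps: with the multicore example, the /replication handler has to be declared in each core's own solrconfig.xml (multicore/core0/conf/solrconfig.xml and so on), and the handler URL then includes the core name. If the handler is missing from the core's config (the multicore example configs are fairly stripped down), the admin page shows no replication link and /replication returns 404, which matches the symptom above. A minimal sketch along the lines of the wiki, with placeholder host/port/interval values:

Master core's solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

Slave core's solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://localhost:8983/solr/core0/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

With the ports used above, the slave's handler would then be reachable at http://localhost:8982/solr/core0/replication.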
Solr wiki link broken
In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
RE: Solr wiki link broken
I'm sorry. Please ignore this duplicate posting. From: Teruhiko Kurosaka Sent: Tuesday, January 26, 2010 8:32 AM To: solr-user@lucene.apache.org Subject: Solr wiki link broken In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
RE: Solr wiki link broken
Sven, You are right. The wiki can't be read if the preferred language is not English. The wiki system seems to implement or be configured to use a wrong way of choosing its locale. Erik, let me know if I can help solving this. Kuro From: Sven Maurmann [sven.maurm...@kippdata.de] Sent: Tuesday, January 26, 2010 7:24 AM To: solr-user@lucene.apache.org Subject: Re: Solr wiki link broken Hi Erik, one observation from me who is using the wiki from a browser living in a non-US locale: I usually get the standard wiki frontpage (in German) and not (!) the Solr-Frontpage I get, if I use a US locale (or click on the link FrontPage). B.t.w I know that this does not strictly belong to this list. Cheers, Sven --On Dienstag, 26. Januar 2010 04:05 -0500 Erik Hatcher wrote: > All seems well now. The wiki does have its flakey moments though. > > Erik > > On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: > >> In >> http://lucene.apache.org/solr/ >> the wiki tab and "Docs (wiki)" hyper text in the side bar text after >> expansion are the link to >> http://wiki.apache.org/solr >> >> But the wiki site seems to be broken. The above link took me to a >> generic help page of the Wiki system. >> >> What's going on? Did I just hit the site in a maintenance time? >> >> Kuro >
Re: Specify logging options from command line in Solr 1.4?
On Mon, Jan 18, 2010 at 19:15, Mark Miller wrote: > Mat Brown wrote: >> Hi all, >> >> Wondering if anyone can point me at a simple way to specify basic >> logging options (log level, log file location) when starting the Solr >> example jar from the command line. >> >> As a bit of background, I maintain a Ruby library for Solr called >> Sunspot that ships with a Solr installation for ease of use. Sunspot >> includes a script for starting Solr with various options, including >> logging options. With Solr 1.3, I was able to write out a >> logging.properties file and then set the system property >> java.util.logging.config.file via the command line; this no longer >> seems to work with Solr 1.4. >> >> I understand that Solr 1.4 has moved to SLF4J, but I haven't been able >> to find a readily available answer to the above question in the SLF4J >> or Solr logging documentation. To be honest, I've always found logging >> in Java rather mystifying. >> >> Any help much appreciated! >> Mat >> > By default, even though Solr uses SLF4J, it will actually use the Java > Util logging Impl: > > http://wiki.apache.org/solr/SolrLogging > > So you just specify a util logging properties file on the sommand line with: > > -Djava.util.logging.config.file=myLoggingConfigFilePath > > An example being: > > handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler > > # Default global logging level. > # Loggers and Handlers may override this level > .level=INFO > > java.util.logging.ConsoleHandler.level=INFO > java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter > > > # --- FileHandler --- > # Override of global logging level > java.util.logging.FileHandler.level=ALL > > # Naming style for the output file: > # (The output file is placed in the directory > # defined by the "user.home" System property.) > java.util.logging.FileHandler.pattern=%h/java%u.log > > # Limiting size of output file in bytes: > java.util.logging.FileHandler.limit=5 > > # Number of output files to cycle through, by appending an > # integer to the base file name: > java.util.logging.FileHandler.count=1 > > # Style of output (Simple or XML): > java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter > > > -- > - Mark > > http://www.lucidimagination.com > > > > Hey Mark, Thanks very much for this - using the java.util.logging properties does indeed work just fine. Cheers, Mat
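For completeness, the whole invocation with the example Jetty launcher ends up as a single line, with the system property placed before -jar (the path here is just whatever file Sunspot generates):

  java -Djava.util.logging.config.file=/path/to/logging.properties -jar start.jar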
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
On Mon, Jan 25, 2010 at 7:27 PM, Jake Brownell wrote: > I swapped our indexing process over to the streaming update server, but now > I'm seeing places where our indexing code adds several documents, but > eventually hangs. It hangs just before the completion message, which comes > directly after sending to solr. I found this issue in jira > > https://issues.apache.org/jira/browse/SOLR-1711 I just reviewed and committed this patch, if you want to try solr-trunk. -Yonik http://www.lucidimagination.com
RE: Solr wiki link broken
One more comment on this. I can see this page http://wiki.apache.org/solr/SolrTomcat w/o a problem, for example. Or I can see this: http://wiki.apache.org/solr/FrontPage I think it's only the main page without actual page name http://wiki.apache.org/solr/ that is having the problem. So the quick fix to this is to avoid solr/ and use the solr/FrontPage link. Kuro
Mail config
Hi, I do not want to receive all the emails from this mail list, I only want to receive the answers to my questions, is this possible? If I am not mistaken when I unsubscribed I sent an email which did not reach the mail list at all (therefore there was of course no chance to get any replies). How can I send questions and receive the replies but not to receive all other posts? I am newbie for Solr and I doubt I can contribute much by answering to other posts. -- Best regards, Bogdan
To store or not to store serialized objects in solr
Hi,

We currently store all of our data in an SQL database and use Solr for indexing. We get a list of ids from Solr and retrieve the data from the db.

We are considering storing all the data in Solr to simplify administration and remove any synchronisation, and are weighing the following options:

1. storing the data in individual fields in Solr (indexed=true, stored=true)
2. storing the data in a serialized form in a binary field in Solr (using Google protocol buffers or similar) and keeping the rest of the Solr fields as indexed=true, stored=*false*
3. keeping things as they are: data stored in the db, and Solr fields kept as indexed=true, stored=false

Can anyone provide some advice on the performance of the different approaches? Are there any obvious pitfalls to options 1 and 2 that I need to be mindful of?

I am thinking option 2 would be the fastest, as it would be reading the data in one contiguous block. I will be doing some performance tests to verify this soon.

FYI, we are looking at 5-10M records; a serialised object is 500 to 1000 bytes and we index approx 20 fields.

Thanks for any advice.
andre
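A rough sketch of what options 1 and 2 look like in schema.xml terms. The field names are made up, and for option 2 the serialized bytes are assumed to be base64-encoded into a plain stored string field, which avoids depending on a dedicated binary field type being available in your Solr version:

  <!-- option 1: each attribute indexed and stored individually -->
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>
  <field name="price" type="tfloat" indexed="true" stored="true"/>

  <!-- option 2: searchable fields not stored; the whole record stored once as an opaque payload -->
  <field name="title"   type="text"   indexed="true"  stored="false"/>
  <field name="price"   type="tfloat" indexed="true"  stored="false"/>
  <field name="payload" type="string" indexed="false" stored="true"/>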
Query 2 Cats
Sorry if this is a poor Q, but I can't seem to get it to work.

I have a field called cat set up so I can query against specific categories.

It's OK if I search all or one, but I can't seem to make it search over multiples.

ie q=string AND cat:name1 AND cat:name2

I have tried the following variations.

cat:name1,name2
cat:name1+name2

I have also tried using & instead of AND, with still the same results.

Hope you can help!!

Thank you in advance
RE: DataImportHandler TikaEntityProcessor FieldReaderDataSource
Hi Jorg,

This is working now. If you look at SOLR-1583 (http://issues.apache.org/jira/browse/SOLR-1583) you can see that an InputStream was needed from the DataSource for file and URL data sources. The same is true for the FieldReaderDataSource. I created a class, BinFieldReaderDataSource, that returns an InputStream rather than a Reader for the BLOB.

I am working off the trunk code from a few days ago, which I checked out using TortoiseSVN and compiled using the ant that was in my Eclipse plugin directory, a fairly painless process. I am somewhat new to open source development, so for now I have just copied the text of the java file and my xml config below.

# BinFieldReaderDataSource.java

package org.apache.solr.handler.dataimport;

import static org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;
import static org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow;

import java.io.InputStream;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.sql.Blob;
import java.sql.Clob;
import java.util.Properties;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BinFieldReaderDataSource extends DataSource<InputStream> {
  private static final Logger LOG = LoggerFactory.getLogger(FieldReaderDataSource.class);

  protected VariableResolver vr;
  protected String dataField;
  private String encoding;
  private EntityProcessorWrapper entityProcessor;

  public void init(Context context, Properties initProps) {
    dataField = context.getEntityAttribute("dataField");
    encoding = context.getEntityAttribute("encoding");
    entityProcessor = (EntityProcessorWrapper) context.getEntityProcessor();
    /* no op */
  }

  public InputStream getData(String query) {
    Object o = entityProcessor.getVariableResolver().resolve(dataField);
    if (o == null) {
      throw new DataImportHandlerException(SEVERE, "No field available for name : " + dataField);
    }
    if (o instanceof String) {
      throw new DataImportHandlerException(SEVERE, "Unsupported field type: String");
    } else if (o instanceof Clob) {
      throw new DataImportHandlerException(SEVERE, "Unsupported field type: CLOB");
    } else if (o instanceof Blob) {
      Blob blob = (Blob) o;
      try {
        // Most of the JDBC drivers have getBinaryStream defined as public
        // so let us just check it
        Method m = blob.getClass().getDeclaredMethod("getBinaryStream");
        if (Modifier.isPublic(m.getModifiers())) {
          return getInputStream(m, blob);
        } else {
          // force invoke
          m.setAccessible(true);
          return getInputStream(m, blob);
        }
      } catch (Exception e) {
        LOG.info("Unable to get data from BLOB");
        return null;
      }
    } else {
      return null;
    }
  }

  static Reader readCharStream(Clob clob) {
    try {
      Method m = clob.getClass().getDeclaredMethod("getCharacterStream");
      if (Modifier.isPublic(m.getModifiers())) {
        return (Reader) m.invoke(clob);
      } else {
        // force invoke
        m.setAccessible(true);
        return (Reader) m.invoke(clob);
      }
    } catch (Exception e) {
      wrapAndThrow(SEVERE, e, "Unable to get reader from clob");
      return null; // unreachable
    }
  }

  private InputStream getInputStream(Method m, Blob blob)
      throws IllegalAccessException, InvocationTargetException, UnsupportedEncodingException {
    InputStream is = (InputStream) m.invoke(blob);
    return is;
  }

  public void close() {
  }
}

## Tika-data-config.xml

Nirmal Shah

-----Original Message-----
From: Jorg Heymans [mailto:jorg.heym...@gmail.com]
Sent: Tuesday, January 26, 2010 3:43 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Hi Shah,

I am assuming you are talking about the integration of SOLR-1358, i am very interested in this feature as well. Did you get it to work ? Is there a snapshot build available for this somewhere or do i have to build solr from source myself ?

Thanks,
Jorg

On Mon, Jan 25, 2010 at 6:27 PM, Shah, Nirmal wrote: > Hi, > > > > I am
Re: To store or not to store serialized objects in solr
Hello Andre, We have used this approach before. We did keep all our data in a RDBMS but added serialized objects to the index so we could simply query the record and display it as is, without any hassle and SQL connections. Although storing this data sounds a bit strange, it actually works well and keeps things a bit simpler. The performance of querying the index is the same (or with extremely tiny differences). However, it does take some additional disk space and transfer time for it to reach your application. On the other hand, performance would surely be weaker if you would transfer the same data (although in a not so verbose XML format) and need to connect and query a SQL server. Cheers, Andre Parodi said: > Hi, > > We currently are storing all of our data in sql database and use solr > for indexing. We get a list of id's from solr and retrieve the data from > the db. > > We are considering storing all the data in solr to simplify > administration and remove any synchronisation and are considering the > following: > > 1. storing the data in individual fields in solr (indexed=true, > store=true) 2. storing the data in a serialized form in a binary field > in solr (using google proto buffers or similar) and keep the rest of > the solr fields as indexed=true, stored=*false*. > 3. keep as is. data stored in db and just keep solr fields as > indexed=true, stored=false > > Can anyone provide some advice in terms of performance of the different > approaches. Are there any obvious pitfalls to option 1 and 2 that i need > to be mindful of? > > I am thinking option 2 would be the fastest as it would be reading the > data in one contiguous block. Will be doing some preformance test to > verify this soon. > > FYI we are looking at 5-10M records, a serialised object is 500 to 1000 > bytes and we index approx 20 fields. > > Thanks for any advice. > andre
Re: Query 2 Cats
Tell us more about the cat field. Is there one (and only one) value per document? Or are there multiple values per document? Because if there's only one cat value/doc, you want something like q=string AND (cat:name1 OR cat:name2) Erick On Tue, Jan 26, 2010 at 1:52 PM, Lee Smith wrote: > Sorry of this is a poor Q but cant seem to get it to work. > > I have a field called cat setup so I can query against specific categories. > > It ok I search all or one but cant seem to make it search over multiples. > > ie q=string AND cat:name1 AND cat:name2 > > I have tried the following variations. > > cat:name1,name2 > cat:name1+name2 > > I have also tried using & instead of AND with still same results. > > Hope you can help !! > > Thank you in advance > >
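Two equivalent ways to write the same thing, assuming cat is the only constraint besides the keyword (field grouping and filter queries are both standard syntax):

  q=string AND cat:(name1 OR name2)
  q=string&fq=cat:(name1 OR name2)

The fq form has the side benefit that the category filter is cached in the filterCache independently of the keyword part of the query.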
Re: Query 2 Cats
Try > q=string AND (cat:name1 OR cat:name2) On 26 Jan 2010, at 18:53, "Lee Smith" wrote: > Sorry of this is a poor Q but cant seem to get it to work. > > I have a field called cat setup so I can query against specific > categories. > > It ok I search all or one but cant seem to make it search over > multiples. > > ie q=string AND cat:name1 AND cat:name2 > > I have tried the following variations. > > cat:name1,name2 > cat:name1+name2 > > I have also tried using & instead of AND with still same results. > > Hope you can help !! > > Thank you in advance >
Re: Query 2 Cats
Thank you Dave, Eric Worked a charm On 26 Jan 2010, at 18:58, Dave Searle wrote: > Try > >> q=string AND (cat:name1 OR cat:name2) > > > On 26 Jan 2010, at 18:53, "Lee Smith" wrote: > >> Sorry of this is a poor Q but cant seem to get it to work. >> >> I have a field called cat setup so I can query against specific >> categories. >> >> It ok I search all or one but cant seem to make it search over >> multiples. >> >> ie q=string AND cat:name1 AND cat:name2 >> >> I have tried the following variations. >> >> cat:name1,name2 >> cat:name1+name2 >> >> I have also tried using & instead of AND with still same results. >> >> Hope you can help !! >> >> Thank you in advance >>
Basic questions about Solr cost in programming time
Hi, I hope this message is OK for this list. I'm looking into search solutions for an intranet site built with Drupal. Eventually we'd like to scale to enterprise search, which would include the Drupal site, a document repository, and Jive SBS (collaboration software). I'm interested in Lucene/Solr because of its scalability, faceted search and optimization features, and because it is free. Our problem is that we are a non-profit organization with only three very busy programmers/sys admins supporting our employees around the world. To help me argue for Solr in terms of total cost, I'm hoping that members of this list can share their insights about the following: * About how many hours of programming did it take you to set up your instance of Lucene/Solr (not counting time spent on optimization)? * Are there any disadvantages of going with a certified distribution rather than the standard distribution? Thanks and best regards, Jeff Jeff Crump jcr...@hq.mercycorps.org
Re: Basic questions about Solr cost in programming time
On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump wrote: > Hi, > I hope this message is OK for this list. > > I'm looking into search solutions for an intranet site built with Drupal. > Eventually we'd like to scale to enterprise search, which would include the > Drupal site, a document repository, and Jive SBS (collaboration software). > I'm interested in Lucene/Solr because of its scalability, faceted search > and > optimization features, and because it is free. Our problem is that we are a > non-profit organization with only three very busy programmers/sys admins > supporting our employees around the world. > > To help me argue for Solr in terms of total cost, I'm hoping that members > of > this list can share their insights about the following: > > * About how many hours of programming did it take you to set up your > instance of Lucene/Solr (not counting time spent on optimization)? > > For me this generally took 30 to 70 hours to create the entire search application depending on the features on the web application and the complexity of the site. > * Are there any disadvantages of going with a certified distribution rather > than the standard distribution? > > > The people at Lucid Imagination can probably provide a better answer for this. It is not really a disadvantage to go with the certified version but you may have to pay in order to get the certified distribution. However, you will get dedicated support if you happen to run into any issues or need technical assistance. If you use the standard version you can always get help from the mailing list if you have any issues. > Thanks and best regards, > Jeff > > Jeff Crump > jcr...@hq.mercycorps.org > > > > > > > > > > > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
SOLR index file system size estimate
We want to estimate the file system size requirements for the index. Although disk space is usually cheap, it isn't so here, as we have to go through a process to add space to the file system, and we don't want to underestimate and have to kick that process off again. Is there an estimation tool that can give a number based on the estimated size of each document? What percentage should we add to the raw document size, considering we run all kinds of analysis/filters on the text? We currently have only 70 documents of about 20k each, but the number of documents will soon grow to more than 10K. We would like to request space with the future in mind. Any help is appreciated. Thanks, Pavan.
RE: matching exact/whole phrase
Extending this thread. Is it safe to say in order to do exact matches the field should be a string. Let say for example i have two fields on is caption which is of type string and the other is regular text. So if i index caption as "my car is the best car in the world" it will be stored and i copy the caption to the text field. Since text has all anylysers defined so lets assume only the following words are indexed after stop words and other filters "my", "car","best","world" Now in my dismax handler if i have the qf defined as text field and run a phrase search on text field "my car is the best car in the world" i dont get back any results. looking with debugQuery=on this is the parsedQuery text:"my tire pressure warning light came my honda civic" This will not work since text was indexed by removing all stop words. But if i remove the double quotes it matches that document. Now if i add extra query field &qf=caption and then do a phrase search i get back that document since caption is of type string and it maintains all the stop words and other stuff. Is my assumption correct. After i get a response i will put some more questions. Thanks darniz Sandeep Shetty-2 wrote: > > That was the answer I was looking for, I will try that one out > > Thanks Daniel > > -Original Message- > From: Daniel Papasian [mailto:daniel.papas...@chronicle.com] > Sent: 01 April 2008 16:03 > To: solr-user@lucene.apache.org > Subject: Re: matching exact/whole phrase > > Sandeep Shetty wrote: >> Hi people, >> >> I am looking to provide exact phrase match, along with the full text >> search with solr. I want to achieve the same effect in solr rather >> than use a separate SQL query. I want to do the following as an >> example >> >> The indexed field has the text "car repair" (without the double >> quotes) for a document and I want this document to come in the >> search result only if someone searches for "car repair". The document >> should not show up for "repair" and "car" searches. >> >> Is it possible to do this type of exact phrase matching if needed >> with solr itself? > > It sounds like you want to do an exact string match, and not a text > match, so I don't think there's anything complex you'd need to do... > just store the field with "car repair" as type="string" and do all of > the literal searches you want. > > But if you are working off a field that contains something beyond the > exact match of what you want to search for, you'll just need to define a > new field type and use only the analysis filters that you need, and > you'll have to think more about what you need if that's the case. > > Daniel > > Sandeep Shetty > Technical Development Manager > > Touch Local > 89 Albert Embankment, London, SE1 7TP, UK > D: 020 7840 4335 > E: sandeep.she...@touchlocal.com > T: 020 7840 4300 > F: 020 7840 4301 > > This email is confidential and may also be privileged. If you are not the > intended recipient please notify us immediately by calling 020 7840 4300 > or email postmas...@touchlocal.com. You should not copy it or use it for > any purpose nor disclose its contents to any other person. Touch Local Ltd > cannot accept liability for statements made which are clearly the sender's > own and are not made on behalf of the firm. > Registered in England and Wales. Registration Number: 2885607 VAT Number: > GB896112114 > > Help to save some trees. Print e-mails only if you really need to. 
> > -- View this message in context: http://old.nabble.com/matching-exact-whole-phrase-tp16424969p27329651.html Sent from the Solr - User mailing list archive at Nabble.com.
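To make the arrangement darniz describes concrete, a minimal schema.xml sketch (the field names follow the thread; the rest is illustrative): keep the literal value in a string field for exact matches and copy it into an analyzed text field for free-text search.

  <field name="caption" type="string" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="caption" dest="text"/>

A query that must match the caption verbatim can then be pointed at the string field, e.g. fq=caption:"my car is the best car in the world", while ordinary keyword queries go against the analyzed text field.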
How to Create dynamic field names using script transformers
Hi, I am trying to generate a dynamic field name using custom transformers but couldn't achieve the expected results. My requirement is that I do not want to hardcode some of the field names used by SOLR for indexing; instead, the field names should be generated from data retrieved from a table. Any help in this regard is greatly appreciated. Thanks, Barani -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27329876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Create dynamic field names using script transformers
Barani - Give us some details of what you tried, what you expected to happen, and what actually happened. Erik On Jan 26, 2010, at 4:15 PM, JavaGuy84 wrote: Hi, I am trying to generate a dynamic fieldname using custom transformers but couldn't achieve the expected results. My requirement is that I do not want to hardcode some of field names used by SOLR for indexing, instead the field name should be generated using the data retreieved from a table. Any help on this regard is greatly appreciated. Thanks, Barani -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27329876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Create dynamic field names using script transformers
Hey Erik, Thanks a lot for your reply.. I am a newbie to SOLR ... I am just trying to use the example present in Apache WIKI to understand "how" the scriptTransformer works. I want to know how to pass the data from table.field to transformer and get back the data from transformer and set the value to any field. Basically I want a field like... and index this field so that users can search on this dynamic field and get the corresponding data also. Thanks, Barani Erik Hatcher-4 wrote: > > Barani - > > Give us some details of what you tried, what you expected to happen, > and what actually happened. > > Erik > > > On Jan 26, 2010, at 4:15 PM, JavaGuy84 wrote: > >> >> Hi, >> >> I am trying to generate a dynamic fieldname using custom >> transformers but >> couldn't achieve the expected results. >> >> My requirement is that I do not want to hardcode some of field names >> used by >> SOLR for indexing, instead the field name should be generated using >> the data >> retreieved from a table. >> >> Any help on this regard is greatly appreciated. >> >> Thanks, >> Barani >> -- >> View this message in context: >> http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27329876.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > > -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27330330.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Create dynamic field names using script transformers
To add some more details, this is what I am trying to achieve: there are 2 fields present in a database table and I am trying to turn those 2 fields into a key/value pair. E.g.: consider 2 fields associated with each other (Propertyid and propertyValue); I want the property id as the field name and the property value as its field value... something like <111>Test<1> Thanks, Barani JavaGuy84 wrote: > Hi, > > I am trying to generate a dynamic fieldname using custom transformers but > couldn't achieve the expected results. > > My requirement is that I do not want to hardcode some of field names used > by SOLR for indexing, instead the field name should be generated using the > data retreieved from a table. > > Any help on this regard is greatly appreciated. > > Thanks, > Barani > -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27330470.html Sent from the Solr - User mailing list archive at Nabble.com.
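One way to turn key/value rows into field names is DIH's ScriptTransformer; the sketch below follows the wiki's general pattern, and the data source, column names, and prop_ prefix are assumptions rather than anything confirmed in the thread (the script engine needs Java 6).

  <dataConfig>
    <dataSource driver="your.jdbc.Driver" url="jdbc:..." />  <!-- placeholder connection -->
    <script><![CDATA[
      function makeDynamicField(row) {
          // use one column's value as the field name and the other's as the field value
          row.put('prop_' + row.get('PROPERTYID'), row.get('PROPERTYVALUE'));
          row.remove('PROPERTYID');
          row.remove('PROPERTYVALUE');
          return row;
      }
    ]]></script>
    <document>
      <entity name="props" transformer="script:makeDynamicField"
              query="select propertyid, propertyvalue from properties"/>
    </document>
  </dataConfig>

schema.xml then needs a matching dynamic field so the generated names are accepted, e.g. <dynamicField name="prop_*" type="string" indexed="true" stored="true"/>.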
RE: determine which value produced a hit in multivalued field type
I guess it's not possible for all types then: int, sdate, etc. Because, Highlighting will only work on text fields. -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Monday, January 25, 2010 3:47 PM To: solr-user@lucene.apache.org Subject: Re: determine which value produced a hit in multivalued field type Thanks Erik, I did not know about the order guarantee for indexed multivalue fields. Timothy, it could be more than one term matches the queries. Highlighting will show you which terms matched your query. You'll have to post-process the results. On Mon, Jan 25, 2010 at 7:26 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: > If a simple "no" is the answer I'd be glad if anyone could confirm. > > Thanks. > > -Original Message- > From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] > [mailto:timothy.j.har...@nasa.gov] > Sent: Friday, January 22, 2010 2:53 PM > To: solr-user@lucene.apache.org > Subject: determine which value produced a hit in multivalued field type > > Hi, > If I have a multiValued field type of text, and I put values > [cat,dog,green,blue] in it. Is there a way to tell when I execute a query > against that field for dog, that it was in the 1st element position for that > multiValued field? > > Thanks! > Tim > > -- Lance Norskog goks...@gmail.com
Re: SOLR index file system size estimate
10K documents of 20K each is only 200M as a base, so I don't think you need to worry. Especially since your question is unanswerable given the number of variables About the only thing you can really do is measure, with the understanding that the first documents are more expensive space-wise than later documents. So, assuming your documents are similar, index the first 5,000, then index the next 2000 and use the size delta to calculate the average index growth/document. That'll give you a pretty good idea in *your* environment with *your* index structure.. But, again, this is not much data to index, so I really think you'll be fine. HTH Erick On Tue, Jan 26, 2010 at 3:41 PM, SHS SOLR wrote: > We wanted to estimate the file system size requirements for index. Although > space very cheap, its not so here as we have to go through a process to add > space to the file system. So we don't want to end up estimating less and > get > the process to kick in. > > Is there a estimate tool for index sizes that can give a number based on > estimated size of each document? How much % should we add to the actual > document size considering we do all kinds of analysis/filters on text? > > We are currently looking at only 70 documents each 20k size. But the number > of documents will increase to more than 10K soon. We would like to request > for some space keeping in mind about the future. > > Any help is appreciated. > > Thanks, > Pavan. >
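A cheap way to run the measurement Erick describes is to watch the index directory on disk between batches (the path below is the stock example layout; adjust for your install, and commit before each measurement so segments are flushed):

  $ du -sh example/solr/data/index   # after the first 5,000 docs
  $ du -sh example/solr/data/index   # after the next 2,000 docs

The difference divided by 2,000 gives a rough bytes-per-document figure to multiply by the document count you expect; it is also worth leaving headroom, since an optimize can transiently need on the order of twice the index size while segments are merged.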
Re: Dynamic boosting of ids at search time
: I mean, if for query x, ids to be boosted are 243452,346563,773567, then for : query y the ids to be boosted won't be the same. They are calculated at the : search time. : Also, I cant keep them in the lucene query as the list goes in thousands. : Please suggest a good resolution to it. I'm at a loss here ... your first sentence seems to suggest that every unique request needs to specify a distinct list of IDs to give a bosted score too, but your second sentence clarifies that it's infeasible for you to include the IDs in the query. that seems tantamount to saying "everytime i do a solr search, the rules about what is important change; but the rules are too long for me to tell solr what they are everytime i do a search." ... that's a catch-22. My best suggestion based on what little i understand of the information you're provided is to suggest that perhaps you could write a custom plugin ... either a RequestHandler, or a SearchComponent, or a QParser depending on what works best for your use cases ... where the client might be able to pass some "key" that can be used by the plugin to "look up" the list of IDs from some other data source and to build the query that way. ...but given how little i understnad about what it is you are trying to do, i suspect my best guess really isnt' a very good one. Frankly, this is starting to smell like an XY Problem http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Comparison of Solr with Sharepoint Search
: Has anyone done a functionality comparison of Solr with Sharepoint/Fast : Search? there's been some discussion on this over the years comparing Solr with FAST if you go looking for it... http://old.nabble.com/SOLR-X-FAST-to14284618.html http://old.nabble.com/Replacing-FAST-functionality-at-sesam.no-td19186109.html http://old.nabble.com/Experiences-from-migrating-from-FAST-to-Solr-td26371613.html http://sesat.no/moving-from-fast-to-solr-review.html ...i have no idea about Sharepoint Search (isn't that actaully a seperate system? ... Microsoft Search Server or something?) -Hoss
Re: How can I boost bq in FieldQParserPlugin?
: My original query is: : http://myhost:8080/solr/select?q=ipod&*bq=userId:12345^0.5* : &fq=&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&debugQuery=on&explainOther=&hl.fl= : But I would like to place bq phrase in the default solrconfig.xml : configuration to make the query string more brief, so I did the following? : http://myhost:8080/solr/select?q=ipod&*bq={!field f=userId v=$qq}&qq=12345* : However, filedQueryParser doesn't accespt a boost parameter, then what shall ...the issue is not that "filedQueryParser doesn't accespt a boost parameter"; the problem is that the weight syntax from your original bq (the "^0.5" part) is actually syntax from the standard parser -- and you aren't using that parser any more (the distinction between query syntax and params is significant). I haven't tried this, but I think it might do what you want... q=ipod&bq={!dismax qf=userId^0.5 v=$qq}&qq=12345&qt=dismax ...but you might have to put other blank params inside that {!dismax} block to keep them from getting inherited from the outer query (I can't remember how that logic works off the top of my head) -Hoss
Re: Design Question - Dynamic Field Names (*)
: - We are indexing CSV files and generating field names dynamically from the : "header" line. : User should be able to *list all the possible header names* (i.e. dynamic : field names), and filter results based on some of the field names. : - Also, list* all possible values* associated to for a given field name. #1) the LukeRequestHandler can list all field names in the index. #2) the TermsComponent or Faceting can list all *indexed* values in a given field ... which one you'll want to use depends largely on what you want to do with that list. -Hoss
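For reference, against the example server both of those look roughly like this (core URL and field name are placeholders):

  http://localhost:8983/solr/admin/luke?numTerms=0
  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=headerName&facet.limit=-1

The first returns every field actually present in the index without the per-field top-terms computation; the second returns all indexed values of one field along with their document counts.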
Multiple Cores Vs. Single Core for the following use case
Hi Shall I set up Multiple Core or Single core for the following use case: I have X number of users. When I do a search, I always know for which user I am doing a search Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add a userId field to each document? If I choose the 1 core solution then I am concerned with performance. Let's say I search for "NewYork" ... If lucene returns all "New York" matches for all users and then filters based on the userId, then this is going to be less efficient than if I have sharded per user and send the request for "New York" to the user's core Thank you for your help matt
RE: Solr wiki link broken
: You are right. The wiki can't be read if the preferred language is not English. : The wiki system seems to implement or be configured to use a wrong way of choosing its locale. : Erik, let me know if I can help solving this. Interesting. When accessing "http://wiki.apache.org/solr/" MoinMoin evidently picks a "translated" version of the page to show each user based on the "Accept-Language" header sent by the browser. If it's "en" or unset, you get the same thing as http://wiki.apache.org/solr/FrontPage -- but if you have some other preferred language configured in your browser, then you get a different page; for example "de" causes http://wiki.apache.org/solr/StartSeite to be loaded instead. (This behavior can be forced in spite of the "Accept-Language" header sent by the browser if you are logged into the wiki and change the "Preferred language" setting from "" to something else ... but I don't recommend it, since I was stuck with German for about 10 minutes and got 500 errors every time I tried to change my preferences back.) This is presumably designed to make it easy to support a multilanguage wiki, with users getting language-specific "homepages" that can then link out to language-specific versions of pages -- but that doesn't really help us much since we don't have any meaningful content on those language-specific homepages. According to this... http://wiki.apache.org/solr/HelpOnLanguages ...we should be deleting all those unused pages, or have INFRA change our wiki config so that something other than FrontPage is our default (which now explains why Lucene-Java has "FrontPageEN" as the default). Any volunteers to help purge the wiki of (effectively) blank translation pages? ... it looks like they all (probably) have the comment "##master-page:FrontPage" at the top, so they should be easy to identify even if you don't speak the language ... but they aren't very easy to search for, since those comments don't appear in the generated page. -Hoss
How to index the fields as key value pair if a query returns multiple rows
Hi all, I have a scenario where a particular query returns multiple results and I need to map those results as a key value pair. Ex: http://old.nabble.com/How-to-index-the-fields-as-key-value-pair-if-a-query-returns-multiple-rows-tp27332475p27332475.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Comparison of Solr with Sharepoint Search
I can only tell that Liferay Portal (WebDAV) Document Library Portlet has same functionality as Sharepoint (it has even /servlet/ URL with suffix '/sharepoint'); Liferay also has plugin (web-hook) for SOLR (it has generic search wrapper; any kind of search service provider can be hooked in Liferay) All assets (web content, message board posts, documents, and etc.) can implement "indexing" interface and get indexed (Lucene, SOLR, etc) So far, it is the best approach. You can enjoy configuring SOLR analyzers/fields/language/stemmers/dictionaries/... You can't do it with MS-Sharepoint (or, for instance, their close competitors Alfresco)!!! -Fuad http://www.tokenizer.ca > -Original Message- > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > Sent: January-26-10 7:49 PM > To: solr-user@lucene.apache.org > Subject: Re: Comparison of Solr with Sharepoint Search > > > : Has anyone done a functionality comparison of Solr with > Sharepoint/Fast > : Search? > > there's been some discussion on this over the years comparing Solr with > FAST if you go looking for it... > > http://old.nabble.com/SOLR-X-FAST-to14284618.html > http://old.nabble.com/Replacing-FAST-functionality-at-sesam.no- > td19186109.html > http://old.nabble.com/Experiences-from-migrating-from-FAST-to-Solr- > td26371613.html > http://sesat.no/moving-from-fast-to-solr-review.html > > ...i have no idea about Sharepoint Search (isn't that actaully a > seperate > system? ... Microsoft Search Server or something?) > > > -Hoss
Re: Basic questions about Solr cost in programming time
Having worked quite a bit on the Drupal integration - here's my quick take: If you have someone help you the first time, you can have a basic implementation running in Jetty in about 15 minutes. On your own, a couple hours maybe. For a non-public site (intranet) with modest traffic and no requirements for high availability, that is likely going to hold you for a while. If you are not already using tomcat6 and want a more robust deployment, getting that right will take you a couple days work I'd guess. There are already some options for indexing/searching documents via the Drupal integration, but that's still a little rough. Of course, we'd also be happy to have you get Drupal support and a hosted Solr index from us at Acquia. http://acquia.com/products-services/acquia-search-features However, I don't think you'll readily be able to use our service with Jive at the moment - you don't really describe why you'd be using both Jive and Drupal. If you are not doing any customization and compiling the java isn't something you enjoy, I'd think the certified distribution is a fine place to start and you can get with it Lucid's free PDF book, which is, I think, by far the best and most comprehensive Solr 1.4 reference work that exists at the moment. -Peter On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump wrote: > Hi, > I hope this message is OK for this list. > > I'm looking into search solutions for an intranet site built with Drupal. > Eventually we'd like to scale to enterprise search, which would include the > Drupal site, a document repository, and Jive SBS (collaboration software). > I'm interested in Lucene/Solr because of its scalability, faceted search and > optimization features, and because it is free. Our problem is that we are a > non-profit organization with only three very busy programmers/sys admins > supporting our employees around the world. > > To help me argue for Solr in terms of total cost, I'm hoping that members of > this list can share their insights about the following: > > * About how many hours of programming did it take you to set up your > instance of Lucene/Solr (not counting time spent on optimization)? > > * Are there any disadvantages of going with a certified distribution rather > than the standard distribution? > > > Thanks and best regards, > Jeff > > Jeff Crump > jcr...@hq.mercycorps.org > > > > > > > > > > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
Sorry for not following up sooner- been a busy last couple weeks. We do see a significant instanity count - could this be due to updating indexes from the dev Solr build? E.g. on one server I see 61 and entries like: SUBREADER: Found caches for decendents of org.apache.lucene.index.readonlydirectoryrea...@2b8d6cbf+created 'org.apache.lucene.index.readonlydirectoryrea...@2b8d6cbf'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#2002656056 (size =~ 74.4 KB) 'org.apache.lucene.store.niofsdirectory$niofsindexin...@47adeb94'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1099177573 (size =~ 74.4 KB) SUBREADER: Found caches for decendents of org.apache.lucene.index.readonlydirectoryrea...@d0340a9+created 'org.apache.lucene.index.readonlydirectoryrea...@d0340a9'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#868132357 (size =~ 831.2 KB) 'org.apache.lucene.store.niofsdirectory$niofsindexin...@78802615'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1542727931 (size =~ 831.2 KB) And I think it's higher on the one associated with the screenshot. using the lucene checkIndex tool does not show any errors. Most of what we want is returned by the Luke handler, except for the pending adds and deletes and the index size. I can hack around this by creating a greatly reduced stats.jsp, but I'd also liek to understand what we are experiencing. -Peter On Fri, Jan 8, 2010 at 1:38 PM, Mark Miller wrote: > Yonik Seeley wrote: >> On Fri, Jan 8, 2010 at 1:03 PM, Mark Miller wrote: >> >>> It should be fixed in trunk, but that was after 1.4. Currently, it >>> should only do it if it sees insanity - which there shouldn't be any >>> with stock Solr. >>> >> >> http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.0/src/java/org/apache/solr/search/SolrFieldCacheMBean.java >> http://svn.apache.org/viewvc?view=revision&revision=826788 >> Seems like it's there? Or was it a different commit? >> >> Perhaps there is just real instanity... which may be unavoidable at >> this point since not everything in solr is done per-segment yet. >> >> -Yonik >> http://www.lucidimagination.com >> > > Your right - when looking at the Solr release date, I quickly took the > 10 as October - but it was 11/10, so it is in 1.4. > > So people seeing this should also being seeing an insanity count over one. > > I'd think that would be rarer than one this sounds like though ... whats > left that could cause insanity? > > We should prob switch to never calculating the size unless an explicit > param is pass to the stats page. > > > -- > - Mark > > http://www.lucidimagination.com > > > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: schema.xml and Xinclude
It doesn't really work with the schema.xml - I beat my head on it for a few hours not long ago - maybe I sent an e-mail to this list about it? Yes, here: http://www.lucidimagination.com/search/document/ba68aa6f2f7702c3/is_it_possible_to_use_xinclude_in_schema_xml -Peter On Wed, Jan 6, 2010 at 8:36 AM, Patrick Sauts wrote: > As in schema.xml are the same between all our indexes, I'd like to > make them an XInclude so I tried : > > > > xmlns:xi="http://www.w3.org/2001/XInclude";> > > > > - > - > - > > > My Syntax might not be correct ? > Or it is not possible ? yet ? > > Thank you again for your time. > > Patrick. > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
On Tue, Jan 26, 2010 at 8:49 PM, Peter Wolanin wrote: > Sorry for not following up sooner- been a busy last couple weeks. > > We do see a significant instanity count - could this be due to > updating indexes from the dev Solr build? E.g. on one server I see Do you both sort (or use a function query) and facet on the "created" field? Faceting on single-valued fields is still currently done at the top-level reader, while sorting and function queries are at a segment level. -Yonik http://www.lucidimagination.com
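For anyone else reading along, the pattern being asked about is a request that both sorts and facets on the same single-valued field, e.g. (illustrative URL):

  http://localhost:8983/solr/select?q=*:*&sort=created+asc&facet=true&facet.field=created

On 1.4 the facet counts use a top-level FieldCache entry while the sort uses per-segment entries, so the sanity checker reports the overlap under SUBREADER even though nothing is broken; the practical cost is that the field's values end up cached twice.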
Re: Multiple Cores Vs. Single Core for the following use case
Hi Matt, In most cases you are going to be better off going with the userid method unless you have a very small number of users and a very large number of docs/user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient. That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users? -Trey On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour wrote: > Hi > > > > Shall I set up Multiple Core or Single core for the following use case: > > > > I have X number of users. > > > > When I do a search, I always know for which user I am doing a search > > > > Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add > a userId field to each document? > > > > If I choose the 1 core solution then I am concerned with performance. > Let's say I search for "NewYork" ... If lucene returns all "New York" > matches for all users and then filters based on the userId, then this > is going to be less efficient than if I have sharded per user and send > the request for "New York" to the user's core > > > > Thank you for your help > > > > matt > > > > > > >
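On the efficiency concern from the original question: in the single-core setup the per-user restriction is normally sent as a filter query rather than folded into the main query, so Solr intersects the query with a cached per-user bitset instead of matching every user's documents and discarding most of them. A sketch with made-up values:

  http://localhost:8983/solr/select?q=new+york&fq=userId:12345

The first request for a given userId pays to build the filter; subsequent requests for that user hit the filterCache.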
Re: NullPointerException in ReplicationHandler.postCommit + question about compression
never keep a 0. It is better to leave not mention the deletionPolicy at all. The defaults are usually fine. On Fri, Jan 22, 2010 at 11:12 AM, Stephen Weiss wrote: > Hi Shalin, > > Thanks for your reply. Please see below. > > > On Jan 18, 2010, at 4:19 AM, Shalin Shekhar Mangar wrote: > >> On Wed, Jan 13, 2010 at 12:51 AM, Stephen Weiss >> wrote: >> ... > >>> When we replicate >>> manually (via the admin page) things seem to go well. However, when >>> replication is triggered by a commit event on the master, the master gets >>> a >>> NullPointerException and no replication seems to take place. >>> >>> SEVERE: java.lang.NullPointerException at org.apache.solr.handler.ReplicationHandler$4.postCommit(ReplicationHandler.java:922) at... >>> >>> Does anyone know off the top of their head what this might indicate, or >>> know what further troubleshooting steps we should be taking to isolate >>> the >>> issue? >>> >> >> That is a strange one. It looks like the latest commit point was null. Do >> you have a deletion policy section in your solrconfig.xml? Are you always >> able to reproduce the exception? > > We are always able to reproduce the exception. > > The master has committed changes many times for over a year now... so if > that's what's being reported, it's not quite accurate. > > This is our deletion policy. I don't believe that I've edited it, it is > probably verbatim from the example (the example of what version of Solr, I > can't tell you for sure, but I imagine it's from 1.2 or 1.4 - we never > updated the config when using 1.3). > >> >> 1 >> 0 >> > > (removing comments per Noble Paul's request... we keep these in the file > for our own readability purposes but agreed, we have no need to e-mail them > along) > > I would have never thought to look there but it does seem suspicious now > that you mention it. For a proper replication configuration where we > replicate on commit, is there a recommended setting? > >>> ... >>> >> During our tests we found that enabling compression on a gigabit ethernet >> actually degrades transfer rate because of the compress/de-compress >> overhead. Just comment out that line to disable compression. > > Thank you for the clarification. We will comment it out. > > -- > Steve -- - Noble Paul | Systems Architect| AOL | http://aol.com
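For anyone comparing against their own config: the deletion-policy block in the stock solrconfig.xml looks roughly like the following (values shown are the example defaults, not a recommendation), and per the advice above it is usually safer to omit the whole section and take the defaults.

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>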
Re: Wildcard Search and Filter in Solr
Hi just looked at the analysis.jsp and found out what it does during index / query Index Analyzer Intel intel intel intel intel intel Query Analyzer Inte* Inte* inte* inte inte inte int I think somewhere my configuration or my definition of the type "text" is wrong. This is my configuration . I think i am missing some basic configuration for doing wildcard searches . but could not figure it out . can someone help please Ahmet Arslan wrote: > > >> Hi , >> I m trying to use wildcard keywords in my search term and >> filter term . but >> i didnt get any results. >> Searched a lot but could not find any lead . >> Can someone help me in this. >> i m using solr 1.2.0 and have few records indexed with >> vendorName value as >> Intel >> >> In solr admin interface i m trying to do the search like >> this >> >> http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= >> >> and i m getting the result properly >> >> but when i use q=inte* no records are returned. >> >> the same is the case for Filter Query on using >> &fq=VendorName:"Intel" i get >> my results. >> >> but on using &fq=VendorName:"Inte*" no results are >> returned. >> >> I can guess i doing mistake in few obvious things , but >> could not figure it >> out .. >> Can someone pls help me out :) :) > > If &q=intel returns documents while q=inte* does not, it means that > fieldType of your defaultSearchField is reducing the token intel into > something. > > Can you find out it by using /admin/anaysis.jsp what happens to "Intel > intel" at index and query time? > > What is your defaultSearchField? Is it VendorName? > > It is expected that &fq=VendorName:Intel returns results while > &fq=VendorName:Inte* does not. Because prefix queries are not analyzed. > > > But it is strange that q=inte* does not return anything. Maybe your index > analyzer is reducing Intel into int or ıntel? > > I am not 100% sure but solr 1.2.0 may use default locale in lowercase > operation. What is your default locale? > > It is better to see what happens word Intel using analysis.jsp page. > > > > > -- View this message in context: http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27334486.html Sent from the Solr - User mailing list archive at Nabble.com.
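A common way out of the wildcard problem described here, sketched from the usual wiki advice (type and field names below are made up): copy the value into a field whose analyzer only tokenizes and lowercases, with no stemming, and run prefix queries against that field, lowercasing the term on the client side since wildcard queries are not analyzed.

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="VendorName_prefix" type="text_prefix" indexed="true" stored="false"/>
  <copyField source="VendorName" dest="VendorName_prefix"/>

A query such as VendorName_prefix:inte* then matches documents whose indexed token is intel, which the stemmed text field cannot do.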
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal wrote: > Hi, > > > > I am fairly new to Solr and would like to use the DIH to pull rich text > files (pdfs, etc) from BLOB fields in my database. > > > > There was a suggestion made to use the FieldReaderDataSource with the > recently commited TikaEntityProcessor. Has anyone accomplished this? > > This is my configuration, and the resulting error - I'm not sure if I'm > using the FieldReaderDataSource correctly. If anyone could shed light > on whether I am going the right direction or not, it would be > appreciated. > > > > ---Data-config.xml: > > > > > > url="jdbc:oracle:thin:un/p...@host:1521:sid" /> > > > > > > dataField="attach.attachment" format="text"> > > > > > > > > > > > > > > > > -Debug error: > > > > > > 0 > > 203 > > > > > > > > testdb-data-config.xml > > > > > > full-import > > debug > > > > > > > > > > select id as name, attachment from testtable2 > > 0:0:0.32 > > --- row #1- > > java.math.BigDecimal:2 > > oracle.sql.BLOB:oracle.sql.b...@1c8e807 > > - > > > > > > org.apache.solr.handler.dataimport.DataImportHandlerException: No > dataSource :f1 available for entity :253433571801723 Processing Document > # 1 > > at > org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da > taImporter.java:279) > > at > org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl > .java:93) > > at > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit > yProcessor.java:97) > > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity > ProcessorWrapper.java:237) > > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:357) > > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:383) > > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java > :242) > > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 > 0) > > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte > r.java:331) > > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java > :389) > > at > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D > ataImportHandler.java:203) > > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:131) > > at > org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:338) > > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:241) > > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan > dler.java:1089) > > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 > 16) > > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler > Collection.java:211) > > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav > a:114) > > at > 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > > at org.mortbay.jetty.Server.handle(Server.java:285) > > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne > ction.java:821) > > at > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) > > at > org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > > at > org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.jav > a:226) > > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.ja > va:442) > > > > Thanks, > > Nirmal > > -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård : > I got the binary format to work perfectly now. Performance is better > than with xml. Thanks! > > Although, it doesn't look like a binary file is smaller in size than > an xml file? > > /Tim > > 2010/1/27 Noble Paul നോബിള് नोब्ळ् : >> 2010/1/21 Tim Terlegård : >>> Yes, it worked! Thank you very much. But do I need to use curl or can >>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't >>> use BinaryWriter then I don't know how to do this. >> if your data is serialized using JavaBinUpdateRequestCodec, you may >> POST it using curl. >> If you are writing directly , use CommonsHttpSolrServer >>> >>> /Tim >>> >>> 2010/1/20 Noble Paul നോബിള് नोब्ळ् : 2010/1/20 Tim Terlegård : BinaryRequestWriter does not read from a file and post it >>> >>> Is there any other way or is this use case not supported? I tried this: >>> >>> $ curl /solr/update/javabin -F stream.file=/tmp/data.bin >>> $ curl /solr/update -F stream.body=' ' >>> >>> Solr did read the file, because solr complained when the file wasn't >>> in the format the JavaBinUpdateRequestCodec expected. But no data is >>> added to the index for some reason. > >> how did you create the file /tmp/data.bin ? what is the format? > > I wrote this in the first email. It's in the javabin format (I think). > I did like this (groovy code): > > fieldId = new NamedList() > fieldId.add("name", "id") > fieldId.add("val", "9-0") > fieldId.add("boost", null) > fieldText = new NamedList() > fieldText.add("name", "text") > fieldText.add("val", "Some text") > fieldText.add("boost", null) > fieldNull = new NamedList() > fieldNull.add("boost", null) > doc = [fieldNull, fieldId, fieldText] > docs = [doc] > root = new NamedList() > root.add("docs", docs) > fos = new FileOutputStream("data.bin") > new JavaBinCodec().marshal(root, fos) > > /Tim > JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com >>> >> >> >> >> -- >> - >> Noble Paul | Systems Architect| AOL | http://aol.com >> > -- - Noble Paul | Systems Architect| AOL | http://aol.com