java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Arkadi Colson

Hi

Recently solr crashed. I've found this in the error log.
My commit settings look like this:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m

Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error 
opening new searcher
at 
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)

at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)

at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
at 
org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
at 
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
at 
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
at 
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
at 
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
at 
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
at 
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2952)
at 
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:368)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
at 
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:249)
at 
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1353)

... 11 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
... 28 more


SEVERE: auto commit error...:java.lang.IllegalStateException: this 
writer hit an OutOfMemoryError; cannot commit
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
at 
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:541)

at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)




Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On some queries I get out of memory errors:

{"error":{"msg":"java.lang.OutOfMemoryError: Java heap
space","trace":"java.lang.RuntimeException:
java.lang.OutOfMemoryError: Java heap space\n\tat
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:462)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:290)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:365)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)\n\tat
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)\n\tat
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
java.lang.Thread.run(Thread.java:679)\nCaused by:
java.lang.OutOfMemoryError: Java heap space\n\tat
org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)\n\tat
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:669)\n\tat
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:325)\n\tat
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)\n\tat
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)\n\tat
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:365)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n

AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread André Widhani
Hi Arkadi,

this error usually indicates that virtual memory is not sufficient (should be 
"unlimited").

Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168 

Regards,
André


From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 10:24
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Map failed

Hi

Recently solr crashed. I've found this in the error log.
My commit settings look like this:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m

Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error
opening new searcher
 at
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
 at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
 at
org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
 at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
 at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
 at
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
 at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
 at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
 at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
 at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
 at
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
 at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
 at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
 at
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2952)
 at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:368)
 at
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
 at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
 at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:249)
 at
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1353)
 ... 11 more
Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
 ... 28 more


SEVERE: auto commit error...:java.lang.IllegalStateException: this
writer hit an OutOfMemoryError; cannot commit
 at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
 at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
 at
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
 at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:541)
 at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPo

Re: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Arkadi Colson

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:

Hi Arkadi,

this error usually indicates that virtual memory is not sufficient (should be 
"unlimited").

Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168

Regards,
André


From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 10:24
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Map failed

Hi

Recently solr crashed. I've found this in the error log.
My commit settings look like this:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m

Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error
opening new searcher
  at
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
  at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)
  at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
  at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
  at
org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
  at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
  at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
  at
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
  at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
  at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
  at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
  at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
  at
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
  at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
  at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
  at
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2952)
  at
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:368)
  at
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
  at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
  at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:249)
  at
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1353)
  ... 11 more
Caused by: java.lang.OutOfMemoryError: Map failed
  at sun.nio.ch.FileChannelImpl.map0(Native Method)
  at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
  ... 28 more


SEVERE: auto commit error...:java.lang.IllegalStateException: this
writer hit an OutOfMemoryError; cannot commit
  at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
  at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
  at
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
  at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:541)
  at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  at
java.util.concurrent.ScheduledThreadPoolExe

AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread André Widhani
The output is from the root user. Are you running Solr as root?

If not, please try again using the operating system user that runs Solr.
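
For example (a sketch; the service account name "tomcat" is an assumption):

  su -s /bin/bash -c 'ulimit -v' tomcat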

André

From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 11:26
To: solr-user@lucene.apache.org
Cc: André Widhani
Subject: Re: AW: java.lang.OutOfMemoryError: Map failed

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:
> Hi Arkadi,
>
> this error usually indicates that virtual memory is not sufficient (should be 
> "unlimited").
>
> Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168
>
> Regards,
> André
>
> 
> From: Arkadi Colson [ark...@smartbit.be]
> Sent: Tuesday, April 2, 2013 10:24
> To: solr-user@lucene.apache.org
> Subject: java.lang.OutOfMemoryError: Map failed
>
> Hi
>
> Recently solr crashed. I've found this in the error log.
> My commit settings look like this:
>
>   <autoCommit>
>     <maxDocs>1</maxDocs>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
>   <autoSoftCommit>
>     <maxTime>2000</maxTime>
>   </autoSoftCommit>
>
> The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m
>
> Versions
> Solr: 4.2
> Tomcat: 7.0.33
> Java: 1.7
>
> Anybody any idea?
>
> Thx!
>
> Arkadi
>
> SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error
> opening new searcher
>   at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
>   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
>   at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)
>   at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>   at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
>   at
> org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
>   at
> org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
>   at
> org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
>   at
> org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
>   at
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
>   at
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
>   at
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
>   at
> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
>   at
> org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
>   at
> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
>   at
> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
>   at
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2952)
>   at
> org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:368)
>   at
> org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
>   at
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
>   at
> org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:249)
>   at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1353)
>   ... 11 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>   at sun.nio.ch.FileChannelImpl.map0(Native Method)
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
>   ... 28 more
>
>
> SEVERE: auto commit error...:java.lang.IllegalStateException: this
> writer hit an OutOfMemoryError; cannot commit
>   at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
>   at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
>   at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
>

Re: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Arkadi Colson

It is running as root:

root@solr01-dcg:~# ps aux | grep tom
root  1809 10.2 67.5 49460420 6931232 ?Sl   Mar28 706:29 
/usr/bin/java 
-Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -server 
-Xms2048m -Xmx6144m -XX:PermSize=64m -XX:MaxPermSize=128m -XX:+UseG1GC 
-verbose:gc -Xloggc:/solr/tomcat-logs/gc.log -XX:+PrintGCTimeStamps 
-XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF8 
-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc 
-DzkClientTimeout=2 
-DzkHost=solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181,solr02-dcg.intnet.smartbit.be:2181,solr02-gs.intnet.smartbit.be:2181,solr03-dcg.intnet.smartbit.be:2181,solr03-gs.intnet.smartbit.be:2181 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath 
/usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat 
-Dcatalina.home=/usr/local/tomcat 
-Djava.io.tmpdir=/usr/local/tomcat/temp 
org.apache.catalina.startup.Bootstrap start


Arkadi

On 04/02/2013 11:29 AM, André Widhani wrote:

The output is from the root user. Are you running Solr as root?

If not, please try again using the operating system user that runs Solr.

André

From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 11:26
To: solr-user@lucene.apache.org
Cc: André Widhani
Subject: Re: AW: java.lang.OutOfMemoryError: Map failed

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:

Hi Arkadi,

this error usually indicates that virtual memory is not sufficient (should be 
"unlimited").

Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168

Regards,
André


From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 10:24
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Map failed

Hi

Recently solr crashed. I've found this in the error log.
My commit settings look like this:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m

Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error
opening new searcher
   at
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
   at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)
   at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
   at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
   at
org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
   at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
   at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
   at
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
   at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
   at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
   at
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
   at
org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
   at
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
   at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
   at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
   at
org.apache.lucene.index.

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 11:09 +0200, Dotan Cohen wrote:
> On some queries I get out of memory errors:
> 
> {"error":{"msg":"java.lang.OutOfMemoryError: Java heap
[...]
> org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)\n\tat
[...]

Yep, your OOM is due to faceting.

How many documents does your index have, how many fields do you facet on
and approximately how many unique values do your facet fields have?

> I notice that this only occurs on queries that run facets. I start
> Solr with the following command:
> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &

You are not specifying any maximum heap size (-Xmx), which you should do
in order to avoid unpleasant surprises. Facets and sorting are often
memory hungry, but your system seems to have 13GB free RAM so the easy
solution attempt would be to increase the heap until Solr serves the
facets without OOM.
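
For example (a sketch; 8g is just a starting point to tune):

  sudo nohup java -Xmx8g -XX:NewRatio=1 -XX:+UseParNewGC \
       -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
       -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar &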

- Toke Eskildsen, State and University Library, Denmark



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 12:59 PM, Toke Eskildsen  
wrote:
> How many documents does your index have, how many fields do you facet on
> and approximately how many unique values do your facet fields have?
>

8971763 documents, growing at a rate of about 500 per minute. We
actually expect that to be ~5 per minute once we get out of
testing. Most documents are less than a KiB in the 'text' field, and
they have a few other fields which store short strings, dates, or
ints. You can think of these documents like tweets: short general
purpose text messages.

>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
>> /opt/solr-4.1.0/example/start.jar &
>
> You are not specifying any maximum heap size (-Xmx), which you should do
> in order to avoid unpleasant surprises. Facets and sorting are often
> memory hungry, but your system seems to have 13GB free RAM so the easy
> solution attempt would be to increase the heap until Solr serves the
> facets without OOM.
>

Thanks, I will start with "-Xmx8g" and test.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Per Steffensen
I have seen the exact same on Ubuntu Server 12.04. It helped adding some 
swap space, but I do not understand why this is necessary, since OS 
ought to just use the actual memory mapped files if there is not room in 
(virtual) memory, swapping pages in and out on demand. Note that I saw 
this for memory mapped files opened for read+write - not in the exact 
same context as you see it where MMapDirectory is trying to map memory 
mapped files.


If you find a solution/explanation, please post it here. I really want 
to know more about why FileChannel.map can cause OOM. I do not think the 
OOM is a "real" OOM indicating no more space on java heap, but is more 
an exception saying that OS has no more memory (in some interpretation 
of that).
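
A minimal sketch (not Solr code; the file path is an assumption) that reproduces
the error by exhausting the per-process mapping budget rather than the Java heap:

  import java.io.RandomAccessFile;
  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;
  import java.util.ArrayList;
  import java.util.List;

  public class MapFailedRepro {
      public static void main(String[] args) throws Exception {
          // Keep references so the mappings are not released by GC.
          List<MappedByteBuffer> pins = new ArrayList<MappedByteBuffer>();
          RandomAccessFile raf = new RandomAccessFile("/tmp/mapfailed.bin", "rw");
          raf.setLength(4096);
          FileChannel ch = raf.getChannel();
          // Each call consumes one kernel mapping; once vm.max_map_count
          // (or ulimit -v) is exhausted, map() fails with "Map failed"
          // (an IOException caused by java.lang.OutOfMemoryError: Map failed)
          // while the Java heap is still nearly empty.
          while (true) {
              pins.add(ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096));
          }
      }
  }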


Regards, Per Steffensen

On 4/2/13 11:32 AM, Arkadi Colson wrote:

It is running as root:

root@solr01-dcg:~# ps aux | grep tom
root  1809 10.2 67.5 49460420 6931232 ?Sl   Mar28 706:29 
/usr/bin/java 
-Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties 
-server -Xms2048m -Xmx6144m -XX:PermSize=64m -XX:MaxPermSize=128m 
-XX:+UseG1GC -verbose:gc -Xloggc:/solr/tomcat-logs/gc.log 
-XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Duser.timezone=UTC 
-Dfile.encoding=UTF8 -Dsolr.solr.home=/opt/solr/ -Dport=8983 
-Dcollection.configName=smsc -DzkClientTimeout=2 
-DzkHost=solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181,solr02-dcg.intnet.smartbit.be:2181,solr02-gs.intnet.smartbit.be:2181,solr03-dcg.intnet.smartbit.be:2181,solr03-gs.intnet.smartbit.be:2181 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port= 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath 
/usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar 
-Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat 
-Djava.io.tmpdir=/usr/local/tomcat/temp 
org.apache.catalina.startup.Bootstrap start


Arkadi

On 04/02/2013 11:29 AM, André Widhani wrote:

The output is from the root user. Are you running Solr as root?

If not, please try again using the operating system user that runs Solr.

André

From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 11:26
To: solr-user@lucene.apache.org
Cc: André Widhani
Subject: Re: AW: java.lang.OutOfMemoryError: Map failed

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:

Hi Arkadi,

this error usually indicates that virtual memory is not sufficient 
(should be "unlimited").


Please see 
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168


Regards,
André


From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, April 2, 2013 10:24
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Map failed

Hi

Recently solr crashed. I've found this in the error log.
My commit settings look like this:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m 
-Xmx6144m


Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

SEVERE: auto commit error...:org.apache.solr.common.SolrException: 
Error

opening new searcher
   at
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
   at 
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)

   at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562) 

   at 
org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)

   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) 


   at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) 


   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 


   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 


   at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
   at
org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
   at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)


   at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 12:16 +0200, Dotan Cohen wrote:
> 8971763 documents, growing at a rate of about 500 per minute. We
> actually expect that to be ~5 per minute once we get out of
> testing.

9M documents in a heavily updated index with faceting. Maybe you are
committing faster than the faceting can be prepared?
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Regards,
Toke Eskildsen



Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Lukasz Kujawa
Hello,

I'm using Solr collections API to create a collection.

http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default

I'm expecting the new collection to be named "test2"; what I get instead is
"test2_shard1_replica2". I don't want to tie my index name to any current
settings. Is there any way to set the collection name precisely?

Thank you,
Lukasz




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
The Collections API is a wrapper for the CORE API.

If you don't want the API to define the name for you, then use the CORE API
directly; there you can define the collection name and the shard id.

curl 
'http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&collection=collection1&shard=XX'

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 1:01 PM, Lukasz Kujawa wrote:

> Hello,
> 
> I'm using Solr collections API to create a collection.
> 
> http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default
> 
> I'm expecting the new collection to be named "test2"; what I get instead is
> "test2_shard1_replica2". I don't want to tie my index name to any current
> settings. Is there any way to set the collection name precisely?
> 
> Thank you,
> Lukasz
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
> Sent from the Solr - User mailing list archive at Nabble.com 
> (http://Nabble.com).
> 
> 




Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Anshum Gupta
Also, I am assuming that the collection name in this case should be
'test2'. The replica names would be along the lines of what you've mentioned.
Is that not the case?



On Tue, Apr 2, 2013 at 5:31 PM, Lukasz Kujawa  wrote:

> Hello,
>
> I'm using Solr collections API to create a collection.
>
>
> http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default
>
> I'm expecting the new collection to be named "test2"; what I get instead is
> "test2_shard1_replica2". I don't want to tie my index name to any current
> settings. Is there any way to set the collection name precisely?
>
> Thank you,
> Lukasz
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

Anshum Gupta
http://www.anshumgupta.net


Query using function query result

2013-04-02 Thread J Mohamed Zahoor
Hi


I want to query documents which match a certain dynamic criterion.
For example, how do I get all documents where sub(field1,field2) < 0?

I tried _val_:sub(field1,field2) and used fq=_val_:[0 TO *],
but it doesn't work.

./Zahoor
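
A common way to express this kind of filter (a sketch, untested against the
schema above) is the frange query parser, which filters on the value of a
function:

  fq={!frange u=0 incu=false}sub(field1,field2)

Here u is the upper bound and incu=false makes it exclusive, i.e.
sub(field1,field2) < 0.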


Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
In this link you can see what is what:
http://wiki.apache.org/solr/SolrCloud#Glossary

A collection represents a single logical index. A SolrCore (AKA core) encapsulates
a single physical index; one or more cores make up a logical shard, and the shards
make up a collection.

You can have a collection with the same name as the SolrCore if you want.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 1:53 PM, Anshum Gupta wrote:

> Also, I am assuming that the collection name in this case should be
> 'test2'. The replica names would be on the lines of what you've mentioned.
> Is that not the case?
> 
> 
> 
> On Tue, Apr 2, 2013 at 5:31 PM, Lukasz Kujawa <luk...@php.net> wrote:
> 
> > Hello,
> > 
> > I'm using Solr collections API to create a collection.
> > 
> > 
> > http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default
> > 
> > I'm expecting the new collection to be named "test2"; what I get instead is
> > "test2_shard1_replica2". I don't want to tie my index name to any current
> > settings. Is there any way to set the collection name precisely?
> > 
> > Thank you,
> > Lukasz
> > 
> > 
> > 
> > 
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
> > Sent from the Solr - User mailing list archive at Nabble.com 
> > (http://Nabble.com).
> > 
> 
> 
> 
> 
> -- 
> 
> Anshum Gupta
> http://www.anshumgupta.net
> 
> 




Re: Top 10 Terms in Index (by date)

2013-04-02 Thread Tomás Fernández Löbbe
Oh, I see, essentially you want to get the sum of the term frequencies for
every term in a subset of documents (instead of the document frequency as
the FacetComponent would give you). I don't know of an easy/out of the box
solution for this. I know the TermVectorComponent will give you the tf for
every term in a document, but I'm not sure if you can filter or sort on it.
Maybe you can do something like:
https://issues.apache.org/jira/browse/LUCENE-2393
or what's suggested here:
http://search-lucene.com/m/of5Fn1PUOHU/
but I have never used something like that.
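
For reference, a sketch of such a request (the /tvrh handler from the example
solrconfig.xml and the date range are assumptions; summing the returned
per-document tf values would still have to happen client-side):

  http://127.0.0.1:8983/solr/tvrh?q=dateCreated:[2013-03-01T00:00:00Z+TO+NOW]&fl=id&tv.tf=true&tv.fl=content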

Tomás



On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler  wrote:

> I need "total number of occurrences" across all documents for each term.
> Imagine this...
>
> Post #1: "I think, therefore I am like you"
> Reply #1: "You think too much"
> Reply #2 "I think that I think much as you"
>
> Each of those "documents" are put into 'content'.  Pretending I don't have
> stop words, the top term query (not considering dateCreated in this
> example) would result in something like...
>
> "think": 4
> "I": 4
> "you": 3
> "much": 2
> ...
>
> Thus, just a "number of documents" approach doesn't work, because if a word
> occurs more than one time in a document it needs to be counted that many
> times.  That seemed to rule out faceting like you mentioned as well as the
> TermsComponent (which as I understand also only counts "documents").
>
> Thanks,
> Andy Pickler
>
> On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe <
> tomasflo...@gmail.com
> > wrote:
>
> > So you have one document per user comment? Why not use faceting plus
> > filtering on the "dateCreated" field? That would count "number of
> > documents" for each term (so, in your case, if a term is used twice in
> one
> > comment it would only count once). Is that what you are looking for?
> >
> > Tomás
> >
> >
> > On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler 
> > wrote:
> >
> > > Our company has an application that is "Facebook-like" for usage by
> > > enterprise customers.  We'd like to do a report of "top 10 terms
> entered
> > by
> > > users over (some time period)".  With that in mind I'm using the
> > > DataImportHandler to put all the relevant data from our database into a
> > > Solr 'content' field:
> > >
> > >  > > multiValued="false" required="true" termVectors="true"/>
> > >
> > > Along with the content is the 'dateCreated' for that content:
> > >
> > >  > > multiValued="false" required="true"/>
> > >
> > > I'm struggling with the TermVectorComponent documentation to understand
> > how
> > > I can put together a query that answers the 'report' mentioned above.
> >  For
> > > each document I need each term counted however many times it is entered
> > > (content of "I think what I think" would report 'think' as used twice).
> > >  Does anyone have any insight as to whether I'm headed in the right
> > > direction and then what my query would be?
> > >
> > > Thanks,
> > > Andy Pickler
> > >
> >
>


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 2:41 PM, Toke Eskildsen  wrote:
> 9M documents in a heavily updated index with faceting. Maybe you are
> committing faster than the faceting can be prepared?
> https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
>

Thank you Toke, this is exactly on my "list of things to learn about
Solr". We do get the error mentioned and we cannot reduce the amount
of commits. Also, I do believe that we have the necessary server
resources (16 GiB RAM).

I have increased maxWarmingSearchers to 4, let's see how this goes.
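(In solrconfig.xml that is, for example, <maxWarmingSearchers>4</maxWarmingSearchers>
under the <query> section.)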

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Slaves always replicate entire index & Index versions

2013-04-02 Thread yayati
I moved from Solr 4.1 to Solr 4.2 on one of the slave servers. Earlier my index
directory had index.timestamp, but now it has only an index folder, no
timestamp. Is this a bug? The size of the index is the same as on the master,
and the dashboard shows replication running with both master and slave versions.
What happened to the timestamp in the index directory?


index.timestamp  -- earlier with 4.1

index  -- this is new folder

Please reply asap.

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4053179.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Top 10 Terms in Index (by date)

2013-04-02 Thread Andy Pickler
A key problem with those approaches as well as Lucene's HighFreqTerms class
(
http://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/HighFreqTerms.html)
is that none of them seem to have the ability to combine with a date range
query...which is key in my scenario.  I'm kinda thinking that what I'm
asking to do just isn't supported by Lucene or Solr, and that I'll have to
pursue another avenue.  If anyone has any other suggestions, I'm all ears.
I'm starting to wonder if I need to have some nightly batch job that
executes against my database and builds up "that day's top terms" in a
table or something.

Thanks,
Andy Pickler

On Tue, Apr 2, 2013 at 7:16 AM, Tomás Fernández Löbbe  wrote:

> Oh, I see, essentially you want to get the sum of the term frequencies for
> every term in a subset of documents (instead of the document frequency as
> the FacetComponent would give you). I don't know of an easy/out of the box
> solution for this. I know the TermVectorComponent will give you the tf for
> every term in a document, but I'm not sure if you can filter or sort on it.
> Maybe you can do something like:
> https://issues.apache.org/jira/browse/LUCENE-2393
> or what's suggested here:
> http://search-lucene.com/m/of5Fn1PUOHU/
> but I have never used something like that.
>
> Tomás
>
>
>
> On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler 
> wrote:
>
> > I need "total number of occurrences" across all documents for each term.
> > Imagine this...
> >
> > Post #1: "I think, therefore I am like you"
> > Reply #1: "You think too much"
> > Reply #2 "I think that I think much as you"
> >
> > Each of those "documents" are put into 'content'.  Pretending I don't
> have
> > stop words, the top term query (not considering dateCreated in this
> > example) would result in something like...
> >
> > "think": 4
> > "I": 4
> > "you": 3
> > "much": 2
> > ...
> >
> > Thus, just a "number of documents" approach doesn't work, because if a
> word
> > occurs more than one time in a document it needs to be counted that many
> > times.  That seemed to rule out faceting like you mentioned as well as
> the
> > TermsComponent (which as I understand also only counts "documents").
> >
> > Thanks,
> > Andy Pickler
> >
> > On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe <
> > tomasflo...@gmail.com
> > > wrote:
> >
> > > So you have one document per user comment? Why not use faceting plus
> > > filtering on the "dateCreated" field? That would count "number of
> > > documents" for each term (so, in your case, if a term is used twice in
> > one
> > > comment it would only count once). Is that what you are looking for?
> > >
> > > Tomás
> > >
> > >
> > > On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler 
> > > wrote:
> > >
> > > > Our company has an application that is "Facebook-like" for usage by
> > > > enterprise customers.  We'd like to do a report of "top 10 terms
> > entered
> > > by
> > > > users over (some time period)".  With that in mind I'm using the
> > > > DataImportHandler to put all the relevant data from our database
> into a
> > > > Solr 'content' field:
> > > >
> > > >  stored="false"
> > > > multiValued="false" required="true" termVectors="true"/>
> > > >
> > > > Along with the content is the 'dateCreated' for that content:
> > > >
> > > >  > > > multiValued="false" required="true"/>
> > > >
> > > > I'm struggling with the TermVectorComponent documentation to
> understand
> > > how
> > > > I can put together a query that answers the 'report' mentioned above.
> > >  For
> > > > each document I need each term counted however many times it is
> entered
> > > > (content of "I think what I think" would report 'think' as used
> twice).
> > > >  Does anyone have any insight as to whether I'm headed in the right
> > > > direction and then what my query would be?
> > > >
> > > > Thanks,
> > > > Andy Pickler
> > > >
> > >
> >
>


performance on concurrent search request

2013-04-02 Thread Anatoli Matuskova
In this thread about performance on concurrent search requests, Otis said:
http://lucene.472066.n3.nabble.com/how-to-improve-concurrent-request-performance-and-stress-testing-td496411.html

Imagine this type of code:

synchronized (someGlobalObject) { 
  // search 
} 

What happens when 100 threads hit this spot? The first one to get there
gets in and runs the search and 99 of them wait.
What happens if that  "// search" also involves expensive operations, lots
of IO, warming up, cache population, etc?  Those 99 threads will have to
wait a while :) 

That's why it is recommended to warm up the searcher ahead of time before
exposing it to real requests.  However, even if you warm things up, that
sync block will remain there, and at some point this will become a
bottleneck. What that point is depends on the hardware, index size, query
complexity and rate, even the JVM.

Otis

I'm wondering: is this synchronized block still an issue in Solr 4.x? Is it
due to how Solr deals with the index searcher, or to how it is implemented
in Lucene?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-on-concurrent-search-request-tp4053182.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Flow Chart of Solr

2013-04-02 Thread Koji Sekiguchi

(13/04/02 21:45), Furkan KAMACI wrote:

Is there any documentation, something like a flow chart of Solr? I.e.
documents come into Solr (maybe indicating which classes receive the
documents), go through the parsing process (stemming etc.), and then the
inverted indexes are built, and so on?



There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:

[Toke: maxWarmingSearchers limit exceeded?]

> Thank you Toke, this is exactly on my "list of things to learn about
> Solr". We do get the error mentioned and we cannot reduce the amount
> of commits. Also, I do believe that we have the necessary server
> resources (16 GiB RAM).

Memory does not help you if you commit too frequently. If you commit
each X seconds and warming takes X+Y seconds, then you will run out of
memory at some point.

> I have increased maxWarmingSearchers to 4, let's see how this goes.

If you still get the error with 4 concurrent searchers, you will have to
either speed up warmup time or commit less frequently. You should be
able to reduce facet startup time by switching to segment based faceting
(at the cost of worse search-time performance) or maybe by using
DocValues. Some of the current threads on the solr-user list is about
these topics.
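
(The per-segment option is the fcs facet method: facet.method=fcs, or per field
f.myfield.facet.method=fcs, where myfield is a placeholder; note that fcs applies
to single-valued fields.)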

How often do you commit and how many unique values do your facet
fields have?

Regards,
Toke Eskildsen



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen  wrote:
> On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:
>
> [Toke: maxWarmingSearchers limit exceeded?]
>
>> Thank you Toke, this is exactly on my "list of things to learn about
>> Solr". We do get the error mentioned and we cannot reduce the amount
>> of commits. Also, I do believe that we have the necessary server
>> resources (16 GiB RAM).
>
> Memory does not help you if you commit too frequently. If you commit
> each X seconds and warming takes X+Y seconds, then you will run out of
> memory at some point.
>
>> I have increased maxWarmingSearchers to 4, let's see how this goes.
>
> If you still get the error with 4 concurrent searchers, you will have to
> either speed up warmup time or commit less frequently. You should be
> able to reduce facet startup time by switching to segment based faceting
> (at the cost of worse search-time performance) or maybe by using
> DocValues. Some of the current threads on the solr-user list is about
> these topics.
>
> How often do you commit and how many unique values do your facet
> fields have?
>
> Regards,
> Toke Eskildsen
>



-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Flow Chart of Solr

2013-04-02 Thread Andre Bois-Crettez


On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:

(13/04/02 21:45), Furkan KAMACI wrote:

Is there any documentation, something like a flow chart of Solr? I.e.
documents come into Solr (maybe indicating which classes receive the
documents), go through the parsing process (stemming etc.), and then the
inverted indexes are built, and so on?


There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji


I like this one, it is a bit more detailed :

http://www.cominvent.com/2011/04/04/solr-architecture-diagram/

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/




Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Actually, maybe the most important core thing is the analysis part in the
last diagram, but there is nothing about it (i.e. stemming, lemmatizing,
etc.) in any of them.


2013/4/2 Andre Bois-Crettez 

>
> On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
>
>> (13/04/02 21:45), Furkan KAMACI wrote:
>>
>>> Is there any documentation, something like a flow chart of Solr? I.e.
>>> documents come into Solr (maybe indicating which classes receive the
>>> documents), go through the parsing process (stemming etc.), and then the
>>> inverted indexes are built, and so on?
>>>
>>>  There is an interesting ticket:
>>
>> Architecture Diagrams needed for Lucene, Solr and Nutch
>> https://issues.apache.org/jira/browse/LUCENE-2412
>>
>> koji
>>
>
> I like this one, it is a bit more detailed :
>
> http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
>
> --
> André Bois-Crettez
>
> Search technology, Kelkoo
> http://www.kelkoo.com/
>
>
>


Re: Slaves always replicate entire index & Index versions

2013-04-02 Thread Arkadi Colson
The index.timestamp folder is indeed gone but it seems to work. Maybe just a
structural change...


Met vriendelijke groeten

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81

On 04/02/2013 04:08 PM, yayati wrote:

I moved from Solr 4.1 to Solr 4.2 on one of the slave servers. Earlier my index
directory had index.timestamp, but now it has only an index folder, no
timestamp. Is this a bug? The size of the index is the same as on the master,
and the dashboard shows replication running with both master and slave versions.
What happened to the timestamp in the index directory?


index.timestamp  -- earlier with 4.1

index  -- this is new folder

Please reply asap.

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4053179.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen  wrote:
> Memory does not help you if you commit too frequently. If you commit
> each X seconds and warming takes X+Y seconds, then you will run out of
> memory at some point.
>

How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


>> I have increased maxWarmingSearchers to 4, let's see how this goes.
>
> If you still get the error with 4 concurrent searchers, you will have to
> either speed up warmup time or commit less frequently. You should be
> able to reduce facet startup time by switching to segment based faceting
> (at the cost of worse search-time performance) or maybe by using
> DocValues. Some of the current threads on the solr-user list is about
> these topics.
>
> How often do you commit and how many unique values does your facet
> fields have?
>

Batches of 20-50 results are added to Solr a few times a minute, and a
commit is done after each batch, since I'm calling Solr like this:
http://127.0.0.1:8983/solr/core/update/json?commit=true

Should I remove commit=true and run a cron job to commit once per minute?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
> How often do you commit and how many unique values does your facet
> fields have?
>

Most of the time I facet on one field that has about twenty unique
values. However, once per day I would like to facet on the text field,
which is a free-text field usually around 1 KiB (about 100 words), in
order to determine what the top keywords / topics are. That query
would take up to 200 seconds to run, but it does not have to return
the results in real-time (the output goes to another process, not to a
waiting user).
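
The daily query is roughly of this form (a sketch; the core name is as in my
earlier mail and the facet limit is just an illustration):

http://127.0.0.1:8983/solr/core/select?q=*:*&rows=0&facet=true&facet.field=text&facet.limit=200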

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Flow Chart of Solr

2013-04-02 Thread Yago Riveiro
For beginners it is complicated to understand the complexity of Solr / Lucene. I'm
trying to develop a custom search component and it's too hard to keep in mind the
flow, inheritance and interaction between classes. I think that there is a gap
between software doc and user doc, or maybe I don't search enough T_T. The Javadoc
is not always clear.

The fact that I'm a beginner in the Solr world doesn't help.

Either way, this thread was very helpful; I found some very good resources here
:)

Regards

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:

> Actually maybe one the most important core thing is that Analysis part at
> last diagram but there is nothing about it i.e. stamming, lemmitazing etc.
> at any of them.
>  
>  
> 2013/4/2 Andre Bois-Crettez  (mailto:andre.b...@kelkoo.com)>
>  
> >  
> > On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
> >  
> > > (13/04/02 21:45), Furkan KAMACI wrote:
> > >  
> > > > Is there any documentation something like flow chart of Solr. i.e.
> > > > Documents comes into Solr(maybe indicating which classes get documents)
> > > > and
> > > > goes to parsing process (i.e. stemming processes etc.) and then reverse
> > > > indexes are get so on so forth?
> > > >  
> > > > There is an interesting ticket:
> > >  
> > > Architecture Diagrams needed for Lucene, Solr and Nutch
> > > https://issues.apache.org/jira/browse/LUCENE-2412
> > >  
> > > koji
> >  
> > I like this one, it is a bit more detailed :
> >  
> > http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
> >  
> > --
> > André Bois-Crettez
> >  
> > Search technology, Kelkoo
> > http://www.kelkoo.com/
> >  
> >  
> >  
>  
>  
>  




Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
You are right about the distinction between developer doc and user doc. Users are
split on it: some of them use Solr for indexing and monitoring via the admin
interface, and that is quite enough for them, but some people want to modify it,
so it would be nice if there were some documentation for the developer side
too.


2013/4/2 Yago Riveiro 

> For beginners is complicate understand the complexity of solr / lucene,
> I'm trying devel a custom search component and it's too hard keep in mind
> the flow, inheritance and iteration between classes. I think that there is
> a gap between software doc and user doc, or maybe I don't search enough
> T_T. Java doc not always is clear always.
>
> The fact that I'm beginner in solr world don't help.
>
> Either way, this thread was very helpful, I found some very good resources
> here :)
>
> Cumprimentos
>
> --
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:
>
> > Actually maybe one the most important core thing is that Analysis part at
> > last diagram but there is nothing about it i.e. stamming, lemmitazing
> etc.
> > at any of them.
> >
> >
> > 2013/4/2 Andre Bois-Crettez  andre.b...@kelkoo.com)>
> >
> > >
> > > On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
> > >
> > > > (13/04/02 21:45), Furkan KAMACI wrote:
> > > >
> > > > > Is there any documentation something like flow chart of Solr. i.e.
> > > > > Documents comes into Solr(maybe indicating which classes get
> documents)
> > > > > and
> > > > > goes to parsing process (i.e. stemming processes etc.) and then
> reverse
> > > > > indexes are get so on so forth?
> > > > >
> > > > > There is an interesting ticket:
> > > >
> > > > Architecture Diagrams needed for Lucene, Solr and Nutch
> > > > https://issues.apache.org/jira/browse/LUCENE-2412
> > > >
> > > > koji
> > >
> > > I like this one, it is a bit more detailed :
> > >
> > > http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
> > >
> > > --
> > > André Bois-Crettez
> > >
> > > Search technology, Kelkoo
> > > http://www.kelkoo.com/
> > >
> > >
> > >
> >
> >
> >
>
>
>


Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Ryan Ernst
Please add RyanErnst to the contributors group.  Thanks!


On Mon, Apr 1, 2013 at 7:04 PM, Steve Rowe  wrote:

> On Apr 1, 2013, at 9:40 PM, "Vaillancourt, Tim" 
> wrote:
> > I would also like to contribute to SolrCloud's wiki where possible.
> Please add myself (TimVaillancourt) when you have a chance.
>
> Added to solr wiki ContributorsGroup.


Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Steve Rowe
On Apr 2, 2013, at 11:23 AM, Ryan Ernst  wrote:
> Please add RyanErnst to the contributors group.  Thanks!

Added to solr wiki ContributorsGroup.


Re: Out of memory on some faceting queries

2013-04-02 Thread Andre Bois-Crettez

On 04/02/2013 05:04 PM, Dotan Cohen wrote:

How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


warmupTime is available on the admin page for each type of cache (in
milliseconds) :
http://solr-box:8983/solr/#/core1/plugins/cache

Or if you are only interested in the total :
http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher
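
In that stats output, look for the warmupTime entries (reported in
milliseconds) of the searcher and of each cache; comparing them against your
commit interval tells you whether warming can keep up.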


Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true Should I
remove commit=true and run a cron job to commit once per minute?


Even better, it sounds like a job for CommitWithin :
http://wiki.apache.org/solr/CommitWithin
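
For example, instead of committing on every batch (a sketch; the core name and
the 60-second window are just illustrations):

http://127.0.0.1:8983/solr/core/update/json?commitWithin=60000

or, if you ever index through SolrJ:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://127.0.0.1:8983/solr/core");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
// ask Solr to make the document searchable within 60 seconds,
// instead of forcing a commit on every batch
server.add(doc, 60000);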


André



Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Furkan KAMACI
Hi;

Please add FurkanKAMACI to the group.

Thanks;
Furkan KAMACI


2013/4/2 Steve Rowe 

> On Apr 2, 2013, at 11:23 AM, Ryan Ernst  wrote:
> > Please add RyanErnst to the contributors group.  Thanks!
>
> Added to solr wiki ContributorsGroup.
>


Job: Apache solr (Recruiting)

2013-04-02 Thread jessica katz
We have openings for Middleware architects (Apache Solr).
*Locations:* Mountain View, California; New York City, NY; Houston, Texas

Mail me your resumes to jess...@kudukisgroup.com.
We can discuss more over the phone.

Thanks,
Jessica


Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Steve Rowe
On Apr 2, 2013, at 11:28 AM, Furkan KAMACI  wrote:
> Please add FurkanKAMACI to the group.

Added to solr wiki ContributorsGroup.



Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Elodie Sannier

Hello,

I am using the new collection alias feature, and it seems the
CloudSolrServer class (solrj 4.2.0) does not allow using it, either for
updates or selects.

When I'm requesting the CloudSolrServer with a collection alias name, I
have the error:
org.apache.solr.common.SolrException: Collection not found:
aliasedCollection

The collection alias cannot be found because, in
CloudSolrServer#getCollectionList (line 319) method, the alias variable
is always empty.

When I'm requesting the CloudSolrServer, the connect method is called
and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method.
In the ZkStateReader#createClusterStateWatchersAndUpdate method, the
aliases are not loaded.

line 295, the data from /clusterstate.json are loaded :
ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet);
this.clusterState = clusterState;

Should we have the same data loading from /aliases.json, in order to
fill the aliases field?
At line 299, a Watcher for aliases is created but does not seem to be used.


As a workaround to avoid the error, I have to force the aliases loading
at my application start and when the aliases are updated:
CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
solrServer.setDefaultCollection("aliasedCollection");
solrServer.connect();
solrServer.getZkStateReader().updateAliases();
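
(If I understand correctly, updateAliases() re-reads /aliases.json from
ZooKeeper, so the alias map is populated before my first request.)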

Is there a better way to use collection aliases with solrj ?

Elodie Sannier



Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Lukasz Kujawa
If I use the core admin API instead of the collections API, then according to my
understanding the new core will only be available on that server; if I query a
different Solr server I will get an error. If I use the collections API and I
query a server which doesn't physically hold the data, I will still get
results. Creating cores "manually" across all Solr servers doesn't feel like
the right way to go.





Solr URL uses non-standard format with pound sign

2013-04-02 Thread Dennis Haller
The Solr URL in Solr 4.2 for my localhost installation looks like this:
http://localhost:8883/solr/#/development_shard1_replica1

This URL, when constructed dynamically in Ruby, will not validate with the
Ruby URI::HTTP class because of the # sign in the path. This is a
non-standard URL as per RFC 1738.

Here is the error message:

#


Is there another way to access the Solr URL without using the "#" sign?

Thanks,
Dennis Haller


Re: Solr URL uses non-standard format with pound sign

2013-04-02 Thread Chris Hostetter

: The Solr URL in Solr 4.2 for my localhost installation looks like this:
: http://localhost:8883/solr/#/development_shard1_replica1
: 
: This URL when constructed dynamically in Ruby will not validate with the
: Ruby URI:HTTP class because of the # sign in the path. This is a
: non-standard URL as per RFC1738.

1) RFC 1738 is antiquated. Among other things, RFC 3986 is much more relevant
and clarifies that "#" is a fragment identifier.

2) the URL you are referring to is a *UI* view, and the fragment
(/development_shard1_replica1) is dealt with entirely by your web browser
via javascript.

3) for dealing with solr's HTTP APIs programmatically, the type of "base url"
you want will either be "http://localhost:8883/solr/" or
"http://localhost:8883/solr/development_shard1_replica1" depending on
whether your client code is expecting a base url for the entire server (to
query multiple SolrCores), or a base url for a single SolrCore.
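
For example, with SolrJ (a sketch; host, port and core name taken from your
mail):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

// base url for a single SolrCore - what most client code wants
HttpSolrServer core =
    new HttpSolrServer("http://localhost:8883/solr/development_shard1_replica1");

// base url for the entire server, if you address multiple cores yourself
HttpSolrServer server = new HttpSolrServer("http://localhost:8883/solr");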


-Hoss


Re: Flow Chart of Solr

2013-04-02 Thread Alexandre Rafalovitch
I think there is a gap in the support of one's path of learning Solr. I'll
try to describe it based on my own experience. Hopefully, it is helpful.

At First, there is a "Solr is a blackbox" stage, where the person may not
know Java and is just using out of the box components. Wiki is reasonably
helpful there and there are other resources (blogs, etc). At this point,
Lucene is a black box within the black box and is something that is safely
ignored.

At the second stage, one hits the period where he/she understands what is
going on in their basic scenario and is trying to get into more advanced
cases. This could be putting together a complex analyzer chain, trying to
use Update Request Processors or optimizing slow/OOM imports or doing
complex queries. Suddenly, they are pointed directly at Javadocs and have
to figure out the way around Java-based instructions. A Java programmer can
bridge that gap and get over the curve, but I suspect others get lost very
quickly and get stuck even when they don't need to be good programmers. An
example in my mind would be something like RegexReplaceProcessor. One has
to climb up and down the inheritance chain of the Javadoc to figure out
what can be done and what the parameters are. And the parameters syntax is
Java regular expressions rather than something used in copyField, so they
need to jump over and figure that out. So, it is fairly hard to envisage
those pieces and how they can combine together. Similarly, some of the
stuff is described in Jira requests, but also in a way that requires a
programmer's mind-set to parse it out. I think a lot of people drop out at
this stage and fall back to the 'black-box' view of Solr. Most of the questions
I see on Stack Overflow are conceptual troubles at this stage.

And then, those who get to the third stage, jump to the advanced level
where one could just read the source code to figure out what is going on. I
found www.grepcode.com to be useful (though it is quite slow now and is a
bit behind for Solr). Somewhere around here, one also starts to realize the
fuzzy relation between the Lucene and Solr code and becomes somewhat
clearer what Solr's benefits actually are (as opposed to bare Lucene's).
This also generates its own frustration and confusion of course, because
suddenly one starts to wish for Lucene's features that Solr does not use
(e.g. split/sync analyzer chains, some alternative facet implementation
features, etc).

And finally (at the end of the beginning), you become the contributor
and become very familiar with subversion/ant/etc. Though, I suspect, the
contributors become more specialized and actually understand less about
other parts of the system (e.g. does anyone still fully understand DIH?).

I am not blaming anyone with this story for the lack of support. I think
Solr is - in many ways - better documented than many other open source
projects. And the new manual being contributed to replace Wiki will (soon?)
make this even better. And, of course, this mailing list
is indescribably awesome. I am just trying to provide a fresh view of what
I went through and where I see people getting stuck.

I think a bit more effort in documenting that second stage would bring more
people to the community. I am trying to do my share through Wiki updates,
questions here, Jira issues, my upcoming book and some other little things.
I see others do the same. Perhaps, the diagram is something that we should
explicitly try to do. Though, I think it would be more fun to do it as a
Scrollorama Inception Explained style (
http://www.inception-explained.com/). :-)

Regards,
   Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Apr 2, 2013 at 11:22 AM, Furkan KAMACI wrote:

> You are right about mentioning developer doc and user doc. Users separate
> about it. Some of them uses Solr for indexing and monitoring via admin face
> and that is quietly enough for them however some people wants to modify it
> so it would be nice if there had been some documentation for developer side
> too.
>
>
> 2013/4/2 Yago Riveiro 
>
> > For beginners is complicate understand the complexity of solr / lucene,
> > I'm trying devel a custom search component and it's too hard keep in mind
> > the flow, inheritance and iteration between classes. I think that there
> is
> > a gap between software doc and user doc, or maybe I don't search enough
> > T_T. Java doc not always is clear always.
> >
> > The fact that I'm beginner in solr world don't help.
> >
> > Either way, this thread was very helpful, I found some very good
> resources
> > here :)
> >
> > Cumprimentos
> >
> > --
> > Yago Riveiro
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >
> >
> > On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:
> >
> > > Actually maybe one

Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
Solr 4.2 implements a feature to proxy requests if the core does not exist on
the requested node: https://issues.apache.org/jira/browse/SOLR-4210

There is actually a bug in this mechanism:
https://issues.apache.org/jira/browse/SOLR-4584

Without the proxy feature, whether you create the cores manually or
automatically, you can only query the collection on nodes that have at least 1
replica of the collection.

If you have a Solr cluster with 4 nodes and the collection only has 2 shards
without replicas, then you can only query the collection on 50% of the cluster
(assuming the proxy request mechanism doesn't work properly).

When I said to create the collection manually, I meant that you need to manually
create all the shards that form the collection, and the replicas on the other
nodes of the cluster. It takes work, but if you want some control you need to
pay the price.

If it is possible to manage the name of a shard with the collections API, the
documentation doesn't say how.
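
For illustration, the two routes look roughly like this (collection, shard and
core names are placeholders):

http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2

versus creating every core by hand, once per node:

http://host:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica1&collection=mycollection&shard=shard1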



-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 5:15 PM, Lukasz Kujawa wrote:

> If I use the core admin API instead of the collections API […]



Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Lukasz Kujawa
Thank you for your answers, Yriveiro. I'm trying to use Solr for a big SaaS
platform. The reason why I want everything dynamic is that each user will get
their own Solr collection. It looks like there are still many issues with the
distributed computing. I hope 4.3 will arrive soon ;-) Anyway... once again
thank you for your time.





Re: Flow Chart of Solr

2013-04-02 Thread Yago Riveiro
Alexandre,

You describe the normal path when a beginner tries to use a body of code that
they don't understand: black box, reading code, hacking, and OK, now I know 10%
of the project, with luck :p.

First of all, the Solr community is fantastic and always helps when I need it.
IMHO the devel documentation is dispersed across a lot of sources: blogs, the
wiki, the LucidWorks wiki (I know that this wiki was donated to Apache and is in
the process of being presented to the world as part of the project).

The curve for doing fun things with Solr at the source level is steep; I see a
lot of webinars teaching how to deploy and use Solr, but not how to develop a
ResponseWriter or a SearchComponent.

Unfortunately I don't have the knowledge to contribute yet; in the future …
we will see.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 5:24 PM, Alexandre Rafalovitch wrote:

> … people to the community. I am trying to do my share through Wiki updates,
> questions here, Jira issues, my upcoming book and some other little things. […]



Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
I use Solr for a similar purpose; I understand that you want to have control
over how the sharding is done :)

Regards.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 5:54 PM, Lukasz Kujawa wrote:

> Thank you for you answers Yriveiro. I'm trying to use Solr for a big SaaS
> platform. The reason why I want everything dynamic is each user will get own
> Solr collection. It looks like there are still many issues with the
> distributed computing. I hope 4.3 will arrive soon ;-) Anyway.. once again
> thank you for your time.
> 
> 
> 
> 
> 




Re: Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Mark Miller
Answers inline:

On Apr 2, 2013, at 11:45 AM, Elodie Sannier  wrote:

> Hello,
> 
> I am using the new collection alias feature, and it seems
> CloudSolrServer class (solrj 4.2.0) does not allow to use it, either for
> update or select.
> 
> When I'm requesting the CloudSolrServer with a collection alias name, I
> have the error:
> org.apache.solr.common.SolrException: Collection not found:
> aliasedCollection
> 
> The collection alias cannot be found because, in
> CloudSolrServer#getCollectionList (line 319) method, the alias variable
> is always empty.
> 
> When I'm requesting the CloudSolrServer, the connect method is called
> and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method.
> In the ZkStateReader#createClusterStateWatchersAndUpdate method, the
> aliases are not loaded.
> 
> line 295, the data from /clusterstate.json are loaded :
> ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet);
> this.clusterState = clusterState;
> 
> Should we have the same data loading from /aliases.json, in order to
> fill the aliases field ?
> line 299, a Watcher for aliases is created but does not seem used.

The Watcher is used. It updates the Aliases if they changed - there is some lag 
time though. There is some work that tries to avoid the lag in the update being 
a problem, but I'm guessing somehow it's not covering your case. 

It wouldn't hurt to add the updateAliases call automatically on ZkStateReader 
init. If the watcher was indeed not being used, that would not solve things 
though - the client still needs to be able to detect alias additions and 
changes.

Your best bet is to file a JIRA issue so we can work on a test that mimics what 
you are seeing.

- Mark

> 
> 
> As a workaround to avoid the error, I have to force the aliases loading
> at my application start and when the aliases are updated:
> CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
> solrServer.setDefaultCollection("aliasedCollection");
> solrServer.connect();
> solrServer.getZkStateReader().updateAliases();
> 
> Is there a better way to use collection aliases with solrj ?
> 
> Elodie Sannier
> 



A request handler that manipulated the index

2013-04-02 Thread Benson Margulies
I am thinking about trying to structure a problem as a Solr plugin. The
nature of the plugin is that it would need to read and write the lucene
index to do its work. It could not be cleanly split into URP 'over here'
and a Search Component 'over there'.

Are there invariants of Solr that would preclude this, like assumptions in
the implementation of the cache?
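
To make the question concrete, here is a minimal sketch of the shape I have in
mind (the class name and the response key are hypothetical):

import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;

public class IndexRewritingHandler extends RequestHandlerBase {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    // read side: inspect the live index through the request's searcher
    SolrIndexSearcher searcher = req.getSearcher();
    rsp.add("maxDoc", searcher.maxDoc());
    // write side (the part I'm unsure about): the same handler would also
    // feed derived documents back via req.getCore().getUpdateHandler(),
    // rather than splitting the work into an URP and a SearchComponent
  }

  @Override
  public String getDescription() {
    return "reads and writes the index from a single handler";
  }

  @Override
  public String getSource() {
    return "";
  }
}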


Re: Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Mark Miller
I think the current tests probably build the CloudSolrServer before creating the
aliases - sounds like we need some tests that create the CloudSolrServer after.

- Mark

On Apr 2, 2013, at 1:31 PM, Mark Miller  wrote:

> Answers inline:
> 
> On Apr 2, 2013, at 11:45 AM, Elodie Sannier  wrote:
> 
>> Hello,
>> 
>> I am using the new collection alias feature, and it seems
>> CloudSolrServer class (solrj 4.2.0) does not allow to use it, either for
>> update or select.
>> 
>> When I'm requesting the CloudSolrServer with a collection alias name, I
>> have the error:
>> org.apache.solr.common.SolrException: Collection not found:
>> aliasedCollection
>> 
>> The collection alias cannot be found because, in
>> CloudSolrServer#getCollectionList (line 319) method, the alias variable
>> is always empty.
>> 
>> When I'm requesting the CloudSolrServer, the connect method is called
>> and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method.
>> In the ZkStateReader#createClusterStateWatchersAndUpdate method, the
>> aliases are not loaded.
>> 
>> line 295, the data from /clusterstate.json are loaded :
>> ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet);
>> this.clusterState = clusterState;
>> 
>> Should we have the same data loading from /aliases.json, in order to
>> fill the aliases field ?
>> line 299, a Watcher for aliases is created but does not seem used.
> 
> The Watcher is used. It updates the Aliases if they changed - there is some 
> lag time though. There is some work that tries to avoid the lag in the update 
> being a problem, but I'm guessing somehow it's not covering your case. 
> 
> It wouldn't hurt to add the updateAliases call automatically on ZkStateReader 
> init. If the watcher was indeed not being used, that would not solve things 
> though - the client still needs to be able to detect alias additions and 
> changes.
> 
> Your best bet is to file a JIRA issue so we can work on a test that mimics 
> what you are seeing.
> 
> - Mark
> 
>> 
>> 
>> As a workaround to avoid the error, I have to force the aliases loading
>> at my application start and when the aliases are updated:
>> CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
>> solrServer.setDefaultCollection("aliasedCollection");
>> solrServer.connect();
>> solrServer.getZkStateReader().updateAliases();
>> 
>> Is there a better way to use collection aliases with solrj ?
>> 
>> Elodie Sannier
>> 
> 



Re: Flow Chart of Solr

2013-04-02 Thread Alexandre Rafalovitch
Yago,

My point - perhaps lost in too much text - was that Solr is presented - and
can function - as a black box, which makes it different from more
traditional open-source projects. So, stage 2 happens exactly when the
non-programmers have to cross the boundary from the black box into a
code-first approach, and the hand-off is not particularly smooth. Or even
when - say - a PHP or .NET programmer tries to get beyond the basic
operations of their client library and has to understand the server-side
aspects of Solr.

Regards,
   Alex.

On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro  wrote:

> Alexandre,
>
> You describe the normal path when a beginner try to use a source of code
> that doesn't understand, black-box, reading code, hacking, ok now I know
> 10% of the project, with lucky :p.
>


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


WADL for REST service?

2013-04-02 Thread Peter Schütt
Hello,

Does a WADL exist for the REST service of Solr?

Ciao
  Peter Schütt



Re: Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Mark Miller
I've created https://issues.apache.org/jira/browse/SOLR-4664

- Mark

On Apr 2, 2013, at 2:07 PM, Mark Miller  wrote:

> I think the current tests probably build the cloudsolrserver before creating 
> the aliases - sounds like we need to do some creating the cloudsolrserver 
> after.
> 
> - Mark
> 
> On Apr 2, 2013, at 1:31 PM, Mark Miller  wrote:
> 
>> Answers inline:
>> 
>> On Apr 2, 2013, at 11:45 AM, Elodie Sannier  wrote:
>> 
>>> Hello,
>>> 
>>> I am using the new collection alias feature, and it seems
>>> CloudSolrServer class (solrj 4.2.0) does not allow to use it, either for
>>> update or select.
>>> 
>>> When I'm requesting the CloudSolrServer with a collection alias name, I
>>> have the error:
>>> org.apache.solr.common.SolrException: Collection not found:
>>> aliasedCollection
>>> 
>>> The collection alias cannot be found because, in
>>> CloudSolrServer#getCollectionList (line 319) method, the alias variable
>>> is always empty.
>>> 
>>> When I'm requesting the CloudSolrServer, the connect method is called
>>> and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method.
>>> In the ZkStateReader#createClusterStateWatchersAndUpdate method, the
>>> aliases are not loaded.
>>> 
>>> line 295, the data from /clusterstate.json are loaded :
>>> ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet);
>>> this.clusterState = clusterState;
>>> 
>>> Should we have the same data loading from /aliases.json, in order to
>>> fill the aliases field ?
>>> line 299, a Watcher for aliases is created but does not seem used.
>> 
>> The Watcher is used. It updates the Aliases if they changed - there is some 
>> lag time though. There is some work that tries to avoid the lag in the 
>> update being a problem, but I'm guessing somehow it's not covering your 
>> case. 
>> 
>> It wouldn't hurt to add the updateAliases call automatically on 
>> ZkStateReader init. If the watcher was indeed not being used, that would not 
>> solve things though - the client still needs to be able to detect alias 
>> additions and changes.
>> 
>> Your best bet is to file a JIRA issue so we can work on a test that mimics 
>> what you are seeing.
>> 
>> - Mark
>> 
>>> 
>>> 
>>> As a workaround to avoid the error, I have to force the aliases loading
>>> at my application start and when the aliases are updated:
>>> CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
>>> solrServer.setDefaultCollection("aliasedCollection");
>>> solrServer.connect();
>>> solrServer.getZkStateReader().updateAliases();
>>> 
>>> Is there a better way to use collection aliases with solrj ?
>>> 
>>> Elodie Sannier
>>> 
>> 
> 



RE: Confusion over Solr highlight hl.q parameter

2013-04-02 Thread Van Tassell, Kristian
Thanks Koji, this helped with some of our problems, but it is still not perfect.

This query, for example, returns no highlighting:

?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax

But this one does (when it is, in effect, the same query):

?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT

I've tried many combinations but can't seem to get the right one to work. Is 
this possibly a bug? 

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Saturday, March 16, 2013 6:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Confusion over Solr highlight hl.q parameter

(13/03/16 4:08), Van Tassell, Kristian wrote:
> Hello everyone,
> 
> If I search for a term “baz” and tell it to highlight it, it highlights just 
> fine.
> 
> If, however, I search for “foo bar” using the q parameter, which appears in 
> that same document/same field, and use the hl.q parameter to search and 
> highlight “baz”, I get no highlighting results for “baz”.
> 
> ?q=パーツにおける機能強化
> &qf=text_ja_JP
> &defType=edismax
> &hl=true
> &hl.simple.pre=
> &hl.simple.post=
> &hl.fl=text_ja_JP
> 
> The above highlights query term just fine.
> 
> ?q=1234
> &hl.q=パーツにおける機能強化
> &qf=id
> &defType=edismax
> &hl=true
> &hl.simple.pre=
> &hl.simple.post=
> &hl.fl=text_ja_JP
> 
> This one returns zero highlighting hits.

I'm just guessing: the Solr highlighter tries to highlight "パーツにおける機能強化" in your
default search field? Can you try hl.q=text_ja_JP:パーツにおける機能強化.

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Take me as an example. I started researching Solr just a few weeks ago, and I
have been learning Solr and its related projects. My next step is writing down
the main steps of Solr. We have separated the learning curve of Solr into two
main categories: the first is people who use it with out-of-the-box components,
and the second is the developer side.

Actually, the developer side branches into two paths.

The first is the general steps, i.e. a document comes into Solr (e.g. crawled
data from Nutch), which analysis processes are going to be done (stemming,
lemmatizing, etc.), and what is done after parsing, step by step; and when a
search query happens, what happens step by step, at which step scores are
calculated, and so on.
The second is more code-specific, i.e. which handlers take in the data that is
going to be indexed (no need to explain every handler at this step), which are
the analyzer and tokenizer classes and what is the flow between them, and how
response handlers work and what they are.

Explaining the cloud side is another task besides that.

Some explanations are currently present in the wiki (but some of them are in
very deep places in the wiki and it is not easy to find their parent topic;
maybe starting the wiki from a top page and branching out to all other topics
from it would be better).

If we could show the big picture, and beside it the smaller pictures within it,
it would be great (if you know the main parts it is easy to go deep into the
code, i.e. you don't need to explain every handler; if you show the way to
developers they can debug and find what they need).

When I think about myself as an example: I have to write down the steps of Solr
in some detail, and even though I have read many wiki pages and a book about it,
I see that it is not easy even to write down the big picture of the developer
side.


2013/4/2 Alexandre Rafalovitch 

> Yago,
>
> My point - perhaps lost in too much text - was that Solr is presented - and
> can function - as a black-box. Which makes it different from more
> traditional open-source project. So, the stage-2 happens exactly when the
> non-programmers have to cross the boundary from the black-box into
> code-first approach and the hand-off is not particularly smooth. Or even
> when - say - php or .Net programmer  tries to get beyond the basic
> operations their client library and has the understand the server-side
> aspects of Solr.
>
> Regards,
>Alex.
>
> On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro 
> wrote:
>
> > Alexandre,
> >
> > You describe the normal path when a beginner try to use a source of code
> > that doesn't understand, black-box, reading code, hacking, ok now I know
> > 10% of the project, with lucky :p.
> >
>
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
I am currently looking at moving our Solr cluster to 4.2 and noticed a
strange issue while testing today.  Specifically the replica has a higher
version than the master which is causing the index to not replicate.
 Because of this the replica has fewer documents than the master.  What
could cause this and how can I resolve it short of taking down the index
and scping the right version in?

MASTER:
Last Modified: about an hour ago
Num Docs: 164880
Max Doc: 164880
Deleted Docs: 0
Version: 2387
Segment Count: 23

REPLICA:
Last Modified: about an hour ago
Num Docs: 164773
Max Doc: 164773
Deleted Docs: 0
Version: 3001
Segment Count: 30

in the replicas log it says this:

INFO: Creating new http client,
config:maxConnectionsPerHost=20&maxConnections=1&connTimeout=3&socketTimeout=3&retry=false

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync

INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START
replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions

INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions

INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our
versions are newer. ourLowThreshold=1431233788792274944
otherHigh=1431233789440294912

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync

INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE.
sync succeeded


which again seems to point that it thinks it has a newer version of the
index so it aborts.  This happened while having 10 threads indexing 10,000
items writing to a 6 shard (1 replica each) cluster.  Any thoughts on this
or what I should look for would be appreciated.


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Mark Miller
I don't think the versions you are thinking of apply here. Peersync does not 
look at that - it looks at version numbers for updates in the transaction log - 
it compares the last 100 of them on leader and replica. What it's saying is 
that the replica seems to have versions that the leader does not. Have you 
scanned the logs for any interesting exceptions?

Did the leader change during the heavy indexing? Did any zk session timeouts 
occur?

- Mark

On Apr 2, 2013, at 4:52 PM, Jamie Johnson  wrote:

> I am currently looking at moving our Solr cluster to 4.2 and noticed a
> strange issue while testing today.  Specifically the replica has a higher
> version than the master which is causing the index to not replicate.
> Because of this the replica has fewer documents than the master.  What
> could cause this and how can I resolve it short of taking down the index
> and scping the right version in?
> 
> MASTER:
> Last Modified:about an hour ago
> Num Docs:164880
> Max Doc:164880
> Deleted Docs:0
> Version:2387
> Segment Count:23
> 
> REPLICA:
> Last Modified: about an hour ago
> Num Docs:164773
> Max Doc:164773
> Deleted Docs:0
> Version:3001
> Segment Count:30
> 
> in the replicas log it says this:
> 
> INFO: Creating new http client,
> config:maxConnectionsPerHost=20&maxConnections=1&connTimeout=3&socketTimeout=3&retry=false
> 
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> 
> INFO: PeerSync: core=dsc-shard5-core2
> url=http://10.38.33.17:7577/solrSTART replicas=[
> http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> 
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> 
> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> 
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> 
> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our
> versions are newer. ourLowThreshold=1431233788792274944
> otherHigh=1431233789440294912
> 
> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> 
> INFO: PeerSync: core=dsc-shard5-core2
> url=http://10.38.33.17:7577/solrDONE. sync succeeded
> 
> 
> which again seems to point that it thinks it has a newer version of the
> index so it aborts.  This happened while having 10 threads indexing 10,000
> items writing to a 6 shard (1 replica each) cluster.  Any thoughts on this
> or what I should look for would be appreciated.



Re: Add fuzzy to edismax specs?

2013-04-02 Thread Jan Høydahl
Note that the "pf" field already parses this syntax as of 4.0, but there it is
used as a phrase-slop value. You could probably use the same parsing code for qf.
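
For example, pf=title~2^4 is today parsed as a phrase boost on title with slop 2
and boost 4; with the proposed qf syntax, qf=title~0.75^4 would turn a user term
like netflx into the fuzzy query title:netflx~0.75, boosted by 4.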

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 29 March 2013 at 18:33, Walter Underwood wrote:

> I've implemented this for the second time, so it is probably time to 
> contribute it. I find it really useful.
> 
> I've extended the query spec parser for edismax to also accept a tilde and to 
> generate a FuzzyQuery. I used this at Netflix (on 1.3 with dismax), and 
> re-implemented it for 3.3 here at Chegg. We've had it in production for 
> nearly a year. I'll need to re-port this as part of our move to 4.x.
> 
> Here is what the spec looks like. This expands to a fuzzy search on title 
> with a similarity of 0.75, and so on.
> 
>   title~0.75^4 long_title^4 title_stem^2 author~0.75
> 
> I'm not 100% sure I understand the spec parser in edismax, so I'd like some 
> review when this is ready. I'd probably only do it for edismax.
> 
> See: https://issues.apache.org/jira/browse/SOLR-629
> 
> wunder
> --
> Walter Underwood
> wun...@wunderwood.org
> Search Guy, Chegg.com
> 



Re: Solr Phonetic Search Highlight issue in search results

2013-04-02 Thread Jan Høydahl
If you want to highlight, you need to turn on highlighting for the actual field 
you search, and that field needs to be stored, i.e. &hl.fl=ContentSearchPhonetic
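
Something like (assuming ContentSearchPhonetic is stored):

?q=ContentSearchPhonetic:fakt&hl=true&hl.fl=ContentSearchPhonetic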

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 1 April 2013 at 14:16, Erick Erickson wrote:

> Good question, you're causing me to think... about code I know very
> little about .
> 
> So rather than spouting off, I tried it and.. it works fine for me, either 
> with
> or without using fast vector highlighter on, admittedly, a very simple test.
> 
> So I think I'd try peeling off all the extra stuff you've put into your 
> configs
> (sorry, I don't have time right now to try to reproduce) and get the very
> simple case working, then build the rest back up and see where the
> problem begins.
> 
> Sorry for the mis-direction!
> 
> Erick
> 
> 
> 
> On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar
>  wrote:
>> Hi Erick,
>> 
>> Thanks for the reply. But help me understand this: If Solr is able to
>> isolate the two documents which contain the term "fact" being the phonetic
>> equivalent of the search term "fakt", then why will it be unable to
>> highlight the terms based on the same logic it uses to search the documents.
>> 
>> Also, it is correctly highlighting the results in other searches which are
>> also approximate searches and not exact ones for eg. Fuzzy or Synonym
>> search. In these cases also the highlights in the search results are far
>> from the actual search term but still they are getting correctly
>> highlighted.
>> 
>> Maybe I am getting it completely wrong but it looks like there is something
>> wrong with my implementation.
>> 
>> Thanks & Regards,
>> 
>> Soumya.
>> 
>> 
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: 27 March 2013 06:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Phonetic Search Highlight issue in search results
>> 
>> How would you expect it to highlight successfully? The term is "fakt",
>> there's nothing built in (and, indeed couldn't be) to un-phoneticize it into
>> "fact" and apply that to the Content field. The whole point of phonetic
>> processing is to do a lossy translation from the word into some variant,
>> losing precision all the way.
>> 
>> So this behavior is unsurprising...
>> 
>> Best
>> Erick
>> 
>> 
>> 
>> 
>> On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar >> wrote:
>> 
>>> When we are issuing a query with Phonetic Search, it is returning the
>>> correct documents but not returning the highlights. When we use
>>> Stemming or Synonym searches we are getting the proper highlights.
>>> 
>>> 
>>> 
>>> For example, when we execute a phonetic query for the term
>>> fakt(ContentSearchPhonetic:fakt) in the Solr Admin interface, it
>>> returns two documents containing the term "fact"(phonetic token
>>> equivalent), but the list of highlights is empty as shown in the
>>> response below.
>>> 
>>> 
>>> 
>>> <response>
>>>   <lst name="responseHeader">
>>>     <int name="status">0</int>
>>>     <int name="QTime">16</int>
>>>     <lst name="params">
>>>       <str name="q">ContentSearchPhonetic:fakt</str>
>>>       <str name="wt">xml</str>
>>>     </lst>
>>>   </lst>
>>>   <result name="response" numFound="2" start="0">
>>>     <doc>
>>>       <str name="DocId">1</str>
>>>       <str name="Title">Doc 1</str>
>>>       <str name="Content">Anyway, this game was excellent and was
>>> well worth the time.  The graphics are truly amazing and the sound
>>> track was pretty pleasant also. The  preacher was in  fact a
>>> thief.</str>
>>>       <long name="_version_">1430480998833848320</long>
>>>     </doc>
>>>     <doc>
>>>       <str name="DocId">2</str>
>>>       <str name="Title">Doc 2</str>
>>>       <str name="Content">stunning. The  preacher was in  fact an
>>> excellent thief who  had stolen the original manuscript of Hamlet
>>> from an exhibit on the  Riviera, where  he also  acquired his
>>> remarkable and tan.</str>
>>>       <long name="_version_">1430480998841188352</long>
>>>     </doc>
>>>   </result>
>>>   <lst name="highlighting">
>>>     <lst name="1"/>
>>>     <lst name="2"/>
>>>   </lst>
>>> </response>
>>> 
>>> 
>>> 
>>> Relevant section of Solr schema:
>>>
>>> <field name="DocId" type="string" indexed="true" stored="true"
>>> required="true"/>
>>> <field name="Title" type="string" indexed="true" stored="true"
>>> required="true"/>
>>> <field name="Content" type="text_general" indexed="true" stored="true"
>>> required="true"/>
>>>
>>> <field name="ContentSearch" type="text_general" indexed="true"
>>> stored="false" multiValued="true"/>
>>> <field name="ContentSearchStemming" type="text_stem" indexed="true"
>>> stored="false" multiValued="true"/>
>>> <field name="ContentSearchPhonetic" type="text_phonetic" indexed="true"
>>> stored="false" multiValued="true"/>
>>> <field name="ContentSearchSynonym" type="text_synonym" indexed="true"
>>> stored="false" multiValued="true"/>
>>>
>>> <uniqueKey>DocId</uniqueKey>
>>>
>>> <copyField source="Content" dest="ContentSearch"/>
>>> <copyField source="Content" dest="ContentSearchStemming"/>
>>> <copyField source="Content" dest="ContentSearchPhonetic"/>
>>> <copyField source="Content" dest="ContentSearchSynonym"/>
>>>
>>> <fieldType name="text_phonetic" class="solr.TextField">
>>>   <analyzer>
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.PhoneticFilterFactory"
>>> encoder="DoubleMetaphone" inject="false"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> <fieldType name="text_synonym" class="solr.TextField">
>>>   <analyzer>
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"/>
>>>   </analyzer>
>>> </fieldType>
>>> Relevant section of Solr config:
>>>
>>> <requestHandler name="/select" class="solr.SearchHandler">
>>>   <lst name="defaults">
>>>     <str name="echoParams">explicit</str>
>>>     <int name="rows">100</int>
>>>     <str name="df">ContentSearch</str>
>>>     <bool name="hl">true</bool>
>>>     <str name="hl.fl">Content</str>
>>>     <int name="hl.fragsize">150</int>
>>>   </lst>
>>> </requestHandler>
>>> 

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
Looking at the master it looks like at some point there were shards that
went down.  I am seeing things like what is below.

INFO: A cluster state change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 12)
Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
INFO: Updating live nodes... (9)
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
runLeaderProcess
INFO: Running the leader process.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
shouldIBeLeader
INFO: Checking if I should try and be the leader.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
runLeaderProcess
INFO: I may be the new leader - try and sync



On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller  wrote:

> I don't think the versions you are thinking of apply here. Peersync does
> not look at that - it looks at version numbers for updates in the
> transaction log - it compares the last 100 of them on leader and replica.
> What it's saying is that the replica seems to have versions that the leader
> does not. Have you scanned the logs for any interesting exceptions?
>
> Did the leader change during the heavy indexing? Did any zk session
> timeouts occur?
>
> - Mark
>
> On Apr 2, 2013, at 4:52 PM, Jamie Johnson  wrote:
>
> > I am currently looking at moving our Solr cluster to 4.2 and noticed a
> > strange issue while testing today.  Specifically the replica has a higher
> > version than the master which is causing the index to not replicate.
> > Because of this the replica has fewer documents than the master.  What
> > could cause this and how can I resolve it short of taking down the index
> > and scping the right version in?
> >
> > MASTER:
> > Last Modified:about an hour ago
> > Num Docs:164880
> > Max Doc:164880
> > Deleted Docs:0
> > Version:2387
> > Segment Count:23
> >
> > REPLICA:
> > Last Modified: about an hour ago
> > Num Docs:164773
> > Max Doc:164773
> > Deleted Docs:0
> > Version:3001
> > Segment Count:30
> >
> > in the replicas log it says this:
> >
> > INFO: Creating new http client,
> >
> config:maxConnectionsPerHost=20&maxConnections=1&connTimeout=3&socketTimeout=3&retry=false
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >
> > INFO: PeerSync: core=dsc-shard5-core2
> > url=http://10.38.33.17:7577/solrSTART replicas=[
> > http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our
> > versions are newer. ourLowThreshold=1431233788792274944
> > otherHigh=1431233789440294912
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >
> > INFO: PeerSync: core=dsc-shard5-core2
> > url=http://10.38.33.17:7577/solrDONE. sync succeeded
> >
> >
> > which again seems to point that it thinks it has a newer version of the
> > index so it aborts.  This happened while having 10 threads indexing
> 10,000
> > items writing to a 6 shard (1 replica each) cluster.  Any thoughts on
> this
> > or what I should look for would be appreciated.
>
>


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
here is another one that looks interesting

Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the
leader, but locally we don't think so
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)



On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson  wrote:

> Looking at the master it looks like at some point there were shards that
> went down.  I am seeing things like what is below.
>
> NFO: A cluster state change: WatchedEvent state:SyncConnected
> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
> nodes size: 12)
> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> INFO: Updating live nodes... (9)
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
> runLeaderProcess
> INFO: Running the leader process.
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
> shouldIBeLeader
> INFO: Checking if I should try and be the leader.
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
> shouldIBeLeader
> INFO: My last published State was Active, it's okay to be the leader.
> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
> runLeaderProcess
> INFO: I may be the new leader - try and sync
>
>
>
> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller  wrote:
>
>> I don't think the versions you are thinking of apply here. Peersync does
>> not look at that - it looks at version numbers for updates in the
>> transaction log - it compares the last 100 of them on leader and replica.
>> What it's saying is that the replica seems to have versions that the leader
>> does not. Have you scanned the logs for any interesting exceptions?
>>
>> Did the leader change during the heavy indexing? Did any zk session
>> timeouts occur?
>>
>> - Mark
>>
>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson  wrote:
>>
>> > I am currently looking at moving our Solr cluster to 4.2 and noticed a
>> > strange issue while testing today.  Specifically the replica has a
>> higher
>> > version than the master which is causing the index to not replicate.
>> > Because of this the replica has fewer documents than the master.  What
>> > could cause this and how can I resolve it short of taking down the index
>> > and scping the right version in?
>> >
>> > MASTER:
>> > Last Modified:about an hour ago
>> > Num Docs:164880
>> > Max Doc:164880
>> > Deleted Docs:0
>> > Version:2387
>> > Segment Count:23
>> >
>> > REPLICA:
>> > Last Modified: about an hour ago
>> > Num Docs:164773
>> > Max Doc:164773
>> > Deleted Docs:0
>> > Version:3001
>> > Segment Count:30
>> >
>> > in the replicas log it says this:
>> >
>> > INFO: Creating new http client,
>> >
>> config:maxConnectionsPerHost=20&maxConnections=1&connTimeout=3&socketTimeout=3&retry=false
>> >
>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >
>> > INFO: PeerSync: core=dsc-shard5-core2
>> > url=http://10.38.33.17:7577/solrSTART replicas=[
>> > http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>> >
>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >
>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>> > Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>> >
>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >
>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our
>> > versions are newer. ourLowThreshold=1431233788792274944
>> > otherHigh=1431233789440294912
>> >
>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >
>> > INFO: PeerSync: core=dsc-shard5-core2
> url=http://10.38.33.17:7577/solr DONE. sync succeeded
>> >
>> >
>> > which again seems to indicate that it thinks it has a newer version of the
>> > index so it a
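
(For reference: the per-update version numbers PeerSync compares live in the
update log and can be read back through the real-time get handler. A minimal
sketch, assuming stock Solr 4.x /get paths and the hosts and cores named in
the log lines above; 100 mirrors the nUpdates=100 PeerSync reports:

  http://10.38.33.16:7575/solr/dsc-shard5-core1/get?getVersions=100
  http://10.38.33.17:7577/solr/dsc-shard5-core2/get?getVersions=100

Diffing the two returned version lists shows which side holds updates the
other is missing.)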

Lengthy description is converted to hash symbols

2013-04-02 Thread Danny Watari
Hi, I have a field that is defined to be of type "text_en".  Occasionally, I
notice that lengthy strings are converted to hash symbols.  Here is a
snippet of my field type:


  


  
  


  




Here is an example of the field's value:
###


Any ideas why this might be happening?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
sorry for spamming here

shard5-core2 is the instance we're having issues with...

Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode:
http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
status:503, message:Service Unavailable
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson  wrote:

> here is another one that looks interesting
>
> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the
> leader, but locally we don't think so
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> at
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> at
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> at
> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>
>
>
> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson  wrote:
>
>> Looking at the master it looks like at some point there were shards that
>> went down.  I am seeing things like what is below.
>>
>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
>> nodes size: 12)
>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
>> process
>> INFO: Updating live nodes... (9)
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>> runLeaderProcess
>> INFO: Running the leader process.
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>> shouldIBeLeader
>> INFO: Checking if I should try and be the leader.
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>> shouldIBeLeader
>> INFO: My last published State was Active, it's okay to be the leader.
>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>> runLeaderProcess
>> INFO: I may be the new leader - try and sync
>>
>>
>>
>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller wrote:
>>
>>> I don't think the versions you are thinking of apply here. Peersync does
>>> not look at that - it looks at version numbers for updates in the
>>> transaction log - it compares the last 100 of them on leader and replica.
>>> What it's saying is that the replica seems to have versions that the leader
>>> does not. Have you scanned the logs for any interesting exceptions?
>>>
>>> Did the leader change during the heavy indexing? Did any zk session
>>> timeouts occur?
>>>
>>> - Mark
>>>
>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson  wrote:
>>>
>>> > I am currently looking at moving our Solr cluster to 4.2 and noticed a
>>> > strange issue while testing today.  Specifically the replica has a
>>> higher
>>> > version than the master which is causing the index to not replicate.
>>> > Because of this the replica has fewer documents than the master.  What
>>> > could cause this and how can I resolve it short of taking down the
>>> index
>>> > and scping the right version i

Re: Lengthy description is converted to hash symbols

2013-04-02 Thread Jack Krupansky
Can you enter the text on the Solr Admin UI Analysis page? Then you could
tell at which stage the issue occurs.


StandardTokenizer has a default token length limit of 255. You can override 
with the "maxTokenLength" attribute:


   <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1024" />
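
A minimal sketch of a complete fieldType carrying that attribute; the type
name and filter chain here are illustrative, not taken from the original
schema:

   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1024"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>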


See:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizerFactory.html

But the "#" sounds like a bug.

-- Jack Krupansky

-Original Message- 
From: Danny Watari

Sent: Tuesday, April 02, 2013 5:45 PM
To: solr-user@lucene.apache.org
Subject: Lengthy description is converted to hash symbols

Hi, I have a field that is defined to be of type "text_en".  Occasionally, I
notice that lengthy strings are converted to hash symbols.  Here is a
snippet of my field type:


 
   
   
 
 
   
   
 




Here is an example of the field's value:
###


Any ideas why this might be happening?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Lengthy description is converted to hash symbols

2013-04-02 Thread Chris Hostetter

: Here is an example of the field's value:
: ###

where are you getting that  from? if that's what you see when 
you do a search for a document, then it has nothing to do with your 
fieldType or analyzer -- the strings returned from searches are the 
"stored" values, which are not modified by the analyzer at all.

What does your indexing code/process look like?
Do you have any custom UpdateProcessors?

details, details, details.

-Hoss
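
A minimal illustration of the stored-versus-indexed point above, with a
hypothetical field name:

  <field name="description" type="text_en" indexed="true" stored="true"/>

The analyzer chain only shapes the indexed terms used for matching; the
stored copy that query responses return is the original input, untouched.
So if a response shows a run of "#", the hashes were already in the document
by the time it reached Solr.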


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
Sorry I didn't ask the obvious question.  Is there anything else that I
should be looking for here and is this a bug?  I'd be happy to troll
through the logs further if more information is needed, just let me know.

Also, what is the most appropriate mechanism to fix this? Is it required to
kill the index that is out of sync and let solr resync things?


On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson  wrote:

> sorry for spamming here
>
> shard5-core2 is the instance we're having issues with...
>
> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> SEVERE: shard update error StdNode:
> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
> status:503, message:Service Unavailable
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> at
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> at
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
>
> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson  wrote:
>
>> here is another one that looks interesting
>>
>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>> the leader, but locally we don't think so
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>> at
>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>> at
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>> at
>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>> at
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>
>>
>>
>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson  wrote:
>>
>>> Looking at the master it looks like at some point there were shards that
>>> went down.  I am seeing things like what is below.
>>>
>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
>>> nodes size: 12)
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
>>> process
>>> INFO: Updating live nodes... (9)
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> runLeaderProcess
>>> INFO: Running the leader process.
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> shouldIBeLeader
>>> INFO: Checking if I should try and be the leader.
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> shouldIBeLeader
>>> INFO: My last published State was Active, it's okay to be the leader.
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> runLeaderProcess
>>> INFO: I may be the new leader - try and sync
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller wrote:
>>>
 I don't think the versions you are thinking of apply here. Peersync
 does not look at that - it looks at version numbers for updates in the
 transaction log - it compares the last 100 of them on leader and replica.
 What it's saying is that the replica seems to have versions that the leader
 does not. Have you scanned the logs for any interesting exceptions?

 Did the leader change during the heavy indexing? Did any

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Mark Miller
It would appear it's a bug given what you have said.

Any other exceptions would be useful. Might be best to start tracking in a JIRA 
issue as well.

To fix, I'd bring the behind node down and back again.

Unfortunately, I'm pressed for time, but we really need to get to the bottom of 
this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now).

- Mark

On Apr 2, 2013, at 7:21 PM, Jamie Johnson  wrote:

> Sorry I didn't ask the obvious question.  Is there anything else that I
> should be looking for here and is this a bug?  I'd be happy to troll
> through the logs further if more information is needed, just let me know.
> 
> Also what is the most appropriate mechanism to fix this.  Is it required to
> kill the index that is out of sync and let solr resync things?
> 
> 
> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson  wrote:
> 
>> sorry for spamming here
>> 
>> shard5-core2 is the instance we're having issues with...
>> 
>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> SEVERE: shard update error StdNode:
>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
>> status:503, message:Service Unavailable
>>at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>at
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>at
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>at java.lang.Thread.run(Thread.java:662)
>> 
>> 
>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson  wrote:
>> 
>>> here is another one that looks interesting
>>> 
>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>>> the leader, but locally we don't think so
>>>at
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>at
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>at
>>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>at
>>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>at
>>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>at
>>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>at
>>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>at
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>at
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>at
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>> 
>>> 
>>> 
>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson  wrote:
>>> 
 Looking at the master it looks like at some point there were shards that
 went down.  I am seeing things like what is below.
 
 INFO: A cluster state change: WatchedEvent state:SyncConnected
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
 nodes size: 12)
 Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
 process
 INFO: Updating live nodes... (9)
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: Running the leader process.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: Checking if I should try and be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: My last published State was Active, it's okay to be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: I may be the new leader - try and 
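
On the fix Mark describes: besides a full process restart, Solr 4.x also has
a CoreAdmin action that asks a core to re-enter recovery against its shard
leader. A sketch, assuming the host and core name from the logs above and the
REQUESTRECOVERY action present in this line of releases:

  http://10.38.33.17:7577/solr/admin/cores?action=REQUESTRECOVERY&core=dsc-shard5-core2

If recovery still leaves the replica behind, removing the core's data
directory before restarting forces a full replication from the leader.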

RequestHandler.. Conditional components

2013-04-02 Thread venkata
In our use cases, for certain query terms we want to redirect the query
processing to an external system, and for the rest of the keywords we want
to continue with the query component, facets, etc.

Is it possible, based on some condition, to skip some components in a
request handler?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/RequestHandler-Conditional-components-tp4053381.html
Sent from the Solr - User mailing list archive at Nabble.com.
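
A SearchHandler's component list is fixed in solrconfig.xml, so conditional
skipping usually means a thin custom SearchComponent that inspects the
request and short-circuits when a term should be routed externally. A sketch
of the configuration side only; the handler path and the custom class name
are hypothetical:

  <searchComponent name="router" class="com.example.RoutingComponent"/>

  <requestHandler name="/conditional" class="solr.SearchHandler">
    <arr name="first-components">
      <str>router</str>
    </arr>
  </requestHandler>

The custom component's prepare() can examine the query terms and either hand
off to the external system or fall through to the normal query and facet
components.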


RE: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-02 Thread David Parks
Isn't this an AWS security groups question? You should probably post this 
question on the AWS forums, but for the moment, here's the basic reading 
material - go set up your EC2 security groups and lock down your systems.


http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

If you just want to password-protect Solr, here are the instructions:

http://wiki.apache.org/solr/SolrSecurity

But I most certainly would not leave it open to the world even with a password
(note that basic password authentication sends passwords in clear text if
you're not using HTTPS; best to lock the thing down behind a firewall).
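
If you do go the container-authentication route from that wiki page, the
heart of it is a standard servlet security constraint in Solr's web.xml plus
a matching user and role in the container's realm. A minimal sketch; the
role name is illustrative:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr</web-resource-name>
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>search-admin</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr</realm-name>
  </login-config>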

Dave


-Original Message-
From: DC tech [mailto:dctech1...@gmail.com] 
Sent: Tuesday, April 02, 2013 1:02 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis - Odd results - what am I doing wrong?

OK - so I have my SOLR instance running on AWS. 
Any suggestions on how to safely share the link?  Right now, the whole SOLR 
instance is totally open. 



Gagandeep singh  wrote:

>Set &debugQuery=true&mlt=true and look at the scores for the MLT query, not
>a sample query. You can use Amazon EC2 to bring up your Solr; you should be
>able to get a micro instance on the free trial.
>
>
>On Mon, Apr 1, 2013 at 5:10 AM, dc tech  wrote:
>
>> I did try the raw query against the *simi* field and those seem to 
>> return results in the order expected.
>> For instance, Acura MDX has  ( large, SUV, 4WD   Luxury) in the simi field.
>> Running a query with those words against the simi field returns the 
>> expected models (X5, Audi Q5, etc) and then the subsequent documents 
>> have decreasing relevance. So the basic query mechanism seems to be fine.
>>
>> The issue just seems to be with MoreLikeThis component and handler.
>> I can post the index on a public SOLR instance - any suggestions? (or 
>> for
>> hosting)
>>
>>
>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh 
>> > >wrote:
>>
>> > If you can bring up your Solr setup on a public machine then I'm sure a
>> > lot of debugging can be done. Without that, I think what you should look
>> > at is the tf-idf scores of terms like "camry" etc. Usually idf is the
>> > deciding factor in which results show at the top (tf should be 1 for
>> > your data).
>> > Enable &debugQuery=true and look at the explain section to see how the
>> > score is getting calculated.
>> >
>> > You should try giving different boosts to class, type, drive, size
>> > to control the results.
>> >
>> >
>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech  wrote:
>> >
>> >> I am running some experiments on more like this and the results 
>> >> seem rather odd - I am doing something wrong but just cannot figure out 
>> >> what.
>> >> Basically, the similarity results are decent - but not great.
>> >>
>> >> *Issue 1  = Quality*
>> >> Toyota Camry: finds Altima (good), but the next one is Camry
>> >> Hybrid, whereas it should have found Accord.
>> >> I have normalized the data into a simi field which has only the 
>> >> attributes that I care about.
>> >> Without the simi field, I could not get mlt.qf boosts to work well
>> >> enough to return results.
>> >>
>> >> *Issue 2*
>> >> Some fields do not work at all. For instance, text+simi (in mlt.fl)
>> >> works whereas just simi does not.
>> >> So some weirdness that am just not understanding.
>> >>
>> >> Would be grateful for your guidance !
>> >>
>> >>
>> >> Here is the setup:
>> >> *1. SOLR Version*
>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>> >> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
>> >> lucene-spec 4.2.0
>> >> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
>> >>
>> >> *2. Machine Information*
>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
>> >> 19.0-b09)
>> >> Windows 7 Home 64 Bit with 4 GB RAM
>> >>
>> >> *3. Sample Data *
>> >> I created this 'dummy' data of cars  - the idea being that these 
>> >> would
>> be
>> >> sufficient and simple to generate similarity and understand how it 
>> >> would work.
>> >> There are 181 rows in the data set (I have attached it for 
>> >> reference in CSV format)
>> >>
>> >> [image: Inline image 1]
>> >>
>> >> *4. SCHEMA*
>> >> *Field Definitions*
>> >>> >> termVectors="true" multiValued="false"/>
>> >>> >> termVectors="true" multiValued="false"/>
>> >>> >> termVectors="true" multiValued="false"/>
>> >>> >> termVectors="true" multiValued="false"/>
>> >>> >> termVectors="true" multiValued="false"/>
>> >>> >> termVectors="true" multiValued="false"/>
>> >>> stored="true"
>> >> termVectors="true" multiValued="true"/>
>> >>> >> termVectors="true" multiValued="false"/>
>> >> *
>> >> *
>> >> *Copy Fields*
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >>   
>> >> *  *
>> >> *  
>> >> *
>> >> *  *
>> >> *  
>> >> *
>> >>
>> >> Note that the "simi" field ends u
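
A sketch of the request Gagandeep describes above, with a hypothetical host
and document id; mlt=true enables the MoreLikeThis component against the
"simi" field from this thread, and debugQuery=true adds the explain section
showing the tf-idf contributions behind each score:

  http://localhost:8983/solr/select?q=id:camry&mlt=true&mlt.fl=simi&mlt.mintf=1&mlt.mindf=1&debugQuery=true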

Re: Confusion over Solr highlight hl.q parameter

2013-04-02 Thread Koji Sekiguchi
(13/04/03 5:27), Van Tassell, Kristian wrote:
> Thanks Koji, this helped with some of our problems, but it is still not 
> perfect.
> 
> This query, for example, returns no highlighting:
> 
> ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax
> 
> But this one does (when it is, in effect, the same query):
> 
> ?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT
> 
> I've tried many combinations but can't seem to get the right one to work. Is 
> this possibly a bug?

As hl.q doesn't honor the defType parameter but does honor localParams,
can you try putting {!edismax} in the hl.q parameter?

koji
-- 
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
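
Applied to the failing query from earlier in the thread, Koji's suggestion
would look something like this (URL escaping of the braces aside):

  ?q=id:abc123&hl=true&hl.fl=text_it_IT&hl.q={!edismax}text_it_IT:l'assieme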


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
I brought the bad one down and back up and it did nothing.  I can clear the
index and try 4.2.1. I will save off the logs and see if there is anything
else odd
On Apr 2, 2013 9:13 PM, "Mark Miller"  wrote:

> It would appear it's a bug given what you have said.
>
> Any other exceptions would be useful. Might be best to start tracking in a
> JIRA issue as well.
>
> To fix, I'd bring the behind node down and back again.
>
> Unfortunately, I'm pressed for time, but we really need to get to the
> bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
> to mirrors now).
>
> - Mark
>
> On Apr 2, 2013, at 7:21 PM, Jamie Johnson  wrote:
>
> > Sorry I didn't ask the obvious question.  Is there anything else that I
> > should be looking for here and is this a bug?  I'd be happy to troll
> > through the logs further if more information is needed, just let me know.
> >
> > Also what is the most appropriate mechanism to fix this.  Is it required
> to
> > kill the index that is out of sync and let solr resync things?
> >
> >
> > On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson  wrote:
> >
> >> sorry for spamming here
> >>
> >> shard5-core2 is the instance we're having issues with...
> >>
> >> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >> SEVERE: shard update error StdNode:
> >>
> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException
> :
> >> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
> >> status:503, message:Service Unavailable
> >>at
> >>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> >>at
> >>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>at
> >>
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> >>at
> >>
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> >>at
> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >>at
> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>at java.lang.Thread.run(Thread.java:662)
> >>
> >>
> >> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson 
> wrote:
> >>
> >>> here is another one that looks interesting
> >>>
> >>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
> >>> the leader, but locally we don't think so
> >>>at
> >>>
> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> >>>at
> >>>
> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> >>>at
> >>>
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> >>>at
> >>>
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> >>>at
> >>>
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> >>>at
> >>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >>>at
> >>>
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >>>at
> >>>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >>>at
> >>>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> >>>at
> >>>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> >>>at
> >>>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> >>>
> >>>
> >>>
> >>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson 
> wrote:
> >>>
>  Looking at the master it looks like at some point there were shards
> that
>  went down.  I am seeing things like what is below.
> 
>  INFO: A cluster state change: WatchedEvent state:SyncConnected
>  type:NodeChildrenChanged path:/live_nodes, has occurred - updating...
> (live
>  nodes size: 12)
>  Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
>  process
>  INFO: Updating live nodes... (9)
>  Apr 2, 2013 8:12:52 PM
> org.apache.solr.cloud.ShardLeaderElectionContext
>  runLeaderProcess
>  INFO: Running the leader process.
>  Apr 2, 2013 8:12:52 PM

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
Mark,
Is there a particular JIRA issue that you think may address this? I read
through it quickly but didn't see one that jumped out.
On Apr 2, 2013 10:07 PM, "Jamie Johnson"  wrote:

> I brought the bad one down and back up and it did nothing.  I can clear
> the index and try4.2.1. I will save off the logs and see if there is
> anything else odd
> On Apr 2, 2013 9:13 PM, "Mark Miller"  wrote:
>
>> It would appear it's a bug given what you have said.
>>
>> Any other exceptions would be useful. Might be best to start tracking in
>> a JIRA issue as well.
>>
>> To fix, I'd bring the behind node down and back again.
>>
>> Unfortunately, I'm pressed for time, but we really need to get to the
>> bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
>> to mirrors now).
>>
>> - Mark
>>
>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson  wrote:
>>
>> > Sorry I didn't ask the obvious question.  Is there anything else that I
>> > should be looking for here and is this a bug?  I'd be happy to troll
>> > through the logs further if more information is needed, just let me
>> know.
>> >
>> > Also what is the most appropriate mechanism to fix this.  Is it
>> required to
>> > kill the index that is out of sync and let solr resync things?
>> >
>> >
>> > On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson 
>> wrote:
>> >
>> >> sorry for spamming here
>> >>
>> >> shard5-core2 is the instance we're having issues with...
>> >>
>> >> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >> SEVERE: shard update error StdNode:
>> >>
>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException
>> :
>> >> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non
>> ok
>> >> status:503, message:Service Unavailable
>> >>at
>> >>
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>> >>at
>> >>
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>at
>> >>
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>> >>at
>> >>
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>> >>at
>> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>at
>> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >>at
>> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>at
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >>at
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >>at java.lang.Thread.run(Thread.java:662)
>> >>
>> >>
>> >> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson 
>> wrote:
>> >>
>> >>> here is another one that looks interesting
>> >>>
>> >>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>> >>> the leader, but locally we don't think so
>> >>>at
>> >>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>> >>>at
>> >>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>> >>>at
>> >>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>> >>>at
>> >>>
>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>> >>>at
>> >>>
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>> >>>at
>> >>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>> >>>at
>> >>>
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> >>>at
>> >>>
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> >>>at
>> >>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>> >>>at
>> >>>
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>> >>>at
>> >>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson 
>> wrote:
>> >>>
>>  Looking at the master it looks like at some point there were shards
>> that
>>  went down.  I am seeing things like what is below.
>> 
>>  INFO: A cluster state change: WatchedEvent state:SyncConnected
>>  type:NodeChildrenChanged path:/live_nodes, has occurred -
>> updating... (live
>> 

Re: WADL for REST service?

2013-04-02 Thread Otis Gospodnetic
Hi Peter,

I'm afraid we don't have anything that formal... almost empty:
http://search-lucene.com/?q=wadl&fc_project=Solr

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 2, 2013 at 6:38 AM, Peter Schütt  wrote:
> Hello,
>
> does a WADL exist for the REST service of SOLR?
>
> Ciao
>   Peter Schütt
>


solr scores remain the same for exact match and nearly exact match

2013-04-02 Thread amit

Below is my query
http://localhost:8983/solr/select/?q=subject:session management in
php&fq=category:[*%20TO%20*]&fl=category,score,subject

The result looks like this:




<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">983</int>
    <lst name="params">
      <str name="fq">category:[* TO *]</str>
      <str name="q">subject:session management in php</str>
      <str name="fl">category,score,subject</str>
    </lst>
  </lst>
  <result>
    <doc>
      <float name="score">0.8770298</float>
      <str name="category">Annapurnap</str>
      <str name="subject">session management in asp.net</str>
    </doc>
    <doc>
      <float name="score">0.8770298</float>
      <str name="category">Annapurnap</str>
      <str name="subject">session management in PHP</str>
    </doc>
  </result>
</response>


The question is: how come both have the same score when one is an exact
match and the other isn't?
This is the schema







--
View this message in context: 
http://lucene.472066.n3.nabble.com/solre-scores-remains-same-for-exact-match-and-nearly-exact-match-tp4053406.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Phonetic Search Highlight issue in search results

2013-04-02 Thread Soumyanayan Kar
Thanks a lot Erick for trying this out.

Will wait for a reply from your end.

Thanks & Regards,

Soumya.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 01 April 2013 05:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Phonetic Search Highlight issue in search results

Good question, you're causing me to think... about code I know very little
about.

So rather than spouting off, I tried it and... it works fine for me, either
with or without using fast vector highlighter on, admittedly, a very simple
test.

So I think I'd try peeling off all the extra stuff you've put into your
configs (sorry, I don't have time right now to try to reproduce) and get the
very simple case working, then build the rest back up and see where the
problem begins.

Sorry for the mis-direction!

Erick



On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar 
wrote:
> Hi Erick,
>
> Thanks for the reply. But help me understand this: if Solr is able to
> isolate the two documents which contain the term "fact", the phonetic
> equivalent of the search term "fakt", then why would it be unable to
> highlight the terms based on the same logic it uses to search the documents?
>
> Also, it is correctly highlighting the results in other searches which
> are also approximate searches and not exact ones, e.g. fuzzy or
> synonym search. In these cases too the highlights in the search
> results are far from the actual search term, but they are still getting
> correctly highlighted.
>
> Maybe I am getting it completely wrong but it looks like there is 
> something wrong with my implementation.
>
> Thanks & Regards,
>
> Soumya.
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 27 March 2013 06:07 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Phonetic Search Highlight issue in search results
>
> How would you expect it to highlight successfully? The term is "fakt", 
> there's nothing built in (and, indeed couldn't be) to un-phoneticize 
> it into "fact" and apply that to the Content field. The whole point of 
> phonetic processing is to do a lossy translation from the word into 
> some variant, losing precision all the way.
>
> So this behavior is unsurprising...
>
> Best
> Erick
>
>
>
>
> On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar 
> > wrote:
>
>> When we are issuing a query with Phonetic Search, it is returning the 
>> correct documents but not returning the highlights. When we use 
>> Stemming or Synonym searches we are getting the proper highlights.
>>
>>
>>
>> For example, when we execute a phonetic query for the term
>> fakt(ContentSearchPhonetic:fakt) in the Solr Admin interface, it 
>> returns two documents containing the term "fact"(phonetic token 
>> equivalent), but the list of highlights is empty as shown in the 
>> response below.
>>
>>
>>
>> 
>>
>> 
>>
>> 0
>>
>> 16
>>
>> 
>>
>>   ContentSearchPhonetic:fakt
>>
>>   xml
>>
>> 
>>
>>   
>>
>> 
>>
>> 
>>
>>   1
>>
>>   Doc 1
>>
>>   Anyway, this game was excellent and was 
>> well worth the time.  The graphics are truly amazing and the sound 
>> track was pretty pleasant also. The  preacher was in  fact a 
>> thief.
>>
>>   1430480998833848320
>>
>> 
>>
>> 
>>
>>   2
>>
>>   Doc 2
>>
>>   stunning. The  preacher was in  fact an 
>> excellent thief who  had stolen the original manuscript of Hamlet 
>> from an exhibit on the  Riviera, where  he also  acquired his 
>> remarkable and tan.
>>
>>   1430480998841188352
>>
>> 
>>
>>   
>>
>>   
>>
>> 
>>
>> 
>>
>>   
>>
>> 
>>
>>
>>
>> Relevant section of Solr schema:
>>
>>
>>
>> > required="true"/>
>>
>> > required="true"/>
>>
>>  stored="true"
>> required="true"/>
>>
>>
>>
>> > stored="false" multiValued="true"/>
>>
>> > stored="false" multiValued="true"/>
>>
>>  indexed="true"
>> stored="false" multiValued="true"/>
>>
>> > stored="false" multiValued="true"/>
>>
>>
>>
>> DocId
>>
>> 
>>
>> 
>>
>> 
>>
>> 
>>
>>
>>
>> 
>>
>>   
>>
>>  
>>
>>  
>>
>>   
>>
>> 
>>
>>
>>
>> 
>>
>>   
>>
>>  
>>  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
>>
>>   
>>
>> 
>>
>>
>>
>> 
>>
>> 
>>
>>   
>>
>>   > ignoreCase="true" expand="true"/>
>>
>> 
>>
>> 
>>
>>
>>
>> Relevant section of Solr config:
>>
>>
>>
>> 
>>
>> 
>>
>>  
>>
>>explicit
>>
>>100
>>
>>ContentSearch
>>
>>  true
>>
>> Content
>>
>> 150
>>
>>   40
>>
>>  
>>
>> 
>>
>> > name="highlight">
>>
>> 
>>
>> 
>>
>> 
>>
>> >
>> default="true"
>>
>> class="solr.highlight.GapFragmenter">
>>
>>   
>>
>> 100
>>
>>   
>>
>> 
>>
>>
>>
>> 
>>
>> >
>>   

Re: solr scores remain the same for exact match and nearly exact match

2013-04-02 Thread Gora Mohanty
On 3 April 2013 10:52, amit  wrote:
>
> Below is my query
> http://localhost:8983/solr/select/?q=subject:session management in
> php&fq=category:[*%20TO%20*]&fl=category,score,subject
[...]

Add debugQuery=on to your Solr URL, and you will get an
explanation of the score. Your subject field is tokenised, so there
is no a priori reason that an exact match should
score higher. Several strategies are available if you want that
behaviour. Try searching Google, e.g., for "solr exact match
higher score".

Regards,
Gora
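
One common form of the strategy Gora points at: copy the tokenized subject
into an untokenized sibling field and boost exact hits on it. A sketch; the
field name subject_exact is hypothetical, and a lowercased KeywordTokenizer
type would make the match case-insensitive:

  <field name="subject_exact" type="string" indexed="true" stored="false"/>
  <copyField source="subject" dest="subject_exact"/>

  q=subject:(session management in php) OR subject_exact:"session management in php"^5

The boosted clause matches only when the whole field value is identical, so
the truly exact document sorts above the near miss.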