Re: XLSB files not indexed

2013-10-21 Thread Roland Everaert
Hi Otis,

In our case, no exception is raised by Tika or Solr; a Lucene
document is created, but the content field contains only a few white spaces,
as with ODF files.
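
For reference, a minimal standalone check of what Tika alone extracts from an
XLSB file, outside of Solr, would look something like this (the file name is
just an example):

import java.io.File;

import org.apache.tika.Tika;

public class XlsbExtractCheck {
    public static void main(String[] args) throws Exception {
        // The Tika facade auto-detects the content type and runs the matching parser.
        Tika tika = new Tika();
        String text = tika.parseToString(new File("test.xlsb"));
        System.out.println("Extracted " + text.length() + " characters:");
        System.out.println(text);
    }
}

If this also prints only whitespace, the limitation is in Tika rather than in Solr.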


Roland.


On Sat, Oct 19, 2013 at 3:54 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Roland,
>
> It looks like:
> Tika - yes
> Solr - no?
>
> Based on http://search-lucene.com/?q=xlsb
>
> ODF != XLSB though, I think...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Oct 18, 2013 at 7:36 AM, Roland Everaert 
> wrote:
> > Hi,
> >
> > Can someone tell me if Tika is supposed to extract data from XLSB files
> > (the new MS Office format in binary form)?
> >
> > If so, then it seems that Solr is not able to index them, just as it is not
> > able to index ODF files (a JIRA issue is already open for ODF:
> > https://issues.apache.org/jira/browse/SOLR-4809).
> >
> > Can someone confirm the problem, or tell me what to do to make Solr work
> > with XLSB files?
> >
> >
> > Regards,
> >
> >
> > Roland.
>


RE: Facet performance

2013-10-21 Thread Toke Eskildsen
On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
> Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote:
> > Unfortunately the enum-solution is normally quite slow when there
> > are enough unique values to trigger the "too many > values"-exception.
> > [...]
> 
> [...] And yes, the fc method was terribly slow in a case where it did
> work.  Something like 20 minutes whereas enum returned within a few
> seconds.

Err.. What? That sounds _very_ strange. You have millions of unique
values so fc should be a lot faster than enum, not the other way around.

I assume the 20 minutes was for the first call. How fast do subsequent
calls return for fc?


Maybe you could provide some approximate numbers?

- Documents in your index
- Unique values in the CONTENT field
- Hits returned from a typical query
- Xmx

Regards,
Toke Eskildsen, State and University Library, Denmark



how to debug my own analyzer in solr

2013-10-21 Thread Mingzhu Gao
Dear Solr experts,

I would like to write my own analyzer (a Chinese analyzer) and integrate it
into Solr as a plugin.

From the log information, the custom analyzer can be loaded into Solr
successfully. I define my <fieldType> with this custom analyzer.

Now the problem is that when I try this analyzer from
http://localhost:8983/solr/#/collection1/analysis (click the analysis page,
choose my FieldType, then input some text) and click the "Analyse Value" button,
Solr hangs there; I cannot get any result or response even after a few minutes.

I also tried to add some data with curl
http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml", or
with "post.sh" in the exampledocs folder.
Same issue: Solr hangs there, no result and no response.

Can anybody give me some suggestions on how to debug Solr to work with my own
custom analyzer?

By the way, I wrote a Java program to call my custom analyzer and the result is
okay; for example, the following code works well.
==
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

Analyzer analyzer = new MyAnalyzer();

// tokenStream() takes a field name and the text to analyze
// (the field name and sample text here are just examples)
TokenStream ts = analyzer.tokenStream("content", new StringReader("some sample text"));

CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);

ts.reset();

while (ts.incrementToken()) {
    System.out.println(ta.toString());
}

ts.end();
ts.close();
=


Thanks,

-Mingz



Ordering Results

2013-10-21 Thread kumar
Hi,


I have a situation where, when a user searches for anything, we have to give
suggestions from exact matches as well as from fuzzy matches.

Suppose we are showing 15 suggestions.

First 10 results are exact match results.
And remaining 5 results from fuzzy matches.

Can anybody give me suggestions on how to achieve this?



Regards,
kumar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Ordering-Results-tp4096774.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to avoid recover? how to ensure a recover success?

2013-10-21 Thread sling
Hi, guys:

I have an online application with SolrCloud 4.1, but I get peer-sync errors
every 2 or 3 weeks...
In my opinion, a recovery occurs when a replica cannot sync data to its
leader successfully.

I have seen the topic
http://lucene.472066.n3.nabble.com/SolrCloud-5x-Errors-while-recovering-td4022542.html
and https://issues.apache.org/jira/i#browse/SOLR-4032, but why do I still
get similar errors in SolrCloud 4.1?

So, are there any settings for peer sync?
How can I reduce the probability of this error?
When a recovery happens, how can I ensure it succeeds?



The errors I got are like these:
[2013.10.21 10:39:13.482]2013-10-21 10:39:13,482 WARN
[org.apache.solr.handler.SnapPuller] - Error in fetching packets 
[2013.10.21 10:39:13.482]java.io.EOFException
[2013.10.21 10:39:13.482]   at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
[2013.10.21 10:39:13.482]   at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
[2013.10.21 10:39:13.482]   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1136)
[2013.10.21 10:39:13.482]   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1099)
[2013.10.21 10:39:13.482]   at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:738)
[2013.10.21 10:39:13.482]   at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:395)
[2013.10.21 10:39:13.482]   at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:274)
[2013.10.21 10:39:13.482]   at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:153)
[2013.10.21 10:39:13.482]   at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:39:13.482]   at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:39:13.485]2013-10-21 10:39:13,485 WARN
[org.apache.solr.handler.SnapPuller] - Error in fetching packets 
[2013.10.21 10:39:13.485]java.io.EOFException
[2013.10.21 10:39:13.485]   at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
[2013.10.21 10:39:13.485]   at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
[2013.10.21 10:39:13.485]   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1136)
[2013.10.21 10:39:13.485]   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1099)
[2013.10.21 10:39:13.485]   at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:738)
[2013.10.21 10:39:13.485]   at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:395)
[2013.10.21 10:39:13.485]   at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:274)
[2013.10.21 10:39:13.485]   at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:153)
[2013.10.21 10:39:13.485]   at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:39:13.485]   at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:41:08.461]2013-10-21 10:41:08,461 ERROR
[org.apache.solr.handler.ReplicationHandler] - SnapPull failed
:org.apache.solr.common.SolrException: Unable to download
_fi05_Lucene41_0.pos completely. Downloaded 0!=1485
[2013.10.21 10:41:08.461]   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1230)
[2013.10.21 10:41:08.461]   at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1110)
[2013.10.21 10:41:08.461]   at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:738)
[2013.10.21 10:41:08.461]   at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:395)
[2013.10.21 10:41:08.461]   at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:274)
[2013.10.21 10:41:08.461]   at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:153)
[2013.10.21 10:41:08.461]   at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:41:08.461]   at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:41:08.461]
[2013.10.21 10:41:08.461]2013-10-21 10:41:08,461 ERROR
[org.apache.solr.cloud.RecoveryStrategy] - Error while trying to
recover:org.apache.solr.common.SolrException: Replication for recovery
failed.
[2013.10.21 10:41:08.461]   at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
[2013.10.21 10:41:08.461]   at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
[2013.10.21 10:41:08.461]   at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
[2013.10.21 10:41:08.461]
[2013.10.21 10:41:08.555]2013-10-21 10:41:08,462 ERROR
[org.apache.solr.handler.Repli

Re: Solr timeout after reboot

2013-10-21 Thread michael.boom
Thank you, Otis!

I've integrated the SPM on my Solr instances and now I have access to
monitoring data.
Could you give me some hints on which metrics I should watch?

Below I've added my query configs. Is there anything I could tweak here?


1024




   




  

true

   20

   100


  

  active:true

  


false

10

  



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096780.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solrconfig.xml carrot2 params

2013-10-21 Thread Stanislaw Osinski
> Thanks, I'm new to the clustering libraries.  I finally made this
> connection when I started browsing through the carrot2 source.  I had
> pulled down a smaller MM document collection from our test environment.  It
> was not ideal as it was mostly structured, but small.  I foolishly thought
> I could cluster on the text copy field before realizing that it was index
> only.  Doh!
>

That is correct -- for the time being the clustering can only be applied to
stored Solr fields.



> Our documents are indexed in SolrCloud, but stored in HBase.  I want to
> allow users to page through Solr hits, but would like to cluster on all (or
> at least several thousand) of the top search hits.  Now I'm puzzling over
> how to efficiently cluster over possibly several thousand Solr hits when
> the documents are in HBase.  I thought an HBase coprocessor, but carrot2
> isn't designed for distributed computation.  Mahout, in the Hadoop M/R
> context, seems slow and heavy handed for this scale; maybe, I just need to
> dig deeper into their library.  Or I could just be missing something
> fundamental?  :)
>

Carrot2 algorithms were not designed to be distributed, but you can still
use them in a single-threaded scenario. To do this, you'd probably need to
write a bit of code that gets the text of your documents from your HBase
and runs Carrot2 clustering on it. If you use the STC clustering algorithm,
you should be able to process several thousands of documents in a
reasonable time (order of seconds). The clustering side of the code should
be a matter of a few lines of code (
http://download.carrot2.org/stable/javadoc/overview-summary.html#clustering-documents).
The tricky bit of the setup may be efficiently getting the text for
clustering -- it can happen that fetching can take longer than the actual
clustering.
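
A rough sketch of that setup (the HBase fetch below is only a placeholder to
keep the example self-contained; you would replace it with your own lookup of
the top N Solr hit ids):

import java.util.ArrayList;
import java.util.List;

import org.carrot2.clustering.stc.STCClusteringAlgorithm;
import org.carrot2.core.Cluster;
import org.carrot2.core.Controller;
import org.carrot2.core.ControllerFactory;
import org.carrot2.core.Document;
import org.carrot2.core.ProcessingResult;

public class HitClusteringSketch {
    public static void main(String[] args) {
        // Placeholder for "fetch (title, body) of the top hits from HBase".
        List<Document> docs = new ArrayList<Document>();
        docs.add(new Document("Example title 1", "body text of the first hit"));
        docs.add(new Document("Example title 2", "body text of the second hit"));

        // Simple in-process controller; create once and reuse across requests.
        Controller controller = ControllerFactory.createSimple();
        ProcessingResult result =
            controller.process(docs, "user query", STCClusteringAlgorithm.class);

        for (Cluster cluster : result.getClusters()) {
            System.out.println(cluster.getLabel()
                + " (" + cluster.getAllDocuments().size() + " docs)");
        }
    }
}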

S.


Re: how to debug my own analyzer in solr

2013-10-21 Thread Mingzhu Gao
More information about this: the custom analyzer just implements
"createComponents" of Analyzer.

And my configuration in schema.xml is just something like:


 



From the log I cannot see any error information; however, when I want to
analyze or add document data, it always hangs there.

Any way to debug or narrow down the problem ?

Thanks in advance .

-Mingz

On 10/21/13 4:35 PM, "Mingzhu Gao"  wrote:

>Dear solr expert ,
>
>I would like to write my own analyser ( Chinese analyser ) and integrate
>them into solr as solr plugin .
>
>From the log information , the custom analyzer can be loaded into solr
>successfully .  I define my  with this custom analyzer.
>
>Now the problem is that ,  when I try this analyzer from
>http://localhost:8983/solr/#/collection1/analysis , click the analysis ,
>then choose my FieldType , then input some text .
>After I click "Analyse Value" button , the solr hang there , I cannot get
>any result or response in a few minutes.
>
>I also try to add  some data by "curl
>http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml"
>, or by "post.sh" in exampledocs folder ,
>The same issue , the solr hang there , no result and not response .
>
>Can anybody give me some suggestions on how to debug solr to work with my
>own custom analyzer ?
>
>By the way , I write a java program to call my custom analyzer , the
>result is okay , for example , the following code can work well .
>==
>Analyzer analyzer = new MyAnalyzer() ;
>
>TokenStream ts = analyzer.tokenStream() ;
>
>CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
>
>ts.reset();
>
>while (ts.incrementToken()){
>
>System.out.println(ta.toString());
>
>}
>
>=
>
>
>Thanks,
>
>-Mingz
>



Error: Repeated service interruptions - failure processing document: Read timed out

2013-10-21 Thread Ronny Heylen
Hi,

I just installed Solr, and when running a job I get the following problem:


Error: Repeated service interruptions - failure processing document: Read
timed out


Like I said, I just installed Solr, so I am very new to the topic. (On Windows
2008 R2.)

SOLR 4.4

Tomcat 7.0.42

ManifoldCF 1.3

Postgresql 9.1.1

In the Tomcat log I find the following error:

ERROR - 2013-10-21 09:35:16.551; org.apache.solr.common.SolrException;
null:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
Processing of multipart/form-data request failed. null

at
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)

at
org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)

at
org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:492)

at
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:626)

at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:143)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:342)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)

at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)

at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)

at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)

at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)

at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)

at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)

at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)

at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)

at
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

at java.lang.Thread.run(Unknown Source)

Caused by: java.net.SocketTimeoutException

at
org.apache.coyote.http11.InternalAprInputBuffer.fill(InternalAprInputBuffer.java:607)

at
org.apache.coyote.http11.InternalAprInputBuffer$SocketInputBuffer.doRead(InternalAprInputBuffer.java:642)

at
org.apache.coyote.http11.filters.ChunkedInputFilter.readBytes(ChunkedInputFilter.java:275)

at
org.apache.coyote.http11.filters.ChunkedInputFilter.parseCRLF(ChunkedInputFilter.java:377)

at
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:147)

at
org.apache.coyote.http11.InternalAprInputBuffer.doRead(InternalAprInputBuffer.java:534)

at org.apache.coyote.Request.doRead(Request.java:422)

at
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:290)

at
org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:449)

at
org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:315)

at
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:200)

at java.io.FilterInputStream.read(Unknown Source)

at
org.apache.commons.fileupload.util.LimitedInputStream.read(LimitedInputStream.java:125)

at
org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)

at
org.apache.commons.fileupload.MultipartStream$ItemInputStream.read(MultipartStream.java:887)

at java.io.InputStream.read(Unknown Source)

at
org.apache.commons.fileupload.util.Streams.copy(Streams.java:94)

at
org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)

at
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)

... 21 more


Re: how to debug my own analyzer in solr

2013-10-21 Thread Siegfried Goeschl

Thread Dump and/or Remote Debugging?!
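
For example, a thread dump of the hanging JVM can be taken with

jstack <pid> > threaddump.txt

(or kill -3 <pid> on Unix, which writes the dump to the servlet container's
stdout log); the dump shows where the analysis request is stuck.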

Cheers,

Siegfried Goeschl

On 21.10.13 11:58, Mingzhu Gao wrote:

More information about this , the custom analyzer just implement
"createComponents" of Analyzer.

And my configure in schema.xml is just something like :


  



 From the log I cannot see any error information , however , when I want to
analysis or add document data , it always hang there .

Any way to debug or narrow down the problem ?

Thanks in advance .

-Mingz

On 10/21/13 4:35 PM, "Mingzhu Gao"  wrote:


Dear solr expert ,

I would like to write my own analyser ( Chinese analyser ) and integrate
them into solr as solr plugin .

From the log information , the custom analyzer can be loaded into solr
successfully .  I define my  with this custom analyzer.

Now the problem is that ,  when I try this analyzer from
http://localhost:8983/solr/#/collection1/analysis , click the analysis ,
then choose my FieldType , then input some text .
After I click "Analyse Value" button , the solr hang there , I cannot get
any result or response in a few minutes.

I also try to add  some data by "curl
http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml"
, or by "post.sh" in exampledocs folder ,
The same issue , the solr hang there , no result and not response .

Can anybody give me some suggestions on how to debug solr to work with my
own custom analyzer ?

By the way , I write a java program to call my custom analyzer , the
result is okay , for example , the following code can work well .
==
Analyzer analyzer = new MyAnalyzer() ;

TokenStream ts = analyzer.tokenStream() ;

CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);

ts.reset();

while (ts.incrementToken()){

System.out.println(ta.toString());

}

=


Thanks,

-Mingz







Re: how to debug my own analyzer in solr

2013-10-21 Thread Koji Sekiguchi

Hi Mingz,

If you use Eclipse, you can debug Solr with your plugin like this:

# go to Solr install directory
$ cd $SOLR
$ ant run-example -Dexample.debug=true

Then connect the JVM from Eclipse via remote debug port 5005.
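
If you are running the example Jetty from a binary distribution rather than
from ant, you can pass the same remote-debug options yourself (a sketch; the
standard JDWP flags), e.g.:

java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar start.jar

For a solr.war deployed in Tomcat, add the same options to CATALINA_OPTS, then
attach Eclipse to port 5005 in the same way.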

Good luck!

koji


(13/10/21 18:58), Mingzhu Gao wrote:

More information about this , the custom analyzer just implement
"createComponents" of Analyzer.

And my configure in schema.xml is just something like :


  




From the log I cannot see any error information , however , when I want to

analysis or add document data , it always hang there .

Any way to debug or narrow down the problem ?

Thanks in advance .

-Mingz

On 10/21/13 4:35 PM, "Mingzhu Gao"  wrote:


Dear solr expert ,

I would like to write my own analyser ( Chinese analyser ) and integrate
them into solr as solr plugin .

From the log information , the custom analyzer can be loaded into solr
successfully .  I define my  with this custom analyzer.

Now the problem is that ,  when I try this analyzer from
http://localhost:8983/solr/#/collection1/analysis , click the analysis ,
then choose my FieldType , then input some text .
After I click "Analyse Value" button , the solr hang there , I cannot get
any result or response in a few minutes.

I also try to add  some data by "curl
http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml"
, or by "post.sh" in exampledocs folder ,
The same issue , the solr hang there , no result and not response .

Can anybody give me some suggestions on how to debug solr to work with my
own custom analyzer ?

By the way , I write a java program to call my custom analyzer , the
result is okay , for example , the following code can work well .
==
Analyzer analyzer = new MyAnalyzer() ;

TokenStream ts = analyzer.tokenStream() ;

CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);

ts.reset();

while (ts.incrementToken()){

System.out.println(ta.toString());

}

=


Thanks,

-Mingz







--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html


Re: Ordering Results

2013-10-21 Thread Upayavira
Do two searches.

Why do you want to do this though? It seems a bit strange. Presumably
your users want the best matches possible whether exact or fuzzy?
Wouldn't it be best to return both exact and fuzzy matches, but score
the exact ones above the fuzzy ones?
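
For example (field name and terms are only illustrative), a single query such as

q=title:"okkadu telugu movie"^10 OR (title:okkadu~1 title:telugu~1 title:movie~1)

boosts the exact form, so exact matches sort to the top while fuzzy matches
still fill out the remaining suggestions.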

Upayavira

On Mon, Oct 21, 2013, at 09:56 AM, kumar wrote:
> Hi,
> 
> 
> I have a situation that if user looking for anything first it has to give
> the suggestions from the exact match and as well as the fuzzy matches.
> 
> Suppose we are showing 15 suggestions.
> 
> First 10 results are exact match results.
> And remaining 5 results from fuzzy matches.
> 
> Can anybody give me suggestions how to achieve this task.
> 
> 
> 
> Regards,
> kumar
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Ordering-Results-tp4096774.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Performance Issue

2013-10-21 Thread Erick Erickson
Shamik:

You're right, the use of NOW shouldn't be making that much of a difference
between versions. FYI, though, here's a way to use NOW and re-use fq
clauses:

http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
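
The short version (field name made up): fq=timestamp:[NOW-7DAYS TO NOW] changes
every millisecond and so never hits the filter cache, whereas a rounded form like
fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY+1DAY] produces the same fq string on every
request and can be reused from the cache.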

It may well be this setting:


1000


Every second (assuming you're indexing), you're throwing away all your
top-level caches and executing any autowarm queries etc. And if you _don't_
have any autowarming queries, you may not be filling caches, an expensive
process. Try lengthening that out to, say, a minute (60000) or even longer
and see if that makes a difference. If that's the culprit, you at least
have a place to start.
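
For reference, a static warming query is registered in solrconfig.xml roughly
like this (the query itself is only an example):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="fq">active:true</str></lst>
  </arr>
</listener>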

If that's not it, it's also possible you're seeing decompression.

How many documents are you returning and how big are they? There are some
anecdotal comments that the default stored-field decompression for either a
large number of docs or very large docs may be playing a role here. Try
setting fl=id (don't return any stored fields). If that is faster, this
might be your problem.

queryResultCache is often not very high re: hit ratio. It's usually used
for paging, so if your users aren't hitting the "next" page you may not hit
many.

Best,
Erick


On Sat, Oct 19, 2013 at 4:12 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> What happens if you have just 1 shard - no distributed search, like
> before? SPM for Solr or any other monitoring tool that captures OS and
> Solr metrics should help you find the source of the problem faster.
> Is disk IO the same? utilization of caches? JVM version, heap, etc.?
> CPU usage? network?  I'd look at each of these things side by side and
> look for big differences.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> SOLR Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Oct 18, 2013 at 1:38 AM, shamik  wrote:
> > I tried commenting out NOW in bq, but didn't make any difference in the
> > performance. I do see minor entry in the queryfiltercache rate which is a
> > meager 0.02.
> >
> > I'm really struggling to figure out the bottleneck, any known pain
> points I
> > should be checking ?
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Issue-tp4095971p4096277.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: caching HTML pages in SOLR

2013-10-21 Thread Furkan KAMACI
You can also try: https://www.varnish-cache.org/


2013/10/21 Alexandre Rafalovitch 

> I have not used it myself, but perhaps something like
> http://www.crawl-anywhere.com/ is along what you were looking for.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Mon, Oct 21, 2013 at 1:44 PM, Shailendra Mudgal <
> mudgal.shailen...@gmail.com> wrote:
>
> > Thanks Alex.
> >
> > I was thinking if something already exists of this sort.
> >
> >
> >
> >
> > On Mon, Oct 21, 2013 at 12:05 PM, Alexandre Rafalovitch
> > wrote:
> >
> > > Not in Solr itself, no. Solr is all about Search. Caching (and
> rewriting
> > > resource links, etc) should probably be part of whatever does the
> > document
> > > fetching.
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> > >
> > >
> > > On Mon, Oct 21, 2013 at 1:19 PM, Shailendra Mudgal <
> > > mudgal.shailen...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > As google stores HTML pages as "*cached*" documents, is there a
> similar
> > > > provision in SOLR. I am using SOLR-4.4.0.
> > > >
> > > >
> > > > Thanks,
> > > > Shailendra
> > > >
> > >
> >
>


Re: ExtractRequestHandler, skipping errors

2013-10-21 Thread Jan Høydahl
Guido, can you point us to the Commons-Compress JIRA issue which reports your 
particular problem? Perhaps uncompress works just fine?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18 Oct 2013, at 14:48, Guido Medina wrote:

> Don't; commons-compress 1.5 is broken, either use 1.4.1 or later. Our app
> stopped compressing properly after a Maven update.
> 
> Guido.
> 
> On 18/10/13 12:40, Roland Everaert wrote:
>> I will open a JIRA issue, I suppose that I just have to create an account
>> first?
>> 
>> 
>> Regards,
>> 
>> 
>> Roland.
>> 
>> 
>> On Fri, Oct 18, 2013 at 12:05 PM, Koji Sekiguchi  wrote:
>> 
>>> Hi,
>>> 
>>> I think the flag cannot ignore NoSuchMethodError. There may be something
>>> wrong here?
>>> 
>>> ... I've just checked my Solr 4.5 directories and I found Tika version is
>>> 1.4.
>>> 
>>> Tika 1.4 seems to use commons compress 1.5:
>>> 
>>> http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup
>>> 
>>> But I see commons-compress-1.4.1.jar in solr/contrib/extraction/lib/
>>> directory.
>>> 
>>> Can you open a JIRA issue?
>>> 
>>> For now, you can get commons compress 1.5 and put it to the directory
>>> (don't forget to remove 1.4.1 jar file).
>>> 
>>> koji
>>> 
>>> 
>>> (13/10/18 16:37), Roland Everaert wrote:
>>> 
 Hi,
 
 We already configured the ExtractRequestHandler to ignore Tika exceptions,
 but it is Solr that complains. The customer managed to reproduce the
 problem. Following is the error from solr.log. The file type that caused this
 exception was WMZ. It seems that something is missing in a Solr class. We
 use Solr 4.4.
 
 ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
 null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
 org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
  at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.NoSuchMethodError:
 org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
  at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:24

Question about docvalues

2013-10-21 Thread yriveiro
Hi,

Suppose I have a field (named dv_field) configured to be indexed, stored, and
with docValues=true.

How do I know that when I do a query like

q=*:*&facet=true&facet.field=dv_field

I am really using the docValues and not the normal way?

Is it necessary to duplicate the field, setting indexed and stored to false and
leaving the docValues property set to true?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr timeout after reboot

2013-10-21 Thread Peter Keegan
Have you tried this old trick to warm the FS cache?
cat ...//data/index/* >/dev/null

Peter


On Mon, Oct 21, 2013 at 5:31 AM, michael.boom  wrote:

> Thank you, Otis!
>
> I've integrated the SPM on my Solr instances and now I have access to
> monitoring data.
> Could you give me some hints on which metrics should I watch?
>
> Below I've added my query configs. Is there anything I could tweak here?
>
> 
> 1024
>
>   size="1000"
>  initialSize="1000"
>  autowarmCount="0"/>
>
>   size="1000"
>  initialSize="1000"
>  autowarmCount="0"/>
>
> size="1000"
>initialSize="1000"
>autowarmCount="0"/>
>
>
>  size="1000"
> initialSize="1000"
> autowarmCount="0" />
>
>
> true
>
>20
>
>100
>
> 
>   
> 
>   active:true
> 
>   
> 
>
> false
>
> 10
>
>   
>
>
>
> -
> Thanks,
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096780.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr timeout after reboot

2013-10-21 Thread michael.boom
Hmm, no, I haven't...

What would be the effect of this ?



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096809.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr timeout after reboot

2013-10-21 Thread François Schiettecatte
To put the file data into file system cache which would make for faster access.

François


On Oct 21, 2013, at 8:33 AM, michael.boom  wrote:

> Hmm, no, I haven't...
> 
> What would be the effect of this ?
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096809.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Exact Match Results

2013-10-21 Thread kumar
I am querying Solr for exact-match results, but it is showing some other
results also.

Example:

User Query String : 

Okkadu telugu movie

Results :

1.Okkadu telugu movie
2.Okkadunnadu telugu movie
3.YuganikiOkkadu telugu movie
4.Okkadu telugu movie stills


How can we order these results so that the 4th result comes second?


Can anyone please give me any ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exact Match Results

2013-10-21 Thread François Schiettecatte
Kumar

You might want to look into the 'pf' parameter:


https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
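
For example (assuming the suggestions come from a 'title' field), something like

defType=edismax&q=Okkadu telugu movie&qf=title&pf=title^10

boosts documents that contain the whole query as a phrase, which should push
"Okkadu telugu movie stills" above the partial matches.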

François

On Oct 21, 2013, at 9:24 AM, kumar  wrote:

> I am querying solr for exact match results. But it is showing some other
> results also.
> 
> Examle :
> 
> User Query String : 
> 
> Okkadu telugu movie
> 
> Results :
> 
> 1.Okkadu telugu movie
> 2.Okkadunnadu telugu movie
> 3.YuganikiOkkadu telugu movie
> 4.Okkadu telugu movie stills
> 
> 
> how can we order these results that 4th result has to come second.
> 
> 
> Please anyone can you give me any idea?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Class name of parsing the fq clause

2013-10-21 Thread Jack Krupansky
Start with org.apache.solr.handler.component.QueryComponent#prepare which 
fetches the fq parameters and indirectly invokes the query parser(s):


String[] fqs = req.getParams().getParams(CommonParams.FQ);
if (fqs!=null && fqs.length!=0) {
  List<Query> filters = rb.getFilters();
  // if filters already exists, make a copy instead of modifying the original
  filters = filters == null ? new ArrayList<Query>(fqs.length) : new ArrayList<Query>(filters);

  for (String fq : fqs) {
if (fq != null && fq.trim().length()!=0) {
  QParser fqp = QParser.getParser(fq, null, req);
  filters.add(fqp.getQuery());
}
  }
  // only set the filters if they are not empty otherwise
  // fq=&someotherParam= will trigger all docs filter for every request
  // if filter cache is disabled
  if (!filters.isEmpty()) {
rb.setFilters( filters );

Note that this line actually invokes the parser:

  filters.add(fqp.getQuery());

Then in org.apache.lucene.search.Query.QParser#getParser:

QParserPlugin qplug = req.getCore().getQueryPlugin(parserName);
QParser parser =  qplug.createParser(qstr, localParams, req.getParams(), 
req);


And for the common case of the Lucene query parser, 
org.apache.solr.search.LuceneQParserPlugin#createParser:


public QParser createParser(String qstr, SolrParams localParams, SolrParams 
params, SolrQueryRequest req) {

 return new LuceneQParser(qstr, localParams, params, req);
}

And then in org.apache.lucene.search.Query.QParser#getQuery:

public Query getQuery() throws SyntaxError {
 if (query==null) {
   query=parse();

And then in org.apache.lucene.search.Query.LuceneQParser#parse:

lparser = new SolrQueryParser(this, defaultField);

lparser.setDefaultOperator
 (QueryParsing.getQueryParserDefaultOperator(getReq().getSchema(),
 getParam(QueryParsing.OP)));

return lparser.parse(qstr);

And then in org.apache.solr.parser.SolrQueryParserBase#parse:

Query res = TopLevelQuery(null);  // pass null so we can tell later if an 
explicit field was provided or not


And then in org.apache.solr.parser.QueryParser#TopLevelQuery, the parsing 
begins.


And org.apache.solr.parser.QueryParser.jj is the grammar for a basic
Solr/Lucene query, org.apache.solr.parser.QueryParser.java is generated
from it by JavaCC, and a lot of the logic is in the base class of the generated
class, org.apache.solr.parser.SolrQueryParserBase.java.


Good luck! Happy hunting!

-- Jack Krupansky

-Original Message- 
From: YouPeng Yang

Sent: Monday, October 21, 2013 2:57 AM
To: solr-user@lucene.apache.org
Subject: Class name of parsing the fq clause

Hi
  I search the solr with fq clause,which is like:
  fq=BEGINTIME:[2013-08-25T16:00:00Z TO *] AND BUSID:(M3 OR M9)


  I am curious about the parsing process . I want to study it.
  What is the Java file name describes  the parsing  process of the fq
clause.


 Thanks

Regards. 



Re: Solr timeout after reboot

2013-10-21 Thread Peter Keegan
I found this warming to be especially necessary after starting an instance
of those m3.xlarge servers; otherwise the response times for the first minutes
were terrible.

Peter


On Mon, Oct 21, 2013 at 8:39 AM, François Schiettecatte <
fschietteca...@gmail.com> wrote:

> To put the file data into file system cache which would make for faster
> access.
>
> François
>
>
> On Oct 21, 2013, at 8:33 AM, michael.boom  wrote:
>
> > Hmm, no, I haven't...
> >
> > What would be the effect of this ?
> >
> >
> >
> > -
> > Thanks,
> > Michael
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096809.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Solr timeout after reboot

2013-10-21 Thread michael.boom
I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G,
so I guess running the above command would eat up all available
memory.



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096827.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to debug my own analyzer in solr

2013-10-21 Thread Mingzhu Gao
Koji, thank you for the reply.

I am just using the solr.war binary; I will get the Solr source
and give it a try.

-Mingz

On 10/21/13 6:21 PM, "Koji Sekiguchi"  wrote:

>Hi Mingz,
>
>If you use Eclipse, you can debug Solr with your plugin like this:
>
># go to Solr install directory
>$ cd $SOLR
>$ ant run-example -Dexample.debug=true
>
>Then connect the JVM from Eclipse via remote debug port 5005.
>
>Good luck!
>
>koji
>
>
>(13/10/21 18:58), Mingzhu Gao wrote:
>> More information about this , the custom analyzer just implement
>> "createComponents" of Analyzer.
>>
>> And my configure in schema.xml is just something like :
>>
>> 
>>   
>> 
>>
>>
>>>From the log I cannot see any error information , however , when I want
>>>to
>> analysis or add document data , it always hang there .
>>
>> Any way to debug or narrow down the problem ?
>>
>> Thanks in advance .
>>
>> -Mingz
>>
>> On 10/21/13 4:35 PM, "Mingzhu Gao"  wrote:
>>
>>> Dear solr expert ,
>>>
>>> I would like to write my own analyser ( Chinese analyser ) and
>>>integrate
>>> them into solr as solr plugin .
>>>
>>>From the log information , the custom analyzer can be loaded into solr
>>> successfully .  I define my  with this custom analyzer.
>>>
>>> Now the problem is that ,  when I try this analyzer from
>>> http://localhost:8983/solr/#/collection1/analysis , click the analysis
>>>,
>>> then choose my FieldType , then input some text .
>>> After I click "Analyse Value" button , the solr hang there , I cannot
>>>get
>>> any result or response in a few minutes.
>>>
>>> I also try to add  some data by "curl
>>> http://localhost:8983/solr/update?commit=true -H "Content-Type:
>>>text/xml"
>>> , or by "post.sh" in exampledocs folder ,
>>> The same issue , the solr hang there , no result and not response .
>>>
>>> Can anybody give me some suggestions on how to debug solr to work with
>>>my
>>> own custom analyzer ?
>>>
>>> By the way , I write a java program to call my custom analyzer , the
>>> result is okay , for example , the following code can work well .
>>> ==
>>> Analyzer analyzer = new MyAnalyzer() ;
>>>
>>> TokenStream ts = analyzer.tokenStream() ;
>>>
>>> CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
>>>
>>> ts.reset();
>>>
>>> while (ts.incrementToken()){
>>>
>>> System.out.println(ta.toString());
>>>
>>> }
>>>
>>> =
>>>
>>>
>>> Thanks,
>>>
>>> -Mingz
>>>
>>
>>
>
>
>-- 
>http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wik
>ipedia.html



Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")

2013-10-21 Thread Stavros Delisavas
Okay, I emptied the stopword file. I don't know where the word list came
from. I have never seen it and never touched that file. Anyway...
Now my queries do work with one word, like "in" or "to" but the queries
still do not work when I use more than one stopword within one query.
Instead of too many results I now get NO results at all.

What could be the problem?



On 17.10.2013 15:02, Jack Krupansky wrote:
> The default Solr stopwords.txt file is empty, so SOMEBODY created that
> non-empty stop words file.
> 
> The StopFilterFactory token filter in the field type analyzer controls
> stop word processing. You can remove that step entirely, or different
> field types can reference different stop word files, or some field type
> analyzers can use the stop filter and some would not have it. This does
> mean that you would have to use different field types for fields that
> want different stop word processing.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Stavros Delisavas
> Sent: Thursday, October 17, 2013 3:27 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Local Solr and Webserver-Solr act differently ("and"
> treated like "or")
> 
> Thank you,
> I found the file with the stopwords and noticed that my local file is
> empty (comments only) and the one on my webserver has a big list of
> english stopwords. That seems to be the problem.
> 
> I think in general it is a good idea to use stopwords for random
> searches, but it is not usefull in my special case. Is there a way to
> (de)activate stopwords query-wise? Like I would like to ignore stopwords
> when searching in titles but I would like to use stopwords when users do
> a fulltext-search on whole articles, etc.
> 
> Thanks again,
> Stavros
> 
> 
> On 17.10.2013 09:13, Upayavira wrote:
>> Stopwords are small words such as "and", "the" or "is",that we might
>> choose to exclude from our documents and queries because they are such
>> common terms. Once you have stripped stop words from your above query,
>> all that is left is the word "wild", or so is being suggested.
>>
>> Somewhere in your config, close to solr config.xml, you will find a file
>> called something like stopwords.txt. Compare these files between your
>> two systems.
>>
>> Upayavira
>>
>> On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote:
>>> Unfortunatly, I don't really know what stopwords are. I would like it to
>>> not ignore any words of my query.
>>> How/Where can I change this stopwords-behaviour?
>>>
>>>
>>> On 16.10.2013, 23:45, Jack Krupansky wrote:
 So, the stopwords.txt file is different between the two systems - the
 first has stop words but the second does not. Did you expect stop
 words to be removed, or not?

 -- Jack Krupansky

 -Original Message- From: Stavros Delsiavas
 Sent: Wednesday, October 16, 2013 5:02 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Local Solr and Webserver-Solr act differently ("and"
 treated like "or")

 Okay I understand,

 here's the rawquerystring. It was at about line 3000:

 
  title:(into AND the AND wild*)
  title:(into AND the AND wild*)
  +title:wild*
  +title:wild*

 At this place the debug output DOES differ from the one on my local
 system. But I don't understand why...
 This is the local debug output:

 
   title:(into AND the AND wild*)
   title:(into AND the AND wild*)
   +title:into +title:the +title:wild*
   +title:into +title:the
 +title:wild*

 Why is that? Any ideas?




 On 16.10.2013, 21:03, Shawn Heisey wrote:
> On 10/16/2013 4:46 AM, Stavros Delisavas wrote:
>> My local solr gives me:
>> http://pastebin.com/Q6d9dFmZ
>>
>> and my webserver this:
>> http://pastebin.com/q87WEjVA
>>
>> I copied only the first few hundret lines (of more than 8000) because
>> the webserver output was to big even for pastebin.
>>
>>
>>
>> On 16.10.2013 12:27, Erik Hatcher wrote:
>>> What does the debug output say from debugQuery=true say between the
>>> two?
> What's really needed here is the first part of the  section,
> which has rawquerystring, querystring, parsedquery, and
> parsedquery_toString.  The info from your local solr has this part,
> but
> what you pasted from the webserver one didn't include those parts,
> because it's further down than the first few hundred lines.
>
> Thanks,
> Shawn
>
> 



Re: Solr timeout after reboot

2013-10-21 Thread François Schiettecatte
Well no, the OS is smarter than that, it manages file system cache along with 
other memory requirements. If applications need more memory then file system 
cache will likely be reduced. 

The command is a cheap trick to get the OS to fill the file system cache as 
quickly as possible, not sure how much it will help though with a 100GB index 
on a 15GB machine. This might work if you 'cat' the index files other than the 
'.fdx' and '.fdt' files.
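
Something along these lines (the path is illustrative):

find /path/to/solr/data/index -type f ! -name '*.fdt' ! -name '*.fdx' -exec cat {} + > /dev/null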

François

On Oct 21, 2013, at 10:03 AM, michael.boom  wrote:

> I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G,
> so I guess putting running the above command would bite all available
> memory.
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096827.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Class name of parsing the fq clause

2013-10-21 Thread YouPeng Yang
HI Jack

  Thanks a lot for your explanation.


2013/10/21 Jack Krupansky 

> Start with org.apache.solr.handler.component.QueryComponent#prepare
> which fetches the fq parameters and indirectly invokes the query parser(s):
>
> String[] fqs = req.getParams().getParams(CommonParams.FQ);
> if (fqs!=null && fqs.length!=0) {
>   List<Query> filters = rb.getFilters();
>   // if filters already exists, make a copy instead of modifying the original
>   filters = filters == null ? new ArrayList<Query>(fqs.length) : new ArrayList<Query>(filters);
>   for (String fq : fqs) {
> if (fq != null && fq.trim().length()!=0) {
>   QParser fqp = QParser.getParser(fq, null, req);
>   filters.add(fqp.getQuery());
> }
>   }
>   // only set the filters if they are not empty otherwise
>   // fq=&someotherParam= will trigger all docs filter for every request
>   // if filter cache is disabled
>   if (!filters.isEmpty()) {
> rb.setFilters( filters );
>
> Note that this line actually invokes the parser:
>
>   filters.add(fqp.getQuery());
>
> Then in org.apache.lucene.search.Query.QParser#getParser:
>
> QParserPlugin qplug = req.getCore().getQueryPlugin(parserName);
> QParser parser =  qplug.createParser(qstr, localParams, req.getParams(),
> req);
>
> And for the common case of the Lucene query parser,
> org.apache.solr.search.LuceneQParserPlugin#createParser:
>
> public QParser createParser(String qstr, SolrParams localParams,
> SolrParams params, SolrQueryRequest req) {
>  return new LuceneQParser(qstr, localParams, params, req);
> }
>
> And then in org.apache.lucene.search.Query.QParser#getQuery:
>
> public Query getQuery() throws SyntaxError {
>  if (query==null) {
>    query=parse();
>
> And then in org.apache.lucene.search.Query.LuceneQParser#parse:
>
> lparser = new SolrQueryParser(this, defaultField);
>
> lparser.setDefaultOperator
>  (QueryParsing.getQueryParserDefaultOperator(getReq().getSchema(),
>  getParam(QueryParsing.OP)));
>
> return lparser.parse(qstr);
>
> And then in org.apache.solr.parser.SolrQueryParserBase#parse:
>
> Query res = TopLevelQuery(null);  // pass null so we can tell later if an
> explicit field was provided or not
>
> And then in org.apache.solr.parser.QueryParser#TopLevelQuery, the
> parsing begins.
>
> And org.apache.solr.parser.QueryParser.jj is the grammar for a basic
> Solr/Lucene query, org.apache.solr.parser.QueryParser.java is generated
> from it by JavaCC, and a lot of the logic is in the base class of the
> generated class, org.apache.solr.parser.SolrQueryParserBase.java.
>
> Good luck! Happy hunting!
>
> -- Jack Krupansky
>
> -Original Message- From: YouPeng Yang
> Sent: Monday, October 21, 2013 2:57 AM
> To: solr-user@lucene.apache.org
> Subject: Class name of parsing the fq clause
>
>
> Hi
>   I search the solr with fq clause,which is like:
>   fq=BEGINTIME:[2013-08-25T16:**00:00Z TO *] AND BUSID:(M3 OR M9)
>
>
>   I am curious about the parsing process . I want to study it.
>   What is the Java file name describes  the parsing  process of the fq
> clause.
>
>
>  Thanks
>
> Regards.
>


RE: Facet performance

2013-10-21 Thread Lemke, Michael SZ/HZA-ZSW
On Mon, October 21, 2013 10:04 AM, Toke Eskildsen wrote:
>On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
>> Toke Eskildsen wrote:
>> > Unfortunately the enum-solution is normally quite slow when there
>> > are enough unique values to trigger the "too many > values"-exception.
>> > [...]
>> 
>> [...] And yes, the fc method was terribly slow in a case where it did
>> work.  Something like 20 minutes whereas enum returned within a few
>> seconds.
>
>Err.. What? That sounds _very_ strange. You have millions of unique
>values so fc should be a lot faster than enum, not the other way around.
>
>I assume the 20 minutes was for the first call. How fast does subsequent
>calls return for fc?

QTime enum:
 1st call: 1200
 subsequent calls: 200

QTime fc:
   never returns, webserver restarts itself after 30 min with 100% CPU load


This is on the test system, the production system managed to return with
"... Too many values for UnInvertedField faceting ...".

However, I also have different faceting queries I played with today.

One complete example:

q=ottomotor&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

These are the results, all with facet.method=enum (fc doesn't work).  They
were executed in the sequence shown on an otherwise unused server:

QTime=41205  facet.prefix=    q=frequent_word         numFound=44532

Same query repeated:
QTime=225810 facet.prefix=    q=ottomotor             numFound=909
QTime=199839 facet.prefix=    q=ottomotor             numFound=909

QTime=0      facet.prefix=    q=ottomotor jkdhwjfh    numFound=0
QTime=0      facet.prefix=    q=jkdhwjfh              numFound=0

QTime=185948 facet.prefix=    q=ottomotor             numFound=909

QTime=3344   facet.prefix=d   q=ottomotor             numFound=909
QTime=3078   facet.prefix=d   q=ottomotor             numFound=909
QTime=3141   facet.prefix=d   q=ottomotor             numFound=909

The response time is obviously not dependent on the number of documents found.
Caching doesn't kick in either.

>
>
>Maybe you could provide some approximate numbers?

I'll try, see below.  Thanks for asking and having a closer look.

>
>- Documents in your index
13,434,414

>- Unique values in the CONTENT field
Not sure how to get this.  In luke I find
21,797,514 term count CONTENT

Is that what you mean?

>- Hits are returned from a typical query
Hm, that can be anything between 0 and 40,000 or more.
Or do you mean from the facets?  Or do my tests above
answer it?

>- Xmx
The maximum the system allows me to get: 1612m


Maybe I have a hopelessly under-dimensioned server for this sort of things?

Thanks a lot for your help,
Michael


Re: Solr timeout after reboot

2013-10-21 Thread Shawn Heisey
On 10/21/2013 8:03 AM, michael.boom wrote:
> I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G,
> so I guess putting running the above command would bite all available
> memory.

With a 100GB index, I would want a minimum server memory size of 64GB,
and I would much prefer 128GB.  If you shard your index, then each
machine will require less memory, because each one will have less of the
index onboard.  Running a big Solr install is usually best handled on
bare metal, because it loves RAM, and getting a lot of memory in a
virtual environment is quite expensive.  It's also expensive on bare
metal too, but unlike Amazon, more memory doesn't increase your monthly
cost.

With only 15GB total RAM and an index that big, you're probably giving
at least half of your RAM to Solr, leaving *very* little for the OS disk
cache, compared to your index size.  The ideal cache size is the same as
your index size, but you can almost always get away with less.

http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

If you try the "cat" trick with your numbers, it's going to take forever
every time you run it, it will kill your performance while it's
happening, and only the last few GB that it reads will remain in the OS
disk cache.  Chances are that it will be the wrong part of the index, too.

You only want to cat your entire index if you have enough free RAM to
*FIT* your entire index.  If you *DO* have that much free memory (which
for you would require a total RAM size of about 128GB), then the first
time will take quite a while, but every time you do it after that, it
will happen nearly instantly, because it will not have to actually read
the disk at all.

You could try only doing the cat on certain index files, but when you
don't have enough cache for the entire index, running queries will do a
better job of filling the cache intelligently.  The first bunch of
queries will be slow.

Summary: You need more RAM.  Quite a bit more RAM.

Thanks,
Shawn



Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")

2013-10-21 Thread Jack Krupansky

Did you completely reindex your data after emptying the stop words file?

-- Jack Krupansky

-Original Message- 
From: Stavros Delisavas

Sent: Monday, October 21, 2013 10:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Local Solr and Webserver-Solr act differently ("and" treated 
like "or")


Okay, I emtpied the stopword file. I don't know where the wordlist came
from. I have never seen this and never touched that file. Anyways...
Now my queries do work with one word, like "in" or "to" but the queries
still do not work when I use more than one stopword within one query.
Instead of too many results I now get NO results at all.

What could be the problem?



On 17.10.2013 15:02, Jack Krupansky wrote:

The default Solr stopwords.txt file is empty, so SOMEBODY created that
non-empty stop words file.

The StopFilterFactory token filter in the field type analyzer controls
stop word processing. You can remove that step entirely, or different
field types can reference different stop word files, or some field type
analyzers can use the stop filter and some would not have it. This does
mean that you would have to use different field types for fields that
want different stop word processing.

-- Jack Krupansky

-Original Message- From: Stavros Delisavas
Sent: Thursday, October 17, 2013 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Local Solr and Webserver-Solr act differently ("and"
treated like "or")

Thank you,
I found the file with the stopwords and noticed that my local file is
empty (comments only) and the one on my webserver has a big list of
english stopwords. That seems to be the problem.

I think in general it is a good idea to use stopwords for random
searches, but it is not usefull in my special case. Is there a way to
(de)activate stopwords query-wise? Like I would like to ignore stopwords
when searching in titles but I would like to use stopwords when users do
a fulltext-search on whole articles, etc.

Thanks again,
Stavros


On 17.10.2013 09:13, Upayavira wrote:

Stopwords are small words such as "and", "the" or "is",that we might
choose to exclude from our documents and queries because they are such
common terms. Once you have stripped stop words from your above query,
all that is left is the word "wild", or so is being suggested.

Somewhere in your config, close to solr config.xml, you will find a file
called something like stopwords.txt. Compare these files between your
two systems.

Upayavira

On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote:

Unfortunatly, I don't really know what stopwords are. I would like it to
not ignore any words of my query.
How/Where can I change this stopwords-behaviour?


Am 16.10.2013 23:45, schrieb Jack Krupansky:

So, the stopwords.txt file is different between the two systems - the
first has stop words but the second does not. Did you expect stop
words to be removed, or not?

-- Jack Krupansky

-Original Message- From: Stavros Delsiavas
Sent: Wednesday, October 16, 2013 5:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Local Solr and Webserver-Solr act differently ("and"
treated like "or")

Okay I understand,

here's the rawquerystring. It was at about line 3000:


 title:(into AND the AND wild*)
 title:(into AND the AND wild*)
 +title:wild*
 +title:wild*

At this place the debug output DOES differ from the one on my local
system. But I don't understand why...
This is the local debug output:


  title:(into AND the AND wild*)
  title:(into AND the AND wild*)
  +title:into +title:the +title:wild*
  +title:into +title:the
+title:wild*

Why is that? Any ideas?




Am 16.10.2013 21:03, schrieb Shawn Heisey:

On 10/16/2013 4:46 AM, Stavros Delisavas wrote:

My local solr gives me:
http://pastebin.com/Q6d9dFmZ

and my webserver this:
http://pastebin.com/q87WEjVA

I copied only the first few hundret lines (of more than 8000) because
the webserver output was to big even for pastebin.



On 16.10.2013 12:27, Erik Hatcher wrote:

What does the debug output say from debugQuery=true say between the
two?

What's really needed here is the first part of the  section,
which has rawquerystring, querystring, parsedquery, and
parsedquery_toString.  The info from your local solr has this part,
but
what you pasted from the webserver one didn't include those parts,
because it's further down than the first few hundred lines.

Thanks,
Shawn





Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")

2013-10-21 Thread Stavros Delsiavas
I did a full-import again. That solved the issue. I didn't know that the 
stopwords apply to the indexing itself as well.


Thanks a lot,

Stavros


Am 21.10.2013 17:13, schrieb Jack Krupansky:

Did you completely reindex your data after emptying the stop words file?

-- Jack Krupansky

-Original Message- From: Stavros Delisavas
Sent: Monday, October 21, 2013 10:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Local Solr and Webserver-Solr act differently ("and" 
treated like "or")


Okay, I emptied the stopword file. I don't know where the wordlist came
from. I have never seen it and never touched that file. Anyway...
Now my queries do work with one word, like "in" or "to" but the queries
still do not work when I use more than one stopword within one query.
Instead of too many results I now get NO results at all.

What could be the problem?



On 17.10.2013 15:02, Jack Krupansky wrote:

The default Solr stopwords.txt file is empty, so SOMEBODY created that
non-empty stop words file.

The StopFilterFactory token filter in the field type analyzer controls
stop word processing. You can remove that step entirely, or different
field types can reference different stop word files, or some field type
analyzers can use the stop filter and some would not have it. This does
mean that you would have to use different field types for fields that
want different stop word processing.

-- Jack Krupansky

-Original Message- From: Stavros Delisavas
Sent: Thursday, October 17, 2013 3:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Local Solr and Webserver-Solr act differently ("and"
treated like "or")

Thank you,
I found the file with the stopwords and noticed that my local file is
empty (comments only) and the one on my webserver has a big list of
english stopwords. That seems to be the problem.

I think in general it is a good idea to use stopwords for random
searches, but it is not usefull in my special case. Is there a way to
(de)activate stopwords query-wise? Like I would like to ignore stopwords
when searching in titles but I would like to use stopwords when users do
a fulltext-search on whole articles, etc.

Thanks again,
Stavros


On 17.10.2013 09:13, Upayavira wrote:

Stopwords are small words such as "and", "the" or "is",that we might
choose to exclude from our documents and queries because they are such
common terms. Once you have stripped stop words from your above query,
all that is left is the word "wild", or so is being suggested.

Somewhere in your config, close to solr config.xml, you will find a 
file

called something like stopwords.txt. Compare these files between your
two systems.

Upayavira

On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote:
Unfortunatly, I don't really know what stopwords are. I would like 
it to

not ignore any words of my query.
How/Where can I change this stopwords-behaviour?


Am 16.10.2013 23:45, schrieb Jack Krupansky:

So, the stopwords.txt file is different between the two systems - the
first has stop words but the second does not. Did you expect stop
words to be removed, or not?

-- Jack Krupansky

-Original Message- From: Stavros Delsiavas
Sent: Wednesday, October 16, 2013 5:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Local Solr and Webserver-Solr act differently ("and"
treated like "or")

Okay I understand,

here's the rawquerystring. It was at about line 3000:


 title:(into AND the AND wild*)
 title:(into AND the AND wild*)
 +title:wild*
 +title:wild*

At this place the debug output DOES differ from the one on my local
system. But I don't understand why...
This is the local debug output:


  title:(into AND the AND wild*)
  title:(into AND the AND wild*)
  +title:into +title:the +title:wild*
  +title:into +title:the
+title:wild*

Why is that? Any ideas?




Am 16.10.2013 21:03, schrieb Shawn Heisey:

On 10/16/2013 4:46 AM, Stavros Delisavas wrote:

My local solr gives me:
http://pastebin.com/Q6d9dFmZ

and my webserver this:
http://pastebin.com/q87WEjVA

I copied only the first few hundret lines (of more than 8000) 
because

the webserver output was to big even for pastebin.



On 16.10.2013 12:27, Erik Hatcher wrote:
What does the debug output say from debugQuery=true say between 
the

two?

What's really needed here is the first part of the  section,
which has rawquerystring, querystring, parsedquery, and
parsedquery_toString.  The info from your local solr has this part,
but
what you pasted from the webserver one didn't include those parts,
because it's further down than the first few hundred lines.

Thanks,
Shawn









SolrCloud performance in VM environment

2013-10-21 Thread Tom Mortimer
Hi everyone,

I've been working on an installation recently which uses SolrCloud to index
45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
identical VMs set up for replicas). The reason we're using so many shards
for a relatively small index is that there are complex filtering
requirements at search time, to restrict users to items they are licensed
to view. Initial tests demonstrated that multiple shards would be required.

The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
total) and 4 CPU units. I know this is far under what would normally be
recommended for an index of this size, and I'm working on persuading the
customer to increase the RAM (basically, telling them it won't work
otherwise.) Performance is currently pretty poor and I would expect more
RAM to improve things. However, there are a couple of other oddities which
concern me.

The first is that I've been reindexing a fixed set of 500 docs to test
indexing and commit performance (with soft commits within 60s). The time
taken to complete a hard commit after this is longer than I'd expect, and
highly variable - from 10s to 70s. This makes me wonder whether the SAN
(which provides all the storage for these VMs and the customer's several
other VMs) is being saturated periodically. I grabbed some iostat output on
different occasions to (possibly) show the variability:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb  64.50 0.00  2476.00  0   4952
...
sdb   8.90 0.00   348.00  0   6960
...
sdb   1.15 0.0043.20  0864

The other thing that confuses me is that after a Solr restart or hard
commit, search times average about 1.2s under light load. After searching
the same set of queries for 5-6 iterations this improves to 0.1s. However,
in either case - cold or warm - iostat reports no device reads at all:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb   0.40 0.00 8.00  0160
...
sdb   0.30 0.0010.40  0104

(the writes are due to logging). This implies to me that the 'hot' blocks
are being completely cached in RAM - so why the variation in search time
and the number of iterations required to speed it up?

The Solr caches are only being used lightly by these tests and there are no
evictions. GC is not a significant overhead. Each Solr shard runs in a
separate JVM with 1GB heap.

I don't have a great deal of experience in low-level performance tuning, so
please forgive any naivety. Any ideas of what to do next would be greatly
appreciated. I don't currently have details of the VM implementation but
can get hold of this if it's relevant.

thanks,
Tom


RE: SolrCloud performance in VM environment

2013-10-21 Thread Boogie Shafer
some basic tips.

-try to create enough shards that the size of each index portion on a shard 
gets closer to the amount of RAM you have on each node (e.g. with a ~140GB 
index on 16GB nodes, try 12-16 shards)

-start with just the initial shards, add replicas later when you have dialed 
things in a bit more

-try to leave some memory for the OS as well as the JVM

-try starting with 1/2 of the total ram on each vm allocated to JVM as Xmx value

-try setting the Xms in the range of .75 to 1.0 of Xmx

-do all the normal JVM tuning, esp the part about capturing the gc events in a 
log such that you can see what is going on with java itself..this will probably 
lead you to adjust your GC type, etc

-make sure you aren't hammering your storage devices (or the interconnects 
between your servers and your storage)...the OS internal tools on the guest are 
helpful, but you probably want to look at the hypervisor and storage device 
layer directly as well. on vmware the built in perf graphs for datastore 
latency and network throughput are easily observed. esxtop is the cli tool 
which provides the same.

-if you are using a SAN, you probably want to make sure you have some sort of 
MPIO in place (esp if you are using 1GB iscsi)





From: Tom Mortimer 
Sent: Monday, October 21, 2013 08:48
To: solr-user@lucene.apache.org
Subject: SolrCloud performance in VM environment

Hi everyone,

I've been working on an installation recently which uses SolrCloud to index
45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
identical VMs set up for replicas). The reason we're using so many shards
for a relatively small index is that there are complex filtering
requirements at search time, to restrict users to items they are licensed
to view. Initial tests demonstrated that multiple shards would be required.

The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
total) and 4 CPU units. I know this is far under what would normally be
recommended for an index of this size, and I'm working on persuading the
customer to increase the RAM (basically, telling them it won't work
otherwise.) Performance is currently pretty poor and I would expect more
RAM to improve things. However, there are a couple of other oddities which
concern me,

The first is that I've been reindexing a fixed set of 500 docs to test
indexing and commit performance (with soft commits within 60s). The time
taken to complete a hard commit after this is longer than I'd expect, and
highly variable - from 10s to 70s. This makes me wonder whether the SAN
(which provides all the storage for these VMs and the customers several
other VMs) is being saturated periodically. I grabbed some iostat output on
different occasions to (possibly) show the variability:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb  64.50 0.00  2476.00  0   4952
...
sdb   8.90 0.00   348.00  0   6960
...
sdb   1.15 0.0043.20  0864

The other thing that confuses me is that after a Solr restart or hard
commit, search times average about 1.2s under light load. After searching
the same set of queries for 5-6 iterations this improves to 0.1s. However,
in either case - cold or warm - iostat reports no device reads at all:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb   0.40 0.00 8.00  0160
...
sdb   0.30 0.0010.40  0104

(the writes are due to logging). This implies to me that the 'hot' blocks
are being completely cached in RAM - so why the variation in search time
and the number of iterations required to speed it up?

The Solr caches are only being used lightly by these tests and there are no
evictions. GC is not a significant overhead. Each Solr shard runs in a
separate JVM with 1GB heap.

I don't have a great deal of experience in low-level performance tuning, so
please forgive any naivety. Any ideas of what to do next would be greatly
appreciated. I don't currently have details of the VM implementation but
can get hold of this if it's relevant.

thanks,
Tom


Re: Question about docvalues

2013-10-21 Thread Erick Erickson
I really don't understand the question. What behavior are you seeing
that leads you to ask?

bq: Is it necessary duplicate the field and set index and stored to false
and
If this means setting _both_ indexed and stored to false, then you
effectively
throw the field completely away, there's no point in doing this.

FWIW,
Erick


On Mon, Oct 21, 2013 at 1:39 PM, yriveiro  wrote:

> Hi,
>
> If I have a field (named dv_field) configured to be indexed, stored and
> with
> docvalues=true.
>
> How I know that when I do a query like:
>
> q=*:*&facet=true&facet.field=dv_field, I'm really using the docvalues and
> not the normal way?
>
> Is it necessary duplicate the field and set index and stored to false and
> let the docvalues property set to true?
>
>
>
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Pivot faceting not working after upgrading to 4.5

2013-10-21 Thread Henrik Ossipoff Hansen
Hello,

We have a rather weird behavior I don't really understand. As written in a few 
other threads, we're migrating from a master/slave setup running 4.3 to a 
SolrCloud setup running 4.5. Both run on the same data set (the 4.5 instances 
have been re-indexed under 4.5 obviously).

The following query works fine under our 4.3 setup:

?q=*:*&facet.pivot=facet_category,facet_platform&facet=true&rows=0

However, in our 4.5 setup, the facet_pivot entry in the facet_count is straight 
up missing in the response. I've been digging around the logs for a bit, but 
I'm unable to find something relating to this. If I remove one of the 
facet.pivot elements (i.e. only having &facet.pivot=facet_category) I get an 
error as expected, so that part of the component is at least working.

Does anyone have an idea to something obvious I might have missed? I've been 
unable to find any change logs suggesting changes to this part of the facet 
component.

Thanks.

Regards,
Henrik

Re: Pivot faceting not working after upgrading to 4.5

2013-10-21 Thread Henrik Ossipoff Hansen
I realise now, after some digging around the internet, that distributed pivot 
faceting is not yet implemented in SolrCloud.

Apologies :)

Den 21/10/2013 kl. 18.20 skrev Henrik Ossipoff Hansen 
:

> Hello,
> 
> We have a rather weird behavior I don't really understand. As written in a 
> few other threads, we're migrating from a master/slave setup running 4.3 to a 
> SolrCloud setup running 4.5. Both run on the same data set (the 4.5 instances 
> have been re-indexed under 4.5 obviously).
> 
> The following query works fine under our 4.3 setup:
> 
> ?q=*:*&facet.pivot=facet_category,facet_platform&facet=true&rows=0
> 
> However, in our 4.5 setup, the facet_pivot entry in the facet_count is 
> straight up missing in the response. I've been digging around the logs for a 
> bit, but I'm unable to find something relating to this. If I remove one of 
> the facet.pivot elements (i.e. only having &facet.pivot=facet_category) I get 
> an error as expected, so that part of the component is at least working.
> 
> Does anyone have an idea to something obvious I might have missed? I've been 
> unable to find any change logs suggesting changes to this part of the facet 
> component.
> 
> Thanks.
> 
> Regards,
> Henrik



Re: Question about docvalues

2013-10-21 Thread Yago Riveiro
Sorry if I didn't make myself understood; my English is not very good.

My goal is to take pressure off the heap: my indexes are too big, the heap 
fills up very quickly, and I get an OOM. I read about docValues stored on disk, 
but I don't know how to configure them.

I read this link: 
https://cwiki.apache.org/confluence/display/solr/DocValues#DocValues-HowtoUseDocValues
which has an example of how to configure a field to use docValues:

<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />

With this configuration it is obvious that I will use docValues.

Q: With this configuration, can I retrieve the field value in a normal search, 
or does it still need to be stored?

If I have a field configured as:

<field name="manu_exact" type="string" indexed="true" stored="true" docValues="true" />

And I do a facet query on the manu_exact field: 
"q=*:*&facet=true&facet.field=manu_exact"

Q: Do I leverage the docValues feature? That is, do docValues always take 
precedence over the regular faceting method when they are set?
Q: Does it make sense for the field to be indexed if I have docValues?


-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, October 21, 2013 at 5:10 PM, Erick Erickson wrote:

> I really don't understand the question. What behavior are you seeing
> that leads you to ask?
> 
> bq: Is it necessary duplicate the field and set index and stored to false
> and
> If this means setting _both_ indexed and stored to false, then you
> effectively
> throw the field completely away, there's no point in doing this.
> 
> FWIW,
> Erick
> 
> 
> On Mon, Oct 21, 2013 at 1:39 PM, yriveiro  (mailto:yago.rive...@gmail.com)> wrote:
> 
> > Hi,
> > 
> > If I have a field (named dv_field) configured to be indexed, stored and
> > with
> > docvalues=true.
> > 
> > How I know that when I do a query like:
> > 
> > q=*:*&facet=true&facet.field=dv_field, I'm really using the docvalues and
> > not the normal way?
> > 
> > Is it necessary duplicate the field and set index and stored to false and
> > let the docvalues property set to true?
> > 
> > 
> > 
> > -
> > Best regards
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html
> > Sent from the Solr - User mailing list archive at Nabble.com 
> > (http://Nabble.com).
> > 
> 
> 
> 




Re: Question about docvalues

2013-10-21 Thread Gun Akkor
Hello Yago,

To my knowledge, in facet calculations docValues take precedence over other 
methods. So, even if your field is also stored and indexed, your facets won't 
use the inverted index or fieldValueCache, when docValues are present.

I think you will still have to store and index to maintain your other 
functionality. DocValues are helpful only for facets and sorting to my 
knowledge.

Hope this helps,

Gun Akkor
www.carbonblack.com
Sent from my iPhone

On Oct 21, 2013, at 12:41 PM, Yago Riveiro  wrote:

> Sorry if I don't make understand, my english is not too good.
> 
> My goal is remove pressure from the heap, my indexes are too big and the heap 
> get full very quick and I get an OOM. I read about docValues stored on disk, 
> but I don't know how configure it.
> 
> A read this link: 
> https://cwiki.apache.org/confluence/display/solr/DocValues#DocValues-HowtoUseDocValues
>  witch has an example that how to configure a field to use docValues:
> 
>  <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />
> 
> With this configuration is obvious that I will use docValues.
> 
> Q: With this configuration, can I retrieve the field value on a normal search 
> or still need to be stored?
> 
> If I have a field configured as:
> 
>  <field name="manu_exact" type="string" indexed="true" stored="true" docValues="true" />
> 
> And I do a facet query on manu_exact field: 
> "q=*:*&facet=true&facet.field=manu_exact"
> 
> Q: I leverage the docValues feature?, This means, docValues always has 
> precedency if is set over the regular method to do a facet?
> Q: Make sense the field indexed if I have docValues?
> 
> 
> -- 
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
> 
> On Monday, October 21, 2013 at 5:10 PM, Erick Erickson wrote:
> 
>> I really don't understand the question. What behavior are you seeing
>> that leads you to ask?
>> 
>> bq: Is it necessary duplicate the field and set index and stored to false
>> and
>> If this means setting _both_ indexed and stored to false, then you
>> effectively
>> throw the field completely away, there's no point in doing this.
>> 
>> FWIW,
>> Erick
>> 
>> 
>> On Mon, Oct 21, 2013 at 1:39 PM, yriveiro > (mailto:yago.rive...@gmail.com)> wrote:
>> 
>>> Hi,
>>> 
>>> If I have a field (named dv_field) configured to be indexed, stored and
>>> with
>>> docvalues=true.
>>> 
>>> How I know that when I do a query like:
>>> 
>>> q=*:*&facet=true&facet.field=dv_field, I'm really using the docvalues and
>>> not the normal way?
>>> 
>>> Is it necessary duplicate the field and set index and stored to false and
>>> let the docvalues property set to true?
>>> 
>>> 
>>> 
>>> -
>>> Best regards
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html
>>> Sent from the Solr - User mailing list archive at Nabble.com 
>>> (http://Nabble.com).
> 
> 


Re: Exact Match Results

2013-10-21 Thread Developer
You need to provide us with the fieldType information.

If you just want to match the phrase entered by the user, you can use
KeywordTokenizerFactory.

Reference:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Creates org.apache.lucene.analysis.core.KeywordTokenizer.

Treats the entire field as a single token, regardless of its content.

Example: "http://example.com/I-am+example?Text=-Hello"; ==>
"http://example.com/I-am+example?Text=-Hello";



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096846.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about docvalues

2013-10-21 Thread Yago Riveiro
Hi Gun,

Thanks for the response.

Indeed I only want docValues to do facets.

IMHO, a reference to the fact that docValues take precedence over 
other methods is needed. It is not always obvious.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, October 21, 2013 at 5:53 PM, Gun Akkor wrote:

> Hello Yago,
> 
> To my knowledge, in facet calculations docValues take precedence over other 
> methods. So, even if your field is also stored and indexed, your facets won't 
> use the inverted index or fieldValueCache, when docValues are present.
> 
> I think you will still have to store and index to maintain your other 
> functionality. DocValues are helpful only for facets and sorting to my 
> knowledge.
> 
> Hope this helps,
> 
> Gun Akkor
> www.carbonblack.com (http://www.carbonblack.com)
> Sent from my iPhone
> 
> On Oct 21, 2013, at 12:41 PM, Yago Riveiro  (mailto:yago.rive...@gmail.com)> wrote:
> 
> > Sorry if I don't make understand, my english is not too good.
> > 
> > My goal is remove pressure from the heap, my indexes are too big and the 
> > heap get full very quick and I get an OOM. I read about docValues stored on 
> > disk, but I don't know how configure it.
> > 
> > A read this link: 
> > https://cwiki.apache.org/confluence/display/solr/DocValues#DocValues-HowtoUseDocValues
> >  witch has an example that how to configure a field to use docValues:
> > 
> >  <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />
> > 
> > With this configuration is obvious that I will use docValues.
> > 
> > Q: With this configuration, can I retrieve the field value on a normal 
> > search or still need to be stored?
> > 
> > If I have a field configured as:
> > 
> >  <field name="manu_exact" type="string" indexed="true" stored="true" docValues="true" />
> > 
> > And I do a facet query on manu_exact field: 
> > "q=*:*&facet=true&facet.field=manu_exact"
> > 
> > Q: I leverage the docValues feature?, This means, docValues always has 
> > precedency if is set over the regular method to do a facet?
> > Q: Make sense the field indexed if I have docValues?
> > 
> > 
> > -- 
> > Yago Riveiro
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > 
> > 
> > On Monday, October 21, 2013 at 5:10 PM, Erick Erickson wrote:
> > 
> > > I really don't understand the question. What behavior are you seeing
> > > that leads you to ask?
> > > 
> > > bq: Is it necessary duplicate the field and set index and stored to false
> > > and
> > > If this means setting _both_ indexed and stored to false, then you
> > > effectively
> > > throw the field completely away, there's no point in doing this.
> > > 
> > > FWIW,
> > > Erick
> > > 
> > > 
> > > On Mon, Oct 21, 2013 at 1:39 PM, yriveiro  > > (mailto:yago.rive...@gmail.com)> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > If I have a field (named dv_field) configured to be indexed, stored and
> > > > with
> > > > docvalues=true.
> > > > 
> > > > How I know that when I do a query like:
> > > > 
> > > > q=*:*&facet=true&facet.field=dv_field, I'm really using the docvalues 
> > > > and
> > > > not the normal way?
> > > > 
> > > > Is it necessary duplicate the field and set index and stored to false 
> > > > and
> > > > let the docvalues property set to true?
> > > > 
> > > > 
> > > > 
> > > > -
> > > > Best regards
> > > > --
> > > > View this message in context:
> > > > http://lucene.472066.n3.nabble.com/Question-about-docvalues-tp4096802.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com 
> > > > (http://Nabble.com).
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 
> 




Re: Exact Match Results

2013-10-21 Thread kumar
Hi, I am using the following field type configuration:

[fieldType definition stripped by the list archiver; it includes an
EdgeNGramFilterFactory]

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096847.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

2013-10-21 Thread JT
Does anyone have a good link to a guide / tutorial /etc. for writing a
custom function query in Solr 4?

The tutorials I've seen vary from showing half the code to being written
for older versions of Solr.


Any type of pointers would be appreciated, thanks.


Re: Solr timeout after reboot

2013-10-21 Thread Otis Gospodnetic
Hi Michael,

I agree with Shawn - don't listen to Peter ;) - but only this once;
he's a smart guy, as you can see in the list archives.
And I disagree with Shawn - again, only just this once, and only
somewhat. :)  Because:

In general, Shawn's advice is correct, but we have no way of knowing
your particular details.  To illustrate the point, let me use an
extreme case where you have just one query that you hammer your
servers with.  Your Solr caches will be well utilized and your servers
will not really need a lot of memory to cache your 100 GB index,
because only a small portion of it will ever be accessed.  Of course,
this is an extreme case and not realistic, but I think it helps one
understand how, as the number of distinct queries grows (and thus also
the number of distinct documents being matched and returned), the need
for more and more memory goes up.  So the question is where exactly
your particular application falls.

You mentioned stress testing.  Just as you have a real index there (I
am assuming), you need to have your real queries, too - real
volume, real diversity, real rate, real complexity, real or as close
to real everything.

Since you are using SPM, you should be able to go to various graphs in
SPM and look for a little ambulance icon above each graph.  Use that
to assemble a message with N graphs you want us to look at and we'll
be able to help more.  Graphs that may be of interest here are your
Solr cache graphs, disk IO, and memory graphs -- taken during your
realistic stress testing, of course.  You can then send that message
directly to solr-user, assuming your SPM account email address is
subscribed to the list.  Or you can paste it into a new email, up to
you.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



On Mon, Oct 21, 2013 at 11:07 AM, Shawn Heisey  wrote:
> On 10/21/2013 8:03 AM, michael.boom wrote:
>> I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G,
>> so I guess putting running the above command would bite all available
>> memory.
>
> With a 100GB index, I would want a minimum server memory size of 64GB,
> and I would much prefer 128GB.  If you shard your index, then each
> machine will require less memory, because each one will have less of the
> index onboard.  Running a big Solr install is usually best handled on
> bare metal, because it loves RAM, and getting a lot of memory in a
> virtual environment is quite expensive.  It's also expensive on bare
> metal too, but unlike Amazon, more memory doesn't increase your monthly
> cost.
>
> With only 15GB total RAM and an index that big, you're probably giving
> at least half of your RAM to Solr, leaving *very* little for the OS disk
> cache, compared to your index size.  The ideal cache size is the same as
> your index size, but you can almost always get away with less.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> If you try the "cat" trick with your numbers, it's going to take forever
> every time you run it, it will kill your performance while it's
> happening, and only the last few GB that it reads will remain in the OS
> disk cache.  Chances are that it will be the wrong part of the index, too.
>
> You only want to cat your entire index if you have enough free RAM to
> *FIT* your entire index.  If you *DO* have that much free memory (which
> for you would require a total RAM size of about 128GB), then the first
> time will take quite a while, but every time you do it after that, it
> will happen nearly instantly, because it will not have to actually read
> the disk at all.
>
> You could try only doing the cat on certain index files, but when you
> don't have enough cache for the entire index, running queries will do a
> better job of filling the cache intelligently.  The first bunch of
> queries will be slow.
>
> Summary: You need more RAM.  Quite a bit more RAM.
>
> Thanks,
> Shawn
>


Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

2013-10-21 Thread Jack Krupansky
Take a look at the unit tests for various "value sources", and find a Jira 
that added some value source and look at the patch for what changes had to 
be made.


-- Jack Krupansky

-Original Message- 
From: JT

Sent: Monday, October 21, 2013 1:17 PM
To: solr-user@lucene.apache.org
Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

Does anyone have a good link to a guide / tutorial /etc. for writing a
custom function query in Solr 4?

The tutorials I've seen vary from showing half the code to being written
for older versions of Solr.


Any type of pointers would be appreciated, thanks. 



Re: SolrCloud performance in VM environment

2013-10-21 Thread Shawn Heisey

On 10/21/2013 9:48 AM, Tom Mortimer wrote:

Hi everyone,

I've been working on an installation recently which uses SolrCloud to index
45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
identical VMs set up for replicas). The reason we're using so many shards
for a relatively small index is that there are complex filtering
requirements at search time, to restrict users to items they are licensed
to view. Initial tests demonstrated that multiple shards would be required.

The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
total) and 4 CPU units. I know this is far under what would normally be
recommended for an index of this size, and I'm working on persuading the
customer to increase the RAM (basically, telling them it won't work
otherwise.) Performance is currently pretty poor and I would expect more
RAM to improve things. However, there are a couple of other oddities which
concern me,


Running multiple shards like you are, where each operating system is 
handling more than one shard, is only going to perform better if your 
query volume is low and you have lots of CPU cores.  If your query 
volume is high or you only have 2-4 CPU cores on each VM, you might be 
better off with fewer shards or not sharded at all.


The way that I read this is that you've got two physical machines with 
32GB RAM, each running two VMs that have 16GB.  Each VM houses 4 shards, 
or 70GB of index.


There's a scenario that might be better if all of the following are 
true: 1) I'm right about how your hardware is provisioned.  2) You or 
the client owns the hardware.  3) You have an extremely low-end third 
machine available - single CPU with 1GB of RAM would probably be 
enough.  In this scenario, you run one Solr instance and one zookeeper 
instance on each of your two "big" machines, and use the third wimpy 
machine as a third zookeeper node.  No virtualization.  For the rest of 
my reply, I'm assuming that you haven't taken this step, but it will 
probably apply either way.



The first is that I've been reindexing a fixed set of 500 docs to test
indexing and commit performance (with soft commits within 60s). The time
taken to complete a hard commit after this is longer than I'd expect, and
highly variable - from 10s to 70s. This makes me wonder whether the SAN
(which provides all the storage for these VMs and the customers several
other VMs) is being saturated periodically. I grabbed some iostat output on
different occasions to (possibly) show the variability:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb  64.50 0.00  2476.00  0   4952
...
sdb   8.90 0.00   348.00  0   6960
...
sdb   1.15 0.0043.20  0864


There are two likely possibilities for this.  One or both of them might 
be in play.  1) Because the OS disk cache is small, not much of the 
index can be cached.  This can result in a lot of disk I/O for a commit, 
slowing things way down.  Increasing the size of the OS disk cache is 
really the only solution for that. 2) Cache autowarming, particularly 
the filter cache.  In the cache statistics, you can see how long each 
cache took to warm up after the last searcher was opened.  The solution 
for that is to reduce the autowarmCount values.
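
For example, something like this in solrconfig.xml (just a sketch - tune 
the sizes to your own setup):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>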



The other thing that confuses me is that after a Solr restart or hard
commit, search times average about 1.2s under light load. After searching
the same set of queries for 5-6 iterations this improves to 0.1s. However,
in either case - cold or warm - iostat reports no device reads at all:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb   0.40 0.00 8.00  0160
...
sdb   0.30 0.0010.40  0104

(the writes are due to logging). This implies to me that the 'hot' blocks
are being completely cached in RAM - so why the variation in search time
and the number of iterations required to speed it up?


Linux is pretty good about making limited OS disk cache resources work.  
Sounds like the caching is working reasonably well for queries.  It 
might not be working so well for updates or commits, though.


Running multiple Solr JVMs per machine, virtual or not, causes more 
problems than it solves.  Solr has no limits on the number of cores 
(shard replicas) per instance, assuming there are enough system 
resources.  There should be exactly one Solr JVM per operating system.  
Running more than one results in quite a lot of overhead, and your 
memory is precious.  When you create a collection, you can give the 
collections API the "maxShardsPerNode" parameter to create more than one 
shard per instance.
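
For example, something along these lines (collection name and counts are 
made up - adjust them to your own layout):

http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=2&maxShardsPerNode=8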



I don't have a great deal of experience in low-level performance tuning, so
please forgive any naivety. Any ideas of what to do next would be greatly
appreciated. I don't currently have 

Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

2013-10-21 Thread fudong li
Hi Jack,

Do you have a date for the new version of your book:
solr_4x_deep_dive_early_access?

Thanks,

Fudong


On Mon, Oct 21, 2013 at 10:39 AM, Jack Krupansky wrote:

> Take a look at the unit tests for various "value sources", and find a Jira
> that added some value source and look at the patch for what changes had to
> be made.
>
> -- Jack Krupansky
>
> -Original Message- From: JT
> Sent: Monday, October 21, 2013 1:17 PM
> To: solr-user@lucene.apache.org
> Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?
>
>
> Does anyone have a good link to a guide / tutorial /etc. for writing a
> custom function query in Solr 4?
>
> The tutorials I've seen vary from showing half the code to being written
> for older versions of Solr.
>
>
> Any type of pointers would be appreciated, thanks.
>


Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

2013-10-21 Thread Jack Krupansky

Hopefully at the end of the week.

-- Jack Krupansky

-Original Message- 
From: fudong li

Sent: Monday, October 21, 2013 1:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

Hi Jack,

Do you have a date for the new version of your book:
solr_4x_deep_dive_early_access?

Thanks,

Fudong


On Mon, Oct 21, 2013 at 10:39 AM, Jack Krupansky 
wrote:



Take a look at the unit tests for various "value sources", and find a Jira
that added some value source and look at the patch for what changes had to
be made.

-- Jack Krupansky

-Original Message- From: JT
Sent: Monday, October 21, 2013 1:17 PM
To: solr-user@lucene.apache.org
Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?


Does anyone have a good link to a guide / tutorial /etc. for writing a
custom function query in Solr 4?

The tutorials I've seen vary from showing half the code to being written
for older versions of Solr.


Any type of pointers would be appreciated, thanks.





reindexing data

2013-10-21 Thread Christopher Gross
In Solr 4.5, I'm trying to create a new collection on the fly.  I have a
data dir with the index that should be in there, but the CREATE command
makes the directory be:
_shard1_replicant#

I was hoping that making a collection named something would use a directory
with that name to let me use the data that I already have to fill the
collection.  I could go and just make each one
(__replicant[1,2,3]), but I was hoping there may be an easier
way of doing this.

Sorry if this is confusing (it is Monday), I can try clarify if needed.
Thanks.

-- Chris


Re: Questions developing custom functionquery

2013-10-21 Thread JT
I would agree the "right" way to do this is probably to just add the
information I wish to sort on directly, as a date field or something like
that.

The issue is we currently have ~300M documents that are already indexed.
Not all of the fields have stored=true (for good reason - we maintain the
documents externally, about 7TB worth, and I didn't want to store that 7TB
a second time), so we cannot update these indexed values.


I was hoping to spend 2-3 days writing a custom query to avoid 2+ months of
indexing everything all over again.



So let me just ask this question: given my current situation, let's say you
had a field with the following value:

/path/to/file/month/day/year/file.txt


I simply want to extract the month/day/year and sort based on that.

My current plan was to convert the month, day, year into seconds from right
now, and return that number. Thus sorting ascending, it should return
newest documents first.
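
Roughly what I have in mind, as an untested sketch (the package and class
names are just placeholders, and given Hoss's point about analyzed TextFields,
strVal() really wants a non-tokenized string copy of resname to read from).
It extends Solr's ValueSourceParser, registered in solrconfig.xml as
<valueSourceParser name="pathage" class="com.example.PathAgeValueSourceParser"/>
and used as sort=pathage(resname) asc:

package com.example;  // placeholder package

import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.LongDocValues;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class PathAgeValueSourceParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // pathage(fieldname) -- wrap whatever value source the argument resolves to
    return new PathAgeValueSource(fp.parseValueSource());
  }
}

class PathAgeValueSource extends ValueSource {
  // matches the .../2013/09/12/... part of the path
  private static final Pattern DATE = Pattern.compile("/(\\d{4})/(\\d{2})/(\\d{2})/");
  private final ValueSource source;

  PathAgeValueSource(ValueSource source) { this.source = source; }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
      throws IOException {
    final FunctionValues vals = source.getValues(context, readerContext);
    final long now = System.currentTimeMillis();
    return new LongDocValues(this) {
      @Override
      public long longVal(int doc) {
        String path = vals.strVal(doc);
        Matcher m = (path == null) ? null : DATE.matcher(path);
        if (m == null || !m.find()) return Long.MAX_VALUE;  // undated docs sort last
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.clear();
        c.set(Integer.parseInt(m.group(1)),
              Integer.parseInt(m.group(2)) - 1,   // Calendar months are 0-based
              Integer.parseInt(m.group(3)));
        // age in seconds: ascending sort puts the newest documents first
        return (now - c.getTimeInMillis()) / 1000L;
      }
    };
  }

  @Override
  public String description() { return "pathage(" + source.description() + ")"; }

  @Override
  public boolean equals(Object o) {
    return o instanceof PathAgeValueSource
        && source.equals(((PathAgeValueSource) o).source);
  }

  @Override
  public int hashCode() { return source.hashCode() ^ PathAgeValueSource.class.hashCode(); }
}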



-JT


On Fri, Oct 18, 2013 at 3:14 PM, Chris Hostetter
wrote:

>
> : Field-Type: org.apache.solr.schema.TextField
> ...
> : DocTermsIndexDocValues<
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-queries/4.3.0/org/apache/lucene/queries/function/docvalues/DocTermsIndexDocValues.java#DocTermsIndexDocValues
> >.
> : Calling "getVal()" on a DocTermsIndexDocValues does some really weird
> stuff
> : that I really don't understand.
>
> Your TextField is being analyzed in some way you haven't clarified, and
> the DocTermsIndexDocValues you get contains the details of each term in
> that TextField
>
> : Its possible I'm going about this wrong and need to re-do my approach.
> I'm
> : just currently at a loss for what that approach is.
>
> Based on your initial goal, you are most certainly going about this in a
> much more complicated way then you need to...
>
> : > > > My goal is to be able to implement a custom sorting technique.
>
> : > > > Example: /some
> : > > > example/data/here/2013/09/12/testing.text
> : > > >
> : > > > I would like to do a custom sort based on this resname field.
> : > > > Basically, I would like to parse out that date there (2013/09/12)
> and
> : > > sort
> : > > > on that date.
>
> You are going to be *MUCH* happier (both in terms of effort, and in terms
> of performance) if instead of writing a custom function to parse strings
> at query time when sorting, you implement the parsing logic when indexing
> the doc and index it up front as a date field that you can sort on.
>
> I would suggest something like CloneFieldUpdateProcessorFactory +
> RegexReplaceProcessorFactory could save you the work of needing to
> implement any custom logic -- but as Jack pointed out in SOLR-4864 it
> doesn't currently allow you to do capture group replacements (but maybe
> you could contribute a patch to fix that instead of needing to write
> completely custom code for yourself)
>
> Of maybe, as is, you could use RegexReplaceProcessorFactory to throw away
> non digits - and then use ParseDateFieldUpdateProcessorFactory to get what
> you want?  (I'm not certain - i haven't played with
> ParseDateFieldUpdateProcessorFactory much)
>
> https://issues.apache.org/jira/browse/SOLR-4864
>
> https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html
>
> https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html
>
> https://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>
>
>
> -Hoss
>


How to extract a field with a prefixed dimension?

2013-10-21 Thread javozzo
Hi,
I'm new to Solr.
I use the content field to extract the text of Solr documents, but this
field is too long. 
Is there a way to extract only a substring of this field?
I make my query in Java as follows:

SolrQuery querySolr = new SolrQuery();
querySolr.setQuery("*:*");
querySolr.setRows(3);
querySolr.setParam("wt", "json");
querySolr.addField("content");
querySolr.addField("title");
querySolr.addField("url");

any ideas?
Thanks
Danilo



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-extract-a-field-with-a-prefixed-dimension-tp4096877.html
Sent from the Solr - User mailing list archive at Nabble.com.


External Zookeeper and JBOSS

2013-10-21 Thread Branham, Jeremy [HR]
When I use the Zookeeper CLI utility, I'm not sure if the configuration is 
uploading correctly.
How can I tell?

This is the command I am issuing -
./zkCli.sh -cmd upconfig -server 127.0.0.1:2181 -confdir 
/data/v8p/solr/root/conf -confname defaultconfig -solrhome /data/v8p/solr

Then checking with this -
[zk: localhost:2181(CONNECTED) 0] ls /
[aliases.json, live_nodes, overseer, overseer_elect, collections, zookeeper, 
clusterstate.json]


But I don't see any config node.

One thing to note - I have multiple cores but the configs are located in a 
common dir.
Maybe that is causing a problem.

solr.xml [simplified by removing additional cores]

[core definitions stripped by the list archiver]

Am I overlooking something obvious?

Thanks!



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham




This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.


RE: External Zookeeper and JBOSS

2013-10-21 Thread Branham, Jeremy [HR]

I've made progress...

Rather than using the zkCli.sh in the ZooKeeper bin folder, I used the Java libs 
from Solr and the config now shows up.




Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham


-Original Message-
From: Branham, Jeremy [HR]
Sent: Monday, October 21, 2013 2:20 PM
To: SOLR User distro (solr-user@lucene.apache.org)
Subject: External Zookeeper and JBOSS

When I use the Zookeeper CLI utility, I'm not sure if the configuration is 
uploading correctly.
How can I tell?

This is the command I am issuing -
./zkCli.sh -cmd upconfig -server 127.0.0.1:2181 -confdir 
/data/v8p/solr/root/conf -confname defaultconfig -solrhome /data/v8p/solr

Then checking with this -
[zk: localhost:2181(CONNECTED) 0] ls /
[aliases.json, live_nodes, overseer, overseer_elect, collections, zookeeper, 
clusterstate.json]


But I don't see any config node.

One thing to note - I have multiple cores but the configs are located in a 
common dir.
Maybe that is causing a problem.

solr.xml [simplified by removing additional cores]

[core definitions stripped by the list archiver]

Am I overlooking something obvious?

Thanks!



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham




This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.



This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.



Re: External Zookeeper and JBOSS

2013-10-21 Thread Shawn Heisey

On 10/21/2013 1:19 PM, Branham, Jeremy [HR] wrote:

solr.xml [simplified by removing additional cores]

[quoted core definitions stripped by the list archiver]

These cores that you have listed here do not look like SolrCloud-related 
cores, because they do not reference a collection or a shard.  Here's 
what I've got on a 4.2.1 box where all cores were automatically created 
by the CREATE action on the collections API:


<core shard="shard1" instanceDir="eatatjoes_shard1_replica2/" transient="false"
  name="eatatjoes_shard1_replica2" config="solrconfig.xml" collection="eatatjoes"/>
<core shard="shard1" instanceDir="test3_shard1_replica1/" transient="false"
  name="test3_shard1_replica1" config="solrconfig.xml" collection="test3"/>
<core shard="shard1" instanceDir="smb2_shard1_replica1/" transient="false"
  name="smb2_shard1_replica1" config="solrconfig.xml" collection="smb2"/>


On the commandline script -- the zkCli.sh script comes with zookeeper, 
but it is not aware of anything having to do with SolrCloud.  There is 
another script named zkcli.sh (note the lowercase C) that comes with the 
solr example (in example/cloud-scripts)- it's a very different script 
and will accept the options that you tried to give.
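
With that script, your upload would look something like this (same paths as 
your original command):

./zkcli.sh -zkhost 127.0.0.1:2181 -cmd upconfig -confdir /data/v8p/solr/root/conf -confname defaultconfig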


I do wonder how much pain would be caused by renaming the Solr zkcli 
script so it's not so similar to the one that comes with Zookeeper.


Thanks,
Shawn



Major GC does not reduce the old gen size

2013-10-21 Thread neoman
Hello everyone,
We are using Solr version 4.4 in production with 4 shards. These are our memory
settings:
-d64 -server -Xms8192m -Xmx12288m -XX:MaxPermSize=256m \
-XX:NewRatio=1 -XX:SurvivorRatio=6 \
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
-XX:CMSIncrementalDutyCycleMin=0 \
-XX:CMSIncrementalDutyCycle=10 -XX:+CMSIncrementalPacing \
-XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC \
-XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC \
-XX:+UseLargePages \
-XX:+UseParNewGC \
-XX:ConcGCThreads=10 \
-XX:ParallelGCThreads=10 \
-XX:MaxGCPauseMillis=3 \
I notice in production that the old generation becomes full and no amount
of garbage collection will free up the memory.
This is similar to the issue discussed in this link: 
http://grokbase.com/t/lucene/solr-user/12bwydq5jr/permanently-full-old-generation
Did anyone have this problem? Can you please point out anything wrong with the
GC configuration?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Major-GC-does-not-reduce-the-old-gen-size-tp4096880.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: measure result set quality

2013-10-21 Thread Alvaro Cabrerizo
Thanks for your valuable answers.

As a first approach I will evaluate (manually :( ) hits that are out of the
intersection set for every query in each system. Anyway I will keep
searching for literature in the field.
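
If the manual review gets too tedious, I may also compute the rank correlation
Furkan suggested (Kendall's tau-a) over the documents that both configurations
return for a query. A rough, untested Java sketch of what I mean (rankA and
rankB map each document id to its position in result list A and B):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RankCorrelation {
  // Kendall's tau-a over the documents present in both rankings.
  public static double kendallTauA(Map<String, Integer> rankA, Map<String, Integer> rankB) {
    List<String> common = new ArrayList<String>(rankA.keySet());
    common.retainAll(rankB.keySet());
    int n = common.size();
    if (n < 2) return 0.0;          // not enough shared documents to compare
    int concordant = 0, discordant = 0;
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        int a = rankA.get(common.get(i)).compareTo(rankA.get(common.get(j)));
        int b = rankB.get(common.get(i)).compareTo(rankB.get(common.get(j)));
        if (a * b > 0) concordant++; else discordant++;   // same vs. opposite order
      }
    }
    return (concordant - discordant) / (n * (n - 1) / 2.0);
  }
}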

Regards.


On Sun, Oct 20, 2013 at 10:55 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> That's exactly what we advocate for in our Solr work. We call in "Test
> Driven Relevancy". We work closely with content experts to help build
> collaboration around search quality. (disclaimer, yes we build a product
> around this) but the advice still stands regardless.
>
>
> http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/
>
> Cheers
> -Doug Turnbull
> Search Relevancy Expert
> OpenSource Connections
>
>
>
>
> On Sun, Oct 20, 2013 at 4:21 PM, Furkan KAMACI  >wrote:
>
> > Let's assume that you have keywords to search and different
> configurations
> > for indexing. A/B testing is one of techniques that you can use as like
> > Erick mentioned.
> >
> > If you want to have an automated comparison and do not have a oracle for
> > A/B testing there is another way. If you have an ideal result list you
> can
> > compare the similarity of your different configuration results and that
> > ideal result list.
> >
> > The "ideal result list" can be created by an expert just for one time. If
> > you are developing a search engine you can search same keywords at that
> one
> > of search engines and you can use that results as ideal result list to
> > measure your result lists' similarities.
> >
> > Kendall's tau is one of the methods to use for such kind of situations.
> If
> > you do not have any document duplication at your index (without any other
> > versions) I suggest to use tau a.
> >
> > If you explain your system and if you explain what is good for you or
> what
> > is ideal for you I can explain you more.
> >
> > Thanks;
> > Furkan KAMACI
> >
> >
> > 2013/10/18 Erick Erickson 
> >
> > > bq: How do you compare the quality of your
> > > search result in order to decide which schema is better?
> > >
> > > Well, that's actually a hard problem. There's the
> > > various TREC data, but that's a generic solution and most
> > > every individual application of this generic thing called
> > > "search" has its own version of "good" results.
> > >
> > > Note that scores are NOT comparable across different
> > > queries even in the same data set, so don't go down that
> > > path.
> > >
> > > I'd fire the question back at you, "Can you define what
> > > good (or better) results are in such a way that you can
> > > program an evaluation?" Often the answer is "no"...
> > >
> > > One common technique is to have knowledgable users
> > > do what's called A/B testing. You fire the query at two
> > > separate Solr instances and display the results side-by-side,
> > > and the user says "A is more relevant", or "B is more
> > > relevant". Kind of like an eye doctor. In sophisticated A/B
> > > testing, the program randomly changes which side the
> > > results go, so you remove "sidedness" bias.
> > >
> > >
> > > FWIW,
> > > Erick
> > >
> > >
> > > On Thu, Oct 17, 2013 at 11:28 AM, Alvaro Cabrerizo  > > >wrote:
> > >
> > > > Hi,
> > > >
> > > > Imagine the next situation. You have a corpus of documents and a list
> > of
> > > > queries extracted from production environment. The corpus haven't
> been
> > > > manually annotated with relvant/non relevant tags for every query.
> Then
> > > you
> > > > configure various solr instances changing the schema (adding
> synonyms,
> > > > stopwords...). After indexing, you prepare and execute the test over
> > > > different schema configurations.  How do you compare the quality of
> > your
> > > > search result in order to decide which schema is better?
> > > >
> > > > Regards.
> > > >
> > >
> >
>
>
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 
>


Re: Exact Match Results

2013-10-21 Thread Developer
For an exact phrase match you can wrap the query in quotes, but this will
perform only the exact match and it won't match other results.

The query below will match only: Okkadu telugu movie stills

http://localhost:8983/solr/core1/select?q=%22okkadu%20telugu%20movie%20stills%22

Since you are using the Edge N-Gram filter, it produces many tokens (as
shown below), so you might not get the desired output. You can try using the
shingle filter factory with the standard analyzer instead of the edge n-gram
filter.

o            [6f]                              0  26  1  1  word
ok           [6f 6b]                           0  26  1  1  word
okk          [6f 6b 6b]                        0  26  1  1  word
okka         [6f 6b 6b 61]                     0  26  1  1  word
okkad        [6f 6b 6b 61 64]                  0  26  1  1  word
okkadu       [6f 6b 6b 61 64 75]               0  26  1  1  word
okkadu       [6f 6b 6b 61 64 75 20]            0  26  1  1  word
okkadu t     [6f 6b 6b 61 64 75 20 74]         0  26  1  1  word
okkadu te    [6f 6b 6b 61 64 75 20 74 65]      0  26  1  1  word
okkadu tel   [6f 6b 6b 61 64 75 20 74 65 6c]   0  26  1  1  word





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816p4096906.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Seeking New Moderators for solr-user@lucene

2013-10-21 Thread Andrew Psaltis
Hey Hoss,
I would be interested in being a moderator.

Thanks,
Andrew


On Sun, Oct 20, 2013 at 7:09 AM, Jeevanandam M.  wrote:

> Hello Hoss -
>
> My pleasure, kindly accept my moderator nomination.
>
> Regards,
> Jeeva
>
> -- Original Message --
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: October 19, 2013 12:22:34 AM GMT+05:30
> To: solr-user@lucene.apache.org
> Subject: Seeking New Moderators for solr-user@lucene
>
>
>
> It looks like it's time to inject some fresh blood into the
> solr-user@lucene moderation team.
>
> If you'd like to volunteer to be a moderator, please reply back to this
> thread and specify which email address you'd like to use as a moderator (if
> different from the one you use when sending the email)
>
> Being a moderator is really easy: you'll get a some extra emails in your
> inbox with MODERATE in the subject, which you skim to see if they are spam
> -- if they are you delete them, if not you "reply all" to let them get sent
> to the list, and authorize that person to send future messages w/o
> moderation.
>
> Occasionally, you'll see an explicit email to solr-user-owner@lucene from
> a user asking for help realted to their subscription (usually unsubscribing
> problems) and you and the other moderators chime in with assistance when
> possible.
>
> More details can be found here...
>
> https://wiki.apache.org/solr/MailingListModeratorInfo
>
> (I'll wait ~72+ hours to see who responds, and then file the appropriate
> jira with INFRA)
>
>
> -Hoss
>
>


Re: How to extract a field with a prefixed dimension?

2013-10-21 Thread Upayavira
Not too sure what you're asking. Are you saying that you want to only
return a relevant part of a field in search results - i.e. a contextual
snippet?

If so, then you should look at the highlighting component, which can do
this.

http://wiki.apache.org/solr/HighlightingParameters
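
From SolrJ, building on your snippet, something like this (an untested sketch;
it assumes "content" is stored, "server" is your HttpSolrServer, and the
hl.alternateField parameters give you a plain truncated fallback for documents
where nothing matched):

SolrQuery querySolr = new SolrQuery("content:solr");  // needs real query terms to highlight
querySolr.setFields("title", "url");                  // don't pull back the whole content field
querySolr.setHighlight(true);
querySolr.addHighlightField("content");
querySolr.setHighlightSnippets(1);
querySolr.setHighlightFragsize(200);                  // ~200-character snippets
querySolr.set("hl.alternateField", "content");
querySolr.set("hl.maxAlternateFieldLength", 200);
QueryResponse response = server.query(querySolr);
// snippets are keyed by document id, then by field name
Map<String, Map<String, List<String>>> snippets = response.getHighlighting();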

Upayavira

On Mon, Oct 21, 2013, at 07:57 PM, javozzo wrote:
> Hi,
> i'm new in solr.
> i use the content field to extract the text of solr documents, but this
> field is too long. 
> Is there a way to extract only a substring of this field?
> i make my query in java as follow:
> 
> SolrQuery querySolr = new SolrQuery();
> querySolr.setQuery("*:*");
> querySolr.setRows(3);
> querySolr.setParam("wt", "json");
> querySolr.addField("content");
> querySolr.addField("title");
> querySolr.addField("url");
> 
> any ideas?
> Thanks
> Danilo
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-extract-a-field-with-a-prefixed-dimension-tp4096877.html
> Sent from the Solr - User mailing list archive at Nabble.com.