Re: TIKA OCR not working

2015-04-24 Thread trung.ht
Hi everyone, does anyone have the answer to this problem? :) I saw the Tika documentation. Tika 1.7 supports OCR and Solr 5.0 uses Tika 1.7, > but it looks like it does not work. Does anyone know whether Tika OCR works > automatically with Solr, or do I have to change some settings? > >> Trung. > It's n

Using SolrJ to access schema.xml

2015-04-24 Thread Steven White
Hi Everyone, Per this link https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ListFieldTypes Solr supports a REST Schema API to modify the schema. I looked at http://lucene.apache.org/solr/4_2_1/solr-solrj/index.html?overview-summary.html in the hope that SolrJ has a Java API to allow sc

Re: payload similarity

2015-04-24 Thread Erick Erickson
I put up a complete example not too long ago that may help, see: http://lucidworks.com/blog/end-to-end-payload-example-in-solr/ Best, Erick On Fri, Apr 24, 2015 at 6:33 AM, Dmitry Kan wrote: > Ahmet, exactly. As I have just illustrated with code, simultaneously with > your reply. Thanks! > > On

Re: require diversity in results?

2015-04-24 Thread Erick Erickson
Often, for small numbers of distinct types, people use grouping and have the app layer mingle them or whatever is pleasing. I think this is different from the post-processing you mention. Grouping (aka "field collapsing") can be expensive if there are a large number of groups but for small numbers it's
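To make "have the app layer mingle them" concrete, here is a minimal sketch, assuming the grouped response has already been parsed into a dict that maps each group value to its documents in relevance order. The group names, document ids and the `mingle` helper are all hypothetical; this is one possible interleaving, not something Solr provides.

```python
from itertools import zip_longest


def mingle(groups, limit):
    """Round-robin across groups: best doc of every group first,
    then the second best of every group, and so on."""
    skip = object()  # sentinel for groups that have run out of docs
    merged = []
    for tier in zip_longest(*groups.values(), fillvalue=skip):
        merged.extend(doc for doc in tier if doc is not skip)
    return merged[:limit]


# Hypothetical grouped results, each list already sorted by relevance.
grouped = {
    "article": ["a1", "a2", "a3"],
    "video": ["v1"],
    "forum": ["f1", "f2"],
}
page = mingle(grouped, limit=5)
```

Round-robin guarantees that every group with at least one hit shows up on the first page, at the cost of no longer being strictly sorted by score.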

Re: Odp.: solr issue with pdf forms

2015-04-24 Thread Erick Erickson
Steve: Right, it's not exactly obvious. Bring up the admin UI, something like http://localhost:8983/solr. From there you have to select a core in the 'core selector' drop-down on the left side. If you're using SolrCloud, this will have a rather strange name, but it should be easy to identify what

Re: AW: o.a.s.c.SolrException: missing content stream

2015-04-24 Thread Chris Hostetter
: Another question I have though (which fits the subject even better): : In the log I see many : org.apache.solr.common.SolrException: missing content stream ... : What are possible reasons for this? The possible and likely reasons are that you sent an "update" request w/o any ContentStr

Re: and stopword in user query is being change to q.op=AND

2015-04-24 Thread Chris Hostetter
: My understanding was that stopwords are filtered even before being : parsed by the search handler. I do have the filter in the collection schema to : filter stopwords, and the analysis shows that this stopword is filtered Generally speaking, your understanding of the order of operations for query

Re: and stopword in user query is being change to q.op=AND

2015-04-24 Thread Shawn Heisey
On 4/24/2015 10:55 AM, Rajesh Hazari wrote: > My understanding was that stopwords are filtered even before > being parsed by the search handler. I do have the filter in the collection > schema to filter stopwords, and the analysis shows that this stopword > is filtered > > Analysis response : attached

Re: and stopword in user query is being change to q.op=AND

2015-04-24 Thread Rajesh Hazari
My understanding was that stopwords are filtered even before being parsed by the search handler. I do have the filter in the collection schema to filter stopwords, and the analysis shows that this stopword is filtered. Analysis response: attached is the Solr analysis JSON response. [image: Inline im

Re: Checking of Solr Memory and Disk usage

2015-04-24 Thread Zheng Lin Edwin Yeo
Meaning this was working fine until Solr 5.0.0? I'm quite new to Solr and I only started to use it when Solr 5.0.0 was released. Regards, Edwin On 24 April 2015 at 18:20, Tom Evans wrote: > On Fri, Apr 24, 2015 at 8:31 AM, Zheng Lin Edwin Yeo > wrote: > > Hi, > > > > So does anyone know what i

RE: Remote connection to Solr

2015-04-24 Thread Garth Grimm
Shawn's explanation fits better with why WebSphere and Jetty might behave differently. But another possibility is that the DHCP negotiation causes the IP address to change from one network to another and back. -Original Message- From: Steven White [mailto:swhite4

Re: Remote connection to Solr

2015-04-24 Thread Steven White
Hi Shawn, The firewall was the first thing I looked into and after fiddling with it, I still see the issue. But if that was the issue, why does WebSphere not run into it while Jetty does? However, your point about domain / non domain and private / public network may give me a new area to

Re: ArrayIndexOutOfBoundsException in RecordingJSONParser.java

2015-04-24 Thread Scott Dawson
Ticket opened: https://issues.apache.org/jira/i#browse/SOLR-7462 Thanks, Scott On Fri, Apr 24, 2015 at 9:38 AM, Shawn Heisey wrote: > On 4/24/2015 7:16 AM, Scott Dawson wrote: > > Should I create a JIRA ticket? (Am I allowed to?) I can provide more > info > > about my particular usage includin

Re: Remote connection to Solr

2015-04-24 Thread Shawn Heisey
On 4/24/2015 8:03 AM, Steven White wrote: > This may be a Jetty question but let me start here first. > > I have Solr running on my laptop and from my desktop I have no issue > accessing it. However, if I take my laptop home and connect it to my home > network, the next day when I connect the lapt

Remote connection to Solr

2015-04-24 Thread Steven White
Hi Everyone, This may be a Jetty question but let me start here first. I have Solr running on my laptop and from my desktop I have no issue accessing it. However, if I take my laptop home and connect it to my home network, the next day when I connect the laptop to my office network, I no longer c

Re: ArrayIndexOutOfBoundsException in RecordingJSONParser.java

2015-04-24 Thread Shawn Heisey
On 4/24/2015 7:16 AM, Scott Dawson wrote: > Should I create a JIRA ticket? (Am I allowed to?) I can provide more info > about my particular usage including a stacktrace if that's helpful. I'm > using the new custom JSON indexing, which, by the way, is an excellent > feature and will be of great be

Re: payload similarity

2015-04-24 Thread Dmitry Kan
Ahmet, exactly. As I have just illustrated with code, simultaneously with your reply. Thanks! On Fri, Apr 24, 2015 at 4:30 PM, Ahmet Arslan wrote: > Hi Dmitry, > > I think, it is activated by PayloadTermQuery. > > Ahmet > > > > On Friday, April 24, 2015 2:51 PM, Dmitry Kan > wrote: > Hi, > > >

Re: payload similarity

2015-04-24 Thread Dmitry Kan
Answering my own question: in order to account for payloads, PayloadTermQuery should be used instead of TermQuery: PayloadTermQuery payloadTermQuery = new PayloadTermQuery(new Term("body", "dogs"), new MaxPayloadFunction()); Then in the query explanation we get: --- Results for body:dog

Re: payload similarity

2015-04-24 Thread Ahmet Arslan
Hi Dmitry, I think it is activated by PayloadTermQuery. Ahmet On Friday, April 24, 2015 2:51 PM, Dmitry Kan wrote: Hi, Using the approach here http://lucidworks.com/blog/getting-started-with-payloads/ I have implemented my own PayloadSimilarity class. When debugging the code I have noticed

ArrayIndexOutOfBoundsException in RecordingJSONParser.java

2015-04-24 Thread Scott Dawson
Hello, I'm running Solr 5.1 and during indexing I get an ArrayIndexOutOfBoundsException at line 61 of org/apache/solr/util/RecordingJSONParser.java. Looking at the code (see below), it seems obvious that the if-statement at line 60 should use a greater-than sign instead of greater-than-or-equals.

Re: SolrCloud to exclude xslt files in conf from zookeeper

2015-04-24 Thread Shawn Heisey
On 4/24/2015 4:54 AM, Kumaradas Puthussery Krishnadas wrote: > I am creating a SolrCloud with 4 solr instances and 5 zookeeper instances. I > need to make sure that querying is working even when my 3 zookeepers are > down. But it looks like the queries using json transformation based xslt > tem
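For the arithmetic behind the question: ZooKeeper only stays available while a strict majority of the ensemble is up. A minimal sketch of that rule follows; the function names are illustrative and are not part of any ZooKeeper or Solr API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, servers_down):
    """True while enough servers remain up to form a majority."""
    return ensemble_size - servers_down >= quorum_size(ensemble_size)


# The setup from the question: 5 ZooKeeper instances.
needed = quorum_size(5)
survives_two_down = has_quorum(5, 2)
survives_three_down = has_quorum(5, 3)
```

With 5 instances the quorum is 3, so the ensemble tolerates 2 failures; with 3 down, only 2 remain and there is no quorum.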

AW: o.a.s.c.SolrException: missing content stream

2015-04-24 Thread Clemens Wyss DEV
Stupid me (yet again): Should have taken a TEXT instead of (only) a STRING field for the content ;) Another question I have though (which fits the subject even better): In the log I see many org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.ContentSt

Re: Simple search low speed

2015-04-24 Thread Joel Bernstein
Try breaking down the query to see which part of it is slow. If it turns out to be the range query you may want to look into using an frange postfilter. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Apr 24, 2015 at 6:50 AM, Norgorn wrote: > Thanks for your reply. > > Yes, 100% CPU is use
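As a rough sketch of the frange suggestion: as far as I know, a filter is run as a post filter when it carries `cache=false` and a `cost` of 100 or more. The snippet below only builds the request parameters; the field name `cat` is taken from the original query, while the bounds 10 and 20 and the `frange_postfilter` helper are made up for illustration.

```python
from urllib.parse import urlencode


def frange_postfilter(field, lower, upper, cost=200):
    """Build an fq value that asks Solr to evaluate the range as a post filter.

    cache=false together with cost >= 100 is what requests post-filter
    execution, so the range is checked only for docs matching the main query.
    """
    return "{!frange l=%s u=%s cache=false cost=%d}%s" % (lower, upper, cost, field)


# Hypothetical bounds; replace with the real range from your fq.
params = [
    ("q", "text:(word1 word2)"),
    ("fq", frange_postfilter("cat", 10, 20)),
]
query_string = urlencode(params)
```

To find the slow part, it may help to run the query three ways (without the fq, with the plain range fq, and with the frange variant) and compare QTime.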

o.a.s.c.SolrException: missing content stream

2015-04-24 Thread Clemens Wyss DEV
Context: Solr/Lucene 5.1. Adding documents to a Solr core/index through SolrJ. I extract PDFs using Tika. The PDF content is one of the fields of my SolrDocuments that are transmitted to Solr using SolrJ. As not all documents seem to be "coming through" I looked into the Solr logs and see the follw

payload similarity

2015-04-24 Thread Dmitry Kan
Hi, Using the approach here http://lucidworks.com/blog/getting-started-with-payloads/ I have implemented my own PayloadSimilarity class. When debugging the code I have noticed that the scorePayload method is never called. What could be wrong? [code] class PayloadSimilarity extends DefaultSimi

SolrCloud to exclude xslt files in conf from zookeeper

2015-04-24 Thread Kumaradas Puthussery Krishnadas
I am creating a SolrCloud with 4 solr instances and 5 zookeeper instances. I need to make sure that querying is working even when my 3 zookeepers are down. But it looks like the queries using json transformation based xslt templates which is not available since the zookeeper ensemble is not ava

Re: Simple search low speed

2015-04-24 Thread Norgorn
Thanks for your reply. Yes, 100% CPU is used by SOLR (by 100% I mean 1 core, not all cores), I'm totally sure. I have more than 80 GB RAM on the test machine and about 50 is used as disk cache; SOLR uses about 8, Xmx=40G. I use GC1, but it can't be the problem, because memory usage is much lower than

Re: Simple search low speed

2015-04-24 Thread Tomasz Borek
Java side: - launch jvisualvm - see how heap and CPU are occupied What are your JVM settings (heap) and how much RAM do you have? Is the 100% CPU used only by Solr? That is, are you 100% certain it's Solr that drives the CPU to its limit? Regards, LAFK 2015-04-24 12:14 GMT+02:00 Norgorn : > The

Re: Checking of Solr Memory and Disk usage

2015-04-24 Thread Tom Evans
On Fri, Apr 24, 2015 at 8:31 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > So does anyone know what the issue is with the "Heap Memory Usage" reading > showing the value -1? Should I open an issue in Jira? I have Solr 4.8.1 and Solr 5.0.0 servers; on the Solr 4.8.1 servers the core statistics have val

Re: Simple search low speed

2015-04-24 Thread Norgorn
The number of documents in collection is about 100m. -- View this message in context: http://lucene.472066.n3.nabble.com/Simple-search-low-speed-tp4202135p4202152.html Sent from the Solr - User mailing list archive at Nabble.com.

require diversity in results?

2015-04-24 Thread Paul Libbrecht
Hello list, I'm wondering if there could be extra parameters or query operators with which I could require that sorting by relevance be relaxed so that there's a minimum diversity in some fields in the first page of results. For example, I'd like the search results to contain at least three po

AW: Odp.: solr issue with pdf forms

2015-04-24 Thread Steve.Scholl
Hey Erick, thanks a lot for your answer. I went to the admin schema browser, but what should I see there? Sorry, I'm not familiar with the admin schema browser. :-( Best Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, 23 April 201

Simple search low speed

2015-04-24 Thread Norgorn
We have a simple search over a 50 GB index, and it's slow. I can't work out why: the whole index is in RAM (and a lot of free space is available) and CPU is the bottleneck (100% load). The query is simple (except tvrh): q=(text:(word1+word2)++title:(word1+word2))&tv=true&isShard=true&qt=/tvrh&fq=cat:(10

Re: Grouping Performance Optimation

2015-04-24 Thread Norgorn
If you need only 200 results grouped, you can easily do it with some external code; it will be much faster anyway. Also, it's widely suggested to use docValues="true" for the fields you group by; it really helps (I can only give numbers in terms of RAM usage, but speed increases as well).
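The "external code" for a small result set can be very short. Here is a minimal sketch, assuming the top results were fetched ungrouped and parsed into a list of dicts in relevance order. The field name `category`, the sample documents and the `group_docs` helper are hypothetical.

```python
from collections import OrderedDict


def group_docs(docs, field, per_group=3):
    """Group already-ranked docs by a field, keeping at most per_group docs
    per group and preserving the order in which groups first appear."""
    groups = OrderedDict()
    for doc in docs:
        bucket = groups.setdefault(doc.get(field), [])
        if len(bucket) < per_group:
            bucket.append(doc)
    return groups


# Hypothetical top results, already sorted by score.
docs = [
    {"id": "1", "category": "a"},
    {"id": "2", "category": "b"},
    {"id": "3", "category": "a"},
]
grouped = group_docs(docs, "category")
```

Because the input is already sorted by score, groups come out ordered by their best document, which is roughly what the default grouping order would give.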

Re: Checking of Solr Memory and Disk usage

2015-04-24 Thread Zheng Lin Edwin Yeo
Hi, So does anyone know what the issue is with the "Heap Memory Usage" reading showing the value -1? Should I open an issue in Jira? Regards, Edwin On 22 April 2015 at 21:23, Zheng Lin Edwin Yeo wrote: > I see. I'm running on SolrCloud with 2 replicas, so I guess mine will > probably use much