Validating idea of architecture for RDB / Import / Solr

2015-10-19 Thread hangu choi
Hi, I am a newbie to Solr and I hope someone can help me check whether my idea is good or terrible. # background * I have MySQL as my primary data storage. * I want to import data from MySQL to Solr (SolrCloud). * I have domain logic to make Solr documents - (meaning I can't make solr documen

[newbie] Configuration for SolrCloud + DataImportHandler

2015-10-19 Thread hangu choi
Hi, I am trying to start SolrCloud with embedded ZooKeeper. I know how to configure solrconfig.xml, schema.xml, and the other things for the data import handler, but when I try to configure it with SolrCloud, I don't know where to start. I know there is no conf directory in SolrCloud because conf direct

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Zheng Lin Edwin Yeo
Hi Scott, Here's my schema.xml for content and title, which uses text_chinese. The problem only occurs in content, and not in title. Here's my solrconfig.xml on the highlighting portion: explicit 10 json

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Scott Chu
Hi Edwin, I didn't use Jieba on Chinese (I use only CJK, very fundamental, I know) so I didn't experience this problem. I'd suggest you post your schema.xml so we can see how you define your content field and the field type it uses. In the meantime, refer to these articles, maybe the answer

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Zheng Lin Edwin Yeo
Hi Scott, Thank you for your reply. I've tried to set that and also tried changing to Fast Vector Highlighter, but it isn't working either. I got the same highlighting results as previously. Regards, Edwin On 19 October 2015 at 23:56, Scott Stults wrote: > Edwin, > > Try setting hl.bs.langu

Re: SolrCloud - Replica is showen as "Recovery-Failed"

2015-10-19 Thread Shawn Heisey
On 10/19/2015 11:56 AM, Jae Joo wrote: > Found the root cause. I disabled the transaction log. SolrCloud requires the transaction log for proper operation. Disabling it might cause all sorts of future problems, including problems with data replication and recovery. Because it's so critical, ther

Re: PayloadTermQuery deprecated

2015-10-19 Thread William Bell
Alan, Does this code look equivalent? And how do I change PayloadScoreQuery to use a custom Similarity? PayloadScoreQuery psq = new PayloadScoreQuery(sq, new AveragePayloadFunction()); @Override public Query parse() throws SyntaxError { if (qstr == null || qstr.length() == 0) return null;

auto deploument/setup of Solr & Zookeeper on medium-large clusters

2015-10-19 Thread Susheel Kumar
Hi, I am trying to find the best practices for setting up Solr on 20+ new machines & ZK (5+) and repeating the same on other environments. What's the best way to download, extract, and set up Solr & ZK in an automated way along with other dependencies like Java etc.? Among shell scripts or puppet or dock

Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-19 Thread Alexandre Rafalovitch
This sounds like an attempt to create an auto-complete using n-grams in text. In which case, Ted Sullivan's writing might be of relevance: http://lucidworks.com/blog/author/tedsullivan/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.c

Re: SolrCloud - Replica is showen as "Recovery-Failed"

2015-10-19 Thread Jae Joo
Found the root cause. I disabled the transaction log. Thanks, On Mon, Oct 19, 2015 at 1:07 PM, Jae Joo wrote: > Solr Version: 5.3 > > I just built the SolrCloud with 5 shards and a replication factor of 3 on 15 > nodes. It means that each shard and replica runs on its own server. > > When I

Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-19 Thread Steve Rowe
Hi Vitaliy, I don’t know of any combination of built-in Lucene/Solr analysis components that would do what you want, but there used to be a filter called ShingleMatrixFilter that (if I understand both that filter and what you want correctly) would do what you want, following an EdgeNGramFilter:

SolrCloud - Replica is showen as "Recovery-Failed"

2015-10-19 Thread Jae Joo
Solr Version: 5.3. I just built the SolrCloud with 5 shards and a replication factor of 3 on 15 nodes. It means that each shard and replica runs on its own server. When I look at the Cloud page, I see that the status of a replica is "recovery-failed". For testing, I took down the leader, but a replica

Re: RequestProcessor with IndexSearcher for Different Core

2015-10-19 Thread Kilian Woods
Hi Mikhail, Thank you very much, that looks very helpful indeed. Kilian. On 19 October 2015 at 15:48, Mikhail Khludnev wrote: > Assuming you need to access a sibling core from UpdateRequestHandler, you can > see how it's done in the cross-core join > > https://github.com/apache/lucene-solr/blob/trunk/sol

Re: Autostart Zookeeper and Solr using scripting

2015-10-19 Thread Scott Stults
Hi Adrian, I'd probably start with the expect command and "echo ruok | nc " for a simple script. You might also want to try the Netflix Exhibitor REST interface: https://github.com/Netflix/exhibitor/wiki/REST-Cluster k/r, Scott On Thu, Oct 15, 2015 at 2:01 AM, Adrian Liew wrote: > Hi, > > I
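The "echo ruok | nc" check suggested above can also be done from Java, which may be handy if the startup script already runs on the JVM. The following is only a minimal sketch: the class name ZkHealthCheck, the default host/port (localhost:2181) and the 2-second timeout are assumptions for illustration, not anything from the thread. Note also that newer ZooKeeper releases may require "ruok" to be explicitly whitelisted before the server answers it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends ZooKeeper's "ruok" four-letter command over a
// plain TCP socket, the same thing "echo ruok | nc host port" does.
class ZkHealthCheck {

    /** Returns true only if the server replies exactly "imok". */
    static boolean isOk(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // bound the wait for a reply
            socket.getOutputStream().write("ruok".getBytes(StandardCharsets.US_ASCII));
            socket.getOutputStream().flush();
            // ZooKeeper closes the connection after answering, so read until EOF.
            byte[] reply = socket.getInputStream().readAllBytes();
            return "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connection refused, timeout, etc. all count as "not ok".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass host and port as arguments to override.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        boolean ok = isOk(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " imok" : " not ok"));
        System.exit(ok ? 0 : 1); // non-zero exit lets a wrapper script retry
    }
}
```

A wrapper script could loop on the exit code until ZooKeeper answers and only then start Solr.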

Re: Highlighting content field problem when using JiebaTokenizerFactory

2015-10-19 Thread Scott Stults
Edwin, Try setting hl.bs.language and hl.bs.country in your request or requestHandler: https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter#FastVectorHighlighter-UsingBoundaryScannerswiththeFastVectorHighlighter -Scott On Tue, Oct 13, 2015 at 5:04 AM, Zheng Lin Edwin Yeo wr

Re: Anyone users IBM J9 JVM with 32G max heap ? Tuning recommendations?

2015-10-19 Thread Toke Eskildsen
Jeff Wu wrote: > By staying with IBM JVM, anyone has recommendations on this ? The general > average heap usage in our solr server is around 26G so we'd like to stay > with 32G max heap, but want to better tune the JVM to have less global gc > pause. I am not sure if the IBM JVM works the same as

Re: Configuration

2015-10-19 Thread Alexandre Rafalovitch
Sounds like a mission impossible given the number of inner joins. However, what are you _actually_ trying to do? Are you trying to reindex the data? Do you actually have the data to reindex? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-sta

Re: RequestProcessor with IndexSearcher for Different Core

2015-10-19 Thread Mikhail Khludnev
Assuming you need to access a sibling core from UpdateRequestHandler, you can see how it's done in the cross-core join: https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/join/ScoreJoinQParserPlugin.java#L256 Don't forget to close all resources. On Mon, Oct 19, 2015 a

RequestProcessor with IndexSearcher for Different Core

2015-10-19 Thread Kilian Woods
Hi All, I am looking to solve a design problem of mine. I want to create a SolrIndexSearcher for a *different* core inside a RequestProcessor. I know how to create a SolrIndexSearcher from the SolrQueryRequest but I want to search a different core instead. I want to update the current document w

Re: Anyone users IBM J9 JVM with 32G max heap ? Tuning recommendations?

2015-10-19 Thread Pushkar Raste
Do you have GC logging turned on? If so, can you provide an excerpt from the GC log for a pause that took > 30 sec? On 19 October 2015 at 04:16, Jeff Wu wrote: > Hi all, > > we are using solr4.7 on top of IBM JVM J9 Java7, max heap to 32G, system > RAM 64G. > > JVM parameters: -Xgcpolicy:balanced -ve

Re: Nested entities not imported / do not show up in search?

2015-10-19 Thread Mikhail Khludnev
On Mon, Oct 19, 2015 at 2:48 AM, Matthias Fischer < matthias.fisc...@doubleslash.de> wrote: > Ok, thanks for your advice so far. I can import companies with their > nested entities (business branches) now. But I wonder whether there is a > way to query for company name patterns and get the busines

Configuration

2015-10-19 Thread fabigol
Hi, I inherited an old Solr project that I would like to configure. I have the XML file for each entity, but I don't have the database. Is there a way to find the table schema? Do tools exist to generate the tables from the XML files? Here is a file:

Re: File-based Spelling

2015-10-19 Thread Mark Fenbers
OK. I removed it, started Solr, and refreshed the query, but my results are the same, indicating that queryAnalyzerFieldType has nothing to do with my problem. New ideas?? Mark On 10/19/2015 4:37 AM, Duck Geraint (ext) GBJH wrote: "Yet, it claimed it found my misspelled word to be "fenber" w

AW: AW: Nested entities not imported / do not show up in search?

2015-10-19 Thread Matthias Fischer
Thanks, Andrea, your answer does make sense! Obviously as a SOLR newbie I am still thinking too much in terms of traditional databases ;-) Kind regards Matthias -Original Message- From: Andrea Gazzarini [mailto:a.gazzar...@gmail.com] Sent: Monday, 19 October 2015 12:05 To: so

Re: AW: Nested entities not imported / do not show up in search?

2015-10-19 Thread Andrea Gazzarini
Most probably my answer makes no sense because I don't know the overall context, but why don't you import flat branches and companies with a "type" attribute ("company" or "branch") and an "owner" field that is populated, with the company id, only for branches? Then you could autocomplete on the

Re: Recursively scan documents for indexing in a folder in SolrJ

2015-10-19 Thread Zheng Lin Edwin Yeo
Yes, I've managed to "steal" some code from post.jar to send only rich-text document formats to /update/extract. I've also changed the Eclipse setting at Window -> Preferences -> General -> Workspace: under Text file encoding, select Other, and choose UTF-8. Eclipse is now able to read
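For readers following this thread, the "only send rich-text formats" step can be sketched with plain java.nio, independent of SolrJ. This is not the code from post.jar: the class name RichDocumentScanner and the extension allow-list below are assumptions chosen for illustration, and the actual posting to /update/extract is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: walks a folder recursively and keeps only files whose
// extension is on an allow-list, so other files are never sent for extraction.
class RichDocumentScanner {

    // Assumed allow-list for illustration; adjust to the formats you index.
    static final Set<String> EXTENSIONS =
            Set.of("pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx", "rtf", "odt");

    /** Lower-cased extension without the dot, or "" if the name has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns all regular files under root whose extension is allowed. */
    static List<Path> findIndexable(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // the stream must be closed
            return paths.filter(Files::isRegularFile)
                        .filter(p -> EXTENSIONS.contains(extensionOf(p)))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : findIndexable(root)) {
            System.out.println(p); // each of these would be posted to /update/extract
        }
    }
}
```

Filtering by extension is the simplest option; detecting the real content type would be more robust but costs an extra read per file.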

AW: Nested entities not imported / do not show up in search?

2015-10-19 Thread Matthias Fischer
Ok, thanks for your advice so far. I can import companies with their nested entities (business branches) now. But I wonder whether there is a way to query for company name patterns and get the business branches nested inside the respective companies. Using the following query I only get the comp

Re: PayloadTermQuery deprecated

2015-10-19 Thread Alan Woodward
I opened https://issues.apache.org/jira/browse/LUCENE-6844 Alan Woodward www.flax.co.uk On 19 Oct 2015, at 08:49, Alan Woodward wrote: > Hi Bill, > > This looks like an oversight on my part when migrating the payload scoring > queries - can you open a JIRA ticket to add 'includeSpanScore' as

RE: File-based Spelling

2015-10-19 Thread Duck Geraint (ext) GBJH
"Yet, it claimed it found my misspelled word to be "fenber" without the "s"" I wonder if this is because you seem to be applying a stemmer to your dictionary words. Try removing the "text_en" line from your spellcheck search component definition. Geraint Geraint Duck Data Scientist Toxicology an

RE: Recursively scan documents for indexing in a folder in SolrJ

2015-10-19 Thread Duck Geraint (ext) GBJH
"The problem for this is that it is indexing all the files regardless of the formats, instead of just those formats in post.jar. So I guess still have to "steal" some codes from there to detect the file format?" If you've not worked it out yourself yet, try something like: http://docs.oracle.com

Anyone users IBM J9 JVM with 32G max heap ? Tuning recommendations?

2015-10-19 Thread Jeff Wu
Hi all, we are using solr4.7 on top of IBM JVM J9 Java7, max heap to 32G, system RAM 64G. JVM parameters: -Xgcpolicy:balanced -verbose:gc -Xms12228m -Xmx32768m -XX:PermSize=128m -XX:MaxPermSize=512m We faced one issue here: we set zkClient timeout value to 30 seconds. By using the balanced GC po

Re: Problem with indexing chinese characters when using SolrJ

2015-10-19 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thank you for the explanation. Regards, Edwin On 19 October 2015 at 15:58, Shawn Heisey wrote: > On 10/19/2015 12:18 AM, Zheng Lin Edwin Yeo wrote: > > I found that it works if I put the code in this way without the > URLEncoder > > > > req.setParam("literal.title", filename); > > >

solr4.7: truncated log output in grouping.CommandHandler?

2015-10-19 Thread Jeff Wu
Our Solr 4.7 server recently reported the WARN message below, followed by a long GC pause. Sometimes this forces the Solr server to disconnect from the ZK server. Solr 4.7.0, got this warning message: WARN - 2015-10-19 02:23:24.503; org.apache.solr.search.grouping.CommandHandler; Query: +(+owner

Re: Problem with indexing chinese characters when using SolrJ

2015-10-19 Thread Shawn Heisey
On 10/19/2015 12:18 AM, Zheng Lin Edwin Yeo wrote: > I found that it works if I put the code in this way without the URLEncoder > > req.setParam("literal.title", filename); > > Is the URLEncoder doing the encoding from the chinese characters to the > string > of code like this "%E7%AB%8B%E9".?
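The question above can be checked with the JDK alone. The sketch below shows that URLEncoder turns a Chinese character into the "%E7%AB%8B" style of string, and what happens if such a value is then encoded a second time. That the client library encodes parameters itself (which would explain why the un-encoded setParam call works) is an assumption here, since the reply is cut off; the class name TitleEncodingDemo is made up for the example.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: shows single vs. double percent-encoding of a title.
class TitleEncodingDemo {

    /** Percent-encodes the UTF-8 bytes of the value, as URLEncoder does. */
    static String encodeOnce(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Simulates pre-encoding a value that the client then encodes again. */
    static String encodeTwice(String value) {
        return encodeOnce(encodeOnce(value));
    }

    public static void main(String[] args) {
        String title = "\u7ACB"; // the character 立 (UTF-8 bytes E7 AB 8B)

        System.out.println(encodeOnce(title));  // prints %E7%AB%8B
        System.out.println(encodeTwice(title)); // prints %25E7%25AB%258B

        // A server that decodes once would see the literal text %E7%AB%8B
        // instead of the original character.
        System.out.println(URLDecoder.decode(encodeTwice(title), StandardCharsets.UTF_8));
    }
}
```

If the stored title shows up as percent codes rather than Chinese characters, double encoding of this kind is one likely cause.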

Re: PayloadTermQuery deprecated

2015-10-19 Thread Alan Woodward
Hi Bill, This looks like an oversight on my part when migrating the payload scoring queries - can you open a JIRA ticket to add 'includeSpanScore' as an option to PayloadScoreQuery? As a workaround, you should be able to use a custom similarity that returns 1 for all scores (see IndexSearcher.

Re: SOLR-7191 SolrCloud 5 with thousands of collections

2015-10-19 Thread Damien Kamerman
OK, it turned out ZkStateReader.constructState() was only calling ClusterState.getCollections() for log.debug(). I removed that and the next bottleneck is talking to ZkStateReader.fetchCollectionState. "coreZkRegister-4-thread-14-processing-n:ftet1:8003_solr x:t_1558_shard1_replica1 s:shard1 c:t_1558