Hi
I am using the MapReduceIndexerTool to index data from HDFS, using morphlines
as the ETL tool. I specify the data paths as XPath expressions in the
morphline file.
Sorry for the delay.
--
Sen
hello all,
The site I'm working on has to support the Vietnamese and Thai languages. The
user should be able to search in a language and Solr should be able to detect
misspellings and suggest some corrections. The search works as expected but the
spellcheck doesn't. Currently I'm looking to implement
Can anyone show me an example or a short guide on how I can do it? I have to
use Solr 5 or above.
scott.chu,scott@udngroup.com
2016/5/24 (Tue)
- Original Message -
From: scott (self)
To: solr-user
CC:
Date: 2016/5/20 (Fri) 14:17
Subject: Import html data in mysql and map schem
Hi Tom,
the pointer to the rule based placement was indeed what I was missing! I
simply had to add the rule "shard:*,replica:<2,node:*", as documented,
and my replicas do now get distributed as expected :-)
thanks,
Hendrik
On 23/05/16 15:28, Tom Evans wrote:
> On Mon, May 23, 2016 at 10:37 AM, H
Thanks for your considerable opinion. I'll try addreplica first.
scott.chu,scott@udngroup.com
2016/5/24 (Tue)
- Original Message -
From: Erick Erickson
To: solr-user ; scott (self)
CC:
Date: 2016/5/24 (Tue) 01:56
Subject: Re: What to do best when expaning from 2 nodes to 4 nodes? [sco
About (1), bq: The Solr Admin UI showed that my replication factor
changed but otherwise nothing happened.
This is as designed, AFAIK. There's nothing built into Solr to
_automatically_ add replicas when this property is changed. My guess
is that the MODIFYCOLLECTION code was written to help with
I'd play with the timeAllowed option with a full corpus to get a sense
of how painful these queries are. There's also the issue of the impact
of queries like this on other users to consider
Other than that, I think you're on the right path in terms of
supporting some common use-cases with spec
I know this seems facetious, but talk to your
clients about _why_ they want such increasingly
complex access requirements. Often the logic
is pretty flawed for the complexity. Things like
"allow user X to see document Y if they're part of
groups A, B, C but not D or E unless they are
also part
For <2> and <3> well, yes. To do _anything_ in
Solr you need to index the data to Solr. It doesn't
magically reach out into the DB and do stuff.
<3> you can either use DIH or a SolrJ program
and yes, you do have to do some kind of mapping of
database columns into Solr documents
I want to caut
Well, ya learn somethin' new every day
On Mon, May 23, 2016 at 4:31 PM, Timothy Potter wrote:
> Thanks Joel, that cleared things up nicely ... using 4 workers against
> 4 shards resulted in 16 queries to the collection. However, not all
> replicas were used for all shards, so it's not as bala
Hi, I have been using Solr for many years and it is VERY helpful.
My problem is that our app has increasingly complicated access
control to satisfy clients' requirements; in Solr/Lucene this means we need
to add more and more fields to each document and use more and more
complicated filter
Thanks Joel, that cleared things up nicely ... using 4 workers against
4 shards resulted in 16 queries to the collection. However, not all
replicas were used for all shards, so it's not as balanced as I
thought it would be, but we're dealing with small numbers of shards
and replicas here.
On Mon,
Hi All,
I have a use case where I want to index a JSON field from MySQL into
Solr. The JSON field will contain entries as key-value pairs. The JSON can
be nested, but I want to index only the first-level field-value pairs of the
JSON into Solr keys, and nested levels can be present as the value of
c
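A minimal sketch of the first-level-only flattening described above (hypothetical field names and data, not the poster's actual import code): keep first-level scalars as Solr field values and store nested objects whole as their JSON string.

```python
import json

# Keep only first-level key/value pairs for Solr fields; nested
# objects/arrays are kept whole as their JSON string representation.
raw = '{"name": "widget", "meta": {"color": "red", "size": 2}}'
doc = {k: (json.dumps(v) if isinstance(v, (dict, list)) else v)
       for k, v in json.loads(raw).items()}
print(doc["name"])   # first-level scalar, indexed as-is
print(doc["meta"])   # nested level carried along as a JSON string value
```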
Hi Solomon,
How come
hl.q=blah blah&hl.fl=normal_text,title
would produce an "undefined field text" error message?
Please try
hl.q=blah blah&hl.fl=normal_text,title
just to verify there is a problem with the fielded queries.
Ahmet
On Monday, May 23, 2016 10:31 AM, michael solomon wrote:
Hi,
Wh
My first thought is that you haven’t indexed such that all values of the field
you’re grouping on are found in the same cores.
See the end of the article here: (Distributed Result Grouping Caveats)
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
And the “Document Routing” sectio
Sorry, I did not see the responses here because I figured it out myself. It
definitely seems like a hard commit is performed when shutting down
gracefully. The info I got from production was wrong.
It is not necessarily obvious that you will lose data on "kill -9". The
tlog ought to save you, but it
Streaming expressions will utilize all replicas of a cluster when the
number of workers >= the number of replicas.
For example, suppose there are 40 workers, 40 shards, and 5 replicas.
For a single parallel request:
Each worker will send 1 query to a random replica in each shard. This is
1600 hundre
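The fan-out arithmetic in that example can be checked with a couple of lines of Python (a sketch of the counting only, not of Solr internals):

```python
# Each worker sends one query to a randomly chosen replica of every
# shard, so the total fan-out is workers x shards, spread across the
# replicas of each shard.
def total_queries(workers: int, shards: int) -> int:
    return workers * shards

print(total_queries(40, 40))  # 1600 queries for the 40x40 example
print(total_queries(4, 4))    # 16 queries, matching the earlier thread
```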
The image is the correct flow. Are you using workers?
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, May 23, 2016 at 7:16 PM, Timothy Potter
wrote:
> This image from the wiki kind of gives that impression to me:
>
>
> https://cwiki.apache.org/confluence/download/attachments/61311194/clu
This image from the wiki kind of gives that impression to me:
https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
wrote:
> I _think_ this is a distinction between
> serving
The docs describe the current capabilities. So if it's not in the docs,
it's not supported yet. For example the docs don't mention joins or
intersections and they are not supported. Another example is that select
count(*) is supported, and select distinct is supported, but select
count(distinct) is
Take a look at the SPLITSHARD Collections API here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
Best value of numShards and replicationFactor: Impossible to say. You have
to stress test respecting your SLAs. See:
https://lucidworks.com/blog/2012/07/23/sizin
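As a sketch, a SPLITSHARD call is just an HTTP request to the Collections API; the host, collection, and shard names below are hypothetical placeholders:

```python
from urllib.parse import urlencode

# Build the Collections API request for splitting one shard in two.
base = "http://localhost:8983/solr/admin/collections"
params = {"action": "SPLITSHARD", "collection": "mycollection", "shard": "shard1"}
url = base + "?" + urlencode(params)
print(url)
```

The split produces two sub-shards; the parent shard is left in place (inactive) until you delete it.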
I _think_ this is a distinction between
serving the query and processing the results. The
query is the standard Solr processing returning
results from one replica per shard.
Those results can be partitioned out to N Solr instances
for sub-processing, where N is however many worker
nodes you speci
Furthermore, I was checking the internals of the old facet implementation
(which is used with the classic request-parameter-based faceting, instead of
the JSON facets). It seems that if you enable docValues, even with the enum
method passed as a parameter, fc with docValues will actually be used.
I will g
What I find odd is that creating a collection with a replication factor
greater than 1 does seem to avoid ending up with replicas on the same node.
However, when one wants to add replicas later on, one needs to do the whole
placement manually to avoid single points of failure.
On 23/05/16 15:28, Tom Evans
Have you seen:
https://lucidworks.com/blog/2015/03/04/solr-suggester/
Best,
Erick
On Sun, May 22, 2016 at 10:07 PM, Mugeesh Husain wrote:
> Hello everyone,
>
> I am looking for some suggestion for auto-suggest like imdb.com.
>
> just type "samp" in search box in imdb.com site.
>
> results are re
On Mon, May 23, 2016 at 12:41 PM, Steven White wrote:
> Thank you Erik and Scott. {!terms} did the job!! I tested like so:
> fq={!terms f=category}1,2,3,4,...N
>
> I read that {!terms} treats the terms in the list as OR, if I have a need
> to force AND on my terms, how do I do that?
While ORing
That would be a welcome feature for sure!
On Mon, May 23, 2016 at 6:11 AM, Horváth Péter Gergely <
peter.gergely.horv...@gmail.com> wrote:
> Hi Steve,
>
> Thank you very much for your inputs. Yes, I do know the aliasing mechanism
> offered in Solr. I think the whole question boils down to one th
Yes, currently when using atomic updates _all_ fields
have to be stored, except the _destinations_ of copyField
directives.
Yes, it will make your index bigger. The effects on speed are
probably minimal though. The stored data is in your *.fdt and
*.fdx segment files and are not referenced only t
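For reference, an atomic update only ships the changed field; Solr rebuilds the rest of the document from stored fields, which is why they must all be stored. A sketch of the standard update payload, with hypothetical field names:

```python
import json

# Atomic update: only "price" is sent, with a "set" modifier; Solr
# re-reads the other stored fields to rewrite the whole document.
update = [{"id": "doc1", "price": {"set": 9.99}}]
body = json.dumps(update)
print(body)
```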
Steven:
I'm not sure you can, the terms query parser is built to
OR things together.
You might be able to use some of the nested query stuff.
Or, assuming you have an _additional_ fq clause
you want to use just use it as:
fq={!terms f=category}1,2,3,4,...N&fq=whatever
Then you're taking advanta
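The two-fq approach can be sketched as plain query-parameter building (hypothetical category ids and a hypothetical second filter):

```python
from urllib.parse import urlencode

# {!terms} ORs its comma-separated list; ANDing is achieved by adding
# a second fq clause, since every fq must match independently.
ids = [1, 2, 3, 4]
params = [
    ("fq", "{!terms f=category}" + ",".join(map(str, ids))),
    ("fq", "inStock:true"),  # hypothetical additional filter
]
query_string = urlencode(params)
print(query_string)
```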
https://github.com/whitepages/solrcloud_manager was designed to provide some
easier operations for common kinds of cluster operation.
It hasn’t been tested with 6.0 though, so if you try it, please let me know
your experience.
On 5/23/16, 6:28 AM, "Tom Evans" wrote:
>On Mon, May 23, 2016 at
The PingRequestHandler contains support for a file check, which allows you to
control whether the ping request succeeds based on the presence/absence of a
file on disk on the node.
http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html
I suppose you could
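The file-check idea can be mimicked in a few lines (a sketch of the behavior, not Solr's implementation; the marker filename is hypothetical):

```python
import os

# Ping succeeds only while the marker file exists; deleting the file
# takes the node out of a load balancer's rotation without stopping it.
def ping_status(healthcheck_file: str) -> str:
    return "OK" if os.path.exists(healthcheck_file) else "unavailable"

print(ping_status("server-enabled.txt"))
```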
Hi everyone,
I'm reading up on Solr's Parallel SQL. I see some good examples but not much
on how to set it up and what the limitations are. My understanding is that
I can use Parallel SQL to send SQL syntax to Solr to search in Solr, but:
1) Does this mean all of SQL's query statements are support
Thank you Erik and Scott. {!terms} did the job!! I tested like so:
fq={!terms f=category}1,2,3,4,...N
I read that {!terms} treats the terms in the list as OR, if I have a need
to force AND on my terms, how do I do that?
Steve
On Mon, May 23, 2016 at 9:39 AM, Scott Chu wrote:
>
> Yonik has a
I've seen docs and diagrams that seem to indicate a streaming
expression can utilize all replicas of a shard but I'm seeing only 1
replica per shard (I have 2) being queried.
All replicas are on the same host for my experimentation, could that
be the issue? What are the circumstances where all rep
If you can make min/max work for you instead of sort then it should be
faster, but I haven't spent time comparing the performance.
But if you're using the top_fc with the min/max param the performance
between Solr 4 & Solr 6 should be very close as the data structures behind
them are the same.
Sorry for the typo. Let me rewrite my question:
I just created a 90 GB index collection with 1 shard and 2 replicas on 2 nodes.
I need to migrate from 2 nodes to 4 nodes. I am wondering what's the best
strategy to split this single shard? Furthermore, if I am OK to reindex, what's
the best adequ
I just created a 90gb index collection with 1 shard and 2 replicas on 2 nodes.
I need to migrate from 2 nodes to 4 nodes. I am wondering what's the best
strategy to split this single shard? Furthermore, if I am OK to reindex, what's
the most adequate, experience-based value of numShards and replicationFa
Hi Joel,
thanks for the reply, actually we were not using field collapsing before,
we basically want to replace grouping with that.
The grouping performance between Solr 4 and 6 is basically comparable.
It's surprising I got such a big degradation with field collapsing.
So basically the compariso
For exact syntax of the top_fc hint use the official docs. The blog is
using an upper case hint, but that was changed to a lower case hint.
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, May 23, 2016 at 2:56 PM, Joel Bernstein wrote:
> Also I wrote a guide for Solr 5 Collapsing/Expand per
Also, I wrote a guide for Solr 5 Collapsing/Expand performance that used to
be on Heliosearch.org. It's now only available through the magic of
the Wayback Machine. What's not covered is the sort param, which came later.
Here it is:
http://web.archive.org/web/20150709154420/http://heliosear
On 5/23/2016 6:35 AM, 김두형 wrote:
> Actually, I want to insert some log statements into SolrIndexSearcher. The
> place where SolrIndexSearcher lives is solr-core.jar in dist. I replaced the
> old solr-core.jar in dist with my newly built solr-core.jar.
> In solrconfig I made this solrconfig refer to this jar like below.
>
>
Were you using the sort param or min/max param in Solr 4 to select the
group head? The sort work came later and I'm not sure how it compares in
performance to the min/max param.
Since you are collapsing on a string field you can use the top_fc hint
which will use a top level field cache for the co
Yonik has a very good article about the terms query parser:
Solr Terms Query for matching many terms - Solr 'n Stuff
http://yonik.com/solr-terms-query/
Scott Chu,scott@udngroup.com
2016/5/23 (Mon)
- Original Message -
From: Erik Hatcher
To: solr-user
CC:
Date: 2016/5/23 (Mon) 21:14
Subject: Re:
On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
wrote:
> Hi,
>
> I have a SolrCloud 6.0 setup and created my collection with a
> replication factor of 1. Now I want to increase the replication factor
> but would like the replicas for the same shard to be on different nodes,
> so that my collecti
Try the {!terms} query parser. That should make it work well for you. Let us
know how it does.
Erik
> On May 23, 2016, at 08:52, Steven White wrote:
>
> Hi everyone,
>
> I'm trying to figure out what's the best way for me to use "fq" when the
> list of items is large (up to 200, but I
Hi,
I have some 150 fields in my schema out of which about 100 are dynamic
fields which I am not storing (stored="false").
In case I need to do an atomic update to one or two fields which belong to
the stored list of fields, do I need to change my dynamic fields (100 or so
now not "stored") to sto
Hi everyone,
I'm trying to figure out what's the best way for me to use "fq" when the
list of items is large (up to 200, but I have few cases with up to 1000).
My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200)
When I tested with up to 1000, I hit the "too many boolean clauses"
Actually, I want to insert some log statements into SolrIndexSearcher. The
place where SolrIndexSearcher lives is solr-core.jar in dist. I replaced the
old solr-core.jar in dist with my newly built solr-core.jar.
In solrconfig I made this solrconfig refer to this jar like below.
.
.
.
However, Solr did not refer to what
Hi Mikhail,
Thanks. I missed it completely; I thought it would be handled by
default.
On Monday 23 May 2016 02:08 PM, Mikhail Khludnev wrote:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
sort=score asc
On Mon, May 23,
Let's add some additional details, guys:
1) *Faceting*
Currently the facet method used is "enum" and it runs over 20 fields more
or less.
Mainly using it on low cardinality fields except one which has a
cardinality of 1000 terms.
I am aware of the famous Jira related faceting regression :
https://
Hi All,
I am using a grouping query with Solr Cloud version 5.2.1.
The parameters added in my query are
&q=SIM*&group=true&group.field=amid&group.limit=1&group.main=true. But each
time I hit the query I get different results, i.e. the top 10 results are
different each time.
Why is it so? Please help me with
Hi Steve,
Thank you very much for your inputs. Yes, I do know the aliasing mechanism
offered in Solr. I think the whole question boils down to one thing: how
much do you know about the data being stored -- and sometimes you know
nothing about that.
In some cases, you have to provide a generic sol
Good points, thanks Erick.
As you guessed, the use case is not in the main flow for the general user, but
an advanced flow for a technical one.
Regarding the performance issue, I thought of a few optimizations for some
expected expressions I need to support.
For instance, to work around the dig
Sure,
sorry for the delay
2016-05-16 16:57 GMT+02:00 Yonik Seeley :
> Thanks Matteo, looks like you found a bug.
> I can reproduce this with simpler queries too:
>
> _query_:"ABC" name_t:"white cat"~3
> is parsed to
> text:abc name_t:"white cat"
>
> Can you open a JIRA for this?
>
> -Yonik
Hi,
I have a SolrCloud 6.0 setup and created my collection with a
replication factor of 1. Now I want to increase the replication factor
but would like the replicas for the same shard to be on different nodes,
so that my collection does not fail when one node fails. I tried two
approaches so far:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
sort=score asc
On Mon, May 23, 2016 at 11:17 AM, Pranaya Behera
wrote:
> Hi Mikhail,
> I saw the blog post tried to do that with parent block
> query {!parent} as I d
Hi Erick
Thanks for your help, it is alright now.
Have a good day
Victor
Original Message
*Subject: *Re: Error opening new searcher
*From: *Erick Erickson
*To: *solr-user
*Date: *20/05/2016 17:57
Actually, it almost certainly _is_ in the regular Solr log file, just
which o
Hi Mikhail,
I saw the blog post and tried to do that with the parent block
query {!parent}, as I don't have the reference to the parent in the child
to use in {!join}. This is my result:
https://gist.github.com/shadow-fox/b728683b27a2f39d1b5e1aac54b7a8fb .
This yields me the result
Also, I believe this syntax should work as well with SQL we'll need to test
it out:
_query_:"{!dismax qf=myfield}how now brown cow"
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, May 23, 2016 at 2:59 AM, Joel Bernstein wrote:
> I opened SOLR-9148 and added a patch to pass through filter
Hi,
When I increase hl.maxAnalyzedChars nothing happens.
AND
hl.q=blah blah&hl.fl=normal_text,title
I get:
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field text",