Hi
I am using the MapReduceIndexerTool to index data from HDFS, using morphlines
as the ETL tool. I specify the data paths as XPath expressions in the
morphline file.
Sorry for the delay.
--
Sen
hello all,
The site I'm working on has to support the Vietnamese and Thai languages. The
user should be able to search in a language and Solr should be able to detect
misspellings and suggest some corrections. The search works as expected but the
spellcheck doesn't. Currently I'm looking to implement
Can anyone show me an example or a short guide on how I can do it? I have to
use Solr 5 or above.
scott.chu,scott@udngroup.com
2016/5/24 (Tue)
- Original Message -
From: scott (self)
To: solr-user
CC:
Date: 2016/5/20 (Fri) 14:17
Subject: Import html data in mysql and map schem
Hi Tom,
the pointer to the rule based placement was indeed what I was missing! I
simply had to add the rule "shard:*,replica:<2,node:*", as documented,
and my replicas do now get distributed as expected :-)
thanks,
Hendrik
On 23/05/16 15:28, Tom Evans wrote:
> On Mon, May 23, 2016 at 10:37 AM, H
Thanks for your considerable opinion. I'll try addreplica first.
scott.chu,scott@udngroup.com
2016/5/24 (Tue)
- Original Message -
From: Erick Erickson
To: solr-user ; scott (self)
CC:
Date: 2016/5/24 (Tue) 01:56
Subject: Re: What to do best when expaning from 2 nodes to 4 nodes? [sco
About (1), bq: The Solr Admin UI showed that my replication factor
changed but otherwise nothing happened.
This is as designed, AFAIK. There's nothing built into Solr to
_automatically_ add replicas when this property is changed. My guess
is that the MODIFYCOLLECTION code was written to help with
I'd play with the timeAllowed option with a full corpus to get a sense
of how painful these queries are. There's also the issue of the impact
of queries like this on other users to consider
Other than that, I think you're on the right path in terms of
supporting some common use-cases with spec
I know this seems facetious, but talk to your
clients about _why_ they want such increasingly
complex access requirements. Often the logic
is pretty flawed for the complexity. Things like
"allow user X to see document Y if they're part of
groups A, B, C but not D or E unless they are
also part
For <2> and <3> well, yes. To do _anything_ in
Solr you need to index the data to Solr. It doesn't
magically reach out into the DB and do stuff.
<3> you can either use DIH or a SolrJ program
and yes, you do have to do some kind of mapping of
database columns into Solr documents
I want to caut
Well, ya learn somethin' new every day
On Mon, May 23, 2016 at 4:31 PM, Timothy Potter wrote:
> Thanks Joel, that cleared things up nicely ... using 4 workers against
> 4 shards resulted in 16 queries to the collection. However, not all
> replicas were used for all shards, so it's not as bala
Hi, I have been using Solr for many years and it is VERY helpful.
My problem is that our app has increasingly complicated access
control to satisfy clients' requirements; in Solr/Lucene this means we need
to add more and more fields to each document and use more and more
complicated filter
Thanks Joel, that cleared things up nicely ... using 4 workers against
4 shards resulted in 16 queries to the collection. However, not all
replicas were used for all shards, so it's not as balanced as I
thought it would be, but we're dealing with small numbers of shards
and replicas here.
On Mon,
Hi All,
I have a use case where I want to index a JSON field from MySQL into
Solr. The JSON field will contain entries as key-value pairs. The JSON can
be nested, but I want to index only the first-level field-value pairs of the
JSON into Solr keys, and nested levels can be present as the value of
c
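A minimal sketch of the first-level-only flattening described above (hypothetical field names and data, not the poster's actual import code): keep first-level scalars as Solr field values and store nested objects whole as their JSON string.

```python
import json

# Keep only first-level key/value pairs for Solr fields; nested
# objects/arrays are kept whole as their JSON string representation.
raw = '{"name": "widget", "meta": {"color": "red", "size": 2}}'
doc = {k: (json.dumps(v) if isinstance(v, (dict, list)) else v)
       for k, v in json.loads(raw).items()}
print(doc["name"])   # first-level scalar, indexed as-is
print(doc["meta"])   # nested level carried along as a JSON string value
```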
Hi Solomon,
How come
hl.q=blah blah&hl.fl=normal_text,title
would produce an "undefined field text" error message?
Please try
hl.q=blah blah&hl.fl=normal_text,title
just to verify there is a problem with the fielded queries.
Ahmet
On Monday, May 23, 2016 10:31 AM, michael solomon wrote:
Hi,
Wh
My first thought is that you haven’t indexed such that all values of the field
you’re grouping on are found in the same cores.
See the end of the article here: (Distributed Result Grouping Caveats)
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
And the “Document Routing” sectio
Sorry, I did not see the responses here because I figured it out myself. It
definitely seems like a hard commit is performed when shutting down
gracefully. The info I got from production was wrong.
It is not necessarily obvious that you will lose data on "kill -9". The
tlog ought to save you, but it
Streaming expressions will utilize all replicas of a cluster when the
number of workers >= the number of replicas.
For example, suppose there are 40 workers, 40 shards, and 5 replicas.
For a single parallel request:
Each worker will send 1 query to a random replica in each shard. This is
1600 hundre
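The fan-out arithmetic in that example can be checked with a couple of lines of Python (a sketch of the counting only, not of Solr internals):

```python
# Each worker sends one query to a randomly chosen replica of every
# shard, so the total fan-out is workers x shards, spread across the
# replicas of each shard.
def total_queries(workers: int, shards: int) -> int:
    return workers * shards

print(total_queries(40, 40))  # 1600 queries for the 40x40 example
print(total_queries(4, 4))    # 16 queries, matching the earlier thread
```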
The image is the correct flow. Are you using workers?
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, May 23, 2016 at 7:16 PM, Timothy Potter
wrote:
> This image from the wiki kind of gives that impression to me:
>
>
> https://cwiki.apache.org/confluence/download/attachments/61311194/clu
This image from the wiki kind of gives that impression to me:
https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
wrote:
> I _think_ this is a distinction between
> serving
The docs describe the current capabilities. So if it's not in the docs,
it's not supported yet. For example the docs don't mention joins or
intersections and they are not supported. Another example is that select
count(*) is supported, and select distinct is supported, but select
count(distinct) is
Take a look at the SPLITSHARD Collections API here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
Best value of numShards and replicationFactor: Impossible to say. You have
to stress test respecting your SLAs. See:
https://lucidworks.com/blog/2012/07/23/sizin
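As a sketch, a SPLITSHARD call is just an HTTP request to the Collections API; the host, collection, and shard names below are hypothetical placeholders:

```python
from urllib.parse import urlencode

# Build the Collections API request for splitting one shard in two.
base = "http://localhost:8983/solr/admin/collections"
params = {"action": "SPLITSHARD", "collection": "mycollection", "shard": "shard1"}
url = base + "?" + urlencode(params)
print(url)
```

The split produces two sub-shards; the parent shard is left in place (inactive) until you delete it.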
I _think_ this is a distinction between
serving the query and processing the results. The
query is the standard Solr processing returning
results from one replica per shard.
Those results can be partitioned out to N Solr instances
for sub-processing, where N is however many worker
nodes you speci
Furthermore, I was checking the internals of the old facet implementation
(which is used with the classic request-parameter-based faceting, instead of
the JSON facets). It seems that if you enable docValues, even with the enum
method passed as a parameter, fc with docValues will actually be used.
I will g
What I find odd is that creating a collection with a replication factor
greater than 1 does seem to avoid ending up with replicas on the same node.
However, when one wants to add replicas later on, one needs to do the whole
placement manually to avoid single points of failure.
On 23/05/16 15:28, Tom Evans
Have you seen:
https://lucidworks.com/blog/2015/03/04/solr-suggester/
Best,
Erick
On Sun, May 22, 2016 at 10:07 PM, Mugeesh Husain wrote:
> Hello everyone,
>
> I am looking for some suggestion for auto-suggest like imdb.com.
>
> just type "samp" in search box in imdb.com site.
>
> results are re
On Mon, May 23, 2016 at 12:41 PM, Steven White wrote:
> Thank you Erik and Scott. {!terms} did the job!! I tested like so:
> fq={!terms f=category}1,2,3,4,...N
>
> I read that {!terms} treats the terms in the list as OR, if I have a need
> to force AND on my terms, how do I do that?
While ORing
That would be a welcome feature for sure!
On Mon, May 23, 2016 at 6:11 AM, Horváth Péter Gergely <
peter.gergely.horv...@gmail.com> wrote:
> Hi Steve,
>
> Thank you very much for your inputs. Yes, I do know the aliasing mechanism
> offered in Solr. I think the whole question boils down to one th
Yes, currently when using atomic updates _all_ fields
have to be stored, except the _destinations_ of copyField
directives.
Yes, it will make your index bigger. The effects on speed are
probably minimal though. The stored data is in your *.fdt and
*.fdx segment files and are not referenced only t
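For reference, an atomic update only ships the changed field; Solr rebuilds the rest of the document from stored fields, which is why they must all be stored. A sketch of the standard update payload, with hypothetical field names:

```python
import json

# Atomic update: only "price" is sent, with a "set" modifier; Solr
# re-reads the other stored fields to rewrite the whole document.
update = [{"id": "doc1", "price": {"set": 9.99}}]
body = json.dumps(update)
print(body)
```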
Steven:
I'm not sure you can, the terms query parser is built to
OR things together.
You might be able to use some of the nested query stuff.
Or, assuming you have an _additional_ fq clause
you want to use just use it as:
fq={!terms f=category}1,2,3,4,...N&fq=whatever
Then you're taking advanta
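The two-fq approach can be sketched as plain query-parameter building (hypothetical category ids and a hypothetical second filter):

```python
from urllib.parse import urlencode

# {!terms} ORs its comma-separated list; ANDing is achieved by adding
# a second fq clause, since every fq must match independently.
ids = [1, 2, 3, 4]
params = [
    ("fq", "{!terms f=category}" + ",".join(map(str, ids))),
    ("fq", "inStock:true"),  # hypothetical additional filter
]
query_string = urlencode(params)
print(query_string)
```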
https://github.com/whitepages/solrcloud_manager was designed to provide some
easier operations for common kinds of cluster operation.
It hasn’t been tested with 6.0 though, so if you try it, please let me know
your experience.
On 5/23/16, 6:28 AM, "Tom Evans" wrote:
>On Mon, May 23, 2016 at
The PingRequestHandler contains support for a file check, which allows you to
control whether the ping request succeeds based on the presence/absence of a
file on disk on the node.
http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html
I suppose you could
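The file-check idea can be mimicked in a few lines (a sketch of the behavior, not Solr's implementation; the marker filename is hypothetical):

```python
import os

# Ping succeeds only while the marker file exists; deleting the file
# takes the node out of a load balancer's rotation without stopping it.
def ping_status(healthcheck_file: str) -> str:
    return "OK" if os.path.exists(healthcheck_file) else "unavailable"

print(ping_status("server-enabled.txt"))
```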
Hi everyone,
I'm reading up on Solr's Parallel SQL. I see some good examples but not much
on how to set it up and what the limitations are. My understanding is that
I can use Parallel SQL to send SQL syntax to Solr to search in Solr, but:
1) Does this mean all of SQL's query statements are support
Thank you Erik and Scott. {!terms} did the job!! I tested like so:
fq={!terms f=category}1,2,3,4,...N
I read that {!terms} treats the terms in the list as OR, if I have a need
to force AND on my terms, how do I do that?
Steve
On Mon, May 23, 2016 at 9:39 AM, Scott Chu wrote:
>
> Yonik has a
I've seen docs and diagrams that seem to indicate a streaming
expression can utilize all replicas of a shard but I'm seeing only 1
replica per shard (I have 2) being queried.
All replicas are on the same host for my experimentation, could that
be the issue? What are the circumstances where all rep
If you can make min/max work for you instead of sort then it should be
faster, but I haven't spent time comparing the performance.
But if you're using the top_fc with the min/max param the performance
between Solr 4 & Solr 6 should be very close as the data structures behind
them are the same.
Sorry for the typo. Let me rewrite my question:
I just created a 90 GB index collection with 1 shard and 2 replicas on 2 nodes.
I need to migrate from 2 nodes to 4 nodes. I am wondering what's the best
strategy to split this single shard? Furthermore, if I am OK to reindex, what's
the best adequ
I just created a 90gb index collection with 1 shard and 2 replicas on 2 nodes.
I need to migrate from 2 nodes to 4 nodes. I am wondering what's the best
strategy to split this single shard? Furthermore, if I am OK to reindex, what's
the most adequate, experience-based value of numShards and replicationFa
Hi Joel,
thanks for the reply, actually we were not using field collapsing before,
we basically want to replace grouping with that.
The grouping performance between Solr 4 and 6 is basically comparable.
It's surprising I got such a big degradation with field collapsing.
So basically the compariso
For exact syntax of the top_fc hint use the official docs. The blog is
using an upper case hint, but that was changed to a lower case hint.
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, May 23, 2016 at 2:56 PM, Joel Bernstein wrote:
> Also I wrote a guide for Solr 5 Collapsing/Expand per
Also, I wrote a guide for Solr 5 Collapsing/Expand performance that used to
be on Heliosearch.org. It's now only available through the magic of
the Wayback Machine. What's not covered is the sort param, which came later.
Here it is:
http://web.archive.org/web/20150709154420/http://heliosear
On 5/23/2016 6:35 AM, 김두형 wrote:
> Actually, I want to insert some log statements into SolrIndexSearcher. The
> place where SolrIndexSearcher lives is solr-core.jar in dist. I replaced the
> old solr-core.jar in dist with my newly built solr-core.jar.
> In solrconfig I made this solrconfig refer to this jar like below.
>
>
Were you using the sort param or min/max param in Solr 4 to select the
group head? The sort work came later and I'm not sure how it compares in
performance to the min/max param.
Since you are collapsing on a string field you can use the top_fc hint
which will use a top level field cache for the co
Yonik has a very good article about the terms query parser:
Solr Terms Query for matching many terms - Solr 'n Stuff
http://yonik.com/solr-terms-query/
Scott Chu,scott@udngroup.com
2016/5/23 (Mon)
- Original Message -
From: Erik Hatcher
To: solr-user
CC:
Date: 2016/5/23 (Mon) 21:14
Subject: Re:
On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
wrote:
> Hi,
>
> I have a SolrCloud 6.0 setup and created my collection with a
> replication factor of 1. Now I want to increase the replication factor
> but would like the replicas for the same shard to be on different nodes,
> so that my collecti
Try the {!terms} query parser. That should make it work well for you. Let us
know how it does.
Erik
> On May 23, 2016, at 08:52, Steven White wrote:
>
> Hi everyone,
>
> I'm trying to figure out what's the best way for me to use "fq" when the
> list of items is large (up to 200, but I
Hi,
I have some 150 fields in my schema out of which about 100 are dynamic
fields which I am not storing (stored="false").
In case I need to do an atomic update to one or two fields which belong to
the stored list of fields, do I need to change my dynamic fields (100 or so
now not "stored") to sto
Hi everyone,
I'm trying to figure out what's the best way for me to use "fq" when the
list of items is large (up to 200, but I have few cases with up to 1000).
My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200)
When I tested with up to 1000, I hit the "too many boolean clauses"
Actually, I want to insert some log statements into SolrIndexSearcher. The
place where SolrIndexSearcher lives is solr-core.jar in dist. I replaced the
old solr-core.jar in dist with my newly built solr-core.jar.
In solrconfig I made this solrconfig refer to this jar like below.
.
.
.
However, Solr did not refer to what
Hi Mikhail,
Thanks. I missed it completely; I thought it would be handled by
default.
On Monday 23 May 2016 02:08 PM, Mikhail Khludnev wrote:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
sort=score asc
On Mon, May 23,
Let's add some additional details, guys:
1) *Faceting*
Currently the facet method used is "enum" and it runs over 20 fields more
or less.
Mainly using it on low cardinality fields except one which has a
cardinality of 1000 terms.
I am aware of the famous Jira related faceting regression :
https://
Hi All,
I am using a grouping query with Solr Cloud version 5.2.1.
The parameters added in my query are
&q=SIM*&group=true&group.field=amid&group.limit=1&group.main=true. But each
time I hit the query I get different results, i.e. the top 10 results are
different each time.
Why is it so? Please help me with
Hi Steve,
Thank you very much for your inputs. Yes, I do know the aliasing mechanism
offered in Solr. I think the whole question boils down to one thing: how
much do you know about the data being stored -- and sometimes you know
nothing about that.
In some cases, you have to provide a generic sol
Good points, thanks Erick.
As you guessed, the use case is not in the main flow for the general user, but
an advanced flow for a technical one.
Regarding the performance issue, I thought of a few optimizations for some
expected expressions I need to support.
For instance, to work around the dig
Sure,
sorry for the delay
2016-05-16 16:57 GMT+02:00 Yonik Seeley :
> Thanks Matteo, looks like you found a bug.
> I can reproduce this with simpler queries too:
>
> _query_:"ABC" name_t:"white cat"~3
> is parsed to
> text:abc name_t:"white cat"
>
> Can you open a JIRA for this?
>
> -Yonik
Hi,
I have a SolrCloud 6.0 setup and created my collection with a
replication factor of 1. Now I want to increase the replication factor
but would like the replicas for the same shard to be on different nodes,
so that my collection does not fail when one node fails. I tried two
approaches so far:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
sort=score asc
On Mon, May 23, 2016 at 11:17 AM, Pranaya Behera
wrote:
> Hi Mikhail,
> I saw the blog post tried to do that with parent block
> query {!parent} as I d
Hi Erick
Thanks for your help, it is alright now.
Have a good day
Victor
Original Message
*Subject: *Re: Error opening new searcher
*From: *Erick Erickson
*To: *solr-user
*Date: *20/05/2016 17:57
Actually, it almost certainly _is_ in the regular Solr log file, just
which o
Hi Mikhail,
I saw the blog post and tried to do that with the parent block
query {!parent}, as I don't have the reference to the parent in the child
to use in {!join}. This is my result:
https://gist.github.com/shadow-fox/b728683b27a2f39d1b5e1aac54b7a8fb .
This yields me the result
Also, I believe this syntax should work as well with SQL we'll need to test
it out:
_query_:"{!dismax qf=myfield}how now brown cow"
Joel Bernstein
http://joelsolr.blogspot.com/
On Mon, May 23, 2016 at 2:59 AM, Joel Bernstein wrote:
> I opened SOLR-9148 and added a patch to pass through filter
Hi,
When I increase hl.maxAnalyzedChars nothing happens.
AND
hl.q=blah blah&hl.fl=normal_text,title
I get:
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field text",