Re: highlight don't work if df not specified

2016-05-23 Thread michael solomon
Hi,
When I increase hl.maxAnalyzedChars, nothing happens.

AND

hl.q=blah blah&hl.fl=normal_text,title
I get:

"error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field text",
"code":400}}



On Sun, May 22, 2016 at 5:34 PM, Ahmet Arslan 
wrote:

> Hi,
>
> What happens when you increase hl.maxAnalyzedChars?
>
> OR
>
> hl.q=blah blah&hl.fl=normal_text,title
>
> Ahmet
>
>
>
> On Sunday, May 22, 2016 5:24 PM, michael solomon 
> wrote:
> <field name="…" type="…" indexed="true" stored="true"/>
> 
>
>
> On Sun, May 22, 2016 at 5:18 PM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > Weird, are your fields stored?
> >
> >
> >
> > On Sunday, May 22, 2016 5:14 PM, michael solomon 
> > wrote:
> > Thanks Ahmet,
> > It was a mistake in the question, sorry; in the query I wrote it properly.
> >
> >
> > On Sun, May 22, 2016 at 5:06 PM, Ahmet Arslan  >
> > wrote:
> >
> > > Hi,
> > >
> > > q=normal_text:"bla bla"&title:"bla bla"
> > >
> > > should be
> > > q=+normal_text:"bla bla" +title:"bla bla"
> > >
> > >
> > >
> > > On Sunday, May 22, 2016 4:52 PM, michael solomon  >
> > > wrote:
> > > Hi,
> > > I am querying multiple fields in Solr:
> > > q=normal_text:"bla bla"&title:"bla bla"
> > >
> > > I turned on highlighting, but it doesn't work even when I fill in hl.fl.
> > > It works when I fill in the df (default field) parameter, but then it
> > > highlights only one field.
> > > What is the problem?
> > > Thanks,
> > > michael
> > >
> >
>


Re: Parallel SQL and function queries?

2016-05-23 Thread Joel Bernstein
Also, I believe this syntax should work with SQL as well; we'll need to test
it out:

_query_:"{!dismax qf=myfield}how now brown cow"

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 2:59 AM, Joel Bernstein  wrote:

> I opened SOLR-9148 and added a patch to pass through filter queries.
>
> I also saw that there is one kind of geo filter that uses the Solr range
> syntax (filtering by rectangle):
>
> store:[45,-94 TO 46,-93]
>
> This should work with the current SQL interface.
>
> We should check with David Smiley and see if there are other Geo-spatial
> queries that will work with the range syntax.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, May 22, 2016 at 10:30 PM, Joel Bernstein 
> wrote:
>
>> Actually, I just reviewed the code and it's not passing through the
>> params as I described.
>>
>> I think this is important to have working so we can support the qparser
>> plugins through filter queries. I'll create a ticket and work up a patch.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Sun, May 22, 2016 at 5:16 PM, Joel Bernstein 
>> wrote:
>>
>>> I didn't have a chance yet to fully formulate a strategy for using the
>>> qparser plugins through the SQL interface.
>>>
>>> In the initial release there is a back door that allows you to tack on any
>>> parameters you want, and they should be passed through to Solr. There are
>>> no test cases for this, but I can add some to verify this is working. The
>>> idea being that you can do this:
>>>
>>> /sql?stmt=SELECT...&fq={!geofilt ...}
>>>
>>> One of the main things I had in mind with this was allowing access
>>> control lists to be passed in, because Alfresco supports document level
>>> access control which would need to be supported through the SQL interface.
>>>
>>> But any fq should get passed through. It's not clear right now whether
>>> multiple fq's will be passed through, but due to changes that Erick
>>> Erickson recently contributed, I believe they will. Again we need test
>>> cases for this.
>>>
>>> The JDBC driver should pass through any properties that are set on the
>>> connection as well. Again I was mostly thinking about access control here
>>> but other qparsers can be added as well.
>>>
>>> This was not meant as the long term solution though. In future releases
>>> I think we should roll out as many function queries and qparser plugins as
>>> possible as SQL functions.
>>>
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Sun, May 22, 2016 at 3:11 PM, Timothy Potter 
>>> wrote:
>>>
 How would I do something like: find all docs using a geofilt, e.g.

 SELECT title_s
   FROM movielens
 WHERE location_p='{!geofilt d=90 pt=37.773972,-122.431297
 sfield=location_p}'

 This fails with:

 {"result-set":{"docs":[
 {"EXCEPTION":"java.util.concurrent.ExecutionException:
 java.io.IOException: -->

 http://ec2-54-165-55-141.compute-1.amazonaws.com:8984/solr/movielens_shard2_replica1/:Can't
 parse point '{!geofilt d=90 pt=37.773972,-122.431297
 sfield=location_p}' because: java.lang.NumberFormatException: For
 input string: \"{!geofilt d=90
 pt=37.773972\"","EOF":true,"RESPONSE_TIME":4}]}}

 In general, I wasn't able to find much about using functions with
 local params in the SQL syntax.

>>>
>>>
>>
>
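
Assuming the fq pass-through described above works, Tim's geofilt example
could be expressed as a filter query instead of a WHERE clause; a sketch with
the parameter values from this thread:

  curl -G "http://localhost:8983/solr/movielens/sql" \
    --data-urlencode "stmt=SELECT title_s FROM movielens LIMIT 10" \
    --data-urlencode "fq={!geofilt sfield=location_p pt=37.773972,-122.431297 d=90}"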


Re: Sorting on child document field.

2016-05-23 Thread Pranaya Behera

Hi Mikhail,
 I saw the blog post and tried to do that with the parent block
query {!parent}, as I don't have a reference to the parent in the child
to use in {!join}. This is my result:
https://gist.github.com/shadow-fox/b728683b27a2f39d1b5e1aac54b7a8fb .
This yields the results in desc order even though I am using score=max. How
would I get them in asc order? Any suggestions on where I am messing up?


On Saturday 21 May 2016 12:03 AM, Mikhail Khludnev wrote:

Hello,

Check this
http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
Let me know if you need further comments.

On Thu, May 19, 2016 at 4:25 PM, Pranaya Behera 
wrote:


Example would be:
Let's say that I have a product document with regular fields such as name,
price, desc, is_parent. It has child documents such as
CA:: fields a, b, c, rank
and another child document
CB:: fields x, y, z.
I am using the query {!parent which="is_parent:true"}a:some AND
b:somethingelse; here only the CA child document is used for searching, no
other child document is touched. This CA has a rank field, and I want to
sort the parents using this field.
A product contains multiple CA documents, but the query matches exactly one
document.


On Thursday 19 May 2016 04:09 PM, Pranaya Behera wrote:


While searching in the Lucene code base I found
/ToParentBlockJoinSortField/ but it's not in Solr or even in SolrJ.
How would I use it with SolrJ, as I can't find anything to query it
through the UI?

On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:


Hi,

  How can I sort the results from a block join parent query
using a field from the child documents?

Thanks & Regards

Pranaya Behera








Re: Error opening new searcher

2016-05-23 Thread Victor D'agostino

Hi Erick
Thanks for your help, it is alright now.

Have a good day
Victor


 Original message 
*Subject: *Re: Error opening new searcher
*From: *Erick Erickson 
*To: *solr-user 
*Date: *20/05/2016 17:57

Actually, it almost certainly _is_ in a regular Solr log file; the question
is just which one. The file logging
rolls over, which is why you have solr.log, solr.log.1, etc. Likely the
message is in one of
those, unless it happened a long time ago. Those Solr logs are really a
window that spans
some time period; how long depends on how much log traffic you're
generating. If you
need a longer window, adjust the log4j.properties file.
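
For reference, the relevant knobs in resources/log4j.properties look roughly
like this in a stock Solr 5.x install (appender names may differ in your
copy; values are illustrative):

  # root logger writes to the rolling file appender and the console
  log4j.rootLogger=INFO, file, CONSOLE
  # widen the solr.log window by keeping more, larger rolled files
  log4j.appender.file.MaxFileSize=100MB
  log4j.appender.file.MaxBackupIndex=20
  # send only warnings and errors to the console log
  log4j.appender.CONSOLE.Threshold=WARN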

The console log file will accumulate all messages forever, which is why Shawn
recommends you disable it. If you monitor it and set just the console
logging to WARN,
it won't grow very quickly (at least it had better not), but you then have to
monitor it yourself.

Best,
Erick

On Fri, May 20, 2016 at 4:19 AM, Victor D'agostino
 wrote:

Hi Shawn

OK, I am going to commit less often then.

I had planned to set the console log from INFO to WARN, but this kind of log
was not in the regular solr.log log file!

Regards
Victor

 Original message 
*Subject: *Re: Error opening new searcher
*From: *Shawn Heisey 
*To: *solr-user@lucene.apache.org
*Date: *20/05/2016 11:40


On 5/20/2016 1:46 AM, Victor D'agostino wrote:

What does this "try again later" log mean in solr--console.log:

You should really disable console logging entirely.  I assume you're
running at least version 5.0?


193899678 WARN  (qtp1393423910-18329) [c:db s:shard3 r:core_node3
x:db_shard3_replica1] o.a.s.u.p.DistributedUpdateProcessor Error
sending update to http://10.69.212.22:8983/solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://10.69.212.22:8983/solr/db_shard3_replica1:
Error opening new searcher. exceeded limit of maxWarmingSearchers=2,
try again later.

Am I supposed to resend the document or will it be inserted just fine
later ?

Most likely the update itself was fine -- the error was when opening a
new searcher, which is something that happens at commit time.  You would
need to check the solr.log file on the server with address 10.69.212.22
to be sure.  This particular error message means that you are committing
too frequently -- two previous commits with openSearcher=true were not
yet finished before a third commit with openSearcher=true was started.
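
In practice that usually means letting Solr commit on a schedule instead of
committing from the client; a minimal solrconfig.xml sketch (the values are
illustrative, not the poster's config):

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60s, durability only -->
    <openSearcher>false</openSearcher>  <!-- no searcher is opened, so no warming -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>300000</maxTime>           <!-- open a new searcher at most every 5 min -->
  </autoSoftCommit>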


And is it possible to set the log timestamp "193899678" to a human
readable format ?

Check the timestamp in the solr.log file.  Like I said above -- the
console log should be disabled entirely.  You should be able to remove
it as a logging destination by editing resources/log4j.properties
(assuming 5.x or 6.x).

Thanks,
Shawn










Re: Sorting on child document field.

2016-05-23 Thread Mikhail Khludnev
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

sort=score asc
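
Putting the two pieces from this thread together, the request might look like
this sketch (field names from the earlier example): score=max picks which
child's score represents each parent, and sort=score asc then orders the
parents ascending.

  q={!parent which="is_parent:true" score=max v='+a:some +b:somethingelse'}
    &sort=score asc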

On Mon, May 23, 2016 at 11:17 AM, Pranaya Behera 
wrote:

> Hi Mikhail,
>  I saw the blog post tried to do that with parent block
> query {!parent} as I dont have the reference for the parent in the child to
> use in the {!join}. This is my result.
> https://gist.github.com/shadow-fox/b728683b27a2f39d1b5e1aac54b7a8fb .
> This yields me the results in desc even if I am using score=max. How would
> I get it in the asc order ? Any suggestions where I am messing up?
>
>
> On Saturday 21 May 2016 12:03 AM, Mikhail Khludnev wrote:
>
>> Hello,
>>
>> Check this
>>
>> http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
>> Let me know if you need further comments.
>>
>> On Thu, May 19, 2016 at 4:25 PM, Pranaya Behera 
>> wrote:
>>
>> Example would be:
>>> Lets say that I have a product document with regular fields as name,
>>> price, desc, is_parent. it has child documents such as
>>> CA:: fields as a,b,c,rank
>>> and another child document as
>>> CB:: fields as  x,y,z.
>>> I am using the query where {!parent which="is_parent:true"}a:some AND
>>> b:somethingelse , here only CA child document is used for searching no
>>> other child document has been touched. this CA has rank field. I want to
>>> sort the parents using this field.
>>> Product contains multiple CA documents. But the query matches only one
>>> document exactly.
>>>
>>>
>>> On Thursday 19 May 2016 04:09 PM, Pranaya Behera wrote:
>>>
>>> While searching in the lucene code base I found
 /ToParentBlockJoinSortField /but its not in the solr or even in solrj as
 well. How would I use it with solrj as I can't find anything to query it
 through the UI.

 On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:

 Hi,
>
>   How can I sort the results i.e. from a block join parent query
> using the field from child document field ?
>
> Thanks & Regards
>
> Pranaya Behera
>
>
>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





SolrCloud increase replication factor

2016-05-23 Thread Hendrik Haddorp
Hi,

I have a SolrCloud 6.0 setup and created my collection with a
replication factor of 1. Now I want to increase the replication factor
but would like the replicas for the same shard to be on different nodes,
so that my collection does not fail when one node fails. I tried two
approaches so far:

1) When I use the collections API with the MODIFYCOLLECTION action [1] I
can set the replication factor but that did not result in the creation
of additional replicas. The Solr Admin UI showed that my replication
factor changed but otherwise nothing happened. A reload of the
collection also resulted in no change.

2) Using the ADDREPLICA action [2] from the collections API I have to
add the replicas to the shard individually, which is a bit more
complicated but otherwise worked. During testing this did however at
least once result in the replica being created on the same node. My
collection was split in 4 shards and for 2 of them all replicas ended up
on the same node.

So is the only option to create the replicas manually, picking the nodes
myself, or is the behavior I am seeing wrong?

regards,
Hendrik

[1]
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-modifycoll
[2]
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica


Re: problems with nested queries

2016-05-23 Thread Matteo Grolla
Sure,
   sorry for the delay

2016-05-16 16:57 GMT+02:00 Yonik Seeley :

> Thanks Matteo, looks like you found a bug.
> I can reproduce this with simpler queries too:
>
> _query_:"ABC" name_t:"white cat"~3
> is parsed to
> text:abc name_t:"white cat"
>
> Can you open a JIRA for this?
>
> -Yonik
>
>
> On Mon, May 16, 2016 at 10:23 AM, Matteo Grolla 
> wrote:
> > Hi everyone,
> >  I have a problem with nested queries
> > If the order is:
> > 1) query
> > 2) nested query (embedded in _query_:"...")
> > everything works fine
> > if it is the opposite, like this
> >
> >
> http://localhost:8983/solr/test/select?q=_query_:%22{!lucene%20df=name_t}(\%22black%20dog\%22)%22%20OR%20name_t:%22white%20cat%22~20&debug=true
> >
> > then the span query "white cat"~20
> > becomes a phrase query "white cat"
> >
> > if both queries are embedded in _query_:"..." the behaviour is correct.
> > The behaviour seems odd to me; is there any reason for it?
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">8</int>
> >     <lst name="params">
> >       <str name="q">_query_:"{!lucene df=name_t}(\"black dog\")" OR name_t:"white cat"~20</str>
> >       <str name="debug">true</str>
> >     </lst>
> >   </lst>
> >   ...
> >   <lst name="debug">
> >     <str name="rawquerystring">_query_:"{!lucene df=name_t}(\"black dog\")" OR name_t:"white cat"~20</str>
> >     <str name="querystring">_query_:"{!lucene df=name_t}(\"black dog\")" OR name_t:"white cat"~20</str>
> >     <str name="parsedquery">PhraseQuery(name_t:"black dog") PhraseQuery(name_t:"white cat")</str>
> >     <str name="parsedquery_toString">name_t:"black dog" name_t:"white cat"</str>
> >     <str name="QParser">LuceneQParser</str>
> >   </lst>
>
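
Until the bug is fixed, Matteo's own observation suggests a workaround: wrap
both clauses in _query_:"...", which he reports is parsed correctly, e.g.:

  q=_query_:"{!lucene df=name_t}(\"black dog\")" OR
    _query_:"{!lucene df=name_t}\"white cat\"~20"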


RE: How to use a regex search within a phrase query?

2016-05-23 Thread Erez Michalak
Good points, thanks Erick.

As you guessed, the use case is not in the main flow for the general user, but 
an advanced flow for a technical one.

Regarding the performance issue, I thought of a few optimizations for some 
expected expressions I need to support.
For instance, to work around the digit regexes in all my examples from the mail
below, I can simply index terms with '\d' instead of every digit (like '\d\d\d' 
for '123').
This enables a faster search as follows:
* search for "\d\d\d" instead of "/[0-9]{3}/"
* search for "\d\d\d \d\d\d\d" instead of "/[0-9]{3}/ /[0-9]{4}/"
* search for "\d\d\d example" instead of "/[0-9]{3}/ example"
Clearly, this approach supports a very limited set of expressions, at the
expense of an increase in the index size.
For the general case, though, regular expressions may indeed require a full 
index scan. Seems like all I can do in that case is to warn the user in advance 
that this may take a (long) while.
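
The index-time normalization described above could be wired up with a
pattern-replace filter; a sketch of such a field type (untested; in
particular, the escaping needed to emit a literal backslash in the
replacement may need checking):

  <fieldType name="text_digitnorm" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- rewrite every digit to the literal characters \d,
           so '123' is indexed as '\d\d\d' -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="[0-9]" replacement="\\d" replace="all"/>
    </analyzer>
  </fieldType>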

Any further ideas on how to reduce the performance hit and soften the bad
impact of a full index scan are welcome.
Erez


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, May 22, 2016 7:43 PM
To: solr-user 
Subject: Re: How to use a regex search within a phrase query?

Erez:

Before going too far down this path, understand that even if you can get this 
syntax to work, you're going to pay a _very_ significant performance hit if you 
have any decent-sized corpus. Conceptually, what happens is that all the terms 
that the regex matches are made into clauses. So let's take a very simple 
wildcard case:

field1 has two values f1A and f1B
field2 has two values, f2A and f2B

The result of asking for "field1:f1? field2:f2?" (as a phrase) is "field1:f1A 
field2:f2A"
OR
"field1:f1A field2:f2B"
OR
"field1:f1B field2:f2A"
OR
"field1:f1B field2:f2B"

which may take quite a while to execute, and that doesn't even include the time 
that it'll take to enumerate the terms in a field that match your regex, which 
can get very ugly if your regex is such that it has to examine _every_ term in 
the field, i.e. the entire terms list for the field for the entire corpus.

This might be an XY problem: what problem are you solving with regexes? Might 
you be better off constructing better analysis chains?
The reason I ask is that unless you have technical users, regexes are unlikely 
to even be used.

FWIW,
Erick


On Sun, May 22, 2016 at 8:19 AM, Erez Michalak  wrote:
> Thanks you Ahmet for the JIRA reference - it looks really promising and I'll 
> check it out.
>
> Regarding your question - once a piece of text is tokenized, it seems like 
> there is no way to perform a regex query across term boundaries. The pure 
> regex is good as long as I'm querying for a single term.
>
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, May 22, 2016 4:49 PM
> To: solr-user@lucene.apache.org; Erez Michalak 
> Subject: Re: How to use a regex search within a phrase query?
>
> Hi Erez,
>
> I don't think it is possible to combine regex with phrase out-of-the-box.
> However, there is https://issues.apache.org/jira/browse/LUCENE-5205 for the 
> task.
>
> Can't you define your query in terms of pure regex?
> something like /[0-9]{3} .* [0-9]{4}/
>
> ahmet
>
>
> On Sunday, May 22, 2016 1:37 PM, Erez Michalak  wrote:
> Hey,
> I'm developing a search application based on SOLR 5.3.1, and would like to 
> add to it regex search capabilities on a specific tokenized text field named 
> 'content'.
> Is it possible to combine the default regex syntax within a phrase query (and 
> moreover, within a proximity search)? If so, please instruct me how..
>
> Thanks in advance,
> Erez Michalak
>
> p.s.
> Maybe the following example will make my question clearer:
> The query content:/[0-9]{3}/ returns documents with (at least one) 3 digits 
> token as expected.
> However,
>
> * the query content:"/[0-9]{3}/ /[0-9]{4}/" doesn't match the 
> contents '123-1234' and '123 1234', even though they are tokenized to two 
> tokens ('123' and '1234') which individually match each part of the query
>
> * the query content:"/[0-9]{3}/ example" doesn't match the content 
> '123 example'
>
> * even the query content:"/[0-9]{3}/" (same as the query that works 
> but surrounded with quotation marks) doesn't return documents with 3 digits 
> token!
>
> * etc.
>
>
> 

Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-23 Thread Horváth Péter Gergely
Hi Steve,

Thank you very much for your inputs. Yes, I do know the aliasing mechanism
offered in Solr. I think the whole question boils down to one thing: how
much do you know about the data being stored -- and sometimes you know
nothing about that.

In some cases, you have to provide a generic solution for users to store
and query their own data. With Solr / Lucene backing your storage, you can
easily expose a restricted (but still powerful) subset of the Solr / Lucene
query syntax for querying user-defined data. Things however would start
getting complicated if you have to tell your customers that the field you
loaded as "foo" must be referred to as "foo_s" and the field you loaded as
"bar" must be referred to as "bar_i", since it contains a number, and so on...
Implementing the mapping in your application would be overly complex, as
you would have to maintain a mapping between the internal representation
("foo_s") and the query interface ("foo") and alias results from the
internal format to the format visible to the user ("foo_s" --> "foo"). I
think you get the idea.

I like the way Solr can use the name for specifying type: having a
configuration option (either global or at collection level) which tells
Solr to handle type postfixes slightly differently and strip the type
postfix automatically would be perfectly enough for this use case.

Imagine the following approach: if configured so, Solr would still create
the field based on the type postfix, but would strip it from the name: for
example, if a document is inserted with the fields "foo_s" and "bar_i", Solr
could create a string field named "foo" and a numeric field named "bar".

I think this solution would be both backwards compatible (has to be
explicitly enabled) and relatively simple to implement in the Solr code
base. I have created a Jira issue for the feature request:
https://issues.apache.org/jira/browse/SOLR-9150

What do you think?

Thanks,
Peter



2016-05-19 15:30 GMT+02:00 Steve Rowe :

> Peter,
>
> It’s an interesting idea.  Could you make a Solr JIRA?
>
> I don’t know where the field type specification would go, but providing a
> mechanism to specify field type for previously non-existent fields, outside
> of the field names themselves, seems useful.
>
> In the meantime, do you know about field aliasing?
>
> 1. You can get results back that rename fields to whatever you want: see
> the section “Field Name Aliases” here: <
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters>.
>
> 2. On the query side, eDisMax can perform aliasing so that user-specified
> field names in queries get mapped to one or more indexed fields: look for
> “alias” in <
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
> >.
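
For example, with eDisMax the per-field qf override acts as an alias, so a
user-facing "foo" can be mapped to the indexed "foo_s" at query time (a
sketch using the hypothetical field names from this thread):

  q=foo:red&defType=edismax&f.foo.qf=foo_s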
>
> --
> Steve
> www.lucidworks.com
>
> > On May 19, 2016, at 4:43 AM, Horváth Péter Gergely <
> peter.gergely.horv...@gmail.com> wrote:
> >
> > Hi Steve,
> >
> > Yes, I know the Schema API; however, I do not want to specify the field
> type
> > programmatically for every single field.
> >
> > I would like to be able to specify the field type when it is being added
> > (similar to the name postfixes, but without affecting the field names).
> >
> > Thanks,
> > Peter
> >
> >
> > 2016-05-17 17:08 GMT+02:00 Steve Rowe :
> >
> >> Hi Peter,
> >>
> >> Are you familiar with the Schema API?: <
> >> https://cwiki.apache.org/confluence/display/solr/Schema+API>
> >>
> >> You can use it to create fields, field types, etc. prior to ingesting
> your
> >> data.
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >>> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
> >> peter.gergely.horv...@gmail.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> By default Solr allows you to define the type of a dynamic field by
> >>> appending a post-fix to the name itself. E.g. creating a color_s field
> >>> instructs Solr to create a string field. My understanding is that if we
> >> do
> >>> this, all queries must refer the post-fixed field name as well. So
> >>> instead of a query like color:"red", we will have to write something
> like
> >>> color_s:"red" -- and so on for other field types as well.
> >>>
> >>> I am wondering if it is possible to specify the data type used for a
> >> field
> >>> in Solr 6.0.0, without having to modify the field name. (Or at least
> in a
> >>> way that would allow us to use the original field name) Do you have any
> >>> idea, how to achieve this? I am fine, if we have to specify the field
> >> type
> >>> during the insertion of a document, however, I do not want to keep
> using
> >>> post-fixes while running queries...
> >>>
> >>> Thanks,
> >>> Peter
> >>
> >>
>
>


Solr cloud with Grouping query gives inconsistent results

2016-05-23 Thread preeti kumari
Hi All,

I am using a grouping query with SolrCloud version 5.2.1.
The parameters added in my query are
&q=SIM*&group=true&group.field=amid&group.limit=1&group.main=true. But each
time I hit the query I get different results, i.e. the top 10 results are
different each time.

Why is it so ? Please help me with this.
Is there any way by which I can get consistent results from grouping query
in solr cloud.

Thanks
Preeti
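
One possible cause (an assumption, not a diagnosis): in SolrCloud, grouping
is only fully accurate when all documents of a group live on the same shard.
If that is the issue here, compositeId routing can co-locate them, e.g. by
prefixing each document id with the group key:

  "id": "AM123!doc1"    (every id with the AM123! prefix hashes to the same shard)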


Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Alessandro Benedetti
Let's add some additional details guys :

1) *Faceting*
Currently the facet method used is "enum", and it runs over roughly 20
fields.
Mainly it is used on low-cardinality fields, except one which has a
cardinality of 1000 terms.
I am aware of the famous Jira issue for the faceting regression:
https://issues.apache.org/jira/browse/SOLR-8096 .

Our index is indeed quite static (we index once per day) and the fields we
facet on are multi-valued (by schema definition but not in practice).
But since we use term enum as the method, I was not expecting to hit the
regression.
We currently see query times which are 30% worse than Solr 4.10.2.
Our next experiment will be to enable docValues for all the fields and
verify whether we get any benefit (switching the facet method to fc).
At the moment, switching to JSON faceting is not an option, as we would like
first to complete a transparent migration and then possibly add
improvements and refactoring in the future.
After that we will fix the schema to mark as multi-valued only what is
really multi-valued (do you know if this can have an effect? Is the wrong
schema definition enough to mess up facet performance, even if the fields
are effectively single-valued?)
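
The docValues experiment described above might look like this sketch in the
schema (the field is illustrative, and adding docValues requires reindexing):

  <field name="brand" type="string" indexed="true" stored="false"
         docValues="true" multiValued="true"/>

  facet=true&facet.field=brand&facet.method=fc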


2) *Field Collapsing*
Field collapsing performance seems much, much worse: something like 200 ms
(Solr 4) vs 1800 ms (Solr 6).
This is surprising, as I never heard about any regression in field collapsing.
I will investigate the internals of field collapsing in a bit more detail
to find out why the performance could be so degraded.
I will also check whether I can find any info in the mailing list or Jira.

&fq={!collapse field=string_field sort='TrieDoubleField asc'}

let me know if you faced something similar

Cheers

On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> I'm planning a migration from 4.10.2 to 6.0 .
> Because we generate the index on daily basis from scratch, we don't need
> to migrate the index but actually only migrate the server instances.
> With my team we were doing some experiments on some dev machines,
> basically comparing Solr 4.10.2 and Solr 6.0 to check any functional and
> performance regression in our use cases.
>
> After setting up two installation on the same machine ( switching on and
> off each version for doing comparison and experiments) we are verifying a
> degradation of the performances with Solr 6.
>
> Basically from a queryTime and throughput perspective Solr 6 is not
> performing as well as Solr 4.10.2 .
> Still need to start the proper investigations but this appears weird to me.
> Will proceed with all the analysis of the case and a deep study of our
> queries ( which anyway are mainly fq , faceting and grouping).
>
> Any suggestion in particular to start with ? Has anyone experienced a
> similar migration with similar experience ?
> I will anyway explore also the mailing list in search for similar cases.
>
> Cheers
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Sorting on child document field.

2016-05-23 Thread Pranaya Behera

Hi Mikhail,
Thanks. Missed it completely; I thought it would be handled by
default.


On Monday 23 May 2016 02:08 PM, Mikhail Khludnev wrote:

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

sort=score asc

On Mon, May 23, 2016 at 11:17 AM, Pranaya Behera 
mailto:pranaya.beh...@igp.com>> wrote:


Hi Mikhail,
 I saw the blog post tried to do that with parent
block query {!parent} as I dont have the reference for the parent
in the child to use in the {!join}. This is my result.
https://gist.github.com/shadow-fox/b728683b27a2f39d1b5e1aac54b7a8fb
. This yields me the results in desc even if I am using score=max.
How would I get it in the asc order ? Any suggestions where I am
messing up?


On Saturday 21 May 2016 12:03 AM, Mikhail Khludnev wrote:

Hello,

Check this

http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
Let me know if you need further comments.

On Thu, May 19, 2016 at 4:25 PM, Pranaya Behera
mailto:pranaya.beh...@igp.com>>
wrote:

Example would be:
Lets say that I have a product document with regular
fields as name,
price, desc, is_parent. it has child documents such as
CA:: fields as a,b,c,rank
and another child document as
CB:: fields as  x,y,z.
I am using the query where {!parent
which="is_parent:true"}a:some AND
b:somethingelse , here only CA child document is used for
searching no
other child document has been touched. this CA has rank
field. I want to
sort the parents using this field.
Product contains multiple CA documents. But the query
matches only one
document exactly.


On Thursday 19 May 2016 04:09 PM, Pranaya Behera wrote:

While searching in the lucene code base I found
/ToParentBlockJoinSortField /but its not in the solr
or even in solrj as
well. How would I use it with solrj as I can't find
anything to query it
through the UI.

On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:

Hi,

  How can I sort the results i.e. from a block
join parent query
using the field from child document field ?

Thanks & Regards

Pranaya Behera







--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics







hello i am solr cloud user! i have question!

2016-05-23 Thread 김두형
Actually, I want to insert some logging into SolrIndexSearcher. The place
where SolrIndexSearcher lives is solr-core.jar in dist, so I replaced the
old solr-core.jar in dist with my newly built one.
In solrconfig.xml I made Solr refer to this jar like below.

  <lib path="${solr.install.dir:}/dist/solr-core-5.5.0-SNAPSHOT.jar"/>
.
.
.

However, Solr did not pick up what I made (the solr-core.jar file). I stayed
up all night,
but I cannot find the proper way...

Would you please give me the way?


How to use "fq"

2016-05-23 Thread Steven White
Hi everyone,

I'm trying to figure out what's the best way for me to use "fq" when the
list of items is large (up to 200, but I have few cases with up to 1000).

My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200)

When I tested with up to 1000, I hit the "too many boolean clauses", so my
fix was to increase the value of maxBooleanClauses.  However, reading [1]
warns that increasing the value of maxBooleanClauses has negative impact.
The link offers an alternative usage like so:
fq=category:1&fq=category:2...  But I cannot use it because I need my "fq"
to be treated as OR (my default is set to AND).

I'm trying to understand what's the best way for me to code this so I
don't get a performance or memory hit.

Thanks

Steve

[1]
http://solr.pl/en/2011/12/19/do-i-have-to-look-for-maxbooleanclauses-when-using-filters/


Atomic updates and "stored"

2016-05-23 Thread Mark Robinson
Hi,

I have some 150 fields in my schema out of which about 100 are dynamic
fields which I am not storing (stored="false").
In case I need to do an atomic update to one or two fields which belong to
the stored list of fields, do I need to change my dynamic fields (100 or so
now not "stored") to stored="true"?

If so, wouldn't it considerably increase index size and affect performance
negatively?

Is there any way currently to do partial/atomic updates to one or two
fields (which I will make stored="true") without having to change my
currently stored="false" fields to stored="true" just
to accommodate atomic updates?

Could someone please give your suggestions?

Thanks!
Mark.
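
For context, an atomic update is expressed like this sketch (hypothetical
field names):

  curl http://localhost:8983/solr/core/update -H 'Content-Type: application/json' \
    -d '[{"id":"1", "price":{"set":42}}]'

Under the hood Solr re-reads the existing document from its stored fields,
applies the change, and re-indexes the whole document, which is why fields
that are not stored (copyField destinations aside) do not survive an atomic
update.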


Re: How to use "fq"

2016-05-23 Thread Erik Hatcher
Try the {!terms} query parser.  That should make it work well for you.  Let us 
know how it does.  

   Erik
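
A sketch of what that looks like for the category filter from the question
(the terms parser takes a comma-separated list and does not build Boolean
clauses, so the maxBooleanClauses limit should not apply):

  fq={!terms f=category}1,2,3,4,5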

> On May 23, 2016, at 08:52, Steven White  wrote:
> 
> Hi everyone,
> 
> I'm trying to figure out what's the best way for me to use "fq" when the
> list of items is large (up to 200, but I have few cases with up to 1000).
> 
> My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200)
> 
> When I tested with up to 1000, I hit the "too many boolean clauses", so my
> fix was to increase the value of maxBooleanClauses.  However, reading [1]
> warns that increasing the value of maxBooleanClauses has negative impact.
> The link offers an alternative usage like so:
> fq=category:1&fq=category:2...  But I cannot use it because I need my "fq"
> to be treated as OR (my default is set to AND).
> 
> I'm trying to understand what's the best way for me to coded this so I
> don't get a performance or memory hit.
> 
> Thanks
> 
> Steve
> 
> [1]
> http://solr.pl/en/2011/12/19/do-i-have-to-look-for-maxbooleanclauses-when-using-filters/


Re: SolrCloud increase replication factor

2016-05-23 Thread Tom Evans
On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
 wrote:
> Hi,
>
> I have a SolrCloud 6.0 setup and created my collection with a
> replication factor of 1. Now I want to increase the replication factor
> but would like the replicas for the same shard to be on different nodes,
> so that my collection does not fail when one node fails. I tried two
> approaches so far:
>
> 1) When I use the collections API with the MODIFYCOLLECTION action [1] I
> can set the replication factor but that did not result in the creation
> of additional replicas. The Solr Admin UI showed that my replication
> factor changed but otherwise nothing happened. A reload of the
> collection did also result in no change.
>
> 2) Using the ADDREPLICA action [2] from the collections API I have to
> add the replicas to the shard individually, which is a bit more
> complicated but otherwise worked. During testing this did however at
> least once result in the replica being created on the same node. My
> collection was split in 4 shards and for 2 of them all replicas ended up
> on the same node.
>
> So is the only option to create the replicas manually and also pick the
> nodes manually or is the perceived behavior wrong?
>
> regards,
> Hendrik
>
> [1]
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-modifycoll
> [2]
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica


With ADDREPLICA, you can specify the node to create the replica on. If
you are using a script to increase/remove replicas, you can simply
incorporate the logic you desire in to your script - you can also use
CLUSTERSTATUS to get a list of nodes/collections/shards etc in order
to inform the logic in the script. This is the approach we took, we
have a fabric script to add/remove extra nodes to/from the cluster, it
works well.
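
For example, ADDREPLICA accepts a node parameter (a sketch with hypothetical
host and collection names):

  curl "http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host2:8983_solr"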

The alternative is to put the logic into Solr itself, using what Solr
calls a "snitch" to define the rules on where replicas are created.
The snitch is specified at collection creation time, or you can use
MODIFYCOLLECTION to set it after the fact. See this wiki page for
details:

https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
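
For instance, a rule that keeps the replicas of any shard on distinct nodes
could look like this sketch (rule syntax per that page; URL-encode the value
when passing it to CREATE or MODIFYCOLLECTION):

  rule=shard:*,replica:<2,node:*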

Cheers

Tom


Re: How to use "fq"

2016-05-23 Thread Scott Chu

Yonik has a very good article about the terms QP:

Solr Terms Query for matching many terms - Solr 'n Stuff
http://yonik.com/solr-terms-query/


Scott Chu,scott@udngroup.com
2016/5/23 (週一)
- Original Message - 
From: Erik Hatcher 
To: solr-user 
CC: 
Date: 2016/5/23 (週一) 21:14
Subject: Re: How to use "fq"


Try the {!terms} query parser. That should make it work well for you. Let us 
know how it does. 

   Erik 

> On May 23, 2016, at 08:52, Steven White  wrote: 
> 
> Hi everyone, 
> 
> I'm trying to figure out what's the best way for me to use "fq" when the 

> list of items is large (up to 200, but I have few cases with up to 1000). 
> 
> My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200) 
> 
> When I tested with up to 1000, I hit the "too many boolean clauses", so my 
> fix was to increase the value of maxBooleanClauses. However, reading [1] 
> warns that increasing the value of maxBooleanClauses has negative impact. 
> The link offers an alternative usage like so: 
> fq=category:1&fq=category:2... But I cannot use it because I need my "fq" 
> to be treated as OR (my default is set to AND). 
> 
> I'm trying to understand what's the best way for me to coded this so I 
> don't get a performance or memory hit. 
> 
> Thanks 
> 
> Steve 
> 
> [1] 
> http://solr.pl/en/2011/12/19/do-i-have-to-look-for-maxbooleanclauses-when-using-filters/
>  




Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
Were you using the sort param or min/max param in Solr 4 to select the
group head? The sort work came later and I'm not sure how it compares in
performance to the min/max param.

Since you are collapsing on a string field you can use the top_fc hint,
which will use a top-level field cache for the collapse. This is faster at
query time than the default, which uses a MultiDocValues ordinal map.

The docs cover the top_fc hint.
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
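
Applied to the filter from earlier in this thread, the hint would look like
(field names as in Alessandro's example):

  fq={!collapse field=string_field hint=top_fc sort='TrieDoubleField asc'}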



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> Let's add some additional details guys :
>
> 1) *Faceting*
> Currently the facet method used is "enum" and it runs over 20 fields more
> or less.
> Mainly using it on low cardinality fields except one which has a
> cardinality of 1000 terms.
> I am aware of the famous Jira related faceting regression :
> https://issues.apache.org/jira/browse/SOLR-8096 .
>
> Our index is indeed quite static ( we index once per day) and the fields we
> facet on are multi-valued ( by schema definition but not in practise) .
> But we use Term Enum as method so i was not expecting to hit the
> regression.
> We currently see  query times which are 30% worse than Solr 4.10.2 .
> Our next experiment will be to enable docValues for all the fields and
> verify if we get any benefit ( switching the facet method to fc) .
> At the moment, switching to json faceting is not an option as we would like
> first to proceed with a transparent migration and then possibly add
> improvements and refactor in the future.
> Following will be to fix the schema to set as multi valued only what is
> really multi-valued ( do you know if this can affect ? the wrong schema
> definition is enough to mess up the facet performance ? even if then the
> fields are single valued ?)
>
>
> 2) *Field Collapsing*
> Field collapsing performance seems much, much worse, something like 200 ms
> ( Solr 4) vs 1800 ms ( Solr 6) .
> This is suprising as I never heard about any regression in field
> collapsing.
> I will investigate a little bit more in details about the internals of the
> field collapsing and why the performance could be so degraded.
> I will also verify if I find any info in the mailing list or Jira.
>
> &fq={!collapse field=string_field sort='TrieDoubleField asc'}
>
> let me know if you faced something similar
>
> Cheers
>
> On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > I'm planning a migration from 4.10.2 to 6.0 .
> > Because we generate the index on daily basis from scratch, we don't need
> > to migrate the index but actually only migrate the server instances.
> > With my team we were doing some experiments on some dev machines,
> > basically comparing Solr 4.10.2 and Solr 6.0 to check any functional and
> > performance regression in our use cases.
> >
> > After setting up two installation on the same machine ( switching on and
> > off each version for doing comparison and experiments) we are verifying a
> > degradation of the performances with Solr 6.
> >
> > Basically from a queryTime and throughput perspective Solr 6 is not
> > performing as well as Solr 4.10.2 .
> > Still need to start the proper investigations but this appears weird to
> me.
> > Will proceed with all the analysis of the case and a deep study of our
> > queries ( which anyway are mainly fq , faceting and grouping).
> >
> > Any suggestion in particular to start with ? Has anyone experienced a
> > similar migration with similar experience ?
> > I will anyway explore also the mailing list in search for similar cases.
> >
> > Cheers
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: hello i am solr cloud user! i have question!

2016-05-23 Thread Shawn Heisey
On 5/23/2016 6:35 AM, 김두형 wrote:
> actually, i want to insert some logs into solrindexsearcher. so the place
> where solrindexsearcher is solr-core.jar in dist. i replace new made
> solr-core.jar with old solr-core.jar in dist.
> in solrconfig i made this solrconfig refered this jar like below.
>
> <lib path="${solr.install.dir:}/dist/solr-core-5.5.0-SNAPSHOT.jar"/>

Assuming 5.3 or later (your filename says 5.5.0), you have to replace
the solr-core jar in the server/solr-webapp/webapp/WEB-INF/lib
directory.  It does not need to have the same filename when you do this,
but if it has a different name, you must delete the old one.
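
A sketch of the whole procedure, assuming a default Solr 5.5 install layout
and a custom build of solr-core (paths are hypothetical):

  cd /opt/solr                # hypothetical install dir
  rm server/solr-webapp/webapp/WEB-INF/lib/solr-core-5.5.0.jar
  cp ~/build/solr-core-5.5.0-SNAPSHOT.jar server/solr-webapp/webapp/WEB-INF/lib/
  bin/solr restart -p 8983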

Jetty loads the solr-core jar from the directory mentioned above when it
starts Solr.  The classes that actually process solrconfig.xml are in
the solr-core jar ... so the original version is already loaded when
your new config line is processed.

Thanks,
Shawn



Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
Also, I wrote a guide on Solr 5 Collapse/Expand performance that used to
be on Heliosearch.org. It's now only available through the magic of
the Wayback machine. What's not covered is the sort param, which came later.

Here it is:

http://web.archive.org/web/20150709154420/http://heliosearch.org/solr5-collapse-expand

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein  wrote:

> Were you using the sort param or min/max param in Solr 4 to select the
> group head? The sort work came later and I'm not sure how it compares in
> performance to the min/max param.
>
> Since you are collapsing on a string field you can use the top_fc hint
> which will use a top level field cache for the collapse. This is faster at
> query time then the default which uses MultiDocValue ordinal map.
>
> The docs cover the top_fc hint.
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
>> Let's add some additional details guys :
>>
>> 1) *Faceting*
>> Currently the facet method used is "enum" and it runs over 20 fields more
>> or less.
>> Mainly using it on low cardinality fields except one which has a
>> cardinality of 1000 terms.
>> I am aware of the famous Jira related faceting regression :
>> https://issues.apache.org/jira/browse/SOLR-8096 .
>>
>> Our index is indeed quite static ( we index once per day) and the fields
>> we
>> facet on are multi-valued ( by schema definition but not in practise) .
>> But we use Term Enum as method so i was not expecting to hit the
>> regression.
>> We currently see  query times which are 30% worse than Solr 4.10.2 .
>> Our next experiment will be to enable docValues for all the fields and
>> verify if we get any benefit ( switching the facet method to fc) .
>> At the moment, switching to json faceting is not an option as we would
>> like
>> first to proceed with a transparent migration and then possibly add
>> improvements and refactor in the future.
>> Following will be to fix the schema to set as multi valued only what is
>> really multi-valued ( do you know if this can affect ? the wrong schema
>> definition is enough to mess up the facet performance ? even if then the
>> fields are single valued ?)
>>
>>
>> 2) *Field Collapsing*
>> Field collapsing performance seems much, much worse, something like 200 ms
>> ( Solr 4) vs 1800 ms ( Solr 6) .
>> This is suprising as I never heard about any regression in field
>> collapsing.
>> I will investigate a little bit more in details about the internals of the
>> field collapsing and why the performance could be so degraded.
>> I will also verify if I find any info in the mailing list or Jira.
>>
>> &fq={!collapse field=string_field sort='TrieDoubleField asc'}
>>
>> let me know if you faced something similar
>>
>> Cheers
>>
>> On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
>> abenede...@apache.org> wrote:
>>
>> > I'm planning a migration from 4.10.2 to 6.0 .
>> > Because we generate the index on daily basis from scratch, we don't need
>> > to migrate the index but actually only migrate the server instances.
>> > With my team we were doing some experiments on some dev machines,
>> > basically comparing Solr 4.10.2 and Solr 6.0 to check any functional and
>> > performance regression in our use cases.
>> >
>> > After setting up two installation on the same machine ( switching on and
>> > off each version for doing comparison and experiments) we are verifying
>> a
>> > degradation of the performances with Solr 6.
>> >
>> > Basically from a queryTime and throughput perspective Solr 6 is not
>> > performing as well as Solr 4.10.2 .
>> > Still need to start the proper investigations but this appears weird to
>> me.
>> > Will proceed with all the analysis of the case and a deep study of our
>> > queries ( which anyway are mainly fq , faceting and grouping).
>> >
>> > Any suggestion in particular to start with ? Has anyone experienced a
>> > similar migration with similar experience ?
>> > I will anyway explore also the mailing list in search for similar cases.
>> >
>> > Cheers
>> >
>> > --
>> > --
>> >
>> > Benedetti Alessandro
>> > Visiting card : http://about.me/alessandro_benedetti
>> >
>> > "Tyger, tyger burning bright
>> > In the forests of the night,
>> > What immortal hand or eye
>> > Could frame thy fearful symmetry?"
>> >
>> > William Blake - Songs of Experience -1794 England
>> >
>>
>>
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>


Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
For the exact syntax of the top_fc hint, use the official docs. The blog
uses an upper-case hint, but that was changed to a lower-case hint.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 2:56 PM, Joel Bernstein  wrote:

> Also I wrote a guide for Solr 5 Collapsing/Expand performance, that use to
> be on Heliosearch.org. It's now long available accept through the magic of
> the Wayback machine. What's not covered is the sort param, which came later.
>
> Here it is:
>
>
> http://web.archive.org/web/20150709154420/http://heliosearch.org/solr5-collapse-expand
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein 
> wrote:
>
>> Were you using the sort param or min/max param in Solr 4 to select the
>> group head? The sort work came later and I'm not sure how it compares in
>> performance to the min/max param.
>>
>> Since you are collapsing on a string field you can use the top_fc hint
>> which will use a top level field cache for the collapse. This is faster at
>> query time then the default which uses MultiDocValue ordinal map.
>>
>> The docs cover the top_fc hint.
>> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
>> abenede...@apache.org> wrote:
>>
>>> Let's add some additional details guys :
>>>
>>> 1) *Faceting*
>>> Currently the facet method used is "enum" and it runs over 20 fields more
>>> or less.
>>> Mainly using it on low cardinality fields except one which has a
>>> cardinality of 1000 terms.
>>> I am aware of the famous Jira related faceting regression :
>>> https://issues.apache.org/jira/browse/SOLR-8096 .
>>>
>>> Our index is indeed quite static ( we index once per day) and the fields
>>> we
>>> facet on are multi-valued ( by schema definition but not in practise) .
>>> But we use Term Enum as method so i was not expecting to hit the
>>> regression.
>>> We currently see  query times which are 30% worse than Solr 4.10.2 .
>>> Our next experiment will be to enable docValues for all the fields and
>>> verify if we get any benefit ( switching the facet method to fc) .
>>> At the moment, switching to json faceting is not an option as we would
>>> like
>>> first to proceed with a transparent migration and then possibly add
>>> improvements and refactor in the future.
>>> Following will be to fix the schema to set as multi valued only what is
>>> really multi-valued ( do you know if this can affect ? the wrong schema
>>> definition is enough to mess up the facet performance ? even if then the
>>> fields are single valued ?)
>>>
>>>
>>> 2) *Field Collapsing*
>>> Field collapsing performance seems much, much worse, something like 200
>>> ms
>>> ( Solr 4) vs 1800 ms ( Solr 6) .
>>> This is suprising as I never heard about any regression in field
>>> collapsing.
>>> I will investigate a little bit more in details about the internals of
>>> the
>>> field collapsing and why the performance could be so degraded.
>>> I will also verify if I find any info in the mailing list or Jira.
>>>
>>> &fq={!collapse field=string_field sort='TrieDoubleField asc'}
>>>
>>> let me know if you faced something similar
>>>
>>> Cheers
>>>
>>> On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
>>> abenede...@apache.org> wrote:
>>>
>>> > I'm planning a migration from 4.10.2 to 6.0 .
>>> > Because we generate the index on daily basis from scratch, we don't
>>> need
>>> > to migrate the index but actually only migrate the server instances.
>>> > With my team we were doing some experiments on some dev machines,
>>> > basically comparing Solr 4.10.2 and Solr 6.0 to check any functional
>>> and
>>> > performance regression in our use cases.
>>> >
>>> > After setting up two installation on the same machine ( switching on
>>> and
>>> > off each version for doing comparison and experiments) we are
>>> verifying a
>>> > degradation of the performances with Solr 6.
>>> >
>>> > Basically from a queryTime and throughput perspective Solr 6 is not
>>> > performing as well as Solr 4.10.2 .
>>> > Still need to start the proper investigations but this appears weird
>>> to me.
>>> > Will proceed with all the analysis of the case and a deep study of our
>>> > queries ( which anyway are mainly fq , faceting and grouping).
>>> >
>>> > Any suggestion in particular to start with ? Has anyone experienced a
>>> > similar migration with similar experience ?
>>> > I will anyway explore also the mailing list in search for similar
>>> cases.
>>> >
>>> > Cheers
>>> >
>>> > --
>>> > --
>>> >
>>> > Benedetti Alessandro
>>> > Visiting card : http://about.me/alessandro_benedetti
>>> >
>>> > "Tyger, tyger burning bright
>>> > In the forests of the night,
>>> > What immortal hand or eye
>>> > Could frame thy fearful symmetry?"
>>> >
>>> > William Blake - Songs of Experience -1794 England
>>> >
>>>
>>>
>>>
>>> --
>>> -

Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Alessandro Benedetti
Hi Joel,
thanks for the reply. Actually we were not using field collapsing before;
we basically want to replace grouping with it.
The grouping performance between Solr 4 and 6 is basically comparable,
so it's surprising that I got such a big degradation with field collapsing.

So basically the comparisons we did were based on the Solr 4 queries,
extracted from logs and modified slightly to include the field collapsing
parameter.

To build the tests to compare Solr 4.10.2 to Solr 6 we basically proceeded
in this way :

1) install Solr 4.10.2 and Solr 6.0.0
2) migrate the index with the related Lucene tool (4.10.2 -> 5.5.0 -> 6.0)
3) switch the two instances on and off, repeating the tests both with cold
instances and warm instances.

This means that the queries look the same.
I have not double-checked the results, only the timings.
I will provide additional feedback on whether the queries produce
comparable results as well.

Regarding your suggestion about top_fc: thanks, I will try that.
I actually discovered it a little bit after I posted to the mailing list (I
think exactly from another post of yours :) )

Not sure if setting up docValues for the field we use to collapse could
give some benefit as well.

I'll keep you updated.

Cheers

On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein  wrote:

> Were you using the sort param or min/max param in Solr 4 to select the
> group head? The sort work came later and I'm not sure how it compares in
> performance to the min/max param.
>
> Since you are collapsing on a string field you can use the top_fc hint
> which will use a top level field cache for the collapse. This is faster at
> query time then the default which uses MultiDocValue ordinal map.
>
> The docs cover the top_fc hint.
>
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > Let's add some additional details guys :
> >
> > 1) *Faceting*
> > Currently the facet method used is "enum" and it runs over 20 fields more
> > or less.
> > Mainly using it on low cardinality fields except one which has a
> > cardinality of 1000 terms.
> > I am aware of the famous Jira related faceting regression :
> > https://issues.apache.org/jira/browse/SOLR-8096 .
> >
> > Our index is indeed quite static ( we index once per day) and the fields
> we
> > facet on are multi-valued ( by schema definition but not in practice) .
> > But we use Term Enum as the method so I was not expecting to hit the
> > regression.
> > We currently see query times which are 30% worse than Solr 4.10.2 .
> > Our next experiment will be to enable docValues for all the fields and
> > verify if we get any benefit ( switching the facet method to fc) .
> > At the moment, switching to json faceting is not an option as we would
> like
> > first to proceed with a transparent migration and then possibly add
> > improvements and refactor in the future.
> > Following will be to fix the schema to set as multi valued only what is
> > really multi-valued ( do you know if this can affect ? the wrong schema
> > definition is enough to mess up the facet performance ? even if then the
> > fields are single valued ?)
> >
> >
> > 2) *Field Collapsing*
> > Field collapsing performance seems much, much worse, something like 200
> ms
> > ( Solr 4) vs 1800 ms ( Solr 6) .
> > > This is surprising as I never heard about any regression in field
> > collapsing.
> > I will investigate a little bit more in details about the internals of
> the
> > field collapsing and why the performance could be so degraded.
> > I will also verify if I find any info in the mailing list or Jira.
> >
> > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
> >
> > let me know if you faced something similar
> >
> > Cheers
> >
> > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > I'm planning a migration from 4.10.2 to 6.0 .
> > > Because we generate the index on daily basis from scratch, we don't
> need
> > > to migrate the index but actually only migrate the server instances.
> > > With my team we were doing some experiments on some dev machines,
> > > basically comparing Solr 4.10.2 and Solr 6.0 to check any functional
> and
> > > performance regression in our use cases.
> > >
> > > After setting up two installation on the same machine ( switching on
> and
> > > off each version for doing comparison and experiments) we are
> verifying a
> > > degradation of the performances with Solr 6.
> > >
> > > Basically from a queryTime and throughput perspective Solr 6 is not
> > > performing as well as Solr 4.10.2 .
> > > Still need to start the proper investigations but this appears weird to
> > me.
> > > Will proceed with all the analysis of the case and a deep study of our
> > > queries ( which anyway are mainly fq , faceting and grouping).
> > >
> > > Any suggestion in particular to start with ? Has anyone experienced a
> > > similar migration with similar experience ?

What to do best when expanding from 2 nodes to 4 nodes? [scottchu]

2016-05-23 Thread Scott Chu
I just created a 90gb index collection with 1 shard and 2 replicas on 2 nodes. 
I am to migrate from 2 nodes to 4 nodes. I am wondering what's the best strategy 
to split this single shard? Furthermore, if I am ok to reindex, what's the best 
adequate experienced value of numShards and replicationFactor? Lastly, if I add 
new shard(s), I think there's no other way but reindex if I want my data to be 
evenly distributed into every shard, right?

Scott Chu,scott@udngroup.com
2016/5/23 (週一)

P.S. For those who are curious of why I add [scottchu] in subject, the reason 
is that I want my email filter to route those emails that answer to my question 
to specific folder.




Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Joel Bernstein
If you can make min/max work for you instead of sort then it should be
faster, but I haven't spent time comparing the performance.

But if you're using the top_fc with the min/max param the performance
between Solr 4 & Solr 6 should be very close as the data structures behind
them are the same.
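
For reference, the two variants look like this (field names are hypothetical;
string_field is the collapse field and price_d a numeric field):

    # sort-based group head selection (the newer, more flexible option):
    fq={!collapse field=string_field sort='price_d asc'}

    # min/max-based selection with the top-level FieldCache hint:
    fq={!collapse field=string_field min=price_d hint=top_fc}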






Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti  wrote:

> Hi Joel,
> thanks for the reply, actually we were not using field collapsing before,
> we basically want to replace grouping with that.
> The grouping performance between Solr 4 and 6 is basically comparable.
> It's surprising that I got such a big degradation with field collapsing.
>
> So basically the comparison we did were based on the Solr4 queries ,
> extracted from logs, and modified slightly to include field collapsing
> parameter.
>
> To build the tests to compare Solr 4.10.2 to Solr 6 we basically proceeded
> in this way :
>
> 1) install Solr 4.10.2 and Solr 6.0.0
> 2) migrate the index with the related lucene tool ( 4.10.2 -> 5.5.0 -> Solr
> 6.0 )
> 3) switch on/off the 2 instances and repeating the tests both with cold
> instances and warm instances.
>
> This means that the query looks the same.
> I have not double checked the results but only the timings.
> I will provide additional feedback to see if the query are producing
> comparable results as well.
>
> Related your suggestion about the top_fc, thanks, I will try that .
> I actually discovered that a little bit after I posted the mailing list ( I
> think exactly from another post of yours :) )
>
> Not sure if setting up docValues for the field we use to collapse could
> give some benefit as well.
>
> I keep you updated,
>
> Cheers
>
> On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein 
> wrote:
>
> > Were you using the sort param or min/max param in Solr 4 to select the
> > group head? The sort work came later and I'm not sure how it compares in
> > performance to the min/max param.
> >
> > Since you are collapsing on a string field you can use the top_fc hint
> > which will use a top level field cache for the collapse. This is faster
> at
> > query time than the default, which uses the MultiDocValues ordinal map.
> >
> > The docs cover the top_fc hint.
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > Let's add some additional details guys :
> > >
> > > 1) *Faceting*
> > > Currently the facet method used is "enum" and it runs over 20 fields
> more
> > > or less.
> > > Mainly using it on low cardinality fields except one which has a
> > > cardinality of 1000 terms.
> > > I am aware of the famous Jira related faceting regression :
> > > https://issues.apache.org/jira/browse/SOLR-8096 .
> > >
> > > Our index is indeed quite static ( we index once per day) and the
> fields
> > we
> > > facet on are multi-valued ( by schema definition but not in practice) .
> > > But we use Term Enum as the method so I was not expecting to hit the
> > > regression.
> > > We currently see query times which are 30% worse than Solr 4.10.2 .
> > > Our next experiment will be to enable docValues for all the fields and
> > > verify if we get any benefit ( switching the facet method to fc) .
> > > At the moment, switching to json faceting is not an option as we would
> > like
> > > first to proceed with a transparent migration and then possibly add
> > > improvements and refactor in the future.
> > > Following will be to fix the schema to set as multi valued only what is
> > > really multi-valued ( do you know if this can affect ? the wrong schema
> > > definition is enough to mess up the facet performance ? even if then
> the
> > > fields are single valued ?)
> > >
> > >
> > > 2) *Field Collapsing*
> > > Field collapsing performance seems much, much worse, something like 200
> > ms
> > > ( Solr 4) vs 1800 ms ( Solr 6) .
> > > This is surprising as I never heard about any regression in field
> > > collapsing.
> > > I will investigate a little bit more in details about the internals of
> > the
> > > field collapsing and why the performance could be so degraded.
> > > I will also verify if I find any info in the mailing list or Jira.
> > >
> > > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
> > >
> > > let me know if you faced something similar
> > >
> > > Cheers
> > >
> > > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
> > > abenede...@apache.org> wrote:
> > >
> > > > I'm planning a migration from 4.10.2 to 6.0 .
> > > > Because we generate the index on daily basis from scratch, we don't
> > need
> > > > to migrate the index but actually only migrate the server instances.
> > > > With my team we were doing some experiments on some dev machines,
> > > > basically comparing Solr 4.10.2 and Solr 6.0 to check any functional
> > and
> > > > performance regression in our use cases.

Streaming expression not hitting all replicas?

2016-05-23 Thread Timothy Potter
I've seen docs and diagrams that seem to indicate a streaming
expression can utilize all replicas of a shard but I'm seeing only 1
replica per shard (I have 2) being queried.

All replicas are on the same host for my experimentation, could that
be the issue? What are the circumstances where all replicas will be
utilized?

Or is this a mis-understanding of the docs?


Re: How to use "fq"

2016-05-23 Thread Steven White
Thank you Erik and Scott.  {!terms} did the job!!  I tested like so:
fq={!terms f=category}1,2,3,4,...N

I read that {!terms} treats the terms in the list as OR, if I have a need
to force AND on my terms, how do I do that?

Steve


On Mon, May 23, 2016 at 9:39 AM, Scott Chu  wrote:

>
> Yonik has a very good article about the terms qp:
>
> Solr Terms Query for matching many terms - Solr 'n Stuff
> http://yonik.com/solr-terms-query/
>
>
> Scott Chu,scott@udngroup.com
> 2016/5/23 (週一)
> - Original Message -
> From: Erik Hatcher
> To: solr-user
> CC:
> Date: 2016/5/23 (週一) 21:14
> Subject: Re: How to use "fq"
>
>
> Try the {!terms} query parser. That should make it work well for you. Let
> us know how it does.
>
>Erik
>
> > On May 23, 2016, at 08:52, Steven White  wrote:
> >
> > Hi everyone,
> >
> > I'm trying to figure out what's the best way for me to use "fq" when the
>
> > list of items is large (up to 200, but I have few cases with up to 1000).
> >
> > My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200)
> >
> > When I tested with up to 1000, I hit the "too many boolean clauses", so
> my
> > fix was to increase the value of maxBooleanClauses. However, reading [1]
> > warns that increasing the value of maxBooleanClauses has negative impact.
> > The link offers an alternative usage like so:
> > fq=category:1&fq=category:2... But I cannot use it because I need my "fq"
> > to be treated as OR (my default is set to AND).
> >
> > I'm trying to understand what's the best way for me to coded this so I
> > don't get a performance or memory hit.
> >
> > Thanks
> >
> > Steve
> >
> > [1]
> >
> http://solr.pl/en/2011/12/19/do-i-have-to-look-for-maxbooleanclauses-when-using-filters/
>
>


Solr 6.0 Parallel SQL

2016-05-23 Thread Steven White
Hi everyone,

I'm reading on Solr's Parallel SQL.  I see some good examples but not much
on how to set it up and what the limitations are.  My reading on it is that
I can use Parallel SQL to send SQL syntax to Solr to search, but:

1) Does this mean all of SQL's query statements are supported, no matter
how complex?
2) If yes, doesn't this mean I have to index into Solr all of my tables in
the DB?
3) If yes, how do I go about indexing my tables into Solr (i.e.: don't I
have to map each table into a Solr document mapping each column into a Solr
field, and what about the data-types)?

Thanks.

Steve


Re: How to stop searches to solr while full data import is going in SOLR

2016-05-23 Thread Jeff Wartes
The PingRequestHandler contains support for a file check, which allows you to 
control whether the ping request succeeds based on the presence/absence of a 
file on disk on the node.

http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html

I suppose you could try using this to configure a load balancer.
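
A minimal sketch of the relevant solrconfig.xml entry (the file name is
arbitrary):

    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
      <str name="healthcheckFile">server-enabled.txt</str>
    </requestHandler>

With that in place, /admin/ping?action=enable creates the file,
?action=disable removes it, and a plain /admin/ping fails while the file is
absent, which is what the load balancer would key off.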


On 5/20/16, 3:21 PM, "Erick Erickson"  wrote:

>There really isn't any good way to do this built in that I know of.
>
>If your "clusters" are separate Solr collections (SolrCloud), you
>can use collection aliasing to point queries at one or the other
>atomically.
>
>This presumes you have some control over when DIH runs
>however. The idea is that you have an alias that you use for
>searching. You switch this alias to the "cold" cluster (the one
>that's not changing) and trigger a DIH run to the "hot" cluster.
>Once that's done, change the alias to point to it or, perhaps,
>both.
>
>Best,
>Erick
>
>On Wed, May 18, 2016 at 11:27 PM, preeti kumari  wrote:
>> Hi,
>>
>> I am using solr 5.2.1. I have two clusters Primary A and Primary B.
>> I was pinging servers to check whether they are up or not to route the
>> searches to working cluster A or B.
>>
>> But while I am running a full data import in primary cluster A, not all
>> the data is there yet, and pinging the servers will not help as my solr
>> servers would still be responding.
>>
>> But I want my searches to go to Cluster B instead of A.
>>
>> Please help me with a way from solr which can say solr not ready to support
>> searches as full data import is running there.
>>
>> Thanks
>> Preeti



Re: SolrCloud increase replication factor

2016-05-23 Thread Jeff Wartes

https://github.com/whitepages/solrcloud_manager was designed to provide some 
easier operations for common kinds of cluster operation. 
It hasn’t been tested with 6.0 though, so if you try it, please let me know 
your experience.


On 5/23/16, 6:28 AM, "Tom Evans"  wrote:

>On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
> wrote:
>> Hi,
>>
>> I have a SolrCloud 6.0 setup and created my collection with a
>> replication factor of 1. Now I want to increase the replication factor
>> but would like the replicas for the same shard to be on different nodes,
>> so that my collection does not fail when one node fails. I tried two
>> approaches so far:
>>
>> 1) When I use the collections API with the MODIFYCOLLECTION action [1] I
>> can set the replication factor but that did not result in the creation
>> of additional replicas. The Solr Admin UI showed that my replication
>> factor changed but otherwise nothing happened. A reload of the
>> collection did also result in no change.
>>
>> 2) Using the ADDREPLICA action [2] from the collections API I have to
>> add the replicas to the shard individually, which is a bit more
>> complicated but otherwise worked. During testing this did however at
>> least once result in the replica being created on the same node. My
>> collection was split in 4 shards and for 2 of them all replicas ended up
>> on the same node.
>>
>> So is the only option to create the replicas manually and also pick the
>> nodes manually or is the perceived behavior wrong?
>>
>> regards,
>> Hendrik
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-modifycoll
>> [2]
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>
>
>With ADDREPLICA, you can specify the node to create the replica on. If
>you are using a script to increase/remove replicas, you can simply
>incorporate the logic you desire in to your script - you can also use
>CLUSTERSTATUS to get a list of nodes/collections/shards etc in order
>to inform the logic in the script. This is the approach we took, we
>have a fabric script to add/remove extra nodes to/from the cluster, it
>works well.
>
>The alternative is to put the logic in to Solr itself, using what Solr
>calls a "snitch" to define the rules on where replicas are created.
>The snitch is specified at collection creation time, or you can use
>MODIFYCOLLECTION to set it after the fact. See this wiki patch for
>details:
>
>https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>
>Cheers
>
>Tom



Re: How to use "fq"

2016-05-23 Thread Erick Erickson
Steven:

I'm not sure you can; the terms query parser is built to
OR things together.

You might be able to use some of the nested query stuff.
Or, assuming you have an _additional_ fq clause
you want to use, just use it as:
fq={!terms f=category}1,2,3,4,...N&fq=whatever

Then you're taking advantage of the default
behavior of multiple "fq" clauses.

And I'd do one other thing, I'd add "cache=false" (at least I'm
pretty sure this works):
fq={!terms f=category cache=false}1,2,3,4,...N
on the assumption that it's highly unlikely that you'll send in
another fq clause _exactly_ like the one you're creating.
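
Putting it together, the request would look something like this (the second
filter is made up):

    q=*:*&fq={!terms f=category cache=false}1,2,3,4&fq=type:active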

Best,
Erick

On Mon, May 23, 2016 at 9:41 AM, Steven White  wrote:
> Thank you Erik and Scott.  {!terms} did the job!!  I tested like so:
> fq={!terms f=category}1,2,3,4,...N
>
> I read that {!terms} treats the terms in the list as OR, if I have a need
> to force AND on my terms, how do I do that?
>
> Steve
>
>
> On Mon, May 23, 2016 at 9:39 AM, Scott Chu  wrote:
>
>>
>> Yonik has a very good article about the terms qp:
>>
>> Solr Terms Query for matching many terms - Solr 'n Stuff
>> http://yonik.com/solr-terms-query/
>>
>>
>> Scott Chu,scott@udngroup.com
>> 2016/5/23 (週一)
>> - Original Message -
>> From: Erik Hatcher
>> To: solr-user
>> CC:
>> Date: 2016/5/23 (週一) 21:14
>> Subject: Re: How to use "fq"
>>
>>
>> Try the {!terms} query parser. That should make it work well for you. Let
>> us know how it does.
>>
>>Erik
>>
>> > On May 23, 2016, at 08:52, Steven White  wrote:
>> >
>> > Hi everyone,
>> >
>> > I'm trying to figure out what's the best way for me to use "fq" when the
>>
>> > list of items is large (up to 200, but I have few cases with up to 1000).
>> >
>> > My current usage is like so: &fq=category:(1 OR 2 OR 3 OR 4 ... 200)
>> >
>> > When I tested with up to 1000, I hit the "too many boolean clauses", so
>> my
>> > fix was to increase the value of maxBooleanClauses. However, reading [1]
>> > warns that increasing the value of maxBooleanClauses has negative impact.
>> > The link offers an alternative usage like so:
>> > fq=category:1&fq=category:2... But I cannot use it because I need my "fq"
>> > to be treated as OR (my default is set to AND).
>> >
>> > I'm trying to understand what's the best way for me to coded this so I
>> > don't get a performance or memory hit.
>> >
>> > Thanks
>> >
>> > Steve
>> >
>> > [1]
>> >
>> http://solr.pl/en/2011/12/19/do-i-have-to-look-for-maxbooleanclauses-when-using-filters/
>>
>>


Re: Atomic updates and "stored"

2016-05-23 Thread Erick Erickson
Yes, currently when using Atomic updates _all_ fields
have to be stored, except the _destinations_ of copyField
directives.

Yes, it will make your index bigger. The effects on speed are
probably minimal though. The stored data is in your *.fdt and
*.fdx segment files and is only referenced to pull
the top N docs back; it's not referenced for _search_ at all.

Coming Real Soon will be updateable DocValues, which may
be what you really need.
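
For reference, an atomic update request looks like this (core and field
names are invented):

    curl 'http://localhost:8983/solr/mycore/update?commit=true' \
      -H 'Content-type:application/json' \
      -d '[{"id":"doc1", "price_f":{"set":19.95}, "tags_ss":{"add":"sale"}}]'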

Best,
Erick

On Mon, May 23, 2016 at 6:13 AM, Mark Robinson  wrote:
> Hi,
>
> I have some 150 fields in my schema out of which about 100 are dynamic
> fields which I am not storing (stored="false").
> In case I need to do an atomic update to one or two fields which belong to
> the stored list of fields, do I need to change my dynamic fields (100 or so
> now not "stored") to stored="true"?
>
> If so wouldn't it considerably increase index size and affect performance
> in the negative?
>
> Is there any way currently to do partial/ atomic updates to one or two
> fields (which I will make stored="true") without having to make my now
> stored="false" fields to stored="true" just
> to accommodate atomic updates.
>
> Could some one pls give your suggestions.
>
> Thanks!
> Mark.


Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-23 Thread Abdel Belkasri
That would be a welcomed feature for sure!
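
In the meantime, the aliasing Steve mentioned can at least hide the
postfixes at query/response time; a sketch with hypothetical fields foo_s
and bar_i:

    # rename fields in the response:
    fl=foo:foo_s,bar:bar_i

    # let users query "foo" via an eDisMax alias:
    defType=edismax&f.foo.qf=foo_s&q=foo:red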

On Mon, May 23, 2016 at 6:11 AM, Horváth Péter Gergely <
peter.gergely.horv...@gmail.com> wrote:

> Hi Steve,
>
> Thank you very much for your inputs. Yes, I do know the aliasing mechanism
> offered in Solr. I think the whole question boils down to one thing: how
> much do you know about the data being stored -- and sometimes you know
> nothing about that.
>
> In some cases, you have to provide a generic solution for users to store
> and query their own data. With Solr / Lucene backing your storage, you can
> easily expose a restricted (but still powerful) subset of the Solr / Lucene
> query syntax for querying user-defined data. Things however would start
> getting complicated if you have to tell your customers that the field they
> loaded as "foo" must be referred to as "foo_s" and the field they loaded as
> "bar" must be referred to as "bar_i", since it contains a number and so on...
> Implementing the mapping in your application would be overly complex, as
> you would have to maintain a mapping between the internal representation
> ("foo_s") and the query interface ("foo") and alias results from the
> internal format to the format visible to the user ("foo_s" --> "foo"). I
> think you get the idea.
>
> I like the way Solr can use the name for specifying type: having a
> configuration option (either global or at collection level), which tells
> Solr to handle type postfixes slightly differently and strip the type
> postfix automatically, would be perfectly enough for this use-case.
>
> Imagine the following approach: if configured so, Solr would still create
> the field based on the type postfix, but would strip it from the name: for
> example, if a document is inserted with the field "foo_s" and "bar_i", Solr
> could create a string field named "foo" and a numeric field "bar".
>
> I think this solution would be both backwards compatible (has to be
> explicitly enabled) and relatively simple to implement in the Solr code
> base. I have created a Jira issue for the feature request:
> https://issues.apache.org/jira/browse/SOLR-9150
>
> What do you think?
>
> Thanks,
> Peter
>
>
>
> 2016-05-19 15:30 GMT+02:00 Steve Rowe :
>
> > Peter,
> >
> > It’s an interesting idea.  Could you make a Solr JIRA?
> >
> > I don’t know where the field type specification would go, but providing a
> > mechanism to specify field type for previously non-existent fields,
> outside
> > of the field names themselves, seems useful.
> >
> > In the meantime, do you know about field aliasing?
> >
> > 1. You can get results back that rename fields to whatever you want: see
> > the section “Field Name Aliases” here: <
> > https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters
> >.
> >
> > 2. On the query side, eDisMax can perform aliasing so that user-specified
> > field names in queries get mapped to one or more indexed fields: look for
> > “alias” in <
> >
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
> > >.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On May 19, 2016, at 4:43 AM, Horváth Péter Gergely <
> > peter.gergely.horv...@gmail.com> wrote:
> > >
> > > Hi Steve,
> > >
> > > Yes, I know the schema API, however I do not want to specify the field
> > type
> > > programmatically for every single field.
> > >
> > > I would like to be able to specify the field type when it is being
> added
> > > (similar to the name postfixes, but without affecting the field names).
> > >
> > > Thanks,
> > > Peter
> > >
> > >
> > > 2016-05-17 17:08 GMT+02:00 Steve Rowe :
> > >
> > >> Hi Peter,
> > >>
> > >> Are you familiar with the Schema API?: <
> > >> https://cwiki.apache.org/confluence/display/solr/Schema+API>
> > >>
> > >> You can use it to create fields, field types, etc. prior to ingesting
> > your
> > >> data.
> > >>
> > >> --
> > >> Steve
> > >> www.lucidworks.com
> > >>
> > >>> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
> > >> peter.gergely.horv...@gmail.com> wrote:
> > >>>
> > >>> Hi All,
> > >>>
> > >>> By default Solr allows you to define the type of a dynamic field by
> > >>> appending a post-fix to the name itself. E.g. creating a color_s
> field
> > >>> instructs Solr to create a string field. My understanding is that if
> we
> > >> do
> > >>> this, all queries must refer the post-fixed field name as well. So
> > >>> instead of a query like color:"red", we will have to write something
> > like
> > >>> color_s:"red" -- and so on for other field types as well.
> > >>>
> > >>> I am wondering if it is possible to specify the data type used for a
> > >> field
> > >>> in Solr 6.0.0, without having to modify the field name. (Or at least
> > in a
> > >>> way that would allow us to use the original field name) Do you have
> any
> > >>> idea, how to achieve this? I am fine, if we have to specify the field
> > >> type
> > >>> during the insertion of a document, however, I do not want to keep
> > using
> > >>> post-fixes while running queries.

Re: How to use "fq"

2016-05-23 Thread Yonik Seeley
On Mon, May 23, 2016 at 12:41 PM, Steven White  wrote:
> Thank you Erik and Scott.  {!terms} did the job!!  I tested like so:
> fq={!terms f=category}1,2,3,4,...N
>
> I read that {!terms} treats the terms in the list as OR, if I have a need
> to force AND on my terms, how do I do that?

While ORing a thousand terms seems to be common (categories, unique ids, etc)
I haven't really seen that use-case with AND.  I'm curious what your
use case is.

-Yonik


Re: Auto Suggestion in solr

2016-05-23 Thread Erick Erickson
Have you seen:
https://lucidworks.com/blog/2015/03/04/solr-suggester/

Best,
Erick

On Sun, May 22, 2016 at 10:07 PM, Mugeesh Husain  wrote:
> Hello everyone,
>
> I am looking for some suggestion for auto-suggest like imdb.com.
>
> just type "samp" in search box in imdb.com site.
>
> results are returned
> 1. Based on popularity
> 2. Even results with a space are shown. Example: "Sam Page"
>
> So, I am looking for this kind of auto suggestion, please suggest me about
> this requirement.
>
>
> Thanks
> Mugeesh
>
>
>
>
>


Re: SolrCloud increase replication factor

2016-05-23 Thread Hendrik Haddorp
What I find odd is that creating a collection with a replication factor
greater than 1 does seem to avoid placing replicas of a shard on the same node.
However, when one wants to add replicas later on, one needs to do the whole
placement manually to avoid single points of failure.
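
For the record, the two ways to pin placement look like this (collection and
node names are hypothetical):

    # add a replica on an explicit node:
    /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host2:8983_solr

    # or state a rule at creation time: at most one replica of a shard per node
    /admin/collections?action=CREATE&name=mycoll&numShards=4&replicationFactor=2&rule=shard:*,replica:<2,node:*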

On 23/05/16 15:28, Tom Evans wrote:
> On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
>  wrote:
>> Hi,
>>
>> I have a SolrCloud 6.0 setup and created my collection with a
>> replication factor of 1. Now I want to increase the replication factor
>> but would like the replicas for the same shard to be on different nodes,
>> so that my collection does not fail when one node fails. I tried two
>> approaches so far:
>>
>> 1) When I use the collections API with the MODIFYCOLLECTION action [1] I
>> can set the replication factor but that did not result in the creation
>> of additional replicas. The Solr Admin UI showed that my replication
>> factor changed but otherwise nothing happened. A reload of the
>> collection did also result in no change.
>>
>> 2) Using the ADDREPLICA action [2] from the collections API I have to
>> add the replicas to the shard individually, which is a bit more
>> complicated but otherwise worked. During testing this did however at
>> least once result in the replica being created on the same node. My
>> collection was split in 4 shards and for 2 of them all replicas ended up
>> on the same node.
>>
>> So is the only option to create the replicas manually and also pick the
>> nodes manually or is the perceived behavior wrong?
>>
>> regards,
>> Hendrik
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-modifycoll
>> [2]
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>
> With ADDREPLICA, you can specify the node to create the replica on. If
> you are using a script to increase/remove replicas, you can simply
> incorporate the logic you desire in to your script - you can also use
> CLUSTERSTATUS to get a list of nodes/collections/shards etc in order
> to inform the logic in the script. This is the approach we took, we
> have a fabric script to add/remove extra nodes to/from the cluster, it
> works well.
>
> The alternative is to put the logic in to Solr itself, using what Solr
> calls a "snitch" to define the rules on where replicas are created.
> The snitch is specified at collection creation time, or you can use
> MODIFYCOLLECTION to set it after the fact. See this wiki patch for
> details:
>
> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>
> Cheers
>
> Tom



Re: [Solr 6] Migration from Solr 4.10.2

2016-05-23 Thread Alessandro Benedetti
Furthermore I was checking the internals of the old facet implementation
(the one used by the classic request-parameter-based faceting, as opposed to
the JSON facet API). It seems that if you enable docValues, even with the enum
method passed as a parameter, fc with docValues will actually be used.
I will give some report on the performance we get with docValues.
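
Concretely, the experiment amounts to something like this (field name is
hypothetical):

    <field name="category_s" type="string" indexed="true" stored="false"
           docValues="true" multiValued="false"/>

and then faceting with facet=true&facet.field=category_s&facet.method=fc.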

Cheers
On 23 May 2016 16:29, "Joel Bernstein"  wrote:

> If you can make min/max work for you instead of sort then it should be
> faster, but I haven't spent time comparing the performance.
>
> But if you're using the top_fc with the min/max param the performance
> between Solr 4 & Solr 6 should be very close as the data structures behind
> them are the same.
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > Hi Joel,
> > thanks for the reply, actually we were not using field collapsing before,
> > we basically want to replace grouping with that.
> > The grouping performance between Solr 4 and 6 is basically comparable.
> > It's surprising that I got such a big degradation with field collapsing.
> >
> > So basically the comparison we did were based on the Solr4 queries ,
> > extracted from logs, and modified slightly to include field collapsing
> > parameter.
> >
> > To build the tests to compare Solr 4.10.2 to Solr 6 we basically
> proceeded
> > in this way :
> >
> > 1) install Solr 4.10.2 and Solr 6.0.0
> > 2) migrate the index with the related lucene tool ( 4.10.2 -> 5.5.0 ->
> Solr
> > 6.0 )
> > 3) switch on/off the 2 instances and repeating the tests both with cold
> > instances and warm instances.
> >
> > This means that the query looks the same.
> > I have not double checked the results but only the timings.
> > I will provide additional feedback to see if the query are producing
> > comparable results as well.
> >
> > Related your suggestion about the top_fc, thanks, I will try that .
> > I actually discovered that a little bit after I posted the mailing list
> ( I
> > think exactly from another post of yours :) )
> >
> > Not sure if setting up docValues for the field we use to collapse could
> > give some benefit as well.
> >
> > I keep you updated,
> >
> > Cheers
> >
> > On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein 
> > wrote:
> >
> > > Were you using the sort param or min/max param in Solr 4 to select the
> > > group head? The sort work came later and I'm not sure how it compares
> in
> > > performance to the min/max param.
> > >
> > > Since you are collapsing on a string field you can use the top_fc hint
> > > which will use a top level field cache for the collapse. This is faster
> > at
> > > query time than the default, which uses the MultiDocValues ordinal map.
> > >
> > > The docs cover the top_fc hint.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
> > > abenede...@apache.org> wrote:
> > >
> > > > Let's add some additional details guys :
> > > >
> > > > 1) *Faceting*
> > > > Currently the facet method used is "enum" and it runs over 20 fields
> > more
> > > > or less.
> > > > Mainly using it on low cardinality fields except one which has a
> > > > cardinality of 1000 terms.
> > > > I am aware of the famous Jira related faceting regression :
> > > > https://issues.apache.org/jira/browse/SOLR-8096 .
> > > >
> > > > Our index is indeed quite static ( we index once per day) and the
> > fields
> > > we
> > > > facet on are multi-valued ( by schema definition but not in
> practice) .
> > > > But we use Term Enum as the method so I was not expecting to hit the
> > > > regression.
> > > > We currently see query times which are 30% worse than Solr 4.10.2 .
> > > > Our next experiment will be to enable docValues for all the fields
> and
> > > > verify if we get any benefit ( switching the facet method to fc) .
> > > > At the moment, switching to json faceting is not an option as we
> would
> > > like
> > > > first to proceed with a transparent migration and then possibly add
> > > > improvements and refactor in the future.
> > > > Following will be to fix the schema to set as multi valued only what
> is
> > > > really multi-valued ( do you know if this can affect ? the wrong
> schema
> > > > definition is enough to mess up the facet performance ? even if then
> > the
> > > > fields are single valued ?)
> > > >
> > > >
> > > > 2) *Field Collapsing*
> > > > Field collapsing performance seems much, much worse, something like
> 200
> > > ms
> > > > ( Solr 4) vs 1800 ms ( Solr 6) .
> > > > This is surprising as I never heard about any regression in field
> > > > collapsing.
> > > > I will investigate a little bit more in details about the internals
> of
> > > the
> > > > field collapsing and why the performance could be so degraded.
> > > > I will also verify if I fi

Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Erick Erickson
I _think_ this is a distinction between
serving the query and processing the results. The
query is the standard Solr processing returning
results from one replica per shard.

Those results can be partitioned out to N Solr instances
for sub-processing, where N is  however many worker
nodes you specified that may or may not be host
to any replicas of that collection.

At least I think that's what's up, but then again this is
new to me too.

Which bits of the doc anyway? Sounds like some
clarification is in order.

Best,
Erick

On Mon, May 23, 2016 at 9:32 AM, Timothy Potter  wrote:
> I've seen docs and diagrams that seem to indicate a streaming
> expression can utilize all replicas of a shard but I'm seeing only 1
> replica per shard (I have 2) being queried.
>
> All replicas are on the same host for my experimentation, could that
> be the issue? What are the circumstances where all replicas will be
> utilized?
>
> Or is this a mis-understanding of the docs?


Re: What to do best when expanding from 2 nodes to 4 nodes? [scottchu]

2016-05-23 Thread Erick Erickson
Take a look at the SPLITSHARD Collections API here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3

Best value of numShards and replicationFactor: Impossible to say. You have
to stress test respecting your SLAs. See:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

But there's _no_ reason to split your shard if you are getting adequate
response times to queries. In fact, going to more than one shard
will possibly slow your query response as distributed queries add
inevitable overhead.

If you simply want to add more replicas to increase the QPS rate you
can handle. just bring up your Solr nodes and use the Collections API
ADDREPLICA command to your single shard.
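
For reference, the two operations look roughly like this (collection and
node names are invented):

    # split shard1 into two sub-shards:
    /admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1

    # add another replica of the single shard on a new node:
    /admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=newhost:8983_solr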

Best,
Erick

On Mon, May 23, 2016 at 7:52 AM, Scott Chu  wrote:
> I just created a 90gb index collection with 1 shard and 2 replicas on 2 
> nodes. I am to migrate from 2 nodes to 4 node. I am wondering what's the best 
> stragedy to split this single shard? Furthermore, If I am ok to reindex, 
> what's the best adequate experienced value of numShards and 
> replicationFactor? Lastly, I think there's no other way but reindex if I want 
> my data to be evenly distributed into every shard I create, right?
>
> Scott Chu,scott@udngroup.com
> 2016/5/23 (週一)
>
> P.S. For those who are curious of why I add [scottchu] in subject, the reason 
> is that I want my email filter to route those emails that answer to my 
> question to specific folder.


Re: Solr 6.0 Parallel SQL

2016-05-23 Thread Joel Bernstein
The docs describe the current capabilities. So if it's not in the docs,
it's not supported yet. For example the docs don't mention joins or
intersections and they are not supported. Another example is that select
count(*) is supported, and select distinct is supported, but select
count(distinct) is not yet supported. So, give the docs a close read to
understand the current capabilities.
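
A minimal request to the /sql handler, for orientation (collection and
field names are hypothetical):

    curl --data-urlencode 'stmt=SELECT fieldA, count(*) FROM mycoll GROUP BY fieldA' \
      'http://localhost:8983/solr/mycoll/sql?aggregationMode=facet'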

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 5:54 PM, Steven White  wrote:

> Hi everyone,
>
> I'm reading on Solr's Parallel SQL.  I see some good examples but not much
> on how to set it up and what are the limitations.  My reading on it is that
> I can use Parallel SQL to send to Solr SQL syntax to search in Solr, but:
>
> 1) Does this mean all of SQL's query statements are supported, no matter
> how complex?
> 2) If yes, doesn't this mean I have to index into Solr all of my tables in
> the DB?
> 3) If yes, how do I go about indexing my tables into Solr (i.e.: don't I
> have to map each table into a Solr document mapping each column into a Solr
> field, and what about the data-types)?
>
> Thanks.
>
> Steve
>


Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Timothy Potter
This image from the wiki kind of gives that impression to me:

https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2

On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
 wrote:
> I _think_ this is a distinction between
> serving the query and processing the results. The
> query is the standard Solr processing returning
> results from one replica per shard.
>
> Those results can be partitioned out to N Solr instances
> for sub-processing, where N is  however many worker
> nodes you specified that may or may not be host
> to any replicas of that collection.
>
> At least I think that's what's up, but then again this is
> new to me too.
>
> Which bits of the doc anyway? Sounds like some
> clarification is in order.
>
> Best,
> Erick
>
> On Mon, May 23, 2016 at 9:32 AM, Timothy Potter  wrote:
>> I've seen docs and diagrams that seem to indicate a streaming
>> expression can utilize all replicas of a shard but I'm seeing only 1
>> replica per shard (I have 2) being queried.
>>
>> All replicas are on the same host for my experimentation, could that
>> be the issue? What are the circumstances where all replicas will be
>> utilized?
>>
>> Or is this a mis-understanding of the docs?


Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Joel Bernstein
The image is the correct flow. Are you using workers?



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 7:16 PM, Timothy Potter 
wrote:

> This image from the wiki kind of gives that impression to me:
>
>
> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>
> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>  wrote:
> > I _think_ this is a distinction between
> > serving the query and processing the results. The
> > query is the standard Solr processing returning
> > results from one replica per shard.
> >
> > Those results can be partitioned out to N Solr instances
> > for sub-processing, where N is  however many worker
> > nodes you specified that may or may not be host
> > to any replicas of that collection.
> >
> > At least I think that's what's up, but then again this is
> > new to me too.
> >
> > Which bits of the doc anyway? Sounds like some
> > clarification is in order.
> >
> > Best,
> > Erick
> >
> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter 
> wrote:
> >> I've seen docs and diagrams that seem to indicate a streaming
> >> expression can utilize all replicas of a shard but I'm seeing only 1
> >> replica per shard (I have 2) being queried.
> >>
> >> All replicas are on the same host for my experimentation, could that
> >> be the issue? What are the circumstances where all replicas will be
> >> utilized?
> >>
> >> Or is this a mis-understanding of the docs?
>


Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Joel Bernstein
Streaming expressions will utilize all replicas of a cluster when the
number of workers >= the number of replicas.

For example if there are 40 workers and 40 shards and 5 replicas.

For a single parallel request:

Each worker will send 1 query to a random replica in each shard. That is
1,600 requests. The 1,600 requests will be spread evenly across all
200 nodes in the cluster, with each node handling 8 requests. Each request
will return 1/1600 of the result set.

If you add another row of replicas, the 1,600 requests will be
handled by 240 nodes.

-

In streaming expressions you use the parallel function to send requests to
workers.

In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
queries.
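
A sketch of a parallel expression (collection and field names are
hypothetical; the inner stream must be partitioned on the same keys the
workers shuffle on):

    parallel(mycoll,
             unique(search(mycoll, q="*:*", fl="id,a_s", sort="a_s asc",
                           partitionKeys="a_s"),
                    over="a_s"),
             workers=4, sort="a_s asc")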











Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein  wrote:

> The image is the correct flow. Are you using workers?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter 
> wrote:
>
>> This image from the wiki kind of gives that impression to me:
>>
>>
>> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>>
>> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>>  wrote:
>> > I _think_ this is a distinction between
>> > serving the query and processing the results. The
>> > query is the standard Solr processing returning
>> > results from one replica per shard.
>> >
>> > Those results can be partitioned out to N Solr instances
>> > for sub-processing, where N is  however many worker
>> > nodes you specified that may or may not be host
>> > to any replicas of that collection.
>> >
>> > At least I think that's what's up, but then again this is
>> > new to me too.
>> >
>> > Which bits of the doc anyway? Sounds like some
>> > clarification is in order.
>> >
>> > Best,
>> > Erick
>> >
>> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter 
>> wrote:
>> >> I've seen docs and diagrams that seem to indicate a streaming
>> >> expression can utilize all replicas of a shard but I'm seeing only 1
>> >> replica per shard (I have 2) being queried.
>> >>
>> >> All replicas are on the same host for my experimentation, could that
>> >> be the issue? What are the circumstances where all replicas will be
>> >> utilized?
>> >>
>> >> Or is this a mis-understanding of the docs?
>>
>
>


Re: Commit (hard) at shutdown?

2016-05-23 Thread Per Steffensen
Sorry, I did not see the responses here because I found out myself. It 
definitely seems like a hard commit is performed when shutting down 
gracefully. The info I got from production was wrong.
It is not necessarily obvious that you will lose data on "kill -9". The 
tlog ought to save you, but it is probably not 100% bulletproof.

We are not using the bin/solr script (yet)

On 21/05/16 04:02, Shawn Heisey wrote:

On 5/20/2016 2:51 PM, Jon Drews wrote:

I would be interested in an answer to this question.

 From my research it looks like it will do a hard commit if cleanly shut
down. However if you "kill -9" it you'll lose data (obviously). Perhaps
production isn't cleanly shutting down solr?
https://dzone.com/articles/understanding-solr-soft

I do not know whether a graceful shutdown does a hard commit or not.

I do know that all versions of Solr that utilize the bin/solr script are
configured by default to forcibly kill Solr only five seconds after the
graceful shutdown is requested.  Five seconds is usually not enough time
for production installs, so it needs to be increased.  The only way to
do this currently is to edit the bin/solr script directly.

Thanks,
Shawn






Re: Solr cloud with Grouping query gives inconsistent results

2016-05-23 Thread Jeff Wartes
My first thought is that you haven’t indexed such that all values of the field 
you’re grouping on are found in the same cores.

See the end of the article here: (Distributed Result Grouping Caveats)
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

And the “Document Routing” section here:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

If I’m right, you haven’t used the “amid” field as part of your doc routing 
policy.
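
If so, the usual fix with the default compositeId router is to prefix the
unique key with the grouping key so that all documents sharing an amid hash
to the same shard; the ids here are invented:

    {"id":"amid42!doc1", "amid":"amid42", ...}
    {"id":"amid42!doc2", "amid":"amid42", ...}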



On 5/23/16, 3:57 AM, "preeti kumari"  wrote:

>Hi All,
>
>I am using grouping query with solr cloud version 5.2.1 .
>Parameters added in my query is
>&q=SIM*&group=true&group.field=amid&group.limit=1&group.main=true. But each
>time I hit the query i get different results i.e top 10 results are
>different each time.
>
>Why is it so ? Please help me with this.
>Is there any way by which I can get consistent results from grouping query
>in solr cloud.
>
>Thanks
>Preeti



Re: highlight don't work if df not specified

2016-05-23 Thread Ahmet Arslan
Hi Solomon,

How come 
hl.q=blah blah&hl.fl=normal_text,title 
would produce an "undefined field text" error message?

Please try 
hl.q=blah blah&hl.fl=normal_text,title
just to verify there is a problem with the fielded queries.
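
For comparison, a fielded form of that hl.q (field names taken from earlier
in the thread; the terms are placeholders) would be:

    hl.q=normal_text:"blah blah" OR title:"blah blah"&hl.fl=normal_text,title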

Ahmet

On Monday, May 23, 2016 10:31 AM, michael solomon  wrote:
Hi,
When I'm increase hl.maxAnalyzedChars nothing happened.

AND

hl.q=blah blah&hl.fl=normal_text,title
I get:

"error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field text",
"code":400}}




On Sun, May 22, 2016 at 5:34 PM, Ahmet Arslan 
wrote:

> Hi,
>
> What happens when you increase hl.maxAnalyzedChars?
>
> OR
>
> hl.q=blah blah&hl.fl=normal_text,title
>
> Ahmet
>
>
>
> On Sunday, May 22, 2016 5:24 PM, michael solomon 
> wrote:
>  "true" stored="true"/>
> 
>
>
> On Sun, May 22, 2016 at 5:18 PM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > Weird, are your fields stored?
> >
> >
> >
> > On Sunday, May 22, 2016 5:14 PM, michael solomon 
> > wrote:
> > Thanks Ahmet,
> > It was a mistake in the question, sorry; in the query I wrote it properly.
> >
> >
> > On Sun, May 22, 2016 at 5:06 PM, Ahmet Arslan  >
> > wrote:
> >
> > > Hi,
> > >
> > > q=normal_text:"bla bla"&title:"bla bla"
> > >
> > > should be
> > > q=+normal_text:"bla bla" +title:"bla bla"
> > >
> > >
> > >
> > > On Sunday, May 22, 2016 4:52 PM, michael solomon  >
> > > wrote:
> > > Hi,
> > > I query multiple fields in solr:
> > > q=normal_text:"bla bla"&title:"bla bla" 
> > >
> > > I turn on the highlighting, but it doesn't work even when I fill hl.fl.
> > > it work when I fill df(default field) parameter, but then it's
> highlights
> > > only one field.
> > > What the problem?
> > > Thanks,
> > > michael
> > >
> >
>


Solr mysql Json import

2016-05-23 Thread vsriram30
Hi All,

I have a use case where I want to index a JSON field from mysql into
solr. The JSON field will contain entries as key-value pairs. The JSON can
be nested, but I want to index only the first-level field/value pairs of
the JSON into solr fields; nested levels can be present as the value of the
corresponding field in solr.

Eg) 

{  
   "k1":"value1",
   "k2":"value2",
   "k3":{  
  "f1":"fv1",
  "f2":"fv2"
   },
   "k4":[  
  "v1",
  "v2",
  "v3",
  "v4"
   ]
}

The above JSON is present as the value of a mysql field. Along with this field,
I have a few other fields in mysql like id, timestamp, etc.

Considering this, can I import this data from mysql to solr and map fields
like,

Mysql_fields => solr_fields
id => id
timestamp => timestamp
k1 => k1
k2 => k2
k3 => k3 (type "text"; the field will contain the nested JSON as text)
k4 => k4 (type "text" with multiValued=true)?

Can this be achieved? I have used simplistic data import to import data from
mysql to solr, not for these complicated use cases. Also I would like to use
that timestamp field for constructing my delta import query.

Thanks,
Sriram





Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Timothy Potter
Thanks Joel, that cleared things up nicely ... using 4 workers against
4 shards resulted in 16 queries to the collection. However, not all
replicas were used for all shards, so it's not as balanced as I
thought it would be, but we're dealing with small numbers of shards
and replicas here.

On Mon, May 23, 2016 at 12:58 PM, Joel Bernstein  wrote:
> Streaming expressions will utilize all replicas of a cluster when the
> number of workers >= the number of replicas.
>
> For example if there are 40 workers and 40 shards and 5 replicas.
>
> For a single parallel request:
>
> Each worker will send 1 query to a random replica in each shard. This is
> 1,600 requests. The 1,600 requests will be spread evenly across all
> 200 nodes in the cluster, with each node handling 8 requests. Each request
> will return 1/1600 of the result set.
>
> If you add another row of replicas, the 1,600 requests will be
> handled by 240 nodes.
>
> -
>
> In streaming expressions you use the parallel function to send requests to
> workers.
>
> In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
> interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
> queries.
>
>
>
>
>
>
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein  wrote:
>
>> The image is the correct flow. Are you using workers?
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter 
>> wrote:
>>
>>> This image from the wiki kind of gives that impression to me:
>>>
>>>
>>> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>>>
>>> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>>>  wrote:
>>> > I _think_ this is a distinction between
>>> > serving the query and processing the results. The
>>> > query is the standard Solr processing returning
>>> > results from one replica per shard.
>>> >
>>> > Those results can be partitioned out to N Solr instances
>>> > for sub-processing, where N is  however many worker
>>> > nodes you specified that may or may not be host
>>> > to any replicas of that collection.
>>> >
>>> > At least I think that's what's up, but then again this is
>>> > new to me too.
>>> >
>>> > Which bits of the doc anyway? Sounds like some
>>> > clarification is in order.
>>> >
>>> > Best,
>>> > Erick
>>> >
>>> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter 
>>> wrote:
>>> >> I've seen docs and diagrams that seem to indicate a streaming
>>> >> expression can utilize all replicas of a shard but I'm seeing only 1
>>> >> replica per shard (I have 2) being queried.
>>> >>
>>> >> All replicas are on the same host for my experimentation, could that
>>> >> be the issue? What are the circumstances where all replicas will be
>>> >> utilized?
>>> >>
>>> >> Or is this a mis-understanding of the docs?
>>>
>>
>>


Using solr with increasing complicated access control

2016-05-23 Thread Lisheng Zhang
Hi, i have been using solr for many years and it is VERY helpful.

My problem is that our app has increasingly complicated access
control to satisfy clients' requirements; in solr/lucene it means we need
to add more and more fields to each document and use more and more
complicated filter conditions, so the code is hard to maintain and indexing
becomes a serious issue because we want search to be as real-time as possible.

I would appreciate a high level guidance on how to deal with this issue?
recently i investigated mySQL fulltext search (our app uses mySQL), using
mySQL means we simply reuse DB for access control, but mySQL fulltext
search performance is far from ideal compared to solr.

Thanks very much for helps, Lisheng


Re: Streaming expression not hitting all replicas?

2016-05-23 Thread Erick Erickson
Well, ya learn somethin' new every day

On Mon, May 23, 2016 at 4:31 PM, Timothy Potter  wrote:
> Thanks Joel, that cleared things up nicely ... using 4 workers against
> 4 shards resulted in 16 queries to the collection. However, not all
> replicas were used for all shards, so it's not as balanced as I
> thought it would be, but we're dealing with small numbers of shards
> and replicas here.
>
> On Mon, May 23, 2016 at 12:58 PM, Joel Bernstein  wrote:
>> Streaming expressions will utilize all replicas of a cluster when the
>> number of workers >= the number of replicas.
>>
>> For example if there are 40 workers and 40 shards and 5 replicas.
>>
>> For a single parallel request:
>>
>> Each worker will send 1 query to a random replica in each shard. This is
>> 1,600 requests. The 1,600 requests will be spread evenly across all
>> 200 nodes in the cluster, with each node handling 8 requests. Each request
>> will return 1/1600 of the result set.
>>
>> If you add another row of replicas, the 1,600 requests will be
>> handled by 240 nodes.
>>
>> -
>>
>> In streaming expressions you use the parallel function to send requests to
>> workers.
>>
>> In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
>> interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
>> queries.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein  wrote:
>>
>>> The image is the correct flow. Are you using workers?
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter 
>>> wrote:
>>>
 This image from the wiki kind of gives that impression to me:


 https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2

 On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
  wrote:
 > I _think_ this is a distinction between
 > serving the query and processing the results. The
 > query is the standard Solr processing returning
 > results from one replica per shard.
 >
 > Those results can be partitioned out to N Solr instances
 > for sub-processing, where N is  however many worker
 > nodes you specified that may or may not be host
 > to any replicas of that collection.
 >
 > At least I think that's what's up, but then again this is
 > new to me too.
 >
 > Which bits of the doc anyway? Sounds like some
 > clarification is in order.
 >
 > Best,
 > Erick
 >
 > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter 
 wrote:
 >> I've seen docs and diagrams that seem to indicate a streaming
 >> expression can utilize all replicas of a shard but I'm seeing only 1
 >> replica per shard (I have 2) being queried.
 >>
 >> All replicas are on the same host for my experimentation, could that
 >> be the issue? What are the circumstances where all replicas will be
 >> utilized?
 >>
 >> Or is this a mis-understanding of the docs?

>>>
>>>


Re: Solr 6.0 Parallel SQL

2016-05-23 Thread Erick Erickson
For <2> and <3> well, yes. To do _anything_ in
Solr you need to index the data to Solr. It doesn't
magically reach out into the DB and do stuff.

<3> you can either use DIH or a SolrJ program
and yes, you do have to do some kind of mapping of
database columns into Solr documents

I want to caution you about this though. At first blush
you're talking about just transferring your DB to Solr
and pushing the search button. The ParallelSQL
stuff builds SQL capabilities over _search_ so I'd advise
spending some time thinking about how to leverage
the _search_ bits rather than simply treating Solr
as an RDBMS.

Best,
Erick

On Mon, May 23, 2016 at 11:06 AM, Joel Bernstein  wrote:
> The docs describe the current capabilities. So if it's not in the docs,
> it's not supported yet. For example the docs don't mention joins or
> intersections and they are not supported. Another example is that select
> count(*) is supported, and select distinct is supported, but select
> count(distinct) is not yet supported. So, give the docs a close read to
> understand the current capabilities.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 5:54 PM, Steven White  wrote:
>
>> Hi everyone,
>>
>> I'm reading up on Solr's Parallel SQL.  I see some good examples but not much
>> on how to set it up or what its limitations are.  My understanding is that I
>> can use Parallel SQL to send SQL queries to Solr for searching, but:
>>
>> 1) Does this mean all of SQL's query statements are supported, no matter
>> how complex?
>> 2) If yes, doesn't this mean I have to index into Solr all of my tables in
>> the DB?
>> 3) If yes, how do I go about indexing my tables into Solr (i.e.: don't I
>> have to map each table into a Solr document mapping each column into a Solr
>> field, and what about the data-types)?
>>
>> Thanks.
>>
>> Steve
>>
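
As a concrete example, a supported GROUP BY aggregation can be sent to the
/sql handler like this (collection and field names are illustrative; per the
ref guide the worker-count parameter is numWorkers):

curl --data-urlencode 'stmt=SELECT fieldA, count(*) FROM mycollection GROUP BY fieldA' \
  'http://localhost:8983/solr/mycollection/sql?aggregationMode=map_reduce&numWorkers=2'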


Re: Using solr with increasing complicated access control

2016-05-23 Thread Erick Erickson
I know this seems facetious, but: talk to your
clients about _why_ they want such increasingly
complex access requirements. Often the logic
behind the complexity is pretty flawed. Things like
"allow user X to see document Y if they're part of
groups A, B, C but not D or E unless they are
also part of sub-group F and it's raining outside"...

If the rules _must_ be complicated, that's what
post-filters were actually invented for. Pretty often
I'll build in some "bailout" because whatever you
build has, eventually, to deal with the system
admin searching all documents, i.e. doing the
ACL calcs for every document.
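
A post-filter is a query class that implements the PostFilter interface and
vets documents only after the main query and cheaper filters have run. A
minimal sketch (AclService is a stand-in for whatever access check you have;
the QParserPlugin that would create this query from a request is omitted):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class AclFilterQuery extends ExtendedQueryBase implements PostFilter {
  private final String user;

  public AclFilterQuery(String user) { this.user = user; }

  // Post-filters must not be cached, and a cost of 100 or more makes
  // them run after the normal (cheaper) filter queries.
  @Override public boolean getCache() { return false; }
  @Override public int getCost() { return Math.max(super.getCost(), 100); }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      @Override
      public void collect(int doc) throws IOException {
        // context.docBase + doc is the index-wide document id; only
        // documents passing the ACL check reach the delegate collector.
        if (AclService.isVisible(user, context.docBase + doc)) {
          super.collect(doc);
        }
      }
    };
  }
}

A real implementation would also override equals() and hashCode(), and would
batch the ACL lookups rather than call out once per collected document.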

Best,
Erick

On Mon, May 23, 2016 at 6:02 PM, Lisheng Zhang  wrote:
> Hi, I have been using Solr for many years and it is VERY helpful.
>
> My problem is that our app has increasingly complicated access
> control to satisfy clients' requirements. In Solr/Lucene this means we need
> to add more and more fields to each document and use more and more
> complicated filter conditions, so the code is hard to maintain and indexing
> becomes a serious issue because we want search to be as close to real time
> as possible.
>
> I would appreciate high-level guidance on how to deal with this issue.
> Recently I investigated MySQL fulltext search (our app uses MySQL); using
> MySQL would mean we simply reuse the DB for access control, but MySQL
> fulltext search performance is far from ideal compared to Solr.
>
> Thanks very much for your help, Lisheng


Re: How to use a regex search within a phrase query?

2016-05-23 Thread Erick Erickson
I'd play with the timeAllowed option on a full corpus to get a sense
of how painful these queries are. There's also the impact of queries
like this on other users to consider.

Other than that, I think you're on the right path in terms of
supporting some common use-cases with special indexing. Personally
though I think you'll have a difficult time getting that all to work
with both index-time and query parsing.

Consider in your proposal what happens if you stick
WordDelimiterFilterFactory in the mix. Then you'd be indexing 'd' 'd'
'd' for "\d\d\d". Then there's the issue of pulling the "special"
support out of a complex query. How do you even know the search is a
regex in the first place? I mean someone could have a document that
contains regex clauses and be searching for the literal '[0-9]{2}' ;)

For the specific cases you mentioned, you're probably better off just
making all digits '1' in your "special" field and training your
technical users accordingly. And/or just supporting a few regex
queries from the UI.
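
The digit-normalized "special" field can be built with a stock analysis
chain. A minimal sketch, assuming a copyField feeds it (type and field
names are illustrative):

<fieldType name="text_digitnorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[0-9]" replacement="1" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With that in place, a phrase query like "111 example" against the special
field matches "123 example", "456 example", and so on.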

Actually, the first thing I'd do is just turn your technical users
loose with regexes and no special support except timeAllowed. If
anyone actually winds up _using_ regexes, _then_ build in special
support once you knew it was necessary.
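
The timeAllowed guard is just a request parameter, in milliseconds (core
and field names are illustrative):

http://localhost:8983/solr/collection1/select?q=content:/[0-9]{3}/&timeAllowed=2000

Results returned after the limit trips may be incomplete, and the response
header is flagged with partialResults=true in that case.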

Best,
Erick

On Mon, May 23, 2016 at 2:59 AM, Erez Michalak  wrote:
> Good points, thanks Erick.
>
> As you guessed, the use case is not in the main flow for the general user, 
> but an advanced flow for a technical one.
>
> Regarding the performance issue, I thought of a few optimizations for some 
> expected expressions I need to support.
> For instance, to work around the digits regex in all my examples from the
> mail below, I can simply index terms with '\d' instead of every digit (like 
> '\d\d\d' for '123').
> This enables a faster search as follows:
> * search for "\d\d\d" instead of "/[0-9]{3}/"
> * search for "\d\d\d \d\d\d\d" instead of "/[0-9]{3}/ /[0-9]{4}/"
> * search for "\d\d\d example" instead of "/[0-9]{3}/ example"
> Clearly, this approach supports a very limited set of expressions at the
> expense of an increase in the index size.
> For the general case, though, regular expressions may indeed require a full 
> index scan. Seems like all I can do in that case is to warn the user in 
> advance that this may take a (long) while.
>
> Any further ideas on how to reduce the performance hit and avoid the worst
> impact of a full index scan are welcome.
> Erez
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, May 22, 2016 7:43 PM
> To: solr-user 
> Subject: Re: How to use a regex search within a phrase query?
>
> Erez:
>
> Before going too far down this path, understand that even if you can get this 
> syntax to work, you're going to pay a _very_ significant performance hit if 
> you have any decent size corpus. Conceptually, what happens is that all the 
> terms that the regex matches are made into clauses. So let's take a very 
> simple wildcard case:
>
> field1 has two values f1A and f1B
> field2 has two values, f2A and f2B
>
> The result of asking for "field1:f1? field2:f2?" (as a phrase) is "field1:f1A 
> field2:f2A"
> OR
> "field1:f1A field2:f2B"
> OR
> "field1:f1B field2:f2A"
> OR
> "field1:f1B field2:f2B"
>
> which may take quite a while to execute, and that doesn't even include the 
> time that it'll take to enumerate the terms in a field that match your regex, 
> which can get very ugly if your regex is such that it has to examine _every_ 
> term in the field, i.e. the entire terms list for the field for the entire 
> corpus.
>
> This might be an XY problem, what problem are you solving with regexes? Might 
> you be better off constructing better analysis chains?
> The reason I ask is that unless you have technical users, regexes are 
> unlikely to be even used
>
> FWIW,
> Erick
>
>
> On Sun, May 22, 2016 at 8:19 AM, Erez Michalak  wrote:
>> Thanks you Ahmet for the JIRA reference - it looks really promising and I'll 
>> check it out.
>>
>> Regarding your question - once a piece of text is tokenized, it seems like 
>> there is no way to perform a regex query across term boundaries. The pure 
>> regex is fine as long as I'm querying for a single term.
>>
>>
>> -Original Message-
>> From: Ahmet Arslan [mailto:iori...@yahoo.com]
>> Sent: Sunday, May 22, 2016 4:49 PM
>> To: solr-user@lucene.apache.org; Erez Michalak 
>> Subject: Re: How to use a regex search within a phrase query?
>>
>> Hi Erez,
>>
>> I don't think it is possible to combine regex with phrase out-of-the-box.
>> However, there is https://issues.apache.org/jira/browse/LUCENE-5205 for the 
>> task.
>>
>> Can't you define your query in terms of pure regex?
>> something like /[0-9]{3} .* [0-9]{4}/
>>
>> ahmet
>>
>>
>> On Sunday, May 22, 2016 1:37 PM, Erez Michalak  wrote:
>> Hey,
>> I'm developing a search application based on SOLR 5.3.1, and would like to 
>> add to it regex search capabilities on a

Re: SolrCloud increase replication factor

2016-05-23 Thread Erick Erickson
About (1), bq: The Solr Admin UI showed that my replication factor
changed but otherwise nothing happened.

this is as designed AFAIK. There's nothing built in to Solr to
_automatically_ add replicas when this property is changed. My guess
is that the MODIFYCOLLECTION code was written to help with editing the
ZK nodes, i.e. make it unnecessary to hand-edit the ZK nodes to change
things like replication factor without recreating a collection. I've
modified the ref guide page to make this more explicit.

about (2)... I agree. If you can reliably repeat this (or even better,
come up with a test case) it would be worth a JIRA I think.

Best,
Erick



On Mon, May 23, 2016 at 10:46 AM, Hendrik Haddorp
 wrote:
> What I find odd is that creating a collection with a replication factor
> greater than 1 seems to avoid placing replicas of a shard on the same node.
> However, when one wants to add replicas later on, one needs to do the whole
> placement manually to avoid single points of failure.
>
> On 23/05/16 15:28, Tom Evans wrote:
>> On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
>>  wrote:
>>> Hi,
>>>
>>> I have a SolrCloud 6.0 setup and created my collection with a
>>> replication factor of 1. Now I want to increase the replication factor
>>> but would like the replicas for the same shard to be on different nodes,
>>> so that my collection does not fail when one node fails. I tried two
>>> approaches so far:
>>>
>>> 1) When I use the collections API with the MODIFYCOLLECTION action [1] I
>>> can set the replication factor but that did not result in the creation
>>> of additional replicas. The Solr Admin UI showed that my replication
>>> factor changed but otherwise nothing happened. A reload of the
>>> collection did also result in no change.
>>>
>>> 2) Using the ADDREPLICA action [2] from the collections API I have to
>>> add the replicas to the shard individually, which is a bit more
>>> complicated but otherwise worked. During testing this did however at
>>> least once result in the replica being created on the same node. My
>>> collection was split in 4 shards and for 2 of them all replicas ended up
>>> on the same node.
>>>
>>> So is the only option to create the replicas manually and also pick the
>>> nodes manually or is the perceived behavior wrong?
>>>
>>> regards,
>>> Hendrik
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-modifycoll
>>> [2]
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>>
>> With ADDREPLICA, you can specify the node to create the replica on. If
>> you are using a script to increase/remove replicas, you can simply
>> incorporate the logic you desire in to your script - you can also use
>> CLUSTERSTATUS to get a list of nodes/collections/shards etc in order
>> to inform the logic in the script. This is the approach we took, we
>> have a fabric script to add/remove extra nodes to/from the cluster, it
>> works well.
>>
>> The alternative is to put the logic in to Solr itself, using what Solr
>> calls a "snitch" to define the rules on where replicas are created.
>> The snitch is specified at collection creation time, or you can use
>> MODIFYCOLLECTION to set it after the fact. See this wiki patch for
>> details:
>>
>> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>>
>> Cheers
>>
>> Tom
>


Re: What to do best when expanding from 2 nodes to 4 nodes? [scottchu]

2016-05-23 Thread scott.chu

Thanks for your thorough advice. I'll try ADDREPLICA first.


scott.chu,scott@udngroup.com
2016/5/24 (Tue)
- Original Message - 
From: Erick Erickson 
To: solr-user ; scott (self) 
CC: 
Date: 2016/5/24 (Tue)
Subject: Re: What to do best when expanding from 2 nodes to 4 nodes? [scottchu]


Take a look at the SPLITSHARD Collections API here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3

Best value of numShards and replicationFactor: impossible to say. You have
to stress test respecting your SLAs. See:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

But there's _no_ reason to split your shard if you are getting adequate
response times to queries. In fact, going to more than one shard
may slow your query responses, since distributed queries add
inevitable overhead.

If you simply want to add more replicas to increase the QPS rate you
can handle, just bring up your new Solr nodes and use the Collections API
ADDREPLICA command on your single shard.
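
A minimal ADDREPLICA call looks like this (collection and node names are
illustrative):

http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=newhost:8983_solr

The optional node parameter pins the new replica to a specific node; leave
it off and Solr picks one for you.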

Best, 
Erick 

On Mon, May 23, 2016 at 7:52 AM, Scott Chu  wrote: 

> I just created a 90GB index collection with 1 shard and 2 replicas on 2
> nodes. I am migrating from 2 nodes to 4 nodes. I am wondering what the best
> strategy is to split this single shard. Furthermore, if I am OK with reindexing,
> what are good, experience-based values for numShards and
> replicationFactor? Lastly, I think there's no way other than reindexing if I want
> my data to be evenly distributed across every shard I create, right?
> 
> Scott Chu,scott@udngroup.com 
> 2016/5/23 (Mon)
> 
> P.S. For those curious about why I add [scottchu] to the subject: I want my
> email filter to route replies to my question to a specific folder.




Re: SolrCloud increase replication factor

2016-05-23 Thread Hendrik Haddorp
Hi Tom,

the pointer to the rule based placement was indeed what I was missing! I
simply had to add the rule "shard:*,replica:<2,node:*", as documented,
and my replicas do now get distributed as expected :-)
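
For the record, such a rule can also be supplied when the collection is
created (collection name illustrative; note the < in the rule has to be
URL-encoded as %3C on the command line):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=4&replicationFactor=2&rule=shard:*,replica:%3C2,node:*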

thanks,
Hendrik

On 23/05/16 15:28, Tom Evans wrote:
> On Mon, May 23, 2016 at 10:37 AM, Hendrik Haddorp
>  wrote:
>> Hi,
>>
>> I have a SolrCloud 6.0 setup and created my collection with a
>> replication factor of 1. Now I want to increase the replication factor
>> but would like the replicas for the same shard to be on different nodes,
>> so that my collection does not fail when one node fails. I tried two
>> approaches so far:
>>
>> 1) When I use the collections API with the MODIFYCOLLECTION action [1] I
>> can set the replication factor but that did not result in the creation
>> of additional replicas. The Solr Admin UI showed that my replication
>> factor changed but otherwise nothing happened. A reload of the
>> collection did also result in no change.
>>
>> 2) Using the ADDREPLICA action [2] from the collections API I have to
>> add the replicas to the shard individually, which is a bit more
>> complicated but otherwise worked. During testing this did however at
>> least once result in the replica being created on the same node. My
>> collection was split in 4 shards and for 2 of them all replicas ended up
>> on the same node.
>>
>> So is the only option to create the replicas manually and also pick the
>> nodes manually or is the perceived behavior wrong?
>>
>> regards,
>> Hendrik
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-modifycoll
>> [2]
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>
> With ADDREPLICA, you can specify the node to create the replica on. If
> you are using a script to increase/remove replicas, you can simply
> incorporate the logic you desire in to your script - you can also use
> CLUSTERSTATUS to get a list of nodes/collections/shards etc in order
> to inform the logic in the script. This is the approach we took, we
> have a fabric script to add/remove extra nodes to/from the cluster, it
> works well.
>
> The alternative is to put the logic in to Solr itself, using what Solr
> calls a "snitch" to define the rules on where replicas are created.
> The snitch is specified at collection creation time, or you can use
> MODIFYCOLLECTION to set it after the fact. See this wiki patch for
> details:
>
> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>
> Cheers
>
> Tom



Re: Import html data in mysql and map schemas using only SolrCELL+TIKA+DIH [scottchu]

2016-05-23 Thread scott.chu
Can anyone show me an example, or briefly explain how I can do this? I plan
to use Solr 5 or later for it.


scott.chu,scott@udngroup.com
2016/5/24 (Tue)
- Original Message - 
From: scott (self) 
To: solr-user 
CC: 
Date: 2016/5/20 (Fri)
Subject: Import html data in mysql and map schemas using only SolrCELL+TIKA+DIH 
[scottchu]



I have a MySQL table with over 300M blog articles. The records are in HTML
format. Is it possible to import these records into Solr using only Solr
Cell + Tika + DIH? That is, when importing, can I map the MySQL schema to
the Solr schema?

scott.chu,scott@udngroup.com 
2016/5/20 (Fri)
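
For the HTML part specifically, DIH's HTMLStripTransformer can strip markup
during import, so Solr Cell/Tika isn't strictly needed when the HTML lives in
a database column. A minimal, untested sketch (driver, table, and column
names are illustrative):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/blogdb"
              user="dbuser" password="dbpass"/>
  <document>
    <entity name="blog" transformer="HTMLStripTransformer"
            query="SELECT id, title, content FROM articles">
      <field column="id"      name="id"/>
      <field column="title"   name="title"/>
      <field column="content" name="body" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>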




spellcheck on vietnamese (vi)

2016-05-23 Thread Nuhaa All Bakry
Hello all,

The site I'm working on has to support the Vietnamese and Thai languages. The
user should be able to search in either language, and Solr should be able to
detect misspellings and suggest corrections. The search works as expected but
the spellcheck doesn't. Currently I'm looking to implement this for Vietnamese.

I have indexed these:
{ "term_vi":"giáo viên tiếng Anh” }, {"term_vi":"giáo viên" }

I have configured Solr as follows, but the spellcheck won't work for the
language. (The XML markup was stripped by the mail archiver; the schema
field type and field definitions at the top are not recoverable, and the
request-handler parameter names below are inferred from the stock Solr
example configs around the surviving values.)

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell_vi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">term_vi</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">5</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

When I query
http://localhost:8983/solr/search/query?q=term_vi:gio&spellcheck.q=gio
the spellcheck block in the response is empty. I would expect the
spellchecker to correct "gio" to "giáo".

What am I missing?

I've tried out the Suggester component too, using FuzzyLookupFactory and
DocumentDictionaryFactory, but it does not give the expected results.


regards,
nuhaa

Re: Indexing a (File attached to a document)

2016-05-23 Thread Solr User
Hi,
I am using the MapReduceIndexerTool to index data from HDFS, using
morphlines as the ETL tool.

The data paths are specified as XPath expressions in the morphline file.

Sorry for the delay.


