Re: Solr Stream vs Export Request Handlers

2018-10-25 Thread Kamal Kishore Aggarwal
Any update on this?

Regards
Kamal

On Thu, Oct 18, 2018 at 11:50 AM Kamal Kishore Aggarwal <
kkroyal@gmail.com> wrote:

> Hi,
>
> Thanks again Joel for your reply. I have noted your suggestions.
>
> I observed one more thing while using SolrJ to fetch the data via
> /stream with export and via direct /export. The Solr QTime is almost the
> same; however, the elapsed time (total time) to fetch the response via
> streaming with export is better than with direct /export (streaming export
> takes about 30% less time than /export).
>
> Is this also expected?
>
> Regards
> Kamal Kishore
>
>
>
> On Tue, Oct 16, 2018 at 3:21 AM Joel Bernstein  wrote:
>
>> Yes this is correct. But keep in mind Streaming Expression has a wide
>> range
>> of features that have nothing at all to do with the export handler. In
>> general with Streaming Expressions you want to find the functions that get
>> the job done using the least amount of work. The /export handler is often
>> not the best choice. You'll want to read through the various streaming
>> expressions to see if they might be more efficient for your use case.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Mon, Oct 15, 2018 at 12:05 PM Kamal Kishore Aggarwal <
>> kkroyal@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > After I performed the test on my data, I found that direct /export and
>> > streaming expression with export both give almost the same response
>> > time. This was also pointed out by *Jan Høydahl* in his reply.
>> >
>> > Also, the documentation says the export feature uses a stream sorting
>> > technique, and streaming expressions also use a stream technique. So do
>> > they internally work in the same fashion? Please confirm.
>> >
>> > Regards
>> > Kamal Kishore
>> >
>> >
>> >
>> > On Tue, Oct 2, 2018 at 5:51 PM Kamal Kishore Aggarwal <
>> > kkroyal@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Thanks Jan & Joel.
>> > >
>> > > Though I will evaluate the performance over my data, based on your
>> > > experience, which of the two performs better? Please suggest.
>> > >
>> > > Yes, I know /export does not get the data from all shards, but we can
>> > > write code to aggregate the data from all shards. That is only worth
>> > > doing, though, if /export is better than /stream.
>> > >
>> > > Thanks
>> > > Kamal Kishore
>> > >
>> > >
>> > > On Thu, Sep 27, 2018 at 11:04 PM Joel Bernstein 
>> > > wrote:
>> > >
>> > >> The export handler does not do distributed search. So if you have a
>> > >> multi-shard collection you may have to use Streaming Expressions to
>> get
>> > >> exports from all shards.
>> > >>
>> > >>
>> > >> Joel Bernstein
>> > >> http://joelsolr.blogspot.com/
>> > >>
>> > >>
>> > >> On Thu, Sep 27, 2018 at 4:32 AM Jan Høydahl 
>> > >> wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > Yes, you can choose which to use; both should give you about the
>> > >> > same result. If you already work with the Solr search API, it would
>> > >> > be easiest for you to consume /export, as you don't need to learn the
>> > >> > new syntax and parse the Tuple response. However, if you need to do
>> > >> > stuff with the docs as you stream them from Solr, then streaming
>> > >> > expressions let you enrich the docs, modify, join etc. on the fly.
>> > >> >
>> > >> > PS: When the /export docs say it uses a streaming technique, it does
>> > >> > NOT mean that it uses the Solr feature streaming expressions :)
>> > >> >
>> > >> > --
>> > >> > Jan Høydahl, search solution architect
>> > >> > Cominvent AS - www.cominvent.com
>> > >> >
>> > >> > > 27. sep. 2018 kl. 09:07 skrev Kamal Kishore Aggarwal <
>> > >> > kkroyal@gmail.com>:
>> > >> > >
>> > >> > > Hi,
>> > >> > >
>> > >> > > I have a requirement to fetch all data from a collection. One
>> way is
>> > >> to
>> > >> > use
>> > >> > > streaming expression and other way is to use export.
>> > >> > >
>> > >> > > Streaming expression documentation says *streaming functions are
>> > >> > > designed to work with entire result sets rather than the top N
>> > >> > > results like a normal search. This is supported by the /export
>> > >> > > handler.*
>> > >> > >
>> > >> > > Also, Export handler documentation says *this feature uses a
>> stream
>> > >> > sorting
>> > >> > > technique that begins to send records within milliseconds and
>> > >> continues
>> > >> > to
>> > >> > > stream results until the entire result set has been sorted and
>> > >> exported.*
>> > >> > >
>> > >> > > These two statements suggest to me that for fetching entire result
>> > >> > > sets, streaming expressions use the export handler and the export
>> > >> > > handler uses streaming, so whether I use a streaming expression or
>> > >> > > the export handler, they are internally the same and would have the
>> > >> > > same performance. Am I correct to say so?
>> > >> > >
>> > >> > >
>> > >> > > Ref Links:
>> > >> > >
>> > >> > >
>> https://lucene.apache.org/solr/guid

How to add two fields (mobile,year) in SolrCloud Collection

2018-10-25 Thread kbmanikanta90
Hello Everyone,

I need some help: we are trying to migrate from standalone Solr to
SolrCloud. During this journey we have done a lot of R&D, but we have been
unable to find any article related to our problem.
Can anyone please help with this problem?

We run ZooKeeper separately on a local machine and started a Solr instance
(port 8983) with -z pointing at ZooKeeper (localhost:2181):
*[solr start -cloud -p 8983 -z 172.22.9.16:2181]*

Once the SolrCloud instance started, we created a collection with 2 shards
and a custom configSet ("_test") whose schema.xml defines the fields: id, name.

We used the DataImportHandler (DIH) / post.jar to push data to the
respective cores.

Here is where we are facing a problem: we need to add two more fields
(mobile, year) to that core's schema, but we could not figure out where the
effective schema.xml lives. We went to the _test configSet, added those two
fields, and restarted ZooKeeper and the SolrCloud instance.
Still, the change is not reflected in the SolrCloud collection (when we
check Files -> schema.xml in the Solr Admin UI).

Please, can anyone help me resolve this issue?

Regards,
Bala Manikanta K.
kbmanikant...@gmail.com




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Reading data using Tika to Solr

2018-10-25 Thread Martin Frank Hansen (MHQ)
Hi,

I am trying to read the content of msg-files using Tika and index them in
Solr; however, I am having some problems with the OfficeParser. I keep
getting the error java.lang.NoClassDefFoundError for the OfficeParser, even
though both tika-core and tika-parsers are included in the build path.


I am using Java with the following code:


import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.microsoft.OfficeParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public static void main(final String[] args) throws IOException, SAXException, TikaException {
    processDocument(pathtofile);
}

private static void processDocument(String pathfilename) {
    try {
        File file = new File(pathfilename);
        Metadata meta = new Metadata();
        InputStream input = TikaInputStream.get(file);
        BodyContentHandler handler = new BodyContentHandler();

        // OfficeParser handles the OLE2 formats, which include Outlook .msg files
        Parser parser = new OfficeParser();
        ParseContext context = new ParseContext();
        parser.parse(input, handler, meta, context);

        String doccontent = handler.toString();

        System.out.println(doccontent);
        System.out.println(meta);
    } catch (IOException | SAXException | TikaException e) {
        e.printStackTrace();
    }
}
In the buildpath I have the following dependencies:

[inline image of the build path dependencies; not included in the archive]
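
(For reference, the Maven coordinates for those two jars would look roughly
like this; the version shown is illustrative:)

  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.19.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.19.1</version>
  </dependency>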

Any help is appreciated.

Thanks in advance.

Best regards,

Martin Hansen




Re: How to add two fields (mobile,year) in SolrCloud Collection

2018-10-25 Thread Yogendra Kumar Soni
You need to change the local copy of schema.xml and upload that schema.xml to
ZooKeeper using the upconfig command. You need to specify -n <config name>
(it should be the same as the collection name).
https://lucene.apache.org/solr/guide/6_6/solr-control-script-reference.html#SolrControlScriptReference-UploadaConfigurationSet

After uploading the config files you will need to reload the collection:
admin/collections?action=RELOAD&name=<collection>
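
A minimal sketch of the two steps (ZooKeeper address, config name and paths
are placeholders; adjust to your setup):

  # upload the edited configset from your local machine to ZooKeeper
  bin/solr zk upconfig -z 172.22.9.16:2181 -n _test -d /path/to/_test/conf

  # reload the collection so the new fields take effect
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=<collection>"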


On Thu, Oct 25, 2018 at 1:30 PM kbmanikanta90 
wrote:

> Hello Everyone,
>
>  I need some help: we are trying to migrate from standalone Solr to
> SolrCloud. During this journey we have done a lot of R&D, but we have been
> unable to find any article related to our problem.
> Can anyone please help with this problem?
>
> We run ZooKeeper separately on a local machine and started a Solr instance
> (port 8983) with -z pointing at ZooKeeper (localhost:2181):
> *[solr start -cloud -p 8983 -z 172.22.9.16:2181]*
>
> Once the SolrCloud instance started, we created a collection with 2 shards
> and a custom configSet ("_test") whose schema.xml defines the fields: id, name.
>
> We used the DataImportHandler (DIH) / post.jar to push data to the
> respective cores.
>
> Here is where we are facing a problem: we need to add two more fields
> (mobile, year) to that core's schema, but we could not figure out where the
> effective schema.xml lives. We went to the _test configSet, added those two
> fields, and restarted ZooKeeper and the SolrCloud instance.
> Still, the change is not reflected in the SolrCloud collection (when we
> check Files -> schema.xml in the Solr Admin UI).
>
> Please, can anyone help me resolve this issue?
>
> Regards,
> Bala Manikanta K.
> kbmanikant...@gmail.com
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
*Thanks and Regards,*
*Yogendra Kumar Soni*


different query for different dictionaries

2018-10-25 Thread Dan Rosher
Hi,

If I have 2 fields e.g. location and products then I might have 2
dictionaries

spell_location
spell_products
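
(Registered roughly like this in solrconfig.xml; the field names are
assumptions:)

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">spell_location</str>
    <str name="field">location</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">spell_products</str>
    <str name="field">products</str>
  </lst>
</searchComponent>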

I cannot use a per-dictionary query parameter such as
spellcheck.spell_location.q=...; only spellcheck.q=... is supported.

Does anyone have a workaround for this limitation?

Cheers
Dan


Re: Securing ONLY the web interface console

2018-10-25 Thread Amanda Shuman
Thanks - but I think I'm past those steps now. I set up an nginx reverse
proxy through the Plesk panel initially, so that is fine. Binding it to
port 8983 seems to be the issue. Anyway, I think I'll try out the
instructions listed here and cross my fingers:

https://talk.plesk.com/threads/unable-to-forward-requests-from-nginx-apache.347141/
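
(For context, the proxy stanza in question is essentially of this shape;
host and port are illustrative:

  location /solr/ {
      proxy_pass http://127.0.0.1:8983/solr/;
  }
)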

Amanda
--
Dr. Amanda Shuman
Post-doc researcher, University of Freiburg, The Maoist Legacy Project

PhD, University of California, Santa Cruz
http://www.amandashuman.net/
http://www.prchistoryresources.org/
Office: +49 (0) 761 203 4925



On Mon, Oct 22, 2018 at 5:35 PM Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> I think that it is not really Solr's job to solve this.   I'm sure that
> there are many Java ways to solve this  with Jetty configuration of JAAS,
> but the *safest* ways involve ports and rights.   In other words, port 8983
> and zookeeper ports are then for Solr nodes to communicate with each
> other.   But a web proxy on some other port (443 with https suggested)
> forwards /solr to port 8983.
>
> You can use many, many servers as the proxy server - Apache httpd and
> NGINX probably being the biggest contenders.   Because my systems team
> understands Apache httpd better, I use the following Apache httpd
> configuration file (this is actually the template version so I don't share
> more):
>
> CASLoginURL  https://{{httpd.cas.server}}/cas/login
> CASValidateURL   https://{{httpd.cas.server}}/cas/serviceValidate
> CASRootProxiedAs https://{{httpd.local.name}}
> CASCookiePath/var/cache/mod_auth_cas/
>
> RewriteEngine On
> RewriteLogLevel 0
> RewriteRule ^/$ https://%{HTTP_HOST}/solr/ [R=301,L]
>
> <Location "/solr">
>   ProxyPass http://127.0.0.1:8983/solr retry=0
>   ProxyPassReverse http://127.0.0.1:8983/solr
>   AuthName "NLM Login"
>   AuthType CAS
>   CASScope /
>   CASAuthNHeader REMOTE_USER
>
>   Require user {{solr.admin.users}}
> </Location>
> Now the Apache httpd directives for CAS are all part of the mod_auth_cas
> module, https://github.com/apereo/mod_auth_cas
>
> Other folks are using OAuth, SAML, or just basic htpasswd protection.
>
> Since you are a PhD candidate, I want to point you towards a book like
> Apache: The Definitive Guide, rather than towards Google, which will help
> you from here anyway if you look for "Apache httpd web proxy tutorial" or
> "NGINX web proxy tutorial".   Anyway, here are the full docs for Apache
> httpd and links to the book I mention:
>
> * http://httpd.apache.org/docs/2.4/
> *
> https://www.amazon.com/Apache-Definitive-Guide-Ben-Laurie/dp/0596002033/ref=sr_1_1
> *
> https://www.safaribooksonline.com/library/view/apache-the-definitive/0596002033/
>
> > -Original Message-
> > From: Amanda Shuman 
> > Sent: Monday, October 22, 2018 9:55 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Securing ONLY the web interface console
> >
> > Just a follow-up to say that I never have resolved this issue
> > satisfactorily.
> >
> > --
> > Dr. Amanda Shuman
> > Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> > 
> > PhD, University of California, Santa Cruz
> > http://www.amandashuman.net/
> > http://www.prchistoryresources.org/
> > Office: +49 (0) 761 203 4925
> >
> >
> >
> > On Mon, Jun 18, 2018 at 6:00 PM Amanda Shuman
> > 
> > wrote:
> >
> > > Hi Shawn et al,
> > >
> > > As a follow-up to this - then how would you solve the issue? I tried to
> > > use the instructions to set up basic authentication in solr (per a
> Stack
> > > Overflow post) and it worked to secure things, but the web app couldn't
> > > access solr. Tampering with the app code - which is the solr plug-in
> used
> > > for Omeka (https://github.com/scholarslab/SolrSearch) - would require
> a
> > > lot of extra work, so I'm wondering if there's a simpler solution. One
> of
> > > the developers on that told me to do a reverse proxy like the second
> > poster
> > > on this chain more or less suggests. But from what I understand of what
> > you
> > > wrote, this is not ideal because it only protects the admin UI panel
> and
> > > not everything else. So how then should I secure everything with the
> > > exception of calls coming from this web app?
> > >
> > > Best,
> > > Amanda
> > >
> > >
> > > --
> > > Dr. Amanda Shuman
> > > Post-doc researcher, University of Freiburg, The Maoist Legacy Project
> > > 
> > > PhD, University of California, Santa Cruz
> > > http://www.amandashuman.net/
> > > http://www.prchistoryresources.org/
> > > Office: +49 (0) 761 203 4925
> > >
> > >
> > > On Mon, Mar 19, 2018 at 11:03 PM, Shawn Heisey 
> > > wrote:
> > >
> > >> On 3/19/2018 11:19 AM, Jesus Olivan wrote:
> > >> > i'm trying to password protect only Solr web interface (not queries
> > >> > launched from my app). I'm currently using SolrCloud 6.6.0 with
> external
> > >> > zookeepers. I've read tons of Docs abo

Solr 7.5/skg

2018-10-25 Thread David Hastings
Hey all, I was going through the Solr 7.5 documentation:
http://lucene.apache.org/solr/guide/7_5/index.html

and it appears to be incomplete.  Last week Trey Grainger gave a
presentation about the skg plugin and said it was now included in the 7.5
distribution.  There are no references to using it in the documentation, or
anywhere really; the only thing close is some GitHub information from 2
years ago.  Is there a reason it's not defined in the official documentation?
Thanks,
David


Question about SynonymGraphFilter

2018-10-25 Thread Gianpiero Sportelli

Hi,

I have a question about SynonymGraphFilter.

During query parsing I expected a phrase query for multi-word synonyms, but
the query produced is an OR of all the tokens that compose the multi-word
synonym. Is this the correct behavior? I attach a test for this question.

Examples:

query (quoted): "text analysis is a serious thing"
query after parsing: BooleanQuery -> spanNear([spanOr([text:nlp,
spanNear([text:text, text:analysis], 0, true)]), text:is, text:a,
text:serius, text:thing], 0, true)
(in this query, nlp is a synonym of "text analysis")

query: the labrador is an awesome dog
query after parsing: (((+text:a +text:big +text:dog) (+text:the
+text:labrador))) (text:is) (text:an) (text:awesome) (((+text:mammal
+text:animal) text:dog))
query expected: (spanOr([spanNear([text:a, text:big, text:dog], 0, true),
spanNear([text:the, text:labrador], 0, true)])) (text:is) (text:an)
(text:awesome) (spanOr([spanNear([text:mammal, text:animal], 0, true),
text:dog]))



in this query the synonyms are:

the labrador -> a big dog

dog -> mammal animal

Thank you,

Gianpiero Sportelli


--

CELI srl
via San Quintino, 31 - Torino IT – 10121
T +39 011 5627115
W www.celi.it
package it.celi.sophia.lucene7.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.CharsRefBuilder;
import org.junit.Test;

import java.io.IOException;

import static org.apache.commons.lang3.StringUtils.EMPTY;
import static org.apache.commons.lang3.StringUtils.repeat;

class Formatter{

public static String formatQuery(final Query q) {

final StringBuilder sb = new StringBuilder();
formatQuery(q, EMPTY, 0, sb);

return sb.toString();
}

private static void formatQuery(final Query q, final String prefix, final int level, final StringBuilder sb) {

final String indent = repeat(' ', level * 3);
sb.append(indent)
.append(prefix)
.append(q.getClass().getSimpleName())
.append(" ->   ")
.append(q.toString())
.append(System.lineSeparator());

if (q instanceof BooleanQuery) {
for (final BooleanClause bc : ((BooleanQuery) q).clauses()) {
// compare the Occur enum directly (calling equals("+") on the enum is always false)
final String occur = bc.getOccur() == BooleanClause.Occur.MUST ? "AND "
        : bc.getOccur() == BooleanClause.Occur.MUST_NOT ? "NOT " : "OR ";
formatQuery(bc.getQuery(), occur, level + 1, sb);
}
}
}
}

public class SynonymGraphFilterTest {

@Test
public void testSynonymAnalyzer() throws ParseException {

final Analyzer analyzer = createAnalyzer();


final QueryParser qp = new MultiFieldQueryParser(new String[]{"text"}, analyzer) {

@Override
protected Query newFieldQuery(final Analyzer analyzer, final String field, final String queryText, final boolean quoted)
throws ParseException {
System.out.println("text:: " + queryText + " - quoted:: " + quoted + " analyzer:: " + analyzer);
return super.newFieldQuery(analyzer, field, queryText, quoted);
}
};

System.out.println("");
String text = "\"text analysis is a serious thing\"";
System.out.println(text);
Query q = qp.parse(text);
System.out.println(Formatter.formatQuery(q));

text = "the labrador is an awesome dog";
System.out.println(text);
q = qp.parse(text);
System.out.println(Formatter.formatQuery(q));
}


private Analyzer createAnalyzer() {

return new Analyzer() {

@Override
protected Analyzer.TokenStreamComponents createComponents(final String fieldName) {
SynonymMap smap = null;
SynonymMap.Builder builder = new 
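SynonymMap.Builder(true); // dedup duplicate rules
// [The message is truncated here in the archive. What follows is an
// illustrative sketch of one way this method might continue for the
// synonyms described above; it is not the author's actual code.
// Assumes one extra import: org.apache.lucene.util.CharsRef]
final char sep = SynonymMap.WORD_SEPARATOR;
try {
    // "text analysis" => "nlp"
    builder.add(new CharsRef("text" + sep + "analysis"),
                new CharsRef("nlp"), true);
    // "the labrador" => "a big dog"
    builder.add(new CharsRef("the" + sep + "labrador"),
                new CharsRef("a" + sep + "big" + sep + "dog"), true);
    // "dog" => "mammal animal"
    builder.add(new CharsRef("dog"),
                new CharsRef("mammal" + sep + "animal"), true);
    smap = builder.build();
} catch (final IOException e) {
    throw new RuntimeException(e);
}
final Tokenizer tokenizer = new WhitespaceTokenizer();
TokenStream stream = new SynonymGraphFilter(tokenizer, smap, true);
stream = new LowerCaseFilter(stream);
return new TokenStreamComponents(tokenizer, stream);
            }
        };
    }
}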

Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
I think you are looking for:
http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs

Or, as a second option,
http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
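
A rough sketch of the JSON Facet API form (field names and the
foreground/background queries are made up; see the first link for the
authoritative syntax):

{
  "query": "*:*",
  "params": { "fore": "text:solr", "back": "*:*" },
  "facet": {
    "genres": {
      "type": "terms",
      "field": "genre_s",
      "sort": { "r": "desc" },
      "facet": { "r": "relatedness($fore,$back)" }
    }
  }
}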

Regards,
   Alex.
On Thu, 25 Oct 2018 at 08:47, David Hastings
 wrote:
>
> Hey all, I was going throught the Solr 7.5 documentation:
> http://lucene.apache.org/solr/guide/7_5/index.html
>
> and it appears to be incomplete.  last week Trey Grainger gave a
> presentation about the skg plugin, and said it was now included in the 7.5
> distribution.  There are no references to using it on the documentation, or
> anywhere really.   the only thing close is some github information, from 2
> years ago.  Is there a reason its not defined in the official documentation?
> Thanks,
> David


Re: Streaming rollUp vs Streaming facet

2018-10-25 Thread Joel Bernstein
Your use case is somewhat special in that it involves 10 fields. With that
many nested facets the JSON facet API may or may not outperform streaming
rollups. For most other cases JSON facet API will outperform rollups.



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Oct 17, 2018 at 11:21 PM RAUNAK AGRAWAL 
wrote:

> Thanks a lot Joel. This makes sense, but in my use case I am aggregating 10
> fields and rollup is performing 2x better than facet streaming.
>
> On Wed, Oct 17, 2018 at 6:56 PM Joel Bernstein  wrote:
>
> > They are very different.
> >
> > The "facet" expression sends a request to the JSON facet API which pushes
> > the aggregation into the search engine. In most scenarios this is the
> > preferred method because it only streams aggregated results. I would
> always
> > try the "facet" expression first before going to rollup.
> >
> > The "rollup" expression rolls up aggregations over a sorted stream of
> > tuples. It almost always involves exporting and sorting entire result
> sets
> > with the /export handler. There are only two reasons to use this
> approach:
> >
> > 1) Very high cardinality faceting. By very high I mean millions of facet
> > values are being returned in the same query.
> > 2) Rollups following any kind of relational algebra. For example a rollup
> > on top of a hashJoin.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Tue, Oct 16, 2018 at 8:54 AM RAUNAK AGRAWAL  >
> > wrote:
> >
> > > Hi Guys,
> > >
> > > I am trying to do an aggregation (sum) using streaming API. I have
> around
> > > 10 billion documents in my collection and every document has around 10
> > > docValues.
> > >
> > > So streaming facet is taking close to 6 secs to respond with
> aggregation
> > on
> > > 10 fields while streaming rollup is returning the response in 2 secs.
> > >
> > > So my questions are:
> > >
> > > 1. What is the fundamental difference between streaming facet and
> rollUp.
> > > 2. When to use facet and when to use rollUp.
> > >
> > > Thanks
> > >
> >
>


Re: Solr Stream vs Export Request Handlers

2018-10-25 Thread Joel Bernstein
I'm not sure why /stream is exporting faster than /export. It may be that
the different approaches in the client are the reason for the difference.
But the /export handler would be used in both scenarios if you specify
qt="/export" in the search() Streaming Expression.
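
For example (collection and field names illustrative):

  search(mycoll, q="*:*", fl="id,fieldA", sort="fieldA asc", qt="/export")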


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Oct 25, 2018 at 3:07 AM Kamal Kishore Aggarwal <
kkroyal@gmail.com> wrote:

> Any update on this?
>
> Regards
> Kamal
>
> On Thu, Oct 18, 2018 at 11:50 AM Kamal Kishore Aggarwal <
> kkroyal@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks again Joel for your reply. I have noted your suggestions.
> >
> > I observed one more thing while using SolrJ to fetch the data via
> > /stream with export and via direct /export. The Solr QTime is almost the
> > same; however, the elapsed time (total time) to fetch the response via
> > streaming with export is better than with direct /export (streaming
> > export takes about 30% less time than /export).
> >
> > Is this also expected?
> >
> > Regards
> > Kamal Kishore
> >
> >
> >
> > On Tue, Oct 16, 2018 at 3:21 AM Joel Bernstein 
> wrote:
> >
> >> Yes this is correct. But keep in mind Streaming Expression has a wide
> >> range
> >> of features that have nothing at all to do with the export handler. In
> >> general with Streaming Expressions you want to find the functions that
> get
> >> the job done using the least amount of work. The /export handler is
> often
> >> not the best choice. You'll want to read through the various streaming
> >> expressions to see if they might be more efficient for your use case.
> >>
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >>
> >> On Mon, Oct 15, 2018 at 12:05 PM Kamal Kishore Aggarwal <
> >> kkroyal@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > After I performed the test on my data, I found that direct /export and
> >> > streaming expression with export both give almost the same response
> >> > time. This was also pointed out by *Jan Høydahl* in his reply.
> >> >
> >> > Also, the documentation says the export feature uses a stream sorting
> >> > technique, and streaming expressions also use a stream technique. So do
> >> > they internally work in the same fashion? Please confirm.
> >> >
> >> > Regards
> >> > Kamal Kishore
> >> >
> >> >
> >> >
> >> > On Tue, Oct 2, 2018 at 5:51 PM Kamal Kishore Aggarwal <
> >> > kkroyal@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Thanks Jan & Joel.
> >> > >
> >> > > Though I will evaluate the performance over my data, based on your
> >> > > experience, which of the two performs better? Please suggest.
> >> > >
> >> > > Yes, I know /export does not get the data from all shards, but we can
> >> > > write code to aggregate the data from all shards. That is only worth
> >> > > doing, though, if /export is better than /stream.
> >> > >
> >> > > Thanks
> >> > > Kamal Kishore
> >> > >
> >> > >
> >> > > On Thu, Sep 27, 2018 at 11:04 PM Joel Bernstein  >
> >> > > wrote:
> >> > >
> >> > >> The export handler does not do distributed search. So if you have a
> >> > >> multi-shard collection you may have to use Streaming Expressions to
> >> get
> >> > >> exports from all shards.
> >> > >>
> >> > >>
> >> > >> Joel Bernstein
> >> > >> http://joelsolr.blogspot.com/
> >> > >>
> >> > >>
> >> > >> On Thu, Sep 27, 2018 at 4:32 AM Jan Høydahl  >
> >> > >> wrote:
> >> > >>
> >> > >> > Hi,
> >> > >> >
> >> > >> > Yes, you can choose which to use; both should give you about the
> >> > >> > same result. If you already work with the Solr search API, it
> >> > >> > would be easiest for you to consume /export, as you don't need to
> >> > >> > learn the new syntax and parse the Tuple response. However, if you
> >> > >> > need to do stuff with the docs as you stream them from Solr, then
> >> > >> > streaming expressions let you enrich the docs, modify, join etc.
> >> > >> > on the fly.
> >> > >> >
> >> > >> > PS: When the /export docs say it uses a streaming technique, it
> >> > >> > does NOT mean that it uses the Solr feature streaming expressions :)
> >> > >> >
> >> > >> > --
> >> > >> > Jan Høydahl, search solution architect
> >> > >> > Cominvent AS - www.cominvent.com
> >> > >> >
> >> > >> > > 27. sep. 2018 kl. 09:07 skrev Kamal Kishore Aggarwal <
> >> > >> > kkroyal@gmail.com>:
> >> > >> > >
> >> > >> > > Hi,
> >> > >> > >
> >> > >> > > I have a requirement to fetch all data from a collection. One
> >> way is
> >> > >> to
> >> > >> > use
> >> > >> > > streaming expression and other way is to use export.
> >> > >> > >
> >> > >> > > Streaming expression documentation says *streaming functions are
> >> > >> > > designed to work with entire result sets rather than the top N
> >> > >> > > results like a normal search. This is supported by the /export
> >> > >> > > handler.*
> >> > >> > >
> >> > >> > > Also, Export handler documentation says *this feature uses a
> >> stream
> >> > >> > sortin

Re: Solr 7.5/skg

2018-10-25 Thread David Hastings
Thanks very much! For being a search product, the documentation isn't very
search friendly :)

On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch 
wrote:

> I think you are looking for:
>
> http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
>
> Or, as a second option,
>
> http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
>
> Regards,
>Alex.
> On Thu, 25 Oct 2018 at 08:47, David Hastings
>  wrote:
> >
> > Hey all, I was going throught the Solr 7.5 documentation:
> > http://lucene.apache.org/solr/guide/7_5/index.html
> >
> > and it appears to be incomplete.  last week Trey Grainger gave a
> > presentation about the skg plugin, and said it was now included in the
> 7.5
> > distribution.  There are no references to using it on the documentation,
> or
> > anywhere really.   the only thing close is some github information, from
> 2
> > years ago.  Is there a reason its not defined in the official
> documentation?
> > Thanks,
> > David
>


Re: Internal Solr communication question

2018-10-25 Thread Fernando Otero
Hey Shawn
Thanks for your answer! I changed the config to 1 shard with 7
replicas, but I still see communication between nodes; is that expected?
Each node has the single shard, so it should have all the data needed to
compute locally. I don't get why I'm seeing communication between them.

Thanks

On Tue, Oct 23, 2018 at 2:21 PM Shawn Heisey  wrote:

> On 10/23/2018 9:31 AM, Fernando Otero wrote:
> > Hey all
> >   I'm running some tests on Solr cloud (10 nodes, 3 shards, 3
> replicas),
> > when I run the queries I end up seeing 7x traffic ( requests / minute)
> in
> > Newrelic.
> >
> > Could it be that the internal communication between nodes is done through
> > HTTP and newrelic counts those calls?
>
> The inter-node communication is indeed done over HTTP, using the same
> handlers that clients use, and if you have something watching Solr's
> statistics or watching Jetty's counters, one of the counters will go up
> when an inter-node request happens.
>
> With 3 shards, one request coming in will generate as many as six
> additional requests -- one request to a replica for each shard, and then
> another request to each shard that has matches for the query, to
> retrieve the documents that will be in the response. The node that
> received the initial request will compile the results from all the
> shards and send them back in response to the original request.
> Nutshell:  One request from a client expands. With three shards, that
> will be four to seven requests total.  If you have 10 shards, it will be
> between 11 and 21 total requests.
>
> Thanks,
> Shawn
>
>

-- 

Fernando Otero

Sr Engineering Manager, Panamera

Buenos Aires - Argentina

Mobile: +54 911 67697108

Email:  fernando.ot...@olx.com


Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
That's being worked on as well. We've migrated the documentation from
Confluence to standalone setup, so not all the pieces are in place
yet.

Regards,
   Alex.
On Thu, 25 Oct 2018 at 10:12, David Hastings
 wrote:
>
> Thanks very much!  for being a search product, the documentation isn't very
> search friendly :)
>
> On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch 
> wrote:
>
> > I think you are looking for:
> >
> > http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
> >
> > Or, as a second option,
> >
> > http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
> >
> > Regards,
> >Alex.
> > On Thu, 25 Oct 2018 at 08:47, David Hastings
> >  wrote:
> > >
> > > Hey all, I was going throught the Solr 7.5 documentation:
> > > http://lucene.apache.org/solr/guide/7_5/index.html
> > >
> > > and it appears to be incomplete.  last week Trey Grainger gave a
> > > presentation about the skg plugin, and said it was now included in the
> > 7.5
> > > distribution.  There are no references to using it on the documentation,
> > or
> > > anywhere really.   the only thing close is some github information, from
> > 2
> > > years ago.  Is there a reason its not defined in the official
> > documentation?
> > > Thanks,
> > > David
> >


Re: Internal Solr communication question

2018-10-25 Thread Emir Arnautović
Hi Fernando,
I did not look at the code and am not sure if there is special handling in
the case of a single-shard collection, but Solr does not have to choose the
local shard to query. It assumes that one node may receive all requests and
that it needs to balance the load. What you can do is add
preferLocalShards=true to make sure local shards are queried.
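
For example (host and collection name illustrative):

  http://localhost:8983/solr/mycollection/select?q=*:*&preferLocalShards=true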

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 25 Oct 2018, at 16:18, Fernando Otero  wrote:
> 
> Hey Shawn
>Thanks for your answer!. I changed the config to 1 shard with 7
> replicas but I still see communication between nodes, is that expected?
> Each node has 1 shard so it should have all the data needed to compute, I
> don't get why I'm seeing communication between them.
> 
> Thanks
> 
> On Tue, Oct 23, 2018 at 2:21 PM Shawn Heisey  wrote:
> 
>> On 10/23/2018 9:31 AM, Fernando Otero wrote:
>>> Hey all
>>>  I'm running some tests on Solr cloud (10 nodes, 3 shards, 3
>> replicas),
>>> when I run the queries I end up seeing 7x traffic ( requests / minute)
>> in
>>> Newrelic.
>>> 
>>> Could it be that the internal communication between nodes is done through
>>> HTTP and newrelic counts those calls?
>> 
>> The inter-node communication is indeed done over HTTP, using the same
>> handlers that clients use, and if you have something watching Solr's
>> statistics or watching Jetty's counters, one of the counters will go up
>> when an inter-node request happens.
>> 
>> With 3 shards, one request coming in will generate as many as six
>> additional requests -- one request to a replica for each shard, and then
>> another request to each shard that has matches for the query, to
>> retrieve the documents that will be in the response. The node that
>> received the initial request will compile the results from all the
>> shards and send them back in response to the original request.
>> Nutshell:  One request from a client expands. With three shards, that
>> will be four to seven requests total.  If you have 10 shards, it will be
>> between 11 and 21 total requests.
>> 
>> Thanks,
>> Shawn
>> 
>> 
> 
> -- 
> 
> Fernando Otero
> 
> Sr Engineering Manager, Panamera
> 
> Buenos Aires - Argentina
> 
> Mobile: +54 911 67697108
> 
> Email:  fernando.ot...@olx.com



Re: Solr 7.5/skg

2018-10-25 Thread David Hastings
Another skg question: the significantTerms documentation says it queries a
SolrCloud collection only. Is there any way to have it work on standalone
Solr cores as well? The MLT function works fine on standalone, and I was
really hoping this would as well.


On Thu, Oct 25, 2018 at 10:25 AM Alexandre Rafalovitch 
wrote:

> That's being worked on as well. We've migrated the documentation from
> Confluence to standalone setup, so not all the pieces are in place
> yet.
>
> Regards,
>Alex.
> On Thu, 25 Oct 2018 at 10:12, David Hastings
>  wrote:
> >
> > Thanks very much!  for being a search product, the documentation isn't
> very
> > search friendly :)
> >
> > On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> > > I think you are looking for:
> > >
> > >
> http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
> > >
> > > Or, as a second option,
> > >
> > >
> http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
> > >
> > > Regards,
> > >Alex.
> > > On Thu, 25 Oct 2018 at 08:47, David Hastings
> > >  wrote:
> > > >
> > > > Hey all, I was going throught the Solr 7.5 documentation:
> > > > http://lucene.apache.org/solr/guide/7_5/index.html
> > > >
> > > > and it appears to be incomplete.  last week Trey Grainger gave a
> > > > presentation about the skg plugin, and said it was now included in
> the
> > > 7.5
> > > > distribution.  There are no references to using it on the
> documentation,
> > > or
> > > > anywhere really.   the only thing close is some github information,
> from
> > > 2
> > > > years ago.  Is there a reason its not defined in the official
> > > documentation?
> > > > Thanks,
> > > > David
> > >
>


Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
See 
https://www.slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks
, slides 19+

But it is not a fully-supported usage, due to
https://issues.apache.org/jira/browse/SOLR-12569 .

So, at your own risk.

Regards,
   Alex.
On Thu, 25 Oct 2018 at 10:32, David Hastings
 wrote:
>
> Another skg question.  the significantTerms
> says it queries a solrcloud collection, only, is there any way to have it
> work on standalone solr/cores as well?  the MLT function works fine on
> standalone, was really hoping this would as well.
>
>
> On Thu, Oct 25, 2018 at 10:25 AM Alexandre Rafalovitch 
> wrote:
>
> > That's being worked on as well. We've migrated the documentation from
> > Confluence to standalone setup, so not all the pieces are in place
> > yet.
> >
> > Regards,
> >Alex.
> > On Thu, 25 Oct 2018 at 10:12, David Hastings
> >  wrote:
> > >
> > > Thanks very much!  for being a search product, the documentation isn't
> > very
> > > search friendly :)
> > >
> > > On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch <
> > arafa...@gmail.com>
> > > wrote:
> > >
> > > > I think you are looking for:
> > > >
> > > >
> > http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
> > > >
> > > > Or, as a second option,
> > > >
> > > >
> > http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
> > > >
> > > > Regards,
> > > >Alex.
> > > > On Thu, 25 Oct 2018 at 08:47, David Hastings
> > > >  wrote:
> > > > >
> > > > > Hey all, I was going throught the Solr 7.5 documentation:
> > > > > http://lucene.apache.org/solr/guide/7_5/index.html
> > > > >
> > > > > and it appears to be incomplete.  last week Trey Grainger gave a
> > > > > presentation about the skg plugin, and said it was now included in
> > the
> > > > 7.5
> > > > > distribution.  There are no references to using it on the
> > documentation,
> > > > or
> > > > > anywhere really.   the only thing close is some github information,
> > from
> > > > 2
> > > > > years ago.  Is there a reason its not defined in the official
> > > > documentation?
> > > > > Thanks,
> > > > > David
> > > >
> >


Re: Solr 7.5/skg

2018-10-25 Thread David Hastings
Wow, thanks for that.  Will do some research and come back with the
inevitable questions I will have.

On Thu, Oct 25, 2018 at 10:37 AM Alexandre Rafalovitch 
wrote:

> See
> https://www.slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks
> , slides 19+
>
> But it is not a fully-supported usage, due to
> https://issues.apache.org/jira/browse/SOLR-12569 .
>
> So, at your own risk.
>
> Regards,
>Alex.
> On Thu, 25 Oct 2018 at 10:32, David Hastings
>  wrote:
> >
> > Another skg question.  the significantTerms
> > says it queries a solrcloud collection, only, is there any way to have it
> > work on standalone solr/cores as well?  the MLT function works fine on
> > standalone, was really hoping this would as well.
> >
> >
> > On Thu, Oct 25, 2018 at 10:25 AM Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> > > That's being worked on as well. We've migrated the documentation from
> > > Confluence to standalone setup, so not all the pieces are in place
> > > yet.
> > >
> > > Regards,
> > >Alex.
> > > On Thu, 25 Oct 2018 at 10:12, David Hastings
> > >  wrote:
> > > >
> > > > Thanks very much!  for being a search product, the documentation
> isn't
> > > very
> > > > search friendly :)
> > > >
> > > > On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch <
> > > arafa...@gmail.com>
> > > > wrote:
> > > >
> > > > > I think you are looking for:
> > > > >
> > > > >
> > >
> http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
> > > > >
> > > > > Or, as a second option,
> > > > >
> > > > >
> > >
> http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
> > > > >
> > > > > Regards,
> > > > >Alex.
> > > > > On Thu, 25 Oct 2018 at 08:47, David Hastings
> > > > >  wrote:
> > > > > >
> > > > > > Hey all, I was going throught the Solr 7.5 documentation:
> > > > > > http://lucene.apache.org/solr/guide/7_5/index.html
> > > > > >
> > > > > > and it appears to be incomplete.  last week Trey Grainger gave a
> > > > > > presentation about the skg plugin, and said it was now included
> in
> > > the
> > > > > 7.5
> > > > > > distribution.  There are no references to using it on the
> > > documentation,
> > > > > or
> > > > > > anywhere really.   the only thing close is some github
> information,
> > > from
> > > > > 2
> > > > > > years ago.  Is there a reason its not defined in the official
> > > > > documentation?
> > > > > > Thanks,
> > > > > > David
> > > > >
> > >
>


Re: Query to multiple collections

2018-10-25 Thread Atita Arora
Hi,

This was much like a problem I was facing recently.
In my use case I am supposed to be showing (collated) spellcheck suggestions
from two different collections.
To also mention: both these collections use the same schema, but they need
to be segregated because of the nature of the business they serve.

I considered using the aliasing approach too, though I was a little unsure
whether it would work for me.
Weirdly, the standard select URL itself is trouble for me, and I run into
the following exception in my browser:

http://:8983/solr/products.1,products.3/select?q=*:*

{
  "responseHeader": {
"zkConnected": true,
"status": 500,
"QTime": 24,
"params": {
  "q": "*:*"
}
  },
  "error": {
"trace": "java.lang.NullPointerException\n\tat
org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)\n\tat
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)\n\tat
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)\n\tat
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:423)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat
java.lang.Thread.run(Thread.java:748)\n",
"code": 500
  }
}

I would really appreciate it if someone could tell me what might be
happening here.

Thanks,
Atita

On Tue, Oct 23, 2018 at 1:58 AM Rohan Kasat  wrote:

> Thanks Shawn for the update.
> I am going ahead with the standard aliases approach , suits my use case.
>
> Regards,
> Rohan Kasat
>
>
> On Mon, Oct 22, 2018 at 4:49 PM Shawn Heisey  wrote:
>
> > On 10/22/2018 1:26 PM, Chris Ulicny wrote:
> > > There weren't any particular problems we ran into since the client that
> > > makes the queries to multiple collections previously would query
> multiple
> > > cores using the 'shards' parameter before we moved to solrcloud. We
> > didn't
> > > have any complicated sorting or scoring requirements fortunately.
> > >
> > > The one thing I remember looking into was what solr would do when two
> > > documents with the same id were found in both collections. I believe it
> > > just non-deterministic

Re: Index fetch failed. Exception: Server refused connection

2018-10-25 Thread Walter Underwood
A 1 Gb heap is probably too small on the master. Run with 8 Gb like the slaves.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 10:20 PM, Bharat Yadav  wrote:
> 
> Hello Team,
>  
> We are nowadays frequently facing the below issue on our Solr slave nodes,
> related to the SnapPuller class.
>  
> Issue
>
> Master at: http://prd01slr:3/solr/MainSystem2 is not available. Index fetch
> failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem2
>
> slroms1@prd02slr:slroms1/JEE/SolrProduct/logs/SolrDomain_SolrSlaveServer2/SolrSlaveServer2> grep "Index fetch failed" weblogic.20181018_004904.log
> 425511535 [snapPuller-14-thread-1] ERROR org.apache.solr.handler.SnapPuller
> – Master at: http://prd01slr:3/solr/MainSystem2 is not available. Index
> fetch failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem2
> 425511535 [snapPuller-13-thread-1] ERROR org.apache.solr.handler.SnapPuller
> – Master at: http://prd01slr:3/solr/MainSystem1 is not available. Index
> fetch failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem1
> 598311531 [snapPuller-14-thread-1] ERROR org.apache.solr.handler.SnapPuller
> – Master at: http://prd01slr:3/solr/MainSystem2 is not available. Index
> fetch failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem2
>  
> Note –
> MainSystem1 and MainSystem2 are the cores in our account.
> When we face this issue, sometimes we have to bounce our Solr JVMs, and
> sometimes it recovers automatically and no bounce is needed.
>  
> Setup of Solr in the account
>
> - Solr version: [screenshot not included in the archive]
>
> - We are using Solr with a 1-master, 2-slave configuration.
>   a) The master runs with: -Xms1G -Xmx1G -XX:MaxPermSize=256m
>   b) The slaves run with: -Xms8G -Xmx8G -XX:MaxPermSize=1024m
>
> - The Solr ear is deployed on all 3 individual WebLogic instances and it is
> the same across them.
> - Indexing is done on the master, and replication + polling is enabled on
> the slave JVMs to keep them in sync with the master at all times; all
> querying is handled by the Solr slaves only.
> - For polling we have defined a 60-second interval, as highlighted below in
> the slave solr xml. (I am attaching the solr xml configured for slave and
> master for your reference.)
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt</str>
>   </lst>
>   <lst name="slave">
>     <str name="enable">true</str>
>     <str name="masterUrl">http://xx:x/solr/MainSystem2</str>
>     <str name="pollInterval">00:00:60</str>
>   </lst>
> </requestHandler>
>
> - We have GC logging enabled on the JVMs too, but we didn't find anything
> suspicious there. If you need the GC logs, let us know.
>
> Connectivity check
>
> - Slave 1 – [screenshot not included in the archive]
> - Slave 2 – [screenshot not included in the archive]
>
> Statistics about the core
>
> [screenshot not included in the archive]
>
> Thanks and Regards
> Bharat Yadav
> Infra INT
> Amdocs Intelligent Operations, SI&O
> India Mob  +91-987464 (WhatsApp only)
> Chile Mob  +56-998022829



Re: Solr 7.5/skg

2018-10-25 Thread David Hastings
Although another of Trey's examples, the semantic query parser, doesn't seem
to have documentation, unless I'm missing something?

On Thu, Oct 25, 2018 at 10:41 AM David Hastings <
hastings.recurs...@gmail.com> wrote:

> Wow, thanks for that.  Will do some research and come back with the
> inevitable questions I will have.
>
> On Thu, Oct 25, 2018 at 10:37 AM Alexandre Rafalovitch 
> wrote:
>
>> See
>> https://www.slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks
>> , slides 19+
>>
>> But it is not a fully-supported usage, due to
>> https://issues.apache.org/jira/browse/SOLR-12569 .
>>
>> So, at your own risk.
>>
>> Regards,
>>Alex.
>> On Thu, 25 Oct 2018 at 10:32, David Hastings
>>  wrote:
>> >
>> > Another skg question.  the significantTerms
>> > says it queries a solrcloud collection, only, is there any way to have
>> it
>> > work on standalone solr/cores as well?  the MLT function works fine on
>> > standalone, was really hoping this would as well.
>> >
>> >
>> > On Thu, Oct 25, 2018 at 10:25 AM Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > wrote:
>> >
>> > > That's being worked on as well. We've migrated the documentation from
>> > > Confluence to standalone setup, so not all the pieces are in place
>> > > yet.
>> > >
>> > > Regards,
>> > >Alex.
>> > > On Thu, 25 Oct 2018 at 10:12, David Hastings
>> > >  wrote:
>> > > >
>> > > > Thanks very much!  for being a search product, the documentation
>> isn't
>> > > very
>> > > > search friendly :)
>> > > >
>> > > > On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch <
>> > > arafa...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I think you are looking for:
>> > > > >
>> > > > >
>> > >
>> http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
>> > > > >
>> > > > > Or, as a second option,
>> > > > >
>> > > > >
>> > >
>> http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
>> > > > >
>> > > > > Regards,
>> > > > >Alex.
>> > > > > On Thu, 25 Oct 2018 at 08:47, David Hastings
>> > > > >  wrote:
>> > > > > >
>> > > > > > Hey all, I was going throught the Solr 7.5 documentation:
>> > > > > > http://lucene.apache.org/solr/guide/7_5/index.html
>> > > > > >
>> > > > > > and it appears to be incomplete.  last week Trey Grainger gave a
>> > > > > > presentation about the skg plugin, and said it was now included
>> in
>> > > the
>> > > > > 7.5
>> > > > > > distribution.  There are no references to using it on the
>> > > documentation,
>> > > > > or
>> > > > > > anywhere really.   the only thing close is some github
>> information,
>> > > from
>> > > > > 2
>> > > > > > years ago.  Is there a reason its not defined in the official
>> > > > > documentation?
>> > > > > > Thanks,
>> > > > > > David
>> > > > >
>> > >
>>
>


RE: TLOG replica stucks

2018-10-25 Thread Vadim Ivanov
Thanks Erick for your attention!
My comments are below, but since I suspect the problem resides in ZooKeeper,
I'll collect more information from the zk logs and Solr logs and be back soon.

> bq. I've noticed that some replicas stop receiving updates from the
> leader without any visible signs from the cluster status.
> 
> Hmm, yes, this isn't expected at all. What are you seeing that causes
> you to say this? You'd have to be monitoring the log for update
> messages to the replicas that aren't leaders or the like.  If anyone is
> going to have a prayer of reproducing we'll need more info on exactly
> what you're seeing and how you're measuring this.

Meanwhile, I have log level WARN... I'll decrease it to INFO and see. Thanks.

> 
> Have you changed any configurations in your replicas at all? We'd need
> the exact steps you performed if so.
The command to create replicas was like this (implicit sharding and a custom
core name):

mysolr07:8983/solr/admin/collections?action=ADDREPLICA
&collection=rpk94
&shard=rpk94_1_0
&property.name=rpk94_1_0_07
&type=tlog
&node=mysolr07:8983_solr

> 
> On a quick test I didn't see this, but if it were that easy to
> reproduce I'd expect it to have shown up before.

Yesterday I tried to reproduce it by trying to change the leader with the
REBALANCELEADERS command.
It ended up with no leader at all for the shard, and I could not set a
leader at all for a long time.
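(The command was of this form; collection name as above:
admin/collections?action=REBALANCELEADERS&collection=rpk94)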

   There was a problem trying to register as the 
leader:org.apache.solr.common.SolrException: Could not register as the leader 
because creating the ephemeral registration node in ZooKeeper failed
...
   Deleting duplicate registration: 
/collections/rpk94/leader_elect/rpk94_1_117/election/2983181187899523085-core_node73-n_22
...
  Index fetch failed :org.apache.solr.common.SolrException: No registered 
leader was found after waiting for 4000ms , collection: rpk94 slice: rpk94_1_117
...

Even deleting all replicas for the shard and recreating a replica on the same
node with the same name did not help: still no leader for that shard.
I had to delete the collection, wait till morning, and then it was recreated
successfully.
I suppose some weird znodes had been cleaned out of ZooKeeper by morning.

> 
> NOTE: just looking at the cloud graph and having a node be active is
> not _necessarily_ sufficient for the node to be up to date. It
> _should_ be sufficient if (and only if) the node was shut down
> gracefully, but a "kill -9" or similar doesn't give the replicas on
> the node the opportunity to change the state. The "live_nodes" znode
> in ZooKeeper must also contain the node the replica resides on.

Node was live, cluster was healthy

> 
> If you see this state again, you could try pinging the node directly,
> does it respond? Your URL should look something like:
> http://host:port/solr/colection_shard1_replica_t1/query?q=*:*&distrib=false

Yes, sure I did. The ill replica responded, and its number of documents
differed from the leader's.

> 
> The "distrib=false" is important as it won't forward the query to any
> other replica. If what you're reporting is really happening, that node
> should respond with a document count different from other nodes.
> 
> NOTE: there's a delay between the time the leader indexes a doc and
> it's visible on the follower. Are you sure you're waiting for
> leader_commit_interval+polling_interval+autowarm_time before
> concluding that there's a problem? I'm a bit suspicious that checking
> the versions is concluding that your indexes are out of sync when
> really they're just catching up normally. If it's at all possible to
> turn off indexing for a few minutes when this happens and everything
> just gets better then it's not really a problem.

Sure - the problem was on many shards (though not all), and it persisted for a
long time.

> 
> If we prove out that this is really happening as you think, then a
> JIRA (with steps to reproduce) is _definitely_ in order.
> 
> Best,
> Erick
> On Wed, Oct 24, 2018 at 2:07 AM Vadim Ivanov
>  wrote:
> >
> > Hi All !
> >
> > I'm testing Solr 7.5 with TLOG replicas on SolrCloud with 5 nodes.
> >
> > My collection has shards and every shard has 3 TLOG replicas on different
> > nodes.
> >
> > I've noticed that some replicas stop receiving updates from the leader
> > without any visible signs from the cluster status.
> >
> > (all replicas active and green in the Admin UI CLOUD graph). But the indexversion
> > of the 'ill' replica does not increase along with the leader's.
> >
> > It seems dangerous, because that 'ill' replica could become the leader
> > after a restart of the nodes, and I have already experienced data loss.
> >
> > I didn't notice any meaningful records in the Solr log, except that the
> > problem probably occurs when the leader changes.
> >
> > Meanwhile, I monitor the indexversion of all replicas in the cluster via MBeans
> > and recreate ill replicas when the difference from the leader's indexversion is
> > more than one.
> >
> > Any suggestions?
> >
> > --
> >
> > Best regards, Vadim
> >
> >
> >



Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
Probably this one: https://issues.apache.org/jira/browse/SOLR-9418

I am not sure if that's documented yet.

Regards,
   Alex.
On Thu, 25 Oct 2018 at 11:08, David Hastings
 wrote:
>
> Although another of Trey's examples, the semantic query parser, doesn't seem
> to have documentation, unless I'm missing something?
>
> On Thu, Oct 25, 2018 at 10:41 AM David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
> > Wow, thanks for that.  Will do some research and come back with the
> > inevitable questions I will have.
> >
> > On Thu, Oct 25, 2018 at 10:37 AM Alexandre Rafalovitch 
> > wrote:
> >
> >> See
> >> https://www.slideshare.net/arafalov/searching-for-ai-leveraging-solr-for-classic-artificial-intelligence-tasks
> >> , slides 19+
> >>
> >> But it is not a fully-supported usage, due to
> >> https://issues.apache.org/jira/browse/SOLR-12569 .
> >>
> >> So, at your own risk.
> >>
> >> Regards,
> >>Alex.
> >> On Thu, 25 Oct 2018 at 10:32, David Hastings
> >>  wrote:
> >> >
> >> > Another skg question: the significantTerms documentation
> >> > says it queries a SolrCloud collection only. Is there any way to have it
> >> > work on standalone Solr/cores as well? The MLT function works fine on
> >> > standalone; I was really hoping this would as well.
> >> >
> >> >
> >> > On Thu, Oct 25, 2018 at 10:25 AM Alexandre Rafalovitch <
> >> arafa...@gmail.com>
> >> > wrote:
> >> >
> >> > > That's being worked on as well. We've migrated the documentation from
> >> > > Confluence to standalone setup, so not all the pieces are in place
> >> > > yet.
> >> > >
> >> > > Regards,
> >> > >Alex.
> >> > > On Thu, 25 Oct 2018 at 10:12, David Hastings
> >> > >  wrote:
> >> > > >
> >> > > > Thanks very much! For being a search product, the documentation
> >> > > > isn't very search friendly :)
> >> > > >
> >> > > > On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch <
> >> > > arafa...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > I think you are looking for:
> >> > > > >
> >> > > > >
> >> > >
> >> http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs
> >> > > > >
> >> > > > > Or, as a second option,
> >> > > > >
> >> > > > >
> >> > >
> >> http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms
> >> > > > >
> >> > > > > Regards,
> >> > > > >Alex.
> >> > > > > On Thu, 25 Oct 2018 at 08:47, David Hastings
> >> > > > >  wrote:
> >> > > > > >
> >> > > > > > Hey all, I was going through the Solr 7.5 documentation:
> >> > > > > > http://lucene.apache.org/solr/guide/7_5/index.html
> >> > > > > >
> >> > > > > > and it appears to be incomplete. Last week Trey Grainger gave a
> >> > > > > > presentation about the skg plugin, and said it was now included
> >> > > > > > in the 7.5 distribution. There are no references to using it in
> >> > > > > > the documentation, or anywhere really. The only thing close is
> >> > > > > > some GitHub information from 2 years ago. Is there a reason it's
> >> > > > > > not defined in the official documentation?
> >> > > > > > Thanks,
> >> > > > > > David
> >> > > > >
> >> > >
> >>
> >
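
A sketch of the SKG usage the JSON Facet API link above points to - the field
name and queries here are made up, and fore/back are ordinary request
parameters:

{
  "query": "text:dog",
  "limit": 0,
  "params": { "fore": "text:dog", "back": "*:*" },
  "facet": {
    "related_terms": {
      "type": "terms",
      "field": "keywords_s",
      "sort": { "skg": "desc" },
      "facet": { "skg": "relatedness($fore,$back)" }
    }
  }
}

Each keywords_s bucket then carries a relatedness score of the foreground set
(documents matching fore) versus the background set (documents matching back).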


Score relevancy

2018-10-25 Thread Amjad Khan
Hi

Is there a way to achieve the following - 

We have a RANK field in each document, and essentially, I would like my score 
to be influenced by this RANK as follows -

score = score*0.1 + RANK

How can I achieve this with function queries?

Thanks!

Re: Solr 7.5/skg

2018-10-25 Thread David Hastings
Yup, that's the one. Thanks.

On Thu, Oct 25, 2018 at 11:54 AM Alexandre Rafalovitch 
wrote:

> Probably this one: https://issues.apache.org/jira/browse/SOLR-9418
>
> I am not sure if that's documented yet.
>
> Regards,
>Alex.


Re: Reading data using Tika to Solr

2018-10-25 Thread Erick Erickson
Martin:

The mail server is pretty aggressive about stripping attachments, your
png didn't come though. You might also get a more informed answer on
the Tika mailing list.

That said (and remember I can't see your png so this may be a silly
question), how are you executing the program .vs. compiling it? You
mentioned the "build path". I'm usually lazy and just execute it in
IntelliJ for development and have forgotten to set my classpath on
_numerous_ occasions when running it from a command line ;)

Best,
Erick

On Thu, Oct 25, 2018 at 2:55 AM Martin Frank Hansen (MHQ)  wrote:
>
> Hi,
>
>
>
> I am trying to read the content of msg files using Tika and index it in Solr;
> however, I am having some problems with the OfficeParser(). I keep getting a
> java.lang.NoClassDefFoundError for the OfficeParser, even though both
> tika-core and tika-parsers are included in the build path.
>
>
>
>
>
> I am using Java with the following code:
>
> public static void main(final String[] args)
>         throws IOException, SAXException, TikaException {
>     processDocument(pathtofile);
> }
>
> private static void processDocument(String pathfilename) {
>     try {
>         File file = new File(pathfilename);
>         Metadata meta = new Metadata();
>         InputStream input = TikaInputStream.get(file);
>         BodyContentHandler handler = new BodyContentHandler();
>         Parser parser = new OfficeParser();
>         ParseContext context = new ParseContext();
>         parser.parse(input, handler, meta, context);
>
>         String doccontent = handler.toString();
>         System.out.println(doccontent);
>         System.out.println(meta);
>     } catch (Exception e) {
>         e.printStackTrace();
>     }
> }
>
> In the buildpath I have the following dependencies:
>
>
>
>
>
> Any help is appreciated.
>
>
>
> Thanks in advance.
>
>
>
> Best regards,
>
>
>
> Martin Hansen
>
>
>


Re: Score relevancy

2018-10-25 Thread David Hastings
Is this RANK value stored as a float/integer, and what's the range? One idea is
that you could use edismax with a really long boost query:
RANK:[1 TO 2]^10 OR RANK:[3 TO 4]^9
but that isn't actually a great idea and gets sloppy fast. You could apply a
boost at index time, or a function query at query time, both described here:
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents



On Thu, Oct 25, 2018 at 11:58 AM Amjad Khan  wrote:

> Hi
>
> Is there a way to achieve the following -
>
> We have a RANK field in each document, and essentially, I would like my
> score to be influenced by this RANK as follows -
>
> score = score*0.1 + RANK
>
> How can I achieve this with function queries?
>
> Thanks!


Re: Internal Solr communication question

2018-10-25 Thread Fernando Otero
Thanks Emir!
I was already looking at preferLocalShards, but I wasn't sure it would help
with only one shard. I'll give it a try.


On Thu, Oct 25, 2018 at 11:26 AM Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Fernando,
> I did not look at the code, and I am not sure whether there is special handling
> for the case of a single-shard collection, but Solr does not have to choose the
> local shard to query. It assumes that one node may receive all requests and
> that it needs to balance the load. What you can do is add preferLocalShards=true
> to make sure local shards are queried.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 25 Oct 2018, at 16:18, Fernando Otero  wrote:
> >
> > Hey Shawn,
> > Thanks for your answer! I changed the config to 1 shard with 7
> > replicas, but I still see communication between nodes - is that expected?
> > Each node has a full shard, so it should have all the data needed to compute;
> > I don't get why I'm seeing communication between them.
> >
> > Thanks
> >
> > On Tue, Oct 23, 2018 at 2:21 PM Shawn Heisey 
> wrote:
> >
> >> On 10/23/2018 9:31 AM, Fernando Otero wrote:
> >>> Hey all
> >>>  I'm running some tests on Solr cloud (10 nodes, 3 shards, 3
> >> replicas),
> >>> when I run the queries I end up seeing 7x traffic ( requests / minute)
> >> in
> >>> Newrelic.
> >>>
> >>> Could it be that the internal communication between nodes is done
> through
> >>> HTTP and newrelic counts those calls?
> >>
> >> The inter-node communication is indeed done over HTTP, using the same
> >> handlers that clients use, and if you have something watching Solr's
> >> statistics or watching Jetty's counters, one of the counters will go up
> >> when an inter-node request happens.
> >>
> >> With 3 shards, one request coming in will generate as many as six
> >> additional requests -- one request to a replica for each shard, and then
> >> another request to each shard that has matches for the query, to
> >> retrieve the documents that will be in the response. The node that
> >> received the initial request will compile the results from all the
> >> shards and send them back in response to the original request.
> >> Nutshell:  One request from a client expands. With three shards, that
> >> will be four to seven requests total.  If you have 10 shards, it will be
> >> between 11 and 21 total requests.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
> > --
> >
> > Fernando Otero
> >
> > Sr Engineering Manager, Panamera
> >
> > Buenos Aires - Argentina
> >
> > Mobile: +54 911 67697108
> >
> > Email:  fernando.ot...@olx.com
>
>

-- 

Fernando Otero

Sr Engineering Manager, Panamera

Buenos Aires - Argentina

Mobile: +54 911 67697108

Email:  fernando.ot...@olx.com


Re: Score relevancy

2018-10-25 Thread Amjad Khan
We use rank values below 100, and yes, it is a float.

> On Oct 25, 2018, at 1:08 PM, David Hastings  
> wrote:
> 
> Is this RANK value stored as a float/integer, and what's the range? One idea is
> that you could use edismax with a really long boost query:
> RANK:[1 TO 2]^10 OR RANK:[3 TO 4]^9
> but that isn't actually a great idea and gets sloppy fast. You could apply a
> boost at index time, or a function query at query time, both described here:
> https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents


Re: Reading data using Tika to Solr

2018-10-25 Thread Tim Allison
To follow up on Erick's point: there are a bunch of transitive dependencies
from tika-parsers. If you aren't using Maven or a similar build system to
grab the dependencies, it can be tricky to get them right. If you aren't
using Maven, and you can afford the risks of jar hell, consider using
tika-app or, better perhaps, tika-server.

Stay tuned for SOLR-11721...



Re: Score relevancy

2018-10-25 Thread Walter Underwood
Use a bf of 10 * RANK. That will give the same ordering as dividing the score 
by 10 and adding RANK.

There are problems with additive boosts, so I strongly recommend looking at the 
“boost” parameter, which is a multiplicative boost. That is more stable over a 
wide range of score values.
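
As request parameters, the two variants look roughly like this (a sketch; RANK
is assumed to be a plain numeric field):

&defType=edismax&q=dog&bf=product(10,RANK)
&defType=edismax&q=dog&boost=sum(1,RANK)

The bf version adds 10*RANK to the score, which orders results the same way as
score*0.1 + RANK; the boost version is one multiplicative possibility.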

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 25, 2018, at 11:11 AM, Amjad Khan  wrote:
> 
> We use rank values below 100, and yes, it is a float.



Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?

2018-10-25 Thread Jason Gerlowski
One comment to complicate Erick's already-good advice.

> If a doc that needs to go to shard2 is received by a replica on shard1, it
> must be forwarded to the leader of shard2, introducing an extra hop.

Definitely true, but I don't think that's the only factor in the
relative performance of CUSC vs CSC.  CUSC responds asynchronously
when you're using it for updates, which lets users continue on to
prepare the next set of docs while a CloudSolrClient might still be
waiting to hear back from Solr.  I benchmarked this recently and was
surprised to see that ConcurrentUpdateSolrClient actually came out
ahead in some setups.

Now I'm not trying to say that CUSC performs better than CSC, just
that "It Depends" (Erick's TM) on the rest of your ETL code, on the
topology of your SolrCloud cluster, etc.
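
For reference, a minimal SolrJ sketch of the asynchronous client being
discussed (Solr 7.x API; the URL, queue size, and thread count are made-up
values):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CuscSketch {
    public static void main(String[] args) throws Exception {
        ConcurrentUpdateSolrClient client =
            new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycollection")
                .withQueueSize(10000)  // docs buffered client-side before add() blocks
                .withThreadCount(4)    // background threads draining the queue
                .build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        client.add(doc);              // returns quickly; the doc is sent in the background
        client.blockUntilFinished();  // wait for the queue to drain
        client.commit();
        client.close();
    }
}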

Good luck!

Jason



On Wed, Oct 24, 2018 at 6:49 PM shamik  wrote:
>
> Thanks Erick, appreciate your help
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Reading data using Tika to Solr

2018-10-25 Thread Martin Frank Hansen (MHQ)
Hi Erick and Tim,

Thanks for your answers, I can see that my mail got messed up on the way 
through the server. It looked much more readable at my end 😉 The attachment 
simply included my build-path.

@Erick I am compiling the program using Netbeans at the moment.

I updated to Tika 1.7, but that did not help, and I haven't tried Maven yet, but
I will probably have to give that a chance. I just find it a bit odd that I can
see the dependencies included in the jar files I added to the project, yet I
must still be missing something.

My buildpath looks as follows:

tika-parsers-1.4.jar
tika-core-1.4.jar
commons-io-2.5.jar
httpclient-4.5.3.jar
httpcore-4.4.6.jar
httpmime-4.5.3.jar
slf4j-api-1.7.24.jar
jcl-over-slf4j-1.7.24.jar
solr-cell-7.5.0.jar
solr-core-7.5.0.jar
solr-solrj-7.5.0.jar
noggit-0.8.jar




Re: Reading data using Tika to Solr

2018-10-25 Thread Tim Allison
If you're processing actual msg files (not eml), you'll also need poi and
poi-scratchpad and their dependencies - but then those msgs could have
attachments, at which point you may as well just add tika-app. :D
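
If Maven is an option, a single dependency pulls poi, poi-scratchpad, and the
rest in transitively - a sketch (1.19.1 being the current Tika release as of
this thread):

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.19.1</version>
</dependency>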


Fuzzy search expansion problem on 6.6.3

2018-10-25 Thread Ryan Wilson
Hello all,

I am running a Solr 6.6.3 three-shard cloud with one main collection that
contains 587,371,821 rows of data. One of the fields in this collection is
names. We are currently running into an issue with fuzzy searches on name,
where Solr seems unable to return all possible values for a number of
different names, even when querying for only one edit (~1).

I've technically asked this question in the distant past, and the answer I
received at the time was to modify org.apache.lucene.search.FuzzyQuery to
have a larger defaultMaxExpansions value. For disclosure, we also set
defaultTranspositions to false as the customers did not like query results
they were getting with it on. For a time this worked. However, within the
last 6 months or so we've started seeing signs of this issue cropping up
again.

The two things that have changed since the original email are that we've
migrated from 4.7.1 to 6.6.3 and that we have almost doubled the number of
records in the index. Hoping that the old solution would still work, I've
tweaked defaultMaxExpansions as high as 10240, with the requisite change to
maxBooleanClauses to match, and it seems to have had no effect - so much so
that I suspect the change is not taking effect at all. I am in the process of
setting up a much more focused testing environment for just names, but figured
I'd send this out to get some initial advice or suggestions on what I might
have missed or should investigate.
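
For reference, the knobs in question live on
org.apache.lucene.search.FuzzyQuery (defaultMaxExpansions is 50 out of the
box). A sketch of constructing one directly with the values mentioned above -
the field and term are made up:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

// maxEdits=1, prefixLength=0, maxExpansions=10240, transpositions=false
FuzzyQuery q = new FuzzyQuery(new Term("name", "smith"), 1, 0, 10240, false);

Because the fuzzy rewrite keeps only the top maxExpansions matching terms,
rare variants can still be dropped on a very large index.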

I've reviewed patch notes for versions before and after 6.6.3 to check for
breaking changes from 4.7.1 or fixes in future versions and haven't seen
anything.

Thanks,
Ryan Wilson


Solr Cell Input Parameter tika.config

2018-10-25 Thread Robertson, Eric J
Hello all,

I am currently trying to define a Tika config to use when posting a PDF to Solr
Cell, as we may want to override the default Tika configuration depending on
the type of document being ingested.

The docs list tika.config as an input parameter to the Solr Cell endpoint,
though in my tests it does not seem to be working or to be acknowledged at
all.

Does anyone have working example using this input parameter?

I am running solr 7.4.0 on Windows 7.

Thanks!


Edismax query returning the same number of results using AND as it does with OR

2018-10-25 Thread Nicky Mastin

Oddity with edismax and queries involving boolean operators.  Here's the 
"parsedquery_toString" from two different queries:
input:  "dog AND kiwi":
https://apaste.info/gaQl
input:  "dog OR kiwi":
https://apaste.info/sBwa
Both queries return the same number of results (389).  The query with OR was 
expected to have a much higher numFound.  Those pastes have a one week 
lifetime.
The two parsed queries are almost identical.  The AND query has a couple of 
extra plus signs compared to the OR query, and the OR query has a ~2 after a 
right paren that the AND query doesn't have.  I'm at a loss as to what this 
all means, except to say that it didn't turn out as expected.
Should the two queries have returned different numbers of results?  If not, 
why is that the case?
Here is the output from echoParams=all on the OR query:
true
text
true
LINE
enum
3
 0.4
5

title^100 kw1ranked^100 kw1^100 keywordsranked_bm25_no_norms^50 
keywords_bm25_no_norms^50 authors text description species


before
after

subdocuments,keywords,authors
3<-1 6<-3 9<30%
true
html
on


max(recip(ms(NOW/DAY+1YEAR,dateint),3.16E-11,10,6),documentdatefix)

rank

true
1000
breakIterator
true
year
2015
spell_file
true
all

id,title,description,url,objecttypeid,contexturl,defaultsourceid,sourceid,score

false
100
5
5



{!ex=dt key="Last10yr"}dateint:[NOW/YEAR-10YEARS TO *]


{!ex=dt key="Last5yr"}dateint:[NOW/YEAR-5YEARS TO *]


{!ex=dt key="Last3yr"}dateint:[NOW/YEAR-3YEARS TO *]


{!ex=dt key="Last1yr"}dateint:[NOW/YEAR-1YEAR TO *]


edismax
false
enum
xml
true
*:*

folderid
sourceid
speciesid
admin

enum
map
0
true
25
2
true
dog OR kiwi
1970


title~20^5000 keywordsranked_bm25_no_norms~20^5000 kw1ranked~10^5000 
keywords_bm25_no_norms~20^1500 kw1~10^500 authors^250 text~20^1000 
text~100^500 description^1

1
unified
10

title~22^1000 keywordsranked_bm25_no_norms~22^1000 
keywords_bm25_no_norms~12^500 kw1ranked~12^100 kw1~12^100 text~22^100

authors~11 species~11
on
If anyone has any ideas about whether this behavior is expected or 
unexpected, I'd appreciate hearing them.  It is Solr 7.1.0 with a patch for 
SOLR-12243 applied.
There might be information that would be helpful that isn't provided.  If 
there is something else needed, please let me know, so I can provide it.



Re: Internal Solr communication question

2018-10-25 Thread Erick Erickson
preferLocalShards is a bit of a misnomer. I usually think of it as
"don't go to another Solr node if possible".


Re: Solr Cell Input Parameter tika.config

2018-10-25 Thread Yasufumi Mizoguchi
Hello,

I could not find any code that parses a tika.config parameter from the Solr
request.
Maybe the tika.config parameter can only be defined in solrconfig.xml, as
in the following:


<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <str name="tika.config">tika-config.xml</str>
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

Thanks,
Yasufumi
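
For completeness, a typical invocation of that handler looks like this (the
collection and file names are examples only):

curl "http://localhost:8983/solr/mycollection/update/extract?literal.id=doc1&commit=true" -F "myfile=@test.pdf"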



Re: Edismax query returning the same number of results using AND as it does with OR

2018-10-25 Thread Zheng Lin Edwin Yeo
Hi,

What is your full query path or URL that you pass for the query?
And how is your setting like for the edismax in your solrconfig.xml?

Regards,
Edwin



A different result with filters

2018-10-25 Thread Владислав Властовский
Hi, I am using Solr 7.5.0.

Why do I get two different results for similar requests?

First query and response:
{
  "query": "*:*",
  "limit": 0,
  "filter": [
"{!parent which=kind_s:edition}condition_s:0",
"{!parent which=kind_s:edition}price_i:[* TO 75]"
  ]
}

{
  "response": {
"numFound": 453,
"start": 0,
"docs": []
  }
}

And the second query and response:
{
  "query": "*:*",
  "limit": 0,
  "filter": [
"{!parent which=kind_s:edition}condition_s:0 AND price_i:[* TO 75]"
  ]
}

{
  "response": {
"numFound": 452,
"start": 0,
"docs": []
  }
}