"dismax" parameter "bq" filters instead of boosting

2019-03-05 Thread Nicolas Franck
I noticed a change in the behaviour of the regular "dismax" parser.
At least in version 7.4:

when you add "bq", it filters the results (like "fq" does), instead of boosting 
the matches.


e.g.

defType=dismax
bq=format:periodical^30

gives only records with format "periodical".
removing the parameter "bq" returns all records

It does work when defType is set to "edismax".

Any idea?

Re: Highlighting the search keywords

2018-07-31 Thread Nicolas Franck
Nope, that is how it works: highlighting is not done in place. Solr returns the highlighted snippets in a separate section of the response, not inside the documents themselves.

> On 31 Jul 2018, at 21:57, Renuka Srishti  wrote:
> 
> Hi All,
> 
> I was using highlighting in Solr; Solr gives highlighting results within
> the response but not included within the documents.
> Am I missing something? Can I configure it so that it shows the highlighted
> keywords matched within the documents?
> 
> Thanks
> Renuka Srishti



use highlighting on multivalued fields with positionIncrementGap 0

2020-02-14 Thread Nicolas Franck
I'm trying to use highlighting on a multivalued text field (analysis not so 
important) ..


  { text: [ "hello", "world" ], id: 1 }

but I want to match across the string boundaries:

  q=text:"hello world"

This works by setting the attribute
positionIncrementGap to 0, but then the highlighting entry is empty

  "highlighting": { "1" : { "text" : [] } }

Parameters are:

  hl=true
  hl.fl=text
  hl.snippets=50
  hl.fragsize=1

Any idea why this happens? 
I guess this gap is internal stuff handled by Lucene that Solr doesn't know 
about?
(as far as Lucene is concerned, there are no multivalued fields!)
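For intuition: positionIncrementGap controls how far apart Lucene places the tokens of consecutive values in a multivalued field. A toy sketch (not actual Lucene code) of the position assignment:

```python
def token_positions(values, gap):
    """Assign token positions the way Lucene does for a multivalued field:
    each new value starts at the previous last position + 1 + gap."""
    pos, out = 0, []
    for value in values:
        for tok in value.split():
            out.append((tok, pos))
            pos += 1
        pos += gap  # positionIncrementGap inserted between values
    return out

# With a default-style gap of 100, "hello" and "world" end up 101 apart,
# so the phrase query "hello world" cannot match across the boundary.
print(token_positions(["hello", "world"], gap=100))  # [('hello', 0), ('world', 101)]
# With gap 0 they are adjacent, and the phrase matches.
print(token_positions(["hello", "world"], gap=0))    # [('hello', 0), ('world', 1)]
```

This also hints at why the highlighter may struggle: the phrase match presumably spans two stored values, so no single value contains a complete fragment to highlight.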



Re: Lemmatizer for Solr

2020-02-14 Thread Nicolas Franck
Try also looking at the HunspellFilter:

https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html

dictionaries ( .dic and .aff ) can be found here:

https://cgit.freedesktop.org/libreoffice/dictionaries

or via the git repo:

https://anongit.freedesktop.org/git/libreoffice/dictionaries.git

It is actually a spelling tool that works by applying rules (from the affix file)
to each individual token until it finds a word in the dictionary.
And luckily there are a lot of dictionaries available (from LibreOffice).

The OpenNLP lemmatizer looked promising, but - as with Hunspell - the quality
depends on the dictionary, and I could not find any dictionaries
beyond the English ones (does anyone know of others?):

http://opennlp.sourceforge.net/models-1.5/
https://github.com/richardwilly98/elasticsearch-opennlp-auto-tagging/tree/master/src/main/resources/models

I guess that was the only thing you were looking for?
I would use this one, if it weren't for the lack of other dictionaries,
as it does a thorough inspection of the semantic context before
trying to match any word (Hunspell determines this without
knowing any context, due to the way it is called).

On 14 Feb 2020, at 21:21, Shamik Bandopadhyay wrote:

Hi,
 I'm trying to replace the Porter stemmer with an English lemmatizer in my
analysis chain. Just wondering what
is the recommended way of achieving this. I've come across a few different
implementations, which are listed below:

Open NLP -->
https://lucene.apache.org/solr/guide/7_5/language-analysis.html#opennlp-lemmatizer-filter

https://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.lemmatizer

KStem Filter -->
https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#kstem-filter

There are a couple of third-party libraries, but I'm not sure if they are being
maintained or compatible with the Solr version I'm using (7.5).

https://github.com/nicholasding/solr-lemmatizer
https://github.com/bejean/solr-lemmatizer

Currently, I'm looking for English-only lemmatization. Also, I need to have
the ability to update the lemma dictionary to add custom terms specific to
our organization (not sure if the KStem filter can do that).

Any pointers will be appreciated.

Regards,
Shamik



Re: A question about solr filter cache

2020-02-17 Thread Nicolas Franck
If a 1GB filter cache entry could make Solr go out of memory,
it would already have happened during the initial upload of the
Solr documents. Imagine the amount of memory you need for one billion 
documents..
A filter cache would be the least of your problems. 1GB is small in comparison
to the entire Solr index.

> On 17 Feb 2020, at 10:13, Hongxu Ma  wrote:
> 
> Hi
> I want to know the internal of solr filter cache, especially its memory usage.
> 
> I googled some pages:
> https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html 
> (Erick Erickson's answer)
> 
> All of them said its structure is: fq => a bitmap (total doc number bits), 
> but I think it's not so simple, reason:
> Given total doc number is 1 billion, each filter cache entry will use nearly 
> 1GB (10^9/8 bit); that's too big and would very easily make Solr OOM (but I have a 
> 1 billion doc cluster, and it looks like it works well)
> 
> And I also checked solr node, but cannot find the details (only saw using 
> DocSets structure)
> 
> So far, I guess:
> 
>  *   degenerate into a doc id array/list when the bitmap is sparse
>  *   using some compressed bitmap, e.g. roaring bitmaps
> 
> which one is correct? or another answer, thanks you very much!
> 
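As a side note, the quoted arithmetic can be checked: a dense bitmap over 10^9 documents costs 10^9 bits ≈ 125 MB per entry (not 1 GB), and sparse results are cheaper still when stored as id lists, which is why a representation switch (or a compressed bitmap such as roaring) makes sense. A quick sketch, plain arithmetic only, not Solr's actual DocSet implementation:

```python
# Back-of-the-envelope memory cost of caching one filter over a
# 1-billion-document index (plain arithmetic, not Solr's DocSet internals).

TOTAL_DOCS = 1_000_000_000

# Dense representation: one bit per document in the index.
dense_bytes = TOTAL_DOCS // 8          # 125,000,000 bytes, about 119 MiB
print(f"dense bitmap: {dense_bytes / 2**20:.0f} MiB")

# Sparse representation: store only matching doc ids, e.g. 4 bytes each.
matching_docs = 50_000
sparse_bytes = matching_docs * 4       # 200,000 bytes

# A cache can pick whichever representation is smaller per entry.
best = min(dense_bytes, sparse_bytes)
print(f"sparse id list for {matching_docs:,} hits: {sparse_bytes:,} bytes")
```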



Re: Should I index the field that use in fq field?

2020-03-13 Thread Nicolas Franck
Yes,

every field you query has to be "indexed"

every field you need to be returned in the response has to be "stored"

the parameter "fl" can only return fields that are "stored". Other fields
given are simply ignored.



> On 13 Mar 2020, at 13:15, GTHell  wrote:
> 
> I'm doing a lot of filter query in fq. My search is something like
> 'q=*:*&fq=..function on a few fields..' . Do I need to only index those
> field and use FL to get other result or do I need to index everything?
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: How do *you* restrict access to Solr?

2020-03-16 Thread Nicolas Franck
iptables seems like the way to go, at least for me.
Even if the Basic Authentication plugin works, you'll still have to
deal with denial-of-service attacks (although these can
also happen indirectly, by hitting the website that uses Solr).

> On 16 Mar 2020, at 15:44, Ryan W  wrote:
> 
> How do you, personally, do it?  Do you use IPTables?  Basic Authentication
> Plugin? Something else?
> 
> I'm asking in part so I'll have something to search for.  I don't know where
> I should begin, so I figured I would ask how others do it.
> 
> I haven't been able to find anything that works, so if you can tell me what
> works for you, I can at least narrow it down a bit and do some Google
> searches.  Do I need to learn Solr's plugin system?  Am I starting in the
> right place if I follow this document:
> https://lucene.apache.org/solr/guide/7_0/rule-based-authorization-plugin.html#rule-based-authorization-plugin
> 
> Initially, the above document seems far too comprehensive for my needs.  I
> just want to block access to the Solr admin UI, and the list of predefined
> permissions in that document don't seem to be relevant.  Also, it seems
> unlikely this plugin system is necessary just to control access to the
> admin UI... or maybe it necessary?
> 
> In any case, what is your approach?
> 
> I'm using version 7.7.2 of Solr.
> 
> Thanks!



Re: Limit Solr Disk IO

2020-06-04 Thread Nicolas Franck
The real questions are: 

* how often do you commit (either explicitly or automatically)?
* how many segments do you allow? If you only allow 1 segment,
  then that whole segment is recreated using the old documents and the updates.
  And yes, that requires reading the old segment.
  It is common to allow multiple segments when you update often,
  so updating does not interfere with reading the index too often.


> On 4 Jun 2020, at 14:08, Anshuman Singh  wrote:
> 
> I noticed that while indexing, when commit happens, there is high disk read
> by Solr. The problem is that it is impacting search performance when the
> index is loaded from the disk with respect to the query, as the disk read
> speed is not quite good and the whole index is not cached in RAM.
> 
> When no searching is performed, I noticed that disk is usually read during
> commit operations and sometimes even without commit at low rate. I guess it
> is read due to segment merge operations. Can it be something else?
> If it is merging, can we limit disk IO during merging?



Re: Questions about Solr Search

2020-07-04 Thread Nicolas Franck
Short answer: no

Neither Solr nor Elasticsearch has such capabilities out of the box.

Solr does have a plugin infrastructure that enables you to provide
better tokenization based on language rules, and some are better
than others.

I saw for example an integration of OpenNLP here: 
https://lucene.apache.org/solr/guide/7_3/language-analysis.html
but that requires additional data to provide language rules,
and not a lot of languages have been covered 
(http://opennlp.sourceforge.net/models-1.5/).

There are of course more general solutions like the one provided
in the first link, but the tokenization is a bit "rough" and aggressive (that 
is why it is so fast),
delivering tokens that do not exist (debugging shows you that).

Anyway: do not overdo it. There is no way
you can beat a large company like Google.
Language support is more than a set of rules to apply.

More important is this: make sure Google can decently index
your webpages. More than 90% of your users reach your website through Google,
so do not invest too much into this.


On 2 Jul 2020, at 16:19, Gautam K wrote:

Dear Team,

Hope you all are doing well.

Can you please help with the following question? We are using Solr search
in our Organisation and now checking whether Solr provides search
capabilities like Google Enterprise search(Google Knowledge Graph Search).

1, Does Solr Search provide Voice Search like Google?
2. Does Solr Search provide NLP Search (Natural Language Processing)?
3. Does Solr have all the capabilities which Google Knowledge Graph
provides like below?


  - Getting a ranked list of the most notable entities that match certain
  criteria.
  - Predictively completing entities in a search box.
  - Annotating/organizing content using the Knowledge Graph entities.


*Your help will be appreciated highly.*

Many thanks
Gautam Kanaujia
India



Re: "dismax" parameter "bq" filters instead of boosting

2019-04-16 Thread Nicolas Franck
any update on this?

> On 5 Mar 2019, at 09:06, Nicolas Franck  wrote:
> 
> I noticed a change in the behaviour of the regular "dismax" parser.
> At least in version 7.4:
> 
> when you add "bq", it filters the results (like "fq" does), instead of 
> boosting the matches.
> 
> 
> e.g.
> 
> defType=dismax
> bq=format:periodical^30
> 
> gives only records with format "periodical".
> removing the parameter "bq" returns all records
> 
> It does work when defType is set to "edismax".
> 
> Any idea?



Re: "dismax" parameter "bq" filters instead of boosting

2019-04-16 Thread Nicolas Franck
I agree, but I thought my thread was lost in the long list of issues.

I prepared a simple case for solr 8.0:

  basic_dismax_set/config:

 schema.xml and solrconfig.xml

  basic_dismax_set/data:

 records_pp.json

 Total 6 records:

http://localhost:8983/solr/test/select?echoParams=all

 5 records match format:book

http://localhost:8983/solr/test/select?echoParams=all&q=format:book&defType=lucene

and 1 format:film

http://localhost:8983/solr/test/select?echoParams=all&q=format:film&defType=lucene

But when I try this (defType is dismax) ..:

http://localhost:8983/solr/test/select?echoParams=all&bq=format:book^2

the result list is filtered on format:book (total of 5 records)

This url gives the same result by the way:

http://localhost:8983/solr/test/select?echoParams=all&fq=format:book^2

while the character ^ isn't supposed to work in fq, right?

The same result in both Solr 7.4.0 and Solr 8.0

Thanks in advance



Re: "dismax" parameter "bq" filters instead of boosting

2019-04-16 Thread Nicolas Franck
Ok, thanks for your investigation ;-) That was quick.

So you consider this as a bug, as it was fixed for edismax parser?

I thought the parameter q.op only applied to the terms in the main
query (parameter "q"), making ..

  jakarta apache

to be interpreted as

  +jakarta +apache

when q.op = AND

The documentation of bq at least describes it as an "optional" query that only
influences the score, not the result list.


> On 16 Apr 2019, at 23:59, Alexandre Rafalovitch  wrote:
> 
> If you set q.op=OR (and not 'AND' as you defined in your config), you
> will see the difference between your last two queries. The second last
> one will show 6 items and the last one still 5.
> 
> As is, with your custom config, the boost query is added as one more
> clause in the search. q.op=AND forces it to be a compulsory clause,
> rather than an optional (boosting) one.
> 
> FQ is always a forced compulsory clause. Maybe it accepts boosts, but
> all scores are ignored anyway (it is just 0 for fail and anything else
> for pass).
> 
> Adding 'debug=all' into the query parameters (or defaults) would help
> you see that for yourself.
> 
> But it does seem (in the 7.2.1 I have here) that edismax wraps
> both query parts in individual brackets. Maybe there was a bug that
> was fixed in eDismax only. No idea there, except that most of the
> effort goes into eDismax these days rather than dismax.
> 
> Regards,
>   Alex
> P.s. My suggestion was actually to give the queries against STOCK
> examples. That would have made all these parameters explicit and more
> obvious. And perhaps would have allowed you to discover the minimum
> parameter set causing the issue without all those other qf and pf in
> the game.
> 
> On Tue, 16 Apr 2019 at 16:13, Nicolas Franck  wrote:
>> 
>> I agree, but I thought my thread was lost in the long list of issues.
>> 
>> I prepared a simple case for solr 8.0:
>> 
>>  basic_dismax_set/config:
>> 
>> schema.xml and solrconfig.xml
>> 
>>  basic_dismax_set/data:
>> 
>> records_pp.json
>> 
>> Total 6 records:
>> 
>> http://localhost:8983/solr/test/select?echoParams=all
>> 
>> 5 records match format:book
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&q=format:book&defType=lucene
>> 
>> and 1 format:film
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&q=format:film&defType=lucene
>> 
>> But when I try this (defType is dismax) ..:
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&bq=format:book^2
>> 
>> the result list is filtered on format:book (total of 5 records)
>> 
>> This url gives the same result by the way:
>> 
>> http://localhost:8983/solr/test/select?echoParams=all&fq=format:book^2
>> 
>> while the character ^ isn't supposed to work in fq, right?
>> 
>> The same result in both Solr 7.4.0 and Solr 8.0
>> 
>> Thanks in advance
>> 



Re: local params only with defType=lucene?

2019-04-17 Thread Nicolas Franck
Yup

Changes in Solr 7.2: local parameters are only parsed when defType is either 
"lucene" or "func".

cf. https://lucene.apache.org/solr/guide/7_3/solr-upgrade-notes.html#solr-7-2
cf. https://issues.apache.org/jira/browse/SOLR-11501


On 17 Apr 2019, at 10:35, Michael Aleythe, Sternwald wrote:

Hi everybody,

is it correct that local parameters ( q={!edismax qf=MEDIA_ID v=283813390} ) in 
solr only work with the lucene query parser defined for the main query? I tried 
with dismax/edismax but it did not work. The documentation is not clear on this 
point.

Best regards
Michael Aleythe



Re: Problem while indexing DATE field in SOLR.

2019-04-26 Thread Nicolas Franck
Dates need to be sent in UTC format:

YYYY-MM-DDThh:mm:ssZ

or if you want fractional seconds too:

YYYY-MM-DDThh:mm:ss.sssZ

See 
https://lucene.apache.org/solr/guide/6_6/working-with-dates.html#WorkingwithDates-DateFormatting

There is no automatic conversion for dates
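A sketch of the conversion the client (or import pipeline) has to do, using the timestamp from the error message below; it assumes the PostgreSQL values are already in UTC:

```python
from datetime import datetime

def to_solr_date(pg_timestamp: str) -> str:
    """Convert a PostgreSQL-style timestamp such as '2017-12-28 18:50:04.6'
    into the ISO-8601 UTC form that Solr date fields expect."""
    # %f accepts 1 to 6 fractional digits, so '.6' parses as 600000 microseconds
    dt = datetime.strptime(pg_timestamp, "%Y-%m-%d %H:%M:%S.%f")
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"

print(to_solr_date("2017-12-28 18:50:04.6"))  # 2017-12-28T18:50:04.600Z
```

If you import via the DataImportHandler, its DateFormatTransformer can do similar parsing during the import instead.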

On 26 Apr 2019, at 09:50, Neha wrote:


Dear SOLR Team,

I am using SOLR 6.6.0 for indexing data stored in a POSTGRESQL 
database. I am facing an issue and need your help.

Below is the snapshot of the table i am trying to index: -



Steps followed for indexing DATETIMELOG field in above table: -

1) First I created a field of type "text" and indexed DATETIMELOG in it. All goes 
OK and search is possible, though it is treated as text, which is natural.

2) Changed the field type to "date" and re-indexed the whole database again without 
restarting SOLR. All goes OK, search is possible, and in the SOLR browse 
interface the date is shown in UTC format, which is fine.






After this I restarted SOLR and tried to re-index again, but now I get the below 
warning and SOLR documents are not getting created.

org.apache.solr.common.SolrException: Invalid Date String:'2017-12-28 
18:50:04.6'

I tried searching for this problem on internet but not able to find any 
solution.

I request you to please help me with this (maybe some more configuration is 
required which I am not aware of) and let me know in case any other information 
is required from my side.


Thanks and Regards

Neha Gupta



Re: Does Solr support retrieve a string text and get its filename accordingly?

2019-05-23 Thread Nicolas Franck
In that case you'll have to duplicate that field:

id: $name_of_file
id_t: $name_of_file

The first field should be marked as "string", and set to be the key field.
Id fields cannot be tokenized.

The second field is a derivative (you can just copy the contents, or use 
copyField),
and should be set to a field type that does tokenization. In this case you'll
need a field type that uses n-grams:

https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-N-GramTokenizer

otherwise you'll end up using wildcard queries ( id_t:my* ) that do not 
perform very well.
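To see why n-grams help, here is a toy sketch of what an n-gram tokenizer emits (illustrative only; in Solr the NGramTokenizer does this at index time, with configurable minGramSize/maxGramSize):

```python
def ngrams(text: str, min_size: int = 3, max_size: int = 5):
    """Emit every substring of length min_size..max_size, roughly what an
    n-gram tokenizer produces for a field value at index time."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(min_size, max_size + 1)
            for i in range(len(text) - n + 1)]

# The filename's grams include "2019", so a substring search becomes a
# plain (fast) term lookup instead of a wildcard scan.
print("2019" in ngrams("Report_2019.pdf"))  # True
```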

On 23 May 2019, at 09:39, Mohomed Rimash wrote:

yes, in that case your file name should be the key field of each document you
add to Solr

On Thu, 23 May 2019 at 12:32, luckydog xf wrote:

Thanks  guys.

*Don't mean to be a bother*, just want to confirm: I know it's doable to
search keywords, but what I want is the *filename(s)* that contain the
string. The answer is still a yes?

Thanks again.

On Thu, May 23, 2019 at 2:20 PM Jörn Franke wrote:

You can do much more than grep. I recommend getting a book on Solr and
reading
through it. Then you get the full context and you can see if it is useful
for you.

On 23.05.2019 at 07:44, luckydog xf wrote:

Hi, list,

   A quick question: we have tons of Microsoft docx/PDF files (some PDFs
are scanned copies), and we want to load them into Apache Solr and search
a few keywords contained in the files and return the filenames accordingly.

 # it's the same thing as `grep -r KEYWORD /PATH/XXX` in Linux system.

 Is it doable ?

 Thanks,





Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Nicolas Franck
In that case, hard optimisation like that is out of the question.
Resort to automatic merge policies, specifying a maximum
number of segments. Solr is created with multiple segments
in mind. Hard optimisation seems not worth the trouble.

The problem is this: the fewer segments you specify during
an optimisation, the longer it will take, because it has to read
all of the segments to be merged, and redo the sorting. And a cluster
has a lot of housekeeping on top of it.

If you really want to issue an optimisation, then you can
also do it in steps (the max segments parameter)

10 -> 9 -> 8 -> 7 .. -> 1

that way fewer segments need to be merged in one go.

Testing your index will show you what a good maximum
number of segments is for your index.
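The stepwise approach above can be sketched as a loop over `optimize=true&maxSegments=N` update requests (`maxSegments` is the standard Solr optimize parameter; the host and core name `mycore` are placeholders):

```python
from urllib.parse import urlencode

def stepwise_optimize_urls(base="http://localhost:8983/solr/mycore/update",
                           start=10, end=1):
    """Build the sequence of optimize requests for a stepwise merge-down
    (10 -> 9 -> ... -> 1) without sending them."""
    urls = []
    for max_segments in range(start, end - 1, -1):
        params = urlencode({"optimize": "true", "maxSegments": max_segments})
        urls.append(f"{base}?{params}")
    return urls

for url in stepwise_optimize_urls():
    # Issue each request with your HTTP client of choice and wait for
    # the optimize to finish before sending the next one.
    print(url)
```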

> On 7 Jun 2019, at 07:27, jena  wrote:
> 
> Hello guys,
> 
> We have 4 solr (version 4.4) instances in our production environment, which are
> linked/associated with zookeeper for replication. We do heavy delete & add
> operations. We have around 26 million records and the index size is around
> 70GB. We serve 100k+ requests per day.
> 
> 
> Because of heavy indexing & deletion, we optimise the solr instances every day;
> because of that our solr cloud is getting unstable, every solr instance goes into
> recovery mode & our search is getting affected & very slow because of that.
> Optimisation takes around 1hr 30 minutes. 
> We are not able to fix this issue, please help.
> 
> Thanks & Regards
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)

2019-08-05 Thread Nicolas Franck
If the ping request handler is taking too long,
and the server is not recovering automatically,
there is not much you can do automatically on that server.
You have to intervene manually, and restart Solr on that node.

First of all: the ping is just an internal check. If it takes too long
to respond, the requester (i.e. the script calling it) should stop
the request, and mark that node as problematic. If there are,
for example, memory problems, every subsequent request will only worsen
the problem, and Solr cannot recover from that.

> On 5 Aug 2019, at 06:15, dinesh naik  wrote:
> 
> Thanks john,Erick and Furknan.
> 
> I have already defined the ping request handler in solrconfig.xml as below:
> <lst name="invariants"><str name="qt">/select</str><str name="q">_root_:abc</str></lst>
> 
> My question is regarding the custom query being used. Here i am querying
> for field _root_ which is available in all of my cluster and defined as a
> string field. The result for _root_:abc might not get me any match as
> well(i am ok with not finding any matches, the query should not be taking
> 10-15 seconds for getting the response).
> 
> If the response comes within 1 second , then the core recovery issue is
> solved, hence need your suggestion if using _root_ field in custom query is
> fine?
> 
> 
> On Mon, Aug 5, 2019 at 2:49 AM Furkan KAMACI  wrote:
> 
>> Hi,
>> 
>> You can change invariants i.e. *qt* and *q* of a *PingRequestHandler*:
>> 
>> <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>>   <lst name="invariants">
>>     <str name="qt">/search</str>
>>     <str name="q">some test query</str>
>>   </lst>
>> </requestHandler>
>> 
>> Check documentation fore more info:
>> 
>> https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/handler/PingRequestHandler.html
>> 
>> Kind Regards,
>> Furkan KAMACI
>> 
>> On Sat, Aug 3, 2019 at 4:17 PM Erick Erickson 
>> wrote:
>> 
>>> You can also (I think) explicitly define the ping request handler in
>>> solrconfig.xml to do something else.
>>> 
 On Aug 2, 2019, at 9:50 AM, Jörn Franke  wrote:
 
 Not sure if this is possible, but why not create a query handler in
>> Solr
>>> with any custom query and you use that as ping replacement ?
 
> On 02.08.2019 at 15:48, dinesh naik wrote:
> 
> Hi all,
> I have few clusters with huge data set and whenever a node goes down
>> its
> not able to recover due to below reasons:
> 
> 1. ping request handler is taking more than 10-15 seconds to respond.
>>> The
> ping requesthandler however, expects it will return in less than 1
>>> second
> and fails a requestrecovery if it is not responded to in this time.
> Therefore recoveries never would start.
> 
> 2. soft commit is very low ie. 5 sec. This is a business requirement
>> so
> not much can be done here.
> 
> As the standard/default admin/ping request handler is using *:*
>> queries
>>> ,
> the response time is much higher, and i am looking for an option to
>>> change
> the same so that the ping handler returns the results within few
> miliseconds.
> 
> here is an example for standard query time:
> 
> snip---
> curl "
> 
>>> 
>> http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing
> "
> {
> "responseHeader":{
>  "zkConnected":true,
>  "status":0,
>  "QTime":16620,
>  "params":{
>"q":"*:*",
>"distrib":"false",
>"debug":"timing",
>"indent":"on",
>"rows":"0",
>"wt":"json"}},
> "response":{"numFound":1329638799,"start":0,"docs":[]
> },
> "debug":{
>  "timing":{
>"time":16620.0,
>"prepare":{
>  "time":0.0,
>  "query":{
>"time":0.0},
>  "facet":{
>"time":0.0},
>  "facet_module":{
>"time":0.0},
>  "mlt":{
>"time":0.0},
>  "highlight":{
>"time":0.0},
>  "stats":{
>"time":0.0},
>  "expand":{
>"time":0.0},
>  "terms":{
>"time":0.0},
>  "block-expensive-queries":{
>"time":0.0},
>  "slow-query-logger":{
>"time":0.0},
>  "debug":{
>"time":0.0}},
>"process":{
>  "time":16619.0,
>  "query":{
>"time":16619.0},
>  "facet":{
>"time":0.0},
>  "facet_module":{
>"time":0.0},
>  "mlt":{
>"time":0.0},
>  "highlight":{
>"time":0.0},
>  "stats":{
>"time":0.0},
>  "expand":{
>"time":0.0},
>  "terms":{
>"time":0.0},
>  "block-expensive-queries":{
>"time":0.0},
>  "slow-query-logger":{
>"time":0.0},
>  "debug":{
>"time":0.0}
> 
> 
> snap
> 
> can we use query: _root_:abc in the ping request handler ? Tried this
>>> query
> and its returning the results within few miliseconds and also the
>> nodes
>>> are
> able to recover with

Re: Searches across Cores

2019-08-09 Thread Nicolas Franck
He's right. The parameter "shards" has been available for a very long time,
even before the whole SolrCloud concept existed.

e.g. http://localhost:8983/solr/core0/select

with parameters:

  shards = 
localhost:8983/solr/core0,example.com:8983/solr/core0
  q = *:*
  defType = lucene

Yes, I used the same core name twice (in the path and in the parameter), but I
do not see another way.
You need to start the query at a query handler.

I guess your data is generated by several parties,
each on their own core? That makes sense.
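For illustration, the request above could be built client-side like this (the hosts and core names are the placeholders from the example; `urlencode` percent-encodes the shard list, which Solr decodes on arrival):

```python
from urllib.parse import urlencode

# Federate one query over two cores with the "shards" parameter.
# The request is addressed to one core, which fans out to every shard listed.
params = urlencode({
    "shards": "localhost:8983/solr/core0,example.com:8983/solr/core0",
    "q": "*:*",
    "defType": "lucene",
})
url = "http://localhost:8983/solr/core0/select?" + params
print(url)
```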



On 9 Aug 2019, at 19:21, Vadim Ivanov wrote:


May be consider having one collection with implicit sharding ?
This way you can have all the advantages of solrcloud and can control the content of 
each core "manually" as well as query them independently (&distrib=false)
... or some of them using &shards=core1,core2 as was proposed before
Quote from doc
" If you created the collection and defined the "implicit" router at the time 
of creation, you can additionally define a router.field parameter to use a 
field from each document to identify a shard where the document belongs. If the 
field specified is missing in the document, however, the document will be 
rejected. You could also use the _route_ parameter to name a specific shard."
--
Vadim


-Original Message-
From: Komal Motwani [mailto:motwani.ko...@gmail.com]
Sent: Friday, August 09, 2019 7:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Searches across Cores

For some good reasons, SolrCloud is not an option for me.
I need to run nested graph queries so firing parallel queries and taking
union/intersection won't work.
I am aware of achieving this via shards however I am looking for ways to
achieve this via multiple cores. We already have data existing in multiple
cores on which i need to add this feature.

Thanks,
Komal Motwani

On Fri, Aug 9, 2019 at 8:57 PM Erick Erickson wrote:

So my question is why do you have individual cores? Why not use SolrCloud
and collections and have this happen automatically?

There may be very good reasons, this is more if a sanity check….

On Aug 9, 2019, at 8:02 AM, Jan Høydahl wrote:

USE request param &shards=core1,core2 or if on separate machines
host:port/solr/core1,host:port/solr/core2

Jan Høydahl

On 9 Aug 2019 at 11:23, Komal Motwani wrote:

Hi,



I have a use case where I would like a query to span across Cores
(Multi-Core); all the cores involved do have same schema. I have started
using solr just recently and have been trying to find ways to achieve
this
but couldn’t find any solution so far (Distributed searches, shards are
not
what I am looking for). I remember in one of the tech talks, there was a
mention of this feature to be included in future releases. Appreciate
any
pointers to help me progress further.



Thanks,

Komal Motwani






replica's of same shard have different file contents

2020-01-14 Thread Nicolas Franck
I noticed a - in my opinion - strange behavior in Solr Cloud.

I have a collection that has 1 shard and two replicas.

When I look at the directory structure, both have the same file names
in "data/index" ..

BUT the contents of those files are different.

So when I query this collection and sort on "score",
and the score is the same for a lot of documents,
the order is different depending on the node that
was queried. The result sets are the same, just not the returned order.

I guess the segments are not sent as-is from the leader to the other replicas?
Or something else could be wrong?

Thanks in advance




Re: Getting error "Bad Message 414 reason: URI Too Long"

2021-01-14 Thread Nicolas Franck
I believe you can also access this path with an HTTP POST request.
That way you do not hit the URI size limit.

cf. 
https://stackoverflow.com/questions/2997014/can-you-use-post-to-run-a-query-in-solr-select

I think some solr libraries already use this approach (e.g.  WebService::Solr 
in perl)
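A minimal sketch of the same idea with Python's standard library (the core name `documents` is taken from the quoted URL; the request is only built here, not sent):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Put the (arbitrarily long) query in the POST body instead of the URI.
body = urlencode({
    "q": '"Geisteswissenschaften" OR "Humanities" OR "Art"',  # ...and so on
    "wt": "json",
}).encode("utf-8")

req = Request(
    "http://localhost:8983/solr/documents/select",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
# urllib switches the method to POST as soon as a body is attached;
# pass req to urllib.request.urlopen() to actually send it.
print(req.get_method())  # POST
```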

On 14 Jan 2021, at 10:31, Abhay Kumar wrote:

Hello,

I am trying to post the below query to Solr but I am getting the error “Bad Message 
414 reason: URI Too Long”.

I am sending the query using the SolrNet library. Please suggest how to resolve this 
issue.

Query : 
http://localhost:8983/solr/documents/select?q=%22Geisteswissenschaften%22%20OR%20%22Humanities%22%20OR%20%22Art%22%20OR%20%22Arts%22%20OR%20%22Caricatures%22%20OR%20%22Caricature%22%20OR%20%22Cartoon%22%20OR%20%22Engraving%20and%20Engravings%22%20OR%20%22Engravings%20and%20Engraving%22%20OR%20%22Engraving%22%20OR%20%22Engravings%22%20OR%20%22Human%20Body%22%20OR%20%22Human%20Bodies%22%20OR%20%22Human%20Figure%22%20OR%20%22Human%20Figures%22%20OR%20%22menschlicher%20K%C3%B6rper%22%20OR%20%22Menschliche%20Gestalt%22%20OR%20%22Body%20Parts%22%20OR%20%22K%C3%B6rperteile%22%20OR%20%22Body%20Parts%20and%20Fluids%22%20OR%20%22K%C3%B6rperteile%20und%20-fl%C3%BCssigkeiten%22%20OR%20%22Medical%20Illustration%22%20OR%20%22Medical%20Illustrations%22%20OR%20%22medizinische%20Illustration%22%20OR%20%22Anatomy%2C%20Artistic%22%20OR%20%22Artistic%20Anatomy%22%20OR%20%22Artistic%20Anatomies%22%20OR%20%22Medicine%20in%20Art%22%20OR%20%22Medicine%20in%20Arts%22%20OR%20%22Numismatics%22%20OR%20%22M%C3%BCnzkunde%22%20OR%20%22Coins%22%20OR%20%22Coin%22%20OR%20%22M%C3%BCnzen%22%20OR%20%22Medals%22%20OR%20%22Medal%22%20OR%20%22Denkm%C3%BCnzen%22%20OR%20%22Gedenkm%C3%BCnzen%22%20OR%20%22Medaillen%22%20OR%20%22Paintings%22%20OR%20%22Painting%22%20OR%20%22Philately%22%20OR%20%22Philatelies%22%20OR%20%22Postage%20Stamps%22%20OR%20%22Postage%20Stamp%22%20OR%20%22Briefmarken%22%20OR%20%22Portraits%22%20OR%20%22Portrait%22%20OR%20%22Sculpture%22%20OR%20%22Sculptures%22%20OR%20%22Awards%20and%20Prizes%22%20OR%20%22Prizes%20and%20Awards%22%20OR%20%22Awards%22%20OR%20%22Award%22%20OR%20%22Prizes%22%20OR%20%22Prize%22%20OR%20%22Nobel%20Prize%22%20OR%20%22Ethics%22%20OR%20%22Egoism%22%20OR%20%22Ethical%20Issues%22%20OR%20%22Ethical%20Issue%22%20OR%20%22Metaethics%22%20OR%20%22Metaethik%22%20OR%20%22Moral%20Policy%22%20OR%20%22Moral%20Policies%22%20OR%20%22Moralischer%20Grundsatz%22%20OR%20%22Natural%20Law%22%20OR%20%22Natural%20Laws%22%20OR%20%22Naturrecht%22%20OR%20%22Situational%20Ethics%22%20OR%20%
22Bioethical%20Issues%22%20OR%20%22Bioethical%20Issue%22%20OR%20%22Bioethics%22%20OR%20%22Biomedical%20Ethics%22%20OR%20%22Health%20Care%20Ethics%22%20OR%20%22Ethics%2C%20Clinical%22%20OR%20%22Clinical%20Ethics%22%20OR%20%22klinische%20Ethik%22%20OR%20%22Complicity%22%20OR%20%22Mitt%C3%A4terschaft%22%20OR%20%22Moral%20Complicity%22%20OR%20%22Moralische%20Komplizenschaft%22%20OR%20%22Moralische%20Mitt%C3%A4terschaft%22%20OR%20%22Conflict%20of%20Interest%22%20OR%20%22Interest%20Conflict%22%20OR%20%22Interest%20Conflicts%22%20OR%20%22Ethical%20Analysis%22%20OR%20%22Ethical%20Analyses%22%20OR%20%22Casuistry%22%20OR%20%22Retrospective%20Moral%20Judgment%22%20OR%20%22Retrospective%20Moral%20Judgments%22%20OR%20%22retrospektive%20Moralische%20Beurteilung%22%20OR%20%22Wedge%20Argument%22%20OR%20%22Wedge%20Arguments%22%20OR%20%22Slippery%20Slope%20Argument%22%20OR%20%22Slippery%20Slope%20Arguments%22%20OR%20%22Argument%20der%20schiefen%20Ebene%22%20OR%20%22Ethical%20Relativism%22%20OR%20%22Ethical%20Review%22%20OR%20%22Ethikgutachten%22%20OR%20%22Ethics%20Consultation%22%20OR%20%22Ethics%20Consultations%22%20OR%20%22Ethical%20Theory%22%20OR%20%22Ethical%20Theories%22%20OR%20%22Normative%20Ethics%22%20OR%20%22Normative%20Ethic%22%20OR%20%22Consequentialism%22%20OR%20%22Deontological%20Ethics%22%20OR%20%22Deontological%20Ethic%22%20OR%20%22Deontologie%22%20OR%20%22Ethik%20der%20Pflichtenlehre%22%20OR%20%22Teleological%20Ethics%22%20OR%20%22Teleological%20Ethic%22%20OR%20%22Teleologische%20Ethik%22%20OR%20%22Utilitarianism%22%20OR%20%22Utilitarianisms%22%20OR%20%22Utilitarismus%22%20OR%20%22Ethicists%22%20OR%20%22Ethicist%22%20OR%20%22Ethics%20Consultants%22%20OR%20%22Ethics%20Consultant%22%20OR%20%22Bioethicists%22%20OR%20%22Bioethicist%22%20OR%20%22Bioethics%20Consultants%22%20OR%20%22Bioethics%20Consultant%22%20OR%20%22Bioethiker%22%20OR%20%22Clinical%20Ethicists%22%20OR%20%22Clinical%20Ethicist%22%20OR%20%22Ethics%20Committees%22%20OR%20%22Ethics%20Committee%22%20OR%20%22In
stitutional%20Ethics%20Committees%22%20OR%20%22Institutional%20Ethics%20Committee%22%20OR%20%22Institutionalisierte%20Ethikkommission%22%20OR%20%22Regional%20Ethics%20Committees%22%20OR%20%22Regional%20Ethics%20Committee%22%20OR%20%22Regionale%20Ethikkommissionen%22%20OR%20%22Ethics%20Committees%2C%20Clinical%22%20OR%20%22Clinical%20Ethics%20Committees%22%20OR%20%22Clinical%20Ethics%20Committee%22%20OR%20%22Hospi

Re: Getting error "Bad Message 414 reason: URI Too Long"

2021-01-14 Thread Nicolas Franck
Euh, sorry: I did not read your message well enough.
You did in fact use a POST request, with the parameters in the body
(your example suggests otherwise).
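For reference, putting the parameters in the POST body looks like this. A minimal sketch in Python; the URL and core name ("documents") are taken from the quoted message below, and the short query string is a stand-in for the long one:

```python
import urllib.parse
import urllib.request

# Encode the query parameters as a normal form body.
# A long "q" value here does not hit any URI length limit,
# because it never becomes part of the request line.
params = urllib.parse.urlencode({
    "q": '"Humanities" OR "Bioethics" OR "Ethics"',  # stand-in for the long query
    "wt": "json",
}).encode("ascii")

# Passing data= makes urllib issue a POST instead of a GET.
req = urllib.request.Request(
    "http://localhost:8983/solr/documents/select",
    data=params,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# response = urllib.request.urlopen(req)  # uncomment against a running Solr
```

Solr treats form-encoded POST bodies on /select the same as query-string parameters, so nothing else changes.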

> On 14 Jan 2021, at 10:37, Nicolas Franck  wrote:
> 
> I believe you can also access this path with an HTTP POST request.
> That way you don't hit the URI size limit.
> 
> cf. 
> https://stackoverflow.com/questions/2997014/can-you-use-post-to-run-a-query-in-solr-select
> 
> I think some Solr libraries already use this approach (e.g. WebService::Solr
> in Perl).
> 
> On 14 Jan 2021, at 10:31, Abhay Kumar <abhay.ku...@anjusoftware.com> wrote:
> 
> Hello,
> 
> I am trying to post the query below to Solr but I am getting the error "Bad
> Message 414 reason: URI Too Long".
> 
> I am sending the query using the SolrNet library. Please suggest how to
> resolve this issue.
> 
> Query : 
> http://localhost:8983/solr/documents/select?q=%22Geisteswissenschaften%22%20OR%20%22Humanities%22%20OR%20%22Art%22%20OR%20%22Arts%22%20OR%20%22Caricatures%22%20OR%20%22Caricature%22%20OR%20%22Cartoon%22%20OR%20%22Engraving%20and%20Engravings%22%20OR%20%22Engravings%20and%20Engraving%22%20OR%20%22Engraving%22%20OR%20%22Engravings%22%20OR%20%22Human%20Body%22%20OR%20%22Human%20Bodies%22%20OR%20%22Human%20Figure%22%20OR%20%22Human%20Figures%22%20OR%20%22menschlicher%20K%C3%B6rper%22%20OR%20%22Menschliche%20Gestalt%22%20OR%20%22Body%20Parts%22%20OR%20%22K%C3%B6rperteile%22%20OR%20%22Body%20Parts%20and%20Fluids%22%20OR%20%22K%C3%B6rperteile%20und%20-fl%C3%BCssigkeiten%22%20OR%20%22Medical%20Illustration%22%20OR%20%22Medical%20Illustrations%22%20OR%20%22medizinische%20Illustration%22%20OR%20%22Anatomy%2C%20Artistic%22%20OR%20%22Artistic%20Anatomy%22%20OR%20%22Artistic%20Anatomies%22%20OR%20%22Medicine%20in%20Art%22%20OR%20%22Medicine%20in%20Arts%22%20OR%20%22Numismatics%22%20OR%20%22M%C3%BCnzkunde%22%20OR%20%22Coins%22%20OR%20%22Coin%22%20OR%20%22M%C3%BCnzen%22%20OR%20%22Medals%22%20OR%20%22Medal%22%20OR%20%22Denkm%C3%BCnzen%22%20OR%20%22Gedenkm%C3%BCnzen%22%20OR%20%22Medaillen%22%20OR%20%22Paintings%22%20OR%20%22Painting%22%20OR%20%22Philately%22%20OR%20%22Philatelies%22%20OR%20%22Postage%20Stamps%22%20OR%20%22Postage%20Stamp%22%20OR%20%22Briefmarken%22%20OR%20%22Portraits%22%20OR%20%22Portrait%22%20OR%20%22Sculpture%22%20OR%20%22Sculptures%22%20OR%20%22Awards%20and%20Prizes%22%20OR%20%22Prizes%20and%20Awards%22%20OR%20%22Awards%22%20OR%20%22Award%22%20OR%20%22Prizes%22%20OR%20%22Prize%22%20OR%20%22Nobel%20Prize%22%20OR%20%22Ethics%22%20OR%20%22Egoism%22%20OR%20%22Ethical%20Issues%22%20OR%20%22Ethical%20Issue%22%20OR%20%22Metaethics%22%20OR%20%22Metaethik%22%20OR%20%22Moral%20Policy%22%20OR%20%22Moral%20Policies%22%20OR%20%22Moralischer%20Grundsatz%22%20OR%20%22Natural%20Law%22%20OR%20%22Natural%20Laws%22%20OR%20%22Naturrecht%22%20OR%20%22Situational%20Ethics%22%20OR%2
0%22Bioethical%20Issues%22%20OR%20%22Bioethical%20Issue%22%20OR%20%22Bioethics%22%20OR%20%22Biomedical%20Ethics%22%20OR%20%22Health%20Care%20Ethics%22%20OR%20%22Ethics%2C%20Clinical%22%20OR%20%22Clinical%20Ethics%22%20OR%20%22klinische%20Ethik%22%20OR%20%22Complicity%22%20OR%20%22Mitt%C3%A4terschaft%22%20OR%20%22Moral%20Complicity%22%20OR%20%22Moralische%20Komplizenschaft%22%20OR%20%22Moralische%20Mitt%C3%A4terschaft%22%20OR%20%22Conflict%20of%20Interest%22%20OR%20%22Interest%20Conflict%22%20OR%20%22Interest%20Conflicts%22%20OR%20%22Ethical%20Analysis%22%20OR%20%22Ethical%20Analyses%22%20OR%20%22Casuistry%22%20OR%20%22Retrospective%20Moral%20Judgment%22%20OR%20%22Retrospective%20Moral%20Judgments%22%20OR%20%22retrospektive%20Moralische%20Beurteilung%22%20OR%20%22Wedge%20Argument%22%20OR%20%22Wedge%20Arguments%22%20OR%20%22Slippery%20Slope%20Argument%22%20OR%20%22Slippery%20Slope%20Arguments%22%20OR%20%22Argument%20der%20schiefen%20Ebene%22%20OR%20%22Ethical%20Relativism%22%20OR%20%22Ethical%20Review%22%20OR%20%22Ethikgutachten%22%20OR%20%22Ethics%20Consultation%22%20OR%20%22Ethics%20Consultations%22%20OR%20%22Ethical%20Theory%22%20OR%20%22Ethical%20Theories%22%20OR%20%22Normative%20Ethics%22%20OR%20%22Normative%20Ethic%22%20OR%20%22Consequentialism%22%20OR%20%22Deontological%20Ethics%22%20OR%20%22Deontological%20Ethic%22%20OR%20%22Deontologie%22%20OR%20%22Ethik%20der%20Pflichtenlehre%22%20OR%20%22Teleological%20Ethics%22%20OR%20%22Teleological%20Ethic%22%20OR%20%22Teleologische%20Ethik%22%20OR%20%22Utilitarianism%22%20OR%20%22Utilitarianisms%22%20OR%20%22Utilitarismus%22%20OR%20%22Ethicists%22%20OR%20%22Ethicist%22%20OR%20%22Ethics%20Consultants%22%20OR%20%22Ethics%20Consultant%22%20OR%20%22Bioethicists%22%20OR%20%22Bioethicist%22%20OR%20%22Bioethics%20Consultants%22%20OR%20%22Bioethics%20Consultant%22%20OR%20%22Bioethiker%22%20OR%20%22Clinical%20Ethicists%22%20OR%20%22Clinical%20Ethicist%22%20OR%20%22Ethics%20Committees%22%20OR%20%22Ethics%20Committee%22%20OR%20%22
Institutional%20Ethics%20Committees%22%20OR%20%22Institutional%20Ethics%20Commi