Solr server configuration

2018-01-12 Thread Deepak Nair
Hello,

We want to implement Solr 7.x for one of our clients with the requirements below.


1.   The data to be indexed will come from 2 Oracle databases, with 2 tables
each and around 10 columns.

2.   The data volume is expected to reach around 10 million rows in each
table.

3.   4000+ users will query the indexed data from a UI. The peak load is 
expected to be around 2000 queries/sec.

4.   The implementation will be on a standalone or clustered Unix 
environment.

I want to know the best server configuration for this kind of
requirement, e.g. how many VMs, how much RAM, what heap size, etc.

Thanks,
Deepak


Re: Regarding document routing

2018-01-12 Thread manish tanger
Hello Shawn,

Here are the UI options I filled in; for additional clarification, I am using
Solr 6.5.1.



name: Collection_name
config set: ber
numShards: 1
replicationFactor: 1

Advanced options:
router: Implicit
maxShardsPerNode: 1
shards: 20180111_04,20180111_05
routerField: dateandhour




Regards

Manish Kr. Tanger


On Thu, Jan 11, 2018 at 2:59 PM, Shawn Heisey  wrote:

> On 1/10/2018 11:00 PM, manish tanger wrote:
>
>> As we are connecting through ZooKeeper, my understanding was that routing
>> would be done by ZooKeeper. Thanks for the clarification.
>>
>
> CloudSolrClient doesn't actually connect through ZK.  When you create the
> client using ZK info, the client reads information about the cloud from ZK,
> and discovers where the Solr servers are.  All the actual work that the
> client does is sent to those Solr servers that were discovered by reading
> the ZK database.
>
> *What were the precise commands or API calls that you used to create the
>> collection?  What is the definition of the dateandhour field?*
>>
>> Collection Creation Through UI:
>> Inline image 3
>>
>
> Attachments rarely make it to the list.  Your image showing the collection
> creation did not make it, so I can't see that information.  If you want to use
> an image for that, you're going to need to find some kind of website for
> sharing images and provide us with a link.  But as you'll read below,
> sharing that may not be required.
>
> *dateandhour field definition:*
>>
>
> I have discovered a problem in the admin UI on version 7.2, which may
> affect other versions.  Whatever you enter into the "routerField" box gets
> sent as a "routerField" parameter -- *not* as the "router.field" parameter
> that is actually required.  So the collection's state.json file does not
> have a router field defined.
>
> I opened an issue for that problem:
>
> https://issues.apache.org/jira/browse/SOLR-11843
>
> Can you try creating a collection with the API directly, rather than with
> the admin UI, and using the correct "router.field" parameter?
>
> https://lucene.apache.org/solr/guide/7_2/collections-api.html#CollectionsAPI-Input
>
> Thanks,
> Shawn
>


How different is solr 4.7 from latest version.

2018-01-12 Thread srini sampath
Hi,

I am reading a book (Solr in Action,
https://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021) to
understand how to work with different features in Solr. It uses Solr 4.7 to
explain features, but I haven't found any better material (IMHO, the
documentation has many looped references, which makes it too difficult for a
newbie to understand).

Does it cover all the features of the new version (at least the important
ones), or is it better to follow some other resource?

Best,
Srini Sampath.


Re: Haystack: The Search Relevance & Cognitive Search Conference

2018-01-12 Thread Doug Turnbull
Hello Solristas,

Just a reminder: the CFP for this closes in one week, on Friday the 19th!

We’ve gotten fantastic submissions from Snagajob, the Wikimedia
Foundation, Elsevier, and other top organizations. We’d love to
learn from your “in the trenches” lessons!

http://o19s.com/haystack

Best
- Doug

On Fri, Dec 8, 2017 at 3:27 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Join us at Haystack, April 10 & 11, where we discuss advanced technical
> topics in search relevance and cognitive search! We'll discuss applied
> relevance engineering with fellow practitioners, in Solr,
> Elasticsearch, Vespa, and adjacent technologies.
>
> Topics include:
> - Learning to Rank
> - Semantic search
> - Personalization
> - Smart Search UX
> - Plugging search engine guts to control relevance
> - Adjacent topics in discovery & recommendations
> - And many more!
>
> We're doing this conference for a nominal fee of $75 at our headquarters in
> Charlottesville, VA on April 10 & 11.
>
> Click here to learn more. CFPs Needed!
> http://o19s.com/haystack
>
> Best
> -Doug
> --
> Consultant, OpenSource Connections. Contact info at
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (
> http://bit.ly/dougs_cal)
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


LTR and working with feature stores

2018-01-12 Thread Dariusz Wojtas
Hi,

I am working with LTR rescoring.
It works beautifully, but I am curious about something.
How do I specify the feature store other than by using the [features] syntax?
[features store=yourFeatureStore]

I have a range of models in my custom feature store, with plenty of
features implemented.
I have found that when I call LTR with a model that uses only two features,
Solr still executes all of them.

My setup in solrconfig.xml:
-
<str name="fl">id,score,why_score:[explain style=nl],[features store=store_incidentDB]</str>
<str name="rq">{!ltr reRankDocs=$reRankDocs model=simpleModelA}</str>
--

simpleModelA above is a LinearModel that uses only 2 features.

What do I see in the results?
In the response I can see that it has executed ALL features (there are
calculated values) in this section:
1)  -> response -> result -> doc -> HERE

In addition, my model is executed, and only the TWO features of the
executed model are presented in:
2)  -> response -> debug -> explain

Why do I see all features being executed, if the specified model only
contains two features?

I tried to reduce 'fl' to:
  id,score,why_score:[explain style=nl]
and it works as expected then:
1. additional features are not executed (correct)
2. my model works, with only the two features of the selected model (correct)
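
For reference, here is a minimal sketch of the two request variants compared
above (the collection name and reRankDocs value are hypothetical placeholders;
the store and model names are the ones from my setup):

# Variant 1: fl includes [features store=...] -- every feature in that
# store is extracted, not just the model's two
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=test' \
  --data-urlencode 'fl=id,score,why_score:[explain style=nl],[features store=store_incidentDB]' \
  --data-urlencode 'rq={!ltr reRankDocs=100 model=simpleModelA}'

# Variant 2: fl without the [features] transformer -- only the model's
# own two features are executed during reranking
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=test' \
  --data-urlencode 'fl=id,score,why_score:[explain style=nl]' \
  --data-urlencode 'rq={!ltr reRankDocs=100 model=simpleModelA}'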

And the final questions for this long email are:
1. Why does it execute all features when I specify 'store'?
2. How do I specify the 'store' if I have multiple stores, but do not want
to execute all of their features?

Best regards,
Dariusz Wojtas


Re: How different is solr 4.7 from latest version.

2018-01-12 Thread Shawn Heisey

On 1/12/2018 5:58 AM, srini sampath wrote:

I am reading a book (Solr in Action,
https://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021) to
understand how to work with different features in Solr. It uses Solr 4.7 to
explain features, but I haven't found any better material (IMHO, the
documentation has many looped references, which makes it too difficult for a
newbie to understand).

Does it cover all the features of the new version (at least the important
ones), or is it better to follow some other resource?


The latest version is 7.2, and the 7.2.1 release is being finalized now.

That's three major versions newer.  Most of the info in that book will 
still be relevant, but there is quite a bit of new functionality.


Here's the reference guide that is published as official documentation. 
You can download this as a PDF using the "Other Formats" link at the top 
of the page:


https://lucene.apache.org/solr/guide/7_2/

Full disclosure: There isn't very much available for extreme beginners. 
This lack is something the project is aware of, but writing 
documentation for the uninitiated is a difficult task.  The reference 
guide isn't awful, but it could be a lot better.


For differences between versions, there is the CHANGES.txt file included 
in every download.  The reference guide also has a section about big 
differences from the previous major version.


Thanks,
Shawn


RE: How different is solr 4.7 from latest version.

2018-01-12 Thread Joe Heasly
Srini,

We upgraded from Solr 4.6 to 6.4 last summer.  There are fundamental
differences between those versions in the way the default Boolean operator and
'minimum should match' functions interact.  Here's an excellent discussion of
the change (Jason Hellman does it more justice than I could hope to):

http://blog.innoventsolutions.com/innovent-solutions-blog/2017/02/solr-edismax-boolean-query.html

If you're just starting out and you're starting with a recent version, this 
won't matter.  But if you're upgrading, it's critical to be aware.
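
As a concrete illustration (a hypothetical sketch, not from our actual setup;
the collection and field names are made up), pinning mm explicitly, rather
than relying on the version-specific defaults, makes the interaction
predictable:

# Require all terms to match regardless of the version's default
# mm / q.op interplay
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'defType=edismax' \
  --data-urlencode 'q=red wool sweater' \
  --data-urlencode 'qf=title description' \
  --data-urlencode 'mm=100%' \
  --data-urlencode 'debugQuery=true'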

Regards,
Joe

{ Joe Heasly | L.L.Bean, Inc. | [O] 207 552-2254 [M] 207 756-9250 }



Re: Regarding document routing

2018-01-12 Thread Erick Erickson
Shawn's point (and JIRA) is that the UI doesn't pass the "router"
parameter correctly, so it is being ignored.

Simply put: You cannot create collections with the admin UI using
implicit routing because of this bug.  Don't use it.

Either use the "solr/bin create_collection" command or put the
parameters directly on the url with parameters from here:

https://lucene.apache.org/solr/guide/6_6/collections-api.html

something like:
/solr/admin/collections?action=CREATE&router.name=implicit&shards=shard1,shard2,shard3&replicationFactor=.
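
For example, here's a sketch using the parameters from the original message
(host and port are placeholders, and 'ber' is the configset named earlier in
this thread):

# CREATE with implicit routing: shards are named explicitly and
# router.field tells Solr which field picks the shard
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=Collection_name&collection.configName=ber&router.name=implicit&shards=20180111_04,20180111_05&router.field=dateandhour&replicationFactor=1&maxShardsPerNode=1'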

You can check whether the collection is created correctly by going to the
admin UI>>cloud>>tree>>collections>>your_collection
You should see the data about your collection, including what router
was actually used.
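
The same information is also available from the Collections API; a sketch
assuming a node on the default port:

# Dumps the cluster state for the collection, including the router in use
curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=Collection_name'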

Best,
Erick

On Fri, Jan 12, 2018 at 2:22 AM, manish tanger  wrote:
> Hello Shawn,
>
> Here are the UI options I filled in; for additional clarification, I am using
> Solr 6.5.1.
>
> name: Collection_name
> config set: ber
> numShards: 1
> replicationFactor: 1
>
> Advanced options:
> router: Implicit
> maxShardsPerNode: 1
> shards: 20180111_04,20180111_05
> routerField: dateandhour
>
> Regards,
> Manish Kr. Tanger


Re: Solr server configuration

2018-01-12 Thread Erick Erickson
First, it's totally impossible to answer in the abstract; see:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Second, indexing DB tables directly into Solr is usually the wrong
approach. Solr is not a replacement for a relational DB: it does not
function as a DB and is not optimized for joins, etc. It's a _search
engine_ and does that superlatively.

At the very least, the most common recommendation, if you have the
space, is to de-normalize the data. My point is that you need to think about
this problem in terms of _search_, not "move some tables to Solr and
use Solr like a DB", which means that even if someone can answer your
questions, it won't help much.
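
As an illustration (a hypothetical sketch; the collection and field names are
made up), a denormalized document folds the joined rows from your tables into
one flat Solr document, so no join is needed at query time:

# Index one flattened document per logical search result
curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {
      "id": "db1-order-42",
      "customer_name": "Jane Smith",
      "customer_city": "Portland",
      "order_total": 149.95,
      "order_date": "2018-01-10T00:00:00Z"
    }
  ]'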

Best,
Erick

On Thu, Jan 11, 2018 at 9:53 PM, Deepak Nair  wrote:
> Hello,
>
> We want to implement Solr 7.x for one of our clients with the requirements below.
>
>
> 1.   The data to be indexed will come from 2 Oracle databases, with 2 tables
> each and around 10 columns.
>
> 2.   The data volume is expected to reach around 10 million rows in each
> table.
>
> 3.   4000+ users will query the indexed data from a UI. The peak load is 
> expected to be around 2000 queries/sec.
>
> 4.   The implementation will be on a standalone or clustered Unix 
> environment.
>
> I want to know the best server configuration for this kind of
> requirement, e.g. how many VMs, how much RAM, what heap size, etc.
>
> Thanks,
> Deepak


Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-12 Thread Chris Hostetter

: defType=dismax does NOT do anything special with *:* other than treat it 
...
: > As Chris explained, this is special:
...

I'm interpreting your followup question differently than Erick & Erik
did.  I'm going to assume both E & E misunderstood your question, and I'm
going to assume you completely understood my response to your original
question.

I'm going to assume that one way to reword/expand your followup question is
something like this...

"I understand now that defType=dismax doesn't support special syntax like 
'*:*' and treats that 3 input as just another 3 character string to search 
against the qf & pf fields -- but now what i don't understand is why are 
list of fields in the debug query output is different for 'q=*:*' compared 
to something like 'q=hello'"

(If I have not understood your followup question correctly, please
clarify.)

Let's look at those outputs you mentioned...

: >> http://localhost:8983/solr/filesearch/select?fq=id:1193&;
: >> q=*:*&debugQuery=true
: >> 
: >> 
: >>   - parsedquery: "+DisjunctionMaxQuery((user_email:*:* | user_name:*:* |
: >>   tags:*:* | (name_shingle_zh-cn:, , name_shingle_zh-cn:, ,) |
: >> id:*:*)~0.01)
: >>   DisjunctionMaxQuery(((name_shingle_zh-cn:", , , ,"~100)^100.0 |
: >>   tags:*:*)~0.01)",
...
: >> e.g. following query uses the my expected set of pf and qf.
...
: >> http://localhost:8983/solr/filesearch/select?fq=id:1193&;
: >> q=hello&debugQuery=true
: >> 
: >> 
: >> 
: >>   - parsedquery: "+DisjunctionMaxQuery(((name_token:hello)^60.0 |
: >>   user_email:hello | (name_combined:hello)^10.0 | (name_zh-cn:hello)^10.0
: >> |
: >>   name_shingle:hello | comments:hello | user_name:hello |
: >> description:hello |
: >>   file_content_zh-cn:hello | file_content_de:hello | tags:hello |
: >>   file_content_it:hell | file_content_fr:hello | file_content_es:hell |
: >>   file_content_en:hello | id:hello)~0.01)
: >> DisjunctionMaxQuery((description:hello
: >>   | (name_shingle:hello)^100.0 | comments:hello | tags:hello)~0.01)",


The answer has to do with the list of qf & pf fields you have configured
-- you didn't provide us with the concrete specifics of what qf/pf you
have configured in your requestHandler -- but you did mention in your
second example that "the following query uses my expected set of pf and
qf".

Comparing the 2 examples at a glance, it appears that the fields in the
first example (q=*:* ... again, searching for the literal 3-character
string '*:*') are (mostly) a subset of the fields you "expected" (from the
2nd example).

I'm fairly certain that what's happening here is that in both examples the
literal string input is being given to the analyzer for all of your fields
-- but in the case of the (literal) string '*:*', many of the analyzers are
producing no terms at all -- i.e. they are completely stripping out the
punctuation -- so those fields don't appear in the final query.

IIUC, one other oddity here is that the reverse also seems to be true in
some cases -- I suspect that although "name_shingle_zh-cn" doesn't appear
in your 2nd example, it probably *is* in your pf param, but whatever
analyzer you have configured for it produces no tokens for the Latin
characters "hello" yet does produce tokens for the pure-punctuation
characters "*:*".
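
One way to check this (a sketch assuming the default /analysis/field handler
and the collection/field names from your debug output) is to ask each field's
analyzer directly what terms it produces for the two inputs:

# What does user_email's analyzer produce for the literal string *:* ?
curl 'http://localhost:8983/solr/filesearch/analysis/field' \
  --data-urlencode 'analysis.fieldname=user_email' \
  --data-urlencode 'analysis.fieldvalue=*:*'

# And what does name_shingle_zh-cn produce for a plain word?
curl 'http://localhost:8983/solr/filesearch/analysis/field' \
  --data-urlencode 'analysis.fieldname=name_shingle_zh-cn' \
  --data-urlencode 'analysis.fieldvalue=hello'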


(If I'm correct about your question but wrong about your qf/pf, then
please provide us with a lot more details -- notably the full
schema/solrconfig used when executing those queries.)


-Hoss
http://www.lucidworks.com/


Re: Heavy operations in PostFilter are heavy

2018-01-12 Thread Chris Hostetter

: Yes, I do so. The problem is that the collect method is called for EVERY
: document the query matches, even if the user only wants to see like 10
: documents. The operation I have to perform takes maybe 50ms per document

You're running into a classic chicken/egg problem with document collection
& filtering -- you don't want your expensive filter to be run against every
doc that matches the query (and lower-cost filters), just the "top 10" the
user is going to see -- but Solr doesn't know what those top 10 are yet,
not until it has collected & sorted all of them ... and your PostFilter
can change what gets collected ... it's a filter!

Also: things like faceting (and even just returning an accurate numFound!)
require that all matches be "collected" ... unless you are using sorted
segments and early termination, your PostFilter has to be consulted about
every (potential) match in order for the results to be accurate.

: if I have to process them singly, and maybe 30ms if I could get a
: document list. But if the user e.g. uses a wildcard query that matches

If processing in batches is a viable option, then one approach you may want
to consider is the one used by the CollapseQParser and the
PostFilter it generates -- it doesn't pass on any collected documents to
its delegate as it collects them -- it essentially just batches them all
up, and then in the "finish" method it processes them and calls
delegate.collect() on the ones it decides are important.

-Hoss
http://www.lucidworks.com/


LTR original score feature

2018-01-12 Thread Brian Yee
I wanted to get some opinions on using the original score feature. The original 
score produced by Solr is intuitively a very important feature. In my data set 
I'm seeing that the original score varies wildly between different queries. 
This makes sense since the score generated by Solr is not normalized across all 
queries. However, won't this mess with our training data? If this feature is 
3269.4 for the top result for one query, and then 32.7 for the top result for 
another query, it does not mean that the first document was 10x more relevant 
to its query than the second document. I am using a normalize param within 
Ranklib, but that only normalizes features between each other, not within one 
feature, right? How are people handling this? Am I missing something?
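
(For context, a sketch of how the original score is typically exposed as an
LTR feature; the collection and store names here are hypothetical:)

# Define the original query score as one feature in the store
curl -XPUT 'http://localhost:8983/solr/mycollection/schema/feature-store' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {
      "store": "myFeatureStore",
      "name": "originalScore",
      "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
      "params": {}
    }
  ]'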


request for instructions to add a another solr node

2018-01-12 Thread Sushil K Tripathi
Team,


I am new to Solr and I need help adding a new VM to an existing Solr cluster
that is serving user requests. Any help would be appreciated.


Environment details:

Red Hat Enterprise Linux Server release 7.4
VM1 configured with:
1. ZooKeeper 1, 2, and 3 on different ports
2. Solr 7.2 configured with 2 nodes, 2 shards, and 2 replicas

VM2: the new server we are trying to add to the existing cluster. We followed
the instructions from the Apache Solr Reference Guide for 7.2, as below:

Unzip the Solr-7.2.0.tar.gz, then:
mkdir -p example/cloud/node3/solr
cp server/solr/solr.xml example/cloud/node3/solr
bin/solr start -cloud -s example/cloud/node3/solr -p 8987 -z :

Issue:
=
When calling the URL http://10.0.12.57:8983/solr/, it seems the new node is
still not part of the cluster, and it does not have any cores or indexes.
Thanks in advance for any help.

Error -
=
HTTP ERROR 404

Problem accessing /solr/. Reason:

Not Found

Caused by:

javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:342)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)



With Warm Regards...
Sushil K. Tripathi


Re: LTR original score feature

2018-01-12 Thread Michael Alcorn
What you're suggesting is that there's a "nonlinear relationship" between
the original score (the input variable) and some measure of "relevance"
(the output variable). Nonlinear models like decision trees (which include
LambdaMART) and neural networks (which include RankNet) can handle these
types of situations, assuming there's enough data. The nonlinear phenomena
you brought up are also probably part of the reason why pairwise models
tend to perform better than pointwise models in learning to rank tasks.

On Fri, Jan 12, 2018 at 1:52 PM, Brian Yee  wrote:

> I wanted to get some opinions on using the original score feature. The
> original score produced by Solr is intuitively a very important feature. In
> my data set I'm seeing that the original score varies wildly between
> different queries. This makes sense since the score generated by Solr is
> not normalized across all queries. However, won't this mess with our
> training data? If this feature is 3269.4 for the top result for one query,
> and then 32.7 for the top result for another query, it does not mean that
> the first document was 10x more relevant to its query than the second
> document. I am using a normalize param within Ranklib, but that only
> normalizes features between each other, not within one feature, right? How
> are people handling this? Am I missing something?
>


Re: request for instructions to add a another solr node

2018-01-12 Thread Erick Erickson
What is the cause reported in the solr log? This should be in:

example/cloud/node3/solr/logs

that often gives a much more complete statement of what went wrong.

You don't really need the -cloud parameter, the -z parameter implies
that it's a SolrCloud
installation. That's not the root of your problem, more of an aside.

What's inconsistent here is that you started your third node on port
8987, but the URL you
accessed was 8983. That makes no sense to me. Forgetting the bits
about adding a new
Solr instance, do you see a healthy Solr cluster in the admin UI
before you add the
new instance? My bet is that your basic installation is messed up and
the new Solr node is
a red herring.

FWIW, I routinely spin up multiple Solr JVMs with:

mkdir -p ./example/cloud/node1/solr

cp ./server/solr/solr.xml ./example/cloud/node1/solr

then

bin/solr start -z localhost:2181 -p 8981 -s example/cloud/node1/solr

Typically I use ports 8981, 8982, 8983, 8984 just because it makes keeping track
easier, but there's no reason 8987 wouldn't work.

Finally, assuming the Solr node starts successfully, you won't see
anything in the
admin UI unless you look under "live_nodes" in the
admin UI>>cloud>>tree
view
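
A quick command-line check is also possible; a sketch where the collection
name and ZooKeeper address are placeholders for your own values:

# Reports live nodes plus shard/replica health for the collection
bin/solr healthcheck -c your_collection -z localhost:2181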

Best,
Erick