Re: any difference between using collection vs. shard in URL?

2014-11-06 Thread Ramkumar R. Aiyengar
Do keep one thing in mind though. If you are already doing the work of
figuring out the right shard leader (through solrJ or otherwise), using
that location with just the collection name might be suboptimal if there
are multiple shard leaders present in the same instance -- the collection
name just goes to *some* shard leader and not necessarily to the one where
your document is destined. If it chooses the wrong one, it will lead to an
HTTP request to itself.
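
For illustration, the two URL forms under discussion (a sketch using the names
from Ian's example below; host and port are assumptions):

  # via the collection name: any SolrCloud node will route the update
  curl 'http://host:8983/solr/alpha/update?commit=true' \
       -H 'Content-Type: application/json' -d '[{"id":"doc1"}]'

  # via the core name: avoids the extra hop only if this core is the
  # destination shard's leader
  curl 'http://host:8983/solr/alpha_shard4_replica1/update?commit=true' \
       -H 'Content-Type: application/json' -d '[{"id":"doc1"}]'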
On 5 Nov 2014 15:33, "Shalin Shekhar Mangar"  wrote:

> There's no difference between the two. Even if you send updates to a shard
> url, it will still be forwarded to the right shard leader according to the
> hash of the id (assuming you're using the default compositeId router). Of
> course, if you happen to hit the right shard leader then it is just an
> internal forward and not an extra network hop.
>
> The advantage with using the collection name is that you can hit any
> SolrCloud node (even the ones not hosting this collection) and it will
> still work. So for a non-Java client, a load balancer can be set up in front
> of the entire cluster and things will just work.
>
> On Wed, Nov 5, 2014 at 8:50 PM, Ian Rose  wrote:
>
> > If I add some documents to a SolrCloud shard in a collection "alpha", I
> can
> > post them to "/solr/alpha/update".  However I notice that you can also
> post
> > them using the shard name, e.g. "/solr/alpha_shard4_replica1/update" - in
> > fact this is what Solr seems to do internally (like if you send documents
> > to the wrong node so Solr needs to forward them over to the leader of the
> > correct shard).
> >
> > Assuming you *do* always post your documents to the correct shard, is
> there
> > any difference between these two, performance or otherwise?
> >
> > Thanks!
> > - Ian
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


solr.xml coreRootDirectory relative to solr home

2014-11-06 Thread Andreas Hubold

Hi,

I'm trying to configure a different core discovery root directory in 
solr.xml with the coreRootDirectory setting as described in 
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml


I'd like to just set it to a subdirectory of solr home (a "cores" 
directory to avoid confusion with configsets and other directories). I 
tried


<str name="coreRootDirectory">cores</str>

but that's interpreted relative to the current working directory. Other 
paths such as sharedLib are interpreted relative to Solr Home and I had 
expected this here too.


I do not set solr home via system property but via JNDI, so I don't think 
I can use ${solr.home}/cores or something like that. It would be nice if 
solr home were available for property substitution even when set via JNDI.
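
(For comparison, a sketch of the substitution that should work when solr home
is given as a system property, e.g. when Solr is started with
-Dsolr.solr.home=/opt/solr/home:

  <solr>
    <str name="coreRootDirectory">${solr.solr.home}/cores</str>
  </solr>

With JNDI-only configuration that property is unset, which is exactly the gap
described here.)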


Is there another way to set a path relative to solr home here?

Regards,
Andreas


RE: recovery process - node with stale data elected leader

2014-11-06 Thread francois.grollier
Hi all,

Any idea on my issue below?

Thanks
Francois

-Original Message-
From: Grollier, Francois: IT (PRG) 
Sent: Tuesday, November 04, 2014 6:19 PM
To: solr-user@lucene.apache.org
Subject: recovery process - node with stale data elected leader

Hi,

I'm running SolrCloud 4.6.0 and I have a question/issue regarding the recovery 
process.

My cluster is made of 2 shards with 2 replicas each. Nodes A1 and B1 are 
leaders, A2 and B2 followers.

I start indexing docs and kill A2. I keep indexing for a while and then kill 
A1. At this point, the cluster stops serving queries as one shard is completely 
unavailable.
Then I restart A2 first, then A1. A2 gets elected leader, waits a bit for more 
replicas to be up and once it sees A1 it starts the recovery process.
My understanding of the recovery process was that at this point A2 would notice 
that A1 has a more up-to-date state and would sync with A1. It seems to 
happen like this, but then I get:

INFO  - 2014-11-04 11:50:43.068; org.apache.solr.cloud.RecoveryStrategy; Attempting to PeerSync from http://a1:8111/solr/executions/ core=executions - recoveringAfterStartup=false
INFO  - 2014-11-04 11:50:43.069; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr START replicas=[http://a1:8111/solr/executions/] nUpdates=100
INFO  - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr  Received 98 versions from a1:8111/solr/executions/
INFO  - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr  Our versions are newer. ourLowThreshold=1483859630192852992 otherHigh=1483859633446584320
INFO  - 2014-11-04 11:50:43.077; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr DONE. sync succeeded


And I end up with a different set of documents in each node (actually A1 has 
all the documents but A2 misses some).

Is my understanding wrong, and is it complete nonsense to start A2 before A1?

If my understanding is right, what could cause the desync? (I can provide more 
logs.) And is there a way to force A2 to index the missing documents? I have tried 
the FORCERECOVERY command, but it generates the same result as shown above.

Thanks
francois



grouping finds

2014-11-06 Thread Giovanni Bricconi
Sorry for the basic question

q=*:*&fq=-sku:2471834&fq=FiltroDispo:1&fq=has_image:1&rows=100&fl=descCat3,IDCat3,ranking2&group=true&group.field=IDCat3&group.sort=ranking2+desc&group.ngroups=true

returns some groups with no results. I'm using Solr 4.8.0; the collection
has 3 shards.

Am I missing some parameters?


[grouped response excerpt -- the XML tags were stripped by the mail archive;
it showed matches=297254, ngroups=49, one group whose doclist had numFound=0,
and a group entry with the values 12043, 12043, "SSD", and 498]


Re: EarlyTerminatingCollectorException

2014-11-06 Thread Dirk Högemann
https://issues.apache.org/jira/browse/SOLR-6710

2014-11-05 21:56 GMT+01:00 Mikhail Khludnev :

> I wondered too, but it seems it warms up the queryResultCache
>
> https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L522
> at least these ERRORs break nothing, see
>
> https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L165
>
> anyway, here are two usability issues:
>  - the key org.apache.solr.search.QueryResultKey@62340b01 lacks a readable
> toString()
>  - I don't think regeneration exceptions are ERRORs; they seem like WARNs to
> me, or even lower. Also, as a courtesy, EarlyTerminatingCollectorExceptions
> in particular could be recognized, and even ignored, at
> SolrIndexSearcher.java#L522
>
> Would you mind raising a ticket?
>
> On Wed, Nov 5, 2014 at 6:51 PM, Dirk Högemann  wrote:
>
> > Our production Solr slave cores (we have about 40 cores, each of a
> > moderate size, 10K to 90K documents) produce many exceptions of this type:
> >
> > 2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
> > org.apache.solr.search.SolrCache: Error during auto-warming of
> > key:org.apache.solr.search.QueryResultKey@62340b01
> > :org.apache.solr.search.EarlyTerminatingCollectorException
> >
> > Our relevant solrconfig is
> >
> > (solrconfig.xml excerpt -- the XML element names were stripped by the
> > mail archive; it contained two numeric settings, 18 and 2, whose element
> > names are missing, plus three cache definitions, each of the form
> >   <... class="solr.FastLRUCache" size="8192" initialSize="8192"
> >        autowarmCount="4096"/> )
> >
> > What exactly does the exception mean?
> > Thank you!
> >
> > -- Dirk --
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread Andreas Hubold

Hi,

it might be a silly question, but are you sure that a Solr core 
"collection1" exists? Or does it have a different name?

At least you would get a 404 if no such core exists.
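
(A quick way to check which cores actually exist -- a sketch, assuming the
host and port from the message below:

  curl 'https://myserver:9943/solr/admin/cores?action=STATUS&wt=json'
)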

Regards,
Andreas

nbosecker wrote on 11/05/2014 09:12 PM:

Hi all,

I'm working on updating legacy Solr to 4.10.2 to use schemaless
configuration. As such, I have added this snippet to solrconfig.xml per the
docs:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

I see that schema.xml is renamed to schema.xml.bak and the managed-schema file
is present on Solr restart.

My Solr Dashboard is accessible via:
https://myserver:9943/solr/#/

However, I still cannot access the schema via API - keep receiving 404 [The
requested resource (/solr/schema/fields) is not available] error:
https://myserver:9943/solr/collection1/schema/fields


What am I missing to access the schema API?

Much thanks!











Delete data from stored documents

2014-11-06 Thread yriveiro
Hi,

Is it possible to remove stored data from an index by deleting the unwanted fields
from schema.xml and then doing an optimize on the index?

Thanks,

/yago





Re: grouping finds

2014-11-06 Thread Timo Schmidt
Hi Giovanni,

AFAIK grouping is not completely working with SolrCloud. You could maybe check:

https://issues.apache.org/jira/browse/SOLR-5046

In addition, documents that should be grouped need to be in the same shard 
(you can use &router.field=IDCat3 to place all of your documents with the same 
IDCat3 in the same shard); see the sketch below.
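
A sketch of what that looks like at collection-creation time (the collection
and config names here are placeholders, not from the thread):

  curl 'http://host:8983/solr/admin/collections?action=CREATE&name=products&numShards=3&replicationFactor=2&router.field=IDCat3&collection.configName=myconf'

With router.field, documents are routed by the hash of IDCat3 instead of the
uniqueKey, so all documents sharing an IDCat3 value land in the same shard.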

Maybe somebody else can give some more insight; I am also interested in the 
topic.

Cheers

Timo



From: Giovanni Bricconi [giovanni.bricc...@banzai.it]
Sent: Thursday, 6 November 2014 11:43
To: solr-user
Subject: grouping finds
[grouped response excerpt -- the XML tags were stripped by the mail archive;
it showed matches=297254, ngroups=49, one group whose doclist had numFound=0,
and a group entry with the values 12043, 12043, "SSD", and 498]


How to dynamically create Solr cores with schema

2014-11-06 Thread Andreas Hubold

Hi,

I have a use-case where Java applications need to create Solr indexes 
dynamically. Schema fields of these indexes differ and should be defined 
by the Java application upon creation.


So I'm trying to use the Core Admin API [1] to create new cores and the 
Schema API [2] to define fields. When creating a core, I have to specify 
solrconfig.xml (with enabled ManagedIndexSchemaFactory) and the schema 
to start with. I thought it would be a good idea to use a named config set 
[3] for this purpose:


curl 'http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1&configSet=myconfig&dataDir=data'


But when I add a field to the core "m1", the field actually gets added 
to the config set. Is this a bug or a feature?


curl http://localhost:8082/solr/m1/schema/fields -X POST \
  -H 'Content-type:application/json' \
  --data-binary '[{
    "name":"foo",
    "type":"tdate",
    "stored":true
  }]'

All cores created from the config set "myconfig" will get the new field 
"foo" in their schema. So this obviously does not work to create cores 
with different schema.


I also tried to use the config/schema parameters of the CREATE core 
command (instead of config sets) to specify some existing 
solrconfig.xml/schema.xml. I tried relative paths here (e.g. some level 
upwards) but I could not get it to work. The documentation [1] tells me 
that relative paths are allowed. Should this work?
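
(For concreteness, a sketch of such a CREATE call with explicit config/schema
parameters; the relative paths here are an assumption about the intended
usage, not a verified working form:

  curl 'http://localhost:8082/solr/admin/cores?action=CREATE&name=m2&instanceDir=cores/m2&config=../shared/solrconfig.xml&schema=../shared/schema.xml&dataDir=data'
)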


The next thing that comes to my mind is to use dynamic fields instead 
of a proper managed schema, but that does not sound as nice.
Or maybe I should implement a custom CoreAdminHandler which takes a list 
of field definitions, if that's possible somehow?


I don't know. What's your recommended approach?

We're using Solr 4.10.1 non-SolrCloud. Would this be simpler or 
different with SolrCloud?


Thank you,
Andreas

[1] 
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE
[2] 
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema

[3] https://cwiki.apache.org/confluence/display/solr/Config+Sets


Re: Delete data from stored documents

2014-11-06 Thread Mikhail Khludnev
nope.

On Thu, Nov 6, 2014 at 5:19 PM, yriveiro  wrote:

> Hi,
>
> Is it possible to remove stored data from an index by deleting the unwanted fields
> from schema.xml and then doing an optimize on the index?
>
> Thanks,
>
> /yago
>
>
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: What's the most efficient way to sort by "number of terms matched"?

2014-11-06 Thread Ahmet Arslan
Hi Trey,

Not exactly the same, but we did something similar with (e)dismax's mm 
parameter, by auto-relaxing it.

In your example: 
try with mm=3; if numFound < 20, then try with mm=2, etc.
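
A sketch of the two passes (endpoint and collection name are assumptions):

  curl 'http://host:8983/solr/collection1/select?q=python+solr+hadoop&defType=edismax&mm=3&rows=20'
  # if numFound < 20, retry relaxed:
  curl 'http://host:8983/solr/collection1/select?q=python+solr+hadoop&defType=edismax&mm=2&rows=20'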

Ahmet

On Thursday, November 6, 2014 8:41 AM, Trey Grainger  wrote:



Just curious if there are some suggestions here. The use case is fairly
simple:

Given a query like python OR solr OR hadoop, I want to sort results by
"number of keywords matched" first, and by relevancy second.

I can think of ways to do this, but not efficiently. For example, I could
do:
q=python OR solr OR hadoop&
  p1=python&
  p2=solr&
  p3=hadoop&
  sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0))
desc, score desc

Other than the obvious downside that this requires me to pre-parse the
user's query, it's also somewhat inefficient to run the query function once
for each term in the original query since it is re-executing multiple
queries and looping through every document in the index during scoring.

Ideally, I would be able to do something like the below that could just
pull the count of unique matched terms from the main query (q parameter)
execution:
q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc.

I don't think anything like this exists, but would love some suggestions if
anyone else has solved this before.

Thanks,

-Trey


Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread ralph tice
I've had a bad enough experience with the default shard placement that I
create a collection with one shard, add the shards where I want them, then
use add/delete replica to move the first one to the right machine/port.

Typically this is in a SolrCloud of dozens or hundreds of shards.  Our
shards are all partitioned by time so there are big performance advantages
to optimal placement across JVMs and machines.
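
The per-replica moves look roughly like this (a sketch; collection, shard,
and node names are placeholders, and it assumes a version with the
ADDREPLICA API, i.e. 4.8+):

  # add a replica on the machine you want
  curl 'http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host2:8983_solr'
  # then drop the badly placed one
  curl 'http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node1'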

In what sort of situation do you not have trouble with default shard placement?


On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson  wrote:
> They should be pretty well distributed by default, but if you want to
> take manual control, you can use the createNodeSet param on CREATE
> (with replication factor of 1) and then ADDREPLICA with the node param
> to put replicas for shards exactly where you want.
>
> Best,
> Erick
>
> On Wed, Nov 5, 2014 at 2:12 PM, CTO직속IsabellePhan  wrote:
>> Hello,
>>
>> I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on
>> each server, so that each collection can have 2 shards with replication
>> factor of 2.
>>
>> I am using the below command from the Collections API to create the collection:
>>
>> curl 'http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config'
>>
>> Is there a way to ensure that for each shard, leader and replica are on a
>> different server?
>> This command sometimes puts them on 2 nodes from the same server.
>>
>>
>> Thanks a lot for your help,
>>
>> Isabelle


Updating an index

2014-11-06 Thread phiroc
Hello,

I have [mistakenly] created a SOLR index in which the document IDs contain URIs 
such as file:///Z:/1933/01/1933_01.png .

In a single SOLR update command, how can I:

- copy the contents of each document's id field to a new field called 'url', 
after replacing 'Z:' by 'Y:'

- make SOLR generate a new random Id for each document

Many thanks.

Philippe




Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread nbosecker
Thanks for the reply!

My Solr has 2 cores (collection1/collection2); I can access them via the Solr
dashboard with no problem.
https://myserver:9943/solr/#/collection1
https://myserver:9943/solr/#/collection2

I can also verify the solrconfig.xml for them contain the schemaless config:
https://myserver:9943/solr/collection1/admin/file?file=solrconfig.xml&contentType=text/xml;charset=utf-8

I'm perplexed, as the managed-schema file has been created and seems to be
active, yet the API continues to give 404. Is this the correct format to
access it?
https://myserver:9943/solr/collection1/schema/fields

(I've also tried other variations, removing the collection name, etc.; always
404.)





Re: create new core based on named config set using the admin page

2014-11-06 Thread Erick Erickson
Yeah, please create a JIRA. There are a couple of umbrella JIRAs that
you might want to link it to. I'm not sure it quite fits in either; if not, 
just let it hang out there on its own:

https://issues.apache.org/jira/browse/SOLR-6703
https://issues.apache.org/jira/browse/SOLR-6084

On Wed, Nov 5, 2014 at 11:57 PM, Andreas Hubold
 wrote:
> Hi,
>
> Solr 4.8 introduced named config sets with
> https://issues.apache.org/jira/browse/SOLR-4478. You can create a new core
> based on a config set with the CoreAdmin API as described in
> https://cwiki.apache.org/confluence/display/solr/Config+Sets
>
> The Solr Admin page allows the creation of new cores as well. There's an "Add
> Core" button in the "Core Admin" tab. This will open a dialog where you can
> enter name, instanceDir, dataDir and the names of solrconfig.xml /
> schema.xml. It would be cool and consistent if one could create a core based
> on a named config set here as well.
>
> I'm asking because I might have overlooked something or maybe somebody is
> already working on this. But probably I should just create a JIRA issue,
> right?
>
> Regards,
> Andreas
>
> Ramzi Alqrainy wrote on 11/05/2014 08:24 PM:
>
>> Sorry, I did not get your point; can you please elaborate more?
>>
>>
>>
>> --
>>
>
>
> --
> Andreas Hubold
> Software Architect
>
> tel +49.40.325587.519
> fax +49.40.325587.999
> andreas.hub...@coremedia.com
>
> CoreMedia AG
> content | context | conversion
>
> Ludwig-Erhard-Str. 18
> 20459 Hamburg, Germany
> www.coremedia.com
>
> Executive Board: Gerrit Kolb (CEO), Dr. Klemens Kleiminger (CFO)
> Supervisory Board: Prof. Dr. Florian Matthes (Chairman)
> Trade Register: Amtsgericht Hamburg, HR B 76277
>


Re: solr.xml coreRootDirectory relative to solr home

2014-11-06 Thread Erick Erickson
An oversight I think. If you create a patch, let me know and we can
get it committed.

Hmmm, not sure though; this'll change the current behavior that people might be
counting on.

On Thu, Nov 6, 2014 at 1:02 AM, Andreas Hubold
 wrote:
> Hi,
>
> I'm trying to configure a different core discovery root directory in
> solr.xml with the coreRootDirectory setting as described in
> https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
>
> I'd like to just set it to a subdirectory of solr home (a "cores" directory
> to avoid confusion with configsets and other directories). I tried
>
> <str name="coreRootDirectory">cores</str>
>
> but that's interpreted relative to the current working directory. Other
> paths such as sharedLib are interpreted relative to Solr Home and I had
> expected this here too.
>
> I do not set solr home via system property but via JNDI, so I don't think I
> can use ${solr.home}/cores or something like that. It would be nice if solr
> home were available for property substitution even when set via JNDI.
>
> Is there another way to set a path relative to solr home here?
>
> Regards,
> Andreas


Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread Alexandre Rafalovitch
Ok, I just booted fresh solr 4.10.2, started example-schemaless and
hit http://localhost:8983/solr/collection1/schema/fields - and it
worked.

So, I suspect the problem is not with Solr but with your setup around
it. For example, is your Solr listening on port 9943 directly (and not
8983), or do you have a proxy in between? Maybe the proxy is not
configured to forward that URL.

Do you have logs? Can you see if that URL is actually being called on
Solr's side? If you see other URLs (like generic admin stuff), but not
this one, then it may not be making it there.
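
One quick check is to call the endpoint from the Solr machine itself,
bypassing anything in front (a sketch; adjust host, port, and certificate
handling to your setup):

  curl -k 'https://localhost:9943/solr/collection1/schema/fields?wt=json'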

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 6 November 2014 13:27, nbosecker  wrote:
> Thanks for the reply!
>
> My Solr has 2 cores (collection1/collection2); I can access them via the Solr
> dashboard with no problem.
> https://myserver:9943/solr/#/collection1
> https://myserver:9943/solr/#/collection2
>
> I can also verify the solrconfig.xml for them contain the schemaless config:
> https://myserver:9943/solr/collection1/admin/file?file=solrconfig.xml&contentType=text/xml;charset=utf-8
>
> I'm perplexed, as the managed-schema file has been created and seems to be
> active, yet the API continues to give 404. Is this the correct format to
> access it?
> https://myserver:9943/solr/collection1/schema/fields
>
> (I've also tried other variations, removing the collection name, etc.; always
> 404.)
>
>
>


Re: Updating an index

2014-11-06 Thread Erick Erickson
No way that I know of; re-indexing is in order.

Solr does not "update in place"; you have to re-add the document. Well,
atomic updates work, but only if all fields are stored. And it still wouldn't
be a single Solr command.
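
For reference, a single-document atomic update looks roughly like this (a
sketch; host, port, and core name are placeholders, and it assumes all fields
are stored):

  curl 'http://host:8983/solr/mycore/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"file:///Z:/1933/01/1933_01.png",
             "url":{"set":"file:///Y:/1933/01/1933_01.png"}}]'

Note it can only add or change fields on an existing id; it cannot rewrite the
id itself, which is why a full re-index is in order here.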

Best,
Erick

On Thu, Nov 6, 2014 at 8:20 AM,   wrote:
> Hello,
>
> I have [mistakenly] created a SOLR index in which the document IDs contain 
> URIs such as file:///Z:/1933/01/1933_01.png .
>
> In a single SOLR update command, how can I:
>
> - copy the contents of each document's id field to a new field called 'url', 
> after replacing 'Z:' by 'Y:'
>
> - make SOLR generate a new random Id for each document
>
> Many thanks.
>
> Philippe
>
>


Re: What's the most efficient way to sort by "number of terms matched"?

2014-11-06 Thread Sujit Pal
Hi Trey,

In an application I built a few years ago, I had a component that rewrote the
input query into a Lucene BooleanQuery and we would set the
minimumNumberShouldMatch value for the query. Worked well, but lately we
are trying to move away from writing our own custom components since
maintaining them across releases becomes a bit of a chore.

So lately we simulate this behavior in the client by constructing
progressively smaller n-grams and OR'ing them then sending to Solr. For
your example, it becomes something like this:

(python AND solr AND hadoop) OR (python AND solr) OR (solr AND hadoop) OR
(python AND hadoop) OR (python) OR (solr) OR (hadoop).

-sujit


On Thu, Nov 6, 2014 at 7:25 AM, Ahmet Arslan 
wrote:

> Hi Trey,
>
> Not exactly the same, but we did something similar with (e)dismax's mm
> parameter, by auto-relaxing it.
>
> In your example:
> try with mm=3; if numFound < 20, then try with mm=2, etc.
>
> Ahmet
>
> On Thursday, November 6, 2014 8:41 AM, Trey Grainger 
> wrote:
>
>
>
> Just curious if there are some suggestions here. The use case is fairly
> simple:
>
> Given a query like python OR solr OR hadoop, I want to sort results by
> "number of keywords matched" first, and by relevancy second.
>
> I can think of ways to do this, but not efficiently. For example, I could
> do:
> q=python OR solr OR hadoop&
>   p1=python&
>   p2=solr&
>   p3=hadoop&
>   sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0))
> desc, score desc
>
> Other than the obvious downside that this requires me to pre-parse the
> user's query, it's also somewhat inefficient to run the query function once
> for each term in the original query since it is re-executing multiple
> queries and looping through every document in the index during scoring.
>
> Ideally, I would be able to do something like the below that could just
> pull the count of unique matched terms from the main query (q parameter)
> execution:
> q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc.
>
> I don't think anything like this exists, but would love some suggestions if
> anyone else has solved this before.
>
> Thanks,
>
> -Trey
>


Re: solr.xml coreRootDirectory relative to solr home

2014-11-06 Thread Shawn Heisey
On 11/6/2014 12:02 PM, Erick Erickson wrote:
> An oversight I think. If you create a patch, let me know and we can
> get it committed.
>
> Hmmm, not sure though, this'll change the current behavior that people might 
> be
> counting on

Relative to the solr home sounds like the best option to me.  It's what
I would expect, since most of the rest of Solr uses directories relative
to other directories that may or may not be explicitly defined.  I
haven't researched in-depth, but I think that the solr home itself is
the only thing in Solr that defaults to something relative to the
current working directory ... and that seems like a very good policy to
keep.

Thanks,
Shawn



Re: What's the most efficient way to sort by "number of terms matched"?

2014-11-06 Thread Mikhail Khludnev
Sadly, it seems this hasn't been done so far. It's either a custom similarity
or a function query.

On Thu, Nov 6, 2014 at 9:40 AM, Trey Grainger  wrote:

> Just curious if there are some suggestions here. The use case is fairly
> simple:
>
> Given a query like python OR solr OR hadoop, I want to sort results by
> "number of keywords matched" first, and by relevancy second.
>
> I can think of ways to do this, but not efficiently. For example, I could
> do:
> q=python OR solr OR hadoop&
>   p1=python&
>   p2=solr&
>   p3=hadoop&
>   sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0))
> desc, score desc
>
> Other than the obvious downside that this requires me to pre-parse the
> user's query, it's also somewhat inefficient to run the query function once
> for each term in the original query since it is re-executing multiple
> queries and looping through every document in the index during scoring.
>
> Ideally, I would be able to do something like the below that could just
> pull the count of unique matched terms from the main query (q parameter)
> execution:
> q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc.
>
> I don't think anything like this exists, but would love some suggestions if
> anyone else has solved this before.
>
> Thanks,
>
> -Trey
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Schemaless configuration using 4.10.2/API returning 404

2014-11-06 Thread nbosecker
I have some level of logging in Tomcat, and I can see that SolrDispatchFilter
is being invoked:
2014-11-06 17:23:19,016 [catalina-exec-3] DEBUG SolrDispatchFilter - Closing out SolrRequest: {}

But that really isn't terribly helpful. Is there more logging that I could
invoke to get more info from the Solr side?

Some other logs from admin-type requests look like this:
2014-11-06 17:23:16,547 [catalina-exec-7] INFO  SolrDispatchFilter - [admin] webapp=null path=/admin/info/logging params={set=com.scitegic.web.catalog:ALL&wt=json} status=0 QTime=4
2014-11-06 17:23:16,551 [catalina-exec-7] DEBUG SolrDispatchFilter - Closing out SolrRequest: {set=com.scitegic.web.catalog:ALL&wt=json}

I don't have a proxy in between.
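
If you're on the stock log4j setup that Solr 4.x ships with, raising the Solr
log level in log4j.properties should surface the request handling (a sketch):

  log4j.logger.org.apache.solr=DEBUG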





Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread CTO직속IsabellePhan
When using the Collections API CREATE action, I found that sometimes the default
shard placement is correct (leader and replica on different servers) and
sometimes not. So I was looking for a simple and reliable way to ensure
better placement.
It seems like I will have to do it manually for best control, as
recommended by Erick and you.

Thanks,

Isabelle


PS: I deleted emails from the thread history, because my reply keeps being
rejected by the Apache server as spam...


On Thu, Nov 6, 2014 at 8:13 AM, ralph tice  wrote:

> I've had a bad enough experience with the default shard placement that I
> create a collection with one shard, add the shards where I want them, then
> use add/delete replica to move the first one to the right machine/port.
>
> Typically this is in a SolrCloud of dozens or hundreds of shards.  Our
> shards are all partitioned by time so there are big performance advantages
> to optimal placement across JVMs and machines.
>
> In what sort of situation do you not have trouble with default shard
> placement?
>
>
> On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson 
> wrote:
> > They should be pretty well distributed by default, but if you want to
> > take manual control, you can use the createNodeSet param on CREATE
> > (with replication factor of 1) and then ADDREPLICA with the node param
> > to put replicas for shards exactly where you want.
> >
> > Best,
> > Erick
> >
>
>


Re: Best practice to setup schemas for documents having different structures

2014-11-06 Thread Vishal Sharma
Thanks for the response guys! Appreciate it.

Vishal Sharma
Team Lead, Grazitti Interactive
T: +1 650 641 1754
E: vish...@grazitti.com
www.grazitti.com

On Wed, Nov 5, 2014 at 11:09 PM, Ryan Cooke  wrote:

> We define all fields as wildcard fields with a suffix indicating field
> type. Then we can use something like Java annotations to map pojo variables
> to field types to append the correct suffix. This allows us to use one very
> generic schema among all of our collections and we rarely need to update
> it. Our inspiration for this method comes from the ruby library Sunspot.
>
> - Ryan
>
>
>
> ---
> Ryan Cooke
> VP of Engineering
> Docurated
> (646) 535-4595
>
> On Wed, Nov 5, 2014 at 9:59 AM, Erick Erickson 
> wrote:
>
> > It Depends (tm).
> >
> > You have a lot of options, and it all depends on your data and
> > use-case. In general, there is very little cost involved when a doc
> > does _not_ use a field you've defined in a schema. That is, if you
> > have 100's of fields defined and only use 10, the other 90 don't take
> > up space in each doc. There is some overhead with many many fields,
> > but probably not so you'd notice.
> >
> > 1> you could have a single schema that contains all your fields and
> > use it amongst a bunch of indexes (cores). This is particularly easy
> > in the new "configset" pattern.
> >
> > 2> You could have a single schema that contains all your fields and
> > use it in a single index. That index could contain all your different
> > docs with, say, a "type" field to let you search subsets easily.
> >
> > 3> You could have a different schema for each index and put all of the
> > docs in the same index.
> >
> > <1> I don't really like at all. If you're going to have different
> > indexes, I think it's far easier to maintain if there are individual
> > schemas.
> >
> > Between <2> and <3> it's a tossup. <2> will skew the relevance
> > calculations because all the terms are in a single index. So your
> > relevance calculations for students will be influenced by the terms in
> > courses docs and vice-versa. That said, you may not notice as it's
> > subtle.
> >
> > I generally prefer <3> but I've seen <2> serve as well.
> >
> > Best,
> > Erick
> >
> > On Tue, Nov 4, 2014 at 9:34 PM, Vishal Sharma 
> > wrote:
> > > This is something I have been thinking for a long time now.
> > >
> > > What is the best practice for setting up the Schemas for documents
> having
> > > different fields?
> > >
> > > Should we just create one schema with lot of fields or multiple schemas
> > for
> > > different data structures?
> > >
> > > Here is an example: I have two objects students and courses:
> > >
> > > Student:
> > >
> > >- Student Name
> > >- Student Registration number
> > >- Course Enrolled for
> > >
> > > Course:
> > >
> > >- Course ID
> > >- Course Name
> > >- Course duration
> > >
> > > What should the ideal schema setup should look like?
> > >
> > > Any guidance would is strongly appreciated.
> > >
> > >
> > >
> > > Vishal Sharma
> > > Team Lead, Grazitti Interactive
> > > T: +1 650 641 1754
> > > E: vish...@grazitti.com
> > > www.grazitti.com
> >
>
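
A minimal sketch of the wildcard-suffix convention Ryan describes (the
suffixes and field types here are assumptions, in the style of Sunspot):

  <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
  <dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="tdate"  indexed="true" stored="true"/>

A POJO field mapped as a string would then be written to Solr as, say,
name_s, an int as duration_i, and so on, letting every collection share one
generic schema.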