Re: Indexing Approach

2018-06-26 Thread solrnoobie
1. We have 5 nodes and 3 ZooKeepers (will autoscale if needed).

2. We use Java with the help of SolrJ / Spring Data for indexing.

3. We see the exception in our application, so this is probably our fault and
not Solr's. So I'm asking: what is the best approach for documents with a lot
of child documents?
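
As a sketch of what batched SolrJ indexing with child documents can look like
(the URL, the Record/Child source types, and the batch size are hypothetical,
not from this thread):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build();
    List<SolrInputDocument> batch = new ArrayList<>();
    for (Record r : records) {              // Record/records: hypothetical source data
        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", r.getId());
        for (Child c : r.getChildren()) {   // Child: hypothetical child record
            SolrInputDocument child = new SolrInputDocument();
            child.addField("id", c.getId());
            parent.addChildDocument(child); // nest the child under the parent
        }
        batch.add(parent);
        if (batch.size() >= 50) {           // flush in small batches
            client.add(batch);
            batch.clear();
        }
    }
    if (!batch.isEmpty()) client.add(batch);
    client.commit();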





Approach for Merge Database and Files

2018-06-26 Thread angeladdati
Hi:

I have two sources to index:
Database: MetadataDB1, MetadataDB2, File Url...
Files: MetadataF1, MetadataF2, File Url, Contain...

I index the database and the files. When I search, I need to search and show
the merged result: Database + Files (MetadataDB1, MetadataDB2, MetadataF1,
MetadataF2, File Url, Contain, ...).


Is it possible?

Regards!

Angel





Re: Approach for Merge Database and Files

2018-06-26 Thread Peter Gylling Jørgensen
Hi,

I would create a search alias that contains the latest versions of the
different collections.

See:
https://lucene.apache.org/solr/guide/7_3/collections-api.html#collections-api

Then you use this alias to search for results.

You get better results if you define the same schema for all collections.
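
As a sketch, creating such an alias with the Collections API might look like
this (the alias and collection names are hypothetical):

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=all-sources&collections=db_collection,files_collection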

Best Regards
Peter Gylling Jørgensen
Findability Consultant
Mail: peter.jorgen...@findwise.com
Mobile: +45 42442890




Re: Solr Default query parser

2018-06-26 Thread Jason Gerlowski
The "Standard Query Parser" _is_ the lucene query parser.  They're the
same parser.  As Shawn pointed out above, they're also the default, so
if you don't specify any defType, they will be used.  Though if you
want to be explicit and specify it anyway, the value is defType=lucene

Jason
On Mon, Jun 25, 2018 at 1:05 PM Kamal Kishore Aggarwal
 wrote:
>
> Hi Shawn,
>
> Thanks for the reply.
>
> If "lucene" is the default query parser, then how can we specify Standard
> Query Parser(QP) in the query.
>
> Dismax QP can be specified by defType=dismax and Extended Dismax Qp by
> defType=edismax, how about for declaration of Standard QP.
>
> Regards
> Kamal
>
> On Wed, Jun 6, 2018 at 9:41 PM, Shawn Heisey  wrote:
>
> > On 6/6/2018 9:52 AM, Kamal Kishore Aggarwal wrote:
> > >> What is the default query parser (QP) for solr.
> > >>
> > >> While I was reading about this, I came across two links which look
> > >> ambiguous to me. It's not clear to me whether Standard is the default
> > >> QP, or Lucene is the default QP, or they are the same. Below are the
> > >> screenshot and links which are confusing me.
> >
> > The default query parser in Solr has the name "lucene".  This query
> > parser, which is part of Solr, deals with Lucene query syntax.
> >
> > The most recent documentation states this clearly right after the table
> > of contents:
> >
> > https://lucene.apache.org/solr/guide/7_3/the-standard-query-parser.html
> >
> > It is highly unlikely that the 6.6 documentation will receive any
> > changes, unless serious errors are found in it.  The omission of this
> > piece of information will not be seen as a serious error.
> >
> > Thanks,
> > Shawn
> >
> >


Re: Indexing Approach

2018-06-26 Thread Shawn Heisey

On 6/26/2018 12:06 AM, solrnoobie wrote:

We are having errors such as heap space errors in our indexing, so we decided
to lower the batch size to 50. The problem with this is that sometimes it
really does not help, since one document can contain 1000 child documents; it
will still hit heap errors, and indexing is generally slow every time.


If you're seeing errors in a Java program related to heap space, you 
have two choices: Reduce the memory requirements of the application, or 
increase its heap size.  Sounds like you need to increase the heap 
size.  Determining the optimal heap size usually requires experimentation.
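
Since the error is in the indexing application here, that means the client
JVM's heap; a hypothetical launch might look like:

    java -Xmx4g -jar my-indexer.jar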


Thanks,
Shawn



Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Erick,

Though I had seen this article in several places, I never went through it
seriously.

Don't you think the method below is very expensive?

autoParser.parse(input, textHandler, metadata, context);


If the document size is bigger, then it will need enough memory to hold the
document (i.e. the ContentHandler).
Any other alternative?

Regards
Neo





Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread neotorand
Thanks Shawn,

Yes, I agree ERH is never suggested in production.
I am writing my own custom indexer.
Any pointers on this?

What exactly I am looking for is a custom indexing program that compiles
precisely the information I need and sends it to Solr.
On the other hand, I see the method below is very expensive if the document
size is large.

autoParser.parse(input, textHandler, metadata, context);

Because the ContentHandler would hold the entire contents in memory.
Any suggestions?

Regards
Neo





Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Florian Fankhauser
Hello,
given the following document structure (books as parent, libraries having these 
books as children) in a Solr 7.3.1 server:




book (doc_type_s: book)
  id: 1000
  title: Mr. Mercedes
  author: Stephen King
  library (doc_type_s: library, child document)
    id: 1000/100
    acquisition_date_i: 20160810
    location: Innsbruck
  library (doc_type_s: library, child document)
    id: 1000/101
    acquisition_date_i: 20180103
    location: Hall

book (doc_type_s: book)
  id: 1001
  title: Noah
  author: Sebastian Fitzek
  library (doc_type_s: library, child document)
    id: 1001/100
    acquisition_date_i: 20170810
    location: Innsbruck






Now I want to query with a filter-query on the acquisition_date_i field of the
child documents and get parent documents as a result:

fq={!parent which=doc_type_s:book} acquisition_date_i:20180626

This works as expected.


Now for some reason I want to exclude the above filter-query from a 
facet-query. Therefore I need to add a tag to the filter-query:

q={!tag=datefilter}{!parent which=doc_type_s:book} acquisition_date_i:20180626 


And now the error occurs: just by adding "{!tag=datefilter}" to the query,
the result now contains child documents.
It seems like the "{!parent which=doc_type_s:book}" no longer works as
soon as I add "{!tag=datefilter}" before it.

When I put the "{!tag=datefilter}" after "{!parent which=doc_type_s:book}",
the result is correct again (it contains parent documents), but then the
exclusion in the facet-query no longer works.

So the question is: how can I apply a parent- and a tag-filter to a 
filter-query?


Thanks for your help.

Florian



Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread Shawn Heisey

On 6/26/2018 7:13 AM, neotorand wrote:

Don't you think the method below is very expensive?

autoParser.parse(input, textHandler, metadata, context);

If the document size is bigger, then it will need enough memory to hold the
document (i.e. the ContentHandler).
Any other alternative?


I did find this:

https://stackoverflow.com/questions/25043720/using-poi-or-tika-to-extract-text-stream-to-stream-without-loading-the-entire-f

But I have no actual experience with Tika.  If you want to get a 
definitive answer, you will need to go to a Tika support resource.  
Although Solr does incorporate Tika, we are not experts in its use.


Thanks,
Shawn



Re: Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Shawn Heisey

On 6/26/2018 7:22 AM, Florian Fankhauser wrote:

Now for some reason I want to exclude the above filter-query from a 
facet-query. Therefore I need to add a tag to the filter-query:

q={!tag=datefilter}{!parent which=doc_type_s:book} acquisition_date_i:20180626 



According to the documentation:

https://lucene.apache.org/solr/guide/6_6/local-parameters-in-queries.html#LocalParametersinQueries-BasicSyntaxofLocalParameters

You can't specify multiple localparams like that - it says "You may 
specify only one local parameters prefix per argument."


I have no idea whether this is going to work, but here's what I would try:

{!parent which=doc_type_s:book tag=datefilter}

If this is not the proper syntax, I hope somebody who actually knows 
what's possible can speak up and help out.
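
If that syntax works, the full request might combine it with the tag
exclusion like this (the faceted field location_s is hypothetical):

    fq={!parent which=doc_type_s:book tag=datefilter}acquisition_date_i:20180626
    facet=true
    facet.field={!ex=datefilter}location_s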


Thanks,
Shawn



Re: Indexing part of Binary Documents and not the entire contents

2018-06-26 Thread Erick Erickson
Well, if you were using ERH you'd have the same problem, as it uses
Tika. At least if you run Tika on some client somewhere, then when a
document blows out memory or has some other problem, your client can
crash without taking Solr down with it.

That's one of the reasons, in fact, that we don't recommend running ERH in prod.

And I should point out that this is not a flaw in Tika. Rather the
problem Tika has to cope with is immense.

And even a cursory look at Tika shows a streaming interface, see:
https://tika.apache.org/1.8/examples.html#Streaming_the_plain_text_in_chunks
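
One streaming approach, as a sketch only (not necessarily the exact code
behind that link): Tika's ParsingReader parses on a background thread and
exposes the extracted text as a character stream, where "input" is assumed
to be an InputStream for the document and the chunk size is arbitrary:

    import java.io.InputStream;
    import java.io.Reader;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.ParsingReader;

    try (Reader reader = new ParsingReader(new AutoDetectParser(), input,
                                           new Metadata(), new ParseContext())) {
        char[] chunk = new char[8192];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            // index or accumulate each chunk; the full text is never
            // buffered in a single ContentHandler
        }
    }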

Best,
Erick



Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
From your problem description, it looks like you want to gather the
data from the DB and filesystem and combine them into a Solr document
at index time, then index that document.

Put enough information in Solr to fetch the document as necessary;
often people don't put the entire file in Solr, especially if it's,
say, a PDF or Word file.

Best,
Erick



Re: Total Collection Size in Solr 7

2018-06-26 Thread Erick Erickson
Some work is being done on the admin UI, there are several JIRAs.
Perhaps you'd like to join that conversation? We need to have input,
especially in terms of what kinds of information would be useful from
a practitioner's standpoint.

Best,
Erick

On Mon, Jun 25, 2018 at 11:26 PM, Aroop Ganguly  wrote:
> I see, Thanks Susmit.
> I hoped there was something simpler, something that could just be part of the
> collections view we now have in the Solr 7 admin UI, or at least a one-stop
> API call.
> I guess this will be added in a later release.
>
>> On Jun 25, 2018, at 11:20 PM, Susmit  wrote:
>>
>> Hi Aroop,
>> I created a utility using the SolrZkClient API to read state.json,
>> enumerated (one) replica for each shard, used the /replication handler for
>> the size, and added them up.
>>
>> Sent from my iPhone
>>
>>> On Jun 25, 2018, at 7:24 PM, Aroop Ganguly  wrote:
>>>
>>> Hi Team
>>>
>>> I am not sure how to ascertain the total size of a collection via the Solr
>>> UI on a Solr 7+ installation.
>>> The collection is sharded and replicated heavily, so it's tedious to have
>>> to look at each core and figure out the size of the entire collection in
>>> an additive way.
>>>
>>> Is there an api or ui section from where this info can be obtained ?
>>>
>>> On the flip side, it would be great to have a consolidated view of the 
>>> collection size in GBs along with the individual shard sizes. (Should this 
>>> be a Jira :) ?)
>>>
>>> Thanks
>>> Aroop
>


Re: Approach for Merge Database and Files

2018-06-26 Thread Angel Addati
Thanks to you both.

*"From your problem description, it looks like you want to gather the data
from the DB and filesystem and combine them into a Solr document at index
time, then index that document." *

Exactly. I don't know if the best approach is to combine at index time or at
query time, but I need to search and show results for the combined items. I'm
investigating the alias suggestion. Do you think it solves the problem, or do
you know another approach?

PS: I need to include both the information in the file and the information in
the database, because each has important content and metadata.
Regards...

* - - -*
*Angel** Adrián Addati*


>


Configuring load balancer for Kerberised Solr cluster

2018-06-26 Thread mosheB
We are trying to enable an authentication mechanism in our Solr cluster using
the Kerberos authentication plugin. We use Active Directory as our KDC, each
Solr node has its own SPN in the form of HTTP/<host>@<REALM>, and things are
working as expected.
Things are getting complicated while trying to configure our load balancer,
as there is no specific SPN to request a ticket for from the KDC (the
balancer routes to multiple SPNs...).
As a solution we thought to add the balancer's principal to each of the Solr
nodes (and to the keytab files, of course), as follows:

-Dsolr.kerberos.principal=HTTP/solr_host.our.domain@OUR.REALM,HTTP/balancer_host.our.domain@OUR.REALM

But it seems impossible to configure Solr with more than one SPN.
Is there any other workaround?








Re: Indexing Approach

2018-06-26 Thread solrnoobie
Thanks for the tip.

We have increased our application's heap to 4 GB, but it is still not enough.

I guess here are the things we think we did wrong:

- Each SP call will return 15 result sets.
- Each document can contain 300-1000 child documents.
- If the batch size is 1000, the child documents for each can contain
300-1000 documents, so that will eat up the 4 GB allocated to the
application.

Granted that these are the things we did wrong, what strategies can anyone
suggest so that our indexing will not fail if the child documents grow
unexpectedly?

Should this even happen (main documents containing a ton of child
documents)?





Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
bq.  I don't know if the best approach is combine in index time or in query time

It Depends (tm). What is your goal? Let's say you have db_f1 and fm_f2
(db == from the database and fm = file data).

If you want to form a Solr query like

db_f1:something fm_f2:something_else

you don't have much choice, you've got to do it at index time or your
search time will be horrible.

OTOH, if you want search, say, _only_ on the db_* data or _only_ on
the file data and enrich the results returned to the user with data
from the other source, that's perfectly reasonable, although you
should really do some prototyping to see if it meets your SLA. This
presupposes that you're only returning a few rows. For example, use
Solr to get the top 10 docs based on file data and have your app layer
reach out to the DB to enrich just those 10 docs.

In general, you should always consider doing as much pre-processing at
index time as you can on the theory that what you want is fast
searches and you'll search over a doc many, many more times than you
index it.
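
As an illustration of the index-time option, the combined document could be
sent to Solr's /update handler as JSON (field names borrowed from the
db_f1/fm_f2 example above; the values are hypothetical):

    {
      "id": "42",
      "db_f1": "value pulled from the database",
      "fm_f2": "text extracted from the file"
    }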

Best,
Erick


>>


AW: Adding tag to fq makes query return child docs instead of parent docs

2018-06-26 Thread Florian Fankhauser
Hi Shawn,
your answer was the solution to the problem!
It worked with the syntax in your example.

Thank you very much, great support!

Florian







Re: Indexing Approach

2018-06-26 Thread Shawn Heisey
On 6/26/2018 8:24 AM, solrnoobie wrote:
> - Each SP call will return 15 result sets.
> - Each document can contain 300-1000 child documents.
> - If the batch size is 1000, the child documents for each can contain
> 300-1000 documents so that will eat up the 4g's allocated to the
> application.

If that's what your indexing is like, then you either need to drop the
batch size or increase the heap even further.

If each main document in the batch has 300 to 1000 child documents, then
a batch of 1000 main documents is actually a batch of 300 thousand to
one million documents, not 1000. I will generally target an indexing batch
size between 500 and 1000, which means you may need to actually handle
batches of about 10 main documents, not 1000.

> Should this even happen (main documents containing a ton of child
> documents)?

That's a question I can't answer.  If your use case requires it, then
you'll need to do it that way.  I've got no idea how that affects
performance, except to say that the more total documents you have, the
slower things will run.
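
One way to implement that sizing, as a rough SolrJ sketch (the threshold and
variable names are illustrative; "client" is an already-built SolrClient and
"parents" a hypothetical source list): count parents plus children and flush
once the total nears the target.

    List<SolrInputDocument> batch = new ArrayList<>();
    int totalDocs = 0;
    for (SolrInputDocument parent : parents) {
        List<SolrInputDocument> children = parent.getChildDocuments();
        totalDocs += 1 + (children == null ? 0 : children.size());
        batch.add(parent);
        if (totalDocs >= 1000) {   // target total batch size, parents + children
            client.add(batch);
            batch.clear();
            totalDocs = 0;
        }
    }
    if (!batch.isEmpty()) client.add(batch);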

Thanks,
Shawn



Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick

Sure, I will look those JIRAs up.
In the interim, is what Susmit suggested the only way to get the size info? Or
is there something else you can recommend?

Thanks
Aroop





Create an index field of type dictionary

2018-06-26 Thread Ritesh Kumar (Avanade)
Hello,

Is it possible to create an index field of type dictionary? I have seen
stringarray, datetime, bool, etc., but I am looking for a field type like a
list of objects.

Thanks


Ritesh

Avanade Infrastructure Team

+1 (425) 588-7853 v-kur...@micrsoft.com

One Commercial Partner - Digital Services





Re: Total Collection Size in Solr 7

2018-06-26 Thread Erick Erickson
Aroop:

Not that I know of. You could do a reasonable approximation by
1> check the index size (manually) with, say, 10M docs
2> check it again with 20M docs
3> use a match all docs query and do the math.

That's clumsy but do-able. The reason I start with 10M and 20M is that
index size does not go up linearly so I like to seed the index first.

That said, though, it's hard to generalize index size as meaning much.
Is it 90% stored? 10% stored data? Those ratios have huge implications
on whether you're straining anything except disk space.

There are a lot of metrics available, starting with Solr 6.4, that give you a
much better view of Solr's health.
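
For example, on recent releases the Metrics API can report per-core index
size (host/port hypothetical; the metric name below is as exposed in recent
Solr 7.x):

    curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=INDEX.sizeInBytes'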

Best,
Erick



Re: Create an index field of type dictionary

2018-06-26 Thread Erick Erickson
Well, there's a multiValued field that's just a list of whatever (string,
date, numeric, etc).

What's the use-case? This feels like an "XY" problem. A "dictionary" type
is usually some kind of structure that you want to have operate in a
specific manner. Solr doesn't really deal at that level; it just searches
tokens...
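
For example, a multiValued field declaration in the schema looks like this
(the field name is hypothetical):

    <field name="labels" type="string" indexed="true" stored="true" multiValued="true"/>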

Best,
Erick



Re: Total Collection Size in Solr 7

2018-06-26 Thread Aroop Ganguly
Hi Erick,

Thanks for the advice.
One open question still, about your point 1: how do I get that magic number of
size in GBs :) ?
As I am mostly using streaming expressions, most of my fields are docValues and
not stored.

I will look at the health endpoint to see what it gives me in connection with 
size.

Thanks
Aroop





RE: Create an index field of type dictionary

2018-06-26 Thread Ritesh Kumar (Avanade)
Hey Erick,

Thanks for the response; it was a Sitecore-related modification we had to do
to make it work.

Thanks
Ritesh




Linux command to print top slow performing query (/get) from solr logs

2018-06-26 Thread Ganesh Sethuraman
Is there a way, using Linux commands, to print the top slow-performing
queries from Solr 7 logs (/get handler or /select handler)? Reverse-sorted
order across log files would be very useful and handy for troubleshooting.
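
A rough sketch with standard tools, assuming the default Solr log format
where request lines contain path=/select or path=/get and end with
QTime=<millis> (adjust log paths and patterns to your setup):

    grep -h 'path=/select\|path=/get' /var/solr/logs/solr.log* \
      | awk -F'QTime=' '{print $2+0 "\t" $0}' \
      | sort -rn \
      | head -20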

Regards
Ganesh


Change/Override Solrconfig.xml across collections

2018-06-26 Thread Ganesh Sethuraman
I would like to implement the slow query logging feature (
https://lucene.apache.org/solr/guide/6_6/configuring-logging.html#ConfiguringLogging-LoggingSlowQueries)
across multiple collections without changing solrconfig.xml in each and
every collection. Is that possible? I am using Solr 7.2.1.

If this is not possible, is it possible to update only the solrconfig.xml
in ZooKeeper for each collection, without the schema update? I have both the
schema and solrconfig.xml in the same directory.
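
For reference, the setting from that guide page is a per-collection
solrconfig.xml entry in the <query> section (threshold in milliseconds):

    <query>
      <slowQueryThresholdMillis>1000</slowQueryThresholdMillis>
    </query>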

Regards
Ganesh