Bug in Solr 6 dynamic-fields?

2016-05-04 Thread Tech Id
Hi,

We are unable to resolve a problem with dynamic fields in Solr 6.
The question and details can be found on stack-overflow at
http://stackoverflow.com/questions/37014345/unable-to-add-new-dynamic-fields-in-solr-6-0/37018450#37018450

If it's a real bug, we can file a JIRA for it.

Appreciate any help!
Thanks
TiD


Re: admin/metrics API or read JMX by jolokia?

2017-06-26 Thread Tech Id
Yes, this is really good to know.

With Jolokia, the output is a bit difficult to parse because of special
characters in the mbean names, e.g.
http://localhost:17330/jolokia/read/solr!/my!-collection_shard1_replica2:*

The slashes, escape characters, etc. make parsing and querying messy.
And if we know for sure that Jolokia-based metrics perform worse anyway,
then we can just use the metrics API going forward.
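If I understand Jolokia's escaping correctly, '!' is Jolokia's escape character in GET paths ('!/' stands for '/', '!!' for '!'), so the names can be un-escaped mechanically before further parsing. A rough sketch of such a helper in Java (my own illustration, not part of any Jolokia client library):

```java
public class JolokiaPath {

    // Unescape one Jolokia GET path segment: '!' escapes the next
    // character, so "!X" stands for the literal character X
    // (e.g. "!/" -> "/", "!!" -> "!").
    static String unescape(String segment) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c == '!' && i + 1 < segment.length()) {
                out.append(segment.charAt(++i)); // take escaped char literally
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The mbean path segment from the URL above
        System.out.println(unescape("solr!/my!-collection_shard1_replica2"));
        // -> solr/my-collection_shard1_replica2
    }
}
```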

Thanks
TI





On Sun, Jun 25, 2017 at 6:29 PM, S G  wrote:

> Hi,
>
> The API admin/metrics
>  MetricsReporting-MetricsAPI>
> in
> 6.x version of Solr seems to be very good.
> Is it performance friendly as well?
>
> We want to use this API to query the metrics every minute or so from all
> Solr nodes and push to grafana.
> How does this compare with the performance overhead of reading JMX metrics
> via Jolokia?
>
> Rest API is surely easier to understand and parse.
> However it involves making a REST call that will pass through Jetty and
> probably take up a thread for each request, etc.
> Is Jolokia lighter-weight in this respect?
>
> Some recommendation on this would be great.
>
> Thanks
> SG
>


Solr staying constant on popularity indexes

2017-10-09 Thread Tech Id
Hi,

So I was a bit frustrated the other day when, all of a sudden, my Solr nodes
started going into recovery.
Everything became normal after a rolling restart, but when I looked at the
logs, I was surprised to see --- nothing!
The Solr UI gave me no information during recovery.
The Solr logs gave me no information as to what really happened.

And though I have not had the time to use Elastic-Search yet, a couple of
friends have recommended it highly.

Here is a graph that shows a 30% gain of ES over Solr in less than 2 years:

Reference: https://db-engines.com/en/ranking_trend/search+engine

Being a long term Solr user, I tried to do a little comparison myself and
actually found some interesting features in ES.

1. No ZooKeeper - I have burnt my fingers on some ZooKeeper issues in the
past, and it is no fun to deal with. Kafka and Storm are also trying to
reduce their dependence on ZooKeeper because ZK cannot handle heavy traffic.
2. REST APIs - this is a big wow compared to the complicated syntax Solr
uses. I think the V2 APIs are coming to address this, but they did arrive a
bit late in the game.
3. Client nodes - no equivalent in Solr. All nodes do scatter-gather in
Solr, which adds scalability problems.
4. Much better logs in ES.
5. Cluster-level stats, hot-threads, and similar APIs make monitoring easy.

So I just wanted to discuss some of these important points about ES vs Solr.

At the very least, we should try to improve our logs.
When a node is behaving badly, Solr gives absolutely no information about
why it is behaving that way.
In the same debugging spirit, the Solr UI could also be improved to show the
number of cores per node, the total number of down/recovering nodes, the
memory/CPU/disk used by each node, etc., which would make the engineer's job
a bit easier.


Cheers,
TI


Re: Confusing DocValues documentation

2017-12-22 Thread Tech Id
Very interesting discussion SG and Erick.
I wish these details were part of the official Solr documentation as well.
And yes, "columnar format" did not give any useful information to me either.


"A good explanation increases contributions to the project as more people
become empowered to improvise."
   - Self, LOL


I was expecting sorting, faceting, and pivoting to be a bit more optimized
for docValues, something like a pre-calculated bit of information.
However, it now seems that the major benefit of docValues is to optimize
the lookup time of stored fields.
Here is the sorting function I wrote as pseudo-code from the discussion:


int[] docIDs = filterDocsOnQuery(query);
T[] docValues = loadDocValues(sortField);
// TreeMap keeps entries ordered by key (the field value); a list per key
// is needed because several docs can share the same value
TreeMap<T, List<Integer>> sortedByValue = new TreeMap<>();
for (int docId : docIDs) {
    T val = docValues[docId];
    sortedByValue.computeIfAbsent(val, k -> new ArrayList<>()).add(docId);
}
// return docIDs sorted by field value
return sortedByValue.values();


It is indeed difficult to pre-compute the sorts and facets because we do
not know what docIDs will be returned by the filtering.
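My mental model of facet counting with docValues is similar to the sorting sketch: walk the filtered doc IDs and tally each document's value from the docValues array. A rough illustration in Java (my own sketch of the idea; Lucene actually works with ordinals rather than raw values):

```java
import java.util.HashMap;
import java.util.Map;

public class FacetSketch {

    // docValues: array indexed by Lucene doc ID, holding the facet
    // field's value for each document (null = no value for that doc).
    static Map<String, Integer> facetCounts(int[] docIDs, String[] docValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (int docId : docIDs) {
            String val = docValues[docId];          // O(1) lookup by doc ID
            if (val != null) {
                counts.merge(val, 1, Integer::sum); // tally the bucket
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] dv = {"red", "blue", "red", null};
        // All four docs matched the filter
        System.out.println(facetCounts(new int[]{0, 1, 2, 3}, dv));
    }
}
```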

Two last questions I have are:
1) If docValues are that good, can we get rid of stored values altogether?
2) And why are docValues not enabled by default for multi-valued fields?


-T




On Thu, Dec 21, 2017 at 9:02 PM, Erick Erickson 
wrote:

> OK, last bit of the tutorial.
>
> bq: But that does not seem to be helping with sorting or faceting of any
> kind.
> This seems to be like a good way to speed up a stored field's retrieval.
>
> These are the same thing. I have two docs. I have to know how they
> sort. Therefore I need the value in the sort field for each. This is the
> same thing as getting the stored value, no?
>
> As for facets it's the same problem. To count facet buckets I have to
> find the values for the field for each document in the results list
> and tally them. This is also getting the stored value, right? You're
> asking "for the docs in my result set, how many of them have val1, how
> many have val2, how many have val54, etc."
>
> And as an aside the docValues can also be used to return the stored value.
>
> Best,
> Erick
>
> On Thu, Dec 21, 2017 at 8:23 PM, S G  wrote:
> > Thank you Eric.
> >
> > I guess the biggest piece I was missing was the sort on a field other
> than
> > the search field.
> > Once you have filtered a list of documents and then you want to sort, the
> > inverted index cannot be used for lookup.
> > You just have doc-IDs which are values in inverted index, not the keys.
> > Hence they cannot be "looked" up - only option is to loop through all the
> > entries of that key's inverted index.
> >
> > DocValues come to rescue by reducing that looping operation to a lookup
> > again.
> > Because in docValues, the key (i.e. array-index) is the document-index
> and
> > gives an O(1) lookup for any doc-ID.
> >
> >
> > But that does not seem to be helping with sorting or faceting of any
> kind.
> > This seems to be like a good way to speed up a stored field's retrieval.
> >
> > DocValues in the current example are:
> > FieldA
> > doc1 = 1
> > doc2 = 2
> > doc3 =
> >
> > FieldB
> > doc1 = 2
> > doc2 = 4
> > doc3 = 5
> >
> > FieldC
> > doc1 = 5
> > doc2 =
> > doc3 = 5
> >
> > So if I have to run a query:
> > fieldA=*&sort=fieldB asc
> > I will get all the documents due to filter and then I will lookup the
> > values of field-B from the docValues lookup.
> > That will give me 2,4,5
> > This is sorted in this case, but assume that this was not sorted.
> > (The docValues array is indexed by Lucene's doc-ID not the field-value
> > after all, right?)
> >
> > Then does Lucene/Solr still sort them like regular array of values?
> > That does not seem very efficient.
> > And it does not seem to be helping with faceting or pivoting either.
> > What did I miss?
> >
> > Thanks
> > SG
> >
> >
> >
> >
> >
> >
> > On Thu, Dec 21, 2017 at 5:31 PM, Erick Erickson  >
> > wrote:
> >
> >> Here's where you're going off the rails: "I can just look at the
> >> map-for-field-A"
> >>
> >> As I said before, you're totally right, all the information you need
> >> is there. But
> >> you're thinking of this as though speed weren't a premium when you say
> >> "I can just look". Consider that there are single replicas out there
> with
> >> 300M
> >> (or more) docs in them. "Just looking" in a list 300M items long 300M
> times
> >> (q=*:*&sort=whatever) is simply not going to be performant compared to
> >> 300M indexing operations which is what DV does.
> >>
> >> Faceting is much worse.
> >>
> >> Plus space is also at a premium. Java takes 40+ bytes to store the first
> >> character. So any Java structure you use is going to be enormous. 300M
> ints
> >> is bad enough. And if you spoof this by using ordinals as Lucene does,
> >> you're
> >> well on your way to reinventing docValues.
> >>
> >> Maybe this will help. Imagine you have a phone book in your hands. It
> >> consists of documents like this:
> >>
> >> id: something
> >> phone: phone number
> >> name: perso

Re: Confusing DocValues documentation

2017-12-22 Thread Tech Id
Thanks Emir,

It seems that stored="false" docValues="true" is the default in Solr's
GitHub repo and the recommended way to go.


grep "docValues=\"true\""
./server/solr/configsets/_default/conf/managed-schema

  Point fields don't support FieldCache, so they must have
docValues="true" if needed for sorting, faceting, functions, etc.
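(The grep output did not survive in this archive. For reference, the point-type definitions in a 7.x _default managed-schema look roughly like the following, paraphrased from memory, so exact names and attributes may differ:)

```xml
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
```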

So I assume the default for all the basic field types (single- and
multi-valued) is docValues="true" and stored="false".
But I do not get why the "id" field and the dynamic fields have
stored="true" in Solr 7:



grep "stored=\"true\""
./server/solr/configsets/_default/conf/managed-schema | grep -v "\*_txt_"

That is perhaps a bug?



Booleans seem to care neither about stored nor docValues:


grep -i boolean ./server/solr/configsets/_default/conf/managed-schema

-T



On Fri, Dec 22, 2017 at 11:20 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Your questions are already more or less answered:
> > 1) If the docValues are that good, can we git rid of the stored values
> > altogether?
> You can if you want - just configure your field with stored=“false” and
> docValues=“true”. Note that you can do that only if:
> * field is not analyzed (you cannot enable docValues for analyzed field)
> * you do not care about order of your values
>
> > 2) And why the docValues are not enabled by default for multi-valued
> fields?
> Because it is overhead when it comes to indexing and it is not used in all
> cases - only if field is used for faceting, sorting or in functions.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 22 Dec 2017, at 19:51, Tech Id  wrote:
> >
> > Very interesting discussion SG and Erick.
> > I wish these details were part of the official Solr documentation as
> well.
> > And yes, "columnar format" did not give any useful information to me
> either.
> >
> >
> > "A good explanation increases contributions to the project as more people
> > become empowered to improvise."
> >   - Self, LOL
> >
> >
> > I was expecting the sorting, faceting, pivoting to a bit more optimized
> for
> > docValues, something like a pre-calculated bit of information.
> > However, now it seems that the major benefit of docValues is to optimize
> > the lookup time of stored fields.
> > Here is the sorting function I wrote as pseudo-code from the discussion:
> >
> >
> > int docIDs[] = filterDocsOnQuery (query);
> > T docValues[] = loadDocValues (sortField);
> > TreeMap sortFieldValues[] = new TreeMap<>();
> > for (int docId : docIDs) {
> >T val = docValues[docId];
> >sortFieldValues.put(val, docId);
> > }
> > // return docIDs sorted by value
> > return sortFieldValues.values;
> >
> >
> > It is indeed difficult to pre-compute the sorts and facets because we do
> > not know what docIDs will be returned by the filtering.
> >
> > Two last questions I have are:
> > 1) If the docValues are that good, can we git rid of the stored values
> > altogether?
> > 2) And why the docValues are not enabled by default for multi-valued
> fields?
> >
> >
> > -T
> >
> >
> >
> >
> > On Thu, Dec 21, 2017 at 9:02 PM, Erick Erickson  >
> > wrote:
> >
> >> OK, last bit of the tutorial.
> >>
> >> bq: But that does not seem to be helping with sorting or faceting of any
> >> kind.
> >> This seems to be like a good way to speed up a stored field's retrieval.
> >>
> >> These are the same thing. I have two docs. I have to know how they
> >> sort. Therefore I need the value in the sort field for each. This the
> >> same thing as getting the stored value, no?
> >>
> >> As for facets it's the same problem. To count facet buckets I have to
> >> find the values for the  field for each document in the results list
> >> and tally them. This is also getting the stored value, right? You're
> >> asking "for the docs in my result set, how many of them have val1, how
> >> many have val2, how many have val54 etc.
> >>

Is DataImportHandler ready for production-usage?

2018-01-03 Thread Tech Id
Hi,

I stumbled across https://wiki.apache.org/solr/DataImportHandler and found
that it matches my needs exactly.
So I just wanted to confirm that it is an actively supported plugin before I
start using it in production.
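To make it concrete, the kind of DIH config I have in mind is something like the following data-config.xml (illustrative only; the JDBC URL, table, and column names here are made up):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- Full import runs "query"; delta-import uses deltaQuery to
         find rows changed since the last run. -->
    <entity name="item"
            query="SELECT id, name, price FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE updated_at &gt; '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="price" name="price"/>
    </entity>
  </document>
</dataConfig>
```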

Are there any users who have had a good or a bad experience with DIH ?

Thanks
TI


Should zookeeper be run on the worker machines?

2016-11-23 Thread Tech Id
Hi,

Can someone please respond to this zookeeper-for-Solr Stack-Overflow
question: http://stackoverflow.com/questions/40755137/should-zookeeper-be-run-on-the-worker-machines

Thanks
TI


Example of join using Solr/Lucene

2013-11-05 Thread Tech Id
Hi,

I have been searching for an example of joins using solr/lucene.
But I have not found anything either on the net or in the src/examples.

Can someone please point me to the same?
Ideally, I need a join working with the SolrJ APIs (please let me know if
this group is Lucene-specific).


Best Regards


Re: Example of join using Solr/Lucene

2013-11-05 Thread Tech Id
I think Solr has the ability to do joins in the latest version as verified
on this issue: https://issues.apache.org/jira/browse/SOLR-3076

And some online resources point to this example:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
However, I am not sure if the above example is a complete one.
It leaves a lot for a new Solr user to guess about how to customize the
schema and how to index two documents into Solr before doing a join.



On Tue, Nov 5, 2013 at 9:31 AM, Tech Id  wrote:

> Hi,
>
> I have been searching for an example of joins using solr/lucene.
> But I have not found anything either on the net or in the src/examples.
>
> Can someone please point me to the same?
> Ideally, I need a join working with Solrj APIs (Please let me know if this
> group is Lucene-specific).
>
>
> Best Regards
>
>


Re: Example of join using Solr/Lucene

2013-11-05 Thread Tech Id
Hi Alvaro,

Could you please point me to some link showing how to index two documents
separately (joined by foreign keys)?
Or, if you can oblige, put down some details here itself.

*For example*, say if user has entities like :
  car  {id:5, color:red, year:2004, companyId:23, ownerId: 57},
  company {id:23, name: toyota, numEmployees:1000, established:1980},
  owner {id: 57, name: John, age: 50, profession: doctor, spouseId: 78,
carOwnedId: 5},
  owner {id: 78, name: Maria, age: 45, profession: doctor, spouseId: 57,
carOwnedId: 55}
  etc.
1) How can the above entities be put into Solr with their foreign keys?
2) Do we need to flatten them absolutely?
3) How are cyclic joins handled in flattening?

Some good link on how a join query can be actually run would also be
appreciated.
(I have some links on the reading part, but a complete example would be
good).
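To make the question concrete, here is my current guess at how the car/company part would be flattened and joined inside a single collection with a "type" discriminator field (please correct me if this is wrong):

```
Flattened documents, one per entity:
  {id:"car-5",      type:"car",     color:"red", year:2004, companyId:"company-23"}
  {id:"company-23", type:"company", name:"toyota", established:1980}

Query: "cars made by toyota"
  q={!join from=id to=companyId}type:company AND name:toyota
```

The subquery would match the company documents; the join would collect their "id" values and return the documents whose "companyId" contains one of them.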

Thanks



On Tue, Nov 5, 2013 at 10:53 AM, Alvaro Cabrerizo wrote:

> In my case, every time I've used joins, the FROM field was a multivalued
> string and the TO was a single-valued string.
>
> Regards.
> El 05/11/2013 18:37, "Tech Id"  escribió:
>
> > I think Solr has the ability to do joins in the latest version as
> verified
> > on this issue: https://issues.apache.org/jira/browse/SOLR-3076
> >
> > And some online resources point to this example:
> >
> >
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
> > However, I am not sure if the above example is a complete one.
> > It leaves a lot for a fresh solr-user to guess about how to customize the
> > schema and how to index two documents into Solr before doing a join.
> >
> >
> >
> > On Tue, Nov 5, 2013 at 9:31 AM, Tech Id 
> wrote:
> >
> > > Hi,
> > >
> > > I have been searching for an example of joins using solr/lucene.
> > > But I have not found anything either on the net or in the src/examples.
> > >
> > > Can someone please point me to the same?
> > > Ideally, I need a join working with Solrj APIs (Please let me know if
> > this
> > > group is Lucene-specific).
> > >
> > >
> > > Best Regards
> > >
> > >
> >
>