Licensing issue advice for Solr.

2017-03-24 Thread russell . lemaster
Hi all, 

I'm just getting started with Solr (6.4.2) and am trying to get approval to use
it at my workplace. I know that the product as a whole is licensed under Apache
2.0, but unfortunately the build includes packages that my company considers
"non-permissive", so I am having trouble getting it approved.

It appears that the vast majority of the licensing issues are within the
contrib directory. I know these packages provide significant functionality for
Solr, but I was wondering whether there is an official build that contains just
the Solr and Lucene server distribution (minus demos and contrib). Some of the
packages are dual-licensed, so I can deal with those by selecting the license we
wish to use, but others are either not licensed at all or carry only
non-permissive licenses (i.e., not Apache, BSD, MIT, etc.) such as GPL or CDDL.

Has anyone had to deal with this in the past? My apologies if this has been
discussed before, but it doesn't appear that the mailing list archive has a
search option (correct me if I'm wrong on that).

Thanks 




Advice on how to work with pure JSON data.

2017-04-20 Thread russell . lemaster

I have looked at many examples of how to do what I want, but they tend to show
only fragments or are based on older versions of Solr. I'm hoping there are new
features that make what I'm doing easier.

I am running version 6.5 and am testing by running in cloud mode but only on a 
single machine. 

Basically, I have a large number of documents stored as JSON in individual
files. I want to index each JSON document as-is, without any pre-processing. I
also need to be able to write newly indexed JSON data back to individual files
in the same format.

For example, let's say I have a JSON document that looks like the following:

{
  "id" : "bb903493-55b0-421f-a83e-2199ea11e136",
  "productName_s" : "UsefulWidget",
  "productCategory_s" : "tool",
  "suppliers" : [
    {
      "id" : "bb903493-55b0-421f-a83e-2199ea11e221",
      "name_s" : "Acme Tools",
      "productNumber_s" : "10342UW"
    }, {
      "id" : "bb903493-55b0-421a-a83e-2199ea11e445",
      "name_s" : "Snappy Tools",
      "productNumber_s" : "ST-X100023"
    }
  ],
  "resellers" : [
    {
      "id" : "cc903493-55b0-421f-a83e-2199ea11e221",
      "name_s" : "Target",
      "productSKU_s" : "TA092310342UW"
    }, {
      "id" : "bc903493-55b0-421a-a83e-2199ea11e445",
      "name_s" : "Wal-Mart",
      "productSKU_s" : "029342ABLSWM"
    }
  ]
}

I know I can use the /update/json/docs handler to insert the above, but from
what I understand I'd have to set up parameters telling it how to split out the
children, etc. Though that is a bit of a pain, I can make that happen.
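
As a rough illustration of the split setup mentioned above, the sketch below
builds an /update/json/docs URL whose split parameter lists the root plus each
child array path, separated by pipes. The host, port, and collection name
("products") are assumptions, and whether multi-path splitting into child
documents behaves this way on a given Solr version should be checked against
that version's reference guide.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_update_url(base_url, collection, child_paths):
    """Build an /update/json/docs URL that splits out the given child paths."""
    # "/" keeps the top-level object as the parent document; every extra
    # path is a point at which nested objects are split off as children.
    split = "|".join(["/"] + list(child_paths))
    query = urlencode({"split": split, "commit": "true"})
    return f"{base_url}/{collection}/update/json/docs?{query}"


# Hypothetical host and collection name, for illustration only.
url = build_update_url(
    "http://localhost:8983/solr", "products", ["/suppliers", "/resellers"]
)
print(url)

# Decode the query string again to show the raw parameter values.
print(parse_qs(urlparse(url).query))
```

The JSON file itself would then be POSTed to that URL unchanged, with a
Content-Type of application/json.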

The problem is that when I then query for the data, it comes back with a
single _childDocuments_ list instead of the named child document lists
(suppliers and resellers). So, how can I have Solr return the document as it
was originally indexed? (I know it would be embedded in the results structure,
but I can deal with that.)
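
One client-side workaround, sketched below, is to regroup the flat
_childDocuments_ list after the query returns. It assumes each child was
indexed with an extra discriminator field naming the list it came from; the
field name childType_s is hypothetical and is not part of the sample document
above, so adding it would be a small pre-processing step.

```python
import json


def regroup_children(doc, type_field="childType_s"):
    """Return a copy of doc with _childDocuments_ regrouped into named lists."""
    # Copy every parent field except the flat child list.
    result = {k: v for k, v in doc.items() if k != "_childDocuments_"}
    for child in doc.get("_childDocuments_", []):
        child = dict(child)                 # do not mutate the caller's data
        list_name = child.pop(type_field)   # e.g. "suppliers" or "resellers"
        result.setdefault(list_name, []).append(child)
    return result


# A parent as it might come back from a child-document query, with the
# hypothetical childType_s field on each child.
flat = {
    "id": "bb903493-55b0-421f-a83e-2199ea11e136",
    "productName_s": "UsefulWidget",
    "_childDocuments_": [
        {"name_s": "Acme Tools", "childType_s": "suppliers"},
        {"name_s": "Snappy Tools", "childType_s": "suppliers"},
        {"name_s": "Target", "childType_s": "resellers"},
    ],
}

print(json.dumps(regroup_children(flat), indent=2))
```

The regrouped dictionary can be written straight back to a file with
json.dump, which covers the round-trip requirement as long as the
discriminator field is acceptable in the index.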

I am hoping there is a method in 6.5 that I haven't seen documented that can do
this. If not, can someone point me to some examples of how to do this another
way?

If there is no easy way to do this with the current version, can someone point
me to a good resource for writing my own handlers?

Thank you. 


Re: Advice on how to work with pure JSON data.

2017-04-20 Thread russell . lemaster
One thing I forgot to mention in my original post is that I wish to do this
using the SolrJ client. I have my own REST server that presents a common API to
our users, but the back end can be anything I wish. I have been using "that
other Lucene-based product" :), but I wish to stick to a product that is more
open and that perhaps I can contribute to.

I've searched for SolrJ examples for child documents, but unfortunately far too
many of the results refer to implementations based on older versions of Solr.
Specifically, I would like to insert beans with multiple child collections in
them, but the latest I've read says this is not currently possible. Is that
still true?

In short, it isn't so important that REST-based requests / responses from Solr
are pure JSON, so long as I can do what I want from the Java client.

Do you know if there have been recent additions / enhancements up through 6.5
that make this more straightforward?

Thanks 


- Original Message -

From: "Mikhail Khludnev"  
To: "solr-user"  
Sent: Thursday, April 20, 2017 3:38:11 PM 
Subject: Re: Advice on how to work with pure JSON data. 

This is one of the features of the epic
https://issues.apache.org/jira/browse/SOLR-10144.
Until it's done, the only way to achieve this is to properly set the many
params for the [subquery] transformer:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[subquery]

Note that here I assume the children mapping is static, i.e. there is a limited
list of optional scopes. Indexing and searching arbitrary JSON is an esoteric
(XML-DB-like) problem.
Also, beware of https://issues.apache.org/jira/browse/SOLR-10500. I hope to
fix it soon.
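
A minimal sketch of what those params might look like for the
suppliers/resellers example follows. It assumes the children were indexed as
separate documents carrying two hypothetical fields, parentId_s (the parent's
id) and childType_s (the list name); neither exists in the original sample.
The helper only builds the request parameters; sending them to /select is left
to the client.

```python
from urllib.parse import parse_qsl, urlencode


def subquery_params(parent_query, child_lists,
                    parent_field="parentId_s", type_field="childType_s",
                    rows=100):
    """Build select params that attach one [subquery] per named child list."""
    # Request all parent fields plus one named subquery result per list.
    fl = ",".join(["*"] + [f"{name}:[subquery]" for name in child_lists])
    params = [("q", parent_query), ("fl", fl)]
    for name in child_lists:
        # $row.id refers to the id of the parent row being transformed.
        params.append((f"{name}.q", f"{{!terms f={parent_field} v=$row.id}}"))
        # Restrict each subquery to the children of one list.
        params.append((f"{name}.fq", f"{type_field}:{name}"))
        params.append((f"{name}.rows", str(rows)))
    return params


params = subquery_params("productName_s:UsefulWidget",
                         ["suppliers", "resellers"])
for key, value in params:
    print(f"{key} = {value}")

# URL-encode for a GET request; decoding gives back the same pairs.
query_string = urlencode(params)
assert parse_qsl(query_string) == params
```

With params like these, each named list should come back as its own nested
result under the parent rather than under _childDocuments_, which is the
"static mapping" case described above: the list names have to be known when
the query is built.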


-- 
Sincerely yours 
Mikhail Khludnev