Re: solr blocking and client timeout issue

2015-07-19 Thread Shawn Heisey
On 7/19/2015 12:46 AM, Jeremy Ashcraft wrote:
> That did the trick.  The GC tuning options also seem to be working, but
> I guess we'll see when traffic ramps back up on Monday.  Thanks for all
> your help!
> 
> On 7/18/2015 8:16 AM, Shawn Heisey wrote:
>> The first thing I'd try is removing the UseLargePages option and see if
>> it goes away.

Glad you got the warning out of there.

Noticing that the message said "OpenJDK", I am betting you still have
OpenJDK 7u25 on the system.  I really was serious when I recommended
getting the latest Oracle Java you can onto the system.  Memory
management has seen a lot of improvement since the early Java 7 days.

There's nothing wrong with OpenJDK, as long as it's not OpenJDK 6, but
overall we do see the best results with the Oracle JVM.  If you want to
stick with OpenJDK, it would be a very good idea to get the latest v7 or
v8 version instead of the old version you've got.  7u25 is over two
years old.  Java development moves very quickly, which makes that
version a lot like ancient history.

Part of the reason that my general recommendation is Java 8 is that both
Java 6 and Java 7 have reached end of support at Oracle.  There will be
no more development on those versions.

Having said that, note that it's just a recommendation; whether you
follow that recommendation is up to you.  I found 7u25 to be a very
solid release ... but 7u60 and later releases have better memory
management.  Don't use 7u40, 7u45, 7u51, or 7u55 -- those versions have
known bugs that DO affect Lucene/Solr.

Thanks,
Shawn



Re: Basic auth

2015-07-19 Thread solr . user . 1507
I followed this guide:
http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr

But there is something wrong. Can anyone help or point me to a guide on how
to set up HTTP basic auth?

Regards

> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
> 
> SOLR-4470 is about:
> Support for basic auth in internal Solr requests.
> 
> What is wrong with the internal requests?
> Can someone help clarify: would it ever be possible to run with basic auth?
> What workarounds are there?
> 
> Regards


Re: Basic auth

2015-07-19 Thread Erick Erickson
You're mixing up a couple of things. The Drupal guide is specific to, well,
Drupal. You'd probably be best off asking about that on the Drupal
lists.

SOLR-4470 has not been committed yet, so you can't really use it. This
may have been superseded by SOLR-7274, and there's a link to the Wiki
that points to:
https://cwiki.apache.org/confluence/display/solr/Security

This is all quite new; I'm not sure how much has been written in the way of docs yet.
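
For reference, here is a minimal sketch of sending a basic-auth request to
Solr from plain Java (no SolrJ), assuming whatever container or security
plugin you configured is already enforcing the credentials; the URL, user
name, and password below are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- adjust to your install.
        URL url = new URL("http://localhost:8983/solr/collection1/select?q=*:*&wt=json");
        String credentials = "solruser:solrpassword";
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Basic auth is just this one request header.
        conn.setRequestProperty("Authorization", "Basic " + encoded);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

(java.util.Base64 requires Java 8; on Java 7 you would need a different encoder.)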

Best,
Erick

On Sun, Jul 19, 2015 at 9:35 AM,   wrote:
> I followed this guide:
> http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr
>
> But there is something wrong. Can anyone help or point me to a guide on how
> to set up HTTP basic auth?
>
> Regards
>
>> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
>>
>> SOLR-4470 is about:
>> Support for basic auth in internal Solr  requests.
>>
>> What is wrong with the internal requests?
>> Can someone help clarify: would it ever be possible to run with basic auth?
>> What workarounds are there?
>>
>> Regards


Usage of CloudSolrStream while new data is indexed

2015-07-19 Thread mihaela olteanu
Hello,
I am iterating through a whole collection in SolrCloud using CloudSolrStream.
While I am doing this, new data gets indexed into the collection. Does
CloudSolrStream pick up the newly added values? Is it negatively impacted by
the concurrent indexing, and what is the impact of the collection traversal on
indexing?
I am not sure how CloudSolrStream works. Does it simply read segment files, so
that it might be slowed down by segment merges caused by the newly indexed
data?
Could someone explain this process to me, and the relation between these two
operations?

Thank you in advance.
Mihaela
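
For reference, here is a rough sketch of the traversal pattern in question,
assuming the Solr 5.x SolrJ streaming API (the exact constructor and required
parameters vary between releases); the ZooKeeper address, collection name, and
field list are placeholders:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class StreamWholeCollection {
    public static void main(String[] args) throws Exception {
        // Streaming expects an explicit sort; q/fl/sort values are placeholders.
        Map<String, String> params = new HashMap<String, String>();
        params.put("q", "*:*");
        params.put("fl", "id");
        params.put("sort", "id asc");

        CloudSolrStream stream =
                new CloudSolrStream("zkhost1:2181", "mycollection", params);
        try {
            stream.open();
            while (true) {
                Tuple tuple = stream.read();
                if (tuple.EOF) {
                    break;   // sentinel tuple marks the end of the stream
                }
                System.out.println(tuple.getString("id"));
            }
        } finally {
            stream.close();
        }
    }
}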

Tips for faster indexing

2015-07-19 Thread Vineeth Dasaraju
Hi,

I am trying to index JSON objects (which contain nested JSON objects and
arrays within them) into Solr.

My JSON Object looks like the following (This is fake data that I am using
for this example):

{
"RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing
elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur
mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis.
Donec fringilla diam at placerat interdum. Proin vitae arcu non augue
facilisis auctor id non neque. Integer non nibh sit amet justo facilisis
semper a vel ligula. Pellentesque commodo vulputate consequat. ",
"EventUid": "1279706565",
"TimeOfEvent": "2015-05-01-08-07-13",
"TimeOfEventUTC": "2015-05-01-01-07-13",
"EventCollector": "kafka",
"EventMessageType": "kafka-@column",
"User": {
"User": "Lorem ipsum",
"UserGroup": "Manager",
"Location": "consectetur adipiscing",
"Department": "Legal"
},
"EventDescription": {
"EventApplicationName": "",
"Query": "SELECT * FROM MOVIES",
"Information": [
{
"domainName": "English",
"columns": [
{
"movieName": "Casablanca",
"duration": "154",
},
{
"movieName": "Die Hard",
"duration": "127",
}
]
},
{
"domainName": "Hindi",
"columns": [
{
"movieName": "DDLJ",
"duration": "176",
}
]
}
]
}
}



My function for indexing the object is as follows:

public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
IOException, SolrServerException {
Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

SolrInputDocument mainEvent = new SolrInputDocument();
mainEvent.addField("id", generateID());
mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

Object obj = parser.parse(jsonOBJ.get("User").toString());
JSONObject userObj = (JSONObject) obj;

SolrInputDocument childUserEvent = new SolrInputDocument();
childUserEvent.addField("id", generateID());
childUserEvent.addField("User", userObj.get("User"));

obj = parser.parse(jsonOBJ.get("EventDescription").toString());
JSONObject eventdescriptionObj = (JSONObject) obj;

SolrInputDocument childEventDescEvent = new SolrInputDocument();
childEventDescEvent.addField("id", generateID());
childEventDescEvent.addField("EventApplicationName",
eventdescriptionObj.get("EventApplicationName"));
childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

obj= JSONValue.parse(eventdescriptionObj.get("Information").toString());
JSONArray informationArray = (JSONArray) obj;

for(int i = 0; i < informationArray.size(); i++) {
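
For the nested part, here is a minimal sketch of one way the loop could attach
the Information array as child documents, assuming SolrJ's addChildDocument
support and reusing the variables defined above (informationArray, mainEvent,
childUserEvent, childEventDescEvent, batch, generateID()); the column field
names are taken from the sample data:

// Hypothetical continuation: walk the Information array and attach
// each domain (and its columns) to the parent as child documents.
for (int i = 0; i < informationArray.size(); i++) {
    JSONObject domain = (JSONObject) informationArray.get(i);

    SolrInputDocument domainDoc = new SolrInputDocument();
    domainDoc.addField("id", generateID());
    domainDoc.addField("domainName", domain.get("domainName"));

    JSONArray columnsArray = (JSONArray) JSONValue.parse(domain.get("columns").toString());
    for (int j = 0; j < columnsArray.size(); j++) {
        JSONObject column = (JSONObject) columnsArray.get(j);

        SolrInputDocument columnDoc = new SolrInputDocument();
        columnDoc.addField("id", generateID());
        columnDoc.addField("movieName", column.get("movieName"));
        columnDoc.addField("duration", column.get("duration"));

        domainDoc.addChildDocument(columnDoc);
    }
    childEventDescEvent.addChildDocument(domainDoc);
}

mainEvent.addChildDocument(childUserEvent);
mainEvent.addChildDocument(childEventDescEvent);
batch.add(mainEvent);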

Re: Usage of CloudSolrStream while new data is indexed

2015-07-19 Thread Erick Erickson
The basic assumption is that any search has a "snapshot" of the index
via the currently open searcher. I expect the streaming code is no
different: indexing while reading from a stream should _not_ show
documents that weren't in the index (and visible) when the query was
submitted, regardless of how long it takes to stream all the results
out and regardless of how the index changes while that's happening.

As far as segments being merged while the stream is writing out, it
shouldn't matter. The segment files are read-only; the fact that
background merging will read a segment while docs in that segment
are being streamed out shouldn't matter. And if a background merge
happens, the original segments won't be deleted until after the query is
complete, in this case until all documents that satisfy the search
(regardless of how many segments they span) have been streamed out.

HTH,
Erick


On Sun, Jul 19, 2015 at 11:38 AM, mihaela olteanu
 wrote:
> Hello,
> I am iterating through a whole collection in SolrCloud using CloudSolrStream.
> While I am doing this, new data gets indexed into the collection. Does
> CloudSolrStream pick up the newly added values? Is it negatively impacted by
> the concurrent indexing, and what is the impact of the collection traversal
> on indexing?
> I am not sure how CloudSolrStream works. Does it simply read segment files,
> so that it might be slowed down by segment merges caused by the newly
> indexed data?
> Could someone explain this process to me, and the relation between these two
> operations?
>
> Thank you in advance.
> Mihaela


Re: Tips for faster indexing

2015-07-19 Thread Erick Erickson
First thing: it looks like you're only sending one document at a
time, perhaps with child objects. This is not optimal at all. I
usually batch my docs up in groups of 1,000, and there is anecdotal
evidence that there may (depending on the docs) be some gains above
that number. Gotta balance the batch size against how big the docs
are, of course.

Assuming that you really are calling this method for one doc (and its
children) at a time, a far bigger problem than calling
server.add for each parent and its children is that you're then calling
solr.commit() every time. This is an anti-pattern. Generally, let the
autoCommit setting in solrconfig.xml handle the intermediate commits
while the indexing program is running, and only issue a commit at the
very end of the job, if at all.
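
To make that concrete, here is a minimal sketch of the batching pattern,
assuming a SolrJ SolrClient named solrClient and a hypothetical
buildDocument() helper that turns one parsed JSON object into a parent
document with its children; the batch size of 1,000 is just the rule of
thumb above:

// Hedged sketch: buffer documents and send them in batches, letting
// autoCommit in solrconfig.xml handle the intermediate commits.
void indexAll(SolrClient solrClient, List<JSONObject> incomingObjects)
        throws IOException, SolrServerException {
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    final int batchSize = 1000;

    for (JSONObject jsonObj : incomingObjects) {
        batch.add(buildDocument(jsonObj));   // buildDocument(): hypothetical helper
        if (batch.size() >= batchSize) {
            solrClient.add(batch);           // one request for the whole batch
            batch.clear();                   // note: no commit here
        }
    }
    if (!batch.isEmpty()) {
        solrClient.add(batch);
    }
    solrClient.commit();                     // single commit at the very end, if at all
}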

Best,
Erick

On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
 wrote:
> Hi,
>
> I am trying to index JSON objects (which contain nested JSON objects and
> Arrays in them) into solr.
>
> My JSON Object looks like the following (This is fake data that I am using
> for this example):
>
> {
> "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing
> elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur
> mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis.
> Donec fringilla diam at placerat interdum. Proin vitae arcu non augue
> facilisis auctor id non neque. Integer non nibh sit amet justo facilisis
> semper a vel ligula. Pellentesque commodo vulputate consequat. ",
> "EventUid": "1279706565",
> "TimeOfEvent": "2015-05-01-08-07-13",
> "TimeOfEventUTC": "2015-05-01-01-07-13",
> "EventCollector": "kafka",
> "EventMessageType": "kafka-@column",
> "User": {
> "User": "Lorem ipsum",
> "UserGroup": "Manager",
> "Location": "consectetur adipiscing",
> "Department": "Legal"
> },
> "EventDescription": {
> "EventApplicationName": "",
> "Query": "SELECT * FROM MOVIES",
> "Information": [
> {
> "domainName": "English",
> "columns": [
> {
> "movieName": "Casablanca",
> "duration": "154",
> },
> {
> "movieName": "Die Hard",
> "duration": "127",
> }
> ]
> },
> {
> "domainName": "Hindi",
> "columns": [
> {
> "movieName": "DDLJ",
> "duration": "176",
> }
> ]
> }
> ]
> }
> }
>
>
>
> My function for indexing the object is as follows:
>
> public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
> IOException, SolrServerException {
> Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>
> SolrInputDocument mainEvent = new SolrInputDocument();
> mainEvent.addField("id", generateID());
> mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
> mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
> mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
> mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>
> Object obj = parser.parse(jsonOBJ.get("User").toString());
> JSONObject userObj = (JSONObject) obj;
>
> SolrInputDocument childUserEvent = new SolrInputDocument();
> childUserEvent.addField("id", generateID());
> childUserEvent.addField("User", userObj.get("User"));
>
> obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> JSONObject eventdescriptionObj = (JSONObject) obj;
>
> SolrInputDocument childEventDescEvent = new SolrInputDocument();
> childEventDescEvent.addField("id", generateID());
> childEventDescEvent.addField("EventApplicationName",
> eventdescriptionObj.get("EventApplicationName"));
> childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>
> obj= JSONValue.parse(eventdescriptionObj.get("Information").toString());
> JSONArray informationArray = (JSONArray) obj;
>
> for(int i = 0; i < informationArray.size(); i++) {
> JSONObject domain = (JSONObject) informationArray.get(i);
>
> SolrInputDocument domainDoc = new SolrInputDocument();
> domainDoc.addField("id", generateID());
> domainDoc.addField("domainName", domain.get("domainName"));
>
> String s = domain.get("columns").toString();
> obj= JSONValue.parse(s);
> JSONArray ColumnsArray = (JSONArray) obj;
>
> SolrInputDocument columnsDoc = new SolrInputDocument();
> columnsDoc.addField("id", generateID());
>
> for(int j = 0; j < ColumnsArray.size(); j++) {
> JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>