Re: solr blocking and client timeout issue
On 7/19/2015 12:46 AM, Jeremy Ashcraft wrote:
> That did the trick. The GC tuning options also seem to be working, but
> I guess we'll see when traffic ramps back up on Monday. Thanks for all
> your help!
>
> On 7/18/2015 8:16 AM, Shawn Heisey wrote:
>> The first thing I'd try is removing the UseLargePages option and see if
>> it goes away.

Glad you got the warning out of there.

Noticing that the message said "OpenJDK", I am betting you still have OpenJDK 7u25 on the system. I really was serious when I recommended getting the latest Oracle Java that you could onto the system. Memory management has seen a lot of improvement since the early Java 7 days.

There's nothing wrong with OpenJDK, as long as it's not OpenJDK 6, but overall we do see the best results with the Oracle JVM. If you want to stick with OpenJDK, it would be a very good idea to get the latest v7 or v8 version instead of the old version you've got. 7u25 is over two years old, and Java development moves quickly enough that a version that old is ancient history.

Part of the reason my general recommendation is Java 8 is that both Java 6 and Java 7 have reached end of support at Oracle, so there will be no more development on those versions. Having said that, it's just a recommendation; whether you follow it is up to you. I found 7u25 to be a very solid release, but 7u60 and later have better memory management. Don't use 7u40, 7u45, 7u51, or 7u55 -- those versions have known bugs that DO affect Lucene/Solr.

Thanks,
Shawn
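[Editor's note: a quick way to confirm which JVM is actually in use, as Shawn is inferring from the log message, is to print the standard Java system properties. A minimal sketch, not from the thread; the class name is made up:

    public class JvmInfo {
        public static void main(String[] args) {
            // Standard JVM system properties identifying vendor and version.
            System.out.println("Vendor:  " + System.getProperty("java.vendor"));
            System.out.println("Runtime: " + System.getProperty("java.runtime.name"));
            System.out.println("Version: " + System.getProperty("java.version"));
            // On the system discussed above this would print "1.7.0_25",
            // confirming the old 7u25 release Shawn mentions.
        }
    }

Running this with the same java binary that launches Solr shows whether it is OpenJDK or Oracle Java, and which update level.]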
Re: Basic auth
I followed this guide:
http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr

But something is wrong. Can anyone help, or refer me to a guide on how to set up HTTP basic auth?

Regards

> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
>
> SOLR-4470 is about:
> Support for basic auth in internal Solr requests.
>
> What is wrong with the internal requests?
> Can someone help simplify? Would it ever be possible to run with basic auth?
> What workarounds are there?
>
> Regards
Re: Basic auth
You're mixing up a couple of things. The Drupal guide is specific to, well, Drupal; you'd probably be best off asking about that on the Drupal lists.

SOLR-4470 has not been committed yet, so you can't really use it. It may have been superseded by SOLR-7274, and there's a link there to the wiki that points to:
https://cwiki.apache.org/confluence/display/solr/Security

This is all quite new, so I'm not sure how much has been written in the way of docs.

Best,
Erick

On Sun, Jul 19, 2015 at 9:35 AM, wrote:
> I followed this guide:
> http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr
>
> But something is wrong. Can anyone help, or refer me to a guide on how
> to set up HTTP basic auth?
>
> Regards
>
>> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
>>
>> SOLR-4470 is about:
>> Support for basic auth in internal Solr requests.
>>
>> What is wrong with the internal requests?
>> Can someone help simplify? Would it ever be possible to run with basic auth?
>> What workarounds are there?
>>
>> Regards
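[Editor's note: for the client side only -- a Solr instance already protected by container-level basic auth, not the internal node-to-node requests that SOLR-4470 covers -- SolrJ can be handed an Apache HttpClient that carries the credentials. A minimal sketch, assuming SolrJ 5.x (in 4.x the class was HttpSolrServer); the username, password, and URL are placeholders:

    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.client.CredentialsProvider;
    import org.apache.http.impl.client.BasicCredentialsProvider;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class BasicAuthClient {
        public static HttpSolrClient build() {
            // Register placeholder credentials for any host/port.
            CredentialsProvider creds = new BasicCredentialsProvider();
            creds.setCredentials(AuthScope.ANY,
                    new UsernamePasswordCredentials("solruser", "solrpass"));

            // HttpClient that answers the container's 401 challenge.
            CloseableHttpClient http = HttpClientBuilder.create()
                    .setDefaultCredentialsProvider(creds)
                    .build();

            // SolrJ client reusing the authenticating HttpClient.
            return new HttpSolrClient("http://localhost:8983/solr/collection1", http);
        }
    }

This only authenticates outbound requests from your own client code; distributed requests between Solr nodes are a separate problem, which is exactly what the JIRA issues above discuss.]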
Usage of CloudSolrStream while new data is indexed
Hello,

I am iterating through a whole collection in SolrCloud using CloudSolrStream. While I am doing this, new data gets indexed into the collection.

Does CloudSolrStream pick up the newly added values? Is it negatively impacted by this operation, and what is the impact of the collection traversal on indexing?

I am not sure how CloudSolrStream works. Does it simply read segment files, so that it might be slowed down while segments are merged due to new data being indexed?

Could someone explain this process to me, and what the relation between these two operations is?

Thank you in advance.
Mihaela
Tips for faster indexing
Hi,

I am trying to index JSON objects (which contain nested JSON objects and arrays in them) into Solr.

My JSON object looks like the following (this is fake data that I am using for this example):

{
  "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu non augue facilisis auctor id non neque. Integer non nibh sit amet justo facilisis semper a vel ligula. Pellentesque commodo vulputate consequat. ",
  "EventUid": "1279706565",
  "TimeOfEvent": "2015-05-01-08-07-13",
  "TimeOfEventUTC": "2015-05-01-01-07-13",
  "EventCollector": "kafka",
  "EventMessageType": "kafka-@column",
  "User": {
    "User": "Lorem ipsum",
    "UserGroup": "Manager",
    "Location": "consectetur adipiscing",
    "Department": "Legal"
  },
  "EventDescription": {
    "EventApplicationName": "",
    "Query": "SELECT * FROM MOVIES",
    "Information": [
      {
        "domainName": "English",
        "columns": [
          { "movieName": "Casablanca", "duration": "154" },
          { "movieName": "Die Hard", "duration": "127" }
        ]
      },
      {
        "domainName": "Hindi",
        "columns": [
          { "movieName": "DDLJ", "duration": "176" }
        ]
      }
    ]
  }
}

My function for indexing the object is as follows:

public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
        IOException, SolrServerException {
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

    SolrInputDocument mainEvent = new SolrInputDocument();
    mainEvent.addField("id", generateID());
    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

    Object obj = parser.parse(jsonOBJ.get("User").toString());
    JSONObject userObj = (JSONObject) obj;

    SolrInputDocument childUserEvent = new SolrInputDocument();
    childUserEvent.addField("id", generateID());
    childUserEvent.addField("User", userObj.get("User"));

    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
    JSONObject eventdescriptionObj = (JSONObject) obj;

    SolrInputDocument childEventDescEvent = new SolrInputDocument();
    childEventDescEvent.addField("id", generateID());
    childEventDescEvent.addField("EventApplicationName",
            eventdescriptionObj.get("EventApplicationName"));
    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

    obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
    JSONArray informationArray = (JSONArray) obj;

    for (int i = 0; i < informationArray.size(); i++) {
        JSONObject domain = (JSONObject) informationArray.get(i);

        SolrInputDocument domainDoc = new SolrInputDocument();
        domainDoc.addField("id", generateID());
        domainDoc.addField("domainName", domain.get("domainName"));

        String s = domain.get("columns").toString();
        obj = JSONValue.parse(s);
        JSONArray ColumnsArray = (JSONArray) obj;

        SolrInputDocument columnsDoc = new SolrInputDocument();
        columnsDoc.addField("id", generateID());

        for (int j = 0; j < ColumnsArray.size(); j++) {
            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
Re: Usage of CloudSolrStream while new data is indexed
The basic assumption is that any search has a "snapshot" of the index via the currently open searcher. I expect the streaming code is no different: indexing while reading from a stream should _not_ show documents that weren't in the index (and visible) when the query was submitted, regardless of how long it takes to stream all the results out and regardless of how the index changes while that's happening.

As far as segments being merged while the stream is writing out, it shouldn't matter. The segment files are read-only, so the fact that background merging will read a segment while docs in that segment are being streamed out shouldn't matter. And if a background merge happens, the merged-away segments won't be deleted until after the query is complete -- in this case, until all documents that satisfy the search (regardless of how many segments are spanned) have been streamed.

HTH,
Erick

On Sun, Jul 19, 2015 at 11:38 AM, mihaela olteanu wrote:
> Hello,
>
> I am iterating through a whole collection in SolrCloud using
> CloudSolrStream. While I am doing this, new data gets indexed into the
> collection.
>
> Does CloudSolrStream pick up the newly added values? Is it negatively
> impacted by this operation, and what is the impact of the collection
> traversal on indexing?
>
> I am not sure how CloudSolrStream works. Does it simply read segment
> files, so that it might be slowed down while segments are merged due to
> new data being indexed?
>
> Could someone explain this process to me, and what the relation between
> these two operations is?
>
> Thank you in advance.
> Mihaela
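[Editor's note: for reference, a minimal sketch of opening and draining a CloudSolrStream with the SolrJ of this era. The zkHost, collection, and field names are placeholders, and the Map-based constructor is an assumption worth checking against your exact version; streaming requires an explicit sort so shard streams can be merged:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

    public class StreamWholeCollection {
        public static void main(String[] args) throws IOException {
            Map<String, String> props = new HashMap<>();
            props.put("q", "*:*");          // traverse the whole collection
            props.put("fl", "id,field_s");  // fields to stream back
            props.put("sort", "id asc");    // required for merging shard streams

            CloudSolrStream stream =
                    new CloudSolrStream("zkhost:2181", "collection1", props);
            try {
                stream.open(); // snapshot fixed by the searchers open now
                while (true) {
                    Tuple tuple = stream.read();
                    if (tuple.EOF) {
                        break; // docs indexed after open() are not seen
                    }
                    System.out.println(tuple.getString("id"));
                }
            } finally {
                stream.close();
            }
        }
    }

Per Erick's explanation above, the loop sees only the documents visible to the searchers that were open when the stream was opened.]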
Re: Tips for faster indexing
First thing: it looks like you're sending only one document at a time, perhaps with child objects. This is not optimal at all. I usually batch my docs up in groups of 1,000, and there is anecdotal evidence that there may (depending on the docs) be some gains above that number. Gotta balance the batch size against how big the docs are, of course.

Assuming that you really are calling this method for one doc (and children) at a time, the far bigger problem than calling server.add for each parent/child set is that you're then calling solr.commit() every time. This is an anti-pattern. Generally, let the autoCommit setting in solrconfig.xml handle the intermediate commits while the indexing program is running, and only issue a commit at the very end of the job, if at all. (A sketch of this pattern follows the quoted message below.)

Best,
Erick

On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju wrote:
> Hi,
>
> I am trying to index JSON objects (which contain nested JSON objects and
> arrays in them) into Solr.
>
> My JSON object looks like the following (this is fake data that I am
> using for this example):
>
> {
>   "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing
>   elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur
>   mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio
>   iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu
>   non augue facilisis auctor id non neque. Integer non nibh sit amet
>   justo facilisis semper a vel ligula. Pellentesque commodo vulputate
>   consequat. ",
>   "EventUid": "1279706565",
>   "TimeOfEvent": "2015-05-01-08-07-13",
>   "TimeOfEventUTC": "2015-05-01-01-07-13",
>   "EventCollector": "kafka",
>   "EventMessageType": "kafka-@column",
>   "User": {
>     "User": "Lorem ipsum",
>     "UserGroup": "Manager",
>     "Location": "consectetur adipiscing",
>     "Department": "Legal"
>   },
>   "EventDescription": {
>     "EventApplicationName": "",
>     "Query": "SELECT * FROM MOVIES",
>     "Information": [
>       {
>         "domainName": "English",
>         "columns": [
>           { "movieName": "Casablanca", "duration": "154" },
>           { "movieName": "Die Hard", "duration": "127" }
>         ]
>       },
>       {
>         "domainName": "Hindi",
>         "columns": [
>           { "movieName": "DDLJ", "duration": "176" }
>         ]
>       }
>     ]
>   }
> }
>
> My function for indexing the object is as follows:
>
> public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
>         IOException, SolrServerException {
>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>
>     SolrInputDocument mainEvent = new SolrInputDocument();
>     mainEvent.addField("id", generateID());
>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>
>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>     JSONObject userObj = (JSONObject) obj;
>
>     SolrInputDocument childUserEvent = new SolrInputDocument();
>     childUserEvent.addField("id", generateID());
>     childUserEvent.addField("User", userObj.get("User"));
>
>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>     JSONObject eventdescriptionObj = (JSONObject) obj;
>
>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>     childEventDescEvent.addField("id", generateID());
>     childEventDescEvent.addField("EventApplicationName",
>             eventdescriptionObj.get("EventApplicationName"));
>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>
>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>     JSONArray informationArray = (JSONArray) obj;
>
>     for (int i = 0; i < informationArray.size(); i++) {
>         JSONObject domain = (JSONObject) informationArray.get(i);
>
>         SolrInputDocument domainDoc = new SolrInputDocument();
>         domainDoc.addField("id", generateID());
>         domainDoc.addField("domainName", domain.get("domainName"));
>
>         String s = domain.get("columns").toString();
>         obj = JSONValue.parse(s);
>         JSONArray ColumnsArray = (JSONArray) obj;
>
>         SolrInputDocument columnsDoc = new SolrInputDocument();
>         columnsDoc.addField("id", generateID());
>
>         for (int j = 0; j < ColumnsArray.size(); j++) {
>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
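[Editor's note: a minimal sketch of the batching pattern Erick describes -- accumulate parent documents, send them in groups of 1,000, and leave intermediate commits to autoCommit with at most one explicit commit at the end. Assumes SolrJ 5.x; the names server, docs, and BatchIndexer are placeholders, not from the original code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        private static final int BATCH_SIZE = 1000;

        // Batches docs instead of doing add + commit per document.
        public static void indexAll(SolrClient server, List<SolrInputDocument> docs)
                throws IOException, SolrServerException {
            List<SolrInputDocument> batch = new ArrayList<>(BATCH_SIZE);
            for (SolrInputDocument doc : docs) {
                batch.add(doc);
                if (batch.size() >= BATCH_SIZE) {
                    server.add(batch);  // one round trip for 1,000 docs
                    batch.clear();      // no commit here; autoCommit handles it
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);      // flush the remainder
            }
            server.commit();            // single commit at the very end, if at all
        }
    }

Each SolrInputDocument here would be a fully assembled parent (children attached via addChildDocument) as in the quoted indexJSON method.]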