Hi,

I am trying to index JSON objects (which contain nested JSON objects and
arrays) into Solr.

My JSON Object looks like the following (This is fake data that I am using
for this example):

{
    "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing
elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur
mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis.
Donec fringilla diam at placerat interdum. Proin vitae arcu non augue
facilisis auctor id non neque. Integer non nibh sit amet justo facilisis
semper a vel ligula. Pellentesque commodo vulputate consequat. ",
    "EventUid": "1279706565",
    "TimeOfEvent": "2015-05-01-08-07-13",
    "TimeOfEventUTC": "2015-05-01-01-07-13",
    "EventCollector": "kafka",
    "EventMessageType": "kafka-@column",
    "User": {
        "User": "Lorem ipsum",
        "UserGroup": "Manager",
        "Location": "consectetur adipiscing",
        "Department": "Legal"
    },
    "EventDescription": {
        "EventApplicationName": "",
        "Query": "SELECT * FROM MOVIES",
        "Information": [
            {
                "domainName": "English",
                "columns": [
                    {
                        "movieName": "Casablanca",
                        "duration": "154"
                    },
                    {
                        "movieName": "Die Hard",
                        "duration": "127"
                    }
                ]
            },
            {
                "domainName": "Hindi",
                "columns": [
                    {
                        "movieName": "DDLJ",
                        "duration": "176"
                    }
                ]
            }
        ]
    }
}



My function for indexing the object is as follows:

public static void indexJSON(JSONObject jsonOBJ)
        throws ParseException, IOException, SolrServerException {
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

    SolrInputDocument mainEvent = new SolrInputDocument();
    mainEvent.addField("id", generateID());
    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

    Object obj = parser.parse(jsonOBJ.get("User").toString());
    JSONObject userObj = (JSONObject) obj;

    SolrInputDocument childUserEvent = new SolrInputDocument();
    childUserEvent.addField("id", generateID());
    childUserEvent.addField("User", userObj.get("User"));

    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
    JSONObject eventdescriptionObj = (JSONObject) obj;

    SolrInputDocument childEventDescEvent = new SolrInputDocument();
    childEventDescEvent.addField("id", generateID());
    childEventDescEvent.addField("EventApplicationName",
            eventdescriptionObj.get("EventApplicationName"));
    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

    obj= JSONValue.parse(eventdescriptionObj.get("Information").toString());
    JSONArray informationArray = (JSONArray) obj;

    for(int i = 0; i<informationArray.size(); i++){
        JSONObject domain = (JSONObject) informationArray.get(i);

        SolrInputDocument domainDoc = new SolrInputDocument();
        domainDoc.addField("id", generateID());
        domainDoc.addField("domainName", domain.get("domainName"));

        String s = domain.get("columns").toString();
        obj= JSONValue.parse(s);
        JSONArray ColumnsArray = (JSONArray) obj;

        SolrInputDocument columnsDoc = new SolrInputDocument();
        columnsDoc.addField("id", generateID());

        for(int j = 0; j<ColumnsArray.size(); j++){
            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
            SolrInputDocument columnDoc = new SolrInputDocument();
            columnDoc.addField("id", generateID());
            columnDoc.addField("movieName", ColumnsObj.get("movieName"));
            columnsDoc.addChildDocument(columnDoc);
        }
        domainDoc.addChildDocument(columnsDoc);
        childEventDescEvent.addChildDocument(domainDoc);
    }

    mainEvent.addChildDocument(childEventDescEvent);
    mainEvent.addChildDocument(childUserEvent);
    batch.add(mainEvent);
    solr.add(batch);
    solr.commit();
}

When I index using the above code, I am able to index only 12 objects per
second. Is there a faster way to do the indexing? I believe I am using the
json-fast parser, which is one of the fastest parsers for JSON.
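
For comparison, here is a minimal sketch of one variation that may be worth
trying. The function above calls solr.commit() after every single object, and
commits are generally one of the more expensive operations in Solr, so
buffering documents and committing once per run could reduce the overhead.
Everything in the sketch is an assumption made for illustration: DocSink,
BatchingIndexer and CountingSink are hypothetical names, and DocSink only
stands in for the Solr client (its add and commit calls) so that the example
runs without SolrJ. No timings have been measured for it.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexSketch {

    /** Hypothetical stand-in for the Solr client: only add(batch) and commit() are modeled. */
    public interface DocSink<D> {
        void add(List<D> docs);
        void commit();
    }

    /** Buffers documents, sends them in batches, and commits once at the end. */
    public static final class BatchingIndexer<D> {
        private final DocSink<D> sink;
        private final int batchSize;
        private final List<D> buffer = new ArrayList<>();

        public BatchingIndexer(DocSink<D> sink, int batchSize) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.sink = sink;
            this.batchSize = batchSize;
        }

        /** Queues one document; sends the buffer when it reaches batchSize. */
        public void index(D doc) {
            buffer.add(doc);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Sends whatever is buffered, without committing. */
        public void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sink.add(new ArrayList<>(buffer));
            buffer.clear();
        }

        /** Sends the remaining documents and issues a single commit. */
        public void finish() {
            flush();
            sink.commit();
        }
    }

    /** Sink that only counts calls, used here to show how many requests are made. */
    public static final class CountingSink<D> implements DocSink<D> {
        public int addCalls;
        public int commits;
        public int docs;

        @Override
        public void add(List<D> batch) {
            addCalls++;
            docs += batch.size();
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        CountingSink<String> sink = new CountingSink<>();
        BatchingIndexer<String> indexer = new BatchingIndexer<>(sink, 100);
        for (int i = 0; i < 250; i++) {
            indexer.index("doc-" + i);
        }
        indexer.finish();
        // 250 documents at a batch size of 100 give 3 add calls and 1 commit.
        System.out.println(sink.docs + " docs, " + sink.addCalls
                + " add calls, " + sink.commits + " commit(s)");
    }
}
```

In the real function, the stand-in would correspond to the existing solr
object: mainEvent would be handed to the indexer instead of being added and
committed immediately, and finish() would be called once after the last object.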

Your help will be very valuable to me.

Thanks,
Vineeth
