Hi,

OK, my only recommendation is to fix your Elasticsearch cluster so that it
can handle the load, because it seems the shard synchronisation is too
slow [0].
How many ES nodes, indices and shards do you have?

IMHO, implementing a per-document retry strategy in the Heka ES plugin
would be quite expensive and surely inefficient.
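
For reference, here is a minimal sketch of what per-item inspection of a bulk
response could look like. It is not Heka's code: the type and function names
(bulkItemResult, bulkResponse, failedItems) are hypothetical, the sample body
in main is made up for illustration, and it only collects the failed items.
Deciding whether and how to resend them would still be up to the plugin.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkItemResult holds the per-document outcome inside a bulk response.
// (Hypothetical type, modelled on the fields visible in the quoted payload.)
type bulkItemResult struct {
	Index  string          `json:"_index"`
	ID     string          `json:"_id"`
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error,omitempty"`
}

// bulkResponse mirrors the top-level fields of a bulk response body.
type bulkResponse struct {
	Took   int                         `json:"took"`
	Errors bool                        `json:"errors"`
	Items  []map[string]bulkItemResult `json:"items"`
}

// failedItems returns every item whose status is outside the 2xx range.
// It relies on the "errors" flag alone, independent of the HTTP status code.
func failedItems(body []byte) ([]bulkItemResult, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("invalid bulk response JSON: %s", err)
	}
	if !resp.Errors {
		return nil, nil
	}
	var failed []bulkItemResult
	for _, item := range resp.Items {
		// Each item carries a single key naming the action, e.g. "create".
		for _, result := range item {
			if result.Status < 200 || result.Status > 299 {
				failed = append(failed, result)
			}
		}
	}
	return failed, nil
}

func main() {
	// Made-up sample body with one failed and one successful item.
	body := []byte(`{"took":1,"errors":true,"items":[
		{"create":{"_index":"logs","_id":"a","status":503,
			"error":{"type":"process_cluster_event_timeout_exception"}}},
		{"create":{"_index":"logs","_id":"b","status":201}}]}`)

	failed, err := failedItems(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, f := range failed {
		fmt.Printf("%s/%s failed with status %d\n", f.Index, f.ID, f.Status)
	}
}
```

Even with something like this, every failed document would have to be kept
around and re-queued, which is where the cost comes from.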

[0]
https://discuss.elastic.co/t/elasticsearch-2-2-0-i-am-occasionally-getting-process-cluster-event-timeout-exception-failed-to-process-cluster-event-put-mapping-as-within-30s-while-bulk-indexing-documents/42305/3


2016-04-28 14:54 GMT+02:00 Ramin Ali Dousti <[email protected]>:

> Hi,
>
> The ES version is "2.2.0".
>
> This is the HTTP response. Look for the status 503 in the payload:
>
> T 127.0.0.1:9200 -> 127.0.0.1:34497 [AP]
> HTTP/1.1 200 OK.
> Content-Type: application/json; charset=UTF-8.
> Content-Length: 3770.
>
> {
>   "took": 39911,
>   "errors": true,
>   "items": [
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qa", "status": 503, "error": {"type": "process_cluster_event_timeout_exception", "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"}}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qb", "status": 503, "error": {"type": "process_cluster_event_timeout_exception", "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"}}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qc", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.03.28", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qd", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.03.28", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qe", "status": 503, "error": {"type": "process_cluster_event_timeout_exception", "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"}}},
>     {"create": {"_index": "vdps-log-wf-2016.03.28", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qf", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qg", "status": 503, "error": {"type": "process_cluster_event_timeout_exception", "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"}}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qh", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qi", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qj", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qk", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4ql", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.03.28", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qm", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.03.28", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qn", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.03.28", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qo", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qp", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qq", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qr", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qs", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qt", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}},
>     {"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF", "_id": "AVRTguzBSCHabxnyv4qu", "_version": 1, "_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}}
>   ]
> }
>
>
> On Wed, Apr 27, 2016 at 3:41 AM, Swann Croiset <[email protected]> wrote:
>
>> Hi,
>>
>> IIRC, according to the code you're right: Heka doesn't handle such a case.
>>
>> That said, I've never seen such an ES response... I'm interested in it.
>>
>> Could you share this information: the ES response, the ES version and the
>> ES logs (from when this happens)?
>> Also, what is your configuration on the ES side? Index template, field
>> mapping?
>>
>> --
>> Swann
>>
>>
>>
>> 2016-04-26 22:28 GMT+02:00 Ramin Ali Dousti <[email protected]>:
>>
>>> Hi,
>>>
>>> I have an ES output that bulk uploads to a cluster. The HTTP status code
>>> is 200 OK but the reply payload says that it failed the upload for a few of
>>> the items. But heka doesn't seem to care about the failed items. I looked
>>> at the code and it says:
>>>
>>>
>>> https://github.com/mozilla-services/heka/blob/dev/plugins/elasticsearch/elasticsearch.go#L429
>>>
>>> if response != nil {
>>>         defer response.Body.Close()
>>>         if response_body, err = ioutil.ReadAll(response.Body); err != nil {
>>>                 return fmt.Errorf("Can't read HTTP response body. Status: %s. Error: %s",
>>>                         response.Status, err.Error()), true
>>>         }
>>>         err = json.Unmarshal(response_body, &response_body_json)
>>>         if err != nil {
>>>                 return fmt.Errorf("HTTP response didn't contain valid JSON. Status: %s. Body: %s",
>>>                         response.Status, string(response_body)), true
>>>         }
>>>         json_errors, ok := response_body_json["errors"].(bool)
>>>         if ok && json_errors && response.StatusCode != 200 {
>>>                 return fmt.Errorf(
>>>                         "ElasticSearch server reported error within JSON. Status: %s. Body: %s",
>>>                         response.Status, string(response_body)), false
>>>         }
>>>         if response.StatusCode > 304 {
>>>                 return fmt.Errorf("HTTP response error. Status: %s. Body: %s", response.Status,
>>>                         string(response_body)), false
>>>         }
>>> }
>>>
>>>
>>> 1- In my case I see a 200 OK with "errors = true" which does not seem to
>>> be caught, according to the code.
>>> 2- I don't see any logic for recovery based on individual items. Am I
>>> missing anything here?
>>>
>>>
>>> --
>>> Ramin
>>>
>>> _______________________________________________
>>> Heka mailing list
>>> [email protected]
>>> https://mail.mozilla.org/listinfo/heka
>>>
>>>
>>
>
>
> --
> Ramin
>
