Am 13.05.2012 10:03, schrieb Zhouyi Zhou:
> hi all
>   
>   sometimes, qemu/kvm-0.1x will hang in endless loop in 
> qcow2_alloc_cluster_offset.
>   after some investigation, I found that:
>   in function posix_aio_process_queue(void *opaque)
> 440             ret = qemu_paio_error(acb);
> 441             if (ret == ECANCELED) {
> 442                 /* remove the request */
> 443                 *pacb = acb->next;
> 444                 qemu_aio_release(acb);
> 445                 result = 1;
> 446             } else if (ret != EINPROGRESS) {
>   in line 444 acb got released but acb->common.opaque does not.
> which will be released via guest OS via ide_dma_cancel which 
> will in term call qcow_aio_cancel which does not check its argument
> is in flight list or not.
>   The fix is as follows: (debian 6's qemu-kvm-0.12.5)
> #######################################
> --- block/qcow2.h~      2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2.h       2012-05-13 15:51:39.000000000 +0800
> @@ -143,6 +143,7 @@
>      QLIST_HEAD(QCowAioDependencies, QCowAIOCB) dependent_requests;
>  
>      QLIST_ENTRY(QCowL2Meta) next_in_flight;
> +    int inflight;       
>  } QCowL2Meta;
> --- block/qcow2.c~  2012-05-13 15:57:09.000000000 +0800
> +++ block/qcow2.c       2012-05-13 15:57:24.000000000 +0800
> @@ -349,6 +349,10 @@
>      QCowAIOCB *acb = (QCowAIOCB *)blockacb;
>      if (acb->hd_aiocb)
>          bdrv_aio_cancel(acb->hd_aiocb);
> +    if (acb->l2meta.inflight) {
> +        QLIST_REMOVE(&acb->l2meta, next_in_flight);
> +       acb->l2meta.inflight = 0;
> +    }
>      qemu_aio_release(acb);
>  }
>  
> @@ -506,6 +510,7 @@
>      acb->n = 0;
>      acb->cluster_offset = 0;
>      acb->l2meta.nb_clusters = 0;
> +    acb->l2meta.inflight = 0;
>      QLIST_INIT(&acb->l2meta.dependent_requests);
>      return acb;
>  }
> @@ -534,6 +539,7 @@
>      /* Take the request off the list of running requests */
>      if (m->nb_clusters != 0) {
>          QLIST_REMOVE(m, next_in_flight);
> +       m->inflight = 0;
>      }
>  
>      /*
> @@ -632,6 +638,7 @@
>  fail:
>      if (acb->l2meta.nb_clusters != 0) {
>          QLIST_REMOVE(&acb->l2meta, next_in_flight);
> +       acb->l2meta.inflight  = 0;
>      }
>  done:
>      if (acb->qiov->niov > 1)
> --- block/qcow2-cluster.c~      2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2-cluster.c       2012-05-13 15:53:53.000000000 +0800
> @@ -827,6 +827,7 @@
>      m->offset = offset;
>      m->n_start = n_start;
>      m->nb_clusters = nb_clusters;
> +    m->inflight = 1;
>  
>  out:
>      m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
> 
>  Thanks for investigation
> Zhouyi

The patch looks reasonable to me. Note however that while it fixes the
hang, it still causes cluster leaks. I'm not sure if someone is
interested in picking these up for old stable releases. Andreas, I think
you were going to take 0.15? The first version that doesn't have the
problem is 1.0.

Kevin

Reply via email to