Dear PETSc community,

I have some questions about the DMSWARM structure and a request for
suggestions to speed up a task that my code requires.

I am developing a numerical method similar to PIC, MPM, etc.; however,
all cells of the DM mesh must contain particles. In this method, about
10 particles are placed in each cell (element); the particles are
displaced at each time step and carry information that must be projected
onto/from the mesh.

Searching online, I came across the DMSWARM structure. Indeed, it allows
easy initialization of a cloud of particles, easy detection of the element
that holds a specific particle (see the sketch below), and easy
particle-mesh projection, so it is quite convenient for my purpose.
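
For example, to find the element that holds each particle one can read the
built-in cell-id field. This is only a minimal sketch, assuming a DMSWARM_PIC
swarm named swarm with a cell DM already attached (the attached code instead
goes cell-by-cell with DMSwarmSortGetPointsPerCell):

    PetscInt  npLocal, p;
    PetscInt *cellid;
    PetscCall(DMSwarmGetLocalSize(swarm, &npLocal));
    PetscCall(DMSwarmGetField(swarm, DMSwarmPICField_cellid, NULL, NULL, (void **)&cellid));
    for (p = 0; p < npLocal; ++p) {
      /* cellid[p] is the local index of the cell that currently holds particle p */
    }
    PetscCall(DMSwarmRestoreField(swarm, DMSwarmPICField_cellid, NULL, NULL, (void **)&cellid));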

However, in a test implementation (based on the PETSc examples), I noticed
a significant bottleneck (IMHO) in the DMSwarmMigrate function (line 198 of
the attached code). Since I am not sure whether my implementation is correct
or optimal, I am attaching a sample code. Some extra info: 3D implementation
(tetra and hexa); CPU, MPICH & Open MPI; DMPlex (from a Gmsh file, also
attached). The following tables were obtained with the attached code using
9329 tetra elements and 128085 particles (the eventual goal is to reach
millions of elements and particles).

TABLE 1 : same mesh and particles, varying the number of processes

  processes                    :      1      2      4      8     16
  particles per process        : 128000  64000  32000  16000   8000
  time per DMSwarmMigrate call :  200 s   90 s   17 s    5 s    3 s


TABLE 2 : same number of processes = 16 in all cases, varying the mesh size

  elements                     :   9329   18730   45432
  particles                    : 128085  260554  631430
  time per DMSwarmMigrate call :    3 s    13 s    88 s


My questions:

- Is this performance of DMSwarmMigrate normal? (I have the feeling that it
is quite time consuming, e.g. compared with the time spent by KSPSolve for
linear elasticity with an iterative solver.)

- The DMSwarm functionality I use is: creating particle fields, associating
the swarm with a DMPlex, and obtaining Vecs whose size changes dynamically
as a result of the migration (see the sketch below). ---> Is the
implementation in the attached code optimal for this purpose?
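
By "Vecs that are dynamic in size" I mean something like the following
minimal sketch (using the "p_viscosity" field registered in the attached
code); as far as I understand, the Vec created this way matches the current
particle count, so it must be re-created after each DMSwarmMigrate:

    Vec viscosity;
    PetscCall(DMSwarmCreateGlobalVectorFromField(swarm, "p_viscosity", &viscosity));
    /* ... use the Vec: its layout follows the current particle distribution ... */
    PetscCall(DMSwarmDestroyGlobalVectorFromField(swarm, "p_viscosity", &viscosity));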

Note that I did some tests with the DMSwarmSetLocalSizes function but saw
no difference. I also tried DMSWARM_MIGRATE_BASIC, which is faster but did
not relocate the particles; roughly, what I tried is sketched below.
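
Concretely, what I tried looks roughly like the following (a sketch; the
sizes passed to DMSwarmSetLocalSizes are only illustrative, about 10
particles per local cell plus a small buffer):

    /* pre-allocate storage for the expected local particle count (values illustrative) */
    PetscCall(DMSwarmSetLocalSizes(swarm, 10 * (cEnd - cStart), 128));
    /* DMSWARM_MIGRATE_BASIC redistributes particles using the stored rank field only (no point location) */
    PetscCall(DMSwarmSetMigrateType(swarm, DMSWARM_MIGRATE_BASIC));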

If the answer to both questions is YES, I would appreciate any suggestions
on how I can reach my target (millions of particles).

Thank you very much for taking the time to read this email.

Kind regards,

Eduardo.
#include <petsc.h>

int main(int argc, char* argv[])
{
    // FIRST, SET THE GMSH FILE NAME                      
    const char * gmshFilePath = "../mesh/mesh_tetra.msh"; 
    
    // =======================
    //    INITIALIZE PETSC
    // =======================
    PetscMPIInt parallel_size;
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCallMPI(MPI_Comm_size(PETSC_COMM_WORLD, &parallel_size));
    PetscCall(PetscPrintf(PETSC_COMM_WORLD, "PETSc MPI size: %d\n", parallel_size));

    // =======================
    //      TIME MEASURE
    // =======================
    PetscLogDouble  time_total = 0.0 ;
    PetscLogDouble  time_total_check ;
    PetscTime(&time_total_check);

    PetscLogDouble  time_migrate = 0.0 ;
    PetscLogDouble  time_migrate_check ;

    PetscLogDouble  time_addPoints = 0.0 ;
    PetscLogDouble  time_addPoints_check ;

    PetscLogDouble  time_export = 0.0 ;
    PetscLogDouble  time_export_check ;
    
    // =======================
    // create DMPlex structure
    //    (from gmsh file)
    // =======================
    PetscInt dim  = 3;
    DM  celldm, distributedMesh = NULL;

    // LOAD THE MESH FROM A GMSH FILE
    PetscCall(DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, gmshFilePath , PETSC_TRUE, &celldm));
    PetscCall(DMView(celldm, PETSC_VIEWER_STDOUT_WORLD));
    
    // Distribute mesh over processes
    PetscCall(DMPlexDistribute(celldm, 0, NULL, &distributedMesh));
    if (distributedMesh) {
      PetscCall(DMDestroy(&celldm));
      celldm = distributedMesh;
    }
    
    PetscCall(DMSetFromOptions(celldm));

    // =======================
    //  CREATE SECTION AT THE
    // NODE LEVEL (1 COMPONENT)
    // =======================
    {
      PetscInt     numComp[] = {1};
      PetscInt     numDof[]  = {1, 0, 0, 0}; /* dofs per vertex, edge, face, cell (dim = 3) */
      PetscInt     numBC     = 0;
      PetscSection section;

      PetscCall(DMPlexCreateSection(celldm, NULL, numComp, numDof, numBC, NULL, NULL, NULL, NULL, &section));
      PetscCall(DMSetLocalSection(celldm, section));
      PetscCall(PetscSectionDestroy(&section));
    }

    PetscCall(DMSetUp(celldm));

    // =======================
    //     EXPORT THE MESH 
    //    (and measure time)
    // =======================
    {
      PetscTime(&time_export_check);

      PetscViewer viewer;
      PetscCall(PetscViewerCreate(PETSC_COMM_WORLD, &viewer));
      PetscCall(PetscViewerSetType(viewer, PETSCVIEWERVTK));
      PetscCall(PetscViewerFileSetMode(viewer, FILE_MODE_WRITE));
      PetscCall(PetscViewerFileSetName(viewer, "dmPlex_mesh.vtk"));
      PetscCall(DMView(celldm, viewer));
      PetscCall(PetscViewerDestroy(&viewer));

      PetscLogDouble tNow; 
      PetscTime(&tNow); 
      time_export += tNow - time_export_check;
    }

    // =======================
    // create swarm structure
    // =======================
    DM swarm;
    PetscCall(DMCreate(PETSC_COMM_WORLD, &swarm));
    PetscCall(PetscObjectSetName((PetscObject)swarm, "Swarm"));
    PetscCall(DMSetType(swarm, DMSWARM));
    PetscCall(DMSetDimension(swarm, dim));
    
    PetscCall(DMSwarmSetType(swarm, DMSWARM_PIC));
    PetscCall(DMSwarmSetCellDM(swarm, celldm));

    // =======================
    //   Register two scalar 
    //     fields in swarm 
    // =======================
    PetscCall(DMSwarmRegisterPetscDatatypeField(swarm, "p_viscosity", 1, PETSC_DOUBLE));
    PetscCall(DMSwarmRegisterPetscDatatypeField(swarm, "p_rank", 1, PETSC_DOUBLE));
    PetscCall(DMSwarmFinalizeFieldRegister(swarm));

    // I'm not sure how to set/use the following command:
    // PetscCall(DMSwarmSetLocalSizes(swarm, 1, 1));
    
    // =======================
    // Insert swarm coordinates 
    //        cell-wise 
    // =======================
    PetscCall(DMSwarmInsertPointsUsingCellDM(swarm, DMSWARMPIC_LAYOUT_GAUSS, 2));
    
    // =======================
    //  fields to export and
    //  view "WORLD" the DMs
    // =======================
    const char *fieldnames[] = {"p_viscosity", "p_rank"};
    
    // report DMs
    PetscPrintf(PETSC_COMM_WORLD,"\n\n ---- celldm ----- \n");
    PetscCall(DMView(celldm, PETSC_VIEWER_STDOUT_WORLD));

    PetscPrintf(PETSC_COMM_WORLD,"\n\n ----- swarm ----- \n");
    PetscCall(DMView(swarm, PETSC_VIEWER_STDOUT_WORLD));

    // =======================
    // =======================
    // LOOP FOR DISPLACING THE
    //    SWARM PARTICLES 
    // =======================
    // =======================
    
    PetscInt cStart, cEnd;
    PetscCall(DMPlexGetHeightStratum(celldm, 0, &cStart, &cEnd));

    // for exporting to paraview
    char filename[PETSC_MAX_PATH_LEN];

    PetscPrintf(PETSC_COMM_WORLD,"\n ==============================\n STARTING PARTICLE DISPLACEMENT \n ==============================");
    for(PetscInt t = 0 ; t < 4 ; ++t)
    {
      // -------------------------------------------------
      // get the particle coordinates and fields
      // -------------------------------------------------
      
      PetscReal   *coords;
      PetscScalar *w,*r;
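      // DMSwarmSortGetAccess builds the cell -> particle index map used by
      // DMSwarmSortGetPointsPerCell below; it must be paired with DMSwarmSortRestoreAccess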
      PetscCall(DMSwarmSortGetAccess(swarm));
      PetscCall(DMSwarmGetField(swarm, DMSwarmPICField_coor, NULL, NULL, (void **)&coords));
      PetscCall(DMSwarmGetField(swarm, "p_viscosity", NULL, NULL, (void **)&w));
      PetscCall(DMSwarmGetField(swarm, "p_rank", NULL, NULL, (void **)&r));

      PetscMPIInt    rank;
      PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

      // -------------------------------------------------
      // to each particle in the cell:
      //    displace "x" coordinate
      //    give to viscosity a value equal to the x coordinate
      //    anotate the rank 
      // -------------------------------------------------

      for (PetscInt cell = cStart; cell < cEnd; ++cell) {
        PetscInt *pidx;
        PetscInt  Np, p;

        PetscCall(DMSwarmSortGetPointsPerCell(swarm, cell, &Np, &pidx));
        
        for (p = 0; p < Np; ++p)
        {
          const PetscInt   idx = pidx[p];
          const PetscReal *c   = &coords[idx * dim];
          w[idx] = c[0];                        // viscosity := x coordinate
          r[idx] = (PetscScalar) rank;          // record the owning rank
          coords[idx * dim + 0] += 0.1 / 3.0;   // displace the particle in x
        }

        PetscCall(PetscFree(pidx));
      }

      PetscCall(DMSwarmRestoreField(swarm, DMSwarmPICField_coor, NULL, NULL, (void **)&coords));
      PetscCall(DMSwarmRestoreField(swarm, "p_viscosity", NULL, NULL, (void **)&w));
      PetscCall(DMSwarmRestoreField(swarm, "p_rank", NULL, NULL, (void **)&r));

      PetscCall(PetscPrintf(PETSC_COMM_WORLD, "\n ---------------- \n SWARM at time step: %" PetscInt_FMT "\n", t));
      PetscCall(DMSwarmSortRestoreAccess(swarm));

      // -------------------------------------------------    
      // MIGRATE PARTICLES AND MEASURE THE TIME
      // -------------------------------------------------
      PetscTime(&time_migrate_check);
      PetscCall(DMSwarmMigrate(swarm, PETSC_TRUE));
      PetscLogDouble tNow; 
      PetscTime(&tNow); 
      time_migrate += tNow - time_migrate_check;

      // -------------------------------------------------    
      // update the rank (after migration)
      // -------------------------------------------------
      PetscCall(DMSwarmSortGetAccess(swarm));
      PetscCall(DMSwarmGetField(swarm, "p_rank", NULL, NULL, (void **)&r));
      for (PetscInt cell = cStart; cell < cEnd; ++cell) {
        PetscInt *pidx;
        PetscInt  Np, p;
        PetscCall(DMSwarmSortGetPointsPerCell(swarm, cell, &Np, &pidx));
        for (p = 0; p < Np; ++p) {
          r[pidx[p]] = (PetscScalar) rank;
        }
        PetscCall(PetscFree(pidx));
      }

      PetscCall(DMSwarmRestoreField(swarm, "p_rank", NULL, NULL, (void **)&r));
      PetscCall(DMSwarmSortRestoreAccess(swarm));

      // -------------------------------------------------    
      // export the particles field and measure time
      // -------------------------------------------------
      PetscTime(&time_export_check);
      PetscCall(DMView(swarm, PETSC_VIEWER_STDOUT_WORLD));
      PetscCall(PetscSNPrintf(filename, PETSC_MAX_PATH_LEN - 1, "p_migrated_step_%" PetscInt_FMT ".xmf", t));
      PetscCall(DMSwarmViewFieldsXDMF(swarm, filename, 2, fieldnames));
      PetscTime(&tNow); 
      time_export += tNow - time_export_check;
    }

    // =======================
    // end check swarm
    // =======================


    // destroy (swarm first, since it references the cell DM)
    PetscCall(DMDestroy(&swarm));
    PetscCall(DMDestroy(&celldm));

    // TIME
    PetscLogDouble tNow; 
    PetscTime(&tNow); 
    time_total += tNow - time_total_check;

    PetscPrintf(PETSC_COMM_WORLD,"\n Total              : %10.4f s. :  100 %%", time_total);
    PetscPrintf(PETSC_COMM_WORLD,"\n ╟> migrate         : %10.4f s. :  ╟> %6.3f %%", time_migrate, time_migrate/time_total*100.0);
    PetscPrintf(PETSC_COMM_WORLD,"\n ╟> addPoints       : %10.4f s. :  ╟> %6.3f %%", time_addPoints, time_addPoints/time_total*100.0);
    PetscPrintf(PETSC_COMM_WORLD,"\n ╟> export paraview : %10.4f s. :  ╟> %6.3f %%", time_export, time_export/time_total*100.0);
        
    // =======================
    //  FINALIZE PETSC & CODE
    // =======================
    PetscCall(PetscFinalize());

    return 0;
}

Attachment: geometry_tetra.geo