Dear PETSc community,

I have some questions about the DMSWARM structure and a request for suggestions on how to speed up a task that my code relies on.
I am developing a numerical method similar to PIC, MPM, etc., with the difference that all cells of the DM mesh must contain particles. In this method, about 10 particles are placed in each cell (element); the particles are displaced at each time step and carry information that must be projected onto/from the mesh. Searching the internet I came across the DMSWARM structure. Indeed, it allows easy initialization of a cloud of particles, easy detection of the element that holds a specific particle, and easy particle-mesh projection, so it is quite convenient for my purpose. However, in a test implementation (based on the PETSc examples) I notice a significant bottleneck (IMHO) in the DMSwarmMigrate function (line 198 of the attached code). Since I am not sure whether my implementation is correct or optimal, I am attaching a sample code.

Some extra info: 3D implementation (tetra and hexa); CPU runs with MPICH and Open MPI; DMPlex built from a gmsh file (also attached).

The following tables were obtained with the attached code using 9329 tetra elements and 128085 particles (the goal, however, is to reach millions of elements and particles).

TABLE 1: same mesh and particles, varying number of processes

  processes                       1        2        4        8       16
  particles per process      128000    64000    32000    16000     8000
  time per DMSwarmMigrate    200 s.    90 s.    17 s.     5 s.     3 s.

TABLE 2: same number of processes = 16 in all cases

  elements                     9329    18730    45432
  particles                  128085   260554   631430
  time per DMSwarmMigrate      3 s.    13 s.    88 s.

My questions:

- Is this performance of DMSwarmMigrate normal? (I have the feeling that it is quite time consuming, e.g. compared with the time spent by KSPSolve in linear elasticity with an iterative solver.)

- The DMSwarm functionality I use is: create particle fields, associate them with a DMPlex, and obtain Vecs whose size changes dynamically as a result of the migration. Is the attached code optimal for this purpose? Note that I did some tests with the DMSwarmSetLocalSizes function but saw no difference. I also set DMSWARM_MIGRATE_BASIC, which is faster but did not relocate the particles.

If the answer to both questions is YES, I would appreciate any suggestions on how to reach my target (millions of particles).

Thank you very much for taking the time to read this email.

Kind regards,
Eduardo.
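P.S. In case it clarifies what I tried: the DMSwarmSetLocalSizes and DMSWARM_MIGRATE_BASIC tests I mention above looked roughly like the sketch below. The particles-per-cell estimate (10), the buffer value, and the variable names are just my guesses for the sketch, so this only shows which calls I used, not a recommended setting.

  /* sketch of the pre-sizing / migration-type tests mentioned above */
  PetscInt cStart, cEnd, nCellsLocal, nParticlesLocal;
  PetscCall(DMPlexGetHeightStratum(celldm, 0, &cStart, &cEnd));
  nCellsLocal     = cEnd - cStart;
  nParticlesLocal = 10 * nCellsLocal;                           /* ~10 particles per cell (my estimate) */
  PetscCall(DMSwarmSetLocalSizes(swarm, nParticlesLocal, 100)); /* 100 = buffer guess for incoming particles */

  /* the "basic" migration test (faster, but particles were not relocated to the right cells) */
  PetscCall(DMSwarmSetMigrateType(swarm, DMSWARM_MIGRATE_BASIC));
  PetscCall(DMSwarmMigrate(swarm, PETSC_TRUE));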
#include <memory>
#include <petsc.h>

int main(int argc, char* argv[])
{
  // FIRST, SET THE GMSH FILE NAME
  const char *gmshFilePath = "../mesh/mesh_tetra.msh";

  // =======================
  // INITIALIZE PETSC
  // =======================
  PetscMPIInt parallel_size;
  PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
  PetscCall(MPI_Comm_size(PETSC_COMM_WORLD, &parallel_size));
  PetscPrintf(PETSC_COMM_WORLD, "PETSc MPI size: %d", parallel_size);

  // =======================
  // TIME MEASURE
  // =======================
  PetscLogDouble time_total = 0.0;
  PetscLogDouble time_total_check;
  PetscTime(&time_total_check);
  PetscLogDouble time_migrate = 0.0;
  PetscLogDouble time_migrate_check;
  PetscLogDouble time_addPoints = 0.0;
  PetscLogDouble time_addPoints_check;
  PetscLogDouble time_export = 0.0;
  PetscLogDouble time_export_check;

  // =======================
  // create DMPlex structure
  // (from gmsh file)
  // =======================
  PetscInt dim = 3;
  DM       celldm, distributedMesh = NULL;

  // LOAD THE MESH FROM A GMSH FILE
  PetscCall(DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, gmshFilePath, PETSC_TRUE, &celldm));
  PetscCall(DMView(celldm, PETSC_VIEWER_STDOUT_WORLD));

  // Distribute mesh over processes
  PetscCall(DMPlexDistribute(celldm, 0, NULL, &distributedMesh));
  if (distributedMesh) {
    PetscCall(DMDestroy(&celldm));
    celldm = distributedMesh;
  }
  PetscCall(DMSetFromOptions(celldm));

  // =======================
  // CREATE SECTION AT THE
  // NODE LEVEL (1 COMPONENT)
  // =======================
  {
    PetscInt     numComp[] = {1};
    PetscInt     numDof[]  = {1, 0, 0}; /* vert, edge, cell */
    PetscInt     numBC     = 0;
    PetscSection section;
    PetscCall(DMPlexCreateSection(celldm, NULL, numComp, numDof, numBC, NULL, NULL, NULL, NULL, &section));
    PetscCall(DMSetLocalSection(celldm, section));
    PetscCall(PetscSectionDestroy(&section));
  }
  PetscCall(DMSetUp(celldm));

  // =======================
  // EXPORT THE MESH
  // (and measure time)
  // =======================
  {
    PetscTime(&time_export_check);
    PetscViewer viewer;
    PetscCall(PetscViewerCreate(PETSC_COMM_WORLD, &viewer));
    PetscCall(PetscViewerSetType(viewer, PETSCVIEWERVTK));
    PetscCall(PetscViewerFileSetMode(viewer, FILE_MODE_WRITE));
    PetscCall(PetscViewerFileSetName(viewer, "dmPlex_mesh.vtk"));
    PetscCall(DMView(celldm, viewer));
    PetscCall(PetscViewerDestroy(&viewer));
    PetscLogDouble tNow;
    PetscTime(&tNow);
    time_export += tNow - time_export_check;
  }

  // =======================
  // create swarm structure
  // =======================
  DM swarm;
  PetscCall(DMCreate(PETSC_COMM_WORLD, &swarm));
  PetscCall(PetscObjectSetName((PetscObject)swarm, "Swarm"));
  PetscCall(DMSetType(swarm, DMSWARM));
  PetscCall(DMSetDimension(swarm, dim));
  PetscCall(DMSwarmSetType(swarm, DMSWARM_PIC));
  PetscCall(DMSwarmSetCellDM(swarm, celldm));

  // =======================
  // Register two scalar
  // fields in swarm
  // =======================
  PetscCall(DMSwarmRegisterPetscDatatypeField(swarm, "p_viscosity", 1, PETSC_DOUBLE));
  PetscCall(DMSwarmRegisterPetscDatatypeField(swarm, "p_rank", 1, PETSC_DOUBLE));
  PetscCall(DMSwarmFinalizeFieldRegister(swarm));
  // I'm not sure how to set/use the following command:
  // PetscCall(DMSwarmSetLocalSizes(swarm, 1, 1));

  // =======================
  // Insert swarm coordinates
  // cell-wise
  // =======================
  PetscCall(DMSwarmInsertPointsUsingCellDM(swarm, DMSWARMPIC_LAYOUT_GAUSS, 2));

  // =======================
  // fields to export and
  // view "WORLD" the DMs
  // =======================
  const char *fieldnames[] = {"p_viscosity", "p_rank"};

  // report DMs
  PetscPrintf(PETSC_COMM_WORLD, "\n\n ---- celldm ----- \n");
  PetscCall(DMView(celldm, PETSC_VIEWER_STDOUT_WORLD));
  PetscPrintf(PETSC_COMM_WORLD, "\n\n ----- swarm ----- \n");
  PetscCall(DMView(swarm, PETSC_VIEWER_STDOUT_WORLD));

  // =======================
  // =======================
  // LOOP FOR DISPLACING THE
  // SWARM PARTICLES
  // =======================
  // =======================
  PetscInt cStart, cEnd;
  PetscCall(DMPlexGetHeightStratum(celldm, 0, &cStart, &cEnd));

  // for exporting to paraview
  char filename[PETSC_MAX_PATH_LEN];

  PetscPrintf(PETSC_COMM_WORLD, "\n ==============================\n STARTING PARTICLE DISPLACEMENT \n ==============================");

  for (PetscInt t = 0; t < 4; ++t) {
    // -------------------------------------------------
    // get the particle coordinates and fields
    // -------------------------------------------------
    PetscReal   *coords;
    PetscScalar *w, *r;
    PetscCall(DMSwarmSortGetAccess(swarm));
    PetscCall(DMSwarmGetField(swarm, DMSwarmPICField_coor, NULL, NULL, (void **)&coords));
    PetscCall(DMSwarmGetField(swarm, "p_viscosity", NULL, NULL, (void **)&w));
    PetscCall(DMSwarmGetField(swarm, "p_rank", NULL, NULL, (void **)&r));

    PetscMPIInt rank;
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    // -------------------------------------------------
    // for each particle in the cell:
    //   displace the "x" coordinate
    //   give the viscosity a value equal to the x coordinate
    //   annotate the rank
    // -------------------------------------------------
    for (PetscInt cell = cStart; cell < cEnd; ++cell) {
      PetscInt *pidx;
      PetscInt  Np, p;
      PetscCall(DMSwarmSortGetPointsPerCell(swarm, cell, &Np, &pidx));
      for (p = 0; p < Np; ++p) {
        const PetscInt   idx = pidx[p];
        const PetscReal *c   = &coords[idx * dim];
        w[pidx[p]] = c[0];
        r[pidx[p]] = (PetscScalar)rank;
        coords[pidx[p] * dim + 0] = coords[pidx[p] * dim + 0] + 0.1 / 3;
      }
      PetscCall(PetscFree(pidx));
    }
    PetscCall(DMSwarmRestoreField(swarm, DMSwarmPICField_coor, NULL, NULL, (void **)&coords));
    PetscCall(DMSwarmRestoreField(swarm, "p_viscosity", NULL, NULL, (void **)&w));
    PetscCall(DMSwarmRestoreField(swarm, "p_rank", NULL, NULL, (void **)&r));
    PetscPrintf(PETSC_COMM_WORLD, "\n ---------------- \n SWARM at time: %" PetscInt_FMT "\n", t);
    PetscCall(DMSwarmSortRestoreAccess(swarm));

    // -------------------------------------------------
    // MIGRATE PARTICLES AND MEASURE THE TIME
    // -------------------------------------------------
    PetscTime(&time_migrate_check);
    PetscCall(DMSwarmMigrate(swarm, PETSC_TRUE));
    PetscLogDouble tNow;
    PetscTime(&tNow);
    time_migrate += tNow - time_migrate_check;

    // -------------------------------------------------
    // update the rank (after migration)
    // -------------------------------------------------
    PetscCall(DMSwarmSortGetAccess(swarm));
    PetscCall(DMSwarmGetField(swarm, "p_rank", NULL, NULL, (void **)&r));
    for (PetscInt cell = cStart; cell < cEnd; ++cell) {
      PetscInt *pidx;
      PetscInt  Np, p;
      PetscCall(DMSwarmSortGetPointsPerCell(swarm, cell, &Np, &pidx));
      for (p = 0; p < Np; ++p) {
        r[pidx[p]] = (PetscScalar)rank;
      }
      PetscCall(PetscFree(pidx));
    }
    PetscCall(DMSwarmRestoreField(swarm, "p_rank", NULL, NULL, (void **)&r));
    PetscCall(DMSwarmSortRestoreAccess(swarm));

    // -------------------------------------------------
    // export the particles field and measure time
    // -------------------------------------------------
    PetscTime(&time_export_check);
    PetscCall(DMView(swarm, PETSC_VIEWER_STDOUT_WORLD));
    PetscCall(PetscSNPrintf(filename, PETSC_MAX_PATH_LEN - 1, "p_migrated_step_%" PetscInt_FMT ".xmf", t));
    PetscCall(DMSwarmViewFieldsXDMF(swarm, filename, 2, fieldnames));
    PetscTime(&tNow);
    time_export += tNow - time_export_check;
  }

  // =======================
  // end check swarm
  // =======================

  // destroy
  PetscCall(DMDestroy(&celldm));
  PetscCall(DMDestroy(&swarm));

  // TIME
  PetscLogDouble tNow;
  PetscTime(&tNow);
  time_total += tNow - time_total_check;
  PetscPrintf(PETSC_COMM_WORLD, "\n Total : %10.4f s. : 100 %%", time_total);
  PetscPrintf(PETSC_COMM_WORLD, "\n ╟> migrate : %10.4f s. : ╟> %6.3f %%", time_migrate, time_migrate / time_total * 100.0);
  PetscPrintf(PETSC_COMM_WORLD, "\n ╟> addPoints : %10.4f s. : ╟> %6.3f %%", time_addPoints, time_addPoints / time_total * 100.0);
  PetscPrintf(PETSC_COMM_WORLD, "\n ╟> export paraview : %10.4f s. : ╟> %6.3f %%", time_export, time_export / time_total * 100.0);

  // =======================
  // FINALIZE PETSC & CODE
  // =======================
  PetscCall(PetscFinalize());
  return 0;
}
Attachment: geometry_tetra.geo (binary data)