1. Code Name: M3D
2. Code Category: Transport, Equilibrium, MHD, GK, Neutrals, RF (etc) Nonlinear MHD
3. Primary Developer Joshua Breslau
4. Other Developers and Users Developer/users: Wonchull Park (ret.), Hank Strauss, Linda Sugiyama, Guo-Yong Fu, Jin Chen Other users: Roberto Paccagnella, Jianying Lang, Feng Wang, CPES collaboration
5. Short description (one line if possible): M3D is a massively parallel multi-level 3D code that computes the initial value solutions of the extended MHD equations in toroidal geometry with multiple options, including ideal and resistive MHD, two-fluid, and kinetic/MHD hybrid model with gyrokinetic energetic particles.
6. Computer Language (Fortran77, Fortran90, C, C++, etc) and approx # of lines: Mixture of C (mesh generation, finite element operators, PETSc interface) and Fortran 90 (physics driver) routines. Approximately 55,000 lines of C and 61,000 lines of Fortran code.
7. Type of input required (including files and/or output from other codes). Is there any special input preparation system (e.g., GUI): Requires an input MHD equilibrium in one of several formats: old VMEC ascii (containing 3D R,Z, and J as double Fourier series in theta and phi as functions of poloidal flux s, with pressure and flux as 1D functions of s; VMEC-"light", produced by i2mex, containing 2D profiles; eqdsk.cdf equilibrium produced by JSOLVER; or free-boundary equilibrium + vacuum mesh produced by M3D-OMP (omp.out). Also requires a "wxy" file containing input namelists, and optionally a "config.dat"file containing mesh dimensions and domain decomposition (which may alternately be specified on the command line) and an "options" file specifying linear preconditioner and solver options for PETSc (likewise).
8. Type of output produced (including files that are read by other codes and sizes of large files and synthetic diagnostics):
Some diagnostic output is sent to stdout/stderr during the run, typically including description of initial state, followed by time history of poloidal kinetic energy and growth rate. (Typical file size is a few 100 kB).
Primary output mechanism is HDF5 file 3d.001.h5, containing 3D field data for up to 13 variables (including vector B field) sampled at a number of time steps. This is typically visualized using AVS/Express or Visit. There are also a number of post-processors for analyzing data. Typical file sizes range from 10s of MB up to 1 GB.
M3D also generates unformatted binary checkpoint files at specified intervals for restart purposes, up to a maximum of three for a single run; each is approximately 50% larger than a single time slice of the HDF5 file.
9. Describe any postprocessors which read the output files:
All postprocessors are optional.
poincareng12, qprofileng12: perform multiple field line traces to compute PoincarÂŽ maps, q profiles respectively.
surfit12, remap: jointly used to compute first-return maps for equilibrium surfaces based on perturbed field.
wallforce, wallforce2: Compute vessel forces and currents based on results of resistive wall calculations.
mode27: Fourier analyze total kinetic and magnetic energies
extractprof: extract a 1D profile of a single variable at specified z and phi (or a toroidal average) as a text file.
getfw: Use field line traces, q profile diagnostic to estimate width of island of specified helicity.
getprim: Compute primitive variables (Cylindrical components of B field, current density, and velocity) as well as pressure (if HDF5 contains temperature) or temperature (if HDF5 contains pressure), output to a 2nd HDF5 file.
diffHDF5: Compute the difference between all variables in two time slices in the same or different HDF5 files that have the same geometry, output to a new HDF5 file. (Can also be used to extract a single time slice from a file containing multiple slices).
concatHDF5: Concatenate several HDF5 files from separate runs in the same series together into one file for animation.
HDF2silo: Convert HDF5 file into silo format for easy visualization with Visit.
sxr: Simulated soft X-ray diagnostic; integrates a kernel along several chords at constant phi through the plasma at each time slice.
10. Status and location of code input/output documentation: Code documentation is incomplete. The most complete instructions are currently in the various documents at http://w3.pppl.gov/m3d/reference.html.
11. Code web site? http://w3.pppl.gov/m3d/index.php
12. Is code under version control? What system? Is automated regression testing performed? Yes. Subversion. No.
13. One to two paragraph description of equations solved and functionality including what discretizations are used in space and time:
The basic equations solved are those of resistive MHD: continuity, momentum, Faraday's and Ohm's laws, Ampere's law, and an equation of state, typically adiabatic. These are used to advance normalized scalar variables representing number density, a poloidal velocity stream function, a compressible poloidal velocity potential, toroidal velocity, poloidal magnetic flux, toroidal magnetic field, and total pressure. Terms may be dropped from these equations to give ideal or reduced MHD. Additional terms (ion gyroviscous and/or Hall) or equations (electron pressure) may be added to give two-fluid physics. In the case of a kinetic hot ion population, moments are taken of the distribution function to compute the divergence of the hot ion pressure tensor, which appears in the momentum equation. Parallel heat conduction is usually modeled with a diffusive artificial sound equation for temperature in the direction parallel to the magnetic field, advanced on substeps of the main time advance, though a fully implicit anisotropic heat conduction operator is now available as well.
The M3D mesh uses cylindrical coordinates (R,\phi,z) and a toroidal topology. The 2D spatial discretization in each constant-\phi plane is by linear finite elements with C^0 continuity on an unstructured triangular mesh. Higher-order finite element options exist, but are considered too time-consuming for routine use by most M3D users. Discretization in the toroidal direction is performed using either high-order finite differences or a pseudospectral option. Time discretization can be either first or second order, and is explicit with the exception of the compressional Alfven wave and the diffusive (parabolic) terms, which are treated implicitly and advanced separately. An option exists to treat the dominant shear Alfven wave terms implicitly as well, which typically allows the time step to be increased by approximately a factor of three before numerical instability sets in.
14. What modes of operation of code are there (e.g.: linear, nonlinear, reduced models, etc ): M3D can be run in linear, nonlinear, or equilibrium solver mode. Orthogonal to these are the tokamak and stellarator options. The physics model can be reduced MHD, ideal MHD, resistive MHD (with artificial sound or anisotropic heat conduction), two-fluid MHD, or a hybrid of MHD with a gyrokinetically described population of energetic ions. Calculations can be fixed- or free-boundary, with ideal or resistive walls. There is also considerably flexibility in choices of sources, sinks, external perturbations, etc.
15. Journal references describing code: W. Park, et al., Phys. Plasmas 6, 1796 (1999); G.Y. Fu, et al., Phys. Plasmas 13 052517 (2006).
16. Codes it is similar to and differences (public version): Similar to M3D-C1, NIMROD. Unlike M3D-C1, contains a hybrid model. Unlike NIMROD, uses finite differences in toroidal direction. Unlike both, uses linear finite elements with C0 continuity, is not fully implicit in time, and has stellarator capability.
17. Results of code verification and convergence studies (with references): The calculated growth rate and mode frequency of an n=1 internal kink mode and fishbone compare well with the NOVA-K code. [G.Y. Fu, et al., Phys. Plasmas 13, 052517 (2006).] The simulated sawteeth in CDX-U plasmas agree well with those of NIMROD with respect to nonlinear evolution of kinetic energy. [J. Breslau, et al., Phys. Plasmas 14, 056105 (2007)].
18. Present and recent applications and validation exercises (with references as available): NSTX non-resonant internal kink mode studies [J.A. Breslau, et al., in Proceedings of the 23rd IAEA Conference, THS/P2-03 2010] Nonlinear modeling of ELMs [L.E. Sugiyama and H.R. Strauss, Phys. Plasmas 17, 062505 (2010)] Disruption modeling for ITER [H.R. Strauss, et al., Phys. Plasmas 17, 082505 (2010)] Nonlinear hybrid simulations of TAE modes with source and sink [J. Lang, et al., Phys. Plasmas 17, 042309 (2010)] CDX sawtooth benchmark [J. Breslau, et al., Phys. Plasmas 14, 056105 (2007)] Nonlinear hybrid simulations of fishbone instability [G.Y. Fu, et al., Phys. Plasmas 13, 052517 (2006).]
19. Limitations of code parameter regime (dimensionless parameters accessible): Lundquist number S < 10^6. Due to the time step restriction and efficiency considerations, total simulated time is limited to no more than a few milliseconds.
20. What third party software is used? (e.g. Meshing software, PETSc, ...): PETSc (including HYPRE), FFTW, HDF5, NetCDF
21. Description of scalability M3D has been demonstrated to have good weak scaling up to ~10,000 processors on the Cray XT4. However, because the time advance is still partially explicit, it is highly impractical for production runs of slow resistive modes such as tearing and NTMs. At present, typical runs take advantage of up to ~500 cores, at which the scaling is very good. In future, we expect to use more than 1000 cores efficiently for larger production runs.
22. Major serial and parallel bottlenecks. The majority of execution time in M3D, both on small and large numbers of cores, is spent in linear algebra solver routines from the PETSc library. The scaling results quoted above suggest that there are no major serial bottlenecks in these routines. Scaling is most likely limited by the cost of node-to-node communication, while overall efficiency depends on the rate of convergence of the iterative solution algorithms (GMRES or conjugate gradient).
23. Are there smaller codes contained in the larger code? Describe: No.
24. Supported platforms and portability: Runs on the Cray XT4 and XT5 at NERSC. Limited portability to other systems due to reliance on an old and not-widely-supported version of PETSc. (To be remedied in the near future).
25. Illustrations of time-to-solution on different platforms and for different complexity of physics, if applicable. Typical linear fluid calculations take on the order of a day running on tens of processor cores. Typical nonlinear fluid calculations take on the order of a week on hundreds of cores.

M3D1. Code Name: M3D

2. Code Category: Transport, Equilibrium, MHD, GK, Neutrals, RF (etc) Nonlinear MHD

3. Primary Developer Joshua Breslau

4. Other Developers and Users Developer/users: Wonchull Park (ret.), Hank Strauss, Linda Sugiyama, Guo-Yong Fu, Jin Chen Other users: Roberto Paccagnella, Jianying Lang, Feng Wang, CPES collaboration

5. Short description (one line if possible): M3D is a massively parallel multi-level 3D code that computes the initial value solutions of the extended MHD equations in toroidal geometry with multiple options, including ideal and resistive MHD, two-fluid, and kinetic/MHD hybrid model with gyrokinetic energetic particles.

6. Computer Language (Fortran77, Fortran90, C, C++, etc) and approx # of lines: Mixture of C (mesh generation, finite element operators, PETSc interface) and Fortran 90 (physics driver) routines. Approximately 55,000 lines of C and 61,000 lines of Fortran code.

7. Type of input required (including files and/or output from other codes). Is there any special input preparation system (e.g., GUI): Requires an input MHD equilibrium in one of several formats: old VMEC ascii (containing 3D R,Z, and J as double Fourier series in theta and phi as functions of poloidal flux s, with pressure and flux as 1D functions of s; VMEC-"light", produced by i2mex, containing 2D profiles; eqdsk.cdf equilibrium produced by JSOLVER; or free-boundary equilibrium + vacuum mesh produced by M3D-OMP (omp.out). Also requires a "wxy" file containing input namelists, and optionally a "config.dat"file containing mesh dimensions and domain decomposition (which may alternately be specified on the command line) and an "options" file specifying linear preconditioner and solver options for PETSc (likewise).

8. Type of output produced (including files that are read by other codes and sizes of large files and synthetic diagnostics):

Some diagnostic output is sent to stdout/stderr during the run, typically including description of initial state, followed by time history of poloidal kinetic energy and growth rate. (Typical file size is a few 100 kB).

Primary output mechanism is HDF5 file 3d.001.h5, containing 3D field data for up to 13 variables (including vector B field) sampled at a number of time steps. This is typically visualized using AVS/Express or Visit. There are also a number of post-processors for analyzing data. Typical file sizes range from 10s of MB up to 1 GB.

M3D also generates unformatted binary checkpoint files at specified intervals for restart purposes, up to a maximum of three for a single run; each is approximately 50% larger than a single time slice of the HDF5 file.

9. Describe any postprocessors which read the output files:

All postprocessors are optional.

poincareng12, qprofileng12: perform multiple field line traces to compute PoincarÂŽ maps, q profiles respectively.

surfit12, remap: jointly used to compute first-return maps for equilibrium surfaces based on perturbed field.

wallforce, wallforce2: Compute vessel forces and currents based on results of resistive wall calculations.

mode27: Fourier analyze total kinetic and magnetic energies

extractprof: extract a 1D profile of a single variable at specified z and phi (or a toroidal average) as a text file.

getfw: Use field line traces, q profile diagnostic to estimate width of island of specified helicity.

getprim: Compute primitive variables (Cylindrical components of B field, current density, and velocity) as well as pressure (if HDF5 contains temperature) or temperature (if HDF5 contains pressure), output to a 2nd HDF5 file.

diffHDF5: Compute the difference between all variables in two time slices in the same or different HDF5 files that have the same geometry, output to a new HDF5 file. (Can also be used to extract a single time slice from a file containing multiple slices).

concatHDF5: Concatenate several HDF5 files from separate runs in the same series together into one file for animation.

HDF2silo: Convert HDF5 file into silo format for easy visualization with Visit.

sxr: Simulated soft X-ray diagnostic; integrates a kernel along several chords at constant phi through the plasma at each time slice.

10. Status and location of code input/output documentation: Code documentation is incomplete. The most complete instructions are currently in the various documents at http://w3.pppl.gov/m3d/reference.html.

11. Code web site? http://w3.pppl.gov/m3d/index.php

12. Is code under version control? What system? Is automated regression testing performed? Yes. Subversion. No.

13. One to two paragraph description of equations solved and functionality including what discretizations are used in space and time:

The basic equations solved are those of resistive MHD: continuity, momentum, Faraday's and Ohm's laws, Ampere's law, and an equation of state, typically adiabatic. These are used to advance normalized scalar variables representing number density, a poloidal velocity stream function, a compressible poloidal velocity potential, toroidal velocity, poloidal magnetic flux, toroidal magnetic field, and total pressure. Terms may be dropped from these equations to give ideal or reduced MHD. Additional terms (ion gyroviscous and/or Hall) or equations (electron pressure) may be added to give two-fluid physics. In the case of a kinetic hot ion population, moments are taken of the distribution function to compute the divergence of the hot ion pressure tensor, which appears in the momentum equation. Parallel heat conduction is usually modeled with a diffusive artificial sound equation for temperature in the direction parallel to the magnetic field, advanced on substeps of the main time advance, though a fully implicit anisotropic heat conduction operator is now available as well.

The M3D mesh uses cylindrical coordinates (R,\phi,z) and a toroidal topology. The 2D spatial discretization in each constant-\phi plane is by linear finite elements with C^0 continuity on an unstructured triangular mesh. Higher-order finite element options exist, but are considered too time-consuming for routine use by most M3D users. Discretization in the toroidal direction is performed using either high-order finite differences or a pseudospectral option. Time discretization can be either first or second order, and is explicit with the exception of the compressional Alfven wave and the diffusive (parabolic) terms, which are treated implicitly and advanced separately. An option exists to treat the dominant shear Alfven wave terms implicitly as well, which typically allows the time step to be increased by approximately a factor of three before numerical instability sets in.

14. What modes of operation of code are there (e.g.: linear, nonlinear, reduced models, etc ): M3D can be run in linear, nonlinear, or equilibrium solver mode. Orthogonal to these are the tokamak and stellarator options. The physics model can be reduced MHD, ideal MHD, resistive MHD (with artificial sound or anisotropic heat conduction), two-fluid MHD, or a hybrid of MHD with a gyrokinetically described population of energetic ions. Calculations can be fixed- or free-boundary, with ideal or resistive walls. There is also considerably flexibility in choices of sources, sinks, external perturbations, etc.

15. Journal references describing code: W. Park, et al., Phys. Plasmas

6, 1796 (1999); G.Y. Fu, et al., Phys. Plasmas13052517 (2006).16. Codes it is similar to and differences (public version): Similar to M3D-C1, NIMROD. Unlike M3D-C1, contains a hybrid model. Unlike NIMROD, uses finite differences in toroidal direction. Unlike both, uses linear finite elements with C0 continuity, is not fully implicit in time, and has stellarator capability.

17. Results of code verification and convergence studies (with references): The calculated growth rate and mode frequency of an n=1 internal kink mode and fishbone compare well with the NOVA-K code. [G.Y. Fu, et al., Phys. Plasmas

13, 052517 (2006).] The simulated sawteeth in CDX-U plasmas agree well with those of NIMROD with respect to nonlinear evolution of kinetic energy. [J. Breslau, et al., Phys. Plasmas14, 056105 (2007)].18. Present and recent applications and validation exercises (with references as available): NSTX non-resonant internal kink mode studies [J.A. Breslau, et al., in Proceedings of the 23rd IAEA Conference, THS/P2-03 2010] Nonlinear modeling of ELMs [L.E. Sugiyama and H.R. Strauss, Phys. Plasmas

17, 062505 (2010)] Disruption modeling for ITER [H.R. Strauss, et al., Phys. Plasmas17, 082505 (2010)] Nonlinear hybrid simulations of TAE modes with source and sink [J. Lang, et al., Phys. Plasmas17, 042309 (2010)] CDX sawtooth benchmark [J. Breslau, et al., Phys. Plasmas14, 056105 (2007)] Nonlinear hybrid simulations of fishbone instability [G.Y. Fu, et al., Phys. Plasmas13, 052517 (2006).]19. Limitations of code parameter regime (dimensionless parameters accessible): Lundquist number S < 10^6. Due to the time step restriction and efficiency considerations, total simulated time is limited to no more than a few milliseconds.

20. What third party software is used? (e.g. Meshing software, PETSc, ...): PETSc (including HYPRE), FFTW, HDF5, NetCDF

21. Description of scalability M3D has been demonstrated to have good weak scaling up to ~10,000 processors on the Cray XT4. However, because the time advance is still partially explicit, it is highly impractical for production runs of slow resistive modes such as tearing and NTMs. At present, typical runs take advantage of up to ~500 cores, at which the scaling is very good. In future, we expect to use more than 1000 cores efficiently for larger production runs.

22. Major serial and parallel bottlenecks. The majority of execution time in M3D, both on small and large numbers of cores, is spent in linear algebra solver routines from the PETSc library. The scaling results quoted above suggest that there are no major serial bottlenecks in these routines. Scaling is most likely limited by the cost of node-to-node communication, while overall efficiency depends on the rate of convergence of the iterative solution algorithms (GMRES or conjugate gradient).

23. Are there smaller codes contained in the larger code? Describe: No.

24. Supported platforms and portability: Runs on the Cray XT4 and XT5 at NERSC. Limited portability to other systems due to reliance on an old and not-widely-supported version of PETSc. (To be remedied in the near future).

25. Illustrations of time-to-solution on different platforms and for different complexity of physics, if applicable. Typical linear fluid calculations take on the order of a day running on tens of processor cores. Typical nonlinear fluid calculations take on the order of a week on hundreds of cores.