Benchmarks


This page collects some of the benchmarks performed with EPW.

  • Scalability of the interpolation part of EPW v4.2 on CSD3 for polar SiC on a 64x64x64 k-point grid and 8x8x8 q-grid.

The calculations were performed using Intel 17.0.4 with Intel MPI and MKL, and with the "-xAVX -mavx -axCOMMON-AVX512" vectorization flags, on Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz. The machine uses an Intel Omni-Path HPC interconnect and a multi-petabyte SSD-accelerated Intel Lustre file system.


Strong scaling of the interpolation part of EPW on CSD3 Xeon Phi for polar SiC. The parallelization is done over k-points using MPI. The absolute time for the calculation was 6 h 01 min on 64 cores and 9 min on 8192 cores. (S. Poncé)
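The two timings quoted in this caption are enough to estimate the strong-scaling speedup and parallel efficiency between the smallest and largest runs. The following is a minimal sketch; the function name `strong_scaling` is hypothetical, and since the 9 min figure is rounded, the result should be read as approximate.

```python
def strong_scaling(t_ref_min, cores_ref, t_min, cores):
    """Return (speedup, ideal_speedup, efficiency) relative to a reference run.

    Times are in minutes; the reference run is the one with fewer cores.
    """
    speedup = t_ref_min / t_min      # measured speedup
    ideal = cores / cores_ref        # speedup expected for perfect scaling
    return speedup, ideal, speedup / ideal


# Timings quoted in the caption: 6 h 01 min on 64 cores, 9 min on 8192 cores.
t_64 = 6 * 60 + 1    # 361 minutes
t_8192 = 9           # minutes
speedup, ideal, eff = strong_scaling(t_64, 64, t_8192, 8192)
print(f"speedup {speedup:.1f}x of ideal {ideal:.0f}x, efficiency {eff:.0%}")
# prints: speedup 40.1x of ideal 128x, efficiency 31%
```

Going from 64 to 8192 cores is a 128-fold increase in resources, so a measured speedup of about 40 corresponds to roughly 31% parallel efficiency relative to the 64-core run.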


  • Scalability of the interpolation part of EPW v4.1 on ARCHER Cray XC30 for polar wurtzite GaN.

The calculations were performed using the Intel 15.0.2.164 compiler on a Cray XC30 machine with 12-core Intel Xeon E5-2697v2 (Ivy Bridge) 2.7 GHz processors sharing 64 GB of memory and joined by two QPI links, connected via the proprietary Cray Aries interconnect (Dragonfly topology). The analysis was performed using Score-P 2.0.2 and Scalasca 2.3.1 instrumentation.


Scalability of the interpolation part of EPW on ARCHER Cray XC30 for polar wurtzite GaN. The parallelization is done over k-points using MPI. (S. Poncé)
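To illustrate what parallelizing over k-points means in practice, the sketch below splits a list of k-point indices into contiguous blocks, one per MPI rank. This is only a generic block distribution written in plain Python, not EPW's actual implementation; the helper `kpoint_range` and the toy sizes are hypothetical.

```python
def kpoint_range(n_kpoints, n_ranks, rank):
    """Half-open range [start, stop) of k-point indices owned by `rank`.

    The first `n_kpoints % n_ranks` ranks receive one extra k-point, so the
    load differs by at most one k-point between any two ranks.
    """
    base, extra = divmod(n_kpoints, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Toy example (hypothetical sizes): 10 k-points shared among 4 ranks.
n_kpoints = 10
n_ranks = 4
ranges = [kpoint_range(n_kpoints, n_ranks, r) for r in range(n_ranks)]
print(ranges)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

With such a scheme each rank interpolates only its own k-points, so the work per rank shrinks as ranks are added until each rank holds very few k-points.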


  • Scalability of EPW v4.0 on SiC using 6 × 6 × 6 Γ-centered coarse k-point and q-point grids.

The fine grids on which the Wannier interpolation was performed were a 50 × 50 × 50 k-point grid and a 10 × 10 × 10 q-point grid. The test was performed on an Intel Xeon CPU E5620 with a clock frequency of 2.40 GHz. The codes were compiled using ifort 13.0.1 with the following compilation flags: -O2 -assume byterecl -g -traceback -nomodule -fpp. The MPI parallelization was performed using Open MPI 1.8.1.


Comparison of the parallel scaling of EPW v3 and v4 when computing the electronic lifetime of SiC. The blue and red solid lines show the speedup obtained on a full calculation with the previous and current version of EPW. The speedup with 128 processors is 55 and 76 for the previous and current version, respectively. The interpolation algorithm (the most time-consuming part) has been improved (dashed lines). (S. Poncé)
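The speedups quoted for 128 processors can be turned into a parallel efficiency and, under a simple model, into an implied serial fraction. The sketch below assumes that the speedups are measured relative to a single processor and that Amdahl's law, S = 1 / (f + (1 - f) / N), describes the scaling; both are assumptions made for illustration, and the function name is hypothetical.

```python
def amdahl_serial_fraction(speedup, n_procs):
    """Serial fraction f obtained by inverting Amdahl's law.

    Amdahl's law: speedup = 1 / (f + (1 - f) / n_procs)
    Solving for f gives: f = (n_procs / speedup - 1) / (n_procs - 1)
    """
    return (n_procs / speedup - 1.0) / (n_procs - 1.0)


n_procs = 128
results = {}
# Speedups quoted in the caption for the previous and current version of EPW.
for label, speedup in (("previous", 55.0), ("current", 76.0)):
    efficiency = speedup / n_procs
    serial = amdahl_serial_fraction(speedup, n_procs)
    results[label] = (efficiency, serial)
    print(f"{label}: efficiency {efficiency:.0%}, implied serial fraction {serial:.2%}")
```

Under these assumptions the efficiency rises from about 43% to about 59%, and the implied serial fraction roughly halves, from about 1.0% to about 0.5%.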

Comparison of the time required to compute the electronic lifetime of SiC using EPW 3 and EPW 4.0, run on one processor. We show the time required for the calculation of the electron and phonon perturbations using DFPT (QE+PH), the calculation of the electron-phonon matrix elements and their unfolding from the IBZ to the BZ using the crystal symmetries (Unfolding), the Wannierization from the coarse Bloch space to real space (Wannier), and the interpolation from real space to fine grids in Bloch space (Interpolation). (S. Poncé)