We report on a novel DEM (discrete element method) code with explicit time stepping. DEM codes simulate billions of small particles that interact with each other primarily through collisions. They are used to study granulates, e.g. in environmental or medical engineering. In contrast to state-of-the-art codes, we do not model the particles as spheres but rely on triangulated non-spherical particles. This makes the problem computationally more demanding. We benchmark spatial domain decomposition and synchronous/asynchronous data exchange techniques using MPI (Message Passing Interface) on manycore supercomputers. In the context of contact detection, we discuss possible solutions to
ghost particles that overlap multiple subdomains and the implications for data structures and communication patterns between processes. At the compute-node level, we investigate shared memory parallelism as well as vectorised SIMD (Single Instruction Multiple Data) execution. Also at the intra-node level, we explore hybrid parallelisation approaches and memory layouts that are suited to achieving low latency and a high degree of parallelism in the triangulated-geometry DEM context.