

Membarth, Richard, Reiche, Oliver, Schmitt, Christian, Hannig, Frank, Teich, Jürgen, Stürmer, Markus; Köstler, Harald
Towards a Performance-portable Description of Geometric Multigrid Algorithms using a Domain-specific Language
Journal of Parallel and Distributed Computing (JPDC), 74(12):3191-3201
December 2014

Keywords: multigrid; multiresolution; image pyramid; domain-specific language; stencil codes; code generation; GPU; CUDA; OpenCL

Abstract: High Performance Computing (HPC) systems are nowadays more and more heterogeneous. Different processor types can be found on a single node including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level CUDA and OpenCL code is generated from a high-level description of an algorithm specified in a Domain-Specific Language (DSL) instead of writing hand-tuned code for GPU accelerators. The DSL is part of the Heterogeneous Image Processing Acceleration (HIPAcc) framework and was extended in this work to handle grid hierarchies in order to model different cycle types. Language constructs are introduced to process and represent data at different resolutions. This allows to describe image processing algorithms that work on image pyramids as well as multigrid methods in the stencil domain. By decoupling the algorithm from its schedule, the proposed approach allows to generate efficient stencil code implementations. Our results show that similar performance compared to hand-tuned codes can be achieved.

Sons, Kristian, Klein, Felix, Sutter, Jan; Slusallek, Philipp
shade.js: Adaptive Material Descriptions
Computer Graphics Forum, 33(7):51-60
October 2014
ISSN: 1467-8659

Köster, Marcel, Leißa, Roland, Hack, Sebastian, Membarth, Richard; Slusallek, Philipp
Code Refinement of Stencil Codes
Parallel Processing Letters (PPL), 24(3):1-16
September 2014

Keywords: stencil codes; partial evaluation; domain-specific language

Abstract: A straightforward implementation of an algorithm in a general-purpose programming language does usually not deliver peak performance: compilers often fail to automatically tune the code for certain hardware peculiarities like memory hierarchy or vector execution units. Manually tuning the code is firstly error-prone as well as time-consuming and secondly taints the code by exposing those peculiarities to the implementation. A popular method to circumvent these problems is to implement the algorithm in a Domain-Specific Language (DSL). A DSL compiler can then automatically tune the code for the target platform. In this paper we show how to embed a DSL for stencil codes in another language. In contrast to prior approaches we only use a single language for this task. Furthermore, we offer explicit control over code refinement in the language itself which is used to specialize stencils for particular scenarios. Our first results show that our specialized programs achieve competitive performance compared to hand-tuned CUDA programs.

Davidovic, Tomas, Krivanek, Jaroslav, Hasan, Milos; Slusallek, Philipp
Progressive Light Transport Simulation on the GPU: Survey and Improvements
ACM Trans. Graph., 33(3):29:1-29:19
May 2014
ISSN: 0730-0301

Keywords: GPU; Global illumination; bidirectional path tracing; high performance; vertex connection and merging

Abstract: Graphics Processing Units (GPUs) recently became general enough to enable implementation of a variety of light transport algorithms. However, the efficiency of these GPU implementations has received relatively little attention in the research literature and no systematic study on the topic exists to date. The goal of our work is to fill this gap. Our main contribution is a comprehensive and in-depth investigation of the efficiency of the GPU implementation of a number of classic as well as more recent progressive light transport simulation algorithms. We present several improvements over the state-of-the-art. In particular, our Light Vertex Cache, a new approach to mapping connections of sub-path vertices in Bidirectional Path Tracing on the GPU, outperforms the existing implementations by 30-60%. We also describe a first GPU implementation of the recently introduced Vertex Connection and Merging algorithm [Georgiev et al. 2012], showing that even relatively complex light transport algorithms can be efficiently mapped on the GPU. With the implementation of many of the state-of-the-art algorithms within a single system at our disposal, we present a unique direct comparison and analysis of their relative performance.

Zinnikus, Ingo, Byelozyorov, Sergiy, Cao, Xiaoqi, Klusch, Matthias, Krauss, Christopher, Nonnengart, Andreas, Spieldenner, Torsten, Warwas, Stefan; Slusallek, Philipp
A Collaborative Virtual Workspace for Factory Configuration and Evaluation
Collaborative Computing
March 2014

Dahmen, Tim, Baudoin, Jean-Pierre, Lupini, Andrew, Kübel, Christian, Slusallek, Philipp; de Jonge, Niels
Combined Scanning Transmission Electron Microscopy Tilt- and Focal Series
Microscopy and Microanalysis, pages 1-13
February 2014

Keywords: STEM; tomography; 3D; focal series; whole cell; nanoparticle; SART; 3D reconstruction; back projection

Abstract: In this study, a combined tilt- and focal series is proposed as a new recording scheme for high-angle annular dark-field scanning transmission electron microscopy (STEM) tomography. Three-dimensional (3D) data were acquired by mechanically tilting the specimen, and recording a through-focal series at each tilt direction. The sample was a whole-mount macrophage cell with embedded gold nanoparticles. The tilt–focal algebraic reconstruction technique (TF-ART) is introduced as a new algorithm to reconstruct tomograms from such combined tilt- and focal series. The feasibility of TF-ART was demonstrated by 3D reconstruction of the experimental 3D data. The results were compared with a conventional STEM tilt series of a similar sample. The combined tilt- and focal series led to smaller “missing wedge” artifacts, and a higher axial resolution than obtained for the STEM tilt series, thus improving on one of the main issues of tilt series-based electron tomography.


Membarth, Richard, Slusallek, Philipp, Köster, Marcel, Leißa, Roland; Hack, Sebastian
Target-Specific Refinement of Multigrid Codes
Proceedings of the 4th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 52-57
November 2014

Keywords: multigrid codes; partial evaluation; domain-specific language

Abstract: This paper applies partial evaluation to stage a stencil code Domain-Specific Language (DSL) onto a functional and imperative programming language. Platform-specific primitives such as scheduling or vectorization, and algorithmic variants such as boundary handling are factored out into a library that make up the elements of that DSL. We show how partial evaluation can eliminate all overhead of this separation of concerns and creates code that resembles hand-crafted versions for a particular target platform. We evaluate our technique by implementing a DSL for the V-cycle multigrid iteration. Our approach generates code for AMD and NVIDIA GPUs (via SPIR and NVVM) as well as for CPUs using AVX/AVX2 alike from the same high-level DSL program. First results show that we achieve a speedup of up to 3× on the CPU by vectorizing multigrid components and a speedup of up to 2× on the GPU by merging the computation of multigrid components.

Danilewski, Piotr, Köster, Marcel, Leißa, Roland, Membarth, Richard; Slusallek, Philipp
Specialization through Dynamic Staging
Proceedings of the 13th International Conference on Generative Programming: Concepts & Experiences (GPCE), pages 103-112
September 2014

Keywords: dynamic staging; partial evaluation; code specialization

Abstract: Partial evaluation allows for specialization of program fragments. This can be realized by staging, where one fragment is executed earlier than its surrounding code. However, taking advantage of these capabilities is often a cumbersome endeavor. In this paper, we present a new metaprogramming concept using staging parameters that are first-class citizen entities and define the order of execution of the program. Staging parameters can be used to define MetaML-like quotations, but can also allow stages to be created and resolved dynamically. The programmer can write generic, polyvariant code which can be reused in the context of different stages. We demonstrate how our approach can be used to define and apply domain-specific optimizations. Our implementation of the proposed metaprogramming concept generates code which is on a par with templated C++ code in terms of execution time.

Dahmen, Tim, Baudoin, Jean-Pierre, Lupini, Andrew, Kübel, Christian, Slusallek, Philipp; de Jonge, Niels
Combined tilt- and focal series scanning transmission electron microscopy: TFS 3D STEM
Proceedings of 18th International Microscopy Congress
September 2014

Dahmen, Tim, Slusallek, Philipp; de Jonge, Niels
TFS: Combined Tilt- and Focal Series for Scanning Transmission Electron Microscopy
Proceedings of Microscopy & Microanalysis 2014
September 2014

Roland, Michael, Dahmen, Tim, Tjardes, Thorsten, Otchwemah, Robin, Slusallek, Philipp; Diebels, Stefan
Optimized patient-specific implants
Proceedings of 11th World Congress on Computational Mechanics
July 2014

Köster, Marcel, Leißa, Roland, Hack, Sebastian, Membarth, Richard; Slusallek, Philipp
Platform-Specific Optimization and Mapping of Stencil Codes through Refinement
Proceedings of the First International Workshop on High-Performance Stencil Computations (HiStencils), pages 1-6
January 2014