Where:
Universite Pierre et Marie Curie
Room 25-26/105 (first floor, enter through tower 26)
When:
Tuesday October 21st
Mini-workshop: 9:30 - 12:15
Thesis defense: 14:15 - ?
How:
For logistical purposes, please send an email before October 14
if you plan to attend the mini-workshop: Albert.Cohen@inria.fr
Preliminary program
09:30-09:35 Welcome
09:35-10:10 - P. Sadayappan, Ohio State U.
10:10-10:45 - Sven Verdoolaege, INRIA and K. U. Leuven
10:45-11:00 Break
11:00-11:35 - Franz Franchetti, Carnegie Mellon U.
11:35-12:10 - Issam Said, U. Pierre et Marie Curie
14:15- - Thesis defense, Tobias Grosser, INRIA
Titles and abstracts
Speaker: P. Sadayappan, Ohio State U.
Title: On Other Uses for the Polyhedral Model Besides Performance
Optimization
The use of the polyhedral model is now widely acknowledged as the
most effective approach to compiler optimization of affine loop
computations. However, a significant unsolved challenge is that of
effectively using the polyhedral model to optimize real application
codes, which rarely are fully affine, but invariably also contain
interleaved non-affine computations. In this talk, two uses of the
polyhedral model for other goals than loop transformations will be
discussed: 1) a compiler-assisted approach to detecting transient
memory errors, and 2) compiler-assistance for software cache
coherence. With both these uses, arbitrarily interleaved
affine/non-affine codes can be handled, where the polyhedral
analysis offers benefits for the affine code portions, along with a
conservative handling of the non-affine computations.
Speaker: Sven Verdoolaege, INRIA and K. U. Leuven
Title: Integer Set Coalescing for Polyhedral Compilation
Polyhedral compilation is widely used in high-level synthesis tools
and in production compilers such as gcc, LLVM and IBM/XL. It is
based on the polyhedral model, a powerful abstraction for analyzing
and transforming (parts of) programs that are
sufficiently
regular''. The key feature of this model is that it is instance
based, allowing for a representation and treatment of individual
dynamic executions of a statement inside a loop nest and/or
individual array elements. These individual elements are
represented by tuples of integers which are collected in sets and
relations described by affine constraints. Polyhedral compilation
then mainly consists of an analysis and transformation of these sets
and relations.
There are typically many ways to describe the same set of integer
tuples using affine constraints. Most operations that can be
performed on these sets do not depend on which particular
representation is used, in the sense that only the time required for
the computation is affected by this choice but not the value of the
result. There are however some operations where this choice can
have a significant impact, notably the computation of
(overapproximations of) transitive closures and the construction of
an AST for visiting each point in a set. Assuming a representation
in disjunctive normal form, it is in all cases important to keep the
number of disjuncts as small as possible. Excess disjuncts may
result from operations such as subtraction, lexicographic
optimization and integer projection. Set coalescing tries to
combine several disjuncts into a single disjunct without affecting
the elements in the set. We describe several cases where coalescing
can be applied.
Speaker: Franz Franchetti, Carnegie Mellon U.
Title: High Assurance Spiral: Co-Synthesizing Proof and Implementation
From High-Level Specification
In this talk we introduce "High Assurance SPIRAL" to solve the "last
mile" problem for the synthesis of high assurance implementations of
controllers for vehicular systems that are executed in todays and
future embedded and high performance embedded system
processors. High Assurance SPIRAL is a methodology to translate a
high level specification of a high assurance controller into a
highly resource-efficient, platform-adapted, verified control
software implementation for a given platform in a language like C or
C++. High Assurance SPIRAL proves that the implementation is
equivalent to the specification written in the control engineer's
domain language. Our approach supports problems involving
floating-point calculations and provides highly optimized
synthesized code. At the core of High Assurance SPIRAL is the
Hybrid Control Operator Language (HCOL) that leverages advanced
mathematical constructs expressing the controller specification to
provide high quality translation capabilities.
Speaker: Issam Said, U. Pierre et Marie Curie
Title: Evaluating APU for high performance scientific computing
AMD APUs eliminate the PCI Express bus which bottlenecks many GPU
scientific applications. However, integrated GPUs in APUs are less
powerful than mainstream discrete GPUs, and while the upcoming APUs
will rely on a unified memory, the first APUs still have a distinct
GPU memory partition. Hence, it is worthwhile to investigate the
applications, as well as problem sizes, for which the GPU part of an
APU may outperform a discrete GPU. In this talk we assess the
relevance of the early generations of APUs for scientific HPC via
hardware and applicative OpenCL micro-benchmarks. We present
detailed measurements of CPU-GPU data transfers and performance
tests of highly optimized 3D finite difference stencils. We also
emphasize the importance of data placement when APUs are used. Our
results show that integrated GPUs of upcoming APUs can outperform
discrete high-end GPUs for medium sized problems with high
communication requirements.
PhD candidate: Tobias Grosser, INRIA
Title: A Decoupled Approach to High-Level Loop Optimization,
tile shapes, polyhedral building blocks and low-level compilers
Despite decades of research on high-level loop optimizations and
their successful integration in production C/C++/FORTRAN compilers,
most compiler internal loop transformation systems only partially
address the challenges posed by the increased complexity and
diversity of today's hardware. Especially when exploiting domain
specific knowledge to obtain optimal code for complex targets such
as accelerators or many-cores processors, many existing loop
optimization frameworks have difficulties exploiting this
hardware. As a result, new domain specific optimization schemes are
developed independently without taking advantage of existing loop
optimization technology. This results both in missed optimization
opportunities as well as low portability of these optimization
schemes to different compilers. One area where we see the need for
better optimizations are iterative stencil computations, an
important computational problem that is regularly optimized by
specialized, domain specific compilers, but where generating
efficient code is difficult.
In this work we present new domain specific optimization strategies
that enable the generation of high-performance GPU code for stencil
computations. Different to how most existing domain specific
compilers are implemented, we decouple the high-level optimization
strategy from the low-level optimization and specialization
necessary to yield optimal performance. As high-level optimization
scheme we present a new formulation of split tiling, a tiling
technique that ensures reuse along the time dimension as well as
balanced coarse grained parallelism without the need for redundant
computations. Using split tiling we show how to integrate a domain
specific optimization into a general purpose C-to-CUDA translator,
an approach that allows us to reuse existing non-domain specific
optimizations. We then evolve split tiling into a hybrid
hexagonal/parallelogram tiling scheme that allows to generate code
that even better addresses GPU specific concerns. To conclude our
work on tiling we investigate the relation between diamond and
hexagonal tiling. Starting with a detailed analysis of diamond
tiling including the requirements it poses on tile sizes and
wavefront coefficients, we provide a unified formulation of
hexagonal and diamond tiling to perform hexagonal tiling for two
dimensional problems (one time, one space) in the context of a
general purpose optimizer such as Pluto. Finally, we use this
formulation to evaluate hexagonal and diamond tiling in terms of
compute-to-communication and compute-to-synchronization ratios.
In the second part of this work, we discuss our contributions to
important infrastructure components, our building blocks, that
enviable us to decouple our high-level optimizations from both the
nec- essary code generation optimizations as well as the compiler
infrastructure we apply the optimization to. We start with
presenting a new polyhedral extractor that obtains a polyhedral
representation from a piece of C code, widening the supported C code
to exploit the full generality of Presburger arithmetic and taking
special care of modeling language semantics even in the presence of
defined integer wrapping. As a next step, we present a new
polyhedral AST generation approach, which extends AST generation
beyond classical control flow generation by allowing the generation
of user provided mappings. Providing a fine-grained option
mechanism, we give the user fine grained control about AST generator
decisions and add extensive support for specialization e.g., with a
new generalized form of polyhedral unrolling. To facilitate the
implementation of polyhedral transformations, we present a new
schedule representation, schedule trees, which proposes to make the
inherent tree structure of schedules explicit to simplify the work
with complex polyhedral schedules.
The last part of this work takes a look at our contributions to
low-level compilers. The main focus in this part is our work on
optimistic delinearization, an approach to derive a
multi-dimensional array view for multi-variate polynomial
expressions which commonly result from code that models data as
multi-dimensional arrays of parametric size.