Polyhedral compilation for the masses

Where:

Universite Pierre et Marie Curie

Room 25-26/105 (first floor, enter through tower 26)

When:

Tuesday October 21st

Mini-workshop: 9:30 - 12:15

Thesis defense: 14:15 - ?

How:

For logistical purposes, please send an email before October 14

if you plan to attend the mini-workshop: Albert.Cohen@inria.fr

Preliminary program

09:30-09:35 Welcome

09:35-10:10 - P. Sadayappan, Ohio State U.

10:10-10:45 - Sven Verdoolaege, INRIA and K. U. Leuven

10:45-11:00 Break

11:00-11:35 - Franz Franchetti, Carnegie Mellon U.

11:35-12:10 - Issam Said, U. Pierre et Marie Curie

14:15- - Thesis defense, Tobias Grosser, INRIA

Titles and abstracts

Speaker: P. Sadayappan, Ohio State U.

Title: On Other Uses for the Polyhedral Model Besides Performance

Optimization

The use of the polyhedral model is now widely acknowledged as the

most effective approach to compiler optimization of affine loop

computations. However, a significant unsolved challenge is that of

effectively using the polyhedral model to optimize real application

codes, which rarely are fully affine, but invariably also contain

interleaved non-affine computations. In this talk, two uses of the

polyhedral model for other goals than loop transformations will be

discussed: 1) a compiler-assisted approach to detecting transient

memory errors, and 2) compiler-assistance for software cache

coherence. With both these uses, arbitrarily interleaved

affine/non-affine codes can be handled, where the polyhedral

analysis offers benefits for the affine code portions, along with a

conservative handling of the non-affine computations.

Speaker: Sven Verdoolaege, INRIA and K. U. Leuven

Title: Integer Set Coalescing for Polyhedral Compilation

Polyhedral compilation is widely used in high-level synthesis tools

and in production compilers such as gcc, LLVM and IBM/XL. It is

based on the polyhedral model, a powerful abstraction for analyzing

and transforming (parts of) programs that are sufficiently

regular''. The key feature of this model is that it is instance

based, allowing for a representation and treatment of individual

dynamic executions of a statement inside a loop nest and/or

individual array elements. These individual elements are

represented by tuples of integers which are collected in sets and

relations described by affine constraints. Polyhedral compilation

then mainly consists of an analysis and transformation of these sets

and relations.

There are typically many ways to describe the same set of integer

tuples using affine constraints. Most operations that can be

performed on these sets do not depend on which particular

representation is used, in the sense that only the time required for

the computation is affected by this choice but not the value of the

result. There are however some operations where this choice can

have a significant impact, notably the computation of

(overapproximations of) transitive closures and the construction of

an AST for visiting each point in a set. Assuming a representation

in disjunctive normal form, it is in all cases important to keep the

number of disjuncts as small as possible. Excess disjuncts may

result from operations such as subtraction, lexicographic

optimization and integer projection. Set coalescing tries to

combine several disjuncts into a single disjunct without affecting

the elements in the set. We describe several cases where coalescing

can be applied.

Speaker: Franz Franchetti, Carnegie Mellon U.

Title: High Assurance Spiral: Co-Synthesizing Proof and Implementation

From High-Level Specification

In this talk we introduce "High Assurance SPIRAL" to solve the "last

mile" problem for the synthesis of high assurance implementations of

controllers for vehicular systems that are executed in todays and

future embedded and high performance embedded system

processors. High Assurance SPIRAL is a methodology to translate a

high level specification of a high assurance controller into a

highly resource-efficient, platform-adapted, verified control

software implementation for a given platform in a language like C or

C++. High Assurance SPIRAL proves that the implementation is

equivalent to the specification written in the control engineer's

domain language. Our approach supports problems involving

floating-point calculations and provides highly optimized

synthesized code. At the core of High Assurance SPIRAL is the

Hybrid Control Operator Language (HCOL) that leverages advanced

mathematical constructs expressing the controller specification to

provide high quality translation capabilities.

Speaker: Issam Said, U. Pierre et Marie Curie

Title: Evaluating APU for high performance scientific computing

AMD APUs eliminate the PCI Express bus which bottlenecks many GPU

scientific applications. However, integrated GPUs in APUs are less

powerful than mainstream discrete GPUs, and while the upcoming APUs

will rely on a unified memory, the first APUs still have a distinct

GPU memory partition. Hence, it is worthwhile to investigate the

applications, as well as problem sizes, for which the GPU part of an

APU may outperform a discrete GPU. In this talk we assess the

relevance of the early generations of APUs for scientific HPC via

hardware and applicative OpenCL micro-benchmarks. We present

detailed measurements of CPU-GPU data transfers and performance

tests of highly optimized 3D finite difference stencils. We also

emphasize the importance of data placement when APUs are used. Our

results show that integrated GPUs of upcoming APUs can outperform

discrete high-end GPUs for medium sized problems with high

communication requirements.

PhD candidate: Tobias Grosser, INRIA

Title: A Decoupled Approach to High-Level Loop Optimization,

tile shapes, polyhedral building blocks and low-level compilers

Despite decades of research on high-level loop optimizations and

their successful integration in production C/C++/FORTRAN compilers,

most compiler internal loop transformation systems only partially

address the challenges posed by the increased complexity and

diversity of today's hardware. Especially when exploiting domain

specific knowledge to obtain optimal code for complex targets such

as accelerators or many-cores processors, many existing loop

optimization frameworks have difficulties exploiting this

hardware. As a result, new domain specific optimization schemes are

developed independently without taking advantage of existing loop

optimization technology. This results both in missed optimization

opportunities as well as low portability of these optimization

schemes to different compilers. One area where we see the need for

better optimizations are iterative stencil computations, an

important computational problem that is regularly optimized by

specialized, domain specific compilers, but where generating

efficient code is difficult.

In this work we present new domain specific optimization strategies

that enable the generation of high-performance GPU code for stencil

computations. Different to how most existing domain specific

compilers are implemented, we decouple the high-level optimization

strategy from the low-level optimization and specialization

necessary to yield optimal performance. As high-level optimization

scheme we present a new formulation of split tiling, a tiling

technique that ensures reuse along the time dimension as well as

balanced coarse grained parallelism without the need for redundant

computations. Using split tiling we show how to integrate a domain

specific optimization into a general purpose C-to-CUDA translator,

an approach that allows us to reuse existing non-domain specific

optimizations. We then evolve split tiling into a hybrid

hexagonal/parallelogram tiling scheme that allows to generate code

that even better addresses GPU specific concerns. To conclude our

work on tiling we investigate the relation between diamond and

hexagonal tiling. Starting with a detailed analysis of diamond

tiling including the requirements it poses on tile sizes and

wavefront coefficients, we provide a unified formulation of

hexagonal and diamond tiling to perform hexagonal tiling for two

dimensional problems (one time, one space) in the context of a

general purpose optimizer such as Pluto. Finally, we use this

formulation to evaluate hexagonal and diamond tiling in terms of

compute-to-communication and compute-to-synchronization ratios.

In the second part of this work, we discuss our contributions to

important infrastructure components, our building blocks, that

enviable us to decouple our high-level optimizations from both the

nec- essary code generation optimizations as well as the compiler

infrastructure we apply the optimization to. We start with

presenting a new polyhedral extractor that obtains a polyhedral

representation from a piece of C code, widening the supported C code

to exploit the full generality of Presburger arithmetic and taking

special care of modeling language semantics even in the presence of

defined integer wrapping. As a next step, we present a new

polyhedral AST generation approach, which extends AST generation

beyond classical control flow generation by allowing the generation

of user provided mappings. Providing a fine-grained option

mechanism, we give the user fine grained control about AST generator

decisions and add extensive support for specialization e.g., with a

new generalized form of polyhedral unrolling. To facilitate the

implementation of polyhedral transformations, we present a new

schedule representation, schedule trees, which proposes to make the

inherent tree structure of schedules explicit to simplify the work

with complex polyhedral schedules.

The last part of this work takes a look at our contributions to

low-level compilers. The main focus in this part is our work on

optimistic delinearization, an approach to derive a

multi-dimensional array view for multi-variate polynomial

expressions which commonly result from code that models data as

multi-dimensional arrays of parametric size.