2010-07-29

supercomputer power promised in the CHaPeL (Cascade Hi'Productivity Lang)

7.26: news.adda/lang"Chapel(Cascade Hi'Productivity Lang'):

2005.9 PGAS Programming Models Conference/Chapel (pdf):
. the final session of the
Parallel Global Address Space(PGAS)
Programming Models Conference was
devoted to DARPA's HPCS program
(High Productivity Computing Systems):
Cascade[chapel]
, X10[verbose java]
, Fortress[greek]
, StarP[slow matlab].
Locality Control Through Domains:
[7.29:
. domains are array subscript objects,
specifying the size and shape of arrays,
the represent a set of subscripts;
and so, applying a domain to an array
selects a set of array elements .
(recall that term"domain is part of
function: domain-->codomain terminology)
. domains make it easy to work with
sparse arrays, hash tables, graphs,
and interior slices of arrays .]
. domains can be distributed across locales,
which generally correspond to CPUs in Chapel.
This gives Chapel its fundamentally
global character, as contrasted to
the process-centric nature of MPI or CAF,
for example.
When operations are performed on
an array whose domain is distributed,
any needed IPC is implicitly carried out,
-- (inter-processor communication) --
without the need for function calls.
Chapel provides much more generality
in the distribution of domains
than high performance Fortran, or UPC,
but, will not take the place of
the complex domain decomposition tools
required to distribute data for
optimum load balance and communication
in most practical parallel programs.
[7.29:
. HPC systems today are overwhelmingly
distributed memory systems,
and the applications tend to require
highly irregular communication
between the CPUs.
This means that good performance
depends on effective locality management,
which minimizes the IPC costs .
whereas,
productive levels of abstraction
will degrades performance.
Chapel simply allows mixing
a variety of abstraction levels .]

Cray's Chapel intro (pdf):
Chapel (Cascade High Productivity Language)
is Cray's programming language for
supercomputers, like the Cascade system;
part of the Cray Cascade project,
a participant in DARPA's HPCS program
(High Productivity Computing Systems ) .
influences:
. iterators:
CLU, Ruby, Python
. latent types:
ML, Scala, Matlab, Perl, Python, C#
. OOP, type safety:
Java, C#:
. generic programming/templates:
C++
. data parallelism, index sets, distributed arrays:
ZPL (Z-level Programming Language)
HPF (High-Performance Fortran),
. task parallelism, synchronization:
Cray's MTA's extensions to Fortran and C.
(Multi-Threaded Architecture)

Global-view vs Fragmented models:
. in Fragmented programming models,
the Programmer's point-of-view
is a single processor/thread;
. the PGAS lang" UPC (unified parallel C),
has a fragmented compute model
and a Global-View data model .
. the shared mem' sytems" OpenMP & pThreads
have a trivially Global View of everything .
. the HPCS lang's, including Chapel,
hava a Global View of everything .

too-low- & too-high-level abstractions
vs multiple levels of design:

. openMP, pthreads, MPI are low-level & difficult;
HPF, ZPL are high-level & inefficient .
Chapel has a mix of abstractions for:
# task scheduling levels:
. work stealing; suspendable tasks;
task pool; thread per task .
# lang' concept levels:
. data parallelism; distributions;
task parallelism;
base lang; locality control .
# mem'mgt levels:
. gc; region-based;
manual(malloc,free) .
. chapel`downloads for linux and mac .
readme for Chapel 1.1
The highlights of this release include:
parallel execution of all
data parallel operations
on arithmetic domains and arrays;
improved control over the
degree and granularity of parallelism
for data parallel language constructs;
feature-complete
Block and Cyclic distributions;
simplified constructor calls for
Block and Cyclic distributions;
support for assignments between,
and removal of indices from,
sparse domains;
more robust performance optimizations on
aligned arithmetic domains and arrays;
many programmability
and correctness improvements;
new example programs demonstrating
task parallel concepts and distributions;
wide-ranging improvements
to the content and organization of the
language specification.
This release of Chapel contains
stable support for the base language,
and for task and regular data parallelism
using one or multiple nodes.
Data parallel features
on irregular domains and arrays
are supported via a single-threaded,
single-node reference implementation.
impl' status:
No support for inheritance from
multiple or generic classes
Incomplete support for user-defined constructors
Incomplete support for sparse arrays and domains
Unchecked support for index types and sub-domains
No support for skyline arrays
No constant checking for domains, arrays, fields
Several internal memory leaks
Task Parallelism
No support for atomic statements
Memory consistency model is not guaranteed
Locality and Affinity
String assignment across locales is by reference
Data Parallelism
Promoted functions/operators do not preserve shape
User-defined reductions are undocumented and in flux
No partial scans or reductions
Some data parallel statements are serialized
Distributions and Layouts
Distributions are limited to Block and Cyclic
User-defined domain maps are undocumented and in flux
7.27 ... 28:
IJHPCA` High Productivity Languages and Models
(Internat' J. HPC App's, 2007, 21(3))


Diaconescua, Zima 2007`An Approach to Data Distributions in Chapel (pdf):
--. same paper is ref'd here:
Chapel Publications from Collaborators:
. they note:
"( This paper presents early exploratory work
in developing a philosophy and foundation for
Chapel's user-defined distributions ).
User-defined distributions
are first-class objects:
placed in a library,
passed to functions,
and reused in array declarations.
In the simplest case,
the specification of a
new distribution
can consist of just a few lines
of code to define mappings between
the global indices of a data structure
and memory;
in contrast, a sophisticated user
(or distribution writer)
can control the internal
representation and layout of data
to an almost arbitrary degree,
allowing even the expression of
auxiliary structures
typically used for distributed
sparse matrix data.
Specifically,
our distribution framework is designed to
support:
• The mapping of arbitrary data collections
to units of locality,
• the specification of user-defined mappings
exploiting knowledge of data structures
and their access patterns,
• the capability to control the
layout (allocation) of data
within units of locality,
• orthogonality between
distributions and algorithms,
• the uniform expression of computation
for both dense and sparse data structures,
• reusability and extensibility
of the data mapping machinery itself,
as well as of the common data mapping patterns
occurring in various application domains.

Our approach is the first
that addresses these issues
completely and consistently
at a high level of abstraction;
in contrast to the current programming paradigm
that explicitly manages data locality
and the related aspects of synchronization,
communication, and thread management
at a level close to what assembly programming
was for sequential languages.
The challenge is to allow the programmer
high-level control of data locality
based on the knowledge of the problem
without unnecessarily burdening the
expression of the algorithm
with low-level detail,
and achieving target code performance
similar to that of
manually parallelized programs.

Data locality is expressed via
first-class objects called distributions.
Distributions apply to collections
of indices represented by domains,
which determine how arrays
associated with a domain
are to be mapped and allocated across
abstract units of uniform memory access
called locales.
Chapel offers an open concept of distributions,
defined by a set of classes
which establish the interface between
the programmer and the compiler.
Components of distributions
are overridable by the user,
at different levels of abstraction,
with varying degrees of difficulty.
Well-known regular standard distributions
can be specified along with
arbitrary irregular distributions
using the same uniform framework.
There are no built-in distributions
in our approach.
Instead, the vision is that
Chapel will be an open source language,
with an open distribution interface,
which allows experts and non-experts
to design new distribution classes
and support the construction of
distribution libraries that can be
further reused, extended, and optimized.
Data parallel computations
are expressed via forall loops,
which concurrently iterate over domains.

. the class of PGAS languages
including Unified Parallel C (UPC)
provide a reasonable improvement
over lower-level communications with MPI.
. UPC`threads support block-cyclic
distributions of one-dimensional arrays
over a one-dimensional set of processors .
and a stylized upc forall loop
that supports an affinity expression
to map iterations to threads.

. the other DARPA`HPCS lang's
provide built-in distributions
as well as the possibility to create
new distributions from existing ones.
However,
they do not contain features
for specifying user-defined
distributions and layouts.
Furthermore,
X10’s locality rule requires
an explicit distinction between
local and remote accesses
to be made by the programmer
at the source language level.
The key differences between
existing work and our approach
can be summarized as follows.
First,
we provide a general oop framework
for the specification of
user-defined distributions,
integrated into an advanced
high-productivity parallel language.
Secondly,
our framework allows the
flexible formulation
of data distributions,
locale-internal data arrangements,
and associated control mechanisms
at a high level of abstraction,
tuned to the properties of
architectures and applications.
This ensures
target code performance
that is otherwise achievable only via
low-level control.
. Chapel Publications and Papers .
. Chapel Specification [current version (pdf)] .


Parallel Programmability and the Chapel Language (pdf)
bradc@cray.com, d.callahan@microsoft.com, zima@jpl.nasa.gov`
Int.J. High Performance Computing App's, 21(3) 2007

This paper serves as a good introduction
to Chapel's themes and main language concepts.
7.28: adda/concurrency/chapel/Common Component Architecture:
Common Component Architecture (CCA) Forum

2005 CCA Whitepaper (pdf):
. reusable scientific components
and the tools with which to use them.
In addition to developing
simple component examples
and hands-on exercises as part of
CCA tutorial materials,
we are growing a CCA toolkit of components
that is based on widely used software packages,
including:
ARMCI (one-sided messaging),
CUMULVS (visualization and parallel data redistribution),
CVODE (integrators), DRA (parallel I/O),
Epetra (sparse linear solvers),
Global Arrays (parallel programming),
GrACE (structured adaptive meshes),
netCDF and parallel netCDF (input/output),
TAO (optimization), TAU (performance measurement),
and TOPS (linear and nonlinear solvers).
Babel (inter-lang communication):
Babel is a compiler
that generates glue code from
SIDL interface descriptions.
(Scientific Interface Description Language)
SIDL features support for
complex numbers, structs,
and dynamic multidimensional arrays.
SIDL provides a modern oop model,
with automatic ref'counting
and resource (de)allocation.
-- even on top of traditional
procedural languages.
Code written in one language
can be called from any of the
supported languages.
Full support for
Remote Method Invocation (RMI)
allows for parallel distributed applications.

Babel focuses on high-performance
language interoperability
within a single address space;
It won a prestigious R&D 100 award in 2006
for "The world's most
rapid communication among
many languages in a single application."

. Babel currently fully supports
C, C++ Fortran, Python, and Java.
-- Chapel is coming soon:
CCA working with chapel 2009:
Babel migration path for chapel:
Collaboration Status: Active
TASCS Contact: Brad Chamberlain, Cray mailto:bradc@cray.com
Collaboration Summary:
Cray is developing a Chapel language binding
to the Babel interoperability tool.
The work is purely exploratory
(source is not publicly available yet)
and Babel is providing
whatever consulting and training services
needed to facilitate.
doc's:
Common Component Architecture Core Specification
Babel Manuals:
User's Guide (tar.gz)
Implement a Protocol for BabelRMI (pdf)
Understanding the CCA Specification Using Decaf (pdf)
CCA toolkit
CCA tut's directory
CCA Hands-On Guide 0.7.0 (tar.gz)
Our language interoperability tool, Babel,
makes CCA components interoperable
across languages and CCA frameworks.
Numerous studies have demonstrated
that the overheads of the CCA environment
are small and easily amortized
in typical scientific applications.

specification of components.
. using SIDL,
The current syntactic specification
can be extended to capture
more of the semantics
of component behavior.
For example,
increasing the expressiveness
of component specifications
(the “metadata” available about them)
makes it possible to catch
certain types of errors automatically
. we must leverage the
unique capabilities of
component technology
to inspire new CS research directions.
For example,
the CCA provides a dynamic model
for components,
allowing them to be
reconnected during execution.
This model allows an application to
monitor and adapt itself
by swapping components for others.
This approach, called
computational quality of service,
can benefit numerical, performance,
and other aspects of software.
Enhanced component specifications
can provide copious information
that parallel runtime environments
could exploit to provide
the utmost performance.
. the development and use of
new runtime environments
could be simplified by integrating them
with component frameworks.