Showing posts with label HPCS. Show all posts
Showing posts with label HPCS. Show all posts

2012-08-31

parasail is big win for reliable concurrency

12.7.2: news.adda/co/ParaSail
(Parallel Spec and Impl Lang) is by Ada's Tucker Taft:

first siting of parasail:
11/09/11 Language Lessons: Where New
Parallel Developments Fit Into Your Toolkit

By John Moore for Intelligence In Software
The rise of multicore processors and programmable GPUs
has sparked a wave of developments in
parallel programming languages.
Developers seeking to exploit multicore and manycore systems
-- potentially thousands of processors --
now have more options at their disposal.
Parallel languages making moves of late
include the SEJITS of University of California, Berkeley;
The Khronos Group’s OpenCL;
the recently open-sourced Cilk Plus;
and the newly created ParaSail language.
Developers may encounter these languages directly,
though the wider community will most likely find them
embedded within higher-level languages.

. ParaSail incorporates formal methods such as
preconditions and post-conditions,
which are enforced by the compiler.
In another nod to secure, safety-critical systems,

2012-07-05

supercomputing programming languages

6.20: web.adda/supercomputing programming languages:
personal supercomputing:
The personal supercomputing idea
has also gained momentum thanks to the emergence of
programming languages for GPGPU
(general-purpose computing on GPU's).
. Nvidia has been trying to educate programmers
and build support for CUDA,
the C language programming environment
created specifically for programming GPUs.
Meanwhile, AMD has declared its support for
OpenCL (open computing language) in 2009 .
[6.25: web: the c in OpenCL stands for
computing not concurrency?

OpenCL (Open Computing Language) is the first
open, royalty-free standard for
general-purpose parallel programming of
heterogeneous systems.
OpenCL provides a uniform programming environment
for software developers to write efficient, portable code
for high-performance compute servers,
desktop computer systems and handheld devices
using a diverse mix of multi-core CPUs, GPUs,
Cell-type architectures
and other parallel processors such as DSPs.]
OpenCL is an industry standard programming language.
Nvidia says it also works with developers
to support OpenCL.
roll-your-own personal supercomputers:
. researchers at the University of Illinois were looking to
bypass the long waits for computer time at the
National Center for Supercomputing Applications;
so, they built “personal supercomputers,”
compact machines with a stack of graphics processors
that together can be used to run complex simulations.
. they have a quad-core Linux PC with 8GB of memory
and 3 GPUs (one NVIDIA Quadro FX 5800,
two NVIDIA Tesla C1060) each with 4GB .
any news on darpa's HPCS program
(High Productivity Computing Systems)?

2010-08-30

lang's for supercomputer concurrency

adda/concurrency/lang's for supercomputer concurrency
7.27:
. the DARPA`HPCS program
(High Productivity Computing Systems)
is meant to tame the costs of HPC
(High Performance Computing)
-- HPC is the use of supercomputers for
simulations to solve problems in
chemistry, physics, ecology, sociology,
and esp'ly warfare .
. programmer productivity
means making it easier to develope
code that can make full use of
a supercomputer's concurrency .
. the main source of the cost
is a lack of smart software tools
that can turn science experts
into computer coders .
. toward this end, they are funding
the design of new concurrency lang's .
7.28:
. the DoD and DARPA already made
quite an investment in the
Ada concurrency lang',
a language designed for expressing
the sort of concurrency needed by
embedded system engineers; 7.31:
but, the software developer community
spurned Ada as soon as there was
something better ...
. pascal was popular before '85,
and when Ada arrived '83,
it was popular too;
then came the Visual version
of Basic (familiar and slick).
. the top demanded langs by year were:
'85: C, lisp, Ada, basic;
'95: v.basic, c++, C, lisp, Ada;
'05: Java, C, c++, perl, php,

. the only currently popular lang's
meant for exploiting multi-core cpu's
if not other forms of concurrency, are:
(3.0% rating) obj'c 2.0 (with blocks)
(0.6% rating) Go
(0.4% rating) Ada
7.29:
. whether or not Ada's concurrency model
is well-suited for supercomputers
as well as embedded systems,
it is not increasing coder'productivity .
. while Ada boosted productivity beyond
that offered by C,
it was nevertheless proven to do
less for productivity than Haskell
.

HPC Productivity 2004/Kepner{ 2003(pdf), 2004(pdf) }:
. lang's were compared for
expressiveness vs performance:
. the goal of a high-performance lang'
is to have the expressiveness
of Matlab and Python,
with the performance of VHDL
(VHDL is a version of Ada for ASICs ).
. UPC (Unified Parallel C) and Co-array Fortran
are half way to high-productivity
merely by using PGAS
(Partitioned global address space)
rather than MPI
(Message Passing Interface).
. the older tool set: C, MPI, and openMP
is both slow and difficult to use .

. the 2 lang's that DARPA is banking on now
are Cray`Chapel, and IBM`X10 Java .
. they were also funding Sun`Fortress
until 2006,
which features a syntax like advanced math
-- the sort of greek that physics experts
are expected to appreciate .

2010-07-29

supercomputer power promised in the CHaPeL (Cascade Hi'Productivity Lang)

7.26: news.adda/lang"Chapel(Cascade Hi'Productivity Lang'):

2005.9 PGAS Programming Models Conference/Chapel (pdf):
. the final session of the
Parallel Global Address Space(PGAS)
Programming Models Conference was
devoted to DARPA's HPCS program
(High Productivity Computing Systems):
Cascade[chapel]
, X10[verbose java]
, Fortress[greek]
, StarP[slow matlab].
Locality Control Through Domains:
[7.29:
. domains are array subscript objects,
specifying the size and shape of arrays,
the represent a set of subscripts;
and so, applying a domain to an array
selects a set of array elements .
(recall that term"domain is part of
function: domain-->codomain terminology)
. domains make it easy to work with
sparse arrays, hash tables, graphs,
and interior slices of arrays .]
. domains can be distributed across locales,
which generally correspond to CPUs in Chapel.
This gives Chapel its fundamentally
global character, as contrasted to
the process-centric nature of MPI or CAF,
for example.
When operations are performed on
an array whose domain is distributed,
any needed IPC is implicitly carried out,
-- (inter-processor communication) --
without the need for function calls.
Chapel provides much more generality
in the distribution of domains
than high performance Fortran, or UPC,
but, will not take the place of
the complex domain decomposition tools
required to distribute data for
optimum load balance and communication
in most practical parallel programs.
[7.29:
. HPC systems today are overwhelmingly
distributed memory systems,
and the applications tend to require
highly irregular communication
between the CPUs.
This means that good performance
depends on effective locality management,
which minimizes the IPC costs .
whereas,
productive levels of abstraction
will degrades performance.
Chapel simply allows mixing
a variety of abstraction levels .]

Cray's Chapel intro (pdf):
Chapel (Cascade High Productivity Language)
is Cray's programming language for
supercomputers, like the Cascade system;
part of the Cray Cascade project,
a participant in DARPA's HPCS program
(High Productivity Computing Systems ) .
influences:
. iterators:
CLU, Ruby, Python
. latent types:
ML, Scala, Matlab, Perl, Python, C#
. OOP, type safety:
Java, C#:
. generic programming/templates:
C++
. data parallelism, index sets, distributed arrays:
ZPL (Z-level Programming Language)
HPF (High-Performance Fortran),
. task parallelism, synchronization:
Cray's MTA's extensions to Fortran and C.
(Multi-Threaded Architecture)

Global-view vs Fragmented models:
. in Fragmented programming models,
the Programmer's point-of-view
is a single processor/thread;
. the PGAS lang" UPC (unified parallel C),
has a fragmented compute model
and a Global-View data model .
. the shared mem' sytems" OpenMP & pThreads
have a trivially Global View of everything .
. the HPCS lang's, including Chapel,
hava a Global View of everything .

too-low- & too-high-level abstractions
vs multiple levels of design:

. openMP, pthreads, MPI are low-level & difficult;
HPF, ZPL are high-level & inefficient .
Chapel has a mix of abstractions for:
# task scheduling levels:
. work stealing; suspendable tasks;
task pool; thread per task .
# lang' concept levels:
. data parallelism; distributions;
task parallelism;
base lang; locality control .
# mem'mgt levels:
. gc; region-based;
manual(malloc,free) .
. chapel`downloads for linux and mac .
readme for Chapel 1.1
The highlights of this release include:
parallel execution of all
data parallel operations
on arithmetic domains and arrays;
improved control over the
degree and granularity of parallelism
for data parallel language constructs;
feature-complete
Block and Cyclic distributions;
simplified constructor calls for
Block and Cyclic distributions;
support for assignments between,
and removal of indices from,
sparse domains;
more robust performance optimizations on
aligned arithmetic domains and arrays;
many programmability
and correctness improvements;
new example programs demonstrating
task parallel concepts and distributions;
wide-ranging improvements
to the content and organization of the
language specification.
This release of Chapel contains
stable support for the base language,
and for task and regular data parallelism
using one or multiple nodes.
Data parallel features
on irregular domains and arrays
are supported via a single-threaded,
single-node reference implementation.
impl' status:
No support for inheritance from
multiple or generic classes
Incomplete support for user-defined constructors
Incomplete support for sparse arrays and domains
Unchecked support for index types and sub-domains
No support for skyline arrays
No constant checking for domains, arrays, fields
Several internal memory leaks
Task Parallelism
No support for atomic statements
Memory consistency model is not guaranteed
Locality and Affinity
String assignment across locales is by reference
Data Parallelism
Promoted functions/operators do not preserve shape
User-defined reductions are undocumented and in flux
No partial scans or reductions
Some data parallel statements are serialized
Distributions and Layouts
Distributions are limited to Block and Cyclic
User-defined domain maps are undocumented and in flux
7.27 ... 28:
IJHPCA` High Productivity Languages and Models
(Internat' J. HPC App's, 2007, 21(3))


Diaconescua, Zima 2007`An Approach to Data Distributions in Chapel (pdf):
--. same paper is ref'd here:
Chapel Publications from Collaborators:
. they note:
"( This paper presents early exploratory work
in developing a philosophy and foundation for
Chapel's user-defined distributions ).
User-defined distributions
are first-class objects:
placed in a library,
passed to functions,
and reused in array declarations.
In the simplest case,
the specification of a
new distribution
can consist of just a few lines
of code to define mappings between
the global indices of a data structure
and memory;
in contrast, a sophisticated user
(or distribution writer)
can control the internal
representation and layout of data
to an almost arbitrary degree,
allowing even the expression of
auxiliary structures
typically used for distributed
sparse matrix data.
Specifically,
our distribution framework is designed to
support:
• The mapping of arbitrary data collections
to units of locality,
• the specification of user-defined mappings
exploiting knowledge of data structures
and their access patterns,
• the capability to control the
layout (allocation) of data
within units of locality,
• orthogonality between
distributions and algorithms,
• the uniform expression of computation
for both dense and sparse data structures,
• reusability and extensibility
of the data mapping machinery itself,
as well as of the common data mapping patterns
occurring in various application domains.

Our approach is the first
that addresses these issues
completely and consistently
at a high level of abstraction;
in contrast to the current programming paradigm
that explicitly manages data locality
and the related aspects of synchronization,
communication, and thread management
at a level close to what assembly programming
was for sequential languages.
The challenge is to allow the programmer
high-level control of data locality
based on the knowledge of the problem
without unnecessarily burdening the
expression of the algorithm
with low-level detail,
and achieving target code performance
similar to that of
manually parallelized programs.

Data locality is expressed via
first-class objects called distributions.
Distributions apply to collections
of indices represented by domains,
which determine how arrays
associated with a domain
are to be mapped and allocated across
abstract units of uniform memory access
called locales.
Chapel offers an open concept of distributions,
defined by a set of classes
which establish the interface between
the programmer and the compiler.
Components of distributions
are overridable by the user,
at different levels of abstraction,
with varying degrees of difficulty.
Well-known regular standard distributions
can be specified along with
arbitrary irregular distributions
using the same uniform framework.
There are no built-in distributions
in our approach.
Instead, the vision is that
Chapel will be an open source language,
with an open distribution interface,
which allows experts and non-experts
to design new distribution classes
and support the construction of
distribution libraries that can be
further reused, extended, and optimized.
Data parallel computations
are expressed via forall loops,
which concurrently iterate over domains.

. the class of PGAS languages
including Unified Parallel C (UPC)
provide a reasonable improvement
over lower-level communications with MPI.
. UPC`threads support block-cyclic
distributions of one-dimensional arrays
over a one-dimensional set of processors .
and a stylized upc forall loop
that supports an affinity expression
to map iterations to threads.

. the other DARPA`HPCS lang's
provide built-in distributions
as well as the possibility to create
new distributions from existing ones.
However,
they do not contain features
for specifying user-defined
distributions and layouts.
Furthermore,
X10’s locality rule requires
an explicit distinction between
local and remote accesses
to be made by the programmer
at the source language level.
The key differences between
existing work and our approach
can be summarized as follows.
First,
we provide a general oop framework
for the specification of
user-defined distributions,
integrated into an advanced
high-productivity parallel language.
Secondly,
our framework allows the
flexible formulation
of data distributions,
locale-internal data arrangements,
and associated control mechanisms
at a high level of abstraction,
tuned to the properties of
architectures and applications.
This ensures
target code performance
that is otherwise achievable only via
low-level control.
. Chapel Publications and Papers .
. Chapel Specification [current version (pdf)] .


Parallel Programmability and the Chapel Language (pdf)
bradc@cray.com, d.callahan@microsoft.com, zima@jpl.nasa.gov`
Int.J. High Performance Computing App's, 21(3) 2007

This paper serves as a good introduction
to Chapel's themes and main language concepts.
7.28: adda/concurrency/chapel/Common Component Architecture:
Common Component Architecture (CCA) Forum

2005 CCA Whitepaper (pdf):
. reusable scientific components
and the tools with which to use them.
In addition to developing
simple component examples
and hands-on exercises as part of
CCA tutorial materials,
we are growing a CCA toolkit of components
that is based on widely used software packages,
including:
ARMCI (one-sided messaging),
CUMULVS (visualization and parallel data redistribution),
CVODE (integrators), DRA (parallel I/O),
Epetra (sparse linear solvers),
Global Arrays (parallel programming),
GrACE (structured adaptive meshes),
netCDF and parallel netCDF (input/output),
TAO (optimization), TAU (performance measurement),
and TOPS (linear and nonlinear solvers).
Babel (inter-lang communication):
Babel is a compiler
that generates glue code from
SIDL interface descriptions.
(Scientific Interface Description Language)
SIDL features support for
complex numbers, structs,
and dynamic multidimensional arrays.
SIDL provides a modern oop model,
with automatic ref'counting
and resource (de)allocation.
-- even on top of traditional
procedural languages.
Code written in one language
can be called from any of the
supported languages.
Full support for
Remote Method Invocation (RMI)
allows for parallel distributed applications.

Babel focuses on high-performance
language interoperability
within a single address space;
It won a prestigious R&D 100 award in 2006
for "The world's most
rapid communication among
many languages in a single application."

. Babel currently fully supports
C, C++ Fortran, Python, and Java.
-- Chapel is coming soon:
CCA working with chapel 2009:
Babel migration path for chapel:
Collaboration Status: Active
TASCS Contact: Brad Chamberlain, Cray mailto:bradc@cray.com
Collaboration Summary:
Cray is developing a Chapel language binding
to the Babel interoperability tool.
The work is purely exploratory
(source is not publicly available yet)
and Babel is providing
whatever consulting and training services
needed to facilitate.
doc's:
Common Component Architecture Core Specification
Babel Manuals:
User's Guide (tar.gz)
Implement a Protocol for BabelRMI (pdf)
Understanding the CCA Specification Using Decaf (pdf)
CCA toolkit
CCA tut's directory
CCA Hands-On Guide 0.7.0 (tar.gz)
Our language interoperability tool, Babel,
makes CCA components interoperable
across languages and CCA frameworks.
Numerous studies have demonstrated
that the overheads of the CCA environment
are small and easily amortized
in typical scientific applications.

specification of components.
. using SIDL,
The current syntactic specification
can be extended to capture
more of the semantics
of component behavior.
For example,
increasing the expressiveness
of component specifications
(the “metadata” available about them)
makes it possible to catch
certain types of errors automatically
. we must leverage the
unique capabilities of
component technology
to inspire new CS research directions.
For example,
the CCA provides a dynamic model
for components,
allowing them to be
reconnected during execution.
This model allows an application to
monitor and adapt itself
by swapping components for others.
This approach, called
computational quality of service,
can benefit numerical, performance,
and other aspects of software.
Enhanced component specifications
can provide copious information
that parallel runtime environments
could exploit to provide
the utmost performance.
. the development and use of
new runtime environments
could be simplified by integrating them
with component frameworks.