supercomputing programming languages

6.20: web.adda/supercomputing programming languages:
personal supercomputing:
The personal supercomputing idea
has also gained momentum thanks to the emergence of
programming languages for GPGPU
(general-purpose computing on GPUs).
. Nvidia has been trying to educate programmers
and build support for CUDA,
its C-based programming environment
created specifically for programming GPUs.
Meanwhile, AMD declared its support for
OpenCL (Open Computing Language) in 2009.
[6.25: web: the C in OpenCL stands for
computing, not concurrency:

OpenCL (Open Computing Language) is the first
open, royalty-free standard for
general-purpose parallel programming of
heterogeneous systems.
OpenCL provides a uniform programming environment
for software developers to write efficient, portable code
for high-performance compute servers,
desktop computer systems and handheld devices
using a diverse mix of multi-core CPUs, GPUs,
Cell-type architectures
and other parallel processors such as DSPs.]
OpenCL is an industry standard programming language.
Nvidia says it also works with developers
to support OpenCL.
roll-your-own personal supercomputers:
. researchers at the University of Illinois were looking to
bypass the long waits for computer time at the
National Center for Supercomputing Applications;
so, they built “personal supercomputers,”
compact machines with a stack of graphics processors
that together can be used to run complex simulations.
. they have a quad-core Linux PC with 8GB of memory
and 3 GPUs (one NVIDIA Quadro FX 5800,
two NVIDIA Tesla C1060) each with 4GB .
any news on DARPA's HPCS program
(High Productivity Computing Systems)?

HPCS systems have the following attributes:
(1) Performance: Improve the computational efficiency
and reduce the execution time of
critical national security applications.
(2) Programmability: Reduce the cost and time
of developing application solutions.
(3) Portability: Insulate apps from system specifics.
(4) Robustness: Improve reliability
and reduce vulnerability to intentional attacks.
DARPA's new project (2010) is UHPC:
(Ubiquitous High Performance Computing)
The UHPC program seeks to develop
the architectures and technologies
that will provide the underpinnings,
framework and approaches
for improving power consumption, cyber resiliency,
and programmer productivity,
while delivering a thousand-fold increase
in processing capabilities.
6.25: UHPC details:
. DARPA, in 2010, selected four "performers"
to develop prototype systems for its UHPC program:
Intel, NVIDIA, MIT, and Sandia National Laboratories.
Georgia Tech was also tapped
to head up an evaluation team
for the systems under development.

NVIDIA is teaming with Cray [designer of Chapel.lang],
Oak Ridge National Laboratory
and six universities to design its ExtremeScale prototype.
As recently as five years ago,
NVIDIA was not seen as being at the
cutting edge of supercomputing,
[but GPUs are apparently the future
of massive concurrency].

. Intel's 2010 revival of Larrabee,
as the Many Integrated Core (MIC) processor,
may be of use in its UHPC designs .
. Intel's prior concurrency tools include Ct language,
Threading Building Blocks, and Parallel Studio .

The first UHPC prototype systems
are slated to be completed in 2018.
. single-cabinet UHPC systems will need to deliver
a petaflop of High Performance Linpack (HPL)
and achieve an energy efficiency of at least
50 gigaflops/watt (100 times more efficient
than today's supercomputers).
. the programmer should be able to implement parallelism
without using MPI,
or any other communication-based mechanism.
In addition, the operating system for these machines
has to be "self-aware,"
such that it can dynamically manage performance,
dependability and system resources.
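
As a rough check on what these targets imply, the following sketch works out the power budget of a single cabinet from the two figures quoted above (a petaflop of HPL at 50 gigaflops/watt). The constant and function names are illustrative assumptions, and the "today's supercomputers" baseline is simply the target divided by the stated factor of 100, not a measured figure.

```python
# Back-of-envelope power budget implied by the UHPC single-cabinet targets.
# All names here are illustrative; only the three figures come from the text.

PETAFLOP = 1e15                # HPL target: floating-point operations per second
TARGET_FLOPS_PER_WATT = 50e9   # 50 gigaflops/watt
EFFICIENCY_GAIN = 100          # "100 times more efficient than today's supercomputers"


def power_watts(flops: float, flops_per_watt: float) -> float:
    """Power needed to sustain `flops` at a given energy efficiency."""
    return flops / flops_per_watt


uhpc_cabinet_watts = power_watts(PETAFLOP, TARGET_FLOPS_PER_WATT)

# Baseline efficiency implied by the "100 times" claim (an inference, not data).
baseline_flops_per_watt = TARGET_FLOPS_PER_WATT / EFFICIENCY_GAIN
baseline_watts = power_watts(PETAFLOP, baseline_flops_per_watt)

if __name__ == "__main__":
    print(f"UHPC cabinet at 1 petaflop: {uhpc_cabinet_watts / 1e3:.0f} kW")
    print(f"same petaflop at the implied baseline: {baseline_watts / 1e6:.0f} MW")
```

Under these assumptions a petaflop cabinet would draw about 20 kW, versus about 2 MW at the implied baseline efficiency of 0.5 gigaflops/watt.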
what's new with Cray's Chapel.lang? targeting GPUs:

Performance Portability with the Chapel Language.
IPDPS 2012, May 2012.
. Chapel targets GPUs and multicore processors
using a unified set of language concepts.
It has been widely shown that high-throughput
computing architectures such as GPUs
offer large performance gains compared with
their traditional low-latency counterparts.
The downside to these architectures
is the current programming models:
lower-level languages, loss of portability,
explicit data movement,
and challenges in performance optimization.
. there are novel methods and compiler transformations
that increase programmer productivity
by enabling users of the language Chapel
to provide a single code implementation
that the compiler can then use to
target not only conventional multiprocessors,
but also high-throughput and hybrid machines.
Rather than resorting to
different parallel libraries or annotations
for a given parallel platform,
this work leverages a language that has been
designed from first principles
to address the challenge of programming
for parallelism and locality.
This also has the advantage of providing portability
across different parallel architectures.
Finally, this work presents experimental results
from the Parboil benchmark suite
which demonstrate that codes written in Chapel
achieve performance comparable
to the original versions implemented in CUDA
on both GPUs and multicore platforms.
Chapel Language Specification (version 0.91),
April 19, 2012. Chapel Team, Cray Inc.

. I wonder if DARPA expects the UHPC teams
to incorporate DARPA's recent language investments
embodied in Chapel?
. maybe they are taking an integrated approach
because some of the energy efficiency
will depend on the OS and the OS's native language .
. the problem with Chapel is that the UHPC requirements
specifically preclude languages
that implement parallelism by asking the programmer
to use a communication-based mechanism.
. the idea is that the compiler should be able to
analyze a conventional program
and automatically determine
which parts are naturally concurrent;
e.g., such a compiler might be
a translator converting C into Chapel.
. the lead designer of Ada 95 is currently working on
ParaSail, a parallel programming language that allows
"many things to proceed in parallel by default,
effectively inserting implicit parallelism everywhere".
. this is the sort of thing UHPC wants .
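
To make the idea of automatically finding "naturally concurrent" parts concrete, here is a minimal sketch of one classic dependence test, Bernstein's conditions, applied to straight-line statements. It is not how ParaSail, Chapel, or any UHPC compiler is known to work; the `Stmt` type, the `parallel_stages` helper, and the hand-written read/write sets are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """A statement with the variables it reads and writes (hypothetical type)."""
    label: str
    reads: frozenset
    writes: frozenset


def independent(a: Stmt, b: Stmt) -> bool:
    """Bernstein's conditions: no flow, anti, or output dependence."""
    return (a.writes.isdisjoint(b.reads)
            and a.reads.isdisjoint(b.writes)
            and a.writes.isdisjoint(b.writes))


def parallel_stages(stmts):
    """Group statements into stages; statements in one stage may run concurrently."""
    stage_of = []   # stage index assigned to each statement, in program order
    stages = []     # list of stages, each a list of statement labels
    for i, s in enumerate(stmts):
        level = 0
        for j in range(i):
            # A dependent earlier statement forces this one into a later stage.
            if not independent(stmts[j], s):
                level = max(level, stage_of[j] + 1)
        stage_of.append(level)
        if level == len(stages):
            stages.append([])
        stages[level].append(s.label)
    return stages


# Read/write sets are written by hand here; a real compiler would derive them.
program = [
    Stmt("a = x + 1", frozenset({"x"}), frozenset({"a"})),
    Stmt("b = y * 2", frozenset({"y"}), frozenset({"b"})),
    Stmt("c = a + b", frozenset({"a", "b"}), frozenset({"c"})),
    Stmt("x = 0", frozenset(), frozenset({"x"})),
]

if __name__ == "__main__":
    for n, stage in enumerate(parallel_stages(program)):
        print(f"stage {n}: {stage}")
```

In this example the first two assignments land in stage 0 and the last two in stage 1: `c = a + b` must wait for `a` and `b`, and `x = 0` must wait because the first statement still reads the old `x`.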

6.20: web: GPGPU programming in Russia:
NUDA: extensible languages for GPGPU
Wed, 06/15/2011 - 15:35 — diemenator
Graphics processing units are mostly programmed with
low-level tools, such as CUDA or OpenCL.
In order to port an application to GPU,
a number of transformations and optimizations
must be performed by hand.
This makes the resulting program
hard to read and maintain.
And even if such a program is portable
across various GPU architectures,
performance is often lost.
A higher-level GPGPU programming system
is therefore desired.
NUDA (Nemerle Unified Device Architecture)
is an LGPL-licensed project aimed at
creating such a system based on Nemerle,
an extensible programming language.
Extensible languages have flexible syntax and semantics
which can be extended to provide
a better tool for solving specific problems.
NUDA adds support for
writing GPU kernels using Nemerle.
Kernels are translated into OpenCL at compile time,
and executed on a GPU at run time.
Code for marshalling parameters,
copying data arrays to and from GPU memory,
and invoking the kernel
is automatically generated by NUDA.
NUDA is used internally in RCC MSU
(Research Computing Center of Moscow State University)
for a number of GPU-related projects.
That includes a system of GPU performance tests,
and researching collisions of hash functions.
We are also investigating how extensible languages
can be used to simplify cluster programming,
as well as provide better GPU performance.
An effort is currently under way to provide
NUDA-based extensions for FPGA programming.
6.20: web: Japan's Linux-based supercomputer:
Fujitsu and Japan's Institute of Physical and Chemical Research, RIKEN,
today announced that RIKEN has decided to employ a
new system configuration with a scalar processing architecture
for its next-generation supercomputer.
The supercomputer is being sponsored by Japan's Ministry of
Education, Culture, Sports, Science, and Technology (MEXT)
as part of its project for the "Development and Use of an
Advanced, High-Performance, General-Purpose Supercomputer"
(Next-Generation Supercomputer Project).
The new architecture is for a scalar supercomputer, utilizing a
distributed-memory parallel computing system.
Configuration details are described below.
. installation of the next-generation supercomputer
will begin in fiscal 2010 and should be ready in 2012.

System software and other features
    A software suite is simultaneously being developed
    to make full use of the system's network performance
    and CPUs with an error-recovery function.
    In particular,
    Linux was chosen as the supercomputer's OS,
    equipped with standard programming languages
    and communications libraries.
    The reliability of each hardware component
    needs to be maximized to ensure the stability
    of the ultra-large-scale system,
    so that a single component failure
    does not impact the entire system's operation.
    . this will depend on the CPU's error-recovery function
    and a network with excellent fault tolerance.

(4) Direct-connection network, torus network
In a direct-connection network,
the entire network consists of numerous connections
between pairs of nodes.
In an indirect-connection network,
a switch sits between multiple nodes.
A three-dimensional torus network is a kind of
direct-connection network where the nodes are organized into
a three-dimensional structure,
and each is linked to six others,
forming a ring structure on each dimension.
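
The six-neighbor ring structure described above can be sketched with modular arithmetic. This is a generic illustration of a 3D torus, not the actual interconnect of the RIKEN machine; the function name and the 4x4x4 size are assumptions.

```python
from itertools import product


def torus_neighbors(node, dims):
    """Return the six neighbors of `node` in a 3D torus of size `dims`.

    Each dimension is a ring, so coordinates wrap around with modulo.
    Assumes every dimension has at least 3 nodes; with fewer, the
    "+1" and "-1" neighbors on that ring coincide.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(torus_neighbors((0, 0, 0), dims))
    # Every node in the torus has exactly six distinct neighbors.
    assert all(len(set(torus_neighbors(n, dims))) == 6
               for n in product(*(range(d) for d in dims)))
```

Because of the wrap-around, a corner node such as (0, 0, 0) is directly linked to (3, 0, 0), which would be three hops away in a plain mesh.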

(6) Distributed-memory parallel computing system
. a given node cannot directly access a different node's memory.
Because the memory cannot be accessed directly,
data must be transmitted between nodes before it can be used.
For these communications to work,
the typical distributed-memory parallel computing system
requires a high-performance network.
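
A minimal sketch of this model, assuming a toy simulation rather than a real cluster: each "node" is a thread that only touches its own data, and a queue stands in for the network. The `Node` class and `distributed_sum` function are hypothetical names; on a real machine this role is played by a message-passing library such as MPI. Python threads do share memory, so the isolation here is by convention only.

```python
import queue
import threading


class Node:
    """A simulated compute node with private memory and an inbox (hypothetical)."""

    def __init__(self, rank, data):
        self.rank = rank
        self._memory = list(data)   # by convention, only this node touches it
        self.inbox = queue.Queue()  # stands in for the interconnect


def distributed_sum(chunks):
    """Sum all chunks; node 0 collects partial sums sent by the other nodes.

    Assumes at least one chunk is given.
    """
    nodes = [Node(rank, chunk) for rank, chunk in enumerate(chunks)]
    result = {}

    def run(node):
        partial = sum(node._memory)          # purely local computation
        if node.rank != 0:
            nodes[0].inbox.put(partial)      # explicit send to node 0
        else:
            total = partial
            for _ in range(len(nodes) - 1):
                total += node.inbox.get()    # blocking receive
            result["total"] = total

    threads = [threading.Thread(target=run, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


if __name__ == "__main__":
    print(distributed_sum([range(1, 26), range(26, 51), range(51, 76), range(76, 101)]))
```

Node 0 never reads another node's data directly; it only sees the partial sums that arrive as messages, which is why the speed of the network matters so much in this kind of system.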

(7) Error-recovery function
The CPU has functions for detecting and correcting
erroneous data, and for retrying instructions if a fault occurs.
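
The source does not say which error-correcting scheme the CPU uses. As a textbook illustration of "detecting and correcting erroneous data", here is a sketch of the Hamming(7,4) code, which can locate and repair any single flipped bit in a 7-bit word; the function names are assumptions, and instruction retry is not modeled.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one flipped bit and return (data bits, error position).

    The error position is 1-based; 0 means no error was detected.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4   # syndrome spells out the bad position
    if position:
        w[position - 1] ^= 1          # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], position


if __name__ == "__main__":
    codeword = hamming74_encode([1, 0, 1, 1])
    damaged = list(codeword)
    damaged[4] ^= 1                   # simulate a fault in position 5
    print(hamming74_decode(damaged))
```

The decoder recovers the original data bits and reports position 5 as the one that was corrupted.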
6.25: Japan's language?
The Next-Generation Supercomputer (2009)
harnesses the following system software:
. The adoption of Linux Operating System (OS)
providing high portability .
. A rich programming language suite that allows for
continuity of software assets
--[presumably whatever GCC supports;
GNU Prolog is a separate GNU project, not part of GCC]
. MPI (Message Passing Interface) Library
used in advanced data communication for parallelization .
. High-performance, highly functional and system-optimized
scientific and numerical library .

The Next-Generation Supercomputer,
a parallel machine connecting together many processors,
requires advanced software technology
to make use of the functions of all its computers.
In order to expand the Next-Generation Supercomputer
into an even larger-scale system,
we are developing more advanced software technology
to exploit this high-level performance,
with the aim of using it in real applications.
We are focusing all our efforts toward
upgrading this software
so that it harnesses the full performance of the
Next-Generation Supercomputer
and contributes to society
-- Kazuo Minami,
Research and Development Group,
Application Development Team Leader
Next-Generation Supercomputer R&D Center, RIKEN
. it will have an interface for the
NAREGI (National Research Grid Initiative) middleware,
one example of external uses of the system through the
Science Information NETwork (SINET).
the 5th generation project:
MITI (Ministry of International Trade and Industry) and
ICOT (Institute for New Generation Computer Technology)
embarked on a Sixth Generation Project in the 1990s.

A primary problem of the 5th gen project
was the choice of concurrent logic programming
as the bridge between the parallel computer architecture
and the use of logic as a knowledge representation
and problem solving language for AI applications.
This never happened cleanly;
a number of languages were developed,
all with their own limitations.
In particular, the committed choice feature of
concurrent constraint logic programming
interfered with the logical semantics of the languages:
Logic Programming can be broadly defined as
"using logic to infer computational steps
from existing propositions".
However, mathematical logic cannot always infer computational steps
because computational systems make use of arbitration
for determining which message is processed next
by a recipient that is sent multiple messages concurrently.
Since arrival orders are in general indeterminate,
they cannot be inferred from
prior information by mathematical logic alone.

Therefore mathematical logic cannot in general implement computation.
Consequently, Procedural Embedding of Knowledge
is strictly more general than Logic Programming.
This conclusion is contrary to Robert Kowalski who stated
"Looking back on our early discoveries,
I value most the discovery that
computation could be subsumed by deduction."
Nevertheless, logic programming (like functional programming)
can be a useful programming idiom.
Over the course of history, the term "functional programming"
has grown more precise and technical as the field has matured.
Logic Programming should be on a similar trajectory.
Accordingly, "Logic Programming" should have
a general precise characterization.
Kowalski's approach has been to advocate
limiting Logic Programming to
backward-chaining only inference based on resolution
using reduction to conjunctive normal form
in a global states model.
In contrast, our approach is to explore Logic Programming
building on the logical inference of computational steps
using inconsistency-robust reasoning in a configurations model.
Because contemporary large software systems
are pervasively inconsistent,
it is not safe to reason about them using classical logic,
e.g., using resolution theorem proving.
Sixth generation of computers (1990 to date):
This present generation of computer technology
is closely related to parallel computing,
and several growth areas have been noticed in this area,
both in hardware and in the better understanding of
how to develop algorithms to make full use of
massively parallel architectures.
Though vector systems are equally in use,
it is often speculated that the future would be
dominated by parallel systems.
However, there are several devices where there are
combinations of parallel-vector architectures.
Fujitsu Corporation is planning to build a system with
more than 200 vector processors.
Currently, the processors are constructed with
a combination of RISC, pipelining
and parallel processing.
Networking technology is spreading rapidly
and one of the most conspicuous growths of
the sixth generation computer technology
is the huge growth of WAN.
For regional networks, T1 is the standard
and the national "backbone" uses T3
to interconnect the regional networks.
Finally, the rapid advancement and high level of awareness
regarding computer technology
is greatly indebted to two pieces of legislation.
Following the Lax report of 1982,
the High Performance Computing Act of 1991 and the
Information Infrastructure and Technology Act of 1992
have strengthened and ensured
the scope of high performance computing.
Sixth Generation @ phy.ornl.gov
Workstation technology has continued to improve,
with processor designs now using a combination of
RISC, pipelining, and parallel processing.
As a result, it is now possible with $30,000
to purchase a 100 megaflops workstation
-- same power as 4th generation supercomputers.
This development has sparked an interest in
heterogeneous computing:
a program started on one workstation
can find idle workstations elsewhere in the local network
to run parallel subtasks.
(Peter Lax led the very influential Lax Report, 1982,
which noted the lack of supercomputing facilities
in universities, and argued that
scientists without access to supercomputers
were eschewing problems requiring computational power
and jeopardizing American scientists’ leadership
in certain research areas)
A little over a decade after the warning
voiced in the Lax report,
the future of our computational science infrastructure
is bright.
The federal commitment to high performance computing
has been further strengthened with the passage of
two particularly significant pieces of legislation:
the High Performance Computing Act of 1991,
which established the HPCCP
(High Performance Computing and Communication Program)
and Sen. Gore's
Information Infrastructure and Technology Act of 1992,
which addresses a broad spectrum of issues
ranging from high performance computing
to expanded network access
and the necessity to make leading edge technologies
available to educators
from kindergarten through graduate school.
language in Japan's sixth-generation computing:
The term "sixth-generation computing proposal"
is the name used in the Japanese press
for the report, Promotion of Research and Development
on Electronics and Information Systems
that may Complement or Substitute for
Human Intelligence,
from the Subcommittee on Artificial Intelligence
of The Council on Aerospace, Electronics
and Other Advanced Technologies in Tokyo.
. in January 1983, The Council was asked by
the Ministry of Science and Technology
to report on artificial intelligence in these terms.
It used the term knowledge science for its subject matter,
and reported in March 1985.

Intelligent robot systems/
Problems in research and development to be solved in the future:
Programming languages must be developed
which validate input,
movement description,
parallel movement control,
use environmental models,
graphic and speech understanding,
and effective human-computer interfaces.

Technologies related to problem-solving/
Problems in research and development to be solved in the future:
# Intelligent programming technologies
Systems must be developed for automatically generating
problem-solving procedures.
# Computer languages
Languages going beyond Lisp and Prolog,
based on a new logic system, must be developed.

Machine translation systems:
Processing of natural language for meaning and content
must be developed together with a large lexicon
and interactive modification.
. related to the construction of systems:
The possibility of a general intermediate language
should be investigated.

more about Sixth Generation:
"Software Report: Sixth Generation Project"
IEEE Micro, vol. 11, no. 2, pp. 11, 84-85, Mar./Apr. 1991,
1996 Foundations of Distributed Artificial Intelligence
. distributed logic-oriented programming: ICP-][
(Chu, 1993)
integrates Parlog (a committed-choice
non-deterministic concurrent logic programming language)
and ICP, an extended Prolog dialect;
allows coordination via the internet of
concurrently executing programs .
MECCA (Multi-Agent Environment for Constructing Cooperative Applications)
implements the MAIL agent model, and uses ICP-][ .
IC* Prolog ][ and L&O:
'I.C Prolog ][: a Language for Implementing
Multi-Agent Systems',

Yannis Cosmadopoulos and Damian Chu,
In Tutorial and Workshop on Cooperating Knowledge Based Systems,
September 23-25, 1992, Keele University, UK.
. The programming methodology behind the language
is described in the book
'Logic & Objects' by Frank McCabe (Prentice Hall Intl.).
* IC? developed at Imperial College ...