2009-12-29

c-- (c minus minus.org)

5.1: news.addm/c--/haskell won't be using LLVM:

In this post I will elaborate on
why some people think
C-- has more promise than LLVM
as a substrate for lazy, functional languages.
Let me start by making one thing clear:
LLVM does have support for garbage collectors.
I am not disputing that.
However, as Henderson has shown,
so does C and every other language.
The question we have to ask is not
"Does this environment support garbage collection?"
but rather
"How efficiently does this environment
support garbage collection?".
To recap,
Henderson's technique involves placing
root pointers
(the set of pointers which can be
followed to find all live data)
on a shadow stack.
Since we manage this stack ourselves,
it shouldn't be a problem for the GC to walk it.
In short, each heap allocation incurs
an unnecessary stack allocation
and heap pointers are
never stored in registers for long.
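
The shadow-stack idea recapped above can be sketched in C.
This is only an illustration under stated assumptions, not Henderson's
actual code: the names Obj, Frame, push_frame, pop_frame, mark_from_roots
and demo_collect are all hypothetical, the objects live on the C stack
instead of a real heap, and each frame holds at most two roots.
The point to notice is that every root is registered by address,
so it has to live in memory where the collector can find it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object: one child pointer plus a mark bit. */
typedef struct Obj {
    struct Obj *child;
    int marked;
} Obj;

/* One shadow-stack frame per active function. It records the addresses
   of that function's local root pointers and links to the caller's frame. */
typedef struct Frame {
    struct Frame *prev;
    size_t nroots;
    Obj **roots[2];              /* fixed capacity keeps the sketch small */
} Frame;

Frame *shadow_top = NULL;        /* top of the shadow stack we manage ourselves */

void push_frame(Frame *f) { f->prev = shadow_top; shadow_top = f; }

void pop_frame(void) {
    assert(shadow_top != NULL);
    shadow_top = shadow_top->prev;
}

/* Mark phase: walk every frame and follow every registered root.
   Returns how many objects were newly marked as live. */
size_t mark_from_roots(void) {
    size_t live = 0;
    for (Frame *f = shadow_top; f != NULL; f = f->prev) {
        for (size_t i = 0; i < f->nroots; i++) {
            for (Obj *o = *f->roots[i]; o != NULL && !o->marked; o = o->child) {
                o->marked = 1;
                live++;
            }
        }
    }
    return live;
}

/* Example mutator: x and y are roots, so their addresses go into a frame
   in memory. Returns the number of objects a collection would keep. */
size_t demo_collect(void) {
    Obj a = {NULL, 0}, b = {&a, 0}, garbage = {NULL, 0};
    Obj *x = &b;                 /* reaches b, then a */
    Obj *y = &a;                 /* a is also reachable through x */
    (void)garbage;               /* never registered, so never marked */
    Frame f = {NULL, 2, {&x, &y}};
    push_frame(&f);
    size_t live = mark_from_roots();
    pop_frame();
    return live;
}
```

Calling demo_collect() marks a and b once each and leaves garbage unmarked.
The push_frame/pop_frame pair around the body is the per-function
bookkeeping the post objects to.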

Now what does this mean for
languages like Haskell?
Well, unlike programs written in
more traditional languages,
a Haskell application might very well
do between 10 and 20 million
heap allocations per second.
Writing Haskell programs is more about
producing the correct data stream
than it is about performing the right side-effects.
It's common for functions in Haskell
to manipulate data without executing
any side-effects. (Think spreadsheets.)
This way of computing obviously requires
a very cheap method of allocation.
Performing 10 million unnecessary
stack allocations per second
would severely hurt performance,
and not having heap pointers in registers
could easily be equally devastating.
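
To make "very cheap allocation" concrete, here is a minimal sketch of
bump-pointer allocation, one common way runtimes for functional languages
keep allocation down to a bounds check and an addition.
The post itself does not name a scheme, so this choice is an assumption,
and nursery, nursery_next and bump_alloc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NURSERY_WORDS = 1024 };

uintptr_t nursery[NURSERY_WORDS];   /* fixed-size allocation area */
size_t nursery_next = 0;            /* index of the first free word */

/* Allocate `words` machine words by bumping an index.
   Returns NULL when the nursery is full; a real runtime would
   run a collection at that point instead of failing. */
void *bump_alloc(size_t words) {
    if (words > NURSERY_WORDS - nursery_next) {
        return NULL;
    }
    void *p = &nursery[nursery_next];
    nursery_next += words;
    return p;
}
```

With an allocator this cheap, any extra per-allocation bookkeeping,
such as registering a root on a shadow stack, is a large relative cost.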

So what about LLVM?
Shouldn't the built-in GC support in LLVM
be more efficient than any cheap hack?
Well, it turns out it isn't.
The conflict between garbage collection
and optimizations hasn't changed,
and neither has the solution:
disabling or bypassing optimizations.
This in turn means unnecessary stack allocations
and sub-optimal use of registers.

That LLVM'ers haven't solved the problem of
zero-overhead garbage collection
isn't too surprising.
Solving this while staying agnostic of the data model
is an open question in computer science.
It is here C-- differs from LLVM.
C-- is a research project that aims at solving
difficult problems such as supporting efficient GCs
and cheap concurrency.
LLVM, on the other hand, is an engineering project.

In conclusion:
garbage collection in LLVM incurs
unacceptable overhead,
and while C-- and LLVM do have some overlap,
the problems they're trying to solve are quite different.
Posted by David Himmelstrup at 11:52 AM

5.2: co.addm/stackoverflow.com/llvm vs c--:


I've been excited about llvm being
low enough to model any system
and saw it as promising
that Apple was adopting it,
but then again
they don't specifically support Haskell,
and some think that Haskell
would be better off with c--
adding that there's
nothing llvm can do to improve .

> That LLVM'ers haven't solved the problem of
> zero-overhead garbage collection
> isn't too surprising.
> Solving this while staying agnostic of the
> data model
> is an open question in computer science.

I am referring to

5.9: answer accepted:

Well, there is a project at UNSW
to translate GHC Core to LLVM.
Remember: it wasn't clear 10 years ago
that LLVM would build up all the
infrastructure C-- wasn't able to.
Unfortunately,
LLVM has the infrastructure for
portable, optimized code,
but not the infrastructure
for nice high level language support,
that C-- ha(s)d.
An interesting project
would be to target LLVM from C-- ..

comment to answer:
. great answer; that was
just the blindspot-undo I was looking for!
. llvm'ers had a similar response
to the lack of concurrency support:
it's an add-on library thing .
. c-- can be ported to llvm,
meaning that llvm's gc simply won't be used .


11.9: web.adda/c--/review:


C-- is a compiler-target language.
The idea is that a compiler for a high-level language
translates programs into C--,
leaving the C-- compiler to generate native code.
C--'s major goals are these:

C-- is not "write-once, run-anywhere".
It conceals most architecture-specific details,
such as the number of registers, but it exposes some.
In particular, C-- exposes the word size, byte order,
and alignment properties of the target architecture, for two reasons.
First, to hide these details would require
introducing a great deal of complexity, inefficiency, or both
-- especially when the front-end compiler
needs to control the representation of its high-level data types.
Second, these details are easy to handle in a front-end compiler.
Indeed, a compiler may benefit, because
it can do address arithmetic using integers
instead of symbolic constants such as FloatSize and IntSize.

web.adda/what do the c-- folks think of llvm?

summary:
. why isn't the llvm project working for c-- users?
llvm makes the assumption that there exists a generic assembler,
and c--, by assuming otherwise,
is not about portability:
the current version targets only the intel'86 architecture .

I do not understand the assertion that LLVM is uncooperative.
The direction LLVM takes is driven entirely by contributors.
I suggest you embrace this
and implement the necessary GC support in LLVM.
The devs would likely be happy to help out with any problems;
the team is *very* helpful.
Furthermore,
that support would open the door to implementing
other similar functional languages in LLVM,
rather than making more isolated code islands.
In the long run, LHC will win *big*
by having that same code used by others
(and tested, and expanded.)
There are many things for which it is reasonable to have
NIH (Not Invented Here syndrome).
In 2009, a fast code generator is not one of them.

David Himmelstrup said...
It's unsolved in the academic sense of the word.
Solving it requires research and not engineering.
If I knew how to solve it, I definitely would add it to LLVM.
It's only unsolved in the general case.
I doubt, however, that LLVM is interested in my specific data model
(which is in a state of flux, even).
What I want to do
can't yet be done by any general-purpose compiler.

Chris Lattner
Sun, 17 Dec 2006 12:45:42 -0800
LLVM is written in C++, but, like C--, it provides first-class support for
intermediate representation written as a text file (described here:
http://llvm.org/docs/LangRef.html), which allows you to write your
compiler in the language that makes the most sense for you.

In addition to the feature set of C--, LLVM provides several useful pieces
of infrastructure: a C/C++/ObjC front-end based on GCC 4, JIT support,
aggressive scalar, vector (SIMD), data layout, and interprocedural
optimizations, support for X86/X86-64/PPC32/PPC64/Sparc/IA-64/Alpha and
others, far better codegen than C--, etc. Further, LLVM has a vibrant
community, active development, large organizations using and contributing
to it (e.g. Apple), and it is an 'industrial strength' tool, so you don't
spend the majority of your time fighting or working around our bugs :).

Like C--, LLVM doesn't provide a runtime (beyond libc :) ), which can
be a good thing or a bad thing depending on your language (forcing you to
use a specific runtime is bad IMHO). I would like to see someone develop
a runtime to support common functional languages out of the box better
(which language designers could optionally use), but no-one has done so
yet.

OTOH, C-- does have some features that
LLVM does not yet have first class support for.
LLVM does not currently support generating efficient code
that detects integer arithmetic overflow, doesn't expose the
rounding mode of the machine for FP computation, and does not yet support
multiple return values, for example.
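
As a rough illustration of the first and last gaps listed above:
when the back end offers no overflow primitive, a front end has to emit
explicit comparisons around each addition, and it has to pack a
(value, flag) pair into a struct to get two results out of one call.
The sketch below is portable C, not LLVM or C-- code, and the names
CheckedInt and checked_add are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Two results packed into one struct, standing in for
   multiple return values. */
typedef struct {
    int value;       /* meaningful only when overflow is false */
    bool overflow;
} CheckedInt;

/* Signed addition with overflow detection done by comparing against
   the limits before adding, so no undefined behaviour is triggered. */
CheckedInt checked_add(int a, int b) {
    CheckedInt r = {0, false};
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        r.overflow = true;
    } else {
        r.value = a + b;
    }
    return r;
}
```

A back end that exposes the machine's overflow flag could replace the
two comparisons with a single add followed by a conditional branch.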

While it is missing some minor features, one of the most important
features of LLVM is that it is relatively easy to extend and modify. For
example, right now LLVM's integer type system consists of signed and
unsigned integers of 1/8/16/32 and 64-bits. Soon, signedness will be
eliminated (giving us the equivalent of C--'s bits8/bits16/bits32/bits64
integer types) and after that, we plan to generalize the integer types to
allow any width (e.g. bits11). This is intended to provide better support
for people using LLVM for hardware synthesis, but is also useful for
precisely constrained types like those in Ada (i.e. it communicates value
ranges to the optimizer better).

> I think the three new things I'd like to see out of C-- are (in rough
> order of priority):
> 1) x86-64 support
> 2) the ability to move/copy a stack frame from one stack to another, and
> 3) Some form of inline assembler without having to go to C (necessary for
> writing threading primitives in C--)

LLVM provides #1 and #3 'out of the box'. #2 requires runtime
interaction, which would be developed as part of the runtime aspect.

For me, one disappointment of the LLVM project so far is that we have not
been very successful engaging the functional language community. We have
people that use LLVM as "just another C/C++/Objc compiler", we have people
that reuse the extant front-ends and optimizer to target their crazy new
architectures, and we have mostly-imperative language people (e.g. python)
using LLVM as an optimizer and code generator. If we had a few
knowledgeable people who wanted to see support for functional languages
excel, I believe LLVM could become the premier host for the functional
community.

If you are considering developing aggressive new languages, I strongly
recommend you check out LLVM. The llvmdev mailing list
is a great place to ask questions.
2006
> For me, one disappointment of the LLVM project so far
> is that we have not been very successful engaging the
> functional language community.

If you want to engage functional programmers,
you're not publishing in the right places.
PLDI (Programming Language Design and Implementation)
gave up on functional programming long ago,
and therefore
many functional programmers
no longer pay much attention to PLDI.

. one of the largest stumbling blocks for the industry adoption of
languages like Haskell and c--
is the fact that they still market themselves as
some mathematics/computer science professor's little experimental project.
I feel C-- still suffers a bit from "professor's pet project" syndrome .

> - GCC: Still quite complicated to work with, still requires you to write
> your compiler in C. Implementing a decent type system is going to be
> interesting enough in Ocaml or Haskell, I'll pass on doing that in C.
> Which means a hybrid compiler, with a lot more complexity. Also,
> functional languages are definitely still second class citizens in GCC
> world- things like tail call optimization are still not where they need to
> be. Which means implementing an optimization layer above GCC to deal with
> tail calls. Plus you still have all the run time library issues you need
> to deal with- you still need to write a GC, exception handlers, threading,
> etc. On the plus side, you do get a lot of fancy optimizations- SSE use,
> etc.
> Where functional programming really shines, I think,
> is programming in the large- word processors and CAD/CAM systems etc.
> It's when you start dealing with things like maintenance
> and large scale reuse and multithreading that
> functional programming really spreads its wings and flies.
> And, unlike scripting/web programming, performance really does matter.

>
> - Use C as a back-end. You're writing your own runtime again, tail
> recursion is poorly supported again, and a lot of functional programming
> constructs don't map well to C.
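
A sketch of what "tail recursion is poorly supported" costs when C is
the back end: since standard C does not guarantee tail-call elimination,
compilers that target C often fall back on a trampoline, where each
function returns the next function to run instead of calling it.
This is a generic illustration, not taken from any particular compiler,
and State, Step, is_even, is_odd and run_is_even are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State threaded between the steps of the compiled program. */
typedef struct {
    unsigned long n;
    bool result;
} State;

/* A step returns the next step to run, or a NULL fn when finished.
   Wrapping the function pointer in a struct lets the type name itself. */
typedef struct Step Step;
struct Step {
    Step (*fn)(State *);
};

Step is_even(State *s);
Step is_odd(State *s);

Step is_even(State *s) {
    if (s->n == 0) { s->result = true; return (Step){NULL}; }
    s->n--;
    return (Step){is_odd};       /* "tail call" is_odd by returning it */
}

Step is_odd(State *s) {
    if (s->n == 0) { s->result = false; return (Step){NULL}; }
    s->n--;
    return (Step){is_even};      /* "tail call" is_even by returning it */
}

/* The trampoline: a loop that keeps invoking whichever step it is
   handed, so the C stack stays at constant depth. */
bool run_is_even(unsigned long n) {
    State s = {n, false};
    Step next = {is_even};
    while (next.fn != NULL) {
        next = next.fn(&s);
    }
    return s.result;
}
```

run_is_even(1000000) bounces a million times through the loop without
growing the stack, but every jump now costs an indirect call and a
return, which is the kind of overhead a target with real tail calls avoids.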

> - Use C--. You still have to implement your runtime, but you're basically
> going to have to do that anyways. You get decent optimization, you get to
> write your compiler in the language you want to, and functional languages
> are first class languages.
>
> Of these options, I think C-- (assuming it's not a dead project) is the
> best of the lot. Even if it needs some work (an x86-64 back end, the
> ability to move a stack frame from one stack to another), it'll be no more
> work than any other option. My second choice would be GCC as a back end,
> I think. But the point here is that the fundamental niche C-- fills is
> still useful and needed.
>

LLVM is very C-ish,
and makes it rather awkward to have
procedure environments and gotos out of procedures.

Oct 2008 01:45:11 -0700
| Most of our users have reported that it is very easy to adapt a legacy
| compiler to generate C-- code, but nobody has been willing to attempt
| to adapt a legacy run-time system to work with the C-- run-time interface.

I don't know whether this'll be any use to anyone except us,
but we're using C-- like crazy inside GHC (the Glasgow Haskell Compiler).
But not as an arms-length language.
Instead,
inside GHC's compilation pipeline we use C-- as an internal data type;
and after this summer's work by John Dias,
we now have quite a respectable
story on transforming,
and framework for optimizing,
this C-- code.
Since some of the runtime system is written in C--,
we also have a route for parsing C-- and compiling it down the same pipeline.
All that said,
this is a *GHC specific* variant of C--.
It does not support the full generality of C--'s runtime interface
(it is specific to GHC's RTS), nor is it intended as a full C-- implementation.
In its present state it's not usable as a standalone C-- compiler.
Still, it is a live, actively-developed implementation
of something close to C--, and so might be of interest to some.

The OCaml Journal has published around 40 articles now. The most popular and
third most popular articles are both about LLVM. So I don't think it is
correct to say that "functional language people don't like LLVM". Indeed, I
thought I was a kook for trying to write a compiler for a functional language
using LLVM until I mentioned it to the OCaml community and half a dozen
people stepped forward with their own alternatives. :-)