5.1: news.addm/c--/haskell won't be using LLVM:
In this post I will elaborate on why some people think C-- has more promise than LLVM as a substrate for lazy, functional languages.

Let me start by making one thing clear: LLVM does have support for garbage collectors. I am not disputing that. However, as Henderson has shown, so does C and every other language. The question we have to ask is not "Does this environment support garbage collection?" but rather "How efficiently does this environment support garbage collection?".

To recap, Henderson's technique involves placing root pointers (the set of pointers which can be followed to find all live data) on a shadow stack. Since we manage this stack ourselves, it shouldn't be a problem for the GC to walk it. In short, each heap allocation incurs an unnecessary stack allocation and heap pointers are never stored in registers for long.

Now what does this mean for languages like Haskell? Well, unlike programs written in more traditional languages, a Haskell application might very well do between 10 and 20 million heap allocations per second. Writing Haskell programs is more about producing the correct data stream than it is about performing the right side-effects. It's common for functions in Haskell to manipulate data without executing any side-effects. (Think spreadsheets.) This way of computing obviously requires a very cheap method of allocation. Performing 10 million unnecessary stack allocations per second would severely hurt performance, and not having heap pointers in registers could easily be equally devastating.

So what about LLVM? Shouldn't the built-in GC support in LLVM be more efficient than any cheap hack? Well, it turns out it isn't. The conflict between garbage collection and optimizations hasn't changed, and neither has the solution: disabling or bypassing optimizations. This in turn means unnecessary stack allocations and sub-optimal use of registers.

That LLVM'ers haven't solved the problem of zero-overhead garbage collection isn't too surprising. Solving this while staying agnostic of the data model is an open question in computer science. It is here C-- differs from LLVM. C-- is a research project that aims at solving difficult problems such as supporting efficient GCs and cheap concurrency. LLVM, on the other hand, is an engineering project.

In conclusion: garbage collection in LLVM incurs unacceptable overhead, and while C-- and LLVM do have some overlap, the problems they're trying to solve are quite different.

Posted by David Himmelstrup at 11:52 AM
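The shadow-stack idea recapped in the post above can be modelled in a few lines. This is only a toy sketch in Python, not Henderson's actual C transformation and not anything LLVM or C-- ships: the `Heap` class, its method names, and the dict-based objects are all made up for illustration. What it shows is the cost the post complains about: every pointer that must survive a collection needs an extra explicit push onto a stack the program manages itself.

```python
class Heap:
    """Toy heap with an explicit shadow stack of roots (hypothetical model)."""

    def __init__(self):
        self.objects = []       # every allocated object that has not been swept
        self.shadow_stack = []  # root pointers, pushed and popped by the program

    def alloc(self, *children):
        # An object is just a dict holding references to other objects.
        obj = {"children": list(children), "marked": False}
        self.objects.append(obj)
        return obj

    def push_root(self, obj):
        # The extra stack write that the shadow-stack scheme requires.
        self.shadow_stack.append(obj)

    def pop_roots(self, n):
        # Drop the n most recently pushed roots (n may be 0).
        del self.shadow_stack[len(self.shadow_stack) - n:]

    def collect(self):
        # Mark: walk everything reachable from the shadow stack.
        worklist = list(self.shadow_stack)
        while worklist:
            obj = worklist.pop()
            if not obj["marked"]:
                obj["marked"] = True
                worklist.extend(obj["children"])
        # Sweep: keep marked objects, count the rest as freed.
        live = [o for o in self.objects if o["marked"]]
        freed = len(self.objects) - len(live)
        for o in live:
            o["marked"] = False
        self.objects = live
        return freed


heap = Heap()
leaf = heap.alloc()
heap.push_root(leaf)          # one push per pointer we want to keep alive
pair = heap.alloc(leaf)
heap.push_root(pair)
heap.alloc()                  # never rooted, so it is garbage
freed_first = heap.collect()  # only the unrooted object is reclaimed
heap.pop_roots(2)             # "function return": roots go away
freed_second = heap.collect() # now leaf and pair are reclaimed too
```

In a real compiler the pushes would be stores into a stack frame rather than list appends, but the bookkeeping per live pointer is the same in spirit.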
5.2: co.addm/stackoverflow.com/llvm vs c--:
I've been excited about llvm being
low enough to model any system
and saw it as promising
that Apple was adopting it,
but then again
they don't specifically support Haskell,
and some think that Haskell
would be better off with c--
adding that there's
nothing llvm can do to improve .
> That LLVM'ers haven't solved the problem of zero-overhead garbage collection
> isn't too surprising .
> Solving this while staying agnostic of the data model
> is an open question in computer science.
I am referring to
5.9: answer accepted:
Well, there is a project at UNSW to translate GHC Core to LLVM

Remember: it wasn't clear 10 years ago that LLVM would build up all the infrastructure C-- wasn't able to. Unfortunately, LLVM has the infrastructure for portable, optimized code, but not the infrastructure for nice high level language support, that C-- ha(s)d.

An interesting project would be to target LLVM from C-- ..
comment to answer:
. great answer; that was
just the blindspot-undo I was looking for!
. llvm'ers had a similar response
to the lack of concurrency support:
it's an add-on library thing .
. c-- can be ported to llvm,
meaning that llvm's gc simply won't be used .
11.9: web.adda/c--/review:
C-- is a compiler-target language. The idea is that a compiler for a high-level language translates programs into C--, leaving the C-- compiler to generate native code.

C--'s major goals are these:

C-- is not "write-once, run-anywhere". It conceals most architecture-specific details, such as the number of registers, but it exposes some. In particular, C-- exposes the word size, byte order, and alignment properties of the target architecture, for two reasons. First, to hide these details would require introducing a great deal of complexity, inefficiency, or both, especially when the front-end compiler needs to control the representation of its high-level data types. Second, these details are easy to handle in a front-end compiler. Indeed, a compiler may benefit, because it can do address arithmetic using integers instead of symbolic constants such as FloatSize and IntSize.
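To make the remark about address arithmetic concrete, here is a rough sketch of what a front end can do once the target's word size and alignment are known: it computes field offsets as plain integers. The code is Python rather than C--, and everything in it is an assumption for illustration: the 4-byte word, the helper names `align_up` and `layout`, and the record fields are hypothetical, not taken from the C-- specification.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(fields):
    """Assign byte offsets to fields given as (name, size, alignment) tuples.

    Returns (offsets, total_size), where total_size is padded to the
    largest alignment so that arrays of the record stay aligned.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size, alignment in fields:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
        max_align = max(max_align, alignment)
    return offsets, align_up(offset, max_align)


WORD = 4  # assumed word size in bytes (a 32-bit target)

# Hypothetical heap object: header word, 1-byte tag, 8-byte float, pointer.
record = [
    ("header", WORD, WORD),
    ("tag", 1, 1),
    ("value", 8, 8),
    ("next", WORD, WORD),
]
offsets, size = layout(record)
# offsets == {"header": 0, "tag": 4, "value": 8, "next": 16}, size == 24
```

Because the offsets are ordinary integers, the front end can fold them into loads and stores directly, which is the benefit the review describes over carrying symbolic constants through the back end.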
web.adda/what do the c-- folks think of llvm?
summary:
. why isn't the llvm project working for c-- users?
llvm makes the assumption that there exists a generic assembler,
and c--, by assuming otherwise,
is not about portability:
the current version targets only the intel'86 architecture .
I do not understand the assertion that LLVM is uncooperative. The direction LLVM takes is driven entirely by contributors. I suggest you embrace this and implement the necessary GC support in LLVM. The devs would likely be happy to help out with any problems; the team is *very* helpful.

Furthermore, that support would open the door to implementing other similar functional languages in LLVM, rather than making more isolated code islands. In the long run, LHC will win *big* by having that same code used by others (and tested, and expanded.)

There are many things for which it is reasonable to have NIH (Not Invented Here syndrome). In 2009, a fast code generator is not one of them.
David Himmelstrup said...
It's unsolved in the academic sense of the word. Solving it requires research and not engineering. If I knew how to solve it, I definitely would add it to LLVM. It's only unsolved in the general case. I doubt, however, that LLVM is interested in my specific data model (which is in a state of flux, even).
what I want to do can't yet be done by any general-purpose compiler.
Chris Lattner
Sun, 17 Dec 2006 12:45:42 -0800
LLVM is written in C++, but, like C--, it provides first-class support for intermediate representation written as a text file (described here: http://llvm.org/docs/LangRef.html), which allows you to write your compiler in the language that makes the most sense for you.

In addition to the feature set of C--, LLVM provides several useful pieces of infrastructure: a C/C++/ObjC front-end based on GCC 4, JIT support, aggressive scalar, vector (SIMD), data layout, and interprocedural optimizations, support for X86/X86-64/PPC32/PPC64/Sparc/IA-64/Alpha and others, far better codegen than C--, etc. Further, LLVM has a vibrant community, active development, large organizations using and contributing to it (e.g. Apple), and it is an 'industrial strength' tool, so you don't spend the majority of your time fighting or working around our bugs :).

Like C--, LLVM doesn't provide a runtime (beyond libc :) ), which can be a good thing or a bad thing depending on your language (forcing you to use a specific runtime is bad IMHO). I would like to see someone develop a runtime to support common functional languages out of the box better (which language designers could optionally use), but no-one has done so yet.

OTOH, C-- does have some features that LLVM does not yet have first class support for. LLVM does not currently support generating efficient code that detects integer arithmetic overflow, doesn't expose the rounding mode of the machine for FP computation, and does not yet support multiple return values, for example.

While it is missing some minor features, one of the most important features of LLVM is that it is relatively easy to extend and modify. For example, right now LLVM's integer type system consists of signed and unsigned integers of 1/8/16/32 and 64-bits. Soon, signedness will be eliminated (giving us the equivalent of C--'s bits8/bits16/bits32/bits64 integer types) and after that, we plan to generalize the integer types to allow any width (e.g. bits11). This is intended to provide better support for people using LLVM for hardware synthesis, but is also useful for precisely constrained types like those in Ada (i.e. it communicates value ranges to the optimizer better).

> I think the three new things I'd like to see out of C-- are (in rough
> order of priority):
> 1) x86-64 support
> 2) the ability to move/copy a stack frame from one stack to another, and
> 3) Some form of inline assembler without having to go to C (necessary for
> writing threading primitives in C--)

LLVM provides #1 and #3 'out of the box'. #2 requires runtime interaction, which would be developed as part of the runtime aspect.

For me, one disappointment of the LLVM project so far is that we have not been very successful engaging the functional language community. We have people that use LLVM as "just another C/C++/Objc compiler", we have people that reuse the extant front-ends and optimizer to target their crazy new architectures, and we have mostly-imperative language people (e.g. python) using LLVM as an optimizer and code generator. If we had a few knowledgeable people who wanted to see support for functional languages excel, I believe LLVM could become the premier host for the functional community.

If you are considering developing aggressive new languages, I strongly recommend you check out LLVM. The llvmdev mailing list is a great place to ask questions.
2006
> For me, one disappointment of the LLVM project so far
> is that we have not been very successful engaging the
> functional language community.
If you want to engage functional programmers,
you're not publishing in the right places.
PLDI (Programming Language Design and Implementation)
gave up on functional programming long ago,
and therefore
many functional programmers
no longer pay much attention to PLDI.
. the largest stumbling block for the industry adoption of
languages like Haskell and c--
is the fact that they still market themselves as
some mathematics/computer science professor's little experimental project.
I feel C-- still suffers a bit from "professor's pet project" syndrome .
> - GCC: Still quite complicated to work with, still requires you to write
> your compiler in C. Implementing a decent type system is going to be
> interesting enough in Ocaml or Haskell, I'll pass on doing that in C.
> Which means a hybrid compiler, with a lot more complexity. Also,
> functional languages are definitely still second class citizens in GCC
> world- things like tail call optimization are still not where they need to
> be. Which means implementing an optimization layer above GCC to deal with
> tail calls. Plus you still have all the run time library issues you need
> to deal with- you still need to write a GC, exception handlers, threading,
> etc. On the plus side, you do get a lot of fancy optimizations- SSE use,
> etc.
Where functional programming really shines, I think,
is programming in the large- word processors and CAD/CAM systems etc.
It's when you start dealing with things like maintenance
and large scale reuse and multithreading that
> functional programming really spreads its wings and flies.
And, unlike scripting/web programming, performance really does matter.
>
> - Use C as a back-end. You're writing your own runtime again, tail
> recursion is poorly supported again, and a lot of functional programming
> constructs don't map well to C.
> - Use C--. You still have to implement your runtime, but you're basically
> going to have to do that anyways. You get decent optimization, you get to
> write your compiler in the language you want to, and functional languages
> are first class languages.
>
> Of these options, I think C-- (assuming it's not a dead project) is the
> best of the lot. Even if it needs some work (an x86-64 back end, the
> ability to move a stack frame from one stack to another), it'll be no more
> work than any other option. My second choice would be GCC as a back end,
> I think. But the point here is that the fundamental niche C-- fills is
> still useful and needed.
>
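The complaint above that tail calls are poorly supported when targeting C or GCC is usually worked around with a trampoline: instead of making a tail call, a function returns a thunk describing the next call, and a driver loop runs thunks until a plain value comes back. The sketch below is a generic illustration in Python, not code from any of the compilers discussed; `trampoline` and `count_down` are hypothetical names, and the scheme assumes final results are never themselves callable.

```python
def trampoline(step):
    """Run thunks until a non-callable result appears.

    Each bounce returns to this loop, so the call stack never grows,
    at the price of one closure allocation per "tail call".
    """
    while callable(step):
        step = step()
    return step


def count_down(n, acc=0):
    # A tail-recursive loop written in trampolined style:
    # the tail call is returned as a thunk, not performed.
    if n == 0:
        return acc
    return lambda: count_down(n - 1, acc + 1)


# Far deeper than Python's default recursion limit would allow directly.
result = trampoline(count_down(100_000))
```

The extra allocation and indirect call on every bounce is the kind of overhead that makes a back end with first-class tail calls attractive for functional languages.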
LLVM is very C-ish,
and makes it rather awkward to have
procedure environments and goto's out of procedures
Oct 2008 01:45:11 -0700
| Most of our users have reported that it is very easy to adapt a legacy
| compiler to generate C-- code, but nobody has been willing to attempt
| to adapt a legacy run-time system to work with the C-- run-time interface.
I don't know whether this'll be any use to anyone except us,
but we're using C-- like crazy inside GHC (the Glasgow Haskell Compiler).
But not as an arms-length language.
Instead,
inside GHC's compilation pipeline we use C-- as an internal data type;
and after this summer's work by John Dias,
we now have quite a respectable
story on transforming,
and framework for optimizing,
this C-- code.
Since some of the runtime system is written in C--,
we also have a route for parsing C-- and compiling it down the same pipeline.
All that said,
this is a *GHC specific* variant of C--.
It does not support the full generality of C--'s runtime interface
(it is specific to GHC's RTS), nor is it intended as a full C-- implementation.
In its present state it's not usable as a standalone C-- compiler.
Still, it is a live, actively-developed implementation
of something close to C--, and so might be of interest to some.
The OCaml Journal has published around 40 articles now. The most popular and third most popular articles are both about LLVM. So I don't think it is correct to say that "functional language people don't like LLVM". Indeed, I thought I was a kook for trying to write a compiler for a functional language using LLVM until I mentioned it to the OCaml community and half a dozen people stepped forward with their own alternatives. :-)