2009-12-29

bitc

10.4: web.adda/bitc/validating c as assembler code:

. reviewing the progress of gnu`hurd,
hurd/ng references shapiro's coyotos project,
which is working on bitc,
whose mailing list seems to be dwindling:

. its archives show some interesting problems back in 2005:

. recent bits:
Jonathan S. Shapiro Apr 7 2009:
. several people have asked what all of this means for Coyotos.
Active work on Coyotos stopped several months ago,
and is unlikely to resume.
I am debating whether to re-license it under BSD,
but other than that, I have no current plans to continue the Coyotos work.
Getting BitC v1 done is already pushing the limits of what is feasible
pretty severely.
Sat Apr 25 13:15:24 EDT 2009:
We looked at C-- early on and decided not to go that way.
C-- has come a long way since then,
but the time for that option has passed.
Shapiro Feb 10 2009:
At a certain point the complexity of the compiler got big enough
that Python was no longer the right tool.
We went to C++ rather than C because we wanted to use smart pointers.
In hindsight, we should have stuck with C.
2008.04 paper on bitc`origins:

what we didn't know then:
It is possible to program in something else (typically the prover)
and treat a small subset of a particular C implementation
as a well-defined assembly language,
as the L4.verified project has since done:
Gerwin Klein, Michael Norrish, Kevin Elphinstone and Gernot Heiser.
``Verifying a High-Performance Micro-Kernel.''
7th Annual High-Confidence Software and Systems Conference,
Baltimore, MD, USA, May 2007.

. whatever its merits, ada is a dying language ...
it does not effectively exploit the advances in programming language theory
that have been made over the last two decades.


In ML, the use of state in the language is very carefully constrained to
simplify the core language semantics.
When state is introduced generally,
the language is suddenly forced to incorporate a rich, first-class semantics
of locations into both the core semantics
and the core type system.
When let-polymorphism is present, adding core mutability
threatens the ability of the type inference engine
to generate principal types.
We wanted type inference, because Shapiro had witnessed
an unending stream of examples in UNIX
where failures to keep types synchronized across a code base led to errors,
and because
useable abstraction is hard to get in this class of language
without both type variables and type inference.
Being blissfully ignorant (at least when we started) of formal type theory,
we thought all of this looked hard but straightforward,
which describes what kernel researchers do pretty much all the time:
navigate hard engineering compromises.
In fact, we soon learned that nobody had ever discovered
a sound and complete type system incorporating at the same time
both polymorphism and general mutability.
[eg: naively generalizing the type of a mutable cell
would let a value stored at one type be read back at another . ]

Because of its propensity for heap allocation,
extracting any sort of decent performance from an ML program
requires a level of optimization that is well beyond aggressive.
We needed a language that could preserve the illusion that
``what you see is what you get.''
This, among other issues, drove us to choose
eager rather than lazy evaluation.
object-oriented features actively got in the way of
understanding what was going on.
However,
we needed the operator ``+'' to operate over arbitrary integer types.
Haskell presented a ready-made solution: type classes.
These simultaneously allowed us to generalize operators
and introduce a form of overloading into the language.
But type classes raise a problem that we will come back to later.
They introduce overloading and matching ambiguities that need to be addressed:
given two valid choices of specialization, which to choose?
In consequence, they have an unfortunate tendency to break
separate compilation schemes.
Another issue with type classes is that the major implementations
all violate our code transparency objective.
The dictionary-based implementation technique
is not well-suited to languages having value types of multiple sizes;
we wanted an implementation that operated more in the style of C++ templates
-- not least because of the importance of linkage compatibility with C.
This would turn out to be the most problematic feature of the language.
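
. eg, a sketch in C of why dictionary-passing fights
value types of multiple sizes
(AddDict, add_i32, and twice are my invented names,
not bitc's implementation):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* a dictionary for a hypothetical Add type class: one shared
       code body, so values must travel as opaque pointers, because
       an int32_t and an int64_t have different sizes */
    typedef struct {
        size_t size;
        void (*add)(void *out, const void *a, const void *b);
    } AddDict;

    static void add_i32(void *out, const void *a, const void *b) {
        int32_t r = *(const int32_t *)a + *(const int32_t *)b;
        memcpy(out, &r, sizeof r);
    }

    static const AddDict add_i32_dict = { sizeof(int32_t), add_i32 };

    /* generic code in dictionary-passing style: every value is
       forced through memory and an indirect call */
    void twice(const AddDict *d, void *out, const void *x) {
        d->add(out, x, x);
        /* usage: int32_t out, x = 21; twice(&add_i32_dict, &out, &x); */
    }

. a template-style implementation instead emits a specialized
twice_i32 taking a plain int32_t,
which keeps values in registers and links cleanly against C;
that is the style bitc wanted .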


In C, it is possible to have simultaneously a pointer to constant X
and a pointer to X,
both of which reference the same location.
This means that the compiler must be conservative when procedure calls are made.
In general, it cannot assume that supposedly constant objects
are unable to change.
In the interests of optimization, the ANSI C standard [1]
actually permits the compiler to make exactly this assumption.
Since there is no way for a C programmer to check whether
they have complied with the expectation of the compiler,
this is an invitation to error.
Since the result of computation depends on the
optimization decisions of particular language implementations,
bugs of this sort cannot be reliably eliminated through testing.
In BitC,
we decided very early to implement the notion of
immutability rather than constness.
A BitC location that is declared to have immutable type
cannot be modified through any reference.
In C, the ``const'' in const char is a type qualifier that can be stripped away.
In BitC it is an integral part of the type.
Actually, there wasn't any const type constructor in early BitC.
Variables and fields were constant unless declared mutable.
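
. eg, a minimal C sketch of the aliasing trap
(mine, not the paper's):

    #include <stdio.h>

    /* p is a pointer to constant int, q a plain pointer to int;
       both may legally reference the same location */
    int f(const int *p, int *q) {
        int a = *p;
        *q = a + 1;     /* if p == q, this changes the "constant" *p */
        return *p - a;  /* a conservative compiler must reload *p here */
    }

    int main(void) {
        int x = 5;
        printf("%d\n", f(&x, &x));  /* prints 1: *p changed underneath us */
        return 0;
    }

. a compiler that assumed *p cannot change
would return 0 instead,
and the programmer has no way to check which behavior they got;
bitc's immutable types rule the aliasing out
rather than hoping the assumption holds .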

10.5:

The problems with module systems in stateful languages
are all about global variable initialization order.
. two language features that create problems for initialization order:
external declarations and assignment.
External declarations intentionally allow global identifiers
to be used before they are defined in ways that cannot be checked
when modules are compiled separately.
Assignment can cause initialized values to change between uses.
This either leads to a much finer dependency ordering on initializers
or to throwing up your hands and leaving ordering of effects during initialization undefined.
That may be type safe, but you get no guarantee about
what values global variables may hold when main() begins to run.
From a verification perspective this is problematic.
It also makes the operation of a macro system
(which we intend to add in the future) undefined.
Well-formed programs invariably satisfy an implicit
lattice-structured dependency relationship for their initializers
(or at least don't get caught violating one).
The problem is that the ordering built by the compiler
is dictated by computational dependency,
while the organization of the program
required for manageability (the module relationships)
is dictated by conceptual relationships that exist in the minds of the developers.
The problem of initialization order is to satisfy both constraints simultaneously.
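
. eg, a C sketch of the external-declaration hazard
(modA, modB, and the init functions are hypothetical):

    /* modA.c */
    extern int b_limit;  /* external declaration: usable here before
                            anything guarantees modB initialized it */
    int a_size;
    void modA_init(void) { a_size = b_limit * 2; }

    /* modB.c */
    int b_limit;
    void modB_init(void) { b_limit = 64; }

    /* unless some convention runs modB_init before modA_init,
       a_size is computed from a b_limit that is still 0 */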

In BitC, we solved the initialization ordering problem
by declaring that interfaces are initialized first according to
the lattice defined by their import dependencies.
Within an interface, initialization proceeds from top to bottom,
and use of undefined forward references is prohibited.
It is source units of compilation that present the problem.

A BitC interface can declare a procedure that is implemented by
some exporting source unit of compilation.
In the absence of this feature,
we could declare that source units got initialized last in unspecified order.
Because source units can only export their identifiers through interfaces,
there cannot be further ordering dependencies
once the interface units have initialized in a defined order.

In the presence of export, this doesn't quite work.
What we do instead is to require that
any definition that is exported by a source unit
may rely (transitively) only on those symbols
that were in scope at the point of declaration in the interface.
The provided definition can rely on
other procedures and variables in the source unit,
and initialization proceeds as if all of those procedures and variables
had been temporarily copied into the interface unit of compilation.
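
. eg, a sketch of how a compiler might emit that ordering in C
(the init_* names are invented):

    /* ifA imports ifB, so the import lattice runs ifB first;
       within each interface, initializers run top to bottom */
    void init_ifB(void);
    void init_ifA(void);  /* acts as if the source-unit definitions
                             that ifA exports were copied into ifA */

    void bitc_runtime_init(void) {
        init_ifB();
        init_ifA();
        /* remaining source units may then initialize in any order:
           their identifiers are reachable only through the already
           initialized interfaces */
    }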

But what about assignment and side effects?
And what if some of the procedures executed during initialization
get run multiple times?

9.2 Initialization vs. State

. permitting assignment during initialization is problematic.
Empirically, it isn't enough to require that
initializers be designed sensibly by the programmer
-- in the presence of shared libraries this requirement is nearly unachievable,
and the problem becomes more challenging in a higher-order programming language.
We cannot preclude the use of set! in initializers altogether;
that would be too strong, since it would prevent any occurrence of set!
even if the containing procedure is not called at initialization time.
And strictly speaking, we can allow assignments in initializers
as long as the results of those assignments do not escape.

The position taken in BitC is that initializers must be ``pure,''
meaning that they may not modify any globally reachable values.
Implementing this forced us, with considerable hesitation,
to adopt an effect type system (Section 10).
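. eg, rendering the purity rule in C terms
(my sketch, not bitc syntax):

    int counter;  /* some other globally reachable variable */

    /* ok under the rule: assignments touch only locals; writing
       a_limit's own cell stands in for returning its initial value */
    int a_limit;
    void init_a_limit(void) {
        int acc = 1;
        for (int i = 0; i < 10; i++) acc *= 2;
        a_limit = acc;
    }

    /* rejected by the effect check: the write to counter escapes,
       so the result depends on how many times and in what order
       the initializers happen to run */
    int b_id;
    void init_b_id(void) {
        counter = counter + 1;
        b_id = counter;
    }
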
10.5: adda/bitc/verifiable and modern systems prog' lang:
. after reading about the origins and req's of bitc
I was worried about the job ahead for designing adda .
. the storm shelter is to be found in modularity:
to be safe,
the system need only confine systems prog'ing to a module;
and within that module the system doesn't have to be safe:
it's ok if a module fails
so long as the failure doesn't bedevil other modules .
. for systems prog'ing with adda
we need to see how adda maps directly to c
while still staying true to the adda syntax .
efficiency:
. one paper points out why efficiency matters:
systems prog'ing is responsible for the energy use of data centers,
and much of that energy goes to cooling .
. the more cycles we use, the greater the a/c bill .
. a major waste of energy is copying data that could be shared,
and there are 2 routes:
the complicated way is to try providing safe sharing;
while the easy way is to let systems prog'ing
happen in a separate, background process,
where any freeze-up's won't bedevil the safe parts .
[10.10: ie,
the most important thing for prototyping
is to help the user be an effective app'programmer
where worrying about efficiency can happen later . ]


10.5: bk.adda/Why Systems Programmers Still Use C:
. Singularity may obviate the cap'based coyotos .
. the Singularity project is probably the most serious challenge
to the current Coyotos work in the secure embedded system space.
There are some denial of resource issues
that are inherent in the Singularity communications layer.
These are fixable, and we gave serious thought to
replacing the Coyotos system with a structurally similar system
built on Singularity-like foundations.
In the commercial Coyotos effort,
we reluctantly concluded that the schedule risk
to our current commitments was too high,
but we think that there is an interesting research challenge here:
enhance the linear type approach of Singularity messages
with complete storage accountability during message exchange.
See if a system can be successfully structured
using language-based mechanisms when abandoning the assumption
that there is an infinite pool of storage .
. if linear types prove to be a human-manageable mechanism
for handling interprocess communication,
it may really be true that
the era of hardware-based isolation is over .
Most of the performance unpredictability issues of garbage collection
become irrelevant if collection occurs over a small heap.
By increasing messaging performance,
Singularity permits applications to be split into
much smaller, independently collected domains.
Robustness is simultaneously improved.


Swaroop Sridhar, Jonathan S. Shapiro, Scott F. Smith



10.6: todo.adda/advanced typing systems:
. type systems of ml and haskell origins?
and see downloaded papers .


