2010-11-14

programming in-the-large with c

8.5: news.adda/translate/info'hiding n type-safe linking:

2007 CMod Modular Information Hiding
and Type Safe Linking for C

CMod provides a sound module system for C;
it works by enforcing a set of rules that are
based on principles of modular reasoning
and on current programming practice.

CMod's rules safely enforce the convention
that .h header files are module interfaces
and .c source files are module implementations.
Although this convention is well known,
developing CMod's rules revealed many subtleties
to applying the basic pattern correctly.
CMod's rules have been proven formally
to enforce both information hiding
and type-safe linking.

IEEE transactions on software engineering, vol.34, no.3, 2008

C’s weak type system still allows programmers to
violate type safety in other ways,
e.g., by using unchecked casts.
Other work, notably CCured and Deputy
can be used to strengthen C’s type system
to eliminate these problems.
CMOD complements these efforts;
its type-safe linking guarantee
can be viewed as extending
whatever type safety is provided for
a single C module
to that of the entire program .

Related approaches either require
linguistic extensions (e.g.,
Knit: Component Composition for Systems Software,
Koala Component Model for Consumer Electronics Software
) or
do enforce type-safe linking,
but not information hiding (e.g.,
CIL: Intermediate Language and Tools for
Analysis and Transformation of C Programs
and C++ “name mangling”).

Header Files as Interfaces:
. the key principle is that symbols
-- var's, functions, const'obj's --
are always shared via interfaces.
. a header (.h) acts as the interface to a body (.c);
clients #include the header to refer to body's symbols;
and, the body includes its own header
to make sure the types match in both places .
[11.13: formalism.section:
. if the body includes another .c file,
or some other non-(.h) file,
then the #include is considered to be an inline;
ie, it's a way of segmenting a module`body
into a number of files that the
preprocessor will merge together
just before the body's compilation .]
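
. a minimal sketch of that pattern, assuming a hypothetical module named counter:
the three parts would normally be separate files (counter.h, counter.c, client.c),
with both .c files doing #include "counter.h";
they are merged into one listing here only so the sketch compiles as a single file .

```c
/* ---- counter.h : the interface (hypothetical module) ---- */
#ifndef COUNTER_H
#define COUNTER_H
void counter_reset(void);
int  counter_next(void);    /* returns 1, 2, 3, ... after a reset */
#endif

/* ---- counter.c : the body; it would #include "counter.h",
   so the compiler checks these definitions against the prototypes ---- */
static int count;           /* private: not declared in counter.h */

void counter_reset(void) { count = 0; }
int  counter_next(void)  { return ++count; }

/* ---- client.c : would #include "counter.h" and nothing else ---- */
int client_take_two(void)
{
    counter_reset();
    counter_next();
    return counter_next();  /* second value after a reset */
}
```

. the client never names count; it can only reach it through the two functions the header declares .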

 Abstractly, a module interface I
is a set of {symbol, type} declarations.
a module implementation M
is a set of {symbol, type} definitions
exported by an interface,
as well as privates .
. when a client wishes to refer to
the definitions of module M,
it must do so through M’s interface.
Moreover, in most module systems,
the compiler ensures that each module
implements its interface,
[ie, everything declared in the interface (.h)
is defined in the implementation (.c)]
[11.13:
. the module is a compilation unit,
and facilitates separate compilation .]

The goal of CMOD is to define
a backward-compatible module system for C
that realizes the two key properties
that make modules safe and effective:

Property# Information Hiding:
-- clients must depend only on interfaces
rather than on particular implementations .

Suppose that M implements interface I:
. if symbol g is defined in M
but not declared in I,
then importers of M may not access g;
ie, M intends to hide what's not declared in I .
. an adt (abstract datatype)
is a collection of functions in I
that operate on some private obj's;
importers of I cannot access these private obj's .

. the info'hiding property
makes it easier to reason about
and reuse modules;
in particular, if a client successfully
compiles against interface I,
it can link against any module that
implements I.
Consequently, M may safely be changed
as long as it still implements I .

Property# Type-Safe Linking:
. linking must be type safe .
If M implements I,
and a client module N refers to
symbols in I,
with {M, N} individually type safe,
then linking M and N together
will also be type-safe .

Rule 1 ( symbols are shared
 only through headers):

Whenever a file uses an obj' declaration
which is then def'd by another file,
both files must include a header
that contains the decl' of that obj'.
[ all shared obj'def's will be a copy of
some header's decl'
along with an initialization .]
. the mapping of body's(.c) to headers(.h)
doesn't have to be 1-to-1:
# use a single header for several body files;
eg, a library offers one face for many modules .
# use multiple headers for a single body
to provide “public” and “private” views,
where the body file would include
both headers,
while clients would include
the public view;
and employees, the private view .

. type errors can occur when a body file
fails to include its own header.
. if an extern obj'decl' appears
directly in a client
rather than via a header,
it violates privacy
and may also break type-safe linking,
e.g., by assigning the wrong type to a var'.
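
. a sketch of the safe form of that sharing, with hypothetical names (limits_cfg.h, max_items):
the decl' lives in one header that both the defining file and the client include;
the hazard being avoided is described in the comment, not compiled .

```c
/* ---- limits_cfg.h (hypothetical): the only place the decl' lives ---- */
extern long max_items;

/* ---- limits_cfg.c : includes the header, then defines the obj';
   a mismatch between decl' and def' would be a compile error here ---- */
long max_items = 100000L;

/* ---- client.c : includes the header instead of writing
   "extern int max_items;" by hand; that hand-written line would compile,
   but no compiler would ever see it next to the real definition ---- */
long client_headroom(long used)
{
    return max_items - used;
}
```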

. declaring an obj' static
does enforce privacy
(because the linker ignores it);
however, it is often the case
that an obj' hidden from clients,
needs to be shared with employees .
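
. a sketch contrasting the two kinds of hiding, assuming a hypothetical log module with a public and a private header:
static hides is_blank from every other file,
while log_notes_seen is hidden from clients only by leaving it out of the public view .

```c
#include <stddef.h>

/* ---- log.h : public view (all names here are hypothetical) ---- */
void log_note(const char *msg);
int  log_count(void);

/* ---- log_priv.h : private view, included only by the module's own files ---- */
extern int log_notes_seen;          /* shared with employees, not with clients */

/* ---- log.c : includes both views ---- */
int log_notes_seen = 0;             /* definition of the private-view obj' */

static int is_blank(const char *s)  /* static: no other file can link to it */
{
    return s == NULL || *s == '\0';
}

void log_note(const char *msg)
{
    if (!is_blank(msg))
        log_notes_seen++;
}

/* ---- log_stats.c : an employee; includes log.h and log_priv.h ---- */
int log_count(void) { return log_notes_seen; }
```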

Rule 1 applies only to obj's, not types
which are covered by rule 2:

Rule 2 (Type Ownership).
Each type def' in the linked program
must be owned by exactly one body or header file.
[11.13: discussion.section:
. another interpretation of that rule
is that if there are 2 def's for a concrete type,
then a body using def#1
cannot be linked to a body using def#2 .
. they could still be used for making
separate programs that are then
reused via calls to the OS .]

the Type Ownership rule allows 2 headers:
# “private”.h reveals a type’s definition,
[ie, a concrete vs abstract type,
some particular native c type
(int, char, struct, ptr),
or some renaming of those by a typedef.stmt .]
# “public”.h provides the adt:
. implementors of the type's functions include both,
while clients include only the public header .

. CMOD's notion of type ownership
makes sense for a global namespace
in which type names have a single meaning
throughout a program.
. while a static notion for types is conceivable,
CMOD's stronger rule is
simple to implement, and popular .
[11.13: C assumes the unix way:
. C does have localized types
if you consider it within the unix context,
which encourages the use of small programs
doing one thing well
for subsequent composition with other programs .]

. type def' control is modeled after linkers
which require that only one file
define a particular shared decl' .
This ensures there is no ambiguity about
the definition of a given obj' during linking.
Likewise for concrete types,
we can require that there is
only one def' of a type being “linked against”
in the following sense:
We say that a concrete type def' is
owned by the file in which it appears.

# if the type def' occurs in a header file
(and,hence, is owned by the header),
then the type is transparent
and many modules may know its definition .
in which case,
“linking” occurs by including the header .

# if the type def' appears in a body file
(and, hence, is owned by that file),
then the type is abstract:
Only that module,
which implements the type’s functions,
should know its definition.
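
. a sketch of the abstract case, assuming a hypothetical stack module:
the header only names the type, and the one concrete def' is owned by the body;
the fixed capacity of 16 is an arbitrary choice for the example .

```c
#include <assert.h>
#include <stdlib.h>

/* ---- stack.h : the interface names the type but does not define it ---- */
typedef struct stack stack;          /* incomplete (abstract) type */
stack *stack_new(void);              /* returns NULL if allocation fails */
int    stack_push(stack *s, int v);  /* returns 0 on success, -1 if full */
int    stack_pop(stack *s);          /* caller must ensure it is non-empty */
int    stack_size(const stack *s);
void   stack_free(stack *s);

/* ---- stack.c : owns the one concrete definition ---- */
struct stack {
    int items[16];
    int top;
};

stack *stack_new(void) { return calloc(1, sizeof(stack)); }

int stack_push(stack *s, int v)
{
    if (s->top == 16)
        return -1;
    s->items[s->top++] = v;
    return 0;
}

int stack_pop(stack *s)
{
    assert(s->top > 0);              /* precondition stated in the interface */
    return s->items[--s->top];
}

int  stack_size(const stack *s) { return s->top; }
void stack_free(stack *s)       { free(s); }
```

. a client that includes only stack.h can hold a stack pointer and call these functions,
but s->top does not compile there, since the struct's members are unknown to it .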

Preprocessing and Header Files
Rules 1 and 2 form the core of CMOD’s
enforcement of information hiding
and type-safe linking.
However,
because of preprocessor complications,
we need rules {3,4} to enforce the
Principle of Consistent Interpretation:
for each header being used as an interface,
the post-processed header
must be identical wherever it is included.

Enforcing this principle allows us
to keep Rules 1 and 2 simple
and it makes it easier to reason about headers
since their meaning is less context-dependent
(though not entirely, as we discuss below).
This is the same principle underlying
proper use of precompiled headers
and, thus,
programs that adhere to CMOD’s rules
can also use such headers safely.
[11.13:
. another way to reduce pre-processor hazards
is to replace it with other techniques .
# macro's can be done at the adda level
(the HLL that is translated to C);
# conditional compilations can be replaced with
Software Config Mgt branches .]

Rule 3 (Proper Inclusion).
. idioms for safe header file inclusion:

# no vertically dependent interfaces:
. the interpretation of a header must be
the same no matter the order
in which it is included;
because, it's safer to not require coders
to remember an intended inclusion ordering .
. CMOD believes a better practice is
having only horizontal dependencies,
where each interface #includes
any interface it depends on .
[11.13:
. an example of vertical dependence:
config.c: --[. vertically, above is before ...]
#define DEBUG --[. this line is placed ]
#include "eval.c" --[ before this line .]
. this config file has set a debug flag,
and then eval.c has compilation conditional's
that will check for that flag .
. headers not meant to be interfaces
as in that example, can be vertically dependent .]
. another allowed use of vertical dependence
is where a config.h header is always
included before every other included header .
. the config.h typically contains
nothing but flags acting as parameters
compensating for variations in
the platform or requested features;
eg, config.h:
#ifndef _CONFIG_H
#define _CONFIG_H
#ifdef __BSD__
#undef COMPACT
#else
#define COMPACT
#endif
#endif
. CMOD checks that your config.h
is #included first before any other #includes .

# interfaces must ignore duplicate inclusions:
. duplicate #include's are made harmless
by the #ifndef pattern [ie,
#ifndef _interface_H
#define _interface_H, ]
which CMOD requires of every interface .

# interfaces must avoid inclusion cycles:
. the #ifndef pattern is essentially a kind of
self-dependence.
. in the presence of recursive inclusion,
self-dependencies can violate
the consistent interpretation principle .
(( These subtle dependencies could be the reason that
some coding style guides
encourage vertical dependencies
in lieu of horizontal ones .))
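
. a sketch of the first two idioms together, with hypothetical headers point.h and rect.h:
rect.h depends on point, so it would #include "point.h" itself (a horizontal dependency);
the text of point.h is pasted twice to mimic two inclusions,
and the #ifndef pattern makes the second copy vanish .

```c
/* ---- first inclusion of point.h (hypothetical) ---- */
#ifndef POINT_H
#define POINT_H
typedef struct { int x, y; } point;
#endif

/* ---- rect.h : includes what it depends on ---- */
#ifndef RECT_H
#define RECT_H

#ifndef POINT_H             /* second inclusion: guard already set, body skipped */
#define POINT_H
typedef struct { int x, y; } point;
#endif

typedef struct { point lo, hi; } rect;
int rect_area(rect r);
#endif

/* ---- rect.c ---- */
int rect_area(rect r)
{
    return (r.hi.x - r.lo.x) * (r.hi.y - r.lo.y);
}
```

. without the guard, the second typedef would declare point again with a different anonymous struct,
and the compiler would reject it .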

Rule 4 (Consistent Environment).
All files linked together
must be compiled in a consistent
preprocessor environment.
ie, for any pair of linked files
that depend on a preprocessor symbol
being #define'd or #undef'd,
that symbol must be {un-, defined} identically
in the initial preprocessor environments
for each file .
Pre-processing each module
using the same state for all #define's
ensures that all of its included headers
(which,by Rule 3, are not vertically dependent)
are interpreted the same way everywhere .
[11.13:
. this rule is concerned with
combining conditional compilation
with undefined parameters,
such as the OS type and version .
. parameters can serve many other purposes,
and this rule points out that
any such parameters
should not vary across modules .]
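
. a sketch of what the rule guards against, reusing the COMPACT flag from the config.h example;
coord.h and its functions are hypothetical .

```c
/* ---- coord.h (hypothetical): its meaning depends on COMPACT ---- */
#ifdef COMPACT
typedef short coord;
#else
typedef long coord;
#endif

coord coord_add(coord a, coord b);

/* ---- coord.c ---- */
coord coord_add(coord a, coord b) { return (coord)(a + b); }

/* reports how wide this file believes a coord to be */
unsigned long coord_width(void) { return (unsigned long)sizeof(coord); }
```

. if one linked file were built with COMPACT defined and another without,
the two would disagree about the size of every coord they pass to each other,
and neither compiler run would notice .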

In essence, Rules 3 and 4
allow the program -- all of its
linked bodies and their interfaces --
to be treated as a very large functor,
 parameterized by the
initial preprocessor environment
and a uniformly included config.h;
thus,
while CMOD allows individual headers to be
parameterized,
they must be consistently interpreted
throughout the program
in order to be treated as interfaces.
Consistent interpretation works well in practice:
Since a .h file acting as an interface
represents a .c file that is typically compiled once,
there is usually little reason
to interpret the .h file differently
in different contexts .

10.23: web.adda/translate/cmod as a tool:

the manual (umd.edu's google`doc)
. CMOD is an existing tool
that can check adda's translation
(when it's not targeting obj'c).

10.24: web.adda/translate/cil, ccured, and c code checker:
C resources:Parsing tools
CIL - C Intermediate Language - C to C transformation
C Code Checker (for CIL);
ccured .

10.23: pos.adda/translate/info'hiding n type-safe linking:
. what is the best way to use a preprocessor ?
a preprocessor does 2 things:
# macro's (including const's)
# param -controlled code revisions
via conditional compilation .
. modern source config mgt or version control
lets you check out a branch for each target
so it can conserve space by sharing a db,
but unlike the c`approach
(using conditional compilation),
you don't have to look at the
whole set of branches all tangled together .
. c`macros can still implement some adda macro's
(espl'ly const's and inline functions)
but conditional compilation will be
done at the adda level
and not be seen in the c code .
that specifically means no conditional includes,
which should obviate concerns about
"(vertical dependence) .

10.23: adda/translate/choice of using c compiler's checks:
. the idioms that actually provide info hiding
should be distinguished from
the idioms that get the compiler to help;
because, the c compiler
doesn't offer a lot of protections
compared to what the code generator
should be doing;
ie, adda must already be doing
the same tests the compiler would provide .
thus, it would be redundant to have
an impl.file include its own header.file .
. it could be an option:
do you want idiomatic c
or do you want efficient code ?
. that choice would be esp'ly important for
those who want to modify the resulting c code .

10.24: news.adda/translate/per-project-only includes:
Writing Bug-Free C Code
A Programming Style That Automatically Detects Bugs in C Code
by Jerry Jongerius 1995

. in large projects you
end up with a lot of include files
and a lot of interdependencies between them;
one simplifier is a global include file
with all but a few sections enclosed in
#ifdef USE_Aclass sections.
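
. a sketch of such a global include file, assuming hypothetical flags USE_STACK and USE_QUEUE
in place of USE_Aclass; the including file sets its flags first,
so the header's meaning depends on what was #define'd above it .

```c
/* ---- the including file picks its sections first ---- */
#define USE_STACK

/* ---- global.h (hypothetical): every section is opt-in ---- */
#ifdef USE_STACK
typedef struct { int items[8]; int top; } small_stack;
void small_push(small_stack *s, int v);
#endif

#ifdef USE_QUEUE            /* not requested above, so never seen */
typedef struct { int items[8]; int head, tail; } small_queue;
#endif

/* ---- code that only needed the stack section ---- */
void small_push(small_stack *s, int v)
{
    if (s->top < 8)         /* silently ignores a push onto a full stack */
        s->items[s->top++] = v;
}
```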

10.25: the answer is libraries:
. build the classes and other libraries first,
after those are compiled
all the #include's needed to implement that library
simply disappear under the hood;
the only #include left
is the one the library wants to export .
[11.13: well, assuming no local typedef's were needed .]


8.24: news.adda/translate/the c book/modules in practice:

GBdirect`The C Book
> Specialized areas > Declarations & definitions

8.2.5. Realistic use of linkage and definitions
Figure 8.1. Layout of a source file

interface #include's

/* decl's imported:*/
extern int important_variable;
extern int library_func(double, int);

/*Def's exported */
extern int ext_int_def = 0; /* explicit def' */
int tent_ext_int_def;  /* tentative def' */

/* locals -- static means also tentative def's.*/
static int less_important_variable;
static struct{ ... } local_struct;

/* static fn defined below. */
static void lf(void);

/* local Def' = internal linkage */
static float int_link_f_def = 5.3;

/* after decl's the fn`definitions */

/* external linkage = exported */
void f1(int a){...}

/* local */
static int local_function(int a1, int a2){ ...}

static void lf(void){
static int count; /*Also a def' (because of no linkage) */
}
/* eof -- remaining tentative def's
get auto-defined with zero . */

10.25: adda/translate/c modules:
. notice 2 ways of using #include's:
# interfaces for modules:
[11.13:
. a file delimits the scope of
static-class symbols;
and is the body for module systems .]
# subfiles:
--[11.13: aka, inline vs called code]--
. for very large modules #include's can
be used for combining several files
so that they are separate during editing,
but then seen by the compiler
as being just one file .
. therefore, instead of
each file including shared symbols,
the module can have its locals
directly within it,
and in-line all its other contents .
. it's important not to mix styles
since the adt-#include'ing style
is allowing function-files to declare
file-local symbols,
whereas the function-#include'ing style
is now turning all those file-local symbols
into module-wide symbols .

10.26: adda/translate/other ways to use .h:

. just as there are .h files for
what a module wants to export,
could there be a .h for what to import?
... I wasn't sure where that was going;
but, then it led to the idea of
one .h file shared by
all of a module's subroutines;
eg, the module #include's mysubs.h
which includes the prototypes
to every function called
only by this module .
[11.13: and that would be consistent with CMOD,
allowing mysubs.h to be #include'd by
both the importer, and each of the sub's .]

. after all the subroutines are
removed from a module; [11.13:
and replaced with in-line #include's],
then what's left is basically
a neat summary of the contents .
. if separate compilation is not needed,
then a subroutine's prototype
can be replaced with an in-line
(I've been calling that a subfile:
a single translation unit is made by
#include'ing .c files as if they were .h files)
10.29:
. good form suggests that you
have an impl' file include its own .h file;
but that is only useful when manually coding;
a code generator should be as trustworthy
as the c compiler, or more so . [11.13:
because you're dealing with a lot of c compilers;
some of which may not be well tested .
. plus, a code generator lets you use
very long names for all the linker symbols;
auto'ly translating them to dos-short names
for the c linkers that allow only short names .]

8.29: news.adda/translate/the c style guide:
2. File Organization 1989

2.1. File Naming Conventions
File names should be dos8.3:
lower-case letters and numbers starting with a letter,
a base name of eight or fewer characters,
and a suffix of three or fewer
(four, if you include the period).
These rules apply to both
program files and default files
used and produced by the program (e.g., "rogue.sav").

2.2. Program Files
order of sections for a program file:

   1. prologue
. comment describing module;
optionally contains author(s),
revision control information, references, etc.

   2. header includes
In most cases, system include files like stdio.h
should be included before user include files.

   3. Any defines and typedefs
that apply to the file as a whole are next.
"constant" macros, --except put d'str-specific's
-- with their corresponding d'str decl's .
"function" macros,
typedefs and enums.

   4. global (external) data declarations,
externs,
non-static globals,
static globals.
-- If a set of defines applies to a
particular piece of global data
(such as a flags word),
the defines should be immediately after
the data declaration
or embedded in structure declarations,
indented to put the defines
one level deeper than the first keyword
of the declaration to which they apply.

   5. The functions
 
2.3. Header Files

Header files are files that are included
in other files prior to compilation
by the C preprocessor.
Some, such as stdio.h,
are defined at the system level
and must be included by any program
using the standard I/O library.
Header files are also used to contain
data declarations and defines
that are needed by more than one program.
Header files
should be functionally organized,
i.e., declarations for separate subsystems
should be in separate header files.
Also, if a set of declarations
is likely to change when code is ported,
they should be in a separate header file.

Avoid private header filenames
that are the same as library header filenames.
The statement #include "math.h" will include
the standard library math header file
if the intended one is not found
in the current directory.
If this is what you want to happen,
comment this fact.
Don't use absolute pathnames
for header files.
Use the angle-brackets for
getting them from a standard place,
or define them relative to
the current directory.
The "include-path" option of the C compiler
(-I on many systems)
is the best way to handle extensive private
libraries of header files;
it permits reorganizing the dir' structure
without having to alter source files.

Defining variables in a header file
is often a poor idea.
Frequently it is a symptom of poor
partitioning of code between files.
Also, some objects like typedefs
and initialized data definitions
cannot be seen twice by the compiler
in one compilation.
On some systems, there are problems
repeating uninitialized declarations
without the extern keyword .
Repeated declarations can happen if
include files are nested
and will cause the compilation to fail.

Header files should not be nested.
The prologue for a header file should,
therefore,
describe what other headers need to be
#included for the header to be functional.
In extreme cases,
where a large number of header files
are to be included in several different source files,
it is acceptable to put all common #includes
in one include file.

It is common to put the following
into each .h file
to prevent accidental double-inclusion.
    #ifndef EXAMPLE_H
    #define EXAMPLE_H
    ...    /* body of example.h file */
    #endif /* EXAMPLE_H */

This double-inclusion mechanism
should not be relied upon,
particularly to perform nested includes.

8.24: news.adda/translate/the c book/other notes:

GBdirect`The C Book
> Specialized areas > Declarations & definitions

8.2.1.1. Duration {static, automatic}
[10.25:
. internal var's can optionally be static,
but external's can't ever be auto?
. the way they could be auto,
is that the program could have multiple instances;
so then
if each instance has its own copy of extern vars;
that would be an auto storage class;
else if the instances all have to share extern vars;
that would be a static storage class;
and, static could then mean that
the value is preserved across program runs .
. however,
c's use of "(static) for file-level var's
is not referring to static storage!
. c is trying to minimize keywords:
so "(static) is reused:
# as a local static var:
it's as if it uses a non-local that
can only be seen or used locally .
# as a file-level static var:
this means it stays in the file:
-- stays, stasis, static --
it can't be seen by other files
that are asking for that name .]
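
. a sketch of the two uses of static side by side; the names are hypothetical .
both obj's live for the whole run of the program;
the keyword changes who can see them, not how long they last .

```c
static int file_total;      /* file-level static: other files cannot link to it */

int next_ticket(void)
{
    static int issued;      /* local static: keeps its value between calls,
                               but is visible only inside this function */
    issued++;
    file_total += issued;
    return issued;
}

int total_so_far(void) { return file_total; }
```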

register variables
[10.25:
. although modern compilers can
best decide what should go in registers;
the fact that you can't
take the address of a register
could mean (depending on compiler?)
that the register classification could provide
a way of declaring no-aliasing;
ie, you can have the compiler warn you
if your code ever aliases this var .]

8.2.2. Scope
function scope. This only applies to labels,
visible throughout the function
irrespective of the block structure.
[10.25:
. notice how label`scope
differs from external symbols
which aren't visible above their
place of declaration .
. that reminds us to ask the question:
do local var's have full-block scope
or downward-block scope?
it's a good practice to always declare
block-local var's at the beginning
-- not the middle -- of that block .]

8.2.3. Linkage
. Linkage is what makes the same name
declared in different scopes
refer to the same thing.
[10.25:
. the practical view of this
is that linkage really is about links:
an extern declaration is saying
that this is not the object itself
but a link to an object
of the same name defined elsewhere
(in a separately compiled obj'file).
. the object code could then have
a list of all places where
this link is used,
so when the linker finds the definition,
it can update all these links .
. or,
it could simply use a handle:
all references to that name point to
a pointer at the top of the page;
and then the linker has only to fill in
that one top-page pointer .]

The three different types of linkage are:
    * external linkage (inter-file)
    * internal linkage
    (within a given source code file)
    * no linkage
    (multi-declared names involve
    shadowing or distinct scopes,
    or in distinct files and typed static)
[10.25:
. internal linkage is often used when
the main program is above its subprograms;
like so:
subprograms are declared,
main is defined,
then subprograms are defined .]

8.2.4. Linkage and definitions
[10.25:
. I wondered if locals can be tagged extern
so that just as static is a local
but virtually at file-level,
local externs could be sharing
with other localities
as local-to-local communications .
. I'm pretty sure this doesn't exist in c,
but wondered if it made good sense
as an optional feature .
10.26:
. notice c allows recursion,
and the static local allows communication
between multiple instances of the same function .]
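
. for what it's worth, C does accept an extern declaration inside a block,
though it is not a private local-to-local channel:
the name still links to an obj' that has to be defined at file level somewhere .
a sketch with hypothetical names, which also shows a static local
being shared by the recursive instances of one function .

```c
int depth_of(int n)
{
    extern int deepest;     /* block-scope extern: a link to the obj' defined below */
    static int level;       /* one copy shared by every active call */
    int reached;

    level++;
    if (level > deepest)
        deepest = level;
    reached = (n > 0) ? depth_of(n - 1) : level;
    level--;
    return reached;
}

int deepest = 0;            /* the file-level definition the local externs refer to */

int report_deepest(void)
{
    extern int deepest;     /* a second locality naming the same obj' */
    return deepest;
}
```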


10.29: web.adda/translate/Type-Safe:
Type Safe Linking

. this paper proposes encoding the
type information in names of symbols:
. It roughly works by appending
the types of the arguments passed to a function
after the function name.
eg, double a(double b, int c)
would be encoded as a__Fdi
where F stands for function,
d for double
and i for int.
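
. a toy sketch of that encoding, not the paper's implementation:
the function name mangle and its signature are made up for illustration,
and only the two codes mentioned above (d, i) are handled .

```c
#include <stdio.h>
#include <string.h>

/* Writes name, then "__F", then one code letter per argument type into out.
   Returns 0 on success, -1 if out is too small or a type has no known code. */
int mangle(char *out, size_t cap, const char *name,
           const char *const *arg_types, size_t n_args)
{
    size_t used;
    size_t k;
    int written = snprintf(out, cap, "%s__F", name);

    if (written < 0 || (size_t)written >= cap)
        return -1;
    used = (size_t)written;

    for (k = 0; k < n_args; k++) {
        char code;

        if (strcmp(arg_types[k], "double") == 0)
            code = 'd';
        else if (strcmp(arg_types[k], "int") == 0)
            code = 'i';
        else
            return -1;      /* no code is assumed for other types */

        if (used + 1 >= cap)
            return -1;      /* no room for the letter plus the terminator */
        out[used++] = code;
        out[used] = '\0';
    }
    return 0;
}
```

. with the argument types {"double", "int"} and the name "a", the buffer ends up holding a__Fdi .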

The advantages of this approach are:
    * the absence of extra-linguistic mechanisms
 (such as the C preprocessor
 or the lint program checker);
    * the ease of implementation
 as no other programs need to
 understand the program structure;
    * the avoidance of the need to
keep the headers consistent with
the program source.
[11.13: assumes you don't have access to CMOD]

The scheme does not encode
types of variables
and return types of functions.
This is done in order to ensure that
errors arising from
declaring a variable or function
in two different modules
with the same name,
but different type or return type
correspondingly,
are caught by the linker.
( for function overloading,
there's allowance for
defining the same function
with different argument types
in separate modules .)