Planet Scheme

April 15, 2014

Programming Praxis

Assembler, Part 1

In this exercise and the next one we will write a simple assembler for a hypothetical computer; we are following the assembler in the book The Awk Programming Langauge by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger. Our computer has a memory of a thousand five-digit words, a single accumulator, and eleven opcodes:

OPCODE INSTRUCTION DESCRIPTION
00 const C assembler pseudo-operator to define a constant C
01 get read a number from the input to the accumulator
02 put write the number in the accumulator to the output
03 ld M load accumulator with contents of memory location M
04 st M store contents of accumulator in memory location M
05 add M add contents of memory location M to the accumulator
06 sub M subtract contents of memory location M from the accumulator
07 jpos M jump to memory location M if the accumulator is positive
08 jz M jump to memory location M if the accumulator is zero
09 j jump to memory location M, unconditionally
10 halt stop program execution

An assembly-language program is a series of blank lines and statements consisting of up to four fields: The first field, if it exists, is a label; it must start at the first position on the line and may not start with a digit. The second field, which is mandatory, is the opcode; it follows the optional label and mandatory spaces. The third field, which is used only with some opcodes, is the object; if it is present, it follows the opcode and mandatory spaces. The fourth field, which is optional, is a comment; everything on the line following a hash-sign is ignored. Here is a sample assembly-language program that prints the sum of a series of integers entered by the user, with the end of input marked by a 0:

# print sum of input numbers (terminated by zero)

     ld    zero   # initialize sum to zero
     st    sum
loop get          # read a number
     jz    done   # no more input if number is zero
     add   sum    # add input to accumulated sum
     st    sum    # store new value back in sum
     j     loop   # go back and read another number

done ld    sum    # print sum
     put
     halt

zero const 0
sum  const

The contents of memory after loading the sample program are shown on the next page.

Your task is to write a program that assembles a program written in our simple assembly language and loads the program into memory; we’ll write a simulator for our hypothetical computer in the next exercise. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at April 15, 2014 09:00 AM

April 14, 2014

Jeremy Kun

Sending and Authenticating Messages with Elliptic Curves

Last time we saw the Diffie-Hellman key exchange protocol, and discussed the discrete logarithm problem and the related Diffie-Hellman problem, which form the foundation for the security of most protocols that use elliptic curves. Let’s continue our journey to investigate some more protocols. Just as a reminder, the Python implementations of these protocols are not at all meant for […]

by Jeremy Kun at April 14, 2014 03:00 PM

April 11, 2014

Programming Praxis

Plotter

Programs that create graphs and charts are an important tool in many commercial and technical endeavors; for instance, a programmer might look at a graph of memory usage during a debugging run of his program to identify hot spots that need to be trimmed.

A simple type of graph is a scatter graph, sometimes called an x/y plot. The input is a list of points on an x,y-grid; the output is a picture with the points plotted on a grid. A simple input file appears on the next page.

Your task is to write a program to plot scatter graphs. When you are finished, you are welcome to read or run a suggested solution, or to post your own program or discuss the exercise in the comments below.


by programmingpraxis at April 11, 2014 09:00 AM

April 09, 2014

GNU Guix

GNU Guix 0.6 released

We are pleased to announce the sixth alpha release of GNU Guix.

This release provides a bunch of new features, among other things:

  • "Substitutes" (pre-built binaries) must now be signed and authorized to be installed;
  • Builds can be offloaded to other build machines over SSH; we use this facility for our build farm.
  • The +guix build+ command has a new --with-source option that allows a package to be built from a tarball other than that specified in the source. This is notably useful for maintainers who want to test pre-releases of their package.
  • 91 new packages, including GNU Octave, and many upgrades, notably GNU libc 2.19.

An updated QEMU x86_64 image is provided, featuring Guix 0.6 and dmd 0.1. It starts an X server with WindowMaker.

See the original announcement for details.

by Ludovic Courtès at April 09, 2014 06:44 PM

April 08, 2014

Programming Praxis

Formatted Output

The only output functions provided by standard Scheme are display, for plain output, and write, for system-formatted output. That’s rather limiting. At the other extreme, Lisp provides the format function, which has more options than you can dream about (before I discarded it in favor of the HyperSpec, the spine of my printed copy of CLtL2 was broken in two places, format and loop). Many languages provide some kind of formatted output — does anyone remember PIC in COBOL? In today’s exercise we will implement the printf function popularized by C and used in several languages.

There are three functions in the printf family: (sprintf fmt expr) returns a string formatted according to the specification given by format and containing the values expr …; (printf fmt expr) displays a string formatted similarly to sprintf to the current output port, and (fprintf port fmt expr) displays a string formatted similarly to sprintf to the indicated port. The fmt and port arguments are always required.

The fmt argument is a string that contains literal text, escape sequences, and specifications of how the expressions should be formatted. A specification consists of a literal percent-sign %, zero or more modifiers, and a single-character specifier. The single-character specifiers that we will support are:

c ascii character
d decimal integer
f floating-point number
o octal integer
s string
x hexadecimal integer
% literal percent sign

There should be as many expressions as there are format specifiers in the fmt string, except that a % literal percent sign specifier does not consume an expression. As many as four modifiers may appear between the literal percent sign % that starts a specifier and the single-character specifier that ends it:

- left-justify the expression in its field; if not given, the expression is right-justified
0 left-pad with leading zeros instead of spaces
width pad field to this width using spaces (or zeros)
.prec digits to right of decimal point, or maximum string length

The modifiers must appear in the order shown above. The width and prec arguments are unsigned decimal integers.

Escape sequences are introduced by a backslash; the following escape sequences are supported:

\b backspace
\f formfeed
\n newline
\r carriage return
\t horizontal tab
\ddd character with octal value ddd, where ddd is 1 to 3 digits between 0 and 7 inclusive
\c any other character c literally, for instance \\ for backslash or \" for quotation mark

Depending on the environment, a newline may (or may not) imply a carriage return, or vice-versa. Several examples are given on the next page.

Your task is to write a function that provides formatted output for your language, using the definition of printf given above or some other specification that is suitable for your language and your aspirations. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at April 08, 2014 02:00 AM

April 04, 2014

Programming Praxis

Factoring With Bicycle Chains

As penance for teasing you on April Fool’s Day, I have another exercise on factoring. Today we will simulate a machine invented by Derrick H Lehmer (the son of the famous father-son-daughter-in-law trio of mathematicians named Lehmer) that uses bicycle chains to factor integers. Unlike the previous exercise, this factoring method is real; in fact, a variant of this method was the best available method to factor integers from the mid-1930s to the mid-1960s, and Lehmer’s machine found many valuable factorizations for the Cunningham Project. We’ll discuss a simplified version of the machine today, as described by Uli Meyer, and the full version of the machine in a future exercise. I am unable to find a copyright-free picture of Lehmer’s machine, but if you’re interested you could visit the Computer History Museum, or see Lehmer himself describe the machine.

The machine is based on Fermat’s method of factoring by a difference of squares: if n = x2y2 then the factors of n are (xy) and (x + y). Fermat’s method starts with y = ⌈ √n ⌉, computes y2n, and stops when the result is a square.

Lehmer’s machine identifies likely squares by looking at quadratic residues. An integer a is a quadratic residue modulo n if there exists some x such that x2a (mod n). For a number y2n to be a square in Fermat’s algorithm, it must be a quadratic residue to many small bases. Lehmer’s machine represents quadratic residues as “open” points on a bicycle chain, then spins many bicycle chains of different lengths along a common axis and stops when all are quadratic residues; in many but not all cases, such a number will will indicate a factor of n.

Uli Meyer gives a better description, shows a useful diagram, and provides photos of his Lego sieve machine.

Your task is to write a program that simulates Lehmer’s machine to factor integers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at April 04, 2014 09:00 AM

April 02, 2014

Jeremy Kun

Stable Marriages and Designing Markets

Here is a fun puzzle. Suppose we have a group of 10 men and 10 women, and each of the men has sorted the women in order of their preference for marriage (that is, a man prefers to marry a woman earlier in his list over a woman later in the list). Likewise, each of […]

by Jeremy Kun at April 02, 2014 11:21 PM

April 01, 2014

Programming Praxis

Factoring By Digital Coding

A few days ago, an announcement was made in /r/crypto of a new factoring algorithm that runs in time O(log^4 n). A paper describes the algorithm. And even better, coding for the new algorithm is simple:

1)Set mn.
2) Express m as a list of its decimal digits.
3) Express each decimal digit as a list of binary bits.
4) Append the individual lists of binary bits.
5) Convert the resulting list of binary bits into a decimal number. Call it d.
6) If the greatest common divisor of n and d is between 1 and n, it is a factor of n. Report the factor and halt.
7) Set md and go to Step 1.

Steps 1 through 4 form the digital code of a number. For example, 88837 forms the bit-list 1 0 0 0, 1 0 0 0, 1 0 0 0, 1 1, 1 1 1, where we used commas to delineate the decimal digits. That bit-list is 69919 in decimal, which is the digital code of 88837. To find a factor of 88837, compute successive digital codes until a gcd is greater than 1; the chain runs 88837, 69919, 54073, 2847, 2599, 5529, 2921, 333, at which point gcd(88837, 333) = 37 and we have found a factor: 88837 = 74 × 37. Factoring 96577 is even faster: the digital code of 96577 is 40319, and gcd(96577, 40319) = 23, which is a factor of 96577 = 13 × 17 × 19 × 23. But trying to factor 65059 = 17 × 43 × 89 causes an infinite loop. The paper gives a heuristic solution to that problem, but we’ll stop there and accept that sometimes a factorization fails.

Your task is to write a function that factors by digital coding. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at April 01, 2014 09:00 AM

March 31, 2014

Jeremy Kun

Elliptic Curve Diffie-Hellman

So far in this series we’ve seen elliptic curves from many perspectives, including the elementary, algebraic, and programmatic ones. We implemented finite field arithmetic and connected it to our elliptic curve code. So we’re in a perfect position to feast on the main course: how do we use elliptic curves to actually do cryptography? History As the reader has […]

by Jeremy Kun at March 31, 2014 03:22 PM

March 29, 2014

Peter Bex

CHICKEN internals: the garbage collector

One of CHICKEN's coolest features has to be its unique approach to garbage collection. When someone asked about implementation details (hi, Arthur!), I knew this would make for an interesting blog post. This post is going to be long and technical, so hold on to your hats! Don't worry if you don't get through this in one sitting.

Prerequisites

There's a whole lot of stuff that we'll need to explain before we get to the actual garbage collector. CHICKEN's garbage collection (GC) strategy is deeply intertwined with its compilation strategy, so I'll start by explaining the basics of that, before we can continue (pun intended) with the actual GC stuff.

A short introduction to continuation-passing style

The essence of CHICKEN's design is a simple yet brilliant idea by Henry Baker, described in his paper CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A.. The paper is pretty terse, but it's well-written, so I recommend you check it out before reading on. If you grok everything in it, you probably won't get much out of my blog post and you can stop reading now. If you don't grok it, it's probably a good idea to re-read it again later.

Baker's approach assumes a Scheme to C compiler which uses continuation-passing style (CPS) as an internal representation. This is the quintessential internal representation of Scheme programs, going all the way back to the first proper Scheme compiler, RABBIT.

Guy L. Steele (RABBIT's author) did not use CPS to make garbage collection easier. In fact, RABBIT had no GC of its own, as it relied on MacLISP as a target language (which compiled to PDP-10 machine code and had its own garbage collector). Instead, continuations allowed for efficient implementation of nested procedure calls. It eliminated the need for a stack to keep track of this nesting by simply returning the "next thing to do" to a driver loop which took care of invoking it. This made it possible to write down iterative algorithms as a recursive function without causing a stack overflow.

Let's consider a silly program which sums up all the numbers in a list, and shows the result multiplied by two:

(define (calculate-sum lst result)
  (if (null? lst)
      result
      (calculate-sum (cdr lst) (+ result (car lst)))))

(define (show-sum lst)
  (print-number (* 2 (calculate-sum lst 0))))

(show-sum '(1 2 3))

A naive compilation to C would look something like this (brutally simplified):

void entry_point() {
  toplevel();
  exit(0); /* Assume exit(1) is explicitly called elsewhere in case of error. */
}

void toplevel() {
  /* SCM_make_list() & SCM_fx() allocate memory.  "fx" stands for "fixnum". */
  SCM_obj *lst = SCM_make_list(3, SCM_fx(1), SCM_fx(2), SCM_fx(3));
  show_sum(lst);
}

SCM_obj* show_sum(SCM_obj *lst) {
  SCM_obj result = calculate_sum(lst, SCM_fx(0));
  /* SCM_fx_times() allocates memory. */
  return SCM_print_number(SCM_fx_times(SCM_fx(2), result));
}

SCM_obj* calculate_sum(SCM_obj *lst, SCM_obj *result) {
  if (lst == SCM_NIL) { /* Optimised */
    return result;
  } else {
    /* SCM_fx_plus() allocates memory. */
    SCM_obj *tmp = SCM_cdr(lst);
    SCM_obj *tmp2 = SCM_fx_plus(result, SCM_car(lst));
    return calculate_sum(tmp, tmp2); /* Recur */
  }
}

SCM_obj *SCM_print_number(SCM_obj *data) {
  printf("%d\n", SCM_fx_to_integer(data));
  return SCM_VOID;
}

This particular implementation probably can't use a copying garbage collector like CHICKEN uses, because the SCM_obj pointers which store the Scheme objects' locations would all become invalid. But let's ignore that for now.

Due to the recursive call in calculate_sum(), the stack just keeps growing, and eventually we'll get a stack overflow if the list is too long. Steele argued that this is a silly limitation which results in the proliferation of special-purpose "iteration" constructs found in most languages. Also, he was convinced that this just cramps the programmer's style: we shouldn't have to think about implementation details like the stack size. In his time people often used goto instead of function calls as a performance hack. This annoyed him enough to write a rant about it, which should be required reading for all would-be language designers!

Anyway, a compiler can transparently convert our Scheme program into CPS, which would look something like this after translation to C:

/* Set up initial continuation & toplevel call. */
void entry_point() {
  SCM_cont *cont = SCM_make_cont(1, &toplevel, SCM_exit_continuation);
  SCM_call *call = SCM_make_call(0, cont);
  SCM_driver_loop(call);
}

void SCM_driver_loop(SCM_call *call) {
  /* The trampoline to which every function returns its continuation. */
  while(true)
    call = SCM_perform_continuation_call(call);
}

SCM_call *toplevel(SCM_cont *cont) {
  SCM_cont *next = SCM_make_cont(1, &show_sum, cont);
  SCM_obj *lst = SCM_make_list(3, SCM_fx(1), SCM_fx(2), SCM_fx(3));
  return SCM_make_call(1, next, lst);
}

SCM_call *show_sum(SCM_cont *cont, SCM_obj *lst) {
  SCM_cont *next = SCM_make_cont(1, &show_sum_continued, cont);
  SCM_cont *now = SCM_make_cont(2, &calculate_sum, next);
  return SCM_make_call(2, now, lst, SCM_fx(0));
}

SCM_call *calculate_sum(SCM_cont *cont, SCM_obj *lst, SCM_obj *result) {
  if (lst == SCM_NIL) { /* Optimised */
    return SCM_make_call(1, cont, result);
  } else {
    SCM_obj *tmp = SCM_cdr(lst);
    SCM_obj *tmp2 = SCM_fx_plus(result, SCM_car(lst));
    SCM_cont *now = SCM_make_cont(2, &calculate_sum, cont);
    return SCM_make_call(2, now, tmp, tmp2); /* "Recur" */
  }
}

SCM_call *show_sum_continued(SCM_cont *cont, SCM_obj *result) {
  SCM_cont *now = SCM_make_cont(1, &SCM_print_number, cont);
  SCM_obj *tmp = SCM_fx_times(SCM_fx(2), result);
  return SCM_make_call(1, now, tmp);
}

SCM_call *SCM_print_number(SCM_cont *cont, SCM_obj *data) {
  printf("%d\n", SCM_fx_to_integer(data));
  return SCM_make_call(1, cont, SCM_VOID);
}

In the above code, there are two new data types: SCM_cont and SCM_call.

An SCM_cont represents a Scheme continuation as a C function's address, the number of arguments which it expects (minus one) and another continuation, which indicates where to continue after the C function has finished. This sounds recursive, but as you can see the very first continuation created by entry_point() is a specially prepared one which will cause the process to exit.

An SCM_call is returned to the driver loop by every generated C function: this holds a continuation and the arguments with which to invoke it. SCM_perform_continuation_call() extracts the SCM_cont from the SCM_call and invokes its C function with its continuation and the arguments from the SCM_call. We won't dwell on the details of its implementation now, but assume this is some magic which just works.

You'll also note that the primitives SCM_car(), SCM_cdr(), SCM_fx_plus() and SCM_fx_times() do not accept a continuation. This is a typical optimisation: some primitives can be inlined by the compiler. However, this is not required: you can make them accept a continuation as well, at the cost of further splintering the C functions into small sections; the calculate_sum() function would be split up into 4 separate functions if we did that.

Anyway, going back to the big picture we can see that this continuation-based approach consumes a more or less constant amount of stack space, because each function returns to driver_loop. Baker's fundamental insight was that the stack is there anyway (and it will be used by C), and if we don't need it for tracking function call nesting, why not use it for something else? He proposed to allocate all newly created objects on the stack. Because the stack would hopefully fit the CPU's cache in its entirety, this could give quite a performance benefit.

Generational collection

To understand why keeping new data together on the stack can be faster, it's important to know that most objects are quite short-lived. Most algorithms involve intermediate values, which are accessed quite a bit during a calculation but are no longer needed afterwards. These values need to be stored somewhere in memory. Normally you would store them together with all other objects in the main heap, which may cause fragmentation of said heap. Fragmentation means that memory references may cross page boundaries. This is slow, because it will clear out the CPU's memory cache and may even require swapping it in, if the machine is low on memory.

On top of that, generating a lot of intermediate values means generating a lot of garbage, which will trigger many GCs during which a lot of these temporary objects will be cleaned up. However, during these GCs, the remaining longer-lived objects must also be analysed before it can be decided they can stick around.

This is rather wasteful, and it turns out we can avoid doing so much work by categorising objects by age. Objects that have just been created belong to the first generation and are stored in their own space (called the nursery - I'm not kidding!), while those that have survived several GC events belong to older generations, which each have their own space reserved for them. By keeping different generations separated, you do not have to examine long-lived objects of older generations (which are unlikely to be collected) when collecting garbage in a younger generation. This can save us a lot of wasted time.

Managing data on the stack

The Cheney on the M.T.A. algorithm as used by CHICKEN involves only two generations; one generation consisting of newly created objects and the other generation consisting of older objects. In this algorithm, new objects get immediately promoted (or tenured) to the old generation after a GC of the nursery (or stack). Such a GC is called a minor GC, whereas a GC of the heap is called a major GC.

This minor GC is where the novelty lies: objects are allocated on the stack. You might wonder how that can possibly work, considering the lengths I just went through to explain how CPS conversion gets rid of the stack. Besides, by returning to the trampoline function whenever a new continuation is invoked, anything you'd store on the stack would need to get purged (that's how the C calling convention works).

That's right! The way to make this work is pretty counter-intuitive: we go all the way back to the first Scheme to C conversion I showed you and make it even worse. Whenever we want to invoke a continuation, we just call its function. That means that the example program we started out with would compile to this:

/* Same as before */
void entry_point() {
  SCM_cont *cont = SCM_make_cont(1, &toplevel, SCM_exit_continuation);
  SCM_call *call = SCM_make_call(0, cont);
  SCM_driver_loop(call);
}

SCM_call *saved_cont_call; /* Set by SCM_save_call, read by driver_loop */
jmp_buf empty_stack_state; /* Set by driver_loop, read by minor_GC */

void SCM_driver_loop(SCM_call *call) {
  /* Save registers (including stack depth and address in this function) */
  if (setjmp(empty_stack_state))
    call = saved_cont_call; /* Got here via longjmp()? Use stored call */

  SCM_perform_continuation_call(call);
}

void SCM_minor_GC() {
  /* ...
     Copy live data from stack to heap, which is a minor GC.  Described later.
     ... */
  longjmp(empty_stack_state); /* Restore registers (jump back to driver_loop) */
}

void toplevel(SCM_cont *cont) {
  if (!fits_on_stack(SCM_CONT_SIZE(0) + SCM_CALL_SIZE(1) +
                     SCM_FIXNUM_SIZE * 3 + SCM_PAIR_SIZE * 3)) {
    SCM_save_call(1, &show_sum, cont, lst); /* Mutates saved_cont_call */
    SCM_minor_GC(); /* Will re-invoke this function from the start */
  } else {
    /* The below stuff will all fit on the stack, as calculated in the if() */
    SCM_cont *next = SCM_make_cont(1, &show_sum, cont);
    SCM_obj *lst = SCM_make_list(3, SCM_fx(1), SCM_fx(2), SCM_fx(3));
    SCM_call *call = SCM_make_call(1, next, lst);
    SCM_perform_continuation_call(call);
  }
}

void show_sum(SCM_cont *cont, SCM_obj *lst) {
  if (!fits_on_stack(SCM_CONT_SIZE(0) * 2 +
                     SCM_CALL_SIZE(2) + SCM_FIXNUM_SIZE)) {
    SCM_save_call(1, &show_sum, cont, lst);
    SCM_minor_GC();
  } else {
    SCM_cont *next = SCM_make_cont(1, &show_sum_continued, cont);
    SCM_cont *now = SCM_make_cont(2, &calculate_sum, next);
    SCM_call *call = SCM_make_call(2, now, lst, SCM_fx(0));
    SCM_perform_continuation_call(call);
  }
}

void calculate_sum(SCM_cont *cont, SCM_obj *lst, SCM_obj *result) {
  /* This calculation is overly pessimistic as it counts both arms
     of the if(), but this is acceptable */
  if (!fits_on_stack(SCM_CALL_SIZE(1) + SCM_FIXNUM_SIZE +
                     SCM_CONT_SIZE(1) + SCM_CALL_SIZE(2))) {
    SCM_save_call(2, &calculate_sum, cont, lst, result);
    SCM_minor_GC();
  } else {
    if (lst == SCM_NIL) {
      SCM_call *call = SCM_make_call(1, cont, result);
      SCM_perform_continuation_call(call);
    } else {
      SCM_obj *tmp = SCM_cdr(lst);
      SCM_obj *tmp2 = SCM_fx_plus(result, SCM_car(lst));
      SCM_cont *now = SCM_make_cont(2, &calculate_sum, cont);
      SCM_call *call = SCM_make_call(2, now, tmp, tmp2);
      SCM_perform_continuation_call(call); /* "Recur" */
    }
  }
}

void show_sum_continued(SCM_cont *cont, SCM_obj *result) {
  if (!fits_on_stack(SCM_CONT_SIZE(1) + SCM_CALL_SIZE(1) + SCM_FIXNUM_SIZE)) {
    SCM_save_call(1, &show_sum_continued, cont, result);
    SCM_minor_GC();
  } else {
    SCM_cont *now = SCM_make_cont(1, &SCM_print_number, cont);
    SCM_obj *tmp = SCM_fx_times(SCM_fx(2), result);
    SCM_call *call = SCM_make_call(1, now, tmp);
    SCM_perform_continuation_call(call);
  }
}

void SCM_print_number(SCM_cont *cont, SCM_obj *data) {
  if (!fits_on_stack(SCM_CALL_SIZE(1))) {
    SCM_save_call(1, &show_sum_continued, cont, data);
    SCM_minor_GC();
  } else {
    printf("%d\n", SCM_fx_to_integer(data));
    SCM_call *call = SCM_make_call(1, cont, SCM_VOID);
    SCM_perform_continuation_call(call);
  }
}

Whew! This program is quite a bit longer, but it isn't that different from the second program I showed you. The main change is that none of the continuation functions return anything. In fact, these functions, like Charlie in the M.T.A. song, never return. In the earlier version every function ended with a return statement, now they end with an invocation of SCM_perform_continuation_call().

To make things worse, allocating functions now also use alloca() to place objects on the stack. That means that the stack just keeps filling up like the first compilation I showed you, so we're back to where we started! However, this program is a lot longer due to one important thing: At the start of each continuation function we first check to see if there's enough space left on the stack to accommodate the objects this function will allocate.

If there's not enough space, we re-create the SCM_call with which this continuation function was invoked using SCM_save_call(). This differs from SCM_make_call() in that it will not allocate on the stack, but will use a separate area to set aside the call object. The pointer to that area is stored in saved_cont_call.

SCM_save_call() can't allocate on the stack for a few reasons: The first and most obvious reason is that the saved call wouldn't fit on the stack because we just concluded it is already full. Second, the arguments to the call must be kept around even when the stack is blown away after the GC has finished. Third, this stored call contains the "tip" of the iceberg of live data from which the GC will start its trace. This is described in the next section.

After the minor GC has finished, we can jump back to the trampoline again. We use the setjmp() and longjmp() functions for that. When the first call to SCM_driver_loop() is made, it will call setjmp() to save all the CPU's registers to a buffer. This includes the stack and instruction pointers. Then, when the minor GC finishes, it calls longjmp() to restore those registers. Because the stack and instruction pointer are restored, this means execution "restarts" at the place in driver_loop() where setjmp() was invoked. The setjmp() then returns again, but now with a nonzero value (it was zero the first time). The return value is checked and the call is fetched from the special save area to get back to where we were just before we performed the GC.

This is half the magic, so please make sure you understand this part!

The minor GC

The long story above served to set up all the context you need to know to dive into the GC itself, so let's take a closer look at it.

Picking the "live" data from the stack

As we've seen, the GC is invoked when the stack has completely filled up. At this point, the stack is a complete mess: it has many stack frames from all the function calls that happened between the previous GC and now. These stack frames consist of return addresses for the C function calls (which we're not even using), stack-allocated C data (which we don't need) and somewhere among that mess there are some Scheme objects. These objects themselves also belong to two separate categories: the "garbage" and the data that's still being used and needs to be kept around (the so-called live data). How on earth are we going to pick only the interesting bits from that mess?

Like I said before, the saved call contains the "tip of the iceberg" of live data. It turns out this is all we need to get at every single object which is reachable to the program. All you need to do is follow the pointers to the arguments and the continuation stored in the call. For each of these objects, you copy them to the heap and if they are compound objects you follow all the pointers to the objects stored within them, and so on. Let's take a look at a graphical representation of how this works. In the picture below I show the situation where a GC is triggered just after the second invocation of calculate-sum (i.e., the first recursive call of itself, with the list '(2 3)):

After the initial shock from seeing this cosmic horror has worn off, let's take a closer look. It's like a box of tangled cords: if you take the time to carefully untangle them, it's easy, but if you try to do it all at once, it'll leave you overwhelmed. Luckily, I'm going to talk you through it. (by the way: this is an SVG so you can zoom in on details as far as you like using your browser's zooming functionality).

Let's start with the big picture: On the left you see the stack, on the right the heap after copying and in the bottom centre there's a small area of statically allocated objects, which are not subject to GC. To get your bearings, check the left margin of the diagram. I have attempted to visualise C stack frames by writing each function's name above a line leading to the bottom of its frame.

Let's look at the most recently called function, at the top of the stack. This is the function which initiated the minor GC: the second call to calculate_sum(). The shaded area shows the pointers set aside by SCM_save_call(), which form the tip of the iceberg of live data. More on that later.

The next frame belongs to the first call to calculate_sum(). It has allocated a few things on the stack. The topmost element on the stack is the last thing that's allocated due to the way the stack grows upwards in this picture. This is a pointer to an SCM_call object, marked with "[call]", which is the name of the variable which is stored there. If you go back to the implementation of calculate_sum(), you can see that the last thing it does is allocate an SCM_call, and store its pointer in call. The object itself just precedes the variable on the stack, and is marked with a thick white border to group together the machine words from which it is composed. From bottom to top, these are:

  • A tag which indicates that this is a call containing 2 arguments,
  • a pointer to an SCM_cont object (taken from the now variable),
  • a pointer to an SCM_obj object (the cdr of lst, taken from tmp) and
  • a pointer to an SCM_obj object (a fixnum, taken from tmp2).

Other compound objects are indicated in the same way.

You'll also have noticed the green, white and dashed arcs with arrow tips. These connect pointers to their target addresses. The dashed ones on the right hand side of the stack indicate pointers that are used for local variables in C functions or SCM_call objects. These pointers are unimportant to the garbage collector. The ones on the left hand side of the stack are pointers from Scheme objects to other Scheme objects. These are important to the GC. The topmost pointer inside the call object we just looked at has a big dashed curve all the way down to the cdr of lst, and the one below it points at the value of result, which is the fixnum 1.

If you look further down the stack, you'll see the show_sum procedure which doesn't really allocate much: an SCM_call, the initial intermediate result (fixnum 0), and two continuations (next and now in the C code). The bulk of the allocation happens in toplevel, which contains the call to show_sum and allocates a list structure. This is on the stack in reverse order: first the pair X = (3 . ()), then the pair Y = (2 . <X>) and the pair Z = (1 . <Y>). The () is stored as SCM_NIL in the static area, to which the cdr of the bottom-most pair object on the stack points, which is represented by a long green line which swoops down to the static area.

Copying the live data to the heap

The green lines represent links from the saved call to the live data which we need to copy. You can consider the colour green "contagious": imagine everything is white initially, except for the saved call. Then, each line starting at the pointers of the call are painted green. The target object to which a line leads is also painted green. Then, we recursively follow lines from pointers in that object and paint those green, etc. The objects that were already in the heap or the static area are not traversed, so they stay white.

When an object is painted green, it is also copied to the heap, which is represented by a yellow line. The object is then overwritten by a special object which indicates that this object has been moved to the heap. This special object contains a forwarding pointer which indicates the new location of the object. This is useful when you have two objects which point to the same other object, like for example in this code:

(let ((a (list 3 2 1))
      (b (cons 4 a))
      (c (cons 4 a)))
  ...)

Here you have two lists (4 3 2 1) which share a common tail. If both lists are live at the moment of GC, we don't want to copy this tail twice, because that would result in it being split into two distinct objects. Then, a set-car! on a might only be reflected in b but not c, for example. The forwarding pointers prevent this from happening by simply adjusting a copied object's constituent objects to point to their new locations. Finally, after all data has been copied, all the newly copied objects are checked again for references to objects which may have been relocated after the object was copied.

The precise algorithm that performs this operation is very clever. It requires only two pointers and a while loop, but it still handles cyclic data structures correctly. The idea is that you do the copying I described above in a breadth-first way: you only copy the objects stored in the saved call (without touching their pointers). Next, you loop from the start of the heap to the end, looking at each object in turn (initially, those are the objects we just copied). For these objects, you check their components, and see whether they exist in the heap or in the stack. If they exist in the stack, you copy them over to the end of the heap (again, without touching their pointers). Because they are appended to the heap, the end pointer gets moved to the end of the last object, so the while loop will also take the newly copied objects into consideration. When you reach the end of the heap, you're done. In C, that would look something like this:

SCM_obj *slot;
int i, bytes_copied;
char *scan_start = heap_start;

for(i = 0; i < saved_object_count(saved_call); ++i) {
  obj = get_saved_object(saved_call, i);
  /* copy_object() is called "mark()" in CHICKEN.
     It also set up a forwarding pointer at the original location */
  bytes_copied = copy_object(obj, heap_end);
  heap_end += bytes_copied;
}

while(scan_start < heap_end) {
  obj = (SCM_obj *)scan_start;
  for(i = 0; i < object_size(obj); ++i) {
    slot = get_slot(obj, i);
    /* Nothing needs to be done if it's in the heap or static area */
    if (exists_in_stack(slot)) {
      if (is_forwarding_ptr(slot)) {
        set_slot(obj, i, forwarding_ptr_target(slot));
      } else {
        bytes_copied = copy_object(slot, heap_end);
        set_slot(obj, i, heap_end);
        heap_end += bytes_copied;
      }
    }
  }
  scan_start += object_size(obj);
}

This algorithm is the heart of our garbage collector. You can find it in runtime.c in the CHICKEN sources in C_reclaim(), under the rescan: label. The algorithm was invented in 1970 by C.J. Cheney, and is still used in the most "state of the art" implementations. Now you know why Henry Baker's paper is called "Cheney on the M.T.A." :)

After the data has been copied to the heap, the longjmp() in minor_GC() causes everything on the stack to be blown away. Then, the top stack frame is recreated from the saved call. This is illustrated below:

Everything in the shaded red area below the stack frame for driver_loop() is now unreachable because there are no more pointers from live data pointing into this region of the stack. Any live Scheme objects allocated here would have been copied to the heap, and all pointers which pointed there relayed to this new copy. Unfortunately, this stale copy of the data will permanently stick around on the stack, which means this data is forever irreclaimable. This means it is important that the entry point should consume as little stack space as possible.

The major GC

You might be wondering how garbage on the heap is collected. That's what the major GC is for. CHICKEN initially only allocates a small heap area. The heap consists of two halves: a fromspace and a tospace. The fromspace is the heap as we've seen it so far: in normal usage, this is the part that's used. The tospace is always empty.

When a minor GC is copying data from the stack to the fromspace, it may cause the fromspace to fill up. That's when a major GC is triggered: the data in the fromspace is copied to the tospace using Cheney's algorithm. Afterwards, the areas are flipped: the old fromspace is now called tospace and the old tospace is now called fromspace.

During a major GC, we have a slightly larger set of live data. It is not just the data from the saved call, because that's only the stuff directly used by the currently running continuation. We also need to consider global variables and literal objects compiled into the program, for example. These sorts of objects are also considered live data. Aside from this, a major collection is performed the same way as a minor collection.

The smart reader might have noticed a small problem here: what if the amount of garbage cleaned up is less than the data on the stack? Then, the stack data can't be copied to the new heap because it simply is too small. Well, this is when a third GC mode is triggered: a reallocating GC. This causes a new heap to be allocated, twice as big as the current heap. This is also split in from- and tospace. Then, Cheney's algorithm is performed on the old heap's fromspace, using one half of the new heap as tospace. When it's finished, the new tospace is called fromspace, and the other half of the new heap is called tospace. Then, the old heap is de-allocated.

Some practical notes

The above situation is a pretty rough sketch of the way the GC works in CHICKEN. Many details have been omitted, and the actual implementation is extremely hairy. Below I'll briefly mention how a few important things are implemented.

Object representation

You might have noticed that the stack grows pretty quickly in the CPS-converted C code I showed you. That's because the SCM_obj representation requires allocating every object and storing a pointer to it, so that's a minimum of two machine words per object.

CHICKEN avoids this overhead for small, often-used objects like characters, booleans and fixnums. It ensures all allocated objects are word-aligned, so all pointers to objects have their lower bits set to zero. This means you can easily see whether something is a pointer to an object or something else.

All objects in CHICKEN are represented by a C_word type, which is the size of a machine word. So-called immediate values are stored directly inside the machine word, with nonzero lower bits. Non-immediate values are cast to a pointer type to a C structure which contains the type tag and bits like I did in the example.

Calls are not represented by objects in CHICKEN. Instead, the C function is simply invoked directly from the continuation's caller. Continuations are represented as any other object. For didactic reasons, I used a separate C type to distinguish it from SCM_obj, but in Scheme continuations can be reified as first-class objects, so they shouldn't be represented in a fundamentally different way.

Closures

You might be wondering how closures are implemented, because this hasn't been discussed at all. The answer is pretty simple: in the example code, a SCM_call object stored a plain C function's address. Instead, we could store a closure instead: this is a new type of object which holds a C function plus its local variables. Each C function receives this closure as an extra argument (in the CHICKEN sources this is called self). When it needs to access a closed-over value, it can be accessed from the closure object.

Mutations

Another major oversight is the assumption that objects can only point from the stack into the heap. If Scheme was a purely functional language, this would be entirely accurate: new objects can refer to old objects, but there is no way that a preexisting object can be made to refer to a newly created object. For that, you need to support mutation.

But Scheme does support mutation! So what happens when you use vector-set! to store a newly created, stack-allocated value in an old, heap-allocated vector? If we used the above algorithm, the newly created element would either be part of the live set and get copied, but the vector's pointer would not be updated, or it wouldn't be part of the live set and the object would be lost in the stack reset.

The answer to this problem is also pretty simple: we add a so-called write barrier. Whenever a value is written to an object, it is remembered. Then, when performing a GC, these remembered values are considered to be part of the live set, just like the addresses in the saved call. This is also the reason CHICKEN always shows the number of mutations when you're asking for GC statistics: mutation may slow down a program because GCs might take longer.

Stack size

How does CHICKEN know when the stack is filled up? It turns out that there is no portable way to detect how big the stack is, or whether it has a limit at all!

CHICKEN works around this simply by limiting its stack to a predetermined size. On 64-bit systems, this is 1MB, on 32-bit systems it's 256KB. There is also no portable way of obtaining the address of the stack itself. On some systems, it uses a small bit of assembly code to check the stack pointer. On other systems, it falls back on alloca(), allocating a trivial amount of data. The address of the allocated data is the current value of the stack pointer.

When initialising the runtime, just before the entry point is called, the stack's address is taken to determine the stack's bottom address. The top address is checked in the continuation functions, and the difference between the two is the current stack size.

A small rant

While doing the background research for this post, I wanted to read Cheney's original paper. It was very frustrating to find so many references to it, which all lead to a a paywall on the ACM website.

I think it's absurd that the ACM charges $15 for a paper which is over forty years old, and only two measly pages long. What sane person would plunk down 15 bucks to read 2 pages, especially if it is possibly outdated, or not even the information they're looking for?

The ACM's motto is "Advancing Computing as a Science & Profession", but I don't see how putting essential papers behind a paywall is advancing the profession, especially considering how many innovations now happen as unpaid efforts in the open source/free software corner of the world. Putting such papers behind a paywall robs the industry from a sorely-needed historical perspective, and it stifles innovation by forcing us to keep reinventing the wheel.

Some might argue that the ACM needs to charge money to be able to host high-quality papers and maintain its high quality standard, but I don't buy it. You only need to look at USENIX, which is a similar association. They provide complete and perpetual access to all conference proceedings, and the authors maintain full rights to their work. The ACM, instead, has now come up with a new "protection" racket, requiring authors to give full control of their rights to the ACM, or pay for the privilege of keeping the rights on their own work, which is between $1,100 and $1,700 per article.

On a more positive note, authors are given permission to post drafts of their papers on their own website or through their "Author-izer" service. Unfortunately, this service only works when the link is followed directly from the domain on which the author's website is located (through the Referer header). This is not how the web works: it breaks links posted in e-mail as well as search engines.

Secondly, the ACM are also allowing their special interest groups to provide full access to conference papers of the most recent conference. However, this doesn't seem to be encouraged in any way, and only a few SIGs seem to do this.

Luckily, I found a copy of the Cheney paper on some course website. So do yourself a favour and get it before it's taken down :(

Update: If you are also concerned about this, please take a small moment to add your name to this petition.

Further reading

Garbage collection is a fascinating subject whose roots can be traced back all the way to the origins of LISP. There's plenty of information to be found:

by Peter Bex at March 29, 2014 10:58 AM

March 28, 2014

Programming Praxis

Brent’s Root Finder

[ Today's exercise was written by guest author Paul Hofstra, who holds a PhD in Theoretical Nuclear Physics from the Free University in Amsterdam and is now retired from a major oil company. Guest authors are always welcome; contact me at the link in the menu bar if you are interested in writing for Programming Praxis. ]

We discussed various algorithms for finding the roots of a function in two previous exercises. In the first of those exercises, we looked at the bisection method, which is “slow and steady” as it always finds a solution, and the secant method, which is sometimes very fast but also sometimes very slow. In the second of those two exercises, we looked at an algorithm of Theodorus Dekker that combines the bisection and secant methods in a way that retains the speed of the secant method where possible but retains the steadiness of the bisection method where necessary. Later, Richard Brent (the same Brent that improved Pollard’s rho algorithm for finding factors), observed that there are still some functions for which Dekker’s algorithm performs poorly, and he developed the method that is most commonly used today.

Brent augmented the linear interpolation of the secant method with a quadratic interpolation that tends to get closer to the root more quickly than the secant method, and he made conditions that force a bisection step at least every three steps, so his method can never be worse than three times the bisection method, and usually much better. Brent calculates a new point using both the secant and bisection methods, as does Dekker, but instead of simply accepting the secant method when it is better, he calculates another potential point using the quadratic method and takes that instead if it is better than the secant point. You can read Brent’s description of his algorithm at http://maths-people.anu.edu.au/~brent/pd/rpb005.pdf.

Your task is to implement Brent’s root finder. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at March 28, 2014 09:00 AM

March 26, 2014

Per Bothner

Shell-style programming in Kawa

A number of programming languages have facilities to allow access to system processes (commands). For example Java has java.lang.Process and java.lang.ProcessBuilder. However, they're not as convenient as as the old Bourne shell, or as elegant in composing commands.

If we ignore syntax, the shell's basic model is that a command is a function that takes an input string (standard input) along with some string-valued command-line arguments, and whose primary result is a string (standard output). The command also has some secondary outputs (including standard error, and the exit code). However, the elegance comes from treating standard output as a string that can be passed to another command, or used in other contexts that require a string (e.g. command substitution). This article presents how Kawa solves this problem.

Process is auto-convertable to string

To run a command like date, you can use the run-process procedure:

#|kawa:2|# (define p1 (run-process "date --utc"))
Equivalently you can use the &`{command} syntactic sugar:
#|kawa:2|# (define p1 &`{date --utc})

But what kind of value is p1? It's an instance of gnu.kawa.functions.LProcess, which is a class that extends java.lang.Process. You can see this if you invoke toString or call the write procedure:

#|kawa:2|# (p1:toString)
gnu.kawa.functions.LProcess@377dca04
#|kawa:3|# (write p1)
gnu.kawa.functions.LProcess@377dca04

An LProcess is automatically converted to a string or a bytevector in a context that requires it. More precisely, an LProcess can be converted to a blob because it implements Lazy<Blob>. This means you can convert to a string (or bytevector):

#|kawa:9|# (define s1 ::string p1)
#|kawa:10|# (write s1)
"Wed Jan  1 01:18:21 UTC 2014\n"
#|kawa:11|# (define b1 ::bytevector p1)
(write b1)
#u8(87 101 100 32 74 97 110 ... 52 10)

However the display procedure prints it in "human" form, as a string:

#|kawa:4|# (display p1)
Wed Jan  1 01:18:21 UTC 2014

This is also the default REPL formatting:

#|kawa:5|# &`{date --utc}
Wed Jan  1 01:18:22 UTC 2014

Command arguments

The general form for run-process is:

(run-process keyword-argument... command)

The command is the process command-line. It can be an array of strings, in which case those are used as the command arguments directly:

(run-process ["ls" "-l"])
The command can also be a single string, which is split (tokenized) into command arguments separated by whitespace. Quotation groups words together just like traditional shells:
(run-process "cmd a\"b 'c\"d k'l m\"n'o")
   ⇒ (run-process ["cmd"   "ab 'cd"   "k'l m\"no"])

Using string templates is more readable as it avoids having to quote quotation marks:

(run-process &{cmd a"b 'c"d k'l m"n'o})
You can also use the abbreviated form:
&`{cmd a"b 'c"d k'l m"n'o}
This syntax is the same as of SRFI-108 named quasi-literals. In general, the following are roughly equivalent (the difference is that the former does smart quoting of embedded expressions, as discussed later):
&`{command}
(run-command &{command})

Similarly, the following are also roughly equivalent:

&`[keyword-argument...]{command}
(run-command keyword-argument... &{command})

A keyword-argument can specify various properties of the process. For example you can specify the working directory of the process:

(run-process directory: "/tmp" "ls -l")
You can use the shell keyword to specify that we want to use the shell to split the string into arguments. For example:
(run-process shell: #t "command line")
is equivalent to:
(run-process ["/bin/sh" "-c" "command line"])
You can can also use the abbreviation &sh:
&sh{rm *.class}
which is equivalent to:
&`{/bin/sh -c "rm *.class"}

In general, the abbreviated syntax:

&sh[args...]{command}
is equivalent to:
&`[shell: #t args...]{command}

Command and variable substitution

Traditional shells allow you to insert the output from a command into the command arguments of another command. For example:
echo The directory is: `pwd`
The equivalent Kawa syntax is:
&`{echo The directory is: &`{pwd}}

This is just a special case of substituting the result from evaluating an expression. The above is a short-hand for:

&`{echo The directory is: &[&`{pwd}]}

In general, the syntax:

...&[expression]...
evaluates the expression, converts the result to a string, and combines it with the literal string. (We'll see the details in the next section.) This general form subsumes command substitution, variable substitution, and arithmetic expansion.

Tokenization of substitution result

Things gets more interesting when considering the interaction between substitution and tokenization. This is not simple string interpolation. For example, if an interpolated value contains a quote character, we want to treat it as a literal quote, rather than a token delimiter. This matches the behavior of traditional shells. There are multiple cases, depending on whether the interpolation result is a string or a vector/list, and depending on whether the interpolation is inside a quotes.

  • If the value is a string, and we're not inside quotes, then all non-whitespace characters (including quotes) are literal, but whitespace still separates tokens:
    (define v1 "a b'c ")
    &`{cmd x y&[v1]z}  ⟾  (run-process ["cmd" "x" "ya" "b'c" "z"])
    
  • If the value is a string, and we are inside single quotes, all characters (including whitespace) are literal.
    &`{cmd 'x y&[v1]z'}  ⟾  (run-process ["cmd" "x ya b'c z"])
    

    Double quotes work the same except that newline is an argument separator. This is useful when you have one filename per line, and the filenames may contain spaces, as in the output from find:

    &`{ls -l "&`{find . -name '*.pdf'}"}
    

    If the string ends with one or more newlines, those are ignored. This rule (which also applies in the previous not-inside-quotes case) matches traditional shell behavior.

  • If the value is a vector or list (of strings), and we're not inside quotes, then each element of the array becomes its own argument, as-is:
    (define v2 ["a b" "c\"d"])
    &`{cmd &[v2]}  ⟾  (run-process ["cmd" "a b" "c\"d"])
    
    However, if the enclosed expression is adjacent to non-space non-quote characters, those are prepended to the first element, or appended to the last element, respectively.
    &`{cmd x&[v2]y}  ⟾  (run-process ["cmd" "xa b" "c\"dy"])
    &`{cmd x&[[]]y}  ⟾  (run-process ["cmd" "xy"])
    
    This behavior is similar to how shells handle "$@" (or "${name[@]}" for general arrays), though in Kawa you would leave off the quotes.

    Note the equivalence:

    &`{&[array]}  ⟾  (run-process array)
    
  • If the value is a vector or list (of strings), and we are inside quotes, it is equivalent to interpolating a single string resulting from concatenating the elements separated by a space:
    &`{cmd "&[v2]"}  ⟾  (run-process ["cmd" "a b c\"d"])
    

    This behavior is similar to how shells handle "$*" (or "${name[*]}" for general arrays).

  • If the value is the result of a call to unescaped-data then it is parsed as if it were literal. For example a quote in the unescaped data may match a quote in the literal:
    (define vu (unescaped-data "b ' c d '"))
    &`{cmd 'a &[vu]z'}  ⟾  (run-process ["cmd" "a b " "c" "d" "z"])
    
  • If we're using a shell to tokenize the command, then we add quotes or backslashes as needed so that the shell will tokenize as described above:
    &sh{cmd x y&[v1]z}  ⟾  (run-process ["/bin/sh" "-c" "cmd x y'a' 'b'\\'''c' z'"])
    
Smart tokenization only happens when using the quasi-literal forms such as &`{command}. You can of course use string templates with run-process:
(run-process &{echo The directory is: &`{pwd}})
However, in that case there is no smart tokenization: The template is evaluated to a string, and then the resulting string is tokenized, with no knowledge of where expressions were substituted.

Input/output redirection

You can use various keyword arguments to specify standard input, output, and error streams. For example to lower-case the text in in.txt, writing the result to out.txt, you can do:

&`[in-from: "in.txt" out-to: "out.txt"]{tr A-Z a-z}
or:
(run-process in-from: "in.txt" out-to: "out.txt" "tr A-Z a-z")

These options are supported:

in: value
The value is evaluated, converted to a string (as if using display), and copied to the input file of the process. The following are equivalent:
&`[in: "text\n"]{command}
&`[in: &`{echo "text"}]{command}

You can pipe the output from command1 to the input of command2 as follows:

&`[in: &`{command1}]{command2}
in-from: path
The process reads its input from the specified path, which can be any value coercible to a filepath.
out-to: path
The process writes its output to the specified path.
err-to: path
Similarly for the error stream.
out-append-to: path
err-append-to: path
Similar to out-to and err-to, but append to the file specified by path, instead of replacing it.
in-from: 'pipe
out-to: 'pipe
err-to: 'pipe
Does not set up redirection. Instead, the specified stream is available using the methods getOutputStream, getInputStream, or getErrorStream, respectively, on the resulting Process object, just like Java's ProcessBuilder.Redirect.PIPE.
in-from: 'inherit
out-to: 'inherit
err-to: 'inherit
Inherits the standard input, output, or error stream from the current JVM process.
out-to: port
err-to: port
Redirects the standard output or error of the process to the specified port.
out-to: 'current
err-to: 'current
Same as out-to: (current-output-port), or err-to: (current-error-port), respectively.
in-from: port
in-from: 'current
Re-directs standard input to read from the port (or (current-input-port)). It is unspecified how much is read from the port. (The implementation is to use a thread that reads from the port, and sends it to the process, so it might read to the end of the port, even if the process doesn't read it all.)
err-to: 'out
Redirect the standard error of the process to be merged with the standard output.

The default for the error stream (if neither err-to or err-append-to is specifier) is equivalent to err-to: 'current.

Note: Writing to a port is implemented by copying the output or error stream of the process. This is done in a thread, which means we don't have any guarantees when the copying is finished. A possible approach is to have to process-exit-wait (discussed later) wait for not only the process to finish, but also for these helper threads to finish.

Here documents

A here document is a form a literal string, typically multi-line, and commonly used in shells for the standard input of a process. Kawa's string literals or string quasi-literals can be used for this. For example, this passes the string "line1\nline2\nline3\n" to the standard input of command:
(run-process [in: &{
    &|line1
    &|line2
    &|line3
    }] "command")

The &{...} delimits a string; the &| indicates the preceding indentation is ignored.

Pipe-lines

Writing a multi-stage pipe-line quickly gets ugly:

&`[in: &`[in: "My text\n"]{tr a-z A-Z}]{wc}
Aside: This would be nicer in a language with in-fix operators, assuming &` is treated as a left-associative infix operator, with the input as the operational left operand:
"My text\n" &`{tr a-z A-Z} &`{wc}

The convenience macro pipe-process makes this much nicer:

(pipe-process
  "My text\n"
  &`{tr a-z A-Z}
  &`{wc})

All but the first sub-expressions must be (optionally-sugared) run-process forms. The first sub-expression is an arbitrary expression which becomes the input to the second process expression; which becomes the input to the third process expression; and so on. The result of the pipe-process call is the result of the last sub-expression.

Copying the output of one process to the input of the next is optimized: it uses a copying loop in a separate thread. Thus you can safely pipe long-running processes that produce huge output. This isn't quite as efficient as using an operating system pipe, but is portable and works pretty well.

Setting the process environment

By default the new process inherits the system environment of the current (JVM) process as returned by System.getenv(), but you can override it.
env-name: value
In the process environment, set the "name" to the specified value. For example:
&`[env-CLASSPATH: ".:classes"]{java MyClass}
NAME: value
Same as using the env-name option, but only if the NAME is uppercase (i.e. if uppercasing NAME yields the same string). For example the previous example could be written:
(run-process CLASSPATH: ".:classes" "java MyClass")
environment: env
The env is evaluated and must yield a HashMap. This map is used as the system environment of the process.

Process-based control flow

Traditional shell provides logic control flow operations based on the exit code of a process: 0 is success (true), while non-zero is failure (false). Thus you might see:

if grep Version Makefile >/dev/null
then echo found Version
else echo no Version
fi

One idea to have a process be auto-convertible to a boolean, in addition to be auto-convertible to strings or bytevectors: In a boolean context, we'd wait for the process to finish, and return #t if the exit code is 0, and #f otherwise. This idea may be worth exploring later.

Currently Kawa provides process-exit-wait which waits for a process to exit, and then returns the exit code as an int. The convenience function process-exit-ok? returns true iff process-exit-wait returns 0.

(process-exit-wait (run-process "echo foo")) ⟾ 0
The previous sh example could be written:
(if (process-exit-ok? &`{grep Version Makefile})
  &`{echo found}
  &`{echo not found})
Note that unlike the sh, this ignores the output from the grep (because no-one has asked for it). To match the output from the shell, you can use out-to: 'inherit:
(if (process-exit-ok? &`[out-to: 'inherit]{grep Version Makefile})
  &`{echo found}
  &`{echo not found})

March 26, 2014 06:46 AM

Is it a binary file? Is it a text file? It's a Blob

Let's say we want to get the contents of a file as a value, using a simple function, without using a port or looping. Kawa has a function to do that:
(path-data path)
The path can be a Path object, or anything that can be converted to a Path, including a filename string or a URL.

You can also use the following syntactic sugar, which is an example of SRFI-108 named quasi-literals:

&<{pname}
This syntax is meant to suggest the shell input redirection operator <pname. The meaning of &<{pname} is the same as (path-data &{pname}), where &{pname} is a SRFI-109 string quasi-literal. (This is almost the same as (path-data "pname") using a traditional string literal, except for the rules for quoting and escaping.)

What kind of object is returned by &<{pname}? And what is printed when you type that at the REPL? Fundamentally, in modern computers the contents of a file is a sequence of uninterpreted bytes. Most commonly, these bytes represent text in a locale-dependent encoding, but we don't always know this. Sometimes they're images, or videos, or word-processor documents. It's like writing assembly code: you have to know the types of your values. At best we can guess at the type of a file based on its name or extension or looking for magic numbers. So unless we have more information, we'll say that path-data returns a blob, and we'll implementing it using the gnu.lists.Blob type.

$ cat README
Check doc directory.
$ kawa
#|kawa:1|# (define readme &<{README})
#|kawa:2|# readme:class
class gnu.lists.Blob
You can explicitly coerce a Blob to a string or to a bytevector:
#|kawa:3|# (write (->string readme))
"Check doc directory.\n"
#|kawa:4|# (write (->bytevector readme))
#u8(67 104 101 99 107 32 100 111 99 32 100 105 114 101 99 116 111 114 121 46 10)
#|kawa:5|# (->bytevector readme):class
class gnu.lists.U8Vector

The output of a command (which we'll discuss in the next article): is also a blob. For almost all programs, standard output is printable text, because if you try to run a program without re-direction, and it spews out binary data, it may mess up your terminal, which is annoying. Which suggests an answer to what happens when you get a blob result in the REPL: The REPL should try to print out the contents as text, converting the bytes of the blob to a string using a default encoding:

#|kawa:6|# &<{README}
Check doc directory.
It makes sense look at the bytes to see if we can infer an encoding, especially on Windows which doesn't use a default encoding. Currently Kawa checks for a byte-order mark; more sniffing is likely to be added later.

What if the file is not a text file? It might be reasonable to be able to configure a handler for binary files. For example for a .jpg image file, if the the console can display images, it makes sense to display the image inline. It helps if the blob has a known MIME type. (I believe a rich text console should be built using web browser technologies, but that's a different topic.)

Writing to a file

The &<{..} operation can be used with set! to replace the contents of a file:

(set! &<{README} "Check example.com\n")

If you dislike using < as an output operator, you can instead using the &>{..} operation, which evaluates to function whose single argument is the new value:

(&>{README} "Check example.com\n")

You can use &>> to append more data to a file:

(&>>{README} "or check example2.com\n")

The current directory

Functions like path-data or open-input-file or the sugar we seen above all use the current directory as a base directory for relative pathname. You get the current value of the current directory with the expression (current-path). This returns a path object, which prints as the string value of the path.

The initial value of (current-path) is the value of the "user.dir" property, but you change it using a setter:

(set! (current-path) "/opt/myApp/")
A string value is automatically converted to a path, normally a filepath.

The procedure current-path is a parameter, so you can alternatively call it with the new value:

(current-path "/opt/myApp/")
You can also use the parameterize form:
(parameterize ((current-path "/opt/myApp/"))
  (list &<{data1.txt} &<{data2.txt}))

March 26, 2014 06:43 AM

March 25, 2014

Grant Rettke

March 24, 2014

Jeremy Kun

Where Math ∩ Programming is Headed

This week is Spring break at UI Chicago. While I’ll be spending most of it working, it does give me some downtime to reflect. We’ve come pretty far, dear reader, in these almost three years. I learned, you learned. We all laughed. My blog has become my infinite source of entertainment and an invaluable tool for […]

by Jeremy Kun at March 24, 2014 03:23 PM