Planet Scheme

October 12, 2016

Andy Wingo

An incomplete history of language facilities for concurrency

I have lately been in the market for better concurrency facilities in Guile. I want to be able to write network servers and peers that can gracefully, elegantly, and efficiently handle many tens of thousands of clients and other connections, but without blowing the complexity budget. It's a hard nut to crack.

Part of the problem is implementation, but a large part is just figuring out what to do. I have often thought that modern musicians must be crushed under the weight of recorded music history, but it turns out in our humble field that's also the case; there are as many concurrency designs as languages, just about. In this regard, what follows is an incomplete, nuanced, somewhat opinionated history of concurrency facilities in programming languages, with an eye towards what I should "buy" for the Fibers library I have been tinkering on for Guile.

* * *

Modern machines have the raw capability to serve hundreds of thousands of simultaneous long-lived connections, but it’s often hard to manage this at the software level. Fibers tries to solve this problem in a nice way. Before discussing the approach taken in Fibers, it’s worth spending some time on history to see how we got here.

One of the most dominant patterns for concurrency these days is “callbacks”, notably in the Twisted library for Python and the Node.js run-time for JavaScript. The basic observation in the callback approach to concurrency is that the efficient way to handle tens of thousands of connections at once is with low-level operating system facilities like poll or epoll. You add all of the file descriptors that you are interested in to a “poll set” and then ask the operating system which ones are readable or writable, as appropriate. Once the operating system says “yes, file descriptor 7145 is readable”, you can do something with that socket; but what? With callbacks, the answer is “call a user-supplied closure”: a callback, representing the continuation of the computation on that socket.

Building a network service with a callback-oriented concurrency system means breaking the program into little chunks that can run without blocking. Whereever a program could block, instead of just continuing the program, you register a callback. Unfortunately this requirement permeates the program, from top to bottom: you always pay the mental cost of inverting your program’s control flow by turning it into callbacks, and you always incur run-time cost of closure creation, even when the particular I/O could proceed without blocking. It’s a somewhat galling requirement, given that this contortion is required of the programmer, but could be done by the compiler. We Schemers demand better abstractions than manual, obligatory continuation-passing-style conversion.

Callback-based systems also encourage unstructured concurrency, as in practice callbacks are not the only path for data and control flow in a system: usually there is mutable global state as well. Without strong patterns and conventions, callback-based systems often exhibit bugs caused by concurrent reads and writes to global state.

Some of the problems of callbacks can be mitigated by using “promises” or other library-level abstractions; if you’re a Haskell person, you can think of this as lifting all possibly-blocking operations into a monad. If you’re not a Haskeller, that’s cool, neither am I! But if your typey spidey senses are tingling, it’s for good reason: with promises, your whole program has to be transformed to return promises-for-values instead of values anywhere it would block.

An obvious solution to the control-flow problem of callbacks is to use threads. In the most generic sense, a thread is a language feature which denotes an independent computation. Threads are created by other threads, but fork off and run independently instead of returning to their caller. In a system with threads, there is implicitly a scheduler somewhere that multiplexes the threads so that when one suspends, another can run.

In practice, the concept of threads is often conflated with a particular implementation, kernel threads. Kernel threads are very low-level abstractions that are provided by the operating system. The nice thing about kernel threads is that they can use any CPU that is the kernel knows about. That’s an important factor in today’s computing landscape, where Moore’s law seems to be giving us more cores instead of more gigahertz.

However, as a building block for a highly concurrent system, kernel threads have a few important problems.

One is that kernel threads simply aren’t designed to be allocated in huge numbers, and instead are more optimized to run in a one-per-CPU-core fashion. Their memory usage is relatively high for what should be a lightweight abstraction: some 10 kilobytes at least and often some megabytes, in the form of the thread’s stack. There are ongoing efforts to reduce this for some systems but we cannot expect wide deployment in the next 5 years, if ever. Even in the best case, a hundred thousand kernel threads will take at least a gigabyte of memory, which seems a bit excessive for book-keeping overhead.

Kernel threads can be a bit irritating to schedule, too: when one thread suspends, it’s for a reason, and it can be that user-space knows a good next thread that should run. However because kernel threads are scheduled in the kernel, it’s rarely possible for the kernel to make informed decisions. There are some “user-mode scheduling” facilities that are in development for some systems, but again only for some systems.

The other significant problem is that building non-crashy systems on top of kernel threads is hard to do, not to mention “correct” systems. It’s an embarrassing situation. For one thing, the low-level synchronization primitives that are typically provided with kernel threads, mutexes and condition variables, are not composable. Also, as with callback-oriented concurrency, one thread can silently corrupt another via unstructured mutation of shared state. It’s worse with kernel threads, though: a kernel thread can be interrupted at any point, not just at I/O. And though callback-oriented systems can theoretically operate on multiple CPUs at once, in practice they don’t. This restriction is sometimes touted as a benefit by proponents of callback-oriented systems, because in such a system, the callback invocations have a single, sequential order. With multiple CPUs, this is not the case, as multiple threads can run at the same time, in parallel.

Kernel threads can work. The Java virtual machine does at least manage to prevent low-level memory corruption and to do so with high performance, but still, even Java-based systems that aim for maximum concurrency avoid using a thread per connection because threads use too much memory.

In this context it’s no wonder that there’s a third strain of concurrency: shared-nothing message-passing systems like Erlang. Erlang isolates each thread (called processes in the Erlang world), giving each it its own heap and “mailbox”. Processes can spawn other processes, and the concurrency primitive is message-passing. A process that tries receive a message from an empty mailbox will “block”, from its perspective. In the meantime the system will run other processes. Message sends never block, oddly; instead, sending to a process with many messages pending makes it more likely that Erlang will pre-empt the sending process. It’s a strange tradeoff, but it makes sense when you realize that Erlang was designed for network transparency: the same message send/receive interface can be used to send messages to processes on remote machines as well.

No network is truly transparent, however. At the most basic level, the performance of network sends should be much slower than local sends. Whereas a message sent to a remote process has to be written out byte-by-byte over the network, there is no need to copy immutable data within the same address space. The complexity of a remote message send is O(n) in the size of the message, whereas a local immutable send is O(1). This suggests that hiding the different complexities behind one operator is the wrong thing to do. And indeed, given byte read and write operators over sockets, it’s possible to implement remote message send and receive as a process that serializes and parses messages between a channel and a byte sink or source. In this way we get cheap local channels, and network shims are under the programmer’s control. This is the approach that the Go language takes, and is the one we use in Fibers.

Structuring a concurrent program as separate threads that communicate over channels is an old idea that goes back to Tony Hoare’s work on “Communicating Sequential Processes” (CSP). CSP is an elegant tower of mathematical abstraction whose layers form a pattern language for building concurrent systems that you can still reason about. Interestingly, it does so without any concept of time at all, instead representing a thread’s behavior as a trace of instantaneous events. Threads themselves are like functions that unfold over the possible events to produce the actual event trace seen at run-time.

This view of events as instantaneous happenings extends to communication as well. In CSP, one communication between two threads is modelled as an instantaneous event, partitioning the traces of the two threads into “before” and “after” segments.

Practically speaking, this has ramifications in the Go language, which was heavily inspired by CSP. You might think that a channel is just a an asynchronous queue that blocks when writing to a full queue, or when reading from an empty queue. That’s a bit closer to the Erlang conception of how things should work, though as we mentioned, Erlang simply slows down writes to full mailboxes rather than blocking them entirely. However, that’s not what Go and other systems in the CSP family do; sending a message on a channel will block until there is a receiver available, and vice versa. The threads are said to “rendezvous” at the event.

Unbuffered channels have the interesting property that you can select between sending a message on channel a or channel b, and in the end only one message will be sent; nothing happens until there is a receiver ready to take the message. In this way messages are really owned by threads and never by the channels themselves. You can of course add buffering if you like, simply by making a thread that waits on either sends or receives on a channel, and which buffers sends and makes them available to receives. It’s also possible to add explicit support for buffered channels, as Go, core.async, and many other systems do, which can reduce the number of context switches as there is no explicit buffer thread.

Whether to buffer or not to buffer is a tricky choice. It’s possible to implement singly-buffered channels in a system like Erlang via an explicit send/acknowlege protocol, though it seems difficult to implement completely unbuffered channels. As we mentioned, it’s possible to add buffering to an unbuffered system by the introduction of explicit buffer threads. In the end though in Fibers we follow CSP’s lead so that we can implement the nice select behavior that we mentioned above.

As a final point, select is OK but is not a great language abstraction. Say you call a function and it returns some kind of asynchronous result which you then have to select on. It could return this result as a channel, and that would be fine: you can add that channel to the other channels in your select set and you are good. However, what if what the function does is receive a message on a channel, then do something with the message? In that case the function should return a channel, plus a continuation (as a closure or something). If select results in a message being received over that channel, then we call the continuation on the message. Fine. But, what if the function itself wanted to select over some channels? It could return multiple channels and continuations, but that becomes unwieldy.

What we need is an abstraction over asynchronous operations, and that is the main idea of a CSP-derived system called “Concurrent ML” (CML). Originally implemented as a library on top of Standard ML of New Jersey by John Reppy, CML provides this abstraction, which in Fibers is called an operation1. Calling send-operation on a channel returns an operation, which is just a value. Operations are like closures in a way; a closure wraps up code in its environment, which can be later called many times or not at all. Operations likewise can be performed2 many times or not at all; performing an operation is like calling a function. The interesting part is that you can compose operations via the wrap-operation and choice-operation combinators. The former lets you bundle up an operation and a continuation. The latter lets you construct an operation that chooses over a number of operations. Calling perform-operation on a choice operation will perform one and only one of the choices. Performing an operation will call its wrap-operation continuation on the resulting values.

While it’s possible to implement Concurrent ML in terms of Go’s channels and baked-in select statement, it’s more expressive to do it the other way around, as that also lets us implement other operations types besides channel send and receive, for example timeouts and condition variables.


1 CML uses the term event, but I find this to be a confusing name. In this isolated article my terminology probably looks confusing, but in the context of the library I think it can be OK. The jury is out though.

2 In CML, synchronized.

* * *

Well, that's my limited understanding of the crushing weight of history. Note that part of this article is now in the Fibers manual.

Thanks very much to Matthew Flatt, Matthias Felleisen, and Michael Sperber for pushing me towards CML. In the beginning I thought its benefits were small and complication large, but now I see it as being the reverse. Happy hacking :)

by Andy Wingo at October 12, 2016 01:45 PM

Guile News

GNU Guile 2.0.13 released [security fixes]

We've just released a new version of GNU Guile, version 2.0.13, which is a security release for Guile (see the original announcement).

This handles a significant security vulnerability affecting the live REPL, CVE-2016-8606. Due to the nature of this bug, Guile applications themselves in general aren't vulnerable, but Guile developers are. Arbitrary Scheme code may be used to attack your system in this scenario. (A more minor security issue is also addressed, CVE-2016-8605.)

There is also a lesson here that applies beyond Guile: the presumption that "localhost" is only accessible by local users can't be guaranteed by modern operating system environments. If you are looking to provide local-execution-only, we recommend using Unix domain sockets or named pipes. Don't rely on localhost plus some port.

To give context, Guile supports a nice live-hacking feature where a user can expose a REPL to connect to, through Geiser or so on. This allows Guile users to hack programs even while programs are running.

The default in Guile has been to expose a port over localhost to which code may be passed. The assumption for this is that only a local user may write to localhost, so it should be safe. Unfortunately, users simultaneously developing Guile and operating modern browsers are vulnerable to a combination of an HTML form protocol attack and a DNS rebinding attack. How to combine these attacks is published in the article How to steal any developer's local database.

In Guile's case, the general idea is that you visit some site which presumably loads some JavaScript code (or tricks the developer into pressing a button which performs a POST), and the site operator switches the DNS from their own IP to 127.0.0.1. Then a POST is done from the website to 127.0.0.1 with the body containing Scheme code. This code is then executed by the Guile interpreter on the listening port.

The version we are releasing mitigates this problem by detecting incoming HTTP connections and closing them before executing any code.

However, there is a better long term solution, which is already available even to users of older versions of Guile: Guile supports Unix domain sockets in POSIX environments. For example, users may run the command:

to open and listen to a socket at /tmp/guile-socket. Geiser users may then connect using M-x geiser-connect-local. This is considerably safer.

We hope that other program authors take heed of this lesson as well: many programs make use of localhost + port as a way of limiting connections. Unfortunately, in today's complex networked environment, this isn't a safe assumption. It's very difficult to predict what programs may provide a way of chaining requests to an application listening on localhost, and certainly difficult on a system where web browsers are involved. Take heed!

(This post originally appeared on the guile-users mailing list.)

by Christopher Allan Webber at October 12, 2016 12:56 PM

October 11, 2016

Programming Praxis

Three List Manipulation Exercises

We must be approaching the middle of the autumn academic term, because the beginning-programmer message boards are starting to fill up with linked-list exercises; one enterprising fellow even sent me an email asking for help on his homework. Here are three tasks that involve manipulating linked lists. We’ve probably done all three before, and they’re simple enough for many of the people that read and respond here, but learners seem to have trouble with them, so we’ll discuss them here:

  1. Write a function that takes a linked list of integers and returns a new linked list with all the odd integers removed.
  2. Write a function that removes every nth item from a linked list.
  3. Write a function that reverses the two halves of a list; if the number of items in the list is odd, the middle element should go with the second half of the list

Your task is to write the three functions described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at October 11, 2016 09:00 AM

October 10, 2016

Peter Bex

CHICKEN's numeric tower: part 1

Originally, CHICKEN only supported fixnums (small integers) and flonums (floating-point numbers). The upcoming CHICKEN 5 will support a full numeric tower, adding arbitrary-length integers, rational numbers and complex numbers. This is the first in a series of blog posts about my journey to make this a reality. We'll start with a bit of background information. Later parts will dive into the technical details.

In the beginning, there were two numerical types

Like I mentioned, CHICKEN originally only supported fixnums and flonums. This is still the case in CHICKEN 4. When a fixnum overflows, it is coerced into a flonum. On 32-bit systems, this buys us 52 bits of precision, which is more than the 30 bits of precision fixnums offer:

 #;1> most-positive-fixnum
 1073741823
 #;2> (+ most-positive-fixnum 1)
 1073741824.0

This works reasonably well, and is well-behaved until you go beyond the 52 bits supported by the floating-point representation:

 #;3> (flonum-print-precision 100)
 #;4> (expt 2 53)
 9007199254740992.0
 #;5> (+ (expt 2 53) 1)
 9007199254740992.0
 #;6> (= (expt 2 53) (+ (expt 2 53) 1))
 #t

On a 64-bit machine, overflow of the 62 bits of a fixnum to the 52 bits of a flonum is rather weird:

 #;1> (= most-positive-fixnum (- (+ most-positive-fixnum 1) 2))
 #t

Since we only have fixnums and flonums, any attempt to enter a rational number will result in a flonum:

 #;1> 1/2
 0.5
 #;2> 1/3
 0.333333333333333

Complex numbers are not supported at all:

 #;1> 1+2i
 
 Error: unbound variable: 1+2i

Of course, some people still needed to work with complex numbers, so a long time ago, Thomas Chust created the "complex" egg. This added complex number support to the reader, and the basic numeric operators were overridden to support complex numbers. About a year later, Felix Winkelmann created the original version of the "numbers" egg, using Thomas's code for complex numbers. This added arbitrarily large integers ("bignums" in Lisp parlance) and rational number support via the GNU MP library. Thus, CHICKEN finally had a full numeric tower, and it was completely optional, too. Pretty awesome!

Cracks start to show

Unfortunately, it's not as awesome as it sounds. There are some problems with having parts of the numeric tower as an add-on, instead of having it all in core:

  • In a Scheme with modules, + from the scheme module should always refer to the same procedure. So if a module imports that + instead of the one from the numbers module, it will not understand extended numeric types. This means that you can't easily combine a library that uses numbers with one that doesn't. If you pass a bignum to the library that does not use numbers, it will raise an exception. This is mostly a problem with Scheme itself, which doesn't have a clean way to define polymorphic procedures. This makes the numeric tower a built-in special case. It is possible to mutate procedures, but allowing for that implies a big performance hit on all code, even if you don't use the numbers egg.
  • The numbers egg extends the reader to support extended numeric literals. This means that if some code somewhere loads the numbers egg, the reader extension is active even though you didn't load numbers yourself. This can cause confusion because normal numeric operations don't accept these numbers. For an example, see this bug report.
  • Speaking of extended numeric literals: the compiler doesn't know how to serialise those into the generated C code. This means you can't compile Scheme code containing such literals. You'd have to use string->number everywhere, instead. I found a clever hack to make this work with the numbers egg, but it isn't fool-proof. For instance, it doesn't work when cross-compiling to a platform with different endianness, or if one platform is 32-bit and the other is 64-bit.
  • The compiler can optimise tight loops by using inline C functions for primitive operations such as the built-in numerical procedures. A current weak spot of CHICKEN is that (as far as I know), eggs can't add such inline C function replacements. So, any code that uses the numbers egg is doomed to have bad performance in critical loops. I think making inlining of C functions available for user code would be a great project (hint, hint!).
  • Because the FFI (foreign function interface) is built into the compiler, it doesn't support bignums. This means 64-bit integers returned from C are converted to flonums, losing precision. Eggs can't hook into the FFI deeply enough to override this.

One could argue that these are all language or implementation limitations. On the one hand, that's a fair argument. On the other hand, keeping everything "open" so it can be tweaked by the user prevents many optimisations. It also makes the implementation more complex. For instance, there are hooks in core specifically for the numbers egg, to support reading and writing extended literals. The numeric tower needs deeper integration than most other things because numbers are a basic type, much like symbols, strings or lists. So, it makes more sense to have this in the core system.

The start of my quest

Traditionally, Lisps have supported a full numeric tower. At least since the MacLISP days (the early 1970s; see also The History of Lisp), bignums have been pretty standard. Scheme formalises this in the standard, but it does not require full support for all numeric types. Still, in my opinion any serious Lisp or Scheme implementation should support the full numerical tower. It's one of those things that make Lisp unique and more cause for that famous smugness of us Lisp weenies.

It is fantastic when a language supports arbitrarily large integers. Not having to worry about overflows helps prevent various nasty security bugs (luckily, overflowing into flonums, like CHICKEN, mitigates most of these). Bignums can also make it much easier to interact with native code, because integer width is never a problem. It basically frees the programmer from having to think about "unimportant" low-level details. Rational numbers (i.e., fractions like 1/2 or 3/5) and complex numbers are just icing on the cake that add a real feeling of "thoroughness" to Lisp.

This idea, and the fact that other "proper" Scheme implementations support the full numeric tower out of the box always frustrated me. I believe people are less likely to take CHICKEN seriously as a full Scheme implementation. Especially new users are often surprised when CHICKEN does not work as expected. Tutorials don't mention that the numeric tower is partly optional!

More experienced users were also frustrated with the limitations of having numbers as a separate egg, like you can see for example in this thread. In it, some of the problems are indicated, and it is also made clear why a GNU MP-based implementation should not be part of CHICKEN.

From all of this, I decided that the best way to get bignums into core would be to start with finding a good BSD-licensed implementation. Then I could replace GMP with this new implementation in the numbers egg, tweak it to use CHICKEN's naming conventions and finally integrate the new code into core. How hard could it be, really? Little did I suspect that 5 years later, the code would finally be introduced to core!

A very slow, but BSD-licensed implementation

Finding a BSD-licensed bignum implementation is not very difficult, and I quickly settled on the Scheme48 implementation, which was originally taken from MIT Scheme. I've always admired Scheme48 for its extremely clean and easy to understand code base, and CHICKEN core already used the syntax-rules implementation from Scheme48, so it made a lot of sense to use their code. Unfortunately, it turned out that the implementation was extremely inefficient, especially when dealing with rational numbers ("ratnums"). After a few weeks of intensive hacking to fix the worst problems, it was finally ready.

This new implementation was much more efficient than the GMP-based numbers egg, but that's only because the GMP-based version relied heavily on finalizers to clean up memory. The new version integrated properly with the CHICKEN garbage collector. This reduced a whole lot of overhead. Having said that, GMP itself is the fastest bignum implementation you'll ever find, so if you can at all get away with using it in your project, do so!

CHICKEN 5 is announced

The CHICKEN core team (of which I'm a member) decided that CHICKEN 5 should be a clean break, with no backwards compatibility. We wanted to finally restructure the core libraries, which had become rather messy, and change a few confusing aspects about modules. Doing this with backwards compatibility would sap too much development energy and possibly result in an even bigger mess. When this decision was made, I decided that this would be the perfect opportunity to finally integrate the numbers egg into core.

I had been working on the numbers egg on and off over the past years, hoping for a good moment to add it to core. When the opportunity presented itself, at first I naively thought a few tweaks would suffice to integrate it. I thought I only had to make some name changes and rearrange some functions. The Scheme48 code base used very descriptive and highly abstract naming, whereas CHICKEN uses terse names and has both inline and CPS variants for primitive operations. Besides, quite a bit of code in the numbers egg was purely in Scheme, whereas CHICKEN has a more-or-less official C API. So, I had to convert some of the functions to C. This would probably also result in some performance improvements.

Small changes lead to a total rewrite

During the conversion to C, I noticed various opportunities for performance improvements. For instance, the Scheme48 code still relied on malloc() to allocate temporary numbers in several places. Where this was done, the final result of an operation would then be allocated into GC-managed memory and the temporary buffer was immediately freed.

Rewriting the code to allocate directly in GC-able memory resulted in quite the restructuring of the code, because we'd need to have a restartable continuation at every point where an allocation would take place. For example, here's the code for negating a bignum:

static void big_neg(C_word c, C_word self, C_word k, C_word x)
{
  bignum_type big = big_of(x); /* Extract bignum data */
  C_word negated_big = bignum_new_sign(big, !(BIGNUM_NEGATIVE_P (big)));
  C_return_bignum(k, negated_big);
}

static bignum_type bignum_new_sign(bignum_type bignum, int negative_p)
{
  bignum_type result =
    (bignum_allocate ((BIGNUM_LENGTH (bignum)), negative_p));  /* mallocs */
  bignum_destructive_copy (bignum, result);  /* basically a manual memcpy */
  return (result);
}

It looks very simple, but a lot is going on under the hood. The C_return_bignum function contained all the hairy complexity; it would either convert the bignum to a fixnum, deallocate the bignum and call the passed continuation, or it would set up a continuation that would copy the bignum into a heap-allocated copy and deallocate the original bignum, and pass that to an allocation function.

This was changed into the following, which uses the core's _u_ naming convention to indicate that the function is unsafe, i.e. it doesn't check its arguments:

void C_ccall C_u_bignum_negate(C_word c, C_word self, C_word k, C_word x)
{
  C_word kab[C_SIZEOF_CLOSURE(3)], *ka = kab, k2, negp, size;

  /* Create continuation k2, to call after allocation */
  k2 = C_closure(&ka, 3, (C_word)bignum_negate_2, k, x);
  
  negp = C_i_not(C_u_i_bignum_negativep(x)); /* Toggle sign */
  size = C_u_i_bignum_size(x);
  C_allocate_bignum(3, (C_word)NULL, k2, size, negp, C_SCHEME_FALSE);
}

static void bignum_negate_2(C_word c, C_word self, C_word new_big)
{
  C_word k = C_block_item(self, 1), /* Extract original continuation */
         old_big = C_block_item(self, 2); /* Extract original bignum */

  /* Copy old bignum digits to newly allocated (negated) bignum */
  C_memcpy(C_bignum_digits(new_big), C_bignum_digits(old_big),
           C_header_size(C_internal_bignum(old_big))-C_wordstobytes(1));

  C_kontinue(k, new_big); /* "Return" the new bignum by calling k with it */
}

The new version looks hairier but does less, because it allocates the bignum directly into the nursery or the heap. Because this may require a GC, it needs to have a continuation, which can be invoked from the GC's trampoline. That's the reason this has to be cut into two separate C functions. There are functions that allocate 2 bignums or even more, which I had to cut up into 3 or more functions!

Besides using "native" naming conventions, this new version also gets rid of the unnecessary, un-CHICKENish bignum_type abstraction. Instead, it uses only C_word as its type. This also removed the need for some questionable type casts. Luckily, the final negating version that ended up in CHICKEN 5 is a lot simpler and again only one function, but that required a breakthrough in thinking that I hadn't had at this point yet. I will discuss this breakthrough in the final post in this series.

After having taken care of all the functions, very little remained of the original Scheme48 code. It was completely mutilated! Earlier I had to rewrite some of the Scheme code to improve performance, and now I was restructuring the C code. To top it off, after studying other bignum implementations, it became clear that the Scheme48 code was pretty slow when compared to other Schemes. It only implemented the "classical" algorithms, and it was optimised for readability, not speed.

So, I studied up on better algorithms to make it perform more acceptably. In the next few parts, I'll share with you a few things that I've learned.

by Peter Bex at October 10, 2016 07:36 PM

October 07, 2016

Programming Praxis

Sticks

We continue our occasional series of textbook exercises:

You are given a bunch of sticks, each of integer length. Two sticks can be combined into a single, larger stick by a process that costs the sum of the lengths of the two sticks. Your goal is to build a single stick combining all the original sticks, at minimal cost.

For example, suppose you initially have three sticks of lengths 1, 2, and 4. You could combine sticks 2 and 4 at a cost of 6, then combine that stick with stick 1 at a cost of 7, for a total cost of 13. But it is better to first combine sticks 1 and 2 at a cost of 3, then combine that stick with stick 4 at a cost of 7, for a total cost of 10.

Your task is to write a program that combines a bunch of sticks at minimal cost. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at October 07, 2016 09:00 AM

October 04, 2016

Programming Praxis

I’m Embarrassed!

Matthew Arcus pointed out a bug in my solution to the previous problem:

> (dollar 1234567.9999)
$1,234,567.100

I can’t remember a similarly bad bug in any previous Programming Praxis problem.

You get the day off today. There is no exercise for you to do. You are welcome to read or run my corrected solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at October 04, 2016 09:00 AM

September 30, 2016

Programming Praxis

Dollar Format

We have a simple task today, a function that formats a number in dollar format. A number like 1234567.8912 should be rounded to two positions after the decimal point, have commas inserted every three positions before the decimal point, and have a dollar sign prepended; thus, the function should format 1234567.8912 as $1,234,567.89.

Your task is to write a function that returns numbers in dollar format; if your language provides such a facility natively, you are not permitted to use it. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 30, 2016 09:00 AM

September 27, 2016

Programming Praxis

Maximum Product Of Three

Today’s exercise comes from the end-of-chapter exercises in the sorting chapter of a programming textbook:

Write a program that finds the maximum product of three numbers in a given array of integers.

Your task is to write the desired program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 27, 2016 09:00 AM

September 23, 2016

Programming Praxis

Water Jugs Puzzle

There are various puzzles in which water is poured from one jug to another to reach a desired amount of water. In the version we consider today, we have two jugs, an unlimited amount of water to fill them, and a drain into which we can pour an unlimited amount of water. The two jugs have known capacities, but it is not possible to accurately measure portions of a jug.

As an example, we wish to obtain four gallons of water, using jugs of capacities three and five gallons. Starting with two empty jugs, it is possible to obtain four gallons of water using the following six steps:

  1. Fill the five-gallon jug.
  2. Pour three gallons from the five-gallon jug to the three-gallon jug, leaving two gallons in the five-gallon jug.
  3. Empty the three-gallon jug.
  4. Pour two gallons from the five-gallon jug to the three-gallon jug, leaving the five-gallon jug empty and two gallons in the three-gallon jug.
  5. Fill the five-gallon jug.
  6. Pour one gallon from the five-gallon jug into the three-gallon jug, filling it, leaving the desired four gallons in the five-gallon jug.

Bruce Willis figured that out once; so too do thousands of school children every year.

Your task is to write a program that solves this kind of water-jug problem using the minimum number of steps (filling a jug, emptying a jug, or pouring one jug into the other). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 23, 2016 09:00 AM

September 21, 2016

Andy Wingo

is go an acceptable cml?

Yesterday I tried to summarize the things I know about Concurrent ML, and I came to the tentative conclusion that Go (and any Go-like system) was an acceptable CML. Turns out I was both wrong and right.

you were wrong when you said everything's gonna be all right

I was wrong, in the sense that programming against the CML abstractions lets you do more things than programming against channels-and-goroutines. Thanks to Sam Tobin-Hochstadt to pointing this out. As an example, consider a little process that tries to receive a message off a channel, and times out otherwise:

func withTimeout(ch chan int, timeout int) (result int) {
  var timeoutChannel chan int;
  var msg int;
  go func() {
    sleep(timeout)
    timeoutChannel <- 0
  }()
  select {
    case msg = <-ch: return msg;
    case msg = <-timeoutChannel: return 0;
  }
}

I think that's the first Go I've ever written. I don't even know if it's syntactically valid. Anyway, I think we see how it should work. We return the message from the channel, unless the timeout happens before.

But, what if the message is itself a composite message somehow? For example, say we have a transformer that reads a value from a channel and adds 1 to it:

func onePlus(in chan int) (result chan int) {
  var out chan int
  go func () { out <- 1 + <-in }()
  return out
}

What if we do a withTimeout(onePlus(numbers), 0)? Assume the timeout fires first and that's the result that select chooses. There's still that onePlus goroutine out there trying to read from in and at some point probably it will succeed, but nobody will read its value. At that point the number just vanishes into the ether. Maybe that's OK in certain domains, but certainly not in general!

What CML gives you is the ability to express an event (which is kinda like a possibility of sending or receiving a message on a channel) in such a way that we don't run into this situation. Specifically with the wrap combinator, we would make an event such that receiving on numbers would run a function on the received message and return that as the message value -- which is of course the same as what we have, except that in CML the select wouldn't actually read the message off unless it select'd that channel for input.

Of course in Go you could just rewrite your program, so that the select statement looks like this:

select {
  case msg = <-ch: return msg + 1;
  case msg = <-timeoutChannel: return 0;
}

But here we're operating at a lower level of abstraction; we were forced to intertwingle our concerns of adding 1 and our concerns of timeout. CML is more expressive than Go.

you were right when you said we're all just bricks in the wall

However! I was right in the sense that you can build a CML system on top of Go-like systems (though possibly not Go in particular). Thanks to Vesa Karvonen for this comment and the link to their proof-of-concept CML implementation in Clojure's core.async. I understand Vesa also has an implementation in F# as well.

Folks should read Vesa's code, after reading the Reppy papers of course; it's delightfully short and expressive. The basic idea is that event composition operators like choose and wrap build up data structures instead of doing things. The sync operation then grovels through those data structures to collect a list of channels to pass on to core.async's equivalent of select. When select returns, sync determines which event that chosen channel and message corresponds to, and proceeds to "activate" the event (and, as a side effect, possibly issue NACK messages to other channels).

Provided you can map from the chosen select channel/message back to the event, (something that core.async can mostly do, with a caveat; see the code), then you can build CML on top of channels and goroutines.

o/~ yeah you were wrong o/~

On the other hand! One advantage of CML is that its events are not limited to channel sends and receives. I understand that timeouts, thread joins, and maybe some other event types are first-class event kinds in many CML systems. Michael Sperber, current Scheme48 maintainer and functional programmer, tells me that simply wrapping events in channels+goroutines works but can incur a big performance overhead relative to supporting those event types natively, due to the need to make the new goroutine and channel and the scheduling costs. He quotes 10X as the overhead!

So although CML and Go appear to be inter-expressible, maybe a proper solution will base the simple channel send/receive interface on CML rather than the other way around.

Also, since these events are now second-class, it must be OK to lose these events, for the same reason that the naïve withTimeout could lose a message from numbers. This is the case for timeouts usually but maybe you have to think about this more, and possibly provide an infinite stream of the message. (Of course the wrapper goroutine would be collected if the channel becomes unreachable.)

you were right when you said this is the end

I've long wondered how contemporary musicians deal with the enormous, crushing weight of recorded music. I don't really pick any more but hoo am I feeling this now. I think for Guile, I will continue hacking on fibers in a separate library, and I think that things will remain that way for the next couple years and possibly more. We need more experience and more mistakes before blessing and supporting any particular formulation of highly concurrent programming. I will say though that I am delighted that we are able to actually do this experimentation on a library level and I look forward to seeing what works out :)

Thanks again to Vesa, Michael, and Sam for sharing their time and knowledge; all errors are of course mine. Happy hacking!

by Andy Wingo at September 21, 2016 09:29 PM

September 20, 2016

Andy Wingo

concurrent ml versus go

Peoples! Lately I've been navigating the guile-ship through waters unknown. This post is something of an echolocation to figure out where the hell this ship is and where it should go.

Concretely, I have been working on getting a nice lightweight concurrency system rolling for Guile. I'll write more about that later, but you can think of it as being modelled on Go, though built as a library. (I had previously described it as "Erlang-like", but that's just not accurate.)

Earlier this year at Curry On this topic was burning in my mind and of course when I saw the language-hacker fam there I had to bend their ears. My targets: Matthew Flatt, the amazing boundary-crossing engineer, hacker, teacher, researcher, and implementor of Racket, and Matthias Felleisen, the godfather of the PLT research family. I saw them sitting together and I thought, you know what, what can they have to say to each other? These people have been talking together for 30 years right? Surely they are actually waiting for some ignorant dude to saunter up to the PL genius bar, right?

So saunter I do, saying, "if someone says to you that they want to build a server that will handle 100K or so simultaneous connections on Racket, what abstraction do you tell them to use? Racket threads?" Apparently: yes. A definitive yes, in the case of Matthias, with a pointer to Robby Findler's paper on kill-safe abstractions; and still a yes from Matthew with the caveat that for the concrete level of concurrency that I described, you'd have to run tests. More fundamentally, I was advised to look at Concurrent ML (on which Racket's concurrency facilities were based), that CML was much better put together than many modern variants like Go.

This was very interesting and new to me. As y'all probably know, I don't have a formal background in programming languages, and although I've read a lot of literature, reading things only makes you aware of the growing dimension of the not-yet-read. Concurrent ML was even beyond my not-yet-read horizon.

So I went back and read a bunch of papers. Turns out Concurrent ML is like Lisp in that it has a tribe and a tightly-clutched history and a diaspora that reimplements it in whatever language they happen to be working in at the moment. Kinda cool, and, um... a bit hard to appreciate in the current-day context when the only good references are papers from 10 or 20 years ago.

However, after reading a bunch of John Reppy papers, here is my understanding of what Concurrent ML is. I welcome corrections; surely I am getting this wrong.

1. CML is like Go, composed of channels and goroutines. (Forgive the modern referent; I assume most folks know Go at this point.)

2. Unlike Go, in CML a channel is never buffered. To make a buffered channel in CML, you spawn a thread that manages a buffer between two channels.

3. Message send and receive operations in CML are built on a lower-level primitive called "events". (send ch x) is instead euivalent to (sync (send-event ch x)). It's like an event is the derivative of a message send with respect to time, or something.

4. Events can be combined and transformed using the choose and wrap combinators.

5. Doing a sync on an event created by choose allows a user to build select in "user-space", as a library. Cool stuff. So this is what events are for.

6. There are separate event type implementations for timeouts, channel send/recv blocking operations, file descriptor blocking operations, syscalls, thread joins, and the like. These are supported by the CML implementation.

7. The early implementations of Concurrent ML were concurrent but not parallel; they did not run multiple "goroutines" on separate CPU cores at the same time. It was only in like 2009 that people started to do CML in parallel. I do not know if this late parallelism has a practical impact on the viability of CML.

ok go

What is the relationship of CML to Go? Specifically, is CML more expressive than Go? (I assume the reverse is not the case, but that would also be an interesting result!)

There are a few languages that only allow you to select over message receives (not sends), but Go's select doesn't have this limitation, so that's not a differentiator.

Some people say that it's nice to have events as the common denominator, but I don't get this argument. If the only event under consideration is message send or receive over a channel, events + choose + sync is the same in expressive power as a built-in select, as far as I can see. If there are other events, then your runtime already has to support them either way, and something like (let ((ch (make-channel))) (spawn-fiber (lambda () (put-message ch exp))) (get-message ch)) should be sufficient for any runtime-supported event in exp, like sleeps or timeouts or thread joins or whatever.

To me it seems like Go has made the right choices here. I do not see the difference, and that's why I wrote all this, is to be shown the error of my ways. Choosing channels, send, receive, and select as the primitives seems to have the same power as SML events.

Let this post be a pentagram on the floor, then, to summon the CML cognoscenti. Well-actuallies are very welcome; hit me up in the comments!

[edit: Sam Tobin-Hochstadt tells me I got it wrong and I believe him :) In the meantime while I work out how I was wrong, examples are welcome!]

by Andy Wingo at September 20, 2016 09:33 PM

Programming Praxis

Two-Way Cipher

A recent post at Reddit asked for a way to encrypt two plaintexts to the same ciphertext; the application was in geocaching, where a series of caches leads to two different locations depending on the decoded message. That’s an interesting question, and the responses there got it wrong. Fortunately, the poster also asked at the crypto reddit, and the people there were more helpful.

Your task is to write a program that decrypts two different plaintexts from the same ciphertext, given two different keys. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution in the comments below.


by programmingpraxis at September 20, 2016 09:00 AM

September 16, 2016

Programming Praxis

Man Or Boy

During the development of Algol 60, Donald Knuth devised a nasty test of recursion:

There are quite a few ALGOL60 translators in existence which have been designed to handle recursion and non-local references properly, and I thought perhaps a little test-program may be of value. Hence I have written the following simple routine, which may separate the man-compilers from the boy-compilers:

begin 
 real procedure A(k, x1, x2, x3, x4, x5);
 value k; integer k;
 begin
  real procedure B;
  begin 
   k := k - 1; 
   B := A := A(k, B, x1, x2, x3, x4)
  end;
  if k ≤ then A : = x4 + x5 else B
 end
 outreal(A(10, 1, -1, -1, 1, 0))
end

This uses nothing known to be tricky or ambiguous. My question is: What should the answer be? Unfortunately, I don’t have to a man-compiler myself, and so I was forced to try hand calculations. My conjecture (probably wrong) is that the answer will be:

73 - 119 - 177 + 102 = - 121

I’d be very glad to know the right answer.

Your task is to write a program that computes the right answer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 16, 2016 09:00 AM

September 14, 2016

Guile News

GNU Guile 2.1.4 released (beta)

We are delighted to announce GNU Guile release 2.1.4, the next pre-release in what will become the 2.2 stable series.

This release fixes many small bugs, adds an atomic reference facility, and improves the effectiveness of integer unboxing in the compiler. See the release announcement for full details and a download link.

by Andy Wingo at September 14, 2016 10:10 AM

September 13, 2016

Programming Praxis

A Common Interview Question

I’ve seen this question two or three times recently on message boards that provide interview questions, so I guess we ought to add it to our collection:

Create and implement a data structure that provides

  • insert
  • delete
  • find min
  • find max
  • delete min
  • delete max

 

all in O(1) time, regardless of the type of the underlying data.

Your task is to create and implement that data structure. When you are finished, you are welcome to read a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 13, 2016 09:00 AM