Planet Scheme

September 23, 2016

Programming Praxis

Water Jugs Puzzle

There are various puzzles in which water is poured from one jug to another to reach a desired amount of water. In the version we consider today, we have two jugs, an unlimited amount of water to fill them, and a drain into which we can pour an unlimited amount of water. The two jugs have known capacities, but it is not possible to accurately measure portions of a jug.

As an example, we wish to obtain four gallons of water, using jugs of capacities three and five gallons. Starting with two empty jugs, it is possible to obtain four gallons of water using the following six steps:

  1. Fill the five-gallon jug.
  2. Pour three gallons from the five-gallon jug to the three-gallon jug, leaving two gallons in the five-gallon jug.
  3. Empty the three-gallon jug.
  4. Pour two gallons from the five-gallon jug to the three-gallon jug, leaving the five-gallon jug empty and two gallons in the three-gallon jug.
  5. Fill the five-gallon jug.
  6. Pour one gallon from the five-gallon jug into the three-gallon jug, filling it, leaving the desired four gallons in the five-gallon jug.

Bruce Willis figured that out once; so too do thousands of school children every year.

Your task is to write a program that solves this kind of water-jug problem using the minimum number of steps (filling a jug, emptying a jug, or pouring one jug into the other). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 23, 2016 09:00 AM

September 21, 2016

Andy Wingo

is go an acceptable cml?

Yesterday I tried to summarize the things I know about Concurrent ML, and I came to the tentative conclusion that Go (and any Go-like system) was an acceptable CML. Turns out I was both wrong and right.

you were wrong when you said everything's gonna be all right

I was wrong, in the sense that programming against the CML abstractions lets you do more things than programming against channels-and-goroutines. Thanks to Sam Tobin-Hochstadt to pointing this out. As an example, consider a little process that tries to receive a message off a channel, and times out otherwise:

func withTimeout(ch chan int, timeout int) (result int) {
  var timeoutChannel chan int;
  var msg int;
  go func() {
    sleep(timeout)
    timeoutChannel <- 0
  }()
  select {
    case msg = <-ch: return msg;
    case msg = <-timeoutChannel: return 0;
  }
}

I think that's the first Go I've ever written. I don't even know if it's syntactically valid. Anyway, I think we see how it should work. We return the message from the channel, unless the timeout happens before.

But, what if the message is itself a composite message somehow? For example, say we have a transformer that reads a value from a channel and adds 1 to it:

func onePlus(in chan int) (result chan int) {
  var out chan int
  go func () { out <- 1 + <-in }()
  return out
}

What if we do a withTimeout(onePlus(numbers), 0)? Assume the timeout fires first and that's the result that select chooses. There's still that onePlus goroutine out there trying to read from in and at some point probably it will succeed, but nobody will read its value. At that point the number just vanishes into the ether. Maybe that's OK in certain domains, but certainly not in general!

What CML gives you is the ability to express an event (which is kinda like a possibility of sending or receiving a message on a channel) in such a way that we don't run into this situation. Specifically with the wrap combinator, we would make an event such that receiving on numbers would run a function on the received message and return that as the message value -- which is of course the same as what we have, except that in CML the select wouldn't actually read the message off unless it select'd that channel for input.

Of course in Go you could just rewrite your program, so that the select statement looks like this:

select {
  case msg = <-ch: return msg + 1;
  case msg = <-timeoutChannel: return 0;
}

But here we're operating at a lower level of abstraction; we were forced to intertwingle our concerns of adding 1 and our concerns of timeout. CML is more expressive than Go.

you were right when you said we're all just bricks in the wall

However! I was right in the sense that you can build a CML system on top of Go-like systems (though possibly not Go in particular). Thanks to Vesa Karvonen for this comment and the link to their proof-of-concept CML implementation in Clojure's core.async. I understand Vesa also has an implementation in F# as well.

Folks should read Vesa's code, after reading the Reppy papers of course; it's delightfully short and expressive. The basic idea is that event composition operators like choose and wrap build up data structures instead of doing things. The sync operation then grovels through those data structures to collect a list of channels to pass on to core.async's equivalent of select. When select returns, sync determines which event that chosen channel and message corresponds to, and proceeds to "activate" the event (and, as a side effect, possibly issue NACK messages to other channels).

Provided you can map from the chosen select channel/message back to the event, (something that core.async can mostly do, with a caveat; see the code), then you can build CML on top of channels and goroutines.

o/~ yeah you were wrong o/~

On the other hand! One advantage of CML is that its events are not limited to channel sends and receives. I understand that timeouts, thread joins, and maybe some other event types are first-class event kinds in many CML systems. Michael Sperber, current Scheme48 maintainer and functional programmer, tells me that simply wrapping events in channels+goroutines works but can incur a big performance overhead relative to supporting those event types natively, due to the need to make the new goroutine and channel and the scheduling costs. He quotes 10X as the overhead!

So although CML and Go appear to be inter-expressible, maybe a proper solution will base the simple channel send/receive interface on CML rather than the other way around.

Also, since these events are now second-class, it must be OK to lose these events, for the same reason that the naïve withTimeout could lose a message from numbers. This is the case for timeouts usually but maybe you have to think about this more, and possibly provide an infinite stream of the message. (Of course the wrapper goroutine would be collected if the channel becomes unreachable.)

you were right when you said this is the end

I've long wondered how contemporary musicians deal with the enormous, crushing weight of recorded music. I don't really pick any more but hoo am I feeling this now. I think for Guile, I will continue hacking on fibers in a separate library, and I think that things will remain that way for the next couple years and possibly more. We need more experience and more mistakes before blessing and supporting any particular formulation of highly concurrent programming. I will say though that I am delighted that we are able to actually do this experimentation on a library level and I look forward to seeing what works out :)

Thanks again to Vesa, Michael, and Sam for sharing their time and knowledge; all errors are of course mine. Happy hacking!

by Andy Wingo at September 21, 2016 09:29 PM

September 20, 2016

Andy Wingo

concurrent ml versus go

Peoples! Lately I've been navigating the guile-ship through waters unknown. This post is something of an echolocation to figure out where the hell this ship is and where it should go.

Concretely, I have been working on getting a nice lightweight concurrency system rolling for Guile. I'll write more about that later, but you can think of it as being modelled on Go, though built as a library. (I had previously described it as "Erlang-like", but that's just not accurate.)

Earlier this year at Curry On this topic was burning in my mind and of course when I saw the language-hacker fam there I had to bend their ears. My targets: Matthew Flatt, the amazing boundary-crossing engineer, hacker, teacher, researcher, and implementor of Racket, and Matthias Felleisen, the godfather of the PLT research family. I saw them sitting together and I thought, you know what, what can they have to say to each other? These people have been talking together for 30 years right? Surely they are actually waiting for some ignorant dude to saunter up to the PL genius bar, right?

So saunter I do, saying, "if someone says to you that they want to build a server that will handle 100K or so simultaneous connections on Racket, what abstraction do you tell them to use? Racket threads?" Apparently: yes. A definitive yes, in the case of Matthias, with a pointer to Robby Findler's paper on kill-safe abstractions; and still a yes from Matthew with the caveat that for the concrete level of concurrency that I described, you'd have to run tests. More fundamentally, I was advised to look at Concurrent ML (on which Racket's concurrency facilities were based), that CML was much better put together than many modern variants like Go.

This was very interesting and new to me. As y'all probably know, I don't have a formal background in programming languages, and although I've read a lot of literature, reading things only makes you aware of the growing dimension of the not-yet-read. Concurrent ML was even beyond my not-yet-read horizon.

So I went back and read a bunch of papers. Turns out Concurrent ML is like Lisp in that it has a tribe and a tightly-clutched history and a diaspora that reimplements it in whatever language they happen to be working in at the moment. Kinda cool, and, um... a bit hard to appreciate in the current-day context when the only good references are papers from 10 or 20 years ago.

However, after reading a bunch of John Reppy papers, here is my understanding of what Concurrent ML is. I welcome corrections; surely I am getting this wrong.

1. CML is like Go, composed of channels and goroutines. (Forgive the modern referent; I assume most folks know Go at this point.)

2. Unlike Go, in CML a channel is never buffered. To make a buffered channel in CML, you spawn a thread that manages a buffer between two channels.

3. Message send and receive operations in CML are built on a lower-level primitive called "events". (send ch x) is instead euivalent to (sync (send-event ch x)). It's like an event is the derivative of a message send with respect to time, or something.

4. Events can be combined and transformed using the choose and wrap combinators.

5. Doing a sync on an event created by choose allows a user to build select in "user-space", as a library. Cool stuff. So this is what events are for.

6. There are separate event type implementations for timeouts, channel send/recv blocking operations, file descriptor blocking operations, syscalls, thread joins, and the like. These are supported by the CML implementation.

7. The early implementations of Concurrent ML were concurrent but not parallel; they did not run multiple "goroutines" on separate CPU cores at the same time. It was only in like 2009 that people started to do CML in parallel. I do not know if this late parallelism has a practical impact on the viability of CML.

ok go

What is the relationship of CML to Go? Specifically, is CML more expressive than Go? (I assume the reverse is not the case, but that would also be an interesting result!)

There are a few languages that only allow you to select over message receives (not sends), but Go's select doesn't have this limitation, so that's not a differentiator.

Some people say that it's nice to have events as the common denominator, but I don't get this argument. If the only event under consideration is message send or receive over a channel, events + choose + sync is the same in expressive power as a built-in select, as far as I can see. If there are other events, then your runtime already has to support them either way, and something like (let ((ch (make-channel))) (spawn-fiber (lambda () (put-message ch exp))) (get-message ch)) should be sufficient for any runtime-supported event in exp, like sleeps or timeouts or thread joins or whatever.

To me it seems like Go has made the right choices here. I do not see the difference, and that's why I wrote all this, is to be shown the error of my ways. Choosing channels, send, receive, and select as the primitives seems to have the same power as SML events.

Let this post be a pentagram on the floor, then, to summon the CML cognoscenti. Well-actuallies are very welcome; hit me up in the comments!

[edit: Sam Tobin-Hochstadt tells me I got it wrong and I believe him :) In the meantime while I work out how I was wrong, examples are welcome!]

by Andy Wingo at September 20, 2016 09:33 PM

Programming Praxis

Two-Way Cipher

A recent post at Reddit asked for a way to encrypt two plaintexts to the same ciphertext; the application was in geocaching, where a series of caches leads to two different locations depending on the decoded message. That’s an interesting question, and the responses there got it wrong. Fortunately, the poster also asked at the crypto reddit, and the people there were more helpful.

Your task is to write a program that decrypts two different plaintexts from the same ciphertext, given two different keys. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution in the comments below.


by programmingpraxis at September 20, 2016 09:00 AM

September 16, 2016

Programming Praxis

Man Or Boy

During the development of Algol 60, Donald Knuth devised a nasty test of recursion:

There are quite a few ALGOL60 translators in existence which have been designed to handle recursion and non-local references properly, and I thought perhaps a little test-program may be of value. Hence I have written the following simple routine, which may separate the man-compilers from the boy-compilers:

begin 
 real procedure A(k, x1, x2, x3, x4, x5);
 value k; integer k;
 begin
  real procedure B;
  begin 
   k := k - 1; 
   B := A := A(k, B, x1, x2, x3, x4)
  end;
  if k ≤ then A : = x4 + x5 else B
 end
 outreal(A(10, 1, -1, -1, 1, 0))
end

This uses nothing known to be tricky or ambiguous. My question is: What should the answer be? Unfortunately, I don’t have to a man-compiler myself, and so I was forced to try hand calculations. My conjecture (probably wrong) is that the answer will be:

73 - 119 - 177 + 102 = - 121

I’d be very glad to know the right answer.

Your task is to write a program that computes the right answer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 16, 2016 09:00 AM

September 14, 2016

Guile News

GNU Guile 2.1.4 released (beta)

We are delighted to announce GNU Guile release 2.1.4, the next pre-release in what will become the 2.2 stable series.

This release fixes many small bugs, adds an atomic reference facility, and improves the effectiveness of integer unboxing in the compiler. See the release announcement for full details and a download link.

by Andy Wingo at September 14, 2016 10:10 AM

September 13, 2016

Programming Praxis

A Common Interview Question

I’ve seen this question two or three times recently on message boards that provide interview questions, so I guess we ought to add it to our collection:

Create and implement a data structure that provides

  • insert
  • delete
  • find min
  • find max
  • delete min
  • delete max

 

all in O(1) time, regardless of the type of the underlying data.

Your task is to create and implement that data structure. When you are finished, you are welcome to read a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 13, 2016 09:00 AM

September 09, 2016

Programming Praxis

A Divisor Apology

Today’s exercise is an apology from me, for writing an absolutely horrible piece of code.

While working on the nearly square divisors series of exercises, I discovered that the divisors function that I normally use is extremely slow when the number of divisors is large.

Your task is to write a function that returns the list of divisors of a number, and works with reasonable efficiency. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 09, 2016 09:00 AM

September 06, 2016

Programming Praxis

Nearly Square Divisors, Meet In The Middle

The nearly square divisors exercise has generated a considerable amount of interest, and several excellent solutions in the comments. We looked previously at Matthew Arcus’ solution using a knapsack algorithm, with logarithms to reduce the problem from multiplication to addition, thus allowing languages like C to solve the problem using their native data types instead of switching to big integers.

The knapsack solution works like this: To find the nearly square divisor of n, factor n and form the list of divisors ds. Then use the subset sum algorithm of a previous exercise (a variant of knapsack), taking products rather than sums, to find the maximal product of divisors less than the square root of n. There are several ways to solve the subset sum problem. The standard solution uses dynamic programming. Another solution splits the problem space into two parts. Both solutions require exponential time, though the meet-in-the-middle solution has better asymptotic time of O(n2n/2). We also studied a solution that takes polynomial time to produce an approximate answer to the subset sum problem, although that solution is not helpful to us because it already takes exponential time to calculate all the divisors.

Today we look at another solution buried in the comments, this one from Paul Hofstra. Here is his solution:

def divisors(facs):
    factors = [(1,) + tuple(accumulate(g, mul)) for _, g in groupby(facs)]
    div = [1]
    for g in factors:
        div = [d * f for d in div for f in g]
    return div
 
def nsd5(number):
    """ output: nearly square divisor
        method: split factors in 2 equal parts
                create divisors with small factors and sort (descending)
                create divisors with large factors (<= ulimit) and sort
                loop over large divisors and small divisors and search
                   for highest product <= ulimit
    """
    ulimit = isqrt(number)
    facs = rho_factors(number)
    mid = len(facs) // 2
    descending = reversed(sorted(divisors(facs[:mid])))
    ascending = iter(sorted(divisors(facs[mid:])))
    best = 0
    desc = next(descending)
    while True:
        for asc in ascending:
            prod = asc * desc
            if prod > best:
                if prod <= ulimit:
                    best = prod
                else:
                    break
        else:
            break  # while
        for desc in descending:
            prod = asc * desc
            if prod <= ulimit:
                if prod > best:
                    best = prod
                break
        else:
            break  # while
    return best

With this solution we are back to using big integers, and we are using the meet-in-the-middle variant of the subset sum solution to calculate the answer. The solution finds the factors, splits them into two halves, calculates the divisors of each half, then uses subset sum to find the maximal divisor less than the square root. Note that we don’t have to compute all the divisors, only the divisors of each of the two halves.

Your task is to implement Hofstra’s solution to the nearly square divisor problem and use it to calculate the nearly square divisor of the product of the primes less than 190. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below:


by programmingpraxis at September 06, 2016 09:00 AM

September 02, 2016

Programming Praxis

Find The Merge Point Of Two Lists

Given two lists, the merge point is the point after which the two lists are identical and can be merged. For instance, given the lists (a b c x y z) and (g h i x y z), the merge point is at (x y z). If the last items in the two lists differ, the merge point is null; if the two lists are the same, the entire list is the merge point.

Your task is to write a program that finds the merge point of two lists; it should return the unique prefix of the first list, the unique prefix of the second list, and the common suffix of the two lists. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at September 02, 2016 09:00 AM

August 30, 2016

Programming Praxis

Maximum K Items Out Of N

Today’s exercise is to write a program that, given n integers, outputs the list of k integers whose sum is maximum. That looks easy, but there is a catch: You must provide three solutions, one that runs in O(n log n) time, one that runs in O(n log k) time, and one that runs in O(n) time.

Your task is to write three programs as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at August 30, 2016 09:00 AM

August 26, 2016

Programming Praxis

Circular Arrays

Today’s exercise is to write a program that provides queues using a circular array. You should provide operations to make a new queue with a user-specified maximum size, determine if a queue is empty, add new elements to the end of the queue, and remove elements from the beginning of the queue.

Your task is to implement a queue as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at August 26, 2016 09:00 AM

August 23, 2016

Programming Praxis

Nearly Square Divisors, Revisited

In a recent exercise, we discussed the problem of computing the nearly square divisors of a composite number n = a · b, with a ≥ b, such that the difference ab is as small as possible. We gave two solutions: The first solution enumerated the divisors one-by-one, by trial division counting from 1, until reaching the square root, which is fast if n is small but slow in general. The second solution factored n, computed the divisors, the picked the two divisors in the middle of the list, which is fast in general but slow if n is highly composite and thus has a large number of divisors.

In a comment on that exercise, Matthew Arcus, who is a regular reader and commentor at Programming Praxis, gave a splendid answer to the problem; it relies on factoring n, but is fast even if n has a large number of divisors. His algorithm reduces the multiplication of the factors to addition of their logarithms, which means that the knapsack algorithm can be used to find the greatest sum less than the logarithm of the square root.

Your task is to implement Matthew’s algorithm to find the nearly-square divisors of a number. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at August 23, 2016 09:00 AM

August 21, 2016

Grant Rettke

August 19, 2016

Programming Praxis

Diminishing Gap Sort

Today’s exercise comes from my friend Dan:

I don’t have time to think about this now (very busy these days), so I’m just throwing the idea at you. I’m sure there’s some sort of mathematical theorem that relates, but I’m sure you’ll probably know off the top of your head.

I was fascinated not too long ago by the consulting firm we’ve hired to re-vamp our website. I think they were given a tip-off that the color selection for the website theme could be a sticky wicket. So their plan was to pull the colors dynamically from the main picture selected on the home page (and/or departmental pages), which could be changed easily or automatically. They were going to develop something that would pick out the most prominent 6 to 10 colors — even if there wasn’t much contrast (and even black & white photos would supposedly work) — which would be then used for the menus and general color scheme.

Anyway, all this got me thinking about sorting a list of numbers (if these were color codes, there could be lots) such that the greatest gaps would float to the top. I’m sure you get the idea, but I’ll give an example anyway — which I figured out in Excel (below):

Given a number set of 5, 12, 14, 15, 80, 121, 134, 144, 256 the descending-by-highest-difference sort would yield: 256, 80, 121, 134, 144, 12, 14, 15, 5. So in this case, the top 3 biggest gaps would be between 80, 121, and 256.

     Num  Diff              Num  Diff
    ----  ----             ----  ----
     256   112              256   112
     144    10               80    65
     134    13              121    41
     121    41              134    13
      80    65              144    10
      15     1               12     7
      14     2                5     5
      12     7               14     2
       5     5               15     1

I have no idea if this makes sense for the colors thing — it just seemed like the type of problem you frequently do on Programming Praxis.

Hope all is well with you and your family!

-Dan-

Your task is to write a program that sorts a set of points in order by the diminishing gaps between them. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at August 19, 2016 09:00 AM