Planet Scheme

July 17, 2019

Ben Simon

Don't Fear the x, and other lessons from Shamir Secret Sharing

Consider these facts about this 3rd degree polynomial:

f(x) = 1822 + 3x + 19x² + 2x³

Fact 1. This is considered a 3rd degree polynomial because the largest exponent value is 3. While looking at this equation gives me a burst of High School Math PTSD, it need not. Upon further reflection, it's just a bit of terse code. I could program the above as:

(define f (lambda (x)
            (+ 1822 
               (* 3 x)
               (* 19 (expt x 2))
               (* 2  (expt x 3)))))

Because all polynomials have the same shape, it's easy to make a function that generates polynomials:

(define (make-poly coeffs)
  (lambda (x)
    (let loop ((coeffs coeffs)
               (i 0)
               (sum 0))
      (cond ((null? coeffs) sum)
            (else
             (loop (cdr coeffs)
                   (+ i 1)
                    (+ sum (* (car coeffs) (expt x i)))))))))

With this 'make' function, I can replace the above code with the following:

(define f (make-poly '(1822 3 19 2)))

I can call f with as many values of x as I wish:

> (for-each (lambda (x) (show (cons x (f x)))) (range 0 10))
((0 . 1822))
((1 . 1846))
((2 . 1920))
((3 . 2056))
((4 . 2266))
((5 . 2562))
((6 . 2956))
((7 . 3460))
((8 . 4086))
((9 . 4846))
((10 . 5752))

Fact 2. Evaluating a polynomial at x = 0 always returns the constant term, that is, the first coefficient. We see this above, as 0 maps to 1822. This holds true no matter how ugly and complex the polynomial is.

Fact 3. Using Lagrange interpolation, we can take degree + 1 values of a polynomial and construct a function that returns the same values as the original polynomial. That is, if we feed any four of the pairs above into make-lagr-poly, we end up with a new function fg which computes the same values as f.

(define fg (make-lagr-poly '((3 . 2056)
                             (6 . 2956)
                             (7 . 3460)
                             (10 . 5752))))

> (for-each (lambda (x) (show (cons x (fg x)))) (range 0 10))
((0 . 1822))
((1 . 1846))
((2 . 1920))
((3 . 2056))
((4 . 2266))
((5 . 2562))
((6 . 2956))
((7 . 3460))
((8 . 4086))
((9 . 4846))
((10 . 5752))

The how and why of Lagrange polynomials will have to wait for another blog post. But for now, trust me that this bit of math-magic works.
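If you'd like to peek behind the curtain anyway: the post doesn't show the definition of make-lagr-poly, but here is a minimal sketch of one way it could be written. Each point (xi . yi) contributes yi scaled by the Lagrange basis polynomial that is 1 at xi and 0 at every other xj; Scheme's exact rationals keep the intermediate divisions exact.

(define (make-lagr-poly points)
  (lambda (x)
    (let loop ((ps points) (sum 0))
      (if (null? ps)
          sum
          (let* ((xi (caar ps))
                 (yi (cdar ps))
                 (basis
                  ;; product over the other points of (x - xj)/(xi - xj)
                  (let prod ((qs points) (acc 1))
                    (cond ((null? qs) acc)
                          ((= (caar qs) xi) (prod (cdr qs) acc))
                          (else (prod (cdr qs)
                                      (* acc (/ (- x (caar qs))
                                                (- xi (caar qs))))))))))
            (loop (cdr ps) (+ sum (* yi basis))))))))

Because a polynomial of degree 3 is determined by any four of its points, this interpolant coincides with f everywhere, not just at the four supplied points.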

Cryptographer Adi Shamir combined the above facts to create something quite useful: a way to securely share a secret among a group.

Consider this problem:

You need to distribute a vault code to bank executives, but you'd like to avoid a number of pitfalls. For one, you want to make sure that a single lost or stolen code doesn't let an attacker into the vault. You'd also like to ensure that at least 4 executives must agree before the vault can be opened. This reduces the chances that a small number of executives will coordinate to steal from the vault.

Shamir used the above math to create Shamir Secret Sharing. Here's how it works. First, fix your secret, which in this case is the vault code. We'll use 12345 as our code. Then decide on your threshold, that is, the number of executives required before the secret can be revealed. We'll use 4, as suggested above. Finally, decide how many shares you want to give out. Let's assume there are 10 executives, so we'll generate 10 shares.

Step one is to form a polynomial of degree threshold − 1, whose constant term is the secret:

;; values 4, 19 and 103 are random numbers, and should be
;; generated in a secure and truly random way.
(define secret (make-poly '(12345 4 19 103))) 

Step two is to generate the 10 shares, one for each executive:

> (for-each (lambda (x) (show (cons x (secret x)))) (range 1 10))
((1 . 12471))
((2 . 13253))
((3 . 15309))
((4 . 19257))
((5 . 25715))
((6 . 35301))
((7 . 48633))
((8 . 66329))
((9 . 89007))
((10 . 117285))

Note how we start x at 1 and not 0. Calling this function with 0 reveals our secret.

We give each pair of numbers to an executive and instruct them to keep the pair private. If an attacker obtains fewer than 4 shares, the secret remains safe and no clues about it are revealed.

When the vault needs to be opened, 4 executives pool their shares to calculate a function equivalent to the original polynomial:

(define soln (make-lagr-poly '((1 . 12471)
                               (5 . 25715)
                               (9 . 89007)
                               (10 . 117285))))

Finally, to derive the secret we pass 0 to the solution function:

> (soln 0)
12345

While the above implementation captures the essence of Shamir Secret Sharing, it's missing an important step to make it truly secure: performing the arithmetic in a finite field, that is, modulo a prime, rather than over the integers. I may cover that in a future post.

I had two takeaways from my experience implementing Shamir Secret Sharing. First, it's fascinating to see how mathematical properties can be combined to solve everyday problems.

And second, I found the math behind this scheme initially quite overwhelming. I struggled both to understand the mathematical notation and to convert it to code. And then it hit me: I've got a programming language that lets me directly implement polynomials and Lagrange interpolation as functions; I don't need to think of them as purely symbolic expressions the way so many examples on the web do. Putting this in terms of code and not math made all the difference for me.

What a joy it was to untangle what initially appeared to be so complex.

Find my implementation of Shamir Secret Sharing here. Find the original paper by Adi Shamir here. Enjoy!

by Ben Simon (noreply@blogger.com) at July 17, 2019 08:45 PM

July 12, 2019

GNU Guix

Towards Guix for DevOps

Hey, there! I'm Jakob, a Google Summer of Code intern and new contributor to Guix. Since May, I've been working on a DevOps automation tool for the Guix System, which we've been calling guix deploy.

The idea for a Guix DevOps tool has been making the rounds on the mailing lists for some time now. Years, in fact; Dave Thompson and Chris Webber put together a proof-of-concept for it way back in 2015. Thus, we've had plenty of time to gaze upon the existing tools for this sort of thing -- Ansible, NixOps -- and fantasize about a similar tool, albeit with the expressive power of Guile scheme and the wonderful system configuration facilities of Guix. And now, those fantasies are becoming a reality.

"DevOps" is a term that might be unfamiliar to a fair number of Guix users. I'll spare you the detour to Wikipedia and give a brief explanation of what guix deploy does.

Imagine that you've spent the afternoon playing around with Guile's (web) module, developing software for a web forum. Awesome! But a web forum with no users is pretty boring, so you decide to shell out a couple bucks for a virtual private server to run your web forum. You feel that Wildebeest admirers on the internet deserve a platform of their own for discussion, and decide to dedicate the forum to that.

As it turns out, C. gnou is a more popular topic than you ever would have imagined. Your web forum soon grows in size -- attracting hundreds of thousands of simultaneous users. Despite Guile's impressive performance characteristics, one lowly virtual machine is too feeble to support such a large population of Wildebeest fanatics. So you decide to use Apache as a load-balancer, and shell out a couple more bucks for a couple more virtual private servers. Now you've got a problem on your hands; you're the proud owner of five or so virtual machines, and you need to make sure they're all running the most recent version of either your web forum software or Apache.

This is where guix deploy comes into play. Just as you'd use an operating-system declaration to configure services and user accounts on a computer running the Guix System, you can now use that same operating-system declaration to remotely manage any number of machines. A "deployment" managing your Wildebeest fan site setup might look something like this:

...

;; Service for our hypothetical guile web forum application.
(define guile-forum-service-type
  (service-type (name 'guile-forum)
                (extensions
                 (list (service-extension shepherd-root-service-type
                                          guile-forum-shepherd-service)
                       (service-extension account-service-type
                                          (const %guile-forum-accounts))))
                (default-value (guile-forum-configuration))
                (description "A web forum written in GNU Guile.")))

...

(define %forum-server-count 4)

(define (forum-server n)
  (operating-system
    (host-name (format #f "forum-server-~a" n))
    ...
    (services
     (append (list (service guile-forum-service-type
                            (guile-forum-configuration
                             "GNU Fan Forum!")))
             %base-services))))

(define load-balancer-server
  (operating-system
    (host-name "load-balancer-server"
    ...
    (services
     (append (list (service httpd-service-type
                            (httpd-configuration
                             ...)))
             %base-services))))

;; One machine running our load balancer.
(cons (machine
       (system load-balancer-server)
       (environment managed-host-environment-type)
       (configuration (machine-ssh-configuration
                       ...)))

      ;; And a couple running our forum software!
      (let loop ((n 1)
                 (servers '()))
        (if (> n %forum-server-count)
            servers
            (loop (1+ n)
                  (cons (machine
                         (system (forum-server n))
                         (environment managed-host-environment-type)
                         (configuration (machine-ssh-configuration
                                         ...)))
                        servers)))))

The take-away from that example is that there's a new machine type atop the good ol' operating-system type, specifying how the machine should be provisioned. The version of guix deploy that's currently on the master branch only supports managed-host-environment-type, which is used for machines that are already up and running the Guix System. Provisioning, in that sense, only really involves opening an SSH connection to the host. But I'm sure you can imagine a linode-environment-type which automatically sets up a virtual private server through Linode, or a libvirt-environment-type that spins up a virtual machine for running your services. Those types are what I'll be working on in the coming months, in addition to cleaning up the code that's there now.

And yes, you did read that right. guix deploy is on the Guix master branch right now! In fact, we've already done a successful deployment right here on ci.guix.gnu.org. So, if this sounds as though it'd be up your alley, run guix pull, crack open the manual, and let us know how it goes!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Jakob L. Kreuze at July 12, 2019 07:00 PM

Programming Praxis

Interactive Diff

[ There will be no exercises next week, as my daughter is visiting from out-of-town and I will be spending time with her. ]

One of my favorite books is The Unix Programming Environment by Brian Kernighan and Rob Pike, which I have recently been re-reading; the spine of my copy broke long ago, so I must be careful turning the pages, and I wrap the book in a rubber band every time I put it back on my shelf. It is an excellent introduction to Unix, still relevant today even though it was published in 1984. The recent exercise Remind was inspired by a program in Section 4.4, and today’s exercise is a rewrite of the program in Section 6.8.

NAME

    idiff -- interactive diff

USAGE

    idiff file1 file2 -- interactively merge file differences

DESCRIPTION

    idiff takes two files file1 and file2, diffs them, and presents
    the difference to the user interactively. At each difference,
    the user may accept the text from file1 by responding <, or
    accept the text from file2 by responding >, or edit the difference
    by responding e (in which case whatever the user saves from the
    editor is entered into the output file), or execute a command
    by typing !cmd, in which case the user must then respond when
    the prompt is repeated. The assembled output with the selected
    differences is placed in file idiff.out.

EXAMPLE

    $ cat file1
    This is
    a test
    of
    your
    skill
    and comprehension.
    $ cat file2
    This is
    not a test
    of
    our
    ability.
    $ diff file1 file2
    2c2
    < a test
    ---
    > not a test
    4,6c4,5
    < your
    < skill
    < and comprehension.
    ---
    > our
    > ability.
    $ idiff file1 file2
    2c2
    < a test
    ---
    > not a test
    ? >
    4,6c4,5
    < your
    < skill
    < and comprehension.
    ---
    > our
    > ability.
    ? <
    $ cat idiff.out
    This is
    not a test
    of
    your
    skill
    and comprehension.

[ WordPress is determined not to render less-than and greater-than signs properly. Look at https://ideone.com/nI9CNB if you can’t make sense of what you see. ]

Your task is to write idiff. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at July 12, 2019 09:00 AM

July 09, 2019

Programming Praxis

Doubled Letters

We have a fun little exercise on a lazy summer Tuesday:

Given a list of words, remove from the list those words that have two adjacent identical letters. For instance, given “Now is the time for all good men to come to the aid of their country” the program should remove the words “all” and “good”.

Your task is to write a program to remove words with doubled letters from a list of words. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
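For a sense of how short this can be, here is one possible sketch in Scheme (mine, not the suggested solution):

(define (doubled-letter? word)
  ;; #t when the word has two adjacent identical letters
  (let loop ((cs (string->list word)))
    (and (pair? cs) (pair? (cdr cs))
         (or (char=? (car cs) (cadr cs))
             (loop (cdr cs))))))

(define (remove-doubled words)
  (filter (lambda (w) (not (doubled-letter? w))) words))

For example, (remove-doubled '("all" "good" "men")) evaluates to ("men").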

by programmingpraxis at July 09, 2019 09:00 AM

July 02, 2019

Programming Praxis

Remind

NAME

    remind -- print reminders of upcoming events

USAGE

    remind -- show reminders for next seven days
    remind [year] month day message -- add reminder to database

DESCRIPTION

    Remind maintains a database of reminders in the .reminders file,
    in the user's home directory, each a single line of the form

        [year] month day message

    Year is optional, and must be an integer greater than 99; if no
    year is given, the reminder applies to all years (for instance,
    birthdays).

    If remind is called with no arguments, it writes to standard
    output all reminders that occur within the next seven days. If
    remind is called with arguments giving a date and message, a
    reminder is added to the database. Any time remind is called,
    all past reminders are deleted from the database.

EXAMPLE

    $ date
    Sun Jun 30 19:45:38 CDT 2019
    $ remind 4 2 Anne birthday
    $ remind 10 13 Kate birthday
    $ remind 7 4 Independence Day
    $ remind 2019 7 2 lunch with Pat
    $ remind 2019 5 13 dentist 2:00pm
    $ remind
    7 4 Independence Day
    2019 7 2 lunch with Pat
    $ cat ~/.reminders
    4 2 Anne birthday
    10 13 Kate birthday
    7 4 Independence Day
    2019 7 2 lunch with Pat

Your task is to implement remind. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
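The only fiddly part is the seven-day window, especially for month/day reminders that carry no year. Here is one way to test it, sketched with Guile's SRFI-19 date library (mine, not the suggested solution; it counts today as day zero):

(use-modules (srfi srfi-19))

(define (upcoming? month day)
  ;; Does month/day (in any year) fall within the next seven days,
  ;; counting today?  Walk the next eight calendar dates and compare.
  (let ((today (date->julian-day (current-date))))
    (let loop ((i 0))
      (and (<= i 7)
           (let ((d (julian-day->date (+ today i))))
             (or (and (= (date-month d) month)
                      (= (date-day d) day))
                 (loop (+ i 1))))))))

On the June 30 of the example, (upcoming? 7 4) returns #t while (upcoming? 10 13) returns #f.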

by programmingpraxis at July 02, 2019 09:00 AM

June 28, 2019

Programming Praxis

Deblank

Today’s task is easy. I expect to see lots of imaginative and over-the-top solutions:

Write a program that passes its input to its output, removing any lines that are either empty or contain only “white” characters like space and tab.

Your task is to write a program that removes blank lines from files. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
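As a baseline (decidedly not over-the-top), here is one possible sketch in Guile:

(use-modules (ice-9 rdelim)    ; read-line
             (srfi srfi-13))   ; string-every

(define (deblank)
  ;; Copy standard input to standard output, dropping lines that
  ;; are empty or contain only whitespace characters.
  (let loop ((line (read-line)))
    (unless (eof-object? line)
      (unless (string-every char-whitespace? line)
        (display line)
        (newline))
      (loop (read-line)))))

Note that string-every is vacuously true on the empty string, so genuinely empty lines are dropped by the same test.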

by programmingpraxis at June 28, 2019 09:00 AM

June 26, 2019

Andy Wingo

fibs, lies, and benchmarks

Friends, consider the recursive Fibonacci function, expressed most lovelily in Haskell:

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

Computing elements of the Fibonacci sequence ("Fibonacci numbers") is a common microbenchmark. Microbenchmarks are like Suzuki exercises for learning violin: not written to be good tunes (good programs), but rather to help you improve a skill.

The fib microbenchmark teaches language implementors to improve recursive function call performance.

I'm writing this article because after adding native code generation to Guile, I wanted to check how Guile was doing relative to other language implementations. The results are mixed. We can start with the most favorable of the comparisons: Guile present versus Guile of the past.


I collected these numbers on my i7-7500U CPU @ 2.70GHz 2-core laptop, with no particular performance tuning, running each benchmark 10 times, waiting 2 seconds between measurements. The bar value indicates the median elapsed time, and above each bar is an overlayed histogram of all results for that scenario. Note that the y axis is on a log scale. The 2.9.3* version corresponds to unreleased Guile from git.

Good news: Guile has been getting significantly faster over time! Over decades, true, but I'm pleased.

where are we? static edition

How good are Guile's numbers on an absolute level? It's hard to say because there's no absolute performance oracle out there. However there are relative performance oracles, so we can try out perhaps some other language implementations.

First up would be the industrial C compilers, GCC and LLVM. We can throw in a few more "static" language implementations as well: compilers that completely translate to machine code ahead-of-time, with no type feedback, and a minimal run-time.


Here we see that GCC is doing best on this benchmark, completing in an impressive 0.304 seconds. It's interesting that the result differs so much from clang. I had a look at the disassembly for GCC and I see:

fib:
    push   %r12
    mov    %rdi,%rax
    push   %rbp
    mov    %rdi,%rbp
    push   %rbx
    cmp    $0x1,%rdi
    jle    finish
    mov    %rdi,%rbx
    xor    %r12d,%r12d
again:
    lea    -0x1(%rbx),%rdi
    sub    $0x2,%rbx
    callq  fib
    add    %rax,%r12
    cmp    $0x1,%rbx
    jg     again
    and    $0x1,%ebp
    lea    0x0(%rbp,%r12,1),%rax
finish:
    pop    %rbx
    pop    %rbp
    pop    %r12
    retq   

It's not quite straightforward; what's the loop there for? It turns out that GCC inlines one of the recursive calls to fib. The microbenchmark is no longer measuring call performance, because GCC managed to reduce the number of calls. If I had to guess, I would say this optimization doesn't have wide applicability and is just there to game benchmarks. In that case, well played, GCC, well played.

LLVM's compiler (clang) looks more like what we'd expect:

fib:
   push   %r14
   push   %rbx
   push   %rax
   mov    %rdi,%rbx
   cmp    $0x2,%rdi
   jge    recurse
   mov    %rbx,%rax
   add    $0x8,%rsp
   pop    %rbx
   pop    %r14
   retq   
recurse:
   lea    -0x1(%rbx),%rdi
   callq  fib
   mov    %rax,%r14
   add    $0xfffffffffffffffe,%rbx
   mov    %rbx,%rdi
   callq  fib
   add    %r14,%rax
   add    $0x8,%rsp
   pop    %rbx
   pop    %r14
   retq   

Note the two recursive callq fib instructions.

Incidentally, the fib as implemented by GCC and LLVM isn't quite the same program as Guile's version. If the result gets too big, GCC and LLVM will overflow, whereas in Guile we overflow into a bignum. Also in C, it's possible to "smash the stack" if you recurse too much; compilers and run-times attempt to mitigate this danger but it's not completely gone. In Guile you can recurse however much you want. Finally in Guile you can interrupt the process if you like; the compiled code is instrumented with safe-points that can be used to run profiling hooks, debugging, and so on. Needless to say, this is not part of C's mission.

Some of these additional features can be implemented with no significant performance cost (e.g., via guard pages). But it's fair to expect that they have some amount of overhead. More on that later.

The other compilers are OCaml's ocamlopt, coming in with a very respectable result; Go, also doing well; and V8 WebAssembly via Node. As you know, you can compile C to WebAssembly, and then V8 will compile that to machine code. In practice it's just as static as any other compiler, but the generated assembly is a bit more involved:

fib_tramp:
    jmp    fib

fib:
    push   %rbp
    mov    %rsp,%rbp
    pushq  $0xa
    push   %rsi
    sub    $0x10,%rsp
    mov    %rsi,%rbx
    mov    0x2f(%rbx),%rdx
    mov    %rax,-0x18(%rbp)
    cmp    %rsp,(%rdx)
    jae    stack_check
post_stack_check:
    cmp    $0x2,%eax
    jl     return_n
    lea    -0x2(%rax),%edx
    mov    %rbx,%rsi
    mov    %rax,%r10
    mov    %rdx,%rax
    mov    %r10,%rdx
    callq  fib_tramp
    mov    -0x18(%rbp),%rbx
    sub    $0x1,%ebx
    mov    %rax,-0x20(%rbp)
    mov    -0x10(%rbp),%rsi
    mov    %rax,%r10
    mov    %rbx,%rax
    mov    %r10,%rbx
    callq  fib_tramp
return:
    mov    -0x20(%rbp),%rbx
    add    %ebx,%eax
    mov    %rbp,%rsp
    pop    %rbp
    retq   
return_n:
    jmp    return
stack_check:
    callq  WasmStackGuard
    mov    -0x10(%rbp),%rbx
    mov    -0x18(%rbp),%rax
    jmp    post_stack_check

Apparently fib compiles to a function of two arguments, the first passed in rsi, and the second in rax. (V8 uses a custom calling convention for its compiled WebAssembly.) The first synthesized argument is a handle onto run-time data structures for the current thread or isolate, and in the function prelude there's a check to see that the function has enough stack. V8 uses these stack checks also to handle interrupts, for when a web page is stuck in JavaScript.

Otherwise, it's a more or less normal function, with a bit more register/stack traffic than would be strictly needed, but pretty good.

do optimizations matter?

You've heard of Moore's Law -- though it doesn't apply any more, it roughly translated into hardware doubling in speed every 18 months. (Yes, I know it wasn't precisely that.) There is a corresponding rule of thumb for compiler land, Proebsting's Law: compiler optimizations make software twice as fast every 18 years. Zow!

The previous results with GCC and LLVM were with optimizations enabled (-O3). One way to measure Proebsting's Law would be to compare the results with -O0. Obviously in this case the program is small and we aren't expecting much work out of the optimizer, but it's interesting to see anyway:


Answer: optimizations don't matter much for this benchmark. This investigation does give a good baseline for compilers from high-level languages, like Guile: in the absence of clever trickery like the recursive inlining thing GCC does and in the absence of industrial-strength instruction selection, what's a good baseline target for a compiler? Here we see for this benchmark that it's somewhere between 420 and 620 milliseconds or so. Go gets there, and OCaml does even better.

how is time being spent, anyway?

Might we expect V8/WebAssembly to get there soon enough, or is the stack check that costly? How much time does one stack check take anyway? For that we'd have to determine the number of recursive calls for a given invocation.

Friends, it's not entirely clear to me why this is, but I instrumented a copy of fib, and I found that the number of calls in fib(n) was a more or less constant factor of the result of calling fib. That ratio converges to twice the golden ratio, which means that since fib(n+1) ~= φ * fib(n), then the number of calls in fib(n) is approximately 2 * fib(n+1). I scratched my head for a bit as to why this is and I gave up; the Lord works in mysterious ways.
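If you want to reproduce the measurement, instrumentation along these lines suffices (a sketch; not the exact instrumented copy used for the article):

(define calls 0)

(define (fib n)
  (set! calls (+ calls 1))              ; count every entry to fib
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

(define (call-ratio n)
  ;; (calls made by (fib n)) / (fib n); tends to 2φ ≈ 3.236
  (set! calls 0)
  (let ((result (fib n)))
    (exact->inexact (/ calls result))))

For example, (call-ratio 20) gives 3.236..., already close to the limit.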

Anyway for fib(40), that means that there are around 3.31e8 calls, absent GCC shenanigans. So that would indicate that each call for clang takes around 1.27 ns, which at turbo-boost speeds on this machine is 4.44 cycles. At maximum throughput (4 IPC), that would indicate 17.8 instructions per call, and indeed on the n > 2 path I count 17 instructions.

For WebAssembly I calculate 2.25 nanoseconds per call, or 7.9 cycles, or 31.5 (fused) instructions at max IPC. And indeed counting the extra jumps in the trampoline, I get 33 instructions on the recursive path. I count 4 instructions for the stack check itself, one to save the current isolate, and two to shuffle the current isolate into place for the recursive calls. But, compared to clang, V8 puts 6 words on the stack per call, as opposed to only 4 for LLVM. I think with better interprocedural register allocation for the isolate (i.e.: reserve a register for it), V8 could get a nice boost for call-heavy workloads.

where are we? dynamic edition

Guile doesn't aim to replace C; it's different. It has garbage collection, an integrated debugger, and a compiler that's available at run-time; it is dynamically typed. It's perhaps more fair to compare to languages that share some of these characteristics, so I ran these tests on versions of recursive fib written in a number of languages. Note that all of the numbers in this post include start-up time.


Here, the ocamlc line is the same program as before, but compiled with OCaml's bytecode compiler instead of the native compiler. It's a bit of an odd thing to include, but it performs so well that I just had to.

I think the real takeaway here is that Chez Scheme has fantastic performance. I have not been able to see the disassembly -- does it do the trick like GCC does? -- but the numbers are great, and I can see why Racket decided to rebase its implementation on top of it.

Interestingly, as far as I understand, Chez implements stack checks in the straightforward way (an inline test-and-branch), not with a guard page, and instead of using the stack check as a generic ability to interrupt a computation in a timely manner as V8 does, Chez emits a separate interrupt check. I would like to be able to see Chez's disassembly but haven't gotten around to figuring out how yet.

Since I originally published this article, I added a LuaJIT entry as well. As you can see, LuaJIT performs as well as Chez in this benchmark.

Haskell's call performance is surprisingly bad here, beaten even by OCaml's bytecode compiler; is this the cost of laziness, or just a lacuna of the implementation? I do not know. I do have this mental image that Haskell is a good compiler, but apparently if that's the standard, so is Guile :)

Finally, in this comparison section, I was not surprised by cpython's relatively poor performance; we know cpython is not fast. I think though that it just goes to show how little these microbenchmarks are worth when it comes to user experience; like many of you I use plenty of Python programs in my daily work and don't find them slow at all. Think of micro-benchmarks like x-ray diffraction; they can reveal the hidden substructure of DNA but they say nothing at all about the organism.

where to now?

Perhaps you noted that in the last graph, the Guile and Chez lines were labelled "(lexical)". That's because instead of running this program:

(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

They were running this, instead:

(define (fib n)
  (define (fib* n)
    (if (< n 2)
        n
        (+ (fib* (- n 1)) (fib* (- n 2)))))
  (fib* n))

The thing is, historically, Scheme programs have treated top-level definitions as being mutable. This is because you don't know the extent of the top-level scope -- there could always be someone else who comes and adds a new definition of fib, effectively mutating the existing definition in place.

This practice has its uses. It's useful to be able to go in to a long-running system and change a definition to fix a bug or add a feature. It's also a useful way of developing programs, to incrementally build the program bit by bit.


But, I would say that as someone who has written and maintained a lot of Scheme code, it's not a normal occurrence to mutate a top-level binding on purpose, and it has a significant performance impact. If the compiler knows the target of a call, that unlocks a number of important optimizations: type check elision on the callee, more optimal closure representation, smaller stack frames, possible contification (turning calls into jumps), argument and return value count elision, representation specialization, and so on.

This overhead is especially egregious for calls inside modules. Scheme-the-language only gained modules relatively recently -- relative to the history of scheme -- and one of the aspects of modules is precisely to allow reasoning about top-level module-level bindings. This is why running Chez Scheme with the --program option is generally faster than --script (which I used for all of these tests): it opts in to the "newer" specification of what a top-level binding is.

In Guile we would probably like to move towards a more static way of treating top-level bindings, at least those within a single compilation unit. But we haven't done so yet. It's probably the most important single optimization we can make over the near term, though.

As an aside, it seems that LuaJIT also shows a similar performance differential for local function fib(n) versus just plain function fib(n).

It's true though that even absent lexical optimizations, top-level calls can be made more efficient in Guile. I am not sure if we can reach Chez with the current setup of having a template JIT, because we need two return addresses: one virtual (for bytecode) and one "native" (for JIT code). Register allocation is also something to improve but it turns out to not be so important for fib, as there are few live values and they need to spill for the recursive call. But, we can avoid some of the indirection on the call, probably using an inline cache associated with the callee; Chez has had this optimization since 1984!

what guile learned from fib

This exercise has been useful to speed up Guile's procedure calls, as you can see for the difference between the latest Guile 2.9.2 release and what hasn't been released yet (2.9.3).

To decide what improvements to make, I extracted the assembly that Guile generated for fib to a standalone file, and tweaked it in a number of ways to determine what the potential impact of different scenarios was. Some of the detritus from this investigation is here.

There were three big performance improvements. One was to avoid eagerly initializing the slots in a function's stack frame; this took a surprising amount of run-time. Fortunately the rest of the toolchain like the local variable inspector was already ready for this change.

Another thing that became clear from this investigation was that our stack frames were too large; there was too much memory traffic. I was able to improve this in the lexical-call by adding an optimization to elide useless closure bindings. Usually in Guile when you call a procedure, you pass the callee as the 0th parameter, then the arguments. This is so the procedure has access to its closure. For some "well-known" procedures -- procedures whose callers can be enumerated -- we optimize to pass a specialized representation of the closure instead ("closure optimization"). But for well-known procedures with no free variables, there's no closure, so we were just passing a throwaway value (#f). An unhappy combination of Guile's current calling convention being stack-based and a strange outcome from the slot allocator meant that frames were a couple words too big. Changing to allow a custom calling convention in this case sped up fib considerably.

Finally, and also significantly, Guile's JIT code generation used to manually handle calls and returns via manual stack management and indirect jumps, instead of using the platform calling convention and the C stack. This is to allow unlimited stack growth. However, it turns out that the indirect jumps at return sites were stalling the pipeline. Instead we switched to use call/return but keep our manual stack management; this allows the CPU to use its return address stack to predict return targets, speeding up code.

et voilà

Well, long article! Thanks for reading. There's more to do but I need to hit the publish button and pop this off my stack. Until next time, happy hacking!

by Andy Wingo at June 26, 2019 10:34 AM

June 25, 2019

Programming Praxis

Dividing Without Divide

Today’s task comes from a student programmer:

Given a numerator and divisor of unsigned integers, compute the quotient and remainder. You cannot use divide, cannot use mod, and you want to optimize for speed.

After a while, the student programmer looked up the official solution:

def divide_problem(num, div):
    quotient = 1
    while num - (quotient * div) >= div:
        print(num - (quotient * div), "loop")
        quotient += 1
    remainder = num - (quotient * div)
    return quotient, remainder

Your task is to comment on the official solution and write a better one. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
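Two quick observations and a sketch of my own (not the suggested solution). The official solution loops quotient times, recomputing num − quotient·div on each pass, so it is O(num/div); worse, when num < div it returns quotient 1 and a negative remainder (try num = 3, div = 5). A faster approach is binary long division: shift the divisor up past the numerator, then subtract shifted copies while accumulating quotient bits. Here is a sketch in Guile, using ash for shifts and assuming div > 0:

(define (divide num div)
  ;; Binary long division: O(log (num/div)) iterations.  Grow
  ;; d = div * 2^k until it exceeds num, then walk back down,
  ;; subtracting each shifted copy that fits and setting its bit.
  (let grow ((d div) (bit 1))
    (if (<= d num)
        (grow (ash d 1) (ash bit 1))
        (let shrink ((d (ash d -1)) (bit (ash bit -1)) (n num) (q 0))
          (cond ((zero? bit) (values q n))   ; quotient and remainder
                ((>= n d)
                 (shrink (ash d -1) (ash bit -1) (- n d) (+ q bit)))
                (else
                 (shrink (ash d -1) (ash bit -1) n q)))))))

For example, (divide 13 5) returns 2 and 3 as multiple values, and (divide 3 5) correctly returns 0 and 3.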

by programmingpraxis at June 25, 2019 09:00 AM

June 21, 2019

Programming Praxis

String Comparison

A string is a sequence of characters, perhaps including a backspace character that renders both itself and the previous character as if they were not part of the string. For instance, if we make the backspace character visible using the # symbol, the two strings abcx#de and abcde are identical. Multiple successive backspaces remove multiple successive characters.

Your task is to write a program that compares two strings that may contain backspaces and reports if they are equal. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
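One possible approach (a sketch, not the suggested solution, using # as the visible backspace as in the statement): normalize each string by applying the backspaces, then compare the results.

(define (apply-backspaces s)
  ;; Build the string a character at a time; #\# (the visible
  ;; backspace) drops the previous surviving character, if any.
  (let loop ((cs (string->list s)) (acc '()))
    (cond ((null? cs) (list->string (reverse acc)))
          ((char=? (car cs) #\#)
           (loop (cdr cs) (if (null? acc) acc (cdr acc))))
          (else (loop (cdr cs) (cons (car cs) acc))))))

(define (backspace-equal? a b)
  (string=? (apply-backspaces a) (apply-backspaces b)))

For example, (backspace-equal? "abcx#de" "abcde") returns #t.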

by programmingpraxis at June 21, 2019 09:00 AM

June 18, 2019

Programming Praxis

Latin Squares

A latin square of order n is a square matrix with n rows and n columns, with each entry in the matrix containing an integer from 0 to n − 1, arranged so that no row or column contains duplicate integers. Here is a sample latin square of order 10:

8 3 7 1 5 6 4 2 0 9
4 5 6 2 0 9 3 7 8 1
9 2 3 8 7 5 1 4 6 0
2 6 0 3 9 8 7 5 1 4
0 4 2 9 3 7 8 1 5 6
6 1 4 0 2 3 9 8 7 5
1 7 5 4 6 0 2 3 9 8
3 0 9 7 8 1 5 6 4 2
5 8 1 6 4 2 0 9 3 7
7 9 8 5 1 4 6 0 2 3

Your task is to write a program that generates latin squares of order n. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
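One easy construction to start from: the addition table of the integers mod n is always a latin square, since every row and column is a distinct rotation of 0…n−1. A sketch (mine, not the suggested solution; it produces the cyclic square, not a random-looking one like the sample above):

(define (cyclic-latin-square n)
  ;; Row i is i, i+1, ..., i+n-1 reduced mod n; each row and column
  ;; is a rotation of 0..n-1, so neither contains duplicates.
  (map (lambda (i)
         (map (lambda (j) (modulo (+ i j) n))
              (iota n)))
       (iota n)))

Permuting the rows, columns, or symbols of this square yields further latin squares, which is one route to randomized generation.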

by programmingpraxis at June 18, 2019 09:00 AM

June 17, 2019

GNU Guix

Substitutes are now available as lzip

For a long time, our build farm at ci.guix.gnu.org has been delivering substitutes (pre-built binaries) compressed with gzip. Gzip was never the best choice in terms of compression ratio, but it was a reasonable and convenient choice: it’s rock-solid, and zlib made it easy for us to have Guile bindings to perform in-process compression in our multi-threaded guix publish server.

With the exception of building software from source, downloads take the most time of Guix package upgrades. If users can download less, upgrades become faster, and happiness ensues. Time has come to improve on this, and starting from early June, Guix can publish and fetch lzip-compressed substitutes, in addition to gzip.

Lzip

Lzip is a relatively little-known compression format, initially developed by Antonio Diaz Diaz ca. 2008. It has several C and C++ implementations with surprisingly few lines of code, which is always reassuring. One of its distinguishing features is a very good compression ratio with reasonable CPU and memory requirements, according to benchmarks published by the authors.

Lzlib provides a well-documented C interface and Pierre Neidhardt set out to write bindings for that library, which eventually landed as the (guix lzlib) module.

With this in place we were ready to start migrating our tools, and then our build farm, to lzip compression, so we can all enjoy smaller downloads. Well, easier said than done!

Migrating

Since Guix is conceptually a “source-based” distro, the compression format used for substitutes is not a core component the way it can be in “traditional” binary package formats such as .deb. However, deployed Guix installations did not support lzip, so we couldn’t just switch our build farm to lzip overnight; we needed to devise a transition strategy.

Guix asks for the availability of substitutes over HTTP. For example, a question such as:

“Dear server, do you happen to have a binary of /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2 that I could download?”

translates into an HTTP GET of https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo, which returns something like:

StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
Compression: gzip
NarHash: sha256:0h2ibqpqyi3z0h16pf7ii6l4v7i2wmvbrxj4ilig0v9m469f6pm9
NarSize: 134407424
References: 2dk55i5wdhcbh2z8hhn3r55x4873iyp1-libxext-1.3.3 …
FileSize: 48501141
System: x86_64-linux
Deriver: 6xqibvc4v8cfppa28pgxh0acw9j8xzhz-emacs-26.2.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…

(This narinfo format is inherited from Nix and implemented here and here.) This tells us we can download the actual binary from /nar/gzip/…-emacs-26.2, and that it will be about 46 MiB (the FileSize field). This is what guix publish serves.

The trick we came up with was to allow guix publish to advertise several URLs, one per compression format. Thus, for recently-built substitutes, we get something like this:

StorePath: /gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
URL: nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: gzip
FileSize: 30872887
URL: nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: lzip
FileSize: 18829088
NarHash: sha256:10n3nv3clxr00c9cnpv6x7y2c66034y45c788syjl8m6ga0hbkwy
NarSize: 94372664
References: 05zlxc7ckwflz56i6hmlngr86pmccam2-pcre-8.42 …
System: x86_64-linux
Deriver: vi2jkpm9fd043hm0839ibbb42qrv5xyr-gimp-2.10.12.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…

Notice that there are two occurrences of the URL, Compression, and FileSize fields: one for gzip, and one for lzip. Old Guix instances will just pick the first one, gzip; newer Guix will pick whichever supported method provides the smallest FileSize, usually lzip. This will make migration trivial in the future, should we add support for other compression methods.

Users need to upgrade their Guix daemon to benefit from lzip. On a “foreign distro”, simply run guix pull as root. On standalone Guix systems, run guix pull && sudo guix system reconfigure /etc/config.scm. In both cases, the daemon has to be restarted, be it with systemctl restart guix-daemon.service or with herd restart guix-daemon.

First impressions

This new gzip+lzip scheme has been deployed on ci.guix.gnu.org for a week. Specifically, we run guix publish -C gzip:9 -C lzip:9, meaning that we use the highest compression ratio for both compression methods.

Currently, only a small subset of the package substitutes are available as both lzip and gzip; those that were already available as gzip have not been recompressed. The following Guile program that taps into the API of guix weather allows us to get some insight:

(use-modules (gnu) (guix)
             (guix monads)
             (guix scripts substitute)
             (srfi srfi-1)
             (ice-9 match))

(define all-packages
  (@@ (guix scripts weather) all-packages))

(define package-outputs
  (@@ (guix scripts weather) package-outputs))

(define (fetch-lzip-narinfos)
  (mlet %store-monad ((items (package-outputs (all-packages))))
    (return
     (filter (lambda (narinfo)
               (member "lzip" (narinfo-compressions narinfo)))
             (lookup-narinfos "https://ci.guix.gnu.org" items)))))

(define (lzip/gzip-ratio narinfo)
  (match (narinfo-file-sizes narinfo)
    ((gzip lzip)
     (/ lzip gzip))))

(define (average lst)
  (/ (reduce + 0 lst)
     (length lst) 1.))

Let’s explore this at the REPL:

scheme@(guile-user)> (define lst
                       (with-store s
                         (run-with-store s (fetch-lzip-narinfos))))
computing 9,897 package derivations for x86_64-linux...
updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
scheme@(guile-user)> (length lst)
$4 = 2275
scheme@(guile-user)> (average (map lzip/gzip-ratio lst))
$5 = 0.7398994395478715

As of this writing, around 20% of the package substitutes are available as lzip, so take the following stats with a grain of salt. Among those, the lzip-compressed substitute is on average 26% smaller than the gzip-compressed one. What if we consider only packages bigger than 5 MiB uncompressed?

scheme@(guile-user)> (define biggest
                       (filter (lambda (narinfo)
                                 (> (narinfo-size narinfo)
                                    (* 5 (expt 2 20))))
                               lst))
scheme@(guile-user)> (average (map lzip/gzip-ratio biggest))
$6 = 0.5974238562384483
scheme@(guile-user)> (length biggest)
$7 = 440

For those packages, lzip yields substitutes that are 40% smaller on average. Pretty nice! Lzip decompression is slightly more CPU-intensive than gzip decompression, but downloads are bandwidth-bound, so the benefits clearly outweigh the costs.

Going forward

The switch from gzip to lzip has the potential to make upgrades “feel” faster, and that is great in itself.

Fundamentally though, in this project we’ve always looked at peer-to-peer solutions with envy. Of course, the main motivation is to have a community-supported and resilient infrastructure, rather than a centralized one, and that vision goes hand-in-hand with reproducible builds.

We started working on an extension to publish and fetch substitutes over IPFS. Thanks to its content-addressed nature, IPFS has the potential to further reduce the amount of data that needs to be downloaded on an upgrade.

The good news is that IPFS developers are also interested in working with package manager developers, and I bet there’ll be interesting discussions at IPFS Camp in just a few days. We’re eager to pursue our IPFS integration work, and if you’d like to join us and hack the good hack, let’s get in touch!

by Ludovic Courtès at June 17, 2019 02:30 PM

June 14, 2019

Programming Praxis

Van Eck Sequence

Neil Sloane is on Numberphile again, discussing the Van Eck sequence (A181391):

The first item in the sequence is 0. Compute the next item as follows: If the previous item has not previously appeared in the sequence, add 0 to the sequence, otherwise add to the sequence the number of steps back in the sequence the previous item previously appeared. For instance, the first item is 0. Since 0 has not previously appeared in the sequence, the next item is 0. Now 0 has previously appeared, and the previous 0 was one back in the sequence, so add 1 to the sequence. Since 1 has not previously appeared, add 0. But 0 appeared previously, two back in the sequence, so add 2. Since 2 has not previously appeared, add 0. But 0 appeared previously, two items back, so add 2 to the sequence. Since 2 previously appeared in the sequence, two terms back, add another 2 to the sequence. The next item in the sequence is 1, because 2 also appeared as the previous number. Since 1 appeared in the sequence, count back to the previous 1, and add 6 to the sequence. And so on. The sequence begins 0, 0, 1, 0, 2, 0, 2, 2, 1, 6, 0, 5, 0, 2, 6, 5, 4, 0, ….

Your task is to write a program that generates the Van Eck sequence and investigate the properties of the sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
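The rule translates almost directly into code. Here is one possible sketch in Guile, keeping a hash table from each value to the index where it last appeared (mine, not the suggested solution):

(define (van-eck n)
  ;; Return the first n terms of the Van Eck sequence as a list.
  ;; seen maps a value to the index at which it last appeared.
  (let ((seen (make-hash-table)))
    (let loop ((i 0) (current 0) (acc '()))
      (if (= i n)
          (reverse acc)
          (let* ((last (hash-ref seen current #f))
                 (next (if last (- i last) 0)))
            (hash-set! seen current i)
            (loop (+ i 1) next (cons current acc)))))))

(van-eck 18) reproduces the prefix listed above.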

by programmingpraxis at June 14, 2019 09:00 AM

June 11, 2019

Programming Praxis

Maximum Product

I think today’s exercise comes from one of those hacker testing sites, but I’m not sure:

Given three arrays of integers, both positive and negative, find the maximum product that can be formed by taking one element from each array. For instance, if A = [10,-10,15,-2], B = [10,-12,13,-2], and C = [-11,-10,9,-12], the maximum product is 2160 using 15, -12 and -12.

Your task is to write a program that finds the maximum product of three integers, taking one each from three arrays. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
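Since x·y·z is linear in each variable with the other two held fixed, the maximum over independent choices is attained at the minimum or maximum of each array, so only eight candidate products need checking. A sketch (mine, not the suggested solution):

(use-modules (srfi srfi-1))   ; append-map

(define (max-product a b c)
  ;; Only the 2x2x2 combinations of each array's min and max matter.
  (let ((xs (list (apply min a) (apply max a)))
        (ys (list (apply min b) (apply max b)))
        (zs (list (apply min c) (apply max c))))
    (apply max
           (append-map
            (lambda (x)
              (append-map
               (lambda (y) (map (lambda (z) (* x y z)) zs))
               ys))
            xs))))

On the arrays above it returns 2160, from 15 × −12 × −12.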

by programmingpraxis at June 11, 2019 09:00 AM

June 07, 2019

Programming Praxis

Primitive Roots

In modular arithmetic, a number g is a primitive root of n if, for every a coprime to n, there exists some k such that g^k ≡ a (mod n). The concept of a primitive root was developed by Gauss and pops up from time to time in work on cryptography and number theory.

There is no formula for computing a primitive root, but they are common enough that it is easy to just randomly search for them; more commonly, people just start at 2 and work their way up. Once a primitive root g has been found, the remaining primitive roots of n can be found by exponentiating mod n: they are the powers g^k where k is coprime to φ(n).

Your task is to write a program that computes the primitive roots of prime n. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
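A standard test makes the search cheap: for prime p, g is a primitive root iff g^((p−1)/q) ≢ 1 (mod p) for every prime factor q of p − 1. A sketch (mine, not the suggested solution):

(use-modules (srfi srfi-1))

(define (expt-mod b e m)
  ;; modular exponentiation by repeated squaring
  (let loop ((b (modulo b m)) (e e) (r 1))
    (cond ((zero? e) r)
          ((odd? e) (loop (modulo (* b b) m) (quotient e 2)
                          (modulo (* r b) m)))
          (else (loop (modulo (* b b) m) (quotient e 2) r)))))

(define (prime-factors n)
  ;; distinct prime factors by trial division; fine for modest n
  (let loop ((n n) (d 2) (fs '()))
    (cond ((= n 1) (delete-duplicates (reverse fs)))
          ((zero? (modulo n d)) (loop (quotient n d) d (cons d fs)))
          (else (loop n (+ d 1) fs)))))

(define (primitive-root? g p)
  ;; g generates the multiplicative group mod prime p iff no power
  ;; g^((p-1)/q), for q a prime factor of p-1, collapses to 1.
  (every (lambda (q)
           (not (= 1 (expt-mod g (quotient (- p 1) q) p))))
         (prime-factors (- p 1))))

(define (primitive-roots p)
  (filter (lambda (g) (primitive-root? g p))
          (iota (- p 2) 2)))   ; candidates 2 .. p-1

For example, (primitive-roots 7) returns (3 5).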

by programmingpraxis at June 07, 2019 09:00 AM

June 04, 2019

Programming Praxis

Reverse Vowels

Now that school is finished for the year, I’m catching up on the homework questions that people have sent me over the last several months. This one is fun:

Given a string, write it with the vowels reversed, maintaining the original capitalization of the vowels. For instance, “HELLO world” becomes “HOLLO werld” and “Programming PRAXIS” becomes “Prigramming PRAXOS” (I kinda like that one).

Your task is to write a program that returns an input string with vowels in reverse order, respecting the original capitalization. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
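One possible approach (a sketch, not the suggested solution): collect the vowels, reverse them, then walk the string again, substituting vowels in order while copying each position's original case.

(define (vowel? c)
  (memv (char-downcase c) '(#\a #\e #\i #\o #\u)))

(define (reverse-vowels s)
  (let* ((chars (string->list s))
         (vowels (reverse (filter vowel? chars))))
    (let loop ((cs chars) (vs vowels) (acc '()))
      (cond ((null? cs) (list->string (reverse acc)))
            ((vowel? (car cs))
             (loop (cdr cs) (cdr vs)
                   ;; take the next reversed vowel, in this slot's case
                   (cons (if (char-upper-case? (car cs))
                             (char-upcase (car vs))
                             (char-downcase (car vs)))
                         acc)))
            (else (loop (cdr cs) vs (cons (car cs) acc)))))))

For example, (reverse-vowels "HELLO world") returns "HOLLO werld".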

by programmingpraxis at June 04, 2019 09:00 AM