Planet Scheme

February 14, 2020

Programming Praxis

Square Triple

We have a simple homework problem today:

Given a list of distinct integers, find all triples (x y z) where x, y and z are in the list and x * y = z².

Your task is to find the list of triples. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
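A brute-force sketch, assuming a Common Lisp rendering (the name square-triples and the O(n³) scan are illustrative assumptions, not the suggested solution):

(defun square-triples (xs)
  ;; Collect every ordered triple (x y z) drawn from XS
  ;; such that x * y equals z squared.
  (loop for x in xs
        append (loop for y in xs
                     append (loop for z in xs
                                  when (= (* x y) (* z z))
                                    collect (list x y z)))))

For example, (square-triples '(2 4 8)) returns ((2 2 2) (2 8 4) (4 4 4) (8 2 4) (8 8 8)).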

by programmingpraxis at February 14, 2020 10:00 AM

Greg Hendershott

Using check-syntax in Racket Mode

:: Racket, Emacs

During most of January and into February, I’ve been working full-time to have Racket Mode make better use of drracket/check-syntax analysis of fully-expanded code. I’m pretty excited by it.

Consider this little Racket program:

#lang racket/base

(push-the-red-button!)

(define (f x)
  (+ x 1))

(module m typed/racket/base
  (define (g x)
    (+ x 1)))

A few things to note:

  • The file module has a side-effecting expression (push-the-red-button!). Every time the file is evaluated a.k.a. “run”, this expression is evaluated.

  • The file module language is racket/base.

  • The submodule m language is typed/racket/base.

Status quo

Traditionally Racket Mode has required you to racket-run a file in order for some handy features to be available, because it can use the namespace resulting from module->namespace to support those features. This reflects its origins as a wrapper around xrepl. It retained that basic approach even as it grew beyond xrepl.

For instance it uses namespace-mapped-symbols to obtain a list of completion candidates. If it needs to get you documentation, it can synthesize a suitable identifier from a string using namespace-symbol->identifier. It can give such an identifier to xref-binding->definition-tag, and in turn give that result to xref-tag->path+anchor, and get the appropriate help topic.

Likewise the visit-definition feature can use the namespace to make the correct identifier to give to identifier-binding, which tells you where the binding was defined.

Racket Mode lets you run not just the outermost file module, but specific submodules — the innermost module around point. So when you run the m module, the namespace is inside that submodule. Therefore features work as you’d expect. For example help will be for Typed Racket’s define, and visit-definition will go to the source for that define.

This all works fine, with one drawback: You must racket-run the edit buffer to get results. (And if you make changes, you must run it again to get updated results.) This is a speed bump, especially when exploring many Racket files. It would be nice to open the file, and just visit definitions or get help — without needing to run each file. That can be slow. Worse, what about side-effects like push-the-red-button!?

Also, I knew at least a few people who use Racket Mode pretty much solely as an editing mode. In other words, they use racket-mode to edit .rkt files, but they rarely or never use racket-run and racket-repl-mode. Maybe racket-mode should be simpler and more lightweight — maybe some new minor mode should augment it with the “extras” for those who want them. This is a typical Emacs approach: A major mode handles a basic set of functionality, and one or more minor modes optionally enhance it.

Check Syntax

The Dr Racket IDE has a “Check Syntax” feature, which works solely by expanding a module — but not evaluating a.k.a. “running” it — and analyzing that fully-expanded syntax.

The acronym REPL — read, eval, print, loop? The “E” actually covers a few steps: expand, compile, evaluate. More like “RECEPL”.

A program in fully-expanded form looks like Racket “assembly language”. It is tedious for a human to read. But a few key forms like define-values, define-syntaxes, and #%require provide a lot of very useful information about bindings — things defined in that module or imported from another.1

The great news is that the drracket/check-syntax library exposes the analysis Dr Racket does to create annotations for the source buffer. I was able to leverage this to get most of what I wanted to do.

In a few cases, I need to extend this. For example, I wanted all local, module, and imported definitions to be available as completion candidates. The library produces annotations for the arrows Dr Racket draws between definitions and uses. If something is defined but not used? You can’t draw an arrow from something to nothing. Therefore it doesn’t produce an annotation. So the set of definitions in arrow annotations isn’t sufficient. Fortunately there is also a “mouseover text” annotation such as “N bound occurrence(s)”.2 So, a way to get all module and local definitions is to look for all such annotations — those are your definitions. As for imported definitions, I needed to walk the fully-expanded syntax myself, looking for #%require and its subforms that filter and rename what is imported, and do such filters/renames on module->exports lists. Although that didn’t take five minutes to figure out and debug, it wasn’t too bad.

Some of the work involved other issues. By far the slowest part isn’t the analysis of expanded code, it is getting that expanded code: expand can be non-trivial for languages like Typed Racket. For other reasons Racket Mode already maintained a cache of file -> expanded syntax. I needed that to also handle (path-string code-string) -> expanded syntax, for the case of unsaved buffer text. And as a result, I needed to make my own, cache-aware wrapper like show-content around make-traversal.

I also enhanced the back end command server to support the idea of cancelling a command already in progress. That way when we get a new request to analyze updated source, we can cancel any analysis possibly still running for the old, “stale” text.

Front end

How about the “front end”, implemented in Emacs Lisp?

Racket Mode already had a racket-check-syntax-mode minor mode. However it made the buffer read-only. You could navigate, and rename bindings, but you had to quit the mode to resume normal free-form editing. Also it didn’t attempt to unify the UX with normal commands to visit definitions, see help, and so on.

Instead, I wanted this to be something that — as in Dr Racket — runs automatically in the background on a read/write buffer. After you make some changes, and after some short idle delay, a new analysis kicks off. When its results are ready, the buffer annotations and completion candidates are refreshed. Annotations are done using Emacs text properties, and in the case of defs and uses which refer to each other, use Emacs markers so they remain valid after most edits. When there are syntax errors, those are annotated, but previous completion candidates are retained.

Although we can’t draw graphical arrows, we can highlight definitions and uses when point moves over them, using cursor-sensor-mode (which is a minor mode a user can disable if they find this slow or distracting).

Initially I still called this racket-check-syntax-mode. But (a) that’s pretty verbose, and (b) checking your syntax for mistakes is not most of the benefit. Really, it is a static analysis of expanded code that helps you explore and explain. With all those “exp” prefixes, plus an inclination for an even shorter name, I settled on racket-xp-mode.

At this point I’ll simply quote the documentation string for racket-xp-mode and show some screen shots:

racket-xp-mode is an interactive function defined in racket-xp.el.

Documentation

A minor mode that analyzes expanded code to explain and explore.

This minor mode is an optional enhancement to racket-mode edit buffers. Like any minor mode, you can turn it on or off for a specific buffer. If you always want to use it, put the following code in your Emacs init file:

(require 'racket-xp)
(add-hook 'racket-mode-hook #'racket-xp-mode)

Note: This mode won’t do anything unless/until the Racket Mode back end is running. It will try to start the back end automatically. You do not need to racket-run the buffer you are editing.

This mode uses the drracket/check-syntax package to analyze fully-expanded programs, without needing to evaluate a.k.a. “run” them. The resulting analysis provides information for:

  • Visually annotating bindings — local or imported definitions and references to them.

  • Completion candidates.

  • Definitions’ source and documentation.

When point is on a definition or use, related items are highlighted using racket-xp-def-face and racket-xp-use-face — instead of drawing arrows as in Dr Racket — and “mouse over” information is displayed using the function(s) in the hook variable racket-show-functions; it is also available when hovering the mouse cursor. Note: If you find these features too distracting and/or slow, you may disable cursor-sensor-mode. The remaining features discussed below will still work.

You may also use commands to navigate among a definition and its uses, or to rename a local definition and all its uses.

In the following little example, not only does drracket/check-syntax distinguish the various “x” bindings, it understands the two different imports of “define”:

#lang racket/base
(define x 1)
x
(let ([x x])
  (+ x 1))
(module m typed/racket/base
  (define x 2)
  x)

The function racket-xp-complete-at-point is added to the variable completion-at-point-functions. Note that in this case, it is not smart about submodules; identifiers are assumed to be definitions from the file’s module or its imports. In addition to supplying completion candidates, it supports the ":company-location" property to inspect the definition of a candidate and the ":company-doc-buffer" property to view its documentation.

When you edit the buffer, existing annotations are retained; their positions are updated to reflect the edit. Annotations for new or deleted text are not requested until after racket-xp-after-change-refresh-delay seconds. The request is made asynchronously so that Emacs will not block — for moderately complex source files, it can take some seconds simply to fully expand them, as well as a little more time for the drracket/check-syntax analysis. When the results are ready, all annotations for the buffer are completely refreshed.

You may also set racket-xp-after-change-refresh-delay to nil and use the racket-xp-annotate command manually.

The mode line changes to reflect the current status of annotations, and whether or not you had a syntax error.

If you have one or more syntax errors, use the standard next-error command and key bindings to navigate among them. Although most languages will stop after the first syntax error, some like Typed Racket will try to collect and report multiple errors.

Tip: This mode follows the convention that a minor mode may only use a prefix key consisting of “C-c” followed by a punctuation key. As a result, racket-xp-control-c-hash-keymap is bound to “C-c #” by default. Although you might find this awkward to type, remember that as an Emacs user, you are free to bind this map to a more convenient prefix, and/or bind any individual commands directly to whatever keys you prefer.

C-c # .  racket-xp-visit-definition
C-c # g  racket-xp-annotate
C-c # j  racket-xp-next-definition
C-c # k  racket-xp-previous-definition
C-c # n  racket-xp-next-use
C-c # p  racket-xp-previous-use
C-c # r  racket-xp-rename
C-c C-.  racket-xp-describe
C-c C-d  racket-xp-documentation
M-.      racket-xp-visit-definition
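To follow that tip, an init-file sketch (this assumes the conventional racket-xp-mode-map minor-mode keymap; "C-c r" is an arbitrary example from the C-c <letter> space reserved for users):

(with-eval-after-load 'racket-xp
  ;; Make the racket-xp commands available under "C-c r"
  ;; instead of the default "C-c #".
  (define-key racket-xp-mode-map (kbd "C-c r")
    racket-xp-control-c-hash-keymap))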
Animation of check-syntax screen shots

Keep in mind that almost everything can be customized using defined faces or variables. Information like “1 bound occurrence” or error messages can be configured to be shown using the echo area (as in these screen shots), a header line, a pos-tip tooltip, and/or whatever custom function you want to supply.3

It was particularly fun when things were working well enough that I could use racket-xp-mode to more quickly explore and better understand some of the drracket/check-syntax code — as well as my own.

Taking stock

What remained was to step back and look at the racket-mode major mode (in hindsight maybe better-named racket-edit-mode), now optionally enhanced by racket-xp-mode, as well as the racket-repl-mode major mode.

What should live where, what should change, and what should remain the same?

After all, the old way of using the namespace from a run program is actually still useful and quite appropriate for the REPL. You can run a program, then go into the REPL, and define and require all sorts of things. It still makes sense to have that live namespace answer questions like “What are your symbols for completion candidates?” or “Which define is this, exactly, and where is its help or definition source?”. It is nice that you no longer must run your program, but if you have run it, and you’re in the REPL buffer… when in Rome.

So I realized there were a few commands — visit-definition, describe, help, completions — where there should be two flavors: one provided by racket-xp-mode, and another provided by racket-repl-mode that still works the “old” way. For example the old racket-visit-definition is effectively just moved/renamed to racket-repl-visit-definition.

At a lower level, also, there were some functions where I wrapped both flavors in a single back end command with a “how” argument: Use the current-namespace from module->namespace, or, use the namespace and module lexical context for some given path? (The latter covers cases where there exists no annotation, or the user has typed some arbitrary text, and we want to synthesize an identifier.)

Automatically starting the back end

One other note. Racket Mode consists of an Emacs front end, obviously, and also a back end command server written in Racket. Although racket-xp-mode does not need to racket-run any .rkt buffer, it does need to talk to the back end server. Today that still means that a racket-repl-mode buffer is created. It just stays in the background, and it needn’t be running any particular file — but it must be open.4

When racket-xp-mode needs the back end, it will try to start it if not already running. This made me nervous, because auto-starting in the past led to some bugs, especially when something needed a command response to continue: Emacs could sometimes get stuck at a “starting…” or “waiting for…” message. You could C-g out of that, usually. But it sucked. I had a think and realized the key was to make everything “asynchronous” with explicit continuations. I’d already replaced many uses of racket--cmd/await with racket--cmd/async — which takes a completion callback a.k.a. continuation procedure. I replaced even more. And also, I realized that starting the command server should itself be asynchronous. And so any racket--cmd/async that can’t happen yet, is pushed onto a list of continuations to be run eventually by the “connected to command server” continuation when it is called. Meanwhile, Emacs remains unblocked. Although I am not going to jinx this by declaring “mission accomplished!”, and some bugs may lurk, I’ve tried to be diligent in thinking through race conditions. I feel like this approach at least simplifies what needs to be considered.

Availability

As I write this, I have 100+ commits still on a check-syntax branch — not yet merged to master or available from MELPA.

If you’re feeling brave or curious, feel free to try it out! I’ve been dogfooding it a fair amount. I’m still finding and fixing edge cases, but I think it’s stable enough to understand the basic experience.

  1. The outermost module form also gets a syntax-property with the lexical context inside the module, including imports. As a result, you can synthesize an identifier using that as the first argument to datum->syntax, and then do the same things with that identifier you could with one from namespace-symbol->identifier, in order to support features like visit-definition or help, where the user has supplied some arbitrary string. 

  2. And if you want to find unused imported and locally-defined definitions, they’ll have a “mouseover” annotation, “no bound occurrences”. 

  3. It’s also shown when you hover the mouse pointer. If you use a mouse with Emacs, I won’t judge you. *coughs*

  4. Perhaps in the future that will change — especially if Racket Mode ever supports multiple REPLs open at once. In that case, starting the back end will be an independent first step, things like racket-run will create one or more REPLs, and the I/O for the comint buffers will be a network stream instead of stdin/stdout. 

by Greg Hendershott at February 14, 2020 12:00 AM

February 13, 2020

Joe Marshall

A polyglot program puzzle

Can you come up with an expression that evaluates to the symbol 'scheme in a Scheme system, but evaluates to the symbol 'common-lisp in a Common Lisp system?
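One possible line of attack (a spoiler sketch, so look away if you want to solve it cold) leans on a real semantic difference between the two languages: in Common Lisp the empty list is false, while in Scheme every object other than #f is true.

(if (cdr '(x)) 'scheme 'common-lisp)

Here (cdr '(x)) yields the empty list, which is true in Scheme and NIL (false) in Common Lisp, so each system reports its own name.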

by Joe Marshall (noreply@blogger.com) at February 13, 2020 05:06 AM

February 12, 2020

Joe Marshall

Anaphoric if

An anaphoric if expression binds the identifier “it” to the value of the conditional in the scope of the consequent
(aif (find-broken-computer)
     (fix it))
I have two objections to anaphoric macros. The first is that the binding of “it” isn't obvious; the second is the inflexibility of the variable name. You are kind of stuck with “it”. What if you wanted to use “it” to mean something else? Maybe you have an IT department to fix your computers
(let ((it (find-department "IT")))
  (aif (find-broken-computer)
       (tell it (fix it))))   ; name conflict
or maybe you want to nest two anaphoric conditionals
(aif (find-broken-computer)
     (aif (find-broken-component it)
          (repair it)
          (replace it)))
In this case, I want to replace the computer if I cannot repair the broken component, but the inner binding of “it” shadows the outer binding and makes it inaccessible.

The solution is pretty obvious if you think about it (though sometimes it takes me a while to see the obvious). Just replace the conditional test form with a binding form
(aif (it (find-broken-computer))
     (fix it))
This makes it obvious we are binding the variable “it” to the value returned by (find-broken-computer), and it lets us choose the name we bind. If we want to nest these, it would look like this
(aif (computer (find-broken-computer))
     (aif (component (find-broken-component computer))
          (repair component)
          (replace computer)))
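Defining that binding-form variant is a two-liner in Common Lisp (a sketch; the post doesn't spell out the macro):

(defmacro aif ((var test) consequent &optional alternative)
  ;; Bind VAR to the value of TEST and pick a branch on it.
  ;; The defmacro lambda list destructures the (var test) pair.
  `(let ((,var ,test))
     (if ,var ,consequent ,alternative)))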
But I'm not sure if this is so much more concise and clear than simply using a let expression that it is worth adding this syntax to the language. It's one more thing the reader of the code has to be prepared to encounter.

A slightly different approach would move the binding form closer to where it is used. Note that there is no point in binding a name in the alternative clause to the conditional because it will always have the value nil.
(aif (find-broken-computer)
     (it (fix it)))
and instead of using a let binding, I could use a lambda binding
(aif (find-broken-computer)
     (λ (it) (fix it)))
aif no longer needs to be a macro but can be an ordinary function, which might be handy if your language doesn't have macros
(defun aif (it consequent &optional alternative)
  (if it
      (funcall consequent it)
      (if alternative
          (funcall alternative)
          nil)))

(aif (find-broken-computer)
     (λ (computer)
        (aif (find-broken-component computer)
             (λ (component) (fix component))
             (λ () (replace computer)))))
The explicit lambdas make it obvious what is being bound and what the scope of the binding is, but they do add a bit of visual noise.

Instead of using anaphoric if, I just write the slightly more verbose
(let ((computer (find-broken-computer)))
  (if computer
      (let ((component (find-broken-component computer)))
        (if component
            (repair component)
            (replace computer)))))
The binding is obvious, and I get to choose the variable being bound; both problems solved. I don't see a compelling reason to use the anaphoric version.

Addendum

Hexstream suggests that “No discussion of anaphoric macros can be complete without at least mentioning anaphoric-variants: https://www.hexstreamsoft.com/libraries/anaphoric-variants/” I wouldn't want to be incomplete, so consider it mentioned.

by Joe Marshall (noreply@blogger.com) at February 12, 2020 06:01 PM

February 11, 2020

Joe Marshall

Macro pitfalls

Macros are a unique source of power in Common Lisp, but there are some pitfalls to watch out for.

A compiler macro is a special macro that is expanded only by the compiler. The interpreter doesn't expand the macro and simply evaluates the form like a normal function call. If you aren't careful when writing a compiler macro, the interpreted and compiled forms may not evaluate the same, and that's probably not what you want. Here we abuse this effect
(defun foo () 'interpreted)

(define-compiler-macro foo () ''compiled)

CL-USER> (foo)
INTERPRETED

CL-USER> ((lambda () (foo)))
COMPILED
That might be unexpected. It appears that in this implementation (SBCL) the compiler is called on lambda expressions when they are evaluated.

Like all macros, a compiler macro is given the unevaluated source code of the arguments, not their values. We can see that in this example
(defun foo (l r)
  (format t "~%Foo")
  (list r l))

(define-compiler-macro foo (l r) 
  `(progn 
     (format t "~%Foo")
     (list ,r ,l)))

CL-USER> (foo (progn (format t "~%First") 'l) (progn (format t "~%Second") 'r))

First
Second
Foo
(r l)

CL-USER> ((lambda () (foo (progn (format t "~%First") 'l) (progn (format t "~%Second") 'r))))

Foo
Second
First
(r l)
When interpreted, the arguments are evaluated left to right before the function is entered. When compiled, the arguments end up being evaluated right to left and after the function is entered.

Unless you really want this — and shame on you if you do — you have to be careful when writing your macro to preserve the left-to-right, call-by-value semantics that are probably expected. The easiest way to do this is to write the expansion so that it just substitutes the body of the function. Something like this
(define-compiler-macro foo (l r)
  `((lambda (l r)
      (format t "~%Foo")
      (list r l))
    ,l
    ,r))

CL-USER> (foo (progn (format t "~%First") 'l) (progn (format t "~%Second") 'r))

First
Second
Foo
(r l)
Or you could use a let expression with the same effect
(define-compiler-macro foo (l r)
  `(let ((l ,l)
         (r ,r))
     (format t "~%Foo")
     (list r l)))
The version with the lambda expression doesn't even require putting a block of let bindings at the front. You just plop down the original argument list and body after the lambda, but both forms are equivalent.

The problem with doing this is that you have probably disabled the ability of the compiler to optimize the expression. You are forcing the compiler to ensure that the arguments are evaluated in left-to-right order before the body. A Sufficiently Smart compiler might be able to provide some optimizations anyway. If your compiler is not Sufficiently Smart, you can take matters into your own hands and substitute the arguments at the point they are used. Just be aware that you might be surprising people by changing the semantics at the call site.

Funny semantics isn't just a problem with compiler macros. Regular macros have to be written with care as well, or you may surprise users when they write code that they think consists of normal function calls. Compiler macros just have the unique property that they can change the semantics between interpreted and compiled code.

You can see a related effect when using symbol macros. A symbol macro substitutes a piece of code that computes a value. If we write
CL-USER> (let ((l (progn (format t "~%First") 'l))
               (r (progn (format t "~%Second") 'r)))
           (format t "~%Let body")
           (list r l))

First
Second
Let body
(r l)
we get the standard left-to-right, call-by-value evaluation. But we can mimic normal-order reduction by substituting the code for l and r before evaluating the body of the let by use of symbol-macrolet*
CL-USER> (symbol-macrolet ((l (progn (format t "~%First") 'l))
                           (r (progn (format t "~%Second") 'r)))
           (format t "~%Symbol-macrolet body")
           (list r l))

Symbol-macrolet body
Second
First
(r l)
If one of the arguments to a macro is a block of code, for instance the &body argument, then you probably want to avoid accidental variable capture.
(defmacro capturing-macro (&body body)
  `(let ((temp 'captured))
     (format t "~%Macro body binds temp to ~S" temp)
     ,@body))

(let ((temp 'lexical))
  (capturing-macro
     (format t "~%Temp is ~s" temp)))

Macro body binds temp to CAPTURED
Temp is CAPTURED
NIL
The lexical binding of temp is shadowed by the binding introduced by capturing-macro. This is probably unintended (except in the case of anaphoric macros, where capture is intended). Instead, you can ensure lexical scoping is maintained by closing over the body before introducing any new bindings
(defmacro non-capturing-macro (&body body)
  `(let ((temp 'captured)
         (body (lambda () ,@body)))
     (format t "~%Macro body binds temp to ~S" temp)
     (funcall body)))

(let ((temp 'lexical))
  (non-capturing-macro
    (format t "~%Temp is ~s" temp)))

Macro body binds temp to CAPTURED
Temp is LEXICAL
NIL
In this case, even a fairly naive compiler ought to be able to inline the call to body because it is simply a lexically apparent code block.
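For comparison, the other standard defense is gensym: have the macro bind a fresh uninterned symbol that user code cannot possibly name (a sketch, not from the original post):

(defmacro gensym-macro (&body body)
  (let ((temp (gensym "TEMP")))
    ;; TEMP here names a fresh uninterned symbol, so the LET in
    ;; the expansion can never shadow a TEMP in the caller's code.
    `(let ((,temp 'captured))
       (format t "~%Macro body binds its temp to ~S" ,temp)
       ,@body)))

This protects the body from the macro's bindings, though unlike the closure version it offers no help with the reverse-capture problem discussed next.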

Inadvertent capture can happen in the other direction as well if the macro caller shadows a binding used by the macro.
(flet ((funcall (x) (format t "~%Unexpected")))
  (let ((temp 'lexical))
    (non-capturing-macro
      (list temp))))

Macro body binds temp to CAPTURED
Unexpected
NIL
Here the caller shadowed funcall and the code the macro introduced ends up inadvertently calling it. This doesn't happen often in practice because people rarely shadow the top-level functions a macro depends upon, and that is good because there isn't an easy way to solve this reverse capture problem (other than don't do that).

The “hygienic” macro system in Scheme solves both kinds of accidental capture by appropriately renaming variables. There is a price, however. You either have to forego direct code manipulation and use a special pattern matching language, or write code that explicitly keeps track of the environment where the variables are bound. For simple macros, the pattern matching language is adequate, but for more complex macros, neither option is appealing.

*Macrolet rhymes with Chevrolet, naturally.

by Joe Marshall (noreply@blogger.com) at February 11, 2020 12:41 PM

Programming Praxis

Removing Spaces

We have a simple task today, with dozens of potential solutions:

Write a program to remove all spaces from a string.

Your task is to write a program to remove all spaces from a string. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 11, 2020 10:00 AM

February 10, 2020

Joe Marshall

Four ways to use macros

The way I see it, there are about five basic ways to use macros in Common Lisp.

First are macros that circumvent the regular call-by-value semantics. These might evaluate a subform at macro expansion time, treat a subform as a place (an l-value) rather than a value, or otherwise treat a subform as something other than a runtime function call. For example, if incf fully evaluated its argument, it could perform the increment on the value, but it couldn't put the value back where it got it. Another example is the check-type macro. You use it like this:
(defun foo (x)
  (check-type x (integer 1 *) "a positive integer")
  (bar (- x 1)))
The check-type macro has to be a macro because it treats x as a place (it will allow you to proceed by modifying x), and it treats its second argument as a type specifier.
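To see why such a macro cannot be a function, here is a naive incf-style sketch (hypothetical; unlike the real incf it evaluates subforms of the place twice, where the real one uses get-setf-expansion to avoid that):

(defmacro my-incf (place &optional (delta 1))
  ;; PLACE arrives as source code, not as a value, so we can
  ;; both read it and store back into it.
  `(setf ,place (+ ,place ,delta)))

(my-incf (aref counts i)) expands to (setf (aref counts i) (+ (aref counts i) 1)), something no function receiving only the current element's value could do.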

Second are macros that introduce new syntax to the language. Examples are cond, case, do, dotimes, defun, defvar, etc. These treat their arguments specially or have special clauses that don't act like ordinary function calls.
CL-USER> (macroexpand-all '(do ((i 0 (+ i 1))
                                (j 1 (* j 2)))
                               ((> j 65536) nil)
                             (format t "~%~2d ~5d" i j)))

(BLOCK NIL
  (COMMON-LISP:LET ((I 0) (J 1))
    (TAGBODY
      (GO #:G748)
     #:G747
      (TAGBODY (FORMAT T "~%~2d ~5d" I J))
      (COMMON-LISP:LET* ((#:NEW1 (+ I 1)) (#:NEW1 (* J 2)))
        (SETQ I #:NEW1)
        (SETQ J #:NEW1)
        NIL)
     #:G748
      (IF (> J 65536)
          NIL
          (GO #:G747))
      (RETURN-FROM NIL (PROGN NIL)))))

Third are macros that implement tiny languages within Common Lisp. The loop macro is a good example. It looks like this
(loop for i from 1 to (compute-top-value)
      while (not (unacceptable i))
      collect (square i)
      do (format t "Working on ~D now" i)
      when (evenp i)
        do (format t "~D is a non-odd number" i) 
      finally (format t "About to exit!"))
It works like a little compiler. It parses the loop clauses and generates a Lisp form that carries them out
(BLOCK NIL
  (LET ((I 1) (#:LOOP-LIMIT-744 (COMPUTE-TOP-VALUE)))
    (DECLARE (TYPE (AND NUMBER REAL) #:LOOP-LIMIT-744)
             (TYPE (AND REAL NUMBER) I))
    (SB-LOOP::WITH-LOOP-LIST-COLLECTION-HEAD (#:LOOP-LIST-HEAD-745
                                              #:LOOP-LIST-TAIL-746)
      (TAGBODY
       SB-LOOP::NEXT-LOOP
        (WHEN (> I #:LOOP-LIMIT-744) (GO SB-LOOP::END-LOOP))
        (UNLESS (NOT (UNACCEPTABLE I)) (GO SB-LOOP::END-LOOP))
        (SB-LOOP::LOOP-COLLECT-RPLACD
         (#:LOOP-LIST-HEAD-745 #:LOOP-LIST-TAIL-746) (LIST (SQUARE I)))
        (FORMAT T "Working on ~D now" I)
        (IF (EVENP I)
            (FORMAT T "~D is a non-odd number" I))
        (SB-LOOP::LOOP-DESETQ I (1+ I))
        (GO SB-LOOP::NEXT-LOOP)
       SB-LOOP::END-LOOP
        (FORMAT T "About to exit!")
        (RETURN-FROM NIL
          (SB-LOOP::LOOP-COLLECT-ANSWER #:LOOP-LIST-HEAD-745))))))

Fourth are macros that run code in a special context. The with-… macros and when and unless fall in this category. These macros take ordinary Lisp code and wrap it with other code that establishes a context or tests a conditional.
CL-USER> (macroexpand '(when (condition) (foo) (bar)))

(IF (CONDITION)
    (PROGN (FOO) (BAR)))

CL-USER> (macroexpand '(with-open-file (foo "~/.bashrc" :if-does-not-exist :create)
                         (print (read-line foo))
                         (bar)))

(LET ((FOO (OPEN "~/.bashrc" :IF-DOES-NOT-EXIST :CREATE)) (#:G751 T))
  (UNWIND-PROTECT
      (MULTIPLE-VALUE-PROG1 (PROGN (PRINT (READ-LINE FOO)) (BAR)) (SETQ #:G751 NIL))
    (WHEN FOO (CLOSE FOO :ABORT #:G751))))
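Writing your own macro in this fourth category follows the same shape: wrap the user's code in set-up and tear-down forms. A hypothetical sketch:

(defmacro with-timing (&body body)
  ;; Evaluate BODY, return its values, and print elapsed time.
  ;; A gensym names the start time so BODY can't capture it.
  (let ((start (gensym "START")))
    `(let ((,start (get-internal-real-time)))
       (multiple-value-prog1 (progn ,@body)
         (format t "~&Elapsed: ~,3F s"
                 (/ (- (get-internal-real-time) ,start)
                    internal-time-units-per-second))))))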

These aren't hard and fast categories; many macros can be thought of as in more than one category. All macros work by syntactic transformation, and most treat at least one of their subforms as something other than a call-by-value form, for instance. There are also the occasional macros that have the look and feel of standard function calls. The series package appears to allow you to manipulate series through standard function calls, but works by clever macroexpansion into iterative code.

I find it useful to think of macros in these four different ways, but I'm sure that others have their own categorizations that they find useful.

Addendum

An anonymous reader asked, “What about code walking/analysis/optimization?”. I really overlooked that. I think Richard Waters's series package would be a good example. It takes ordinary functional programs that operate on series of data (coinductively defined data, or codata) and turns them into the equivalent iterative constructs that operate on one element at a time. It does this by clever macros that walk the code, analyze it, and rewrite it to a more optimal form
CL-USER> (sb-cltl2:macroexpand-all '(let ((x (scan-range :from 0 :below 10)))
                               (collect (choose-if #'evenp x))))

(COMMON-LISP:LET* ((X (COERCE (- 0 1) 'NUMBER))
                   (#:LASTCONS-751 (LIST NIL))
                   (#:LST-752 #:LASTCONS-751))
  (DECLARE (TYPE NUMBER X)
           (TYPE CONS #:LASTCONS-751)
           (TYPE LIST #:LST-752))
  (TAGBODY
   #:LL-756
    (SETQ X (+ X (COERCE 1 'NUMBER)))
    (IF (NOT (< X 10))
        (GO SERIES::END))
    (IF (NOT (EVENP X))
        (GO #:LL-756))
    (SETQ #:LASTCONS-751 (SB-KERNEL:%RPLACD #:LASTCONS-751 (CONS X NIL)))
    (GO #:LL-756)
   SERIES::END)
  (CDR #:LST-752))
As you can see, the series code does a major rewrite of the original lisp code. An astute reader will notice that the let form has to have been redefined to do dataflow analysis of its bindings and body. Thanks to anonymous for the suggestion.

Addendum 2: Symbol macros

A comment by Paul F. Dietz got me thinking about symbols and it occurred to me that symbol macros deserve their own category as well. Symbol macros appear to be ordinary symbolic references to variables, but they expand to some code that computes a value. For instance, if foo were a symbol macro that expanded to (car bar), then using it in a form such as (+ foo 7) would expand to (+ (car bar) 7). A symbol macro is a two-edged sword. It is a very useful abstraction for providing a name to a computed value, but it can also fool the user of such a macro into thinking that a simple variable reference is happening when some more complex computation could be happening.
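Concretely, the global flavor looks like this (a sketch using the post's foo/bar example):

(defvar bar (list 1 2 3))
(define-symbol-macro foo (car bar))

(+ foo 7)    ; expands to (+ (car bar) 7) => 8
(setf foo 0) ; symbol macros even work as places: bar is now (0 2 3)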

I think that makes seven ways and counting.

by Joe Marshall (noreply@blogger.com) at February 10, 2020 02:32 PM

February 09, 2020

GNU Guix

Outreachy May 2020 to August 2020 Status Report I

We are happy to announce that for the fourth time GNU Guix offers a three-month internship through Outreachy, the inclusion program for groups traditionally underrepresented in free software and tech. We currently propose three subjects to work on:

  1. Create Netlink bindings in Guile.
  2. Improve internationalization support for the Guix Data Service.
  3. Integration of desktop environments into GNU Guix.

The initial application deadline is on Feb. 25, 2020 at 4PM UTC.

The final project list is announced on Feb. 25, 2020.

Should you have any questions regarding the internship, please check out the timeline, information about the application process, and the eligibility rules.

If you’d like to contribute to computing freedom, Scheme, functional programming, or operating system development, now is a good time to join us. Let’s get in touch on the mailing lists and on the #guix channel on the Freenode IRC network!

Last year we had the pleasure to welcome Laura Lazzati as an Outreachy intern working on documentation video creation, which led to the videos you can now see on the home page.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Gábor Boskovits at February 09, 2020 02:30 PM

February 07, 2020

Andy Wingo

lessons learned from guile, the ancient & spry

Greets, hackfolk!

Like just about every year, last week I took the train up to Brussels for FOSDEM, the messy and wonderful carnival of free software and of those that make it. Mostly I go for the hallway track: to see old friends, catch up, scheme about future plans, and refill my hacker culture reserves.

I usually try to see if I can get a talk or two in, and this year was no exception. First on my mind was the recent release of Guile 3. This was the culmination of a 10-year plan of work and so obviously there are some things to say! But at the same time, I wanted to reflect back a bit and look at the past with a bit of distance.

So in the end, my one talk was two talks. Let's start with the first one. (I'm trying a new thing where I share my talks as blog posts. We'll see how this goes. I know the rendering can be a bit off relative to the slides, but hopefully it's good enough. If you prefer, you can just watch the video instead!)

Celebrating Guile 3

FOSDEM 2020, Brussels

Andy Wingo | wingo@igalia.com

wingolog.org | @andywingo

So yeah let's celebrate! I co-maintain the Guile implementation of Scheme. It's a programming language. Guile 3, in summary, is just Guile, but faster. We added a simple just-in-time compiler as well as a bunch of ahead-of-time optimizations. The result is that it runs faster -- sometimes by a lot!

In the image above you can see Guile 3's performance on a number of microbenchmarks, relative to Guile 2.2, sorted by speedup. The baseline is 1.0x as fast. You can see that besides the first couple microbenchmarks where things are a bit inconclusive, everything gets faster. Most are at least 2x as fast, and one benchmark is even 32x as fast. (Note the logarithmic scale on the Y axis.)

I only took a look at microbenchmarks at the end of the Guile 3 series; before that, I was mostly going by instinct. It's a relief to find out that in this case, my instincts did align with improvement.

mini-benchmark: eval

(primitive-eval
 '(let fib ((n 30))
    (if (< n 2)
        n
        (+ (fib (- n 1)) (fib (- n 2))))))

Guile 1.8: primitive-eval written in C

Guile 2.0+: primitive-eval in Scheme

Taking a look at a more medium-sized benchmark, let's compute the 30th fibonacci number, but using the interpreter instead of compiling the procedure. In Guile 2.0 and up, the interpreter (primitive-eval) is implemented in Scheme, so it's a good test of an important small Scheme program.

Before 2.0, though, primitive-eval was actually implemented in C. This had a number of disadvantages, notably that it prevented tail calls between interpreted and compiled code. When we switched to a Scheme implementation of primitive-eval, we knew we would have a performance hit, but we thought that we would gain it back eventually as the compiler got better.

As you can see, it took a while before the compiler and run-time improved to the point that primitive-eval in Scheme reached the speed of its old hand-tuned C implementation, but for Guile 3, we finally got there. Note again the logarithmic scale on the Y axis.

macro-benchmark: guix

guix build libreoffice ghc-pandoc guix \
  --dry-run --derivation

7% faster

guix system build config.scm \
  --dry-run --derivation

10% faster

Finally, taking a real-world benchmark, the Guix package manager is implemented entirely in Scheme. All ten thousand packages are defined in Scheme, the building scripts are in Scheme, the initial RAM disk is in Scheme -- you get the idea. Guile performance in Guix can have an important effect on user experience. As you can see, Guile 3 lowered elapsed time for some operations by around 10 percent or so. Of course there's a lot of I/O going on in addition to computation, so Guile running twice as fast will rarely make Guix run twice as fast (Amdahl's law and all that).

spry /sprī/

  • adjective: active; lively

So, when I was thinking about words that describe Guile, the word "spry" came to mind.

spry /sprī/

  • adjective: (especially of an old person) active; lively

But actually when I went to look up the meaning of "spry", Collins Dictionary says that it especially applies to the agèd. At first I was a bit offended, but I knew in my heart that the dictionary was right.

Lessons Learned from Guile, the Ancient & Spry

FOSDEM 2020, Brussels

Andy Wingo | wingo@igalia.com

wingolog.org | @andywingo

That leads me into my second talk.

guile is ancient

2010: Rust

2009: Go

2007: Clojure

1995: Ruby

1995: PHP

1995: JavaScript

1993: Guile (27 years before 3.0!)

It's common for a new project to be lively, but Guile is definitely not new. People have been born, raised, and earned doctorates in programming languages in the time that Guile has been around.

built from ancient parts

1991: Python

1990: Haskell

1990: SCM

1989: Bash

1988: Tcl

1988: SIOD

Guile didn't appear out of nothing, though. It was hacked up from the pieces of another Scheme implementation called SCM, which itself was initially based on Scheme in One Defun (SIOD), back before the Berlin Wall fell.

written in an ancient language

1987: Perl

1984: C++

1975: Scheme

1972: C

1958: Lisp

1958: Algol

1954: Fortran

1930s: λ-calculus (90 years ago!)

But it goes back further! The Scheme language, of which Guile is an implementation, dates from 1975, before I was born; and you can, if you choose, trace the lines back to the lambda calculus, created in the mid-30s as a notation for computation. I suppose at this point I should say mid-1930s, to disambiguate.

The point is, Guile is old! Statistically, most software projects from olden times are now dead. How has Guile managed to survive and (sometimes) thrive? Surely there must be some lesson or other that can be learned here.

ancient & spry

Men make their own history, but they do not make it as they please; they do not make it under self-selected circumstances, but under circumstances existing already, given and transmitted from the past.

The tradition of all dead generations weighs like a nightmare on the brains of the living. [...]

Eighteenth Brumaire of Louis Bonaparte, Marx, 1852

I am no philosopher of history, but I know that there are some ways of looking at the past that do not help me understand things. One is the arrow of enlightened progress, in which events exist in a causal chain, each producing the next. It doesn't help me understand the atmosphere, tensions, and possibilities inherent at any particular point. I find the "progress" theory of history to be an extreme form of selection bias.

Much more helpful to me is the Hegelian notion of dialectics: that at any given point in time there are various tensions at work. In our field, an example could be memory safety versus systems programming. These tensions create an environment that favors actions that lead towards resolution of the tensions. It doesn't mean that there's only one way to resolve the tensions, and it's not an automatic process -- people still have to do things. But the tendency is to ratchet history forward to a new set of tensions.

The history of a project, to me, is then a process of dialectic tensions and resolutions. If the project survives, as Guile has, then it should teach us something about the way this process works in practice.

ancient & spry

Languages evolve; how to remain minimal?

Dialectic opposites

  • world and guile

  • stable and active

  • ...

Lessons learned from inside Hegel’s motor of history

One dialectic is the tension between the world's problems and what tools Guile offers to understand and solve them. In 1993, the web didn't really exist. In 2033, if Guile doesn't run well in a web browser, probably it will be dead. But this process operates very slowly, for an old project; Guile isn't built on CORBA or something ephemeral like that, so we don't have very much data here.

The tension between being a stable base for others to build on, and in being a dynamic project that improves and changes, is a key tension that this talk investigates.

In the specific context of Guile, and for the audience of the FOSDEM minimal languages devroom, we should recognize that for a software project, age and minimalism don't necessarily go together. Software gets features over time and becomes bigger. What does it mean for a minimal language to evolve?

hill-climbing is insufficient

Ex: Guile 1.8; Extend vs Embed

One key lesson that I have learned is that the strategy of making only incremental improvements is a recipe for death, in the long term. The natural result is that you reach what you perceive to be the most optimal state of your project. Any change can only make it worse, so you stop moving.

This is what happened to Guile around version 1.8: we had taken the paradigm of the interpreter as language implementation strategy as far as it could go. There were only around 150 commits to Guile in 2007. We were stuck.

users stay unless pushed away

Inertial factor: interface

  • Source (API)

  • Binary (ABI)

  • Embedding (API)

  • CLI

  • ...

Ex: Python 3; local-eval; R6RS syntax; set!, set-car!

So how do we make change, in such a circumstance? You could start a new project, but then you wouldn't have any users. It would be nice to change and keep your users. Fortunately, it turns out that users don't really go away; yes, they trickle out if you don't do anything, but unless you change in an incompatible way, they stay with you, out of inertia.

Inertia is good and bad. It does conflict with minimalism as a principle; if you were to design Scheme in 2020, you would not include mutable variables or even mutable pairs. But they are still with us because if we removed them, we'd break too many users.

Users can even make you add back things that you had removed. In Guile 2.0, we removed the capability to evaluate an expression at run-time within the lexical environment of an expression, as we didn't know how to implement this outside an interpreter. It turns out this was so important to users that we had to add local-eval back to Guile, later in the 2.0 series. (Fortunately we were able to do it in a way that layered on lower-level facilities; this approach reconciled me to the solution.)

you can’t keep all users

What users say: don’t change or remove existing behavior

But: sometimes losing users is OK. Hard to know when, though

No change at all == death

  • Natural result of hill-climbing

Ex: psyntax; BDW-GC mark & finalize; compile-time; Unicode / locales

Unfortunately, the need to change means that sometimes you will lose users. It's either a dead project, or losing users.

In Guile 1.8, for example, the macro expander ran lazily: it would only expand code the first time it ran it. This was good for start-up time, because not all code is evaluated in the course of a simple script. Lazy expansion allowed us to start doing important work sooner. However, this approach caused immense pain to people that wanted "proper" Scheme macros that preserved lexical scoping; the state of the art was to eagerly expand an entire file. So we switched, and at the same time added a notion of compile-time. This compromise kept good start-up time while allowing fancy macros.

But eager expansion was a change. Users that relied on side effects from macro expansion would see them at compile-time instead of run-time. Users of old "defmacros" that could previously splice in live Scheme closures as literals in expanded source could no longer do that. I think it was the right choice but it did lose some users. In fact I just got another bug report related to this 10-year-old change last week.

every interface is a cost

Guile binary ABI: libguile.so; compiled Scheme files

Make compatibility easier: minimize interface

Ex: scm_sym_unquote, GOOPS, Go, Guix

So if you don't want to lose users, don't change any interface. The easiest way to do this is to minimize your interface surface. In Go, for example, they mostly haven't had dynamic-linking problems because that's not a thing they do: all code is statically linked into binaries. Similarly, Guix doesn't define a stable API, because all of its code is maintained in one "monorepo" that can develop in lock-step.

You always have some interfaces, though. For example, Guix can't change its command-line interface from one day to the next, because users would complain. But it's been surprising to me the extent to which Guile has interfaces that I didn't consider. Recently for example in the 3.0 release, we unexported some symbols by mistake. Users complained, so we're putting them back in now.

parallel installs for the win

Highly effective pattern for change

  • libguile-2.0.so

  • libguile-3.0.so

https://ometer.com/parallel.html

Changed ABI is new ABI; it should have a new name

Ex: make-struct/no-tail, GUILE_PKG([2.2]), libtool

So how does one do incompatible change? If "don't" isn't a sufficient answer, then parallel installs is a good strategy. For example in Guile, users don't have to upgrade to 3.0 until they are ready. Guile 2.2 happily installs in parallel with Guile 3.0.

As another small example, there's a function in Guile called make-struct (old doc link), whose first argument is the number of "tail" slots, followed by initializers for all slots (normal and "tail"). This tail feature is weird and I would like to remove it. Unfortunately I can't just remove the argument, so I had to make a new function, make-struct/no-tail, which exists in parallel with the old version that I can't break.

deprecation facilitates migration

__attribute__ ((__deprecated__))
(issue-deprecation-warning
 "(ice-9 mapping) is deprecated."
 "  Use srfi-69 or rnrs hash tables instead.")
scm_c_issue_deprecation_warning
  ("Arbiters are deprecated.  "
   "Use mutexes or atomic variables instead.");

begin-deprecated, SCM_ENABLE_DEPRECATED

Fortunately there is a way to encourage users to migrate from old interfaces to new ones: deprecation. In Guile this applies to all of our interfaces (binary, source, etc). If a feature is marked as deprecated, we cause its use to issue a warning, ideally at compile-time when users responsible for the package can fix it. You can even add __attribute__((__deprecated__)) on C types!

the arch-pattern

Replace, Deprecate, Remove

All change is possible; question is only length of deprecation period

Applies to all interfaces

Guile deprecation period generally one stable series

Ex: scm_t_uint8; make-struct; Foreign objects; uniform vectors

Finally, you end up in a situation where you have replaced the old interface and issued deprecation warnings to help users migrate. The next step is to remove the old interface. If you don't do this, you are failing as a project maintainer -- your project becomes literally unmaintainable as it just grows and grows.

This strategy applies to all changes. The deprecation period may last a while, and it may be that the replacement you built doesn't serve the purpose. There is still a dialog with the users that needs to happen. As an example, I made a replacement for the "SMOB" facility in Guile that allows users to define new types, backed by C interfaces. This new "foreign object" facility might not actually be good enough to replace SMOBs; since I haven't formally deprecated SMOBs, I don't know yet because users are still using the old thing!

change produces a new stable point

Stability within series: only additions

Corollary: dependencies must be at least as stable as you!

  • for your definition of stable

  • social norms help (GNU, semver)

Ex: libtool; unistring; gnulib

In my experience, the old management dictum that "the only constant is change" does not describe software. Guile changes, then it becomes stable for a while. You need an unstable series to escape hill-climbing; then once you've found your new hill, you start climbing again in the stable series.

Once you reach your stable point, the projects you rely on need to exhibit the same degree of stability that you envision for your project. You can't build a web site that you expect to maintain for 10 years on technology that fundamentally changes every 6 months. But stable dependencies isn't something you can ensure technically; rather it relies on social norms of who makes the software you use.

who can crank the motor of history?

All libraries define languages

Allow user to evolve the language

  • User functionality: modules (Guix)

  • User syntax: macros (yay Scheme)

Guile 1.8 perf created tension

  • incorporate code into Guile

  • large C interface “for speed”

Compiler removed pressure on C ABI

Empowered users need less from you

A dialectic process does not progress on its own: it requires actions. As a project maintainer, some of my actions are because I want to do them. Others are because users want me to do them. The user-driven actions are generally a burden and as a lazy maintainer, I want to minimize them.

Here I think Guile has to a large degree escaped some of the pressures that weigh on other languages, for example Python. Because Scheme allows users to define language features that exist on par with "built-in" features, users don't need my approval or intervention to add (say) new syntax to the language they work in. Furthermore, their work can still compose with the work of others, even if the others don't buy in to their language extensions.

Still, Guile 1.8 did have a dynamic whereby the relatively poor performance of having to run all code through primitive-eval meant that users were pushed towards writing extensions in C. This in turn pushed Guile to expose all of its guts for access from C, which obviously has led to an overbloated C API and ABI. Happily the work on the Scheme compiler has mostly relieved this pressure, and we may therefore be able to trim the size of the C API and ABI over time.

contributions and risk

From maintenance point of view, all interface is legacy

Guile: Sometimes OK to accept user modules when they are more stable than Guile

In-tree users keep you honest

Ex: SSAX, fibers, SRFI

It can be a good strategy to "sediment" solutions to common use cases into Guile itself. This can improve the minimalism of an entire ecosystem of code. The maintenance burden has to be minimal, however; Guile has sometimes adopted experimental code into its repository, and without active maintenance, it soon becomes stale relative to what users and the module maintainers expect.

I would note an interesting effect: pieces of code that were adopted into Guile become a snapshot of the coding style at that time. It's useful to have some in-tree users because it gives you a better idea about how a project is seen from the outside, from a code perspective.

sticky bits

Memory management is an ongoing thorn

Local maximum: Boehm-Demers-Weiser conservative collector

How to get to precise, generational GC?

Not just Guile; e.g. CPython __del__

There are some points that resist change. The stickiest of these is the representation of heap-allocated Scheme objects in C. Guile currently uses a garbage collector that "automatically" finds all live Scheme values on the C stack and in registers. It was the right choice at the time, given our maintenance budget. But to get the next bump in performance, we need to switch to a generational garbage collector. It's hard to do that without a lot of pain to C users, essentially because the C language is too weak to express the patterns that we would need. I don't know how to proceed.

I would note, though, that memory management is a kind of cross-cutting interface, and that it's not just Guile that's having problems changing; I understand PyPy has had a lot of problems regarding changes on when Python destructors get called due to its switch from reference counting to a proper GC.

future

We are here: stability

And then?

  • Parallel-installability for source languages: #lang

  • Sediment idioms from Racket to evolve Guile user base

Remove myself from “holding the crank”

So where are we going? Nowhere, for the moment; or rather, up the hill. We just released Guile 3.0, so let's just appreciate that for the time being.

But as far as next steps in language evolution, I think in the short term they are essentially to further enable change while further sedimenting good practices into Guile. On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that.

As for sedimentation, we should step back and see whether any common Guile use patterns built by our users should be included in core Guile, and widen our gaze to Racket also. It will take some effort, both from a technical perspective and in building social/emotional consensus about how much change is good and how bold versus conservative to be: putting the dialog into dialectic.

dialectic, boogie woogie woogie

https://gnu.org/s/guile

https://wingolog.org/

#guile on freenode

@andywingo

wingo@igalia.com

Happy hacking!

Hey that was the talk! Hope you enjoyed the writeup. Again, video and slides available on the FOSDEM web site. Happy hacking!

by Andy Wingo at February 07, 2020 11:38 AM

Programming Praxis

My Mailbag

I had two interesting emails from readers this week. I’ll discuss both of them, but suppress the names of the writers; they can identify themselves in the comments below if they wish.

One writer saw Johnny Ball’s video about Russian Multiplication on Numberphile and suggested it would make a good exercise. Indeed it would; in fact, we have already done it twice ([1] [2]).
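For reference, Russian multiplication halves one factor while doubling the other, and sums the doubled values wherever the halved factor is odd. A minimal Scheme sketch (an illustration, not the suggested solution):

(define (russian a b)
  ;; Halve a, double b, and accumulate b whenever a is odd.
  ;; Assumes a non-negative multiplier a.
  (let loop ((a a) (b b) (sum 0))
    (if (zero? a)
        sum
        (loop (quotient a 2)
              (* b 2)
              (if (odd? a) (+ sum b) sum)))))

(russian 18 23)   ; => 414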

Another writer suggested that the compose function in the Standard Prelude should return the identity function if called with no arguments, rather than reporting an error. That is, of course, correct; my apologies for the bug.
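A minimal sketch of that fix (an illustration, not the Standard Prelude's actual definition):

(define (compose . fns)
  ;; With no arguments, return the identity function instead of erroring.
  (if (null? fns)
      (lambda (x) x)
      (let loop ((fns fns))
        (if (null? (cdr fns))
            (car fns)
            (let ((rest (loop (cdr fns))))
              (lambda args
                ((car fns) (apply rest args))))))))

((compose (lambda (x) (* x x)) (lambda (x) (+ x 1))) 3)   ; => 16
((compose) 42)                                            ; => 42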

Your task is to write programs to perform Russian multiplication and compose functions, as suggested above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 07, 2020 10:00 AM

February 06, 2020

Joe Marshall

Dispatching

There are times when you are faced with a complex piece of control flow
try {
    if (condition())
    {
        ... block 1 ...
    }
    else 
    {
        switch (someValue())
        {
          case CASE_A:
            ... block 2 ...
            break;

          case CASE_B:
            ... block 3 ...
            break;

          default:
            ... block 4 ...
            break;
        }
    }
} catch (SomeException someException) {
    ... block 5 ...
}
and you want to abstract the control flow — all the conditionals, switches, and try/catches — away from the code that does the work — the various blocks. In fact, here I've abstracted it away by putting "... block n ..." in place of the blocks of code.

If I were writing this in Scheme or Common Lisp, I'd consider using continuation passing style. I'd write a function dispatch-on-foo that would perform the dispatch, but then invoke one of several first-class procedures passed in as arguments
(defun dispatch-on-foo (foo bar case1 case2 case3 case4 case5)
   (if (... complex conditional ...) 
       (funcall case1)
       (handler-case (case some-value
                       ((case-a) (funcall case2))
                       ((case-b) (funcall case3))
                       (t (funcall case4)))
         (error (condition) (funcall case5)))))
At the call site, I'd write
(dispatch-on-foo <arg1> <arg2>
  (lambda ()
     ... block 1 ...)
  (lambda ()
     ... block 2 ...)
  (lambda ()
     ... block 3 ...)
  (lambda ()
     ... block 4 ...)
  (lambda ()
     ... block 5 ...))
This is a win when the complexity of the dispatch is enough that you don't want to replicate it at every call site. Notice how the nested blocks of code have been pulled up to the same level and linearized. Granted, you've cluttered up the call site with lambda expressions, but as Steele pointed out, you can think of these as anonymous go tags: dispatch-on-foo in essence will end up doing a jump to one of these tags and execute the block there, skipping and ignoring the other blocks. Once you get used to thinking in this way, the lambdas disappear just like the parens do for a seasoned Lisp hacker. They just look like jump targets or case labels, and the call site looks a lot like a case expression. It is a bit more powerful than an ordinary case expression because you could arrange for dispatch-on-foo to funcall the appropriate closure on an argument (and have the lambda expression take an argument of course).
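To make the pattern concrete, here is a small runnable Scheme rendition; the even? test and the symbol keys are invented stand-ins for the elided conditional and cases:

(define (dispatch-on-foo foo bar case1 case2 case3 case4)
  ;; The dispatcher owns the control flow; callers pass one thunk per case.
  (if (even? foo)
      (case1)
      (case bar
        ((a) (case2))
        ((b) (case3))
        (else (case4)))))

(dispatch-on-foo 3 'b
                 (lambda () 'block-1)
                 (lambda () 'block-2)
                 (lambda () 'block-3)
                 (lambda () 'block-4))   ; => block-3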

You could do something analogous with Java 8's lambdas, but on rare occasions I've wanted to do something similar in Java 7. The problem is that Java 7 doesn't have lambda expressions. The solution is to change these anonymous lambdas into named callback methods. First we define a generic interface with our callbacks:
interface DispatchOnFooCases<T> {
    T caseOne ();
    T caseTwo ();
    T caseThree ();
    ... etc. ...
 }
then we define the dispatch method:
<T> T dispatchOnFoo (FooClass foo, BarClass bar, DispatchOnFooCases<T> dispatchOnFooCases)
{
    try {
        if (conditional())
            return dispatchOnFooCases.caseOne();
        else
            switch (someValue()) {
              case CASE_A:
                return dispatchOnFooCases.caseTwo();

              case CASE_B:
                return dispatchOnFooCases.caseThree();

              default:
                return dispatchOnFooCases.caseFour();
            }
    } catch (SomeException someException) {
        return dispatchOnFooCases.caseFive();
    }
}
finally, at the call site, we write this:
{
    int value =
        dispatchOnFoo (foo, bar,
            new DispatchOnFooCases<Integer> ()
            {
                @Override
                public Integer caseOne ()
                {
                    ... block 1 ...
                    return aValue;
                }

                @Override
                public Integer caseTwo ()
                {
                    ... block 2 ...
                    return aDifferentValue;
                }

                ... etc. ...
            });
}
The good news is that we've accomplished our goal of abstracting the complex conditional dispatch from the code that does the real work — the method bodies at the call site.

There is, unfortunately, a fair amount of bad news. First, if you thought lambda expressions introduced clutter, then this is a serious amount of clutter. Between @Overrides, type declarations, interfaces, and methods, there is just a lot of extra stuff you have to type. It still might be worth the clutter if the dispatch conditions are complex enough. They just need to be that much more complex to justify all this machinery. (We've actually done the work the compiler would do to allocate and pass a “multi-closure”.) There are cases where this pays off, though.

The second piece of bad news is that Java is not (in general) tail recursive. This means that the call to dispatchOnFoo and the callback to one of the cases both introduce a new stack frame. So although the case methods run in the same lexical environment as where they are defined, they are running two stack frames deeper. This won't make much of a difference unless you try to loop by recursively calling the code. In that case, you need to be very careful to limit the amount of recursion or you will overflow the stack. It is best to avoid recursion as much as possible in the bodies of the cases.

You probably won't need to resort to doing this. It can be a case of the cure being worse than the disease. The complexity of introducing callbacks can exceed the complexity of the conditional you are trying to abstract. But this is an interesting way to abstract a very complex conditional and can come in handy when you can justify using it. I have actually used this technique in production code to separate some complex control flow from the code that did the actual work.

by Joe Marshall (noreply@blogger.com) at February 06, 2020 09:42 PM

February 04, 2020

Programming Praxis

Password Generator

Many web sites require passwords that have particular combinations of various types of characters; for instance, a web site might require a password that has at least ten characters, including at least one each from the sets of lower-case letters, upper-case letters, digits, and special characters.

Your task is to write a program that generates passwords according to specifications that you define. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
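As one possible reading of the example specification above, here is a sketch in Guile Scheme; the character sets and the ten-character length are assumptions for illustration, and a real generator would also shuffle the result so the mandatory characters don't always come first:

(define lower   "abcdefghijklmnopqrstuvwxyz")
(define upper   "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
(define digits  "0123456789")
(define special "!@#$%^&*")
(define all-chars (string-append lower upper digits special))

;; Pick one character from a string at random (random is Guile's).
(define (pick s) (string-ref s (random (string-length s))))

(define (make-password)
  ;; One mandatory character from each set, padded to ten characters
  ;; from the full alphabet.
  (list->string
   (append (map pick (list lower upper digits special))
           (map (lambda (_) (pick all-chars)) (iota 6)))))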

by programmingpraxis at February 04, 2020 10:00 AM

January 31, 2020

Programming Praxis

Anna’s Homework

Let’s help Anna with her homework:

Given an array of n elements, find an element x that appears in the array at least n/3 times, or indicate that no such element exists. Your program may take no more than O(n) time and O(n) space.

Your task is to write a program to solve Anna’s homework problem. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
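Since the problem permits O(n) space, a hash table of counts suffices. A sketch in Guile Scheme (an illustration, not the suggested solution):

(define (at-least-n/3 vec)
  ;; Count occurrences in one pass, then scan the counts:
  ;; O(n) time and O(n) space.
  (let ((counts (make-hash-table))
        (n (vector-length vec)))
    (do ((i 0 (+ i 1)))
        ((= i n))
      (let ((x (vector-ref vec i)))
        (hash-set! counts x (+ 1 (hash-ref counts x 0)))))
    (call-with-current-continuation
     (lambda (return)
       (hash-for-each
        (lambda (x c) (when (>= (* 3 c) n) (return x)))
        counts)
       #f))))

(at-least-n/3 #(1 2 1 3 1))   ; => 1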

by programmingpraxis at January 31, 2020 10:00 AM

January 28, 2020

Programming Praxis

Non-Abundant Sums

It’s been a long time since we did an exercise from Project Euler; here is number 23:

A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.

A number n is called deficient if the sum of its proper divisors is less than n and it is called abundant if this sum exceeds n.

As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24. By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.

Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.

Your task is to solve Project Euler 23; in the spirit of Project Euler, please do not display the solution. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
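In that spirit, here is only the divisor-sum classification in Scheme, not the solution itself; the helper assumes n > 1:

(define (sum-proper-divisors n)
  ;; Trial division up to the square root, adding each divisor d
  ;; and its cofactor q.  Starts at 2 because 1 divides every n > 1.
  (let loop ((d 2) (sum 1))
    (cond ((> (* d d) n) sum)
          ((zero? (remainder n d))
           (let ((q (quotient n d)))
             (loop (+ d 1) (+ sum d (if (= d q) 0 q)))))
          (else (loop (+ d 1) sum)))))

(define (abundant? n) (> (sum-proper-divisors n) n))

(map abundant? '(12 28 945))   ; => (#t #f #t)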

by programmingpraxis at January 28, 2020 10:00 AM

January 26, 2020

Joe Marshall

The pros and cons of Agile

Agile methodology is currently the popular way of attempting to develop commodity software in a factory-like manner.  It's a little hard to define what exactly is Agile and what is not.  I've worked in several companies and with several groups in companies that all claim to be Agile, yet they did things in very different ways.  But they all seemed to agree on a few things, and this I suppose could be the spirit of Agile, if not the letter.

The basic characteristic of Agile is that you break down the software into smaller and smaller tasks until you reach tasks that a small (2-8 person) team of engineers can comfortably complete in a short (1-2 week) time frame.  Then at the end of the time frame, you supposedly have some software that “works” in some sense of the word.  It exhibits basic functionality and the general gist of what the customer wanted, if not satisfying all the requirements for the entire project. Then over the next several time periods of development, the software is iteratively improved by fleshing out remaining requirements, adding needed functionality, hooking components together, etc. During this time, the customer is involved to make sure that what is being delivered is actually what is needed.

Some projects I worked on did this formally by a book and followed strict guidelines for the methodology, others just winged it, but all had the basic characteristics above.

One of the primary advantages of Agile is its usefulness to management.  By having the large problem broken down into team-size, biweekly pieces, management can easily allocate and track resource usage and progress.  They can treat a team as a black box of software development and assign tasks as they arise.  Management can attempt to measure the performance of a team and see whether it is increasing, decreasing, or remaining steady.  Teams are what managers like to manage.

Another advantage is frequent feedback from the customer.  Since after each time period a somewhat working version of some fragment of the product is available for demonstration, the customer can give feedback as to how and whether this seems to meet his needs.  He can offer suggestions about what might be improved, identify what features he needs to make the product at least minimally useful to him, and help prevent development from getting off track.

But Agile is not a panacea.  There is still a significant amount of software produced with the traditional “waterfall” methodology: specification completed before coding begins, integration done as a final step, and only then release to the customer.  There is also a fair amount of software written “artistically”. I would define artistic software as that written by a single developer working alone over a period of several months. Frequently, such a project never gets beyond the hobbyist stage, and as such it is a risky approach to writing production code. But on occasion, an artistic project can turn into something novel and useful. It more often exhibits a unity of vision and coherence that is harder to find in software written by groups of people. (Which isn't to say that small groups cannot write software with unity of vision and coherence; it's just harder. Or they'll have one particular person in the group who has more insight than the others.)

Managers aren't as useful to artistic developers. Artistic developers tend to manage themselves. And you cannot swap out one developer for another without swapping out the entire project with him. A manager can work with an artistic developer as a peer, and help manage the project, but cannot manage the developer.

Frequently involving customers has its pros and cons as well. Often customers have difficulty imagining anything beyond incremental improvements to the current ways of doing things. They'll want a UI widget that will make some task slightly easier, but not think of automating the task altogether. They'll want to automate a series of inefficient tasks when a different viewpoint of the end result might make those intermediate tasks irrelevant. Customers are not impressed with changes to the code that don't produce visible effects. You may have spent a week refactoring in order to make it trivial to add new commands and new outputs, but customers don't care. Customers don't care about potential use cases, they care about their specific use case to the exclusion of everything else. This can be discouraging to developers.

Because Agile is so useful to managers, big and intermediate-sized companies will continue to use it to develop commodity software in a factory-like style. It isn't going to be replaced any time soon. But there is still ample room in the market for small companies and individuals with vision to carve out niches that Agile methodologies will overlook and find tricky to fill.

(But I'm a romantic at heart, and I like the image of the lone hacker crafting software on his home computer in his room late at night. If only it were easy to make a living that way.)

by Joe Marshall (noreply@blogger.com) at January 26, 2020 01:59 PM