Thursday, May 26, 2011

DAGs, partial orderings, program lattices

Program lattices, partial orderings and directed acyclic graphs are *the* critical foundational constructs that Flow is built upon, so I will attempt to describe here what these things are, why they matter to programming languages, and how they fit in the design of Flow.

Difference between dataflow programming languages and Flow

Dataflow programming languages (where you can build a program as a "circuit diagram") are overly complicated. Flow is very different from most other dataflow programming languages, you can't just connect a dataflow graph however you want in the style of old-school flow charts. Flow charts died out for a reason: they produce spaghetti code. Flowcharts -- and, by derivation, most dataflow languages, allow the user to make almost arbitrary connections between any part of the chart/graph and any other part of the graph, including introducing overlapping inter-nested loops of arbitrary complexity. This is directly correlated with the GOTO keyword in BASIC that died a just and horrible death.

For the purposes of describing the control flow of a program, old-school flowcharts were briefly superceded in the 90s (at least at my university) by a second generation of control flow diagramming tools: hierarchical program flow models, which were like a UML hierarchical call chart, only oriented vertically: you would show the program structure in a series of nested function calls, drawn as a tree. In effect, this enforced pure hierarchical containment of control flow, i.e. it forced you to design a control flow for your code that was in essence purely functional in style.

The Flow model introduces a third paradigm, that of program lattices. A lattice is a partial ordering and a DAG by definition, as described below.

(1) Directed Acyclic Graphs (DAGs)
http://en.wikipedia.org/wiki/Directed_acyclic_graph

Definition

Loosely defined, these are basically like a forest of rooted trees (meaning trees with arcs/edges directed away from the root) whose nodes may branch out but whose branches may also join together (coalesce) as you move away from the root.

More formally, and by very definition, a DAG is an arbitrary directed graph that does not possess a cycle: there is simply no way to follow a path from any start node and end up back at the same start node again after a nonzero number of moves. (Note that if you take away the directionality of an arc, turning it into an undirected edge, the graph may indeed possess cycles, but in the directed form.)

Implication for programming language design

Most imperative programming languages conflate the concept of variables and values: the action of reading from a variable returns its *current value*, and because (except when working with constants) imperative languages primarily concern themselves with variables, the actual values those variables hold cannot be known at compiletime. Consequently, the data dependency graph in an imperative language (what needs to be computed before something else can be computed from it) is also a graph of dependencies between variables, not values, and it is entirely possible to create directed data dependency cycles in your program, for example if you have the statement, "i = i + 1" inside a loop.

The problem with this is that underlying the variable data dependency graph (the "compiletime data dependency graph") is something that might be called the true "data operand graph" (or the "runtime data dependency graph"), and this is defined by values, not variables. If you were to record exactly what happens during a specific run through the program, every time a variable is updated, you could create a new "value node" in the graph, and when the current value of a variable is read, you would add an edge to a graph that points to the specific value node that corresponds to the current value of the variable.

In this model, it is easy to see where race conditions come from in a multithreaded context: if another thread pushes a value into a variable at an unexpected time, this actually changes the target node in the underlying "data operand graph" for the edge that corresponds to the the next read from the variable. In other words, the entire connectivity of the true underlying data dependency graph is at the mercy of the relative timing of reads and writes.

However if we remove the concept of "the current value of a variable" from the programming language, and replace it with "the one and only value this variable will ever take on" (immutability) or "the value of this variable at a specific timestamp" (syntactic sugar that allows aliasing of variables with a parameter that means something to the user and the compiler, e.g. "x(t) = x(t-1) + 1"), then this problem disappears.

In fact it turns out that the only compilable programs in a language with this constraint have compiletime and runtime data dependency graphs that are exactly the same, and that are DAGs, i.e. the language is syntactically constrained such that all valid programs have variable reference graphs that *are* the precise runtime data dependency graphs.

Functional programming languages already satisfy this, because all variables actually represent immutable values, but the requirement that the program's structure must exactly describe the runtime data dependency graph is actually in some ways a weaker condition than that imposed by functional programming, because it's possible to support a subset of imperative-style "scatter" computation in the DAG model whereas the functional style only really describes "gather" computation.

(2) Partial orderings
http://en.wikipedia.org/wiki/Partial_ordering

Definition

A partial ordering is a DAG where the directions on the arcs/edges follow the arrow of time, in other words where A -> B indicates "A happens before B" (or, as a dependency graph, A <- B indicates "A must happen before B can happen").

Assuming that time moves downwards, and given the following graph,

  A
 / \
B   C
 \ /
  D


we have that A must happen before B and C, and B and C have to happen before D. However, this graph says *nothing* about the relative order in which B and C must happen: B can happen before C, or C can happen before B, or both can happen concurrently.

Implication for programming language design

Direct examination of a partial ordering can yield a concurrent execution plan for a parallel program: if there is no directed path between two nodes that passes through zero or more other nodes in a partial ordering of work units to be completed, then the work units corresponding to the two nodes can be completed in parallel.

If your programming language has syntactically enforced that every program's reference structure is a DAG, then every program is also a partial ordering, and by examining the source alone, a simple parallelization plan can be determined in many cases.

(3) Lattices
http://en.wikipedia.org/wiki/Lattice_(order)

Definition

A lattice is a partial ordering with a single unique "top element" (least upper bound) and a single unique "bottom element" (greatest lower bound). Every node in the lattice is reachable by following some directed path from the single least upper bound, and there is a directed path from every node in the lattice to the single least lower bound.

Implication for programming language design

In a programming language, the "top element" could be thought of as the start state of the program, and any non-unique greatest lower bound indicates some value that was computed but not used before the program terminates at the "bottom element", and therefore may be pruned as an unused value. Program DAGs pruned of unused computation become lattices. Hence if the compiler is doing its job, a Flow program will really be a lattice and not just a DAG.

Lattices in Flow

In Flow, a node of a DAG typically represents a collection, and the arcs coming into a node represent the data dependencies required to compute the contents of that collection (as well as, collectively, the operation that actually computes the contents of the collection based the values of its dependencies).

Arbitrary conditional control flow is specified in Flow by conditionally nesting/embedding sub-lattices within a bigger lattice (i.e. nodes in lattices can be recursively expanded into entire nested lattices -- basically this is like recursive function application).

Because, with program lattices, there are no cycles or back-links that violate the arrow of time, it is impossible in Flow to create spaghetti "circuit diagrams", or there are no arbitrarily inter-nested loops (i.e. no GOTO equivalent).  However, you are also not forced into the pure functional mindset that emerges from requiring control flow to be purely the result of hierarchical/tree-like calls.

Structuring programs as lattices is not new per se, it's implicitly used in a weak form in "array programming" (performing elementwise operations on collections in parallel), but lattice-based computational models are not syntactically enforced elsewhere, and (it appears) are not yet used as the most important principle driving the design of programming languages other than Flow.

Saturday, May 7, 2011

On ordering and properties of domains and maps

Here is some of the thinking behind the "push" operator "->".  It basically maps to a "parallel-add-to-collection", ConcurrentCollection.add() or similar. The question was raised on the flowlang mailing list recently as to what the type was of this operator. There's quite a lot going on with type inference in Flow so it's worth highlighting those specific concepts in one place.  Also perhaps by rewording this, it may come across clearer than in the Flow Manifesto.

To recap relevant background:

Morphisms, domains and co-domains

Algorithms can basically be broken down into morphisms (elemental function applications) and control flow (loops/conditionals: the logic that connects data flowing into and out of morphisms together in some sort of fixed or dynamic graph). If the data dependency graph is viewed as a pipeline with DAG topology, then data can be pushed through the pipeline continuously but data that is all somehow mutually associated needs to be pushed through in "waves", and this gives rise to the need for collections at the nodes, not just single values.

[A morphism is fundamentally a map from a domain to a range, or a domain to a co-domain if you are talking about only the values in the range that are actually mapped to from the domain.  A specific co-domain is basically a collection (set or list, not a HashMap or other map type), and a morphism is basically the same thing as a function if it maps from a domain to an unknown range, or a map (HashMap or similar) if it maps from a finite domain to a specific co-domain.]

Flow tries hard to unify functions and maps -- and this makes a lot of sense, because in many functional programming languages, functions are "memoized" which basically means you cache a function's values in a HashMap to avoid having to re-compute them where possible. (Eventually we'll blur the line between what's computed at runtime and what's computed at compiletime, and some functions may be partially or wholely evaluated at compiletime and be transparently turned into a precomputed map, but that's a subject for another post.)

Tracking properties of variables

Every typed language keeps track of some information about variables (whether a variable represents an integer or a float, etc.), but Flow will take this much further, e.g. if a variable x can only represent an integer in the range [0..49] inclusive because it is the index of an item in an ordered collection of fixed constant size 50, then everywhere x is used, other things that are calculated from x have their own domains calculated relative to this number range.  e.g. if you set y = x/2 (using integer division), then y's range will be between [0..24] inclusive.

Tracking domains in close detail allows for some really powerful language features: for example, the isqrt() integer square root function might require a parameter that is an integer in the range [0..], so if you pass in an integer with unknown range or with known range that includes negative numbers, you'll get a compiletime error. No more division by zero etc. -- and you can even be warned about possible overflow, underflow, NaN situations etc. at compiletime.  (It will take some experimentation to figure out how pedantic to make this without annoying the programmer in some cases with variables whose value isn't known at compiletime.)  Also this can be extended to other types to eliminate NullPointerExceptions: you will know if it's possible for a reference's value to ever possibly be null just by looking to see if the "exceptional value" #Null is in the domain.

Tracking properties of collections and maps

The really powerful things that Flow will track are the properties of collections (co-domains, i.e. the specific collections that are produced by morphisms) and maps (morphisms).  This is where a lot of the magic will happen in Flow.

Some of the properties that will be tracked for collections include: orderedness; countability/iterability; sparseness; finiteness; whether or not duplicate objects are allowed.

Some of the properties that will be tracked for maps include: surjectivity; injectivity; bijectivity.

A couple of examples of traditional data representations that map to these:

(1) An array is a map from an ordered, dense, finite, iterable integer domain (the array indices) to a set of unordered values (the set of values in the array). (It is possible to iterate through the keys and pull out the values in array index order, but only if needed; if you don't need the values in a specific order, you can just consider the values in unordered set form).

(2) A sparse named array is a map from an ordered or unordered, sparse, finite, possibly iterable domain to some set of values.

(3) A map or morphism type could be created to read in lines from a text file, and the domain would be line numbers and the range would be lines of text.

Orderedness of domains

In most languages, sorting a collection is a manually-initiated operation: you start with an ArrayList<String>, for example, and you call Collections.sort() if you want that collection sorted.  There are also usually special collections that keep their elements sorted: for example, priority queues and TreeMaps.

In flow, orderedness is a first-class attribute of a collection. Sorting is not something you "do" in almost any case, it's simply the property of a domain or collection. And wherever possible, if orderedness is not required, collections are left unordered.  Collections are treated as unordered by default.

The reason for this is that there are a few basic algorithmic patterns that you see arising again and again: maps, folds and filters to name a few. Map operations are elementwise so don't impose any ordering constraints.  Folds only impose ordering constraints if the function being applied is not commutative (and they only impose "divide-and-conquer"-type grouping constraints if it's a parallel fold and the function being applied is not associative).  And, interestingly, filter operations only require their input to be ordered if something that depends on their output requires that the output is ordered relative to the input -- so the need for orderedness propagates lazily back through filter operations.  (The need for orderedness also propagates lazily back through map operations, if the output of the map is (key,value) pairs and not just values.)

Orderedness of domains and implicit parallelization

In fact, most sorting incurs overhead of some sort, but, worse than that, saying that you want a sorted collection when you don't ever apply some sort of non-commutative operator to the collection just unduly constrains the compiler from being able to find good parallelizations of your code.  So Flow will find the places that really need sorted data, and back-propagate the requirement as far as necessary up the data dependency graph to find other things that need to be ordered to produce output in the right order.

For example, if you are writing data out to a file, the method for writing lines of data will require an ordered collection, because the act of serializing elements of a collection is a non-commutative operation.  If you try to pass in an unordered collection of lines to be written to a file, you will get a compiletime error. This is how the compiler statically guarantees that operations like this are threadsafe. (Note however that it is possible for an operation to be non-commutative but still associative, for example string concatenation -- in this case, lets say you have 80 strings to concatenate into one string, and you have 8 threads, the compiler knows it has to keep all the strings in order relative to each other, so it requires the input be ordered, but it doesn't have to run the concatenation in series, it can split the list of 80 strings into 8 pieces containing 10 strings each, one per thread, and combine the results at the end to produce a single string. The compiler can transparently generate skip-lists or similar data structures under the hood to allow this sort of work splitting to be performed quickly.)

There could be several ways to specify that a non-commutative operator should first order a given unordered collection in an appropriate way.  One way would be to specify a comparator or "order by" clause, and this would produce a "view" of the collection that is remembered by the compiler, and reused without re-generating if the same ordering is used elsewhere in the program.  Another method would be to specify that the natural ordering of the elements in the collection should be used (e.g. lexicographic order for strings, increasing size for numbers).  Another method would be to have a function that takes a map with naturally-orderable keys and unordered values, and produces a list of values in the corresponding key order (this would be useful for dumping out values in arrays in order, for example) -- but I want to avoid this where possible because it implies launching a sort operation manually. I think the compiler should almost always make the decision as to what should be sorted and when, to minimize sorting wherever possible, again because this not only incurs overhead but it reduces parallelizability.  *Everything* in Flow should be designed with maximum parallelizability as THE driver.

Back to the "->" push operator

Back to "->": The reason specified in the previous post that "->" would push values into an unordered collection by default is mostly that it allows for maximum degrees of freedom in parallelizing.

Also hopefully the above description begins to answer questions about type inference for push operations.  The type of the target of a push operation is simply a set of elements whose type is the same as the type of the values being pushed into the collection.

Note that you can push into the same collection from multiple places in your source, and the compiler will try to unify the types that are being pushed from different places. If they are not unifiable, you will get a compiletime error.

As mentioned above, any requirement for orderedness is back-propagated by the compiler from operations that require orderedness because of non-commutativity.  So it is possible that ultimately the push target will end up ordered.  In that case, a sort or "shuffle" phase (to use Google's MapReduce terminology) will automatically be inserted by the compiler, or the collection will be turned into an ordered collection like a TreeSet, depending on what is most efficient.

Monday, May 2, 2011

Flow syntax and semantics

I'm cross-posting the following from the "flowlang" Google Group, with minor modification.

--

It might be helpful to think about the creation of a syntax for Flow that maps onto the semantics described in the Flow Manifesto as follows:
  • Take a very simple purely functional programming language syntax -- maybe a subset of Haskell, or a functional subset of Python.
  • Add one operator, a scatter/push operator, say "->", that gives the language more of an imperative feel.
  • [You can also add timestamps, e.g. x(t) = x(t-1) * 2. These also give the language more of an imperative feel, but I won't talk about those here for brevity. They're just basically a convenient way of aliasing immutable variables that has semantic value to both the user and the compiler.]
Then assignment, "=", is a pull / gather operation: y = f(x) pulls a value from x and applies f then pulls the computed value into y as expected.

The way that "->" operates is that it pushes / scatters values to locations, which is typical of the imperative style of programming. For example, take the following Java code:

int[] ys = new int[] {1, 5, 2, 9, 1, 1, 4, 2};
int maxVal = 0;
for (int i = 0; i < ys.length; i++)
    maxVal = Math.max(maxVal, ys[i]);
int[] hist = new int[maxVal + 1];
for (int y : ys)
    hist[y]++;


You would do something like this in Flow (making up the syntax):

ys = {1, 5, 2, 9, 1, 1, 4, 2}
for y in ys
    1 -> ones[y]      // each entry ones[y] is a set of integers (with value 1)
hist[i] = sum ones[i] // implicit iteration: "for all i", i.e. hist = map sum ones


This histogram example is one I keep coming back to in my own mind because it's a common and minimal testcase for the "push" model of programming. Basically you're scattering counts into a histogram at indices corresponding to list values, which is not in general threadsafe and is therefore not parallelizable in imperative languages without extra work.

In the Java case I just incremented the histogram values directly; in the Flow case I output a stream of 1s and they were then collected by running a "map" operator over the collection ones[i].  The key thing to realize here is that ones[i] is constrained (by being the target of the "->" operator) to be an unordered collection, because otherwise you would get a race condition if you were pushing different values that had to stay in order relative to the input. It doesn't matter here whether it is ordered or not, of course, because everything that is getting pushed is a 1 -- but in more complicated cases it can matter. (If you force the target of a "->" operator to be ordered, then the compiler can still parallelize, but it will have to do some extra work to keep things in order.)

For smaller lists, i.e. ys.length < L, it can just run this as a single-threaded program.  For larger lists, the compiler is free to parallelize this in a lot of different ways, for example:
  1. It put a lock on each bin, ones[i], to prevent race conditions.
  2. By realizing that the "sum" operator is just folded addition, and by realizing that addition is associative and commutative, the compiler can build one copy of the "ones" array-of-sets in each thread's TLS (thread local storage), and then combine these separate copies at the end.
(The second version has higher memory requirements but is lock-free until the end. It incurs a bit more time at the combine stage. This can all be calculated as a Big-Oh complexity function of the input size, input value distribution, etc., and the compiler can switch between implementations as needed.)

There are other optimizations that can be performed, e.g. the compiler can perform something similar to "hoisting" by realizing that you're just pumping the ones through a set collection and then into a sum function, and that can be converted to an increment operation.

UPDATE:

The "->" operator should be able to support arbitrary MapReduce-style mapping, scattering, shuffling, grouping by key and reducing.  For example, let's say the "y in ys" actually represents an id for a Person record of some form, and you want to group everybody together who has the same first name, you should be able to do something like the following:

for y in ys
    p = persons[y]
    p -> firstNameGroups[p.firstName]

It might be more helpful in general to extend "->" to take (key,value) pairs, and group by keys:

for y in ys
    p = persons[y]
    (p.firstName, p) -> firstNameGroups

...or, for the histogram example:

for y in ys
    (y, 1) -> ones

Saturday, November 6, 2010

Welcome to Flow

This site is in a very early stage of construction.  Please read the Flow Manifesto for information on what Flow is all about.

Luke Hutchison gave an invited talk on Flow at the Harvard Medical School Open Source Developer Retreat, "Write Once, Parallelize Anywhere: Solving the Multicore Dilemma with the Flow Programming Language" 2011-03-15.  [Slides]