(Also see the discussion page.)

Iterators and Generators

This proposal extends the for-in loop in ES1-3 and the for-each-in loop from E4X (ECMA-357) by imitating Python’s iteration protocol. It builds on this protocol with support for generators (see PEP 255 and PEP 342) and array comprehensions modeled on Python’s list comprehensions. Where language differences or sufficiently better ideas provide strong enough reason, this proposal differs in substance from the Python prior art.

The premises of this proposal are:

  • The for-in loop is broken, because it does enumeration, not iteration, of property identifiers.
    • As under-specified in ES1-3, enumeration visits properties in an implementation-defined order. In reality, browser inter-operation requires visiting in creation order, since Netscape 2.
    • Enumeration skips DontEnum properties. Prior to ES4, you could not set a DontEnum property, so user-set properties can break for-in loops and are anathema to Ajax gurus.
    • Enumeration walks the prototype chain, skipping enumerable but “shadowed” properties (where a property of the same name exists ahead of the shadowed property on the prototype object).
    • Enumeration coerces the property name to string type, when it could more usefully remain, e.g., uint for Array elements.
    • Enumeration skips properties that existed at the start of the loop but that were deleted before visited.
  • Iteration over a well-ordered sequence of values is what most users want when they write for-in loops.
    • Inspecting existing objects via for-in should work as before.
    • New ES4 and user-defined objects may customize iteration.
  • for-each-in loops (from E4X) also enumerate, but (this is the first premise again) were intended to iterate.
    • In E4X they iterate in a well-defined (natural number) order over XMLList element values.
    • Any iteration over enumerable prototype property values is unintended and such user-set proto-properties will break E4X scripts.
    • A common confusion, or request for enhancement: that for-in iterates over Array values, not indexes. Resolvable by using for-each-in, but more usably fixed if the language supported customizable iteration.
  • Compelling use cases for a special form of a function that can yield values to a caller (directly to the caller, no arbitrary stack unwinding and rewinding) exist in common ES embeddings.
  • Conserving syntax should be done if the revised meaning is a bug fix (i.e., more likely to please than frustrate).
  • Sharing syntax with Python is a secondary but still worthy goal.

The conclusion embodied by this proposal is that there should be an iteration protocol used by the for-in loop for all object types, defaulting to enumeration, which may be overridden. The ES4 spec and new code may define objects that are iterators, which implement the protocol. Further, there should be a way to write generator functions that yield values via special syntax, and that may participate in the iteration protocol. Finally, comprehension syntax for populating arrays from iterators is proposed as a nicety familiar to Python and other programmers.

Iteration Protocol

Spec location: Chapter 6 Types: Iterator types

Let iterator name a built-in namespace:

namespace iterator;

In this proposal, the following pragma is in effect except where overridden explicitly:

use default namespace iterator;

Let IterableType be the following structural type:

type IterableType.<T> = {
  iterator::get: function (boolean=) : IteratorType.<T>
};

The iterator::get method finds or creates an iterator for the iterable. Its boolean parameter, which by convention defaults to false, controls whether the iterator considers prototype objects. Getting an iterator for an iterable is thus delegated to the iterable’s implementation.

Let IteratorType be the following structural type:

type IteratorType.<T> = {
  next: function () : T
};

By convention, an iterator is iterable and returns itself as the result of its iterator::get method. So viewed as an iterable, an iterator is its own iterator.

Not all iterables are iterators. Only those iterables that both

  • return themselves from their iterator::get methods, and
  • match IteratorType, that is, have a next method taking no arguments and returning a value

are iterators.

Let StopIteration be a global constant pre-bound to a singleton exception object:

class StopIterationClass {
  public function toString() : string 
    "[object StopIteration]";
}
public const StopIteration : StopIterationClass = new StopIterationClass;

An iterator returns zero or more values from its next method, throwing StopIteration when it is exhausted. Repeated calls to next on an exhausted iterator throw StopIteration for each call.

Note that there is no Iterator nominal type (contrast with JS1.7). It is misleading to programmers unfamiliar with structural type systems to have such a type; such programmers are likely to want to subclass Iterator to implement an iterator type, when there is no such requirement.

The for-in loop specified in ES3 12.6.4 can be written as a while loop using the iteration protocol:

for (i in o)
  ...

is equivalent to:

let ($it = o.iterator::get(true)) {
  while (true) {
    try {
      i = $it.next();
    } catch (e : iterator::StopIterationClass) {
      break;
    }
    ...
  }
}

where $it is guaranteed not to collide with any name bound in ..., or shadow any name used in ....

Enumeration

Enumerator is a parameterized class that may be instantiated with the iteration result type T. Once instantiated, an Enumerator.<T> type may be constructed via new with a result function f to get an iterator over the enumerable properties of an object v, and optionally (depending on the e flag) of objects along the initial object’s prototype chain. Each iteration step invokes the result function on the property’s identifier and the initial object value passed to the constructor (v, if it is of type Object!). The Enumerator class is callable as a shorthand for constructing a new instance.

type EnumerableId      = (int, uint, string, Name);
type EnumerableIdArray = [EnumerableId];
 
class Enumerator.<T> {
  type ResultFun = function(EnumerableId, Object!) : T;
 
  function Enumerator(v, f : ResultFun, e : boolean = false) {
    initial_obj = (v is Object) ? v : null;
    current_obj = initial_obj;
    current_ids = magic::getEnumerableIds(initial_obj);
    result_fun = f;
    enumerate = e;
  }
 
  function call Enumerator(v) : IteratorType.<T>
    new Enumerator(v);
 
  iterator function get(e : boolean = false) : IteratorType.<T>
    (e == enumerate) ? this : new Enumerator.<T>(inital_obj, result_fun, e);
 
  public function next() : T {
    if (current_obj === null)
      throw StopIteration;
 
  loop:
    while (true) {
      if (current_index === current_ids.length) {
        if (!enumerate)
          throw StopIteration;
 
        // No more properties in current_obj: try walking up the prototype chain.
        current_obj = magic::getPrototype(current_obj);
        if (current_obj === null)
          throw StopIteration;
 
        current_ids = magic::getEnumerableIds(current_obj);
        current_index = 0;
      }
 
      let id : EnumerableId = current_ids[current_index++];
 
      // Check for a shadowing property from initial_obj to current_obj on the prototype chain.
      for (let obj : Object = initial_obj; obj !== current_obj; obj = magic::getPrototype(obj)) {
        if (magic::hasOwnProperty(obj, id))
          continue loop;
      }
 
      // Check whether name is still bound in order to skip deleted properties.
      if (magic::hasOwnProperty(current_obj, id))
        return result_fun(id, initial_obj);
  }
 
  private var initial_obj   : Object,
              current_obj   : Object,
              current_ids   : EnumerableIdArray,
              current_index : uint,
              result_fun    : ResultFun,
              enumerate     : boolean;
}

The magic::getEnumerableIds primitive has type function (Object) : EnumerableIdArray and returns an array of values identifying enumerable properties in its argument object, or an empty array for null. The identifiers are arrayed in property creation order. If an identifier fits in int, it is stored using type int, likewise for uint; otherwise it is stored as a string unless it is a namespace-qualified identifier, in which case it is stored as a Name object (this is prefigured by the QName object in E4X).

The magic::getEnumerableIds primitive is a specification device only. Implementations may avoid ever allocating anything like the array that it returns, so long as they observably behave the same as the reference implementation that is based on this primitive.

All objects are iterable by the for-in loop. If an object does not customize iterator::get, it delegates to Object.prototype.iterator::get, which has the following implementation (note that Object is defined in the unnamed package, not in the iterator namespace):

dynamic class Object {
  ...
  prototype.iterator::get = iterator::DEFAULT_GET;
  prototype.propertyIsEnumerable(Name(iterator, 'get'), false);
}

The DEFAULT_GET constant function is defined (in default namespace iterator) as follows:

type EnumeratedId = (string, Name);
 
const function DEFAULT_GET(e : boolean = false) : IteratorType.<EnumeratedId>
  new Enumerator.<EnumeratedId>(this,
                                function (id : EnumerableId, obj : Object!) : EnumeratedId
                                  (id is Name) ? id : id to string,
                                e);

Thus the for-in loop enumerates as specified by prose in ECMA-262 Edition 3, section 12.6.4, the last two paragraphs. In contrast to ES1-3, the current proposal specifies and implements enumeration using self-hosted code based on only the iteration protocol and the new magic::getEnumerableIds primitive.

(Note that Object.prototype.iterator::get could be left undefined, and the logic underlying for-in loops could be modified to test whether iterator::get is bound in the right operand of in, and default to enumeration if there is no such method. This is how Python detects its __iter__ method, which is not defined for all Python types. Nevertheless, for philosophical unity of iteration and enumeration both when teaching or learning ES4, the current proposal defines Object.prototype.iterator::get.)

Itemization

E4X specifies the for-each-in loop as iterating over values of enumerable properties, rather than names (see ECMA-357 section 12.3 and note the erratum: deleted properties not yet visited MUST NOT be visited, per ECMA-262 12.6.4, the next-to-last paragraph). Call this kind of iteration “itemization”. Since ES4’s grammar includes for each loop syntax, and it is part of several existing implementations (AS3, JS1.6+), we propose to support it as follows:

type KeyType   = EnumerableId,
     ValueType = *,
     ItemType  = [KeyType, ValueType];
 
const function values(v : *, e : boolean = false) : IteratorType.<ValueType> {
  if (v is ItemizableType.<KeyType, ValueType, ItemType>)
    return v.iterator::getValues(e);
  return null;
}
 
const function DEFAULT_GET_VALUES(e : boolean = false) : IteratorType.<ValueType>
  new Enumerator.<ValueType>(this,
                             function (id : EnumerableId, obj : Object!) : ValueType
                               obj[id],
                             e);

As with iterator::get, there is a default iterator::getValues method defined on Object.prototype:

dynamic class Object {
  ...
  prototype.iterator::getValues = iterator::DEFAULT_GET_VALUES;
  prototype.propertyIsEnumerable(Name(iterator, 'getValues'), false);
}

Using iterator::values, a for-each-in loop can be rewritten as a for-in loop. This loop:

for each (v in o)
  ...

is equivalent to this loop:

for (v in iterator::values(o))
  ...

The destructuring assignment proposal introduces another variant of the for-in loop:

for ([k, v] in o)
  ...

This form iterates k and v over the key (index, identifier, or qualified name) and value of each enumerable property in o. To generalize it via the iteration protocol, we introduce iterator::getItems in the same way that iterator::getValues underlies for-each-in:

dynamic class Object {
  ...
  prototype.iterator::getItems = iterator::DEFAULT_GET_ITEMS;
  prototype.propertyIsEnumerable(Name(iterator, 'getItems'), false);
}

In the default iterator namespace:

const function items(v, e : boolean = false) : IteratorType.<ItemType> {
  if (v is ItemizableType.<KeyType, ValueType, ItemType>)
    return v.iterator::getItems(e);
  return null;
}
 
const function DEFAULT_GET_ITEMS(e : boolean = false) : IteratorType.<ItemType>
  new Enumerator.<ItemType>(this,
                            function (id : EnumerableId, obj : Object!) : ItemType
                              [id, obj[id]],
                            e);

ISSUE: users of JS1.7 (see Mozilla bug 366941) want general destructuring of iterated values, e.g. for ([s,v,o] in triple_db) ... and are surprised that using a [k, v] pattern has special meaning, or is the only destructuring pattern allowed.

  • Should general destructuring be allowed?
  • If so, should there be a [k, v] itemization special case?
  • If so, how should general destructuring interact with the [k, v] itemization special form?

This proposal specifies that general destructuring is allowed, but that there is not a [k, v] special case. Thus, in general, for-in loops with destructuring patterns on the left of in rewrite from:

for (pattern in o)
  ...

to:

let ($it = o.iterator::get(true)) {
  while (true) {
    try {
      pattern = $it.next();
    } catch (e : iterator::StopIterationClass) {
      break;
    }
    ...
  }
}

Users who want to iterate over key-value items must write for ([k, v] in iterator::items(o)) ... or equivalent. This avoids any confusing double-meaning of the length-2 array pattern.

For symmetry with values and items, there is a keys function in iterator, which delegates to a getKey method on its v parameter:

function keys(v, e : boolean = false) : IteratorType.<KeyType> {
  if (v is ItemizableType.<KeyType, ValueType, ItemType>)
    return v.iterator::getKeys(e);
  return null;
}
 
const function DEFAULT_GET_KEYS(e : boolean = false) : IteratorType.<KeyType>
  new Enumerator.<KeyType>(this,
                           function (id : EnumerableId, obj : Object!) : KeyType
                             id,
                           e);

The iterator::getKeys method has a default in Object.prototype, as usual:

dynamic class Object {
  ...
  prototype.iterator::getKeys = iterator::DEFAULT_GET_KEYS;
  prototype.propertyIsEnumerable(Name(iterator, 'getKeys'), false);
}

The getKeys, getValues, and getItems protocols imply a structural type for itemizable objects:

type ItemizableType.<K, V, I> = {
  iterator::getKeys:   function (boolean=) : IteratorType.<K>,
  iterator::getValues: function (boolean=) : IteratorType.<V>,
  iterator::getItems:  function (boolean=) : IteratorType.<I>
};

Given the default values assigned to these names in Object.prototype, all objects are itemizable. But if an object were to override any method name with an incompatibly-typed value, that object would no longer be itemizable. This may be desirable, e.g., for an unordered set data type.

Note that DEFAULT_GET_KEYS and DEFAULT_GET_ITEMS return an iterator yielding (in whole or in part) EnumerableId = (int, uint, string, Name) – contrast with DEFAULT_GET which returns EnumeratedId = (string, Name) as required for backward compatibility with the ES1-3 for-in loop.

Using the itemizable protocol, collection classes can customize all the loop forms. To customize for-in, set iterator::get and iterator::getKeys to the same function that yields keys. To customize for-each-in, set iterator::getValues. To customize for ([k, v] in o) loops and comprehensions, set iterator::getItems.

In any event, iterator::get may be set to make the shortest kind of for-in loop perform the default or natural iteration for the given type on the right of in. Some types will want for-in to iterate values by default, others keys – still others items. This proposal intentionally supports all combinations.

Compatibility

As specified above under Enumeration, magic::getPropertyIds returns an array of values identifying properties indexed in the order in which those properties were created. This matches behavior required for inter-operation among browser-based ES1-3 implementations.

Strict compatibility with ES1-3 as written (see 12.6.4 step 3, and 9.9 ToObject) requires that a TypeError be thrown for any loop over null or undefined, but inter-operation on the Web imposes an additional requirement beyond visiting properties in creation order: browser implementations must tolerate for (i in null) and for (i in undefined), iterating zero times. ES4 should take this bug fix by amending the translation from for (i in o) loop to a while loop as follows:

let ($v = o) {
  if ($v !== null && $v !== undefined) {
    let $it = $v.iterator::get(true);
    while (true) {
      try {
        i = $it.next();
      } catch (e : iterator::StopIterationClass) {
        break;
      }
      ...
    }
  }
}

Note that ES3 specifies the in operator as throwing a TypeError on null and undefined, and no inter-operation pressure exists to change its behavior on these inputs. Thus for-in and in disagree at these edge cases.

Coherence

Apart from the null and undefined edge cases, for-in and the in operator agree by default when enumerating. That is,

let a = [3, 4, 5];
 
for (let i in a) {
  assert(i in a);
  assert(a.indexOf(a[i]) != -1);
}

Iteration and in-tested membership are coherent.

E4X’s for-each-in did not alter in or add an each in (sic) operator, resulting in loss of coherent meaning of in when itemizing:

for each (let v in a) {
  let i = a.indexOf(v);
  assert(i != -1);
  assert(i in a);
}

Customization of the iteration protocol can result in similarly incoherent in operator results. Therefore ES4 supports a companion protocol for extending the in operator, which is distinct from the operators that can be overridden using static class methods. This difference is intended to facilitate retrofitting, e.g.:

Array.prototype.iterator::get      = iterator::DEFAULT_GET_VALUES;
Array.prototype.iterator::contains = function (v) this.indexOf(v) != -1;

Note how operand order from operator in is reversed when calling iterator::contains: a in b may translate to b.iterator::contains(a).

The implementation of the in operator checks for a binding of iterator::contains on its right operand and invokes that method if found. For backward compatibility with ES3 11.8.7, Object.prototype contains a default binding as follows:

dynamic class Object {
  ...
  prototype.iterator::contains = iterator::DEFAULT_CONTAINS;
  prototype.propertyIsEnumerable(Name(iterator, 'contains'), false);
}

In the default iterator namespace:

const function DEFAULT_CONTAINS(v) : boolean {
  let id : EnumeratedId = (v is Name) ? v : v to string;
  let obj : Object = this;
  do {
    if (magic::hasOwnProperty(obj, id))
      return true;
    obj = magic::getPrototype(obj);
  } while (obj !== null);
  return false;
}

Thus there exists a container structural subtype of the iterable type (in default namespace iterator as usual):

type ContainerType.<T> = {
  iterator::get:      function (boolean=) : IteratorType.<T>,
  iterator::contains: function () : boolean
};

ISSUE: as with the iteration_protocol, an object used on the right of the in operator may fail to be compatible with this type. Mutable prototypes make it easy to break core language semantics. This is true to some extent in ES1-3, e.g. by overriding Object.prototype.toString or Object.prototype.hasOwnProperty. One way to enforce default semantics would be to say that if the right operand of in is incompatible with iterator::ContainerType, then in defaults to ES3 semantics. But if we say this, there is no point in defining Object.prototype.contains.

The same argument could be made, as noted above re: Python’s __iter__ probing, for the case where a for-in loop head’s right operand is incompatible with iterator::IterableType.

For an interesting attempt to unify iteration and contains-testing, see JMatch, which uses modal declarations for forward (contains-testing) and backward (iteration) directions of operation; boolean formulas may be used to unify both modes in one expression.

Generators

Let a new reserved identifier keyword yield introduce a variant of AssignmentExpression:

AssignmentExpression ::= "yield" AssignmentExpression

A function containing a yield expression is a generator function, which when called binds formal parameters to actual arguments but evaluates no part of its body. Instead, it returns a generator-iterator of nominal type Generator:

class Generator.<O, I, E> {
  public function next() : O;
  public function send(i: I) : O,
  public function throw(e : E) : O,
  public function close() : void
};

The aspects of this class are as follows:

  • The type O (”output”) is the type of values being generated via evaluated operands to yield.
  • The type I (”input”) is the type of values being sent to the generator via parameters to send.
  • The type E (”exception”) is the type of exceptions being thrown at the generator via parameters to throw.
  • The method call send(x) resumes evaluation of the generator function with the value x as the result of the last yield, and returns the operand of the next yield.
  • The method call next() is equivalent to the method call send(undefined).
  • The method call throw(x) forces the generator to throw the exception value x in its current suspended context (as if the yield that suspended were a throw x statement), and then returns the result of the next yield.
  • The method call close() forces the generator to close itself. The effect of close() is to “return” from the generator where it’s suspended. This is so that finally clauses active in the generator function can be run. If such a finally clause throws an exception other than StopIteration, the exception is propagated to the caller of close().

When it is written above that a method call “returns the operand of the next yield” the call may instead terminate abruptly with a StopIteration exception.

A generator must be started by a method call to next() or send(undefined). It is a TypeError to send a value other than undefined to a newborn generator.

An example that prints the first ten Fibonacci numbers:

function fib() {
  let i = 0, j = 1;
  while (true) {
    yield i;
    [i, j] = [j, i + j];
  }
}
 
let g = fib();
for (let i = 0; i < 10; i++)
  print(g.next());

Another example, a natural number generator:

function naturals(n : uint) {
  for (let i : uint = 0; i < n; i++)
    yield i;
}

The method close() is called automatically only for a generator iterator that is started by a for-in or for-each-in loop or comprehension, upon normal or abrupt termination of the loop. All but the first call to close() have no effect. Thus the for-in loop over a generator iterator can be translated from:

for (i in naturals(20))
  ...

to:

let ($gen = naturals(20)) {
  try {
    while (true) {
      try {
        i = $gen.next();
      } catch (e : iterator::StopIterationClass) {
        break;
      }
      ...
    }
  } finally {
    $gen.close();
  }
}

Note that close is called whether or not the generator iterator denoted by $gen can be reached via other references.

This translation is a specification device only. Implementations cannot analyze all for-in loops and discern generator iteration statically, so must check at runtime for a newborn generator iterator on the right of in.

Comprehensions

ES4 provides syntactic sugar for the common operation of populating a new Array instance using an expression that depends on an iterator. The initialiser below shows an example of an array comprehension:

let squares = [i * i for (i in naturals(10))];

The squares declaration is equivalent to:

let $arr = [];
for (let i in naturals(10))
    $arr[$arr.intrinsic::length] = i * i;
let squares = $arr;

(Again assume that $arr is a generated, non-colliding name.)

Note that the variable i is scoped as if by let but only within the [] enclosing the comprehension. Note also the evaluation of the expression i * i in the body of the loop, and the use of intrinsic::length.

for-in loop heads may be nested within a comprehension:

let flat_mat = [i * j for (i in naturals(8)) for (j in naturals(8))];

A final if (condition) may follow the last for-in head to filter elements out of the resulting array:

let funny_mat = [i * j for (i in naturals(8)) for (j in naturals(8)) if (i != 0 && j != 0)];

The comprehension variables are lexically bound within a single block scope conceptually delimited by the [] brackets.

Each variable name in a for-in head may be followed by an optional type annotation.

The each contextual keyword may be used after for to iterate over values rather than names when itemizing:

let odd_squares = [i * i for each (i in [1, 3, 5, 7])];

Example

Here’s an example of a common DOM programming style that’s made simpler with iterators and generators: imagine an animation that flashes a DOM element on and off n times:

function animate(n) {
    let i = 0;
    let id = window.setInterval(function() {
                                    if (i >= n)
                                        window.clearInterval(id);
                                    else {
                                        if (isHighlighted())
                                            unhighlight();
                                        else {
                                            highlight();
                                            i++;
                                        }
                                    }
                                },
                                100);
}
 
animate(3);

Using yield, we could implement this with a loop instead:

let id;
 
function makeAnimator(n) {
    let i = 0;
    while (i <= n) {
        if (isHighlighted())
            unhighlight();
        else {
            highlight();
            i++;
        }
        yield;
    }
    window.clearInterval(id);
}
 
id = window.setInterval(makeAnimator(3), 100);

(This assumes a slight DOM extension for window.setInterval to accept an IteratorType as its first argument. Otherwise you’d just use function(){animator.next()}.)

 
proposals/iterators_and_generators.txt · Last modified: 2007/03/11 11:20 by brendan
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki