Python's `range`: fancier than I thought

I knew from way back when, in the Python 2 days, that xrange was better than range because it was lazy (and range was kept around for backwards compatibility). All this means is that range(1000000) doesn't allocate a list with a million integers. Python 3 came around, and range finally started to behave like I wanted. I could still get a list numbers from one to ten with list(range(1, 11)) (and force it to allocate the memory all at once). But it turns out that most of the time, I don't need to!

Ranges aren't just iterators!

Somehow I got it into my head that range objects were iterators. Iterators are constructed by calling iter on an object that supports the iteration protocol or the sequence protocol. The critical thing about iterators is that you can't just grab a random item by its index, and they can just be consumed once:

In [1]: ls = [4, 5, 6]

In [2]: it = iter(ls)

In [3]: it[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-40b9578de2b6> in <module>
----> 1 it[0]

TypeError: 'list_iterator' object is not subscriptable

In [4]: next(it)
Out[4]: 4

In [5]: next(it)
Out[5]: 5

In [6]: next(it)
Out[6]: 6

In [7]: next(it)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-7-bc1ab118995a> in <module>
----> 1 next(it)

StopIteration:

However, the original list, ls, is left alone. Without getting too fancy, we could naively implement something similar:

class NaiveIterator:
    def __init__(self, sequence):
        self._pos = 0
        self._seq = sequence

    def next(self):
        pos = self._pos
        self._pos += 1
        return self._seq[pos]
In [1]: from naive_iterator import NaiveIterator

In [2]: ls = [4, 5, 6]

In [3]: it = NaiveIterator(ls)

In [4]: it.next()
Out[4]: 4

In [5]: it.next()
Out[5]: 5

In [6]: it.next()
Out[6]: 6

In [7]: it.next()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-24d6ab57088b> in <module>
----> 1 it.next()

~/src/blog-code/python-range/naive_iterator.py in next(self)
      9         pos = self._pos
     10         self._pos += 1
---> 11         return self._seq[pos]
     12

IndexError: list index out of range

If we want to split hairs, this only works for a subset of the objects that iter works for--only for sequences, or objects that implement __getitem__ and __len__. For example, we can't access the keys of a dictionary by index, but we can call iter on them:

In [1]: d = {'a': 1, 'b': 2}

In [2]: d.keys()[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8ea809e8b798> in <module>
----> 1 d.keys()[0]

TypeError: 'dict_keys' object is not subscriptable

In [3]: it = iter(d.keys())

In [4]: next(it)
Out[4]: 'a'

What's an iterator, then?

We can iterate over lists--are they iterators? No! A list is not an iterator until we call iter on it. But it's an iterable! Basically, an iterable is a sequence or an iterator.

It occured to me the other day--the value at index i of a range can always be calculated based on its start, stop, and step attributes even if the value at an index greater than i has been accessed! Another naive implementation:

def range_at_i(start, stop, step, i):
    val = start + step * i
    if val > stop:
        raise IndexError()
    else:
        return val

What's range, then?!

I then played around and confirmed it--range is a sequence type, not an iterator after all! And, unlike sequence types that occupy more memory as they grow larger, range objects don't need to.

In [1]: r = range(100)

In [2]: r[30]
Out[2]: 30

In [3]: r[0]
Out[3]: 0

Who cares? Or, range to the rescue

Me! I was about to write a class that behaved just like range. Suppose you're writing a plotting library and you want to represent a 1-dimensional dataset, but you don't want to make the user provide the values for those little ticks along the x-axis at the bottom of the plot. By default, you just want to use the index of the datapoint in the provided dataset. You can simply write that like this:

class DataSeries:
    def __init__(ys, xs=None):
        self._ys = ys
        if xs is None:
            self._xs = range(len(ys))
        else:
            if len(xs) != len(ys):
                raise ValueError('Length of xs must match length of ys')
            self._xs = xs

    @property
    def zipped(self):
        return zip(self._xs, self._ys)

If ys is a list or Numpy array, it's already taking up memory proportional to the number of elements in the sequence. But there's no need to take up twice as much memory to store the x-values!

n [1]: from data_series import DataSeries

In [2]: ls = [4*i for i in range(4)]

In [3]: ls
Out[3]: [0, 4, 8, 12]

In [4]: series = DataSeries(ls)

In [5]: series
Out[5]: <data_series.DataSeries at 0x7fc8436ca278>

In [6]: series.zipped
Out[6]: <zip at 0x7fc843676a48>

In [7]: list(series.zipped)
Out[7]: [(0, 0), (1, 4), (2, 8), (3, 12)]

range to the rescue.