Python's `range`: fancier than I thought
I knew from way back when, in the Python 2 days, that xrange was better than
range because it was lazy (and range was kept around for backwards compatibility). All this
means is that range(1000000) doesn't allocate a list with a million integers. Python 3 came
around, and range finally started to behave like I wanted. I could still get a list numbers from
one to ten with list(range(1, 11)) (and force it to allocate the memory all at once). But it
turns out that most of the time, I don't need to!
Ranges aren't just iterators!
Somehow I got it into my head that range objects were iterators. Iterators are
constructed by calling iter on an object that supports the iteration protocol or the sequence
protocol. The critical thing about iterators is that you can't just grab a random item by its index,
and they can just be consumed once:
In [1]: ls = [4, 5, 6]
In [2]: it = iter(ls)
In [3]: it[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-40b9578de2b6> in <module>
----> 1 it[0]
TypeError: 'list_iterator' object is not subscriptable
In [4]: next(it)
Out[4]: 4
In [5]: next(it)
Out[5]: 5
In [6]: next(it)
Out[6]: 6
In [7]: next(it)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-7-bc1ab118995a> in <module>
----> 1 next(it)
StopIteration:
However, the original list, ls, is left alone. Without getting too fancy, we could naively
implement something similar:
class NaiveIterator:
    def __init__(self, sequence):
        self._pos = 0
        self._seq = sequence
    def next(self):
        pos = self._pos
        self._pos += 1
        return self._seq[pos]
In [1]: from naive_iterator import NaiveIterator
In [2]: ls = [4, 5, 6]
In [3]: it = NaiveIterator(ls)
In [4]: it.next()
Out[4]: 4
In [5]: it.next()
Out[5]: 5
In [6]: it.next()
Out[6]: 6
In [7]: it.next()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-24d6ab57088b> in <module>
----> 1 it.next()
~/src/blog-code/python-range/naive_iterator.py in next(self)
      9         pos = self._pos
     10         self._pos += 1
---> 11         return self._seq[pos]
     12
IndexError: list index out of range
If we want to split hairs, this only works for a subset of the objects that iter works for--only
for sequences, or objects that implement __getitem__ and __len__. For example, we
can't access the keys of a dictionary by index, but we can call iter on them:
In [1]: d = {'a': 1, 'b': 2}
In [2]: d.keys()[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8ea809e8b798> in <module>
----> 1 d.keys()[0]
TypeError: 'dict_keys' object is not subscriptable
In [3]: it = iter(d.keys())
In [4]: next(it)
Out[4]: 'a'
What's an iterator, then?
We can iterate over lists--are they iterators? No! A list is not an iterator until we call iter on
it. But it's an iterable! Basically, an iterable is a sequence or an iterator.
It occured to me the other day--the value at index i of a range can always be calculated based on
its start, stop, and step attributes even if the value at an index greater than i has been
accessed! Another naive implementation:
def range_at_i(start, stop, step, i):
    val = start + step * i
    if val > stop:
        raise IndexError()
    else:
        return val
What's range, then?!
I then played around and confirmed it--range is a sequence type, not an iterator after all! And,
unlike sequence types that occupy more memory as they grow larger, range objects don't need to.
In [1]: r = range(100)
In [2]: r[30]
Out[2]: 30
In [3]: r[0]
Out[3]: 0
Who cares? Or, range to the rescue
Me! I was about to write a class that behaved just like range. Suppose you're writing a
plotting library and you want to represent a 1-dimensional dataset, but you don't
want to make the user provide the values for those little ticks along the x-axis at the bottom of
the plot. By default, you just want to use the index of the datapoint in the provided dataset. You
can simply write that like this:
class DataSeries:
    def __init__(ys, xs=None):
        self._ys = ys
        if xs is None:
            self._xs = range(len(ys))
        else:
            if len(xs) != len(ys):
                raise ValueError('Length of xs must match length of ys')
            self._xs = xs
    @property
    def zipped(self):
        return zip(self._xs, self._ys)
If ys is a list or Numpy array, it's already taking up memory proportional to the number of
elements in the sequence. But there's no need to take up twice as much memory to store the x-values!
n [1]: from data_series import DataSeries
In [2]: ls = [4*i for i in range(4)]
In [3]: ls
Out[3]: [0, 4, 8, 12]
In [4]: series = DataSeries(ls)
In [5]: series
Out[5]: <data_series.DataSeries at 0x7fc8436ca278>
In [6]: series.zipped
Out[6]: <zip at 0x7fc843676a48>
In [7]: list(series.zipped)
Out[7]: [(0, 0), (1, 4), (2, 8), (3, 12)]
range to the rescue.