List comprehensions give Python a simple way to create new lists from other sequences. They are easy to read, execute quickly, and feel right at home in Python. Since there are a few ways to build a new list, it's important to know the advantages of each.
We will take a look at how list comprehensions stack up against Python's support for lambda functions in combination with map, and we'll also see how plain old for-loops compare. Let's set up a test case and see which has a performance advantage.
List comprehensions are easily recognized as an expression inside square brackets. Below is a simple example of a list comprehension that creates a list of cubes.
[x**3 for x in range(5)] > [0, 1, 8, 27, 64]
We can create the same list using functional concepts. Note that while I'm calling these functional (a style in which functions avoid side effects and don't modify their arguments), there aren't any real guards to stop you from introducing side effects.
list(map(lambda x: x**3, range(5))) > [0, 1, 8, 27, 64]
One way to determine which piece of code runs faster is to run them! Python ships with a timeit module for exactly this, and within a Jupyter/IPython notebook it's available as the %timeit magic command.
%timeit [list(range(x)) for x in range(1000)] > 11.6 ms ± 67.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Python ran your code in 7 runs of 100 loops each.
11.6 ms is the mean time per loop across those runs, and 67.5 µs is the standard deviation between runs.
Repeating the measurement 7 times gives a better sample
and helps ensure the other things going on in your computer won't skew the results.
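If you're working outside a notebook, the standard library's timeit module gives you the same measurement. A minimal sketch (the repeat and number values below simply mirror the 7 runs of 100 loops reported above):

```python
import timeit

# Time the same expression 7 runs of 100 loops each.
times = timeit.repeat("[list(range(x)) for x in range(1000)]",
                      repeat=7, number=100)

# Each entry is the total time for one 100-loop run;
# divide by the loop count to get the mean time per loop.
print(min(times) / 100, "seconds per loop (best run)")
```

Note that timeit reports total time per run, so the per-loop figure shown by %timeit is that total divided by the number of loops.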
Let's wrap our code in descriptive function names, add in a for-loop, and see how they stack up.
def comprehension():
    [x**3 for x in range(1000)]

def list_map_lambda():
    list(map(lambda x: x**3, range(1000)))

def for_loop():
    cubes = []
    for x in range(1000):
        cubes.append(x**3)

%timeit comprehension()
%timeit list_map_lambda()
%timeit for_loop()

> 242 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> 270 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> 273 µs ± 2.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Well, firstly, it's good to know we are now talking microseconds (µs), and secondly, the list comprehension is looking like the fastest option. I think we can do better, though!
Did you notice anything interesting about our functional approach? Why do we need to wrap the output of map with a call to list? If we don't already have a list, what does map actually return?
Instead of a list, map() returns an iterator!
An iterator is an object that implements the
__next__ method. Instead of generating the entire list at once,
calling next() on it returns the next item in the sequence.
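A quick sketch of that behavior, pulling items from a map object one at a time:

```python
# map() returns a lazy iterator; nothing is computed until we ask for items.
cubes = map(lambda x: x**3, range(5))

print(next(cubes))  # 0
print(next(cubes))  # 1
print(next(cubes))  # 8

# The iterator remembers its position; list() consumes what's left.
print(list(cubes))  # [27, 64]
```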
In addition, you can iterate through an iterator! What if we remove the
list() call and time it now?
def map_lambda():
    map(lambda x: x**3, range(1000))

%timeit map_lambda()

> 239 ns ± 6.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
map_lambda ran roughly a thousand times faster than any of the other options!
However, this isn't a fair comparison since we have an iterator instead of a list.
But if we are willing to cut corners like this, doesn't this feel a bit like a generator?
Generators are functions that behave like iterators and have a notation that looks a lot like list comprehension.
(x**3 for x in range(5)) > <generator object <genexpr> at 0x10eeef5e8>
Just switch from brackets to parentheses and now we have a generator.
This returns the same kind of lazy object as our
map_lambda function, but with a much cleaner syntax.
Remember if we do want the entire list, we can wrap our generator with a call to list.
list((x**3 for x in range(5))) > [0, 1, 8, 27, 64]
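To see the laziness in action, a generator supports next() just like the map object did, and it can feed consumers such as sum() directly without ever building the full list (the names below are just illustrative):

```python
gen = (x**3 for x in range(5))

print(next(gen))  # 0
print(next(gen))  # 1

# The generator picks up where it left off.
print(list(gen))  # [8, 27, 64]

# Aggregate lazily: no intermediate list of 1000 cubes is materialized.
total = sum(x**3 for x in range(1000))
print(total)
```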
Let's go back to our speed test but include a fair comparison with generators this time.
def list_generator():
    list((x**3 for x in range(1000)))

%timeit comprehension()
%timeit list_map_lambda()
%timeit list_generator()
%timeit for_loop()

> 250 µs ± 7.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> 275 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> 261 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> 289 µs ± 5.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
list_generator was quicker than the list/map/lambda approach,
but still not quicker than the list comprehension.
But let's pull the list() call out again and see where we are.
def generator():
    (x**3 for x in range(1000))

%timeit map_lambda()
%timeit generator()

> 225 ns ± 0.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
> 310 ns ± 7.36 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Surprisingly, using map and lambda is quicker here. The two are very close, though; in this example the difference is a matter of nanoseconds, and that is a very important point to keep in mind.
Does it matter? First, figure out whether performance is really a concern of yours. Second, consider whether you actually need a list or whether an iterator will fit your needs.
The differences here are incredibly subtle micro-optimizations and will most likely be unnoticeable. If you really believe the difference will be impactful, perhaps because you run very heavy calculations on very large data sets, then you may want to consider moving some work to C; tiny Python micro-optimizations may not give you the benefit you are looking for.
Using a generator expression is plenty fast, reads very nicely in Python, and when you don't need the entire list up front, it should be something you try out.
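As a hypothetical example of an iterator fitting your needs: suppose you only want the first cube above some threshold. A generator expression passed to next() stops as soon as it finds a match, never computing the remaining 994 cubes:

```python
# Find the first cube greater than 100; evaluation stops at the first match.
first_big = next(x**3 for x in range(1000) if x**3 > 100)
print(first_big)  # 125, since 5**3 is the first cube above 100
```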
Thanks for reading! If you've made it this far then you are probably interested in the material that we will be producing. We have an idea of what we believe will be most valuable to our readers, but hearing from you directly would be even better.
If you have a topic that you are struggling with, a file that you can't seem to work with, or even a dataset that just seems impossible to wrangle, then please let us know. We want to provide you with useful and practical information so you can start using Python today.