
When to use Numpy

Written by Matthew Yeager
6-minute read (700 words)
Published: Fri Aug 23 2019
When should I start using Python's Numpy?

numpy is a Python library optimized for vector and matrix-based algorithms. Built with the scientific community in mind, numpy trades some of the flexibility of Python's built-in data structures for significant performance gains. In practice, this means you might want to consider using numpy's array over Python's list for data analysis.

numpy's open-source repository showcases over 21,000 commits from 800+ contributors. Core data science packages like this one have been downloaded and used by millions of data scientists, IT professionals, and business leaders.

# Adding elements together
element_count = 100000
a = range(element_count)
b = range(element_count)
c = range(element_count)

def python_vector_addition():
    """ Add using list comprehension along with
        built-in functions sum and zip.
    """
    return [sum(x) for x in zip(a, b, c)]

%timeit python_vector_addition()


from operator import add
def python_functional_map_add():
    """ Functional approach leveraging the
        map built-in to apply add.
    """
    return list(map(add, map(add, a, b), c))

%timeit python_functional_map_add()


import numpy as np
np_a = np.arange(element_count)
np_b = np.arange(element_count)
np_c = np.arange(element_count)
def numpy_vector_addition():
    """ Vector addition taking advantage of numpy
        arrays' memory footprint and optimized codebase.
    """
    return np_a + np_b + np_c

%timeit numpy_vector_addition()

> 24.5 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> 17 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
> 127 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The pure Python approaches take roughly 17 to 25 milliseconds, while our numpy solution takes about 127 microseconds. That is more than 100 times faster!
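The `%timeit` magic used above only works inside IPython or Jupyter. As a rough sketch, assuming you are running a plain Python script instead, a similar comparison can be made with the standard library's `timeit` module. The helper name `best_time_ms` and the repeat counts are illustrative choices, not part of the original benchmark, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

element_count = 100_000
a = range(element_count)
b = range(element_count)
c = range(element_count)
np_a = np.arange(element_count)
np_b = np.arange(element_count)
np_c = np.arange(element_count)


def python_vector_addition():
    """Pure Python: list comprehension with sum and zip."""
    return [sum(x) for x in zip(a, b, c)]


def numpy_vector_addition():
    """numpy: element-wise addition of whole arrays."""
    return np_a + np_b + np_c


def best_time_ms(func, repeat=3, number=5):
    """Hypothetical helper: best per-call time in milliseconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number * 1000


# Both approaches should produce the same values before we compare speed.
assert python_vector_addition() == numpy_vector_addition().tolist()

print(f"pure Python: {best_time_ms(python_vector_addition):.3f} ms per call")
print(f"numpy:       {best_time_ms(numpy_vector_addition):.3f} ms per call")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes; it will not reproduce the exact `%timeit` statistics shown above.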

If you are interested, we have more content on performance testing. Check out a review on the runtime of list comprehension, functional coding, and generators.

numpy makes sense in a few different workflows and can be an advantage for you as a student, researcher, or software developer. There are several factors that might make numpy the right choice for you.

Consider your audience

Maybe you can write your application in pure Python (what can't be built? :)). However, consider what happens when it's time to present and share your work. numpy's matrix-based approach allows a wide range of professionals to understand the operations being performed. The vector-based thinking is similar to classical research environments like MATLAB or R.

With numpy you'll find many industry-standard algorithms that are already battle-tested and named clearly enough to tell your readers their purpose. Sure, you could also build these functions in Python, but then the functions, their maintenance, and their lifelong support become a liability for you and your team. Let's look at a few sample functions.

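# Polynomial fitting with Numpy
import numpy as np

# Assumes `data` has numeric X and Y columns (for example a pandas
# DataFrame) and `x_values` holds the points to evaluate the fit at.

# Calculate coefficients for 3rd order fit
coefficients = np.polyfit(data.X, data.Y, 3)

# Make a function with the calculated coefficients
coefficient_1d = np.poly1d(coefficients)

# poly1d objects accept a whole sequence of values at once
fitted_values = coefficient_1d(x_values)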

Here we take a series of values, fit a 3rd order polynomial to them, and compute the fitted values along that polynomial. This built-in support for linear algebra allows for fast and accurate operations alongside easy-to-read code.
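To make the polynomial fit concrete, here is a minimal self-contained sketch. The data is synthetic and chosen purely for illustration: the points lie exactly on y = 2x^3 - x + 5, so the fit should recover coefficients close to 2, 0, -1, and 5. The variable names are assumptions, not taken from a real dataset.

```python
import numpy as np

# Synthetic data (an assumption for illustration): points that lie
# exactly on y = 2x^3 - x + 5.
x_values = np.linspace(-3, 3, 25)
y_values = 2 * x_values**3 - x_values + 5

# Least-squares fit of a 3rd order polynomial; coefficients are
# returned from the highest power down to the constant term.
coefficients = np.polyfit(x_values, y_values, 3)

# Wrap the coefficients in a callable polynomial and evaluate it
# on the whole array at once.
polynomial = np.poly1d(coefficients)
fitted_values = polynomial(x_values)

print(np.round(coefficients, 6))
print(np.allclose(fitted_values, y_values))
```

With noisy real-world data the recovered coefficients would only approximate the underlying curve, and the fitted values would no longer match the inputs exactly.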

Matrix and vector-based support

numpy's array is at the center of fast, set-based computation. Instead of working with nested loops to access each item, analysts can use familiar vector-based mathematics. These operations have already been optimized for performance in a way that you can't get through the use of pure Python. This combination of Python's explicit, easy-to-read syntax and high-level mathematics has allowed for industry-wide adoption and support.
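As a small illustration of the difference, the sketch below computes a matrix-vector product twice: once with explicit nested loops and once with numpy's `@` operator. The input values and the function name `loop_matrix_vector` are arbitrary choices for the example.

```python
import numpy as np

# Small illustrative inputs (values chosen arbitrarily).
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
vector = [10.0, 20.0, 30.0]


def loop_matrix_vector(m, v):
    """Matrix-vector product with explicit nested loops."""
    result = []
    for row in m:
        total = 0.0
        for value, weight in zip(row, v):
            total += value * weight
        result.append(total)
    return result


# The same computation written as vector mathematics.
np_matrix = np.array(matrix)
np_vector = np.array(vector)
vectorized = np_matrix @ np_vector

print(loop_matrix_vector(matrix, vector))  # [140.0, 320.0]
print(vectorized)                          # [140. 320.]
```

The numpy version reads much like the mathematical notation, which is part of what makes it easier to share with readers from other numerical environments.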

numpy has been established as the backbone for many machine learning and matrix-based computation libraries. For instance, the Open Source Computer Vision Library (opencv.org) offers more than 2500 optimized algorithms spanning a variety of topics, and its Python bindings pass images and matrices around as numpy arrays.




Speed and performance with real arrays

Although the idea of lists and arrays might seem interchangeable in Python, there are key differences that set arrays apart when performing many mathematical operations. Python lists provide a great deal of flexibility in storing a variety of data types and objects in a single list while also providing simple appending and deleting.

Arrays within numpy hold values of the same datatype in a contiguous block of memory. A Python list, by contrast, stores references to objects that can live anywhere in memory. Because an array's values sit physically next to each other, they can be retrieved quickly without first looking up where each value is stored.
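The layout described above can be inspected directly. The following is a minimal sketch using standard array attributes; the size reported by `sys.getsizeof` is specific to the Python implementation, so no value is shown for it.

```python
import sys

import numpy as np

values = np.arange(5, dtype=np.float64)

# Every element has the same type and the same size in bytes.
print(values.dtype)      # float64
print(values.itemsize)   # 8
print(values.nbytes)     # 40

# The stride is the number of bytes to step to reach the next element.
# Here it equals itemsize, so the elements sit back to back.
print(values.strides)                # (8,)
print(values.flags["C_CONTIGUOUS"])  # True

# A list holds references to separate Python objects instead.
as_list = values.tolist()
print(type(as_list[0]))              # <class 'float'>
print(sys.getsizeof(as_list[0]))     # size of one float object, in bytes
```

Each Python float object carries its own overhead on top of the 8 bytes of numeric data, which is one reason a list of numbers uses more memory than the equivalent array.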

These typed arrays allow numpy to use specialized, type-specific functions (for example, routines written for floats) instead of spending time checking, casting, and converting values between datatypes. Those checks and conversions add significant overhead when performed over and over across a matrix or vector.
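A short sketch of what "typed" means in practice: the datatype is decided once, when the array is created, and arithmetic keeps it. The `object` array at the end is included only as a contrast; it stores ordinary Python objects, so numpy falls back to Python's own per-element arithmetic.

```python
import numpy as np

# The datatype is fixed for the whole array.
floats = np.array([1.5, 2.5, 3.5])
print(floats.dtype)          # float64
print((floats * 2).dtype)    # float64

# Mixed Python numbers are converted once, when the array is built.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)           # float64

# Contrast: an object array keeps the original Python objects, so each
# multiplication goes through Python's own int and float arithmetic.
objects = np.array([1, 2.5, 3], dtype=object)
print(objects.dtype)             # object
print((objects * 2).tolist())    # [2, 5.0, 6]
```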

Is it time to use numpy?

You need to think about the purpose of your work.

Are you trying to share knowledge with others?
Do you have matrix and vector-based computations?
Are you concerned with the runtime of your application?
Would you be able to leverage the existing data analysis algorithms available within numpy?

I think you only really need to say yes to one of these questions before it would be valuable to start integrating numpy into your work. The time saved, not only in running your code but also in being able to leverage a robust numerical framework, will make the move worthwhile.




Questions, Comments, Concerns?

Thanks for reading! If you've made it this far then you are probably interested in the material that we will be producing. We have an idea of what we believe will be most valuable to our readers, but hearing from you directly would be even better.

Send us an email at questions@beyondpython.com or reach out to us on Twitter @BeyondPython.

If you have a topic that you are struggling with, a file that you can't seem to work with, or even a dataset that just seems impossible to wrangle, then please let us know. We want to provide you with useful and practical information so you can start using Python today.




All Rights Reserved
© 2019 Beyond Python