numpy is a Python framework optimized for vector and matrix-based algorithms.
With a purposed focus on the scientific community,
numpy is able to achieve
significant performance gains over the flexibility of built-in data structures.
In practice, this means you might want to consider using
array over Python's list for data analysis.
numpy's open-source repository
showcases over 21,000 commits from 800+ contributors.
These core data science packages have been downloaded and used by millions of
Data Scientists, IT Professionals, and Business Leaders.
# Adding elements together element_count = 100000 a = range(element_count) b = range(element_count) c = range(element_count) def python_vector_addition(): """ Add using list comprehension along with built-in functions sum and zip. """ return [sum(x) for x in zip(a, b, c)] %timeit python_vector_addition() from operator import add def python_functional_map_add(): """ Functional approach leveraging the map built-in to apply add. """ return list(map(add, map(add, a, b), c)) %timeit python_functional_map_add() import numpy as np np_a = np.arange(element_count) np_b = np.arange(element_count) np_c = np.arange(element_count) def numpy_vector_addition(): """ Vector addition taking advantage of numpy arrays' memory footprint and optimized codebase. """ return np_a + np_b + np_c %timeit numpy_vector_addition() > 24.5 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) > 17 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) > 127 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The pure Python approaches are taking tens of milliseconds,
numpy solution is just a hundred microseconds!
If you are interested, we have more content on performance testing. Check out a review on the runtime of list comprehension, functional coding, and generators.
numpy makes sense in a few different workflows and can be an
advantage for you as a student, researcher, or software developer.
There are several factors that might make
numpy the right choice for you.
Maybe you can write your application in pure Python (what can't be built? :)),
however, consider when its time to present and share your work.
numpy's matrix-based approach allows a wide range
of professionals to understand the operations being performed.
The vector-based thinking is similar to classical research environments like Matlab or R.
numpy you'll find many industry-standard algorithms already battle-tested
and well named as to inform your readers to their purpose.
Sure, you could also build these functions in Python.
But these functions, their maintenance, and lifelong support are now a liability for you and your team.
Let's look at a few sample functions.
# Polynomial fitting with Numpy # Calculate coefficients for 3rd order fit coefficients = np.polyfit(data.X, data.Y, 3) # Make a function with the calculated coefficients coefficent_1d = np.poly1d(coefficients) fitted_values = [coefficent_1d(x) for x in x_values]
Here we are able to take a series of values and compute fitted values across a polynomial. This built-in support for linear algebra allows for fast and accurate operations alongside easy to read code.
numpy's array is the center point of enabling fast,
set-based computation. Instead of working with nested loops to access each item,
analysts can instead use familiar vector-based mathematics.
These operations have already been optimized for performance
in a way that you can't get through the use of pure Python.
This combination of Python's explicit, easy-to-read syntax and high-level
mathematics has allowed for industry-wide adoption and support.
numpy has been established as the backbone
for many machine learning and matrix-based computation libraries.
For instance, Open Source Computer Vision Library (opencv.org) has more than 2500 optimized algorithms
spanning a variety of topics which otherwise would not provide
the level of performance required if not for
Although the idea of lists and arrays might seem interchangeable in Python,
there are key differences that set arrays apart when performing many mathematical operations.
lists provide a great deal of flexibility in storing a variety of data types
and objects in a single list while also providing simple appending and deleting.
numpy are values of the same datatype in a continuous memory allocation.
This means, as opposed to Python lists, the values will be stored physically next to each
other allowing for quick retrieval without needing to look up where the value may be stored.
These typed arrays allow for
numpy to use specialized float-based functions
instead of needing to spend time casting and converting values to different datatypes. These
considerations add significant overhead when performed over and over in a matrix or vector.
You need to think about the purpose of your work.
Are you trying to share knowledge with others?
Do you have matrix and vector-based computations?
Are you concerned with the runtime of your application?
Would you be able to leverage the existing data analysis algorithms available within numpy?
I think you only really need to say yes to one of these questions before it would be valuable to start integrating
numpy into your work.
The time saved in not only in running your code but also being able to leverage
the robust numerical framework that will make the move over worthwhile.
Thanks for reading! If you've made it this far then you are probably interested in the material that we will be producing. We have an idea of what we believe will be most valuable to our readers, but hearing from you directly would be even better.
If you have a topic that you are struggling with, a file that you can't seem to work with, or even a dataset that just seems impossible to wrangle, then please let us know. We want to provide you with useful and practical information so you can start using Python today.