Lists, arrays, and dataframes are some of the most useful data types available in Python. Learning how to handle these structures is important for anyone coding in Python. In this post, we'll learn how to access values with basic slicing. Slicing simply means selecting a subset of the data.
First, let's start with the list. The slicing operations on the built-in list set up the fundamentals for dealing with arrays and dataframes.
# Make a list
alphabet = 'abcdefghijklmnopqrstuvwxyz'
abcs = list(alphabet)
abcs
> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
To select an element in the list, we use square brackets with an index value. Note that indexing in Python starts at 0.
# Select the first (0th) element in the list
abcs[0]
> 'a'

# Select the third element in the list at index=2
abcs[2]
> 'c'

# Select the 2nd to the last element with index=-2
abcs[-2]
> 'y'
What about if we need to select multiple elements? We can use the start and stop indexes!
# Select the 3rd to 5th elements in the list (indexes 2, 3, and 4)
# Remember the zeroth item starts the list
abcs[2:5]
> ['c', 'd', 'e']

# Check the element at index=5
abcs[5]
> 'f'

The element at index=5 is f, but our slice with stop=5 stopped at e. This is because a slice stops at the stop index and does not return the value at that index.
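Excluding the stop index has a couple of handy consequences. Here is a small sketch of them, using a made-up list called letters that isn't part of the examples above:

```python
letters = list("abcdefghij")  # hypothetical sample data

start, stop = 2, 5
chunk = letters[start:stop]

# The length of a slice is simply stop - start.
assert len(chunk) == stop - start

# Splitting at any index k gives two pieces that rebuild the original.
k = 4
assert letters[:k] + letters[k:] == letters

# Adjacent slices share a boundary without overlapping.
first, second = letters[0:3], letters[3:6]
print(first, second)
```

Because the stop value is left out, the end of one slice can be reused as the start of the next without counting anything twice.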
We can also do the slice with a negative index.
abcs[-24:-21] > ['c', 'd', 'e']
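If negative indexes feel like magic, it may help to see that each one maps to a regular index counted from the front. The sketch below rebuilds the alphabet list so it can run on its own; the names n, out_of_range, and clamped are just for illustration.

```python
abcs = list("abcdefghijklmnopqrstuvwxyz")
n = len(abcs)

# A negative index i refers to the same position as len(list) + i.
assert abcs[-2] == abcs[n - 2] == "y"

# The same conversion applies to both ends of a slice.
assert abcs[-24:-21] == abcs[n - 24 : n - 21] == abcs[2:5]

# Indexing past the end raises an error, but slicing quietly clamps.
try:
    abcs[100]
except IndexError:
    out_of_range = True
else:
    out_of_range = False

clamped = abcs[20:100]
print(out_of_range, clamped)
```

The clamping behavior is worth remembering: a slice never raises an IndexError, it just returns whatever part of the range exists.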
Let's look at how we could grab everything except the last 10 elements of a list.
# Leave off the last 10 elements # Calculate which index to stop at abcs[0:len(abcs)-10] > ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']
Well, not only is that difficult to understand, but I also think we might have gotten lucky due to the length of the list. What happens when we try this on a smaller list? If a list only has 6 elements and I want to exclude the last 10, then I expect to get an empty list.

abcdef = list('abcdef')
abcdef[0:len(abcdef) - 10]
> ['a', 'b']
Yikes! That isn't what we wanted.
With a smaller list, we ended up asking for a stop index of -4 (that is, 6 - 10), which Python counts from the end of the list.

# Account for smaller lists
abcdef[0:max(0, len(abcdef) - 10)]
> []
Ahh! We just made it more confusing! With negative indexes, we can easily read the code and have the desired results.
Note, we can leave the start or stop index empty to select the rest of the list.
# Start from 0 and leave off the last 10 elements
abcdef[:-10]
> []

# Grab everything except the last 10
abcs[:-10]
> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']

# Return only the last 10 items
abcs[-10:]
> ['q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
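One edge case to watch for when the count comes from a variable: -0 is just 0, so "leave off the last 0 items" does not do what the pattern suggests. A minimal sketch of two hypothetical helpers (drop_last and take_last are names made up for this example) that guard against it:

```python
def drop_last(items, n):
    """Return a copy of items without its last n elements (hypothetical helper)."""
    if n <= 0:
        # items[:-0] is items[:0], which is empty, so handle zero separately.
        return items[:]
    return items[:-n]


def take_last(items, n):
    """Return the last n elements of items (hypothetical helper)."""
    if n <= 0:
        # items[-0:] is items[0:], the whole sequence, so handle zero separately.
        return items[:0]
    return items[-n:]


abcdef = list("abcdef")
print(drop_last(abcdef, 2))
print(drop_last(abcdef, 10))
print(drop_last(abcdef, 0))
print(take_last(abcdef, 2))
```

For any n of 1 or more, the plain negative-index slices shown earlier are all you need.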
List slicing has one more option and that is adding a step size.
By default the step size is 1, meaning Python will move from start to stop 1 element at a time, as we saw above.
But, maybe you want to select every third element in the list.
# From the zeroth item, move forward 3 items,
# selecting every third element with step=3
abcs[::3]
> ['a', 'd', 'g', 'j', 'm', 'p', 's', 'v', 'y']
With the above example, we left both start and stop indexes blank. This simply means that we will start at the 0th element and go until we reach the end of the list.
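To see what Python fills in for the blank values, we can lean on the built-in slice object and its indices() method. This is only a sketch for peeking under the hood; the variable names are made up for the example.

```python
abcs = list("abcdefghijklmnopqrstuvwxyz")

# abcs[::3] is shorthand for indexing with a slice object.
every_third = slice(None, None, 3)
assert abcs[every_third] == abcs[::3]

# slice.indices(length) reports the concrete (start, stop, step)
# that Python uses for a sequence of that length.
resolved = every_third.indices(len(abcs))
print(resolved)  # (0, 26, 3)

# range() with the resolved values visits the same indexes.
picked = [abcs[i] for i in range(*resolved)]
assert picked == abcs[::3]
```

A named slice object can also make code easier to read when the same selection is reused in several places.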
# Produce a list of character pairs
# like ['ab', 'cd', ... 'yz']
evens = abcs[::2]
odds = abcs[1::2]
pairs = [first + second for first, second in zip(evens, odds)]
pairs
> ['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
We were able to pull off some pretty complex operations in a few simple, readable lines.
We produced a list of evens by starting at index 0 and moving 2 spots at a time until the end, and a list of odds by starting at index 1 and doing the same. Maybe you haven't seen zip before. This built-in function returns items from the same index for each list. So we get the 0th indexed item from evens ('a') and from odds ('b'), and then the next items in order ('c', 'd').
You could imagine having start and stop times in a list and needing to compute the difference.
Splitting the list by odds/evens and using zip would be a straightforward approach!
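Here is a minimal sketch of that idea. The times list is made-up data, assumed to alternate start, stop, start, stop, and so on, in seconds.

```python
# Hypothetical timestamps in seconds, alternating start, stop, start, stop...
times = [0, 5, 10, 12, 20, 27]

starts = times[::2]   # items at even indexes: 0, 10, 20
stops = times[1::2]   # items at odd indexes: 5, 12, 27

# Pair each start with its stop and take the difference.
durations = [stop - start for start, stop in zip(starts, stops)]
print(durations)  # [5, 2, 7]
```

Keep in mind that zip stops at the shorter input, so a trailing start time without a matching stop would be silently dropped.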
Now we can do a neat, and sometimes very useful, operation: reversing the items in a Python list!
abcs[::-1] > ['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm', 'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
Slicing is a powerful concept that also works with strings! Reversing the alphabet or a string of words works just the same.
alphabet[::-1] > 'zyxwvutsrqponmlkjihgfedcba'
'hey whats going on'[::-1] > 'no gniog stahw yeh'
So what happened above? We left the start and stop indexes blank, and from above we know that means we will be operating on the entire list. Then the step size is -1. That means we will step through the list backwards! That results in a reversed list.
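For the curious, here is a short sketch of what the blank start and stop resolve to when the step is negative, along with two other built-in ways to reverse a list. It uses a shortened five-letter list to keep the output small.

```python
abcs = list("abcde")

# With a negative step, the blank start and stop default to the
# last element and "one before the first" respectively.
print(slice(None, None, -1).indices(len(abcs)))  # (4, -1, -1)

# The slice builds a new list and leaves the original untouched.
backwards = abcs[::-1]
assert backwards == ["e", "d", "c", "b", "a"]
assert abcs == ["a", "b", "c", "d", "e"]

# reversed() gives the same order lazily, and list.reverse() works in place.
assert list(reversed(abcs)) == backwards
abcs.reverse()
assert abcs == backwards
```

The slice is the only one of the three that also works on strings, since strings have no reverse() method and reversed() returns an iterator rather than a string.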
Slicing is pretty easy, but those were just 1D lists. How about something more complicated? Let's take a look at 2D lists.
Note that this is actually a nested list. Python sees a 2D list as a list in a list.
# Make a 2D list
abcABCs = [['a', 'b', 'c', 'd', 'e'],
           ['A', 'B', 'C', 'D', 'E']]
abcABCs
> [['a', 'b', 'c', 'd', 'e'], ['A', 'B', 'C', 'D', 'E']]

# Select the first row
abcABCs[0]
> ['a', 'b', 'c', 'd', 'e']

# Select the first element of the first row
abcABCs[0][0]
> 'a'
It's important to note that the above operation is really 2 operations. You may be tempted to read this as first slicing all the rows and then slicing the column indexes, but it isn't! What happened is that we initially grabbed just the first list.
Then on that list, we selected the 0th element. Here is the more explicit way to write this.
first_list = abcABCs[0]
just_a_item = first_list[0]
just_a_item
> 'a'
Why does this distinction matter? Let's take a look at 2D slicing to answer that. Our target is to retrieve the first two elements of the first two rows. So the output we are aiming for is:
[['a', 'b'], ['A', 'B']]
abcABCs[0:2][0:2] > [['a', 'b', 'c', 'd', 'e'], ['A', 'B', 'C', 'D', 'E']]
Hmm... well, that didn't work. That's because of the point made earlier: Python doesn't actually understand that the structure we made is what we consider to be 2-dimensional.
Remember that what we are doing above is actually 2 operations. Let's look at each operation to understand what just happened.
The first operation, abcABCs[0:2], says to select the first 2 elements of the list.
The first two elements are the 2 nested lists(!), so we've just selected everything!
So if we just selected everything and then do the exact same operation, then we once again select everything!
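Writing the two operations out one at a time makes this easy to check. The names step_one and step_two below are only there for illustration.

```python
abcABCs = [['a', 'b', 'c', 'd', 'e'],
           ['A', 'B', 'C', 'D', 'E']]

# Operation 1: slice the outer list. It only has 2 elements (the rows),
# so this returns a new list holding both rows.
step_one = abcABCs[0:2]
assert step_one == abcABCs

# Operation 2: slice that result again. It is still a list of 2 rows,
# so nothing changes.
step_two = step_one[0:2]
assert step_two == abcABCs

# The inner lists are never sliced at any point.
print(step_two)
```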
So this should make it clear that what we have isn't really 2D, it is just a list of lists. We could get our desired output by looping over the lists and selecting the first 2 elements of each.
[['a', 'b'], ['A', 'B']]
[sub_alphabet[0:2] for sub_alphabet in abcABCs] > [['a', 'b'], ['A', 'B']]
Well, we got what we wanted, but I wouldn't call that convenient.
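If plain nested lists are all you have, one option is to wrap the loop in a small function. The slice_2d helper below is hypothetical, written just for this example, and it assumes every row is itself a list.

```python
def slice_2d(rows, row_slice, col_slice):
    """Apply row_slice to the outer list and col_slice to each inner list.

    Hypothetical helper for a list of lists.
    """
    return [row[col_slice] for row in rows[row_slice]]


abcABCs = [['a', 'b', 'c', 'd', 'e'],
           ['A', 'B', 'C', 'D', 'E']]

corner = slice_2d(abcABCs, slice(0, 2), slice(0, 2))
print(corner)  # [['a', 'b'], ['A', 'B']]

# A single column needs a plain index on each row instead of a slice.
first_column = [row[0] for row in abcABCs]
print(first_column)  # ['a', 'A']
```

It works, but every new kind of selection means more hand-written looping.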
This is where a module like numpy can be very helpful!
import numpy as np
abcs = np.array(list(alphabet))
abcs
> array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], dtype='<U1')
Now we've turned our list into a numpy array. The good news is that everything we learned above still works. We don't need to go through everything again, but here are a couple of checks.
# Select a subset of the array
abcs[2:5]
> array(['c', 'd', 'e'], dtype='<U1')

# Reverse the array
abcs[::-1]
> array(['z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm', 'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'], dtype='<U1')
So when it comes to slicing syntax, the 1D array behaves just like a list. How about our 2D example?
# Make a 2D array
abcABCs = np.array([['a', 'b', 'c', 'd', 'e'],
                    ['A', 'B', 'C', 'D', 'E']])
abcABCs
> array([['a', 'b', 'c', 'd', 'e'], ['A', 'B', 'C', 'D', 'E']], dtype='<U1')
Now we can get our desired result by using numpy's multidimensional indexing. Lucky for us, this will look very familiar next to the syntax we used for lists!
So with the same structure, we simply add another set of indexes to the slice operation, separated by a comma. Now we can get our desired output of [['a', 'b'], ['A', 'B']]:
abcABCs[0:2, 0:2] > array([['a', 'b'], ['A', 'B']], dtype='<U1')
Nice! We finally got what we wanted, and we didn't have to do any crazy looping just to slice the array.
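To make the difference from nested lists concrete, here is a short sketch comparing the comma form with chained brackets. It assumes numpy is installed and uses a fresh array named grid so it doesn't depend on the variables above.

```python
import numpy as np

grid = np.array([['a', 'b', 'c', 'd', 'e'],
                 ['A', 'B', 'C', 'D', 'E']])

# The comma-separated form indexes rows and columns in one operation.
corner = grid[0:2, 0:2]
assert corner.tolist() == [['a', 'b'], ['A', 'B']]

# It is equivalent to indexing with a tuple of slice objects.
same_corner = grid[(slice(0, 2), slice(0, 2))]
assert np.array_equal(corner, same_corner)

# Chained brackets still behave like the nested-list version:
# two separate slices of the rows.
chained = grid[0:2][0:2]
assert chained.shape == (2, 5)

# A whole column is also easy to pull out.
first_column = grid[:, 0]
print(first_column)  # ['a' 'A']
```

So the comma is the part doing the work: one pair of brackets with one slice per dimension, rather than one pair of brackets per slice.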
The last thing to note about numpy is that it is n-dimensional. No need to stick to the easily understandable 2D array. We can move into the much more confusing world of 3D and beyond!
# Build a 3D array
abcABC3D = np.array([
    [['a', 'b', 'c', 'd'], ['A', 'B', 'C', 'D']],
    [['one', 'two', 'three', 'four'], ['1', '2', '3', '4']]])

# Slice the 3D array to grab the
# first two elements of each dimension
abcABC3D[0:2, 0:2, 0:2]
> array([[['a', 'b'], ['A', 'B']], [['one', 'two'], ['1', '2']]], dtype='<U5')
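If you want to keep everything along the leading dimensions and only slice the last one (here with :2), then you can take advantage of an ellipsis. An ellipsis is three periods in a row (...), and numpy expands it into as many full slices (:) as are needed to cover the remaining dimensions. The result below matches the previous one only because the first two dimensions of this array have exactly 2 elements each.

# Keep everything along the first two dimensions and
# slice the first two elements of the last dimension
abcABC3D[..., :2]
> array([[['a', 'b'], ['A', 'B']], [['one', 'two'], ['1', '2']]], dtype='<U5')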
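An array whose dimensions all have different lengths shows what the ellipsis really does. This is a sketch that assumes numpy is installed; cube is a made-up array of the numbers 0 through 23.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # hypothetical array with unequal dimensions

# The ellipsis stands in for ':' on every dimension not listed explicitly.
assert np.array_equal(cube[..., :2], cube[:, :, :2])
assert cube[..., :2].shape == (2, 3, 2)

# That is not the same as slicing every dimension with 0:2.
assert cube[0:2, 0:2, 0:2].shape == (2, 2, 2)

# It can also sit at the end to leave the trailing dimensions whole.
assert np.array_equal(cube[0, ...], cube[0, :, :])
print(cube[..., :2].shape)
```

The ellipsis is mostly a convenience for arrays with many dimensions, where typing out every colon gets tedious.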
Slicing is a powerful concept in Python.
Simple, readable access to the data empowers users to explore what might be possible.
With a few references to the code above, you will have mastered slicing and the tools that come along with it: start:stop indexing, negative index locations, and changing the final step size.
You'll find slicing coming up in your work with lists, tuples, arrays, dataframes, and even strings. Let us know if we missed something you were looking for!
Thanks for reading! If you've made it this far then you are probably interested in the material that we will be producing. We have an idea of what we believe will be most valuable to our readers, but hearing from you directly would be even better.
If you have a topic that you are struggling with, a file that you can't seem to work with, or even a dataset that just seems impossible to wrangle, then please let us know. We want to provide you with useful and practical information so you can start using Python today.