Pandas Axis

When using pandas, particularly the drop method, you might receive an error message that refers to an ‘axis’ (explained nicely here), or you might run into the concept in the pandas documentation. But what is it? And why don’t the examples you find online work for your dataset? You might see the following:

KeyError: "['id'] not found in axis"

We will start by looking at what an axis is and what it looks like in our dataset. Then we will see that the problem isn’t with our data; we are just not formulating our request correctly, possibly led astray by the examples we’ve seen. Or you can jump straight to the solution.

Dataset from CSV File

Let’s start with the CSV file that we are creating our dataset from.

id,colour,weight,type,quantity
1,blue,finger,wool,22.0
2,green,finger,wool,4.0
3,blue,sock,wool-blend,20.0
6,black,chunky,acrylic,35.0
7,white,chunky,acrylic,11.0
10,white,finger,wool,10.0
9,yellow,sock,wool-blend,5.0
14,purple,bulky,cotton,3.0
15,opal,bulky,cotton,30

We can get some insight into its axes by printing them:

import pandas as pd

database = "database.csv"
df = pd.read_csv(database)
print(df.axes)

When we run it, we get the following

[RangeIndex(start=0, stop=9, step=1), Index(['id', 'colour', 'weight', 'type', 'quantity'], dtype='object')]

Here we can see the RangeIndex, which refers to the 9 rows, numbered from 0 to 8 (the stop value of 9 is exclusive). This numbering isn’t shown in our file, but if we print the data itself, we can see it. We can also see the Index, which is our list of columns.
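To make the two axes concrete, here is a minimal sketch that looks at each one separately. It assumes the same rows as the CSV above, held in a string so that the snippet runs without database.csv on disk; the names csv_text, row_axis and column_axis are only for illustration.

```python
import io
import pandas as pd

# Same rows as the CSV file above, kept in memory so the example is self-contained.
csv_text = """id,colour,weight,type,quantity
1,blue,finger,wool,22.0
2,green,finger,wool,4.0
3,blue,sock,wool-blend,20.0
6,black,chunky,acrylic,35.0
7,white,chunky,acrylic,11.0
10,white,finger,wool,10.0
9,yellow,sock,wool-blend,5.0
14,purple,bulky,cotton,3.0
15,opal,bulky,cotton,30
"""
df = pd.read_csv(io.StringIO(csv_text))

# df.axes is a two-item list: axis 0 holds the row labels, axis 1 the column names.
row_axis, column_axis = df.axes

print(list(row_axis))     # the row labels pandas generated for us
print(list(column_axis))  # the column names taken from the CSV header
```

The row labels are also available as df.index and the column names as df.columns.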

import pandas as pd

database = "database.csv"
df = pd.read_csv(database)
print(df)

   id  colour  weight        type  quantity
0   1    blue  finger        wool      22.0
1   2   green  finger        wool       4.0
2   3    blue    sock  wool-blend      20.0
3   6   black  chunky     acrylic      35.0
4   7   white  chunky     acrylic      11.0
5  10   white  finger        wool      10.0
6   9  yellow    sock  wool-blend       5.0
7  14  purple   bulky      cotton       3.0
8  15    opal   bulky      cotton      30.0

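So, the KeyError above does not mean our data is wrong; it means we are not using the method correctly. In this case, it was generated by misusing the drop method. With df = df.drop(index=idField, columns=index) we get KeyError: "['id'] not found in axis". The index argument expects row labels (0 to 8 here), but it was given the column name id, so pandas looked for a row labelled 'id' and found none.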
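The error can be reproduced on a much smaller frame. The sketch below uses a cut-down, made-up version of the data (three rows, two columns) and catches the exception so the script keeps running; message and without_id are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "colour": ["blue", "green", "blue"],
})

# The row labels are 0, 1, 2. "id" is a column name, not a row label,
# so asking drop to remove a row labelled "id" fails.
try:
    df.drop(index="id")
    message = None
except KeyError as err:
    message = str(err)

print(message)

# The same label is found when pandas is told to look along the columns.
without_id = df.drop(columns="id")
print(list(without_id.columns))
```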

Manually Created Data Set

Most of the examples we see online for pandas are using a manually created data set, or at least a data set that may not be created in the same way ours is. Is this the problem?

import pandas as pd

d = {'id': [0, 1, 2, 3], 'colour': pd.Series(['blue', 'green', 'yellow', 'white'], index=[0, 1, 2, 3]),
     'weight': pd.Series(['chunky', 'sport', 'fine', 'sock'], index=[0, 1, 2, 3]),
     'type': pd.Series(['wool', 'wool', 'cotton', 'bamboo'], index=[0, 1, 2, 3]),
     'quantity': pd.Series([5, 10, 20, 30], index=[0, 1, 2, 3])}

df = pd.DataFrame(data=d, index=[0, 1, 2, 3])

print(df)

print("----")
print(df.axes)

We see more clearly here the use of pd.Series, a one-dimensional labelled array built on top of NumPy’s N-dimensional array, which is where the notion of an axis comes from.

When we run this, we see axes very similar to the ones we got from reading in our CSV file. So, the dataset is not the issue.

   id  colour  weight    type  quantity
0   0    blue  chunky    wool         5
1   1   green   sport    wool        10
2   2  yellow    fine  cotton        20
3   3   white    sock  bamboo        30
----
[Index([0, 1, 2, 3], dtype='int64'), Index(['id', 'colour', 'weight', 'type', 'quantity'], dtype='object')]
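As a quick check that a hand-built frame behaves the same way, the sketch below uses a shortened version of the dictionary above (two columns only, an assumption made for brevity) and tries the same kind of drop call.

```python
import pandas as pd

# A hand-built frame, shortened from the example above.
df = pd.DataFrame(
    {"id": [0, 1, 2, 3], "colour": ["blue", "green", "yellow", "white"]},
    index=[0, 1, 2, 3],
)

row_labels = list(df.index)
column_names = list(df.columns)

# "id" is still a column name rather than a row label, so this fails as before.
try:
    df.drop(index="id")
    raised = False
except KeyError:
    raised = True

print(row_labels, column_names, raised)
```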

The Solution

If you look carefully at the drop examples, they tend to remove entire columns from the dataset. Even the ones that look like they are removing rows, just as we are trying to do, are not quite doing what they seem. In our example, while we think of the id as an index, it isn’t what pandas is looking for here: id is an ordinary column, and the row index is the 0 to 8 numbering pandas generated for us. The drop() method works on labels. It removes rows whose index labels match, or columns whose names match, and it does not search the values inside a column. For our use case, we need to first find the row(s) of interest and then pass their index labels to drop. Of course, this could be done in a single request, but we’ve split it out for readability.

Find the row in the database and drop it using its (native) index. Note that drop returns a new DataFrame, so the result has to be assigned back.
- theRow = df.loc[df[idField] == int(index)]
- df = df.drop(theRow.index)
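Put together, the two steps might look like the sketch below. The frame is a cut-down, made-up version of the data, and the values given to idField and index are assumptions for illustration (index arrives as text here, which is why it goes through int()). A one-step variant using a boolean filter is included for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 6],
    "colour": ["blue", "green", "blue", "black"],
    "quantity": [22.0, 4.0, 20.0, 35.0],
})

idField = "id"  # the column holding our own identifiers
index = "3"     # hypothetical: the id of the row to remove, supplied as text

# Two steps: find the matching row(s), then drop them by their row labels.
theRow = df.loc[df[idField] == int(index)]
two_step = df.drop(theRow.index)

# One step: keep every row whose id does not match.
one_step = df[df[idField] != int(index)]

print(two_step)
```

Both versions leave the original df untouched and keep the remaining rows under their original row labels.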