NaN Values

Introduction

NaN (Not a Number) values are ubiquitous in data science, where they are often used to represent missing or undefined data. In this tutorial, we will learn how to handle NaN values in Python using the numpy and pandas libraries.

Prerequisites

Basic Python knowledge and a basic understanding of the numpy and pandas libraries. Make sure you have numpy and pandas installed in your Python environment.

Importing Necessary Libraries

Let’s start by importing the necessary libraries.

import numpy as np
import pandas as pd

Creating NaN Values

Let’s create a numpy array and a pandas DataFrame containing NaN values.

nan_array = np.array([1, 2, np.nan, 4, 5])
nan_df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5]
})
nan_df
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0
4  5.0  5.0
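NaN compares unequal to every value, including itself, so an equality check such as `nan_array == np.nan` will not locate it. The following minimal sketch uses np.isnan and DataFrame.isna to find the missing entries in the objects created above. The objects are re-created so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

nan_array = np.array([1, 2, np.nan, 4, 5])
nan_df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5]
})

# NaN is not equal to itself, so == cannot be used to detect it
print(np.nan == np.nan)   # False

# np.isnan returns a boolean mask marking the NaN positions
mask = np.isnan(nan_array)
print(mask)               # [False False  True False False]

# DataFrame.isna() builds the same kind of mask element-wise;
# .sum() then counts the NaNs per column (one in A, one in B)
print(nan_df.isna().sum())
```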

Arithmetic Operations with NaN

Now we can see how NaN values affect arithmetic operations. For example, if we calculate the mean of the numpy array with np.mean, the result is NaN:

print("Mean of numpy array:", np.mean(nan_array))
Mean of numpy array: nan

This is because any arithmetic operation involving NaN produces NaN, so a single missing value turns the sum, and therefore the mean, into NaN. We can avoid this by using the np.nanmean function instead, which ignores NaN values:

print("Mean of numpy array ignoring NaN:", np.nanmean(nan_array))
Mean of numpy array ignoring NaN: 3.0
assert np.nanmean(nan_array) == 3, 'np.nanmean did not work correctly!'
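The following short sketch checks the propagation rule directly. It also shows that np.nanmean belongs to a family of nan-prefixed numpy reductions that skip missing entries in the same way.

```python
import numpy as np

nan_array = np.array([1, 2, np.nan, 4, 5])

# NaN propagates through ordinary arithmetic and reductions
print(np.nan + 1)            # nan
print(np.nan * 0)            # nan
print(np.sum(nan_array))     # nan

# The nan-prefixed variants ignore NaN entries instead
print(np.nansum(nan_array))  # 12.0
print(np.nanmax(nan_array))  # 5.0
```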

In pandas, the DataFrame.mean method skips NaN values by default (skipna=True), so each column mean is computed from its non-missing values only:

print("Mean of pandas DataFrame columns:\n", nan_df.mean())
Mean of pandas DataFrame columns:
 A    3.0
B    3.5
dtype: float64
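The following sketch shows what the default skipping hides. It compares skipna=True (the default) with skipna=False, and it uses DataFrame.count to show how many values each column mean is based on.

```python
import numpy as np
import pandas as pd

nan_df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5]
})

# Default: skipna=True, so NaN entries are left out of each mean
print(nan_df.mean())

# skipna=False: one NaN makes that column's mean NaN, as with np.mean
print(nan_df.mean(skipna=False))

# count() reports the number of non-NaN values per column (4 in each here)
print(nan_df.count())
```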

Add :tags: [remove-input, remove-output] to a {code-cell} to include assert statements that check everything works as expected, without showing the cell or its output on the rendered page.

For example, the following cell is executed during the build, but nothing from it appears between the two markers below:

```{code-cell}
:tags: [remove-input, remove-output]
assert nan_df.mean().equals(pd.Series({'A': 3, 'B': 3.5})), 'pandas mean function did not work correctly!'
```

start of the output

end of the output
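The hidden assert uses Series.equals, which requires exact equality of values and dtypes. One possible alternative is sketched below, assuming you prefer a readable failure message. It uses pandas' testing helper, which prints a diff on mismatch and compares floats with a tolerance by default. For scalar results, it uses np.isclose.

```python
import numpy as np
import pandas as pd

nan_df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, 5]
})

expected = pd.Series({'A': 3.0, 'B': 3.5})

# Raises AssertionError with a description of the difference on mismatch;
# float values are compared within a relative tolerance by default
pd.testing.assert_series_equal(nan_df.mean(), expected)

# For a single float, np.isclose avoids an exact == comparison
assert np.isclose(np.nanmean(nan_df['A'].to_numpy()), 3.0)
```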

Find out more about how to hide or remove output here.

Add :tags: [raises-exception] to a cell when you want to show an error on purpose. An exception raised in any other cell stops the documentation build, because conf.py sets nb_execution_raise_on_error = True.

raise ValueError("oopsie! ")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 raise ValueError("oopsie! ")

ValueError: oopsie! 
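The source does not show its conf.py. Below is a minimal sketch of the relevant setting, assuming a Sphinx project that uses the MyST-NB extension (registered as myst_nb). Only nb_execution_raise_on_error comes from the text above; the extensions line is an assumption.

```python
# conf.py: a sketch, assuming Sphinx with the MyST-NB extension
extensions = ["myst_nb"]  # assumption: how the book enables notebook execution

# Fail the build when a cell raises, except cells tagged raises-exception
nb_execution_raise_on_error = True
```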

Conclusion

In summary, handling NaN values is important when working with data in Python, because a single NaN propagates through arithmetic and can turn results such as a mean into NaN. numpy provides NaN-aware functions such as np.nanmean, and pandas methods such as DataFrame.mean skip NaN values by default.