Filtering and aggregating data with NumPy means selecting the array elements that meet a condition and computing summary values such as the sum, mean or minimum. Both kinds of operation work on whole arrays at once, so numerical data can be analyzed with a few NumPy function calls and no explicit Python loops.
Filtering Data
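Filtering data in NumPy is done with boolean indexing. A condition applied to an array produces a boolean mask, and indexing the array with that mask returns a new array containing only the elements where the mask is True. When the mask has the same shape as the array, the result is a flattened 1-D array. The examples below use random arrays, so the printed output will differ from run to run.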
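Because the result is a new array, boolean indexing returns a copy rather than a view of the original data. The short sketch below checks this on a small fixed array (the values are chosen only for illustration): it modifies the filtered result and confirms that the original array is unchanged.

```python
import numpy as np

# Small fixed array (illustrative values) so the result is reproducible
arr = np.array([4, 9, 2, 7])

selected = arr[arr > 5]   # boolean indexing returns a new array (a copy)
selected[0] = 100         # modify the copy only

print(selected)  # [100   7]
print(arr)       # [4 9 2 7]  (original is unchanged)
```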
1. Values Above a Limit: This operation selects all elements whose values are greater than a given number.
- np.random.randint(1, 11, size=(5,5)) generates random integers from 1 (inclusive) to 11 (exclusive)
- size=(5,5) creates a 5×5 NumPy array
- arr > 5 creates a boolean mask where values greater than 5 are marked True
- arr[arr > 5] extracts and returns only the elements that satisfy the condition
import numpy as np
arr = np.random.randint(1, 11, size=(5,5))
print(arr[arr>5])
Output
[ 7 6 8 10 6 9 10 9 6 6]
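To see the intermediate mask, it helps to replace the random array with a small fixed one (an assumption made here only so the output is reproducible). The mask has the same shape as the array, and indexing with it returns the selected values in row-major order.

```python
import numpy as np

# Fixed array (illustrative) instead of np.random.randint
arr = np.array([[3, 8, 1],
                [6, 2, 9]])

mask = arr > 5          # boolean array with the same shape as arr
print(mask)
# [[False  True False]
#  [ True False  True]]

print(arr[mask])        # selected values, read row by row
# [8 6 9]
```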
2. Even-Valued Elements: This approach extracts only those elements that are evenly divisible by 2.
- arr % 2 == 0 checks for even values and arr[arr % 2 == 0] returns only even elements.
import numpy as np
arr = np.random.randint(1, 11, size=(5,5))
print(arr[arr % 2 == 0])
Output
[ 4 2 2 4 2 10 10 4 8 8 4 2 6]
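A minimal sketch with a fixed array (assumed for reproducibility) shows the even-value filter together with a count of how many elements matched; np.count_nonzero counts the True entries of the mask.

```python
import numpy as np

# Fixed array (illustrative) instead of random data
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

is_even = arr % 2 == 0
print(arr[is_even])               # [2 4 6]
print(np.count_nonzero(is_even))  # 3  (number of even elements)
```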
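3. Multiple Conditions Combined: This method selects elements that satisfy more than one condition at the same time.
- (arr > 5) & (arr % 2 == 0) combines the two masks with element-wise AND; the parentheses are required because & binds more tightly than > and ==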
import numpy as np
arr = np.random.randint(1, 11, size=(5,5))
print(arr[(arr > 5) & (arr % 2 == 0)])
Output
[ 6 8 10 8 10]
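The sketch below, using a fixed array chosen for illustration, shows that the & operator and np.logical_and give the same result, and what happens when the parentheses are left out: Python then parses the expression as a chained comparison, which needs a single True/False value and raises ValueError for a multi-element array.

```python
import numpy as np

arr = np.array([[3, 6, 7],
                [8, 10, 4]])

# Two equivalent ways to combine the conditions element-wise
both_ops = arr[(arr > 5) & (arr % 2 == 0)]
both_fn = arr[np.logical_and(arr > 5, arr % 2 == 0)]
print(both_ops)  # [ 6  8 10]
print(both_fn)   # [ 6  8 10]

# Without parentheses, & is evaluated before > and ==, producing a
# chained comparison that cannot reduce an array to one bool
try:
    arr[arr > 5 & arr % 2 == 0]
except ValueError as err:
    print("ValueError:", err)
```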
4. Divisibility-Based Selection: This technique selects elements divisible by at least one of the specified numbers.
- (arr % 3 == 0) | (arr % 7 == 0) combines the two masks with element-wise OR, so an element is kept if it is divisible by 3, by 7, or by both
import numpy as np
arr = np.random.randint(1, 11, size=(5,5))
print(arr[(arr % 3 == 0) | (arr % 7 == 0)])
Output
[9 9 9 9 6 7 3 3 9]
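On a fixed range of integers (assumed here so the result is predictable), the | operator and np.logical_or select the same elements:

```python
import numpy as np

arr = np.arange(1, 11)   # integers 1 through 10

by_op = arr[(arr % 3 == 0) | (arr % 7 == 0)]
by_fn = arr[np.logical_or(arr % 3 == 0, arr % 7 == 0)]
print(by_op)  # [3 6 7 9]
print(by_fn)  # [3 6 7 9]
```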
5. Boolean Mask from Another Array: This method uses a separate boolean array to select specific rows.
- arr1 is a boolean array with one entry per row (length 5), and arr[arr1] returns the rows whose entry is True (rows 0, 2 and 4)
import numpy as np
arr = np.random.randint(1, 11, size=(5, 5))
arr1 = np.array([True, False, True, False, True])
print(arr[arr1])
Output
[[ 9  5  2  5  8]
 [10  3  1  5  3]
 [ 5 10  1  6  4]]
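In practice the row mask is often computed from the data rather than typed by hand. A minimal sketch, assuming we want the rows whose first column is greater than 5 (the array and the condition are illustrative):

```python
import numpy as np

arr = np.array([[9, 5, 2],
                [1, 3, 8],
                [7, 10, 1]])

row_mask = arr[:, 0] > 5   # one boolean per row: [ True False  True]
print(arr[row_mask])
# [[ 9  5  2]
#  [ 7 10  1]]
```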
6. Condition Applied to a Single Row: This approach filters elements from a specific row based on a condition.
- arr[2, :] selects the third row (index 2)
- arr[2, :] > 5 builds a boolean mask for that row
- arr[2][...] applies the mask to the row and returns only the matching values
import numpy as np
arr = np.random.randint(1,11,size=(5,5))
filtered_arr = arr[2][arr[2,:] > 5]
print(filtered_arr)
Output
[6 8 7]
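The same pattern works for a column, and the integer and boolean indices can also be combined in a single indexing step. A sketch on a fixed array (values chosen for illustration):

```python
import numpy as np

arr = np.array([[1, 6, 3],
                [8, 2, 9],
                [4, 7, 5]])

row = arr[2]               # third row: [4 7 5]
print(row[row > 5])        # [7]

col = arr[:, 1]            # second column: [6 2 7]
print(col[col > 5])        # [6 7]

# Row index and boolean mask combined in one step
print(arr[2, arr[2] > 5])  # [7]
```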
Aggregating Data
Aggregation in NumPy refers to computing summary statistics over arrays. Functions such as sum, mean, standard deviation, minimum and maximum help analyze data across the entire array or along specific axes.
1. Total Sum: This operation calculates the sum of all elements in an array and also demonstrates how summation works along rows and columns.
- np.sum(arr) computes the sum of all elements in the array
- axis=0 sums down each column (collapsing the rows), giving one total per column
- axis=1 sums across each row (collapsing the columns), giving one total per row
import numpy as np
arr = np.array([1,2,3,4,5])
print(np.sum(arr))
arr = np.array([[1,2,3],[4,5,6]])
print(np.sum(arr))
print(np.sum(arr, axis = 0))
print(np.sum(arr, axis = 1))
Output
15
21
[5 7 9]
[ 6 15]
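Filtering and aggregation combine naturally: np.sum can total only the values that pass a mask, and summing the mask itself counts the matches because True is treated as 1. A minimal sketch on the same 2×3 array, with the threshold 2 chosen for illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

mask = arr > 2
print(np.sum(arr[mask]))     # 18  (3 + 4 + 5 + 6)
print(np.sum(mask))          # 4   (True is summed as 1)
print(np.sum(mask, axis=1))  # [1 3]  matches per row
```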
2. Average Value: This operation computes the mean (average) of array elements across the entire array or along a specified axis.
- np.mean(arr) calculates the average of all values; with axis=0 or axis=1 it averages each column or each row instead
import numpy as np
arr = np.array([1,2,3,4,5])
print(np.mean(arr))
arr = np.array([[1,2,3],[4,5,6]])
print(np.mean(arr))
print(np.mean(arr, axis = 0))
print(np.mean(arr, axis = 1))
Output
3.0
3.5
[2.5 3.5 4.5]
[2. 5.]
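The mean along an axis is the sum along that axis divided by the number of elements summed. The sketch below checks this against np.mean and also averages only the even values (the filter is an illustrative choice):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Column sums divided by the number of rows
manual = np.sum(arr, axis=0) / arr.shape[0]
print(manual)                                     # [2.5 3.5 4.5]
print(np.allclose(manual, np.mean(arr, axis=0)))  # True

# Mean of only the values that pass a filter
print(np.mean(arr[arr % 2 == 0]))                 # 4.0  (mean of 2, 4, 6)
```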
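3. Spread of Values: This operation uses the standard deviation to measure how far the values in the array spread out from their mean.
- np.std(arr) calculates the standard deviation of all elements; by default it is the population standard deviation (ddof=0), and passing ddof=1 gives the sample standard deviation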
import numpy as np
arr = np.array([1,2,3,4,5])
print(np.std(arr))
arr = np.array([[1,2,3],[4,5,6]])
print(np.std(arr))
print(np.std(arr, axis = 0))
print(np.std(arr, axis = 1))
Output
1.4142135623730951
1.707825127659933
[1.5 1.5 1.5]
[0.81649658 0.81649658]
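As a check on what np.std computes, the sketch below rebuilds the population standard deviation from its definition, the square root of the mean squared deviation from the mean, and compares it with the ddof=1 (sample) variant, which divides by n - 1 instead of n:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Population standard deviation from its definition
manual = np.sqrt(np.mean((arr - np.mean(arr)) ** 2))
print(np.isclose(manual, np.std(arr)))  # True

# Sample standard deviation: squared deviations divided by (n - 1)
print(np.std(arr, ddof=1))              # about 1.5811 (square root of 2.5)
```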
4. Smallest and Largest Values: These operations identify the minimum and maximum values present in the array.
- np.min(arr) returns the smallest value in the array
- np.max(arr) returns the largest value in the array
import numpy as np
arr = np.array([1,2,3,4,5])
print(np.min(arr))
arr = np.array([[1,2,3],[4,5,6]])
print(np.min(arr))
print(np.min(arr, axis = 0))
print(np.min(arr, axis = 1))
print('-'*20)
arr = np.array([1,2,3,4,5])
print(np.max(arr))
arr = np.array([[1,2,3],[4,5,6]])
print(np.max(arr))
print(np.max(arr, axis = 0))
print(np.max(arr, axis = 1))
Output
1
1
[1 2 3]
[1 4]
--------------------
5
6
[4 5 6]
[3 6]
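When the position of the extreme value matters as well as the value itself, np.argmin and np.argmax return indices instead. A short sketch on the same 2×3 array; without an axis, np.argmax indexes the flattened array, and np.unravel_index converts that flat index back to a (row, column) pair:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(np.argmax(arr, axis=0))  # [1 1 1]  row index of each column's maximum
print(np.argmin(arr, axis=1))  # [0 0]    column index of each row's minimum

# Flat index of the overall maximum, converted back to (row, col)
row, col = np.unravel_index(np.argmax(arr), arr.shape)
print(int(row), int(col))      # 1 2
```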