Python: Pandas: comparing pandas.DataFrame.where and numpy.where
pandas.DataFrame.where
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html?pandas.DataFrame.where
DataFrame.where(cond, other=_NoDefault.no_default, *, inplace=False, axis=None, level=None)
Replace values where the condition is False.
Parameters:
cond: bool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).
other: scalar, Series/DataFrame, or callable
Entries where cond is False are replaced with the corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return a scalar or Series/DataFrame. The callable must not change the input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).
inplace: bool, default False
Whether to perform the operation in place on the data.
axis: int, default None
Alignment axis if needed. For Series this parameter is unused and defaults to 0.
level: int, default None
Alignment level if needed.
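As a minimal sketch of the callable forms of cond and other described above (the small frame and the lambdas are made up for illustration, not taken from the pandas docs):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1, -2, 3], "B": [-4, 5, -6]})

# cond as a callable: it receives df and returns a boolean DataFrame.
# No `other` is given, so failing entries become NaN.
kept = df.where(lambda d: d > 0)

# other as a callable: it also receives df and returns the replacement values.
flipped = df.where(lambda d: d > 0, lambda d: -d)

print(kept)
print(flipped)
```

Both callables only read from the frame they are given, which matches the requirement that they must not change the input.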
Returns:
Same type as caller, or None if inplace=True.
See also
DataFrame.mask: Replace values where the condition is True. Returns an object of the same shape as self.
Notes
The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of other does not align with the axis of the cond Series/DataFrame, the misaligned index positions will be filled with False.
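The alignment behaviour is easy to miss, so here is a small sketch of it. The Series and conditions are hypothetical; the point is only that a Series cond is matched by label, missing labels count as False, and a plain array cond is applied by position:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Same labels as s, but in a different order.
cond = pd.Series([False, True, True], index=["c", "b", "a"])

# A Series condition is aligned on the index labels first.
by_label = s.where(cond, 0)

# A plain array has no labels, so it is applied position by position.
by_position = s.where(cond.to_numpy(), 0)

# Labels missing from the condition ("a" and "c" here) are treated as False.
partial = s.where(pd.Series([True], index=["b"]), 0)

print(by_label.tolist())     # [1, 2, 0]
print(by_position.tolist())  # [0, 2, 3]
print(partial.tolist())      # [0, 2, 0]
```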
The signature for DataFrame.where() differs from numpy.where(). Roughly, df1.where(m, df2) is equivalent to np.where(m, df1, df2).
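The word "roughly" matters here: the values agree, but the return types do not. A short sketch with made-up frames (df1, df2 and m are illustrative names following the note above):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])
df2 = -df1
m = df1 % 2 == 0  # True where the value is even

via_pandas = df1.where(m, df2)     # DataFrame, index and columns preserved
via_numpy = np.where(m, df1, df2)  # plain ndarray, labels are dropped

# To get labels back from the NumPy result, rebuild the frame by hand.
rebuilt = pd.DataFrame(via_numpy, index=df1.index, columns=df1.columns)

print(type(via_pandas).__name__, type(via_numpy).__name__)
print(rebuilt.equals(via_pandas))
```

Note also the argument order: in DataFrame.where the caller plays the role of x, while np.where takes the condition first and both value arrays after it.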
For further details and examples see the where documentation in indexing.
The dtype of the object takes precedence. The fill value is cast to the object’s dtype, if this can be done losslessly.
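A quick way to see the dtype rule in action, as a sketch with a hypothetical integer Series: a fill value that fits the existing dtype keeps it, while a value that does not fit (or the default NaN) forces an upcast to float.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])  # integer dtype

same_dtype = s.where(s > 1, 10)   # 10 fits the integer dtype, no upcast
upcast = s.where(s > 1, 10.5)     # 10.5 cannot be stored as an integer
default_fill = s.where(s > 1)     # default fill is NaN, which needs float

print(same_dtype.dtype, upcast.dtype, default_fill.dtype)
```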
Examples
s = pd.Series(range(5))
s.where(s > 0)
Out:
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
s.mask(s > 0)
Out:
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
s = pd.Series(range(5))
t = pd.Series([True, False])
s.where(t, 99)
Out:
0     0
1    99
2    99
3    99
4    99
dtype: int64
s.mask(t, 99)
Out:
0    99
1     1
2    99
3    99
4    99
dtype: int64
s.where(s > 1, 10)
Out:
0    10
1    10
2     2
3     3
4     4
dtype: int64
s.mask(s > 1, 10)
Out:
0     0
1     1
2    10
3    10
4    10
dtype: int64
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df
Out:
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
m = df % 3 == 0
df.where(m, -df)
Out:
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
df.where(m, -df) == np.where(m, df, -df)
Out:
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
df.where(m, -df) == df.mask(~m, -df)
Out:
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df
Out:
A B C D
2000-01-01 1.091851 2.050447 -0.135930 1.485875
2000-01-02 -1.743172 0.008890 0.885485 0.810029
2000-01-03 -2.075038 -0.958871 -0.915315 -0.961541
2000-01-04 -1.465810 -1.588086 -0.213574 1.901356
2000-01-05 -1.408324 0.925165 -0.198167 0.141326
2000-01-06 -0.797945 0.528236 0.202516 0.393425
2000-01-07 -0.487216 0.007977 0.542896 0.795461
2000-01-08 0.217691 -0.333726 0.920486 -1.472329
df.where(df > 0, df['A'], axis='index')
Out:
A B C D
2000-01-01 1.091851 2.050447 1.091851 1.485875
2000-01-02 -1.743172 0.008890 0.885485 0.810029
2000-01-03 -2.075038 -2.075038 -2.075038 -2.075038
2000-01-04 -1.465810 -1.465810 -1.465810 1.901356
2000-01-05 -1.408324 0.925165 -1.408324 0.141326
2000-01-06 -0.797945 0.528236 0.202516 0.393425
2000-01-07 -0.487216 0.007977 0.542896 0.795461
2000-01-08 0.217691 0.217691 0.920486 0.217691
df.where(df > 0, df[0:1], axis='columns')
Out:
A B C D
2000-01-01 1.091851 2.050447 -0.135930 1.485875
2000-01-02 1.091851 0.008890 0.885485 0.810029
2000-01-03 1.091851 2.050447 -0.135930 1.485875
2000-01-04 1.091851 2.050447 -0.135930 1.901356
2000-01-05 1.091851 0.925165 -0.135930 0.141326
2000-01-06 1.091851 0.528236 0.202516 0.393425
2000-01-07 1.091851 0.007977 0.542896 0.795461
2000-01-08 0.217691 2.050447 0.920486 1.485875
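Because the frame above is random, the effect of axis is hard to check by eye. The following sketch repeats the idea on a tiny fixed frame (hypothetical data). It passes a Series as other in both cases, using df.iloc[0] for the column-wise case, which is an assumption of this sketch rather than the exact call shown above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, -2, 3], "B": [-4, 5, -6]})

# axis='index': the Series is aligned with the row labels, so a failing
# cell is replaced by column A's value from the same row.
by_row = df.where(df > 0, df["A"], axis="index")

# axis='columns': the Series is aligned with the column labels, so a
# failing cell is replaced by the first row's value from the same column.
by_col = df.where(df > 0, df.iloc[0], axis="columns")

print(by_row)
print(by_col)
```

In by_row, column B becomes 1, 5, 3; in by_col, column A becomes 1, 1, 3 and column B becomes -4, 5, -4.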
numpy.where
https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where
numpy.where(condition, [x, y, ]/)
Return elements chosen from x or y depending on condition.
Note
When only condition is provided, this function is a shorthand for np.asarray(condition).nonzero(). Using nonzero directly should be preferred, as it behaves correctly for subclasses. The rest of this documentation covers only the case where all three arguments are provided.
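For completeness, a small sketch of the one-argument form mentioned in the note (the array is made up for illustration). Both calls return a tuple with one index array per dimension:

```python
import numpy as np

a = np.array([0, 3, 0, 5])

idx_where = np.where(a > 0)                # one-argument form
idx_nonzero = np.asarray(a > 0).nonzero()  # the spelling the note recommends

print(idx_where)    # tuple holding the indices 1 and 3
print(idx_nonzero)
```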
Parameters:
condition: array_like, bool
Where True, yield x, otherwise yield y.
x, y: array_like
Values from which to choose. x, y and condition need to be broadcastable to some shape.
Returns:
out: ndarray
An array with elements from x where condition is True, and elements from y elsewhere.
Notes
If all the arrays are 1-D, where is equivalent to:
[xv if c else yv for c, xv, yv in zip(condition, x, y)]
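The equivalence can be checked directly on a small case. This is only a sketch with made-up 1-D arrays:

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# The pure-Python form quoted in the note.
expected = [xv if c else yv for c, xv, yv in zip(condition, x, y)]

# The vectorised form.
result = np.where(condition, x, y)

print(result.tolist() == [int(v) for v in expected])  # True
```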
Examples
a = np.arange(10)
a
Out:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.where(a < 5, a, 10*a)
Out:
array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])
This can be used on multidimensional arrays too:
np.where([[True, False], [True, True]],
         [[1, 2], [3, 4]],
         [[9, 8], [7, 6]])
Out:
array([[1, 8],
       [3, 4]])
The shapes of x, y, and the condition are broadcast together:
x, y = np.ogrid[:3, :4]
np.where(x < y, x, 10 + y)  # both x and 10+y are broadcast
Out:
array([[10,  0,  0,  0],
       [10, 11,  1,  1],
       [10, 11, 12,  2]])
a = np.array([[0, 1, 2],
              [0, 2, 4],
              [0, 3, 6]])
np.where(a < 4, a, -1)  # -1 is broadcast
Out:
array([[ 0,  1,  2],
       [ 0,  2, -1],
       [ 0,  3, -1]])