Different choices for indexing:
.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:
- A single label, e.g. 5 or 'a'. (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.)
- A list or array of labels ['a', 'b', 'c'].
- A slice object with labels 'a':'f'. (Note that, contrary to usual Python slices, both the start and the stop are included when present in the index! See Slicing with labels.)
- A boolean array.
- A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above).
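The allowed .loc inputs listed above can be sketched as follows. This is a minimal illustration on made-up data (the Series s and s5 and all variable names are assumptions, not part of the original text):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a small Series with string labels.
s = pd.Series(np.arange(6), index=list("abcdef"))

single = s.loc["b"]                    # a single label
subset = s.loc[["a", "c", "e"]]        # a list of labels
closed = s.loc["b":"d"]                # a label slice: start AND stop are included
masked = s.loc[s > 3]                  # a boolean array
called = s.loc[lambda x: x % 2 == 0]   # a callable that returns a boolean mask

# With an integer index, 5 is treated as a label, not as a position.
s5 = pd.Series([10, 20, 30], index=[5, 6, 7])
by_label = s5.loc[5]

# A label that is not in the index raises KeyError.
try:
    s.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True
```

Note that s5 has only three elements, so position 5 does not exist; s5.loc[5] still succeeds because the lookup goes through the labels.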
.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:
- An integer, e.g. 5.
- A list or array of integers [4, 3, 0].
- A slice object with ints 1:7.
- A boolean array.
- A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above).
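The .iloc inputs above, together with the out-of-bounds behaviour, can be sketched like this. The Series s and the variable names are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten values 10..19 with string labels.
s = pd.Series(np.arange(10, 20), index=list("abcdefghij"))

one = s.iloc[5]                        # an integer position
picked = s.iloc[[4, 3, 0]]             # a list of positions, in the given order
sliced = s.iloc[1:7]                   # a position slice: the stop is excluded
masked = s.iloc[(s > 17).to_numpy()]   # a boolean array (plain NumPy, not a Series)
called = s.iloc[lambda x: [0, -1]]     # a callable that returns valid positions

# A slice that runs past the end is silently truncated...
beyond = s.iloc[8:100]

# ...but a single out-of-bounds position raises IndexError.
try:
    s.iloc[100]
    raised = False
except IndexError:
    raised = True
```

The mask is converted with to_numpy() because .iloc is purely positional and does not accept a boolean Series that carries its own index.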
Any of the axes accessors may be the null slice :. Axes left out of the specification are assumed to be :, e.g. p.loc['a'] is equivalent to p.loc['a', :, :].
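For a two-dimensional DataFrame the same rule means that the column axis may be omitted. A small sketch, assuming a made-up frame df:

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 frame.
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=["a", "b"], columns=["x", "y", "z"])

row_short = df.loc["a"]      # column axis left out, assumed to be :
row_full = df.loc["a", :]    # the same selection written out in full
col = df.loc[:, "y"]         # null slice on the rows, one column label
```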

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:
Selecting columns directly with [] and assigning to them can be used to swap column values:
df[['B', 'A']] = df[['A', 'B']]
You may find this useful for applying a transform (in-place) to a subset of the columns.
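A self-contained sketch of both uses, the swap and an in-place transform on a subset of columns. The frames and their values are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with two columns.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Swap: with [], values are assigned column by column in list order,
# so "B" receives the old "A" and "A" receives the old "B".
df[["B", "A"]] = df[["A", "B"]]

# In-place transform of a subset of the columns.
scaled = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30], "C": [7, 8, 9]})
scaled[["A", "B"]] = scaled[["A", "B"]] * 2
```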
Warning
pandas aligns all AXES when setting Series and DataFrame from .loc and .iloc.
This will not modify df because the column alignment is before value assignment.
In [12]: df[['A', 'B']]
Out[12]:
A B
2000-01-01 -0.282863 0.469112
2000-01-02 -0.173215 1.212112
2000-01-03 -2.104569 -0.861849
2000-01-04 -0.706771 0.721555
2000-01-05 0.567020 -0.424972
2000-01-06 0.113648 -0.673690
2000-01-07 0.577046 0.404705
2000-01-08 -1.157892 -0.370647
In [13]: df.loc[:, ['B', 'A']] = df[['A', 'B']]
In [14]: df[['A', 'B']]
Out[14]:
A B
2000-01-01 -0.282863 0.469112
2000-01-02 -0.173215 1.212112
2000-01-03 -2.104569 -0.861849
2000-01-04 -0.706771 0.721555
2000-01-05 0.567020 -0.424972
2000-01-06 0.113648 -0.673690
2000-01-07 0.577046 0.404705
2000-01-08 -1.157892 -0.370647
With .loc, however, the swap does not happen, because the columns are aligned before the values are assigned.
The correct way to swap column values is by using raw values:
df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()
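The difference between the aligned assignment and the raw-values assignment can be checked on a small made-up frame (the names df, aligned and swapped are assumptions for this sketch):

```python
import pandas as pd

# Hypothetical frame with two columns.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Assigning a DataFrame through .loc aligns on column labels first,
# so "A" is written back to "A" and "B" to "B": nothing changes.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# A NumPy array carries no labels, so values are assigned by position:
# the first array column goes to "B", the second to "A".
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].to_numpy()
```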
Slicing ranges:
With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels:
s[:5]
s[::2]
s[::-1]
s2[:5] = 0
With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
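A runnable version of the slices above, assuming a small made-up Series s and DataFrame df:

```python
import numpy as np
import pandas as pd

# Hypothetical data: eight values with string labels.
s = pd.Series(np.arange(8), index=list("abcdefgh"))

first5 = s[:5]        # the first five elements, labels included
every_other = s[::2]  # every second element
reversed_s = s[::-1]  # reversed order

# Assigning through a slice modifies those positions.
s2 = s.copy()
s2[:5] = 0

# On a DataFrame, a slice inside [] selects rows, not columns.
df = pd.DataFrame({"A": np.arange(8), "B": np.arange(8) * 10}, index=s.index)
rows = df[:3]
```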
Selection by label
.loc is strict when you present slicers that are not compatible with (or convertible to) the index type. For example, using integers to slice a DatetimeIndex will raise a TypeError.
In [39]: dfl = pd.DataFrame(np.random.randn(5, 4),
....: columns=list('ABCD'),
....: index=pd.date_range('20130101', periods=5))
....:
In [40]: dfl
Out[40]:
A B C D
2013-01-01 1.075770 -0.109050 1.643563 -1.469388
2013-01-02 0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524 0.413738 0.276662 -0.472035
2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
2013-01-05 0.895717 0.805244 -1.206412 2.565646
In [4]: dfl.loc[2:3]
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>
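The same failure can be caught explicitly. In this sketch the frame is rebuilt with random values, and falling back to .iloc for positional access is an illustrative choice rather than something the original text prescribes:

```python
import numpy as np
import pandas as pd

dfl = pd.DataFrame(np.random.randn(5, 4),
                   columns=list("ABCD"),
                   index=pd.date_range("20130101", periods=5))

# Integer slice bounds cannot be converted to timestamps, so .loc refuses them.
try:
    dfl.loc[2:3]
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

# For positional access, .iloc is the appropriate accessor.
by_position = dfl.iloc[2:3]
```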
String-like values in a slice can be converted to the type of the index, which leads to natural slicing.
In [41]: dfl.loc['20130102':'20130104']
Out[41]:
A B C D
2013-01-02 0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524 0.413738 0.276662 -0.472035
201
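A quick check that a string slice on a DatetimeIndex includes both endpoints, using a frame with made-up integer values in place of the random ones above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: same shape and index as dfl, but deterministic values.
dfl = pd.DataFrame(np.arange(20).reshape(5, 4),
                   columns=list("ABCD"),
                   index=pd.date_range("20130101", periods=5))

# The strings are parsed as dates; start and stop are both included.
window = dfl.loc["20130102":"20130104"]
```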

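This article walks through the ways of selecting and indexing data in Python with pandas, including the difference between .loc and .iloc, selection by label and by position, boolean arrays, and selection with a callable. It also covers more advanced usage such as reindexing, random sampling, boolean indexing, the where method, the query method, and multi-level indexing, and it warns about the problems that chained indexing can cause when setting values and how to avoid them.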
