python_note: Indexing and Selecting Data

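This article covers the various ways to select and index data in Python, including the difference between .loc and .iloc, label-based and position-based selection, boolean arrays, and selection with callable functions. It also discusses more advanced topics such as reindexing, random sampling, boolean indexing, the where method, the query method, and multi-level indexes, and it warns about the problems chained indexing can cause when setting values and how to avoid them.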

Different choices for indexing:

.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:

  • A single label, e.g. 5 or 'a' (Note that 5 is interpreted as a label
    of the index. This use is not an integer position along the index.).
  • A list or array of labels ['a', 'b', 'c'].
  • A slice object with labels 'a':'f' (Note that, contrary to usual
    Python slices, both the start and the stop are included when present
    in the index! See Slicing with labels.).
  • A boolean array
  • A callable function with one argument (the calling Series or
    DataFrame) that returns valid output for indexing (one of the
    above).
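
The allowed .loc inputs listed above can be sketched as follows. The frame, its column names and its labels are made up for illustration and do not come from the original text.

```python
import pandas as pd

# Hypothetical example data with string labels as the index.
df = pd.DataFrame(
    {"x": [10, 20, 30, 40], "y": [1.5, 2.5, 3.5, 4.5]},
    index=["a", "b", "c", "d"],
)

single = df.loc["a"]                    # single label -> the row as a Series
several = df.loc[["a", "c"]]            # list of labels -> DataFrame
sliced = df.loc["a":"c"]                # label slice, both endpoints included
masked = df.loc[df["x"] > 20]           # boolean array (here a boolean Series)
called = df.loc[lambda d: d["y"] < 3]   # callable that returns a boolean Series

# A label that is not in the index raises KeyError.
try:
    df.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True
```

Note that the slice "a":"c" returns three rows, whereas a positional slice 0:2 would return two.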

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except for slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:

  • An integer e.g. 5.

  • A list or array of integers [4, 3, 0].

  • A slice object with ints 1:7.

  • A boolean array.

  • A callable function with one argument (the calling Series or
    DataFrame) that returns valid output for indexing (one of the
    above).
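
A matching sketch for the .iloc inputs listed above, again on made-up data. It also shows the out-of-bounds behaviour: a slice is silently truncated, while a single out-of-range integer raises IndexError.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; labels are letters so positions and labels differ.
s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

first = s.iloc[0]                   # integer position -> scalar
picked = s.iloc[[4, 3, 0]]          # list of positions, order preserved
window = s.iloc[1:3]                # half-open slice: positions 1 and 2
beyond = s.iloc[3:100]              # out-of-bounds slice is truncated
masked = s.iloc[np.array([True, False, True, False, False])]  # boolean array
called = s.iloc[lambda x: [0, -1]]  # callable returning a list of positions

# A single out-of-bounds integer raises IndexError.
try:
    s.iloc[10]
    out_of_bounds_raises = False
except IndexError:
    out_of_bounds_raises = True
```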

Any of the axes accessors may be the null slice :. Axes left out of the specification are assumed to be :, e.g. df.loc['a'] is equivalent to df.loc['a', :].


You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:
Selecting columns with [] and assigning to them directly can be used to swap column values:

df[['B', 'A']] = df[['A', 'B']]

You may find this useful for applying a transform (in-place) to a subset of the columns.
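
As a small sketch of column selection and assignment with [], assuming a made-up three-column frame: the list order controls the order of the result, a missing column raises KeyError, and assigning back to a column subset transforms only those columns.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})

# Column order in the list controls the column order of the result.
reordered = df[["C", "A"]]

# Assigning back to a subset of columns transforms just those columns.
df[["A", "B"]] = df[["A", "B"]] * 10

# Requesting a column that does not exist raises KeyError.
try:
    df[["A", "missing"]]
    raised = False
except KeyError:
    raised = True
```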

Warning

pandas aligns all AXES when setting Series and DataFrame from .loc and .iloc.

The following does not modify df, because column alignment happens before value assignment.

In [12]: df[['A', 'B']]
Out[12]: 
                   A         B
2000-01-01 -0.282863  0.469112
2000-01-02 -0.173215  1.212112
2000-01-03 -2.104569 -0.861849
2000-01-04 -0.706771  0.721555
2000-01-05  0.567020 -0.424972
2000-01-06  0.113648 -0.673690
2000-01-07  0.577046  0.404705
2000-01-08 -1.157892 -0.370647

In [13]: df.loc[:, ['B', 'A']] = df[['A', 'B']]

In [14]: df[['A', 'B']]
Out[14]: 
                   A         B
2000-01-01 -0.282863  0.469112
2000-01-02 -0.173215  1.212112
2000-01-03 -2.104569 -0.861849
2000-01-04 -0.706771  0.721555
2000-01-05  0.567020 -0.424972
2000-01-06  0.113648 -0.673690
2000-01-07  0.577046  0.404705
2000-01-08 -1.157892 -0.370647

Using .loc does not work, however, because the columns are aligned before the values are set.
The correct way to swap column values is by using raw values:

df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()
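
The difference between the two assignments can be checked on a tiny frame. This is a sketch on made-up integer data, not the random frame shown in the transcript above.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# The right-hand side is a DataFrame, so it is aligned on column labels
# before assignment: A receives A and B receives B, and nothing changes.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# A NumPy array carries no labels, so values are assigned by position:
# B receives the old A values and A receives the old B values.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].to_numpy()
```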

Slicing ranges:
With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels:

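s[:5]       # first five elements
s[::2]      # every second element
s[::-1]     # elements in reverse order
s2 = s.copy()
s2[:5] = 0  # setting through a slice works as well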

With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
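
A brief sketch of this row-slicing convenience, assuming a made-up frame with a date index: a slice inside [] selects rows by position, whereas a single label inside [] selects a column.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 6 rows, 2 columns, dates as the index.
df = pd.DataFrame(
    np.arange(12).reshape(6, 2),
    columns=["A", "B"],
    index=pd.date_range("2000-01-01", periods=6),
)

head = df[:3]        # first three rows
reverse = df[::-1]   # all rows in reverse order
column = df["A"]     # a single label selects a column, not a row
```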

Selection by label
.loc is strict when you present slicers that are not compatible (or convertible) with the index type. For example, using integers to slice a DatetimeIndex will raise a TypeError.

In [39]: dfl = pd.DataFrame(np.random.randn(5, 4),
   ....:                    columns=list('ABCD'),
   ....:                    index=pd.date_range('20130101', periods=5))
   ....: 

In [40]: dfl
Out[40]: 
                   A         B         C         D
2013-01-01  1.075770 -0.109050  1.643563 -1.469388
2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524  0.413738  0.276662 -0.472035
2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
2013-01-05  0.895717  0.805244 -1.206412  2.565646

In [4]: dfl.loc[2:3]
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>

String-like values in a slice can be converted to the type of the index, which leads to natural slicing.


In [41]: dfl.loc['20130102':'20130104']
Out[41]: 
                   A         B         C         D
2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524  0.413738  0.276662 -0.472035
201