[Knowledge Sharing] Common Excel Operations with Python

Latest recommended article published on 2026-06-17 16:51:11


This content is licensed under CC 4.0 BY-SA

Tags

#ops #webstorm #ide #learning #python

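This article shows how to process Excel data in Python with the pandas module. It covers basic DataFrame operations, an analysis of 2018 World Cup data (counting the matches played, finding the team with the most wins, and so on), and the CRUD operations available in pandas. It also recommends a Python learning resource pack for readers.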

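The pandas module in Python

In data analysis we sometimes have to work with Excel files, and for small amounts of data Excel itself is a reasonable tool. Python, one of the most practical scripting languages today, offers a module (pandas) that handles Excel data well: almost anything you can do in Excel can also be done in pandas. This post gives a short walkthrough of pandas in a realistic business scenario. Readers who use Excel at work, or who are simply curious about it, may find it a useful overview.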

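01. The pandas DataFrame type

DataFrame:

A DataFrame is a tabular data structure; its columns may hold different data types or the same type.
A DataFrame is typically used for two-dimensional data.
DataFrame attributes: values, index, columns, dtypes

Creating a DataFrame: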
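The four attributes listed above can be inspected on a tiny hand-made table. This is only a sketch: the name df_demo and its values are made up for illustration and do not come from the World Cup file.

```python
import pandas as pd

# Hypothetical data, used only to show what each attribute holds.
df_demo = pd.DataFrame({"team": ["A", "B", "C"], "goals": [3, 1, 2]})

values = df_demo.values      # 2-D NumPy array with the cell values
index = df_demo.index        # row labels (a RangeIndex by default)
columns = df_demo.columns    # column labels
dtypes = df_demo.dtypes      # one dtype per column

print(values.shape)          # (3, 2)
print(list(index))           # [0, 1, 2]
print(list(columns))         # ['team', 'goals']
print(dtypes)                # dtype of each column
```

Note that all four are attributes, so they are read without parentheses.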

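import pandas as pd

df = pd.read_excel("世界杯2018.xlsx")  # build a DataFrame by reading an Excel file (pd.read_csv does the same for CSV)
df.drop_duplicates(inplace=True)  # remove duplicate rows
print(df)

# build a DataFrame by hand; the columns are 国家 (country) and 成绩 (result)
df_manual = pd.DataFrame(
    [['法国', '冠军'], ['德国', '小组赛'], ['巴西', '1/4决赛'], ['阿根廷', '1/8决赛'],
     ['葡萄牙', '1/8决赛'], ['乌拉圭', '1/4决赛'], ['日本', '1/8决赛']],
    columns=["国家", "成绩"])
print(df_manual)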

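Common DataFrame methods

df.head(n)  # first n rows of the DataFrame
df.tail(n)  # last n rows of the DataFrame
df.shape  # number of rows and columns, returned as a tuple
df.info()  # index, data types, and memory usage
df.columns  # column (header row) names; this is an attribute, not a method
df.describe()  # summary statistics for the numeric columns
df.dtypes  # data type of each column
df.T  # transpose rows and columns
df.sort_values(by="column_name")  # sort the rows by a column
df["column_name"].unique()  # unique values of that column (deduplicated)
df.sample(n=6, replace=True, weights=None)  # n is the number of random rows; replace sets whether a row can be drawn again (default False); weights optionally gives per-row sampling weights
df.iloc[n, a:b]  # row n, columns at positions a up to but not including b
df[columns]  # columns may be a string (returns that column as a Series) or a list of names (returns a DataFrame)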
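A few of these calls in action on a small made-up table. The frame df_demo and its column names are hypothetical, chosen so the results are easy to check by eye.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_demo = pd.DataFrame(
    {
        "team": ["A", "B", "C", "D"],
        "group": ["X", "X", "Y", "Y"],
        "goals": [3, 1, 2, 5],
    }
)

first_two = df_demo.head(2)                                   # first 2 rows
n_rows, n_cols = df_demo.shape                                # (4, 3)
by_goals = df_demo.sort_values(by="goals", ascending=False)   # highest first
groups = df_demo["group"].unique()                            # distinct values of one column
row_slice = df_demo.iloc[1, 0:2]                              # row 1, columns 0 and 1 (end excluded)
one_col = df_demo["goals"]                                    # a single name gives a Series
two_cols = df_demo[["team", "goals"]]                         # a list of names gives a DataFrame

print(by_goals.iloc[0]["team"])   # D
print(list(groups))               # ['X', 'Y']
print(list(row_slice))            # ['B', 'X']
```

sort_values returns a new DataFrame and leaves df_demo untouched, so the result has to be assigned to a name to be used later.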

02. Analyzing Excel data with pandas

2018 Russia World Cup data:

赛程 (stage)  主队 (home team)  客队 (away team)  比分 (score)
A 组  俄罗斯(主)  沙特阿拉伯(客)  5(主):0(客)
...
G 组  巴拿马(主)  突尼斯(客)  1(主):2(客)
...
决赛  法国(主)  克罗地亚(客)  4(主):2(客)

-- Message me privately on WeChat if you need the full dataset.

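Requirements:

How many matches were played in total
Find the team with the most wins and the team with the most goals (penalty shootouts excluded) across all matches
Get the 2018 World Cup results of these countries: France, Germany, Brazil, Argentina, Portugal, Uruguay, Japan

Approach:

Requirement 1: take the row count directly (the header row is not counted, and the data must contain no duplicates)
Requirement 2: first compute each team's goals and wins, then sort in descending order and take the first row
Requirement 3: filter each team's results and keep the best stage it reached in the tournament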
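Requirements 2 and 3 both depend on pulling numbers out of the score column. Below is a minimal sketch of that parsing step. The helper parse_score is hypothetical, and the shootout format "1 (3)(主):1 (4)(客) 点球" is an assumption inferred from the sample rows and the comments in the implementation, not something confirmed against the full file.

```python
import re

# Assumed score formats:
#   "5(主):0(客)"               regular result
#   "1 (3)(主):1 (4)(客) 点球"   result decided by a penalty shootout
SCORE_RE = re.compile(r"(\d+)(.*)\(主\):(\d+)(.*)\(客\)(.*)")


def parse_score(text):
    """Return (home_goals, away_goals, home_pens, away_pens).

    The two shootout values are None when the match had no shootout.
    Returns None when the text does not look like a score at all.
    """
    m = SCORE_RE.search(text)
    if m is None:
        return None
    home_goals, home_extra, away_goals, away_extra, tail = m.groups()
    if "点球" in tail:
        # " (3)" -> 3: drop the spaces and parentheses around the number
        home_pens = int(home_extra.strip(" ()"))
        away_pens = int(away_extra.strip(" ()"))
    else:
        home_pens = away_pens = None
    return int(home_goals), int(away_goals), home_pens, away_pens


print(parse_score("5(主):0(客)"))               # (5, 0, None, None)
print(parse_score("1 (3)(主):1 (4)(客) 点球"))   # (1, 1, 3, 4)
```

Converting the captured digits to int before comparing them avoids the string-comparison trap where "10" sorts below "9".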

Implementation:

import re

import pandas as pd

df = pd.read_excel("世界杯2018.xlsx")  # build the DataFrame by reading the Excel file
df.drop_duplicates(inplace=True)  # remove duplicate rows

# Task 1: how many matches were played in total
match_cnt = df.shape[0]  # shape is (rows, columns); 114 per the original post

country_score = dict()  # per-team goals, wins, and best-stage rating

# Task 2: the team with the most wins and the team with the most goals (penalty shootouts excluded)
team_best_score = 0  # rating of the stage being processed; 0 means group stage
score_sort = {"1/8决赛": 1, "1/4决赛": 2, "半决赛": 3, "3/4决赛": 4, "决赛": 5, "冠军": 6}  # rating for each stage

for i in range(match_cnt):
    # raw string, so that \d and \( reach the regex engine unchanged
    goal = re.findall(r"(\d+)(.*)\(主\):(\d+)(.*)\(客\)(.*)", df.iloc[i, -1])
    best_score = re.findall("(1/8决赛|1/4决赛|半决赛|3/4决赛|决赛)", df.iloc[i, 0])  # knockout stage of this row, if any
    if goal:
        if best_score:
            team_best_score = score_sort[best_score[0]]  # only knockout rows carry a rating
        country_first = df.iloc[i, 1][:-3]  # home team, with the trailing "(主)" removed
        country_second = df.iloc[i, 2][:-3]  # away team, with the trailing "(客)" removed
        if "点球" in goal[0][-1]:
            # Matches decided by a penalty shootout do not add to the goal totals.
            # A shootout always has a winner. Example goal value: [('1', ' (3)', '1', ' (4)', ' 点球')]
            # Compare the shootout scores as integers, not as strings.
            if int(goal[0][1].strip(" ()")) > int(goal[0][3].strip(" ()")):
                score_first = 1
                score_second = 0
                # the winner advances, so its best-stage rating is one level higher
                if country_score[country_first]["best_score"] < team_best_score:
                    country_score[country_first]["best_score"] = team_best_score + 1
                if country_score[country_second]["best_score"] < team_best_score:
                    country_score[country_second]["best_score"] = team_best_score
            else:
                score_first = 0
                score_second = 1
                if country_score[country_first]["best_score"] < team_best_score:
                    country_score[country_first]["best_score"] = team_best_score
                if country_score[country_second]["best_score"] < team_best_score:
                    country_score[country_second]["best_score"] = team_best_score + 1
            # update the win counts: 1 means a win, 0 means a loss or a draw
            country_score[country_first]["score"] += score_first
            country_score[country_second]["score"] += score_second
        else:
            # Decide the winner. Example goal value: [('5', '', '0', '', '')]
            goal_first = int(goal[0][0])
            goal_second = int(goal[0][2])
            if goal_first > goal_second:
                score_first = 1
                score_second = 0
            elif goal_first < goal_second:
                score_first = 0
                score_second = 1
            else:
                score_first = 0
                score_second = 0
            if country_first not in country_score:
                # first appearance of this team: record its goals and wins
                country_score[country_first] = {"goal": goal_first, "score": score_first, "best_score": team_best_score}
            else:
                country_score[country_first]["goal"] += goal_first  # add this match's goals to the running total
                country_score[country_first]["score"] += score_first  # 1 means a win, 0 means a loss or a draw
                if country_score[country_first]["best_score"] < team_best_score:
                    if score_first > 0:
                        country_score[country_first]["best_score"] = team_best_score + 1
                    else:
                        country_score[country_first]["best_score"] = team_best_score
            if country_second not in country_score:
                country_score[country_second] = {"goal": goal_second, "score": score_second, "best_score": team_best_score}
            else:
                country_score[country_second]["goal"] += goal_second
                country_score[country_second]["score"] += score_second
                if country_score[country_second]["best_score"] < team_best_score:
                    if score_second > 0:  # the original compared with 1, which a 0/1 flag can never exceed
                        country_score[country_second]["best_score"] = team_best_score + 1
                    else:
                        country_score[country_second]["best_score"] = team_best_score

print(country_score)
# Output reported in the original post:
# {'俄罗斯': {'goal': 8, 'score': 3, 'best_score': 1}, '法国': {'goal': 14, 'score': 6, 'best_score': 5}, ...}

# Task 3: 2018 World Cup results of France, Germany, Brazil, Argentina, Portugal, Uruguay, and Japan
country_list = ["法国", "德国", "巴西", "阿根廷", "葡萄牙", "乌拉圭", "日本"]
score_sort_list = list(score_sort.keys())
for country in country_list:
    final_score = country_score[country]["best_score"]
    if final_score == 0:
        final_score_desc = "小组赛"  # never left the group stage
    else:
        final_score_desc = score_sort_list[final_score - 1]
    print(f"Final result of {country} at the 2018 World Cup: {final_score_desc}")

Note: here I still processed the data in the traditional plain-Python style; you are welcome to try doing the same thing the pandas way.
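As one possible take on the pandas way, the sketch below computes goals and wins per team with vectorized string methods and a groupby. It is not the author's method. It uses only the three sample rows shown earlier rather than the real file, the names matches, goals, and totals are made up, and it handles only regular scores: rows decided by a penalty shootout would need to be filtered out or parsed separately first.

```python
import pandas as pd

# Three rows copied from the article's sample table; the real file has many more.
matches = pd.DataFrame(
    {
        "赛程": ["A 组", "G 组", "决赛"],
        "主队": ["俄罗斯(主)", "巴拿马(主)", "法国(主)"],
        "客队": ["沙特阿拉伯(客)", "突尼斯(客)", "克罗地亚(客)"],
        "比分": ["5(主):0(客)", "1(主):2(客)", "4(主):2(客)"],
    }
)

# Pull both goal counts out of the score column in one vectorized step.
# The result has two columns, labelled 0 (home goals) and 1 (away goals).
goals = matches["比分"].str.extract(r"(\d+)\(主\):(\d+)\(客\)").astype(int)
home_team = matches["主队"].str.replace("(主)", "", regex=False)
away_team = matches["客队"].str.replace("(客)", "", regex=False)

# One row per (team, match): goals scored and whether the team won.
home = pd.DataFrame(
    {"team": home_team, "goal": goals[0], "win": (goals[0] > goals[1]).astype(int)}
)
away = pd.DataFrame(
    {"team": away_team, "goal": goals[1], "win": (goals[1] > goals[0]).astype(int)}
)
totals = pd.concat([home, away]).groupby("team")[["goal", "win"]].sum()

print(totals["goal"].idxmax())      # 俄罗斯
print(totals.loc["法国", "win"])     # 1
```

On the full dataset, totals["win"].idxmax() would give the team with the most wins in the same way, subject to the shootout caveat above.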

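03. CRUD operations in pandas

1. Query
df.index[df['Name'] == 'blue']  # conditional query: index labels of the matching rows
df.loc[(df['Name'] == 'charlie') & (df['Type'] == 'Raptors')]  # & combines conditions with AND (use | for OR)
df.index[df["赛程"].str.contains("决赛")]  # fuzzy query: rows whose stage contains "决赛"

2. Update
df.replace(to_replace, value)
df.replace({"sz": {2: -4, 6: -4}}, inplace=True)  # in column sz, replace 2 with -4 and 6 with -4
df.replace([-4, 4], [0, 1])  # replace -4 with 0 and 4 with 1

3. Insert
pd.merge(df1, df2, on=column, how="inner")  # join two DataFrames on a shared column; how can be "inner", "outer", "left", or "right"
df['d'] = [1, 2, 3]  # add a column d with the values [1, 2, 3]
df.loc[len(df)] = [1, 2, 3]  # append a new row 1, 2, 3 (assumes the default integer index and three columns)

4. Delete
df.drop("column_name", axis=1)  # drop one column
df.drop(["column_1", "column_2"], axis=1)  # drop several columns; axis 0 is rows, 1 is columns
df.drop("row_label")  # drop one row
df.drop(["row_label_1", "row_label_2"])  # drop several rows
df.drop(df.index[[1, 3]])  # drop the rows at positions 1 and 3
df[df['A'] >= 100]  # keep only the rows where A is at least 100, which removes those below 100
df.drop_duplicates()  # drop duplicate rows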
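The four groups of operations can be tried end to end on a small table. This is a sketch with made-up data: df_demo, extra, and the values in them are hypothetical, and only the Name and Type column names echo the snippets above.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_demo = pd.DataFrame(
    {
        "Name": ["blue", "charlie", "delta"],
        "Type": ["Raptors", "Raptors", "Other"],
        "A": [50, 100, 150],
    }
)

# Query: boolean masks select rows; combine them with & (and) or | (or).
blue_rows = df_demo.index[df_demo["Name"] == "blue"]
charlie = df_demo.loc[(df_demo["Name"] == "charlie") & (df_demo["Type"] == "Raptors")]
has_ta = df_demo[df_demo["Name"].str.contains("ta")]

# Update: replace returns a new DataFrame unless inplace=True is passed.
updated = df_demo.replace({"A": {50: 0}})

# Insert: add a column, append a row, or merge with another table.
df_demo["d"] = [1, 2, 3]
df_demo.loc[len(df_demo)] = ["echo", "Other", 200, 4]
extra = pd.DataFrame({"Name": ["blue", "echo"], "age": [3, 5]})
merged = pd.merge(df_demo, extra, on="Name", how="inner")

# Delete: drop returns a copy; a boolean filter keeps only the matching rows.
no_d = df_demo.drop("d", axis=1)
kept = df_demo[df_demo["A"] >= 100]

print(list(blue_rows))        # [0]
print(list(merged["Name"]))   # ['blue', 'echo']
print(list(kept["Name"]))     # ['charlie', 'delta', 'echo']
```

Most of these calls leave df_demo unchanged and hand back a new object, so forgetting to assign the result is a common source of "nothing happened" bugs.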