No. 3

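This post shows how to process a CSV file in Python without pandas, using the csv module and NumPy: it counts the distinct wine quality grades, splits the data by quality, and computes the mean and median of selected features. After reading, converting, grouping, and summarizing the data, it reports how the mean and median of fixed acidity (固定酸度) and residual sugar (剩余糖) differ across quality grades. It then repeats the same steps with pandas to show how much shorter they become.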


Without pandas

Loading the data

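import csv
from ast import literal_eval

# Raw string, so the backslashes in the Windows path are not treated as escapes
path = r"C:\学习资料\大二下\python数据分析\作业\葡萄酒作业.csv"
conTent = []
with open(path, 'r', encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    # Skip the header row
    next(reader)
    for con in reader:
        # Convert every numeric string in the row to an int or float, then append the row to conTent
        conTent.append(list(map(literal_eval, con)))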
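The loading step can also be tried without the original file. The sketch below is only an illustration under assumptions: the sample text, its English column names, and the helper `parse_number` are hypothetical and not part of the original post. It converts each field with `int` or `float` instead of evaluating it as code, so the contents of the file are never executed.

```python
import csv
import io

# Hypothetical three-column sample standing in for the real CSV file
SAMPLE = "fixed_acidity,residual_sugar,quality\n7,20.7,6\n6.3,1.6,6\n8.1,1.45,5\n"


def parse_number(text):
    """Return an int when the field is a whole number, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def load_rows(stream):
    """Read numeric rows from a CSV stream, skipping the header row."""
    reader = csv.reader(stream)
    next(reader)  # skip the header
    return [[parse_number(field) for field in row] for row in reader]


rows = load_rows(io.StringIO(SAMPLE))
print(rows)
```

Passing an open file object instead of `io.StringIO` would apply the same conversion to a file on disk.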

Counting the distinct quality grades

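品质 = set()
for con in conTent:
    # Column 11 (the last column) holds the quality grade
    品质.add(con[11])
print("The wines fall into %d quality grades: " % len(品质), end="")
# Sort before printing, because set iteration order is not guaranteed
for i in sorted(品质):
    print(str(i) + "  ", end="")

Output:
The wines fall into 7 quality grades: 3  4  5  6  7  8  9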
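Counting how often each grade occurs gives both the number of grades and the sample count per grade in one pass. This is a minimal sketch, assuming the quality value is the last element of each row; the `sample_rows` list is made-up illustration data, not rows from the wine file.

```python
from collections import Counter

# Made-up rows: only the last column (quality) matters here
sample_rows = [[7.0, 6], [6.3, 6], [8.1, 5], [5.8, 3], [7.2, 6]]

# Count the occurrences of each quality grade
grade_counts = Counter(row[-1] for row in sample_rows)

# Sort the grades so the printout does not depend on set or dict ordering
grades = sorted(grade_counts)
print("%d quality grades: %s" % (len(grades), grades))
for grade in grades:
    print("quality %d: %d samples" % (grade, grade_counts[grade]))
```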

Splitting the dataset into subsets by quality and printing the first 10 rows of each subset

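import pprint

subSets = []
# Visit the grades in ascending order so that subset 1 holds the lowest grade
for i in sorted(品质):
    # Collect every row whose quality column (index 11) equals this grade
    subSets.append([con for con in conTent if con[11] == i])
for i, subset in enumerate(subSets, start=1):
    print("First 10 rows of subset %d:" % i)
    pprint.pprint(subset[:10])
    print('\n')

Output:
First 10 rows of subset 1: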
[[5.8, 0.24, 0.44, 3.5, 0.029, 5, 109, 0.9913, 3.53, 0.43, 11.7, 3],
 [7.1, 0.32, 0.32, 11, 0.038, 16, 66, 0.9937, 3.24, 0.4, 11.5, 3],
 [6.9, 0.39, 0.4, 4.6, 0.022, 5, 19, 0.9915, 3.31, 0.37, 12.6, 3],
 [7.9, 0.64, 0.46, 10.6, 0.244, 33, 227, 0.9983, 2.87, 0.74, 9.1, 3],
 [8.6, 0.55, 0.35, 15.55, 0.057, 35.5, 366.5, 1.0001, 3.04, 0.63, 11, 3],
 [7.5, 0.32, 0.24, 4.6, 0.053, 8, 134, 0.9958, 3.14, 0.5, 9.1, 3],
 [6.7, 0.25, 0.26, 1.55, 0.041, 118.5, 216, 0.9949, 3.55, 0.63, 9.4, 3],
 [7.1, 0.49, 0.22, 2, 0.047, 146.5, 307.5, 0.9924, 3.24, 0.37, 11, 3],
 [11.8, 0.23, 0.38, 11.1, 0.034, 15, 123, 0.9997, 2.93, 0.55, 9.7, 3],
 [7.6, 0.48, 0.37, 1.2, 0.034, 5, 57, 0.99256, 3.05, 0.54, 10.4, 3]]


First 10 rows of subset 2:
[[6.2, 0.45, 0.26, 4.4, 0.063, 63, 206, 0.994, 3.27, 0.52, 9.8, 4],
 [9.8, 0.36, 0.46, 10.5, 0.038, 4, 83, 0.9956, 2.89, 0.3, 10.1, 4],
 [5.5, 0.485, 0, 1.5, 0.065, 8, 103, 0.994, 3.63, 0.4, 9.7, 4],
 [6.4, 0.595, 0.14, 5.2, 0.058, 15, 97, 0.9951, 3.38, 0.36, 9, 4],
 [7.6, 0.48, 0.37, 0.8, 0.037, 4, 100, 0.9902, 3.03, 0.39, 11.4, 4],
 [6, 0.67, 0.07, 1.2, 0.06, 9, 108, 0.9931, 3.11, 0.35, 8.7, 4],
 [6.5, 0.28, 0.28, 8.5, 0.047, 54, 210, 0.9962, 3.09, 0.54, 8.9, 4],
 [5.8, 0.28, 0.35, 2.3, 0.053, 36, 114, 0.9924, 3.28, 0.5, 10.2, 4],
 [10.2, 0.44, 0.88, 6.2, 0.049, 20, 124, 0.9968, 2.99, 0.51, 9.9, 4],
 [6.8, 0.64, 0.08, 9.7, 0.062, 26, 142, 0.9972, 3.37, 0.46, 8.9, 4]]


First 10 rows of subset 3:
[[8.1, 0.27, 0.41, 1.45, 0.033, 11, 63, 0.9908, 2.99, 0.56, 12, 5],
 [8.6, 0.23, 0.4, 4.2, 0.035, 17, 109, 0.9947, 3.14, 0.53, 9.7, 5],
 [7.9, 0.18, 0.37, 1.2, 0.04, 16, 75, 0.992, 3.18, 0.63, 10.8, 5],
 [8.3, 0.42, 0.62, 19.25, 0.04, 41, 172, 1.0002, 2.98, 0.67, 9.7, 5],
 [6.5, 0.31, 0.14, 7.5, 0.044, 34, 133, 0.9955, 3.22, 0.5, 9.5, 5],
 [5.8, 0.27, 0.2, 14.95, 0.044, 22, 179, 0.9962, 3.37, 0.37, 10.2, 5],
 [7.3, 0.28, 0.43, 1.7, 0.08, 21, 123, 0.9905, 3.19, 0.42, 12.8, 5],
 [6.5, 0.39, 0.23, 5.4, 0.051, 25, 149, 0.9934, 3.24, 0.35, 10, 5],
 [7.3, 0.24, 0.39, 17.95, 0.057, 45, 149, 0.9999, 3.21, 0.36, 8.6, 5],
 [6.2, 0.46, 0.25, 4.4, 0.066, 62, 207, 0.9939, 3.25, 0.52, 9.8, 5]]


First 10 rows of subset 4:
[[7, 0.27, 0.36, 20.7, 0.045, 45, 170, 1.001, 3, 0.45, 8.8, 6],
 [8.1, 0.28, 0.4, 6.9, 0.05, 30, 97, 0.9951, 3.26, 0.44, 10.1, 6],
 [7.2, 0.23, 0.32, 8.5, 0.058, 47, 186, 0.9956, 3.19, 0.4, 9.9, 6],
 [7.2, 0.23, 0.32, 8.5, 0.058, 47, 186, 0.9956, 3.19, 0.4, 9.9, 6],
 [8.1, 0.28, 0.4, 6.9, 0.05, 30, 97, 0.9951, 3.26, 0.44, 10.1, 6],
 [6.2, 0.32, 0.16, 7, 0.045, 30, 136, 0.9949, 3.18, 0.47, 9.6, 6],
 [7, 0.27, 0.36, 20.7, 0.045, 45, 170, 1.001, 3, 0.45, 8.8, 6],
 [6.3, 0.3, 0.34, 1.6, 0.049, 14, 132, 0.994, 3.3, 0.49, 9.5, 6],
 [6.6, 0.27, 0.41, 1.3, 0.052, 16, 142, 0.9951, 3.42, 0.47, 10, 6],
 [6.9, 0.24, 0.35, 1, 0.052, 35, 146, 0.993, 3.45, 0.44, 10, 6]]


First 10 rows of subset 5:
[[6.6, 0.16, 0.4, 1.5, 0.044, 48, 143, 0.9912, 3.54, 0.52, 12.4, 7],
 [7.2, 0.32, 0.36, 2, 0.033, 37, 114, 0.9906, 3.1, 0.71, 12.3, 7],
 [7.4, 0.18, 0.31, 1.4, 0.058, 38, 167, 0.9931, 3.16, 0.53, 10, 7],
 [6.4, 0.26, 0.24, 6.4, 0.04, 27, 124, 0.9903, 3.22, 0.49, 12.6, 7],
 [7, 0.32, 0.34, 1.3, 0.042, 20, 69, 0.9912, 3.31, 0.65, 12, 7],
 [6.9, 0.24, 0.33, 1.7, 0.035, 47, 136, 0.99, 3.26, 0.4, 12.6, 7],
 [6.9, 0.21, 0.33, 1.8, 0.034, 48, 136, 0.9899, 3.25, 0.41, 12.6, 7],
 [8.6, 0.265, 0.36, 1.2, 0.034, 15, 80, 0.9913, 2.95, 0.36, 11.4, 7],
 [6.5, 0.24, 0.32, 7.6, 0.038, 48, 203, 0.9958, 3.45, 0.54, 9.7, 7],
 [6.1, 0.3, 0.56, 2.8, 0.044, 47, 179, 0.9924, 3.3, 0.57, 10.9, 7]]


First 10 rows of subset 6:
[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.9892, 3.33, 0.39, 12.8, 8],
 [6.8, 0.26, 0.42, 1.7, 0.049, 41, 122, 0.993, 3.47, 0.48, 10.5, 8],
 [6.7, 0.23, 0.31, 2.1, 0.046, 30, 96, 0.9926, 3.33, 0.64, 10.7, 8],
 [5.2, 0.44, 0.04, 1.4, 0.036, 43, 119, 0.9894, 3.36, 0.33, 12.1, 8],
 [5.2, 0.44, 0.04, 1.4, 0.036, 43, 119, 0.9894, 3.36, 0.33, 12.1, 8],
 [6.8, 0.53, 0.35, 3.8, 0.034, 26, 109, 0.9906, 3.26, 0.57, 12.7, 8],
 [6.1, 0.31, 0.58, 5, 0.039, 36, 114, 0.9909, 3.3, 0.6, 12.3, 8],
 [6.4, 0.32, 0.35, 4.8, 0.03, 34, 101, 0.9912, 3.36, 0.6, 12.5, 8],
 [6, 0.25, 0.28, 2.2, 0.026, 54, 126, 0.9898, 3.43, 0.65, 12.9, 8],
 [5.9, 0.27, 0.29, 11.4, 0.036, 31, 115, 0.9949, 3.35, 0.48, 10.5, 8]]


First 10 rows of subset 7:
[[9.1, 0.27, 0.45, 10.6, 0.035, 28, 124, 0.997, 3.2, 0.46, 10.4, 9],
 [6.6, 0.36, 0.29, 1.6, 0.021, 24, 85, 0.98965, 3.41, 0.61, 12.4, 9],
 [7.4, 0.24, 0.36, 2, 0.031, 27, 139, 0.99055, 3.28, 0.48, 12.5, 9],
 [6.9, 0.36, 0.34, 4.2, 0.018, 57, 119, 0.9898, 3.28, 0.36, 12.7, 9]]
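The nested loop above scans the whole dataset once per grade. An alternative is to build all subsets in a single pass with a dictionary keyed by grade. This is a sketch under the same assumption that quality is the last column; `sample_rows` and `group_by_quality` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def group_by_quality(rows, quality_index=-1):
    """Group rows into lists keyed by the value in the quality column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[quality_index]].append(row)
    # Return a plain dict ordered by ascending grade
    return {grade: groups[grade] for grade in sorted(groups)}


# Made-up rows: the last element of each row is the quality grade
sample_rows = [[7.0, 20.7, 6], [8.1, 1.45, 5], [6.3, 1.6, 6], [5.8, 3.5, 3]]
subsets = group_by_quality(sample_rows)
for grade, rows in subsets.items():
    print("quality %d: first rows %s" % (grade, rows[:10]))
```

Keying the subsets by grade also removes the need to remember which list position corresponds to which quality.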

Counting the samples in each subset

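for i, subset in enumerate(subSets, start=1):
    print("Subset %d sample count: %d" % (i, len(subset)))

Output:
Subset 1 sample count: 14
Subset 2 sample count: 115
Subset 3 sample count: 1020
Subset 4 sample count: 1539
Subset 5 sample count: 616
Subset 6 sample count: 123
Subset 7 sample count: 4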

Extracting the fixed acidity and residual sugar values of each subset into separate lists

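scidity = []  # fixed acidity values, one list per subset
sugar = []    # residual sugar values, one list per subset
for subset in subSets:
    # Column 0 is fixed acidity, column 3 is residual sugar
    scidity.append([i[0] for i in subset])
    sugar.append([i[3] for i in subset])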
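When several columns are needed at once, transposing a subset with `zip(*rows)` turns the rows into columns, so each feature can be picked by position. A small sketch with made-up rows; the column positions 0 and 3 follow the post, everything else is illustrative.

```python
# Made-up subset: 4 columns per row, positions 0 and 3 are the ones of interest
subset = [
    [7.0, 0.27, 0.36, 20.7],
    [6.3, 0.30, 0.34, 1.6],
    [8.1, 0.28, 0.40, 6.9],
]

# zip(*subset) yields one tuple per column
columns = list(zip(*subset))
acidity_values = list(columns[0])  # column 0: fixed acidity
sugar_values = list(columns[3])    # column 3: residual sugar

print(acidity_values)
print(sugar_values)
```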

Computing the mean and median with NumPy's built-in functions

import numpy as np

for i in range(len(sugar)):
    print("Subset %d fixed acidity: mean %.2f, median %s" % (i+1, np.mean(scidity[i]), np.median(scidity[i])))
    print("         residual sugar: mean %.2f, median %s" % (np.mean(sugar[i]), np.median(sugar[i])))

Output:
Subset 1 fixed acidity: mean 7.54, median 7.1
         residual sugar: mean 6.64, median 4.6
Subset 2 fixed acidity: mean 7.05, median 6.9
         residual sugar: mean 4.61, median 2.5
Subset 3 fixed acidity: mean 6.91, median 6.8
         residual sugar: mean 7.11, median 6.8
Subset 4 fixed acidity: mean 6.81, median 6.7
         residual sugar: mean 6.44, median 5.3
Subset 5 fixed acidity: mean 6.76, median 6.7
         residual sugar: mean 5.28, median 3.8
Subset 6 fixed acidity: mean 6.71, median 6.8
         residual sugar: mean 5.72, median 4.6
Subset 7 fixed acidity: mean 7.50, median 7.15
         residual sugar: mean 4.60, median 3.1
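The standard library's `statistics` module offers `mean` and `median` as well, which would keep this step free of third-party packages. The sketch below checks it against the four quality-9 rows printed earlier. With an even number of values the median is the average of the two middle ones, here (6.9 + 7.4) / 2 = 7.15 for fixed acidity.

```python
import statistics

# Fixed acidity and residual sugar of the four quality-9 rows shown earlier
fixed_acidity = [9.1, 6.6, 7.4, 6.9]
residual_sugar = [10.6, 1.6, 2.0, 4.2]

for name, values in (("fixed acidity", fixed_acidity), ("residual sugar", residual_sugar)):
    mean_value = statistics.mean(values)
    median_value = statistics.median(values)
    print("%s: mean %.2f, median %.2f" % (name, mean_value, median_value))
```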

With pandas

Loading the data

import pandas as pd
path = r"C:\学习资料\大二下\python数据分析\作业\葡萄酒作业.csv"
content = pd.read_csv(path)

Counting the distinct quality grades

# The '品质' column holds the quality grade
品质 = content['品质'].unique()
print("The wines fall into %d quality grades: " % len(品质), end="")
for i in 品质:
    print(str(i) + "  ", end="")

Output:
The wines fall into 7 quality grades: 6  5  7  8  4  3  9
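The grades appear here in a different order than in the set-based version because `unique()` returns values in order of first appearance rather than sorted. A short sketch of the difference, using a made-up Series in place of `content['品质']`:

```python
import pandas as pd

# Made-up quality column standing in for content['品质']
quality = pd.Series([6, 5, 7, 6, 3, 5], name="quality")

print(quality.unique())          # order of first appearance
print(sorted(quality.unique()))  # ascending order

# Sample count per grade, ordered by grade
counts = quality.value_counts().sort_index()
print(counts)
```

`value_counts().sort_index()` also yields the per-grade sample counts in the same step.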

Splitting the dataset by quality, printing the first 10 rows of each subset, counting the samples, and computing the mean and median

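content_group = content.groupby('品质')
# Column names come from the CSV header: 品质 = quality, 固定酸度 = fixed acidity, 剩余糖 = residual sugar
for key, value in content_group:
    # display() is provided by Jupyter/IPython; in a plain script use print(value.head(10))
    display(value.head(10))
    print("Sample count for quality %d: %d" % (key, len(value)))
    print("Fixed acidity mean: %.2f, median: %s" % (value['固定酸度'].mean(), value['固定酸度'].median()))
    print("Residual sugar mean: %.2f, median: %s" % (value['剩余糖'].mean(), value['剩余糖'].median()))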
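Output:

| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 173 | 5.8 | 0.24 | 0.44 | 3.50 | 0.029 | 5.0 | 109.0 | 0.99130 | 3.53 | 0.43 | 11.7 | 3 |
| 305 | 7.1 | 0.32 | 0.32 | 11.00 | 0.038 | 16.0 | 66.0 | 0.99370 | 3.24 | 0.40 | 11.5 | 3 |
| 524 | 6.9 | 0.39 | 0.40 | 4.60 | 0.022 | 5.0 | 19.0 | 0.99150 | 3.31 | 0.37 | 12.6 | 3 |
| 741 | 7.9 | 0.64 | 0.46 | 10.60 | 0.244 | 33.0 | 227.0 | 0.99830 | 2.87 | 0.74 | 9.1 | 3 |
| 1001 | 8.6 | 0.55 | 0.35 | 15.55 | 0.057 | 35.5 | 366.5 | 1.00010 | 3.04 | 0.63 | 11.0 | 3 |
| 1046 | 7.5 | 0.32 | 0.24 | 4.60 | 0.053 | 8.0 | 134.0 | 0.99580 | 3.14 | 0.50 | 9.1 | 3 |
| 1190 | 6.7 | 0.25 | 0.26 | 1.55 | 0.041 | 118.5 | 216.0 | 0.99490 | 3.55 | 0.63 | 9.4 | 3 |
| 1368 | 7.1 | 0.49 | 0.22 | 2.00 | 0.047 | 146.5 | 307.5 | 0.99240 | 3.24 | 0.37 | 11.0 | 3 |
| 1442 | 11.8 | 0.23 | 0.38 | 11.10 | 0.034 | 15.0 | 123.0 | 0.99970 | 2.93 | 0.55 | 9.7 | 3 |
| 1657 | 7.6 | 0.48 | 0.37 | 1.20 | 0.034 | 5.0 | 57.0 | 0.99256 | 3.05 | 0.54 | 10.4 | 3 |

Sample count for quality 3: 14
Fixed acidity mean: 7.54, median: 7.1
Residual sugar mean: 6.64, median: 4.6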
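
| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 6.2 | 0.450 | 0.26 | 4.4 | 0.063 | 63.0 | 206.0 | 0.9940 | 3.27 | 0.52 | 9.8 | 4 |
| 67 | 9.8 | 0.360 | 0.46 | 10.5 | 0.038 | 4.0 | 83.0 | 0.9956 | 2.89 | 0.30 | 10.1 | 4 |
| 79 | 5.5 | 0.485 | 0.00 | 1.5 | 0.065 | 8.0 | 103.0 | 0.9940 | 3.63 | 0.40 | 9.7 | 4 |
| 105 | 6.4 | 0.595 | 0.14 | 5.2 | 0.058 | 15.0 | 97.0 | 0.9951 | 3.38 | 0.36 | 9.0 | 4 |
| 120 | 7.6 | 0.480 | 0.37 | 0.8 | 0.037 | 4.0 | 100.0 | 0.9902 | 3.03 | 0.39 | 11.4 | 4 |
| 124 | 6.0 | 0.670 | 0.07 | 1.2 | 0.060 | 9.0 | 108.0 | 0.9931 | 3.11 | 0.35 | 8.7 | 4 |
| 132 | 6.5 | 0.280 | 0.28 | 8.5 | 0.047 | 54.0 | 210.0 | 0.9962 | 3.09 | 0.54 | 8.9 | 4 |
| 141 | 5.8 | 0.280 | 0.35 | 2.3 | 0.053 | 36.0 | 114.0 | 0.9924 | 3.28 | 0.50 | 10.2 | 4 |
| 143 | 10.2 | 0.440 | 0.88 | 6.2 | 0.049 | 20.0 | 124.0 | 0.9968 | 2.99 | 0.51 | 9.9 | 4 |
| 159 | 6.8 | 0.640 | 0.08 | 9.7 | 0.062 | 26.0 | 142.0 | 0.9972 | 3.37 | 0.46 | 8.9 | 4 |

Sample count for quality 4: 115
Fixed acidity mean: 7.05, median: 6.9
Residual sugar mean: 4.61, median: 2.5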
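
| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 8.1 | 0.27 | 0.41 | 1.45 | 0.033 | 11.0 | 63.0 | 0.9908 | 2.99 | 0.56 | 12.0 | 5 |
| 9 | 8.6 | 0.23 | 0.40 | 4.20 | 0.035 | 17.0 | 109.0 | 0.9947 | 3.14 | 0.53 | 9.7 | 5 |
| 10 | 7.9 | 0.18 | 0.37 | 1.20 | 0.040 | 16.0 | 75.0 | 0.9920 | 3.18 | 0.63 | 10.8 | 5 |
| 12 | 8.3 | 0.42 | 0.62 | 19.25 | 0.040 | 41.0 | 172.0 | 1.0002 | 2.98 | 0.67 | 9.7 | 5 |
| 14 | 6.5 | 0.31 | 0.14 | 7.50 | 0.044 | 34.0 | 133.0 | 0.9955 | 3.22 | 0.50 | 9.5 | 5 |
| 22 | 5.8 | 0.27 | 0.20 | 14.95 | 0.044 | 22.0 | 179.0 | 0.9962 | 3.37 | 0.37 | 10.2 | 5 |
| 23 | 7.3 | 0.28 | 0.43 | 1.70 | 0.080 | 21.0 | 123.0 | 0.9905 | 3.19 | 0.42 | 12.8 | 5 |
| 24 | 6.5 | 0.39 | 0.23 | 5.40 | 0.051 | 25.0 | 149.0 | 0.9934 | 3.24 | 0.35 | 10.0 | 5 |
| 25 | 7.3 | 0.24 | 0.39 | 17.95 | 0.057 | 45.0 | 149.0 | 0.9999 | 3.21 | 0.36 | 8.6 | 5 |
| 31 | 6.2 | 0.46 | 0.25 | 4.40 | 0.066 | 62.0 | 207.0 | 0.9939 | 3.25 | 0.52 | 9.8 | 5 |

Sample count for quality 5: 1020
Fixed acidity mean: 6.91, median: 6.8
Residual sugar mean: 7.11, median: 6.8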
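
| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | 0.45 | 8.8 | 6 |
| 1 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 2 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | 9.9 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | 9.9 | 6 |
| 4 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 5 | 6.2 | 0.32 | 0.16 | 7.0 | 0.045 | 30.0 | 136.0 | 0.9949 | 3.18 | 0.47 | 9.6 | 6 |
| 6 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | 0.45 | 8.8 | 6 |
| 7 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.9940 | 3.30 | 0.49 | 9.5 | 6 |
| 16 | 6.6 | 0.27 | 0.41 | 1.3 | 0.052 | 16.0 | 142.0 | 0.9951 | 3.42 | 0.47 | 10.0 | 6 |
| 17 | 6.9 | 0.24 | 0.35 | 1.0 | 0.052 | 35.0 | 146.0 | 0.9930 | 3.45 | 0.44 | 10.0 | 6 |

Sample count for quality 6: 1539
Fixed acidity mean: 6.81, median: 6.7
Residual sugar mean: 6.44, median: 5.3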
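
| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 6.6 | 0.160 | 0.40 | 1.5 | 0.044 | 48.0 | 143.0 | 0.9912 | 3.54 | 0.52 | 12.4 | 7 |
| 18 | 7.2 | 0.320 | 0.36 | 2.0 | 0.033 | 37.0 | 114.0 | 0.9906 | 3.10 | 0.71 | 12.3 | 7 |
| 29 | 7.4 | 0.180 | 0.31 | 1.4 | 0.058 | 38.0 | 167.0 | 0.9931 | 3.16 | 0.53 | 10.0 | 7 |
| 41 | 6.4 | 0.260 | 0.24 | 6.4 | 0.040 | 27.0 | 124.0 | 0.9903 | 3.22 | 0.49 | 12.6 | 7 |
| 50 | 7.0 | 0.320 | 0.34 | 1.3 | 0.042 | 20.0 | 69.0 | 0.9912 | 3.31 | 0.65 | 12.0 | 7 |
| 63 | 6.9 | 0.240 | 0.33 | 1.7 | 0.035 | 47.0 | 136.0 | 0.9900 | 3.26 | 0.40 | 12.6 | 7 |
| 64 | 6.9 | 0.210 | 0.33 | 1.8 | 0.034 | 48.0 | 136.0 | 0.9899 | 3.25 | 0.41 | 12.6 | 7 |
| 66 | 8.6 | 0.265 | 0.36 | 1.2 | 0.034 | 15.0 | 80.0 | 0.9913 | 2.95 | 0.36 | 11.4 | 7 |
| 89 | 6.5 | 0.240 | 0.32 | 7.6 | 0.038 | 48.0 | 203.0 | 0.9958 | 3.45 | 0.54 | 9.7 | 7 |
| 90 | 6.1 | 0.300 | 0.56 | 2.8 | 0.044 | 47.0 | 179.0 | 0.9924 | 3.30 | 0.57 | 10.9 | 7 |

Sample count for quality 7: 616
Fixed acidity mean: 6.76, median: 6.7
Residual sugar mean: 5.28, median: 3.8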
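
| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 6.2 | 0.66 | 0.48 | 1.2 | 0.029 | 29.0 | 75.0 | 0.9892 | 3.33 | 0.39 | 12.8 | 8 |
| 15 | 6.8 | 0.26 | 0.42 | 1.7 | 0.049 | 41.0 | 122.0 | 0.9930 | 3.47 | 0.48 | 10.5 | 8 |
| 43 | 6.7 | 0.23 | 0.31 | 2.1 | 0.046 | 30.0 | 96.0 | 0.9926 | 3.33 | 0.64 | 10.7 | 8 |
| 111 | 5.2 | 0.44 | 0.04 | 1.4 | 0.036 | 43.0 | 119.0 | 0.9894 | 3.36 | 0.33 | 12.1 | 8 |
| 112 | 5.2 | 0.44 | 0.04 | 1.4 | 0.036 | 43.0 | 119.0 | 0.9894 | 3.36 | 0.33 | 12.1 | 8 |
| 131 | 6.8 | 0.53 | 0.35 | 3.8 | 0.034 | 26.0 | 109.0 | 0.9906 | 3.26 | 0.57 | 12.7 | 8 |
| 195 | 6.1 | 0.31 | 0.58 | 5.0 | 0.039 | 36.0 | 114.0 | 0.9909 | 3.30 | 0.60 | 12.3 | 8 |
| 228 | 6.4 | 0.32 | 0.35 | 4.8 | 0.030 | 34.0 | 101.0 | 0.9912 | 3.36 | 0.60 | 12.5 | 8 |
| 303 | 6.0 | 0.25 | 0.28 | 2.2 | 0.026 | 54.0 | 126.0 | 0.9898 | 3.43 | 0.65 | 12.9 | 8 |
| 410 | 5.9 | 0.27 | 0.29 | 11.4 | 0.036 | 31.0 | 115.0 | 0.9949 | 3.35 | 0.48 | 10.5 | 8 |

Sample count for quality 8: 123
Fixed acidity mean: 6.71, median: 6.8
Residual sugar mean: 5.72, median: 4.6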
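
| | 固定酸度 | 挥发性酸度 | 柠檬酸 | 剩余糖 | 氯化物 | 游离二氧化碳 | 总二氧化硫 | 密度 | PH值 | 酸碱盐 | 酒精 | 品质 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 551 | 9.1 | 0.27 | 0.45 | 10.6 | 0.035 | 28.0 | 124.0 | 0.99700 | 3.20 | 0.46 | 10.4 | 9 |
| 588 | 6.6 | 0.36 | 0.29 | 1.6 | 0.021 | 24.0 | 85.0 | 0.98965 | 3.41 | 0.61 | 12.4 | 9 |
| 592 | 7.4 | 0.24 | 0.36 | 2.0 | 0.031 | 27.0 | 139.0 | 0.99055 | 3.28 | 0.48 | 12.5 | 9 |
| 628 | 6.9 | 0.36 | 0.34 | 4.2 | 0.018 | 57.0 | 119.0 | 0.98980 | 3.28 | 0.36 | 12.7 | 9 |

Sample count for quality 9: 4
Fixed acidity mean: 7.50, median: 7.15
Residual sugar mean: 4.60, median: 3.1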
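
The per-group loop can also be condensed into one summary table with `groupby(...).agg(...)`. This is a minimal sketch on a made-up frame, not the author's code; the English column names are placeholders for the CSV's `固定酸度`, `剩余糖` and `品质` columns.

```python
import pandas as pd

# Made-up data; the column names are placeholders for the real CSV headers
frame = pd.DataFrame({
    "fixed_acidity": [7.0, 6.3, 8.1, 8.6, 5.8],
    "residual_sugar": [20.7, 1.6, 1.45, 4.2, 3.5],
    "quality": [6, 6, 5, 5, 3],
})

grouped = frame.groupby("quality")

# Sample count per grade
sizes = grouped.size()

# Mean and median of the two features, one row per grade
summary = grouped[["fixed_acidity", "residual_sugar"]].agg(["mean", "median"])

print(sizes)
print(summary.round(2))
```

The result has one row per grade and a two-level column index (feature, statistic), so a single value can be read with `summary.loc[grade, (feature, statistic)]`.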