Counting word frequencies in an article and printing them in descending order

Latest recommended article published on 2023-12-13 13:01:44


This content is licensed under CC 4.0 BY-SA.


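This article shows how to process text with Python: reading a file, converting the text to lowercase, removing punctuation, and counting word frequencies. It also shows how to read data from a CSV file and sort the country data it contains. Both tasks are worked through with code.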

Python basics

Data file
onelife.txt

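import re
from string import punctuation
# Read the file
with open('D://onelife.txt', encoding='utf-8') as f1:
    contents = f1.readlines()
# Dictionary of word counts; created once, before the loop,
# so that counts accumulate across all lines instead of being reset per line
word_count = {}
# Escape the punctuation so that characters such as ] and \ are literal inside the character class
pattern = '[{}]'.format(re.escape(punctuation + '《》'))
# Go through the file line by line
for content in contents:
    # Convert letters to lowercase
    content = content.lower()
    # Replace punctuation with spaces
    content = re.sub(pattern, ' ', content)
    # Split the line into a list of words
    words = content.split()
    for word in words:
        # Add 1 if the word is already in the dictionary, otherwise start at 1
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
# Turn the dictionary into a list of (word, count) pairs sorted by count, highest first
items = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
# Print every pair, including the last one
for word, count in items:
    print(word, ':', count)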
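
As an alternative to the hand-written counting loop, the same steps (lowercase, strip punctuation, count, sort descending) can be sketched with `collections.Counter` from the standard library. The function name `count_words` and the sample sentence are made up for illustration and are not part of onelife.txt.

```python
import re
from collections import Counter
from string import punctuation


def count_words(text):
    """Return (word, count) pairs sorted by count, highest first."""
    # Lowercase, then replace ASCII punctuation and the marks 《》 with spaces
    pattern = '[{}]'.format(re.escape(punctuation + '《》'))
    cleaned = re.sub(pattern, ' ', text.lower())
    # Counter tallies each word; most_common() returns pairs in descending count order
    return Counter(cleaned.split()).most_common()


# Illustrative sample text (an assumption, not the article's data file)
sample = "One life, one love. One LIFE!"
for word, count in count_words(sample):
    print(word, ':', count)
```

To run this on the article's file, the whole file could be read with `f1.read()` and passed to `count_words`.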


countries_zh.csv

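# islice is used to skip the first (header) line
from itertools import islice
# Start with an empty dictionary
direct = {}
# Read the file
with open('D://countries_zh.csv', encoding='utf-8') as f1:
    # Skip the first line
    for line in islice(f1, 1, None):
        # Split each line into a list of strings
        item = line.split(',')
        # Convert the fifth field to an integer, dropping the trailing newline
        item[4] = int(item[4].strip())
        # Store the first four fields as the key and the integer as the value
        direct[item[0] + ',' + item[1] + ',' + item[2] + ',' + item[3]] = item[4]
# Turn the dictionary into a list of (key, value) pairs sorted by value, highest first
# (named rows rather than list, so the built-in list is not shadowed)
rows = sorted(direct.items(), key=lambda x: x[1], reverse=True)
# Print every pair, including the last one
for key, value in rows:
    print(key, ',', value)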
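
The manual `split(',')` above breaks if a field itself contains a quoted comma. A minimal sketch of the same idea using the standard `csv` module follows. It assumes, as the code above does, that each row has five columns with an integer in the fifth. The function name `rows_by_last_column` and the sample rows are placeholders, not data from countries_zh.csv.

```python
import csv
import io


def rows_by_last_column(lines):
    """Skip the header, then sort rows by the integer in the fifth column, highest first."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    # Key: first four fields joined by commas; value: fifth field as an int
    rows = [(','.join(row[:4]), int(row[4])) for row in reader if row]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


# Placeholder rows for illustration only (an assumption about the file layout)
sample = io.StringIO(
    "c1,c2,c3,c4,value\n"
    "a,b,c,d,10\n"
    "e,f,g,h,30\n"
    "i,j,k,l,20\n"
)
for key, value in rows_by_last_column(sample):
    print(key, ',', value)
```

Unlike the dictionary version, this sketch keeps rows whose first four fields are identical instead of overwriting them. For the real file, an open file object (opened with `newline=''`, as the `csv` documentation recommends) can be passed in place of `sample`.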
