Splitting the Words Out of an English Text

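This article presents two ways to split the words out of an English text: cutting it with a custom split rule, and using a regular expression. The custom split takes more code, but it handles spaces precisely. The regular expression is concise and elegant, but it may not preserve spaces.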

Method 1: define your own split rule and cut the text manually

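txt = '''  i am the fastest man alive.
when i was eight, i was presenting to the room.'''

alist = []  # result list
pos1 = -1   # marker: index of the last character already placed in alist
for i in range(len(txt) - 1):
    # split rule: cut between txt[i] and txt[i+1] when exactly one of them is a letter
    if txt[i].isalpha() != txt[i + 1].isalpha():
        pos2 = i  # marker: the cut falls right after this index
        alist.append(txt[pos1 + 1:pos2 + 1])
        pos1 = pos2
# the loop only emits a piece when it finds a cut, so append the final piece
# as well (otherwise the trailing "." of this text would be lost)
if pos1 + 1 < len(txt):
    alist.append(txt[pos1 + 1:])
print(alist)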

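Pros: you control exactly where the text is cut, and no whitespace is lost.

Cons: the code is verbose, and the index bookkeeping takes more thought.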
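The same split rule (cut wherever the text switches between letters and non-letters) can be written more compactly with `itertools.groupby`, which groups consecutive characters that share the same `str.isalpha()` result. This is a minimal sketch that assumes the same letters-versus-everything-else rule as above. The function name `split_keep_spaces` is hypothetical.

```python
from itertools import groupby


def split_keep_spaces(text):
    """Split text into alternating runs of letters and non-letters.

    Consecutive characters with the same str.isalpha() result are grouped
    together, so spaces and punctuation survive as their own pieces.
    """
    return ["".join(run) for _, run in groupby(text, key=str.isalpha)]


txt = '''  i am the fastest man alive.
when i was eight, i was presenting to the room.'''
pieces = split_keep_spaces(txt)
print(pieces)
# nothing is dropped: joining the pieces rebuilds the original text
print("".join(pieces) == txt)  # → True
```

Because every character lands in exactly one group, the "last piece" special case from the manual loop is not needed.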

Method 2: split with a regular expression

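import re

txt = '''  i am the fastest man alive.
when i was eight, i was presenting to the room
i am man alive'''

# \W matches any single non-word character (anything but a letter, digit or _);
# re.split cuts the text at each such character and discards it
blist = re.split(r"\W", txt)
print(blist)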

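Pros: very concise and elegant.

Cons: regular expressions take some effort to learn, and this pattern is less versatile than method 1. The separators themselves (spaces, punctuation) are thrown away, and adjacent separators leave empty strings '' in the result.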
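Losing the spaces comes from this particular pattern rather than from regular expressions in general. As a sketch (the patterns below are our own choice, not the original author's), `re.findall` with two alternatives, a run of letters or a run of non-letters, returns the same alternating word/separator pieces as method 1. A capturing group in `re.split` also keeps the separators.

```python
import re

txt = '''  i am the fastest man alive.
when i was eight, i was presenting to the room
i am man alive'''

# A run of letters OR a run of anything that is not a letter: every character
# falls into exactly one run, so the pieces alternate word / separator
pieces = re.findall(r"[A-Za-z]+|[^A-Za-z]+", txt)
print(pieces)
print("".join(pieces) == txt)  # → True

# Alternative: a capturing group makes re.split return the separators it cut on.
# Because txt starts with spaces, the first element here is ''.
parts = re.split(r"(\W+)", txt)
print(parts[:4])  # → ['', '  ', 'i', ' ']
```

`[A-Za-z]` only covers ASCII letters, while `str.isalpha()` in method 1 also accepts letters such as `é`. In addition, `\W` treats digits and `_` as word characters. For plain English prose like the sample text, the results are the same.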

Output of method 2:
blist=['', '', 'i', 'am', 'the', 'fastest', 'man', 'alive', '',
 'when', 'i', 'was', 'eight', '', 'i', 'was', 'presenting',
 'to', 'the', 'room', 'i', 'am', 'man', 'alive']
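The empty strings '' in this output appear because `\W` matches a single character. Two adjacent separators (such as `.` followed by a newline, or `,` followed by a space) leave nothing between them, and a separator at the very start of the text produces a leading ''. If only the words are needed, here is a sketch of two common fixes: match the words directly with `re.findall(r"\w+", ...)`, or split on runs of separators with `\W+` and drop the remaining empty strings.

```python
import re

txt = '''  i am the fastest man alive.
when i was eight, i was presenting to the room
i am man alive'''

# option 1: collect the runs of word characters directly
words = re.findall(r"\w+", txt)

# option 2: split on runs of non-word characters, then drop the empty strings
# left over when the text starts or ends with a separator
words2 = [w for w in re.split(r"\W+", txt) if w]

print(words)
print(words == words2)  # → True
```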

Output of method 1, for comparison:
alist=['  ', 'i', ' ', 'am', ' ', 'the', ' ', 'fastest', ' ', 
'man', ' ',