
Chinese Word Frequency Statistics (code snippet)

After17  After17  2022-10-28  728


Download a long Chinese article.

Read the text to be analyzed from a file.

news = open('gzccnews.txt', 'r', encoding='utf-8').read()
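A minimal sketch of this step, assuming the file is UTF-8 encoded. A `with` block closes the file automatically, and `.read()` returns the whole text as one string, which is the input type `jieba.lcut` expects. The helper name `read_text` is hypothetical; `gzccnews.txt` is the file name used in the post.

```python
from pathlib import Path


def read_text(path, encoding="utf-8"):
    """Return the full contents of a text file as a single string."""
    # The with-block closes the file even if reading raises an error.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical usage with the file name from the post; skipped if absent.
    sample = Path("gzccnews.txt")
    if sample.exists():
        news = read_text(sample)
        print(len(news), "characters read")
```

Passing the string rather than the open file object to the segmenter avoids a type error in the next step.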

Install jieba and use it for Chinese word segmentation.

pip install jieba

import jieba

jieba.lcut(news)  # lcut already returns a list

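Generate the word frequency counts.

Sort them.

Exclude grammatical words: pronouns, articles, conjunctions.

Output the 20 most frequent words.

Post the code and screenshots of the results on your blog.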

import jieba

# Read the whole novel as one string.
with open('jinpingmei.txt', encoding='utf-8') as f:
    text = f.read()
textList = jieba.lcut(text)  # lcut already returns a list

# Punctuation and function words to exclude from the ranking.
useless = ('，', '。', ' ', '了', '：', '“', '”', '的', '\n', '他', '道', '你', '我', '在', '？',
           '来', '说', '去', '与', '不', '是', '、', '也', '又', '！', '着', '儿', '这', '到', '就',
           '把', '那', '有', '上', '都', '便', '和', '说道', '等', '只', '要', '小', '罢', '问', '那里',
           '怎', '一个',)

# Count how often each token occurs.
textDic = {}
for i in textList:
    textDic[i] = textDic.get(i, 0) + 1

# pop() with a default avoids a KeyError when a word never occurred.
for d in useless:
    textDic.pop(d, None)

# Sort by count, highest first, and print the top 20.
textLs = list(textDic.items())
textLs.sort(key=lambda e: e[1], reverse=True)
for s in textLs[:20]:
    print(s)
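The counting, filtering, and sorting steps can also be sketched with `collections.Counter` from the standard library. This is an alternative to the dictionary loop above, not the author's method. The function name `top_words`, the small stop-word set, and the sample token list are assumptions for illustration; in practice the tokens would come from `jieba.lcut(text)`.

```python
from collections import Counter


def top_words(tokens, stopwords, n=20):
    """Return the n most frequent tokens that are not in stopwords."""
    # Filter first so excluded words never enter the counter.
    counts = Counter(t for t in tokens if t not in stopwords)
    # most_common returns (token, count) pairs, highest count first.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical, pre-segmented tokens standing in for jieba.lcut(text).
    tokens = ["分词", "的", "词频", "，", "分词", "了", "词频", "分词"]
    stopwords = {"的", "了", "，"}
    for word, count in top_words(tokens, stopwords, n=2):
        print(word, count)
```

Keeping the stop words in a set makes each membership check constant time, which matters for a novel-length token list.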
