Jieba Text Segmentation

Last Updated On January 25, 2021

Jieba is a simple Chinese text segmentation library that splits a text into words. Segmented words are easier to process than raw text. For example, you can infer the main topic of a paragraph from the words that appear most frequently in it, or spot trends by analyzing which words appear most frequently in texts over the years.
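As a minimal sketch of the first idea, the snippet below counts word frequencies with Python's standard `collections.Counter`. It assumes the text has already been segmented into a list of words, for example with `jieba.lcut(text)`, which returns a list (whereas `jieba.cut` returns a generator). The helper names `is_content_word` and `top_words`, the two-character minimum, and the sample tokens are illustrative assumptions, not part of Jieba.

```python
from collections import Counter
import unicodedata


def is_content_word(word):
    """Return False for empty, whitespace-only, or all-punctuation tokens."""
    word = word.strip()
    if not word:
        return False
    # Unicode categories starting with "P" are punctuation, e.g. "，" and "。".
    return not all(unicodedata.category(ch).startswith("P") for ch in word)


def top_words(words, n=3, min_len=2):
    """Return the n most frequent content words with at least min_len characters."""
    counts = Counter(
        w.strip()
        for w in words
        if is_content_word(w) and len(w.strip()) >= min_len
    )
    return counts.most_common(n)


# Hand-written tokens in the shape jieba.lcut returns; not real Jieba output.
tokens = ["试验舱", "和", "试验船", "完成", "在轨", "试验", "，",
          "试验船", "将", "返回", "东风", "着陆场", "。", "试验船"]
print(top_words(tokens, n=2))
```

Dropping punctuation and single-character tokens is only a crude stand-in for a stop-word list; for real analysis you would likely filter against a curated list instead.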

import jieba
import jieba.analyse

text = '''
中国载人航天工程办公室主任助理季启明在发射成功后的新闻发布会上介绍称，长征五号B遥一火箭已将载荷组合体准确送入预定轨道，它搭载的新一代载人飞船试验船是面向中国空间站运营及未来载人探月需求而研发的新一代天地往返运输器，本次任务将对飞船高速再入返回的防热、控制、群伞回收及部分重复使用等关键技术进行验证。同时升空的还有柔性充气式货物返回舱试验舱，这是中国新型空间运输飞行器的试验器，本次任务将对充气展开式返回飞行器轨道再入关键技术进行验证。按照飞行程序，试验舱和试验船完成在轨试验后，计划分别于5月6日和8日返回东风着陆场。
'''

# Full mode: scans out every word the dictionary can find, so segments may overlap
seg_list = jieba.cut(text, cut_all=True)
print("full mode: ", "/ ".join(seg_list))

# Accurate mode (the default): splits the text into the most likely sequence of words
seg_list = jieba.cut(text, cut_all=False)
print("accurate mode: ", "/ ".join(seg_list))

# Search engine mode: accurate mode, plus long words are split again for better recall
seg_list = jieba.cut_for_search(text)
print("search engine mode: ", "/ ".join(seg_list))

# Keyword extraction: the top 3 keywords ranked by TF-IDF
tags = jieba.analyse.extract_tags(text, topK=3)
print("keywords: ", "/ ".join(tags))

Note: This example program is translated from the Chinese version of this article. You can try adapting it to English text by using English-language APIs.
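The second idea from the introduction, tracking trends over the years, can be sketched the same way: segment each year's text, then compare how often a word appears. This minimal sketch assumes you already have one list of segmented words per year (for example from `jieba.lcut`); the function name `yearly_frequency` and the sample data are hypothetical. It reports relative frequency rather than raw counts so that years with more text are not automatically favored.

```python
from collections import Counter


def yearly_frequency(words_by_year, word):
    """Return {year: relative frequency of word} for segmented documents.

    words_by_year maps a year to the list of words segmented from that
    year's text (hypothetical input, e.g. built with jieba.lcut).
    """
    trend = {}
    for year in sorted(words_by_year):
        words = words_by_year[year]
        counts = Counter(words)
        # Divide by the number of words so longer texts do not dominate.
        trend[year] = counts[word] / len(words) if words else 0.0
    return trend


# Hand-written sample data, not real statistics.
sample = {
    2019: ["载人", "飞船", "试验", "火箭"],
    2020: ["空间站", "飞船", "试验船", "飞船", "返回"],
    2021: [],
}
print(yearly_frequency(sample, "飞船"))
```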
