牛骨文 Education Service Platform (making learning simple)

TextRank Algorithm

Created: 2017-03-10

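# -*- coding: utf-8 -*-
# Extractive summarization in the spirit of TextRank: build a sentence-similarity
# graph from TF-IDF vectors and rank the sentences with PageRank.
# Dependencies: nltk, scikit-learn, scipy, networkx.
import networkx
from nltk.tokenize.punkt import PunktSentenceTokenizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer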

document = """To Sherlock Holmes she is always the woman. I have
seldom heard him mention her under any other name. In his eyes she
eclipses and predominates the whole of her sex. It was not that he
felt any emotion akin to love for Irene Adler. All emotions, and that
one particularly, were abhorrent to his cold, precise but admirably
balanced mind. He was, I take it, the most perfect reasoning and
observing machine that the world has seen, but as a lover he would
have placed himself in a false position. He never spoke of the softer
passions, save with a gibe and a sneer. They were admirable things for
the observer-excellent for drawing the veil from men’s motives and
actions. But for the trained reasoner to admit such intrusions into
his own delicate and finely adjusted temperament was to introduce a
distracting factor which might throw a doubt upon all his mental
results. Grit in a sensitive instrument, or a crack in one of his own
high-power lenses, would not be more disturbing than a strong emotion
in a nature such as his. And yet there was but one woman to him, and
that woman was the late Irene Adler, of dubious and questionable
memory.
"""
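# Join the hard-wrapped lines into one string.
document = " ".join(document.strip().splitlines())
# Split into sentences; an untrained Punkt tokenizer (no learned abbreviations) is used as-is.
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(document)
# Turn each sentence into a TF-IDF vector; TfidfTransformer L2-normalizes rows by default.
c = CountVectorizer()
bow_matrix = c.fit_transform(sentences)
normalized_matrix = TfidfTransformer().fit_transform(bow_matrix)
# Entry (i, j) of this product is the cosine similarity of sentences i and j.
similarity_matrix = normalized_matrix @ normalized_matrix.T
# from_scipy_sparse_matrix was removed in NetworkX 3.0; from_scipy_sparse_array (2.7+) replaces it.
nx_graph = networkx.from_scipy_sparse_array(similarity_matrix)
# Score sentences with weighted PageRank; items() replaces the Python 2-only iteritems().
scores = networkx.pagerank(nx_graph)
results = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print(sentences[results[0][0]])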
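The graph-and-ranking steps of the script can be written as two formulas. This is a sketch assuming NetworkX's default `pagerank` settings (damping `alpha=0.85`, uniform teleportation, edges weighted by the `weight` attribute filled from the matrix entries). The symbols are introduced here for illustration: v_i is the TF-IDF row of sentence i, N is the number of sentences, w_ij is the edge weight between sentences i and j, and S(i) is the score of sentence i. Because `TfidfTransformer` L2-normalizes each row by default, the product of `normalized_matrix` with its transpose holds cosine similarities, and its diagonal (w_ii = 1) becomes a self-loop in the graph.

```latex
w_{ij} = \mathbf{v}_i^{\top}\mathbf{v}_j, \qquad \lVert \mathbf{v}_i \rVert_2 = 1

S(i) = \frac{1-d}{N} + d \sum_{j=1}^{N} \frac{w_{ji}}{\sum_{k=1}^{N} w_{jk}}\, S(j), \qquad d = 0.85
```

The TextRank paper (Mihalcea and Tarau, 2004) writes the first term as (1 - d) instead of (1 - d)/N; that multiplies every score by N and leaves the ranking unchanged. The paper's sentence-extraction setup also uses a normalized word-overlap similarity rather than TF-IDF cosine, so this script is best read as a TF-IDF variant of TextRank.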

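To see the ranking step without NetworkX, here is a minimal pure-Python sketch of the same weighted PageRank update, run on a made-up 3×3 similarity matrix. `weighted_pagerank` is a hypothetical helper, not a NetworkX API, and its tolerance and iteration cap are arbitrary choices rather than NetworkX's exact stopping rule.

```python
from typing import List


def weighted_pagerank(weights: List[List[float]], d: float = 0.85,
                      tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Weighted PageRank by power iteration (hypothetical helper, not a NetworkX API).

    weights[j][i] is the weight of the edge from node j to node i; every row
    sum must be positive (true for a TF-IDF cosine graph, whose diagonal is 1).
    """
    n = len(weights)
    out_weight = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(max_iter):
        # S(i) = (1 - d)/N + d * sum_j w_ji / out_j * S(j)
        new_scores = [
            (1 - d) / n
            + d * sum(weights[j][i] / out_weight[j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Made-up 3-sentence similarity matrix (symmetric, self-similarity 1 on the diagonal).
sim = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]
scores = weighted_pagerank(sim)
best = max(range(len(scores)), key=lambda i: scores[i])
print("top sentence:", best, "scores:", [round(s, 4) for s in scores])
```

Sentence 1, which has the largest total similarity to the others, gets the highest score, and the scores sum to 1; `results[0][0]` in the script picks out the top-ranked sentence index in the same way.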