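Since version 0.15, the IDF weight of each feature (its inverse document frequency, not a per-document TF-IDF score) can be retrieved via the idf_ attribute of the TfidfVectorizer object: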
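    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = ["This is very strange", "This is very nice"]
    vectorizer = TfidfVectorizer(min_df=1)
    X = vectorizer.fit_transform(corpus)

    # One IDF weight per vocabulary term, in the same order as the feature names.
    idf = vectorizer.idf_

    # On scikit-learn 1.0 and later, use get_feature_names_out() instead.
    print(dict(zip(vectorizer.get_feature_names(), idf)))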
Output (under Python 2, hence the u'' prefixes):

    {u'is': 1.0, u'nice': 1.4054651081081644, u'strange': 1.4054651081081644, u'this': 1.0, u'very': 1.0}
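
The numbers in this output match the smoothed IDF formula that scikit-learn applies by default (smooth_idf=True), idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t. The following is a minimal sketch that recomputes them by hand under Python 3. The whitespace tokenization and the names smoothed_idf and manual_idf are illustrative assumptions, not part of the library; the real vectorizer uses a regular-expression tokenizer that happens to give the same tokens for this corpus.

```python
import math

corpus = ["This is very strange", "This is very nice"]

# Naive tokenization (assumption): lowercase, then split on whitespace.
# Each document becomes a set because only presence matters for df(t).
docs = [set(doc.lower().split()) for doc in corpus]
n_docs = len(docs)


def smoothed_idf(term):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    df = sum(term in doc for doc in docs)
    return math.log((1 + n_docs) / (1 + df)) + 1


# Sorted vocabulary, mirroring the alphabetical feature order above.
vocabulary = sorted(set().union(*docs))
manual_idf = {term: smoothed_idf(term) for term in vocabulary}
print(manual_idf)
```

Terms that occur in both documents (is, this, very) get ln(3/3) + 1 = 1.0, while terms that occur in only one (nice, strange) get ln(3/2) + 1, which is about 1.405.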
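As mentioned in the comments, before version 0.15 a workaround is to access the idf_ attribute through the vectorizer's supposedly hidden _tfidf attribute (an instance of TfidfTransformer):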
    # _tfidf is a private attribute, so it may change between releases.
    idf = vectorizer._tfidf.idf_
    print(dict(zip(vectorizer.get_feature_names(), idf)))
This should give the same output as above.
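
If the same code has to run against several scikit-learn versions, the two access paths can be wrapped in one function. The helper below is only a sketch: idf_by_term is a hypothetical name, not a library function, and it assumes the vectorizer has already been fitted. It also assumes that get_feature_names_out() is available on scikit-learn 1.0 and later, with get_feature_names() as the fallback on older releases.

```python
def idf_by_term(vectorizer):
    """Map each vocabulary term of a fitted vectorizer to its IDF weight.

    Hypothetical helper, not part of scikit-learn.
    """
    # Prefer the public attribute (0.15+); otherwise fall back to the
    # private TfidfTransformer instance used by the pre-0.15 workaround.
    idf = getattr(vectorizer, "idf_", None)
    if idf is None:
        idf = vectorizer._tfidf.idf_

    # Newer releases expose get_feature_names_out(); older ones only
    # provide get_feature_names().
    if hasattr(vectorizer, "get_feature_names_out"):
        names = vectorizer.get_feature_names_out()
    else:
        names = vectorizer.get_feature_names()

    return dict(zip(names, idf))


# Usage, assuming `vectorizer` was fitted as shown earlier:
#     print(idf_by_term(vectorizer))
```

Checking for the attribute rather than parsing the version string keeps the helper short, at the cost of relying on a private attribute in the fallback branch.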