Text mining techniques for patent analysis

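Computer science; Terminology; Patent visualization; Process (computing); Set (abstract data type); Identification (biology); Information retrieval; Data mining; Domain (mathematical analysis); Association rule learning; Segmentation; Information extraction; Artificial intelligence; Data science; Philosophy; Mathematical analysis; Operating system; Biology; Botany; Programming language; Linguistics; Mathematics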
Authors
Yuen‐Hsien Tseng,Chi-Jen Lin,Yu-I Lin
Source
Journal: Information Processing and Management [Elsevier]
Volume (issue): 43 (5): 1216-1247 Citations: 664
Identifier
DOI:10.1016/j.ipm.2006.11.011
Abstract

Patent documents contain important research results. However, they are lengthy and rich in technical terminology, so analyzing them takes considerable human effort. Automatic tools for assisting patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conform to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. Efficiency and effectiveness are both considered in the design of these techniques. Important features of the proposed methodology include a rigorous approach to verifying the usefulness of segment extracts as document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volumes of patents, and an automatic procedure for creating generic cluster titles that make the results easier to interpret. The techniques were evaluated, and the results confirm that the machine-generated summaries preserve more important content words for classification than some other sections do. To demonstrate feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that the approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.
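
The term association step named in the abstract can be illustrated with a minimal co-word sketch. This is not the authors' algorithm, which the abstract does not spell out; it assumes document-level co-occurrence counts scored with the Dice coefficient. The names `STOPWORDS`, `tokenize`, `co_word_associations`, and `min_df`, as well as the three sample texts, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a tiny hand-written stopword list, used only for this sketch.
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "is", "to", "with", "on"}


def tokenize(text):
    """Lowercase a text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def co_word_associations(docs, min_df=2):
    """Score term pairs by the Dice coefficient of their document frequencies.

    Only terms appearing in at least `min_df` documents are paired, which
    keeps the number of candidate pairs small on larger collections.
    """
    term_sets = [tokenize(d) for d in docs]
    # Document frequency of each term.
    df = Counter(t for terms in term_sets for t in terms)
    # Number of documents in which each (sorted) term pair co-occurs.
    pair_df = Counter()
    for terms in term_sets:
        kept = sorted(t for t in terms if df[t] >= min_df)
        pair_df.update(combinations(kept, 2))
    # Dice coefficient: 2 * |A and B| / (|A| + |B|).
    return {(a, b): 2 * n / (df[a] + df[b]) for (a, b), n in pair_df.items()}


# Hypothetical patent-like titles, not taken from the paper's data set.
docs = [
    "Carbon nanotube field emission display",
    "Field emission cathode with carbon nanotube",
    "Polymer coating for display panel",
]
assoc = co_word_associations(docs)
for pair, score in sorted(assoc.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

Pairs that always appear together, such as `carbon` and `nanotube` in this toy input, receive the maximum score of 1.0, while pairs that share only some documents score lower. A clustering step could then group terms or documents using these scores as similarities.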
