Screening patents of ICT in construction using deep learning and NLP techniques

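Keywords: Computer science; Originality; Artificial intelligence; Trademark; Classifier (UML); Deep learning; Machine learning; Data science; Information retrieval; Natural language processing; Sociology; Qualitative research; Social science; Operating system
Authors
Hengqin Wu,Qiping Shen,Xue Lin,Minglei Li,Boyu Zhang,Clyde Zhengdao Li
Source
Journal: Engineering, Construction and Architectural Management [Emerald Publishing Limited]
Volume (issue): 27 (8): 1891-1912 Cited by: 14
Identifier
DOI:10.1108/ecam-09-2019-0480
Abstract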

Purpose: This study proposes an approach to solve the fundamental problem in using query-based methods (i.e. search engines and patent retrieval tools) to screen patents of information and communication technology in construction (ICTC). The fundamental problem is that ICTC incorporates various techniques and thus cannot be simply represented by man-made queries. To address this problem, this study develops a binary classifier that uses deep learning and NLP techniques to automatically identify whether a patent is relevant to ICTC, thus accurately screening a corpus of ICTC patents.

Design/methodology/approach: This study employs NLP techniques to convert the textual data of patents into numerical vectors. Then, a supervised deep learning model is developed to learn the relations between the input vectors and outputs.

Findings: The validation results indicate that (1) the proposed approach performs better in screening ICTC patents than traditional machine learning methods; (2) in addition to patents from the United States Patent and Trademark Office (USPTO), which provides structured and well-written patents, the approach can also accurately screen patents from the Derwent Innovations Index (DIX), in which patents are written in different genres.

Practical implications: This study contributes a specific collection of ICTC patents, which is not provided by the patent offices.

Social implications: The proposed approach offers an alternative way of gathering a corpus of patents for domains like ICTC that neither exist as a searchable classification in patent offices nor are accurately represented by man-made queries.

Originality/value: A deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and outputs, providing better performance than traditional machine learning models. This study uses advanced NLP techniques, namely lemmatization and part-of-speech (POS) tagging, to process the textual data of ICTC patents.
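
The pipeline summarized in the abstract (patent text converted to numerical vectors, then a supervised two-layer network acting as a binary classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents and labels, the plain bag-of-words vectorizer (used here in place of lemmatization and POS processing), the reading of "two layers" as one hidden layer plus one output unit, the ReLU activation, the hidden size, and the learning rate are all assumptions made for illustration.

```python
import math
import random
import re

# Hypothetical toy corpus: 1 = relevant to ICTC, 0 = not relevant.
DOCS = [
    ("sensor network monitoring construction site equipment", 1),
    ("building information model data exchange for construction", 1),
    ("wireless tracking of construction materials on site", 1),
    ("chemical composition for concrete admixture", 0),
    ("steel beam connection with bolted plate", 0),
    ("method for mixing asphalt pavement material", 0),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map each distinct token to a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TwoLayerNet:
    """One ReLU hidden layer followed by a single sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the predicted probability of class 1."""
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr):
        """One stochastic gradient step on the binary cross-entropy loss."""
        h, p = self.forward(x)
        d_out = p - y  # gradient w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            # Back-propagate through ReLU using the weight before its update.
            d_hidden = d_out * self.W2[j] if hj > 0 else 0.0
            self.W2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out

def mean_loss(net, X, y):
    """Average binary cross-entropy over a dataset."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = net.forward(x)[1]
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)

vocab = build_vocab(text for text, _ in DOCS)
X = [vectorize(text, vocab) for text, _ in DOCS]
y = [label for _, label in DOCS]

net = TwoLayerNet(n_in=len(vocab), n_hidden=8)
loss_before = mean_loss(net, X, y)
for _ in range(200):          # epochs over the toy corpus
    for x, t in zip(X, y):
        net.train_step(x, t, lr=0.1)
loss_after = mean_loss(net, X, y)
print("loss before:", loss_before, "loss after:", loss_after)
```

After training, `net.forward(vectorize(text, vocab))[1]` gives the estimated probability that a new text is relevant, and thresholding it (for example at 0.5) yields the binary screening decision. A realistic setup would replace the count vectors with features from a proper NLP pipeline and evaluate on held-out patents.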
