XLMR4MD: New Vietnamese dataset and framework for detecting the consistency of description and permission in Android applications using large language models

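Keywords: Permission; Computer science; Vietnamese; Consistency (knowledge base); Android (operating system); Android application; Data mining; Artificial intelligence; Operating system; Linguistics; Philosophy; Political science; Law
Authors
Qui Ngoc Nguyen,Nguyen Tan Cam,Kiet Van Nguyen
Source
Journal: Computers & Security [Elsevier]
Volume/Issue: 140: 103814-103814
Identifier
DOI:10.1016/j.cose.2024.103814
Abstract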

Google Play and other application marketplaces host a wide range of Android applications together with their metadata. Within that metadata, the application description and the privacy policy help explain an application's functionality. They also describe the permissions the application uses, especially those related to sensitive information. Detecting inconsistencies between the permissions stated in the description and privacy policy and the permissions extracted from the application's source code helps users decide whether to install and use the application. In this research, we propose a new method based on a pre-trained language model to detect inconsistencies between the permissions extracted from the application description and privacy policy and the permissions extracted from the application's source code (the APK file). Related work focuses on models trained on large-scale datasets, especially for resource-rich languages such as English. However, a low-resource language like Vietnamese still lacks datasets for this task. To address this problem, we propose the ViDPApp dataset (Description and Privacy Policy of Applications on Vietnamese domains), a high-quality, manually annotated dataset of 12,000+ sentences with an inter-annotator agreement (IAA) of over 85%. In addition, we propose XLMR4MD, a new framework using large language models that outperforms strong baseline models (LSTM, Bi-GRU-LSTM-CNN, WikiBERT, DistilBERT, mBERT, and PhoBERT) and achieves the best F1 score, 84.04%, in detecting inconsistencies between Android application permissions and descriptions. The framework can be fine-tuned for 100 languages, which benefits low-resource languages like Vietnamese. The dataset is available for research purposes.
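The comparison at the end of such a pipeline can be illustrated with a minimal sketch. It covers only the final step, checking the permissions inferred from the text against those declared in the APK, and it assumes both sets have already been obtained (for example, by a sentence classifier and by parsing the manifest). The names `ConsistencyReport` and `compare_permissions` are hypothetical and do not come from the paper, and treating "no undisclosed permission" as consistent is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConsistencyReport:
    """Hypothetical result type; not part of the paper's framework."""

    # Declared in the APK but never mentioned in the description or policy.
    undisclosed: frozenset[str]
    # Mentioned in the text but not declared in the APK.
    unused: frozenset[str]

    @property
    def consistent(self) -> bool:
        # Assumption: only undisclosed permissions count as an inconsistency.
        return not self.undisclosed


def compare_permissions(
    described: Iterable[str], declared: Iterable[str]
) -> ConsistencyReport:
    """Compare permissions inferred from text with those declared in the APK."""
    described_set = frozenset(described)
    declared_set = frozenset(declared)
    return ConsistencyReport(
        undisclosed=declared_set - described_set,
        unused=described_set - declared_set,
    )


if __name__ == "__main__":
    # Illustrative inputs only; a real system would produce these sets
    # from model predictions and from the application's manifest.
    report = compare_permissions(
        described=["android.permission.CAMERA"],
        declared=[
            "android.permission.CAMERA",
            "android.permission.READ_CONTACTS",
        ],
    )
    print(sorted(report.undisclosed), report.consistent)
```

Using set differences keeps the two failure directions separate, so a caller can decide independently how to treat permissions that are declared but not disclosed and permissions that are described but not declared.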
