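Computer science
Node (physics)
Torrent file
Distributed file system
File system
Inverted index
Computer network
Index (typography)
File system fragmentation
Hash function
Replication (statistics)
Search engine indexing
Self-certifying File System
Device file
Computer file
Database
Operating system
Information retrieval
World Wide Web
Statistics
Engineering
Structural engineering
Computer security
Mathematics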
Abstract
Distributed storage plays an increasingly important role in the context of big data. The InterPlanetary File System (IPFS) is a distributed file system that can connect heterogeneous devices into one network in a uniform way. Unlike the traditional HTTP protocol, which addresses content by its physical location, the IPFS distributed network is based on content addressing and retrieves files by their hash. However, this exact-match lookup cannot retrieve a file unless its content hash is known, which greatly reduces file utilization and liquidity. This paper therefore proposes a two-layer index scheme. After receiving an uploaded file, a node parses the file and builds the index. Index operations are replicated across nodes using a CRDT data structure based on optimistic replication, with IPFS pub-sub as the CRDT message delivery method between nodes. The first-layer index is the inverted index file corresponding to each keyword. The second-layer index is the CID of the inverted index file for each keyword. Each node maintains the full index, rather than storing a dispersed index through a distributed hash table, which ensures complete search results while greatly reducing search response time. The inverted index files are stored in the IPFS network to reduce storage space and to facilitate state-based replication for newly added nodes or nodes that have been offline for a long time. Finally, analysis of experimental data shows that the scheme can greatly reduce search response time while occupying an acceptable amount of storage space.
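The two-layer index described above can be illustrated with a minimal, in-memory sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the names `TwoLayerIndex` and `fake_cid` are hypothetical, a plain SHA-256 hex digest stands in for a real IPFS CID, a dictionary stands in for the IPFS network, and the CRDT is modeled as a grow-only set per keyword merged by set union (the abstract does not say which CRDT type the paper uses). Pub-sub delivery is not modeled; `merge` is simply called directly.

```python
import hashlib
import json
import re


def fake_cid(data: bytes) -> str:
    """Stand-in for an IPFS CID: a SHA-256 hex digest, not a real multihash."""
    return hashlib.sha256(data).hexdigest()


class TwoLayerIndex:
    """Hypothetical per-node index: keyword -> posting file -> CID of that file."""

    def __init__(self):
        self.blocks = {}       # simulated content-addressed store: CID -> bytes
        self.postings = {}     # first layer: keyword -> set of file CIDs
        self.keyword_cid = {}  # second layer: keyword -> CID of the posting file

    def _publish(self, keyword: str) -> None:
        # Serialize the posting list deterministically so equal sets give equal CIDs.
        payload = json.dumps(sorted(self.postings[keyword])).encode()
        cid = fake_cid(payload)
        self.blocks[cid] = payload
        self.keyword_cid[keyword] = cid

    def add_file(self, content: str) -> str:
        """Parse an uploaded file and index each distinct keyword it contains."""
        file_cid = fake_cid(content.encode())
        for keyword in set(re.findall(r"[a-z0-9]+", content.lower())):
            self.postings.setdefault(keyword, set()).add(file_cid)
            self._publish(keyword)
        return file_cid

    def merge(self, other: "TwoLayerIndex") -> None:
        """State-based merge: set union is commutative, associative, idempotent."""
        for keyword, cids in other.postings.items():
            self.postings.setdefault(keyword, set()).update(cids)
            self._publish(keyword)

    def search(self, keyword: str) -> list:
        """Resolve keyword -> posting-file CID -> list of matching file CIDs."""
        cid = self.keyword_cid.get(keyword.lower())
        if cid is None:
            return []
        return json.loads(self.blocks[cid])


if __name__ == "__main__":
    node_a, node_b = TwoLayerIndex(), TwoLayerIndex()
    node_a.add_file("distributed file system")
    node_b.add_file("inverted index for a file system")
    node_a.merge(node_b)
    node_b.merge(node_a)
    # After exchanging state, both nodes hold the same second-layer index.
    print(node_a.keyword_cid == node_b.keyword_cid)
    print(node_a.search("file"))
```

Because the merge is a set union, two nodes that exchange state in either order end up with identical posting lists and therefore identical second-layer CIDs. This is the property that lets a new or long-offline node catch up from full state rather than replaying every operation.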