BioLadder: A bioinformatic platform primarily focused on proteomic data analysis

Authors
Yupeng Zhang,Chunyuan Yang,Jinhao Wang,Lixin Wang,Yan Zhao,Longqing Sun,Wei Sun,Yunping Zhu,Jingli Li,Songfeng Wu
Source
Journal: iMeta [Wiley]
Volume (issue): 3 (4); Citations: 1
Identifier
DOI:10.1002/imt2.215
Abstract

BioLadder (https://www.bioladder.cn/) is an online data analysis platform designed for proteomics research. It includes three classes of experimental data analysis modules and four classes of common data analysis modules, allowing a variety of proteomics analyses to be conducted easily and efficiently. Most modules can also be used to analyze other omics data. To improve the user experience, we have carefully designed four kinds of functions that help users apply the relevant analysis modules quickly and accurately.
In recent years, the vigorous development of multiomics research has generated massive amounts of data, and in-depth data analysis and mining have become an important feature of life science research [1, 2]. Bioinformatics has become one of the most commonly used research tools and plays a pivotal role in life science research. However, bioinformatics research requires programming training, which may not be a strong suit for researchers who focus on scientific questions. Moreover, even researchers with coding skills still need to invest considerable time and effort in coding to complete an analysis, which delays related work. Online analytical platforms are a natural first choice for researchers, as they require no additional installation or preparation: simply opening a web page and uploading data for analysis can greatly accelerate the pace of life science research. Many such online data analysis platforms already exist, including specialized omics data analysis platforms such as ImageGP [3], Sangerbox [4], Majorbio Cloud [5], OmicStudio [6], OmicsSuite [7], and OmicsAnalyst [8]. However, most of these platforms were developed around the needs of genomics and transcriptomics, and almost none are specifically designed for proteomics.
The proteome is translated from the transcriptome; it not only possesses the expression properties of the transcriptome but also has additional properties, such as modifications and interactions [9, 10]. In terms of qualitative and quantitative experimental techniques, proteomics is far more complex than genomics and transcriptomics, which imposes additional requirements on data analysis. In recent years, with the advancement of technology, proteomics has played an increasingly important role in medical research [11, 12], leading to a growing and diverse demand for protein data analysis. Here, we provide the BioLadder bioinformatics platform (https://www.bioladder.cn/), which offers not only conventional analytical tools but also commonly used proteomics analysis tools, including experimental result visualization, sequence-level analysis, expression data analysis, and functional analysis. Some of the tools are newly developed and currently have no equivalent available online.
Proteomic data analysis can be divided into two main categories (Figure 1): (1) Experimental data analysis: analysis related to proteomics experimental data, including the analysis of experimental data, expression matrix data analysis, and so forth (Classes 1–3); (2) Common data analysis: analysis that does not depend on proteomics experimental data, including protein sequence analysis as well as some general classification and functional analysis (Classes 4–7). The seven classes are outlined as follows.
Experimental data visualization currently includes two modules (CoverageBar and Pep2ProMap), which display the coverage of proteins by identified peptides as well as information on protein digestion sites.
Data preprocessing includes data format conversion, normalization, imputation, and so on. It is an important prerequisite for the subsequent analyses.
Quantitative comparison entails analyzing the quantitative results of each protein and is the most prevalent type of analysis module. It is subdivided into five groups:
(1) DifferenceAnalysis: Differential analysis encompasses differential calculation, FDR (False Discovery Rate) correction, and the visualization of differential results, such as volcano plots and ROC (Receiver Operating Characteristic) curves. These modules perform both the differential calculation and the presentation of results.
(2) QuantitativeDes: Quantitative data description includes scatter plots, density plots, distribution bar or line graphs, and the coefficient of variation (CV). These modules describe the distribution, density, and other features of quantitative data.
(3) QuantitativeComp: Quantitative data comparison includes bar graphs, heat maps, box plots, and so on. These modules are primarily used to compare quantitative differences or variations among samples or genes.
(4) QuantitativeCorr: Quantitative data correlation includes correlation heat maps, correlation matrix graphs, and more. These modules calculate the quantitative correlation between samples or genes to reveal the relationships among them.
(5) QuantitativeCluster: Quantitative clustering includes dimensionality reduction methods such as PCA (Principal Component Analysis), t-SNE (t-Distributed Stochastic Neighbor Embedding), and UMAP (Uniform Manifold Approximation and Projection), as well as trend analysis of multiple data sets, TreeDiagram, and so on. These modules generally use dimensionality reduction algorithms or other distance calculation methods to cluster and analyze samples or genes.
Sequence analysis refers to analyses that can be completed based on protein sequences alone, including multiple sequence alignment, sequence motif analysis, calculation of protein physicochemical properties, and so forth.
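As an illustration of the FDR correction step mentioned under DifferenceAnalysis, the following is a minimal Python sketch of the Benjamini-Hochberg procedure. The text does not state which correction BioLadder applies, so the choice of Benjamini-Hochberg and the function name `benjamini_hochberg` are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative sketch only; not taken from BioLadder.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-protein p-values from a differential test.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

Proteins whose adjusted value falls below a chosen threshold (for example 0.05) would then be reported as significant, which is the kind of cutoff a volcano plot typically visualizes.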
The abundance chart offers a convenient way to query and display reference quantitative data for body fluids (currently blood and urine).
Classification analysis consists of two groups, classification display and classification comparison: (1) Classification display presents the differences between result types after classification, using scatter plots, pie charts, area charts, and so forth; (2) Classification comparison compares results of different types using VennChart, Sankey diagrams, radar charts, and other visualizations.
Function analysis focuses on visualizing enrichment results based on Gene Ontology, as well as drawing interaction network diagrams.
The analysis modules included in BioLadder therefore cover experimental data analysis in proteomics research, as well as multiple modules for public sequence data analysis (Table S1). These modules can meet most of the data analysis needs of researchers in the field of proteomics.
In response to the needs of proteomics research, we have developed several proteome data visualization modules (Table S2):
(1) Coverage analysis of peptide segments in protein sequences, including the CoverageBar and Pep2ProMap modules. These modules are primarily designed for presenting LiP-MS (Limited Proteolysis-Mass Spectrometry) experimental results, but can also be used to display identification data from any proteomics experiment.
(2) Analysis and visualization of quantitative data distribution, including the CV curve and SumCurve modules. Users can use these modules to examine the variability and abundance curves of quantitative data.
(3) Quantification data and marked proteins, including the AbundancePoint and BodyFluidMap modules. The former allows users to input their own quantitative data and specify proteins, while the latter enables users to query the quantitative information of specific proteins in the body fluid database (currently blood and urine).
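The peptide coverage that modules such as CoverageBar and Pep2ProMap display can be approximated with a short Python sketch. It assumes exact substring matching of identified peptides against the protein sequence; the function names and the example sequence are hypothetical and do not describe BioLadder's actual implementation.

```python
def covered_positions(protein, peptides):
    """Mark each residue of `protein` covered by at least one peptide.

    Assumes exact substring matches; every occurrence of a peptide counts.
    """
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            # Continue searching for further (possibly overlapping) matches.
            start = protein.find(peptide, start + 1)
    return covered


def coverage_fraction(protein, peptides):
    """Fraction of residues covered by the identified peptides."""
    if not protein:
        return 0.0
    return sum(covered_positions(protein, peptides)) / len(protein)


if __name__ == "__main__":
    # Hypothetical protein sequence and identified peptides.
    protein = "MKTAYIAKQRQISFVK"
    peptides = ["TAYIAK", "QISFVK"]
    print(f"coverage: {coverage_fraction(protein, peptides):.2f}")
```

The boolean mask returned by `covered_positions` is the kind of per-residue information a coverage bar would draw along the protein sequence.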
We believe that these proteome data visualization modules will meet the demands of proteomics research and provide valuable insights for researchers.
To enable users in omics research to use our online analysis platform as conveniently and efficiently as possible, we have carefully designed several aspects of the platform, including input file formats (Figure 2A), parameter settings (Figure 2B), and color schemes (Figure 2C). We also provide help documentation, WeChat customer service, and real-time tooltips so that users can easily access relevant help information (Figure 2D). Only some of these designs are implemented in existing online cloud platforms (Table S3).
Many data analysis methods are used across different fields, each with its own input data format, and these formats may not be common in proteomics. Proteomics data may therefore require some transformation before the corresponding analysis can be run. In our design, we provide conversion modules for different types of data (e.g., converting between long and wide formats) and have designed some modules to directly support common proteomics formats. For example, in the Venn diagram module, users can input not only commonly used Venn format data but also quantitative matrix data tables (the format usually used in proteomics). The module can additionally filter out data below a specified minimum quantitation value, which helps eliminate results that may be caused by noise.
To meet the specific requirements of proteomics data analysis, we have established suitable default parameters for some modules to minimize the need for parameter adjustment. First, in terms of algorithms, we have adjusted default parameters based on the characteristics of proteomics data. For instance, in correlation calculations, owing to the nature of expression data, a few highly abundant proteins may strongly influence the commonly used default, Pearson correlation.
Therefore, in modules that involve correlation calculations, we use Spearman rank correlation by default, which has also been adopted in many proteomics-related studies [13-15]. Furthermore, because the number of identified proteins often varies considerably between samples, conventional normalization methods may introduce bias. To address this issue, we have incorporated a method called median normalization of common proteins into the normalization module.
Second, in data preprocessing, we made some adjustments based on the characteristics of proteomics data. For example, because most genes tend to be of relatively low abundance, directly plotting quantitative distributions often results in most proteins being concentrated in the low-abundance range, which makes the differences between samples hard to discern [13-15]. Hence, in modules such as box plots, violin plots, and kernel density plots, logarithmic transformation is applied by default, allowing clear visualization of quantitative variation across samples without any parameter changes.
Third, we set some special default parameters for data presentation. For instance, in heatmap analysis, genes on the y-axis are typically so numerous that their names become illegible. We therefore display only sample names by default and omit gene names for clarity.
In addition, to cater to user preferences, we have incorporated easily adjustable parameters in several modules so that users can customize how their results are displayed. For example, in volcano plot analysis, we provide two point annotation methods: (1) custom protein markers based on a designated marker column in the uploaded file; (2) batch marking based on p value and fold change thresholds. Similarly, in box plot analysis, users can choose whether to add hypothesis test labels between groups.
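The motivation for defaulting to Spearman rank correlation can be illustrated with a small Python sketch. The sample values below are invented for illustration, and the helper functions are a plain standard-library implementation rather than BioLadder's code: one highly abundant protein makes two otherwise anti-correlated samples look almost perfectly correlated under Pearson, while the rank-based Spearman coefficient is unaffected by its magnitude.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical intensities; the last protein is highly abundant.
    sample_a = [1, 2, 3, 4, 1000]
    sample_b = [4, 3, 2, 1, 900]
    print("Pearson: ", round(pearson(sample_a, sample_b), 4))
    print("Spearman:", round(spearman(sample_a, sample_b), 4))
```

A logarithmic transformation, as applied by default in the distribution plots, reduces the influence of such extreme values in a similar spirit.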
We have also devised custom options that allow users to add hypothesis test labels selectively to specific group comparisons (e.g., annotating only significant results or comparisons of particular interest).
Color scheme is a crucial aspect of data visualization, as improper color combinations can significantly reduce the effectiveness of a figure. To address this problem, we have configured default color schemes in all modules, including default color schemes from R packages or ggplot2 (https://github.com/tidyverse/ggplot2), so that users can immediately create refined graphics without additional steps. Furthermore, more than half of the modules incorporate additional color schemes drawn from well-regarded schemes used in the literature or by journals such as Nature, Science, and Lancet (ggsci: https://github.com/nanxstats/ggsci). For users with specific requirements, we offer the option to customize colors: users can select colors directly from color palettes or adjust color codes precisely, customizing the color of each sample or group according to their preferences and aesthetics. Together, these three functionalities give our modules powerful color customization capabilities and allow users to complete color customization quickly. In addition, certain modules with particular characteristics use special color schemes. For example, the volcano plot module typically requires only three colors, for upregulation, downregulation, and nonsignificance, so a color picker is used to set up this three-color scheme.
To ensure that users can work smoothly with our modules, we provide help from multiple perspectives in the "User Guide." First, an introduction gives an overview of the website structure and functionalities. Second, a "Frequently Asked Questions" page compiles the most common inquiries.
Third, detailed documentation is provided for each module. Additionally, we offer a WeChat communication group where users can consult our staff directly about any issues they encounter. Furthermore, alongside the commonly used parameter settings, we have added tooltips for instant assistance, so that users can access help on parameter settings at any time and configure the parameters accurately. For instance, in the heatmap module, four types of tooltips are provided: (1) a tooltip for input file details, including an explanation of the file content, maximum file limits, and file formats; (2) a tooltip for dropdown selection boxes, explaining the meaning of each option; (3) a tooltip for download formats, providing download instructions and graphical explanations of download settings; (4) a "Text Tutorial" link in the top left corner of result plots (available in most modules), along with a tooltip explaining the plot, allowing users to quickly understand the plot's significance. These tooltips enable users to access help easily and continue with configuration and data analysis without interruption.
BioLadder's user interface is built on the Vue.js framework, which provides an interactive client-side experience. The server side is built with the Laravel framework, and data persistence is managed by MySQL. The analytical functionality is delivered through a combination of JavaScript for dynamic web interactions, Shiny for interactive web applications, and a selection of R packages for statistical analysis and graphics, enabling data processing and visualization for proteomics.
Songfeng Wu, Jingli Li, and Yunping Zhu conceived the idea of developing the BioLadder platform.
Yupeng Zhang completed the construction of the platform and the implementation of the various modules. Chunyuan Yang built and maintained the computing services and wrote the manuscript. Jinhao Wang assisted in completing some of the development work, while Lixin Wang assisted in conducting research and promotion. Yan Zhao, Longqing Sun, and Wei Sun assisted in testing and proposed modifications. All authors have read the final manuscript and approved it for publication.
This work was supported by the National Key Research Program of China (2021YFA1301603). We thank Shouke Zhang for his invaluable assistance in crafting and enhancing the graphical representations.
Yupeng Zhang, Jinhao Wang, Lixin Wang, Yan Zhao, Longqing Sun, Wei Sun, Jingli Li, and Songfeng Wu are employees and researchers of Qinglian Biotech Co., Ltd. The remaining authors declare no conflict of interest.
Table S1: BioLadder modules in the proteome data analysis framework.
Table S2: The possible applications of each newly developed module.
Table S3: Comparison of BioLadder's convenient and user-friendly designs across different cloud platforms.
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.