# Publication

• ### Functional genomic assays to annotate enhancer-promoter interactions genome-wide

Enhancers are pivotal for regulating gene transcription that occurs at promoters. Identification of the interacting enhancer–promoter pairs and understanding the mechanisms behind how they interact and how enhancers modulate transcription can provide fundamental insight into gene regulatory networks. Recently, advances in high-throughput methods in three major areas have furthered our knowledge about transcription: chromosome conformation capture assays, such as Hi-C, to study basic chromatin architecture; ectopic reporter experiments, such as self-transcribing active regulatory region sequencing (STARR-seq), to quantify promoter and enhancer activity; and endogenous perturbations, such as clustered regularly interspaced short palindromic repeat interference (CRISPRi), to identify enhancer–promoter compatibility. In this review, we will discuss the major method developments and key findings from these assays.
• ### Survey of the binding preferences of RNA-binding proteins to RNA editing events

Background: Adenosine-to-inosine (A-to-I) editing is an important RNA posttranscriptional process related to a multitude of cellular and molecular activities. However, systematic characterizations of whether and how the events of RNA editing are associated with the binding preferences of RNA sequences to RNA-binding proteins (RBPs) are still lacking. Results: With the RNA-seq and RBP eCLIP-seq datasets from the ENCODE project, we quantitatively survey the binding preferences of 150 RBPs to RNA editing events, followed by experimental validations. Such analyses of the RBP-associated RNA editing at nucleotide resolution and genome-wide scale shed light on the involvement of RBPs specifically in RNA editing-related processes, such as RNA splicing, RNA secondary structures, RNA decay, and other posttranscriptional processes. Conclusions: These results highlight the relevance of RNA editing in the functions of many RBPs and therefore serve as a resource for further characterization of the functional associations between various RNA editing events and RBPs.
• ### A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers

Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. Here, we compared 13 genome-wide RNA sequencing (RNA-seq) assays in K562 cells and show that nuclear run-on followed by cap-selection assay (GRO/PRO-cap) has advantages in enhancer RNA detection and active enhancer identification. We also introduce a tool, peak identifier for nascent transcript starts (PINTS), to identify active promoters and enhancers genome wide and pinpoint the precise location of 5′ transcription start sites. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected enhancer RNA (eRNA) transcription start sites (TSSs) available in 120 cell and tissue types, which can be accessed at https://pints.yulab.org. With knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the way for selection and characterization of their functions in a time- and labor-efficient manner.

• ### Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations

Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual’s genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
• ### bioSyntax: syntax highlighting for computational biology

Background: Computational biology requires the reading and comprehension of biological data files. Plain-text formats such as SAM, VCF, GTF, PDB and FASTA often contain critical information that is obscured by the complexity of the data structure. Results: bioSyntax is a freely available suite of biological syntax highlighting packages for vim, gedit, Sublime, VSCode, and less. bioSyntax improves the legibility of low-level biological data in the bioinformatics workspace. Conclusion: bioSyntax supports computational scientists in parsing and comprehending their data efficiently and thus can accelerate research output.
• ### Large-scale prediction of ADAR-mediated effective human A-to-I RNA editing

Adenosine-to-inosine (A-to-I) editing by adenosine deaminase acting on the RNA (ADAR) proteins is one of the most frequent modifications during post- and co-transcription. To facilitate the assignment of biological functions to specific editing sites, we designed an automatic online platform to annotate A-to-I RNA editing sites in pre-mRNA splicing signals, microRNAs (miRNAs) and miRNA target untranslated regions (3′ UTRs) from human (Homo sapiens) high-throughput sequencing data and predict their effects based on large-scale bioinformatic analysis. After analysing plenty of previously reported RNA editing events and human normal tissues RNA high-seq data, >60000 potentially effective RNA editing events on functional genes were found. The RNA Editing Plus platform is available for free at https://www.rnaeditplus.org/, and we believe our platform governing multiple optimized methods will improve further studies of A-to-I-induced editing post-transcriptional regulation.
• ### BioQueue: a novel pipeline framework to accelerate bioinformatics analysis

Motivation: With the rapid development of Next-Generation Sequencing, a large amount of data is now available for bioinformatics research. Meanwhile, the presence of many pipeline frameworks makes it possible to analyse these data. However, these tools concentrate mainly on their syntax and design paradigms, and dispatch jobs based on users’ experience about the resources needed by the execution of a certain step in a protocol. As a result, it is difficult for these tools to maximize the potential of computing resources, and avoid errors caused by overload, such as memory overflow. Results: Here, we have developed BioQueue, a web-based framework that contains a checkpoint before each step to automatically estimate the system resources (CPU, memory and disk) needed by the step and then dispatch jobs accordingly. BioQueue possesses a shell command-like syntax instead of implementing a new script language, which means most biologists without computer programming background can access the efficient queue system with ease.
• ### Circulating microRNAs: Promising Biomarkers Involved in Several Cancers and Other Diseases

Recently, many studies indicated that microRNAs (miRNAs) stably existed in various body fluids, including serum, plasma, saliva, and urine. Such miRNAs that exist in mammalian body fluids are known as circulating miRNAs, and they can transmit signals between cells and regulate intracellular gene expression. Currently, we barely understand the characteristics, sources, secretion, uptake, and functions of newly generated miRNAs. Particularly, it has been shown that certain types of circulating miRNAs can provide effective clinical data, suggesting their roles as novel biomarkers for the early detection of diseases such as cancers, cardiovascular diseases, and diabetes. Therefore, miRNAs have attracted much attention in academia for their promising applications in fundamental research and clinical diagnosis. This review summarizes some of the functional studies that have been conducted as well as the promising applications of circulating miRNAs, and we hope it will benefit other researchers in this field.