CATS-rf



Introduction

CATS-rf is the reference-free module of the CATS (Comprehensive Assessment of Transcript Sequences) framework. It evaluates the quality of transcriptomes assembled de novo from short reads, relying solely on the RNA-seq reads used to construct the assembly. The pipeline maps these reads back to the assembled transcripts and examines the mappings for evidence of misassembly. Quality is evaluated at the transcript level by integrating four score components, each targeting specific types of assembly error:

Score Component	Evidence	Targeted Assembly Errors
Coverage component (S_c)	Low-coverage regions	Insertions, redundancy
Accuracy component (S_a)	Low-accuracy regions	Sequence inaccuracy
Local fidelity component (S_l)	Inconsistent pair mapping within transcripts	Structural errors (e.g. deletions, translocations, inversions…)
Integrity component (S_i)	Pairs mapping to different transcripts	Transcript fragmentation
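The local fidelity and integrity components both draw on read-pair evidence. The sketch below shows one way a read pair could be sorted into the categories named in the table. It assumes a forward/reverse (FR) paired-end library and a fixed insert-size window. The names MateHit, classify_pair, min_insert and max_insert are hypothetical illustrations, not part of the CATS-rf interface, and the consistency criteria CATS-rf actually applies are described in its documentation and preprint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MateHit:
    """Alignment of one mate of a read pair (hypothetical minimal record)."""
    transcript: str   # name of the transcript the mate aligned to
    start: int        # 0-based leftmost alignment position
    end: int          # 0-based exclusive end of the alignment
    reverse: bool     # True if the mate aligned to the reverse strand


def classify_pair(m1: Optional[MateHit], m2: Optional[MateHit],
                  min_insert: int = 0, max_insert: int = 1000) -> str:
    """Assign a read pair to the evidence category it would contribute to."""
    if m1 is None or m2 is None:
        return "not_fully_mapped"
    if m1.transcript != m2.transcript:
        # Mates on different transcripts: evidence for the integrity component (S_i)
        return "cross_transcript"
    # Assumption: FR library, so the leftmost mate should be forward, the other reverse
    left, right = (m1, m2) if m1.start <= m2.start else (m2, m1)
    insert = max(left.end, right.end) - left.start  # span of the sequenced fragment
    if not left.reverse and right.reverse and min_insert <= insert <= max_insert:
        return "consistent"
    # Wrong orientation or implausible insert size: evidence for local fidelity (S_l)
    return "inconsistent_within"


print(classify_pair(MateHit("t1", 0, 100, False), MateHit("t1", 200, 300, True)))   # consistent
print(classify_pair(MateHit("t1", 0, 100, False), MateHit("t2", 50, 150, True)))    # cross_transcript
print(classify_pair(MateHit("t1", 0, 100, False), MateHit("t1", 200, 300, False)))  # inconsistent_within
```

Keeping the cross-transcript check first reflects the split in the table: pairs spanning two transcripts are integrity evidence regardless of their orientation, while orientation and insert size only matter once both mates land on the same transcript.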

The transcript quality score S_t is the product of the four score components, so each targeted type of assembly error is weighted equally. The assembly score S is the mean of the individual transcript scores. All components and scores are normalized to the range 0 to 1, with higher values indicating better quality.
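The aggregation described above can be written as S_t = S_c · S_a · S_l · S_i for each transcript and S = (1/N) Σ S_t over the N transcripts. The sketch below only illustrates this arithmetic; how CATS-rf computes each component is not shown, and the component values are made up for the example. The function names transcript_score and assembly_score are hypothetical.

```python
import math


def transcript_score(s_c: float, s_a: float, s_l: float, s_i: float) -> float:
    """S_t as the product of the four score components, each in [0, 1]."""
    components = (s_c, s_a, s_l, s_i)
    if not all(0.0 <= s <= 1.0 for s in components):
        raise ValueError("score components must lie in [0, 1]")
    return math.prod(components)


def assembly_score(transcript_scores: list[float]) -> float:
    """S as the arithmetic mean of the per-transcript scores."""
    if not transcript_scores:
        raise ValueError("assembly contains no transcripts")
    return sum(transcript_scores) / len(transcript_scores)


# Illustrative component values, not real CATS-rf output
scores = [
    transcript_score(1.0, 1.0, 1.0, 1.0),   # no evidence of any error
    transcript_score(0.5, 1.0, 1.0, 1.0),   # only the coverage component is reduced
    transcript_score(0.75, 1.0, 1.0, 0.5),  # coverage and integrity both reduced
]
print(scores)                                     # [1.0, 0.5, 0.375]
print(assembly_score(scores))                     # 0.625
print(transcript_score(1.0, 1.0, 0.0, 1.0))       # 0.0
```

Because the components are multiplied rather than averaged, a single component at 0 drives the transcript score to 0 no matter how well the transcript does on the other three, and penalties from several error types compound.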

In addition to transcript scores, CATS-rf provides a comprehensive set of assembly metrics, including transcript length and composition statistics, read mapping rates, positional coverage and accuracy profiles, and pair mapping consistency metrics.
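As an illustration of a positional coverage profile and of the low-coverage regions used as evidence by the coverage component, the sketch below builds per-base coverage for one transcript from read alignment intervals and reports maximal runs below a depth threshold. The interval representation, the min_depth threshold, and the function names are hypothetical; CATS-rf derives coverage from its own read mapping and applies its own criteria for what counts as low coverage.

```python
def coverage_profile(length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base read coverage of one transcript from [start, end) alignment intervals."""
    diff = [0] * (length + 1)  # difference array: +1 where a read starts, -1 where it ends
    for start, end in alignments:
        start, end = max(0, start), min(length, end)  # clip to transcript bounds
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    profile, depth = [], 0
    for pos in range(length):
        depth += diff[pos]  # running sum gives the depth at each position
        profile.append(depth)
    return profile


def low_coverage_regions(profile: list[int], min_depth: int) -> list[tuple[int, int]]:
    """Maximal [start, end) runs where coverage is below min_depth (hypothetical threshold)."""
    regions, run_start = [], None
    for pos, depth in enumerate(profile):
        if depth < min_depth and run_start is None:
            run_start = pos
        elif depth >= min_depth and run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:  # close a run that reaches the transcript end
        regions.append((run_start, len(profile)))
    return regions


profile = coverage_profile(10, [(0, 4), (2, 6), (8, 10)])
print(profile)                           # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(low_coverage_regions(profile, 1))  # [(6, 8)]
print(low_coverage_regions(profile, 2))  # [(0, 2), (4, 10)]
```

The difference-array approach keeps the profile computation linear in transcript length plus the number of alignments, instead of incrementing every covered base once per read.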

CATS-rf consistently outperforms existing reference-free tools for evaluating transcriptome assemblies. For benchmark results and methodological details, see the CATS preprint.
