
CATS-rf is the reference-free module of the CATS (Comprehensive Assessment of Transcript Sequences) framework. It evaluates the quality of transcriptomes assembled de novo from short reads, relying solely on the RNA-seq reads used to construct the assembly. The pipeline maps the reads back to the assembled transcripts and examines the mappings for evidence of misassembly. Quality is evaluated at the transcript level by integrating four score components, each targeting specific assembly errors:

| Score Component | Evidence | Targeted Assembly Errors |
|---|---|---|
| Coverage component (Sc) | Low-coverage regions | Insertions, redundancy |
| Accuracy component (Sa) | Low-accuracy regions | Sequence inaccuracy |
| Local fidelity component (Sl) | Inconsistent pair mapping within transcripts | Structural errors (e.g. deletions, translocations, inversions) |
| Integrity component (Si) | Pairs mapping to different transcripts | Transcript fragmentation |

The transcript quality score St is calculated as the product of the four score components described above, so that each type of detected assembly error is weighted equally. The assembly score S is the mean of the individual transcript scores. All components and scores are normalized to the range 0 to 1, with higher values indicating better quality.
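
The score aggregation described above can be illustrated with a minimal Python sketch. This is not CATS-rf code: the function names `transcript_score` and `assembly_score` and the example component values are hypothetical, and the sketch assumes the four components (Sc, Sa, Sl, Si) have already been computed and normalized to the range 0 to 1.

```python
from math import prod
from statistics import mean


def transcript_score(sc, sa, sl, si):
    """Return St as the product of the four score components.

    Hypothetical helper: assumes each component is already
    normalized to the range [0, 1].
    """
    components = (sc, sa, sl, si)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component out of range [0, 1]: {value}")
    return prod(components)


def assembly_score(transcript_scores):
    """Return S as the mean of the individual transcript scores."""
    return mean(transcript_scores)


# Made-up component values (Sc, Sa, Sl, Si) for two illustrative transcripts.
example_components = {
    "transcript_1": (1.0, 1.0, 1.0, 1.0),
    "transcript_2": (0.5, 1.0, 1.0, 0.5),
}

scores = {
    name: transcript_score(*values)
    for name, values in example_components.items()
}
overall = assembly_score(scores.values())
```

Because St is a product, a single low component lowers the whole transcript score regardless of how well the transcript does on the other three, which is consistent with the equal weighting of error types described above.
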

In addition to transcript scores, CATS-rf provides a comprehensive set of assembly metrics, including transcript length and composition statistics, read mapping rates, positional coverage and accuracy profiles, and pair mapping consistency metrics.

CATS-rf consistently outperforms existing reference-free transcriptome assembly evaluation tools. For detailed benchmarks and methodology, see the CATS preprint.