Synthetic long read sequencing protocols apply accurate, barcoded short-read sequencing to long DNA fragments in order to obtain long-range information at high nucleotide accuracy. Implementations of this protocol, such as Moleculo, require each long fragment to be sequenced at high depth so that it can first be assembled de novo into a synthetic long read, which is what makes the full fragment length usable.
This page provides software that instead operates on shallow-sequenced read clouds with much lower coverage. With the aid of statistical methods, these read clouds enable accurate alignment and variation discovery within complex repeats of a target genome at a fraction of the sequencing requirements.
Random Field Aligner (RFA) uses a Markov Random Field to capture the relationships among the short reads generated by a particular synthetic long read process. By modelling how short reads are generated from their source long molecules, RFA can accurately align shallow-sequenced read clouds within high-identity repeats of a target genome.
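To make the idea concrete, the following is a minimal toy sketch of how a Markov Random Field can resolve an ambiguous read using the other reads in its cloud. It is not RFA's actual model or code: the function names (`pair_score`, `align_cloud`), the pairwise potential, the `max_span` and `penalty` values, and the use of iterated conditional modes for inference are all assumptions chosen for illustration. Each read has a list of candidate positions with an alignment log-score, and reads sharing a barcode are rewarded for landing within one fragment length of each other.

```python
from typing import List, Tuple

# Hypothetical representation: (reference position, alignment log-score).
Candidate = Tuple[int, float]


def pair_score(pos_a: int, pos_b: int, max_span: int, penalty: float) -> float:
    """Pairwise log-potential: reads from one source molecule should lie
    within max_span bases of each other; otherwise pay a fixed penalty."""
    return 0.0 if abs(pos_a - pos_b) <= max_span else -penalty


def align_cloud(candidates: List[List[Candidate]],
                max_span: int = 10_000,
                penalty: float = 10.0,
                max_iters: int = 20) -> List[int]:
    """Choose one position per read by iterated conditional modes (ICM),
    a simple coordinate-ascent inference scheme assumed here for brevity."""
    # Start from the best candidate of each read considered independently.
    choice = [max(range(len(c)), key=lambda k: c[k][1]) for c in candidates]

    for _ in range(max_iters):
        changed = False
        for i, cands in enumerate(candidates):

            def local_score(k: int) -> float:
                # Unary term plus pairwise terms against the current
                # choices of every other read in the cloud.
                pos, unary = cands[k]
                pairwise = sum(
                    pair_score(pos, candidates[j][choice[j]][0],
                               max_span, penalty)
                    for j in range(len(candidates)) if j != i
                )
                return unary + pairwise

            best = max(range(len(cands)), key=local_score)
            if best != choice[i]:
                choice[i] = best
                changed = True
        if not changed:
            break

    return [candidates[i][k][0] for i, k in enumerate(choice)]


# One read cloud (a single barcode) with made-up scores. The second read
# falls in a repeat and scores slightly better at the distant copy.
cloud = [
    [(1_200, -1.0)],                     # unique placement
    [(1_500, -2.0), (500_500, -1.9)],    # ambiguous between two repeat copies
    [(3_000, -1.0)],                     # unique placement
]
print(align_cloud(cloud))  # [1200, 1500, 3000]
```

Aligned on its own, the second read would be placed at 500500 because that candidate has the higher score. Once the pairwise terms tie it to the two uniquely placed reads from the same cloud, the copy at 1500 wins, which is the kind of repeat disambiguation that joint modelling of a read cloud makes possible.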
RFA is written in Python, and the current version can be downloaded here.