`lapa.correction`

Module Contents

Classes

`Transcript`	Transcript class for performing manipulation on transcripts.
`TranscriptModifier`	Modifier to update the start and end sites of transcripts and of the respective exons and genes.

Functions

`_links_transcript_agg`(links, read_annot_path)
`_transcript_tss_tes`(df, threshold=1)
`_save_corrected_gtf`(df, gtf, gtf_output, keep_unsupported=False)
`_update_abundace`(df_abundance, df_link_counts, keep_unsupported=False)
`correct_talon`(links_path, read_annot_path, gtf_input, gtf_output, abundance_path, abundance_output, link_threshold=1, keep_unsupported=False)	Creates a GTF file with TSS/poly(A) cluster support, based on linking reads and the splice chains of TALON.

class lapa.correction.Transcript(transcript_id, df, min_exon_len=25)

Transcript class for performing manipulation on transcripts.

Parameters

transcript_id – Transcript id.
df – Transcript and its subfeatures as a data frame: the entries of the GTF file that belong to the transcript.
min_exon_len – Minimum exon length.

property five_prime_exon_idx(self): Index of the most five-prime exon of the transcript.

property three_prime_exon_idx(self): Index of the most three-prime exon of the transcript.

copy(self, new_transcript_id)

Creates a copy of the transcript with a new transcript id.

Parameters: new_transcript_id – New transcript id, in the form oldTranscriptId#suffix.

valid_five_prime_exon_len(self, tss_site: int)

Checks whether the proposed tss_site is valid for the most five-prime exon of the transcript, based on the exon coordinates and the minimum exon length.

Parameters: tss_site – Position of the proposed TSS.

valid_three_prime_exon_len(self, polyA_site)

Checks whether the proposed polyA_site is valid for the most three-prime exon of the transcript, based on the exon coordinates and the minimum exon length.

Parameters: polyA_site – Position of the proposed poly(A) site.
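
The two validity checks above can be illustrated with a minimal sketch. It is not LAPA's implementation: the function name `end_exon_long_enough` and its arguments are hypothetical, coordinates are assumed to be 1-based and inclusive as in GTF, and an exon is assumed to be acceptable when its new length is at least `min_exon_len`.

```python
def end_exon_long_enough(exon_start, exon_end, strand, site, end, min_exon_len=25):
    """Check that moving one end of a terminal exon to `site` keeps it long enough.

    Hypothetical helper, not part of lapa. Coordinates are assumed to be
    1-based and inclusive. `end` is "five_prime" (site is a TSS) or
    "three_prime" (site is a poly(A) site).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if end not in ("five_prime", "three_prime"):
        raise ValueError("end must be 'five_prime' or 'three_prime'")

    # On the + strand a TSS replaces the exon start and a poly(A) site
    # replaces the exon end; on the - strand the roles are swapped.
    moves_start = (end == "five_prime") == (strand == "+")

    if moves_start:
        new_len = exon_end - site + 1
    else:
        new_len = site - exon_start + 1

    # A site beyond the opposite exon boundary yields a non-positive length
    # and is therefore rejected as well.
    return new_len >= min_exon_len
```

For an exon spanning 100 to 200 on the + strand, a proposed TSS at 150 leaves 51 bases and passes, while a TSS at 190 leaves 11 bases and fails with the default minimum of 25.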

update_tss_site(self, tss_site)

Updates the TSS of the transcript and of its most five-prime exon.

Parameters: tss_site – Position of the proposed TSS.

update_polyA_site(self, polyA_site)

Updates the poly(A) site of the transcript and of its most three-prime exon.

Parameters: polyA_site – Position of the proposed poly(A) site.

class lapa.correction.TranscriptModifier(templete_gtf, min_exon_len=25)

Modifier to update the start and end sites of transcripts and of the respective exons and genes.

Parameters

templete_gtf – GTF to use as the template.
min_exon_len – Minimum exon length.

fetch_transcript(self, transcript_id): Fetches a transcript from the template GTF and returns it as a Transcript object.

add_transcript(self, transcript): Adds a new transcript isoform to the modifier.

static _sort_gtf_key(col)

static _sort_gtf(df)

to_gtf(self, path)

Saves all modifications, including the modified transcripts, as a GTF file.

Parameters: path – Output path of the GTF file.

__contains__(self, transcript_id)
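
The methods documented above suggest a workflow of fetching a template transcript, copying it, moving its ends, and registering the copy. The sketch below strings those calls together. The helper name `add_corrected_isoform` and the `suffix` argument are hypothetical, and two behaviours are assumed rather than documented: that the `valid_*` methods return a truthy value for acceptable sites, and that `copy` returns the new Transcript.

```python
def add_corrected_isoform(modifier, transcript_id, tss_site, polyA_site, suffix):
    """Add a copy of `transcript_id` with new ends to `modifier`.

    Hypothetical helper, not part of lapa. `modifier` is expected to behave
    like lapa.correction.TranscriptModifier. Returns the new transcript id,
    or None when either proposed end is rejected.
    """
    template = modifier.fetch_transcript(transcript_id)

    # Assumption: the valid_* methods return a truthy value for valid sites.
    if not template.valid_five_prime_exon_len(tss_site):
        return None
    if not template.valid_three_prime_exon_len(polyA_site):
        return None

    # New ids follow the documented oldTranscriptId#suffix pattern.
    new_id = f"{transcript_id}#{suffix}"

    # Assumption: copy returns the new Transcript rather than modifying in place.
    isoform = template.copy(new_id)
    isoform.update_tss_site(tss_site)
    isoform.update_polyA_site(polyA_site)

    modifier.add_transcript(isoform)
    return new_id
```

After all isoforms have been added, `modifier.to_gtf(path)` would write the result. Checking both ends before copying keeps the modifier free of partially updated isoforms.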

lapa.correction._links_transcript_agg(links, read_annot_path)

lapa.correction._transcript_tss_tes(df, threshold=1)

lapa.correction._save_corrected_gtf(df, gtf, gtf_output, keep_unsupported=False)

lapa.correction._update_abundace(df_abundance, df_link_counts, keep_unsupported=False)

lapa.correction.correct_talon(links_path, read_annot_path, gtf_input, gtf_output, abundance_path, abundance_output, link_threshold=1, keep_unsupported=False)

Creates a GTF file with TSS/poly(A) cluster support, based on linking reads and the splice chains of TALON.

Parameters

links_path – Path to the linking read file generated with the lapa_link_tss_to_tes command.
read_annot_path – Path to the read_annot file of TALON, which assigns reads to transcripts.
gtf_input – Input GTF file from which splice chains are extracted.
gtf_output – Output path of the corrected GTF, which contains transcripts with TSS/poly(A) end support.
abundance_path – Input abundance file of TALON, which contains the abundance of each transcript.
abundance_output – Output path of the updated abundance file, calculated from the abundance of linking reads.
link_threshold – Minimum number of linking reads required to create a transcript isoform.
keep_unsupported – Keep transcripts from the original GTF that lack TSS and TES support. If true, transcripts created from non-linking (partial) reads in the original files are kept in the output GTF and abundance file.
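
The effect of `link_threshold` can be pictured with a small sketch. It is an illustration under stated assumptions, not LAPA's code: the function `supported_isoforms` is hypothetical, each linking read is represented as a plain `(transcript_id, tss_cluster, polyA_cluster)` tuple, and a combination is assumed to be kept when its read count is at least the threshold.

```python
from collections import Counter


def supported_isoforms(linking_reads, link_threshold=1):
    """Count linking reads per end combination and keep the supported ones.

    Hypothetical helper, not part of lapa. `linking_reads` is an iterable of
    (transcript_id, tss_cluster, polyA_cluster) tuples, one per read.
    Returns {(transcript_id, tss_cluster, polyA_cluster): read_count}.
    """
    counts = Counter(linking_reads)
    # Assumption: "minimum number" means the count must be >= link_threshold.
    return {key: n for key, n in counts.items() if n >= link_threshold}


# Illustrative input: three reads support one end combination, one read another.
reads = [("T1", "tss_a", "polyA_x")] * 3 + [("T1", "tss_b", "polyA_x")]
kept = supported_isoforms(reads, link_threshold=2)
```

With the default threshold of 1 both combinations in `reads` would be kept, whereas a threshold of 2 keeps only the combination backed by three reads.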
