lapa.correction

Module Contents

Classes

Transcript

Transcript class for performing manipulation on transcripts.

TranscriptModifier

Modifier to update transcript start and end sites.

Functions

_links_transcript_agg(links, read_annot_path)

_transcript_tss_tes(df, threshold=1)

_save_corrected_gtf(df, gtf, gtf_output, keep_unsupported=False)

_update_abundace(df_abundance, df_link_counts, keep_unsupported=False)

correct_talon(links_path, read_annot_path, gtf_input, gtf_output, abundance_path, abundance_output, link_threshold=1, keep_unsupported=False)

LAPA creates a GTF file with tss/poly(A) cluster support based on the linking reads and using the splice chains of TALON.

class lapa.correction.Transcript(transcript_id, df, min_exon_len=25)

Transcript class for performing manipulation on transcripts.

Parameters
  • transcript_id – transcript id

  • df – Transcript and its subfeatures as a data frame: the entries of the gtf file related to the transcript.

  • min_exon_len – Minimum exon length

property five_prime_exon_idx(self)

Most five prime exon of transcript

property three_prime_exon_idx(self)

Most three prime exon of transcript

copy(self, new_transcript_id)

Create copy of transcript with new transcript id.

Parameters

new_transcript_id – New transcript id, in the form oldTranscriptId#suffix.
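The naming convention can be illustrated with a minimal sketch. It assumes the transcript is held as a list of GTF-like dictionaries; `copy_records` and the `transcript_id` key are hypothetical names chosen for illustration and are not LAPA's implementation.

```python
import copy


def copy_records(records, old_id, suffix):
    """Return deep copies of `records` relabelled as oldTranscriptId#suffix.

    Hypothetical helper: LAPA itself stores the transcript in a data frame.
    """
    new_id = f"{old_id}#{suffix}"
    copied = copy.deepcopy(records)  # leave the original records untouched
    for rec in copied:
        rec["transcript_id"] = new_id
    return new_id, copied


records = [
    {"feature": "transcript", "transcript_id": "ENST0001", "start": 100, "end": 900},
    {"feature": "exon", "transcript_id": "ENST0001", "start": 100, "end": 300},
]
new_id, new_records = copy_records(records, "ENST0001", 1)
print(new_id)  # ENST0001#1
```

Copying before relabelling keeps the template transcript available for further isoforms derived from the same splice chain.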

valid_five_prime_exon_len(self, tss_site: int)

Checks whether the proposed tss_site is valid for the most five prime exon of the transcript, based on coordinates and minimum exon length.

Parameters

tss_site – Position of proposed tss site
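A check of this kind could look like the sketch below. It is not LAPA's code: it assumes 1-based inclusive GTF coordinates and strand-aware handling, and `valid_five_prime_exon` with its explicit `exon_start`, `exon_end` and `strand` arguments is a hypothetical stand-alone function.

```python
def valid_five_prime_exon(exon_start, exon_end, strand, tss_site, min_exon_len=25):
    """Check a proposed TSS against the first exon (1-based inclusive coords).

    Assumption: the exon boundary facing the splice junction stays fixed and
    only the boundary at the transcript start moves to `tss_site`.
    """
    if strand == "+":
        # The TSS becomes the new exon start; the exon end is fixed.
        new_len = exon_end - tss_site + 1
    else:
        # On the minus strand the TSS becomes the new exon end.
        new_len = tss_site - exon_start + 1
    # A TSS beyond the fixed boundary yields a non-positive length and fails.
    return new_len >= min_exon_len


print(valid_five_prime_exon(100, 300, "+", 150))  # True: exon length 151
print(valid_five_prime_exon(100, 300, "+", 290))  # False: exon length 11
print(valid_five_prime_exon(100, 300, "-", 110))  # False: exon length 11
```

The same reasoning applies in mirror image to the three prime exon and a proposed poly(A) site.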

valid_three_prime_exon_len(self, polyA_site)

Checks whether the proposed polyA_site is valid for the most three prime exon of the transcript, based on coordinates and minimum exon length.

Parameters

polyA_site – Position of proposed poly(A) site

update_tss_site(self, tss_site)

Updates the tss site of the transcript and of its most five prime exon.

Parameters

tss_site – Position of proposed tss site.
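As a rough illustration of what such an update involves, the sketch below moves both the transcript boundary and the first exon boundary. It assumes GTF-like dictionaries with 1-based coordinates; `update_tss` and the record layout are hypothetical and do not reproduce LAPA's data-frame based implementation.

```python
def update_tss(records, strand, tss_site):
    """Move the transcript boundary and its first exon to `tss_site` in place."""
    exons = [r for r in records if r["feature"] == "exon"]
    transcript = next(r for r in records if r["feature"] == "transcript")
    if strand == "+":
        # Plus strand: the most five prime exon has the smallest start.
        first_exon = min(exons, key=lambda r: r["start"])
        first_exon["start"] = transcript["start"] = tss_site
    else:
        # Minus strand: the most five prime exon has the largest end.
        first_exon = max(exons, key=lambda r: r["end"])
        first_exon["end"] = transcript["end"] = tss_site
    return records


records = [
    {"feature": "transcript", "start": 100, "end": 900},
    {"feature": "exon", "start": 100, "end": 300},
    {"feature": "exon", "start": 500, "end": 900},
]
update_tss(records, "+", 120)
print(records[0]["start"], records[1]["start"])  # 120 120
```

In practice the proposed site would first be validated against the minimum exon length before the coordinates are changed.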

update_polyA_site(self, polyA_site)

Updates the poly(A) site of the transcript and of its most three prime exon.

Parameters

polyA_site – Position of proposed poly(A) site.

class lapa.correction.TranscriptModifier(templete_gtf, min_exon_len=25)

Modifier to update the start and end sites of transcripts and of their respective exons and genes.

Parameters
  • templete_gtf – GTF file to use as the template.

  • min_exon_len – Minimum exon length.

fetch_transcript(self, transcript_id)

Fetch transcript from the template gtf and return a transcript object.

add_transcript(self, transcript)

Add new transcript isoform to modifier.

static _sort_gtf_key(col)

static _sort_gtf(df)

to_gtf(self, path)

Save all modified transcripts as gtf.

Parameters

path – Output path to save gtf.
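For orientation, a GTF record is a tab-separated line with nine fields, the last holding `key "value";` attributes. The sketch below formats one such line; `format_gtf_line` and the dictionary layout are hypothetical and only illustrate the file format, not how `to_gtf` is implemented.

```python
def format_gtf_line(rec):
    """Format one GTF-like record as a nine-column, tab-separated line."""
    # Attributes are written as: key "value"; key "value";
    attrs = " ".join(f'{key} "{value}";' for key, value in rec["attributes"].items())
    fields = [
        rec["seqname"],
        rec["source"],
        rec["feature"],
        str(rec["start"]),  # 1-based, inclusive
        str(rec["end"]),
        ".",  # score (unused here)
        rec["strand"],
        ".",  # frame (unused here)
        attrs,
    ]
    return "\t".join(fields)


record = {
    "seqname": "chr1",
    "source": "LAPA",  # assumed source label, for illustration only
    "feature": "transcript",
    "start": 120,
    "end": 900,
    "strand": "+",
    "attributes": {"gene_id": "G1", "transcript_id": "T1#1"},
}
line = format_gtf_line(record)
print(line)
```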

__contains__(self, transcript_id)

lapa.correction._transcript_tss_tes(df, threshold=1)

lapa.correction._save_corrected_gtf(df, gtf, gtf_output, keep_unsupported=False)

lapa.correction._update_abundace(df_abundance, df_link_counts, keep_unsupported=False)

lapa.correction.correct_talon(links_path, read_annot_path, gtf_input, gtf_output, abundance_path, abundance_output, link_threshold=1, keep_unsupported=False)

LAPA creates a GTF file with tss/poly(A) cluster support based on the linking reads and using the splice chains of TALON.

Parameters
  • links_path – Path to linking read file generated with lapa_link_tss_to_tes command.

  • read_annot_path – read_annot file of TALON, which annotates the assignment of reads to transcripts.

  • gtf_input – Input gtf file to extract splice chains.

  • gtf_output – Output corrected gtf containing transcripts with tss/poly(A) end support.

  • abundance_path – Input abundance file of TALON which contains abundance of each transcript.

  • abundance_output – Updated abundance file, calculated from the abundance of linking reads.

  • link_threshold – Minimum number of linking reads to create transcript isoform.

  • keep_unsupported – Keep transcripts without tss and tes support in the original gtf. If true, transcripts created from non-linking (partial) reads in the original files are kept in the gtf and abundance outputs.
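The role of link_threshold can be sketched as a simple count-and-filter step. The example assumes each linking read has already been reduced to a (transcript, tss cluster, poly(A) cluster) tuple; `supported_isoforms` and the tuple layout are hypothetical and are not taken from LAPA's source.

```python
from collections import Counter


def supported_isoforms(links, link_threshold=1):
    """Count linking reads per (transcript, tss, poly(A)) and apply the threshold."""
    counts = Counter(links)
    # Keep only combinations backed by at least `link_threshold` linking reads.
    return {key: n for key, n in counts.items() if n >= link_threshold}


links = [
    ("T1", "tss_a", "polyA_x"),
    ("T1", "tss_a", "polyA_x"),
    ("T1", "tss_b", "polyA_x"),
    ("T2", "tss_c", "polyA_y"),
]
print(supported_isoforms(links, link_threshold=2))
# {('T1', 'tss_a', 'polyA_x'): 2}
```

With the default threshold of 1, every combination observed in at least one linking read would be kept; raising the threshold trades sensitivity for stronger read support per isoform.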