lapa.correction
Module Contents
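Classes
Transcript – Transcript class for performing manipulation on transcripts.
TranscriptModifier – Modifier to update the start and end sites of transcripts and of their respective exons and genes.
Functions
_links_transcript_agg(links, read_annot_path)
_transcript_tss_tes(df, threshold=1)
_save_corrected_gtf(df, gtf, gtf_output, keep_unsupported=False)
_update_abundace(df_abundance, df_link_counts, keep_unsupported=False)
correct_talon(links_path, read_annot_path, gtf_input, gtf_output, ...) – LAPA creates a GTF file with TSS/poly(A) cluster support based on the linking reads and TALON splice chains.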
- class lapa.correction.Transcript(transcript_id, df, min_exon_len=25)
Transcript class for performing manipulation on transcripts.
- Parameters
transcript_id – transcript id
df – Transcript and its subfeatures as a data.frame: the entries of the GTF file that belong to the transcript.
min_exon_len – Minimum exon length
- property five_prime_exon_idx(self)
Index of the most five prime exon of the transcript.
- property three_prime_exon_idx(self)
Index of the most three prime exon of the transcript.
- copy(self, new_transcript_id)
Create a copy of the transcript with a new transcript ID.
- Parameters
new_transcript_id – New transcript ID, in the form oldTranscriptId#suffix.
- valid_five_prime_exon_len(self, tss_site: int)
Check whether the proposed tss_site is valid for the most five prime exon of the transcript, based on the exon coordinates and the minimum exon length.
- Parameters
tss_site – Position of proposed tss site
- valid_three_prime_exon_len(self, polyA_site)
Check whether the proposed polyA_site is valid for the most three prime exon of the transcript, based on the exon coordinates and the minimum exon length.
- Parameters
polyA_site – Position of proposed poly(A) site
- update_tss_site(self, tss_site)
Update the TSS of the transcript and of its most five prime exon.
- Parameters
tss_site – Position of proposed tss site.
- update_polyA_site(self, polyA_site)
Update the poly(A) site of the transcript and of its most three prime exon.
- Parameters
polyA_site – Position of proposed poly(A) site.
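To make the coordinate rules of the Transcript methods concrete, here is a minimal, hypothetical sketch; it is not LAPA's implementation. It assumes exons are held as 1-based, inclusive (start, end) pairs (the GTF convention) rather than a data.frame, and that a new TSS replaces the exon start on the + strand and the exon end on the - strand, with the reverse holding for the poly(A) site. The class name SketchTranscript and its span property are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SketchTranscript:
    """Hypothetical stand-in for lapa.correction.Transcript.

    Exons are (start, end) pairs in 1-based, inclusive GTF coordinates.
    """
    transcript_id: str
    strand: str  # '+' or '-'
    exons: List[Tuple[int, int]]
    min_exon_len: int = 25

    @property
    def five_prime_exon_idx(self) -> int:
        # 5'-most exon: smallest start on '+', largest end on '-'.
        idx = range(len(self.exons))
        if self.strand == '+':
            return min(idx, key=lambda i: self.exons[i][0])
        return max(idx, key=lambda i: self.exons[i][1])

    @property
    def three_prime_exon_idx(self) -> int:
        # 3'-most exon: largest end on '+', smallest start on '-'.
        idx = range(len(self.exons))
        if self.strand == '+':
            return max(idx, key=lambda i: self.exons[i][1])
        return min(idx, key=lambda i: self.exons[i][0])

    @property
    def span(self) -> Tuple[int, int]:
        # The transcript start/end follow from its exons.
        return min(s for s, _ in self.exons), max(e for _, e in self.exons)

    def valid_five_prime_exon_len(self, tss_site: int) -> bool:
        start, end = self.exons[self.five_prime_exon_idx]
        # The TSS becomes the exon start on '+' and the exon end on '-'.
        new_len = end - tss_site + 1 if self.strand == '+' else tss_site - start + 1
        return new_len >= self.min_exon_len

    def valid_three_prime_exon_len(self, polyA_site: int) -> bool:
        start, end = self.exons[self.three_prime_exon_idx]
        # The poly(A) site becomes the exon end on '+' and the exon start on '-'.
        new_len = polyA_site - start + 1 if self.strand == '+' else end - polyA_site + 1
        return new_len >= self.min_exon_len

    def update_tss_site(self, tss_site: int) -> None:
        i = self.five_prime_exon_idx
        start, end = self.exons[i]
        self.exons[i] = (tss_site, end) if self.strand == '+' else (start, tss_site)

    def update_polyA_site(self, polyA_site: int) -> None:
        i = self.three_prime_exon_idx
        start, end = self.exons[i]
        self.exons[i] = (start, polyA_site) if self.strand == '+' else (polyA_site, end)
```

For example, on the + strand with exons (100, 200) and (300, 400), a TSS at 180 would leave a 21 bp first exon and is rejected with min_exon_len=25, while a TSS at 150 is accepted.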
- class lapa.correction.TranscriptModifier(templete_gtf, min_exon_len=25)
Modifier to update the start and end sites of transcripts and of their respective exons and genes.
- Parameters
templete_gtf – GTF file to use as a template.
min_exon_len – Minimum exon length.
- fetch_transcript(self, transcript_id)
Fetch a transcript from the template GTF and return it as a transcript object.
- add_transcript(self, transcript)
Add a new transcript isoform to the modifier.
- static _sort_gtf_key(col)
- static _sort_gtf(df)
- to_gtf(self, path)
Save the modifier, with all modified transcripts, as a GTF file.
- Parameters
path – Output path of the GTF file.
- __contains__(self, transcript_id)
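The TranscriptModifier workflow (fetch a template transcript, add modified isoforms, write everything out in sorted order) can be sketched with the standard library alone. This is a hypothetical illustration, not LAPA's code: the record type GtfRecord, the class SketchModifier, the feature ranking used as the sort key, the placeholder source column "sketch", and the choice to write only the added transcripts plus their resized genes are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

# Assumed ranking so a gene line precedes transcript and exon lines
# that share its start coordinate.
FEATURE_RANK = {'gene': 0, 'transcript': 1, 'exon': 2}


@dataclass(frozen=True)
class GtfRecord:
    """One GTF line; coordinates are 1-based and inclusive."""
    seqname: str
    feature: str
    start: int
    end: int
    strand: str
    gene_id: str
    transcript_id: str = ''

    def to_line(self) -> str:
        attrs = f'gene_id "{self.gene_id}";'
        if self.transcript_id:
            attrs += f' transcript_id "{self.transcript_id}";'
        # Nine tab-separated GTF columns; 'sketch' is a placeholder source.
        return '\t'.join([self.seqname, 'sketch', self.feature, str(self.start),
                          str(self.end), '.', self.strand, '.', attrs])


class SketchModifier:
    """Hypothetical stand-in for lapa.correction.TranscriptModifier."""

    def __init__(self, template: List[GtfRecord]):
        self.template = template
        self.transcripts: Dict[str, List[GtfRecord]] = {}

    def fetch_transcript(self, transcript_id: str) -> List[GtfRecord]:
        # Transcript and exon records of one transcript from the template.
        return [r for r in self.template if r.transcript_id == transcript_id]

    def add_transcript(self, transcript_id: str, records: List[GtfRecord]) -> None:
        self.transcripts[transcript_id] = records

    def __contains__(self, transcript_id: str) -> bool:
        return transcript_id in self.transcripts

    @staticmethod
    def _sort_key(r: GtfRecord):
        return (r.seqname, r.start, FEATURE_RANK.get(r.feature, len(FEATURE_RANK)))

    def gtf_lines(self) -> List[str]:
        added = [r for recs in self.transcripts.values() for r in recs]
        records = list(added)
        # Resize each template gene so it spans its added transcripts.
        for gene in (r for r in self.template if r.feature == 'gene'):
            tx = [r for r in added
                  if r.gene_id == gene.gene_id and r.feature == 'transcript']
            if tx:
                records.append(replace(gene, start=min(r.start for r in tx),
                                       end=max(r.end for r in tx)))
        return [r.to_line() for r in sorted(records, key=self._sort_key)]

    def to_gtf(self, path: str) -> None:
        with open(path, 'w') as f:
            for line in self.gtf_lines():
                f.write(line + '\n')
```

Keeping the records immutable and resizing genes with dataclasses.replace means the template itself is never modified, so the same template can back several rounds of isoform additions.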
- lapa.correction._links_transcript_agg(links, read_annot_path)
- lapa.correction._transcript_tss_tes(df, threshold=1)
- lapa.correction._save_corrected_gtf(df, gtf, gtf_output, keep_unsupported=False)
- lapa.correction._update_abundace(df_abundance, df_link_counts, keep_unsupported=False)
- lapa.correction.correct_talon(links_path, read_annot_path, gtf_input, gtf_output, abundance_path, abundance_output, link_threshold=1, keep_unsupported=False)
LAPA creates a GTF file with TSS/poly(A) cluster support based on the linking reads, using the splice chains from TALON.
- Parameters
links_path – Path to the linking-read file generated with the lapa_link_tss_to_tes command.
read_annot_path – TALON read_annot file containing the transcript assignment of each read.
gtf_input – Input GTF file from which splice chains are extracted.
gtf_output – Output corrected GTF containing the transcripts with TSS/poly(A) end support.
abundance_path – Input TALON abundance file containing the abundance of each transcript.
abundance_output – Updated abundance file, calculated from the abundance of linking reads.
link_threshold – Minimum number of linking reads required to create a transcript isoform.
keep_unsupported – Keep transcripts of the original GTF that lack TSS and TES support. If true, transcripts created from non-linking (partial) reads in the original files are kept in the output GTF and abundance files.
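The core counting step that correct_talon describes (group linking reads by their TALON transcript and TSS/TES cluster pair, then keep combinations with at least link_threshold reads) can be sketched as below. This is a hypothetical illustration, not LAPA's implementation, and it does not read or write GTF files. The input shapes (tuples of read_id, TSS cluster, TES cluster, and a read_id to transcript_id dict), the function name sketch_correct, the numeric #suffix numbering, and the reading of keep_unsupported as "non-linking reads still count toward the original transcript" are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sketch_correct(links: Iterable[Tuple[str, str, str]],
                   read_annot: Dict[str, str],
                   link_threshold: int = 1,
                   keep_unsupported: bool = False) -> Dict[str, int]:
    """Return hypothetical abundances keyed by (new) transcript ID.

    links: (read_id, tss_cluster, tes_cluster) for each linking read.
    read_annot: read_id -> TALON transcript_id for every annotated read.
    """
    links = list(links)
    # Count linking reads per (TALON transcript, TSS cluster, TES cluster).
    counts = Counter(
        (read_annot[read_id], tss, tes)
        for read_id, tss, tes in links if read_id in read_annot
    )
    abundance: Dict[str, int] = {}
    suffix: Counter = Counter()
    for (tx, tss, tes), n in sorted(counts.items()):
        if n < link_threshold:
            continue  # too few linking reads to create an isoform
        suffix[tx] += 1
        abundance[f'{tx}#{suffix[tx]}'] = n  # oldTranscriptId#suffix
    if keep_unsupported:
        # Assumption: non-linking (partial) reads keep supporting the
        # original TALON transcript.
        linked = {read_id for read_id, _, _ in links}
        partial = Counter(tx for r, tx in read_annot.items() if r not in linked)
        abundance.update(partial)
    return abundance
```

Raising link_threshold drops isoforms backed by few linking reads; with keep_unsupported=False, reads that never linked a TSS to a TES contribute nothing to the output.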