lapa.count
Module Contents
Classes
BaseCounter | Base class to count features from an alignment file.
ThreePrimeCounter | Counts 3' ends of reads (transcript end sites) per position.
FivePrimeCounter | Counts 5' ends of reads (transcript start sites) per position.
PolyaTailCounter | Counts 3' ends of reads with a polyA tail (transcript end sites) per position.
BaseMultiCounter | Base class to count reads from multiple alignment files.
TesMultiCounter | Counts transcript end sites from multiple alignment files.
TssMultiCounter | Counts transcript start sites from multiple alignment files.
Functions
_tqdm_counting | Adaptor for tqdm to integrate with logging.
save_count_bw | Saves counts as a bigwig file for each strand, generated with an instance of a BaseCounter object.
save_tss_count_bw | Saves counts of transcript start sites (TSS) as a bigwig file.
save_tes_count_bw | Saves counts of transcript end sites (TES) as a bigwig file.
- lapa.count._tqdm_counting(iterable)
Adaptor for tqdm to integrate with logging.
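One common way to route a progress bar into logging is to hand tqdm a file-like object that forwards its output to a logger. The sketch below shows that pattern with the standard library only. LoggerWriter is a hypothetical name used for illustration, and nothing here is taken from the lapa source, which may implement the adaptor differently.

```python
import io
import logging


class LoggerWriter(io.StringIO):
    """File-like object that forwards progress text to a logger.

    Hypothetical helper for illustration; not part of lapa.
    """

    def __init__(self, logger, level=logging.INFO):
        super().__init__()
        self.logger = logger
        self.level = level
        self._last = ''

    def write(self, text):
        # Progress bars emit carriage returns and padding; keep the visible text.
        text = text.strip('\r\n\t ')
        if text:
            self._last = text
        return len(text)

    def flush(self):
        # Emit one log record per refresh instead of redrawing a bar.
        if self._last:
            self.logger.log(self.level, self._last)
            self._last = ''
```

Such an object can be passed through the file argument of tqdm, for example tqdm(iterable, file=LoggerWriter(logging.getLogger('progress'))). A larger mininterval keeps the number of log records low.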
- class lapa.count.BaseCounter(bam_file, mapq=10, progress=True)
Base class to count features from alignment file.
- Parameters
bam_file – Path to bam file or pysam.AlignmentFile object.
mapq (int, optional) – Minimum read quality required for counting.
progress (bool, optional) – Show progress in counting.
- bam
alignment file.
- Type
pysam.AlignmentFile
Examples
Count mid location of the read.
>>> class MidCounter(BaseCounter):
...     def count_read(self, read):
...         return (read.reference_start + read.reference_end) / 2
>>> counter = MidCounter(bam_file)
>>> counter.to_bigwig(chrom_sizes, output_dir, 'mid')
>>> os.listdir(output_dir)
['mid_pos.bw', 'mid_neg.bw']
>>> counter.to_df()
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
- property bam(self)
Alignment file used in counting.
- iter_reads(self, chrom=None, strand=None)
- filter_read(self, read)
Reads for which this returns a false value are excluded from counting.
- count(self)
Counts reads per position as defined by the count_read function.
- Returns
Dictionary with (chromosome, position, strand) as keys and counts as values.
- Return type
Dict[(chrom, pos, strand), int]
- abstract count_read(self, read: pysam.AlignedSegment)
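The relationship between filter_read, count_read and count can be sketched without pysam. The example below is a minimal illustration under stated assumptions: Read is a stand-in tuple rather than pysam.AlignedSegment, SketchCounter is a hypothetical class, and the inclusive mapq threshold is a guess rather than lapa's documented behaviour.

```python
from abc import ABC, abstractmethod
from collections import Counter, namedtuple

# Stand-in for pysam.AlignedSegment with only the fields used here.
Read = namedtuple('Read', 'chrom start end is_reverse mapq')


class SketchCounter(ABC):
    """Illustrative base class; method names mirror the documented ones."""

    def __init__(self, reads, mapq=10):
        self.reads = reads
        self.mapq = mapq

    def filter_read(self, read):
        # Assumption: reads at or above the threshold are kept.
        return read.mapq >= self.mapq

    @abstractmethod
    def count_read(self, read):
        """Return the position this read contributes to."""

    def count(self):
        # Aggregate into {(chrom, pos, strand): count}.
        counts = Counter()
        for read in self.reads:
            if self.filter_read(read):
                strand = '-' if read.is_reverse else '+'
                counts[(read.chrom, self.count_read(read), strand)] += 1
        return dict(counts)


class MidCounter(SketchCounter):
    def count_read(self, read):
        # Integer midpoint of the aligned interval.
        return (read.start + read.end) // 2


reads = [
    Read('chr1', 100, 200, False, 60),
    Read('chr1', 100, 200, False, 30),
    Read('chr1', 300, 400, True, 5),  # dropped: mapq below 10
]
counts = MidCounter(reads).count()
```

A subclass only has to supply count_read; filtering and aggregation stay in the base class.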
- to_gr(self)
- Counts as dataframe with columns of
[‘Chromosome’, ‘Start’, ‘End’, ‘Strand’, ‘count’]
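As a rough illustration of how the count dictionary maps onto these columns, the sketch below builds plain row dictionaries. counts_to_rows is a hypothetical helper, and the one-base interval (End = Start + 1) is inferred from the example output above; whether lapa applies any coordinate offset is not stated here.

```python
def counts_to_rows(counts):
    """Turn {(chrom, pos, strand): n} into rows with the documented columns."""
    rows = []
    for (chrom, pos, strand), n in sorted(counts.items()):
        rows.append({
            'Chromosome': chrom,
            'Start': pos,
            'End': pos + 1,  # assumption: one-base interval per position
            'Strand': strand,
            'count': n,
        })
    return rows


rows = counts_to_rows({
    ('chr1', 994684, '-'): 8,
    ('chr1', 887771, '+'): 5,
})
```

A list of such rows can be passed directly to pandas.DataFrame if a dataframe is needed.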
- to_df(self)
- static _to_bigwig(gr, chrom_sizes, output_dir, prefix)
- to_bigwig(self, chrom_sizes, output_dir, prefix='lapa_counts')
Saves counts as bigwig file for each strand
- Parameters
chrom_sizes (str) – Chrom sizes file; it can be generated from a fasta file with faidx fasta -i chromsizes > chrom_sizes
output_dir – Output directory to save bigwig files
prefix (str) – File prefix to use for the bigwig files
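A chrom sizes file is a two-column text file with the chromosome name and its length on each line. The sketch below parses such lines into a dictionary. read_chrom_sizes is a hypothetical helper for illustration and is not how lapa is known to read the file.

```python
def read_chrom_sizes(lines):
    """Parse 'name<whitespace>size' lines into {name: size}."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, size = line.split()[:2]
        sizes[chrom] = int(size)
    return sizes


sizes = read_chrom_sizes([
    'chr1\t248956422\n',
    'chr2\t242193529\n',
    '\n',
])
```

Any iterable of lines works, so an open file handle can be passed in place of the list.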
- lapa.count.save_count_bw(df, output_dir, chrom_sizes, prefix)
- Saves counts as bigwig file for each strand generated with
instance of BaseCounter object
- Parameters
chrom_sizes (str) – Chrom sizes file; it can be generated from a fasta file with faidx fasta -i chromsizes > chrom_sizes
output_dir – Output directory to save bigwig files
prefix (str) – File prefix to use for the bigwig files
- lapa.count.save_tss_count_bw(df, chrom_sizes, output_dir, prefix)
- Saves counts of transcript start sites (TSS) as bigwig file
for each strand generated with TSS counters
- Parameters
chrom_sizes (str) – Chrom sizes file; it can be generated from a fasta file with faidx fasta -i chromsizes > chrom_sizes
output_dir – Output directory to save bigwig files
prefix (str) – File prefix to use for the bigwig files
- lapa.count.save_tes_count_bw(df, chrom_sizes, output_dir, prefix)
- Saves counts of transcript end sites (TES) as bigwig file
for each strand generated with TesCounters
- Parameters
chrom_sizes (str) – Chrom sizes file; it can be generated from a fasta file with faidx fasta -i chromsizes > chrom_sizes
output_dir – Output directory to save bigwig files
prefix (str) – File prefix to use for the bigwig files
- class lapa.count.ThreePrimeCounter(bam_file, mapq=10, progress=True)
Bases:
BaseCounter
- Counts 3’ ends of reads (transcript end sites) per position
from alignment file.
- Parameters
bam_file – Path to bam file or pysam.AlignmentFile object.
mapq (int, optional) – Minimum read quality required for counting.
progress (bool, optional) – Show progress in counting.
- bam
alignment file.
- Type
pysam.AlignmentFile
Examples
Count 3’ ends of the read per position.
>>> counter = ThreePrimeCounter(bam_file)
>>> counter.to_bigwig(chrom_sizes, output_dir, 'lapa_count')
>>> os.listdir(output_dir)
['lapa_count_pos.bw', 'lapa_count_neg.bw']
>>> counter.to_df()
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
- count_read(self, read: pysam.AlignedSegment)
Returns 3’ end of the read
- static _calculate_tail_seq(tail_seq, tail_base)
Calculate tail seq
- class lapa.count.FivePrimeCounter(bam_file, mapq=10, progress=True)
Bases:
BaseCounter
- Counts 5’ ends of reads (transcript start sites) per position
from alignment file.
- Parameters
bam_file – Path to bam file or pysam.AlignmentFile object.
mapq (int, optional) – Minimum read quality required for counting.
progress (bool, optional) – Show progress in counting.
- bam
alignment file.
- Type
pysam.AlignmentFile
Examples
Count 5’ ends of the read per position.
>>> counter = FivePrimeCounter(bam_file)
>>> counter.to_bigwig(chrom_sizes, output_dir, 'lapa_count')
>>> os.listdir(output_dir)
['lapa_count_pos.bw', 'lapa_count_neg.bw']
>>> counter.to_df()
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
- count_read(self, read: pysam.AlignedSegment)
Returns 5’ end of the read
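Which end of an alignment is the 5’ end depends on the strand. The sketch below illustrates that with a stand-in Read tuple using pysam's attribute names (reference_start is the 0-based leftmost aligned position, reference_end is one past the last aligned base). The helper functions are hypothetical, and lapa may apply coordinate adjustments that are not shown here.

```python
from collections import namedtuple

# Stand-in for pysam.AlignedSegment with only the fields used here.
Read = namedtuple('Read', 'reference_start reference_end is_reverse')


def five_prime_end(read):
    # Forward reads begin at the leftmost base, reverse reads at the rightmost.
    return read.reference_end if read.is_reverse else read.reference_start


def three_prime_end(read):
    # The 3' end is the opposite side of the alignment.
    return read.reference_start if read.is_reverse else read.reference_end


forward = Read(100, 200, False)
reverse = Read(100, 200, True)
ends = [
    (five_prime_end(forward), three_prime_end(forward)),
    (five_prime_end(reverse), three_prime_end(reverse)),
]
```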
- class lapa.count.PolyaTailCounter(bam_file, mapq=10, progress=True, min_tail_len=10, min_percent_a=0.9, count_aligned=False)
Bases:
ThreePrimeCounter
- Counts 3’ end of reads with polyA-tail (transcript end sites) per position
from alignment file.
- Parameters
bam_file – Path to bam file or pysam.AlignmentFile object.
mapq (int, optional) – Minimum read quality required for counting.
progress (bool, optional) – Show progress in counting.
min_tail_len –
- bam
alignment file.
- Type
pysam.AlignmentFile
Examples
Count 3’ ends of the read per position.
>>> counter = PolyaTailCounter(bam_file)
>>> counter.to_bigwig(chrom_sizes, output_dir, 'lapa_count')
>>> os.listdir(output_dir)
['lapa_count_pos.bw', 'lapa_count_neg.bw']
>>> counter.to_df()
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
- static detect_polyA_tail(read: pysam.AlignedSegment, count_aligned=False)
Detect polyA tails from a read
- Parameters
read – Aligned read.
count_aligned – Count aligned base pairs (likely internal priming) as well in tail length.
- Returns
Tuple of polyA_site, length of tail, percent of A base in tails.
- _read_is_tailed(self, tail_len, percent_a)
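The tail filter can be pictured as two numbers compared against the constructor defaults min_tail_len=10 and min_percent_a=0.9. The sketch below is a simplified illustration: tail_stats and read_is_tailed are hypothetical helpers, the input is assumed to be an already extracted tail sequence, and the inclusive comparisons are an assumption. The actual detection in detect_polyA_tail may differ.

```python
def tail_stats(tail_seq, tail_base='A'):
    """Return (tail length, fraction of tail_base) for a tail sequence."""
    tail_len = len(tail_seq)
    if tail_len == 0:
        return 0, 0.0
    percent = tail_seq.upper().count(tail_base) / tail_len
    return tail_len, percent


def read_is_tailed(tail_len, percent_a, min_tail_len=10, min_percent_a=0.9):
    # Assumption: both thresholds are inclusive.
    return tail_len >= min_tail_len and percent_a >= min_percent_a


examples = {
    seq: read_is_tailed(*tail_stats(seq))
    for seq in ('AAAAAAAAAA', 'AAAAAAAAAC', 'AAAAAAAAA', 'AAAAACCCCC')
}
```

For reads aligned to the reverse strand the tail appears as a run of T, which tail_stats can handle through tail_base='T'.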
- iter_tailed_reads(self)
Iterates polyA reads and polyA_site based on polyA filters.
- save_tailed_reads(self, output_bam)
Save tailed reads as bam files
- Parameters
output_bam – Path to bam file or pysam.AlignmentFile object.
- tail_len_dist(self)
Returns tail length distribution of reads based on the filters.
- plot_tail_len_dist(self)
Plots pdf and cdf of tail length distribution
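The pdf and cdf of a tail length distribution can be derived from a list of tail lengths as sketched below. tail_len_pdf_cdf is a hypothetical helper, and the return type of tail_len_dist is not documented here, so the plain-list input is an assumption.

```python
from collections import Counter
from itertools import accumulate


def tail_len_pdf_cdf(tail_lens):
    """Return (sorted lengths, pdf, cdf) for an iterable of tail lengths."""
    dist = Counter(tail_lens)
    total = sum(dist.values())
    if total == 0:
        return [], [], []
    lengths = sorted(dist)
    pdf = [dist[n] / total for n in lengths]
    cdf = list(accumulate(pdf))  # running sum of the pdf
    return lengths, pdf, cdf


lengths, pdf, cdf = tail_len_pdf_cdf([10, 10, 20, 40])
```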
- filter_read(self, read)
Filter tailed reads and quality
- class lapa.count.BaseMultiCounter(df_alignment: pandas.DataFrame, method: str, mapq=10, is_read_annot=False)
Base class to count reads from multiple alignment files.
- Parameters
df_alignment – DataFrame with columns [‘sample’, ‘dataset’, ‘path’], where sample is the sample name, dataset is the name of the group (replicates) the sample belongs to, and path is the path to the bam file.
method – Counting method implemented by child class.
mapq – Minimum mapping quality.
is_read_annot – A Talon read annotation file can be passed as the df_alignment argument; in that case this argument must be True.
- abstract build_counter(self, bam)
- abstract _count_read_annot(self)
- static _to_bigwig(df_all, tes, chrom_sizes, output_dir, prefix='polyA')
- to_df(self)
Export counts as dataframe.
- Returns
Counts as a tuple: the first element is a dataframe of all the counts, and the second element is a dictionary that maps each sample name to its dataframe of counts.
- Return type
(pd.DataFrame, Dict[str, pd.DataFrame])
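The shape of this return value, one combined table plus one table per sample, can be illustrated with plain dictionaries. merge_sample_counts is a hypothetical helper. It assumes the combined counts are a simple sum over samples, which this page does not confirm, and the real method returns dataframes.

```python
from collections import Counter


def merge_sample_counts(sample_counts):
    """Return (summed counts, per-sample counts) from {sample: {key: n}}."""
    total = Counter()
    for counts in sample_counts.values():
        total.update(counts)  # adds counts key by key
    return dict(total), sample_counts


all_counts, samples = merge_sample_counts({
    's1': {('chr1', 887771, '+'): 3},
    's2': {('chr1', 887771, '+'): 2, ('chr1', 994684, '-'): 8},
})
```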
- class lapa.count.TesMultiCounter(alignment, method='end', mapq=10, min_tail_len=10, min_percent_a=0.9, is_read_annot=False)
Bases:
BaseMultiCounter
Counts transcript end sites from multiple alignment files.
- Parameters
df_alignment – DataFrame with columns [‘sample’, ‘dataset’, ‘path’], where sample is the sample name, dataset is the name of the group (replicates) the sample belongs to, and path is the path to the bam file.
method – Either end or tail; see PolyaTailCounter and ThreePrimeCounter for counting behavior.
mapq – Minimum mapping quality.
is_read_annot – A Talon read annotation file can be passed as the df_alignment argument; in that case this argument must be True.
Examples
Counts transcript end sites from four alignment files
>>> df_alignment = pd.DataFrame({
...     'sample': ['s1', 's2', 's3', 's4'],
...     'dataset': ['d1', 'd2', 'd3', 'd4'],
...     'path': ['s1.bam', 's2.bam', 's3.bam', 's4.bam']
... })
>>> counter = TesMultiCounter(df_alignment)
>>> counter.to_bigwig(chrom_sizes, output_dir)  # export counts as bw
>>> df_all, samples = counter.to_df()  # or export as df
>>> df_all
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
>>> samples['s1']
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
- build_counter(self, bam)
- _count_read_annot(self)
- class lapa.count.TssMultiCounter(alignment, method='start', mapq=10, is_read_annot=False)
Bases:
BaseMultiCounter
Counts transcript start sites from multiple alignment files.
- Parameters
df_alignment – DataFrame with columns [‘sample’, ‘dataset’, ‘path’], where sample is the sample name, dataset is the name of the group (replicates) the sample belongs to, and path is the path to the bam file.
method – Counting method (default start); see FivePrimeCounter for counting behavior.
mapq – Minimum mapping quality.
is_read_annot – A Talon read annotation file can be passed as the df_alignment argument; in that case this argument must be True.
Examples
Counts transcript start sites from four alignment files
>>> df_alignment = pd.DataFrame({
...     'sample': ['s1', 's2', 's3', 's4'],
...     'dataset': ['d1', 'd2', 'd3', 'd4'],
...     'path': ['s1.bam', 's2.bam', 's3.bam', 's4.bam']
... })
>>> counter = TssMultiCounter(df_alignment)
>>> counter.to_bigwig(chrom_sizes, output_dir)  # export counts as bw
>>> df_all, samples = counter.to_df()  # or export as df
>>> df_all
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
>>> samples['s1']
+--------------+-----------+-----------+--------------+-----------+------------+
| Chromosome   | Start     | End       | Strand       | count     | coverage   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)   | (int64)    |
|--------------+-----------+-----------+--------------+-----------+------------|
| chr1         | 887771    | 887772    | +            | 5         | 5          |
| chr1         | 994684    | 994685    | -            | 8         | 10         |
...
- build_counter(self, bam)
- _count_read_annot(self)