lapa.genomic_regions
Module Contents
Classes
- PolyAGenomicRegions – Annotate polyA sites based on the genomic features.
- TssGenomicRegions – Annotate TSS sites based on the genomic features.
Functions
- _tqdm_pandas_gr – Adaptor for tqdm to integrate with logging.
- lapa.genomic_regions._tqdm_pandas_gr()
Adaptor for tqdm to integrate with logging.
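One common way to route tqdm progress bars into logging is a file-like sink passed as tqdm's `file` argument, which forwards each redraw to a `logging.Logger`. The sketch below assumes that design and that tqdm flushes its output stream after each redraw; `TqdmToLogger` is a hypothetical name, not lapa's actual adaptor, and only the standard library is used.

```python
import io
import logging


class TqdmToLogger(io.StringIO):
    """Hypothetical file-like sink: tqdm writes here, lines go to a logger."""

    def __init__(self, logger: logging.Logger, level: int = logging.INFO):
        super().__init__()
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text: str) -> int:
        # tqdm redraws the bar with carriage returns; keep only the latest state.
        self._buffer = text.strip("\r\n\t ")
        return len(text)

    def flush(self) -> None:
        # Emit the most recent progress line, if any, as a single log record.
        if self._buffer:
            self.logger.log(self.level, self._buffer)
            self._buffer = ""
```

With tqdm installed, this would be used as `tqdm(items, file=TqdmToLogger(logger), mininterval=5)`; a larger `mininterval` keeps the log from receiving one record per iteration.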
- class lapa.genomic_regions._GenomicRegions(gtf_file, feature_order, annotated_feature_site)
- abstract annotated_site(self, df)
- features(self, features: set = None) → pyranges.PyRanges
- annotate(self, gr, features=None)
- intergenic_genes(self, df)
Assign gene_id and gene_name to intergenic clusters.
- _agg_annotation_gene(self, df)
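The `feature_order` argument suggests that when a cluster overlaps several annotation features, a single feature is chosen by priority, and `intergenic_genes` then gives clusters that overlap no gene a placeholder `gene_id` and `gene_name`. The sketch below illustrates that reading with plain dictionaries instead of PyRanges objects; `resolve_feature`, `annotate_clusters`, the `overlaps` key, and the priority list are all hypothetical, not lapa's implementation.

```python
from typing import Dict, List, Sequence


def resolve_feature(overlapping: Sequence[str], feature_order: Sequence[str]) -> str:
    """Return the highest-priority overlapping feature, or 'intergenic'."""
    present = set(overlapping)
    for feature in feature_order:  # earlier entries take precedence
        if feature in present:
            return feature
    return "intergenic"


def annotate_clusters(clusters: List[Dict], feature_order: Sequence[str]) -> List[Dict]:
    """Label each cluster with one feature and name the intergenic ones.

    Each cluster is a dict whose 'overlaps' list holds the feature names it
    intersects (a stand-in for a PyRanges join against the GTF).
    """
    annotated = []
    n_intergenic = 0
    for cluster in clusters:
        row = dict(cluster)  # copy so the input is not mutated
        row["Feature"] = resolve_feature(cluster["overlaps"], feature_order)
        if row["Feature"] == "intergenic":
            # Hypothetical naming scheme mirroring 'intergenic_0' in the examples.
            name = f"intergenic_{n_intergenic}"
            row["gene_id"] = name
            row["gene_name"] = name
            n_intergenic += 1
        annotated.append(row)
    return annotated


# Hypothetical priority: UTRs before exons before introns.
order = ["three_prime_utr", "five_prime_utr", "exon", "intron"]
rows = annotate_clusters(
    [{"overlaps": []},
     {"overlaps": ["intron", "exon"],
      "gene_id": "ENSG00000012048.23", "gene_name": "BRCA1"}],
    order,
)
```

Resolving by a fixed priority list keeps the assignment deterministic when, for example, a cluster touches both an exon and an intron of overlapping transcripts.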
- class lapa.genomic_regions.PolyAGenomicRegions(gtf_file)
Bases: _GenomicRegions
Annotate polyA sites based on the genomic features.
- Parameters
gtf_file – GTF annotation file to overlap the sites against.
Examples
Annotation of poly(A) sites in PyRanges format:
>>> regions = PolyAGenomicRegions('hg38.gtf')
>>> gr
+--------------+-----------+-----------+--------------+--------------+
| Chromosome   | Start     | End       | Strand       | polyA_site   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)      |
|--------------+-----------+-----------+--------------+--------------|
| chr17        | 4303000   | 4303100   | -            | 4303050      |
| chr17        | 43044826  | 43045289  | -            | 43045057     |
| chr17        | 43046541  | 43046997  | -            | 43046769     |
| chr17        | 43115728  | 43115767  | -            | 43057094     |
| chr17        | 43093458  | 43093573  | -            | 43093515     |
+--------------+-----------+-----------+--------------+--------------+
>>> regions.annotate(gr)
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
| Chromosome   | Start     | End       | Strand       | polyA_site   | Feature         | gene_id            | gene_name    | annotated_site   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)      | (object)        | (object)           | (object)     | (int64)          |
|--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------|
| chr17        | 4303000   | 4303100   | -            | 4303050      | intergenic      | intergenic_0       | intergenic_0 | -1               |
| chr17        | 43044826  | 43045289  | -            | 43045057     | three_prime_utr | ENSG00000012048.23 | BRCA1        | 43044294         |
| chr17        | 43046541  | 43046997  | -            | 43046769     | intron          | ENSG00000012048.23 | BRCA1        | -1               |
| chr17        | 43093458  | 43093573  | -            | 43093515     | three_prime_utr | ENSG00000012048.23 | BRCA1        | 43091434         |
| chr17        | 43115728  | 43115767  | -            | 43057094     | exon            | ENSG00000012048.23 | BRCA1        | -1               |
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
- annotated_site(self, df)
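In the example, `annotated_site` is -1 for every cluster except those in `three_prime_utr`, which suggests the value is read from the 3' end of the overlapped annotated feature; on the `-` strand that is the feature's `Start`. A sketch of such a strand-aware lookup, assuming 0-based, half-open PyRanges coordinates; `polya_annotated_site` is a hypothetical helper, not lapa's `annotated_site`, and taking `End - 1` as the 3' end on the `+` strand is itself an assumption about the coordinate convention.

```python
MISSING = -1  # value shown in the example output for clusters without an annotated site


def polya_annotated_site(feature: str, strand: str, start: int, end: int,
                         annotated_feature: str = "three_prime_utr") -> int:
    """Strand-aware 3' end of an overlapped annotation feature (hypothetical).

    `start`/`end` are the GTF feature's 0-based, half-open coordinates,
    not the coordinates of the polyA cluster itself.
    """
    if feature != annotated_feature:
        return MISSING
    if strand == "-":
        return start  # the leftmost base is the 3' end on the minus strand
    if strand == "+":
        return end - 1  # assumption: last base of the half-open interval
    raise ValueError(f"unexpected strand: {strand!r}")
```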
- class lapa.genomic_regions.TssGenomicRegions(gtf_file)
Bases: _GenomicRegions
Annotate TSS sites based on the genomic features.
- Parameters
gtf_file – GTF annotation file to overlap the sites against.
Examples
Annotation of TSS sites in PyRanges format:
>>> regions = TssGenomicRegions('hg38.gtf')
>>> gr
+--------------+-----------+-----------+--------------+--------------+
| Chromosome   | Start     | End       | Strand       | polyA_site   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)      |
|--------------+-----------+-----------+--------------+--------------|
| chr17        | 4303000   | 4303100   | -            | 4303050      |
| chr17        | 43044826  | 43045289  | -            | 43045057     |
| chr17        | 43046541  | 43046997  | -            | 43046769     |
| chr17        | 43115728  | 43115767  | -            | 43057094     |
| chr17        | 43093458  | 43093573  | -            | 43093515     |
+--------------+-----------+-----------+--------------+--------------+
>>> regions.annotate(gr)
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
| Chromosome   | Start     | End       | Strand       | polyA_site   | Feature         | gene_id            | gene_name    | annotated_site   |
| (category)   | (int32)   | (int32)   | (category)   | (int64)      | (object)        | (object)           | (object)     | (int64)          |
|--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------|
| chr17        | 4303000   | 4303100   | -            | 4303050      | intergenic      | intergenic_0       | intergenic_0 | -1               |
| chr17        | 43044826  | 43045289  | -            | 43045057     | five_prime_utr  | ENSG00000012048.23 | BRCA1        | 43044294         |
| chr17        | 43046541  | 43046997  | -            | 43046769     | intron          | ENSG00000012048.23 | BRCA1        | -1               |
| chr17        | 43093458  | 43093573  | -            | 43093515     | five_prime_utr  | ENSG00000012048.23 | BRCA1        | 43091434         |
| chr17        | 43115728  | 43115767  | -            | 43057094     | exon            | ENSG00000012048.23 | BRCA1        | -1               |
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
- annotated_site(self, df)
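For TSS, the analogous site would be the 5' end of the overlapped `five_prime_utr` feature, so the strand logic flips: the 5' end is `Start` on the `+` strand and the last base on the `-` strand. A sketch under that assumption, again in 0-based, half-open coordinates; `tss_annotated_site` is a hypothetical helper, not lapa's `annotated_site`, and the `End - 1` choice on the `-` strand is an assumption.

```python
MISSING = -1  # value shown in the example output for clusters without an annotated site


def tss_annotated_site(feature: str, strand: str, start: int, end: int,
                       annotated_feature: str = "five_prime_utr") -> int:
    """Strand-aware 5' end of an overlapped annotation feature (hypothetical).

    `start`/`end` are the GTF feature's 0-based, half-open coordinates,
    not the coordinates of the TSS cluster itself.
    """
    if feature != annotated_feature:
        return MISSING
    if strand == "+":
        return start  # the leftmost base is the 5' end on the plus strand
    if strand == "-":
        return end - 1  # assumption: last base of the half-open interval
    raise ValueError(f"unexpected strand: {strand!r}")
```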