
Module Contents




PolyAGenomicRegions: Annotate poly(A) sites based on genomic features.


TssGenomicRegions: Annotate TSS sites based on genomic features.



Adaptor that integrates tqdm with logging.



class lapa.genomic_regions._GenomicRegions(gtf_file, feature_order, annotated_feature_site)
abstract annotated_site(self, df)
features(self, features: set = None) pyranges.PyRanges
annotate(self, gr, features=None)
intergenic_genes(self, df)

Assign gene_id and gene_name to intergenic clusters.
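The naming scheme can be illustrated with a minimal sketch. It is not LAPA's implementation: the helper name `label_intergenic`, the use of plain dicts instead of DataFrame rows, and the counter-based numbering are all assumptions, chosen to mirror the `intergenic_0` placeholder visible in the example output further down.

```python
def label_intergenic(records):
    """Fill gene_id and gene_name for records whose Feature is 'intergenic'.

    Hypothetical helper: each record is a plain dict, and the
    sequential numbering of placeholders is an assumption.
    """
    labelled = []
    counter = 0
    for record in records:
        record = dict(record)  # copy so the caller's data is not mutated
        if record["Feature"] == "intergenic":
            placeholder = f"intergenic_{counter}"
            record["gene_id"] = placeholder
            record["gene_name"] = placeholder
            counter += 1
        labelled.append(record)
    return labelled


clusters = [
    {"Start": 4303000, "End": 4303100, "Feature": "intergenic"},
    {"Start": 43044826, "End": 43045289, "Feature": "three_prime_utr",
     "gene_id": "ENSG00000012048.23", "gene_name": "BRCA1"},
]
labelled = label_intergenic(clusters)
```

Records that already carry a gene are passed through unchanged, so only clusters without an overlapping gene receive a placeholder.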

_agg_annotation_gene(self, df)
class lapa.genomic_regions.PolyAGenomicRegions(gtf_file)

Bases: _GenomicRegions

Annotate poly(A) sites based on genomic features.


gtf_file – GTF annotation file to overlap against.


Annotation of poly(A) sites in pyranges format:

>>> regions = PolyAGenomicRegions('hg38.gtf')
>>> gr
| Chromosome   |     Start |       End | Strand       |   polyA_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) |
| chr17        |   4303000 |   4303100 | -            |      4303050 |
| chr17        |  43044826 |  43045289 | -            |     43045057 |
| chr17        |  43046541 |  43046997 | -            |     43046769 |
| chr17        |  43115728 |  43115767 | -            |     43057094 |
| chr17        |  43093458 |  43093573 | -            |     43093515 |
>>> regions.annotate(gr)
| Chromosome   |     Start |       End | Strand       |   polyA_site | Feature         | gene_id            | gene_name    |   annotated_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) | (object)        | (object)           | (object)     |          (int64) |
| chr17        |   4303000 |   4303100 | -            |      4303050 | intergenic      | intergenic_0       | intergenic_0 |               -1 |
| chr17        |  43044826 |  43045289 | -            |     43045057 | three_prime_utr | ENSG00000012048.23 | BRCA1        |         43044294 |
| chr17        |  43046541 |  43046997 | -            |     43046769 | intron          | ENSG00000012048.23 | BRCA1        |               -1 |
| chr17        |  43093458 |  43093573 | -            |     43093515 | three_prime_utr | ENSG00000012048.23 | BRCA1        |         43091434 |
| chr17        |  43115728 |  43115767 | -            |     43057094 | exon            | ENSG00000012048.23 | BRCA1        |               -1 |
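The base class takes a `feature_order` argument, which suggests that a cluster overlapping several feature types is reported under a single one of them. The sketch below shows one way such a priority rule could work. The order in `FEATURE_ORDER`, the function name `pick_feature`, and the fallback to `intergenic` are assumptions for illustration only and may differ from what LAPA does.

```python
# Assumed priority, highest first; LAPA's actual order may differ.
FEATURE_ORDER = ["three_prime_utr", "exon", "intron", "five_prime_utr"]


def pick_feature(overlapping, feature_order=FEATURE_ORDER):
    """Return the highest-priority feature among those a cluster overlaps.

    `overlapping` is a set of feature names. When none of the ordered
    features is present, the cluster is treated as intergenic.
    """
    for feature in feature_order:
        if feature in overlapping:
            return feature
    return "intergenic"


picked = pick_feature({"exon", "three_prime_utr"})
no_overlap = pick_feature(set())
```

Under this assumed order, a cluster that overlaps both an exon and a 3' UTR is labelled `three_prime_utr`, and a cluster with no overlap is labelled `intergenic`.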
annotated_site(self, df)
class lapa.genomic_regions.TssGenomicRegions(gtf_file)

Bases: _GenomicRegions

Annotate TSS sites based on genomic features.


gtf_file – GTF annotation file to overlap against.


Annotation of TSS sites in pyranges format:

>>> regions = TssGenomicRegions('hg38.gtf')
>>> gr
| Chromosome   |     Start |       End | Strand       |   polyA_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) |
| chr17        |   4303000 |   4303100 | -            |      4303050 |
| chr17        |  43044826 |  43045289 | -            |     43045057 |
| chr17        |  43046541 |  43046997 | -            |     43046769 |
| chr17        |  43115728 |  43115767 | -            |     43057094 |
| chr17        |  43093458 |  43093573 | -            |     43093515 |
>>> regions.annotate(gr)
| Chromosome   |     Start |       End | Strand       |   polyA_site | Feature         | gene_id            | gene_name    |   annotated_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) | (object)        | (object)           | (object)     |          (int64) |
| chr17        |   4303000 |   4303100 | -            |      4303050 | intergenic      | intergenic_0       | intergenic_0 |               -1 |
| chr17        |  43044826 |  43045289 | -            |     43045057 | five_prime_utr  | ENSG00000012048.23 | BRCA1        |         43044294 |
| chr17        |  43046541 |  43046997 | -            |     43046769 | intron          | ENSG00000012048.23 | BRCA1        |               -1 |
| chr17        |  43093458 |  43093573 | -            |     43093515 | five_prime_utr  | ENSG00000012048.23 | BRCA1        |         43091434 |
| chr17        |  43115728 |  43115767 | -            |     43057094 | exon            | ENSG00000012048.23 | BRCA1        |               -1 |
annotated_site(self, df)