lapa.genomic_regions

Module Contents

Classes

_GenomicRegions

PolyAGenomicRegions

Annotate poly(A) sites based on genomic features.

TssGenomicRegions

Annotate transcription start sites (TSS) based on genomic features.

Functions

_tqdm_pandas_gr()

Adapter that integrates tqdm progress bars with logging.

lapa.genomic_regions._tqdm_pandas_gr()

Adapter that integrates tqdm progress bars with logging.
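The source does not show how this adapter is implemented. One common way to route progress-bar output into logging is a small file-like object that forwards writes to a logger. The sketch below illustrates that general pattern only; `LoggerWriter` is a hypothetical name and is not part of lapa.

```python
import io
import logging


class LoggerWriter(io.StringIO):
    """File-like object that forwards progress text to a logger.

    Hypothetical helper for illustration; not lapa's implementation.
    """

    def __init__(self, logger, level=logging.INFO):
        super().__init__()
        self.logger = logger
        self.level = level
        self._last = ''

    def write(self, text):
        # Progress bars redraw with carriage returns; keep only visible text.
        stripped = text.strip('\r\n\t ')
        if stripped:
            self._last = stripped
        return len(text)

    def flush(self):
        # Emit the most recent progress line as a single log record.
        if self._last:
            self.logger.log(self.level, self._last)
            self._last = ''
```

Because tqdm writes through the `write` and `flush` methods of the object passed as its `file` argument, an instance of such a class could be supplied there, for example `tqdm(items, file=LoggerWriter(logging.getLogger(__name__)))`.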

class lapa.genomic_regions._GenomicRegions(gtf_file, feature_order, annotated_feature_site)
abstract annotated_site(self, df)
features(self, features: set = None) → pyranges.PyRanges
annotate(self, gr, features=None)
intergenic_genes(self, df)

Assign gene_id and gene_name to intergenic clusters.
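In the class examples, an intergenic row receives the placeholder `intergenic_0` as both `gene_id` and `gene_name`. The sketch below shows one way such placeholders could be assigned. It assumes a simple per-row counter and plain dictionaries instead of a DataFrame; the actual numbering scheme used by lapa is not documented here, and `label_intergenic` is a hypothetical name.

```python
def label_intergenic(rows):
    """Give each intergenic row a placeholder gene_id and gene_name.

    Assumption: placeholders are numbered in row order, starting at 0.
    """
    labelled = []
    counter = 0
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if row.get('Feature') == 'intergenic':
            name = f'intergenic_{counter}'
            row['gene_id'] = name
            row['gene_name'] = name
            counter += 1
        labelled.append(row)
    return labelled
```

Rows that already overlap an annotated gene keep their existing `gene_id` and `gene_name`.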

_agg_annotation_gene(self, df)

class lapa.genomic_regions.PolyAGenomicRegions(gtf_file)

Bases: _GenomicRegions

Annotate poly(A) sites based on genomic features.

Parameters

gtf_file – GTF annotation file to overlap against.

Examples

Annotation of poly(A) sites in pyranges format:

>>> regions = PolyAGenomicRegions('hg38.gtf')
>>> gr
+--------------+-----------+-----------+--------------+--------------+
| Chromosome   |     Start |       End | Strand       |   polyA_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) |
|--------------+-----------+-----------+--------------+--------------|
| chr17        |   4303000 |   4303100 | -            |      4303050 |
| chr17        |  43044826 |  43045289 | -            |     43045057 |
| chr17        |  43046541 |  43046997 | -            |     43046769 |
| chr17        |  43115728 |  43115767 | -            |     43057094 |
| chr17        |  43093458 |  43093573 | -            |     43093515 |
+--------------+-----------+-----------+--------------+--------------+
>>> regions.annotate(gr)
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
| Chromosome   |     Start |       End | Strand       |   polyA_site | Feature         | gene_id            | gene_name    |   annotated_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) | (object)        | (object)           | (object)     |          (int64) |
|--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------|
| chr17        |   4303000 |   4303100 | -            |      4303050 | intergenic      | intergenic_0       | intergenic_0 |               -1 |
| chr17        |  43044826 |  43045289 | -            |     43045057 | three_prime_utr | ENSG00000012048.23 | BRCA1        |         43044294 |
| chr17        |  43046541 |  43046997 | -            |     43046769 | intron          | ENSG00000012048.23 | BRCA1        |               -1 |
| chr17        |  43093458 |  43093573 | -            |     43093515 | three_prime_utr | ENSG00000012048.23 | BRCA1        |         43091434 |
| chr17        |  43115728 |  43115767 | -            |     43057094 | exon            | ENSG00000012048.23 | BRCA1        |               -1 |
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+

annotated_site(self, df)

class lapa.genomic_regions.TssGenomicRegions(gtf_file)

Bases: _GenomicRegions

Annotate transcription start sites (TSS) based on genomic features.

Parameters

gtf_file – GTF annotation file to overlap against.

Examples

Annotation of TSS in pyranges format:

>>> regions = TssGenomicRegions('hg38.gtf')
>>> gr
+--------------+-----------+-----------+--------------+--------------+
| Chromosome   |     Start |       End | Strand       |   polyA_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) |
|--------------+-----------+-----------+--------------+--------------|
| chr17        |   4303000 |   4303100 | -            |      4303050 |
| chr17        |  43044826 |  43045289 | -            |     43045057 |
| chr17        |  43046541 |  43046997 | -            |     43046769 |
| chr17        |  43115728 |  43115767 | -            |     43057094 |
| chr17        |  43093458 |  43093573 | -            |     43093515 |
+--------------+-----------+-----------+--------------+--------------+
>>> regions.annotate(gr)
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
| Chromosome   |     Start |       End | Strand       |   polyA_site | Feature         | gene_id            | gene_name    |   annotated_site |
| (category)   |   (int32) |   (int32) | (category)   |      (int64) | (object)        | (object)           | (object)     |          (int64) |
|--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------|
| chr17        |   4303000 |   4303100 | -            |      4303050 | intergenic      | intergenic_0       | intergenic_0 |               -1 |
| chr17        |  43044826 |  43045289 | -            |     43045057 | five_prime_utr  | ENSG00000012048.23 | BRCA1        |         43044294 |
| chr17        |  43046541 |  43046997 | -            |     43046769 | intron          | ENSG00000012048.23 | BRCA1        |               -1 |
| chr17        |  43093458 |  43093573 | -            |     43093515 | five_prime_utr  | ENSG00000012048.23 | BRCA1        |         43091434 |
| chr17        |  43115728 |  43115767 | -            |     43057094 | exon            | ENSG00000012048.23 | BRCA1        |               -1 |
+--------------+-----------+-----------+--------------+--------------+-----------------+--------------------+--------------+------------------+
annotated_site(self, df)
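
The base class declares `annotated_site` as abstract and each subclass overrides it. In the example outputs, only rows whose Feature matches the UTR of interest carry an `annotated_site` other than -1. The sketch below illustrates that pattern with Python's `abc` module. All class names and the `feature_start` / `feature_end` fields are hypothetical, and the strand rule is an assumption for illustration, not lapa's code.

```python
from abc import ABC, abstractmethod


class BaseRegions(ABC):
    """Hypothetical base class with one abstract hook."""

    def __init__(self, site_feature):
        # Feature type whose boundary counts as the annotated site.
        self.site_feature = site_feature

    @abstractmethod
    def annotated_site(self, row):
        """Return the reference coordinate for one matching row."""

    def annotate(self, rows):
        # Shared logic lives here; rows of other feature types get -1.
        out = []
        for row in rows:
            if row['Feature'] == self.site_feature:
                site = self.annotated_site(row)
            else:
                site = -1
            out.append(dict(row, annotated_site=site))
        return out


class ThreePrimeRegions(BaseRegions):
    def __init__(self):
        super().__init__('three_prime_utr')

    def annotated_site(self, row):
        # Assumption: the 3' end is End on '+' and Start on '-'.
        return row['feature_end'] if row['Strand'] == '+' else row['feature_start']


class FivePrimeRegions(BaseRegions):
    def __init__(self):
        super().__init__('five_prime_utr')

    def annotated_site(self, row):
        # Assumption: the 5' end is Start on '+' and End on '-'.
        return row['feature_start'] if row['Strand'] == '+' else row['feature_end']
```

Instantiating `BaseRegions` directly raises `TypeError`, so every concrete subclass has to define its own notion of an annotated site.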