CLI arguments

lapa

CLI interface for lapa polyA cluster calling.

lapa [OPTIONS]

Options

--alignment <alignment>: Required Single or multiple bam file paths separated by commas. Alternatively, a CSV file with the columns sample, dataset, and path, where sample is the name of the sample, dataset is the group of samples that are replicates of each other, and path is the path of the bam file.
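As an illustration of the CSV form of --alignment, the following minimal sketch writes a table with the three columns named above. The sample names, dataset labels, and bam paths are hypothetical placeholders, and the helper samples_csv is not part of LAPA.

```python
import csv
import io

# Hypothetical samples: two replicates of one dataset plus a second dataset.
rows = [
    {"sample": "liver_rep1", "dataset": "liver", "path": "bam/liver_rep1.bam"},
    {"sample": "liver_rep2", "dataset": "liver", "path": "bam/liver_rep2.bam"},
    {"sample": "heart_rep1", "dataset": "heart", "path": "bam/heart_rep1.bam"},
]


def samples_csv(rows):
    """Return CSV text with the sample, dataset, and path columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["sample", "dataset", "path"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(samples_csv(rows), end="")
```

Writing the returned text to a file (for example samples.csv, a hypothetical name) gives a path that can be passed to --alignment.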

--fasta <fasta>: Required Genome reference (GENCODE or ENSEMBL fasta)

--annotation <annotation>: Required Standard genome annotation (GENCODE or ENSEMBL gtf). GENCODE gtf files do not contain annotations for five_prime_utr and three_prime_utr, so they need to be corrected with gencode_utr_fix (see https://github.com/MuhammedHasan/gencode_utr_fix.git).

--chrom_sizes <chrom_sizes>: Required Chromosome sizes file (can be generated with faidx fasta -i chromsizes > chrom_sizes).
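For readers without faidx at hand, a chromosome sizes table can also be sketched in plain Python. The functions below are hypothetical helpers, not part of LAPA, and they assume the conventional two-column, tab-separated layout (sequence name, length); this page does not state the exact format LAPA expects.

```python
import io


def chrom_sizes(fasta_lines):
    """Return (name, length) pairs for each record in FASTA-formatted lines."""
    sizes = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the header text up to the first whitespace.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return list(sizes.items())


def format_chrom_sizes(sizes):
    """Render (name, length) pairs as tab-separated lines (assumed layout)."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)


# Tiny in-memory FASTA used only for demonstration.
example = io.StringIO(">chr1 demo\nACGT\nAC\n>chr2\nGGG\n")
print(format_chrom_sizes(chrom_sizes(example)), end="")
```

For a real genome, pass an open file handle instead of the in-memory example.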

--output_dir <output_dir>: Required Output directory of LAPA. See lapa.readthedocs.io/en/latest/output.html for the details of the directory structure and file format.

--counting_method <counting_method>

Counting method, either end or tail. tail counting only counts reads that have a poly(A) tail of at least the length defined by the --min_tail_len parameter. end counting still detects tails if they exist but uses the end location of all reads in counting, regardless of tail length.

Options: end | tail

--min_tail_len <min_tail_len>: Minimum tail length for the tail counting strategy. This parameter is ignored in the end counting setting.

--min_percent_a <min_percent_a>: Minimum percentage of A bases in a soft-trimmed segment for the segment to be considered a tail. This parameter is ignored for end counting.
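The interplay of the counting method with the two tail filters can be sketched as follows. This is an illustrative assumption, not LAPA's implementation: the function names are hypothetical, the A content is expressed as a fraction between 0 and 1 (this page does not state the unit LAPA uses), and the thresholds in the example are arbitrary rather than LAPA defaults.

```python
def is_polya_tail(segment, min_tail_len, min_fraction_a):
    """Hypothetical check: is a soft-trimmed segment long and A-rich enough?"""
    if not segment or len(segment) < min_tail_len:
        return False
    fraction_a = segment.upper().count("A") / len(segment)
    return fraction_a >= min_fraction_a


def read_is_counted(counting_method, segment, min_tail_len, min_fraction_a):
    """Decide whether a read contributes to counting under each method."""
    if counting_method == "end":
        # End counting uses every read end; the tail filters are ignored.
        return True
    if counting_method == "tail":
        return is_polya_tail(segment, min_tail_len, min_fraction_a)
    raise ValueError("counting_method must be 'end' or 'tail'")


# Arbitrary example thresholds: tail of at least 10 bp with at least 90% A.
print(read_is_counted("tail", "AAAAAAAAAT", 10, 0.9))
print(read_is_counted("tail", "AAAA", 10, 0.9))
print(read_is_counted("end", "", 10, 0.9))
```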

--mapq <mapq>: Minimum read mapping quality (MAPQ) required for tes calling.

--cluster_extent_cutoff <cluster_extent_cutoff>: Minimum number of reads needed to initialize a cluster. A cluster is terminated if the number of reads stays below this cutoff for a certain number of base pairs.

--cluster_ratio_cutoff <cluster_ratio_cutoff>: Percentage of coverage change needed to initialize a cluster. At least x% of the reads covering a bp need to end at that position to initialize the cluster. This filter implies that if <x% of the reads covering a given position end there, they could have stopped by chance and are filtered out as noise.

--cluster_window <cluster_window>: Patience threshold for terminating a cluster. Once the number of reads falls below cluster_extent_cutoff, the cluster is terminated if read counts stay below the cutoff for x bp; otherwise the cluster is extended.
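To make the three clustering parameters concrete, here is a toy one-dimensional sketch of how they could interact. It is a simplified reading of the descriptions above, not LAPA's actual algorithm: the function call_clusters is hypothetical, the ratio cutoff is treated as a fraction between 0 and 1, and strand and chromosome handling are omitted.

```python
def call_clusters(end_counts, coverage, extent_cutoff, ratio_cutoff, window):
    """Toy clustering over per-position read-end counts.

    end_counts[i] is the number of reads ending at position i and
    coverage[i] is the number of reads covering position i.
    Returns half-open (start, end) intervals.
    """
    clusters = []
    start = None     # start of the currently open cluster, if any
    last_hit = None  # last position whose count reached extent_cutoff
    for pos, (count, cov) in enumerate(zip(end_counts, coverage)):
        above = count >= extent_cutoff
        if start is None:
            # Initialization needs both the read count and the ratio filter.
            if above and cov > 0 and count / cov >= ratio_cutoff:
                start = last_hit = pos
        elif above:
            # Enough reads end here: extend the open cluster.
            last_hit = pos
        elif pos - last_hit >= window:
            # Patience exhausted: close at the last supported position.
            clusters.append((start, last_hit + 1))
            start = None
    if start is not None:
        clusters.append((start, last_hit + 1))
    return clusters


end_counts = [0, 5, 6, 0, 0, 7, 0, 0, 0, 0, 1]
coverage = [10] * len(end_counts)
print(call_clusters(end_counts, coverage, extent_cutoff=3, ratio_cutoff=0.3, window=3))
print(call_clusters(end_counts, coverage, extent_cutoff=3, ratio_cutoff=0.3, window=2))
```

In this toy input, the larger window bridges the two-position gap and yields a single cluster, while the smaller window splits the signal into two clusters.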

--min_replication_rate <min_replication_rate>: Minimum replication rate to include a cluster in the replicated clusters. 0.95 is the recommended cutoff for experimental replication and 0.75 for biological replication.

--replication_rolling_size <replication_rolling_size>: Rolling window size used to calculate the replication rate.

--replication_num_sample <replication_num_sample>: Number of samples in which a region needs to be observed to be considered replicated.

--replication_min_count <replication_min_count>: Minimum count needed to recognize a region as expressed.

--non_replicates_read_threhold <non_replicates_read_threhold>: Minimum read count needed for samples without replication. If a sample has no replicates, this default cutoff is applied.

lapa_tss

CLI interface for lapa tss cluster calling.

lapa_tss [OPTIONS]

Options

--alignment <alignment>: Required Single or multiple bam file paths separated by commas. Alternatively, a CSV file with the columns sample, dataset, and path, where sample is the name of the sample, dataset is the group of samples that are replicates of each other, and path is the path of the bam file.

--fasta <fasta>: Required Genome reference (GENCODE or ENSEMBL fasta)

--annotation <annotation>: Required Standard genome annotation (GENCODE or ENSEMBL gtf). GENCODE gtf files do not contain annotations for five_prime_utr and three_prime_utr, so they need to be corrected with gencode_utr_fix (see https://github.com/MuhammedHasan/gencode_utr_fix.git).

--chrom_sizes <chrom_sizes>: Required Chromosome sizes file (can be generated with faidx fasta -i chromsizes > chrom_sizes).

--output_dir <output_dir>: Required Output directory of LAPA. See lapa.readthedocs.io/en/latest/output.html for the details of the directory structure and file format.

--mapq <mapq>: Minimum read mapping quality (MAPQ) required for tss calling.

--cluster_extent_cutoff <cluster_extent_cutoff>: Minimum number of reads needed to initialize a cluster. A cluster is terminated if the number of reads stays below this cutoff for a certain number of base pairs.

--cluster_ratio_cutoff <cluster_ratio_cutoff>: Percentage of coverage change needed to initialize a cluster. At least x% of the reads covering a bp need to end at that position to initialize the cluster. This filter implies that if <x% of the reads covering a given position end there, they could have stopped by chance and are ignored as noise.

--cluster_window <cluster_window>: Patience threshold for terminating a cluster. Once the number of reads falls below cluster_extent_cutoff, the cluster is terminated if read counts stay below the cutoff for x bp; otherwise the cluster is extended.

--min_replication_rate <min_replication_rate>: Minimum replication rate to include a cluster in the replicated clusters. 0.95 is the recommended cutoff for experimental replication and 0.75 for biological replication.

--replication_rolling_size <replication_rolling_size>: Rolling window size used to calculate the replication rate.

--replication_num_sample <replication_num_sample>: Number of samples in which a region needs to be observed to be considered replicated.

--replication_min_count <replication_min_count>: Minimum count needed to recognize a region as expressed.

--non_replicates_read_threhold <non_replicates_read_threhold>: Minimum read count needed for samples without replication. If a sample has no replicates, this default cutoff is applied.

lapa_link_tss_to_tes

CLI interface for detecting linking reads. Linking reads are reads that start in a tss cluster and end in a poly(A) cluster. Linking reads represent complete transcript isoforms.

lapa_link_tss_to_tes [OPTIONS]

Options

--alignment <alignment>: Required Path of the bam file. The start and end positions of each read in the file are overlapped against the tss and poly(A) clusters and annotated accordingly.

--lapa_dir <lapa_dir>: Required LAPA output directory generated previously with the lapa command.

--lapa_tss_dir <lapa_tss_dir>: Required LAPA output directory generated previously with the lapa_tss command.

--output <output>: Required Output path to the .csv file which contains the linking reads.

--mapq <mapq>: Minimum read mapping quality (MAPQ) required for linking.

--min_read_length <min_read_length>: Minimum read length required for linking.

--dataset <dataset>: Which dataset to use in linking. Valid options: all, raw_all, or dataset.
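The idea of a linking read, one that starts in a tss cluster and ends in a poly(A) cluster, can be sketched with plain interval checks. The helpers below are hypothetical and deliberately simplified: clusters are half-open (start, end) intervals on a single chromosome and strand, and the mapq and read length filters are left out.

```python
def overlapping_cluster(position, clusters):
    """Return the first half-open (start, end) cluster containing position."""
    for start, end in clusters:
        if start <= position < end:
            return (start, end)
    return None


def link_read(read_start, read_end, tss_clusters, polya_clusters):
    """Return the (tss, poly(A)) cluster pair for a linking read, else None."""
    tss = overlapping_cluster(read_start, tss_clusters)
    polya = overlapping_cluster(read_end, polya_clusters)
    if tss is None or polya is None:
        # The read misses a cluster at one end, so it is not a linking read.
        return None
    return tss, polya


# Hypothetical clusters and reads for illustration only.
tss_clusters = [(100, 120)]
polya_clusters = [(2000, 2020)]
print(link_read(105, 2010, tss_clusters, polya_clusters))
print(link_read(500, 2010, tss_clusters, polya_clusters))
```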

lapa_correct_talon

CLI interface for creating a GTF file with tss/poly(A) cluster support, based on the linking reads and the splice chains of TALON.

lapa_correct_talon [OPTIONS]

Options

--links <links>: Required Path to linking read file generated with lapa_link_tss_to_tes command

--read_annot <read_annot>: Required TALON read_annot file, which annotates read-to-transcript assignments.

--gtf_input <gtf_input>: Required Input gtf file to extract splice chains.

--gtf_output <gtf_output>: Required Output corrected gtf containing transcripts with tss/poly(A) end support.

--abundance_input <abundance_input>: Required Input abundance file of TALON which contains abundance of each transcript.

--abundance_output <abundance_output>: Required Updated abundance file, calculated based on the abundance of linking reads.

--keep_unsupported: Keep transcripts without tss and tes support in the original gtf. If true, transcripts created from non-linking (partial) reads in the original files are kept in the gtf and abundance outputs.
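The four commands feed into each other: the output directories of lapa and lapa_tss become --lapa_dir and --lapa_tss_dir, and the linking read file becomes --links. The sketch below only assembles and prints the command lines with their required options. All file and directory names are hypothetical placeholders; only the command and option names come from this page.

```python
import shlex

# Hypothetical input names shared by the two cluster-calling commands.
shared = [
    "--alignment", "samples.csv",
    "--fasta", "genome.fa",
    "--annotation", "annotation.utr_fixed.gtf",
    "--chrom_sizes", "chrom_sizes",
]

commands = [
    ["lapa", *shared, "--output_dir", "lapa_out"],
    ["lapa_tss", *shared, "--output_dir", "lapa_tss_out"],
    [
        "lapa_link_tss_to_tes",
        "--alignment", "sample.bam",
        "--lapa_dir", "lapa_out",          # output_dir of the lapa step
        "--lapa_tss_dir", "lapa_tss_out",  # output_dir of the lapa_tss step
        "--output", "links.csv",
    ],
    [
        "lapa_correct_talon",
        "--links", "links.csv",            # output of the linking step
        "--read_annot", "talon_read_annot.tsv",
        "--gtf_input", "talon.gtf",
        "--gtf_output", "talon.corrected.gtf",
        "--abundance_input", "talon_abundance.tsv",
        "--abundance_output", "talon_abundance.corrected.tsv",
    ],
]

for command in commands:
    print(shlex.join(command))
```

To execute the steps rather than print them, each list could be passed to subprocess.run(command, check=True), assuming the LAPA commands are installed and on the PATH.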