Available Commands¶
Below is a list of the available Chewbacca commands.
Error Correction¶
class preclean.Preclean_Command.Preclean_Command(args_)
Attempts to fix minor sequencing errors caused by pyrosequencing. By reducing errors prior to sequence assembly, a greater number of paired reads can be successfully assembled. Matching forward and reverse files should be identically named, except for a <forward>/<reverse> suffix that indicates the read orientation. The two suffix conventions below are supported. Choose ONE suffix style and stick to it! Mixed suffixes are not supported.
_forwards/_reverse and _R1/_R2
- Inputs:
- fastq file(s) with left reads.
- fastq file(s) with right reads.
- Outputs:
- <left reads file>_corrected.fastq file(s).
- <right reads file>_corrected.fastq file(s).
Example:
Assuming a forward reads file ‘Data_R1.fq’ and a reverse reads file ‘Data_R2.fq’,
./ Data_R1.fq Data_R2.fq
$ python chewbacca.py preclean -f Data_R1.fq -r Data_R2.fq -o rslt
rslt/ Data_R1_corrected.fq Data_R2_corrected.fq
default_program: alias of Preclean_Program_Bayeshammer
Assembling Sequences¶
class assemble.Assemble_Command.Assemble_Command(args_)
Assembles reads from two (forward and reverse) fastq files/directories. For a set of k forward read files, and k reverse read files, return k assembled files. Matching forward and reverse files should be identically named, except for a <forward>/<reverse> suffix that indicates the read orientation. The two suffix conventions below are supported. Choose ONE suffix style and stick to it! Mixed suffixes are not supported.
_forwards/_reverse and _R1/_R2
- Inputs:
- fastq file(s) with left reads
- fastq file(s) with right reads
- Outputs:
- fastq File(s) with assembled reads
- Notes:
- Choose ONE suffix style and stick to it! Mixed suffixes are not supported. For example, Sample_100_forwards.fq and Sample_100_reverse.fq will be assembled into Sample_100_assembled.fq. Similarly, Sample_100_R1.fq and Sample_100_R2.fq will be assembled into Sample_100_assembled.fq. However, Sample_100_forwards.fq and Sample_100_R2.fq are not guaranteed to be matched.
- You can provide as many pairs of files as you wish as long as they follow exactly one of the above naming conventions. If a ‘name’ parameter is provided, it will be used as a filename (not path) prefix for all assembled sequence files.
Example
Assuming a forward reads file ‘Data_R1.fq’ and a reverse reads file ‘Data_R2.fq’,
./ Data_R1.fq Data_R2.fq
$ python chewbacca.py assemble -n BALI -f Data_R1.fq -r Data_R2.fq -o rslt
rslt/ BALI_DATA.assembled.fq
default_program: alias of Assemble_Program_Pear
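The suffix-matching rule described for the Preclean and Assemble commands can be illustrated with a short sketch. This is not Chewbacca's own code: the helper names `_by_sample` and `pair_reads` are hypothetical, and the sketch assumes every input file must match the same suffix style.

```python
import re

# The two suffix conventions named in the documentation (forward, reverse).
SUFFIX_STYLES = [("forwards", "reverse"), ("R1", "R2")]


def _by_sample(filenames, suffix):
    """Map sample name -> filename for files ending in _<suffix>.<ext>."""
    pattern = re.compile(r"^(.+)_" + re.escape(suffix) + r"(\.[^.]+)$")
    found = {}
    for name in filenames:
        match = pattern.match(name)
        if match:
            found[match.group(1)] = name
    return found


def pair_reads(filenames):
    """Return (forward, reverse) pairs, using exactly one suffix style."""
    for fwd_suffix, rev_suffix in SUFFIX_STYLES:
        forwards = _by_sample(filenames, fwd_suffix)
        reverses = _by_sample(filenames, rev_suffix)
        # Every file must belong to this style, and every sample needs both reads.
        complete = len(forwards) + len(reverses) == len(filenames)
        if forwards and complete and forwards.keys() == reverses.keys():
            return [(forwards[s], reverses[s]) for s in sorted(forwards)]
    raise ValueError("files do not follow a single supported suffix style")


if __name__ == "__main__":
    print(pair_reads(["Data_R2.fq", "Data_R1.fq"]))
```

Under these assumptions a mixed pair such as Sample_100_forwards.fq and Sample_100_R2.fq is rejected outright rather than matched.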
Demultiplexing by Barcode¶
Sequence Renaming¶
class rename.Rename_Command.Rename_Command(args_)
Renames sequences in a file with their filename and a serial ID#. Useful for simplifying complex naming systems into human-readable sequence names. To ensure the correct sample names are preserved, it is recommended that this command be run immediately after the Demux Command.
- Inputs:
- One or more fasta/fastq files to rename.
- Outputs:
- _renamed.<ext> file - A <fasta/fastq> file with the renamed sequences.
- .samples file - A Samples File.
- .mapping file - A Mapping file.
- Notes:
- In order for the .samples file to correctly list the sample name of the sequences in a file, this command should be run immediately after the Demux Command.
- The --clip parameter tells Chewbacca that the trailing _<file_ID#> (from the demultiplexing command) should not be considered part of the sample name. By default this is set to True, which should be fine. If you notice parts of your sample names getting clipped off in your .samples file, explicitly set this parameter to False.
- Each input file will have a corresponding .samples, .mapping, and _renamed file.
- The .samples file is needed by downstream Chewbacca processes (Building the OTU Table).
- The .mapping file is purely for user convenience and record-keeping.
Example:
SampleA_0.fasta: @M03292:26:000000000-AH6AG:1:1101:22127:1256 AAAA @M03292:26:000000000-AH6AG:1:1101:22127:1257 AAAT
$ python chewbacca.py rename -i SampleA_0.fasta -o rslt
rslt/SampleA_0_renamed.fasta: @SampleA_ID0 AAAA @SampleA_ID1 AAAT rslt_samples/SampleA_0_renamed.samples: SampleA_ID0 SampleA SampleA_ID1 SampleA rslt_aux/SampleA_0_renamed.mapping: M03292:26:000000000-AH6AG:1:1101:22127:1256 SampleA_ID0 M03292:26:000000000-AH6AG:1:1101:22127:1257 SampleA_ID1
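The renaming scheme in the example above can be sketched as follows. The functions `sample_name` and `rename_sequences` are hypothetical illustrations, not Chewbacca's implementation, and the sketch assumes that the clipped file ID is a trailing underscore followed by digits.

```python
import re


def sample_name(filename, clip=True):
    """Derive the sample name from a filename, optionally clipping _<file_ID#>."""
    stem = filename.rsplit(".", 1)[0]
    if clip:
        # Assumption: the file ID added by demultiplexing is "_" plus digits.
        stem = re.sub(r"_\d+$", "", stem)
    return stem


def rename_sequences(filename, old_names, clip=True):
    """Return (new names, .samples rows, .mapping rows) for one input file."""
    sample = sample_name(filename, clip)
    renamed, samples, mapping = [], [], []
    for serial, old in enumerate(old_names):
        new = f"{sample}_ID{serial}"
        renamed.append(new)
        samples.append((new, sample))   # new sequence name -> sample name
        mapping.append((old, new))      # original name -> new sequence name
    return renamed, samples, mapping


if __name__ == "__main__":
    names = ["M03292:26:000000000-AH6AG:1:1101:22127:1256",
             "M03292:26:000000000-AH6AG:1:1101:22127:1257"]
    print(rename_sequences("SampleA_0.fasta", names))
```

With clipping disabled, the sample name in this sketch would stay SampleA_0, which is the behaviour to look for if sample names appear truncated.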
Adapter Removal¶
class clean.Clean_Adapters_Command.Clean_Adapters_Command(args_)
Removes sequencing adapters (and preceding barcodes) from sequences in input file(s). Sequences should be in the following format:
<BARCODE><ADAPTER><SEQUENCE><RC_ADAPTER>.
Valid ADAPTER sequences, and their reverse-complements (RC_ADAPTER) should be defined separately in a pair of fasta-formatted files. Sequences passed to this command should have already been demultiplexed, as this process will remove the identifying barcode sequences.
- Inputs:
- One or more fasta/fastq files to clean.
- A single Adapters file.
- A single RC Adapters file.
- Outputs:
- <filename>_debarcoded.<ext> file(s) - <fasta/fastq> files, containing sequences with their leading adapters, trailing adapters, and barcodes removed.
- Notes:
- Be aware of the program-specific details around ‘N’ nucleotide characters.
Example:
Given Data_ID#1 with barcode=AGACGC:
./ Data.fasta: @Data_ID#1 AGACGCGGWACWGGWTGAACWGTWTAYCCYCCATCGATCGATCGTGRTTYTTYGGNCAYCCNGARGTNTA Data.adapters: >1 GGWACWGGWTGAACWGTWTAYCCYCC Data.adapters_RC: >first TGRTTYTTYGGNCAYCCNGARGTNTA
$ python chewbacca.py trim_adapters -i Data.fasta -o rslt -a Data.adapters -arc Data.adapters_RC
rslt/ Data_debarcoded.fastq: @Data_ID#1 ATCGATCGATCG
default_program: alias of Clean_Adapters_Program_Flexbar
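To make the <BARCODE><ADAPTER><SEQUENCE><RC_ADAPTER> layout concrete, here is a minimal sketch that expands IUPAC ambiguity codes into a regular expression and keeps only the region between the two adapters. It is an illustration only and does not reproduce how Flexbar matches adapters; `to_regex` and `strip_adapters` are hypothetical names, and the read is assumed to contain only A, C, G and T.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def to_regex(adapter):
    """Compile an adapter containing ambiguity codes into a regex."""
    return re.compile("".join(IUPAC[base] for base in adapter.upper()))


def strip_adapters(read, adapter, rc_adapter):
    """Return the sequence between ADAPTER and RC_ADAPTER, or None if not found."""
    lead = to_regex(adapter).search(read)
    if lead is None:
        return None
    # Look for the trailing adapter only after the leading adapter ends.
    trail = to_regex(rc_adapter).search(read, lead.end())
    if trail is None:
        return None
    # Everything before the leading adapter (the barcode) is dropped as well.
    return read[lead.end():trail.start()]


if __name__ == "__main__":
    adapter = "GGWACWGGWTGAACWGTWTAYCCYCC"
    rc_adapter = "TGRTTYTTYGGNCAYCCNGARGTNTA"
    concrete = str.maketrans({"W": "A", "Y": "C", "R": "A", "N": "A"})
    read = ("AGACGC" + adapter.translate(concrete)
            + "ATCGATCGATCG" + rc_adapter.translate(concrete))
    print(strip_adapters(read, adapter, rc_adapter))
```

Because the barcode sits upstream of the leading adapter, it is discarded together with the adapter, which is why demultiplexing has to happen first.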
Quality Cleaning¶
class clean.Clean_Quality_Command.Clean_Quality_Command(args_)
Removes regions of low quality from fastq-formatted reads. These regions are likely sources of error, and would be detrimental to other analytical processes. Input sequences to this command should have already been demultiplexed, and had their barcodes/adapters removed. Otherwise, the partial removal of these markers would leave behind invalid fragments that would be difficult to detect and likely cause errors downstream.
- Inputs:
- One or more fastq files to clean.
- Outputs:
- <filename>_cleaned.fastq file(s) - Fastq files, containing sequences with areas of low quality removed.
- Notes:
- Be aware of the program-specific details around ‘N’ nucleotide characters.
- Be aware of the program-specific defaults for minimum surviving sequence lengths.
Example:
./ Data.fasta: @Data_ID#1 GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCTTTACAG + !zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz%%%zzzz
The command below asks Chewbacca to trim away any section of length 3 NT in Data_ID#1 that has quality worse than 20, keeping the longer of the remaining ends. If the remaining sequence at the end of this process is shorter than 15 NT, discard the whole sequence (these values are chosen for illustrative purposes).
$ python chewbacca.py clean_seqs -i Data.fasta -o rslt -m 15 -w 3 -q 20
Note that the ‘TTT’ subsequence has been cut, because its average quality (5) is less than the threshold (20). After this cut, the longest surviving subsequence (the subsequence to the left of the cut) was kept, and the shorter subsequence (to the right of the cut) was discarded. Because the final sequence is longer than 15NT, it is kept and written to the output file.
rslt/ Data_cleaned.fastq: @Data_ID#1 GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTC + !zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
default_program: alias of Clean_Quality_Program_Trimmomatic
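The window-based trimming described in the example above can be sketched in a few lines. This follows the prose description (cut windows whose mean quality falls below the threshold, keep the longest surviving piece, discard short results) and is not Trimmomatic's actual algorithm. The function name `trim_low_quality` is hypothetical, and Phred+33 quality encoding is assumed.

```python
def trim_low_quality(seq, quality, window=3, min_quality=20, min_length=15):
    """Cut low-quality windows and keep the longest surviving segment.

    Returns (sequence, quality) or None if the survivor is too short.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(ch) - 33 for ch in quality]  # assumed Phred+33 encoding
    bad = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if sum(scores[start:start + window]) / window < min_quality:
            for i in range(start, start + window):
                bad[i] = True

    # Find the longest run of positions that were not cut.
    best_start, best_len, run_start = 0, 0, None
    for i, flag in enumerate(bad + [True]):  # sentinel closes the final run
        if not flag and run_start is None:
            run_start = i
        elif flag and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None

    if best_len < min_length:
        return None
    end = best_start + best_len
    return seq[best_start:end], quality[best_start:end]


if __name__ == "__main__":
    seq = "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCTTTACAG"
    qual = "!" + "z" * 52 + "%%%" + "z" * 4
    print(trim_low_quality(seq, qual, window=3, min_quality=20, min_length=15))
```

The three parameters correspond loosely to the -w, -q and -m options in the command above.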
File Conversion¶
class util.Convert_Fastq_Fasta_Command.Convert_Fastq_Fasta_Command(args_)
Converts a FastQ-formatted file to a Fasta-formatted file. Useful for reducing data size and preparing for fasta-only operations.
- Inputs:
- One or more fastq files to convert to fasta.
- Outputs:
- <filename>.fasta file(s) - Converted fasta files.
Example:
./ Data.fq: @Data_ID1 GATTTGGGG + !zzzzzzzzz @Data_ID2 GATTTGGGG + !zzzzzzzzz @Data_ID3 GATTTGGGG + !zzzzzzzzz
$ python chewbacca.py convert_fastq_to_fasta -i Data.fq -o rslt/
rslt/ Data.fasta: @Data_ID1 GATTTGGGG @Data_ID2 GATTTGGGG @Data_ID3 GATTTGGGG
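The fastq-to-fasta conversion this command performs amounts to dropping the separator and quality lines of each record. The sketch below is a hypothetical stand-alone helper, `fastq_to_fasta`, not Chewbacca's code. It assumes simple four-line fastq records and writes conventional ">" fasta headers.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line fastq records to fasta text."""
    lines = [line.strip() for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per fastq record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"malformed fastq header: {header!r}")
        # Keep the name and sequence; the '+' line and quality line are dropped.
        out.append(">" + header[1:])
        out.append(sequence)
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    print(fastq_to_fasta("@Data_ID1\nGATTTGGGG\n+\n!zzzzzzzz\n"), end="")
```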
Dereplication¶
class dereplicate.Dereplicate_Command.Dereplicate_Command(args_)
Dereplicates a fasta file by grouping identical reads together under one representative sequence. The number of duplicate/replicant sequences each representative sequence represents is given by a ‘replication count’ at the end of the sequence name in the output fasta file. If a .groups file is provided, then previous replication counts will be taken into account (e.g. imagine a representative sequence X that represents 3 sequences. If X is found to be a replicant of another sequence Y, X will add 3 to the replication count of Y). Replication counts are denoted with a suffix of ‘_K’ on the sequence name, where K is the replication count for the group that sequence represents.
- Inputs:
- One or more fasta files to dereplicate.
- Optional: Groups File - A list of representative names and the names of their replicant sequences. You likely have one of these files if you’ve previously run a clustering or dereplication command.
- Outputs:
- _counts.fasta file - A fasta file with unique sequences and their replication counts.
- _derep.groups file - A list of representative names and the names of their replicant sequences.
- Notes:
- This command only dereplicates within each fasta file (not across all files). This means a sequence in one file will be unique within that file, but might exist in another file. To ensure sequences are unique across an entire dataset, merge all fasta files into one file, then dereplicate that fasta file.
- Each input file will generate a corresponding _counts.fasta file.
- If an input .groups file is not provided, then each input fasta file will generate a new groups file named <file_name>_derep.groups. If an input .groups file IS provided, then a single groups file named ‘dereplicated_updated.groups’ will be generated.
- The output .groups file is needed by downstream Chewbacca processes (Dereplication, Clustering, Building the OTU Table).
- The order of sequence names in the *_counts.fasta and .groups file is arbitrary.
Example:
./ Data.fasta >seq1 AAA >seq2 AAA >seq3 AAAG >seq4 AAAGT >seq7 AAAGT test.groups seq4 seq4 seq5 seq6
In the above example, test.groups indicates that seq4 is a sequence that has previously been identified as a representative (in some earlier round of clustering or dereplication). Note that seq4 is a representative for a ‘group’ of identical sequences and therefore listed within that group.
$ python chewbacca.py dereplicate_fasta -i Data.fasta -o rslt -g test.groups
rslt/Data_counts.fasta: >seq4_4 AAAGT >seq1_2 AAA >seq3_1 AAAG rslt_groups_files/*.groups: seq3 seq3 seq1 seq2 seq1 seq4 seq7 seq6 seq5 seq4
Notice that Data_counts.fasta lists the unique sequences from Data.fasta, and their replication counts. Also notice that seq4 had previous replication data (stored in the test.groups file).
default_program: alias of Dereplicate_Program_Vsearch
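The replication-count bookkeeping described above can be illustrated with a small sketch. It is not how vsearch works internally: the function `dereplicate` is hypothetical, and it assumes the first sequence seen becomes the representative of its group.

```python
def dereplicate(records, prior_groups=None):
    """Group identical sequences and carry forward earlier replication data.

    records:      list of (name, sequence) tuples
    prior_groups: {representative: [member names]} from an earlier .groups file
    Returns (counted records, groups).
    """
    prior_groups = prior_groups or {}
    rep_for_seq = {}   # sequence -> representative name
    groups = {}        # representative name -> all member names
    unique = []        # (representative name, sequence) in first-seen order
    for name, seq in records:
        # A name with prior replication data stands for its whole old group.
        members = prior_groups.get(name, [name])
        if seq not in rep_for_seq:
            rep_for_seq[seq] = name
            groups[name] = []
            unique.append((name, seq))
        groups[rep_for_seq[seq]].extend(members)
    counted = [(f"{name}_{len(groups[name])}", seq) for name, seq in unique]
    return counted, groups


if __name__ == "__main__":
    data = [("seq1", "AAA"), ("seq2", "AAA"), ("seq3", "AAAG"),
            ("seq4", "AAAGT"), ("seq7", "AAAGT")]
    print(dereplicate(data, {"seq4": ["seq4", "seq5", "seq6"]}))
```

Run on the example data, seq4 ends up with a count of 4: the three sequences it already represented plus seq7.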
File Splitting¶
class util.Partition_Command.Partition_Command(args_)
A utility command that partitions a fasta/fastq file into a set of files (of the same file format), with a user-specified (maximum) number of sequences per file. Allows users to partition a large file into segments, and perform discrete operations in run_parallel over those segments.
- Inputs:
- One or more fasta/fastq files to partition.
- C: An integer defining the maximum number of sequences per file
- Outputs:
- <filename>_part_<part_#>.<ext> file(s) - <fasta/fastq> files, with at most C sequences per file.
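The partitioning described for this command can be sketched as simple chunking. The helpers `partition` and `part_filenames` are hypothetical, and numbering parts from 0 is an assumption. The documentation only gives the <filename>_part_<part_#>.<ext> pattern.

```python
def partition(records, max_per_file):
    """Split a list of records into chunks of at most max_per_file records."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be a positive integer")
    return [records[i:i + max_per_file]
            for i in range(0, len(records), max_per_file)]


def part_filenames(filename, n_parts):
    """Build <filename>_part_<part_#>.<ext> names (part numbers assumed to start at 0)."""
    stem, ext = filename.rsplit(".", 1)
    return [f"{stem}_part_{i}.{ext}" for i in range(n_parts)]


if __name__ == "__main__":
    parts = partition(["Data_ID1", "Data_ID2", "Data_ID3"], 2)
    for name, part in zip(part_filenames("Data.fq", len(parts)), parts):
        print(name, part)
```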
Example:
./ Data.fq: @Data_ID1 GATTTGGGG + !zzzzzzzzz @Data_ID2 GATTTGGGG + !zzzzzzzzz @Data_ID3 GATTTGGGG + !zzzzzzzzz
$ python chewbacca.py convert_fastq_to_fasta -i Data.fq -o rslt/
rslt/ Data.fasta: @Data_ID1 GATTTGGGG @Data_ID2 GATTTGGGG @Data_ID3 GATTTGGGG
File Merging¶
class util.Merge_Command.Merge_Command(args_)
Concatenates multiple files into a single file. Useful for combining the results of a run_parallel operation, or when preparing for cross-sample dereplication.
- Inputs:
- A set of files to merge.
- An <output_filename>.
- An <output_prefix>.
- Outputs:
- <output_filename>.<output_prefix> - A file consisting of all the input files concatenated together.
- Notes:
- The order of the content in the concatenated files is not guaranteed.
Example:
targets/ Data.fq: @Data_ID1 GATTTGGGG + !zzzzzzzzz Data2.fa: @Data_ID1 GATTTGGGG Blah.txt Hello World!
$ python chewbacca.py merge_files -i targets/ -o rslt/ -f txt -n Merged
rslt/ Merged.txt: Hello World! @Data_ID1 GATTTGGGG + !zzzzzzzzz @Data_ID1 GATTTGGGG
File Cleaning¶
class util.Ungap_Command.Ungap_Command(args_)
Removes target characters from a fasta/fastq file. Useful for removing gap characters from sequence alignments.
- Inputs:
- One or more fasta/fastq files to clean.
- A string of gap characters to remove.
- Outputs:
- *_cleaned.<ext> file - A <fasta/fastq> file with gap characters removed from its sequences.
Example:
Data.fasta: >seq1 AAAAA.A*A-A
$ python chewbacca.py ungap_fasta -i Data.fasta -o rslt -f fasta -g ".*-"
rslt/Data.fasta: >seq1 AAAAAAAA
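A minimal sketch of the gap-removal step, assuming fasta input only (fastq would also need its quality lines kept in sync). The function name `ungap` is hypothetical.

```python
def ungap(fasta_text, gap_chars):
    """Remove every character in gap_chars from sequence lines, leaving headers intact."""
    table = str.maketrans("", "", gap_chars)
    cleaned = []
    for line in fasta_text.splitlines():
        # Header lines start with '>' and are copied unchanged.
        cleaned.append(line if line.startswith(">") else line.translate(table))
    return "\n".join(cleaned)


if __name__ == "__main__":
    print(ungap(">seq1\nAAAAA.A*A-A", ".*-"))
```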
Deep Cleaning¶
class clean.Clean_Deep_Command.Clean_Deep_Command(args_)
Performs an intensive deep-cleaning of sequences to eliminate frameshifts, detect chimeras, and determine sequence orientation. Input files to this command should first be dereplicated. Doing so will reduce the total number of alignments required, and reduce computation time.
- Inputs:
- One or more fasta/fastq files to deep clean (nucleotide sequences).
- One reference fasta (nucleotide sequences).
- Outputs:
- *_AA - Amino Acid Alignment file, including reference sequences.
- *_log.csv - A log listing each input sequence, and deep cleaning results for each sequence.
- *_NT - Nucleotide Alignment file, including reference sequences.
- Notes:
- Sequences that do not meet quality cleaning standards are dropped.
- The output files contain reference sequences, and odd alignment characters. Both of these need to be removed by running the Clean_Deep_Repair Command.
Example:
Data.fasta BIOCODE.fa
$ python chewbacca.py macseAlign -i Data.fasta -o rslt -d BIOCODE.fa
rslt/Data_AA rslt/Data_NT rslt/Data_log.csv
default_program: alias of Clean_Deep_Program_Macse
Deep Cleaning Repair¶
class clean.Clean_Deep_Repair_Command.Clean_Deep_Repair_Command(args_)
Cleans aligned files by removing gap characters and reference sequences from the file. Sequences passed to this command should have previously been aligned.
- Inputs:
- *_AA - Amino Acid Alignment file, including reference sequences.
- *_log.csv - A log listing each input sequence, and deep cleaning results for each sequence.
- *_NT - Nucleotide Alignment file, including reference sequences.
- The original fasta files that were passed to the Clean_Deep Command.
- The nucleotide reference fasta that was passed to the Clean_Deep Command.
- Outputs:
- *_MERGED.fasta - A clean fasta file with all the surviving sequences from deep cleaning.
- Notes:
- A single *_MERGED.fasta is generated regardless of the number of input files.
Example:
BIOCODE.fa originalData/Data.fasta input/ Data_AA Data_NT Data_log.csv
$ python chewbacca.py -i input/ -o out/ -d BIOCODE.fa -s originalData/
out/ MACSE_OUT_MERGED.fasta
default_program: alias of Clean_Deep_Repair_Program_Macse
Sequence Clustering¶
class cluster.Cluster_Command.Cluster_Command(args_)
Clusters a set of fasta files. This command generates a fasta file of unique sequences (each representing a cluster) and a .groups file. This command also takes an optional .groups file containing replication data from previous commands. If a .groups file is supplied, only one output .groups file is generated (regardless of the number of inputs).
- Inputs:
- One or more fasta files to cluster.
- Optional: Groups File - A list of representative names and the names of their replicant sequences. You likely have one of these files if you’ve previously run a clustering or dereplication command.
- Outputs:
- *.fasta file - A fasta file with unique sequences and their replication counts.
- *.groups - A Groups File
- Notes:
- The input fasta file(s) should have been dereplicated before clustering.
- For a single experiment with multiple fasta files, it is best to merge all input fasta files, dereplicate them, then cluster the single merged and dereplicated fasta file. This provides the best OTU groupings.
Example:
./ Data.fasta: >seq1_3 AAAAAAAAAA >seq2_1 ATAAAAAAAA >seq3_1 TTTTTTTTTT >seq4_1 TTTTTTATTT >seq5_1 TTTTTTATCT Data.groups: seq1 seq6 seq1 seq7
$ python chewbacca.py cluster_seqs -i Data.fasta -o rslt -g Data.groups
rslt/ Data_clustered_seeds.fasta: >seq1_4 AAAAAAAAAA >seq3_3 TTTTTTTTTT rslt_groups_files/ postcluster_updated.groups: seq3 seq3 seq5 seq4 seq1 seq2 seq1 seq7 seq6
OTU Table Construction¶
class otu.Build_OTU_Table_Command.Build_OTU_Table_Command(args_)
Builds an OTU table using a .groups, .samples, and .barcodes file. The OTU table shows OTU (group) abundance by sample.
- Inputs:
- One or more Samples Files.
- One or more Barcodes Files.
- One or more Groups Files.
- Outputs:
- matrix.txt - A tab-delimited table mapping OTUs (groups) to their abundance in each sample.
- Notes:
- A sequence name may not appear in more than one groups file (or on more than one line within a groups file, for that matter!).
Example:
./ test.barcodes Sample1 aaaaaa Sample2 aaaaat Sample3 aaaaac Sample4 aaaaag test.groups seq3 seq3 seq5 seq4 seq1 seq2 seq1 seq7 seq6 test.samples seq1 Sample1 seq2 Sample1 seq3 Sample1 seq4 Sample2 seq5 Sample2 seq6 Sample2 seq7 Sample3
$ python chewbacca.py build_matrix -b test.barcodes -g test.groups -s test.samples -o rslt/
rslt/ matrix.txt OTU Sample1 Sample2 Sample3 Sample4 seq3 1 2 0 0 seq1 2 1 1 0
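The counting behind the OTU table in the example above can be sketched as follows. The functions `build_otu_table` and `format_table` are hypothetical, and the inputs are assumed to be already parsed: groups as {representative: members}, samples as {sequence: sample}, and the sample names taken from the barcodes file.

```python
def build_otu_table(groups, samples, sample_names):
    """Count, for each OTU (group), how many member sequences come from each sample."""
    table = {}
    for otu, members in groups.items():
        row = dict.fromkeys(sample_names, 0)
        for member in members:
            row[samples[member]] += 1
        table[otu] = row
    return table


def format_table(table, sample_names):
    """Render the table as tab-delimited text with an 'OTU' header column."""
    lines = ["\t".join(["OTU"] + list(sample_names))]
    for otu, row in table.items():
        lines.append("\t".join([otu] + [str(row[s]) for s in sample_names]))
    return "\n".join(lines)


if __name__ == "__main__":
    sample_names = ["Sample1", "Sample2", "Sample3", "Sample4"]
    groups = {"seq3": ["seq3", "seq5", "seq4"],
              "seq1": ["seq2", "seq1", "seq7", "seq6"]}
    samples = {"seq1": "Sample1", "seq2": "Sample1", "seq3": "Sample1",
               "seq4": "Sample2", "seq5": "Sample2", "seq6": "Sample2",
               "seq7": "Sample3"}
    print(format_table(build_otu_table(groups, samples, sample_names), sample_names))
```

Samples that appear in the barcodes file but contribute no sequences (Sample4 here) still get a column of zeros.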
OTU Identification¶
class otu.Query_OTU_DB_Command.Query_OTU_DB_Command(args_)
Aligns sequences in a fasta file against those in a reference database in order to determine OTU identity.
- Inputs:
- One or more fasta files containing sequences to identify.
- A curated fasta file of high quality sequences and known species.
- A database containing taxonomic identifiers for sequences in the curated fasta file.
- Outputs:
- A Tax file.
- Notes:
- The files COI.fasta and ncbi.db are included in the Chewbacca Docker distributions.
Example:
~/ARMS/refs/ COI.fasta # A precompiled fasta file of COI data from NCBI. >94483305 AGGACGGATCAGACGAAGAGGGGCGTTTGGTATTGGGTTATGGCAGGGGGTTTTATATTGATAATTGTTGTGATGAAATT GATGGCCCCTAAGATAGAGGAGACACCTGCTAGGTGTAAGGAGAAGATGGTTAGGTCTACGGAGGCTCCAGGGTGGGAGT ncbi.db # A precompiled database of (Taxa) for the entries in 'COI.fasta'. data/ Data.fasta: >seq1 GAATAGGTGTTGGTATAGAATGGGGTCTCCTCCTCCGGCGGGGTCGAAGAAGGTGGTGTTGAGGTTGCGGTCTGTTAGTAGTATAGTGATGCCAGCAG CTAGGACTGGGAGAGATAGGAGAAGTAGGACTGCTGTGATTAGGACGGATCAGACGAAGAGGGGCGTTTGGTATTGGGTTATGGCAGGGGGTTTTATA TTGATAATTGTTGTGAGGAAATTGATGGCCCCTAAGATAGAGGAGACACCTGCTAGGTGTAAGGAGAAGATGGTTAGGTCTACGGAGGCTCCAGGGTG GGAGTAGTTCCCTGCTAA
$ python chewbacca.py query_db -i Data.fasta -o out/ -r ~/ARMS/refs/COI.fasta -d ~/ARMS/refs/ncbi.db
rslt/ Data_result.out seq1 94483305 99.4 173 55.4 Chordata:Mammalia:Primates:Hominidae:Homo:Homo sapiens
class otu.Query_OTU_Fasta_Command.Query_OTU_Fasta_Command(args_)
Aligns sequences in a fasta file against those in a reference fasta in order to determine OTU identity.
- Inputs:
- One or more fasta files containing sequences to identify.
- A curated fasta file of high quality sequences and known species.
- A two-column, tab-delimited text file mapping sequence names in the curated fasta file to taxonomic identifiers.
- Outputs:
- A Tax file.
- Notes:
- The files ‘bold.fna’ and ‘seq_lin.mapping’ are included in the Chewbacca Docker distributions.
Example:
~/ARMS/data/ bold.fna # A precompiled fasta file of data from BOLD. >GBMAA1117-14 GGGCTTTTGCGGGTATGATAGGAACAGCATTTAGTATGCTTATTAGGTTAGAACTATCTTCCCCAGGGTCTATGTTAGGAGATGATCATTTATATAAT GTTATAGTAACAGCTCATGCATTTGTAATGATATTTTTTTTAGTTATGCCAGTAATGATTGGGGGTTTTGGTAATTGGTTAGTACCTTTATATATTGG TGCCCCGGATATGGCTTTTCCTAGATTAAATAATATTAGTTTTTGGTTATTACCTCCGGCGCTTACTTTATTATTAGGTTCGGCTTTTGTAGAACAAG GGGCTGGGACAGGTTGGACAGTTTATCCGCCTTTATTTAGTATTCAAACTCATTCTGGGGGGTCTGTGGATATGGTAATATTTAGTTTACATTTAGCT GGAATATCTTCTATATTAGGGGCTATGAATTTTATAACAACAATCTTTAATATGAGGTCTCCGGGAGTAACTATGGATAGAATGCCTTTATTTGTTTG ATCTGTTTTAGTAACTGCTTTTTTATTATTATTATCATTGCCAGTATTAGCTGGTGCCATAACAAGTCTTTTAACCGATCGAGATTTTAATACTACAT TT seq_lin.mapping # A precompiled two-column tab file of (Taxa) for the entries in 'bold.fna'. GBMAA1117-14 Animalia;Porifera;Demospongiae;Haplosclerida;Phloeodictyidae;;Calyx;Calyx podatypa ./ Data.fasta: >seq1 ACTATCAGGCATTCAAGCCCATTCAGGGGGAGCAGTAGATATGGCTATATTTAGTCTACATCTAGCTGGTGTATCCTCTATTTTAAGTTCTATAAACT TTATAACTACTATAATTAATATGAGGGTTCCTGGGATGAGTATGCATAGATTACCTCTATTCGTATGGTCTGTATTAGTTACTACAATATTATTGTTG TTATCTTTACCAGTATTAGCTGGTGGAATTACAATGTTATTGACAGATAGAAATTTTAATACAACATTCTTTGACCCTGCGGGAGGAGGAGATCCTAT TTTATTCCAGCACTTATTT
$ python chewbacca.py query_fasta -i Data.fasta -o rslt -r ~/ARMS/data/bold.fna -x ~/ARMS/data/seq_lin.mapping
rslt/ Data_result.out seq1 GBMAA1117-14 90.6 265 84.7 Animalia;Porifera;Demospongiae;Haplosclerida;Phloeodictyidae;;Calyx;Calyx podatypa
OTU Annotation¶
class otu.Annotate_OTU_Table_Command.Annotate_OTU_Table_Command(args_)
Annotates an OTU table with taxonomic names by replacing sequence names in the OTU table with their identified taxonomies.
- Inputs:
- An OTU table (for example, matrix.txt).
- One or more .tax annotation files.
- Outputs:
- An OTU Table with sequence names replaced by taxonomic names in the input .tax file.
- Notes:
- The input annotation file(s) should list only one identification per sequence name. If you find more than one taxonomic identity for a sequence, choose only one to include in the input .tax file(s).
Example:
./ matrix.txt OTU Sample1 Sample2 Sample3 Sample4 seq3 1 2 0 0 seq1 2 1 1 0 data.tax: seq1 94483305 99.4 173 55.4 Chordata:Mammalia:Primates:Hominidae:Homo:Homo sapiens
$ python chewbacca.py annotate_matrix -i matrix.txt -a data.tax -o rslt
rslt/ matrix.txt OTU Sample1 Sample2 Sample3 Sample4 seq3 1 2 0 0 Chordata:Mammalia:Primates:Hominidae:Homo:Homo sapiens 2 1 1 0
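The annotation step in the example above is a name substitution on the first column of the OTU table. The sketch below uses hypothetical helpers `read_tax` and `annotate_rows`, and assumes the .tax file is tab-delimited with the sequence name first and the taxonomic name last. The delimiter is a guess, since taxonomic names contain spaces.

```python
def read_tax(tax_text):
    """Parse tax lines into {sequence name: taxonomic name} (assumes tab-delimited)."""
    taxonomy = {}
    for line in tax_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            taxonomy[fields[0]] = fields[-1]
    return taxonomy


def annotate_rows(rows, taxonomy):
    """Replace the OTU name in each row with its taxonomy, when one is known."""
    return [[taxonomy.get(row[0], row[0])] + row[1:] for row in rows]


if __name__ == "__main__":
    tax = read_tax("seq1\t94483305\t99.4\t173\t55.4\t"
                   "Chordata:Mammalia:Primates:Hominidae:Homo:Homo sapiens")
    rows = [["seq3", "1", "2", "0", "0"], ["seq1", "2", "1", "1", "0"]]
    for row in annotate_rows(rows, tax):
        print("\t".join(row))
```

OTUs without an identification (seq3 here) keep their original sequence name.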