Documentation:
Backwards Incompatible Changes:
locus: makeChangeoClone. In groupGenes, locus was previously required only for single-cell data; it is now also required for bulk data.
General:
ExampleTrees to use the igraph 1.5.0 format. See https://r.igraph.org/news/index.html#igraph-150 for details.
collapseDuplicates.
Diversity:
plotDiversityCurve and plotAbundanceCurve where limits were not being applied correctly to zoom in the plots.
Gene:
groupGenes where TCR chains were not being considered when detecting heavy chain sequences prior to subsetting.
General:
ape::read.fastq.
General:
junctionAlignment, which counts the number of nucleotides in the reference germline not present in the alignment, and the number of V and J nucleotides in the CDR3.
Gene Usage:
getFamily where temporary designation gene names were not being correctly subset to the cluster (family) level.
Lineage:
runPhylip which was causing buildPhylipLineage to fail when run on Windows.
General:
readFastqDb, which reads a repertoire's .fastq file and imports the sequencing quality scores for sequence_alignment. Added maskPositionsByQuality, which masks positions that have a sequencing quality score lower than the specified threshold. The convenience function getPositionQuality will create a data.frame with quality scores per position.
dplyr dependency to v1.0.
padSeqEnds, the argument mod3=TRUE has been added so that sequences are padded to a length that is a multiple of 3.
translateDNA where NA values weren't being translated properly.
Amino Acid Analysis:
aminoAcidProperties, which will now default to nt=TRUE.
Diversity:
countClones (remove_na) that will remove all rows with NA values in the clone column if TRUE (default) and issue a warning with how many were removed. If FALSE, those rows will be kept instead.
Gene Usage:
getLocus to extract the locus information from the segment call.
getChain to define the chain from the segment or locus call.
countGenes to give a warning instead of an error so as not to disrupt running workflows.
getSegment where filtering of non-localized genes was not being applied when called from getFamily, because the "NL" part of the name was removed before the filtering step.
getAllele, getGene, getFamily and getLocus, to parse constant region gene names correctly.
getSegment to be able to parse constant region gene names correctly and not remove the "D" from "IGHD" when strip_d=TRUE.
Lineage:
branch_length argument to buildPhylipLineage, and augmented graphToPhylo and phyloToGraph to track intermediate sequences in nodes for phylo objects.
countGenes (remove_na) that will remove all rows with NA values in the gene column if TRUE (default) and issue a warning with how many were removed. If FALSE, those rows will be kept instead.
Diversity:
plotDiversityTest that caused all values of q to appear on the plot rather than just the specified one.
Gene Usage:
groupGenes where the v_call j_call column for J gene grouping.
groupGenes.
only_igh argument of groupGenes to only_heavy.
Backwards Incompatible Changes:
V_CALL (Change-O) as the default to identify the field that stored the V gene calls, they now use v_call (AIRR). This means that scripts that relied on default values (previously, v_call="V_CALL") will now fail if calls to the functions are not updated to reflect the correct value for the data. If data are in the Change-O format, the current default value v_call="v_call" will fail to identify the column with the V gene calls, because the column v_call doesn't exist. In this case, v_call="V_CALL" needs to be specified in the function call.
ExampleDb converted to the AIRR Rearrangement standard and examples updated accordingly. The legacy Change-O version is available as ExampleDbChangeo.
GRAVY to gravy);
countGenes, countClones (e.g., SEQ_COUNT to seq_count)
estimateAbundance (e.g., RANK to rank)
groupGenes (e.g., VJ_GROUP to vj_group)
collapseDuplicates and makeChangeoClone (e.g., SEQUENCE_ID to sequence_id, COLLAPSE_COUNT to collapse_count)
summarizeTrees, getPathLengths, getMRCA, tableEdges, testEdges) also return columns in lower case (e.g., parent, child, outdegree, steps, annotation, pvalue)
IG_COLOR names converted to official C region identifiers (IGHA, IGHD, IGHE, IGHG, IGHM, IGHK, IGHL).
General:
baseTheme look is now consistent across sizing options.
cpuCount will now return 1 if the core count cannot be determined.
padSeqEnds wherein the pad_char argument was being ignored.
Diversity:
estimateAbundance slot clone_by now contains the name of the column with the clonal group identifier, as specified in the function call. For example, if the function was called with clone="clone_id", then the clone_by slot will be clone_id.
Lineage:
buildPhylipLineage arguments vcall, jcall and dnapars_exec to v_call, j_call and phylip_exec, respectively.
Deprecated:
rarefyDiversity is deprecated in favor of alphaDiversity, which includes the same functionality.
testDiversity is deprecated. The test calculations have been added to the normal output of alphaDiversity.
General:
ape and tibble dependencies.
Lineage:
readIgphyml to read in IgPhyML output and combineIgphyml to combine parameter estimates across samples.
graphToPhylo and phyloToGraph to allow conversion between graph and phylo formats.
Diversity:
estimateAbundance where setting the clone column to a non-default value produced an error.
estimateAbundance through the min_n, max_n, and uniform arguments.
estimateAbundance. alphaDiversity will call estimateAbundance for bootstrapping if not provided an existing AbundanceCurve object.
DiversityCurve and AbundanceCurve objects to accommodate the new diversity methods.
Gene Usage:
groupGenes now supports grouping by V gene, J gene, and junction length (junc_len), in addition to grouping by V gene and J gene without junction length. Also added support for single-cell input data with the addition of new arguments cell_id, locus, and only_igh.
General:
nonsquareDist function to calculate the non-square distance matrix of sequences.
progressBar, baseTheme, checkColumns and cpuCount.
Diversity:
estimateAbundance and plotAbundanceCurve will now allow group=NULL to be specified to perform abundance calculations on ungrouped data.
Gene Usage:
fill argument to countGenes. When set to TRUE, this adds zeroes to the group pairs that do not exist in the data.
groupGenes to group sequences sharing the same V and J gene.
Topology Analysis:
indirect=TRUE.
makeChangeoClone will now issue an error and terminate, instead of continuing with a warning, when not all sequences are the same length.
General:
IUPAC_AA wherein X was not properly matching against Q.
getAAMatrix to treat * (stop codon) as a mismatch.
General:
readChangeoDb.
padSeqEnds function which pads sequences with Ns to make them equal in length.
collapseDuplicates.
Diversity:
uniform argument to rarefyDiversity allowing users to toggle uniform vs non-uniform sampling.
plotAbundance to plotAbundanceCurve.
estimateAbundance return object from a data.frame to a new AbundanceCurve custom class.
plot call for AbundanceCurve to plotAbundanceCurve.
annotate argument from plotDiversityCurve to plotAbundanceCurve.
score argument to plotDiversityCurve to toggle between plotting diversity or evenness.
plotDiversityTest to generate a simple plot of DiversityTest object summaries.
Gene Usage:
omit_nl argument to getAllele, getGene and getFamily to allow optional filtering of non-localized (NL) genes.
Lineage:
makeChangeoClone preventing it from interpreting the id argument correctly.
pad_end argument to makeChangeoClone to allow automatic padding of ends to make sequences the same length.
General:
dry argument to collapseDuplicates which will annotate duplicate sequences but not remove them when set to TRUE.
collapseDuplicates was returning one sequence if all sequences were considered ambiguous.
Lineage:
makeChangeoClone and buildPhylipLineage for purposes of (optionally) treating indels as mismatches.
buildPhylipLineage when PHYLIP doesn't generate inferred sequences and has only one block.
General:
readChangeoDb causing the select argument to do nothing.
Gene Usage:
countGenes when the clone argument is specified to CLONE_COUNT/CLONE_FREQ.
General:
readChangeoDb and writeChangeoDb.
General:
seqDist() wherein distance was not properly calculated in some sequences containing gap characters.
getAAMatrix() return matrix.
General:
readChangeoDb() to wrap data.table::fread() instead of utils::read.table() if the input file is not compressed.
testSeqEqual(), getSeqDistance() and getSeqMatrix() to C++ to improve performance of collapseDuplicates() and other dependent functions.
testSeqEqual(), getSeqDistance() and getSeqMatrix() to seqEqual(), seqDist() and pairwiseDist(), respectively.
pairwiseEqual() which creates a logical sequence distance matrix; TRUE if sequences are identical, FALSE if not, excluding Ns and gaps.
X in translateDNA().
collapseDuplicates() wherein the input data type sanity check would cause the vignette to fail to build under R 3.3.
ExampleDb.gz file with a larger, more clonal, ExampleDb data object.
ExampleTrees with a larger set of trees.
multiggplot() to gridPlot().
Amino Acid Analysis:
normalize=FALSE for charge calculations to be more consistent with previously published repertoire sequencing results.
Diversity Analysis:
progress argument to rarefyDiversity() and testDiversity() to enable the (previously default) progress bar.
estimateAbundance() where the function would fail if there was only a single input sequence per group.
data and summary slots of DiversityTest to uppercase for consistency with other tools.
plot to plotDiversityCurve for DiversityCurve objects.
Gene Usage:
sortGenes() function to sort V(D)J genes by name or locus position.
clone argument to countGenes() to allow restriction of gene abundance to one gene per clone.
Topology Analysis:
General:
base::nchar().
General:
Amino Acid Analysis:
aliphatic() function were not being passed through the ellipsis argument of aminoAcidProperties().
aminoAcidProperties().
AA_TRANS to ABBREV_AA.
Diversity:
rarefyDiversity() output.
Lineage:
ExampleTrees data with example output from buildPhylipLineage().
General:
getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and getAAMatrix(), respectively.
getSeqMatrix() which calculates a pairwise distance matrix for a set of sequences.
multiggplot() function for performing multiple panel plots.
Amino Acid Analysis:
gravy(), bulk(), aliphatic(), polar(), charge(), countPatterns() and aminoAcidProperties().
Annotation:
getSegment(), getAllele(), getGene() and getFamily(). May be disabled by providing the argument strip_d=FALSE.
countGenes() to tabulate V(D)J allele, gene and family usage.
Diversity:
countClones(), estimateAbundance() and plotAbundance().
resampleDiversity() to rarefyDiversity() and changed many of the internals. Bootstrapping is now performed on an inferred complete relative abundance distribution.
rarefyDiversity() and testDiversity().
rarefyDiversity() and testDiversity() are now calculated using the mean and standard deviation of the bootstrap realizations, rather than the median and upper/lower quantiles.
plotDiversityCurve().
Initial public release.
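The switch in rarefyDiversity() and testDiversity() from median and upper/lower quantiles to the mean and standard deviation of the bootstrap realizations can be illustrated with a small sketch. It is written in Python purely for illustration and is not alakazam's R implementation. The function names, the normal-approximation interval (mean ± z·sd), and the nearest-rank quantile rule are all assumptions made for this example.

```python
import statistics


def summarize_mean_sd(realizations, z=1.96):
    """Summarize bootstrap realizations by mean and standard deviation.

    Returns (center, lower, upper), where the bounds are mean -/+ z * sd.
    The normal-approximation interval is an assumption of this sketch.
    """
    center = statistics.fmean(realizations)
    sd = statistics.stdev(realizations)  # sample standard deviation
    return center, center - z * sd, center + z * sd


def summarize_median_quantiles(realizations, alpha=0.05):
    """Summarize bootstrap realizations by median and empirical quantiles.

    Uses a simple nearest-rank rule (truncating the index) for the
    lower and upper quantiles; other quantile definitions would differ.
    """
    ordered = sorted(realizations)
    n = len(ordered)
    lower = ordered[int((alpha / 2) * (n - 1))]
    upper = ordered[int((1 - alpha / 2) * (n - 1))]
    return statistics.median(ordered), lower, upper


if __name__ == "__main__":
    # Hypothetical diversity values from five bootstrap realizations.
    boot = [1.0, 2.0, 3.0, 4.0, 5.0]
    print("mean/sd summary:", summarize_mean_sd(boot))
    print("median/quantile summary:", summarize_median_quantiles(boot))
```

The mean and standard deviation summary gives a symmetric interval around the mean, while the quantile summary follows the shape of the empirical distribution, so the two can differ noticeably when the realizations are skewed.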
General:
citation("alakazam") command.
Lineage:
buildPhylipLineage().
Lineage:
buildPhylipLineage() would hang on R 3.2 due to R change request PR#15508.
Prerelease for review.