Documentation:
General:
In alakazam 1.3.0, alakazam::makeChangeoClone requires the parameter locus, with default value locus. This function is used in some examples and tests in shazam. We added a locus column to the package's example data.
Distance Profiling:
Added to distToNearest the parameter locusValues=c("IGH") to specify loci values to focus the analysis on.
Fixed a bug in distToNearest where grouping by fields was applied after grouping by genes, and therefore the different subsets of data were not treated independently when identifying groups of genes. In practice, this means that if fields was set to treat samples independently (fields='sample_id'), single linkage was applied to all data, and two genes could be placed in the same group of genes if they were connected by an ambiguous gene call in any of the samples. Now, data is separated by fields (sample_id in this example) before creating the groups of genes, and ambiguities in other samples are not considered.
Mutation Profiling:
Bug fix in parallelization setup for functions slideWindowTune and slideWindowDb.
plotSlideWindowTune (slideWindowTunePlot). Updated the possible values of the parameter plotFiltered, for easier usage. The new values (and their equivalent values in slideWindowTunePlot) are filtered (TRUE), remaining (FALSE), and per_mutation (NULL).
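For scripts that still pass the old slideWindowTunePlot values, the correspondence between old and new plotFiltered values can be written as a small lookup. The following is a minimal sketch in base R; legacy_to_plot_filtered is a hypothetical helper introduced only for illustration and is not part of shazam.

```r
# Hypothetical helper (not part of shazam): translate a legacy plotFiltered
# value (TRUE, FALSE or NULL) into the corresponding new keyword.
legacy_to_plot_filtered <- function(old) {
  # NULL has to be handled first because is.logical(NULL) is FALSE
  if (is.null(old)) {
    return("per_mutation")
  }
  stopifnot(is.logical(old), length(old) == 1, !is.na(old))
  if (old) "filtered" else "remaining"
}

legacy_to_plot_filtered(TRUE)   # "filtered"
legacy_to_plot_filtered(FALSE)  # "remaining"
legacy_to_plot_filtered(NULL)   # "per_mutation"
```

The returned string would then be passed as plotFiltered to plotSlideWindowTune.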
Deprecated: slideWindowTunePlot in favor of plotSlideWindowTune, for naming consistency.
General:
New feature: convertNumbering to convert between numbering systems (IMGT, Kabat).
Mutation Profiling:
shmulateTree has new argument nproc to specify the number of cores. Default values mutThresh and windowSize have been set to mutThresh=6 and windowSize=10.
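To illustrate what the two defaults above mean in practice, the sketch below applies a sliding-window criterion (at least a given number of mutations within a given number of consecutive nucleotides) to a logical vector of mutated positions. It is a base R illustration, not the shazam implementation. It assumes that mutThresh and windowSize are the mutation threshold and window length of that filter; has_dense_window and its is_mutated argument are hypothetical names, and treating a sequence shorter than the window as a single window is also an assumption.

```r
# Hypothetical helper (not the shazam implementation): returns TRUE when any
# run of `windowSize` consecutive positions holds at least `mutThresh` mutations.
has_dense_window <- function(is_mutated, mutThresh = 6, windowSize = 10) {
  stopifnot(is.logical(is_mutated), !anyNA(is_mutated),
            mutThresh >= 1, windowSize >= mutThresh)
  n <- length(is_mutated)
  # Assumption: a sequence shorter than the window is treated as one window
  if (n < windowSize) {
    return(sum(is_mutated) >= mutThresh)
  }
  # Cumulative counts let every window total be obtained by one subtraction
  cs <- c(0, cumsum(is_mutated))
  window_counts <- cs[(windowSize + 1):(n + 1)] - cs[1:(n - windowSize + 1)]
  any(window_counts >= mutThresh)
}

# Six mutations packed into the first ten positions: flagged
clustered <- c(rep(TRUE, 6), rep(FALSE, 14))
# Five mutations spread four positions apart: not flagged
spread <- rep(c(TRUE, FALSE, FALSE, FALSE), 5)

has_dense_window(clustered)  # TRUE
has_dense_window(spread)     # FALSE
```

Lowering mutThresh or widening windowSize makes the criterion stricter, since more sequences contain a window that reaches the threshold.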
Added the option plotFiltered=NULL to slideWindowTunePlot.
Fixed a bug in listObservedMutations not returning a list when db had one sequence with one mutation.
Fixed bars shifted in plotMutability.
General:
Selection Analysis:
observedMutations, expectedMutations, and calcBaseline can analyze mutations in all regions (CDR1, CDR2, CDR3, FWR1, FWR2, FWR3 and FWR4) by specifying regionDefinition=IMGT_VDJ or regionDefinition=IMGT_VDJ_BY_REGIONS.
Added setRegionBoundaries to build sequence-specific RegionDefinition objects extending to CDR3 and FWR4.
Added makeGraphDf to facilitate mutational analysis on lineage trees.
Distance Profiling:
Fixed a bug in distToNearest where TRB and TRD sequences were ignored in distance calculation.
Fixed a bug in distToNearest causing a fatal error when cross was set.
Fixed a bug in nearestDist causing a fatal error when using model="aa" and crossGroups.
Targeting Models:
plotMutability.
Mutation Profiling:
Fixed a bug in observedMutations and calcObservedMutations causing mutation counting to fail when there are gap (-) characters in the germline sequence.
Targeting Models:
Fixed a bug in createTargetingModel causing empty counts in the numMutS and numMutR slots.
Distance Profiling:
distToNearest.
Renamed the groupUsingOnlyIGH argument of distToNearest to onlyHeavy.
Backwards Incompatible Changes:
Where functions previously used V_CALL (Change-O) as the default to identify the field that stored the V gene calls, they now use v_call (AIRR). That means scripts that relied on default values (previously, v_call="V_CALL") will now fail if calls to the functions are not updated to reflect the correct value for the data. If data are in the Change-O format, the current default value v_call="v_call" will fail to identify the column with the V gene calls, as the column v_call doesn't exist. In this case, v_call="V_CALL" needs to be specified in the function call.
ExampleDb converted to the AIRR Rearrangement standard and examples updated accordingly.
labels slot of IMGT_V has changed from CDR_R, CDR_S, FWR_R and FWR_S to cdr_r, cdr_s, fwr_r and fwr_s, respectively.
CODON_TABLE and the different MUTATION_SCHEMES change from R, S and Stop to r, s and stop, respectively.
MU_COUNT_SEQ to mu_count_seq.
calcBaseline and related function output columns and S4 object slots. For example, from PVALUE, REGION and BASELINE_CI_PVALUE to pvalue, region and baseline_ci_pvalue, respectively.
createSubstitutionMatrix, createMutabilityMatrix and createTargetingModel, changed from model=c("S","RS") to model=c("s","rs").
General:
Targeting Models:
createMutabilityMatrix, extendMutabilityMatrix, createTargetingMatrix, and createTargetingModel now also return the numbers of silent and replacement mutations used for estimating the 5-mer mutabilities. These numbers are recorded in the numMutS and numMutR slots in the newly defined MutabilityModel, MutabilityModelWithSource, and TargetingMatrix classes.
Mutation Profiling:
shmulateSeq now also supports specifying the frequency of mutations to be introduced. (Previously, only the number of mutations was supported.)
General:
General:
Distance Calculation:
Fixed a bug in distToNearest that could potentially cause sequences from different partitions to be used for distance calculation.
General:
Distance Calculation:
plotDensityThreshold for negative densities.
distToNearest for performing subsampling while calculating cross-group nearest neighbor distances.
distToNearest now supports, via a new argument VJthenLen, either a 2-stage partitioning (first by V gene and J gene, then by junction length), or a 1-stage partitioning (simultaneously by V gene, J gene, and junction length). For 1-stage partitioning, distToNearest supports export of the partitioning information as a new column via keepVJLgroup.
distToNearest now supports single-cell input data with the addition of new arguments cellIdColumn, locusColumn, and groupUsingOnlyIGH.
Mutation Profiling:
shmulateTree has new arguments, start and end, to specify the region in the sequence where mutations can be introduced.
Selection Analysis:
consensusSequence which can be used to build a consensus sequence using a variety of methods.
General:
TargetingModel and RegionDefinition S4 classes.
General:
Added subsample argument to distToNearest function.
alakazam. Specifically, progressBar, getBaseTheme and checkColumns.
clearConsole, getnproc, and getPlatform functions.
Distance Calculation:
findThreshold method to density.
density method by returning the bandwidth detection process. The density method should now also yield more consistent thresholds, on average.
subsample argument to findThreshold now applies to both the density and gmm methods. Subsampling of distance is not performed by default.
Fixed a bug in plotDensityThreshold and plotGmmThreshold wherein the breaks argument was ignored when specifying xmax and/or xmin.
Selection Analysis:
Fixed a bug in plotBaselineDensity arising when the groupColumn and idColumn arguments were set to the same column.
Added sizeElement argument to plotBaselineDensity to control line size.
Renamed the field_name argument to field in editBaseline.
Selection Analysis:
Fixed a bug in plotBaselineDensity which caused an empty plot to be generated if there was only a single value in the idColumn.
Fixed a bug in calcBaseline which caused a crash in summarizeBaseline and groupBaseline when input baseline is based on only 1 sequence (i.e. when nrow(baseline@db) is 1).
plot call on a Baseline object to plotBaselineDensity.
getBaselineStats function.
summary method for Baseline objects that calls summarizeBaseline and returns a data.frame.
Mutation Profiling:
Fixed a bug in shmulateSeq which caused a crash when the input sequence contains gaps (.).
Renamed mutations in shmulateSeq to numMutations.
shmulateSeq and shmulateTree.
calcExpectedMutations will now treat non-ACTG characters as Ns rather than produce an error.
RegionDefinition objects for the full V segment as a single region (IMGT_V_BY_SEGMENTS) and the V segment with each codon as a separate region (IMGT_V_BY_CODONS).
Targeting Models:
calculateMutability function which computes the aggregate mutability for sequences.
Fixed a bug that caused createSubstitutionMatrix to fail for data containing only a single V family.
model="S") in createSubstitutionMatrix, createSubstitutionMatrix and createTargetingModel
plot call on a TargetingModel object to plotMutability.
General:
Distance Calculation:
"gmm" method of findThreshold() that allows users to choose a mixture of two univariate density distribution functions among four available combinations: "norm-norm", "norm-gamma", "gamma-norm", or "gamma-gamma".
"gmm" method of findThreshold() from the best average sensitivity and specificity, the curve intersection or user defined sensitivity or specificity.
Renamed the cutEdge argument of findThreshold() to edge.
Mutation Profiling:
collapseClones(), adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed calcClonalConsensus() from exported functions.
observedMutations() and calcObservedMutations().
calcObservedMutations() for sequences with non-triplet overhang at the tail.
OBSERVED) and expected mutations (previously EXPECTED) returned by observedMutations() and expectedMutations() to MU_COUNT and MU_EXPECTED respectively.
Selection Analysis:
calcBaseline() no longer calls collapseClones() automatically if a CLONE column is present. As indicated by the documentation for calcBaseline(), users are advised to obtain effective clonal sequences (for example, by calling collapseClones()) before running calcBaseline().
calcBaseline().
Mutation Profiling:
Fixed a bug in collapseClones() that prevented it from running when nproc is greater than 1.
General:
Mutation Profiling:
Fixed a bug in collapseClones() that resulted in erroneous CLONAL_SEQUENCE and CLONAL_GERMLINE being returned.
observedMutations was running.
General:
Selection Analysis:
summarizeBaseline(). The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as usual. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
Added editBaseline() to exported functions, and a corresponding section in the vignette.
calcBaseline().
Targeting Models:
Added numMutationsOnly argument to createSubstitutionMatrix(), enabling parameter tuning for minNumMutations.
Added functions minNumMutationsTune() and minNumSeqMutationsTune() to tune for parameters minNumMutations and minNumSeqMutations in functions createSubstitutionMatrix() and createMutabilityMatrix() respectively. Also added function plotTune() which helps visualize parameter tuning using the two new functions mentioned above.
HKL_S5F).
HS5FModel as HH_S5F, MRS5NFModel as MK_RS5NF, and U5NModel as U5N.
HH_S1F), human kappa and lambda light chain, silent, 1-mer, functional substitution model (HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (MK_RS1NF).
Added makeDegenerate5merSub and makeDegenerate5merMut which make degenerate 5-mer substitution and mutability models respectively based on the 1-mer models. Also added makeAverage1merSub and makeAverage1merMut which make 1-mer substitution and mutability models respectively by averaging over the 5-mer models.
Mutation Profiling:
Added returnRaw argument to calcObservedMutations(), which if true returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence "raw").
Added slideWindowSeq() and slideWindowDb() which implement a sliding window approach towards filtering a single sequence or sequences in a data.frame which contain(s) equal to or more than a given number of mutations in a given number of consecutive nucleotides.
Added slideWindowTune() which allows for parameter tuning for using slideWindowSeq() and slideWindowDb().
Added slideWindowTunePlot() which visualizes parameter tuning by slideWindowTune().
Distance Calculation:
Fixed a bug in distToNearest wherein normalize="length" for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
Fixed a bug in distToNearest wherein symmetry="min" was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
Added findThreshold function to infer clonal distance threshold from nearest neighbor distances returned by distToNearest.
Renamed the length option for the normalize argument of distToNearest to len so it matches Change-O.
Deprecated the HS1FDistance and M1NDistance distance models, which have been renamed to hs1f_compat and m1n_compat in the model argument of distToNearest. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. These models have been replaced by hh_s1f and mk_rs1nf, which are supported by Change-O v0.3.4.
Renamed the hs5f model in distToNearest to hh_s5f.
MK_RS5NF models to distToNearest.
calcTargetingDistance() to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as HH_S1F.
findThreshold. The previous smoothed density method is available via the method="density" argument and the new GMM method is available via method="gmm".
Added plotGmmThreshold and plotDensityThreshold to plot the threshold detection results from findThreshold for the "gmm" and "density" methods, respectively.
Region Definition:
IMGT_V_NO_CDR3 and IMGT_V_BY_REGIONS_NO_CDR3. Updated IMGT_V and IMGT_V_BY_REGIONS so that neither includes CDR3 now.
Selection Analysis:
Targeting Models:
Added numSeqMutationsOnly argument to createMutabilityMatrix(), enabling parameter tuning for minNumSeqMutations.
General:
InfluenzaDb data object, in favor of the updated ExampleDb provided in alakazam 0.2.4.
Distance Calculation:
Added cross argument to distToNearest() which allows restriction of distances to only distances across samples (i.e., excludes within-sample distances).
Added mst flag to distToNearest(), which will return all distances to neighboring nodes in a minimum spanning tree.
aa model of distToNearest().
aa model of distToNearest().
Mutation Profiling:
MutationDefinition VOLUME_MUTATIONS.
Added shmulateSeq() and shmulateTree() to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
Renamed collapseByClone, calcDbExpectedMutations and calcDbObservedMutations to collapseClones, expectedMutations, and observedMutations, respectively.
Selection Analysis:
Fixed a bug wherein passing a Baseline object through groupBaseline() multiple times resulted in incorrect normalization.
Added title options to plotBaselineSummary() and plotBaselineDensity().
plotBaselineSummary() and plotBaselineDensity().
Added testBaseline() function to test the significance of differences between two selection distributions.
General:
InfluenzaDb.
dplyr::tbl_df object instead of a data.frame.
Distance Calculation:
Fixed a bug wherein distToNearest() did not return the nearest neighbor with a non-zero distance.
Targeting Models:
createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
plotMutability().
Mutation Profiling:
Added MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY, MUTATIONS_POLARITY providing alternate approaches to defining replacement and silent annotations to mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
calcDBObservedMutations() returns R and S mutations also when regionDefinition=NULL. Older versions reported the sum of R and S mutations. The function will add the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.
General:
Distance Calculation:
Added symmetry parameter to distToNearest to change how asymmetric distances (A->B != B->A) are combined to get the distance between A and B.
Mutation Profiling:
Selection Analysis:
Targeting Models:
Added minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
Added minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
Added returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
Added returnSource parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.
Initial public release.
General:
Fixed a bug where the Influenza.tab file did not load on Mac OS X.
citation("shazam") command.
Distance Calculation:
HS1FDistance, based on the Yaari et al, 2013 data.
hs1f as the default distance model for distToNearest().
distToNearest().
Mutation Profiling:
Fixed a bug in calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
Added frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables return of mutation frequencies rather than the default of mutation counts.
Targeting Models:
M3NModel and all options for using said model.
Fixed bugs in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.
General:
Targeting Models:
Targeting Models:
U5NModel, which is a uniform 5-mer model.
plotMutability() output.
Prerelease for review.